title: "Elasticsearch Scala Example"
date: 2021-02-24
type: technical_note
draft: false
In this example notebook, we show how to write data to and read data from Elasticsearch using Spark. The dataset is the American Kennel Club dog breed data, downloaded with the command below.
wget -O akc_breed_info.csv https://query.data.world/s/msmjhcmdjslsvjzcaqmtreu52gkuno
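The `wget` command saves the file to the local filesystem, while the read that follows expects it under the project's `Resources` dataset in HDFS. One possible way to bridge that gap is sketched here with the Hadoop `FileSystem` API. It assumes the file sits in the working directory of the Spark driver and that `spark` is the session predefined in the notebook. The local path is an assumption, not something the notebook states.

```scala
import org.apache.hadoop.fs.Path
import io.hops.util.Hops

// Assumed local location of the downloaded file (relative to the driver's working directory).
val src = new Path("akc_breed_info.csv")
// Destination used by the read step of this notebook.
val dst = new Path("hdfs:///Projects/" + Hops.getProjectName() + "/Resources/akc_breed_info.csv")

// Resolve the filesystem from the destination path so the scheme always matches.
val fs = dst.getFileSystem(spark.sparkContext.hadoopConfiguration)

// delSrc = false keeps the local copy; overwrite = true replaces an earlier upload.
fs.copyFromLocalFile(false, true, src, dst)
```

If the file was downloaded somewhere the driver cannot see, uploading it through the project's file browser or an HDFS command-line client would be the alternative.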
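import io.hops.util.Hops
// Read the CSV from the project's Resources dataset; without schema inference every column is a string.
val df = spark.read.option("header","true").csv("hdfs:///Projects/"+ Hops.getProjectName() +"/Resources/akc_breed_info.csv")
// Write the DataFrame to the "newindex" index, replacing any data already stored there.
(df.write
  .format("org.elasticsearch.spark.sql")
  .options(Hops.getElasticConfiguration("newindex"))
  .mode("Overwrite")
  .save())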
import io.hops.util.Hops
df: org.apache.spark.sql.DataFrame = [Breed: string, height_low_inches: string ... 3 more fields]
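The notebook does not show what `Hops.getElasticConfiguration` returns. Since its result is passed to `.options(...)`, it presumably supplies connector settings for the given index. For readers outside Hopsworks, a minimal sketch of the same write with explicit elasticsearch-hadoop settings might look like the following. The host and port are hypothetical placeholders, and a secured cluster would need additional authentication and TLS options that are not shown.

```scala
import org.apache.spark.sql.SaveMode

// Hypothetical connection settings; replace them with the values for your cluster.
val esOptions = Map(
  "es.nodes"    -> "localhost", // assumed Elasticsearch host
  "es.port"     -> "9200",      // assumed REST port
  "es.resource" -> "newindex"   // index to write to
)

// Same write as above, with the options spelled out instead of taken from Hops.
(df.write
  .format("org.elasticsearch.spark.sql")
  .options(esOptions)
  .mode(SaveMode.Overwrite)
  .save())
```

Keeping the settings in a single `Map` means the same value can be reused for the read, mirroring how the notebook calls `Hops.getElasticConfiguration("newindex")` in both directions.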
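// Reuse the same connector options to read the index back as a DataFrame.
val reader = spark.read.format("org.elasticsearch.spark.sql").options(Hops.getElasticConfiguration("newindex"))
// na.drop removes rows that contain nulls; rows holding the literal string "na" are kept.
val df = reader.load().na.drop.orderBy($"Breed")
df.show()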
reader: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@416ac438
df: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [Breed: string, height_high_inches: string ... 3 more fields]
+--------------------+------------------+-----------------+---------------+--------------+
|               Breed|height_high_inches|height_low_inches|weight_high_lbs|weight_low_lbs|
+--------------------+------------------+-----------------+---------------+--------------+
|       Affenpinscher|                12|                9|             12|             8|
|        Afghan Hound|                27|               25|             60|            50|
|     Airdale Terrier|                24|               22|             45|            45|
|               Akita|                28|               26|            120|            80|
|    Alaskan Malamute|                na|               na|             na|            na|
|     American Eskimo|                19|                9|             30|            25|
|   American Foxhound|                25|               22|             70|            65|
|American Stafford...|                19|               17|             50|            40|
|American Water Sp...|                18|               15|             45|            25|
|  Anatolian Sheepdog|                29|               27|            150|           100|
|Australian Cattle...|                20|               17|             45|            35|
| Australian Shepherd|                23|               18|             60|            40|
|  Australian Terrier|                10|               10|             14|            10|
|             Basenji|                17|               17|             22|            20|
|        Basset Hound|                14|               14|             50|            40|
|              Beagle|                16|               13|             30|            18|
|      Bearded Collie|                22|               20|             60|            40|
|           Beauceron|                27|               24|            120|           100|
|  Bedlington Terrier|                16|               15|             23|            18|
|    Belgian Malinois|                26|               22|             65|            60|
+--------------------+------------------+-----------------+---------------+--------------+
only showing top 20 rows
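The output shows that every column comes back as a string and that the Alaskan Malamute row survives `na.drop`, because its missing values are the literal text `na` rather than nulls. A possible cleanup step is sketched below. `cleanMeasurements` is a hypothetical helper, not part of the notebook or of any library, and it assumes that `na` is the only non-numeric marker in the measurement columns. The sample rows are copied from the output above, and `spark` is assumed to be the session predefined in the notebook.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, when}
import spark.implicits._

// Hypothetical helper: map "na" to null, cast the measurement columns to double,
// then drop the rows that are left with nulls.
def cleanMeasurements(raw: DataFrame): DataFrame = {
  // Every column except the breed name holds a measurement stored as text.
  val measureCols = raw.columns.filterNot(_.equalsIgnoreCase("Breed"))
  val typed = measureCols.foldLeft(raw) { (acc, c) =>
    // when(...) without otherwise(...) yields null when the condition does not hold.
    acc.withColumn(c, when(col(c) =!= "na", col(c)).cast("double"))
  }
  typed.na.drop()
}

// Two rows copied from the output above.
val sample = Seq(
  ("Akita", "28", "26", "120", "80"),
  ("Alaskan Malamute", "na", "na", "na", "na")
).toDF("Breed", "height_high_inches", "height_low_inches", "weight_high_lbs", "weight_low_lbs")

val cleaned = cleanMeasurements(sample)
cleaned.show()
```

On this sample only the Akita row remains, with numeric measurement columns. Applying `cleanMeasurements(df)` to the DataFrame read from Elasticsearch would work the same way, provided the assumption about `na` holds for the full dataset.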