To give you a feel for what is possible in Elasticsearch and how easy it is to use, let's start by walking through a simple tutorial that covers basic concepts such as indexing, search, and aggregations.
We'll introduce some new terminology and basic concepts along the way, but it is OK if you don't understand everything immediately. We'll cover all the concepts introduced here in much greater depth throughout the rest of the book.
So, sit back and enjoy a whirlwind tour of what Elasticsearch is capable of.
We happen to work for Megacorp, and as part of HR's new "We love our drones!" initiative, we have been tasked with creating an employee directory. The directory is supposed to foster employer empathy and real-time, synergistic, dynamic collaboration, so it has a few business requirements:
The first order of business is storing employee data. This will take the form of an employee document: a single document represents a single employee. The act of storing data in Elasticsearch is called indexing, but before we can index a document, we need to decide where to store it.
In Elasticsearch, a document belongs to a type, and those types live inside an index. You can draw some (rough) parallels to a traditional relational database:
Relational DB ⇒ Databases ⇒ Tables ⇒ Rows ⇒ Columns
Elasticsearch ⇒ Indices ⇒ Types ⇒ Documents ⇒ Fields
An Elasticsearch cluster can contain multiple indices (databases), which in turn contain multiple types (tables). These types hold multiple documents (rows), and each document has multiple fields (columns).
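To make the nesting concrete, here is a rough sketch that models a cluster as nested Python dictionaries: index name → type name → document ID → fields. This is only an analogy for how the pieces fit together, not how Elasticsearch stores data internally. The alias names and the `get_field` helper are hypothetical.

```python
from typing import Any, Dict

# Toy model of the hierarchy. It illustrates the nesting only;
# it is NOT how Elasticsearch stores data.
Fields = Dict[str, Any]            # field name -> value   (~ a row's columns)
Documents = Dict[str, Fields]      # document ID -> fields (~ a table's rows)
Types = Dict[str, Documents]       # type name -> documents (~ a database's tables)
Indices = Dict[str, Types]         # index name -> types   (~ databases)

cluster: Indices = {
    "megacorp": {                  # index    (~ database)
        "employee": {              # type     (~ table)
            "1": {                 # document (~ row)
                "first_name": "John",  # field (~ column)
                "last_name": "Smith",
                "age": 25,
            },
        },
    },
}


def get_field(cluster: Indices, index: str, doc_type: str,
              doc_id: str, field: str) -> Any:
    """Walk index -> type -> document -> field, one level at a time."""
    return cluster[index][doc_type][doc_id][field]


print(get_field(cluster, "megacorp", "employee", "1", "last_name"))
```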
You may already have noticed that the word index is overloaded with several meanings in the context of Elasticsearch. A little clarification is necessary:
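Index (noun)
An index is like a database in a traditional relational database: it is the place to store related documents.
Index (verb)
To index a document is to store it in an index (noun) so that it can be retrieved and queried. This is much like the INSERT keyword in SQL, except that if the document already exists, the new document replaces the old one.
Inverted index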
Relational databases add an index, such as a B-tree index, to specific columns in order to improve the speed of data retrieval. Elasticsearch and Lucene use a structure called an inverted index for exactly the same purpose.
By default, every field in a document is indexed (has an inverted index) and thus is searchable. A field without an inverted index is not searchable. We discuss inverted indexes in more detail later in the book.
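The core idea of an inverted index fits in a few lines. Map each term to the set of document IDs that contain it, and a search becomes a lookup in that map rather than a scan of every document. The sketch below assumes that terms come from lowercasing the text and splitting it on whitespace. Elasticsearch's real text analysis and Lucene's on-disk structures are far more involved, so treat this as a simplification.

```python
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Simplifying assumption: lowercase, then split on whitespace.
    return text.lower().split()


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the set of IDs of the documents that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], term: str) -> List[str]:
    # One dictionary lookup instead of scanning every document.
    return sorted(index.get(term.lower(), set()))


# The "about" text of the three employees indexed in this chapter.
about = {
    "1": "I love to go rock climbing",
    "2": "I like to collect rock albums",
    "3": "I like to build cabinets",
}
inverted = build_inverted_index(about)
print(search(inverted, "rock"))  # ['1', '2']
print(search(inverted, "like"))  # ['2', '3']
```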
So for our employee directory, we are going to do the following:
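- Index a document per employee, which contains all the details of a single employee.
- Each document will be of type employee.
- That type will live in the megacorp index.
- That index will reside within our Elasticsearch cluster.
In practice, this is easy (even though it looks like a lot of steps). We can perform all of those actions in a single command: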
PUT /megacorp/employee/1
{
    "first_name": "John",
    "last_name":  "Smith",
    "age":        25,
    "about":      "I love to go rock climbing",
    "interests":  [ "sports", "music" ]
}
Notice that the path /megacorp/employee/1 contains three pieces of information:
- megacorp: the index name
- employee: the type name
- 1: the ID of this particular employee
The request body (the JSON document) contains all the information about this employee. His name is John Smith, he's 25, and he enjoys rock climbing.
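Outside the console, the same command is an ordinary HTTP PUT. Its URL path is /index/type/id and its body is the JSON document. The sketch below uses only Python's standard library to build that request without sending it. The host `http://localhost:9200` and the helper name `build_index_request` are assumptions. The /megacorp/employee/1 path form also assumes an Elasticsearch version that still supports types, as this chapter does.

```python
import json
import urllib.request
from typing import Any, Dict


def build_index_request(index: str, doc_type: str, doc_id: str,
                        document: Dict[str, Any],
                        host: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) a PUT /{index}/{type}/{id} request with a JSON body."""
    url = f"{host}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


john = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}
req = build_index_request("megacorp", "employee", "1", john)
print(req.get_method(), req.full_url)
```

If a node were actually listening at that address, `urllib.request.urlopen(req)` would send the request and return Elasticsearch's JSON response.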
Simple! There was no need to perform any administrative tasks first, like creating an index or specifying the type of data that each field contains. We could just index a document directly. Elasticsearch ships with defaults for everything, so all the necessary administration tasks were taken care of in the background, using default values.
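Part of what happens in the background is that Elasticsearch guesses a data type for each new field from its JSON value, a feature known as dynamic mapping. The function below is a deliberately simplified, hypothetical imitation of that guessing step. Its type names ("string", "long", "double", and so on) are assumptions modeled on older Elasticsearch releases. The real rules also cover cases such as date detection and differ between versions.

```python
from typing import Any, Dict


def guess_field_type(value: Any) -> str:
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its elements; an empty array gives no hint.
        return guess_field_type(value[0]) if value else "unknown"
    return "unknown"


def infer_mapping(document: Dict[str, Any]) -> Dict[str, str]:
    """Guess a type for every top-level field of a JSON document."""
    return {field: guess_field_type(value) for field, value in document.items()}


john = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}
print(infer_mapping(john))
```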
Before moving on, let's add a few more employees to the directory:
PUT /megacorp/employee/2
{
    "first_name": "Jane",
    "last_name":  "Smith",
    "age":        32,
    "about":      "I like to collect rock albums",
    "interests":  [ "music" ]
}
PUT /megacorp/employee/3
{
    "first_name": "Douglas",
    "last_name":  "Fir",
    "age":        35,
    "about":      "I like to build cabinets",
    "interests":  [ "forestry" ]
}
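Indexing a document whose ID already exists does not fail. Like the index (verb) definition says, the new document replaces the old one as a whole. A toy in-memory store can mimic those semantics. The `DocumentStore` class and its `(created, version)` return value are hypothetical and do not reproduce Elasticsearch's API or responses. The version counter is loosely inspired by the fact that Elasticsearch increments a document's version each time it is reindexed.

```python
from typing import Any, Dict, Tuple

Key = Tuple[str, str, str]  # (index, type, id)


class DocumentStore:
    """Toy store: put() inserts a new document or replaces an existing one."""

    def __init__(self) -> None:
        self._docs: Dict[Key, Dict[str, Any]] = {}
        self._versions: Dict[Key, int] = {}

    def put(self, index: str, doc_type: str, doc_id: str,
            document: Dict[str, Any]) -> Tuple[bool, int]:
        """Store the document and return (created, version)."""
        key = (index, doc_type, doc_id)
        created = key not in self._docs
        # Replace the whole document; fields are not merged with the old one.
        self._docs[key] = dict(document)
        self._versions[key] = self._versions.get(key, 0) + 1
        return created, self._versions[key]

    def get(self, index: str, doc_type: str, doc_id: str) -> Dict[str, Any]:
        return self._docs[(index, doc_type, doc_id)]


store = DocumentStore()
print(store.put("megacorp", "employee", "2", {"first_name": "Jane", "age": 32}))
# Reindexing ID 2 without "age" drops that field entirely.
print(store.put("megacorp", "employee", "2", {"first_name": "Jane", "last_name": "Smith"}))
print(store.get("megacorp", "employee", "2"))
```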