Now that we have some data stored in Elasticsearch, we can get to work on the business requirements for this application. The first requirement is the ability to retrieve individual employee data.
This is easy in Elasticsearch. We simply execute an HTTP +GET+ request and specify the address of the document--the index, type, and ID. Using those three pieces of information, we can return the original JSON document:
GET /megacorp/employee/1
And the response contains some metadata about the document, and John Smith's
original JSON document as the _source
field:
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
}
TIP
In the same way that we changed the HTTP verb from
PUT
toGET
in order to retrieve the document, we could use theDELETE
verb to delete the document, and theHEAD
verb to check whether the document exists. To replace an existing document with an updated version, we justPUT
it again.
A GET
is fairly simple--you get back the document that you ask for. Let's
try something a little more advanced, like a simple search!
The first search we will try is the simplest search possible. We will search for all employees, with this request:
GET /megacorp/employee/_search
You can see that we're still using index megacorp
and type employee
, but
instead of specifying a document ID, we now use the _search
endpoint. The
response includes all three of our documents in the hits
array. By default,
a search will return the top 10 results.
{
"took": 6,
"timed_out": false,
"_shards": { ... },
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "3",
"_score": 1,
"_source": {
"first_name": "Douglas",
"last_name": "Fir",
"age": 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
},
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_score": 1,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
},
{
"_index": "megacorp",
"_type": "employee",
"_id": "2",
"_score": 1,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [ "music" ]
}
}
]
}
}
NOTE
The response not only tells us which documents matched, but also includes the whole document itself: all the information that we need in order to display the search results to the user.
Next, let's try searching for employees who have "Smith" in their last name. To do this, we'll use a lightweight search method that is easy to use from the command line. This method is often referred to as a query-string search, since we pass the search as a URL query-string parameter:
GET /megacorp/employee/_search?q=last_name:Smith
We use the same _search
endpoint in the path, and we add the query itself in
the q=
parameter. The results that come back show all Smiths:
{
...
"hits": {
"total": 2,
"max_score": 0.30685282,
"hits": [
{
...
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
},
{
...
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [ "music" ]
}
}
]
}
}
Query-string search is handy for ad hoc searches from the command line, but it has its limitations (see search-lite). Elasticsearch provides a rich, flexible, query language called the query DSL, which allows us to build much more complicated, robust queries.
The domain-specific language (DSL) is specified using a JSON request body. We can represent the previous search for all Smiths like so:
GET /megacorp/employee/_search
{
"query" : {
"match" : {
"last_name" : "Smith"
}
}
}
This will return the same results as the previous query. You can see that a
number of things have changed. For one, we are no longer using query-string
parameters, but instead a request body. This request body is built with JSON,
and uses a match
query (one of several types of queries, which we will learn
about later).
Let's make the search a little more complicated. We still want to find all employees with a last name of Smith, but we want only employees who are older than 30. Our query will change a little to accommodate a filter, which allows us to execute structured searches efficiently:
GET /megacorp/employee/_search
{
"query" : {
"filtered" : {
"filter" : {
"range" : {
"age" : { "gt" : 30 } (1)
}
},
"query" : {
"match" : {
"last_name" : "smith" (2)
}
}
}
}
}
(1) This portion of the query is a range
filter, which will find all ages
older than 30—gt
stands for greater than.
(2) This portion of the query is the((("match queries"))) same match
query that we used before.
Don't worry about the syntax too much for now; we will cover it in great
detail later. Just recognize that we've added a filter that performs a
range search, and reused the same match
query as before. Now our results show only one employee who happens to be 32 and is named Jane Smith:
{
...
"hits": {
"total": 1,
"max_score": 0.30685282,
"hits": [
{
...
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [ "music" ]
}
}
]
}
}
The searches so far have been simple: single names, filtered by age. Let's try a more advanced, full-text search--a task that traditional databases would really struggle with.
We are going to search for all employees who enjoy rock climbing:
GET /megacorp/employee/_search
{
"query" : {
"match" : {
"about" : "rock climbing"
}
}
}
You can see that we use the same match
query as before to search the about
field for ``rock climbing.'' We get back two matching documents:
{
...
"hits": {
"total": 2,
"max_score": 0.16273327,
"hits": [
{
...
"_score": 0.16273327, (1)
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
},
{
...
"_score": 0.016878016, (1)
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [ "music" ]
}
}
]
}
}
(1) The relevance scores
By default, Elasticsearch sorts matching results by their relevance score,
that is, by how well each document matches the query. The first and highest-scoring result is obvious: John Smith's about
field clearly says "rock climbing" in it.
But why did Jane Smith come back as a result? The reason her document was
returned is because the word "rock" was mentioned in her about
field.
Because only "rock" was mentioned, and not "climbing," her _score
is
lower than John's.
This is a good example of how Elasticsearch can search within full-text fields and return the most relevant results first. This ((("relevance", "importance to Elasticsearch")))concept of relevance is important to Elasticsearch, and is a concept that is completely foreign to traditional relational databases, in which a record either matches or it doesn't.
Finding individual words in a field is all well and good, but sometimes you
want to match exact sequences of words or phrases.((("phrase matching"))) For instance, we could
perform a query that will match only employee records that contain both rock''
_and_
climbing'' and that display the words are next to each other in the phrase
``rock climbing.''
To do this, we use a slight variation of the match
query called the
match_phrase
query:
GET /megacorp/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
}
}
This, to no surprise, returns only John Smith's document:
{
...
"hits": {
"total": 1,
"max_score": 0.23013961,
"hits": [
{
...
"_score": 0.23013961,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
}
]
}
}
Many applications like to highlight snippets of text from each search result so the user can see why the document matched the query. Retrieving highlighted fragments is easy in Elasticsearch.
Let's rerun our previous query, but add a new highlight
parameter:
GET /megacorp/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
},
"highlight": {
"fields" : {
"about" : {}
}
}
}
When we run this query, the same hit is returned as before, but now we get a
new section in the response called highlight
. This contains a snippet of
text from the about
field with the matching words wrapped in <em></em>
HTML tags:
{
...
"hits": {
"total": 1,
"max_score": 0.23013961,
"hits": [
{
...
"_score": 0.23013961,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
},
"highlight": {
"about": [
"I love to go <em>rock</em> <em>climbing</em>" (1)
]
}
}
]
}
}
(1) The highlighted fragment from the original text
You can read more about the highlighting of search snippets in the highlighting reference documentation.