[[mapping-intro]]
=== Mapping
In order to be able to treat date fields as dates, numeric fields as numbers, and string fields as full-text or exact-value strings, Elasticsearch needs to know what type of data each field contains. ((("mapping (types)"))) This information is contained in the mapping.
As explained in <
We discuss mappings in detail in <
[[core-fields]]
==== Core Simple Field Types
Elasticsearch supports the ((("fields", "core simple types")))((("types", "core simple field types")))following simple field types:
[horizontal]
String::         `string`
Whole number::   `byte`, `short`, `integer`, `long`
Floating point:: `float`, `double`
Boolean::        `boolean`
Date::           `date`
When you index a document that contains a new field--one previously not
seen--Elasticsearch ((("types", "mapping for", "dynamic mapping of new types")))((("JSON", "datatypes", "simple core types")))((("dynamic mapping")))((("boolean type")))((("long type")))((("double type")))((("date type")))((("strings", "string type")))will use <
[horizontal]
JSON type::                         Field type
Boolean: `true` or `false`::        `boolean`
Whole number: `123`::               `long`
Floating point: `123.45`::          `double`
String, valid date: `2014-09-15`::  `date`
String: `foo bar`::                 `string`
NOTE: This means that if you index a number in quotes (`"123"`), it will be
mapped as type `string`, not type `long`. However, if the field is
already mapped as type `long`, then Elasticsearch will try to convert
the string into a long, and throw an exception if it can't.
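The dynamic mapping rules above can be sketched as a small Python function. This is only an illustration of the table, not Elasticsearch's actual detection logic: the name `guess_field_type` is hypothetical, and date detection is simplified to the single `YYYY-MM-DD` pattern used in the example.

```python
from datetime import datetime


def guess_field_type(value):
    """Return the field type that the table above assigns to a JSON value.

    Hypothetical helper: a simplified model of dynamic mapping.
    """
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            # Assumption: only the YYYY-MM-DD form counts as a date here.
            datetime.strptime(value, "%Y-%m-%d")
            return "date"
        except ValueError:
            return "string"
    raise TypeError("unsupported JSON value: %r" % (value,))


for sample in (True, 123, 123.45, "2014-09-15", "foo bar", "123"):
    print(repr(sample), "->", guess_field_type(sample))
```

Note that the quoted number `"123"` falls through to `string`, which is the case the note above warns about.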
==== Viewing the Mapping
We can view the mapping that Elasticsearch has((("mapping (types)", "viewing"))) for one or more types in one or
more indices by using the `/_mapping` endpoint. At the <tweet in index `gb`:
This shows us the mapping for the ((("properties")))fields (called _properties_) that Elasticsearch generated dynamically from the documents that we indexed:

    {
       "gb": {
          "mappings": {
             "tweet": {
                "properties": {
                   "date": {
                      "type": "date",
                      "format": "dateOptionalTime"
                   },
                   "name": {
                      "type": "string"
                   },
                   "tweet": {
                      "type": "string"
                   },
                   "user_id": {
                      "type": "long"
                   }
                }
             }
          }
       }
    }
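As a rough sketch of how a client might read this response, the Python below parses the same JSON and reduces it to a field-to-type table. The helper name `field_types` is made up for this example, and the response body is pasted in as a string rather than fetched from a running node.

```python
import json

# The response shown above, pasted in as text for the sake of the example.
response_body = """
{
  "gb": {
    "mappings": {
      "tweet": {
        "properties": {
          "date":    {"type": "date", "format": "dateOptionalTime"},
          "name":    {"type": "string"},
          "tweet":   {"type": "string"},
          "user_id": {"type": "long"}
        }
      }
    }
  }
}
"""


def field_types(mapping, index, doc_type):
    """Map each field name to its type (hypothetical helper)."""
    properties = mapping[index]["mappings"][doc_type]["properties"]
    return {name: spec["type"] for name, spec in properties.items()}


mapping = json.loads(response_body)
for name, field_type in sorted(field_types(mapping, "gb", "tweet").items()):
    print(name, field_type)
```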
Incorrect mappings, such as ((("mapping (types)", "incorrect mapping")))having an `age`
field mapped as type `string` instead of `integer`, can produce confusing results for your queries.
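One way to get a feel for the problem, as a loose analogy in plain Python rather than Elasticsearch itself: values treated as strings compare character by character, while integers compare numerically. The sample ages below are invented for the illustration.

```python
# Made-up sample values for an "age" field.
ages_as_strings = ["9", "10", "100", "25"]
ages_as_integers = [int(age) for age in ages_as_strings]

# Strings are ordered character by character, so "100" sorts before "25".
print(sorted(ages_as_strings))

# Integers are ordered by numeric value.
print(sorted(ages_as_integers))
```

A sort or range comparison on a field that was mapped as a string can behave more like the first list than the second.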
[[custom-field-mappings]]
==== Customizing Field Mappings
While the basic field datatypes are ((("mapping (types)", "customizing field mappings")))((("fields", "customizing field mappings")))sufficient for many cases, you will often need to customize the mapping ((("string fields", "customized mappings")))for individual fields, especially string fields. Custom mappings allow you to do the following:
The most important attribute of a field is the `type`. For fields
other than `string` fields, you will seldom need to map anything other
than `type`:

    {
        "number_of_clicks": {
            "type": "integer"
        }
    }
Fields of type `string` are, by default, considered to contain full text.
That is, their value will be passed through((("analyzers", "string values passed through"))) an analyzer before being indexed,
and a full-text query on the field will pass the query string through an
analyzer before searching.

The two most important mapping((("string fields", "mapping attributes, index and analyzer"))) attributes for `string`
fields are `index` and `analyzer`.
===== index
The `index` attribute controls((("index attribute, strings"))) how the string will be indexed. It
can contain one of three values:

`analyzed`::
First analyze the string and then index it. In other words, index this field as full text.

`not_analyzed`::
Index this field, so it is searchable, but index the value exactly as specified. Do not analyze it.

`no`::
Don't index this field at all. This field will not be searchable.
The default value of `index` for a `string` field is `analyzed`. If we
want to map the field as an exact value, we need to set it to
`not_analyzed`:

    {
        "tag": {
            "type":  "string",
            "index": "not_analyzed"
        }
    }
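To make the three `index` values concrete, here is a toy Python model of what ends up in the index for each setting. Everything in it is an assumption made for illustration: `index_terms` is a hypothetical function, and the "analysis" step (lowercase, then split on non-word characters) is a crude stand-in for a real analyzer.

```python
import re


def index_terms(value, index="analyzed"):
    """Return the terms a string value would contribute to the index.

    Hypothetical helper: a toy model, not how Elasticsearch is implemented.
    """
    if index == "analyzed":
        # Stand-in analysis: lowercase the text and split it into words.
        return re.findall(r"\w+", value.lower())
    if index == "not_analyzed":
        # The exact value is indexed as a single term.
        return [value]
    if index == "no":
        # Nothing is indexed, so the field is not searchable.
        return []
    raise ValueError("index must be analyzed, not_analyzed, or no")


for setting in ("analyzed", "not_analyzed", "no"):
    print(setting, index_terms("Black-cats", setting))
```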
The other simple types (such as `long`, `double`, `date`, etc.) also accept the
`index` parameter, but the only relevant values are `no` and `not_analyzed`,
as their values are never analyzed.
===== analyzer
For `analyzed` string fields, use ((("analyzer attribute, string fields")))the `analyzer` attribute to
specify which analyzer to apply both at search time and at index time. By
default, Elasticsearch uses the `standard` analyzer,((("standard analyzer", "specifying another analyzer for strings"))) but you can change this
by specifying one of the built-in analyzers, such((("english analyzer"))) as
`whitespace`, `simple`, or `english`:

    {
        "tweet": {
            "type":     "string",
            "analyzer": "english"
        }
    }
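If you generate mappings from application code, a small builder can keep the `index` and `analyzer` attributes consistent. The sketch below is one possible shape for such a helper. `string_field` and `VALID_INDEX_VALUES` are names invented here, and the rule that rejects an `analyzer` on a field that is not analyzed is this helper's own strictness, not documented Elasticsearch behavior.

```python
import json

# The three values described in the `index` section above.
VALID_INDEX_VALUES = ("analyzed", "not_analyzed", "no")


def string_field(index=None, analyzer=None):
    """Build the mapping for one string field (hypothetical helper)."""
    if index is not None and index not in VALID_INDEX_VALUES:
        raise ValueError("index must be one of %s" % (VALID_INDEX_VALUES,))
    # Assumption of this sketch: an analyzer only makes sense for analyzed fields.
    if analyzer is not None and index in ("not_analyzed", "no"):
        raise ValueError("analyzer only applies to analyzed string fields")
    field = {"type": "string"}
    if index is not None:
        field["index"] = index
    if analyzer is not None:
        field["analyzer"] = analyzer
    return field


properties = {
    "tweet": string_field(analyzer="english"),
    "tag": string_field(index="not_analyzed"),
}
print(json.dumps(properties, indent=2))
```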
In <
[[updating-a-mapping]]
==== Updating a Mapping
You can specify the mapping for a type when you first ((("types", "mapping for", "updating")))((("mapping (types)", "updating")))create an index.
Alternatively, you can add the mapping for a new type (or update the mapping
for an existing type) later, using the `/_mapping` endpoint.
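Although you can add to an existing mapping, you can't change existing field mappings. If a field already exists in the mapping, the data from that field has probably already been indexed. If you were to change the field mapping, the indexed data would be wrong and would not be properly searchable.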
We can update a mapping to add a new field, but we can't change an existing
field from `analyzed` to `not_analyzed`.
To demonstrate both ways of specifying mappings, let's first delete the `gb` index:
// SENSE: 052_Mapping_Analysis/45_Mapping.json
Then create a new index, specifying that the `tweet` field should use
the `english` analyzer:

    PUT /gb <1>
    {
      "mappings": {
        "tweet" : {
          "properties" : {
            "tweet" : {
              "type" :    "string",
              "analyzer": "english"
            },
            "date" : {
              "type" :   "date"
            },
            "name" : {
              "type" :   "string"
            },
            "user_id" : {
              "type" :   "long"
            }
          }
        }
      }
    }
// SENSE: 052_Mapping_Analysis/45_Mapping.json
<1> This creates the index with the `mappings` specified in the body.
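Outside of Sense, the same request can be prepared from Python's standard library. The sketch below only builds the request and does not send it. The address `http://localhost:9200` is an assumption about where a node might be listening, and the variable names are chosen for this example.

```python
import json
import urllib.request

# Assumption: a local node on the default HTTP port.
ES_URL = "http://localhost:9200"

# The same body as the PUT /gb request above.
create_index_body = {
    "mappings": {
        "tweet": {
            "properties": {
                "tweet": {"type": "string", "analyzer": "english"},
                "date": {"type": "date"},
                "name": {"type": "string"},
                "user_id": {"type": "long"},
            }
        }
    }
}

request = urllib.request.Request(
    ES_URL + "/gb",
    data=json.dumps(create_index_body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
print(request.get_method(), request.full_url)

# Sending it requires a running node:
# urllib.request.urlopen(request)
```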
Later on, we decide to add a new `not_analyzed` text field called `tag` to the
`tweet` mapping, using the `_mapping` endpoint:

    PUT /gb/_mapping/tweet
    {
      "properties" : {
        "tag" : {
          "type" :    "string",
          "index":    "not_analyzed"
        }
      }
    }
// SENSE: 052_Mapping_Analysis/45_Mapping.json
Note that we didn't need to list all of the existing fields again, as we can't change them anyway. Our new field has been merged into the existing mapping.
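The add-but-don't-change rule can be modeled in a few lines of Python. This is a sketch of the behavior described in this section, not of Elasticsearch's internal merge code, and `merge_properties` is a hypothetical name.

```python
import copy


def merge_properties(existing, update):
    """Return the existing properties plus any new fields from update.

    Hypothetical helper: new fields are added, but an attempt to change
    the mapping of a field that already exists is rejected.
    """
    merged = copy.deepcopy(existing)
    for name, spec in update.items():
        if name in merged and merged[name] != spec:
            raise ValueError("cannot change mapping of existing field %r" % name)
        merged[name] = copy.deepcopy(spec)
    return merged


existing = {
    "tweet": {"type": "string", "analyzer": "english"},
    "date": {"type": "date"},
    "name": {"type": "string"},
    "user_id": {"type": "long"},
}

# Adding a new field works, and the existing fields need not be listed again.
merged = merge_properties(existing, {"tag": {"type": "string", "index": "not_analyzed"}})
print(sorted(merged))

# Changing an existing field is refused.
try:
    merge_properties(existing, {"tweet": {"type": "string", "index": "not_analyzed"}})
except ValueError as error:
    print("rejected:", error)
```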
==== Testing the Mapping
You can use the `analyze` API to((("mapping (types)", "testing"))) test the mapping for string fields by
name. Compare the output of these two requests:

    GET /gb/_analyze?field=tweet
    Black-cats <1>

    GET /gb/_analyze?field=tag
    Black-cats <1>
// SENSE: 052_Mapping_Analysis/45_Mapping.json
<1> The text we want to analyze is passed in the body.
The `tweet` field produces the two terms `black` and `cat`, while the
`tag` field produces the single term `Black-cats`. In other words, our
mapping is working correctly.