Elasticsearch-mapping映射
# mapping映射是什么?
Mapping is the process of defining how a document, and the fields it contains, are stored and indexed. For instance, use mappings to define:
- which string fields should be treated as full text fields.
- which fields contain numbers, dates, or geolocations.
- the format (opens new window) of date values.
- custom rules to control the mapping for dynamically added fields (opens new window).
Since the first release of Elasticsearch, each document has been stored in a single index and assigned a single mapping type. A mapping type was used to represent the type of document or entity being indexed, for instance a
user
type and atweet
type.Each mapping type could have its own fields, so the
user
type might have afull_name
field, auser_name
field, and antweet
type could have acontent
field, atweeted_at
field and, like theuser
type, auser_name
field.
mapping映射来定义一个文档如何被处理,一级包含的这些属性如何被存储和处理的。
- 哪个String字段,应该被当成全文检索的
- 哪个属性是包含数字/日期/地理坐标
- 格式化日期
也就说,在ES中mapping要来定义每个列的字段类型。
ES在第一次保存数据时,自动猜测这些要保存的数据是以什么数据类型保存。
# mapping的数据类型
mapping的数据类型非常多。
# Core datatypes
string
-
long
,integer
,short
,byte
,double
,float
,half_float
,scaled_float
-
date
Date nanoseconds (opens new window)
date_nanos
-
boolean
-
binary
-
integer_range
,float_range
,long_range
,double_range
,date_range
当然,数据的类型也是可以修改的,但是在了解数据类型之前,先了解一下mapping。
# 映射类型(已弃用)
在以前,每种映射是有映射类型,而映射类型决定用什么数据类型来存储数据。
Indices created in Elasticsearch 7.0.0 or later no longer accept a
_default_
mapping. Indices created in 6.x will continue to function as before in Elasticsearch 6.x. Types are deprecated in APIs in 7.0, with breaking changes to the index creation, put mapping, get mapping, put template, get template and get field mappings APIs.
但是在6.0版本以后,映射类型已经弃用了。
# 以前的ES概念
以前的ES下面有各种索引,而各种索引下面有各种类型,类型下面才有各种文档。
现在没有类型,直接将文档,存在索引下。
# 为什么去掉type类型?
Initially, we spoke about an “index” being similar to a “database” in an SQL database, and a “type” being equivalent to a “table”.
This was a bad analogy that led to incorrect assumptions. In an SQL database, tables are independent of each other. The columns in one table have no bearing on columns with the same name in another table. This is not the case for fields in a mapping type.
In an Elasticsearch index, fields that have the same name in different mapping types are backed by the same Lucene field internally. In other words, using the example above, the
user_name
field in theuser
type is stored in exactly the same field as theuser_name
field in thetweet
type, and bothuser_name
fields must have the same mapping (definition) in both types.This can lead to frustration when, for example, you want
deleted
to be adate
field in one type and aboolean
field in another type in the same index.On top of that, storing different entities that have few or no fields in common in the same index leads to sparse data and interferes with Lucene’s ability to compress documents efficiently.
For these reasons, we have decided to remove the concept of mapping types from Elasticsearch.
- 关系型数据库中两个数据表示是独立的,即使他们里面有相同名称的列也不影响使用,但ES中不是这样的。
Elasticsearch
是基于Lucene
开发的搜索疫情,而ES中不同type
(类型)下名称相同的field
最终在Lucene
中的处理方式是一样的。- 两个不同
type
下的两个user_name
,在ES同一个索引下其实被认为是同一个field
,你必须在两个不同的type
重定义相同的field
映射。否则,不同type中的相同field
就会在处理中出现冲突的情况,导致Lucene
处理效率下降。 - 去掉type就是为了提高ES处理数据的效率。
- 两个不同
- Elasticsearch 7.x
- URL中的
type
参数可选,比如,索引一个文档不再要求提供type
参数
- URL中的
- Elasticsearch 8.x
- URL中不再支持
type
参数
- URL中不再支持
- 解决:将索引从多类型迁移到单类型,每种类型文档一个独立索引
# 查看映射
查看mapping
GET bank/_mapping
返回:
{
"bank" : {
"mappings" : {
"properties" : {
"account_number" : {
"type" : "long"
},
"address" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"age" : {
"type" : "long"
},
"balance" : {
"type" : "long"
},
"city" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"email" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"employer" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"firstname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"gender" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"lastname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"state" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
可以看出text
属性下面都有个子属性keyword
,比如age是long。
在数据第一次存储时,如果不去指定映射的数据类型,ES将自动猜测要存储数据类型,并进行存储。所以ES自动存储的数据数据类型,很多时候不是我们想要的。
# 创建索引并指定映射
可以在创建某个索引(index
)时,指定要存储的映射(mapping
)。
# 语法格式
PUT /my-index
{
"mappings": {
"properties": {
"age": { "type": "integer" },
"email": { "type": "keyword" },
"name": { "type": "text" }
}
}
}
2
3
4
5
6
7
8
9
10
- age是数字,默认ES猜测定义为long,这里自定义为integer
- email,自定义为keyword,就会精确匹配
- name,定义为text,就会保存时和检索时都会自动分词
创建成功返回:
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "my-index"
}
2
3
4
5
查询mapping映射:
# 请求
GET my-index/_mapping
# 返回
{
"my-index" : {
"mappings" : {
"properties" : {
"age" : {
"type" : "integer"
},
"email" : {
"type" : "keyword"
},
"name" : {
"type" : "text"
}
}
}
}
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
# 添加新的映射
对已存在的index新增一个mapping
# 语法
PUT /my-index/_mapping
{
"properties": {
"employee-id": {
"type": "keyword",
"index": false
}
}
}
2
3
4
5
6
7
8
9
# mapping的index
代表什么意思?
The following example adds
employee-id
, akeyword
field with anindex
(opens new window) mapping parameter value offalse
. This means values for theemployee-id
field are stored but not indexed or available for search.
The
index
option controls whether field values are indexed. It acceptstrue
orfalse
and defaults totrue
. Fields that are not indexed are not queryable.
mapping中的index接收ture或者false,默认是true,这个index是控制这个属性能不能被索引,如果一字段不能被索引,就不会被检索。
# 更新映射
更新某个索引(index
)下的映射(mapping
)
Except for supported mapping parameters (opens new window), you can’t change the mapping or field type of an existing field. Changing an existing field could invalidate data that’s already indexed.
If you need to change the mapping of a field, create a new index with the correct mapping and reindex (opens new window) your data into that index.
Renaming a field would invalidate data already indexed under the old field name. Instead, add an
alias
(opens new window) field to create an alternate field name.
对于已存在的映射,是不能更新的,如果非要修改,只能创建新的索引并且指定新的映射,然后将数据迁移过去。
# 数据迁移
- 语法格式
POST _reindex
{
"source": {
"index": "twitter"
},
"dest": {
"index": "new_twitter"
}
}
2
3
4
5
6
7
8
9
如果旧版本的数据存在type,属性在source中指定type
# 1.创建新的映射和指定索引
PUT /newbank
{
"mappings": {
"properties": {
"account_number": {
"type": "long"
},
"address": {
"type": "text"
},
"age": {
"type": "integer"
},
"balance": {
"type": "long"
},
"city": {
"type": "keyword"
},
"email": {
"type": "keyword"
},
"employer": {
"type": "keyword",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"firstname": {
"type": "text"
},
"gender": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"state": {
"type": "keyword"
}
}
}
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
# 2.开始迁移
- 请求
POST _reindex
{
"source": {
"index": "bank",
"type": "account"
},
"dest": {
"index": "newbank"
}
}
2
3
4
5
6
7
8
9
10
- 返回
#! Deprecation: [types removal] Specifying types in reindex requests is deprecated.
{
"took" : 332,
"timed_out" : false,
"total" : 1000,
"updated" : 0,
"created" : 1000,
"deleted" : 0,
"batches" : 1,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
这里显示,1000个数据话费了332毫秒,转移成功了。
重新检索一下newbank
# 请求
GET newbank/_search
# 返回
{
"took" : 1117,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "newbank",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"account_number" : 1,
"balance" : 39225,
"firstname" : "Amber",
"lastname" : "Duke",
"age" : 32,
"gender" : "M",
"address" : "880 Holmes Lane",
"employer" : "Pyrami",
"email" : "amberduke@pyrami.com",
"city" : "Brogan",
"state" : "IL"
}
},
{
"_index" : "newbank",
"_type" : "_doc",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
}
},
{
"_index" : "newbank",
"_type" : "_doc",
"_id" : "13",
"_score" : 1.0,
"_source" : {
"account_number" : 13,
"balance" : 32838,
"firstname" : "Nanette",
"lastname" : "Bates",
"age" : 28,
"gender" : "F",
"address" : "789 Madison Street",
"employer" : "Quility",
"email" : "nanettebates@quility.com",
"city" : "Nogal",
"state" : "VA"
}
},
{
"_index" : "newbank",
"_type" : "_doc",
"_id" : "18",
"_score" : 1.0,
"_source" : {
"account_number" : 18,
"balance" : 4180,
"firstname" : "Dale",
"lastname" : "Adams",
"age" : 33,
"gender" : "M",
"address" : "467 Hutchinson Court",
"employer" : "Boink",
"email" : "daleadams@boink.com",
"city" : "Orick",
"state" : "MD"
}
},
{
"_index" : "newbank",
"_type" : "_doc",
"_id" : "20",
"_score" : 1.0,
"_source" : {
"account_number" : 20,
"balance" : 16418,
"firstname" : "Elinor",
"lastname" : "Ratliff",
"age" : 36,
"gender" : "M",
"address" : "282 Kings Place",
"employer" : "Scentric",
"email" : "elinorratliff@scentric.com",
"city" : "Ribera",
"state" : "WA"
}
},
{
"_index" : "newbank",
"_type" : "_doc",
"_id" : "25",
"_score" : 1.0,
"_source" : {
"account_number" : 25,
"balance" : 40540,
"firstname" : "Virginia",
"lastname" : "Ayala",
"age" : 39,
"gender" : "F",
"address" : "171 Putnam Avenue",
"employer" : "Filodyne",
"email" : "virginiaayala@filodyne.com",
"city" : "Nicholson",
"state" : "PA"
}
},
{
"_index" : "newbank",
"_type" : "_doc",
"_id" : "32",
"_score" : 1.0,
"_source" : {
"account_number" : 32,
"balance" : 48086,
"firstname" : "Dillard",
"lastname" : "Mcpherson",
"age" : 34,
"gender" : "F",
"address" : "702 Quentin Street",
"employer" : "Quailcom",
"email" : "dillardmcpherson@quailcom.com",
"city" : "Veguita",
"state" : "IN"
}
},
{
"_index" : "newbank",
"_type" : "_doc",
"_id" : "37",
"_score" : 1.0,
"_source" : {
"account_number" : 37,
"balance" : 18612,
"firstname" : "Mcgee",
"lastname" : "Mooney",
"age" : 39,
"gender" : "M",
"address" : "826 Fillmore Place",
"employer" : "Reversus",
"email" : "mcgeemooney@reversus.com",
"city" : "Tooleville",
"state" : "OK"
}
},
{
"_index" : "newbank",
"_type" : "_doc",
"_id" : "44",
"_score" : 1.0,
"_source" : {
"account_number" : 44,
"balance" : 34487,
"firstname" : "Aurelia",
"lastname" : "Harding",
"age" : 37,
"gender" : "M",
"address" : "502 Baycliff Terrace",
"employer" : "Orbalix",
"email" : "aureliaharding@orbalix.com",
"city" : "Yardville",
"state" : "DE"
}
},
{
"_index" : "newbank",
"_type" : "_doc",
"_id" : "49",
"_score" : 1.0,
"_source" : {
"account_number" : 49,
"balance" : 29104,
"firstname" : "Fulton",
"lastname" : "Holt",
"age" : 23,
"gender" : "F",
"address" : "451 Humboldt Street",
"employer" : "Anocha",
"email" : "fultonholt@anocha.com",
"city" : "Sunriver",
"state" : "RI"
}
}
]
}
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
至此,数据迁移成功!并且我们也学会了如何指定映射规则。