Elasticsearch-Query DSL

# 前言

前面了解了ES的两种搜索方式，第二种方式我们称为QUERY DSL，就是查询领域对象语言。

# 基本语法格式

Elasticsearch提供了一个可以执行查询的Json风格的DSL（Domain-specific language 领域对象语言）。这个被称为Query Dsl。该查询语言非常全面，并且刚开始的有事感觉有点复杂，真正学好它的方法是从一些基础实例开始的。

# 一个查询语句的典型结构

{
    QUERY_NAME:{
        ARGUMENT:VALUE,
        ARGUMENT:VALUE,...
    }
}

1
2
3
4
5
6

# 查询全部

GET bank/_search
{
  "query": {
    "match_all": {}
  }
}

1
2
3
4
5
6

# 排序

# 结构

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "FIELD": {
        "order": "desc"
      }
    }
  ]
}

1
2
3
4
5
6
7
8
9
10
11
12
13

# 实例

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "balance": {
        "order": "desc"
      }
    }
  ]
}

1
2
3
4
5
6
7
8
9
10
11
12
13

# 简写实例

GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "balance": "desc"
    }
  ]
}

1
2
3
4
5
6
7
8
9
10
11

# 查询某个字段

如果是针对某个字段，那么它的机构如下:

{
    QUERY_NAME:{
        FIELD_NAME:{
            ARGUMENT:VALUE,
	        ARGUMENT:VALUE,...
        }
    }
}

1
2
3
4
5
6
7
8

# 分页查询

GET bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 10,
  "size": 10
}

1
2
3
4
5
6
7
8
9

form是从第几条开始查
size每次查询多少条记录

# 返回部分字段

使用_source来决定返回什么字段

# 语法格式

GET bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 0,
  "size": 10,
  "_source": "{field}"
}

1
2
3
4
5
6
7
8
9
10

# 实例

GET bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 0,
  "size": 10,
  "_source": "balance"
}

1
2
3
4
5
6
7
8
9
10

# 多个字段

如果是多个字段，用中括号

GET bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 0,
  "size": 10,
  "_source": ["balance","account_number"]
}

1
2
3
4
5
6
7
8
9
10

# match全文检索

# 语法格式

GET bank/_search
{
  "query": {
    "match": {
      "FIELD": "TEXT"
    }
  }
}

1
2
3
4
5
6
7
8

# 实例-模糊查询

# 单个关键词

请求

GET bank/_search
{
  "query": {
    "match": {
      "address": "Kings"
    }
  }
}

1
2
3
4
5
6
7
8

{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 5.990829,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "20",
        "_score" : 5.990829,
        "_source" : {
          "account_number" : 20,
          "balance" : 16418,
          "firstname" : "Elinor",
          "lastname" : "Ratliff",
          "age" : 36,
          "gender" : "M",
          "address" : "282 Kings Place",
          "employer" : "Scentric",
          "email" : "elinorratliff@scentric.com",
          "city" : "Ribera",
          "state" : "WA"
        }
      },
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "722",
        "_score" : 5.990829,
        "_source" : {
          "account_number" : 722,
          "balance" : 27256,
          "firstname" : "Roberts",
          "lastname" : "Beasley",
          "age" : 34,
          "gender" : "F",
          "address" : "305 Kings Hwy",
          "employer" : "Quintity",
          "email" : "robertsbeasley@quintity.com",
          "city" : "Hayden",
          "state" : "PA"
        }
      }
    ]
  }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58

返回结果出现了2条记录，地址包含了Kings,这种查询，称为全文检索。

2条记录得分一模一样，因为分词得分都一样。

# 多个关键词

假设要查询的关键词有2个

GET bank/_search
{
  "query": {
    "match": {
      "address": "mill lane"
    }
  }
}

1
2
3
4
5
6
7
8

会发现出现了4条记录，只要地址的词包含任意一个关键词mill或者lane，都会出现在结果中，而命中2个词的会排在前面，得分也就更高。

# 总结

全文检索按照评分进行排序，会对检索条件进行分词匹配。

# match_phrase短语匹配

将需要匹配的值当成一个整体单词（不分词）进行检索。只要包含这个短语的，都算符合条件。

比如像上面的例子，稍微修改一下。

GET bank/_search
{
  "query": {
    "match_phrase": {
      "address": "mill lane"
    }
  }
}

1
2
3
4
5
6
7
8

{
  "took" : 13,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 9.507477,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "136",
        "_score" : 9.507477,
        "_source" : {
          "account_number" : 136,
          "balance" : 45801,
          "firstname" : "Winnie",
          "lastname" : "Holland",
          "age" : 38,
          "gender" : "M",
          "address" : "198 Mill Lane",
          "employer" : "Neteria",
          "email" : "winnieholland@neteria.com",
          "city" : "Urie",
          "state" : "IL"
        }
      }
    ]
  }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39

结果只返回1条记录，就是刚才匹配查询的第一条记录。

# .keyword 关键词

我们前面在match全文检索，每个text文本字段都可以使用.keyword，其实这种效果查出来，跟match_phrase短语匹配的效果是一样的。

match_phrase短语匹配

GET bank/_search
{
  "query": {
    "match_phrase": {
      "address": "789 Madison Street"
    }
  }
}

1
2
3
4
5
6
7
8

match全文检索，.keyword匹配

GET bank/_search
{
  "query": {
    "match": {
      "address.keyword": "789 Madison"
    }
  }
}

1
2
3
4
5
6
7
8

通过这2个实例，发现数据都一样，那他们的区别是什么？

得分不一样，match_phrase的得分是12.93分，而.keyword是6.5分，但是得分好像不能判断他们之间的区别，那么我们再改一下关键词。

match_phrase短语匹配

GET bank/_search
{
  "query": {
    "match_phrase": {
      "address": "789 Madison"
    }
  }
}

1
2
3
4
5
6
7
8

match全文检索，.keyword匹配

GET bank/_search
{
  "query": {
    "match": {
      "address.keyword": "789 Madison"
    }
  }
}

1
2
3
4
5
6
7
8

可以发现，match_phrase是能搜到1条记录，但是.keyword是搜不到任何记录，这是因为match_phrase是短语匹配，只要包含这个短语都是符合条件的，而.keyword是精确匹配，只有address精确等于789 Madison，才算是符合条件。

# multi_match多字段匹配

比如，我有一个需求，要state或者address字段包含mill的数据。

GET bank/_search
{
  "query": {
    "multi_match": {
      "query": "mill",
      "fields": ["state","address"]
    }
  }
}

1
2
3
4
5
6
7
8
9

多字段匹配时，是否会分词？

GET bank/_search
{
  "query": {
    "multi_match": {
      "query": "mill movico",
      "fields": ["city","address"]
    }
  }
}

1
2
3
4
5
6
7
8
9

通过返回的结果，是会分词的，只要2个字段任意包含2个词，都会返回。

# range区间查询

# 语法格式

GET bank/_search
{
  "query": {"range": {
    "FIELD": {
      "gte": 10,
      "lte": 20
    }
  }}
}

1
2
3
4
5
6
7
8
9

# bool复合查询

# must必须满足

GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "gender": "M"
          }
        },
        {
          "match": {
            "address": "mill"
          }
        }
      ]
    }
  }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

可以看得到，返回3条记录都是符合字段要求的

# must not必须不……

GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "gender": "M"
          }
        },
        {
          "match": {
            "address": "mill"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "age": "38"
          }
        }
      ]
    }
  }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26

在原有基础上加多一个条件，年龄限制为38，返回结果就成了上一个例子的其中1条。

# should应该

翻译过来就是，如果满足这个条件最好，但是如果不满足也可以。

GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "gender": "M"
          }
        },
        {
          "match": {
            "address": "mill"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "age": "18"
          }
        }
      ],
      "should": [
        {"match": {
          "lastname": "Wallace"
        }}
      ]
    }
  }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31

通过返回结果，可以看出，should，满足了条件的数据得分最高。

# 总结

must，should，条件符合时，都会提供相关性得分。

但是 must not，会被当成一个过滤器，过滤器最大作用是不会贡献得分

# filter结果过滤

filter，作为过滤器，是不会贡献得分。

比如以前的must查询

GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        {"range": {
          "age": {
            "gte": 18,
            "lte": 30
          }
        }}
      ]
    }
  }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

会返回一条记录，并且是1分，最高得分也是1分。

改成filter过滤

GET bank/_search
{
  "query": {
    "bool": {
      "filter": [
        {"range": {
          "age": {
            "gte": 18,
            "lte": 30
          }
        }}
      ]
    }
  }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

会有498条记录，并且都是0分，最高得分也是0分。

在bool复合查询中，也可以加入filter

# term

term和match 是最常用的两个查询。

# Term query

Returns documents that contain an exact term in a provided field.

You can use the term query to find documents based on a precise value such as a price, a product ID, or a username.

Avoid using the term query for text (opens new window) fields.

By default, Elasticsearch changes the values of text fields as part of analysis (opens new window). This can make finding exact matches for text field values difficult.

To search text field values, use the match (opens new window) query instead.

term也是一个查询条件，文本信息（全文检索）不推荐用term，建议使用match，精确的字段建议使用term。

是因为ES保存text字段时，存在数据分析和分词的问题，要term来做全文检索是非常困难的。

例子：

GET bank/_search
{
  "query": {
    "term": { //match
      "address": "789 Madsion Street"
    }
  }
}

1
2
3
4
5
6
7
8

当用term来检索地址时，会一条记录都找不到，但是改为match时，就可以找得到386条记录。

正常情况，都是用于某个非text字段

GET bank/_search
{
  "query": {
    "term": {   //term改成match，结果一样
      "age": "28"
    }
  }
}

1
2
3
4
5
6
7
8

# 总结

全文检索的时候用match，其他非text字段匹配用term。
match会将搜索词拆解，进行模糊匹配。
term是代表完全匹配，也就是精确查询，搜索前不会再对搜索词进行分词拆解。

帮我改善此页面

#Elasticsearch #Query Dsl

上次更新: 2020/12/17, 08:38:35

← Elasticsearch-Search API Elasticsearch-aggregations聚合分析→