ElasticSearch-Dis Max Query

We start by creating a blogs index and adding two documents to it:

PUT /blogs/_doc/1
{
  "title": "Quick brown rabbits",
  "body": "Brown rabbits are commonly seen."
}

PUT /blogs/_doc/2
{
  "title": "Keeping pets healthy",
  "body": "My quick brown fox eats rabbits on a regular basis."
}

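Consider the DSL query below. We want to retrieve documents whose title or body field contains Brown fox. In document 1, the title and the body each match only the term brown. In document 2, the title matches no terms, but the body matches both brown and fox. By this analysis, document 2 is the more relevant of the two and should receive the higher relevance score.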

POST /blogs/_search
{
  "explain": true, // show how the score is computed
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body": "Brown fox" }}
      ]
    }
  }
}

However, the actual scoring gives document 1 the higher score. The response is as follows:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.90425634,
    "hits" : [
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.90425634,
        "_source" : {
          "title" : "Quick brown rabbits",
          "body" : "Brown rabbits are commonly seen."
        }
      },
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.77041256,
        "_source" : {
          "title" : "Keeping pets healthy",
          "body" : "My quick brown fox eats rabbits on a regular basis."
        }
      }
    ]
  }
}

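The reason document 1 scores higher lies in how bool should computes its score: it scores each of the two queries in the should clause and adds those scores together. Older descriptions of this algorithm list two further steps, multiplying the sum by the number of matching clauses and dividing by the total number of clauses, but the scores in the response above are consistent with a plain sum. Document 1 matches brown in both fields, so two clause scores are added; document 2 matches in body only, so it gets a single clause score, and that single score is lower than document 1's sum.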

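Clearly, title and body compete with each other in this example, so their scores should not simply be added together; instead, the score of the single best-matching field should be used. ElasticSearch provides the Disjunction Max Query for this: it returns any document that matches any of its queries, and uses the score of the best-matching query as the document's final score.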

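Taking the same example: when searching for Brown fox, the title and body of document 1 each match only the single term brown, while the body of document 2 matches both terms. The best score of document 2 is therefore higher than the best score of document 1, so document 2 gets the higher relevance score. An example DSL query follows: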

POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body": "Brown fox" }}
      ]
    }
  }
}

// Response
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.77041256,
    "hits" : [
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.77041256,
        "_source" : {
          "title" : "Keeping pets healthy",
          "body" : "My quick brown fox eats rabbits on a regular basis."
        }
      },
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "title" : "Quick brown rabbits",
          "body" : "Brown rabbits are commonly seen."
        }
      }
    ]
  }
}
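
The two rankings for Brown fox can be reproduced with a little arithmetic. The Python sketch below is only a simplified model, not ElasticSearch's implementation. The per-clause scores are assumptions back-computed from the `_score` values in the two responses above: 0.21110914 is the bool should score of document 1 minus its dis_max score, and attributing 0.6931472 to title rather than body is a guess (the comparison holds either way). The function names are hypothetical.

```python
# Per-clause scores are ASSUMPTIONS inferred from the reported _score values,
# not values read from `explain` output. 0.0 stands for "clause did not match".
clause_scores = {
    "1": {"title": 0.6931472, "body": 0.21110914},
    "2": {"title": 0.0, "body": 0.77041256},
}


def bool_should_score(scores):
    """Model of bool/should as reflected in the response: add all clause scores."""
    return sum(scores.values())


def dis_max_score(scores):
    """Model of dis_max without tie_breaker: keep only the best clause score."""
    return max(scores.values())


def rank(score_fn):
    """Return document ids ordered from highest to lowest score."""
    return sorted(
        clause_scores,
        key=lambda doc_id: score_fn(clause_scores[doc_id]),
        reverse=True,
    )


for doc_id, scores in clause_scores.items():
    print(doc_id, bool_should_score(scores), dis_max_score(scores))

print("bool should ranking:", rank(bool_should_score))
print("dis_max ranking:", rank(dis_max_score))
```

Under these assumed clause scores, summing puts document 1 first because its two weak matches add up, while taking the maximum puts document 2 first because its single body match is the strongest individual clause.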

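However, using the dis-max query as plainly as in the example above has problems of its own. Take the query below: by analysis, document 2 fits the query better (its title matches pets and its body matches quick, while document 1 matches only quick in its title) and should score higher, yet the two documents actually receive the same score. This is nevertheless consistent with the dis-max query's rule of scoring by the best field only.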

POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Quick pets" }},
        { "match": { "body": "Quick pets" }}
      ]
    }
  }
}

// Response
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "title" : "Quick brown rabbits",
          "body" : "Brown rabbits are commonly seen."
        }
      },
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.6931472,
        "_source" : {
          "title" : "Keeping pets healthy",
          "body" : "My quick brown fox eats rabbits on a regular basis."
        }
      }
    ]
  }
}

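To avoid this problem, the query can be tuned with the tie_breaker parameter. When it is set, the scores of the matching clauses other than the best one are multiplied by tie_breaker and then added to the score of the best-matching clause to give the final score. tie_breaker is a floating-point number between 0 and 1: 0 means only the best match is used, while 1 means all clauses are equally important. An example DSL query follows: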

POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Quick pets" }},
        { "match": { "body": "Quick pets" }}
      ],
      "tie_breaker": 0.2
    }
  }
}

// Response
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.8151411,
    "hits" : [
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.8151411,
        "_source" : {
          "title" : "Keeping pets healthy",
          "body" : "My quick brown fox eats rabbits on a regular basis."
        }
      },
      {
        "_index" : "blogs",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6931472,
        "_source" : {
          "title" : "Quick brown rabbits",
          "body" : "Brown rabbits are commonly seen."
        }
      }
    ]
  }
}
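
The effect of tie_breaker on the Quick pets results can be checked with a short calculation. The Python sketch below is a simplified model, assuming the final score is the best clause score plus tie_breaker times the sum of the other matching clause scores. The function name `dis_max_with_tie_breaker` is hypothetical, and the per-clause scores are assumptions: 0.6931472 is the score each document receives without tie_breaker, and 0.6099695 for the body clause of document 2 is back-computed from the reported 0.8151411 rather than read from `explain` output.

```python
def dis_max_with_tie_breaker(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the other matching clause scores."""
    matching = [score for score in clause_scores if score > 0]
    if not matching:
        return 0.0
    best = max(matching)
    others = sum(matching) - best
    return best + tie_breaker * others


# ASSUMED per-clause scores as [title, body]; 0.0 means the clause did not match.
doc1 = [0.6931472, 0.0]        # title matches "quick", body matches nothing
doc2 = [0.6931472, 0.6099695]  # title matches "pets", body matches "quick"

for tie_breaker in (0.0, 0.2, 1.0):
    print(
        tie_breaker,
        dis_max_with_tie_breaker(doc1, tie_breaker),
        dis_max_with_tie_breaker(doc2, tie_breaker),
    )
```

With tie_breaker set to 0 the two documents tie, as in the earlier response. Any positive value lets the second matching field of document 2 contribute, which is what moves it ahead of document 1, whose score stays unchanged because it has only one matching clause.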