First, create a `blogs` index and add two documents:
```
PUT /blogs/_doc/1
{
  "title": "Quick brown rabbits",
  "body": "Brown rabbits are commonly seen."
}

PUT /blogs/_doc/2
{
  "title": "Keeping pets healthy",
  "body": "My quick brown fox eats rabbits on a regular basis."
}
```
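Consider the DSL query below. We want to retrieve documents that contain `Brown fox` in the `title` and `body` fields. In document 1, both `title` and `body` match only the term `brown`. In document 2, `title` matches no term, while `body` matches both `brown` and `fox`. By this analysis, document 2 should be the more relevant result and should receive the higher relevance score.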
```
POST /blogs/_search
{
  "explain": true, // show how the score is computed
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body": "Brown fox" }}
      ]
    }
  }
}
```
In practice, however, document 1 receives the higher score. The response is:
```
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.90425634,
    "hits": [
      {
        "_index": "blogs",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.90425634,
        "_source": {
          "title": "Quick brown rabbits",
          "body": "Brown rabbits are commonly seen."
        }
      },
      {
        "_index": "blogs",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.77041256,
        "_source": {
          "title": "Keeping pets healthy",
          "body": "My quick brown fox eats rabbits on a regular basis."
        }
      }
    ]
  }
}
```
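Document 1 scores higher because of how `bool` `should` computes its score: it scores each of the two queries in the `should` clause and adds up the scores of those that match. Older Elasticsearch releases, which used the classic TF/IDF similarity, then multiplied that sum by the number of matching clauses and divided it by the total number of clauses; releases that default to BM25 no longer apply that step. Either way, document 1 matches in both fields, so its two modest scores add up to more than document 2's single, stronger `body` match.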
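The summation can be checked with a small Python sketch. The per-clause scores are assumptions: they do not appear in the response above and are inferred from the score totals reported in this article, and the helper name `bool_should_score` is hypothetical.

```python
# Per-clause scores for the query "Brown fox".
# Assumption: values inferred from the totals reported in this article,
# not copied from an _explanation section.
clause_scores = {
    "1": {"title": 0.6931472, "body": 0.21110914},  # "brown" matches in both fields
    "2": {"title": 0.0, "body": 0.77041256},        # "brown" and "fox" match in body only
}


def bool_should_score(scores):
    """Sum the scores of the should clauses that matched (score > 0)."""
    return sum(s for s in scores.values() if s > 0)


# Rank documents by the summed score, highest first.
ranking = sorted(
    clause_scores,
    key=lambda doc_id: bool_should_score(clause_scores[doc_id]),
    reverse=True,
)

for doc_id in ranking:
    print(doc_id, round(bool_should_score(clause_scores[doc_id]), 6))
```

Under these assumed inputs, document 1 sorts first even though no single field of it matches as well as the `body` of document 2.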
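Clearly, `title` and `body` compete with each other in this example, so their scores should not simply be added together; the score of the single best-matching field should be used instead. Elasticsearch provides the Disjunction Max Query (`dis_max`) for this purpose: it returns every document that matches at least one of its queries and uses the score of the best-matching query as the document's final score.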
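Returning to the example above: when searching for `Brown fox`, document 1 matches only the single term `brown` in each of `title` and `body`, while document 2 matches two terms in `body`. Document 2's best single-field score is therefore higher than document 1's, so document 2 gets the higher relevance score. An example DSL query: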
```
POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body": "Brown fox" }}
      ]
    }
  }
}

// Response
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.77041256,
    "hits": [
      {
        "_index": "blogs",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.77041256,
        "_source": {
          "title": "Keeping pets healthy",
          "body": "My quick brown fox eats rabbits on a regular basis."
        }
      },
      {
        "_index": "blogs",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.6931472,
        "_source": {
          "title": "Quick brown rabbits",
          "body": "Brown rabbits are commonly seen."
        }
      }
    ]
  }
}
```
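The best-field rule behind this ranking can be sketched in Python as taking the maximum over the clause scores. As before, the per-clause values are assumptions inferred from the totals reported in this article, and `dis_max_score` is a hypothetical helper, not part of any Elasticsearch client.

```python
# Per-clause scores for the query "Brown fox".
# Assumption: values inferred from the totals reported in this article.
clause_scores = {
    "1": {"title": 0.6931472, "body": 0.21110914},
    "2": {"title": 0.0, "body": 0.77041256},
}


def dis_max_score(scores):
    """Return the score of the single best-matching clause."""
    return max(scores.values())


# Rank documents by their best single-field score, highest first.
ranking = sorted(
    clause_scores,
    key=lambda doc_id: dis_max_score(clause_scores[doc_id]),
    reverse=True,
)

for doc_id in ranking:
    print(doc_id, dis_max_score(clause_scores[doc_id]))
```

With the maximum in place of a sum, the weaker `body` match of document 1 no longer contributes, and document 2 moves to the top.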
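However, using the `dis_max` query exactly as shown above has problems of its own. For the query below, document 2 looks like the better match (`pets` matches its `title` and `quick` matches its `body`, whereas document 1 matches only `quick` in its `title`) and should score higher, yet the two documents actually receive identical scores. This is nevertheless consistent with the best-field scoring rule of `dis_max`, which considers only the highest-scoring clause.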
```
POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Quick pets" }},
        { "match": { "body": "Quick pets" }}
      ]
    }
  }
}

// Response
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "blogs",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.6931472,
        "_source": {
          "title": "Quick brown rabbits",
          "body": "Brown rabbits are commonly seen."
        }
      },
      {
        "_index": "blogs",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.6931472,
        "_source": {
          "title": "Keeping pets healthy",
          "body": "My quick brown fox eats rabbits on a regular basis."
        }
      }
    ]
  }
}
```
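To avoid this problem, tune the `tie_breaker` parameter. When it is set, the score of every matching clause other than the best one is multiplied by `tie_breaker`, and the results are added to the score of the best-matching clause. `tie_breaker` is a floating-point number between 0 and 1: 0 (the default) uses the best match only, while 1 treats all clauses as equally important. An example DSL query: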
```
POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Quick pets" }},
        { "match": { "body": "Quick pets" }}
      ],
      "tie_breaker": 0.2
    }
  }
}

// Response
{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.8151411,
    "hits": [
      {
        "_index": "blogs",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.8151411,
        "_source": {
          "title": "Keeping pets healthy",
          "body": "My quick brown fox eats rabbits on a regular basis."
        }
      },
      {
        "_index": "blogs",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.6931472,
        "_source": {
          "title": "Quick brown rabbits",
          "body": "Brown rabbits are commonly seen."
        }
      }
    ]
  }
}
```
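The effect of `tie_breaker` on these two documents can be reproduced with a short Python sketch. The per-clause scores for `Quick pets` are assumptions: the `body` score of document 2 is back-calculated from the 0.8151411 total above, and `dis_max_with_tie_breaker` is a hypothetical helper that only mirrors the arithmetic described in this section.

```python
# Per-clause scores for the query "Quick pets".
# Assumption: values inferred from the totals reported in this article;
# 0.6099695 is back-calculated as (0.8151411 - 0.6931472) / 0.2.
clause_scores = {
    "1": [0.6931472, 0.0],        # title matches "quick", body matches nothing
    "2": [0.6931472, 0.6099695],  # title matches "pets", body matches "quick"
}


def dis_max_with_tie_breaker(scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times every other matching score."""
    matched = [s for s in scores if s > 0]
    if not matched:
        return 0.0
    best = max(matched)
    return best + tie_breaker * (sum(matched) - best)


for tb in (0.0, 0.2, 1.0):
    doc1 = dis_max_with_tie_breaker(clause_scores["1"], tb)
    doc2 = dis_max_with_tie_breaker(clause_scores["2"], tb)
    print(f"tie_breaker={tb}: doc1={doc1:.7f} doc2={doc2:.7f}")
```

With `tie_breaker` at 0 the two documents tie, as in the earlier response; any positive value lets the second matching field of document 2 break the tie, and a value of 1 degenerates into a plain sum of the clause scores.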