1. From / Size
默认情况下,查询按照相关度算分排序,返回前 10 条记录。From 指明开始位置,Size 指明期望获取⽂档的总数。例如:
1 2 3 4 5 6 7 8
| POST tmdb/_search { "from": 0, "size": 5, "query": { "match_all": { } } }
|
ES 天⽣就是分布式的。查询信息,但是数据分别保存在多个分⽚,多台机器上,ES 天⽣就需要满⾜排序的需要(按照相关性算分)。当⼀个查询: from = 990, size =10,会在每个分⽚上先都获取 1000 个⽂档。然后, 通过 Coordinating Node 聚合所有结果。最后再通过排序选取前 1000 个⽂档。⻚数越深,占⽤内存越多。为了避免深度分⻚带来的内存开销。ES 有⼀个设定,默认限定到 10000 个⽂档。因此到分页超过一万就会报错:
2. Search After
使用 Search After 避免深度分⻚的问题,可以实时获取下⼀⻚⽂档信息。但是局限是不⽀持指定⻚数(From),只能往下翻页。
第⼀步搜索需要指定 sort,并且保证值是唯⼀的 (可以通过加⼊ _id 保证唯⼀性),然后使⽤上⼀次,最后⼀个⽂档的 sort 值进⾏查询。
首先先添加如下几条记录:
1 2 3 4 5 6 7 8 9 10 11 12
| POST users/_doc {"name":"user1","age":10}
POST users/_doc {"name":"user2","age":11}
POST users/_doc {"name":"user2","age":12}
POST users/_doc {"name":"user2","age":13}
|
然后按照 age 和 _id 进行逆向排序,查询请求和响应如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
| // 查询第一个分页 POST users/_search { "size": 2, "query": { "match_all": {} }, "sort": [ {"age": "desc"} , {"_id": "asc"} ] } // 响应 { // ... "hits" : { "total" : { "value" : 4, "relation" : "eq" }, "max_score" : null, "hits" : [ { "_index" : "users", "_type" : "_doc", "_id" : "t2jl_YUB6tlQqDEVZC16", "_score" : null, "_source" : { "name" : "user2", "age" : 13 }, "sort" : [ 13, "t2jl_YUB6tlQqDEVZC16" ] }, { "_index" : "users", "_type" : "_doc", "_id" : "tmjl_YUB6tlQqDEVXC3S", "_score" : null, "_source" : { "name" : "user2", "age" : 12 }, "sort" : [ 12, "tmjl_YUB6tlQqDEVXC3S" ] } ] } }
|
接着就可以使用 age 为 12 及其 id 值往下第二页:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
| // 查询第二个分页 POST users/_search { "size": 2, "query": { "match_all": {} }, "search_after": [ 12, "tmjl_YUB6tlQqDEVXC3S" ], "sort": [ {"age": "desc"} , {"_id": "asc"} ] } // 响应 { // ... "hits" : { "total" : { "value" : 4, "relation" : "eq" }, "max_score" : null, "hits" : [ { "_index" : "users", "_type" : "_doc", "_id" : "tWjl_YUB6tlQqDEVVS2T", "_score" : null, "_source" : { "name" : "user2", "age" : 11 }, "sort" : [ 11, "tWjl_YUB6tlQqDEVVS2T" ] }, { "_index" : "users", "_type" : "_doc", "_id" : "tGjl_YUB6tlQqDEVTC1Z", "_score" : null, "_source" : { "name" : "user1", "age" : 10 }, "sort" : [ 10, "tGjl_YUB6tlQqDEVTC1Z" ] } ] } }
|
使用 Scroll API 会创建⼀个快照,有新的数据写⼊以后,⽆法被查到,每次查询后,输⼊上⼀次的 Scroll Id。我们还是沿用前面的 users 索引,首先我们先查询第一个分页并获取分页的 scroll_id,其中 scroll=5m
表示此次创建的快照过期时间为 5m:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
| POST /users/_search?scroll=5m { "size": 2, "query": { "match_all" : { } } } // 响应 { "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAqc0WdEJaaFFKcjJSdTZHSm41TUhFRmJLQQ==", "took" : 16, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 4, "relation" : "eq" }, "max_score" : 1.0, "hits" : [ { "_index" : "users", "_type" : "_doc", "_id" : "tGjl_YUB6tlQqDEVTC1Z", "_score" : 1.0, "_source" : { "name" : "user1", "age" : 10 } }, { "_index" : "users", "_type" : "_doc", "_id" : "tWjl_YUB6tlQqDEVVS2T", "_score" : 1.0, "_source" : { "name" : "user2", "age" : 11 } } ] } }
|
接着我们便可以使用第一个分页的 scroll_id 来获取第二个分页数据:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
| POST /_search/scroll { "scroll" : "1m", "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAqc0WdEJaaFFKcjJSdTZHSm41TUhFRmJLQQ==" } // 响应 { "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAqc0WdEJaaFFKcjJSdTZHSm41TUhFRmJLQQ==", "took" : 18, "timed_out" : false, "terminated_early" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 4, "relation" : "eq" }, "max_score" : 1.0, "hits" : [ { "_index" : "users", "_type" : "_doc", "_id" : "tmjl_YUB6tlQqDEVXC3S", "_score" : 1.0, "_source" : { "name" : "user2", "age" : 12 } }, { "_index" : "users", "_type" : "_doc", "_id" : "t2jl_YUB6tlQqDEVZC16", "_score" : 1.0, "_source" : { "name" : "user2", "age" : 13 } } ] } }
|
我们可以使用第二个分页的 scroll_id 查询第三个分页,可以发现响应为空:
1 2 3 4 5
| POST /_search/scroll { "scroll" : "1m", "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAqc0WdEJaaFFKcjJSdTZHSm41TUhFRmJLQQ==" }
|
即使我们再增加一个新的文档,此时响应依旧为空。
1 2
| POST users/_doc {"name":"user5","age":50}
|
4. 不同的搜索类型和使⽤场景
- Regular
- Scroll
- Pagination
- From 和 Size
- 如果需要深度分⻚,则选⽤ Search After