Elasticsearch Search Suggestions
Elasticsearch offers four main kinds of suggesters, each suited to a different use case:
- term suggester
- phrase suggester
- completion suggester
- context suggester
Reference: Suggester examples | Elastic Documentation
term suggester
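The term suggester works on individual terms. It ignores any relationships between the terms of a search phrase. It corrects spelling mistakes in user input. When a term the user typed does not exist in the index, it offers similar terms that do exist. This helps with queries that return no results because of letter case, typos, and similar problems.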
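How it works:
- Tokenization: the input text is split into individual terms.
- Candidate generation: for each term, the suggester looks up index terms that are within a small edit distance.
- Ranking and filtering: candidates are ranked by score (based on edit distance) and by their frequency in the index. Candidates that fail the configured thresholds are dropped.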
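The three steps above can be modeled in a few lines. This is a toy sketch, not Elasticsearch's implementation, and it rests on several assumptions:
- The "index" is a plain dict of term to document frequency.
- Tokenization is a lowercase whitespace split.
- Distance is plain Levenshtein.
- The score is 1 - distance / max(len(a), len(b)).

The helper names levenshtein and suggest_terms are hypothetical.

```python
"""Toy model of the term suggester pipeline: tokenize, generate, rank.

Assumptions (not Elasticsearch internals): the index is a dict of
term -> document frequency, tokenization is a lowercase whitespace split,
and score = 1 - distance / max(len(a), len(b)).
"""


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def suggest_terms(text, index, max_edits=2, size=3):
    """Return {input_term: [(candidate, score, doc_freq), ...]}."""
    result = {}
    for term in text.lower().split():                  # 1. tokenize
        candidates = []
        for cand, freq in index.items():               # 2. generate candidates
            dist = levenshtein(term, cand)
            if 0 < dist <= max_edits:
                score = 1 - dist / max(len(term), len(cand))
                candidates.append((cand, score, freq))
        # 3. rank: score desc, then doc freq desc, then the term itself
        candidates.sort(key=lambda c: (-c[1], -c[2], c[0]))
        result[term] = candidates[:size]
    return result


if __name__ == "__main__":
    index = {"music": 120, "garage": 45, "game": 300, "gate": 80, "muse": 10}
    for term, cands in suggest_terms("musi gare", index).items():
        print(term, [c[0] for c in cands])
```

The tie between "game" and "gate" (both one edit away from "gare") is broken by document frequency. This mirrors the default "score, then frequency" ordering described in the parameter table.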
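Parameters
Parameter | Description |
---|---|
text | The text the user searched for |
field | The field from which suggestions are taken |
size | Maximum number of suggestions returned per input term |
sort | How suggestions are sorted per input term: score (default) sorts by score, then document frequency, then the term itself; frequency sorts by document frequency, then score, then the term itself |
suggest_mode | Which suggestions are returned: missing (default) only generates suggestions for terms that are not in the index; popular only suggests candidates that occur in more documents than the original term; always suggests any matching candidate |
max_edits | Maximum edit distance a candidate may have; only 1 or 2 are allowed (default 2) |
prefix_length | Number of leading characters that must match for a term to become a candidate (default 1) |
min_word_length | Minimum length, in characters, an input term must have to be considered (default 4) |
min_doc_freq | Minimum document frequency a candidate must have; filters out rare terms |
max_term_freq | Maximum document frequency of an input term; more frequent input terms are assumed to be spelled correctly and get no suggestions |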
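The table's filters combine into two decisions. The first is whether an input term is corrected at all. The second is whether a given candidate survives. The sketch below models both. It is a hypothetical illustration, not Elasticsearch code, and it rests on a few assumptions:
- doc_freq is a dict of term to document frequency.
- num_docs is the total document count.
- Values below 1 for min_doc_freq and max_term_freq are read as fractions of num_docs.

The function names are made up.

```python
"""Hypothetical model of how the term-suggester filters interact.

Assumptions: doc_freq maps term -> document frequency, num_docs is the
total document count, and threshold values below 1 are fractions of num_docs.
"""


def threshold(value, num_docs):
    """Values below 1 are treated as a fraction of the document count."""
    return value * num_docs if value < 1 else value


def should_suggest(term, doc_freq, num_docs, suggest_mode="missing",
                   min_word_length=4, max_term_freq=0.01):
    """Decide whether an input term gets suggestions at all."""
    if len(term) < min_word_length:
        return False                      # too short to be considered
    freq = doc_freq.get(term, 0)
    if freq > threshold(max_term_freq, num_docs):
        return False                      # common term: assume correct spelling
    if suggest_mode == "missing":
        return freq == 0                  # only terms absent from the index
    return True                           # "popular" and "always" go on


def keep_candidate(cand, term, doc_freq, num_docs, suggest_mode="missing",
                   prefix_length=1, min_doc_freq=0):
    """Decide whether a generated candidate survives filtering."""
    if cand[:prefix_length] != term[:prefix_length]:
        return False                      # leading characters must match
    cand_freq = doc_freq.get(cand, 0)
    if cand_freq < threshold(min_doc_freq, num_docs):
        return False                      # candidate too rare
    if suggest_mode == "popular":
        return cand_freq > doc_freq.get(term, 0)
    return True
```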
POST /my_index/_search
{
"suggest": {
"my_term_suggestion": {
"text": "musi gare",
"term": {
"field": "text",
"suggest_mode": "popular",
"size": 3,
"max_edits": 2
}
}
}
}
phrase suggester
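The term suggester suggests or corrects individual terms and ignores the relationships between them. The phrase suggester builds on the term suggester and also takes the relationships between terms into account. As a result, it selects whole corrected phrases rather than isolated tokens.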
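To make "takes the relationships between terms into account" concrete, here is a toy sketch. It combines per-term candidates into phrases and scores each phrase with a bigram language model. The smoothing is Stupid Backoff, which the Elasticsearch reference lists as the phrase suggester's default model (discount 0.4).

This is only an illustration under assumptions:
- The corpus is hypothetical.
- The model uses bigrams only.
- The names train, stupid_backoff, and best_phrase are made up.

The real suggester works from shingle statistics in the index and also weighs an error model, which is omitted here.

```python
"""Toy phrase scoring: pick the candidate combination that reads most
naturally under a bigram model with Stupid Backoff smoothing.

Assumptions: tiny in-memory corpus, bigrams only, no error model.
"""
from collections import Counter
from itertools import product


def train(corpus):
    """Count unigrams and bigrams in a list of sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def stupid_backoff(tokens, unigrams, bigrams, discount=0.4):
    """Score a phrase; unseen bigrams back off to a discounted unigram."""
    total = sum(unigrams.values())
    score = unigrams[tokens[0]] / total
    for prev, cur in zip(tokens, tokens[1:]):
        if bigrams[(prev, cur)]:
            score *= bigrams[(prev, cur)] / unigrams[prev]
        else:
            score *= discount * unigrams[cur] / total
    return score


def best_phrase(candidates_per_term, unigrams, bigrams):
    """candidates_per_term holds one candidate list per input position."""
    return max(product(*candidates_per_term),
               key=lambda phrase: stupid_backoff(phrase, unigrams, bigrams))


if __name__ == "__main__":
    uni, bi = train(["lucene and elasticsearch", "lucene in action",
                     "elasticsearch and kibana"])
    print(best_phrase([["lucene", "lucent"], ["and"],
                       ["elasticsearch", "elastic"]], uni, bi))
```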
The phrase suggester needs a dedicated mapping, typically a subfield analyzed with a shingle filter:
PUT test
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0,
"analysis": {
"analyzer": {
"trigram": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"shingle"
]
}
},
"filter": {
"shingle": {
"type": "shingle",
"min_shingle_size": 2,
"max_shingle_size": 3
}
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"fields": {
"trigram": {
"type": "text",
"analyzer": "trigram"
}
}
}
}
}
}
shingle: builds n-grams at the word (token) level rather than at the character level.
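As a rough model of the trigram analyzer in the mapping above, the sketch below lowercases the text and emits unigrams plus 2- and 3-word shingles. The shingle settings (sizes 2 to 3) come from that mapping. Two parts are assumptions:
- A whitespace split stands in for the standard tokenizer (adequate for plain ASCII text).
- Unigrams are kept, which is the shingle filter's documented default (output_unigrams).

The function names are hypothetical.

```python
def shingles(tokens, min_size=2, max_size=3, output_unigrams=True):
    """Emit word-level n-grams position by position, like a shingle filter."""
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):            # stop at the end of the text
                out.append(" ".join(tokens[i:i + n]))
    return out


def trigram_analyzer(text):
    """Approximation of the 'trigram' analyzer: lowercase, split, shingle."""
    return shingles(text.lower().split())


if __name__ == "__main__":
    print(trigram_analyzer("Lucene and Elasticsearch"))
```

Indexing these shingles is what lets the phrase suggester compare how often neighboring words actually appear together.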
GET test/_search
{
"suggest": {
"text": "Luceen and elasticsearchc",
"simple_phrase": {
"phrase": {
"field": "title.trigram",
"max_errors": 1,
"confidence": 1,
"direct_generator": [
{
"field": "title.trigram",
"suggest_mode": "always"
}
],
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}
The suggested phrases do not necessarily correspond to documents that actually exist in the index.
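The phrase suggester's collate option addresses this. It runs a query for each suggestion and prunes suggestions that match no documents. The sketch below mimics that pruning under one assumption: the collate query requires every term of the suggestion to appear in the same document. The collate function and the sample documents are hypothetical.

```python
def collate(suggestions, documents):
    """Keep only suggestions whose terms all occur together in some document."""
    doc_terms = [set(doc.lower().split()) for doc in documents]
    return [s for s in suggestions
            if any(set(s.lower().split()) <= terms for terms in doc_terms)]


if __name__ == "__main__":
    docs = ["Lucene and Elasticsearch in practice", "Kibana dashboards"]
    print(collate(["lucene and elasticsearch", "lucene and kibana"], docs))
```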
completion suggester
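The completion suggester does not search the inverted index. Instead, it uses an in-memory FST (finite state transducer), which makes it very fast. It requires a field mapped with the dedicated completion type, and it only supports prefix-based suggestions.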
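An FST is too involved to reproduce here. As a plainly named substitute, the sketch below uses a sorted list searched with bisect. All keys sharing a prefix are contiguous in sorted order, so one binary search finds the start of the matching range. Matches are ranked by weight, similar to the optional per-input weight of a completion field.

The class name PrefixSuggester and the sample weights are hypothetical.

```python
import bisect


class PrefixSuggester:
    """Sorted-array stand-in for the completion suggester's FST."""

    def __init__(self, entries):
        # entries: iterable of (input, weight); keep the highest weight per input
        best = {}
        for text, weight in entries:
            best[text] = max(weight, best.get(text, weight))
        self.keys = sorted(best)
        self.weights = best

    def suggest(self, prefix, size=5):
        # binary search to the first key >= prefix, then scan the prefix range
        start = bisect.bisect_left(self.keys, prefix)
        hits = []
        for key in self.keys[start:]:
            if not key.startswith(prefix):
                break
            hits.append(key)
        hits.sort(key=lambda k: (-self.weights[k], k))  # heaviest first
        return hits[:size]


if __name__ == "__main__":
    s = PrefixSuggester([("宝马5系", 10), ("宝马5系旅行版", 3),
                         ("宝马3系", 8), ("奔驰E级", 9)])
    print(s.suggest("宝马5"))
```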
PUT suggest_carinfo
{
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "ik_max_word",
"fields": {
"suggest": {
"type": "completion",
"analyzer": "ik_max_word"
}
}
},
"content": {
"type": "text",
"analyzer": "ik_max_word"
}
}
}
}
GET suggest_carinfo/_search
{
"suggest": {
"car_suggest": {
"prefix": "宝马5系",
"completion": {
"field": "title.suggest",
"skip_duplicates": true,
"fuzzy": {
"fuzziness": 2
}
}
}
}
}
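The query above combines fuzzy prefix matching with skip_duplicates. The sketch below models both. An input matches if the typed prefix is within fuzziness edits of some prefix of the input. The first prefix_length characters must match exactly; 1 is the default the Elasticsearch reference gives for fuzzy completion. Duplicate suggestion texts are dropped.

This is an approximation with several assumptions:
- It uses plain Levenshtein, with no transpositions.
- It ranks only by distance.
- The function names are hypothetical.

```python
def prefix_distance(query, text):
    """Minimum edit distance between query and any prefix of text."""
    prev = list(range(len(text) + 1))
    for i, qc in enumerate(query, start=1):
        cur = [i]
        for j, tc in enumerate(text, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (qc != tc)))
        prev = cur
    return min(prev)  # last row: distance from query to each text[:j]


def fuzzy_complete(prefix, inputs, fuzziness=2, prefix_length=1,
                   skip_duplicates=True):
    hits, seen = [], set()
    for text in inputs:
        if text[:prefix_length] != prefix[:prefix_length]:
            continue  # leading characters are not fuzzed
        dist = prefix_distance(prefix, text)
        if dist > fuzziness:
            continue
        if skip_duplicates and text in seen:
            continue
        seen.add(text)
        hits.append((dist, text))
    hits.sort(key=lambda h: h[0])  # stable sort: closer prefixes first
    return [text for _, text in hits]


if __name__ == "__main__":
    print(fuzzy_complete("宝马5系", ["宝马3系", "宝马5系", "宝马5系", "奔驰E级"]))
```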
context suggester
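The context suggester extends the completion suggester. At index time, it stores context information (a category or a geo location) alongside each suggestion entry. At query time, it uses those contexts to filter suggestions or influence their ranking.
Example usage of the context suggester:
// Define the index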
PUT place
{
"mappings": {
"properties": {
"suggest": {
"type": "completion",
"contexts": [
{
"name": "place_type",
"type": "category"
},
{
"name": "location",
"type": "geo",
"precision": 4
}
]
}
}
}
}
// Index documents
PUT place/_doc/1
{
"suggest": {
"input": ["timmy's", "starbucks", "dunkin donuts"],
"contexts": {
"place_type": ["cafe", "food"]
}
}
}
PUT place/_doc/2
{
"suggest": {
"input": ["monkey", "timmy's", "Lamborghini"],
"contexts": {
"place_type": ["money"]
}
}
}
PUT place/_doc/3
{
"suggest":{
"input": "timmy's",
"contexts": {
"location": [
{"lat": 43.6624803, "lon": -79.3863353},
{"lat": 43.6624718, "lon": -79.3873227}
]
}
}
}
// Search
POST place/_search?pretty
{
"suggest": {
"place_suggestion": {
"prefix": "sta",
"completion": {
"field": "suggest",
"size": 10,
"contexts": {
"place_type": ["cafe", "restaurants"]
}
}
}
}
}
POST place/_search?pretty
{
"suggest": {
"place_suggestion": {
"prefix": "tim",
"completion": {
"field": "suggest",
"contexts": {
"place_type": [
{"context": "cafe"},
{"context": "money", "boost": 2} // boost suggestions in this category
]
}
}
}
}
}
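The category filtering and boosting in the two queries above can be modeled as follows. The sample data mirrors documents 1 to 3 of the place index.

Several details are assumptions:
- A document must share at least one category with the query to be returned.
- When several categories match, the highest boost wins.
- When the query gives no contexts, every document is considered.
- One option is returned per document.

The function name context_suggest is hypothetical.

```python
def context_suggest(prefix, docs, contexts=None):
    """docs: {doc_id: (inputs, set_of_categories)}; contexts: {category: boost}."""
    hits = []
    for doc_id, (inputs, categories) in docs.items():
        if contexts is None:
            boost = 1  # no query contexts: every document is considered
        else:
            boosts = [contexts[c] for c in categories if c in contexts]
            if not boosts:
                continue  # filtered out: no shared category
            boost = max(boosts)  # assumption: best matching category wins
        match = next((t for t in inputs if t.startswith(prefix)), None)
        if match is not None:
            hits.append((boost, doc_id, match))
    hits.sort(key=lambda h: (-h[0], h[1]))  # highest boost first
    return [(doc_id, text, boost) for boost, doc_id, text in hits]


if __name__ == "__main__":
    docs = {1: (["timmy's", "starbucks", "dunkin donuts"], {"cafe", "food"}),
            2: (["monkey", "timmy's", "Lamborghini"], {"money"}),
            3: (["timmy's"], set())}
    print(context_suggest("tim", docs, {"cafe": 1, "money": 2}))
```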
POST place/_search
{
"suggest": {
"place_suggestion": {
"prefix": "tim",
"completion": {
"field": "suggest",
"size": 10,
"contexts": {
"location": {"lat": 43.662, "lon": -79.380}
}
}
}
}
}
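The geo context in the mapping uses "precision": 4, which refers to geohash length. The sketch below is a standard geohash encoder. It alternates longitude and latitude bisections and packs every 5 bits into one base-32 character. It shows that both indexed points of document 3 and the query point fall into the same length-4 cell, which is about 39 km by 19.5 km at the equator. The sketch ignores any neighbouring cells the engine may also consider. The function name geohash is illustrative.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat, lon, precision=4):
    """Encode a point as a geohash string of the given length."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    bits, bit_count, even, out = 0, 0, True, []
    while len(out) < precision:
        if even:  # even bit positions refine longitude
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = bits * 2 + 1, mid
            else:
                bits, lon_hi = bits * 2, mid
        else:     # odd bit positions refine latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = bits * 2 + 1, mid
            else:
                bits, lat_hi = bits * 2, mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # 5 bits -> one base-32 character
            out.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(out)


if __name__ == "__main__":
    for lat, lon in [(43.6624803, -79.3863353), (43.6624718, -79.3873227),
                     (43.662, -79.380)]:
        print(lat, lon, geohash(lat, lon))
```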
Reading contexts from other document fields (path):
PUT place_path_category
{
"mappings": {
"properties": {
"suggest": {
"type": "completion",
"contexts": [
{
"name": "place_type",
"type": "category",
"path": "cat" // the category context is read from the cat field
},
{
"name": "location",
"type": "geo",
"precision": 4,
"path": "loc" // the geo context is read from the loc field
}
]
},
"loc": {
"type": "geo_point"
}
}
}
}
PUT place_path_category/_doc/1
{
"suggest": ["timmy's", "starbucks", "dunkin donuts"],
"cat": ["cafe", "food"]
}
POST place_path_category/_search?pretty
{
"suggest": {
"place_suggestion": {
"prefix": "tim",
"completion": {
"field": "suggest",
"size": 10,
"contexts": {
"place_type": ["cafe"]
}
}
}
}
}