Fuzzy Matching Queries in Elasticsearch

prefix: prefix search

Finds terms that begin with the given string; no relevance score is computed. Notes:

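Prefix search matches against terms, not against the whole field value.
Prefix search performs poorly.
Prefix search results are not cached.
Make the prefix as long as practical, so that fewer terms have to be scanned.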
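
To illustrate why a prefix search matches terms rather than the whole field, here is a minimal Python sketch. It assumes a toy analyzer (lowercase, split on whitespace) and an in-memory inverted index; `build_index` and `prefix_search` are hypothetical helpers for illustration, not Elasticsearch APIs.

```python
from bisect import bisect_left
from collections import defaultdict

def build_index(docs):
    """Toy inverted index: term -> set of doc ids (lowercased, split on whitespace)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for raw in text.lower().split():
            index[raw.strip(".,")].add(doc_id)
    return index

def prefix_search(index, prefix):
    """Walk the sorted term dictionary and collect docs for every term starting with prefix."""
    terms = sorted(index)
    hits = set()
    for term in terms[bisect_left(terms, prefix):]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        hits |= index[term]
    return hits

docs = {
    1: "I enjoy reading",
    2: "They really enjoyed the show",
    3: "Nothing to see here",
}
index = build_index(docs)
print(prefix_search(index, "enj"))    # docs 1 and 2, although neither field starts with "enj"
print(prefix_search(index, "i enj"))  # empty: no single term starts with "i enj"
```

The shorter the prefix, the more dictionary terms the loop has to visit, which is the intuition behind preferring longer prefixes.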

GET my_index/_search
{
  "query": {
    "prefix": {
      "text": {
        "value": "enj"
      }
    }
  }
}

wildcard: wildcard search

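A wildcard operator is a placeholder that matches one or more characters. For example, the * wildcard operator matches zero or more characters, and the ? operator matches exactly one character. You can combine wildcard operators with other characters to build a wildcard pattern.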

Note: wildcards are also matched against terms.
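
As a rough illustration of term-level wildcard matching, the sketch below uses Python's standard `fnmatch` module as a stand-in. This is only an approximation: `fnmatch` also interprets `[...]` character sets, which the wildcard query does not, and the `terms` list and `wildcard_terms` helper are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical terms taken from an inverted index.
terms = ["through", "throughout", "thorough", "though"]

def wildcard_terms(terms, pattern):
    """Return the terms that match the pattern as a whole (not as a substring)."""
    return [t for t in terms if fnmatchcase(t, pattern)]

print(wildcard_terms(terms, "thr*gh"))   # only "through" starts with "thr" and ends with "gh"
print(wildcard_terms(terms, "thr?ugh"))  # "?" stands for exactly one character
```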

GET my_index/_search
{
  "query": {
    "wildcard": {
      "text": {
        "value": "thr*gh"
      }
    }
  }
}

regexp: regular expressions

GET my_index/_search
{
  "query": {
    "regexp": {
      "text": {
        "value": ".*ball"
      }
    }
  }
}

fuzzy: fuzzy query

The four kinds of fuzziness:

Substituted character (box → fox)
Missing character (black → lack)
Extra character (sic → sick)
Transposed characters (act → cat)

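Parameters:

value: (required) the search term.
fuzziness: the maximum edit distance (0, 1, or 2). Larger is not better: recall goes up, but the results become less precise.
1. The Damerau-Levenshtein distance between two pieces of text is the number of insertions, deletions, substitutions, and transpositions needed to turn one string into the other.
2. Distance formula: classic Levenshtein counts only insertions, deletions, and substitutions; the Damerau-Levenshtein variant that Elasticsearch uses by default also counts a transposition of two adjacent characters as a single edit.
3. For transposed characters, the Levenshtein edit distance is 2, while the Damerau-Levenshtein edit distance is 1.
transpositions: (optional, boolean) whether edits include transposing two adjacent characters (ab → ba). Defaults to true.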
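
The difference between the two distances can be checked with a small Python sketch. It implements the textbook dynamic-programming recurrence (the optimal-string-alignment form of Damerau-Levenshtein when `transpositions=True`); `edit_distance` is an illustrative helper, not the automaton-based implementation Lucene actually uses.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance between a and b.

    transpositions=True  -> optimal-string-alignment Damerau-Levenshtein
    transpositions=False -> plain Levenshtein
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]

for left, right in [("box", "fox"), ("black", "lack"), ("sic", "sick"), ("act", "cat")]:
    print(left, right,
          edit_distance(left, right, transpositions=False),
          edit_distance(left, right, transpositions=True))
```

Only the last pair differs between the two columns: swapping "a" and "c" costs two substitutions under plain Levenshtein but a single transposition under Damerau-Levenshtein.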

GET my_index/_search
{
  "query": {
    "fuzzy": {
      "text": {
        "value": "coolorufl",
        "fuzziness": 2
      }
    }
  }
}

Difference between the match query and the fuzzy query:

The match query analyzes (tokenizes) its input; the fuzzy query does not.

// A colorful flower blooms in the garden.
GET my_index/_search
{
  "query": {
    "fuzzy": {  // fuzzy query: the raw input string is used as a single term, without analysis
      "text": {
        "value": "flowel colorful",  // no hits
        "fuzziness": 1
      }
    }
  }
}
GET my_index/_search
{
  "query": {
    "match": {  // match query: the input is analyzed into terms first, then fuzziness is applied per term
      "text": {
        "query": "flowel cxlxrfxl",  // returns hits
        "fuzziness": 1
      }
    }
  }
}

match_phrase

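match_phrase analyzes the query text into terms.
The searched field must contain every term of the phrase, in the same order.
No other terms may appear between those terms in the searched field (unless slop is set).

match_phrase_prefix

match_phrase_prefix works like match_phrase, with one addition: the last term of the text is matched as a prefix. It first runs a prefix search for that last term in the inverted index, then applies match_phrase to the matching docs.

Parameters:

analyzer: the analyzer used to tokenize the phrase.
max_expansions: limits the number of terms the last term's prefix can expand to. Only match_phrase_prefix has this parameter. It is not the number of results, and it is applied per shard.
boost: sets the weight of the query.
slop: allows gaps between the terms of the phrase; it states how far apart the query terms may be while the document still counts as a match.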
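
The phrase rules and the role of max_expansions can be sketched in Python. This is a simplified model under stated assumptions: a toy analyzer (lowercase, whitespace split), no slop support, and a made-up term dictionary; `tokenize`, `phrase_positions`, `match_phrase`, and `match_phrase_prefix` are hypothetical helpers, not Elasticsearch functions.

```python
def tokenize(text):
    """Toy analyzer: lowercase, split on whitespace, strip trailing punctuation."""
    return [t.strip(".,") for t in text.lower().split()]

def phrase_positions(tokens, phrase_terms):
    """Start positions where phrase_terms occur consecutively and in order."""
    n = len(phrase_terms)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase_terms]

def match_phrase(field_text, phrase):
    return bool(phrase_positions(tokenize(field_text), tokenize(phrase)))

def match_phrase_prefix(field_text, phrase, term_dictionary, max_expansions=50):
    *head, last = tokenize(phrase)
    # Expand the last term into at most max_expansions dictionary terms ...
    expansions = [t for t in sorted(term_dictionary) if t.startswith(last)][:max_expansions]
    tokens = tokenize(field_text)
    # ... then run an ordinary phrase match for each expansion.
    return any(phrase_positions(tokens, head + [exp]) for exp in expansions)

doc = "A colorful flower blooms in the garden."
# Assumed term dictionary: the doc's own terms plus terms from other (imaginary) docs.
terms = set(tokenize(doc)) | {"floor", "flour", "flow"}

print(match_phrase(doc, "colorful flower"))   # True: both terms, adjacent, in order
print(match_phrase(doc, "flower colorful"))   # False: wrong order
print(match_phrase(doc, "colorful blooms"))   # False: "flower" sits in between
print(match_phrase_prefix(doc, "colorful flo", terms))                    # True
print(match_phrase_prefix(doc, "colorful flo", terms, max_expansions=2))  # False
```

The last call returns False because only "floor" and "flour" are tried before the expansion limit is reached, which shows that max_expansions caps the candidate terms rather than the number of results.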

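ngram and edge_ngram

ngram cuts text into contiguous character fragments (n-grams) of a fixed length or a range of lengths. For example, the word "quick" is split into these substrings: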

min_gram=1, max_gram=3 → ["q", "qu", "qui", "u", "ui", "uic", "i", "ic", "ick", "c", "ck", "k"]

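min_gram and max_gram default to 1 and 2.

edge_ngram is a variant of ngram that only generates fragments anchored at the beginning (edge) of the token. For example, the edge_ngram output for "quick" (min_gram=1, max_gram=3):

Output → ["q", "qu", "qui"] (neither "uic" nor "ick" is generated)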
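
Both outputs above can be reproduced with a few lines of Python. The sketch only mimics the splitting logic for a single token; `ngrams` and `edge_ngrams` are illustrative helpers whose defaults mirror the 1 and 2 mentioned above.

```python
def ngrams(token, min_gram=1, max_gram=2):
    """All substrings of length min_gram..max_gram, ordered by start position."""
    return [
        token[i:i + n]
        for i in range(len(token))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(token)
    ]

def edge_ngrams(token, min_gram=1, max_gram=2):
    """Only the substrings anchored at the start of the token."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

print(ngrams("quick", 1, 3))
print(edge_ngrams("quick", 1, 3))
```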

PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "2_3_ngram": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 3
        }
      },
      "analyzer": {
        "my_ngram": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["2_3_ngram"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "my_ngram",
        "search_analyzer": "standard"
      }
    }
  }
}

GET my_index/_analyze
{
  "text": "A colorful flower blooms in the garden.",
  "analyzer": "my_ngram"
}

GET my_index/_search
{
  "query": {
    "match_phrase": {
      "text": "olo er blo"
    }
  }
}
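
One way to see why the phrase "olo er blo" can match is that every gram stays at the position of the token it was cut from, so three grams from three consecutive words form a valid phrase. The Python sketch below models this under simplifying assumptions (whitespace tokenization, punctuation stripped, query lowercased and split on spaces); `index_positions` and `phrase_hits` are hypothetical helpers, not Elasticsearch APIs.

```python
def ngrams(token, min_gram, max_gram):
    """All substrings of token with length min_gram..max_gram."""
    return [
        token[i:i + n]
        for i in range(len(token))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(token)
    ]

def index_positions(text, min_gram=2, max_gram=3):
    """One set of grams per token position (toy stand-in for the my_ngram analyzer)."""
    tokens = [t.strip(".,") for t in text.split()]
    return [set(ngrams(t, min_gram, max_gram)) for t in tokens]

def phrase_hits(positions, query):
    """Start positions where each query term is found at consecutive token positions."""
    terms = query.lower().split()  # toy stand-in for the standard search analyzer
    n = len(terms)
    return [
        start
        for start in range(len(positions) - n + 1)
        if all(terms[k] in positions[start + k] for k in range(n))
    ]

positions = index_positions("A colorful flower blooms in the garden.")
print(phrase_hits(positions, "olo er blo"))  # [1]: colorful / flower / blooms
print(phrase_hits(positions, "er olo"))      # []: the grams exist, but not in this order
```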
