ES deduplication queries (aggregation, grouping, pagination, sum statistics, etc.)



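How do you run deduplication-style queries on a specific field in Elasticsearch (ES), for example aggregation, grouping, pagination, and sum-style statistics?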

Getting all distinct values

To get every distinct value of a field in ES, use the terms aggregation, one of the bucket aggregations, as in this example:

GET {index}/_search
{
  "size": 0,
  "aggs": {
    "distinct_aggs": {
      "terms": {
        "field": "status"
      }
    }
  }
}

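This returns the distinct values of the status field in the given index. The top-level size is set to 0, so no documents are returned in hits: we only want the aggregation result, not the document contents. With size set to 1, one matching document would be returned as well. Keep in mind that the terms aggregation itself returns only the 10 most frequent values by default; if the field may have more distinct values, also set size inside terms, as in the next example. The response: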

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 58439,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "distinct_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 3,
          "doc_count": 46619
        },
        {
          "key": 2,
          "doc_count": 11810
        },
        {
          "key": 1,
          "doc_count": 10
        }
      ]
    }
  }
}
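To make the bucket fields concrete, the following Python sketch imitates what a terms aggregation computes, without talking to Elasticsearch. It is a simplified model under stated assumptions: the helper terms_agg and the sample documents are hypothetical, everything is treated as a single shard, and the tie-breaking rule is this sketch's own choice. In a real cluster each shard computes its own top terms and the results are merged, which is why the response also carries doc_count_error_upper_bound.

```python
from collections import Counter


def terms_agg(docs, field, size=10):
    """Hypothetical single-shard model of a terms aggregation.

    Counts documents per distinct value of `field`, orders buckets by
    doc_count descending (ties broken by key ascending in this sketch),
    keeps the first `size` buckets, and reports how many documents
    fell into the buckets that were cut off.
    """
    counts = Counter(doc[field] for doc in docs if field in doc)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    buckets = [{"key": k, "doc_count": c} for k, c in ordered[:size]]
    other = sum(c for _, c in ordered[size:])
    return {"sum_other_doc_count": other, "buckets": buckets}


# Made-up sample documents, not the data behind the response above.
docs = [{"status": 3}, {"status": 2}, {"status": 3},
        {"status": 1}, {"status": 3}, {"status": 2}]

print(terms_agg(docs, "status")["buckets"])
# [{'key': 3, 'doc_count': 3}, {'key': 2, 'doc_count': 2}, {'key': 1, 'doc_count': 1}]
print(terms_agg(docs, "status", size=2)["sum_other_doc_count"])  # 1
```

When every document has exactly one status value and sum_other_doc_count is 0, the bucket counts add up to the total hit count, which matches the response above: 46619 + 11810 + 10 = 58439.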

Paginating the deduplicated values

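Paging requires a sort order. Building on the previous example, add a size parameter inside terms (the maximum number of buckets to return) and an order parameter (how the buckets are sorted):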

GET {index}/_search
{
  "size": 0,
  "aggs": {
    "distinct_aggs": {
      "terms": {
        "field": "item_id",
        "size" : 1000,
        "order": {
          "_term": "asc"
        }
      }
    }
  }
}

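Here "_term": "asc" sorts the buckets by their value in ascending order; since Elasticsearch 6.0, _term is deprecated in favor of _key. Note also that the terms aggregation has no from parameter, so size only caps how many buckets come back. To actually step through pages, either slice the returned buckets on the client or, in newer versions, use the bucket_sort pipeline aggregation (which accepts from and size) or the composite aggregation. The output: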

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 58463,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "distinct_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1,
          "doc_count": 32
        },
        {
          "key": 2,
          "doc_count": 11811
        },
        {
          "key": 3,
          "doc_count": 46620
        },
        ...
      ]
    }
  }
}
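One way to page through the buckets is to request them in a stable key order with a size large enough to cover every distinct value, then slice the list on the client. The Python sketch below models that query plus the slicing locally; paged_terms, its zero-based page argument, and the sample data are hypothetical, and it assumes all buckets were fetched.

```python
from collections import Counter


def paged_terms(docs, field, page, page_size):
    """Hypothetical helper: distinct values of `field` ordered by key
    ascending (like "order": {"_key": "asc"}), then cut into pages.

    `page` is zero-based. This assumes the query fetched every bucket,
    i.e. the terms `size` was at least the number of distinct values.
    """
    counts = Counter(doc[field] for doc in docs if field in doc)
    ordered = sorted(counts.items())  # sort by key ascending
    start = page * page_size
    return [{"key": k, "doc_count": c}
            for k, c in ordered[start:start + page_size]]


# Made-up sample: item_id cycles through 0..4 over 12 documents.
docs = [{"item_id": i % 5} for i in range(12)]

print(paged_terms(docs, "item_id", page=0, page_size=2))
# [{'key': 0, 'doc_count': 3}, {'key': 1, 'doc_count': 3}]
print(paged_terms(docs, "item_id", page=2, page_size=2))
# [{'key': 4, 'doc_count': 2}]
```

Every page request fetches and sorts all buckets again, so this approach suits fields with a modest number of distinct values.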

Aggregated sum statistics

Buckets can also be ordered, ascending or descending, by a metric computed for each bucket, such as the sum of a field. Example:

GET {index}/_search
{
  "size": 0,
  "aggs": {
    "item_terms": {
      "terms": {
        "field": "item_id",
        "size": 1000,
        "order":[{
          "gmv_stat": "desc"
        },{
          "gmv_180d": "desc"
        }]
      },
      "aggs": {
        "gmv_stat": {
          "sum": {
            "field": "gmv"
          }
        },
        "gmv_180d": {
          "sum": {
            "script": "doc['gmv_90d'].value*2"
          }
        }
      }
    }
  }
}

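The buckets are sorted by gmv_stat (the sum of gmv) in descending order; gmv_180d, the per-bucket sum of doc['gmv_90d'].value * 2 computed by a script, is only used to break ties. In the response, sum_other_doc_count (260) is the number of documents in buckets beyond the first 1000, which were not returned. The response: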

{
  ...
  "aggregations": {
    "item_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 260,
      "buckets": [
        {
          "key": 23388,
          "doc_count": 18,
          "gmv_stat": {
            "value": 176220
          },
          "gmv_180d": {
            "value": 89732
          }
        },
        {
          "key": 96117,
          "doc_count": 16,
          "gmv_stat": {
            "value": 129306
          },
          "gmv_180d": {
            "value": 56988
          }
        },
        ...
      ]
    }
  }
}
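The per-bucket arithmetic in this query can be reproduced locally. The Python sketch below models it under simplifying assumptions: everything sits on one shard, every document has item_id, gmv and gmv_90d, and the helper name terms_with_sums and the sample data are hypothetical. The sample makes two items tie on gmv so that the second sort key actually comes into play.

```python
from collections import defaultdict


def terms_with_sums(docs, size=10):
    """Hypothetical local model of the item_terms query above.

    Groups documents by item_id, sums gmv into gmv_stat and
    gmv_90d * 2 into gmv_180d, orders buckets by gmv_stat descending
    and then gmv_180d descending, and keeps the first `size` buckets.
    Assumes every document has item_id, gmv and gmv_90d.
    """
    groups = defaultdict(lambda: {"doc_count": 0, "gmv_stat": 0, "gmv_180d": 0})
    for doc in docs:
        g = groups[doc["item_id"]]
        g["doc_count"] += 1
        g["gmv_stat"] += doc["gmv"]
        g["gmv_180d"] += doc["gmv_90d"] * 2  # same arithmetic as the script
    ordered = sorted(groups.items(),
                     key=lambda kv: (-kv[1]["gmv_stat"], -kv[1]["gmv_180d"]))
    buckets = [{"key": k, **v} for k, v in ordered[:size]]
    other = sum(v["doc_count"] for _, v in ordered[size:])
    return {"sum_other_doc_count": other, "buckets": buckets}


# Made-up sample: items 1 and 2 tie on gmv, so gmv_180d decides the order.
docs = [
    {"item_id": 1, "gmv": 100, "gmv_90d": 40},
    {"item_id": 2, "gmv": 100, "gmv_90d": 60},
    {"item_id": 3, "gmv": 20, "gmv_90d": 5},
    {"item_id": 3, "gmv": 10, "gmv_90d": 5},
]

result = terms_with_sums(docs, size=2)
print([b["key"] for b in result["buckets"]])  # [2, 1]
print(result["sum_other_doc_count"])          # 2
```

As in the real response, sum_other_doc_count counts documents (here the two documents of item 3), not the number of buckets that were dropped.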