Add support for Significant Terms aggregation #146

@tomconte

Description

Reference documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html

"An aggregation that returns interesting or unusual occurrences of terms in a set." In Kibana, it can only be applied to text fields.

According to the documentation, this aggregation is primarily used with a "parent-level aggregation to segment the data ready for analysis". In other words, it is typically a sub-aggregation of another bucket aggregation, such as Terms or Histogram.

This makes this issue dependent on support for sub-bucket aggregations (#145).

Before implementing this, we need to identify how to perform a similar query in Kusto. According to the documentation, Significant Terms "are the terms that have undergone a significant change in popularity measured between a foreground and background set. [...] In the simplest case, the foreground set of interest is the search results matched by a query and the background set used for statistical comparisons is the index or indices from which the results were gathered."
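
To make the foreground/background idea concrete, here is a minimal Python sketch of the counts a Kusto translation would presumably need to produce for each parent bucket. The records, the function name `foreground_background_counts`, and the output layout are all hypothetical; only the field names `AvgTicketPrice` and `DestCountry` and the interval of 100 come from the sample request in this issue. It does not compute a significance score, only the per-term `doc_count` (foreground) and `bg_count` (background).

```python
from collections import Counter, defaultdict

# Hypothetical records standing in for the index (not real data).
records = [
    {"AvgTicketPrice": 120.0, "DestCountry": "IT"},
    {"AvgTicketPrice": 180.0, "DestCountry": "IT"},
    {"AvgTicketPrice": 150.0, "DestCountry": "US"},
    {"AvgTicketPrice": 230.0, "DestCountry": "US"},
    {"AvgTicketPrice": 260.0, "DestCountry": "CH"},
    {"AvgTicketPrice": 310.0, "DestCountry": "US"},
]


def foreground_background_counts(rows, bucket_field, term_field, interval):
    """Count each term per histogram bucket (foreground) and overall (background)."""
    # Background set: every row, regardless of bucket.
    background = Counter(row[term_field] for row in rows)

    # Foreground sets: one term counter per histogram bucket.
    foreground = defaultdict(Counter)
    for row in rows:
        key = int(row[bucket_field] // interval) * interval
        foreground[key][row[term_field]] += 1

    result = {}
    for key in sorted(foreground):
        term_counts = foreground[key]
        result[key] = {
            "doc_count": sum(term_counts.values()),  # size of the foreground set
            "bg_count": len(rows),                   # size of the background set
            "terms": {
                term: {"doc_count": count, "bg_count": background[term]}
                for term, count in term_counts.items()
            },
        }
    return result


counts = foreground_background_counts(records, "AvgTicketPrice", "DestCountry", 100)
```

In Kusto the same shape could plausibly be obtained by joining a per-bucket `summarize` with a global one, but that translation is not worked out here.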

Sample request:

  "aggs": {
    "2": {
      "histogram": {
        "field": "AvgTicketPrice",
        "interval": 100,
        "min_doc_count": 1
      },
      "aggs": {
        "3": {
          "significant_terms": {
            "field": "DestCountry",
            "size": 3
          }
        }
      }
    }
  }

Response: (extract)

  "aggregations": {
    "2": {
      "buckets": [
        {
          "3": {
            "doc_count": 749,
            "bg_count": 13059,
            "buckets": [
              {
                "key": "IT",
                "doc_count": 243,
                "score": 0.2552994319244096,
                "bg_count": 2371
              },
              {
                "key": "US",
                "doc_count": 172,
                "score": 0.11694192970564077,
                "bg_count": 1987
              },
              {
                "key": "CH",
                "doc_count": 79,
                "score": 0.10476945913799716,
                "bg_count": 691
              }
            ]
          },
          "key": 100,
          "doc_count": 749
        },
        {
          "3": {
            "doc_count": 1067,
            "bg_count": 13059,
            "buckets": [
              {
                "key": "IT",
                "doc_count": 241,
                "score": 0.0551183926043845,
                "bg_count": 2371
              },
              {
                "key": "CH",
                "doc_count": 83,
                "score": 0.03656787843506986,
                "bg_count": 691
              },
              {
                "key": "US",
                "doc_count": 185,
                "score": 0.02418926301801488,
                "bg_count": 1987
              }
            ]
          },
          "key": 200,
          "doc_count": 1067
        },
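
The scores in the response extract above appear consistent with the JLH heuristic, which the Elasticsearch documentation describes as the default for this aggregation: the absolute change in a term's frequency multiplied by its relative change. The following Python sketch recomputes the scores from the `doc_count` and `bg_count` values shown above. The function name `jlh_score` and the zero-score guard are assumptions for illustration, not code taken from Elasticsearch.

```python
def jlh_score(doc_count, subset_size, bg_count, superset_size):
    """JLH-style score: (fg% - bg%) * (fg% / bg%).

    doc_count / subset_size   -> term frequency in the foreground set
    bg_count / superset_size  -> term frequency in the background set
    """
    fg_pct = doc_count / subset_size
    bg_pct = bg_count / superset_size
    # Assumption: terms that are not more frequent in the foreground score 0.
    if bg_pct == 0 or fg_pct <= bg_pct:
        return 0.0
    return (fg_pct - bg_pct) * (fg_pct / bg_pct)


# (doc_count, bg_count) per term, copied from the bucket with key 100 above.
bucket_100 = {"IT": (243, 2371), "US": (172, 1987), "CH": (79, 691)}

# Foreground size 749 and background size 13059 also come from that bucket.
scores = {
    term: jlh_score(fg, 749, bg, 13059)
    for term, (fg, bg) in bucket_100.items()
}

for term, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(term, round(score, 6))
```

If this reading is right, a Kusto implementation would only need the four counts per term and bucket; the score itself is plain arithmetic that could be computed with an `extend` step.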

Labels: enhancement (New feature or request)