
_analyze api does not correctly use normalizers when specified #48650

Closed

dougnelas opened this issue Oct 29, 2019 · 2 comments
Labels
>bug :Search/Analysis How text is split into tokens

@dougnelas commented Oct 29, 2019

Elasticsearch version (bin/elasticsearch --version): 7.3.1

Plugins installed: []

JVM version (java -version): Embedded Java 11

OS version (uname -a if on a Unix-like system):

Description: When using the _analyze API endpoint on an index that has a normalizer defined in its settings, the output comes from the default analyzer instead of the normalizer under test.

Steps to reproduce:

Create a test index:

```
PUT word_delimiter_test
{
  "settings": {
    "analysis": {
      "char_filter": {
        "filter_noisy_characters": {
          "pattern": "[.-:\"]",
          "type": "pattern_replace",
          "replacement": " "
        },
        "convert_dots": {
          "flags": "CASE_INSENSITIVE",
          "pattern": "\\.(net|js|io)",
          "type": "pattern_replace",
          "replacement": "dot$1"
        }
      },
      "filter": {
        "word_delimiter": {
          "split_on_numerics": true,
          "generate_word_parts": true,
          "generate_number_parts": true,
          "catenate_all": true,
          "type": "word_delimiter_graph",
          "type_table": [
            "# => ALPHA",
            "+ => ALPHA"
          ]
        },
        "synonym": {
          "type": "synonym_graph",
          "synonyms": [
            "casp, comptia advanced security practitioner"
          ]
        }
      },
      "analyzer": {
        "test": {
          "char_filter": [
            "convert_dots"
          ],
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "synonym",
            "word_delimiter",
            "flatten_graph"
          ]
        }
      },
      "normalizer": {
        "languages_normalizer": {
          "filter": [
            "trim"
          ],
          "type": "custom",
          "char_filter": [
            "convert_dots",
            "filter_noisy_characters"
          ]
        }
      }
    }
  }
}
```

Then call the _analyze endpoint to test the normalizer:

```
GET word_delimiter_test/_analyze
{
  "text": "Wi-fi",
  "normalizer": "languages_normalizer"
}
```
The expected output should be "wifi".
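
For contrast, here is a sketch of the single-token response the author expected. The offsets and token type are illustrative, not taken from an actual run:

```json
{
  "tokens" : [
    {
      "token" : "wifi",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "word",
      "position" : 0
    }
  ]
}
```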

But the actual output shows the text was run through the default analyzer instead:

```json
{
  "tokens" : [
    {
      "token" : "wi",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "fi",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
```
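
As a cross-check (not part of the original report), one way to see what languages_normalizer actually produces is to map a keyword field that uses it, index the sample text, and run a terms aggregation, since terms aggregations return the normalized doc values. The field name `lang` and the document ID are assumptions chosen for illustration:

```
PUT word_delimiter_test/_mapping
{
  "properties": {
    "lang": {
      "type": "keyword",
      "normalizer": "languages_normalizer"
    }
  }
}

PUT word_delimiter_test/_doc/1?refresh=true
{
  "lang": "Wi-fi"
}

GET word_delimiter_test/_search
{
  "size": 0,
  "aggs": {
    "normalized_values": {
      "terms": { "field": "lang" }
    }
  }
}
```

The aggregation keys show the values after the normalizer's char filters and token filters have been applied, which makes it easy to compare against whatever _analyze reports for the same text.
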
@jtibshirani added the :Search/Analysis How text is split into tokens and >bug labels on Oct 29, 2019
@elasticmachine (Collaborator) commented

Pinging @elastic/es-search (:Search/Analysis)

@jtibshirani (Contributor) commented Oct 29, 2019

Thanks @dougnelas for raising this; it does indeed look like we have a bug here.

cbuescher pushed a commit that referenced this issue Nov 14, 2019
Currently the `_analyze` endpoint doesn't correctly use normalizers specified in the request. This change fixes that by returning the resolved normalizer from TransportAnalyzeAction#getAnalyzer and updates the tests so they can catch this in the future.

Closes #48650