Reported hits count are inconsistent between _search and _search/template #52801

@consulthys

Contributor

Elasticsearch version (bin/elasticsearch --version): 7.5.0

Plugins installed: []

JVM version (java -version): Elastic Cloud

OS version (uname -a if on a Unix-like system): Elastic Cloud

Description of the problem including expected versus actual behavior:

Since ES 7, one must use rest_total_hits_as_int=true in order to revert to the old behavior of getting an exact number of total hits in the search response. I feel there is a discrepancy in how the _search and _search/template endpoints behave regarding the reported number of hits.

In my tests below, I'm querying an index with more than 10000 documents with the exact same JSON query (as a normal query and as a template query depending on which endpoint I'm targeting).

{
  "query": {
    "match_all": {}
  }
}

A. When using the _search endpoint, I get this:

"total" : {
  "value" : 10000,
  "relation" : "gte"
},

B. When using the _search?rest_total_hits_as_int=true endpoint, I get this:

"total" : 173175,

C. When using the _search/template endpoint, I get this:

"total" : {
  "value" : 10000,
  "relation" : "gte"
},

So far, so good, everything is consistent.

D. But when I hit the _search/template?rest_total_hits_as_int=true endpoint, I get this:

"total" : 10000,

The only way I found to get the exact total with the _search/template endpoint is by adding the "track_total_hits": true parameter to the template query.
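For illustration, the workaround amounts to sending a request body like the one built below. This is a minimal sketch, not taken from the report: the helper name `build_template_body` is hypothetical, and it assumes the inline form of the `_search/template` body (a `source` object plus `params`).

```python
import json


def build_template_body(query, track_total_hits=None, params=None):
    """Build an inline _search/template request body (hypothetical helper)."""
    source = {"query": query}
    if track_total_hits is not None:
        # Workaround from this report: request exact counting inside the template.
        source["track_total_hits"] = track_total_hits
    return {"source": source, "params": params or {}}


# Body for cases E and F: match_all plus "track_total_hits": true.
body = build_template_body({"match_all": {}}, track_total_hits=True)
print(json.dumps(body, indent=2))
```

Calling the helper without `track_total_hits` yields the plain template body used in cases C and D.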

E. When doing so, I get this when hitting the _search/template endpoint:

"total" : {
  "value" : 173175,
  "relation" : "eq"
},

F. And this when hitting the _search/template?rest_total_hits_as_int=true endpoint:

"total" : 173175,

There are two take-aways here:

  1. Since A and C are consistent, I feel that B and D should also be consistent.
  2. I also think that B is wrong and should require "track_total_hits": true in the query in order to spit out the exact number of hits (like in cases E and F)
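The two response shapes seen in A through F can be read with a small client-side helper. The sketch below is illustrative only (the name `normalize_total` is an assumption, not part of any Elasticsearch client); it shows why case D is misleading: the integer form carries no relation field, so a capped 10000 cannot be told apart from an exact count.

```python
def normalize_total(total):
    """Return (value, relation) for either hits.total format."""
    if isinstance(total, dict):
        # Object form (ES 7 default): relation is "eq" or "gte".
        return total["value"], total["relation"]
    # Integer form (rest_total_hits_as_int=true): no relation is available,
    # so a caller can only assume the count is exact.
    return int(total), "eq"


case_a = normalize_total({"value": 10000, "relation": "gte"})  # lower bound
case_b = normalize_total(173175)                               # exact count
case_d = normalize_total(10000)                                # looks exact, but is not
```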

Steps to reproduce:

It's easy to reproduce this on any index that has more than 10K documents by creating a simple match_all template query.

Activity

Title changed from "_search and _search/template are inconsistent with rest_total_hits_as_int" to "Reported hits count are inconsistent between _search and _search/template" on Feb 26, 2020
Label added: :Search/Search (Search-related issues that do not fall into other categories) on Feb 26, 2020

elasticmachine commented on Feb 26, 2020

@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

gaobinlong commented on Mar 1, 2020

@gaobinlong
Contributor

From the source code I found that when rest_total_hits_as_int is set to true in the _search API (like B), trackTotalHitsUpTo is set to Integer.MAX_VALUE, so we always get the accurate hit count. But in the _search/template API (like D), the value of trackTotalHitsUpTo is lost, so we get 10000. So I think the result of D is incorrect.
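The behavior described in this comment can be modeled roughly as follows. This is a simplified sketch and not the Elasticsearch source: the function names are hypothetical, the 10000 default is the threshold observed in cases A and C, and `"track_total_hits": true` is represented as the maximum threshold.

```python
INT_MAX = 2**31 - 1      # Java's Integer.MAX_VALUE
DEFAULT_UP_TO = 10_000   # default threshold observed in cases A and C


def search_up_to(rest_total_hits_as_int, body_up_to=None):
    """_search as described: the flag forces accurate tracking."""
    if rest_total_hits_as_int:
        return INT_MAX
    return DEFAULT_UP_TO if body_up_to is None else body_up_to


def template_up_to_buggy(rest_total_hits_as_int, template_up_to=None):
    """_search/template as observed: the flag's effect is lost, so only
    the template's own setting (if any) decides the threshold."""
    return DEFAULT_UP_TO if template_up_to is None else template_up_to


def reported_total(doc_count, up_to):
    """Counting stops at the threshold, so the reported value is capped."""
    return min(doc_count, up_to)


case_b = reported_total(173175, search_up_to(True))
case_d = reported_total(173175, template_up_to_buggy(True))
case_f = reported_total(173175, template_up_to_buggy(True, INT_MAX))
```

Under these assumptions the model reproduces the reported numbers for B, D, and F.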

jimczi commented on Mar 2, 2020

@jimczi
Contributor

It's lost because the templated search parses the _source late in the action. We should check if trackTotalHits is set before parsing and throw an error if the template search tries to lower it (set to false or to a number). Since you already started to look, @gaobinlong, would you be interested in providing a pull request?
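A rough sketch of the check proposed here, written in Python for brevity rather than in Elasticsearch's Java. The function name, the `ACCURATE` constant, and the error message are assumptions made for illustration; this is not the eventual fix.

```python
ACCURATE = 2**31 - 1  # stands in for "track_total_hits": true


def resolve_track_total_hits(rest_total_hits_as_int, template_value=None):
    """Return the effective setting, or raise if the template lowers it.

    template_value is None (unset), True, False, or an integer threshold.
    """
    if not rest_total_hits_as_int:
        # Without the flag the template's own setting is used unchanged.
        return template_value
    if template_value is None or template_value is True or template_value == ACCURATE:
        # With the flag set, accurate tracking becomes the default.
        return ACCURATE
    # The template tries to lower tracking (false or a number): reject it.
    raise ValueError(
        "rest_total_hits_as_int requires accurate total hits tracking, "
        f"got track_total_hits={template_value!r}"
    )
```

With this shape, case D would resolve to accurate tracking, while a template that sets `track_total_hits` to false or to a number would fail fast instead of silently returning a capped integer.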

consulthys commented on Mar 2, 2020

@consulthys
Contributor, Author

Thanks @gaobinlong and @jimczi for looking into this.
I'm also interested to know which of A-F is supposed to be the correct intended behavior.

jimczi commented on Mar 2, 2020

@jimczi
Contributor

> I'm also interested to know which of A-F is supposed to be the correct intended behavior.

Yes, sorry. The expectation when setting rest_total_hits_as_int is that the total number of hits is tracked accurately, since the REST response will return hits.total as a numeric value (as opposed to an object in the new format). So D is a bug: the default for track_total_hits when rest_total_hits_as_int is set should be to track the number of hits accurately. E and F are a correct workaround, but it shouldn't be needed if we fix D.

consulthys commented on Mar 2, 2020

@consulthys
Contributor, Author

Thanks @jimczi, so when specifying rest_total_hits_as_int=true one wouldn't have to also specify track_total_hits: true. That makes sense.

gaobinlong commented on Mar 2, 2020

@gaobinlong
Contributor

@jimczi OK, I'm glad to do that. @consulthys, only D is incorrect: when rest_total_hits_as_int is set to true, the total hits count should be accurate.

added a commit that references this issue on Mar 16, 2020
e2effa9

Metadata

Labels: :Search/Search (Search-related issues that do not fall into other categories), >bug

Participants: @consulthys, @gaobinlong, @elasticmachine, @jimczi