Reported hits count are inconsistent between _search and _search/template #52801

Closed
consulthys opened this issue Feb 26, 2020 · 7 comments
Labels
>bug :Search/Search Search-related issues that do not fall into other categories

Comments

@consulthys
Contributor

consulthys commented Feb 26, 2020

Elasticsearch version (bin/elasticsearch --version): 7.5.0

Plugins installed: []

JVM version (java -version): Elastic Cloud

OS version (uname -a if on a Unix-like system): Elastic Cloud

Description of the problem including expected versus actual behavior:

Since ES 7, one must use rest_total_hits_as_int=true to revert to the old behavior of getting an exact number of total hits in the search response. I feel there is a discrepancy in how the _search and _search/template endpoints behave regarding the reported number of hits.

In my tests below, I'm querying an index with more than 10000 documents with the exact same JSON query (as a normal query and as a template query depending on which endpoint I'm targeting).

{
  "query": {
    "match_all": {}
  }
}

A. When using the _search endpoint, I get this:

"total" : {
  "value" : 10000,
  "relation" : "gte"
},

B. When using the _search?rest_total_hits_as_int=true endpoint, I get this:

"total" : 173175,

C. When using the _search/template endpoint, I get this:

"total" : {
  "value" : 10000,
  "relation" : "gte"
},

So far, so good, everything is consistent.

D. But when I hit the _search/template?rest_total_hits_as_int=true endpoint, I get this:

"total" : 10000,

The only way I found to get the exact total with the _search/template endpoint is by adding the "track_total_hits": true parameter to the template query.

E. When doing so, I get this when hitting the _search/template endpoint:

"total" : {
  "value" : 173175,
  "relation" : "eq"
},

F. And this when hitting the _search/template?rest_total_hits_as_int=true endpoint:

"total" : 173175,

There are two take-aways here:

  1. Since A and C are consistent, I feel that B and D should also be consistent.
  2. I also think that B is wrong and should require "track_total_hits": true in the query in order to spit out the exact number of hits (like in cases E and F)

Steps to reproduce:

It's easy to reproduce this on any index that has more than 10K documents by creating a simple match_all template query.
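
As a rough aid for reproducing cases A-F, here is a minimal Python sketch that only builds the request paths and bodies without sending anything. The index name `my-index` and the helper `build_requests` are hypothetical; the `{"source": ...}` wrapper is the inline search template body, assumed here with no params.

```python
import json
from urllib.parse import urlencode

# The match_all query from the report above.
QUERY = {"query": {"match_all": {}}}


def build_requests(index, track_total_hits=None):
    """Return (label, path, body) tuples for the four endpoint variants.

    `index` is a placeholder name. Passing track_total_hits=True turns
    variants C and D into cases E and F from the report.
    """
    source = dict(QUERY)
    if track_total_hits is not None:
        source["track_total_hits"] = track_total_hits
    flag = urlencode({"rest_total_hits_as_int": "true"})
    search = f"/{index}/_search"
    template = f"/{index}/_search/template"
    return [
        ("A", search, source),
        ("B", f"{search}?{flag}", source),
        # _search/template takes the query as an inline template under "source".
        ("C", template, {"source": source}),
        ("D", f"{template}?{flag}", {"source": source}),
    ]


if __name__ == "__main__":
    for label, path, body in build_requests("my-index"):
        print(label, "GET", path, json.dumps(body))
```

Each tuple can be pasted into a console or sent with any HTTP client against an index holding more than 10K documents.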

@consulthys consulthys changed the title from "_search and _search/template are inconsistent with rest_total_hits_as_int" to "Reported hits count are inconsistent between _search and _search/template" Feb 26, 2020
@jimczi jimczi added :Search/Search Search-related issues that do not fall into other categories >bug labels Feb 26, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

@gaobinlong
Contributor

From the source code, I found that when rest_total_hits_as_int is set to true in the _search API (like B), trackTotalHitsUpTo is set to Integer.MAX_VALUE, so we always get the accurate hits count. But in the _search/template API (like D), the value of trackTotalHitsUpTo is lost, so we get 10000. So I think the result of D is incorrect.

@jimczi
Contributor

jimczi commented Mar 2, 2020

It's lost because the templated search parses the _source late in the action. We should check if trackTotalHits is set before parsing and throw an error if the template search tries to lower it (set to false or to a number). Since you already started to look, @gaobinlong, would you be interested in providing a pull request?
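
The rule proposed above can be modeled roughly as follows. This is a Python sketch of the described behavior, not the Elasticsearch implementation: the function name, the constants, and the use of `None` for "not set in the request" are all assumptions made for illustration.

```python
from typing import Optional

# Stand-in for Java's Integer.MAX_VALUE, which the thread says marks accurate tracking.
TRACK_ACCURATE = 2**31 - 1
# Default tracking limit, as seen in the "gte 10000" responses above.
DEFAULT_UP_TO = 10_000


def resolve_track_total_hits(rest_total_hits_as_int: bool,
                             requested: Optional[int]) -> int:
    """Return the effective hit-tracking limit for a request.

    `requested` is the limit parsed from the (rendered) search source,
    or None if the source did not set one.
    """
    if not rest_total_hits_as_int:
        return DEFAULT_UP_TO if requested is None else requested
    if requested is None:
        # Integer totals must be exact, so default to accurate tracking.
        return TRACK_ACCURATE
    if requested != TRACK_ACCURATE:
        # The template tried to lower the limit: reject it.
        raise ValueError(
            "rest_total_hits_as_int requires accurate hit tracking, "
            f"got {requested}"
        )
    return requested
```

The point of the sketch is ordering: the check has to run on the source after the template is rendered, otherwise the default set by the REST flag is overwritten, which is what case D shows.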

@consulthys
Contributor Author

Thanks @gaobinlong and @jimczi for looking into this.
I'm also interested to know which of A-F is supposed to be the correct intended behavior.

@jimczi
Contributor

jimczi commented Mar 2, 2020

I'm also interested to know which of A-F is supposed to be the correct intended behavior.

Yes sorry, the expectation when setting rest_total_hits_as_int is that the total number of hits is tracked accurately, since the REST response will return hits.total as a numeric value (as opposed to an object in the new format). So D is a bug: the default for track_total_hits when rest_total_hits_as_int is set should be to track the number of hits accurately. E and F are a correct workaround, but it shouldn't be needed once we fix D.
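
To illustrate why the integer format has to be exact, here is a small client-side sketch in Python that reads `hits.total` in either shape. The helper name `read_total` is hypothetical; the field names `value` and `relation` are taken from the responses quoted above.

```python
from typing import Tuple, Union


def read_total(total: Union[int, dict]) -> Tuple[int, bool]:
    """Return (count, is_exact) for a hits.total value in either format."""
    if isinstance(total, dict):
        # New format: "relation" says whether "value" is exact ("eq")
        # or only a lower bound ("gte").
        return total["value"], total["relation"] == "eq"
    # Integer format: there is no relation field, so a client can only
    # assume the number is exact.
    return int(total), True
```

With the integer format a client has no way to tell a lower bound from an exact count, so the `10000` returned in case D would be silently read as the real total.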

@consulthys
Contributor Author

Thanks @jimczi, so when specifying rest_total_hits_as_int=true one wouldn't have to also specify track_total_hits: true. That makes sense.

@gaobinlong
Contributor

@jimczi OK, I'm glad to do that. @consulthys, only D is incorrect: when rest_total_hits_as_int is set to true, the total hits count should be accurate.

@jimczi jimczi closed this as completed in fb158df Mar 13, 2020
jimczi pushed a commit that referenced this issue Mar 16, 2020
When 'rest_total_hits_as_int' is set to true, the total hits count in the response should be accurate. So we should set trackTotalHits to true if needed when parsing the inline script of a search template request.

Closes #52801