execution_hint: 'map' loads global ords when it doesn't need to #37705

Closed
alexander-marquardt opened this issue Jan 22, 2019 · 2 comments

@alexander-marquardt

alexander-marquardt commented Jan 22, 2019

(corrected description) 'execution_hint': 'map' loads global ordinals, even though they are not required. This feature is documented here.

I have a client with an index containing hundreds of millions of documents. Within this index, there is a high-cardinality field with hundreds of millions of possible values. When the client executes a query that matches a few hundred documents and then runs a terms aggregation on the high-cardinality field, Elasticsearch rebuilds global ordinals, which can take 15 seconds. (In fact, it rebuilds the global ordinals even if the query matched zero documents.) There are several options for solving this issue:

  1. wait 15 seconds to build global ordinals on execution of the aggregation (not acceptable, and not a real solution)

  2. enable eager global ordinals and increase the refresh interval to minimize the impact of constant rebuilding of global ordinals (which is not ideal due to having to wait to see results, and the constant work of rebuilding global ordinals)

  3. use ‘map’ to only evaluate documents that match the query when running the terms aggregation (doesn’t work)

  4. do a hack - use a script to return the value for the terms aggregation, which forces global ordinals to be ignored as they don't exist for a script-generated field (this works, but feels hacky).
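
For reference, option (2) could be configured roughly as follows. This is a minimal sketch that only builds the JSON bodies; the field name `iban` and the `30s` interval are illustrative assumptions, not values from the affected cluster. `eager_global_ordinals` and `index.refresh_interval` are the standard Elasticsearch mapping parameter and index setting.

```python
import json


def eager_ordinals_mapping(field: str) -> dict:
    """Mapping body that builds global ordinals at refresh time instead of at query time."""
    return {
        "properties": {
            field: {"type": "keyword", "eager_global_ordinals": True}
        }
    }


def refresh_settings(interval: str) -> dict:
    """Index settings body that refreshes less often, so ordinals are rebuilt less often."""
    return {"index": {"refresh_interval": interval}}


if __name__ == "__main__":
    # "iban" and "30s" are hypothetical values chosen for illustration.
    print(json.dumps(eager_ordinals_mapping("iban"), indent=2))
    print(json.dumps(refresh_settings("30s"), indent=2))
```

The trade-off is the one described in option (2): a longer refresh interval delays the visibility of new documents, and every refresh still pays the rebuild cost.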

To give context, this is for a bank. A given client will want to see all the IBAN numbers they have transferred to. There are hundreds of millions of IBAN numbers, but each client will have used only on the order of hundreds.

I am currently using option (4) to work around the fact that (3) does not work. Ideally I would like to use (3) execution_hint: map to solve this issue.
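
To make options (3) and (4) concrete, here is a sketch of the two search request bodies. The field names `iban` and `client_id`, the aggregation name `ibans`, and the size are assumptions made up for illustration; the actual queries from the affected system are not shown in this issue.

```python
import json


def terms_agg_with_map_hint(field: str, size: int = 500) -> dict:
    """Option (3): terms aggregation that asks for the 'map' execution mode."""
    return {"terms": {"field": field, "size": size, "execution_hint": "map"}}


def terms_agg_with_script(field: str, size: int = 500) -> dict:
    """Option (4): return the value from a script, so no global ordinals are involved."""
    return {
        "terms": {
            "script": {"lang": "painless", "source": f"doc['{field}'].value"},
            "size": size,
        }
    }


def search_body(client_id: str, agg: dict) -> dict:
    """Filter to one client's documents, then aggregate over the matches only."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"term": {"client_id": client_id}},  # hypothetical filter field
        "aggs": {"ibans": agg},
    }


if __name__ == "__main__":
    print(json.dumps(search_body("client-42", terms_agg_with_map_hint("iban")), indent=2))
    print(json.dumps(search_body("client-42", terms_agg_with_script("iban")), indent=2))
```

Both bodies describe the same aggregation; the difference is only in how the terms are read, which is why (4) works as a stand-in for (3).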

This was discussed in the #elasticsearch slack channel on Jan 22, 2019

@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo

@polyfractal
Contributor

Just a clarification note for anyone working on this in the future: the issue is that map will load global ordinals even though they aren't required by the map aggregator (StringTermsAggregator), not that the map execution hint is ignored.

The hint works as expected; the problem is that the relationship between global ords and the aggregator isn't what you'd expect.

@polyfractal changed the title from "execution_hint: 'map' ignored in aggregation" to "execution_hint: 'map' loads global ords when it doesn't need to" on Jan 22, 2019
@jimczi added the >bug label on Jan 24, 2019
jimczi added a commit that referenced this issue Feb 1, 2019
The terms aggregator loads the global ordinals to retrieve the cardinality of the field to aggregate on. This cardinality is then used to select the collection strategy for the aggregation (breadth_first or depth_first). However, loading global ordinals should be avoided if the execution_hint is explicitly set to map, since this mode doesn't need them. Because the cardinality of the field is still needed, this change picks the maximum cardinality among the segments as an estimate of the total cardinality when selecting the strategy. This estimate is only used if the execution hint is set to map; otherwise the global ordinals are still used to retrieve the exact cardinality.

Closes #37705
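
The estimation described in the commit message can be illustrated with a small sketch. This is not the Elasticsearch implementation: segments are modelled as plain Python sets of terms, and the selection rule (breadth_first when the cardinality exceeds the requested size) is an assumption based on the documented default behaviour, not on the code in the commit.

```python
from typing import List, Set


def exact_cardinality(segments: List[Set[str]]) -> int:
    """What global ordinals provide: the number of distinct terms across all segments."""
    terms: Set[str] = set()
    for segment in segments:
        terms |= segment
    return len(terms)


def max_segment_cardinality(segments: List[Set[str]]) -> int:
    """The cheap estimate: the largest per-segment term count, no cross-segment merge."""
    return max((len(segment) for segment in segments), default=0)


def choose_collect_mode(cardinality: int, size: int) -> str:
    """Assumed rule: prefer breadth_first when the field has more terms than requested."""
    return "breadth_first" if cardinality > size else "depth_first"


if __name__ == "__main__":
    # Two hypothetical segments sharing one term.
    segments = [{"NL01", "NL02"}, {"NL02", "DE01", "DE02"}]
    print(exact_cardinality(segments), max_segment_cardinality(segments))
    print(choose_collect_mode(exact_cardinality(segments), size=3))
    print(choose_collect_mode(max_segment_cardinality(segments), size=3))
```

Because the largest segment can never hold more distinct terms than the whole shard, the estimate is a lower bound on the exact cardinality. Near the threshold the two can therefore select different strategies, which is the accuracy given up in exchange for not building global ordinals.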