Skip to content

Realtime get from in-memory segment when possible #64504

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Nov 10, 2020

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Nov 2, 2020

If the reader wrapper is specified, then we can't perform a realtime get using operations from translog. With this change, we will create an in-memory Lucene segment from that indexing operation and perform a realtime get from that segment to avoid refresh storms.

@dnhatn dnhatn added >enhancement :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. v8.0.0 v7.11.0 labels Nov 2, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Engine)

@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Nov 2, 2020
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dnhatn , this looks good. I have only looked at the main part of the PR for now and have a single comment/question.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few smaller comments, otherwise looking good.


@Override
public CacheHelper getReaderCacheHelper() {
return in.getReaderCacheHelper();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might mostly be for my understanding, but since the wrapped directory reader could return different results from the inner reader, should we not return null here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. We should return null here as we modify the liveDocs. I addressed it in eb968e4.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to reverted this and added a comment instead. Please see: e176bf2

@dnhatn
Copy link
Member Author

dnhatn commented Nov 9, 2020

@henningandersen Are you okay with the latest changes?

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@dnhatn
Copy link
Member Author

dnhatn commented Nov 10, 2020

@henningandersen @jpountz Thanks for your reviews.

@dnhatn dnhatn merged commit 33b408f into elastic:master Nov 10, 2020
@dnhatn dnhatn deleted the get-in-memory branch November 10, 2020 00:38
dnhatn added a commit that referenced this pull request Nov 10, 2020
If the reader wrapper is specified, then we can't perform a realtime get
using operations from translog. With this change, we will create an
in-memory Lucene segment from that indexing operation and perform a
realtime get from that segment to avoid refresh storms.
dnhatn added a commit that referenced this pull request Nov 10, 2020
ywelsch added a commit that referenced this pull request Jul 1, 2021

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Reading from translog during a realtime get requires special handling in some higher level components, e.g.
ShardGetService, where we're doing a bunch of tricks to extract other stored fields from the source. Another issue with
the current approach relates to #74227 where we introduce a new "field usage tracking" directory wrapper that's always
applied, and we want to make sure that we can still quickly do realtime gets from translog without creating an in-memory
index of the document, even when this directory wrapper exists.

This PR introduces a directory reader that contains a single translog indexing operation. This can be used during a
realtime get to access documents that haven't been refreshed yet. In the normal case, all information relevant to resolve
the realtime get is mocked out to provide fast access to _id and _source. In case where more values are requested (e.g.
access to other stored fields) etc., this reader will index the document into an in-memory Lucene segment that is
created on-demand.

Relates #64504
ywelsch added a commit that referenced this pull request Jul 1, 2021
Reading from translog during a realtime get requires special handling in some higher level components, e.g.
ShardGetService, where we're doing a bunch of tricks to extract other stored fields from the source. Another issue with
the current approach relates to #74227 where we introduce a new "field usage tracking" directory wrapper that's always
applied, and we want to make sure that we can still quickly do realtime gets from translog without creating an in-memory
index of the document, even when this directory wrapper exists.

This PR introduces a directory reader that contains a single translog indexing operation. This can be used during a
realtime get to access documents that haven't been refreshed yet. In the normal case, all information relevant to resolve
the realtime get is mocked out to provide fast access to _id and _source. In case where more values are requested (e.g.
access to other stored fields) etc., this reader will index the document into an in-memory Lucene segment that is
created on-demand.

Relates #64504
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.11.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants