-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Realtime get from in-memory segment when possible #64504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-distributed (:Distributed/Engine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dnhatn , this looks good. I have only looked at the main part of the PR for now and have a single comment/question.
server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few smaller comments, otherwise looking good.
server/src/main/java/org/elasticsearch/index/shard/IndexShard.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/engine/SingleDocDirectoryReader.java
Outdated
Show resolved
Hide resolved
|
||
@Override | ||
public CacheHelper getReaderCacheHelper() { | ||
return in.getReaderCacheHelper(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might mostly be for my understanding, but since the wrapped directory reader could return different results from the inner reader, should we not return null here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. We should return null here as we modify the liveDocs. I addressed it in eb968e4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to reverted this and added a comment instead. Please see: e176bf2
server/src/main/java/org/elasticsearch/index/engine/Engine.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/engine/Engine.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/engine/SingleDocDirectoryReader.java
Outdated
Show resolved
Hide resolved
@henningandersen Are you okay with the latest changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
server/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
Show resolved
Hide resolved
@henningandersen @jpountz Thanks for your reviews. |
If the reader wrapper is specified, then we can't perform a realtime get using operations from translog. With this change, we will create an in-memory Lucene segment from that indexing operation and perform a realtime get from that segment to avoid refresh storms.
Reading from translog during a realtime get requires special handling in some higher level components, e.g. ShardGetService, where we're doing a bunch of tricks to extract other stored fields from the source. Another issue with the current approach relates to #74227 where we introduce a new "field usage tracking" directory wrapper that's always applied, and we want to make sure that we can still quickly do realtime gets from translog without creating an in-memory index of the document, even when this directory wrapper exists. This PR introduces a directory reader that contains a single translog indexing operation. This can be used during a realtime get to access documents that haven't been refreshed yet. In the normal case, all information relevant to resolve the realtime get is mocked out to provide fast access to _id and _source. In case where more values are requested (e.g. access to other stored fields) etc., this reader will index the document into an in-memory Lucene segment that is created on-demand. Relates #64504
Reading from translog during a realtime get requires special handling in some higher level components, e.g. ShardGetService, where we're doing a bunch of tricks to extract other stored fields from the source. Another issue with the current approach relates to #74227 where we introduce a new "field usage tracking" directory wrapper that's always applied, and we want to make sure that we can still quickly do realtime gets from translog without creating an in-memory index of the document, even when this directory wrapper exists. This PR introduces a directory reader that contains a single translog indexing operation. This can be used during a realtime get to access documents that haven't been refreshed yet. In the normal case, all information relevant to resolve the realtime get is mocked out to provide fast access to _id and _source. In case where more values are requested (e.g. access to other stored fields) etc., this reader will index the document into an in-memory Lucene segment that is created on-demand. Relates #64504
If the reader wrapper is specified, then we can't perform a realtime get using operations from translog. With this change, we will create an in-memory Lucene segment from that indexing operation and perform a realtime get from that segment to avoid refresh storms.