Avoid sliced locked contention in internal engine #18060
Conversation
Today we use a sliced lock strategy for acquiring locks to prevent concurrent updates to the same document. The number of sliced locks is computed as a linear function of the number of logical processors. Unfortunately, the probability of a collision against a sliced lock is prone to the birthday problem and grows faster than expected. In fact, the mathematics works out such that for a fixed target probability of collision, the number of lock slices should grow like the square of the number of logical processors. This is less than ideal, and we can do better anyway.

This commit introduces a strategy for avoiding lock contention within the internal engine. Ideally, we would only have lock contention if there were concurrent updates to the same document. We can get close to this ideal world by associating a lock with the ID of each document. This association can be held in a concurrent hash map. Now, the JDK ConcurrentHashMap also uses a sliced lock internally, but it has several strategies for avoiding taking the locks and these locks are only held for a very short period of time. This implementation associates a reference count with the lock that is associated with a document ID and automatically removes the document ID from the concurrent hash map when the reference count reaches zero. Lastly, this implementation uses a pool of locks to minimize object allocations during indexing.
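To make the birthday-problem point concrete: with n threads writing distinct documents and m lock slices, the probability that at least two threads collide on a slice is roughly 1 - exp(-n(n-1)/(2m)), so holding that probability fixed requires m to grow on the order of n². Below is a minimal sketch of the per-document locking idea described above: a ConcurrentHashMap from document ID to a reference-counted lock, where the entry is removed as soon as the last holder releases it. The class and method names (PerDocumentLocks, acquire, release) are illustrative assumptions, the pool of locks mentioned above is omitted for brevity, and this is not the actual Elasticsearch implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only: a per-document lock keyed by document ID, with reference
// counting so that map entries are removed as soon as no thread needs them.
public class PerDocumentLocks<K> {

    // The reference count is only mutated inside compute(), which ConcurrentHashMap
    // runs atomically per key, so no extra synchronization is needed for it.
    private static final class RefCountedLock extends ReentrantLock {
        int refCount;
    }

    private final ConcurrentMap<K, RefCountedLock> locks = new ConcurrentHashMap<>();

    public void acquire(K key) {
        // Register interest in the key first (creating the lock on first use), then block.
        RefCountedLock lock = locks.compute(key, (k, existing) -> {
            RefCountedLock l = existing == null ? new RefCountedLock() : existing;
            l.refCount++;
            return l;
        });
        lock.lock();
    }

    public void release(K key) {
        // Assumes a matching acquire(key) by the current thread.
        locks.get(key).unlock();
        // Drop our reference; returning null removes the entry when no one else needs it.
        locks.compute(key, (k, existing) -> --existing.refCount == 0 ? null : existing);
    }
}
```

With a scheme like this, two threads only contend on the same monitor when they target the same document ID, which is exactly the contention that cannot be avoided.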
Running the same workload as @polyfractal in #18053, the strategy employed in this pull request showed a consistent 15-20% gain in performance (on a single machine) and almost completely eliminated lock contention in the internal engine (the extremely tiny bit that remains comes from the concurrent hash map). Interestingly, it shifts some contention to the translog writer, but nowhere near enough to offset the gains from reducing the contention here. It's important to note in the comparison images from Java Flight Recorder below how long threads were blocked on the sliced lock in the internal engine compared to how long they are blocked on the translog writer.

[Java Flight Recorder comparison screenshots: master @ 0cf1d16 vs. sliced-lock-contention @ a78adcb]
private void remove(Term uid) {
    remove(uid.bytes());
}
This logic looks very similar to KeyedLock, maybe we should use it instead?
I think the difference is that KeyedLock does not have a pool of locks and creates them on demand, but I don't think reusing matters much here since we run costly operations under the lock?
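As a rough illustration of that point (using the hypothetical PerDocumentLocks sketch above rather than the actual KeyedLock API): the work performed while holding the lock, such as the version lookup, the Lucene write, and the translog append, dominates the cost of allocating a small lock object on demand.

```java
// Hypothetical usage pattern; names are assumptions for illustration only.
static <K> void runUnderDocumentLock(PerDocumentLocks<K> locks, K uid, Runnable operation) {
    locks.acquire(uid);
    try {
        // The expensive part happens here (versioning, indexing, translog write),
        // so allocating a lock object per acquisition is negligible by comparison.
        operation.run();
    } finally {
        locks.release(uid);
    }
}
```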
Cool, I wasn't aware of that implementation. I pushed c14796c.
This commit replaces a custom keyed lock implementation with an abstraction that already existed.
@@ -100,7 +100,7 @@
     // we use the hashed variant since we iterate over it and check removal and additions on existing keys
     private final LiveVersionMap versionMap;

-    private final Object[] dirtyLocks;
+    private final KeyedLock<BytesRef> keyedLock = new KeyedLock<>();
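For context on the removed field, a sliced-lock lookup works roughly like the sketch below (illustrative, not the exact code that was removed): the uid is hashed into a fixed-size array of monitors, so two unrelated documents can map to the same slice and contend with each other.

```java
// Sketch of a sliced-lock lookup (illustrative, not the removed implementation):
// with a fixed number of slices, distinct uids can hash to the same slice.
private Object dirtyLock(BytesRef uid) {
    int slice = Math.floorMod(uid.hashCode(), dirtyLocks.length);
    return dirtyLocks[slice];
}
```

With the keyed lock, by contrast, the lock key is the uid itself, so only concurrent operations on the same document contend.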
Maybe it should be called something with dirty in the name because all the methods refer to that name.
It's not a dirty lock anymore. The method name should just change.
Nice!
LGTM
This commit clarifies the name of InternalEngine#dirtyLock by changing it to InternalEngine#acquireLock now that the locks are no longer dirty sliced locks.
Awesome!
I am curious whether this change will be noticeable on https://benchmarks.elastic.co.
Me too! It might not be as big as 20% because @polyfractal's workload is pretty unique iirc. Really small documents and _source disabled I think? Looks like sensor data?
Awesome, thanks @jasontedor, this is great! :) I'm curious about the nightly benchmarks too. @nik9000 is correct, this is small doc "metric/sensor" style data. Four small-cardinality integers, a timestamp, and one float metric (pulled from a gaussian unique to each "set" of data).
Some of the nightly benchmarks showed a healthy bump.
@polyfractal Thank you for discovering and raising the issue in #18053 and verifying the results on your setup!
Awesome!
It will be interesting to see if this addresses the bulk thread contention wave I've seen with hundreds of clients over dozens of nodes. Sounds birthday-ish in itself.
Closes #18053