Avoid sliced locked contention in internal engine #18060
Conversation
Today we use a sliced lock strategy for acquiring locks to prevent concurrent updates to the same document. The number of sliced locks is computed as a linear function of the number of logical processors. Unfortunately, the probability of a collision against a sliced lock is prone to the birthday problem and grows faster than expected. In fact, the mathematics works out such that for a fixed target probability of collision, the number of lock slices should grow like the square of the number of logical processors. This is less than ideal, and we can do better anyway.

This commit introduces a strategy for avoiding lock contention within the internal engine. Ideally, we would only have lock contention if there were concurrent updates to the same document. We can get close to this ideal world by associating a lock with the ID of each document. This association can be held in a concurrent hash map. Now, the JDK ConcurrentHashMap also uses a sliced lock internally, but it has several strategies for avoiding taking the locks and these locks are only held for a very short period of time. This implementation associates a reference count with the lock that is associated with a document ID and automatically removes the document ID from the concurrent hash map when the reference count reaches zero. Lastly, this implementation uses a pool of locks to minimize object allocations during indexing.
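To make the birthday-problem point concrete: with n threads writing distinct documents and m lock slices, the probability that at least two threads collide on a slice is roughly 1 - exp(-n(n-1)/(2m)), so holding that probability fixed requires m to grow on the order of n². Below is a minimal sketch of the per-document locking idea described above: a ConcurrentHashMap from document ID to a reference-counted lock, where the entry is removed as soon as the last holder releases it. The class and method names (PerDocumentLocks, acquire, release) are illustrative assumptions, the pool of locks mentioned above is omitted for brevity, and this is not the actual Elasticsearch implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only: a per-document lock keyed by document ID, with reference
// counting so that map entries are removed as soon as no thread needs them.
public class PerDocumentLocks<K> {

    // The reference count is only mutated inside compute(), which ConcurrentHashMap
    // runs atomically per key, so no extra synchronization is needed for it.
    private static final class RefCountedLock extends ReentrantLock {
        int refCount;
    }

    private final ConcurrentMap<K, RefCountedLock> locks = new ConcurrentHashMap<>();

    public void acquire(K key) {
        // Register interest in the key first (creating the lock on first use), then block.
        RefCountedLock lock = locks.compute(key, (k, existing) -> {
            RefCountedLock l = existing == null ? new RefCountedLock() : existing;
            l.refCount++;
            return l;
        });
        lock.lock();
    }

    public void release(K key) {
        // Assumes a matching acquire(key) by the current thread.
        locks.get(key).unlock();
        // Drop our reference; returning null removes the entry when no one else needs it.
        locks.compute(key, (k, existing) -> --existing.refCount == 0 ? null : existing);
    }
}
```

With a scheme like this, two threads only contend on the same monitor when they target the same document ID, which is exactly the contention that cannot be avoided.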
Running the same workload as @polyfractal in #18053, the strategy employed in this pull request showed a consistent 15-20% gain in performance (on a single machine) and almost completely eliminated lock contention in the internal engine (the extremely tiny bit that remains comes from the concurrent hash map). Interestingly, it shifts some contention to the translog writer, but nowhere near enough to offset the gains from reducing the contention here. It's important to note in the comparison images from Java Flight Recorder below how long threads were blocked on the sliced lock in the internal engine compared to how long they are blocked on the translog writer.

[Java Flight Recorder comparison screenshots: master @ 0cf1d16 vs. sliced-lock-contention @ a78adcb]
private void remove(Term uid) {
    remove(uid.bytes());
}
This logic looks very similar to KeyedLock, maybe we should use it instead?
I think the difference is that KeyedLock does not have a pool of locks and creates them on demand, but I don't think reusing matters much here since we run costly operations under the lock?
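As a rough illustration of that point (using the hypothetical PerDocumentLocks sketch above rather than the actual KeyedLock API): the work performed while holding the lock, such as the version lookup, the Lucene write, and the translog append, dominates the cost of allocating a small lock object on demand.

```java
// Hypothetical usage pattern; names are assumptions for illustration only.
static <K> void runUnderDocumentLock(PerDocumentLocks<K> locks, K uid, Runnable operation) {
    locks.acquire(uid);
    try {
        // The expensive part happens here (versioning, indexing, translog write),
        // so allocating a lock object per acquisition is negligible by comparison.
        operation.run();
    } finally {
        locks.release(uid);
    }
}
```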
Cool, I wasn't aware of that implementation. I pushed c14796c.
This commit replaces a custom keyed lock implementation with an abstraction that already existed.
@@ -100,7 +100,7 @@
     // we use the hashed variant since we iterate over it and check removal and additions on existing keys
     private final LiveVersionMap versionMap;

-    private final Object[] dirtyLocks;
+    private final KeyedLock<BytesRef> keyedLock = new KeyedLock<>();
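For context on the removed field, a sliced-lock lookup works roughly like the sketch below (illustrative, not the exact code that was removed): the uid is hashed into a fixed-size array of monitors, so two unrelated documents can map to the same slice and contend with each other.

```java
// Sketch of a sliced-lock lookup (illustrative, not the removed implementation):
// with a fixed number of slices, distinct uids can hash to the same slice.
private Object dirtyLock(BytesRef uid) {
    int slice = Math.floorMod(uid.hashCode(), dirtyLocks.length);
    return dirtyLocks[slice];
}
```

With the keyed lock, by contrast, the lock key is the uid itself, so only concurrent operations on the same document contend.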
Maybe it should be called something with dirty in the name because all the methods refer to that name.
It's not a dirty lock anymore. The method name should just change.
Nice!
LGTM
This commit clarifies the name of InternalEngine#dirtyLock by changing it to InternalEngine#acquireLock now that the locks are no longer dirty sliced locks.
Awesome!
I am curious whether this change will be noticeable on https://benchmarks.elastic.co.
Me too! It might not be as big as 20% because @polyfractal's workload is pretty unique iirc. Really small documents and _source disabled I think? Looks like sensor data?
Awesome, thanks @jasontedor, this is great! :) I'm curious about the nightly benchmarks too. @nik9000 is correct, this is small doc "metric/sensor" style data. Four small-cardinality integers, a timestamp, and one float metric (pulled from a gaussian unique to each "set" of data).
Some of the nightly benchmarks showed a healthy bump.
@polyfractal Thank you for discovering and raising the issue in #18053 and verifying the results on your setup!
Awesome!
It will be interesting to see if this addresses the bulk thread contention wave I've seen with hundreds of clients over dozens of nodes. Sounds birthday-ish in itself.
Closes #18053