A new cluster coordination layer #32006

ywelsch · 2018-07-12T13:29:33Z

The cluster state contains important metadata about the cluster, including what the mappings look like, what settings the indices have, which shards are allocated to which nodes, etc. Inconsistencies in the cluster state can have the most horrid consequences including inconsistent search results and data loss, and the job of the cluster state coordination subsystem is to prevent any such inconsistencies. Ideally this subsystem should also be easy to configure correctly and it should perform well in a variety of situations.

The goal of this project is to rebuild the cluster state coordination subsystem, making it more reliable, performant and user-friendly. Better reliability will be achieved by basing the core algorithm on strong theoretical underpinnings and extensive testing. Misconfiguration of the minimum_master_nodes setting, one of the most common causes for cluster state inconsistencies, will be addressed by having this property fully managed by the system itself.
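To illustrate why a hand-maintained minimum_master_nodes is so easy to get wrong: the setting is meant to be a strict majority of the master-eligible nodes, so it has to be revisited every time that count changes. The sketch below is only arithmetic under that assumption; the function names are hypothetical and are not part of Elasticsearch.

```python
def majority(master_eligible_nodes: int) -> int:
    """Smallest node count that forms a strict majority."""
    return master_eligible_nodes // 2 + 1


def can_split_brain(master_eligible_nodes: int, minimum_master_nodes: int) -> bool:
    """True if two disjoint groups of nodes could each reach the threshold.

    That is possible exactly when twice the threshold still fits
    within the total number of master-eligible nodes.
    """
    return 2 * minimum_master_nodes <= master_eligible_nodes


# A 3-node cluster configured correctly with a threshold of 2.
safe = can_split_brain(3, majority(3))

# The same cluster grown to 5 nodes while the setting is left at 2:
# two separate pairs of nodes could now each elect a master.
stale = can_split_brain(5, 2)
```

Having the system manage this value itself removes the step where an operator must remember to update the threshold after adding or removing master-eligible nodes.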

We've built a prototype to validate the approach and, based on our experience with this, present the following development roadmap for this new cluster coordination and consensus layer, targeting ES 7.0:

After 7.0 FF:

Deprecate any Zen1-specific settings and rename any others that mention zen but which are still in use. (Deprecate unused Zen1 settings #38289, Rename static Zen1 settings #38333, Rename no-master-block setting #38350)
Make discovery.type non-configurable/internal-only / move Zen1 to tests only (Remove Zen1 #39466)
Scaling tests (e.g. election clashes when having large cluster states)
Do not close bad indices on state recovery (Do not close bad indices on startup #39500)
Add stats (e.g. expose stuff like node term, or discovery information while the node has troubles forming / joining a cluster) ([Zen2] Add warning if cluster fails to form fast enough #35993)
Contemplate timeouts, retries, etc. and consider improvements to default values (Decrease leader and follower check timeout #38298)
Check logged messages are useful and at the appropriate levels (Do not log unsuccessful join attempt each time #39756, Reduce logging noise when stepping down as master before state recovery #39950).
Docs 📜 ([Zen2] Update documentation for Zen2 #34714, Move 'lost cluster state updates' issue to DONE #36959, [DOCS] Adds overview and API ref for cluster voting configurations #36954, Remove duplicate paragraph #36942, [DOCS] Merges list of discovery and cluster formation settings #36909) also docs for full-cluster and rolling upgrades

Post 7.0:

Smoother master failovers by not exposing those to the ClusterApplierService, i.e., delay putting up a NO_MASTER_BLOCK.
Abdicate on leader shutdown (appoint new leader)
Add "has_voting_exclusions" flag to cluster health output (Add has_voting_exclusions flag to cluster health output #38568)
Enqueueing cluster state updates to behave as well as possible in an overloaded cluster.
Verify that a master which cannot write its cluster state stands down (or maybe actively abdicates)
Deal appropriately with duplicate nodes (see e.g. NotMasterException with duplicate node ids and minimum_master_nodes not met #32904)
High-level rest client integration for new APIs
Avoid bootstrapping if any discovered peer has a nonzero term
Work with support to enhance cluster diagnostics analysis tool.

elasticmachine · 2018-07-12T13:29:35Z

Pinging @elastic/es-distributed

Implements the state machine on the master to publish a cluster state. Relates to #32006

Zen2 is now feature-complete enough to run most ESIntegTestCase tests. The changes in this PR are as follows:
- ClusterSettingsIT is adapted so that it is no longer Zen1-specific (it was using Zen1 settings).
- Some of the integration tests require persistent storage of the cluster state, which is not fully implemented yet (see #33958). These tests keep running with Zen1 for now but will be switched over as soon as that is fully implemented.
- A few integration tests are not yet running with Zen2 for other reasons, depending on some of the other open points in #32006.

This commit overhauls the documentation of discovery and cluster coordination, removing mention of the Zen Discovery module and replacing it with docs for the new cluster coordination mechanism introduced in 7.0. Relates #32006

Checks that the core coordination algorithm implemented as part of Zen2 (#32006) supports linearizable semantics. This commit adds a linearizability checker based on the Wing and Gong graph search algorithm with support for compositional checking and activates these checks for all CoordinatorTests.
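As a rough illustration of the kind of check described above, here is a minimal backtracking search in the style of Wing and Gong, applied to a single read/write register. It is a sketch only: the `Op` type, `register_spec`, and the integer timestamps are assumptions made for this example. It omits the compositional checking mentioned in the commit message and is not the checker added to CoordinatorTests.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class Op:
    invoked: int   # logical time at which the call started
    returned: int  # logical time at which the response arrived
    input: Any     # ("write", value) or ("read", None)
    output: Any    # response observed by the caller


def register_spec(state: Any, op_input: Tuple[str, Any]) -> Tuple[Any, Any]:
    """Sequential specification of a register: returns (new_state, response)."""
    kind, value = op_input
    if kind == "write":
        return value, None
    return state, state  # a read returns the current value


def is_linearizable(history: List[Op], spec: Callable, initial_state: Any) -> bool:
    """Search for a sequential order that respects real-time order and the spec."""

    def search(remaining: List[Op], state: Any) -> bool:
        if not remaining:
            return True
        earliest_return = min(op.returned for op in remaining)
        for i, op in enumerate(remaining):
            # An operation may be linearized first only if no other
            # pending operation had already returned before it was invoked.
            if op.invoked > earliest_return:
                continue
            new_state, expected = spec(state, op.input)
            if expected == op.output and search(remaining[:i] + remaining[i + 1:], new_state):
                return True
        return False  # every candidate failed, so backtrack

    return search(list(history), initial_state)


# A read that starts after a write has completed must observe that write.
stale_read = [Op(0, 1, ("write", 1), None), Op(2, 3, ("read", None), None)]

# A read that overlaps the write may legally observe the old value.
overlapping_read = [Op(0, 3, ("write", 1), None), Op(1, 2, ("read", None), None)]
```

With an initial value of `None`, `is_linearizable(stale_read, register_spec, None)` is false and `is_linearizable(overlapping_read, register_spec, None)` is true. The search is exponential in the worst case, which is why practical checkers add optimizations such as splitting the history into independent parts.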

ywelsch · 2019-04-24T14:30:50Z

Closing this one as shipped in 7.0. Possible follow-ups will be tracked separately.

The changes in elastic#32006 mean that the discovery process can no longer use master-ineligible nodes as a stepping-stone between master-eligible nodes. This was normally an indication of a strange and possibly-fragile configuration and was not recommended, but this commit adds a note to the breaking changes docs explaining that this kind of configuration is more obviously broken in recent versions.

The changes in #32006 mean that the discovery process can no longer use master-ineligible nodes as a stepping-stone between master-eligible nodes. This was normally an indication of a strange and possibly-fragile configuration and was not recommended. This commit clarifies that only master-eligible nodes are now involved with discovery.
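The effect of that change can be pictured as a reachability search over "who knows whom", in which only master-eligible nodes are followed. The sketch below is a toy model under that assumption: the `discover` function, its parameters, and the node names are all hypothetical and do not reflect the actual discovery implementation.

```python
from collections import deque
from typing import Dict, Iterable, Set


def discover(start: str,
             known_peers: Dict[str, Iterable[str]],
             master_eligible: Set[str],
             follow_ineligible: bool = False) -> Set[str]:
    """Return the other master-eligible nodes that `start` can find.

    `follow_ineligible=True` models the older behaviour, in which a
    master-ineligible node could act as a stepping-stone.
    """
    found: Set[str] = set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in known_peers.get(node, ()):
            if peer in seen:
                continue
            seen.add(peer)
            if peer in master_eligible:
                found.add(peer)
                queue.append(peer)  # keep exploring through eligible nodes
            elif follow_ineligible:
                queue.append(peer)  # old behaviour: explore through it anyway
    return found


# m1 only knows the data node d1, and d1 is the only node that knows m2.
peers = {"m1": ["d1"], "d1": ["m2"], "m2": []}
eligible = {"m1", "m2"}

old_result = discover("m1", peers, eligible, follow_ineligible=True)
new_result = discover("m1", peers, eligible)
```

In this toy topology the old behaviour finds `m2` through `d1`, while the new behaviour finds nothing, which is the sense in which such a configuration is now more obviously broken.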

This resolves a longstanding TODO in the cluster coordination subsystem. Relates elastic#32006

This commit removes a handful of TODO comments in the cluster coordination layer that no longer apply. Relates elastic#32006

ywelsch added the >feature, resiliency, Meta, and :Distributed/Cluster Coordination (cluster formation and cluster state publication, including cluster membership and fault detection) labels Jul 12, 2018

ywelsch assigned ywelsch and DaveCTurner Jul 12, 2018

This was referenced Jul 16, 2018

Add term and config to cluster state #32100

Merged

Add core coordination algorithm for cluster state publishing #32171

Merged

DaveCTurner mentioned this issue Jul 30, 2018

Support for arbiter/temporary master node #32462

Closed

ywelsch mentioned this issue Aug 2, 2018

Zen2: Cluster state publication pipeline #32584

Merged

ywelsch added a commit that referenced this issue Aug 7, 2018

Zen2: Cluster state publication pipeline (#32584)

785b6e8

Implements the state machine on the master to publish a cluster state. Relates to #32006

DaveCTurner mentioned this issue Aug 17, 2018

NotMasterException with duplicate node ids and minimum_master_nodes not met #32904

Closed

andrershov mentioned this issue Sep 21, 2018

Zen2 ClusterState storage #33958

Closed

6 tasks

DaveCTurner mentioned this issue Oct 30, 2018

NetworkDisruptionIT testJobRelocation failing #35052

Closed

ywelsch mentioned this issue Nov 18, 2018

Zen2: Move most integration tests to Zen2 #35678

Merged

This was referenced Dec 19, 2018

ES master re-election algorithm tries electing a non-reachable master. #31801

Closed

[Feature Request] Configuration to customize discovery/zen/fd/master_ping #36822

Closed

ywelsch mentioned this issue Dec 21, 2018

Add linearizability checker for coordination layer #36943

Merged

DaveCTurner mentioned this issue Dec 21, 2018

BUG: Negative value is successfully set for “discovery.zen.commit_timeout” parameter #36632

Closed

DaveCTurner mentioned this issue Dec 31, 2018

Merge master election with state recovery in the case of a full cluster restart #14016

Closed

andrershov self-assigned this Feb 7, 2019

ywelsch added the v7.0.0 label Feb 24, 2019

This was referenced Mar 11, 2019

es server always restart because of reading metadata file incorrectly #37286

Closed

Reduce logging noise when stepping down as master before state recovery #39950

Merged

ywelsch added v7.0.0 and removed v7.2.0 labels Apr 24, 2019

ywelsch closed this as completed Apr 24, 2019

ywelsch mentioned this issue May 31, 2019

_version does not uniquely identify a particular version of a document #19269

Closed

DaveCTurner mentioned this issue Jun 13, 2019

Partial network partitioning leads to cluster unavailability. #43183

Closed

DaveCTurner mentioned this issue Jul 24, 2019

Clarify that discovery ignores master-ineligibles #44835

Merged

This was referenced Feb 13, 2020

A VM pause (due to GC, high IO load, etc) can cause the loss of inserted documents #10426

Closed

Network partitions can cause divergence, dirty reads, and lost updates. #20031

Closed

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Mar 31, 2020

Use VotingConfiguration#of where possible

d09853a

This resolves a longstanding TODO in the cluster coordination subsystem. Relates elastic#32006

DaveCTurner mentioned this issue Mar 31, 2020

Use VotingConfiguration#of where possible #54507

Merged

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Mar 31, 2020

Resolve some coordination-layer TODOs

f5113d1

This commit removes a handful of TODO comments in the cluster coordination layer that no longer apply. Relates elastic#32006

DaveCTurner mentioned this issue Mar 31, 2020

Resolve some coordination-layer TODOs #54511

Merged

DaveCTurner added a commit that referenced this issue Mar 31, 2020

Use VotingConfiguration#of where possible (#54507)

2073d8c

This resolves a longstanding TODO in the cluster coordination subsystem. Relates #32006

DaveCTurner added a commit that referenced this issue Apr 1, 2020

Use VotingConfiguration#of where possible (#54507)

5e3b6ab

This resolves a longstanding TODO in the cluster coordination subsystem. Relates #32006

DaveCTurner added a commit that referenced this issue Apr 1, 2020

Resolve some coordination-layer TODOs (#54511)

07b8b07

This commit removes a handful of TODO comments in the cluster coordination layer that no longer apply. Relates #32006

DaveCTurner added a commit that referenced this issue Apr 1, 2020

Resolve some coordination-layer TODOs (#54511)

6d976e1

This commit removes a handful of TODO comments in the cluster coordination layer that no longer apply. Relates #32006

yyff pushed a commit to yyff/elasticsearch that referenced this issue Apr 17, 2020

Use VotingConfiguration#of where possible (elastic#54507)

2ebc329

This resolves a longstanding TODO in the cluster coordination subsystem. Relates elastic#32006

yyff pushed a commit to yyff/elasticsearch that referenced this issue Apr 17, 2020

Resolve some coordination-layer TODOs (elastic#54511)

da3d69a

This commit removes a handful of TODO comments in the cluster coordination layer that no longer apply. Relates elastic#32006

DaveCTurner mentioned this issue May 5, 2020

A write alias targeting multiple indices prevents node startup #56186

Closed
