[ILM] Shrink action may allocate shards to excluded nodes #64529

Description

jloleysens (Contributor)

Elasticsearch version (bin/elasticsearch --version): 7.10.0 (also reproducible on earlier versions, at least back to 7.8.0)

JVM version (java -version):

openjdk version "12.0.2" 2019-07-16
OpenJDK Runtime Environment (build 12.0.2+10)
OpenJDK 64-Bit Server VM (build 12.0.2+10, mixed mode, sharing)

OS version (uname -a if on a Unix-like system):

Darwin 19.6.0 Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64 x86_64

Description of the problem including expected versus actual behavior:

Given the following two configurations:

  • cluster.routing.allocation.exclude._host: [ node2.dev ]
  • An ILM policy with a shrink action in either the hot or warm phase (call it MyPolicy)

Shards belonging to indices managed by MyPolicy may still be assigned to nodes that are excluded from allocation at the cluster level. The problem appears to be specific to ILM's SetSingleNodeAllocateStep when performing the shrink action.

This step sets the index setting index.routing.allocation.require._id to the ID of an excluded node, after which ILM can no longer complete the rest of the shrink action.
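When an index is stuck this way, the pinned node ID is visible in the index settings, and the ILM explain API shows the step the index is waiting on. A minimal check, using the index name from the logs below (filter_path just trims the response):

GET mypolicy-myindex-1/_settings?filter_path=*.settings.index.routing.allocation.require
GET mypolicy-myindex-1/_ilm/explain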

Steps to reproduce:

Start two nodes with:

  1. bin/elasticsearch -Enetwork.host=node1.dev -Ehttp.port=9221 -Epath.data=dir1/data -Epath.logs=dir1/logs
  2. bin/elasticsearch -Enetwork.host=node2.dev -Ehttp.port=9222 -Epath.data=dir2/data -Epath.logs=dir2/logs
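Before continuing, it can help to confirm that both nodes joined the same cluster; a quick check against node1 (illustrative curl invocation):

curl 'http://node1.dev:9221/_cat/nodes?v'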

Set up a cluster and do the following (a consolidated request sequence is sketched after this list):

  1. Set cluster settings to:
{
	"transient": {
		"cluster.routing.allocation.exclude._host": "node2.dev",
		"indices.lifecycle.poll_interval": "1s"
	}
}
  2. Create an ILM policy called TestPolicy (note that the warm phase uses min_age: 1s to speed up testing)
Policy JSON
{
	"policy": {
		"phases": {
			"warm": {
				"min_age": "1s",
				"actions": {
					"allocate": {
						"number_of_replicas": 0,
						"include": {
						},
						"exclude": {
						},
						"require": {
						}
					},
					"forcemerge": {
						"max_num_segments": 1
					},
					"set_priority": {
						"priority": 50
					},
					"shrink": {
						"number_of_shards": 1
					}
				}
			},
			"cold": {
				"min_age": "50d",
				"actions": {
					"allocate": {
						"number_of_replicas": 0,
						"include": {
						},
						"exclude": {
						},
						"require": {
						}
					},
					"freeze": {
					},
					"set_priority": {
						"priority": 10
					}
				}
			},
			"hot": {
				"min_age": "0ms",
				"actions": {
					"set_priority": {
						"priority": 100
					}
				}
			},
			"delete": {
				"min_age": "60d",
				"actions": {
					"delete": {
						"delete_searchable_snapshot": true
					}
				}
			}
		}
	}
}
  3. Create an index template that assigns indices to this policy
Template JSON
{
	"composed_of": [],
	"index_patterns": [
		"mypolicy*"
	],
	"template": {
		"settings": {
			"index": {
				"lifecycle": {
					"name": "TestPolicy"
				},
				"refresh_interval": "3s",
				"number_of_shards": "5",
				"number_of_replicas": "2"
			}
		},
		"mappings": {
		},
		"aliases": {
		}
	}
}
  4. Create an index that matches the template's index pattern and watch the logs
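Putting the steps together, a sketch of the full request sequence as console requests (the template name mypolicy-template and index name mypolicy-myindex-1 are illustrative; the bodies are the JSON documents shown above):

PUT _cluster/settings
{ ...cluster settings JSON from step 1... }

PUT _ilm/policy/TestPolicy
{ ...policy JSON from step 2... }

PUT _index_template/mypolicy-template
{ ...template JSON from step 3... }

PUT mypolicy-myindex-1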

NOTES

  • This happens intermittently, since the step picks a node at random from the set it has determined to be allowed
  • Reproducible on 7.10; the behaviour does not surface when the cluster.routing.allocation.exclude._host setting is removed

Provide logs (if relevant):

Last few log lines after the index was allocated to an excluded node:

...
[2020-11-03T14:35:25,832][INFO ][o.e.x.i.IndexLifecycleTransition] [xxxx] moving index [mypolicy-myindex-1] from [{"phase":"warm","action":"shrink","name":"wait-for-shard-history-leases"}] to [{"phase":"warm","action":"shrink","name":"readonly"}] in policy [TestPolicy]
[2020-11-03T14:35:25,957][INFO ][o.e.x.i.IndexLifecycleTransition] [xxx] moving index [mypolicy-myindex-1] from [{"phase":"warm","action":"shrink","name":"readonly"}] to [{"phase":"warm","action":"shrink","name":"set-single-node-allocation"}] in policy [TestPolicy]
[2020-11-03T14:35:26,078][INFO ][o.e.x.i.IndexLifecycleTransition] [xxx] moving index [mypolicy-myindex-1] from [{"phase":"warm","action":"shrink","name":"set-single-node-allocation"}] to [{"phase":"warm","action":"shrink","name":"check-shrink-allocation"}] in policy [TestPolicy]
<END> // we are stuck at this point
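A possible manual workaround until the fix lands (my assumption, not something suggested in this thread) is to repoint the pinned setting at a node that is actually allowed, after which ILM proceeds on its next poll:

PUT mypolicy-myindex-1/_settings
{
	"index.routing.allocation.require._id": "<id-of-an-allowed-node>"
}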

Activity

elasticmachine (Collaborator) commented on Nov 3, 2020

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

gaobinlong (Contributor) commented on Nov 6, 2020

By debugging the code, I found that we initialize a new FilterAllocationDecider in SetSingleNodeAllocateStep which is different from the FilterAllocationDecider contained in the cluster state. Because the variables clusterRequireFilters, clusterIncludeFilters and clusterExcludeFilters in FilterAllocationDecider are instance variables, any changes to the cluster-level exclude filters cannot be seen when executing SetSingleNodeAllocateStep.

private static final AllocationDeciders ALLOCATION_DECIDERS = new AllocationDeciders(List.of(
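For context, that static field is presumably completed along these lines (a sketch reconstructed from the suggested fix in the next comment; the exact arguments in the shipped code may differ):

private static final AllocationDeciders ALLOCATION_DECIDERS = new AllocationDeciders(List.of(
    // Built once at class-load time from empty settings, so cluster-level
    // exclude filters applied at runtime are never visible to this decider.
    new FilterAllocationDecider(Settings.EMPTY, new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS)),
    new NodeVersionAllocationDecider()
));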

gaobinlong (Contributor) commented on Nov 7, 2020

Can we construct a local AllocationDeciders variable in the performAction method of SetSingleNodeAllocateStep, like this:

AllocationDeciders allocationDeciders = new AllocationDeciders(List.of(
    new FilterAllocationDecider(clusterState.getMetadata().settings(),
        new ClusterSettings(Settings.EMPTY, ClusterSettings.BUILT_IN_CLUSTER_SETTINGS)),
    new NodeVersionAllocationDecider()
));

FilterAllocationDecider can be constructed from the settings stored in the cluster metadata, so it picks up the cluster-level exclude filters.

dakrone (Member) commented on Nov 9, 2020

@gaobinlong Yes, I think that is a better solution for this (recreating the deciders in the step body).

The needs:triage label was removed on Nov 9, 2020.
dakrone (Member) commented on Dec 8, 2020

This was resolved by @gaobinlong in #65037, so I'm closing this for now.
