Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support max_single_primary_size in Resize Action and exposed in ILM #67705

Merged
merged 11 commits into from Jan 29, 2021

Conversation

gaobinlong
Copy link
Contributor

@gaobinlong gaobinlong commented Jan 19, 2021

Relates to #65714.

The main changes are:

  1. Add a max_single_primary_size parameter in Resize Action, it's used to calculate an appropriate shards number of the target index when executing the Shrink API.
  2. Expose the max_single_primary_size parameter in ILM's Shrink Action.
  3. Support the parameter in high level rest client.
  4. Add some description for the parameter in the documents.
  5. Add some tests for the code change.

@dakrone dakrone added :Data Management/ILM+SLM Index and Snapshot lifecycle management >feature v7.12.0 v8.0.0 labels Jan 19, 2021
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Jan 19, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@dakrone dakrone self-requested a review January 19, 2021 15:28
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this @gaobinlong, I took a look and left some initial thoughts.

I also think we should use max_single_shard_size, because for this particular application, the difference between primaries and replicas is irrelevant (shrink only deals with primary shards anyway), and it makes it a bit shorter and easier to read.

Comment on lines 155 to 158
if (minShardsNum > sourceIndexShardsNum) {
throw new IllegalArgumentException("The target index's shards number[" + minShardsNum +
"] is greater than the source index's shards number[" + sourceIndexShardsNum + "]");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should throw an error, or if it would be better to allow the shrink with the same number of shards as are currently available (basically min(sourceIndexShardsNum, minShardsNum))

I am on the fence about this, since it can make the ILM usage almost useless if time-based rollover is used (imagine a period of high ingestion where an index with 2 primary shards has trouble initially rolling over, but finally does at 200gb, and then tries to shrink to 50gb shards and gets stuck in an error forever because it's trying to "shrink" to 4 shards)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a problem, I have set the target index's shards number to sourceIndexShardsNum if minShardsNum is greater than sourceIndexShardsNum, I have also added a log in the code and some description in the document about this.

}

private CreateIndexRequest targetIndexRequest;
private String sourceIndex;
private ResizeType type = ResizeType.SHRINK;
private Boolean copySettings = true;
private ByteSizeValue maxSinglePrimaryShardSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is missing serialization of this value (with version checks) in the StreamInput constructor and writeTo method

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also notice this class is missing a hashCode and equals implementation, which means that we can never compare them. We should remedy that, but it can be done in subsequent work (doesn't have to be done in this PR)

(this is not caught because the tests need to be enhanced to check for transport serialization, that can be done as the subsequent work also)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added some serialization code in the StreamInput constructor and writeTo method yet, and
I'm glad to open another PR to add the missing hashCode and equals method.

@joegallo
Copy link
Contributor

@dakrone @gaobinlong should we rename this to max_single_primary_size for consistency with #63026 / #67842 (or alternatively, rename things over there to match here...)?

@dakrone
Copy link
Member

dakrone commented Jan 26, 2021

Yes I think we should rename it to max_single_primary_size so that it matches, consistency is very nice to have. Thanks for mentioning that @joegallo.

@gaobinlong
Copy link
Contributor Author

OK, I will rename the parameter to max_single_primary_size soon.

@gaobinlong
Copy link
Contributor Author

@dakrone ,I've renamed the parameter to max_single_primary_size yet, can you take a look at the change?

@dakrone dakrone self-requested a review January 28, 2021 22:04
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @gaobinlong, this looks much closer, I left some more comments on this! (hopefully my suggestions compile, I did not test them so they may need to be checked)

docs/reference/ilm/actions/ilm-shrink.asciidoc Outdated Show resolved Hide resolved
Comment on lines 50 to 58
The max single primary shard size of the target index, it's used to find an optimum shards number of the target index.
When this parameter is set properly, each shard's storage of the target index will not be greater than the parameter,
while the shards number of the target index still be a factor of the source index's shards number, but if the parameter
is less than the single shard size of the source index, the shards number of the target index will be equal to the source index's shards number.
For example, when this parameter is set to `50GB`, if the source index has `60` primary shards with `100GB` storage, then the
target index will have `2` primary shards, each shard's storage is `50GB`; if the source index has `60` primary shards
with `1000GB` storage, then the target index will have `20` primary shards; if the source index has `60` primary shards
with `4000GB` storage, then the target index will still have `60` primary shards. This parameter is conflict
with `number_of_shards` in the `settings` parameter, either of them can be set.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The max single primary shard size of the target index, it's used to find an optimum shards number of the target index.
When this parameter is set properly, each shard's storage of the target index will not be greater than the parameter,
while the shards number of the target index still be a factor of the source index's shards number, but if the parameter
is less than the single shard size of the source index, the shards number of the target index will be equal to the source index's shards number.
For example, when this parameter is set to `50GB`, if the source index has `60` primary shards with `100GB` storage, then the
target index will have `2` primary shards, each shard's storage is `50GB`; if the source index has `60` primary shards
with `1000GB` storage, then the target index will have `20` primary shards; if the source index has `60` primary shards
with `4000GB` storage, then the target index will still have `60` primary shards. This parameter is conflict
with `number_of_shards` in the `settings` parameter, either of them can be set.
The max single primary shard size for the target index. Used to find the optimum number of shards for the target index.
When this parameter is set, each shard's storage in the target index will not be greater than the parameter.
The shards count of the target index will still be a factor of the source index's shards count, but if the parameter
is less than the single shard size in the source index, the shards count for the target index will be equal to the source index's shard count.
For example, when this parameter is set to 50gb, if the source index has 60 primary shards with totaling 100gb, then the
target index will have 2 primary shards, with each shard size of 50gb; if the source index has 60 primary shards
totaling 1000gb, then the target index will have 20 primary shards; if the source index has 60 primary shards
totaling 4000gb, then the target index will still have 60 primary shards. This parameter conflicts
with `number_of_shards` in the `settings`, only one of them may be set.

"warm": {
"actions": {
"shrink" : {
"max_single_primary_size": "50GB"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"max_single_primary_size": "50GB"
"max_single_primary_size": "50gb"

Comment on lines 233 to 241
The max single primary shard size of the target index, it's used to find an optimum shards number of the target index.
When this parameter is set properly, each shard's storage of the target index will not be greater than the parameter,
while the shards number of the target index still be a factor of the source index's shards number, but if the parameter
is less than the single shard size of the source index, the shards number of the target index will be equal to the source index's shards number.
For example, when this parameter is set to `50GB`, if the source index has `60` primary shards with `100GB` storage, then the
target index will have `2` primary shards, each shard's storage is `50GB`; if the source index has `60` primary shards
with `1000GB` storage, then the target index will have `20` primary shards; if the source index has `60` primary shards
with `4000GB` storage, then the target index will still have `60` primary shards. This parameter is conflict
with `number_of_shards` in the `settings` parameter, either of them can be set.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this I think you can copy the changes made in the previous doc (rather than duplicate them here)

Comment on lines 121 to 125
boolean hasMaxSinglePrimarySize = maxSinglePrimarySize != null;
out.writeBoolean(hasMaxSinglePrimarySize);
if (hasMaxSinglePrimarySize) {
maxSinglePrimarySize.writeTo(out);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified to

Suggested change
boolean hasMaxSinglePrimarySize = maxSinglePrimarySize != null;
out.writeBoolean(hasMaxSinglePrimarySize);
if (hasMaxSinglePrimarySize) {
maxSinglePrimarySize.writeTo(out);
}
out.writeOptionalWriteable(maxSinglePrimarySize);

Comment on lines 160 to 164
logger.info("By setting max_single_primary_size to [" + maxSinglePrimarySize.toString() +
"], the target index [" + targetIndexName + "] will contain [" + minShardsNum +
"] shards, which will be greater than [" + sourceIndexShardsNum +
"] shards of the source index [" + sourceMetadata.getIndex().getName() +
"], use [" + sourceIndexShardsNum + "] as the shards number of the target index [" + targetIndexName + "]");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be cleaned up using the {} substitution

Suggested change
logger.info("By setting max_single_primary_size to [" + maxSinglePrimarySize.toString() +
"], the target index [" + targetIndexName + "] will contain [" + minShardsNum +
"] shards, which will be greater than [" + sourceIndexShardsNum +
"] shards of the source index [" + sourceMetadata.getIndex().getName() +
"], use [" + sourceIndexShardsNum + "] as the shards number of the target index [" + targetIndexName + "]");
logger.info("By setting max_single_primary_size to [{}], the target index [{}] will contain [{}] shards, which will be greater than [{}] shards in the source index [{}], using [{}] for the shard count of the target index [{}]",
maxSinglePrimarySize.toString(), targetIndexName, minShardsNum, sourceIndexShardsNum, sourceMetadata.getIndex().getName(), sourceIndexShardsNum, targetIndexName);

@dakrone
Copy link
Member

dakrone commented Jan 28, 2021

@elasticmachine test this please

@dakrone
Copy link
Member

dakrone commented Jan 29, 2021

@elasticmachine ok to test

@dakrone
Copy link
Member

dakrone commented Jan 29, 2021

Thanks for iterating on this @gaobinlong! It looks good to me!

@dakrone
Copy link
Member

dakrone commented Jan 29, 2021

@elasticmachine update branch

@joegallo joegallo changed the title Support max_single_primary_shard_size in Resize Action and exposed in ILM Support max_single_primary_size in Resize Action and exposed in ILM Jan 29, 2021
@joegallo joegallo merged commit d69c033 into elastic:master Jan 29, 2021
@joegallo
Copy link
Contributor

joegallo commented Feb 1, 2021

7.x backport in #68321

joegallo added a commit that referenced this pull request Feb 1, 2021
joegallo added a commit to joegallo/elasticsearch that referenced this pull request Feb 1, 2021
Adjust ResizeRequest and ShrinkAction serialization now that elastic#68321
has been merged, and re-enable BWC testing.

Related to elastic#67705, elastic#68321, and elastic#68329
joegallo added a commit that referenced this pull request Feb 1, 2021
Adjust ResizeRequest and ShrinkAction serialization now that #68321
has been merged, and re-enable BWC testing.

Related to #67705, #68321, and #68329
alyokaz pushed a commit to alyokaz/elasticsearch that referenced this pull request Mar 10, 2021
alyokaz pushed a commit to alyokaz/elasticsearch that referenced this pull request Mar 10, 2021
Adjust ResizeRequest and ShrinkAction serialization now that elastic#68321
has been merged, and re-enable BWC testing.

Related to elastic#67705, elastic#68321, and elastic#68329
easyice pushed a commit to easyice/elasticsearch that referenced this pull request Mar 25, 2021
easyice pushed a commit to easyice/elasticsearch that referenced this pull request Mar 25, 2021
Adjust ResizeRequest and ShrinkAction serialization now that elastic#68321
has been merged, and re-enable BWC testing.

Related to elastic#67705, elastic#68321, and elastic#68329
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management >feature Team:Data Management Meta label for data/management team v7.12.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants