-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[FLINK-11094] [rocksdb] Make non-accessed restored state in RocksDBStateBackend snapshottable #7264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
+1,229
−579
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
… tests should fail if exception isn't thrown This commit strengthens tests in StateBackendMigrationTestBase that depend on a certain state operation (restoring state, accessing state, etc.) to be failing to assert correct behaviour. However, we previously do not really fail the test if no exception was thrown when there should be. This also caught some bugs in the test itself which had the tests verifying incorrect behaviour.
…ackend should always be compatible Previously, we were only checking if the new namespace serializer is incompatible, while properly we should be checking that it is strictly compatible. This doesn't affect any user expected behaviour, since the namespace serializer is never exposed to users.
…ore understandable The StateBackendMigrationTestBase previously mocked user behaviour of upgrading serializers by using a single serializer class, that can be configured with different target compatibility results when they are checked for compatibility. This is a bit hard to understand, also doesn't really reflect how a user would actually approach the feature. Instead, instead of configuring a single serializer class with different compatibility "personalities", this commit uses actual different classes, V1TestTypeSerializer, V2TestTypeSerializer, and IncompatibleTestTypeSerializer, to simulate the compatibility cases in tests. This commit also refactors the serializer migration related test serializers / types / snapshots to the testutil package, so that it can be shared by other state migration related tests in the future.
A StateSerializerProvider wraps logic on how to obtain serializers for registered state, either with the previous schema of state in checkpoints or the current schema of state.
…in RocksDBKeyedStateBackend This ensures that all restored state, even non-accessed ones after the restore, have a meta info available on future snapshots.
…erProviders This commit lets all state meta info subclasses use StateSerializerProviders to replace direct serializer instances. This allows meta infos that were instantiate with restored serializer snapshots to not eagerly access the restore serializer when restoring state. This needs to be avoided since when restoring from 1.6, the restore serializer might not be available; for RocksDB, this should be tolerable. This further coherents with the principle of not accessing the restore serializer unless neccessary.
… map from RocksDBKeyedStateBackend With the fact that now restored serializer snapshots are held by their eagerly created meta infos in the kvStateInformation map, we no longer need to keep a separate map of restored meta info snapshots. As can be seen in the changes, that map is no longer queried anywhere.
…apKeyedStateBackend Likewise in the RocksDBKeyedStateBackend, since restored snapshots are now part of the registered meta info map, we no longer need this map. This commit also reflects the principal changes we made to the RegisteredKeyValueStateMetaInfo class in the HeapKeyedStatBackend.
…faultOperatorStateBackend Likewise in the RocksDBKeyedStateBackend, since restored snapshots are now part of the registered meta info map, we no longer need these maps. This commit also reflects the principal changes we made to the RegisteredKeyValueStateMetaInfo class in the DefaultOperatorStateBackend.
…cessed restored state
…izer(...) from StateMetaInfoSnapshot That method is no longer used after the series of changes in FLINK-11094. This also corresponds to the new principle that the restore serializer is only accessed, when the state backends attempt request it from their state meta infos. We do not create restore serializers eagerly when creating meta infos from a StateMetaInfoSnapshot.
…alizer snapshot in StateMetaInfoSnapshot This corresponds to the abstraction rework of retiring TypeSerializerConfigSnapshot to be replaced by TypeSerializerSnapshot. The related fields / methods should not mention "config" anymore.
3c1bbbe
to
e8acc70
Compare
Hey @tzulitai, thanks for addressing this issue! |
tzulitai
added a commit
to tzulitai/flink
that referenced
this pull request
Dec 11, 2018
… map for restored StateMetaInfoSnapshots Since now all restored state meta info snapshots are handled so that we always eagerly create the corresponding RegisteredStateMetaInfoBase for it, the information is already part of the registered state infos map. As can be seen in the changes, those maps are no longer queried and can therefore be safely removed. This closes apache#7264.
tzulitai
added a commit
to tzulitai/flink
that referenced
this pull request
Dec 11, 2018
… map for restored StateMetaInfoSnapshots Since now all restored state meta info snapshots are handled so that we always eagerly create the corresponding RegisteredStateMetaInfoBase for it, the information is already part of the registered state infos map. As can be seen in the changes, those maps are no longer queried and can therefore be safely removed. This closes apache#7264.
asfgit
pushed a commit
that referenced
this pull request
Dec 11, 2018
… map for restored StateMetaInfoSnapshots Since now all restored state meta info snapshots are handled so that we always eagerly create the corresponding RegisteredStateMetaInfoBase for it, the information is already part of the registered state infos map. As can be seen in the changes, those maps are no longer queried and can therefore be safely removed. This closes #7264.
tisonkun
pushed a commit
to tisonkun/flink
that referenced
this pull request
Jan 17, 2019
… map for restored StateMetaInfoSnapshots Since now all restored state meta info snapshots are handled so that we always eagerly create the corresponding RegisteredStateMetaInfoBase for it, the information is already part of the registered state infos map. As can be seen in the changes, those maps are no longer queried and can therefore be safely removed. This closes apache#7264.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What is the purpose of the change
The ultimate goal of this PR is to fix a bug where NPE is thrown if a non-accessed restored state in
RocksDBKeyedStatebackend
was snapshotted.This was an overlooked issue caused by changes in FLINK-10679.
The problem is that in that change, in the
RocksDBKeyedBackend
,RegisteredStateMetaInfoBase
s were no longer created eagerly for all restored state, but instead only lazily created when the state was accessed again by the user. This causes non-accessed restored state to have empty meta info, and throws NPE when trying to take a snapshot of them.The rationale behind FLINK-10679 was that, since
RegisteredStateMetaInfoBase
holds already serializer instances for state access, creating them eagerly at restore time with restored serializer snapshots did not make sense (because at that point-in-time, we do not have the new serializers yet for state access; the snapshot is only capable of creating the previous state serializer).This PR fixes this by letting state meta infos hold a
StateSerializerProvider
, instead of serializer instances. This allows meta infos to be instantiated from snapshots without eagerly accessing the restore serializer.The API for the
StateSerializerProvider
is as follows:A StateSerializerProvider can be created either from:
For 1), state that has not been accessed yet after the restore will return the same serializer (i.e. the serializer for previous schema) for both
currentSchemaSerializer
andpreviousSchemaSerializer
, meaning that since no new serializer has been registered, the schema remains the same anyways.Once a restored state is re-accessed, then
registerSerializerForRestoredState(TypeSerializer<T> newSerializer)
should be used to update what serializer the provider returns incurrentSchemaSerializer
.Brief change log
Main changes are:
StateBackendMigrationTestBase
.StateSerializerProvider
abstraction and added tests for itStateSerializerProvider
instead of serializer instancerestoredXXStateMetaInfos
maps from all state backends. They can be removed because all restored meta infos are now implicitly part of the eagerly createdRegisteredStateMetaInfoBase
s (which is kept in theregisteredXXXMetaInfo
maps in all state backends).Verifying this change
This PR adds a new test
StateBackendTestBase.testSnapshotNonAccessedState()
to cover the issue.The test fails for RocksDB state backend case without the fixes.
Also, unit tests are added for
StateSerializerProvider
, inStateSerializerProviderTest
.All existing migration related tests in
StateBackendMigrationTestBase
are also related and should not be failing.Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: (yes / no)