Waiting for model to be ready. Ready_for_local_init_op: Variables not initialized: beta1_power, beta2_power, ready: None #38

Closed
Description

@Mr-Nineteen (Contributor)

During training with tf.compat.v1.train.AdamOptimizer, the following message appears: "Waiting for model to be ready. Ready_for_local_init_op: Variables not initialized: beta1_power, beta2_power, ready: None". It occurs under these conditions:

  1. variable_scope + partitioner is used
  2. BN (batch normalization) is removed from the model

Activity

rhdong (Member) commented on Apr 21, 2021

The TrainableWrapper still does not work well with variable_scope + partitioner. It will take us a long time to push a PR to TensorFlow core to resolve this problem; as an equivalent alternative, you can pass the partitioner to tf.get_variable directly.
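A minimal sketch of what passing the partitioner to tf.get_variable directly might look like, using the TF1-style API through tf.compat.v1. The variable name, shape, initializer, and shard count are assumptions chosen for illustration and do not come from the model in this issue:

```python
import tensorflow as tf


def build_partitioned_variable(num_shards=2):
    """Create a partitioned variable without a variable_scope partitioner."""
    graph = tf.Graph()
    with graph.as_default():
        # The partitioner is given to get_variable itself rather than to an
        # enclosing variable_scope. Name and shape are hypothetical.
        weights = tf.compat.v1.get_variable(
            name="dense_kernel",
            shape=[8, 4],
            dtype=tf.float32,
            initializer=tf.compat.v1.zeros_initializer(),
            partitioner=tf.compat.v1.fixed_size_partitioner(num_shards),
        )
    return graph, weights


graph, weights = build_partitioned_variable(num_shards=2)
shards = list(weights)  # a PartitionedVariable iterates over its shards
print([shard.shape.as_list() for shard in shards])
```

Only the variables that actually need sharding receive a partitioner this way, so nothing else created under the scope is affected by it.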

rhdong (Member) commented on Jun 25, 2021

Adam's _create_slots function colocates beta1_power and beta2_power with the variable that has the lowest name in var_list (please refer to here). When BN is turned off (the same thing may be triggered by some other special conditions), the dynamic_embedding TrainableWrapper in var_list may be brought to the front. beta1_power and beta2_power are then colocated with the TrainableWrapper, whose device is /job:worker rather than /job:ps, and as a result the variables are not initialized. We cannot change adam._create_slots in TF, and we also have to keep the TrainableWrapper local on /job:worker. The way we recommend to avoid this issue is to change the first letter of the name in tfra.embedding_lookup(name='a-to-z-xxxx') to 'z' so that it is queued to the back. In the future we will make TF core aware of tfra.dynamic_embedding.TrainableWrapper as a fundamental solution.
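The name-ordering effect described above can be illustrated with a small plain-Python sketch. It does not call TensorFlow; FakeVar, pick_colocation_target, and all variable names and devices are hypothetical stand-ins, and the only behavior modeled is "pick the variable with the lowest name":

```python
from dataclasses import dataclass


@dataclass
class FakeVar:
    """Hypothetical stand-in for a variable: just a name and a device."""
    name: str
    device: str


def pick_colocation_target(var_list):
    # Models the selection described above: the variable whose name sorts
    # first is the one beta1_power and beta2_power get colocated with.
    return min(var_list, key=lambda v: v.name)


dense = FakeVar("dense/kernel:0", "/job:ps/task:0")
emb_early = FakeVar("a_embedding/TrainableWrapper:0", "/job:worker/task:0")
emb_late = FakeVar("z_embedding/TrainableWrapper:0", "/job:worker/task:0")

# An embedding name that sorts first pulls the target onto the worker.
print(pick_colocation_target([dense, emb_early]).device)

# Renaming the embedding so it starts with 'z' leaves the target on the PS.
print(pick_colocation_target([dense, emb_late]).device)
```

Because the selection is a plain string comparison on names, a leading 'z' is enough to keep the embedding wrapper from being chosen as long as at least one other trainable variable has a name that sorts earlier.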


Participants: @rhdong, @Mr-Nineteen, @MoFHeka

Issue #38 · tensorflow/recommenders-addons