Waiting for model to be ready. Ready_for_local_init_op: Variables not initialized: beta1_power, beta2_power, ready: None · Issue #38 · tensorflow/recommenders-addons

During training, the optimizer is tf.compat.v1.train.AdamOptimizer, and the following message appears: "Waiting for model to be ready. Ready_for_local_init_op: Variables not initialized: beta1_power, beta2_power, ready: None". The scenario is:
rhdong commented on Apr 21, 2021
The TrainableWrapper still does not work well with variable_scope + partitioner. It will take us a long time to push a PR to TensorFlow core to resolve this problem; as an equivalent alternative, you can use a partitioner in tf.get_variable directly.
rhdong commented on Jun 25, 2021
Adam's _create_slots function colocates beta1_power and beta2_power with the variable that has the lowest (lexicographically smallest) name in var_list; please refer to here. When BN is turned off (possibly triggered by some other special conditions), the dynamic_embedding TrainableWrapper in var_list may be brought to the front. Then colocate_with uses the TrainableWrapper's device, which is not /job:ps but /job:worker, and as a result the beta1_power and beta2_power variables are never initialized. Because we cannot change adam._create_slots in TF, and we also have to keep the TrainableWrapper local on /job:worker, the recommended way to avoid this issue is to change the first letter of the name in tfra.embedding_lookup(name='a-to-z-xxxx') to 'z' so that the wrapper is sorted to the back. In the future we will make TensorFlow core aware of the existence of tfra.dynamic_embedding.TrainableWrapper as a fundamental solution.
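A minimal sketch of that renaming workaround, assuming the tfra.dynamic_embedding API (the table name, dimension, ids, and lookup name below are illustrative, not taken from the issue):

```python
import tensorflow as tf
import tensorflow_recommenders_addons as tfra

# Illustrative dynamic-embedding table; name and dim are placeholders.
embedding_params = tfra.dynamic_embedding.get_variable(
    name="user_embedding",
    dim=16,
)

sparse_ids = tf.constant([1, 5, 42], dtype=tf.int64)

# Prefixing the lookup name with 'z' pushes the resulting TrainableWrapper
# toward the end of var_list, so Adam colocates beta1_power/beta2_power
# with a variable placed on /job:ps rather than the worker-local wrapper.
embeddings, trainable_wrapper = tfra.dynamic_embedding.embedding_lookup(
    params=embedding_params,
    ids=sparse_ids,
    name="z_user_embedding_lookup",
    return_trainable=True,
)
```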