RFC: SavedModel Save/Load in 2.x #34
Conversation
There is another use-case which I'm not sure has been covered by this proposal: make large code modifications/refactors and restore only the weights/state (but not the computation).

```python
class Net():
    def __init__(self):
        self.W1 = tf.Variable(...)
        self.b1 = tf.Variable(...)
        self.W2 = tf.Variable(...)
        self.b2 = tf.Variable(...)

    @tf.function
    def __call__(self, x):
        return self.W2 * (self.W1 * x + self.b1) + self.b2
```

And after a while I want to refactor it into:

```python
class Net():
    def __init__(self):
        self.L1 = Layer()
        self.L2 = Layer()

    @tf.function
    def __call__(self, x):
        return self.L2(self.L1(x))


class Layer():
    def __init__(self):
        self.W = tf.Variable(...)
        self.b = tf.Variable(...)

    @tf.function
    def __call__(self, x):
        return self.W * x + self.b
```

This is easily supported by the name-based saving in 1.x (e.g. `tf.train.Saver`).
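A minimal sketch of what that name-based restore could look like, assuming 1.x graph mode, that the refactored `Layer` gives its variables the same names as before (e.g. via explicit `name=` arguments or variable scopes), and a hypothetical checkpoint path:

```python
import tensorflow.compat.v1 as tf1
tf1.disable_eager_execution()

# If the variable names ("W1", "b1", ...) are preserved across the refactor,
# a plain name-based Saver restores the old checkpoint with no mapping code.
net = Net()                # the refactored version above
saver = tf1.train.Saver()  # matches variables to checkpoint entries by name
with tf1.Session() as sess:
    saver.restore(sess, "/path/to/old_checkpoint")
```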
@ppwwyyxx I do mention the training checkpoint use-case in the introduction. After that it's not mentioned much, mostly because it's already implemented.
@allenlavoie I'm aware of the paragraph, but "using attribute names in objects" is exactly what I'm concerned about. In the code example I gave above, the object hierarchy has changed due to refactoring, and therefore the attributes are different. How should users restore an old model after the code has been changed in the way shown above?
@ppwwyyxx one option with object-based checkpointing is to construct a checkpoint-compatible structure and a new structure which both reference the same variable objects, then load the checkpoint into the checkpoint-compatible structure. This is roughly equivalent to passing a dictionary of names to `tf.train.Saver`. I wrote a design document for object-based checkpointing.
Thanks! From this solution, it sounds to me that, in order to load an old checkpoint, the legacy code has to be kept around after refactoring, right? This would be against the goal of refactoring, and users will still need the extra logic to "call the old code" to build the compatible structure. I agree to some extent that this is roughly like passing a dict of names to `tf.train.Saver`, though there is additional complexity compared to the name-based approach. Because of this, I would disagree that "using attribute names is more flexible than using scope names", since attribute names are more closely tied to code structure. In fact, using attribute names to save models is a major headache I have had with PyTorch.
It doesn't require a legacy implementation of the model; it just requires the same structure. This can be constructed of dummy objects (e.g. empty checkpoint objects) that reference the real variables.
I've broken many models by refactoring, so I have to disagree with you here. Scopes are pretty notorious for being fragile. If naming each variable explicitly is an option, it's easy with the object-based API to save a dictionary of variables and nothing else. Mostly we needed to get rid of collections and variable scopes.
I'm curious: a headache compared to what? There's almost a one-to-one mapping between object relationships and the scope-based naming that was common in v1. I agree that refactoring can still break checkpoints with the object-based API, but I think our main disagreement is over whether name-based APIs have any advantages there.
Thanks for the clarification! Now it makes more sense. Using a dummy Checkpoint object with only the attributes would be quite easy. If I'm not mistaken, to load an old model in the examples I gave above, I'll need to:

```python
new = NewNet()
old = DummyEmptyCheckpoint()
old.W1 = new.L1.W
old.W2 = new.L2.W
old.b1 = new.L1.b
old.b2 = new.L2.b
# or perhaps
# old = DummyEmptyCheckpoint(W1=new.L1.W, W2=new.L2.W, b1=new.L1.b, b2=new.L2.b)
old.restore('checkpoint')
```

which is awesome and has the same complexity as passing a dict of names to `tf.train.Saver`.
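A hypothetical concrete version of this "dummy object" approach using `tf.train.Checkpoint`, assuming the old checkpoint stored `W1`/`b1`/`W2`/`b2` as direct attributes of the root checkpoint object (the path is made up):

```python
new = NewNet()
old = tf.train.Checkpoint(W1=new.L1.W, b1=new.L1.b,
                          W2=new.L2.W, b2=new.L2.b)
status = old.restore("/path/to/old_checkpoint")
status.assert_consumed()  # optionally verify every checkpointed value was matched
```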
Compared to scope-based APIs, and mainly because code refactoring does not usually result in incompatibilities with a scope-based API. I've given one example above. Another example that's more common:

```python
class OldNet():
    def __init__(self):
        self.W1 = tf.Variable(...)
        self.b1 = tf.Variable(...)
        self.W2 = tf.Variable(...)
        self.b2 = tf.Variable(...)

    def __call__(self, x):
        return tf.nn.conv2d(self.W1 * x + self.b1, self.W2, ...) + self.b2


class NewNet():
    def __init__(self):
        self.L1 = MyFC()
        self.L2 = MyConv()

    def __call__(self, x):
        return self.L2(self.L1(x))
```

In this refactoring, variable names are unchanged as long as we don't enter a new scope in `MyFC` or `MyConv`.
@ppwwyyxx I've seen a lot of code for which this is not true, and the failure is extremely vexing. Consider this (old-style) code:

…

If I change the last bit to

…

does this break a checkpoint? The answer is: maybe. The problem is that it is impossible to tell without looking at the code for the function being called.
@martinwicke I can't see the rest of your post :) Of course both methods will have annoying incompatibilities. What I meant to say is that scope-based naming may have fewer incompatibilities (in some of the examples I gave), but you can probably find counterexamples as well.

@ppwwyyxx Sorry, my bad, I updated the comment. I just wanted to give an example where scope-naming goes wrong. We've seen it a lot, and I strongly believe object-based naming will be significantly more robust to refactoring.

Thanks! Very interesting example!
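The snippet being discussed is not reproduced in this thread; as a purely illustrative sketch (not Martin's original code) of how scope-based naming can break in a way that is invisible at the call site, assuming v1-style graph code and a hypothetical helper `my_block`:

```python
# v1-style graph code. Originally:
def model(x):
    net = tf.compat.v1.layers.dense(x, 128, name="d1")
    net = tf.compat.v1.layers.dense(net, 128, name="d2")
    return net

# After refactoring the second layer into a helper:
def model_refactored(x):
    net = tf.compat.v1.layers.dense(x, 128, name="d1")
    net = my_block(net)  # hypothetical helper
    return net

# Whether model_refactored can still load the old checkpoint depends entirely
# on what my_block does internally: if it opens its own variable scope or
# names its variables differently, the checkpointed names no longer match,
# and this cannot be determined by reading the call site alone.
```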
One question I have is whether it'd be possible to somehow hook into the save/load machinery to (1) save some Python attribute variables, unrelated to the TensorFlow graph, which we'd like to deserialize appropriately on loading. This might be possible via the "custom revived types", but it's not entirely clear to me. Right now, we have a thin wrapper to save/load TensorFlow graphs into SavedModels (which is totally fine and not something we have a problem with), but it'd be kind of nice to make this pluggable at the base!
@alanhdu Interesting. Those do seem like reasonable things to support somehow. I haven't given it too much thought yet, but one option is to do automatic (limited) serialization for object attributes which contain only basic types, like nests of list/tuple/dict/int/string/float. We already need to support saving these types for Python-only function arguments (used for dispatch, e.g. a Python-valued `training` argument).
@allenlavoie After an instance of a subclass of …

@zakizhou generic …
We reference a bunch of things in the MetaGraph/GraphDef, so it makes sense to add it there rather than to the SavedModel directly. This is in preparation for non-experimental tf.saved_model.save/load symbols. We don't yet have an exposed symbol for loading object-based SavedModels, so this CL won't break anyone (despite moving around the proto and not checking the old location). RFC: tensorflow/community#34 PiperOrigin-RevId: 234887195
Hi! I don't know if this is the right thread to post this in, but I've been playing around with the TF 2.0 APIs and have run into a problem with the saving and loading APIs. Our actual use-case is to deploy RNNs working on streams of data. At inference time, we need to (1) be able to change the batch size between calls and (2) save the output state from one call to feed in as the input state of another. In TF 1.0, we handled this by explicitly passing in our input states and explicitly capturing the output states, using a simple Python dictionary mapping tensor names to values.

Is it possible to either loosen this restriction, or to allow exported signatures to accept nested structures?
Dynamic variables should still work as long as the variable is created inside a `tf.function` (on the first call only).

(Works for me running 2.0.0-dev20190226 on Python 3.6.) But getting TensorFlow Serving (and TensorFlow's C++ loader API) to allow nested structures is a good idea too. One thought (I forget whose) is that signatures could go away completely at some point (TF 4.x? We maintain SavedModel compatibility for at least one extra major version, and we are exporting/consuming in 2.x.). Then Serving would support the same nested structures as `tf.saved_model.load`. Plenty of details to work out (how do you specify those in a request?) but I think it's workable. And we certainly don't need to wait for signatures to go away before allowing Serving to access arbitrary functions exported with `tf.saved_model.save`.
@allenlavoie Ah -- by "dynamically sized" I meant something like: class DyanmicSized(tf.Module):
@tf.function
def reset_state(self, batch_size):
self.batch_size = tf.Variables(tf.zeros(batch_size, 32)) (that is, a lstm.reset_states(batch_size=64)
lstm(batch1) # training
lstm.reset_states(batch_size=8)
lstm(batch2) # inference
👍 to that, although I think all that our particular use case needs is to somehow map the Python nested structure to TF names, so we know what to feed into the C++ API.
That's what the snippet I posted does. The function just needs some indication of what the shapes should be (e.g. an `input_signature`).
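The snippet referred to isn't reproduced in this thread; one pattern along these lines that works in 2.x, assuming the goal is a state variable whose batch dimension can change between calls (the class, attribute names, and sizes here are illustrative, and it relies on the `shape=` argument to `tf.Variable`):

```python
import tensorflow as tf

class DynamicState(tf.Module):
    """Holds a state variable whose batch dimension can change between calls."""

    def __init__(self, units=32):
        super().__init__()
        # Declaring the batch dimension as None allows later assignments
        # with a different batch size.
        self.state = tf.Variable(tf.zeros([1, units]),
                                 shape=tf.TensorShape([None, units]))

    # A None batch dimension in the input signature lets the traced function
    # accept any batch size at call time.
    @tf.function(input_signature=[tf.TensorSpec([None, 32], tf.float32)])
    def reset_state(self, batch):
        # Reset the state to zeros matching the incoming batch's size.
        self.state.assign(tf.zeros_like(batch))
```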
@allenlavoie Ah... I see. So you can get dynamic shapes by specifying them via the `input_signature`. I should also say that for me, on Python 3.7, this produces the following warning:

```
WARNING: Logging before flag parsing goes to stderr.
W0227 12:38:06.226005 140243216058176 tf_logging.py:161] Entity <method-wrapper '__call__' of weakref object at 0x7f8ce614c4f8> could not be transformed and will be staged without change. Error details can be found in the logs when running with the env variable AUTOGRAPH_VERBOSITY >= 1. Please report this to the AutoGraph team. Cause: Object conversion is not yet supported. If you are trying to convert code that uses an existing object, try including the creation of that object in the conversion. For example, instead of converting the method of a class, try converting the entire class instead. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/autograph/README.md#using-the-functional-api for more information.
```

That said, if this works then I think it should work for our use case, although we might still run into issues when specifying signatures for Serving.
@allenlavoie I created a model in TensorFlow 2.0 and saved it using `tf.saved_model.save`. Now I want to load it back. However, the loaded object is no longer a `tf.keras.Model`. What's the correct way to use the loaded model?
Yep, we won't have re-wrapping into Keras types implemented for the alpha. But if you're exporting a Model, the forward pass is available as a signature. I have a guide to 2.x SavedModel pending review.
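A minimal sketch of this round trip, assuming a trivial stand-in Keras model and a hypothetical export directory; the loaded object is a generic restored object rather than a `tf.keras.Model`, but the forward pass is still callable through the exported signature:

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
tf.saved_model.save(model, "/tmp/exported_model")

loaded = tf.saved_model.load("/tmp/exported_model")
# `loaded` exposes the exported signatures rather than Keras methods.
infer = loaded.signatures["serving_default"]
print(infer(tf.zeros([1, 4])))  # returns a dict of named output tensors
```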
@allenlavoie Thanks, it works!

@goldiegadde added review notes. Does it need anything else?

Hi @allenlavoie, will …

@zakizhou I believe that's still the plan.
So I can retrain and then serve with the SavedModel in the future? And how can I use the saved model with a different device or distribution strategy? A more detailed use case would be appreciated.

@zh794390558 yes on the first question; see the 2.x guide to SavedModel. There isn't yet any integration between distribution strategies and SavedModels. Happy to give pointers if you or someone else is interested in starting on that; it's on my list, but that's a very long list right now.

Gentle ping @goldiegadde. I changed the status. Is this missing anything else?
* RFC for SavedModel Save/Load in 2.x
* Minor edits and a discussion topic for load() with multiple MetaGraphs
* Tweak to the "Imported representations of signatures" section
* Update "Importing existing SavedModels" with the .signatures change
* Update RFC and add review notes
* Status -> accepted
Review period closes 2018-12-04
SavedModel Save/Load in 2.x
Objective: provide an API for serialization/deserialization in TF-2.0 that supports both serving and reuse use-cases.