Fix unnecessary reloading of the pre-trained model in EVAL and PREDICT #450
base: master
Conversation
…T phases. RunHook was used to provide a clean implementation.
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here.

What to do if you already signed the CLA:
- Individual signers
- Corporate signers

ℹ️ Googlers: Go here for more info.
- Removed unnecessary parameter "init_checkpoint" in model_fn_builder()
- Updated indentation to 2 spaces to be consistent with Google's coding style.
@peidaqi Thanks for the code. However, when I run your code, the inference time (prediction phase) does not seem to be reduced. Do you have any idea about this? Below is the time logging in the prediction function:
-- Your run_classifier.py
-- Original run_classifier.py
@ntson2002 Not sure what you mean by "not reduced" — it seems to me the time difference in your report shows the new run_classifier.py is indeed faster, though only by a small margin. estimator.predict returns a generator, so I imagine most of the time is spent in the Python facilities and loops. Your code also does a lot of I/O per loop, which takes far more time than the one-off loading of the pre-trained model and could make the difference insignificant. I'd suggest testing with estimator.eval and removing all those I/O operations. After all, this is only a small fix, and if you have fast SSDs (as seems to be the case for you?) the performance gain would not be very big.
@googlebot I signed it!
In the current implementation, the pre-trained model is loaded not only in TRAIN, but also in EVAL and PREDICT.
This is unnecessary, as the weights will be overwritten anyway when the Estimator framework restores the checkpoints saved during TRAIN in the EVAL and PREDICT phases.
A RunHook is used to provide a clean solution for loading the pre-trained model: this way, model_fn() is left with only the graph-construction code. Passing the RunHook to estimator.train() ensures the pre-trained model is loaded only during TRAIN. Since the RunHook runs on the CPU, there is no longer any need for scaffolding, either.
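The idea can be illustrated with a minimal, framework-free Python sketch. The classes below are hypothetical stand-ins for tf.train.SessionRunHook and tf.estimator.Estimator (not the actual BERT or TensorFlow code); the point is only that the weight-loading hook is passed to train() and never to predict():

```python
class SessionRunHook:
    """Minimal stand-in for a TF SessionRunHook (illustrative only)."""
    def begin(self):
        pass

class LoadPretrainedHook(SessionRunHook):
    """Hypothetical hook: copies pre-trained weights in once, before training starts."""
    def __init__(self, pretrained_weights, model):
        self._weights = pretrained_weights
        self._model = model

    def begin(self):
        # Overwrite the freshly initialized weights with the pre-trained ones.
        self._model.update(self._weights)

class ToyEstimator:
    """Stand-in Estimator: train() runs hooks, predict() restores from 'checkpoints'."""
    def __init__(self):
        self.model = {"w": 0.0}  # randomly initialized in real life

    def train(self, hooks=()):
        for hook in hooks:
            hook.begin()         # pre-trained weights are loaded here, only in TRAIN
        self.model["w"] += 1.0   # pretend one training step updates the weights

    def predict(self):
        # No hook here: weights come from the checkpoint written by train().
        return self.model["w"]

est = ToyEstimator()
est.train(hooks=[LoadPretrainedHook({"w": 10.0}, est.model)])
print(est.predict())  # → 11.0 (pre-trained 10.0 plus one training step)
```

Because predict() never runs the hook, the expensive one-off load of the pre-trained model happens exactly once, during TRAIN, matching the behavior this PR gives run_classifier.py.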