Using model.eval() with batchnorm gives high error #4741
Note that we're using GitHub issues for bug reports only, and all questions should be posted on our forums.
It might be a bug in the framework as well.
That's true, but what you said is often a symptom of a user error or a network-specific instability. You're welcome to post it in the forums, but this is definitely not enough information to say that it's a bug or to take any steps towards fixing it.
It seems like a common issue on the PyTorch forums, yet no one is answering people's concerns and experiences.
@fsalmasri The same thread on the forums that you replied to has a working answer that I wrote in September. Here's a link: https://discuss.pytorch.org/t/model-eval-gives-incorrect-loss-for-model-with-batchnorm-layers/7561/3?u=smth I wrote another comment here today to make it clear to you that this is not a software bug. If your training is non-stationary, you will see this issue.
Always setting the training parameter to True and manually setting momentum to 0 on eval is a workaround that solves this bug in the software. Just add … in the forward of …, and remove the … Of course, you also need to replace …
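A minimal sketch of that workaround (my own illustration under assumptions, not the snippet missing from the comment above): after calling model.eval(), put the BatchNorm layers back into training mode so they keep normalizing with per-batch statistics, and zero their momentum so evaluation batches do not update the running estimates.

```python
import torch.nn as nn

def eval_with_batch_stats(model):
    """Evaluate while BatchNorm still normalizes with the current batch's statistics."""
    model.eval()  # dropout and other layers behave as in normal evaluation
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.train()         # keep using per-batch mean/var instead of running stats
            m.momentum = 0.0  # freeze running_mean / running_var updates
    return model
```

Note that this changes what "evaluation" means: the output for a given sample then depends on the other samples in its batch.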
@soumith In my case, a higher momentum solved the problem. Maybe the default value of 0.1 is not a good one?
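For reference, momentum is just a constructor argument on the BatchNorm modules, so trying a larger value is a one-line change (the 0.5 below is an arbitrary illustration, not a recommended setting):

```python
import torch.nn as nn

# running_stat = (1 - momentum) * running_stat + momentum * batch_stat,
# so a larger momentum weights recent batches more heavily (default is 0.1).
bn = nn.BatchNorm2d(64, momentum=0.5)
```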
Is there ever a case where user error would be responsible for a model learning on a single batch in train mode, and then the error being significantly worse when switching to eval mode, even when applied to the same single batch it was trained on?
@penguinshin Not sure if I understand. Are you saying that you are training on a fixed batch and when switching to eval the error is high? Could you send a small example showing that, please? Thank you!
I'll send an example over shortly. But yes, I feed a single batch (the same batch) through a batchnorm layer in train mode until the mean of the batchnorm layer becomes fixed, then switch to eval mode and apply it to the same batch, and I get different results from train mode, even though the reported batchnorm running mean is the same for both the train and eval trials.
Here's a minimum working example. You may have to run it a few times to get random matrices that work, but it should manifest within a few tries at most.
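The snippet itself is not reproduced above, but a rough sketch along the same lines (batch size, iteration count, and momentum here are assumptions, not the original code) is:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 3)                 # one small fixed batch (small n makes the gap visible)
bn = nn.BatchNorm1d(3, momentum=0.1)

bn.train()
for _ in range(2000):                 # let running_mean / running_var converge
    out_train = bn(x)

bn.eval()
out_eval = bn(x)                      # same batch, now normalized with the running stats

print((out_train - out_eval).abs().max())  # clearly non-zero
```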
@penguinshin Hmmm, but here you are not running the forward pass enough times for running_mean and running_var to become close to the population mean and variance, though.
Well, I'm not sure what your expectation is with regard to convergence, but shouldn't it work in at least the None or 1 case? Even if you run it for 10000 iterations on the same data, it breaks. The difference is big enough to completely break the models that I'm training.
Okay. It's just that the code you gave is inconsistent with what you said above, and happens to include … You are missing Bessel's correction here: https://en.wikipedia.org/wiki/Bessel%27s_correction. After including that, the results match exactly.
Possibly also the reason why, if you train on a very small amount of data, things break.
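A small check of that point (my own sketch, not from the thread): the batch itself is normalized with the biased variance, while running_var is updated with the unbiased, Bessel-corrected estimate, so any manual comparison against running_var has to include the n / (n - 1) factor.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3)
bn = nn.BatchNorm1d(3, momentum=1.0)  # momentum=1.0: running stats become exactly the last batch's stats
bn.train()
_ = bn(x)

biased_var = x.var(dim=0, unbiased=False)    # what normalizes the batch itself
unbiased_var = x.var(dim=0, unbiased=True)   # = biased_var * n / (n - 1), with n = 8

print(torch.allclose(bn.running_var, unbiased_var))  # True
print(torch.allclose(bn.running_var, biased_var))    # False
```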
Thank you. That does fix the minimum working example, although for some reason it doesn't fix the problem in my model's training. So PyTorch applies Bessel's correction by default? Is there a setting for batchnorm such that the inputs are not normalized by their own batch statistics, but rather by the updated running mean/var? Maybe that could be the issue, because I'm still finding that when creating a simple model based on a single batchnorm layer plus a one-dimensional convolution with a single filter, trained and applied on a single batch of data, the error goes up drastically when switching from train to eval mode. I will try to provide the full training loop, but it will take longer to create.
Yes, that's what the BN paper proposes.
You would get wrong gradients that way, though. So no.
How many samples do you have in there?
Sorry for the trouble... It's the same dimensions as the example above, so ultimately I feed a 180x1x180 tensor. However, I think there is something special I'm doing that might be throwing off the batchnorm; maybe you can tell me if this would have a negative impact. I originally have a 128 x 1 x 360 window. I loop through each of the 128 windows and create 180 windows of width 180, so for example the first 360-length window would be converted into 180 sliding windows of length 180. I run each 180x1x180 window (of the 128) through the net and sum the losses over all 128 before calling a backward pass and running optimizer.step(). Presumably, using train mode would mean that each of those windows (of the 128) would be batch-normalized appropriately within one backward pass. My guess is that each window's high autocorrelation (since the 180 batches all come from one continuous 360-length window) produces sharp batch statistics, and maybe that is responsible for the issue, but I don't know for sure. I am going to try setting a lower momentum.
@penguinshin Here is your problem. Batch norm (and its backward) is not linear in batch size. I.e., f(x) + f(y) != f([x, y]).sum(). Therefore, instead, you should use tensor.unfold (https://pytorch.org/docs/master/tensors.html#torch.Tensor.unfold) to create a batch of 128 windows and activate them through the network once.
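A sketch of what that suggestion could look like for the shapes described above (the step size and the reshape are my assumptions):

```python
import torch

x = torch.randn(128, 1, 360)        # 128 original windows of length 360
patches = x.unfold(2, 180, 1)       # sliding windows along dim 2 -> (128, 1, 181, 180)
batch = patches.permute(0, 2, 1, 3).reshape(-1, 1, 180)
# batch is (128 * 181, 1, 180): every sliding window from every original window,
# which can go through the network in one forward pass (or a few large chunks),
# so batch norm sees large, representative batches instead of 128 separate,
# highly correlated ones.
```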
Right, so I fixed that and made it all one single batch. I find that setting the momentum for BN to .9999 helps, but doesn't completely solve the problem (there's still a difference between average train performance using train mode and average train performance using eval). Do you have any suggestions here?
I am getting an issue like this
I am getting an issue like this QQ
Thanks, it really helped me.
I tested my network using model.eval() on one test element and the error was very high.
I tried testing with the same minibatch size as in training, and also testing with a batch size of one without applying eval mode; both of these are better than using the average values learned during training via eval() mode.
Theoretically I can't use it this way and I can't justify it. Any solution, please?