Using model.eval() with batchnorm gives high error #4741

Closed
fsalmasri opened this issue Jan 19, 2018 · 24 comments

@fsalmasri

I tested my network using model.eval() on a single test element and the error was very high.
I also tried testing with the same minibatch size as in training, and testing with a batch size of one without applying eval mode; both give better results than using the running averages learned during training via eval() mode.
Theoretically I shouldn't test this way and I can't justify it. Any solution, please?

@apaszke
Contributor

apaszke commented Jan 19, 2018

Note that we're using GitHub issues for bug reports only, and all questions should be posted on our forums.

@apaszke apaszke closed this as completed Jan 19, 2018
@fsalmasri
Author

It might be a bug in the framework as well.

@apaszke
Contributor

apaszke commented Jan 19, 2018

That's true, but what you describe is often a symptom of user error or a network-specific instability. You're welcome to post it on the forums, but this is definitely not enough information to say it's a bug or to take any steps towards fixing it.

@fsalmasri
Author

It seems like a common issue on the PyTorch forum, but no one is answering people's concerns and experiences.

@soumith
Member

soumith commented Jan 27, 2018

@fsalmasri the same forum thread that you replied to has a working answer that I wrote in September. Here's a link: https://discuss.pytorch.org/t/model-eval-gives-incorrect-loss-for-model-with-batchnorm-layers/7561/3?u=smth

I wrote another comment here today to make it clear to you that this is not a software bug. If your training is non-stationary, you will see this issue.

@RLisfun

RLisfun commented Jul 19, 2018

Always setting the training parameter to True and manually setting the momentum to 0 in eval is a workaround that solves this bug in the software.

Just add:

    if self.training:
        momentum = self.momentum
    else:
        momentum = 0.

in the forward of _BatchNorm found here:
https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/batchnorm.py

and remove the not in self.training or not self.track_running_stats (so that the training argument is effectively always True).

Of course, you also need to replace self.momentum with momentum in the return.
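
For readers who would rather not patch torch/nn/modules/batchnorm.py, here is a minimal sketch of the same idea as a standalone subclass (the class name is mine and the body only mimics what the comment above describes): always pass training=True to F.batch_norm, but use a momentum of 0 in eval so the running statistics are never updated.

import torch.nn as nn
import torch.nn.functional as F

class AlwaysBatchStatsBN1d(nn.BatchNorm1d):
    """Sketch of the workaround above: normalize with batch statistics even in
    eval mode, and freeze the running statistics there by using momentum 0."""

    def forward(self, input):
        self._check_input_dim(input)
        # configured momentum while training, 0 in eval so running stats never change
        momentum = self.momentum if self.training else 0.0
        if momentum is None:
            # momentum=None means "cumulative moving average"; treated as 0 here for simplicity
            momentum = 0.0
        return F.batch_norm(
            input, self.running_mean, self.running_var,
            self.weight, self.bias,
            True,              # always use batch statistics for normalization
            momentum, self.eps)

Swap this in for nn.BatchNorm1d where needed; model.eval() then only affects the other layers (e.g. dropout).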

@Jyouhou

Jyouhou commented Aug 4, 2018

@soumith
Thanks! It worked for me.

In my case, a higher momentum solved the problem. Maybe the default value of 0.1 is not a good one?
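
For anyone who wants to try the same fix: momentum is a constructor argument on the BatchNorm modules, so a higher value can be passed directly (0.5 below is only an illustrative value, not a recommendation).

import torch.nn as nn

# illustrative only: a larger momentum makes the running stats track recent batches more closely
bn = nn.BatchNorm1d(64, momentum=0.5)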

@penguinshin

Is there ever a case where user error would be responsible for a model learning on a single batch in train mode, and the error then being significantly different (worse) when switching to eval mode, even when applied to the same single batch it was trained on?

@ssnl
Collaborator

ssnl commented Oct 17, 2018

@penguinshin Not sure if I understand. Are you saying that you are training on a fixed batch and when switching to eval the error is high? Could you send a small example showing that please? Thank you!

@penguinshin

penguinshin commented Oct 17, 2018 via email

@penguinshin

penguinshin commented Oct 17, 2018

Here's a minimum working example. You may have to run it a few times to get random matrices that work, but it should manifest within a few tries at most.

import torch
import torch.nn as nn

for momentum in [None, .1, 1]:
    bn = nn.BatchNorm1d(1, momentum=momentum)
    x = torch.rand(180, 1, 180)

    bn.train()
    print(f'momentum = {momentum} yields {bn(x).mean()} for TRAIN mode')

    bn.eval()
    print(f'momentum = {momentum} yields {bn(x).mean()} for EVAL mode')

@ssnl
Collaborator

ssnl commented Oct 17, 2018

@penguinshin Hmm, but here you are not running forward enough times for running_mean and running_var to become close to the population mean and variance.

@penguinshin

Well, I'm not sure what your expectation is with regard to convergence, but shouldn't it work in at least the None or 1 case? Even if you run it for 10,000 iterations on the same data, it breaks. The difference is big enough to completely break the models I'm training.

@ssnl
Collaborator

ssnl commented Oct 17, 2018

Okay. It's just that the code you gave is inconsistent with what you said above, and happens to include 0.1 too. No big deal.

You are missing Bessel's correction here: https://en.wikipedia.org/wiki/Bessel%27s_correction. After including that, the results match exactly.

import torch
import torch.nn as nn

for momentum in [None, 1]:
    torch.manual_seed(32)
    bn = nn.BatchNorm1d(1, momentum=momentum)
    x = torch.rand(180, 1, 180)

    print(bn.running_var)
    bn.train()
    print(f'momentum = {momentum} yields {bn(x).mean()} for TRAIN mode')
    print(bn.running_var)

    # undo Bessel's correction: running_var stores the unbiased variance estimate,
    # while train mode normalizes by the biased variance over the 180*180 elements
    bn.running_var.data.mul_(1 - 1 / (180*180))
    bn.eval()
    print(f'momentum = {momentum} yields {bn(x).mean()} for EVAL mode')
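
For the record, the arithmetic behind that correction factor (my own summary of the code above): with a batch of shape 180x1x180 there are n = 180*180 = 32400 elements per channel; running_var stores the unbiased estimate sum((x - mean)^2) / (n - 1), while train mode normalizes by the biased estimate sum((x - mean)^2) / n, so multiplying running_var by (n - 1)/n = 1 - 1/32400 makes eval reproduce the train-mode normalization exactly.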

@ssnl
Collaborator

ssnl commented Oct 17, 2018

This is possibly also the reason why things break if you train on a very small amount of data.

@penguinshin

Thank you. That does fix the minimal working example, although for some reason it doesn't fix the problem in my model's training. So PyTorch applies Bessel's correction by default? Is there a setting for batchnorm such that the inputs are not normalized by their own batch statistics, but rather by the updated running mean/var? Maybe that could be the issue, because I'm still finding that with a simple model built from a single batch norm layer plus a one-dimensional convolution with a single filter, trained and applied on a single batch of data, the error goes up drastically when switching from train to eval mode.

I will try to provide the full training loop but it will take longer to create.

@ssnl
Collaborator

ssnl commented Oct 17, 2018

PyTorch applies Bessel's correction by default

Yes, that's what the BN paper proposes.

inputs are not normalized by their own batch statistics, but rather the updated version of the running mean/var

You would get wrong gradients that way, though. So no.

single batch of data

How many samples do you have in there?

@penguinshin

penguinshin commented Oct 17, 2018

Sorry for the trouble...

It's the same dimensions as the example above, so ultimately I feed in a 180x1x180 tensor. However, I think there is something special I'm doing that might be throwing off the batchnorm; maybe you can tell me if this would have a negative impact:

I originally have a 128 x 1 x 360 window. I loop through each of the 128 windows and create 180 windows of width 180; for example, the first length-360 window is converted into 180 sliding windows of length 180. I run each 180x1x180 window (of the 128) through the net and sum the losses over all 128 before calling a backward pass and running optimizer.step(). Presumably, using train mode means that each of those windows (of the 128) is batch normalized appropriately within one backward pass.

My guess is that each window's high autocorrelation (since the 180 batches are all drawn from one continuous 360 window) produces sharp batch statistics, and maybe that is responsible for the issue, but I don't know for sure. I am going to try setting a lower momentum.

@ssnl
Collaborator

ssnl commented Oct 17, 2018

I run each 180x1x180 window (of the 128) through the net and sum the losses over all 128, before calling a backward pass and running optimizer.step().

@penguinshin Here is your problem. Batch norm (and its backward pass) is not linear in batch size, i.e., f(x) + f(y) != f([x, y]).sum(). Therefore, instead, you should use tensor.unfold (https://pytorch.org/docs/master/tensors.html#torch.Tensor.unfold) to create a batch of the 128 windows and run them through the network in a single pass.
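
A minimal sketch of that suggestion, assuming the 128x1x360 input described above and windows of width 180 with step 1 (net, criterion and target are placeholders; note that with step 1 a length-360 signal actually yields 181 windows per row, not 180):

import torch

x = torch.rand(128, 1, 360)                                # 128 rows, each a length-360 signal
windows = x.unfold(2, 180, 1)                              # size-180 windows along dim 2 -> (128, 1, 181, 180)
batch = windows.permute(0, 2, 1, 3).reshape(-1, 1, 180)    # flatten into one (N, C, L) batch
# out = net(batch)                                         # a single forward pass over all windows
# loss = criterion(out, target)
# loss.backward(); optimizer.step()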

@penguinshin

penguinshin commented Oct 18, 2018 via email

@EmmaRocheteau

I am getting an issue like this

@tingweii

I am getting an issue like this QQ

@longlifedahan

Thanks, it really helped me.
I have been suffering from this problem and tried for a week; the loss in eval and train mode was extremely different.
Now I know what to try.
