Using model.eval() with batchnorm gives high error #4741
Note that we're using GitHub issues for bug reports only, and all questions should be posted on our forums.
It might be a bug in the framework as well.
That's true, but what you said is often a symptom of a user error or a network-specific instability. You're welcome to post it in the forums, but this is definitely not enough information to say that it's a bug or to take any steps towards fixing it.
It seems like a common issue on the PyTorch forums, yet no one is answering people's concerns and experiences.
@fsalmasri The same thread on the forums that you replied to has a working answer that I wrote in September. Here's a link: https://discuss.pytorch.org/t/model-eval-gives-incorrect-loss-for-model-with-batchnorm-layers/7561/3?u=smth I wrote another comment here today to make it clear to you that this is not a software bug. If your training is non-stationary, you will see this issue.
Always setting the training parameter to True and manually setting momentum to 0 on eval is a workaround that solves this bug in the software. Just add … in the forward of …, and remove the … Of course, you also need to replace …
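A minimal sketch of that workaround (my own illustration under assumptions, not the snippet missing from the comment above): after calling model.eval(), put the BatchNorm layers back into training mode so they keep normalizing with per-batch statistics, and zero their momentum so evaluation batches do not update the running estimates.

```python
import torch.nn as nn

def eval_with_batch_stats(model):
    """Evaluate while BatchNorm still normalizes with the current batch's statistics."""
    model.eval()  # dropout and other layers behave as in normal evaluation
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.train()         # keep using per-batch mean/var instead of running stats
            m.momentum = 0.0  # freeze running_mean / running_var updates
    return model
```

Note that this changes what "evaluation" means: the output for a given sample then depends on the other samples in its batch.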
@soumith In my case, a higher momentum solved the problem. Maybe the default value of 0.1 is not a good one?
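For reference, momentum is just a constructor argument on the BatchNorm modules, so trying a larger value is a one-line change (the 0.5 below is an arbitrary illustration, not a recommended setting):

```python
import torch.nn as nn

# running_stat = (1 - momentum) * running_stat + momentum * batch_stat,
# so a larger momentum weights recent batches more heavily (default is 0.1).
bn = nn.BatchNorm2d(64, momentum=0.5)
```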
Is there ever a case where user error would be responsible for a model learning on a single batch in train mode, and then the error being significantly worse when switching to eval mode, even when applied to the same single batch it was trained on?
@penguinshin Not sure if I understand. Are you saying that you are training on a fixed batch and when switching to eval the error is high? Could you send a small example showing that, please? Thank you!
I'll send an example over shortly. But yes, I feed a single batch (the same batch) through a batchnorm layer in train mode until the mean of the batchnorm layer becomes fixed, then switch to eval mode and apply it to the same batch, and I get different results from train mode, even though the reported batchnorm running mean is the same for both the train and eval trials.
Here's a minimum working example. You may have to run it a few times to get random matrices that work, but it should manifest within a few tries at most.
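The snippet itself is not reproduced above, but a rough sketch along the same lines (batch size, iteration count, and momentum here are assumptions, not the original code) is:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 3)                 # one small fixed batch (small n makes the gap visible)
bn = nn.BatchNorm1d(3, momentum=0.1)

bn.train()
for _ in range(2000):                 # let running_mean / running_var converge
    out_train = bn(x)

bn.eval()
out_eval = bn(x)                      # same batch, now normalized with the running stats

print((out_train - out_eval).abs().max())  # clearly non-zero
```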
@penguinshin Hmmm, but here you are not running the forward pass enough times for running_mean and running_var to become close to the population mean and variance, though.
Well, I'm not sure what your expectation is with regard to convergence, but shouldn't it work in at least the None or 1 case? Even if you run it for 10000 iterations on the same data, it breaks. The difference is big enough to completely break the models that I'm training.
Okay. It's just that the code you gave is inconsistent with what you said above, and happens to include … You are missing Bessel's correction here: https://en.wikipedia.org/wiki/Bessel%27s_correction. After including that, the results match exactly.
Possibly also the reason why, if you train on a very small amount of data, things break.
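A small check of that point (my own sketch, not from the thread): the batch itself is normalized with the biased variance, while running_var is updated with the unbiased, Bessel-corrected estimate, so any manual comparison against running_var has to include the n / (n - 1) factor.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3)
bn = nn.BatchNorm1d(3, momentum=1.0)  # momentum=1.0: running stats become exactly the last batch's stats
bn.train()
_ = bn(x)

biased_var = x.var(dim=0, unbiased=False)    # what normalizes the batch itself
unbiased_var = x.var(dim=0, unbiased=True)   # = biased_var * n / (n - 1), with n = 8

print(torch.allclose(bn.running_var, unbiased_var))  # True
print(torch.allclose(bn.running_var, biased_var))    # False
```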
Thank you. That does fix the minimum working example, although for some reason it doesn't fix the problem in my model's training. So PyTorch applies Bessel's correction by default? Is there a setting for batchnorm such that the inputs are not normalized by their own batch statistics, but rather by the updated running mean/var? Maybe that could be the issue, because I'm still finding that when creating a simple model based on a single batchnorm layer plus a one-dimensional convolution with a single filter, trained and applied on a single batch of data, the error goes up drastically when switching from train to eval mode. I will try to provide the full training loop, but it will take longer to create.
Yes, that's what the BN paper proposes.
You would get wrong gradients that way, though. So no.
How many samples do you have in there?
Sorry for the trouble... It's the same dimensions as the example above, so ultimately I feed a 180x1x180 tensor. However, I think there is something special I'm doing that might be throwing off the batchnorm; maybe you can tell me if this would have a negative impact. I originally have a 128 x 1 x 360 window. I loop through each of the 128 windows and create 180 windows of width 180, so for example the first 360-length window would be converted into 180 sliding windows of length 180. I run each 180x1x180 window (of the 128) through the net and sum the losses over all 128 before calling a backward pass and running optimizer.step(). Presumably, using train mode would mean that each of those windows (of the 128) would be batch-normalized appropriately within one backward pass. My guess is that each window's high autocorrelation (since the 180 batches all come from one continuous 360-length window) produces sharp batch statistics, and maybe that is responsible for the issue, but I don't know for sure. I am going to try setting a lower momentum.
@penguinshin Here is your problem. Batch norm (and its backward) is not linear in batch size. I.e., f(x) + f(y) != f([x, y]).sum(). Therefore, instead, you should use tensor.unfold (https://pytorch.org/docs/master/tensors.html#torch.Tensor.unfold) to create a batch of 128 windows and activate them through the network once.
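A sketch of what that suggestion could look like for the shapes described above (the step size and the reshape are my assumptions):

```python
import torch

x = torch.randn(128, 1, 360)        # 128 original windows of length 360
patches = x.unfold(2, 180, 1)       # sliding windows along dim 2 -> (128, 1, 181, 180)
batch = patches.permute(0, 2, 1, 3).reshape(-1, 1, 180)
# batch is (128 * 181, 1, 180): every sliding window from every original window,
# which can go through the network in one forward pass (or a few large chunks),
# so batch norm sees large, representative batches instead of 128 separate,
# highly correlated ones.
```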
Right, so I fixed that and made it all one single batch. I find that setting the momentum for BN to .9999 helps, but doesn't completely solve the problem (there's still a difference between average train performance using train mode and average train performance using eval). Do you have any suggestions here?
I am getting an issue like this
I am getting an issue like this QQ
Thanks, it really helped me.
I tested my network using model.eval() on one test element and the error was very high.
I tried testing with the same minibatch size as in training, and also testing with a batch size of one without applying eval mode; both of these are better than using the average values learned during training via eval() mode.
Theoretically I can't use it this way and I can't justify it. Any solution, please?