Double GPU memory consumption when using the PyTorch built-in AMP module, starting from the second epoch #610

Closed · tkianai opened this issue Aug 3, 2020 · 10 comments · Labels: bug

Comments

tkianai (Contributor) commented Aug 3, 2020

I encountered this problem while running python -m torch.distributed.launch --nproc_per_node 8 train.py --batch-size 128 --data coco.yaml --cfg yolov5s.yaml --weights '' with the latest version of the code.

The printed logs are as follows:

Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
     0/299     3.54G    0.0874    0.0744     0.081    0.2428        10       640: 100%|█████████████████| 925/925 [05:45<00:00,  2.68it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|███████| 40/40 [01:31<00:00,  2.28s/it]
                 all       5e+03    3.63e+04      0.0336     0.00589     0.00874     0.00232

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
     1/299     8.08G   0.07261    0.0743   0.06651    0.2134        37       640: 100%|█████████████████| 925/925 [05:41<00:00,  2.71it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|███████| 40/40 [01:18<00:00,  1.95s/it]
                 all       5e+03    3.63e+04       0.143      0.0414      0.0393      0.0145

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
     2/299     8.08G   0.06912   0.06453   0.06019    0.1938       211       640:   2%|| 15/925 [00:10<05:50,  2.60it/s]

The GPU state is as follows:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:04:00.0 Off |                  N/A |
| 43%   72C    P2   187W / 250W |   8409MiB / 12196MiB |     69%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:06:00.0 Off |                  N/A |
| 49%   80C    P2   207W / 250W |   4083MiB / 12196MiB |     76%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN Xp            Off  | 00000000:07:00.0 Off |                  N/A |
| 48%   78C    P2   235W / 250W |   4083MiB / 12196MiB |     88%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN Xp            Off  | 00000000:08:00.0 Off |                  N/A |
| 51%   83C    P2   252W / 250W |   4083MiB / 12196MiB |     88%      Default |
+-------------------------------+----------------------+----------------------+
|   4  TITAN Xp            Off  | 00000000:0C:00.0 Off |                  N/A |
| 49%   80C    P2   260W / 250W |   4083MiB / 12196MiB |     76%      Default |
+-------------------------------+----------------------+----------------------+
|   5  TITAN Xp            Off  | 00000000:0D:00.0 Off |                  N/A |
| 40%   68C    P2   235W / 250W |   4083MiB / 12196MiB |     73%      Default |
+-------------------------------+----------------------+----------------------+
|   6  TITAN Xp            Off  | 00000000:0E:00.0 Off |                  N/A |
| 49%   81C    P2   225W / 250W |   4083MiB / 12196MiB |     77%      Default |
+-------------------------------+----------------------+----------------------+
|   7  TITAN Xp            Off  | 00000000:0F:00.0 Off |                  N/A |
| 52%   84C    P2   138W / 250W |   4083MiB / 12196MiB |     86%      Default |
+-------------------------------+----------------------+----------------------+
tkianai added the bug label Aug 3, 2020
github-actions bot commented Aug 3, 2020

Hello @tkianai, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook (Open in Colab), Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

  • Cloud-based AI systems operating on hundreds of HD video streams in realtime.
  • Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
  • Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

glenn-jocher (Member) commented

@tkianai I can't comment on this much as I don't often use multi-GPU, but I believe excess device 0 memory consumption has been common for a while. I don't know if this is specific to this repository or to PyTorch DDP in general.

I would probably not classify this as a bug.
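
For context (general PyTorch DDP knowledge, not specific to this repository): excess memory on device 0 often comes from every worker process accidentally allocating something on cuda:0, for example a checkpoint loaded without map_location, or CUDA work done before the process pins itself to its local GPU. A minimal sketch of the two usual mitigations, with placeholder names:

```python
# Minimal sketch, not this repo's code; 'weights.pt' is a placeholder path.
import argparse
import torch

parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)  # set by torch.distributed.launch
args = parser.parse_args()

# 1) Pin each worker to its own GPU before creating any CUDA tensors,
#    so stray .cuda() calls land on the local device instead of cuda:0.
torch.cuda.set_device(args.local_rank)

# 2) When resuming from a checkpoint, map it to the local device (or CPU);
#    torch.load without map_location restores tensors onto the device they
#    were saved from, typically cuda:0, in every worker.
ckpt = torch.load('weights.pt', map_location=f'cuda:{args.local_rank}')
```

Whether either applies here is unverified; the run above starts from --weights '', so no checkpoint is loaded.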

tkianai (Contributor, Author) commented Aug 4, 2020

@glenn-jocher Hi, thanks for your explanation. I just find that AMP (the PyTorch built-in version) does not help much: it's slower, and it saves only a little GPU memory.

My environment: Ubuntu 18.04, PyTorch 1.6.0, CUDA 10.2, TITAN Xp.

Compared to commit 48e15be498ab6f3335743b5df7fbd02edd068160, which does not use mixed-precision training:

0/299     1.77G   0.08957   0.08007   0.08078    0.2504        23       640: 100%|██████████████| 925/925 [05:11<00:00,  2.97it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████| 40/40 [01:25<00:00,  2.14s/it]
                 all       5e+03    3.63e+04      0.0609     0.00288     0.00829     0.00234

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
     1/299     8.31G    0.0737   0.07986   0.06731    0.2209        21       640: 100%|██████████████| 925/925 [05:01<00:00,  3.07it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████| 40/40 [01:20<00:00,  2.00s/it]
                 all       5e+03    3.63e+04       0.126      0.0581      0.0457      0.0168

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
     2/299     8.36G   0.07015   0.07251   0.06002    0.2027       188       640:   3%|| 26/925 [00:09<04:55,  3.04it/s]
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:04:00.0 Off |                  N/A |
| 45%   74C    P2   231W / 250W |   8609MiB / 12196MiB |     73%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:06:00.0 Off |                  N/A |
| 51%   82C    P2    95W / 250W |   5413MiB / 12196MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN Xp            Off  | 00000000:07:00.0 Off |                  N/A |
| 49%   79C    P2    95W / 250W |   5423MiB / 12196MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN Xp            Off  | 00000000:08:00.0 Off |                  N/A |
| 52%   83C    P2   261W / 250W |   5415MiB / 12196MiB |     75%      Default |
+-------------------------------+----------------------+----------------------+
|   4  TITAN Xp            Off  | 00000000:0C:00.0 Off |                  N/A |
| 50%   80C    P2   100W / 250W |   5415MiB / 12196MiB |     83%      Default |
+-------------------------------+----------------------+----------------------+
|   5  TITAN Xp            Off  | 00000000:0D:00.0 Off |                  N/A |
| 41%   70C    P2   240W / 250W |   5413MiB / 12196MiB |     74%      Default |
+-------------------------------+----------------------+----------------------+
|   6  TITAN Xp            Off  | 00000000:0E:00.0 Off |                  N/A |
| 50%   81C    P2    90W / 250W |   5413MiB / 12196MiB |     74%      Default |
+-------------------------------+----------------------+----------------------+
|   7  TITAN Xp            Off  | 00000000:0F:00.0 Off |                  N/A |
| 52%   84C    P2   249W / 250W |   5413MiB / 12196MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

glenn-jocher (Member) commented

@tkianai oh that's odd. In our experiments we found native AMP with 1.6 to use a little less GPU RAM, which allowed us to increase batch size by maybe 10%, giving a bit of a speedup (maybe 5%-10%). We have not tested on multi-GPU, however.

It looks like you see a similar GPU memory reduction to what we saw; perhaps this would allow you to increase your batch size somewhat. I don't know if there is anything you can do about the device 0 memory usage, but that seems to be a separate issue, which may or may not have a solution. You might want to raise an issue on the pytorch repo for that one.
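
For reference, "native AMP" here means the torch.cuda.amp utilities added in PyTorch 1.6. A minimal sketch of the usual autocast + GradScaler pattern (generic, not this repo's exact training loop; the model and data are stand-ins):

```python
import torch

model = torch.nn.Linear(10, 1).cuda()                 # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()                  # scales the loss to avoid FP16 gradient underflow

for x, y in [(torch.randn(4, 10).cuda(), torch.randn(4, 1).cuda())]:  # stand-in data
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                   # forward pass runs in mixed precision
        loss = torch.nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()                     # backward on the scaled loss
    scaler.step(optimizer)                            # unscales grads; skips the step on inf/nan
    scaler.update()
```

As an aside (general hardware knowledge, not from this thread): FP16 speedups come mostly from Tensor Cores on Volta and newer GPUs; Pascal cards such as the TITAN Xp lack them, which may explain why AMP trains slower in the logs above.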

NanoCode012 (Contributor) commented Aug 4, 2020

A bit of excess memory on device 0 should be expected (<1-2 GB). I think this is related to testing being performed only on GPU 0.

I did my own quick test using your command. Numbers taken at the second epoch.

| Branch | GPU 0 (GB) | GPU 1-7 (GB) |
| --- | --- | --- |
| My master branch (Pre 1.6) | 8 | 4 |
| My muti branch (Post 1.6) | 10 | 4 |

@tkianai, I recommend using --sync-bn since the results aren't looking too good.

Edit: See below comment for better visuals. The above table may be skewed.
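
For reference, a minimal sketch of how SyncBatchNorm is typically enabled before wrapping a model in DDP (generic PyTorch pattern, not necessarily what --sync-bn does internally in this repo):

```python
import torch

# Stand-in model already moved to the local device.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.BatchNorm2d(8)).cuda()

# Replace every BatchNorm*d layer with SyncBatchNorm so batch statistics are
# computed across all DDP processes instead of per-GPU mini-batches.
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)

# Then wrap in DDP as usual (requires an initialized process group):
# model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
```

With a per-GPU batch of 16 (128 split across 8 GPUs here), per-GPU BatchNorm statistics can get noisy, which would be consistent with the mAP drop reported above.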

glenn-jocher (Member) commented

One minor cause is the EMA, which is resident on device 0, but this should only be a few hundred MB at most. For v5x, with 90M parameters, even in FP32 that would be 4 bytes * 90M = 360 MB. The rest I'm not really sure about.

Testing runs on device 0, but once the test.py function exits, all the CUDA variables there should be cleared and no longer consume memory. If @tkianai is saying that epoch 0 is different from epoch >0, though, then the call to test.py could be the differentiator. In that sense it is suspect.
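
The 4 bytes * 90M estimate above generalizes to any model; a small sketch (hypothetical helper, not from the repo) for estimating the extra memory an FP32 EMA copy adds:

```python
import torch

def ema_memory_mb(model: torch.nn.Module) -> float:
    """Approximate size of an FP32 EMA copy of `model`, in MB (4 bytes per parameter)."""
    n_params = sum(p.numel() for p in model.parameters())
    return n_params * 4 / 1e6

# For a ~90M-parameter model such as v5x this gives ~360 MB,
# matching the back-of-the-envelope figure above.
```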

tkianai (Contributor, Author) commented Aug 4, 2020

@NanoCode012 Yes, multi-GPU training with the same 300 epochs causes a decline of almost ~0.5 mAP. I only tested this case when training multi-GPU with AMP. --sync-bn would be a good way to improve performance.

NanoCode012 (Contributor) commented Aug 4, 2020

Testing runs on device 0, but once the test.py function exits, all the CUDA variables there should be cleared and no longer consume memory.

Oh I see!

If @tkianai is saying that epoch 0 is different from epoch >0, though, then the call to test.py could be the differentiator. In that sense it is suspect.

On 1.6,

[Screenshots of nvidia-smi output, one per cell: Optimizer | Epoch 1 Train | Epoch 1 Test | Epoch 2 Train | Epoch 2 Test | Epoch 3 Train | Epoch 3 Test]

@glenn-jocher, hope this provides better visualization. @tkianai, do you get the same results?

Edit: Add table

tkianai (Contributor, Author) commented Aug 4, 2020

@glenn-jocher @NanoCode012 Hi, it seems the test stage increases the GPU memory on device 0. I used the --notest flag and this no longer happens from epoch 1.

  • Using AMP (PyTorch built-in)

python -m torch.distributed.launch --nproc_per_node 8 train.py --batch-size 128 --data coco.yaml --cfg yolov5s.yaml --weights '' --notest

The printed logs are as follows:

Starting training for 300 epochs...

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
     0/299      2.9G   0.08759    0.0743   0.08098    0.2429        10       640: 100%|██████████████████| 925/925 [06:14<00:00,  2.47it/s]

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
     1/299     2.94G   0.07263   0.07433   0.06655    0.2135         9       640: 100%|██████████████████| 925/925 [06:03<00:00,  2.55it/s]

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
     2/299     2.94G   0.06974   0.07265    0.0598    0.2022       153       640:   7%|█▍                 | 69/925 [00:28<05:34,  2.56it/s]
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:04:00.0 Off |                  N/A |
| 45%   72C    P2   214W / 250W |   3457MiB / 12196MiB |     79%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:06:00.0 Off |                  N/A |
| 53%   83C    P2   206W / 250W |   3423MiB / 12196MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN Xp            Off  | 00000000:07:00.0 Off |                  N/A |
| 51%   80C    P2   232W / 250W |   3423MiB / 12196MiB |     90%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN Xp            Off  | 00000000:08:00.0 Off |                  N/A |
| 53%   83C    P2   213W / 250W |   3423MiB / 12196MiB |     96%      Default |
+-------------------------------+----------------------+----------------------+
|   4  TITAN Xp            Off  | 00000000:0C:00.0 Off |                  N/A |
| 52%   82C    P2   128W / 250W |   3423MiB / 12196MiB |     83%      Default |
+-------------------------------+----------------------+----------------------+
|   5  TITAN Xp            Off  | 00000000:0D:00.0 Off |                  N/A |
| 43%   69C    P2    96W / 250W |   3423MiB / 12196MiB |     87%      Default |
+-------------------------------+----------------------+----------------------+
|   6  TITAN Xp            Off  | 00000000:0E:00.0 Off |                  N/A |
| 53%   83C    P2   157W / 250W |   3423MiB / 12196MiB |     82%      Default |
+-------------------------------+----------------------+----------------------+
|   7  TITAN Xp            Off  | 00000000:0F:00.0 Off |                  N/A |
| 53%   83C    P2   231W / 250W |   3423MiB / 12196MiB |     88%      Default |
+-------------------------------+----------------------+----------------------+
  • Using float32

python -m torch.distributed.launch --nproc_per_node 8 train.py --batch-size 128 --data coco.yaml --cfg yolov5s.yaml --weights '' --notest

The printed logs are as follows:

Starting training for 300 epochs...

     0/299     1.77G   0.08925   0.08011    0.0807    0.2501        23       640: 100%|██████████████████| 925/925 [05:16<00:00,  2.93it/s]

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
     1/299      5.2G   0.07356   0.07901   0.06713    0.2197        33       640: 100%|██████████████████| 925/925 [05:06<00:00,  3.01it/s]

     Epoch   gpu_mem      GIoU       obj       cls     total   targets  img_size
     2/299      5.2G   0.07061   0.07835   0.06183    0.2108       222       640:  10%|█▊                 | 89/925 [00:32<04:31,  3.08it/s]
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:04:00.0 Off |                  N/A |
| 46%   73C    P2   166W / 250W |   5637MiB / 12196MiB |     90%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:06:00.0 Off |                  N/A |
| 53%   83C    P2   211W / 250W |   5603MiB / 12196MiB |     83%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN Xp            Off  | 00000000:07:00.0 Off |                  N/A |
| 51%   80C    P2   212W / 250W |   5413MiB / 12196MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN Xp            Off  | 00000000:08:00.0 Off |                  N/A |
| 53%   83C    P2   194W / 250W |   5413MiB / 12196MiB |     64%      Default |
+-------------------------------+----------------------+----------------------+
|   4  TITAN Xp            Off  | 00000000:0C:00.0 Off |                  N/A |
| 52%   82C    P2   259W / 250W |   5415MiB / 12196MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   5  TITAN Xp            Off  | 00000000:0D:00.0 Off |                  N/A |
| 44%   70C    P2   221W / 250W |   5413MiB / 12196MiB |     77%      Default |
+-------------------------------+----------------------+----------------------+
|   6  TITAN Xp            Off  | 00000000:0E:00.0 Off |                  N/A |
| 53%   82C    P2   263W / 250W |   5413MiB / 12196MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   7  TITAN Xp            Off  | 00000000:0F:00.0 Off |                  N/A |
| 57%   84C    P2   285W / 250W |   5413MiB / 12196MiB |     83%      Default |
+-------------------------------+----------------------+----------------------+

It reduces the GPU memory from ~5400 MiB to ~3400 MiB, which works well. But it looks like training takes a little longer (5 minutes to 6 minutes per epoch).
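
If the extra device 0 memory really does come from the test pass, one generic mitigation (an assumption to try, not something verified in this thread) is to drop references to test-time tensors and return PyTorch's cached blocks to the driver once evaluation finishes, so nvidia-smi reflects the release:

```python
import gc
import torch

def run_test_then_release(test_fn, *args, **kwargs):
    """Run an evaluation routine, then release cached CUDA memory.

    `test_fn` is a placeholder for whatever per-epoch evaluation is called
    (e.g. this repo's test.py entry point).
    """
    with torch.no_grad():          # no autograd graphs are kept during eval
        results = test_fn(*args, **kwargs)
    gc.collect()                   # drop lingering Python references to eval tensors
    torch.cuda.empty_cache()       # return cached blocks so nvidia-smi shows the drop
    return results
```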

glenn-jocher (Member) commented Aug 6, 2020

@tkianai @NanoCode012 I'm updating this issue with comments from a recent PR.

For the GPU memory issue that was mentioned in #610, tkianai and I did test and found out that it spiked after the epoch 1 test (#610 (comment) and #610 (comment)). Specifically, the issue lies in test, as --notest removes the issue.

I am not sure why it does not happen for you. Maybe it is because your maximum memory is 16 GB and it is already training at 14 GB (near its maximum)?

I will run my own test with 2 GPU as comparison. I think this should be in a separate Issue.

Edit: Add table

Command

python -m torch.distributed.launch --nproc_per_node 2 train.py --batch-size 64 --data coco.yaml --cfg yolov5s.yaml --weights ''

| GPU | Train 1 | Train 3 | Test 5 |
| --- | --- | --- | --- |
| GPU 0 | 6145MiB | 6483MiB | 6483MiB |
| GPU 1 | 5965MiB | 6295MiB | 6295MiB |

Edit 2: Due to losing track of time, I did not get to track down the in-betweens; however, we see that it doesn't spike as badly. I am confused why it spiked before. Was it because of 8 GPUs?

Originally posted by @NanoCode012 in #500 (comment)

And my own observation was no additional GPU memory consumption on device 0 when using 2-GPU training, during either training or testing, on epoch 0 or any subsequent epochs:

nvidia-smi from epoch 0 onward, to test the high device 0 memory usage issue.
python -m torch.distributed.launch --nproc_per_node 2 train.py --img 640 640 --batch 32 --cfg yolov5x.yaml --weights '' --bucket ult/coco --name $n --data coco.yaml

Epoch 0 train:

glenn_jocher_ultralytics_com@instance-7:~$ nvidia-smi
Thu Aug  6 17:50:26 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   54C    P0   210W / 300W |  15020MiB / 16130MiB |     84%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   54C    P0   223W / 300W |  14708MiB / 16130MiB |     88%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1806      C   /opt/conda/bin/python                      15009MiB |
|    1      1807      C   /opt/conda/bin/python                      14697MiB |
+-----------------------------------------------------------------------------+

Epoch 0 test:

glenn_jocher_ultralytics_com@instance-7:~$ nvidia-smi
Thu Aug  6 18:19:45 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   50C    P0    84W / 300W |  14172MiB / 16130MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   40C    P0    56W / 300W |  14006MiB / 16130MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1806      C   /opt/conda/bin/python                      14159MiB |
|    1      1807      C   /opt/conda/bin/python                      13995MiB |
+-----------------------------------------------------------------------------+

Epoch 1 train:

glenn_jocher_ultralytics_com@instance-7:~$ nvidia-smi
Thu Aug  6 18:21:59 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   55C    P0   277W / 300W |  14996MiB / 16130MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   55C    P0   263W / 300W |  14600MiB / 16130MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1806      C   /opt/conda/bin/python                      14981MiB |
|    1      1807      C   /opt/conda/bin/python                      14589MiB |
+-----------------------------------------------------------------------------+

Originally posted by @glenn-jocher in #500 (comment)
