[WinError 1455] The paging file is too small for this operation to complete. Error loading "C:\ProgramData\Anaconda3\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies #1643
I solved the issue by changing nw to 1.
In the code: nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
If nw = 8, eight CPU cores take part in data loading, which needs a lot of RAM.
So it works if we decrease nw to 1.
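For context, here is a small runnable sketch of the worker-count heuristic being described (the names mirror the quoted snippet; batch_size is just an example value, and capping nw at 1 is the workaround applied here):

```python
import os

batch_size = 16  # example value

# Heuristic from the quoted snippet: up to 8 dataloader workers,
# bounded by the CPU count and the batch size.
nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])

# Workaround for WinError 1455: every worker is a separate process that
# re-imports torch and its CUDA DLLs, so each extra worker adds to the
# page-file commit charge. Capping the value keeps that overhead down.
nw = min(nw, 1)

print(f"num_workers = {nw}")  # pass this as num_workers to the DataLoader
```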
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Well, I managed to resolve this.
Open "Advanced system settings". Go to the Advanced tab, then click Settings under Performance.
Again click the Advanced tab --> Change --> unselect "Automatically manage paging file size for all drives". For all the drives, set "System managed size". Restart your PC.
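For anyone who wants to verify the effect of this change, here is a small Windows-only sketch (standard library only, written for this thread rather than taken from YOLOv3) that prints the current commit limit, i.e. physical RAM plus page file, before and after the page-file settings are adjusted:

```python
import ctypes
import ctypes.wintypes as wt

# Layout of MEMORYSTATUSEX as documented for GlobalMemoryStatusEx (Win32).
class MEMORYSTATUSEX(ctypes.Structure):
    _fields_ = [
        ("dwLength", wt.DWORD),
        ("dwMemoryLoad", wt.DWORD),
        ("ullTotalPhys", ctypes.c_ulonglong),
        ("ullAvailPhys", ctypes.c_ulonglong),
        ("ullTotalPageFile", ctypes.c_ulonglong),  # commit limit: RAM + page file
        ("ullAvailPageFile", ctypes.c_ulonglong),  # commit space still available
        ("ullTotalVirtual", ctypes.c_ulonglong),
        ("ullAvailVirtual", ctypes.c_ulonglong),
        ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
    ]

status = MEMORYSTATUSEX()
status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status))

gib = 1024 ** 3
print(f"commit limit:     {status.ullTotalPageFile / gib:.1f} GiB")
print(f"commit available: {status.ullAvailPageFile / gib:.1f} GiB")
```

If the available commit space is only a few GiB, loading the CUDA DLLs in several worker processes can exhaust it and trigger WinError 1455.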
> Well, I managed to resolve this.
> Open "Advanced system settings". Go to the Advanced tab, then click Settings under Performance.
> Again click the Advanced tab --> Change --> unselect "Automatically manage paging file size for all drives". For all the drives, set "System managed size". Restart your PC.
This works, but only temporarily. Nowadays I am facing a crash after a few hours of training. It usually happens at the beginning of an epoch, during data loading.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Program Files\Python37\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Program Files\Python37\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\Program Files\Python37\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Program Files\Python37\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="__mp_main__")
File "C:\Program Files\Python37\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "C:\Program Files\Python37\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "C:\Program Files\Python37\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "E:\projects\siamfc\src\train.py", line 13, in <module>
import torch
File "E:\venvs\general\lib\site-packages\torch\__init__.py", line 123, in <module>
raise err
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "E:\venvs\general\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies.
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x0000025934FA8048>
Traceback (most recent call last):
File "E:\venvs\general\lib\site-packages\torch\utils\data\dataloader.py", line 1324, in __del__
self._shutdown_workers()
File "E:\venvs\general\lib\site-packages\torch\utils\data\dataloader.py", line 1291, in _shutdown_workers
if self._persistent_workers or self._workers_status[worker_id]:
AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute '_workers_status'
My environment:
Windows 10
NVidia CUDA 11.1
Python 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:58:18) [MSC v.1900 64 bit (AMD64)] on win32
torch==1.8.0+cu111
torchvision==0.9.0+cu111
numpy==1.19.5
An interesting and reproducible crash happened when I launched the Microsoft Teams application: even MS Teams reported an exception regarding virtual memory, while no other app stopped working. Thus, MS Teams and PyTorch training became "mutually exclusive". After I applied the trick mentioned above, the problem remains only on the PyTorch side, and only sometimes. A lot of ambiguous words, I know, but that's how it is.
I had the same error on Windows 10 today, and following
> Open "Advanced system settings". Go to the Advanced tab, then click Settings under Performance.
> Again click the Advanced tab --> Change --> unselect "Automatically manage paging file size for all drives". For all the drives, set "System managed size". Restart your PC.
didn't help. Then I suddenly remembered that I had installed CUDA 11.7 alongside the already existing CUDA 11.3 and 11.2 versions, and that I had moved the lib and libnvvp path variables up in the system variables at that time. So I decided to install the packages (related to CUDA 11.7) with conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia, reversed the above-quoted process (re-selected "Automatically..."), restarted the PC, and now it's working well.
@glenn-jocher Unfortunately I don't think this is something that can be fixed within yolov5.
This is an issue with the CUDA and PyTorch DLLs. My 'fix' just changes some flags on the DLLs to make them allocate less memory. Fixing the flags on the CUDA DLLs (e.g. cusolver64_*.dll in the CUDA release) would likely be a job for NVIDIA. Perhaps PyTorch could help somewhat as well, since they also package some of these DLLs (e.g. caffe2_detectron_ops_gpu.dll)... although they use NVIDIA tools to build them, so the blame probably falls back on NVIDIA.
Even with my changes to these flags, these DLLs still reserve a whole lot more memory than they actually use. I don't know who is to blame, and since my flag changes got me going, I'm not digging further into it.
Edit: I went ahead and submitted the info as a bug report to NVIDIA. Whether anything happens with it, or whether any of the appropriate people at NVIDIA ever see it, who knows? But maybe they'll pick it up and do something about it.
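For anyone curious what that kind of flag change can look like in practice: the widely shared community workaround for this error rewrites the PE section flags of the CUDA/PyTorch DLLs so that their large .nv_fatb sections are no longer marked writable, which stops Windows from reserving page-file commit for them. The sketch below uses the third-party pefile package; the section name and the idea of clearing the write flag come from that community workaround rather than from this thread, so treat it as illustrative and back up any DLL before patching it.

```python
# Hypothetical sketch of the community PE-flag patch (not an official fix).
# Requires: pip install pefile. Always patch a copy, never your only original.
import pefile

def strip_write_flag(dll_path: str) -> None:
    pe = pefile.PE(dll_path)
    write_flag = pefile.SECTION_CHARACTERISTICS["IMAGE_SCN_MEM_WRITE"]
    changed = False
    for section in pe.sections:
        name = section.Name.rstrip(b"\x00").decode(errors="ignore")
        if name == ".nv_fatb" and section.Characteristics & write_flag:
            # The embedded fat binary is never modified at run time, so the
            # writable flag only forces Windows to reserve commit space for it.
            section.Characteristics &= ~write_flag
            changed = True
    if changed:
        pe.write(dll_path + ".patched")  # writes a patched copy next to the original

# Example call using the DLL path from the issue title:
# strip_write_flag(r"C:\ProgramData\Anaconda3\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll")
```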
Hello, this problem may be solved now!
My environment:
torch 1.13.1+cu117
torchvision 0.14.1+cu117
cuda: 11.8
cudnn: 8.8.1.3_cuda11
or:
cuda: 12.1
cudnn: 8.8.1.3_cuda12
I use yolov5-6.2 with --batch-size 16 --workers 16, and the virtual memory it needs is much less than before (it needed more than 100 GB before).
Why do I use torch 1.13.1+cu117 with CUDA 11.8?
Actually I tried torch 2.0+cu118 with CUDA 11.8 (or CUDA 12.1), but something went wrong with AMP, so I switched to torch 1.13.1+cu117 first, and it works (CUDA 11.8 and CUDA 12.1 both work), so I don't want to try CUDA 11.7 any more.
1. Try reducing num_workers to 1 or 0. 2. Try setting batch-size to 2 or 1. Hope this helps.
It works, but the entire training process became too slow. Is there a better way to solve this? I wasted two days on this. Thank you.
There is no better solution.
This issue is related to your computer's performance. If you would like to speed up the training, you have to improve your hardware: for example, add more memory, use a better GPU, or use a server-class CPU.
@ardeal hi there! It seems you've already tried the recommended solutions. As for improving speed, upgrading your hardware such as increasing memory, using a stronger GPU, or leveraging a server CPU may help expedite the training process. If you have further queries or need additional assistance, feel free to ask.
It is not really related to computer performance, but rather to the fact that:
1. memory management on PyTorch + Windows is poor;
2. the Ultralytics dataloader is constantly leaking memory;
3. Python multithreading has its own problems, and there are various things you can do to mitigate them (like using NumPy arrays or torch tensors) which are not done in the Ultralytics dataloader, hence point 2.
Even on Linux it will slowly eat up all of your memory and any swap partition you have until it grinds training to a halt. The good thing on Linux is that you can just let the OOM killer act and resume the training (though that is not an option on large datasets, which will still leak memory into oblivion). But on Windows the only solution is to clear pagefile.sys with a hard reboot.
@siddtmb hi! Thanks for your insights. Memory management, particularly in a Windows environment, can indeed introduce challenges. We're continuously working on improving the efficiency of our data loader and overall memory usage within YOLOv3 and appreciate your feedback.
For mitigating memory leaks or high memory usage issues:
Ensuring the latest version of PyTorch is used can sometimes alleviate memory management issues, as improvements and bug fixes are regularly released.
Experimenting with reducing --num-workers and --batch-size in your training command may provide immediate relief from memory pressure, though at the expense of training speed.
Utilizing torch.utils.data.DataLoader with pin_memory=True and carefully managing tensor operations can help in some situations (see the sketch below).
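To make that last point concrete, here is a minimal, self-contained sketch (the tensor dataset is a stand-in rather than the YOLOv3 dataloader) of a DataLoader configured with a small worker count, pin_memory=True, and persistent workers, which together keep Windows page-file pressure down while still overlapping data loading with training:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; replace with the real training dataset.
images = torch.zeros(64, 3, 64, 64)
labels = torch.zeros(64, dtype=torch.long)
dataset = TensorDataset(images, labels)

loader = DataLoader(
    dataset,
    batch_size=8,
    shuffle=True,
    num_workers=2,            # keep low on Windows: each worker re-loads the CUDA DLLs
    pin_memory=True,          # page-locked host memory speeds up host-to-GPU copies
    persistent_workers=True,  # reuse workers across epochs instead of re-spawning them
)

for imgs, targets in loader:
    # With pinned memory, .to(device, non_blocking=True) can overlap the copy.
    pass
```

persistent_workers is an extra, optional knob (available in recent PyTorch versions, and only valid when num_workers > 0) that avoids re-spawning worker processes, and therefore re-importing torch, at every epoch; it is not something mentioned earlier in the thread, just a related setting.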
We recognize the importance of efficient memory usage and are committed to making improvements. Contributions and pull requests are always welcome if you have suggestions or optimizations to share with the community. Your feedback is valuable in guiding those efforts. Thank you for bringing this to our attention.
Activity
tufail117 commented on Feb 20, 2021
Any update on this? I am also facing the same issue. I have tried many things over the last 3 days, but with no success.
PonyPC commented on Jun 9, 2021
Reducing the number of workers will also significantly reduce training speed.
krisstern commented on Sep 29, 2021
I was having the same error thrown with yolov5, which was fixed by changing the number of workers nw to 4 manually in the "datasets.py" file.
glenn-jocher commented on Sep 29, 2021
@ardeal @krisstern @PonyPC you can set dataloader workers during training, i.e.:
https://github.com/ultralytics/yolov5/blob/76d301bd21b4de3b0f0d067211da07e6de74b2a0/train.py#L454
It seems like a lot of windows users are encountering this problem, but as @PonyPC mentioned reducing workers will generally also result in slower training. Are you guys encountering this during DDP or single-GPU training?
EDIT: just realized this is YOLOv3 repo and not YOLOv5. I would strongly encourage all users to migrate to YOLOv5, which is much better maintained. It's possible this issue is already resolved there.
glenn-jocher commented on Nov 24, 2022
@szan12 i.e. python train.py --workers 4
francescobodria commented on May 24, 2023
I solved it by increasing the page file limit of Windows.
glenn-jocher commented on Nov 9, 2023
Thank you for sharing your solution. It's great to hear that increasing the page file limit of Windows helped in resolving the issue. It seems that managing the page file size effectively contributed to stability during the training process. If you encounter any more issues or have further questions, feel free to reach out.