Status: Open
Labels: high priority, module: dataloader (Related to torch.utils.data.DataLoader and Sampler), module: dependency bug (Problem is not caused by us, but caused by an upstream library we use), module: memory usage (PyTorch is using more memory than it should, or it is leaking memory), module: molly-guard (Features which help prevent users from committing common mistakes), module: multiprocessing (Related to torch.multiprocessing), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Description
Editor note: There is a known workaround further down in this issue, which is to NOT use Python lists, but instead to use something else, e.g., torch.tensor directly. See #13246 (comment). You can use a numpy array, but that only fixes the issue for the fork start method. See #13246 (comment) for more details.
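A minimal sketch (not from the issue itself) of the workaround the note above describes, assuming the per-sample metadata is a list of file paths plus integer labels; the class name NoLeakMetadataDataset and the variable names are hypothetical:

import numpy as np
import torch
from torch.utils.data import Dataset

class NoLeakMetadataDataset(Dataset):
    def __init__(self, paths, labels):
        # One contiguous array of fixed-width byte strings instead of millions
        # of small Python objects; this only helps for the 'fork' start method.
        self.paths = np.array(paths, dtype=np.bytes_)
        # Tensors are moved to shared memory by torch.multiprocessing's pickler,
        # so they avoid the copy-on-write growth entirely.
        self.labels = torch.as_tensor(labels, dtype=torch.long)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.paths[idx].decode("utf-8"), self.labels[idx]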
🐛 Bug
CPU memory will leak if the DataLoader num_workers > 0.
To Reproduce
Run the following snippet:
from torch.utils.data import Dataset, DataLoader
from PIL import Image
from torchvision import transforms
import os


class DataIter(Dataset):
    def __init__(self):
        path = "path/to/data"
        self.data = []
        for cls in os.listdir(path):
            for img in os.listdir(os.path.join(path, cls)):
                self.data.append(os.path.join(path, cls, img))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        with Image.open(self.data[idx]) as img:
            img = img.convert('RGB')
            return transforms.functional.to_tensor(img)


train_data = DataIter()
train_loader = DataLoader(train_data, batch_size=300,
                          shuffle=True,
                          drop_last=True,
                          pin_memory=False,
                          num_workers=18)

for i, item in enumerate(train_loader):
    if i % 200 == 0:
        print(i)
Expected behavior
CPU memory will gradually start increasing, eventually filling up the whole RAM. E.g., the process starts with around 15GB and fills up the whole 128GB available on the system.
When num_workers=0, RAM usage is constant.
Environment
PyTorch version: 1.0.0.dev20181028
Is debug build: No
CUDA used to build PyTorch: 9.0.176
OS: Ubuntu 16.04.4 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.5.1
Python version: 3.5
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
Nvidia driver version: 390.67
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.1.4
Versions of relevant libraries:
[pip] Could not collect
[conda] Could not collect
PIL.__version__: '5.3.0'
Additional info
There are around 24 million images in the dataset and all image paths are loaded into a single list as presented in the above code snippet.
I have also tried multiple PyTorch versions (0.4.0 and 0.4.1) and the effect is the same.
cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @ssnl @VitalyFedyunin @ejguan
Activity
ssnl commented on Oct 29, 2018
Do you see memory usage increasing when iterating, or before you even start to iterate?
bfreskura commented on Oct 29, 2018
@ssnl During the iteration only.
ezyang commented on Oct 29, 2018
When we fix #13243 we should check if this one gets fixed too.
samgd commented on Oct 31, 2018
I've been experiencing something similar, where memory usage continuously climbs until an OOM is triggered when using a batch_sampler with num_workers > 0.
To Reproduce
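The collapsed "To Reproduce" and "Environment" details from this comment are not preserved above; the following is a hypothetical sketch of the configuration it describes (a batch_sampler combined with num_workers > 0), where ToyDataset and all sizes are made up:

import torch
from torch.utils.data import BatchSampler, DataLoader, Dataset, RandomSampler

class ToyDataset(Dataset):
    def __len__(self):
        return 1000000

    def __getitem__(self, idx):
        return torch.randn(32)

dataset = ToyDataset()
# batch_size, shuffle and drop_last must not be passed when batch_sampler is given.
loader = DataLoader(
    dataset,
    batch_sampler=BatchSampler(RandomSampler(dataset), batch_size=256, drop_last=True),
    num_workers=4,
)

for step, batch in enumerate(loader):
    if step % 200 == 0:
        print(step, batch.shape)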
Environment
bfreskura commented on Nov 7, 2018
@ezyang The issue is still present in 1.0.0.dev20181105, where #13243 is fixed.
bfreskura commented on Nov 7, 2018
After some more investigation, I have found an exact scenario when the leak occurs. Consider the code example below:
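The collapsed code example from this comment is not preserved above; the following is a hypothetical reconstruction of the setup it describes, with self.data as a plain Python list of ints and self.data_np holding the same values in a NumPy array:

import numpy as np
from torch.utils.data import DataLoader, Dataset

class IntDataset(Dataset):
    def __init__(self, n=24000000, use_numpy=False):
        self.data = list(range(n))                   # plain Python list: leaks
        self.data_np = np.arange(n, dtype=np.int64)  # NumPy array: no leak
        self.use_numpy = use_numpy

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Switching between self.data and self.data_np is the only change
        # needed to make the memory growth appear or disappear.
        return self.data_np[idx] if self.use_numpy else self.data[idx]

loader = DataLoader(IntDataset(use_numpy=False), batch_size=300,
                    shuffle=True, num_workers=18)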
If we use the self.data variable, which is a standard Python list of ints, the memory leak will occur. However, if the self.data_np variable is used, which holds the same data in the form of a NumPy array, the leak will not occur. Another observation is that the leakage is significantly less severe if shuffle=False is set in the DataLoader.
svishnu88 commented on Nov 10, 2018
I face a similar issue, but in my case it occurs with a numpy array too. I am using Python 3.7 and the PyTorch nightly release.
mprostock commented on Dec 8, 2018
I don't know how multiprocessing really works under the hood of PyTorch, but we have extensively discussed this "memory leak" issue (which probably isn't a memory leak!) on the fast.ai forums (https://forums.fast.ai/t/runtimeerror-dataloader-worker-is-killed-by-signal/31277/55?u=marcmuc). Preliminary findings, which hopefully add some insight here (if this does NOT apply, please comment!):
Python multiprocessing: There is no way of storing arbitrary Python objects (even simple lists) in shared memory in Python without triggering copy-on-write behaviour, because refcounts are updated every time something reads from these objects. The refcounts are added memory-page by memory-page, which is why the consumption grows slowly. The processes (workers) will end up having all/most of the memory copied over bit by bit, which is why we get the memory overflow problem. The best description of this behavior is here (SO).
Possible solution:
Using multiprocessing as it works now: for Python multiprocessing to work without these refcount effects, the objects have to be made "compatible with" and wrapped in multiprocessing.Array before the process pool is created and the workers are forked. This supposedly ensures that the memory will really be shared and no copy-on-write happens. This explains how to do it for numpy arrays, and this explains the reasoning behind it again. Don't get confused by some false statements, even by the authors of these good answers, claiming that copy-on-write makes all of this unnecessary, which is not true. One comment also points to this:
I am not familiar with the torch.multiprocessing drop-in replacement that I understand PyTorch uses, but I would assume it will also not be able to remove the core Python refcount issue.
soumith commented on Dec 9, 2018
@mprostock torch.multiprocessing is simply Python multiprocessing with a custom pickler. The custom pickler, whenever it encounters a torch.tensor, will automatically move it to shared memory, and hence, at least for torch.tensor objects, no copy-on-write happens.
mprostock commented on Dec 10, 2018
Thanks for the explanation! I have experimented with @bfreskura's reproduction example and I think I can now pinpoint the problem:
The reproduction example by bfreskura above showed the difference between a regular Python list and a numpy array. But the problem is not (only) the Python list itself; the same happens with a numpy array of type object. Python lists store only references to the objects; the objects themselves are kept separately in memory. Every object has a refcount, therefore every item in the list has a refcount.
Numpy arrays (of standard np types) are stored as contiguous blocks in memory and are only ONE object with one refcount.
This changes if you make the numpy array explicitly of type object, which makes it start behaving like a regular Python list (only storing references to (string) objects). The same "problems" with memory consumption now appear.
This would explain why, with regular lists (or numpy arrays of type object), we see the "memory leak", which is actually the copy-on-access problem of forked Python processes due to changing refcounts, not a memory leak.
So the problem probably (often) has nothing to do with tensors or actual torch objects, but rather with the lists of filenames and dicts of labels that are generally used within dataloaders/datasets.
I have created a notebook gist if someone wants to quickly try it.
Look at the memory consumption (a quick and dirty measurement of total system memory, so there are minor influences from other processes; I tried to keep the system clean):
Memory consumption in GB with fixed-length string array: (chart)
Memory consumption in GB with object array (only change!): (chart)
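To see concretely why an object-dtype array behaves like a Python list, here is a tiny inspection sketch (an assumption on my part, not the linked gist); the paths list is made up:

import sys
import numpy as np

paths = ["/data/cls_%d/img_%09d.jpg" % (i % 10, i) for i in range(1000)]

obj_arr = np.array(paths, dtype=object)  # stores one PyObject* per element
str_arr = np.array(paths, dtype="S")     # one contiguous fixed-width byte buffer

print(obj_arr.dtype, str_arr.dtype)   # object vs |S... dtype
print(obj_arr[0] is paths[0])         # True: the array points at the same Python strings
print(sys.getrefcount(paths[0]))      # every access from a forked worker bumps this refcount
print(isinstance(str_arr[0], bytes))  # True: elements are copied out of the single buffer
print(str_arr.nbytes)                 # the whole array lives in one block with one refcount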
aurooj commented on Jan 15, 2019
I am facing the same issue. It fills up my RAM very fast if num_workers > 0.
I am deleting the variables which I feel are no longer needed in my code, and I also call gc.collect() on every iteration, but nothing helps.
Any workarounds?
NProkoptsev commented on Jan 18, 2019
Switching from dict to pandas and from lists to numpy arrays helps me
(312 remaining comments not shown)