I encountered the very same issue, and after spending a day trying to marry the PyTorch DataParallel loader wrapper with HDF5 via h5py, I discovered that it is crucial to open the h5py.File inside the new process, rather than opening it in the main process and hoping it gets inherited by the underlying multiprocessing implementation.
Activity
soumith commented on Sep 21, 2018
closing as duplicate of #11887 and #11928
h5py doesn't allow reading from multiple processes:
https://github.com/h5py/h5py/blob/master/examples/multiprocessing_example.py#L17-L21
yunyundong commented on Sep 21, 2018
I do not think so. We have found a solution: https://gist.github.com/bkj/f448025fdef08c0609029489fa26ea2a#file-h5py-error-py
If we use it like this, is it right? @soumith
yunyundong commented on Sep 21, 2018
This is the answer to the problem. I modified the code, and it works well. Can you explain more about it? Thank you in advance. @soumith
rs9899 commented on Apr 9, 2019
Can you please update the link? It is not working, and I am in need of the same for my project.
Thanks
rs9899 commented on Apr 9, 2019
Problem solved
https://gist.github.com/bkj/f448025fdef08c0609029489fa26ea2a
It seemed like a minor link issue.
alexisdrakopoulos commented on Jun 10, 2020
This does not seem to be working for me at least.
rs9899 commented on Jun 10, 2020
Can you elaborate more on the issue?
airsplay commented on Jun 25, 2020

Solution

This issue can be solved, and the solution is simple: do not open the hdf5 file in `__init__`; open it lazily, at the first data iteration. Here is an illustration:
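The illustration's code block did not survive in this copy of the thread. Below is a minimal sketch of the lazy-open pattern; the file name `data.h5`, the dataset key `"data"`, and the `length` constructor parameter are illustrative assumptions, not taken from the original comment:

```python
import h5py
import torch
from torch.utils.data import Dataset, DataLoader

class H5Dataset(Dataset):
    def __init__(self, h5_path, length):
        self.h5_path = h5_path
        self.length = length  # known in advance; see the __len__ discussion below
        self.file = None      # do NOT open the hdf5 file here

    def __getitem__(self, index):
        # Open lazily on the first access, so each worker process
        # creates its own private file handle after the fork.
        if self.file is None:
            self.file = h5py.File(self.h5_path, "r")
        # Assumes each row of "data" is an array, not a scalar.
        return torch.from_numpy(self.file["data"][index])

    def __len__(self):
        return self.length

# With the lazy open, multiple workers are safe:
loader = DataLoader(H5Dataset("data.h5", length=10_000), batch_size=32, num_workers=4)
```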
Then the dataloader with `num_workers` > 1 can just be used normally.

Explanation
The multi-processing actually happens when you create the data iterator (e.g., when calling `for datum in dataloader:`):

pytorch/torch/utils/data/dataloader.py, lines 712 to 720 in 461014d
In short, it creates multiple processes which "copy" the state of the current process. Thus, if we open the hdf5 file at the first data iteration, each subprocess gets its own dedicated file object.

If you instead create an hdf5 file object in `__init__` and set `num_workers > 0`, it might cause two issues:

1. The writing behavior is non-deterministic. (We do not need to write to hdf5, so this issue can be ignored here.)
2. The state of the hdf5 file object is copied, which might not faithfully reflect its current state.

Opening the file lazily, as above, bypasses both issues.
kfeeeeee commented on Jul 6, 2020
This works very well. I am just wondering if there is any way to call a destructor when a worker exits, e.g., to close the hdf5 file properly. Do you know how to do that?
airsplay commented on Jul 6, 2020

Great to know that it works! If you want to explicitly close the hdf5 file, you could add a `__del__` method to the dataset:
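The original snippet is missing from this copy of the thread; here is a minimal sketch, assuming the handle is stored as `self.file` as in the illustration above:

```python
class H5Dataset(Dataset):
    # ... __init__ / __getitem__ / __len__ as above ...

    def __del__(self):
        # Close this worker's hdf5 handle, if it was ever opened.
        if getattr(self, "file", None) is not None:
            self.file.close()
```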
However, this destructor does not need to be defined explicitly, since the worker sub-processes are shut down when the iterator ends:
pytorch/torch/utils/data/dataloader.py, lines 1091 to 1092 in 461014d
so the Python interpreter and the OS will correctly close hdf5 files (i.e., free their resources) that were opened inside the sub-processes. If a process exits normally, Python closes the hdf5 file as the process ends. Otherwise (i.e., if the process crashes), the OS reclaims the resources, so no side effects remain.
MustafaMustafa commented on Jul 9, 2020
@airsplay
Yes, your solution works. Thank you!
RSKothari commented on Jul 15, 2020

@airsplay Love it. The only downside is that `__len__` needs to be defined in advance. How do you propose handling that? Actually, I managed to figure that out with a slightly hacky H5 read-and-close operation within `__init__`. This solution is smart. Love it and bookmarked.

airsplay commented on Jul 15, 2020
A good point! A previous solution posted by others (I could not find the original link, TAT) mentions the `with` statement, which might be appropriate here:
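The referenced snippet was also lost in this copy; here is a sketch of the idea, reusing the hypothetical `data.h5`/`"data"` layout from the example above. The file is opened briefly under a `with` statement inside `__init__` just to read the length, so no handle survives into the worker fork:

```python
class H5Dataset(Dataset):
    def __init__(self, h5_path):
        self.h5_path = h5_path
        self.file = None
        # The with-block opens the file just long enough to read metadata
        # and guarantees the handle is closed again before any DataLoader
        # workers are spawned, so nothing stale is inherited.
        with h5py.File(h5_path, "r") as f:
            self.length = f["data"].shape[0]

    def __len__(self):
        return self.length
```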
pengzhi1998 commented on Jun 25, 2024

Hi all! Thank you for your help!! However, since this was six years ago, the solution is not working on my side: the `TypeError: h5py objects cannot be pickled` problem still exists even after I open the hdf5 file only in the `__getitem__` method. May I have your suggestions on that?