Closed
Description
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
assignee = None
closed_at = <Date 2020-01-19.18:17:44.745>
created_at = <Date 2015-07-18.02:59:28.391>
labels = ['extension-modules', '3.8', 'type-bug', '3.7', 'expert-IO']
title = 'open().write() and .read() fails on 2 GB+ data (OS X)'
updated_at = <Date 2020-01-19.18:17:57.633>
user = 'https://github.com/lebigot'
bugs.python.org fields:
activity = <Date 2020-01-19.18:17:57.633>
actor = 'zach.ware'
assignee = 'none'
closed = True
closed_date = <Date 2020-01-19.18:17:44.745>
closer = 'zach.ware'
components = ['Extension Modules', 'IO']
creation = <Date 2015-07-18.02:59:28.391>
creator = 'lebigot'
dependencies = []
files = ['39960', '44021', '44024', '45177', '45178']
hgrepos = []
issue_num = 24658
keywords = ['patch']
message_count = 32.0
messages = ['246878', '246879', '246979', '246983', '246985', '246987', '246993', '246994', '246999', '247007', '247122', '256882', '272030', '272044', '278672', '278724', '279132', '279159', '294113', '294122', '294160', '294195', '327912', '327916', '327918', '327940', '330259', '330260', '330262', '335566', '335569', '360264']
nosy_count = 11.0
nosy_names = ['barry', 'ronaldoussoren', 'vstinner', 'lebigot', 'ned.deily', 'zach.ware', 'matrixise', 'Mali Akmanalp', 'Ian Carroll', 'Harry Li', 'miss-islington']
pr_nums = ['1705', '9936', '9937', '9938', '10657', '10658', '10659']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue24658'
versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']
Activity
lebigot commented on Jul 18, 2015
On OS X, the Homebrew and MacPorts versions of Python 3.4.3 raise an exception when writing a 4 GB bytearray:
This has an impact on pickle, in particular (http://stackoverflow.com/questions/31468117/python-3-can-pickle-handle-byte-objects-larger-than-4gb).
lebigot commented on Jul 18, 2015
PS: I should have written "2 GB" bytearray (so this looks like a signed 32-bit integer issue).
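To illustrate the signed 32-bit hypothesis, here is a minimal sketch of the arithmetic. It only shows that a 2 GiB byte count does not fit in a signed 32-bit integer; it is not a claim about the actual kernel code path. The names `INT_MAX` and `TWO_GIB` are defined here for illustration.

```python
import ctypes

INT_MAX = 2**31 - 1      # largest value a signed 32-bit integer can hold
TWO_GIB = 2 * 1024**3    # 2 GiB expressed in bytes

# A 2 GiB length is exactly one more than INT_MAX.
fits_in_int32 = TWO_GIB <= INT_MAX

# ctypes integer types do no overflow checking, so the value wraps around.
wrapped = ctypes.c_int32(TWO_GIB).value

print(fits_in_int32)  # False
print(wrapped)        # -2147483648
```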
Title changed from "open().write() fails on 4 GB+ data (OS X)" to "open().write() fails on 2 GB+ data (OS X)"
ronaldoussoren commented on Jul 20, 2015
This is likely a platform bug: it fails with os.write as well. Interestingly enough, file.write works fine on Python 2.7 (which uses stdio), which apparently works around this kernel misfeature.
A possible partial workaround is to recognise this error in the implementation of os.write and then perform a partial write. The problem is that, while write(2) is documented as possibly writing less data than requested, most users writing to normal files (as opposed to sockets) probably don't expect that behavior. On the other hand, os.write already limits writes to INT_MAX on Windows (see _Py_write in Python/fileutils.c).
Because of this I'm in favour of adding a similar workaround on OSX (and can provide a patch).
BTW, the manpage for write says that writev(2) might fail with EINVAL:
I wouldn’t be surprised if write(2) is implemented using writev(2) and that this explains the problem.
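For callers of os.write, a partial write can be handled with a simple loop. The following is a minimal sketch, not the patch discussed here: `write_all` and `CHUNK_LIMIT` are hypothetical names, and the cap of 2**31 - 1 bytes per call is an assumption that mirrors the INT_MAX limit mentioned above.

```python
import os

# Assumption: keep each os.write() call at or below INT_MAX bytes.
CHUNK_LIMIT = 2**31 - 1


def write_all(fd, data, chunk_limit=CHUNK_LIMIT):
    """Write all of `data` (bytes or bytearray) to `fd`, looping over partial writes."""
    view = memoryview(data)  # slicing a memoryview avoids copying the data
    total = 0
    while total < len(view):
        # os.write() returns the number of bytes actually written,
        # which may be fewer than the number requested.
        written = os.write(fd, view[total:total + chunk_limit])
        if written == 0:
            raise OSError("os.write() wrote 0 bytes")
        total += written
    return total
```

With a small `chunk_limit`, the same loop can be exercised on a temporary file without allocating gigabytes of data.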
ronaldoussoren commented on Jul 20, 2015
The attached patch is a first stab at a workaround. It will unconditionally limit the write size in os.write to INT_MAX on OSX.
I haven't tested yet if this actually fixes the problem mentioned on stack overflow.
lebigot commented on Jul 20, 2015
Thank you for looking into this, Ronald.
What does your patch do, exactly? Does it only limit the returned byte count, or does it really limit the size of the data written by truncating it?
In any case, it would be very useful to have a warning from the Python interpreter. If the data is truncated, I would even prefer an explicit exception (e.g. "data too big for this platform (>= 2 GB)"), along with an explicit mention of it in the documentation. What do you think?
ronaldoussoren commented on Jul 20, 2015
The patch limits os.write to writing at most INT_MAX bytes on OSX. Buffered I/O using open("/some/file", "wb") should still write all data (at least according to the limited tests I've done so far).
The same limitation is already present on Windows.
And as I wrote before: os.write may, according to the manpage for write(2), already write fewer bytes than requested.
I'm -1 on using an explicit exception or printing a warning about this.
lebigot commented on Jul 20, 2015
I see, thanks.
This sounds good to me too: no need for a warning or exception, indeed, since file.write() should work and the behavior of os.write() is documented.
vstinner commented on Jul 20, 2015
The Windows limit to INT_MAX is in many functions:
In the default branch, there is now _Py_write(), so only one place should be fixed.
See the issue bpo-11395 which fixed the bug on Windows.
If it's a bug, it should be fixed on Python 2.7, 3.4, 3.5 and default branches.
ronaldoussoren commented on Jul 20, 2015
The patch I attached earlier is for the default branch. More work is needed for the other active branches.
vstinner commented on Oct 17, 2018
New changeset 74a8b6e by Victor Stinner (Stéphane Wirtel) in branch 'master':
bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705)
74a8b6e
vstinner commented on Oct 17, 2018
New changeset a5ebc20 by Victor Stinner (Stéphane Wirtel) in branch '3.6':
[3.6] bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705) (GH-9937)
a5ebc20
miss-islington commented on Oct 18, 2018
New changeset 178d1c0 by Miss Islington (bot) in branch '3.7':
bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705)
178d1c0
vstinner commented on Nov 22, 2018
New changeset 9a0d7a7 by Victor Stinner in branch 'master':
bpo-24658: os.read() reuses _PY_READ_MAX (GH-10657)
9a0d7a7
miss-islington commented on Nov 22, 2018
New changeset 18f3327 by Miss Islington (bot) in branch '3.7':
bpo-24658: os.read() reuses _PY_READ_MAX (GH-10657)
18f3327
miss-islington commented on Nov 22, 2018
New changeset 0c15e50 by Miss Islington (bot) in branch '3.6':
bpo-24658: os.read() reuses _PY_READ_MAX (GH-10657)
0c15e50
warsaw commented on Feb 14, 2019
Nosying myself since I just landed here based on an internal $work bug report. We're seeing it with reads. I'll try to set aside some work time to review the PRs.
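On interpreters without the fix, reads of 2 GB or more can be split into smaller calls. The following is a minimal sketch of that idea, not the code from the PRs: `read_up_to` is a hypothetical helper, and the default cap of 2**31 - 1 bytes per os.read call is an assumption based on the INT_MAX limit discussed in this issue.

```python
import os


def read_up_to(fd, size, chunk_limit=2**31 - 1):
    """Read up to `size` bytes from `fd`, never asking os.read() for more than `chunk_limit`."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = os.read(fd, min(remaining, chunk_limit))
        if not chunk:
            # An empty result means end of file was reached.
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The function returns fewer than `size` bytes only if end of file is reached first.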
Title changed from "open().write() fails on 2 GB+ data (OS X)" to "open().write() and .read() fails on 2 GB+ data (OS X)"
matrixise commented on Feb 14, 2019
Hi @barry
This issue should be fixed for 3.x, but I still need to finish my PR for 2.7.
I plan to fix it for 2.7 in the coming weeks.
zware commented on Jan 19, 2020
Since 3.x is fixed and 2.7 has reached EOL, I'm closing the issue. Thanks for getting it fixed in 3.x, Stephane and Victor!