
open().write() and .read() fails on 2 GB+ data (OS X) #68846

Closed
@lebigot

Description

BPO 24658
Nosy @warsaw, @ronaldoussoren, @vstinner, @lebigot, @ned-deily, @zware, @matrixise, @miss-islington
PRs
  • bpo-24658: Fix read/write on file with a size greater than 2GB on OSX #1705
  • [3.7] bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705) #9936
  • [3.6] bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705) #9937
  • WIP: [2.7] bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705) #9938
  • bpo-24658: os.read() reuses _PY_READ_MAX #10657
  • [3.7] bpo-24658: os.read() reuses _PY_READ_MAX (GH-10657) #10658
  • [3.6] bpo-24658: os.read() reuses _PY_READ_MAX (GH-10657) #10659
Files
  • issue24658.txt
  • issue24658-3.6.diff
  • issue24658-3.5.diff
  • issue24658-2-3.6.diff
  • issue24658-3-3.6.diff
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2020-01-19.18:17:44.745>
    created_at = <Date 2015-07-18.02:59:28.391>
    labels = ['extension-modules', '3.8', 'type-bug', '3.7', 'expert-IO']
    title = 'open().write() and .read() fails on 2 GB+ data (OS X)'
    updated_at = <Date 2020-01-19.18:17:57.633>
    user = 'https://github.com/lebigot'

    bugs.python.org fields:

    activity = <Date 2020-01-19.18:17:57.633>
    actor = 'zach.ware'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-01-19.18:17:44.745>
    closer = 'zach.ware'
    components = ['Extension Modules', 'IO']
    creation = <Date 2015-07-18.02:59:28.391>
    creator = 'lebigot'
    dependencies = []
    files = ['39960', '44021', '44024', '45177', '45178']
    hgrepos = []
    issue_num = 24658
    keywords = ['patch']
    message_count = 32.0
    messages = ['246878', '246879', '246979', '246983', '246985', '246987', '246993', '246994', '246999', '247007', '247122', '256882', '272030', '272044', '278672', '278724', '279132', '279159', '294113', '294122', '294160', '294195', '327912', '327916', '327918', '327940', '330259', '330260', '330262', '335566', '335569', '360264']
    nosy_count = 11.0
    nosy_names = ['barry', 'ronaldoussoren', 'vstinner', 'lebigot', 'ned.deily', 'zach.ware', 'matrixise', 'Mali Akmanalp', 'Ian Carroll', 'Harry Li', 'miss-islington']
    pr_nums = ['1705', '9936', '9937', '9938', '10657', '10658', '10659']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue24658'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

    Activity

    lebigot

    lebigot commented on Jul 18, 2015

    @lebigot
    MannequinAuthor

    On OS X, the Homebrew and MacPorts versions of Python 3.4.3 raise an exception when writing a 4 GB bytearray:

    >>> open('/dev/null', 'wb').write(bytearray(2**31-1))
    2147483647
    
    >>> open('/dev/null', 'wb').write(bytearray(2**31))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 22] Invalid argument

    This has an impact on pickle, in particular (http://stackoverflow.com/questions/31468117/python-3-can-pickle-handle-byte-objects-larger-than-4gb).
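    As a rough illustration of how the pickle impact can be sidestepped on an affected interpreter, the sketch below pickles to memory and then writes and reads the bytes in slices that stay under the 2**31 boundary. The helper names `dump_in_chunks` and `load_in_chunks` and the `MAX_CHUNK` constant are hypothetical, not part of pickle or of any patch in this issue.

    ```python
    import pickle

    # Hypothetical chunk size: stays below the 2**31 boundary reported above.
    MAX_CHUNK = 2**31 - 1


    def dump_in_chunks(obj, path, chunk_size=MAX_CHUNK):
        """Pickle obj, then write the bytes in slices of at most chunk_size."""
        data = memoryview(pickle.dumps(obj, protocol=4))
        with open(path, "wb") as f:
            for start in range(0, len(data), chunk_size):
                f.write(data[start:start + chunk_size])


    def load_in_chunks(path, chunk_size=MAX_CHUNK):
        """Read the file back in slices of at most chunk_size, then unpickle."""
        buf = bytearray()
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:  # empty read means end of file
                    break
                buf += chunk
        return pickle.loads(bytes(buf))
    ```

    This keeps every single write() and read() call small at the cost of holding the whole pickle in memory, so it is only a stopgap for interpreters without the fix.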

    added labels interpreter-core (Objects, Python, Grammar, and Parser dirs) and type-bug (An unexpected behavior, bug, or error) on Jul 18, 2015
    lebigot

    lebigot commented on Jul 18, 2015

    @lebigot
    MannequinAuthor

    PS: I should have written "2 GB" bytearray (so this looks like a signed 32 bit integer issue).
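    To make the signed 32-bit suspicion concrete: 2**31 - 1 is the largest value a signed 32-bit integer can hold, and the failing size 2**31 is exactly one past it. A small sketch using ctypes (the `as_int32` helper is hypothetical; whether the kernel actually truncates the count this way is an assumption, not something established in this thread):

    ```python
    import ctypes

    INT32_MAX = 2**31 - 1


    def as_int32(n):
        """Reinterpret n as a signed 32-bit integer (ctypes does no overflow check)."""
        return ctypes.c_int32(n).value


    print(as_int32(INT32_MAX))      # 2147483647: the size that still works
    print(as_int32(INT32_MAX + 1))  # -2147483648: the failing size wraps negative
    ```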

    changed the title [-]open().write() fails on 4 GB+ data (OS X)[/-] [+]open().write() fails on 2 GB+ data (OS X)[/+] on Jul 18, 2015
    ronaldoussoren

    ronaldoussoren commented on Jul 20, 2015

    @ronaldoussoren
    Contributor

    This is likely a platform bug: it fails with os.write as well. Interestingly, file.write works fine on Python 2.7 (which uses stdio), which apparently works around this kernel misfeature.

    A possible partial workaround is to recognise this error in the implementation of os.write and then perform a partial write. The problem is that, while write(2) is documented as possibly writing less data than requested, most users writing to normal files (as opposed to sockets) probably don't expect that behavior. On the other hand, os.write already limits writes to INT_MAX on Windows (see _Py_write in Python/fileutils.c).

    Because of this I'm in favour of adding a similar workaround on OSX (and can provide a patch).

    BTW, the manpage for write says that writev(2) might fail with EINVAL:

     [EINVAL]           The sum of the iov_len values in the iov array overflows a 32-bit integer.
    

    I wouldn’t be surprised if write(2) is implemented using writev(2) and that this explains the problem.

    On 18 Jul 2015, at 06:05, Serhiy Storchaka <report@bugs.python.org> wrote:

    Changes by Serhiy Storchaka <storchaka@gmail.com>:

    ----------
    components: +Extension Modules, IO -Interpreter Core
    nosy: +haypo, ned.deily, ronaldoussoren


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue24658>


    ronaldoussoren

    ronaldoussoren commented on Jul 20, 2015

    @ronaldoussoren
    Contributor

    The attached patch is a first stab at a workaround. It will unconditionally limit the write size in os.write to INT_MAX on OSX.
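    The actual patch is C code and is not shown in this thread. As a Python-level sketch of the same idea only, clamping means passing at most INT_MAX bytes to a single write call and returning however many were written. The name `clamped_write` and the `max_count` parameter are hypothetical, and the value of INT_MAX is assumed to be 2**31 - 1.

    ```python
    import os

    INT_MAX = 2**31 - 1  # assumed value of C's INT_MAX on the affected platforms


    def clamped_write(fd, data, max_count=INT_MAX):
        """Hand at most max_count bytes of data to os.write; return the count written.

        Assumes data is a bytes or bytearray object, so slicing counts bytes.
        """
        view = memoryview(data)
        return os.write(fd, view[:max_count])
    ```

    A caller that needs everything written has to check the return value and call again with the remainder, which is the usual contract for write(2).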

    I haven't tested yet if this actually fixes the problem mentioned on stack overflow.

    lebigot

    lebigot commented on Jul 20, 2015

    @lebigot
    MannequinAuthor

    Thank you for looking into this, Ronald.

    What does your patch do, exactly? Does it only limit the returned byte count, or does it really limit the size of the data written by truncating it?

    In any case, it would be very useful to have a warning from the Python interpreter. If the data is truncated, I would even prefer an explicit exception (e.g. "data too big for this platform (>= 2 GB)"), along with an explicit mention of it in the documentation. What do you think?

    ronaldoussoren

    ronaldoussoren commented on Jul 20, 2015

    @ronaldoussoren
    Contributor

    The patch limits os.write to writing at most INT_MAX bytes on OSX. Buffered I/O using open("/some/file", "wb") should still write all data (at least according to the limited tests I've done so far).

    The same limitation is already present on Windows.

    And as I wrote before: according to the manpage for write(2), os.write may already write fewer bytes than requested.
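    For callers of os.write, the consequence of that contract is a retry loop. A minimal sketch, assuming a blocking file descriptor and a bytes or bytearray payload (the `write_all` name is hypothetical):

    ```python
    import os


    def write_all(fd, data):
        """Call os.write repeatedly until every byte of data has been written."""
        view = memoryview(data)
        total = 0
        while total < len(view):
            # os.write returns how many bytes it accepted, possibly fewer than given.
            total += os.write(fd, view[total:])
        return total
    ```

    Slicing a memoryview avoids copying the remaining data on each iteration, which matters for multi-gigabyte payloads.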

    I'm -1 on using an explicit exception or printing a warning about this.

    lebigot

    lebigot commented on Jul 20, 2015

    @lebigot
    MannequinAuthor

    I see, thanks.

    This sounds good to me too: no need for a warning or exception, indeed, since file.write() should work and the behavior of os.write() is documented.

    vstinner

    vstinner commented on Jul 20, 2015

    @vstinner
    Member

    The Windows limit of INT_MAX is applied in many functions:

    • os.write()
    • io.FileIO.write()
    • hmm, maybe others; I don't remember

    In the default branch, there is now _Py_write(), so only one place should be fixed.

    See the issue bpo-11395 which fixed the bug on Windows.

    If it's a bug, it should be fixed on Python 2.7, 3.4, 3.5 and default branches.

    ronaldoussoren

    ronaldoussoren commented on Jul 20, 2015

    @ronaldoussoren
    Contributor

    The patch I attached earlier is for the default branch. More work is needed for the other active branches.

    15 remaining items

    vstinner

    vstinner commented on Oct 17, 2018

    @vstinner
    Member

    New changeset 74a8b6e by Victor Stinner (Stéphane Wirtel) in branch 'master':
    bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705)
    74a8b6e

    vstinner

    vstinner commented on Oct 17, 2018

    @vstinner
    Member

    New changeset a5ebc20 by Victor Stinner (Stéphane Wirtel) in branch '3.6':
    [3.6] bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705) (GH-9937)
    a5ebc20

    miss-islington

    miss-islington commented on Oct 18, 2018

    @miss-islington
    Contributor

    New changeset 178d1c0 by Miss Islington (bot) in branch '3.7':
    bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705)
    178d1c0

    vstinner

    vstinner commented on Nov 22, 2018

    @vstinner
    Member

    New changeset 9a0d7a7 by Victor Stinner in branch 'master':
    bpo-24658: os.read() reuses _PY_READ_MAX (GH-10657)
    9a0d7a7

    miss-islington

    miss-islington commented on Nov 22, 2018

    @miss-islington
    Contributor

    New changeset 18f3327 by Miss Islington (bot) in branch '3.7':
    bpo-24658: os.read() reuses _PY_READ_MAX (GH-10657)
    18f3327

    miss-islington

    miss-islington commented on Nov 22, 2018

    @miss-islington
    Contributor

    New changeset 0c15e50 by Miss Islington (bot) in branch '3.6':
    bpo-24658: os.read() reuses _PY_READ_MAX (GH-10657)
    0c15e50

    warsaw

    warsaw commented on Feb 14, 2019

    @warsaw
    Member

    Nosying myself since I just landed here based on an internal $work bug report. We're seeing it with reads. I'll try to set aside some work time to review the PRs.
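    For the read side, a caller-level sketch of the same idea: request at most a bounded number of bytes per os.read call and loop until the requested size or end of file is reached. The `read_exact` name and `max_chunk` parameter are hypothetical, and 2**31 - 1 is only an assumed safe bound, not the value of _PY_READ_MAX.

    ```python
    import os


    def read_exact(fd, size, max_chunk=2**31 - 1):
        """Read up to size bytes from fd, never asking for more than max_chunk at once."""
        chunks = []
        remaining = size
        while remaining > 0:
            chunk = os.read(fd, min(remaining, max_chunk))
            if not chunk:  # end of file reached before size bytes were read
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    ```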

    changed the title [-]open().write() fails on 2 GB+ data (OS X)[/-] [+]open().write() and .read() fails on 2 GB+ data (OS X)[/+] on Feb 14, 2019
    matrixise

    matrixise commented on Feb 14, 2019

    @matrixise
    Member

    Hi @barry

    This issue should be fixed for 3.x, but I still need to finish my PR for 2.7.

    I plan to fix 2.7 in the coming weeks.

    removed their assignment
    on Feb 16, 2019
    zware

    zware commented on Jan 19, 2020

    @zware
    Member

    Since 3.x is fixed and 2.7 has reached EOL, I'm closing the issue. Thanks for getting it fixed in 3.x, Stephane and Victor!

    transferred this issue on Apr 10, 2022


          Participants

          @matrixise, @vstinner, @warsaw, @ronaldoussoren, @serhiy-storchaka
