[question] Can't find page breaks in an existing document #823

Closed
giuliohome opened this issue May 9, 2020 · 3 comments

@giuliohome

giuliohome commented May 9, 2020

I've tried to open an existing docx with the following code:

from docx import Document
document = Document('Testme.docx')
sections = document.sections
for section in sections:
    print(section.start_type)
paragraphs = document.paragraphs
for paragraph in paragraphs:
    for run in paragraph.runs:
        print(run.text)
        if hasattr(run, 'breaks'):
            for br in run.breaks:
                print(br.type.__name__)

The document has 3 pages (that is, it contains 2 manual page breaks), but the output only shows the initial NEW_PAGE (2) section start type.
What is the correct way to find the page breaks?

@giuliohome
Author

I've found this answer.
It does the trick. Is it the correct way?

@giuliohome
Author

giuliohome commented May 10, 2020

Note that the Breaks page of the documentation on Read the Docs, under "Candidate protocol – run.add_break()",
makes it look like one could do

>>> run.breaks
[<docx.text.Break object at 0x10a7c4f50>]

whereas breaks is not actually an attribute of Run. This is what really happens:

>>> run.breaks
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Run' object has no attribute 'breaks'

Anyway, I can confirm that the solution for me was the Stack Overflow answer linked above: inspecting the .r.xml of each run.
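
For anyone landing here later, this is a minimal sketch of the XML-inspection idea, not necessarily the exact code from the linked answer. It assumes a manual page break is stored in the run as a `w:br` element with `w:type="page"`, and that Word may also record `w:lastRenderedPageBreak` markers where it last laid out a page boundary. The helper names `count_page_breaks` and `report_page_breaks` are made up for illustration. `run._r` is a private python-docx attribute, so it may change between releases.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, used by <w:br> and <w:lastRenderedPageBreak>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_page_breaks(run_xml):
    """Return (manual, rendered) page-break counts found in one run's XML string."""
    root = ET.fromstring(run_xml)
    # Manual page breaks: <w:br w:type="page"/>. A bare <w:br/> is a line break.
    manual = sum(
        1
        for br in root.iter(f"{{{W_NS}}}br")
        if br.get(f"{{{W_NS}}}type") == "page"
    )
    # Soft breaks recorded by Word the last time it rendered the document.
    rendered = sum(1 for _ in root.iter(f"{{{W_NS}}}lastRenderedPageBreak"))
    return manual, rendered


def report_page_breaks(path):
    """Print paragraph index, run index and counts for runs holding a page break."""
    # Imported lazily so count_page_breaks() stays usable without python-docx.
    from docx import Document

    for p_idx, paragraph in enumerate(Document(path).paragraphs):
        for r_idx, run in enumerate(paragraph.runs):
            # run._r is the underlying <w:r> element (private API, an assumption
            # that it stays available); .xml serializes it to a string.
            manual, rendered = count_page_breaks(run._r.xml)
            if manual or rendered:
                print(p_idx, r_idx, manual, rendered)
```

Calling `report_page_breaks('Testme.docx')` should then list the runs that carry the breaks. Note that `document.paragraphs` only covers top-level body paragraphs, so breaks inside tables would need a separate walk.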

@scanny
Contributor

scanny commented Dec 29, 2020

@giuliohome I didn't mean to be abrupt; I'm just clearing out resolved issues, and this one hasn't had any activity for several months. Your solution looks fine, and it sounds like it works for you. The document you refer to is a design document and may contain ideas for features that have not yet been added.
