feature: Paragraph.delete() #33

scanny · 2014-04-03T19:13:14Z

In order to modify an existing document
As a developer using python-pptx
I need a way to delete a paragraph

Need to account for the possibility that the paragraph contains the last reference to a relationship, as a hyperlink or inline picture might.

jeffreinhart · 2015-03-06T23:32:40Z

Would like to see this available for python-docx. It would be very useful in populating a document full of placeholders given that it would allow the placeholder paragraph to be deleted if the value to populate the placeholder is None.

scanny · 2015-03-07T00:37:16Z

You should be able to do this for the simple case with this code:

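def delete_paragraph(paragraph):
    p = paragraph._element  # the XML element behind the Paragraph proxy
    p.getparent().remove(p)  # detach it from its parent
    # Clear the proxy's references. Corrected from `p._p = p._element = None`;
    # see the 2019-11-08 correction later in this thread.
    paragraph._p = paragraph._element = None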

Any subsequent access to the "deleted" paragraph object will raise AttributeError, so you should be careful not to keep the reference hanging around, including as a member of a stored value of Document.paragraphs.

The reason it's not in the library yet is because the general case is much trickier, in particular needing to detect and handle the variety of linked items that can be present in a paragraph; things like a picture, a hyperlink, or chart etc.

But if you know for sure none of those are present, these few lines should get the job done.

jeffreinhart · 2015-03-08T17:53:36Z

That works! Thank you!!

scanny · 2015-03-09T04:29:28Z

Glad it worked out Jeff :)

waynerth · 2015-03-22T21:07:50Z

Steve, thanks so much. I was having trouble after merging cells in a table which left extra empty paragraphs. Used your function and worked great, which let the cells shrink back by getting rid of empty space. Used it in a nested loop as follows:

    delete_paragraph(table.rows[rx].cells[cx].paragraphs[-1])

thanks - wayne (retired HW designer, having fun with python while hopefully helping out the non-profit I volunteer for)

zooyf · 2019-11-08T08:19:55Z

Hi @scanny
Why not implement the feature and close the issue?

zooyf · 2019-11-08T08:30:11Z

> You should be able to do this for the simple case with this code:
>
>     def delete_paragraph(paragraph):
>         p = paragraph._element
>         p.getparent().remove(p)
>         p._p = p._element = None
>
> Any subsequent access to the "deleted" paragraph object will raise AttributeError, so you should be careful not to keep the reference hanging around, including as a member of a stored value of Document.paragraphs.
>
> The reason it's not in the library yet is because the general case is much trickier, in particular needing to detect and handle the variety of linked items that can be present in a paragraph; things like a picture, a hyperlink, or chart etc.
>
> But if you know for sure none of those are present, these few lines should get the job done.

What's the difference compared to this solution?

def delete_element(el):
    el._element.getparent().remove(el._element)

scanny · 2019-11-08T16:19:06Z

Well, in fact, on review, there is an error in that code. The last line should be:

paragraph._p = paragraph._element = None

But as for the rest of it:

delete_element and el are misleading name choices in my view. A Paragraph object is an element-proxy object which composes an element object; it is not itself an element. So in general we reserve the name element and its derivatives for the XML element objects themselves.
The core code is essentially the first two lines combined into one, so that's a matter of taste; the operation is the same. I would personally probably choose something like yours in my own code, but for someone learning, sometimes breaking things down more step-by-step eases figuring out what the underlying process is, like first get the element from the proxy, then do this thing with the element, etc.
The (previously incorrect) last line is setting the _p and _element attributes of the "host" Paragraph proxy object to None so the now-deleted (or actually only orphaned) element is not accidentally accessed in later code and also is freed up for garbage collection. Removing an element in lxml does not delete it, it only breaks its relationship with its parent. So the original Paragraph object could still make changes to it and the user might puzzle for quite a while to figure out why their code wasn't working but wasn't raising an error. So you can think of it as preventative medicine.
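
The orphaning behaviour described in the last point can be seen outside python-docx. This is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for lxml (an assumption, made only so the snippet runs without extra packages); the element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature "document body" with two paragraphs.
body = ET.fromstring("<body><p>first</p><p>second</p></body>")
first = body[0]

# remove() only detaches the child from its parent ...
body.remove(first)

# ... the Python object is still alive and still editable,
# but the edit no longer reaches the tree that would be saved.
first.text = "edited after removal"

remaining = [p.text for p in body]
serialized = ET.tostring(body, encoding="unicode")
```

Here the write to `first` succeeds silently while `serialized` no longer contains that paragraph, which is the kind of "no error, but nothing happens" confusion that setting the proxy's attributes to None is meant to prevent.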

abubelinha · 2021-06-03T17:40:06Z

Thanks for this @scanny
I suggest you edit the original, previously incorrect last line, because that's the answer you still link to from Stack Overflow.

mrufsvold · 2021-11-17T19:58:45Z

> Steve, thanks so much. I was having trouble after merging cells in a table which left extra empty paragraphs. Used your function and worked great, which let the cells shrink back by getting rid of empty space. Used it in a nested loop as follows:
>
>     delete_paragraph(table.rows[rx].cells[cx].paragraphs[-1])
>
> thanks - wayne (retired HW designer, having fun with python while hopefully helping out the non-profit I volunteer for)

I have this same problem. However, when I use the delete_paragraph function with the corrected last line, the resulting document throws an error when opened that reads "Word found unreadable content in document_name.docx. Do you want to recover the contents of this document?" Clicking yes works to open the document, but I'm trying to figure out why deleting the paragraphs is causing this problem.

I think it might be related to the fact that this paragraph exists in a merged cell, but it sounds like @waynerth didn't experience this problem.

Any thoughts?

Thanks for your work on this @scanny!

scanny · 2021-11-17T20:42:41Z

@mrufsvold each cell must contain at least one block item, so a paragraph or a table. If you get rid of all the paragraphs, that leaves the cell in an invalid state. You might want to delete paragraphs[1:] or something like that, just be sure there's at least one left.
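
A minimal sketch of that advice, assuming the corrected `delete_paragraph()` from earlier in this thread. The helper name `delete_extra_paragraphs` and its `keep` parameter are hypothetical, not part of python-docx, and which paragraphs are worth keeping depends on your document.

```python
def delete_paragraph(paragraph):
    """Detach the paragraph's XML element and neutralize the proxy object."""
    p = paragraph._element
    p.getparent().remove(p)
    paragraph._p = paragraph._element = None


def delete_extra_paragraphs(cell, keep=1):
    """Delete all but the first `keep` paragraphs of a table cell.

    `keep` is clamped to at least 1 so the cell is never left without a
    paragraph. Returns the number of paragraphs deleted.
    """
    keep = max(1, keep)
    # Take one snapshot of the list before mutating the underlying XML.
    extras = list(cell.paragraphs)[keep:]
    for paragraph in extras:
        delete_paragraph(paragraph)
    return len(extras)
```

In a nested loop over a table this would be called as `delete_extra_paragraphs(table.rows[rx].cells[cx])` in place of deleting `paragraphs[-1]` unconditionally.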

mrufsvold · 2021-11-17T20:44:04Z

@scanny That makes complete sense! Thanks for your quick reply. I'll give that a shot when I get back to that project!

mrufsvold · 2021-11-17T21:06:01Z

It worked!

scanny · 2021-11-18T18:20:15Z

Glad you got it working @mrufsvold :)

abubelinha · 2021-12-08T09:25:42Z

> The reason it's not in the library yet is because the general case is much trickier, in particular needing to detect and handle the variety of linked items that can be present in a paragraph; things like a picture, a hyperlink, or chart etc.

@scanny Does that mean that if I delete a paragraph containing a link, my document will or might crash because the linked content is still kept or referenced somewhere else in the document, or something like that?

scanny · 2021-12-08T21:43:39Z

It depends a little on what you mean by link, but deleting is not so much a problem in practice as copying is.

If you have a hyperlink, for example, in a paragraph, that hyperlink element in the XML contains a relationship reference (like "rId7") to a Relationship element in the .rels "file" associated with the part containing the paragraph (maybe the document-part most commonly). That Relationship element contains the URL of the hyperlink and that's the extent of the relationship (a so-called "external" relationship). If you delete the paragraph but don't delete the Relationship element in the .rels collection that Relationship element will hang around and be saved with the document. This actually shouldn't cause a problem and I don't believe by itself represents a file "corruption" that might give rise to a so-called "repair error" when opening the file.

If you have something "bigger", like say an image embedded in the paragraph (a so-called inline-shape), and you delete the paragraph without attending to the now-dangling relationship, then both the Relationship element in the .rels as well as the Image-part it refers to will be retained in the document. That bloats the file a little but again, shouldn't cause a problem and may or may not give rise to a "repair-error" on opening the document. You'd have to experiment and behavior might vary by client, like maybe PowerPoint doesn't complain but LibreOffice does or vice-versa.

So deleting a paragraph is worth trying if you don't mind a little wasted space.
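
If you want to see which relationships were left behind, one rough way is to inspect the saved file directly. The sketch below uses only the standard library and makes several assumptions: the main part lives at the conventional `word/document.xml`, the file uses the transitional OOXML namespaces, and only hyperlink and image relationships are checked (parts such as styles or settings are located by type rather than by rId, so they would show up as false positives). The function name `find_dangling_rels` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

PKG_RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
DOC_RELS_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
# Relationship types this sketch checks; others are deliberately ignored.
CHECKED_TYPES = (DOC_RELS_NS + "/hyperlink", DOC_RELS_NS + "/image")


def find_dangling_rels(docx_file):
    """Return {rId: target} for hyperlink/image relationships of the main
    document part that nothing in word/document.xml refers to."""
    with zipfile.ZipFile(docx_file) as zf:
        rels_root = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
        doc_root = ET.fromstring(zf.read("word/document.xml"))

    # Any attribute in the relationships namespace (r:id, r:embed, r:link, ...)
    # holds an rId that is still in use.
    prefix = "{%s}" % DOC_RELS_NS
    used = {
        value
        for el in doc_root.iter()
        for name, value in el.attrib.items()
        if name.startswith(prefix)
    }

    dangling = {}
    for rel in rels_root.iter("{%s}Relationship" % PKG_RELS_NS):
        if rel.get("Type") in CHECKED_TYPES and rel.get("Id") not in used:
            dangling[rel.get("Id")] = rel.get("Target")
    return dangling
```

Calling `find_dangling_rels("saved.docx")` (path is a placeholder) after deleting paragraphs lists the leftover entries; it only reports them and does not modify the file.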

But if you copy a paragraph and don't re-establish the relationships (which may need to change "name", e.g. "rId7" -> "rId9") and also copy over target part(s) (e.g. the image in the example above) then that will definitely trigger a repair error on loading the document because Word can't find the image to render in that paragraph.
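
To make the copying case concrete: "copying" here usually means deep-copying a paragraph's XML element and inserting the copy elsewhere. The sketch below again uses xml.etree.ElementTree as a stand-in for the lxml elements python-docx works with (an assumption), and the helper `copy_into` plus the sample rIds are hypothetical. It shows that the copy carries its rId along verbatim, so a target whose .rels lacks that rId ends up with a broken reference.

```python
import copy
import xml.etree.ElementTree as ET

R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def copy_into(src_el, dst_parent, dst_rids):
    """Append a deep copy of src_el to dst_parent.

    Returns the sorted rIds the copy refers to that are absent from dst_rids,
    i.e. the relationships that would still have to be re-created.
    """
    clone = copy.deepcopy(src_el)
    dst_parent.append(clone)
    prefix = "{%s}" % R_NS
    used = {
        value
        for el in clone.iter()
        for name, value in el.attrib.items()
        if name.startswith(prefix)
    }
    return sorted(used - set(dst_rids))


# Hypothetical source paragraph holding a hyperlink that points at rId7.
src_p = ET.fromstring(
    '<w:p xmlns:w="%s" xmlns:r="%s"><w:hyperlink r:id="rId7"/></w:p>' % (W_NS, R_NS)
)
# Hypothetical target body whose .rels only defines rId1.
dst_body = ET.fromstring('<w:body xmlns:w="%s"/>' % W_NS)

missing = copy_into(src_p, dst_body, dst_rids={"rId1"})
```

Any rId reported in `missing` needs a matching Relationship (and, for an image, the target part) in the destination before the document is saved.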

abubelinha · 2023-04-23T08:31:32Z

I think deleting is working for me, at least for the tests I made with many small controlled documents.

Now with a big document (where I do lots of things, not just deleting paragraphs) I am getting errors when opening it.
Word gives the chance to correct them and save the document, but I wonder if I have any chances of finding out the error source:

Do you know of any way to make Word report where the "unreadable content" is?
I tried opc-diag but the output is so huge I can't really see anything there (BTW, no diff colours, just black and white interface: probably not designed for my Windows 7 machine?)
Rereading your last comment, I wonder what exactly you mean by copying a paragraph. Could you post a simple code example? (Maybe I am doing it unknowingly, since I reuse quite a few functions written by other people.)

Thanks @scanny

star-starry-sea · 2023-08-08T11:47:54Z

Wow, thank you. It works!!!

scanny modified the milestones: v0.6.0, 0.6.2 May 1, 2014

scanny modified the milestones: v0.6.0 Cursors, 0.6.2 May 13, 2014

scanny added the text label Jun 17, 2014

scanny changed the title ~~feature: delete_paragraph()~~ feature: Paragraph.delete() Feb 13, 2015

scanny removed this from the Cursors / Insert items milestone Apr 9, 2016

perfectstorm88 mentioned this issue Oct 12, 2020

Python utility: automatically export table structures from a database to docx (database acceptance document) perfectstorm88/bblog#3

Open

eskildbr mentioned this issue Jun 11, 2021

Footer structure #957

Closed

prabal01pathak mentioned this issue Jun 24, 2021

How can I remove last page from a word document #966

Open

abubelinha mentioned this issue Mar 26, 2023

How to delete table #663

Open

abubelinha mentioned this issue Apr 2, 2023

stile from other document import doesnt' work #88

Closed

charlie2clarke mentioned this issue May 21, 2024

Read text inside <w:sdt> tag #155

Open
