feature: insert comment #93

sriram-c opened this issue Sep 17, 2014 · 42 comments

Comments

@sriram-c
Copy link

Hi,

I want to attach comments to certain words in a docx file. Is that possible using the docx package? Please suggest an approach.

@scanny
Copy link
Contributor

scanny commented Sep 17, 2014

@sriram-c you'll need to describe what you're trying to achieve more completely. I can't make out what you're trying to accomplish.

@sriram-c
Copy link
Author

I want to read a docx file sentence by sentence and check each word for a pattern. If the pattern is found, I want to add a comment to the docx file at that word's range.

It is something like this:

import re
from docx import Document

document = Document('test.docx')
for paragraph in document.paragraphs:
    for word in paragraph.text.split():
        if re.search(r'pattern', word):
            # desired: attach a comment to this word
            # word.add_comment(...)  <- no such API exists yet
            pass

I hope it is now clear

@scanny
Copy link
Contributor

scanny commented Sep 17, 2014

What form would the comment take? Just inserting some text in parentheses or something or adding one of those comment things that appear in the margin alongside markup when you have "Show Markup" turned on?

@sriram-c
Copy link
Author

Hi Scanny,

It should be a proper comment when adding one of those comment things that appear in the margin alongside markup when you have "Show Markup" turned on.

Thanks,
Sriram

@scanny
Copy link
Contributor

scanny commented Sep 18, 2014

Unfortunately that "Add Comment" functionality hasn't been built out yet. We can leave this issue open as a feature request for it if you like, although I expect it will be a while before we get to it unless someone steps up to work on it. It's part of the broader functionality surrounding document markup, which is a bit of a hornet's nest. :)

@sriram-c
Copy link
Author

Is there any workaround for it using lxml or any other library?

Can you give me some logic or a hint to accomplish it more quickly?

Thanks,
Sriram

@scanny
Copy link
Contributor

scanny commented Sep 18, 2014

I expect it's a fair piece of work to accomplish. The approach I would recommend to get started is to create a baseline document with a single paragraph, save it, then add a single comment and save it again under a second name. Then you can use opc-diag to extract both documents into directories and then use diff to compare the two directories.

I believe you'll find at least one new part will be added, perhaps called comments.xml. (A 'part' is a distinct 'file' in the ZIP archive. A Word document is a ZIP archive file at the top level.) There will also be some number of new relationships added in the .rels files and some sort of change to the paragraph or run where you inserted the comment.

Making comments work would be making all those changes happen in the right spots.
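
The comparison step described above can also be approximated without opc-diag. The following is only a minimal sketch using the standard library; the function names `part_digests` and `diff_packages` are made up for illustration and are not part of python-docx or opc-diag. It reports which parts were added, removed, or changed between the two saved files, and it does not show what changed inside a part.

```python
import hashlib
import zipfile


def part_digests(docx_path):
    """Map each part name in the .docx ZIP archive to a SHA-256 digest of its bytes."""
    with zipfile.ZipFile(docx_path) as package:
        return {
            name: hashlib.sha256(package.read(name)).hexdigest()
            for name in package.namelist()
        }


def diff_packages(baseline_path, commented_path):
    """Return (added, removed, changed) part names between two .docx files."""
    before = part_digests(baseline_path)
    after = part_digests(commented_path)
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(
        name for name in set(before) & set(after) if before[name] != after[name]
    )
    return added, removed, changed
```

Parts such as docProps/core.xml will likely show up as changed simply because Word updates timestamps on save, so the interesting entries are the added parts and the changes under word/.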

@sriram-c
Copy link
Author

Thanks Scanny for the help.

I have unzipped the docx file and looked into the comments.xml and document.xml files.

Basically it adds a comment-id in the document.xml and maintains the details in comments.xml

For example, document.xml has the following:

<w:commentRangeStart w:id="1"/>
<words commented .../>
<w:commentRangeEnd w:id="1"/>

and comments.xml has:

<w:comment w:id="1">
   <!-- details -->
</w:comment>

I have manually changed in the comments.xml and zipped it to create docx file and it works fine.
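
The id pairing between the two parts can be checked with a few lines of standard-library Python. This is a rough sketch only: the helper names `comment_ids` and `unmatched_ranges` are hypothetical, and it assumes both parts have already been read from the unzipped package as strings.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def comment_ids(xml_text, tag):
    """Collect the w:id values of every <w:{tag}> element in an XML string."""
    root = ET.fromstring(xml_text)
    return {el.get(f"{{{W_NS}}}id") for el in root.iter(f"{{{W_NS}}}{tag}")}


def unmatched_ranges(document_xml, comments_xml):
    """Return range ids in document.xml that have no <w:comment> in comments.xml."""
    starts = comment_ids(document_xml, "commentRangeStart")
    defined = comment_ids(comments_xml, "comment")
    return sorted(starts - defined)
```

An empty result means every commented range in document.xml points at a comment that exists in comments.xml.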

Now, how can I do it programmatically?

My idea is to search for a word in the document tree and add a comment start/end node before and after it, then add the comment details to comments.xml.

So how can I add a node to the document tree at a particular point in the text?

@scanny
Copy link
Contributor

scanny commented Sep 18, 2014

Note how I updated your comment above to make the XML show up clearly.

If you can post a more complete example without redacting the content elements I can offer more specific guidance. The specific elements that appear before, after, and inside all matter to the approach.

@sriram-c
Copy link
Author

Hi,

thanks for the XML notation.

For the time being I am using pywin32 and achieving the goal through Word objects directly.

But that makes me depend on Windows, which I personally don't like (my favorite is Ubuntu).

So I will wait until python-docx has sufficient features to handle word-level text and to add different kinds of markup to the document.

Thanks again for the help.
Sriram

@scanny scanny changed the title inserting comments to certain words feature: insert comment Sep 19, 2014
@wasified
Copy link

I really want to contribute to this feature, but I'm new to this library and to OOP, so I might need a LOT of guidance. The best place to look for inspiration is the add_text method in Run, right?

@scanny
Copy link
Contributor

scanny commented Jun 2, 2015

I don't think that example will get you very far. The comments live in a separate document "part", roughly speaking a separate file in the .docx zip package. They're keyed by ID.

This one would be quite tough for a beginner I expect.

@wasified
Copy link

wasified commented Jun 2, 2015

Yeah it is quite tough.

I've got a comments.xml type thing working using lxml for my own private use...but I don't know how to use the docx library to generate a new part...can you lead me to a direction on that?

@scanny
Copy link
Contributor

scanny commented Jun 2, 2015

I always start off by being able to read the new part type, that provides a lot of the foundation and a mechanism for testing the writing part.

docx/parts/styles.py is probably as good a place as any to start. It represents the styles.xml part:

  • You'll see StylesPart inherits from docx.opc.part.XmlPart, you probably want to do the same.
  • Check out docx/__init__.py for how to hook up a new part. You might want to take a look at docx.opc.part.PartFactory and adjacent as well to understand what it's doing, although that's probably optional in this case.
  • Check out how docx.document.Document delegates getting the Styles object to the DocumentPart object (in the Document.styles property). In general, part objects do the talking to other part objects and instantiate the main proxy object. So DocumentPart finds the StylesPart object and creates a Styles object from it to hand back. Comments will probably work similarly.

That should get you started. Let me know when you need more.

Note that you'll need full tests if you want a commit. I would definitely spike it in first just to figure out how to do it, but then you'll need to redevelop outside-in with acceptance and unit tests if you want to get the commit.

Good luck :)

@wasified
Copy link

Speaking of spiking...with reference to issue number 55 you gave a function as follows to directly append XML in a table:

def set_vert_cell_direction(cell):
    tc = cell._tc
    tcPr = tc.tcPr
    textDirection = OxmlElement('w:textDirection')
    textDirection.set(qn('w:val'), 'btLr')
    tcPr.append(textDirection)

What would be an equivalent of adding a <w:commentRangeStart w:id="0"/> to a run?
This approach assumes I'm generating comments.xml via lxml separately. Any help would be appreciated!

@scanny
Copy link
Contributor

scanny commented Jun 24, 2015

If you can provide an XML snippet that includes the w:r (or multiple) for context and the w:commentRangeWhatevers in the proper place I'll take a look and see what guidance I can offer.

@wasified
Copy link

Thanks for the response! Here's a snippet:

 <w:p>
   <w:r>
     <w:t>
       <!-- COMMENT STARTS HERE -->
       <w:commentRangeStart w:id="0"/>
       This is text in a paragraph.
     </w:t>
   </w:r>
 </w:p>
 <w:p w:rsidR="002F06B3" w:rsidRDefault="002F06B3" w:rsidP="002F06B3"/>
 <w:p w:rsidR="002F06B3" w:rsidRDefault="002F06B3" w:rsidP="002F06B3">
   <w:pPr>
     <w:pStyle w:val="NormalWeb"/>
   </w:pPr>
   <w:r>
     <w:t>I have manually changed in the comments.xml and zipped it to cr</w:t>
   </w:r>
   <w:r>
     <w:t xml:space="preserve">eate </w:t>
     <w:commentRangeEnd w:id="0"/> <!--COMMENT ENDS -->
   </w:r>
   <w:r>
     <w:t>a comment.</w:t>
   </w:r>
 </w:p>

@scanny
Copy link
Contributor

scanny commented Jun 24, 2015

Ok, well, it looks like w:commentRangeStart (and End) can be a child of multiple parents. Better check the XML Schema to see which ones. For your purposes, to keep it simple, you might want to make it a child of a run if you can, rather than a text (<w:t>) element.

Something along the lines of this aircode should get it done for you:

def add_comment_start_to_run(run, id):
    r = run._r
    commentRangeStart = OxmlElement('w:commentRangeStart')
    commentRangeStart.set(qn('w:id'), str(id))
    r.append(commentRangeStart)

Check the __init__() method in Run to confirm the element name. It might be _element instead of _r.

Let us know how you go :)

@wasified
Copy link

The _r at the end is supposed to be r, right? Works well, but teeny tinsy problem.

The function makes the element appear at the end of the run instead of the start. I'm thinking of adding the RangeStart at the end of the previous run, since it won't show anyway. Is that a good approach? Is there a direct way to get this done? I tried prepending, but CT_R doesn't support it :(

<w:r>
    <!-- Want it here -->
    <w:rPr>
        <w:u w:val="single"/>
    </w:rPr>
    <w:t>This paragraph is about testing stuff, and I am testing how to make adding comments work on a docx file. This thought concludes the first paragraph.</w:t>
    <w:commentRangeStart w:id="0"/> <!-- get it here -->
</w:r>

@scanny
Copy link
Contributor

scanny commented Jun 26, 2015

Yes, quite right on the _r bit, I've fixed it above :)

Instead of r.append(), you'll need to use a different lxml call. I think this will do what you've asked for, although I go on to recommend you do something a bit different:

r.insert(0, commentRangeStart)

Note that the sequence of children is generally significant in Open XML; it's worthwhile to check the XML Schema to make sure you're putting it in a valid position in the child sequence.

This analysis document has a schema excerpt for CT_Run, which corresponds to the <w:r> element:
http://python-docx.readthedocs.org/en/latest/dev/analysis/features/text/run-content.html. It looks like you want the w:commentRangeStart element to appear after the w:rPr but before the first w:t element.

So you might end up needing something more like this:

r = run._r
rPr = r.rPr
if rPr is None:
    r.insert(0, commentRangeStart)
else:
    rPr.addnext(commentRangeStart)

The experienced programmer will recognize this as a good opportunity to add an add_comment_start() function to encapsulate this bit of logic.

You should check out the lxml API for the lxml.etree._Element class to find the methods like insert() and addnext(): http://lxml.de/api/frames.html

The XML Schema files are here in the repo for ready reference:
https://github.com/python-openxml/python-docx/tree/master/ref/xsd

The one named wml.xsd has most of the Word-specific definitions, including this one for CT_R.
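
The add_comment_start() idea mentioned above could look roughly like the sketch below. To keep it self-contained it uses the standard library's xml.etree.ElementTree as a stand-in for lxml, so it takes the bare `<w:r>` element rather than a python-docx Run, and it computes the insert position by index because ElementTree has no addnext(). The helper `w()` is hypothetical. Whether the schema allows the element at this position should still be checked against wml.xsd as noted above.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(tag):
    """Expand a bare tag name into its namespace-qualified (Clark) form."""
    return f"{{{W_NS}}}{tag}"


def add_comment_start(r, comment_id):
    """Insert <w:commentRangeStart> after <w:rPr> if present, else as first child."""
    start = ET.Element(w("commentRangeStart"), {w("id"): str(comment_id)})
    rPr = r.find(w("rPr"))
    # Position 0 when there are no run properties, otherwise right after them.
    index = 0 if rPr is None else list(r).index(rPr) + 1
    r.insert(index, start)
    return start
```

With lxml the index arithmetic collapses to the r.insert(0, ...) / rPr.addnext(...) pair shown earlier.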

@i-allan
Copy link

i-allan commented Jul 11, 2017

This would still be a cool feature to have. @scanny, are there plans to build it anytime soon?

@ColinTalbert
Copy link

Plus one on this for me too.

@ColinTalbert
Copy link

ColinTalbert commented Sep 1, 2017

So I've been trying to implement this feature as described above and wanted to share the explicit changes I needed to make to add a comment to a run.

in ../word/document.xml my paragraph looks like this. Note the commentReference section.

<w:p>
  <w:r>
    <w:t xml:space="preserve">Some </w:t>
  </w:r>
  <w:commentRangeStart w:id="0"/>
  <w:r>
    <w:t>text.</w:t>
  </w:r>
  <w:commentRangeEnd w:id="0"/>
  <w:r>
    <w:commentReference w:id="0"/>
  </w:r>
</w:p>

I had to create ../word/comments.xml

<w:comments mc:Ignorable="w14 w15 wp14">
  <w:comment w:id="0" w:author="Talbert, Colin" w:date="2017-09-01T08:15:00Z" w:initials="CT">
    <w:p>
      <w:pPr>
        <w:pStyle w:val="CommentText"/>
      </w:pPr>
      <w:r>
        <w:rPr>
          <w:rStyle w:val="CommentReference"/>
        </w:rPr>
        <w:annotationRef/>
      </w:r>
      <w:r>
        <w:t>test</w:t>
      </w:r>
    </w:p>
  </w:comment>
</w:comments>

and in ../word/_rels/document.xml.rels I had to add a Relationship to comments.xml

<Relationship Id="rId9" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments" Target="comments.xml"/>

I can follow the code above to add the commentRangeStart and End. But would certainly appreciate any pointers for how to add:
commentReference section to the document
a new comments.xml file
and add a reference to document.xml.rels

If it looks like a feature that makes for the package I could potentially submit a pull request.
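
For the comments.xml side, the `<w:comment>` structure shown above can be generated rather than typed by hand. The sketch below is a standard-library illustration only; `build_comment` and `w` are made-up names, and it does not create the part or the relationship, it only builds the element tree for one comment.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(tag):
    """Expand a bare tag name into its namespace-qualified (Clark) form."""
    return f"{{{W_NS}}}{tag}"


def build_comment(comment_id, author, initials, date, text):
    """Build one <w:comment> element mirroring the structure Word wrote above."""
    comment = ET.Element(
        w("comment"),
        {
            w("id"): str(comment_id),
            w("author"): author,
            w("date"): date,
            w("initials"): initials,
        },
    )
    p = ET.SubElement(comment, w("p"))
    pPr = ET.SubElement(p, w("pPr"))
    ET.SubElement(pPr, w("pStyle"), {w("val"): "CommentText"})
    # First run: the annotation reference mark, styled as CommentReference.
    ref_run = ET.SubElement(p, w("r"))
    rPr = ET.SubElement(ref_run, w("rPr"))
    ET.SubElement(rPr, w("rStyle"), {w("val"): "CommentReference"})
    ET.SubElement(ref_run, w("annotationRef"))
    # Second run: the comment text itself.
    text_run = ET.SubElement(p, w("r"))
    ET.SubElement(text_run, w("t")).text = text
    return comment
```

Calling build_comment(0, "Talbert, Colin", "CT", "2017-09-01T08:15:00Z", "test") reproduces the element layout of the example, apart from namespace prefixes, which depend on how the tree is serialized.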

@scanny
Copy link
Contributor

scanny commented Sep 1, 2017

In Open Packaging Convention (OPC) parlance, an XML document like comments.xml in the .docx zip archive is known as a part (and the .docx zip archive itself is known as a package). I mention this to clarify what I say next.

A pretty good example of adding a part to a package is here in python-pptx:
https://github.com/scanny/python-pptx/blob/master/pptx/parts/slide.py#L244

python-pptx is a "sister" project and the two are based on the same framework, so you should find the code quite similar.

Basically you define a custom part class for comment, register it with the loader (although might be optional if you know for sure the file you open will never have one), and then do pretty much the same things that NotesSlidePart does by way of its .new() class method. Most of the action happens in its ._add_notes_slide_part() helper method.
https://github.com/scanny/python-pptx/blob/master/pptx/parts/slide.py#L244

The XML manipulation is done by getting a reference to the nearest parent or sibling element, like a Run._r in this case it looks like, and then using lxml calls to add what you need. In one case that might look like:

run = paragraph.runs[0]  # or whatever run you decide
r = run._r  # this is the <w:r> element of that run
r.addnext(commentRangeStart)

The lxml interface of interest is here: http://lxml.de/api/lxml.etree._Element-class.html

There are various ways to create the new elements you want (like commentRangeStart), depending on whether you're just hacking one in or want something a little more elegant.

Let me know how all this strikes you and I can probably give more guidance when you have specific questions along the way.

@ColinTalbert
Copy link

ColinTalbert commented Sep 1, 2017

That's extremely helpful. Here's the code I have to create the comment in the document.xml part. It seems to be doing what I want for that file.

import docx
from docx.oxml import OxmlElement
from docx.oxml.ns import qn

def make_element(which, id):
    """Create a <w:{which}> element carrying the given comment id."""
    element = OxmlElement('w:{}'.format(which))
    element.set(qn('w:id'), str(id))
    return element

document = docx.Document()

p = document.add_paragraph("Start of line ")

r = p.add_run("Commented end of line")
# the range start goes between the first run and the commented run
start = make_element('commentRangeStart', 0)
p._p.insert(1, start)
# the range end goes after the commented run
end = make_element('commentRangeEnd', 0)
p._p.append(end)
# the reference sits inside its own run, after the range end
r2 = p.add_run()
reference = make_element('commentReference', 0)
r2._r.append(reference)

# need to add code here to add the comment xml to the comments.xml part
document.save(r"test_comment.docx")

This might be too hacky, but I added my reference items to document.xml.rels and [Content_Types].xml by editing the contents of docx/templates/default.docx to include them. It seems to be working.

Now just need to figure out how to add the comment to comments.xml. I'll dig into your links in pptx and see if that helps.

@SreeramS
Copy link

Hi, regarding inserting comments into a document: I need to understand whether each comment wrapped in a <w:comment> tag is a list of paragraph objects. My use case is writing comments to a document, and I need to know how I could create Paragraph objects on their own, without their being attached to a Document object. I don't really want to use docx.Document.add_paragraph() to create a paragraph object.

Thanks.

@SreeramS
Copy link

@talbertc-usgs Hi, Have you figured out how to add comments to comments.xml. If so, could you please help me with how to get on with that thing? Thanks.

@SreeramS
Copy link

SreeramS commented Feb 13, 2018

Hi @scanny. I've been trying to add comments to docx files using Python, and I more or less got everything done, but in a very hacky way. I am really eager to contribute to this feature, but I don't know where to start. I used the BeautifulSoup XML parser for the commenting feature and didn't use python-docx. It would be really great if you could point me to where to start.

My use case required adding comments inside a given paragraph for a given sentence (not a run). So I had to split the runs in the paragraph accordingly and add the comment references on top of that.
I know this is too hacky, and I also understand the notions of part, etc. For the time being I couldn't dig further into that, so I had to do it this way.

    def write_comments_xml_file(self, comments_dict):
        """This function creates comments.xml if there are no comments in the document. Also it writes
        the given comments in this file. If comments already exist, given new comments are appended as well.
        :return dict -> sentenceTuple_to_newly_added_commentIDs"""
        if not self.comments_exist:
            self.add_comment_relationship()
            self.add_content_type()
            with open(self.comments_xml_file_address,'w+'):
                # Creates the comments.xml file
                print "comments.xml created\n"
                pass

            with open(self.comments_xml_file_address, 'r+') as comments_file:
                soup = BeautifulSoup(comments_file.read(),'xml')
                comments_tag = soup.new_tag('w:comments')
                # Adding all the xml namespaces required
                comments_tag.attrs = comments_attrs
                soup.append(comments_tag)
                comments_file.write(soup.encode(formatter='xml'))

        # Building the tags for the comment to appear
        with open(self.comments_xml_file_address,'r') as comments_file:
            soup = BeautifulSoup(comments_file.read(), 'xml')
            try:
                last_comment_tag = soup.findAll('w:comment')[-1]
                id = int(last_comment_tag.get('w:id')) + 1
            except IndexError:
                id = 0
            for sentence, comment in comments_dict.iteritems():
                # print str(sentence) + '-->' + comment
                parent_tag = soup.find('w:comments')
                new_comment_tag = soup.new_tag('w:comment')
                new_comment_tag.attrs["w:id"] = "{}".format(id)
                new_comment_tag.attrs["w:author"] = COMMENT_AUTHOR
                new_comment_tag.attrs["w:date"] = datetime.isoformat(datetime.now())
                new_comment_tag.attrs["w:initials"] = ""
                self.sentence_to_commentIds[sentence] = "{}".format(id)
                comment_para = soup.new_tag('w:p')
                comment_para_run = soup.new_tag('w:r')
                comment_para_runPr = soup.new_tag('w:rPr')
                comment_para_run_text = soup.new_tag('w:t')
                comment_para_run_text.string = comment
                comment_para_run.append(comment_para_runPr)
                comment_para_run.append(comment_para_run_text)
                comment_para.append(comment_para_run)
                new_comment_tag.append(comment_para)
                parent_tag.append(new_comment_tag)
                id += 1

        with open(self.comments_xml_file_address,'w') as comments_file:
            # Writing into the comments.xml file
            comments_file.write(soup.encode(formatter='xml'))

        return self.sentence_to_commentIds

    def write_to_document_xml(self, comment_ids):
        """Utility function to write multiple comments into document.xml file"""
        editor = SentenceLevelEditor(self.input_folder)
        for sentence, comment_id in comment_ids.iteritems():
            para_id, start_offset, end_offset = sentence
            editor.comment_sentence_level(para_id, start_offset, end_offset, comment_id)

        return self.input_folder
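
The run-splitting step mentioned before the code is not part of what was posted. As a rough illustration of the idea only, and not the poster's SentenceLevelEditor, the sketch below works on plain strings: given the text of each run and the character offsets of a sentence within the paragraph, it cuts the runs so the sentence starts and ends on run boundaries. The function name `split_runs` is hypothetical, and copying run formatting to the new pieces is left out.

```python
def split_runs(run_texts, start, end):
    """Split run texts so that the range [start, end) falls on run boundaries.

    Returns (pieces, inside): the new run texts in order, and a parallel list
    of booleans marking which pieces lie inside the commented range.
    Empty runs are dropped.
    """
    pieces = []
    inside = []
    pos = 0
    for text in run_texts:
        run_start, run_end = pos, pos + len(text)
        # Cut points: the run's own edges plus any range edge strictly inside it.
        cuts = sorted(
            {run_start, run_end}
            | {c for c in (start, end) if run_start < c < run_end}
        )
        for a, b in zip(cuts, cuts[1:]):
            pieces.append(text[a - run_start:b - run_start])
            inside.append(start <= a and b <= end)
        pos = run_end
    return pieces, inside
```

The commentRangeStart would then go before the first piece marked True and the commentRangeEnd after the last one.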

Thanks.

@markuspaschi
Copy link

Hey, I know it's been a year, but I stumbled over this thread and I need to implement comments as well.
@SreeramS do you have any working code?

I'm missing the "SentenceLevelEditor" as well as the other self variables.
And probably something to unzip the Word file into document.xml and comments.xml.

Thanks in advance
Markus

@angus1095
Copy link

I'm also keen to know if Comments editing is possible yet.
Cheers
Angus

@jiangweiatgithub
Copy link

Any updates?

@CaseGuide
Copy link

Same here, this would be a very useful feature for me.

@CaseGuide
Copy link

It looks like #624 fixes this. For now you can pip install the fork mentioned there and add comments although it does not appear you can format the text.

@NanZhang1991
Copy link

I also recently needed to add comments to Word documents. I compared the XML extracted from the original Word file with the XML from a copy that has comments added, then zipped the modified parts back into a Word file.

import zipfile

templateDocx = zipfile.ZipFile("data/input/template/template.docx")
commentDocx = zipfile.ZipFile("data/input/template_comment/template_comment.docx")
newDocx = zipfile.ZipFile("data/output/add_comment/add_comment.docx", "w")

# Copy the parts that comments do not touch from the template.
for file in templateDocx.filelist:
    if file.filename in ['_rels/.rels',
                         'word/theme/theme1.xml', 'word/settings.xml', 'word/webSettings.xml',
                         'word/fontTable.xml', 'docProps/core.xml', 'docProps/app.xml']:
        print('*'*10, file.filename)
        newDocx.writestr(file.filename, templateDocx.read(file))

comFile = commentDocx.namelist()
print(comFile)
# Copy the comment-related parts from the commented document.
for file in commentDocx.filelist:
    # newDocx.writestr(file.filename,commentDocx.read(file))
    if file.filename in ['[Content_Types].xml', 'word/document.xml', 'word/_rels/document.xml.rels', 'word/comments.xml',
                         #'word/commentsExtended.xml', 'word/commentsIds.xml', 'word/commentsExtensible.xml', 'word/people.xml',
                         'word/styles.xml']:
        print('+'*10, file.filename)
        newDocx.writestr(file.filename, commentDocx.read(file))

templateDocx.close()
commentDocx.close()
newDocx.close()

@caramdache
Copy link

Any possibility to integrate the patch from https://github.com/BayooG/bayoo-docx ?

@lucacampanella
Copy link

lucacampanella commented Jul 23, 2024

For anybody in search of a quick solution that doesn't change a lot of the base docx package and for the AIs of this world training on this, here's a simple helper function to add comments to xml elements:

from datetime import datetime
from typing import List
from xml.etree.ElementTree import Element, tostring

from docx import Document
from docx.opc.part import Part
from docx.opc.constants import RELATIONSHIP_TYPE, CONTENT_TYPE
from docx.opc.oxml import parse_xml
from docx.opc.packuri import PackURI
from docx.oxml import OxmlElement
from docx.oxml.ns import qn

_COMMENTS_PART_DEFAULT_XML_BYTES = (
    b"""
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\r
<w:comments
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
    xmlns:w10="urn:schemas-microsoft-com:office:word"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
    xmlns:sl="http://schemas.openxmlformats.org/schemaLibrary/2006/main"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
    xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture"
    xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart"
    xmlns:lc="http://schemas.openxmlformats.org/drawingml/2006/lockedCanvas"
    xmlns:dgm="http://schemas.openxmlformats.org/drawingml/2006/diagram"
    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
    xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
    xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
    xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml"
    xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml"
    xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex"
    xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid"
    xmlns:cr="http://schemas.microsoft.com/office/comments/2020/reactions">
</w:comments>
"""
).strip()


def add_comment_to_elements_in_place(
    docx_doc: Document, elements: List[Element], author: str, comment_text: str
) -> None:
    if not elements:
        return
    try:
        comments_part = docx_doc.part.part_related_by(
            RELATIONSHIP_TYPE.COMMENTS
        )
    except KeyError:
        comments_part = Part(
            partname=PackURI("/word/comments.xml"),
            content_type=CONTENT_TYPE.WML_COMMENTS,
            blob=_COMMENTS_PART_DEFAULT_XML_BYTES,
            package=docx_doc.part.package,
        )
        docx_doc.part.relate_to(comments_part, RELATIONSHIP_TYPE.COMMENTS)

    comments_xml = parse_xml(comments_part.blob)
    # Create the comment
    comment_id = str(len(comments_xml.findall(qn("w:comment"))))
    comment_element = OxmlElement("w:comment")
    comment_element.set(qn("w:id"), comment_id)
    comment_element.set(qn("w:author"), author)
    comment_element.set(qn("w:date"), datetime.now().isoformat())

    # Create the text element for the comment
    comment_paragraph = OxmlElement("w:p")
    comment_run = OxmlElement("w:r")
    comment_text_element = OxmlElement("w:t")
    comment_text_element.text = comment_text
    comment_run.append(comment_text_element)
    comment_paragraph.append(comment_run)
    comment_element.append(comment_paragraph)

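    comments_xml.append(comment_element)
    # Serialize with python-docx's own lxml-based helper; xml.etree's tostring()
    # rewrites the namespace prefixes (w:, w14:, ...) to ns0:, ns1:, ...
    from docx.opc.oxml import serialize_part_xml

    comments_part._blob = serialize_part_xml(comments_xml)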

    # Create the commentRangeStart and commentRangeEnd elements
    comment_range_start = OxmlElement("w:commentRangeStart")
    comment_range_start.set(qn("w:id"), comment_id)
    comment_range_end = OxmlElement("w:commentRangeEnd")
    comment_range_end.set(qn("w:id"), comment_id)

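    # Add the commentRangeStart to the first element and commentRangeEnd to
    # the last element. In a paragraph, <w:pPr> must stay the first child, so
    # the range start goes right after it when it is present.
    first_properties = elements[0].find(qn("w:pPr"))
    if first_properties is None:
        elements[0].insert(0, comment_range_start)
    else:
        first_properties.addnext(comment_range_start)
    elements[-1].append(comment_range_end)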

    # Add a run holding the comment reference after the range end.
    # The <w:commentReference> belongs inside a single <w:r>; runs must not
    # be nested inside one another.
    comment_reference_run = OxmlElement("w:r")
    comment_reference_run_properties = OxmlElement("w:rPr")
    comment_reference_run_properties.append(
        OxmlElement("w:rStyle", {qn("w:val"): "CommentReference"})
    )
    comment_reference_run.append(comment_reference_run_properties)
    comment_reference_element = OxmlElement("w:commentReference")
    comment_reference_element.set(qn("w:id"), comment_id)
    comment_reference_run.append(comment_reference_element)

    elements[-1].append(comment_reference_run)

Example usage:

from typing import List, Union
from docx.table import Table
from docx.text.paragraph import Paragraph

docx_elements_to_apply_comments_to: List[Union[Paragraph, Table]] = []  # fill this however you prefer
add_comment_to_elements_in_place(
    mydocx_doc,
    [elem._element for elem in docx_elements_to_apply_comments_to],
    "John Doe",
    "This is a comment text",
)

Or for a single paragraph:

add_comment_to_elements_in_place(mydocx_doc, [mydocx_paragraph._element], "John Doe", "This is a comment text")

Hope this helps :)

@JTGRC-public
Copy link

@lucacampanella Thanks for the code, it is working! However, I noticed that after I save the document in Python using .save() from the docx library and open the edited version in Word, it says "Word found unreadable content in "...". Do you want to recover the contents of this document? If you trust the source of this document, click Yes." The added comment does exist in the edited version, but the error message is somewhat annoying. Did you experience that as well?

@lucacampanella
Copy link

@JTGRC-public
I've had that sometimes while developing the function, but I thought I fixed it. What is the minimum reproducible example that gives the problem (word document, script to read and add comment where you need, comment text)? I can try to debug it.

@JTGRC-public
Copy link

@lucacampanella Thanks for the quick reply! I took a look at the underlying comments.xml file and noticed that this happens on my side whenever there are existing comments in the input Word file.

The existing comments that I added manually via Word have an additional line right below the line with the author info, starting with "ns0:p", with keys including "ns2:paraId", "ns2:textId", "ns0:rsidR", "ns0:rsidRDefault", and "ns0:rsidP", while the comments added by your code don't have this line.

Once I recover the document and save it again, Word automatically adds this line to the comments added by your code as well. One difference is that all comments then have "w14:paraId" and "w14:textId" instead of "ns2:paraId" and "ns2:textId".
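
The ns0/ns2 prefixes described above match how Python's standard-library xml.etree.ElementTree names namespaces it has no registered prefix for. Whether that is what makes Word report unreadable content is an assumption here, not something confirmed in this thread. The sketch below only reproduces the prefix renaming on a tiny stand-in for comments.xml.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SOURCE = f'<w:comments xmlns:w="{W_NS}"><w:comment w:id="0"/></w:comments>'

root = ET.fromstring(SOURCE)

# Without a registered prefix, ElementTree invents ns0, ns1, ... on output.
generated = ET.tostring(root, encoding="unicode")

# Registering the prefix makes the serializer emit "w:" again.
ET.register_namespace("w", W_NS)
restored = ET.tostring(root, encoding="unicode")
```

Serializing with lxml instead keeps the prefixes that were declared in the parsed document, which avoids the need to register each namespace by hand.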

@jayvynl
Copy link

jayvynl commented Aug 15, 2024

For anybody in search of a quick solution that doesn't change a lot of the base docx package and for the AIs of this world training on this, here's a simple helper function to add comments to xml elements:

from datetime import datetime
from typing import List
from xml.etree.ElementTree import Element, tostring

from docx import Document
from docx.opc.part import Part
from docx.opc.constants import RELATIONSHIP_TYPE, CONTENT_TYPE
from docx.opc.oxml import parse_xml
from docx.opc.packuri import PackURI
from docx.oxml import OxmlElement
from docx.oxml.ns import qn

_COMMENTS_PART_DEFAULT_XML_BYTES = (
    b"""
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\r
<w:comments
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
    xmlns:w10="urn:schemas-microsoft-com:office:word"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
    xmlns:sl="http://schemas.openxmlformats.org/schemaLibrary/2006/main"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
    xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture"
    xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart"
    xmlns:lc="http://schemas.openxmlformats.org/drawingml/2006/lockedCanvas"
    xmlns:dgm="http://schemas.openxmlformats.org/drawingml/2006/diagram"
    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
    xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
    xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
    xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml"
    xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml"
    xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex"
    xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid"
    xmlns:cr="http://schemas.microsoft.com/office/comments/2020/reactions">
</w:comments>
"""
).strip()


def add_comment_to_elements_in_place(
    docx_doc: Document, elements: List[Element], author: str, comment_text: str
) -> None:
    if not elements:
        return
    try:
        comments_part = docx_doc.part.part_related_by(
            RELATIONSHIP_TYPE.COMMENTS
        )
    except KeyError:
        comments_part = Part(
            partname=PackURI("/word/comments.xml"),
            content_type=CONTENT_TYPE.WML_COMMENTS,
            blob=_COMMENTS_PART_DEFAULT_XML_BYTES,
            package=docx_doc.part.package,
        )
        docx_doc.part.relate_to(comments_part, RELATIONSHIP_TYPE.COMMENTS)

    comments_xml = parse_xml(comments_part.blob)
    # Create the comment
    comment_id = str(len(comments_xml.findall(qn("w:comment"))))
    comment_element = OxmlElement("w:comment")
    comment_element.set(qn("w:id"), comment_id)
    comment_element.set(qn("w:author"), author)
    comment_element.set(qn("w:date"), datetime.now().isoformat())

    # Create the text element for the comment
    comment_paragraph = OxmlElement("w:p")
    comment_run = OxmlElement("w:r")
    comment_text_element = OxmlElement("w:t")
    comment_text_element.text = comment_text
    comment_run.append(comment_text_element)
    comment_paragraph.append(comment_run)
    comment_element.append(comment_paragraph)

    comments_xml.append(comment_element)
    comments_part._blob = tostring(comments_xml)

    # Create the commentRangeStart and commentRangeEnd elements
    comment_range_start = OxmlElement("w:commentRangeStart")
    comment_range_start.set(qn("w:id"), comment_id)
    comment_range_end = OxmlElement("w:commentRangeEnd")
    comment_range_end.set(qn("w:id"), comment_id)

    # Add the commentRangeStart to the first element and commentRangeEnd to
    # the last element
    elements[0].insert(0, comment_range_start)
    elements[-1].append(comment_range_end)

    # Build the run that holds the comment reference mark. A w:r must not be
    # nested inside another w:r, so the run is appended to the element directly.
    comment_reference_run = OxmlElement("w:r")
    comment_reference_run_properties = OxmlElement("w:rPr")
    comment_reference_run_properties.append(
        OxmlElement("w:rStyle", {qn("w:val"): "CommentReference"})
    )
    comment_reference_run.append(comment_reference_run_properties)
    comment_reference_element = OxmlElement("w:commentReference")
    comment_reference_element.set(qn("w:id"), comment_id)
    comment_reference_run.append(comment_reference_element)

    elements[0].append(comment_reference_run)

Example usage:

from typing import Union
from docx.table import Table
from docx.text.paragraph import Paragraph

docx_elements_to_apply_comments_to: List[Union[Paragraph, Table]] = []  # fill this however you prefer
add_comment_to_elements_in_place(mydocx_doc, [elem._element for elem in docx_elements_to_apply_comments_to], "John Doe", "This is a comment text")

Or for a single paragraph:

add_comment_to_elements_in_place(mydocx_doc, [mydocx_paragraph._element], "John Doe", "This is a comment text")
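
One thing to watch in the helper: `comment_id` is derived from the number of existing comments, which can collide with an id already in use when the ids are not contiguous (for example after a comment was deleted in Word). A minimal sketch of picking the largest existing id plus one instead, using only the standard library on a made-up `comments.xml` fragment. `next_comment_id` is a hypothetical helper, not part of python-docx:

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def next_comment_id(comments_xml: bytes) -> str:
    """Return an id one larger than the largest w:id in the comments part."""
    root = ET.fromstring(comments_xml)
    used = [
        int(comment.get(f"{{{W}}}id"))
        for comment in root.findall(f"{{{W}}}comment")
    ]
    # An empty comments part yields "0".
    return str(max(used, default=-1) + 1)


# Made-up part: ids 1 and 2 exist (id 0 was deleted). Counting the comments
# would give "2", which is already taken; the largest-id rule gives "3".
sample = (
    f'<w:comments xmlns:w="{W}">'
    '<w:comment w:id="1"/><w:comment w:id="2"/>'
    "</w:comments>"
).encode()

print(next_comment_id(sample))
```

The same `findall`/`get` calls work on the lxml element returned by `parse_xml` in the helper.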

Hope this helps :)

Thank you very much my bro. This works like a charm.

@CDucloux

Thanks @lucacampanella, I've got to say, your code snippet works like a charm! However, it only works for paragraphs. Is there a way to include a mix of paragraphs, tables, and images? That would be perfect.

I thought about doing something like this:

from itertools import zip_longest

from docx import Document


def find_section_elements(docx_doc: Document, section_title: str, section_style_begin: str, section_style_end: str) -> list:
    """Find all elements (paragraphs, tables) of a specified section."""
    paragraphs = docx_doc.paragraphs
    tables = docx_doc.tables
    section_elements = list()
    in_section = False

    for paragraph, table in zip_longest(paragraphs, tables, fillvalue=None):
        if paragraph:
            if section_title in paragraph.text and paragraph.style.name == section_style_begin:
                in_section = True
            elif in_section and paragraph.style.name == section_style_end:
                break
            elif in_section:
                section_elements.append(paragraph)

        if table:
            if in_section:
                section_elements.append(table)

    return section_elements

Then I use your function add_comment_to_elements_in_place to generate a comment. When the section contains only paragraphs, it works. However, when the section contains tables, the Word document becomes corrupt.

section_content = find_section_elements(mydocx_doc, "Section Title", "Heading 1", "Heading 2")

add_comment_to_elements_in_place(
    mydocx_doc,
    [element._element for element in section_content],
    "Me",
    "My great comment",
)

mydocx_doc.save("auto_commented_doc.docx")
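
One thing that stands out in `find_section_elements`: `zip_longest(paragraphs, tables)` pairs the n-th paragraph with the n-th table, so the resulting list is not in document order, and its first and last entries may not be the real start and end of the section. That may be part of why the range markers end up somewhere Word rejects (a guess, not verified). A standard-library sketch of walking the body's children in document order, on a made-up fragment. With python-docx the same idea would apply to the children of `document.element.body`, and recent versions also appear to offer `Document.iter_inner_content()` for this:

```python
import xml.etree.ElementTree as ET
from typing import List

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def block_items_in_order(body: ET.Element) -> List[ET.Element]:
    """Return the paragraphs and tables directly under body, in document order."""
    wanted = {f"{{{W}}}p", f"{{{W}}}tbl"}
    return [child for child in body if child.tag in wanted]


# Made-up body: paragraph, table, paragraph, then section properties.
body = ET.fromstring(
    f'<w:body xmlns:w="{W}">'
    "<w:p/><w:tbl/><w:p/><w:sectPr/>"
    "</w:body>"
)

# Strip the namespace part of each tag to show the order compactly.
order = [item.tag.rsplit("}", 1)[1] for item in block_items_in_order(body)]
print(order)  # ['p', 'tbl', 'p']
```

Note also that the helper appends the reference run to `elements[0]`; if that element is a table, the run lands directly inside `w:tbl`, where a `w:r` is not expected.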

I also don't know how to handle the case where the section I need to comment on contains an image.

My final goal is to be able to comment on sections that contain images, paragraphs, and tables.
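
On images: in WordprocessingML a picture is not a separate block next to paragraphs and tables. It sits in a `w:drawing` element inside a run, inside a paragraph, so a comment range that covers the paragraph should also cover the picture (how Word renders this is untested here). A small standard-library sketch on a made-up fragment that finds paragraphs containing a drawing. `has_drawing` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def has_drawing(paragraph: ET.Element) -> bool:
    """True if a w:drawing element appears anywhere below the paragraph."""
    return paragraph.find(f".//{{{W}}}drawing") is not None


# Made-up body: one text-only paragraph, one paragraph holding a picture.
body = ET.fromstring(
    f'<w:body xmlns:w="{W}">'
    "<w:p><w:r><w:t>text only</w:t></w:r></w:p>"
    "<w:p><w:r><w:drawing/></w:r></w:p>"
    "</w:body>"
)

flags = [has_drawing(p) for p in body.findall(f"{{{W}}}p")]
print(flags)  # [False, True]
```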

@ReinderVosDeWael

For anyone in need of a quick solution, I've implemented lucacampanella's solution in my docx helper package cmi-docx as:

import cmi_docx

cmi_docx.add_comment(document, (start_run, end_run), "comment author", "comment text") 
