scanny commented on Sep 17, 2014
@sriram-c you'll need to describe what you're trying to achieve more completely. I can't make out what you're trying to accomplish.
sriram-c commented on Sep 17, 2014
I want to read a docx file sentence by sentence and check each word against a pattern. If a word matches, I want to add a comment to the docx file at that word's range.
It is something like this:
    import re
    from docx import Document

    document = Document('test.docx')
    pattern = re.compile(r'pattern')  # placeholder pattern
    for paragraph in document.paragraphs:
        for word in paragraph.text.split():
            if pattern.search(word):
                # wanted: attach a comment to this word (pseudocode, this is the missing piece)
                pass
I hope it is now clear.
scanny commented on Sep 17, 2014
What form would the comment take? Just inserting some text in parentheses or something or adding one of those comment things that appear in the margin alongside markup when you have "Show Markup" turned on?
sriram-c commented on Sep 17, 2014
Hi Scanny,
It should be a proper comment, one of those comment things that appear in the margin alongside markup when you have "Show Markup" turned on.
Thanks,
Sriram
scanny commented on Sep 18, 2014
Unfortunately that "Add Comment" functionality hasn't been built out yet. We can leave this issue open as a feature request for it if you like, although I expect it will be a while before we get to it unless someone steps up to work on it. It's part of the broader functionality surrounding document markup, which is a bit of a hornet's nest. :)
sriram-c commented on Sep 18, 2014
Is there any workaround for it using lxml or any other library?
Can you give me some logic or a hint to accomplish it more quickly?
Thanks,
Sriram
scanny commented on Sep 18, 2014
I expect it's a fair piece of work to accomplish. The approach I would recommend to get started is to create a baseline document with a single paragraph, save it, then add a single comment and save it again under a second name. Then you can use opc-diag to extract both documents into directories and then use diff to compare the two directories.
I believe you'll find at least one new part will be added, perhaps called comments.xml. (A 'part' is a distinct 'file' in the ZIP archive. A Word document is a ZIP archive file at the top level.) There will also be some number of new relationships added in the .rels files and some sort of change to the paragraph or run where you inserted the comment.
Making comments work would be making all those changes happen in the right spots.
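The comparison described above can be roughed out with the standard library alone. This is a minimal sketch that stands in for opc-diag plus diff, not a replacement for either: it only reports which parts were added, removed, or changed at the byte level. The function names and the file names in the usage comment are hypothetical.

```python
import zipfile


def read_parts(docx_file):
    """Map each part name in the .docx ZIP archive to its raw bytes."""
    with zipfile.ZipFile(docx_file) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


def diff_packages(before_file, after_file):
    """Return (added, removed, changed) part names between two packages."""
    before = read_parts(before_file)
    after = read_parts(after_file)
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(
        name for name in set(before) & set(after) if before[name] != after[name]
    )
    return added, removed, changed


# Usage with hypothetical file names:
# added, removed, changed = diff_packages("baseline.docx", "with-comment.docx")
# print(added, removed, changed)
```

Some parts may show up as changed for reasons unrelated to the comment (saved metadata, for instance), so the list is a starting point for reading the XML, not the final answer.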
sriram-c commented on Sep 18, 2014
Thanks, Scanny, for the help.
I have unzipped the docx file and looked into the comments.xml and document.xml files.
Basically it adds a comment ID in document.xml and keeps the comment details in comments.xml.
For example,
in document.xml it has the following
and in comments.xml
I manually edited comments.xml and zipped everything back up to create a docx file, and it works fine.
Now, how can I do this programmatically?
My idea is to search for a word in the document tree and add comment start/end nodes before and after it. If I then add the comment details to comments.xml, it should be fine.
So how can I add a node to the document tree at a particular point in the text?
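A minimal sketch of that idea, using the standard library's ElementTree rather than lxml or python-docx. It assumes the matched word already occupies a whole run (w:r) by itself; a word in the middle of a run would first have to be split into its own run, which is not shown. The helper names are hypothetical; the element names (w:commentRangeStart, w:commentRangeEnd, w:commentReference, w:comment) are the WordprocessingML ones.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # serialize with the usual "w:" prefix


def w(tag):
    """Qualify a tag or attribute name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def mark_run_with_comment(paragraph, run, comment_id):
    """Surround `run` (a w:r child of `paragraph`) with comment range markers."""
    cid = str(comment_id)
    idx = list(paragraph).index(run)
    start = ET.Element(w("commentRangeStart"), {w("id"): cid})
    end = ET.Element(w("commentRangeEnd"), {w("id"): cid})
    ref_run = ET.Element(w("r"))
    ET.SubElement(ref_run, w("commentReference"), {w("id"): cid})
    paragraph.insert(idx, start)        # immediately before the run
    paragraph.insert(idx + 2, end)      # immediately after the run
    paragraph.insert(idx + 3, ref_run)  # reference run follows the end marker


def add_comment_entry(comments_root, comment_id, author, text):
    """Append a w:comment with one paragraph of text to a w:comments root."""
    comment = ET.SubElement(
        comments_root, w("comment"), {w("id"): str(comment_id), w("author"): author}
    )
    para = ET.SubElement(comment, w("p"))
    run = ET.SubElement(para, w("r"))
    ET.SubElement(run, w("t")).text = text
    return comment
```

The same ID must appear on the range markers, the reference, and the w:comment entry. The package also needs a relationship and a content-type entry for comments.xml before Word will pick the comment up.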
scanny commented on Sep 18, 2014
Note how I updated your comment above to make the XML show up clearly.
If you can post a more complete example without redacting the content elements I can offer more specific guidance. The specific elements that appear before, after, and inside all matter to the approach.
sriram-c commented on Sep 19, 2014
Hi,
Thanks for the XML notation.
For the time being I am using pywin32 and achieving the goal through Word objects directly.
But for this I have to depend on Windows, which I personally don't like (my favorite is Ubuntu),
so I will wait until python-docx has sufficient features to handle word-level text and add different markup to the document.
Thanks again for the help.
Sriram
[title changed from "inserting comments to certain words" to "feature: insert comment"]
wasified commented on May 30, 2015
I really want to contribute to this feature, but I'm new to this lib and to OOP, so I might need a LOT of guidance. The best place to look for inspiration is the add_text method in Run, right?
scanny commented on Jun 2, 2015
I don't think that example will get you very far. The comments live in a separate document "part", roughly speaking a separate file in the .docx zip package. They're keyed by ID.
This one would be quite tough for a beginner I expect.
wasified commented on Jun 2, 2015
Yeah, it is quite tough.
I've got a comments.xml type thing working using lxml for my own private use, but I don't know how to use the docx library to generate a new part. Can you point me in the right direction on that?
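For reference, here is a rough sketch of what "adding a part" means at the package level. It does not use the docx library at all: it uses only zipfile and ElementTree, and the function names are hypothetical. It assumes the source document has no comments.xml yet and that the main document part is word/document.xml.

```python
import xml.etree.ElementTree as ET
import zipfile

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
COMMENTS_CT = (
    "application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml"
)
COMMENTS_RT = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments"
)
TYPES_NAME = "[Content_Types].xml"
RELS_NAME = "word/_rels/document.xml.rels"


def register_comments_part(content_types_xml, document_rels_xml):
    """Return updated content-types and document-rels XML as bytes."""
    # Note: register_namespace changes ElementTree's global prefix map.
    ET.register_namespace("", CT_NS)
    types = ET.fromstring(content_types_xml)
    ET.SubElement(
        types,
        f"{{{CT_NS}}}Override",
        {"PartName": "/word/comments.xml", "ContentType": COMMENTS_CT},
    )
    types_out = ET.tostring(types, encoding="UTF-8", xml_declaration=True)

    ET.register_namespace("", REL_NS)
    rels = ET.fromstring(document_rels_xml)
    used_ids = {rel.get("Id") for rel in rels}
    n = 1
    while f"rId{n}" in used_ids:  # pick the first unused relationship ID
        n += 1
    ET.SubElement(
        rels,
        f"{{{REL_NS}}}Relationship",
        {"Id": f"rId{n}", "Type": COMMENTS_RT, "Target": "comments.xml"},
    )
    rels_out = ET.tostring(rels, encoding="UTF-8", xml_declaration=True)
    return types_out, rels_out


def add_comments_part(src_file, dst_file, comments_xml):
    """Copy a .docx package, adding word/comments.xml and its bookkeeping."""
    with zipfile.ZipFile(src_file) as src, zipfile.ZipFile(
        dst_file, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        types_out, rels_out = register_comments_part(
            src.read(TYPES_NAME), src.read(RELS_NAME)
        )
        for name in src.namelist():
            if name == TYPES_NAME:
                dst.writestr(name, types_out)
            elif name == RELS_NAME:
                dst.writestr(name, rels_out)
            else:
                dst.writestr(name, src.read(name))
        dst.writestr("word/comments.xml", comments_xml)
```

A new part therefore touches three places: the part itself, an Override entry in [Content_Types].xml, and a relationship from the document part. Doing the same inside the library would mean creating the equivalent part and relationship objects.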