"Table of Contents" Feature #36
Comments
See this thread on the mailing list. Feel free to add to that thread if you need more :) |
Hi Steve, I found this sister project of python-pptx recently and am discovering it step by step, just as I did with python-pptx. I read the posts on the mailing list, but for me they are too low-level. So can you elaborate more on the way to implement this? <w:r>
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
<w:instrText xml:space="preserve"> TOC \* MERGEFORMAT </w:instrText>
</w:r>
<w:r>
<w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r>
. . .
</w:r>
<w:r>
<w:fldChar w:fldCharType="end"/>
</w:r> And probably this, to force regenerating the TOC when opening the doc: <w:updateFields w:val='true'/> What would be the correct way to get this into the document? Peter |
The first step would be to identify the exact XML that would get it done. opc-diag is a good tool for this. A good strategy is to create a simple document, maybe with a single heading 1 or something and save it as before.docx. Then add a TOC to it and save as after.docx. Use opc-diag to do a diff-item on the document.xml part. That should get you the exact XML to be added. If you can post that I can help you work out how to insert it. |
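If opc-diag is not at hand, the same before/after comparison can be approximated with the Python standard library. This is only a rough stand-in for opc-diag, not a replacement for it: the helper names `pretty_part` and `diff_part` are made up for this sketch, and it assumes both files are ordinary .docx packages containing a `word/document.xml` part.

```python
import difflib
import zipfile
from xml.dom import minidom


def pretty_part(docx_path, part="word/document.xml"):
    """Read one part out of the .docx zip and return it as pretty-printed lines."""
    with zipfile.ZipFile(docx_path) as pkg:
        raw = pkg.read(part)
    return minidom.parseString(raw).toprettyxml(indent="  ").splitlines()


def diff_part(before_path, after_path, part="word/document.xml"):
    """Return a unified diff of the same part in two .docx files."""
    return list(
        difflib.unified_diff(
            pretty_part(before_path, part),
            pretty_part(after_path, part),
            fromfile=before_path,
            tofile=after_path,
            lineterm="",
        )
    )
```

Calling `print("\n".join(diff_part("before.docx", "after.docx")))` shows the lines Word added, which should include the field runs for the TOC.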
Hi Steve, This is the diff of the before and after:
|
A lot of the diff above is the part Word generates when it updates the TOC. You'll want to get just the part that inserts the TOC field. For the sake of discussion I'll assume that's this: <w:r>
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
<w:instrText xml:space="preserve"> TOC \* MERGEFORMAT </w:instrText>
</w:r>
<w:r>
<w:fldChar w:fldCharType="end"/>
</w:r> This lxml code should give you a starting point. The lxml documentation can provide more insight on details:
from docx import Document
from docx.oxml.shared import OxmlElement, qn
document = Document()
paragraph = document.add_paragraph()
run = paragraph.add_run()
fldChar = OxmlElement('w:fldChar') # creates a new element
fldChar.set(qn('w:fldCharType'), 'begin') # sets attribute on element
fldChar.text = 'foobar' # not needed for this element, but this is how you set the text it contains
r_element = run._r
r_element.append(fldChar) # adds new element as last child
p_element = paragraph._p
print(p_element.xml) # shows XML so you can track your progress |
Thanks for the preview of the solution, @scanny . Cannot tell you how useful your comments are in implementing the few outstanding to-do features in what truly is a fantastic library. For anyone else looking for a full working solution, here is what I came up with, to generate the single line that inserts the TOC field. Auto-updating the TOC was outside of my capabilities for the time being so I'll leave it to someone else to take over:
|
I ran into this issue when searching for how to make a TOC. For my purposes, having a stub that the user can click on to update is better than nothing. Therefore, if even the partial solution were to make it into python-docx, I would use it immediately. I am currently using @mustash's code for doing just that. |
@mustash Thanks for the code you posted. It works. But I need to update the fields manually. Is there a way to update the field in the python code? |
@mustash @scanny Could you please complete your code a little more? I am too naive to work it out. Besides, where does the 'self' come from? Thank you. I wish you could still see my question :)
|
I also had to escape @snowflake01986 just replace |
If we need to generate a PDF, this project uses Word to actually update the docx file (including the TOC) prior to exporting. It does not actually save the updated docx file, and of course you need MS Word installed. @scanny, is it possible that open-source software like LibreOffice has this TOC update implemented in a way that could be used by this project? |
It's possible. It's been a while since I've looked into it, but I believe there is some sort of library (API) access to LibreOffice. I don't believe it's Python. I think it's Java or C++, possibly both. I don't know if it requires the LibreOffice application to be running or not (the way the Microsoft VBA API does). It may be worth taking a look at though. A search on "libreoffice api" will get you where you want to start looking. |
@scanny Thanks for the directions. I am working on generating binaries so no need to be tied to python. Why not |
Is there a solution that supports Linux? |
@wangcheng-git LibreOffice with python3-uno works well under Ubuntu. |
After two days of searching exhaustively for a solution, here is what I found (just summarizing the info and adding one additional step I couldn't directly find anywhere):
Code provided by @mustash is currently the best (and sufficient) way to achieve #1. There are a number of ways to achieve #2, but all of them require running the Word layout engine, meaning running MS Word either directly or through CLI/VBA/pywin32/etc. Quick ways to do it:
Sources: |
Is there any way to not prompt a user to update and automatically update the TOC? |
+1 |
Is it possible to set the TOC's style? For me it always defaults to Arial 9 on update. |
|
Is there any other way to update the table of contents entries without this manual step?
+1 |
I ended up using LaTeX for documents that need a TOC; I'm unsure if there is something new in the python-docx world |
Using the |
It appears to inherit styles based off the 'Normal' style when it generates the needed styles on update. Whatever is set in document.styles['Normal'] will be used when generating these styles. However, I'm not sure how to get it to use different styles for the different "Levels" in the TOC. |
Hi, does this work on Linux systems?
paragraph = self.document.add_paragraph()
fldChar2 = OxmlElement('w:fldChar')
fldChar4 = OxmlElement('w:fldChar')
r_element = run._r |
I just discovered this great project and I wonder if there is a feature to add a Table of Contents to a document that I create with python-docx.
I need to generate a .docx file for a customer and he wants to have a TOC in it.