"Table of Contents" Feature #36

Closed
dmr opened this issue Apr 7, 2014 · 26 comments

Comments

@dmr

dmr commented Apr 7, 2014

I just discovered this great project and I wonder if there is a feature to add a Table of Contents to a document that I create with python-docx.
I need to generate a .docx file for a customer and he wants to have a TOC in it.

@scanny
Contributor

scanny commented Apr 8, 2014

See this thread on the mailing list. Feel free to add to that thread if you need more :)

@scanny scanny closed this as completed Apr 8, 2014
@tooh

tooh commented May 31, 2014

Hi Steve,

I recently found this sister project of python-pptx and am discovering it step by step, just as I did with python-pptx.

I read the posts on the mailing list, but for me they are too low-level. Can you elaborate on how to implement this?
I noted that this is not yet supported in the API.
As I understand it, I need a TOC field in the document, probably with XML something like this:

<w:r>
  <w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
  <w:instrText xml:space="preserve"> TOC \* MERGEFORMAT </w:instrText>
</w:r>
<w:r>
  <w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r>
. . .
</w:r>
<w:r>
  <w:fldChar w:fldCharType="end"/>
</w:r>

And probably this, to force the fields to be regenerated when the document is opened:

<w:updateFields w:val="true"/>

What would be the correct way to get this into the document?

Peter
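
For the w:updateFields part, one option that stays outside python-docx is to post-process the saved file, since a .docx is a zip package and the setting lives in word/settings.xml. The following is a minimal stdlib-only sketch under stated assumptions: the function name add_update_fields is hypothetical, the settings part is assumed to use the conventional "w" prefix, and the element is simply placed before the closing </w:settings> tag without checking the schema's child order. Treat it as a starting point to verify in Word, not as a confirmed recipe.

```python
import zipfile

SETTINGS_PART = "word/settings.xml"
UPDATE_FIELDS = '<w:updateFields w:val="true"/>'


def add_update_fields(src_path, dst_path):
    """Copy the package at src_path to dst_path, adding w:updateFields.

    Returns True if the element was added and False if it was already there.
    src_path and dst_path must be different files.
    """
    added = False
    with zipfile.ZipFile(src_path) as zin:
        if SETTINGS_PART not in zin.namelist():
            raise KeyError(SETTINGS_PART + " not found in package")
        with zipfile.ZipFile(dst_path, "w") as zout:
            for item in zin.infolist():
                data = zin.read(item.filename)
                if item.filename == SETTINGS_PART:
                    xml = data.decode("utf-8")
                    if "<w:updateFields" not in xml:
                        if "</w:settings>" not in xml:
                            raise ValueError("unexpected settings.xml layout")
                        # Assumption: appending as the last child is acceptable.
                        xml = xml.replace(
                            "</w:settings>", UPDATE_FIELDS + "</w:settings>", 1
                        )
                        added = True
                    data = xml.encode("utf-8")
                # Passing the ZipInfo keeps each entry's name and compression.
                zout.writestr(item, data)
    return added
```

A call such as add_update_fields("report.docx", "report_with_prompt.docx") would run after document.save(); both file names are placeholders.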

@scanny
Contributor

scanny commented May 31, 2014

The first step would be to identify the exact XML that would get it done. opc-diag is a good tool for this. A good strategy is to create a simple document, maybe with a single heading 1 or something and save it as before.docx. Then add a TOC to it and save as after.docx. Use opc-diag to do a diff-item on the document.xml part. That should get you the exact XML to be added.

If you can post that I can help you work out how to insert it.
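
If opc-diag is not at hand, the same before/after comparison can be approximated with the Python standard library. This is only a rough stand-in for the diff-item step described above, not opc-diag itself: the helper names pretty_part and diff_part are made up for this sketch, and the pretty-printing is for reading only (it does not preserve whitespace exactly).

```python
import difflib
import zipfile
from xml.dom import minidom


def pretty_part(docx_path, part_name="word/document.xml"):
    """Return one part of a .docx package as a list of pretty-printed XML lines."""
    with zipfile.ZipFile(docx_path) as package:
        raw = package.read(part_name)
    pretty = minidom.parseString(raw).toprettyxml(indent="  ")
    # Skip blank lines that pretty-printing can leave behind.
    return [line for line in pretty.splitlines() if line.strip()]


def diff_part(before_path, after_path, part_name="word/document.xml"):
    """Return a unified diff of the same part in two .docx packages."""
    before = pretty_part(before_path, part_name)
    after = pretty_part(after_path, part_name)
    diff = difflib.unified_diff(
        before,
        after,
        fromfile=before_path + ":" + part_name,
        tofile=after_path + ":" + part_name,
        lineterm="",
    )
    return "\n".join(diff)
```

With the two files saved as suggested, print(diff_part("before.docx", "after.docx")) shows the lines Word added for the TOC.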

@tooh

tooh commented Jun 1, 2014

Hi Steve,

This is the diff of the before and after:

--- TOCTest_before/word/document.xml
+++ TOCTest_after/word/document.xml
@@ -20,10 +20,80 @@
     mc:Ignorable="w14 wp14"
     >
   <w:body>
-    <w:p w:rsidR="00B63965" w:rsidRDefault="00B63965" w:rsidP="00B63965">
+    <w:p w:rsidR="0079348B" w:rsidRDefault="0079348B">
+      <w:pPr>
+        <w:pStyle w:val="Inhopg1"/>
+        <w:tabs>
+          <w:tab w:val="right" w:leader="dot" w:pos="9056"/>
+        </w:tabs>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+      </w:pPr>
+      <w:r>
+        <w:fldChar w:fldCharType="begin"/>
+      </w:r>
+      <w:r>
+        <w:instrText xml:space="preserve"> TOC  \* MERGEFORMAT </w:instrText>
+      </w:r>
+      <w:r>
+        <w:fldChar w:fldCharType="separate"/>
+      </w:r>
+      <w:r>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+        <w:t>Test header</w:t>
+      </w:r>
+      <w:r>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+        <w:tab/>
+      </w:r>
+      <w:r>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+        <w:fldChar w:fldCharType="begin"/>
+      </w:r>
+      <w:r>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+        <w:instrText xml:space="preserve"> PAGEREF _Toc263231988 \h </w:instrText>
+      </w:r>
+      <w:r>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+      </w:r>
+      <w:r>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+        <w:fldChar w:fldCharType="separate"/>
+      </w:r>
+      <w:r>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+        <w:t>1</w:t>
+      </w:r>
+      <w:r>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+        <w:fldChar w:fldCharType="end"/>
+      </w:r>
+    </w:p>
+    <w:p w:rsidR="00B63965" w:rsidRDefault="0079348B" w:rsidP="00B63965">
       <w:pPr>
         <w:pStyle w:val="Kop1"/>
       </w:pPr>
+      <w:r>
+        <w:fldChar w:fldCharType="end"/>
+      </w:r>
       <w:bookmarkStart w:id="0" w:name="_GoBack"/>
       <w:bookmarkEnd w:id="0"/>
     </w:p>
@@ -41,9 +111,11 @@
       <w:pPr>
         <w:pStyle w:val="Kop1"/>
       </w:pPr>
+      <w:bookmarkStart w:id="1" w:name="_Toc263231988"/>
       <w:r>
         <w:t>Test header</w:t>
       </w:r>
+      <w:bookmarkEnd w:id="1"/>
     </w:p>
     <w:sectPr w:rsidR="0062310F" w:rsidSect="0062310F">
       <w:pgSz w:w="11900" w:h="16840"/>

@scanny
Contributor

scanny commented Jun 2, 2014

A lot of the diff above is the part Word generates when it updates the TOC. You'll want to get just the part that inserts the TOC field.

For the sake of discussion I'll assume that's this:

<w:r>
  <w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
  <w:instrText xml:space="preserve"> TOC \* MERGEFORMAT </w:instrText>
</w:r>
<w:r>
  <w:fldChar w:fldCharType="end"/>
</w:r>

This lxml code should give you a starting point. The lxml documentation can provide more insight on details:

from docx import Document
from docx.oxml.shared import OxmlElement, qn

document = Document()  # or the Document you are already building
paragraph = document.add_paragraph()
run = paragraph.add_run()
fldChar = OxmlElement('w:fldChar')  # creates a new element
fldChar.set(qn('w:fldCharType'), 'begin')  # sets attribute on element
fldChar.text = 'foobar'  # not needed for this element, but this is how you set the text it contains
r_element = run._r
r_element.append(fldChar)  # adds new element as last child
p_element = paragraph._p
print(p_element.xml)  # shows XML so you can track your progress

@mustash

mustash commented Oct 4, 2015

Thanks for the preview of the solution, @scanny . Cannot tell you how useful your comments are in implementing the few outstanding to-do features in what truly is a fantastic library.

For anyone else looking for a full working solution, here is what I came up with, to generate the single line that inserts the TOC field. Auto-updating the TOC was outside of my capabilities for the time being so I'll leave it to someone else to take over:

    paragraph = self.document.add_paragraph()
    run = paragraph.add_run()
    fldChar = OxmlElement('w:fldChar')  # creates a new element
    fldChar.set(qn('w:fldCharType'), 'begin')  # sets attribute on element
    instrText = OxmlElement('w:instrText')
    instrText.set(qn('xml:space'), 'preserve')  # sets attribute on element
    instrText.text = 'TOC \o "1-3" \h \z \u'   # change 1-3 depending on heading levels you need

    fldChar2 = OxmlElement('w:fldChar')
    fldChar2.set(qn('w:fldCharType'), 'separate')
    fldChar3 = OxmlElement('w:t')
    fldChar3.text = "Right-click to update field."
    fldChar2.append(fldChar3)

    fldChar4 = OxmlElement('w:fldChar')
    fldChar4.set(qn('w:fldCharType'), 'end')

    r_element = run._r
    r_element.append(fldChar)
    r_element.append(instrText)
    r_element.append(fldChar2)
    r_element.append(fldChar4)
    p_element = paragraph._p

@madphysicist

I ran into this issue when searching for how to make a TOC. For my purposes, having a stub that the user can click on to update is better than nothing. Therefore, if even the partial solution were to make it into python-docx, I would use it immediately. I am currently using @mustash's code for doing just that.

@xie186

xie186 commented Jan 26, 2017

@mustash Thanks for the code you posted. It works. But I need to update the fields manually. Is there a way to update the field in the python code?

@snowflake01986

snowflake01986 commented May 14, 2018

@mustash @scanny Could you please flesh out your code a little more? I am too inexperienced to work it out myself. Also, where does the 'self' come from? Thank you, I hope you still see my question :)

paragraph = self.document.add_paragraph()
run = paragraph.add_run()
fldChar = OxmlElement('w:fldChar')  # creates a new element
fldChar.set(qn('w:fldCharType'), 'begin')  # sets attribute on element
instrText = OxmlElement('w:instrText')
instrText.set(qn('xml:space'), 'preserve')  # sets attribute on element
instrText.text = 'TOC \o "1-3" \h \z \u'   # change 1-3 depending on heading levels you need

fldChar2 = OxmlElement('w:fldChar')
fldChar2.set(qn('w:fldCharType'), 'separate')
fldChar3 = OxmlElement('w:t')
fldChar3.text = "Right-click to update field."
fldChar2.append(fldChar3)

fldChar4 = OxmlElement('w:fldChar')
fldChar4.set(qn('w:fldCharType'), 'end')

r_element = run._r
r_element.append(fldChar)
r_element.append(instrText)
r_element.append(fldChar2)
r_element.append(fldChar4)
p_element = paragraph._p

@Sup3rGeo

Sup3rGeo commented May 17, 2018

I also had to escape \\o \\h \\z \\u for it to work without errors. Using Python 3.6.

@snowflake01986 just replace self.document with your document object. Other than that, it is just a straight copy and paste for it to work.
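
The escaping problem comes from Python 3 reading \u in an ordinary string literal as the start of a \uXXXX escape. A small sketch of the failure and of two equivalent spellings follows; the helper name compiles is made up for this demonstration.

```python
import warnings

# The instruction line exactly as posted, held in a raw string so it can be inspected.
BROKEN_SOURCE = r"""instr = 'TOC \o "1-3" \h \z \u'"""


def compiles(source):
    """Return True if the given Python source compiles, False on SyntaxError."""
    try:
        with warnings.catch_warnings():
            # Newer Pythons also warn about \o, \h and \z; ignore that here.
            warnings.simplefilter("ignore")
            compile(source, "<toc-demo>", "exec")
    except SyntaxError:
        return False
    return True


# Two spellings that produce the same text for w:instrText.
raw_instr = r'TOC \o "1-3" \h \z \u'
escaped_instr = 'TOC \\o "1-3" \\h \\z \\u'

print(compiles(BROKEN_SOURCE))     # False: truncated \uXXXX escape
print(raw_instr == escaped_instr)  # True
```

Either the raw string or the doubled backslashes can be assigned to instrText.text.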

@Sup3rGeo

Sup3rGeo commented May 17, 2018

If you need to generate a PDF, this project uses Word to update the docx file (including the TOC) prior to exporting. It does not actually save the updated docx file, and of course you need MS Word installed.
https://github.com/cognidox/OfficeToPDF

@scanny is it possible that an open-source software like LibreOffice has this TOC update implemented that could be used by this project?

@scanny
Contributor

scanny commented May 17, 2018

It's possible. It's been a while since I've looked into it, but I believe there is some sort of library (API) access to LibreOffice. I don't believe it's Python. I think it's Java or C++, possibly both. I don't know if it requires the LibreOffice application to be running or not (the way the Microsoft VBA API does). It may be worth taking a look at though. A search on "libreoffice api" will get you where you want to start looking.

@Sup3rGeo

@scanny Thanks for the directions.
Based on this idea I have just started a project: an application that uses LibreOffice to update indexes and generate a PDF. It seems to be working already, and it should work on Windows and Linux:
https://github.com/typhoon-hil/LibreOfficeToPDF

I am working on generating binaries so there is no need to be tied to Python.

Why not:
1. add @mustash's code for inserting a TOC element to the main python-docx library, and
2. add a note to the documentation that both OfficeToPDF and LibreOfficeToPDF can be used to update indexes?

@when-x

when-x commented Dec 6, 2019

Is there a solution that supports Linux?

@gshmu

gshmu commented Jan 3, 2020

@wangcheng-git LibreOffice with python3-uno works well under Ubuntu.
You can look at @Sup3rGeo's project. However, the project does not work under macOS because LibreOffice's Python does not work there. (I tried more than three versions of LibreOffice; macOS version: 10.15.2.)

@rohitg-lotusdew

After two days of searching exhaustively for a solution, here is what I found (just summarizing the info and adding one additional step I couldn't directly find anywhere):

  • Getting a ToC to show up in Word takes two steps:
  1. Inserting the ToC metadata (style, position, indentation)
  2. Actually rendering the ToC

Code provided by @mustash is currently the best (and sufficient) way to achieve #1.

There are a number of ways to achieve #2, but all of them require running the Word layout engine, meaning running MS Word either directly or through CLI/VBA/pywin32/etc.

Quick ways to do it:

  1. Insert the line fldChar.set(qn('w:dirty'), 'true') next to the line fldChar.set(qn('w:fldCharType'), 'begin') in the code provided by @mustash. This makes Word prompt the user to update fields every time the document is opened, and the ToC is updated once the user clicks Yes.
  2. If the document is being created on a Windows machine that has MS Word installed, add the following function to the Python script and call it with the name of the docx file (after running doc.save(file_name)):
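import inspect
import os

import win32com.client  # from the pywin32 package; needs Windows with MS Word installed


def update_toc(file_name):
    # Resolve file_name relative to the directory containing this script.
    script_dir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
    file_path = os.path.join(script_dir, file_name)
    word = win32com.client.DispatchEx("Word.Application")
    doc = word.Documents.Open(file_path)
    doc.TablesOfContents(1).Update()  # refresh the first table of contents
    doc.Close(SaveChanges=True)
    word.Quit()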

Sources:

  1. @scanny answers on this issue
  2. https://groups.google.com/d/msg/python-docx/VnHD7AwmPgY/VxwhZwB-UcwJ
  3. https://stackoverflow.com/a/34818909/7720280
  4. http://www.ericwhite.com/blog/exploring-tables-of-contents-in-open-xml-wordprocessingml-documents-part-2/

@FatimaArshad-DS

Is there any way to not prompt a user to update and automatically update the TOC?

@chrischma

+1

@ndahn

ndahn commented Nov 30, 2022

Is it possible to set the TOC's style? For me it always defaults to Arial 9 on update.

@panchicore

@ndahn

Insert the line fldChar.set(qn('w:dirty'), 'true') next to the line fldChar.set(qn('w:fldCharType'), 'begin') in the code provided by @mustash. This makes Word prompt the user to update fields every time the document is opened, and the ToC is updated once the user clicks Yes.

@deepak-coding-art

deepak-coding-art commented Jun 28, 2023

Is there any other way to update the table of contents without this manual step?

@nang-dev

+1

@dmr
Author

dmr commented Sep 19, 2023

I ended up using LaTeX for documents that need a TOC. I am unsure whether there is anything new in the python-docx world.

@dribeiro09

Using updateFields seems to prompt the user to update all other fields marked as dirty as well. In my particular scenario the TOC is generated by a user of the application. I would then like to change the document so that every time it is opened it asks to update the table of contents only. Any idea how to achieve this?

@Ahellrigel33

Is it possible to set the TOC's style? For me it always defaults to Arial 9 on update.

It appears to inherit styles based off the 'Normal' style when it generates the needed styles on update. Whatever is set in document.styles['Normal'] will be used when generating these styles. However, I'm not sure how to get it to use different styles for the different "Levels" in the TOC.

@ajaydevaraj63

Hi, does this work on Linux?

paragraph = self.document.add_paragraph()
run = paragraph.add_run()
fldChar = OxmlElement('w:fldChar') # creates a new element
fldChar.set(qn('w:fldCharType'), 'begin') # sets attribute on element
instrText = OxmlElement('w:instrText')
instrText.set(qn('xml:space'), 'preserve') # sets attribute on element
instrText.text = 'TOC \o "1-3" \h \z \u' # change 1-3 depending on heading levels you need

fldChar2 = OxmlElement('w:fldChar')
fldChar2.set(qn('w:fldCharType'), 'separate')
fldChar3 = OxmlElement('w:t')
fldChar3.text = "Right-click to update field."
fldChar2.append(fldChar3)

fldChar4 = OxmlElement('w:fldChar')
fldChar4.set(qn('w:fldCharType'), 'end')

r_element = run._r
r_element.append(fldChar)
r_element.append(instrText)
r_element.append(fldChar2)
r_element.append(fldChar4)
p_element = paragraph._p
