"Table of Contents" Feature #36

dmr · 2014-04-07T10:22:16Z

I just discovered this great project and I wonder if there is a feature to add a Table of Contents to a document that I create with python-docx.
I need to generate a .docx file for a customer and he wants to have a TOC in it.

scanny · 2014-04-08T06:28:50Z

See this thread on the mailing list. Feel free to add to that thread if you need more :)

tooh · 2014-05-31T07:00:22Z

Hi Steve,

I recently found this sister project of python-pptx and am discovering it step by step, just as I did with python-pptx.

I read the posts on the mailing list, but for me they are too low-level. Could you elaborate on how to implement this?
I noted that this is not yet supported in the API.
I understand that I need a TOC field in the document, probably with XML something like this:

<w:r>
  <w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
  <w:instrText xml:space="preserve"> TOC \* MERGEFORMAT </w:instrText>
</w:r>
<w:r>
  <w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r>
. . .
</w:r>
<w:r>
  <w:fldChar w:fldCharType="end"/>
</w:r>

And probably this, to force the fields to be regenerated when the document is opened:

<w:updateFields w:val='true'/>

What would be the correct way to get this into the document?

Peter

scanny · 2014-05-31T17:52:17Z

The first step would be to identify the exact XML that would get it done. opc-diag is a good tool for this. A good strategy is to create a simple document, maybe with a single heading 1 or something and save it as before.docx. Then add a TOC to it and save as after.docx. Use opc-diag to do a diff-item on the document.xml part. That should get you the exact XML to be added.

If you can post that I can help you work out how to insert it.
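
The before/after comparison described above can also be approximated with the standard library alone. The sketch below is not opc-diag and does not reproduce its output; it only unzips the word/document.xml part from two files and diffs the text. The function name diff_docx_part and the file names in the usage note are hypothetical.

```python
import difflib
import zipfile


def diff_docx_part(before_path, after_path, part="word/document.xml"):
    """Return a unified diff (list of lines) of one package part in two .docx files."""

    def read_lines(path):
        # A .docx file is a zip archive; each part is a member of that archive.
        with zipfile.ZipFile(path) as package:
            xml = package.read(part).decode("utf-8")
        # Word writes the part on very few lines, so break between adjacent
        # tags to get a line-oriented diff.
        return xml.replace("><", ">\n<").splitlines()

    return list(
        difflib.unified_diff(
            read_lines(before_path),
            read_lines(after_path),
            fromfile=before_path,
            tofile=after_path,
            lineterm="",
        )
    )
```

Calling print("\n".join(diff_docx_part("before.docx", "after.docx"))) would then print the added and removed XML lines. Unlike opc-diag, this does not pretty-print or normalize the XML, so attribute-only changes show up as whole changed lines.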

tooh · 2014-06-01T07:00:40Z

Hi Steve,

This is the diff of the before and after:

--- TOCTest_before/word/document.xml
+++ TOCTest_after/word/document.xml
@@ -20,10 +20,80 @@
     mc:Ignorable="w14 wp14"
     >
   <w:body>
-    <w:p w:rsidR="00B63965" w:rsidRDefault="00B63965" w:rsidP="00B63965">
+    <w:p w:rsidR="0079348B" w:rsidRDefault="0079348B">
+      <w:pPr>
+        <w:pStyle w:val="Inhopg1"/>
+        <w:tabs>
+          <w:tab w:val="right" w:leader="dot" w:pos="9056"/>
+        </w:tabs>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+      </w:pPr>
+      <w:r>
+        <w:fldChar w:fldCharType="begin"/>
+      </w:r>
+      <w:r>
+        <w:instrText xml:space="preserve"> TOC  \* MERGEFORMAT </w:instrText>
+      </w:r>
+      <w:r>
+        <w:fldChar w:fldCharType="separate"/>
+      </w:r>
+      <w:r>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+        <w:t>Test header</w:t>
+      </w:r>
+      <w:r>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+        <w:tab/>
+      </w:r>
+      <w:r>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+        <w:fldChar w:fldCharType="begin"/>
+      </w:r>
+      <w:r>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+        <w:instrText xml:space="preserve"> PAGEREF _Toc263231988 \h </w:instrText>
+      </w:r>
+      <w:r>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+      </w:r>
+      <w:r>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+        <w:fldChar w:fldCharType="separate"/>
+      </w:r>
+      <w:r>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+        <w:t>1</w:t>
+      </w:r>
+      <w:r>
+        <w:rPr>
+          <w:noProof/>
+        </w:rPr>
+        <w:fldChar w:fldCharType="end"/>
+      </w:r>
+    </w:p>
+    <w:p w:rsidR="00B63965" w:rsidRDefault="0079348B" w:rsidP="00B63965">
       <w:pPr>
         <w:pStyle w:val="Kop1"/>
       </w:pPr>
+      <w:r>
+        <w:fldChar w:fldCharType="end"/>
+      </w:r>
       <w:bookmarkStart w:id="0" w:name="_GoBack"/>
       <w:bookmarkEnd w:id="0"/>
     </w:p>
@@ -41,9 +111,11 @@
       <w:pPr>
         <w:pStyle w:val="Kop1"/>
       </w:pPr>
+      <w:bookmarkStart w:id="1" w:name="_Toc263231988"/>
       <w:r>
         <w:t>Test header</w:t>
       </w:r>
+      <w:bookmarkEnd w:id="1"/>
     </w:p>
     <w:sectPr w:rsidR="0062310F" w:rsidSect="0062310F">
       <w:pgSz w:w="11900" w:h="16840"/>

scanny · 2014-06-02T20:31:35Z

A lot of the diff above is the part Word generates when it updates the TOC. You'll want to get just the part that inserts the TOC field.

For the sake of discussion I'll assume that's this:

<w:r>
  <w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
  <w:instrText xml:space="preserve"> TOC \* MERGEFORMAT </w:instrText>
</w:r>
<w:r>
  <w:fldChar w:fldCharType="end"/>
</w:r>

This lxml code should give you a starting point. The lxml documentation can provide more insight on details:

from docx import Document
from docx.oxml.shared import OxmlElement, qn

document = Document()  # or the document you are already building
paragraph = document.add_paragraph()
run = paragraph.add_run()
fldChar = OxmlElement('w:fldChar')  # creates a new element
fldChar.set(qn('w:fldCharType'), 'begin')  # sets attribute on element
fldChar.text = 'foobar'  # not needed for this element, but this is how you set the text it contains
r_element = run._r
r_element.append(fldChar)  # adds new element as last child
p_element = paragraph._p
print(p_element.xml)  # shows XML so you can track your progress

mustash · 2015-10-04T00:35:01Z

Thanks for the preview of the solution, @scanny . Cannot tell you how useful your comments are in implementing the few outstanding to-do features in what truly is a fantastic library.

For anyone else looking for a full working solution, here is what I came up with, to generate the single line that inserts the TOC field. Auto-updating the TOC was outside of my capabilities for the time being so I'll leave it to someone else to take over:

    from docx.oxml.shared import OxmlElement, qn

    # self.document is the docx Document object held by the enclosing class
    paragraph = self.document.add_paragraph()
    run = paragraph.add_run()
    fldChar = OxmlElement('w:fldChar')  # creates a new element
    fldChar.set(qn('w:fldCharType'), 'begin')  # sets attribute on element
    instrText = OxmlElement('w:instrText')
    instrText.set(qn('xml:space'), 'preserve')  # sets attribute on element
    # raw string so that \o, \h, \z and \u are not read as escape sequences
    instrText.text = r'TOC \o "1-3" \h \z \u'   # change 1-3 depending on heading levels you need

    fldChar2 = OxmlElement('w:fldChar')
    fldChar2.set(qn('w:fldCharType'), 'separate')
    fldChar3 = OxmlElement('w:t')  # placeholder text shown until the field is updated
    fldChar3.text = "Right-click to update field."
    fldChar2.append(fldChar3)

    fldChar4 = OxmlElement('w:fldChar')
    fldChar4.set(qn('w:fldCharType'), 'end')

    r_element = run._r
    r_element.append(fldChar)
    r_element.append(instrText)
    r_element.append(fldChar2)
    r_element.append(fldChar4)
    p_element = paragraph._p

madphysicist · 2016-05-20T14:32:46Z

I ran into this issue when searching for how to make a TOC. For my purposes, having a stub that the user can click on to update is better than nothing. Therefore, if even the partial solution were to make it into python-docx, I would use it immediately. I am currently using @mustash's code for doing just that.

xie186 · 2017-01-26T19:45:44Z

@mustash Thanks for the code you posted. It works. But I need to update the fields manually. Is there a way to update the field in the python code?

snowflake01986 · 2018-05-14T18:14:38Z

@mustash @scanny Could you please complete your code a little more, I am too naive to work it out. Besides, where does the 'self' come from? Thank you. wish you could still see my question :)

Sup3rGeo · 2018-05-17T09:47:17Z

I also had to escape the backslashes (\\o \\h \\z \\u) for it to work without errors. Using Python 3.6.

@snowflake01986 just replace self.document with your document object. Other than that, it is just a straight copy and paste for it to work.

Sup3rGeo · 2018-05-17T10:17:18Z

If we need to generate a PDF, this project uses Word to actually update the docx file (including the TOC) prior to exporting. It does not actually save the updated docx file, and of course you need MS Word installed.
https://github.com/cognidox/OfficeToPDF

@scanny is it possible that an open-source software like LibreOffice has this TOC update implemented that could be used by this project?

scanny · 2018-05-17T17:13:59Z

It's possible. It's been a while since I've looked into it, but I believe there is some sort of library (API) access to LibreOffice. I don't believe it's Python. I think it's Java or C++, possibly both. I don't know if it requires the LibreOffice application to be running or not (the way the Microsoft VBA API does). It may be worth taking a look at though. A search on "libreoffice api" will get you where you want to start looking.

Sup3rGeo · 2018-05-20T13:56:49Z

@scanny Thanks for the directions.
Based on this idea, I have just started a project: an application that uses LibreOffice to update indexes and generate a PDF. It seems to be working already, and it should work on Windows and Linux:
https://github.com/typhoon-hil/LibreOfficeToPDF

I am working on generating binaries so no need to be tied to python.

Why not:
1. add @mustash's code for inserting a TOC element to the main python-docx library, and
2. add to the documentation that, to update indexes, either OfficeToPDF or LibreOfficeToPDF can be used?

when-x · 2019-12-06T07:38:10Z

Is there a solution that supports Linux?

gshmu · 2020-01-03T09:01:03Z

@wangcheng-git LibreOffice with python3-uno works well under Ubuntu.
You can see @Sup3rGeo's project, but it does not work under macOS because LibreOffice's Python does not work there. (I tried more than three versions of LibreOffice on macOS 10.15.2.)

rohitg-lotusdew · 2020-07-09T13:27:37Z

After two days of searching exhaustively for a solution, here is what I found (just summarizing the info and adding one additional step I couldn't directly find anywhere):

Showing a ToC in Word works in two steps:

1. Inserting the ToC metadata (style, position, indentation)
2. Actually rendering the ToC

Code provided by @mustash is currently the best (and sufficient) way to achieve #1.

There are a number of ways to achieve #2, but all of them require running the Word layout engine, meaning running MS Word either directly or through CLI/VBA/pywin32/etc.

Quick ways to do it:

1. Insert the line fldChar.set(qn('w:dirty'), 'true') next to the line fldChar.set(qn('w:fldCharType'), 'begin') in the code provided by @mustash. This makes Word prompt the user to update fields every time the document is opened, and the ToC is updated once the user clicks Yes.
2. If the document is being created on a Windows machine that has MS Word installed, add the following function to the Python script and call it with the name of the docx file (after running doc.save(file_name)):

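import inspect
import os

import win32com.client  # pywin32; requires Windows with MS Word installed


def update_toc(file_name):
    # Resolve file_name relative to the directory containing this script.
    script_dir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
    file_path = os.path.join(script_dir, file_name)
    word = win32com.client.DispatchEx("Word.Application")  # start a separate Word instance
    doc = word.Documents.Open(file_path)
    doc.TablesOfContents(1).Update()  # rebuild the first TOC in the document
    doc.Close(SaveChanges=True)
    word.Quit()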

Sources:

FatimaArshad-DS · 2021-07-26T07:59:36Z

Is there any way to not prompt a user to update and automatically update the TOC?

chrischma · 2022-06-14T12:33:10Z

+1

ndahn · 2022-11-30T16:45:09Z

Is it possible to set the TOC's style? For me it always defaults to Arial 9 on update.

panchicore · 2023-05-05T18:31:07Z

@ndahn

Insert the line fldChar.set(qn('w:dirty'), 'true') next to the line fldChar.set(qn('w:fldCharType'), 'begin') in the code provided by @mustash. This makes Word prompt the user to update fields every time the document is opened, and the ToC is updated once the user clicks Yes.

deepak-coding-art · 2023-06-28T01:25:05Z

Is there any other way to update the table of contents without this manual step?

nang-dev · 2023-09-19T01:29:01Z

+1

dmr · 2023-09-19T06:59:43Z

I ended up using LaTeX for documents that need a TOC. I am unsure whether there is anything new in the python-docx world.

dribeiro09 · 2023-10-31T16:35:35Z

Using updateFields seems to prompt the user to update all other fields marked as dirty. In my particular scenario, the TOC is generated by a user of the application. I would then like to change the document so that every time it is opened, it asks to update the table of contents only. Any idea how to achieve this?

Ahellrigel33 · 2024-02-02T23:19:42Z

Is it possible to set the TOC's style? For me it always defaults to Arial 9 on update.

It appears to inherit styles based off the 'Normal' style when it generates the needed styles on update. Whatever is set in document.styles['Normal'] will be used when generating these styles. However, I'm not sure how to get it to use different styles for the different "Levels" in the TOC.

ajaydevaraj63 · 2024-03-19T12:12:46Z

Hi, does this work on a Linux system?

scanny closed this as completed Apr 8, 2014

scanny mentioned this issue Oct 16, 2017: Index - Table of Contents, update #436 (Closed)

brendan-ward mentioned this issue Jul 30, 2018: Add table of contents to report consbio/salcc_blueprint2#77 (Open)

TheGroundZero mentioned this issue Aug 31, 2018: Table of Contents header #542 (Open)

Ab2nour mentioned this issue Apr 23, 2021: Word: page de garde et sommaire ("Word: cover page and table of contents") Projet-Clovis/clovis-converter#1 (Open)

ajaydevaraj63 mentioned this issue Mar 19, 2024: @mustash @scanny Could you please complete your code a little more, I am too naive to work it out. Besides, where does the 'self' come from? Thank you. wish you could still see my question :) #1359 (Open)
