How can we convert DocBook XML table content into DITA?

Jul 16, 2024

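To convert a DocBook XML table into DITA, you need to understand how each format structures its table elements. Both formats derive their <table> markup from the CALS table model, so a simple table uses the same element names (<tgroup>, <thead>, <tbody>, <row>, <entry>) on both sides. Below is an example of a simple DocBook table and its DITA equivalent.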

Example DocBook Table

<table>
    <title>Sample Table</title>
    <tgroup cols="3">
        <thead>
            <row>
                <entry>Column 1</entry>
                <entry>Column 2</entry>
                <entry>Column 3</entry>
            </row>
        </thead>
        <tbody>
            <row>
                <entry>Row 1, Cell 1</entry>
                <entry>Row 1, Cell 2</entry>
                <entry>Row 1, Cell 3</entry>
            </row>
            <row>
                <entry>Row 2, Cell 1</entry>
                <entry>Row 2, Cell 2</entry>
                <entry>Row 2, Cell 3</entry>
            </row>
        </tbody>
    </tgroup>
</table>

Corresponding DITA Table

<table>
    <title>Sample Table</title>
    <tgroup cols="3">
        <thead>
            <row>
                <entry>Column 1</entry>
                <entry>Column 2</entry>
                <entry>Column 3</entry>
            </row>
        </thead>
        <tbody>
            <row>
                <entry>Row 1, Cell 1</entry>
                <entry>Row 1, Cell 2</entry>
                <entry>Row 1, Cell 3</entry>
            </row>
            <row>
                <entry>Row 2, Cell 1</entry>
                <entry>Row 2, Cell 2</entry>
                <entry>Row 2, Cell 3</entry>
            </row>
        </tbody>
    </tgroup>
</table>

Python Script for Conversion

Here is a Python script that uses lxml to perform the conversion:

from lxml import etree

def convert_docbook_to_dita(docbook_xml):
    docbook_tree = etree.fromstring(docbook_xml)
    
    # Create the root element for DITA
    dita_root = etree.Element("table")
    
    # Convert title
    title = docbook_tree.find("title")
    if title is not None:
        dita_title = etree.SubElement(dita_root, "title")
        dita_title.text = title.text
    
    # Convert tgroup
    tgroup = docbook_tree.find("tgroup")
    if tgroup is not None:
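        dita_tgroup = etree.SubElement(dita_root, "tgroup")
        # Copy cols only when the source provides it; passing None raises a TypeError
        if tgroup.get("cols") is not None:
            dita_tgroup.set("cols", tgroup.get("cols"))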
        
        # Convert thead
        thead = tgroup.find("thead")
        if thead is not None:
            dita_thead = etree.SubElement(dita_tgroup, "thead")
            for row in thead.findall("row"):
                dita_row = etree.SubElement(dita_thead, "row")
                for entry in row.findall("entry"):
                    dita_entry = etree.SubElement(dita_row, "entry")
                    dita_entry.text = entry.text
        
        # Convert tbody
        tbody = tgroup.find("tbody")
        if tbody is not None:
            dita_tbody = etree.SubElement(dita_tgroup, "tbody")
            for row in tbody.findall("row"):
                dita_row = etree.SubElement(dita_tbody, "row")
                for entry in row.findall("entry"):
                    dita_entry = etree.SubElement(dita_row, "entry")
                    dita_entry.text = entry.text
    
    # Return the DITA XML as a string
    return etree.tostring(dita_root, pretty_print=True).decode("utf-8")

# Example DocBook XML
docbook_xml = """
<table>
    <title>Sample Table</title>
    <tgroup cols="3">
        <thead>
            <row>
                <entry>Column 1</entry>
                <entry>Column 2</entry>
                <entry>Column 3</entry>
            </row>
        </thead>
        <tbody>
            <row>
                <entry>Row 1, Cell 1</entry>
                <entry>Row 1, Cell 2</entry>
                <entry>Row 1, Cell 3</entry>
            </row>
            <row>
                <entry>Row 2, Cell 1</entry>
                <entry>Row 2, Cell 2</entry>
                <entry>Row 2, Cell 3</entry>
            </row>
        </tbody>
    </tgroup>
</table>
"""

# Convert and print the DITA XML
dita_xml = convert_docbook_to_dita(docbook_xml)
print(dita_xml)

Explanation

Parse the DocBook XML: Use etree.fromstring() to parse the DocBook XML content.
Create DITA XML Structure: Create the root element <table> for DITA.
Convert Title: Find the <title> element in the DocBook XML and add it to the DITA XML.
Convert tgroup: Find the <tgroup> element in the DocBook XML and add it to the DITA XML, including its cols attribute.
Convert thead and tbody: Iterate through the rows and entries in <thead> and <tbody>, and copy the text of each <entry> into the corresponding element in the DITA XML.
Return the DITA XML: Convert the DITA XML tree to a string and return it.
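
The steps above repeat the same row-copying loop for <thead> and <tbody>. As a minimal sketch, the loop can be pulled into one helper. This version uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra dependencies; the helper names copy_rows and convert_table are made up for illustration and are not part of the original script.

```python
import xml.etree.ElementTree as ET


def copy_rows(source_section, target_section):
    """Copy each <row> and the text of its <entry> children."""
    for row in source_section.findall("row"):
        new_row = ET.SubElement(target_section, "row")
        for entry in row.findall("entry"):
            ET.SubElement(new_row, "entry").text = entry.text


def convert_table(docbook_xml):
    """Build a DITA <table> from a simple, non-namespaced DocBook <table>."""
    source = ET.fromstring(docbook_xml)
    target = ET.Element("table")

    title = source.find("title")
    if title is not None:
        ET.SubElement(target, "title").text = title.text

    tgroup = source.find("tgroup")
    if tgroup is not None:
        new_tgroup = ET.SubElement(target, "tgroup")
        # Copy cols only when the source provides it.
        if tgroup.get("cols") is not None:
            new_tgroup.set("cols", tgroup.get("cols"))
        # thead and tbody share the same row/entry structure.
        for section_name in ("thead", "tbody"):
            section = tgroup.find(section_name)
            if section is not None:
                copy_rows(section, ET.SubElement(new_tgroup, section_name))

    return ET.tostring(target, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<table><title>Sample Table</title><tgroup cols='1'>"
        "<tbody><row><entry>Row 1, Cell 1</entry></row></tbody>"
        "</tgroup></table>"
    )
    print(convert_table(sample))
```

The behaviour matches the original script for simple tables; only the duplicated loop is gone.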

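This script handles the basic structure of a DocBook table and converts it to the equivalent DITA structure. It copies only the title, the cols attribute, and the leading text of each entry. More complex tables may also use <colspec> elements, spanning attributes such as namest, nameend, and morerows, inline markup inside entries, or a <tfoot> section (which DITA tables do not have), so you may need to extend the script to match your content.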
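
As a rough sketch of what "more elements and attributes" might look like, the version below also carries over <colspec> elements and a few span-related attributes, flattens inline markup inside entries to plain text, and accepts DocBook 5 input, whose elements live in the http://docbook.org/ns/docbook namespace. It uses the standard library's xml.etree.ElementTree rather than lxml. The helper names and the choice of which attributes to copy are assumptions for illustration, and <tfoot> is not handled.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "{http://docbook.org/ns/docbook}"

# Attributes copied by this sketch; extend the tuples to suit your content.
ENTRY_ATTRS = ("namest", "nameend", "morerows", "colname", "align", "valign")
COLSPEC_ATTRS = ("colname", "colnum", "colwidth", "align")


def local(tag):
    """Return the element name without the DocBook 5 namespace, if present."""
    return tag[len(DOCBOOK_NS):] if tag.startswith(DOCBOOK_NS) else tag


def children(parent, name):
    """Direct child elements called `name`, namespaced or not."""
    return [c for c in parent if isinstance(c.tag, str) and local(c.tag) == name]


def copy_attrs(source, target, names):
    """Copy the listed attributes that are actually set on the source."""
    for attr in names:
        value = source.get(attr)
        if value is not None:
            target.set(attr, value)


def convert_table(docbook_xml):
    source = ET.fromstring(docbook_xml)
    target = ET.Element("table")

    for title in children(source, "title"):
        ET.SubElement(target, "title").text = "".join(title.itertext())

    for tgroup in children(source, "tgroup"):
        new_tgroup = ET.SubElement(target, "tgroup")
        copy_attrs(tgroup, new_tgroup, ("cols",))
        for colspec in children(tgroup, "colspec"):
            copy_attrs(colspec, ET.SubElement(new_tgroup, "colspec"), COLSPEC_ATTRS)
        for section_name in ("thead", "tbody"):
            for section in children(tgroup, section_name):
                new_section = ET.SubElement(new_tgroup, section_name)
                for row in children(section, "row"):
                    new_row = ET.SubElement(new_section, "row")
                    for entry in children(row, "entry"):
                        new_entry = ET.SubElement(new_row, "entry")
                        copy_attrs(entry, new_entry, ENTRY_ATTRS)
                        # Flatten inline markup (e.g. <emphasis>) to plain text
                        # instead of mapping it to DITA inline elements.
                        new_entry.text = "".join(entry.itertext())

    return ET.tostring(target, encoding="unicode")
```

Flattening inline markup loses formatting but keeps the output free of DocBook-only elements; a fuller converter would map each inline element to its DITA counterpart.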

Written by Swaleha Parvin
