How can we convert DocBook XML table content into DITA?
Jul 16, 2024
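To convert the content of a DocBook XML table into DITA, you need to understand how each format structures its table elements. Both DocBook and DITA base their tables on the CALS table model (table, tgroup, thead, tbody, row, entry), so a simple table maps almost element-for-element. Below is an example of translating a simple DocBook table into a DITA table.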
Example DocBook Table
<table>
<title>Sample Table</title>
<tgroup cols="3">
<thead>
<row>
<entry>Column 1</entry>
<entry>Column 2</entry>
<entry>Column 3</entry>
</row>
</thead>
<tbody>
<row>
<entry>Row 1, Cell 1</entry>
<entry>Row 1, Cell 2</entry>
<entry>Row 1, Cell 3</entry>
</row>
<row>
<entry>Row 2, Cell 1</entry>
<entry>Row 2, Cell 2</entry>
<entry>Row 2, Cell 3</entry>
</row>
</tbody>
</tgroup>
</table>
Corresponding DITA Table
<table>
<title>Sample Table</title>
<tgroup cols="3">
<thead>
<row>
<entry>Column 1</entry>
<entry>Column 2</entry>
<entry>Column 3</entry>
</row>
</thead>
<tbody>
<row>
<entry>Row 1, Cell 1</entry>
<entry>Row 1, Cell 2</entry>
<entry>Row 1, Cell 3</entry>
</row>
<row>
<entry>Row 2, Cell 1</entry>
<entry>Row 2, Cell 2</entry>
<entry>Row 2, Cell 3</entry>
</row>
</tbody>
</tgroup>
</table>
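The two listings are identical, but the match does not hold for every DocBook table. DITA's table follows the OASIS Exchange Table Model, a subset of CALS whose tgroup allows only colspec, thead, and tbody. DocBook's tfoot, spanspec, and entrytbl therefore have no direct DITA counterpart. A minimal pre-flight sketch, using Python's standard xml.etree.ElementTree, can flag those elements before you convert. The find_unsupported name and the UNSUPPORTED_IN_DITA set are illustrative; confirm the set against the DITA version you target.

```python
import xml.etree.ElementTree as ET

# Assumption: DITA's <tgroup> allows only colspec*, thead?, tbody (OASIS
# Exchange Table Model), so these DocBook/CALS elements have no DITA equivalent.
UNSUPPORTED_IN_DITA = {"tfoot", "spanspec", "entrytbl"}


def find_unsupported(docbook_xml: str) -> list:
    """Return the local names of table elements that DITA cannot hold as-is."""
    root = ET.fromstring(docbook_xml)
    found = []
    for elem in root.iter():
        # Drop a namespace such as {http://docbook.org/ns/docbook} if present
        local = elem.tag.rsplit("}", 1)[-1]
        if local in UNSUPPORTED_IN_DITA:
            found.append(local)
    return found


sample = """<table><title>Totals</title><tgroup cols="2">
<tfoot><row><entry>Total</entry><entry>1</entry></row></tfoot>
<tbody><row><entry>Item</entry><entry>1</entry></row></tbody>
</tgroup></table>"""

print(find_unsupported(sample))
```

Anything this reports needs a manual decision, for example moving footer rows into the tbody.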
Python Script for Conversion
Here is a Python script that uses lxml to perform the conversion:
from lxml import etree


def convert_docbook_to_dita(docbook_xml):
    docbook_tree = etree.fromstring(docbook_xml)
    # Create the root element for DITA
    dita_root = etree.Element("table")
    # Convert title
    title = docbook_tree.find("title")
    if title is not None:
        dita_title = etree.SubElement(dita_root, "title")
        dita_title.text = title.text
    # Convert tgroup
    tgroup = docbook_tree.find("tgroup")
    if tgroup is not None:
        dita_tgroup = etree.SubElement(dita_root, "tgroup", cols=tgroup.get("cols"))
        # Convert thead
        thead = tgroup.find("thead")
        if thead is not None:
            dita_thead = etree.SubElement(dita_tgroup, "thead")
            for row in thead.findall("row"):
                dita_row = etree.SubElement(dita_thead, "row")
                for entry in row.findall("entry"):
                    dita_entry = etree.SubElement(dita_row, "entry")
                    dita_entry.text = entry.text
        # Convert tbody
        tbody = tgroup.find("tbody")
        if tbody is not None:
            dita_tbody = etree.SubElement(dita_tgroup, "tbody")
            for row in tbody.findall("row"):
                dita_row = etree.SubElement(dita_tbody, "row")
                for entry in row.findall("entry"):
                    dita_entry = etree.SubElement(dita_row, "entry")
                    dita_entry.text = entry.text
    # Return the DITA XML as a string
    return etree.tostring(dita_root, pretty_print=True).decode("utf-8")

# Example DocBook XML
docbook_xml = """
<table>
<title>Sample Table</title>
<tgroup cols="3">
<thead>
<row>
<entry>Column 1</entry>
<entry>Column 2</entry>
<entry>Column 3</entry>
</row>
</thead>
<tbody>
<row>
<entry>Row 1, Cell 1</entry>
<entry>Row 1, Cell 2</entry>
<entry>Row 1, Cell 3</entry>
</row>
<row>
<entry>Row 2, Cell 1</entry>
<entry>Row 2, Cell 2</entry>
<entry>Row 2, Cell 3</entry>
</row>
</tbody>
</tgroup>
</table>
"""
# Convert and print the DITA XML
dita_xml = convert_docbook_to_dita(docbook_xml)
print(dita_xml)
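convert_docbook_to_dita() returns a bare table element. In the standard DITA document types a table is not a root element; it has to sit inside a topic body (body in a generic topic, conbody in a concept, refbody in a reference). A minimal sketch for wrapping the converted table in a generic topic follows. It uses the standard library's xml.etree.ElementTree rather than lxml. The wrap_in_topic name and the "sample_table" id and title values are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Public and system identifiers of the OASIS DITA topic DTD
TOPIC_DOCTYPE = '<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">'


def wrap_in_topic(table_xml: str, topic_id: str, topic_title: str) -> str:
    """Place a DITA <table> inside a minimal <topic><title/><body/></topic>."""
    table = ET.fromstring(table_xml)
    topic = ET.Element("topic", id=topic_id)
    ET.SubElement(topic, "title").text = topic_title
    body = ET.SubElement(topic, "body")
    body.append(table)
    ET.indent(topic)  # pretty-print in place; needs Python 3.9+
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        + TOPIC_DOCTYPE
        + "\n"
        + ET.tostring(topic, encoding="unicode")
    )


# Hypothetical input standing in for the script's output
dita_table = '<table><title>Sample Table</title><tgroup cols="1"><tbody><row><entry>x</entry></row></tbody></tgroup></table>'
print(wrap_in_topic(dita_table, "sample_table", "Sample Table"))
```

If you write the result to a file, save it as UTF-8 so the encoding declaration matches.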
Explanation
- Parse the DocBook XML: Use etree.fromstring() to parse the DocBook XML content.
- Create the DITA XML structure: Create the root <table> element for DITA.
- Convert the title: Find the <title> element in the DocBook XML and add it to the DITA XML.
- Convert tgroup: Find the <tgroup> element in the DocBook XML and add it to the DITA XML, including its cols attribute.
- Convert thead and tbody: Iterate through the rows and entries in <thead> and <tbody> and add them to the corresponding elements in the DITA XML.
- Return the DITA XML: Convert the DITA XML tree to a string and return it.
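If your source is DocBook 5, every element is in the http://docbook.org/ns/docbook namespace. Unqualified lookups such as docbook_tree.find("title") then match nothing, and the script returns an empty table element. One workaround, sketched here with the standard library's xml.etree.ElementTree, is to strip the namespace from element tags before conversion. The strip_namespaces helper is hypothetical. Passing a namespaces mapping to lxml's find() is an equally valid alternative.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"


def strip_namespaces(xml_text: str) -> str:
    """Parse xml_text, remove namespace URIs from element tags, re-serialize.

    Only element tags are changed; namespaced attributes such as xml:id
    are left untouched.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
    # encoding="unicode" returns a str with no XML declaration
    return ET.tostring(root, encoding="unicode")


db5 = f'<table xmlns="{DOCBOOK_NS}"><title>Sample Table</title></table>'
print(strip_namespaces(db5))
```

Because the helper returns a plain string without an XML declaration, its output can be passed straight to convert_docbook_to_dita().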
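This script handles only the basic structure of a DocBook table. It copies entry.text, which is just the text before an entry's first child element. Inline markup such as <emphasis> and any text after it are lost, and an entry that wraps its content in <para> comes out empty. The script also drops <colspec> elements and the spanning attributes namest, nameend, and morerows. Depending on the complexity of your DocBook content, you may need to extend the script to handle these elements and attributes.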
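Because DocBook and DITA share most table element and attribute names, an alternative to the field-by-field script is to walk the DocBook tree recursively. Each element is renamed through a mapping table, and only attributes both models understand are kept. The sketch below, using xml.etree.ElementTree, carries over colspec elements, spanning attributes, and inline markup inside entries. The convert_table name and the contents of TAG_MAP and KEEP_ATTRS are assumptions covering a few common elements (for example para to p, emphasis to i or b). Elements missing from the map are unwrapped: their text survives but their markup does not.

```python
import xml.etree.ElementTree as ET

# Illustrative DocBook-to-DITA tag mapping (an assumption, not exhaustive).
# Table elements keep their names; a few inline/block elements are renamed.
TAG_MAP = {
    "table": "table", "informaltable": "table", "title": "title",
    "tgroup": "tgroup", "colspec": "colspec", "thead": "thead",
    "tbody": "tbody", "row": "row", "entry": "entry",
    "para": "p", "simpara": "p", "emphasis": "i",
    "literal": "codeph", "code": "codeph",
}

# Attributes assumed to exist with the same meaning in both table models
KEEP_ATTRS = {
    "id", "frame", "cols", "colname", "colnum", "colwidth", "align",
    "valign", "colsep", "rowsep", "namest", "nameend", "morerows",
}


def _kept_attrs(elem):
    """Return only the attributes listed in KEEP_ATTRS."""
    return {k: v for k, v in elem.attrib.items() if k in KEEP_ATTRS}


def _append_text(dst, text):
    """Add text at the current end of dst (its text, or its last child's tail)."""
    if not text:
        return
    if len(dst):
        dst[-1].tail = (dst[-1].tail or "") + text
    else:
        dst.text = (dst.text or "") + text


def _convert_content(src, dst):
    """Copy src's text and its converted children into dst."""
    _append_text(dst, src.text)
    for child in src:
        tag = child.tag.rsplit("}", 1)[-1]  # ignore a DocBook 5 namespace
        new_tag = TAG_MAP.get(tag)
        if tag == "emphasis" and child.get("role") in ("bold", "strong"):
            new_tag = "b"
        if new_tag is None:
            # Unmapped element: drop the tag but keep its text and children
            _convert_content(child, dst)
        else:
            new = ET.SubElement(dst, new_tag, _kept_attrs(child))
            _convert_content(child, new)
        _append_text(dst, child.tail)


def convert_table(docbook_xml):
    """Convert one DocBook <table>/<informaltable> string to a DITA <table> string."""
    src = ET.fromstring(docbook_xml)
    root = ET.Element("table", _kept_attrs(src))
    _convert_content(src, root)
    return ET.tostring(root, encoding="unicode")


db = ('<table frame="all"><title>Demo</title><tgroup cols="2">'
      '<colspec colname="c1"/><colspec colname="c2"/><tbody><row>'
      '<entry namest="c1" nameend="c2"><para>Hello '
      '<emphasis role="bold">bold</emphasis> and '
      '<emphasis>italic</emphasis>!</para></entry></row></tbody></tgroup></table>')
print(convert_table(db))
```

Rebuilding the tree element by element, instead of deep-copying it, keeps DocBook-only tags out of the DITA output. The trade-off is that unwrapping silently discards unmapped markup, so extend TAG_MAP for the elements your content actually uses.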