XML Merge 2.0 Manual

Introduction to XML Merge

XML Merge is an XML file preprocessor. It recursively includes XML files (called "XML fragments" in this document) and modifies XML elements and attributes. The result of the preprocessing is a single XML output file.

[Figure: XML Merge Processing Diagram]

XML Merge is a Python module. It is normally invoked as a program from the command line, but can equally well be used from within another Python program or module. Both use cases are described in this manual in great depth.

Run-time Requirements (Dependencies and Compatibilities) of XML Merge

XML Merge was developed and tested using Python 2.6.4 and lxml 2.2.2. It seems to work with Python 2.5 and lxml 2.1 as well. Future development will aim only for compatibility with Python 2.5 (and later also 3.1).

You can download Python from http://www.python.org/, and lxml from http://codespeak.net/lxml/.

The Code of XML Merge

XML Merge is hosted at http://repo.or.cz/w/xmlmerge.git. That website shows you the version control history of the source code. To get to the XML Merge source code itself, use the "tree" or "snapshot" links.

XML Merge is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

XML Merge is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License (COPYING.LGPL3 and COPYING.GPL3) for more details.

The source code of XML Merge is a single file of fewer than 1000 lines: xmlmerge.py. It is well-documented, conforms to http://www.python.org/dev/peps/pep-0008/ (Python style guide), and is readable by any Python programmer who knows their way around http://docs.python.org/, especially "Library Reference" and "Language Reference".

It starts out with the module comments and doc string, followed by imports and constants. Next comes the command line parsing class and function. "XML processing and comparison" comprises the toplevel functions associated with that, followed by the XML Preprocess class. At the end of the code, you'll find the main function that is run if xmlmerge.py is used from the command line (as opposed to being used as a Python module).

Using XML Merge as a Python Module

Looking at the code of main() and the documentation in the source code should give you a good idea what parts you might want to call individually.

A Complete Example

Run: xmlmerge.py -i document.xml -o document.out.xml

Input File (document.xml)
<?xml version='1.0' encoding='utf-8'?>
<Document xmlns:xm="urn:felixrabe:xmlns:xmlmerge:preprocess">
  <xm:Var greeting="'Hello'" addressee="'World'"/>
  <Paragraph><xm:Text>{greeting} {addressee}</xm:Text></Paragraph>
</Document>

Output File (document.out.xml)
<?xml version='1.0' encoding='utf-8'?>
<Document>
  <Paragraph>Hello World</Paragraph>
</Document>

XML Preprocessing

The element names of the <xm:*/> tags are not case sensitive, though of course start and end tags have to match in case, according to the XML standard. The XML namespace used for these tags is xmlns:xm="urn:felixrabe:xmlns:xmlmerge:preprocess".

Please note that whitespace has been simplified in the output of the examples below, and may not match actual XML Merge output.

Python Expressions and Variables in XML Input and Fragment Files

Strings enclosed in "{" and "}" (curly braces) are evaluated as Python expressions within attributes and <xm:Text/>. Python variables are set by <xm:Var/> and <xm:PythonCode/> for the remainder of the current document or block, and by <xm:Include/> for the included document. The scope of these variables is limited by <xm:Block/> and by the boundaries of included documents.
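
The substitution rule can be pictured with a few lines of Python. This is only a minimal sketch, assuming braces are neither nested nor escaped; the helper name substitute() is hypothetical and is not taken from xmlmerge.py.

```python
import re

# Matches "{...}" without nested braces (an assumption of this sketch).
BRACED = re.compile(r"\{([^{}]*)\}")

def substitute(text, namespace):
    """Replace each {expr} in text with str() of the evaluated expression."""
    def evaluate(match):
        # The variables set so far are supplied as the local namespace.
        return str(eval(match.group(1).strip(), {}, namespace))
    return BRACED.sub(evaluate, text)

variables = {"greeting": "Hello", "addressee": "World"}
print(substitute("{greeting} {addressee}", variables))
```

Because the expressions go through eval(), the same caution applies as for <xm:Include/>: only process input you trust.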

XPath Expressions

The attributes "select", "to", "before", "after", and "from" of the <xm:*/> elements expect XPath expressions. These find elements in the whole document (i.e. before and after, above and below the <xm:*/> element itself), using the containing <xm:*/> element as the context node.

<xm:AddElements/>

Input (attribute "to"):
<dst> <other/> </dst>
<xm:AddElements to="XPath/to/dst"> <a/> <b/> <c/> </xm:AddElements>

Output:
<dst> <other/> <a/> <b/> <c/> </dst>

Input (attribute "before"):
<other/> <dst/>
<xm:AddElements before="XPath/to/dst"> <a/> <b/> <c/> </xm:AddElements>

Output:
<other/> <a/> <b/> <c/> <dst/>

Input (attribute "after"):
<dst/> <other/>
<xm:AddElements after="XPath/to/dst"> <a/> <b/> <c/> </xm:AddElements>

Output:
<dst/> <a/> <b/> <c/> <other/>

Add elements "a", "b", and "c" to, before, or after the element "dst".
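
The three placements correspond to ordinary tree operations. The following sketch shows them with the standard library's xml.etree.ElementTree instead of lxml; the function add_elements() and its "where" parameter are hypothetical names chosen for illustration, not part of XML Merge.

```python
import xml.etree.ElementTree as ET

def add_elements(root, dst, new_elements, where="to"):
    """Insert new_elements into, before, or after dst (a descendant of root)."""
    if where == "to":
        dst.extend(new_elements)  # append as the last children of dst
        return
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parents = dict((child, parent) for parent in root.iter() for child in parent)
    parent = parents[dst]
    index = list(parent).index(dst)
    if where == "after":
        index += 1
    for offset, element in enumerate(new_elements):
        parent.insert(index + offset, element)

root = ET.fromstring("<root><other/><dst/></root>")
new = [ET.Element(name) for name in ("a", "b", "c")]
add_elements(root, root.find("dst"), new, where="before")
print([child.tag for child in root])
```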

<xm:Block/>

Input:
<xm:Var x="1"/>
<xm:Block>
  <xm:Var x="2"/>
  <first x="{x}"/>
</xm:Block>
<second x="{x}"/>

Output:
<first x="2"/>
<second x="1"/>

Define a scope for variables. Variable values that get set inside the block are only visible inside the block.
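
One way to picture this scoping is a block that works on a copy of the surrounding namespace. This is a sketch of the idea only; whether xmlmerge.py copies a dictionary in this way is an assumption, and run_block() is a hypothetical name.

```python
def run_block(outer, assignments):
    """Apply assignments to a copy of outer and return the inner namespace."""
    inner = dict(outer)        # the block sees everything defined so far
    inner.update(assignments)  # changes stay local to the copy
    return inner               # thrown away when the block ends

namespace = {"x": 1}
inside = run_block(namespace, {"x": 2})
print("first x=%s, second x=%s" % (inside["x"], namespace["x"]))
```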

<xm:Comment/>

Input:
<a/>
<!-- not removed -->
<xm:Comment> This gets removed by XML Merge. </xm:Comment>
<b/>

Output:
<a/>
<!-- not removed -->
<b/>

XML Merge provides its own comment element which does not appear in the output. In contrast, standard XML comments are preserved and do appear in the output.

<xm:DefaultVar/>

Input:
<xm:Var variable_1="'Hello'" variable_2="'World'"/>
<xm:Block>
  <xm:DefaultVar variable_1="'Hi'"/>
  <xm:Var variable_2="'Buddy'"/>
  <Message text="{variable_1} {variable_2}"/>
</xm:Block>

Output:
<Message text="Hello Buddy"/>

Works like <xm:Var/> (see there), but only sets the variables (and evaluates the Python expressions) if those names do not already exist in the current Python namespace. In the example, 'variable_1' is only set once and will not receive the new value 'Hi' because, at that point, it already has a value.

<xm:DefaultVar/> is especially valuable in combination with <xm:Include/> (see there), where it can be used to define default values inside the included file which then can be optionally overwritten by attributes given to the <xm:Include/> tag itself.
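
The difference between <xm:Var/> and <xm:DefaultVar/> can be sketched as follows. The function set_vars() and its only_if_missing flag are hypothetical names for illustration; the sketch assumes attribute values are evaluated with eval() against the current namespace.

```python
def set_vars(namespace, attributes, only_if_missing=False):
    """Evaluate attribute values as Python expressions and bind the results.

    only_if_missing=False mimics <xm:Var/>, True mimics <xm:DefaultVar/>.
    """
    for name, expression in attributes.items():
        if only_if_missing and name in namespace:
            continue  # keep the existing value; the expression is not evaluated
        namespace[name] = eval(expression, {}, namespace)

namespace = {}
set_vars(namespace, {"variable_1": "'Hello'", "variable_2": "'World'"})
set_vars(namespace, {"variable_1": "'Hi'"}, only_if_missing=True)
set_vars(namespace, {"variable_2": "'Buddy'"})
print("%(variable_1)s %(variable_2)s" % namespace)
```

An included file that starts with defaults of this kind keeps working when the including document supplies no values, and picks up the caller's values when it does.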

<xm:Include/>

WARNING: It is not safe to include untrusted XML fragments as those can cause execution of arbitrary Python code.

The given file path (attribute "file") is relative to the path of the document containing the <xm:Include/> element.

Input:
<xm:Include file="../path/to/file.xml" select="XPath/to/src"/>

Output:
[ Specific 'src' elements (see ‘select’) of the specified document (see ‘file’) ]

In the first form (with the "select" attribute), one can think of the <xm:Include/> element as a function call that causes the whole file (specified in the "file" attribute) to get preprocessed in the same manner as the document containing the <xm:Include/> element. From the resulting document, elements selected by the "select" attribute will then get included to replace the <xm:Include/> element.

In the second form (with the "import" attribute), <xm:Include/> can also be used to "import" variables defined in the included file. This can be used, for example, for defining common Python functions and classes in a separate XML fragment file:

Input:
<xm:Include file="../path/to/functions.xml" import="*"/>
<xm:Text> { imported_function(1, 2, 3) } </xm:Text>

Output:
[ Result of ‘imported_function(1, 2, 3)’ ]

The "import" attribute works like the last part of the Python statement "from xyz import <names_or_star>". It is a comma-separated list of names to import in the current Python namespace, or just '*' to import all names.
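
The name selection can be sketched with plain dictionaries standing in for the two namespaces. The function import_names() is a hypothetical helper; skipping names that start with an underscore for '*' mirrors what Python itself does for modules without __all__, and is an assumption about XML Merge.

```python
def import_names(source, target, spec):
    """Copy names from source into target, like 'from xyz import <spec>'."""
    if spec.strip() == "*":
        # Assumption: private names are skipped, as Python does by default.
        names = [name for name in source if not name.startswith("_")]
    else:
        names = [name.strip() for name in spec.split(",")]
    for name in names:
        target[name] = source[name]  # raises KeyError for an undefined name

included = {"imported_function": lambda a, b, c: a + b + c, "helper": 42}
current = {}
import_names(included, current, "imported_function")
print(current["imported_function"](1, 2, 3))
```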

Variables can be set for the initial namespace of the included file by supplying them as additional attributes (that is, any attribute not named ‘file’, ‘import’, or ‘select’). The values of those attributes will be evaluated as Python expressions. In addition to those, the content of the current namespace will be available as well.

<xm:Loop/>

Input:
<xm:Loop i="range(1, 4)">
  <value label="{i} squared">
    <xm:Text>{i ** 2}</xm:Text>
  </value>
</xm:Loop>

Output:
<value label="1 squared"> 1 </value>
<value label="2 squared"> 4 </value>
<value label="3 squared"> 9 </value>

Loop over the body of <xm:Loop/> with the values i=1, i=2, i=3. The first attribute value of <xm:Loop/> is evaluated as a Python expression and assigned to the variable "i" (or whatever the name of the first attribute happens to be). Refer to the Python Library Reference, section "built-in functions", for more information on the range() function.
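
For comparison, the same result written directly in Python looks roughly like this. The sketch uses the standard library's xml.etree.ElementTree, and the wrapping "parent" element is a hypothetical container added only so that the three results have somewhere to live.

```python
import xml.etree.ElementTree as ET

parent = ET.Element("parent")  # hypothetical container for the loop output
for i in range(1, 4):          # i takes the values 1, 2, 3
    value = ET.SubElement(parent, "value", {"label": "%d squared" % i})
    value.text = str(i ** 2)

print(ET.tostring(parent).decode("ascii"))
```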

<xm:PythonCode/>

Input:
<xm:PythonCode><![CDATA[
f = file("binary_data.dat", "rb")
bin_data = f.read()
]]></xm:PythonCode>
<binary-data><xm:Text> { bin_data } </xm:Text></binary-data>

Output:
<binary-data> [ XML-escaped binary data ] </binary-data>

The example reads binary data from a file and includes it, XML-escaped, in the XML document.

<xm:RemoveAttributes/>

Input:
<X first="1" second="2"/>
<X first="3"/>
<X second="4"/>
<xm:RemoveAttributes from="XPath/to/X[@first]" name="second"/>

Output:
<X first="1"/>
<X first="3"/>
<X second="4"/>

Remove every attribute called ‘second’ from all elements ‘X’ that have an attribute called ‘first’.
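
The predicate [@first] does the selecting here. The following sketch reproduces the example with the standard library's xml.etree.ElementTree, which supports this predicate form; the wrapping "root" element is hypothetical and added only to make the input a single document.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<root><X first="1" second="2"/><X first="3"/><X second="4"/></root>'
)
# Select only the X elements that carry a "first" attribute.
for element in root.findall(".//X[@first]"):
    element.attrib.pop("second", None)  # no error if "second" is absent

print([sorted(x.attrib.items()) for x in root])
```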

<xm:RemoveElements/>

Input:
<A>
  <B/>
  <C> <D/> </C>
</A>
<xm:RemoveElements select="XPath/to/C"/>

Output:
<A>
  <B/>
</A>

Remove all elements ‘C’.
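
Removing an element also removes everything below it, which is why "D" disappears together with "C". A sketch with the standard library's xml.etree.ElementTree follows; it matches on the tag name rather than evaluating an XPath expression, which is a simplification made for this illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<A><B/><C><D/></C></A>")
# Copy both lists first so that removing children does not disturb iteration.
for parent in list(root.iter()):
    for child in list(parent):
        if child.tag == "C":
            parent.remove(child)  # drops C together with its subtree

print(ET.tostring(root).decode("ascii"))
```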

<xm:SetAttribute/>

Input:
<element dst="12" other="14"/>
<element/>
<xm:SetAttribute of="XPath/to/element" name="dst" value="23"/>

Output:
<element dst="23" other="14"/>
<element dst="23"/>

Set an attribute on "element". The "name" attribute of <xm:SetAttribute/> gives the name of the attribute to set, and the "value" attribute gives its value. In this example, dst="23" is set on all "element" elements.
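
Note that an existing attribute is overwritten and a missing one is created, while unrelated attributes are left alone. The same behaviour, sketched with the standard library's xml.etree.ElementTree (the wrapping "root" element is hypothetical):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<root><element dst="12" other="14"/><element/></root>')
for element in root.findall(".//element"):
    element.set("dst", "23")  # overwrites dst="12", creates dst where missing

print([sorted(e.attrib.items()) for e in root])
```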

<xm:Text/>

See <xm:Loop/> example. An <xm:Text/> provides the same Python expression substitution capability using "{...}" for XML text that is available for XML attributes.

<xm:Var/>

Input:
<xm:Var a_number="55.2" a_string=" 'my name is Pascal' "/>
<element x="{int(a_number)}" y="{a_string.upper()}"/>

Output:
<element x="55" y="MY NAME IS PASCAL"/>

Define the Python variables "a_number" and "a_string". The values of the attributes to <xm:Var/> are all evaluated as Python expressions and assigned to the variables.

<xm:DefaultVar/> is a variant of <xm:Var/> that can be used to define default values for variables that have not been set already.

Using XML Merge from the Command Line

Quickstart

This is the shortest possible invocation of XML Merge from the command line:

xmlmerge.py -i somefile.xml

It may be necessary to run xmlmerge.py using its full path, and/or specifying the full path to the Python interpreter, as in this example (Windows):

C:\Python26\python.exe "C:\Tools\XML Merge\xmlmerge.py" -i somefile.xml

It is strongly recommended, though, to install Python so that (on Windows systems) Python script filename extensions are associated with the Python interpreter; the installer should do this by default.
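
When XML Merge is launched from another Python script, the interpreter path problem can be sidestepped by reusing the interpreter that is already running. The sketch below only builds the argument list; build_command() is a hypothetical helper, and the script path is the one from the Windows example above.

```python
import sys

def build_command(script, input_file, output_file=None):
    """Return an argument list that runs xmlmerge.py with this interpreter."""
    command = [sys.executable, script, "-i", input_file]
    if output_file is not None:
        command += ["-o", output_file]
    return command

command = build_command(r"C:\Tools\XML Merge\xmlmerge.py", "somefile.xml")
print(command)
# To actually run it:
#     import subprocess
#     exit_code = subprocess.call(command)
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces.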

Usage Summary

This is what you get from a command line invocation of xmlmerge.py with the option "--help" or with erroneous arguments:

Usage: xmlmerge.py [options]

Options:
  -h, --help            show this help message and exit
  -i INPUT, --input=INPUT
                        (REQUIRED) input XML file
  -o OUTPUT, --output=OUTPUT
                        output XML file (.out.xml if not given)
  -s XML_SCHEMA, --xml-schema=XML_SCHEMA
                        XML Schema (.xsd) to validate output against
  -r REFERENCE, --reference=REFERENCE
                        reference XML file to compare output against
  -d, --html-diff       only with -r; if output and reference differ, produce
                        a HTML file showing the differences
  -t, --trace-includes  add tracing information to included XML fragments
  -v, --verbose         show debugging messages
  -q, --quiet           only show error messages

Input (-i), Output (-o), XML Schema (-s), and Reference (-r): Filename Arguments

The arguments given to the options --input, --output, --xml-schema, and --reference are filenames, each optionally with a relative or absolute path. A relative path is resolved against the current working directory as determined by the operating system (i.e. not relative to xmlmerge.py itself). The files can have any extension, not just ".xml" (or ".xsd" for the schema), but an extension, if present, has to be given in full (i.e. not just "schemafile", but "schemafile.xsd" for a file actually called "schemafile.xsd").

If no output filename is specified using the --output option, then it is constructed from the input filename by removing a trailing ".xml" extension (matched case-insensitively), if present, and appending ".out.xml". If this is not desired, specify the --output option explicitly. Examples (if no output is specified):

DEVICE.XML      ==>  DEVICE.out.xml
protocol.xhtml  ==>  protocol.xhtml.out.xml
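
The naming rule is small enough to state as code. This is a sketch of the rule as described here, not an excerpt from xmlmerge.py; default_output_name() is a hypothetical name.

```python
def default_output_name(input_name):
    """Derive the default output filename from the input filename."""
    if input_name.lower().endswith(".xml"):     # case-insensitive match
        input_name = input_name[:-len(".xml")]  # strip only a trailing ".xml"
    return input_name + ".out.xml"

for name in ("DEVICE.XML", "protocol.xhtml"):
    print("%s ==> %s" % (name, default_output_name(name)))
```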

How to use an XML Schema File (-s)

If an XML Schema file is specified using the --xml-schema option, the created output file is matched against that schema. The result of the match is reported by xmlmerge.py on its standard output (the terminal screen) and in its return value (also called error code on Windows).

How to use a Reference XML File (-r and -d)

If a reference XML file is specified using the --reference option, the created output file is compared to it byte for byte. Watch out for differing line endings: they can produce a confusing, seemingly empty diff. xmlmerge.py always produces Unix line endings ('\n') on all platforms. The result of the comparison is reported by xmlmerge.py on its standard output and in its return value.

If the --html-diff option is given, a file "<output-filename>.diff.html" is produced, showing the differences between the user-provided reference file and the xmlmerge.py-produced output file.
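
A comparison of this kind, including an HTML report, can be sketched with the standard library's difflib module. That XML Merge itself uses difflib is an assumption, and compare() is a hypothetical helper. The sketch also shows why differing line endings are confusing: the texts compare as unequal, but the line-based diff shows no changed lines.

```python
import difflib

def compare(output_text, reference_text):
    """Return None if the texts are identical, otherwise an HTML diff page."""
    if output_text == reference_text:
        return None
    return difflib.HtmlDiff().make_file(
        reference_text.splitlines(),  # splitlines() drops the line endings
        output_text.splitlines(),
        fromdesc="reference",
        todesc="output",
    )

same = compare("<a/>\n", "<a/>\n")
html = compare("<a/>\n<b/>\n", "<a/>\n<c/>\n")
line_endings_only = compare("<a/>\n", "<a/>\r\n")
print(same is None, html is not None, line_endings_only is not None)
```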

Reference files are particularly useful (and meant to be used) on two occasions:

Tracing Inclusions (-t)

(This feature has not yet been implemented in the current version of XML Merge.)

If the --trace-includes option is given, each result of an <xm:Include/> operation will be surrounded by <xmt:Start/> and <xmt:End/> elements. The XML namespace used for these elements is xmlns:xmt="urn:felixrabe:xmlns:xmlmerge:inctrace".