XML Interviews Question page11
How do I include one XML file in another?
This works exactly the same as for SGML. First you declare the entity you want to include, and then you reference it by name:
<!DOCTYPE novel SYSTEM "/dtd/novel.dtd" [
<!ENTITY chap1 SYSTEM "mydocs/chapter1.xml">
<!ENTITY chap2 SYSTEM "mydocs/chapter2.xml">
<!ENTITY chap3 SYSTEM "mydocs/chapter3.xml">
<!ENTITY chap4 SYSTEM "mydocs/chapter4.xml">
<!ENTITY chap5 SYSTEM "mydocs/chapter5.xml">
The difference between this method and the one used for including a DTD fragment is that this uses an external general (file) entity which is referenced in the same way as for a character entity (with an ampersand).
The one thing to make sure of is that the included file must not have an XML or DOCTYPE Declaration on it. If you've been using one for editing the fragment, remove it before using the file in this way. Yes, this is a pain in the butt, but if you have lots of inclusions like this, write a script to strip off the declaration (and paste it back on again for editing).
What is parsing and how do I do it in XML
Parsing is the act of splitting up information into its component parts (schools used to teach this in language classes until the teaching profession collectively caught the anti-grammar disease).
‘Mary feeds Spot’ parses as
1. Subject = Mary, proper noun, nominative case
2. Verb = feeds, transitive, third person singular, present tense
3. Object = Spot, proper noun, accusative case
In computing, a parser is a program (or a piece of code or API that you can reference inside your own programs) which analyses files to identify the component parts. All applications that read input have a parser of some kind, otherwise they'd never be able to figure out what the information means. Microsoft Word contains a parser which runs when you open a .doc file and checks that it can identify all the hidden codes. Give it a corrupted file and you'll get an error message.
XML applications are just the same: they contain a parser which reads XML and identifies the function of each the pieces of the document, and it then makes that information available in memory to the rest of the program.
While reading an XML file, a parser checks the syntax (pointy brackets, matching quotes, etc) for well-formedness, and reports any violations (reportable errors). The XML Specification lists what these are.
Validation is another stage beyond parsing. As the component parts of the program are identified, a validating parser can compare them with the pattern laid down by a DTD or a Schema, to check that they conform. In the process, default values and datatypes (if specified) can be added to the in-memory result of the validation that the validating parser gives to the application.
<person corpid="abc123" birth="1960-02-31" gender="female">
The example above parses as: 1. Element person identified with Attribute corpid containing abc123 and Attribute birth containing 1960-02-31 and Attribute gender containing female containing ...
2. Element name containing ...
3. Element forename containing text ‘Judy’ followed by ...
4. Element surname containing text ‘O'Grady’
(and lots of other stuff too).
As well as built-in parsers, there are also stand-alone parser-validators, which read an XML file and tell you if they find an error (like missing angle-brackets or quotes, or misplaced markup). This is essential for testing files in isolation before doing something else with them, especially if they have been created by hand without an XML editor, or by an API which may be too deeply embedded elsewhere to allow easy testing.