XML Interviews Question page13
Do I have to change any of my server software to work with XML?
The only changes needed are to make sure your server serves up .xml, .css, .dtd, .xsl, and whatever other file types you will use as the correct MIME content (media) types. The details of the settings are specified in RFC 3023. Most new versions of Web server software come preset. If not, all that is needed is to edit the mime-types file (or its equivalent: as a server operator you already know where to do this, right?) and add or edit the relevant lines for the right media types. In some servers (eg Apache), individual content providers or directory owners may also be able to change the MIME types for specific file types from within their own directories by using directives in a .htaccess file. The media types required are:
* text/xml for XML documents which are ‘readable by casual users’;
* application/xml for XML documents which are ‘unreadable by casual users’;
* text/xml-external-parsed-entity for external parsed entities such as document fragments (eg separate chapters which make up a book) subject to the readability distinction of text/xml;
* application/xml-external-parsed-entity for external parsed entities subject to the readability distinction of application/xml;
* application/xml-dtd for DTD files and modules, including character entity sets.
The RFC has further suggestions for the use of the +xml media type suffix for identifying ancillary files such as XSLT (application/xslt+xml).
If you run scripts generating XHTML which you wish to be treated as XML rather than HTML, they may need to be modified to produce the relevant Document Type Declaration as well as the right media type if your application requires them to be validated.
I'm trying to understand the XML Spec:
- why does it have such difficult terminology?
For implementation to succeed, the terminology needs to be precise. Design goal eight of the specification tells us that ‘the design of XML shall be formal and concise’. To describe XML, the specification therefore uses formal language drawn from several fields, specifically those of text engineering, international standards and computer science. This is often confusing to people who are unused to these disciplines because they use well-known English words in a specialised sense which can be very different from their common meanings—for example: grammar, production, token, or terminal. The specification does not explain these terms because of the other part of the design goal: the specification should be concise. It doesn't repeat explanations that are available elsewhere: it is assumed you know this and either know the definitions or are capable of finding them. In essence this means that to grok the fullness of the spec, you do need a knowledge of some SGML and computer science, and have some exposure to the language of formal standards. Sloppy terminology in specifications causes misunderstandings and makes it hard to implement consistently, so formal standards have to be phrased in formal terminology. This FAQ is not a formal document, and the astute reader will already have noticed it refers to ‘element names’ where ‘element type names’ is more correct; but the former is more widely understood.
Can I still use server-side inclusions?
Yes, so long as what they generate ends up as part of an XML-conformant file (ie either valid or just well-formed).
Server-side tag-replacers like shtml, PHP, JSP, ASP, Zope, etc store almost-valid files using comments, Processing Instructions, or non-XML markup, which gets replaced at the point of service by text or XML markup (it is unclear why some of these systems use non-HTML/XML markup). There are also some XML-based preprocessors for formats like XVRL (eXtensible Value Resolution Language) which resolve specialised references to external data and output a normalised XML file.
Can I (and my authors) still use client-side inclusions?
The same rule applies as for server-side inclusions, so you need to ensure that any embedded code which gets passed to a third-party engine (eg calls to SQL, VB, Java, etc) does not contain any characters which might be misinterpreted as XML markup (ie no angle brackets or ampersands). Either use a CDATA marked section to avoid your XML application parsing the embedded code, or use the standard <, and & character entity references instead.