How would you build a search engine for large volumes of XML data?
The way candidates answer this question may provide insight into their view of XML data. For those who view XML primarily as a way to denote structure for text files, a common answer is to build a full-text search and handle the data similarly to the way Internet portals handle HTML pages. Others consider XML as a standard way of transferring structured data between disparate systems. These candidates often describe some scheme of importing XML into a relational or object database and relying on the database's engine for searching. Lastly, candidates that have worked with vendors specializing in this area often say that the best way the handle this situation is to use a third party software package optimized for XML data.
What is the difference between XML and C or C++ or Java ?
A: C and C++ (and other languages like FORTRAN, or Pascal, or Visual Basic, or Java or hundreds more) are programming languages with which you specify calculations, actions, and decisions to be carried out in order:
mod curconfig[if left(date,6) = "01-Apr",
t.put "April googlel!",
f.put days('31102005','DDMMYYYY') -
" more shopping days to Samhain"];
XML is a markup specification language with which you can design ways of describing information (text or data), usually for storage, transmission, or processing by a program. It says nothing about what you should do with the data (although your choice of element names may hint at what they are for):
<part num="DA42" models="LS AR DF HG KJ"
<name>Camshaft end bearing retention circlip</name>
<image drawing="RR98-dh37" type="SVG" x="476"
y="226"/> <maker id="RQ778">Ringtown Fasteners Ltd</maker>
<notes>Angle-nosed insertion tool <tool
id="GH25"/> is required for the removal
and replacement of this part.</notes>
On its own, an SGML or XML file (including HTML) doesn't do anything. It's a data format which just sits there until you run a program which does something with it.
Does XML replace HTML?
No. XML itself does not replace HTML. Instead, it provides an alternative which allows you to define your own set of markup elements. HTML is expected to remain in common use for some time to come, and the current version of HTML is in XML syntax. XML is designed to make the writing of DTDs much simpler than with full SGML. (See the question on DTDs for what one is and why you might want one.)