XML Databases

ziggy on 2002-05-15T02:44:20

I'm a skeptic when it comes to XML databases. There are two use cases that seem to fit the concept, and both are outliers. The first is huge documents (a few dozen gigabytes per document, such as FDA filings) that either don't fit the relational model or are just too important to trust to a word processor. The second is a large mass of semi-structured documents (like a bibliographic database), where relational databases work well at indexing the data and poorly at storing it: records are chopped into pieces to be easily indexable, and are then reconstructed every time a record is retrieved (and HTMLified). This second case can be considered a less degenerate example of "full text searching" on random text.

Perhaps it makes sense to throw out SQL in favor of an XPath query interface. The main problem to solve is how to identify the relevant sub-portions of a document, and the relevant bits and pieces of information to send to the indexer. Easy to conceptualize, difficult to make real.
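As a rough sketch of what querying by XPath instead of SQL might look like, here is a small example using Python's standard xml.etree.ElementTree, which supports only a limited XPath subset. The bibliographic record layout, the element names, and the helper titles_by_author are all made up for illustration; a real XML database would expose its own interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical bibliographic records; element names are assumptions.
RECORDS = """
<bibliography>
  <record id="r1">
    <author>Knuth</author>
    <title>The Art of Computer Programming</title>
    <year>1968</year>
  </record>
  <record id="r2">
    <author>Wall</author>
    <title>Programming Perl</title>
    <year>1991</year>
  </record>
</bibliography>
"""

def titles_by_author(root, author):
    """Return the titles of all records whose <author> text equals `author`."""
    # ElementTree's XPath subset supports [tag='text'] predicates.
    # The author value is interpolated naively; fine for a sketch only.
    path = f".//record[author='{author}']/title"
    return [title.text for title in root.findall(path)]

root = ET.fromstring(RECORDS)
print(titles_by_author(root, "Wall"))
```

The record stays whole in storage, and the path expression picks out the sub-portion of interest, so nothing has to be reassembled from rows on the way out.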

I'm starting to think that my skepticism is a little overstated. Sleepycat is announcing Berkeley DB XML, an XML database engine. If the wise folks at Sleepycat see value here, to the point where they're developing a product to sell, then there's got to be something more to XML databases than I've seen to date.

Semi-structured DBs

darobin on 2002-05-15T12:19:56

I've mostly been skeptical about the "heavy" side of the XML DB hype, notably some parts of XQuery and the idea that XML DBs will or should replace RDBs.

However, I've always thought that there would be huge value in various sorts of semi-structured DBs, and XML DBs could be quite useful there. I find that a lot of the work I do, both on garden-variety websites and on content management (for more complex systems), requires me to force content into a relational model, rather than the other way round.

Indexing is bound to be hard, no surprise there. But the tradeoffs are always the same, and choosing an indexing strategy for semi-structured XML is bound to be fun. At least, probably much more fun than normalising data into a relational system. I think that most of it will happen at the simplest level: index the results of XPath query foo.
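To make "index the results of XPath query foo" a bit more concrete, here is a minimal sketch in Python, again using the limited XPath subset in the standard xml.etree.ElementTree. The documents, the element names, and the build_index helper are hypothetical, and the index is just an in-memory mapping rather than anything a real engine would use.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical semi-structured documents, keyed by a made-up document id.
DOCS = {
    "doc1": "<article><title>XML Databases</title>"
            "<keyword>xml</keyword><keyword>database</keyword></article>",
    "doc2": "<article><title>Indexing</title>"
            "<keyword>xml</keyword><keyword>index</keyword></article>",
}

def build_index(docs, path):
    """Map each text value matched by `path` to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for node in root.findall(path):
            if node.text:
                index[node.text.strip().lower()].add(doc_id)
    return index

# Index whatever the path "./keyword" selects in each document.
keyword_index = build_index(DOCS, "./keyword")
print(sorted(keyword_index["xml"]))
```

The tradeoff shows up right away: each additional path worth indexing means another structure to build and keep up to date, so choosing which queries deserve an index is where the strategy comes in.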