XML vs. rel DBs [was: Re: [nycphp-talk] Many pages: one script]

Elliotte Harold elharo at
Sat Aug 11 22:57:17 EDT 2007

Kenneth Downs wrote:

> Select title
>           ,SUBSTRING(text ...insert regexp here...)
>   from chapters
>  where book_name = 'XML in a Nutshell'

Regexps can't do that though. Regular expressions are an insufficiently 
powerful tool for processing XML. Trying to do that is just a world of 
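
A minimal sketch of the point about regular expressions, assuming a made-up
chapter document (the element names, the sample text, and both helper
functions are hypothetical, not taken from any real schema). It pulls the
title and first paragraph with a parser, then shows one plausible regexp
stopping early on the same input:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical chapter markup, invented for illustration.
CHAPTER = """<chapter>
  <title>XML &amp; <emphasis>Friends</emphasis></title>
  <para role="intro">Use <![CDATA[</para>]]> to close a paragraph.</para>
  <para>Second paragraph.</para>
</chapter>"""

def title_and_first_para(xml_text):
    """Parse the document and return (title text, first paragraph text)."""
    root = ET.fromstring(xml_text)
    # itertext() flattens inline markup such as <emphasis> into plain text.
    title = "".join(root.find("title").itertext())
    first_para = "".join(root.find(".//para").itertext())
    return title, first_para

def naive_first_para(xml_text):
    """A plausible-looking regexp that knows nothing about CDATA sections."""
    match = re.search(r"<para[^>]*>(.*?)</para>", xml_text, re.S)
    return match.group(1) if match else None

parsed = title_and_first_para(CHAPTER)
naive = naive_first_para(CHAPTER)
print(parsed)  # ('XML & Friends', 'Use </para> to close a paragraph.')
print(naive)   # 'Use <![CDATA['
```

The parser resolves the entity reference, the inline element, and the CDATA
section. The regexp treats the literal `</para>` inside the CDATA section as
the end tag and returns a fragment of markup instead of the paragraph.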

> Rusty, you appear to be arguing from ignorance, very unusual coming from 
> you.

Funny how you confuse different experiences with ignorance. Have you 
ever worked in publishing? Or in library science? Or on anything that 
operates at web scale like Yahoo or Google? There are many use cases 
where a couple of months of hard labor will rapidly disabuse anyone of 
the belief that relational databases are the one true solution to all 
problems. Your career just happens not to have taken you down those 
paths yet.

> The true difference between us in this argument is that I understand 
> that I have a prejudice for relational over hierarchical, based on my 
> knowledge and use of both, and based on judgment calls as to how to get 
> through the day.  I daresay however that you are promoting a religious 
> favoring of XML w/o a working knowledge of the alternatives.

Ken, you know me. Do you really think I don't know the relational model 
or what it's good for? I use relational databases all the time, and I'm 
using them now. However, unlike you, I've hit their limits. While I'm 
sure many people can profitably spend their lives doing nothing but 
relational databases, I happen to be working on applications where 
neither the relational model nor the actual SQL databases out there can 
come close to managing my data. I've never said that all applications 
should use XML databases or other non-relational systems. You keep 
trying to put those words into my mouth. I do say that some 
applications, especially in publishing and web publishing, do not fit 
the relational model well and can be better served by XML databases.

> You simply cannot defend a file format as a foundation for frameworks 
> and databases.  The best you can do is defend the model, such as the 
> hierarchical model.

XML is not a file format. We've been down this road before. A native XML 
database is no more based on a file format than MySQL is based on 
tab-delimited text.

> Going further, you cannot defend a file format as a foundation for 
> anything based on how it handles large text (or binary) fields.  There 
> are three issues here:
> -> Data model, hierarchical vs. relational. 
> -> File format, XML vs YAML or JSON or any other format you like
> -> Handling of large text (and binary) columns.
> Finally, if we can all admit that XML is just a file format, then the 
> entire framework crumbles as soon as somebody comes up with a better 
> one, because let's admit it, XML is just about the worst you're going to 
> find.

Troll. Troll. Troll.

> In conclusion, the examples you provide appear to give advantage to XML 
> because tools exist to handle data that has been buried in opaque 
> formats and poorly defined structures.  If the data had been structured 
> properly in the first place and put into formats that were not so 
> opaque, using (pardon me for saying) a *real* database, designed on 
> solid principles, the examples you give become child's play.

LOL. Seriously, try storing a book or an encyclopedia in a relational 
database with anything approximating 1NF, not even 2NF. Then try and 
make it perform adequately.

Not all data fits neatly into tables.

>> I'm glad we have multiple tools to bring to bear on this kind of
>> problem, because I worry about the performance implications of
>> querying an XML database for the average price of those books, or
>> performing an operation that adds another field (tag?) to each book's
>> "record".

Average prices, or adding a field, can be done pretty fast. I don't know 
if it's as fast as Oracle or MySQL. I don't much care. Sales systems are 
exactly the sort of apps that relational databases fit well. But 
actually publishing the books? That's a very different story.
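
For concreteness, here is a rough sketch of both operations on a small XML
catalog. The catalog, its element names, and the added `format` field are
all invented for illustration, and this runs against an in-memory tree with
Python's standard library rather than a real XML database, so it says
nothing about performance either way:

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; titles and prices are made up.
CATALOG = """<catalog>
  <book><title>Book A</title><price>30.00</price></book>
  <book><title>Book B</title><price>50.00</price></book>
  <book><title>Book C</title><price>40.00</price></book>
</catalog>"""

root = ET.fromstring(CATALOG)

# Average price: collect every book/price value and take the mean.
prices = [float(price.text) for price in root.iterfind("book/price")]
average_price = sum(prices) / len(prices)

# Add a new child element (a new "field") to every book record.
for book in root.iterfind("book"):
    ET.SubElement(book, "format").text = "paperback"

formats = [book.findtext("format") for book in root.iterfind("book")]
print(average_price)  # 40.0
print(formats)        # ['paperback', 'paperback', 'paperback']
```

In an XQuery-capable database the average would more likely be a one-line
expression along the lines of `avg(//book/price)`.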

>> If it's not too much trouble, could you give us some other use cases
>> for an XML database? Because title and first paragraph, if that's
>> something a system "routinely does" could easily be stored as
>> relational data at the time of import.

Just surf around Safari sometime. Think about what it's doing. Then try 
to imagine doing that on top of a relational database.

Think about combining individual chapters, sections, and even smaller 
divisions to make new one-off books like Safari U does. Consider the 
generation of tables of contents and indexes for these books.
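
As a toy illustration of that kind of recombination (not a description of
how Safari or Safari U actually work; the library document, the ids, and the
function names `build_custom_book` and `table_of_contents` are all
hypothetical), chapters can be pulled out of different books, assembled into
a new one, and a table of contents generated from the result:

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source library; ids and titles are made up.
LIBRARY = """<library>
  <book id="b1">
    <chapter id="c1"><title>Parsing</title>
      <section><title>SAX</title><para>Event-based parsing.</para></section>
      <section><title>DOM</title><para>Tree-based parsing.</para></section>
    </chapter>
  </book>
  <book id="b2">
    <chapter id="c2"><title>Streams</title>
      <section><title>Buffers</title><para>Buffered input.</para></section>
    </chapter>
  </book>
</library>"""

def build_custom_book(library, title, chapter_ids):
    """Assemble a new book from copies of existing chapters, in the given order."""
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    by_id = {chapter.get("id"): chapter for chapter in library.iter("chapter")}
    for chapter_id in chapter_ids:
        # Deep-copy so the source books are left untouched.
        book.append(copy.deepcopy(by_id[chapter_id]))
    return book

def table_of_contents(book):
    """Number chapters and sections by their position in the assembled book."""
    lines = []
    for n, chapter in enumerate(book.findall("chapter"), start=1):
        lines.append(f"{n}. {chapter.findtext('title')}")
        for m, section in enumerate(chapter.findall("section"), start=1):
            lines.append(f"  {n}.{m} {section.findtext('title')}")
    return lines

library = ET.fromstring(LIBRARY)
custom = build_custom_book(library, "Custom Reader", ["c2", "c1"])
toc = table_of_contents(custom)
print("\n".join(toc))
```

The table of contents falls out of the document structure itself, so
reordering or swapping chapters renumbers it without any extra bookkeeping.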

Closer to home, think about a blogging system or a content management 
system. Now imagine what you could do if the page structure were 
actually queryable, and not just an opaque blob in MySQL somewhere.
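
A small sketch of what "queryable page structure" could look like, assuming
posts are stored as well-formed XML (the two posts, their markup, and the
`posts_matching` helper are invented for illustration; a real system would
run such queries inside the database rather than reparsing each post):

```python
import xml.etree.ElementTree as ET

# Hypothetical blog posts keyed by slug.
POSTS = {
    "xml-dbs": (
        "<post><h1>XML databases</h1>"
        "<p>See <a href='http://example.com/xquery'>XQuery</a>.</p>"
        "<pre class='code'>for $b in //book return $b</pre></post>"
    ),
    "weekend": "<post><h1>Weekend</h1><p>No code here.</p></post>",
}

def posts_matching(posts, path):
    """Return the slugs of posts containing an element that matches path."""
    return sorted(
        slug
        for slug, markup in posts.items()
        if ET.fromstring(markup).find(path) is not None
    )

# Which posts contain a code sample? Which contain links?
with_code = posts_matching(POSTS, ".//pre[@class='code']")
with_links = posts_matching(POSTS, ".//a[@href]")

# Pull every post's heading without touching the rest of the body.
headings = {slug: ET.fromstring(markup).findtext("h1") for slug, markup in POSTS.items()}

print(with_code)   # ['xml-dbs']
print(with_links)  # ['xml-dbs']
print(headings)    # {'xml-dbs': 'XML databases', 'weekend': 'Weekend'}
```

None of these questions can be asked of the post body when it is stored as
an opaque string, short of parsing it again on every request.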

Elliotte Rusty Harold  elharo at
Java I/O 2nd Edition Just Published!
