XML vs. Table DBs [was: Re: [nycphp-talk] Many pages: one script]

Kenneth Downs ken at secdat.com
Wed Aug 29 07:51:23 EDT 2007


Elliotte Harold wrote:
> Kenneth Downs wrote:
>
>> Select title
>>           ,SUBSTRING(text ...insert regexp here...)
>>   from chapters
>>  where book_name = 'XML in a Nutshell'
>>
>
> Regexps can't do that though. Regular expressions are an insufficiently 
> powerful tool for processing XML. Trying to do that is just a world of 
> pain.

????

The example shows a query of a table, not XML.  The purpose is to 
demonstrate with a quick snippet that all examples of a supposed 
indispensable need for the "XML Database" stem from an ignorance of the 
abilities of other tools.
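
For illustration only, here is a minimal sketch of that kind of query in
Python with the standard sqlite3 module.  The column layout of "chapters"
and the sample rows are invented, and SQLite's substr/instr stand in for
the regexp-based SUBSTRING that the snippet above leaves elided.

```python
import sqlite3

# Hypothetical schema and placeholder rows; nothing here comes from a real book.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chapters (book_name TEXT, title TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO chapters VALUES (?, ?, ?)",
    [
        ("XML in a Nutshell", "Chapter A", "First sentence of A. More text follows."),
        ("XML in a Nutshell", "Chapter B", "First sentence of B. More text follows."),
        ("Some Other Book", "Chapter C", "First sentence of C. More text follows."),
    ],
)


def first_sentences(conn, book_name):
    """Return (title, first sentence) for every chapter of one book."""
    rows = conn.execute(
        # substr/instr cut the text at the first period; a regexp-capable
        # server could use a pattern here instead.
        "SELECT title, substr(text, 1, instr(text, '.')) "
        "FROM chapters WHERE book_name = ? ORDER BY rowid",
        (book_name,),
    )
    return rows.fetchall()


for title, sentence in first_sentences(conn, "XML in a Nutshell"):
    print(title, "|", sentence)
```

The point of the sketch is only that the text lives in an ordinary column
and is sliced by the query itself.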

Say that you prefer XML, say that you like it, say that you are used to 
using it, but don't say that it is a fundamental requirement of the data 
itself because it just ain't so.

>
>> Rusty, you appear to be arguing from ignorance, very unusual coming 
>> from you.
>
> Funny how you confuse different experiences with ignorance. Have you 
> ever worked in publishing? Or in library science? Or on anything that 
> operates at web scale like Yahoo or Google? There are many use cases 
> where a couple of months of hard labor will rapidly disabuse anyone of 
> the belief that relational databases are the one true solution to all 
> problems. Your career just happens not to have taken you down those 
> paths yet.

My observation on your arguments stems from your repeatedly ignoring 
obvious examples of where tables do just fine to store data, and the 
claim that 80% of the world's apps need an XML database.

If you have gotten used to using XML for text, then say so.  If you like 
it, then say so.  Don't say it is the only tool available because it is 
not.  It has many very serious drawbacks, verbosity being the very 
first, not to mention the confounding of structure and implementation, 
encouraging the illusion of "structureless" data, and so on.

>
>> The true difference between us in this argument is that I understand 
>> that I have a prejudice for relational over hierarchical, based on my 
>> knowledge and use of both, and based on judgment calls as to how to 
>> get through the day.  I daresay however that you are promoting a 
>> religious favoring of XML w/o a working knowledge of the alternatives.
>
> Ken, you know me. Do you really think I don't know the relational 
> model or what it's good for? I use relational databases all the time, 
> and I'm using them now. However unlike you I've hit their limits. 
> While I'm sure many people can profitably spend their life doing 
> nothing but relational  databases, I happen to be working on 
> applications where neither the relational model nor the actual SQL 
> databases out there can come close to managing my data. I've never 
> said that all applications should use XML databases or other 
> non-relational systems. You keep trying to put those words into my 
> mouth. I do say that some applications, especially in publishing and 
> web publishing, do not fit the relational model well and can be
> better served by XML databases.

I do know you, and that is why I was struck by your pro-XML stance for 
"80% of applications", in which you must either be ignorant of what most 
applications really need, or what modern RDBMS's can do, or both. 

Forget about EF Codd and the relational model for a moment; let's just 
look at the real products that have come along, the table-based servers 
we call RDBMS's.  These have all solved the very basic issues of data 
storage.  Most of their power comes from so-called "ACID" compliance, 
the ability to allow multiple simultaneous users to access a data store 
with assurances of predictable behavior.  Your XML databases must solve 
these same issues.
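
A minimal sketch of the atomicity part of that guarantee, in Python with
sqlite3.  The "accounts" table and the transfer() helper are hypothetical
names made up for this example; it shows one failed statement undoing the
whole unit of work, not the full set of ACID properties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint when src cannot cover the amount.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 500) fails on the second update,
and the credit already applied to "bob" is rolled back with it.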

What about security?  The modern RDBMS defines security on all objects. 
Your XML databases will have to provide the ability to define security 
on the complete tree. (By the way, I'm sure they'll get there, just keep 
reading).

But there is one aspect of the relational model where XML, as a format, 
takes a huge leap backward.  Codd realized the incredible productivity 
gains that could be had if a programmer could access data by name and 
not worry about its internal storage structure.  He separated the 
implementation from the interface.  XML, as a format (file, data, 
whatever), confounds these two.  It is a verbose format for hierarchical 
data.  There are better formats for nearly all uses.

Here's the clincher.  Let's say the XML database grows up and has all of
these things.  On that day the only thing it will have in common with
XML is a hierarchical model; the XML format itself will be the first
thing to go.  The ability to accept XQuery statements will be a
historical footnote, and people will end up hating XQuery as much as
they hate SQL (everybody's least favorite part of the RDBMS world).
These databases will end up supporting output formats such as YAML,
JSON, and others, and probably the same formats for input as well.
There is just not a lot in the XML format that is essential to data
storage.

We can thank XML for making us conscious of the ubiquitous need for 
hierarchical data.  I use it all of the time.  Personally, I store my 
database definitions in YAML, a hierarchical data format that is human 
readable/writable (unlike XML) as well as machine readable/writable.
My programs return hierarchical data from AJAX requests as JSON, because 
that's what the browser works best with, and all of my PHP programs 
handle all data universally as associative arrays, which are just 
hierarchical data in yet another disguise.  I love hierarchies, but have 
no use for a format that is not human readable/writable, which is 
incredibly verbose, and which
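
To make the "same hierarchy, different disguise" idea concrete, here is a
small sketch in Python, where nested dicts play the role of PHP
associative arrays.  The table definition is a made-up example, and JSON
is used for the serialized form because it ships with the standard
library; a YAML library would serve the same purpose.

```python
import json

# Hypothetical table definition held as nested dicts.
definition = {
    "table": "chapters",
    "columns": {
        "book_name": {"type": "varchar", "size": 100},
        "title": {"type": "varchar", "size": 100},
        "text": {"type": "text"},
    },
}


def to_json(data):
    """Serialize the hierarchy, e.g. as the body of an AJAX response."""
    return json.dumps(data, indent=2, sort_keys=True)


def column_names(data):
    """Walk one level of the hierarchy and list the column names."""
    return sorted(data["columns"])


wire = to_json(definition)   # text form that goes over the wire
restored = json.loads(wire)  # back to nested dicts on the other side
print(column_names(restored))
```

The structure survives the round trip unchanged; only the notation
differs between the in-memory form and the serialized form.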

So when I say you are arguing from ignorance, I am saying that you are 
generalizing your own experience with heavy-duty text management, and 
since you have never mentioned any of the topics above, you may not have 
the entire picture.

Now, to your point about my own limited experience, I picked a path some 
years ago that has made me an expert in some areas and ignorant in 
others.  But I don't go claiming that "80% of the world's applications 
cannot use RDBMS".   In fact, the examples you raise are all examples of 
text management.  This is a new area that the RDBMS was never intended 
to solve.  Many people have found it easily possible to extend the RDBMS 
in a few areas, but others (such as you) are saying we need to start 
over.  But it is amusing that the look-again crowd has started over with 
hierarchical data.  In the end it won't be the format that is used, but 
the basic abilities to manage and store text.  I submit that the clear 
solution has yet to emerge from that pursuit. 
>
>> You simply cannot defend a file format as a foundation for frameworks 
>> and databases.  The best you can do is defend the model, such as the 
>> hierarchical model.
>
> XML is not a file format. We've been down this road before. A native 
> XML database is no more based on a file format than MySQL is based on 
> tab delimited text.

But you are not saying what it is based upon.  My statements above about 
ACID compliance, security, and separation of implementation from 
interface provide a basis for a database.  The structure of the data is 
given by tables.  This makes a complete system.

If you cannot provide the basis for the entire picture of data 
management, we are left with what the XML books tell me: how to format 
the file.

>
>> Going further, you cannot defend a file format as a foundation for 
>> anything based on how it handles large text (or binary) fields.  
>> There are three issues here:
>>
>> -> Data model, hierarchical vs. relational.
>> -> File format, XML vs YAML or JSON or any other format you like
>> -> Handling of large text (and binary) columns.
>>
>> Finally, if we can all admit that XML is just a file format, then the 
>> entire framework crumbles as soon as somebody comes up with a better 
>> one, because let's admit it, XML is just about the worst you're going 
>> to find.
>
> Troll. Troll. Troll.

???? Geez Rusty, come on.  My conclusion is worded harshly, yes, but do 
you really label as a troll a description of the larger issues of 
formats, data models, and everything else that makes up the larger picture?

>
>
>> In conclusion, the examples you provide appear to give advantage to 
>> XML because tools exist to handle data that has been buried in opaque 
>> formats and poorly defined structures.  If the data had been 
>> structured properly in the first place and put into formats that were 
>> not so opaque, using (pardon me for saying) a *real* database, 
>> designed on solid principles, the examples you give become child's play.
>
> LOL. Seriously, try storing a book or an encyclopedia in a relational 
> database with anything approximating 1NF, not even 2NF. Then try and 
> make it perform adequately.
>
> Not all data fits neatly into tables.
>

Actually most data does not, not at first glance.   But since a table is 
simply a mapping of properties to entities, it turns out that most data 
does when you look at it closely.  It takes about the same effort as 
deciding upon a set of tags, since it is of course exactly the same process.

The crucial question is, does your book have structure?  Can you make up 
tags as you go or are you limited to a pre-defined set, such as DocBook?
Once you commit to a specific set of tags, you have committed to a 
structure, and you may as well use tables as anything else.  Methinks 
however that at this point it comes down to what you are comfortable 
with.  If you want to use XML, go for it, if you want to use tables, go 
for it, just don't confuse the structure of the data with a fundamental 
need for either system.
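
As a rough sketch of that last point, here is a fixed set of tags (book,
chapter, section) stored in a single table and read back in document
order, using Python's sqlite3 module.  The "nodes" table, its columns and
the sample titles are all invented for illustration; this is one possible
layout, not a recommendation for any particular schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE nodes (
           id        INTEGER PRIMARY KEY,
           parent_id INTEGER REFERENCES nodes(id),
           tag       TEXT NOT NULL,     -- book, chapter, section
           position  INTEGER NOT NULL,  -- order among siblings
           title     TEXT NOT NULL
       )"""
)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "book", 1, "Sample Book"),
        (2, 1, "chapter", 1, "Chapter One"),
        (3, 1, "chapter", 2, "Chapter Two"),
        (4, 2, "section", 1, "Section 1.1"),
        (5, 2, "section", 2, "Section 1.2"),
    ],
)


def outline(conn, root_id):
    """Return (depth, tag, title) rows for a subtree, in document order."""
    sql = """
        WITH RECURSIVE tree(id, depth, tag, title, path) AS (
            SELECT id, 0, tag, title, printf('%04d', position)
              FROM nodes WHERE id = ?
            UNION ALL
            SELECT n.id, t.depth + 1, n.tag, n.title,
                   t.path || '/' || printf('%04d', n.position)
              FROM nodes n JOIN tree t ON n.parent_id = t.id
        )
        SELECT depth, tag, title FROM tree ORDER BY path
    """
    return conn.execute(sql, (root_id,)).fetchall()


for depth, tag, title in outline(conn, 1):
    print("  " * depth + tag + ": " + title)
```

The zero-padded path column exists only to sort siblings in order; the
hierarchy itself is carried by parent_id.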

-- 
Kenneth Downs
Secure Data Software, Inc.
www.secdat.com    www.andromeda-project.org
631-689-7200   Fax: 631-689-0527
cell: 631-379-0010



