NYCPHP Meetup

NYPHP.org

[nycphp-talk] Translating XML to SQL

Dan Cech dcech at phpwerx.net
Thu Feb 12 11:08:14 EST 2004


Hi Eric,

It sounds as though you've gone through a few beers on this one...though 
that may explain your eloquence ;)

The idea sounds great to me, reducing your problem down to the 
essentials, inserting a bunch of records into a table while maintaining 
foreign key integrity.

My observation would be as follows:

As far as I can tell you have one table which has foreign keys into a 
bunch of other tables, which then may have foreign keys into further tables.

I think the toughest part is going to be the logic which converts an 
<author>somebody</author> tag into a row in the author table and a 
column in the article table...though thinking about it there are a few 
ways it could be done.

I would also be tempted to do the foreign key integrity checks and 
insertion as you parse the document, rather than trying to figure it out 
at the end.  The reasoning for this is that each record is atomic, 
containing all the necessary information for inserting it into the 
database, so it makes sense to treat it as such.  Also, if you update 
all the tables then when parsing the second (or third, etc) record you 
do not have to worry about whether the foreign key was inserted by the 
first, because if it had been it would already be in the database.

I'm looking forward to seeing some code so I can get a better idea of 
exactly how the system works, but it sounds very interesting.  I am 
especially interested because I am involved in a project which uses XML 
to produce DDL queries (adodb-xmlschema) and it sounds as though they 
would complement each other nicely.

Dan

Eric Rank wrote:
> After I went home last night my wheels were still spinning about this idea,
> and I started re-thinking the design from the ground up, using what I've
> learned from this project so far. Currently the use of rulesets makes the
> design a little bit too complex. The ruleset itself consists of a list of
> tables in the database, within it are children tags containing <if> and
> <else> tags. Within those condition tags are statements that would need to
> occur to get the value for a specific column. This gets extremely messy and
> confusing. Not only do you have to figure out what column relies on the
> inserted id from another, but you also have to know what kind of SQL
> statements to make. Confused? yeah, me too.
> 
> After thinking about it, all this ruleset is, is a combination of
> XML-to-table.column mappings and primary key-foreign key pairs. Combining
> them gets sloppy, but seperating them seems the way to go. It's much easier
> to write 2 seperate files. 1. an xml file with Primary key - Foreign key
> pairs and 2. an xml-to-table.column map xml file. I can also say that it'd
> be much easier to write a parser for those files. Right now, parsing through
> a ruleset xml file with pseudo logic embedded is ugly. Parsing this
> psuedo-logic xml is half the battle. Once it has been incorporated into the
> rule object, that rule object has to follow that logic. Yuck.
> 
> So, since I was off the clock, I cracked open a beer and started putting
> together a new design for this project (beer, I find, is the only way to
> really justify thinking about work stuff when I'm at home). A rough UML
> diagram is here:
> 
> http://lo-fi.net/~eric/default.gif
> 
> All the objects are there, all the methods and properties are not.
> 
> The idea is this, the xml translator has a database property which holds
> regular database stuff, but also an xml-to-table.column map and a primary
> key to foreign key map. The tables hold record objects. the record objects
> are created when the data file gets parsed by the data xml parser. When the
> parser gets to an element that holds data, it looks at the xml-db map on the
> database object, and knows what table to create a record for. After the data
> has all been parsed into record objects for the tables, it's time to insert
> the records. At that point, the translator goes through all the tables,
> making inserts for records that have not yet been saved, ignoring those that
> already exist. If an incomplete record is found during this process, where
> an id is needed from a different record insertion, the translator->database
> object knows where look by virtue of it's Primary Key - Foreign Key map. If
> the id that we need is there, great, grab it and put it in the record. If
> not, try to insert the record (or SELECT the record ID if it already exists,
> eg. an authors name) for the table that we need the id from. Then you can go
> back and save the first record. This process would need to be a recursive
> one.
> 
> What kind of problems do you think I might run into using a design like
> this? Is anything missing?
> 
> I'll consider the article idea. At this point, I've done enough head
> scratching that I think it'd be nice to save anyone else the grief. I'll
> work on collecting and writing up my experiences.
> 
> Eric Rank





More information about the talk mailing list