NYCPHP Meetup

NYPHP.org

[nycphp-talk] Translating XML to SQL

Eric Rank erank at isthmus.com
Thu Feb 12 16:25:48 EST 2004


Hi Dan,

Doing the foreign key checking while parsing is a great idea. I've walked
through the redisigned process on paper and it seemed to work, until I
started hitting areas of multiple children nodes in the xml data structure.
For example, each article has <story>, <headline>, <issue>, and <images>
tags. The <images> tag can have may <image> children tags in it.

I have this designed so the tables have a 'records' property, which is an
array of 'record' objects. when looping through the records making the
inserts I can insure that the image lines up with the 'photographer' record
because they'd be at the same index in the 'records' across tables. However,
since there's only one record for the issue table, I can't know where to
look for the inserted id of the 'issue' table because the indexes don't
match.

However, when parsing the data, if I check the foreign keys needed for a
record, I can make a record at the index of the table where the foreign key
is so I can find it later....

Thinking about it, there's still a few gaps to fill in, but you're on the
right track in suggesting checking the foreign keys while parsing the data.
I'd like to avoid making the inserts during the parsing. I just haven't
wrapped my head around that yet. Sometimes doing everything makes for
troublesome code to debug.




> I would also be tempted to do the foreign key integrity checks and
> insertion as you parse the document, rather than trying to figure it out
> at the end.  The reasoning for this is that each record is atomic,
> containing all the necessary information for inserting it into the
> database, so it makes sense to treat it as such.  Also, if you update
> all the tables then when parsing the second (or third, etc) record you
> do not have to worry about whether the foreign key was inserted by the
> first, because if it had been it would already be in the database.
>
> I'm looking forward to seeing some code so I can get a better idea of
> exactly how the system works, but it sounds very interesting.  I am
> especially interested because I am involved in a project which uses XML
> to produce DDL queries (adodb-xmlschema) and it sounds as though they
> would complement each other nicely.
>
> Dan
>
> Eric Rank wrote:
> > After I went home last night my wheels were still spinning about this
idea,
> > and I started re-thinking the design from the ground up, using what I've
> > learned from this project so far. Currently the use of rulesets makes
the
> > design a little bit too complex. The ruleset itself consists of a list
of
> > tables in the database, within it are children tags containing <if> and
> > <else> tags. Within those condition tags are statements that would need
to
> > occur to get the value for a specific column. This gets extremely messy
and
> > confusing. Not only do you have to figure out what column relies on the
> > inserted id from another, but you also have to know what kind of SQL
> > statements to make. Confused? yeah, me too.
> >
> > After thinking about it, all this ruleset is, is a combination of
> > XML-to-table.column mappings and primary key-foreign key pairs.
Combining
> > them gets sloppy, but seperating them seems the way to go. It's much
easier
> > to write 2 seperate files. 1. an xml file with Primary key - Foreign key
> > pairs and 2. an xml-to-table.column map xml file. I can also say that
it'd
> > be much easier to write a parser for those files. Right now, parsing
through
> > a ruleset xml file with pseudo logic embedded is ugly. Parsing this
> > psuedo-logic xml is half the battle. Once it has been incorporated into
the
> > rule object, that rule object has to follow that logic. Yuck.
> >
> > So, since I was off the clock, I cracked open a beer and started putting
> > together a new design for this project (beer, I find, is the only way to
> > really justify thinking about work stuff when I'm at home). A rough UML
> > diagram is here:
> >
> > http://lo-fi.net/~eric/default.gif
> >
> > All the objects are there, all the methods and properties are not.
> >
> > The idea is this, the xml translator has a database property which holds
> > regular database stuff, but also an xml-to-table.column map and a
primary
> > key to foreign key map. The tables hold record objects. the record
objects
> > are created when the data file gets parsed by the data xml parser. When
the
> > parser gets to an element that holds data, it looks at the xml-db map on
the
> > database object, and knows what table to create a record for. After the
data
> > has all been parsed into record objects for the tables, it's time to
insert
> > the records. At that point, the translator goes through all the tables,
> > making inserts for records that have not yet been saved, ignoring those
that
> > already exist. If an incomplete record is found during this process,
where
> > an id is needed from a different record insertion, the
translator->database
> > object knows where look by virtue of it's Primary Key - Foreign Key map.
If
> > the id that we need is there, great, grab it and put it in the record.
If
> > not, try to insert the record (or SELECT the record ID if it already
exists,
> > eg. an authors name) for the table that we need the id from. Then you
can go
> > back and save the first record. This process would need to be a
recursive
> > one.
> >
> > What kind of problems do you think I might run into using a design like
> > this? Is anything missing?
> >
> > I'll consider the article idea. At this point, I've done enough head
> > scratching that I think it'd be nice to save anyone else the grief. I'll
> > work on collecting and writing up my experiences.
> >
> > Eric Rank
>
>
> _______________________________________________
> talk mailing list
> talk at lists.nyphp.org
> http://lists.nyphp.org/mailman/listinfo/talk




More information about the talk mailing list