NYCPHP Meetup

NYPHP.org

[nycphp-talk] OT: Standard Data Import Tool?

Greg Rundlett (freephile) greg at freephile.com
Tue Dec 29 12:45:09 EST 2009


There are several issues all bundled up here.

First you have to define the file format(s) that your feeds come in so
that you can read them and isolate the data elements correctly.  Are
they CSV, TSV, XML, custom?
At the first (read) step you can do gross validation like validate the
source for correct character encoding, record separators, record
length etc.
You also want to define "layouts" for various feeds.  By defining the
layout, you can apply a set of processing instructions to transform
the received layout into the desired layout.
Once you have the data in the desired layout, you can then do finer
validation such as per field rules.  E.g. last name must
/(a-zA-Z'-){2,100}/
Then you can write an import routine that determines the correct
layout transformation to apply, reads the source, tranposes it,
validates it, stores it in the database, logs the event

hth,

Greg Rundlett

nbpt 978-225-8302
m. 978-764-4424
-skype/aim/irc/twitter freephile
http://profiles.aim.com/freephile




On Tue, Dec 29, 2009 at 12:28 PM, Matt Juszczak <matt at atopia.net> wrote:
> Hi all,
>
> I need to write a tool that somehow takes multiple data feeds (some xls
> files, some csv files, etc.), defines each format,
> verifies/validates/sanitizes the data, and imports it into standardized
> MySQL tables.
>
> As an example, one feed might look like this:
>
> First Name, Last Name, DOB
>
> And another like this:
>
> Name, DOB
>
> And another:
>
> DOB, Last Name, First Name, Middle Initial
>
> I need to somehow be able to say "For this source, this column needs to be
> split into two, and this other column is the date of birth column".  Once
> the data is parsed, it would need to be imported into standard MySQL tables:
>
> first_name, last_name, birth_date, etc.
>
> This is a simple example - the files and tables actually have 20-30 columns
> each, so using mysqlimport or a simple shell script probably isn't the best
> option.
>
> Are there open source tools out there that do this already?  If not, should
> I just build something?
>
> Thanks,
>
> -Matt
> _______________________________________________
> New York PHP Users Group Community Talk Mailing List
> http://lists.nyphp.org/mailman/listinfo/talk
>
> http://www.nyphp.org/Show-Participation
>



More information about the talk mailing list