[nycphp-talk] Practical Extraction in PHP

csnyder chsnyder at gmail.com
Mon Jul 16 09:37:35 EDT 2007


I've had this idea in the back of my head to do a project that
systematically tracks values embedded in web or email reports. For
instance, I get logwatch emails from the servers I admin, and each
time one of those comes in I'd like to extract the disk free space and
put it into a round-robin database. Or I want to track the rendering
time for various key pages in a CMS.

The problem is that the value isn't always in the same place. It might
be a few lines down because of alerts or content that precede it. Or
it might look different some days (ending in GB rather than MB). It's
possible that some combination of regex and occurrence number (the 5th
instance of /[0-9]+(MB|GB)/, say) could work, but it seems messy.
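
To make the regex-plus-occurrence idea concrete, here is a rough sketch in
PHP. The function name nth_size_in_mb() and the sample report text are made
up for illustration, and the pattern assumes sizes are written as a number
followed by MB or GB. It picks the Nth size-like value and normalizes it to
MB, so a GB-vs-MB change doesn't skew the stored numbers.

```php
<?php
// Hypothetical helper: return the Nth size-like value found in a report,
// converted to MB. Returns null if there are fewer than $n matches.
function nth_size_in_mb(string $text, int $n): ?float
{
    // At least one digit, an optional decimal part, then the unit.
    $pattern = '/([0-9]+(?:\.[0-9]+)?)\s*(MB|GB)/';
    if (!preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
        return null; // no matches at all (or a regex error)
    }
    if ($n < 1 || $n > count($matches)) {
        return null; // the report has fewer than $n size values
    }
    // Each match is [full text, number, unit]; skip the full text.
    [, $number, $unit] = $matches[$n - 1];
    $value = (float) $number;

    // Treat 1 GB as 1024 MB (an assumption about how the report counts).
    return $unit === 'GB' ? $value * 1024 : $value;
}

// Made-up sample text, not real logwatch output.
$report = "Alert: /var at 91%\n"
        . "/dev/sda1 20GB 1.5GB free\n"
        . "/dev/sdb1 500MB 120MB free\n";

// Second size-like value in the report (1.5GB), expressed in MB.
var_dump(nth_size_in_mb($report, 2));
```

This also shows why it feels messy: if an alert near the top happens to
mention a size, every occurrence number after it shifts by one.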

We all probably do a little of this on an ad hoc basis, scraping
values out of websites and whatnot. Does anyone do it a lot? What kind
of tools do you use? Is Perl better suited to the task? Or sed+awk?
Does anyone know of a system (preferably PHP) that does this in the
general case?

TIA, y'all.

-- 
Chris Snyder
http://chxo.com/


