[nycphp-talk] Best way to accomplish this task

Leam Hall leam at reuel.net
Mon Feb 15 06:23:03 EST 2010


Well, from an admin viewpoint, I'd recommend a lot more thinking before 
doing this. The "script every minute" idea causes all sorts of issues on 
the server as well as the database. One hokey query, or one large 
dataset, and you can cause a lot of problems for the entire machine.

What I don't understand is why you need to have a cron job to deal with 
the user data. Why not have your processing script called when the user 
submits? This keeps your script from having to go through the entire 
database to find uncommitted changes. If you're going to have a 
gazillion-row database, aren't you going to be spending a lot of time on 
queries just to find the rows that aren't committed?
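
To make the "process on submit" idea concrete, here is a rough sketch. It is 
in Python with an in-memory SQLite database purely so it runs anywhere; the 
table name, column names, and both functions are made up for illustration, 
and it assumes the per-record work is quick enough to do inside the request.

```python
import sqlite3


def process_record(payload):
    """Stand-in for the real per-record work (hypothetical)."""
    return payload.strip().lower()


def handle_submission(conn, payload):
    """Process the data in the same call that stores it.

    The row is written already marked 'complete', so nothing ever has to
    scan the table later looking for pending rows.
    """
    result = process_record(payload)
    cur = conn.execute(
        "INSERT INTO records (payload, result, status) VALUES (?, ?, 'complete')",
        (payload, result),
    )
    conn.commit()
    return cur.lastrowid


def make_db():
    """Create a throwaway in-memory database with the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        "id INTEGER PRIMARY KEY, payload TEXT, result TEXT, status TEXT)"
    )
    return conn


if __name__ == "__main__":
    db = make_db()
    new_id = handle_submission(db, "  Hello World ")
    print(db.execute("SELECT * FROM records WHERE id = ?", (new_id,)).fetchone())
```

If the work really takes minutes per record, doing it inline would hold the 
user's request open, so this only fits the fast cases.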

One other possibility would be to have a second, separate database that 
stores the user input. Have a script that grabs a small number of rows, 
does its thing, deletes the rows it just worked on, sleeps for a couple 
seconds, and then calls itself. That way your primary database isn't 
getting hit so hard, your secondary database is the one that has to do 
disaster recovery, you can split the machines up if load gets too much, 
and your SA team won't stuff you in a trashcan when your query trashes 
the system.
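
Roughly what I mean by the small-batch worker, as a sketch only. It is in 
Python with an in-memory SQLite database standing in for the second 
database; the queue table, its columns, the batch size, and the function 
names are all assumptions for illustration. It also loops rather than 
literally calling itself, which comes to the same thing.

```python
import sqlite3
import time

BATCH_SIZE = 3     # assumed: how many rows to grab per pass
PAUSE_SECONDS = 0  # would be a couple of seconds on a real server


def process(payload):
    """Stand-in for the real per-row work (hypothetical)."""
    return payload.upper()


def drain_queue(conn, batch_size=BATCH_SIZE, pause=PAUSE_SECONDS):
    """Work through the queue table in small batches until it is empty."""
    results = []
    while True:
        # Grab a small number of rows, oldest first.
        rows = conn.execute(
            "SELECT id, payload FROM queue ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            break
        for _row_id, payload in rows:
            results.append(process(payload))
        # Delete only the rows just handled, then commit the batch.
        conn.executemany(
            "DELETE FROM queue WHERE id = ?",
            [(row_id,) for row_id, _payload in rows],
        )
        conn.commit()
        # Give the machine a breather before the next batch.
        time.sleep(pause)
    return results


def demo():
    """Fill a throwaway queue with seven jobs and drain it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO queue (payload) VALUES (?)",
        [(f"job{i}",) for i in range(7)],
    )
    conn.commit()
    done = drain_queue(conn)
    remaining = conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
    return done, remaining


if __name__ == "__main__":
    print(demo())
```

Because each pass touches only a handful of rows, a bad row or a slow query 
hurts one small batch instead of the whole table. Note that this sketch 
assumes a single worker; two copies running at once would need some way to 
claim rows first.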

Leam

Anthony Papillion wrote:
> Hello Everyone,
> 
> I'm designing a system that will work on a schedule. Users will submit data 
> for processing into the database and then, every minute, a PHP script will 
> pass through the db looking for unprocessed rows (marked pending) and 
> process them.
> 
> The problem is, I may eventually have a few million records to process at a 
> time. Each record could take anywhere from a few seconds to a few minutes to 
> perform the required operations on. My concern is making sure that the 
> script, on the next scheduled pass, doesn't grab the records currently being 
> processed and start processing them again.
> 
> Right now, I'm thinking of accomplishing this by updating a 'status' field 
> in the database. So unprocessed records would have a status of 'pending', 
> records being processed would have a status of 'processing' and completely 
> processed records would have a status of 'complete'.
> 
> For some reason, I see this as ugly but that's the only way I can think of 
> making sure that records aren't processed twice. So when I select 
> records to process, I'm ONLY selecting ones with the status of 'pending' 
> which means they are new, unprocessed.
> 
> Is there a better, more elegant way of doing this or is this pretty much it?
> 
> Thanks!
> Anthony Papillion 
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> New York PHP Users Group Community Talk Mailing List
> http://lists.nyphp.org/mailman/listinfo/talk
> 
> http://www.nyphp.org/Show-Participation


