
[nycphp-talk] Best way to accomplish this task

Anthony Papillion papillion at gmail.com
Mon Feb 15 15:08:10 EST 2010


Leam,

I spent a good part of last night thinking about this problem and I
came to the exact same conclusion you did: it's a needless drain on
resources. I think I'm going to take your advice and simply archive
the user input in the database but process it immediately when the
user submits it. That's a much quicker and leaner solution from a
resource standpoint.
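A rough sketch of that process-on-submit approach, using an in-memory SQLite table for illustration (the table name, `handleSubmit` function, and status values are made up for the example, not from the thread):

```php
<?php
// Archive the submission, process it in the same request, and record
// the outcome -- no cron pass needed to find unprocessed rows.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE submissions (
    id INTEGER PRIMARY KEY,
    payload TEXT,
    status TEXT DEFAULT 'pending'
)");

function handleSubmit(PDO $db, string $payload): void {
    // Archive the raw input first, so nothing is lost if processing fails.
    $stmt = $db->prepare("INSERT INTO submissions (payload) VALUES (?)");
    $stmt->execute([$payload]);
    $id = (int) $db->lastInsertId();

    // ... do the real processing here, while the row sits archived ...

    $db->exec("UPDATE submissions SET status = 'complete' WHERE id = $id");
}

handleSubmit($db, 'user input');
```

The same shape works against MySQL by swapping the PDO DSN; the point is that the row's lifetime from 'pending' to 'complete' is contained in one request.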

Thanks for the input.

Anthony

On Mon, Feb 15, 2010 at 5:23 AM, Leam Hall <leam at reuel.net> wrote:
>
> Well, from an admin viewpoint, I'd recommend a lot more thinking before doing this. The "script every minute" idea causes all sorts of issues on the server as well as the database. One hokey query, or one large dataset, and you can cause a lot of problems for the entire machine.
>
> What I don't understand is why you need to have a cron job to deal with the user data. Why not have your processing script called when the user submits? This keeps your script from having to go through the entire database to find uncommitted changes. If you're going to have a gazillion-row database, aren't you going to be spending a lot of time on queries just to find those not committed?
>
> One other possibility would be to have a second, separate database that stores the user input. Have a script that grabs a small number of rows, does its thing, deletes the rows it just worked on, sleeps for a couple seconds, and then calls itself. That way your primary database isn't getting hit so hard, your secondary database is the one that has to do disaster recovery, you can split the machines up if load gets too much, and your SA team won't stuff you in a trashcan when your query trashes the system.
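That staging-database worker could look something like this (a sketch only: the `inbox` table and batch size are invented, SQLite stands in for the second database, and the "calls itself" step is shown as a loop rather than a re-exec):

```php
<?php
// Hypothetical worker for the separate staging-database idea: claim a
// small batch, process it, delete the finished rows, sleep, repeat.
$staging = new PDO('sqlite::memory:');   // stand-in for the second database
$staging->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$staging->exec("CREATE TABLE inbox (id INTEGER PRIMARY KEY, payload TEXT)");
$staging->exec("INSERT INTO inbox (payload) VALUES ('a'), ('b'), ('c')");

const BATCH = 100;                       // keep batches small on purpose

while (true) {
    $rows = $staging->query("SELECT id, payload FROM inbox LIMIT " . BATCH)
                    ->fetchAll(PDO::FETCH_ASSOC);
    if (!$rows) {
        break;                           // or sleep and poll again
    }
    foreach ($rows as $row) {
        // ... move processed data into the primary database here ...
        $staging->exec("DELETE FROM inbox WHERE id = {$row['id']}");
    }
    sleep(2);                            // be gentle between batches
}
```

Because rows are deleted as they finish, the staging table only ever holds unprocessed work, so there's no scan for uncommitted changes at all.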
>
> Leam
>
> Anthony Papillion wrote:
>>
>> Hello Everyone,
>>
>> I'm designing a system that will work on a schedule. Users will submit data for processing into the database and then, every minute, a PHP script will pass through the db looking for unprocessed rows (marked pending) and process them.
>>
>> The problem is, I may eventually have a few million records to process at a time. Each record could take anywhere from a few seconds to a few minutes to perform the required operations on. My concern is making sure that the script, on the next scheduled pass, doesn't grab the records currently being processed and start processing them again.
>>
>> Right now, I'm thinking of accomplishing this by updating a 'status' field in the database. So unprocessed records would have a status of 'pending', records being processed would have a status of 'processing', and completely processed records would have a status of 'complete'.
>>
>> For some reason, I see this as ugly, but that's the only way I can think of making sure that records aren't processed in duplicate. So when I select records to process, I'm ONLY selecting ones with the status of 'pending', which means they are new and unprocessed.
>>
>> Is there a better, more elegant way of doing this or is this pretty much it?
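The status-field idea can be made safe against a second pass grabbing in-flight rows by claiming a batch in a single UPDATE before selecting it. A minimal sketch, assuming a SQLite table for illustration (the `jobs` table, `worker` column, and batch size of 2 are invented; the same UPDATE-then-SELECT claim works in MySQL):

```php
<?php
$db = new SQLite3(':memory:');
$db->exec("CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    payload TEXT,
    status TEXT DEFAULT 'pending',   -- pending | processing | complete
    worker TEXT                      -- which worker claimed the row
)");
$db->exec("INSERT INTO jobs (payload) VALUES ('a'), ('b'), ('c')");

// Claim a batch atomically: tag pending rows with this worker's id in
// one UPDATE, so a second worker running the same statement cannot
// grab the same rows.
$workerId = uniqid('worker_', true);
$stmt = $db->prepare(
    "UPDATE jobs SET status = 'processing', worker = :w
     WHERE status = 'pending'
       AND id IN (SELECT id FROM jobs WHERE status = 'pending' LIMIT 2)"
);
$stmt->bindValue(':w', $workerId, SQLITE3_TEXT);
$stmt->execute();

// Process only the rows this worker claimed, then mark them complete.
$rows = $db->query("SELECT id, payload FROM jobs WHERE worker = '$workerId'");
while ($row = $rows->fetchArray(SQLITE3_ASSOC)) {
    // ... real processing here ...
    $db->exec("UPDATE jobs SET status = 'complete' WHERE id = {$row['id']}");
}
```

In production you'd also want a claimed-at timestamp so rows stuck in 'processing' after a crash can be reclaimed, but the claim-by-UPDATE step is what stops the duplicate processing described above.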
>>
>> Thanks!
>> Anthony Papillion
>>
>>
>>
>> _______________________________________________
>> New York PHP Users Group Community Talk Mailing List
>> http://lists.nyphp.org/mailman/listinfo/talk
>>
>> http://www.nyphp.org/Show-Participation
