
[nycphp-talk] How would you do this?

Rob Marscher rmarscher at beaffinitive.com
Mon Sep 25 11:37:06 EDT 2006


Definitely parse each feed only once across the server (not once for 
each user).  That alone should cut your 200,000 number down a lot.  You 
should also figure out how much processing time it takes to parse a 
feed; I wouldn't think it would be all that much.  If it doesn't take 
too long for your code to parse a feed, you should just do it on demand.
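
For example, a quick way to get a rough number (just a sketch; the URL 
is made up, it assumes an RSS 2.0 layout, and the time measured 
includes the network fetch as well as the parse):

<?php
// Rough timing of a single feed fetch + parse.  The URL is only an
// example; the elapsed time includes the network fetch too.
$start = microtime(true);
$xml = simplexml_load_file('http://example.com/feed.xml');
$elapsed = microtime(true) - $start;

$items = 0;
if ($xml) {
    foreach ($xml->channel->item as $item) {   // assumes RSS 2.0
        $items++;
    }
}
printf("Parsed %d items in %.3f seconds\n", $items, $elapsed);
?>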

I.e., when a user checks their account, loop through their feeds, 
determine whether the last parse was longer ago than some threshold 
(a half hour or an hour, something like that), and then determine 
whether any of those feeds have changed (maybe by comparing the file 
size of the live version with a cached local copy).  For the ones that 
have changed, pull down the new content and record the current time as 
the feed's last-updated time.  I would model feed entries as a database 
table for easy sorting, searching and other stuff like that.
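
Something like this, for example (just a sketch: the feeds table, its 
columns, and the entry-storage details are made up for illustration):

<?php
// Sketch of the on-demand refresh described above.  FEED_TTL, the
// feeds table, and its columns are hypothetical.
define('FEED_TTL', 1800);   // re-check a feed at most every 30 minutes

function refresh_feed_if_stale(PDO $db, array $feed)
{
    if (time() - $feed['last_checked'] < FEED_TTL) {
        return;   // checked recently enough; serve the cached entries
    }

    // Cheap change test: compare the live file size to the cached size.
    $headers = get_headers($feed['url'], 1);
    $size = isset($headers['Content-Length'])
        ? (int) $headers['Content-Length']
        : -1;   // no length header, so assume the feed changed

    if ($size === -1 || $size !== (int) $feed['cached_size']) {
        $xml = simplexml_load_file($feed['url']);
        if ($xml) {
            foreach ($xml->channel->item as $item) {
                // Insert/update each entry in a feed_entries table
                // keyed on the item's guid (details omitted).
            }
            $db->prepare('UPDATE feeds SET cached_size = ? WHERE id = ?')
               ->execute(array($size, $feed['id']));
        }
    }

    $db->prepare('UPDATE feeds SET last_checked = ? WHERE id = ?')
       ->execute(array(time(), $feed['id']));
}
?>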

To deal with the possible wait while a feed updates, the user interface 
could show the user the latest cached version of the feed and then make 
an Ajax call to do the update.
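
A bare-bones endpoint for that Ajax call might look like this (again a 
sketch: load_feed() and fetch_entries() are hypothetical helpers, and 
refresh_feed_if_stale() is from the sketch above):

<?php
// update_feed.php -- requested via XMLHttpRequest after the page has
// already rendered the cached entries.  load_feed() and fetch_entries()
// are hypothetical helpers; refresh_feed_if_stale() is sketched above.
require 'feedlib.php';

$db = new PDO('mysql:host=localhost;dbname=aggregator', 'user', 'pass');
$feed = load_feed($db, (int) $_GET['feed_id']);

refresh_feed_if_stale($db, $feed);

// Hand back the (possibly updated) entries for the page to swap in.
header('Content-Type: text/html; charset=utf-8');
foreach (fetch_entries($db, $feed['id']) as $entry) {
    printf("<li><a href=\"%s\">%s</a></li>\n",
        htmlspecialchars($entry['link']),
        htmlspecialchars($entry['title']));
}
?>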

This way of doing it avoids parsing feeds that no one accesses and also 
avoids having to predict your users' activity.

-Rob

Jad madi wrote:
> I'm building an RSS aggregator, so I'm trying to find out the best way
> to parse users' account feeds fairly.  Let's say we have 20,000 users
> with an average of 10 feeds per account, so we have about 200,000 feeds.
>
> How would you schedule the parsing process to keep all accounts always
> updated without killing the server?  NOTE: some of the 200,000 feeds
> might be shared by more than one user.
>
> Now, what I was thinking of is to split users into:
> 1-) Idle users (they check their accounts once a week; no traffic on
> their RSS feeds)
> 2-) Idle++ users (they check their accounts once a week, but there is
> traffic on their RSS feeds)
> 3-) Active users (they check their accounts regularly, and there is
> traffic on their RSS feeds)
>
> NOTE: The week is just an example; in the end it's going to be a
> dynamic ratio.
>
> So with this classification I can split the parsing power and time
> into:
> 1-) 10% for idle users
> 2-) 20% for idle++ users
> 3-) 70% for active users.
>
> NOTE: There are other factors that should be included, but I don't
> want to muddy the idea now (CPU usage, memory usage, connectivity
> issues if a feed's site is down); in general, the maximum execution
> time for the continuous parsing loop shouldn't be more than 30 to 60
> minutes.  Actually, I'm thinking of writing a daemon to do it: just
> keep checking CPU/memory and execute whenever a reasonable amount of
> resources is available, without killing the server.
>
>
> Please elaborate.
>


