[nycphp-talk] Php in the twilight zone
Jack Scott
lists at jack-scott.com
Fri Apr 21 10:24:04 EDT 2006
On Fri, 2006-04-21 at 16:21 +0300, Iulian Manea wrote:
> The script is used for spidering a site, which is quite big, so the 20
> minutes isn't that much. But each time the script finds a new link it
> flushes it to the browser, so the connection shouldn't time out or
> anything.
This doesn't fix your immediate problem, but if you are on *nix you
could run wget, lynx, or webBot to spider the site and then parse
the results.
I have had to do this in the past: I used wget to recursively spider a
site and save its HTML pages locally. Once that was done, I grepped the
results and piped them to sed and/or awk (gawk, nawk) to fine-tune the
output. There are plenty of similar Windows utilities as well, if that
is your platform.
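The grep/sed pass described above can also be sketched in Python, for anyone who would rather not chain shell tools. This is a minimal sketch, assuming the site has already been mirrored to a local directory (for example with `wget -r`) and that only the `href` targets of `<a>` tags are wanted. The names `extract_links` and `links_in_mirror` are hypothetical helpers, not part of wget or any other tool, and the sketch swaps the regex-style grep/sed step for Python's standard `html.parser` module.

```python
import os
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                # Skip valueless or empty href attributes.
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href targets found in one HTML document, in order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


def links_in_mirror(root_dir):
    """Walk a locally mirrored site (assumed to be wget output) and return
    the sorted, de-duplicated links found in its .html/.htm files."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.endswith((".html", ".htm")):
                path = os.path.join(dirpath, name)
                # errors="replace" keeps odd encodings from aborting the scan.
                with open(path, encoding="utf-8", errors="replace") as fh:
                    found.update(extract_links(fh.read()))
    return sorted(found)
```

After mirroring, calling `links_in_mirror()` on the directory wget created gives roughly what `grep | sed | sort -u` would, but a real HTML parser copes better with attributes split across lines or written in upper case.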
Hope this helps,
Jack