[nycphp-talk] Curl & Traversing Pages

Joseph Crawford codebowl at gmail.com
Tue Nov 22 20:19:02 EST 2005


Hello Everyone,

Let me explain what I am trying to do.  I have a script that fetches
the first page from a URL I specify, such as

http://yellowpages.superpages.com/listings.jsp?PS=45&OO=1&R=N&PP=L&CB=1&STYPE=S&F=1&L=VT&CID=00000518939&paging=1&PI=0

When it fetches this page, it parses the returned HTML and extracts
all the information for each record under Yellow Page Listings.
Once it has all the records, it checks whether there is a Next page;
Next will either be a link or not.

If it is a link, the script runs again using the URL from the Next
link.  Here's where I am running into problems.  I want to feed it one
URL and have it go through every page until there is no Next page.
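
The loop described above can be sketched roughly as follows.  This is
only an illustration in modern PHP (7.0 or later), not the code from
the linked classes: crawlAllPages, $fetch and $findNext are
hypothetical names.  $fetch stands in for whatever fetches a URL with
curl, and $findNext for whatever pulls the Next URL out of the HTML
(returning null when Next is not a link).  The $seen check and the
$maxPages cap are added assumptions so that a misbehaving site cannot
cause an endless loop.

```php
<?php
// Follow "Next" links from $startUrl until a page has none.
// $fetch:    callable(string $url): string   -- hypothetical page fetcher
// $findNext: callable(string $html): ?string -- hypothetical Next-link finder
function crawlAllPages(string $startUrl, callable $fetch, callable $findNext, int $maxPages = 500): array
{
    $pages = [];   // url => html, in the order the pages were visited
    $seen  = [];   // URLs already fetched, to avoid going in circles
    $url   = $startUrl;

    while ($url !== null && !isset($seen[$url]) && count($pages) < $maxPages) {
        $seen[$url]  = true;
        $html        = $fetch($url);
        $pages[$url] = $html;
        $url         = $findNext($html); // null means there is no Next link
    }

    return $pages;
}
```

Passing the fetcher and the link finder in as callables keeps the loop
itself independent of curl, so it can be tried against canned HTML
before pointing it at the real site.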

The issue I am having is that when curl fetches the URL taken from the
link, the page returned is not the one expected.  Instead it is an
error page from superpages stating that I have not supplied enough
search criteria.

On the first page fetched, this is the link taken from the source:
http://yellowpages.superpages.com/listings.jsp?PS=45&PP=L&CB=1&L=VT&CID=00000518939&paging=1&F=1&OO=1&PI=45

When curl fetches that URL, the site complains about search criteria;
however, if you paste it into a browser it works just fine.
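
One way to narrow this down is to compare the query parameters of the
two URLs quoted above.  The sketch below is only a diagnostic aid in
modern PHP; queryParams is a hypothetical helper name.  It shows which
parameters are dropped or changed in the Next link.  Whether the
dropped parameters are what triggers the error is an assumption to
test, not something established here; a browser may be filling the gap
through cookies or a Referer header.

```php
<?php
// Parse the query string of a URL into an associative array.
function queryParams(string $url): array
{
    $params = [];
    parse_str((string) parse_url($url, PHP_URL_QUERY), $params);
    return $params;
}

$first = 'http://yellowpages.superpages.com/listings.jsp?PS=45&OO=1&R=N&PP=L&CB=1&STYPE=S&F=1&L=VT&CID=00000518939&paging=1&PI=0';
$next  = 'http://yellowpages.superpages.com/listings.jsp?PS=45&PP=L&CB=1&L=VT&CID=00000518939&paging=1&F=1&OO=1&PI=45';

$a = queryParams($first);
$b = queryParams($next);

// Parameters present in the first URL but absent from the Next link.
$missing = array_keys(array_diff_key($a, $b));
// Parameters present in both URLs whose values differ.
$changed = array_keys(array_diff_assoc(array_intersect_key($b, $a), $a));

echo 'Missing from Next link: ', implode(', ', $missing), "\n"; // R, STYPE
echo 'Changed in Next link:   ', implode(', ', $changed), "\n"; // PI
```

A quick experiment would be to append the missing parameters to the
Next URL and see whether curl then gets the expected page.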

Here is a screenshot of the page that is returned by cURL:
http://codebowl.dontexist.net/images/ypresult.jpg

I am not sure what is going on here, but if anyone can lend a hand
with curl I would appreciate it.  I have made the cookie directory
writable by Apache, as I read that you have to specify the exact path
to the cookie file on Windows with Apache 2.
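
For reference, a minimal sketch of curl options that keep one cookie
file across requests might look like the following.  It is not taken
from the linked Curl class: sessionCurlOptions and fetchWithSession
are hypothetical names, the cookie path is whatever absolute path the
caller supplies, and sending the previous page as the Referer is an
assumption about what the site might check.  It needs PHP 7.1 or later
with the curl extension loaded.

```php
<?php
// Build cURL options that share a single cookie file between requests.
// $cookieFile should be an absolute path the web server can write to.
function sessionCurlOptions(string $cookieFile, ?string $referer = null): array
{
    $options = [
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow HTTP redirects
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is freed
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from here for each request
    ];
    if ($referer !== null) {
        $options[CURLOPT_REFERER] = $referer;  // assumption: the site may look at the Referer
    }
    return $options;
}

// Fetch one URL with the given options; returns '' on failure.
function fetchWithSession(string $url, array $options): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    // $ch is freed when this function returns, which is when the
    // cookie jar is flushed to disk.
    return $body === false ? '' : $body;
}
```

Using the same file for both CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE
means cookies set by the first page are sent back with the request for
the Next page.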

Here is my code

http://codebowl.dontexist.net/codebowl/System/Misc/Curl.phps
http://codebowl.dontexist.net/codebowl/System/Misc/YellowPages.phps


Note that I created a curl class because I am thinking of expanding on
what I have, and the framework I am working on
is going to be 100% object-oriented.

Any help is appreciated.

--
Joseph Crawford Jr.
Zend Certified Engineer
Codebowl Solutions, Inc.
1-802-671-2021
codebowl at gmail.com


