NYCPHP Meetup

NYPHP.org

[nycphp-talk] Curl & Traversing Pages

Dallas DeVries dallas.devries at gmail.com
Tue Nov 22 20:50:51 EST 2005


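I see you have the two cookie lines below commented out. I think they are key
to getting this to work for you, so uncomment them.

Use these so that the cookies sent by yellowpages.superpages.com are written to
a file and then read back on the next request:
$c->SetOpt(CURLOPT_COOKIEJAR, 'e:\htdocs\tmp\cookies\superpages.cookie.txt');
$c->SetOpt(CURLOPT_COOKIEFILE, 'e:\htdocs\tmp\cookies\superpages.cookie.txt');
Note the single quotes. Inside a double-quoted PHP string "\t" is a tab, so
"e:\htdocs\tmp\cookies\..." does not point where you think it does. Single
quotes or forward slashes avoid that.
Do you see the file superpages.cookie.txt being created, with a value set in it?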

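Only use these if you want to define the cookies yourself:
$c->SetOpt(CURLOPT_COOKIE, 1);
curl_setopt($c, CURLOPT_COOKIE, $cookieFields);
CURLOPT_COOKIE expects the contents of a Cookie header as a string (for
example "name1=value1; name2=value2"), so setting it to 1 does not enable
anything. Leave both out when you are using the cookie jar.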
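If you do set cookies by hand, the value has to be one Cookie-header string. A minimal sketch, assuming $cookieFields starts out as an associative array of cookie names to values (buildCookieHeader is a hypothetical helper, not part of PHP or of the Curl class in this thread):

```php
<?php
// Hypothetical helper: turn array('name' => 'value', ...) into the
// "name=value; name2=value2" string that CURLOPT_COOKIE expects.
// Values are passed through unencoded; escape them first if they
// can contain ';' or spaces.
function buildCookieHeader(array $cookies): string
{
    $pairs = array();
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Usage, with $ch being a handle from curl_init():
// curl_setopt($ch, CURLOPT_COOKIE, buildCookieHeader($cookieFields));
```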

This is a snippet of what I use for passing cookies around:

        // One cookie file per domain
        $cookieFile = STATIC_ROOT . "/cookies/" . $this->_domainObj->getDomainId() . ".txt";

        // Cookies received in responses are written here when the handle is closed
        curl_setopt($c, CURLOPT_COOKIEJAR, $cookieFile);

        // Send stored cookies only if the file exists and is under 8000 bytes
        if (file_exists($cookieFile) && filesize($cookieFile) < 8000) {
            curl_setopt($c, CURLOPT_COOKIEFILE, $cookieFile);
        }




On 11/22/05, Joseph Crawford <codebowl at gmail.com> wrote:
>
> Hello everyone,
>
> Let me explain a bit about what I am trying to do. I have a script that
> grabs the first page, which I specify with a URL such as
>
>
> http://yellowpages.superpages.com/listings.jsp?PS=45&OO=1&R=N&PP=L&CB=1&STYPE=S&F=1&L=VT&CID=00000518939&paging=1&PI=0
>
> Now when it grabs this page, it scours the returned HTML and grabs
> all the information for each record under Yellow Page Listings.
> Once it has all the records, it checks whether there is a Next page;
> basically, Next will either be a link or not.
>
> If it is a link, the script runs again using the URL from the Next
> link. This is where I am running into problems: I want to feed it one
> URL and have it go through every page until there is no Next page.
>
> The issue I am having is that with the URL grabbed from the link, cURL
> fetches a page, but not the page I expect; instead it's an error page
> from Superpages stating that I have not supplied enough search
> criteria.
>
> On the first page, this is the Next link that is grabbed from the
> source:
>
> http://yellowpages.superpages.com/listings.jsp?PS=45&PP=L&CB=1&L=VT&CID=00000518939&paging=1&F=1&OO=1&PI=45
>
> Now when cURL fetches that URL, the site complains about search
> criteria; however, if you paste it into a browser it works just fine.
>
> Here is a screenshot of the page that is returned by cURL:
> http://codebowl.dontexist.net/images/ypresult.jpg
>
> I am not sure what is going on here, but if anyone can lend a hand
> with cURL I would appreciate it. I have also made the cookie directory
> writable by Apache, since I read that you have to specify the exact
> path to the cookie file on Windows when using Apache 2.
>
> Here is my code:
>
> http://codebowl.dontexist.net/codebowl/System/Misc/Curl.phps
> http://codebowl.dontexist.net/codebowl/System/Misc/YellowPages.phps
>
>
> Note that I created a cURL class because I am thinking of expanding on
> what I have, and the framework I am working on is going to be 100%
> object-oriented.
>
> Any help is appreciated.
>
> --
> Joseph Crawford Jr.
> Zend Certified Engineer
> Codebowl Solutions, Inc.
> 1-802-671-2021
> codebowl at gmail.com
> _______________________________________________
> New York PHP Talk Mailing List
> AMP Technology
> Supporting Apache, MySQL and PHP
> http://lists.nyphp.org/mailman/listinfo/talk
> http://www.nyphp.org
>
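The paging loop described in the quoted message (fetch a page, look for a Next link, repeat until there is none) can be sketched as follows. This is a minimal sketch under stated assumptions: findNextLink and crawlAllPages are hypothetical names, the regex assumes the Next link is an absolute URL in markup shaped like <a href="...">Next</a> (the real Superpages HTML may differ), and the fetcher is passed in so that one cURL handle, and therefore one cookie session, can be reused for every page.

```php
<?php
// Hypothetical helper: return the href of an <a> whose link text is "Next",
// or null if there is none. Assumes markup like <a href="...">Next</a>
// with an absolute URL.
function findNextLink(string $html): ?string
{
    $pattern = '/<a\s[^>]*href\s*=\s*"([^"]+)"[^>]*>\s*Next\s*<\/a>/i';
    if (preg_match($pattern, $html, $m)) {
        // Links in HTML source usually encode & as &amp;; decode before requesting.
        return html_entity_decode($m[1], ENT_QUOTES);
    }
    return null;
}

// Hypothetical driver: fetch the start page, then keep following Next links
// until none is found (or $maxPages is reached, as a guard against loops).
// $fetch takes a URL and returns that page's HTML.
function crawlAllPages(string $startUrl, callable $fetch, int $maxPages = 100): array
{
    $pages = array();
    $url = $startUrl;
    while ($url !== null && count($pages) < $maxPages) {
        $html = $fetch($url);
        $pages[] = $html;
        $url = findNextLink($html);
    }
    return $pages;
}
```

With plain cURL, the fetcher can close over a single handle that has CURLOPT_RETURNTRANSFER, CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE set, for example `function ($url) use ($ch) { curl_setopt($ch, CURLOPT_URL, $url); return (string) curl_exec($ch); }`. Reusing the handle this way should mean any cookies set by the first search page are sent along with each Next request.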

