NYCPHP Meetup

NYPHP.org

[nycphp-talk] fsockopen / scrape / html mail

Andrew M. Yochum andrew at digitalpulp.com
Fri Jan 10 11:39:58 EST 2003


On Fri, 10 Jan 2003, Lynn, Michael  wrote:

> While the fsockopen / scrape issue is hot...
> 
> I have a Newsletter mailing script that uses fsockopen to do just as you say:
> Scrape a url and mail the contents to a list of recipients.
> 
> The only problem is that if I mail this to a user that doesn't have access to the original url, the resultant email will contain only text with errant references to the images.  This is because the
> email merely contains hrefs and src's back to the original server.
> 
> Is anyone aware of how to scrape and grab content and mail it so that the email is wholly contained and independent of the original url/server?

I'm not sure how you would include the images as attachments for use in HTML in
the email.  You could drop them on a server that the user does have access to
and refer to them in the email. Here are a few ideas to get you on your way
in either case...

A. Parse the HTML and retrieve the files yourself.
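
A minimal sketch of (A), assuming the scraped page is already in $html (a stub
here); the regex only handles <img src=...>, so backgrounds, stylesheets, etc.
would need their own patterns:

```php
<?php
// Sketch for idea A. $html would come from your fsockopen scrape;
// here it is a stub so the parsing step stands on its own.
$html = '<p><img src="/images/logo.gif"><img src=banner.jpg alt="b"></p>';

// Pull the src attribute out of each <img> tag (quoted or not).
preg_match_all('/<img[^>]+src\s*=\s*["\']?([^"\'\s>]+)/i', $html, $matches);

foreach ($matches[1] as $src) {
    // Resolve relative paths against the original URL here, then
    // fetch each one (another fsockopen, or fopen("http://...", "r")).
    echo $src . "\n";
}
?>
```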

B. Using lynx: If you exec lynx on a page with the -dump and -image_links
options, it will render the HTML to plain text for you, and at the end list all
URLs and images referred to in the document.  You could parse this list and use
it to retrieve each image required.  (Much simpler parsing than in A.)
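
A sketch of (B), assuming lynx is installed and the URL (a placeholder here)
is reachable; the image-extension pattern is illustrative:

```php
<?php
// Sketch for idea B: have lynx render the page and list its links,
// then keep only the lines that look like image URLs.
exec('lynx -dump -image_links http://www.example.com/newsletter.html', $lines);

foreach ($lines as $line) {
    // The reference list lynx appends looks like "  3. http://.../logo.gif"
    if (preg_match('/(https?:\/\/\S+\.(gif|jpe?g|png))/i', $line, $m)) {
        echo $m[1] . "\n";   // fetch this URL next
    }
}
?>
```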

C. Using wget: The --page-requisites option will cause wget to grab all the
other files required to render the page locally.  (No parsing!)
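
For (C), the call could be as simple as this (the URL and directory are
placeholders); wget's --convert-links flag additionally rewrites the page's
hrefs and src's to point at the local copies, which speaks directly to the
problem above:

```php
<?php
// Sketch for idea C: let wget collect the page plus everything it
// needs (--page-requisites); --convert-links rewrites hrefs/srcs
// in the saved page to the freshly downloaded local files.
exec('wget --page-requisites --convert-links -P /tmp/newsletter ' .
     'http://www.example.com/newsletter.html');
?>
```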

So once you've got your required files, you then have to get them to the email
recipient.  You could drop them on a server they do have access to, or figure
out how to attach them to an HTML email and refer to them from within it. I know
the latter is possible, although I have not done it myself.
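
On that last point, my understanding (untested, so treat this as a sketch) is
that you build a multipart/related MIME message, give each image part a
Content-ID, and point the HTML at it with a cid: URL. The address, subject,
and logo.gif below are placeholders:

```php
<?php
// Untested sketch: a multipart/related message, with the image as a
// base64-encoded part whose Content-ID the HTML references via cid:.
$fp   = fopen('logo.gif', 'rb');
$data = chunk_split(base64_encode(fread($fp, filesize('logo.gif'))));
fclose($fp);

$boundary = 'np_' . md5(uniqid(rand()));

$headers  = "MIME-Version: 1.0\r\n";
$headers .= "Content-Type: multipart/related; boundary=\"$boundary\"";

$body  = "--$boundary\r\n";
$body .= "Content-Type: text/html; charset=iso-8859-1\r\n\r\n";
$body .= "<html><body><img src=\"cid:logo\"></body></html>\r\n";
$body .= "--$boundary\r\n";
$body .= "Content-Type: image/gif\r\n";
$body .= "Content-Transfer-Encoding: base64\r\n";
$body .= "Content-ID: <logo>\r\n\r\n";     // the cid: the HTML refers to
$body .= $data . "\r\n";
$body .= "--$boundary--\r\n";

mail('user@example.com', 'Newsletter', $body, $headers);
?>
```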

Hope that helps.

Regards,
Andrew

-- 
Andrew Yochum
Digital Pulp, Inc.
212.679.0676x255
andrew at digitalpulp.com



