NYCPHP Meetup

Sun Oct 12 19:19:11 EDT 2008

Gentlemen,

> > The safest approach is probably to pass the html through tidy, and
> > then into DOM, and traverse and count the length of text nodes, but
> > that would be quite slow if you ran it on every request.
> 
> Right, +1 for Tidy and DOM, it's the "real" way to do it. You won't
> need to do it on every request -- you can either store the summary
> itself as a separate text field, or store the length of the summary as
> an integer.

I tried this, using DOM, Tidy, and combinations of the two, with no luck.  The problem is getting the differential between the two versions of the text.

> This is crying out for a web service: The Excerpter. POST markup, get
> the first X display characters back as a response, with embedded HTML
> intact.

Yeah, I agree - this has turned into a royal problem, and one that seems as though it would have been solved already.
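
For illustration, here is a minimal sketch of what such an excerpter might do. It is not code from this thread: it is written in Python only to stay short and self-contained, it swaps Tidy and DOM for the standard-library html.parser, and the names excerpt, Excerpter and VOID_TAGS are hypothetical. It counts only text characters against the limit and closes whatever tags are still open when the limit is hit. Comments are dropped, and script/style text is counted like any other text, so treat it as a starting point rather than a finished tool.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag (assumed list, not exhaustive).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Excerpter(HTMLParser):
    """Hypothetical helper: keep the first `limit` display characters."""

    def __init__(self, limit):
        # convert_charrefs=True hands us "&" rather than "&amp;", so an
        # entity counts as one display character.
        super().__init__(convert_charrefs=True)
        self.remaining = limit
        self.out = []
        self.open_tags = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        rendered = "".join(
            f" {name}" if value is None
            else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
        )
        self.out.append(f"<{tag}{rendered}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return
        # Close unclosed inner tags as well, so the output stays balanced.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        piece = data[: self.remaining]
        self.out.append(escape(piece, quote=False))
        self.remaining -= len(piece)
        if self.remaining <= 0:
            self.done = True  # ignore everything after the limit

    def result(self):
        # Close whatever was still open when we stopped.
        closers = "".join(f"</{t}>" for t in reversed(self.open_tags))
        return "".join(self.out) + closers


def excerpt(html, limit):
    parser = Excerpter(limit)
    parser.feed(html)
    parser.close()
    return parser.result()
```

With this sketch, excerpt('<p>Hello <b>brave new</b> world</p>', 9) returns '<p>Hello <b>bra</b></p>': nine display characters, with both open tags closed.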

At the end of the day, what would be very handy is a library: an object that stores the text in various forms and carries manipulation methods, metadata, and so on.  I had written something like this for MIME, but would not look forward to doing it for HTML and the like.

H

[nycphp-talk] Blog Posts with Embedded Content