[nycphp-talk] Blog Posts with Embedded Content

John Campbell jcampbell1 at gmail.com
Tue Oct 7 21:19:25 EDT 2008


On Tue, Oct 7, 2008 at 5:27 PM, Hans Zaunere <lists at zaunere.com> wrote:
> In one part of the application, they want to show only the first X number of
> characters, before forcing a user to login.  So we need to cut the submitted
> text at this character count, yet, of course, not cut in the middle of a
> tag.
>
> This has turned into a considerable annoyance and I'm wondering if anyone
> has a quick tip/pointer to a resource to solve this - without writing
> excessive text parsing code.

Interesting question. I have searched for a solution to this in the
past with no luck.

I hacked together a solution for you, but I am not sure I would put it
in production.  It is reasonably safe because it escapes everything it
doesn't recognize as a tag.  If you use it, I would filter out all but
a whitelist of tags (e.g. a,b,i,blockquote,strong) before passing it
to the function.
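
For the whitelist step, a minimal sketch using PHP's built-in
strip_tags() might look like the following. The function name
filter_to_whitelist() is hypothetical, and the tag list is just the
example list above. Note that strip_tags() leaves attributes on the
allowed tags untouched and keeps the text inside any tags it removes,
so this is not a complete sanitizer on its own.

```php
<?php
// Hypothetical helper: drop every tag that is not on the whitelist.
// Assumption: the whitelist is the example list a, b, i, blockquote, strong.
function filter_to_whitelist($html)
{
    // strip_tags() takes the allowed tags as a single string of opening tags.
    $allowed = '<a><b><i><blockquote><strong>';

    // Disallowed tags are removed, but the text between them is kept.
    // Attributes on allowed tags (e.g. onclick, href) are NOT filtered here.
    return strip_tags($html, $allowed);
}
```

For example, filter_to_whitelist('<p>Hi <b>there</b></p>') keeps the
<b> element and drops the <p> wrapper.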

See the code at:
http://php.pastebin.com/f7f5262cb

The safest approach is probably to pass the HTML through tidy, load
the result into DOM, and then traverse the tree counting the length of
the text nodes, but that would be quite slow if you ran it on every
request.
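
A rough sketch of the DOM part of that idea is below. It is not the
pastebin code, and the names truncate_html() and truncate_node() are
made up for illustration. It assumes the dom and mbstring extensions
are available and that the input is UTF-8. The tidy pass is left out
here; the sketch relies on libxml's lenient HTML parser instead, so
badly unbalanced markup may still give surprising results.

```php
<?php
// Hypothetical helper: keep at most $limit characters of text,
// leaving the surrounding tags well-formed.
function truncate_html($html, $limit)
{
    // Collect parser warnings quietly; user markup is often sloppy.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // The XML encoding hint asks libxml to read the input as UTF-8.
    // The wrapper <div> gives us a known root for the fragment.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first <div> in document order.
    $root = $doc->getElementsByTagName('div')->item(0);

    $remaining = $limit;
    truncate_node($root, $remaining);

    // Serialize only the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Walk the tree in document order, spending $remaining on text nodes.
function truncate_node(DOMNode $node, &$remaining)
{
    // Copy the live node list first, because nodes are removed below.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($remaining <= 0) {
            // Budget used up: drop everything that follows.
            $node->removeChild($child);
        } elseif ($child instanceof DOMText) {
            $length = mb_strlen($child->data, 'UTF-8');
            if ($length > $remaining) {
                // Cut inside this text node, never inside a tag.
                $child->data = mb_substr($child->data, 0, $remaining, 'UTF-8');
                $remaining = 0;
            } else {
                $remaining -= $length;
            }
        } elseif ($child instanceof DOMElement) {
            truncate_node($child, $remaining);
        }
    }
}
```

Calling truncate_html('<p>Hello <b>world</b> and more</p>', 8) keeps
the eight characters "Hello wo" of text, with the <b> and <p> elements
still closed. To avoid the per-request cost, the truncated version
could be computed once when the post is saved and stored next to the
full text.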

Regards,
John Campbell


