[nycphp-talk] Text versioning/manipulation caching architecture suggestions

max goldberg max.goldberg at gmail.com
Mon Jan 28 16:02:41 EST 2008


Hello buds,

It's been a while since I've been active on the list but I figured I'd give
a holler and see if anyone had any suggestions for an application design
problem I've run into.

I have a large number of text fields across many tables in a fairly large
database all of which can be manipulated in any number of ways. Some common
manipulations would be scrubbing strings for display on the web (XHTML
compliance and XSS avoidance), censoring of "bad" words, rich-text, etc.

All in all, once you mix and match all of the various text manipulations,
you end up with a large number of versions of the same chunk of text, and
you need access to all of them based on a plethora of variables such as user
options, access interface, etc. On top of that, some fields can be edited,
and I'd like to keep copies of the entire revision history, which adds
another level of complexity.

Originally I thought of some sort of memory caching solution, but the main
goal of this is to come up with a scalable solution and there are currently a
few gigabytes of text that this would apply to, so entries in any memory
cache would probably need to expire. It's possible that I could have some
mixture of short-term memory cache and long-term disk cache, as
disk/database space isn't a large concern.
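
To make the memory-plus-disk idea concrete, here is a minimal sketch, written in Python for brevity. The class name TwoTierCache, its parameters, and the assumption that cache keys are filesystem-safe strings are all hypothetical; it only illustrates the shape of the approach, not a finished design.

```python
import os
from collections import OrderedDict


class TwoTierCache:
    """Small in-memory LRU cache in front of a larger on-disk store (hypothetical)."""

    def __init__(self, disk_dir, max_memory_items=1000):
        self.disk_dir = disk_dir
        self.max_memory_items = max_memory_items
        self.memory = OrderedDict()  # key -> text, oldest entry first
        os.makedirs(disk_dir, exist_ok=True)

    def _path(self, key):
        # Assumption: keys are filesystem-safe (for example, hex digests).
        return os.path.join(self.disk_dir, key)

    def _remember(self, key, text):
        self.memory[key] = text
        self.memory.move_to_end(key)
        while len(self.memory) > self.max_memory_items:
            self.memory.popitem(last=False)  # evict the least recently used entry

    def get(self, key, compute):
        """Return the cached text for key, calling compute() only on a full miss."""
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        path = self._path(key)
        if os.path.exists(path):
            # Long-term tier: the text was computed earlier and written to disk.
            with open(path, encoding="utf-8", newline="") as fh:
                text = fh.read()
        else:
            # Full miss: do the expensive manipulation once and persist it.
            text = compute()
            with open(path, "w", encoding="utf-8", newline="") as fh:
                fh.write(text)
        self._remember(key, text)
        return text
```

With something like this, the expensive manipulation runs only when neither tier has the result; hot entries stay in memory and everything else falls back to disk.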

Another issue is manipulation function versioning, e.g. when a new word is
added to the censor function, you want to purge the cache of all of the
censored text created by the last version.
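
One possible way to handle this without an explicit purge is to fold each function's version number into the cache key, so that bumping a version simply makes the old entries unreachable and lets them expire on their own. A rough sketch in Python follows; the TRANSFORM_VERSIONS registry, the function name cache_key, and the table/field/revision parameters are hypothetical names used only for illustration.

```python
import hashlib

# Hypothetical registry: bump a version whenever a manipulation's behaviour
# changes, e.g. when a new word is added to the censor list.
TRANSFORM_VERSIONS = {"xss_scrub": 3, "censor": 7, "rich_text": 1}


def cache_key(table, field, row_id, revision, transforms, versions=TRANSFORM_VERSIONS):
    """Build a key identifying one manipulated variant of one text revision.

    transforms is an ordered list of manipulation names; order is kept because
    scrubbing then censoring may differ from censoring then scrubbing.
    """
    parts = [table, field, str(row_id), str(revision)]
    parts += ["{}@{}".format(name, versions[name]) for name in transforms]
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()
```

Because the revision number is also part of the key, an edited field gets new cache entries in the same way, and the older revisions stay addressable for the revision history.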

Maybe I'm just over-complicating the entire thing, but doing this sort of
manipulation on a high traffic site seems like a gigantic duplication of
CPU-intensive work that could (and should) be avoided.

I've come up with a lot of solutions, but none of them seem very elegant.
I'm trying to avoid a lot of excess DB queries and SQL joins.

I've done some searching around and it seems like anyone who has solved this
problem hasn't discussed it publicly. I thought maybe someone dealing with
locale on a large scale might have come up with a good solution, but since
locale is mostly static, it doesn't seem to apply in most cases.

So has anyone dealt with something similar, or is there an obvious solution
that I'm missing? I'd be interested in hearing some of the more seasoned
NYPHPers' opinions.

Thanks for any advice in advance!
-Max