[nycphp-talk] Parse HTML Files as PHP

Greg Rundlett greg_rundlett at harvard.edu
Sun Feb 22 22:52:27 EST 2009


On Sun, Feb 22, 2009 at 3:31 PM, Tim Lieberman <tim_lists at o2group.com> wrote:
> On Feb 22, 2009, at 3:04 PM, Greg Rundlett wrote:
>
>> On Sat, Feb 21, 2009 at 11:15 PM, Ajai Khattri <ajai at bitblit.net> wrote:
>>>
>>> On Fri, 20 Feb 2009, Peter Sawczynec wrote:
>>>
>>>> Anyone have any comment on this strategy pro or con?
>>>
>>> Not good for performance or scalability.
>>
>> Why exactly?  mod_php is both performant and scalable.
>
> mod_php can be a memory hog, bloating your httpd processes.  Who needs that
> bloat to serve static content?
>
> Nice little webcast about this here:
>
> <http://www.joomlaperformance.com/articles/webcasts/why_mod_php_is_bad_for_performance_52_58.html>
>

Thanks for the link, Tim.  That webcast offers a good explanation of
the differences between a couple of options for running PHP (Apache
module vs. FastCGI).  However, the OP was looking to add PHP
capabilities (e.g. logic, dynamic content, navigation, sessions,
whatever) to a collection of static HTML files.  That means he must
use PHP -- either mod_php or FastCGI -- and will take on some
performance overhead either way.  The question is whether he should
simply turn on the PHP interpreter for requests to *.html or take
some other approach.  Saying FastCGI is faster than mod_php doesn't
answer the original question.  To run the PHP interpreter against a
collection of .html URIs, you can turn it on at the server
configuration level with the AddHandler directive ("plan A",
http://httpd.apache.org/docs/2.0/mod/mod_mime.html#addhandler), or
you could do it in a few other ways.
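
For "plan A", a minimal sketch of the directive (assuming mod_php is
loaded; a FastCGI setup would use a different handler):

  # httpd.conf, a <Directory> block, or .htaccess
  # Run .php *and* the legacy .html files through the PHP interpreter.
  AddHandler application/x-httpd-php .php .html

Every .html response now pays the cost of invoking PHP, which is the
trade-off discussed above.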

I think the most commonly seen method is to abandon the original
URIs, e.g. "Due to a site upgrade, our homepage home.html has moved
to home.php; please update your bookmarks."  But I must say that is
a bad idea.

One alternative (which has been suggested and even seconded) is to
use an application framework, assuming it has some dispatcher
mechanism that maps requests to the original content, so that it can
respond appropriately to requests for your original URIs.  This could
be implemented with many PHP frameworks (or Python, etc.).  The
business case for "creating an application" depends on many factors
we don't know about, so I'd hesitate to recommend it a priori.  In
fact, I'd say that is an answer to a question that hasn't been asked
yet.
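
Just to illustrate the idea, a bare-bones dispatcher might look like
this (the file names and routing table are hypothetical; a real
framework's router would replace all of it):

  <?php
  // index.php -- hypothetical front controller.  Every request is
  // routed here (e.g. by a RewriteRule) and mapped back to the page
  // that now holds the original content.
  $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

  $routes = array(
      '/home.html'     => 'templates/home.php',
      '/about.us.html' => 'templates/about.us.php',
  );

  if (isset($routes[$path])) {
      include $routes[$path];          // serve the mapped page
  } else {
      header('HTTP/1.1 404 Not Found');
      echo 'Not Found';
  }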

Besides telling Apache to handle *.html requests with the PHP
interpreter via the AddHandler directive (whether PHP runs as an
Apache module, as a FastCGI implementation, or perhaps under
lighttpd...), another option is to use content negotiation, which is
more often used for translations but works just as well for MIME
types.  Rename all the .html files to .php and let Apache serve the
mapped file.  See
http://www.w3.org/QA/2006/02/content_negotiation.html and
http://httpd.apache.org/docs/2.0/mod/mod_negotiation.html  This
approach does not change the requested URIs, so it is SEO-neutral.
I'd recommend using it while simultaneously updating all internal
references to be technology-agnostic.  The client requests foo.html
--> Apache finds foo.php and serves it as foo.html.  bar.php should
link to "foo" rather than foo.html or foo.php.  The resource on disk
should be foo.php so that developers know the technology in use, and
IDEs can easily identify the MIME type.
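
A minimal sketch of that negotiation setup (assuming mod_negotiation
and mod_php are loaded; depending on how MultiViews matches the old
.html requests against the directory, you may need to keep the file
named foo.html.php rather than foo.php, or pair this with AddHandler):

  # .htaccess or a <Directory> block
  Options +MultiViews
  AddType application/x-httpd-php .php

  # A request for the bare name /foo negotiates to foo.php on disk,
  # which is why internal links should point at "foo" as noted above.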

You could also rename all the files (*.html -> *.php) and use
mod_rewrite under the covers, but that would be a double performance
hit: once for mod_rewrite execution, and again for the (intentional)
hit of invoking PHP.
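
That rewrite would be something like the following (a sketch,
assuming mod_rewrite is enabled and the renamed .php files sit at the
same paths as the old .html files):

  # .htaccess
  RewriteEngine On
  # If the requested .html file no longer exists, serve the .php
  # version that replaced it; the URI the client sees is unchanged.
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteRule ^(.+)\.html$ $1.php [L]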

I'm assuming that serving static or anonymous content (e.g.
about.us.html, which is now about.us.php on disk and served through
the PHP interpreter in response to requests for either about.us or
about.us.html) would be optimized using PHP caching or other caching
strategies.  Info at
http://freephile.com/wiki/index.php/PHP_Accelerator
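
On the accelerator side, a minimal php.ini sketch, assuming APC is
the opcode cache chosen (the extension name and sizes are just
examples):

  ; php.ini
  extension=apc.so        ; php_apc.dll on Windows
  apc.enabled=1
  apc.shm_size=64         ; shared memory for compiled scripts, in MB
  apc.stat=1              ; check file mtimes; 0 is faster but means
                          ; restarting Apache after every deploy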

It's been a while since I dug into the current link juice algorithms,
but I'd say that as long as external links to your content still work
with a 200 response, then you're good.  If you're implementing visible
(to the client) redirects or changing URIs then you're in for a hit.

Also, for those interested in benchmarking the performance of
different servers (Apache vs. Lighty), deployment options for PHP
(CGI vs. module), or PHP versions (php4 vs. php5), here are a couple
of good links:
http://buytaert.net/drupal-webserver-configurations-compared
http://sebastian-bergmann.de/archives/634-PHP-GCC-ICC-Benchmark.html
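
For quick comparisons of your own, ApacheBench (shipped with Apache)
is enough to see the difference between the configurations discussed
above; the request counts below are just placeholders:

  # 1000 requests, 10 at a time, against one of the legacy URIs
  ab -n 1000 -c 10 http://localhost/about.us.html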


