[nycphp-talk] php scalability
ntang at mail.communityconnect.com
Wed Aug 13 23:26:51 EDT 2003
Hmmm. Let me answer that with a "not especially".
From an operations standpoint, you want to build to (at least semi-) automated standards. By that I mean simply making sure that each machine is essentially a cookie-cutter replica of the next, which is fairly easy to accomplish on a large scale with a little thought. It's worth noting that Apache/PHP has no built-in connection pooling, so if you have 10 webservers w/ 100 Apache children each, you can end up with 1000 connections open to your database. You should also have a way to distribute content across multiple servers. rsync works pretty nicely, to a point.
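To make the connection arithmetic concrete, here is a rough sketch in Python (used only for brevity; nothing here is PHP- or Apache-specific). The function name and the `conns_per_child` parameter are made up for illustration, and it assumes the worst case where every Apache child holds its own database connection open.

```python
def total_db_connections(webservers: int,
                         children_per_server: int,
                         conns_per_child: int = 1) -> int:
    """Worst-case number of open DB connections without pooling.

    Assumes every web server child process holds `conns_per_child`
    connections open at the same time (hypothetical parameter).
    """
    return webservers * children_per_server * conns_per_child


if __name__ == "__main__":
    # The figures from the paragraph above: 10 webservers, 100 children each.
    print(total_db_connections(10, 100))
```

The useful comparison is between that number and whatever connection limit your database is configured for, since adding webservers multiplies it.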
On the app side, as you noted, you have to keep any state in the db, and have to make sure no "webapp" server keeps any session data locally. This is easy for a lot of people to forget in practice even if they recognize it in theory. Generally, though, anything else is basically just following good programming habits - designing apps and database schemas so they minimize contention, and so that they do as much caching as possible. Most data doesn't need to be up-to-the-second dynamic. One of my favorite easy-to-recognize examples of this was from a company I worked at that used an SSI exec to run the unix command "date" on every page load just to print the date - not even the time. That's data that updates every 24 hours, so running it a few times a second was obviously stupid. It's worth bearing in mind that a little extra planning time in the beginning can often make a huge difference in the end results.
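The "date" story boils down to caching a value for as long as it can safely be stale. A minimal sketch of that idea, again in Python and under my own assumptions: `TTLCache`, its `get` method and the injectable `clock` argument are hypothetical names for illustration, not anything from PHP or APC, and the cache lives in a single process.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache: recompute a value only after it expires."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        # Serve the stored value while it is younger than the TTL.
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        # Otherwise do the expensive work once and remember the result.
        value = compute()
        self._store[key] = (now, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60)
    # Stand-in for the expensive call (the SSI exec of "date" in the story).
    print(cache.get("today", lambda: time.strftime("%Y-%m-%d")))
```

Even a 60 second lifetime turns "a few times a second" into once a minute, and a short lifetime avoids showing yesterday's date for long after midnight.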
Split apps up logically whenever possible. Scaling is generally cheapest when done horizontally. If your site(s) can be broken down into logical pieces, each with their own DB and web cluster, that'll help a bunch, since your DB will probably be your bottleneck. The less you can hit the DB, obviously, the better. Use separate clusters for purely static content - and consider using something like tux or thttpd or some other high-speed server to serve that content. No sense in wasting open DB connections to serve that content.
Something you should think about early and address quickly (regardless of which platform) is shared uploaded content, if there will be any - for instance, if users can upload images, or documents, or music, or whatever, you need a way of mounting that upload repository to all of the servers. The easiest way is just NFS mounting the fileserver across all of the clusters.
Oh, and use something like APC on your php servers. ( http://pear.php.net/package-info.php?package=APC ) (How shameless can a plug be if it's open source? ;) )
I know nothing I said is especially specific, but it's late, I'm tired, and I'm not getting paid. ;)
P.S. I'm catching a plane in a few hours so I probably won't be sending any more replies for a while. ;)
----- Original Message -----
From: Lee Semel
To: NYPHP Talk ; ntang at mail.communityconnect.com
Sent: Wednesday, August 13, 2003 10:38 PM
Subject: Re: [nycphp-talk] php scalability
I heard that your company is one of the largest users of PHP. It's good to know you have been able to make this work. I would like to use PHP for our site if it's possible to scale as easily as you described.
Is there anything special in the programming of the application that needs to be done to make it amenable to clustering, aside from keeping session state in the database?