[nycphp-talk] Caching, proxies, sharding and other scaling questions

Jake McGraw jmcgraw1 at gmail.com
Sat Jul 25 20:43:07 EDT 2009


A very on-topic image for our discussion:

http://highscalability.com/nsfw-hilarious-fault-tolerance-cartoon

On Sat, Jul 25, 2009 at 2:18 PM, Mitch Pirtle<mitch.pirtle at gmail.com> wrote:
> Memcache is your safest option for an in-memory solution, for sure.
> Realistically, does memcache even have a competitor in that regard?
>
> For persistent storage, you should look at MongoDB and Project
> Voldemort. Voldemort is insanely fast as a key/value store, and
> coupled with BerkeleyDB storage is unbeatable for scale, shard and
> speed needs.
>
> MongoDB provides a persistent datastore that also goes direct to disk,
> and is wikkid fast - MongoDB tries to bridge the gap between key/value
> stores (scale, shard, speed) and relational databases (lists, finds).
>
> I have a site that will probably break 1B page views in 6 months,
> running on three physical webservers, each of which runs memcache as
> well. It certainly can be done, but depends greatly on the quality of
> your developers, how much time you give them to do their thing, and
> the overall performance characteristics and requirements of your site.
>
> Making massive scale websites isn't actually hard, it just takes time.
> Not giving your engineers time to think things through takes their
> efficiency and sets it on fire outside in the street. :-)
>
> -- Mitch
>
> On Fri, Jul 24, 2009 at 7:29 PM, Jake McGraw<jmcgraw1 at gmail.com> wrote:
>> On Fri, Jul 24, 2009 at 6:45 PM, Ajai Khattri<ajai at bitblit.net> wrote:
>>> On Fri, 24 Jul 2009, Jake McGraw wrote:
>>>
>>>> What's your data size like? How many requests per second do you plan on
>>>> handling?
>>>
>>> It's a very big site. Last year, we handled a total of 945 million page
>>> views. And we expect those numbers to go up of course :-)
>>>
>>>> a relational database to a key/value store (memcache is nice,
>>>> personally, I'm becoming a big fan of Redis) is to set up a single
>>>> instance and see how it handles the load.
>>>
>>> Yes, my thoughts exactly. (BTW, I also looked at Redis earlier today, but
>>> I have yet to see a comparison with memcache). Any thoughts?
>>>
>>
>> Memcache is a proven product with a long (in web terms) history. Redis
>> is brand new; the RC for version 1.0 was put out just recently.
>> The things I like about Redis are:
>>
>> Data Persistence (not just in memory)
>> * Very easy to take a snapshot of your entire data store, just back up
>> the data dump dir.
>> * Very easy to prime a new data store. Let's say part of your scaling
>> strategy includes mirroring your data, that is, you'll have multiple
>> cache servers with the same data. Simply take a snapshot of your data
>> dir, move the files to a new server and start redis.
>> * If your server goes down you can still recover information from the
>> last active state.
>>
>> Lists
>> Redis is not just a key/value store, it also provides lists of values
>> under a single key. You can push, pop, get the length, get an
>> arbitrary value within a list and a bunch of other features. Doing all
>> of this computation within the server provides two benefits: 1. No
>> round trip and (de)serialization, 2. Atomic transactions.
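
A minimal sketch of the two benefits named above, written in Python for
illustration. ToyListStore is a hypothetical in-process stand-in, not
Redis itself: its method names borrow the Redis commands RPUSH, LPOP,
LLEN and LINDEX, and a lock plays the role of the server running one
command at a time. append_via_blob is likewise a made-up helper showing
what a plain key/value store forces the client to do instead.

```python
import json
import threading


class ToyListStore:
    """Hypothetical stand-in for a server that keeps lists under keys."""

    def __init__(self):
        self._lists = {}
        # One operation at a time, so each call is atomic.
        self._lock = threading.Lock()

    def rpush(self, key, value):
        """Append to the tail of the list and return the new length."""
        with self._lock:
            self._lists.setdefault(key, []).append(value)
            return len(self._lists[key])

    def lpop(self, key):
        """Remove and return the head of the list, or None if empty."""
        with self._lock:
            items = self._lists.get(key)
            return items.pop(0) if items else None

    def llen(self, key):
        with self._lock:
            return len(self._lists.get(key, []))

    def lindex(self, key, index):
        """Return the value at a position, or None if out of range."""
        with self._lock:
            items = self._lists.get(key, [])
            try:
                return items[index]
            except IndexError:
                return None


def append_via_blob(cache, key, value):
    """Plain key/value approach: fetch, deserialize, modify, serialize, store.

    Two clients doing this at once can overwrite each other's update.
    """
    items = json.loads(cache.get(key, "[]"))
    items.append(value)
    cache[key] = json.dumps(items)
    return len(items)


store = ToyListStore()
store.rpush("recent:views", "page-1")
store.rpush("recent:views", "page-2")

blob_cache = {}  # a dict standing in for a plain key/value cache
append_via_blob(blob_cache, "recent:views", "page-1")
append_via_blob(blob_cache, "recent:views", "page-2")
```

With the list kept on the server side, an append is a single call; with
the blob approach the whole list crosses the wire twice per append.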
>>
>> KEYS command for wildcard support in key lookups.
>> http://code.google.com/p/redis/wiki/KeysCommand
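
To give a feel for what wildcard key lookup means, here is a rough
Python sketch over an ordinary dict. The keys() helper and the sample
key names are invented for illustration, and Python's fnmatch only
approximates glob-style matching, so treat this as a picture of the
idea rather than of the actual command.

```python
from fnmatch import fnmatchcase


def keys(store, pattern):
    """Return every key in the store that matches a glob-style pattern."""
    # This walks all keys, so the cost grows with the size of the store.
    return sorted(k for k in store if fnmatchcase(k, pattern))


# Hypothetical sample data.
data = {
    "user:1:name": "ajai",
    "user:2:name": "jake",
    "session:9": "abc123",
}

user_keys = keys(data, "user:*")
name_keys = keys(data, "user:?:name")
```

Coming from SQL, this is loosely the role a LIKE clause plays when you
do not know the exact key up front.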
>>
>> Sets
>> Though I haven't played around with sets yet, they look pretty powerful.
>>
>> In general, I think the KEYS and List commands make the whole
>> key/value thing a lot easier to use when coming from an RDBMS
>> background. For performance information, check out this post:
>>
>> http://groups.google.com/group/redis-db/browse_thread/thread/0c706a43bc78b0e5/455dd41883d90101#455dd41883d90101
>>
>> - jake
>>
>>>> For example, with modern
>>>> hardware, a value lookup from a single, untaxed instance of memcache
>>>> should take around 1ms. At a certain point, based almost entirely on
>>>> traffic, that'll go up. When it gets to an undesirable level, throw in
>>>> another memcache instance and hash the keys to spread the load (or
>>>> allow your memcached client to hash the keys for you). Continue this
>>>> until some other bottleneck rears its head.
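
The "hash the keys to spread the load" step can be sketched roughly as
below. This is plain modulo hashing in Python, assuming a made-up list
of server addresses; pick_server is a hypothetical helper, not part of
any memcached client, and many clients do this for you.

```python
import zlib
from collections import Counter

# Hypothetical cache servers; replace with real host:port pairs.
SERVERS = ["cache1:11211", "cache2:11211", "cache3:11211"]


def pick_server(key, servers):
    """Map a key to one server by hashing it and taking the remainder."""
    index = zlib.crc32(key.encode("utf-8")) % len(servers)
    return servers[index]


def spread(keys, servers):
    """Count how many of the given keys land on each server."""
    return Counter(pick_server(k, servers) for k in keys)


sample_keys = ["page:%d" % n for n in range(1000)]
load = spread(sample_keys, SERVERS)
```

One caveat with the modulo approach: adding another instance changes
the remainder for most keys, so much of the cache is effectively cold
afterwards. Consistent hashing is a common way to soften that.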
>>>
>>> We know where the bottlenecks are, so right now it's a case of selecting
>>> some solutions to test with.
>>>
>>>
>>> --
>>> Aj.
>>>
>>> _______________________________________________
>>> New York PHP User Group Community Talk Mailing List
>>> http://lists.nyphp.org/mailman/listinfo/talk
>>>
>>> http://www.nyphp.org/show_participation.php
>>>


