NYCPHP Meetup

[nycphp-talk] of interest - new form of database programming

inforequest 1j0lkq002 at sneakemail.com
Wed Jan 19 13:24:14 EST 2005


Adam Fields fields-at-hedge.net |nyphp dev/internal group use| wrote:

>On Tue, Jan 18, 2005 at 07:42:26PM -0500, inforequest wrote:
>[...]
>  
>
>>"Relational databases are one to two orders of magnitude too slow," says 
>>Stonebraker, who is chief technology officer at Streambase, a 25-person 
>>outfit based in Lexington, Mass. "Big customers have already tried to 
>>use relational databases for streaming data and dismissed them. Those 
>>products are non-starters in this market."
>>    
>>
>[...]
>
>This is not news. Relational databases are widely known to be slow,
>which is why you have to throw things like Oracle at the problem before
>you get anything close to realtime analytics, and even they fall down
>a lot. Relational databases, however, are mature and stable. This
>doesn't sound like an alternative to a relational database, but a
>complement - you still have to store the data somewhere.
>
>I wonder if they've done speed comparisons vs. a relational database
>that's entirely in memory (or properly tuned). Technique alteration is
>one thing, but just saying "the whole thing's in RAM" isn't a
>revolution.
>
I think he was saying they are too slow for *this application* 
(streaming data).

I posted this because it described querying data as it streams by, not 
because it is a replacement for the relational database. Sometimes we 
get caught up in the constraints of technology, and don't see innovation 
as easily as we might.

How many people are grabbing RSS feeds, storing them to a database, 
querying them, and using the results of the queries? And of those cases, 
how many consider old data worth saving?
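The "query the data as it streams by" idea can be sketched in a few lines of Python: a standing query runs as a filter over incoming RSS-style items, with no database in the loop. This is just an illustration of the concept, not StreamBase's actual API; the item fields and the matching rule are made-up assumptions.

```python
# Minimal sketch of a "continuous query" over a stream of items.
# Each item is a dict shaped like an RSS entry; nothing is stored --
# matches are acted on as they flow past. Field names are illustrative.

def continuous_query(stream, predicate):
    """Yield only the items that satisfy the standing query."""
    for item in stream:
        if predicate(item):
            yield item

# A toy feed standing in for an aggregate RSS stream.
feed = [
    {"title": "PHP 5 released", "tags": ["php"]},
    {"title": "Gardening tips", "tags": ["home"]},
    {"title": "MySQL tuning", "tags": ["php", "db"]},
]

# Standing query: "show me everything tagged 'php' as it arrives".
matches = list(continuous_query(iter(feed), lambda i: "php" in i["tags"]))
```

Because the query runs per-item, there is nothing to save unless a match fires; old data simply passes by.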

For example, what if Feedster was querying a massive aggregate RSS 
stream instead of a database? What if the results were then streamed to 
another site, which displayed tag-based trending of that result set? 
What if......
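The chained "what if" above, where one site's query results are streamed into another site's tag trending, could look something like this sketch. Feedster's actual pipeline is unknown to me; every stage, field, and name here is hypothetical.

```python
# Hypothetical sketch of chained stream processing: one stage filters
# an aggregate feed, the next tallies tags for trending. No stage
# stores raw data; results flow straight through.
from collections import Counter

def query_stage(stream, keyword):
    """First site: standing keyword query over the feed stream."""
    for item in stream:
        if keyword in item["title"].lower():
            yield item

def trend_stage(stream):
    """Second site: tag-based trending over the filtered result stream."""
    tags = Counter()
    for item in stream:
        tags.update(item["tags"])
    return tags.most_common()

feed = [
    {"title": "PHP security tips", "tags": ["php", "security"]},
    {"title": "Knitting basics", "tags": ["crafts"]},
    {"title": "PHP 5 objects", "tags": ["php", "oop"]},
]
trends = trend_stage(query_stage(iter(feed), "php"))
```

Each stage consumes the previous stage's output lazily, so adding a third "what if" site is just another generator in the chain.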

I like stuff that makes you think outside the box... at one company 
someone is parsing, sorting, and categorizing while his team optimizes 
databases, balances servers, and scales, and the company uses the 
result set for ecommerce; meanwhile at a competitor, someone is 
processing the stream directly and using it for ecommerce. Faster to 
conversion, faster to maximizing ROI, more flexible, less overhead.

How many cpu cycles and GB of storage are dedicated to log file/log 
analysis these days, just to produce periodic traffic summaries? I know 
many people have gone to work on inserting RDBs so they could better 
manage that process... only to run into massive scaling issues. If only 
they could process the stream instead?
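Periodic traffic summaries are a good fit for this: the summary can be built in one pass over the log stream, with no raw rows ever landing in a database. A minimal sketch, assuming a simple made-up log-line format:

```python
# Sketch: summarizing a log stream on the fly instead of loading it
# into a relational database first. One pass, constant memory per
# time bucket; the log-line format here is an assumption.
from collections import Counter

def summarize(lines):
    """Count hits per hour from lines like '2005-01-19 13:24 GET /index.php'."""
    hits = Counter()
    for line in lines:
        hour = line.split()[1].split(":")[0]   # '13' out of '13:24'
        hits[hour] += 1
    return hits

log = [
    "2005-01-19 13:24 GET /index.php",
    "2005-01-19 13:59 GET /about.php",
    "2005-01-19 14:02 GET /index.php",
]
summary = summarize(log)
```

The same function works unchanged whether `lines` is a list, an open file, or a socket feeding log lines in real time, which is the whole point: the storage step disappears.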

-=john andrews

More information about the talk mailing list