[nycphp-talk] of interest - new form of database programming
1j0lkq002 at sneakemail.com
Wed Jan 19 13:24:14 EST 2005
Adam Fields fields-at-hedge.net |nyphp dev/internal group use| wrote:
>On Tue, Jan 18, 2005 at 07:42:26PM -0500, inforequest wrote:
>>"Relational databases are one to two orders of magnitude too slow," says
>>Stonebraker, who is chief technology officer at Streambase, a 25-person
>>outfit based in Lexington, Mass. "Big customers have already tried to
>>use relational databases for streaming data and dismissed them. Those
>>products are non-starters in this market."
>This is not news. Relational databases are widely known to be slow,
>which is why you have to throw things like Oracle at the problem before
>you get anything close to realtime analytics, and even they fall down
>a lot. Relational databases, however, are mature and stable. This
>doesn't sound like an alternative to a relational database, but a
>complement - you still have to store the data somewhere.
>I wonder if they've done speed comparisons vs. a relational database
>that's entirely in memory (or properly tuned). Technique alteration is
>one thing, but just saying "the whole thing's in RAM" isn't a [...]
I think he was saying they are too slow for *this application*: streaming data.
I posted this because it described querying data as it streams by, not
because it is a replacement for the relational database. Sometimes we
get caught up in the constraints of technology, and don't see innovation
as easily as we might.
How many people are grabbing RSS feeds, storing them in a database,
querying them, and using the results of the queries? And in how many of
those cases is the old data even worth saving?
For example, what if Feedster was querying a massive aggregate RSS
stream instead of a database? What if the results were then streamed to
another site, which displayed tag-based trending of that result set?
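To make that concrete, here is a minimal sketch (in Python, purely hypothetical -- the item shape, tag field, and window size are my assumptions, not anything Feedster actually does) of computing trending tags directly from a stream of RSS-like items, with no database between ingestion and query:

```python
# Hypothetical sketch: tag-based trending computed as items stream by.
# A rolling window of the last N items stands in for "recent" data;
# nothing is ever written to disk or to a database.
from collections import Counter, deque

class TagTrender:
    """Keep a rolling window of the last N items and rank their tags."""

    def __init__(self, window_size=1000):
        self.window = deque(maxlen=window_size)  # old items fall off the back
        self.counts = Counter()

    def push(self, item):
        # If the window is full, the oldest item is about to be evicted,
        # so retire its tags from the running counts first.
        if len(self.window) == self.window.maxlen:
            for tag in self.window[0]["tags"]:
                self.counts[tag] -= 1
        self.window.append(item)
        self.counts.update(item["tags"])

    def trending(self, n=3):
        """Top-n tags over the current window."""
        return [tag for tag, _ in self.counts.most_common(n)]

# Tiny demo with a window of 2 items: the first item ages out.
trender = TagTrender(window_size=2)
trender.push({"title": "a", "tags": ["rss"]})
trender.push({"title": "b", "tags": ["php"]})
trender.push({"title": "c", "tags": ["php"]})  # evicts item "a"
print(trender.trending(1))
```

The point of the sketch is the shape of the computation: the "query" (top tags) is maintained incrementally as data flows past, which is exactly what a streaming engine promises, rather than being re-run against an ever-growing table.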
I like stuff that makes you think outside the box... at one company,
someone is parsing, sorting, and categorizing, his team is optimizing
databases, balancing servers, and scaling, and the company is using the
result set for ecommerce; meanwhile, at a competitor, someone is simply
processing the stream and using it for ecommerce. Faster to conversion,
faster to maximizing ROI, more flexible, less overhead.
How many CPU cycles and GB of storage are dedicated to log files and log
analysis these days, just to produce periodic traffic summaries? I know
many people have gone to work inserting RDBs so they could better
manage that process... only to run into massive scaling issues. If only
they could process the stream instead.
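For the log case, a stream-side summary can be as simple as a single pass that never stores the raw lines at all. A minimal sketch (the Apache-style common log format and field positions are assumptions for illustration):

```python
# Hypothetical sketch: periodic traffic summaries computed as the log
# streams past, instead of bulk-inserting every line into a database.
from collections import Counter

def summarize(lines):
    """One pass over a log stream: hit counts per path and per status code."""
    paths, statuses = Counter(), Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 9:
            continue  # skip malformed lines rather than failing the stream
        paths[parts[6]] += 1     # request path, e.g. /index.php
        statuses[parts[8]] += 1  # HTTP status code, e.g. 200
    return paths, statuses

# Demo on a few Apache-style common-log lines (made-up data).
log = [
    '1.2.3.4 - - [19/Jan/2005:13:24:14 -0500] "GET /index.php HTTP/1.1" 200 512',
    '1.2.3.5 - - [19/Jan/2005:13:24:15 -0500] "GET /feed.rss HTTP/1.1" 200 2048',
    '1.2.3.4 - - [19/Jan/2005:13:24:16 -0500] "GET /index.php HTTP/1.1" 404 128',
]
paths, statuses = summarize(log)
print(paths.most_common(1))
```

Because `summarize` only keeps counters, it handles an arbitrarily long stream in constant memory per distinct path -- which is precisely where the insert-everything-into-an-RDB approach starts to buckle.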