NYCPHP Meetup

NYPHP.org

[nycphp-talk] data modelling vs. db design (was: ER Diagram tool for MySQL/OS X)

max goldberg max.goldberg at gmail.com
Tue Oct 4 12:46:55 EDT 2005


I'm actually running into the point of an application where my completely
normalized database design is just killing my database server. It was fine
during the first few months of my site being up but as my traffic grew, it
really began to take a toll. The first thing I had to do was move from
MyISAM to InnoDB, due to the huge number of locks causing the site to just
screech to a halt. On average I am getting around 5000 hits per minute to
the PHP pages that connect to the database. During the "slow" time at 7am, I
get around 2500. I've managed to lower my average queries per second from
300 to 220, but normalization is still killing me.

This makes doing any sort of database maintenance fairly difficult. InnoDB
has helped a lot, I have a dedicated DB server, but one query that has to
copy to a temp table on each page view means almost certain death. My
temporary solution to this has been to create "digest" tables which are more
or less 5-10 table joins that get refreshed every X minutes via cron. This
obviously causes caching problems as data isn't up to the second, and it
seems like a pain to write code around using this "hack" to make my
application work as it should.

I've considered using InnoDB's cascading update/delete function to keep my
digest tables up to date without a cron job, but I want to play with it more
before I just launch into it. I am starting to wonder how much of this is
just my lack of expertise in database administration and how much is just a
fundamental flaw with normalized database design. Is it possible to have a
large scale production database with a fully normalized design without
extremely expensive hardware?

It seems like it would be worth the extra overhead to keep behind-the-scenes
digest tables of commonly joined tables, is this something that higher end
databases (ie. oracle) do?

-Max

On 10/4/05, Russ Demarest <rsd at electronink.com> wrote:
>
> Another consideration is how you are using the addresses and the
> users. For example if you are going to be joining users with other
> tables then trying to determine which of those users also have an
> address in NY the queries can beat on your server. How many users
> will you have? What with the addresses be used for? Will you need to
> search all addresses for each user related to other tables?
>
> From a data point of view you would normalize it all and I would
> suggest a states and countries table to be linked to addresses as a
> well as address type. But when you are trying to come up with all the
> users with a address in NY who are related to other tables, i.e.
> groups, you may wish you have it all in one table.
>
> If you get really "normal" you could have a "user_addresses" table
> which would contain the address_type field so if you have 2 users at
> the same address the address would only be in the address table once
> but related to both users. :)
>
> Don't get me started if you need to track history. "Where did user X
> live at home 2 years ago?". Then you have a start_date and end_date
> in the user_addresses.
>
> I have done it both ways and there really is not a clear winner,
> depends on the situation. I am glad I am not tracking user address
> history any more,
>
> Good luck
>
>
> On Oct 4, 2005, at 11:53 AM, Allen Shaw wrote:
>
> > Stephen Musgrave wrote:
> >
> >
> >> This is an interesting topic because I'm approaching a question based
> >> upon this principal. There is an application that I am building
> >> where
> >> the User record can have 5 addresses (home address, work address,
> >> permanent address, international address, etc, etc). I'm considering
> >> making a table called UserAddress and then linking it to address ID
> >> fields in the User table. I'm on the fence about it because while I
> >> don't want a monstrous User table with tons of columns, I also don't
> >> want to over normalize.
> >>
> >>
> >>
> > Here's a great working example to explore the issue. I would let the
> > User table contain no address info, and then let the UserAddress table
> > contain one address per row with an extra column for UserID. This
> > seems
> > like a clear one-to-many relationship between a person and his/her
> > many
> > addresses (also assuming it could be any number between 0 and 5,
> > right?), so adding 5 sets of columns to the User table doesn't seem
> > right...
> >
> > Honestly, the only reason I'm writing here is so someone can
> > correct me
> > if I'm wrong (and if I'm right then this might actually helping
> > somebody
> > -- neat-o ...).
> >
> > --
> > Allen Shaw
> > Polymer (http://polymerdb.org)
> >
> > _______________________________________________
> > New York PHP Talk Mailing List
> > AMP Technology
> > Supporting Apache, MySQL and PHP
> > http://lists.nyphp.org/mailman/listinfo/talk
> > http://www.nyphp.org
> >
> >
>
> _______________________________________________
> New York PHP Talk Mailing List
> AMP Technology
> Supporting Apache, MySQL and PHP
> http://lists.nyphp.org/mailman/listinfo/talk
> http://www.nyphp.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.nyphp.org/pipermail/talk/attachments/20051004/1605a50d/attachment.html>


More information about the talk mailing list