NYCPHP Meetup

NYPHP.org

[nycphp-talk] data modelling vs. db design (was: ER Diagram tool for MySQL/OS X)

Hans Zaunere lists at zaunere.com
Mon Oct 10 20:09:19 EDT 2005



Allen Shaw wrote on Monday, October 03, 2005 11:39 PM:
> CED wrote:
> 
> > http://www.databaseanswers.com/modelling_tools.htm
> 
> That's a great list of tools. Something else very interesting to me is
> this note at the end of the page:
> 
> > *A short note about Data Modelling and Database Design ...*
> > 
> > Data modelling and database design are two very different activities.
> > 
> > For data modelling, the question you are asking is:
> > 1) What does the world being modelled look like?
> >   In particular, you are looking for similarities between things.
> >   Then you identify a 'super-type' of thing which may have sub-types.
> >   For example, Corporate Customers and Personal Customers.
> > 
> >   If, for example, supplier contacts are conceptually different things
> >  from customer contacts, then the answer is that they should be
> >  modelled separately. On the other hand, if they are merely sub-sets of
> > the same thing, then treat them as the same thing. 
> > 
> > 2) For database design, you are answering a different question:
> >   how can I efficiently design a database that will support the
> >   functions of the proposed application or Web Site. The key task here is
> >   to identify similarities between entities so that you can integrate
> >   them into the same table, usually with a 'Type' indicator. For
> >   example, a Customer table, which combines all attributes of both
> >   Corporate and Personal Customers. As a result, it is possible to
> > spend a great deal of time breaking things out when creating a Data
> > Model, and then collapsing them back together when designing the
> > corresponding database.   
> > 
> What interests me here is the writer's understanding that when I'm
> programming an actual application it's in my best interest to
> de-normalize the data model, aiming for fewer tables than my data model
> would have indicated.  I'm assuming this person has good reason for what
> he's saying, but it's something I've never thought of or heard before.
> 
> Can anyone suggest reasons why it's better to have fewer tables, or to
> try and combine different types of objects into the same table?  Surely
> it's something to do with better performance (fewer joins, simpler
> queries), right?  Is this a common principle of design that I haven't
> learned yet?
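
The collapse that note describes can be sketched concretely.  Below is a
minimal, hypothetical illustration (table and column names are invented, not
from the original note): Corporate and Personal Customers, separate things in
the data model, share one table in the database design, distinguished by a
'Type' indicator column.

```python
import sqlite3

# Hypothetical sketch of the "collapsed" design: one customer table holding
# both sub-types, with a type indicator instead of two separate tables.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_type TEXT NOT NULL
                      CHECK (customer_type IN ('corporate', 'personal')),
        name          TEXT NOT NULL,      -- company name or person's name
        tax_id        TEXT,               -- used by corporate customers
        date_of_birth TEXT                -- used by personal customers
    )
""")
conn.execute("INSERT INTO customer VALUES "
             "(1, 'corporate', 'Acme Corp', '12-3456789', NULL)")
conn.execute("INSERT INTO customer VALUES "
             "(2, 'personal',  'Jane Doe',  NULL, '1970-01-01')")

# One query serves both sub-types; the application branches on the indicator.
for row in conn.execute(
        "SELECT customer_type, name FROM customer ORDER BY customer_id"):
    print(row)
```

The trade-off is visible even in this toy: fewer tables and simpler queries,
at the cost of columns that are NULL for one sub-type or the other.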

Normalization is often one of the most misunderstood aspects of relational
databases.  Of course, it's what makes a relational database relational, and
we're all taught that you're not worth a cent unless you only work in
fifth normal form.

Well, perhaps if you're a mathematician, then that's correct.  The reality
is that over-normalization can kill a project just as much as
under-normalization.  Don't forget - at least from the old SQL books I
learned from - why normalization was being preached.  "Don't have duplicate
data!" was often the mantra.  If you had more than one guy named "John",
then you needed a table of first names, with a lookup table relating
first names to the other data you're storing.  Seems ridiculous, right?
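
Taken to that extreme, the mantra produces schemas like this deliberately
silly sketch (table and column names invented here for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The "no duplicate strings anywhere" extreme: a lookup table just for
# first names, and a join needed to answer even the simplest question.
conn.executescript("""
    CREATE TABLE first_name (
        name_id INTEGER PRIMARY KEY,
        name    TEXT UNIQUE
    );
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name_id   INTEGER REFERENCES first_name
    );
    INSERT INTO first_name VALUES (1, 'John');
    INSERT INTO person VALUES (1, 1), (2, 1);   -- two guys named John
""")
# Listing people now requires a join where a plain column would do.
names = [r[0] for r in conn.execute(
    "SELECT fn.name FROM person p "
    "JOIN first_name fn ON fn.name_id = p.name_id")]
print(names)  # → ['John', 'John']
```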

The practice of heavy normalization was based on old, bygone concerns -
namely, "storage is expensive."  Add in academic chest-beating, and pretty
soon you have a table for every letter of the alphabet and look-up tables to
match.

As I think we generally find, with the abundance of hardware resources,
normalization is reduced to just one element of good design.  Sure,
performance can still be a factor, but let's consider the performance of the
entire system, and not just what's happening in the database.

Even setting aside the overhead involved in joining tables (and thus
resolving disparate blocks across the drive(s)), there's overhead in the
application, and in a big way.  First is the cost of dealing with duplicate
rows - you know, the throwing out of wasted rows from a one-to-many
relationship.  That not only incurs network traffic - which is often the
slowest part of a modern internet system - it also incurs processing and
memory waste on the application server.
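
To make that duplicate-row cost concrete, here's a small sketch (the
customer/orders tables are invented for this example): every order row drags
a full copy of the customer's columns across the wire, which the application
then has to collapse back down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL
    );
    INSERT INTO customer VALUES (1, 'John', '123 Main St');
    INSERT INTO orders VALUES (101, 1, 9.99), (102, 1, 24.50), (103, 1, 5.00);
""")
# One customer, three orders: the join ships name and address three times.
rows = conn.execute("""
    SELECT c.name, c.address, o.order_id, o.total
    FROM customer c JOIN orders o ON o.customer_id = c.customer_id
""").fetchall()
print(len(rows))  # 3 rows, each repeating the same customer columns

# The application then spends cycles and memory collapsing the
# repeated customer data back down to the one customer it represents.
customers = {(name, address) for name, address, _, _ in rows}
print(len(customers))  # back to 1
```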

And what about the overhead of development itself?  Large complex queries,
sub-queries, joins, and SQL query management in any language can make for
unmaintainable and tedious code.

So it's a balancing act - as always.  The degree of normalization is
generally more of an art than a science, and the "correct" amount is
different for every project, and for every database architect.  Get three
database architects, and you'll get four recommendations for the correct
schema.


---
Hans Zaunere / President / New York PHP
   www.nyphp.org  /  www.nyphp.com





