
[nycphp-talk] data modelling vs. db design (was: ER Diagram tool for MySQL/OS X)

Phil Duffy phil at bearingasset.com
Tue Oct 11 09:05:20 EDT 2005


> -----Original Message-----
> From: talk-bounces at lists.nyphp.org [mailto:talk-bounces at lists.nyphp.org]
> On Behalf Of Hans Zaunere
> Sent: Monday, October 10, 2005 7:09 PM
> To: 'NYPHP Talk'
> Subject: Re: [nycphp-talk] data modelling vs. db design (was: ER Diagram
> tool for MySQL/OS X)
> Normalization is often one of the most misunderstood aspects of relational
> databases.  Of course, it's what makes a relational database relational, and
> we're all taught that you're not worth a cent unless you only work in the
> 5th normal form.
> 
> Well, perhaps if you're a mathematician, then that's correct.  The reality
> is that over-normalization can kill a project just as much as
> under-normalization.  Don't forget - at least from the old SQL books I
> learned from - why normalization was being preached.  "Don't have duplicate
> data!" was often the mantra.  If you had more than one guy named "John",
> then you needed a table of first names, with a lookup table relating first
> names to the other data you're storing.  Seems ridiculous, right?
> 
> The practice of heavy normalization was based on old, bygone concerns,
> namely "storage is expensive."  Add in academic chest-beating, and pretty
> soon you have a table for every letter of the alphabet and look-up tables
> to match.
> 
> As I think we generally find, with the abundance of hardware resources,
> normalization is reduced to only an element of good design.  Sure,
> performance can still be a factor, but let's consider the performance of an
> entire system, and not just what's happening in the database.

<snip>

Hans,

While I agree with your general message, I have a reservation about the
"storage is expensive" example.  Having been around since before Boole and
Babbage, I can't recall developers in the "old days" being slavish about
"storage is expensive".  We sought opportunities to reduce storage when that
was practical, but only in very specialized applications did we go out of
our way to conserve disk space.  The primary issues were data consistency
and software maintainability.  Lack of normalization created situations that
encouraged users to create inconsistent data.  For example, consider a
pre-relational system in healthcare that recognized two 'pseudo-entities',
physicians and patients (pseudo in the sense that we now recognize these as
roles rather than fundamental entities).  Each pseudo-entity had its own
basic record to hold the address, telephone number and other attributes.
That worked as long as the physician never became a patient.  Once the same
person appeared in both records, however, a change to one record but not the
other left the two copies of the changed attribute, e.g., the address,
inconsistent.
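
As a rough sketch of how normalization resolves this, with hypothetical
table and column names rather than anything from an actual system, the
shared attributes move into a single person record and the roles become
separate tables that reference it:

  -- One row per real-world person; the address lives in exactly one place.
  CREATE TABLE person (
      person_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
      name      VARCHAR(100) NOT NULL,
      address   VARCHAR(200),
      phone     VARCHAR(30)
  ) ENGINE=InnoDB;

  -- Roles reference the person instead of duplicating the attributes.
  CREATE TABLE physician (
      person_id      INT NOT NULL PRIMARY KEY,
      license_number VARCHAR(30),
      FOREIGN KEY (person_id) REFERENCES person (person_id)
  ) ENGINE=InnoDB;

  CREATE TABLE patient (
      person_id    INT NOT NULL PRIMARY KEY,
      insurance_id VARCHAR(30),
      FOREIGN KEY (person_id) REFERENCES person (person_id)
  ) ENGINE=InnoDB;

A physician who becomes a patient gets a second role row, never a second
copy of the address, so a single UPDATE on person keeps both roles
consistent when he moves.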

The second issue was software maintenance.  With two parallel record types,
there were situations where one record's structure might be modified while
the other was not.
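
The drift is easy to picture: with two structurally parallel record types, a
change applied to one can quietly be missed on the other (hypothetical names
again):

  CREATE TABLE physician_record (
      id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
      name    VARCHAR(100),
      address VARCHAR(200)
  );

  CREATE TABLE patient_record (
      id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
      name    VARCHAR(100),
      address VARCHAR(200)
  );

  -- A later change lands on one record type but not its twin, and every
  -- program that handles both must now special-case the difference.
  ALTER TABLE physician_record ADD COLUMN email VARCHAR(100);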

I believe there was a natural progression from database normalization to
object-oriented design, because the latter encouraged all interactions with
the database to be performed in a single module.  I know of at least one
situation in healthcare where the absence of that discipline in the existing
systems nearly killed a patient.

But your principal point is that database design (and, in my opinion, object
design by extension) is still an art form.  Both database and object design
should treat the 'rules' as significant guidelines: understand when to
bypass them, and document the reasons (conditions may change in the future,
and others will probably have to clean up the situation).

I liked your example of normalizing the string 'John', which underlines the
absurdity of over-normalization.  For me, appropriate normalization begins
with recognizing the fundamental entities in a system and distinguishing
entities modeled on the real world (e.g., a Person) from system-internal
entities (e.g., a web page).  If I come to a branch in the road where
normalization and de-normalization both offer advantages, I am most likely
to take the normalization branch and make a mental note.  If I later
discover that this creates a significant performance problem that can't
otherwise be corrected, I de-normalize that area of the system intentionally
(documenting my reasons and the implementation).
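
To make the absurdity concrete, here is roughly what a literal reading of
"don't have duplicate data" produces, with hypothetical names once more:

  -- Over-normalized: a lookup table just so 'John' is stored only once.
  CREATE TABLE first_name (
      first_name_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
      name          VARCHAR(50) NOT NULL UNIQUE
  ) ENGINE=InnoDB;

  CREATE TABLE customer (
      customer_id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
      first_name_id INT NOT NULL,
      FOREIGN KEY (first_name_id) REFERENCES first_name (first_name_id)
  ) ENGINE=InnoDB;

  -- Now even printing a customer's name costs a join.
  SELECT f.name
    FROM customer c
    JOIN first_name f ON f.first_name_id = c.first_name_id;

Repeated values are not repeated facts: a first name is an attribute, not an
entity, so the sane design simply keeps a first_name VARCHAR column inline
on customer.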

It would be interesting to see if others have approached normalization
guidelines differently.  I won't claim the above is the only way to do
database design - it is still an art form after all.

Phil
