NYCPHP Meetup

[nycphp-talk] talk Digest, Vol 41, Issue 5

Gary Mort garyamort at gmail.com
Tue Mar 9 00:41:47 EST 2010


On Mon, Mar 8, 2010 at 8:33 PM, Mitch Pirtle <mitch.pirtle at gmail.com> wrote:

> On Mon, Mar 8, 2010 at 6:34 PM, Gary Mort <garyamort at gmail.com> wrote:
> > I think in the end there is a range of practices, and while special
> > numbers can and do turn around and bite you in the butt eventually [what
> > happens when you run out of them?] - they're good enough for /most/
> > people.  And those who do get bitten by the issue are hopefully making
> > enough money hand over fist to hire programmers from the NYPHP members
> > list. :-)
>
> Yup, and that line of thought is exactly why we've suffered under the
> loathsome burden of ItemID all these years.
>


I think it's more that itemID, ID, etc have stood the test of time as the
best solution.

In the beginning, it was a simple matter of what is the FASTEST comparison
one can make - and an integer comparison is pretty dang close to the
fastest [bitwise comparisons being even better] - and for understandability,
integers are easier to deal with.

It allowed one to store the link between two rows in the smallest possible
value.

As processor speed increases and disk space becomes dirt cheap, it is no
longer necessary to rely on these keys just to save space and processing
power.  But by the same token, it costs virtually nothing more to store an
integer ID AND data that actually has semantic value [am I using that term
correctly?].

For example, I can have a user table:
Userid, Username, Firstname, Lastname, EmailAddress, etc

And then a forum posting table, and for each post in the old days, I would
store userid.....  Using a combo of username, firstname, lastname,
emailAddress as my key might seem more intuitive.... but why?  We know one
of those is going to change over the lifetime of a system.  If I want to
store that data with the post to make things faster...well, I can go ahead
and do that AND store the userid.  And then I can decide which takes
precedence...  For example, maybe I want to show the username of the person
at the time they posted, but if you go to their profile I will use the id to
link to it, and show their current name.
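
A minimal sketch of that last idea, using Python's sqlite3 module.  The
table layout, the posted_username column and the sample names are all made
up for illustration - the point is only that the post carries both the
integer userid and a snapshot of the name:

```python
import sqlite3

# Hypothetical schema: posts keep the integer userid AND a copy of the
# username as it was when the post was written.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        userid   INTEGER PRIMARY KEY,
        username TEXT NOT NULL
    );
    CREATE TABLE posts (
        postid          INTEGER PRIMARY KEY,
        userid          INTEGER NOT NULL REFERENCES users(userid),
        posted_username TEXT NOT NULL,
        body            TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO users (userid, username) VALUES (1, 'jsmith')")
conn.execute(
    "INSERT INTO posts (userid, posted_username, body) "
    "VALUES (1, 'jsmith', 'First post')"
)

# The user later changes their name: one row in one table is updated.
conn.execute("UPDATE users SET username = 'jdoe' WHERE userid = 1")

# The post still shows the name at posting time; the id gives the current one.
row = conn.execute("""
    SELECT p.posted_username, u.username
    FROM posts p JOIN users u ON u.userid = p.userid
    WHERE p.postid = 1
""").fetchone()
print(row)
```

Which of the two names wins on any given page is then a display decision,
not a data migration.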

While you can argue that it costs nothing to use the other data for a key,
thanks to increased processing power and cheap disk space...I would argue
it costs nothing to include that other data AND a key field.

Furthermore, no matter how hard one tries, the identifying link is going to
be shown to a user in some manner at some point...in the url, in a profile
page, in a forum posting, SOMEWHERE there will be the identifier shown to
them.

And sure as men suck, there is going to be some woman getting out of a bad
relationship, who has changed her name after the divorce and she is going to
be really really pissed to have that reminder staring her in the face.  And
then the programmer has to go and track down some way to change the data not
only currently, but retroactively...

And then something crashes...and you restore the backup tapes...from 6
months ago because no one was checking the recovery process.  And you have
to figure out how to match up all the data you retconned on the live system
that you now recovered against the original data from 6 months ago.

All data changes; integer key data is the only way to keep things simple [I
love the 'internet' solution for XML, where you namespace things based on
your domain....and then 3 years later the company has changed hands, the
domain went to someone else...and you have the domain of some porn site in
your code...yick]

And by the same token that you can store more textual data in a row, you can
also store larger integers so you can never run out of them.
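
As a rough back-of-the-envelope check, assuming a signed 64-bit integer
column (what many databases call BIGINT) and a deliberately absurd,
hypothetical rate of one million new ids every second:

```python
# Largest value a signed 64-bit integer can hold.
max_signed_64 = 2**63 - 1

# Hypothetical allocation rate, chosen to be far beyond a typical site.
ids_per_second = 1_000_000
seconds_per_year = 365 * 24 * 60 * 60

# Whole years before the id space would be used up at that rate.
years_to_exhaust = max_signed_64 // (ids_per_second * seconds_per_year)
print(max_signed_64, years_to_exhaust)
```

Even at that rate the ids last for hundreds of thousands of years, so
running out stops being a practical worry once the column is wide enough.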


Then there are spreadsheets...  spreadsheets never ever die....  and
business people love them.  So try explaining to them why they have to have
a key identifier that is 300 characters long in order to match spreadsheet A
to spreadsheet B.
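
To make the spreadsheet case concrete, here is a small sketch that matches
two CSV exports on a short integer id.  The column names and the sample
rows are invented for the example; note that the last name for user 1
differs between the two sheets and the match still works:

```python
import csv
import io

# Two hypothetical spreadsheet exports that share only the userid column.
sheet_a = io.StringIO("userid,lastname\n1,Smith\n2,Jones\n")
sheet_b = io.StringIO("userid,lastname,balance\n1,Doe,10.00\n2,Jones,25.50\n")

# Index sheet B by id, then look up each row of sheet A.
balances = {row["userid"]: row["balance"] for row in csv.DictReader(sheet_b)}
matched = [
    (row["userid"], row["lastname"], balances.get(row["userid"]))
    for row in csv.DictReader(sheet_a)
]
print(matched)
```

Had the sheets been keyed on lastname, user 1 would have fallen out of the
match the moment the name changed in one of them.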

No, I see no reason to kill the humble ID.  Supplement it with extra data,
sure.  You can normalize your data to an extreme and still duplicate it in
every table if you wish.

Dealing with IDs can be painful from time to time for developers and admins
- but dealing with semantic data as a key is painful from time to time for
users.  And I am firmly in the camp of choosing to make life harder for a
few geeks who can handle it - rather than for the end users who just won't
get why it's not a trivial matter to change their last name on every report
ever filed in the system because they fell in love and got married...

-Gary

:-)