Goodbye, auto increment

You are going to see a lot more writing about non-relational databases from me in the near future, and I thought I’d start making the transition with a little ditty about auto increment columns in MySQL.

In short, they are bad. Not bad like mixing parachute pants with knitted sweaters, but bad like trying to eat a Volvo because you are hungry.

Auto increment, how much do I hate thee? Let me count the ways:

  1. Firstly they encourage lousy design. If you are going to use a relational database, you need to bother yourself with learning basic relational theory.
  2. They blow a hole in the push for a more semantic web. They are just numbers, and have no meaning.
  3. They make sharding really really hard. Hope that site of yours never gets very successful.

Put more bluntly, right now the only databases that use auto increment are MySQL and Microsoft Access. Yes, there is a reason, and it ain’t a good one.

First lets cover the ‘lousy design’ angle.

Don’t forget your keys, honey.

When you are building a table, you need a primary key.

(You’re still with me, right? Good.)

Now this key needs to be unique for every row. Maybe there is a user table in your website, and there is a column called username.


+----------+---------------------+
| id | username                  |
+----------+---------------------+
| 62 | admin                     |
| 63 | amy                       |
| 64 | charlie                   |
| 65 | hasselhoff                |
| 66 | dracula                   |
| 68 | caroline                  |
| 69 | peter                     |
+----------+---------------------+

That’s unique, right? Then why are you leveraging a number that has absolutely no meaning? In the good ol’ days when I had to walk to school 20 miles each way in the ice and snow, naked, disk space was expensive and RAM was even worse. Then this kind of approach would have been appreciated as a cost cutting move. Nowadays? Fuhgeddaboudit.

If you are building a relational database, you need to know what a relational database is in the first place. There are tons of free resources out there to help you along. Trust me, knowing how to take advantage of a relational database and actually using one is akin to getting a drivers license and actually driving. Otherwise, you’ll make no friends plowing into someone’s living room with their car.

The worst part about this approach is now whenever you insert a record into that table, you gotta make a second call to find out what the id was. Pretty sure it would be a lot more efficient to know when you inserted the row what the primary key was, now don’t you think?

Say what you mean.

Using the above example, when I am referring to a user on your website I am not going to say “hey man, did you help out user #41432995?” I’m going to ask you about the user “funkychicken” or “the dude with the MIT email address.” (For the record, anyone from MIT referring to themselves as funkychicken is okay in my book, ‘s all good with me.)

If you refer to a particular record by a column in that table, most likely that is your natural primary key. Why ignore that for an artificial one?

Slicing and dicing, gone horribly wrong.

When you start dealing with websites that got too big for their britches, the first point of major pain is almost always the database. Relational databases can be scaled, but it is hard, and can only go so far. Most big websites in this situation can either migrate to a different platform (more on that in subsequent posts) or chop their data into smaller chunks so it is easier to manage. This practice is called sharding.

With that out of the way, sharding is really really hard when all of your information is linked with arbitrary numbers with no meaning. Sharding approaches are unique for every shop and website, so instead of making a passionate argument here, I’m going to leave you to your own devices. Try it yourself, and within a couple hours you will be nodding in complete understanding (and agreement).

There’s always somebody who refuses to fit in.

That’s right, there are plenty of scenarios where auto increment is doing you a favor. It was actually meant to do that favor for you. They just never envisioned an entire database schema littered with nothing but random numbers as primary keys.

Comments, for example. I’ve seen databases where articles go in one table, and comments go in another. The comments are not necessarily unique, maybe except for the timestamp and email of the person making the comment. Instead of making a composite key and additional index overhead, auto increment makes sense here. Besides, you are always going to be fetching based on the article_id or some such, right?

But wait, it’s just not that easy.

“Your users example is no good for me though, as I have multiple records with the same username (one is an old disabled account, or whatever). This doesn’t make sense to have a composite key of username and status!”

You’re absolutely right. And that’s why I think you should consider a different datamodel. This one maybe makes sense if you are auto increment happy, and I’d say over-reliance on auto increment is what got you into this jam in the first place.

Try it – design a database from scratch, with absolutely zero auto increment columns. It is a good exercise, and I promise that even if the resulting database is worthless, you will have learned a few things.

Next up I’m going to talk about a couple different approaches that are leaving the realm of tomorrowland and starting to percolate in today’s consciousness.

Share and Enjoy:
  • Print
  • Digg
  • Sphinn
  • del.icio.us
  • Facebook
  • Mixx
  • Google Bookmarks
  • FriendFeed
  • LinkedIn
  • SphereIt
  • StumbleUpon
  • Technorati
  • TwitThis
  • Identi.ca
  • Tumblr
  • Twitter

About mitchy

Founder @joomla, CTO @Totsy, contributor at @mongodb and @lithium, technology yes man. Musician, father, youth soccer coach. I am heavy into social technologies and open source, so you can see me all over the place. I'm on Linked In, and you can find me on Facebook and a pile of other sites as well. Might add them all here, when I have more time to play with this site.
This entry was posted in Nerdy stuff and tagged , . Bookmark the permalink.

3 Responses to Goodbye, auto increment

  1. Andy Tarr says:

    The problem with using something with meaning is that you better be sure it won’t need to be changed and there are surprisingly few things that is true of. Changing a primary key, especially on a non-relational database, can be a real pain.

    I really like to be able to change user names, for instance. I had one user who got married and divorced 3 times in 7 years – and changed her name every time. Product ids, catalog numbers and even UPC numbers need to be able to change without creating a new record. Such a lovely feeling when you had to change a key piece of information like that and to find it wasn’t used as a key.

    Still, you are right not to use auto inc indiscriminately.

  2. mitchy says:

    It requires balance. My biggest beef with auto increment is that too many folks just think that is required for every table.

    In short, they don’t seem to be modeling data anymore. If you don’t want to bother yourself with the structure of your data, you should go schema-free, man! :-)

  3. Gary says:

    As you pointed out, disk space is cheap, even if you have no use for the auto increment field for a table today, it costs nothing to include it. Course it also costs nothing to put on a couple of timestamp fields for created and updated.

  4. [quote]
    Put more bluntly, right now the only databases that use auto increment are MySQL and Microsoft Access. Yes, there is a reason, and it ain’t a good one.
    [/quote]

    SQL Server has IDENTIY(), Postgres has sequences. 99% of the time these operate the same as autoincrement, with all the same problems. I doubt Oracle, Sybase and DB2 lack this feature by another other name. While popularity does not make this feature right.

    [quote]
    In the good ol’ days when I had to walk to school 20 miles each way in the ice and snow, naked, disk space was expensive and RAM was even worse. Then this kind of approach would have been appreciated as a cost cutting move. Nowadays? Fuhgeddaboudit.
    [/quote]

    DBs are one of the areas where this does not really apply. No matter how many gigs of music/video/word document I have I’m only using one at a time so there is a “fast enoiugh.” When I have a blog, an accounting system, or something else, I want a billion people to be able to access said blog. I want to be able to store a millions of articles and billions of comments in one database. I also want to serve them to hundres of people a second. From that point of view, keysize matters. Of course, if you want to be able to shard and have an artificial key, even a a 96bit OID or a 128 bit UUID is smaller than most email addresses.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>