This is coming on the heals of yesterday’s post: the woeful configuration table that has too many rows.

Any time you need to configure something you need to make sure that at the end of the day things are configured as they should be. When you have a table with well over 70,000 rows in it, the chance that all of them are correct quickly approach zero.

Let’s examine how things can grow out of hand like that.

First off, lets make a config table for product line for instance. Maybe the color scheme for each product.

Product Line Color
Vacuum Cleaners Blue
Computers Red
Radios Black
Phones Yellow
Camping Equipment Fuchsia
Tea Puce

This is all good and a normal, non-OCD, human can go in and manage this table just fine. It’s something that works. (Also imagine another 100 product lines in there. I’m lazy enough to not want to make something that big) Additionally let’s assume we have two or three more columns of configuration on each of these rows.

Now the business comes to you and says “this works fine except for customers coming from Apple really should be aquamarine.”

Now what?

Two solutions come to mind:

  • Make a special case in the code for Apple’s source (42 in this case)
  • Add a column to the table for source and multiply the number of rows (product lines) by the number of sources.

The first one sucks, the second one is is horrific. Lets go over each one in turn.

Adding a special case with an ID in code is bad. It makes something which is a database abstraction and moves a hidden dependency into the code itself. Now “42” is magic. (Even more magical than before) Just by looking at the configuration table you’d have no idea that Apple’s traffic is yellow. Moreover the code is now tied to the database as well; what if the test and production databases are different for some reason? Even worse the precedent has been set that this is the way to do this type of thing. The next novice coder that comes along will probably do nothing except copy it and make “105” just as magical later on. You can see this going down an slippery slope.

The second choice might seem like the better option at first glance but it’s evilness runs even deeper. We have a hundred product lines and a thousand sources. Now we have 100,000 rows in the configuration table. OMG! This might work in the short term when all we do is script out a change… but in the long term this will fall on its face even harder than the first option since this creates so many rows of data that you can’t maintain that many with any reliability.

Given just these two options, the first option is probably the better one. But it’s still pretty sucky.

Really, what you want is some variety of decision system that won’t automate the process, but allow for the flexibility to come up with the right answers. The key is to make sure that you have the minimum amount of data to encode the required information. What you want is to make sure that you don’t store any more redundant data than is absolutely necessary.

Not having a system like that, I’d make my own language to support it in some way.

Based on this if I didn’t have the luxury of a decision system or to make something myself I’d hold my nose and do number one. Sucks, eh?