I wandered into a discussion yesterday and made an interesting observation: padded estimates from two different people can clash in surprising ways.
This is about a project that’s going to our production site soon. Nothing earth-shattering or novel, just a typical project that could add some significant load to our database. The load here is write-heavy logging for the most part.
Taking a step back for a second, let me describe our production environment. It’s typical for a large OLTP e-commerce site: a clustered pair of servers to ensure reliability in the face of failure. It’s some impressive hardware, but the configuration is pretty much by-the-book. Since this is a heavily used database that’s business critical (we stop making money if it’s down), we don’t want to screw it up. Adding some load probably won’t do anything bad, but I don’t blame our DBA for raising his hand. He’s the one that needs to sleep at night knowing things are working as they should.
Plan B: take a spare server (it’s a beefy machine that’s between jobs) and log to it instead.
Now the thing that should jump out at you is that I wrote “server.” It’s not plural. The obvious next question is what happens if it fails? (Well, at least the main production site will still be just fine!)
Here’s where it gets more interesting.
Business Owner: If the server goes down, how long will it be until it’s back up?
DBA: It could take up to four days.
BO: Then this project wouldn’t work right.
As we continued talking, a few things came up:
- We have a 4-hour turnaround time on server maintenance from our vendor.
- If we stop the logging we just become slightly non-optimal. We can run with old data for a while.
In theory, the database could be down for a few days if things really blew up, but that’s very unlikely. The project is to optimize something based on performance data, so if we stop logging we stop optimizing. But things shouldn’t swing wildly: just using the most recent available data is good enough for a while (assuming it doesn’t die on day one with no data at all).
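The fallback described above can be sketched as a simple pattern: keep working from the last snapshot of data you successfully loaded, so a logging-server outage degrades results gradually instead of breaking anything. This is an illustrative sketch, not the actual system; the names here (`StatsCache`, the `fetch` callable) are hypothetical.

```python
import time


class StatsCache:
    """Hold the most recent stats snapshot we managed to load."""

    def __init__(self):
        self.snapshot = None   # last successfully loaded stats
        self.loaded_at = None  # when we loaded them

    def refresh(self, fetch):
        """Try to pull fresh stats; on failure, keep the old snapshot.

        `fetch` is any callable that returns new stats or raises
        ConnectionError when the logging server is unreachable.
        """
        try:
            self.snapshot = fetch()
            self.loaded_at = time.time()
        except ConnectionError:
            # Logging server is down: stale data is merely
            # "slightly non-optimal", so carry on with what we have.
            pass
        return self.snapshot
```

With a design like this, the optimizer never needs the logging server to be up right now; it only needs the server to have been up at some point, which is exactly why a four-day outage stops being a project-killer.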
Remember, though, that there were no lies in that conversation. Not even exaggerations. Just blunt worst-case assessments, with a bit of CYA mixed in too. The worst-case scenario is bloody unlikely in this case, and basing everything around it just makes things expensive. It does give you something defensible to tell your boss, though.
What I learned is how worst-case thinking can multiply and spiral out of control. Normally you see that play out over weeks and countless meetings; this was boiled down to its essence. Plenty of common ground typically exists between the extremes that get explored first. The trick is to find it before you clam up and stand your ground.
The win here is that everything did work out in the end, but it could have gotten expensive if everything really had to be handled at the utmost level.