We’re working on a service… and as I wrote in the prologue to the specs for the dashboard:

Launching a product or a system that you cannot monitor is a nonstarter. Adding monitoring to an existing system is far harder than designing it in from the beginning.


I was working on adding some alerts to the system today as well. So I fire up a light load test to get some metrics flowing and start working on stuff.

And it’s not working right. I’m getting errors when I’m not expecting them.

I get an email from an alert I had set up.

What’s up?

Oh… it turns out I’m monitoring, in real time, an outage of the third party we’re talking to.

They host on Heroku. And it just happens that they managed to have an incident right as I was testing.

I guess the monitoring works. :-)