We’re working on a service… and as I wrote in the prologue to the specs for the dashboard:
"Launching a product or a system that you cannot monitor is a nonstarter. Adding monitoring to an existing system is far harder than designing it in from the beginning." -me
I was working on adding some alerts to the system today as well. So I fire up a light load test to get some metrics flowing and start working on stuff.
And it’s not working right. I’m getting errors when I’m not expecting them.
I get an email from an alert I had set up.
Oh… it turns out I’m monitoring, in real time, an outage of the third party we’re talking to.
They host on Heroku. And it just so happens that they managed to have an incident right as I was testing.
I guess the monitoring works. :-)