[Ops] Disabling pages during maintenance

We have had a lot of pages lately, and at the current noise level it’s hard to properly track all of them, so it’s important to reduce that noise. On one side, the noisiest genuine alerts are being addressed (the main one currently being the forums issue) – but on the other side, it would be good to also do something about one of the most frequent false positives: maintenance operations.

Maintenance operations often cause downtime. In these cases, we currently let the alert fire, comment on the Mattermost thread, and (hopefully) close the alert. Could we instead prevent the alert from being generated at all? If the downtime is expected, it isn’t actually something to page about.

I haven’t tested it, but there now seems to be a supported feature for scheduling one-time or recurring monitor downtimes:

https://docs.newrelic.com/docs/synthetics/synthetic-monitoring/using-monitors/monitor-downtimes-disable-monitoring-during-scheduled-maintenance-times

Could we start using that when performing a maintenance operation?
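
To make the idea concrete, here is a minimal sketch of how a one-time downtime window could be described before a maintenance operation. It does not call New Relic: the function names and the dictionary keys (`name`, `monitor_ids`, `start_time`, `end_time`) are hypothetical placeholders, and the real field names and API calls would need to be taken from the documentation linked above.

```python
from datetime import datetime, timedelta, timezone


def build_downtime_window(name, monitor_ids, start, duration_minutes):
    """Describe a one-time downtime window.

    The keys of the returned dict are hypothetical, not New Relic's schema.
    """
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    if duration_minutes <= 0:
        raise ValueError("duration_minutes must be positive")
    end = start + timedelta(minutes=duration_minutes)
    return {
        "name": name,
        "monitor_ids": list(monitor_ids),
        # Normalise to UTC so the window is unambiguous across timezones.
        "start_time": start.astimezone(timezone.utc).isoformat(),
        "end_time": end.astimezone(timezone.utc).isoformat(),
    }


def is_muted(window, now):
    """Return True if `now` falls inside the window (end is exclusive)."""
    start = datetime.fromisoformat(window["start_time"])
    end = datetime.fromisoformat(window["end_time"])
    return start <= now < end
```

Something like `build_downtime_window("db upgrade", ["monitor-1"], start, 30)` would then be created just before the operation starts, so that expected downtime inside the window does not page anyone.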

[ Ticket to log time discussing & testing this ]

@antoviaque, a while ago I wrote some documentation for doing this here, under the “Silencing OpsGenie/New Relic alerts for known issues” section. I follow it whenever something I am about to work on is going to raise an alert.

Can we check and update it, and move it to a better location if needed?

@guruprasad Ah, that’s great! Thanks for having done this already :+1: Yes, it would make sense to double-check that it’s up to date – and maybe move the whole page to our public documentation, since we shouldn’t have any actual secrets in there? That would also make it more visible and discoverable.