Outage Postmortem - July 26th, 2017

Overview

On July 26th, 2017, some clients may have experienced an outage starting at 11:33am (EDT).

The outage lasted approximately one hour and was almost completely resolved by 12:40pm (EDT). By 1:17pm (EDT), we had identified the root cause and begun taking action to prevent a recurrence.

The root cause was an unintentional synchronisation event that affected our Varnish servers in the EU, US and Australia. A change made the day before, on July 25th, triggered the event; however, the synchronisation didn't take place until our nginx configuration was reloaded on July 26th.

Follow Up

In response to this outage, we've reviewed and updated our internal procedures to prevent this issue from recurring, added a further backup of the nginx configuration on our Varnish servers, and added more staff to our platform monitoring notifications. Our infrastructure team already monitors our platform 24/7 and will continue to do so, and we will do everything we can to learn from every incident so that it doesn't happen again.

We sincerely apologise for the downtime this caused. We understand how important it is that your website is available 100% of the time, and we will continue to aim for this. You can always monitor our platform status by visiting our Status page. If you do notice any problems with your website, you can use our live chat service ("Chat with us!" at the bottom right of this page) to alert our Support team immediately, and we will investigate the problem.
