Following up after last night's incident involving an increased number of 403 errors, we now have a little more information.
It appears we first saw some elevated error levels early on July 29th. Some customers reported these, but they were not substantial enough to trigger our monitoring alerts.
Despite some initial changes and improvements, these intermittent errors began to escalate again. Still, despite impacting customers, they didn't hit the threshold for our monitoring until 01:30 UTC on July 30th, at which point our out-of-hours team was alerted. This delay in alerting left several customers frustrated without a response. I'm sorry for this.
In response to this alert, we made several configuration changes to our firewall. We also discussed with our service providers why genuine traffic was flagged as dangerous and blocked.
These configuration changes improved the situation.
This morning we have made further configuration changes to the firewall. We have also adjusted the thresholds on our monitoring to give us better early warning should this issue come back at any point.
We are also continuing to work with our Edge and WAF providers to understand better why things went wrong and what improvements we can make.
Our monitoring has been good for a while now, so I believe the configuration changes we made to our firewall have fixed the issue with some genuine requests being blocked.
We are sorry to those of you who were affected.
We'll now conduct a proper review of what happened and look to make changes to our platform to minimize the risks of this happening again.
Thanks again to everyone for your understanding.
Monitoring is looking much happier after the recent configuration changes; we've not seen any blocked requests in the last 25 minutes.
I'll continue to watch things, but they're certainly looking more reliable right now.
We've made some configuration changes to our firewall, which should help with the over-sensitivity and prevent any genuine requests from being blocked.
We're watching the data to see if this has a positive effect.
We are still seeing a few 403 errors, though seemingly far fewer than before.
We're still working to understand the root of the issue, and I hope to have more positive progress and more information for you shortly.
It appears that this issue may be caused by our firewall incorrectly blocking some genuine customer requests.
Our monitoring suggests there haven't been any failures of this kind for 10+ minutes now, but we're continuing to work with our infrastructure provider to understand the problem.
Thanks to everyone for your understanding. I hope to have more information for you soon.
Some customers in specific locations are currently seeing some requests fail with an HTTP 403 error code. We're looking closely now at what might be causing this.
I'll update you as soon as we know more.