
Microsoft Resolves Outlook.com 3-Day Outage


Microsoft has posted an update on its service status page indicating that it has resolved the recent outage affecting its cloud-based email service, Outlook.com. The incident is believed to have originated in a caching service that handles Exchange ActiveSync connections; its failure had a flow-on effect, sending unusually high traffic levels to the rest of Microsoft's services. To stabilize the service and begin restoration, Microsoft temporarily blocked Exchange ActiveSync access.

The full explanation from the company is below:

Update and Resolution of Recent Outlook.com Outage

We want to apologize to our customers who were affected by the outage on Outlook.com this week. We have restored access to all accounts and have made changes so that the service will be more resilient in the future. We realize that we have a responsibility to the customers who use our services to communicate and share with the people they care most about, and we apologize for letting those customers down this week. Our first priority is to the health of the services, and we will learn from this incident and work to improve the experience of all our customers. As part of that, we would also like to provide more detail about what happened.

This incident was a result of a failure in a caching service that interfaces with devices using Exchange ActiveSync, including most smart phones. The failure caused these devices to receive an error and continuously try to connect to our service. This resulted in a flood of traffic that our services did not handle properly, with the effect that some customers were unable to access their Outlook.com email and unable to share their SkyDrive files via email.

In order to stabilize the overall email service, we temporarily blocked access via Exchange ActiveSync. This allowed us to restore access to Outlook.com via the web and restore the sharing features of SkyDrive. These parts of the service were fully stabilized within a few hours of the initial incident. A significant backlog of Exchange ActiveSync requests accumulated as we worked to stabilize access. To avoid another flood of traffic, we needed to restore access to Exchange ActiveSync slowly, which meant that some customers remained impacted for a longer period of time.

We have learned from this incident, and have made two key changes to harden our systems against future failure – one that involved increasing network bandwidth in the affected part of the system, and one that involved changing the way error handling is done for devices using Exchange ActiveSync. We will continue to monitor the system and make additional changes as needed to keep the service healthy.

We are now fully through the backlog and have restored service so all customers should have normal access from all of their devices. We want to apologize to everyone who was affected by the outage, and we appreciate the patience you have shown us as we worked through the issues.
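
The statement describes devices that received an error and then retried continuously, which is the classic recipe for a retry storm. Microsoft does not say how it changed the error handling, so the following is only a generic sketch of one common mitigation, exponential backoff with random jitter, and not a description of the actual fix. The names `TransientError`, `backoff_delay` and `call_with_backoff` are hypothetical and exist only for this illustration.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable server error."""


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Return a random wait in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def call_with_backoff(operation, max_attempts=5, base=1.0, cap=60.0,
                      sleep=time.sleep, rng=random):
    """Call operation(), retrying on TransientError with growing, randomized waits."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # give up instead of retrying forever
            sleep(backoff_delay(attempt, base, cap, rng))
```

The upper bound on the wait doubles after each failure, and the random jitter spreads clients out so that many devices hitting the same error do not all reconnect at the same instant. The cap on attempts means a client eventually stops rather than hammering a struggling service indefinitely.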

 
