As an Azure developer, you likely followed the news of the recent AWS outage and were relieved that it happened to Amazon and not to your Windows Azure instances. Even so, there are lessons to learn from this landmark event in cloud computing history.
The reality is that failure can happen with any cloud offering. Just because someone else manages the physical machines does not mean that designing for failure and planning for disaster recovery are optional steps in delivering a solution. Chances are good that most deployments to Windows Azure would not survive a similar event. Let’s look at why your deployments may be at risk and what you can do to safeguard them.
When you set up your hosted services and storage accounts, current guidance is to place them together in an affinity group. This keeps the hosted services and the storage accounts in the same physical location. If you use SQL Azure, you will likely configure that server to be in the same location as the hosted services that consume its data. This guidance is unlikely to change, given the performance and cost advantages of staying in the same data center. Ah, now you see the problem. Just like the many services that ran solely in the Northern Virginia data center, your deployment now sits in a single data center.
Performance Advantage of close proximity
There are significant performance benefits to having your data in the same location as your compute instances. Your application would suffer a significant latency penalty with its web role in North Central US and its data in North Europe: this cross-region hop adds at least 250ms to every call that fetches data. Contrast this with two machines on the same rack connected by a gigabit link, where a round trip takes well under a millisecond. Hosting your data far from your hosted services could easily cause an unacceptable level of performance degradation. I have worked with clients where any database call that took more than 100ms in testing led to uncomfortable meetings with the resident DBA. The cross-region latency alone is 2.5 times that threshold, before any processing on the data side. The takeaway is by no means a new concept, nor is it unique to Windows Azure: keep your data close and performance will improve.
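To make the arithmetic concrete, here is a small back-of-the-envelope calculation. It takes the 250ms cross-region figure above and assumes, for illustration, about 0.5ms for a same-rack call; it also assumes the page makes its data calls one after another rather than in parallel. The function name `added_latency_ms` is purely illustrative.

```python
# Back-of-the-envelope latency budget for sequential data calls.
# Both per-call figures are assumptions for illustration only.
CROSS_REGION_MS = 250.0   # web role in North Central US, data in North Europe
SAME_RACK_MS = 0.5        # two machines on the same rack over gigabit Ethernet
DBA_THRESHOLD_MS = 100.0  # the "uncomfortable meeting" threshold from testing


def added_latency_ms(sequential_calls: int, per_call_ms: float) -> float:
    """Network latency added when data calls are made one after another."""
    return sequential_calls * per_call_ms


if __name__ == "__main__":
    for calls in (1, 4, 10):
        remote = added_latency_ms(calls, CROSS_REGION_MS)
        local = added_latency_ms(calls, SAME_RACK_MS)
        print(f"{calls:>2} calls: {remote:7.1f} ms cross-region, {local:5.1f} ms same rack")
    # How far a single cross-region call overshoots the DBA's threshold.
    print(f"ratio: {CROSS_REGION_MS / DBA_THRESHOLD_MS:.1f}x")
```

With four sequential calls, the cross-region deployment spends a full second on network latency alone.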
Cost Advantage of close proximity
Under the Windows Azure pricing model, bandwidth between a web role and storage in the same data center is not charged. In the previous example, where the web role is in North Central US and the data is in North Europe, all data transferred between the two is billed. With pay-per-use computing, you must keep the ongoing cost of a system in mind when designing a solution. The takeaway: keep your data in the same data center and your charges can be significantly reduced.
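The same reasoning can be turned into a rough monthly estimate. The sketch below uses hypothetical helpers, not a billing calculator: `RATE_PER_GB` is a placeholder to be replaced with the current rate from the Windows Azure pricing page, and the request volume and payload size in the usage lines are made-up inputs.

```python
# Rough monthly bandwidth cost for traffic between compute and storage.
# RATE_PER_GB is a placeholder, not a quoted Windows Azure price.
RATE_PER_GB = 0.15  # USD per GB transferred (assumed)
KB_PER_GB = 1024 * 1024


def monthly_transfer_gb(requests_per_day: int, kb_per_request: float, days: int = 30) -> float:
    """Total data moved between the web role and its data over the period, in GB."""
    return requests_per_day * kb_per_request * days / KB_PER_GB


def monthly_bandwidth_cost(gb: float, same_data_center: bool, rate: float = RATE_PER_GB) -> float:
    """Same-data-center traffic is not charged; cross-region traffic is billed per GB."""
    return 0.0 if same_data_center else gb * rate


if __name__ == "__main__":
    # Hypothetical workload: 500,000 requests a day, 40 KB of data per request.
    gb = monthly_transfer_gb(requests_per_day=500_000, kb_per_request=40)
    print(f"{gb:,.0f} GB per month")
    print(f"same data center: ${monthly_bandwidth_cost(gb, same_data_center=True):,.2f}")
    print(f"cross-region:     ${monthly_bandwidth_cost(gb, same_data_center=False):,.2f}")
```

Because the cross-region cost in this model scales linearly with traffic, a design that looks cheap at launch can become expensive as usage grows.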
Disadvantage of a single data center
What happens if you put your compute, storage, and SQL Azure instances in the North Central US data center, and that data center goes down, much like the AWS Northern Virginia data center did? What if the outage is more like the blackout that hit the Northeast a few years ago? You can bet that the Windows Azure team is planning for just such an event, if it has not already. But until this scenario is covered in the SLA that you agree to when using Windows Azure (or any cloud vendor), you need to plan for this failure yourself.
What can I do?
So we have two opposing forces. We need our hosted service, storage, and SQL Azure instance located close together for the best performance and the greatest cost advantage. We also need hosted services in multiple physical locations for failover. It would seem that these two goals cannot be reconciled, but the Windows Azure team has offerings in the works that help resolve this conflict. Traffic Manager (in CTP) will allow you to deploy your hosted services to multiple data centers and route traffic between them based on geo-location or round robin. If a data center fails, traffic to your web role can be routed to one that is still functioning. SQL Azure Data Sync (in CTP) will allow you to keep a SQL Azure instance in the same location as each of your hosted services while keeping the data in sync in the background. The connection string for each hosted instance would then point to the nearby SQL Azure instance for the best performance.
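To illustrate the failover behavior described above, the following sketch models two deployments, each paired with a connection string for its co-located SQL Azure database. It simulates the routing idea under stated assumptions and is not the Traffic Manager API: `Deployment`, `FailoverRouter`, and the placeholder connection strings are all hypothetical, and a real health check would probe the endpoint rather than read a flag.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Deployment:
    """A hosted service deployment plus the SQL Azure database in the same region."""
    region: str
    healthy: bool
    connection_string: str  # points at the co-located (nearby) database


class FailoverRouter:
    """Hypothetical model of failover and round-robin routing (not a real API)."""

    def __init__(self, deployments):
        self.deployments = list(deployments)  # listed in priority order
        self._counter = count()

    def failover(self) -> Deployment:
        # Send all traffic to the highest-priority deployment that is healthy.
        for d in self.deployments:
            if d.healthy:
                return d
        raise RuntimeError("no healthy deployment available")

    def round_robin(self) -> Deployment:
        # Rotate requests across whichever deployments are currently healthy.
        healthy = [d for d in self.deployments if d.healthy]
        if not healthy:
            raise RuntimeError("no healthy deployment available")
        return healthy[next(self._counter) % len(healthy)]


if __name__ == "__main__":
    primary = Deployment("North Central US", True, "<north-central-connection-string>")
    secondary = Deployment("North Europe", True, "<north-europe-connection-string>")
    router = FailoverRouter([primary, secondary])
    print(router.failover().region)  # North Central US
    primary.healthy = False          # simulate a data center outage
    print(router.failover().region)  # North Europe
```

Because each deployment carries its own connection string, traffic that fails over to another data center also reads from the database in that data center, which is the pairing Traffic Manager and SQL Azure Data Sync would make possible.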
What is missing
Right now, you would need to write your own code to keep your storage accounts in sync. Many of your applications may rely on Windows Azure Table Storage for scalable, low-cost data services, yet there is currently no offering for keeping Table Storage data in sync across data centers. I think this feature would complete the failover scenario on the Windows Azure platform, giving it a significant advantage over other cloud offerings today. To that end, I have submitted a feature request on the MyGreatWindowsAzureIdea site. If you agree with me, please vote it up to raise its visibility with the Windows Azure team.
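Until such a feature exists, the sync code has to be written by hand. Below is a minimal sketch of a two-way, last-writer-wins merge between two tables, assuming each entity records a last-modified time. Plain dictionaries keyed by `(PartitionKey, RowKey)` stand in for the two storage accounts; `sync_tables` is a hypothetical helper, and a real implementation would read and write entities through a storage client library instead.

```python
from typing import Dict, Tuple

# Entities are identified by (PartitionKey, RowKey), as in Table Storage.
# Each value is (last_modified, properties). These dicts stand in for tables
# in two storage accounts located in different data centers.
Key = Tuple[str, str]
Entity = Tuple[float, dict]
Table = Dict[Key, Entity]


def sync_tables(a: Table, b: Table) -> int:
    """Two-way merge where the most recently modified copy of an entity wins.

    Returns the number of entities copied. Deletes are not propagated, and
    entities with equal timestamps are left untouched.
    """
    copied = 0
    for key in set(a) | set(b):
        in_a, in_b = a.get(key), b.get(key)
        if in_b is None or (in_a is not None and in_a[0] > in_b[0]):
            b[key] = (in_a[0], dict(in_a[1]))  # copy so the tables don't share state
            copied += 1
        elif in_a is None or in_b[0] > in_a[0]:
            a[key] = (in_b[0], dict(in_b[1]))
            copied += 1
    return copied


if __name__ == "__main__":
    east = {("customers", "42"): (100.0, {"Name": "Contoso"})}
    west = {
        ("customers", "42"): (200.0, {"Name": "Contoso Ltd"}),
        ("customers", "7"): (50.0, {"Name": "Fabrikam"}),
    }
    print(sync_tables(east, west))  # 2: both changes flow from west to east
    print(east == west)             # True
```

Last-writer-wins assumes the clocks that stamp each update agree closely enough to order them, and it silently discards the losing side of a concurrent edit, so it suits data where that trade-off is acceptable.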