
We stayed up when AWS went down

on Mar 3, 17 • by Ian Goldsmith



Amazon Web Services (AWS) had a well-documented failure of the S3 service in its US East region this week. The knock-on effects of the S3 outage took out many of Amazon’s other services and caused serious, long-lasting outages for many high-profile services, including Quora, Trello, and GitLab. Overall, Amazon provides an outstanding set of services and, as anyone knows, these things happen. The key is to architect your solution so that it can tolerate these kinds of failures.

We host our Akana API Management SaaS platform in AWS, with a substantial presence in the US East region. So how were we able to offer a seamless service to our customers throughout this disruption?

Many API Management vendors offer “SaaS” solutions that are not true multi-tenant platforms. In many cases, what you end up with is a dedicated server running in a single availability zone within one specific region. If that region, availability zone, or even server has a problem, your service is down. That’s not how we do things at Rogue Wave. We run a massively scalable, highly distributed, multi-tenant SaaS platform with a highly resilient architecture.

We use a combination of sharded and replicated NoSQL stores and master/slave replicated relational stores to provide data storage with exceptional resilience to failure. We run large numbers of small servers in clusters distributed across multiple availability zones and regions, with sophisticated routing and load balancing to ensure that we continue to process API traffic when individual components fail. Even in the event of a total failure in our master region, any of the other regions will take over the master role within a couple of minutes. Customers may not be able to update their configurations for a minute or two, but all other functionality continues uninterrupted. We can survive the failure of individual servers, whole availability zones, and even entire regions without any interruption to customers’ core services.
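To make the failover idea concrete, here is a minimal Python sketch of health-aware routing and master promotion. It is an illustration only, not our production code: the `Region` and `Platform` classes, the method names, the region list, and the "first healthy region wins" promotion rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Region:
    """A hypothetical region made up of named availability zones."""
    name: str
    # Availability zone name -> whether that zone is currently healthy.
    zones: Dict[str, bool]

    def healthy_zones(self) -> List[str]:
        return [zone for zone, healthy in self.zones.items() if healthy]

    def is_up(self) -> bool:
        # A region counts as up while at least one of its zones is healthy.
        return bool(self.healthy_zones())


class Platform:
    """Illustrative model only; class and method names are assumptions."""

    def __init__(self, regions: List[Region], master: str) -> None:
        self.regions = {region.name: region for region in regions}
        self.master = master

    def route_api_call(self, preferred: str) -> Optional[str]:
        """Pick a 'region/zone' for an API call, trying the preferred region first."""
        order = [preferred] + [name for name in self.regions if name != preferred]
        for name in order:
            zones = self.regions[name].healthy_zones()
            if zones:
                return f"{name}/{zones[0]}"
        return None  # every zone in every region is down

    def elect_master(self) -> Optional[str]:
        """Keep the current master if it is up; otherwise promote the first healthy region."""
        if self.regions[self.master].is_up():
            return self.master
        for name, region in self.regions.items():
            if region.is_up():
                self.master = name
                return name
        return None


# Hypothetical deployment: three regions, with US East as the master.
platform = Platform(
    [
        Region("us-east-1", {"a": True, "b": True}),
        Region("us-west-2", {"a": True, "b": True}),
        Region("eu-west-1", {"a": True}),
    ],
    master="us-east-1",
)

# Simulate the loss of every availability zone in the master region.
platform.regions["us-east-1"].zones = {"a": False, "b": False}

print(platform.route_api_call("us-east-1"))  # prints "us-west-2/a"
print(platform.elect_master())               # prints "us-west-2"
```

In this toy model, API traffic is rerouted immediately, while configuration updates would have to wait until `elect_master` promotes a new region. A real system would also need health checks, data replication, and protection against two regions both believing they are the master.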

If you rely on your APIs to stay in business, you should look at Akana API Management to help ensure that you offer the best possible service, no matter what’s happening across the internet.
