Spotlight on leadership and cybersecurity in changing times

Metin Mitchell, Managing Partner, and guest contributors

Tuesday, 03 April 2018 09:56

When and Why Clouds Go Wrong

Written by Raef Meeuwisse

Guest blog by Raef Meeuwisse: cyber and AI enthusiast, keynote speaker, CISO consultant and author of numerous cybersecurity publications, including the highly successful title ‘Cybersecurity for Beginners’.

Have you ever stopped to consider just how many of the technologies in our lives (and in our businesses) depend on public cloud computing?

Would your enterprise operations be affected if there were a cloud outage? How about your home life?


A recent study estimated that a substantial, prolonged outage at a major cloud service provider could rapidly result in.

You might not think that cloud outages happen very often – but even our AI friend Alexa was down as recently as March 2018.

With the good news that Microsoft Azure is expanding to include a data center in Dubai before the end of 2018, this article takes a look at public cloud reliability, security and resilience:

  • How frequently do public cloud outages happen?
  • Just how badly can a cloud outage impact people and organizations?
  • What measures should be taken to mitigate enterprise risks?

However, it is first worth pointing out that cloud computing, used in the right way, is a great and powerful business asset. Each cloud company aims to give its customers services and scalability at a much lower cost, and often with much better functionality (usability), than they could achieve using only their own in-house resources.

The cloud providers achieve this by leveraging economies of scale. The initial deployment of a single virtual server in-house might cost a few thousand dollars but in a cloud environment, that cost can drop to just a few dollars.

The problem is that the price point for cloud is so compelling that most of the services used in an enterprise can end up depending on the public cloud, often without anyone consciously recognizing it.

How frequently do cloud outages happen?

For major providers of cloud services, such as Amazon Web Services (AWS) and Microsoft Azure, major outages are rare – but they do happen. After all, a 99.9% historical uptime looks great on paper – but did you know that it allows for nearly 9 hours per year of unplanned downtime?

What happens if those hours hit during the peak of a working week? Do those hours ever happen together?
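The arithmetic behind uptime percentages is easy to check for yourself. Below is a minimal sketch, assuming a 365-day year and treating availability as a plain fraction of calendar time; real SLAs define their own measurement windows and exclusions, so treat the figures as rough budgets rather than contractual terms.

```python
# Assumption: a 365-day (non-leap) year; real SLAs may measure availability differently.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service can be down while still meeting the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for sla in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(sla)
    print(f"{sla}% uptime -> {hours:.2f} hours ({hours * 60:.0f} minutes) of downtime per year")
```

On these assumptions, 99.9% uptime leaves about 8.76 hours of downtime per year, while 99.99% leaves roughly 53 minutes. Nothing in the percentage says whether those hours arrive as many short blips or one long outage in the middle of a working week.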

Here are some of the most notable cloud outages from the past 18 months:

  • On Feb 28, 2017, a mega-outage in part of AWS took out a ‘sizable chunk of the Internet’ for around 5 hours. Some major enterprises had critical parts of their business operations interrupted.
  • On Sept 29, 2017 a fire extinguisher incident in a European Azure data center caused a series of knock-on events that ultimately resulted in several key services being offline for many Northern European clients for over 6 hours.
  • On Nov 6, 2017, an error in a data center failover system knocked two Google Cloud services offline for almost 2 hours.

A common thread I noticed while reading through more than 10 cloud outage incidents was that the root cause almost always involved some human error at the cloud provider, which triggered a chain of unforeseen events.

How badly can a cloud outage impact people and organizations?

For enterprises, the level of impact will depend on how well your security and resilience teams designed your organization's architecture and contingency plans. Often, because of limited budgets, the substantial business impact of a cloud outage will have been underestimated.

During some past cloud outages, businesses that lacked contingency options effectively stopped trading until the cloud service came back online. That may be an acceptable approach for an outage lasting just a few hours – but what if the outage went on for longer?

As reported in a recent Trade Arabia article, far too many businesses are under-assessing the risk from cloud outages.

For private individuals, the impact often depends on just how many ‘smart’ devices we have. In past cloud outages, many of those devices temporarily went from being smart to being inoperable. That might not sound too inconvenient if you just lose access to your voice assistant. However, if your door locks, lights, air-conditioning and other items are all hooked into the Internet of Things, life could get frustrating.

How to mitigate the threat from cloud outages

In my own experience of auditing cloud services for enterprises, there are 3 critical areas for each enterprise to carefully consider:

1. Choose the right cloud provider

Large cloud providers usually have the best track records and the strongest security and resilience options. Generally, the smaller and less experienced a cloud provider is, the more likely it is to be vulnerable to service or security issues. Expect any decent cloud provider to be able to show you its own historic uptime and reliability statistics; don’t let them show you someone else’s! If they don’t want to be transparent about their own history, that should be a cause for concern.

2. Select the right security options and configure them

Something that happens far more regularly than cloud outages is a data breach resulting from a poorly configured cloud system. In almost all cases, these breaches happen because security options and settings that could have been in place were never identified or set up in the first place. This was the most frequent finding from my own audits of cloud services: not that the cloud supplier had security issues, but that the customer's security team had not been involved to ensure the right security configuration options were enabled from the beginning.
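One way a customer security team might make sure the right options are identified and set up is to write down a required baseline and check every deployment against it. The sketch below is purely illustrative: the setting names and values in `REQUIRED_SETTINGS` are hypothetical and do not correspond to any particular provider's configuration API. In practice, the actual settings would be read from the provider's own console, API or policy tooling.

```python
# Hypothetical security baseline: these setting names are illustrative only,
# not the configuration keys of any real cloud provider.
REQUIRED_SETTINGS = {
    "storage_public_access": False,
    "encryption_at_rest": True,
    "mfa_for_admins": True,
    "audit_logging": True,
}


def find_misconfigurations(actual: dict) -> list[str]:
    """Return the names of required settings that are missing or set to the wrong value."""
    problems = []
    for name, expected in REQUIRED_SETTINGS.items():
        # A setting that was never configured (missing key) counts as a problem too.
        if actual.get(name) != expected:
            problems.append(name)
    return problems


# Example deployment where one option was left open and another never set up.
deployed = {
    "storage_public_access": True,
    "encryption_at_rest": True,
    "mfa_for_admins": True,
}
print(find_misconfigurations(deployed))  # ['storage_public_access', 'audit_logging']
```

The value of the check is that it also catches settings nobody ever thought about, which is exactly the gap described above, rather than only verifying the options someone remembered to switch on.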

3. For the most critical cloud services – be sure to have a contingency option.

A cloud service being really, really large and hardly ever having had an outage in the past is no guarantee that it will be reliable in the future. Rapid changes in technology mean that, for your most critical business services, you should always look to have an actionable business continuity plan and the capability to set up or activate a replacement technical service within an appropriate amount of time.
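At the technical level, activating a replacement service can be as simple as routing requests to a backup when the primary fails. The minimal sketch below assumes hypothetical `primary_service` and `backup_service` functions standing in for real integrations. It covers only the switchover itself; a production design would also need health checks, timeouts, data replication, a tested way to fail back, and the business continuity plan around it.

```python
from typing import Callable, Sequence


def call_with_fallback(providers: Sequence[tuple[str, Callable[[], str]]]) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, result) from the first that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call()
        except Exception as exc:  # treat any failure as "service unavailable"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for a primary cloud service and its contingency option.
def primary_service() -> str:
    raise ConnectionError("primary region unavailable")


def backup_service() -> str:
    return "order accepted"


print(call_with_fallback([("primary", primary_service), ("backup", backup_service)]))
# ('backup', 'order accepted')
```

Returning the provider name alongside the result makes it visible when the business is running on its contingency option, which matters for deciding when to switch back.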

You may also be wondering what you should do as a private individual. Item 3 above (have a contingency) is the best way to stop a cloud failure from interrupting your personal quality of life. After all, if I can still open and lock my door when the smart function is not working, and operate the air-conditioning and lights, I am not going to be concerned about that side of the outage.

To summarise – all organizations should expect that public clouds, just like anything else, can go wrong and that extended outages are possible.

You may not need contingency for everything your organization does, but at the very least, the security and resilience teams must ensure that the most critical products and services have adequate contingency options in place. After all, you never know when those huge cloud services might stop working for a while.

When it comes to technology, the threat landscape is constantly evolving. It is entirely possible that the next cloud outage will be much bigger and last a lot, lot longer than anyone thought.
