Opinion

Facebook's 2021 outage: the lessons for your business

What went wrong and how to ensure your company never suffers the same catastrophe
By Peter Boyle

Can you imagine a world without social media platforms like Facebook, WhatsApp or Instagram?

Billions of users experienced this very scenario on 4 October 2021, when Facebook and its subsidiaries became suddenly and globally unavailable for more than five hours.

People worldwide depend on virtual networks for everything from conducting business to staying connected with loved ones. So, how did this happen — and what are the broader implications for business technology?

What led to Facebook's downtime?

Facebook released a statement explaining that configuration changes on the backbone routers that coordinate network traffic between its data centres interrupted that communication. And because Facebook runs all its services on its own network, the fault had a cascading effect on its other services.

Companies such as Facebook use Border Gateway Protocol (BGP) to advertise to the rest of the internet which routes lead to their networks. Internet routers need this information to reach the relevant servers. When the faulty configuration change caused Facebook's route advertisements to be withdrawn, routers elsewhere concluded that Facebook's data centres simply did not exist, rendering its various apps and services unusable.
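To make the idea of advertising and withdrawing routes concrete, here is a deliberately simplified sketch in Python. It is a toy lookup table, not real BGP: the class name, the prefix (taken from the range reserved for documentation) and the origin label are all illustrative assumptions, and nothing here reflects Facebook's actual configuration.

```python
import ipaddress


class RoutingTable:
    """Toy model of a router's view of advertised prefixes (not real BGP)."""

    def __init__(self):
        # Maps an advertised network to the name of whoever advertised it.
        self.routes = {}

    def advertise(self, prefix, origin):
        self.routes[ipaddress.ip_network(prefix)] = origin

    def withdraw(self, prefix):
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the origin of the most specific matching route, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no route: the destination appears not to exist
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RoutingTable()
table.advertise("192.0.2.0/24", "example-datacentre")  # hypothetical prefix
before = table.lookup("192.0.2.53")

table.withdraw("192.0.2.0/24")  # the advertisement disappears
after = table.lookup("192.0.2.53")

print(before, after)
```

Once the advertisement is gone, the lookup has nothing to return, even though the servers behind the address may be running perfectly well.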

The outage lasted so long because the network that went down was the same one staff needed in order to reach the affected systems and fix the issue remotely. On top of this, it also took out Facebook Workplace (an online collaborative software tool) and third-party communication apps. The fault reportedly also prevented staff from physically accessing Facebook's data centres, as their site access cards depended on functioning internal systems.
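This kind of circular dependency can be checked for before an incident. The sketch below assumes a hand-maintained map of which systems depend on which; every name in it is hypothetical and chosen only to mirror the situation described above. It flags any recovery tool that ultimately relies on the network it is supposed to help repair.

```python
def depends_on(graph, start, target, seen=None):
    """Return True if `start` depends on `target`, directly or transitively."""
    seen = set() if seen is None else seen
    if start in seen:
        return False  # already explored; avoids looping on cycles
    seen.add(start)
    for dep in graph.get(start, []):
        if dep == target or depends_on(graph, dep, target, seen):
            return True
    return False


# Hypothetical dependency map: each system lists what it needs to function.
dependencies = {
    "remote_admin_access": ["production_network"],
    "site_access_cards": ["internal_systems"],
    "internal_systems": ["production_network"],
    "out_of_band_console": [],
}

recovery_tools = ["remote_admin_access", "site_access_cards", "out_of_band_console"]

# Tools that would fail together with the production network.
unusable = [
    tool for tool in recovery_tools
    if depends_on(dependencies, tool, "production_network")
]
usable = [tool for tool in recovery_tools if tool not in unusable]

print("Unusable during an outage:", unusable)
print("Still usable:", usable)
```

In this made-up example only the out-of-band console survives, which is the argument for keeping at least one recovery path fully independent of the systems it protects.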

Prevent a Facebook-scale crisis

Not only did Facebook's shutdown impact the businesses and individuals who rely on Facebook's network of social media products, but it also had significant financial consequences for Facebook itself. Founder Mark Zuckerberg's personal fortune was diminished by $7 billion (almost £5.1 billion), and the company lost more than $13 million (nearly £9.5 million) in advertising revenue every hour it was out of action.
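Taking the two figures quoted in this article at face value (more than $13 million an hour, for more than five hours), a rough lower bound on the lost advertising revenue is simple to work out. The function name below is an illustrative assumption, and the result is only as reliable as the inputs.

```python
def estimate_downtime_cost(hourly_loss, hours):
    """Multiply an hourly revenue loss by the length of an outage."""
    return hourly_loss * hours


hourly_loss_usd = 13_000_000  # figure quoted above, treated as a minimum
outage_hours = 5              # "more than five hours", so also a minimum

minimum_loss_usd = estimate_downtime_cost(hourly_loss_usd, outage_hours)
print(f"At least ${minimum_loss_usd:,}")  # At least $65,000,000
```

Swapping in your own hourly revenue and a realistic recovery time gives a quick sense of what an equivalent outage would cost your business.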

So, if something like this can happen to a digital empire like Facebook, what is stopping it from happening to any other business? In short, the answer is nothing. Without the proper contingencies in place, this unfortunate scenario could befall any organisation.

Make technology a business priority

Physical data centre infrastructure will always be more vulnerable than cloud-based systems, as hardware can fail unexpectedly. So, embracing the latest IoT-enabled technology and upgrading to remote servers will help automate routine processes, prevent system and equipment failures and improve the robustness of cyber security systems.

Having your business's information stored on one centralised system might seem the more straightforward approach, but too many mutually dependent systems in a network could facilitate a Facebook-scale shutdown. Instead, decentralising network control by migrating to cloud architecture will ensure data is distributed and remotely accessible, preventing a fault at one data centre from affecting the rest of the network.

Data security and identity and access management (IAM) are crucial considerations for successful cloud migration. So, companies must have good IAM processes to prevent errors and put privileged access controls in place to ensure that people do not have excess privileges on critical systems. Unified identity security platforms such as One Identity deliver identity governance, access management and privileged account management solutions to help businesses address the real-world challenges of digital transformation.

Take action to prevent human error

No matter how hard IT professionals work to manage various risks, mistakes sometimes happen. Human error is a leading cause of system downtime, and phishing attacks remain the most common cyber threat to businesses.

Although Facebook's downtime was not the result of a cyber attack or direct attempt to steal user data, every company must enforce training and policies to address this issue and prevent a breach from disrupting uptime. By automating risk assessment and threat modelling with artificial intelligence (AI) or machine learning (ML) and promoting company-wide cyber awareness, business leaders can reduce the likelihood of human error causing a fault.

AI-driven software can interpret patterns and provide a baseline understanding of normal network activity, allowing organisations to plan, prioritise and protect their systems. For example, network-centric threat detection solutions like CyGlass NDaaS enable businesses to uncover, pinpoint and respond to unknown threats that have evaded traditional security controls — even while an attack is still evolving.
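The underlying idea of a baseline can be shown without any AI at all. The sketch below is a plain statistical stand-in, not how CyGlass or any other product works: it learns the mean and spread of some normal readings, then flags new readings that sit too far from that norm. The traffic figures and the threshold of three standard deviations are invented for illustration.

```python
from statistics import mean, stdev


def find_anomalies(baseline, observed, threshold=3.0):
    """Return (index, value) pairs that deviate strongly from the baseline."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return []  # a perfectly flat baseline gives no scale to compare against
    return [
        (i, value)
        for i, value in enumerate(observed)
        if abs(value - mu) / sigma > threshold
    ]


# Hypothetical requests-per-minute figures from a normal period.
baseline_requests = [100, 104, 98, 101, 97, 103, 99, 102]

# New readings, including a sudden spike and a near-total drop.
observed_requests = [101, 99, 250, 100, 3]

anomalies = find_anomalies(baseline_requests, observed_requests)
print(anomalies)  # [(2, 250), (4, 3)]
```

Note that the sharp drop is flagged just as readily as the spike, which matters when the threat is an outage rather than an attack.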

Invest in quality network management

Facebook’s wide-scale disruption has reminded organisations how much modern business relies on virtual networks. So, companies and cyber security teams must ensure they have the appropriate software and technology to keep pace with rapid digitisation.

It can be challenging for businesses without a dedicated CISO or in-house IT department to manage the various demands of the digital world. But with rapidly accelerating digitisation and the shift to operating online, can your business afford to leave the risk of a system failure to chance?

