
Got Holes in Your SOC? Why a Security Health Check is Non-Negotiable in 2025
9 October 2025
Another high-profile IT outage has hit the headlines, but this time, the disruption originated from within the world’s largest cloud provider, AWS.
The first alarms were raised on Downdetector on Monday at around 8:00 AM UK time. By 9:00 AM, thousands of users were reporting critical failures. The impact was immediate and widespread, affecting everything from Amazon’s own services like Alexa and Ring to a diverse and critical list of global brands, including HMRC, Coinbase, Duolingo, Fortnite, and Lloyds Banking Group.
Two hours into the incident, AWS confirmed its engineers had identified the cause: a failure in the connection between customers and a core service, which was causing a catastrophic ripple effect. By 11:35 AM, AWS reported that the underlying problem, the age-old “DNS issue”, had been resolved. Soon after, services like Lloyds Bank began to come back online, but the damage was done.
For millions, it was a morning of disruption. For businesses, it was a stark reminder of an uncomfortable question: Is having your entire environment in one public cloud, even one split across multiple geographical regions, truly a resilient strategy?
The Uptime Myth and the “Safety in Numbers” Fallacy
Although events of this scale are rare, they prove that no environment, no matter how sophisticated, will ever achieve 100% uptime. This forces every organisation to confront a critical question: how long can you really cope if your primary cloud provider has a degraded service?
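To put a number on "how long can you cope", it can help to translate an availability target into a yearly downtime budget. The sketch below is illustrative arithmetic only: the targets (99%, 99.9%, 99.99%) are generic examples rather than figures quoted by AWS or any other provider, and the function name is our own.

```python
# Illustrative arithmetic only: the availability targets below are generic
# examples, not figures quoted by AWS or any other provider.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime (in minutes) implied by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

By this measure, the roughly three and a half hours between the first reports and the reported fix would fit inside a 99.9% yearly budget (about 526 minutes) but would blow through a 99.99% one (about 53 minutes) several times over.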
The common observation is that when a hyperscaler goes down, so does a significant portion of the internet. There is a “safety in numbers” comfort in knowing your competitors and partners are likely offline too. Is the impact as significant as an isolated incident that affects only your organisation?
The answer, uncomfortably, is that it all comes down to balance. For many, this shared downtime is an acceptable risk. But for a blue-light service, where downtime is measured in lives, or a global bank processing transactions, or an e-commerce site on its biggest sales day, “everyone else is down” is not an acceptable answer. The investment in a diverse cloud environment must be weighed against the real-world cost of an outage that affects half the globe.
Multi-Cloud: The Obvious Answer? Or a Maze of Complexity?
The knee-jerk solution to single-provider dependency is, naturally, multi-cloud. However, while strategically sound, a true active-active multi-cloud architecture introduces a profound layer of operational complexity and cost.
Even with a flawless multi-cloud setup, a single point of failure can still exist. This week’s outage was AWS, but it could just as easily have been a third-party cloud-based load balancer or a central identity provider that sits in front of both your cloud environments, rendering your entire service inaccessible.
Before leaping to multi-cloud as the panacea, organisations must consider the hard realities:
· The FinOps Challenge:
You must implement a strategy and a central tool to consolidate and analyse two completely separate bills, each with different pricing, instance types, and discount models. A rigidly enforced, standardised resource tagging policy across both clouds becomes absolutely essential for financial control.
· The Interconnectivity Hurdle:
You require secure, high-bandwidth, low-latency connections between your cloud environments (e.g., AWS Direct Connect and Azure ExpressRoute). This adds cost, complexity, and another potential point of failure.
· The Identity Crisis:
To avoid the nightmare of managing two sets of users and permissions, you must federate identity using a single Identity Provider (IdP). This provider then becomes, ironically, a new potential single point of failure.
· The “Single Pane of Glass” Mirage:
To manage and deploy resources efficiently, you need a unified operational view. This requires standardising on cloud-agnostic Infrastructure as Code (IaC) tools (like Terraform) and potentially investing in a costly multi-cloud management platform.
· The Security and Compliance Nightmare:
Your security posture must be uniform. This is exceptionally difficult. It demands a sophisticated Cloud Security Posture Management (CSPM) tool to monitor both environments for misconfigurations and threats from a single dashboard, mapped to your specific compliance needs.
· The Financial Trap:
This is the hidden killer of multi-cloud strategies. Moving data between clouds incurs significant data egress fees. Applications must be meticulously architected to keep “chatty” components within the same cloud to control costs and maintain performance.
· The Skills Gap:
Your team, from engineers to architects, must now be proficient in two entirely different platforms, each with its own services, APIs, and best practices. This requires a massive investment in cross-training or hiring specialised, expensive talent.
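As one small illustration of the tagging discipline described under the FinOps challenge, the sketch below audits a combined resource inventory against a single required-tag policy. Everything in it is hypothetical: the tag keys, the resource records, and the function names are invented for the example, and a real audit would pull its inventory from each provider's own tooling.

```python
# Hypothetical tagging policy: the tag keys and resource records below are
# invented for illustration and do not come from any provider's API.
REQUIRED_TAGS = {"cost-centre", "owner", "environment"}


def missing_tags(resource: dict) -> set:
    """Return the required tag keys a resource does not carry (case-insensitive)."""
    present = {key.lower() for key in resource.get("tags", {})}
    return REQUIRED_TAGS - present


def audit(resources: list) -> dict:
    """Map each non-compliant resource ID to its sorted list of missing tags."""
    report = {}
    for resource in resources:
        gaps = missing_tags(resource)
        if gaps:
            report[resource["id"]] = sorted(gaps)
    return report


if __name__ == "__main__":
    inventory = [
        {"id": "aws-vm-1", "cloud": "aws",
         "tags": {"Owner": "payments", "Environment": "prod", "Cost-Centre": "1234"}},
        {"id": "azure-vm-7", "cloud": "azure", "tags": {"owner": "payments"}},
    ]
    print(audit(inventory))
```

Comparing tag keys case-insensitively is a deliberate choice here, since teams working in two consoles rarely agree on capitalisation; a stricter policy could reject mismatched case instead.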
Beyond Active-Active: What Are the Real Options?
Given this complexity, is multi-cloud an over-engineered response? For many, yes. The key is to match the solution to the business requirement. This is where a nuanced strategy, moving beyond a simple active-active dream, becomes critical.
1. Strategic Hybrid Cloud:
Some organisations are making a conscious decision to move back to basics, repatriating their absolute “crown jewel” services to on-premises hardware in highly available data centres. Here, they have ultimate control over the infrastructure, network, and resilience.
2. Multi-Cloud (Active-Passive):
A more common and cost-effective approach. Instead of running services in parallel, organisations use a second cloud provider as a “warm standby” or disaster recovery target.
3. Modernised Disaster Recovery (DR):
For most, this is the most practical and effective solution. Time-tested backup and DR technologies have evolved significantly. Modern platforms can replicate or store your critical data and applications off-site (in a peer cloud, a co-location facility, or a dedicated IaaS platform) ready to be spun up in the event of a disaster. With low Recovery Time Objectives (RTOs), these services can restore your business operations far faster than waiting for a hyperscaler to fix a global root-cause problem.
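The active-passive and DR options above both depend on a clear rule for when to switch to the standby. The sketch below shows one simple way to express such a rule: fail over only after several consecutive failed health checks, so a single blip does not trigger a move. The class name, the threshold of three, and the site labels are all assumptions made for illustration, not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass
class FailoverController:
    """Tracks health-check results and decides which site should serve traffic."""

    failure_threshold: int = 3   # assumed: consecutive failures before failing over
    consecutive_failures: int = 0
    active_site: str = "primary"

    def record_check(self, primary_healthy: bool) -> str:
        """Record one health-check result and return the site that should be active."""
        if primary_healthy:
            # A healthy result resets the counter but does not fail back
            # automatically; returning to the primary is left as a manual step.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.active_site = "standby"
        return self.active_site


if __name__ == "__main__":
    controller = FailoverController()
    for healthy in (True, False, False, False, True):
        print(healthy, "->", controller.record_check(healthy))
```

Leaving fail-back as a manual decision is a common conservative choice, because a primary that has only just recovered may not yet be stable enough to take traffic again.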
The Right Answer is Strategic, Not Subjective
There is no single right or wrong answer. The path forward is a strategic trade-off, unique to every organisation, balancing cost, complexity, and your specific tolerance for downtime.
This is precisely where Darwin Technology Solutions provides clarity. We are not a vendor; we are your independent partner. We apply our structured discovery process to clearly understand your business, your critical services, your recovery objectives, and your strategic goals. We then take these unique requirements to the market, running a comprehensive appraisal to find the best-fit options for your organisation.
If this week’s outage has you questioning your own cloud resilience, let’s have a conversation about what a practical, cost-effective, and robust strategy looks like for you.
If you’re interested in having a chat about what Darwin can do to assist, please get in contact at declanmckee@darwin-tech.com or give us a call on 020 8137 3637, ext. 1001.
