Cloudflare has revealed new technical details about the outage that disrupted major portions of the internet on Tuesday, taking down several high-traffic websites for hours.
According to CEO Matthew Prince, the issue originated inside Cloudflare’s Bot Management system, which evaluates incoming traffic by assigning “bot scores” to determine whether requests are real or automated.
Although the system is meant to protect websites from malicious crawlers, a breakdown in its internal processes triggered widespread failures across the network.
Prince clarified that the outage was not caused by DNS issues, by Cloudflare’s new AI tools, or by a cyberattack. Instead, the problem stemmed from how Cloudflare’s ClickHouse database handled queries following a recent update.
The update caused the configuration file used by the Bot Management machine learning model to fill up with duplicate database rows. As the file grew rapidly, it exceeded memory limits and caused Cloudflare’s main proxy service, which handles customer traffic, to fail for users reliant on bot-scoring features.
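The failure mode described above can be illustrated with a minimal Python sketch. It is not Cloudflare’s code: the function names, the row format, and the limit of four entries are all hypothetical, chosen only to show how duplicate rows can push a generated file past a fixed limit in the service that loads it.

```python
# Illustrative sketch only; every name and number here is hypothetical.
MAX_ENTRIES = 4  # assumed fixed limit in the consuming service


def generate_config(rows):
    """Build the config entries from query rows, with no deduplication."""
    return [row["feature"] for row in rows]


def load_config(entries, limit=MAX_ENTRIES):
    """Consumer that fails hard when the file exceeds its limit."""
    if len(entries) > limit:
        raise RuntimeError(
            f"config has {len(entries)} entries, limit is {limit}"
        )
    return list(entries)


def dedupe(entries):
    """Drop repeated entries while keeping their original order."""
    return list(dict.fromkeys(entries))


normal_rows = [{"feature": name} for name in ("f1", "f2", "f3")]
# Assumed effect of the database change: every row is returned twice.
duplicated_rows = normal_rows + normal_rows

if __name__ == "__main__":
    load_config(generate_config(normal_rows))  # within the limit
    try:
        load_config(generate_config(duplicated_rows))  # over the limit
    except RuntimeError as err:
        print("load failed:", err)
    # Deduplicating before loading brings the file back under the limit.
    load_config(dedupe(generate_config(duplicated_rows)))
```

In this sketch the input data is valid in both cases; only the unexpected duplication makes the consumer fail, which is why a check at generation time or a softer failure in the loader would change the outcome.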
Websites that used Cloudflare rules to block certain types of bots were particularly affected, as real user traffic was mistakenly flagged as automated and subsequently blocked. Customers who did not depend on these bot evaluations remained largely unaffected.
Platforms including X, ChatGPT, and Downdetector were affected, creating disruption similar to recent global outages tied to Microsoft Azure and Amazon Web Services.
Cloudflare, which handles traffic for an estimated 20 percent of the internet, experienced one of its most severe service interruptions in years, prompting renewed concerns about the internet’s growing dependence on a small number of infrastructure providers.
To prevent similar failures, Prince said Cloudflare is implementing multiple safeguards. These include tightening controls on configuration files, adding more global kill switches to disable malfunctioning systems quickly, limiting how system resources are consumed during error reporting, and reassessing how core systems react when unexpected failures occur.
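Two of these safeguards, stricter checks on configuration files and a kill switch, can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not a description of Cloudflare’s implementation: the `ConfigConsumer` class, its validation rules, and the placeholder scoring logic are all hypothetical.

```python
# Illustrative sketch only; the class, its rules, and its limits are hypothetical.
class ConfigConsumer:
    def __init__(self, initial, max_entries=4):
        self.max_entries = max_entries
        self.active = list(initial)  # last known-good configuration
        self.enabled = True          # kill switch for the whole feature

    def validate(self, candidate):
        """Reject files that are oversized or contain duplicate entries."""
        if len(candidate) > self.max_entries:
            return False
        if len(set(candidate)) != len(candidate):
            return False
        return True

    def apply(self, candidate):
        """Swap in a new config only if it passes validation."""
        if not self.validate(candidate):
            return False  # keep serving with the last known-good config
        self.active = list(candidate)
        return True

    def score(self, request):
        """Return a placeholder score, or None when the feature is disabled."""
        if not self.enabled:
            return None  # callers treat None as "no bot score available"
        return len(self.active)  # stand-in for real scoring logic


if __name__ == "__main__":
    consumer = ConfigConsumer(["f1", "f2"])
    consumer.apply(["f1", "f2", "f3"])        # accepted
    consumer.apply(["f1", "f1", "f2", "f2"])  # rejected, old config kept
    consumer.enabled = False                  # operator flips the kill switch
    print(consumer.score({}))
```

The design choice shown here is that a bad file is refused rather than loaded, so the service keeps running on its previous configuration, and the kill switch lets operators turn the feature off without taking down the traffic path around it.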
The incident highlights increasing internet fragility as essential online services consolidate on shared infrastructure, raising the likelihood of simultaneous global outages.

