- Google Cloud’s API service is blamed for a widespread outage
- Most regions were back online within 40 minutes, but some took much longer
- The company has promised to protect against future outages and improve communication
Following Google Cloud’s recent widespread outage, which took the likes of Spotify, Cloudflare and Discord offline, the company has released a detailed report explaining exactly why it failed customers.
The company says the root cause was a code problem in Service Control, part of the company’s API management and policy control system.
Specifically, an invalid automated quota update and a lack of proper error handling caused a global crash loop, with 503 errors seen across not only Google Cloud services but also services using its APIs.
Google Cloud outage caused by API issue
The outage affected Google Cloud infrastructure as well as popular Google Workspace apps such as Drive, Docs, Gmail and Calendar. Third-party sites that rely on Google Cloud’s APIs, including popular music streaming platform Spotify, which boasts 678 million users, as well as some Cloudflare services, were also affected.
“On May 29, 2025, a new feature was added to Service Control for additional quota policy checks,” the company wrote in its incident report. “The issue with this change was that it did not have appropriate error handling nor was it feature flag protected.”
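To make the two missing safeguards concrete, here is a minimal sketch of a quota check that is both feature flag protected and tolerant of malformed policy data. It is not Google's code: the names `QuotaPolicy`, `FEATURE_FLAGS`, `extra_quota_policy_checks` and `check_quota` are hypothetical, and the choice to allow the request when the policy is invalid (failing open) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotaPolicy:
    # None models a blank or invalid field in a policy update (hypothetical shape).
    limit: Optional[int]


# Hypothetical flag store; a real system would read this from a config service.
FEATURE_FLAGS = {"extra_quota_policy_checks": False}


def check_quota(policy: Optional[QuotaPolicy], usage: int) -> bool:
    """Return True if the request should be allowed."""
    # Safeguard 1: the new code path only runs when the flag is switched on,
    # so it can be enabled gradually and switched off without a redeploy.
    if not FEATURE_FLAGS.get("extra_quota_policy_checks", False):
        return True

    # Safeguard 2: malformed input is handled explicitly instead of crashing.
    try:
        if policy is None or policy.limit is None:
            raise ValueError("malformed quota policy")
        return usage <= policy.limit
    except ValueError:
        # Assumption for this sketch: fail open rather than take the service down.
        return True
```

With the flag off, the new check is skipped entirely. With it on, a valid policy is enforced, while a blank policy is allowed through instead of triggering a crash loop.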
Google Cloud boasted that its Site Reliability Engineering team had begun triaging the incident within two minutes and identified the root cause within 10 minutes. “The red-button [to disable the serving path] was ready to roll out ~25 minutes from the start of the incident,” Google said, with the rollout complete within 40 minutes.
Although smaller regions recovered relatively quickly, larger regions such as us-central1 took longer to get back online: around two hours and 40 minutes in the case of this particular region.
In a mini incident report issued on the day of the outage, Google Cloud promised to “do better.” Its more detailed report promises the usual remediations, such as improving static analysis and testing practices, and auditing and modularizing the Service Control architecture to contain future incidents, but the company has also promised to “improve [its] external communications” to keep customers better informed and to ensure that its communication infrastructure remains online even during such outages in future.



