Google's Quality Questioned After Layoffs
A recent Google Cloud billing fiasco has sparked a debate about whether engineering quality is declining following mass layoffs and a heavy pivot to AI. The incident is raising questions about system reliability and the potential impact on engineers working in critical infrastructure roles like GCP. It highlights the pressure on remaining teams to maintain complex systems with fewer resources.
The ongoing restructuring has seen continued, rolling layoffs throughout 2025, distinct from the massive 12,000-person cut in 2023. These are described as more surgical, with hundreds of roles eliminated in May and October within divisions like Google Cloud, Android, Chrome, and Global Business Operations. This workforce reshaping is explicitly tied to a strategic pivot to AI. In October 2025, over 100 employees were cut from Google Cloud's design and user experience (UX) research teams, with some teams reduced by half, as the company redirects resources toward AI infrastructure and development. A major symptom of potential systemic issues appeared on June 12, 2025, when a three-hour global outage took down major Google Cloud and Workspace services, impacting platforms like Spotify, Snapchat, and Discord. The disruption was not a billing-specific error but a widespread failure of core API infrastructure. The root cause was a software bug: a new feature in the "Service Control" system was deployed without a feature flag and lacked proper error handling. A subsequent policy update with blank fields triggered a null-pointer exception, causing the component to enter a crash-reboot loop that cascaded globally. This failure was compounded by a lack of basic resilience patterns like "randomized exponential backoff" in the affected services. The simultaneous retries from crashing clients created a "thundering herd" effect, essentially launching a self-inflicted denial-of-service attack on Google's own infrastructure. The incident raised questions about the role of Site Reliability Engineering (SRE), a discipline Google pioneered to prevent such failures. An internal source reportedly claimed the failure stemmed from "leadership driving velocity by cutting corners," a practice that had been ongoing for years. In the months following, other issues have surfaced, including a vulnerability disclosed in February 2026 where thousands of API keys were exposed with access to Gemini endpoints, leading to some developers incurring tens of thousands of dollars in fraudulent charges. There were also separate reports in 2025 of users receiving impossible bills for Gemini services they never used.