DevOps vs SRE
One simple framing: DevOps gets you to production; SRE keeps you there. SRE's emphasis is on SLIs, SLOs, error budgets and runbooks for long-term reliability (x.com). If you're designing SLIs for an Apple service, that distinction helps decide whether to invest first in release velocity or in operational controls (x.com).
Apple lists multiple site-reliability roles inside its Service Engineering organization; job postings explicitly name iCloud, iMessage, FaceTime and Media Platforms as services that SRE teams support at scale. (jobs.apple.com)

SRE practice converts reliability into measurable contracts: a 99.9% SLO corresponds to a 0.1% error budget, which over a 30-day window is roughly 43 minutes of allowable downtime. (sre.google) Google's SRE error-budget policy says teams must restrict or halt non-critical changes when the error budget is exhausted, which makes release-velocity decisions conditional on burn rate and SLO status. (sre.google) The engineering tradeoffs are concrete: SRE guidance lists automating rollbacks and moving to replicated data stores as example projects whose priority can be set by SLO impact, and industry writeups map the same SRE levers onto CI/CD pipelines to slow or speed releases. (sre.google)

For Apple services such as the App Store or Apple Music, defining SLIs like request success rate, P95/P99 latency and error rate lets teams quantify whether additional developer velocity is affordable within the remaining error budget. (glassdoor.com) Consumer-facing SLO targets commonly range from 99.9% to 99.99% availability. Moving from 99.9% to 99.99% shrinks the 30-day error budget from about 43 minutes to about 4 minutes, which materially changes whether a team funds faster rollout tooling or more robust runbooks and operational controls. (upstat.io)
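
The error-budget arithmetic and the "restrict or halt changes" rule above can be sketched in a few lines of Python. This is a minimal illustration, not any team's actual policy: the function names, the request counts and the 25% caution threshold are all assumptions made up for the example, and the budget is computed from a request success rate SLI.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowable downtime, in minutes, for an availability SLO over a window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, good: int, total: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent).

    `good` and `total` are request counts for the SLO window; the SLI here
    is request success rate.
    """
    if total <= 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_bad = (1.0 - slo) * total
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


def release_decision(remaining: float, caution_threshold: float = 0.25) -> str:
    """Map remaining budget to a release posture.

    The 0.25 threshold is a hypothetical value chosen for illustration.
    """
    if remaining <= 0.0:
        return "halt"     # budget exhausted: only critical fixes ship
    if remaining < caution_threshold:
        return "slow"     # budget nearly spent: restrict risky changes
    return "proceed"      # budget healthy: normal release velocity


if __name__ == "__main__":
    for slo in (0.999, 0.9999):
        print(f"SLO {slo:.2%}: {error_budget_minutes(slo):.1f} min per 30 days")
    remaining = budget_remaining(0.999, good=999_500, total=1_000_000)
    print(f"remaining budget: {remaining:.0%} -> {release_decision(remaining)}")
```

With a 99.9% SLO and 500 failed requests out of a million, about half the budget is left and the sketch returns "proceed"; at 2,000 failures the budget is overspent and it returns "halt". A real policy would usually also look at how fast the budget is burning, not only at how much is left.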