Agentic AI applied to SRE
Practitioners are applying agentic AI to operations, security, and developer experience, pointing to enterprise tools like ServiceNow, CrowdStrike, and UiPath and to simple agent implementations that automate monitoring and reduce toil. Social posts also highlight platforms that emphasize iteration speed and practical developer-experience reviews rather than synthetic benchmarks.
Site reliability engineering is the work of keeping software up, fast, and recoverable, and teams are starting to hand parts of that job to artificial intelligence agents that can watch systems, decide what to do, and take action. (developers.openai.com, openai.com)
An agent is not just a chatbot. OpenAI defines agents as systems that “independently accomplish tasks” by using tools and following guardrails, and that is the model operations teams are now applying to alerts, tickets, and incident workflows. (openai.com, developers.openai.com)
The problem in site reliability engineering is toil: repetitive work like checking dashboards, triaging alerts, opening tickets, and running the same fix steps at 2 a.m. Vendors and practitioners are packaging agents around exactly those loops instead of asking engineers to start with full autonomy. (crowdstrike.com, uipath.com)
ServiceNow has been pushing that model across enterprise workflows since January 29, 2025, when it introduced AI Agent Orchestrator and AI Agent Studio as a control layer for building, governing, and coordinating agents. On April 9, 2026, it expanded that pitch with “fully autonomous operations” across its product line. (newsroom.servicenow.com, newsroom.servicenow.com)
CrowdStrike is making the same case in security operations, which overlaps with reliability work whenever incidents involve compromised endpoints, suspicious behavior, or containment steps. Its Charlotte AI pages say agents can handle triage, risk analysis, query translation, and structured response flows inside the Falcon platform. (crowdstrike.com, crowdstrike.com, crowdstrike.com)
On March 25, 2026, CrowdStrike also launched the Charlotte AI AgentWorks ecosystem with partners including Accenture, Amazon Web Services, Anthropic, Deloitte, Kroll, Nvidia, OpenAI, Salesforce, and Telefónica Tech. The company described it as a no-code platform for building secure agents for the security operations center. (crowdstrike.com)
UiPath is framing the shift more broadly as “agentic automation,” with orchestration, governance, and process automation wrapped together on one platform. That matters for reliability teams because many of their workflows already cross ticketing systems, chat tools, cloud consoles, and approval chains that traditional robotic process automation was built to connect. (uipath.com)
The technical change is smaller than the marketing language suggests. Many teams are starting with simple agents that read logs, summarize incidents, draft postmortems, suggest runbooks, or execute a narrow remediation step, because narrow scopes are easier to measure and safer to roll back. (developers.openai.com, openai.com)
Tooling vendors are also shifting the sales pitch from benchmark scores to workflow speed. OpenAI’s AgentKit launch in October 2025 emphasized versioning, preview runs, datasets, trace grading, and automated prompt optimization, features aimed at helping teams iterate on real production tasks rather than chase synthetic leaderboard results. (openai.com, openai.com)
That is why the current wave of agentic site reliability engineering looks less like a robot replacing the operations team and more like software taking the first pass at the queue. The near-term pattern is supervised autonomy: agents watch, classify, and act inside limits, while engineers keep the pager and the final call. (openai.com, servicenow.com, crowdstrike.com)
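To make the supervised-autonomy pattern concrete, here is a minimal Python sketch of an agent that proposes a remediation but acts on its own only inside explicit limits. It is an illustration, not any vendor's implementation: the `Alert` and `Decision` types, the `RUNBOOK` mapping, the `AUTO_APPROVED_ACTIONS` allowlist, the 1-to-5 severity scale, and the `triage` function are all hypothetical names and assumptions introduced for this example.

```python
from dataclasses import dataclass

# Hypothetical allowlist: steps the agent may run without human approval.
AUTO_APPROVED_ACTIONS = {"restart_service", "clear_cache"}

# Hypothetical runbook: maps an alert signal to a proposed remediation.
RUNBOOK = {
    "high_memory": "restart_service",
    "stale_cache": "clear_cache",
    "disk_full": "expand_volume",
}


@dataclass
class Alert:
    service: str
    signal: str    # e.g. "high_memory"; signal names are assumptions
    severity: int  # 1 (low) to 5 (critical); the scale is an assumption


@dataclass
class Decision:
    action: str         # what the agent proposes
    auto_execute: bool  # True only when the agent may act unattended
    reason: str         # short explanation for the audit trail


def triage(alert: Alert, max_auto_severity: int = 3) -> Decision:
    """Propose an action and decide whether it stays inside the agent's limits."""
    action = RUNBOOK.get(alert.signal)
    if action is None:
        # Unknown signal: the agent has no playbook, so a human gets paged.
        return Decision("page_human", False, "no runbook entry for signal")
    if alert.severity > max_auto_severity:
        # Severe incidents stay with the on-call engineer.
        return Decision(action, False, "severity above autonomy limit")
    if action not in AUTO_APPROVED_ACTIONS:
        # Known fix, but not one the agent is trusted to run alone.
        return Decision(action, False, "action requires human approval")
    return Decision(action, True, "within limits")


if __name__ == "__main__":
    for alert in (
        Alert("checkout-api", "high_memory", 2),
        Alert("checkout-api", "high_memory", 5),
        Alert("orders-db", "disk_full", 1),
    ):
        print(alert, "->", triage(alert))
```

In this sketch the allowlist and the severity ceiling are the "limits": every decision that falls outside them is returned as a proposal for an engineer rather than executed, which keeps the scope narrow and easy to audit.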