SRE-mode LLM assistant for incident work
Karan Sharma demoed a read-only, SRE-focused LLM extension that stitches timelines from logs and metrics, enforces guardrails, and overlays org-specific context for incident investigations. It’s an example of embedding agents directly into SRE workflows while keeping controls tight.
Code and releases are published on GitHub under mr-karan/pi-sre-mode, described as a "Pi-native incident investigation workflow" for running investigations from a terminal. github.com The release notes list seven built-in incident templates ("5xx spike, high latency, OOM, broker issues, service down, deploy regression, resource exhaustion") plus commands such as /incident, /report and /check-connectors for preflight checks. github.com The default policy ships read-only guardrails that block destructive commands during investigations, with explicit toggles (/sudo and /sudo-off) for a controlled bypass when authorized. github.com The tool embeds into the Pi terminal so operators can query metrics, grep logs and stitch timelines without switching contexts, while a separate "private overlay" package carries org-specific templates, runbooks and connector checks. github.com It is packaged as a pi/npm-style module (the repo and release point to pi install paths), and the /check-connectors flow is intended to validate observability connectors before an investigation starts. github.com

The design pattern matches broader vendor efforts. Microsoft’s Azure SRE Agent preview documents hypothesis-driven deep investigations and connector integrations, and Redis’s SRE Agent outlines an LLM agent loop that calls Prometheus/log queries and synthesizes evidence. Both emphasize tool-level observability and audit trails, which are compatible with read-only, overlay-driven approaches. sre.azure.com
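
To make the read-only guardrail and the /sudo and /sudo-off toggles concrete, here is a minimal Python sketch of how such a policy could work. It is not the project's code: the Guardrail class, the deny-lists and the audit log format are all hypothetical, and the check is deliberately naive (it inspects a single command and does not understand pipes, subshells or aliases).

```python
import shlex
from dataclasses import dataclass, field

# Hypothetical deny-lists for illustration; the real project's rules
# are not described in the source.
DESTRUCTIVE_COMMANDS = {"rm", "kill", "reboot", "shutdown", "dd", "mkfs"}
DESTRUCTIVE_SUBCOMMANDS = {
    "kubectl": {"delete", "apply", "scale", "drain"},
    "systemctl": {"stop", "restart", "disable"},
    "docker": {"rm", "stop", "kill"},
}


@dataclass
class Guardrail:
    """Read-only by default; /sudo lifts the block until /sudo-off."""

    sudo: bool = False
    audit_log: list = field(default_factory=list)

    def handle_toggle(self, line: str) -> bool:
        """Apply /sudo or /sudo-off. Returns True if the line was a toggle."""
        text = line.strip()
        if text == "/sudo":
            self.sudo = True
        elif text == "/sudo-off":
            self.sudo = False
        else:
            return False
        self.audit_log.append(("toggle", text))
        return True

    def is_destructive(self, command: str) -> bool:
        """Naive check against the deny-lists above."""
        try:
            tokens = shlex.split(command)
        except ValueError:
            return True  # unparseable input is treated as unsafe
        if not tokens:
            return False
        prog, args = tokens[0], tokens[1:]
        if prog in DESTRUCTIVE_COMMANDS:
            return True
        blocked = DESTRUCTIVE_SUBCOMMANDS.get(prog, set())
        return any(arg in blocked for arg in args)

    def allow(self, command: str) -> bool:
        """Decide whether to run a command and record the decision."""
        allowed = self.sudo or not self.is_destructive(command)
        self.audit_log.append(("allow" if allowed else "block", command))
        return allowed


if __name__ == "__main__":
    guard = Guardrail()
    for line in ["kubectl get pods", "kubectl delete pod api-0", "/sudo",
                 "kubectl delete pod api-0", "/sudo-off"]:
        if not guard.handle_toggle(line):
            print(line, "->", "allowed" if guard.allow(line) else "blocked")
```

Recording every decision and toggle in the same log is one way to get the audit trail the vendor designs emphasize; a production policy would more likely use an allow-list of known read-only commands than a deny-list like this one.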