Booking.com unvarnished lessons
Booking.com’s senior engineer shared the 'unpolished story' of 20 years of AI work at QCon London, emphasizing the hard jump from pilots to enterprise adoption — observability, modular enablement, and cultural change were non‑negotiable. The talk reinforced platform patterns for production-grade, cross-team agent use. (infoq.com)
Jabez Eliezer Manuel, Senior Principal Engineer at Booking.com, delivered "Behind Booking.com's AI Evolution: The Unpolished Story" at QCon London on March 16, 2026. (qconlondon.com) He described a seven‑year migration off Hadoop and an inference platform that now runs over 480 models and serves roughly 400 billion predictions per day with sub‑20ms latencies. (letsdatascience.com) Production capabilities are bundled into four unified domain platforms—GenAI, Content Intelligence, Recommendations, and Ranking—rather than one monolithic ML stack. (letsdatascience.com) Manuel framed observability as a non‑negotiable control plane for getting agents past pilots, advocating consolidated telemetry, event‑driven traces, and tooling that surfaces domain failures. (infoq.com) Booking.com’s engineering history shows that observability was operationalized at scale—partnering on consolidated tracing and metrics and rebuilding telemetry to support p99.9 feature serving latencies below 25ms at ~200,000 requests/sec. (honeycomb.io) The stack Manuel sketched uses an orchestrator → moderation → agent → RAG pipeline and includes a dedicated long‑term memory system to preserve cross‑session state for agents. (qconlondon.com) Platform enablement is explicit: an "AI Application" platform centralizes governance and developer workflows so product teams can ship intelligent, compliant features at scale, built on Booking.com's long history of experimentation (≈150,000 A/B tests since 2005 with early success rates under 25%). (qconlondon.com) Operational model choices balance speed and trust—Booking.com uses smaller, faster models for latency‑sensitive paths and larger models for high‑trust reasoning, a shift that reportedly doubled agent accuracy in internal experiments. (venturebeat.com)