Columbia DSI: Bayesian sports talk

Columbia's Data Science Institute recently hosted a lecture on Bayesian analysis for sports performance prediction, highlighting probabilistic modeling techniques that map directly to player forecasting and match‑level inference. The talk underscores why Python/R and Bayesian workflows are becoming core skills for performance analysts in cricket and football. (x.com)

Dr. Scott Spencer, listed as a Columbia lecturer in applied analytics and a Stan language collaborator who builds Bayesian generative models for professional sports, is the faculty lead tied to Columbia’s Bayesian sports workshops and courses. (sps.columbia.edu) The session foregrounded Stan and cmdstanr for hand‑coded Bayesian inference alongside PyMC examples, with step‑by‑step R/Python live‑coding modules and hierarchical models used for in‑game win probability and player‑level forecasting. (athlyticz.com)

The transdisciplinary Columbia–Dream Sports AI Innovation Center, launched as a $10 million partnership in June 2024, explicitly lists “performance prediction and optimization” and “real‑time data processing and interpretation” among its research priorities, creating a direct pathway to deploy Bayesian workflows in commercial sports settings. (engineering.columbia.edu) Dream Sports (parent of Dream11) already operates large ML teams and a personalization stack for a user base the company has reported in the hundreds of millions. It is also a named industry partner in Columbia’s sports‑AI programming and symposiums, which feature Dream engineers and researchers. (dreamsports.group)

Indian organisations are advertising concrete entry pathways that match the lecture’s technical thrust: the BCCI lists Performance Analyst roles at its Centre of Excellence, Dream Sports and FanCode list Data/Analyst and Product/Operations openings across Mumbai and Bengaluru, and standard entry‑level analyst templates list Python, R, SQL and visualization tools as core requirements. (bcci.tv)

Concrete student projects that mirror the Columbia talk include: building a Bayesian hierarchical in‑game win‑probability model for IPL matches using public ball‑by‑ball datasets on Kaggle (2008–2025); implementing the model in Stan or PyMC and comparing its posterior forecasts to baseline ML models; and adapting a Bayesian ensemble to estimate player transfer/value trajectories. These approaches appear in recent arXiv and applied‑Bayesian papers on win probability and player valuation. (kaggle.com)
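As a starting point for the win‑probability project, the sketch below shows the core idea of partial pooling in plain Python. It is not the model from the talk: it swaps the full hierarchical model (which would be fitted in Stan or PyMC) for a conjugate Beta‑Binomial update with a fixed prior strength. The match‑state labels, the counts, the held‑out outcomes and the names `TOY_CHASES`, `PRIOR_STRENGTH`, `pooled_rate`, `posterior_win_prob` and `brier_score` are all assumptions invented for illustration, not real IPL data or any library's API.

```python
# Toy, made-up counts (NOT real IPL data): for each hypothetical match state
# faced by the chasing side, (number of past chases seen, number won).
TOY_CHASES = {
    "low_pressure": (40, 30),
    "medium_pressure": (25, 12),
    "high_pressure": (4, 0),
}

# Assumed pseudo-count controlling how hard sparse states are pulled toward
# the pooled rate. A full hierarchical model would estimate this from data.
PRIOR_STRENGTH = 10.0


def pooled_rate(counts):
    """Overall win rate across all states, used as the prior mean."""
    total_n = sum(n for n, _ in counts.values())
    total_w = sum(w for _, w in counts.values())
    return total_w / total_n


def posterior_win_prob(n, wins, prior_mean, prior_strength=PRIOR_STRENGTH):
    """Posterior mean win probability under a Beta-Binomial model.

    Prior:     Beta(a, b), a = prior_mean * prior_strength,
                           b = (1 - prior_mean) * prior_strength
    Posterior: Beta(a + wins, b + n - wins)
    """
    a = prior_mean * prior_strength
    b = (1.0 - prior_mean) * prior_strength
    return (a + wins) / (a + b + n)


def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


prior_mean = pooled_rate(TOY_CHASES)

# Raw frequencies: the naive baseline, which forecasts exactly 0 for a state
# that has only been seen four times.
raw = {s: w / n for s, (n, w) in TOY_CHASES.items()}

# Shrunk estimates: each state is pulled toward the pooled rate, most
# strongly where there is the least data.
shrunk = {
    s: posterior_win_prob(n, w, prior_mean) for s, (n, w) in TOY_CHASES.items()
}

# Hypothetical held-out results for the sparse state (1 = chase won).
held_out = [0, 0, 1, 0]
raw_brier = brier_score([raw["high_pressure"]] * len(held_out), held_out)
shrunk_brier = brier_score([shrunk["high_pressure"]] * len(held_out), held_out)

for state in TOY_CHASES:
    print(f"{state}: raw={raw[state]:.3f} shrunk={shrunk[state]:.3f}")
print(f"Brier on held-out high_pressure: raw={raw_brier:.3f} shrunk={shrunk_brier:.3f}")
```

The Brier score is one reasonable way to compare the Bayesian forecasts against a baseline, as the second project suggests. On real ball‑by‑ball data the states would come from features such as runs required, balls remaining and wickets in hand, and the prior strength would be learned rather than fixed.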
