Jeff Dean: data still plentiful
Google's Jeff Dean says models aren't running out of data, since vast video and audio reserves remain, but research must 're-engineer' tools and infrastructure to keep up with the speed of agentic AI.
At NVIDIA GTC 2026, Jeff Dean and NVIDIA chief scientist Bill Dally framed the near-term agenda as a shift from single-prompt LLMs to continuously running, agentic systems, which expose tooling and runtime as the next engineering bottlenecks (blogs.nvidia.com). The scale of raw audiovisual material available for model training remains enormous: IDC's Data Age projections put the global datasphere on the order of 175 zettabytes by 2025, and uploads to YouTube continue at hundreds of hours of video per minute, creating massive multimodal corpora to mine for training signals (seagate.com).
Recruiting signals are already tilting toward infrastructure and production expertise, because agentic pipelines require low-latency runtimes and orchestration at scale. Market analysts and talent reports describe ML infrastructure and MLOps engineers with distributed-systems experience as “extremely” scarce and highly sought after in 2026 recruiting funnels (smithspektrum.com). Current public job postings at top labs make the hybrid profile explicit: Google DeepMind lists PhD-level research scientists who also show “strong software-engineering skills” for Gemini and tool-orchestration roles, and OpenAI's research job descriptions ask candidates to “move easily between theory and code,” signalling equal weight on math and production coding ability (job-boards.greenhouse.io).
Concrete technical skills rising in hiring requirements across frontier labs include advanced mathematics and theoretical ML (for model design and evaluation), proficiency in Python/C++ for high-performance model engineering, experience with distributed training and inference stacks, and domain experience with multimodal data pipelines and RL/post-training methods for agent orchestration (examples appear in lab job ads and hiring guides; mljobs.io).
Career patterns show that elite industry labs continue to prize publication records and PhDs while also recruiting engineers with production experience. Bibliometric analyses document substantial migration of high-impact researchers from academia into corporate labs, and career guides recommend pairing an academic track record (papers, PhD/postdoc) with hands-on systems or internship experience to be competitive for roles at DeepMind, OpenAI and similar teams (link.springer.com).