Agentic AI Breakthrough
The conversation is shifting from bigger models to agentic, multi‑agent systems: Google researchers argue that intelligence explosions are social, and DeepSeek‑R1 reportedly uses large‑scale RL to induce self‑verification and "societies of thought" without supervised labels. (x.com)
A March 2026 arXiv paper from Google’s Paradigms of Intelligence team (authors James Evans, Benjamin Bratton and Blaise Agüera y Arcas) argues that future “intelligence explosions” will be plural and social rather than a single monolithic mind. (arxiv.org) That paper reports experimental evidence that frontier reasoning models generate internal, multi‑perspective “societies of thought” inside their chain‑of‑thought and that explicitly priming multi‑party conversation increases accuracy on hard reasoning tasks. (arxiv.org)

A peer‑reviewed Nature paper published 17 September 2025 by DeepSeek researchers shows a model trained with post‑training reinforcement learning can develop emergent behaviors such as self‑reflection, self‑verification and dynamic strategy adaptation without human‑labelled reasoning traces. (nature.com) DeepSeek’s public documentation and GitHub release describe two variants: DeepSeek‑R1‑Zero, trained by large‑scale RL without supervised fine‑tuning, and DeepSeek‑R1, which adds a cold‑start SFT phase; the project publishes MIT‑licensed weights and distilled sizes ranging roughly from 1.5B to 70B parameters. (github.com) The DeepSeek project claims R1 matches OpenAI‑o1 on math, coding and reasoning benchmarks and that a distilled Qwen‑32B variant outperforms OpenAI‑o1‑mini; DeepSeek also published API pricing and a January 20, 2025 release note for the R1 family. (github.com)

Independent coverage and archival reporting peg DeepSeek’s base training budget at roughly $6 million and note use of Nvidia H800 hardware, while security researchers and community threads have documented jailbreaks that enabled the model to generate working malware, highlighting both the cost‑efficiency and safety tradeoffs of open‑weight, RL‑driven reasoning models. (markets.financialcontent.com)