Three hot DL papers

Three high‑velocity threads surfaced. JURA Bio’s variational synthesis paper claims AI‑designed libraries of ~10^16 proteins built from manufacturable DNA, with savings on a ‘quadrillion‑dollar’ scale (x.com). Kimi’s ‘Attention Residuals’ replaces fixed residuals with softmax attention and reportedly gains +7.5 points on GPQA‑Diamond and +3.1 on HumanEval, with roughly 1.25x compute efficiency (x.com). A March multimodal next‑token prediction paper claims 4x better consistency on transformers, reducing video glitches such as extra limbs (x.com). If true, these are concrete wins across protein design, architecture tweaks, and multimodal fidelity.

JURA’s preprint reports practical manufacturing of roughly 10^16–10^17 designed biological sequences from a “variational synthesis” pipeline and says that doing the same with prior methods would have cost on the order of 10^15 dollars. (dragonfly-haddock-m68g.squarespace.com) The company attributes those savings to models that are “manufacturing‑aware” (trained with DNA‑synthesis constraints) and has announced a research collaboration with Annogen to apply variational synthesis to cell‑type‑specific regulatory DNA elements. (jurabio.com)

The Attention Residuals tech report was posted to arXiv on Mar 16, 2026. It describes AttnRes and Block AttnRes as replacements for fixed additive residuals that let each layer attend over past layer outputs, and it reports integration into a 48B Kimi Linear stack pretrained on 1.4T tokens. (arxiv.org) Kimi/Moonshot’s public repo and README show the empirical wins cited on social feeds: in their runs, Block AttnRes gives ~1.25x compute efficiency, with a reported +7.5 points on GPQA‑Diamond and +3.1 on HumanEval versus the baseline. (github.com)

Recent multimodal work in this area includes “Consistency‑Preserving Diverse Video Generation,” which proposes a joint‑sampling framework for flow‑matching video generators and reports substantial improvements in within‑video temporal consistency and color naturalness on state‑of‑the‑art text‑to‑video models. (arxiv.org) Separate next‑token multimodal efforts such as Emu3 (next‑token prediction for images and videos) position autoregressive token modeling as a practical path to unified multimodal training and report stronger generation and perception results than several diffusion or task‑specific baselines. (arxiv.org)
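To make the Attention Residuals idea more concrete, here is a minimal NumPy sketch of the contrast between a fixed additive residual and a softmax‑weighted mix over past layer outputs. It is an illustration under stated assumptions, not the paper's formulation: the per‑layer `query` vector, the dot‑product scoring, the `1/sqrt(d)` scaling, and the function names are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / exps.sum()


def fixed_residual_input(past_outputs):
    """Standard residual stream: the unweighted sum of all earlier outputs."""
    return np.sum(past_outputs, axis=0)


def attention_residual_input(past_outputs, query):
    """Softmax-weighted mix of earlier layer outputs.

    ASSUMPTION: scoring each past output by a dot product with a learned
    per-layer query is a hypothetical stand-in, not the published method.
    """
    past = np.asarray(past_outputs)                   # (num_past_layers, d)
    scores = past @ query / np.sqrt(past.shape[-1])   # one score per past layer
    weights = softmax(scores)                         # non-negative, sums to 1
    return weights @ past, weights


rng = np.random.default_rng(0)
d = 8
past_outputs = rng.normal(size=(4, d))   # outputs of 4 earlier layers
query = rng.normal(size=d)               # hypothetical learned query

mixed, weights = attention_residual_input(past_outputs, query)
print("layer weights:", np.round(weights, 3))
```

With an all‑zero query the weights become uniform, so the mix reduces to the mean of the past outputs, which is the fixed residual sum divided by the number of past layers. In that sense the fixed residual can be read as a special case in which every earlier layer is weighted equally.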
