Meta launches TRIBE v2
Meta introduced TRIBE v2, a trimodal brain-encoder foundation model trained on 500+ hours of fMRI data for zero-shot neural predictions, positioning multimodal brain modeling as a new research frontier. The demo shows zero-shot mapping between multimodal stimuli and predicted neural signals. (x.com)
Meta published TRIBE v2's research paper, model weights, codebase and an interactive web demo, with the implementation available on GitHub and a Hugging Face checkpoint for immediate researcher access. (github.com) The training corpus scales far beyond earlier attempts: Meta says the v2 release was fit using brain-imaging data from more than 700 human volunteers, a dramatic increase from prior versions that used only a handful of subjects. (awesomeagents.ai)

Spatial resolution in v2 expands to roughly 70,000 voxels/vertices, up from about 1,000 regions in the original TRIBE (a roughly 70-fold increase), letting predictions map at near-voxel granularity onto cortical meshes. (awesomeagents.ai) Meta reports that TRIBE v2 delivers roughly a 2–3× improvement over previous brain-encoding methods on naturalistic benchmarks such as movies and audiobooks. (unrollnow.com)

The codebase shows a modular design that combines off-the-shelf feature extractors (Llama 3.2 for text, V-JEPA2 for video and Wav2Vec-BERT for audio) and feeds their outputs into a transformer that projects multimodal latent features onto the cortical surface. (github.com) TRIBE v2 explicitly builds on Meta's TRIBE architecture, which won the Algonauts 2025 challenge, and is positioned by Meta FAIR as an "in-silico" platform intended to let researchers run large-scale virtual experiments without new scans. (arxiv.org)
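To make the data flow of that modular design concrete, here is a minimal NumPy sketch of a trimodal encoder: per-modality features are projected into a shared space, mixed across time by one self-attention step, and read out as one value per cortical vertex. It is an illustration under loud assumptions and not Meta's implementation or the TRIBE v2 API. The function name `encode_brain_response`, every dimension, the single attention layer, and the random (untrained) weights are all hypothetical, and random arrays stand in for real Llama 3.2, V-JEPA2 and Wav2Vec-BERT features.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def encode_brain_response(text_feats, video_feats, audio_feats,
                          n_vertices, d_model=32, seed=0):
    """Hypothetical trimodal encoder with random, untrained weights.

    Each input has shape (T, d_modality): one feature vector per time step.
    Returns an array of shape (T, n_vertices): one predicted value per
    cortical vertex per time step.
    """
    modalities = (text_feats, video_feats, audio_feats)
    n_steps = text_feats.shape[0]
    if any(m.shape[0] != n_steps for m in modalities):
        raise ValueError("all modalities must share the same number of time steps")

    rng = np.random.default_rng(seed)

    # 1) Project each modality into a shared d_model space and sum them.
    fused = np.zeros((n_steps, d_model))
    for feats in modalities:
        d_in = feats.shape[1]
        w_in = rng.normal(scale=d_in ** -0.5, size=(d_in, d_model))
        fused += feats @ w_in

    # 2) One single-head self-attention step over time (assumed stand-in
    #    for the transformer), with a residual connection.
    w_q, w_k, w_v = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))
                     for _ in range(3))
    q, k, v = fused @ w_q, fused @ w_k, fused @ w_v
    attn = softmax(q @ k.T / np.sqrt(d_model), axis=-1)  # (T, T)
    hidden = fused + attn @ v

    # 3) Linear readout: one output per cortical vertex.
    w_out = rng.normal(scale=d_model ** -0.5, size=(d_model, n_vertices))
    return hidden @ w_out


# Usage with stand-in features; 700 vertices keeps the demo small
# (the article cites roughly 70,000 for TRIBE v2).
demo_rng = np.random.default_rng(1)
text = demo_rng.normal(size=(10, 48))
video = demo_rng.normal(size=(10, 64))
audio = demo_rng.normal(size=(10, 40))
pred = encode_brain_response(text, video, audio, n_vertices=700)
print(pred.shape)
```

The only point of the sketch is the shape contract: three time-aligned feature streams go in, and a time-by-vertex prediction comes out. A trained model would learn all of the weight matrices from fMRI recordings rather than sampling them at random.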