Meta releases TRIBE v2
Meta published TRIBE v2, a trimodal 'brain encoder' trained on 500+ hours of fMRI data from 700+ people that predicts brain responses to sights and sounds with 2–3x better accuracy than prior methods. Meta is releasing the model, code, paper, and a demo, a rare open release that ties neuroscience signals to multimodal encoders and is useful for research and cross‑disciplinary AI work. (x.com/AIatMeta/status/2037153758455750717)
TRIBE v2 raises spatial granularity from the original model’s ~1,000 cortical parcels to predictions across roughly 70,000 whole‑brain voxels. (aidemos.atmeta.com) The model feeds three pretrained encoders (LLaMA 3.2 for text, V‑JEPA2 for video, and Wav2Vec‑BERT for audio) into a unified transformer that maps multimodal representations onto brain space. (github.com) Meta’s release states that TRIBE v2 follows a log‑linear scaling law with data volume and enables zero‑shot generalization to new stimuli, tasks and subjects without retraining. (aidemos.atmeta.com) The TRIBE line previously won the Algonauts 2025 brain‑encoding challenge, and the team has posted the full architecture and evaluation on OpenReview/arXiv for reproducibility and benchmarking. (openreview.net)
Meta published runnable artifacts: a GitHub repo with a Colab demo notebook, inference utilities, and pretrained weights intended to load from Hugging Face under facebook/tribev2. (github.com) Out‑of‑the‑box inference produces predictions for an “average” subject on the fsaverage5 cortical mesh (~20k vertices), while the codebase includes utilities to project between MNI and surface spaces and to fit subject‑level mappings. (github.com) The GitHub README highlights an operational detail: the LLaMA 3.2 text encoder is gated, so reproducing end‑to‑end inference requires a Hugging Face access token and the provided Colab demo flow. (github.com)
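For readers planning to try the release, the Hugging Face authentication step can be sketched as below. This is a minimal sketch, not the repo's own setup code: the helper names `resolve_hf_token` and `authenticate` are hypothetical, and it assumes the `huggingface_hub` package is installed and that access to the gated LLaMA 3.2 model has already been granted to the account. `HF_TOKEN` and `HUGGING_FACE_HUB_TOKEN` are the environment variables the `huggingface_hub` library itself recognizes.

```python
import os


def resolve_hf_token(env=None):
    """Return a Hugging Face token found in the environment, or None.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    This helper is illustrative and is not part of the TRIBE v2 codebase.
    """
    env = os.environ if env is None else env
    # HF_TOKEN is the current variable name; HUGGING_FACE_HUB_TOKEN is the older one.
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        token = env.get(name, "").strip()
        if token:
            return token
    return None


def authenticate(token):
    """Log in to the Hugging Face Hub so gated checkpoints can be downloaded."""
    # Imported lazily so resolve_hf_token works even without huggingface_hub.
    from huggingface_hub import login

    login(token=token)
```

A typical flow would call `resolve_hf_token()` first, stop with a clear message if it returns `None`, and otherwise pass the result to `authenticate()` before loading any weights.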
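The "~20k vertices" figure for fsaverage5 can be made concrete with a small sketch. FreeSurfer's fsaverage meshes are subdivided icosahedra, so an order-n mesh has 10 * 4^n + 2 vertices per hemisphere, which gives 10,242 per hemisphere for fsaverage5. The `split_hemispheres` helper and its left-then-right ordering are assumptions made for illustration; the actual layout of TRIBE v2 outputs should be checked against the repo.

```python
def icosphere_vertex_count(order):
    """Vertices in an icosahedron subdivided `order` times (one hemisphere)."""
    return 10 * 4**order + 2


FSAVERAGE5_ORDER = 5
FSAVERAGE5_PER_HEMI = icosphere_vertex_count(FSAVERAGE5_ORDER)  # 10242
FSAVERAGE5_TOTAL = 2 * FSAVERAGE5_PER_HEMI  # 20484, the "~20k" in the text


def split_hemispheres(values):
    """Split one whole-cortex vector into (left, right) hemisphere halves.

    Assumes the vector is ordered left hemisphere first, then right.
    That ordering is a hypothetical convention, not confirmed by the source.
    """
    if len(values) != FSAVERAGE5_TOTAL:
        raise ValueError(
            f"expected {FSAVERAGE5_TOTAL} values, got {len(values)}"
        )
    return values[:FSAVERAGE5_PER_HEMI], values[FSAVERAGE5_PER_HEMI:]


print(FSAVERAGE5_PER_HEMI, FSAVERAGE5_TOTAL)  # prints: 10242 20484
```

Checking the vector length up front is a cheap guard against mixing surface-space predictions with the roughly 70,000-voxel whole-brain outputs mentioned above.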