MAMMAL beats AlphaFold3 on benchmarks
- IBM researchers published MAMMAL in Nature this week, pitching it as a multimodal biology foundation model that beats AlphaFold3 on one antibody-binding benchmark.
- The paper says MAMMAL was pretrained on 2 billion samples and hit state-of-the-art results on 9 of 11 drug-discovery benchmarks.
- If the result holds up, structure prediction stops being the whole game, and cross-modal design tools start to matter more.

Protein AI just got a little messier, and more interesting. AlphaFold3 is still the name most people know, because it became the default shorthand for “AI that predicts biomolecular structure.” But IBM researchers just published a different kind of model, called MAMMAL, and the claim is not just that it predicts structures well. The bigger claim is that one model can move across proteins, antibodies, small molecules, and gene-expression data, and sometimes beat AlphaFold3 on a task people actually care about in drug discovery.

### What is MAMMAL, exactly?

MAMMAL stands for Molecular Aligned Multi-Modal Architecture and Language. Basically, it is a foundation model for biology that was trained across several data types at once (protein and antibody sequences, small molecules, and gene-expression profiles) instead of being built mainly around structure prediction. IBM says the pretrained weights are public.

### Why is that different from AlphaFold3?

AlphaFold3 is best understood as a structure-and-interaction prediction system. It predicts how proteins and other biomolecules fit together in 3D, and that is hugely valuable. But drug discovery is not one problem. It is a chain of problems: target selection, binding prediction, molecule ranking, cellular response, safety. MAMMAL is pitched as useful throughout that chain, not just the structure step.

### So did it really beat AlphaFold3?

On the narrow claim in the paper, yes, with caveats. The Nature paper says fine-tuned MAMMAL prediction scores significantly outperformed AlphaFold3 confidence scores, used as a proxy for binding likelihood, in five of seven antigen targets in an antibody-antigen binding benchmark. That is a real result, but it is not the same thing as beating AlphaFold3 at structure prediction. It is one benchmark and one setup, and the comparison uses AlphaFold3 confidence as a proxy rather than a head-to-head model trained for the same output.

### Why does that caveat matter?

Because AlphaFold3 was built to predict structures and interactions, not to be a universal score generator for every downstream screening task. A third-party benchmark from late 2025 still found AlphaFold3 strongly competitive across protein complexes and especially strong on antigen-antibody complexes. So the interesting part here is not that AlphaFold3 is obsolete. It is that a multimodal model may outperform it on some decision-making tasks around binding.

### What else did MAMMAL claim?

The paper says MAMMAL achieved state-of-the-art performance on 9 of 11 benchmarks spanning multiple stages of the drug-discovery pipeline, with competitive results on the other two. The open-source repo and Hugging Face release make that easier for outside labs to test, which matters because these claims only become important once other groups can reproduce them on fresh datasets and real workflows.

### Why are people talking about “multimodal biology” now?

Because drug discovery data lives in different languages. A protein sequence is one language. A small molecule graph is another. A cell’s gene-expression response is another. Old tools usually specialized in one slice. The new bet is that a model that can translate across all of them may be better at answering the question “which design should we test next?”

### What is the real bottom line?

MAMMAL matters because it shifts the contest from pure structure prediction to workflow usefulness. If outside labs can reproduce these benchmark wins, the center of gravity in protein and antibody design could move toward open, multimodal systems that rank, generate, and connect biological evidence in one place. But today, this is a promising result that still needs outside rigor, not yet a knockout.
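
For readers who want the "five of seven targets" style of comparison made concrete, here is a minimal Python sketch of how two scoring methods can be compared target by target. It is not the paper's evaluation code. The names `scorer_a`, `scorer_b`, and `target_1` to `target_3` are hypothetical, the numbers are made up for illustration, and AUROC is an assumed metric chosen for simplicity. The sketch also skips the significance testing that the paper's "significantly outperformed" wording implies.

```python
from itertools import product


def auroc(scores, labels):
    """Chance that a random binder outscores a random non-binder (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def count_wins(per_target):
    """Return the targets where scorer_a has a strictly higher AUROC than scorer_b."""
    winners = []
    for target, data in per_target.items():
        a = auroc(data["scorer_a"], data["labels"])
        b = auroc(data["scorer_b"], data["labels"])
        if a > b:
            winners.append(target)
    return winners


# Toy, made-up numbers purely for illustration; NOT data from the paper.
# labels: 1 = antibody binds the antigen, 0 = it does not.
toy = {
    "target_1": {
        "labels": [1, 1, 0, 0],
        "scorer_a": [0.9, 0.8, 0.2, 0.1],
        "scorer_b": [0.6, 0.3, 0.5, 0.2],
    },
    "target_2": {
        "labels": [1, 0, 1, 0],
        "scorer_a": [0.4, 0.6, 0.7, 0.3],
        "scorer_b": [0.9, 0.1, 0.8, 0.2],
    },
    "target_3": {
        "labels": [1, 0, 0, 1],
        "scorer_a": [0.7, 0.2, 0.4, 0.9],
        "scorer_b": [0.5, 0.5, 0.6, 0.4],
    },
}

winners = count_wins(toy)
print(f"scorer_a wins on {len(winners)} of {len(toy)} targets: {winners}")
```

The point of the sketch is the shape of the claim: a "wins on N of M targets" headline depends on which score is used as the stand-in for each method, which is why it matters that AlphaFold3's confidence score was serving as a proxy for binding likelihood rather than as an output it was trained to produce.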