PNAS: protein LMs mirror MSA

- Zhidian Zhang, Hannah Wayment-Steele and colleagues used mechanistic probes on Meta’s ESM-2 and showed the model stores coevolving residue statistics like MSA-era methods.
- Their key tool was a categorical Jacobian, which exposed contact-relevant pairwise motifs and showed local sequence windows recover predicted contacts best.
- That matters because protein LMs may be less “magic intuition” than compressed evolutionary alignment machinery in single-sequence form.

Protein language models are AI systems trained on raw amino-acid strings. The big promise is simple: skip the expensive evolutionary preprocessing and still get useful biology out. That raises a deeper question. Are these models actually learning protein physics, or are they just reconstructing the same evolutionary clues older tools got from multiple sequence alignments? A PNAS paper from Zhidian Zhang, Hannah Wayment-Steele, Garyk Brixi, Haobo Wang, Dorothee Kern, and Sergey Ovchinnikov pushes hard toward the second answer by dissecting Meta’s ESM-2 from the inside. (pnas.org)

### What’s the old MSA trick?

A multiple sequence alignment, or MSA, lines up many related proteins and asks which positions vary together across evolution. If one residue changes and another compensates, that pair often sits near each other in 3D space. Classical models like Markov random fields and multivariate Gaussian approaches turn that coevolution signal into contact predictions. AlphaFold2 leaned heavily on this signal, with MSAs as a central preprocessing step. (pnas.org)

### Why were protein LMs supposed to be different?

Protein language models like ESM-2 are trained on single sequences, not explicit alignments. That made them feel like a cleaner route to biology, as if they might be learning folding rules directly from sequence alone. The attraction is obvious: many proteins have few detectable homologs, and building MSAs takes time and can fail in sparse families. If a single-sequence model […] (pnas.org)

### So what did this paper actually test?

The authors looked at ESM-2’s contact predictions and asked what internal signal supports them. Their main tool was a “categorical Jacobian” calculation, basically a way to measure how changing one amino acid changes the model’s beliefs about another position. If those cross-position effects look like coevolution statistics, then the model is not inventing a totally new strategy; it is drawing on the same kind of signal that MSA-based methods exploit. (pnas.org)
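
To make the categorical Jacobian idea concrete, here is a minimal sketch in Python with NumPy. It does not load ESM-2: `make_toy_model` is a hypothetical Potts-like stand-in with two hard-wired coupled position pairs, and every function name is illustrative. The post-processing (centering, symmetrizing, a Frobenius norm, and an average-product correction) is a common recipe for coevolution-style contact maps and is assumed here, not taken from the paper’s code.

```python
import numpy as np

A = 20  # alphabet size: the 20 standard amino acids


def make_toy_model(L, coupled_pairs, seed=0):
    """Hypothetical stand-in for a protein LM (NOT ESM-2): logits at
    position j depend on the residues at positions coupled to j."""
    rng = np.random.default_rng(seed)
    h = rng.normal(size=(L, A))          # per-position bias terms
    W = np.zeros((L, A, L, A))           # pairwise coupling terms
    for i, j in coupled_pairs:
        block = rng.normal(size=(A, A))
        W[i, :, j, :] = block
        W[j, :, i, :] = block.T

    def logits_fn(seq):
        # seq: integer array of shape (L,); returns logits of shape (L, A)
        out = h.copy()
        for i, a in enumerate(seq):
            out += W[i, a]               # effect of residue a at i on all positions
        return out

    return logits_fn


def categorical_jacobian(logits_fn, seq):
    """J[i, a, j, b]: change in the logit for amino acid b at position j
    when position i is substituted with amino acid a."""
    L = len(seq)
    base = logits_fn(seq)
    J = np.zeros((L, A, L, A))
    for i in range(L):
        for a in range(A):
            mutant = seq.copy()
            mutant[i] = a
            J[i, a] = logits_fn(mutant) - base
    return J


def contact_scores(J):
    """Collapse the 4D Jacobian into an L x L coupling map (assumed recipe)."""
    J = J - J.mean(axis=1, keepdims=True)        # center over substituted residue
    J = J - J.mean(axis=3, keepdims=True)        # center over output residue
    J = 0.5 * (J + J.transpose(2, 3, 0, 1))      # symmetrize (i, a) <-> (j, b)
    S = np.sqrt((J ** 2).sum(axis=(1, 3)))       # Frobenius norm per position pair
    np.fill_diagonal(S, 0.0)
    # Average-product correction: subtract the background expected from row/column sums.
    S = S - S.sum(axis=1, keepdims=True) * S.sum(axis=0, keepdims=True) / S.sum()
    np.fill_diagonal(S, 0.0)
    return S


def top_pairs(S, k):
    """Return the k highest-scoring position pairs (i < j), sorted by index."""
    rows, cols = np.triu_indices(S.shape[0], k=1)
    order = np.argsort(S[rows, cols])[::-1][:k]
    return sorted((int(rows[n]), int(cols[n])) for n in order)


if __name__ == "__main__":
    model = make_toy_model(L=8, coupled_pairs=[(1, 5), (2, 6)])
    seq = np.random.default_rng(1).integers(0, A, size=8)
    S = contact_scores(categorical_jacobian(model, seq))
    print(top_pairs(S, 2))  # the two hard-wired couplings should rank highest
```

With a real protein LM, `logits_fn` would wrap a forward pass of the model, so the loop costs one pass per position and candidate amino acid. The toy model only illustrates why pairwise couplings stored in a model show up as off-diagonal structure in the Jacobian.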

### What did they find inside ESM-2?

They found that ESM-2 stores statistics of coevolving residues in a way the paper says is analogous to Markov random fields and multivariate Gaussian models. That is the headline result. The model’s internals behaved less like a pure simulator of folding physics and more like a compressed archive of evolutionary couplings learned from huge sequence corpora. In plain English: the model carries that evolutionary signal in its weights without being handed an alignment at inference time. (pnas.org)

### Why do local windows matter?

The masking experiments are the aha moment. When the team hid parts of the sequence and varied what context ESM-2 could still see, local windows let the model recover predicted contacts best. That suggests ESM-2 relies on stored motifs of pairwise contacts rather than a broad, fully global physical model. Think of it less like solving first-principles chemistry and more like recognizing recurring structural phrases. (pnas.org)

### Is this the same as saying protein LMs are useless?

No. The claim is just narrower, and more understandable. A model that compresses evolutionary statistics into single-sequence inference is still extremely useful. It can be faster, easier to deploy, and available when explicit homolog searches are weak. But the paper argues against the strongest version of the “sequence-only models learned the real physics” story. The catch is that this is not understanding in the human sense. (pnas.org)

### How does this connect to interpretability work?

It fits a broader trend. Other recent work has used sparse autoencoders on ESM-2 and found interpretable features tied to protein families, functions, motifs, and amino-acid-level patterns. That does not prove the exact same mechanism, but it reinforces the idea that protein LMs are storing biologically meaningful, decomposable structure rather than inscrutable mush. (pnas.org)

### Why should biologists care?

Because this reframes what “MSA-free” may really mean.
The model may not need an alignment at runtime, but it may still be doing the moral equivalent internally. That is good news for usability and bad news for hype. The win is not that evolution suddenly stopped mattering. The win is that a giant model may have learned to carry evolutionary alignment machinery around in its weights. (pnas.org)

The bottom line is that this paper makes protein language models feel less mystical and more legible. It turns out the new tools may be powerful partly because they rediscover, compress, and replay the old evolutionary logic, just in a much more convenient package. (pnas.org)
