Transformer designs MHCII peptides

Researchers trained a Transformer model to design peptides that bind MHC class II molecules and validated the best candidates using AlphaFold 3 structural predictions. The work used evolutionary data across diverse HLA allotypes, and the authors released their code and datasets openly, making it easy to reproduce and extend. This is a clear example of computational biology moving from method papers toward practical, shareable toolsets. (x.com)

Your immune system does not inspect whole viruses or bacteria. It inspects short protein fragments, like torn scraps of a label, that are displayed on cell-surface holders called major histocompatibility complex (MHC) class II molecules. (frontiersin.org) Those holders sit mainly on dendritic cells, B cells, and macrophages, and they present material taken up from outside the cell to CD4-positive helper T cells. A helper T cell only reacts if the fragment fits the holder well enough to be seen. (frontiersin.org)

In humans, these holders are encoded by human leukocyte antigen (HLA) genes, and those genes vary a lot from person to person. That variation means a peptide that fits one person’s holder can miss another person’s by just enough to fail. (merckmanuals.com) That is why vaccine and immunotherapy design keeps running into the same bottleneck: finding peptides that bind across many HLA variants. Researchers have chased “promiscuous” class II peptides for years because broad binding gives broader population coverage. (aacrjournals.org)

The hard part is that class II peptides are floppy before they bind. The new paper says that flexibility makes conventional structure-based design difficult, because the peptide does not hold one clean shape in advance. (academic.oup.com) So the authors worked backward from sequence patterns instead of starting from a fixed 3D structure. They trained a Transformer neural network on evolutionary signals taken from known binding peptides, including how often each amino acid appears at each position and which amino-acid pairs tend to appear together. (academic.oup.com) A Transformer is the same family of model used in modern language systems, but here the “words” are amino acids. It learns which residues can swap in, which positions have to stay constrained, and which long-range combinations still look like real binders. (academic.oup.com, github.com)

The team did not stop at one generator. Their public code also includes a Monte Carlo simulated annealing search, a trial-and-error optimizer that mutates sequences step by step, keeps changes that improve the score, and occasionally accepts a worse one so the search does not get stuck on a merely decent sequence. The resulting candidate sets are labeled L1, L2, MC1, and MC2. (github.com)

Then they checked the best designs with AlphaFold 3, Google DeepMind’s 2024 model for predicting biomolecular interactions. The paper reports designed peptides with predicted binding affinities comparable to native peptides and AlphaFold 3 confidence scores above 90 on the bound complexes. (academic.oup.com, alphafoldserver.com)

The paper was published in Bioinformatics Advances on March 28, 2026, and the GitHub repository is already public with code, data folders, and training scripts. That makes this less like a one-off model demo and more like a reusable workflow other labs can run, test, and modify. (academic.oup.com, github.com)
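To make the two ideas above concrete, here is a minimal Python sketch of position-by-position amino-acid statistics and a simulated annealing search. It is an illustration under stated assumptions, not the authors' code: the five 9-residue sequences are invented placeholders rather than real binders, the function names and parameters are hypothetical, and a simple frequency profile stands in for the Transformer's score. The pair counts are computed only to show what a co-occurrence signal looks like and do not feed into the score here.

```python
import math
import random
from collections import Counter
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Invented 9-residue toy sequences, NOT real MHC class II binders.
TOY_BINDERS = ["FKRAAALTL", "YKRAASLTV", "FRKAAALSL", "FKRVAALTL", "YRRAASLTL"]


def position_frequencies(peptides, pseudocount=1.0):
    """Per-position amino-acid frequencies with additive smoothing."""
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    table = []
    for pos in range(len(peptides[0])):
        counts = Counter(p[pos] for p in peptides)
        table.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return table


def pair_counts(peptides):
    """Count how often two residues appear together at a pair of positions."""
    counts = Counter()
    for p in peptides:
        for i, j in combinations(range(len(p)), 2):
            counts[(i, p[i], j, p[j])] += 1
    return counts


def score(seq, freqs):
    """Log-likelihood of a sequence under the per-position profile."""
    return sum(math.log(freqs[i][aa]) for i, aa in enumerate(seq))


def anneal(freqs, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Simulated annealing: mutate one residue per step, Metropolis acceptance."""
    rng = random.Random(seed)
    current = [rng.choice(AMINO_ACIDS) for _ in range(len(freqs))]
    current_score = score(current, freqs)
    best, best_score = current[:], current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = current[:]
        candidate[rng.randrange(len(candidate))] = rng.choice(AMINO_ACIDS)
        delta = score(candidate, freqs) - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, current_score + delta
            if current_score > best_score:
                best, best_score = current[:], current_score
    return "".join(best), best_score


freqs = position_frequencies(TOY_BINDERS)
pairs = pair_counts(TOY_BINDERS)
designed, designed_score = anneal(freqs)
print("most common pair:", pairs.most_common(1)[0])
print("designed peptide:", designed, "score:", round(designed_score, 3))
```

The temperature schedule is the design choice that matters most in a search like this: early on, a high temperature lets the search accept worse sequences and explore, and as it cools the search settles into accepting only improvements. In a real workflow the score would come from a trained model or a binding predictor rather than this toy profile.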
