AI spies thousands of proteins

- New AI tools are being used to rapidly identify immune and other proteins across bacterial genomes.
- MIT's DefensePredictor found 600+ E. coli immune proteins in minutes, while Pasteur AI predicted 2.4 million proteins.
- These high-throughput AI scans help map bacterial defenses and speed protein engineering research.

Bacteria carry protein-based defenses against viruses, and two new artificial intelligence systems are now scanning whole genomes fast enough to find them at scale. (science.org)

Proteins are the working parts of a cell, and bacterial immune proteins can block, cut, or sabotage invading phages, the viruses that infect bacteria. More than 250 bacterial antiviral systems have already been validated, but many more remain uncharacterized. (nature.com, science.org)

Older genome-mining methods often looked for defense genes clustered in “defense islands,” an approach that misses proteins scattered elsewhere in the chromosome. DefensePredictor instead uses embeddings from a protein language model (a statistical map of sequence patterns) to classify single proteins as likely defensive. (biorxiv.org, science.org)

In a Science paper published in April 2026, researchers reported that DefensePredictor scanned 69 diverse Escherichia coli strains and flagged 624 proteins as defense-related with high confidence. More than 100 of those had no detectable similarity to previously known defense proteins. (science.org) The same study said the team experimentally validated 45 previously unknown systems, showing the model was not just rediscovering familiar genes. Nearly half of the predicted defense proteins lay outside plasmids, prophages, and classic defense islands. (biorxiv.org, science.org)

A second April 2026 Science paper from Institut Pasteur researchers pushed the search much wider, across more than 32,000 bacterial genomes. That team estimated that about 1.5% of genes in a typical bacterial genome are devoted to antiviral defense, and that more than 85% of predicted defense-associated protein families had no prior link to immunity. (science.org)

Nature, describing the pair of studies, said the two teams built databases containing thousands of antiviral defense proteins that could feed future biotechnology work. One of the underlying genome resources now includes 2,440,377 bacterial assemblies, giving these models a much larger search space than earlier surveys. (nature.com, biorxiv.org)

That matters because bacterial defense proteins have already yielded laboratory tools, including CRISPR-based gene editing, and researchers have argued that some parts of animal innate immunity trace back to bacterial systems. Faster searches let labs move from sequence databases to testable protein candidates in days rather than after years of manual curation. (nature.com, biorxiv.org)

The bottleneck now shifts from finding candidates to proving what each protein does in cells. The new studies suggest the catalog of bacterial antiviral machinery is much larger than the field’s named parts list. (science.org)
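
For readers who want a feel for what "classifying single proteins from embeddings" means, here is a minimal sketch in Python. It is not DefensePredictor's method. The `toy_embedding` function is a hypothetical stand-in that uses amino-acid frequencies in place of a real protein language model, the nearest-centroid rule is a simple classifier swapped in for illustration, and the short sequences and labels are invented. The point is only the shape of the idea: each protein becomes a fixed-length vector and is scored on its own, with no reference to where its gene sits in the genome.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_embedding(sequence):
    """Hypothetical stand-in for a protein language model embedding:
    a 20-number vector of amino-acid frequencies."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    return [counts[aa] / len(sequence) for aa in AMINO_ACIDS]


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train(labeled_sequences):
    """Build one centroid per label from example sequences."""
    return {
        label: centroid([toy_embedding(seq) for seq in seqs])
        for label, seqs in labeled_sequences.items()
    }


def classify(model, sequence):
    """Return the label whose centroid is most similar to this one protein."""
    embedding = toy_embedding(sequence)
    return max(model, key=lambda label: cosine(model[label], embedding))


# Invented toy training data, not real defense proteins.
model = train({
    "defense": ["KKRKDEKKRE", "RKKEDKRKKD"],
    "other": ["LLVVAAGGLI", "AVLGIVLLAG"],
})

# Each protein is judged alone, regardless of its genomic neighbors.
print(classify(model, "KRKEDKKR"))
print(classify(model, "GALVLIAG"))
```

Because the decision depends only on the protein's own sequence, a scheme of this kind can in principle flag a defense gene sitting far from any known defense island, which is the limitation of neighborhood-based searches described above.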

Shared from Scout - Be the smartest in the room.