DrugCLIP screens 10 trillion pairs daily

- Researchers from Tsinghua University and collaborators reported in Science in January 2026 that DrugCLIP can run genome-wide virtual screening in about one day.
- The paper said DrugCLIP screened more than 10 trillion protein-ligand pairs across about 10,000 human proteins and 500 million molecules on eight GPUs.
- GenomeScreenDB and related DrugCLIP outputs are publicly available through the project site and code repositories maintained by the authors.

Researchers from Tsinghua University and collaborators reported in *Science* in January 2026 that an AI system called DrugCLIP can screen more than 10 trillion protein-ligand pairs in under 24 hours on eight GPUs. The team said the run covered about 10,000 AlphaFold-predicted human protein structures against a library of 500 million compounds and produced a public resource called GenomeScreenDB. The work was led by Yanyan Lan, with co-authors including Yinjun Jia and Bowen Gao, according to the paper. The paper frames the method as a way to make genome-scale virtual screening practical in the post-AlphaFold era.

### How is DrugCLIP different from standard docking?

DrugCLIP treats virtual screening as a retrieval problem rather than a step-by-step physics-style docking calculation. The *Science* paper said the model encodes protein pockets and small molecules into a shared latent space, then searches that space with dense-retrieval methods similar to those used by search engines. Traditional docking estimates how a ligand fits into a protein pocket one pair at a time, which makes very large screens expensive. (science.org)

The authors said existing docking and many deep-learning approaches were too computationally costly for genome-wide coverage, and reported that DrugCLIP outperformed docking and other deep-learning baselines on speed and accuracy in DUD-E and LIT-PCBA benchmark tests.

### Where does the “10 trillion pairs a day” figure come from?

The paper’s genome-wide screen paired about 10,000 human proteins with 500 million compounds. Proteins times compounds alone comes to about 5 trillion combinations, so the reported total of more than 10 trillion pairs appears to count binding pockets rather than whole proteins: the paper cites more than 20,000 pockets across those targets, and 20,000 pockets times 500 million compounds is 10 trillion. *Science* said the full campaign was completed within one day on eight GPUs.

The social-media claim that DrugCLIP is “10 million times faster than docking” appears to compress broader speed comparisons into a single headline number. (science.org) The primary paper and related coverage support a million-fold speedup claim over conventional methods, while also stating that the system can process trillions of pairs daily; that means the exact “10 million” figure should be treated as a social-post shorthand unless the authors publish a benchmark with that wording.

### What is GenomeScreenDB, exactly?

GenomeScreenDB is the output of that genome-wide screening campaign. The *Science* paper said the database contains more than 2 million potential hit molecules for more than 20,000 pockets from around 10,000 human targets. The project repository says the database is available under a CC BY 4.0 license, while model weights and generated outputs are non-commercial under CC BY-NC 4.0. (phys.org) The same repository provides code for chunked screening and links to model weights and encoded embeddings.

### Did the researchers show anything beyond speed?

The authors reported benchmark gains on DUD-E and LIT-PCBA and said DrugCLIP identified ligands for the serotonin 2A receptor and the norepinephrine transporter. (science.org) The *Science* summary said two 5HT2AR agonists had median effective concentration values below 100 nM, and two NET inhibitors were structurally validated by cryo-electron microscopy. (github.com)

The paper also said DrugCLIP, combined with a pocket-refinement module called GenPack, found inhibitors for TRIP12, which the authors described as a less explored target with no reported holo structure or ligand. That matters because the system is being presented not only as a faster filter, but as a way to work with AlphaFold-predicted structures and harder targets. (science.org)

### What should readers be careful about when they see the viral claim?

The most solid numbers are the ones in the paper: about 10,000 proteins, 500 million compounds, more than 10 trillion pairs, under 24 hours, and eight GPUs. Those figures are supported by the *Science* article and secondary coverage. The May 22 social posts are useful as a pointer, but the paper is the primary source for what DrugCLIP actually did. (science.org)

Readers who want to inspect the outputs can use GenomeScreenDB and the public code repositories, where the authors have posted the database, model details and screening scripts. (github.com)
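
For readers unfamiliar with the retrieval framing described above, the toy Python sketch below may help. It is not DrugCLIP's code and does not use the project's repository or model weights: random unit vectors stand in for the learned pocket and molecule encoders, and every name in it (`random_embedding`, `screen_in_chunks`, the chunk size, the 16-dimensional vectors) is a hypothetical choice made for illustration. It only shows the general idea of scoring one pocket embedding against a library of molecule embeddings by cosine similarity, processing the library in chunks and keeping the best few hits.

```python
import heapq
import math
import random


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def random_embedding(rng, dim):
    """Hypothetical stand-in for a learned encoder: returns a random unit vector."""
    return normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])


def screen_in_chunks(pocket, molecules, top_k=3, chunk_size=4):
    """Rank molecules by cosine similarity to one pocket, one chunk at a time.

    Only the best top_k (score, index) pairs are carried between chunks, so the
    full list of scores never has to be held in memory at once.
    """
    best = []
    for start in range(0, len(molecules), chunk_size):
        chunk = molecules[start:start + chunk_size]
        scored = [
            (sum(p * m for p, m in zip(pocket, mol)), start + offset)
            for offset, mol in enumerate(chunk)
        ]
        best = heapq.nlargest(top_k, best + scored)
    return best


rng = random.Random(0)
dim = 16  # illustrative embedding size, not the model's real dimension
molecules = [random_embedding(rng, dim) for _ in range(20)]

# Assumption for the demo: build the "pocket" as a slightly perturbed copy of
# molecule 7, so that the expected top hit is known in advance.
pocket = normalize([x + 0.01 * rng.gauss(0.0, 1.0) for x in molecules[7]])

hits = screen_in_chunks(pocket, molecules)
for score, index in hits:
    print(f"molecule {index}: cosine similarity {score:.3f}")
```

Because each molecule is embedded once and then compared with a single dot product, the cost per pair is tiny compared with a docking calculation, which is the intuition behind treating screening as search. The real system's encoders, index structure and scoring details are described in the paper and repository, not here.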
