Uni‑FEP benchmarks industrial FEP
- Atombeat’s Uni-FEP team published an open benchmark for binding free-energy simulations, built from roughly 1,000 ChEMBL-derived protein-ligand systems and about 40,000 ligands. (chemrxiv.org)
- The point is scale and realism: the set includes scaffold hops, charge changes, and other medicinal-chemistry moves that older public FEP tests often avoid. (chemrxiv.org)
- It matters because open FEP validation is shifting from toy public sets toward cross-company, workflow-level tests that look more like actual lead optimization. (industrybenchmarks2024.readthedocs.io)

Free-energy perturbation (FEP) is one of the most rigorous physics-based tools drug hunters have for predicting which molecule will bind a target protein more tightly. The promise is obvious: run simulations before you make compounds, and you waste less chemistry. (chemrxiv.org)

But the annoying part has always been the benchmark story. A lot of published FEP benchmarks look cleaner than real medicinal chemistry. Atombeat’s new Uni-FEP benchmark is essentially an attempt to fix that by making the test set much bigger, messier, and closer to what industrial lead optimization actually looks like.

### What is FEP actually trying to predict?

FEP estimates the change in binding free energy when one ligand is alchemically transformed into another. The same transformation is simulated twice, once inside the protein binding site and once in solvent, and the difference between those two legs is the relative binding free energy (ΔΔG). (industrybenchmarks2024.readthedocs.io) In plain English, it tries to answer a medicinal chemist’s favorite question: if I swap this group for that one, do I gain potency or lose it? That is why relative binding free energy, not just a docking score, matters so much in lead optimization.

### Why weren’t older benchmarks enough?

The main complaint is that many public FEP sets were curated around what current methods already handle well. That makes them useful, but also flattering. Uni-FEP’s authors argue that these cleaner sets understate the ugly parts of real projects: weird chemotypes, larger transformations, protonation headaches, and structure-preparation decisions that can swing results. (chemrxiv.org)

### What did Uni-FEP release?

The new benchmark is an open repository plus a preprint. The preprint says the team curated about 1,000 protein-ligand systems from ChEMBL and around 40,000 ligands, which would make it significantly larger than earlier public FEP benchmarks. The GitHub repo frames it as a long-term benchmark project for tracking Uni-FEP performance over time, not just a one-off paper dump. (chemrxiv.org)

### What makes this set feel more industrial?
The chemical moves are the key. The benchmark includes scaffold replacements, charge changes, and other transformations that are common in medicinal chemistry but harder for alchemical methods to handle reliably. That matters because a benchmark that only rewards tiny, low-risk edits stops being useful: it becomes the simulation equivalent of taking a test you already know the answers to. (chemrxiv.org)

### So is this benchmarking Uni-FEP or benchmarking FEP?

Both, but not in the same way. Formally, it is a benchmark for Uni-FEP, Atombeat’s workflow. But because the dataset is open and structured, it also becomes shared infrastructure for comparing other FEP stacks. (chemrxiv.org) The repo even includes an “FEP Open Challenge,” which invites outside researchers to submit public or published cases for comparison.

### How does this fit with the rest of the field?

There is a broader shift happening. OpenFE’s industry benchmark project is also trying to validate free-energy workflows on both public and private datasets, and a December 2025 update said OpenFE had been tested across more than 1,700 ligands with industry partners. (chemrxiv.org) So the field is moving away from single-lab, hand-picked demos and toward larger, workflow-level validation.

### What’s the catch?

A huge benchmark does not magically settle accuracy. It mostly tells you where methods break, how often setup choices matter, and whether results stay reproducible across many systems. Uni-FEP’s value is less “we proved FEP is solved” and more “here is a harder exam.” That is a big deal, because realistic failure maps are what method developers and pharma users actually need. (github.com)

### Bottom line?

The interesting part is not just that Uni-FEP posted a bigger benchmark. It is that open FEP benchmarking is starting to look like real drug discovery instead of a classroom exercise.
If that trend sticks, the winners will not just be the prettiest physics engines; they will be the workflows that survive messy chemistry, messy structures, and messy reality. (industrybenchmarks2024.readthedocs.io) (chemrxiv.org)
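For readers who want the arithmetic behind the “gain potency or lose it” question, here is a minimal sketch of what a relative FEP benchmark scores. It assumes the standard thermodynamic cycle: the predicted ΔΔG for a ligand A→B edge is the alchemical free energy of the transformation in the complex minus the same transformation in solvent, and it is compared with an experimental ΔΔG = RT ln(K_B/K_A) derived from measured affinities. The `Edge` record, all function names, the 298.15 K temperature, and every number are hypothetical illustrations, not Uni-FEP data or its API. Mean unsigned error (MUE) and RMSE are common FEP benchmark metrics, but the source does not say which metrics Uni-FEP reports.

```python
import math
from dataclasses import dataclass

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
TEMP_K = 298.15       # assumed temperature in kelvin (hypothetical choice)


@dataclass
class Edge:
    """Hypothetical ligand A -> B perturbation; all values are made up."""
    name: str
    dg_complex: float  # kcal/mol, A->B alchemical free energy in the binding site
    dg_solvent: float  # kcal/mol, the same A->B transformation in water
    k_a_nm: float      # measured affinity of ligand A in nM (Kd or Ki, treated alike here)
    k_b_nm: float      # measured affinity of ligand B in nM (Kd or Ki, treated alike here)


def predicted_ddg(edge: Edge) -> float:
    # Thermodynamic cycle: ddG_bind(A->B) = dG_complex(A->B) - dG_solvent(A->B)
    return edge.dg_complex - edge.dg_solvent


def experimental_ddg(edge: Edge, temp_k: float = TEMP_K) -> float:
    # dG_bind = RT ln(K / 1 M); the standard-state and unit terms cancel in the difference
    return R_KCAL * temp_k * math.log(edge.k_b_nm / edge.k_a_nm)


def affinity_ratio(ddg: float, temp_k: float = TEMP_K) -> float:
    # K_B / K_A implied by a ddG; values below 1 mean B binds more tightly than A
    return math.exp(ddg / (R_KCAL * temp_k))


def error_summary(edges: list[Edge]) -> tuple[float, float]:
    # Signed error per edge, then mean unsigned error and root-mean-square error
    errors = [predicted_ddg(e) - experimental_ddg(e) for e in edges]
    mue = sum(abs(x) for x in errors) / len(errors)
    rmse = math.sqrt(sum(x * x for x in errors) / len(errors))
    return mue, rmse


if __name__ == "__main__":
    edges = [
        Edge("edge_1", dg_complex=-3.1, dg_solvent=-1.8, k_a_nm=100.0, k_b_nm=10.0),
        Edge("edge_2", dg_complex=1.2, dg_solvent=0.8, k_a_nm=50.0, k_b_nm=400.0),
    ]
    for e in edges:
        pred = predicted_ddg(e)
        print(f"{e.name}: predicted {pred:+.2f}, experimental {experimental_ddg(e):+.2f} kcal/mol, "
              f"implied K_B/K_A {affinity_ratio(pred):.2f}")
    mue, rmse = error_summary(edges)
    print(f"MUE {mue:.2f} kcal/mol, RMSE {rmse:.2f} kcal/mol")
```

On a set with roughly 1,000 systems, the single aggregate number matters less than which edges carry the large errors; grouping the per-edge errors by transformation type is one way to turn them into the kind of failure map described under “What’s the catch?”.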