HealthFormer predicts 27 of 30 endpoints

- Weizmann Institute researchers posted HealthFormer on April 30, a generative transformer that models longitudinal human physiology from 15,000-plus deeply phenotyped participants.
- The headline result is breadth: better prediction on 27 of 30 disease or mortality endpoints, plus intervention simulation matching all 41 trial directions.
- It pushes “digital twin” medicine closer, but this is still a preprint, not a clinical tool or a substitute for randomized trials.

A health model is only useful if it can do more than spit out one score. It has to track how a body changes over time, across sleep, blood tests, glucose, microbiome, behavior, and medication use. That is the gap HealthFormer is trying to close. The new preprint, posted April 30 by researchers at the Weizmann Institute of Science and collaborators, frames the model as a generative “world model” of human physiology rather than a single-purpose risk predictor. (arxiv.org)

### What is HealthFormer?

HealthFormer is a decoder-only transformer trained on longitudinal data from the Human Phenotype Project, a cohort with more than 15,000 deeply phenotyped people followed across multiple visits. The model turns each person’s trajectory into tokens across 667 measurements spanning seven domains, including blood biomarkers, body composition, sleep physiology, continuous glucose monitoring, gut microbiome, and wearable physiology. It tries to learn the sequence of how a person’s health state tends to evolve. (arxiv.org)

### Why is that different from a normal clinical model?

Most clinical models do one narrow job. They predict diabetes risk, or cardiovascular risk, or mortality, and they are trained specifically for that endpoint. HealthFormer is built from one general objective, predicting the next pieces of a person’s physiological trajectory, and then answers many downstream questions from that shared model. That matters because real health is not siloed. Behavior affects body composition. The pitch here is that one model can capture those cross-links instead of treating them as separate spreadsheets. (arxiv.org)

### How good was it at prediction?

The eye-catching number is 27 of 30. In the paper’s tests, HealthFormer improved prediction for 27 incident-disease and mortality endpoints and beat established clinical risk scores in every comparison the authors report. It also transferred to four independent cohorts without task-specific retraining.
That is a strong result if it holds up, because cross-cohort transfer is where many biomedical models start to wobble. (arxiv.org)

### What about interventions?

This is the more interesting part. The authors did not just ask whether the model can forecast what happens next. They asked whether it can simulate what happens if you change something. In a held-out personalized nutrition trial, intervention-conditioned predictions recovered six-month biomarker changes at the individual level, including a Pearson correlation of 0.78 for diastolic blood pressure. That hints at a model you can query with a “what if.” (arxiv.org)

### Did it line up with real trials?

To a point, yes. The paper compares model outputs with 41 randomized intervention-outcome comparisons drawn from published trials. The predicted direction of effect matched all 41, and the predicted mean landed inside the reported 95% confidence interval in 30 cases. That does not mean the model has replicated those trials. But it does suggest the model is learning something more structured than loose correlation. (arxiv.org)

### So is this a medical digital twin?

Not yet, but that is clearly the ambition. A useful analogy is a weather model. You do not ask it for one fixed answer; you ask how conditions change under different inputs. HealthFormer is trying to do that for metabolism and physiology. The catch is that medicine is messier than weather, and even impressive retrospective validation can break when a model meets real clinical decisions. (arxiv.org)

### What are the limits right now?

The biggest one is simple: this is a preprint, not a deployed clinical system. The work has not yet cleared peer review, and the paper itself positions HealthFormer as an “initial” health world model. Also, outperforming risk scores and matching trial directions is not the same as proving causal reliability for treatment decisions.
A model can be directionally right and still miss who benefits, who gets harmed, or why. (arxiv.org)

### Bottom line

The real news is not just that HealthFormer predicted a lot of endpoints. It is that one multimodal transformer appears able to forecast health trajectories and run plausible intervention simulations from the same learned representation. If that survives external testing, it could become a serious research tool for biomarker discovery and trial design. But for now, it is an impressive map, not the territory. (arxiv.org)
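
For readers who want to see what the trial comparison amounts to, the two checks reported for the 41 comparisons (did the predicted direction match, and did the predicted mean land inside the reported 95% confidence interval) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' evaluation code: the `TrialComparison` class, the function names, and every number in `toy_comparisons` are invented for the example and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class TrialComparison:
    """One intervention-outcome comparison (hypothetical structure)."""
    predicted_mean: float  # effect simulated by the model
    reported_mean: float   # effect reported by the published trial
    ci_low: float          # lower bound of the trial's 95% CI
    ci_high: float         # upper bound of the trial's 95% CI


def sign(x: float) -> int:
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def direction_matches(c: TrialComparison) -> bool:
    """True when the predicted and reported effects point the same way."""
    return sign(c.predicted_mean) == sign(c.reported_mean)


def inside_ci(c: TrialComparison) -> bool:
    """True when the predicted mean falls within the reported 95% CI."""
    return c.ci_low <= c.predicted_mean <= c.ci_high


def summarize(comparisons: list[TrialComparison]) -> dict[str, int]:
    """Count how many comparisons pass each check."""
    return {
        "n": len(comparisons),
        "direction_matches": sum(direction_matches(c) for c in comparisons),
        "inside_ci": sum(inside_ci(c) for c in comparisons),
    }


# Toy values invented for illustration; they are NOT results from the paper.
toy_comparisons = [
    TrialComparison(-2.1, -3.0, -4.5, -1.5),  # right direction, inside CI
    TrialComparison(-0.4, -1.2, -2.0, -0.6),  # right direction, outside CI
    TrialComparison(0.8, 0.5, 0.1, 0.9),      # right direction, inside CI
]

print(summarize(toy_comparisons))
```

The second toy row shows why the two counts can differ: a prediction can point the right way and still fall outside the trial's interval, which is the gap between the paper's 41 direction matches and its 30 in-interval means.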
