Federated Learning Powers Rare Disease Research

DNAstack and PacBio are part of a new global consortium that has created the first federated dataset of HiFi whole-genome sequences. The approach uses federated learning to enable rare disease research across sensitive datasets while preserving patient privacy. This model is critical for healthcare data sharing, where regulations often prevent information from being pooled centrally.

The HiFi Solves Global Consortium, launched in 2023, brings together nearly 30 clinical and research institutions across 15 countries to advance rare disease research using PacBio's HiFi sequencing. The initiative has already collected, or secured commitments for, more than 10,000 HiFi whole-genome sequences, making it one of the largest and most diverse federated datasets of its kind.

The federated approach, powered by DNAstack's platform, lets researchers run queries on harmonized data across institutions without centralizing protected information. This matters because rare disease research is hampered by data that is scarce and fragmented across institutional silos. The model enables international collaboration while complying with regional data privacy regulations such as GDPR and HIPAA.

PacBio's HiFi long-read sequencing is significant because it provides a more comprehensive and accurate view of the genome than short-read technologies. It excels at identifying complex structural variants, repetitive sequences, and other variation that short-read methods often miss and that is linked to rare diseases. A 2025 study by consortium members found that HiFi sequencing detected 100% of known clinically relevant variants in certain complex genes.

This model of "moving questions to the data" rather than moving the data itself represents a significant shift in how large-scale genomic analysis is performed. Instead of accessing raw patient data directly, researchers submit queries to the federated network, and the system returns aggregated results, preserving patient privacy and each participating institution's data sovereignty.

The consortium's work addresses a critical need: an estimated 300 million people worldwide are affected by rare diseases, and up to half of these cases remain unsolved with traditional diagnostic methods. By increasing statistical power through global data collaboration, researchers can better interpret rare genetic variants and shorten the time to diagnosis for patients and their families.
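To make the "moving questions to the data" pattern concrete, here is a minimal Python sketch built on stated assumptions. It is not DNAstack's API or the consortium's actual protocol, and it illustrates federated aggregate querying rather than federated model training. The `Site` class, `make_site`, `federated_carrier_query`, the variant ID `var_001`, and the `min_reportable` threshold of 5 are all hypothetical. Each site answers a carrier-count query against its own records and returns only counts; a coordinator pools those counts and withholds any pooled result that is too small to report safely.

```python
from dataclasses import dataclass, field


# Hypothetical site: genotypes map sample ID -> set of variant IDs the sample carries.
# The raw records never leave this object; only aggregate counts are returned.
@dataclass
class Site:
    name: str
    genotypes: dict = field(default_factory=dict)

    def count_carriers(self, variant_id: str) -> tuple:
        """Run the query locally and return only (carriers, total_samples)."""
        carriers = sum(1 for variants in self.genotypes.values() if variant_id in variants)
        return carriers, len(self.genotypes)


def federated_carrier_query(sites, variant_id, min_reportable=5):
    """Send the same question to every site and pool the aggregate answers.

    min_reportable is an assumed small-cell threshold: pooled carrier counts
    below it are withheld so a tiny result cannot point back to individuals.
    """
    carriers = 0
    samples = 0
    for site in sites:
        site_carriers, site_samples = site.count_carriers(variant_id)
        carriers += site_carriers
        samples += site_samples
    reported = carriers if carriers >= min_reportable else None
    return {"variant": variant_id, "samples": samples, "carriers": reported}


def make_site(name, n_samples, n_carriers, variant_id="var_001"):
    # Synthetic toy data: the first n_carriers samples carry the variant.
    return Site(name, {
        f"{name}_{i}": ({variant_id} if i < n_carriers else set())
        for i in range(n_samples)
    })


sites = [make_site("site_a", 100, 3), make_site("site_b", 80, 2), make_site("site_c", 50, 1)]
print(federated_carrier_query(sites, "var_001"))
# → {'variant': 'var_001', 'samples': 230, 'carriers': 6}
print(federated_carrier_query(sites[:1], "var_001"))
# → {'variant': 'var_001', 'samples': 100, 'carriers': None}
```

In this toy setup no single site holds enough carriers to clear the reporting threshold on its own, but the pooled query does, which is a small-scale picture of why combining cohorts across institutions adds statistical power for rare variants without any site exposing individual records.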
