Thoughtson_tech: multimodal prep costs $500K–$1M
- On May 21, 2026, healthcare builders circulated a post by thoughtson_tech saying multimodal preprocessing layers can take 6–12 months and cost $500,000–$1 million.
- The post’s most concrete claim was that joining DICOM imaging, HL7/FHIR records, notes and vitals is the expensive work before models run reliably.
- Next steps are visible in standards work from HL7, DICOM and ONC around FHIR, imaging mappings and core data definitions.

A May 21 post by thoughtson_tech put a hard number on a problem many healthcare AI teams describe more vaguely: the preprocessing layer needed to unify imaging, records, notes and monitoring data can cost $500,000 to $1 million and take six to 12 months to build.

The estimate referred to the work required before a model can operate on multimodal clinical data at production scale. The post cited DICOM, HL7/FHIR, clinical notes and vital signs as the main inputs that have to be normalized and linked first. That claim aligns with ongoing standards work and published integration research showing that the bottleneck is often data plumbing, not model access.

### Why does the data layer eat so much time and money?

DICOM and FHIR were built for different jobs, and that split is part of the cost. DICOM is the standard for transmitting, storing and displaying medical imaging information, while FHIR is used to exchange clinical data across healthcare systems. Bridging those systems means mapping different identifiers, timestamps, encounter structures and metadata conventions into a form that downstream applications can use consistently. (pmc.ncbi.nlm.nih.gov)

A recent peer-reviewed paper on large-scale integration of DICOM metadata into HL7 FHIR described an end-to-end pipeline to extract, convert and pseudonymize imaging metadata into FHIR “ImagingStudy” resources for research repositories. The paper framed that work as filling a gap between routine imaging data and broader clinical data environments, which is the same integration gap the social post was pointing to. (dicomstandard.org)

### What has to be built before a model can use the data?

ONC’s USCDI framework defines a standardized set of health data classes and elements for interoperable exchange, including clinical notes and imaging narratives.
But USCDI defines what data should be exchangeable; it does not remove the implementation work of extracting data from source systems, cleaning it, reconciling patient identity, handling missing fields and matching events across systems. (pmc.ncbi.nlm.nih.gov)

HL7 and DICOM have also been working on mapping projects because imaging outputs and non-imaging systems do not naturally line up. An HL7 proposal on DICOM Structured Reports to FHIR Observation mapping says AI findings may be captured in DICOM SR for radiology workflows but need to be encoded as FHIR observations to work with non-imaging systems. That is one example of why preprocessing becomes a product in its own right. (isp.healthit.gov)

### Where do teams usually underestimate the effort?

Patient matching, pseudonymization, governance and exception handling are recurring sources of delay. The Erlangen integration paper said its pipeline had to extract, convert and pseudonymize DICOM metadata before integration into repositories. In live clinical environments, teams also have to manage versioning, audit trails, access controls and data quality checks across separate vendors and legacy systems. (confluence.hl7.org)

DICOM’s own standards activity reflects that the problem is still active. The DICOM standards body lists a working group focused on integration of imaging and information systems using HL7 standards, with projects spanning imaging service requests, DICOM SR to FHIR mappings and DICOMweb-FHIR integration.

### What does that imply for product strategy?

The practical implication is narrower scope. (pmc.ncbi.nlm.nih.gov) If preprocessing is the expensive, slow-moving part, teams usually get faster payback by choosing a small number of high-volume integrations rather than promising broad multimodal coverage from the start.
The thoughtson_tech estimate points in that direction, and the standards landscape supports it: healthcare interoperability is expanding, but the implementation burden remains specific to each workflow and data source. (dicomstandard.org)

The next places to watch are standards and implementation guides. HL7’s US Core guidance continues to map USCDI requirements into FHIR profiles, while DICOM and HL7 working groups are still developing imaging-related mappings and implementation guidance for production use. (build.fhir.org)
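
To make the "extract, convert and pseudonymize" step described earlier more concrete, here is a minimal Python sketch of one narrow slice of it: turning already-extracted DICOM study tags into an R4-style FHIR ImagingStudy resource with a pseudonymized patient reference. It is an illustration under stated assumptions, not the pipeline from the paper or from the thoughtson_tech post. The function names, the `PSEUDONYM_KEY` value and the sample identifiers are hypothetical, the input is assumed to be a plain dictionary of tag values instead of a parsed DICOM file, and only the study date is carried over because a FHIR dateTime with a time component also needs a time zone.

```python
import hashlib
import hmac
from datetime import datetime

# Hypothetical secret for illustration; a real pipeline would load it from a managed key store.
PSEUDONYM_KEY = b"demo-key-not-for-production"


def pseudonymize(patient_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Return a stable keyed hash so the same patient always maps to the same pseudonym."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def dicom_date_to_fhir(study_date: str) -> str:
    """Convert a DICOM DA value (YYYYMMDD) to a FHIR date string (YYYY-MM-DD)."""
    return datetime.strptime(study_date, "%Y%m%d").date().isoformat()


def to_imaging_study(meta: dict) -> dict:
    """Build a minimal ImagingStudy dict from already-extracted DICOM tag values."""
    return {
        "resourceType": "ImagingStudy",
        "identifier": [
            {"system": "urn:dicom:uid", "value": f"urn:oid:{meta['StudyInstanceUID']}"}
        ],
        "status": "available",
        # Only the patient ID is pseudonymized here; UIDs are passed through unchanged.
        "subject": {"reference": f"Patient/{pseudonymize(meta['PatientID'])}"},
        "started": dicom_date_to_fhir(meta["StudyDate"]),
        "series": [
            {
                "uid": meta["SeriesInstanceUID"],
                "modality": {
                    "system": "http://dicom.nema.org/resources/ontology/DCM",
                    "code": meta["Modality"],
                },
            }
        ],
    }


# Made-up sample values, not real identifiers.
example = {
    "PatientID": "MRN-0001",
    "StudyInstanceUID": "1.2.840.99999.1",
    "SeriesInstanceUID": "1.2.840.99999.1.1",
    "Modality": "CT",
    "StudyDate": "20260521",
}
study = to_imaging_study(example)
print(study["started"], study["subject"]["reference"])
```

Even this toy version leaves out most of what the article identifies as the expensive part: matching the pseudonymized patient to records in other systems, remapping study and series UIDs, handling missing or malformed tags, and keeping an audit trail of every transformation.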