AI solutions are quickly becoming a staple of the mainstream healthcare tech stack, with two out of three physicians reporting some use of AI in 2024, an increase of nearly 80% over 2023. But while most physicians report use cases around daily workflow tasks like documentation, billing and notes, AI is showing the most promise and value in meeting the demands of high-variance, high-volume data analytics in areas like genomics, cancer stratification, and chronic disease prediction.
These applications in the diagnosis, treatment and delivery of personalized medicine have the potential to expand access to care, predict and optimize for better outcomes, and allow for mass customization of treatment plans to save lives.
The problem is that sources of healthcare data are a patchwork of various platforms, networks and standards, which makes fully leveraging the data and harnessing its potential challenging. Every integration, acquisition, Electronic Health Record (EHR) instance, departmental tool or customization, and even every geographic location adds complexity. Fragmentation is baked in, and it’s not just inconvenient. The disconnects and disparities can actively degrade model performance, potentially impacting care quality and treatment efficacy.
To address these issues, AI-powered application developers need integration solutions that can overcome the data and deployment challenges with efficiency, accuracy and trust. Here’s how modern interoperability, integration and exchange solutions are allowing developers to conquer some of the biggest hurdles in AI deployment.
Massive data sets
It’s well known that genetic sequencing is key to developing personalized treatments for everything from pain management to mood disorders and even cancer. Yet a single genome can contain up to 20 GB of data, which makes analyzing that volume at scale extremely difficult. Adding to the complexity, clinical data is multi-modal: treatment decisions require the AI tool to make sense of not only patient genomics, but also lab data, imaging and clinician notes. That kind of interpretation is impossible without infrastructure that can connect, match and normalize across different data sources, formats and standards.
SOPHiA GENETICS overcomes these obstacles with an automated data pipeline that supports secondary and tertiary analysis, seamlessly integrating EHR, LIMS, and sequencing data through flexible API-enabled connections. The SOPHiA DDM™ Platform processes over 1.9 million genomic profiles globally, delivering AI-driven support for cancer diagnosis and treatment decisions with high accuracy. An edge integration architecture, which delivers real-time data at speeds of up to 10 GB/sec and keeps PHI securely within firewalls, gives clinicians faster, more secure access to actionable insights, directly within their existing workflow.
Interoperability & disparate systems
Aside from the sheer volume of data, integrating it from a wide range of sources to get a complete picture of a patient is a formidable challenge. Instance customizations, variations in local and specialty-specific standards, and the risk of incomplete data further complicate integration.
For example, Zephyr AI combines data from medical and pharmacy claims, EHRs, medical devices, genetic sequencing, cancer registries, molecular clinical data, and more to help insurers anticipate patient health risks and prioritize opportunities to improve outcomes and lower costs. Some of those datasets carry structured ICD, SNOMED, CPT, and LOINC codes, while others arrive as semi-structured or unstructured text.
But when the data arrives, it’s often messy and disorganized. In just one example, Zephyr found more than 60 different text strings used to represent a single eGFR lab test, mapping to 10 different LOINC codes across 5.4 million rows of data. That’s just one test. Multiply it by site-specific naming, local conventions, and inconsistent coding, and it’s clear that if the data weren’t harmonized, it wouldn’t just be messy; it would be completely unusable.
To overcome the problem, Zephyr uses an automated terminology management platform that streamlines code set mapping, validates required codes and confirms formatting. It filters out irrelevant data and curates features of interest, for example those used to predict which patients are at the highest risk for an adverse outcome such as a diabetic foot ulcer.
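To make the harmonization step concrete, here is a minimal Python sketch of the kind of terminology normalization such a platform automates. The lookup table, label-cleaning rules and eGFR examples are illustrative assumptions, not Zephyr’s actual mappings; a production system would manage thousands of curated variants and route unmapped labels to human review.

```python
import re

# Hypothetical mapping of normalized lab-test strings to LOINC codes.
# In practice this table is curated in a terminology management
# platform and covers thousands of local variants.
EGFR_LOINC_MAP = {
    "egfr": "62238-1",
    "egfr ckd epi": "62238-1",
    "estimated gfr": "62238-1",
    "glomerular filtration rate estimated": "62238-1",
}

LOINC_PATTERN = re.compile(r"^\d{1,5}-\d$")  # e.g. "62238-1"

def normalize_label(raw: str) -> str:
    """Collapse case, punctuation and whitespace so ' eGFR (CKD-EPI) '
    and 'EGFR CKD EPI' resolve to the same lookup key."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", raw.lower())
    return " ".join(cleaned.split())

def map_to_loinc(raw_label: str) -> str | None:
    """Return a format-validated LOINC code for a messy lab label, or
    None if the label needs human review before entering the pipeline."""
    code = EGFR_LOINC_MAP.get(normalize_label(raw_label))
    if code and LOINC_PATTERN.match(code):
        return code
    return None

if __name__ == "__main__":
    for label in ["eGFR (CKD-EPI)", "Estimated GFR", "GFR, measured"]:
        print(f"{label!r:30} -> {map_to_loinc(label)}")
```

The last example deliberately falls through to None, which is the point: anything that can’t be mapped confidently gets flagged for curation rather than silently fed to a model.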
This automation allows Zephyr not only to prep 98 billion rows of messy real-world data for machine learning in under 30 days, but also to operationalize its workflows and apply them at scale to other parts of the pipeline. For example, additional mappings have uncovered another 35 million lab results from unstandardized text values that were previously unusable, all from a single team member uploading a code system, creating the properties and operationalizing the API in one day. The result: faster insights, dramatically less manual effort, and accelerated time to value.
Bridging new & legacy systems
The patchwork of healthcare IT systems isn’t limited to various EHRs; version drift across systems is also a major obstacle. Within the same health system, one location may be using ICD-10 2021 while another has updated to the 2023 edition, not to mention a mix of standards old and new, such as HL7v2, DICOM and FHIR, plus a wide range of both proprietary and standard data models. With this much variation in standards, formats and implementations, connecting to an EHR once doesn’t mean you’ve solved EHR integration.
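As a simplified illustration of why “connected once” isn’t the same as “integrated,” consider the same lab result arriving as an HL7v2 OBX segment from one facility and as a FHIR Observation from another. The Python sketch below, with deliberately simplified segment parsing and a hypothetical common shape, shows the kind of normalization an integration layer performs before any model sees the data.

```python
import json

def from_hl7v2_obx(segment: str) -> dict:
    """Parse a simplified HL7v2 OBX segment like
    'OBX|1|NM|2160-0^Creatinine^LN||1.2|mg/dL' into a common shape."""
    fields = segment.split("|")
    code, display, system = fields[3].split("^")  # OBX-3: coded identifier
    return {"code": code, "display": display, "system": system,
            "value": float(fields[5]), "unit": fields[6]}

def from_fhir_observation(resource: dict) -> dict:
    """Flatten a FHIR R4 Observation with a valueQuantity into the same shape."""
    coding = resource["code"]["coding"][0]
    qty = resource["valueQuantity"]
    return {"code": coding["code"], "display": coding.get("display", ""),
            "system": coding["system"], "value": qty["value"], "unit": qty["unit"]}

if __name__ == "__main__":
    v2 = "OBX|1|NM|2160-0^Creatinine^LN||1.2|mg/dL"
    fhir = {
        "resourceType": "Observation",
        "code": {"coding": [{"system": "http://loinc.org",
                             "code": "2160-0", "display": "Creatinine"}]},
        "valueQuantity": {"value": 1.2, "unit": "mg/dL"},
    }
    print(json.dumps([from_hl7v2_obx(v2), from_fhir_observation(fhir)], indent=2))
```

Note that even the code-system identifier differs between the two feeds (“LN” in the HL7v2 segment versus “http://loinc.org” in FHIR), which is exactly the kind of mismatch an integration engine has to reconcile on every interface, for every version, at every site.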
This is especially true in M&A situations, where data inconsistency across acquired systems can be a major barrier to AI adoption. Take the West Virginia University Health System (WVUHS), which has acquired and connected 21 hospitals, five specialty institutes, including a groundbreaking Alzheimer’s research program and the Rockefeller Neuroscience Institute, along with numerous rural health clinics across five states. Each acquisition brought its own legacy systems, data formats, and workflows.
For health systems looking to leverage AI, this kind of variation poses a serious challenge: if the underlying data isn’t standardized, AI models can’t deliver accurate or safe insights. The WVUHS team adopted a low-code, GUI-based integration engine that made integration tasks easy to build, eliminated extra manual coding and accelerated development and testing. This allowed the team to onboard new facilities and systems faster, cutting interface development time by more than 50% through reusable “templates” that make integrations repeatable and reduce errors and variance.
WVUHS’s approach demonstrates what it takes to prepare for AI: A strong integration strategy that ensures data consistency, interoperability, and continuous care delivery—without downtime. It’s a blueprint for how provider organizations can build the technical foundation needed to successfully adopt AI.
Data quality, trust & accuracy
AI solutions have already faced an uphill battle in winning the trust of clinicians, so ensuring data quality, accuracy and security is an essential foundation of any development effort.
As with any AI solution, the output is only as good as the inputs, no matter how good the model is. A recent CHIME ViVE focus group of CIOs reinforced this concern: many confessed they’re drowning in data from acquired organizations, with no audit trail and minimal transparency. If the data isn’t normalized and governed, no AI built on top of it can be trusted, because there’s no way to tell what information it’s drawing on.
In order for personalized medicine to work, platforms must connect genetic profiles, clinical encounters, lab tests, and treatment outcomes to the same individual. Yet disparate records and inconsistent person data are major hurdles. With person data split across systems and sites, connecting insights at the individual level requires efficient and accurate reconciliation. Something as common as a misspelled name (“Jonh” vs. “John”), transposed birth dates (04/12/1975 vs. 12/04/1975), or conflicting address formats (e.g., “123 N Main Street” vs. “123 North Main St. Apt #4”) can break the link between records. Basic matching algorithms often fail to reconcile these differences, fragmenting the patient record and generating AI insights that are incomplete or, worse, dangerously misleading.
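As a rough illustration of what more robust reconciliation has to do, the Python sketch below normalizes addresses, compares names and birth dates, and produces a weighted match score. The field weights, abbreviation table and 0.85 threshold are illustrative assumptions; production person-matching platforms use far more sophisticated probabilistic or referential techniques.

```python
import re
from datetime import date
from difflib import SequenceMatcher

def normalize_address(addr: str) -> str:
    """Expand common abbreviations and strip punctuation so
    '123 N Main Street' and '123 North Main St.' compare as equal."""
    subs = {"n": "north", "s": "south", "e": "east", "w": "west",
            "st": "street", "ave": "avenue", "apt": "apartment"}
    tokens = re.sub(r"[^a-z0-9 ]", " ", addr.lower()).split()
    return " ".join(subs.get(t, t) for t in tokens)

def similarity(a: str, b: str) -> float:
    """Simple string similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted score across name, birth date and address.
    Weights (and the 0.85 threshold below) are illustrative only."""
    name = similarity(rec_a["name"].lower(), rec_b["name"].lower())
    dob = 1.0 if rec_a["dob"] == rec_b["dob"] else 0.0
    addr = similarity(normalize_address(rec_a["address"]),
                      normalize_address(rec_b["address"]))
    return 0.4 * name + 0.35 * dob + 0.25 * addr

a = {"name": "Jonh Smith", "dob": date(1975, 4, 12),
     "address": "123 N Main Street"}
b = {"name": "John Smith", "dob": date(1975, 4, 12),
     "address": "123 North Main St. Apt #4"}

score = match_score(a, b)
print(f"score={score:.2f} -> {'likely match' if score > 0.85 else 'needs review'}")
```

Even this toy example links the “Jonh”/“John” records that an exact-match join would split into two patients; the hard part at health-system scale is doing this accurately across millions of records and dozens of source formats.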
Solutions that leverage automated person data management to connect patient records across systems and sites turn disparate, messy records into clean, AI-ready inputs. Localization support to handle real-world variations can ensure consistency across multiple sites, and platforms that can scale across global regions offer even greater flexibility for analyzing larger population data sets.
Not to mention, clinicians are more likely to trust something they’ve helped build than something simply handed to them. Co-design isn’t optional, and neither is transparency about where models can break down. Involving end-users from the start is key to understanding real-world use cases and building buy-in. If the system creates more work or doesn’t help them make decisions faster and more confidently, they’ll ignore it, no matter how technically impressive it is.
Delivering solutions, not promises
The need for cost-effective, efficacious treatment and prevention strategies has never been greater, and AI has the potential to make a significant impact through advanced diagnosis, treatment development and personalized medicine. Yet the talent deficit has also never been deeper, both at the point of care and in the specialized skills needed for clinical integration of next-generation AI tools.
Health systems don’t need more tools. They need better tools that plug into what they’re already using, reduce manual labor and overhead, and make clean data the default, not the exception. Through efficient, accurate and scalable data integration, AI developers can deliver infrastructure that enables reuse, compliance, and deployment speed — without creating more tech debt.