
HIPAA-Compliant AI: Prostate Cancer Radiation Dose Optimisation

Participants: 7
End Date: 12.11.27
Dataset: dmhk0x21
Resources: 2 CPU (8.59 GB) | 1 GPU (22.49 GB)
Compute: 0 / 100.00 PF
Submits: 0 / 5


Overview

About this use case: Twelve cancer centres, five different EHR systems, HIPAA on one side of the Atlantic and GDPR on the other — two years of collaboration meetings have produced two papers on data harmonisation and no shared model. tracebloc gives every centre a shared workspace where models train on 9,600 prostate cancer records and results land on one leaderboard, with no protected health information leaving any centre's infrastructure. Explore the data, submit your own model, and see how your approach compares.

Problem

Twelve cancer centres are collaborating on outcome prediction for prostate cancer radiation therapy. Every centre has different EHR systems, different imaging archives, different compute infrastructure, and different governance requirements — two are subject to HIPAA, five operate under GDPR, and the others have institution-specific data protection policies that prohibit raw PHI transfer to external parties. Half of every collaboration meeting goes to infrastructure alignment rather than science. HIPAA-compliant AI collaboration requires a shared environment where models run on each centre's own data — not a data pipeline project.

Solution

Dr. Marcus Webb, Head of Radiation Oncology Research at the coordinating centre, deploys a tracebloc workspace loaded with 9,600 prostate cancer patient records — structured EHR data, tumour staging, imaging summaries, dosimetry planning metrics, and document-derived embeddings from clinical notes and radiation reports, all within a single 319-feature multimodal dataset. Partner centres submit their radiation dose prediction models to the workspace. Inside tracebloc's containerised training environment, models train on the patient cohort — fine-tuning weights across demographics, staging, imaging, and DVH planning features — without protected health information leaving any centre's controlled environment. tracebloc orchestrates training, scores each model against the held-out evaluation cohort, and publishes results to a live leaderboard accessible to all twelve centres. This is federated learning applied to multi-centre scientific collaboration: the collaboration happens, the science advances, and no centre runs an IT integration project.

Outcome

In this example collaboration, models trained in the shared workspace show consistent performance across centres with different EHR structures — because the workspace standardises the execution environment regardless of each centre's underlying infrastructure. The best-performing model reduces MSE on the radiation dose prediction target by 31% relative to the internal single-centre baseline. The tracebloc workspace stays in place as the consortium grows and new centres join.

The Operational Challenge

Dr. Webb's consortium has been formally constituted for two years. Twelve leading cancer centres across the US and Europe signed a scientific collaboration agreement to develop AI-assisted radiation therapy optimisation for prostate cancer. The clinical rationale was compelling: prostate cancer is the most commonly diagnosed cancer in men, radiation therapy is the primary or adjuvant treatment for a majority of patients, and dose optimisation — matching the prescribed radiation dose to each patient's specific tumour profile, anatomy, and risk category — directly affects both tumour control rates and long-term toxicity outcomes. A model trained on tens of thousands of patients across multiple centres and treatment protocols would substantially outperform any single-centre approach.

Two years later, the consortium has produced two peer-reviewed papers on data harmonisation challenges and no shared predictive model.

The obstacle is not scientific disagreement. It is infrastructure heterogeneity. Five centres run Epic EHR. Three run Cerner. Two use proprietary oncology information systems. The radiation therapy planning data — DVH metrics, organ-at-risk dose constraints, planning technique — is stored in different formats across six commercial treatment planning systems. The imaging data spans three PACS vendors. The clinical documentation — surgery notes, radiation reports, pathology summaries — exists in incompatible formats across all twelve sites. Building a centralised healthcare data sharing pipeline that harmonises all twelve centres' data into a single structure would require a multi-year IT project, custom ETL development, and a governance framework that none of the centres' legal teams have been able to agree on.

HIPAA adds a specific layer of constraint for the five US-based centres. Protected health information — which includes any data that could identify a patient, including indirect identifiers — cannot be transferred across organisational boundaries without a formal Business Associate Agreement and explicit patient consent for research use. Three of the five US centres have research consent frameworks that cover institutional use but not inter-institutional transfer. Under the current governance framework, these centres cannot contribute patient data to a shared pool at all.

The scientific data involved is genuinely multimodal. A complete prostate cancer patient record for dose prediction includes structured clinical data (age, PSA, Gleason grade, T/N/M staging, risk group), imaging summaries (MRI PIRADS score, CT tumour volume, PSMA-PET SUVmax where available), treatment context (prior prostatectomy, androgen deprivation therapy timing, salvage vs. definitive RT), radiation planning metrics (prescribed total dose, dose per fraction, technique, organ-at-risk constraints), and unstructured documentation (treatment notes, radiation reports, surgical summaries) from which signal can be extracted via embeddings. The tracebloc evaluation dataset captures this complexity: 319 features across demographics, staging, imaging, planning metrics, DVH measurements, and 128-dimensional image embeddings, alongside text embeddings from clinical documentation.
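To make the feature groups concrete, one such patient record can be pictured as a single flat structure. The sketch below is purely illustrative: the field names and values are invented for this example and are not the dataset's actual column names.

```python
# Hypothetical sketch of one multimodal patient record, grouped the way the
# 319-feature vector is described above. Field names and values are
# illustrative only -- NOT the dataset's real schema.

def build_record():
    return {
        # structured clinical data
        "age": 67, "psa_baseline": 10.2, "gleason_grade": 7,
        "t_stage": "T2c", "risk_group": "intermediate",
        # imaging summaries (PSMA-PET may be absent -> 0.0)
        "mri_pirads": 4, "ct_tumour_volume_cc": 38.5, "psma_pet_suvmax": 0.0,
        # treatment context
        "prior_prostatectomy": False, "adt": True, "salvage_rt": False,
        # radiation planning metrics
        "prescribed_dose_gy": 76.0, "dose_per_fraction_gy": 2.0,
        # DVH organ-at-risk metrics
        "bladder_mean_gy": 34.9, "rectum_mean_gy": 40.1,
        # fixed-length embedding vectors derived from images and documents
        "image_embedding": [0.0] * 128,
        "text_embedding": [0.0] * 64,
    }

record = build_record()
```

In the actual dataset these groups are flattened into one 319-column feature vector plus the `radiation_dose_gy` target.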

A model that actually helps oncologists determine the most effective radiation dosage for individual patients — accounting for tumour biology, anatomical constraints, treatment history, and real-world evidence from comparable patient cohorts — has to have seen data from across that feature space. No single centre generates enough patient volume and feature diversity to reach that performance ceiling independently. The twelve centres together do.

The collaboration is stalled not because the science is wrong but because the infrastructure problem has consumed the collaboration's capacity. Secure health data infrastructure that removes the IT barrier is the prerequisite for the science to happen.

Stakeholders

  • Dr. Marcus Webb, Head of Radiation Oncology Research (Coordinating Centre): Consortium scientific lead. KPIs: MSE on dose prediction vs. single-centre baseline, time from data access to trained model, publication output per year. Accountable to the consortium steering committee and the funding agencies supporting the collaboration.
  • Chief Information Officers (Each Centre): Responsible for data security and infrastructure compliance. Key concern: no PHI leaves the centre's controlled environment under any scenario. A workspace where models come to the data — rather than data moving to a central server — satisfies this requirement without a new data transfer agreement.
  • HIPAA Compliance Officers (US Centres): US-based centres must operate within HIPAA safe harbour or expert determination frameworks. Any AI collaboration that processes PHI must ensure PHI does not leave the covered entity's environment. Containerised model execution with no data export path is within scope.
  • Head of Radiation Oncology (Each Centre): Clinical champion. KPIs: whether the resulting dose prediction model is clinically credible — calibrated on enough patients to support treatment decisions, validated across EHR types, and explainable to treating physicians and statisticians.
  • Research IT Lead (Each Centre): Responsible for compute infrastructure and workspace deployment. Needs the collaboration to work within existing compute environments — GPU allocation, security policies, network constraints — without a custom integration build.

The Underlying Dataset

The evaluation dataset comprises a training set of 9,600 prostate cancer patient records used for model fine-tuning and a separate held-out evaluation set used for scoring. Full dataset statistics, feature distributions, and modality breakdown are available in the Exploratory Data Analysis tab.

This dataset is augmented. It was constructed to reflect the statistical structure of real-world prostate cancer radiation therapy records — the PSA distribution, Gleason grade composition, dose ranges, organ-at-risk metric distributions, and document embedding structure — without containing any protected health information or identifiable patient data.
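As a rough illustration of what "augmented" means here, one can draw synthetic values from distributions matching the published summary statistics. This is only a sketch using the age and prescribed-dose figures reported in the clinical statistics table; the actual augmentation pipeline is not described in this document.

```python
import random

# Illustrative only: sample synthetic patient values whose distribution
# mirrors the published summary statistics (age mean 67.4 / std 8.0,
# clipped to 45-90; prescribed dose mean 76.1 Gy / std 3.7, clipped to
# 66-81 Gy). This is NOT the real augmentation procedure.

def synth_patient(rng):
    age = min(max(rng.gauss(67.4, 8.0), 45.0), 90.0)
    dose_gy = min(max(rng.gauss(76.1, 3.7), 66.0), 81.0)
    return {"age_at_rt_start": round(age, 1), "prescribed_dose_gy": round(dose_gy, 1)}

rng = random.Random(42)          # fixed seed for reproducibility
cohort = [synth_patient(rng) for _ in range(1000)]
```

Real augmentation also has to preserve joint structure (e.g. the correlation between risk group and dose), which independent sampling like this does not capture.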

Property | Value
Training samples | 9,600
Features | 319 (+ radiation_dose_gy target)
Demographics / baseline | 8 features (age, etc.)
Tumour risk and staging | 14 features (PSA, Gleason, T/N/M staging, risk group)
Imaging summaries | 6 features (MRI PIRADS, CT volume, PSMA-PET SUVmax)
Treatment context | 7 features (prior prostatectomy, ADT, boost, pelvic nodes, salvage RT)
Radiation planning metrics | 12 features (prescribed dose, fractionation, technique)
DVH organ-at-risk metrics | 7 features (bladder mean, rectum mean, femoral head mean)
Document signal features | 8 features (document count, note token count, derived metrics)
Image embeddings | 128 dimensions
Text embeddings | Remaining features
Target | radiation_dose_gy — continuous (mean 76.13 Gy, range 66–81 Gy)
Evaluation metric | MSE

Key clinical statistics from the dataset:

Feature | Mean | Std | Range
Age at RT start | 67.4 years | 8.0 | 45–90
PSA baseline | 10.2 ng/mL | 9.6 | 0.2–149.8
Prescribed total dose | 76.1 Gy | 3.7 | 66–81
Dose per fraction | 2.1 Gy | 0.3 | 1.5–2.9
Bladder mean dose | 34.9 Gy | — | 9.6–60.3
Rectum mean dose | 40.1 Gy | — | 14.3–65.5

A note on PSMA-PET data: 5,813 patients (60.5%) show zero SUVmax, indicating absence of PSMA-PET imaging — which is typical in real-world prostate cancer datasets where PSMA-PET availability varies by centre, patient risk group, and year of treatment. Models that handle missing imaging modalities gracefully will outperform those that assume complete multimodal coverage.
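One common way to handle a missing imaging modality gracefully is to pair the optional feature with an explicit missingness indicator, so a zero cannot masquerade as a measurement. A minimal sketch, assuming zero SUVmax encodes "no PSMA-PET performed" (the helper name and the imputation value are illustrative, not part of the dataset):

```python
# Replace a zero SUVmax (meaning "no PSMA-PET available") with an explicit
# missingness flag plus an imputed value, so a model can distinguish
# "not imaged" from "imaged, low uptake". The cohort median used for
# imputation here (6.2) is an invented placeholder.

def encode_suvmax(suvmax_raw, cohort_median=6.2):
    """Return (value, has_psma_pet): imputed value + indicator when missing."""
    if suvmax_raw == 0.0:          # 0 encodes "no PSMA-PET performed"
        return cohort_median, 0.0  # imputed value, indicator off
    return suvmax_raw, 1.0         # observed value, indicator on

print(encode_suvmax(0.0))   # -> (6.2, 0.0)
print(encode_suvmax(11.4))  # -> (11.4, 1.0)
```

Tree-based models can often exploit the indicator directly; linear models additionally benefit from the median imputation.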

How Evaluation Works

Each partner centre submitted their radiation dose prediction model to the tracebloc workspace. The evaluation ran in two phases.

Phase 1 — Single-centre baseline. Each centre's model was benchmarked as originally trained, using only its own patient cohort, with no access to the cross-centre dataset. This establishes the honest single-centre baseline: what each centre can achieve with its own patient volume and feature coverage — and how much improvement federated training on the 9,600-patient cross-centre dataset adds.

Phase 2 — Fine-tuning on shared workspace. Partner centres were given access to the training environment inside the tracebloc workspace. Each centre transferred their model into the workspace and ran training on the 9,600-patient dataset. This fine-tuned model weights across the full multimodal feature space — demographics, staging, imaging, planning metrics, DVH constraints, and document embeddings — adapting from a model calibrated on one centre's patient mix to one trained on the complete cross-centre distribution. After training, the adapted model was evaluated automatically against the held-out cohort. Protected health information never left any centre's controlled environment. Each centre received only their own results; no centre had visibility into another's training runs or scores before the leaderboard published.
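Mechanically, Phase 2 amounts to: load pre-trained weights, continue gradient updates on the workspace dataset, then score only on the holdout. The toy sketch below illustrates that pattern with a one-feature linear model and synthetic numbers; it is not the workspace's actual training API, which this document does not show.

```python
# Toy illustration of "fine-tune, then evaluate on a held-out cohort":
# a linear model pre-trained elsewhere continues gradient-descent updates
# on new (synthetic) data, and only the holdout MSE is reported.

def mse(w, b, data):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def fine_tune(w, b, train, lr=0.01, epochs=500):
    n = len(train)
    for _ in range(epochs):
        gw = sum(2 * (w * x + b - y) * x for x, y in train) / n
        gb = sum(2 * (w * x + b - y) for x, y in train) / n
        w, b = w - lr * gw, b - lr * gb
    return w, b

# "pre-trained" single-centre weights; the new cohort follows y = 2x + 1
w0, b0 = 1.0, 0.0
train   = [(x / 10, 2 * (x / 10) + 1) for x in range(20)]
holdout = [(x / 10 + 0.05, 2 * (x / 10 + 0.05) + 1) for x in range(20)]

w1, b1 = fine_tune(w0, b0, train)
assert mse(w1, b1, holdout) < mse(w0, b0, holdout)  # fine-tuning helped
```

The key property mirrored here is that the holdout is never touched during training, which is what makes the leaderboard score an honest estimate.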

Each contributor received:

  • Training access: 9,600 prostate cancer patient records (319 features, multimodal) for model fine-tuning inside the workspace
  • Evaluation environment: Sandboxed execution — adapted models run against the evaluation set, no PHI export path available
  • Metrics tracked: MSE on radiation_dose_gy, performance breakdown by risk group (low, intermediate, high, very high), and feature importance outputs for clinical review and statistician validation
  • Modality ablation: Separate evaluation tracks for structured-only models (no embeddings) vs. full multimodal models (structured + image + text embeddings) — enabling fair comparison across architecturally different approaches and centres with different document data availability
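The headline metric and the per-risk-group breakdown are straightforward to compute. A minimal sketch with made-up predictions and targets, using the four risk-group labels listed above:

```python
from collections import defaultdict

# Overall MSE on radiation_dose_gy plus a per-risk-group breakdown,
# mirroring the tracked metrics. Predictions and targets are made up.

def grouped_mse(preds, targets, groups):
    overall = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)
    by_group = defaultdict(list)
    for p, t, g in zip(preds, targets, groups):
        by_group[g].append((p - t) ** 2)
    return overall, {g: sum(v) / len(v) for g, v in by_group.items()}

preds   = [74.0, 78.5, 70.2, 80.1]
targets = [76.0, 78.0, 72.0, 78.0]
groups  = ["low", "high", "intermediate", "high"]

overall, per_group = grouped_mse(preds, targets, groups)
```

With this tiny example the overall MSE is about 2.98, with the "high" group contributing roughly 2.33.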

Results

→ View the full model leaderboard — complete centre rankings, MSE by risk group, and modality contribution analysis across all submissions.

Centre | Single-Centre MSE | After Workspace Fine-tuning | High-Risk MSE | Modality
Centre A | 8.41 | 6.12 | 9.84 | Structured only
Centre B | 7.93 | 5.78 | 8.62 | Structured + image
Centre C ✅ | 7.64 | 5.31 | 7.18 | Full multimodal
Centre D | 8.87 | 6.34 | 10.21 | Structured only
Centre E | 9.12 | 6.71 | 11.03 | Structured + text
Centre F–L | 8.4–10.6 | 5.9–7.8 | 8.5–13.4 | Various

What the numbers reveal:

Every centre improved through workspace fine-tuning on the 9,600-patient cross-centre dataset. The gains are consistent and large: the weakest improvement (Centre A) reduced MSE by 27%, from 8.41 to 6.12. The strongest improvement (Centre C) reduced MSE by 31%, from 7.64 to 5.31. These are not marginal refinements — they are the direct quantification of how much signal was being left unreachable at single-centre patient volumes.
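The quoted improvements follow directly from the leaderboard figures, with relative MSE reduction defined as (before - after) / before:

```python
# Relative MSE reduction from the leaderboard figures:
# Centre A: 8.41 -> 6.12, Centre C: 7.64 -> 5.31.

def reduction(before, after):
    return (before - after) / before

r_a = reduction(8.41, 6.12)  # ~0.27, i.e. roughly 27%
r_c = reduction(7.64, 5.31)  # ~0.30, i.e. roughly 31%
```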

Centre C's full multimodal architecture achieves the strongest results overall, and its high-risk patient MSE of 7.18 is particularly important clinically: high-risk prostate cancer patients are the cohort where dose optimisation most directly affects tumour control. The combination of structured EHR staging data, imaging summary features, and document-derived embeddings from radiation reports and surgical notes captures signal that structured data alone misses. The 128-dimensional image embeddings and text embeddings contribute measurable predictive information beyond the structured clinical features.

Centres using structured data only (Centre A, Centre D) show consistent underperformance relative to multimodal approaches, particularly on high-risk patients where imaging and document context is most informative. This finding has a direct implication for the consortium's companion development programme: investing in imaging and document embedding infrastructure at all twelve centres is scientifically justified by the MSE improvement.

Business Impact

Illustrative assumptions:

  • 12 consortium centres
  • 800 new prostate cancer patients per centre per year (9,600 total consortium volume)
  • €4,200 cost per patient episode where dose sub-optimisation leads to avoidable toxicity or retreatment
  • IT integration project cost for centralised data pooling approach: estimated €2.8M over 3 years across all centres

Approach | Infrastructure Cost | PHI Compliance | MSE (High Risk) | Avoidable Toxicity Events (Year 1)
No collaboration (single-centre) | €0 | Compliant | 9.12 avg | Baseline
Centralised data pooling | €2.8M (IT project) | Risk — requires all centres' DPA | 5.31 target | TBD (project delay)
tracebloc shared workspace ✅ | €180K/year (workspace) | Compliant — PHI stays local | 5.31 achieved | Estimated 340 fewer events/year

The tracebloc workspace delivers the same predictive performance as centralised data pooling — the 9,600-patient training set and the resulting MSE are identical — at a fraction of the infrastructure cost and without requiring a data transfer agreement that five of the twelve centres cannot legally sign. The €2.8M IT project cost is replaced by €180K/year in workspace access fees. The two-year governance delay is replaced by a deployment timeline measured in weeks.
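Under the stated illustrative assumptions, the cost comparison works out as follows (the 340-events figure and the €4,200 per-episode cost are taken from the assumptions above, not derived here):

```python
# Three-year cost comparison under the illustrative assumptions:
# centralised pooling is a EUR 2.8M IT project over 3 years; the workspace
# costs EUR 180K/year. Avoided toxicity cost uses the stated estimate of
# 340 fewer events/year at EUR 4,200 per episode.

pooling_3yr = 2_800_000
workspace_3yr = 180_000 * 3          # 540,000 over three years

avoided_per_year = 340 * 4_200       # 1,428,000 per year

savings_vs_pooling = pooling_3yr - workspace_3yr  # 2,260,000 over 3 years
```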

Decision

Dr. Webb's consortium standardises on Centre C's full multimodal architecture as the shared radiation dose prediction model. Clinical review by the consortium's oncology leads confirms that the model's output on high-risk patients — MSE 7.18 on a dose range of 66–81 Gy — is within the range that supports decision support use: the model's predictions are presented alongside treating physician review rather than replacing clinical judgement.

A formal model governance protocol is established: predictions above a confidence threshold are flagged for statistician review before any treatment planning integration. The tracebloc workspace logs every submission, every metric, and every training run — creating the audit trail that both HIPAA compliance and clinical governance require. Oncologists at each centre can review model outputs without accessing another centre's patient records.

The workspace stays active after the initial evaluation. As each centre's annual patient cohort grows, the shared training dataset expands and models are retrained quarterly. New centres joining the consortium submit their models to the same workspace under the same terms — no new IT integration, no new data transfer agreement negotiation. The leaderboard becomes a live record of how dose prediction performance evolves as the consortium scales.

Explore this use case further:

  • View the model leaderboard — full centre rankings, MSE by risk group, modality contribution analysis
  • Explore the dataset — patient cohort statistics, feature distributions, imaging modality coverage
  • Start training — submit your own radiation dose prediction model to this evaluation

Related use cases: See how the same HIPAA-compliant collaboration approach applies to breast cancer screening and image classification, and to retinal disease classification. For a broader view of federated learning applications across healthcare and clinical research, see our federated learning applications guide.

Deploy your workspace or schedule a call.

Disclaimer

Disclaimer: The dataset used in this use case is augmented — designed to reflect the statistical structure of real-world prostate cancer radiation therapy records, including PSA distributions, Gleason grade composition, dose ranges, organ-at-risk metric distributions, and document embedding structure, without containing any protected health information or identifiable patient data. The persona, centre labels, performance figures, business impact assumptions, and consortium scenario are illustrative and based on patterns observed across multi-centre oncology research environments. They do not represent any specific institution, centre, clinical trial, or collaboration agreement.