
Combination MultiOmics: Validating Therapy Response on an External Cohort
Participants: 10
End Date: 01.04.26
Dataset: dgizwed1
Resources: 2 CPU (8.59 GB) | 1 GPU (22.49 GB)
Compute: 0 / 100.00 PF
Submits: 0/5

External Validation of Multi-Omics Response Models in Combination Targeted Therapy
Combination targeted therapies are increasingly central to pediatric oncology and oncology-adjacent rare disease programs. The rationale is clear: targeting two pathways simultaneously can overcome resistance mechanisms that limit monotherapy efficacy. But predicting which patients will benefit from the combination over either single agent requires multi-omics profiling that captures the interplay between genomic, proteomic, and metabolic response. Pharma and biotech companies build these predictive models on their own trial data, but without external validation on an independent cohort, there is no evidence the model generalizes beyond the original study population.
tracebloc provides secure access to an independent multi-omics combination therapy dataset held at a clinical institution. Researchers can validate internally developed models on external data, or build new models from scratch, without the data leaving the hospital. The dataset includes three treatment arms (two monotherapies and their combination) with integrated genomic, proteomic, metabolomic, and clinical features alongside a continuous treatment response outcome.
To be completed after evaluation concludes.
SCIVIAS: Seeing Childhood Illness through Multi-Omics
SCIVIAS is a monocentric observational study conducted at the Dr. von Hauner Children’s Hospital, LMU Munich, led by Prof. Dr. Dr. Christoph Klein. The study combines retinal imaging (fundus photography, OCT) with multi-omics profiling (genome, transcriptome, proteome, metabolome) to identify early diagnostic markers for rare and chronic childhood diseases.
The core premise: children with rare diseases are often diagnosed only when complications arise. SCIVIAS aims to change this by integrating pattern recognition on retinal images with multi-layer omics data, using machine learning to detect disease signatures before clinical manifestation. All omics data and retinal images are pseudonymized and processed through ML algorithms, comparing data both within defined disease groups and across phenotypes to uncover pleiotropic factors.
The cohort consists of 2500 patients and covers 13 therapeutic areas, including inflammatory bowel disease (Crohn’s disease, ulcerative colitis), celiac disease, cystic fibrosis, Duchenne muscular dystrophy, spinal muscular atrophy, and other rare pediatric conditions.
Ethics approval: LMU Munich, approval no. 17–801. German Clinical Trials Register: DRKS00013306.
Study page: https://www.ccrc-hauner.de/clinical-research/scivias-study
For this challenge, a subset of the SCIVIAS cohort provides multi-omics data from patients treated across three arms: two monotherapies and their combination. The dataset integrates gene expression, protein levels, metabolomic profiles, and clinical measurements with a continuous treatment response label, creating a unique resource for modeling combination therapy effects from molecular data.
Combination targeted therapy is standard practice in pediatric oncology and is expanding into oncology-adjacent rare disease indications. The clinical logic is straightforward: if a tumor or disease relies on two signaling pathways, blocking both simultaneously should produce better outcomes than blocking either alone. In practice, this is harder to predict than it sounds. Not every patient benefits from the combination: some respond equally well to a single agent, and others experience additional toxicity without additional efficacy.
The central question for pharma and biotech companies running combination trials: which patients are combination responders, and can this be predicted from baseline molecular data? Multi-omics profiling captures information across the biological layers that combination therapy targets: genomic mutations that define pathway dependencies, transcriptomic and proteomic activity that reflects current pathway state, and metabolomic readouts that capture downstream functional consequences. Integrating these layers into a response prediction model is the analytical foundation for precision combination therapy.
This challenge is designed for a scenario that is increasingly common in combination therapy development: a pharma company has already built a multi-omics response model on its own trial data. The model uses gene expression, proteomic, and metabolomic features to predict which patients benefit from the combination over monotherapy. The company now needs to validate this model on an independent cohort. Three validation scenarios apply:
A) Feature relevance: Are the omics features that were selected as predictive on internal data still informative in this independent cohort? If the internal model relied on specific gene expression signatures or metabolite ratios to distinguish combination responders, do those same features carry signal in a different patient population?
B) Performance consistency: Does the model achieve comparable predictive performance (as measured by MSE) on external data? A significant performance drop indicates overfitting to the original trial, drug-specific effects, or population characteristics that do not transfer.
C) Both: In most real-world validation exercises, both questions need to be answered simultaneously. A model that uses the right features but predicts poorly, or predicts well but on different features, provides incomplete validation evidence.
This dataset supports all three scenarios. Researchers can submit their pretrained model directly, retrain on external data using their original feature set, or build entirely new models to benchmark against their internal results.
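For scenario A, one simple first check is to rank the internally selected features by their univariate association with the external label. The sketch below is a minimal illustration in plain Python, not the platform's method: the feature names (`gene_017`, `prot_042`) and values are hypothetical, since the real feature names are anonymized, and absolute Pearson correlation is only one of several possible relevance measures.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, label):
    """Sort features by absolute correlation with the label, strongest first."""
    scores = {name: abs(pearson(values, label)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical feature columns and response scores (illustrative values only).
columns = {
    "gene_017": [0.1, 0.4, 0.5, 0.9],
    "prot_042": [1.0, 0.2, 0.8, 0.3],
}
label = [0.0, 0.1, 0.2, 0.3]

ranking = rank_features(columns, label)
for name, score in ranking:
    print(f"{name}: |r| = {score:.3f}")
```

With 240 samples and several hundred features, univariate correlations are noisy, so a ranking like this is better read as a sanity check on the internal feature set than as evidence of relevance on its own.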
Researchers work with a multi-omics combination therapy dataset (240 samples, 293 features). Each sample represents one patient assigned to one of three treatment arms: two monotherapies and their combination. The treatment assignment is encoded as a categorical variable. The dataset spans three omics layers (gene expression, protein levels, metabolites) plus clinical measurements, with a continuous label representing treatment response magnitude.
The continuous label (range approximately -0.2 to +0.4) likely represents a normalized treatment effect score, where higher values indicate stronger response. This is a regression task, not classification. Feature names are anonymized across all omics layers.
Combination therapy trials generate complex multi-omics datasets, but individual companies typically have data from only their own trial, with one specific drug combination, one dosing regimen, and one set of patient selection criteria. Validating whether molecular predictors of combination benefit generalize across cohorts requires access to independent combination therapy data, which is extremely rare commercially. Sharing this data externally creates compliance and re-identification risk. tracebloc resolves this: researchers submit models that execute on the external data inside the hospital, and only aggregate performance metrics are returned.
Regression: predict the continuous treatment response score from integrated multi-omics features and treatment assignment. The model receives gene expression levels, protein measurements, metabolomic profiles, clinical variables, and the treatment arm (monotherapy A, monotherapy B, or combination), and must predict the magnitude of treatment response. The treatment variable adds a layer of complexity: the model must learn not just which molecular features predict response, but how treatment context modulates that relationship.
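One way to let a model see how treatment context modulates molecular features is to one-hot encode the arm and add arm-by-feature products. The following is a minimal sketch under stated assumptions: the arm labels in `ARMS` are hypothetical (the actual encoding of the treatment column is not specified here), and explicit interaction terms are one design option among several, alongside per-arm models or tree-based learners.

```python
ARMS = ("mono_a", "mono_b", "combo")  # hypothetical labels for the three arms


def one_hot_arm(arm):
    """Encode the treatment arm as three 0/1 indicators."""
    if arm not in ARMS:
        raise ValueError(f"unknown treatment arm: {arm!r}")
    return [1.0 if arm == candidate else 0.0 for candidate in ARMS]


def add_interactions(features, arm):
    """Return the features, the arm indicators, and every arm-by-feature product."""
    indicators = one_hot_arm(arm)
    interactions = [ind * value for ind in indicators for value in features]
    return list(features) + indicators + interactions


# Two illustrative molecular features for a patient in the combination arm.
row = add_interactions([0.5, -1.2], "combo")
print(len(row), row)
```

For `k` molecular features this produces `k + 3 + 3k` columns, so the already high feature-to-sample ratio grows roughly fourfold and regularization becomes correspondingly more important.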
Mean Squared Error (MSE). Lower is better. MSE directly measures the average squared deviation between predicted and actual treatment response. In a combination therapy context, accurate magnitude prediction matters: the difference between a predicted response of 0.35 (strong combination benefit) and 0.05 (marginal benefit) drives clinical decisions about whether to escalate to combination therapy or maintain monotherapy.
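As a concrete illustration of the metric, the sketch below computes MSE in plain Python. The response values are hypothetical and were chosen only to fall inside the label's approximate range; the platform's own scoring code is not shown in this document.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared deviation between actual and predicted responses."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical response scores (not taken from the dataset).
actual = [0.35, 0.05, -0.10, 0.20]
predicted = [0.30, 0.10, 0.00, 0.20]

mse = mean_squared_error(actual, predicted)
print(f"MSE = {mse:.5f}")
```

Because errors are squared, large misses dominate the score: predicting 0.05 for a patient whose true response is 0.35 contributes a squared error of about 0.09, 36 times the contribution of a 0.05 miss.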
293 features across 240 samples. This is a high-dimensional, small-sample regime (p/n ≈ 1.2) where overfitting is the primary risk.
| Feature Block | Count | Notes |
|---|---|---|
| Gene expression | 100 | Continuous. Transcriptomic features. |
| Protein levels | 100 | Continuous. Proteomic measurements. |
| Metabolites | ~70 | Continuous. Metabolomic profiles. |
| Clinical measurements | 20 | Continuous. Clinical phenotype variables. |
Additionally: a patient identifier, a treatment variable (three arms: two monotherapies and their combination, approximately balanced), and a continuous label representing treatment response magnitude.
The label is continuous and approximately normally distributed around a mean near zero, with values ranging from slightly negative to moderately positive. This suggests a normalized treatment effect where zero represents no response and positive values represent increasing benefit.
Combination therapy response prediction from multi-omics data is both analytically complex and scientifically consequential. Researchers face multiple modeling decisions simultaneously: how to encode the treatment variable, whether to build treatment-specific or treatment-agnostic models, which omics layers to integrate, and how to handle the high p/n ratio. Without a controlled evaluation environment, it is impossible to determine whether one approach genuinely outperforms another. tracebloc provides a standardized surface where these strategies are evaluated on their merits.
tracebloc provides secure access to clinical combination therapy data held at hospitals. Researchers interact through a controlled environment where they receive exploratory data analysis outputs to understand the external dataset and assess compatibility with their internal model’s feature set. They then submit model code that executes on the institution’s infrastructure. Raw patient data never leaves the hospital. Model weights are not extractable. Only aggregate performance metrics are returned.
Primary: MSE on the continuous treatment response label. For external validation, the critical comparison is internal vs. external MSE. Consistent performance across cohorts provides evidence that the combination response model generalizes.
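The internal versus external comparison can be summarized as a relative change in MSE. The sketch below is illustrative only: the MSE values are hypothetical, and the 25% `tolerance` is an arbitrary placeholder rather than a threshold defined by this challenge.

```python
def relative_mse_change(internal_mse, external_mse):
    """Fractional change in MSE when moving from internal to external data."""
    if internal_mse <= 0:
        raise ValueError("internal_mse must be positive")
    return (external_mse - internal_mse) / internal_mse


def is_consistent(internal_mse, external_mse, tolerance=0.25):
    """True if external MSE is at most `tolerance` worse (hypothetical cutoff)."""
    return relative_mse_change(internal_mse, external_mse) <= tolerance


# Hypothetical scores: internal trial data vs. the external cohort.
change = relative_mse_change(0.0040, 0.0046)
print(f"relative change: {change:+.0%}, consistent: {is_consistent(0.0040, 0.0046)}")
```

An acceptable degradation depends on the label's variance and the clinical use case, so any cutoff is better fixed before the external score is seen.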
Secondary: compute efficiency within the allocated budget. The small sample size (240) means most standard architectures will train quickly, but the high dimensionality (293 features) creates a regularization challenge. Researchers must balance model complexity against the risk of fitting noise.
Three questions define the value of this challenge. First: can multi-omics data predict the magnitude of combination therapy benefit, that is, not just binary response but how much additional benefit the combination provides over monotherapy? Second: which omics layer carries the combination-specific signal? If gene expression predicts monotherapy response but metabolomics predicts the additional combination benefit, this has direct implications for companion diagnostic design. Third: does the treatment variable interact with molecular features in a way that standard single-arm models miss? Models that explicitly capture treatment-by-omics interactions will likely outperform those that treat the treatment arm as just another covariate.
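The trade-off between model complexity and fitting noise can be illustrated with ridge regression on synthetic data of the same shape. This is a sketch under stated assumptions, not a recommended model: the data are random stand-ins generated with NumPy, only five features are given signal, and the three `alpha` values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 240, 293  # sizes quoted in the dataset description

# Synthetic stand-in data (assumption): only the first 5 features carry signal.
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:5] = 0.05
y = X @ true_coef + rng.normal(scale=0.05, size=n_samples)


def ridge_fit(X, y, alpha):
    """Closed-form ridge solution: solve (X^T X + alpha * I) w = X^T y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ y)


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


# Hold out the last 60 samples to estimate error on unseen patients.
X_train, y_train = X[:180], y[:180]
X_test, y_test = X[180:], y[180:]

results = {}
for alpha in (1e-6, 1.0, 100.0):
    coef = ridge_fit(X_train, y_train, alpha)
    results[alpha] = {
        "train_mse": mse(y_train, X_train @ coef),
        "test_mse": mse(y_test, X_test @ coef),
        "coef_norm": float(np.linalg.norm(coef)),
    }

for alpha, stats in results.items():
    print(alpha, stats)
```

With more features than training samples, a near-zero penalty reproduces the training labels almost exactly, so training MSE says little about generalization. Held-out or cross-validated MSE is the quantity to compare across penalty strengths.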