
PD Proteomics: Validating Response Biomarkers in Pediatric IBD
Participants
8
End Date
01.04.26
Dataset
dj94p0ba
Resources
2 CPU (8.59 GB) | 1 GPU (22.49 GB)
Compute
0 / 100.00 PF
Submits
0/5

Validating Early Anti-TNF Response Prediction on an Independent Longitudinal Cohort
Pharma and biotech companies develop pharmacodynamic (PD) biomarker models from their own clinical trial data to predict treatment response from early proteomic signals. These models perform well internally, but without validation on an independent external cohort, there is no evidence they generalize beyond the original trial population, drug, and dosing regimen. Regulatory agencies and internal decision makers increasingly require this external evidence before a PD biomarker panel can advance to clinical use.
tracebloc provides secure access to an independent longitudinal PD proteomics dataset held at a clinical institution, enabling researchers to re-run their internally trained models on external data without the data leaving the hospital. This allows two critical validation steps: confirming whether previously selected proteomic biomarkers remain informative in an independent cohort, and assessing whether model performance is consistent with what was observed on internal data.
To be completed after evaluation concludes.
SCIVIAS: Seeing Childhood Illness through Multi-Omics
SCIVIAS is a monocentric observational study conducted at the Dr. von Hauner Children’s Hospital, LMU Munich, led by Prof. Dr. Dr. Christoph Klein. The study combines retinal imaging (fundus photography, OCT) with multi-omics profiling (genome, transcriptome, proteome, metabolome) to identify early diagnostic markers for rare and chronic childhood diseases.
The core premise: children with rare diseases are often diagnosed only when complications arise. SCIVIAS aims to change this by integrating pattern recognition on retinal images with multi-layer omics data, using machine learning to detect disease signatures before clinical manifestation. All omics data and retinal images are pseudonymized and processed through ML algorithms, comparing data both within defined disease groups and across phenotypes to uncover pleiotropic factors.
The cohort consists of 2500 patients and covers 13 therapeutic areas including IBD (Crohn’s, ulcerative colitis, celiac disease), cystic fibrosis, Duchenne muscular dystrophy, spinal muscular atrophy, and other rare pediatric conditions.
Ethics approval: LMU Munich, approval no. 17–801. German Clinical Trials Register: DRKS00013306.
Study page: https://www.ccrc-hauner.de/clinical-research/scivias-study
For this challenge, the proteomic and pharmacokinetic monitoring layer of the SCIVIAS IBD cohort provides the foundation. The dataset captures repeated proteomic measurements across five timepoints during the first two weeks of biologic induction therapy in pediatric IBD patients (Crohn’s disease and ulcerative colitis), combined with pharmacokinetic drug exposure parameters and clinical measurements. This longitudinal design captures the biological response to therapy as it unfolds, from hours after first infusion through early treatment stabilization, making it an ideal external validation cohort for PD biomarker models developed on internal anti-TNF trial data.
Anti-TNF biologics (infliximab, adalimumab) are the backbone of treatment for moderate-to-severe pediatric inflammatory bowel disease. But response rates in children are inconsistent: roughly 30 to 40% of pediatric IBD patients lose response to anti-TNF therapy within the first year. Identifying responders early, ideally within the first two weeks of induction, would transform clinical practice: non-responders could be switched to alternative biologics (vedolizumab, ustekinumab) before the disease progresses further, and trial designs could be enriched for likely responders to demonstrate efficacy more efficiently.
Pharma and biotech companies developing next-generation biologics for pediatric IBD build pharmacodynamic biomarker models from their own trial data, selecting proteomic features measured at early timepoints (24 hours through Day 14 post-infusion) that predict clinical response at week 12 or later. These models work well on internal data. But a model trained on one company’s Phase II cohort, with one anti-TNF agent, one dosing schedule, and one patient mix (Crohn’s vs. ulcerative colitis ratio, disease severity distribution, prior treatment history), may not generalize. The proteomic features that predicted response internally could be drug-specific rather than disease-specific, or they could reflect the particular demographic and clinical profile of the original trial.
This challenge is designed for a specific scenario: a pharma company has already trained a PD biomarker model on its own longitudinal proteomic data from pediatric IBD patients receiving biologic therapy. The model uses protein levels measured at early timepoints after induction, combined with pharmacokinetic parameters and clinical variables, to classify patients as responders or non-responders. The company now needs to answer two questions on an independent external dataset:
1. Feature relevance: Are the proteomic biomarkers that were selected as predictive on internal data still informative in this independent pediatric IBD cohort? If the internal model relied on specific inflammatory or immune pathway proteins at specific post-infusion timepoints, do those same features carry signal in a different patient population with a potentially different mix of Crohn’s and ulcerative colitis, different disease severity, and different prior treatment exposure?
2. Performance consistency: Does the model achieve comparable classification performance (as measured by MSE) on external data? A significant performance drop indicates overfitting to the original trial population, drug-specific pharmacodynamic effects that do not transfer across biologics, or IBD-subtype-specific dynamics that limit generalizability.
This is not exploratory biomarker discovery. It is a validation exercise where the model and feature set already exist, and the question is whether they hold up when applied to data the model has never seen, from a cohort it was never trained on.
Researchers work with a longitudinal PD proteomics dataset (800 observations, 183 features) from pediatric IBD patients receiving biologic induction therapy. Each observation represents one patient at one timepoint. The same patients appear at multiple timepoints (Baseline, 24h, 72h, Day 7, Day 14), creating a repeated-measures structure across the first two weeks of treatment. The timepoints map to the standard anti-TNF induction monitoring window: Baseline (pre-infusion), acute response (24h, 72h), early maintenance (Day 7), and induction completion (Day 14).
The dataset contains three feature blocks: proteomic measurements capturing inflammatory and immune pathway protein levels, pharmacokinetic parameters reflecting drug exposure and clearance, and clinical variables. Feature names are anonymized. A timepoint variable identifies when each observation was collected, and a patient identifier links observations across timepoints.
The classification target is a binary label representing treatment response (responder vs. non-responder). This label is constant across timepoints for each patient: the goal is to predict this outcome from proteomic data collected at various stages of early treatment.
Longitudinal PD proteomics data from pediatric IBD patients with treatment response labels is among the hardest clinical datasets to access externally. It combines detailed biologic drug exposure information, dense temporal sampling during induction, and patient-level response outcomes, all of which are tightly governed. Pharma and biotech companies cannot share this data across organizations, and individual companies rarely have more than one or two pediatric IBD trials’ worth of longitudinal proteomic data to validate against. tracebloc resolves this: researchers submit their pre-trained model, it runs on the external dataset inside the hospital’s infrastructure, and only aggregate validation metrics are returned. The data never moves.
Binary classification: predict treatment response (responder vs. non-responder) from proteomic, pharmacokinetic, and clinical features measured during the first 14 days of therapy. The dataset has a repeated-measures structure, with each patient observed at up to five timepoints. The task is framed as external validation: researchers bring a model developed on their own internal data and evaluate whether it generalizes to this independent cohort. Researchers may also train new models directly on this dataset to benchmark against their internal results.
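Because each patient contributes up to five rows, anyone training a new model on this dataset would probably want to hold out whole patients rather than individual rows; otherwise rows from the same patient, which share one label, end up on both sides of the split. The following is a minimal sketch of a patient-level split, not the platform's method. It assumes rows are plain dicts with a hypothetical `patient_id` key; the real column names are anonymized and may differ.

```python
import random


def split_by_patient(rows, test_fraction=0.2, seed=0):
    """Split rows so that every patient lands entirely in train or in test.

    `rows` is a list of dicts. The key "patient_id" is a hypothetical name
    used for illustration only.
    """
    # Sort before shuffling so the result depends only on the seed.
    patient_ids = sorted({row["patient_id"] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)

    # Hold out at least one patient.
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])

    train = [row for row in rows if row["patient_id"] not in test_ids]
    test = [row for row in rows if row["patient_id"] in test_ids]
    return train, test
```

Splitting on the patient identifier rather than on rows keeps the held-out estimate closer to what external validation measures: performance on patients the model has never seen.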
Mean Squared Error (MSE). Lower is better. For binary classification, the MSE between predicted probabilities and the 0/1 response label is the Brier score, which jointly evaluates discrimination (can the model separate responders from non-responders?) and calibration (are the predicted probabilities accurate?). For external validation, the key comparison is between the MSE achieved on internal data and the MSE achieved here. Consistent performance indicates generalizability. A significant degradation points to overfitting or population-specific effects.
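To make the metric concrete, here is a minimal sketch of the MSE (Brier score) calculation in plain Python. The function name and the example numbers are illustrative assumptions, not part of the tracebloc evaluation code.

```python
def brier_score(labels, probabilities):
    """Mean squared error between 0/1 labels and predicted probabilities."""
    if not labels or len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must be non-empty and equal in length")
    squared_errors = [(p - y) ** 2 for y, p in zip(labels, probabilities)]
    return sum(squared_errors) / len(squared_errors)


# Illustrative values only: 1 = responder, 0 = non-responder.
labels = [1, 0, 1, 0]
uninformative = [0.5, 0.5, 0.5, 0.5]    # always predicts 50%
confident_right = [0.9, 0.1, 0.8, 0.2]  # leans the correct way each time

baseline_mse = brier_score(labels, uninformative)
model_mse = brier_score(labels, confident_right)
```

A model that always outputs 0.5 scores exactly 0.25 regardless of the labels, which makes 0.25 a useful reference point: an MSE at or above it suggests the predicted probabilities carry little usable signal on the cohort in question.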
183 features across 800 observations (repeated measures across approximately 160 unique patients at 5 timepoints).
| Feature Block | Count | Notes |
|---|---|---|
| Proteomic measurements | 150 | Continuous. Anonymized protein level readouts measured at each timepoint. |
| Pharmacokinetic parameters | 10 | Continuous. Drug exposure variables (concentration, clearance, distribution metrics). |
| Clinical measurements | 20 | Continuous. Clinical phenotype variables. |
Additionally: a timepoint variable (5 levels: Baseline, 24h, 72h, Day7, Day14), a patient identifier linking observations across timepoints, and a binary label (responder/non-responder).
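The layout described above implies a few invariants: timepoints come from a fixed set of five levels, each patient appears at most once per timepoint, and the label does not change within a patient. A researcher aligning an internal table to this schema could check those invariants with something like the sketch below. The keys `patient_id`, `timepoint`, and `label` are hypothetical names chosen for illustration, since the actual feature names are anonymized.

```python
from collections import defaultdict

# Timepoint levels as listed in the dataset description.
EXPECTED_TIMEPOINTS = ("Baseline", "24h", "72h", "Day7", "Day14")


def check_repeated_measures(rows):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    labels_by_patient = defaultdict(set)
    timepoints_by_patient = defaultdict(list)

    for row in rows:
        pid = row["patient_id"]
        labels_by_patient[pid].add(row["label"])
        timepoints_by_patient[pid].append(row["timepoint"])
        if row["timepoint"] not in EXPECTED_TIMEPOINTS:
            problems.append(f"patient {pid}: unknown timepoint {row['timepoint']!r}")

    for pid, timepoints in timepoints_by_patient.items():
        if len(timepoints) != len(set(timepoints)):
            problems.append(f"patient {pid}: duplicate timepoint")
        if len(labels_by_patient[pid]) > 1:
            problems.append(f"patient {pid}: label changes across timepoints")

    return problems
```

Patients with fewer than five timepoints are deliberately not flagged, since the description says each patient has up to five observations.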
Approximately balanced between responders and non-responders.
Each patient has up to five observations across the first two weeks of treatment. The timepoints are approximately balanced. This repeated-measures structure is critical for validation: researchers whose internal models use specific timepoints (e.g., a 72h proteomic snapshot) can test whether those same timepoint features predict response in the external cohort. Researchers whose models use temporal trajectories (protein change over time) can validate whether the dynamics they observed internally replicate here.
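The two modelling styles mentioned here, single-timepoint snapshots and temporal trajectories, correspond to two different ways of reshaping the long-format table. The sketch below illustrates both under stated assumptions: rows are dicts with hypothetical `patient_id`, `timepoint`, and `features` keys, and a trajectory is reduced to a simple difference between two timepoints, which is only one of many possible trajectory definitions.

```python
def snapshot(rows, timepoint):
    """Map patient_id -> feature dict for a single timepoint."""
    return {
        row["patient_id"]: row["features"]
        for row in rows
        if row["timepoint"] == timepoint
    }


def trajectory_delta(rows, start="Baseline", end="Day7"):
    """Per-patient change in each feature between two timepoints.

    Patients missing either timepoint are skipped, because each patient
    has only up to five observations.
    """
    first = snapshot(rows, start)
    last = snapshot(rows, end)
    deltas = {}
    for pid in first.keys() & last.keys():
        deltas[pid] = {
            name: last[pid][name] - first[pid][name]
            for name in first[pid]
            if name in last[pid]
        }
    return deltas
```

For example, `snapshot(rows, "72h")` would feed a model built on the 72h readout, while `trajectory_delta(rows, "Baseline", "Day7")` would feed one built on change from baseline.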
External validation of PD biomarker models requires a controlled environment where the same evaluation metric and the same data governance apply to every model. Without this, performance claims from internal validation are unverifiable. tracebloc provides the infrastructure: researchers submit models developed on their own data, those models execute on the external dataset inside the hospital, and MSE is computed under standardized conditions. This produces reproducible, auditable external validation evidence.
tracebloc provides secure access to clinical pharmacodynamic data held at hospitals. Researchers interact through a controlled environment where they receive exploratory data analysis outputs to understand the external dataset and assess compatibility with their internal model’s feature set. They then submit model code that executes on the institution’s infrastructure. Raw patient data never leaves the hospital. Model weights are not extractable. Only aggregate performance metrics are returned.
Primary: MSE on the binary response label. For external validation, the critical comparison is internal vs. external MSE. Consistent performance across cohorts is the evidence that the PD biomarker panel generalizes.
Compute efficiency within the allocated budget. Pre-trained models being validated should require minimal compute. Researchers training new models from scratch on this dataset face the standard resource allocation trade-off, particularly if using temporal or sequence-based architectures.
Three validation questions define the scientific value of this challenge. First: do the proteomic features selected on internal anti-TNF trial data remain predictive in this independent pediatric IBD cohort, or does the model need to be refitted with different features? Second: does the temporal structure that was informative internally (e.g., a 72h post-infusion snapshot, or the trajectory from Baseline to Day 7) replicate in a cohort with a potentially different mix of Crohn’s and ulcerative colitis? Third: does the model maintain discrimination at the earliest timepoints, where validated PD prediction would have the greatest clinical impact? If response can be predicted at 72 hours post-infusion rather than Day 14, non-responders could be identified before the second infusion, enabling an earlier switch to alternative biologics and reducing exposure to ineffective therapy.
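The third question amounts to breaking the metric down by timepoint. Whether the platform reports MSE per timepoint is not stated here, so the sketch below is meant for predictions a researcher can inspect directly, such as an internal hold-out set. The keys `timepoint`, `label`, and `prob` are hypothetical names.

```python
from collections import defaultdict


def mse_by_timepoint(rows):
    """Mean squared error of predicted probabilities, grouped by timepoint.

    Each row is a dict with "timepoint", "label" (0 or 1) and "prob"
    (predicted probability of response).
    """
    squared_errors = defaultdict(list)
    for row in rows:
        squared_errors[row["timepoint"]].append((row["prob"] - row["label"]) ** 2)
    return {
        timepoint: sum(errors) / len(errors)
        for timepoint, errors in squared_errors.items()
    }
```

If the MSE at 24h or 72h is close to the MSE at Day 14, the early readouts carry most of the predictive signal; a large gap would suggest the model depends on later measurements.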
To be completed after evaluation concludes.
To be completed after evaluation concludes.