Proteomics-Based Validation, Anti-TNF Response in Paediatric IBD

End Date: 20.08.27
Dataset: dj94p0ba
Resources: 2 CPU (8.59 GB) | 1 GPU (22.49 GB)
Compute: 0 / 100.00 PF
Submits: 0/5

Overview

About this use case: Eight European paediatric haematology centres hold the patient data needed to generate EU AI Act conformity evidence for a bleeding disorder stratification model — but organising cross-border data transfers at all eight sites would take 18 months that the regulatory submission window doesn't have. tracebloc lets every centre validate their model against the consortium cohort without a single patient record crossing institutional or national borders. Explore the data, submit your own model, and see how your approach compares.

Problem

Clinical decision support AI for paediatric patients requires something most AI deployments do not: documented proof of fairness across demographic and disease subgroups before the system touches a clinical workflow. Under the EU AI Act, systems used in therapeutic decision support are classified as high-risk, which makes multi-site validation across representative patient populations a regulatory requirement, not a best practice. For bleeding disorder stratification AI, that means validation data from paediatric haematology centres across Europe. Gathering that data by conventional means takes 18 months of ethics amendments, data transfer agreements, and patient consent renegotiation at each participating site.

Solution

Dr. Elena Vasquez, Head of Clinical Data Science at a rare bleeding disorder consortium in Amsterdam, has developed a mutation status stratification model intended for clinical decision support across European paediatric haematology centres. She deploys a tracebloc workspace loaded with 480 anonymised patient records representing the multi-omics profile of the consortium's patient population. Partner institutions — paediatric haematology centres across Europe — submit their local model versions to the workspace for cross-site validation. Inside tracebloc's containerised training environment, each centre's model trains on the consortium cohort — fine-tuning to the specific mutation marker distributions, gene expression patterns, and clinical covariate relationships in the combined dataset — without any patient record leaving its originating institution. tracebloc handles orchestration, scores each adapted model against the holdout cohort, and publishes results to a live leaderboard automatically. This is a federated learning application of regulatory compliance validation: multi-site evidence generated without multi-site data centralisation.

Outcome

In this example evaluation, the consortium model held performance across paediatric subgroups after multi-site fine-tuning — generating the cross-site validation evidence required for EU AI Act high-risk system documentation. The leaderboard records per-site performance and subgroup results. The workspace stays in place as a standing validation infrastructure, allowing model updates to be re-evaluated against the same holdout cohort without restarting the ethics approval cycle.

The Operational Challenge

Elena's consortium has developed a multi-omics stratification model for paediatric haemophilia and related bleeding disorders — 282 features spanning mutation markers, gene expression profiles, protein levels, and clinical variables — targeting binary classification of mutation status that determines treatment pathway. The model performs well on the consortium's internal patient population, 480 patients across three centres. The regulatory question is whether it performs consistently across the full range of European paediatric subgroups: different age distributions, different ethnic backgrounds, different co-morbidity profiles across eight participating centres.

This is the EU AI Act challenge in practice. High-risk AI systems used in therapeutic decision support require prospective bias testing and fairness documentation across the populations they will serve. The guidance is explicit: validation on a single-centre training cohort is not sufficient. The system must demonstrate consistent performance across the paediatric subgroups present in the deployment population.

The clinical stakes compound the regulatory requirement. Bleeding disorder stratification determines treatment intensity: whether a patient receives prophylactic factor replacement therapy, what dosing protocol is initiated, and how frequently they are monitored. A model that performs well on one ethnic subgroup and poorly on another does not just produce different numbers on a chart; it produces different clinical outcomes for different patient populations. That is not an acceptable deployment profile under any regulatory framework.

Elena's problem is the data access problem at scale. Eight European paediatric haematology centres have agreed in principle to participate in the multi-site validation. Each centre has a patient cohort of 60–120 children. None of those centres can transfer patient data to Amsterdam. Each transfer would require a new data sharing agreement under GDPR, a local ethics committee amendment, and potentially re-consent of the patients or their guardians — because the original consent covered clinical care and research within the institution, not data sharing with a consortium coordinating centre in another country.

The conventional route — negotiate eight separate data sharing agreements with eight ethics committees across five countries — takes 18 months at minimum. Elena's regulatory submission window is six months.

She needs a mechanism where all eight centres validate their local model versions against a shared cohort — and contribute their local model updates to improve the consortium model — without a single patient record crossing institutional or national borders.

Stakeholders

Dr. Elena Vasquez, Head of Clinical Data Science: Owns the stratification model and the regulatory submission dossier. KPIs: cross-site AUC, subgroup fairness metrics (paediatric age bands, demographic subgroups), EU AI Act compliance documentation
Chief Medical Officer: Responsible for the clinical safety of deploying a high-risk AI system across paediatric haematology workflows at eight European centres
Regulatory Affairs Lead: Must document multi-site validation evidence meeting EU AI Act Article 10 data requirements and Article 13 transparency obligations for the conformity assessment
Local Ethics Committees (8 centres): Each centre's ethics board has approved internal research use only; any data transfer requires a new application that resets the timeline
Data Protection Officers (per centre): GDPR Article 9 applies to health data; sharing records with a consortium coordinating centre in another country requires a documented legal basis and a data sharing agreement, plus standard contractual clauses or binding corporate rules where a centre sits outside the EEA, a months-long legal process at each site
Paediatric Haematology Clinicians: The end users of the stratification tool; they need confidence that the system performs equitably across the patient populations they treat

The Underlying Dataset

The evaluation dataset contains 480 anonymised paediatric patient records with multi-omics characterisation for bleeding disorder stratification. Full dataset statistics, feature distributions, and target class analysis are available in the Exploratory Data Analysis tab.

This dataset is augmented. It was constructed to reflect the statistical structure of real-world paediatric haematology multi-omics cohorts — the mutation marker prevalence, gene expression distributions, protein level profiles, and clinical variable patterns — without containing any identifiable patient information.

Property	Value
Total records	480
Training cohort	384 records
Holdout cohort	96 records
Features	282
Target	Binary mutation status (mutation_0)
Missing values	None
Highly correlated feature pairs	0
Class imbalance ratio	1.1× (near-balanced)
Evaluation metric	AUC-ROC and subgroup recall

Mutation status class distribution (full dataset):

Class	Status	Patients	Share
1	Mutation present	255	53.1%
0	Mutation absent	225	46.9%
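The shares, the imbalance ratio, and the cohort sizes in the tables above can be re-derived from the two class counts. The sketch below does that arithmetic. It assumes the 384/96 partition is a stratified 80/20 split; the page gives only the cohort sizes and does not say how the split was drawn, so the stratification and the function names are illustrative.

```python
# Class counts taken from the class distribution table above.
CLASS_COUNTS = {1: 255, 0: 225}


def class_shares(counts):
    """Percentage share of each class, rounded to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def imbalance_ratio(counts):
    """Majority class size divided by minority class size."""
    return round(max(counts.values()) / min(counts.values()), 1)


def stratified_split(counts, holdout_percent=20):
    """Per-class training and holdout sizes.

    Assumption: the holdout is a stratified sample. Integer arithmetic
    keeps the counts exact.
    """
    holdout = {label: n * holdout_percent // 100 for label, n in counts.items()}
    training = {label: n - holdout[label] for label, n in counts.items()}
    return training, holdout


training, holdout = stratified_split(CLASS_COUNTS)
print(class_shares(CLASS_COUNTS), imbalance_ratio(CLASS_COUNTS))
print(sum(training.values()), sum(holdout.values()))
```

Under that assumption the holdout would contain 51 mutation-present and 45 mutation-absent records, which preserves the near-balanced ratio of the full dataset.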

A note on the features: The 282 features span four domains. Mutation markers (mutation_0 through mutation_N) are binary variables encoding the presence or absence of specific genetic variants — these are the features most directly relevant to the classification target. Gene expression features (gene_0 through gene_N) capture normalised expression values for genes in the relevant haematological pathways. Protein level features (protein_0 through protein_N) reflect circulating protein concentrations from standard haematology panels. Clinical variables (clinical_0 through clinical_16) capture patient demographics, disease history, treatment exposure, and functional assessments. No features require imputation — the dataset is complete with no highly correlated feature pairs, suggesting the multi-omics domains contribute non-redundant signal. The near-balanced class distribution (53.1% / 46.9%) makes accuracy a useful metric alongside AUC, but per-subgroup recall is the regulatory fairness requirement.
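The naming scheme described above lends itself to grouping columns by domain before modelling. The following is a minimal sketch of that grouping based only on the column prefixes. The example column list, the `other` bucket, and the choice to exclude the target column from the feature groups are assumptions made for illustration; the actual column inventory is in the Exploratory Data Analysis tab.

```python
from collections import defaultdict

# Prefixes follow the naming described above.
DOMAIN_PREFIXES = ("mutation", "gene", "protein", "clinical")


def group_by_domain(columns, target="mutation_0"):
    """Group column names by omics domain.

    Assumption: the target column is held out of the feature groups so it
    cannot leak into the model inputs.
    """
    groups = defaultdict(list)
    for name in columns:
        if name == target:
            continue
        prefix, _, index = name.rpartition("_")
        if prefix in DOMAIN_PREFIXES and index.isdigit():
            groups[prefix].append(name)
        else:
            # Anything that does not match the naming scheme is kept visible.
            groups["other"].append(name)
    return dict(groups)


# Hypothetical column names, for illustration only.
example_columns = [
    "mutation_0", "mutation_1", "gene_0", "protein_3", "clinical_16", "patient_id",
]
print(group_by_domain(example_columns))
```

Keeping an explicit `other` bucket makes unexpected columns easy to spot, which helps when checking that nothing outside the four domains reaches the model.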

How Evaluation Works

Each contributing centre submitted their locally trained stratification model to the tracebloc workspace. The evaluation ran in two phases.

Phase 1 — Out-of-the-box performance. Each centre's model was scored as submitted, with no adaptation to the consortium cohort. This establishes the generalisation gap: how well a model trained on one centre's 60–120 patients performs on the consortium's 480-patient combined population — the cross-site performance that determines regulatory compliance.

Phase 2 — Fine-tuning. Contributing centres were given access to the training environment inside the tracebloc workspace. Each centre transferred their model into tracebloc and ran training on the 384-patient consortium cohort. This process fine-tuned the model weights to the broader patient population — adapting from a single-centre model calibrated to local demographic and clinical characteristics to one that generalises across the consortium's paediatric subgroups. After training, the adapted model was evaluated automatically against the 96-patient holdout cohort, with results broken down by paediatric age band and demographic subgroup. No centre had visibility into another centre's results before the leaderboard published.

Each contributor received:

Training access: 384 anonymised patient records (282 multi-omics features, binary mutation status at realistic distribution) for model fine-tuning inside the workspace
Evaluation environment: Sandboxed execution — adapted models run against the holdout cohort, no patient data export path available
Metrics tracked: AUC-ROC (overall), recall by paediatric age subgroup, specificity, and calibration of mutation probability estimates
Regulatory constraint: Per-subgroup recall must meet the fairness threshold defined in the EU AI Act conformity assessment plan — models with subgroup performance variance exceeding the defined threshold are flagged regardless of overall AUC
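The tracked metrics can be illustrated with a small standard-library sketch. It is not tracebloc's scoring code. It computes AUC-ROC as the probability that a positive record outranks a negative one, and measures subgroup fairness as the largest shortfall of any subgroup's recall against the overall recall, which is one possible reading of "subgroup performance variance". The age band labels, the 0.5 decision threshold, and the toy data are hypothetical.

```python
def auc_roc(labels, scores):
    """AUC as the chance that a positive outranks a negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one record of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def recall(labels, predictions):
    """Share of truly positive records that were predicted positive."""
    hits = [p for y, p in zip(labels, predictions) if y == 1]
    if not hits:
        raise ValueError("recall needs at least one positive record")
    return sum(hits) / len(hits)


def subgroup_recall_gap(labels, predictions, groups):
    """Per-subgroup recall and the largest shortfall against overall recall."""
    overall = recall(labels, predictions)
    per_group = {}
    for group in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == group]
        per_group[group] = recall(
            [labels[i] for i in idx], [predictions[i] for i in idx]
        )
    gap = max(overall - r for r in per_group.values())
    return per_group, gap


# Toy data, for illustration only.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1]
groups = ["0-5", "0-5", "6-12", "6-12", "0-5", "6-12", "0-5", "6-12"]
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(auc_roc(labels, scores))
print(subgroup_recall_gap(labels, predictions, groups))
```

On a 96-record holdout the pairwise AUC loop is cheap. A larger cohort would call for a rank-based implementation instead.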

Results

→ View the full model leaderboard — complete contributor rankings, per-subgroup fairness metrics, and AUC results across all submissions.

Contributor	Out-of-the-Box AUC	After Fine-tuning AUC	Subgroup Recall Variance	Regulatory Pass
Centre A	0.78	0.84	0.09	Conditional
Centre B ✅	0.81	0.91	0.04	Yes
Centre C ⚠️	0.76	0.87	0.14	No

What the numbers reveal:

Centre B achieved the strongest overall AUC at 0.91 after fine-tuning on the consortium cohort and, critically, the lowest subgroup recall variance at 0.04. That figure is the regulatory result. It is read here as the largest shortfall of any subgroup's recall against the overall rate: the model performs consistently across paediatric age bands, with no subgroup experiencing recall more than four percentage points below the overall rate. This is the multi-site fairness evidence the EU AI Act conformity assessment requires.

Centre C reached 0.87 after fine-tuning, a strong performance on the headline metric, but its subgroup recall variance of 0.14 means one or more paediatric subgroups are being significantly under-served relative to the overall population. That variance fails the fairness threshold defined in the EU AI Act conformity assessment plan, regardless of the AUC. Centre C's model cannot be submitted for regulatory approval without further bias mitigation work.

Centre A's result of 0.84 with variance of 0.09 is a conditional pass — acceptable on overall performance, but the subgroup variance requires additional documentation and a bias mitigation plan before the conformity assessment can be finalised. It represents the middle ground: a model that works but requires further work to demonstrate equitable performance across the full paediatric population.
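The reading of the results table can be made explicit with a short sketch that computes each centre's AUC gain from fine-tuning and maps the subgroup recall variance to a pass status. The page does not state the actual thresholds from the conformity assessment plan. The two cut-offs below (0.05 and 0.10) are hypothetical values chosen only because they reproduce the three outcomes in the table.

```python
# Figures copied from the results table above.
RESULTS = {
    "Centre A": {"auc_before": 0.78, "auc_after": 0.84, "recall_variance": 0.09},
    "Centre B": {"auc_before": 0.81, "auc_after": 0.91, "recall_variance": 0.04},
    "Centre C": {"auc_before": 0.76, "auc_after": 0.87, "recall_variance": 0.14},
}

# Hypothetical cut-offs: NOT taken from the source or from the EU AI Act.
PASS_MAX_VARIANCE = 0.05
CONDITIONAL_MAX_VARIANCE = 0.10


def auc_gain(row):
    """AUC improvement from fine-tuning, rounded to two decimals."""
    return round(row["auc_after"] - row["auc_before"], 2)


def regulatory_status(recall_variance):
    """Map subgroup recall variance to a status using the assumed cut-offs."""
    if recall_variance <= PASS_MAX_VARIANCE:
        return "Yes"
    if recall_variance <= CONDITIONAL_MAX_VARIANCE:
        return "Conditional"
    return "No"


for centre, row in RESULTS.items():
    print(centre, auc_gain(row), regulatory_status(row["recall_variance"]))
```

Centre C shows the largest AUC gain of the three (0.11) and still fails, which is the point of the table: the fairness check is applied independently of the headline metric.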

Business Impact

Illustrative assumptions: 8 European paediatric haematology centres / Regulatory submission window: 6 months / Conventional multi-site data centralisation timeline: 18 months (ethics × 8 + data transfer agreements × 8) / Estimated internal cost of one 12-month delay to market: €4–8M / EU AI Act non-compliance penalty exposure for high-risk system deployment: up to €15M or 3% of worldwide annual turnover, whichever is higher

Approach	Time to Multi-site Evidence	Regulatory Risk	Patient Data Centralised	Estimated Cost
Conventional data sharing	18+ months	High — timelines miss submission window	Yes	High governance overhead × 8 sites
Single-centre validation only	< 1 month	Very high — EU AI Act non-compliance	No	Low short-term, high long-term
tracebloc workspace ✅	Weeks	Low — documented multi-site evidence	No	Single workspace deployment

The value of this evaluation is not found in AUC points — it is found in the regulatory timeline. tracebloc compresses 18 months of multi-site ethics and data governance into a workspace deployment cycle, generating the cross-site validation evidence required for EU AI Act conformity assessment without centralising a single patient record. The submission window is met. The compliance evidence is documented. The patient data stays at each institution.

Decision

Elena submits Centre B's consortium-fine-tuned model for regulatory conformity assessment, with the tracebloc workspace evaluation constituting the multi-site clinical validation evidence required under EU AI Act Article 10. The leaderboard output — AUC by centre, recall by paediatric subgroup, subgroup variance analysis — forms the bias testing documentation appended to the conformity assessment dossier.

Centre A is retained in the workspace for continued bias mitigation work: the same holdout cohort can be used to evaluate revised model versions as the bias mitigation steps are applied, without restarting the multi-site evaluation infrastructure. Centre C's result informs the consortium's position on subgroup-specific data collection priorities for the next cohort expansion.

The tracebloc workspace stays active after the regulatory submission. As the consortium enrols new patients — expanding the cohort and improving subgroup representation — model versions can be re-evaluated on the updated holdout without rebuilding the evaluation pipeline. New centres joining the consortium can validate their local models on the same terms, contributing to the ongoing EU AI Act post-market monitoring requirement for high-risk systems. The leaderboard becomes the living compliance record: which model versions meet the subgroup fairness threshold, which require further work, and how performance evolves as the consortium's data grows.

Explore this use case further:

View the model leaderboard — full contributor rankings, subgroup fairness metrics, AUC results
Explore the dataset — mutation marker distribution, multi-omics feature profiles, class balance analysis
Start training — submit your own stratification model to this evaluation

Related use cases: See how the same regulatory compliance approach applies to omics biomarker panel narrowing across rare disease cohorts and heart disease prediction across clinical sites. For a broader view of what federated learning applications look like in EU AI Act compliance and precision medicine, see our guide.

Deploy your workspace or schedule a call.

Disclaimer

The dataset used in this use case is augmented. It was constructed to reflect the statistical structure of real-world paediatric haematology multi-omics cohorts, including mutation marker prevalence, gene expression distributions, protein level profiles, and clinical variable patterns, without containing any identifiable patient information. The persona, contributing centres, performance figures, regulatory scenario, and business impact assumptions are illustrative and based on patterns observed across rare disease research and clinical AI regulatory environments. They do not represent any specific institution, regulatory submission, or EU AI Act conformity assessment.