
Multi-Omics-Based Validation, Bleeding Disorder Stratification in Paediatric Haematology
| Property | Value |
|---|---|
| Participants | 8 |
| End Date | 20.08.27 |
| Dataset | dj94p0ba |
| Resources | 2 CPU (8.59 GB), 1 GPU (22.49 GB) |
| Compute | 0 / 100.00 PF |
| Submits | 0 / 5 |

About this use case: Eight European paediatric haematology centres hold the patient data needed to generate EU AI Act conformity evidence for a bleeding disorder stratification model — but organising cross-border data transfers at all eight sites would take 18 months that the regulatory submission window doesn't have. tracebloc lets every centre validate their model against the consortium cohort without a single patient record crossing institutional or national borders. Explore the data, submit your own model, and see how your approach compares.
Clinical decision support AI for paediatric patients requires something most AI deployments do not: documented proof of fairness across demographic and disease subgroups before the system touches a clinical workflow. Under the EU AI Act, systems used in therapeutic decision support are classified as high-risk — meaning multi-site validation across representative patient populations is a regulatory requirement, not a best practice. For bleeding disorder stratification AI, that means validation data from paediatric haematology centres across Europe. Organising that data through conventional channels means 18 months of ethics amendments, data transfer agreements, and patient consent renegotiation at each participating site.
Dr. Elena Vasquez, Head of Clinical Data Science at a rare bleeding disorder consortium in Amsterdam, has developed a mutation status stratification model intended for clinical decision support across European paediatric haematology centres. She deploys a tracebloc workspace loaded with 480 anonymised patient records representing the multi-omics profile of the consortium's patient population. Partner institutions — paediatric haematology centres across Europe — submit their local model versions to the workspace for cross-site validation. Inside tracebloc's containerised training environment, each centre's model trains on the consortium cohort — fine-tuning to the specific mutation marker distributions, gene expression patterns, and clinical covariate relationships in the combined dataset — without any patient record leaving its originating institution. tracebloc handles orchestration, scores each adapted model against the holdout cohort, and publishes results to a live leaderboard automatically. This is federated learning applied to regulatory compliance validation: multi-site evidence generated without multi-site data centralisation.
In this example evaluation, the consortium model held performance across paediatric subgroups after multi-site fine-tuning — generating the cross-site validation evidence required for EU AI Act high-risk system documentation. The leaderboard records per-site performance and subgroup results. The workspace stays in place as a standing validation infrastructure, allowing model updates to be re-evaluated against the same holdout cohort without restarting the ethics approval cycle.
Elena's consortium has developed a multi-omics stratification model for paediatric haemophilia and related bleeding disorders — 282 features spanning mutation markers, gene expression profiles, protein levels, and clinical variables — targeting binary classification of mutation status that determines treatment pathway. The model performs well on the consortium's internal patient population, 480 patients across three centres. The regulatory question is whether it performs consistently across the full range of European paediatric subgroups: different age distributions, different ethnic backgrounds, different co-morbidity profiles across eight participating centres.
This is the EU AI Act challenge in practice. High-risk AI systems used in therapeutic decision support require prospective bias testing and fairness documentation across the populations they will serve. The guidance is explicit: validation on a single-centre training cohort is not sufficient. The system must demonstrate consistent performance across the paediatric subgroups present in the deployment population.
The clinical stakes compound the regulatory requirement. Bleeding disorder stratification determines treatment intensity — whether a patient receives prophylactic factor replacement therapy, what dosing protocol is initiated, and how frequently they are monitored. A model that performs well on one ethnic subgroup and poorly on another does not merely produce different predictions on different charts: it produces different clinical outcomes for different patient populations. That is not an acceptable deployment profile under any regulatory framework.
Elena's problem is the data access problem at scale. Eight European paediatric haematology centres have agreed in principle to participate in the multi-site validation. Each centre has a patient cohort of 60–120 children. None of those centres can transfer patient data to Amsterdam. Each transfer would require a new data sharing agreement under GDPR, a local ethics committee amendment, and potentially re-consent of the patients or their guardians — because the original consent covered clinical care and research within the institution, not data sharing with a consortium coordinating centre in another country.
The conventional route — negotiate eight separate data sharing agreements with eight ethics committees across five countries — takes 18 months at minimum. Elena's regulatory submission window is six months.
She needs a mechanism where all eight centres validate their local model versions against a shared cohort — and contribute their local model updates to improve the consortium model — without a single patient record crossing institutional or national borders.
The evaluation dataset contains 480 anonymised paediatric patient records with multi-omics characterisation for bleeding disorder stratification. Full dataset statistics, feature distributions, and target class analysis are available in the Exploratory Data Analysis tab.
This dataset is augmented. It was constructed to reflect the statistical structure of real-world paediatric haematology multi-omics cohorts — the mutation marker prevalence, gene expression distributions, protein level profiles, and clinical variable patterns — without containing any identifiable patient information.
| Property | Value |
|---|---|
| Total records | 480 |
| Training cohort | 384 records |
| Holdout cohort | 96 records |
| Features | 282 |
| Target | Binary mutation status (mutation_0) |
| Missing values | None |
| Highly correlated feature pairs | 0 |
| Class imbalance ratio | 1.1× (near-balanced) |
| Evaluation metric | AUC-ROC and subgroup recall |
Mutation status class distribution (full dataset):
| Class | Status | Patients | Share |
|---|---|---|---|
| 1 | Mutation present | 255 | 53.1% |
| 0 | Mutation absent | 225 | 46.9% |
A note on the features: The 282 features span four domains. Mutation markers (mutation_1 through mutation_N) are binary variables encoding the presence or absence of specific genetic variants — these are the features most directly relevant to the classification target, mutation_0, which serves as the label rather than as an input. Gene expression features (gene_0 through gene_N) capture normalised expression values for genes in the relevant haematological pathways. Protein level features (protein_0 through protein_N) reflect circulating protein concentrations from standard haematology panels. Clinical variables (clinical_0 through clinical_16) capture patient demographics, disease history, treatment exposure, and functional assessments. No features require imputation — the dataset is complete — and there are no highly correlated feature pairs, suggesting the multi-omics domains contribute non-redundant signal. The near-balanced class distribution (53.1% / 46.9%) makes accuracy a useful metric alongside AUC, but per-subgroup recall is the regulatory fairness requirement.
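To make that feature layout concrete, here is a minimal sketch of how a centre might partition such a cohort by column-name prefix before training. It assumes a flat CSV export following the naming scheme described above; the file name and the pandas workflow are illustrative, not the workspace's actual loading API.

```python
import pandas as pd

# Hypothetical flat export of the 480-record consortium cohort; the file
# name is illustrative, not part of the tracebloc workspace API.
df = pd.read_csv("consortium_cohort.csv")

# mutation_0 is the binary classification target; the remaining
# 282 columns are the model inputs.
y = df["mutation_0"]
X = df.drop(columns=["mutation_0"])

# Partition the inputs by the four domain prefixes described above.
domains = {
    prefix: [col for col in X.columns if col.startswith(prefix)]
    for prefix in ("mutation_", "gene_", "protein_", "clinical_")
}
for prefix, cols in domains.items():
    print(f"{prefix:<10} {len(cols):>3} features")

# Sanity-check the near-balanced target (expected roughly 53% / 47%).
print(y.value_counts(normalize=True).round(3))
```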
Each contributing centre submitted their locally trained stratification model to the tracebloc workspace. The evaluation ran in two phases.
Phase 1 — Out-of-the-box performance. Each centre's model was scored as submitted, with no adaptation to the consortium cohort. This establishes the generalisation gap: how well a model trained on one centre's 60–120 patients performs on the consortium's 480-patient combined population — the cross-site performance that determines regulatory compliance.
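Reduced to code, Phase 1 is a single AUC-ROC computation over the combined cohort. The sketch below shows the shape of that step, assuming a scikit-learn-style model object exposing `predict_proba`; how a submitted artefact is actually deserialised inside the workspace is not shown here.

```python
from sklearn.metrics import roc_auc_score

def score_out_of_the_box(model, X_consortium, y_consortium):
    """Phase 1: score a submitted model as-is on the combined consortium
    cohort. No weights are updated; this measures the generalisation gap
    between a single-centre model and the multi-centre population."""
    proba = model.predict_proba(X_consortium)[:, 1]
    return roc_auc_score(y_consortium, proba)
```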
Phase 2 — Fine-tuning. Contributing centres were given access to the training environment inside the tracebloc workspace. Each centre transferred their model into tracebloc and ran training on the 384-patient consortium cohort. This process fine-tuned the model weights to the broader patient population — adapting from a single-centre model calibrated to local demographic and clinical characteristics to one that generalises across the consortium's paediatric subgroups. After training, the adapted model was evaluated automatically against the 96-patient holdout cohort, with results broken down by paediatric age band and demographic subgroup. No centre had visibility into another centre's results before the leaderboard published.
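The Phase 2 flow, under the same scikit-learn-style assumptions, is sketched below: refit on the 384-record training cohort, then report AUC plus recall per paediatric subgroup on the 96-record holdout. A plain refit stands in for the weight-level fine-tuning described above, since the exact adaptation depends on each centre's model family, and the `age_band` subgroup column is a hypothetical stand-in for whatever subgroup labels accompany the holdout records.

```python
from sklearn.base import clone
from sklearn.metrics import recall_score, roc_auc_score

def fine_tune_and_evaluate(model, X_train, y_train, X_hold, y_hold, subgroups):
    """Phase 2: adapt a centre's model to the consortium training cohort,
    then score the holdout overall and per subgroup. `subgroups` is a
    pandas Series of subgroup labels (e.g. a hypothetical age_band column)
    aligned row-for-row with the holdout cohort."""
    # A plain retrain stands in for warm-start fine-tuning here.
    adapted = clone(model).fit(X_train, y_train)
    proba = adapted.predict_proba(X_hold)[:, 1]
    pred = (proba >= 0.5).astype(int)

    report = {
        "auc": roc_auc_score(y_hold, proba),
        "recall_overall": recall_score(y_hold, pred),
        "recall_by_subgroup": {},
    }
    for band in subgroups.unique():
        mask = (subgroups == band).to_numpy()
        report["recall_by_subgroup"][band] = recall_score(y_hold[mask], pred[mask])
    return report
```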
→ View the full model leaderboard — complete contributor rankings, per-subgroup fairness metrics, and AUC results across all submissions.
| Contributor | Out-of-the-Box AUC | After Fine-tuning AUC | Subgroup Recall Variance | Regulatory Pass |
|---|---|---|---|---|
| Centre A | 0.78 | 0.84 | 0.09 | Conditional |
| Centre B ✅ | 0.81 | 0.91 | 0.04 | Yes |
| Centre C ⚠️ | 0.76 | 0.87 | 0.14 | No |
What the numbers reveal:
Centre B achieved the strongest overall AUC at 0.91 after fine-tuning on the consortium cohort, and — critically — the lowest subgroup recall variance at 0.04. That variance figure is the regulatory result: the model performs consistently across paediatric age bands, with no subgroup experiencing recall more than four percentage points below the overall rate. This is the multi-site fairness evidence the EU AI Act conformity assessment requires.
Centre C reached 0.87 after fine-tuning — a strong performance on the headline metric — but its subgroup recall variance of 0.14 means one or more paediatric subgroups are being significantly under-served relative to the overall population. Under the EU AI Act's high-risk system requirements, that variance fails the conformity threshold regardless of the AUC. Centre C cannot be submitted for regulatory approval without further bias mitigation work.
Centre A's result of 0.84 with variance of 0.09 is a conditional pass — acceptable on overall performance, but the subgroup variance requires additional documentation and a bias mitigation plan before the conformity assessment can be finalised. It represents the middle ground: a model that works but requires further work to demonstrate equitable performance across the full paediatric population.
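Reading the three verdicts together, the pass logic can be expressed compactly. The sketch below treats subgroup recall variance as the largest shortfall of any subgroup against the overall recall, which is the reading implied by Centre B's result above, and maps it to a verdict. The 0.05 and 0.10 cut-offs are illustrative assumptions chosen to reproduce this example table; they are not thresholds published in the EU AI Act.

```python
def subgroup_recall_variance(recall_overall, recall_by_subgroup):
    """Largest shortfall of any subgroup's recall against the overall rate."""
    return max(recall_overall - r for r in recall_by_subgroup.values())

def regulatory_verdict(variance, pass_at=0.05, conditional_at=0.10):
    """Map variance to a pass band. Both thresholds are illustrative
    assumptions consistent with the example table, not statutory values."""
    if variance <= pass_at:
        return "Yes"          # e.g. Centre B at 0.04
    if variance <= conditional_at:
        return "Conditional"  # e.g. Centre A at 0.09
    return "No"               # e.g. Centre C at 0.14
```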
Illustrative assumptions:
- 8 European paediatric haematology centres
- Regulatory submission window: 6 months
- Conventional multi-site data centralisation timeline: 18 months (ethics amendments × 8 + data transfer agreements × 8)
- Estimated internal cost of one 12-month delay to market: €4–8M
- EU AI Act non-compliance penalty exposure for high-risk system deployment: up to 3% of global annual revenue
| Approach | Time to Multi-site Evidence | Regulatory Risk | Patient Data Centralised | Estimated Cost |
|---|---|---|---|---|
| Conventional data sharing | 18+ months | High — timelines miss submission window | Yes | High governance overhead × 8 sites |
| Single-centre validation only | < 1 month | Very high — EU AI Act non-compliance | No | Low short-term, high long-term |
| tracebloc workspace ✅ | Weeks | Low — documented multi-site evidence | No | Single workspace deployment |
The value of this evaluation is not found in AUC points — it is found in the regulatory timeline. tracebloc compresses 18 months of multi-site ethics and data governance into a workspace deployment cycle, generating the cross-site validation evidence required for EU AI Act conformity assessment without centralising a single patient record. The submission window is met. The compliance evidence is documented. The patient data stays at each institution.
Elena submits Centre B's consortium-fine-tuned model for regulatory conformity assessment, with the tracebloc workspace evaluation constituting the multi-site clinical validation evidence required under EU AI Act Article 10. The leaderboard output — AUC by centre, recall by paediatric subgroup, subgroup variance analysis — forms the bias testing documentation appended to the conformity assessment dossier.
Centre A is retained in the workspace for continued bias mitigation work: the same holdout cohort can be used to evaluate revised model versions as the bias mitigation steps are applied, without restarting the multi-site evaluation infrastructure. Centre C's result informs the consortium's position on subgroup-specific data collection priorities for the next cohort expansion.
The tracebloc workspace stays active after the regulatory submission. As the consortium enrols new patients — expanding the cohort and improving subgroup representation — model versions can be re-evaluated on the updated holdout without rebuilding the evaluation pipeline. New centres joining the consortium can validate their local models on the same terms, contributing to the ongoing EU AI Act post-market monitoring requirement for high-risk systems. The leaderboard becomes the living compliance record: which model versions meet the subgroup fairness threshold, which require further work, and how performance evolves as the consortium's data grows.
Explore this use case further:
Related use cases: See how the same regulatory compliance approach applies to omics biomarker panel narrowing across rare disease cohorts and heart disease prediction across clinical sites. For a broader view of what federated learning applications look like in EU AI Act compliance and precision medicine, see our guide.
Deploy your workspace or schedule a call.
Disclaimer: The dataset used in this use case is augmented — constructed to reflect the statistical structure of real-world paediatric haematology multi-omics cohorts, including mutation marker prevalence, gene expression distributions, protein level profiles, and clinical variable patterns, without containing any identifiable patient information. The persona, contributing centres, performance figures, regulatory scenario, and business impact assumptions are illustrative and based on patterns observed across rare disease research and clinical AI regulatory environments. They do not represent any specific institution, regulatory submission, or EU AI Act conformity assessment.