
Heart Disease Risk Prediction Using 13 Clinical Features

Participants: 5
End Date: 17.06.27
Dataset: d0y7szjz
Resources: 2 CPU (8.59 GB) | 1 GPU (22.49 GB)
Compute: 0 / 300.00 PF
Submits: 0 / 5


Overview

About this use case: A hospital's cardiac risk model has been stuck at 0.82 AUC for six months — not for lack of data, but because the cardiologists and ML specialists who understand the clinical feature interactions are at partner hospitals that cannot access the cohort. tracebloc brings their expertise to the 1,888 patient records without moving a single record out of the hospital. Explore the data, submit your own model, and see how your approach compares.

Problem

The clinical analytics team at a university hospital has been working on cardiac AI for heart disease risk stratification for over a year. Their best model sits at 0.82 AUC. For six months it has not moved. The cardiologists who understand the clinical nuance — which combinations of ECG findings, exercise response, and chest pain type actually signal elevated risk — are at partner hospitals in other cities. The ML specialists who could try new cardiovascular risk assessment architectures are at academic research groups. None of them can access the patient cohort.

Solution

Dr. Jonas Meier, Head of Cardiology Analytics, deploys a tracebloc workspace loaded with 1,888 anonymised patient records — each with 13 clinical features and a binary cardiac risk label. Cardiologists and ML specialists from partner hospitals and research groups submit their models to the workspace. Inside tracebloc's containerised training environment, each model trains on the anonymised patient dataset — fine-tuning its weights to the specific clinical feature interactions, ECG patterns, and cardiovascular risk signals in this cohort — without any patient record leaving the hospital's infrastructure. This is a federated learning application of expert collaboration: patient data stays on Jonas's infrastructure, and external expertise comes to it. tracebloc orchestrates evaluation, scores each model on the holdout set, and publishes results to a live leaderboard.

Outcome

In this example, the top contributor broke through 0.82 AUC — reaching 0.91 — by combining domain-informed feature interactions with an architecture specifically suited to mixed tabular clinical data. The internal team's approach, which had hit its ceiling, finished third. The workspace stays active for ongoing collaboration as the patient cohort grows and new contributors join. See the live leaderboard for current rankings.

The Operational Challenge

Jonas's team manages clinical decision support for the cardiology department. Chest pain and suspected cardiac events are among the highest-volume referral categories in the hospital. The team built their cardiac risk stratification model on 1,888 patient records covering the full spectrum of presenting features — from typical angina in older male patients to atypical symptoms in younger women with normal resting ECGs.

The model's performance at 0.82 AUC is clinically meaningful but not good enough for the department's decision support ambitions. The cardiologists' standard is: if the model cannot reach at least 0.88 AUC with adequate sensitivity on high-risk patients, it stays out of the workflow. That threshold has been in place for six months, and the internal team has exhausted the obvious paths: different gradient boosted tree configurations, neural network variants, feature engineering attempts, threshold tuning. Each iteration produces marginal gains that don't accumulate.

The problem is that the features driving cardiac risk prediction are not uniformly informative. Chest pain type (cp), ST depression (oldpeak), the slope of the ST segment, maximum heart rate achieved (thalachh), exercise-induced angina, and the number of major vessels coloured by fluoroscopy — these features interact in ways that require clinical domain expertise to model correctly. The team's ML specialists are strong on the modelling side. They are not cardiologists, and the cardiologists in their own department do not have the time or ML background to translate clinical intuition into feature engineering.

The cardiologists and ML researchers who could contribute are at partner hospitals and academic groups. The data sharing constraint is straightforward: the hospital's ethics committee approval for the patient dataset does not permit external transfer. There is no legal mechanism to share 1,888 patient records across institutional boundaries without a separate application process taking six to twelve months.

Jonas needs a way to bring expert approaches to the data — not the data to the experts.

Stakeholders

  • Dr. Jonas Meier, Head of Cardiology Analytics: Owns model performance, clinical decision support integration, and the 0.88 AUC threshold required for workflow deployment. KPIs: AUC, sensitivity on high-risk patients, specificity, ECG AI analysis accuracy, explainability for cardiologist trust
  • Chief Medical Officer: Patient safety authority — clinical decision support that misses high-risk patients has direct patient harm implications; false positive rate also matters for unnecessary downstream testing
  • Head of Clinical AI: Responsible for model governance, audit trail, and alignment with the hospital's AI ethics framework; every model submitted must be logged and traceable
  • Data Protection Officer: Ethics committee approval is dataset-specific; any external access to patient records requires a new application that the team cannot afford to wait for
  • Cardiology Department Head: The model must earn clinical trust — cardiologists need to understand why it flags a patient as high-risk, or they will not use the output in practice

The Underlying Dataset

The training dataset contains 1,888 anonymised patient records covering a binary cardiac risk classification task. Full dataset statistics, feature distributions, and correlation analysis are available in the Exploratory Data Analysis tab.

This dataset is augmented. It was constructed to reflect the statistical structure of real-world cardiovascular risk assessment data — the feature distributions, clinical correlations, and class balance observed in hospital cardiology cohorts — without containing any identifiable patient information.

| Property | Value |
| --- | --- |
| Total records | 1,888 |
| Features | 13 clinical features + 1 binary target |
| Target: less chance of heart attack (0) | 911 patients (48.25%) |
| Target: more chance of heart attack (1) | 977 patients (51.75%) |
| Class balance | Approximately balanced |
| Missing values | None |

Clinical features:

| Feature | Type | Clinical significance |
| --- | --- | --- |
| Age | Numerical | Baseline cardiovascular risk factor |
| Sex | Categorical | Risk distribution differs significantly by sex |
| Chest pain type (cp) | Categorical (4) | Strongest single predictor — typical angina to asymptomatic |
| Resting blood pressure | Numerical | Systolic BP at rest |
| Cholesterol | Numerical | Serum cholesterol level |
| Fasting blood sugar | Binary | >120 mg/dl flag |
| Resting ECG result | Categorical (3) | Normal / ST-T wave abnormality / left ventricular hypertrophy |
| Max heart rate achieved (thalachh) | Numerical | Exercise stress test response |
| Exercise-induced angina | Binary | Angina during stress test — strong risk signal |
| ST depression (oldpeak) | Numerical | ST depression induced by exercise relative to rest |
| ST segment slope | Categorical (3) | Upsloping / flat / downsloping |
| Major vessels (ca) | Numerical (0–3) | Vessels coloured by fluoroscopy |
| Thalassemia type | Categorical | Normal / fixed defect / reversible defect |

The dataset is near-perfectly balanced across the binary target (48.25% / 51.75%). The strongest predictors confirmed by correlation analysis are chest pain type, exercise-induced angina, ST depression, ST slope, major vessel count, and maximum heart rate — which is why domain knowledge in ECG AI analysis and stress test interpretation matters for model performance on this cohort.
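A ranking of features by correlation with the target, as described above, can be sketched in a few lines. This is a toy illustration on synthetic data: the column names (cp, exng, oldpeak, thalachh) follow common conventions for this kind of cardiac cohort and the target rule is invented, so neither necessarily matches the workspace schema.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 1,888-record cohort; hypothetical column names.
rng = np.random.default_rng(0)
n = 1888
df = pd.DataFrame({
    "cp": rng.integers(0, 4, n),           # chest pain type (4 categories)
    "exng": rng.integers(0, 2, n),         # exercise-induced angina flag
    "oldpeak": rng.uniform(0, 6, n),       # ST depression induced by exercise
    "thalachh": rng.integers(70, 210, n),  # max heart rate achieved
})
# Invented toy target loosely tied to the known strong predictors
df["target"] = ((df["cp"] > 1).astype(int) + df["exng"]
                + (df["oldpeak"] > 2).astype(int) >= 2).astype(int)

# Rank features by absolute correlation with the binary target
corr = (df.drop(columns="target")
          .corrwith(df["target"])
          .abs()
          .sort_values(ascending=False))
print(corr)
```

On the real cohort, the same ranking would surface chest pain type, exercise-induced angina, and ST depression near the top, per the correlation analysis in the Exploratory Data Analysis tab.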

How Evaluation Works

Each contributor submitted their model to the tracebloc workspace. The evaluation ran in two phases.

Phase 1 — Out-of-the-box performance. Each model was benchmarked as submitted, with no adaptation to the hospital's patient cohort. This establishes the true baseline: what the approach delivers on this specific clinical feature distribution before any fine-tuning.

Phase 2 — Fine-tuning. Contributors were given access to the training environment inside the tracebloc workspace. Each contributor transferred their model into tracebloc and ran training on the patient dataset. This training process fine-tuned the model weights to the specific clinical feature interactions, ECG signal patterns, and cardiovascular risk correlations in this cohort — adapting from a generalised classification approach to a system calibrated for this patient population. After training, the adapted model was evaluated automatically against the holdout set. Patient records never left the hospital's infrastructure. Contributors received only their own results back; no contributor had visibility into another's approaches or scores before the leaderboard published.

Each contributor received:

  • Training access: 1,888 anonymised patient records (all 13 clinical features, balanced binary target) for model fine-tuning inside the workspace
  • Evaluation environment: Sandboxed execution — adapted models evaluated against the holdout set, no patient data export path available
  • Metrics tracked: AUC (ROC), sensitivity on high-risk patients (Class 1 recall), specificity, overall accuracy, SHAP feature importance for clinical explainability
  • Key constraint: Sensitivity on high-risk patients weighted in final model selection — a missed high-risk patient in a decision support workflow carries direct clinical consequences; the 0.88 AUC threshold is the minimum bar for workflow integration
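The three headline metrics tracked above can be computed directly from holdout predictions. A minimal sketch with toy numbers — in the workspace, the scores come from the sandboxed evaluation, and the 0.5 decision threshold here is an assumption:

```python
import numpy as np

# Toy holdout labels and model scores (1 = high-risk)
y_true  = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.4, 0.2, 0.3, 0.1, 0.7, 0.6, 0.85, 0.15])
y_pred  = (y_score >= 0.5).astype(int)  # assumed decision threshold

# ROC AUC via its pairwise formulation: the probability that a random
# high-risk patient scores above a random low-risk patient
pos, neg = y_score[y_true == 1], y_score[y_true == 0]
auc = (pos[:, None] > neg[None, :]).mean()

tp = np.sum((y_true == 1) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))
tn = np.sum((y_true == 0) & (y_pred == 0))
fp = np.sum((y_true == 0) & (y_pred == 1))
sensitivity = tp / (tp + fn)  # Class 1 recall: high-risk patients caught
specificity = tn / (tn + fp)  # low-risk patients correctly cleared
print(auc, sensitivity, specificity)  # 0.96 0.8 0.8
```

The pairwise AUC makes the key constraint concrete: sensitivity is what moves when the threshold shifts, which is why it is weighted separately from AUC in final model selection.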

Results

→ View the full model leaderboard — complete contributor rankings, AUC curves, sensitivity/specificity breakdown, and feature importance across all submissions.

| Contributor | Approach | Out-of-the-Box AUC | After Fine-tuning AUC | Sensitivity | Specificity |
| --- | --- | --- | --- | --- | --- |
| Internal baseline | LightGBM + feature engineering | 0.82 | 0.82 | 83% | 81% |
| Contributor A (Partner Hospital) | XGBoost + clinical feature interactions | 0.79 | 0.87 | 85% | 88% |
| Contributor B ✅ (Research Group) | Gradient Boosted Trees + domain-informed engineering | 0.81 | 0.91 | 89% | 92% |
| Contributor C (Academic ML Lab) | Neural network ensemble | 0.76 | 0.85 | 82% | 87% |

What the numbers reveal:

Contributor B breaks through the 0.82 AUC ceiling that had stalled the internal team for six months — reaching 0.91 after fine-tuning on 1,888 patient records inside the tracebloc workspace. The approach combines gradient boosted trees with domain-informed feature engineering that captures the interaction between chest pain type, ST segment behaviour, and exercise response in a way the internal team had not modelled explicitly. Sensitivity reaches 89% with specificity at 92% — both above the clinical thresholds required for workflow integration.

Contributor A, a partner hospital's analytics team, brings an XGBoost approach with clinical feature interactions that reaches 0.87 AUC after fine-tuning — just below the 0.88 threshold for workflow integration, and trailing Contributor B on both sensitivity and specificity.

The internal baseline finishes third in AUC but second in interpretability: the clinical team trusts the feature importance outputs because they have been iterating on this approach for a year. Contributor B's SHAP outputs show similar feature importance rankings — chest pain type, ST depression, and vessel count as dominant predictors — which helps cardiologists trust that the model is reasoning in familiar clinical terms.
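Contributor B's exact feature engineering is not published. As an illustration only, domain-informed interactions of the kind described above — chest pain type crossed with exercise response, ST depression conditioned on slope — might be constructed like this; every derived column name and formula here is a hypothetical sketch, not the winning recipe:

```python
import numpy as np
import pandas as pd

# Tiny synthetic cohort with hypothetical column names
rng = np.random.default_rng(1)
n = 8
df = pd.DataFrame({
    "cp": rng.integers(0, 4, n),            # chest pain type
    "exng": rng.integers(0, 2, n),          # exercise-induced angina
    "oldpeak": rng.uniform(0, 4, n).round(1),  # ST depression
    "slp": rng.integers(0, 3, n),           # ST slope: 0 down / 1 flat / 2 up
    "thalachh": rng.integers(90, 200, n),   # max heart rate achieved
    "age": rng.integers(35, 75, n),
})

# Cross chest pain type with exercise-induced angina: asymptomatic
# presentation plus angina under stress reads differently than either alone.
df["cp_x_exng"] = df["cp"].astype(str) + "_" + df["exng"].astype(str)
# Condition ST depression on slope: the same oldpeak value is more
# concerning when the ST segment is flat or downsloping.
df["oldpeak_by_slope"] = df["oldpeak"] * (df["slp"] < 2).astype(int)
# Age-adjusted heart rate response: how close the patient came to the
# rough 220 - age maximum during the stress test.
df["hr_reserve_ratio"] = df["thalachh"] / (220 - df["age"])
print(df[["cp_x_exng", "oldpeak_by_slope", "hr_reserve_ratio"]])
```

Derived columns like these are then fed to a gradient boosted tree model alongside the raw features; the trees can exploit interactions the raw encoding leaves implicit.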

Business Impact

Illustrative assumptions:

  • 12,000 patients assessed for cardiac risk per year
  • High-risk prevalence: 8% (960 high-risk patients)
  • Cost per missed high-risk patient (delayed treatment, adverse cardiac event risk): €8,500
  • Cost per unnecessary further investigation (false positive): €400

| Strategy | AUC | Sensitivity | Missed High-Risk | Miss Cost | False Positives | FP Cost | Model Cost (p.a.) | Total Annual Cost |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Internal baseline | 0.82 | 83% | 163 | €1,385,500 | ~192 | €76,800 | — | €1,462,300 |
| Contributor A | 0.87 | 85% | 144 | €1,224,000 | ~168 | €67,200 | €120,000 | €1,411,200 |
| Contributor B ✅ | 0.91 | 89% | 106 | €901,000 | ~115 | €46,000 | €180,000 | €1,127,000 |
| Contributor C | 0.85 | 82% | 173 | €1,470,500 | ~196 | €78,400 | €100,000 | €1,648,900 |

Contributor B reduces total annual cost from €1,462,300 (internal baseline) to €1,127,000 — a saving of over €335,000 per year — while clearing the clinical threshold for workflow deployment that the internal model has not reached in six months.
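The totals above can be reproduced arithmetically. A quick sketch under the stated assumptions — the false positive counts are taken from the table as given, since their derivation from specificity is not spelled out:

```python
# Illustrative cost model from the assumptions above
patients = 12_000
high_risk = int(patients * 0.08)   # 960 high-risk patients per year
miss_cost, fp_cost = 8_500, 400    # € per missed high-risk / per false positive

def annual_cost(sensitivity, false_positives, model_cost):
    """Missed high-risk cost + false positive cost + model cost (p.a.)."""
    missed = round(high_risk * (1 - sensitivity))
    return missed * miss_cost + false_positives * fp_cost + model_cost

baseline  = annual_cost(0.83, 192, 0)        # internal baseline
contrib_b = annual_cost(0.89, 115, 180_000)  # Contributor B
print(baseline, contrib_b, baseline - contrib_b)
# 1462300 1127000 335300
```

The €335,300 gap is dominated by the miss-cost term: each point of sensitivity on 960 high-risk patients is worth roughly ten patients, or €85,000, per year under these assumptions.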

Decision

Jonas's team adopts Contributor B's approach for integration into clinical decision support, running in parallel with the cardiology department's existing risk scoring workflow across 20% of new referrals. The shadow period validates that 0.91 AUC and 89% sensitivity hold on the full referral stream — including patient presentations outside the training cohort distribution — and that the SHAP feature importance outputs meet the explainability standard required for cardiologist adoption.

The tracebloc workspace stays active after the initial evaluation. As the patient cohort grows and new ML architectures appear in the research literature, the workspace provides a controlled environment to benchmark new approaches against the established baseline — without rebuilding the evaluation pipeline or re-opening data access discussions with the ethics committee. The leaderboard becomes a live record of which approaches are advancing cardiac risk stratification performance on this cohort.

Explore this use case further:

  • View the model leaderboard — full contributor rankings, AUC curves, sensitivity/specificity breakdown
  • Explore the dataset — clinical feature distributions, correlation analysis, class balance
  • Start training — submit your own cardiac risk model to this evaluation

Related use cases: See how the same expert collaboration approach applies to retinal disease classification across clinical sites and predictive patient stratification in rare bleeding disorders. For a broader view of what federated learning applications look like across clinical AI, see our federated learning applications guide.

Deploy your workspace or schedule a call.

Disclaimer

Disclaimer: The dataset used in this use case is augmented — designed to closely reflect the statistical structure of real-world cardiovascular risk assessment data, including clinical feature distributions, correlations, and class balance observed in hospital cardiology cohorts, without containing any identifiable patient information. The persona, contributor names, performance figures, business impact assumptions, and clinical scenario are illustrative and based on patterns observed across hospital analytics environments. They do not represent any specific organisation, clinical product, or patient population.