
Prognostic Transcriptomics: Progression Biomarkers in Neuromuscular Disease
Participants
7
End Date
01.04.26
Dataset
dg4lctwg
Resources
2 CPU (8.59 GB) | 1 GPU (22.49 GB)
Compute
0 / 100.00 PF
Submits
0/5

Validating Prognostic Models on Independent Clinical Data
A biotech company has built a prognostic model on its own internal data to predict disease progression rate in children with neuromuscular disorders. The model identifies a set of transcriptomic biomarkers that distinguish slow, medium, and fast progressors. Before this model can inform clinical trial design or regulatory submissions, it must be validated on an independent, external dataset. Without external validation, the model may reflect patterns specific to the company's own cohort rather than genuine biology.
tracebloc provides secure access to an independent pediatric transcriptomics dataset held at a clinical institution. Researchers can re-run their internally trained model on this external cohort to answer two questions: do the biomarkers selected on internal data remain predictive in an independent population, and does the model achieve comparable performance? The data never leaves the hospital, and the company receives only aggregate validation metrics.
To be completed after evaluation concludes.
SCIVIAS: Seeing Childhood Illness through Multi-Omics
SCIVIAS is a monocentric observational study conducted at the Dr. von Hauner Children’s Hospital, LMU Munich, led by Prof. Dr. Dr. Christoph Klein. The study combines retinal imaging (fundus photography, OCT) with multi-omics profiling (genome, transcriptome, proteome, metabolome) to identify early diagnostic markers for rare and chronic childhood diseases.
The core premise: children with rare diseases are often diagnosed only when complications arise. SCIVIAS aims to change this by integrating pattern recognition on retinal images with multi-layer omics data, using machine learning to detect disease signatures before clinical manifestation. All omics data and retinal images are pseudonymized and processed through ML algorithms, comparing data both within defined disease groups and across phenotypes to uncover pleiotropic factors.
The cohort consists of 2500 patients and covers 13 therapeutic areas including IBD (Crohn’s, ulcerative colitis, celiac disease), cystic fibrosis, Duchenne muscular dystrophy, spinal muscular atrophy, and other rare pediatric conditions.
Ethics approval: LMU Munich, approval no. 17–801. German Clinical Trials Register: DRKS00013306.
Study page: https://www.ccrc-hauner.de/clinical-research/scivias-study
For this use case, the neuromuscular subset of SCIVIAS serves as the external validation cohort. It includes patients with Duchenne muscular dystrophy, spinal muscular atrophy, and related neuromuscular conditions, all with transcriptomic profiling at baseline and clinical progression assessment at the 2-year follow-up visit. Because this cohort was collected independently at a different institution with its own recruitment, phenotyping, and laboratory protocols, it provides exactly the kind of external validation surface that regulatory agencies and internal review boards require.
Pharma and biotech companies routinely develop prognostic models on their own clinical trial data or proprietary cohorts. A typical scenario: a company running a gene therapy program for Duchenne has built a classifier that predicts which children will progress slowly, moderately, or rapidly based on transcriptomic biomarkers measured at baseline. The model performs well internally, achieving strong cross-validation metrics on the company's own data.
But internal performance is not enough. Regulators, clinical partners, and internal decision makers all ask the same question: does this model generalize? A model trained on a single site, with a specific patient recruitment profile, specific laboratory protocols, and specific demographic composition, may have learned patterns that do not transfer to a broader population. Overfitting to site-specific artifacts is a well-documented failure mode in biomarker research. External validation on an independent cohort is the standard for establishing that a prognostic model captures real biology rather than local noise.
Finding an independent validation cohort is the bottleneck. Pediatric neuromuscular disease cohorts with longitudinal transcriptomic data are extremely scarce. The few that exist are held at academic medical centers under strict data governance, and transferring the data externally is either impossible (GDPR, institutional policy) or prohibitively slow (12 to 18 months of data use agreement negotiation). Companies end up either skipping external validation entirely, weakening their regulatory case, or delaying their program by over a year while waiting for data access.
tracebloc eliminates this bottleneck. The validation cohort stays at the hospital. The company submits its model, the model runs on the external data, and the company receives validation metrics. No data transfer, no lengthy DUA negotiations, no re-identification risk. The validation that would take 18 months through traditional channels can be completed in weeks.
External validation on this dataset answers two distinct questions:
1. Biomarker relevance: Are the transcriptomic features that the model selected on internal data also informative in this independent cohort? If a gene expression signature that strongly predicted progression in the company's own trial data carries no signal in the SCIVIAS cohort, this indicates the biomarker may be site-specific or cohort-specific rather than biologically generalizable. Conversely, biomarkers that replicate across both datasets gain substantially stronger evidence for clinical utility.
2. Model performance: Does the model achieve similar predictive accuracy and calibration on external data as it did internally? A significant drop in log loss between internal and external evaluation quantifies the degree of overfitting and informs whether the model needs retraining, recalibration, or fundamental redesign before it can support trial enrollment decisions.
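As a concrete sketch of the biomarker-relevance check, the snippet below tests whether a fixed panel of internally selected features carries signal against the progression label in an independent cohort, using a univariate ANOVA F-test. All data, feature indices, and effect sizes here are simulated stand-ins, not values from SCIVIAS.

```python
# Hypothetical sketch: does an internally selected biomarker panel carry
# signal in an independent cohort? Uses a per-feature ANOVA F-test.
import numpy as np
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(0)

# Stand-in for the external cohort: 640 samples, 252 anonymized features,
# three progression classes (0=Slow, 1=Medium, 2=Fast).
X_ext = rng.normal(size=(640, 252))
y_ext = rng.integers(0, 3, size=640)
# Make a few features genuinely class-dependent so the test can find them.
for f in (5, 17, 42):
    X_ext[:, f] += 0.8 * y_ext

internal_panel = [5, 17, 42, 99]  # indices assumed selected on internal data
F, p = f_classif(X_ext[:, internal_panel], y_ext)

for idx, pval in zip(internal_panel, p):
    status = "replicates" if pval < 0.05 else "no external signal"
    print(f"feature {idx}: p={pval:.3g} -> {status}")
```

In practice a multivariate check (e.g. refitting the model with the panel fixed) complements this univariate screen, since features can be jointly but not individually informative.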
Researchers work with a transcriptomics dataset (640 samples, 252 features) derived from the SCIVIAS neuromuscular cohort. The dataset contains three feature blocks: individual gene expression levels, pre-computed gene expression signatures (composite pathway scores), and clinical measurements. Feature names are anonymized to protect the underlying clinical data structure. The three feature blocks correspond to real transcriptomic and clinical measurements from the original cohort.
The classification target is a three-class label representing disease progression rate: Slow, Medium, and Fast. This label was derived from longitudinal clinical assessment between the baseline visit and the 2-year follow-up, capturing actual observed progression.
The traditional path to external validation in pediatric rare disease is slow, expensive, and often impossible. Data use agreements take 12 to 18 months. Even when access is granted, companies receive a data export that they must re-process, re-harmonize, and integrate into their own evaluation pipeline, introducing further delay and potential inconsistency. In many cases, institutional policy or national regulation (GDPR, EHDS) prohibits data export entirely, leaving companies with no validation path at all. tracebloc provides the alternative: the model travels to the data, the data stays at the institution, and validation happens inside a controlled, auditable environment.
Three-class classification: predict disease progression rate (Slow, Medium, or Fast) from baseline transcriptomic profiles and clinical measurements. In the validation context, researchers bring a model architecture (and optionally pre-trained weights or a fixed feature set) developed on their own internal data and evaluate whether it generalizes to this independent cohort. Researchers may also train new models from scratch on this dataset to benchmark against their internally developed approach.
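A minimal baseline for the three-class task might look like the sketch below. The data is simulated with the stated shapes (640 samples, 252 continuous features), and the pipeline choice (standardization plus multinomial logistic regression) is an illustrative assumption, not a prescribed approach.

```python
# Minimal baseline sketch for the three-class progression task.
# Data shapes match the description; values are simulated, not real.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(640, 252))
y = rng.choice(["Slow", "Medium", "Fast"], size=640)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Standardize, then fit multinomial logistic regression: a simple,
# reasonably calibrated starting point before richer architectures.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)  # one probability per class, rows sum to 1
print(proba.shape)
```

Because the evaluation metric scores probabilities rather than hard labels, any submitted model should expose calibrated class probabilities in this way.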
Logarithmic Loss (cross entropy loss). Lower is better. Log loss evaluates both classification accuracy and probability calibration. For validation purposes, log loss is particularly informative: a model that was well calibrated on internal data but poorly calibrated on external data reveals systematic differences between the training and validation populations. Comparing internal log loss to external log loss directly quantifies generalization performance.
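The internal-versus-external comparison described above can be sketched as follows. The labels and predicted probabilities are simulated, so the resulting numbers are illustrative only; the point is the delta computation.

```python
# Hedged sketch: quantify generalization as the gap between internal and
# external log loss. All inputs are simulated, not real results.
import numpy as np
from sklearn.metrics import log_loss

rng = np.random.default_rng(1)

def fake_eval(n, sharpness):
    """Simulate true labels and predicted class probabilities.
    Higher sharpness = more confident, more often correct predictions."""
    y = rng.integers(0, 3, size=n)
    logits = rng.normal(size=(n, 3))
    logits[np.arange(n), y] += sharpness
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return y, p

y_int, p_int = fake_eval(500, sharpness=2.0)  # internal: well-fit
y_ext, p_ext = fake_eval(640, sharpness=0.5)  # external: weaker signal

ll_int = log_loss(y_int, p_int, labels=[0, 1, 2])
ll_ext = log_loss(y_ext, p_ext, labels=[0, 1, 2])
delta = ll_ext - ll_int  # large positive delta flags poor generalization
print(f"internal={ll_int:.3f} external={ll_ext:.3f} delta={delta:.3f}")
```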
252 features across 640 samples.
| Feature Block | Count | Notes |
|---|---|---|
| Gene expression | ~190 | Continuous. Individual gene-level expression values from transcriptomic profiling. |
| Gene expression signatures | 20 | Continuous. Pre-computed composite scores representing pathway-level transcriptomic activity. |
| Clinical measurements | ~29 | Continuous. Clinical phenotype variables measured at baseline. |
Two categorical features are present, one of which is the three-class progression label (Slow, Medium, Fast).
Approximately balanced across all three classes, with each representing roughly one third of the dataset.
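A quick sanity check of that class balance, here run on simulated labels with the stated sample size:

```python
# Verify the label distribution is roughly one third per class.
# Labels are simulated stand-ins for the real progression label.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
labels = pd.Series(rng.choice(["Slow", "Medium", "Fast"], size=640))

counts = labels.value_counts(normalize=True)
print(counts.round(3))  # each class fraction should be near 0.333
```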
External validation is only meaningful if it is conducted under controlled, reproducible conditions. If a company re-processes and re-harmonizes external data in its own pipeline, differences in preprocessing can obscure whether a performance drop reflects genuine generalization failure or a data handling artifact. By running validation inside the tracebloc environment with standardized data access and evaluation, the results are unambiguous: any performance difference between internal and external evaluation reflects the model's true generalization capability, not pipeline inconsistencies.
tracebloc provides secure access to clinical data held at hospitals. Researchers interact through a controlled environment where they receive exploratory data analysis outputs to understand the validation dataset, then submit their model code for execution on the institution’s infrastructure. Raw patient data never leaves the hospital. Model weights are not extractable. Only aggregate validation metrics are returned.
Primary: log loss on the three-class progression label. For validation, the key comparison is not absolute performance but the delta between internal and external log loss. A small delta indicates strong generalization. A large delta flags overfitting or cohort-specific artifacts in the original model.
Compute efficiency within the allocated budget. Validation runs are typically less compute-intensive than full training, but researchers who choose to retrain or fine-tune on the external data will need to manage their allocation accordingly.
The core validation question has two parts. First, biomarker replication: do the transcriptomic features selected on internal data carry predictive signal in this independent cohort? Features that replicate gain substantially stronger evidence for biological relevance. Second, model robustness: does the prognostic classifier maintain its accuracy and calibration across a different patient population, recruited at a different institution, with different laboratory protocols? Positive answers to both questions move the model from an internal research tool toward a validated clinical asset suitable for trial enrichment and regulatory discussion.
To be completed after evaluation concludes.
To be completed after evaluation concludes.