
Neuromuscular Disease Prognosis via Transcriptomics & Gene Expression
| Property | Value |
|---|---|
| Participants | 36 |
| End Date | 13.10.27 |
| Dataset | dg4lctwg |
| Resources | 2 CPU (8.59 GB) \| 1 GPU (22.49 GB) |
| Compute | 0 / 100.00 PF |
| Submits | 0 / 5 |

About this use case: A neurological disease foundation has 640 neuromuscular patients with rich transcriptomics data and a progression model stuck at macro-F1 0.73 for eight months — while the specialists who could push it further sit at academic centres that cannot access the cohort. tracebloc lets those external bioinformatics groups train on the patient data directly, without downloading a single record. Explore the data, submit your own model, and see how your approach compares.
Gene expression analysis has become the workhorse of neuromuscular disease prognosis research — and yet most translational teams hit the same ceiling. Internal bioinformatics resources can process the data, fit the standard models, and reach a performance plateau they cannot move. The transcriptomics specialists who could push further — experts in disease signature scoring, pathway-level feature engineering, and neuromuscular gene network topology — are at academic research centres. And they cannot access the patient cohort, because the cohort is governed by ethics approvals and data sharing restrictions that do not extend to external collaborators.
Professor Amara Osei's translational research team at a neurological disease foundation holds a cohort of 640 neuromuscular disease patients with longitudinal gene expression data. Her internal bioinformatics team has plateaued at macro-F1 0.73 on progression speed classification. She deploys a tracebloc workspace loaded with the full dataset. External transcriptomics specialists — academic bioinformatics groups, computational biology labs — submit their models to the workspace. Inside tracebloc's containerised training environment, each model trains on the 512-patient training cohort — fine-tuning to the specific gene expression patterns, disease signature scores, and clinical covariate relationships in Amara's patient population — without any researcher ever downloading a patient record. tracebloc handles orchestration, scores each adapted model against the 128-patient holdout cohort, and publishes results to a live leaderboard automatically. This is federated learning applied to expert collaboration: the cohort stays on the foundation's infrastructure, and the expertise comes in through the workspace.
In this example evaluation, the best external contributor exceeded Amara's internal baseline by twelve percentage points after fine-tuning — a result her internal team had not moved in eight months. The performance difference was driven by pathway-level feature construction, which the leaderboard's feature importance output made visible. The leaderboard records which gene expression approaches actually work on this patient population. The workspace stays active for new collaborators and model updates without rebuilding the evaluation infrastructure.
Amara's team manages a longitudinal cohort assembled across six years: 640 patients with neuromuscular disease, each characterised by full transcriptomics across approximately 200 genes, clinical variables capturing disease stage and treatment history, and disease signature scores computed from published pathway databases. The classification target is disease progression speed — Slow, Medium, or Fast — based on longitudinal functional decline measurements. The three classes are nearly evenly distributed: 223 Slow, 215 Fast, 202 Medium patients.
Her internal bioinformatics team has built several iterations of a classification model using standard supervised approaches applied to the gene expression features. They have tuned hyperparameters, experimented with feature selection, and applied dimensionality reduction. The best they have produced achieves macro-F1 around 0.73. They have been stuck at that number for eight months.
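A pipeline of the kind described here — feature selection or dimensionality reduction feeding a standard supervised classifier, scored on macro-F1 — can be sketched with scikit-learn. The data below is synthetic, generated only to match the cohort's dimensions; this illustrates the shape of the internal approach, not the foundation's actual code.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in with the cohort's dimensions:
# 512 training patients, 250 numerical features, 3 progression classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 250))
y = rng.integers(0, 3, size=512)  # 0=Slow, 1=Medium, 2=Fast

# Standard supervised pipeline: scale, reduce dimensionality, classify.
baseline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=50)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Cross-validated macro-F1, mirroring the evaluation metric.
scores = cross_val_score(baseline, X, y, cv=5, scoring="f1_macro")
print(f"macro-F1: {scores.mean():.2f} (+/- {scores.std():.2f})")
```

On real data this is where tuning plateaus: the argument of this use case is that better feature construction, not more hyperparameter search, is what moves the ceiling.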
The ceiling is not a modelling problem in the conventional sense. It is an expertise gap. The gene expression patterns that differentiate Fast progressors from Slow ones in neuromuscular disease are not straightforward individual gene signals. They are embedded in the co-expression relationships between gene networks, in the interaction between transcriptomic signatures and clinical covariates, and in pathway-level features that require specific knowledge of neuromuscular biology to construct. Amara's team are capable statisticians and strong programmers. They are not neuromuscular transcriptomics specialists.
The specialists exist. There are academic bioinformatics groups — at disease-focused research institutes, at neurology departments of major universities — who work exclusively on transcriptomics-based progression modelling in conditions like ALS, spinal muscular atrophy, and Duchenne MD. They have developed proprietary feature engineering approaches, pathway scoring methods, and gene network priors that would materially improve on what Amara's team has built.
The problem is access. Amara cannot give those groups her patient data. The ethics approval for her cohort covers internal research use and defined collaboration agreements — it does not permit transfer to external researchers' compute environments. A formal collaboration agreement with an academic centre takes six to twelve months to negotiate, and triggers an ethics amendment. Amara needs results in the timeline of her current grant cycle, not the next one.
She needs a mechanism where external transcriptomics specialists submit their methodology — their models, their feature engineering pipelines — and train on her cohort without downloading a single patient record.
The evaluation dataset contains 640 anonymised neuromuscular disease patient records with full transcriptomics and clinical characterisation. Full dataset statistics, feature distributions, and class-level analysis are available in the Exploratory Data Analysis tab.
This dataset is augmented. It was constructed to reflect the statistical structure of real-world neuromuscular disease transcriptomics cohorts — the gene expression value distributions, the near-balanced three-class progression structure, the clinical variable patterns, and the disease signature score ranges — without containing any identifiable patient information.
| Property | Value |
|---|---|
| Total records | 640 |
| Training cohort | 512 records |
| Holdout cohort | 128 records |
| Features | 252 (250 numerical, 2 categorical) |
| Progression classes | 3 |
| Missing values | None |
| Duplicate records | None |
| Class imbalance ratio | 1.1× (near-balanced) |
| Evaluation metric | Macro-F1 score |
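Macro-F1 averages the per-class F1 scores with equal weight, so a degenerate model that ignores two of the three classes scores poorly even on a near-balanced cohort. A minimal sketch of the metric, using toy labels with roughly the cohort's class proportions:

```python
from sklearn.metrics import f1_score

# Toy holdout with roughly the cohort's class proportions.
y_true = ["Slow"] * 7 + ["Fast"] * 7 + ["Medium"] * 6
y_pred = ["Slow"] * 20  # degenerate model: always predicts the largest class

# Slow gets F1 ~0.52 (recall 1.0, precision 0.35); Fast and Medium get 0,
# so the macro average collapses despite 35% raw accuracy.
print(f1_score(y_true, y_pred, average="macro", zero_division=0))  # ~0.17
```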
Disease progression class distribution (full dataset):
| Class | Progression Speed | Patients | Share |
|---|---|---|---|
| Slow | Low functional decline rate | 223 | 34.8% |
| Fast | High functional decline rate | 215 | 33.6% |
| Medium | Intermediate decline rate | 202 | 31.6% |
A note on the features: the 250 numerical features span three domains.

- Gene expression features (approximately 200 variables, labelled gene_0 through gene_N) capture normalised expression values across genes relevant to neuromuscular disease biology. These have low individual variance but high collective signal.
- Clinical variables (approximately 28 variables, labelled clinical_0 through clinical_28) capture disease stage, treatment history, and functional assessments. These carry the highest variance in the dataset.
- Disease signature scores (labelled signature_0 through signature_17+) are pathway-level aggregates computed from curated gene sets.

No features require imputation: the dataset is complete, with no duplicate records. The near-equal class distribution (34.8% / 33.6% / 31.6%) means a model that always predicts the most common class reaches only 34.8% accuracy — macro-F1 is the metric that measures genuine discriminative performance across all three progression groups.
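Pathway-level signature scores of the kind described above aggregate expression over curated gene sets rather than passing individual genes to the model. A minimal sketch of that construction — the gene-set names and memberships below are hypothetical, not the cohort's actual pathways:

```python
import pandas as pd

# Hypothetical expression matrix: patients x genes (normalised values).
expr = pd.DataFrame(
    {"gene_0": [1.2, 0.4, 0.9],
     "gene_1": [0.8, 1.1, 0.3],
     "gene_2": [0.2, 0.5, 1.4]},
    index=["patient_0", "patient_1", "patient_2"],
)

# Hypothetical curated gene sets (e.g. from a published pathway database).
gene_sets = {
    "signature_muscle_atrophy": ["gene_0", "gene_2"],
    "signature_inflammation": ["gene_1", "gene_2"],
}

# One signature score per pathway: mean expression across member genes.
signatures = pd.DataFrame(
    {name: expr[genes].mean(axis=1) for name, genes in gene_sets.items()}
)
print(signatures)
```

Replacing hundreds of noisy per-gene columns with a handful of pathway aggregates is one way low-individual-variance genes can be turned into a collectively strong signal.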
Each contributor submitted their progression prediction model to the tracebloc workspace. The evaluation ran in two phases.
Phase 1 — Out-of-the-box performance. Each model was scored as submitted, with no adaptation to Amara's patient cohort. This establishes the true baseline: what the model delivers when applied to a new patient population without access to that population's data during development — and, where that baseline is below the internal team's result, how much the claimed approach depends on its original training distribution.
Phase 2 — Fine-tuning. Contributors were given access to the training environment inside the tracebloc workspace. Each contributor transferred their model — including any custom feature engineering pipelines, pathway scoring methods, or gene network priors — into tracebloc and ran training on the 512-patient cohort. This process fine-tuned the model weights and feature representations to the specific gene expression distributions and clinical covariate patterns in Amara's patient population, adapting from a generalised transcriptomics classifier to one calibrated for this specific disease context. After training, the adapted model was evaluated automatically against the 128-patient holdout. Contributors received only their own results; no contributor had visibility into another's scores before the leaderboard published.
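The two phases reduce to a simple scoring harness: evaluate the submitted model on the holdout as-is, then refit it on the 512-patient training split and evaluate again. The model and data below are synthetic stand-ins; tracebloc's actual orchestration is containerised and automated.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression  # stand-in for a submitted model
from sklearn.metrics import f1_score

rng = np.random.default_rng(42)
X_train, y_train = rng.normal(size=(512, 250)), rng.integers(0, 3, size=512)
X_hold, y_hold = rng.normal(size=(128, 250)), rng.integers(0, 3, size=128)

def macro_f1(model, X, y):
    return f1_score(y, model.predict(X), average="macro")

# Phase 1: out-of-the-box. The model arrives already fitted elsewhere
# (simulated here by fitting on a small unrelated slice).
submitted = LogisticRegression(max_iter=500).fit(X_train[:100], y_train[:100])
phase1 = macro_f1(submitted, X_hold, y_hold)

# Phase 2: fine-tune on the full training cohort, then re-score the holdout.
adapted = clone(submitted).fit(X_train, y_train)
phase2 = macro_f1(adapted, X_hold, y_hold)
print(f"out-of-the-box {phase1:.2f} -> after training {phase2:.2f}")
```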
→ View the full model leaderboard — complete contributor rankings, per-class recall breakdown, and gene expression feature importance across all submissions.
| Contributor | Internal Baseline | Out-of-the-Box | After Fine-tuning | Fast Recall |
|---|---|---|---|---|
| Internal team | 0.73 | — | — | 0.69 |
| Contributor A | — | 0.71 | 0.79 | 0.74 |
| Contributor B ✅ | — | 0.74 | 0.85 | 0.81 |
| Contributor C ⚠️ | — | 0.69 | 0.76 | 0.63 |
What the numbers reveal:
Contributor B exceeded Amara's internal ceiling by twelve percentage points at macro-F1 0.85 — a result her team had not approached in eight months. The contributor's feature engineering pipeline, incorporating pathway-level aggregation across neuromuscular-relevant gene networks, produced a representation that generalised substantially better than individual gene expression features alone. The Fast progression recall of 0.81 is clinically significant: patients classified as Fast progressors are those for whom early therapeutic intervention has the greatest potential to alter trajectory.
Contributor A demonstrated that the internal team's approach was not the only viable direction: starting at 0.71 out-of-the-box, it improved to 0.79 after training on 512 real-distribution patients — outperforming the internal baseline. The gap between Contributor A and Contributor B (0.79 versus 0.85) reflects the value of the specialised pathway methodology Contributor B brought. Both represent advances over what Amara's team could achieve alone.
Contributor C started at 0.69 — the lowest out-of-the-box result in the evaluation — and reached 0.76 after fine-tuning. Its Fast recall of 0.63 trails the other contributors. The approach shows improvement through adaptation but does not close the gap Amara needs for regulatory-quality prognostic evidence.
Illustrative assumptions:

- Neuromuscular disease cohort supporting Phase II endpoint validation
- Internal team spent 8 months at the 0.73 ceiling
- Traditional academic collaboration timeline: 6–12 months of governance plus an ethics amendment
- Grant reporting deadline: 6 months
| Approach | Time to Result | Macro-F1 Achieved | Ethics/Governance Risk | Patient Data Transfer |
|---|---|---|---|---|
| Internal team (status quo) | 8 months invested | 0.73 | None | No |
| Traditional collaboration | 6–12 months governance | Unknown | High — new ethics amendment | Yes |
| tracebloc workspace ✅ | Days to weeks | 0.85 | None (covered by existing approval) | No |
The twelve percentage point improvement in macro-F1 is not primarily a modelling result — it is a collaboration result. The expertise existed in the external research community. The blocker was data access. tracebloc removed the blocker without creating the governance cost that a conventional data transfer would require. The result is achievable within the grant reporting window. The traditional route is not.
Amara adopts Contributor B's model and methodology as the new prognostic tool for the cohort. The feature importance output from the fine-tuning run identifies the gene signature combinations and clinical variable interactions that drive Fast progression classification — providing mechanistic insight that Amara's team can carry into the next phase of biological interpretation and endpoint design for regulatory discussions.
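Feature importance output of the kind described above can be produced with permutation importance: shuffle one feature at a time on the holdout and measure the macro-F1 drop. A sketch on synthetic data where, by construction, two aggregate features carry the signal — the feature names are hypothetical:

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
feature_names = [f"gene_{i}" for i in range(8)] + ["signature_0", "clinical_0"]
X = rng.normal(size=(128, 10))
y = (X[:, 8] + X[:, 9] > 0).astype(int)  # signal placed in the last two features

model = LogisticRegression().fit(X, y)
result = permutation_importance(
    model, X, y, n_repeats=20, random_state=0, scoring="f1_macro"
)

# Rank features by the mean macro-F1 drop when permuted; the aggregate
# features should surface at the top.
order = result.importances_mean.argsort()[::-1]
for i in order[:3]:
    print(feature_names[i], round(float(result.importances_mean[i]), 3))
```

On a real fine-tuned model, this is the kind of ranking that points biological interpretation at specific signature and clinical-variable interactions rather than individual genes.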
The tracebloc workspace stays active after the initial evaluation. As Amara's cohort grows — new patient enrolment, new longitudinal timepoints — Contributor B can retrain inside the workspace on updated data without rebuilding the collaboration arrangement. New academic groups with specialised transcriptomics expertise can be invited to submit models on the same terms, without additional ethics amendments. The leaderboard becomes a persistent record of which approaches advance the state of the art on this cohort.
Explore this use case further:
Related use cases: See how the same expert collaboration approach applies to omics biomarker panel narrowing across rare disease cohorts and combination multi-omics therapy response prediction. For a broader view of what federated learning applications look like in translational research, see our guide.
Deploy your workspace or schedule a call.
Disclaimer: The dataset used in this use case is augmented — constructed to reflect the statistical structure of real-world neuromuscular disease transcriptomics cohorts, including gene expression value distributions, disease progression class balance, clinical variable patterns, and signature score ranges, without containing any identifiable patient information. The persona, contributor names, performance figures, and scenario are illustrative and based on patterns observed across rare disease research and clinical bioinformatics environments. They do not represent any specific institution, research group, or grant programme.