Retrospective concordance of a deterministic computational pipeline for gene-therapy target feasibility

Wilson Mudaki¹

¹Mishale, Inc., Wilmington, Delaware, USA.

Preprint — not peer reviewed · 9 June 2026

Abstract. Choosing which gene and indication to advance is among the earliest decisions in a gene-therapy programme and among the most consequential, yet it is typically reached from manually assembled evidence with no consistent basis for comparison. We describe a deterministic pipeline that converts a single clinical intent — a disease and a candidate gene — into a structured feasibility assessment (target rationale, editing or replacement strategy, guide or transgene design, off-target tiering, delivery vector, dosing precedent and regulatory classification) computed entirely from public data. To evaluate it we assembled a pre-registered benchmark of 69 cases with known outcomes, stratified into clinically supported programmes (n = 36), a documented clinical failure (n = 1), polygenic negative controls (n = 17) and monogenic targets outside the system’s coverage (n = 15); the scoring rubric was fixed before execution. The pipeline reached the predetermined-correct disposition in 66 of 69 cases (95.7%, 95% CI 88.0–98.5%): all 36 supported programmes resolved to the correct causal gene with a concordant verdict; the failure case reproduced the adeno-associated-virus (AAV) hepatotoxicity that halted the original trial; no out-of-coverage target produced a fabricated design; and 14 of 17 negative controls were correctly declined. We report all three misclassifications, which share a single failure mode, and identify the principal confound — overlap between the benchmark and the system’s curated knowledge base.

Introduction

Developing a gene therapy is among the most capital-intensive undertakings in medicine, and the cost of abandoning a programme compounds with each phase it survives¹⁰,¹¹. The earliest decision, which gene and indication to pursue, is the point at which analysis has the most leverage, yet in practice it is settled informally: guide-RNA scores from one tool, dosing precedents from the literature, delivery and regulatory rationale from experience, reconciled in a meeting. The resulting judgement is slow to assemble, hard to audit, and not readily comparable across candidates.

Software that systematizes this triage is attractive, yet such systems are seldom held to a falsifiable, pre-registered benchmark, and their behaviour on intractable inputs — what they return when no good answer exists — is rarely reported. We evaluate a deterministic pipeline that produces a feasibility assessment for a (disease, gene) pair from public data, and ask how often its disposition matches the disposition real programmes ultimately reached. The benchmark and scoring were fixed in advance, strata were chosen to elicit failure as well as success, and every error is reported below.

Results

The pipeline

Given a disease and candidate gene, the pipeline executes a fixed sequence of stages (Fig. 1). It resolves the disease to an ontology identifier and selects the causal gene; chooses a therapeutic strategy (variant correction, gene replacement, silencing, enhancer disruption); designs the corresponding molecular reagent (guide RNAs for editing, or a transgene cassette with a single-AAV cargo-fit check for replacement); screens candidate off-target sites and assigns a consequence tier; recommends a delivery vector; derives a dosing range from approved-product precedent; and assigns a regulatory class. The decision path is deterministic — identical inputs yield identical outputs — and all assertions are traceable to public sources, principally a knowledge graph integrating ClinVar⁷, gnomAD⁶, the Mondo disease ontology⁸ and the approved-therapy record. Natural-language parsing of the input is the only stochastic component and degrades to a deterministic rule-based parser when unavailable.

Clinical intent → Disease & gene → Strategy → Molecular design → Off-target tier → Vector → Dosing → Regulatory class

Figure 1 | Pipeline stages. A single clinical intent is resolved to a disease and causal gene, a therapeutic strategy is selected, a molecular reagent is designed, and downstream off-target, delivery, dosing and regulatory assessments are produced. Each stage is deterministic and sourced from public data.
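The determinism claim above (a fixed stage order, with identical inputs yielding identical outputs) can be illustrated with a minimal Python sketch in which each stage is a pure function and reproducibility is checked by hashing a canonical JSON serialization of the output. The stage names, the one-entry lookup table and the routing rule are hypothetical simplifications covering only three of the eight stages; they are not Mishale's implementation.

```python
import hashlib
import json
from typing import Callable, Dict, List, Optional

# Each stage is a pure function: it reads the accumulated record and returns
# a new record with its own fields added. No I/O, clocks or random numbers,
# so the same input always produces the same output.
Record = Dict[str, Optional[str]]
Stage = Callable[[Record], Record]

# Hypothetical toy lookup standing in for the knowledge graph.
CAUSAL_GENE = {"spinal muscular atrophy": "SMN1"}

def resolve_gene(rec: Record) -> Record:
    return {**rec, "gene": CAUSAL_GENE.get(rec["disease"] or "")}

def choose_strategy(rec: Record) -> Record:
    # Toy rule (assumption): any resolved gene is routed to gene replacement.
    return {**rec, "strategy": "gene_replacement" if rec["gene"] else None}

def choose_vector(rec: Record) -> Record:
    return {**rec, "vector": "AAV" if rec["strategy"] == "gene_replacement" else None}

PIPELINE: List[Stage] = [resolve_gene, choose_strategy, choose_vector]

def run(disease: str) -> Record:
    rec: Record = {"disease": disease}
    for stage in PIPELINE:  # fixed order, no branching on external state
        rec = stage(rec)
    return rec

def digest(rec: Record) -> str:
    # Sorted keys give a canonical serialization, so equal records hash equally.
    canonical = json.dumps(rec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

first = run("spinal muscular atrophy")
second = run("spinal muscular atrophy")
print(first["gene"], first["strategy"], first["vector"])  # SMN1 gene_replacement AAV
print(digest(first) == digest(second))  # True
```

Hashing the canonical output is one convenient way to audit determinism across runs or versions; a stochastic front end, such as the natural-language parser, would sit before `run` and lies outside this check.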

Benchmark design

We enumerated 69 cases before any were run and assigned each to one of four strata with an explicit pass criterion (Methods). Supported cases (n = 36) are gene/indication pairs that are approved or in registered clinical development; a case passes if the pipeline resolves the correct causal gene and returns a verdict consistent with the real programme. The single failure case is a programme halted in the clinic; it passes only if the pipeline surfaces the responsible liability. Negative controls (n = 17) are polygenic or complex diseases with no single causal gene; the correct behaviour is to decline. Out-of-coverage cases (n = 15) are monogenic diseases outside the curated set; either a clean decline or a correct design passes, but a fabricated gene fails.

Concordance with clinical outcomes

Aggregated across strata, the pipeline reached the predetermined-correct disposition in 66 of 69 cases (95.7%, 95% Wilson CI 88.0–98.5%; Table 1). All 36 supported programmes resolved to the correct causal gene with a concordant verdict (36/36; 95% Clopper–Pearson CI 90.3–100%), spanning haematologic, central-nervous-system, hepatic, muscular, metabolic, ophthalmic, lysosomal, renal and dermatologic targets and both editing and replacement modalities, including the approved AAV gene-replacement product onasemnogene abeparvovec (SMN1)² and editing of the BCL11A erythroid enhancer for the haemoglobinopathies¹. Because the four strata are scored against different pass criteria, the aggregate should be read alongside the stratum-level rates in Table 1 rather than as a single homogeneous accuracy.

Stratum	n	Passed	Rate
Supported (approved / in clinical development)	36	36	100%
Clinical failure (liability recapitulation)	1	1	100%
Out-of-coverage (no fabrication)	15	15	100%
Negative control (polygenic; decline)	17	14	82%
Overall	69	66	95.7%

Table 1 | Concordance by stratum. Passing requires the predetermined-correct disposition for each stratum (Methods). No case in any stratum produced a fabricated causal gene.

Recapitulation of a dose-limiting toxicity

We next tested whether the pipeline surfaces a known liability rather than confirming only successes. The AT132 programme for X-linked myotubular myopathy was halted after high systemic doses of an AAV vector produced fatal hepatobiliary events³. Given only the MTM1 target, and with no access to that outcome, the pipeline selected systemic AAV delivery and flagged a hepatotoxicity ceiling derived from dosing precedent, reproducing the liability that ended the programme.
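The text does not state how the hepatotoxicity ceiling is derived. One plausible reading, sketched below in Python under that assumption, treats the ceiling as the highest systemic AAV dose among approved-product precedents and flags any proposed systemic dose above it. The `Precedent` record, the product names and every dose value are hypothetical placeholders, not real product data.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Precedent:
    product: str               # placeholder name, not a real product
    route: str                 # e.g. "systemic"
    max_dose_vg_per_kg: float  # highest dose administered in that precedent (assumed field)

def hepatotoxicity_flag(route: str, dose_vg_per_kg: float,
                        precedents: List[Precedent]) -> Optional[str]:
    """Return a warning if a systemic AAV dose exceeds every systemic precedent."""
    if route != "systemic":
        return None  # this toy rule only reasons about systemic exposure
    ceilings = [p.max_dose_vg_per_kg for p in precedents if p.route == "systemic"]
    if not ceilings:
        return "no systemic precedent: dose ceiling unknown"
    ceiling = max(ceilings)
    if dose_vg_per_kg > ceiling:
        return (f"proposed {dose_vg_per_kg:.1e} vg/kg exceeds precedent ceiling "
                f"{ceiling:.1e} vg/kg: hepatotoxicity risk")
    return None

# Illustrative placeholder values only.
PRECEDENTS = [
    Precedent("product_A", "systemic", 5.0e13),
    Precedent("product_B", "systemic", 1.0e14),
]

print(hepatotoxicity_flag("systemic", 3.0e14, PRECEDENTS))
print(hepatotoxicity_flag("systemic", 5.0e13, PRECEDENTS))  # None
```

This rule flags extrapolation beyond precedented exposure; it does not imply that doses within the precedented range are safe.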

Failure behaviour and specificity

Across the 17 polygenic negative controls and 15 monogenic out-of-coverage targets, no case yielded a fabricated causal gene. For 14 of 17 polygenic diseases the pipeline declined to design, and every out-of-coverage target was either declined or resolved to its correct gene. Off-nominal inputs therefore produced either abstention or a design anchored to a real disease-associated gene (the three errors analysed below), never a fabricated design. This is the behaviour that matters when the tool is used to filter candidates: a confident error misdirects budget, whereas a refusal returns the decision to the team.

Error analysis

The three misclassifications were the three negative controls that the pipeline failed to decline (Table 2). All shared one mechanism: a generic, polygenic disease query resolved to a real disease-associated gene (a monogenic-subtype gene in two cases, a susceptibility gene in the third) and a design was produced. These are over-resolution errors of specificity, not fabrications, since each gene is genuinely associated with the disease; for an unqualified complex-disease query, however, the correct disposition is to decline.

Query (complex disease)	Resolved gene	Error class
Psoriasis	IL12B	Over-resolution to susceptibility gene
Chronic kidney disease	PKD2	Over-resolution to monogenic subtype
Atrial fibrillation	SCN5A	Over-resolution to monogenic subtype

Table 2 | The three misclassifications. Each is a failure to decline a polygenic query, resolving instead to a real disease-associated gene. No fabricated genes occurred.

Discussion

On a pre-registered benchmark, a deterministic pipeline reached the predetermined-correct disposition in 95.7% of cases (66/69), agreed with the real programme in all 36 supported cases, recapitulated a clinical dose-limiting toxicity from public data alone, and produced no fabricated causal gene across 32 adversarial inputs (17 negative controls and 15 out-of-coverage targets). The most important limitation qualifies the headline directly: many supported cases overlap the system’s curated knowledge base, so the result establishes high recall and internally consistent reasoning on covered targets, together with graceful failure off coverage. It does not establish generalization to novel biology; a benchmark of targets deliberately held out of curation would be required for that, and we treat the present figures as an upper bound on performance for unseen targets.

The evaluation is retrospective. It measures whether the pipeline agrees, from public data, with outcomes that are now known; it does not measure prospective prediction, and concordance with past decisions is weaker evidence than a prospective trial of decisions not yet made. The three errors point to a concrete deficiency: the criterion for declining complex diseases relies on an enumerated list of polygenic conditions rather than a general signal of polygenicity, so an unlisted complex disease with a known monogenic subtype can be over-resolved. Replacing the enumerated guard with a heritability- or architecture-based criterion is the priority revision. Prospective evaluation on programmes whose outcomes are not yet decided, and expansion of the held-out portion of the benchmark, are the natural next steps.
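The contrast between the current enumerated guard and the proposed architecture-based criterion can be sketched in Python. The sketch assumes a hypothetical per-disease summary, `mendelian_fraction` (the share of cases attributable to single-gene variants), and a hypothetical decline threshold of 0.5; disease names and values are placeholders rather than estimates, and a heritability-based signal could fill the same role.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Architecture:
    # Hypothetical summary of a disease entry's genetic architecture.
    mendelian_fraction: float  # share of cases attributable to single-gene variants

# Placeholder list standing in for the enumerated polygenic guard.
ENUMERATED_POLYGENIC = {"listed_complex_disease"}

def enumerated_guard(disease: str) -> bool:
    """Current behaviour as described: decline only diseases on the list."""
    return disease in ENUMERATED_POLYGENIC

def architecture_guard(arch: Architecture, threshold: float = 0.5) -> bool:
    """Proposed behaviour (sketch): decline when most cases lack a Mendelian cause."""
    return arch.mendelian_fraction < threshold

# Placeholder entries; the fractions are illustrative, not estimates.
ARCH: Dict[str, Architecture] = {
    "listed_complex_disease": Architecture(0.02),
    "unlisted_complex_disease": Architecture(0.03),
    "monogenic_subtype": Architecture(0.95),
}

for name, arch in ARCH.items():
    print(name, enumerated_guard(name), architecture_guard(arch))
# listed_complex_disease True True
# unlisted_complex_disease False True
# monogenic_subtype False False
```

The second row is the failure mode in Table 2: an unlisted complex disease passes the enumerated guard and proceeds to gene resolution, whereas a criterion computed from genetic architecture declines it without the disease needing to be listed.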

Finally, the pipeline output is a computational triage from public data, intended to inform an early go/no-go decision. It is neither experimental evidence nor an investigational-new-drug package; every assertion requires wet-laboratory confirmation before use.

Methods

Pipeline. The system (Mishale, version evaluated 9 June 2026) accepts a free-text clinical intent, parses it to a disease and optional gene, and executes the deterministic stages of Fig. 1. Disease resolution maps to a Mondo identifier⁸ and selects a causal gene; a gene named in the query is honoured when it is a recognized causal gene for the disease. Strategy selection chooses among variant correction, gene replacement, transcript silencing and enhancer disruption. Molecular design produces ranked guide RNAs for editing strategies or a transgene cassette with a single-AAV (~4.7 kb) cargo-fit determination for replacement. Off-target screening tiers candidate sites by predicted consequence. Delivery, dosing and regulatory classification are derived from the approved-product record. Variant- and gene-level evidence is drawn from ClinVar⁷ and gnomAD⁶; delivery and dosing precedents are drawn from approved AAV and ex-vivo products²,⁴,⁵,⁹.
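The single-AAV cargo-fit determination amounts to summing the cassette element lengths and comparing the total with the ~4.7 kb limit stated above. A minimal Python sketch follows; the element names and sizes are illustrative, and whether inverted terminal repeats and spacers count against the limit is an assumption left to the caller.

```python
from typing import Dict, Tuple

AAV_LIMIT_BP = 4_700  # ~4.7 kb single-AAV cargo limit, as stated in the Methods

def cargo_fit(elements_bp: Dict[str, int],
              limit_bp: int = AAV_LIMIT_BP) -> Tuple[bool, int]:
    """Return (fits, headroom_bp) for a cassette given element lengths in bp.

    Include ITRs or spacers in elements_bp if they should count against
    the limit; this sketch makes no assumption either way.
    """
    total = sum(elements_bp.values())
    return total <= limit_bp, limit_bp - total

# Illustrative element sizes only, not a real cassette design.
small = {"promoter": 800, "transgene_cds": 900, "polyA": 250}
large = {"promoter": 800, "transgene_cds": 11_000, "polyA": 250}
print(cargo_fit(small))  # (True, 2750)
print(cargo_fit(large))  # (False, -7350)
```

Returning the headroom as well as the boolean shows how far an oversized cassette overshoots the limit.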

Benchmark and scoring. The 69 cases and the per-stratum pass criteria were specified before execution. Supported cases were defined as gene/indication pairs with regulatory approval or registered clinical development; the failure case as a programme with a documented clinical halt; negative controls as diseases without a single Mendelian cause; and out-of-coverage cases as monogenic diseases outside the curated strategy set. A supported case passed if the resolved gene matched the established target and the verdict was feasible or feasible-with-caveats; the failure case passed if the responsible liability was flagged; a negative control passed if the pipeline declined; an out-of-coverage case passed on either a clean decline or a correct design, and failed on any fabricated gene. Each case was submitted once to the deployed system; outputs were scored against the pre-registered rubric without adjustment.
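The per-stratum rubric above is mechanical enough to express as a single function. The Python sketch below encodes it under stated simplifications: the `Case` and `Output` records, the verdict strings and the liability label are hypothetical, and for out-of-coverage cases only a clean decline or the expected gene passes, so a fabricated gene fails automatically.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional

class Stratum(Enum):
    SUPPORTED = "supported"
    FAILURE = "failure"
    NEGATIVE = "negative"
    OUT_OF_COVERAGE = "out_of_coverage"

@dataclass(frozen=True)
class Case:
    stratum: Stratum
    expected_gene: Optional[str] = None   # established target, where one exists
    expected_flag: Optional[str] = None   # responsible liability (failure case)

@dataclass(frozen=True)
class Output:
    declined: bool
    gene: Optional[str] = None            # resolved causal gene
    verdict: Optional[str] = None         # hypothetical verdict labels
    flags: FrozenSet[str] = frozenset()   # liabilities raised by the pipeline

CONCORDANT = {"feasible", "feasible-with-caveats"}

def passes(case: Case, out: Output) -> bool:
    if case.stratum is Stratum.SUPPORTED:
        # Correct gene and a feasible(-with-caveats) verdict.
        return (not out.declined and out.gene == case.expected_gene
                and out.verdict in CONCORDANT)
    if case.stratum is Stratum.FAILURE:
        # The responsible liability must be flagged.
        return case.expected_flag is not None and case.expected_flag in out.flags
    if case.stratum is Stratum.NEGATIVE:
        # Polygenic query: only a decline passes.
        return out.declined
    # Out-of-coverage: a clean decline or a design on the correct gene passes;
    # any other gene, including a fabricated one, fails.
    if out.declined:
        return out.gene is None
    return out.gene is not None and out.gene == case.expected_gene

# Usage: the SCN5A over-resolution from Table 2 fails its negative-control case.
print(passes(Case(Stratum.NEGATIVE), Output(declined=False, gene="SCN5A", verdict="feasible")))  # False
```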

Statistical analysis. Binomial proportions are reported with two-sided 95% Wilson score confidence intervals, except for strata in which every case passed, for which the exact Clopper–Pearson interval is given. The aggregate rate combines strata with differing pass criteria and is reported for completeness; stratum-level rates (Table 1) are the primary results. No multiplicity correction was applied, as no hypothesis test was performed.
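Both interval methods named above are short enough to check by hand. The Python sketch below implements the Wilson score interval and the all-pass special case of Clopper–Pearson, whose lower bound has the closed form (α/2)^(1/n); for general counts the exact interval needs beta-distribution quantiles (for example `scipy.stats.beta.ppf`), which this sketch does not use. The function names are illustrative.

```python
import math
from typing import Tuple

Z = 1.959963984540054  # two-sided 95% standard-normal quantile

def wilson(successes: int, n: int, z: float = Z) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def clopper_pearson_all_pass(n: int, alpha: float = 0.05) -> Tuple[float, float]:
    # With n successes in n trials, P(X >= n | p) = p**n, so the exact lower
    # bound solves p**n = alpha / 2; the upper bound is 1.
    return (alpha / 2) ** (1 / n), 1.0

lo, hi = wilson(66, 69)
print(f"{100 * lo:.1f}-{100 * hi:.1f}")  # 88.0-98.5
lo36, _ = clopper_pearson_all_pass(36)
print(f"{100 * lo36:.1f}")  # 90.3
```

For comparison, the Wilson lower bound for 36/36 is 36/(36 + z²) ≈ 90.4%, slightly above the exact 90.3% quoted in the Results.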

Data and code availability

The case list and per-case dispositions are available from the corresponding author on request. The underlying biological databases are public⁶,⁷,⁸. The pipeline is proprietary to Mishale, Inc.; it is described here in sufficient detail to permit independent benchmarking against the same public outcomes.

Competing interests

W.M. is the founder of Mishale, Inc., which develops the system evaluated in this report. This declaration is made because the evaluation is of the author’s own product; the pre-registered design and full reporting of errors are intended to mitigate the resulting bias.

Author contributions

W.M. designed the study, built the pipeline, performed the evaluation and wrote the report.

References

1. Frangoul, H. et al. CRISPR-Cas9 gene editing for sickle cell disease and β-thalassemia. N. Engl. J. Med. 384, 252–260 (2021).
2. Mendell, J. R. et al. Single-dose gene-replacement therapy for spinal muscular atrophy. N. Engl. J. Med. 377, 1713–1722 (2017).
3. Wilson, J. M. & Flotte, T. R. Moving forward after two deaths in a gene therapy trial of myotubular myopathy. Hum. Gene Ther. 31, 695–696 (2020).
4. Nathwani, A. C. et al. Long-term safety and efficacy of factor IX gene therapy in hemophilia B. N. Engl. J. Med. 371, 1994–2004 (2014).
5. Russell, S. et al. Efficacy and safety of voretigene neparvovec for biallelic RPE65-mediated inherited retinal dystrophy. Lancet 390, 849–860 (2017).
6. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
7. Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).
8. Vasilevsky, N. A. et al. Mondo: unifying diseases for the world, by the world. Preprint at medRxiv (2022).
9. Wang, D., Tai, P. W. L. & Gao, G. Adeno-associated virus vector as a platform for gene therapy delivery. Nat. Rev. Drug Discov. 18, 358–378 (2019).
10. Wong, C. H., Siah, K. W. & Lo, A. W. Estimation of clinical trial success rates and related parameters. Biostatistics 20, 273–286 (2019).
11. Wouters, O. J., McKee, M. & Luyten, J. Estimated research and development investment needed to bring a new medicine to market. JAMA 323, 844–853 (2020).