MISHALE
Docs

Methodology & Data

A feasibility memo is produced by a deterministic pipeline: the same input yields the same output, and every assertion traces to a public source. The pipeline reasons over a knowledge graph that integrates public biological databases. Only the parsing of free-text input uses a language model, and that step falls back to a deterministic parser when the model is unavailable.
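The fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual parser: the function names, the `model_parser` callable, and the crude gene-symbol pattern are all assumptions made for the example.

```python
import re
from typing import Callable, Dict, Optional

Parsed = Dict[str, Optional[str]]

# Assumption: a crude pattern for gene-symbol-like tokens (uppercase letters and digits).
# It would also match other acronyms; a real parser would check against a gene list.
_GENE_RE = re.compile(r"\b[A-Z][A-Z0-9]{1,9}\b")


def parse_deterministic(text: str) -> Parsed:
    """Rule-based fallback: take the first gene-like token, treat the rest as the disease."""
    match = _GENE_RE.search(text)
    gene = match.group(0) if match else None
    disease_text = text.replace(gene, " ") if gene else text
    # Drop punctuation, collapse whitespace, and lowercase the disease phrase.
    disease_text = re.sub(r"[^A-Za-z0-9\- ]", " ", disease_text)
    disease = " ".join(disease_text.split()).lower() or None
    return {"disease": disease, "gene": gene}


def parse_intent(text: str, model_parser: Optional[Callable[[str], Parsed]] = None) -> Parsed:
    """Use the (hypothetical) language-model parser if given; fall back on any failure."""
    if model_parser is not None:
        try:
            return model_parser(text)
        except Exception:
            # Model unavailable or errored: continue with the deterministic path.
            pass
    return parse_deterministic(text)
```

Under these assumptions, `parse_intent("cystic fibrosis, CFTR")` returns `{"disease": "cystic fibrosis", "gene": "CFTR"}` whether no model is supplied or the supplied model raises an error.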

From intent to memo

Given a disease and a candidate gene, the pipeline works through the following steps. It resolves the disease to a Mondo identifier and selects the causal gene. It chooses a strategy: variant correction, gene replacement, silencing, or enhancer disruption. It designs the molecular reagent, which is a ranked list of guides for editing or a transgene cassette with a single-AAV cargo-fit check for replacement. It screens off-target sites and assigns a consequence tier. It recommends a delivery vector, derives a dose range from approved-product precedent, and assigns a regulatory class. A gene named in the query is honoured when it is a recognised causal gene for the disease, which disambiguates heterogeneous indications.
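Two of these rules are simple enough to sketch in code: honouring a query gene only when it is a recognised causal gene, and checking whether a transgene cassette fits in a single AAV. The sketch below is illustrative only. The function names are hypothetical, the fallback to the first listed gene is an assumption (the text does not say what happens when the named gene is not recognised), and the 4,700 bp capacity is an assumed approximate packaging limit, not a value taken from the pipeline.

```python
from typing import Dict, List, Optional

# Assumption: approximate single-AAV packaging limit in base pairs, ITR to ITR.
AAV_CAPACITY_BP = 4700


def select_causal_gene(causal_genes: List[str], query_gene: Optional[str] = None) -> str:
    """Honour the query gene only if it is a recognised causal gene for the disease."""
    if not causal_genes:
        raise ValueError("no recognised causal gene for this disease")
    if query_gene is not None:
        for gene in causal_genes:
            if gene.upper() == query_gene.upper():
                return gene
    # Assumption: otherwise take the first entry of a pre-ranked list.
    return causal_genes[0]


def cargo_fits(element_lengths_bp: Dict[str, int],
               capacity_bp: int = AAV_CAPACITY_BP) -> bool:
    """Return True when the summed cassette elements fit within the capacity."""
    return sum(element_lengths_bp.values()) <= capacity_bp
```

For example, with hypothetical element lengths `{"ITRs": 290, "promoter": 600, "cds": 1600, "polyA": 250}` the total is 2,740 bp and `cargo_fits` returns `True`; raising the coding sequence to 11,000 bp makes it return `False`.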

Data sources

ClinVar: Pathogenic and likely-pathogenic variants, and the consequence distribution per gene.
gnomAD: Population allele frequencies and gene-level constraint.
Mondo: Disease ontology used to resolve a free-text indication to a stable identifier.
GTEx: Tissue-level expression, used to weight off-target consequence and inform delivery.
GRCh38: The reference genome, against which guides and off-target sites are screened.
Approved-therapy record: Vectors, routes, and doses of approved and clinically advanced gene therapies, used as dosing and delivery precedent.

These are public resources; the memo cites the specific evidence it used so a claim can be checked independently.
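One way to make "every claim is checkable" enforceable is to attach the evidence to each claim as data and flag any claim that lacks it. The sketch below assumes a hypothetical `Claim` and `Evidence` structure; it does not describe the memo's real schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Evidence:
    source: str  # name of the public resource, e.g. "ClinVar" or "GTEx"
    record: str  # accession or record locator within that source


@dataclass
class Claim:
    text: str
    evidence: List[Evidence] = field(default_factory=list)


def uncited(claims: List[Claim]) -> List[Claim]:
    """Return the claims that carry no evidence and so cannot be checked."""
    return [claim for claim in claims if not claim.evidence]
```

A memo builder could refuse to emit a memo while `uncited(claims)` is non-empty.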

Determinism and limits

Because the decision path is deterministic, results are reproducible and auditable. The engine designs only for monogenic targets: for polygenic or complex diseases it declines rather than fabricating a design, and for targets outside its current coverage it says so. Findings are computational and require experimental confirmation.
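The decline behaviour and the reproducibility claim can both be illustrated with a short sketch. The names below (`triage`, `Outcome`, `memo_fingerprint`), the status strings, and the idea of fingerprinting a memo with SHA-256 are assumptions for illustration, not the engine's actual interface.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Outcome:
    status: str  # "designed", "declined", or "out_of_coverage"
    reason: str


def triage(architecture: str, gene: str, covered_genes: FrozenSet[str]) -> Outcome:
    """Decide whether to design, decline, or report a coverage gap."""
    if architecture != "monogenic":
        return Outcome("declined", "polygenic or complex disease; no design produced")
    if gene not in covered_genes:
        return Outcome("out_of_coverage", "target is outside current coverage")
    return Outcome("designed", "")


def memo_fingerprint(memo: dict) -> str:
    """Hash a canonical JSON form so identical memos give identical digests."""
    canonical = json.dumps(memo, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Running the same input twice and comparing fingerprints gives a simple reproducibility check: because keys are sorted before hashing, two memos with the same content produce the same digest regardless of key order.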

Validation

The pipeline is benchmarked against gene-therapy programmes with known outcomes. The pre-registered study, including every misclassification, is published as the retrospective concordance preprint.