Schema Reference
The schema follows a three-object model: ProtocolSpec (the plan) → ProtocolExecution (the run) → ProtocolOutcome (the result). The objects are defined as Pydantic v2 models and described in JSON Schema Draft 2020-12. The JSON Schema is available at GET /v1/schema.
ProtocolSpec
The canonical representation of a biological protocol as a causal program. All Mishale packages produce and consume this format.
| Field | Type | Req | Description |
|---|---|---|---|
| protocol_id | string | yes | Deterministic UUID derived from content hash. Stable across re-ingestion. |
| title | string | yes | Short descriptive name. |
| description | string | — | Free-text abstract or summary. |
| domain | string (enum) | yes | Research domain identifier. See Domain Taxonomy below. |
| steps | Intervention[] | yes | Ordered list of intervention steps — the causal program. |
| reagents | Reagent[] | — | Named reagents with concentration, vendor, and ChEBI ID. |
| equipment | string[] | — | Equipment identifiers (OBI-anchored where possible). |
| cell_type_initial | string | — | Initial cell type. CL ontology ID or free text. |
| cell_type_target | string | — | Target cell type. CL ontology ID or free text. |
| species | string | — | Model organism. NCBI Taxon ID or common name. |
| efficiency_mean | float or null | — | Welford running mean of measured efficiency (0–1). |
| n_measurements | int | — | Number of wet-lab measurements contributing to the mean. |
| tags | string[] | — | Free-form keyword tags. |
| source | string | — | Originating connector ID (e.g. 'benchling', 'pubmed', 'opentrons'). |
| external_id | string | — | Source-system primary key. |
| doi | string | — | DOI for literature-derived protocols. |
| paper_id | string | — | Internal paper identifier (used for contrastive pair construction). |
| metadata | object | — | Connector-specific key-value pairs. Not used for model training. |
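
The table describes protocol_id as a deterministic UUID derived from a content hash, but this section does not specify the hashing scheme. The following is a minimal sketch of one way such an ID could be built, not the actual implementation. The namespace UUID, the choice of hashed fields, and the name `derive_protocol_id` are all assumptions made for illustration.

```python
import hashlib
import json
import uuid

# Hypothetical namespace; the real one (if any) is not documented in this section.
ASSUMED_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/protocolspec")

# Assumption: only content-bearing fields feed the hash, so connector metadata
# and measurement updates do not change the ID.
ASSUMED_CONTENT_FIELDS = ("title", "domain", "steps")


def derive_protocol_id(spec: dict) -> str:
    """Return a UUID string that depends only on the selected content fields."""
    content = {key: spec.get(key) for key in ASSUMED_CONTENT_FIELDS}
    # Canonical JSON: sorted keys and fixed separators keep the bytes stable
    # regardless of the key order in the incoming record.
    canonical = json.dumps(
        content, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # uuid5 is deterministic: same namespace and name always give the same UUID.
    return str(uuid.uuid5(ASSUMED_NAMESPACE, digest))
```

Because the input is canonicalised before hashing, re-ingesting the same protocol with a different key order would yield the same ID under these assumptions, which is the stability property the table describes.
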
Intervention (step)
Each element of steps is an Intervention — the atomic unit of a biological causal program.
| Field | Type | Description |
|---|---|---|
| action | string | Verb describing what is done (OBI-normalised where possible). |
| target | string | Biological target: gene (HGNC), cell type (CL), molecule (ChEBI). |
| target_class | string | One of: transcription_factor, small_molecule, protein_factor, viral_vector, crispr_component, other. |
| dose | string | Dose with units. Normalised to µM (small molecules) or MOI (viral). |
| delivery | string | Delivery method: lentiviral, retroviral, aav, mrna, plasmid, protein, electroporation, crispr. |
| timing_days | float | Day relative to protocol start when this intervention occurs. |
| duration_days | float | Duration of the intervention in days. |
| is_pioneer | boolean | Whether the TF is a pioneer factor (can bind closed chromatin). KB-derived. |
| is_inducible | boolean | Whether this is a dox-inducible transgene. |
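
To make the enumerated values in the table concrete, here is a hedged sketch of an Intervention record with basic validation. It uses a standard-library dataclass as a stand-in for the Pydantic v2 model, which is not shown in this section. The defaults, the optionality of each field, and the non-negative duration check are assumptions, since the table does not state which fields are required.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values copied from the field table above.
TARGET_CLASSES = {
    "transcription_factor", "small_molecule", "protein_factor",
    "viral_vector", "crispr_component", "other",
}
DELIVERY_METHODS = {
    "lentiviral", "retroviral", "aav", "mrna",
    "plasmid", "protein", "electroporation", "crispr",
}


@dataclass
class Intervention:
    """Illustrative stand-in for the real model; defaults are assumptions."""

    action: str
    target: str
    target_class: str = "other"
    dose: Optional[str] = None
    delivery: Optional[str] = None
    timing_days: float = 0.0
    duration_days: Optional[float] = None
    is_pioneer: bool = False
    is_inducible: bool = False

    def __post_init__(self) -> None:
        if self.target_class not in TARGET_CLASSES:
            raise ValueError(f"unknown target_class: {self.target_class!r}")
        if self.delivery is not None and self.delivery not in DELIVERY_METHODS:
            raise ValueError(f"unknown delivery: {self.delivery!r}")
        # Assumption: a negative duration is never meaningful.
        if self.duration_days is not None and self.duration_days < 0:
            raise ValueError("duration_days must be non-negative")


# Usage: the first step of the example protocol further down this page.
step = Intervention(
    action="transduce",
    target="ASCL1",
    target_class="transcription_factor",
    delivery="lentiviral",
    dose="MOI 5",
    timing_days=0,
    duration_days=1,
    is_pioneer=True,
)
```
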
Data Tier Model
| Tier | Name | Description |
|---|---|---|
| 0 | Public | Published protocols. No restrictions. Apache 2.0. |
| 1 | Anonymised | Internal procedures without outcomes. Feature vectors only. |
| 2 | Outcomes | Procedures + efficiency measurements. Requires DPA / IRB. |
| 3 | Proprietary | Full IP records. Never transmitted. Local extraction only. |
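
The tier rules above could be encoded as a simple gate in client code. This is a sketch under assumptions: the names `DataTier` and `may_transmit` and the `has_dpa_or_irb` flag are hypothetical, and the logic only reflects the restrictions stated in this section.

```python
from enum import IntEnum


class DataTier(IntEnum):
    """Tier numbers and names as listed in the Data Tier Model."""

    PUBLIC = 0
    ANONYMISED = 1
    OUTCOMES = 2
    PROPRIETARY = 3


def may_transmit(tier: int, has_dpa_or_irb: bool = False) -> bool:
    """Assumed gate: may a record at this tier leave the local environment?"""
    tier = DataTier(tier)  # raises ValueError for unknown tier numbers
    if tier == DataTier.PROPRIETARY:
        # Tier 3: never transmitted, local extraction only.
        return False
    if tier == DataTier.OUTCOMES:
        # Tier 2: requires a DPA or IRB approval.
        return has_dpa_or_irb
    # Tiers 0 and 1: no transmission restriction is stated here. Tier 1 is
    # limited to feature vectors, which this sketch does not check.
    return True
```
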
Domain Taxonomy
direct_reprogramming, ipsc_derivation, ipsc_differentiation, cell_culture, crispr_ko, crispr_activation, crispr_interference, base_editing, prime_editing, organoid, flow_cytometry, western_blot, sequencing, cloning, viral_transduction, microbial_fermentation, t_cell_engineering, car_t
Example ProtocolSpec
{
  "protocol_id": "prot_sha256_a1b2c3",
  "title": "BAM Factor Direct Reprogramming — Fibroblast to Neuron",
  "domain": "direct_reprogramming",
  "cell_type_initial": "CL:0000057",
  "cell_type_target": "CL:0000540",
  "species": "9606",
  "steps": [
    {
      "action": "transduce",
      "target": "ASCL1",
      "target_class": "transcription_factor",
      "delivery": "lentiviral",
      "dose": "MOI 5",
      "timing_days": 0,
      "duration_days": 1,
      "is_pioneer": true,
      "is_inducible": false
    },
    {
      "action": "transduce",
      "target": "BRN2",
      "target_class": "transcription_factor",
      "delivery": "lentiviral",
      "dose": "MOI 5",
      "timing_days": 0,
      "duration_days": 1,
      "is_pioneer": false,
      "is_inducible": false
    }
  ],
  "reagents": [
    { "name": "doxycycline", "concentration": "2 µg/mL", "chebi_id": "CHEBI:50845" }
  ],
  "efficiency_mean": 0.34,
  "n_measurements": 3,
  "source": "pubmed",
  "doi": "10.1038/s41593-019-0548-1",
  "paper_id": "PMID:31768042"
}
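
The efficiency_mean and n_measurements fields are described as a Welford running mean and its count. As a hedged illustration of how a new wet-lab measurement might be folded into those two fields, the sketch below applies the mean-update step of Welford's method. The function name `update_efficiency` is hypothetical, and the variance accumulator from the full algorithm is omitted because the schema shown here stores only the mean and the count.

```python
from typing import Optional, Tuple


def update_efficiency(
    mean: Optional[float], n: int, measurement: float
) -> Tuple[float, int]:
    """Fold one efficiency measurement (0 to 1) into a running mean."""
    if not 0.0 <= measurement <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    # A null mean is treated as "no measurements yet" (assumption).
    prior = 0.0 if mean is None or n == 0 else mean
    count = 0 if mean is None else n
    count += 1
    # Incremental update: new_mean = old_mean + (x - old_mean) / new_count
    return prior + (measurement - prior) / count, count


# Starting from the example values (mean 0.34 over 3 measurements),
# add one new measurement of 0.50.
new_mean, new_n = update_efficiency(0.34, 3, 0.50)
```

With the example values, a fourth measurement of 0.50 moves the mean to approximately 0.38, matching the direct calculation (3 × 0.34 + 0.50) / 4.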