Knowledge Base

mshale-kb provides ontology-backed entity resolution for reagents, cell lines, and species, plus domain classification of free text. It bridges the gap between free-text protocols and machine-comparable structured records.

What it does

🧪

Reagent resolution

Maps "lipofectamine 2000" → CHEBI:33692 with canonical name and SMILES.

🦠

Cell line normalisation

Maps "HEK cells" → EFO:0001187 (HEK293T) + Homo sapiens.

🏷️

Domain classification

Maps free text to a Mishale domain enum using a fine-tuned classifier, e.g. "CRISPR knock-out using RNP electroporation" → crispr_ko.

Linked Ontologies

| Ontology | Purpose |
| --- | --- |
| ChEBI | Chemical Entities of Biological Interest: reagent and small molecule lookup. |
| GO | Gene Ontology: biological process, molecular function, cellular component. |
| EFO | Experimental Factor Ontology: cell line and assay type normalisation. |
| OBI | Ontology for Biomedical Investigations: protocol step verb normalisation. |
| NCBI Taxonomy | Species resolution for cell lines and model organisms. |
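
Identifiers from these ontologies are usually written as prefix:local_id pairs such as CHEBI:33692 or EFO:0001187. The sketch below shows one way to split such an identifier and map its prefix back to an ontology in the table. The names parse_curie, ontology_for, and ONTOLOGY_PREFIXES are hypothetical helpers for illustration and are not part of mshale-kb; NCBITaxon is assumed as the prefix for NCBI Taxonomy terms, following common OBO usage.

```python
from typing import NamedTuple

# Prefix -> ontology name, mirroring the Linked Ontologies table.
# Assumption: NCBI Taxonomy terms use the "NCBITaxon" prefix.
ONTOLOGY_PREFIXES = {
    "CHEBI": "ChEBI",
    "GO": "Gene Ontology",
    "EFO": "Experimental Factor Ontology",
    "OBI": "Ontology for Biomedical Investigations",
    "NCBITaxon": "NCBI Taxonomy",
}

# Lower-cased index so "ChEBI:33692" and "CHEBI:33692" resolve the same way.
_BY_LOWER = {prefix.lower(): name for prefix, name in ONTOLOGY_PREFIXES.items()}


class Curie(NamedTuple):
    prefix: str
    local_id: str


def parse_curie(identifier: str) -> Curie:
    """Split 'PREFIX:local_id' into its two parts, rejecting malformed input."""
    prefix, sep, local_id = identifier.strip().partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a prefix:local_id identifier: {identifier!r}")
    return Curie(prefix, local_id)


def ontology_for(identifier: str) -> str:
    """Return the ontology name for an identifier, matching prefixes case-insensitively."""
    prefix = parse_curie(identifier).prefix
    try:
        return _BY_LOWER[prefix.lower()]
    except KeyError:
        raise KeyError(f"unknown ontology prefix: {prefix!r}") from None
```

A check like this can be useful when validating enriched records before comparing them, since a malformed or unexpected prefix fails early instead of silently mismatching.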

Python API

from mshale_kb import KnowledgeBase

kb = KnowledgeBase()

# Reagent lookup
reagent = kb.lookup_reagent("Lipofectamine 2000")
# ReagentEntity(chebi_id="CHEBI:33692", canonical_name="lipofectamine 2000", ...)

# Cell line lookup
cell_line = kb.lookup_cell_line("HEK293T")
# CellLineEntity(efo_id="EFO:0001187", organism="Homo sapiens", ...)

# Domain classification
domain = kb.classify_domain("CRISPR knock-out using RNP electroporation")
# "crispr_ko"

# Enrich a ProtocolSpec
from mshale_schema import ProtocolSpec
spec = ProtocolSpec(...)
enriched_spec = kb.enrich(spec)  # adds .reagents[].chebi_id etc.
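
To make the enrichment step concrete, here is a minimal, self-contained sketch of the idea: walk the reagents in a spec and attach a ChEBI ID wherever a resolver finds one. It does not use mshale_kb or mshale_schema. Reagent, Spec, enrich, and toy_resolve are hypothetical stand-ins, and whether the real kb.enrich returns a copy or modifies the spec in place is an assumption here (the sketch returns a copy).

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Reagent:
    """Hypothetical stand-in for one reagent entry in a ProtocolSpec."""
    name: str
    chebi_id: Optional[str] = None


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for ProtocolSpec, reduced to its reagent list."""
    reagents: Tuple[Reagent, ...] = ()


def enrich(spec: Spec, resolve: Callable[[str], Optional[str]]) -> Spec:
    """Return a copy of spec with chebi_id filled in where resolve() finds a match.

    Reagents that already carry an ID are left untouched; unresolved
    reagents keep chebi_id=None rather than raising.
    """
    enriched = tuple(
        r if r.chebi_id else replace(r, chebi_id=resolve(r.name))
        for r in spec.reagents
    )
    return replace(spec, reagents=enriched)


# Toy resolver for illustration only; a real one would query the knowledge base.
TOY_IDS = {"lipofectamine 2000": "CHEBI:33692"}


def toy_resolve(name: str) -> Optional[str]:
    return TOY_IDS.get(name.strip().lower())


spec = Spec(reagents=(Reagent("Lipofectamine 2000"), Reagent("mystery buffer")))
enriched_spec = enrich(spec, toy_resolve)
```

Returning a new object keeps the original spec available for diffing against the enriched one, which is one reasonable design choice among several.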

CLI Reference

| Command | Description |
| --- | --- |
| mshale-kb lookup reagent <name> | Resolve a reagent name to ChEBI ID + canonical name. |
| mshale-kb lookup cell-line <name> | Resolve a cell line to EFO term + organism. |
| mshale-kb lookup domain <text> | Classify free text to a Mishale domain identifier. |
| mshale-kb enrich <spec.json> | Enrich a ProtocolSpec with resolved ontology IDs. |
| mshale-kb stats | Show KB coverage stats (reagents, cell lines, domains). |

Offline Cache

The KB ships with a pre-built SQLite cache of the most common reagents and cell lines (~50K entries). Lookups are served from this cache first; when a term is not cached, the lookup falls through to the live OLS (Ontology Lookup Service) API. The bundled cache is updated with each Mishale release.

# Force refresh the local cache
mshale-kb cache refresh

# Show cache statistics
mshale-kb cache stats
# Reagents:  51,204 terms
# Cell lines: 4,891 terms
# Last updated: 2025-03-15
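
The cache-first behaviour described in this section can be sketched as follows, using an in-memory SQLite database and a stubbed remote resolver in place of the live OLS API. The table layout, the normalise and lookup helpers, and the EXAMPLE: identifiers are all hypothetical and chosen for illustration; the actual cache schema used by mshale-kb is not documented here.

```python
import sqlite3
from typing import Callable, Optional


def open_cache() -> sqlite3.Connection:
    """Create an in-memory cache with a hypothetical name -> term_id table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE terms (name TEXT PRIMARY KEY, term_id TEXT NOT NULL)"
    )
    return conn


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so spelling variants share one key."""
    return " ".join(name.lower().split())


def lookup(
    conn: sqlite3.Connection,
    name: str,
    remote: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Try the local cache first; call the remote resolver only on a miss."""
    key = normalise(name)
    row = conn.execute(
        "SELECT term_id FROM terms WHERE name = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]
    # Cache miss: fall through to the remote service (stubbed below).
    return remote(key)


# Stub standing in for a live API call; records which names reached it.
remote_calls = []


def fake_remote(name: str) -> Optional[str]:
    remote_calls.append(name)
    return {"hela": "EXAMPLE:0002"}.get(name)


conn = open_cache()
conn.execute("INSERT INTO terms VALUES (?, ?)", ("hek293t", "EXAMPLE:0001"))

cached = lookup(conn, "  HEK293T ", fake_remote)   # served from the cache
fetched = lookup(conn, "HeLa", fake_remote)        # falls through to remote
missing = lookup(conn, "unknown", fake_remote)     # not found anywhere
```

This sketch does not write remote hits back into the cache, matching the statement that the bundled cache changes with releases or an explicit refresh; a write-back variant would be an equally plausible design.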