Overview
TP53 is the most frequently mutated gene in human cancers. Designing guide RNAs to knock it out means balancing on-target cutting efficiency against off-target risk. In this episode, we walk through the complete Hordago pipeline for CRISPR guide design.
Step 1: Define the Target
We start by specifying the target gene and the cell-line context:
hordago crispr design --gene TP53 --cell-line A549
The pipeline automatically:
- Resolves TP53 to its canonical transcript (ENST00000269305)
- Identifies all exons in the coding sequence
- Scans for PAM sites (NGG) within exonic regions
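To make the PAM scan concrete, here is a minimal sketch of how NGG sites and their 20-nt protospacers could be located on both strands of a single exon. It is not Hordago's implementation: the function names and the toy sequence are assumptions made up for illustration, and SpCas9 with a 20-nt guide is assumed.

```python
# Sketch only: find SpCas9 (NGG) guide candidates on both strands of one exon.
# Function names and the toy sequence are hypothetical, not part of Hordago.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_guides(exon: str, guide_len: int = 20):
    """Yield (strand, start, protospacer, pam) for every NGG PAM in `exon`.

    `start` is the 0-based offset of the protospacer on the scanned strand.
    """
    forward = exon.upper()
    for strand, seq in (("+", forward), ("-", reverse_complement(forward))):
        # The PAM sits immediately 3' of the protospacer, so the first
        # possible PAM position is `guide_len`.
        for pam_start in range(guide_len, len(seq) - 2):
            pam = seq[pam_start:pam_start + 3]
            if pam[1:] == "GG":
                protospacer = seq[pam_start - guide_len:pam_start]
                yield strand, pam_start - guide_len, protospacer, pam


# Synthetic toy sequence (not real TP53): 20 A's followed by a TGG PAM.
toy_exon = "A" * 20 + "TGG" + "ACGT"
hits = list(find_guides(toy_exon))
for hit in hits:
    print(hit)
```

A real pipeline would also map each hit back to genomic coordinates and discard guides that span exon boundaries; both steps are left out here for brevity.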
Step 2: On-Target Scoring
Each candidate guide RNA is scored using CRISPRon, which predicts cutting efficiency based on sequence features and chromatin accessibility:
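import pandas as pd

# Load the candidate guides (tab-separated)
guides = pd.read_csv("guides.tsv", sep="\t")
# Show the first few guides with their predicted on-target scores
print(guides[["guide_id", "sequence", "on_target_score"]].head())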
| guide_id | sequence | on_target_score |
|---|---|---|
| TP53-g1 | GCAGCCTTTGTGAACCAACA | 0.92 |
| TP53-g2 | TGGTTCTCACTTGGTGGAAG | 0.89 |
| TP53-g3 | AGCAGGTCTGTTCCAAGGGA | 0.87 |
Step 3: Off-Target Analysis
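Cas-OFFinder scans the entire genome for potential off-target sites, allowing up to 3 mismatches. The mismatch limit is set per guide inside input.txt, and the G argument tells Cas-OFFinder to run on GPU devices: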
cas-offinder input.txt G output.txt
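For reference, the input.txt passed to cas-offinder could be generated along these lines. This sketch assumes the classic three-part Cas-OFFinder input layout (genome directory, search pattern, then one query per line with its mismatch limit). The genome path is a hypothetical placeholder and the helper name is made up for illustration.

```python
# Sketch only: build a Cas-OFFinder input file for the three TP53 guides.
# Assumes the classic layout: genome directory, pattern line, then queries.
GENOME_DIR = "/path/to/genome_fasta_dir"  # hypothetical placeholder path
MAX_MISMATCHES = 3

tp53_guides = {
    "TP53-g1": "GCAGCCTTTGTGAACCAACA",
    "TP53-g2": "TGGTTCTCACTTGGTGGAAG",
    "TP53-g3": "AGCAGGTCTGTTCCAAGGGA",
}


def build_casoffinder_input(genome_dir, guides, max_mismatches, pam="NGG"):
    """Return the text of an input file covering every guide in `guides`."""
    guide_len = len(next(iter(guides.values())))
    # Line 1: genome location. Line 2: N's for the guide followed by the PAM.
    lines = [genome_dir, "N" * guide_len + pam]
    # Each query is the guide padded with N's over the PAM positions,
    # followed by the number of mismatches allowed for that guide.
    for seq in guides.values():
        lines.append(f"{seq}{'N' * len(pam)} {max_mismatches}")
    return "\n".join(lines) + "\n"


input_text = build_casoffinder_input(GENOME_DIR, tp53_guides, MAX_MISMATCHES)
print(input_text, end="")
```

Saving `input_text` as input.txt would give a file of the kind the command above expects. Newer Cas-OFFinder releases accept extra fields on the pattern line, so check the version you have installed.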
The results show that TP53-g3 has the fewest off-target hits, while TP53-g1 has the highest on-target score: a classic tradeoff between specificity and efficiency.
TP53-g1: 12 off-target sites (max 3 mismatches)
TP53-g2: 18 off-target sites
TP53-g3: 5 off-target sites
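Per-guide counts like these can be tallied from the Cas-OFFinder output file. The sketch below assumes the classic six-column, tab-separated layout (query, chromosome, position, matched site, strand, mismatch count); other versions may add columns. The rows are synthetic toy data, not real results, and the function name is an assumption. Rows with 0 mismatches are skipped on the assumption that they are the intended target site.

```python
# Sketch only: count off-target hits per query from Cas-OFFinder-style output.
# The rows below are synthetic toy data, not real genome hits.
import csv
import io
from collections import Counter


def count_off_targets(handle, max_mismatches=3):
    """Count rows per query with 1..max_mismatches mismatches."""
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 6 or row[0].startswith("#"):
            continue  # skip headers, comments, and malformed lines
        query, mismatches = row[0], int(row[5])
        # 0 mismatches is assumed to be the on-target site itself.
        if 1 <= mismatches <= max_mismatches:
            counts[query] += 1
    return counts


toy_rows = [
    ["GCAGCCTTTGTGAACCAACANNN", "chrToy1", "100", "GCAGCCTTTGTGAACCAACATGG", "+", "0"],
    ["GCAGCCTTTGTGAACCAACANNN", "chrToy2", "2500", "GCAGCCTTaGTGAACCAACAAGG", "-", "1"],
    ["AGCAGGTCTGTTCCAAGGGANNN", "chrToy3", "42", "AGCAGGTCTGTTCCAAGGGACGG", "+", "0"],
    ["AGCAGGTCTGTTCCAAGGGANNN", "chrToy3", "900", "AGgAGGTCTcTTCCAAGGGAGGG", "+", "2"],
    ["AGCAGGTCTGTTCCAAGGGANNN", "chrToy4", "7", "AcCAGGTgTGTTCaAAGGcATGG", "-", "4"],
]
toy_output = "\n".join("\t".join(row) for row in toy_rows) + "\n"

off_target_counts = count_off_targets(io.StringIO(toy_output))
for query, n in off_target_counts.items():
    print(f"{query}: {n} off-target sites")
```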
Step 4: Final Ranking
The pipeline combines on-target and off-target scores into a composite ranking. In the output below, a lower off value goes with fewer off-target hits in Step 3:
RESULTS: 3 guides ranked
TP53-g1 GCAGCCTTTGTGAACCAACA on=0.92 off=0.02 rank=1
TP53-g3 AGCAGGTCTGTTCCAAGGGA on=0.87 off=0.01 rank=2
TP53-g2 TGGTTCTCACTTGGTGGAAG on=0.89 off=0.04 rank=3
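The episode does not spell out how the composite score is computed. As an illustration only, the sketch below assumes the simplest plausible rule, on-target score minus off-target score, which happens to reproduce the ranking shown above. The `Guide` class and the formula are assumptions, not Hordago's actual method.

```python
# Sketch only: rank guides by an ASSUMED composite of on minus off.
# The real Hordago formula is not documented in this episode.
from dataclasses import dataclass


@dataclass
class Guide:
    guide_id: str
    on_target: float   # higher is better (predicted cutting efficiency)
    off_target: float  # lower is better (predicted off-target risk)

    @property
    def composite(self) -> float:
        # Assumed formula: reward efficiency, penalize off-target risk.
        return self.on_target - self.off_target


# Scores copied from the ranked output above.
candidates = [
    Guide("TP53-g1", 0.92, 0.02),
    Guide("TP53-g2", 0.89, 0.04),
    Guide("TP53-g3", 0.87, 0.01),
]

ranked = sorted(candidates, key=lambda g: g.composite, reverse=True)
for rank, g in enumerate(ranked, start=1):
    print(f"{g.guide_id} on={g.on_target:.2f} off={g.off_target:.2f} "
          f"composite={g.composite:.2f} rank={rank}")
```

Under this assumed rule TP53-g3 edges out TP53-g2 despite its lower on-target score, because its off-target penalty is smaller. A weighted variant would shift that balance.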
Before & After
Before: Manual workflow
- 3+ hours of manual tool switching
- No reproducibility guarantee
- Results scattered across browser tabs
After: Hordago pipeline
- 4.2 seconds end-to-end
- Full provenance manifest
- Ranked output with composite scoring
Key Takeaways
- Automated pipelines reduce the manual errors that creep into multi-tool workflows
- Provenance tracking ensures every result can be reproduced
- Composite scoring surfaces the best guide by balancing efficiency vs. safety
- The pipeline’s cell-line context (A549) factors in chromatin state, which improves on-target predictions