In this tutorial, you will learn the basics of TRGT and TRVZ by analyzing a tiny example dataset included in this repository.
Our example dataset consists of a reference genome (reference.fasta
), a file with aligned reads (sample.bam
), and the repeat definition file (repeat.bed
) all stored in the input/
directory.
genome = "input/reference.fasta"
repeats = "input/repeat.bed"
reads = "input/sample.bam"
The repeat definitions are stored in a BED file that, among other information, contains repeat coordinates, repeat identifiers, and motifs:
! cat {repeats}
chrA 10001 10061 ID=TR1;MOTIFS=CAG;STRUC=(CAG)n
We can genotype the repeat by runing TRGT as shown below. Note that TRGT also has a --karyotype
parameter that is used to specify sex of the input sample.
! trgt --genome {genome} --reads {reads} --repeats {repeats} --output-prefix output/sample
The output consists of files sample.vcf.gz
and sample.spanning.bam
. The VCF file contains repeat genotypes while the BAM file contains pieces of HiFi reads that fully span the repeat sequences.
For example, here is the first entry of the VCF file:
! bcftools view --no-header output/sample.vcf.gz | head -n 1
chrA 10002 . CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG 0 . TRID=TR1;END=10061;MOTIFS=CAG;STRUC=(CAG)n GT:AL:ALLR:SD:MC:MS:AP:AM 1/1:33,33:30-39,30-39:15,14:11,11:0(0-33),0(0-33):1,1:.,.
It says that:
TRGT outputs are not sorted. So we first sort and index the VCF:
! bcftools sort -Ob -o output/sample.sorted.vcf.gz output/sample.vcf.gz
! bcftools index output/sample.sorted.vcf.gz
and then the BAM:
! samtools sort output/sample.spanning.bam -o output/sample.bam
! samtools index output/sample.bam
And that's it! The output files are now ready for downstream analysis.
To visualize the repeat with the identifier "TR1", run:
! trvz --genome {genome} --repeats {repeats} --vcf output/sample.sorted.vcf.gz --spanning-reads output/sample.bam --repeat-id TR1 --image output/TR1.png
from IPython.display import Image
Image(filename="output/TR1.png")