## Building an ATAC-seq count matrix from BAM files

In [2]:
import episcanpy.api as epi


Then specify the path to the test dataset.

In [3]:
path_to_play_data = '../ATAC_play_data/'
file_annot_name = "cortex_enhancer.bed"

# list of the BAM files to build a count matrix from
list_cells = ['AGCGATAGAACGAATTCGACTCGTATCACAGGACGT.bam',
              'AGCGATAGAACGAATTCGCCGACTCCAAAGGCGAAG.bam',
              'AGCGATAGAACGAATTCGCATATCCTATGGCTCTGA.bam',
              'AGCGATAGAACGAATTCGACTCGTATCAAGGCGAAG.bam',
              'AGCGATAGAACGAATTCGACCTACGCCAGGCTCTGA.bam'
              ]
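
Instead of typing every file name by hand, the list can be assembled from the directory contents. This is a minimal standard-library sketch, assuming all BAM files sit directly in the play-data folder; `list_bam_files` is a hypothetical helper, not part of episcanpy.

```python
import os


def list_bam_files(directory):
    """Return the sorted names of all regular .bam files in `directory` (hypothetical helper)."""
    return sorted(
        name for name in os.listdir(directory)
        # keep only regular files ending in .bam (index files end in .bai)
        if name.endswith('.bam') and os.path.isfile(os.path.join(directory, name))
    )


# Usage, assuming the BAM files are stored directly in the play-data folder:
# list_cells = list_bam_files('../ATAC_play_data/')
```

Sorting keeps the cell order reproducible between runs, since `os.listdir` returns names in arbitrary order.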


Load the annotation file (peaks or enhancers) using the set of chromosome names that matches your data.

In [4]:
enhancers = epi.ct.load_features(file_annot_name)
enhancer_names = epi.ct.name_features(enhancers)

0.1724870204925537 seconds
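
As a rough illustration of what loading a BED annotation restricted to a fixed set of chromosome names involves, here is a plain-Python sketch. It is not episcanpy's implementation: the helpers `read_bed` and `label_features`, the example chromosome list (mouse-style `chr1`-`chr19`, `chrX`, `chrY`) and the `chrom_start_end` label scheme are all assumptions made for this example.

```python
def read_bed(lines, chromosomes):
    """Parse BED lines into (chrom, start, end) tuples, keeping only listed chromosomes.

    Hypothetical helper: BED coordinates are 0-based and half-open [start, end).
    """
    allowed = set(chromosomes)
    features = []
    for line in lines:
        line = line.strip()
        # skip blank lines, comments and UCSC track/browser header lines
        if not line or line.startswith(('#', 'track', 'browser')):
            continue
        # only the first three BED columns are needed here
        chrom, start, end = line.split('\t')[:3]
        if chrom in allowed:
            features.append((chrom, int(start), int(end)))
    return features


def label_features(features):
    """Build one 'chrom_start_end' label per feature (label scheme is an assumption)."""
    return [f'{chrom}_{start}_{end}' for chrom, start, end in features]


bed_text = 'chr1\t100\t200\nchr2\t50\t80\nchrUn_x\t5\t9\n'
example_chroms = [f'chr{i}' for i in range(1, 20)] + ['chrX', 'chrY']
feats = read_bed(bed_text.splitlines(), example_chroms)
print(label_features(feats))  # ['chr1_100_200', 'chr2_50_80']
```

The unplaced contig `chrUn_x` is dropped because it is not in the chosen chromosome list, which is why the chromosome set has to match the reference the reads were aligned to.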


Let's now generate the count matrix.

Important limitation: `bld_atac_mtx` builds only one count matrix per call, whereas the methylation workflow can build several data matrices at the same time.
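
To make the idea of a cells x features count matrix concrete, here is a minimal pure-Python sketch. It is not episcanpy's implementation: reads are given as hypothetical `(chrom, start, end)` tuples rather than parsed from BAM files, and a read is counted for every feature it overlaps. The last lines show the usual workaround for the one-matrix-per-call limitation: one call per feature set.

```python
from collections import defaultdict


def count_matrix(reads_per_cell, features):
    """Return (cells, matrix) where matrix[i][j] counts reads of cell i overlapping feature j.

    Hypothetical sketch: reads_per_cell maps a cell (BAM) name to a list of
    (chrom, start, end) reads; intervals are half-open. Each call builds one matrix.
    """
    # index features by chromosome so reads are only compared with
    # features on the same chromosome
    by_chrom = defaultdict(list)
    for j, (chrom, start, end) in enumerate(features):
        by_chrom[chrom].append((j, start, end))

    cells = sorted(reads_per_cell)
    matrix = []
    for cell in cells:
        row = [0] * len(features)
        for chrom, r_start, r_end in reads_per_cell[cell]:
            for j, f_start, f_end in by_chrom.get(chrom, ()):
                # half-open intervals overlap when each starts before the other ends
                if r_start < f_end and r_end > f_start:
                    row[j] += 1
        matrix.append(row)
    return cells, matrix


features = [('chr1', 100, 200), ('chr1', 300, 400), ('chr2', 0, 50)]
reads = {
    'cellB.bam': [('chr2', 10, 60)],
    'cellA.bam': [('chr1', 150, 180), ('chr1', 190, 310), ('chr3', 0, 10)],
}
cells, mtx = count_matrix(reads, features)
print(cells, mtx)  # ['cellA.bam', 'cellB.bam'] [[2, 1, 0], [0, 0, 1]]

# one call per feature set, mirroring the one-matrix-per-call limitation
feature_sets = {'enhancers': features, 'chr1_only': features[:2]}
matrices = {name: count_matrix(reads, fset)[1] for name, fset in feature_sets.items()}
```

The read `('chr1', 190, 310)` spans two features and is counted for both; whether a real tool does the same depends on its overlap rules.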

In [5]:
epi.ct.bld_atac_mtx(list_bam_files=list_cells,
                    output_file='test_ATAC_mtx.txt',
                    path=path_to_play_data,
                    writing_option='w',
                    # the remaining arguments of this call are cut off in the source export
                    )

AGCGATAGAACGAATTCGACTCGTATCACAGGACGT.bam 130527 mapped reads

epi.ct.save_sparse_mtx(initial_matrix='test_ATAC_mtx.txt',
                       # the remaining arguments of this call are cut off in the source export
                       )

0.16255593299865723 seconds
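
ATAC-seq count matrices are mostly zeros, which is why sparse storage pays off. The sketch below shows coordinate (COO) storage in plain Python: only the non-zero `(row, column, value)` triplets are kept. It illustrates the idea only and makes no claim about the file format `save_sparse_mtx` actually writes; `to_coo` and `from_coo` are hypothetical helpers, and in practice `scipy.sparse.coo_matrix` or `csr_matrix` would do this job.

```python
def to_coo(dense):
    """Convert a dense list-of-lists matrix to (row, col, value) triplets of its non-zero entries."""
    return [
        (i, j, value)
        for i, row in enumerate(dense)
        for j, value in enumerate(row)
        if value != 0
    ]


def from_coo(triplets, n_rows, n_cols):
    """Rebuild the dense matrix from coordinate triplets; missing entries are zero."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for i, j, value in triplets:
        dense[i][j] = value
    return dense


dense = [[2, 1, 0], [0, 0, 1]]
triplets = to_coo(dense)
print(triplets)  # [(0, 0, 2), (0, 1, 1), (1, 2, 1)]
```

The matrix shape has to be stored alongside the triplets, since trailing all-zero rows or columns leave no trace in them.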