Create molecules from scratch

PyRNA lets you easily construct DNA and RNA molecules. An RNA molecule automatically converts T residues into U.

In [3]:
from pyrna.features import DNA, RNA
rna = RNA(name = 'my_rna', sequence = 'AGGGGATTAACCCC')
print "%s: %s"%(rna.name, rna.sequence)
dna = DNA(name = 'my_dna', sequence = 'GGTTGGATTAACCCC')
print "%s: %s"%(dna.name, dna.sequence)
my_rna: AGGGGAUUAACCCC
my_dna: GGTTGGATTAACCCC
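
The T to U conversion shown above can be pictured with a few lines of plain Python. This is only a minimal sketch that runs without PyRNA: the helper name to_rna_sequence is hypothetical and is not part of the library, whose actual implementation may differ.

```python
# Hypothetical helper, NOT part of PyRNA: mimics the T -> U conversion
# that an RNA object applies to the sequence it is given.
def to_rna_sequence(sequence):
    """Return the sequence with every T residue replaced by U."""
    return sequence.replace('T', 'U')


converted = to_rna_sequence('AGGGGATTAACCCC')
print(converted)  # AGGGGAUUAACCCC
```

A DNA object, by contrast, keeps the sequence unchanged, as the second output line of the cell above shows.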

RNA and DNA molecules can return their length, and are sliceable and iterable:

In [4]:
print "slice: %s"%rna[0:2]
print "length: %i"%len(rna)
slice: AG
length: 14

You can easily get a single residue:

In [5]:
print rna[3]
G
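
In Python, this kind of behaviour usually comes from the special methods __len__, __getitem__ and __iter__. The class below is a hypothetical stand-in written for illustration, assuming the molecule simply delegates to its sequence string. It is not PyRNA's own code.

```python
class ToyMolecule(object):
    """Hypothetical stand-in for a molecule object; not the PyRNA class."""

    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

    def __len__(self):
        # len(molecule) is the number of residues
        return len(self.sequence)

    def __getitem__(self, key):
        # handles both a single index (molecule[3]) and a slice (molecule[0:2])
        return self.sequence[key]

    def __iter__(self):
        # iterating over the molecule yields one residue at a time
        return iter(self.sequence)


toy = ToyMolecule('my_rna', 'AGGGGAUUAACCCC')
print(toy[0:2])   # AG
print(len(toy))   # 14
print(toy[3])     # G
```

Indices are 0-based, as with any Python sequence, so toy[3] is the fourth residue.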

The sequence can easily be extended by adding a string at the end. Note that the + operator modifies the molecule in place:

In [6]:
rna +'AAA'
print rna.sequence
AGGGGAUUAACCCCAAA

Or shortened by subtracting a number of residues from the end:

In [7]:
rna-3
print rna.sequence
AGGGGAUUAACCCC
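
The two cells above show that + and - change the molecule itself rather than returning a new one. A minimal sketch of how such operators could be written is given below. The class ToyEditableMolecule is hypothetical and only reproduces the behaviour visible in the outputs; it is not taken from PyRNA.

```python
class ToyEditableMolecule(object):
    """Hypothetical class; reproduces the in-place + and - seen above."""

    def __init__(self, sequence):
        self.sequence = sequence

    def __add__(self, suffix):
        # append the new residues to the end of the current sequence
        self.sequence += suffix
        return self

    def __sub__(self, count):
        # drop the last `count` residues; a count of 0 leaves the sequence intact
        if count > 0:
            self.sequence = self.sequence[:-count]
        return self


toy = ToyEditableMolecule('AGGGGAUUAACCCC')
toy + 'AAA'
print(toy.sequence)  # AGGGGAUUAACCCCAAA
toy - 3
print(toy.sequence)  # AGGGGAUUAACCCC
```

Mutating the left operand is unusual for + and - in Python, so it is worth keeping in mind when the same molecule is reused later in a script.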

An RNA molecule is iterable over its primary sequence:

In [8]:
for index, residue in enumerate(rna):
    print "residue n%i: %s"%(index+1, residue)
residue n1: A
residue n2: G
residue n3: G
residue n4: G
residue n5: G
residue n6: A
residue n7: U
residue n8: U
residue n9: A
residue n10: A
residue n11: C
residue n12: C
residue n13: C
residue n14: C

Create molecules from files

In PyRNA, a pyrna.features.TertiaryStructure object describes a single molecular chain. Since a PDB file can contain several molecules, the function parse_pdb() returns a list of such objects.

In [12]:
# the with statement closes the file automatically
with open('../data/1ehz.pdb') as h:
    pdb_content = h.read()

from pyrna.parsers import parse_pdb
tertiary_structures = parse_pdb(pdb_content)

RNA molecules extracted from PDB files can contain modified residues. PyRNA automatically converts them into unmodified residues and records each modification as a (residue name, position) pair in the modified_residues attribute.

In [13]:
for ts in tertiary_structures:
    print ts.rna.name
    print ts.rna.sequence
    print ts.rna.modified_residues
A
GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA
[('2MG', 10), ('H2U', 16), ('H2U', 17), ('M2G', 26), ('OMC', 32), ('OMG', 34), ('YYG', 37), ('PSU', 39), ('5MC', 40), ('7MG', 46), ('5MC', 49), ('5MU', 54), ('PSU', 55), ('1MA', 58)]
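
The idea behind this conversion can be sketched without PyRNA. The example below assumes a small lookup table from a modified residue code to its parent nucleotide. The table holds only a few of the codes listed above, and the function unmodify is a hypothetical name, not a PyRNA function.

```python
# Parent nucleotide for a few modified residues (PDB three-letter codes).
# Deliberately incomplete: a real table would hold many more entries.
PARENT = {'2MG': 'G', 'H2U': 'U', 'PSU': 'U', '5MC': 'C', '1MA': 'A'}


def unmodify(residue_names):
    """Return (sequence, modifications) for a list of residue names.

    Hypothetical helper: modifications is a list of (name, position)
    pairs with 1-based positions, like the output shown above.
    """
    sequence = []
    modifications = []
    for position, name in enumerate(residue_names, 1):
        if name in PARENT:
            sequence.append(PARENT[name])
            modifications.append((name, position))
        else:
            sequence.append(name)
    return ''.join(sequence), modifications


sequence, modifications = unmodify(['G', 'C', 'PSU', 'A', '5MC'])
print(sequence)       # GCUAC
print(modifications)  # [('PSU', 3), ('5MC', 5)]
```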

If you want to parse a FASTA file, you have to specify the type of molecules it stores. DNA molecules are faster to create since PyRNA will not try to identify modified residues.

In [15]:
# the with statement closes the file automatically
with open('../data/telomerases.fasta') as h:
    fasta_content = h.read()

from pyrna.parsers import parse_fasta
#the default type is RNA
for rna in parse_fasta(fasta_content):
    print "sequence of %s:"%rna.name
    print "%s\n"%rna.sequence
sequence of telomerase 1:
AGUUUCUCGAUAAUUGAUCUGUAGAAUCUGUCAAGCAAAACCCCAAAACCUUACACUGAGAGCAUUUAGCCUGAUUACUCUUUAAAUCAAAUCAGGCAAUAGAGAGAAACUCGAGAGGUGAAAACCCCACAGCAUUCUGAAAUGUAUUUGGGAGUAAUCUCAUAUUAGUUUGCUGUCCUCUCAUCUUUU

sequence of telomerase 2:
AUCCCCGCAAAUUCAUUCUGUUUGCAUUCAAACAGUCAUUCAACCCCAAAAAUCUAGACCAAAUAUUGUCUUCCCUUCUUGGCACAAACAAAGAAGAGACGCGGGAUAAAGAUACUCCGACGAUUGAUACAAUAUUUAUCAACGGGAGGUCUUACUUUU

sequence of telomerase 3:
UACCUCCUGUGGAUCCAUUCAGGAUUAAUGAAAUCCUGUCAUUCAACCCCAAAAAUCUUGUCAAAUUAUUGCCUCGUCUUUUGGGCACAAACAAAAGUCACGCAGGAGGUUCAGACAUUCGACAUAAGAUACACUAUUUAUCUUAUGGAAGGUCUAGUUUUU
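
For readers unfamiliar with the FASTA format, the sketch below shows roughly what such a parser has to do: a line starting with > opens a new record, and the following lines are joined into its sequence. The function read_fasta and its molecule_type parameter are hypothetical and written for illustration only. PyRNA's parse_fasta returns molecule objects, whereas this sketch returns plain (name, sequence) tuples.

```python
def read_fasta(fasta_content, molecule_type='RNA'):
    """Hypothetical minimal FASTA reader returning (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in fasta_content.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # a header line closes the previous record and opens a new one
            if name is not None:
                records.append((name, ''.join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, ''.join(chunks)))
    if molecule_type == 'RNA':
        # same convention as above: an RNA sequence holds U instead of T
        records = [(n, s.replace('T', 'U')) for n, s in records]
    return records


example = ">seq 1\nAGTT\nCC\n>seq 2\nGGTA\n"
for name, sequence in read_fasta(example):
    print("%s: %s" % (name, sequence))
# seq 1: AGUUCC
# seq 2: GGUA
```

Calling read_fasta(example, 'DNA') keeps the T residues untouched.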

An RNA object automatically converts T residues into U.

In [17]:
# the with statement closes the file automatically
with open('../data/ft3100_from_FANTOM3_project.fasta') as h:
    fasta_content = h.read()

for dna in parse_fasta(fasta_content, 'DNA'):
    print "sequence as a DNA:"
    print "%s\n"%dna.sequence

for rna in parse_fasta(fasta_content):
    print "sequence as an RNA:"
    print rna.sequence
sequence as a DNA:
TAACAATCTGCTGAAAGGTACCGTCGGAGGGAGCTTTGTTGCCAGCGCCAGAAACGCCGGTTTAACCAGCGCCGAAGTGAGCGCAGTGATTAAAGCCATGCAGTGGCAAATGGATTTCCGCAAACTGAAAAAAGGCGATGAATTTGCGGT

sequence as an RNA:
UAACAAUCUGCUGAAAGGUACCGUCGGAGGGAGCUUUGUUGCCAGCGCCAGAAACGCCGGUUUAACCAGCGCCGAAGUGAGCGCAGUGAUUAAAGCCAUGCAGUGGCAAAUGGAUUUCCGCAAACUGAAAAAAGGCGAUGAAUUUGCGGU

DNA and RNA objects have a rich textual representation in Jupyter notebooks.

In [18]:
parse_fasta(fasta_content)[0]
Out[18]:
1	UAACAAUCUGCUGAAAGGUACCGUCGGAGGGAGCUUUGUUGCCAGCGCCAGAAACGCCGG
61	UUUAACCAGCGCCGAAGUGAGCGCAGUGAUUAAAGCCAUGCAGUGGCAAAUGGAUUUCCG
121	CAAACUGAAAAAAGGCGAUGAAUUUGCGGU
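
Jupyter picks up a rich representation when an object defines a display method such as _repr_html_(). How PyRNA implements its own display is not shown here, so the sketch below is only an assumption about one way to obtain a similar layout: 60 residues per line, each line prefixed with the position of its first residue. The names numbered_lines and ToyDisplayMolecule are hypothetical.

```python
def numbered_lines(sequence, width=60):
    """Split a sequence into lines prefixed with their 1-based start position."""
    lines = []
    for start in range(0, len(sequence), width):
        lines.append("%i\t%s" % (start + 1, sequence[start:start + width]))
    return lines


class ToyDisplayMolecule(object):
    """Hypothetical class; only illustrates Jupyter's _repr_html_ hook."""

    def __init__(self, sequence):
        self.sequence = sequence

    def _repr_html_(self):
        # Jupyter calls this method when the object is the last expression of a cell
        return "<pre>%s</pre>" % "\n".join(numbered_lines(self.sequence))


toy = ToyDisplayMolecule('ACGU' * 20)  # 80 residues, so two lines
print(toy._repr_html_())
```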

Create molecules from databases

You can load 3D structures directly from the Protein Data Bank:

In [19]:
from pyrna.db import PDB
pdb = PDB()
pdb_content = pdb.get_entry('1GID')

In PyRNA, a pyrna.features.TertiaryStructure object describes a single molecular chain. Since a PDB file can contain several molecules, the function parse_pdb() returns a list of pyrna.features.TertiaryStructure objects.

In [20]:
from pyrna.parsers import parse_pdb

for tertiary_structure in parse_pdb(pdb_content):
    print "molecular chain %s: %s"%(tertiary_structure.rna.name, tertiary_structure.rna.sequence)
molecular chain A: GAAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUC
molecular chain B: GAAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUC
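
To see why one PDB entry yields several chains, here is a rough sketch of how ATOM records can be grouped by chain identifier. It relies on the fixed-column layout of the PDB format (residue name in columns 18-20, chain identifier in column 22, residue number in columns 23-26). The functions chains_from_pdb and atom_line are hypothetical helpers, not PyRNA's parser, and the sketch ignores HETATM records, so modified residues would be skipped.

```python
def chains_from_pdb(pdb_content):
    """Hypothetical helper: return [(chain_id, sequence)] from ATOM records."""
    chains = {}   # chain identifier -> list of residue names
    order = []    # chain identifiers in order of first appearance
    seen = set()  # (chain, residue number) pairs already counted
    for line in pdb_content.splitlines():
        if not line.startswith('ATOM') or len(line) < 26:
            continue
        chain_id = line[21]
        residue_key = (chain_id, line[22:27])
        if residue_key in seen:
            continue  # another atom of a residue that is already recorded
        seen.add(residue_key)
        if chain_id not in chains:
            chains[chain_id] = []
            order.append(chain_id)
        chains[chain_id].append(line[17:20].strip())
    return [(chain_id, ''.join(chains[chain_id])) for chain_id in order]


def atom_line(serial, atom, residue, chain, number):
    # builds the first 26 columns of an ATOM record (coordinates omitted)
    return "ATOM  %5d %-4s %3s %1s%4d" % (serial, atom, residue, chain, number)


example = "\n".join([
    atom_line(1, 'P', 'G', 'A', 1),
    atom_line(2, 'OP1', 'G', 'A', 1),
    atom_line(3, 'P', 'A', 'A', 2),
    atom_line(4, 'P', 'G', 'B', 1),
    atom_line(5, 'P', 'C', 'B', 2),
])
for chain_id, sequence in chains_from_pdb(example):
    print("molecular chain %s: %s" % (chain_id, sequence))
# molecular chain A: GA
# molecular chain B: GC
```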