PyRNA stores and describes an RNA secondary structure using two different data structures:

  • a pandas Dataframe
  • a pyrna.features.SecondaryStructure object

The pandas Dataframe lists all the base pairs (a.k.a interactions) making the secondary structure. If a pandas Dataframe restricts the exploration of a secondary structure to the level of the base pairs, a pyrna.features.SecondaryStructure gives us access to high-level objects like:

  • helices,
  • single-strands,
  • junctions.

In both data structures, the edges of the base pairs are described with three different characters:

* '(' or ')' for Watson-Crick edges,
* '[' or ']' for Hoogsteen edges,
* '{' or '}' for Sugar edges.

A base pair can also be in a cis ('c') or trans ('t') orientation.

Creation of secondary structures from scratch

In [36]:
import json #to have a better output for dict describing pyrna.features.RNA objects (interactions, helices, single-strands,..)

from pyrna.parsers import parse_bn
rna = RNA(name = 'my_rna', sequence = 'GGGGGACAACCCC')
bn = '(({(.....))))'
base_pairs = parse_bn(bn)
print base_pairs
  orientation edge1 edge2  pos1  pos2
0           c     (     )     4    10
1           c     {     )     3    11
2           c     (     )     2    12
3           c     (     )     1    13

Now we transform this list of base-pairs into a pyrna.features.SecondaryStructure object.

In [37]:
from pyrna.parsers import base_pairs_to_secondary_structure
ss = base_pairs_to_secondary_structure(rna, base_pairs)

Now we can browse over the helices. An helix contains a field to store non-canonical base pairs (meaning base pairs different from c() AU, c() GU or c() GC.

In [38]:
for helix in ss.helices:
    print json.dumps(helix, indent = 2)
    print "\nAny non canonical base pairs in this helix?\n"
    for interaction in helix['interactions']:
        print json.dumps(interaction, indent = 2)    
{
  "length": 4, 
  "interactions": [
    {
      "edge1": "{", 
      "edge2": ")", 
      "orientation": "c", 
      "location": [
        [
          3, 
          3
        ], 
        [
          11, 
          11
        ]
      ]
    }
  ], 
  "name": "H1", 
  "location": [
    [
      1, 
      4
    ], 
    [
      10, 
      13
    ]
  ]
}

Any non canonical base pairs in this helix?

{
  "edge1": "{", 
  "edge2": ")", 
  "orientation": "c", 
  "location": [
    [
      3, 
      3
    ], 
    [
      11, 
      11
    ]
  ]
}

or over the single-strands...

In [6]:
for single_strand in ss.single_strands:
    print json.dumps(single_strand, indent = 2)
{
  "name": "SS1", 
  "location": [
    5, 
    9
  ]
}

Any high-level object from a pyrna.features.SecondaryStructure is described with a "location". A location is a N x 2 matrix where N is the number of blocks of contiguous residues, each block being described with its start and end positions.

An helix location will be a matrix like [ [1,4] , [10,13] ], meaning that the first strand is between residues 1 and 4, and the second strand is between residues 10 and 13.

A base pair location will be a matrix like [ [3,3] , [11,11] ], meaning a base pair between residues 3 and 11.

From a pyrna.features.SecondaryStructure object, we can easily go back to a list of base pairs stored in a pandas Dataframe.

In [7]:
from pyrna.parsers import secondary_structure_to_base_pairs
print secondary_structure_to_base_pairs(ss)
  edge1 edge2 orientation  pos1  pos2
0     (     )           c     1    13
1     (     )           c     2    12
2     {     )           c     3    11
3     (     )           c     4    10

Now we can create a new secondary structure containing a single tertiary interaction. A tertiary interaction is defined as a base pair which is not contiguous to another base-pair.

In [8]:
rna = RNA(name = 'my_rna', sequence = 'GGGGGACGCAGTAACCCC')
bn = '(({(.(.....)..))))'
base_pairs = parse_bn(bn)
print base_pairs
  orientation edge1 edge2  pos1  pos2
0           c     (     )     6    12
1           c     (     )     4    15
2           c     {     )     3    16
3           c     (     )     2    17
4           c     (     )     1    18

The single tertiary interaction is stored in a pyrna.features.SecondaryStructure object.

In [9]:
ss = base_pairs_to_secondary_structure(rna, base_pairs)
for tertiary_interaction in ss.tertiary_interactions:
    print json.dumps(tertiary_interaction, indent = 2)
{
  "edge1": "(", 
  "edge2": ")", 
  "orientation": "c", 
  "location": [
    [
      6, 
      6
    ], 
    [
      12, 
      12
    ]
  ]
}

If we want to go back to a list of base pairs in a pandas Dataframe, we need to precise if we want to keep the tertiary interactions...

In [10]:
print secondary_structure_to_base_pairs(ss, keep_tertiaries=True)
  edge1 edge2 orientation  pos1  pos2
0     (     )           c     1    18
1     (     )           c     2    17
2     {     )           c     3    16
3     (     )           c     4    15
4     (     )           c     6    12

...or not

In [11]:
print secondary_structure_to_base_pairs(ss)
  edge1 edge2 orientation  pos1  pos2
0     (     )           c     1    18
1     (     )           c     2    17
2     {     )           c     3    16
3     (     )           c     4    15

Creation of secondary structures from files

PyRNA is able to parse several file formats describing secondary structures.

In [13]:
from pyrna.parsers import parse_vienna

h = open('../data/ft3100_2D_with_bracket_notation.fasta')
vienna_content = h.read()
h.close()

print vienna_content
>ft3100 FANTOM3
TAACAATCTGCTGAAAGGTACCGTCGGAGGGAGCTTTGTTGCCAGCGCCA
GAAACGCCGGTTTAACCAGCGCCGAAGTGAGCGCAGTGATTAAAGCCATG
CAGTGGCAAATGGATTTCCGCAAACTGAAAAAAGGCGATGAATTTGCGGT
.......((((.....(((((((.((...((.(((........))).)).
....)).))))...)))..((((..(((..(((....((((...(((((.
..))))).....))))..)))..))).......))))........)))).

In a Vienna file, several molecules and bracket notations can be stored. Consequently, the function parse_vienna() returns a list of RNA objects and a list of pandas Dataframes.

In [14]:
all_molecules, all_base_pairs = parse_vienna(vienna_content)

for base_pairs in all_base_pairs:
    print base_pairs
   orientation edge1 edge2  pos1  pos2
0            c     (     )    35    44
1            c     (     )    34    45
2            c     (     )    33    46
3            c     (     )    31    48
4            c     (     )    30    49
5            c     (     )    26    55
6            c     (     )    25    56
7            c     (     )    23    58
8            c     (     )    22    59
9            c     (     )    21    60
10           c     (     )    20    61
11           c     (     )    19    65
12           c     (     )    18    66
13           c     (     )    17    67
14           c     (     )    99   103
15           c     (     )    98   104
16           c     (     )    97   105
17           c     (     )    96   106
18           c     (     )    95   107
19           c     (     )    91   113
20           c     (     )    90   114
21           c     (     )    89   115
22           c     (     )    88   116
23           c     (     )    83   119
24           c     (     )    82   120
25           c     (     )    81   121
26           c     (     )    78   124
27           c     (     )    77   125
28           c     (     )    76   126
29           c     (     )    73   134
30           c     (     )    72   135
31           c     (     )    71   136
32           c     (     )    70   137
33           c     (     )    11   146
34           c     (     )    10   147
35           c     (     )     9   148
36           c     (     )     8   149

Creation of secondary structures from databases

PyRNA allows to recover Rfam entries directly from the database website.

In [20]:
from pyrna.db import Rfam
rfam = Rfam(use_website = True)

The function get_entry() returns:

  • a list of gapped RNA objects
  • a dictionary to map species labels to name/start-end
  • a pandas Dataframe listing the paired columns in the Rfam alignment (a.k.a a consensus secondary structure)
In [22]:
gapped_rnas, organism_names_2_nse, consensus_2d = rfam.get_entry(rfam_id='RF00059')

for gapped_rna in gapped_rnas[:10]:
    print "sequence for %s: %s"%(gapped_rna.name, gapped_rna.sequence)
sequence for AE014074.1/464992-464896: UAUUUCACAAAGGAGUGCUU---------------------------------------------------------UG-GCUGAGAUCGCAA--------------------------------------------------UUGCGAAA-UCCUGAGG-ACCUGA-UCUUGUUAGUACAAGCG-UAGGGA--UUGUGACCAAUAAUCAA
sequence for AL935263.2/99249-99354: UUUAAACACUAGGGGUGUCCAAAA---------------------------------------------------AUGG-GCUGAGAUGGUGCUGUAA---------------------------------------------GUACCGAU-CCCUUUGA-ACCUG--U-AAGCUCAAACUUGCG-UAGGAA--AGUGUCACAGCUAAUGU
sequence for BX248356.1/232868-232999: UUUAUAAAUCACGGGUGCUGGACGGCAUACGUUUGCC-------------------------------------ACAAA-GCUGAGACAGGGCGAGAAGACGUGCACG-----------------------------------UCCCUGAA-CCGUUGA--ACCUGA-UCCGGGUAAUACCGGCGAUAGGAA--GAAUAAUGAACCGAUCG
sequence for AK120238.1/2075-2184: CUAUGUUAGGAGGUGGCCUCUUGGCCUGGAUUGUUGUGA-----------------------------------AUUGG-GCUGAGAA------------------------------------------------------------AGU-CCCUUUGA-ACCUGA-ACAGGAUAAUGCCUGCG-AAGGGA-GUGUGCAUUUCUACUUUU
sequence for X54035.1/52-153: GAUGACCACAAGGGGAGCAUU-------------------------------------------------------AAA-GCUGAGAGUGAGCGGUUUC--------------------------------------------GUUC-UGA-CCCUUUGA-ACCUG--U-UAGUUAACGCUGGCG-UAGGGA--UGUGGCAAAUGCAAUAG
sequence for AE008691.1/1524796-1524694: CUCAAGUGCUAGGGGAGCCAAA-----------------------------------------------------AAUG-GCUGAGAGGGGAUUC------------------------------------------------GUUCCCGA-CCCUUAGA-ACCUGA-CCAGGGUAAUGCCUGCG-AAGGGA--AGCACGUUUUACGAAAU
sequence for AB087258.1/15478-15568: CCGACUUAGUCGGGGUGCUGAU-----------------------------------------------------AACA-GCUGAGAUA-----------------------------------------------------------AUA-CCCGUGA--ACCUGA-UACAGUUAAUACUGACG-UAGGAA--ACUAAUGGUCUUACUCC
sequence for AE008691.1/2140374-2140272: UCAAAGUGCUAGGGGAGCCAGA-----------------------------------------------------AAUG-GCUGAGAGGGGAUUC------------------------------------------------GUUCCCGA-CCCUUAGA-ACCUGA-CCAGGGUAAUGCCUGCG-AAGGGA--AGCACGUUUUACGGAAU
sequence for AL939111.1/162719-162830: CAAGGCACUCGCGGGAGCCCGGACGC------------------------------------------------ACCGG-GCUGAGAGGGAGGCUGGCG--------------------------------------------GCCUCCGA-CCGUACGA-ACCUGA-UCCGGGUCAUGCCGGCG-AAGGGA--GGGGCUGGACGCCCAUG
sequence for AE017221.1/302202-302311: GGGGCAGGCUAGGGGUGCCCGAAUG-------------------------------------------------GAAGG-GCUGAGAGCUGGGUUUCU---------------------------------------------CCCAGCAA-CCCUUGGA-ACCUGA-UCCGGGUAAUGCCGGCG-GAGGGA--AGCCUAUGCGGAAGACC

The function consensus2d_to_base_pairs() allows to easily infer the secondary structure for a given RNA sequence

In [23]:
from pyrna.parsers import consensus2d_to_base_pairs, to_bn

for gapped_rna in gapped_rnas[:10]:
    rna = RNA(name = gapped_rna.name, sequence = gapped_rna.sequence.replace('-',''))
    print rna.name
    print rna.sequence
    print to_bn(consensus2d_to_base_pairs(gapped_rna, consensus_2d), len(rna))
AE014074.1/464992-464896
UAUUUCACAAAGGAGUGCUUUGGCUGAGAUCGCAAUUGCGAAAUCCUGAGGACCUGAUCUUGUUAGUACAAGCGUAGGGAUUGUGACCAAUAAUCAA
.....[[[[[((((..(((()))).....(((((())))))..)))).....((((..((((......))))..))))..]]]]]............
AL935263.2/99249-99354
UUUAAACACUAGGGGUGUCCAAAAAUGGGCUGAGAUGGUGCUGUAAGUACCGAUCCCUUUGAACCUGUAAGCUCAAACUUGCGUAGGAAAGUGUCACAGCUAAUGU
.....[[[[[((((..(((((....))))).....((((((.....))))))..)))).....((((.(((......)))...))))..]]]]]............
BX248356.1/232868-232999
UUUAUAAAUCACGGGUGCUGGACGGCAUACGUUUGCCACAAAGCUGAGACAGGGCGAGAAGACGUGCACGUCCCUGAACCGUUGAACCUGAUCCGGGUAAUACCGGCGAUAGGAAGAAUAAUGAACCGAUCG
.....[[[[[((((..(((((..................))))).....((((((...............))))))..))))....((((..((((......))))...))))..]]]]]............
AK120238.1/2075-2184
CUAUGUUAGGAGGUGGCCUCUUGGCCUGGAUUGUUGUGAAUUGGGCUGAGAAAGUCCCUUUGAACCUGAACAGGAUAAUGCCUGCGAAGGGAGUGUGCAUUUCUACUUUU
.....[[[[[((((..(((((....................))))).....()..)))).....((((..((((......))))..))))...]]]]]............
X54035.1/52-153
GAUGACCACAAGGGGAGCAUUAAAGCUGAGAGUGAGCGGUUUCGUUCUGACCCUUUGAACCUGUUAGUUAACGCUGGCGUAGGGAUGUGGCAAAUGCAAUAG
.....[[[[[((((..((((())))).....(.((((......)))))..)))).....((((.(((......)))...))))..]]]]]............
AE008691.1/1524796-1524694
CUCAAGUGCUAGGGGAGCCAAAAAUGGCUGAGAGGGGAUUCGUUCCCGACCCUUAGAACCUGACCAGGGUAAUGCCUGCGAAGGGAAGCACGUUUUACGAAAU
.....[[[[[((((..(((((..))))).....((((((..))))))..)))).....((((..((((......))))..))))..]]]]]............
AB087258.1/15478-15568
CCGACUUAGUCGGGGUGCUGAUAACAGCUGAGAUAAUACCCGUGAACCUGAUACAGUUAAUACUGACGUAGGAAACUAAUGGUCUUACUCC
.....[[[[[((((..(((((..))))).....(.)..))))....((((..((((......))))..))))..]]]]]............
AE008691.1/2140374-2140272
UCAAAGUGCUAGGGGAGCCAGAAAUGGCUGAGAGGGGAUUCGUUCCCGACCCUUAGAACCUGACCAGGGUAAUGCCUGCGAAGGGAAGCACGUUUUACGGAAU
.....[[[[[((((..(((((..))))).....((((((..))))))..)))).....((((..((((......))))..))))..]]]]]............
AL939111.1/162719-162830
CAAGGCACUCGCGGGAGCCCGGACGCACCGGGCUGAGAGGGAGGCUGGCGGCCUCCGACCGUACGAACCUGAUCCGGGUCAUGCCGGCGAAGGGAGGGGCUGGACGCCCAUG
.....[[[[[((((..(((((.......))))).....((((((......))))))..)))).....((((..((((......))))..))))..]]]]]............
AE017221.1/302202-302311
GGGGCAGGCUAGGGGUGCCCGAAUGGAAGGGCUGAGAGCUGGGUUUCUCCCAGCAACCCUUGGAACCUGAUCCGGGUAAUGCCGGCGGAGGGAAGCCUAUGCGGAAGACC
.....[[[[[((((..(((((......))))).....((((((.....))))))..)))).....((((..((((......))))..))))..]]]]]............

Creation of secondary structures from algorithms

In [24]:
from pyrna.computations import Rnafold
from pyrna.features import RNA
rna = RNA(name = 'my_rna', sequence = 'GGGGTAGGGACGGTAGGGGGACGCAGTGCAGTAACGTACCCGGTAGGGGGTAGGGGGACGCAGTAACCCCGGGGACGCAGTAACCCCACGCAGTAACCCC')
rnafold = Rnafold() #the algorithm is launched locally, using a Docker image
base_pairs = rnafold.fold(rna)
print base_pairs
   orientation edge1 edge2  pos1  pos2
0            c     (     )    41    48
1            c     (     )    40    49
2            c     (     )    39    50
3            c     (     )    38    51
4            c     (     )    37    52
5            c     (     )    56    67
6            c     (     )    55    68
7            c     (     )    54    69
8            c     (     )    53    70
9            c     (     )    35    72
10           c     (     )    32    75
11           c     (     )    31    76
12           c     (     )    29    77
13           c     (     )    28    78
14           c     (     )    27    79
15           c     (     )    24    80
16           c     (     )    23    81
17           c     (     )    20    84
18           c     (     )    19    85
19           c     (     )    18    86
20           c     (     )    17    87
21           c     (     )    14    90
22           c     (     )    13    91
23           c     (     )    11    93
24           c     (     )    10    94
25           c     (     )     5    96
26           c     (     )     4    97
27           c     (     )     3    98
28           c     (     )     2    99
29           c     (     )     1   100

We can also compute the 2D coordinates for a plot.

In [27]:
from pyrna.computations import Rnaplot
rnaplot = Rnaplot()
plot = rnaplot.plot(secondary_structure = base_pairs, rna = rna)
print plot
           x        y
1     51.114  -18.631
2     51.114   -3.631
3     51.114   11.369
4     51.114   26.369
5     51.114   41.369
6     43.659   46.738
7     39.542   55.107
8     39.850   64.568
9     44.640   72.894
10    52.900   78.063
11    54.298   92.998
12    47.780  106.391
13    56.770  119.410
14    58.167  134.345
15    48.494  145.613
16    49.699  160.779
17    61.583  170.848
18    62.981  185.783
19    64.379  200.718
20    65.777  215.652
21    56.104  226.921
22    57.308  242.086
23    69.193  252.156
24    70.590  267.091
25    68.168  271.394
26    68.550  277.290
27    72.568  282.901
28    74.891  297.719
29    77.215  312.538
30    75.104  319.761
..       ...      ...
71   121.623  385.454
72   104.173  375.762
73   112.813  362.801
74   109.780  347.892
75    97.371  339.735
76    94.587  324.996
77    92.034  310.215
78    89.710  295.396
79    87.387  280.577
80    85.525  265.693
81    84.128  250.758
82    93.937  238.658
83    92.307  223.532
84    80.711  214.255
85    79.314  199.320
86    77.916  184.385
87    76.518  169.451
88    86.328  157.351
89    84.698  142.225
90    73.102  132.947
91    71.704  118.012
92    78.122  103.551
93    69.233   91.600
94    67.835   76.666
95    78.142   58.016
96    66.114   41.369
97    66.114   26.369
98    66.114   11.369
99    66.114   -3.631
100   66.114  -18.631

[100 rows x 2 columns]

We can also compute a secondary structure by annotating a tertiary one. First, we need to import a 3D structure from the Protein Databank.

In [28]:
from pyrna.db import PDB
from pyrna.parsers import parse_pdb
pdb = PDB()
tertiary_structures = parse_pdb(pdb.get_entry('1HR2'))

With PyRNA, a pyrna.features.TertiaryStructure object is made with a single molecular chain. Consequently, the function parse_pdb returns a list of such objects. We can iterate over these pyrna.features.TertiaryStructure objects to annotate them with the algorithm Rnaview (Rnaview is available for computations from my own server).

In [29]:
secondary_structures = []
for ts in tertiary_structures:
    #the function annotate() from Rnaview returns a pyrna.features.SecondaryStructure object and its 3D counterpart as a pyrna.features.TertiaryStructure object
    secondary_structure, tertiary_structure = Rnaview().annotate(ts)
    secondary_structures.append(secondary_structure)
    #the function secondary_structure_to_base_pairs() transform a pyrna.features.SecondaryStructure object into a list of base pairs stored in a pandas Dataframe
    print "Molecular chain %s"%secondary_structure.rna.name
    print secondary_structure_to_base_pairs(secondary_structure, keep_tertiaries = True)
Molecular chain A
   edge1 edge2 orientation  pos1  pos2
0      (     )           c     4   112
1      (     )           c     5   111
2      (     )           c     6   110
3      (     )           c     7   109
4      (     )           c     8   108
5      (     )           c     9   107
6      (     )           c    10   106
7      {     ]           t    11   105
8      [     }           t    12   104
9      (     )           c    14   103
10     (     )           c    15   102
11     (     )           c    16   101
12     (     )           c    17   100
13     (     )           c    18    99
14     (     )           c    19    98
15     (     )           c    24    94
16     (     )           c    25    93
17     (     )           c    26    92
18     (     )           c    27    91
19     (     )           c    30    89
20     (     )           c    31    88
21     (     )           c    32    87
22     (     )           c    34    80
23     (     )           c    35    79
24     (     )           c    36    78
25     [     }           t    37    62
26     [     }           t    38    61
27     (     )           c    39    60
28     (     )           c    40    59
29     (     )           c    41    58
..   ...   ...         ...   ...   ...
53     (     ]           t    21    96
54     (     }           t    22    99
55     (     ]           c    33    85
56     {     ]           t    48    51
57     (     )           t    49   145
58     {     }           t    51   147
59     [     )           t    66    86
60     {     ]           c    69    70
61     {     }           t    79    84
62     {     }           t    82   109
63     {     }           c    95    98
64     {     ]           c   114   115
65     (     ]           t   121   145
66     {     ]           c   122   123
67     (     )           c   123   146
68     !     !           !    21    98
69     !     !           !   111   157
70     !     !           !   124   146
71     !     !           !     9    66
72     !     !           !    19    21
73     !     !           !    50   147
74     !     !           !    61    75
75     !     !           !     1   115
76     !     !           !     8    81
77     !     !           !    24   100
78     !     !           !    50   121
79     !     !           !    51   120
80     !     !           !    52   148
81     !     !           !    77    78
82     !     !           !   112   157

[83 rows x 5 columns]
Molecular chain B
   edge1 edge2 orientation  pos1  pos2
0      (     )           c     2   115
1      (     )           c     3   114
2      (     )           c     4   113
3      (     )           c     6   112
4      (     )           c     7   111
5      (     )           c     8   110
6      (     )           c     9   109
7      (     )           c    10   108
8      (     )           c    11   107
9      {     ]           t    12   106
10     [     }           t    13   105
11     (     )           c    15   104
12     (     )           c    16   103
13     (     )           c    17   102
14     (     )           c    18   101
15     (     )           c    19   100
16     (     )           c    20    99
17     (     )           c    25    95
18     (     )           c    26    94
19     (     )           c    27    93
20     (     )           c    28    92
21     (     )           c    30    91
22     (     )           c    31    90
23     (     )           c    32    89
24     (     )           c    33    88
25     (     )           c    35    81
26     (     )           c    36    80
27     (     )           c    37    79
28     [     }           t    38    63
29     [     }           t    39    62
..   ...   ...         ...   ...   ...
59     [     )           c     7   158
60     {     }           c     9    82
61     {     )           t    10    67
62     (     }           t    23   100
63     (     ]           c    34    86
64     {     ]           t    49    52
65     (     )           t    50   146
66     {     }           t    52   148
67     [     )           t    67    87
68     {     ]           c    70    71
69     {     }           t    80    85
70     {     }           t    83   110
71     {     }           c    96    99
72     {     ]           c   116   117
73     (     ]           t   122   146
74     {     ]           c   123   124
75     (     )           c   124   147
76     !     !           !    14   105
77     !     !           !   125   147
78     !     !           !    10    87
79     !     !           !    20    22
80     !     !           !    23    24
81     !     !           !    51   148
82     !     !           !    62    76
83     !     !           !     8    83
84     !     !           !    25   101
85     !     !           !    51   122
86     !     !           !    52   121
87     !     !           !    53   149
88     !     !           !    98    99

[89 rows x 5 columns]

Mining of a Secondary Structure

Now we can search for all base pairs made with at least one Hoogsteen edge. Since the variable base_pairs stores a pandas Dataframe, we're using the operators available from pandas (http://pandas.pydata.org/pandas-docs/stable/indexing.html)

In [30]:
base_pairs = secondary_structure_to_base_pairs(secondary_structures[0], keep_tertiaries = True)
print base_pairs[(base_pairs['edge1'] == '[') | (base_pairs['edge2'] == ']')]
   edge1 edge2 orientation  pos1  pos2
7      {     ]           t    11   105
8      [     }           t    12   104
25     [     }           t    37    62
26     [     }           t    38    61
52     [     )           c     4   156
53     (     ]           t    21    96
55     (     ]           c    33    85
56     {     ]           t    48    51
59     [     )           t    66    86
60     {     ]           c    69    70
64     {     ]           c   114   115
65     (     ]           t   121   145
66     {     ]           c   122   123

A pyrna.features.SecondaryStructure can find all its junctions (meaning the group of single-strands linking helices).

According to the number of single-strands making a junction, we distinguish:

  • an apical looop (a single single-strand)
  • an inner loop (two single-strands)
  • an three-way junction (three single-strands)
  • a four-way junction (four single-strands)
  • a n-way junction (n single-strands)
In [31]:
junctions = []
for ss in secondary_structures:
    ss.find_junctions()
    junctions += ss.junctions

A junction is described with:

  • a list of single-strands making the junction
  • a location: the positions in sequence of the residues
  • a description: a string made with the single-strand sequences joined with single space characters. In the description of a junction, the character '-' means a direct link (phosphodiester bond) between two helices. Consequently, we have no single-strand linking these helices.
In [32]:
print junctions[0]
{'single_strands': [{'name': 'SS_2', 'location': [13, 13]}], 'description': 'A -', 'location': [[12, 14], [103, 104]]}

Now we can search for all GNRA apical loops...

In [33]:
import re
for junction in junctions:
    if re.match('G[AUGC][AG]A',junction['description']):
        print "Apical loop with sequence %s"%junction['description']
Apical loop with sequence GAAA
Apical loop with sequence GAAA

Or for all three-way junctions...

In [34]:
for junction in junctions:
    if len(junction['location']) == 3:
        print "Three-way junctions with sequence %s"%junction['description']
Three-way junctions with sequence GUAU - -
Three-way junctions with sequence GUAU - -