(Specific exercise at: http://rosalind.info/problems/lexf/)
This exercise sounds very much like one I've done earlier: Enumerating gene orders (PERM).
It seems like the only differences are:
So now I should be able to do basically the same thing as in PERM.
import itertools #this library is going to do all the heavy lifting
def read_alphabet_and_length(input_file):
    """
    Given a path to a text file with space-separated letters on the first line,
    and a number on the second line,
    return the 'alphabet' as list and the number as 'length'.
    """
    first_line = True  # use a trick to separate the first and second line
    with open(input_file, "r") as read_file:
        for line in read_file:
            if first_line:
                alphabet = line.split()
                first_line = False
            else:
                length = int(line.strip())
    return alphabet, length
example_file = "data/Example_enumerating_k-mers_lexicographically.txt"
(example_alphabet, example_length) = read_alphabet_and_length(example_file)
print("Example alphabet: %s\nExample length: %i" % (
    example_alphabet, example_length)
)
Example alphabet: ['A', 'C', 'G', 'T']
Example length: 2
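The reader above keeps overwriting `length` for every line after the first, so a trailing blank line in the input would make `int()` fail. A possible variant, sketched here under the assumption that only the first two lines matter, reads exactly those two lines. The name `read_alphabet_and_length_compact` and the temporary demo file are made up for illustration and are not part of the original notebook.

```python
import os
import tempfile


def read_alphabet_and_length_compact(input_file):
    """Hypothetical variant: line 1 is the alphabet, line 2 is the length."""
    with open(input_file, "r") as read_file:
        alphabet = read_file.readline().split()
        # int() ignores surrounding whitespace, including the newline.
        length = int(read_file.readline())
    return alphabet, length


# Self-contained demo: write the example input to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("A C G T\n2\n\n")  # note the trailing blank line
    demo_path = tmp.name

demo_alphabet, demo_length = read_alphabet_and_length_compact(demo_path)
os.remove(demo_path)

print(demo_alphabet, demo_length)
```

Anything after the second line is simply ignored by this version, which may or may not be what you want for other input formats.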
Alright, that works. Now let's see if I can use the itertools.permutations()
function to make some combinations.
print(list(itertools.permutations(example_alphabet, example_length)))
[('A', 'C'), ('A', 'G'), ('A', 'T'), ('C', 'A'), ('C', 'G'), ('C', 'T'), ('G', 'A'), ('G', 'C'), ('G', 'T'), ('T', 'A'), ('T', 'C'), ('T', 'G')]
Right, that seems fine. Now I need to convert this list of tuples into separate lines of strings...
for letter_combination in itertools.permutations(example_alphabet, example_length):
    print("".join(letter_combination))
AC AG AT CA CG CT GA GC GT TA TC TG
And with that I'm practically done, right?
Hold on... This exercise is again not entirely clear about how the output should be ordered.
First it says the provided 'alphabet' will be ordered (which implies it may come in any arbitrary order).
Then it says the output should be ordered alphabetically?
I will for now just go with whatever is provided and see how that works out...
test_file = "data/rosalind_lexf.txt"
(test_alphabet, test_length) = read_alphabet_and_length(test_file)
for string in itertools.permutations(test_alphabet, test_length):
    print("".join(string))
ABCD ABDC ACBD ACDB ADBC ADCB BACD BADC BCAD BCDA BDAC BDCA CABD CADB CBAD CBDA CDAB CDBA DABC DACB DBAC DBCA DCAB DCBA
Hm... So this was wrong.
Let's check what the input was:
print("Test alphabet: %s\nTest length: %i" % (
    test_alphabet, test_length)
)
Test alphabet: ['A', 'B', 'C', 'D']
Test length: 4
That looks right... What may have gone wrong then?
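Oh yes, I see now that my example went wrong too. The answer I got is too short: 12 strings, while 4 letters at length 2 should give 4 x 4 = 16.
I am missing all the strings with repeated letters, such as AA! Now how can I include those, too...?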
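A quick count makes the gap concrete. This is only a sketch for checking the numbers; the names `found` and `expected_count` are made up for this check.

```python
import itertools

alphabet = ["A", "C", "G", "T"]
length = 2

# permutations() never reuses a letter within one result.
found = ["".join(p) for p in itertools.permutations(alphabet, length)]

# With free reuse, every position can hold any letter: n ** k strings.
expected_count = len(alphabet) ** length

print(len(found), expected_count)  # 12 16
print("AA" in found)               # False
```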
for string in itertools.combinations_with_replacement(test_alphabet, test_length):
    print("".join(string))
AAAA AAAB AAAC AAAD AABB AABC AABD AACC AACD AADD ABBB ABBC ABBD ABCC ABCD ABDD ACCC ACCD ACDD ADDD BBBB BBBC BBBD BBCC BBCD BBDD BCCC BCCD BCDD BDDD CCCC CCCD CCDD CDDD DDDD
So I used the wrong function; this one looks more like it!
Let's try the exercise again.
second_test_file = "data/rosalind_lexf2.txt"
(second_test_alphabet, second_test_length) = read_alphabet_and_length(second_test_file)
for string in itertools.combinations_with_replacement(
        second_test_alphabet, second_test_length):
    print("".join(string))
AA AB AC AD AE AF AG AH AI AJ BB BC BD BE BF BG BH BI BJ CC CD CE CF CG CH CI CJ DD DE DF DG DH DI DJ EE EF EG EH EI EJ FF FG FH FI FJ GG GH GI GJ HH HI HJ II IJ JJ
Oh. This is still wrong.
I thought this would be easy, but it turns out a bit more complicated...
print("Second test alphabet: %s\nSecond test length: %i" % (
    second_test_alphabet, second_test_length)
)
Second test alphabet: ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
Second test length: 2
Oh, I see... Once 'A' has been used as the first letter, it no longer shows up after later letters such as 'B': I get AB but never BA.
So I need even more repeats than this function gives by default: every letter should be allowed at every position.
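The same observation can be checked in code. The sketch below assumes the ten-letter alphabet shown above; `found` is just an illustrative name.

```python
import itertools

alphabet = list("ABCDEFGHIJ")
length = 2

found = ["".join(c) for c in
         itertools.combinations_with_replacement(alphabet, length)]

# Only 55 results, while 10 letters at length 2 allow 10 * 10 = 100 strings.
print(len(found), len(alphabet) ** length)  # 55 100

# Order inside a result is ignored, so AB appears but BA does not.
print("AB" in found, "BA" in found)         # True False

# With a sorted input, every result is itself in non-decreasing order.
print(all(list(s) == sorted(s) for s in found))  # True
```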
(Check this page for more info on the functions I tried so far.)
I will get back to this exercise and try it again later.
Update: it looks like I need the product() function from itertools!
(Also see: https://docs.python.org/2/library/itertools.html#itertools.product)
for string in itertools.product(
        second_test_alphabet, repeat=second_test_length):
    print("".join(string))
AA AB AC AD AE AF AG AH AI AJ BA BB BC BD BE BF BG BH BI BJ CA CB CC CD CE CF CG CH CI CJ DA DB DC DD DE DF DG DH DI DJ EA EB EC ED EE EF EG EH EI EJ FA FB FC FD FE FF FG FH FI FJ GA GB GC GD GE GF GG GH GI GJ HA HB HC HD HE HF HG HH HI HJ IA IB IC ID IE IF IG IH II IJ JA JB JC JD JE JF JG JH JI JJ
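One way to picture what product() does with repeat=2 is as two nested loops over the same alphabet. The comparison below is a small sketch on the four-letter example alphabet; the names `nested` and `from_product` are made up for illustration.

```python
import itertools

alphabet = ["A", "C", "G", "T"]

# Two nested loops: every letter is allowed at every position.
nested = []
for first in alphabet:
    for second in alphabet:
        nested.append(first + second)

from_product = ["".join(p) for p in itertools.product(alphabet, repeat=2)]

print(nested == from_product)  # True
print(from_product[:5])        # ['AA', 'AC', 'AG', 'AT', 'CA']
```

The advantage of product() is that `repeat` can be any length, where nested loops would need one hand-written loop per position.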
So now that I think I know how to do it, let's try it a third time:
third_test_file = "data/rosalind_lexf3.txt"
(third_test_alphabet, third_test_length) = read_alphabet_and_length(third_test_file)
for string in itertools.product(
        third_test_alphabet, repeat=third_test_length):
    print("".join(string))
AAAA AAAB AAAC AAAD AAAE AABA AABB AABC AABD AABE AACA AACB AACC AACD AACE AADA AADB AADC AADD AADE AAEA AAEB AAEC AAED AAEE ABAA ABAB ABAC ABAD ABAE ABBA ABBB ABBC ABBD ABBE ABCA ABCB ABCC ABCD ABCE ABDA ABDB ABDC ABDD ABDE ABEA ABEB ABEC ABED ABEE ACAA ACAB ACAC ACAD ACAE ACBA ACBB ACBC ACBD ACBE ACCA ACCB ACCC ACCD ACCE ACDA ACDB ACDC ACDD ACDE ACEA ACEB ACEC ACED ACEE ADAA ADAB ADAC ADAD ADAE ADBA ADBB ADBC ADBD ADBE ADCA ADCB ADCC ADCD ADCE ADDA ADDB ADDC ADDD ADDE ADEA ADEB ADEC ADED ADEE AEAA AEAB AEAC AEAD AEAE AEBA AEBB AEBC AEBD AEBE AECA AECB AECC AECD AECE AEDA AEDB AEDC AEDD AEDE AEEA AEEB AEEC AEED AEEE BAAA BAAB BAAC BAAD BAAE BABA BABB BABC BABD BABE BACA BACB BACC BACD BACE BADA BADB BADC BADD BADE BAEA BAEB BAEC BAED BAEE BBAA BBAB BBAC BBAD BBAE BBBA BBBB BBBC BBBD BBBE BBCA BBCB BBCC BBCD BBCE BBDA BBDB BBDC BBDD BBDE BBEA BBEB BBEC BBED BBEE BCAA BCAB BCAC BCAD BCAE BCBA BCBB BCBC BCBD BCBE BCCA BCCB BCCC BCCD BCCE BCDA BCDB BCDC BCDD BCDE BCEA BCEB BCEC BCED BCEE BDAA BDAB BDAC BDAD BDAE BDBA BDBB BDBC BDBD BDBE BDCA BDCB BDCC BDCD BDCE BDDA BDDB BDDC BDDD BDDE BDEA BDEB BDEC BDED BDEE BEAA BEAB BEAC BEAD BEAE BEBA BEBB BEBC BEBD BEBE BECA BECB BECC BECD BECE BEDA BEDB BEDC BEDD BEDE BEEA BEEB BEEC BEED BEEE CAAA CAAB CAAC CAAD CAAE CABA CABB CABC CABD CABE CACA CACB CACC CACD CACE CADA CADB CADC CADD CADE CAEA CAEB CAEC CAED CAEE CBAA CBAB CBAC CBAD CBAE CBBA CBBB CBBC CBBD CBBE CBCA CBCB CBCC CBCD CBCE CBDA CBDB CBDC CBDD CBDE CBEA CBEB CBEC CBED CBEE CCAA CCAB CCAC CCAD CCAE CCBA CCBB CCBC CCBD CCBE CCCA CCCB CCCC CCCD CCCE CCDA CCDB CCDC CCDD CCDE CCEA CCEB CCEC CCED CCEE CDAA CDAB CDAC CDAD CDAE CDBA CDBB CDBC CDBD CDBE CDCA CDCB CDCC CDCD CDCE CDDA CDDB CDDC CDDD CDDE CDEA CDEB CDEC CDED CDEE CEAA CEAB CEAC CEAD CEAE CEBA CEBB CEBC CEBD CEBE CECA CECB CECC CECD CECE CEDA CEDB CEDC CEDD CEDE CEEA CEEB CEEC CEED CEEE DAAA DAAB DAAC DAAD DAAE DABA DABB DABC DABD DABE DACA DACB DACC DACD DACE DADA DADB DADC DADD DADE DAEA DAEB DAEC DAED DAEE 
DBAA DBAB DBAC DBAD DBAE DBBA DBBB DBBC DBBD DBBE DBCA DBCB DBCC DBCD DBCE DBDA DBDB DBDC DBDD DBDE DBEA DBEB DBEC DBED DBEE DCAA DCAB DCAC DCAD DCAE DCBA DCBB DCBC DCBD DCBE DCCA DCCB DCCC DCCD DCCE DCDA DCDB DCDC DCDD DCDE DCEA DCEB DCEC DCED DCEE DDAA DDAB DDAC DDAD DDAE DDBA DDBB DDBC DDBD DDBE DDCA DDCB DDCC DDCD DDCE DDDA DDDB DDDC DDDD DDDE DDEA DDEB DDEC DDED DDEE DEAA DEAB DEAC DEAD DEAE DEBA DEBB DEBC DEBD DEBE DECA DECB DECC DECD DECE DEDA DEDB DEDC DEDD DEDE DEEA DEEB DEEC DEED DEEE EAAA EAAB EAAC EAAD EAAE EABA EABB EABC EABD EABE EACA EACB EACC EACD EACE EADA EADB EADC EADD EADE EAEA EAEB EAEC EAED EAEE EBAA EBAB EBAC EBAD EBAE EBBA EBBB EBBC EBBD EBBE EBCA EBCB EBCC EBCD EBCE EBDA EBDB EBDC EBDD EBDE EBEA EBEB EBEC EBED EBEE ECAA ECAB ECAC ECAD ECAE ECBA ECBB ECBC ECBD ECBE ECCA ECCB ECCC ECCD ECCE ECDA ECDB ECDC ECDD ECDE ECEA ECEB ECEC ECED ECEE EDAA EDAB EDAC EDAD EDAE EDBA EDBB EDBC EDBD EDBE EDCA EDCB EDCC EDCD EDCE EDDA EDDB EDDC EDDD EDDE EDEA EDEB EDEC EDED EDEE EEAA EEAB EEAC EEAD EEAE EEBA EEBB EEBC EEBD EEBE EECA EECB EECC EECD EECE EEDA EEDB EEDC EEDD EEDE EEEA EEEB EEEC EEED EEEE
That did the trick. I was using the wrong function before...
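To wrap up, the working approach could be packed into one small function. This is a sketch, not a definitive solution: `enumerate_kmers` is a hypothetical name, and it relies on product() emitting results in the order of its input, so the output follows whatever order the alphabet is given in.

```python
import itertools


def enumerate_kmers(alphabet, length):
    """Hypothetical helper: all strings of 'length' letters from 'alphabet',
    ordered according to the order of the letters in 'alphabet'."""
    return ["".join(letters)
            for letters in itertools.product(alphabet, repeat=length)]


print(enumerate_kmers(["A", "C", "G", "T"], 2)[:4])  # ['AA', 'AC', 'AG', 'AT']

# A differently ordered alphabet gives a differently ordered result.
print(enumerate_kmers(["T", "A"], 2))                # ['TT', 'TA', 'AT', 'AA']
```

That also touches the ordering question from earlier: if the exercise expects plain alphabetical order regardless of the input order, calling sorted() on the alphabet before passing it in would be one way to get that.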