(The exercise can be found at: http://rosalind.info/problems/prtm/)
The exercise is about calculating the molecular weight of a protein.
The protein is represented as an amino acid sequence (a string of letters).
The weight of each amino acid is given in a table of monoisotopic masses.
In practice, the exercise comes down to reading the mass table, translating each letter of a given sequence into its mass using the table, and adding the masses up.
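To make the idea concrete before splitting it into functions, here is a minimal sketch for a three-letter peptide. The small `masses` dictionary is a hand-typed excerpt of the mass table and `peptide` is just an illustrative input, not the exercise data.

```python
# Hand-typed excerpt of the monoisotopic mass table (illustration only).
masses = {"S": 87.03203, "K": 128.09496, "A": 71.03711}

# Illustrative peptide: the first three letters of the example protein.
peptide = "SKA"

# Translate each letter into its mass and add the masses up.
total = sum(masses[letter] for letter in peptide)
print(round(total, 3))  # 286.164
```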
I think I can do this in three functions:
Read the monoisotopic mass table and convert to a dictionary
Read the text file with the amino acid sequence
Take the amino acid sequence and mass table to calculate the mass
def read_monoisotopic_mass_table(input_file):
    """
    Given a tab-separated input file with amino acids (as capital letters)
    in the first column, and molecular weights (as floating point numbers)
    in the second column - create a dictionary with the amino acids as keys
    and their respective weights as values.
    """
    mass_dict = {}
    with open(input_file, "r") as read_file:
        for line in read_file:
            elements = line.split()
            amino_acid = str(elements[0])
            weight = float(elements[1])
            mass_dict[amino_acid] = weight
    return mass_dict
mass_dict = read_monoisotopic_mass_table("data/monoisotopic_mass_table.tsv")
print(mass_dict)
{'A': 71.03711, 'C': 103.00919, 'D': 115.02694, 'E': 129.04259, 'F': 147.06841, 'G': 57.02146, 'H': 137.05891, 'I': 113.08406, 'K': 128.09496, 'L': 113.08406, 'M': 131.04049, 'N': 114.04293, 'P': 97.05276, 'Q': 128.05858, 'R': 156.10111, 'S': 87.03203, 'T': 101.04768, 'V': 99.06841, 'W': 186.07931, 'Y': 163.06333}
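The reader above assumes that every line has exactly two columns, so a blank or malformed line would raise an error somewhere inside the loop. A minimal sketch of a more defensive parser, assuming the same two-column layout: `parse_mass_lines` is a hypothetical helper (not part of the code above) that takes any iterable of lines, so it can be tried on an in-memory string without the file on disk.

```python
import io


def parse_mass_lines(lines):
    """Hypothetical helper: build {amino_acid: weight} from two-column lines."""
    table = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines instead of failing on them
        if len(fields) != 2:
            raise ValueError(f"expected 2 columns, got {len(fields)}: {line!r}")
        table[fields[0]] = float(fields[1])
    return table


# Three table rows with a blank line in between, held in memory.
sample = io.StringIO("A\t71.03711\nC\t103.00919\n\nD\t115.02694\n")
table = parse_mass_lines(sample)
print(table)  # {'A': 71.03711, 'C': 103.00919, 'D': 115.02694}
```

For the full table, a check such as `len(table) == 20` would confirm that all standard amino acids were read.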
So far so good, now make the second function:
def read_amino_acid_sequence(input_file):
    """
    Read a text file with an amino acid sequence and
    return the sequence as string.
    """
    with open(input_file, "r") as read_file:
        # Note: the .strip() is required to remove the
        # newline, which otherwise would be interpreted
        # as amino acid! Joining the stripped lines also
        # handles an empty file or a sequence that is
        # wrapped over several lines.
        amino_acids = "".join(line.strip() for line in read_file)
    return amino_acids
example_protein = read_amino_acid_sequence("data/Example_calculating_protein_mass.txt")
print(example_protein)
SKADYEK
That works as well. Time to make the final function: the one that converts the amino acid sequence to its weight.
def calculate_protein_weight(protein, mass_table):
    """
    Given a protein sequence as string and a mass table as dictionary
    (with amino acids as keys and their respective weights as values),
    calculate the molecular weight of the protein by summing up the
    weight of each amino acid in the protein.
    """
    total_weight = 0
    for amino_acid in protein:
        weight = mass_table[amino_acid]
        total_weight += weight
    return total_weight
calculate_protein_weight(example_protein, mass_dict)
821.3919199999999
Now this answer looks good, except that it shows more decimals than the example on rosalind.info (821.392). The trailing 9s are a floating point artifact of adding up the masses. Perhaps I should just round the answer to 3 decimals?
round(calculate_protein_weight(example_protein, mass_dict), 3)
821.392
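A side note on where the extra digits in the unrounded answer come from: most of the masses cannot be represented exactly as binary floating point numbers, so small errors creep into the sum. A minimal sketch to illustrate this, assuming the seven masses of SKADYEK typed in by hand from the table above; the `decimal` module from the standard library adds the same numbers exactly in base 10.

```python
from decimal import Decimal

# Masses for S, K, A, D, Y, E, K, kept as strings so Decimal reads them exactly.
weights = ["87.03203", "128.09496", "71.03711", "115.02694",
           "163.06333", "129.04259", "128.09496"]

# Plain floats, added one by one as in the loop above.
float_total = 0.0
for w in weights:
    float_total += float(w)

# Exact decimal arithmetic on the same values.
exact_total = sum(Decimal(w) for w in weights)

print(float_total)  # carries a tiny binary rounding error
print(exact_total)  # 821.39192
```

Both totals round to the same three decimals, so rounding the float result is enough for this exercise.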
Perfect! Now let me just overwrite the function to incorporate the rounding:
def calculate_protein_weight(protein, mass_table):
    """
    Given a protein sequence as string and a mass table as dictionary
    (with amino acids as keys and their respective weights as values),
    calculate the molecular weight of the protein by summing up the
    weight of each amino acid in the protein.
    """
    total_weight = 0
    for amino_acid in protein:
        weight = mass_table[amino_acid]
        total_weight += weight
    return round(total_weight, 3)
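One design choice worth noting: `round()` returns a float, so trailing zeros are dropped when the value is printed. If the output should always show exactly three decimals, formatting at print time is an alternative. A small sketch comparing the two, using the unrounded sum from the example above and an arbitrary value of 100.5 for contrast:

```python
total_weight = 821.3919199999999  # unrounded sum from the example above

rounded = round(total_weight, 3)   # still a float
formatted = f"{total_weight:.3f}"  # a string with exactly 3 decimals

print(rounded)    # 821.392
print(formatted)  # 821.392

# The difference shows up when the value has fewer than 3 decimals.
print(round(100.5, 3), f"{100.5:.3f}")  # 100.5 100.500
```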
And let's give the actual exercise a shot with this!
test_protein = read_amino_acid_sequence("data/rosalind_prtm.txt")
molecular_weight = calculate_protein_weight(test_protein, mass_dict)
print(molecular_weight)
103133.769
It worked. The problem has been solved.
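One caveat for other inputs: the mass table only has the 20 letters shown above, so a sequence with any other character (an X, for instance) makes the dictionary lookup raise a KeyError. A minimal sketch of a stricter variant, assuming a clearer error message is wanted; `calculate_protein_weight_checked` and `demo_table` are hypothetical names used only for this illustration.

```python
def calculate_protein_weight_checked(protein, mass_table):
    """Hypothetical variant that reports letters missing from the mass table."""
    unknown = sorted(set(protein) - set(mass_table))
    if unknown:
        raise ValueError(f"no mass available for: {', '.join(unknown)}")
    return round(sum(mass_table[amino_acid] for amino_acid in protein), 3)


# Small excerpt of the mass table, just for the demonstration.
demo_table = {"A": 71.03711, "C": 103.00919}

print(calculate_protein_weight_checked("AC", demo_table))  # 174.046

try:
    calculate_protein_weight_checked("ACX", demo_table)
except ValueError as error:
    print(error)  # no mass available for: X
```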