This is a simple notebook to test the implementation of the BWT.
from Burrows_Wheeler_Transform import transform
from Burrows_Wheeler_Transform import inverse
import time
First we test the correctness of the implementation for the transform using some known BWT. We can have as input a string with or without '$'
finalString = transform('Banana')
if finalString == 'a$nnBaa':
print('The BWT of the string Banana is: %s' %(finalString))
finalString = transform('mississippi$')
if finalString == 'ipssm$pissii':
print('The BWT of the string mississippi is: %s' %(finalString))
The BWT of the string Banana is: a$nnBaa The BWT of the string mississippi is: ipssm$pissii
Then we test the correctness of the implementation for the inverse of BWT
originalString = inverse('a$nnBaa')
if originalString == 'Banana':
print('The original string is: %s' %(originalString))
originalString = inverse('ipssm$pissii')
if originalString == 'mississippi':
print('The original string is: %s' %(originalString))
The original string is: Banana The original string is: mississippi
This is a list of common error that an user could make
# transform with more then one $
finalString = transform('mississippi$$')
if finalString == 'ipssm$pissii':
print('The BWT of the string mississippi is: %s' %(finalString))
--------------------------------------------------------------------------- invalid_string_error Traceback (most recent call last) <ipython-input-4-b7d159470ec6> in <module> 1 # transform with more then one $ ----> 2 finalString = transform('mississippi$$') 3 4 if finalString == 'ipssm$pissii': 5 print('The BWT of the string mississippi is: %s' %(finalString)) ~/Documents/Università/Magistrale/Primo anno/Secondo semestre/scientific programming/ProgrammingPython/Burrows_Wheeler_Transform/functions.py in transform(string) 18 # the string must contains only one termination char 19 elif string.count('$') > 1: ---> 20 raise invalid_string_error('The input ' + string + ' cointains more than one $' ) 21 22 # check if the last char contains the '$' otherwhise add it invalid_string_error: The input mississippi$$ cointains more than one $
# inverse string without the $ character
originalString = inverse('annBaa')
if originalString == 'Banana':
print('The original string is: %s' %(originalString))
--------------------------------------------------------------------------- invalid_string_error Traceback (most recent call last) <ipython-input-5-163491cab8ac> in <module> 1 # inverse string without the $ character ----> 2 originalString = inverse('annBaa') 3 if originalString == 'Banana': 4 print('The original string is: %s' %(originalString)) ~/Documents/Università/Magistrale/Primo anno/Secondo semestre/scientific programming/ProgrammingPython/Burrows_Wheeler_Transform/functions.py in inverse(bwt) 44 raise not_a_string_error('The input ' + bwt + ' is not a string') 45 elif bwt.count('$') != 1: ---> 46 raise invalid_string_error('The input ' + bwt + ' does not contain exactly one $ character') 47 48 n = len(bwt) invalid_string_error: The input annBaa does not contain exactly one $ character
Now that we have test the correctness of the implementation we will test the speed of this implementation, in particular we are interesting in smoothly handle string with 1000 characters.
with open('TestSample/oneThousandChar.txt') as file:
oneChar = file.read().replace('\n', '')
with open('TestSample/twoThousandChar.txt') as file:
twoChar = file.read().replace('\n', '')
with open('TestSample/twelveThousandChar.txt') as file:
twelveChar = file.read().replace('\n', '')
with open('TestSample/twentyFourChar.txt') as file:
twentyFourChar = file.read().replace('\n', '')
# compute the time for the transform of one thousand characters
print('The time for the transform is: ')
%time finalString = transform(oneChar)
# compute the time for the inverse of one thousand characters
print('The time for the inverse is: ')
%time inverseTransf = inverse(finalString)
The time for the transform is: CPU times: user 11.5 ms, sys: 4.61 ms, total: 16.1 ms Wall time: 14.4 ms The time for the inverse is: CPU times: user 23.5 ms, sys: 2.77 ms, total: 26.3 ms Wall time: 24.9 ms
# compute the time for the transform of two thousand characters
print('The time for the transform is: ')
%time finalString = transform(twoChar)
# compute the time for the inverse of two thousand characters
print('The time for the inverse is: ')
inverseTransf = inverse(finalString)
The time for the transform is: CPU times: user 36 ms, sys: 15.9 ms, total: 51.9 ms Wall time: 51 ms The time for the inverse is:
# compute the time for the transform of twelve thousand characters
print('The time for the transform is: ')
%time finalString = transform(twelveChar)
# compute the time for the inverse of twelve thousand characters
print('The time for the inverse is: ')
%time inverseTransf = inverse(finalString)
The time for the transform is: CPU times: user 1.19 s, sys: 470 ms, total: 1.66 s Wall time: 1.66 s The time for the inverse is: CPU times: user 1.61 s, sys: 10.8 ms, total: 1.63 s Wall time: 1.64 s
# compute the time for the transform of twenty-four thousand characters
print('The time for the transform is: ')
%time finalString = transform(twentyFourChar)
# compute the time for the inverse of twenty-four thousand characters
print('The time for the inverse is: ')
%time inverseTransf = inverse(finalString)
The time for the transform is: CPU times: user 1.21 s, sys: 451 ms, total: 1.66 s Wall time: 1.66 s The time for the inverse is: CPU times: user 1.6 s, sys: 7.79 ms, total: 1.61 s Wall time: 1.62 s