Let us write some algorithms to deal with the Vigenère cipher.
from itertools import cycle
def encode(s):
return tuple(ord(c) - 65 for c in s)
def decode(t):
return ''.join(chr(n + 65) for n in t)
def encrypt(plaintext, key):
return decode((p + k) % 26 for p, k in zip(encode(plaintext), cycle(encode(key))))
def decrypt(ciphertext, key):
return decode((c - k) % 26 for c, k in zip(encode(ciphertext), cycle(encode(key))))
def index_of_coincidence(s):
d = {}
for c in s:
d.setdefault(c, 0)
d[c] += 1
l = len(s)
return sum(n * (n-1) for n in d.values()) / (l * (l-1))
Let us encrypt a sample message.
p = 'LJUBLJANA'
k = 'FRI'
c = encrypt(p, k)
c
'QACGCRFEI'
We now compare the indices of coincidence for the plaintext and the ciphertext.
index_of_coincidence(p)
0.08333333333333333
index_of_coincidence(c)
0.027777777777777776
Note that the index of coincidence does not change when a Caesar cipher is used.
cc = encrypt(p, 'C')
cc
'NLWDNLCPC'
index_of_coincidence(cc)
0.08333333333333333
Let us now examine the following ciphertext. Using the Kasiski test, we have determined that the most likely key length is $3$.
ciphertext = "NKASFBBYIYPWZCWTBIYKPFKUFKBJIANKABYIYPWZJMJ"
index_of_coincidence(ciphertext)
0.058693244739756366
We now break the ciphertext into what we expect to be three Caesar-encrypted sub-ciphertexts.
C = [ciphertext[i::3] for i in range(3)]
C
['NSBYZTYFFJNBYZJ', 'KFYPCBKKKIKYPJ', 'ABIWWIPUBAAIWM']
[index_of_coincidence(y) for y in C]
[0.0761904761904762, 0.13186813186813187, 0.10989010989010989]
Let us now compute the indices of coincidence we obtain when concatenating one of these strings with each Caesar-encryption of each of the other two. We list the offsets sorted by the index of coincidence.
sorted(((index_of_coincidence(C[0] + decrypt(C[1], chr(65+i))), i) for i in range(26)), reverse=True)
[(0.09359605911330049, 23), (0.09113300492610837, 11), (0.08866995073891626, 12), (0.08374384236453201, 9), (0.07881773399014778, 17), (0.07881773399014778, 5), (0.07881773399014778, 1), (0.07881773399014778, 0), (0.07389162561576355, 10), (0.07142857142857142, 18), (0.06896551724137931, 22), (0.06896551724137931, 6), (0.0665024630541872, 15), (0.0665024630541872, 4), (0.0665024630541872, 3), (0.06403940886699508, 25), (0.06403940886699508, 19), (0.06403940886699508, 16), (0.06403940886699508, 14), (0.06403940886699508, 2), (0.06157635467980296, 7), (0.05665024630541872, 8), (0.054187192118226604, 21), (0.05172413793103448, 13), (0.04926108374384237, 24), (0.04926108374384237, 20)]
sorted(((index_of_coincidence(C[0] + decrypt(C[2], chr(65+i))), i) for i in range(26)), reverse=True)
[(0.09359605911330049, 21), (0.08620689655172414, 3), (0.08374384236453201, 2), (0.0812807881773399, 17), (0.07881773399014778, 25), (0.07881773399014778, 13), (0.07881773399014778, 9), (0.07635467980295567, 7), (0.07142857142857142, 10), (0.0665024630541872, 24), (0.0665024630541872, 14), (0.06403940886699508, 22), (0.06157635467980296, 23), (0.06157635467980296, 1), (0.05665024630541872, 16), (0.05665024630541872, 15), (0.05665024630541872, 8), (0.054187192118226604, 18), (0.054187192118226604, 11), (0.054187192118226604, 0), (0.05172413793103448, 19), (0.05172413793103448, 4), (0.04926108374384237, 6), (0.046798029556650245, 20), (0.04433497536945813, 12), (0.04433497536945813, 5)]
First, we try all Caesar-decryptions with the offsets giving the highest indices of coincidence. Unfortunately, this does not seem to work.
ccc = ''.join((''.join(t) for t in zip(C[0], decrypt(C[1], chr(65+23)), decrypt(C[2], chr(65+23)))))
[decrypt(ccc, chr(65+i)) for i in range(26)]
['NNDSIEBBLYSZZFZTELYNSFNXFNEJLDNNDBBLYSZZMP', 'MMCRHDAAKXRYYEYSDKXMREMWEMDIKCMMCAAKXRYYLO', 'LLBQGCZZJWQXXDXRCJWLQDLVDLCHJBLLBZZJWQXXKN', 'KKAPFBYYIVPWWCWQBIVKPCKUCKBGIAKKAYYIVPWWJM', 'JJZOEAXXHUOVVBVPAHUJOBJTBJAFHZJJZXXHUOVVIL', 'IIYNDZWWGTNUUAUOZGTINAISAIZEGYIIYWWGTNUUHK', 'HHXMCYVVFSMTTZTNYFSHMZHRZHYDFXHHXVVFSMTTGJ', 'GGWLBXUUERLSSYSMXERGLYGQYGXCEWGGWUUERLSSFI', 'FFVKAWTTDQKRRXRLWDQFKXFPXFWBDVFFVTTDQKRREH', 'EEUJZVSSCPJQQWQKVCPEJWEOWEVACUEEUSSCPJQQDG', 'DDTIYURRBOIPPVPJUBODIVDNVDUZBTDDTRRBOIPPCF', 'CCSHXTQQANHOOUOITANCHUCMUCTYASCCSQQANHOOBE', 'BBRGWSPPZMGNNTNHSZMBGTBLTBSXZRBBRPPZMGNNAD', 'AAQFVROOYLFMMSMGRYLAFSAKSARWYQAAQOOYLFMMZC', 'ZZPEUQNNXKELLRLFQXKZERZJRZQVXPZZPNNXKELLYB', 'YYODTPMMWJDKKQKEPWJYDQYIQYPUWOYYOMMWJDKKXA', 'XXNCSOLLVICJJPJDOVIXCPXHPXOTVNXXNLLVICJJWZ', 'WWMBRNKKUHBIIOICNUHWBOWGOWNSUMWWMKKUHBIIVY', 'VVLAQMJJTGAHHNHBMTGVANVFNVMRTLVVLJJTGAHHUX', 'UUKZPLIISFZGGMGALSFUZMUEMULQSKUUKIISFZGGTW', 'TTJYOKHHREYFFLFZKRETYLTDLTKPRJTTJHHREYFFSV', 'SSIXNJGGQDXEEKEYJQDSXKSCKSJOQISSIGGQDXEERU', 'RRHWMIFFPCWDDJDXIPCRWJRBJRINPHRRHFFPCWDDQT', 'QQGVLHEEOBVCCICWHOBQVIQAIQHMOGQQGEEOBVCCPS', 'PPFUKGDDNAUBBHBVGNAPUHPZHPGLNFPPFDDNAUBBOR', 'OOETJFCCMZTAAGAUFMZOTGOYGOFKMEOOECCMZTAANQ']
We may also try with some other offsets giving high indices of coincidece. After some attempts, we find a legible string.
ccc = ''.join((''.join(t) for t in zip(C[0], decrypt(C[1], chr(65+12)), decrypt(C[2], chr(65+3)))))
[decrypt(ccc, chr(65+i)) for i in range(26)]
['NYXSTYBMFYDTZQTTPFYYMFYRFYYJWXNYXBMFYDTZXJ', 'MXWRSXALEXCSYPSSOEXXLEXQEXXIVWMXWALEXCSYWI', 'LWVQRWZKDWBRXORRNDWWKDWPDWWHUVLWVZKDWBRXVH', 'KVUPQVYJCVAQWNQQMCVVJCVOCVVGTUKVUYJCVAQWUG', 'JUTOPUXIBUZPVMPPLBUUIBUNBUUFSTJUTXIBUZPVTF', 'ITSNOTWHATYOULOOKATTHATMATTERSITSWHATYOUSE', 'HSRMNSVGZSXNTKNNJZSSGZSLZSSDQRHSRVGZSXNTRD', 'GRQLMRUFYRWMSJMMIYRRFYRKYRRCPQGRQUFYRWMSQC', 'FQPKLQTEXQVLRILLHXQQEXQJXQQBOPFQPTEXQVLRPB', 'EPOJKPSDWPUKQHKKGWPPDWPIWPPANOEPOSDWPUKQOA', 'DONIJORCVOTJPGJJFVOOCVOHVOOZMNDONRCVOTJPNZ', 'CNMHINQBUNSIOFIIEUNNBUNGUNNYLMCNMQBUNSIOMY', 'BMLGHMPATMRHNEHHDTMMATMFTMMXKLBMLPATMRHNLX', 'ALKFGLOZSLQGMDGGCSLLZSLESLLWJKALKOZSLQGMKW', 'ZKJEFKNYRKPFLCFFBRKKYRKDRKKVIJZKJNYRKPFLJV', 'YJIDEJMXQJOEKBEEAQJJXQJCQJJUHIYJIMXQJOEKIU', 'XIHCDILWPINDJADDZPIIWPIBPIITGHXIHLWPINDJHT', 'WHGBCHKVOHMCIZCCYOHHVOHAOHHSFGWHGKVOHMCIGS', 'VGFABGJUNGLBHYBBXNGGUNGZNGGREFVGFJUNGLBHFR', 'UFEZAFITMFKAGXAAWMFFTMFYMFFQDEUFEITMFKAGEQ', 'TEDYZEHSLEJZFWZZVLEESLEXLEEPCDTEDHSLEJZFDP', 'SDCXYDGRKDIYEVYYUKDDRKDWKDDOBCSDCGRKDIYECO', 'RCBWXCFQJCHXDUXXTJCCQJCVJCCNABRCBFQJCHXDBN', 'QBAVWBEPIBGWCTWWSIBBPIBUIBBMZAQBAEPIBGWCAM', 'PAZUVADOHAFVBSVVRHAAOHATHAALYZPAZDOHAFVBZL', 'OZYTUZCNGZEUARUUQGZZNGZSGZZKXYOZYCNGZEUAYK']
Looking at the offsets used, we determine that the key is again FRI
.
decrypt(ciphertext, 'FRI')
'ITSNOTWHATYOULOOKATTHATMATTERSITSWHATYOUSEE'