This notebook will help you understand how to analyze data in parallel using the map function of MapReduce.
Please note that the map function used in this notebook is Python's built-in `map`, not the distributed map of a real MapReduce framework. A real framework such as Hadoop or Spark requires additional configuration and is normally not applied to data this small. Therefore, you might find that the runtime does not vary much between the different parallel processing notebooks.
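To make the note above concrete, here is a minimal sketch showing that Python's built-in `map` calls the function on one item at a time, in order, within a single process. The helper `shout` and the sample word list are hypothetical and exist only for this illustration.

```python
# Hypothetical helper for illustration: records the order in which it is called.
call_order = []

def shout(word):
    call_order.append(word)
    return word.upper()

words = ["split", "map", "combine"]

# map() is lazy: nothing is computed until the result is consumed.
lazy_result = map(shout, words)
calls_before = list(call_order)      # still empty at this point

# Consuming the iterator runs shout() on each item, one after another.
result = list(lazy_result)

print(calls_before)   # []
print(call_order)     # ['split', 'map', 'combine']
print(result)         # ['SPLIT', 'MAP', 'COMBINE']
```

Because every call happens sequentially, splitting the input into more chunks changes how the work is organized but should not be expected to make it faster.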
import time
import math
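def breakDoc(text,nToBreakInto):
    # Split text into nToBreakInto consecutive chunks of (nearly) equal length.
    textList=[]
    fLength = len(text)
    # Chunk size in characters, rounded up so that every character is covered.
    nLinesInEach = int(math.ceil(float(fLength)/nToBreakInto))
    for i in range(nToBreakInto):
        startIndex=i*nLinesInEach
        endIndex=(i+1)*nLinesInEach
        if endIndex<=fLength-1:
            textList.append(text[startIndex:endIndex])
        else:
            # Last chunk (or past the end): take whatever remains, possibly nothing.
            textList.append(text[startIndex:])
    return textList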
def loadDocuments():
    # Ask for a file name and return the file's contents as a single string.
    filename=input('Please Enter the Name of the Text File You Want to Encipher: ')
    with open(filename) as f:
        text=f.read()
    return text
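def cipher(text,key):
    # Caesar cipher: shift each ASCII letter by key positions, wrapping within its own case.
    import string
    stri=""
    for ch in text:
        if ch not in string.ascii_letters:
            # Leave digits, punctuation and whitespace unchanged.
            stri+=ch
        else:
            # Wrap with modulo 26 so that e.g. 'Z' shifted by 7 stays uppercase ('G')
            # and keys that are negative or larger than 26 also work.
            base = ord('A') if ch.isupper() else ord('a')
            stri+=chr((ord(ch) - base + key) % 26 + base)
    return stri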
def CCMap(text,key,nToBreakInto):
    # time.perf_counter() is a high-resolution timer suited to measuring elapsed time.
    start = time.perf_counter()
    textList=breakDoc(text,nToBreakInto)
    # Map step: apply cipher to every chunk, pairing each chunk with the same key.
    encodedList=list(map(cipher,textList,[key]*len(textList)))
    #for i in encodedList:
    #    print(i)
    print("Runtime: ",(time.perf_counter()-start),"seconds")
    return encodedList
The cell below breaks a document into several chunks and encrypts each chunk in turn. It uses the first two phases of the divide-and-conquer strategy, that is, first splitting the data and then processing the data. Once the function is done, it will output its runtime.
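The two phases described above can be sketched on a short string. This is a minimal, self-contained illustration rather than the notebook's own functions: `split_text` and `shift_letters` are hypothetical stand-ins, and the sample sentence, the chunk count of 4 and the key of 3 are arbitrary choices.

```python
import math

def split_text(text, n_chunks):
    """Phase 1 (split): cut text into n_chunks consecutive pieces."""
    size = math.ceil(len(text) / n_chunks)   # characters per chunk, rounded up
    return [text[i * size:(i + 1) * size] for i in range(n_chunks)]

def shift_letters(chunk, key):
    """Phase 2 (process): Caesar-shift the ASCII letters of one chunk."""
    out = []
    for ch in chunk:
        if ch.isascii() and ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + key) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

sample = "Divide and conquer!"   # 19 characters, so each chunk holds ceil(19/4) = 5
key = 3

chunks = split_text(sample, 4)
encoded = list(map(shift_letters, chunks, [key] * len(chunks)))

print(chunks)    # ['Divid', 'e and', ' conq', 'uer!']
print(encoded)   # ['Glylg', 'h dqg', ' frqt', 'xhu!']

# Each character is shifted independently, so the joined chunks match
# the result of enciphering the whole text in one go.
assert "".join(encoded) == shift_letters(sample, key)
```

Since the cipher treats every character independently, the position of the chunk boundaries does not affect the enciphered text, which is what makes this task easy to split.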
Please use the text file called "merge.txt". It includes three novels, Pride and Prejudice, Jane Eyre and Crime and Punishment.
text=loadDocuments()
nToBreakInto=int(input("Please Enter the Number of Chunks: "))
key=int(input("Please Enter Shift Key: "))
encodedList=CCMap(text,key,nToBreakInto)
**Print the encrypted document**
for i in encodedList:
    print(i)
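As a quick sanity check on the printed output, a Caesar shift can be undone by shifting in the opposite direction. The sketch below assumes a cipher that wraps with modulo 26 within each letter case. `shift_letters` is a hypothetical helper defined here, not one of the notebook's functions, and the two encoded chunks are made-up sample data.

```python
def shift_letters(text, key):
    """Caesar-shift ASCII letters by key, wrapping within each case."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + key) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

# Made-up sample: "Hello, World!" shifted by 3 and stored as two chunks.
encoded_chunks = ["Khoor, ", "Zruog!"]
key = 3

# Shifting every chunk by -key and joining the pieces restores the original text.
decoded = "".join(shift_letters(chunk, -key) for chunk in encoded_chunks)
print(decoded)   # Hello, World!
```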