Check that the file file.txt exists and view its size.
!ls -hal file.txt
Copy the file to HDFS (-f overwrites any existing copy).
!hdfs dfs -put -f file.txt
Delete the result folder on HDFS, since the job fails if its output directory already exists.
!hdfs dfs -rm -R result 2>/dev/null
Run the Unix word count command wc on the distributed file as a Hadoop streaming job, with /bin/cat as the mapper and /usr/bin/wc as the reducer.
!mapred streaming \
-input file.txt \
-output result \
-mapper /bin/cat \
-reducer /usr/bin/wc
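The data flow of the streaming job above can be approximated locally with a plain pipeline: the mapper passes lines through, the framework sorts them, and the reducer counts them. This is only a rough sketch, not how Hadoop actually runs the job; the sample text and the `sample` and `result` variables are made up for illustration, and in a notebook the block would go in a `%%bash` cell.

```shell
# Hypothetical stand-in for file.txt, written to a temporary file.
sample=$(mktemp)
printf 'hello world\nhello hadoop\n' > "$sample"

# map (cat) -> shuffle/sort (sort) -> reduce (wc), all on the local machine.
result=$(cat "$sample" | sort | wc)

# Prints the line, word and byte counts of the sample text.
echo "$result"

rm -f "$sample"
```

Sorting does not change any of the counts; it is included only to mirror the shuffle step between mapper and reducer.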
Check the result of the MapReduce job.
!hdfs dfs -cat result/part*
Check that the word count is correct by comparing it with wc on the local host (warning: do not try this with very large files).
!wc file.txt
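The comparison can also be scripted rather than done by eye. The sketch below uses a hypothetical helper, `same_counts`, that compares only the line and word fields of two `wc` outputs. It assumes the job ran with a single reducer (one part file, so one line of output) and it skips the byte count, on the assumption that streaming may add separator characters to records. The sample numbers are invented.

```shell
# Hypothetical helper: succeed if two wc outputs agree on lines and words.
same_counts() {
  a=$(printf '%s\n' "$1" | awk '{print $1, $2}')
  b=$(printf '%s\n' "$2" | awk '{print $1, $2}')
  [ "$a" = "$b" ]
}

# On a cluster (not run here), the call might look like:
#   same_counts "$(hdfs dfs -cat result/part*)" "$(wc < file.txt)"

# Invented sample values: same lines and words, different byte counts.
same_counts "      2       4      27" "2 4 25" && echo "counts match"
```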