Notebook

'''
-------------------------------------------------------------------------------
Filename   : ch4.ipynb
Date       : 2012-09-17
Author     : C. Vogel
Purpose    : Replicate the naive Bayes e-mail classifier in Chapter 3 of
           : _Machine Learning for Hackers_.
Input Data : E-mail files, split into spam and ham (non-spam) folders, are
           : available at the book's GitHub repository at
           : ML_for_Hackers.git.
Libraries  : dateutil, Matplotlib, NLTK, textmining, Numpy 1.6.1, pandas 0.9.0,
           : statsmodels
-------------------------------------------------------------------------------

This notebook is a Python port of the R code in Chapter 4 of
_Machine Learning for Hackers_ by D. Conway and J.M. White.

E-mail files, split into folders classified as spam or ham (non-spam), should
be located in a /data/ subfolder of the working directory. The paths defined
just after the import statements below show the directory structure this
script requires. Copying the complete data folder from the book's GitHub
repository should be sufficient.

For a detailed description of the analysis and the process of porting it
to Python, see:
'''