Notebook
'''
-------------------------------------------------------------------------------
Filename   : ch5.ipynb
Date       : 2012-07-15
Author     : C. Vogel
Purpose    : Replicate the regression analyses in Chapter 5 of _Machine Learning
           : for Hackers_.
Input Data : heights_weights_genders.csv and top_1000_sites.tsv, available from
           : the book's GitHub repository at
           : ML_for_Hackers.git.
Libraries  : NumPy 1.7.0b2, pandas 0.9.0, matplotlib 1.1.1, statsmodels 0.5.0
           : (dev) (with patsy 0.1.0 dependency)
-------------------------------------------------------------------------------

This notebook is a Python port of the R code in Chapter 5 of _Machine Learning
for Hackers_ by D. Conway and J.M. White.

The two datasets, heights_weights_genders.csv and top_1000_sites.tsv, should be
located in a /data/ subfolder of the working directory. See the paths defined
just after the import statements below for the directory structure this script
requires. Copying the complete data folder from the book's GitHub repository
should be sufficient.

For a detailed description of the analysis and the process of porting it to
Python, see:
'''