Steps
Read in data
Feature Engineering
-- Simple Bins
-- TF-IDF
-- NLP
Sparse Representation
Training
-- Naive Bayes
-- SGD
The data used in this module comes from the CSDMC2010 SPAM corpus. To follow along with your own data, or to modify the examples or data, first do the following in a Python-compatible environment: