We start our machine learning applications with regression for a few simple reasons:

  • Regression is a fundamental method for estimating the relationship between one variable ("y") and many conditioning variables ("X").
  • The same coefficients can also be used to generate predictions.
  • Note: The focus in this section is on the RELATIONSHIP paradigm.
  • Many issues that confront researchers have well-understood solutions when regression is the model being used.
  • Regression coefficients are easy to interpret.
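To make the two uses of the coefficients concrete (relationship vs. prediction), here is a minimal sketch that uses only NumPy's least-squares solver on simulated data. The data-generating process and the true coefficients (2, 3 and -1.5) are assumptions chosen for illustration.

```python
import numpy as np

# Simulated data (assumption for illustration): y depends on two X variables
rng = np.random.default_rng(0)
n = 1_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 3.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix: a column of ones for the intercept, then the X variables
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares: choose beta to minimise ||y - X @ beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, b1, b2:", beta.round(2))

# Relationship: holding x2 fixed, a one-unit increase in x1
# changes the predicted y by beta[1]
# Prediction: the same coefficients produce fitted values for new rows
X_new = np.array([[1.0, 0.5, -1.0],
                  [1.0, 0.0, 0.0]])
y_hat_new = X_new @ beta
print("predictions:", y_hat_new.round(2))
```

The estimated coefficients should land close to the true values, and the same vector `beta` serves both purposes: read one coefficient at a time to describe a relationship, or multiply it by new rows of X to predict.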

Overall objectives

After this subchapter,

  1. You can fit a regression with statsmodels or sklearn
  2. You can view your model's results visually or numerically with either method
  3. You can measure the goodness of fit of a regression
  4. You can interpret the mechanical meaning of the coefficients for
    • continuous variables
    • qualitative variables with two or more values (a.k.a. "dummy", "binary", or "categorical" variables)
    • interaction terms between two X variables, which change how the coefficients are interpreted
    • variables in models with other controls included (including categorical variables)
  5. You understand what a t-stat / p-value does and does not tell you
  6. You are aware of common regression analysis pitfalls and disasters
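As a preview of objectives 1 through 5, here is a minimal sketch on simulated data. It assumes statsmodels, scikit-learn, pandas and NumPy are installed. The variable names (`x`, `treated`, `y`) and the data-generating process are hypothetical and chosen only for illustration. The same model is fit twice: once with the statsmodels formula interface, which gives the full inference table, and once with scikit-learn's `LinearRegression` on an explicitly built design matrix.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression

# Simulated data (hypothetical): a continuous x, a 0/1 dummy "treated",
# and an interaction that lets the slope on x differ between the two groups
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "x": rng.normal(size=n),
    "treated": rng.integers(0, 2, size=n),
})
df["y"] = (1.0 + 2.0 * df["x"] + 0.5 * df["treated"]
           + 1.0 * df["x"] * df["treated"] + rng.normal(size=n))

# statsmodels: formula interface with the interaction written explicitly
res = smf.ols("y ~ x + treated + x:treated", data=df).fit()
print(res.summary())  # coefficients, std errors, t-stats, p-values, R^2

# Mechanical meaning of the coefficients:
#   x          -> slope of y on x when treated == 0
#   treated    -> gap in y between the groups when x == 0
#   x:treated  -> change in the slope on x when treated == 1
slope_treated = res.params["x"] + res.params["x:treated"]
print("slope on x for treated group:", round(slope_treated, 3))

# Goodness of fit, t-stats and p-values
print("R^2:", round(res.rsquared, 3))
print(res.tvalues.round(2))
print(res.pvalues.round(4))

# sklearn: same model from an explicit design matrix (no inference table)
X = df[["x", "treated"]].assign(x_treated=df["x"] * df["treated"])
sk = LinearRegression().fit(X, df["y"])
print("sklearn intercept, coefs:", round(sk.intercept_, 3), sk.coef_.round(3))
print("sklearn R^2:", round(sk.score(X, df["y"]), 3))
```

Both libraries solve the same least-squares problem, so their coefficients and R² agree. A t-stat measures how many standard errors a coefficient lies from zero. A small p-value says the estimate is hard to explain by noise alone under the model's assumptions. It does not say the effect is large, causal, or that the model is correctly specified.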