Tutorials, examples, collections, and everything else related to pattern classification, machine learning, and data mining.


Table of Contents

Introduction to Machine Learning and Pattern Classification

  • Predictive modeling, supervised machine learning, and pattern classification - the big picture [Markdown]
  • Entry Point: Data - Using Python's sci-packages to prepare data for Machine Learning tasks and other data analyses [IPython nb]
  • An Introduction to simple linear supervised classification using scikit-learn [IPython nb]

Pre-Processing

  • Feature Extraction
    • Tips and Tricks for Encoding Categorical Features in Classification Tasks [IPython nb]
  • Scaling and Normalization
    • About Feature Scaling: Standardization and Min-Max-Scaling (Normalization) [IPython nb]
  • Feature Selection
  • Dimensionality Reduction
    • Principal Component Analysis (PCA) [IPython nb]
    • PCA based on the covariance vs. correlation matrix [IPython nb]
    • Linear Discriminant Analysis (LDA) [IPython nb]
    • The effect of scaling and mean centering of variables prior to a PCA [PDF]
    • Kernel tricks and nonlinear dimensionality reduction via PCA [IPython nb]
  • Representing Text
    • Tf-idf Walkthrough for scikit-learn [IPython nb]

Model Evaluation

  • An Overview of General Performance Metrics of Binary Classifier Systems [PDF]
  • Cross-Validation
    • Streamline your cross-validation workflow - scikit-learn's Pipeline in action [IPython nb]
  • Model evaluation, model selection, and algorithm selection in machine learning - Part I [Markdown]
  • Model evaluation, model selection, and algorithm selection in machine learning - Part II [Markdown]

Parameter Estimation

  • Parametric Techniques

    • Introduction to the Maximum Likelihood Estimate (MLE) [IPython nb]
    • How to calculate Maximum Likelihood Estimates (MLE) for different distributions [IPython nb]
  • Non-Parametric Techniques

    • Kernel density estimation via the Parzen-window technique [IPython nb]
    • The K-Nearest Neighbor (KNN) technique
  • Regression Analysis

    • Linear Regression
    • Non-Linear Regression

Machine Learning Algorithms

Bayes Classification

Logistic Regression

  • Out-of-core Learning and Model Persistence using scikit-learn [IPython nb]

Neural Networks

  • Artificial Neurons and Single-Layer Neural Networks - How Machine Learning Algorithms Work Part 1 [IPython nb]

  • Activation Function Cheatsheet [IPython nb]

Ensemble Methods

  • Implementing a Weighted Majority Rule Ensemble Classifier in scikit-learn [IPython nb]

Decision Trees

  • Cheatsheet for Decision Tree Classification [IPython nb]

Clustering

  • Prototype-based clustering
  • Hierarchical clustering
    • Complete-Linkage Clustering and Heatmaps in Python [IPython nb]
  • Density-based clustering
  • Graph-based clustering
  • Probabilistic-based clustering

Collecting Data

  • Collecting Fantasy Soccer Data with Python and Beautiful Soup [IPython nb]

  • Download Your Twitter Timeline and Turn It into a Word Cloud Using Python [IPython nb]

  • Reading MNIST into NumPy arrays [IPython nb]

Statistical Pattern Classification Examples

  • Supervised Learning

    • Parametric Techniques

      • Univariate Normal Density

        • Ex1: 2-classes, equal variances, equal priors [IPython nb]
        • Ex2: 2-classes, different variances, equal priors [IPython nb]
        • Ex3: 2-classes, equal variances, different priors [IPython nb]
        • Ex4: 2-classes, different variances, different priors, loss function [IPython nb]
        • Ex5: 2-classes, different variances, equal priors, loss function, Cauchy distr. [IPython nb]
      • Multivariate Normal Density

        • Ex5: 2-classes, different variances, equal priors, loss function [IPython nb]
        • Ex7: 2-classes, equal variances, equal priors [IPython nb]
    • Non-Parametric Techniques

Resources

  • Matplotlib examples - Visualization techniques for exploratory data analysis [IPython nb]

  • Copy-and-paste ready LaTeX equations [Markdown]

  • Open-source datasets [Markdown]

  • Free Machine Learning eBooks [Markdown]

  • Terms in data science defined in less than 50 words [Markdown]

  • Useful libraries for data science in Python [Markdown]

  • General Tips and Advice [Markdown]

  • A matrix cheatsheet for Python, R, Julia, and MATLAB [HTML]