# Basic Principles

Show different ways to present statistical data.

This script is written in MATLAB or IPython style, to show how best to use Python interactively. Note that in IPython, the show() commands are generated automatically. The examples contain:

• scatter plots
• histograms
• KDE
• errorbars
• boxplots
• probplots
• cumulative density functions
• regression fits

Author: Thomas Haslwanter, March 2015

First, import the libraries that you are going to need. You could also do that later, but it is better style to do it at the beginning. The %pylab magic imports the numpy and matplotlib.pyplot names into the current namespace; scipy.stats and seaborn have to be imported explicitly.

In :
%pylab inline

import scipy.stats as stats
import seaborn as sns
sns.set_context('notebook')
sns.set_style('darkgrid')
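Outside an IPython session the `%pylab` magic is not available. A minimal sketch of the equivalent explicit imports follows; the `np` and `plt` aliases are the usual community convention rather than something this notebook defines, and the seeded generator `rng` is an assumption added here for reproducibility.

```python
import numpy as np
import matplotlib.pyplot as plt

# Seeded generator (an assumption for reproducibility; the notebook uses randn)
rng = np.random.default_rng(0)
x = rng.standard_normal(50)

# Explicit namespaces replace the bare plot()/title() names that %pylab provides
fig, ax = plt.subplots()
ax.plot(x, '.')
ax.set_title('Scatter Plot')
ax.set_xlabel('X')
ax.set_ylabel('Y')
# In a plain script, call plt.show() here to open the figure window.
```

The explicit form is more verbose, but it makes clear which library each function comes from.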

In :
# Generate data that are normally distributed
x = randn(50)


## Scatter plot

In :
plot(x,'.')
title('Scatter Plot')
xlabel('X')
ylabel('Y')
draw()

## Histogram

In :
hist(x)
xlabel('Data Values')
ylabel('Frequency')
title('Histogram, default settings')

x = randn(1000)

In :
hist(x,25)
xlabel('Data Values')
ylabel('Frequency')
title('Histogram, 25 bins')
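To see the numbers behind a histogram, the counts and bin edges can be computed without plotting. This is a small sketch using `numpy.histogram`; the seeded generator is an assumption made for reproducibility and is not part of the notebook.

```python
import numpy as np

# Seeded generator (an assumption; the notebook draws unseeded data with randn)
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

# 25 equal-width bins spanning the range of the data
counts, edges = np.histogram(x, bins=25)
bin_width = edges[1] - edges[0]

print(counts.sum())   # every sample falls into exactly one bin
print(len(edges))     # one more edge than there are bins
print(bin_width)
```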

## Kernel Density Estimation (KDE)

In :
import seaborn as sns
sns.kdeplot(x)
xlabel('Data Values')
ylabel('Density')
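The idea behind a KDE can be sketched by hand: place one small Gaussian bump on every data point and average them. The sketch below assumes a Gaussian kernel and Scott's rule for the bandwidth `h`; the defaults used by seaborn may differ between versions, so the curve need not match `sns.kdeplot` exactly.

```python
import numpy as np

# Seeded generator (an assumption added for reproducibility)
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

# Scott's rule for one-dimensional data: h = std * n**(-1/5)
h = x.std(ddof=1) * len(x) ** (-1 / 5)

# Evaluation grid, extended a little beyond the data range
grid = np.linspace(x.min() - 3 * h, x.max() + 3 * h, 400)

# One Gaussian kernel per data point, averaged over all points
z = (grid[:, None] - x[None, :]) / h
density = np.exp(-0.5 * z**2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))

# A density should integrate to approximately 1 (simple Riemann sum)
area = density.sum() * (grid[1] - grid[0])
print(area)
```

A larger `h` gives a smoother curve, a smaller `h` follows the individual data points more closely.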

In :
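numbins = 20
# cumfreq returns (cumulative counts, lower limit, bin size, extra points)
cumcount, lowerlimit, binsize, extrapoints = stats.cumfreq(x, numbins)
# Upper edge of each bin, so that the x-axis shows data values
upper_edges = lowerlimit + binsize * arange(1, numbins + 1)
plot(upper_edges, cumcount)
xlabel('Data Values')
ylabel('Cumulative Frequency')
title('Cumulative frequency function')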
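A binned cumulative frequency is one way to look at the cumulative distribution. An alternative that needs no bins is the empirical cumulative distribution function, sketched here with plain numpy; the variable names and the seeded generator are assumptions for illustration.

```python
import numpy as np

# Seeded generator (an assumption added for reproducibility)
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

# Sort the data; the i-th smallest value has cumulative fraction i/n
x_sorted = np.sort(x)
ecdf = np.arange(1, len(x_sorted) + 1) / len(x_sorted)

# Fraction of the sample that is <= 0 (close to 0.5 for standard normal data)
frac_below_zero = np.mean(x <= 0)
print(frac_below_zero)
```

Plotting `ecdf` against `x_sorted` gives a step-like curve that rises from 1/n to 1.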

In :
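# The box spans the first to the third quartile, with a line at the median (second quartile).
# The whiskers extend to the most extreme data points within 1.5 * the inter-quartile range (IQR)
# of the box; points beyond that are drawn individually as outliers.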
boxplot(x, sym='o')
title('Boxplot')
ylabel('Values')

figure()  # new figure, so the horizontal boxplot is not drawn on top of the first one
boxplot(x, vert=False, sym='*')
title('Boxplot, horizontal')
xlabel('Values')
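The quantities that a boxplot displays can be computed directly. The following is a minimal sketch with numpy, assuming the common 1.5 * IQR whisker convention (matplotlib's default); the variable names are chosen here for illustration.

```python
import numpy as np

# Seeded generator (an assumption added for reproducibility)
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

# Box: first quartile, median, third quartile
q1, median, q3 = np.percentile(x, [25, 50, 75])
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the fences
inside = x[(x >= lower_fence) & (x <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the fences is plotted as an individual outlier
outliers = x[(x < lower_fence) | (x > upper_fence)]
print(q1, median, q3, whisker_low, whisker_high, len(outliers))
```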

In :
x = arange(5)
y = x**2
errorBar = x/2.0   # float division, also under Python 2
errorbar(x,y, yerr=errorBar, fmt='o', capsize=5, capthick=3)

plt.xlabel('Data Values')
plt.ylabel('Measurements')
plt.title('Errorbars')

xlim([-0.2, 4.2])
ylim([-0.2, 19])

In :
# Visual check
x = randn(100)
_ = stats.probplot(x, plot=plt)
title('Probplot - check for normality')
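The numbers behind the probability plot are also available without plotting. In this sketch `stats.probplot` is called without the `plot` argument, so it only returns the theoretical quantiles, the ordered data, and a least-squares line through them; the seeded generator is an assumption for reproducibility.

```python
import numpy as np
from scipy import stats

# Seeded generator (an assumption added for reproducibility)
rng = np.random.default_rng(0)
x = rng.standard_normal(100)

# Ordered data against the quantiles expected for a normal distribution
(theoretical_q, ordered_x), (slope, intercept, r) = stats.probplot(x, dist="norm")

# r close to 1 means the points lie close to a straight line,
# which is what normally distributed data should produce
print(slope, intercept, r)
```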

In :
# Generate data
x = randn(200)
y = 10+0.5*x+randn(len(x))

# Scatter plot
scatter(x,y)
# This one is quite similar to "plot(x,y,'.')"
title('Scatter plot of data')
xlabel('X')
ylabel('Y')

In :
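# Design matrix: a column of ones (intercept) and a column with x (slope)
M = vstack((ones(len(x)), x)).T
# lstsq returns a tuple; its first element holds the fitted parameters
pars = linalg.lstsq(M, y, rcond=None)[0]
intercept = pars[0]
slope = pars[1]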
scatter(x,y)
plot(x, intercept + slope*x, 'r')
show() 
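The least-squares fit can be cross-checked against `numpy.polyfit`, which fits a polynomial of a given degree and should give the same line for degree 1. This is a self-contained sketch with explicit imports; the seeded generator and the `_pf` variable names are assumptions for illustration.

```python
import numpy as np

# Seeded generator (an assumption added for reproducibility)
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = 10 + 0.5 * x + rng.standard_normal(len(x))

# Fit via the design matrix [1, x], as in the cell above
M = np.column_stack((np.ones(len(x)), x))
intercept, slope = np.linalg.lstsq(M, y, rcond=None)[0]

# Same fit with polyfit; note that it returns the highest power first
slope_pf, intercept_pf = np.polyfit(x, y, deg=1)

print(intercept, slope)
print(intercept_pf, slope_pf)
```

Both estimates should lie close to the values used to generate the data (intercept 10, slope 0.5), up to the noise in the sample.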