The top of Python files should always be a short documentation about the content of the file, or "docstring".
This ipyton notebook is short demonstration of Python for scientific data analysis
This script covers the following points:
For such a short program, the definition of a "main" function, and calling it by default when the module is imported by the main program, is a bit superfluous. But it shows good Python coding style.
Author: Thomas Haslwanter, Feb-2017
# To see the plots inline, even if you have not started the notebook
# via "ipython notebook --pylab=inline"
% pylab inline
# Note: single comment lines are indicated by "#".
Populating the interactive namespace from numpy and matplotlib
In contrast to MATLAB, you explicitly have to load the modules that you need. And don't worry here about not knowing the right modules: numpy, scipy, and matplotlib.pyplot are almost everything you will need most of the time, and you will quickly get used to those.
pylab automatically imports the most important components from numpy and matplotlib into the current workspace.
# Create a sine-wave
t = arange(0,10,0.1)
x = sin(t)
# "arange" and "sin" are from the package "numpy". But since the command
# "pylab" already loaded "numpy" into the current workspace, they are known
# here.
# Next, save the data in a text-file, in column form.
# The formatting is a bit clumsy: data are by default row variables; so to
# get a matrix, you stack the two rows above each other, and then transpose
# the matrix.
outFile = 'test.txt'
savetxt(outFile, vstack([t,x]).T)
# Read the data into a different variable
inData = loadtxt(outFile)
t2 = inData[:,0] # Note that Python starts at "0"!
x2 = inData[:,1]
# Note: Python used (...) for function arguments, and [...] for indexing.
# Plot the data
plot(t,cos(t))
[<matplotlib.lines.Line2D at 0x238971d03c8>]
When working with vectors and matrices, keep the following things in mind
Fit the following function: $y = k*x + d$
# Generate a noisy line
t = arange(-100,100)
# use a Python "dictionary" for named variables
par = {'offset':100, 'slope':0.5, 'noiseAmp':4}
x = par['offset'] + par['slope']*t + par['noiseAmp']*randn(len(t))
# Boolean indexing works in Python: select "late" values, i.e. with t>10
xHigh = x[t>10]
tHigh = t[t>10]
# Plot the "late" data
plot(tHigh, xHigh)
[<matplotlib.lines.Line2D at 0x2bdf9e1d160>]
# Boolean indices can be combined:
x = arange(10)
topRange = x>2
bottomRange = x<8
x[topRange & bottomRange]
array([3, 4, 5, 6, 7])
# Determine the best-fit line
# To do so, you have to generate a so-called Design Matrix, with "time" in the first
# column, and a column of "1" in the second column:
xMat = vstack((tHigh, ones(len(tHigh)))).T
slope, intercept = linalg.lstsq(xMat, xHigh)[0]
# Show and plot the fit, and save it to a PNG-file with a medium resolution.
# The "modern" way of Python-formatting is used
plot(tHigh, xHigh)
plot(tHigh, intercept + slope*tHigh, 'r')
savefig('linefit.png', dpi=200)
print('Fit line: intercept = {0:5.3f}, and slope = {1:5.3f}'.format(intercept, slope))
Fit line: intercept = 100.251, and slope = 0.488
# If you want to know confidence intervals, best switch to *pandas*
# Pandas is mainly used for statistics and worksheet-like data
import pandas
# The calculation of OLS has been moved to *statsmodels* now
import statsmodels.formula.api as smf
# Note that this is an advanced topic, and requires new data structures
# such ad "DataFrames" and "ordinary-least-squares" or "ols-models".
myDict = {'x':tHigh, 'y':xHigh}
df = pandas.DataFrame(myDict)
model = smf.ols('y~x', df).fit()
print(model.summary())
OLS Regression Results ============================================================================== Dep. Variable: y R-squared: 0.885 Model: OLS Adj. R-squared: 0.884 Method: Least Squares F-statistic: 671.9 Date: Sat, 04 Feb 2017 Prob (F-statistic): 1.09e-42 Time: 14:27:52 Log-Likelihood: -260.40 No. Observations: 89 AIC: 524.8 Df Residuals: 87 BIC: 529.8 Df Model: 1 Covariance Type: nonrobust ============================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------ Intercept 100.2512 1.143 87.697 0.000 97.979 102.523 x 0.4881 0.019 25.921 0.000 0.451 0.526 ============================================================================== Omnibus: 0.760 Durbin-Watson: 1.719 Prob(Omnibus): 0.684 Jarque-Bera (JB): 0.850 Skew: -0.204 Prob(JB): 0.654 Kurtosis: 2.750 Cond. No. 143. ============================================================================== Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
http://scipy-lectures.github.com/ Python Scientifc Lecture Notes. If you read nothing else, read this!
http://wiki.scipy.org/NumPy_for_Matlab_Users/ Start here if you have lots of Matlab experience.
https://docs.python.org/3.6/tutorial/ The Python tutorial. The original introduction.
http://jrjohansson.github.com/ Lectures on scientic computing with Python. Great ipython notebooks!