from IPython.core.display import HTML
css_file = '../../styles/styles.css'
HTML(open(css_file, "r").read())
- Be able to describe what a figure of merit is, and what it is for.
- Learn how to use
scipy
to fit models to data.- Understand why sometimes model-fitting algorithms do not give the right answer.
Probably the most common task for an observational astronomer is fitting a model to some data. Suppose we have a model $M$ with some parameters $\theta$. For example, our model, $M$ could be a straight line $y = mx+c$, with parameters $\theta = (m,c)$. Model fitting is the process of finding the parameters which best fit our data, and the corresponding uncertainties on them. To do this, we obviously have to have some objective measure of how well our model fits our data. Such a measure is often called a figure of merit.
Suppose we have a model $y(x)$ and we observe a set of points $(x_i,y_i)$. If our model is a good fit to the data then the model prediction at a certain data point $y(x_i)$ will lie close to the observation $y_i$. Therefore, the sum-of-the squares statistic
$${\rm SS} = \sum_i [y_i - y(x_i)]^2,$$is a measure of how well our model fits the data. Smaller values of ${\rm SS}$ imply a better fit. For scientific data, the sum-of-the-squares is not the best figure of merit. This is because every data point contributes equally to the sum, regardless of it's uncertainty.
This is the most commonly used measure of goodness-of-fit. Suppose again that we have a model $y(x)$ and we observe a set of points $(x_i,y_i)$, each with corresponding uncertainties $\sigma_i$. The $\chi^2$ statistic is
$$\chi^2 = \sum_i \left(\frac{y_i - y(x_i)}{\sigma_i} \right) ^2,$$where $y(x_i)$ is the prediction of our model at $x_i$. Intuitively, we can see that $\chi^2$ measures goodness of fit. The quantity $ [y_i - y(x_i)]/\sigma_i $ is a measure of how many error bars a data point is away from our model. We saw in the lectures that if our model fits well, most data points lie within 1$\sigma$, but a third lie further away. Therefore, we'd expect $\chi^2$ for a good fit to roughly equal the number of data points. Poor fits will yield larger versions of $\chi^2$. A common way to fit a model is therefore to find the model parameters which minimise $\chi^2$.
If you want to prove to yourself that minimising $\chi^2$ is not just intuitively correct, but formally correct, have a look at the companion notebook 08a-why_chisq.ipynb
.
Model fitting is therefore just a matter of finding the model parameters that minimise $\chi^2$. You could do this by hand, of course, but this is the kind of task that computers are designed for. Python has several different methods available in libraries. They all do the same thing - they attempt to automatically search parameter space for the values that minimise $\chi^2$. Which you want to use depends a little on your task. We'll look at two cases: fitting simple polynomial models (e.g a straight line), and fitting more complex models
So we want to fit a straight line to data using python, and plot the results. This is very simple. First we need to import the relevant modules
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
Now we wish to load in the data to fit. This is stored in a comma separated file. The loadtxt
function in numpy will do this simply and will load each column into arrays called x, y and e.
# note the "unpack" optional argument to unpack the columns automatically
x, y, e = np.loadtxt('/home/user/PHY241/data/Session7/data.txt', unpack=True)
Now we have our arrays of data we can plot it to see what it looks like.
fig,ax = plt.subplots()
ax.errorbar(x, y, yerr=e, fmt='.')
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()
In this case our data looks like a straight line will fit it. If we want to fit a polynomial to some data, the easiest way is to use the numpy.polyfit
package. I show how to do that in the companion notebook 08b-polynomials.ipynb
. For this lab, we are going to learn a different approach, which has the advantage that it will scale to more complex models.
If we want to fit a model which is more complex than a polynomial, we can use scipy's optimize library. This library has many routines for finding the parameters of a model which best fit your data. We're going to look at the curve fit function in scipy's optimize package, which uses the Levenberg-Marquardt algorithm to find the parameters which best fit our data.
Finding the model parameters which minimise $\chi^2$ for a function requires an algorithm, or recipe. You could imagine using a brute-force method where you calculate $\chi^2$ over a wide range of possible parameters, and find the best ones. Such a method does work, but is very slow - impractically so for models that have more than two parameters.
The Levenberg-Marquardt algorithm is an attempt to find the best parameters more efficiently. It is a type of "gradient-search" method, illustrated in the figure below:
The plot above shows how $\chi^2$ depends upon the model parameter for a model with a single free parameter. From a given starting point, the Levenberg-Marquardt algorithm finds the "downhill" direction in $\chi^2$, and steps in that direction until $\chi^2$ stops decreasing. This location gives the parameter values which minimise $\chi^2$. The steps taken are shown by the red dots and solid red line. The idea is easily extended to models with more than one free parameter (for example, a straight line fit has two - the slope and the gradient):
The greyscale in the image above shows the $\chi^2$ space for a model with two parameters, $a$ and $b$. Dark areas represent low values of $\chi^2$, blue contours show lines of constant $\chi^2$. From a given starting point, the Levenberg-Marquardt algorithm follows the "downhill" direction in $\chi^2$, and steps in that direction until $\chi^2$ stops decreasing. This location gives the parameter values which minimise $\chi^2$. The steps taken are shown by the red dots and solid red line.
This second figure also illustrates an important point about model-fitting: there can be more than one "valley" of low $\chi^2$ in the parameter space, and the Levenberg-Marquardt algorithm always sets off in the local downhill direction until the minimum is reached. This can lead to the function finding a local $\chi^2$ minimum, which is not the global $\chi^2$ minimum, as illustrated by the dashed red line.
What this means for you in practice is that the "best-fit" you find for a model may not be the overall best fit, and will depend upon your starting position. It is always good practice when fitting a model to data to check that starting from different positions gives the same solution.
Suppose we want to fit a model (any model) to some data. The first step is to write a function that calculates the prediction of our model at some location, $x$. The first argument of this function should be the location $x$. The remaining arguments are our model parameters. Therefore, for a straight line, our function looks like this (as always, read the documentation string carefully to understand how the function works):
def funcToFit(x, m, c):
"""Function to calculate a straight line
Args
--------
x: np.ndarray, float
the positions at which to evaluate the function
m: float
the gradient of the straight line
c: float
the intercept of the line
Returns
---------
np.ndarray, float
the values predicted by our model at positions x
"""
return m*x + c
Now we can use the curve fit function in scipy's optimize package, which uses the Levenberg-Marquardt algorithm to find the parameters which best fit our data.
Look closely at the comments in the code below to understand how it works. curve_fit
returns two variables. The first is a list
of the best fitting model parameters. The second is the covariance matrix. You'll learn more about covariance matrices next year. For now you can assume that the square root of the diagonal elements of this matrix are our uncertainties, and these can easily be extracted using numpy's diag
function
from scipy.optimize import curve_fit
# we need an initial guess for our starting parameters
# this is a list with one entry for each parameter
p0 = [2, 2]
"""
We run curve fit itself below
The first argument is the function that defines our model
The next two arguments are the "x" and "y" data
The fourth argument is our starting guess
The final (optional) argument is the error bars, "e".
Without the error bars, curve_fit will minimise the Sum-of-Squares.
Curve fit returns two variables. The first is the best fitting
model parameters, the second is the covariance matrix.
"""
popt,pcov = curve_fit(funcToFit, x, y, p0, sigma=e)
# let's print out our fits
# I use numpy's diag function to pick out the diagonal
# elements of the covariance matrix
print( popt )
print( np.sqrt(np.diag(pcov)) )
# also, let's use python's string formatting to print out in a
# nice format. Note how I ensure the correct number of decimal places
m, c = popt
em, ec = np.sqrt(np.diag(pcov))
print( 'Gradient = {0:.3f} +/- {1:.3f}'.format(m, em) )
print( 'Intercept = {0:.1f} +/- {1:.1f}'.format(c, ec) )
Make a plot of the data and the best fit to it. Use error bars for the data points. For the best fit, plot a solid line.
You will need to use your function
funcToFit
and the best fit parameterspopt
to calculate the best fitting model at the pointsx
. You can then plot the result againstx
on the same graph as your data.
Generalising the process to more complex models is easy - all we do is change the function we fit our data with - funcToFit
(note, we could call this function anything, but funcToFit
tells you what it is!)
The data file
/home/user/PHY241/data/Session7/complex_model.txt
contains some data in the same format asdata.txt
but the model we think explains the data is more complex. We have reason to believe the data is explained by the model
$$ y = a \cos{bx} + b \sin{ax} $$
In the cell below, write a new version of
funcToFit
which encodes this model. Use the version offuncToFit
above as a template to make sure you get it right.
The test cell below should run without errors if you've done things right...
# YOUR CODE HERE
# this cell should raise no errors!
from nose.tools import assert_almost_equal
assert_almost_equal(funcToFit(2, 12, 2), -9.6548801743765917, places=6)
In the cell(s) below, read the data from the file
/home/user/PHY241/data/Session7/complex_model.txt
into variablesx
,y
ande
.
Use
curve_fit
to fit your function above to the data, and plot the data and best fitting model together.
Start from a guess for the parameters of $a=-12$, $b=2$. Does your model fit your data? If you didn't get a good fit, try again from a starting position of $a=5$, $b=2$. Do you get a good fit now?
Create a markdown cell, explaining the results you found above.
# YOUR CODE HERE - USE AS MANY CELLS AS YOU LIKE
As a reminder, we can use curve_fit
to fit more or less any model we like, as long as we can write that model as a Python function. In the file transit.py
I have written a Python function that calculates the shape of an exoplanet transit lightcurve. We can import this function from the Python file in the same we would import functions from scipy
or numpy
Let's import the function and look at it's documentation:
from transit import transit
help(transit)
Your task in this homework is to fit the transit lightcurve of the exoplanet Wasp-4b. However, we don't want to fit all the parameters taken by the function transit
! I have been very kind, and found reference values for all but the planetary radius, the stellar radius and the inclination. These are the three parameters we want to fit, so we need to write a new function that hard-codes the other parameters in.
Complete the function below to write a new function which takes an array of times, plus three parameters for the planetary radius, the inclination and the limb darkening. Your function must call the transit function above, with the other parameters fixed at values found from the literature. These are:
Orbital period, $P = 1.38823187\,$ days.
Time of mid-transit, $T_0 = 54748.150490$ days.
The limb darkening parameter, $\mu = 0.311$.
def funcToFit(t, Rs, Rp, i):
"""Calculate the transit shape due to an exoplanet.
This function implements the model of exoplanetary transits found in
Sackett et al (1999, ASIC, 532, 189).
Args
----
t: np.ndarray, float
the times at which to calcualte the transit lightcurve. Units of days (MJD)
Rs: float
radius of star, divided by the orbital separation
Rp: float
radius of exoplanet, divided by the orbital separation
i: float
inclination, in degrees
Returns
--------
np.ndarray, float
the transit lightcurve, normalised to 1 outside of transit
"""
# WRITE YOUR CODE HERE
from nose.tools import assert_almost_equal, assert_equal
t = [54822.925253333335,54748.150490]
rp = 0.02883
rs = 0.01827
i = 88.8
result = funcToFit(t,rs,rp,i)
assert_equal(len(result),2)
assert_almost_equal(result[0],1,places=5)
assert_almost_equal(result[1],0.28553724389678403,places=5)
The transit data (time, flux, error) are stored in the text file
/home/user/PHY241/data/Session7/wasp4_transit.txt
. Using the examples provided above as a template, read in the data file and fit the data usingcurve_fit
. Store the best fitting parameters in a variable namedpopt
and the covariance matrix in a variable namedpcov
.
You will need reasonable starting guesses for the scaled star and planet radii and the inclination. I suggest 0.2, 0.02 and 88 degrees respectively.
# YOUR CODE HERE
assert_almost_equal(popt[0],1.72786592e-01,places=3)
assert_almost_equal(popt[1],2.70230620e-02,places=3)
assert_almost_equal(popt[2],9.02249791e+01,places=3)
Plot the data and best fit on a single plot.
# YOUR CODE HERE