We can't really call this a fitting package without being able to fit a straight line, right?
from probfit import describe, Chi2Regression
import iminuit
import numpy as np
import numpy.random as npr
#lets make s straight line with gaussian(mu=0,sigma=1) noise
npr.seed(0)
x = linspace(0,10,20)
y = 3*x+15+ randn(20)
err = np.array([1]*20)
errorbar(x,y,err,fmt='.');
#lets define our line
#first argument has to be independent variable
#arguments after that are shape parameters
def line(x,m,c): #define it to be parabolic or whatever you like
return m*x+c
#We can make it faster but for this example this is plentily fast.
#We will talk about speeding things up later(you will need cython)
describe(line)
['x', 'm', 'c']
#cost function
chi2 = Chi2Regression(line,x,y,err)
#Chi^2 regression is just a callable object nothing special about it
describe(chi2)
['m', 'c']
#minimize it
#yes it gives you a heads up that you didn't give it initial value
#we can ignore it for now
minimizer = iminuit.Minuit(chi2) #see iminuit tutorial on how to give initial value/range/error
minimizer.migrad(); #very stable robust minimizer
#you can look at your terminal to see what it is doing;
-c:4: InitialParamWarning: Parameter m does not have initial value. Assume 0. -c:4: InitialParamWarning: Parameter m is floating but does not have initial step size. Assume 1. -c:4: InitialParamWarning: Parameter c does not have initial value. Assume 0. -c:4: InitialParamWarning: Parameter c is floating but does not have initial step size. Assume 1.
FCN = 12.0738531135 | TOTAL NCALL = 36 | NCALLS = 36 |
EDM = 1.10886029888e-21 | GOAL EDM = 1e-05 | UP = 1.0 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
True | True | True | True | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
False | True | False | False |
+ | Name | Value | Parab Error | Minos Error- | Minos Error+ | Limit- | Limit+ | FIXED |
1 | m | 2.886277e+00 | 7.367884e-02 | 0.000000e+00 | 0.000000e+00 | |||
2 | c | 1.613795e+01 | 4.309458e-01 | 0.000000e+00 | 0.000000e+00 |
#lets see our results
print minimizer.values
print minimizer.errors
{'c': 16.137947520534624, 'm': 2.8862774144823855} {'c': 0.4309458211385722, 'm': 0.07367884284273937}
#or a pretty printing
minimizer.print_fmin()
FCN = 12.0738531135 | TOTAL NCALL = 36 | NCALLS = 36 |
EDM = 1.10886029888e-21 | GOAL EDM = 1e-05 | UP = 1.0 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
True | True | True | True | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
False | True | False | False |
+ | Name | Value | Parab Error | Minos Error- | Minos Error+ | Limit- | Limit+ | FIXED |
1 | m | 2.886277e+00 | 7.367884e-02 | 0.000000e+00 | 0.000000e+00 | |||
2 | c | 1.613795e+01 | 4.309458e-01 | 0.000000e+00 | 0.000000e+00 |
is calculated using the second derivative at the minimum This is good in most cases where the uncertainty is symmetric not much correlation exists. Migrad usually got this accurately but if you want ot be sure call minimizer.hesse() after migrad
is obtained by scanning chi^2 and likelihood profile and find the the point where chi^2 is increased by 1 or -ln likelihood is increased by 0.5 This error is generally asymmetric and is correct in all case. This is quite computationally expensive if you have many parameter. call minimizer.minos('variablename') after migrad to get this error
#let's visualize our line
chi2.draw(minimizer)
#looks good
#Sometime we want the error matrix
print 'error matrix:\n', minimizer.matrix()
#or correlation matrix
print 'correlation matrix:\n', minimizer.matrix(correlation=True)
#or a pretty html
#despite the method named error_matrix it's actually a correlation matrix
minimizer.print_matrix()
error matrix: ((0.005428571882645087, -0.027142859751431703), (-0.027142859751431703, 0.18571430075679826)) correlation matrix: ((1.0, -0.8548504260481388), (-0.8548504260481388, 1.0))
+ |
m
|
c
|
m | 1.00 | -0.85 |
c | -0.85 | 1.00 |
In high energy physics, we usually want to fit a distribution to a histogram. Let's look at simple gaussian distribution
import numpy.random as npr
import iminuit
from probfit import BinnedLH
#First lets make our data
npr.seed(0)
data = randn(10000)*4+1
#sigma = 4 and mean = 1
hist(data,bins=100,histtype='step');
Here is our PDF/model
#normalized gaussian
def myPDF(x,mu,sigma):
return 1/sqrt(2*pi)/sigma*exp(-(x-mu)**2/2./sigma**2)
#build your cost function here we use binned likelihood
cost = BinnedLH(myPDF,data)
#Let's fit
minimizer = iminuit.Minuit(cost,sigma=3.) #notice here that we give initial value to sigma
#but most of the time we want to see it before fitting
cost.draw(minimizer)
-c:2: InitialParamWarning: Parameter mu does not have initial value. Assume 0. -c:2: InitialParamWarning: Parameter mu is floating but does not have initial step size. Assume 1. -c:2: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.
minimizer.migrad() #very stable minimization algorithm
#like in all binned fit with long zero tail. It will have to do something about the zero bin
#dist_fit handle them gracefully but will give you a head up
-c:1: LogWarning: x is really small return 0
FCN = 20.9368166553 | TOTAL NCALL = 46 | NCALLS = 46 |
EDM = 1.45381812456e-06 | GOAL EDM = 5e-06 | UP = 0.5 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
True | True | True | True | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
False | True | False | False |
+ | Name | Value | Parab Error | Minos Error- | Minos Error+ | Limit- | Limit+ | FIXED |
1 | mu | 9.258754e-01 | 3.962599e-02 | 0.000000e+00 | 0.000000e+00 | |||
2 | sigma | 3.952381e+00 | 2.826741e-02 | 0.000000e+00 | 0.000000e+00 |
({'hesse_failed': False, 'has_reached_call_limit': False, 'has_accurate_covar': True, 'has_posdef_covar': True, 'up': 0.5, 'edm': 1.4538181245599543e-06, 'is_valid': True, 'is_above_max_edm': False, 'has_covariance': True, 'has_made_posdef_covar': False, 'has_valid_parameters': True, 'fval': 20.93681665525327, 'nfcn': 46}, [{'is_const': False, 'name': 'mu', 'has_limits': False, 'value': 0.9258754454758255, 'number': 0L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 0.039625990354239755, 'is_fixed': False}, {'is_const': False, 'name': 'sigma', 'has_limits': False, 'value': 3.9523813236078955, 'number': 1L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 0.028267407263212106, 'is_fixed': False}])
#you can see if your fit make any sense too
cost.draw(minimizer)#uncertainty is given by symetric poisson
#let's see the result
print 'Value:', minimizer.values
print 'Error:', minimizer.errors
Value: {'mu': 0.9258754454758255, 'sigma': 3.9523813236078955} Error: {'mu': 0.039625990354239755, 'sigma': 0.028267407263212106}
#That printout can get out of hand quickly
minimizer.print_fmin()
#and correlation matrix
#will not display well in firefox(tell them to fix writing-mode:)
minimizer.print_matrix()
FCN = 20.9368166553 | TOTAL NCALL = 46 | NCALLS = 46 |
EDM = 1.45381812456e-06 | GOAL EDM = 5e-06 | UP = 0.5 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
True | True | True | True | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
False | True | False | False |
+ | Name | Value | Parab Error | Minos Error- | Minos Error+ | Limit- | Limit+ | FIXED |
1 | mu | 9.258754e-01 | 3.962599e-02 | 0.000000e+00 | 0.000000e+00 | |||
2 | sigma | 3.952381e+00 | 2.826741e-02 | 0.000000e+00 | 0.000000e+00 |
+ |
mu
|
sigma
|
mu | 1.00 | -0.00 |
sigma | -0.00 | 1.00 |
#how about making sure the error making sense
minimizer.draw_mnprofile('mu');
-c:2: LogWarning: x is really small return 0
#2d contour error
#you can notice that it takes sometime to draw
#we will this is because our PDF is defined in Python
#we will show how to speed this up later
minimizer.draw_mncontour('mu','sigma');
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/iminuit/_plotting.py:85: LogWarning: x is really small return 0 sigma=this_sig)
Let's explore another popular cost function chi^2. Chi^2 is bad when you have bin with 0. ROOT just ignore. ROOFIT does something I don't remember. But's it's best to avoid using chi^2 when you have bin with 0 count.
import numpy.random as npr
from math import sqrt,exp,pi
import iminuit
from probfit import Extended, gaussian, BinnedChi2, describe
#we will use the same data as in the previous example
npr.seed(0)
data = npr.randn(10000)*4+1
#sigma = 4 and mean = 1
hist(data, bins=100, histtype='step');
#And the same PDF: normalized gaussian
def myPDF(x,mu,sigma):
return 1/sqrt(2*pi)/sigma*exp(-(x-mu)**2/2./sigma**2)
#binned chi^2 fit only makes sense(for now) for extended fit
extended_pdf = Extended(myPDF)
#very useful method to look at function signature
describe(extended_pdf) #you can look at what your pdf means
['x', 'mu', 'sigma', 'N']
#Chi^2 distribution fit is really bad for distribution with long tail
#since when bin count=0... poisson error=0 and blows up chi^2
#so give it some range
chi2 = BinnedChi2(extended_pdf,data,bound=(-7,10))
minimizer = iminuit.Minuit(chi2,sigma=1)
minimizer.migrad()
-c:5: InitialParamWarning: Parameter mu does not have initial value. Assume 0. -c:5: InitialParamWarning: Parameter mu is floating but does not have initial step size. Assume 1. -c:5: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1. -c:5: InitialParamWarning: Parameter N does not have initial value. Assume 0. -c:5: InitialParamWarning: Parameter N is floating but does not have initial step size. Assume 1.
FCN = 36.5025994291 | TOTAL NCALL = 247 | NCALLS = 247 |
EDM = 3.00607928198e-08 | GOAL EDM = 1e-05 | UP = 1.0 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
True | True | True | True | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
False | True | False | False |
+ | Name | Value | Parab Error | Minos Error- | Minos Error+ | Limit- | Limit+ | FIXED |
1 | mu | 9.064759e-01 | 4.482536e-02 | 0.000000e+00 | 0.000000e+00 | |||
2 | sigma | 3.959036e+00 | 4.112828e-02 | 0.000000e+00 | 0.000000e+00 | |||
3 | N | 9.969691e+03 | 1.033724e+02 | 0.000000e+00 | 0.000000e+00 |
({'hesse_failed': False, 'has_reached_call_limit': False, 'has_accurate_covar': True, 'has_posdef_covar': True, 'up': 1.0, 'edm': 3.006079281979545e-08, 'is_valid': True, 'is_above_max_edm': False, 'has_covariance': True, 'has_made_posdef_covar': False, 'has_valid_parameters': True, 'fval': 36.50259942907618, 'nfcn': 247}, [{'is_const': False, 'name': 'mu', 'has_limits': False, 'value': 0.9064759047700527, 'number': 0L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 0.044825358490154885, 'is_fixed': False}, {'is_const': False, 'name': 'sigma', 'has_limits': False, 'value': 3.959036058997726, 'number': 1L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 0.041128281235997405, 'is_fixed': False}, {'is_const': False, 'name': 'N', 'has_limits': False, 'value': 9969.690654245478, 'number': 2L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 103.37239384508398, 'is_fixed': False}])
chi2.draw(minimizer)
#and usual deal
minimizer.print_fmin()
minimizer.print_matrix()
FCN = 36.5025994291 | TOTAL NCALL = 247 | NCALLS = 247 |
EDM = 3.00607928198e-08 | GOAL EDM = 1e-05 | UP = 1.0 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
True | True | True | True | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
False | True | False | False |
+ | Name | Value | Parab Error | Minos Error- | Minos Error+ | Limit- | Limit+ | FIXED |
1 | mu | 9.064759e-01 | 4.482536e-02 | 0.000000e+00 | 0.000000e+00 | |||
2 | sigma | 3.959036e+00 | 4.112828e-02 | 0.000000e+00 | 0.000000e+00 | |||
3 | N | 9.969691e+03 | 1.033724e+02 | 0.000000e+00 | 0.000000e+00 |
+ |
mu
|
sigma
|
N
|
mu | 1.00 | -0.10 | -0.05 |
sigma | -0.10 | 1.00 | 0.18 |
N | -0.05 | 0.18 | 1.00 |
Unbinned likelihood is computationally very very expensive. It's now a good time that we talk about how to speed things up with cython
import numpy.random as npr
from probfit import UnbinnedLH, gaussian, describe
import iminuit
#same data
npr.seed(0)
data = npr.randn(10000)*4+1
#sigma = 4 and mean = 1
hist(data,bins=100,histtype='step');
#We want to speed things up with cython
#load cythonmagic
%load_ext cythonmagic
%%cython
cimport cython
from libc.math cimport exp,M_PI,sqrt
#same gaussian distribution but now written in cython
@cython.binding(True)#IMPORTANT:this tells cython to dump function signature too
def cython_PDF(double x,double mu,double sigma):
#these are c add multiply etc not python so it's fast
return 1/sqrt(2*M_PI)/sigma*exp(-(x-mu)**2/2./sigma**2)
#cost function
ublh = UnbinnedLH(cython_PDF,data)
minimizer = iminuit.Minuit(ublh,sigma=2.)
minimizer.set_up(0.5)#remember this is likelihood
minimizer.migrad()#yes amazingly fast
ublh.show(minimizer)
minimizer.print_fmin()
minimizer.print_matrix()
-c:3: InitialParamWarning: Parameter mu does not have initial value. Assume 0. -c:3: InitialParamWarning: Parameter mu is floating but does not have initial step size. Assume 1. -c:3: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.
FCN = 27927.1139471 | TOTAL NCALL = 69 | NCALLS = 69 |
EDM = 5.05909350517e-09 | GOAL EDM = 5e-06 | UP = 0.5 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
True | True | True | True | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
False | True | False | False |
+ | Name | Value | Parab Error | Minos Error- | Minos Error+ | Limit- | Limit+ | FIXED |
1 | mu | 9.262679e-01 | 3.950226e-02 | 0.000000e+00 | 0.000000e+00 | |||
2 | sigma | 3.950224e+00 | 2.793227e-02 | 0.000000e+00 | 0.000000e+00 |
FCN = 27927.1139471 | TOTAL NCALL = 69 | NCALLS = 69 |
EDM = 5.05909350517e-09 | GOAL EDM = 5e-06 | UP = 0.5 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
True | True | True | True | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
False | True | False | False |
+ | Name | Value | Parab Error | Minos Error- | Minos Error+ | Limit- | Limit+ | FIXED |
1 | mu | 9.262679e-01 | 3.950226e-02 | 0.000000e+00 | 0.000000e+00 | |||
2 | sigma | 3.950224e+00 | 2.793227e-02 | 0.000000e+00 | 0.000000e+00 |
+ |
mu
|
sigma
|
mu | 1.00 | 0.00 |
sigma | 0.00 | 1.00 |
#remember how slow it was?
#now it's super fast(and it's even unbinned likelihood)
minimizer.draw_mnprofile('mu');
But you really don't have to write your own gaussian there are tons of builtin function written in cython for you.
import probfit.pdf
print dir(probfit.pdf)
print describe(gaussian)
print type(gaussian)
['HistogramPdf', 'MinimalFuncCode', 'Polynomial', '_Linear', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__pyx_capi__', '__test__', 'argus', 'cauchy', 'cruijff', 'crystalball', 'describe', 'doublegaussian', 'gaussian', 'linear', 'novosibirsk', 'np', 'poly2', 'poly3', 'rtv_breitwigner', 'ugaussian'] ['x', 'mean', 'sigma'] <type 'builtin_function_or_method'>
ublh = UnbinnedLH(gaussian,data)
minimizer = iminuit.Minuit(ublh,sigma=2.)
minimizer.set_up(0.5)#remember this is likelihood
minimizer.migrad()#yes amazingly fast
ublh.draw(minimizer, show_errbars='normal') # control how fit is displayed too
-c:2: InitialParamWarning: Parameter mean does not have initial value. Assume 0. -c:2: InitialParamWarning: Parameter mean is floating but does not have initial step size. Assume 1. -c:2: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.
FCN = 27927.1139471 | TOTAL NCALL = 69 | NCALLS = 69 |
EDM = 5.07834778662e-09 | GOAL EDM = 5e-06 | UP = 0.5 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
True | True | True | True | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
False | True | False | False |
+ | Name | Value | Parab Error | Minos Error- | Minos Error+ | Limit- | Limit+ | FIXED |
1 | mean | 9.262679e-01 | 3.950226e-02 | 0.000000e+00 | 0.000000e+00 | |||
2 | sigma | 3.950224e+00 | 2.793227e-02 | 0.000000e+00 | 0.000000e+00 |
When fitting distribution to a PDF. One of the common problem that we run into is normalization. Not all function is analytically integrable on the range of our interest. Let's look at crystal ball function.
from probfit import crystalball, gen_toy, Normalized, describe, UnbinnedLH
import numpy.random as npr
import iminuit
#lets first generate a crystal ball sample
#dist_fit has builtin toy generation capability
#lets introduce crystal ball function
#http://en.wikipedia.org/wiki/Crystal_Ball_function
#it's simply gaussian with power law tail
#normally found in energy deposited in crystals
#impossible to normalize analytically
#and normalization will depend on shape parameters
describe(crystalball)
['x', 'alpha', 'n', 'mean', 'sigma']
npr.seed(0)
bound = (-1,2)
data = gen_toy(crystalball,10000,bound=bound,alpha=1.,n=2.,mean=1.,sigma=0.3,quiet=False)
#quiet = False tells it to plot out original function
#toy histogram and poisson error from both orignal distribution and toy
['x', 'alpha', 'n', 'mean', 'sigma']
#To fit this we need to tell normalized our crystal ball PDF
#this is done with trapezoid rule with simple cache mechanism
#can be done by Normalized functor
ncball = Normalized(crystalball,bound)
#this can also bedone with declerator
#@normalized_function(bound)
#def myPDF(x,blah):
# return blah
print 'unnorm:', crystalball(1.0,1,2,1,0.3)
print ' norm:', ncball(1.0,1,2,1,0.3)
unnorm: 1.0 norm: 1.10945669814
#it has the same signature as the crystalball
describe(ncball)
['x', 'alpha', 'n', 'mean', 'sigma']
#now we can fit as usual
ublh = UnbinnedLH(ncball,data)
minimizer = iminuit.Minuit(ublh,
alpha=1.,n=2.1,mean=1.2,sigma=0.3)
minimizer.set_up(0.5)#remember this is likelihood
minimizer.migrad()#yes amazingly fast Normalize is written in cython
ublh.show(minimizer)
#crystalball function is nortorious for its sensitivity on n parameter
#dist_fit give you a heads up where it might have float overflow
-c:4: InitialParamWarning: Parameter alpha is floating but does not have initial step size. Assume 1. -c:4: InitialParamWarning: Parameter n is floating but does not have initial step size. Assume 1. -c:4: InitialParamWarning: Parameter mean is floating but does not have initial step size. Assume 1. -c:4: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1. -c:6: SmallIntegralWarning: (0.9689428295957161, 0.44086027281289175, -7.852819454184058, 0.9263111214440007, 0.29811644525305303)
FCN = 6154.37579109 | TOTAL NCALL = 178 | NCALLS = 178 |
EDM = 1.09346528365e-06 | GOAL EDM = 5e-06 | UP = 0.5 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
True | True | True | True | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
False | True | False | False |
+ | Name | Value | Parab Error | Minos Error- | Minos Error+ | Limit- | Limit+ | FIXED |
1 | alpha | 1.012962e+00 | 5.321735e-02 | 0.000000e+00 | 0.000000e+00 | |||
2 | n | 1.812763e+00 | 2.177144e-01 | 0.000000e+00 | 0.000000e+00 | |||
3 | mean | 9.982474e-01 | 5.583931e-03 | 0.000000e+00 | 0.000000e+00 | |||
4 | sigma | 2.996611e-01 | 4.195338e-03 | 0.000000e+00 | 0.000000e+00 |
probfit check for integrate method that can be called by integrate(bound, nint, *arg) to compute definite integral for given bound and nint(pieces of integral this is normally ignored) and the rest will be passed as positional argument. Couple of probfit already have analytical formula written.
from probfit import integrate1d
def line(x, m, c):
return m*x+c
#compute integral of line from x=(0,1) using 10 intevals with m=1. and c=2.
#all probfit internal use this
print integrate1d(line, (0,1), 10, (1.,2.) ) # no integrate method available probfit use simpson3/8
#Let us illustrate the point by forcing it to have integral that's off by
#factor of two
def wrong_line_integral(bound, nint, m, c):
a, b = bound
return 2*(m*(b**2/2.-a**2/2.)+c*(b-a)) # I know this is wrong
line.integrate = wrong_line_integral
# line.integrate = lambda bound, nint, m, c: blah blah # this works too
print integrate1d(line, (0,1), 10, (1.,2.))
2.5 5.0
#crystalball is nortoriously sensitive to initial parameter
#now it is a good time to show what happen when things...doesn't fit
ublh = UnbinnedLH(ncball,data)
minimizer = iminuit.Minuit(ublh)#NO initial value
minimizer.set_up(0.5)#remember this is likelihood
minimizer.migrad()#yes amazingly fast but tons of read
#Remember there is a heads up
-c:4: InitialParamWarning: Parameter alpha does not have initial value. Assume 0. -c:4: InitialParamWarning: Parameter alpha is floating but does not have initial step size. Assume 1. -c:4: InitialParamWarning: Parameter n does not have initial value. Assume 0. -c:4: InitialParamWarning: Parameter n is floating but does not have initial step size. Assume 1. -c:4: InitialParamWarning: Parameter mean does not have initial value. Assume 0. -c:4: InitialParamWarning: Parameter mean is floating but does not have initial step size. Assume 1. -c:4: InitialParamWarning: Parameter sigma does not have initial value. Assume 0. -c:4: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.
FCN = 10986.1228867 | TOTAL NCALL = 230 | NCALLS = 230 |
EDM = 0.0 | GOAL EDM = 5e-06 | UP = 0.5 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
False | True | False | False | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
True | True | False | False |
+ | Name | Value | Parab Error | Minos Error- | Minos Error+ | Limit- | Limit+ | FIXED |
1 | alpha | 0.000000e+00 | 1.000000e+00 | 0.000000e+00 | 0.000000e+00 | |||
2 | n | 0.000000e+00 | 1.000000e+00 | 0.000000e+00 | 0.000000e+00 | |||
3 | mean | 0.000000e+00 | 1.000000e+00 | 0.000000e+00 | 0.000000e+00 | |||
4 | sigma | 0.000000e+00 | 1.000000e+00 | 0.000000e+00 | 0.000000e+00 |
({'hesse_failed': True, 'has_reached_call_limit': False, 'has_accurate_covar': False, 'has_posdef_covar': False, 'up': 0.5, 'edm': 0.0, 'is_valid': False, 'is_above_max_edm': False, 'has_covariance': True, 'has_made_posdef_covar': False, 'has_valid_parameters': True, 'fval': 10986.122886679714, 'nfcn': 230}, [{'is_const': False, 'name': 'alpha', 'has_limits': False, 'value': 0.0, 'number': 0L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 1.0, 'is_fixed': False}, {'is_const': False, 'name': 'n', 'has_limits': False, 'value': 0.0, 'number': 1L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 1.0, 'is_fixed': False}, {'is_const': False, 'name': 'mean', 'has_limits': False, 'value': 0.0, 'number': 2L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 1.0, 'is_fixed': False}, {'is_const': False, 'name': 'sigma', 'has_limits': False, 'value': 0.0, 'number': 3L, 'has_lower_limit': False, 'upper_limit': 0.0, 'lower_limit': 0.0, 'has_upper_limit': False, 'error': 1.0, 'is_fixed': False}])
ublh.show(minimizer)#it bounds to fails
minimizer.migrad_ok(), minimizer.matrix_accurate()
#checking these two method give you a good sign
(False, False)
#fix it by giving it initial value/error/limit or fixing parameter see iminuit Tutorial
#now we can fit as usual
ublh = UnbinnedLH(ncball,data)
minimizer = iminuit.Minuit(ublh,
alpha=1.,n=2.1,mean=1.2,sigma=0.3)
minimizer.set_up(0.5)#remember this is likelihood
minimizer.migrad()#yes amazingly fast. Normalize is written in cython
ublh.show(minimizer)
-c:5: InitialParamWarning: Parameter alpha is floating but does not have initial step size. Assume 1. -c:5: InitialParamWarning: Parameter n is floating but does not have initial step size. Assume 1. -c:5: InitialParamWarning: Parameter mean is floating but does not have initial step size. Assume 1. -c:5: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.
FCN = 6154.37579109 | TOTAL NCALL = 178 | NCALLS = 178 |
EDM = 1.09346528365e-06 | GOAL EDM = 5e-06 | UP = 0.5 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
True | True | True | True | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
False | True | False | False |
+ | Name | Value | Parab Error | Minos Error- | Minos Error+ | Limit- | Limit+ | FIXED |
1 | alpha | 1.012962e+00 | 5.321735e-02 | 0.000000e+00 | 0.000000e+00 | |||
2 | n | 1.812763e+00 | 2.177144e-01 | 0.000000e+00 | 0.000000e+00 | |||
3 | mean | 9.982474e-01 | 5.583931e-03 | 0.000000e+00 | 0.000000e+00 | |||
4 | sigma | 2.996611e-01 | 4.195338e-03 | 0.000000e+00 | 0.000000e+00 |
This is a hard question but visualization can helps us
from probfit import try_uml, Normalized, crystalball, gen_toy
bound = (-1,2)
ncball = Normalized(crystalball,bound)
data = gen_toy(crystalball,10000,bound=bound,alpha=1.,n=2.,mean=1.,sigma=0.3)
besttry = try_uml(ncball,data,alpha=1.,n=2.1,mean=1.2,sigma=0.3)
#or you can try multiple
#too many will just confuse you
besttry = try_uml(ncball,data,alpha=1.,n=2.1,mean=[1.2,1.1],sigma=[0.3,0.5])
print besttry #and you can find which one give you minimal unbinned likelihood
{'alpha': 1.0, 'mean': 1.1, 'sigma': 0.3, 'n': 2.1}
import numpy.random as npr
from probfit import gen_toy, gaussian, Extended, describe, try_binlh, BinnedLH
import numpy as np
import iminuit
peak1 = npr.randn(3000)*0.2
peak2 = npr.randn(5000)*0.1+4
bg = gen_toy(lambda x : (x+2)**2, 20000,(-2,5))
all_data = np.concatenate([peak1,peak2,bg])
hist((peak1,peak2,bg,all_data),bins=200,histtype='step',range=(-2,5));
%load_ext cythonmagic
The cythonmagic extension is already loaded. To reload it, use: %reload_ext cythonmagic
%%cython
cimport cython
from probfit import Normalized, gaussian
@cython.binding(True)
def poly(double x,double a,double b, double c):
return a*x*x+b*x+c
#remember linear function is not normalizable for -inf ... +inf
nlin = Normalized(poly,(-1,5))
#our extended PDF for 3 types of signal
@cython.binding(True)
def myPDF(double x,
double a, double b, double c, double nbg,
double mu1, double sigma1, double nsig1,
double mu2, double sigma2, double nsig2):
cdef double NBG = nbg*nlin(x,a,b,c)
cdef double NSIG1 = nsig1*gaussian(x,mu1,sigma1)
cdef double NSIG2 = nsig2*gaussian(x,mu2,sigma2)
return NBG + NSIG1 + NSIG2
gcc-4.2 not found, using clang instead
print describe(myPDF)
['x', 'a', 'b', 'c', 'nbg', 'mu1', 'sigma1', 'nsig1', 'mu2', 'sigma2', 'nsig2']
#lets use what we just learned
#for complicated function good initial value(and initial step) is crucial
#if it doesn't converge try play with initial value and initial stepsize(error_xxx)
besttry = try_binlh(myPDF,all_data,
a=1.,b=2.,c=4.,nbg=20000.,
mu1=0.1,sigma1=0.2,nsig1=3000.,
mu2=3.9,sigma2=0.1,nsig2=5000.,extended=True, bins=300, bound=(-1,5) )
print besttry
{'a': 1.0, 'c': 4.0, 'b': 2.0, 'mu1': 0.1, 'mu2': 3.9, 'sigma1': 0.2, 'nsig2': 5000.0, 'nsig1': 3000.0, 'nbg': 20000.0, 'sigma2': 0.1}
binlh = BinnedLH(myPDF,all_data, bins=200, extended=True, bound=(-1,5))
#too lazy to type initial values from what we try?
#use keyword expansion **besttry
#need to put in initial step size for mu1 and mu2
#with error_mu* otherwise it won't converge(try it yourself)
minimizer = iminuit.Minuit(binlh, error_mu1=0.1, error_mu2=0.1, **besttry)
-c:6: InitialParamWarning: Parameter a is floating but does not have initial step size. Assume 1. -c:6: InitialParamWarning: Parameter b is floating but does not have initial step size. Assume 1. -c:6: InitialParamWarning: Parameter c is floating but does not have initial step size. Assume 1. -c:6: InitialParamWarning: Parameter nbg is floating but does not have initial step size. Assume 1. -c:6: InitialParamWarning: Parameter sigma1 is floating but does not have initial step size. Assume 1. -c:6: InitialParamWarning: Parameter nsig1 is floating but does not have initial step size. Assume 1. -c:6: InitialParamWarning: Parameter sigma2 is floating but does not have initial step size. Assume 1. -c:6: InitialParamWarning: Parameter nsig2 is floating but does not have initial step size. Assume 1.
minimizer.migrad();
FCN = 102.412104526 | TOTAL NCALL = 253 | NCALLS = 253 |
EDM = 1.65495586821e-06 | GOAL EDM = 5e-06 | UP = 0.5 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
True | True | True | True | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
False | True | False | False |
+ | Name | Value | Parab Error | Minos Error- | Minos Error+ | Limit- | Limit+ | FIXED |
1 | a | 8.328082e-01 | 4.684544e-02 | 0.000000e+00 | 0.000000e+00 | |||
2 | b | 3.636513e+00 | 1.243742e-01 | 0.000000e+00 | 0.000000e+00 | |||
3 | c | 3.544285e+00 | 1.310163e-01 | 0.000000e+00 | 0.000000e+00 | |||
4 | nbg | 1.991538e+04 | 1.633711e+02 | 0.000000e+00 | 0.000000e+00 | |||
5 | mu1 | -1.479097e-03 | 4.663990e-03 | 0.000000e+00 | 0.000000e+00 | |||
6 | sigma1 | 1.962934e-01 | 4.062938e-03 | 0.000000e+00 | 0.000000e+00 | |||
7 | nsig1 | 2.989627e+03 | 6.936758e+01 | 0.000000e+00 | 0.000000e+00 | |||
8 | mu2 | 4.003400e+00 | 2.078124e-03 | 0.000000e+00 | 0.000000e+00 | |||
9 | sigma2 | 9.867706e-02 | 1.988008e-03 | 0.000000e+00 | 0.000000e+00 | |||
10 | nsig2 | 5.050094e+03 | 1.004303e+02 | 0.000000e+00 | 0.000000e+00 |
binlh.show(minimizer)
minimizer.print_fmin()
FCN = 102.412104526 | TOTAL NCALL = 253 | NCALLS = 253 |
EDM = 1.65495586821e-06 | GOAL EDM = 5e-06 | UP = 0.5 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
True | True | True | True | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
False | True | False | False |
+ | Name | Value | Parab Error | Minos Error- | Minos Error+ | Limit- | Limit+ | FIXED |
1 | a | 8.328082e-01 | 4.684544e-02 | 0.000000e+00 | 0.000000e+00 | |||
2 | b | 3.636513e+00 | 1.243742e-01 | 0.000000e+00 | 0.000000e+00 | |||
3 | c | 3.544285e+00 | 1.310163e-01 | 0.000000e+00 | 0.000000e+00 | |||
4 | nbg | 1.991538e+04 | 1.633711e+02 | 0.000000e+00 | 0.000000e+00 | |||
5 | mu1 | -1.479097e-03 | 4.663990e-03 | 0.000000e+00 | 0.000000e+00 | |||
6 | sigma1 | 1.962934e-01 | 4.062938e-03 | 0.000000e+00 | 0.000000e+00 | |||
7 | nsig1 | 2.989627e+03 | 6.936758e+01 | 0.000000e+00 | 0.000000e+00 | |||
8 | mu2 | 4.003400e+00 | 2.078124e-03 | 0.000000e+00 | 0.000000e+00 | |||
9 | sigma2 | 9.867706e-02 | 1.988008e-03 | 0.000000e+00 | 0.000000e+00 | |||
10 | nsig2 | 5.050094e+03 | 1.004303e+02 | 0.000000e+00 | 0.000000e+00 |
Sometimes, what we want to fit is the sum of likelihood /chi^2 of two PDF sharing some parameters. dist_fit doesn't provied a built_in facility to do this but it can be built easily.
In this example, we will fit two gaussian distribution where we know that the width are the same but the peak is at different places.
import numpy.random as npr
from probfit import UnbinnedLH, gaussian, describe, draw_compare_hist
import iminuit
npr.seed(0)
#Lets make two gaussian
data1 = npr.randn(10000)+3
data2 = npr.randn(10000)-2
hist([data1,data2],bins=100,histtype='step',label=['data1','data2']);
#remember this is nothing special about builtin cost function
#except some utility function like draw and show
ulh1 = UnbinnedLH(gaussian,data1)
ulh2 = UnbinnedLH(gaussian,data2)
print describe(ulh1)
print describe(ulh2)
['mean', 'sigma'] ['mean', 'sigma']
#you can also use cython to do this
class CustomCost:
def __init__(self,pdf1,data1,pdf2,data2):
self.ulh1 = UnbinnedLH(pdf1,data1)
self.ulh2 = UnbinnedLH(pdf2,data2)
#this is the important part you need __call__ to calculate your cost
#in our case it's sum of likelihood with sigma
def __call__(self,mu1,mu2,sigma):
return self.ulh1(mu1,sigma)+self.ulh2(mu2,sigma)
simul_lh = CustomCost(gaussian,data1,gaussian,data2)
minimizer = iminuit.Minuit(simul_lh,sigma=0.5)
minimizer.set_up(0.5)#remember it's likelihood
minimizer.migrad();
-c:1: InitialParamWarning: errordef is not given. Default to 1. -c:1: InitialParamWarning: Parameter mu1 does not have initial value. Assume 0. -c:1: InitialParamWarning: Parameter mu1 is floating but does not have initial step size. Assume 1. -c:1: InitialParamWarning: Parameter mu2 does not have initial value. Assume 0. -c:1: InitialParamWarning: Parameter mu2 is floating but does not have initial step size. Assume 1. -c:1: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.
FCN = 28184.0142876 | TOTAL NCALL = 97 | NCALLS = 97 |
EDM = 2.24683294143e-09 | GOAL EDM = 5e-06 | UP = 0.5 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
True | True | True | True | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
False | True | False | False |
+ | Name | Value | Parab Error | Minos Error- | Minos Error+ | Limit- | Limit+ | FIXED |
1 | mu1 | 2.981566e+00 | 9.903099e-03 | 0.000000e+00 | 0.000000e+00 | |||
2 | mu2 | -1.989012e+00 | 9.903099e-03 | 0.000000e+00 | 0.000000e+00 | |||
3 | sigma | 9.903098e-01 | 4.951551e-03 | 0.000000e+00 | 0.000000e+00 |
minimizer.print_fmin()
minimizer.print_matrix()
results = minimizer.values
FCN = 28184.0142876 | TOTAL NCALL = 97 | NCALLS = 97 |
EDM = 2.24683294143e-09 | GOAL EDM = 5e-06 | UP = 0.5 |
Valid | Valid Param | Accurate Covar | PosDef | Made PosDef |
True | True | True | True | False |
Hesse Fail | HasCov | Above EDM | Reach calllim | |
False | True | False | False |
+ | Name | Value | Parab Error | Minos Error- | Minos Error+ | Limit- | Limit+ | FIXED |
1 | mu1 | 2.981566e+00 | 9.903099e-03 | 0.000000e+00 | 0.000000e+00 | |||
2 | mu2 | -1.989012e+00 | 9.903099e-03 | 0.000000e+00 | 0.000000e+00 | |||
3 | sigma | 9.903098e-01 | 4.951551e-03 | 0.000000e+00 | 0.000000e+00 |
+ |
mu1
|
mu2
|
sigma
|
mu1 | 1.00 | 0.00 | 0.00 |
mu2 | 0.00 | 1.00 | 0.00 |
sigma | 0.00 | 0.00 | 1.00 |
draw_compare_hist(gaussian,[results['mu1'],results['sigma']],data1,normed=True);
draw_compare_hist(gaussian,[results['mu2'],results['sigma']],data2,normed=True);