1-D data with errors

Here we are going to fit a 1-D spectrum with errors, so our input will be three arrays: x values, y values, and errors on the y values.

In a new working directory, download a MAST spectrum of 3C 273 and start IPython

$ ipython --matplotlib

If you have trouble accessing the spectrum you can download it straight away using Python

from astropy.extern.six.moves.urllib import request
url = 'http://python4astronomers.github.com/_downloads/3c273.fits'
open('3c273.fits', 'wb').write(request.urlopen(url).read())

We also need to load in Sherpa

import sherpa.ui as ui
import numpy as np

Loading the data

ui.load_data('3c273.fits')
print(ui.get_data())
name      = 3c273.fits
x         = Float64[1024]
y         = Float64[1024]
staterror = None
syserror  = None

Note

The load_data command may not work in the stand-alone version of Sherpa. If not, you can use astropy.io.fits to load in the data and then load_arrays, for example:

from astropy.io import fits
dat = fits.open('3c273.fits')[1].data
wlen = dat.field('WAVELENGTH')
flux = dat.field('FLUX')
ui.load_arrays(1, wlen, flux)

As the file contains two columns, they are taken to be the x and y data values. The y values are small (of order 10^-14):

ui.get_data().y.mean()
4.3317031058773416e-14
np.ptp(ui.get_data().y)
9.1256637866471944e-14

Note

The numpy ptp routine returns the range of the data, and is short for “peak to peak”.

Re-scaling the data

Whilst Sherpa can deal with small (and large) values, it can be visually easier to deal with values closer to unity, so we re-scale the data values:

d1 = ui.get_data()
d1.y *= 1e14

Note

I am taking advantage of Sherpa’s use of python objects to directly change the y values of the data.

and add in errors (for the sake of this example we assume a 2 percent error on each data point).

d1.staterror == None
True
ui.set_staterror(0.02, fractional=True)
d1.staterror == None
False
ui.plot_data()
../_images/3c273_data_errors.png

Hint

I could have used d1.staterror = 0.02 * d1.y instead of the set_staterror command.

Note

Sherpa supports both statistical and systematic errors. Here we will be dealing with statistical errors only.

Filtering the data

It can be useful to only fit a subset of the data - e.g. to concentrate on a particular feature - and then go back and fit all the data. The Sherpa commands are notice and ignore.

ui.notice(3000, 5700)

From this point the extra data will not be used by Sherpa, whether in a fit or a plot command.

Hint

Try ui.plot_data() and compare the results to the original plot.

Fitting the continuum

We start with a powerlaw model, with a normalization defined at 4000 Angstroms.

ui.set_source(ui.powlaw1d.pow1)
pow1.ref = 4000.0
print(pow1)
powlaw1d.pow1
   Param        Type          Value          Min          Max      Units
   -----        ----          -----          ---          ---      -----
   pow1.gamma   thawed            1          -10           10
   pow1.ref     frozen         4000 -3.40282e+38  3.40282e+38
   pow1.ampl    thawed            1            0  3.40282e+38

Check the statistic:

ui.get_stat()
Chi Squared with Gehrels variance
ui.get_stat_name()
'chi2gehrels'

Since we have explicitly given an error column all the chi-square statistics will give the same result (the Gehrels part of the name is used to indicate how errors are estimated from the data).

ui.fit()
Dataset               = 1
Method                = levmar
Statistic             = chi2gehrels
Initial fit statistic = 1.41325e+06
Final fit statistic   = 20230.3 at function evaluation 16
Data points           = 983
Degrees of freedom    = 981
Probability [Q-value] = 0
Reduced statistic     = 20.6221
Change in statistic   = 1.39302e+06
   pow1.gamma     1.98798
   pow1.ampl      4.42533
ui.plot_fit()
../_images/3c273_fit_powerlaw.png

Viewing the results

results = ui.get_fit_results()
print(results)
datasets       = (1,)
itermethodname = none
methodname     = levmar
statname       = chi2gehrels
succeeded      = True
parnames       = ('pow1.gamma', 'pow1.ampl')
parvals        = (1.9879834342270963, 4.4253291641631725)
statval        = 20230.3241618
istatval       = 1413250.24877
dstatval       = 1393019.92461
numpoints      = 983
dof            = 981
qval           = 0.0
rstat          = 20.6221449152
message        = successful termination
nfev           = 16

or we can use the show_fit command, which pipes information through a pager (typically less or more).

ui.show_fit()

Hint

There are number of show_* commands; try tab completion to find them all.

Adding lines to the fit

I have decided to include 4 gaussians to deal with the strongest lines in the spectrum:

for n in range(1, 5):
    ui.create_model_component("gauss1d", "g{}".format(n))

ui.set_source(pow1 + g1 + g2 + g3 + g4)
ui.get_source()
<BinaryOpModel model instance '((((powlaw1d.pow1 + gauss1d.g1) + gauss1d.g2) + gauss1d.g3) + gauss1d.g4)'>

Note

I could just have included the components in the set_source expression directly: e.g. set_source(pow1 + ui.gauss1d.g1 + ..).

Manual selection for the starting point suggests:

g1.pos = 3250
g2.pos = 5000
g3.pos = 5260
g4.pos = 5600

Note

I could also set the min/max values for these parameters to ensure they remain in a valid range: for example ui.set_par(g1.pos, 3250, min=3000, max=5700).

We also shift the starting value for the FWHM:

for p in [g1, g2, g3, g4]:
    p.fwhm = 50

Note

Since the parameters are just Python objects we can pass them around as we would other objects.

Note

We do not use guess here since it is not designed to work on multi-copmponent data: all the gaussians would be centered at a wavelength of 3240.

ui.fit()
Dataset               = 1
Method                = levmar
Statistic             = chi2gehrels
Initial fit statistic = 19336.7
Final fit statistic   = 4767.96 at function evaluation 196
Data points           = 983
Degrees of freedom    = 969
Probability [Q-value] = 0
Reduced statistic     = 4.92049
Change in statistic   = 14568.7
   pow1.gamma     2.10936
   pow1.ampl      4.34391
   g1.fwhm        40.2425
   g1.pos         3239.92
   g1.ampl        2.81148
   g2.fwhm        68.9131
   g2.pos         5032.03
   g2.ampl        0.677329
   g3.fwhm        129.595
   g3.pos         5280.45
   g3.ampl        0.304465
   g4.fwhm        78.9905
   g4.pos         5634.3
   g4.ampl        1.61164

ui.plot_fit_delchi()
../_images/3c273_fit_lines_delchi.png

Hint

Since we have errors we can now look at the residuals in terms of ‘sigma’.

More gaussians

I want to know if there’s a broad-line component for the 3240 Angstrom line, and I want to show you how to “link” model parameters, so I will assume that the broad-line component has four times the width of the narrow component.

ui.gauss1d.g1broad
<Gauss1D model instance 'gauss1d.g1broad'>
g1broad.pos = g1.pos
g1broad.fwhm = g1.fwhm * 4
ui.set_source(ui.get_source() + g1broad)

Note

You can create model components whenever you want; it need not be within a set_source call. Similarly, source expressions can be treated as a variable.

print(ui.get_source())
(((((powlaw1d.pow1 + gauss1d.g1) + gauss1d.g2) + gauss1d.g3) + gauss1d.g4) + gauss1d.g1broad)
   Param        Type          Value          Min          Max      Units
   -----        ----          -----          ---          ---      -----
   pow1.gamma   thawed      2.10936          -10           10
   pow1.ref     frozen         4000 -3.40282e+38  3.40282e+38
   pow1.ampl    thawed      4.34391            0  3.40282e+38
   g1.fwhm      thawed      40.2425  1.17549e-38  3.40282e+38
   g1.pos       thawed      3239.92 -3.40282e+38  3.40282e+38
   g1.ampl      thawed      2.81148 -3.40282e+38  3.40282e+38
   g2.fwhm      thawed      68.9131  1.17549e-38  3.40282e+38
   g2.pos       thawed      5032.03 -3.40282e+38  3.40282e+38
   g2.ampl      thawed     0.677329 -3.40282e+38  3.40282e+38
   g3.fwhm      thawed      129.595  1.17549e-38  3.40282e+38
   g3.pos       thawed      5280.45 -3.40282e+38  3.40282e+38
   g3.ampl      thawed     0.304465 -3.40282e+38  3.40282e+38
   g4.fwhm      thawed      78.9905  1.17549e-38  3.40282e+38
   g4.pos       thawed       5634.3 -3.40282e+38  3.40282e+38
   g4.ampl      thawed      1.61164 -3.40282e+38  3.40282e+38
   g1broad.fwhm linked       160.97      expr: (g1.fwhm * 4)
   g1broad.pos  linked      3239.92             expr: g1.pos
   g1broad.ampl thawed            1 -3.40282e+38  3.40282e+38

Note

The parameter values indicate when they are linked, and to what, in the output above.

Since I am interested in the first line, and the other lines are unlikely to change the fit significantly, we freeze them:

ui.freeze(g2, g3, g4)

and filter out parts of the data that “look messy” (e.g. the Fe complex).

ui.ignore(3360, 4100)
ui.fit()
Dataset               = 1
Method                = levmar
Statistic             = chi2gehrels
Initial fit statistic = 4802.25
Final fit statistic   = 2307.19 at function evaluation 92
Data points           = 714
Degrees of freedom    = 708
Probability [Q-value] = 2.14817e-168
Reduced statistic     = 3.25874
Change in statistic   = 2495.06
   pow1.gamma     2.01481
   pow1.ampl      4.22548
   g1.fwhm        28.882
   g1.pos         3239.96
   g1.ampl        2.26982
   g1broad.ampl   1.0672

ui.plot_fit_delchi()
../_images/3c273_fit_broadline_ignore.png

Now we add back in the “ugly” part of the spectrum and plot up the contribution from just the power-law component.

ui.notice(3000, 5700)
ui.plot_fit()
ui.plot_model_component(pow1, overplot=True)
../_images/3c273_fit_broadline_component.png

What about errors?

It is no good just being able to fit parameter values, we want to know errors on these values. Since the overall fit is not particularly good (a reduced chi-square of over 3), here I focus on a single line:

ui.notice()
ui.notice(4900, 5150)
ui.plot_fit()

Note

The notice and ignore commands behave differently when no previous filter has been applied to when they are being used to adjust a previously-filtered data set.

Here we fit just the g2 and pow1 components:

ui.freeze(g1, g1broad, g3, g4)
ui.thaw(g2)
ui.fit()
Dataset               = 1
Method                = levmar
Statistic             = chi2gehrels
Initial fit statistic = 45.9588
Final fit statistic   = 38.8045 at function evaluation 37
Data points           = 91
Degrees of freedom    = 86
Probability [Q-value] = 0.999997
Reduced statistic     = 0.451215
Change in statistic   = 7.15427
   pow1.gamma     1.90087
   pow1.ampl      4.0954
   g2.fwhm        73.7743
   g2.pos         5031.71
   g2.ampl        0.69471

ui.plot_model(overplot=True)
# pychips.set_curve(['*.color', 'blue'])
../_images/3c273_g2.png

The reduced chi-square value is significantly less than 1, which suggests that the errors have been over-estimated, but let’s continue with the analysis:

ui.get_fit_results().rstat
0.45121542024712424
ui.conf()
pow1.gamma lower bound:       -0.165746
g2.pos lower bound:   -0.937708
g2.pos upper bound:   0.937708
pow1.gamma upper bound:       0.165746
g2.ampl lower bound:  -0.0189111
g2.ampl upper bound:  0.0189111
pow1.ampl lower bound:        -0.149452
pow1.ampl upper bound:        0.154944
g2.fwhm lower bound:  -2.71108
g2.fwhm upper bound:  2.80876
Dataset               = 1
Confidence Method     = confidence
Iterative Fit Method  = None
Fitting Method        = levmar
Statistic             = chi2gehrels
confidence 1-sigma (68.2689%) bounds:
   Param            Best-Fit  Lower Bound  Upper Bound
   -----            --------  -----------  -----------
   pow1.gamma        1.90087    -0.165746     0.165746
   pow1.ampl          4.0954    -0.149452     0.154944
   g2.fwhm           73.7743     -2.71108      2.80876
   g2.pos            5031.71    -0.937708     0.937708
   g2.ampl           0.69471   -0.0189111    0.0189111

Note

It should hopefully come as no suprise to find out that there is a get_conf_results command that returns the conf results as a Python object.

Hint

The covar command can also be used; for a good search space it should return the same results, but is not as robust for more-complicated situations.

We can look at the search surface for one or two parameters with the int_proj and reg_proj commands:

ui.int_proj(g2.pos)
ui.int_proj(g2.pos, min=5030, max=5033)

Note

int_proj is short for interval projection, and reg_proj stands for region projection. Both commands create a plot showing how the statistic value changes as the parameter(s) vary (by re-fitting all the other thawed parameters).

../_images/3c273_g2_pos.png
ui.reg_proj(g2.fwhm, pow1.gamma)
../_images/3c273_g2_fwhm_gamma.png

Note

The error routines - e.g. conf, int_proj, and reg_proj - will take advantage of multiple cores on your machine. Unfortunately fit does not.