Notebook

Fit a model to observations¶

In the Quickstart example notebook we saw a quick introduction to forward modeling the upper atmosphere and He triplet signal of HD 209458 b. In this notebook we will go over an advanced-level tutorial on retrieving the properties of the upper atmosphere of HAT-P-11 b using p-winds models.

In [ ]:

import numpy as np
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.pylab as pylab
import astropy.constants as c
import astropy.units as u
from scipy.optimize import minimize
from astropy.io import fits
from astropy.convolution import convolve
from p_winds import tools, parker, hydrogen, helium, transit, microphysics

# Uncomment the next line if you have a MacBook with retina screen
# %config InlineBackend.figure_format = 'retina'
pylab.rcParams['figure.figsize'] = 9.0,6.5
pylab.rcParams['font.size'] = 18

Let's start with the observation of the He triplet transmission spectrum of HAT-P-11 b using the CARMENES spectrograph. This data is openly available in the DACE platform. But we will retrieve it from a public Gist for convenience.

In [ ]:

# The observed transmission spectrum
data_url = 'https://gist.githubusercontent.com/ladsantos/a8433928e384819a3632adc469bed803/raw/a584e6e83073d1ad3444248624927838588f22e4/HAT-P-11_b_He.dat'
# We skip 2 rows instead of 1 to have an odd number of rows and allow a fast convolution later
He_spec = np.loadtxt(data_url, skiprows=2)
wl_obs = He_spec[:, 0]  # Angstrom
f_obs = 1 - He_spec[:, 1] * 0.01  # Normalized flux
u_obs = He_spec[:, 2] * 0.01  # Flux uncertainty

# Convert in-vacuum wavelengths to in-air
s = 1E4 / np.mean(wl_obs)
n = 1 + 0.0000834254 + 0.02406147 / (130 - s ** 2) + 0.00015998 / (38.9 - s ** 2)
wl_obs /= n

# We will also need to know the instrumental profile that
# widens spectral lines. We take the width from Allart et al. (2018),
# the paper describing the HAT-P-11 b data.
def gaussian(x, mu=0.0, sigma=1.0):
    return 1 / sigma / (2 * np.pi) ** 0.5 * np.exp(-0.5 * (x - mu) ** 2 / sigma ** 2)

instrumental_profile_width_v = 3.7  # Instrumental profile FWHM in km / s (assumed Gaussian)
sigma_wl = instrumental_profile_width_v / \
    c.c.to(u.km / u.s).value * np.mean(wl_obs)  # Same unit as wl_obs
instrumental_profile = gaussian(wl_obs, np.mean(wl_obs), sigma=sigma_wl)

plt.errorbar(wl_obs, f_obs, yerr=u_obs)
plt.xlabel(r'Wavelength (${\rm \AA}$)')
plt.ylabel('Normalized flux')
plt.show()

Now we set up the simulation. This is quite a dense cell of configurations, but you should be familiar with all of it if you followed the quickstart example.

In [ ]:

# Set up the simulation

# Fixed parameters of HAT-P-11 b (not to be sampled)
R_pl = 0.389  # Planetary radius (Jupiter radii)
M_pl = 0.09  # Planetary mass (Jupiter masses)
a_pl = 0.05254  # Semi-major axis (au)
planet_to_star_ratio = 0.057989
baseline = planet_to_star_ratio ** 2
impact_parameter = 0.132
h_he = 0.9  # H/He fraction (assumed for now, but can be a free parameter)
mean_f_ion = 0.90  # Initially assumed, but the model relaxes for it

# Physical constants
m_h = c.m_p.to(u.g).value  # Hydrogen atom mass in g
m_He = 4 * 1.67262192369e-27  # Helium atomic mass in kg
k_B = 1.380649e-23  # Boltzmann's constant in kg / (m / s) ** 2 / K

# Free parameters (to be sampled with MCMC)
# The reason why we set m_dot and T in log is because
# we will fit for them in log space
log_m_dot_0 = np.log10(7E10)  # Planetary mass loss rate (g / s)
log_T_0 = np.log10(9000)  # Atmospheric temperature (K)
v_wind_0 = -2E3  # Line-of-sight wind velocity (m / s)

# Altitudes samples (this can be a very important setting)
r = np.logspace(0, np.log10(20), 100)

# First guesses of fractions (not to be fit, but necessary for the calculation)
initial_f_ion = 0.0  # Fraction of ionized hydrogen
initial_f_he = np.array([1.0, 0.0])  # Fraction of singlet, triplet helium

# Model settings
relax_solution = True  # This will iteratively relax the solutions until convergence
sample_phases = np.array([-0.5, -0.25, 0.0, 0.25, 0.5])  # Phases that we will average to obtain the final spectrum
w0, w1, w2, f0, f1, f2, a_ij = microphysics.he_3_properties()
w_array = np.array([w0, w1, w2])  # Central wavelengths of the triplet
f_array = np.array([f0, f1, f2])  # Oscillator strengths of the triplet
a_array = np.array([a_ij, a_ij, a_ij])  # This is the same for all lines in then triplet
n_samples = len(sample_phases)
transit_grid_size = 121  # Also very important to constrain computation time
atol = 1E-8  # Absolute numerical tolerance for solve_ivp solver
rtol = 1E-8  # Relative numerical tolerance for solve_ivp solver

The full spectrum of HAT-P-11 until 2600 Å is not known. But we can use a proxy for which we do have a full spectrum: HD 40307. It has a similar size, spectral type, effective temperature, and surface gravity as HAT-P-11. We take the spectrum from the MUSCLES database. The original file is fairly large, so for convenience we also retrieve the relevant section of the spectrum from a public Gist (already scaled to correspond to the irradiation at the planet HAT-P-11 b).

In [ ]:

data_url = 'https://gist.githubusercontent.com/ladsantos/c7d1aae1ecc755bae9f1c8ef1545cf8d/raw/cb444d9b4ff9853672dab80a4aab583975557449/HAT-P-11_spec.dat'
spec = np.loadtxt(data_url, skiprows=1)
host_spectrum = {'wavelength': spec[:, 0], 'flux_lambda': spec[:, 1],
                 'wavelength_unit': u.angstrom,
                 'flux_unit': u.erg / u.s / u.cm ** 2 / u.angstrom}

plt.loglog(host_spectrum['wavelength'], host_spectrum['flux_lambda'])
plt.xlabel(r'Wavelength (${\rm \AA}$)')
plt.ylabel(r'Flux density (erg s$^{-1}$ cm$^{-2}$ ${\rm \AA}^{-1}$)')
plt.show()

Before we start fitting the observed data to models, we have to do a few sanity checks and assess if all the moving parts of p-winds will work well for the configuration you set in the cell above. Most numerical issues are caused when using the scipy.integrate routines.

We start by assessing if the atmospheric model behaves well.

In [ ]:

# Calculate the model
def atmospheric_model(theta):
    log_m_dot, log_T = theta
    m_dot = 10 ** log_m_dot
    T = 10 ** log_T
    
    vs = parker.sound_speed(T, h_he, mean_f_ion)  # Speed of sound (km/s, assumed to be constant)
    rs = parker.radius_sonic_point(M_pl, vs)  # Radius at the sonic point (jupiterRad)
    rhos = parker.density_sonic_point(m_dot, rs, vs)  # Density at the sonic point (g/cm^3)
    r_array = r * R_pl / rs  # altitudes in unit of radius at the sonic point
    v_array, rho_array = parker.structure(r_array)  # velocities and densities in units at the sonic point
    
    f_r = hydrogen.ion_fraction(r, R_pl, T, h_he,
                                m_dot, M_pl, mean_f_ion,
                                spectrum_at_planet=host_spectrum,
                                initial_f_ion=initial_f_ion, relax_solution=relax_solution)

    f_he_1, f_he_3 = helium.population_fraction(r, v_array, rho_array, f_r,
        R_pl, T, h_he, vs, rs, rhos, host_spectrum,
        initial_state=initial_f_he, atol=atol, rtol=rtol, relax_solution=relax_solution,
                                                solver='Radau')

    # Number density of helium nuclei
    n_he = (rho_array * rhos * (1 - h_he) / (1 + 4 * (1 - h_he)) / m_h)
    
    # Number density distribution of helium
    n_he_1 = f_he_1 * n_he
    n_he_3 = f_he_3 * n_he
    n_he_ion = (1 - f_he_1 - f_he_3) * n_he
    
    return n_he_1, n_he_3, n_he_ion

# Let's test if the model function is working
theta = (log_m_dot_0, log_T_0)
y0 = (initial_f_ion, initial_f_he)
n_he_1, n_he_3, n_he_ion = atmospheric_model(theta)

plt.semilogy(r, n_he_1, color='C0', label='He singlet')
plt.semilogy(r, n_he_3, color='C1', label='He triplet')
plt.semilogy(r, n_he_ion, color='C2', label='He ionized')
plt.xlabel(r'Radius (R$_\mathrm{pl}$)')
plt.ylabel('Number density (cm$^{-3}$)')
plt.xlim(1, 10)
plt.ylim(1E-2, 1E10)
plt.legend()
plt.show()

Seems to be working fine. Now we do a sanity check for the radiative transfer. There is not a lot of things that can break here, but we do it anyway.

In [ ]:

# The transmission spectrum model
def transmission_model(wavelength_array, v_wind, n_he_3_distribution, log_T):

    # Set up the transit configuration
    R_pl_physical = R_pl * 71492000  # Planet radius in m
    r_SI = r * R_pl_physical  # Array of altitudes in m
    n_he_3_SI = n_he_3_distribution * 1E6  # Volumetric densities in 1 / m ** 3

    # Set up the simplified ray tracing.
    f_maps = []
    d_maps = []
    for i in range(n_samples):
        flux_map, density_map = transit.draw_transit(
            planet_to_star_ratio,
            impact_parameter=impact_parameter,
            phase=sample_phases[i],
            density_profile=n_he_3_SI,
            profile_radius=r_SI,
            planet_physical_radius=R_pl_physical,
            grid_size=transit_grid_size
                                   )
        f_maps.append(flux_map)
        d_maps.append(density_map)
    
    # Do the radiative transfer
    spectra = []
    v_turb = (5 / 3 * k_B * (10 ** log_T) / m_He) ** 0.5
    for i in range(n_samples):
        spec = transit.radiative_transfer(f_maps[i], d_maps[i],
                                          wavelength_array, w_array, f_array, a_array, 
                                          10 ** log_T, m_He, v_wind, turbulence_speed=v_turb)
        spectra.append(spec)

    spectra = np.array(spectra)
    spectrum = np.mean(spectra, axis=0)
    return spectrum

# Here we divide wl_obs by 1E10 to convert angstrom to m
t_spectrum = transmission_model(wl_obs / 1E10, v_wind_0, n_he_3, log_T_0)
plt.errorbar(wl_obs, f_obs, yerr=u_obs)
plt.plot(wl_obs, t_spectrum + planet_to_star_ratio ** 2, color='k', lw=2)
plt.xlabel(r'Wavelength (${\rm \AA}$)')
plt.ylabel('Normalized flux')
plt.show()

Alright, it seems that our first guess does not fit very well. But we shall soon fix this issue. For now, let's write a cascading model that combines both the atmosphere and the radiative transfer. This function will also convolve our predicted spectrum with the instrumental profile.

In [ ]:

def cascading_model(theta, wavelength_array):
    log_m_dot, log_T, v_wind = theta
    n_he_1, n_he_3, n_he_ion = atmospheric_model((log_m_dot, log_T))
    t_spec = transmission_model(wavelength_array, v_wind, n_he_3, log_T) + baseline
    t_spec_conv = convolve(t_spec, instrumental_profile, boundary='extend')
    return t_spec

theta0 = (log_m_dot_0, log_T_0, v_wind_0)

t_spec = cascading_model(theta0, wl_obs / 1E10)

plt.errorbar(wl_obs, f_obs, yerr=u_obs)
plt.plot(wl_obs, t_spec, color='k', lw=2)
plt.xlabel(r'Wavelength (${\rm \AA}$)')
plt.ylabel('Normalized flux')
plt.show()

Great, it seems that the cascading model is also working well. We will fit it to the observations using a maximum likelihood estimation. The log-likelihood is defined as:

$$ \ln{p(y | x, \sigma, \log{\dot{m}}, \log{T}, v_{\rm wind})} = -\frac{1}{2} \sum_n \left[ \frac{\left(y_n - y_{\rm model}\right)^2}{\sigma^2} + \ln{\left(2\pi \sigma^2 \right)} \right] $$

We do one sneaky trick in the calculation of log-likelihood here to avoid some numerical issues. The problem is that scipy.integrate.solve_ivp(), which calculates the steady-state ionization of He, for some reason, can ocassionally become numerically unstable in some very specific cases and lose precision, yielding a ValueError. These solutions are of no use to us, but we do not want them to stop our optimization. So we discard them by making the log-likelihood function return -np.inf in those cases.

In [ ]:

def log_likelihood(theta, x, y, yerr):
    try:
        model = cascading_model(theta, x)
        sigma2 = yerr ** 2 + model ** 2
        return -0.5 * np.sum((y - model) ** 2 / sigma2 + np.log(sigma2))
    except ValueError:
        return -np.inf

With all that set, we use scipy.optimize.minimize() to maximize the likelihood of our solution and find the best fit. This calculation takes a few minutes to run on a computer with a 3.1 GHz CPU, so I commented the line that actually does this calculation as to not use the resources of online platforms that compile this notebook and upset the powers that be. But you should try running it in your own computer.

You will likely run into some warnings, but the result should be robust. The actual computation time depends on how bad the first guess was, so you will probably save some time if you do a first fit by eye and than optimize it. You can also try changing the method option of minimize().

In [ ]:

nll = lambda *args: -log_likelihood(*args)
args = (wl_obs / 1E10, f_obs, u_obs)
# soln = minimize(nll, theta0, args=args, method='Nelder-Mead')
# m_ml, b_ml, log_f_ml = soln.x

When I started from a very good guess (m_dot = 7E10, T_0 = 12000, v_wind_0 = -2000.0), minimize() converges to a best fit solution of $\dot{m} = 5.7 \times 10^{10}$ g s$^{-1}$, $T = 11235$ K, and $v_{\rm wind} = -2.0$ km s$^{-1}$ in about 5 minutes in a 3.1 GHz CPU.