#!/usr/bin/env python
# coding: utf-8

# # Image combination
#
# Image combination serves several purposes. Combining images:
#
# + reduces noise in images
# + can remove transient artifacts like cosmic rays and satellite tracks
# + can remove stars in flat images taken at twilight
#
# It's essential that several of each type of calibration image (bias, dark,
# flat) be taken. Combining them reduces the noise in the combined image to
# roughly $1/\sqrt{N}$ of the noise in a single image, where $N$ is the number
# of images being combined. As shown in the previous notebook, using a single
# calibration image actually *increases* the noise in your image.
#
# There are a few ways to combine images; if done properly, features that show
# up in only one of the images (like cosmic rays) are not present in the
# combination. If done incorrectly, those features show up in your combined
# images and then contaminate your calibrated science images too.
#
# ## The bottom line: combine by averaging images, but clip extreme values
#
# The remainder of this notebook demonstrates this conclusion and explains how
# to do a combination by averaging images with
# [ccdproc](https://ccdproc.readthedocs.io/en/latest/).

# In[ ]:

import os

import numpy as np

get_ipython().run_line_magic('matplotlib', 'inline')
from matplotlib import pyplot as plt
from matplotlib import rc

from astropy.visualization import hist
from astropy.stats import mad_std

# In[ ]:

# Use custom style for larger fonts and figures
plt.style.use('guide.mplstyle')

# In[ ]:

# Set some default parameters for the plots below
rc('font', size=20)
rc('axes', grid=True)

# In[ ]:

# Set up the random number generator, allowing a seed to be set from the environment
seed = os.getenv('GUIDE_RANDOM_SEED', None)

if seed is not None:
    seed = int(seed)

# This is the generator to use for any image component which changes in each
# image, e.g. read noise or Poisson error
noise_rng = np.random.default_rng(seed)

# ## Combination method: average or median?
#
# In this section we'll look at a simplified version of the challenges of
# combining images to reduce noise. It's fair to think of astronomical images
# (especially bias and dark images) as having a Gaussian distribution of pixel
# values centered on the bias level, with a width set by the read noise of the
# detector. To simplify what follows, we will work with arrays of random
# numbers drawn from a Gaussian distribution instead of with astronomical
# images.
#
# In properly done flat images the noise is technically a Poisson distribution,
# but with a large enough number of counts, the distribution is
# indistinguishable from a Gaussian distribution whose width is related to the
# square root of the number of counts. While some regions of a science image
# are dominated by Poisson noise from sources in the image, most of the image
# will be dominated by Gaussian read noise from the detector or Poisson noise
# from the sky background.
#
# Instead of working with a combination of images, we'll create 100 Gaussian
# distributions with a mean of zero and a standard deviation of one, and
# combine them in two different ways: by finding the average and by finding
# the median. Each distribution has size $320^2$, so that we can view it
# either as a distribution of 102,400 values or as an image that is
# $320 \times 320$.
#
# We can think of each of these 100 distributions as representing an image,
# like a bias or dark. To make the analogy to real images a little more
# direct, a "bias" of 1000 is added to each distribution.
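# Before building the full set of distributions, here is a quick numerical
# check of the $1/\sqrt{N}$ noise reduction claimed above (an illustrative
# aside; the generator `demo_rng` and its seed are arbitrary choices, kept
# separate from `noise_rng` so that the results below are unaffected).

# In[ ]:

# Average 100 sets of unit-variance noise; the noise in the average should
# drop to about 1/sqrt(100) = 0.1 (illustration only)
demo_rng = np.random.default_rng(8675309)
demo = demo_rng.normal(size=(100, 10000))
print('Std dev of one set: {:.3f}'.format(demo[0].std()))
print('Std dev of the average of 100 sets: {:.3f}'.format(demo.mean(axis=0).std()))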
# In[ ]:

n_distributions = 100
bias_level = 1000
n_side = 320
bits = noise_rng.normal(size=(n_distributions, n_side**2)) + bias_level
average = np.average(bits, axis=0)
median = np.median(bits, axis=0)

# Now that we've created the distributions and combined them in two different
# ways, let's take a look at them. The
# [`hist` function from astropy.visualization](https://astropy.readthedocs.io/en/stable/visualization/histogram.html)
# is used below because it can figure out what bin size to use for your data.

# In[ ]:

fig, ax = plt.subplots(1, 2, sharey=True, tight_layout=True, figsize=(20, 10))

hist(bits[0, :], bins='freedman', ax=ax[0]);
ax[0].set_title('One sample distribution')
ax[0].set_xlabel('Pixel value')
ax[0].set_ylabel('Number of pixels')

hist(average, bins='freedman', label='average', alpha=0.5, ax=ax[1]);
hist(median, bins='freedman', label='median', alpha=0.5, ax=ax[1]);
ax[1].set_title('{} distributions combined'.format(n_distributions))
ax[1].set_xlabel('Pixel value')
ax[1].legend()

# Combining by averaging gives a narrower (i.e. less noisy) distribution than
# combining by median, though both substantially reduce the width of the
# distribution. The conclusion so far is that combining by averaging is mildly
# preferable to combining by median. Computationally, the mean is also faster
# to compute than the median.

# ### Image view of these distributions
#
# As suggested above, we could view each of these distributions as an image
# instead of a histogram. One takeaway from the figure below is that in this
# case the difference between mean and median is not apparent.
#
# In all cases, the extreme values of the image display are set to bracket the
# width of the initial distribution.

# In[ ]:

fig, axes = plt.subplots(1, 3, sharey=True, tight_layout=True, figsize=(20, 10))

data_source = [bits[0, :], average, median]
titles = ['One distribution',
          'Average of {n}'.format(n=n_distributions),
          'Median of {n}'.format(n=n_distributions)]

for axis, data, title in zip(axes, data_source, titles):
    axis.imshow(data.reshape(n_side, n_side),
                vmin=bias_level - 3, vmax=bias_level + 3)
    axis.set_xticks([])
    axis.set_yticks([])
    axis.grid(False)
    axis.set_title(title)

# ## The effect of outliers
#
# Suppose that, in just one of the 100 distributions we're combining, there
# are a small number of extreme values. In astronomical images these extremes
# happen very frequently because of cosmic ray hits on the detector that
# cause, in one small patch of a calibration image, much higher counts.
# Another case occurs when combining twilight flats, which often contain faint
# images of stars.
#
# In the example below, we set just 50 points out of the 102,400 in the first
# distribution to a somewhat higher value than the rest.

# In[ ]:

bits[0, 10000:10050] = 2 * bias_level

# Remember, we can think of the values in this distribution as an image, a
# view that will be particularly convenient in this case.

# In[ ]:

plt.imshow(bits[0, :].reshape(n_side, n_side),
           vmin=bias_level - 3, vmax=bias_level + 3)
plt.xticks([])
plt.yticks([])
plt.title('One distribution with outliers')
plt.grid(False)

# Now that we know what the outliers in this (and *only* this) distribution
# look like, we'll combine all of the distributions as we did above.
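# Before combining, it is worth estimating how large the effect on the
# average should be (a back-of-the-envelope aside, not part of the
# combination itself): each outlier replaces a value near `bias_level` with
# `2 * bias_level` in one of the 100 images, so the mean of an affected pixel
# shifts upward by about `bias_level / n_distributions` counts, while the
# noise in the mean is only $1/\sqrt{N}$ counts.

# In[ ]:

# Rough prediction of the artifact size in the average (illustration only)
predicted_shift = (2 * bias_level - bias_level) / n_distributions
noise_of_mean = 1 / np.sqrt(n_distributions)
print('Predicted shift of an affected pixel: {:.1f} counts'.format(predicted_shift))
print('Noise in the average: {:.1f} counts'.format(noise_of_mean))
print('Shift relative to the noise: {:.0f}σ'.format(predicted_shift / noise_of_mean))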
# In[ ]:

average = np.average(bits, axis=0)
median = np.median(bits, axis=0)

# Even though only one out of the 100 "images" we're combining has these high
# pixel values, the distribution of pixels for the average is clearly affected
# (well, maybe not clearly, since seeing it requires a logarithmic $y$-axis).
# The distribution for the median looks much the same as above. Since the
# median simply picks the middle value, an extreme value doesn't affect the
# result much.

# In[ ]:

plt.figure(figsize=(10, 10))
hist(average, bins='freedman', alpha=0.5, label='average');
hist(median, bins='freedman', alpha=0.5, label='median');
plt.legend()
plt.xlabel('Counts')
plt.ylabel('Number of pixels')
plt.semilogy();

# ### Combining using the average has a noticeable effect on the result; the median removes the artifact
#
# The effect of the outlier is *much* clearer if the distributions are
# displayed as images. If the distributions we're combining were calibration
# images then the outliers that appear in one image (e.g. a cosmic ray) would
# affect the combined image we hoped to use for calibration.

# In[ ]:

fig, axes = plt.subplots(1, 3, sharey=True, tight_layout=True, figsize=(20, 10))

data_source = [bits[0, :], average, median]
titles = ['One distribution with outliers',
          'Average of {n}'.format(n=n_distributions),
          'Median of {n}'.format(n=n_distributions)]

for axis, data, title in zip(axes, data_source, titles):
    axis.imshow(data.reshape(n_side, n_side),
                vmin=bias_level - 3, vmax=bias_level + 3)
    axis.set_xticks([])
    axis.set_yticks([])
    axis.grid(False)
    axis.set_title(title)

# On one hand, the noise properties are better when you combine by taking the
# average. On the other hand, the median eliminates features that appear in
# only one image.
#
# Astronomical images will almost always have those transient features. Even
# at an observatory near sea level, in an exposure that is very short, cosmic
# ray hits are common.

# ## The solution: average combine, but clip the extreme values
#
# The answer here is to first clip extreme values from the distributions and
# then combine using the average. That rejects outlying values like the median
# does, but with the modestly better statistical properties of the average. A
# method called "sigma clipping" is used to remove the extreme values.

# **Please do not use the code below for reducing your data...**
#
# ...in the next set of notebooks we'll walk through the package
# [ccdproc](https://ccdproc.readthedocs.io), which automates much of what you
# see below. The section below demonstrates and explains some of what's
# happening behind the scenes in [ccdproc](https://ccdproc.readthedocs.io).

# ### Sigma clipping
#
# Sigma clipping means calculating how "far" each pixel to be combined is
# from the "typical" value and excluding values from the combination if they
# are "too far" from the typical value.
#
# To be clear, when evaluating which values to reject we do it for each of
# the 102,400 points in the distribution (or, if you prefer, each of the
# 320$\times$320 pixels in the image) we're going to combine. In other words,
# for each point (or pixel), we'll compute a "typical" value for the 100
# distributions (images) we're combining and exclude any from the average
# that are "too far" from the "typical value."
#
# What should be used as the "typical" value, how do we measure how "far"
# away a value is, and how far is "too far"?
# The last question is easiest to answer: it depends a bit on the noise level
# in your camera, but something like 5 times farther from the "typical" value
# than most of the pixels are.
#
# Using the average as the typical value and the standard deviation as a
# measure of how far a particular value is from the typical value is often
# not the best choice. The problem is that outlying values in a single
# distribution (or image) strongly bias the average and exaggerate the
# standard deviation. In this example, where we're combining 100
# distributions (images), using the average and standard deviation might work
# since there are so many distributions. A more typical number of bias or
# dark images that one might combine is 10 or 20. In that case, an extreme
# value in one image strongly affects the mean and standard deviation.
#
# As an example, consider combining 10, 20, or 100 of our distributions, as
# shown in the cell below. Only in the case of 100 distributions would our
# extreme value of 2000 be excluded if we rejected values more than 5 times
# the standard deviation from the average.

# In[ ]:

print('Number combined\t Average\t Standard dev σ \t 5σ ')
for n_to_combine in [10, 20, n_distributions]:
    avg = np.mean(bits[:n_to_combine, 10000])
    std = np.std(bits[:n_to_combine, 10000])
    print('{n:10d}\t{avg:10.2f}\t{std:10.2f}\t{five_sig:10.2f}'.format(n=n_to_combine,
                                                                       avg=avg,
                                                                       std=std,
                                                                       five_sig=5 * std))

# A better choice is to use the median as the typical value and the *median
# absolute deviation* in place of the standard deviation as the measure of
# how far a value is from the typical value. The
# [median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation),
# or MAD, of a set of points $x$ is defined by:
#
# $$
# \text{MAD} = \text{median}\big( |x_i - \text{median}(x)| \big).
# $$
#
# This is a measure of the typical absolute distance from the median of the
# set of values. The MAD is not directly equivalent to the standard
# deviation. The relationship between the two depends on the distribution of
# values, but for a Gaussian distribution multiplying the MAD by 1.4826 does
# the trick. The
# [astropy function `mad_std`](http://docs.astropy.org/en/stable/api/astropy.stats.mad_std.html)
# will calculate the MAD and multiply by the appropriate factor for you.
#
# Repeating the calculation above, but with the median as the central value
# and the MAD in place of the standard deviation, demonstrates that even for
# 10 distributions the extreme value will be excluded, even with the more
# conservative $10\sigma$ cut we will use for the actual clipping below.

# In[ ]:

print('{:^20}{:^20}{:^20}{:^20}'.format('Number combined', 'Median', 'MAD σ', '10σ'))
for n_to_combine in [10, 20, n_distributions]:
    avg = np.median(bits[:n_to_combine, 10000])
    std = mad_std(bits[:n_to_combine, 10000])
    print('{n:^20d}{avg:^20.2f}{std:^20.2f}{ten_sig:^20.2f}'.format(n=n_to_combine,
                                                                    avg=avg,
                                                                    std=std,
                                                                    ten_sig=10 * std))

# The downside to using the median and median absolute deviation? They can be
# slow to compute for large images or large stacks of images.
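# As a quick check of the 1.4826 factor mentioned above (an illustrative
# aside; the sample size and seed here are arbitrary), we can compare the raw
# MAD, the scaled MAD, and `mad_std` on a pure Gaussian sample. For a
# standard Gaussian the raw MAD is about 0.6745, so scaling by 1.4826
# recovers a standard deviation of about 1.

# In[ ]:

check_rng = np.random.default_rng(42)
gauss = check_rng.normal(size=100_000)
raw_mad = np.median(np.abs(gauss - np.median(gauss)))
print('Raw MAD: {:.4f}'.format(raw_mad))
print('1.4826 * MAD: {:.4f}'.format(1.4826 * raw_mad))
print('mad_std: {:.4f}'.format(mad_std(gauss)))
print('Standard deviation: {:.4f}'.format(np.std(gauss)))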
# The cells below perform the actual clipping; you should generally use the
# astropy function
# [`sigma_clip`](https://astropy.readthedocs.io/en/stable/stats/robust.html)
# to do this, but here we'll do it manually to illustrate the process.
#
# We begin by calculating the MAD standard deviation estimator for our data.

# In[ ]:

mad_sigma = mad_std(bits, axis=0)

# The expression below is true for all of the points more than
# $10 \sigma_{MAD}$ above the median of the distributions and false
# everywhere else. (A complete implementation would also clip points far
# *below* the median; the astropy `sigma_clip` function does.) This array
# will be used to exclude the extreme points.

# In[ ]:

exclude = (bits - median) / mad_sigma > 10

# Next, we calculate the average, excluding the points identified as "too
# far" from the median. There are two approaches we can take here. One is to
# use numpy masked arrays; the other is to temporarily set the excluded
# values to the special value `np.nan` and use a numpy function that excludes
# `nan` from the calculation. The latter approach is often faster than the
# former.
#
# The best approach is really to use a higher-level function from astropy or
# ccdproc. Those will take care of the details of implementing the clipping
# for you.

# In[ ]:

original_values = bits[exclude]
bits[exclude] = np.nan

clip_combine = np.nanmean(bits, axis=0)
bits[exclude] = original_values

# ## Summary
#
# Combine images by (1) excluding extreme values using sigma clipping, with
# the median as the typical value and the MAD estimator of the standard
# deviation, and then (2) averaging the remaining pixels across all of the
# images.
#
# Note that in the plot below the clipped average gives a narrower
# distribution (less noise) than the median, while still excluding the
# extreme value that appeared in one image.

# In[ ]:

plt.figure(figsize=(10, 10))
hist(clip_combine, bins='freedman', alpha=0.5, label='clipped average')
hist(median, bins='freedman', alpha=0.5, label='median');
plt.legend()
plt.xlabel('Counts')
plt.ylabel('Number of pixels')
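# For reference, the same single-pass clip-and-average can be done with the
# astropy function `sigma_clip` mentioned above (a sketch of one way to call
# it, not the recommended workflow; the next notebooks use ccdproc, which
# handles this for real images).

# In[ ]:

from astropy.stats import sigma_clip

# One pass of clipping along the image axis, using the median as the center
# and the MAD-based standard deviation as the width; values outside
# median ± 10σ are masked. Note this also clips points far *below* the
# median, which the manual version above does not, but no such points are
# present in this data.
clipped = sigma_clip(bits, sigma=10, maxiters=1,
                     cenfunc='median', stdfunc='mad_std', axis=0)

# Average the unmasked values; the result should essentially match the
# manual clip_combine computed above
clip_combine_astropy = clipped.mean(axis=0)
print('Largest difference from the manual result:',
      np.max(np.abs(clip_combine_astropy - clip_combine)))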