Calculation of the VCI

In this example, the Vegetation Condition Index (VCI) is calculated from the Enhanced Vegetation Index (EVI) derived from satellite remote sensing data. A MODIS vegetation product (MOD13Q1) with a temporal resolution of 16 days and a spatial resolution of 250 m is used (https://lpdaac.usgs.gov/products/mod13q1v006/).

Note: To execute a cell, click on it and then press Shift + Enter. Alternatively, to run the entire notebook, click on Cell > Run All (make sure all the configuration options are set correctly).

Download and install Anaconda 3

To run this Jupyter Notebook you need Python installed on your computer. We recommend downloading and installing Anaconda 3 (https://www.anaconda.com/download/) with Python 3.6, as it includes many useful packages.

Download MODIS data as Geotiffs from AppEEARS

Access the AppEEARS website and create a free USGS user account in case you do not already have one. Click on Extract > Area Sample and start a new request.

  1. Enter a request name.
  2. Define your region of interest by specifying an ESRI shapefile (zip) or drawing a polygon. Country shapefiles can be downloaded here.
    Note: The boundaries and names of the shapefiles do not imply official endorsement or acceptance by the United Nations.
  3. Define the time period for your data: January 2000 to today
  4. Select the product: MOD13Q1 and the layers of interest: EVI and pixel_reliability
  5. Define the output format as Geotiff with geographic projection
  6. Click on Submit

Click on Explore to check the status of the request. When the status says "Done" you can click on the Download symbol. Select all ordered datasets and start downloading the data.

After downloading, store the .tif files in two folders: one folder for the evi_data and one for the pixel_reliability. It is recommended to use an external hard drive (or other mass storage media) to store your input and output data if possible.

Preparing the working environment

There are two possible approaches:

  1. Install the required packages directly
  2. Use the provided environment file to set up a new environment

1. Installing the required packages directly

The required package versions are:

  • python 3.6 or 3.7
  • gdal >= 3.0.1
  • numpy >= 1.15.0

The gdal package must be installed with BigTIFF support, and numpy may need to be updated. For these reasons it is recommended to install from the conda-forge package repository as follows:

  1. Open the Anaconda Prompt by searching for it in the Windows start menu.
  2. Type: conda install -c conda-forge --no-pin gdal numpy and hit Enter
  3. Type y and hit Enter if you are asked if you want to proceed.

Note: --no-pin is added to the install command so that maximum package versions are not restricted, which avoids having certain packages downgraded when installing gdal. If gdal and/or numpy do not meet the version requirements, modify the install command to:

conda install -c conda-forge --no-pin gdal=3.0.1 numpy=1.15.0

2. Set up a new environment

All the relevant documentation can be found here.

Carry out the following steps to set up the environment:

  1. Download the environment file found here (right click and choose Save As).
  2. Open the Anaconda Prompt by searching for it in the Windows start menu.
  3. Navigate to where you have saved the environment file: cd C:\my\path
  4. Type: conda env create -f environment.yml and hit Enter

The unspider environment should now be created. Make sure it is activated before running Jupyter, or before running the script if you save it as a file (see the general documentation for how to do this).
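For example, to activate the environment and then start Jupyter from the Anaconda Prompt:

conda activate unspider
jupyter notebook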

Nicer progress bars

During execution, basic progress updates will be printed. However, if you would like nicer and more informative progress bars (not just printed updates), you will need to install the tqdm package. To do so:

  1. Open the Anaconda Prompt by searching for it in the Windows start menu.
  2. Type: conda install tqdm and hit Enter
  3. Type y and hit Enter if you are asked if you want to proceed.

When tqdm is used, the top bar always shows the overall progress, and the bottom bar shows the progress of the current sub-process (such as generating the output images).

Note: If you used the environment file to set up a new environment, then tqdm will already be installed in that environment.

Note 2: The latest version of tqdm has a bug which leads to the progress bar description field not being printed fully when used in a Jupyter notebook. This issue has been reported and should be fixed in an upcoming release. The tqdm progress bars work as intended when the code is executed from a terminal.

Note 3: If you have tqdm installed but would prefer not to use it, modify the tqdm handling in the 'Import the required packages' section as follows:

"""
# If tqdm is installed it will be used to display a nice progress bar
try:
    from tqdm.autonotebook import tqdm
    import sys

    # Overload built-in enumerate function
    def enumerate(iterator, desc=None):
        return builtins.enumerate(tqdm(iterator, desc=desc, file=sys.stdout,
                                       leave=False))
except ImportError:
"""
tqdm = None

# Overload built-in enumerate function
def enumerate(iterator, desc=None):
    if desc is not None:
        print(desc)
    return builtins.enumerate(iterator)

Download this notebook and run it using Jupyter Notebook

Download this Jupyter Notebook here (right click and choose Save As). Start Jupyter Notebook by searching for it in the Windows start menu or by typing jupyter notebook in the Anaconda Prompt and hitting Enter. After Jupyter Notebook opens in your web browser, navigate to the script on your computer and open it.

Every cell has to be executed individually. A cell that is still processing is marked with an *, and a finished cell with a number. Cells are executed by clicking on the cell and hitting Shift + Enter. Alternatively, to run the entire notebook sequentially (i.e. one cell at a time), click on Cell > Run All (make sure all the configuration options are set correctly).

Optional: As an alternative to running everything in Jupyter, open this notebook in Jupyter and do:

File > Download as > Python (.py)

Then run the script either in Spyder (an IDE that comes bundled with Anaconda) or in a terminal. This may yield better performance.
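For example, from the Anaconda Prompt (assuming the exported script was saved as vci_calculation.py, a file name chosen here purely for illustration):

cd C:\my\path
python vci_calculation.py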

Import the required packages

In [1]:
import os
import re
import gc
import builtins
import warnings

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors

from osgeo import gdal
gdal.UseExceptions()


# If tqdm is installed it will be used to display a nice progress bar
try:
    from tqdm.autonotebook import tqdm
    import sys

    # Overload built-in enumerate function
    def enumerate(iterator, desc=None):
        return builtins.enumerate(tqdm(iterator, desc=desc, file=sys.stdout,
                                       leave=False))
except ImportError:
    tqdm = None

    # Overload built-in enumerate function
    def enumerate(iterator, desc=None):
        if desc is not None:
            print(desc)
        return builtins.enumerate(iterator)
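
Note: The overloaded enumerate repurposes the built-in's second positional argument (normally start) as a progress description. A minimal usage sketch (illustrative only):

for i, item in enumerate(["a", "b", "c"], desc="Demo"):
    print(i, item)

With tqdm installed, "Demo" is attached to a temporary progress bar; without it, "Demo" is simply printed once before the loop output.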

Set data paths

In the following cell, the input data and output image paths are set.

In [4]:
# Specify the name of your study area; this will be used for naming the
# output maps
study_area = "Ghana"


#
# Input data paths:
#
# Note: raw strings (r"...") are used so that backslashes in Windows paths
# are not interpreted as escape sequences.
#
# Specify the folder path where the EVI geotiffs are stored on your
# computer or external hard drive.
path_evi = r"D:\Ghana\evi_data"

# Specify the folder path where the pixel reliability geotiffs are stored on
# your computer or external hard drive.
path_pr = r"D:\Ghana\pixel_reliability"


#
# Output data paths:
#
# Create and specify the folder where the output png-files should be stored
path_png = r"D:\Ghana\VCI_maps_Ghana_png"

# Create and specify the folder where the output tif-files should be stored
path_tif = r"D:\Ghana\VCI_maps_Ghana_tif"
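
# Optional sanity check (a minimal sketch, not part of the original script):
# fail early with a clear message if an input folder does not exist
for p in (path_evi, path_pr):
    if not os.path.isdir(p):
        raise FileNotFoundError("Input folder not found: {}".format(p))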

Function definitions

In the following cell various functions are defined.

In [7]:
def get_doy(re_doy, string):
    """
    Searches the input string for a DOY, and if one is found it returns a tuple
    containing the DOY; else it returns None.

    :param re_doy: compiled re pattern used to match a DOY
    :param string: input string that will be searched for a DOY
    :return: if a DOY is found a tuple of the form
             (<int> year, <str> day of year), else None
    """
    search_doy = re_doy.search(string)
    if search_doy is None:  # Case where no doy was found
        return None
    doy = search_doy.group(1)
    year = int(doy[:4])  # Treat year values as integers
    day = doy[4:]  # Day values need to remain strings to be zero padded
    return year, day
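
# Usage sketch: with re_doy compiled as in the processing cell further below,
#     get_doy(re_doy, "MOD13Q1.006__250m_16_days_EVI_doy2001001_aid0001.tif")
# returns (2001, '001'), while a filename without a valid DOY returns None.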


def check_prepare_files(evi_files, pr_files):
    """
    Checks that for each evi file there is a corresponding pixel reliability
    file. Then a dictionary is returned containing the sorted file lists for
    each DOY.

    :param evi_files: list of evi files to be processed, containing tuples of
                      the form (<str> file_path, <int> year, <str> day)
    :param pr_files: list of pixel reliability files to be processed,
                     containing tuples of the form
                     (<str> file_path, <int> year, <str> day)
    :return: dictionary where keys are the available DOYs and values are
             dictionaries containing the sorted evi_files and sorted pr_files
             for that DOY.
    """
    # Dictionary keys have properties similar enough to a mathematical set for
    # our purposes
    doy_dict = dict()

    for path, year, day in evi_files:
        try:
            doy_dict[day]['evi'].append((path, year, day))
        except KeyError:
            doy_dict[day] = dict()
            doy_dict[day]['evi'] = [(path, year, day)]

    for path, year, day in pr_files:
        try:
            doy_dict[day]['pr'].append((path, year, day))
        except KeyError:
            try:
                doy_dict[day]['pr'] = [(path, year, day)]
            except KeyError:
                doy_dict[day] = dict()
                doy_dict[day]['pr'] = [(path, year, day)]

    if len(evi_files) != len(pr_files):
        mismatch_doys = list()
        for day in doy_dict.keys():
            try:
                if len(doy_dict[day]['evi']) != len(doy_dict[day]['pr']):
                    mismatch_doys.append(day)
            except KeyError:
                mismatch_doys.append(day)
        mismatch_files = dict()
        for day in mismatch_doys:
            if any(i not in doy_dict[day]['pr'] for i in doy_dict[day]['evi']):
                evi_years = [i[1] for i in doy_dict[day]['evi']]
                pr_years = [i[1] for i in doy_dict[day]['pr']]
                mismatch_evi = [i for i in evi_years if i not in pr_years]
                mismatch_pr = [i for i in pr_years if i not in evi_years]

                # If the lists are empty future list comprehensions will fail
                # to correctly exclude all files
                if len(mismatch_evi) == 0:
                    mismatch_evi.append(-1)
                if len(mismatch_pr) == 0:
                    mismatch_pr.append(-1)

                mismatch_files[day] = dict()
                mismatch_files[day]['evi'] = [i[0] for i in doy_dict[day]['evi']
                                              if i[1] in mismatch_evi]

                mismatch_files[day]['pr'] = [i[0] for i in doy_dict[day]['pr']
                                             if i[1] in mismatch_pr]
        war_msg = "There is a mismatch in the number of evi data files and "
        war_msg += "pixel reliability files.\n"
        war_msg += "The the mismatched files for each doy are:\n"
        war_msg += str(mismatch_files)
        war_msg += "\n"
        war_msg += "These files will be ignored."
        warnings.warn(war_msg)

        for day, v in mismatch_files.items():
            for k_sub, v_sub in v.items():
                doy_dict[day][k_sub] = [i for i in doy_dict[day][k_sub]
                                        if i[0] not in v_sub]

    # Sort the file lists for each DOY by year; .get() ensures both keys
    # exist even if one file type is missing entirely for a DOY
    for day in doy_dict.keys():
        doy_dict[day]['evi'] = sorted(doy_dict[day].get('evi', []),
                                      key=lambda x: x[1])
        doy_dict[day]['pr'] = sorted(doy_dict[day].get('pr', []),
                                     key=lambda x: x[1])

    return doy_dict
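
# Example of the returned structure (hypothetical file paths):
#     {'001': {'evi': [('D:/evi/EVI_doy2000001.tif', 2000, '001'),
#                      ('D:/evi/EVI_doy2001001.tif', 2001, '001')],
#              'pr':  [('D:/pr/PR_doy2000001.tif', 2000, '001'),
#                      ('D:/pr/PR_doy2001001.tif', 2001, '001')]},
#      '017': {...}}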


def load_cloud_mask(files_list, cloud_mask):
    """
    Loads all pixel reliability images into the cloud mask array.

    :param files_list: list of files to be processed, containing tuples of
                       the form (<str> file_path, <int> year, <str> day)
    :param cloud_mask: matrix that will hold the loaded pixel reliability data
    :return: cloud_mask array filled with the pixel reliability data
    """
    for i, image in enumerate(files_list, "Loading cloud_mask"):
        # Open the file from the pixel reliability file list
        dataset = gdal.Open(image[0])
        band = dataset.GetRasterBand(1)
        data = band.ReadAsArray()

        # Write the data from each file to the array
        cloud_mask[:, :, i] = data

    return cloud_mask


def prepare_cloud_mask(cloud_mask):
    """
    Filters the pixel reliability array.

    :param cloud_mask: pixel reliability array
    :return: filtered pixel reliability matrix (1 where usable, NaN elsewhere)
    """
    # Exchange value 0 with value 1 (Pixels with value 0 and 1 are used)
    cloud_mask[cloud_mask == 0] = 1

    # Set values that are of no interest to us to NaN
    cloud_mask[cloud_mask != 1] = np.nan

    # These are additional filters you may want to use
    # Set value 2 to NA (Snow and Ice)
    # cloud_mask[cloud_mask == 2] = np.nan

    # Set value 3 to NA (Cloud)
    # cloud_mask[cloud_mask == 3] = np.nan

    # Set no data value (= -1) to NA
    # cloud_mask[cloud_mask == -1] = np.nan

    # Set all values above 3 to NA
    # cloud_mask[cloud_mask > 3] = np.nan

    return cloud_mask
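
# For reference, the MOD13 pixel reliability rank key is:
#     -1 = Fill/No Data, 0 = Good Data, 1 = Marginal Data,
#      2 = Snow/Ice, 3 = Cloudy
# which is why only values 0 and 1 are kept by prepare_cloud_mask above.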


def load_evi(files_list, cloud_mask):
    """
    Loads a single geotiff into the evi matrix.

    :param files_list: list of files to be processed, containing a tuples of
                       the form (<str> file_path, <int> year, <str> day)
    :param cloud_mask: matrix containing the processed pixel reliability data
    :return: evi matrix with an additional geotiff of data
    """
    for i, image in enumerate(files_list, "Loading EVI"):
        # Open the file from the evi file list
        dataset = gdal.Open(image[0])
        band = dataset.GetRasterBand(1)
        data = band.ReadAsArray()

        # Apply the cloud mask and write the data from each file to the array
        # Note: the evi data is multiplied into the cloud_mask matrix in order
        # to save RAM and speed up the computation
        cloud_mask[:, :, i] *= data

    return cloud_mask


def prepare_evi(evi):
    """
    Filters the evi to set all negative values to nan and rescales the data.

    :param evi: numpy array containing the results of multiplying the
                cloud_mask data with the evi data
    :return: filtered and rescaled input numpy array
    """
    # Set negative values to nan
    evi[evi < 0] = np.nan

    # Rescale the data using the MOD13 scale factor of 0.0001
    evi *= 0.0001

    return evi


def calculate_vci(evi):
    """
    Calculates the VCI.

    The VCI is calculated using the following formula:
    VCI = ((EVI - min(EVI)) / (max(EVI) - min(EVI))) * 100

    Most of the operations are done in place, i.e. the results overwrite the
    input array instead of being written to newly allocated copies. This is
    much faster than allocating large amounts of memory and doing copy-based
    operations.

    When it comes to numpy arrays, augmented assignments like:
    a += b
    c /= d
    mean that the operations are performed in place.

    Whereas operations like:
    a = a + b
    c = c / d
    first allocate new arrays to hold the results of the addition and
    division. This leads to additional memory and computational overhead.

    Note: np.fmin.reduce and np.fmax.reduce are used to find the per-pixel
          minimum and maximum because, unlike np.minimum/np.maximum, they
          ignore NaN values (which mark cloudy or unreliable pixels).

    :param evi: numpy array containing the geotiff processing results
    :return: evi array which has had its values replaced in-place with the VCI
             values
    """
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')
        # Allocate memory where intermediate results will be saved
        shape = (evi.shape[0], evi.shape[1], 1)
        min_ = np.zeros(shape, dtype=np.float64)
        max_ = np.zeros(shape, dtype=np.float64)

        # Find the minimum values
        np.fmin.reduce(evi, axis=2, out=min_, keepdims=True)

        # Find the maximum values
        np.fmax.reduce(evi, axis=2, out=max_, keepdims=True)

        # Calculate the VCI
        # Note: the vci is saved into the evi matrix in order to save RAM
        vci = evi
        vci -= min_
        max_ -= min_
        vci /= max_
        vci *= 100

        return vci
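
# Worked example (illustrative numbers): for a pixel whose EVI record for a
# given DOY has min = 0.2 and max = 0.6, a current EVI of 0.4 gives
#     VCI = ((0.4 - 0.2) / (0.6 - 0.2)) * 100 = 50
# i.e. the vegetation condition lies exactly halfway between the historic
# extremes.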


def generate_png(file_name, data, extent):
    """
    Generate an output png image representing the VCI for the input map area.

    :param file_name: <str> file name to be used when saving the png image
    :param data: numpy view onto the vci for a single year
    :param extent: tuple containing data for how to modify the output graphic
                   in order for it to scale correctly
    :return:
    """
    # Define the size of the figure (in inches)
    fig, ax = plt.subplots(figsize=(5, 5))
    plt.title(file_name)
    cmap = colors.ListedColormap(['#8B0000', '#FF4500', '#FFFF00', '#9ACD32',
                                  '#008000'])
    cax = ax.imshow(data, cmap=cmap, extent=extent)
    fig.colorbar(cax, cmap=cmap)

    plt.savefig(os.path.join(path_png, file_name + ".png"), dpi=100)
    plt.close()
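
# Optional variation (not part of the original script): by default imshow
# stretches the five colours over the data's own value range. To pin them to
# fixed VCI classes (0-20, 20-40, 40-60, 60-80, 80-100) instead, a
# BoundaryNorm could be passed to imshow inside generate_png, e.g.:
#     norm = colors.BoundaryNorm([0, 20, 40, 60, 80, 100], cmap.N)
#     cax = ax.imshow(data, cmap=cmap, norm=norm, extent=extent)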


def generate_geotiff(file_name, data, geo_transform, projection):
    """
    Generate an output geotiff image representing the VCI for the input map
    area.

    :param file_name: <str> file name to be used when saving the tif image
    :param data: numpy view onto the vci for a single year
    :param geo_transform: geo transform data to be used in saving the tif image
    :param projection: projection data to be used in saving the tif image
    :return:
    """
    # Set geotiff output path
    geotiff_path = os.path.join(path_tif, file_name + ".tif")

    # Read columns from data array
    cols = data.shape[1]
    # Read rows from data array
    rows = data.shape[0]

    # Set the driver to Geotiff
    driver = gdal.GetDriverByName('GTiff')
    # Create raster with shape of array as float64
    out_raster = driver.Create(geotiff_path, cols, rows, 1, gdal.GDT_Float64)
    # Read geo information from input file
    out_raster.SetGeoTransform(geo_transform)
    # Read band
    out_band = out_raster.GetRasterBand(1)
    # Set no data value to numpy's nan
    out_band.SetNoDataValue(np.nan)
    out_band.WriteArray(data)
    # Set the projection according to the input file projection
    out_raster.SetProjection(projection)
    out_band.FlushCache()


def generate_images(files_list, vci, extent, geo_transform, projection):
    """
    Generates all the output png and tif images.

    :param files_list: list of files to be processed, containing tuples of
                       the form (<str> file_path, <int> year, <str> day)
    :param vci: numpy array containing the vci
    :param extent: tuple describing the geographic extent of the map, used to
                   scale the png output
    :param geo_transform: geo transform data to be used in saving the tif image
    :param projection: projection data to be used in saving the tif image
    :return:
    """
    day = files_list[0][2]
    years = [i[1] for i in files_list]
    for i, year in enumerate(years, "Generating images"):
        file_name = "VCI_{}_{}_{}".format(study_area, day, year)
        array = vci[:, :, i]
        # Generating pngs for every time step
        # Use pre-defined function to write pngs
        generate_png(file_name, array, extent)

        # Generating geotiffs for every time step
        # Use pre-defined function to write geotiffs
        generate_geotiff(file_name, array, geo_transform, projection)
        gc.collect()  # Force garbage collection to save RAM

Within the following cell the MODIS files are loaded per DOY (day of the year), the cloud mask from the pixel reliability dataset is applied, the EVI data is rescaled, the VCI is calculated, and the VCI is saved as a lower-resolution PNG image and as a full-scale Geotiff for each DOY and year of available data.

Note: The script only works for files whose names contain 'doy' followed by 7 digits indicating the year and day of the year (yyyyddd), since the DOY information is extracted from the file name. This corresponds to AppEEARS products, e.g. MOD13Q1.006__250m_16_days_EVI_doy2001001_aid0001.tif

Option: If you want to use a different product (also geotiffs), you need to adapt re_doy to find the corresponding DOY sequences in that product's filenames, as in the example below.
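For example, if a product's filenames contained the 7-digit year/day sequence between underscores instead of after 'doy' (a hypothetical naming scheme), the pattern could be changed to:

re_doy = re.compile(r".*_(\d{7})_.*\.tif$")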

In [14]:
# The following regular expression will match 'doy' followed by 7 digits,
# followed by 0 or more characters and ending in '.tif'
# It is used for building the initial file list of tif images that have a valid
# DOY structure
re_doy = re.compile(r".*doy(\d{7}).*\.tif$")

# Build initial file list of tuples containing the filename, year and day for
# each file
# e.g. [(<str> '/path/to/my_evi_doy2000193.tif', <int> 2000, <str> '193')]
# Note: the data types are stated in '<>'
evi_files = []

# Create a list of files, which include the defined DOY in their filename
# for the EVI data
for _, _, files in os.walk(path_evi):
    for file in files:
        doy = get_doy(re_doy, file)
        if doy is None:
            continue
        else:
            evi_files.append((os.path.join(path_evi, file), doy[0], doy[1]))

# Create a list of files, which include the defined DOY in their filename
# for the pixel reliability data
pr_files = []

for _, _, files in os.walk(path_pr):
    for file in files:
        doy = get_doy(re_doy, file)
        if doy is None:
            continue
        else:
            pr_files.append((os.path.join(path_pr, file), doy[0], doy[1]))

# Read an example file and define the shape of the data arrays
# Get the first file of the file list as example file
example_file = gdal.Open(evi_files[0][0])

# Store necessary data from the example file
geo_transform = example_file.GetGeoTransform()
projection = example_file.GetProjection()
x_size = example_file.RasterXSize
y_size = example_file.RasterYSize

del example_file  # Save RAM

# Preparing map reshaping
lon_start = geo_transform[0]
lon_stop = (x_size * geo_transform[1]) + geo_transform[0]
lon_step = geo_transform[1]
lat_start = geo_transform[3]
lat_stop = (y_size * geo_transform[5]) + geo_transform[3]
lat_step = geo_transform[5]

extent = (lon_start, lon_stop, lat_stop, lat_start)

# First make sure that image output directories exist, and if not then create
# them
# Check for png folder
if not os.path.exists(path_png):
    os.makedirs(path_png)

# Check for tif folder
if not os.path.exists(path_tif):
    os.makedirs(path_tif)

# Build a dictionary where keys are the available DOYs and values are
# dictionaries holding the sorted evi and pr file lists
doy_dict = check_prepare_files(evi_files, pr_files)

# The script creates some warnings due to the NA in the data.
# They are ignored by executing this cell.
warnings.filterwarnings('ignore')

# If tqdm is available, use it.
# Note: both update_iter_desc and days_iterator are set in either case, which
# avoids having many if clauses throughout the program and reduces code
# duplication.
if tqdm is not None:
    def update_iter_desc(days_iterator, desc):
        days_iterator.set_description(desc)
    days_iterator = tqdm(sorted(doy_dict), desc="Processing",
                         file=sys.stdout)
else:
    def update_iter_desc(days_iterator, desc):
        print(desc)
    days_iterator = sorted(doy_dict)

# Begin iterating through days of the year for which we have data.
for day in days_iterator:
    # Provide a progress update
    update_iter_desc(days_iterator, "Processing DOY {}".format(day))

    # Get the necessary files lists from doy_dict
    evi_files_day = doy_dict[day]['evi']
    pr_files_day = doy_dict[day]['pr']

    # Number of years for which we have data
    num_years = len(evi_files_day)

    # Create a zero-filled 3D numpy array based on the example file
    # dimensions
    array_size = (y_size, x_size, num_years)
    try:
        # Adjust the size of the cloud mask array in place in order to save
        # RAM and speed up processing
        cloud_mask.resize(array_size)
    except NameError:
        # If this is the first iteration, cloud_mask will be undefined and
        # trying to resize it will throw a NameError. This error is caught
        # and cloud_mask is instantiated
        cloud_mask = np.zeros(array_size, dtype=np.float64)

    # Reading the reliability data
    cloud_mask = load_cloud_mask(pr_files_day, cloud_mask)

    # Preparing the reliability data
    cloud_mask = prepare_cloud_mask(cloud_mask)

    # Reading the EVI data
    evi = load_evi(evi_files_day, cloud_mask)

    # Preparing the EVI data
    evi = prepare_evi(evi)

    update_iter_desc(days_iterator, "Calculating VCI DOY {}".format(day))
    # Calculate VCI
    vci = calculate_vci(evi)

    update_iter_desc(days_iterator, "Generating images DOY {}".format(day))
    # Generating png images and geotiffs
    generate_images(evi_files_day, vci, extent, geo_transform, projection)

    # Remove references to avoid array resize errors in future loops
    del evi, vci

Note: Using a large area of interest can lead to problems due to a lack of memory (RAM). You can use the Task Manager to keep an eye on the memory usage while running this script. If you encounter problems, consider using a smaller area of interest (shapefile or polygon in AppEEARS) or a satellite product with a coarser spatial resolution (500 m (MOD13A1) or 1 km (MOD13A2)).
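A rough estimate of the memory footprint can be made in advance, since the processing array is float64 (8 bytes per element) with shape (y_size, x_size, num_years). A minimal sketch, using the variables from the processing cell (num_years is the number of years of data available for one DOY):

# Approximate size in GB of the processing array held in memory for one DOY
ram_gb = y_size * x_size * num_years * 8 / 1024**3
print("Approximate RAM needed per DOY: {:.1f} GB".format(ram_gb))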

This is an example of the output PNG images. The geotiffs can be read by programs like QGIS and used to produce maps.