This notebook introduces the data sources that we will be working with during this training school.
You will learn about ten sources of data during this training school and practice accessing data from them. Ten sources sounds like a lot but don't worry, we will also include links to access the data in all the example notebooks. We have also pre-downloaded the data for you in case you have limited internet access and it takes a long time for you to download the data.
Here are the ten data sources grouped by type of data:
EUMETSAT Earth Observation (EO) Portal is where you can access numerous data products from EUMETSAT.
NASA LAADS DAAC archives and distributes data on clouds, water vapor, and aerosols in Earth’s atmosphere as well as key instrument data for NASA, NOAA and European Space Agency (ESA) missions.
Sentinel-5P Pre-Operations Data Hub is a web portal to download data products from the Sentinel-5P TROPOMI instrument.
Tropospheric Emission Monitoring Internet Service (TEMIS) is the source for tropospheric data products produced by ESA. We will use this source to access data from the GOME-2 instrument onboard the MetOp-A/B/C satellites.
Copernicus Climate Data Store (CDS) provides access to climate datasets produced by ECMWF. While there are also model-based data available at CDS, we will use this source to access data products from the IASI instrument onboard the MetOp-A/B/C satellites.
Aerosol Robotic Network (AERONET) is a global network of ground-based measurement stations managed by NASA.
European Aerosol Research Lidar Network (EARLINET) provides ground-based data for aerosol vertical distribution over Europe under the ACTRIS framework.
European Environment Agency (EEA) AirBase provides data from a Europe-wide network of air quality stations.
Copernicus Atmosphere Data Store (ADS) provides access to atmospheric composition datasets produced by ECMWF. We will use this source to access model forecast and reanalysis data from the Copernicus Atmosphere Monitoring Service (CAMS).
WMO Sand and Dust Storm Warning Advisory and Assessment System (SDS-WAS) is an international framework linking institutions involved in Sand and Dust Storm (SDS) research, operations and delivery of services. We will use this source to access regional dust forecasts from the MONARCH model.
Five of these data sources require you to sign up for an account to access the data. We advise you to register before the training school so that you are ready ahead of time.
There may be more than one way to access the data from a source. You could access the data manually or programmatically. Some data sources allow you to do both.
You could go to a web portal to manually select and download data using a map or a dropdown menu for filtering a date and time. There may be a file system that allows you to see a directory of data files and click on individual files to download them. You can also use the web portal to view and browse available data without downloading anything.
Or you could access the data programmatically, using Python code that talks to an Application Programming Interface (API) or a software package such as wget.
An example of a web portal, the Sentinel-5P Pre-Operations Data Hub, is shown below. The left panel contains dropdown menus that allow you to select the product type and processing level, as well as the period of interest. You can draw a bounding box on the map around your area of interest to filter the data geographically.
One example of a file system is on the SDS-WAS Data Download page, also shown in the screenshot below. After clicking on folders to navigate through the directory structure, you will end up inside a folder where you can click on individual data files to download them.
Note that many file systems also allow you to access data programmatically. In the case of the SDS-WAS, the file system is based on a THREDDS Data Server (TDS), which "is a web server that provides metadata and data access for scientific datasets, using OPeNDAP, OGC WMS and WCS, HTTP, and other remote data access protocols." (Source) You can also use the software package wget (introduced in the next section) to access the data programmatically.
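To make the idea of protocol-based access more concrete, here is a minimal sketch of how the address of a file on a THREDDS Data Server is usually put together. It assumes the server uses the default TDS service paths (`fileServer` for plain HTTP download, `dodsC` for OPeNDAP); a given server may be configured differently. The server address and file path are hypothetical placeholders, not real SDS-WAS addresses.

```python
from urllib.parse import quote

# Service paths used by a default THREDDS Data Server setup (assumption:
# the server you are working with has not renamed them).
TDS_SERVICES = {
    "http": "fileServer",  # download the whole file over HTTP
    "opendap": "dodsC",    # open the file remotely and subset it via OPeNDAP
}


def tds_url(server, dataset_path, service="http"):
    """Build the access URL for one data file on a THREDDS Data Server."""
    if service not in TDS_SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    path = quote(dataset_path.strip("/"))  # keeps '/' and escapes unsafe characters
    return f"{server.rstrip('/')}/thredds/{TDS_SERVICES[service]}/{path}"


# Hypothetical server and file path, for illustration only.
SERVER = "https://example.org"
DATASET = "dust/forecast/2021020112_model.nc"

download_url = tds_url(SERVER, DATASET, service="http")
opendap_url = tds_url(SERVER, DATASET, service="opendap")
print(download_url)
print(opendap_url)
```

The HTTP address can be passed to a download tool such as wget, while the OPeNDAP address can typically be opened directly with `xarray.open_dataset(opendap_url)`, provided a backend such as netCDF4 is installed.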
If you need to download larger amounts of data, using a few lines of code can make things much easier. You can specify filters in order to download specific data products over the geographic area or time period you want. Accessing data programmatically via an API or a software package is a common way to do this.
Simply put, an API allows two computer programmes to communicate and transfer data between each other. The screenshot below shows some code allowing you to use an API to request and download data from the Copernicus Atmosphere Data Store (ADS). Both the ADS and Climate Data Store use the same API, called the CDS API.
To see an example of how to download data programmatically from the ADS using an API, click on this link. You will need an API key that is unique to you; it confirms your identity (authenticates you) each time you access the API.
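As a rough sketch of what such an API request can look like, the code below builds a request for a CAMS global forecast and passes it to the `cdsapi` Python package. The dataset name, the variable name and the request keys are assumptions for illustration; the exact keys and allowed values differ between datasets and API versions, so copy them from the download form of the dataset you need. The helper names `build_cams_request` and `download_cams_forecast` are hypothetical.

```python
def build_cams_request(date, variable, leadtime_hours):
    """Return a request dictionary for one CAMS forecast run (illustrative keys)."""
    return {
        "date": f"{date}/{date}",   # start/end date of the forecast run
        "type": "forecast",
        "format": "netcdf",         # assumption: some API versions use 'data_format'
        "variable": variable,
        "time": "00:00",            # forecast initialisation time
        "leadtime_hour": [str(hour) for hour in leadtime_hours],
    }


def download_cams_forecast(request, target):
    """Send the request to the data store and save the result to `target`."""
    import cdsapi  # third-party package; imported here so the sketch loads without it

    # With no arguments, the client reads the URL and your personal API key
    # from the ~/.cdsapirc file or the CDSAPI_URL / CDSAPI_KEY variables.
    client = cdsapi.Client()
    client.retrieve("cams-global-atmospheric-composition-forecasts", request, target)
    return target


request = build_cams_request(
    date="2021-02-01",
    variable="dust_aerosol_optical_depth_550nm",
    leadtime_hours=range(0, 13, 3),
)
print(request)
# download_cams_forecast(request, "cams_dust_forecast.nc")  # needs an account and API key
```

Keeping the request in its own function makes it easy to loop over several dates or variables and to check the request before anything is downloaded.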
Another way to programmatically download data is using a software package such as wget. This software package helps you retrieve files using common Internet protocols. For an example of using wget to download data from AERONET, click on this link. You could use wget to access data from many different sources as it is a general software package.
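The same idea can be sketched in pure Python. Note that this example swaps wget for the standard-library `urllib`, so it runs without installing anything. The endpoint and parameter names follow the AERONET web service as we understand it and should be treated as assumptions to check against the AERONET documentation; the site name `Granada` is only an example, and `aeronet_url` and `download` are hypothetical helper names.

```python
from datetime import date
from urllib.parse import urlencode
from urllib.request import urlretrieve

# Assumed address of the AERONET version 3 web service.
AERONET_ENDPOINT = "https://aeronet.gsfc.nasa.gov/cgi-bin/print_web_data_v3"


def aeronet_url(site, start, end, product="AOD15", daily_average=True):
    """Build a request URL for one AERONET station and date range."""
    params = {
        "site": site,
        "year": start.year, "month": start.month, "day": start.day,
        "year2": end.year, "month2": end.month, "day2": end.day,
        product: 1,                          # e.g. AOD15 = Level 1.5 aerosol optical depth
        "AVG": 20 if daily_average else 10,  # 20 = daily averages, 10 = all points
        "if_no_html": 1,                     # ask for plain text instead of HTML
    }
    return f"{AERONET_ENDPOINT}?{urlencode(params)}"


def download(url, target):
    """Save the response of `url` to the local file `target`."""
    urlretrieve(url, target)
    return target


url = aeronet_url("Granada", date(2021, 2, 1), date(2021, 2, 7))
print(url)
# download(url, "aeronet_granada.txt")  # uncomment to download (needs internet access)
```

The printed URL can also be passed to wget on the command line, which is useful if you prefer to script your downloads outside of Python.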
There are also some software packages that specialise in accessing data from just one source. For example, the Python package airbase helps you download data from the EEA air quality database. To see how to use airbase to download data, click on this link.
For downloading small amounts of data over a specific area and short period of time, it is often easier to download data manually via a web portal. For longer-term studies and creating time-series from large amounts of data, using a programmatic approach is usually more efficient.
If you use data you downloaded to produce research, figures, maps, articles etc., you should always state the source of the data and check whether there are license terms you need to follow. Data sources often have a recommended way to cite or acknowledge their data. Usually either the data source or an associated research paper is cited. The suggested citations or acknowledgements for each data source are given below.
Read the EUMETSAT Data Policy and the EUMETSAT data licensing page. You have to quote EUMETSAT as the data source.
From the Legal notice on the use of Copernicus Sentinel Data and Service Information:
Where the user communicates to the public or distributes Copernicus Sentinel Data and
Service Information, he/she shall inform the recipients of the source of that Data and
Information by using the following notice:
(1) 'Copernicus Sentinel data [Year]' for Sentinel data; and/or
(2) 'Copernicus Service information [Year]' for Copernicus Service Information.
Where the Copernicus Sentinel Data and Service Information have been adapted or
modified, the user shall provide the following notice:
(1) 'Contains modified Copernicus Sentinel data [Year]' for Sentinel data;
and/or
(2) 'Contains modified Copernicus Service information [Year]' for Copernicus
Service Information.
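If you add this notice to many figures, a small helper can keep the wording consistent. The sketch below only formats the two Sentinel data notices quoted above; the function name `sentinel_notice` is hypothetical, and you should still check the legal notice itself for the wording that applies to your case.

```python
def sentinel_notice(year, modified=False):
    """Return the attribution notice for Copernicus Sentinel data of a given year."""
    if modified:
        # Wording for data that have been adapted or modified.
        return f"Contains modified Copernicus Sentinel data {year}"
    return f"Copernicus Sentinel data {year}"


print(sentinel_notice(2021))
print(sentinel_notice(2021, modified=True))
```

The returned text can then be placed in a figure caption or annotation.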
From the License to use Copernicus Products:
5.1. All users of Copernicus Products must provide clear and visible attribution to the Copernicus programme. The Licensee will communicate to the public the source of the Copernicus Products by crediting the Copernicus Climate Change and Atmosphere Monitoring Services:
5.1.1. Where the Licensee communicates or distributes Copernicus Products to the public, the Licensee shall inform the recipients of the source by using the following or any similar notice:
• 'Generated using Copernicus Climate Change Service information [Year]' and/or
• 'Generated using Copernicus Atmosphere Monitoring Service information [Year]'.
5.1.2. Where the Licensee makes or contributes to a publication or distribution containing adapted or modified Copernicus Products, the Licensee shall provide the following or any similar notice:
• 'Contains modified Copernicus Climate Change Service information [Year]'; and/or
• 'Contains modified Copernicus Atmosphere Monitoring Service information [Year]'
5.1.3. Any such publication or distribution covered by clauses 5.1.1 and 5.1.2 shall state that neither the European Commission nor ECMWF is responsible for any use that may be made of the Copernicus information or data it contains.
Read the EARLINET data policy.
European Commission [YEAR]
From the License to use Copernicus Products:
5.1. All users of Copernicus Products must provide clear and visible attribution to the Copernicus programme. The Licensee will communicate to the public the source of the Copernicus Products by crediting the Copernicus Climate Change and Atmosphere Monitoring Services:
5.1.1. Where the Licensee communicates or distributes Copernicus Products to the public, the Licensee shall inform the recipients of the source by using the following or any similar notice:
• 'Generated using Copernicus Climate Change Service information [Year]' and/or
• 'Generated using Copernicus Atmosphere Monitoring Service information [Year]'.
5.1.2. Where the Licensee makes or contributes to a publication or distribution containing adapted or modified Copernicus Products, the Licensee shall provide the following or any similar notice:
• 'Contains modified Copernicus Climate Change Service information [Year]'; and/or
• 'Contains modified Copernicus Atmosphere Monitoring Service information [Year]'
5.1.3. Any such publication or distribution covered by clauses 5.1.1 and 5.1.2 shall state that neither the European Commission nor ECMWF is responsible for any use that may be made of the Copernicus information or data it contains.
Read the WMO SDS-WAS Data Policy.
This project is licensed under GNU General Public License v3.0 only and is developed under a Copernicus contract.