#!/usr/bin/env python # coding: utf-8 # # Using Geographic Information Systems (GIS) in QGIS[](http://www.qgis.org/) and Python # ## Geographic Information Systems (GIS) # GIS refers to methods of storing, displaying and analyzing geogaphical information. These methods have become essential in economic analysis (as you have noticed from the reading list for our Ph.D. course on economic growth). For this reason, it is good that you acquaint yourself with these methods. They will prove very useful when doing research, especially to show the spatial distribution of your variables of interest, contructing new measures, or doing spatial analysis. # # Software # ## Commercial # # There are various GIS specialized programs and packages. [ESRI](www.esri.com) produces ArcGIS, which is the most known and commonly used commercial software. It is very easy to use to produce maps and do simple computations. Most universities (including ours) offer it in their computer labs. The main disadvantages are that it requires a computer running Windows, it is costly, and extremely slow for computations. Of course this is changing, e.g., now you can use it online. # # Nonetheless, I always suggest you use and learn open-source alternatives. # ## Open Source # # There are many open source GIS projects, many of which are supported/gathered at GRASS. The most known and widely used seem to be GRASS and QGIS, although Python is increasingly becoming the language of GIS. QGIS can be programmed in Python and many web-based GIS services and databases are programmed in Python. Furthermore, there are many new Python packages which allow you to import, export, analyze, visualize, etc. GIS data. THis notebook will serve as a basic introduction, which I hope to develop further with time. Many of the solutions provded here have been implemented by myself or taken from sources on the web (regretfully I did not keep track from where I took some ideas or snipplets of code, so I cannot provide due credit). # # Installation # # QGIS # Install [QGIS](https://www.qgis.org) using their installers. On OSX you may need to follow the instructions below. # # Anaconda (Suggested) # If you followed the steps for installing the Continuum Anaconda Python Distribution and for creating the ``GeoPythonXenv`` ([presented here](https://econgrowth.github.io/pages/Computation.html)), then you are basically set for working with GIS in Python + R. # # Non Anaconda installation (Discouraged - old and may not work anymore) # Here I will give you the basic idea of what you need to install to have a working GIS environment. I assume you have already installed [Canopy](https://www.enthought.com/products/canopy/academic/) with all the pakages provided by Enthought. Additionally, you will need to install [GRASS](http://grass.osgeo.org), [QGIS](https://www.qgis.org), and [GDAL/OGR](http://www.gdal.org/). # ### Windows # # Download the installers in each of these websites and you should be done! # ### Mac OSX # There are various methods of getting these on your computer. # # 1. Try the installers on each website # 2. Use the installers provided by [Kyngchaos](http://www.kyngchaos.com/) # 3. Use [HomeBrew](http://brew.sh/) # # I used to install using the installers provided by [Kyngchaos](http://www.kyngchaos.com/). But I have moved to using [HomeBrew](http://brew.sh/), which allows you install many other GNU projects. To do so, open a terminal window (I recommend getting [iTerm2](http://www.iterm2.com/), which is more powerful than the one provided by OSX) and run the following code (I think you will need to have Xcode and its command-line utilities installed) # # ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)" # brew doctor # # This should install HomeBrew on your system and let you know of any issues. Once you have done so and have a working homebrew installation, you will be able to install packages and programs using the ``brew install`` command. Before doing so, you should run the following commands # # brew update # brew tap homebrew/science # # This will update HomeBrew's formulas to the latest version. Now issue the command # # brew cask install qgis # # You should be done and have teh latest vanilla version of ``QGIS`` in your Applications folder. # ### Not a vanilla fan? # # Ok. sometimes you want ``QGIS`` with all the additional blows and whistles, e.g. integration with ``GRASS``, etc. In that case, you need a bit more work. Luckily there are awesome people who are trying to keep this streamlined and working for all of us. Head to [osgeo4mac](https://github.com/OSGeo/homebrew-osgeo4mac) and follow their instructions. # # ### Note on GDAL + Canopy # It seems Canopy already has GDAL incorporated, at least so it seems on Mac OS X. But there is a bug that might prevent it from working, unless you add the following line on your ``.profile`` # # export GDAL_DATA=~/Library/Enthought/Canopy_64bit/User/share/gdal # # Data # Ok...now that we have a working GIS desktop system, let us talk a little about types of data. All GIS data includes elements with their properties and their geographical information like location, as determined by e.g. latitude and longitude, address, zip code, etc. So, for example an element might be a country, its properties might be GDP per capita, Gini coefficient, etc. and its location. Another example might be a restaurant with its menu and prices, services (dine-in, take-out, delivery), area of service and location. # ## Formats # # GIS comes in very different formats, althogh most of them can be categorizewd into two types **Rasters** and **Vector** formats. # # 1. **Rasters** are basically matrices $R=[R_{ij}]$ of size $m\times n$ that hold numerical information, where each cell $R_{ij}$ shows the value of interest at a specific location $ij$. Where this location $ij$ is, is determined by the actual format and the projection of the data (this will be clearer below). # 2. **Vectors** represent a geographical element geometrically and include additional information on it. An element can be represented as one of the following geometries: # 1. A point or set of unconnected points # 2. A line or set of (un)connected lines # 3. A Polygon # 4. A set of one or various of the previous types # ## Examples of Raster formats: # # 1. **GeoTiff**: Multipurpose raster format. # 2. **ESRI Grid**: Main ESRI raster format. # 3. **Digital Elevation Model (DEM)**: Used by the US Geological Survey to record elevation data. # 4. **Band Interleaved by Line, Band Interleaved by Pixel, Band Sequential (BIL, BIP, BSQ)**: These data formats are typically used by remote sensing systems. # 5. **Digital Raster Graphic (DRG)**: This format is used to store digital scans of paper maps. # ## Examples of Vector formats: # # 1. **Shapefile**: An open specification, developed by ESRI, for storing and exchanging GIS data. A Shapefile actually consists of a collection of files all with the same base name, for example hawaii.shp, hawaii.shx, hawaii.dbf, and so on. # 2. **Simple Features**: An OpenGIS standard for storing geographical data (points, lines, polygons) along with associated attributes. # 3. **TIGER/Line**: A text-based format previously used by the U.S. Census Bureau to describe geographic features such as roads, buildings, rivers, and coastlines. More recent data comes in the Shapefile format, so the TIGER/Line format is only used for earlier Census Bureau datasets. # 4. **Coverage**: Proprietary data format used by ESRI's ARC/INFO system. # ## Other formats: # # 1. **Well-known Text (WKT)**: text-based format for representing a single geographic feature such as a polygon or line. # 2. **Well-known Binary (WKB)**: similar to WKT uses binary data rather than text to represent a single geographic feature. # 3. **GeoJSON**: An open format for encoding geographic data structures, based on the JSON data interchange format. This is becoming the internet's prefered format. # # Sources of Data # There are many places to find data. Some useful links to start are: # # * [WorldMap](http://worldmap.harvard.edu/) # * [FAO's GeoNetwork](http://www.fao.org/geonetwork) # * [IPUMS USA Boundary files for Censuses](https://usa.ipums.org/usa/volii/boundaries.shtml) # * [IPUMS International Boundary files for Censuses](https://international.ipums.org/international/gis.shtml) # * [GADM database of Global Administrative Areas](http://www.gadm.org/) # * [Global Administrative Unit Layers](http://www.fao.org/geonetwork/srv/en/metadata.show?id=12691) # * [Natural Earth](http://www.naturalearthdata.com/): Various vector and raster files with all kinds of geographical, cultural and socioeconomic variables # * [Global Map](http://www.iscgm.org/cgi-bin/fswiki/wiki.cgi?page=Summary) # * [Digital Chart of the World](http://worldmap.harvard.edu/data/geonode:Digital_Chart_of_the_World) # * [Sage](http://www.sage.wisc.edu/mapsdatamodels.html) and [Sage Atlas](http://www.sage.wisc.edu/atlas/maps.php) # * [Ramankutti's Datasets on land use, crops, etc.](http://www.geog.mcgill.ca/~nramankutty/Datasets/Datasets.html) # * [SEDAC at Columbia University](http://sedac.ciesin.columbia.edu/data/sets/browse): Gridded Population, Hazzards, etc. # * [World Port Index](http://msi.nga.mil/NGAPortal/MSI.portal?_nfpb=true&_pageLabel=msi_portal_page_62&pubCode=0015) # * [USGS elevation maps](http://eros.usgs.gov/elevation-products) # * [Caloric Suitability Index CSI](https://ozak.github.io/Caloric-Suitability-Index/): Agricultural suitability data # * [Peace Research Institute Oslo (PRIO) Conflict Data](https://www.prio.org/Data/) # * [The Armed Conflict Location & Event Data Project (ACLED)](https://acleddata.com/curated-data-files/) # * [Global Terrorism Database (START-GTD)](https://www.start.umd.edu/gtd/) # * [NOOA's Global Land One-km Base Elevation Project (GLOBE)](http://www.ngdc.noaa.gov/mgg/topo/globe.html) # * [NOOA Nightlight data](http://ngdc.noaa.gov/eog/download.html): This is the data used by Henderson, Storeygard, and Weil AER 2012 paper. # * [Other NOOA Data](http://www.ngdc.noaa.gov/ngdcinfo/onlineaccess.html) # * [GEcon](http://gecon.yale.edu/) # * [OpenStreetMap](http://openstreetmap.org) # * [U.S. Census TIGER](http://www.census.gov/geo/maps-data/data/tiger.html) # * [Geo-referencing of Ethnic Groups](http://www.icr.ethz.ch/data/other/greg) # # See also [Wikipedia links](http://en.wikipedia.org/wiki/List_of_GIS_data_sources) # # Creating maps and exploring data # Let us start by creating some simple maps. For this, create a directory called ``mydata`` in your ``$HOME`` directory and download the following datasets and extract them: # # 1. The full [GADM](https://gadm.org/) shape file is available for download [here](https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_shp.zip), since it is large, also download the file for your country of origin from [here](https://gadm.org/download_country_v3.html). # 2. [Natural Earth's Populated Places dataset](http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_populated_places.zip) # 3. [Ramankutty's Suitability Index](https://nelson.wisc.edu/sage/data-and-models/atlas/data.php?incdataset=Suitability%20for%20Agriculture) # 4. [Lights Data for 2012](http://ngdc.noaa.gov/eog/data/web_data/v4composites/F182012.v4.tar) # Now, open QGIS. You should see something like this # # QGIS open # To open a file in QGIS you can: # # * Use ``Finder`` or ``Windows Explorer`` to navigate to the file and ``double-click`` on it. # * Use the ``Layer`` menu in QGIS # # QGIS Layer Menu # * Use the toolbar on the left # # QGIS File Toolbar # ## Working with vector layers # Let's start by adding the vector file for your country of origin using the ``add vector layer`` button in the toolbar. # # QGIS add vector layer # I have added the full GADM dataset, so it looks like this # # GADM layer in QGIS # Now we can use various tools available within QGIS to learn about the data, select the data, analyze the data, etc. # # 1. The handtool for panning: QGIS-handtool # # * Try it out, use the tool to move your country around the screen. # 2. Zoom in/out/to selection/full: QGIS-Zoom # # * Try it out, by zooming around your city of birth. # # 3. Identify Features: QGIS-identify-features # # * Try it out, click on the feature you just zoomed into. Did you zoom into the correct feature? # # 4. Select Features: QGIS-select-features # # * Try it out, select the feature of the region where you were born. # Besides the geometrical information that we observe, most Shape files have other information relating to each feature. This information is contained in the *Attribute Table*, which can be accessed with # # * the ``Open Attribute Table`` button QGIS-attribute-table # * By ``right-clicking`` on the name of the layer in the left panel # # QGIS-attribute-table2 # Doing so shows you the additional information contained in the shape file. For the GADM file it includes # # * the identification number of the feature ``OBJECTID`` # * the identification number of the country (Administrative Level 0) to which the feature belongs ``ID_0`` # * the ISO-3 code of the country to which the feature belongs ``ISO`` # * the name of the country to which the feature belongs ``NAME_0`` # * the identification number of the region in the country (Administrative Level 1) to which the feature belongs ``ID_1`` # # etc. # # QGIS-attribute-table3 # We can use this information to color our maps, or to select features for further analysis. I will use the ``select feature using an expression`` button ![QGIS-select-feature-attributes-button](../images/pics/GIS/QGIS-select-feature-attributes-button.png) to select all features which have the ISO-3 code for Kenya. Once you click on the ![QGIS-select-feature-attributes-button](../images/pics/GIS/QGIS-select-feature-attributes-button.png) button, you will see the following window, where you can write the expression to be used to select features. In this case I wrote the expression # # 'ISO'='KEN' # # to select all features with Kenya's ISO-3 code. # QGIS-select-features-by-ISO # Sometimes we will want to work with a subset of features, especially if the data we are working has many features, like the GADM data. To select a subset of the features and only work with those, use the ``Layer->Query`` option in the ``Menu`` as shown below # # QGIS-Query # This opens a window where you can write expression as the previous one, to select features by their attributes. These are ``SQL`` expressions and have to conform to ``SQL``'s grammar (we will not go into this at this point). Once I execute the same expression as above # # 'ISO'='KEN' # # QGIS leaves only the features in Kenya for analysis. # # QGIS-Query-KEN # Let us zoom into these features by ``right-clicking`` on the name of the layer and choosing ``Zoom to Layer Extent``. This shows the extent of Kenya alone. # # QGIS-zoomed-KEN # Other important tools can be found in ``Layer -> Properties`` or by ``double-clicking`` on the layer's name # # # ### Exercise # # 1. Load the complete GADM dataset. # 2. Select the regions of your country of origin, so that you will work only on these. # 3. Save the result to a new layer/shapefile called ``your_country.shp`` # ## Vector Menu # The ``Vector`` menu has many tools that can be applied to vector layers. Let's use some of these to create new layers. # # Using the ``Vector -> Geometry Tools -> Singlepart to Mutliparts`` we can generate a new layer and shape file where features are aggregated according to some characteristic. Let us use this tool to aggregate administrative level 2 features to the administrative level 0 (Country) level. Thus, each feature will be a country (administrative level 0) instead of the current administrative level 2. # # QGIS-SinglePart-MultiPart # Using the ``Vector -> Geometry Tools -> Mutliparts to singlepart`` we can generate a new layer and shape file where features are disaggregated. # # QGIS-MultiPart-SinglePart # ### Exercises # 1. Load the Populated Places data. # 2. Select the cities that belong to your country and create a new shape file with the selection named ``your_country_places.shp``. # 3. Load the ``your_country.shp`` file you had created in the previous exercise and compute the centroid for each Adminitrative level 1 in your country and export the layer to the file ``your_country_centroids1.shp``. # 4. Search for your place of birth among the most populated places. Where you born in a populated place? If not, identify the closest most populated place. To do so, use [Google](https://www.google.com) or [Wikipedia](http://www.wikipedia.org/) to find the latitude and longitude of your place of birth. Using your mouse and the ``coordinate`` window at the bottom # # QGIS-Coordinates # # to search for your location of birth. # # Better yet, using the ``Plugins->Manage and Install Plugins`` menu, install the ``ZoomToCoordinates`` plugin # # QGIS-ZoomToCoordinatesPlugin # and use it to find your place of birth as shown below QGIS-ZoomToCoordinates # 5. Compute the distance between your place of birth and both the closest centroid and the closest most populated place. If needed find a Plugin that will do the job for you. I recommend installing among many ``realcentroid``, ``mmqgis``, ``coordinate capture``. # 6. Generate the distance matrix among the most populated places # 7. Generate the distance matrix between most populated places and the centroids # # #### Note # # _Make sure you can correctly identify the features. For example using ISO codes, ID numbers, etc. If the shape file does not have an ID identifier, it is best to create one, so that you can correctly identify the features. To do so, use the ``Field Calculator`` by double-clicking on the name of the layer, then choosing ``Fields``. After that you need to select the pencil icon to enable editing mode._ # ## Working with Rasters # Let us load the Ramankutty raster using the ``add raster layer`` button in the toolbar. # # QGIS add raster layer # ``Double-click`` on the name of the raster in order to change the coloring scheme. # # QGIS-raster-color-scheme # So it looks something like this # # QGIS-raster-with-color # Load your ``your_country.shp`` and ``your_country_places.shp`` on top of the Ramankutty data. It might look something like this. # # QGIS-Raster-KEN # Since we do not want to work with all the Ramankutty data, but only with the data for our country, let us ``clip`` the part of the Ramankutty data that belongs to our country. To do this, use the ``Raster->Extraction->Clipper`` tool. # # QGIS-Clipper # In the window that comes up, choose a name for the clipped raster ``suit_your_country`` and choose the ``save as GeoTiff`` option, so that your file is saved as in GeoTiff format. Then click on the ``Mask layer`` button, make sure ``your_country`` layer is chosen as the mask. If you want choose a different ``No data value``. Then click ``ok``. # # QGIS-clip-KEN # This will clip the Ramankutty data to the extent of your country of origin. Notice that in the big text box there is a command written, something like # # gdalwarp -q -cutline "gadm2.shp|layerid=0|subset=\"ISO\" = 'KEN'" -crop_to_cutline -of GTiff "suit/w001001.adf" GitHub/CompEcon/notebooks/QGIS/suit-KEN.tif # # This is a the command QGIS uses to create the clip. QGIS is actually calling GDAL to perform this operation. This command line will be very useful when you are planning to use Python or other scripting languages to perform an operation many times. You can do it by hand once and copy the command executed by QGIS and use it to create an iterable version...more on this later. # # ### Note # You might have to assign a projection to the Suitability Raster. To do so, use the ``Raster->Projection->Assign Projection`` option in the menu. # The output should look like this # # QGIS-KEN-Raster # Notice that given the large size of the cells in the raster, the clipping tool creates a lot of measurement error. It might be better to decrease the size of cells an then clip, so that the clipping is less erroneous. Let's try setting the cell size to $5''$ instead of $0.5^o$. To do so, ``right-click`` on the raster name and select ``Save as``. Then set the *Resolution* to $5''=1/12=0.08333$ for both ``Horizontal`` and ``Vertical``. # # QGIS-change-raster-size # Now use the clip tool again. Much better! Obviously, the smaller the cell size, the more similar the clipped raster and the polygon will look like. # # QGIS-KEN-Raster5 # ### Note: # # _Care has to be taken when converting raster's cells size, since values have to be interpolated. QGIS seems to have taken away your choice for setting it. Luckily, GDAL can help out. You can use its tools to change the cells size, the projection, clip, etc. We will see some tools in another lecture._ # # Let us use this raster to assign the average suitability in each administrative region. But before doing so, we need to reproject both the raster and shape files to a format that ensures the areas are correctly take into account. One such projection is the equal area projection. Right click on the raster or shape and select ``save as...``. Then in the ``CRS`` option choose ``Selected CRS`` and click on ``browse`` and choose the following CRS (or create it if not present by using the ``Settings->Custom CRS`` menu) # # +proj=cea +lon_0=0 +lat_ts=0 +x_0=0 +y_0=0 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs # # QGIS-Cylindrical # Now save the file adding the postfix ``cyl`` so that you know it is the cylindrical equal area rojected one. Now, select ``Raster->Zonal Statistics->Zonal Statistics``. # # QGIS-Zonal-Stats # Select the cylindrical versions of the raster and shape files and ``suitcyl`` as the ``Output column prefix``. Now, in the attribute table you will find three new columns with the prefix ``suitcyl`` that show the ``sum``, ``count``, and ``mean`` suitability in in each feature. If you repeat the analysis with the unprojected (non-cyl) versions of the raster and shape files, you will see that the results vary (sometimes significantly). Whenever you do this type of analysis, it is important to make sure you are using the correct projection for the analysis. # ### Other Raster Tools # In the ``Raster`` menu you will find other useful tools to work with rasters. Especially useful is the ``Raster Calculator``, with which you can do computations on one or more rasters. # ## Exercises # 1. Import the Night Lights raster, convert it to a $5''$ raster, and clip it using your country's shapefile. # 2. Compute the average light in each region in your country. # 3. Generate a buffer of 100kms around each populated place and compute the average suitability and night lights for these buffers. # # Adding Information and Using Colors # Now that you have the average agricultural suitability for each administrative region. Let's color code it so we can visually observe the regional differences. For this double-click on the ``your_country_cyl`` layer, which should open a new window in the ``style`` tab of the ``Layer->Properties`` menu. # # QGIS-Style-Properties # Now, click on ``Single Symbol`` and choose ``Graduated``. This allows us to use colors to show the different values of a field/variable of interest. # # QGIS-Style-Category # Next let us select the field of interest in the attribues table, namely, ``suitmean``. To do so, click on ``column`` and choose the variable/field of interest, inthis case ``suitmean``. # # QGIS-Style-Field # You can choose how many gradations to draw, e.g. increasing it from 5 to 10; change the color ramp, or the mode of creating the gradation. Try it out. You should get something like this # # QGIS-KEN-Graduated-suit