Learning about spatial data and maps for archaeology (and other things)

Spatial Thinking and Skills Exercise 1 for Archaeology of Scotland

Made by Rachel Opitz, Archaeology, University of Glasgow

Archaeologists regularly work with maps and data about where sites, samples and objects are found. We ask lots of questions that have a spatial component. Which Bronze Age cairns are close to the coast in England? In this excavation, is bone found inside a building or outside in the yard? It's important to learn to work with spatial data and maps in order to succeed in a variety of careers in archaeology and heritage management.

To start working with spatial data and maps, you need to put together your toolkit. You're currently working inside something called a jupyter notebook. It's a place to keep notes, pictures, code and maps together. You can add tools and data into your jupyter notebook and then use them to ask spatial questions and make maps and visualisations that help answer those questions.

The aim of this exercise is for you to:

  • learn to make very simple static maps
  • learn to ask simple questions using spatial data. This is sometimes referred to as 'writing queries'.
  • start thinking about the importance of spatial relationships and data in archaeology.

Let's get started... Hit 'Ctrl'+'Enter' to run the code in any cell in the page.

The map that came to life

We'll start by adding some of the tools we will need. They're not quite like these tools...

They're not quite like these tools...

In [ ]:
%matplotlib inline
# Matplotlib is your tool for drawing graphs and basic maps. You need this!

import pandas as pd
import requests
import fiona
import geopandas as gpd
import ipywidgets as widgets
import bokeh

# These are what we call prerequisites. They are basic toosl you need to get started.
# Pandas manipulate data. Geo-pandas manipulate geographic data. They're also black and white and like to eat bamboo... 
# You need these to manipulate your data!
# Fiona helps with geographic data.
# Requests are for asking for things. It's good to be able to ask for things.
# ipywidgets supports interactivity.


# Remember to hit Ctrl+Enter to make things happen!

Now that we have the basic tools loaded up we need some data. This data is from the Gabii Project, an excavation of a town that was a rival to Rome early on and then became part of the emerging Roman Empire.

In [ ]:
url = 'https://raw.githubusercontent.com/ropitz/spatialarchaeology/master/data/gabii_areaB_published2.geojson'
# This is where I put the data. It's in a format called geojson, used to represent geometry (shapes) and attributes (text).
request = requests.get(url)
# Please get me the data at that web address (url)
b = bytes(request.content)
# I will use the letter 'b' to refer to the data, like a nickname
with fiona.BytesCollection(b) as f:
    crs = f.crs
    gabii_areaB = gpd.GeoDataFrame.from_features(f, crs=crs)
    print(gabii_areaB.head())
# I will use the fiona tool to wrap up all the data from 'b', check the coordinate system (crs) listed in the features
# and print out the first few lines of the file so I can check everything looks ok. 
# Don't worry if you don't understand all the details of this part!

Does that look right?

You should have geometry information about polygons (shapes) of the contexts from area B and a bunch of descriptions and interpretation of the archaeology. Importatly you should be able to spot the column names: Descriptio, objectid, shape_area, shape_leng, SU, definition, finds_note, formation, geometry, interpret. Each column contains a different type of information. Note that SU (stratigraphic unit) = context.

In [ ]:
# Let's visualise the data to double check that all is well

gabii_map1 = gabii_areaB.plot(column='SHAPE_Area', cmap='Pastel2', edgecolor='grey', figsize=(10, 10));
# 'plot' means draw me an image showing the geometry of each feature in my data. 
# We want to control things like the color of the individual features on our map. 
# I used the pastel colorscale command (cmap stands for 'colour map') 
# and asked it to draw the polygons differently based on their size.

Well done! You just made your first archaeological map. It shows all the contexts in area B at Gabii.

This is good, but what if you only want to look at one kind of context? We can select specific types of contexts from within our dataset by searching (aka querying) for them.

How do we know what kind of contexts we have? Looking at what's inside the data describing all those polygon shapes on the map should help.

Start by printing out our data in a tidy way. Just type its name...

In [ ]:
gabii_areaB

In archaeology we often talk about how contexts get created. This is referred to as their 'formation process' - how they get formed. We will describe different formation processes as deposits, cuts, fills, constructions, etc. Look at the 'formation' column in the table, and you'll see these terms in many of the entries.

In [ ]:
# Say you only want to look at the contexts formed by cutting a.k.a. 'Cuts'. Pandas use square brackets [] to make selections. 
# Here we select all the rows (.loc) where the column 'formation' has the value 'cutting'. == means 'this thing is true' in code

gabii_areaB.loc[gabii_areaB['formation'] == 'Cutting']
In [ ]:
# If we want to see this result as a map, we just add the .plot command to the end.

gabii_areaB.loc[gabii_areaB['formation'] == 'Cutting'].plot(column='SHAPE_Area', cmap='OrRd', figsize=(10, 10))
In [ ]:
# Try and do the same thing for contexts that are construction type features
gabii_areaB.loc[gabii_areaB['formation'] == 'Construction']
In [ ]:
# Remember to draw it as a map!
gabii_areaB.loc[gabii_areaB['formation'] == 'Construction'].plot(column='SHAPE_Area', cmap='OrRd', figsize=(10, 10))
In [ ]:
# Let's save these selections of 'constructions' and 'cuts' so we can use them again.
# I've given them names here. These are now 'named variables'
gabii_areaB_constructions = gabii_areaB.loc[gabii_areaB['formation'] == 'Construction']
gabii_areaB_cuts = gabii_areaB.loc[gabii_areaB['formation'] == 'Cutting']
In [ ]:
#Test your named variable by printing it out again, calling it by its name.
gabii_areaB_cuts

So far these searches have been about the attributes of our data. Attributes are information that desribes the pologyons We can also ask questions about spatial relationships or real-world location. First, let's find the bounding box, or real world location and extents of our data. These are the smallest and largest coordinates on the x- and y- axes in the real world. We use the command 'total_bounds' to ask this question. Things 'in bounds' are inside the box.

In [ ]:
gabii_areaB_cuts.total_bounds
In [ ]:
# Now do the same thing for the constructions. The results should be similar, but not identical.
gabii_areaB_constructions.total_bounds
In [ ]:
#Now we will select a single context, SU 5018, a nice wall.
gabii_areaB_5018 = gabii_areaB_constructions.loc[gabii_areaB_constructions['SU'] == 5018]
gabii_areaB_5018
gabii_areaB_5018.plot()
In [ ]:
#Now we will select another single context, SU 1178, a nice floor surface. Ok, an oddly shaped floor surface.
gabii_areaB_1178 = gabii_areaB_constructions.loc[gabii_areaB_constructions['SU'] == 1178]
gabii_areaB_1178
gabii_areaB_1178.plot()

Why might our floor surface have such a funny shape? One reason might be if later cuts were made through the floor.

Maybe someone was digging a pit or a grave. We can check.

In [ ]:
# We use the '.overlay' command to see which contexts from the 'cuts' dataset intersect the floor context# 1178
gpd.overlay(gabii_areaB_1178, gabii_areaB_cuts, how='intersection')

Your results should show one context - 1178 that is a cut that has a spatial relationship with the floor.

Is the cut the reason for the funny shaped floor?

We can make a map showing their relationship spatially.

In [ ]:
# To do this we have to provide a list of the values we are interested in seeing on the map, in square brackets []
gabii_areaB.loc[gabii_areaB['SU'].isin([1178,1235])].plot(column='SU', cmap='Paired', figsize=(10, 10))

What does this map suggest? I'd say there is another reason for the funny shape of the floor. It is just barely cut by the context 1178.

Now we might ask a more complicated question, and look at all the places where cuts have been made into constructions. This is a good way to assess damage to built features at an archaeological site. To do this, we need to ask where cuts and constructions intersect.

In [ ]:
# Start by printing out the list of all the polygons that represent places where cuts and constructions intersect. 
gpd.overlay(gabii_areaB_constructions, gabii_areaB_cuts, how='intersection', use_sindex=True)
In [ ]:
#Now draw all the places our construction features intersect with cuts. Our site is like swiss cheese. 
gpd.overlay(gabii_areaB_constructions, gabii_areaB_cuts, how='intersection').plot(column='SHAPE_Area', cmap='Accent', edgecolor='grey', figsize=(15, 15))
#You'll notice I picked a differnt colour map this time, and made the figure a bit larger.

Now try another question. What if we wanted to find all the contexts defined as soil layers?

'Soils' isn't one of our formations, so we must look elsewhere in our data. Looking at the definitions above, soil is mentioned under the 'definition' column sometimes but it's not the only word in that entry. For example, an entry might read 'Soil below tufo floor 1417'. So we need to look inside the 'definition' entry and search for the word 'soil'. We do this by calling every entry (value) within that column a string 'str' and asking what that string 'contains'. If the word 'soil' appears anywhere, it will match.

In [ ]:
gabii_areaB_soil = gabii_areaB[gabii_areaB['definition'].str.contains('soil')]
gabii_areaB_soil
In [ ]:
# Now create a map of all the contexts defined as being made of soil.
gabii_areaB_soil.plot(column='SHAPE_Area', cmap='Accent', edgecolor='grey', figsize=(15, 15))

What if we wanted to know about contexts that were near other contexts? Let's construct a new query.

This time let's look for the contexts that have notable finds and see if they are near our construction contextx

Step 1: define your dataset of contexts that have notable finds. This is everwhere that has text in that field, or is 'not null'.

In [ ]:
gabii_areaB_cool_finds = gabii_areaB[gabii_areaB['finds_note'].notnull()]
gabii_areaB_cool_finds

Let's look for construction features within 2 meters of contexts with our cool finds.

Define a 2m area around all the contexts with cool finds. 2m is pretty close by.

In [ ]:
gabii_areaB_cool_finds_2m = gabii_areaB_cool_finds.buffer(2)
gabii_areaB_cool_finds_2m.plot(cmap='Accent', edgecolor='grey', figsize=(15, 15))
In [ ]:
# Now we plot the intersection between the buffered finds polygons and our construction polygons
gabii_areaB_cool_finds_2m.intersection(gabii_areaB_constructions).plot(cmap='Accent', edgecolor='grey', figsize=(15, 15))

What do we conclude? Not many constructions overlap with contexts with interesting finds in area B at Gabii.

This ends the tutorial. You can practice writing queries (asking questions of your data) by playing around in this notebook. Try changing values or searching for different types of features or descriptions. You'll be doing this in class during your next practical!

Hopefully you learned to:

  • make very simple static maps
  • ask simple questions using spatial data. This is sometimes referred to as 'writing queries'.
  • start thinking about the importance of spatial relationships and data in archaeology.