Learning about spatial data and maps for archaeology (and other things)

Spatial Thinking and Skills Exercise 1 for Theory and Practice

Made by Rachel Opitz, Archaeology, University of Glasgow

Archaeologists regularly work with maps and data about where sites, samples and objects are found. Archaeological survey in particular relies on the collection and analysis of spatial data to understand patterns across whole areas of the landscape. Are most sites in a given landscape from a particular period? Are there more cairns than anything else? Are sites from all periods represented equally? These are the kinds of questions one might ask using survey data.

The aim of this exercise is for you to:

  • learn to investigate patterns in archaeological survey data using spatial analytical tools
  • learn to make maps that illustrate patterns in archaeological survey data
  • start thinking about the meaning of the patterns of sites and features in the landscape

You'll do this using data available from Canmore that describes the location and type of monuments surveyed throughout the Shetlands. Canmore houses survey data on sites and monuments recorded throughout Scotland.

As you may recall from Archaeology of Scotland, to start working with spatial data and maps, you need to put together your toolkit. You're currently working inside something called a Jupyter notebook. It's a place to keep notes, pictures, code and maps together. You can add tools and data into your Jupyter notebook and then use them to ask spatial questions and make maps and visualisations that help answer those questions.

Let's get started... Hit 'Ctrl'+'Enter' to run the code in any cell in the page.

We'll start by adding some of the tools we will need. They're not quite like these tools...

In [ ]:
%matplotlib inline
# Matplotlib is your tool for drawing graphs and basic maps. You need this!

import pandas as pd
import requests
import fiona
import geopandas as gpd
import pysal as ps
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import cluster
from sklearn.preprocessing import scale
from shapely.ops import nearest_points

# These are what we call prerequisites. They are basic tools you need to get started.
# Pandas manipulate data. Geo-pandas manipulate geographic data. They're also black and white and like to eat bamboo... 
# You need these to manipulate your data!
# Fiona helps with geographic data.
# Requests are for asking for things. It's good to be able to ask for things.
# ipywidgets supports interactivity.

# Remember to hit Ctrl+Enter to make things happen!

Now that we have the basic tools loaded up we need to load the Canmore data. Canmore provides polygons showing the site extents. I've converted them to a set of points located at the centre of each polygon to make things a little simpler in this exercise.
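If you're curious what 'the centre of a polygon' can mean in practice, here is a minimal sketch in plain Python. It assumes the centre is the area-weighted centroid; the conversion used for this dataset may have worked differently. The function name and the rectangle are made up for illustration.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...].

    Assumes the polygon has a non-zero area and does not cross itself.
    """
    area2 = 0.0  # twice the signed area of the polygon
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return (cx / (3.0 * area2), cy / (3.0 * area2))


# A made-up 10 x 20 rectangular site extent (hypothetical coordinates)
site_extent = [(0.0, 0.0), (10.0, 0.0), (10.0, 20.0), (0.0, 20.0)]
centre = polygon_centroid(site_extent)
print(centre)
```

For a rectangle the centroid is simply the middle of the shape, which makes it an easy case to check by eye.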

In [ ]:
url = 'http://ropitz.github.io/digitalantiquity/data/CanmoreShetlandPoints.geojson'
# This is where I put the data. It's in a format called geojson, used to represent geometry (shapes) and attributes (text).
request = requests.get(url)
# Please get me the data at that web address (url)
b = bytes(request.content)
# I will use the letter 'b' to refer to the data, like a nickname
with fiona.BytesCollection(b) as f:
    crs = f.crs
    canmore_shetland = gpd.GeoDataFrame.from_features(f, crs=crs)
# I will use the fiona tool to wrap up all the data from 'b' and check the coordinate system (crs) listed in the features,
# then print out the first few lines of the file so I can check everything looks ok.
# Don't worry if you don't understand all the details of this part!
canmore_shetland.head()

Does that look right?

You should see descriptions of different types of monuments and notes on them and links to the original data on Canmore. Get help if this isn't what you are seeing.

In [ ]:
# Let's visualise the data to double check that all is well

canmore_shetland_map = canmore_shetland.plot(column='CLASS', cmap='Pastel2', edgecolor='grey', figsize=(10, 10));
# 'plot' means draw me an image showing the geometry of each feature in my data.
# We want to control things like the colour of different types of sites on our map.
# I used the 'Pastel2' colour map (cmap stands for 'colour map')
# and asked it to colour the points based on the 'CLASS' they were assigned.

Everything good?

If you see a bunch of pastel dots, you are on the right track. Once the data is loaded properly, we can start exploring it and seeing if there are patterns within this national survey dataset.

Let's start by printing out the attributes of our points as a table. You'll be exploring the data based on its attributes, so it is important to have a sense of what they are. Scroll down and start to ask yourself some basic questions. What are the feature types assigned in this dataset? Do they mostly have periods assigned to them?
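As a hint, here is a small sketch of how you might look at a table's attributes with pandas. The rows below are invented stand-ins, not real Canmore records; in the notebook you would apply the same calls to canmore_shetland.

```python
import pandas as pd

# Toy stand-in for the Canmore table: these rows are invented for illustration
sites = pd.DataFrame({
    'OBJECTID': [1, 2, 3, 4],
    'TYPE': ['CAIRN (PREHISTORIC)', 'HOUSE (NORSE)',
             'CROFT (PERIOD UNASSIGNED)', 'CAIRN (PERIOD UNASSIGNED)'],
    'X': [440100.0, 446300.0, 438900.0, 452000.0],
    'Y': [1141200.0, 1176500.0, 1150300.0, 1168800.0],
})

print(list(sites.columns))   # which attributes exist
print(sites.head())          # the first few rows as a table

# How often does each type label occur?
type_counts = sites['TYPE'].value_counts()
print(type_counts)
```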

In [ ]:

Hmmm... If you're me at this point you are a bit concerned about all the sites with a 'PERIOD UNASSIGNED' note. Let's try and see how much of our dataset is actually dated in some way.
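Here is one possible way to work out what share of a table is undated, sketched on made-up rows. The idea is to build a True/False mask with str.contains and then count the True values. The variable names are only for illustration.

```python
import pandas as pd

# Invented example labels, not real Canmore records
sites = pd.DataFrame({'TYPE': [
    'CAIRN (PREHISTORIC)',
    'CROFT (PERIOD UNASSIGNED)',
    'HOUSE (NORSE)',
    'ENCLOSURE (PERIOD UNASSIGNED)',
    'FIELD SYSTEM (PERIOD UNASSIGNED)',
]})

total = len(sites)  # number of rows in the table

# True where the label mentions 'PERIOD UNASSIGNED', False elsewhere
is_undated = sites['TYPE'].str.contains('PERIOD UNASSIGNED', regex=False)

n_undated = int(is_undated.sum())     # True counts as 1
n_dated = int((~is_undated).sum())    # ~ flips True and False
share_undated = n_undated / total

print(f'{n_undated} of {total} sites are undated ({share_undated:.0%})')
```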

In [ ]:
# How many features do we have in total? Let's count them.
In [ ]:
# Now let's count the ones that have 'PERIOD UNASSIGNED' as part of the 'TYPE' field.

canmore_shetland_undated = canmore_shetland[canmore_shetland['TYPE'].str.contains('PERIOD UNASSIGNED')]
In [ ]:
# Hmmm... quite a few. Almost 85% of the data! I have questions... 
# Let's look at the distribution of the points that do have dates. The ~ symbol means 'opposite of'.

canmore_shetland_dated = canmore_shetland[~canmore_shetland['TYPE'].str.contains('PERIOD UNASSIGNED')]
In [ ]:
canmore_shetland_dated.plot(column='CLASS', cmap='Pastel2', edgecolor='grey', figsize=(10, 10));
In [ ]:
# Well, there are still some sites everywhere. Compare visually with the earlier map... 
# Make the dated sites red and undated sites blue

fig, ax = plt.subplots()
canmore_shetland_undated.plot(ax=ax, color='blue')
canmore_shetland_dated.plot(ax=ax, color='red')

Do you think the dated sites are representing the whole dataset well? If yes, you might feel comfortable working with only the dated sites. If not, you might want to continue to work with all the sites.

What are the implications of this choice?

I'm going to start by exploring only the dated sites and then compare the results with all the sites. I've noticed there are sites from 'prehistory' and 'norse' periods. I'm wondering if in general preservation is better in some parts of the Shetlands, and so we will find most of the concentrations of these older sites in one area. To explore this question, I'm going to use something called a 'kde plot'. KDE plots show areas where things concentrate or cluster together.
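To get a feel for what a KDE (kernel density estimate) computes, here is a minimal sketch in plain Python. It places one Gaussian 'bump' on every point and averages them, so locations near many points score highly. The function name, the points and the fixed bandwidth are all assumptions for illustration; a plotting library will typically choose the bandwidth for you and evaluate the density over a whole grid.

```python
import math


def gaussian_kde_2d(points, x, y, bandwidth):
    """Density at (x, y): the average of one Gaussian bump per point."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2  # squared distance to this point
        total += norm * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return total / len(points)


# Made-up coordinates: three sites close together and one far away
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]

near_cluster = gaussian_kde_2d(points, 0.3, 0.3, bandwidth=1.0)
empty_area = gaussian_kde_2d(points, 5.0, 5.0, bandwidth=1.0)

print(near_cluster, empty_area)
```

The density inside the little cluster comes out much higher than in the empty middle of the map, which is exactly what the shaded areas on a KDE plot are showing.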

In [ ]:
canmore_shetland_prehistory = canmore_shetland[canmore_shetland['TYPE'].str.contains('PREHISTORIC')]
canmore_shetland_norse = canmore_shetland[canmore_shetland['TYPE'].str.contains('NORSE')]

# prehistoric sites will be in blue
fig, ax = plt.subplots(figsize=(15,15))

# x=, y= and fill= are the keyword names used by seaborn 0.11 and later (fill replaces the older shade)
sns.kdeplot(x=canmore_shetland_prehistory['X'], y=canmore_shetland_prehistory['Y'], fill=True, cmap='Blues', ax=ax);

# Norse sites will be in red
sns.kdeplot(x=canmore_shetland_norse['X'], y=canmore_shetland_norse['Y'], fill=True, cmap='Reds', alpha=0.4, ax=ax);

# All the dated sites will appear as green dots, so we can see where the clusters are within the whole set of sites.
canmore_shetland_dated.plot(ax=ax, color='Green', markersize=3)

What do you see? To me there seem to be two clusters of prehistoric sites. The Norse sites seem to match with the more northern of those clusters.

Now let's look at the pattern for 19th-20th c. sites. Looking at the table, all these sites have the word 'century' in their period description and nothing else seems to match this pattern.

In [ ]:
#sanity check yourself...
canmore_shetland_modern = canmore_shetland_dated[canmore_shetland_dated['TYPE'].str.contains('CENTURY')]
In [ ]:
#and make a map

fig, ax = plt.subplots(figsize=(15,15))

#modern site clusters will be in blue
sns.kdeplot(x=canmore_shetland_modern['X'], y=canmore_shetland_modern['Y'], fill=True, cmap='Blues', ax=ax);

#All the dated sites will appear as green dots, so we can see where the clusters are within the whole set of sites.
canmore_shetland_dated.plot(ax=ax, color='Green',markersize=3, alpha =0.3)

Hmm. Our map appears to be squished up into the top right corner. What could be happening? I suggest we have a few points in our data that are not actually in the Shetlands or otherwise have dodgy coordinates. Let's investigate by sorting by the X coordinate.
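As a hint for the investigation, sorting a table by a coordinate column puts any suspiciously small values at the top. The sketch below uses invented rows and a made-up cut-off of 100000; you would need to pick a sensible threshold by looking at your own sorted values.

```python
import pandas as pd

# Invented rows: one of them has coordinates that are far too small
sites = pd.DataFrame({
    'OBJECTID': [101, 102, 103, 104, 105],
    'X': [440100.0, 1200.0, 446300.0, 438900.0, 452000.0],
    'Y': [1141200.0, 900.0, 1176500.0, 1150300.0, 1168800.0],
})

# Smallest X values first, so outliers float to the top of the table
by_x = sites.sort_values('X')
print(by_x.head())

# Collect the IDs of rows below the (hypothetical) cut-off
suspect_ids = by_x.loc[by_x['X'] < 100000, 'OBJECTID'].tolist()
print(suspect_ids)
```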

In [ ]:
In [ ]:
#Two dodgy points! They have significantly smaller x and y values. Let's get rid of them and replot.

canmore_shetland_modern_clean = canmore_shetland_modern[~canmore_shetland_modern['OBJECTID'].isin([99342,127836])]
fig, ax = plt.subplots(figsize=(15,15))

#modern site clusters will be in blue
sns.kdeplot(x=canmore_shetland_modern_clean['X'], y=canmore_shetland_modern_clean['Y'], fill=True, cmap='Blues', ax=ax);

#All the dated sites will appear as green dots, so we can see where the clusters are within the whole set of sites.
canmore_shetland_dated.plot(ax=ax, color='Green',markersize=3, alpha =0.3)

I'd say the modern sites are clustered in the same places as the prehistoric sites, with only the Norse ones so far having a different distribution. But so far we have only been looking at the dated sites. Let's check the clustering of all the sites.

In [ ]:
canmore_shetland_clean = canmore_shetland[~canmore_shetland['OBJECTID'].isin([99342,127836])]
fig, ax = plt.subplots(figsize=(15,15))
sns.kdeplot(x=canmore_shetland_clean['X'], y=canmore_shetland_clean['Y'], fill=True, cmap='Reds', ax=ax);


OK, after all that we are pretty convinced there are two areas that are dominating the pattern of sites we are seeing. Let's look inside the Norse clusters and explore further. We can start by seeing what kinds of Norse sites have been identified.

In [ ]:
norse = canmore_shetland_norse.groupby(['TYPE']).count()
norse  # show the table of counts per type

OK, so clearly there is some inconsistency in how site types have been named in Canmore. Let's grab everything that is a house, which seems a common category, and compare with farmsteads.
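When labels are inconsistent, matching on a shared word is one way to pull related types together. The sketch below uses invented labels to show how that works, and also one thing to watch out for: a single record can match more than one word, so the groups may overlap.

```python
import pandas as pd

# Invented labels that imitate inconsistent naming
norse = pd.DataFrame({'TYPE': [
    'HOUSE (NORSE)',
    'LONGHOUSE (NORSE)',
    'FARMSTEAD (NORSE)',
    'HOUSE (NORSE)',
    'FARMSTEAD (NORSE), HOUSE (NORSE)',
]})

# Exact labels: every spelling variant gets its own row
counts = norse.groupby('TYPE').size().sort_values(ascending=False)
print(counts)

# Word matching: 'HOUSE' also catches 'LONGHOUSE' and the combined label
n_house = int(norse['TYPE'].str.contains('HOUSE').sum())
n_farm = int(norse['TYPE'].str.contains('FARM').sum())
print(n_house, n_farm)
```

Here the last record is counted as both a house and a farm, which is worth keeping in mind when comparing the two groups.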

In [ ]:
norse_houses = canmore_shetland_norse[canmore_shetland_norse['TYPE'].str.contains('HOUSE')]
norse_farms = canmore_shetland_norse[canmore_shetland_norse['TYPE'].str.contains('FARM')]

fig, ax = plt.subplots(figsize=(15,15))

sns.kdeplot(x=norse_houses['X'], y=norse_houses['Y'], fill=True, cmap='Blues', ax=ax);

sns.kdeplot(x=norse_farms['X'], y=norse_farms['Y'], fill=True, cmap='Reds', alpha=0.5, ax=ax);
norse_houses.plot(ax=ax, color='Blue',markersize=5, alpha =0.7);
norse_farms.plot(ax=ax, color='Red',markersize=5, alpha =0.7)

The patterns seem rather different. There is one shared cluster, but then we see variation... Do we expect to find houses and farms together? How might we explain the pattern we are seeing? Let's look at the types of sites found closest to farms. This is done through a 'nearest neighbour' calculation.
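The idea behind a nearest neighbour calculation can be sketched in a few lines of plain Python: for each farm, measure the straight-line distance to every other site and keep the closest one. The coordinates, site types and function name below are made up for illustration, and this brute-force approach is only an assumption about how one might do it by hand.

```python
import math


def nearest_type(farm_xy, others):
    """Return the type of the site in 'others' closest to farm_xy."""
    def distance(site):
        dx = site['xy'][0] - farm_xy[0]
        dy = site['xy'][1] - farm_xy[1]
        return math.hypot(dx, dy)  # straight-line distance
    return min(others, key=distance)['type']


# Invented farms and neighbouring sites
farms = [(0.0, 0.0), (9.0, 9.0)]
others = [
    {'type': 'HOUSE (NORSE)', 'xy': (1.0, 1.0)},
    {'type': 'BURIAL (NORSE)', 'xy': (8.0, 9.0)},
    {'type': 'MILL (NORSE)', 'xy': (5.0, 0.0)},
]

nearest = [nearest_type(farm, others) for farm in farms]
print(nearest)
```

Checking every pair of sites is fine for a few hundred records; for very large datasets a spatial index would be the usual next step.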

In [ ]:
norse_otherf = canmore_shetland_norse[~canmore_shetland_norse['TYPE'].str.contains('FARM')]
In [ ]:
pd.options.mode.chained_assignment = None  # default='warn'
neighbours = norse_otherf.geometry.unary_union
def near(point, pts=neighbours):
    # find the nearest point and return the corresponding TYPE value
    nearest = norse_otherf.geometry == nearest_points(point, pts)[1]
    # .values replaces get_values(), which was removed in pandas 1.0
    return norse_otherf[nearest].TYPE.values[0]
norse_farms['Nearest'] = norse_farms.apply(lambda row: near(row.geometry), axis=1)
In [ ]:
# Let's use the count function to see how many of each type of Norse site appear near a Norse farm.
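As a hint, one way to count how often each value appears in a column is value_counts. The sketch below runs on an invented 'Nearest' column; in the notebook the equivalent column would be the one added to norse_farms.

```python
import pandas as pd

# Invented results of a nearest-neighbour search
farms = pd.DataFrame({'Nearest': [
    'HOUSE (NORSE)',
    'HOUSE (NORSE)',
    'BURIAL (NORSE)',
    'HOUSE (NORSE)',
]})

# Most common neighbour type first
nearest_counts = farms['Nearest'].value_counts()
print(nearest_counts)
```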

What do we conclude?

This ends the tutorial. You can practice exploring patterns in survey data further on your own.

Hopefully you learned to:

  • investigate patterns in archaeological survey data using spatial analytical tools
  • make maps that illustrate patterns in archaeological survey data
  • start thinking about the meaning of the patterns of sites and features in the landscape