Exercise - exploring published data from the ADS

To start working with the data from the ADS, you need to put together your toolkit. You're currently working inside something called a jupyter notebook, which will be a key part of your analysis toolkit. It's a place to keep notes, pictures, code and maps together. You can add tools and data into your jupyter notebook and then use them to ask questions and make maps and visualisations that help answer those questions.

Here's what you need to do: Work your way down the page, carefully reading the notes and comments in each cell (each box on the page) and looking through the code. Anything written with a # symbol in front of it is a comment I've included to explain to you what the code is doing. Comments will appear in blue type. Code will be a mix of black, green and purple type. In jupyter notebooks you hit 'Ctrl+Enter' to execute the code in each cell. Once you have read and understood the comments and code hit 'Ctrl+Enter' to execute the code and then think about the results.

You can make changes to the code to ask different questions. Simply double click in the cell to be able to type in it, make your changes, and re-run the code by simply hitting 'Ctrl+Enter' in the cell again. If things go wrong, 'ctrl+z' will undo your changes and you can try again!

Let's get started... Hit 'Ctrl'+'Enter' to run the code in any cell in the page.

In [ ]:
%matplotlib inline
# Matplotlib is your tool for drawing graphs and basic maps. You need this!

import pandas as pd
import requests
import fiona
import geopandas as gpd
import matplotlib.pyplot as plt

# These are what we call prerequisites. They are basic toosl you need to get started.
# Pandas manipulate data. Geo-pandas manipulate geographic data. They're also black and white and like to eat bamboo... 
# You need these to manipulate your data!
# Fiona helps with geographic data.
# Requests are for asking for things. It's good to be able to ask for things.



# Remember to hit Ctrl+Enter to make things happen!

Any data published as CSV files with the ADS can be pulled into a jupyter notebook for exploration, asking your own questions with it, and generally doing research. For example, play with the data from Ewan Campell's 2007 publication "Imported Material in Atlantic Britain and Ireland c.AD400-800", found at http://archaeologydataservice.ac.uk/archives/view/campbell_cba_2007/downloads.cfm

In [ ]:
#start by reading in his table of glass artefacts and printing it out
glass = pd.read_csv("http://ads.ahds.ac.uk/catalogue/adsdata/arch-788-1/dissemination/csv/imports_database/Glass.csv", encoding = "ISO-8859-1")

# Print a preview of any table by typing it's name.
glass
In [ ]:
# Get all the finds from the table where the "Form" is "Cone Beaker"
ConeBeakers = glass.loc[glass['Form'].isin(["Cone beaker"])]
ConeBeakers

# Want a different vessel form? Just change "Cone beaker" to something else you see in the table in the code above. 
In [ ]:
#Now you can start to explore. Which sites have most of the Cone beakers? 
#Make a histogram of how often each site appears in the Cone Beaker table.
ConeBeakers['Site'].value_counts().plot(kind='bar',figsize=(15,15))
In [ ]:
#Whithorn has a lot of Cone Beakers. I wonder what else is there?
Whithorn = glass.loc[glass['Site'].isin(["WHITHORN"])]
Whithorn
Whithorn['Form'].value_counts().plot(kind='bar',figsize=(15,15))
#Unusual dominance of Cone Beakers over other forms?
In [ ]:
#Let's look at the two sites with lots of cone beakers to see if the trend seems to hold.
WhithornDinas = glass.loc[glass['Site'].isin(["WHITHORN","DINAS POWYS"])]
WhithornDinas
In [ ]:
WhithornDinas['Form'].value_counts().plot(kind='bar',figsize=(15,15))
#Unusual dominance of Cone Beakers over other forms seems to hold across the two sites? 
In [ ]:
#Let's look at Dinas Powys on its own.
Dinas = glass.loc[glass['Site'].isin(["DINAS POWYS"])]
Dinas['Form'].value_counts().plot(kind='bar',figsize=(15,15))
In [ ]:
#This looks like an important class of materials! Let's look at variability within the group across all sites. 
#What do you note in the Description and metal fields? 
ConeBeakers
In [ ]:
#You can effectively search on keywords. Get everything Green
GreenConeBeaker = ConeBeakers.loc[ConeBeakers['metal'].str.contains("green",na = False)]
In [ ]:
# group the green Cone Beakers by their description - hope for some consistency!
GreenConeBeaker['metal'].value_counts().plot(kind='bar',figsize=(10,10))

You can do this with any published data in CSV form online. The ADS archive is full of them. We've just scratched the surface here. You can do text mining, search across multiple tables or archives to compare things, quantify finds or stratigraphic data, start doing some stats...