Understanding the meanings behind patterns of finds recovered through excavation is a tricky problem. We hope to distinguish activity areas, places devoted to domestic and industrial use, or inhabited places that are distinct from liminal ones. To successfully unravel these patterns, we must look not only at the distributions of different types of finds, but how they correlate with one another, the character of the contexts in which they were recovered, and their own physical and social characteristics. Are they likely to be curated? Are they light and likely to be moved from one area to another by post-depositional processes? It's all a bit of a mess.

The aim of this exercise is for you to:

- learn to work real special finds data from an excavation, in all its messiness
- start thinking about quantitative and spatial approaches to finds data from excavations and how they can help us better understand the patterns we see

You'll do this using data collected by the Gabii Project, a 10+ year excavation in central Italy.

As you may recall from Archaeology of Scotland, to start working with spatial data and imagery, you need to put together your toolkit. You're currently working inside something called a jupyter notebook. It's a place to keep notes, pictures, code and maps together. You can add tools and data into your jupyter notebook and then use them to ask spatial questions and make maps and visualisations that help answer those questions.

In [1]:

```
%matplotlib inline
# Matplotlib is your tool for drawing graphs and basic maps. You need this!
import pandas as pd
import requests
import fiona
import geopandas as gpd
import ipywidgets as widgets
# These are what we call prerequisites. They are basic toosl you need to get started.
# Pandas manipulate data. Geo-pandas manipulate geographic data. They're also black and white and like to eat bamboo...
# You need these to manipulate your data!
# Fiona helps with geographic data.
# Requests are for asking for things. It's good to be able to ask for things.
# ipywidgets supports interactivity.
# Remember to hit Ctrl+Enter to make things happen!
```

In [ ]:

```
url = 'http://ropitz.github.io/digitalantiquity/data/gabii_SU.geojson'
# This is where I put the data. It's in a format called geojson, used to represent geometry (shapes) and attributes (text).
request = requests.get(url)
# Please get me the data at that web address (url)
b = bytes(request.content)
# I will use the letter 'b' to refer to the data, like a nickname
with fiona.BytesCollection(b) as f:
crs = f.crs
gabii_su_poly = gpd.GeoDataFrame.from_features(f, crs=crs)
print(gabii_su_poly.head())
# I will use the fiona tool to wrap up all the data from 'b', check the coordinate system (crs) listed in the features
# and print out the first few lines of the file so I can check everything looks ok.
# Don't worry if you don't understand all the details of this part!
```

In [ ]:

```
# Now we have polygons, the shapes of our contexts. Let's visualise the data to double check that all is well
gabii_map1 = gabii_su_poly.plot(column='DESCRIPTIO', cmap='Blues', edgecolor='grey', figsize=(15, 15));
# 'plot' means draw me an image showing the geometry of each feature in my data.
# We want to control things like the color of different types of features on our map.
# I used the 'Blues' colorscale command (cmap stands for 'colour map')
# and asked it to draw the polygons differently based on the type of feature.
```

The colorscale options are: Accent, Accent_r, Blues, Blues_r, BrBG, BrBG_r, BuGn, BuGn_r, BuPu, BuPu_r, CMRmap, CMRmap_r, Dark2, Dark2_r, GnBu, GnBu_r, Greens, Greens_r, Greys, Greys_r, OrRd, OrRd_r, Oranges, Oranges_r, PRGn, PRGn_r, Paired, Paired_r, Pastel1, Pastel1_r, Pastel2, Pastel2_r, PiYG, PiYG_r, PuBu, PuBuGn, PuBuGn_r, PuBu_r, PuOr, PuOr_r, PuRd, PuRd_r, Purples, Purples_r, RdBu, RdBu_r, RdGy, RdGy_r, RdPu, RdPu_r, RdYlBu, RdYlBu_r, RdYlGn, RdYlGn_r, Reds, Reds_r, Set1, Set1_r, Set2, Set2_r, Set3, Set3_r, Spectral, Spectral_r, Wistia, Wistia_r, YlGn, YlGnBu, YlGnBu_r, YlGn_r, YlOrBr, YlOrBr_r, YlOrRd, YlOrRd_r, afmhot, afmhot_r, autumn, autumn_r, binary, binary_r, bone, bone_r, brg, brg_r, bwr, bwr_r, cividis, cividis_r, cool, cool_r, coolwarm, coolwarm_r, copper, copper_r, cubehelix, cubehelix_r, flag, flag_r, gist_earth, gist_earth_r, gist_gray, gist_gray_r, gist_heat, gist_heat_r, gist_ncar, gist_ncar_r, gist_rainbow, gist_rainbow_r, gist_stern, gist_stern_r, gist_yarg, gist_yarg_r, gnuplot, gnuplot2, gnuplot2_r, gnuplot_r, gray, gray_r, hot, hot_r, hsv, hsv_r, inferno, inferno_r, jet, jet_r, magma, magma_r, nipy_spectral, nipy_spectral_r, ocean, ocean_r, pink, pink_r, plasma, plasma_r, prism, prism_r, rainbow, rainbow_r, seismic, seismic_r, spring, spring_r, summer, summer_r, tab10, tab10_r, tab20, tab20_r, tab20b, tab20b_r, tab20c, tab20c_r, terrain, terrain_r, viridis, viridis_r, winter, winter_r

Swap out 'Blue' in the cell above for any of these options...

In [ ]:

```
# Now I'm going to bring in all the basic Gabii special finds data - descriptions, object types, IDs and the contexts from which they come.
# We've had a few special finds over the years.
sf_su = pd.read_csv("https://raw.githubusercontent.com/ropitz/gabii_experiments/master/spf_SU.csv")
sf_su
```

One of our area supervisors, Troy, is super excited about tools related to textile production. They're a great example of how we think about special finds at Gabii. Multiple types of finds are related to textile production. Do we find all types everywhere? Are certain types of tools more concentrated in one type of context or one area than others? Troy has lots of questions about the patterns of places where we find these tools. Do they provide evidence for early textile production? Are they a major factor in the city's early wealth? Do we find the same things in later periods? After all, people under the Republic and Empire wore clothes... Loom Weights, spools, and spindle whorls are the most common weaving tools at Gabii.

In [ ]:

```
#Let's pull all those find types out of the big list.
types = ['Loom Weight','Spool','Spindle Whorl']
textile_tools = sf_su.loc[sf_su['SF_OBJECT_TYPE'].isin(types)]
textile_tools
```

In [ ]:

```
# Now let's count up how many of these tools appear in each context (SU).
# This command will print out a list of the number of textile tools in each SU next to that SU number.
pd.value_counts(textile_tools['SU'].values, sort=True)
```

In [ ]:

```
#Then let's combine our polygons representing context shape and location
#with the special finds data
# We do this with a command called 'merge'
gabii_textools = gabii_su_poly.merge(textile_tools, on='SU')
# adding .head() to the end of a dataframe name will print out just the first few rows.
gabii_textools.head()
```

In [ ]:

```
# If we want to see this result as a map, we just add the .plot command to the end of the dataframe's name
gabii_textools.plot(column='SF_OBJECT_TYPE', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5)
```

OK, what do you see here? Compare the distribution of each type of textile tool. Do some types seem to be concentrated in certain areas? How might you check? What factors might contribute to this pattern? Do big layer simply aggregate lots of stuff? Do late dumps contain early materials? Why would one type of tool appear where the others don't?

In [ ]:

```
# We can try and see the relationship between layer size and count by sorting
#our list of finds by the surface area of each layer.
# We use the command 'sort_values'
gabii_textools.sort_values(by=['Shape_Area'],ascending=False)
```

In [ ]:

```
# We have a couple enormous colluvial layers that should probably be excuded.
# Outliers will mess with your analysis. Cut out these layers by excluding SUs with a surface area greater than 800.
gabii_textools2 = gabii_textools.loc[gabii_textools['Shape_Area']<800]
# If we want to see this result as a map, we just add the .plot command to the end again.
```

In [ ]:

```
# That's better. Plot the results to see that you've removed the big colluvial layers.
gabii_textools2.plot(column='SF_OBJECT_TYPE', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5)
```

In [ ]:

```
# OK, count up how many of each tool type appears in each SU using the 'groupby' command
textools_counts = gabii_textools2.groupby('SU')['SF_OBJECT_TYPE'].value_counts().unstack().fillna(0)
# Sort the list so that the SUs with the most stuff end up at the top.
textools_counts.sort_values(by=['Loom Weight','Spindle Whorl','Spool'], ascending=False)
```

In [ ]:

```
# Merge your textile tool counts with your spatial data for the contexts
# Because both dataframes have a 'SU' column, you can use this to match up the rows.
gabii_textools_counts = gabii_su_poly.merge(textools_counts, on='SU')
gabii_textools_counts.head()
```

In [ ]:

```
# Let's start by looking at each class of textile tool individually.
# Plot the counts of each type of find spatially
gabii_textools_counts.plot(column='Loom Weight', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5)
gabii_textools_counts.plot(column='Spindle Whorl', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5)
gabii_textools_counts.plot(column='Spool', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5)
```

In [ ]:

```
base = gabii_textools_counts.plot(column='Loom Weight', cmap='Blues', figsize=(15, 15), legend=True, alpha=0.7)
gabii_textools_counts.plot(ax=base, column='Spindle Whorl', cmap='Reds', alpha=0.7)
gabii_textools_counts.plot(ax=base, column='Spool', cmap='Greens', alpha=0.7);
```

In [ ]:

```
# It's hard to see what's happening when we have to scroll.
# Let's put the maps side by side.
import matplotlib.pyplot as plt
fig, axes = plt.subplots(ncols=3,figsize=(15, 5))
gabii_textools_counts.plot(column='Loom Weight', cmap='autumn', ax=axes[0], legend=True).axis('equal')
gabii_textools_counts.plot(column='Spindle Whorl', cmap='autumn', ax=axes[1]).axis('equal')
gabii_textools_counts.plot(column='Spool', cmap='autumn',ax=axes[2]).axis('equal')
```

Can you see any patterns here? Do the different types of tools concentrate in the same parts of the site? Why might different types of tools have different distributions?

In [ ]:

```
# I think the distributions of different weaving tools vary.
# To investigate further, we are going to need more tools.
import pysal
from sklearn import cluster
import seaborn as sns
import numpy as np
```

We're going to use cluster analysis to try and better understand our patterns. Clustering is a broad set of techniques for finding groups within a data set. When we cluster observations, we want items in the same group to be similar and items in different groups to be dissimilar. Clustering allows us to identify which things are alike on the basis of multiple characteristics. K-means clustering is a simple and frequently applied clustering method for splitting a dataset into a set of k (k being an arbitrary number you get to choose) groups.

In [ ]:

```
# Next step: cluster together contexts where the pattern of the three types of textile tools are similar,
# with and without respect to the size of the context.
# Make 5 clusters and account for the size of the context and counts of different types of tools. Drop all the other fields.
km5 = cluster.KMeans(n_clusters=5)
km5cls = km5.fit(gabii_textools_counts.drop(['geometry', 'OBJECTID','DESCRIPTIO','Shape_Length','SU'], axis=1).values)
km5cls
```

Each cluster produced should contain the SUs that are similar to one another on the basis of the number of each type of textile tool and the size of the surface area of the SU.

In [ ]:

```
# Plot the clusters, groups of contexts that have similar textile tool assemblages.
# Give a different colour to the SUs that belong to each cluster.
f1, ax = plt.subplots(1, figsize=(15,15))
gabii_textools_counts.assign(cl=km5cls.labels_)\
.plot(column='cl', categorical=True, legend=True, \
linewidth=0.1, cmap='Accent', edgecolor='white', ax=ax)
ax.set_axis_off()
plt.show()
```

In [ ]:

```
#Do the same, ignoring the size of the context.
km5 = cluster.KMeans(n_clusters=5)
km5cls2 = km5.fit(gabii_textools_counts.drop(['geometry', 'OBJECTID','DESCRIPTIO','Shape_Length','SU','Shape_Area'], axis=1).values)
f2, ax = plt.subplots(1, figsize=(15,15))
gabii_textools_counts.assign(cl2=km5cls2.labels_)\
.plot(column='cl2', categorical=True, legend=True, \
linewidth=0.1, cmap='Accent', edgecolor='white', ax=ax)
ax.set_axis_off()
plt.show()
```

The patterns are definitely different. How can we interpret the fact that context size affects the pattern of the distribution of textile tools? Do big units, which perhaps represent dumps or colluvial mashups, have a fundamentally different character than the varied small contexts?

In [ ]:

```
# Look at the difference with and without context size taken into accoutn.
fig, axes = plt.subplots(ncols=2,figsize=(15, 5))
gabii_textools_counts.assign(cl2=km5cls2.labels_)\
.plot(column='cl2', categorical=True, legend=True, \
linewidth=0.1, cmap='Accent', edgecolor='white', ax=axes[0]).axis('equal')
gabii_textools_counts.assign(cl=km5cls.labels_)\
.plot(column='cl', categorical=True, legend=True, \
linewidth=0.1, cmap='Accent', edgecolor='white', ax=axes[1]).axis('equal')
```

In [30]:

```
# assign the cluster IDs to each context permanently
gabiitextools_clas = gabii_textools_counts.assign(cl=km5cls.labels_)
gabiitextools_class = gabiitextools_clas.assign(cl2=km5cls2.labels_)
gabiitextools_class.head()
```

Out[30]:

In [31]:

```
# Now let's look at some individual classes, with and without context size accounted for in the analyses.
gabiitextools_class0=gabiitextools_class.loc[gabiitextools_class['cl']==0]
gabiitextools_class0noarea=gabiitextools_class.loc[gabiitextools_class['cl2']==0]
fig, axes = plt.subplots(ncols=2,figsize=(15, 5))
gabiitextools_class0.plot(ax=axes[0], legend=True).axis('equal')
gabiitextools_class0noarea.plot(ax=axes[1]).axis('equal')
```

Out[31]:

In [32]:

```
# What happens when we change the number of clusters (groups)?
km7 = cluster.KMeans(n_clusters=7)
km7cls3 = km7.fit(gabii_textools_counts.drop(['geometry', 'OBJECTID','DESCRIPTIO','Shape_Length','SU'], axis=1).values)
f3, ax = plt.subplots(1, figsize=(15,15))
gabii_textools_counts.assign(cl3=km7cls3.labels_)\
.plot(column='cl3', categorical=True, legend=True, \
linewidth=0.1, cmap='Accent', edgecolor='white', ax=ax)
ax.set_axis_off()
plt.show()
```

That also changes things. Without going into too much detail, finding the ideal number of clusters is a black art. Try playing around with the number of clusters in the notebook, or the size cut-off for inclusion. Clustering = black magic

In [33]:

```
# Use 7 clusters and plot them
km7 = cluster.KMeans(n_clusters=7)
km7cls4 = km7.fit(gabii_textools_counts.drop(['geometry', 'OBJECTID','DESCRIPTIO','Shape_Length','SU','Shape_Area'], axis=1).values)
f4, ax = plt.subplots(1, figsize=(15,15))
gabii_textools_counts.assign(cl4=km7cls4.labels_)\
.plot(column='cl4', categorical=True, legend=True, \
linewidth=0.1, cmap='Accent', edgecolor='white', ax=ax)
ax.set_axis_off()
plt.show()
```

In [34]:

```
# Let's set up to investigate some of the individual clusters
gabiitextools_class3=gabiitextools_class.assign(cl3=km7cls3.labels_)
gabiitextools_class4=gabiitextools_class3.assign(cl4=km7cls4.labels_)
gabiitextools_class4.head()
```

Out[34]:

In [35]:

```
# set up variables to store several classes, with and without context size taken into account.
gabiitextools_class0=gabiitextools_class4.loc[gabiitextools_class4['cl']==0]
gabiitextools_class0noarea=gabiitextools_class4.loc[gabiitextools_class4['cl2']==0]
gabiitextools_k7_class0=gabiitextools_class4.loc[gabiitextools_class4['cl3']==0]
gabiitextools_k7_class0noarea=gabiitextools_class4.loc[gabiitextools_class4['cl4']==0]
fig, axes = plt.subplots(ncols=2,nrows=2,figsize=(15, 10))
gabiitextools_class0.plot(ax=axes[0,0]).axis('equal')
axes[0,0].set_title('cl - 5 clusters - area')
gabiitextools_class0noarea.plot(ax=axes[0,1]).axis('equal')
axes[1,0].set_title('cl2 - 5 clusters - no area')
gabiitextools_k7_class0.plot(ax=axes[1,0]).axis('equal')
axes[0,1].set_title('cl3 - 7 clusters - area')
gabiitextools_k7_class0noarea.plot(ax=axes[1,1]).axis('equal')
axes[1,1].set_title('cl - 7 clusters - no area')
```

Out[35]:

In [36]:

```
gabiitextools_class3=gabiitextools_class4.loc[gabiitextools_class4['cl']==3]
gabiitextools_class3noarea=gabiitextools_class4.loc[gabiitextools_class4['cl2']==3]
gabiitextools_k7_class3=gabiitextools_class4.loc[gabiitextools_class4['cl3']==3]
gabiitextools_k7_class3noarea=gabiitextools_class4.loc[gabiitextools_class4['cl4']==3]
fig, axes = plt.subplots(ncols=2,nrows=2,figsize=(15, 10))
gabiitextools_class0.plot(ax=axes[0,0]).axis('equal')
axes[0,0].set_title('cl - 5 clusters - area')
gabiitextools_class0noarea.plot(ax=axes[0,1]).axis('equal')
axes[1,0].set_title('cl2 - 5 clusters - no area')
gabiitextools_k7_class0.plot(ax=axes[1,0]).axis('equal')
axes[0,1].set_title('cl3 - 7 clusters - area')
gabiitextools_k7_class0noarea.plot(ax=axes[1,1]).axis('equal')
axes[1,1].set_title('cl - 7 clusters - no area')
```

Out[36]:

In [37]:

```
# Maybe some of our (especially the small ones) contexts are similar to or influenced by their immediate neighbours (surroundings)
# We can weight the values in one context to account for its neighbour friends.
w5 = pysal.weights.KNN.from_dataframe(gabiitextools_class4, k=5)
w5.transform = 'r'
#neighbors & weights of the 5th observation (0-index remember)
w5[4]
```

Out[37]:

In [38]:

```
# print out a context and its immediate neigbours as a table
self_and_neighbors = [4]
self_and_neighbors.extend(w5.neighbors[4])
print(self_and_neighbors)
gabiitextools_class4.loc[self_and_neighbors]
```

Out[38]:

In [39]:

```
# Do the same thing with another set
# You can substitute other values for the '11' here and see what happens.
w5[11]
self_and_neighbors = [11]
self_and_neighbors.extend(w5.neighbors[11])
print(self_and_neighbors)
gabiitextools_class4.loc[self_and_neighbors]
```

Out[39]:

In [40]:

```
# Sanity check by plotting a set of self and neighbours as a map.
# Do the counts of differnt textile tools have similar patterns?
# Are they inconsistent? How might we interpret this local pattern?
n11 = gabiitextools_class4.loc[self_and_neighbors]
n11.plot(column='Loom Weight', cmap='autumn', legend=True)
n11.plot(column='Spool', cmap='autumn')
n11.plot(column='Spindle Whorl', cmap='autumn')
```

Out[40]:

In [41]:

```
# We can visualise how the counts of different types of finds appear in other ways.
# Do loom weights appear more often when spools do? What does this mean?
sns.pairplot(n11.drop(['SU','geometry','OBJECTID','DESCRIPTIO','Shape_Length','Shape_Area','cl','cl2','cl3','cl4'], axis=1))
```

Out[41]:

In [46]:

```
# Are some clusters more correlated than others?
sns.pairplot(gabiitextools_class0.drop(['OBJECTID','DESCRIPTIO','Shape_Length','Shape_Area','SU','geometry','cl','cl2','cl3','cl4'], axis=1), kind="reg")
plt.show()
```

In [47]:

```
# Do 7 clusters as oppossed to 5 result in more correlation?
sns.pairplot(gabiitextools_k7_class0.drop(['OBJECTID','DESCRIPTIO','Shape_Length','Shape_Area','SU','geometry','cl','cl2','cl3','cl4'], axis=1), kind="reg")
plt.show()
```

Hopefully you have:

- started thinking (and perhaps are a bit confused) about how spatial patterns of different types of finds are created, and how we can interpret them when studying data from an excavation.
- learned to combine spatial data and descriptive tables.
- learned to use some basic clustering tools, and reinforced your knowledge about how to make charts and maps.

We'll be talking more about spatial analysis methods in archaeology throughout the course.