Learning about spatial data and maps for archaeology (and other things)

Spatial Thinking and Skills Exercise for Theory and Practice

Made by Rachel Opitz, Archaeology, University of Glasgow

Understanding the meanings behind patterns of finds recovered through excavation is a tricky problem. We hope to distinguish activity areas, places devoted to domestic and industrial use, or inhabited places that are distinct from liminal ones. To successfully unravel these patterns, we must look not only at the distributions of different types of finds, but also at how they correlate with one another, at the character of the contexts in which they were recovered, and at their physical and social characteristics. Are they likely to be curated? Are they light and likely to be moved from one area to another by post-depositional processes? It's all a bit of a mess.

The aim of this exercise is for you to:

  • learn to work with real special finds data from an excavation, in all its messiness
  • start thinking about quantitative and spatial approaches to finds data from excavations and how they can help us better understand the patterns we see

You'll do this using data collected by the Gabii Project, a 10+ year excavation in central Italy.

As you may recall from Archaeology of Scotland, to start working with spatial data and imagery, you need to put together your toolkit. You're currently working inside something called a Jupyter notebook. It's a place to keep notes, pictures, code and maps together. You can add tools and data into your Jupyter notebook and then use them to ask spatial questions and make maps and visualisations that help answer those questions.

Let's get started... Hit 'Ctrl'+'Enter' to run the code in any cell in the page.

In [1]:
%matplotlib inline
# Matplotlib is your tool for drawing graphs and basic maps. You need this!

import pandas as pd
import requests
import fiona
import geopandas as gpd
import ipywidgets as widgets

# These are what we call prerequisites. They are basic tools you need to get started.
# Pandas manipulates data. Geopandas manipulates geographic data. They're also black and white and like to eat bamboo... 
# You need these to manipulate your data!
# Fiona helps with geographic data.
# Requests are for asking for things. It's good to be able to ask for things.
# ipywidgets supports interactivity.


# Remember to hit Ctrl+Enter to make things happen!
In [ ]:
url = 'http://ropitz.github.io/digitalantiquity/data/gabii_SU.geojson'
# This is where I put the data. It's in a format called geojson, used to represent geometry (shapes) and attributes (text).
request = requests.get(url)
# Please get me the data at that web address (url)
b = bytes(request.content)
# I will use the letter 'b' to refer to the data, like a nickname
with fiona.BytesCollection(b) as f:
    crs = f.crs
    gabii_su_poly = gpd.GeoDataFrame.from_features(f, crs=crs)
    print(gabii_su_poly.head())
# I will use the fiona tool to wrap up all the data from 'b', check the coordinate system (crs) listed in the features
# and print out the first few lines of the file so I can check everything looks ok. 
# Don't worry if you don't understand all the details of this part!
In [ ]:
# Now we have polygons, the shapes of our contexts. Let's visualise the data to double check that all is well

gabii_map1 = gabii_su_poly.plot(column='DESCRIPTIO', cmap='Blues', edgecolor='grey', figsize=(15, 15));
# 'plot' means draw me an image showing the geometry of each feature in my data. 
# We want to control things like the color of different types of features on our map. 
# I used the 'Blues' colorscale command (cmap stands for 'colour map') 
# and asked it to draw the polygons differently based on the type of feature.

The colorscale options are: Accent, Accent_r, Blues, Blues_r, BrBG, BrBG_r, BuGn, BuGn_r, BuPu, BuPu_r, CMRmap, CMRmap_r, Dark2, Dark2_r, GnBu, GnBu_r, Greens, Greens_r, Greys, Greys_r, OrRd, OrRd_r, Oranges, Oranges_r, PRGn, PRGn_r, Paired, Paired_r, Pastel1, Pastel1_r, Pastel2, Pastel2_r, PiYG, PiYG_r, PuBu, PuBuGn, PuBuGn_r, PuBu_r, PuOr, PuOr_r, PuRd, PuRd_r, Purples, Purples_r, RdBu, RdBu_r, RdGy, RdGy_r, RdPu, RdPu_r, RdYlBu, RdYlBu_r, RdYlGn, RdYlGn_r, Reds, Reds_r, Set1, Set1_r, Set2, Set2_r, Set3, Set3_r, Spectral, Spectral_r, Wistia, Wistia_r, YlGn, YlGnBu, YlGnBu_r, YlGn_r, YlOrBr, YlOrBr_r, YlOrRd, YlOrRd_r, afmhot, afmhot_r, autumn, autumn_r, binary, binary_r, bone, bone_r, brg, brg_r, bwr, bwr_r, cividis, cividis_r, cool, cool_r, coolwarm, coolwarm_r, copper, copper_r, cubehelix, cubehelix_r, flag, flag_r, gist_earth, gist_earth_r, gist_gray, gist_gray_r, gist_heat, gist_heat_r, gist_ncar, gist_ncar_r, gist_rainbow, gist_rainbow_r, gist_stern, gist_stern_r, gist_yarg, gist_yarg_r, gnuplot, gnuplot2, gnuplot2_r, gnuplot_r, gray, gray_r, hot, hot_r, hsv, hsv_r, inferno, inferno_r, jet, jet_r, magma, magma_r, nipy_spectral, nipy_spectral_r, ocean, ocean_r, pink, pink_r, plasma, plasma_r, prism, prism_r, rainbow, rainbow_r, seismic, seismic_r, spring, spring_r, summer, summer_r, tab10, tab10_r, tab20, tab20_r, tab20b, tab20b_r, tab20c, tab20c_r, terrain, terrain_r, viridis, viridis_r, winter, winter_r

Swap out 'Blues' in the cell above for any of these options...

In [ ]:
# Now I'm going to bring in all the basic Gabii special finds data - descriptions, object types, IDs and the contexts from which they come.
# We've had a few special finds over the years.
sf_su = pd.read_csv("https://raw.githubusercontent.com/ropitz/gabii_experiments/master/spf_SU.csv")
sf_su

One of our area supervisors, Troy, is super excited about tools related to textile production. They're a great example of how we think about special finds at Gabii. Multiple types of finds are related to textile production. Do we find all types everywhere? Are certain types of tools more concentrated in one type of context or one area than others? Troy has lots of questions about the patterns of places where we find these tools. Do they provide evidence for early textile production? Are they a major factor in the city's early wealth? Do we find the same things in later periods? After all, people under the Republic and Empire wore clothes... Loom weights, spools and spindle whorls are the most common weaving tools at Gabii.

In [ ]:
#Let's pull all those find types out of the big list.
types = ['Loom Weight','Spool','Spindle Whorl']
textile_tools = sf_su.loc[sf_su['SF_OBJECT_TYPE'].isin(types)]
textile_tools
In [ ]:
# Now let's count up how many of these tools appear in each context (SU).
# This command will print out a list of the number of textile tools in each SU next to that SU number.
textile_tools['SU'].value_counts()
In [ ]:
#Then let's combine our polygons representing context shape and location
#with the special finds data
# We do this with a command called 'merge'

gabii_textools = gabii_su_poly.merge(textile_tools, on='SU')

# adding .head() to the end of a dataframe name will print out just the first few rows.
gabii_textools.head()
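If 'merge' is new to you, here's a minimal sketch with invented data (the SU numbers and values below are made up for illustration): rows from the two tables are matched wherever their 'SU' values agree, and the columns from both tables are combined.

```python
import pandas as pd

# Two toy tables sharing an 'SU' column (invented numbers, not Gabii records)
shapes = pd.DataFrame({'SU': [1001, 1002, 1003], 'Shape_Area': [12.5, 40.0, 7.2]})
finds = pd.DataFrame({'SU': [1002, 1003], 'SF_OBJECT_TYPE': ['Spool', 'Loom Weight']})

# Only SUs present in both tables survive the default (inner) merge
merged = shapes.merge(finds, on='SU')
print(merged)
```

Notice that SU 1001 drops out: it has a shape but no finds, so an inner merge discards it.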
In [ ]:
# If we want to see this result as a map, we just add the .plot command to the end of the dataframe's name

gabii_textools.plot(column='SF_OBJECT_TYPE', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5)

OK, what do you see here? Compare the distribution of each type of textile tool. Do some types seem to be concentrated in certain areas? How might you check? What factors might contribute to this pattern? Do big layers simply aggregate lots of stuff? Do late dumps contain early materials? Why would one type of tool appear where the others don't?

In [ ]:
# We can try and see the relationship between layer size and count by sorting
#our list of finds by the surface area of each layer.
# We use the command 'sort_values'
gabii_textools.sort_values(by=['Shape_Area'],ascending=False)
In [ ]:
# We have a couple of enormous colluvial layers that should probably be excluded.
# Outliers will mess with your analysis. Cut out these layers by excluding SUs with a surface area greater than 800.
gabii_textools2 = gabii_textools.loc[gabii_textools['Shape_Area']<800]
# We'll plot the result in the next cell to check.
In [ ]:
# That's better. Plot the results to see that you've removed the big colluvial layers.
gabii_textools2.plot(column='SF_OBJECT_TYPE', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5)
In [ ]:
# OK, count up how many of each tool type appears in each SU using the 'groupby' command
textools_counts = gabii_textools2.groupby('SU')['SF_OBJECT_TYPE'].value_counts().unstack().fillna(0)
# Sort the list so that the SUs with the most stuff end up at the top.
textools_counts.sort_values(by=['Loom Weight','Spindle Whorl','Spool'], ascending=False)
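Here's how that groupby-count-unstack chain works on a tiny invented finds list (the SU numbers below are made up): each find starts as a row, and we end up with one row per SU and one column per tool type.

```python
import pandas as pd

# Invented finds list: one row per find, with its context (SU) and type
finds = pd.DataFrame({
    'SU': [1001, 1001, 1001, 1002, 1002, 1003],
    'SF_OBJECT_TYPE': ['Spool', 'Spool', 'Loom Weight',
                       'Loom Weight', 'Spool', 'Spindle Whorl'],
})

# Count each type within each SU, pivot the types out into columns,
# and fill in 0 where a type never turns up in that SU
counts = finds.groupby('SU')['SF_OBJECT_TYPE'].value_counts().unstack().fillna(0)
print(counts)
```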
In [ ]:
# Merge your textile tool counts with your spatial data for the contexts
# Because both dataframes have a 'SU' column, you can use this to match up the rows. 
gabii_textools_counts = gabii_su_poly.merge(textools_counts, on='SU')
gabii_textools_counts.head()
In [ ]:
# Let's start by looking at each class of textile tool individually. 
# Plot the counts of each type of find spatially
gabii_textools_counts.plot(column='Loom Weight', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5)
gabii_textools_counts.plot(column='Spindle Whorl', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5)
gabii_textools_counts.plot(column='Spool', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5)
In [ ]:
# Overlay all three tool types on a single map, one colour ramp each.
base = gabii_textools_counts.plot(column='Loom Weight', cmap='Blues', figsize=(15, 15), legend=True, alpha=0.7)
gabii_textools_counts.plot(ax=base, column='Spindle Whorl', cmap='Reds', alpha=0.7)
gabii_textools_counts.plot(ax=base, column='Spool', cmap='Greens', alpha=0.7);
In [ ]:
# It's hard to see what's happening when we have to scroll. 
# Let's put the maps side by side.
import matplotlib.pyplot as plt
fig, axes = plt.subplots(ncols=3,figsize=(15, 5))
gabii_textools_counts.plot(column='Loom Weight', cmap='autumn',  ax=axes[0], legend=True).axis('equal')
gabii_textools_counts.plot(column='Spindle Whorl', cmap='autumn', ax=axes[1]).axis('equal')
gabii_textools_counts.plot(column='Spool', cmap='autumn',ax=axes[2]).axis('equal')

Can you see any patterns here? Do the different types of tools concentrate in the same parts of the site? Why might different types of tools have different distributions?

In [ ]:
# I think the distributions of different weaving tools vary.
# To investigate further, we are going to need more tools.
import pysal  # spatial analysis tools (newer releases split these into libpysal and friends)
from sklearn import cluster  # scikit-learn's clustering algorithms, including k-means
import seaborn as sns  # statistical plotting
import numpy as np  # fast numerical arrays

We're going to use cluster analysis to try and better understand our patterns. Clustering is a broad set of techniques for finding groups within a data set. When we cluster observations, we want items in the same group to be similar and items in different groups to be dissimilar. Clustering allows us to identify which things are alike on the basis of multiple characteristics. K-means clustering is a simple and frequently applied clustering method for splitting a dataset into a set of k (k being an arbitrary number you get to choose) groups.
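Before applying it to the Gabii data, here's k-means in miniature on some invented 2D points: two obvious groups, and the algorithm recovers them. This is just a sketch of the technique, not part of the Gabii analysis.

```python
import numpy as np
from sklearn import cluster

# Two obvious groups of invented points: one near (0, 0), one near (10, 10)
pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])

# Ask k-means for k=2 groups; each point gets a cluster label (0 or 1)
km = cluster.KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit(pts).labels_
print(labels)
```

Points in the same group share a label; which group gets which number is arbitrary.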

In [ ]:
# Next step: cluster together contexts where the pattern of the three types of textile tools are similar, 
# with and without respect to the size of the context.
# Make 5 clusters and account for the size of the context and counts of different types of tools. Drop all the other fields.
km5 = cluster.KMeans(n_clusters=5)
km5cls = km5.fit(gabii_textools_counts.drop(['geometry', 'OBJECTID','DESCRIPTIO','Shape_Length','SU'], axis=1).values)
km5cls

Each cluster produced should contain the SUs that are similar to one another on the basis of the number of each type of textile tool and the size of the surface area of the SU.

In [ ]:
# Plot the clusters, groups of contexts that have similar textile tool assemblages.
# Give a different colour to the SUs that belong to each cluster.

f1, ax = plt.subplots(1, figsize=(15,15))

gabii_textools_counts.assign(cl=km5cls.labels_)\
   .plot(column='cl', categorical=True, legend=True, \
         linewidth=0.1, cmap='Accent', edgecolor='white', ax=ax)

ax.set_axis_off()

plt.show()
In [ ]:
#Do the same, ignoring the size of the context.
km5 = cluster.KMeans(n_clusters=5)
km5cls2 = km5.fit(gabii_textools_counts.drop(['geometry', 'OBJECTID','DESCRIPTIO','Shape_Length','SU','Shape_Area'], axis=1).values)
f2, ax = plt.subplots(1, figsize=(15,15))

gabii_textools_counts.assign(cl2=km5cls2.labels_)\
   .plot(column='cl2', categorical=True, legend=True, \
         linewidth=0.1, cmap='Accent', edgecolor='white', ax=ax)

ax.set_axis_off()

plt.show()

The patterns are definitely different. How can we interpret the fact that context size affects the pattern of the distribution of textile tools? Do big units, which perhaps represent dumps or colluvial mashups, have a fundamentally different character than the varied small contexts?
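Part of the answer is mechanical: k-means works on raw Euclidean distances, so a column measured in large numbers (like surface area) can swamp columns measured in small ones (like tool counts). A toy sketch with invented contexts makes the point:

```python
import numpy as np
from sklearn import cluster

# Invented contexts: columns are [loom weights, spools, surface area].
# A and C are huge layers; B and D are small. A/B are loom-weight rich; C/D are spool rich.
with_area = np.array([
    [5, 0, 1000.0],   # A
    [5, 0, 10.0],     # B
    [0, 5, 1000.0],   # C
    [0, 5, 10.0],     # D
])
counts_only = with_area[:, :2]  # drop the area column

lab_area = cluster.KMeans(n_clusters=2, n_init=10, random_state=0).fit(with_area).labels_
lab_counts = cluster.KMeans(n_clusters=2, n_init=10, random_state=0).fit(counts_only).labels_

# With area included, the big layers A and C group together (area dominates the distance);
# on counts alone, A pairs with B and C with D (assemblage composition dominates).
print(lab_area, lab_counts)
```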

In [ ]:
# Look at the difference with and without context size taken into account.
fig, axes = plt.subplots(ncols=2,figsize=(15, 5))
gabii_textools_counts.assign(cl2=km5cls2.labels_)\
   .plot(column='cl2', categorical=True, legend=True, \
         linewidth=0.1, cmap='Accent', edgecolor='white', ax=axes[0]).axis('equal')
gabii_textools_counts.assign(cl=km5cls.labels_)\
   .plot(column='cl', categorical=True, legend=True, \
         linewidth=0.1, cmap='Accent', edgecolor='white', ax=axes[1]).axis('equal')
In [30]:
# assign the cluster IDs to each context permanently
gabiitextools_clas = gabii_textools_counts.assign(cl=km5cls.labels_)
gabiitextools_class = gabiitextools_clas.assign(cl2=km5cls2.labels_)
gabiitextools_class.head()
Out[30]:
DESCRIPTIO OBJECTID SU Shape_Area Shape_Length geometry Loom Weight Spindle Whorl Spool cl cl2
0 DEP 28 2258 77.018159 45.450094 POLYGON Z ((1416013.191 5144212.5308 61.347999... 0.0 0.0 1.0 2 0
1 DEP 44 516 43.687257 39.119255 POLYGON Z ((1415914.1831 5144202.0853 64.69299... 1.0 0.0 1.0 3 0
2 DEP 53 587 63.212851 42.805154 POLYGON Z ((1415907.6408 5144196.9608 64.63250... 1.0 0.0 0.0 2 4
3 DEP 77 1306 21.609902 35.183961 POLYGON Z ((1415935.544 5144182.803 63.4133000... 0.0 0.0 1.0 3 0
4 DEP 115 1327 122.494594 52.841679 POLYGON Z ((1415936.349 5144162.362 63.1217999... 1.0 0.0 0.0 4 4
In [31]:
# Now let's look at some individual classes, with and without context size accounted for in the analyses.
gabiitextools_class0=gabiitextools_class.loc[gabiitextools_class['cl']==0]
gabiitextools_class0noarea=gabiitextools_class.loc[gabiitextools_class['cl2']==0]
fig, axes = plt.subplots(ncols=2,figsize=(15, 5))
gabiitextools_class0.plot(ax=axes[0], legend=True).axis('equal')
gabiitextools_class0noarea.plot(ax=axes[1]).axis('equal')
Out[31]:
(1415891.3709749999, 1416036.1953249997, 5144113.125025001, 5144254.3864750005)
In [32]:
# What happens when we change the number of clusters (groups)?
km7 = cluster.KMeans(n_clusters=7)
km7cls3 = km7.fit(gabii_textools_counts.drop(['geometry', 'OBJECTID','DESCRIPTIO','Shape_Length','SU'], axis=1).values)
f3, ax = plt.subplots(1, figsize=(15,15))

gabii_textools_counts.assign(cl3=km7cls3.labels_)\
   .plot(column='cl3', categorical=True, legend=True, \
         linewidth=0.1, cmap='Accent', edgecolor='white', ax=ax)

ax.set_axis_off()

plt.show()

That also changes things. Without going into too much detail, finding the ideal number of clusters is something of a dark art. Try playing around with the number of clusters in the notebook, or with the size cut-off for inclusion, and see how the groupings shift.
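One common sanity check is the 'elbow' heuristic: fit k-means for a range of k values and watch the inertia (the total within-cluster sum of squared distances), which always falls as k grows; look for the point where it stops falling steeply. A sketch on invented data:

```python
import numpy as np
from sklearn import cluster

rng = np.random.RandomState(0)
# Invented data: three loose blobs of points centred on (0,0), (5,5) and (10,10)
blobs = np.vstack([rng.normal(loc, 0.5, size=(20, 2)) for loc in (0, 5, 10)])

# inertia_ always drops as k grows; the 'elbow' is where adding more
# clusters stops helping much (here, around k=3, matching the three blobs)
inertias = [cluster.KMeans(n_clusters=k, n_init=10, random_state=0).fit(blobs).inertia_
            for k in range(1, 7)]
print([round(i, 1) for i in inertias])
```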

In [33]:
# Use 7 clusters and plot them
km7 = cluster.KMeans(n_clusters=7)
km7cls4 = km7.fit(gabii_textools_counts.drop(['geometry', 'OBJECTID','DESCRIPTIO','Shape_Length','SU','Shape_Area'], axis=1).values)
f4, ax = plt.subplots(1, figsize=(15,15))

gabii_textools_counts.assign(cl4=km7cls4.labels_)\
   .plot(column='cl4', categorical=True, legend=True, \
         linewidth=0.1, cmap='Accent', edgecolor='white', ax=ax)

ax.set_axis_off()

plt.show()
In [34]:
# Let's set up to investigate some of the individual clusters
gabiitextools_class3=gabiitextools_class.assign(cl3=km7cls3.labels_)
gabiitextools_class4=gabiitextools_class3.assign(cl4=km7cls4.labels_)
gabiitextools_class4.head()
Out[34]:
DESCRIPTIO OBJECTID SU Shape_Area Shape_Length geometry Loom Weight Spindle Whorl Spool cl cl2 cl3 cl4
0 DEP 28 2258 77.018159 45.450094 POLYGON Z ((1416013.191 5144212.5308 61.347999... 0.0 0.0 1.0 2 0 6 1
1 DEP 44 516 43.687257 39.119255 POLYGON Z ((1415914.1831 5144202.0853 64.69299... 1.0 0.0 1.0 3 0 2 6
2 DEP 53 587 63.212851 42.805154 POLYGON Z ((1415907.6408 5144196.9608 64.63250... 1.0 0.0 0.0 2 4 6 6
3 DEP 77 1306 21.609902 35.183961 POLYGON Z ((1415935.544 5144182.803 63.4133000... 0.0 0.0 1.0 3 0 5 1
4 DEP 115 1327 122.494594 52.841679 POLYGON Z ((1415936.349 5144162.362 63.1217999... 1.0 0.0 0.0 4 4 1 6
In [35]:
# set up variables to store several classes, with and without context size taken into account.
gabiitextools_class0=gabiitextools_class4.loc[gabiitextools_class4['cl']==0]
gabiitextools_class0noarea=gabiitextools_class4.loc[gabiitextools_class4['cl2']==0]
gabiitextools_k7_class0=gabiitextools_class4.loc[gabiitextools_class4['cl3']==0]
gabiitextools_k7_class0noarea=gabiitextools_class4.loc[gabiitextools_class4['cl4']==0]
fig, axes = plt.subplots(ncols=2,nrows=2,figsize=(15, 10))
gabiitextools_class0.plot(ax=axes[0,0]).axis('equal')
axes[0,0].set_title('cl - 5 clusters - area')
gabiitextools_class0noarea.plot(ax=axes[0,1]).axis('equal')
axes[0,1].set_title('cl2 - 5 clusters - no area')
gabiitextools_k7_class0.plot(ax=axes[1,0]).axis('equal')
axes[1,0].set_title('cl3 - 7 clusters - area')
gabiitextools_k7_class0noarea.plot(ax=axes[1,1]).axis('equal')
axes[1,1].set_title('cl4 - 7 clusters - no area')
Out[35]:
Text(0.5,1,'cl4 - 7 clusters - no area')
In [36]:
# Do the same for cluster 3.
gabiitextools_class3=gabiitextools_class4.loc[gabiitextools_class4['cl']==3]
gabiitextools_class3noarea=gabiitextools_class4.loc[gabiitextools_class4['cl2']==3]
gabiitextools_k7_class3=gabiitextools_class4.loc[gabiitextools_class4['cl3']==3]
gabiitextools_k7_class3noarea=gabiitextools_class4.loc[gabiitextools_class4['cl4']==3]
fig, axes = plt.subplots(ncols=2,nrows=2,figsize=(15, 10))
gabiitextools_class3.plot(ax=axes[0,0]).axis('equal')
axes[0,0].set_title('cl - 5 clusters - area')
gabiitextools_class3noarea.plot(ax=axes[0,1]).axis('equal')
axes[0,1].set_title('cl2 - 5 clusters - no area')
gabiitextools_k7_class3.plot(ax=axes[1,0]).axis('equal')
axes[1,0].set_title('cl3 - 7 clusters - area')
gabiitextools_k7_class3noarea.plot(ax=axes[1,1]).axis('equal')
axes[1,1].set_title('cl4 - 7 clusters - no area')
Out[36]:
Text(0.5,1,'cl4 - 7 clusters - no area')
In [37]:
# Maybe some of our contexts (especially the small ones) are similar to, or influenced by, their immediate neighbours.
# We can weight the values in one context to account for its neighbours.
w5 = pysal.weights.KNN.from_dataframe(gabiitextools_class4, k=5)
w5.transform = 'r'
# neighbours & weights of the 5th observation (remember, Python counts from 0)
w5[4]
Out[37]:
{70: 0.2, 29: 0.2, 89: 0.2, 49: 0.2, 53: 0.2}
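Under the hood the idea is simple: for each observation, find its k nearest observations and, after row-standardisation, give each a weight of 1/k (hence the 0.2s above for k=5). A rough numpy-only sketch on invented points; pysal's real implementation handles geometries and much more:

```python
import numpy as np

rng = np.random.RandomState(0)
pts = rng.rand(10, 2)  # invented context centroids, not Gabii data

# For observation 4: distances to every point, then take the 5 closest others
d = np.linalg.norm(pts - pts[4], axis=1)
neighbours = np.argsort(d)[1:6]   # position 0 of the sort is the point itself (distance 0)
weights = np.full(5, 1 / 5)       # row-standardised: equal weights that sum to 1

print(dict(zip(neighbours.tolist(), weights.tolist())))
```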
In [38]:
# print out a context and its immediate neighbours as a table
self_and_neighbors = [4]
self_and_neighbors.extend(w5.neighbors[4])
print(self_and_neighbors)
gabiitextools_class4.loc[self_and_neighbors]
[4, 70, 29, 89, 49, 53]
Out[38]:
DESCRIPTIO OBJECTID SU Shape_Area Shape_Length geometry Loom Weight Spindle Whorl Spool cl cl2 cl3 cl4
4 DEP 115 1327 122.494594 52.841679 POLYGON Z ((1415936.349 5144162.362 63.1217999... 1.0 0.0 0.0 4 4 1 6
70 DEP 2007 1412 14.598572 18.034358 POLYGON Z ((1415950.9555 5144163.0872 62.55220... 1.0 0.0 0.0 0 4 5 6
29 DEP 1204 1182 6.973412 20.548798 POLYGON Z ((1415939.8974 5144165.7659 63.16879... 1.0 0.0 0.0 0 4 0 6
89 FL 2647 1173 62.610385 86.251233 POLYGON Z ((1415934.0065 5144169.8251 63.22079... 2.0 0.0 0.0 2 4 6 6
49 DEP 1766 1279 14.907877 31.751468 POLYGON Z ((1415936.5361 5144180.7074 63.37669... 4.0 1.0 0.0 0 4 5 0
53 N-S TOMB 1815 1380 0.384362 2.404675 POLYGON Z ((1415956.7658 5144170.944 62.338099... 1.0 0.0 0.0 0 4 0 6
In [39]:
# Do the same thing with another set
# You can substitute other values for the '11' here and see what happens.
w5[11]
self_and_neighbors = [11]
self_and_neighbors.extend(w5.neighbors[11])
print(self_and_neighbors)
gabiitextools_class4.loc[self_and_neighbors]
[11, 79, 128, 126, 5, 107]
Out[39]:
DESCRIPTIO OBJECTID SU Shape_Area Shape_Length geometry Loom Weight Spindle Whorl Spool cl cl2 cl3 cl4
11 DEP 267 3012 35.964558 24.642614 POLYGON Z ((1415981.4906 5144220.1063 61.93349... 1.0 0.0 0.0 3 4 2 6
79 DEP 2202 3089 41.966089 30.456902 POLYGON Z ((1415984.002 5144225.9554 61.885899... 1.0 0.0 0.0 3 4 2 6
128 FILL 3375 3311 5.350597 8.938988 POLYGON Z ((1415984.1495 5144221.7165 61.46430... 0.0 0.0 2.0 0 0 0 1
126 DEP 3361 3306 8.341407 12.880923 POLYGON Z ((1415984.0478 5144221.7844 61.49159... 0.0 1.0 3.0 0 3 0 4
5 FILL 134 3021 10.873247 13.556097 POLYGON Z ((1415976.6777 5144220.3635 61.86689... 0.0 0.0 3.0 0 0 0 4
107 DEP 2808 3167 9.853576 13.615772 POLYGON Z ((1415982.0646 5144219.3923 61.6106,... 0.0 0.0 1.0 0 0 0 1
In [40]:
# Sanity check by plotting a set of self and neighbours as a map.
# Do the counts of different textile tools have similar patterns?
# Are they inconsistent? How might we interpret this local pattern?
n11 = gabiitextools_class4.loc[self_and_neighbors]
n11.plot(column='Loom Weight', cmap='autumn', legend=True)
n11.plot(column='Spool', cmap='autumn')
n11.plot(column='Spindle Whorl', cmap='autumn')
Out[40]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f719f2ec748>
In [41]:
# We can visualise how the counts of different types of finds appear in other ways.
# Do loom weights appear more often when spools do? What does this mean?
sns.pairplot(n11.drop(['SU','geometry','OBJECTID','DESCRIPTIO','Shape_Length','Shape_Area','cl','cl2','cl3','cl4'], axis=1))
Out[41]:
<seaborn.axisgrid.PairGrid at 0x7f719f159358>
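The pairplot question ('do loom weights appear more often when spools do?') is really a correlation question, which you can also answer numerically. A sketch with invented per-SU counts (not the Gabii figures):

```python
import pandas as pd

# Invented per-SU counts: loom weights and spools rise together here;
# spindle whorls do their own thing
counts = pd.DataFrame({
    'Loom Weight':   [0, 1, 2, 4, 6],
    'Spool':         [1, 2, 3, 5, 7],
    'Spindle Whorl': [3, 0, 2, 0, 1],
})

# Pearson correlation matrix: values near +1 mean two counts rise and fall together
corr = counts.corr()
print(corr.round(2))
```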
In [46]:
# Are some clusters more correlated than others?
sns.pairplot(gabiitextools_class0.drop(['OBJECTID','DESCRIPTIO','Shape_Length','Shape_Area','SU','geometry','cl','cl2','cl3','cl4'], axis=1), kind="reg")
plt.show()
In [47]:
# Do 7 clusters as opposed to 5 result in more correlation?
sns.pairplot(gabiitextools_k7_class0.drop(['OBJECTID','DESCRIPTIO','Shape_Length','Shape_Area','SU','geometry','cl','cl2','cl3','cl4'], axis=1), kind="reg")
plt.show()

That concludes this tutorial.

Hopefully you have:

  • started thinking (and perhaps are a bit confused) about how spatial patterns of different types of finds are created, and how we can interpret them when studying data from an excavation.
  • learned to combine spatial data and descriptive tables.
  • learned to use some basic clustering tools, and reinforced your knowledge about how to make charts and maps.

We'll be talking more about spatial analysis methods in archaeology throughout the course.