In this notebook we'll explore the temporal dimensions of the object data. When were objects created, collected, or used? To do that we'll extract the nested temporal data, see what's there, and create a few charts.
See here for an introduction to the object data, and here to explore places associated with objects.
If you haven't already, you'll either need to harvest the object data, or unzip a pre-harvested dataset.
If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!
Some tips:
Is this thing on? If you can't edit or run any of the code cells, you might be viewing a static (read only) version of this notebook. Click here to load a live version running on Binder.
import pandas as pd
from ipyleaflet import Map, Marker, Popup, MarkerCluster
import ipywidgets as widgets
from tinydb import TinyDB, Query
from pandas import json_normalize
import altair as alt
from IPython.display import display, HTML, FileLink
# Load JSON data from file
db = TinyDB('nma_object_db.json')
records = db.all()
Object = Query()
# Convert to a dataframe
df = pd.DataFrame(records)
Events are linked to objects through the temporal field. This field contains nested data that we need to extract and flatten so we can work with it easily. We'll use json_normalize to extract the nested data and save each event to a new row.
# Use json_normalize() to explode the temporal field into multiple rows and columns
# Then merge the exploded rows back with the original dataset using the id value
# df_dates = pd.merge(df.loc[df['temporal'].notnull()], json_normalize(df.loc[df['temporal'].notnull()].to_dict('records'), record_path='temporal', meta=['id'], record_prefix='temporal_'), how='inner', on='id')
df_dates = json_normalize(df.loc[df['temporal'].notnull()].to_dict('records'), record_path='temporal', meta=['id', 'title', 'additionalType'], record_prefix='temporal_')
df_dates.head()
Now instead of having one row for each object, we have one row for each object event.
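To see what the flattening step does, here's a minimal sketch using two made-up records. The values in sample_records are invented for illustration (they're not from the collection), but the shape mirrors the temporal field described above.

```python
import pandas as pd

# Two invented records shaped like the object data (values are assumptions)
sample_records = [
    {'id': '1', 'title': 'Example postcard', 'temporal': [
        {'startDate': '1913-03-12', 'roleName': 'Date of production'},
        {'endDate': '1980', 'roleName': 'Date collected'},
    ]},
    {'id': '2', 'title': 'Example engraving', 'temporal': [
        {'startDate': '1980', 'roleName': 'Production date'},
    ]},
]

# Each item in 'temporal' becomes its own row; 'id' and 'title' are repeated on every row
flat = pd.json_normalize(
    sample_records,
    record_path='temporal',
    meta=['id', 'title'],
    record_prefix='temporal_'
)
print(flat[['id', 'temporal_roleName']])
```

The first object has two events, so it appears twice in the result, once for each event.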
How many date records do we have?
df_dates.shape
Let's extract years from the dates to make comparisons a bit easier.
# Use a regular expression to find the first four digits in the date fields
df_dates['start_year'] = df_dates['temporal_startDate'].str.extract(r'^(\d{4})').fillna(0).astype('int')
df_dates['end_year'] = df_dates['temporal_endDate'].str.extract(r'^(\d{4})').fillna(0).astype('int')
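It's worth checking how the regular expression behaves with missing or oddly formatted dates. The sketch below uses a few invented date strings. It also swaps in pd.to_numeric() for the conversion, which is a variation on the cell above rather than the same code. Because the pattern is anchored to the start of the string, anything that doesn't begin with four digits ends up as 0.

```python
import pandas as pd

# Invented examples: a full date, a bare year, a missing value, and a free-text date
dates = pd.Series(['1913-03-12', '1980', None, 'c. 1900'])

# Pull out the leading four digits, then convert to integers (0 where nothing matched)
extracted = dates.str.extract(r'^(\d{4})', expand=False)
years = pd.to_numeric(extracted, errors='coerce').fillna(0).astype(int)
print(years.tolist())
```

Note that 'c. 1900' is treated as having no year, so free-text dates like this would be silently excluded from the charts below.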
What's the earliest start_year (greater than 0)?
df_dates.loc[df_dates['start_year'] > 0]['start_year'].min()
What is it?
earliest = df_dates.loc[df_dates.loc[df_dates['start_year'] > 0]['start_year'].idxmin()]
display(HTML('<a href="http://collectionsearch.nma.gov.au/?object={}">{}</a>'.format(earliest['id'], earliest['title'])))
What's the latest end date?
df_dates['end_year'].max()
Oh, that doesn't look quite right! Let's look to see how many of the dates are in the future!
df_dates.loc[(df_dates['start_year'] > 2019) | (df_dates['end_year'] > 2019)]
Looks like these records need some editing.
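The check above compares against a fixed year. If you're running this notebook later on, you might prefer to compare against the current year instead. Here's a small sketch of that idea using an invented sample dataframe (the ids and years are made up).

```python
from datetime import date

import pandas as pd

# Invented sample rows: one plausible event and two with dates in the future
sample = pd.DataFrame({
    'id': ['a', 'b', 'c'],
    'start_year': [1913, 2999, 0],
    'end_year': [1913, 0, 3000],
})

# Use the current year rather than a hard-coded value
this_year = date.today().year
future = sample.loc[(sample['start_year'] > this_year) | (sample['end_year'] > this_year)]
print(future)
```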
Events are linked to objects in many different ways. They might document when the object was created, collected, or acquired by the museum. We can examine the types of relationships that have been documented between events and objects by looking in the temporal_roleName field.
df_dates['temporal_roleName'].value_counts()
Hmmm, you can see that data entry into this field wasn't closely controlled – there are a number of minor variations in capitalisation, format and word order. For example, we have: 'Date of production', 'Date of Production', 'Production date', and 'date of production'!
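If you wanted to tidy up these variations yourself, one possible approach is to lowercase the values and then map known synonyms to a single label. This is just a sketch using the four variations mentioned above. The synonyms mapping is hypothetical, and you'd need to extend it by hand for other values.

```python
import pandas as pd

# The four variations mentioned above
role_names = pd.Series([
    'Date of production',
    'Date of Production',
    'Production date',
    'date of production',
])

# Hypothetical mapping of alternative word orders to a preferred label
synonyms = {'production date': 'date of production'}

# Strip whitespace, lowercase, then replace whole values found in the mapping
normalised = role_names.str.strip().str.lower().replace(synonyms)
print(normalised.value_counts())
```

Lowercasing handles the capitalisation differences, but differences in word order can only be caught by the mapping.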
Some normalisation has taken place though, because the creation and production related events can be identified through the temporal_interactionType field. What sorts of values does it contain?
df_dates['temporal_interactionType'].value_counts()
There's only one value – 'Production'. According to the documentation, a value of 'Production' in interactionType indicates the event was related to the creation of the item. Let's look to see which of the values in roleName have been aggregated by the 'Production' value.
df_dates.loc[(df_dates['temporal_interactionType'] == 'Production')]['temporal_roleName'].value_counts()
So the temporal_interactionType field helps us find all the creation-related events without dealing with the variations in the ways event types are described. Yay for normalisation!
Let's create a dataframe that contains just the creation dates.
df_creation_dates = df_dates.loc[(df_dates['temporal_interactionType'] == 'Production')].copy()
df_creation_dates.shape
One other thing to note is that not every event has a start date. Some just have an end date. To make sure we have at least one date for every event, let's create a new year column – we'll set its value to start_year if it exists, or end_year if not.
df_creation_dates['year'] = df_creation_dates.apply(lambda x: x['start_year'] if x['start_year'] else x['end_year'], axis=1)
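The same fallback can also be written without apply(), using the .where() method to keep start_year where it's non-zero and take end_year otherwise. Here's a sketch on an invented three-row dataframe. It's an alternative to the cell above, not a replacement for it.

```python
import pandas as pd

# Invented events: one with both years, one with only an end year, one with neither
events = pd.DataFrame({
    'start_year': [1913, 0, 0],
    'end_year': [1915, 1980, 0],
})

# Keep start_year where it is non-zero, otherwise fall back to end_year
events['year'] = events['start_year'].where(events['start_year'] != 0, events['end_year'])
print(events)
```

Events with neither date still end up with a year of 0, which is why the charts below filter to years greater than 0.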
Time to make a chart! Let's show how the creation events are distributed over time.
# First we'll get the number of objects per year
year_counts = df_creation_dates['year'].value_counts().to_frame().reset_index()
year_counts.columns = ['year', 'count']
# Create a bar chart (limit to years greater than 0)
alt.Chart(year_counts.loc[year_counts['year'] > 0]).mark_bar(size=2).encode(
# Year on the X axis
x=alt.X('year:Q', axis=alt.Axis(format='c', title='Year of production')),
# Number of objects on the Y axis
y=alt.Y('count:Q', title='Number of objects'),
# Show details on hover
tooltip=[alt.Tooltip('year:Q', title='Year'), alt.Tooltip('count:Q', title='Objects', format=',')]
).properties(width=700)
Ok, so something interesting was happening in 1980 and 1913. Let's see if we can find out what.
In another notebook I showed how you can use the additionalType column to find out about the types of things in the collection. Let's use it to see what types of objects were created in 1980.
Let's explode additionalType and create a new dataframe with the results!
df_creation_dates_types = df_creation_dates.loc[df_creation_dates['additionalType'].notnull()][['id', 'title', 'year', 'additionalType']].explode('additionalType')
df_creation_dates_types.head()
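If you haven't used explode() before, here's a minimal sketch of what it does to a column containing lists. The two objects below are invented for illustration.

```python
import pandas as pd

# Invented objects: the first has two types, the second has one
objects = pd.DataFrame({
    'id': ['1', '2'],
    'year': [1980, 1913],
    'additionalType': [['Engravings', 'Mounts'], ['Postcards']],
})

# Each value in the list gets its own row; the other columns are repeated
exploded = objects.explode('additionalType')
print(exploded)
```

An object with more than one type is counted once for each type, so totals across types can be greater than the number of objects.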
Now we can filter by year to see what types of things were created in 1980.
created_1980 = df_creation_dates_types.loc[df_creation_dates_types['year'] == 1980]
created_1980.head()
Let's look at the top twenty types of things created in 1980!
created_1980['additionalType'].value_counts()[:20]
So the vast majority are either 'Engravings' or 'Mounts'. Let's look at one of the 'Engravings' in more detail.
# Filter by Engravings
created_1980.loc[created_1980['additionalType'] == 'Engravings'].head()
# Get the first item
item = created_1980.loc[created_1980['additionalType'] == 'Engravings'].iloc[0]
# Create a link to the collection db
display(HTML('<a href="http://collectionsearch.nma.gov.au/?object={}">{}</a>'.format(item['id'], item['title'])))
If you follow the link you'll find that the engravings were created for a new publication of Banks' Florilegium.
Can you repeat this process to find out what happened in 1913?
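To make it easier to repeat these steps for different years, you could wrap them in a small function. The top_types_for_year() helper below is hypothetical (it's not part of this notebook), and the sample data is invented. It assumes a dataframe with year and additionalType columns like the one created above.

```python
import pandas as pd

def top_types_for_year(df, year, n=20):
    """Return the n most common additionalType values for events in the given year."""
    return df.loc[df['year'] == year, 'additionalType'].value_counts().head(n)

# Invented sample data with the same columns as the exploded dataframe
sample = pd.DataFrame({
    'year': [1913, 1913, 1913, 1980],
    'additionalType': ['Postcards', 'Postcards', 'Photographs', 'Engravings'],
})

counts_1913 = top_types_for_year(sample, 1913)
print(counts_1913)
```

With the real data you'd pass in df_creation_dates_types instead of sample.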
Now that we have a dataframe that combines creation dates with object types, we can look at how the creation of particular object types changes over time. For example, let's look at 'Photographs' and 'Postcards'.
# Create a dataframe containing just Photographs and Postcards -- use .isin() to filter the additionalType field
df_photos_postcards = df_creation_dates_types.loc[(df_creation_dates_types['year'] > 0) & (df_creation_dates_types['additionalType'].isin(['Photographs', 'Postcards']))]
# Create a stacked bar chart
alt.Chart(df_photos_postcards).mark_bar(size=3).encode(
# Year on the X axis
x=alt.X('year:Q', axis=alt.Axis(format='c', title='Year of production')),
# Number of objects on the Y axis
y=alt.Y('count()', title='Number of objects'),
# Color according to the type
color='additionalType:N',
# Details on hover
tooltip=[alt.Tooltip('additionalType:N', title='Type'), alt.Tooltip('year:Q', title='Year'), alt.Tooltip('count():Q', title='Objects', format=',')]
).properties(width=700)
There's 1913 again... It's also interesting to see a shift from postcards to photos in the early decades of the 20th century.
We could add additional types to this chart, but it will get a bit confusing. Let's try another way of charting changes in the creation of the most common object types over time.
First we'll get the top twenty-five object types (which have creation dates) as a list.
# Get most common 25 values and convert to a list
top_types = df_creation_dates_types['additionalType'].value_counts()[:25].index.to_list()
top_types
Now we'll use the list of top_types to filter the creation dates, so we only have events relating to those types of objects.
# Only include records where the additionalType value is in the list of top_types
df_top_types = df_creation_dates_types.loc[(df_creation_dates_types['year'] > 0) & (df_creation_dates_types['additionalType'].isin(top_types))]
# Get the counts for year / type
top_type_counts = df_top_types.groupby('year')['additionalType'].value_counts().to_frame()
top_type_counts.columns = ['count']
top_type_counts.reset_index(inplace=True)
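Another way of getting the same year / type counts is to group by both columns and use size(). This is a sketch on invented data, offered as an alternative that names the count column in a single step.

```python
import pandas as pd

# Invented sample data with the same columns as df_top_types
sample = pd.DataFrame({
    'year': [1913, 1913, 1913, 1980],
    'additionalType': ['Postcards', 'Postcards', 'Photographs', 'Engravings'],
})

# Count the rows in each year / type group and name the resulting column 'count'
counts = sample.groupby(['year', 'additionalType']).size().reset_index(name='count')
print(counts)
```

The result has one row for each combination of year and type that actually occurs in the data.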
To chart this data we're going to use circles for each point and create 'bubble lines' for each object type to show how the number of objects created varied year by year.
# Create a chart
alt.Chart(top_type_counts).mark_circle(
# Style the circles
opacity=0.8,
stroke='black',
strokeWidth=1
).encode(
# Year on the X axis
x=alt.X('year:O', axis=alt.Axis(format='c', title='Year of production', labelAngle=0)),
# Object type on the Y axis
y=alt.Y('additionalType:N', title='Object type'),
# Size of the circles represents the number of objects
size=alt.Size('count:Q',
scale=alt.Scale(range=[0, 2000]),
legend=alt.Legend(title='Number of objects')
),
# Color the circles by object type
color=alt.Color('additionalType:N', legend=None),
# More details on hover
tooltip=[alt.Tooltip('additionalType:N', title='Type'), alt.Tooltip('year:O', title='Year'), alt.Tooltip('count:Q', title='Objects', format=',')]
).properties(
width=700
)
What patterns can you see? Hover over the circles for more information. Once again the engravings dominate, but also look at the bark paintings and cartoons. What might be happening there?
Created by Tim Sherratt for the GLAM Workbench.
Work on this notebook was supported by the Humanities, Arts and Social Sciences (HASS) Data Enhanced Virtual Lab.