In this notebook we'll explore the temporal dimensions of the object data. When were objects created, collected, or used? To do that we'll extract the nested temporal data, see what's there, and create a few charts.
See here for an introduction to the object data, and here to explore places associated with objects.
If you haven't already, you'll need to either harvest the object data or unzip a pre-harvested dataset.
If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!
Some tips:
Is this thing on? If you can't edit or run any of the code cells, you might be viewing a static (read only) version of this notebook. Click here to load a live version running on Binder.
import pandas as pd
from tinydb import TinyDB
from pandas import json_normalize
import altair as alt
from IPython.display import display, HTML
# Load the harvested object records from the TinyDB JSON file
db = TinyDB('nma_object_db.json')
records = db.all()
# Convert to a dataframe
df = pd.DataFrame(records)
Events are linked to objects through the temporal field. This field contains nested data that we need to extract and flatten so we can work with it easily. We'll use json_normalize to extract the nested data and save each event to a new row.
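To see what json_normalize does with the nested temporal lists, here's a minimal sketch on two made-up records shaped like the harvested data (the ids, titles, and events are hypothetical). record_path picks the nested list to expand, meta copies parent fields onto every event row, and record_prefix keeps the nested column names distinct.

```python
import pandas as pd

# Two hypothetical object records, each with a list of nested events under 'temporal'
records = [
    {'id': 1, 'title': 'Postcard',
     'temporal': [
         {'type': 'Event', 'startDate': '1908-06', 'roleName': 'Date of use'},
         {'type': 'Event', 'startDate': '1907', 'interactionType': 'Production'},
     ]},
    {'id': 2, 'title': 'Spinning top',
     'temporal': [{'type': 'Event', 'startDate': '1926'}]},
]

# One output row per event; 'id' and 'title' are repeated from the parent record.
# Keys missing from an event (e.g. roleName) simply become NaN.
events = pd.json_normalize(
    records,
    record_path='temporal',
    meta=['id', 'title'],
    record_prefix='temporal_',
)
print(events[['id', 'temporal_startDate', 'temporal_roleName']])
```

The first record has two events, so its id appears on two rows; that's why the flattened dataframe has more rows than there are objects.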
# Use json_normalize() to explode the temporal field into multiple rows and columns (one row per event)
# The meta parameter copies each object's id, title, and additionalType onto its event rows,
# so there's no need to merge the results back into the original dataframe
df_dates = json_normalize(df.loc[df['temporal'].notnull()].to_dict('records'), record_path='temporal', meta=['id', 'title', 'additionalType'], record_prefix='temporal_')
df_dates.head()
| | temporal_type | temporal_title | temporal_startDate | temporal_endDate | temporal_interactionType | temporal_roleName | temporal_description | id | title | additionalType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Event | 21 February 2009 | 2009-02-21 | 2009-02-21 | Production | NaN | NaN | 195843 | Reproduction cartoon titled 'Better than the b... | [Political cartoons] |
| 1 | Event | June 1908 | 1908-06 | 1908-06 | NaN | Date of use | NaN | 31257 | Kind Regards From Newtown | [Postcards] |
| 2 | Event | 26 January 1982 | 1982-01-26 | 1982-01-26 | NaN | Associated date | NaN | 135579 | Protests during the campaign to save the Frank... | [Photographs] |
| 3 | Event | 1926 | 1926 | 1926 | NaN | Date acquired by donor | by Australian Institute of Anatomy | 6840 | Spinning top | [Centre of gravity toys] |
| 4 | Event | 1872 | 1872 | 1872 | NaN | Associated date | NaN | 251967 | Financial document from Tirranna Picnic Race C... | [Financial records] |
Now instead of having one row for each object, we have one row for each object event.
How many date records do we have?
df_dates.shape
(39219, 10)
Let's extract years from the dates to make comparisons a bit easier.
# Use a regular expression to capture the four digits at the start of each date string
# expand=False returns a Series; dates with no leading year become 0
df_dates['start_year'] = df_dates['temporal_startDate'].str.extract(r'^(\d{4})', expand=False).fillna(0).astype('int')
df_dates['end_year'] = df_dates['temporal_endDate'].str.extract(r'^(\d{4})', expand=False).fillna(0).astype('int')
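Dates in this dataset come in several precisions (full date, year and month, year only), which is why we only look at the leading four digits. Here's a small sketch of how the pattern behaves on some hypothetical date strings, including a missing value and a free-text date. It uses pd.to_numeric for the conversion, which is a swapped-in choice rather than the notebook's astype chain.

```python
import pandas as pd

# Hypothetical date strings: full date, year-month, year only, missing, free text
dates = pd.Series(['2009-02-21', '1908-06', '1926', None, 'c. 1850'])

# '^' anchors the pattern to the start of the string, so 'c. 1850' doesn't match;
# expand=False returns a Series rather than a one-column DataFrame
year_strings = dates.str.extract(r'^(\d{4})', expand=False)

# Convert to numbers and use 0 as the "no year" marker, as the notebook does
years = pd.to_numeric(year_strings).fillna(0).astype(int)
print(years.tolist())  # [2009, 1908, 1926, 0, 0]
```

Because of the anchor, a date like 'c. 1850' is treated as having no year. Dropping the '^' would pick up 1850 instead, at the cost of possibly matching other four-digit numbers buried in free text.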
What's the earliest start_year (greater than 0)?
df_dates.loc[df_dates['start_year'] > 0]['start_year'].min()
1001
What is it?
earliest = df_dates.loc[df_dates.loc[df_dates['start_year'] > 0]['start_year'].idxmin()]
display(HTML('<a href="http://collectionsearch.nma.gov.au/?object={}">{}</a>'.format(earliest['id'], earliest['title'])))
What's the latest end_year?
df_dates['end_year'].max()
2992
Oh, that doesn't look quite right! Let's see how many of the dates are in the future.
df_dates.loc[(df_dates['start_year'] > 2019) | (df_dates['end_year'] > 2019)]
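The check above hard-codes 2019, the year the notebook was written. If you re-run it later, you might prefer to compare against the current year. Here's a minimal sketch using a few rows modelled on the output below; the dataframe is a stand-in for df_dates.

```python
import datetime
import pandas as pd

# Stand-in rows modelled on the notebook's output (ids and years as shown there)
df = pd.DataFrame({
    'id': [195843, 213266, 67099],
    'start_year': [2009, 2082, 2992],
    'end_year': [2009, 2082, 2992],
})

# Compare against today's year instead of a fixed one,
# so the check stays meaningful whenever the notebook is re-run
this_year = datetime.date.today().year
future = df.loc[(df['start_year'] > this_year) | (df['end_year'] > this_year)]
print(future['id'].tolist())
```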
| | temporal_type | temporal_title | temporal_startDate | temporal_endDate | temporal_interactionType | temporal_roleName | temporal_description | id | title | additionalType | start_year | end_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5787 | Event | 17 September 2082 | 2082-09-17 | 2082-09-17 | Production | NaN | NaN | 213266 | Courtroom sketch 'NT Ranger, Mr. Roth.' by Ver... | [Courtroom drawings] | 2082 | 2082 |
| 6360 | Event | 7 January 2085 | 2085-01-07 | 2085-01-07 | Production | NaN | NaN | 195336 | Woven basket with feathers and ochre | [Baskets] | 2085 | 2085 |
| 12505 | Event | 20 March 2085 | 2085-03-20 | 2085-03-20 | Production | NaN | NaN | 146492 | Feathered stick with handle | [Ornaments] | 2085 | 2085 |
| 23828 | Event | 12 December 2992 | 2992-12-12 | 2992-12-12 | NaN | Associated date | NaN | 67099 | Souvenir beaker - Princess Anne | [Commemorative mugs] | 2992 | 2992 |
Looks like these records need some editing.
Events are linked to objects in many different ways: they might document when the object was created, collected, or acquired by the museum. We can examine the types of relationships that have been documented between events and objects by looking in the temporal_roleName field.
df_dates['temporal_roleName'].value_counts()
Date of publication 5022
Associated date 4015
Date made 3950
Date of event 2995
Associated period 2973
Date collected 2503
Date of voyage 2477
Date photographed 1979
Period of use 1706
Date created 1473
Date of production 1030
Date of use 936
Date of issue 857
Date acquired by donor 544
Date acquired by NMA 451
Date written 422
Date of work 399
Date compiled 201
Date worn 198
Date drawn 162
Date of Event 139
Date acquired 128
Content created 120
Date posted 116
Date of purchase 78
Date awarded 76
Date printed 70
Date presented 69
Production date 58
Date designed 47
Date of death 24
Date painted 20
Date of restoration 18
Date of conversion 14
Date reprinted 12
Date of correspondence 10
Date of birth 9
Date built 9
Date of patent 9
Date of Publication 7
date created 6
date of publication 5
Date Acquired 4
Date of Production 4
date of production 2
Date of Correspondence 2
Date repographed 1
date of correspondence 1
Date reproduced 1
date made 1
Period of Use 1
Date of Work 1
Associated Period 1
date painted 1
Name: temporal_roleName, dtype: int64
Hmmm, you can see that data entry into this field wasn't closely controlled – there are a number of minor variations in capitalisation, format and word order. For example, we have: 'Date of production', 'Date of Production', 'Production date', and 'date of production'!
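If we wanted to tidy these values ourselves, a common approach is to fold case first and then map the remaining word-order variants onto a preferred form. Here's a minimal sketch on the production-related variants quoted above. The synonyms mapping is a hypothetical example; a real one would need to be built by reviewing the full list of values.

```python
import pandas as pd

role_names = pd.Series(['Date of production', 'Date of Production',
                        'Production date', 'date of production', 'Date made'])

# Step 1: trim whitespace and lowercase, which merges the capitalisation variants
normalised = role_names.str.strip().str.lower()

# Step 2: map word-order variants onto one preferred form
# (hypothetical mapping, for illustration only)
synonyms = {'production date': 'date of production'}
normalised = normalised.replace(synonyms)

print(normalised.value_counts())
```

Lowercasing alone handles most of the variation in the list above; the explicit mapping is only needed for genuinely different phrasings like 'Production date'.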
Some normalisation has taken place, though: creation- and production-related events can be identified through the temporal_interactionType field. What sorts of values does it contain?
df_dates['temporal_interactionType'].value_counts()
Production 18012
Name: temporal_interactionType, dtype: int64
There's only one value: 'Production'. According to the documentation, a value of 'Production' in interactionType indicates the event was related to the creation of the item. Let's see which roleName values have been grouped under 'Production'.
df_dates.loc[(df_dates['temporal_interactionType'] == 'Production')]['temporal_roleName'].value_counts()
Date of publication 5016
Date made 3950
Date photographed 1761
Date created 1473
Date of production 1030
Date of issue 674
Date written 422
Date of work 374
Date compiled 201
Date drawn 162
Content created 120
Date posted 116
Date printed 70
Production date 58
Date designed 47
Date painted 20
Date of restoration 18
Date of conversion 14
Date reprinted 12
Date of correspondence 10
Date of patent 9
Date of Publication 7
date created 6
date of publication 5
Date of Production 4
Date of Correspondence 2
date of production 2
date of correspondence 1
Date repographed 1
date made 1
Date reproduced 1
Date of Work 1
date painted 1
Name: temporal_roleName, dtype: int64
So the temporal_interactionType field helps us find all the creation-related events without dealing with the variations in the ways event types are described. Yay for normalisation!
Let's create a dataframe that contains just the creation dates.
df_creation_dates = df_dates.loc[(df_dates['temporal_interactionType'] == 'Production')].copy()
df_creation_dates.shape
(18012, 12)
One other thing to note is that not every event has a start date. Some just have an end date. To make sure we have at least one date for every event, let's create a new year column: we'll set its value to start_year if it exists, or end_year if not.
df_creation_dates['year'] = df_creation_dates.apply(lambda x: x['start_year'] if x['start_year'] else x['end_year'], axis=1)
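The apply() call above works row by row. The same fallback can also be written as a single column operation with Series.where, which is usually faster on large dataframes. Here's a minimal sketch on hypothetical year values, where 0 is the "no year" marker created earlier.

```python
import pandas as pd

# Hypothetical start/end years; 0 means no year could be extracted
df = pd.DataFrame({'start_year': [1926, 0, 1872, 0],
                   'end_year':   [1926, 1913, 1872, 0]})

# Keep start_year wherever it is non-zero, otherwise take end_year
df['year'] = df['start_year'].where(df['start_year'] != 0, df['end_year'])
print(df['year'].tolist())  # [1926, 1913, 1872, 0]
```

Rows with neither date still end up as 0, which is why the charts below filter on years greater than 0.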
Time to make a chart! Let's show how the creation events are distributed over time.
# First we'll get the number of objects per year
year_counts = df_creation_dates['year'].value_counts().to_frame().reset_index()
year_counts.columns = ['year', 'count']
# Create a bar chart (limit to years greater than 0)
alt.Chart(year_counts.loc[year_counts['year'] > 0]).mark_bar(size=2).encode(
# Year on the X axis
x=alt.X('year:Q', axis=alt.Axis(format='c', title='Year of production')),
# Number of objects on the Y axis
y=alt.Y('count:Q', title='Number of objects'),
# Show details on hover (the data is already aggregated, so refer to the 'count' field, not the count() aggregate)
tooltip=[alt.Tooltip('year:Q', title='Year'), alt.Tooltip('count:Q', title='Objects', format=',')]
).properties(width=700)
Ok, so something interesting was happening in 1980 and 1913. Let's see if we can find out what.
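Rather than reading the peaks off the chart, we can also pick them out of the year counts directly. Here's a minimal sketch using a hypothetical series of years in place of df_creation_dates['year']. It also names the count columns explicitly with rename_axis() and reset_index(name=...), which gives the same 'year' and 'count' columns across pandas versions.

```python
import pandas as pd

# Hypothetical creation years standing in for df_creation_dates['year']
years = pd.Series([1913, 1913, 1950, 1980, 1980, 1980, 0])

# Count objects per year, naming both columns explicitly
year_counts = (years.value_counts()
               .rename_axis('year')
               .reset_index(name='count'))

# Ignore the 0 "no date" marker, then pick the years with the most objects
peaks = year_counts.loc[year_counts['year'] > 0].nlargest(2, 'count')
print(peaks)
```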
In another notebook I showed how you can use the additionalType column to find out about the types of things in the collection. Let's use it to see what types of objects were created in 1980.
Let's explode additionalType and create a new dataframe with the results!
df_creation_dates_types = df_creation_dates.loc[df_creation_dates['additionalType'].notnull()][['id', 'title', 'year', 'additionalType']].explode('additionalType')
df_creation_dates_types.head()
| | id | title | year | additionalType |
|---|---|---|---|---|
| 0 | 195843 | Reproduction cartoon titled 'Better than the b... | 2009 | Political cartoons |
| 5 | 59924 | Walka design from Ernabella | 1954 | Acrylic paintings |
| 8 | 33064 | Wonderland city, Sydney, 1908 | 1906 | Photographic postcards |
| 10 | 19877 | Cylindrical hollow wood pipe with protruding bowl | 1973 | Smoking pipes |
| 12 | 124027 | Oak oil stone | 1790 | Sharpening stones |
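The additionalType field holds a list, and explode() gives each list element its own row, repeating the other values and keeping the original index. That's why the index above isn't consecutive. Here's a small sketch on made-up rows (the ids, years, and types are hypothetical).

```python
import pandas as pd

# Hypothetical rows: one single-type object, one with two types, one with none
df = pd.DataFrame({
    'id': [1, 2, 3],
    'year': [1980, 1913, 1980],
    'additionalType': [['Engravings'], ['Photographs', 'Postcards'], None],
})

# Drop rows without types, then put each list element on its own row
types = df.loc[df['additionalType'].notnull()].explode('additionalType')
print(types)
```

Object 2 now appears twice, once for each type, both rows carrying the original index label 1.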
Now we can filter by year to see what types of things were created in 1980.
created_1980 = df_creation_dates_types.loc[df_creation_dates_types['year'] == 1980]
created_1980.head()
| | id | title | year | additionalType |
|---|---|---|---|---|
| 16 | 166857 | Spergularia media | 1980 | Mounts |
| 28 | 166221 | Centranthera cochinchinensis | 1980 | Engravings |
| 36 | 167935 | Carpha alpina var. schoenoides | 1980 | Mounts |
| 60 | 166367 | Persoonia levis | 1980 | Engravings |
| 79 | 165539 | Triumfetta repens | 1980 | Engravings |
Let's look at the top twenty types of things created in 1980!
created_1980['additionalType'].value_counts()[:20]
Engravings 1486
Mounts 743
Folders 100
Lists 42
Notes 36
Boxes 35
Technical notes 34
Cartoons 5
Paintings 4
Placards 3
Journals 3
Storybooks 2
Advertising posters 2
Jugs 2
Books 2
Botanical drawings 2
Passes 2
Textbooks 2
Netballs 1
Event posters 1
Name: additionalType, dtype: int64
So the vast majority are either 'Engravings' or 'Mounts'. Let's look at one of the 'Engravings' in more detail.
# Filter by Engravings
created_1980.loc[created_1980['additionalType'] == 'Engravings'].head()
| | id | title | year | additionalType |
|---|---|---|---|---|
| 28 | 166221 | Centranthera cochinchinensis | 1980 | Engravings |
| 60 | 166367 | Persoonia levis | 1980 | Engravings |
| 79 | 165539 | Triumfetta repens | 1980 | Engravings |
| 155 | 167443 | Hibiscus tiliaceus subsp. hastatus Malvaceae | 1980 | Engravings |
| 195 | 167685 | Lecanthus solandri | 1980 | Engravings |
# Get the first item
item = created_1980.loc[created_1980['additionalType'] == 'Engravings'].iloc[0]
# Create a link to the collection db
display(HTML('<a href="http://collectionsearch.nma.gov.au/?object={}">{}</a>'.format(item['id'], item['title'])))
If you follow the link you'll find that the engravings were created for a new publication of Banks' Florilegium.
Can you repeat this process to find out what happened in 1913?
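One way to make this repeatable is to wrap the filter and count in a small function. Here's a sketch; top_types_for_year is a hypothetical helper name, and it assumes a dataframe shaped like df_creation_dates_types (one row per object/type pair, with 'year' and 'additionalType' columns). The sample rows are made up.

```python
import pandas as pd

def top_types_for_year(df_types, year, n=20):
    """Return the n most common additionalType values for objects created in `year`.

    Assumes one row per object/type pair, with 'year' and 'additionalType' columns.
    """
    in_year = df_types.loc[df_types['year'] == year]
    return in_year['additionalType'].value_counts().head(n)

# Hypothetical sample rows
sample = pd.DataFrame({
    'year': [1913, 1913, 1913, 1980],
    'additionalType': ['Photographs', 'Photographs', 'Postcards', 'Engravings'],
})
result = top_types_for_year(sample, 1913)
print(result)
```

With the notebook's own data you could then try top_types_for_year(df_creation_dates_types, 1913).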
Now that we have a dataframe that combines creation dates with object types, we can look at how the creation of particular object types changes over time. For example, let's look at 'Photographs' and 'Postcards'.
# Create a dataframe containing just Photographs and Postcards -- use .isin() to filter the additionalType field
df_photos_postcards = df_creation_dates_types.loc[(df_creation_dates_types['year'] > 0) & (df_creation_dates_types['additionalType'].isin(['Photographs', 'Postcards']))]
# Create a stacked bar chart
alt.Chart(df_photos_postcards).mark_bar(size=3).encode(
# Year on the X axis
x=alt.X('year:Q', axis=alt.Axis(format='c', title='Year of production')),
# Number of objects on the Y axis
y=alt.Y('count()', title='Number of objects'),
# Color according to the type
color='additionalType:N',
# Details on hover
tooltip=[alt.Tooltip('additionalType:N', title='Type'), alt.Tooltip('year:Q', title='Year'), alt.Tooltip('count():Q', title='Objects', format=',')]
).properties(width=700)
There's 1913 again... It's also interesting to see a shift from postcards to photos in the early decades of the 20th century.
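To check a shift like this numerically rather than by eye, we could compare each type's share of the total per decade. Here's a minimal sketch using pd.crosstab with normalize='index'. The years and types below are invented purely to show the mechanics; df_photos_postcards would be the real input.

```python
import pandas as pd

# Invented rows, for illustration only
df = pd.DataFrame({
    'year': [1903, 1905, 1908, 1912, 1915, 1921, 1924],
    'additionalType': ['Postcards', 'Postcards', 'Photographs', 'Postcards',
                       'Photographs', 'Photographs', 'Photographs'],
})

# Bucket years into decades (e.g. 1915 -> 1910)
df['decade'] = df['year'] // 10 * 10

# Each row (decade) sums to 1, so values are each type's share of that decade
shares = pd.crosstab(df['decade'], df['additionalType'], normalize='index')
print(shares.round(2))
```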
We could add additional types to this chart, but it will get a bit confusing. Let's try another way of charting changes in the creation of the most common object types over time.
First we'll get the top twenty-five object types (which have creation dates) as a list.
# Get most common 25 values and convert to a list
top_types = df_creation_dates_types['additionalType'].value_counts()[:25].index.to_list()
top_types
['Engravings', 'Bark paintings', 'Cartoons', 'Negatives', 'Mounts', 'Photographs', 'Paintings', 'Prints', 'Drawings', 'Photographic postcards', 'Acrylic paintings', 'Letters', 'Books', 'Photographic slides', 'Postcards', 'Courtroom drawings', 'Glass plate negatives', 'Cards', 'Botanical specimens', 'Prize certificates', 'Collecting cards', 'Posters', 'Sculptures', 'Portrait photographs', 'Telegrams']
Now we'll use the list of top_types to filter the creation dates, so we only have events relating to those types of objects.
# Only include records where the additionalType value is in the list of top_types
df_top_types = df_creation_dates_types.loc[(df_creation_dates_types['year'] > 0) & (df_creation_dates_types['additionalType'].isin(top_types))]
# Get the counts for year / type
top_type_counts = df_top_types.groupby('year')['additionalType'].value_counts().to_frame()
top_type_counts.columns = ['count']
top_type_counts.reset_index(inplace=True)
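An equivalent way to get one row per year/type pair is to group on both columns and count rows with size(). Naming the count column in reset_index() avoids the separate column-renaming step. Here's a sketch on hypothetical rows standing in for df_top_types.

```python
import pandas as pd

# Hypothetical rows standing in for df_top_types
df = pd.DataFrame({
    'year': [1913, 1913, 1913, 1980, 1980],
    'additionalType': ['Photographs', 'Photographs', 'Postcards',
                       'Engravings', 'Engravings'],
})

# size() counts the rows in each (year, type) group;
# reset_index(name='count') turns the result into a flat dataframe
counts = df.groupby(['year', 'additionalType']).size().reset_index(name='count')
print(counts)
```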
To chart this data we're going to use circles for each point and create 'bubble lines' for each object type to show how the number of objects created varied year by year.
# Create a chart
alt.Chart(top_type_counts).mark_circle(
# Style the circles
opacity=0.8,
stroke='black',
strokeWidth=1
).encode(
# Year on the X axis
x=alt.X('year:O', axis=alt.Axis(format='c', title='Year of production', labelAngle=0)),
# Object type on the Y axis
y=alt.Y('additionalType:N', title='Object type'),
# Size of the circles represents the number of objects
size=alt.Size('count:Q',
scale=alt.Scale(range=[0, 2000]),
legend=alt.Legend(title='Number of objects')
),
# Color the circles by object type
color=alt.Color('additionalType:N', legend=None),
# More details on hover
tooltip=[alt.Tooltip('additionalType:N', title='Type'), alt.Tooltip('year:O', title='Year'), alt.Tooltip('count:Q', title='Objects', format=',')]
).properties(
width=700
)
What patterns can you see? Hover over the circles for more information. Once again the engravings dominate, but also look at the bark paintings and cartoons. What might be happening there?
Created by Tim Sherratt for the GLAM Workbench.
Work on this notebook was supported by the Humanities, Arts and Social Sciences (HASS) Data Enhanced Virtual Lab.