Exploring object records¶

In this notebook we'll have a preliminary poke around in the object data harvested from the NMA Collection API. I'll focus here on the basic shape and stats of the data; other notebooks will explore the object data over time and space.

If you haven't already, you'll either need to harvest the object data, or unzip a pre-harvested dataset.

The shape of the data
Nested data
The additionalType field
The extent field
How big is the collection?
The biggest object?

If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!

Some tips:

Code cells have boxes around them.
To run a code cell click on the cell and then hit Shift+Enter. The Shift+Enter combo will also move you to the next cell, so it's a quick way to work through the notebook.
While a cell is running a * appears in the square brackets next to the cell. Once the cell has finished running the asterisk will be replaced with a number.
In most cases you'll want to start from the top of the notebook and work your way down, running each cell in turn. Later cells might depend on the results of earlier ones.
To edit a code cell, just click on it and type stuff. Remember to run the cell once you've finished editing.

Is this thing on? If you can't edit or run any of the code cells, you might be viewing a static (read only) version of this notebook. Click here to load a live version running on Binder.

Import what we need¶

In [34]:

import pandas as pd
import math
from IPython.display import display, HTML, FileLink
from tinydb import TinyDB, Query
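try:
    from pandas import json_normalize  # pandas 1.0 and above
except ImportError:
    from pandas.io.json import json_normalize  # older versions of pandas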

Load the harvested data¶

In [35]:

# Load the harvested data from the json db
db = TinyDB('nma_object_db.json')
records = db.all()
Object = Query()

In [36]:

# Convert to a dataframe
df = pd.DataFrame(records)
df.head()

Out[36]:

	id	type	title	_meta	additionalType	collection	identifier	medium	extent	physicalDescription	...	isPartOf	seeAlso	description	hasVersion	temporal	relation	hasPart	location	acknowledgement	educationalSignificance
0	145400	object	Wahlo and Tribal law by Kevin Gilbert, reprint...	{'modified': '2018-07-09', 'issued': '2011-10-...	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	251390	object	Pair of woven shoes made from feathers and hair	{'modified': '2019-01-17', 'issued': '2018-04-...	[Shoes]	{'id': '5244', 'type': 'Collection', 'title': ...	2000.0014.0495	[{'type': 'Material', 'title': 'Feather'}, {'t...	{'type': 'Measurement', 'length': 260, 'width'...	Shoes, the soles of which are made from woven ...	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	124081	object	Pair of ceremonial shoes	{'modified': '2018-12-04', 'issued': '2006-10-...	NaN	{'id': '1892', 'type': 'Collection', 'title': ...	1992.0089.0165	[{'type': 'Material', 'title': 'Feather'}]	{'type': 'Measurement', 'length': 246, 'width'...	A pair of ceremonial shoes made with several m...	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	21507	object	Grinding stone	{'modified': '2018-06-19', 'issued': '2014-12-...	[Grinding stones]	{'id': '2229', 'type': 'Collection', 'title': ...	1985.0288.0109	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	142308	object	'time CHange' [sic]	{'modified': '2019-04-15', 'issued': '2012-06-...	[Compact discs]	{'id': '3893', 'type': 'Collection', 'title': ...	AR00213.012	NaN	NaN	A compact disc, housed within a clear and blac...	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

5 rows × 25 columns

The shape of the data¶

How many objects are there?

In [37]:

print('There are {:,} objects in the collection'.format(df.shape[0]))

There are 86,679 objects in the collection

Obviously not every record has a value for every field. Let's create a quick count of the number of values in each field.

In [38]:

df.count()

Out[38]:

id                         86679
type                       86679
title                      86463
_meta                      86679
additionalType             86652
collection                 84256
identifier                 86654
medium                     73743
extent                     64077
physicalDescription        86359
significanceStatement      32437
creator                    25076
spatial                    46658
contributor                40796
isAggregatedBy              4353
isPartOf                   10718
seeAlso                      467
description                 9097
hasVersion                 19845
temporal                   29399
relation                    3066
hasPart                     2345
location                    1364
acknowledgement              785
educationalSignificance      201
dtype: int64
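If you're curious about what df.count() is doing under the hood, here's a minimal sketch in plain Python. It assumes each record is a dictionary (like the ones returned by db.all()) and treats None as 'no value'. The sample_records list and the count_fields name are made up for illustration; they're not part of the harvested data or of any library.

```python
from collections import Counter

# Made-up records standing in for the harvested data (not real NMA records)
sample_records = [
    {'id': '1', 'title': 'Grinding stone', 'additionalType': ['Grinding stones']},
    {'id': '2', 'title': 'Pair of shoes', 'extent': {'length': 260}},
    {'id': '3', 'title': None},
]

def count_fields(records):
    '''
    Count how many records have a value for each field.
    '''
    counts = Counter()
    for record in records:
        for field, value in record.items():
            # A field that is absent or set to None doesn't count
            if value is not None:
                counts[field] += 1
    return counts

field_totals = count_fields(sample_records)
for field, total in field_totals.most_common():
    print('{:<15} {:>3} {:.0%}'.format(field, total, total / len(sample_records)))
```

In the dataframe, a field that's missing from a record shows up as NaN, which is why df.count() skips it in the same way.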

Let's express those counts as a percentage of the total number of records, and display them as a bar chart using Pandas.

In [39]:

# Get field counts and convert to dataframe
field_counts = df.count().to_frame().reset_index()

# Change column headings
field_counts.columns = ['field', 'count']

# Calculate proportion of the total
field_counts['proportion'] = field_counts['count'] / df.shape[0]

# Style the results as a barchart
field_counts.style.bar(subset=['proportion'], color='#d65f5f').format({'proportion': '{:.2%}'.format})

Out[39]:

	field	count	proportion
0	id	86679	100.00%
1	type	86679	100.00%
2	title	86463	99.75%
3	_meta	86679	100.00%
4	additionalType	86652	99.97%
5	collection	84256	97.20%
6	identifier	86654	99.97%
7	medium	73743	85.08%
8	extent	64077	73.92%
9	physicalDescription	86359	99.63%
10	significanceStatement	32437	37.42%
11	creator	25076	28.93%
12	spatial	46658	53.83%
13	contributor	40796	47.07%
14	isAggregatedBy	4353	5.02%
15	isPartOf	10718	12.37%
16	seeAlso	467	0.54%
17	description	9097	10.50%
18	hasVersion	19845	22.89%
19	temporal	29399	33.92%
20	relation	3066	3.54%
21	hasPart	2345	2.71%
22	location	1364	1.57%
23	acknowledgement	785	0.91%
24	educationalSignificance	201	0.23%

Nested data¶

One thing you might note is that some of the fields contain nested JSON arrays or objects. For example additionalType contains a list of object types, while extent is a dictionary with keys and values. Let's unpack these columns for the second row (index of 1).

In [40]:

df['additionalType'][1][0]

Out[40]:

'Shoes'

In [41]:

df['extent'][1]

Out[41]:

{'type': 'Measurement',
 'length': 260,
 'width': 120,
 'depth': 40,
 'unitText': 'mm'}
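Indexing like this works for the second row, but records without measurements hold NaN rather than a dictionary, so chaining square brackets on them will raise an error. Here's a small sketch of a safer lookup. The get_nested helper is hypothetical (it's not part of pandas or TinyDB), and the two sample records only include the fields needed for the example.

```python
def get_nested(record, *keys, default=None):
    '''
    Walk into nested dictionaries, returning default if any level is missing.
    '''
    value = record
    for key in keys:
        # Stop as soon as we hit something that isn't a dict (eg NaN) or lacks the key
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value

# Cut-down versions of the first two rows shown above
shoes = {'id': '251390', 'extent': {'type': 'Measurement', 'length': 260, 'width': 120, 'depth': 40, 'unitText': 'mm'}}
no_extent = {'id': '145400', 'extent': float('nan')}

print(get_nested(shoes, 'extent', 'length'))
print(get_nested(no_extent, 'extent', 'length'))
```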

In [42]:

df['extent'][1]['length']

Out[42]:

260

The `additionalType` field¶

How many objects have values in the additionalType column?

In [43]:

df.loc[df['additionalType'].notnull()].shape

Out[43]:

(86652, 25)

In [44]:

print('{:%} of objects have an additionalType value'.format(df.loc[df['additionalType'].notnull()].shape[0] / df.shape[0]))

99.968851% of objects have an additionalType value

So which ones don't have an additionalType?

In [45]:

# Just show the first 5 rows
df.loc[df['additionalType'].isnull()].head()

Out[45]:

	id	type	title	_meta	additionalType	collection	identifier	medium	extent	physicalDescription	...	isPartOf	seeAlso	description	hasVersion	temporal	relation	hasPart	location	acknowledgement	educationalSignificance
0	145400	object	Wahlo and Tribal law by Kevin Gilbert, reprint...	{'modified': '2018-07-09', 'issued': '2011-10-...	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	124081	object	Pair of ceremonial shoes	{'modified': '2018-12-04', 'issued': '2006-10-...	NaN	{'id': '1892', 'type': 'Collection', 'title': ...	1992.0089.0165	[{'type': 'Material', 'title': 'Feather'}]	{'type': 'Measurement', 'length': 246, 'width'...	A pair of ceremonial shoes made with several m...	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1054	224632	object	Glass plate negative of family and horse stand...	{'copyright': '', 'licence': ''}	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1276	180161	object	Awelye- panel 1 by Lily Kngwarreye	{'copyright': '', 'licence': ''}	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2333	180168	object	Awelye- panel 5 by Lily Kngwarreye	{'copyright': '', 'licence': ''}	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

5 rows × 25 columns

How many rows have more than one additionalType?

In [46]:

df.loc[df['additionalType'].str.len() > 1].shape[0]

Out[46]:

Let's have a look at a sample.

In [47]:

df.loc[df['additionalType'].str.len() > 1].head()

Out[47]:

	id	type	title	_meta	additionalType	collection	identifier	medium	extent	physicalDescription	...	isPartOf	seeAlso	description	hasVersion	temporal	relation	hasPart	location	acknowledgement	educationalSignificance
45	202601	object	Album of Newspaper clippings	{'modified': '2019-04-22', 'issued': '2010-11-...	[Albums, Newspaper clippings]	{'id': '4760', 'type': 'Collection', 'title': ...	1989.0009.0108	[{'type': 'Material', 'title': 'Cardboard'}, {...	{'type': 'Measurement', 'height': 345, 'width'...	A brown textured hardback album with gold colo...	...	NaN	NaN	NaN	NaN	[{'type': 'Event', 'title': '1935', 'startDate...	NaN	NaN	NaN	NaN	NaN
118	223557	object	Receipt issued to Tirranna Race Club, 1878	{'modified': '2019-04-23', 'issued': '2017-11-...	[Invoices, Receipts]	{'id': '6139', 'type': 'Collection', 'title': ...	2012.0019.0170	[{'type': 'Material', 'title': 'Ink'}, {'type'...	{'type': 'Measurement', 'height': 114, 'width'...	A receipt handwritten on a piece of grey paper...	...	NaN	NaN	NaN	NaN	[{'type': 'Event', 'title': '1878', 'startDate...	NaN	NaN	NaN	NaN	NaN
155	227915	object	Two toned ceramic toy tea set	{'modified': '2019-05-17', 'issued': '2018-08-...	[Tea sets, Toy tea sets]	{'id': '6773', 'type': 'Collection', 'title': ...	2013.0038.0255	[{'type': 'Material', 'title': 'Ceramic'}, {'t...	{'type': 'Measurement', 'height': 15, 'diamete...	A hand-painted ceramic toy tea set with a blue...	...	NaN	NaN	NaN	NaN	[{'type': 'Event', 'title': '1925 - 1935', 'st...	NaN	NaN	NaN	Donated through the Australian Government’s Cu...	NaN
173	256766	object	Handmade wolf figurine in yellow dress likely ...	{'modified': '2018-12-13', 'issued': '2018-10-...	[Novelty toys, Toys]	{'id': '6773', 'type': 'Collection', 'title': ...	2013.0038.0556.005	[{'type': 'Material', 'title': 'Cotton thread'...	{'type': 'Measurement', 'height': 88, 'width':...	A handmade wolf figurine robed in a yellow dre...	...	NaN	NaN	NaN	NaN	[{'type': 'Event', 'title': '1925 - 1935', 'st...	NaN	NaN	NaN	NaN	NaN
564	224635	object	Photograph of'Freda Mitchell'	{'modified': '2019-07-01', 'issued': '2018-11-...	[Photographs, Sepia photographs]	{'id': '6339', 'type': 'Collection', 'title': ...	2013.0062.0017.002	[{'type': 'Material', 'title': 'Card'}, {'type...	{'type': 'Measurement', 'height': 147, 'width'...	A sepia photograph showing a young woman posin...	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

5 rows × 25 columns

The additionalType field contains a nested list of values. Using json_normalize() or explode() we can unpack these lists, creating a row for each separate value.

In [48]:

# Use json_normalize to expand 'additionalType' into separate rows, adding the id and title from the parent record
# df_types = json_normalize(df.loc[df['additionalType'].notnull()].to_dict('records'), record_path='additionalType', meta=['id', 'title'], errors='ignore').rename({0: 'additionalType'}, axis=1)

# In pandas v0.25 and above you can just use explode -- this produces the same result as above
df_types = df.loc[df['additionalType'].notnull()][['id', 'title', 'additionalType']].explode('additionalType')

df_types.head()

Out[48]:

	id	title	additionalType
1	251390	Pair of woven shoes made from feathers and hair	Shoes
3	21507	Grinding stone	Grinding stones
4	142308	'time CHange' [sic]	Compact discs
5	20174	Ten Days To Live - A supposed sorcery painting.	Bark paintings
6	144359	'The Dance of Life (1898-1902)' by Diana Boyer...	Booklets
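To make it clearer what 'exploding' means, here's a rough plain-Python equivalent that works directly on a list of record dictionaries. The explode_types function is a made-up name for this sketch, and the sample records are cut-down versions of rows shown earlier in this notebook.

```python
def explode_types(records):
    '''
    Create one row per additionalType value, carrying over the id and title.
    '''
    rows = []
    for record in records:
        types = record.get('additionalType')
        # Skip records where additionalType is missing or isn't a list
        if not isinstance(types, list):
            continue
        for type_name in types:
            rows.append({
                'id': record['id'],
                'title': record.get('title'),
                'additionalType': type_name
            })
    return rows

sample_records = [
    {'id': '202601', 'title': 'Album of Newspaper clippings', 'additionalType': ['Albums', 'Newspaper clippings']},
    {'id': '21507', 'title': 'Grinding stone', 'additionalType': ['Grinding stones']},
    {'id': '145400', 'title': 'Wahlo and Tribal law by Kevin Gilbert'},
]

for row in explode_types(sample_records):
    print(row['id'], row['additionalType'])
```

Note that the record with two types produces two rows that share the same id, so counts of exploded rows are counts of type values, not counts of objects.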

Now that we've exploded the type values, we can aggregate them in different ways. Let's look at the 25 most common object types!

In [49]:

df_types['additionalType'].value_counts()[:25]

Out[49]:

Mineral samples                   6000
Photographs                       4747
Stone artefacts                   4364
Photographic postcards            4250
Drawings                          3759
Postcards                         3697
Zoological specimens              2168
Bark paintings                    2110
Geological specimens              1993
Cartoons                          1535
Engravings                        1495
Negatives                         1124
Boomerangs                        1025
Spears                            1012
Percussion and abrading stones     982
Paintings                          840
Clubs                              747
Mounts                             745
Cards                              709
Armbands                           649
Shells                             563
Letters                            542
Documents                          517
Geophysical survey equipment       509
Posters                            495
Name: additionalType, dtype: int64

How many object types only appear once?

In [50]:

# Name the columns explicitly so this doesn't depend on how value_counts() labels its output
type_counts = df_types['additionalType'].value_counts().rename_axis('type').reset_index(name='count')
unique_types = type_counts.loc[type_counts['count'] == 1]
unique_types.shape[0]

Out[50]:

In [51]:

unique_types.head()

Out[51]:

	type	count
1852	Genealogical charts	1
1853	Skivvies	1
1854	Shopping bags	1
1855	Jam spoons	1
1856	Architectural models	1
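The same 'count, then filter for ones' idea can be sketched without pandas using collections.Counter. The exploded_types list below is a tiny made-up sample, not the real data, so the numbers mean nothing beyond showing the technique.

```python
from collections import Counter

# A made-up list of exploded type values
exploded_types = ['Photographs', 'Shoes', 'Photographs', 'Jam spoons', 'Skivvies', 'Photographs']

# Count how many times each type appears
type_counter = Counter(exploded_types)

# Keep only the types that appear exactly once
singletons = sorted(t for t, n in type_counter.items() if n == 1)

print(type_counter.most_common(1))
print(singletons)
```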

Let's save the complete list of types as a CSV file.

In [52]:

type_counts.to_csv('nma_object_type_counts.csv', index=False)
display(FileLink('nma_object_type_counts.csv'))

nma_object_type_counts.csv

Browsing the CSV I noticed that there was one item with the type Vegetables. Let's find out more about it.

In [53]:

# Find in the complete data set
mask = df.loc[df['additionalType'].notnull()]['additionalType'].apply(lambda x: 'Vegetables' in x)
veggie = df.loc[df['additionalType'].notnull()][mask]
veggie

Out[53]:

	id	type	title	_meta	additionalType	collection	identifier	medium	extent	physicalDescription	...	isPartOf	seeAlso	description	hasVersion	temporal	relation	hasPart	location	acknowledgement	educationalSignificance
63775	256742	object	Wooden toy toad stalk	{'modified': '2019-04-24', 'issued': '2018-10-...	[Toys, Vegetables]	{'id': '6773', 'type': 'Collection', 'title': ...	2013.0038.0540	[{'type': 'Material', 'title': 'Paint - non sp...	{'type': 'Measurement', 'height': 65, 'diamete...	A painted wooden toy toad stalk with a red cap...	...	NaN	NaN	NaN	NaN	[{'type': 'Event', 'title': '1925 - 1935', 'st...	NaN	NaN	NaN	NaN	NaN

1 rows × 25 columns
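The search above filters out the empty values first because 'Vegetables' in x would fail on a NaN. One alternative, sketched here, is a small test function that copes with missing values itself. The has_type name is hypothetical; it's just a helper for this example.

```python
def has_type(value, wanted):
    '''
    Return True if value is a list of types that includes the wanted type.
    Anything that isn't a list (eg NaN for a missing value) returns False.
    '''
    return isinstance(value, list) and wanted in value

print(has_type(['Toys', 'Vegetables'], 'Vegetables'))
print(has_type(float('nan'), 'Vegetables'))

# With the dataframe loaded, this could be used as a mask, for example:
# veggie = df.loc[df['additionalType'].apply(has_type, wanted='Vegetables')]
```

Because the match is against whole list items, a type such as 'Toy vegetables' wouldn't be picked up; you'd need a substring test for that.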

We can create a link into the NMA Collection Explorer using the object id.

In [54]:

display(HTML('<a href="http://collectionsearch.nma.gov.au/?object={}">{}</a>'.format(veggie.iloc[0]['id'], veggie.iloc[0]['title'])))

Wooden toy toad stalk

Does a toadstool count as a vegetable?
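If you want to make lots of these links, it might be worth wrapping the pattern in a function. This is only a sketch: explorer_link is a made-up name, and it assumes the ?object= URL pattern used above keeps working. It also escapes the title, since some titles contain quotes or other characters that could upset the HTML.

```python
from html import escape
from urllib.parse import urlencode

EXPLORER_URL = 'http://collectionsearch.nma.gov.au/'

def explorer_link(object_id, title):
    '''
    Build an HTML link to an object using its id.
    '''
    # urlencode takes care of any characters that aren't safe in a URL
    query = urlencode({'object': object_id})
    # escape stops characters like < or & in the title being read as HTML
    return '<a href="{}?{}">{}</a>'.format(EXPLORER_URL, query, escape(str(title)))

link = explorer_link(256742, 'Wooden toy toad stalk')
print(link)
```

In a notebook you could then show the link with display(HTML(link)).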

The `extent` field¶

The extent field is a nested object, so we'll use json_normalize() to expand it out into separate columns.

In [55]:

# Without reset_index() the rows are misaligned
df_extent = df.loc[df['extent'].notnull()].reset_index().join(json_normalize(df.loc[df['extent'].notnull()]['extent'].tolist()).add_prefix("extent_"))
df_extent.head()

Out[55]:

	index	id	type	title	_meta	additionalType	collection	identifier	medium	extent	...	educationalSignificance	extent_type	extent_length	extent_width	extent_depth	extent_unitText	extent_height	extent_diameter	extent_weight	extent_unitTextWeight
0	1	251390	object	Pair of woven shoes made from feathers and hair	{'modified': '2019-01-17', 'issued': '2018-04-...	[Shoes]	{'id': '5244', 'type': 'Collection', 'title': ...	2000.0014.0495	[{'type': 'Material', 'title': 'Feather'}, {'t...	{'type': 'Measurement', 'length': 260, 'width'...	...	NaN	Measurement	260.0	120.0	40.0	mm	NaN	NaN	NaN	NaN
1	2	124081	object	Pair of ceremonial shoes	{'modified': '2018-12-04', 'issued': '2006-10-...	NaN	{'id': '1892', 'type': 'Collection', 'title': ...	1992.0089.0165	[{'type': 'Material', 'title': 'Feather'}]	{'type': 'Measurement', 'length': 246, 'width'...	...	NaN	Measurement	246.0	190.0	45.0	mm	NaN	NaN	NaN	NaN
2	5	20174	object	Ten Days To Live - A supposed sorcery painting.	{'modified': '2019-04-21', 'issued': '2013-06-...	[Bark paintings]	{'id': '2202', 'type': 'Collection', 'title': ...	1985.0246.0077	[{'type': 'Material', 'title': 'Bark'}, {'type...	{'type': 'Measurement', 'length': 574, 'width'...	...	NaN	Measurement	574.0	185.0	NaN	mm	NaN	NaN	NaN	NaN
3	6	144359	object	'The Dance of Life (1898-1902)' by Diana Boyer...	{'modified': '2018-06-18', 'issued': '2012-06-...	[Booklets]	{'id': '3893', 'type': 'Collection', 'title': ...	2008.0043.0022.001	[{'type': 'Material', 'title': 'Paper'}, {'typ...	{'type': 'Measurement', 'height': 214, 'width'...	...	NaN	Measurement	NaN	150.0	5.0	mm	214.0	NaN	NaN	NaN
4	8	42084	object	Child's drawing by Lester Moran, Cabbage Tree ...	{'modified': '2019-04-07', 'issued': '2016-10-...	[Drawings]	{'id': '2261', 'type': 'Collection', 'title': ...	1991.0024.0027	[{'type': 'Material', 'title': 'Paint - non sp...	{'type': 'Measurement', 'length': 560, 'width'...	...	NaN	Measurement	560.0	380.0	0.5	mm	NaN	NaN	NaN	NaN

5 rows × 35 columns
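Why does the index need resetting? The join() method matches rows by their index labels, not by their position. Here's a tiny demonstration using made-up data. To keep it short I've used pd.DataFrame() on a list of flat dictionaries in place of json_normalize(), and reset_index(drop=True), which throws away the old index rather than keeping it as a column.

```python
import pandas as pd

# Made-up rows: the middle one has no extent, like many harvested records
df_demo = pd.DataFrame({
    'id': [10, 11, 12],
    'extent': [{'length': 5}, None, {'length': 7}],
})

# Filtering keeps the original index labels (0 and 2)
has_extent = df_demo.loc[df_demo['extent'].notnull()]

# The expanded columns get a fresh index (0 and 1)
expanded = pd.DataFrame(has_extent['extent'].tolist()).add_prefix('extent_')

# join() matches on index labels, so label 2 finds no partner here
misaligned = has_extent.join(expanded)

# Resetting the index first lines the two frames up row by row
aligned = has_extent.reset_index(drop=True).join(expanded)

print(misaligned[['id', 'extent_length']])
print(aligned[['id', 'extent_length']])
```

In the misaligned version the last row ends up with no length at all. With the real data it can be worse, as rows can quietly pick up measurements that belong to a different object.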

Let's check to see what types of things are in the extent field.

In [56]:

df_extent['extent_type'].value_counts()

Out[56]:

Measurement    64077
Name: extent_type, dtype: int64

So they're all measurements. Let's have a look at the units being used.

In [57]:

df_extent['extent_unitText'].value_counts()

Out[57]:

mm    63382
MM       10
cm        9
m         5
Name: extent_unitText, dtype: int64

In [58]:

df_extent['extent_unitTextWeight'].value_counts()

Out[58]:

g        1473
kg        209
lb          5
oz          4
tonne       1
Name: extent_unitTextWeight, dtype: int64

Hmmm, are those five length measurements really in metres, or might they be meant to be 'mm'? Let's have a look at them.

In [59]:

df_extent.loc[df_extent['extent_unitText'] == 'm'][['id', 'title', 'extent_length', 'extent_width', 'extent_unitText']]

Out[59]:

	id	title	extent_length	extent_width	extent_unitText
16781	202783	The Percival Project, Gull Twelve, in a manill...	NaN	230.0	m
18291	214193	Extension tube	55.0000	NaN	m
41612	123962	Gunter's chain	20.1168	NaN	m
47232	171768	Fair Breeze	NaN	138.0	m
56789	257184	Fishing line inside envelope	137.0000	110.0	m

Other than 'Gunter's chain' it looks like the unit should indeed be 'mm'. We'll need to take that into account in calculations.

Now let's convert all the measurements into a single unit: millimetres for lengths, and grams for weights.

In [60]:

def conversion_factor(unit):
    '''
    Get the factor required to convert the current unit to either mm or g.
    '''
    factors = {
        'mm': 1,
        'cm': 10,
        'm': 1, # Most should in fact be mm (see above)
        'g': 1,
        'kg': 1000,
        'tonne': 1000000,
        'oz': 28.35,
        'lb': 453.592
    }
    try:
        factor = factors[unit.lower()]
    except KeyError:
        factor = 0 
    return factor

def normalise_measurements(row):
    '''
    Convert measurements to standard units.
    '''
    l_factor = conversion_factor(str(row['extent_unitText']))
    length = row['extent_length'] * l_factor
    width = row['extent_width'] * l_factor
    depth = row['extent_depth'] * l_factor
    height = row['extent_height'] * l_factor
    diameter = row['extent_diameter'] * l_factor
    w_factor = conversion_factor(str(row['extent_unitTextWeight']))
    weight = row['extent_weight'] * w_factor
    return pd.Series([length, width, depth, height, diameter, weight])

# Add normalised measurements to the dataframe
df_extent[['length_mm', 'width_mm', 'depth_mm', 'height_mm', 'diameter_mm', 'weight_g']] = df_extent.apply(normalise_measurements, axis=1)
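Treating every 'm' as 'mm' means that Gunter's chain gets recorded as about 20 mm long. One possible refinement (this is not what the code above does) is to keep a short list of object ids where 'm' really does mean metres. The GENUINE_METRES set and the length_factor function are assumptions made for this sketch; you'd need to check the records yourself before adding ids to the list.

```python
import math

LENGTH_FACTORS = {'mm': 1, 'cm': 10, 'm': 1000}

# Hypothetical override list: ids of objects whose 'm' really means metres
GENUINE_METRES = {'123962'}  # Gunter's chain

def length_factor(unit, object_id):
    '''
    Get the factor to convert a length unit to mm, allowing per-object overrides.
    '''
    unit = str(unit).lower()
    if unit == 'm' and str(object_id) not in GENUINE_METRES:
        # Assume this is a mislabelled millimetre value
        return 1
    # Unknown units give NaN, so they drop out of sums instead of counting as zero
    return LENGTH_FACTORS.get(unit, float('nan'))

print(20.1168 * length_factor('m', 123962))
print(230.0 * length_factor('m', 202783))
```

Returning NaN for an unknown unit is a design choice: it keeps unconverted values out of totals without pretending they have a length of zero.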

In [61]:

df_extent.head()

Out[61]:

	index	id	type	title	_meta	additionalType	collection	identifier	medium	extent	...	extent_height	extent_diameter	extent_weight	extent_unitTextWeight	length_mm	width_mm	depth_mm	height_mm	diameter_mm	weight_g
0	1	251390	object	Pair of woven shoes made from feathers and hair	{'modified': '2019-01-17', 'issued': '2018-04-...	[Shoes]	{'id': '5244', 'type': 'Collection', 'title': ...	2000.0014.0495	[{'type': 'Material', 'title': 'Feather'}, {'t...	{'type': 'Measurement', 'length': 260, 'width'...	...	NaN	NaN	NaN	NaN	260.0	120.0	40.0	NaN	NaN	NaN
1	2	124081	object	Pair of ceremonial shoes	{'modified': '2018-12-04', 'issued': '2006-10-...	NaN	{'id': '1892', 'type': 'Collection', 'title': ...	1992.0089.0165	[{'type': 'Material', 'title': 'Feather'}]	{'type': 'Measurement', 'length': 246, 'width'...	...	NaN	NaN	NaN	NaN	246.0	190.0	45.0	NaN	NaN	NaN
2	5	20174	object	Ten Days To Live - A supposed sorcery painting.	{'modified': '2019-04-21', 'issued': '2013-06-...	[Bark paintings]	{'id': '2202', 'type': 'Collection', 'title': ...	1985.0246.0077	[{'type': 'Material', 'title': 'Bark'}, {'type...	{'type': 'Measurement', 'length': 574, 'width'...	...	NaN	NaN	NaN	NaN	574.0	185.0	NaN	NaN	NaN	NaN
3	6	144359	object	'The Dance of Life (1898-1902)' by Diana Boyer...	{'modified': '2018-06-18', 'issued': '2012-06-...	[Booklets]	{'id': '3893', 'type': 'Collection', 'title': ...	2008.0043.0022.001	[{'type': 'Material', 'title': 'Paper'}, {'typ...	{'type': 'Measurement', 'height': 214, 'width'...	...	214.0	NaN	NaN	NaN	NaN	150.0	5.0	214.0	NaN	NaN
4	8	42084	object	Child's drawing by Lester Moran, Cabbage Tree ...	{'modified': '2019-04-07', 'issued': '2016-10-...	[Drawings]	{'id': '2261', 'type': 'Collection', 'title': ...	1991.0024.0027	[{'type': 'Material', 'title': 'Paint - non sp...	{'type': 'Measurement', 'length': 560, 'width'...	...	NaN	NaN	NaN	NaN	560.0	380.0	0.5	NaN	NaN	NaN

5 rows × 41 columns

How big is the collection?¶

In [62]:

def calculate_volume(row):
    '''
    Look for 3 linear dimensions and multiply them to get a volume.
    '''
    # Create a list of valid linear measurements from the available fields
    dimensions = [d for d in [row['length_mm'], row['width_mm'], row['depth_mm'], row['height_mm'], row['diameter_mm']] if not math.isnan(d)]
    
    # If there are only 2 dimensions...
    if len(dimensions) == 2:
        # Set a default height of 1 for items with only 2 dimensions
        dimensions.append(1)
        
    # If there are 3 or more dimensions, multiply the first 3 together
    if len(dimensions) >= 3:
        volume = dimensions[0] * dimensions[1] * dimensions[2]
    else:
        volume = 0
    return volume

df_extent['volume'] = df_extent.apply(calculate_volume, axis=1)
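To see how the volume calculation behaves, here's the same logic as a standalone sketch that takes a plain list of measurements, tried out on the dimensions of two objects shown earlier. The bounding_volume name is made up for this example. Keep in mind that the result is the volume of a box around the object, so it will usually overestimate.

```python
import math

def bounding_volume(dimensions_mm):
    '''
    Multiply the first 3 available measurements (in mm) to get a volume in cubic mm.
    '''
    # Drop any missing measurements
    dims = [d for d in dimensions_mm if d is not None and not math.isnan(d)]
    # Flat things with only 2 measurements get a nominal third dimension of 1 mm
    if len(dims) == 2:
        dims.append(1)
    if len(dims) < 3:
        return 0
    return dims[0] * dims[1] * dims[2]

nan = float('nan')

# Pair of woven shoes: length 260, width 120, depth 40
shoes_volume = bounding_volume([260, 120, 40, nan, nan])

# Bark painting: length 574, width 185, no depth recorded
bark_volume = bounding_volume([574, 185, nan, nan, nan])

print(shoes_volume, bark_volume)
```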

In [63]:

print('Total length of objects is {:.2f} km'.format(df_extent['length_mm'].sum() / 1000 / 1000))

Total length of objects is 15.36 km

In [64]:

print('Total weight of objects is {:.2f} tonnes'.format(df_extent['weight_g'].sum() / 1000000))

Total weight of objects is 194.30 tonnes

In [65]:

print('Total volume of objects is {:.2f} m\N{SUPERSCRIPT THREE}'.format(df_extent['volume'].sum() / 1000000000))

Total volume of objects is 2873.14 m³
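The divisions in the last three cells are just unit conversions. Spelling them out as named constants makes it easier to check that the right number of zeros is being used. The constant and function names below are made up for this sketch.

```python
MM_PER_KM = 1000 * 1000        # 1,000 mm in a metre, 1,000 m in a kilometre
G_PER_TONNE = 1000 * 1000      # 1,000 g in a kilogram, 1,000 kg in a tonne
MM3_PER_M3 = 1000 ** 3         # a cubic metre is 1,000 mm on each of its 3 sides

def mm_to_km(mm):
    return mm / MM_PER_KM

def g_to_tonnes(g):
    return g / G_PER_TONNE

def mm3_to_m3(mm3):
    return mm3 / MM3_PER_M3

# A total length of 15,360,000 mm expressed in km
print('{:.2f} km'.format(mm_to_km(15360000)))
```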

The biggest object?¶

What's the biggest thing?

In [66]:

# Get the object with the largest volume
biggest = df_extent.loc[df_extent['volume'].idxmax()]

# Create a link to Collection Explorer
display(HTML('<a href="http://collectionsearch.nma.gov.au/?object={}">{}</a>'.format(biggest['id'], biggest['title'])))

Percival Proctor Mk 1 monoplane VH-FEP

Created by Tim Sherratt for the GLAM Workbench.

Work on this notebook was supported by the Humanities, Arts and Social Sciences (HASS) Data Enhanced Virtual Lab.
