In this notebook we'll have a preliminary poke around in the object data harvested from the NMA Collection API. I'll focus here on the basic shape and stats of the data; other notebooks will explore the object data over time and space.
If you haven't already, you'll either need to harvest the object data, or unzip a pre-harvested dataset.
If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!
Some tips:
Is this thing on? If you can't edit or run any of the code cells, you might be viewing a static (read only) version of this notebook. Click here to load a live version running on Binder.
import pandas as pd
import math
from IPython.display import display, HTML, FileLink
from tinydb import TinyDB, Query
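# json_normalize moved to the top-level pandas namespace in pandas 1.0
try:
    from pandas import json_normalize
except ImportError:
    from pandas.io.json import json_normalize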
# Load the harvested data from the json db
db = TinyDB('nma_object_db.json')
records = db.all()
Object = Query()
# Convert to a dataframe
df = pd.DataFrame(records)
df.head()
| | id | type | title | _meta | additionalType | collection | identifier | medium | extent | physicalDescription | ... | isPartOf | seeAlso | description | hasVersion | temporal | relation | hasPart | location | acknowledgement | educationalSignificance |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 145400 | object | Wahlo and Tribal law by Kevin Gilbert, reprint... | {'modified': '2018-07-09', 'issued': '2011-10-... | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 251390 | object | Pair of woven shoes made from feathers and hair | {'modified': '2019-01-17', 'issued': '2018-04-... | [Shoes] | {'id': '5244', 'type': 'Collection', 'title': ... | 2000.0014.0495 | [{'type': 'Material', 'title': 'Feather'}, {'t... | {'type': 'Measurement', 'length': 260, 'width'... | Shoes, the soles of which are made from woven ... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 124081 | object | Pair of ceremonial shoes | {'modified': '2018-12-04', 'issued': '2006-10-... | NaN | {'id': '1892', 'type': 'Collection', 'title': ... | 1992.0089.0165 | [{'type': 'Material', 'title': 'Feather'}] | {'type': 'Measurement', 'length': 246, 'width'... | A pair of ceremonial shoes made with several m... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 21507 | object | Grinding stone | {'modified': '2018-06-19', 'issued': '2014-12-... | [Grinding stones] | {'id': '2229', 'type': 'Collection', 'title': ... | 1985.0288.0109 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 142308 | object | 'time CHange' [sic] | {'modified': '2019-04-15', 'issued': '2012-06-... | [Compact discs] | {'id': '3893', 'type': 'Collection', 'title': ... | AR00213.012 | NaN | NaN | A compact disc, housed within a clear and blac... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 25 columns
How many objects are there?
print('There are {:,} objects in the collection'.format(df.shape[0]))
There are 86,679 objects in the collection
Not every record has a value for every field, so let's create a quick count of the number of values in each field.
df.count()
id                         86679
type                       86679
title                      86463
_meta                      86679
additionalType             86652
collection                 84256
identifier                 86654
medium                     73743
extent                     64077
physicalDescription        86359
significanceStatement      32437
creator                    25076
spatial                    46658
contributor                40796
isAggregatedBy              4353
isPartOf                   10718
seeAlso                      467
description                 9097
hasVersion                 19845
temporal                   29399
relation                    3066
hasPart                     2345
location                    1364
acknowledgement              785
educationalSignificance      201
dtype: int64
Let's express those counts as a percentage of the total number of records, and display them as a bar chart using Pandas.
# Get field counts and convert to dataframe
field_counts = df.count().to_frame().reset_index()
# Change column headings
field_counts.columns = ['field', 'count']
# Calculate proportion of the total
field_counts['proportion'] = field_counts['count'].apply(lambda x: x / df.shape[0])
# Style the results as a barchart
field_counts.style.bar(subset=['proportion'], color='#d65f5f').format({'proportion': '{:.2%}'.format})
| | field | count | proportion |
---|---|---|---|
0 | id | 86679 | 100.00% |
1 | type | 86679 | 100.00% |
2 | title | 86463 | 99.75% |
3 | _meta | 86679 | 100.00% |
4 | additionalType | 86652 | 99.97% |
5 | collection | 84256 | 97.20% |
6 | identifier | 86654 | 99.97% |
7 | medium | 73743 | 85.08% |
8 | extent | 64077 | 73.92% |
9 | physicalDescription | 86359 | 99.63% |
10 | significanceStatement | 32437 | 37.42% |
11 | creator | 25076 | 28.93% |
12 | spatial | 46658 | 53.83% |
13 | contributor | 40796 | 47.07% |
14 | isAggregatedBy | 4353 | 5.02% |
15 | isPartOf | 10718 | 12.37% |
16 | seeAlso | 467 | 0.54% |
17 | description | 9097 | 10.50% |
18 | hasVersion | 19845 | 22.89% |
19 | temporal | 29399 | 33.92% |
20 | relation | 3066 | 3.54% |
21 | hasPart | 2345 | 2.71% |
22 | location | 1364 | 1.57% |
23 | acknowledgement | 785 | 0.91% |
24 | educationalSignificance | 201 | 0.23% |
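As an aside, the same proportions can be calculated without apply(). This is just a sketch on a tiny made-up dataframe (the toy frame and its values are invented for illustration, they're not NMA data): notna() flags every populated cell, and the mean of a boolean column is the share of True values.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'title': ['a', 'b', None, 'd'],
    'medium': [np.nan, 'wood', np.nan, np.nan],
})

# notna() marks populated cells as True; the mean of booleans is the proportion of True values
proportions = toy.notna().mean()
print(proportions)
```

The result is a Series indexed by field name, so it could be styled or charted in the same way as the counts above.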
One thing you might note is that some of the fields contain nested JSON arrays or objects. For example, additionalType contains a list of object types, while extent is a dictionary with keys and values. Let's unpack these columns for the second row (index of 1).
df['additionalType'][1][0]
'Shoes'
df['extent'][1]
{'type': 'Measurement', 'length': 260, 'width': 120, 'depth': 40, 'unitText': 'mm'}
df['extent'][1]['length']
260
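Indexing straight into a nested value works for this row, but rows without an extent hold NaN rather than a dictionary, so the same expression would raise an error there. Here's a minimal sketch of a more forgiving lookup; get_nested is a hypothetical helper name of my own, not something from pandas or the NMA API.

```python
def get_nested(value, key, default=None):
    """Return value[key] if value is a dict containing key, otherwise default."""
    if isinstance(value, dict):
        return value.get(key, default)
    return default


# The extent of the second row, copied from the output above
extent = {'type': 'Measurement', 'length': 260, 'width': 120, 'depth': 40, 'unitText': 'mm'}

length = get_nested(extent, 'length')          # key is present
missing = get_nested(float('nan'), 'length')   # a NaN cell instead of a dict
print(length, missing)
```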
additionalType field
How many objects have values in the additionalType column?
df.loc[df['additionalType'].notnull()].shape
(86652, 25)
print('{:%} of objects have an additionalType value'.format(df.loc[df['additionalType'].notnull()].shape[0] / df.shape[0]))
99.968851% of objects have an additionalType value
So which ones don't have an additionalType?
# Just show the first 5 rows
df.loc[df['additionalType'].isnull()].head()
| | id | type | title | _meta | additionalType | collection | identifier | medium | extent | physicalDescription | ... | isPartOf | seeAlso | description | hasVersion | temporal | relation | hasPart | location | acknowledgement | educationalSignificance |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 145400 | object | Wahlo and Tribal law by Kevin Gilbert, reprint... | {'modified': '2018-07-09', 'issued': '2011-10-... | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 124081 | object | Pair of ceremonial shoes | {'modified': '2018-12-04', 'issued': '2006-10-... | NaN | {'id': '1892', 'type': 'Collection', 'title': ... | 1992.0089.0165 | [{'type': 'Material', 'title': 'Feather'}] | {'type': 'Measurement', 'length': 246, 'width'... | A pair of ceremonial shoes made with several m... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1054 | 224632 | object | Glass plate negative of family and horse stand... | {'copyright': '', 'licence': ''} | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1276 | 180161 | object | Awelye- panel 1 by Lily Kngwarreye | {'copyright': '', 'licence': ''} | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2333 | 180168 | object | Awelye- panel 5 by Lily Kngwarreye | {'copyright': '', 'licence': ''} | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 25 columns
How many rows have more than one additionalType?
df.loc[df['additionalType'].str.len() > 1].shape[0]
1037
Let's have a look at a sample.
df.loc[df['additionalType'].str.len() > 1].head()
| | id | type | title | _meta | additionalType | collection | identifier | medium | extent | physicalDescription | ... | isPartOf | seeAlso | description | hasVersion | temporal | relation | hasPart | location | acknowledgement | educationalSignificance |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
45 | 202601 | object | Album of Newspaper clippings | {'modified': '2019-04-22', 'issued': '2010-11-... | [Albums, Newspaper clippings] | {'id': '4760', 'type': 'Collection', 'title': ... | 1989.0009.0108 | [{'type': 'Material', 'title': 'Cardboard'}, {... | {'type': 'Measurement', 'height': 345, 'width'... | A brown textured hardback album with gold colo... | ... | NaN | NaN | NaN | NaN | [{'type': 'Event', 'title': '1935', 'startDate... | NaN | NaN | NaN | NaN | NaN |
118 | 223557 | object | Receipt issued to Tirranna Race Club, 1878 | {'modified': '2019-04-23', 'issued': '2017-11-... | [Invoices, Receipts] | {'id': '6139', 'type': 'Collection', 'title': ... | 2012.0019.0170 | [{'type': 'Material', 'title': 'Ink'}, {'type'... | {'type': 'Measurement', 'height': 114, 'width'... | A receipt handwritten on a piece of grey paper... | ... | NaN | NaN | NaN | NaN | [{'type': 'Event', 'title': '1878', 'startDate... | NaN | NaN | NaN | NaN | NaN |
155 | 227915 | object | Two toned ceramic toy tea set | {'modified': '2019-05-17', 'issued': '2018-08-... | [Tea sets, Toy tea sets] | {'id': '6773', 'type': 'Collection', 'title': ... | 2013.0038.0255 | [{'type': 'Material', 'title': 'Ceramic'}, {'t... | {'type': 'Measurement', 'height': 15, 'diamete... | A hand-painted ceramic toy tea set with a blue... | ... | NaN | NaN | NaN | NaN | [{'type': 'Event', 'title': '1925 - 1935', 'st... | NaN | NaN | NaN | Donated through the Australian Government’s Cu... | NaN |
173 | 256766 | object | Handmade wolf figurine in yellow dress likely ... | {'modified': '2018-12-13', 'issued': '2018-10-... | [Novelty toys, Toys] | {'id': '6773', 'type': 'Collection', 'title': ... | 2013.0038.0556.005 | [{'type': 'Material', 'title': 'Cotton thread'... | {'type': 'Measurement', 'height': 88, 'width':... | A handmade wolf figurine robed in a yellow dre... | ... | NaN | NaN | NaN | NaN | [{'type': 'Event', 'title': '1925 - 1935', 'st... | NaN | NaN | NaN | NaN | NaN |
564 | 224635 | object | Photograph of'Freda Mitchell' | {'modified': '2019-07-01', 'issued': '2018-11-... | [Photographs, Sepia photographs] | {'id': '6339', 'type': 'Collection', 'title': ... | 2013.0062.0017.002 | [{'type': 'Material', 'title': 'Card'}, {'type... | {'type': 'Measurement', 'height': 147, 'width'... | A sepia photograph showing a young woman posin... | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 25 columns
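It might not be obvious why a string method works here. The .str.len() accessor simply calls len() on each value, so it counts list items as happily as characters, and it returns NaN for missing values. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# A toy column of lists with one missing value, invented for illustration
types = pd.Series([['Shoes'], np.nan, ['Albums', 'Newspaper clippings']])

# .str.len() returns the length of each list, and NaN where the value is missing
lengths = types.str.len()

# NaN > 1 evaluates to False, so rows without a list drop out of the filter
multi = types[lengths > 1]
print(lengths)
print(multi)
```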
The additionalType field contains a nested list of values. Using json_normalize() or explode() we can explode these lists, creating a row for each separate value.
# Use json_normalize to expand 'additionalType' into separate rows, adding the id and title from the parent record
# df_types = json_normalize(df.loc[df['additionalType'].notnull()].to_dict('records'), record_path='additionalType', meta=['id', 'title'], errors='ignore').rename({0: 'additionalType'}, axis=1)
# In pandas v.0.25 and above you can just use explode -- this produces the same result as above
df_types = df.loc[df['additionalType'].notnull()][['id', 'title', 'additionalType']].explode('additionalType')
df_types.head()
| | id | title | additionalType |
---|---|---|---|
1 | 251390 | Pair of woven shoes made from feathers and hair | Shoes |
3 | 21507 | Grinding stone | Grinding stones |
4 | 142308 | 'time CHange' [sic] | Compact discs |
5 | 20174 | Ten Days To Live - A supposed sorcery painting. | Bark paintings |
6 | 144359 | 'The Dance of Life (1898-1902)' by Diana Boyer... | Booklets |
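To see what explode() is doing, here's a sketch on two made-up records (the ids and titles are invented). Each list item gets its own row, and the index label of the parent record is repeated, which is why the index values above still match the original dataframe.

```python
import pandas as pd

# Two toy records, invented for illustration
toy = pd.DataFrame({
    'id': [1, 2],
    'title': ['Album', 'Shoes'],
    'additionalType': [['Albums', 'Newspaper clippings'], ['Shoes']],
})

# One row per list item; the other columns are copied from the parent row
exploded = toy.explode('additionalType')
print(exploded)
```

If you want a fresh 0..n index afterwards, you could chain .reset_index(drop=True).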
Now that we've exploded the type values, we can aggregate them in different ways. Let's look at the 25 most common object types!
df_types['additionalType'].value_counts()[:25]
Mineral samples                   6000
Photographs                       4747
Stone artefacts                   4364
Photographic postcards            4250
Drawings                          3759
Postcards                         3697
Zoological specimens              2168
Bark paintings                    2110
Geological specimens              1993
Cartoons                          1535
Engravings                        1495
Negatives                         1124
Boomerangs                        1025
Spears                            1012
Percussion and abrading stones     982
Paintings                          840
Clubs                              747
Mounts                             745
Cards                              709
Armbands                           649
Shells                             563
Letters                            542
Documents                          517
Geophysical survey equipment       509
Posters                            495
Name: additionalType, dtype: int64
How many object types only appear once?
# Name the columns explicitly, as the default names produced by value_counts() differ between pandas versions
type_counts = df_types['additionalType'].value_counts().rename_axis('type').reset_index(name='count')
unique_types = type_counts.loc[type_counts['count'] == 1]
unique_types.shape[0]
639
unique_types.head()
| | type | count |
---|---|---|
1852 | Genealogical charts | 1 |
1853 | Skivvies | 1 |
1854 | Shopping bags | 1 |
1855 | Jam spoons | 1 |
1856 | Architectural models | 1 |
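The column names you get back from value_counts() followed by reset_index() changed in pandas 2.0, so it's safest to name them yourself. Here's a sketch of the count-then-filter pattern on a handful of made-up type values, naming the columns explicitly so it doesn't depend on the pandas version.

```python
import pandas as pd

# Toy type values, invented for illustration
types = pd.Series(['Photographs', 'Photographs', 'Skivvies', 'Jam spoons'], name='additionalType')

# rename_axis() names the index (the values being counted),
# reset_index(name=...) names the column of counts
counts = types.value_counts().rename_axis('type').reset_index(name='count')

# Keep only the values that appear exactly once
singletons = counts.loc[counts['count'] == 1]
print(counts)
print(singletons)
```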
Let's save the complete list of types as a CSV file.
type_counts.to_csv('nma_object_type_counts.csv', index=False)
display(FileLink('nma_object_type_counts.csv'))
Browsing the CSV I noticed that there was one item with the type Vegetables. Let's find out more about it.
# Find in the complete data set
mask = df.loc[df['additionalType'].notnull()]['additionalType'].apply(lambda x: 'Vegetables' in x)
veggie = df.loc[df['additionalType'].notnull()][mask]
veggie
| | id | type | title | _meta | additionalType | collection | identifier | medium | extent | physicalDescription | ... | isPartOf | seeAlso | description | hasVersion | temporal | relation | hasPart | location | acknowledgement | educationalSignificance |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
63775 | 256742 | object | Wooden toy toad stalk | {'modified': '2019-04-24', 'issued': '2018-10-... | [Toys, Vegetables] | {'id': '6773', 'type': 'Collection', 'title': ... | 2013.0038.0540 | [{'type': 'Material', 'title': 'Paint - non sp... | {'type': 'Measurement', 'height': 65, 'diamete... | A painted wooden toy toad stalk with a red cap... | ... | NaN | NaN | NaN | NaN | [{'type': 'Event', 'title': '1925 - 1935', 'st... | NaN | NaN | NaN | NaN | NaN |
1 rows × 25 columns
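The filter above removes the null rows first because 'Vegetables' in NaN would raise a TypeError. An alternative sketch, using made-up records, checks the type inside the lambda so the mask can be built from the whole column in one step.

```python
import numpy as np
import pandas as pd

# Toy records, invented for illustration
toy = pd.DataFrame({
    'title': ['Toy toadstool', 'Untyped item', 'Spear'],
    'additionalType': [['Toys', 'Vegetables'], np.nan, ['Spears']],
})

# Only test membership when the cell actually holds a list; NaN cells become False
mask = toy['additionalType'].apply(lambda x: isinstance(x, list) and 'Vegetables' in x)
veg = toy.loc[mask]
print(veg)
```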
We can create a link into the NMA Collections Explorer using the object id.
display(HTML('<a href="http://collectionsearch.nma.gov.au/?object={}">{}</a>'.format(veggie.iloc[0]['id'], veggie.iloc[0]['title'])))
Does a toadstool count as a vegetable?
extent field
The extent field is a nested object, so once again we'll use json_normalize() to expand it out into separate columns.
# Without reset_index() the rows are misaligned
df_extent = df.loc[df['extent'].notnull()].reset_index().join(json_normalize(df.loc[df['extent'].notnull()]['extent'].tolist()).add_prefix("extent_"))
df_extent.head()
| | index | id | type | title | _meta | additionalType | collection | identifier | medium | extent | ... | educationalSignificance | extent_type | extent_length | extent_width | extent_depth | extent_unitText | extent_height | extent_diameter | extent_weight | extent_unitTextWeight |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 251390 | object | Pair of woven shoes made from feathers and hair | {'modified': '2019-01-17', 'issued': '2018-04-... | [Shoes] | {'id': '5244', 'type': 'Collection', 'title': ... | 2000.0014.0495 | [{'type': 'Material', 'title': 'Feather'}, {'t... | {'type': 'Measurement', 'length': 260, 'width'... | ... | NaN | Measurement | 260.0 | 120.0 | 40.0 | mm | NaN | NaN | NaN | NaN |
1 | 2 | 124081 | object | Pair of ceremonial shoes | {'modified': '2018-12-04', 'issued': '2006-10-... | NaN | {'id': '1892', 'type': 'Collection', 'title': ... | 1992.0089.0165 | [{'type': 'Material', 'title': 'Feather'}] | {'type': 'Measurement', 'length': 246, 'width'... | ... | NaN | Measurement | 246.0 | 190.0 | 45.0 | mm | NaN | NaN | NaN | NaN |
2 | 5 | 20174 | object | Ten Days To Live - A supposed sorcery painting. | {'modified': '2019-04-21', 'issued': '2013-06-... | [Bark paintings] | {'id': '2202', 'type': 'Collection', 'title': ... | 1985.0246.0077 | [{'type': 'Material', 'title': 'Bark'}, {'type... | {'type': 'Measurement', 'length': 574, 'width'... | ... | NaN | Measurement | 574.0 | 185.0 | NaN | mm | NaN | NaN | NaN | NaN |
3 | 6 | 144359 | object | 'The Dance of Life (1898-1902)' by Diana Boyer... | {'modified': '2018-06-18', 'issued': '2012-06-... | [Booklets] | {'id': '3893', 'type': 'Collection', 'title': ... | 2008.0043.0022.001 | [{'type': 'Material', 'title': 'Paper'}, {'typ... | {'type': 'Measurement', 'height': 214, 'width'... | ... | NaN | Measurement | NaN | 150.0 | 5.0 | mm | 214.0 | NaN | NaN | NaN |
4 | 8 | 42084 | object | Child's drawing by Lester Moran, Cabbage Tree ... | {'modified': '2019-04-07', 'issued': '2016-10-... | [Drawings] | {'id': '2261', 'type': 'Collection', 'title': ... | 1991.0024.0027 | [{'type': 'Material', 'title': 'Paint - non sp... | {'type': 'Measurement', 'length': 560, 'width'... | ... | NaN | Measurement | 560.0 | 380.0 | 0.5 | mm | NaN | NaN | NaN | NaN |
5 rows × 35 columns
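The misalignment mentioned in the code comment happens because json_normalize() returns a frame numbered 0..n, while the filtered rows keep their original index labels, and join() matches on those labels. Instead of resetting the index, you could copy the original labels onto the expanded frame. This is only a sketch on made-up records, and it assumes pandas 1.0 or above, where pd.json_normalize is available.

```python
import numpy as np
import pandas as pd

# Toy records, invented for illustration; the middle one has no extent
toy = pd.DataFrame({
    'id': [10, 11, 12],
    'extent': [{'length': 260, 'unitText': 'mm'}, np.nan, {'height': 214, 'unitText': 'mm'}],
})

has_extent = toy.loc[toy['extent'].notnull()]

# Expand the dictionaries into columns, one row per dictionary
expanded = pd.json_normalize(has_extent['extent'].tolist()).add_prefix('extent_')

# Reuse the original index labels so that join() lines the rows up correctly
expanded.index = has_extent.index
joined = has_extent.join(expanded)
print(joined)
```

Keeping the original labels means rows in the result can still be looked up against the full dataframe.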
Let's check to see what types of things are in the extent
field.
df_extent['extent_type'].value_counts()
Measurement    64077
Name: extent_type, dtype: int64
So they're all measurements. Let's have a look at the units being used.
df_extent['extent_unitText'].value_counts()
mm    63382
MM       10
cm        9
m         5
Name: extent_unitText, dtype: int64
df_extent['extent_unitTextWeight'].value_counts()
g        1473
kg        209
lb          5
oz          4
tonne       1
Name: extent_unitTextWeight, dtype: int64
Hmmm, are those measurements really in metres, or might they be meant to be 'mm'? Let's have a look at them.
df_extent.loc[df_extent['extent_unitText'] == 'm'][['id', 'title', 'extent_length', 'extent_width', 'extent_unitText']]
| | id | title | extent_length | extent_width | extent_unitText |
---|---|---|---|---|---|
16781 | 202783 | The Percival Project, Gull Twelve, in a manill... | NaN | 230.0 | m |
18291 | 214193 | Extension tube | 55.0000 | NaN | m |
41612 | 123962 | Gunter's chain | 20.1168 | NaN | m |
47232 | 171768 | Fair Breeze | NaN | 138.0 | m |
56789 | 257184 | Fishing line inside envelope | 137.0000 | 110.0 | m |
Other than 'Gunter's chain' it looks like the unit should indeed be 'mm'. We'll need to take that into account in calculations.
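Why trust the Gunter's chain value? A surveyor's chain is 66 feet long, and a quick bit of arithmetic shows that matches the recorded length in metres.

```python
# A Gunter's chain is 66 feet, and an international foot is exactly 0.3048 metres
FEET_PER_CHAIN = 66
METRES_PER_FOOT = 0.3048

chain_m = FEET_PER_CHAIN * METRES_PER_FOOT
print(round(chain_m, 4))
```

So that record really is in metres, which means a blanket 'm equals mm' rule will understate its length.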
Now let's convert all the measurements into a single unit – millimetre for lengths, and gram for weights.
def conversion_factor(unit):
    '''
    Get the factor required to convert current unit to either mm or g.
    '''
    factors = {
        'mm': 1,
        'cm': 10,
        'm': 1, # Most should in fact be mm (see above)
        'g': 1,
        'kg': 1000,
        'tonne': 1000000,
        'oz': 28.35,
        'lb': 453.592
    }
    try:
        factor = factors[unit.lower()]
    except KeyError:
        factor = 0
    return factor
def normalise_measurements(row):
    '''
    Convert measurements to standard units.
    '''
    l_factor = conversion_factor(str(row['extent_unitText']))
    length = row['extent_length'] * l_factor
    width = row['extent_width'] * l_factor
    depth = row['extent_depth'] * l_factor
    height = row['extent_height'] * l_factor
    diameter = row['extent_diameter'] * l_factor
    w_factor = conversion_factor(str(row['extent_unitTextWeight']))
    weight = row['extent_weight'] * w_factor
    return pd.Series([length, width, depth, height, diameter, weight])
# Add normalised measurements to the dataframe
df_extent[['length_mm', 'width_mm', 'depth_mm', 'height_mm', 'diameter_mm', 'weight_g']] = df_extent.apply(normalise_measurements, axis=1)
df_extent.head()
| | index | id | type | title | _meta | additionalType | collection | identifier | medium | extent | ... | extent_height | extent_diameter | extent_weight | extent_unitTextWeight | length_mm | width_mm | depth_mm | height_mm | diameter_mm | weight_g |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 251390 | object | Pair of woven shoes made from feathers and hair | {'modified': '2019-01-17', 'issued': '2018-04-... | [Shoes] | {'id': '5244', 'type': 'Collection', 'title': ... | 2000.0014.0495 | [{'type': 'Material', 'title': 'Feather'}, {'t... | {'type': 'Measurement', 'length': 260, 'width'... | ... | NaN | NaN | NaN | NaN | 260.0 | 120.0 | 40.0 | NaN | NaN | NaN |
1 | 2 | 124081 | object | Pair of ceremonial shoes | {'modified': '2018-12-04', 'issued': '2006-10-... | NaN | {'id': '1892', 'type': 'Collection', 'title': ... | 1992.0089.0165 | [{'type': 'Material', 'title': 'Feather'}] | {'type': 'Measurement', 'length': 246, 'width'... | ... | NaN | NaN | NaN | NaN | 246.0 | 190.0 | 45.0 | NaN | NaN | NaN |
2 | 5 | 20174 | object | Ten Days To Live - A supposed sorcery painting. | {'modified': '2019-04-21', 'issued': '2013-06-... | [Bark paintings] | {'id': '2202', 'type': 'Collection', 'title': ... | 1985.0246.0077 | [{'type': 'Material', 'title': 'Bark'}, {'type... | {'type': 'Measurement', 'length': 574, 'width'... | ... | NaN | NaN | NaN | NaN | 574.0 | 185.0 | NaN | NaN | NaN | NaN |
3 | 6 | 144359 | object | 'The Dance of Life (1898-1902)' by Diana Boyer... | {'modified': '2018-06-18', 'issued': '2012-06-... | [Booklets] | {'id': '3893', 'type': 'Collection', 'title': ... | 2008.0043.0022.001 | [{'type': 'Material', 'title': 'Paper'}, {'typ... | {'type': 'Measurement', 'height': 214, 'width'... | ... | 214.0 | NaN | NaN | NaN | NaN | 150.0 | 5.0 | 214.0 | NaN | NaN |
4 | 8 | 42084 | object | Child's drawing by Lester Moran, Cabbage Tree ... | {'modified': '2019-04-07', 'issued': '2016-10-... | [Drawings] | {'id': '2261', 'type': 'Collection', 'title': ... | 1991.0024.0027 | [{'type': 'Material', 'title': 'Paint - non sp... | {'type': 'Measurement', 'length': 560, 'width'... | ... | NaN | NaN | NaN | NaN | 560.0 | 380.0 | 0.5 | NaN | NaN | NaN |
5 rows × 41 columns
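Row-by-row apply() is easy to follow, but it can be slow on tens of thousands of records. As a sketch of an alternative, the unit column can be mapped to a column of factors and multiplied in one go. The toy data and the LENGTH_FACTORS name are my own inventions, and note one difference in behaviour: an unrecognised unit gives NaN here rather than a factor of 0.

```python
import numpy as np
import pandas as pd

# 'm' is treated as mm, following the assumption made earlier in this notebook
LENGTH_FACTORS = {'mm': 1, 'cm': 10, 'm': 1}

# Toy measurements, invented for illustration
toy = pd.DataFrame({
    'extent_length': [260.0, 12.0, np.nan],
    'extent_unitText': ['mm', 'CM', np.nan],
})

# Lowercase the units, then look each one up in the dictionary of factors
factor = toy['extent_unitText'].str.lower().map(LENGTH_FACTORS)

# Multiply whole columns at once; missing lengths or units stay as NaN
toy['length_mm'] = toy['extent_length'] * factor
print(toy)
```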
def calculate_volume(row):
    '''
    Look for 3 linear dimensions and multiply them to get a volume.
    '''
    # Create a list of valid linear measurements from the available fields
    dimensions = [d for d in [row['length_mm'], row['width_mm'], row['depth_mm'], row['height_mm'], row['diameter_mm']] if not math.isnan(d)]
    # If there are only 2 dimensions...
    if len(dimensions) == 2:
        # Set a default height of 1 for items with only 2 dimensions
        dimensions.append(1)
    # If there are 3 or more dimensions, multiply the first 3 together
    if len(dimensions) >= 3:
        volume = dimensions[0] * dimensions[1] * dimensions[2]
    else:
        volume = 0
    return volume
df_extent['volume'] = df_extent.apply(calculate_volume, axis=1)
print('Total length of objects is {:.2f} km'.format(df_extent['length_mm'].sum() / 1000 / 1000))
Total length of objects is 15.36 km
print('Total weight of objects is {:.2f} tonnes'.format(df_extent['weight_g'].sum() / 1000000))
Total weight of objects is 194.30 tonnes
print('Total volume of objects is {:.2f} m\N{SUPERSCRIPT THREE}'.format(df_extent['volume'].sum() / 1000000000))
Total volume of objects is 2873.14 m³
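The divisors in these print statements are just unit conversions: 1,000,000 mm in a km, 1,000,000 g in a tonne, and 1000 cubed (1,000,000,000) cubic mm in a cubic metre. A quick worked example with made-up dimensions shows the volume case.

```python
MM_PER_M = 1000

# A made-up box measuring 2 m x 1 m x 0.5 m, expressed in mm
length_mm, width_mm, height_mm = 2000, 1000, 500

volume_mm3 = length_mm * width_mm * height_mm

# Each of the three dimensions contributes a factor of 1000
volume_m3 = volume_mm3 / MM_PER_M ** 3
print(volume_mm3, volume_m3)
```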
What's the biggest thing?
# Get the object with the largest volume
biggest = df_extent.loc[df_extent['volume'].idxmax()]
# Create a link to Collection Explorer
display(HTML('<a href="http://collectionsearch.nma.gov.au/?object={}">{}</a>'.format(biggest['id'], biggest['title'])))
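idxmax() returns the index label of the largest value, not its position, which is why it's passed to .loc above. If you'd like to see the runners-up as well, nlargest() is one option. Here's a sketch on made-up values:

```python
import pandas as pd

# Toy volumes with deliberately non-sequential index labels, invented for illustration
toy = pd.DataFrame(
    {'title': ['Spear', 'Wagon', 'Stamp'], 'volume': [1200.0, 9.5e9, 0.0]},
    index=[7, 3, 9],
)

# idxmax() gives the label of the row with the largest volume
biggest_label = toy['volume'].idxmax()
biggest = toy.loc[biggest_label]

# nlargest() returns the top n rows, ordered from largest to smallest
top_two = toy.nlargest(2, 'volume')
print(biggest['title'])
print(top_two)
```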
Created by Tim Sherratt for the GLAM Workbench.
Work on this notebook was supported by the Humanities, Arts and Social Sciences (HASS) Data Enhanced Virtual Lab.