New to Jupyter notebooks? Try Using Jupyter notebooks for a quick introduction.

Facets summarise search results by counting how many items share each value of a field, such as format. These counts let us build pictures of the collection. This notebook shows you how to get facet data from Trove.

In [1]:

import os

import altair as alt
import pandas as pd
import requests

# Make sure data directory exists
os.makedirs("data", exist_ok=True)

In [2]:

%%capture
# Load variables from the .env file if it exists
# Use %%capture to suppress messages
%load_ext dotenv
%dotenv

Insert your API key between the quotes.

In [3]:

# This creates a variable called 'API_KEY', paste your key between the quotes
API_KEY = ""

# Use an api key value from environment variables if it is available (useful for testing)
if os.getenv("TROVE_API_KEY"):
    API_KEY = os.getenv("TROVE_API_KEY")

# This displays a message with your key
print("Your API key is: {}".format(API_KEY))

Your API key is: gq29l1g1h75pimh4
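Printing the whole key stores it in the notebook's saved output, where it is easy to share by accident. Below is a minimal sketch of a hypothetical mask_key() helper (not part of the original notebook) that shows only the first few characters. You can still confirm that a key was loaded. A placeholder string stands in for the real key.

```python
def mask_key(key, visible=4):
    """Return the key with all but the first `visible` characters hidden.

    Hypothetical helper for illustration; not part of any Trove library.
    """
    if not key:
        return "(no key set)"
    # Keep the first few characters, replace the rest with asterisks
    return key[:visible] + "*" * max(len(key) - visible, 0)


# In the notebook you would pass API_KEY; a placeholder is used here
print("Your API key is: {}".format(mask_key("abcd1234efgh")))
```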

In [4]:

api_search_url = "https://api.trove.nla.gov.au/v3/result"

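Set up our query parameters. We want everything in the book category, so we set the q parameter to a single space and ask for the format facet. We only need the facet counts, not the records themselves, so n is set to 1.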

In [5]:

params = {
    "q": " ",  # A space to search for everything
    "facet": "format",
    "category": "book",
    "encoding": "json",
    "n": 1,
}

headers = {"X-API-KEY": API_KEY}
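To see what the params dictionary becomes, the sketch below builds the query string with the standard library's urllib.parse.urlencode. requests uses the same form-style encoding under the hood, so the single-space query is sent as "+". The API key does not appear in the URL because it travels separately in the X-API-KEY header.

```python
from urllib.parse import urlencode

api_search_url = "https://api.trove.nla.gov.au/v3/result"
params = {"q": " ", "facet": "format", "category": "book", "encoding": "json", "n": 1}

# Form-style encoding: spaces become "+", values are converted to strings
query_string = urlencode(params)
print(f"{api_search_url}?{query_string}")
```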

In [6]:

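# Send the API key in the X-API-KEY header and stop if the request fails
response = requests.get(api_search_url, params=params, headers=headers)
response.raise_for_status()
data = response.json()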

In [7]:

def get_facet_totals(data):
    """
    Loop through facet terms (and any sub-terms), saving values and counts.
    Returns a pandas DataFrame with 'facet' and 'total' columns.
    """
    facets = []
    try:
        terms = data["category"][0]["facets"]["facet"][0]["term"]
    except (KeyError, IndexError):
        # No facet data in the response
        pass
    else:
        for term in terms:
            facets.append({"facet": term["search"], "total": int(term["count"])})
            if "term" in term:
                # There be sub-terms!
                for subterm in term["term"]:
                    facets.append(
                        {"facet": subterm["search"], "total": int(subterm["count"])}
                    )
    # Name the columns so an empty result can still be sorted
    return pd.DataFrame(facets, columns=["facet", "total"])


facet_totals = get_facet_totals(data)
facet_totals.sort_values("facet")

Out[7]:

	facet	total
21	Archived website	33660
4	Article	7377170
5	Article/Abstract	99
6	Article/Book chapter	67276
7	Article/Conference paper	112605
8	Article/Journal or magazine article	1971332
9	Article/Other article	4770227
10	Article/Report	466581
11	Article/Review	285937
12	Article/Working paper	73468
18	Audio book	321559
0	Book	17061706
1	Book/Braille	36613
2	Book/Illustrated	7922202
3	Book/Large print	119801
17	Conference Proceedings	483440
22	Data set	27
19	Government publication	226184
16	Microform	946703
13	Periodical	2113846
14	Periodical/Journal, magazine, other	2028483
15	Periodical/Newspaper	87122
20	Thesis	38121

In [8]:

# Assign each facet to a group using the part of its name before any "/"
# e.g. "Article/Review" -> "Article", "Book" -> "Book"
facet_totals["group"] = facet_totals["facet"].apply(lambda x: x.split("/")[0])
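Sub-term counts overlap. In the output above, the Article sub-terms add up to 7,747,525, more than the Article total of 7,377,170. So some items must be counted under more than one sub-term, and adding up every total within a group would double count. The stdlib sketch below uses hypothetical helpers facet_group(), which does the same split as the cell above, and top_level_only(), which keeps only facets without a "/". It uses toy rows with made-up counts.

```python
def facet_group(facet):
    """Return the part of a facet value before the first '/'."""
    return facet.split("/")[0]


def top_level_only(rows):
    """Keep only rows whose facet has no '/', i.e. no parent facet."""
    return [row for row in rows if "/" not in row["facet"]]


# Toy rows with made-up counts
rows = [
    {"facet": "Book", "total": 10},
    {"facet": "Book/Braille", "total": 2},
    {"facet": "Thesis", "total": 3},
]
print([facet_group(row["facet"]) for row in rows])  # ['Book', 'Book', 'Thesis']
print(sum(row["total"] for row in top_level_only(rows)))  # 13
```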

Now we can create a bar chart using Altair. The x values will be the totals, the y values will be the facet values, and each bar will be coloured by its group.

In [9]:

# Optional: uncomment the line below to sort by total (highest to lowest) and keep the top twenty,
# then pass top_facets to the chart instead of facet_totals
# top_facets = facet_totals.sort_values(by="total", ascending=False)[:20]
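In pandas, facet_totals.nlargest(20, "total") is a shorter equivalent of the commented-out line. For plain lists of dictionaries, the standard library's heapq.nlargest does the same job without sorting everything. The sketch below uses a hypothetical top_n() helper and toy rows with made-up counts.

```python
import heapq


def top_n(rows, n=20):
    """Return the n rows with the largest totals, highest first."""
    return heapq.nlargest(n, rows, key=lambda row: row["total"])


# Toy rows with made-up counts
rows = [
    {"facet": "Book", "total": 10},
    {"facet": "Book/Braille", "total": 2},
    {"facet": "Thesis", "total": 3},
]
print([row["facet"] for row in top_n(rows, n=2)])  # ['Book', 'Thesis']
```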

In [10]:

# Create a bar chart
alt.Chart(facet_totals).mark_bar().encode(
    x="total:Q",
    y="facet:N",
    color="group:N",
    tooltip=["facet:N", alt.Tooltip("total:Q", format=",")],
)


In [11]:

facet_totals.to_csv(f"data/facet-{params['facet']}.csv", index=False)

Once you've saved this file, you can download it from the workbench data directory.
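DataFrame.to_csv handles the export in one line. If you're working with plain lists of dictionaries instead, the sketch below uses the standard library's csv.DictWriter, a swapped-in alternative rather than the notebook's method. It writes to the same data/facet-<facet>.csv naming pattern through a hypothetical save_facet_csv() helper. The usage example writes to a temporary directory so it leaves no files behind.

```python
import csv
import os
import tempfile


def save_facet_csv(rows, facet, data_dir="data"):
    """Write facet rows to data_dir/facet-<facet>.csv and return the path."""
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, f"facet-{facet}.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["facet", "total", "group"])
        writer.writeheader()
        writer.writerows(rows)
    return path


# Toy rows with made-up counts
rows = [
    {"facet": "Book", "total": 10, "group": "Book"},
    {"facet": "Book/Braille", "total": 2, "group": "Book"},
]

# Write to a temporary directory, then read the file back to check it
with tempfile.TemporaryDirectory() as tmp:
    path = save_facet_csv(rows, "format", data_dir=tmp)
    with open(path, newline="", encoding="utf-8") as f:
        saved = list(csv.DictReader(f))
print(saved)
```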

Going further

For an in-depth exploration of facets in the newspaper zone and how they can help us visualise change over time, see Visualise Trove newspaper searches over time.

Created by Tim Sherratt for the GLAM Workbench. Support this project by becoming a GitHub sponsor.
