New to Jupyter notebooks? Try Using Jupyter notebooks for a quick introduction.
Facets aggregate collection data in interesting and useful ways, allowing us to build pictures of the collection. This notebook shows you how to get facet data from Trove.
import os
import altair as alt
import pandas as pd
import requests
# Make sure data directory exists
os.makedirs("data", exist_ok=True)
%%capture
# Load variables from the .env file if it exists
# Use %%capture to suppress messages
%load_ext dotenv
%dotenv
Insert your API key between the quotes.
# This creates a variable called 'API_KEY'; paste your key between the quotes
API_KEY = ""
# Use an api key value from environment variables if it is available (useful for testing)
if os.getenv("TROVE_API_KEY"):
    API_KEY = os.getenv("TROVE_API_KEY")
# This displays a message with your key
print("Your API key is: {}".format(API_KEY))
Your API key is: gq29l1g1h75pimh4
api_search_url = "https://api.trove.nla.gov.au/v3/result"
Set up our query parameters. We want everything, so we set the q parameter to a single space.
params = {
    "q": " ",  # A space to search for everything
    "facet": "format",
    "category": "book",
    "encoding": "json",
    "n": 1,
}
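To see what these parameters look like once they are attached to the request URL, here is a small sketch using urlencode from Python's standard library rather than requests itself. The name example_params is only for illustration; it copies the values from params above. Note how the single space in q is encoded as a plus sign.

```python
from urllib.parse import urlencode

# Illustrative copy of the query parameters used in this notebook
example_params = {
    "q": " ",
    "facet": "format",
    "category": "book",
    "encoding": "json",
    "n": 1,
}

# urlencode joins key=value pairs with "&" and encodes the space as "+"
query_string = urlencode(example_params)
print(query_string)
```

You don't need to do this yourself when making the request, because requests builds the query string from the params dictionary.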
headers = {"X-API-KEY": API_KEY}
# Pass the headers so that the API key is sent with the request
response = requests.get(api_search_url, params=params, headers=headers)
data = response.json()
def get_facet_totals(data):
    """
    Loop through facets saving terms and counts.
    Returns a dataframe with one row per term or sub-term.
    """
    facets = []
    try:
        terms = data["category"][0]["facets"]["facet"][0]["term"]
    except (KeyError, IndexError):
        # No facet data in the response, so return an empty dataframe
        pass
    else:
        for term in terms:
            facets.append({"facet": term["search"], "total": int(term["count"])})
            if "term" in term:
                # There be sub-terms!
                for subterm in term["term"]:
                    facets.append(
                        {"facet": subterm["search"], "total": int(subterm["count"])}
                    )
    return pd.DataFrame(facets)
# The function has a different name from the dataframe, so this cell can be re-run
facet_totals = get_facet_totals(data)
facet_totals.sort_values("facet")
| | facet | total |
---|---|---|
21 | Archived website | 33660 |
4 | Article | 7377170 |
5 | Article/Abstract | 99 |
6 | Article/Book chapter | 67276 |
7 | Article/Conference paper | 112605 |
8 | Article/Journal or magazine article | 1971332 |
9 | Article/Other article | 4770227 |
10 | Article/Report | 466581 |
11 | Article/Review | 285937 |
12 | Article/Working paper | 73468 |
18 | Audio book | 321559 |
0 | Book | 17061706 |
1 | Book/Braille | 36613 |
2 | Book/Illustrated | 7922202 |
3 | Book/Large print | 119801 |
17 | Conference Proceedings | 483440 |
22 | Data set | 27 |
19 | Government publication | 226184 |
16 | Microform | 946703 |
13 | Periodical | 2113846 |
14 | Periodical/Journal, magazine, other | 2028483 |
15 | Periodical/Newspaper | 87122 |
20 | Thesis | 38121 |
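The table mixes top-level formats such as Book with sub-terms such as Book/Braille because the facet data is nested. The sketch below illustrates that nesting with a small hand-made sample. The shape of sample is inferred from the keys the notebook's function reads, and the counts are invented, so treat it as an assumption rather than a real API response. The helper name flatten_terms is hypothetical, and it uses recursion instead of the two-level loop used earlier.

```python
# Hand-made sample mimicking the nesting the notebook reads; counts are invented
sample = {
    "category": [
        {
            "facets": {
                "facet": [
                    {
                        "term": [
                            {
                                "search": "Book",
                                "count": 10,
                                "term": [{"search": "Book/Braille", "count": 2}],
                            },
                            {"search": "Thesis", "count": 3},
                        ]
                    }
                ]
            }
        }
    ]
}


def flatten_terms(terms):
    """Return one row per term, followed by rows for any nested sub-terms."""
    rows = []
    for term in terms:
        rows.append({"facet": term["search"], "total": int(term["count"])})
        # Sub-terms, if present, sit under the same "term" key one level down
        rows.extend(flatten_terms(term.get("term", [])))
    return rows


rows = flatten_terms(sample["category"][0]["facets"]["facet"][0]["term"])
for row in rows:
    print(row["facet"], row["total"])
```

Each sub-term row follows its parent, which is why sorting by facet name keeps the groups together.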
# Assign a group by splitting
facet_totals["group"] = facet_totals["facet"].apply(lambda x: x.split("/")[0])
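As a quick illustration of what the split does, here is the same expression applied to a few facet names taken from the table, using a plain list instead of a dataframe. The name example_facets is just for this sketch.

```python
# A few facet names from the table above
example_facets = ["Article", "Article/Report", "Book/Braille", "Thesis"]

# Everything before the first "/" becomes the group; names without "/" are unchanged
groups = [facet.split("/")[0] for facet in example_facets]
print(groups)
```

Sub-terms end up in the same group as their parent format, which is what lets the chart colour them together.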
Now we can create a bar chart using Altair. The x values will be the totals, and the y values will be the facet names. The bars are coloured by group.
# Optional: to chart only the largest facets, uncomment the next line
# to sort by total (highest to lowest) and take the top twenty,
# then pass top_facets to alt.Chart() instead of facet_totals
# top_facets = facet_totals.sort_values(by="total", ascending=False)[:20]
# Create a bar chart
alt.Chart(facet_totals).mark_bar().encode(
    x="total:Q",
    y="facet:N",
    color="group:N",
    tooltip=["facet:N", alt.Tooltip("total:Q", format=",")],
)
facet_totals.to_csv(f"data/facet-{params['facet']}.csv", index=False)
Once you've saved this file, you can download it from the workbench data directory.
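If you want to check what ends up in the file, the sketch below writes and re-reads a CSV with the same columns and the same naming pattern. It is only an illustration: it uses the standard csv module rather than pandas, writes to a temporary directory rather than data, and the rows in example_rows are invented.

```python
import csv
import os
import tempfile

# Illustrative stand-ins for the notebook's params and dataframe rows
example_params = {"facet": "format"}
example_rows = [
    {"facet": "Book", "total": 10, "group": "Book"},
    {"facet": "Book/Braille", "total": 2, "group": "Book"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Same naming pattern as the notebook: facet-<facet name>.csv
    csv_path = os.path.join(tmp_dir, f"facet-{example_params['facet']}.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=["facet", "total", "group"])
        writer.writeheader()
        writer.writerows(example_rows)
    # Read the file back to confirm the header and rows
    with open(csv_path, newline="", encoding="utf-8") as csv_file:
        saved_rows = list(csv.DictReader(csv_file))

file_name = os.path.basename(csv_path)
print(file_name)
print(saved_rows)
```

Values read back with the csv module are strings, so totals would need converting to integers before any arithmetic.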
For an in-depth exploration of facets in the newspaper zone and how they can help us visualise change over time, see Visualise Trove newspaper searches over time.
Created by Tim Sherratt for the GLAM Workbench. Support this project by becoming a GitHub sponsor.