In this section we're going to learn how to send a request for information to the Trove API.
API requests are just like normal urls. However, instead of sending us back a web page, they deliver data in a form that computers can understand. We can then use that data in our own programs.
We're going to use the Python Requests module to handle our API queries, so let's import it now.
# Make the Requests module available
import requests
Any requests you make to the Trove API need to be authenticated with a 'key'. For non-commercial projects, you just fill out a simple form and your API key is generated instantly. Follow the instructions in the Trove Help to obtain your own Trove API Key.
Once you've created a key, you can access it at any time on the 'For developers' tab of your Trove user profile.
Copy your API key now, and paste it in the cell below, between the quotes.
# This creates a variable called 'api_key', paste your key between the quotes
api_key = ''
# This displays a message with your key
print('Your API key is: {}'.format(api_key))
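Before going further, it can help to confirm that a key was actually pasted in. The sketch below is one possible check; the helper name key_is_set is made up for this example and is not part of Requests or the Trove API.

```python
# Hypothetical helper (not part of Requests or the Trove API)
def key_is_set(key):
    """Return True if the key contains something other than whitespace."""
    return bool(key and key.strip())

# An empty string, or one containing only spaces, counts as 'not set'
print(key_is_set(''))
print(key_is_set('   '))
# Any other text counts as 'set' ('abc123' is a placeholder, not a real key)
print(key_is_set('abc123'))
```

In the notebook you could call key_is_set(api_key) to check the variable you created above.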
All search queries to the Trove API start with the same base url. We'll save it as a variable here.
# Create a variable called 'api_search_url' and give it a value
api_search_url = 'https://api.trove.nla.gov.au/result'
Trove API queries are constructed by adding parameters to the base url. Most of the parameters are optional, but a few are mandatory:
q – 'q' for query, this is where search terms go
zone – which Trove zone (or zones) do you want to search, use 'all' for everything
key – your Trove API key
If you don't want to specify a search term, you can just use a space or a plus sign – ' ' or '+' – as the value for q. Of course, this means that you're asking for everything, so Trove might take a bit longer to respond.
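To see why a space and a plus sign are so closely related here, this small sketch uses Python's standard urllib.parse.urlencode (rather than Requests itself) to show how a single space is written in a query string. Requests should encode a space in a similar way, but treat that as an assumption rather than a guarantee.

```python
from urllib.parse import urlencode

# A single space as the value of q is written as '+' in the query string
blank_query = urlencode({'q': ' '})
print(blank_query)  # q=+
```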
We'll meet some other parameters later, but for now let's create a Python dictionary to store our basic parameters. The requests library will take this dictionary, turn it into a string, and add it to the base url.
For our first API request we're going to search Trove's digitised newspapers, so we'll assign the value 'newspaper' to the zone parameter. Feel free to edit the q value to search for something that interests you.
# This creates a dictionary called 'params' and sets values for the API's mandatory parameters
params = {
'q': 'cyclone', # Search for this keyword -- feel free to change!
'zone': 'newspaper', # Search in the newspaper zone
'key': api_key
}
The default output of the API is XML. For most applications it's easier to work with JSON. You set this using the encoding parameter. Let's add this into params and view the result.
# This adds a value for 'encoding' to our dictionary
params['encoding'] = 'json'
# Let's view the updated dictionary
params
Ok, we're now ready to make our first query!
# This sends our request to the Trove API and stores the result in a variable called 'response'
response = requests.get(api_search_url, params=params)
# This shows us the url that's sent to the API
print('Here\'s the formatted url that gets sent to the Trove API:\n{}\n'.format(response.url))
# This checks the status code of the response to make sure there were no errors
if response.status_code == requests.codes.ok:
    print('All ok')
elif response.status_code == 403:
    print('There was an authentication error. Did you paste your API key above?')
else:
    print('There was a problem. Error code: {}'.format(response.status_code))
    print('Try running this cell again.')
See how requests has taken our parameters and turned them into a string with '&' between each one?
The url above is live – try clicking on it to see the raw results from Trove.
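If you're curious how a url like this is put together, here is a rough sketch that builds one by hand with Python's standard urllib.parse module and then splits it back into its parameters. It is not what requests does internally line for line, just an illustration of the same idea. The names base_url and example_params are chosen for this example, and 'YOUR_API_KEY' is a placeholder rather than a working key.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Same base url as above; the key is a placeholder, not a real one
base_url = 'https://api.trove.nla.gov.au/result'
example_params = {
    'q': 'cyclone',
    'zone': 'newspaper',
    'key': 'YOUR_API_KEY',
    'encoding': 'json',
}

# Join the parameters with '&' and attach them to the base url after a '?'
example_url = base_url + '?' + urlencode(example_params)
print(example_url)

# Going the other way: pull the parameters back out of the url
# (parse_qs returns a list of values for each parameter name)
recovered = parse_qs(urlsplit(example_url).query)
print(recovered)
```

Nothing is sent over the network here, so you can experiment with different parameter values without using up any API requests.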
The response variable contains all the data returned to us by the Trove API. Let's get it out in a usable form.
# Get the Trove API's JSON results and make them available as a Python variable called 'data'
data = response.json()
# Let's prettify the raw JSON data and then display it.
# We're using the Pygments library to add some colour to the output, so we need to import it
import json
from pygments import highlight, lexers, formatters
# This uses Python's JSON module to output the results as nicely indented text
formatted_data = json.dumps(data, indent=2)
# This colours the text
highlighted_data = highlight(formatted_data, lexers.JsonLexer(), formatters.TerminalFormatter())
# And now display the results
print(highlighted_data)
As you can see, the API results are fairly complex. Individual item records are quite deeply nested. In a future section we'll explore this structure in more detail. But for now, let's run a simple script to display the basic details of each of our matching articles.
# Loop through all the newspaper articles
# The articles themselves are quite deeply nested, so we have to go down several levels to get them
for article in data['response']['zone'][0]['records']['article']:
    # Display a string containing the date, title, newspaper, and page for each article
    print('{}, "{}", {}, page {}'.format(article['date'], article['heading'], article['title']['value'], article['page']))
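The loop above assumes every level of the nested structure is present. If a search returns nothing, some of those levels may be missing, and the loop would stop with an error. Here is a minimal sketch of a more forgiving approach. The helper get_articles is hypothetical, and sample_data is a toy stand-in: it copies only the path used in the loop above, and all of its values are invented placeholders, not real Trove results.

```python
# Toy data: the nesting mirrors the path used in the loop above,
# but every value is an invented placeholder
sample_data = {
    'response': {
        'zone': [
            {
                'records': {
                    'article': [
                        {
                            'date': '1900-01-01',
                            'heading': 'Example heading',
                            'title': {'value': 'Example Newspaper'},
                            'page': 1,
                        }
                    ]
                }
            }
        ]
    }
}

# Hypothetical helper: returns an empty list if any level is missing
def get_articles(data):
    zones = data.get('response', {}).get('zone', [])
    if not zones:
        return []
    return zones[0].get('records', {}).get('article', [])

# Build one summary line per article, in the same format as the loop above
lines = [
    '{}, "{}", {}, page {}'.format(
        article['date'], article['heading'], article['title']['value'], article['page']
    )
    for article in get_articles(sample_data)
]
print('\n'.join(lines))

# With no zones at all, we get an empty list instead of an error
print(get_articles({'response': {'zone': []}}))
```

In the notebook you could pass the real data variable to get_articles in place of sample_data.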
You've made your first Trove API request. Now let's move on to learn a bit about Trove's zones.