Accessing Europeana IIIF APIs

The Europeana IIIF APIs allow us to download, share, and reuse the images and text of Europeana newspapers.

This notebook shows how to explore the repository, run a search, read a record, obtain the full text, and create a CSV dataset.

The Europeana IIIF APIs require an API key to access the endpoints. Please register with Europeana to get a key.

Setting up things

In [ ]:
import csv
import json

import pandas as pd
import requests

Global configuration

In this section, we set our API key and the text used to search for and retrieve the records.

In [ ]:
api_key = 'add_your_api'  # replace with your own API key
query = 'paris'

Performing a search using the API

The API lets us search the text and returns the hits highlighted in context, as traditional search systems (e.g. Lucene and Solr) do.

In [ ]:
url = ''
r = requests.get(url, params = {'query': query, 'profile': 'hits', 'wskey': api_key })
response = r.text
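
To make the request above more concrete, the sketch below uses the standard library to reproduce the query string that requests builds from the params dictionary. The endpoint https://example.org/search is a placeholder chosen for illustration, not the Europeana address, and the key is the dummy value from the configuration cell.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration only.
base_url = 'https://example.org/search'

# Same parameter names as in the search request above.
params = {'query': 'paris', 'profile': 'hits', 'wskey': 'add_your_api'}

# urlencode turns the dictionary into "key=value" pairs joined by "&".
full_url = base_url + '?' + urlencode(params)
print(full_url)
```

Printing the final URL this way can help when checking that the key and the query are sent as expected.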

Displaying the mentions in the transcribed text where the search keyword was found

In [ ]:
results = json.loads(response)

for r in results['hits']:
    print('id:' + r['scope'])
    for s in r['selectors']:
        print(s.get('prefix', '') + s.get('exact', '') + s.get('suffix', ''))
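
The loop above can be tried without an API key on a small hand-written response. The dictionary below only mimics the fields the loop reads (hits, scope, selectors, prefix, exact, suffix); its values, including the record identifier, are made up, and hit_snippets is a hypothetical helper name.

```python
# Hand-written sample that imitates the shape of a search response.
sample_response = {
    'hits': [
        {
            'scope': '/sample/record_1',  # made-up record identifier
            'selectors': [
                {'prefix': 'a trip to ', 'exact': 'Paris', 'suffix': ' in spring'},
                {'exact': 'Paris'},  # prefix and suffix may be absent
            ],
        }
    ]
}

def hit_snippets(hit):
    """Joins prefix, exact match and suffix of every selector of a hit."""
    return [
        s.get('prefix', '') + s.get('exact', '') + s.get('suffix', '')
        for s in hit['selectors']
    ]

for hit in sample_response['hits']:
    print('id:' + hit['scope'])
    for snippet in hit_snippets(hit):
        print(snippet)

snippets = hit_snippets(sample_response['hits'][0])
```

Using .get with an empty default keeps the loop from failing when a selector has no surrounding context.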

Creating a CSV file

In [ ]:
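# newline='' avoids blank rows on Windows; call csv_file.close() once all rows are written
csv_file = open('eu_records.csv', 'w', newline='', encoding='utf-8')
csv_out = csv.writer(csv_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
csv_out.writerow(['title', 'thumbnail', 'date', 'license', 'typem', 'language', 'fulltextUrl', 'manifestUrl', 'fulltext'])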
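
As an alternative sketch, the writer can live inside a with block, which closes the file and flushes every row to disk before the file is read again. The file name, columns and row below are invented for the example and are written to a temporary folder.

```python
import csv
import os
import tempfile

header = ['title', 'date']
rows = [['Sample title, with a comma', '1900-01-01']]  # made-up values

# Write to a temporary folder so the example leaves the notebook folder untouched.
path = os.path.join(tempfile.mkdtemp(), 'example_records.csv')

with open(path, 'w', newline='', encoding='utf-8') as fh:
    writer = csv.writer(fh, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(header)
    writer.writerows(rows)
# Leaving the with block closes the file, so all rows are on disk here.

with open(path, newline='', encoding='utf-8') as fh:
    read_back = list(csv.reader(fh))

print(read_back)
```

With QUOTE_MINIMAL only the fields that need it, such as the title containing a comma, are wrapped in quotes.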

Retrieving the manifests

A manifest describes the information a viewer needs to present a digital object to the user, such as the title and the sequence of views/images. We can retrieve the manifest of each item returned by the search. According to the Europeana documentation, the request follows the pattern [RECORD_ID]/manifest

The manifest includes the metadata; some of the attributes are multivalued.

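The full text is reached through the annotation page of each record ([RECORD_ID]/annopage/1), whose first resource links to the full-text resource.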
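
Since some attributes are multivalued, it can help to collect every value of a label instead of only the first one. The sketch below works on a hand-written dictionary that imitates the metadata list of a manifest; the values are made up and metadata_values is a hypothetical helper, not part of any Europeana library.

```python
# Hand-written sample that imitates the 'metadata' list of a manifest.
sample_manifest = {
    'metadata': [
        {'label': 'type', 'value': [{'@value': 'TEXT'}]},
        {'label': 'language', 'value': [{'@value': 'fr'}, {'@value': 'de'}]},
    ]
}

def metadata_values(manifest, label):
    """Returns all values stored under the given metadata label."""
    values = []
    for entry in manifest.get('metadata', []):
        if entry['label'] == label:
            for v in entry['value']:
                values.append(v['@value'])
    return values

languages = metadata_values(sample_manifest, 'language')
print(languages)
print(metadata_values(sample_manifest, 'type'))
```

Joining the returned list with a separator, for instance '; '.join(languages), would keep all values in a single CSV column.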

In [ ]:
results = json.loads(response)

for hit in results['hits']:
    title = thumbnail = date = license = typem = language = fulltextUrl = manifestUrl = fulltext = ''

    # retrieving the manifest of the record
    manifestUrl = '' + hit['scope'] + '/manifest'
    responseManifest = requests.get(manifestUrl, params={'wskey': api_key})
    m = json.loads(responseManifest.text)

    # retrieving the top-level metadata
    title = m['label'][0]['@value']
    thumbnail = m['thumbnail']['@id']
    date = m['navDate']
    license = m['license']

    # type and language are stored in the metadata list; keep the first value of each
    for i in m['metadata']:
        if i['label'] == 'type':
            typem = i['value'][0]['@value']
        elif i['label'] == 'language':
            language = i['value'][0]['@value']

    # getting the full text: the first annotation page links to the full-text resource
    annopageUrl = '' + hit['scope'] + '/annopage/1'
    responseAnnopage = requests.get(annopageUrl, params={'wskey': api_key})
    a = json.loads(responseAnnopage.text)
    fulltextUrl = a['resources'][0]['resource']['@id']

    # retrieving the full-text resource
    responseFulltext = requests.get(fulltextUrl, params={'wskey': api_key})
    f = json.loads(responseFulltext.text)
    # TODO check encoding
    fulltext = f['value']

    csv_out.writerow([title, thumbnail, date, license, typem, language, fulltextUrl, manifestUrl, fulltext])
In [ ]:
# Load the CSV file created above.
# This puts the data in a pandas DataFrame
df = pd.read_csv('eu_records.csv')

Have a peek

In [ ]:

Once we have queried the repository and we have the metadata as a CSV file, let's show the results as a thumbnail gallery.

In [ ]:
from IPython.display import HTML, Image

def _src_from_data(data):
    """Base64 encodes image bytes for inclusion in an HTML img element"""
    img_obj = Image(data=data)
    for bundle in img_obj._repr_mimebundle_():
        for mimetype, b64value in bundle.items():
            if mimetype.startswith('image/'):
                return f'data:{mimetype};base64,{b64value}'

def gallery(images, row_height='auto'):
    """Shows a set of images in a gallery that flexes with the width of the notebook.

    images: list of str or bytes
        URLs or bytes of images to display

    row_height: str
        CSS height value to assign to all images. Set to 'auto' by default to show images
        with their native dimensions. Set to a value like '250px' to make all rows
        in the gallery equal height.
    """
    figures = []
    for image in images:
        if isinstance(image, bytes):
            # raw image bytes: embed them as a data URI, without caption
            src = _src_from_data(image)
            caption = ''
        else:
            # URL: link to the image and show the URL as caption
            src = image
            caption = f'<figcaption style="font-size: 0.6em">{image}</figcaption>'
        figures.append(f'''
            <figure style="margin: 5px !important;">
              <img src="{src}" style="height: {row_height}">
              {caption}
            </figure>
        ''')
    figures_html = ''.join(figures)
    return HTML(data=f'''
        <div style="display: flex; flex-flow: row wrap; text-align: center;">
        {figures_html}
        </div>
    ''')
In [ ]:
#gallery(urls, row_height='150px')
gallery(df['thumbnail'], row_height='150px')