This notebook queries the OpenAIRE HTTP API for the project(s) in which a publication was produced. It takes a DOI as input, retrieves the publication's metadata via the API's /publications endpoint, and checks whether the metadata contains an 'isProducedBy' relation to a project. If so, the project's ID is used to query the API's /projects endpoint, and the project's title, code, call identifier and funded amount are printed.
# Prerequisites:
import requests # dependency for making HTTP calls
from benedict import benedict # dependency for dealing with json
The input for this notebook is a DOI, e.g. '10.1007/978-3-030-74296-6_19'.
# input parameter
example_doi="10.1007/978-3-030-74296-6_19"
We use it to query the OpenAIRE HTTP API for the specified publication and its metadata.
# OpenAIRE endpoint to query for publications
OPENAIRE_API_PUBLICATIONS = "https://api.openaire.eu/search/publications"
# query OpenAIRE for a specific publication
def query_openaire_for_publication(doi):
    params = {'doi': doi, 'format': "json"}
    response = requests.get(url=OPENAIRE_API_PUBLICATIONS,
                            params=params)
    # raise an exception for HTTP error status codes (4xx/5xx)
    response.raise_for_status()
    result = response.json()
    return result
# ---- example execution
pub_response=query_openaire_for_publication(example_doi)
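To make the request above easier to inspect, the following minimal sketch builds the query URL by hand using only the standard library. The helper name `build_search_url` is hypothetical and not part of the notebook or of any library; the resulting URL should correspond to what `requests.get` sends when given the same `params`, since both URL-encode the parameters.

```python
from urllib.parse import urlencode

OPENAIRE_API_PUBLICATIONS = "https://api.openaire.eu/search/publications"


def build_search_url(endpoint, params):
    """Hypothetical helper: append URL-encoded query parameters to an endpoint."""
    return f"{endpoint}?{urlencode(params)}"


url = build_search_url(
    OPENAIRE_API_PUBLICATIONS,
    {"doi": "10.1007/978-3-030-74296-6_19", "format": "json"},
)
# The '/' inside the DOI is percent-encoded as %2F.
print(url)
```

Pasting the printed URL into a browser is a quick way to look at the raw JSON before parsing it.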
From the complete response we get from the API, we extract the metadata for the specified publication.
If the metadata contains a reference to a project within the list of relations ('rels'), we extract the project's ID.
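The nested lookup described here can be illustrated in plain Python, without any helper library. This is only a sketch: `get_path` is a hypothetical function, and the sample payload is invented and heavily trimmed so that it merely mimics the nesting this notebook navigates.

```python
def get_path(data, path, default=None):
    """Follow a sequence of dict keys and list indices; return default if any step is missing."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Hypothetical payload: only the nesting is modeled, the content is made up.
sample = {
    "response": {
        "results": {
            "result": [
                {"metadata": {"oaf:entity": {"oaf:result": {"rels": {"rel": []}}}}}
            ]
        }
    }
}

RESULT_PATH = ["response", "results", "result", 0, "metadata", "oaf:entity", "oaf:result"]

print(get_path(sample, RESULT_PATH, default={}))
# A response without results falls back to the default instead of raising.
print(get_path({"response": {"results": None}}, RESULT_PATH, default={}))
```

Returning a default on any missing step keeps the later code free of nested existence checks, which matters when a DOI is unknown to the API and no result is returned.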
# extract the metadata about the publication from the response
path_to_result='response.results.result[0].metadata.oaf:entity.oaf:result'
oaf_result=benedict.from_json(pub_response).get(path_to_result, {})
# extract the metadata about relations
# and check for each rel, if it is pointing to a project
rels=oaf_result.get('rels.rel') or []
is_rel_to_project = lambda rel: rel['to']['@class']=="isProducedBy" and rel['to']['@type']=="project"
# unfortunately the json data is inconsistently modeled:
# if there is one rel for a publication, it is a json object
# if there are multiple rels for a publication, they form a json list
if isinstance(rels, list):
    project_ids = [rel['to']['$'] for rel in rels if is_rel_to_project(rel)]
else:
    project_ids = [rels['to']['$']] if is_rel_to_project(rels) else []
print(project_ids)
['corda__h2020::c6af905285a4bcd97a2fdf7cadc3cf3a']
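Because a single relation arrives as an object and several relations arrive as a list, one alternative is to normalize the value to a list first and then filter once. The sketch below assumes the same `to`, `@class`, `@type` and `$` fields used above; the helper names and the sample relations (including their IDs and the `hasAuthorInstitution` class) are hypothetical and serve only as illustration.

```python
def as_list(value):
    """Wrap a single object in a list; map None to an empty list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def project_ids_from_rels(rels):
    """Collect the IDs of all 'isProducedBy' relations that point to a project."""
    ids = []
    for rel in as_list(rels):
        target = rel.get("to", {})
        if target.get("@class") == "isProducedBy" and target.get("@type") == "project":
            ids.append(target.get("$"))
    return ids


# Hypothetical relation objects, shaped like the fields the notebook reads.
single_rel = {"to": {"@class": "isProducedBy", "@type": "project", "$": "example::project-1"}}
other_rel = {"to": {"@class": "hasAuthorInstitution", "@type": "organization", "$": "example::org-1"}}

print(project_ids_from_rels(single_rel))               # one relation given as an object
print(project_ids_from_rels([other_rel, single_rel]))  # several relations given as a list
print(project_ids_from_rels(None))                     # no relations at all
```

Normalizing up front removes the `if`/`else` branch and also tolerates relations that lack a `to` entry.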
For each project ID, we query the OpenAIRE HTTP API via its /projects endpoint for the project's metadata.
# OpenAIRE endpoint to query for projects
OPENAIRE_API_PROJECTS = "https://api.openaire.eu/search/projects"
# query OpenAIRE for a specific project
def query_openaire_for_project(openaire_project_id):
    params = {'openaireProjectID': openaire_project_id, 'format': "json"}
    response = requests.get(url=OPENAIRE_API_PROJECTS,
                            params=params)
    # raise an exception for HTTP error status codes (4xx/5xx)
    response.raise_for_status()
    result = response.json()
    return result
# ---- example execution
project_responses=[query_openaire_for_project(project_id) for project_id in project_ids]
Let's extract and print each project's title, code, call identifier and funded amount.
def extract_data_from_project(project_response):
    path_to_project = 'response.results.result[0].metadata.oaf:entity.oaf:project'
    oaf_project = benedict.from_json(project_response).get(path_to_project, {})
    title = oaf_project.get('title.$')
    code = oaf_project.get('code.$')
    callidentifier = oaf_project.get('callidentifier.$')
    fundedamount = oaf_project.get('fundedamount.$')
    currency = oaf_project.get('currency.$')
    return title, code, callidentifier, f"{fundedamount} {currency}"
# ---- example execution
if not project_responses:
    print("No projects associated with publication")
for project in project_responses:
    title, code, callidentifier, fundedamount = extract_data_from_project(project)
    print("Project data:")
    print(f" code: {code}\n title: {title}\n callidentifier: {callidentifier}\n fundedamount:{fundedamount}\n")
Project data:
 code: 819536
 title: Knowledge Graph based Representation, Augmentation and Exploration of Scholarly Communication
 callidentifier: ERC-2018-COG
 fundedamount:1996250.0 EUR
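When a project record lacks a funded amount or a currency, the f-string in `extract_data_from_project` renders the missing values as "None None". A small formatting helper could make that case explicit. This is a sketch only: `format_amount` is a hypothetical name, and the thousands separators and the "not available" placeholder are illustrative choices rather than anything prescribed by the API.

```python
def format_amount(amount, currency):
    """Hypothetical helper: render an amount with its currency, or a placeholder if either is missing."""
    if amount is None or currency is None:
        return "not available"
    try:
        # Thousands separators and two decimals, e.g. 1,996,250.00
        return f"{float(amount):,.2f} {currency}"
    except (TypeError, ValueError):
        # Fall back to the raw value if it is not numeric.
        return f"{amount} {currency}"


print(format_amount(1996250.0, "EUR"))
print(format_amount(None, None))
```

With the example project from this notebook, the first call prints `1,996,250.00 EUR`; the second prints `not available`.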