FREYA WP2 User Story 1

As a data center, I want to see the citations of publications that use my repository for the underlying data, so that I can demonstrate the impact of our repository.

It is important for repositories of scientific data to monitor and report on the impact of the data they store. One useful proxy for that impact is the number of citations of the publications accompanying the deposited data.

This notebook uses the DataCite GraphQL API to retrieve data (a.k.a. works) and their citations from three different repositories: PANGAEA, DRYAD and Global Biodiversity Information Facility, using polarstern, butterfly and Lake Malawi as example queries respectively.

Goal: By the end of this notebook you should be able to:

  • Retrieve works for a chosen repository and query, along with associated metrics such as citation, view and download counts;
  • Visualise the work counts over time, e.g. as a bar plot of works per year;
  • Present the works in a tabular format and download them in a single BibTeX file;
  • For a given work, retrieve all the citations, present them in a tabular format and then download them in a single BibTeX file.

Install libraries and prepare GraphQL client

In [2]:
%%capture
# Install required Python packages
!pip install gql requests numpy pandas matplotlib
In [3]:
# Prepare the GraphQL client
import requests
from IPython.display import display, Markdown
from gql import gql, Client
from gql.transport.requests import RequestsHTTPTransport

_transport = RequestsHTTPTransport(
    url='https://api.datacite.org/graphql',
    use_json=True,
)

client = Client(
    transport=_transport,
    fetch_schema_from_transport=True,
)
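
Where gql is not available, the same endpoint can in principle be queried with a plain HTTP POST. The sketch below uses only the Python standard library; `build_payload` and `post_graphql` are hypothetical helper names, and it assumes the endpoint accepts the conventional JSON body with `query` and `variables` keys.

```python
import json
import urllib.request

DATACITE_GRAPHQL_URL = 'https://api.datacite.org/graphql'  # same URL as used for the gql client

def build_payload(query, variables=None):
    """Return the JSON-encoded request body for a GraphQL POST."""
    return json.dumps({'query': query, 'variables': variables or {}}).encode('utf-8')

def post_graphql(query, variables=None, url=DATACITE_GRAPHQL_URL):
    """Send the query and return the 'data' part of the response (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=build_payload(query, variables),
        headers={'Content-Type': 'application/json'},
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode('utf-8'))
    # GraphQL servers conventionally report problems in an 'errors' list
    if body.get('errors'):
        raise RuntimeError(body['errors'])
    return body['data']
```

The rest of this notebook keeps using the gql `client`; this helper is only an alternative for environments without it.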

Define and run the GraphQL query

Define the GraphQL query to find all works from the PANGAEA, DRYAD and Global Biodiversity Information Facility (GBIF) repositories, using the keywords polarstern, butterfly and Lake Malawi respectively.

In [4]:
# Generate the GraphQL query
query_params = {
    "pangaea_repository" : "pangaea.repository",
    "pangaea_keyword" : "polarstern",
    "dryad_repository" : "dryad.dryad",
    "dryad_keyword" : "butterfly",
    "gbif_repository" : "gbif.gbif",
    "gbif_keyword" : "Lake Malawi", 
}

query = gql("""query getWorksByRepositoryAndKeyword(
    $pangaea_repository: ID!, $pangaea_keyword: String!,
    $dryad_repository: ID!, $dryad_keyword: String!
    $gbif_repository: ID!, $gbif_keyword: String!
    )
{
  pangaea: repository(id: $pangaea_repository) {
    id
    name
    citationCount
    works(query: $pangaea_keyword) {
      totalCount
      published {
        title
        count
      }
      nodes {
        id
        type
        publicationYear
        bibtex
        titles {
          title
        }
        citationCount
        viewCount
        downloadCount
      }
    }
  },
  dryad: repository(id: $dryad_repository) {
    id
    name
    citationCount
    works(query: $dryad_keyword) {
      totalCount
      published {
        title
        count
      }
      nodes {
        id
        type
        publicationYear
        bibtex
        titles {
          title
        }
        citationCount
        viewCount
        downloadCount
      }
    }
  },
  gbif: repository(id: $gbif_repository) {
    id
    name
    citationCount
    works(query: $gbif_keyword) {
      totalCount
      published {
        title
        count
      }
      nodes {
        id
        type
        publicationYear
        bibtex
        titles {
          title
        }
        citationCount
        viewCount
        downloadCount
      }
    }
  }
}
""")

Run the above query via the GraphQL client

In [5]:
# Pass the variables as a dict; the transport serialises them to JSON itself
data = client.execute(query, variable_values=query_params)

Display the number of works

For each repository, display the total number of works matching the respective query.

In [6]:
# Get the total number of datasets matching the query
works = {}
for repo in ['pangaea', 'dryad', 'gbif']:
    works[repo] = data[repo]['works']
    print("The number of works for query '%s' in repository %s:\n%s" % (query_params['%s_keyword' % repo], data[repo]['name'], str(works[repo]['totalCount'])))
The number of works for query 'polarstern' in repository PANGAEA:
19564
The number of works for query 'butterfly' in repository DRYAD:
530
The number of works for query 'Lake Malawi' in repository Global Biodiversity Information Facility:
6613

Display the number of citations of the works

For each repository, display the total number of citations recorded for the repository as a whole.

In [7]:
# Get the total number of citations per repository
for repo in ['pangaea', 'dryad', 'gbif']:
    print("The total number of citations for repository %s:\n%s" % (data[repo]['name'], str(data[repo]['citationCount'])))
    
The total number of citations for repository PANGAEA:
0
The total number of citations for repository DRYAD:
0
The total number of citations for repository Global Biodiversity Information Facility:
0
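
The repository-level `citationCount` is reported as 0 here, while individual works carry their own counts. As a complementary view, the per-work metrics can be summed over the retrieved nodes. This is a sketch only: `sum_metrics` is a hypothetical helper, the sample nodes are made up, and the totals cover just the nodes returned by the query, not every matching work.

```python
def sum_metrics(nodes):
    """Sum citation, view and download counts over a list of work nodes."""
    totals = {'citationCount': 0, 'viewCount': 0, 'downloadCount': 0}
    for node in nodes:
        for key in totals:
            # Treat missing or null counts as zero
            totals[key] += node.get(key) or 0
    return totals

# Made-up nodes for illustration only
sample_nodes = [
    {'id': 'work-a', 'citationCount': 2, 'viewCount': 10, 'downloadCount': 4},
    {'id': 'work-b', 'citationCount': 0, 'viewCount': 5, 'downloadCount': None},
]
print(sum_metrics(sample_nodes))
```

In the notebook this would be called as `sum_metrics(data[repo]['works']['nodes'])` for each repository.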

Plot the number of works per year

For each repository, display a bar plot showing the counts of works matching the respective query, across years.

In [9]:
# Plot the total number of datasets to date, by year
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
import numpy as np

for repo in ['pangaea', 'dryad','gbif']:
    works = data[repo]['works']
    name = data[repo]['name']
    plt.rcdefaults()
    # Map each publication year to its count so years and counts stay aligned
    counts_by_year = {int(s['title']): s['count'] for s in works['published']}
    sorted_years = sorted(counts_by_year)
    # Get a list of all consecutive years between min and max year (inclusive)
    all_years = list(range(sorted_years[0], sorted_years[-1] + 1))
    # Populate output counts (into num_outputs) for all consecutive years,
    # using 0 for years with no works
    num_outputs = [counts_by_year.get(year, 0) for year in all_years]

    fig, ax = plt.subplots(1, 1, figsize = (10, 5))
    x_pos = np.arange(len(all_years))
    ax.bar(x_pos, num_outputs, align='center', color='blue', edgecolor='black', linewidth=1, alpha=0.5)
    ax.set_xticks(x_pos)
    ax.set_xticklabels(all_years, rotation='vertical')
    ax.set_ylabel('Number of works per Year')
    ax.set_xlabel('Year')
    ax.set_title("Number of works retrieved via query '%s' from %s" % (query_params["%s_keyword" % repo], name))
    plt.show()
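
The gap-filling step of the plotting cell can be isolated into a small function, which makes it easy to check independently of matplotlib. This is a sketch under the assumption that each entry of `published` has a year in `title` and a number in `count`, as in the query above; `counts_for_consecutive_years` is a hypothetical name.

```python
def counts_for_consecutive_years(published):
    """Return (years, counts) for every year from the earliest to the latest, inclusive."""
    counts_by_year = {int(p['title']): p['count'] for p in published}
    if not counts_by_year:
        return [], []
    years = list(range(min(counts_by_year), max(counts_by_year) + 1))
    # Years absent from the input get a count of 0
    return years, [counts_by_year.get(year, 0) for year in years]

# Made-up, deliberately unsorted input for illustration
years, counts = counts_for_consecutive_years([
    {'title': '2012', 'count': 5},
    {'title': '2009', 'count': 2},
    {'title': '2010', 'count': 1},
])
print(years, counts)
```

Because the counts are looked up by year, the result does not depend on the order in which the API returns the entries.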

Display works in tabular format

For each repository and query, display the works in an HTML table, including the number of their citations, views and downloads.

In [10]:
from IPython.display import display, HTML

# Get details for each output
for repo in ['pangaea', 'dryad','gbif']:
    works = data[repo]['works']
    name = data[repo]['name']
    outputs = [['ID','Type','Publication Year','Titles','Number of Citations', 'Number of Views', 'Number of Downloads']]
    for r in works['nodes']:
        # Link each DOI to its landing page (avoid shadowing the built-in id)
        work_id = '<a href="%s">%s</a>' % (r['id'], '/'.join(r['id'].split("/")[3:]))
        titles = '; '.join([s['title'] for s in r['titles']])
        output = [work_id, r['type'], str(r['publicationYear']), titles, str(r['citationCount']), str(r['viewCount']), str(r['downloadCount'])]
        outputs += [output]
    
    # Display outputs as html table 
    html_table = '<html><table><caption><b>"%s" works from %s</b></caption>' % (query_params["%s_keyword" % repo], name)  
    html_table += '<tr><th style="text-align:center;">' + '</th><th style="text-align:center;">'.join(outputs[0]) + '</th></tr>'
    for row in outputs[1:]:
        html_table += '<tr><td style="text-align:left;">' + '</td><td style="text-align:left;">'.join(row) + '</td></tr>'
    html_table += '</table></html>'
    display(HTML(html_table))
"polarstern" works from PANGAEA
ID | Type | Publication Year | Titles | Number of Citations | Number of Views | Number of Downloads
10.1594/pangaea.52064Dataset1997Radionuclides and silicate measured on water bottle samples during POLARSTERN cruise ANT-X/6, supplement to: Friedrich, Jana (1997): Polonium-210 und Blei-210 im Südpolarmeer: Natürliche Tracer für biologische und hydrographische Prozesse im Oberflächenwasser des Antarktischen Zirkumpolarstroms und des Weddellmeeres (Polonium-210 and Lead-210 in the Southern Polar Ocean: Naturally occurring tracers of biological and hydrographical tracers of biological and hydrographical processes in the surface waters of the Antarctic Circumpolar Current and the Weddell Sea). Berichte zur Polarforschung = Reports on Polar Research, 235, 155 pp000
10.1594/pangaea.54265Dataset1993Lead and aluminum in Atlantic surface water measured along the track of POLARSTERN cruise ANT-VIII/7 and ANT-IX/1 (Tables 2, 3), supplement to: Helmers, Eckard; Rutgers van der Loeff, Michiel M (1993): Lead and aluminum in Atlantic surface water (50°N to 50°S) reflecting anthropogenic and natural sources in the eolian transport. Journal of Geophysical Research: Oceans, 98(C11), 20261-20273000
10.1594/pangaea.54605Dataset1997Organic geochemistry on surface sediments from the Laptev Sea, supplement to: Fahl, Kirsten; Stein, Ruediger (1997): Modern organic carbon deposition in the Laptev Sea and the adjacent continental slope: surface water productivity vs. terrigenous input. Organic Geochemistry, 26(5-6), 379-390000
10.1594/pangaea.55750Dataset2001Isotope tracers (Delta δ¹³C) in Weddell Sea water during POLARSTERN cruise ANT-XII/3, supplement to: Mackensen, Andreas (2001): Oxygen and carbon stable isotope tracers of Weddell Sea water masses: new data and some paleoceanographic implications. Deep Sea Research Part I: Oceanographic Research Papers, 48(6), 1401-1422000
10.1594/pangaea.56218Dataset1999Lipid composition of particulate matter measured on water bottle samples during POLARSTERN cruise ARK-XI/1, supplement to: Alexandrova, Olga A; Shevchenko, Vladimir P; Fahl, Kirsten; Stein, Ruediger (1999): The lipid composition of particulate matter from the transitional zone between Kara and Laptev Seas. In: Stein, R; Fahl, K; Ivanov, G I; Levitan, M A; Tarasov, G (eds.), Modern and Late Quaternary depositional environment of the St. Anna Trough area, northern Kara Sea, Reports on Polar Research, 244 pp, 342, 93-102000
10.1594/pangaea.60056Dataset2007Radionuclides measured in surface water samples from the South Atlantic, supplement to: Hegner, Ernst; Dauelsberg, Hans-Jürgen; Rutgers van der Loeff, Michiel M; Jeandel, Catherine; de Baar, Hein J W (2007): Nd isotopes constrain the origin of suspended particles in the Atlantic sector of the Southern Ocean. Geochemistry, Geophysics, Geosystems, 8, Q10008000
10.1594/pangaea.314690Dataset2003Effects of varying food-availability on ecology and distribution of smallest benthic organisms in sediments of the arctic Fram Strait during POLARSTERN cruise ARK-XV/2, supplement to: Schewe, Ingo; Soltwedel, Thomas (2003): Benthic response to ice-edge-induced particle flux in the Arctic Ocean. Polar Biology, 26(9), 610-62010300
10.1594/pangaea.370797Dataset2005Terrain model of the Håkon Mosby Mud Volcano (10 m grid size), supplement to: Beyer, Andreas; Rathlau, Rike; Schenke, Hans Werner (2005): Multibeam bathymetry of the Håkon Mosby Mud Volcano. Marine Geophysical Research, 26, 61-75000
10.1594/pangaea.370808Dataset2006Terrain model of the eastern slope of the Porcupine Seabight (50 m grid size), supplement to: Beyer, Andreas; Schenke, Hans Werner; Klenke, Martin; Niederjasper, Fred (2003): High resolution bathymetry of the eastern slope of the Porcupine Seabight. Marine Geology, 198(1-2), 27-54000
10.1594/pangaea.472287Collection2006Standard meteorological measurements on board of POLARSTERN during expedition ANT-IV (PS08, 4 cruises, south summer 1985/86) in the Atlantic and Weddell Sea, Antarctica000
10.1594/pangaea.526589Dataset2006Digital terrain model (DTM) of the central Fram Strait, supplement to: Klenke, Martin; Schenke, Hans Werner (2002): A new bathymetric model for the central Fram Strait. Marine Geophysical Research, 23(4), 367-378000
10.1594/pangaea.526941Collection2006Mapping of the Eltanin impact area in the South-East Pacific000
10.1594/pangaea.536199Collection2006Clay minerals in the Norwegian Sea and Fram Strait, investigation from sediment traps and cores, supplement to: Berner, Heinrich (1991): Mechanismen der Sedimentbildung in der Framstrasse, im Arktischen Ozean und in der Norwegischen See. Berichte aus dem Fachbereich Geowissenschaften der Universität Bremen, 20, 167 pp000
10.1594/pangaea.552177Collection1990Sedimentological and paleomagnetic investigations on two gravity cores and ODP sites 113-690 and 693 off Kapp Norvegia, Antarctic continental margin, supplement to: Grobe, Hannes; Fütterer, Dieter K; Spieß, Volkhard (1990): Oligocene to Quaternary sedimentation processes on the Antarctic continental margin, ODP Leg 113, Site 693. In: Barker, PF; Kennett, JP; et al. (eds.), Proceedings of the Ocean Drilling Program, Scientific Results, College Station, TX (Ocean Drilling Program), 113, 121-131000
10.1594/pangaea.557786Collection2006AWI Bathymetric Chart of the Fram Strait (BCFS) (Scale 1:100,000)000
10.1594/pangaea.610160Collection2007Polyfluorinated alkyl substances (PFAS) in high-volume air samples collected during Polarstern expedition ANT-XXIII/1, supplement to: Jahnke, Annika; Berger, Urs; Ebinghaus, Ralf; Temme, Christian (2007): Latitudinal gradient of airborne polyfluorinated alkyl substances in the marine atmosphere between Germany and South Africa (53° N-33° S). Environmental Science and Technology, 41(9), 3055 -3061000
10.1594/pangaea.646297Collection2007Abundance of copepods from multinet samples during POLARSTERN cruise ANT-XXII/2 (ISPOL), supplement to: Schnack-Schiel, Sigrid B; Michels, Jan; Mizdalski, Elke; Schodlok, Michael P; Schröder, Michael (2008): Composition and community structure of zooplankton in the sea ice covered western Weddell Sea in spring 2004 - with emphasis on calanoid copepods. Deep Sea Research Part II: Topical Studies in Oceanography, 55(8-9), 1040-1055000
10.1594/pangaea.693818Collection1998Sediments in arctic sea ice - entrainment, characterization and quantification, supplement to: Lindemann, Frank (1998): Sedimente im arktischen Meereis - Eintrag, Charakterisierung und Quantifizierung (Sediments in arctic sea ice - entrainment, characterization and quantification). Berichte zur Polarforschung = Reports on Polar Research, 283, 124 pp000
10.1594/pangaea.695350Collection2008Clay mineralogy of sediment cores from the Arctic Ocean, supplement to: Vogt, Christoph; Knies, Jochen (2008): Sediment dynamics in the Eurasian Arctic Ocean during the last deglaciation - The clay mineral group smectite perspective. Marine Geology, 250(3-4), 211-222000
10.1594/pangaea.701279Dataset2005Continuous VM-ADCP (vessel-mounted acoustic Doppler current profiler) profiles of horizontal velocities and raw acoustic gain control data during Polarstern cruise ANT-XVIII/2, supplement to: Cisewski, Boris; Strass, Volker H; Prandke, Hartmut (2005): Upper-ocean vertical mixing in the Antarctic Polar Front Zone. Deep Sea Research Part II: Topical Studies in Oceanography, 52(9-10), 1087-1108000
10.1594/pangaea.701344Collection2001Sedimentology and age determinations on core PS2813-1, supplement to: Holz, Christine (2001): Glazialmarine Sedimentationsprozesse am Kontinentalhang des westlichen Weddellmeeres. Diploma Thesis, Geologisch-Paläontologisches Institut der Christian-Albrechts-Universität zu Kiel, Germany, 70 pp000
10.1594/pangaea.701354Collection2008Heavy mineral investigations from surface sediments of the East Greenland continental margin, supplement to: Grafenauer, Ingo (1998): Terrigener Sedimenteintrag am Ostgrönländischen Kontinentalrand - Rekonstruktion anhand von Schwermineraldaten. Diploma Thesis, Rheinisch-Westfälisch-Technische Hochschule Aachen, 94 pp000
10.1594/pangaea.702107Collection2010Physical oceanography, sea-bed photographs and videos of benthos from the Weddell Sea taken with remote operated vehicle CHEROKEE during POLARSTERN cruise ANT-XXIII/8, supplement to: Gutt, Julian; Barratt, Iain; Domack, Eugene W; d'Udekem d'Acoz, Cédric; Dimmler, Werner; Grémare, Antoine; Heilmayer, Olaf; Isla, Enrique; Janussen, Dorte; Jorgensen, Elaina; Kock, Karl-Hermann; Lehnert, Linn Sophia; López-Gonzáles, Pablo José; Langner, Stephanie; Linse, Katrin; Manjón-Cabeza, Maria Eugenia; Meißner, Meike; Montiel, Américo; Raes, Maarten; Robert, Henri; Rose, Armin; Schepisi, Elisabet Sañé; Saucède, Thomas; Scheidat, Meike; Schenke, Hans Werner; Seiler, Jan; Smith, Craig (2011): Biodiversity change after climate-induced ice-shelf collapse in the Antarctic. Deep Sea Research Part II: Topical Studies in Oceanography, 58(1-2), 74-83000
10.1594/pangaea.706910Collection2005Properties of foraminifera from Denmark Strait, supplement to: Lorenz, Andrea (2005): Variability of benthic foraminifera north and south of the Denmark Strait. PhD Thesis, Mathematisch-Naturwissenschaftliche Fakultät der Christian-Albrechts-Universität zu Kiel, Germany, 139 pp000
10.1594/pangaea.708081Dataset1997AWI Bathymetric Chart of the Weddell Sea, Antarctica (BCWS)000
"butterfly" works from DRYAD
ID | Type | Publication Year | Titles | Number of Citations | Number of Views | Number of Downloads
10.5061/dryad.8bb43Dataset2011Data from: UV photoreceptors and UV-yellow wing pigments in Heliconius butterflies allow a color signal to serve both mimicry and intraspecific communication08028
10.5061/dryad.8bb43/1Dataset2011Reflectance spectra of Heliconiine butterfly yellow wing pigments000
10.5061/dryad.8bb43/2Dataset2011Appendix: Wing photomicrographs and averaged yellow reflectance spectra000
10.5061/dryad.rg4dc36sDataset2011Data from: Evidence for evolutionary change associated with the recent range expansion of the British butterfly, Polyommatus agestis, in response to climate change000
10.5061/dryad.1705/1Dataset2010Napeogenesspecies.xml000
10.5061/dryad.1705/4Dataset2010Ithomiaspecies.xml000
10.5061/dryad.1540Dataset2010Data from: Rapid microsatellite isolation from a butterfly by de novo transcriptome sequencing: performance and a comparison with AFLP-derived distances015731
10.5061/dryad.1705Dataset2010Data from: Out of the Andes: patterns of diversification in clearwing butterflies09224
10.5061/dryad.1276Dataset2010Data from: Allopatric origin of cryptic butterfly species that were discovered feeding on distinct host plants in sympatry0488
10.5061/dryad.615Dataset2009Data from: A partitioned likelihood analysis of swallowtail butterfly phylogeny (Lepidoptera: Papilionidae)0572
10.5061/dryad.cb6pkDataset2011Data from: Similarity and specialization of the larval versus adult diet of European butterflies and moths07220
10.5061/dryad.f0b2f083Dataset2012Data from: Genomic regions with a history of divergent selection affect fitness of hybrids between two butterfly species06626
10.5061/dryad.vj86fq35Dataset2012Data from: Response to selection on cold tolerance is constrained by inbreeding0434
10.5061/dryad.d56g84n5Dataset2012Data from: Use of an exotic host plant affects mate choice in an insect herbivore0246
10.5061/dryad.s672t4ghDataset2012Data from: Dispersal and gene flow in the rare, parasitic Large Blue butterfly Maculinea arion04012
10.5061/dryad.7nn6pDataset2012Data from: The evolution of alternative developmental pathways: footprints of selection on life-history traits in a butterfly0408
10.5061/dryad.82j66Dataset2012Data from: Food plant-derived disease tolerance and resistance in a natural butterfly-plant-parasite interaction04310
10.5061/dryad.65r9cDataset2012Data from: Does selection on increased cold tolerance in the adult stage confer resistance throughout development?02714
10.5061/dryad.m27qqDataset2012Data from: Butterfly genome reveals promiscuous exchange of mimicry adaptations among species000
10.5061/dryad.c7f1fDataset2012Data from: A range-wide genetic bottleneck overwhelms landscape heterogeneity and local abundance in shaping genetic patterns of an alpine butterfly (Lepidoptera: Pieridae: Colias behrii)03410
10.5061/dryad.284gkDataset2012Data from: Ecological constraints on female fitness in a phytophagous insect0348
10.5061/dryad.5rm2fDataset2012Data from: Environmental risk assessment for the small tortoiseshell Aglais urticae and a stacked Bt-maize with combined resistances against Lepidoptera and Chrysomelidae in central European agrarian landscapes0553
10.5061/dryad.f76f3Dataset2013Data from: Transcriptome analysis reveals novel patterning and pigmentation genes underlying Heliconius butterfly wing pattern variation05119
10.5061/dryad.733d9Dataset2012Data from: Dissecting the contributions of plasticity and local adaptation to the phenology of a butterfly and its host plants04512
10.5061/dryad.cr17vDataset2012Data from: Sharp genetic discontinuity across a unimodal Heliconius hybrid zone016839
"Lake Malawi" works from Global Biodiversity Information Facility
ID | Type | Publication Year | Titles | Number of Citations | Number of Views | Number of Downloads
10.15468/n6ftydDataset2019rmca-albertine-rift-cichlids8100
10.15468/dl.fk6ozuDataset2015GBIF Occurrence Download000
10.15468/dl.sdaj2eDataset2015GBIF Occurrence Download000
10.15468/dl.pyr4axDataset2015GBIF Occurrence Download000
10.15468/dl.yj2sheDataset2015GBIF Occurrence Download000
10.15468/dl.t1auptDataset2015GBIF Occurrence Download000
10.15468/dl.iad1t6Dataset2015GBIF Occurrence Download000
10.15468/dl.fujlgaDataset2015GBIF Occurrence Download000
10.15468/dl.oqdmccDataset2015GBIF Occurrence Download000
10.15468/lvyqrqDataset2016Royal Museum of Central Africa - Albertian Rift Cichlids (ENBI wp13)000
10.15468/q3e6mnDataset2006Astiotrema turneri n. sp. (Digenea: Plagiorchiidae) from cichlid fishes (Cichlidae: Perciformes) of Lake Malawi, south­eastern Africa000
10.15468/dl.xucfgyDataset2016GBIF Occurrence Download000
10.15468/x0wrhlDataset2006Astiotrema turneri n. sp. (Digenea: Plagiorchiidae) from cichlid fishes (Cichlidae: Perciformes) of Lake Malawi, south­eastern Africa000
10.15468/dl.zvv38tDataset2016GBIF Occurrence Download000
10.15468/dl.shgiosDataset2016GBIF Occurrence Download000
10.15468/dl.obxf4eDataset2016GBIF Occurrence Download000
10.15468/dl.pej1ntDataset2017GBIF Occurrence Download000
10.15468/dl.ezc6o3Dataset2017GBIF Occurrence Download000
10.15468/dl.pu4okyDataset2017GBIF Occurrence Download000
10.15468/dl.szsg5cDataset2017GBIF Occurrence Download000
10.15468/dl.zmysjkDataset2017GBIF Occurrence Download000
10.15468/dl.1pajgbDataset2017GBIF Occurrence Download000
10.15468/dl.gsfp2rDataset2017GBIF Occurrence Download000
10.15468/dl.jdlxgxDataset2018GBIF Occurrence Download000
10.15468/dl.govpwqDataset2018GBIF Occurrence Download000
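
Titles are inserted into the HTML table as-is, so a title containing characters such as `<` or `&` could disturb the markup. A minimal sketch of a table builder that escapes every cell with the standard library's `html.escape` is shown here; `rows_to_html_table` is a hypothetical helper, and it treats all cells as plain text, so link cells such as the ID column would need to be built separately.

```python
from html import escape

def rows_to_html_table(caption, header, rows):
    """Build an HTML table string, escaping the caption and every cell."""
    parts = ['<table><caption><b>%s</b></caption>' % escape(caption)]
    parts.append('<tr>' + ''.join('<th>%s</th>' % escape(h) for h in header) + '</tr>')
    for row in rows:
        parts.append('<tr>' + ''.join('<td>%s</td>' % escape(str(c)) for c in row) + '</tr>')
    parts.append('</table>')
    return ''.join(parts)

# Made-up row for illustration only
table = rows_to_html_table('Example works', ['ID', 'Titles'], [['10.1/abc', 'A < B & C']])
print(table)
```

The returned string can be shown in the notebook with `display(HTML(table))`.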

Download works in BibTeX format

Download the works in a single BibTeX file per repository.

In [29]:
import pandas as pd
from IPython.display import Javascript
from requests.utils import requote_uri

# For each repository, download a .bib file of BibTeX entries (to_csv is only used to concatenate the entries)
for repo in ['pangaea', 'dryad','gbif']:
    works = data[repo]['works']
    bibtex_data = []
    for r in works['nodes']:
        bibtex_data.append([r['bibtex']])
    df = pd.DataFrame(bibtex_data, columns = None)
    
    js_download = """
var csv = '%s';

var filename = '%s_%s.bib';
var blob = new Blob([csv], { type: 'application/x-bibtex;charset=utf-8;' });
if (navigator.msSaveBlob) { // IE 10+
    navigator.msSaveBlob(blob, filename);
} else {
    var link = document.createElement("a");
    if (link.download !== undefined) { // feature detection
        // Browsers that support HTML5 download attribute
        var url = URL.createObjectURL(blob);
        link.setAttribute("href", url);
        link.setAttribute("download", filename);
        link.style.visibility = 'hidden';
        document.body.appendChild(link);
        link.click();
        document.body.removeChild(link);
    }
}
""" % (df.to_csv(index=False, header=False).replace('\n','\\n').replace("\'","\\'").replace("\"","").replace("\r",""), query_params["%s_repository" % repo], requote_uri(query_params["%s_keyword" % repo]))
    
    display(Javascript(js_download))
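
When the notebook runs locally, the browser-side download can be replaced by writing the file directly from Python. The sketch below assumes each node has a `bibtex` string, as requested in the query; `write_bibtex_file` is a hypothetical helper and the sample entries are made up.

```python
import os
import tempfile

def write_bibtex_file(nodes, path):
    """Write the BibTeX entry of every node to path, separated by blank lines."""
    entries = [n['bibtex'].strip() for n in nodes if n.get('bibtex')]
    with open(path, 'w', encoding='utf-8') as handle:
        handle.write('\n\n'.join(entries) + '\n')
    return len(entries)

# Made-up entries for illustration only
sample_nodes = [
    {'bibtex': '@misc{example_a,\n  title = {Example A}\n}'},
    {'bibtex': '@misc{example_b,\n  title = {Example B}\n}'},
    {'bibtex': None},  # nodes without BibTeX are skipped
]
path = os.path.join(tempfile.mkdtemp(), 'example.bib')
written = write_bibtex_file(sample_nodes, path)
with open(path, encoding='utf-8') as handle:
    content = handle.read()
print(written)
```

Unlike the JavaScript approach, this needs no escaping of quotes or newlines, but the file ends up on the machine running the kernel rather than in the browser's download folder.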

Define and run GraphQL query to retrieve citations for a single work

The query will retrieve citations for IUCN Red List assessment occurrence data for freshwater species native to the Lake Malawi/Nyasa/Niassa Catchment.

In [30]:
# Generate the GraphQL query: Get citations for a specific work from the repository
citations_query_params = {
    "id" : "https://doi.org/10.15468/1z5fn8",
    "maxCitations" : 75
}

citation_query = gql("""query getCitationsByWorkId($id: ID!, $maxCitations: Int!)
{
  work(id: $id) {
    id
    titles {
      title
    }
    type
    publicationYear
    citations(first: $maxCitations) {
      totalCount
      nodes {
        id
        type
        publicationYear
        repository {
          id
          name
        }
        titles {
          title
        }
        bibtex
        citationCount
        viewCount
        downloadCount
      }
    }
  }
}
""")

Run the above query

In [31]:
# Pass the variables as a dict; the transport serialises them to JSON itself
citations = client.execute(citation_query, variable_values=citations_query_params)
In [33]:
# Get the total number of citations matching the query
citations_data = citations['work']['citations']
print("The number of citations for work %s:\n%s" % (citations_query_params["id"], str(citations_data['totalCount'])))
The number of citations for work https://doi.org/10.15468/1z5fn8:
778
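
The work has 778 citations, but the query requests at most 75 of them (`maxCitations`). Retrieving the rest would need pagination. The sketch below shows a generic cursor loop; it assumes the `citations` connection exposes `pageInfo { endCursor hasNextPage }` and accepts an `after` argument, as is conventional for cursor-based GraphQL connections, which should be checked against the API schema. `fetch_all_nodes` and the fake pages are hypothetical.

```python
def fetch_all_nodes(fetch_page):
    """Collect nodes from all pages.

    fetch_page(cursor) must return a dict with 'nodes' and
    'pageInfo' (containing 'hasNextPage' and 'endCursor').
    """
    nodes, cursor = [], None
    while True:
        connection = fetch_page(cursor)
        nodes.extend(connection['nodes'])
        page_info = connection['pageInfo']
        if not page_info['hasNextPage']:
            return nodes
        cursor = page_info['endCursor']

# Fake two-page response standing in for the API, for illustration only
fake_pages = {
    None: {'nodes': [{'id': 'a'}, {'id': 'b'}],
           'pageInfo': {'hasNextPage': True, 'endCursor': 'cursor-1'}},
    'cursor-1': {'nodes': [{'id': 'c'}],
                 'pageInfo': {'hasNextPage': False, 'endCursor': None}},
}
all_nodes = fetch_all_nodes(lambda cursor: fake_pages[cursor])
print([n['id'] for n in all_nodes])
```

In the notebook, `fetch_page` would wrap a call to `client.execute` that passes the cursor as a query variable and returns `result['work']['citations']`.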

Display citations in tabular format

Display citations of IUCN Red List assessment occurrence data for freshwater species native to the Lake Malawi/Nyasa/Niassa Catchment in an HTML table, including the number of their respective citations, views and downloads.

In [34]:
from IPython.display import display, HTML

# Get details for each citation
outputs = [['ID','Type','Publication Year','Titles','Number of Citations', 'Number of Views', 'Number of Downloads']]
for r in citations_data['nodes']:
    citation_id = '<a href="%s">%s</a>' % (r['id'], '/'.join(r['id'].split("/")[3:]))
    titles = '; '.join([s['title'] for s in r['titles']])
    output = [citation_id, r['type'], str(r['publicationYear']), titles, str(r['citationCount']), str(r['viewCount']), str(r['downloadCount'])]
    outputs += [output]
    
# Display outputs as html table
id_href = '<a href="%s">%s</a>' % (citations_query_params['id'], '/'.join(citations_query_params['id'].split("/")[3:]))
html_table = '<html><table><caption><b>Citations of %s from %s</b></caption>' % (id_href, query_params["%s_repository" % "gbif"] )  
html_table += '<tr><th style="text-align:center;">' + '</th><th style="text-align:center;">'.join(outputs[0]) + '</th></tr>'
for row in outputs[1:]:
    html_table += '<tr><td style="text-align:left;">' + '</td><td style="text-align:left;">'.join(row) + '</td></tr>'
html_table += '</table></html>'
display(HTML(html_table))
Citations of 10.15468/1z5fn8 from gbif.gbif
ID | Type | Publication Year | Titles | Number of Citations | Number of Views | Number of Downloads
10.15468/dl.ezobqv | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.oyiynl | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.yjeevx | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.pltnoo | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.nfglle | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.2orexs | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.3c0gsw | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.qzsifz | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.8w8vv3 | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.v2sxry | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.gg19ne | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.urhked | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.8hxedw | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.jpdcpu | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.bajigq | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.wywzy9 | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.yqae0u | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.dxd0qy | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.w83wnc | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.6t0umr | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.yf745q | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.esynpz | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.we2dps | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.hzxgzz | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.6gquca | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.qlykpv | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.sk612y | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.7geyav | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.it1jrj | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.ye4kxz | Dataset | 2019 | GBIF Occurrence Download | 0 | 0 | 0
10.15468/dl.mzmat2 | Dataset | 2019 | Occurrence Download | 1 | 0 | 0
10.15468/dl.c724b1 | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.eq5bv2 | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.lzidys | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.hjjyom | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.wf2ahz | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.kfgsqv | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.qoizdc | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.vxra4r | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.cm1qhe | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.mbxqyb | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.mwokr1 | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.l7n36d | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.enbjuw | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.roxbwe | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.tuicit | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.pflrsg | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.dovute | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.rdgr6t | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.rw4xq7 | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.a7bpdv | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.fw5jmo | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.x1jtgu | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.kwsn0a | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.odeh1g | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.favugg | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.jyrjyj | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.gw8zzv | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.ofkayt | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.razejz | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.kgl1ho | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.udowvc | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.98b6ls | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.yefatp | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.tfbf3i | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.5dl3fp | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.hdikfw | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.3gxfmi | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.ndrjbj | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.u3sno9 | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.aeh9om | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.pwpc2b | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.dcgcek | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.a5qgvr | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
10.15468/dl.zc8udz | Dataset | 2020 | Occurrence Download | 0 | 0 | 0
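
A long citation table is easier to report on when summarised. As a minimal sketch, the citing works can be counted by any field of the node, such as `publicationYear` or `type`, using `collections.Counter`; `count_by` is a hypothetical helper and the sample nodes are made up.

```python
from collections import Counter

def count_by(nodes, key):
    """Count nodes by the value of the given field."""
    return Counter(node.get(key) for node in nodes)

# Made-up citation nodes for illustration only
sample_citations = [
    {'publicationYear': 2019, 'type': 'Dataset'},
    {'publicationYear': 2020, 'type': 'Dataset'},
    {'publicationYear': 2020, 'type': 'Dataset'},
]
by_year = count_by(sample_citations, 'publicationYear')
print(sorted(by_year.items()))
```

In the notebook this would be called as `count_by(citations_data['nodes'], 'publicationYear')`.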

Download citations in BibTeX format

Download the citations of IUCN Red List assessment occurrence data for freshwater species native to the Lake Malawi/Nyasa/Niassa Catchment in a single BibTeX file.

In [35]:
import pandas as pd
from IPython.display import Javascript
from requests.utils import requote_uri

# Download a .bib file of BibTeX entries for the citations of citations_query_params['id']
# (to_csv is only used to concatenate the entries)
bibtex_data = []
for r in citations_data['nodes']:
    bibtex_data.append([r['bibtex']])
df = pd.DataFrame(bibtex_data, columns = None)
id_label = '/'.join(citations_query_params['id'].split("/")[3:])

js_download = """
var csv = '%s';
var filename = '%s.bib';
var blob = new Blob([csv], { type: 'application/x-bibtex;charset=utf-8;' });
if (navigator.msSaveBlob) { // IE 10+
    navigator.msSaveBlob(blob, filename);
} else {
    var link = document.createElement("a");
    if (link.download !== undefined) { // feature detection
        // Browsers that support HTML5 download attribute
        var url = URL.createObjectURL(blob);
        link.setAttribute("href", url);
        link.setAttribute("download", filename);
        link.style.visibility = 'hidden';
        document.body.appendChild(link);
        link.click();
        document.body.removeChild(link);
    }
}
""" % (df.to_csv(index=False, header=False).replace('\n','\\n').replace("\'","\\'").replace("\"","").replace("\r",""), requote_uri(id_label))
    
display(Javascript(js_download))