try:
import IPython
except:
!pip install IPython
import IPython
from IPython.core.display import HTML
# add stylesheet for notebook
HTML("""<link rel="stylesheet" type="text/css" href="src/css/notebook.css">""")
The Linked Art reconciliation exemplar provides a step through the process of reconciling geographical place names that occur in the title of artworks by the artist, John Ruskin, to enrich collections data with an additional representation of the geographical coordinates of the place depicted in the artworks.
From Wikipedia:
"John Ruskin (8 February 1819 – 20 January 1900) was an English writer, philosopher, art critic and polymath of the Victorian era. He wrote on subjects as varied as geology, architecture, myth, ornithology, literature, education, botany and political economy."
John Ruskin travelled extensively in Europe and was a prolific artist, creating drawings of paintings whose titles often included place names for the locations depicted.
The title of the artworks has been recorded in the title field in many of the collection data records , and this has been used as the basis for the reconciliation process shown here.
The place names are reconciled with the Getty Thesaurus of Geographic Names (TGN), using the Open Refine tool.
Reconciliation with the Getty Thesaurus of Geographic Names (TGN) has allowed additional information to be associated with the artwork:
The input files are Linked Art files created with the 01-06-Transform-John-Ruskin
Jupyter notebook.
To reconcile the place names in the artwork titles
id
and _label
propertiesThe script gets a list of all files in a selected directory using os.listdir()
and iterates over them.
json.load
is used to deserialize the Linked Art JSONLD file to a Python dictionary object.Finally, the script uses csv.DictWriter
Dictwriter.writeheader()
writes a row with the field names (as specified in the constructor) to the writer’s file object.Dictwriter.writerows()
writes all elements in rows to the writer’s file object.# import relevant Python libraries
try:
import os
except:
%pip install os
import os
try:
import json
except:
%pip install json
import json
import csv
# list holding a
artworkCSV = []
# Linked Art JSON-LD file location
artworkFileDir = "./data/ruskin/output/json/"
artworkFileList =os.listdir(artworkFileDir)
# iterate over Linked Art JSON-LD files
for artworkFile in artworkFileList:
# read file and append to
with open( artworkFileDir + artworkFile) as artworkFileContents:
# create json object `artwork` from file
artworkObjJSON = json.load(artworkFileContents)
# check for "_label" property
if "_label" not in artworkObjJSON:
continue
# append artwork properties to artwork JSON object
artworkCSV.append( {
"id": artworkObjJSON["id"],
"place" : artworkObjJSON["_label"],
"place_modified": " ",
"coords": " "
})
# end loop
# create CSV file
artworkCsvFile = "./data/ruskin/ruskin-places.csv" # file location
with open(artworkCsvFile, 'w') as f:
# write column headings
w = csv.DictWriter(f, ["id","place","place_modified","coords"])
w.writeheader()
# write rows with artwork properties
w.writerows(artworkCSV)
The contents of the resulting CSV file are shown below for illustration.
The CSV file is read into a pandas
dataframe.
`Pandas` is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
A pandas dataframe
is a pandas data structure containing
two-dimensional, size-mutable, potentially heterogeneous tabular data.
The pandas dataframe allows easy manipulation of two-dimensional tabular data.
The IPython
library is also used to display the contents of the CSV file
try:
import pandas as pd
except:
%pip install pandas
import pandas as pd
try:
import IPython
except:
%pip install IPython
import IPython
from IPython.display import display, HTML, Javascript
# CSV file location
artworkCsvFile = "./data/ruskin/ruskin-places.csv"
# read CSV file into pandas dataframe
dataFrame = pd.read_csv(artworkCsvFile,low_memory=False)
# define how many columns and rows to display == all
pd.options.display.max_columns = len(dataFrame.columns)
#pd.options.display.max_rows = len(dataFrame.index)
# use IPython display to display the contents of CSV file
display(dataFrame)
id | place | place_modified | coords | |
---|---|---|---|---|
0 | https://collections.ashmolean.org/collection/1... | Engraving of Ruskin's Drawing of the Petal Vau... | ||
1 | https://collections.ashmolean.org/collection/1... | Enlarged Study of a Prawn's Rostrum | ||
2 | https://www.harvardartmuseums.org/collections/... | Study of a Venetian Capital | ||
3 | https://collections.ashmolean.org/collection/1... | Autumnal Cloud filling the Valley of Geneva, t... | ||
4 | https://collections.ashmolean.org/collection/1... | Axmouth Landslip from Dolands Farm | ||
... | ... | ... | ... | ... |
274 | https://collections.ashmolean.org/collection/1... | The Head of a Kite, from Life | ||
275 | https://www.harvardartmuseums.org/collections/... | Part of a Sketch of the Northwest Porch of St.... | ||
276 | http://www.rijksmuseum.nl/nl/collectie/nl-RP-T... | Gezicht op S. Anastasia te Verona, over de Adige | ||
277 | https://collections.ashmolean.org/collection/2... | Architectural detail: stone bracket | ||
278 | https://collections.ashmolean.org/collection/1... | Study of the Marble Inlaying on the Front of t... |
279 rows × 4 columns
Next, get place name from artwork title
place_modified
columnA list of place names placeNames
is created to help with extracting place names from the artwork title. This was produced further to a review of the values in the place column.
artworkCsvFile = "./data/ruskin/ruskin-places.csv" # file location
# read CSV file into pandas dataframe
dataFrame = pd.read_csv(artworkCsvFile,low_memory=False)
# A list of place names `placeNames` is created to help with extracting place names from the artwork title.
placeNames = [
"Florence","Bologna","Lucca","Alps","Oxford","Rome", "Venice","Fribourg","Neuchâtel","Sestri","Visp","Chamonix",
"Abbeville","Schaffhausen","Verona","Vorarlberg","Baden","Schaffhausen","Faido","Normandy","Genève","Geneva",
"Gloucester","Basel","Luzern","Padua","Habsburg","Rhine","Zug","Aix-la-Chapelle","Siena","Mont Blanc","Lago di Como",
"Bellinzona","Lake of Lecco"
]
places = {"Venezia":["Venice","Venetian","St Mark","St. Mark"]}
# iterate over dataframe
for index,row in dataFrame.iterrows():
# iterate over place names
# check if any place name in placesNames is present in row
for place in placeNames:
# if place name found, add to place_modified column
if place in row["place"]:
dataFrame.at[index,"place_modified"] = place
# iterate over place names for Venice
for place in places["Venezia"]:
# if place found add `Venezia` to place_modified column
if place in row["place"]:
dataFrame.at[index,"place_modified"] = "Venezia"
# remove records where place_modified is blank
dataFrame = dataFrame[dataFrame.place_modified != " "]
dataFrame.to_csv(artworkCsvFile, index=False)
The result of this process is a CSV file with the column place_modified
containing a place name string that will be used for reconciliation in the OpenRefine tool.
Records where a place name has not be identified have been removed from the CSV file.
artworkCsvFile = "./data/ruskin/ruskin-places.csv" # file location
dataFrame = pd.read_csv(artworkCsvFile,low_memory=False)
# display table for illustration
display(dataFrame)
id | place | place_modified | coords | |
---|---|---|---|---|
0 | https://www.harvardartmuseums.org/collections/... | Study of a Venetian Capital | Venezia | |
1 | https://collections.ashmolean.org/collection/1... | Autumnal Cloud filling the Valley of Geneva, t... | Geneva | |
2 | https://www.harvardartmuseums.org/collections/... | Tom Tower, Christ Church, Oxford | Oxford | |
3 | https://www.harvardartmuseums.org/collections/... | Study of a Venetian Capital | Venezia | |
4 | https://www.tate.org.uk/art/artworks/13033 | View of Bologna | Bologna | |
... | ... | ... | ... | ... |
102 | https://collections.ashmolean.org/collection/1... | Sketch of the Oak Spray in Mantegna's Fresco o... | Padua | |
103 | https://www.nga.gov/collection/72870 | The Garden of San Miniato near Florence | Florence | |
104 | https://www.harvardartmuseums.org/collections/... | Part of a Sketch of the Northwest Porch of St.... | Venezia | |
105 | http://www.rijksmuseum.nl/nl/collectie/nl-RP-T... | Gezicht op S. Anastasia te Verona, over de Adige | Verona | |
106 | https://collections.ashmolean.org/collection/1... | Study of the Marble Inlaying on the Front of t... | Venezia |
107 rows × 4 columns
Next
OpenRefine
is a tool for working with messy data, and includes support for reconciliation with external data such as name authorities.
The Getty Thesaurus of Geographic Names® Online (TGN)
is one of several Getty Vocabularies that provide a structured resource that can be used to improve access to information about art, architecture, and material culture. From the website:
Through rich metadata and links, the Getty Vocabularies provide powerful conduits for knowledge creation, research, and discovery for digital art history and related disciplines.TGN is a thesaurus. TGN is not a geographic information system (GIS), although it may be linked to existing major, general-purpose, geographic databases and maps. While most records in TGN include coordinates, these coordinates are approximate and are intended for reference ("finding purposes") only (as is true of coordinates in most atlases and other resources, including NGA (formerly NIMA) databases).
from IPython.display import IFrame, HTML
display(HTML("<h4>Open Refine Website</h4>"))
display(IFrame('https://openrefine.org/documentation.html', '100%', '600px'))
The method used to reconcile the place names:
place_modified
The following video illustrates how to create a project in OpenRefine using a CSV file on the local drive.
from IPython.display import YouTubeVideo, HTML
display(HTML("<h4>Video - OpenRefine - Create Project</h4>"))
# video of project creation in OpenRefine
YouTubeVideo('h1aLc5uvdck', width=1024, height=576)
place_modified
¶place_modified
column headerStart reconciling
The video shows the following process:
from IPython.display import YouTubeVideo, HTML
display(HTML("<h4>Video - OpenRefine - Start Reconciliation</h4>"))
YouTubeVideo('Zm0woMobjpI', width=1024, height=576)
Once the reconciliation process has completed it is necessary to review the results. Using the TGN it is noticeable that there are many places in the United States of America that have the same names as locations in Italy.
A review of each match is necessary. Once a correct match has been identified, this match can be applied to all cells with the same place name.
from IPython.display import YouTubeVideo, HTML
display(HTML("<h4>Video - OpenRefine - Review Reconciliation Results</h4>"))
YouTubeVideo('pT0b0vsPRJ0', width=1024, height=576)
from IPython.display import YouTubeVideo, HTML
display(HTML("<h4>Video - OpenRefine - Add Entity Identifier Column</h4>"))
YouTubeVideo('PNPhs_7MQ6o', width=1024, height=576)
Some additional manual reconciliation was required using the TGN search form.
The result of the reconciliation process is a new column with TGN name authority identifiers for place names identified in the artwork title.
A CSV file is created with the following steps:
from IPython.display import YouTubeVideo, HTML
display(HTML("<h4>Video - OpenRefine - Export Results as CSV</h4>"))
YouTubeVideo('0tBjqr5AEmA', width=1024, height=576)
reconciledRuskinPlaces = "data/ruskin/ruskin-places-rec.csv"
# read file into pandas dataframe
df = pd.read_csv(reconciledRuskinPlaces,low_memory=False)
# remove rows that have an empty tgn field value
df = df[df.tgn != ""]
# write dataframe to file
df.to_csv(reconciledRuskinPlaces, index=False)
# for illustration display dataframe
display(df)
The next step is to associate geographical coordinates with the Linked Art artwork representations, using the TGN identifiers to query the TGN web service and return geographical coordinates.
Steps:
import requests
latprop = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
lngprop = "http://www.w3.org/2003/01/geo/wgs84_pos#long"
display(HTML("<H2>Geographical coordinates retrieved from TGN web service</H2>"))
reconciledRuskinPlaces = "./data/ruskin/ruskin-places-rec.csv"
reconciledRuskinPlacesCoords = "./data/ruskin/ruskin-places-rec-coords.csv"
# create dataframe from CSV file containing reconciled data including TGN identifiers
dataFrameRuskinPlaces = pd.read_csv(reconciledRuskinPlaces,low_memory=False)
# set type for column 'coords' as string in dataframe
dataFrameRuskinPlaces['coords'] = dataFrameRuskinPlaces['coords'].astype(str)
display(HTML("<p>Retrieving geocoordinates from vocab.getty.edu TGN API. Please wait for task to complete.</p>"))
# create dataframe to hold geographical coordinates with columns tng and latlng
dataFrameGeo = pd.DataFrame({}, columns=['tgn', 'latlng'])
# iterate through reconciled data containing place names and TGN identifiers
for identifier_tgn in dataFrameRuskinPlaces['tgn'].unique():
# print . to indicate progress
print(".", end='')
#create query string for web service - get tgn id using .split()
query = "http://vocab.getty.edu/tgn/" + identifier_tgn.split("tgn/",1)[1] +"-place.json"
# use requests.get() to query TGN web service using TGN identifier to return geo coordinates
resultsJSON = requests.get(query).json()
# get lat lng from web service query results
for record in resultsJSON:
lat = resultsJSON[record][latprop][0]["value"]
lng = resultsJSON[record][lngprop][0]["value"]
# create string for lat lng
latlng = str(lat) + "," + str(lng)
# append TGN identifier and lat lng to the dataFrameGeo
dataFrameGeo = dataFrameGeo.append(
{
'tgn': identifier_tgn,
'latlng': latlng
},
ignore_index=True)
# for illustration display dataFrameGeo with addition of geo coords
display(dataFrameGeo)
The following code
dataFrameRuskinPlaces
with dataFrameGeo
containing the geocoordinates.coords
from dataFrameRuskinPlaces
latlng
to coords
in dataFrameRuskinPlaces
dataFrameRuskinPlaces
to a CSV filereconciledRuskinPlacesCoords = "./data/ruskin/ruskin-places-rec-coords.csv"
# merge dataframe with coords with dataframe from csv
dataFrameRuskinPlaces = dataFrameRuskinPlaces.merge(dataFrameGeo, on='tgn')
# drop column coords
dataFrameRuskinPlaces = dataFrameRuskinPlaces.drop('coords', 1) # drop column coords
# rename column latlng to coords
dataFrameRuskinPlaces.rename(columns={'latlng': 'coords'}, inplace=True) # rename column tgn to coords
# drop rows that have na value in coords column
dataFrameRuskinPlaces.dropna(subset=['coords'])
# write to CSV file
dataFrameRuskinPlaces.to_csv(reconciledRuskinPlacesCoords, index=False)
display(HTML("<H4>CSV file with Geographical Coordinates</H4>"))
# display dataframe
display(dataFrameRuskinPlaces)
The next step is to define a representation in Linked Art for geographical coordinates of place depicted in artwork. The relevant parts of the Linked Art model are:
From https://linked.art/model/object/aboutness/#depiction
Many sorts of artwork depict things that can be pointed out in the artwork. These could be identifiable entities, such as a known Person or Object with a name or identifier, or unidentifiable (perhaps fictional) instances of a class of entity, such as a depiction of a battle but not any particular battle. For example a portrait depicts the person sitting for it, or a sketch of a generic landscape depicts a place even if it's not a particular, known location. The depiction pattern describes what is in the artwork's image.
This is modeled using the `represents` property on the VisualItem, which refers to the entity that is being depicted.
The following representation will be used for place depicted in Ruskin's artworks:
{ "@context": "https://linked.art/ns/v1/linked-art.json", "id": "https://linked.art/example/object/34", "type": "HumanMadeObject", "_label": "artwork title including place name", "shows": [ { "type": "VisualItem", "represents": [ { "type": "Place", "_label": "place name" } ] } ]}
The Linked Art data model describes how to represent geospatial approximation.
From https://linked.art/model/place/#geospatial-approximation
All recorded locations are approximate to some degree. It may be desirable to capture this approximation separately from the actual place, especially when that approximation is very uncertain. Especially if the place is the exact location of several events, and perhaps an address or other information is known, but not the exact geospatial coordinates.
Secondly, as a place is defined by exactly one definition, but there might be multiple approximations such as a polygon as well as the central point, the real place that an activity occured at can be related to multiple approximate places to capture these different approximations.
Example Linked Art representation of geospatial approximation:
{ "@context": "https://linked.art/ns/v1/linked-art.json", "id": "https://linked.art/example/place/4", "type": "Place", "_label": "True Auction House Location", "approximated_by": [ { "type": "Place", "_label": "Auction House Location Approximation", "defined_by": "POINT(-0.0032937526703165 51.515107154846)" } ] }
Relating the Linked Art model for geospatial approximation to the depiction of places in Ruskin's works, the following representation has been created:
{ "@context": "https://linked.art/ns/v1/linked-art.json", "id": "https://linked.art/example/object/34", "type": "HumanMadeObject", "_label": "artwork title including place name", "shows": [ { "type": "VisualItem", "represents": [ { "type": "Place", "_label": "Lucca", "approximated_by": [ { "type": "Place", "_label": "Lucca - Location Approximation", "defined_by": "POINT(-0.0032937526703165 51.515107154846)" } ] } ] } ]}
Depiction https://linked.art/model/object/aboutness/#depiction
Geospatial approximation https://linked.art/model/place/#geospatial-approximation
Below is a visualisation of the Linked Art JSON-LD representation of geographical coordinates of a place depicted in an artwork.
Further information
from IPython.core.display import Javascript
code2 = "var file = './data/examples/geolocation.json';"\
"var selector = '#vis';" \
"visjsonld(file, selector); "
with open('./src/js/visld.js', 'r') as _jscript:
code = _jscript.read() + code2
Javascript(code)
The final step is to add place names and geocoordinates to the original Linked Art files.
The updated Linked Art files, including the geocoordinates, will later be used in a storymap visualisation of the artworks of John Ruskin, mapping the artworks to the locations that they depict, using the geocoordinates.
The cromulent
Python library is used to create the JSON-LD representation.
try:
import cromulent
except:
%pip install cromulent
import cromulent
from cromulent.model import factory
from cromulent.model import factory, Actor, Production, BeginningOfExistence, EndOfExistence, TimeSpan, Place
from cromulent.model import InformationObject, Phase, VisualItem
from cromulent.vocab import Painting, Drawing,Miniature,add_art_setter, PrimaryName, Name, CollectionSet, instances, Sculpture
from cromulent.vocab import aat_culture_mapping, AccessionNumber, Height, Width, SupportPart, Gallery, MuseumPlace
from cromulent.vocab import BottomPart, Description, RightsStatement, MuseumOrg, Purchase
from cromulent.vocab import Furniture, Mosaic, Photograph, Coin, Vessel, Graphic, Enamel, Embroidery, PhotographPrint
from cromulent.vocab import PhotographAlbum, PhotographBook, PhotographColor, PhotographBW, Negative, Map, Clothing, Furniture
from cromulent.vocab import Sample, Architecture, Armor, Book, DecArts, Implement, Jewelry, Manuscript, SiteInstallation, Text, Print
from cromulent.vocab import TimeBasedMedia, Page, Folio, Folder, Box, Envelope, Binder, Case, FlatfileCabinet
from cromulent.vocab import HumanMadeObject,Tapestry,LocalNumber
from cromulent.vocab import Type,Set
from cromulent.vocab import TimeSpan, Actor, Group, Acquisition, Place
from cromulent.vocab import Production, TimeSpan, Actor
from cromulent.vocab import LinguisticObject,DigitalObject, DigitalService
from cromulent import reader
try:
import pandas as pd
except:
%pip install pandas
import pandas as pd
try:
import os
except:
%pip install os
import os
try:
import json
except:
%pip install json
import json
artwork = {}
cnt=1
# directory that will contain updated Ruskin artwork representations including geo coords
storyvisdir = "data/ruskin/storyvis/json"
# file containing reconciled data with coordinates
filecoord = "./data/ruskin/ruskin-places-rec-coords.csv"
# open file containing reconciled data with geo coordinates
dataframeGeo = pd.read_csv(filecoord,low_memory=False)
# directory containing Rusking artworks represented in Linked Art JSON-LD
ruskindir = "data/ruskin/output/json"
file_list=os.listdir(ruskindir)
# for each linked art json file
for file in file_list:
# open file
with open( ruskindir + "/" + file) as json_file:
# get json object from file object with json.load() https://www.geeksforgeeks.org/json-load-in-python/
artwork = json.load(json_file)
# if id field is in the id field of data file containing geographical coordinates, add update the file
if artwork["id"] in dataframeGeo["id"].tolist():
display(HTML("<h4>" + artwork["_label"] + "</h4>"))
# get rows in dataframeGeo where id == artwork id from JSON-LD file
# Access a group of rows and columns by label(s) or a boolean array
# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html
rows = dataframeGeo.loc[dataframeGeo['id'] == artwork["id"]]
print("Matching row in geographical coordinates file for artwork")
display(rows)
# get first row https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html
row=rows.iloc[0]
# get place name and coords from geocoords file
placeName = row["place"]
coords = row["coords"]
# replace comma in coords with space
coords = coords.replace(",", " ")
# increment counter
cnt = cnt+1
# use cromulent to create Linked Art representation of place depicted
# https://github.com/thegetty/crom
approx_place = Place()
approx_place._label = placeName
approx_place.defined_by = "POINT(" + coords + ")"
place = Place()
place._label = placeName
place.approximated_by = approx_place
visualItem = VisualItem()
visualItem.represents = place
# append new representation to artwork json object
artwork["shows"] = factory.toJSON(visualItem)
print("Geographical coordinates representation to be added:")
print(json.dumps(factory.toJSON(visualItem), indent=2))
# open output file
text_file = open(storyvisdir + "/" + str(cnt) + ".json", "wt")
# write to file and close
n = text_file.write(json.dumps(artwork,indent=2))
text_file.close()
print("File updated" )
HTML("<h4>Files updated</h4>")
print(json.dumps(artwork,indent=2))
If you'd like to view a different file change the value of file
filepath in the code below (examples: 1.json .. 89.json)
from IPython.core.display import Javascript
code2 = "var file = './data/ruskin/storyvis/json/3.json';"\
"var selector = '#vis2';" \
"visjsonld(file, selector); "
with open('./src/js/visld.js', 'r') as _jscript:
code = _jscript.read() + code2
Javascript(code)
Look at the Jupyter notebook that:
01-06-Transform-John-Ruskin
03-04-Visualise-John-Ruskin-Story-Map.ipynb
import ipywidgets
from ipywidgets import Layout, FileUpload
from IPython.display import display, IFrame, HTML, Image
import os
import json
# directory that will contain updated Ruskin artwork representations including geo coords
dir = "data/ruskin/storyvis/json"
file_list=os.listdir(dir)
selectOptions = []
selectOptions.append(("Please select an artwork", ""))
# for each linked art json file
for file in file_list:
# open file
with open( dir + "/" + file) as json_file:
artwork = json.load(json_file)
title = artwork["_label"] + " (" + file + ")"
selectOptions.append((title,file))
from IPython.core.display import Javascript
def dropdown_eventhandler(change):
with open('./src/js/visld.js', 'r') as _jscript:
code = _jscript.read() + "var file = './data/ruskin/storyvis/json/" + change.new + "';var selector = '#vis3';visjsonld(file, selector); "
display(Javascript(code))
with open( dir + "/" + change.new) as json_file:
artwork = json.load(json_file)
if ("representation" in artwork):
image = artwork["representation"][0]["id"]
display(Javascript("document.getElementById('artwork').src = '" + image + "';"))
else:
display(Javascript("document.getElementById('artwork').src = '';"))
selectObject = ipywidgets.Dropdown(options=selectOptions)
selectObject.observe(dropdown_eventhandler, names='value')
display(selectObject)