We now tackle the ingest of annotations for classes and properties in this installment of the Cooking with Python and KBpedia series. In prior installments we built the structural aspects of KBpedia. We now add the labels, definitions, and other assignments to them.
As with the extraction routines, we will split these efforts into class annotations and then property annotations. Our actual load routines are fairly straightforward, and we have no real logic concerns in how these annotations get added. The most complex wrinkle we need to address involves those annotation fields, altLabels and notes in particular, that may have many assignments for a single reference concept (RC) or property. As we saw with the extraction routines, for these items we need to set up additional internal loops to segregate and assign the items for loading based on our standard double-pipe ('||') delimiter.
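As a minimal sketch of that inner split, the hypothetical helper `split_multi` below (an illustration, not part of cowpoke) breaks a double-pipe field into its individual values and treats an empty field as having no values at all:

```python
def split_multi(field, delimiter='||'):
    """Split a multi-valued CSV field into a list of clean values.

    `split_multi` is a hypothetical helper name used only for illustration.
    """
    if not field:
        return []                     # an empty cell carries no values
    # Drop surrounding whitespace and skip empty fragments such as 'a||'
    return [item.strip() for item in field.split(delimiter) if item.strip()]


alt_labels = split_multi('mammal||Mammalia||mammals')
print(alt_labels)
```

Returning an empty list for an empty cell lets the caller loop over the result directly, without a separate test for `['']`.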
The two functions we develop in this installment, class_annot_build and prop_annot_build, will be added to the build.py module.
Since we are in an active part of the build cycle, we want to continue with our main knowledge graph in progress for our load routine, so please make sure that kb_src is set to 'standard' in your config.py configuration. We then invoke our standard start-up:
import csv                       # used by csv.DictReader in the build routines

from cowpoke.__main__ import *
from cowpoke.config import *
Class annotations consist of potentially the item's prefLabel, altLabels, definition, and editorialNote. The first item is mandatory; the next two should be provided to adhere to best practices; the last is optional. There are, of course, other standard annotations possible. Should your own conventions require or encourage them, you will likely need to modify the procedure below to account for that fact.
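One possible way to make room for additional annotation fields is to drive the assignment from a small mapping rather than from hard-coded lines. The sketch below is only an illustration under stated assumptions: `ANNOTATION_FIELDS`, `StubConcept`, and `apply_annotations` are hypothetical names, and the stub stands in for a real ontology class whose annotations are assumed to behave like lists.

```python
# Hypothetical mapping (not from cowpoke): CSV column -> (attribute name, multi-valued?)
ANNOTATION_FIELDS = {
    'prefLabel': ('prefLabel', False),
    'definition': ('definition', False),
    'altLabel': ('altLabel', True),
    'editorialNote': ('editorialNote', True),
    # Add your own annotation columns here, for example:
    # 'example': ('example', True),
}


class StubConcept:
    """Stand-in for an ontology class; each annotation is a plain list."""

    def __init__(self):
        for attr, _multi in ANNOTATION_FIELDS.values():
            setattr(self, attr, [])


def apply_annotations(concept, row):
    """Append each non-empty annotation value in `row` to `concept`."""
    for column, (attr, multi) in ANNOTATION_FIELDS.items():
        value = row.get(column, '')
        if not value:
            continue                                  # nothing to assign
        values = value.split('||') if multi else [value]
        getattr(concept, attr).extend(values)
    return concept


row = {'prefLabel': 'mammal',
       'altLabel': 'Mammalia||mammals',
       'definition': 'A warm-blooded vertebrate.',
       'editorialNote': ''}
concept = apply_annotations(StubConcept(), row)
print(concept.altLabel)
```

With this arrangement, supporting a new annotation means adding one entry to the mapping instead of another block of loop code.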
As with the prior methods, we provide a header showing 'typical' configuration settings (in config.py), and then proceed with a method that loops through all of the rows in the input file. Here is the basic class annotation build procedure. There are no new wrinkles in this routine from what has been seen previously:
### KEY CONFIG SETTINGS (see build_deck in config.py) ###
# 'kb_src' : 'standard'
# 'loop_list' : file_dict.values(), # see 'in_file'
# 'loop' : 'class_loop',
# 'in_file' : 'C:/1-PythonProjects/kbpedia/v300/build_ins/classes/Generals_annot_out.csv',
# 'out_file' : 'C:/1-PythonProjects/kbpedia/v300/target/ontologies/kbpedia_reference_concepts_test.csv',

def class_annot_build(**build_deck):
    print('Beginning KBpedia class annotation build . . .')
    loop_list = build_deck.get('loop_list')
    loop = build_deck.get('loop')
    class_loop = build_deck.get('class_loop')
#    r_id = ''
#    r_pref = ''
#    r_def = ''
#    r_alt = ''
#    r_note = ''
    if loop != 'class_loop':                      # compare by value, not identity
        print("Needs to be a 'class_loop'; returning program.")
        return
    for loopval in loop_list:
        print(' . . . processing', loopval)
        in_file = loopval
        with open(in_file, 'r', encoding='utf8') as input:
            is_first_row = True
            # NOTE: column list assumed from the fields read below;
            # match it to the header of your own input file
            reader = csv.DictReader(input, delimiter=',', fieldnames=['id', 'prefLabel', 'subClassOf',
                                    'altLabel', 'definition', 'editorialNote'])
            for row in reader:
                if is_first_row:                  # skip the header row before any lookup
                    is_first_row = False
                    continue
                r_id_frag = row['id']
                id = getattr(rc, r_id_frag)
                if id is None:                    # report ids missing from the knowledge graph
                    print(r_id_frag)
                    continue
                r_pref = row['prefLabel']
                r_alt = row['altLabel']
                r_def = row['definition']
                r_note = row['editorialNote']
                id.prefLabel.append(r_pref)
                id.definition.append(r_def)
                i_alt = r_alt.split('||')         # altLabels may hold many values
                if i_alt != ['']:
                    for item in i_alt:
                        id.altLabel.append(item)
                i_note = r_note.split('||')       # so may editorial notes
                if i_note != ['']:
                    for item in i_note:
                        id.editorialNote.append(item)
    print('KBpedia class annotation build is complete.')

class_annot_build(**build_deck)
kb.save(file=r'C:/1-PythonProjects/kbpedia/v300/targets/ontologies/kbpedia_reference_concepts_test.owl', format='rdfxml')
BTW, when we commit this method to our build.py module, we will add the save routine at the end.
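Because the routine simply prints any id it cannot find and moves on, it can help to check an input file before loading it. The following is a minimal sketch of such a dry run, not part of the series' code: `find_unresolved_ids` is a hypothetical helper, the sample rows are made up, and a plain set of names stands in for the real `rc` namespace lookup.

```python
import csv
import io


def find_unresolved_ids(csv_text, known_ids):
    """Return the ids in the CSV text that are absent from `known_ids`."""
    # With no fieldnames argument, DictReader takes the names from the header row
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row['id'] for row in reader if row['id'] not in known_ids]


# Made-up sample input using the annotation columns read by the build routine
sample = (
    'id,prefLabel,altLabel,definition,editorialNote\n'
    'Mammal,mammal,Mammalia||mammals,A warm-blooded vertebrate.,\n'
    'Unicorn,unicorn,,A mythical animal.,\n'
)

missing = find_unresolved_ids(sample, known_ids={'Mammal'})
print(missing)
```

Collecting the unresolved ids into a list, rather than printing them one by one, makes it easier to fix the input file before committing to a full build.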
We now turn our attention to annotations of properties:
### KEY CONFIG SETTINGS (see build_deck in config.py) ###
# 'kb_src' : 'standard'
# 'loop_list' : file_dict.values(), # see 'in_file'
# 'loop' : 'property_loop',
# 'in_file' : 'C:/1-PythonProjects/kbpedia/v300/build_ins/properties/prop_annot_out.csv',
# 'out_file' : 'C:/1-PythonProjects/kbpedia/v300/target/ontologies/kbpedia_reference_concepts_test.csv',

def prop_annot_build(**build_deck):
    print('Beginning KBpedia property annotation build . . .')
    loop_list = build_deck.get('loop_list')
    loop = build_deck.get('loop')
    out_file = build_deck.get('out_file')
    if loop != 'property_loop':                   # compare by value, not identity
        print("Needs to be a 'property_loop'; returning program.")
        return
    for loopval in loop_list:
        print(' . . . processing', loopval)
        in_file = loopval
        with open(in_file, 'r', encoding='utf8') as input:
            is_first_row = True
            reader = csv.DictReader(input, delimiter=',', fieldnames=['id', 'prefLabel', 'subPropertyOf', 'domain',
                                    'range', 'functional', 'altLabel', 'definition', 'editorialNote'])
            for row in reader:
                if is_first_row:                  # skip the header row before any lookup
                    is_first_row = False
                    continue
                r_id = row['id']
                r_pref = row['prefLabel']
                r_dom = row['domain']
                r_rng = row['range']
                r_alt = row['altLabel']
                r_def = row['definition']
                r_note = row['editorialNote']
                r_id = r_id.replace('rc.', '')
                id = getattr(rc, r_id)
                if id is None:                    # report ids missing from the knowledge graph
                    print(r_id)
                    continue
                id.prefLabel.append(r_pref)
                i_dom = r_dom.split('||')         # domain values are appended as read from the file
                if i_dom != ['']:
                    for item in i_dom:
                        id.domain.append(item)
                if 'owl.' in r_rng:               # only 'owl.'-prefixed ranges are assigned for now
                    r_rng = r_rng.replace('owl.', '')
                    r_rng = getattr(owl, r_rng)
                    id.range.append(r_rng)
#                else:
#                    id.range.append(r_rng)
                i_alt = r_alt.split('||')
                if i_alt != ['']:
                    for item in i_alt:
                        id.altLabel.append(item)
                id.definition.append(r_def)
                i_note = r_note.split('||')
                if i_note != ['']:
                    for item in i_note:
                        id.editorialNote.append(item)
    print('KBpedia property annotation build is complete.')

prop_annot_build(**build_deck)
Hmmm. One of the things we notice in this routine is that our domain and range assignments were not adequately picked up in our earlier KBpedia version 2.50 build routines (the ones undertaken in Clojure before this CWPK series). As a result, we cannot adequately test range and will need to address this oversight before our series is over.
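When that oversight is addressed, one possible approach is to resolve each prefixed name (such as 'owl.Thing') against the matching namespace before assigning it. The sketch below is speculative and is not the series' eventual fix: `resolve_prefixed` and `NAMESPACES` are hypothetical names, and simple stand-in objects take the place of the real `owl` and `rc` namespaces.

```python
from types import SimpleNamespace

# Stand-ins for the real ontology namespaces (made-up values for illustration)
NAMESPACES = {
    'owl': SimpleNamespace(Thing='owl:Thing'),
    'rc': SimpleNamespace(Mammal='rc:Mammal'),
}


def resolve_prefixed(name, namespaces=NAMESPACES, default='rc'):
    """Resolve 'prefix.LocalName' to an object, or None if it cannot be found."""
    prefix, sep, local = name.partition('.')
    if not sep:                       # no prefix given: fall back to the default namespace
        prefix, local = default, name
    namespace = namespaces.get(prefix)
    if namespace is None:             # unknown prefix
        return None
    return getattr(namespace, local, None)


print(resolve_prefixed('owl.Thing'))
```

Returning `None` for anything unresolved mirrors the way the build routines above test a failed lookup, so the caller can report the name and skip it.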
As before, we will add our 'save' routine as well when we commit the method to the build.py module.
kb.save(file=r'C:/1-PythonProjects/kbpedia/v300/targets/ontologies/kbpedia_reference_concepts_test.owl', format='rdfxml')
We now have all of the building blocks to create our extract-build roundtrip. We summarize the formal steps and configuration settings in CWPK #47. But, first, we need to return to cleaning our input files and instituting some unit tests.
NOTE: This CWPK installment is available both as an online interactive file and as a direct download to use locally. Make sure to pick the correct installment number. For the online interactive option, pick the *.ipynb file. It may take a bit of time for the interactive option to load.