A Resolver links terms, or a Resource, to identifiers (URIs) in a knowledge graph. It addresses lexical variations
(merging synonyms, aliases and acronyms) and disambiguates them. This feature is also referred to as entity linking,
especially in the context of Natural Language Processing (NLP), when building a knowledge graph from entities extracted from
text documents.
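As a minimal sketch of the idea (plain Python, not the kgforge API), entity linking can be pictured as a lookup from every known surface form of an entity to a single identifier. The `SURFACE_FORMS` table and the `link` helper are hypothetical names introduced for illustration; the identifiers are the ones that appear in this notebook's outputs.

```python
# Hypothetical lookup table: each surface form (label, acronym, alias)
# maps to exactly one identifier (URI).
SURFACE_FORMS = {
    "école polytechnique fédérale de lausanne": "https://www.grid.ac/institutes/grid.5333.6",
    "epfl": "https://www.grid.ac/institutes/grid.5333.6",
    "female": "http://purl.obolibrary.org/obo/PATO_0000383",
    "male": "http://purl.obolibrary.org/obo/PATO_0000384",
}


def link(term):
    """Return the identifier for a term, or None when the term is unknown."""
    return SURFACE_FORMS.get(term.strip().lower())


# The acronym and the full name resolve to the same identifier.
print(link("EPFL"))
print(link("École Polytechnique Fédérale de Lausanne"))
print(link("unknown term"))
```

A real resolver also has to rank several candidates and cope with misspellings, which a plain dictionary lookup cannot do.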
from kgforge.core import KnowledgeGraphForge
A configuration file is needed in order to create a KnowledgeGraphForge session. A configuration can be generated using the notebook 00-Initialization.ipynb.
forge = KnowledgeGraphForge("../../configurations/forge.yml", debug=True)
from kgforge.core.commons.strategies import ResolvingStrategy
from kgforge.core.resource import Resource
The forge.resolvers() method lists the configured resolvers.
forge.resolvers() # The values are taken from "../../configurations/forge.yml"
Available scopes:
 - entities:
   - resolver: DemoResolver
   - targets: agents
 - ontology:
   - resolver: DemoResolver
   - targets: cells
 - schemaorg:
   - resolver: EntityLinkerSkLearn
   - targets: terms
 - terms:
   - resolver: DemoResolver
   - targets: sexontology
A scope is a convenient (and arbitrary) way to name a given Resolver along with a set of data sources (the targets) to resolve against. For example, the text female can be resolved in the 'terms' resolving scope.
Passing output="dict" as a parameter to forge.resolvers() returns the resolvers as a dictionary of scopes and their respective targets.
resolvers = forge.resolvers(output="dict")
resolvers
{'entities': {'agents': {'bucket': 'agents.json'}}, 'ontology': {'cells': {'bucket': 'cell_types.json'}}, 'schemaorg': {'terms': {'bucket': 'tfidfvectorizer_model_schemaorg_linking'}}, 'terms': {'sexontology': {'bucket': 'sex.json'}}}
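The returned dictionary is nested as scope, then target, then target details. The following sketch walks that structure with plain Python. The dictionary literal is copied from the output above so that the example runs without a forge session, and `scope_target_pairs` is a hypothetical helper, not part of kgforge.

```python
# Dictionary copied from the output of forge.resolvers(output="dict") above.
resolvers = {
    "entities": {"agents": {"bucket": "agents.json"}},
    "ontology": {"cells": {"bucket": "cell_types.json"}},
    "schemaorg": {"terms": {"bucket": "tfidfvectorizer_model_schemaorg_linking"}},
    "terms": {"sexontology": {"bucket": "sex.json"}},
}


def scope_target_pairs(resolvers):
    """Return every (scope, target, bucket) triple found in the dictionary."""
    return [
        (scope, target, details["bucket"])
        for scope, targets in resolvers.items()
        for target, details in targets.items()
    ]


for scope, target, bucket in scope_target_pairs(resolvers):
    print(f"scope={scope} target={target} bucket={bucket}")
```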
The DemoResolver resolves a term using string comparison against entries looked up in a JSON file.
Resolve the text female against the 'terms' resolving scope.
female = forge.resolve(text="female", scope="terms")
type(female)
kgforge.core.resource.Resource
print(female)
{ id: http://purl.obolibrary.org/obo/PATO_0000383 type: Class label: female }
assert forge.resolve(text="feMAle", scope="terms", strategy=ResolvingStrategy.EXACT_MATCH) is None
print(forge.resolve(text="feMAle", scope="terms", strategy=ResolvingStrategy.EXACT_CASE_INSENSITIVE_MATCH))
{ id: http://purl.obolibrary.org/obo/PATO_0000383 type: Class label: female }
assert forge.resolve(text="emale", scope="terms", strategy=ResolvingStrategy.EXACT_CASE_INSENSITIVE_MATCH) is None
print(forge.resolve(text="emale", scope="terms", strategy=ResolvingStrategy.BEST_MATCH))
{ id: http://purl.obolibrary.org/obo/PATO_0000383 type: Class label: female }
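The behaviour of the three strategies used above can be imitated with a small stand-alone sketch. It is not the DemoResolver implementation: `toy_resolve` is a hypothetical function, and scoring the best match with `difflib.SequenceMatcher` is an assumption, since the scoring used by the DemoResolver is not described here.

```python
from difflib import SequenceMatcher

# Two records taken from this notebook's outputs.
RECORDS = [
    {"id": "http://purl.obolibrary.org/obo/PATO_0000383", "label": "female"},
    {"id": "http://purl.obolibrary.org/obo/PATO_0000384", "label": "male"},
]


def toy_resolve(text, strategy="BEST_MATCH"):
    """Return the first record matching `text` under the given strategy, or None."""
    if strategy == "EXACT_MATCH":
        matches = [r for r in RECORDS if r["label"] == text]
    elif strategy == "EXACT_CASE_INSENSITIVE_MATCH":
        matches = [r for r in RECORDS if r["label"].lower() == text.lower()]
    elif strategy == "BEST_MATCH":
        # Assumption: rank every record by string similarity, most similar first.
        matches = sorted(
            RECORDS,
            key=lambda r: SequenceMatcher(None, text, r["label"]).ratio(),
            reverse=True,
        )
    else:
        raise ValueError(f"Unknown strategy: {strategy}")
    return matches[0] if matches else None


print(toy_resolve("feMAle", "EXACT_MATCH"))
print(toy_resolve("feMAle", "EXACT_CASE_INSENSITIVE_MATCH"))
print(toy_resolve("emale", "BEST_MATCH"))
```

In this sketch the best-match branch always returns a record, however dissimilar, so a production resolver would normally also apply a score cut-off.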
Resolve the text EPFL against the 'entities' resolving scope.
epfl = forge.resolve("EPFL", scope="entities")
type(epfl)
kgforge.core.resource.Resource
print(epfl)
{ id: https://www.grid.ac/institutes/grid.5333.6 type: Organization label: École Polytechnique Fédérale de Lausanne acronym: EPFL }
print(forge.resolve("female", scope="terms", target="sexontology"))
{ id: http://purl.obolibrary.org/obo/PATO_0000383 type: Class label: female }
print(forge.resolve("EPFL", scope="entities", target="agents"))
{ id: https://www.grid.ac/institutes/grid.5333.6 type: Organization label: École Polytechnique Fédérale de Lausanne acronym: EPFL }
print(forge.resolve("female", scope="terms", type="Class"))
{ id: http://purl.obolibrary.org/obo/PATO_0000383 type: Class label: female }
print(forge.resolve("EPFL", scope="entities", type="Organization"))
{ id: https://www.grid.ac/institutes/grid.5333.6 type: Organization label: École Polytechnique Fédérale de Lausanne acronym: EPFL }
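The four calls above narrow the resolution with the target and type arguments. A rough way to picture this, under the assumption that a target is a named collection of records and that type is a filter on each record, is the sketch below. `TARGETS` and `find` are hypothetical names, and the records are copied from the outputs above.

```python
# Hypothetical in-memory targets; the records are copied from the outputs above.
TARGETS = {
    "sexontology": [
        {"id": "http://purl.obolibrary.org/obo/PATO_0000383", "type": "Class", "label": "female"},
    ],
    "agents": [
        {
            "id": "https://www.grid.ac/institutes/grid.5333.6",
            "type": "Organization",
            "label": "École Polytechnique Fédérale de Lausanne",
            "acronym": "EPFL",
        },
    ],
}


def find(text, target=None, type_=None):
    """Return the first record whose label or acronym equals `text`.

    `target` restricts the search to one collection and `type_` keeps only
    records of that type. Both filters are optional.
    """
    names = [target] if target is not None else list(TARGETS)
    for name in names:
        for record in TARGETS[name]:
            if type_ is not None and record["type"] != type_:
                continue
            if text in (record["label"], record.get("acronym")):
                return record
    return None


print(find("EPFL", target="agents"))
print(find("female", type_="Class"))
print(find("EPFL", type_="Class"))  # filtered out by type
```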
Different strategies can be used to rank resolving candidates.
In the following example, the missing 'e' at the end is intended for the demonstration.
text = "mal"
The default strategy is strategy=ResolvingStrategy.BEST_MATCH.
print(forge.resolve(text, scope="terms"))
{ id: http://purl.obolibrary.org/obo/PATO_0000384 type: Class label: male }
print(forge.resolve(text, scope="terms", strategy=ResolvingStrategy.EXACT_MATCH))
None
The candidates list is ordered by score.
results = forge.resolve(text, scope="terms", strategy=ResolvingStrategy.ALL_MATCHES, limit=3)
type(results)
list
len(results)
2
type(results[0])
kgforge.core.resource.Resource
print(*results, sep="\n")
{ id: http://purl.obolibrary.org/obo/PATO_0000384 type: Class label: male }
{ id: http://purl.obolibrary.org/obo/PATO_0000383 type: Class label: female }
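Ordering candidates by score and truncating with a limit can be sketched as follows. The scoring function (`difflib.SequenceMatcher`) is an assumption chosen because it ships with Python, not the scoring used by the configured resolver, and `rank_candidates` is a hypothetical helper.

```python
from difflib import SequenceMatcher

# Labels and identifiers taken from the outputs above.
LABELS = {
    "female": "http://purl.obolibrary.org/obo/PATO_0000383",
    "male": "http://purl.obolibrary.org/obo/PATO_0000384",
}


def rank_candidates(text, limit=None):
    """Return (score, label, uri) tuples, highest score first, at most `limit` items."""
    scored = [
        (SequenceMatcher(None, text, label).ratio(), label, uri)
        for label, uri in LABELS.items()
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]  # limit=None keeps every candidate


for score, label, uri in rank_candidates("mal", limit=3):
    print(f"{score:.3f} {label} {uri}")
```

As in the notebook output, asking for up to three candidates yields only two, because only two entries exist to rank.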
pyramidal = 'Pyramidal Neuron'
cell_characters = "Lamp+"
hard_name = "270_L5/6 NP CT CTX"
print(forge.resolve(pyramidal, scope="ontology", strategy="EXACT_MATCH"))
{ id: https://neuroshapes.org/PyramidalNeuron type: Class label: Pyramidal Neuron }
print(forge.resolve(cell_characters, scope="ontology", strategy="EXACT_MATCH"))
{ id: https://bbp.epfl.ch/ontologies/core/celltypes/Lamp_plus type: Class label: Lamp+ }
print(forge.resolve(hard_name, scope="ontology", strategy="EXACT_MATCH"))
{ id: https://bbp.epfl.ch/ontologies/core/ttypes/270_L5_6_NP_CT_CTX type: Class label: 270_L5/6 NP CT CTX }
When some characters are lowercased, an exact match returns None.
print(forge.resolve("270_L5/6 np CT CTX", scope="ontology", strategy="EXACT_MATCH"))
None
print(forge.resolve("lamp+", scope="ontology", strategy="EXACT_CASE_INSENSITIVE_MATCH"))
{ id: https://bbp.epfl.ch/ontologies/core/celltypes/Lamp_plus type: Class label: Lamp+ }
In this case, the case-insensitive match finds the cell type.
print(forge.resolve("270_L5/6 np CT CTx", scope="ontology", strategy="EXACT_CASE_INSENSITIVE_MATCH"))
{ id: https://bbp.epfl.ch/ontologies/core/ttypes/270_L5_6_NP_CT_CTX type: Class label: 270_L5/6 NP CT CTX }
print(forge.resolve("2", scope="ontology"))
{ id: https://bbp.epfl.ch/ontologies/core/ttypes/21_Sncg` type: Class label: 21_Sncg }
results = forge.resolve("2", scope="ontology", strategy="ALL_MATCHES")
print(*results, sep="\n")
{ id: https://bbp.epfl.ch/ontologies/core/ttypes/21_Sncg` type: Class label: 21_Sncg }
{ id: https://bbp.epfl.ch/ontologies/core/ttypes/270_L5_6_NP_CT_CTX type: Class label: 270_L5/6 NP CT CTX }
A kgforge.core.resource.Resource can also be resolved. In that case, in addition to the other supported arguments, the resource property to resolve is given through the 'property_to_resolve' argument. The resolving result can be merged back into the input resource by setting the 'merge_inplace_as' argument. When 'merge_inplace_as' is not set, the results are returned as separate resources.
resource = Resource(type="Agent", gender="mal")
print(resource)
{ type: Agent gender: mal }
resource_resolved_merged = forge.resolve(resource, scope="terms", target="sexontology",
                                         strategy=ResolvingStrategy.ALL_MATCHES,
                                         property_to_resolve="gender",
                                         merge_inplace_as="gender_resolved",
                                         threshold=0.8)
type(resource_resolved_merged)
kgforge.core.resource.Resource
print(resource_resolved_merged)
{ type: Agent gender: mal gender_resolved: [ { id: http://purl.obolibrary.org/obo/PATO_0000384 type: Class label: male } { id: http://purl.obolibrary.org/obo/PATO_0000383 type: Class label: female } ] }
resource_resolved_separated = forge.resolve(resource, scope="terms", target="sexontology",
                                            strategy=ResolvingStrategy.ALL_MATCHES,
                                            property_to_resolve="gender",
                                            threshold=0.8)
type(resource_resolved_separated)
list
len(resource_resolved_separated)
2
print(*resource_resolved_separated, sep="\n")
{ id: http://purl.obolibrary.org/obo/PATO_0000384 type: Class label: male }
{ id: http://purl.obolibrary.org/obo/PATO_0000383 type: Class label: female }
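The difference between the merged and the separated form can be illustrated with dictionaries standing in for resources. This is a sketch under assumptions: `toy_lookup` is a hypothetical substring matcher rather than the configured resolver, the threshold argument is left out, and the sketch returns a copy of the input, whereas whether kgforge modifies the input resource is not stated here.

```python
def toy_lookup(text):
    """Hypothetical stand-in for a resolver: return the records whose label contains `text`."""
    candidates = [
        {"id": "http://purl.obolibrary.org/obo/PATO_0000384", "type": "Class", "label": "male"},
        {"id": "http://purl.obolibrary.org/obo/PATO_0000383", "type": "Class", "label": "female"},
    ]
    return [c for c in candidates if text.lower() in c["label"]]


def resolve_property(resource, property_to_resolve, merge_inplace_as=None):
    """Resolve one property of a dict-based resource.

    Without `merge_inplace_as`, the candidates are returned as a list.
    With it, a copy of the resource is returned with the candidates stored
    under that key.
    """
    results = toy_lookup(resource[property_to_resolve])
    if merge_inplace_as is None:
        return results
    return {**resource, merge_inplace_as: results}


resource = {"type": "Agent", "gender": "mal"}
print(resolve_property(resource, "gender", merge_inplace_as="gender_resolved"))
print(resolve_property(resource, "gender"))
```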
The EntityLinkerSkLearn resolver is based on a pretrained model and uses scikit-learn to generate and rank candidates.
print(forge.resolve("person", scope="schemaorg", target="terms", strategy=ResolvingStrategy.BEST_MATCH))
{ id: http://schema.org/Person label: Person altLabel: Person definition: A person (alive, dead, undead, or fictional). score: 0.0 }
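The bucket name in the configuration (tfidfvectorizer_model_schemaorg_linking) suggests a TF-IDF representation combined with a nearest-neighbour search. The sketch below shows that general technique in pure Python rather than scikit-learn so that it runs without extra dependencies. It is not the pretrained model: the label list, the character trigram features and every function name are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical mini-vocabulary; the real model is pretrained on schema.org terms.
LABELS = ["Person", "Organization", "Place", "Event"]


def char_ngrams(text, n=3):
    """Split a lowercased, space-padded text into character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def fit_idf(labels):
    """Compute a smoothed inverse document frequency for every n-gram."""
    doc_freq = Counter()
    for label in labels:
        doc_freq.update(set(char_ngrams(label)))
    total = len(labels)
    return {gram: math.log((1 + total) / (1 + df)) + 1 for gram, df in doc_freq.items()}


def vectorize(text, idf):
    """Build a sparse TF-IDF vector, ignoring n-grams unseen during fitting."""
    counts = Counter(char_ngrams(text))
    return {gram: count * idf[gram] for gram, count in counts.items() if gram in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(gram, 0.0) for gram, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


IDF = fit_idf(LABELS)
VECTORS = {label: vectorize(label, IDF) for label in LABELS}


def nearest(text, limit=1):
    """Return the `limit` labels most similar to `text`, most similar first."""
    query = vectorize(text, IDF)
    ranked = sorted(VECTORS, key=lambda label: cosine(query, VECTORS[label]), reverse=True)
    return ranked[:limit]


print(nearest("person"))
print(nearest("persons", limit=2))
```

Character n-grams make the ranking tolerant to small spelling differences, which is one common reason to prefer them over whole-word features for short labels.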