This notebook is intended as a supplement to the main OAK CLI docs.
This notebook provides examples for the termset-similarity
command, which performs an aggregate comparison between
two sets of terms (term profiles).
Use cases include:
Note that this command is not aware of the associations themselves; it relies on you to assemble each profile (the set of terms describing an entity) and pass it on the command line.
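Assembling a profile might look like the following sketch. It groups a hypothetical list of (entity, term) associations into one term list per entity, then builds the argument list for a termset-similarity call with the two profiles on either side of the @ separator used throughout this notebook. The association data and the helper names build_profiles and termset_similarity_argv are made up for illustration. Only the runoak invocation pattern comes from this notebook.

```python
import shlex
from collections import defaultdict

# HYPOTHETICAL (entity, HPO term) associations; in practice these would
# come from your own annotation source.
associations = [
    ("patient_1", "HP:0100752"),
    ("patient_1", "HP:0007042"),
    ("disease_X", "HP:0006555"),
    ("disease_X", "HP:0025517"),
]


def build_profiles(pairs):
    """Group (entity, term) pairs into one ordered, de-duplicated term list per entity."""
    profiles = defaultdict(list)
    for entity, term in pairs:
        if term not in profiles[entity]:
            profiles[entity].append(term)
    return dict(profiles)


def termset_similarity_argv(subject_terms, object_terms, db="sqlite:obo:hp", predicates="i"):
    """Build a runoak argument list: subject terms, then '@', then object terms."""
    return [
        "runoak", "-i", db, "termset-similarity", "-p", predicates,
        *subject_terms, "@", *object_terms,
    ]


profiles = build_profiles(associations)
argv = termset_similarity_argv(profiles["patient_1"], profiles["disease_X"])
print(shlex.join(argv))
# → runoak -i sqlite:obo:hp termset-similarity -p i HP:0100752 HP:0007042 @ HP:0006555 HP:0025517
```

Building a list rather than a single string avoids shell-quoting problems with labels that contain spaces; the list can be passed directly to subprocess.run(argv, check=True).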
The command is general and makes no assumptions about the ontology used. You control which predicates are followed when traversing the graph with the -p (--predicates) option, e.g. -p i,p for is_a and part_of.
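The sketch below illustrates why the choice of predicates matters. It uses a made-up toy graph, not OAK's implementation. Ancestors are collected by following only edges whose predicate is in the chosen set, and a Jaccard similarity is then computed over the two reflexive ancestor sets. Modelling the shorthand i as rdfs:subClassOf and p as BFO:0000050 (part of) is an assumption about OAK's shorthands. The terms and edges are hypothetical.

```python
from collections import deque

IS_A = "rdfs:subClassOf"   # assumed meaning of shorthand "i"
PART_OF = "BFO:0000050"    # assumed meaning of shorthand "p"

# HYPOTHETICAL toy graph: (child, predicate, parent) edges.
EDGES = [
    ("nuclear_membrane", IS_A, "membrane"),
    ("nuclear_membrane", PART_OF, "nucleus"),
    ("nucleus", IS_A, "organelle"),
    ("vacuole", IS_A, "organelle"),
    ("membrane", IS_A, "cellular_component"),
    ("organelle", IS_A, "cellular_component"),
]


def ancestors(term, predicates):
    """Reflexive ancestor closure, following only edges whose predicate is allowed."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child, pred, parent in EDGES:
            if child == current and pred in predicates and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def jaccard(a, b, predicates):
    """Jaccard similarity of the two terms' ancestor sets."""
    anc_a, anc_b = ancestors(a, predicates), ancestors(b, predicates)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


print(jaccard("nuclear_membrane", "vacuole", {IS_A}))           # → 0.2
print(jaccard("nuclear_membrane", "vacuole", {IS_A, PART_OF}))  # → 0.3333333333333333
```

Adding part_of lets the nuclear membrane reach "nucleus" and then "organelle", so the two terms share more ancestors and score higher.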
You can get help on any OAK command using --help:
!runoak termset-similarity --help
Usage: runoak termset-similarity [OPTIONS] [TERMS]...

  Termset similarity.

  This calculates a similarity matrix for two sets of terms.

  Example:

      runoak -i go.db termset-similarity -p i,p nucleus membrane @ "nuclear membrane" vacuole -p i,p

  Python API:

     https://incatools.github.io/ontology-access-kit/interfaces/semantic-similarity

  Data model:

     https://w3id.org/oak/similarity

Options:
  -p, --predicates TEXT         A comma-separated list of predicates. This
                                may be a shorthand (i, p) or CURIE
  -o, --output FILENAME         Output file, e.g. obo file
  -O, --output-type TEXT        Desired output type
  --autolabel / --no-autolabel  If set, results will automatically have
                                labels assigned  [default: autolabel]
  --help                        Show this message and exit.
alias hp runoak -i sqlite:obo:hp
hp termset-similarity "Abnormal liver lobulation" "Focal white matter lesions" @ "Diffuse hepatic steatosis" "Hypoplastic hippocampus"
subject_termset:
  HP:0100752:
    id: HP:0100752
    label: Abnormal liver lobulation
  HP:0007042:
    id: HP:0007042
    label: Focal white matter lesions
object_termset:
  HP:0006555:
    id: HP:0006555
    label: Diffuse hepatic steatosis
  HP:0025517:
    id: HP:0025517
    label: Hypoplastic hippocampus
subject_best_matches:
  HP:0007042:
    match_source: HP:0007042
    score: 6.775984316965229
    similarity:
      subject_id: HP:0007042
      object_id: HP:0025517
      ancestor_id: HP:0100547
      ancestor_label: Abnormal forebrain morphology
      ancestor_information_content: 6.775984316965229
      jaccard_similarity: 0.5
      phenodigm_score: 1.8406499282814792
    match_source_label: Focal white matter lesions
    match_target: HP:0025517
    match_target_label: Hypoplastic hippocampus
  HP:0100752:
    match_source: HP:0100752
    score: 8.632074905566515
    similarity:
      subject_id: HP:0100752
      object_id: HP:0006555
      ancestor_id: HP:0410042
      ancestor_label: Abnormal liver morphology
      ancestor_information_content: 8.632074905566515
      jaccard_similarity: 0.5
      phenodigm_score: 2.0775075096815554
    match_source_label: Abnormal liver lobulation
    match_target: HP:0006555
    match_target_label: Diffuse hepatic steatosis
object_best_matches:
  HP:0006555:
    match_source: HP:0006555
    score: 8.632074905566515
    similarity:
      subject_id: HP:0100752
      object_id: HP:0006555
      ancestor_id: HP:0410042
      ancestor_label: Abnormal liver morphology
      ancestor_information_content: 8.632074905566515
      jaccard_similarity: 0.5
      phenodigm_score: 2.0775075096815554
    match_source_label: Diffuse hepatic steatosis
    match_target: HP:0100752
    match_target_label: Abnormal liver lobulation
  HP:0025517:
    match_source: HP:0025517
    score: 6.775984316965229
    similarity:
      subject_id: HP:0007042
      object_id: HP:0025517
      ancestor_id: HP:0100547
      ancestor_label: Abnormal forebrain morphology
      ancestor_information_content: 6.775984316965229
      jaccard_similarity: 0.5
      phenodigm_score: 1.8406499282814792
    match_source_label: Hypoplastic hippocampus
    match_target: HP:0007042
    match_target_label: Focal white matter lesions
average_score: 7.704029611265872
best_score: 8.632074905566515
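The summary fields at the end of this output can be reproduced from the best matches. The sketch below rests on several assumptions. Each term's best match is taken to be the term in the other set with the highest pairwise score, which here equals the ancestor_information_content shown. average_score is taken as the mean of all best-match scores in both directions, and best_score as their maximum. These assumptions are consistent with the numbers above but may not capture every detail of OAK's implementation. The two cross-pair scores (liver lobulation vs. hippocampus, white matter vs. steatosis) do not appear in the output and are hypothetical placeholders.

```python
import math

# Pairwise scores (information content of the best shared ancestor).
# The first two values are copied from the output above; the two cross
# pairs are HYPOTHETICAL placeholders, assumed to be lower.
pairwise = {
    ("HP:0100752", "HP:0006555"): 8.632074905566515,  # liver lobulation vs steatosis
    ("HP:0007042", "HP:0025517"): 6.775984316965229,  # white matter vs hippocampus
    ("HP:0100752", "HP:0025517"): 1.0,  # hypothetical
    ("HP:0007042", "HP:0006555"): 1.0,  # hypothetical
}
subjects = ["HP:0100752", "HP:0007042"]
objects = ["HP:0006555", "HP:0025517"]


def score(a, b):
    """Look up a pairwise score, treating the scores as symmetric."""
    return pairwise.get((a, b), pairwise.get((b, a)))


def best_matches(sources, targets):
    """For each source term, pick the target term with the highest score."""
    return {s: max(targets, key=lambda t: score(s, t)) for s in sources}


subject_best = best_matches(subjects, objects)
object_best = best_matches(objects, subjects)
best_scores = [score(s, t) for s, t in subject_best.items()]
best_scores += [score(o, s) for o, s in object_best.items()]

average_score = sum(best_scores) / len(best_scores)
best_score = max(best_scores)

# Compare with the average_score and best_score reported by runoak.
print(average_score, math.isclose(average_score, 7.704029611265872))
print(best_score, best_score == 8.632074905566515)
```

Because each best match here is reciprocal, the mean over both directions equals the mean over one direction. With asymmetric profiles, the two directions can yield different best matches.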
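OAK can also use semsimian, a more efficient semantic similarity implementation, under the hood. To enable it, prefix the input selector with semsimian:, as in -i semsimian:sqlite:obo:hp below. The scores agree closely with the default implementation but are not identical. Ancestor labels are left blank because, as the warning in the output says, adding labels is not yet implemented in SemsimianImplementation.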
!runoak -i semsimian:sqlite:obo:hp termset-similarity -p i "Abnormal liver lobulation" "Focal white matter lesions" @ "Diffuse hepatic steatosis" "Hypoplastic hippocampus"
[00:00:00] Building (all subjects X all objects) pairwise similarity: ████████████████████████████████████████ 100%
WARNING:root:Adding labels not yet implemented in SemsimianImplementation.
subject_termset:
  HP:0007042:
    id: HP:0007042
    label: Focal white matter lesions
  HP:0100752:
    id: HP:0100752
    label: Abnormal liver lobulation
object_termset:
  HP:0025517:
    id: HP:0025517
    label: Hypoplastic hippocampus
  HP:0006555:
    id: HP:0006555
    label: Diffuse hepatic steatosis
subject_best_matches:
  HP:0007042:
    match_source: HP:0007042
    score: 6.7759382869726945
    similarity:
      subject_id: HP:0007042
      object_id: HP:0025517
      ancestor_id: HP:0100547
      ancestor_label: ''
      ancestor_information_content: 6.7759382869726945
      jaccard_similarity: 0.5
      phenodigm_score: 1.8406436764040854
    match_source_label: Focal white matter lesions
    match_target: HP:0025517
    match_target_label: Hypoplastic hippocampus
  HP:0100752:
    match_source: HP:0100752
    score: 8.632028875573981
    similarity:
      subject_id: HP:0100752
      object_id: HP:0006555
      ancestor_id: HP:0410042
      ancestor_label: ''
      ancestor_information_content: 8.632028875573981
      jaccard_similarity: 0.5
      phenodigm_score: 2.0775019705855855
    match_source_label: Abnormal liver lobulation
    match_target: HP:0006555
    match_target_label: Diffuse hepatic steatosis
object_best_matches:
  HP:0006555:
    match_source: HP:0006555
    score: 8.632028875573981
    similarity:
      subject_id: HP:0006555
      object_id: HP:0100752
      ancestor_id: HP:0410042
      ancestor_label: ''
      ancestor_information_content: 8.632028875573981
      jaccard_similarity: 0.5
      phenodigm_score: 2.0775019705855855
    match_source_label: Diffuse hepatic steatosis
    match_target: HP:0100752
    match_target_label: Abnormal liver lobulation
  HP:0025517:
    match_source: HP:0025517
    score: 6.7759382869726945
    similarity:
      subject_id: HP:0025517
      object_id: HP:0007042
      ancestor_id: HP:0100547
      ancestor_label: ''
      ancestor_information_content: 6.7759382869726945
      jaccard_similarity: 0.5
      phenodigm_score: 1.8406436764040854
    match_source_label: Hypoplastic hippocampus
    match_target: HP:0007042
    match_target_label: Focal white matter lesions
average_score: 7.703983581273338
best_score: 8.632028875573981
metric: ancestor_information_content
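In both outputs, the reported phenodigm_score appears to equal the square root of the ancestor information content times the Jaccard similarity. That is the geometric-mean form commonly used for the Phenodigm score. The check below recomputes it from values copied out of the two outputs. It also measures how far apart the two backends' average scores are. The relationship is inferred from these numbers, not taken from OAK's or semsimian's source code.

```python
import math

# (ancestor_information_content, jaccard_similarity, phenodigm_score)
# copied from the default (first) and semsimian (second) outputs above.
rows = {
    "default, forebrain": (6.775984316965229, 0.5, 1.8406499282814792),
    "default, liver": (8.632074905566515, 0.5, 2.0775075096815554),
    "semsimian, forebrain": (6.7759382869726945, 0.5, 1.8406436764040854),
    "semsimian, liver": (8.632028875573981, 0.5, 2.0775019705855855),
}

for name, (ic, jaccard, reported) in rows.items():
    # Geometric mean of information content and Jaccard similarity.
    recomputed = math.sqrt(ic * jaccard)
    print(f"{name}: recomputed={recomputed:.10f} reported={reported:.10f}")
    assert math.isclose(recomputed, reported, rel_tol=1e-9)

# Relative difference between the two backends' average_score values.
default_avg, semsimian_avg = 7.704029611265872, 7.703983581273338
rel_diff = abs(default_avg - semsimian_avg) / default_avg
print(f"relative difference in average_score: {rel_diff:.1e}")
```

The backends differ only around the fifth significant digit, which suggests slightly different information-content calculations rather than different best matches: both pick the same match targets and ancestors.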