This notebook is intended as a supplement to the main OAK CLI docs.
This notebook provides examples for the paths command, which can be used to query for paths between ontology terms.
You can get help on any OAK command using --help:
!runoak paths --help
Usage: runoak paths [OPTIONS] [TERMS]...

  List all paths between one or more start curies.

  Example:

      runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane'

  This shows all shortest paths from nuclear membrane to all ancestors

  Example:

      runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane' --target cytoplasm

  This shows shortest paths between two nodes

  Example:

      runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane' 'thylakoid' --target cytoplasm 'thylakoid membrane'

  This shows all shortest paths between 4 combinations of starts and ends

  You can also use "@" to separate start node list and end node list.

  Like most OAK commands, you can pass either explicit terms, or term queries.
  For example, if you have two files of IDs, then you can do this:

      runoak -i sqlite:obo:go paths -p i,p .idfile START_NODES.txt @ .idfile END_NODES.txt

  You can also pass in weights for each predicate, used when calculating shortest paths.

  Example:

      runoak -i sqlite:obo:go paths -p i,p 'nuclear membrane' --target cytoplasm --predicate-weights "{i: 0.0001, p: 999}"

  This shows all shortest paths after weighting relations

  (Note: you can use the same shorthands as in the `--predicates` option)

  This command can be combined with others to visualize the paths.

  Example:

      alias go="runoak -i sqlite:obo:go"
      go paths -p i,p 'nuclear membrane' --target cytoplasm --narrow | go viz --fill-gaps -

  This visualizes the path by first exporting the path as a flat list, then passing the results to viz, using the fill-gaps option

Options:
  --target TEXT                   end point of path
  --narrow / --no-narrow          If true then output path is written a list of terms  [default: no-narrow]
  --autolabel / --no-autolabel    If set, results will automatically have labels assigned  [default: autolabel]
  -p, --predicates TEXT           A comma-separated list of predicates
  -O, --output-type TEXT          Desired output type
  --directed / --no-directed      only show directed paths  [default: no-directed]
  --include-predicates / --no-include-predicates
                                  show predicates between nodes  [default: no-include-predicates]
  --predicate-weights TEXT        key-value pairs specified in YAML where keys are predicates or shorthands and values are weights
  -o, --output FILENAME           Output file, e.g. obo file
  --help                          Show this message and exit.
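The effect of --predicate-weights is easiest to see on a toy graph. The minimal sketch below is not OAK's implementation. It runs Dijkstra's algorithm over an undirected view of a graph whose edges are tagged with predicate shorthands. The node names S, A, B, T and the function weighted_shortest_path are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical toy graph of (subject, predicate, object) triples.
# "i" and "p" stand in for is_a and part_of, the shorthands used with -p.
EDGES: List[Tuple[str, str, str]] = [
    ("S", "p", "T"),  # one part_of hop...
    ("S", "i", "A"),  # ...versus three is_a hops
    ("A", "i", "B"),
    ("B", "i", "T"),
]


def weighted_shortest_path(
    edges: List[Tuple[str, str, str]],
    weights: Dict[str, float],
    start: str,
    target: str,
    default_weight: float = 1.0,
) -> Tuple[Optional[List[str]], float]:
    """Dijkstra over an undirected view of the graph, costing each edge by its predicate."""
    adjacency: Dict[str, List[Tuple[str, float]]] = {}
    for subj, pred, obj in edges:
        w = weights.get(pred, default_weight)
        adjacency.setdefault(subj, []).append((obj, w))
        adjacency.setdefault(obj, []).append((subj, w))  # ignore direction
    dist = {start: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist[node]:
            continue  # stale queue entry, a cheaper route was already found
        for nbr, w in adjacency.get(node, []):
            if d + w < dist.get(nbr, float("inf")):
                dist[nbr] = d + w
                prev[nbr] = node
                heapq.heappush(heap, (d + w, nbr))
    if target not in dist:
        return None, float("inf")
    # Walk the predecessor links back from target to start.
    path = [target]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


print(weighted_shortest_path(EDGES, {"i": 1, "p": 1}, "S", "T"))
print(weighted_shortest_path(EDGES, {"i": 0.0001, "p": 999}, "S", "T"))
```

With equal weights, the direct part_of edge wins. With the weights from the help text ({i: 0.0001, p: 999}), the three cheap is_a hops cost far less than one part_of hop, so the longer is_a route is chosen.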
For convenience, we will set up an alias for use in this notebook:
alias cl runoak -i sqlite:obo:cl
Note: if you want to do this on your own machine, the syntax is slightly different in bash/zsh:
alias cl="runoak -i sqlite:obo:cl"
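If you call OAK from a Python script rather than from a shell, an alias is not available. A minimal sketch, assuming runoak is installed and on your PATH, is to keep the same prefix as a list and pass arguments through subprocess. The names CL_PREFIX, cl_command and run_cl are hypothetical.

```python
import shlex
import subprocess
from typing import List

# Mirrors the notebook's `cl` alias; the helper names are hypothetical.
CL_PREFIX = ["runoak", "-i", "sqlite:obo:cl"]


def cl_command(*args: str) -> List[str]:
    """Return the argv list equivalent to typing `cl <args>` in a shell."""
    return CL_PREFIX + list(args)


def run_cl(*args: str) -> str:
    """Run runoak (assumed to be on PATH) and return its standard output."""
    completed = subprocess.run(
        cl_command(*args), capture_output=True, text=True, check=True
    )
    return completed.stdout


# Show the shell-quoted form of a command without running it.
print(shlex.join(cl_command("paths", "--target", "T-cell", "interneuron")))
```

Passing a list instead of a single string means terms containing spaces, such as 'nuclear membrane', need no extra shell quoting.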
cl paths --target cell interneuron
subject	subject_label	object	object_label	path	path_label
CL:0000099	interneuron	CL:0000000	cell	['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'GO:0030154', 'CL:0000000']	interneuron|neuron|material entity|precursor cell|cell differentiation|cell
CL:0000099	interneuron	CL:0000000	cell	['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0000003', 'CL:0000000']	interneuron|neuron|material entity|precursor cell|native cell|cell
CL:0000099	interneuron	CL:0000000	cell	['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0000219', 'CL:0000003', 'CL:0000000']	interneuron|neuron|material entity|motile cell|native cell|cell
CL:0000099	interneuron	CL:0000000	cell	['CL:0000099', 'CL:0000540', 'CL:0000393', 'CL:0000211', 'CL:0000003', 'CL:0000000']	interneuron|neuron|electrically responsive cell|electrically active cell|native cell|cell
CL:0000099	interneuron	CL:0000000	cell	['CL:0000099', 'CL:0000540', 'CL:0000404', 'CL:0000211', 'CL:0000003', 'CL:0000000']	interneuron|neuron|electrically signaling cell|electrically active cell|native cell|cell
CL:0000099	interneuron	CL:0000000	cell	['CL:0000099', 'CL:0000540', 'CL:0002319', 'CL:0002371', 'CL:0000003', 'CL:0000000']	interneuron|neuron|neural cell|somatic cell|native cell|cell
CL:0000099	interneuron	CL:0000000	cell	['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0098916', 'GO:0098794', 'CL:0000000']	interneuron|neuron|presynapse|anterograde trans-synaptic signaling|postsynapse|cell
CL:0000099	interneuron	CL:0000000	cell	['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0045202', 'GO:0098794', 'CL:0000000']	interneuron|neuron|presynapse|synapse|postsynapse|cell
CL:0000099	interneuron	CL:0000000	cell	['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0098794', 'CL:0000000']	interneuron|neuron|presynapse|cellular anatomical entity|postsynapse|cell
CL:0000099	interneuron	CL:0000000	cell	['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0030312', 'CL:0000000']	interneuron|neuron|presynapse|cellular anatomical entity|external encapsulating structure|cell
CL:0000099	interneuron	CARO:0000013	cell	['CL:0000099', 'CL:0000540', 'UBERON:0001016', 'CARO:0000006', 'CARO:0020003', 'CARO:0000013']	interneuron|neuron|nervous system|material anatomical entity|cellular anatomical structure|cell
CL:0000099	interneuron	CARO:0000013	cell	['CL:0000099', 'CL:0000540', 'UBERON:0001016', 'CARO:0000006', 'CARO:0000003', 'CARO:0000013']	interneuron|neuron|nervous system|material anatomical entity|connected anatomical structure|cell
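In the default (wide) output, each path is packed into two columns. The path column looks like a Python list literal, and path_label joins the labels with "|". A minimal sketch for unpacking one row follows. It assumes the file is tab-separated with exactly these columns and that no label itself contains "|". The function parse_path_row is hypothetical.

```python
import ast
import csv
import io
from typing import Dict, List, Tuple

HEADER = ["subject", "subject_label", "object", "object_label", "path", "path_label"]
# One row copied from the output above.
ROW = [
    "CL:0000099", "interneuron", "CL:0000000", "cell",
    "['CL:0000099', 'CL:0000540', 'CL:0002319', 'CL:0002371', 'CL:0000003', 'CL:0000000']",
    "interneuron|neuron|neural cell|somatic cell|native cell|cell",
]


def parse_path_row(row: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair each CURIE in `path` with the matching label in `path_label`."""
    curies = ast.literal_eval(row["path"])  # list literal -> list of str
    labels = row["path_label"].split("|")   # assumes no label contains "|"
    if len(curies) != len(labels):
        raise ValueError("path and path_label have different lengths")
    return list(zip(curies, labels))


tsv = "\t".join(HEADER) + "\n" + "\t".join(ROW) + "\n"
for record in csv.DictReader(io.StringIO(tsv), delimiter="\t"):
    for curie, label in parse_path_row(record):
        print(curie, label)
```

ast.literal_eval only accepts Python literals, so it is a safer choice than eval for reading the path column.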
You can see a similar structure, restricted to is_a relationships (-p i), using the tree command:
cl tree interneuron -p i
* [] BFO:0000002 ! continuant
    * [i] BFO:0000004 ! independent continuant
        * [i] BFO:0000040 ! material entity
            * [i] CL:0000540 ! neuron
                * [i] **CL:0000099 ! interneuron**
            * [i] CL:0002319 ! neural cell
                * [i] CL:0000540 ! neuron
                    * [i] **CL:0000099 ! interneuron**
* [] CL:0000000 ! cell
    * [i] CL:0000003 ! native cell
        * [i] CL:0000211 ! electrically active cell
            * [i] CL:0000393 ! electrically responsive cell
                * [i] CL:0000540 ! neuron
                    * [i] **CL:0000099 ! interneuron**
            * [i] CL:0000404 ! electrically signaling cell
                * [i] CL:0000540 ! neuron
                    * [i] **CL:0000099 ! interneuron**
        * [i] CL:0000255 ! eukaryotic cell
            * [i] CL:0000548 ! animal cell
                * [i] CL:0002319 ! neural cell
                    * [i] CL:0000540 ! neuron
                        * [i] **CL:0000099 ! interneuron**
        * [i] CL:0002371 ! somatic cell
            * [i] CL:0002319 ! neural cell
                * [i] CL:0000540 ! neuron
                    * [i] **CL:0000099 ! interneuron**
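Each branch of the tree, read from interneuron upward, is one route through the is_a hierarchy. As a rough illustration (not how OAK computes either output), the sketch below encodes a hypothetical PARENTS dictionary. It holds only is_a edges that also appear in the paths output above, and the code enumerates every upward route from interneuron to cell by depth-first search.

```python
from typing import Dict, List

# is_a parents for a few CL classes, limited to edges that also appear in the
# paths output above (a hand-made subset, not the full ontology).
PARENTS: Dict[str, List[str]] = {
    "interneuron": ["neuron"],
    "neuron": ["electrically responsive cell", "electrically signaling cell", "neural cell"],
    "electrically responsive cell": ["electrically active cell"],
    "electrically signaling cell": ["electrically active cell"],
    "electrically active cell": ["native cell"],
    "neural cell": ["somatic cell"],
    "somatic cell": ["native cell"],
    "native cell": ["cell"],
}


def upward_paths(node: str, target: str, parents: Dict[str, List[str]]) -> List[List[str]]:
    """Depth-first enumeration of every is_a route from node up to target (assumes no cycles)."""
    if node == target:
        return [[node]]
    routes = []
    for parent in parents.get(node, []):
        for tail in upward_paths(parent, target, parents):
            routes.append([node] + tail)
    return routes


for route in upward_paths("interneuron", "cell", PARENTS):
    print(" -> ".join(route))
```

The three routes printed correspond to three of the rows in the earlier paths output. The paths command differs because it also follows non-is_a predicates and keeps only the shortest routes.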
By default, the paths command ignores edge direction, so a path can go both up and down the hierarchy:
cl paths interneuron --target "T-cell"
subject	subject_label	object	object_label	path	path_label
CL:0000099	interneuron	CL:0000084	T cell	['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0000219', 'CL:0000738', 'CL:0000842', 'CL:0000542', 'CL:0000084']	interneuron|neuron|material entity|motile cell|leukocyte|mononuclear cell|lymphocyte|T cell
CL:0000099	interneuron	CL:0000084	T cell	['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0011026', 'CL:0000051', 'CL:0000542', 'CL:0000084']	interneuron|neuron|material entity|precursor cell|progenitor cell|common lymphoid progenitor|lymphocyte|T cell
CL:0000099	interneuron	CL:0000084	T cell	['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0000785', 'GO:0000792', 'CL:0000542', 'CL:0000084']	interneuron|neuron|presynapse|cellular anatomical entity|chromatin|heterochromatin|lymphocyte|T cell
CL:0000099	interneuron	CL:0000084	T cell	['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0005737', 'CL:0017500', 'CL:0000542', 'CL:0000084']	interneuron|neuron|presynapse|cellular anatomical entity|cytoplasm|neutrophillic cytoplasm|lymphocyte|T cell
CL:0000099	interneuron	CL:0000084	T cell	['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0011026', 'CL:0000051', 'CL:0000827', 'CL:0000084']	interneuron|neuron|material entity|precursor cell|progenitor cell|common lymphoid progenitor|pro-T cell|T cell
CL:0000099	interneuron	CL:0000084	T cell	['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0011026', 'CL:0000838', 'CL:0000827', 'CL:0000084']	interneuron|neuron|material entity|precursor cell|progenitor cell|lymphoid lineage restricted progenitor cell|pro-T cell|T cell
Specifying --directed restricts traversal to the subject-to-object direction; in this case, there are no such paths:
cl paths interneuron --directed --target "T-cell"
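Why does the undirected search find a path while the directed one finds none? A small breadth-first search makes this concrete. The sketch below uses hypothetical edges oriented child to parent, taken from the first interneuron-to-T-cell path above. The orientation is an assumption for illustration, and shortest_path is not an OAK function. Following edges only upward from interneuron never reaches T cell. Allowing edges to be walked backwards lets the search go up to the shared ancestor (material entity) and then back down.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Toy edges oriented child -> parent, taken from the first interneuron/T cell
# path above; the orientation is an assumption for illustration.
EDGES: List[Tuple[str, str]] = [
    ("interneuron", "neuron"),
    ("neuron", "material entity"),
    ("motile cell", "material entity"),
    ("leukocyte", "motile cell"),
    ("mononuclear cell", "leukocyte"),
    ("lymphocyte", "mononuclear cell"),
    ("T cell", "lymphocyte"),
]


def shortest_path(edges: List[Tuple[str, str]], start: str, target: str,
                  directed: bool) -> Optional[List[str]]:
    """Breadth-first search; if directed is False, edges may also be walked backwards."""
    adjacency: Dict[str, List[str]] = {}
    for child, parent in edges:
        adjacency.setdefault(child, []).append(parent)
        if not directed:
            adjacency.setdefault(parent, []).append(child)
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            # Rebuild the path by following predecessor links back to start.
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nbr in adjacency.get(node, []):
            if nbr not in previous:
                previous[nbr] = node
                queue.append(nbr)
    return None


print(shortest_path(EDGES, "interneuron", "T cell", directed=True))
print(shortest_path(EDGES, "interneuron", "T cell", directed=False))
```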
The default output has one row per path.
You can use the --narrow option to make a narrow table, with one row per path element:
cl paths --narrow --target CL:4023061 interneuron
subject	subject_label	object	object_label	path_node	path_node_label
CL:0000099	interneuron	CL:4023061	hippocampal CA4 neuron	CL:0000099	interneuron
CL:0000099	interneuron	CL:4023061	hippocampal CA4 neuron	CL:0000540	neuron
CL:0000099	interneuron	CL:4023061	hippocampal CA4 neuron	CL:4023061	hippocampal CA4 neuron
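The narrow layout is essentially the wide layout with its path column "exploded". This is plain Python, not OAK's code. It assumes each wide row already has its path and labels parsed into lists, and it uses a hypothetical function to_narrow to emit one row per path node while repeating the endpoint columns.

```python
from typing import Dict, List

# One wide row (path already parsed into lists), using the CA4 example above.
WIDE_ROWS = [
    {
        "subject": "CL:0000099", "subject_label": "interneuron",
        "object": "CL:4023061", "object_label": "hippocampal CA4 neuron",
        "path": ["CL:0000099", "CL:0000540", "CL:4023061"],
        "path_label": ["interneuron", "neuron", "hippocampal CA4 neuron"],
    },
]

ENDPOINT_COLUMNS = ["subject", "subject_label", "object", "object_label"]


def to_narrow(wide_rows: List[Dict]) -> List[Dict[str, str]]:
    """Emit one output row per node on each path, repeating the endpoint columns."""
    narrow = []
    for row in wide_rows:
        for curie, label in zip(row["path"], row["path_label"]):
            out = {col: row[col] for col in ENDPOINT_COLUMNS}
            out["path_node"] = curie
            out["path_node_label"] = label
            narrow.append(out)
    return narrow


for r in to_narrow(WIDE_ROWS):
    print("\t".join(r.values()))
```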
cl paths --narrow --target CL:4023061 interneuron -o output/interneuron-CA4-path.tsv
import pandas as pd
df = pd.read_csv("output/interneuron-CA4-path.tsv", sep="\t")
df
| | subject | subject_label | object | object_label | path_node | path_node_label |
|---|---|---|---|---|---|---|
| 0 | CL:0000099 | interneuron | CL:4023061 | hippocampal CA4 neuron | CL:0000099 | interneuron |
| 1 | CL:0000099 | interneuron | CL:4023061 | hippocampal CA4 neuron | CL:0000540 | neuron |
| 2 | CL:0000099 | interneuron | CL:4023061 | hippocampal CA4 neuron | CL:4023061 | hippocampal CA4 neuron |
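Once the narrow table is in pandas, you can collapse it back into one readable string per subject/object pair. The sketch below inlines the three rows shown above so it runs on its own; in the notebook you could reuse df instead. The variable names narrow and paths are illustrative.

```python
import io

import pandas as pd

# The narrow table from above, inlined so this cell runs on its own.
NARROW_TSV = (
    "subject\tsubject_label\tobject\tobject_label\tpath_node\tpath_node_label\n"
    "CL:0000099\tinterneuron\tCL:4023061\thippocampal CA4 neuron\tCL:0000099\tinterneuron\n"
    "CL:0000099\tinterneuron\tCL:4023061\thippocampal CA4 neuron\tCL:0000540\tneuron\n"
    "CL:0000099\tinterneuron\tCL:4023061\thippocampal CA4 neuron\tCL:4023061\thippocampal CA4 neuron\n"
)
narrow = pd.read_csv(io.StringIO(NARROW_TSV), sep="\t")

# groupby keeps the original row order within each group, so joining the
# labels rebuilds the path in traversal order.
paths = narrow.groupby(["subject", "object"], sort=False)["path_node_label"].agg(" -> ".join)
print(paths)
```

This works here because there is only one path between the two terms. If several paths share the same subject and object, the narrow table as shown has no column that says which path a row belongs to, so grouping by subject and object alone would merge them.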