obsidiantools in 10 minutes
# built-in libs
import os
from pathlib import Path
# obsidiantools requirements
import numpy as np
import pandas as pd
import networkx as nx
# extra libs for this notebook (visualise graph)
import matplotlib.pyplot as plt
%matplotlib inline
!pip show obsidiantools
VAULT_DIR = Path(os.getcwd()) / 'vault-stub'
VAULT_DIR.exists()
import obsidiantools.api as otools # api shorthand
The Vault class is the object you need for exploring your vault. Set up the object with the path to your directory.
vault = otools.Vault(VAULT_DIR).connect().gather()
Attributes that say whether the main methods have been called:
print(f"Connected?: {vault.is_connected}")
print(f"Gathered?: {vault.is_gathered}")
Attribute that stores the location of your vault:
vault.dirpath
vault.file_index
If you want to filter on subdirectories, you can do so like this:
(otools.Vault(VAULT_DIR, include_subdirs=['lipsum'], include_root=False)
.file_index)
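To make the effect of the two filter arguments concrete, here is a minimal standard-library sketch. It is not the obsidiantools implementation: build_file_index is a hypothetical helper, the temporary folder layout is made up (only the 'lipsum' name comes from the example above), and the assumption that the index maps note names to paths relative to the vault is mine.

```python
import tempfile
from pathlib import Path


def build_file_index(vault_dir, include_subdirs=None, include_root=True):
    """Hypothetical helper: map note name -> path relative to vault_dir."""
    index = {}
    for path in sorted(Path(vault_dir).rglob('*.md')):
        rel = path.relative_to(vault_dir)
        if len(rel.parts) == 1:
            # File sits directly in the vault root
            keep = include_root
        else:
            # File sits in a subdirectory: filter on the top-level folder name
            keep = include_subdirs is None or rel.parts[0] in include_subdirs
        if keep:
            index[path.stem] = rel
    return index


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'lipsum').mkdir()
    (root / 'other').mkdir()
    # Made-up notes, one in the root and one in each subdirectory
    for rel in ['Root note.md', 'lipsum/Alimenta.md', 'other/Misc.md']:
        (root / rel).write_text('stub')
    filtered = build_file_index(root, include_subdirs=['lipsum'],
                                include_root=False)
    print(filtered)
```

Only the note under lipsum survives the filter, because the root is excluded and 'other' is not in the list of included subdirectories.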
In graph analysis, nodes are isolated if they do not connect to any other node in the graph. In graph terminology, each note in your vault is a node.
In the Obsidian world, what does it mean for notes to be isolated?
Isolated notes have no backlinks AND no wikilinks.
vault.isolated_notes
In the Obsidian community these notes are often called 'orphan notes'. This interface is sticking to graph analysis terminology; NetworkX calls the graph nodes 'isolates'.
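The definition of an isolated note can be sketched with plain Python, independently of obsidiantools. The note names and links here are hypothetical, and the dict of outgoing links is just an assumed stand-in for a vault.

```python
# Hypothetical vault: each note maps to the notes it links to (its wikilinks)
wikilinks = {
    'Note A': ['Note B'],
    'Note B': [],
    'Note C': [],
}

# Every note that is the target of at least one link, i.e. has a backlink
linked_to = {target for targets in wikilinks.values() for target in targets}

# Isolated: no outgoing wikilinks AND no incoming backlinks
isolated = [note for note, targets in wikilinks.items()
            if not targets and note not in linked_to]
print(isolated)  # ['Note C']
```

Note B has no wikilinks of its own, but it is not isolated because Note A links to it.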
When you create wikilinks in your vault notes, you can create connections to notes that you haven't created yet. This means that these new notes have backlinks and are displayed in your vault graph, but they don't exist as markdown files.
In this interface these are called nonexistent notes.
vault.nonexistent_notes
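As a rough illustration of the idea (not of how obsidiantools computes it), nonexistent notes are the link targets that have no markdown file. The names below are hypothetical.

```python
# Hypothetical vault: notes that exist as markdown files
existing_files = {'Note A', 'Note B'}

# Hypothetical wikilinks written inside those files
wikilinks = {
    'Note A': ['Note B', 'Future note'],
    'Note B': ['Future note'],
}

# Every note name that appears as the target of a wikilink
all_targets = {target for targets in wikilinks.values() for target in targets}

# Nonexistent notes: linked to, but with no file behind them
nonexistent = sorted(all_targets - existing_files)
print(nonexistent)  # ['Future note']
```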
The get_note_metadata method gives a summary of your vault's notes.
You can see, for example:
Number of backlinks (n_backlinks)
Number of wikilinks (n_wikilinks)
Number of embedded files (n_embedded_files)
Modified time (modified_time)
Note: created time is not available across all operating systems, so it is not included.
df = vault.get_note_metadata()
df.info()
Sort these notes by number of backlinks (descending order).
df.sort_values('n_backlinks', ascending=False)
We can see that Bacchus has the most backlinks. It's actually a nonexistent note.
vault.get_backlinks('Bacchus')
vault.get_backlink_counts('Bacchus')
You can see all the backlinks in the backlinks_index.
vault.backlinks_index
Similar functionality exists in the API for wikilinks (e.g. wikilinks_index, get_wikilinks).
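The relationship between the two indexes can be sketched in plain Python: a backlinks index is a wikilinks index turned around. This is an illustration under assumed data shapes (note name mapped to a list of note names, repeats kept), with hypothetical notes, rather than the library's own code.

```python
from collections import Counter, defaultdict

# Hypothetical wikilinks index: note -> notes it links to, repeats kept
wikilinks_index = {
    'Note A': ['Note B', 'Note B', 'Note C'],
    'Note B': ['Note C'],
    'Note C': [],
}

# Invert it: note -> notes that link to it, one entry per link
backlinks_index = defaultdict(list)
for source, targets in wikilinks_index.items():
    for target in targets:
        backlinks_index[target].append(source)

# Count how many times each source links to each note
backlink_counts = {note: dict(Counter(sources))
                   for note, sources in backlinks_index.items()}

print(backlinks_index['Note B'])   # ['Note A', 'Note A']
print(backlink_counts['Note C'])   # {'Note A': 1, 'Note B': 1}
```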
What are the embedded files in Sussudio?
vault.get_embedded_files('Sussudio')
By default the embedded files are not shown in the Obsidian graph, but there is an option to show them in the graph of a vault. Currently that capability is not supported in obsidiantools; only the default behaviour is supported.
Load the front matter for Sussudio parsed as a dict:
vault.get_front_matter('Sussudio')
In the Sussudio note, the tag #y1982 appears twice. Tags are returned by get_tags() in the order in which they appear in the note content:
vault.get_tags('Sussudio')
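The order-of-appearance behaviour, with repeats kept, can be mimicked with a simple regular expression. This is only a sketch: the pattern is a simplification and not the parser obsidiantools uses, the note text and the #music tag are made up, and dropping the leading '#' is a choice of this sketch.

```python
import re

# Made-up note text in which one tag appears twice
note_text = "Intro #y1982 and #music.\nLater: #y1982 again."

# A '#' at the start of a word, followed by tag characters.
# findall returns the captured group for each match, left to right.
tags = re.findall(r'(?<!\S)#([A-Za-z0-9_/-]+)', note_text)
print(tags)  # ['y1982', 'music', 'y1982']
```

Because findall scans from left to right and does not deduplicate, the repeated tag shows up twice, in the positions where it occurs in the text.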
The Obsidian app should be where you explore your vault visually, for all the interactive benefits!
If you want to do network analysis of your vault, or else focus on a subgraph, then you can do analysis through the NetworkX graph object: vault.graph
color_cat_map = {False: '#D3D3D3', True: '#826ED9'}
color_vals = (df['note_exists']
.map(color_cat_map)
.values)
The notes in the graph below are purple if they have a markdown file.
fig, ax = plt.subplots(figsize=(13,7))
nx.draw(vault.graph, node_color=color_vals, with_labels=True, ax=ax)
ax.set_title('Vault graph')
plt.show()
Where obsidiantools has the potential to be really powerful in your Obsidian workflows is its linkage with the sophisticated graph analysis capabilities of NetworkX.
There are many algorithms that you can use to analyse the centrality of nodes in a graph in NetworkX.
Let's look at the PageRank of notes in the vault. Google has used PageRank to rank the importance of search engine results.
As outlined by Google:
The underlying assumption is that more important websites are likely to receive more links from other websites
In the Obsidian realm, the notes that would be ranked highest by PageRank are the 'notes likely to receive more links from other notes', i.e. the notes that have backlinks from a broad range of notes.
Let's see this in action.
(pd.Series(nx.pagerank(vault.graph), name='pagerank')
.sort_values(ascending=False))
Caelum has the highest rank in the main graph. It has 3 backlinks from 3 notes.
Isolated note (of course!), Vulnera ubera, Alimenta, Sussudio
Bacchus has the most backlinks (5), but doesn't rank highest! Why might that be? Well, the quality of those backlinks is questionable. There are 4 backlinks to the note from Alimenta, which has 0 backlinks. See further analysis below on what those backlinks look like.
This is a peek at the plaintext of the files, which is only accessible after the gather method has been called. All the text is stored in the text_index of the vault.
# Print the last five lines of the Alimenta note's text
last_lines_alimenta = (vault.get_text('Alimenta')
                       .splitlines()[-5:])
for line in last_lines_alimenta:
    print(line)
Here we can see several repeated wikilinks to Bacchus at the end of the file. As it happens, every other note in this vault links to any given note only once. This is where the quality of backlinks matters to PageRank: a note doesn't rank highly just because backlinks pile up from a single note.
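To see why repeated links from one note add little, here is a small pure-Python PageRank by power iteration. It is a simplified sketch and not NetworkX's implementation: the pagerank function, the damping value of 0.85, the fixed iteration count and the toy notes are all assumptions made for illustration.

```python
from collections import Counter


def pagerank(links, damping=0.85, iterations=100):
    """links maps each note to a list of link targets (repeats allowed)."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by notes without outgoing links is shared evenly
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1 - damping) / n + damping * dangling / n
                    for node in nodes}
        for source, targets in links.items():
            # A note splits its rank over its links, in proportion to repeats
            for target, count in Counter(targets).items():
                new_rank[target] += damping * rank[source] * count / len(targets)
        rank = new_rank
    return rank


# Toy vault: 'target' is linked from one note, 'popular' from three notes
others = {'a': ['popular'], 'b': ['popular'], 'c': ['popular']}
ranks_once = pagerank({'spammer': ['target'], **others})
ranks_repeated = pagerank({'spammer': ['target'] * 4, **others})

print(ranks_once['target'], ranks_repeated['target'])
print(ranks_repeated['popular'] > ranks_repeated['target'])
```

In this toy vault, linking four times instead of once leaves the rank of 'target' unchanged, because 'spammer' still passes on the same total share of its own rank. 'popular' ranks higher because its three backlinks come from three different notes.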