When exploring a collection of texts, it can help to visualize the texts as data points and interact with them.
import stackview
import pandas as pd
Here we reuse a list of sentences and a UMAP computed from their text embeddings. The sentences are taken from Haase et al. 2022, which is licensed CC-BY 4.0.
df = pd.read_csv("data/sentence_embeddings.csv")
df.head()
 | Unnamed: 0 | sentence | UMAP0 | UMAP1 |
---|---|---|---|---|
0 | 0 | A Hitchhiker’s Guide through the Bio-image Ana... | -2.863276 | 8.680281 |
1 | 1 | Modern research in the life sciences is unthin... | -3.731295 | 7.875060 |
2 | 2 | In the past decade, we observed a dramatic inc... | -4.748690 | 6.128065 |
3 | 3 | As it is increasingly difficult to keep track ... | -4.183692 | 6.847530 |
4 | 4 | We give guidance on which aspects to consider ... | -4.912832 | 6.691180 |
A word cloud plot is an interactive plot in which you select texts, and a word cloud is generated from your selection.
stackview.wordcloudplot(df, column_text="sentence", column_x="UMAP0", column_y="UMAP1")
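To make the idea of "a word cloud generated from a selection" more concrete, here is a minimal sketch in plain Python. It is not how stackview implements the plot; it only illustrates the principle under simple assumptions: the selection is a rectangle in UMAP space, and word size in a word cloud is driven by word frequency. The example sentences, coordinates, and the helper names `select_in_box` and `word_frequencies` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stand-in for a few table rows: (sentence, UMAP0, UMAP1).
# These are made-up values, not entries from the CSV file.
points = [
    ("Image analysis needs open tools", -2.9, 8.7),
    ("Open tools support image analysis research", -3.7, 7.9),
    ("Training courses reach new users", 4.0, -1.0),
]


def select_in_box(points, x_min, x_max, y_min, y_max):
    """Return the sentences whose coordinates fall inside the rectangle."""
    return [
        text
        for text, x, y in points
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]


def word_frequencies(sentences, min_length=4):
    """Count lower-cased words with at least min_length letters."""
    counts = Counter()
    for sentence in sentences:
        words = re.findall(r"[a-z]+", sentence.lower())
        counts.update(w for w in words if len(w) >= min_length)
    return counts


# Simulate drawing a rectangle around the points in the upper-left region.
selected = select_in_box(points, x_min=-5.0, x_max=-2.0, y_min=6.0, y_max=9.0)
frequencies = word_frequencies(selected)

# The most frequent words would be drawn largest in a word cloud.
print(frequencies.most_common(4))
```

In the interactive plot, the selection is drawn with the mouse rather than given as fixed bounds, and the rendering of the cloud is handled by the widget.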