In [1]:
import os
import pandas as pd

from bbw import bbw
from IPython.display import display, HTML
In [2]:
data = [
    ['col0', 'col1', 'col2', 'col3'],
    ['Mannheim','Rhine', '97', 'Baden-Württemberg'],
    ['Edinburgh','River Forth', '47', 'City of Edinburgh']
]
df = pd.DataFrame(data)
df
Out[2]:
0 1 2 3
0 col0 col1 col2 col3
1 Mannheim Rhine 97 Baden-Württemberg
2 Edinburgh River Forth 47 City of Edinburgh

Simple workflow for semantic annotations

In [3]:
[web_table, url_table, label_table, cpa, cea, cta] = bbw.annotate(df)

Up to this point, the examples work without SearX, which is not installed locally alongside this Jupyter notebook.

Metalookup via SearX

However, we can try it out with a public instance from https://searx.space/# (use it sparingly, as this only works for a handful of examples at a time).

In [5]:
# For example
os.environ["BBW_SEARX_URL"] = "https://searx.monicz.pl/"
os.environ["BBW_SEARX_URL"]
Out[5]:
'https://searx.monicz.pl/'
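
Because public instances change over time, it can help to validate the URL before exporting it. The following is only a minimal sketch: `set_searx_url` is a hypothetical helper and not part of bbw. Only the `BBW_SEARX_URL` variable name and the example URL are taken from the cell above.

```python
import os
from urllib.parse import urlparse


def set_searx_url(url: str) -> str:
    """Validate a SearX instance URL and export it as BBW_SEARX_URL.

    Hypothetical helper for illustration; not part of bbw.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid http(s) URL: {url!r}")
    # Keep exactly one trailing slash, matching the form used above.
    normalized = url.rstrip("/") + "/"
    os.environ["BBW_SEARX_URL"] = normalized
    return normalized


set_searx_url("https://searx.monicz.pl")
```

The check is purely syntactic. It does not contact the instance, so an unreachable or rate-limited server would still only show up at lookup time.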
In [6]:
# Use SearX to look up the best-matching names for misspelled strings
[bbw.get_searx_bestname('Monnhem'), bbw.get_searx_bestname('dingbur')]
Out[6]:
[['Mannheim'], ['Edinburgh', 'edinburgh', 'Dingbur']]
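
The lookup returns a list of candidate names, which may contain case variants of the same name, as in the second result above. A rough sketch of collapsing such a list to a single name could look like this. `unique_names` and `pick_name` are hypothetical helpers, not part of bbw, and the sketch assumes the candidates are ordered best-first, which the output above suggests but does not confirm.

```python
def unique_names(candidates):
    """Drop case-insensitive duplicates, keeping the first spelling seen."""
    seen = set()
    unique = []
    for name in candidates:
        key = name.casefold()
        if key not in seen:
            seen.add(key)
            unique.append(name)
    return unique


def pick_name(candidates, fallback):
    """Return the first candidate, or the fallback if there are none.

    Assumption: candidates are ordered best-first.
    """
    unique = unique_names(candidates)
    return unique[0] if unique else fallback


# Candidate lists copied from the output above
print(unique_names(['Edinburgh', 'edinburgh', 'Dingbur']))
print(pick_name(['Mannheim'], fallback='Monnhem'))
print(pick_name([], fallback='dingbur'))
```

Falling back to the original string keeps the cell unchanged when no candidate is found, rather than leaving it empty.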
In [7]:
# Use label-based .loc (row label, column label) instead of chained indexing
df.loc[1, 0] = "Monnheim"
df.loc[2, 0] = "dingbur"
df
Out[7]:
0 1 2 3
0 col0 col1 col2 col3
1 Monnheim Rhine 97 Baden-Württemberg
2 dingbur River Forth 47 City of Edinburgh
In [8]:
[web_table, url_table, label_table, cpa, cea, cta] = bbw.annotate(df)
display(HTML(web_table.to_html(escape=False)))

GUI

The GUI runs on port 8501. You can reach it from the current URL by replacing notebooks/bbw.ipynb with proxy/8501/.
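
The URL rewrite described above can be sketched as a small function. `gui_url` is a hypothetical helper and the host in the example is a placeholder. Only the `notebooks/bbw.ipynb` path and the `proxy/8501/` replacement come from the text.

```python
def gui_url(notebook_url: str, port: int = 8501) -> str:
    """Derive the GUI address from the notebook's own URL.

    Hypothetical helper: replaces the notebook path with the proxy path
    and drops anything that followed it (for example a query string).
    """
    notebook_path = "notebooks/bbw.ipynb"
    head, found, _ = notebook_url.partition(notebook_path)
    if not found:
        raise ValueError(f"URL does not contain {notebook_path!r}")
    return f"{head}proxy/{port}/"


# Placeholder host; substitute the URL shown in your browser.
print(gui_url("https://hub.example.org/user/demo/notebooks/bbw.ipynb"))
```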
