Start with convert


The Banks example corpus as app

In [1]:
%load_ext autoreload
%autoreload 2
In [2]:
from tf.app import use

We load not only the main corpus data but also the additional sim (similarity) feature, which lives in a separate module.

For the very latest version, use hot.

For the latest release, use latest.

If you have cloned the repos (TF app and data), use clone.

If you do not want/need to upgrade, leave out the checkout specifiers.
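
In the call below, the checkout specifier is the part after the colon in the first argument of use. As a minimal sketch, the four variants mentioned above can be spelled out with a small helper. The name with_checkout is hypothetical and not part of Text-Fabric; it only builds the string and does not load anything.

```python
def with_checkout(corpus, checkout=None):
    """Hypothetical helper: append a checkout specifier to a corpus name.

    Leaving the specifier out returns the bare corpus name.
    """
    return f"{corpus}:{checkout}" if checkout else corpus


# The specifiers named in the text, plus the "no specifier" case.
for checkout in ("hot", "latest", "clone", None):
    print(with_checkout("annotation/banks", checkout))
```

The resulting string is what would be passed as the first argument of use.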

In [4]:
A = use(
    "annotation/banks:hot",
    mod="annotation/banks/sim/tf",
    hoist=globals(),
)
rate limit is 5000 requests per hour, with 4933 left for this hour
	connecting to online GitHub repo annotation/banks ... connected
TF-app: ~/text-fabric-data/annotation/banks/app
data: ~/text-fabric-data/annotation/banks/tf/0.2
data: ~/text-fabric-data/annotation/banks/sim/tf/0.2
This is Text-Fabric 9.2.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

11 features found and 0 ignored
Text-Fabric: Text-Fabric API 9.2.0, annotation/banks/app v3, Search Reference
Data: BANKS, Character table, Feature docs
Features:
annotation/banks/sim/tf
    sim   int    similarity between words, as a percentage of the common material wrt the combined material
        converters: Dirk Roorda
        dateWritten: 2020-06-10T19:40:37Z
        name: Banks (similar words)
        sourceUrl: https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/text-fabric/use.ipynb
        version: 0.2
        writtenBy: Text-Fabric
Two quotes from Consider Phlebas by Iain M. Banks
          str    the author of a book
    gap   int    1 for words that occur between [ ], which are inserted by the editor
          str    the letters of a word
          int    number of chapter, or sentence in chapter, or line in sentence
          str
          str    the punctuation after a word
        remark: a bit more info is needed
          str    the last character of a line
          str    the title of a book
          none
    Metadata listed with each of these features:
        compiler: Dirk Roorda
        dateWritten: 2020-02-13T13:37:47Z
        name: Culture quotes from Iain Banks
        purpose: exposition
        source: Good Reads
        status: with for similarities in a separate module
        url: https://www.goodreads.com/work/quotes/14366-consider-phlebas
        version: 0.2
        writtenBy: Text-Fabric
Text-Fabric API: names N F E L T S C TF directly usable

Use the similarity edge feature

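We want all pairs of words that are at least 50% similar but not 100%. The query below finds the pairs that are at least 50% similar; the 100% pairs are filtered out later, when we print.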

In [5]:
query = """
word
<sim>50> word
"""
In [6]:
results = A.search(query)
  0.01s 170 results
In [7]:
A.table(results, end=10, withPassage="1 2")
n   word                              word
1   Consider Phlebas 1:1  Everything  Consider Phlebas 1:1  everything
2   Consider Phlebas 1:1  Everything  Consider Phlebas 1:1  everything
3   Consider Phlebas 1:1  us,         Consider Phlebas 1:1  us,
4   Consider Phlebas 1:1  everything  Consider Phlebas 1:1  Everything
5   Consider Phlebas 1:1  everything  Consider Phlebas 1:1  everything
6   Consider Phlebas 1:1  us,         Consider Phlebas 1:1  us,
7   Consider Phlebas 1:1  everything  Consider Phlebas 1:1  Everything
8   Consider Phlebas 1:1  everything  Consider Phlebas 1:1  everything
9   Consider Phlebas 1:1  we          Consider Phlebas 1:2  we
10  Consider Phlebas 1:1  we          Consider Phlebas 1:2  we
In [8]:
A.show(results, end=5)

result 1

Consider Phlebas 1:1
sentence
    line: Everything about us,
    line: everything around us,
    line: everything we know and can know of
    line: is composed ultimately of patterns of nothing;
    line: that’s the bottom line, the final truth.

result 2

Consider Phlebas 1:1
sentence
    line: Everything about us,
    line: everything around us,
    line: everything we know and can know of
    line: is composed ultimately of patterns of nothing;
    line: that’s the bottom line, the final truth.

result 3

Consider Phlebas 1:1
sentence
    line: Everything about us,
    line: everything around us,
    line: everything we know and can know of
    line: is composed ultimately of patterns of nothing;
    line: that’s the bottom line, the final truth.

result 4

Consider Phlebas 1:1
sentence
    line: Everything about us,
    line: everything around us,
    line: everything we know and can know of
    line: is composed ultimately of patterns of nothing;
    line: that’s the bottom line, the final truth.

result 5

Consider Phlebas 1:1
sentence
    line: Everything about us,
    line: everything around us,
    line: everything we know and can know of
    line: is composed ultimately of patterns of nothing;
    line: that’s the bottom line, the final truth.

We skip pairs that are 100% similar, sort the two words of each remaining pair, and keep track of the pairs we have already seen, so that no pair is printed twice.
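
The same de-duplication idea can be tried on plain strings, without a corpus. This is only a sketch: the function name unique_pairs and the sample input are made up for illustration. It shows why sorting matters: (a, b) and (b, a) end up as the same tuple, so the second one is skipped.

```python
def unique_pairs(pairs):
    """Return each unordered pair once, in order of first appearance."""
    seen = set()
    unique = []
    for first, second in pairs:
        # Sorting makes ("own", "know") and ("know", "own") identical.
        pair = tuple(sorted((first, second)))
        if pair in seen:
            continue
        seen.add(pair)
        unique.append(pair)
    return unique


# Illustrative input: the first two entries are the same pair in both orders.
sample = [("own", "know"), ("know", "own"), ("make", "take")]
for pair in unique_pairs(sample):
    print(" ~ ".join(pair))
```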

In [9]:
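seen = set()
for (w1, w2) in results:
    # skip pairs that are 100% similar;
    # E.sim.b(w1) gives the sim edges from and to w1, with their values
    if (w2, 100) in E.sim.b(w1):
        continue
    letters1 = F.letters.v(w1)
    letters2 = F.letters.v(w2)
    # sort the two words, so that (a, b) and (b, a) count as the same pair
    pair = tuple(sorted((letters1, letters2)))
    if pair in seen:
        continue
    seen.add(pair)
    print(" ~ ".join(pair))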
know ~ own
harness ~ patterns
nothing ~ things
that ~ that’s
the ~ those
bottom ~ most
life ~ line
societies ~ those
not ~ to
make ~ take
elegant ~ languages
mattered ~ terms
left ~ life
humans ~ mountains
care ~ romance
studying ~ things
impossible ~ problems
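
The sim feature is described as the "similarity between words, as a percentage of the common material wrt the combined material". This notebook does not show how it was computed. One reading that fits the pairs listed above, assumed here and not confirmed by the source, is the share of distinct letters that two words have in common. The function name letter_similarity is made up for this sketch.

```python
def letter_similarity(word1, word2):
    """Percentage of distinct letters shared by two words.

    Assumed reading of the sim feature: common letters divided by
    combined letters, ignoring case, rounded to a whole percentage.
    """
    letters1 = set(word1.lower())
    letters2 = set(word2.lower())
    combined = letters1 | letters2
    if not combined:
        return 0
    common = letters1 & letters2
    return round(100 * len(common) / len(combined))


# A few of the pairs printed above.
for w1, w2 in [("know", "own"), ("make", "take"), ("life", "line")]:
    print(f"{w1} ~ {w2}: {letter_similarity(w1, w2)}")
```

Under this assumption, know ~ own scores 75, while make ~ take and life ~ line both score 60, all within the range the query selects.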

CC-BY Dirk Roorda