%load_ext autoreload
%autoreload 2
from tf.app import use
We load not only the main corpus data but also the additional sim (similarity) feature, which lives in a separate module.
For the very latest version, use hot.
For the latest release, use latest.
If you have cloned the repos (TF app and data), use clone.
If you do not want or need to upgrade, leave out the checkout specifiers.
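To make the specifier syntax concrete, here is a small sketch that only assembles the string passed as the first argument of use(); it does not import or call Text-Fabric. The helper name corpus_spec is hypothetical and introduced purely for illustration. The values hot, latest and clone are the ones named above.

```python
def corpus_spec(repo, checkout=None):
    """Return "org/repo" or "org/repo:checkout".

    Hypothetical helper for illustration only; it is not part of Text-Fabric.
    With checkout=None the specifier is left out, so no upgrade is requested.
    """
    if checkout is None:
        return repo
    return f"{repo}:{checkout}"


# Print the variants discussed in the text above.
for checkout in (None, "hot", "latest", "clone"):
    print(corpus_spec("annotation/banks", checkout))
```

The call in the next cell uses the hot variant for the main corpus.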
A = use(
    "annotation/banks:hot",
    mod="annotation/banks/sim/tf",
    hoist=globals(),
)
rate limit is 5000 requests per hour, with 4933 left for this hour
connecting to online GitHub repo annotation/banks ... connected
This is Text-Fabric 9.2.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html
11 features found and 0 ignored
We look for all pairs of words that are at least 50% similar. The query itself also returns pairs that are 100% similar; those are skipped further on, when the pairs are printed.
query = """
word
<sim>50> word
"""
results = A.search(query)
0.01s 170 results
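This section does not say how the values of the sim feature were computed. Purely as an illustration of what a percentage similarity between two words could look like, the sketch below assumes a measure based on shared letters: the size of the intersection of the two letter sets divided by the size of their union, ignoring case. Both the function letter_similarity and this choice of measure are assumptions, not something taken from the corpus module.

```python
def letter_similarity(word_a, word_b):
    """Percentage overlap between the letter sets of two words.

    Assumed measure for illustration; the real sim feature may be
    computed differently.
    """
    set_a = {ch for ch in word_a.lower() if ch.isalpha()}
    set_b = {ch for ch in word_b.lower() if ch.isalpha()}
    union = set_a | set_b
    if not union:
        # Neither word contains a letter, so there is nothing to compare.
        return 0
    return round(100 * len(set_a & set_b) / len(union))


# Score a few word pairs under the assumed measure.
for a, b in [("know", "own"), ("make", "take"), ("Everything", "everything")]:
    print(f"{a} ~ {b}: {letter_similarity(a, b)}")
```

Under this assumed measure, words that differ only in capitalisation or punctuation score 100, which is why such pairs need separate handling if only partial matches are of interest.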
A.table(results, end=10, withPassage="1 2")
n | word | word |
---|---|---|
1 | Consider Phlebas 1:1 Everything | Consider Phlebas 1:1 everything |
2 | Consider Phlebas 1:1 Everything | Consider Phlebas 1:1 everything |
3 | Consider Phlebas 1:1 us, | Consider Phlebas 1:1 us, |
4 | Consider Phlebas 1:1 everything | Consider Phlebas 1:1 Everything |
5 | Consider Phlebas 1:1 everything | Consider Phlebas 1:1 everything |
6 | Consider Phlebas 1:1 us, | Consider Phlebas 1:1 us, |
7 | Consider Phlebas 1:1 everything | Consider Phlebas 1:1 Everything |
8 | Consider Phlebas 1:1 everything | Consider Phlebas 1:1 everything |
9 | Consider Phlebas 1:1 we | Consider Phlebas 1:2 we |
10 | Consider Phlebas 1:1 we | Consider Phlebas 1:2 we |
A.show(results, end=5)
result 1
result 2
result 3
result 4
result 5
We skip pairs that are 100% similar and sort the two words within each pair. We keep track of the pairs we have already seen, so that no pair is printed twice.
seen = set()
for (w1, w2) in results:
    # skip pairs that are 100% similar
    if (w2, 100) in E.sim.b(w1):
        continue
    letters1 = F.letters.v(w1)
    letters2 = F.letters.v(w2)
    # order the two words so that (a, b) and (b, a) count as the same pair
    pair = tuple(sorted((letters1, letters2)))
    if pair in seen:
        continue
    seen.add(pair)
    print(" ~ ".join(pair))
know ~ own
harness ~ patterns
nothing ~ things
that ~ that’s
the ~ those
bottom ~ most
life ~ line
societies ~ those
not ~ to
make ~ take
elegant ~ languages
mattered ~ terms
left ~ life
humans ~ mountains
care ~ romance
studying ~ things
impossible ~ problems
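The deduplication step can be tried out without loading the corpus. The sketch below applies the same idea, a sorted tuple as a canonical key plus a set of keys already seen, to a small hand-made list of word pairs. The function name unique_unordered and the sample data are made up for illustration.

```python
def unique_unordered(pairs):
    """Return each unordered pair once, in order of first appearance."""
    seen = set()
    unique = []
    for a, b in pairs:
        # Sorting makes (a, b) and (b, a) map to the same key.
        key = tuple(sorted((a, b)))
        if key in seen:
            continue
        seen.add(key)
        unique.append(key)
    return unique


# Hand-made sample: the same pairs in both directions, plus a repeat.
sample = [
    ("own", "know"),
    ("know", "own"),
    ("take", "make"),
    ("make", "take"),
    ("own", "know"),
]

for pair in unique_unordered(sample):
    print(" ~ ".join(pair))
```

Keying on the word forms rather than on node numbers means that the same two words occurring at different places in the text are also reported only once.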