Professor Serge Belongie invented a word game:
I'm calling the game One Letter Off, because that's one of the names that Bard suggested when I asked. I'll also introduce a variant of the game where any letter in the word can be dropped, not just the last one.
As an example, pick board, drop the last letter to get the phrase boar board and write the clue pig plank. In the variant, you could drop the letter o to get board bard and write the clue pine poet. (Note the pair of words can be in either order.)
I thought it would be interesting to write a program to generate interesting word pairs and create clues. My plan is:
Given a dictionary, it is easy to find all pairs of words that are one letter off. The tricky part is to decide which are the "good" ones. Clearly, pairing the word race with races or raced or racer is bad, because they are all just forms of the same root word. The game is interesting when two words are very different in meaning, even though they are only one letter apart. Pairing quaker with quake is ok, because a Quaker is not just someone who quakes. So I can't rely on simple heuristic rules like "don't drop a final -r."
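To see why a simple suffix rule is too blunt, here is a minimal sketch. The function name `naive_is_bad_pair` and the suffix list are hypothetical illustrations, not part of the program developed below. The rule correctly rejects races/race, but it also throws away quaker/quake and even board/boar:

```python
# Hypothetical heuristic: call a pair "bad" if the longer word is just the
# shorter word plus a common inflectional-looking final letter.
SUFFIXES = ('s', 'd', 'r')

def naive_is_bad_pair(longer: str, shorter: str) -> bool:
    """True if `longer` is `shorter` plus one final letter from SUFFIXES."""
    return longer[:-1] == shorter and longer.endswith(SUFFIXES)

for longer, shorter in [('races', 'race'), ('quaker', 'quake'),
                        ('board', 'boar'), ('clamp', 'clam')]:
    print(longer, shorter, naive_is_bad_pair(longer, shorter))
```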
I can, however, associate each word with a word vector (also called a word embedding) and define the goodness of a pair as the distance between the two words' vectors. Multiple research teams have published open-source dictionaries that map words to word vectors. Two words that are similar in meaning or usage have similar word vectors (and so the distance between them is small), but words with distinct usage should have dissimilar word vectors (and a larger distance between them).
I downloaded a file of word vectors from the NLPL Word Vectors Repository and truncated it to consider only the 30,000 most common words, because I didn't want to be giving clues for words that the guesser is unlikely to know. Each line in the word vector file has a word, followed by its part of speech (which we will ignore), followed by a vector of 300 numbers. For example:
say_VERB -0.008861 0.097097 0.100236 0.070044 -0.079279 0.000923 ...
I'll read the file into a dict of {word: vector} called `vectors`, but I will eliminate words that are too short, or have a non-alphabetic or uppercase character. Also, if a word appears twice with two different parts of speech, I'll use the first one (because the file is sorted with more frequent words first).
from typing import *
import numpy as np

def read_vectors(lines: Iterable[str]) -> Dict[str, np.array]:
    """Read lines of text into a dict of {word: vector} pairs."""
    vectors = {}
    for line in lines:
        entry, *numbers = line.split()
        word, POS = entry.split('_') # Ignore the part of speech
        if word not in vectors and len(word) >= 4 and word.isalpha() and word.islower():
            vectors[word] = np.array([float(x) for x in numbers])
    return vectors

vectors = read_vectors(open('model30k.txt'))
len(vectors)
18331
There are 18,331 distinct entries, out of the 30,000 lines in the original file.
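To make the filtering rules concrete, here is a small self-contained sketch that applies the same rules to a handful of made-up lines. The toy lines and their three-number vectors are invented for illustration (the real file has 300 numbers per line), and the part-of-speech tags are assumptions about what such a file might contain.

```python
import numpy as np

def read_vectors(lines):
    """Same rules as above: first entry per word wins; keep only
    lowercase alphabetic words of length >= 4."""
    vectors = {}
    for line in lines:
        entry, *numbers = line.split()
        word, _pos = entry.split('_')  # Ignore the part of speech
        if word not in vectors and len(word) >= 4 and word.isalpha() and word.islower():
            vectors[word] = np.array([float(x) for x in numbers])
    return vectors

# Made-up input lines, most frequent first (illustrative values only).
toy_lines = [
    "board_NOUN 0.1 0.2 0.3",
    "board_VERB 0.9 0.9 0.9",   # duplicate word: skipped, first entry wins
    "say_VERB 0.0 0.1 0.2",     # too short (fewer than 4 letters)
    "Paris_PROPN 0.3 0.3 0.3",  # contains an uppercase letter
    "e-mail_NOUN 0.5 0.5 0.5",  # contains a non-alphabetic character
    "boar_NOUN 0.4 0.1 0.0",
]

toy = read_vectors(toy_lines)
print(sorted(toy))  # ['boar', 'board']
```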
The Euclidean distance between two word vectors is the norm of their difference:
def distance(word1, word2, vectors) -> float:
    """Distance between vectors for the two words."""
    return np.linalg.norm(vectors[word1] - vectors[word2])

distance('quaker', 'quake', vectors)
1.3230686663760125
distance('smoker', 'smoke', vectors)
0.7579947920361987
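Euclidean distance is the measure used throughout this notebook. For comparison only, cosine distance is another measure often applied to word vectors; it ignores vector length and looks only at direction. The helper `cosine_distance` below is a hypothetical sketch on toy two-dimensional vectors, not something the rest of the code depends on, and I have not checked whether it would rank the word pairs differently.

```python
import numpy as np

def cosine_distance(v1, v2) -> float:
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal vectors."""
    return 1.0 - float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# Toy vectors, invented for illustration.
a = np.array([1.0, 0.0])
b = np.array([2.0, 0.0])   # same direction as a, but longer
c = np.array([0.0, 3.0])   # orthogonal to a

euclid_ab = float(np.linalg.norm(a - b))  # nonzero: the lengths differ
cos_ab = cosine_distance(a, b)            # zero: the directions agree
cos_ac = cosine_distance(a, c)            # one: the directions are orthogonal
print(euclid_ab, cos_ab, cos_ac)
```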
The next step is to form word pairs by considering each word, looking at all ways to drop a letter from each word, and checking if those are words. Then we'll sort them so the "best" pairs come first.
def sorted_pairs(vectors, drop_fn: callable) -> List[Tuple[str, str]]:
    """List of (word1, word2) pairs, biggest word-vector distance first."""
    pairs = [(w1, w2) for w1 in vectors for w2 in drop_fn(w1, vectors)]
    pairs.sort(key=lambda pair: distance(*pair, vectors), reverse=True)
    return pairs

Here are the two functions to drop letters (either just the last letter, or any letter) and the function `words`, which checks which of the resulting strings are in fact words.
def drop_last_letter(word, vectors) -> Set[str]:
    """All ways to drop the last letter from word to form a word in `vectors`."""
    return words({word[:-1]}, vectors)

def drop_any_letter(word, vectors) -> Set[str]:
    """All ways to drop one letter from word to form a word in `vectors`."""
    return words({word[:i] + word[i + 1:] for i in range(len(word))}, vectors)

def words(candidates, vectors) -> Set[str]:
    """All candidate strings that are words in the `vectors` dict."""
    return {w for w in candidates if w in vectors}

drop_last_letter('clamp', vectors)
{'clam'}
drop_any_letter('clamp', vectors)
{'camp', 'clam', 'clap', 'lamp'}
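Here is the whole pipeline in miniature, as a self-contained sketch on a four-word vocabulary. The two-dimensional vectors are made up so that the distances are easy to check by hand; with real 300-dimensional vectors the ordering would of course differ.

```python
import numpy as np

# Made-up vectors chosen so the distances from 'clamp' are 5, 2, and 1.
toy_vectors = {
    'clamp': np.array([0.0, 0.0]),
    'clam':  np.array([3.0, 4.0]),
    'camp':  np.array([0.0, 2.0]),
    'lamp':  np.array([1.0, 0.0]),
}

def drop_any_letter(word, vectors):
    """All ways to drop one letter from word to form a word in `vectors`."""
    candidates = {word[:i] + word[i + 1:] for i in range(len(word))}
    return {w for w in candidates if w in vectors}

def distance(word1, word2, vectors) -> float:
    """Euclidean distance between the vectors for the two words."""
    return float(np.linalg.norm(vectors[word1] - vectors[word2]))

# Form every (longer, shorter) pair, then sort with the most distant first.
toy_pairs = [(w1, w2) for w1 in toy_vectors for w2 in drop_any_letter(w1, toy_vectors)]
toy_pairs.sort(key=lambda pair: distance(*pair, toy_vectors), reverse=True)
print(toy_pairs)  # [('clamp', 'clam'), ('clamp', 'camp'), ('clamp', 'lamp')]
```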
We're ready to generate a sorted list of word pairs:
pairs = sorted_pairs(vectors, drop_last_letter)
len(pairs)
1129
There are 1,129 pairs, but I'll just look at the first 50:
pairs[:50]
[('seedy', 'seed'), ('depth', 'dept'), ('hindu', 'hind'), ('sloth', 'slot'), ('plumb', 'plum'), ('tense', 'tens'), ('reverb', 'rever'), ('irish', 'iris'), ('siren', 'sire'), ('trusty', 'trust'), ('meter', 'mete'), ('pleat', 'plea'), ('sinew', 'sine'), ('chancel', 'chance'), ('heath', 'heat'), ('aspiring', 'aspirin'), ('forth', 'fort'), ('combo', 'comb'), ('drama', 'dram'), ('paste', 'past'), ('outwith', 'outwit'), ('filly', 'fill'), ('board', 'boar'), ('livery', 'liver'), ('forcep', 'force'), ('stocky', 'stock'), ('corporal', 'corpora'), ('photon', 'photo'), ('forte', 'fort'), ('rabbit', 'rabbi'), ('median', 'media'), ('cello', 'cell'), ('chart', 'char'), ('spark', 'spar'), ('liver', 'live'), ('caster', 'caste'), ('irony', 'iron'), ('insider', 'inside'), ('heron', 'hero'), ('macho', 'mach'), ('heroine', 'heroin'), ('polyp', 'poly'), ('gravely', 'gravel'), ('primer', 'prime'), ('honey', 'hone'), ('quaker', 'quake'), ('tablet', 'table'), ('grant', 'gran'), ('prime', 'prim'), ('valet', 'vale')]
Now I'll consider the variant where we can drop any letter:
sorted_pairs(vectors, drop_any_letter)[:50]
[('mitre', 'mite'), ('seedy', 'seed'), ('posit', 'post'), ('resign', 'resin'), ('insect', 'inset'), ('score', 'core'), ('parse', 'arse'), ('depth', 'dept'), ('convert', 'covert'), ('thank', 'tank'), ('hindu', 'hind'), ('orally', 'rally'), ('stigma', 'sigma'), ('naive', 'nave'), ('harmful', 'armful'), ('sloth', 'slot'), ('canyon', 'canon'), ('bassist', 'assist'), ('preach', 'peach'), ('launder', 'lander'), ('congenital', 'congenial'), ('supper', 'upper'), ('duress', 'dress'), ('usher', 'user'), ('crate', 'rate'), ('platitude', 'latitude'), ('ironic', 'ionic'), ('sever', 'seer'), ('quilt', 'quit'), ('timer', 'tier'), ('yeast', 'east'), ('sturdy', 'study'), ('tamper', 'taper'), ('crater', 'cater'), ('frame', 'fame'), ('broach', 'roach'), ('stage', 'sage'), ('leaver', 'lever'), ('restate', 'estate'), ('sluice', 'slice'), ('blinding', 'binding'), ('gamble', 'gable'), ('strait', 'trait'), ('pledge', 'ledge'), ('clean', 'clan'), ('shite', 'site'), ('sched', 'shed'), ('holist', 'hoist'), ('plumb', 'plum'), ('waive', 'wave')]
Both lists provide pretty promising pairs!
But did I really accomplish the goal of eliminating bad pairs? I think so! The pairs at the end of the list are exactly the kind of thing I wanted to get rid of:
pairs[-20:]
[('northwards', 'northward'), ('decentralised', 'decentralise'), ('insured', 'insure'), ('thanks', 'thank'), ('surpluse', 'surplus'), ('eastwards', 'eastward'), ('devoted', 'devote'), ('opposed', 'oppose'), ('alleged', 'allege'), ('waken', 'wake'), ('diall', 'dial'), ('increased', 'increase'), ('randomised', 'randomise'), ('brewery', 'brewer'), ('involved', 'involve'), ('vaginal', 'vagina'), ('larval', 'larva'), ('fabliaux', 'fabliau'), ('towards', 'toward'), ('mucosal', 'mucosa')]
I can take some of these suggested pairs, and make up clues on my own:
pair | drop | clue |
---|---|---|
alley ally | any | Bowling teammate |
binding blinding | any | Mandatory punishment for the cyclops Polyphemus |
class lass | any | Sophisticated young lady |
finnish finish | any | Scandinavian ending |
harmful armful | any | Dangerous bundle |
latitude platitude | any | Parallel cliche |
plum plumb | last | Most excellent toilet installation job |
I wrote some code to submit pairs of words to a large language model (LLM) and get back a clue. The basic approach looks like this:
from requests import post

def get_a_clue(pair, url, api_key) -> str:
    """Ask an LLM to generate a clue for the pair of words."""
    phrase = ' '.join(pair)
    return ask(f'Write a clever crossword puzzle clue for the phrase "{phrase}"', url, api_key)

def ask(query, url, api_key) -> str:
    """Ask the PaLM model to respond to the query."""
    response = post(url, headers={"Authorization": f"Bearer {api_key}"}, json={"query": query})
    response.raise_for_status()
    return response.json()["queryResult"]["fulfillmentText"]

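The prompt-building step can be exercised without any network access by passing the "ask" step in as a function. The sketch below is one possible arrangement under that assumption: `make_prompt` and `fake_ask` are hypothetical names I introduce for illustration, and `fake_ask` only echoes the query rather than calling a real model.

```python
from typing import Callable, Tuple

def make_prompt(pair: Tuple[str, str]) -> str:
    """Build the same prompt text as above for a pair of words."""
    phrase = ' '.join(pair)
    return f'Write a clever crossword puzzle clue for the phrase "{phrase}"'

def get_a_clue(pair: Tuple[str, str], ask: Callable[[str], str]) -> str:
    """Send the prompt for `pair` to whatever `ask` function is supplied."""
    return ask(make_prompt(pair))

def fake_ask(query: str) -> str:
    """Hypothetical stand-in for a real LLM call; just echoes the query."""
    return f"[stub reply to: {query}]"

print(get_a_clue(('boar', 'board'), fake_ask))
```

Swapping `fake_ask` for a function that wraps a real API call would leave `get_a_clue` unchanged.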
However, I'm not going to run the above code in this notebook, for three reasons:
Instead I'll show in the table below some of the previously-generated responses, edited for brevity, and selecting mostly good responses, with a few bad ones to illustrate specific points. The model gets a lot of things right:
The model also gets some things wrong:
pair | drop | clue | explanation |
---|---|---|---|
aspiring aspirin | any | Wannabe pain reliever | The clue suggests that an "aspiring aspirin" is a something that wants to be a pain reliever, but is not yet one. |
blinding binding | any | A dazzling cover-up | The clue suggests that a "blinding binding" is a very bright or dazzling cover-up, such as the cover of a book with a shiny metallic finish or a book with a cover made of sequins. |
board boar | last | A pig that's a member of the board of directors | This clue is clever because it is a pun on the words "board" and "boar". It also plays on the stereotype of pigs being greedy and wanting to be in positions of power. |
cello cell | last | Musical prison? | This clue is clever because it is a pun on the word "cell." A "cell" can refer to a room in a prison, or it can refer to the part of a cello that contains the strings. |
covert convert | any | Agent of change | This clue is clever because it is a play on words. "Agent of change" can refer to someone who is trying to make a difference in the world, but it can also refer to a spy or undercover agent. |
crater cater | any | A space rock that serves food | The word "crater" refers to a large bowl-shaped depression in the ground, often formed by the impact of a meteorite or asteroid. |
finnish finish | any | How a Suomi might end a race | The word "Suomi" is the Finnish word for Finland. The clue suggests how a Finnish person might end a race, because it is a play on the words "Finnish" and "finish." |
frame fame | any | What kind of fame is displayed on a wall? | This clue is clever because it is a play on words. The word "fame" can mean both "celebrity" and "a picture in a frame." |
flying fling | any | Airy affair | This clue is a bit of a pun, as it plays on the two meanings of the word "fling": a brief and passionate romantic relationship, and the act of throwing something. |
ledge pledge | any | A promise to hang on | The clue suggests that a "pledge ledge" is a promise to hang on to something, such as a promise to keep a secret or to repay a debt. |
pleat plea | last | Fashionable plea for more folds? | This clue is clever because it is a pun on the two meanings of the word "pleat": (1) A fold in a piece of fabric (2) A request or appeal |
plum plumb | last | Perfectly perpendicular | The clue suggests that a "plum plumb" is a line that is perfectly perpendicular to the ground, like a plumb line. |
quaker quake | any | A religious tremor | The clue suggests that a "quaker quake" is a type of religious tremor, which could be interpreted in a number of ways. For example, it could refer to a powerful spiritual experience, or to a time of social and political upheaval. |
quilt quit | any | Sewing machine strike | The clue suggests that a "quilt quit" is a strike by sewing machines, which would prevent people from making quilts. |
rabbit rabbi | last | What kind of religious leader is all ears | This clue is clever because it plays on the double meaning of the word "ears." On the one hand, rabbits are known for having large ears. On the other hand, "all ears" is an idiom that means to be very attentive and eager to listen. |
rally orally | any | Shout for support | The answer is "rally orally", which is a phrase that could be used to describe someone shouting for support. |
seedy seed | last | The kernel of a shady deal | The clue suggests that a "seedy seed" is the beginning of something shady or immoral. This could refer to a criminal activity, a broken promise, or even a relationship that is doomed. |
stage sage | any | wise actor | The clue suggests that a "stage sage" is a wise actor who is at the top of their game. |
stocky stock | last | Beefy broth | This clue is a play on the two meanings of the word "stock": (1) A type of broth made from meat or bones. (2) A large quantity of something, especially goods or money. |
timer tier | any | A level of timekeeping | The clue suggests that a "timer tier" is a level of timekeeping, such as the different tiers of accuracy that are available for timers. For example, a timer used for cooking may have a lower tier of accuracy than a timer used for scientific research. |
yeast east | any | Where the dough rises | The clue suggests that "yeast east" is the direction in which the sun rises because yeast causes dough to rise. |
What do you think? Are these clues any good? Can you do better, either by manually writing your own, or by writing a better program (e.g. by finding better word pairs, by constructing a better prompt, or by using a different LLM that performs better on this task)?