In this notebook we use TextBlob to extract nouns, verbs, and sentences from the OCRd text of a 19th-century cookery book. We try to clean things up a bit, using regular expressions to discard likely OCR errors. Then we recombine the various parts in random combinations to create delicious recipes for all occasions. Enjoy!
Inspired by Australian Plain Cookery by a Practical Cook, 1882.
import requests
from textblob import TextBlob
import re
import random
import pandas as pd
from IPython.display import display, HTML
import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
# The Cloudstor URL links to the repository of OCRd text from Trove digitised books
CLOUDSTOR_URL = 'https://cloudstor.aarnet.edu.au/plus/s/ugiw3gdijSKaoTL'
# File name of the cookery book
text_file = 'australian-plain-cookery-by-a-practical-cook-nla.obj-579917051.txt'
First we procure a recipe book.
# Download the text of the book
response = requests.get(f'{CLOUDSTOR_URL}/download?files={text_file}')
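Before we start chopping, it's worth checking that our ingredients arrived intact. This optional peek (not part of the original recipe) prints the opening of the OCRd text.
# Take a quick look at the start of the downloaded text
print(response.text[:500])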
Then we slice and dice the words to create a new TextBlob.
# Create a TextBlob using the text
blob = TextBlob(response.text)
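As a rough sanity check, we can weigh our raw ingredients; this optional cell counts the words and sentences TextBlob found.
# How much text do we have to work with?
print(f'{len(blob.words)} words, {len(blob.sentences)} sentences')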
Carefully we remove the nouns and the verbs, discarding any that are spoiled.
# Get the verbs, filtering out short words and those including non-alpha characters.
# 'VBD' is the part of speech tag for a past tense verb
verbs = [w.title() for w, t in blob.tags if t == 'VBD' and len(w) > 3 and w.isalpha()]
# Get the nouns, filtering out short words and those including non-alpha characters.
# 'NNP' is the POS tag for a proper noun ('NNPS' for plurals, hence startswith)
nouns = [w.title() for w, t in blob.tags if t.startswith('NNP') and len(w) > 3 and w.isalpha()]
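If you'd like to inspect the pantry before cooking, this optional check samples a few of the harvested words (your results will vary, as the sampling is random).
# Sample a few of the extracted verbs and nouns
print('Verbs:', random.sample(verbs, 5))
print('Nouns:', random.sample(nouns, 5))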
Now it is necessary to prepare the sentences. First extract them from the blob. Discard any that seem ill-formed.
# Get the sentences from the blob
# Uses a regexp to exclude those that include anything other than standard letters, numbers, and punctuation.
sentences = [str(s).replace('\n', ' ') for s in blob.sentences if re.match(r'^[a-zA-Z\s\-,\.;0-9\'&\(\):]*$', str(s))]
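Another optional taste test: display a random sentence that made it through the regular expression.
# Show one of the surviving sentences
print(random.choice(sentences))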
The sentences now need to be divided, to separate out the titles, which are recognised by their case.
# Titles in this cookbook are in uppercase, so we can separate them out from the rest of the sentences.
titles = [s for s in sentences if s.strip('.').isupper()]
sentences = [s for s in sentences if not s.strip('.').isupper()]
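To confirm the division worked, we can sample a few of the titles (again, an optional step).
# Sample a few of the uppercase titles
print(random.sample(titles, 3))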
Now we are ready to start cooking!
def recipe_maker(num=5):
    # Get a random title
    title = random.choice(titles)
    html = f'<h4>{title}</h4>'
    html += '<h5>Ingredients:</h5>'
    html += '<ol>'
    # Make a random selection from the nouns & verbs
    for _ in range(num):
        html += f'<li>{random.choice(verbs)} {random.choice(nouns)}</li>'
    html += '</ol>'
    html += '<h5>Method:</h5>'
    # Get random sentences and combine them
    html += f'<p>{" ".join(random.sample(sentences, num))}</p>'
    display(HTML(html))
recipe_maker(6)
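If one dish isn't enough, you can serve up a whole menu by calling the function a few times.
# Cook up a three-course meal
for _ in range(3):
    recipe_maker(4)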
There's a full list of the Penn Treebank POS (Part of Speech) tags online if you'd like to play with different combinations.
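For example, you could season the ingredients with adjectives. Here's a sketch using the 'JJ' tag (the Penn Treebank tag for adjectives), with the same length and non-alpha filters as above.
# Get the adjectives, filtering out short words and those including non-alpha characters
# 'JJ' is the POS tag for an adjective
adjectives = [w.title() for w, t in blob.tags if t == 'JJ' and len(w) > 3 and w.isalpha()]
print(random.sample(adjectives, 5))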
Perhaps we could add some more cookbooks? Let's load details of all the digitised books in Trove that include the word 'cookery' in the title and have OCRd text available.
df = pd.read_csv('https://raw.githubusercontent.com/GLAM-Workbench/trove-books/master/trove_digitised_books_with_ocr.csv')
df.loc[(df['title'].str.contains('cookery')) & (df['text_downloaded'] == True)]
To use a different one of these as the source for our recipe generator, just copy the index value, then use it to look up the name of the text_file. Like this:
df.loc[8173]['text_file']
Copy and paste the file name into the text_file value at the top of this notebook, and then re-run the cells.
How might we combine ingredients from all of these cookbooks?
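One possible approach (a sketch reusing the dataframe and Cloudstor URL from above; downloading every book will take a while) is to pour all the texts into a single TextBlob and then re-run the extraction cells.
# Combine the OCRd text of all the downloaded cookery books into one blob
cookery = df.loc[(df['title'].str.contains('cookery')) & (df['text_downloaded'] == True)]
combined_text = ''
for text_file in cookery['text_file']:
    response = requests.get(f'{CLOUDSTOR_URL}/download?files={text_file}')
    combined_text += response.text + '\n'
blob = TextBlob(combined_text)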
Created by Tim Sherratt for the GLAM Workbench.