Recipe generator

In this notebook we use TextBlob to extract nouns, verbs, and sentences from the OCR'd text of a 19th-century cookery book. We try to clean things up a bit, using regular expressions to discard likely OCR errors. Then we recombine the various parts in random combinations to create delicious recipes for all occasions. Enjoy!

Inspired by Australian Plain Cookery by a Practical Cook, 1882.

In [ ]:
import requests
from textblob import TextBlob
import re
import random
import pandas as pd
from IPython.display import display, HTML
import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
In [2]:
# The Cloudstor URL links to the repository of OCRd text from Trove digitised books
CLOUDSTOR_URL = 'https://cloudstor.aarnet.edu.au/plus/s/ugiw3gdijSKaoTL'
# File name of the cookery book
text_file = 'australian-plain-cookery-by-a-practical-cook-nla.obj-579917051.txt'

First we procure a recipe book.

In [3]:
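# Download the text of the book
response = requests.get(f'{CLOUDSTOR_URL}/download?files={text_file}')
# Stop here if the download failed, rather than cooking with an error page
response.raise_for_status()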

Then we slice and dice the words to create a new TextBlob.

In [4]:
# Create a TextBlob using the text
blob = TextBlob(response.text)

Carefully we remove the nouns and the verbs, discarding any that are spoiled.

In [5]:
# Get the verbs, filtering out words of three letters or fewer and those that include non-alphabetic characters.
# 'VBD' is the part-of-speech (POS) tag for a past tense verb
verbs = [w.title() for w, t in blob.tags if t == 'VBD' and len(w) > 3 and w.isalpha()]
In [6]:
# Get the nouns, filtering out words of three letters or fewer and those that include non-alphabetic characters.
# 'NNP' is the POS tag for singular proper nouns; startswith('NNP') also matches plural proper nouns ('NNPS')
nouns = [w.title() for w, t in blob.tags if t.startswith('NNP') and len(w) > 3 and w.isalpha()]
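
To see what these filters keep and what they throw away, here is a minimal sketch that applies the same conditions to a small hand-tagged sample. The sample pairs and the helper name `keep_words` are made up for illustration; in the notebook you would pass `blob.tags` instead.

```python
# Hypothetical sample standing in for blob.tags: (word, POS tag) pairs
sample_tags = [
    ('boiled', 'VBD'),     # kept as a verb
    ('cut', 'VBD'),        # dropped: three letters or fewer
    ('b0iled', 'VBD'),     # dropped: contains a digit, a likely OCR error
    ('Arrowroot', 'NNP'),  # kept as a noun
    ('SAGO', 'NNP'),       # kept, and title-cased to 'Sago'
    ('butter', 'NN'),      # dropped: a common noun, not a proper noun
]


def keep_words(tagged, tag_prefix, min_length=4):
    """Return title-cased words whose tag starts with tag_prefix,
    skipping short words and words with non-alphabetic characters."""
    return [
        word.title()
        for word, tag in tagged
        if tag.startswith(tag_prefix) and len(word) >= min_length and word.isalpha()
    ]


# For 'VBD' a prefix match behaves like the exact match used in the notebook,
# because no other Penn Treebank tag starts with 'VBD'
sample_verbs = keep_words(sample_tags, 'VBD')
sample_nouns = keep_words(sample_tags, 'NNP')
print(sample_verbs)
print(sample_nouns)
```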

Now it is necessary to prepare the sentences. First extract them from the blob. Discard any that seem ill-formed.

In [7]:
# Get the sentences from the blob
# Uses a regexp to exclude those that include anything other than standard letters, numbers, and punctuation.
sentences = [str(s).replace('\n', ' ') for s in blob.sentences if re.match(r'^[a-zA-Z\s\-,\.;0-9\'&\(\):]*$', str(s))]
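
The effect of the regular expression is easier to see on a few examples. This is a small sketch using the same pattern on made-up sentences (the list `candidates` is hypothetical, not taken from the book). Note that the pattern is quite strict: question marks and exclamation marks are not in the allowed set, so sentences containing them are discarded too.

```python
import re

# Same pattern as in the cell above: letters, digits, whitespace and basic punctuation only
CLEAN_SENTENCE = re.compile(r'^[a-zA-Z\s\-,\.;0-9\'&\(\):]*$')

# Hypothetical examples
candidates = [
    'Lay in cold water, and when it boils simmer for eight or ten minutes.',
    'Add one on~on and a l|ttle salt.',  # rejected: '~' and '|' imitate OCR noise
    'Is the oven hot enough?',           # rejected: '?' is not in the allowed set
]

clean = [s for s in candidates if CLEAN_SENTENCE.match(s)]
print(clean)
```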

The sentences now need to be divided, to separate out the titles, which are recognised by their case.

In [8]:
# Titles in this cookbook are in uppercase, so we can separate them out from the rest of the sentences.
titles = [s for s in sentences if s.strip('.').isupper()]
sentences = [s for s in sentences if not s.strip('.').isupper()]
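
As a rough illustration of how the case test behaves, the sketch below wraps it in a helper (the name `is_title` and the sample list are assumptions for this example only). A string with no letters at all, such as a bare year, is not counted as a title, because `isupper()` needs at least one cased character.

```python
def is_title(sentence):
    """True if the sentence has at least one letter and no lowercase letters,
    ignoring leading and trailing full stops."""
    return sentence.strip('.').isupper()


# Hypothetical sample sentences
sample = [
    'BREAKFAST ROLLS, PLAIN.',
    'Take two pounds of fat bacon.',
    '1882.',
]

sample_titles = [s for s in sample if is_title(s)]
sample_body = [s for s in sample if not is_title(s)]
print(sample_titles)
print(sample_body)
```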

Now we are ready to start cooking!

In [9]:
def recipe_maker(num=5):
    # Get a random title
    title = random.choice(titles)
    html = f'<h4>{title}</h4>'
    html += '<h5>Ingredients:</h5>'
    html += '<ol>'
    for _ in range(num):
        # Pair a randomly chosen verb with a randomly chosen noun
        html += f'<li>{random.choice(verbs)} {random.choice(nouns)}</li>'
    html += '</ol>'
    html += '<h5>Method:</h5>'
    # Get random sentences and combine
    html += f'<p>{" ".join(random.sample(sentences, num))}</p>'
    display(HTML(html))
In [11]:
recipe_maker(6)

BREAKFAST ROLLS, PLAIN.

Ingredients:
  1. Beat Arrowroot
  2. Added Wipe
  3. Buttered Boil
  4. Dipped Made
  5. Stew Melt
  6. Peeled Plain
Method:

In carving a large fish, as here engraved, cut thin slices, as from A to B, and help with it pieces of the belly, in the direction marked from C to D ; the best flavoured is the upper or thick part. Take two pounds of fat bacon, and a pound and a half of beef suet. When you put in the vegetables, cover all closely, and do not use for at least six weeks. Lay in cold water, and when it boils simmer for eight or ten minutes. Add one onion, three sage leaves, some whole pepper, and a little salt in three pints of water. Edge and cover with short crust, and ornament the edges.
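
One detail worth knowing about the recipe maker: the ingredients are drawn with `random.choice`, so the same word can turn up more than once, while the method uses `random.sample`, which never repeats a sentence and fails if you ask for more sentences than are available. A minimal sketch with a made-up word list:

```python
import random

# Hypothetical word list for illustration
words = ['Boiled', 'Peeled', 'Dipped']

# random.choice draws independently each time, so repeats are possible
# (and unavoidable here, with ten draws from three words)
picks = [random.choice(words) for _ in range(10)]

# random.sample draws without replacement, so every item is distinct
distinct = random.sample(words, 3)

# Asking for more items than the list holds raises ValueError
try:
    random.sample(words, 4)
    too_many_ok = True
except ValueError:
    too_many_ok = False

print(picks)
print(distinct)
print(too_many_ok)
```

So if a book yields only a handful of clean sentences, calling `recipe_maker` with a large `num` may raise a `ValueError`.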

What's next?

TextBlob labels words with Penn Treebank POS (part-of-speech) tags, so you can consult the full Penn Treebank tag list if you'd like to play with different combinations.
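
Before choosing a different combination, it can help to see which tags actually occur in your text. The sketch below counts tags in a small hand-tagged sample and pulls out the words for one tag. The sample pairs and the helper `words_with_tag` are assumptions for illustration; in the notebook you would use `blob.tags` in place of `sample_tags`.

```python
from collections import Counter

# Hypothetical hand-tagged sample standing in for blob.tags
sample_tags = [
    ('Take', 'VB'), ('two', 'CD'), ('pounds', 'NNS'), ('of', 'IN'),
    ('fat', 'JJ'), ('bacon', 'NN'), ('and', 'CC'), ('suet', 'NN'),
]

# Count how often each POS tag occurs
tag_counts = Counter(tag for _, tag in sample_tags)
print(tag_counts.most_common(3))


def words_with_tag(tagged, tag):
    """Return the words carrying exactly the given POS tag."""
    return [word for word, t in tagged if t == tag]


# For example, adjectives ('JJ') paired with common nouns ('NN')
adjectives = words_with_tag(sample_tags, 'JJ')
common_nouns = words_with_tag(sample_tags, 'NN')
print(adjectives, common_nouns)
```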

Perhaps we could add some more cookbooks? Let's load details of the digitised books in Trove, then list those that include the word 'cookery' in the title and have downloaded text.

In [11]:
df = pd.read_csv('https://raw.githubusercontent.com/GLAM-Workbench/trove-books/master/trove_digitised_books_with_ocr.csv')
In [12]:
df.loc[(df['title'].str.contains('cookery')) & (df['text_downloaded'] == True)]
Out[12]:
title url contributors date fulltext_url trove_id language rights pages form volume parent children text_downloaded text_file
1888 The Kingswood cookery book / by H. F. Wicken https://trove.nla.gov.au/work/12721516 Wicken, H 1885-1950 https://nla.gov.au/nla.obj-43987239 nla.obj-43987239 English Out of Copyright|http://rightsstatements.org/v... 278 Book NaN NaN NaN True the-kingswood-cookery-book-by-h-f-wicken-nla.o...
2582 Electric cookery book : being an indispensable... https://trove.nla.gov.au/work/16383834 State Electricity Commission of Victoria 1940-1949 http://nla.gov.au/nla.obj-52836472 nla.obj-52836472 English No known copyright restrictions|http://rightss... 73 Book NaN NaN NaN True electric-cookery-book-being-an-indispensable-h...
2654 The English and Australian cookery book : cook... https://trove.nla.gov.au/work/16551115 Abbott, Edward, 1801-1869 1864-2014 https://nla.gov.au/nla.obj-9562000 nla.obj-9562000 English Out of Copyright|http://rightsstatements.org/v... 356 Book NaN NaN NaN True the-english-and-australian-cookery-book-cooker...
4431 Australian plain cookery / by a Practical Cook... https://trove.nla.gov.au/work/18493439 Old housekeeper 1882-1897 http://nla.gov.au/nla.obj-579917051 nla.obj-579917051 NaN NaN 148 Book NaN NaN NaN True australian-plain-cookery-by-a-practical-cook-r...
7688 The Armidale Red Cross cookery book of tested ... https://trove.nla.gov.au/work/20631441 Australian Red Cross Society. Armidale Branch 1920 https://nla.gov.au/nla.obj-52792201 nla.obj-52792201 English Out of Copyright|http://rightsstatements.org/v... 82 Book NaN NaN NaN True the-armidale-red-cross-cookery-book-of-tested-...
8173 The Kandy Koola cookery book and housewife's c... https://trove.nla.gov.au/work/21067450 Kandy Koola Tea 1898 https://nla.gov.au/nla.obj-2409723409 nla.obj-2409723409 English Out of Copyright|http://rightsstatements.org/v... 76 Book NaN NaN NaN True the-kandy-koola-cookery-book-and-housewife-s-c...
8491 The Hawkesbury and Shoalhaven calendar, cultur... https://trove.nla.gov.au/work/21309432 Woodhill & Co 1905 http://nla.gov.au/nla.obj-28658844 nla.obj-28658844 English Out of Copyright|http://rightsstatements.org/v... 200 Book NaN NaN NaN True the-hawkesbury-and-shoalhaven-calendar-cultura...
9457 Hebrew cookery / by an Australian https://trove.nla.gov.au/work/22242397 Australian 1867 http://nla.gov.au/nla.obj-52864954 nla.obj-52864954 English No known copyright restrictions|http://rightss... 25 Book NaN NaN NaN True hebrew-cookery-by-an-australian-nla.obj-528649...
9472 Recipes given by Mrs. Wicken at cookery class,... https://trove.nla.gov.au/work/22249810 Wicken, H 1888 http://nla.gov.au/nla.obj-533356312 nla.obj-533356312 English Out of Copyright|http://rightsstatements.org/v... 16 Book NaN NaN NaN True recipes-given-by-mrs-wicken-at-cookery-class-w...
13145 Southland Red Cross cookery book, 1916 https://trove.nla.gov.au/work/237279068 NaN 1916 https://nla.gov.au/nla.obj-49498371 nla.obj-49498371 English Out of Copyright|http://rightsstatements.org/v... 187 Book NaN NaN NaN True southland-red-cross-cookery-book-1916-nla.obj-...
19740 Barossa cookery book : 400 tried recipes https://trove.nla.gov.au/work/237367083 NaN 1917 https://nla.gov.au/nla.obj-497806529 nla.obj-497806529 English No known copyright restrictions|http://rightss... 60 Book NaN NaN NaN True barossa-cookery-book-400-tried-recipes-nla.obj...
19823 Australian plain cookery / by a practical cook https://trove.nla.gov.au/work/237367586 NaN 1882 https://nla.gov.au/nla.obj-579917051 nla.obj-579917051 English Out of Copyright|http://rightsstatements.org/v... 148 Book NaN NaN NaN True australian-plain-cookery-by-a-practical-cook-n...
22262 The Australian women's weekly cookery book : p... https://trove.nla.gov.au/work/237539542 NaN 1948 https://nla.gov.au/nla.obj-2122602128 nla.obj-2122602128 English No known copyright restrictions|http://rightss... 68 Book NaN NaN NaN True the-australian-women-s-weekly-cookery-book-pri...
29983 The Banner cookery book : over 300 tested recipes https://trove.nla.gov.au/work/24494136 Dimboola Bush Nursing Hospital 1953 https://nla.gov.au/nla.obj-43445961 nla.obj-43445961 English Out of Copyright|http://rightsstatements.org/v... 48 Book NaN NaN NaN True the-banner-cookery-book-over-300-tested-recipe...
30410 The War chest cookery book https://trove.nla.gov.au/work/26653596 Citizens' War Chest Fund (N.S.W.) 1917 https://nla.gov.au/nla.obj-37545603 nla.obj-37545603 English Out of Copyright|http://rightsstatements.org/v... 156 Book NaN NaN NaN True the-war-chest-cookery-book-nla.obj-37545603.txt
30637 Southland Red Cross cookery book, 1916 https://trove.nla.gov.au/work/26863907 NaN 1916 http://nla.gov.au/nla.obj-49498371 nla.obj-49498371 NaN NaN 187 Book NaN NaN NaN True southland-red-cross-cookery-book-1916-nla.obj-...
32264 Flinders Island : souvenir : cookery book https://trove.nla.gov.au/work/35649557 Country Women's Association in Tasmania. Flind... 1946 https://nla.gov.au/nla.obj-2531663107 nla.obj-2531663107 English No known copyright restrictions|http://rightss... 84 Book NaN NaN NaN True flinders-island-souvenir-cookery-book-nla.obj-...
32955 Barossa cookery book : 400 tried recipes https://trove.nla.gov.au/work/6619781 Tanunda Australia Day Celebrations Committee (... 1917 http://nla.gov.au/nla.obj-497806529 nla.obj-497806529 NaN NaN 60 Book NaN NaN NaN True barossa-cookery-book-400-tried-recipes-nla.obj...
32963 "Caroona" cookery book : over 240 favourite re... https://trove.nla.gov.au/work/6663148 North Coast Methodist Homes for the Aged. Lism... 1900 http://nla.gov.au/nla.obj-52837739 nla.obj-52837739 English No known copyright restrictions|http://rightss... 54 Book NaN NaN NaN True caroona-cookery-book-over-240-favourite-recipe...

To use a different book as the source for our recipe generator, note its index value (the number at the start of the row) and use it to look up the full text_file name, which is truncated in the table. Like this:

In [13]:
df.loc[8173, 'text_file']
Out[13]:
'the-kandy-koola-cookery-book-and-housewife-s-compa-nla.obj-2409723409.txt'

Copy and paste the file name into the text_file value at the top of this notebook, and then re-run the cells.

How might we combine ingredients from all of these cook books?
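
One possible approach, sketched under assumptions rather than tested against the repository: download each text file, join the texts, and build a single TextBlob from the result. The function `combine_books` and its `fetch_text` argument are hypothetical names introduced here. Some books appear twice in the table above with the same `trove_id`, so the sketch drops repeated file names first.

```python
def combine_books(file_names, fetch_text):
    """Fetch each file with fetch_text (a caller-supplied function that takes
    a file name and returns its text) and join the results into one string."""
    # dict.fromkeys drops repeated file names while keeping their order
    unique_names = list(dict.fromkeys(file_names))
    return '\n'.join(fetch_text(name) for name in unique_names)


# Offline demonstration with a fake fetcher and made-up file names
fake_library = {
    'a.txt': 'BOILED EGGS. Boil for three minutes.',
    'b.txt': 'TOAST. Cut thin slices.',
}
combined = combine_books(['a.txt', 'b.txt', 'a.txt'], fake_library.get)
print(combined)
```

In the notebook, `fetch_text` could be something like `lambda name: requests.get(f'{CLOUDSTOR_URL}/download?files={name}').text`, and `TextBlob(combined)` would then replace the single-book blob. Bear in mind that not every cookbook sets its titles in uppercase, so the title detection may need adjusting for each book.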


Created by Tim Sherratt for the GLAM Workbench.