In this notebook we use TextBlob to extract nouns, verbs, and sentences from the OCRd text of a 19th-century cookery book. We clean things up a bit, using regular expressions to discard likely OCR errors. Then we recombine the various parts at random to create delicious recipes for all occasions. Enjoy!
Inspired by Australian Plain Cookery by a Practical Cook, 1882.
import requests
from textblob import TextBlob
import re
import random
import pandas as pd
from IPython.display import display, HTML
import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
# The Cloudstor URL links to the repository of OCRd text from Trove digitised books
CLOUDSTOR_URL = 'https://cloudstor.aarnet.edu.au/plus/s/ugiw3gdijSKaoTL'
# File name of the cookery book
text_file = 'australian-plain-cookery-by-a-practical-cook-nla.obj-579917051.txt'
First we procure a recipe book.
# Download the text of the book
response = requests.get(f'{CLOUDSTOR_URL}/download?files={text_file}')
Then we slice and dice the words to create a new TextBlob.
# Create a TextBlob using the text
blob = TextBlob(response.text)
Carefully we remove the nouns and the verbs, discarding any that are spoiled.
# Get the verbs, filtering out short words (3 letters or fewer) and those that include non-alphabetic characters.
# 'VBD' is the part-of-speech (POS) tag for a past tense verb
verbs = [w.title() for w, t in blob.tags if t == 'VBD' and len(w) > 3 and w.isalpha()]
# Get the nouns, filtering out short words (3 letters or fewer) and those that include non-alphabetic characters.
# 'NNP' is the POS tag for proper nouns; startswith('NNP') also picks up 'NNPS' (plural proper nouns)
nouns = [w.title() for w, t in blob.tags if t.startswith('NNP') and len(w) > 3 and w.isalpha()]
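To see what the two filters keep and discard, here is a minimal sketch that runs the same tests over a handful of hand-written (word, tag) pairs. The `sample_tags` list and the `select_words` helper are made up for illustration; they stand in for `blob.tags` and are not part of TextBlob.

```python
# Hypothetical stand-in for blob.tags: a list of (word, POS tag) pairs.
sample_tags = [
    ("boiled", "VBD"),
    ("the", "DT"),
    ("Mutton", "NNP"),
    ("cut", "VBD"),        # three letters, so dropped as too short
    ("Puddings", "NNPS"),
    ("st1rred", "VBD"),    # contains a digit, so dropped as a likely OCR error
    ("sauce", "NN"),       # common noun, not selected by either filter
]


def select_words(tags, tag_prefix, min_length=4):
    """Return title-cased alphabetic words whose tag starts with tag_prefix."""
    return [
        word.title()
        for word, tag in tags
        if tag.startswith(tag_prefix) and len(word) >= min_length and word.isalpha()
    ]


sample_verbs = select_words(sample_tags, "VBD")
sample_nouns = select_words(sample_tags, "NNP")
print(sample_verbs)
print(sample_nouns)
```

The helper uses `startswith` for both lists, whereas the cell above tests verbs with `== 'VBD'`; for this tag the two checks select the same words.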
Now it is necessary to prepare the sentences. First extract them from the blob. Discard any that seem ill-formed.
# Get the sentences from the blob
# Uses a regexp to exclude those that include anything other than standard letters, numbers, and punctuation.
sentences = [str(s).replace('\n', ' ') for s in blob.sentences if re.match(r'^[a-zA-Z\s\-,\.;0-9\'&\(\):]*$', str(s))]
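The effect of the whitelist is easier to see on a few examples. This sketch applies the same pattern to some invented sentences (the `sample_sentences` list is an assumption, not text taken from the book).

```python
import re

# Same character whitelist as the sentence filter above.
ALLOWED = re.compile(r"^[a-zA-Z\s\-,\.;0-9'&\(\):]*$")

# Invented examples, including a couple of typical OCR problems.
sample_sentences = [
    "Take two pounds of fat bacon, and a pound of beef suet.",
    "Boil the rnilk* for ten minutes.",                       # stray asterisk
    "Add ½ pint of cream.",                                   # character outside the whitelist
    "Edge and cover with short crust;\nornament the edges.",  # line break inside a sentence
]

kept = [s.replace("\n", " ") for s in sample_sentences if ALLOWED.match(s)]
for sentence in kept:
    print(sentence)
```

Only the first and last sentences survive, and the line break in the last one is replaced by a space. Because the whitelist has no question marks, exclamation marks, or double quotes, perfectly good sentences containing them are discarded as well.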
The sentences now need to be divided, to separate out the titles, which are recognised by their case.
# Titles in this cookbook are in uppercase, so we can separate them out from the rest of the sentences.
titles = [s for s in sentences if s.strip('.').isupper()]
sentences = [s for s in sentences if not s.strip('.').isupper()]
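The same split can be done in a single pass. The sketch below is one possible variation, using a hypothetical `split_titles` helper and invented sample sentences; it also shows that a string with no letters, such as a bare year, is never treated as a title because `isupper()` needs at least one cased character.

```python
def split_titles(all_sentences):
    """Partition sentences into (titles, body) using the uppercase test."""
    found_titles, body = [], []
    for sentence in all_sentences:
        if sentence.strip(".").isupper():
            found_titles.append(sentence)
        else:
            body.append(sentence)
    return found_titles, body


# Invented examples standing in for the filtered sentences.
sample = [
    "BOILED MUTTON.",
    "Take two pounds of fat bacon.",
    "APPLE PUDDING.",
    "1882.",
]

sample_titles, sample_body = split_titles(sample)
print(sample_titles)
print(sample_body)
```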
Now we are ready to start cooking!
def recipe_maker(num=5):
    html = ''
    # Get a random title
    title = random.choice(titles)
    html = f'<h4>{title}</h4>'
    html += '<h5>Ingredients:</h5>'
    html += '<ol>'
    for n in range(1, num + 1):
        # Make a random selection from the nouns & verbs
        html += f'<li>{random.choice(verbs)} {random.choice(nouns)}</li>'
    html += '</ol>'
    html += '<h5>Method:</h5>'
    # Get random sentences and combine
    html += f'<p>{" ".join(random.sample(sentences, num))}</p>'
    display(HTML(html))
recipe_maker(6)
In carving a large fish, as here engraved, cut thin slices, as from A to B, and help with it pieces of the belly, in the direction marked from C to D ; the best flavoured is the upper or thick part. Take two pounds of fat bacon, and a pound and a half of beef suet. When you put in the vegetables, cover all closely, and do not use for at least six weeks. Lay in cold water, and when it boils simmer for eight or ten minutes. Add one onion, three sage leaves, some whole pepper, and a little salt in three pints of water. Edge and cover with short crust, and ornament the edges.
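Every run produces a different recipe. If you want to serve the same dish twice, one option is a seeded variant that returns the HTML as a string instead of displaying it. This is only a sketch: `build_recipe_html` and its `seed` parameter are hypothetical names, and the short lists below are invented stand-ins for the ones built from the book.

```python
import random


def build_recipe_html(titles, verbs, nouns, sentences, num=5, seed=None):
    """Build a recipe as an HTML string; the same seed gives the same recipe."""
    rng = random.Random(seed)
    # random.sample raises ValueError if asked for more items than exist,
    # so cap the number of method sentences at what is available.
    num_sentences = min(num, len(sentences))
    parts = [f"<h4>{rng.choice(titles)}</h4>", "<h5>Ingredients:</h5>", "<ol>"]
    for _ in range(num):
        parts.append(f"<li>{rng.choice(verbs)} {rng.choice(nouns)}</li>")
    parts.append("</ol>")
    parts.append("<h5>Method:</h5>")
    parts.append(f"<p>{' '.join(rng.sample(sentences, num_sentences))}</p>")
    return "".join(parts)


# Invented stand-ins for the lists extracted from the cookery book.
demo_titles = ["APPLE PUDDING."]
demo_verbs = ["Boiled", "Stewed"]
demo_nouns = ["Mutton", "Onion"]
demo_sentences = ["Lay in cold water.", "Edge and cover with short crust."]

first = build_recipe_html(demo_titles, demo_verbs, demo_nouns, demo_sentences, num=4, seed=1)
second = build_recipe_html(demo_titles, demo_verbs, demo_nouns, demo_sentences, num=4, seed=1)
print(first == second)
```

In the notebook, the returned string could be passed to `display(HTML(...))` just as `recipe_maker` does.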
There's a full list of the POS (Part of Speech) tags here if you'd like to play with different combinations.
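Before trying other combinations it can help to see which tags actually occur and how often. A minimal sketch, again using invented (word, tag) pairs in place of `blob.tags`; the tags shown (`VB` for a base-form verb, `NN` and `NNS` for singular and plural common nouns) are from the Penn Treebank tag set.

```python
from collections import Counter

# Invented (word, tag) pairs standing in for blob.tags.
sample_tags = [
    ("Boil", "VB"),
    ("the", "DT"),
    ("mutton", "NN"),
    ("boiled", "VBD"),
    ("onions", "NNS"),
    ("stewed", "VBD"),
    ("the", "DT"),
]

# Count how many words carry each tag.
tag_counts = Counter(tag for _, tag in sample_tags)
for tag, count in sorted(tag_counts.items()):
    print(tag, count)

# Swapping in a different tag, e.g. common nouns instead of proper nouns.
common_nouns = [w.title() for w, t in sample_tags if t.startswith("NN") and w.isalpha()]
print(common_nouns)
```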
Perhaps we could add some more cookbooks? Let's load details of all the digitised books in Trove that include the word 'cookery' in the title.
df = pd.read_csv('https://raw.githubusercontent.com/GLAM-Workbench/trove-books/master/trove_digitised_books_with_ocr.csv')
df.loc[(df['title'].str.contains('cookery')) & (df['text_downloaded'] == True)]
index | title | url | contributors | date | fulltext_url | trove_id | language | rights | pages | form | volume | parent | children | text_downloaded | text_file |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1888 | The Kingswood cookery book / by H. F. Wicken | https://trove.nla.gov.au/work/12721516 | Wicken, H | 1885-1950 | https://nla.gov.au/nla.obj-43987239 | nla.obj-43987239 | English | Out of Copyright|http://rightsstatements.org/v... | 278 | Book | NaN | NaN | NaN | True | the-kingswood-cookery-book-by-h-f-wicken-nla.o... |
2582 | Electric cookery book : being an indispensable... | https://trove.nla.gov.au/work/16383834 | State Electricity Commission of Victoria | 1940-1949 | http://nla.gov.au/nla.obj-52836472 | nla.obj-52836472 | English | No known copyright restrictions|http://rightss... | 73 | Book | NaN | NaN | NaN | True | electric-cookery-book-being-an-indispensable-h... |
2654 | The English and Australian cookery book : cook... | https://trove.nla.gov.au/work/16551115 | Abbott, Edward, 1801-1869 | 1864-2014 | https://nla.gov.au/nla.obj-9562000 | nla.obj-9562000 | English | Out of Copyright|http://rightsstatements.org/v... | 356 | Book | NaN | NaN | NaN | True | the-english-and-australian-cookery-book-cooker... |
4431 | Australian plain cookery / by a Practical Cook... | https://trove.nla.gov.au/work/18493439 | Old housekeeper | 1882-1897 | http://nla.gov.au/nla.obj-579917051 | nla.obj-579917051 | NaN | NaN | 148 | Book | NaN | NaN | NaN | True | australian-plain-cookery-by-a-practical-cook-r... |
7688 | The Armidale Red Cross cookery book of tested ... | https://trove.nla.gov.au/work/20631441 | Australian Red Cross Society. Armidale Branch | 1920 | https://nla.gov.au/nla.obj-52792201 | nla.obj-52792201 | English | Out of Copyright|http://rightsstatements.org/v... | 82 | Book | NaN | NaN | NaN | True | the-armidale-red-cross-cookery-book-of-tested-... |
8173 | The Kandy Koola cookery book and housewife's c... | https://trove.nla.gov.au/work/21067450 | Kandy Koola Tea | 1898 | https://nla.gov.au/nla.obj-2409723409 | nla.obj-2409723409 | English | Out of Copyright|http://rightsstatements.org/v... | 76 | Book | NaN | NaN | NaN | True | the-kandy-koola-cookery-book-and-housewife-s-c... |
8491 | The Hawkesbury and Shoalhaven calendar, cultur... | https://trove.nla.gov.au/work/21309432 | Woodhill & Co | 1905 | http://nla.gov.au/nla.obj-28658844 | nla.obj-28658844 | English | Out of Copyright|http://rightsstatements.org/v... | 200 | Book | NaN | NaN | NaN | True | the-hawkesbury-and-shoalhaven-calendar-cultura... |
9457 | Hebrew cookery / by an Australian | https://trove.nla.gov.au/work/22242397 | Australian | 1867 | http://nla.gov.au/nla.obj-52864954 | nla.obj-52864954 | English | No known copyright restrictions|http://rightss... | 25 | Book | NaN | NaN | NaN | True | hebrew-cookery-by-an-australian-nla.obj-528649... |
9472 | Recipes given by Mrs. Wicken at cookery class,... | https://trove.nla.gov.au/work/22249810 | Wicken, H | 1888 | http://nla.gov.au/nla.obj-533356312 | nla.obj-533356312 | English | Out of Copyright|http://rightsstatements.org/v... | 16 | Book | NaN | NaN | NaN | True | recipes-given-by-mrs-wicken-at-cookery-class-w... |
13145 | Southland Red Cross cookery book, 1916 | https://trove.nla.gov.au/work/237279068 | NaN | 1916 | https://nla.gov.au/nla.obj-49498371 | nla.obj-49498371 | English | Out of Copyright|http://rightsstatements.org/v... | 187 | Book | NaN | NaN | NaN | True | southland-red-cross-cookery-book-1916-nla.obj-... |
19740 | Barossa cookery book : 400 tried recipes | https://trove.nla.gov.au/work/237367083 | NaN | 1917 | https://nla.gov.au/nla.obj-497806529 | nla.obj-497806529 | English | No known copyright restrictions|http://rightss... | 60 | Book | NaN | NaN | NaN | True | barossa-cookery-book-400-tried-recipes-nla.obj... |
19823 | Australian plain cookery / by a practical cook | https://trove.nla.gov.au/work/237367586 | NaN | 1882 | https://nla.gov.au/nla.obj-579917051 | nla.obj-579917051 | English | Out of Copyright|http://rightsstatements.org/v... | 148 | Book | NaN | NaN | NaN | True | australian-plain-cookery-by-a-practical-cook-n... |
22262 | The Australian women's weekly cookery book : p... | https://trove.nla.gov.au/work/237539542 | NaN | 1948 | https://nla.gov.au/nla.obj-2122602128 | nla.obj-2122602128 | English | No known copyright restrictions|http://rightss... | 68 | Book | NaN | NaN | NaN | True | the-australian-women-s-weekly-cookery-book-pri... |
29983 | The Banner cookery book : over 300 tested recipes | https://trove.nla.gov.au/work/24494136 | Dimboola Bush Nursing Hospital | 1953 | https://nla.gov.au/nla.obj-43445961 | nla.obj-43445961 | English | Out of Copyright|http://rightsstatements.org/v... | 48 | Book | NaN | NaN | NaN | True | the-banner-cookery-book-over-300-tested-recipe... |
30410 | The War chest cookery book | https://trove.nla.gov.au/work/26653596 | Citizens' War Chest Fund (N.S.W.) | 1917 | https://nla.gov.au/nla.obj-37545603 | nla.obj-37545603 | English | Out of Copyright|http://rightsstatements.org/v... | 156 | Book | NaN | NaN | NaN | True | the-war-chest-cookery-book-nla.obj-37545603.txt |
30637 | Southland Red Cross cookery book, 1916 | https://trove.nla.gov.au/work/26863907 | NaN | 1916 | http://nla.gov.au/nla.obj-49498371 | nla.obj-49498371 | NaN | NaN | 187 | Book | NaN | NaN | NaN | True | southland-red-cross-cookery-book-1916-nla.obj-... |
32264 | Flinders Island : souvenir : cookery book | https://trove.nla.gov.au/work/35649557 | Country Women's Association in Tasmania. Flind... | 1946 | https://nla.gov.au/nla.obj-2531663107 | nla.obj-2531663107 | English | No known copyright restrictions|http://rightss... | 84 | Book | NaN | NaN | NaN | True | flinders-island-souvenir-cookery-book-nla.obj-... |
32955 | Barossa cookery book : 400 tried recipes | https://trove.nla.gov.au/work/6619781 | Tanunda Australia Day Celebrations Committee (... | 1917 | http://nla.gov.au/nla.obj-497806529 | nla.obj-497806529 | NaN | NaN | 60 | Book | NaN | NaN | NaN | True | barossa-cookery-book-400-tried-recipes-nla.obj... |
32963 | "Caroona" cookery book : over 240 favourite re... | https://trove.nla.gov.au/work/6663148 | North Coast Methodist Homes for the Aged. Lism... | 1900 | http://nla.gov.au/nla.obj-52837739 | nla.obj-52837739 | English | No known copyright restrictions|http://rightss... | 54 | Book | NaN | NaN | NaN | True | caroona-cookery-book-over-240-favourite-recipe... |
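Some books appear twice in this list under different work records but with the same `trove_id` (the Barossa and Southland Red Cross books, for example). The sketch below shows one way the filter could be made case-insensitive, tolerant of missing titles, and free of duplicates. It runs on a tiny made-up DataFrame: the first three rows echo values from the table above, while the last two rows and their identifiers are invented for illustration.

```python
import pandas as pd

# Tiny stand-in for the full CSV; the last two rows are invented.
sample_df = pd.DataFrame({
    "title": [
        "Barossa cookery book : 400 tried recipes",
        "Barossa cookery book : 400 tried recipes",
        "Hebrew cookery / by an Australian",
        "An Invented Cookery Book",   # hypothetical, capitalised 'Cookery'
        None,                         # hypothetical record with no title
    ],
    "trove_id": [
        "nla.obj-497806529",
        "nla.obj-497806529",
        "nla.obj-52864954",
        "nla.obj-000000001",          # made-up identifier
        "nla.obj-000000002",          # made-up identifier
    ],
    "text_downloaded": [True, True, True, True, True],
})

# case=False matches 'Cookery' as well as 'cookery'; na=False treats missing titles as no match.
mask = sample_df["title"].str.contains("cookery", case=False, na=False) & sample_df["text_downloaded"]

# Keep one row per digitised object.
cookbooks = sample_df.loc[mask].drop_duplicates(subset="trove_id")
print(cookbooks[["title", "trove_id"]])
```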
To use a different one of these as the source for our recipe generator, just copy the index value, and then get the name of the text_file. Like this:
df.loc[8173]['text_file']
'the-kandy-koola-cookery-book-and-housewife-s-compa-nla.obj-2409723409.txt'
Copy and paste the file name into the text_file value at the top of this notebook, and then re-run the cells.
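If you would rather not edit the top of the notebook each time, the download address could be built by a small helper. This is a sketch only: `download_url` is a hypothetical function that reproduces the URL pattern used in the download cell, with the file name percent-encoded as a precaution.

```python
from urllib.parse import quote

CLOUDSTOR_URL = 'https://cloudstor.aarnet.edu.au/plus/s/ugiw3gdijSKaoTL'


def download_url(text_file):
    """Return the download URL for a text file in the CloudStor share."""
    return f"{CLOUDSTOR_URL}/download?files={quote(text_file)}"


url = download_url('the-kandy-koola-cookery-book-and-housewife-s-compa-nla.obj-2409723409.txt')
print(url)
```

The result can be passed to `requests.get` in place of the f-string used earlier.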
How might we combine ingredients from all of these cookbooks?
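One possible approach, sketched here under the assumption that the verbs, nouns, titles, and sentences for each book have already been extracted as above, is to pool the lists before picking at random. The `pool_parts` helper and the two tiny "books" are invented for illustration.

```python
def pool_parts(parts_by_book):
    """Merge the per-book lists of verbs, nouns, titles and sentences."""
    pooled = {"verbs": [], "nouns": [], "titles": [], "sentences": []}
    for parts in parts_by_book.values():
        for key in pooled:
            # A book may be missing a key, e.g. if no titles were found.
            pooled[key].extend(parts.get(key, []))
    return pooled


# Invented extracts from two hypothetical books.
parts_by_book = {
    "book-one.txt": {
        "verbs": ["Boiled"],
        "nouns": ["Mutton"],
        "titles": ["APPLE PUDDING."],
        "sentences": ["Lay in cold water."],
    },
    "book-two.txt": {
        "verbs": ["Stewed", "Baked"],
        "nouns": ["Onion"],
        "sentences": ["Edge and cover with short crust."],
    },
}

pooled = pool_parts(parts_by_book)
print({key: len(values) for key, values in pooled.items()})
```

Longer books contribute more words to the pool, so they will dominate the recipes unless each book's lists are sampled down to a similar size first.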
Created by Tim Sherratt for the GLAM Workbench.