#!/usr/bin/env python
# coding: utf-8

# # Working with Tracery in Python
# 
# By [Allison Parrish](http://www.decontextualize.com)
# 
# This tutorial shows you how to use [Tracery](http://tracery.io) in your Python programs. In particular, it shows a handful of useful patterns for incorporating large amounts of data into your Tracery grammars, which would be impractical or inconvenient to do with a Tracery generator on its own.
# 
# Tracery is an easy-to-use but powerful language and toolset for generating text from grammars, made by [Kate Compton](http://www.galaxykate.com/). If you're not familiar with how Tracery works, try [the official tutorial](http://www.crystalcodepalace.com/traceryTut.html) or [this tutorial I wrote](http://air.decontextualize.com/tracery/).
# 
# This tutorial is written for Python 3 and should also work on Python 2.7.

# ## Simple example
# 
# In order to generate text from a Tracery grammar in Python, you'll need to install the [Tracery Python module](https://pypi.python.org/pypi/tracery). It's easiest to do this with `pip` at the command line, like so:
# 
#     pip install tracery
# 
# (If you get a permissions error, try `pip install --user tracery`.)
# 
# Once you've installed the `tracery` module, try the following example program:

# In[1]:


import tracery
from tracery.modifiers import base_english

# put your grammar here as the value assigned to "rules"
rules = {
    "origin": "#hello.capitalize#, #location#!",
    "hello": ["hello", "greetings", "howdy", "hey"],
    "location": ["world", "solar system", "galaxy", "universe"]
}

grammar = tracery.Grammar(rules)     # create a grammar object from the rules
grammar.add_modifiers(base_english)  # add pre-programmed modifiers
print(grammar.flatten("#origin#"))   # and flatten, starting with the "origin" rule


# This program takes a Tracery grammar (in the form of a Python dictionary) and "flattens" it, printing its output to standard output. You can take a Tracery grammar you've written and paste it into this program as the value assigned to the variable `rules` (unless your grammar uses some aspect of JSON formatting that works differently in Python, like Unicode escapes). Run the program from the command line (or in the cell above, if you're viewing this in Jupyter Notebook) and you'll get a line of output from your grammar.

# ## Reading a Tracery grammar from a JSON file
# 
# You may already have a set of Tracery grammar files that you want to generate from, or you may not want to cut and paste a grammar into your Python script. If this is the case, no problem! You can use Python's `json` library to load a Tracery grammar from a JSON file. The program below shows how this works.
# 
# Included is a sample grammar called `test-grammar.json`. Let's have a look.

# In[2]:


get_ipython().system('cat test-grammar.json')


# Python's `json` module provides functions for reading JSON-formatted data into Python as Python data structures, and for exporting Python data structures to JSON format. The `.loads()` function from the module parses a string containing JSON-formatted data and returns the corresponding Python data structure (a dictionary or a list).
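# 
# To see `.loads()` on its own before using it with a file, here's a minimal sketch (the tiny grammar string below is just an illustrative stand-in, not the sample grammar above):
# 
#     import json
#     text = '{"origin": "#greeting#, world!", "greeting": ["hi", "hello"]}'
#     rules = json.loads(text)
#     print(rules["greeting"])  # a plain Python list: ['hi', 'hello']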
# In[3]:


import tracery
from tracery.modifiers import base_english
import json

# use json.loads() and open() to read in a JSON file as a Python data structure
rules = json.loads(open("test-grammar.json").read())

grammar = tracery.Grammar(rules)
grammar.add_modifiers(base_english)

# print ten random outputs
for i in range(10):
    print(grammar.flatten("#origin#"))


# The above example uses a `for` loop to call the `.flatten()` method multiple times.

# ## Using external data in Tracery rules
# 
# An interesting affordance of using Tracery in Python is the ability to *fill in* parts of the grammar with external data. By "external" data, I mean data that isn't written directly into the grammar itself, but inserted into the grammar when your program runs. One reason to do this might be to make the output of your grammar dynamic, using (e.g.) data returned from a web API. Another reason is simply that large Tracery grammars can be difficult to edit and navigate, especially when you're working with rules that might have hundreds or thousands of possible replacements.
# 
# To demonstrate, let's start with the generator discussed in [my Tracery tutorial](http://air.decontextualize.com/tracery/), which generates takes on the "Dammit Jim, I'm a doctor, not a `OTHER PROFESSION`" snowclone/trope. Such a grammar might look like this:
# 
#     {
#         "origin": "#interjection#, #name#! I'm a #profession#, not a #profession#!",
#         "interjection": ["alas", "congratulations", "eureka", "fiddlesticks",
#             "good grief", "hallelujah", "oops", "rats", "thanks", "whoa", "yes"],
#         "name": ["Jim", "John", "Tom", "Steve", "Kevin", "Gary", "George", "Larry"],
#         "profession": [
#             "accountant",
#             "butcher",
#             "economist",
#             "forest fire prevention specialist",
#             "mathematician",
#             "proofreader",
#             "singer",
#             "teacher assistant",
#             "travel agent",
#             "welder"
#         ]
#     }
# 
# An immediately recognizable shortcoming of this grammar is that it doesn't have very many alternatives. If we want there to be more professions that, dammit Jim, I'm not, we need to type them into the grammar by hand. The selection of names is also woefully small.
# 
# It would be nice if we could *supplement* the grammar by adding rule expansions from existing databases. For example, [Corpora](https://github.com/dariusk/corpora/) has [a list of occupations](https://raw.githubusercontent.com/dariusk/corpora/master/data/humans/occupations.json) and [a list of first names](https://raw.githubusercontent.com/dariusk/corpora/master/data/humans/firstNames.json) that we could incorporate into our grammar. One way to do this would be simply to copy and paste the relevant part of each JSON file into the grammar. But we can also load the data directly *into* the grammar using Python.
# 
# The program below specifies a *partial* Tracery grammar in a Python dictionary assigned to the variable `rules`. The grammar is then augmented with data loaded from JSON files obtained from Corpora: using the `json` library, we load the Corpora Project JSON files, find the data we need, and then assign it to new rules in the grammar. To get the example to work, we'll need to download [firstNames.json](https://raw.githubusercontent.com/dariusk/corpora/master/data/humans/firstNames.json) and [occupations.json](https://raw.githubusercontent.com/dariusk/corpora/master/data/humans/occupations.json), so let's do that first using wget.
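# 
# If `wget` isn't available on your system, the same two files can be fetched in pure Python instead; here's a minimal sketch using the standard library's `urllib.request` (assuming Python 3):
# 
#     from urllib.request import urlretrieve
#     base = "https://raw.githubusercontent.com/dariusk/corpora/master/data/humans/"
#     for fname in ["firstNames.json", "occupations.json"]:
#         urlretrieve(base + fname, fname)  # download each file to the current directory
# 
# Either way, both JSON files end up in your working directory.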
# In[4]:


get_ipython().system('wget https://raw.githubusercontent.com/dariusk/corpora/master/data/humans/firstNames.json')
get_ipython().system('wget https://raw.githubusercontent.com/dariusk/corpora/master/data/humans/occupations.json')


# The key trick here is that when creating the grammar, we refer to rules that don't yet exist. Later in the code, we add those rules (and their associated expansions, from the Corpora Project JSON files) by assigning values to keys in the `rules` dictionary. We're essentially building the grammar up gradually over the course of the program, instead of writing it all at once.

# In[6]:


import tracery
from tracery.modifiers import base_english
import json

# the grammar refers to "name" and "profession" rules. we're not including them in the
# grammar here, but adding them later on (using corpora project data!)
rules = {
    "origin": "#interjection.capitalize#, #name#! I'm #profession.a#, not #profession.a#!",
    "interjection": ["alas", "congratulations", "eureka", "fiddlesticks",
        "good grief", "hallelujah", "oops", "rats", "thanks", "whoa", "yes"],
}

# load the JSON data from files downloaded from the corpora project
names_data = json.loads(open("firstNames.json").read())
occupation_data = json.loads(open("occupations.json").read())

# set the values for the "name" and "profession" rules with corpora data
rules["name"] = names_data["firstNames"]
rules["profession"] = occupation_data["occupations"]

# generate!
grammar = tracery.Grammar(rules)
grammar.add_modifiers(base_english)
print(grammar.flatten("#origin#"))


# > EXERCISE: Write a Tracery grammar that changes its output based on the current time of day. You'll need to use something like [`datetime`](https://docs.python.org/2.7/library/datetime.html#datetime.datetime.now) for this; after you've imported it, the expression `datetime.datetime.now().hour` evaluates to the current hour of the day.

# ## Doing a Tracery "mail merge" with CSV data
# 
# In [my CSV tutorial](https://gist.github.com/aparrish/f8e7eab47542678a39a39dddbca4ec2f), the final example shows how you might build sentences from data in a CSV file (in particular, a CSV exported from [this spreadsheet](https://docs.google.com/spreadsheets/d/1SmxsgSAcNqYahUXa9-XpecTg-fhy_Ko_-RMD3oT2Ukw/edit?usp=sharing) containing data about the dogs of NYC, originally from [here](https://project.wnyc.org/dogs-of-nyc/)). The method that example uses to construct sentences is suboptimal: there's a lot of just slamming strings together with the `+` operator, which makes it hard to build in variation. It would be nice if we could build a Tracery grammar for generating these sentences instead!
# 
# The following example does exactly this. As with the example in the previous section, it constructs a *partial* Tracery grammar and then adds rules with new information. The difference is that here we generate a sentence for each of many records, instead of loading in data once at the beginning. For each row in the CSV file, we create a fresh copy of the grammar, then add rule/expansion pairs with the relevant data from the row. Inside the `for` loop, we construct a new grammar object and print the "flattened" (i.e., expanded) text.

# In[9]:


import tracery
from tracery.modifiers import base_english
import json
import csv

# create the "template" grammar, which will be copied and augmented for each record of the CSV
rules = {
    "origin": [
        "#greeting.capitalize#! My name is #name#. #coatdesc.capitalize# and #breeddesc#. #homedesc#. #woof.capitalize#!",
        "#greeting.capitalize#! #woof.capitalize#! My name is #name# and #breeddesc#. #homedesc.capitalize# and #coatdesc#.",
        "#woof.capitalize#! #homedesc.capitalize# and they call me #name#. #breeddesc.capitalize# and #coatdesc#."
    ],
    "greeting": ["hi", "howdy", "hello", "hey"],
    "woof": ["woof", "arf", "bow-wow", "yip", "ruff-ruff"],
    "coatdesc": [
        "my coat is #color#",
        "I've got #color.a# coat",
        "you could say my coat is #color#"
    ],
    "breeddesc": [
        "I'm #breed.a#",
        "my breed is #breed#",
        "I'm the #superlative# #breed# you'll ever meet"
    ],
    "superlative": ["cutest", "strongest", "most playful", "friendliest",
        "cleverest", "most loyal"],
    "homedesc": [
        "I'm from #borough#",
        "I live in #borough#",
        "I love living in #borough#",
        "#borough# is where I call home"
    ]
}

# iterate over the first 100 rows in the CSV file
for row in list(csv.DictReader(open("dogs-of-nyc.csv")))[:100]:
    # copy the rules so we're not continuously overwriting values in the template
    rules_copy = dict(rules)
    # now assign new rule/expansion pairs with the data from the current row
    rules_copy["name"] = row["dog_name"]
    rules_copy["color"] = row["dominant_color"].lower()
    rules_copy["breed"] = row["breed"]
    # little bit of fluency clean-up: say "the Bronx," not just "Bronx"
    if row["borough"] == "Bronx":
        rules_copy["borough"] = "the " + row["borough"]
    else:
        rules_copy["borough"] = row["borough"]
    # now generate!
    grammar = tracery.Grammar(rules_copy)
    grammar.add_modifiers(base_english)
    print(grammar.flatten("#origin#"))


# As you can see, this technique allows us to combine the expressive strengths of Tracery-based text generation with Python's CSV parser to generate simple "stories" from spreadsheet data. Pretty neat!