What is this? This is an interactive jupyter notebook document.

Page down through it, following the instructions…

With what looks to be a permanent, long-run, partial move of the university online, the already important topic of “data science” seems likely to become even more foundational. Hence this first problem set tries to provide you with an introduction—to “data science”, and to the framework we will be using for problem sets, which we hope will make things much easier for you and for us…

When you are finished, satisfied, or stuck, print your notebook to pdf, & upload the pdf to us on the bCourses assignment webpage:

Please include, in the comment box at the bottom of this interactive notebook, whatever comments on this assignment you want us to know...

Problem Set 0.1.5. Python, Notebooks, & the Biggest-Picture Overview of Economic Growth

These computer programming problem set assignments are a required part of the course.

Collaborating on the problem sets is more than okay—it is encouraged! Seek help from a classmate or an instructor or a roommate or a passerby when you get stuck! (Explaining things is beneficial, too—the best way to solidify your knowledge of a subject is to explain it.)

But the work has to be your own: no cutting-&-pasting from others' problem sets, please! We want you to learn this stuff, and your fingers typing every keystroke is an important way of building muscle memory here.

In fact, we strongly recommend that as you work through this notebook, whenever you come to a "code" cell—something intended not only for you to read but also to direct the computer—the python interpreter—to do calculations, you (1) click on the code cell to bring it into your browser's focus; (2) click on the + button in the toolbar above to create a new code cell just below the one you are now in; and then (3) retype, line-by-line, the computer code in the cell (not the comment lines beginning with #s, but the code lines), while trying to figure out what each line of code is intended to tell the python interpreter to do. "Muscle"—in this case, fingertip—memory is an important but undervalued part of "active learning" here at Berkeley. In Germany they have a term for it: das Fingerspitzengefühl; it is the kind of understanding-through-the-fingertips that a true expert has.

For reference, you might find it useful to read chapter 3 of the Data 8 textbook: <http://www.inferentialthinking.com/chapters/03/programming-in-python.html>.

Chapters 1 <https://www.inferentialthinking.com/chapters/01/what-is-data-science.html> and 2 <https://www.inferentialthinking.com/chapters/02/causality-and-experiments.html> are worth skimming as well...

0. Why Are We Making You Do This?

First of all, we are doing this because our section leaders are overworked: teaching online takes more time and effort than teaching in person, and our section leaders were not overpaid before the 'Rona arrived on these shores. Taking the bulk of the work of grading calculation assignments off of their backs is a plus—and it appears that the best way for us to do that is to distribute a number of the course assignments to you in this form: the form of a Python-language "Jupyter notebook".

Second, we are doing this because learning Jupyter notebooks and Python may well turn out to be the intellectual equivalent for you of "eat your spinach": something that may seem unpleasant and unappetizing now, but that makes you stronger and more capable. In 1999 Guido van Rossum, the creator of the Python programming language, likened the ability to read, write, and use software you had built or modified yourself—to search and analyze collections of data and information—to ordinary literacy. Guido predicted that mass programming, if it could be attained, would produce increases in societal power and changes in societal organization of roughly the same magnitude as mass literacy has produced over the past several centuries.

Guido may be right, and he may be wrong. But what is clear is that your lives may be richer, and you may have more options, if the data science and basic programming intellectual tools become a useful part of your intellectual panoplies.

An analogy: Back in the medieval European university, people would learn the Trivium—the 'trivial' subjects of Grammar (how to write), Rhetoric (how to speak in public), and Logic (how to think coherently). Then they would learn the Quadrivium of Arithmetic, Geometry, Music/Harmony, and Astronomy/Astrology. And last they would learn the advanced and professional subjects: Law or Medicine or Theology, along with Physics, Metaphysics, and Moral Philosophy.

But a student would also learn two more things: (1) how to learn by reading—how to take a book and get something useful out of it, without requiring a direct hands-on face-to-face teacher; and (2) how to write a fine chancery hand, so that they could prepare their own documents for submission to secular courts or to religious bishops, or even just put them in a form that would be easily legible to any educated audience—back in those days before screens-and-attachments, before screens-and-printers, before typewriters, before printing.

The Data Science tools may well turn out to be, in the first half of the 2000s, the equivalent of a fine chancery hand, just as a facility with the document formats and commands of the Microsoft Office suite was the equivalent of a fine chancery hand at the end of the 1900s: practical, general skills that make you of immense value to most if not nearly all organizations. This—along with the ability to absorb useful knowledge without requiring hands-on, person-to-person, face-to-face training—will greatly boost your social power and your set of opportunities in your life.

If we are right about its value.

I know a number of Berkeley graduates who in 2009 kept their jobs—and now have very good careers—solely because they knew then how to make Microsoft Office get up and dance, while the people to their right and their left, in front of and behind them in the cubicle farm, did not, and so were let go when the U.S. unemployment rate spiked in the Great Recession. Literonumeracy in Microsoft Office was, 15 years ago, the then-equivalent of being able to write a fine chancery hand. We suspect the Data Science tools will be the counterpart for the next fifteen years.

Third, why Jupyter and Python, rather than RStudio and R, or C++ and Matlab? Because Jupyter Project founder Fernando Pérez has an office on the fourth floor of Evans. Because 40% of Berkeley undergraduates currently take Data 8, and so, taking account of other channels, more than half of Berkeley students are already going to graduate literonumerate in Python.

Let us get started!


1. What Jupyter Notebooks Are

This webpage is called a Jupyter notebook. A notebook is a place to write programs and view their results, and also to write text.

A notebook is thus an editable computer document in which you can write computer programs; view their results; and comment, annotate, and explain what is going on. Project Jupyter https://en.wikipedia.org/wiki/Project_Jupyter is headquartered here at Berkeley, where Jupyter originator and ringmaster Fernando Pérez https://en.wikipedia.org/wiki/Fernando_Pérez_(software_developer) works: its purpose is to build human-friendly frameworks for interactive computing. If you want to see what Fernando looks and sounds like, you can load and watch a 15-minute inspirational video by clicking on the "YouTubeVideo" code cell below and then on the ▶| run button in the toolbar above:

In [2]:
from IPython.display import YouTubeVideo
# The original URL is:
# https://www.youtube.com/watch?v=Wd6a3JIFH0s
YouTubeVideo('Wd6a3JIFH0s')



1.1. Text cells

In a notebook, each rectangle containing text or code is called a cell.

Text cells (like this one) can be edited by double-clicking on them. They're written in a simple format created by John Gruber called markdown <http://daringfireball.net/projects/markdown/syntax> to add formatting and section headings. You almost surely want to learn how to use markdown.

After you edit a text cell, click the "run cell" button that looks like ▶| in the toolbar at the top of this window, or hold down shift + press return, to confirm any changes to the text and formatting.

(Try not to delete the problem set instructions. If you do, then (a) rename your current notebook via the Rename command in the File menu so that you do not lose your work done so far, and then reenter the url http://datahub.berkeley.edu/user-redirect/interact?account=braddelong&repo=lecture-support-2020&branch=master&path=ps00.ipynb in the web address bar at the top of your browser to download a new, fresh copy of this problem set.)

This paragraph is in its own text cell. Try editing it so that this sentence is the last sentence in the paragraph, and then click the "run cell" ▶| button or hold down shift + return. This sentence, for example, should be deleted. So should this one.


1.2. Code cells

Other cells contain code in the Python 3 language. Running a code cell will execute all of the code it contains.

To run the code in a code cell, first click on that cell to activate it. It'll be highlighted with a little green or blue rectangle. Next, either press the ▶| run button in the toolbar or hold down shift + press return.

Try running this cell:

In [3]:
print("Hello, World!")
Hello, World!

And this one:

In [4]:
print("\N{WAVING HAND SIGN}, \N{EARTH GLOBE ASIA-AUSTRALIA}!")
👋, 🌏!

The fundamental building block of Python code is an expression. Cells can contain multiple lines with multiple expressions. When you run a cell, the lines of code are executed in the order in which they appear. Every print expression prints a line. Run the next cell and notice the order of the output.

In [5]:
print('First this line is printed,')
print('and then this one.')
First this line is printed,
and then this one.

Please change the cell above so that it prints out:

First this line,
then the whole 🌏,
and then this one.

Hint: If you're stuck on the Earth symbol for more than a few minutes, try talking to a classmate or to one of us. That's a good idea for any programming problem. There is a saying: 'given enough eyeballs, all bugs are shallow'. Computer programming seems to be a human intellectual discipline that, much more than others, benefits massively from having multiple eyes and brains looking at the problem.


1.3. Writing notebooks

You can use Jupyter notebooks for your own projects or documents. When you make your own notebook, you'll need to create your own cells for text and code.

To add a cell, click the + button in the menu bar. It'll start out as a text cell. You can change it to a code cell by clicking inside it so it's highlighted, clicking the drop-down box next to the restart (⟳) button in the menu bar, and choosing "Code".

Add a code cell below this one. Write code in it that prints out:

A whole new cell! ♪🌏♪

(That musical note symbol is like the Earth symbol. Its long-form name is \N{EIGHTH NOTE}.)

Run your cell to verify that it works.


1.4. "Errors"

Python is a language, and like natural human languages, it has rules. It differs from natural language in two important ways:

  1. The rules are simple. You can learn most of them in a few weeks and gain reasonable proficiency with the language in a semester.
  2. The rules are rigid. If you're proficient in a natural language, you can understand a non-proficient speaker, glossing over small mistakes. A computer running Python code is not smart enough to do that.

Whenever you write code, you'll make mistakes. When you run a code cell that has errors, Python will sometimes produce error messages to tell you what you did wrong.

Errors are okay; even experienced programmers make many errors. Oftentimes you will even want to make errors, because error messages are a way for the computer to communicate what its internal state is and what it thinks it needs from you in order to proceed. When you make an error, you just have to find the source of the problem, fix it, and move on. You cannot break the computer by making errors.

We have made an error in the next code cell. Run it and see what happens.

(Note: In the toolbar, there is the option to click Cell > Run All, which will run all the code cells in this notebook in order. However, the notebook stops running code cells if it hits an error, like the one in the cell just below.)

In [6]:
print("This line is missing something."

You should see something like this (minus our annotations):

The last line of the error output attempts to tell you what went wrong. The syntax of a language is its structure, and this SyntaxError tells you that you have created an illegal structure. "EOF" means "end of file," so the message is saying Python expected you to write something more (in this case, a right parenthesis) before finishing the cell.

There's a lot of terminology in programming languages, but you don't need to know it all in order to program effectively. If you see a cryptic message like this, you can often get by without deciphering it. (Of course, if you're frustrated, ask a neighbor or a staff member for help.)

Try to fix the code above so that you can run the cell and see the intended message instead of an error.


1.5. The 'Kernel'

The kernel is the part of the Jupyter Notebook environment that actually reads the Python code in the code cells when they are run, executes it, and outputs the results.

In the top right of your window, you can see a circle that indicates the status of your kernel. If the circle is empty (⚪), the kernel is idle and ready to execute code. If the circle is filled in (⚫), the kernel is busy running some code.

Next to every code cell, you'll see some text that says In [...]. Before you run the cell, you'll see In [ ]. When the cell is running, you'll see In [*]. If you see an asterisk (*) next to a cell that doesn't go away, it's likely that the code inside the cell is taking too long to run, and it might be a good time to interrupt the kernel (discussed below). When a cell is finished running, you'll see a number inside the brackets, like so: In [1]. The number corresponds to the order in which you run the cells; so, the first cell you run will show a 1 when it's finished running, the second will show a 2, and so on.

You may run into problems where your Kernel is stuck for an excessive amount of time, your notebook is very slow and unresponsive, or your Kernel loses its connection. It will look as though the computer is working fine—and it will be, all except the Python Kernel module, which will have wedged itself.

If this happens, try the following steps:

  1. At the top of your screen, click Kernel, then Interrupt.
  2. If that doesn't help, click Kernel, then Restart. If you do this, you will have to run your code cells from the start of your notebook up until where you paused your work.
  3. If that doesn't help, restart your server. First, save your work by clicking File at the top left of your screen, then Save and Checkpoint. Next, click Control Panel at the top right. Choose Stop My Server to shut it down, then Start My Server to start it back up. Then, navigate back to the notebook you were working on. You will still have to run your code cells again.

In fact, it is good programming practice every fifteen minutes or so to go up to the top of this window, to the Kernel menu, and run the menu command: 'Restart Kernel and Run All Cells...' Otherwise, the Kernel will still have in its memory the results of stray computations and false starts. Those can create bugs in your notebook that are very hard to understand, and thus very hard to fix.


1.6. Libraries

There are many add-ons and extensions to the core of python that are useful—indeed essential—to using it to get work done. They are contained in what are called libraries. The rest of this notebook needs three libraries. So let us tell the python interpreter to install them.

Run the code cell below to do so:

In [7]:
# install the numerical python, python data analysis, and mathematical
# plotting libraries for python

!pip install numpy
!pip install pandas
!pip install matplotlib

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
Requirement already satisfied: numpy in /Users/braddelong/opt/anaconda3/lib/python3.7/site-packages (1.19.2)
Requirement already satisfied: pandas in /Users/braddelong/opt/anaconda3/lib/python3.7/site-packages (1.1.3)
Requirement already satisfied: numpy>=1.15.4 in /Users/braddelong/opt/anaconda3/lib/python3.7/site-packages (from pandas) (1.19.2)
Requirement already satisfied: python-dateutil>=2.7.3 in /Users/braddelong/opt/anaconda3/lib/python3.7/site-packages (from pandas) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /Users/braddelong/opt/anaconda3/lib/python3.7/site-packages (from pandas) (2020.4)
Requirement already satisfied: six>=1.5 in /Users/braddelong/opt/anaconda3/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)
Requirement already satisfied: matplotlib in /Users/braddelong/opt/anaconda3/lib/python3.7/site-packages (3.3.2)
Requirement already satisfied: cycler>=0.10 in /Users/braddelong/opt/anaconda3/lib/python3.7/site-packages (from matplotlib) (0.10.0)
Requirement already satisfied: numpy>=1.15 in /Users/braddelong/opt/anaconda3/lib/python3.7/site-packages (from matplotlib) (1.19.2)
Requirement already satisfied: pillow>=6.2.0 in /Users/braddelong/opt/anaconda3/lib/python3.7/site-packages (from matplotlib) (8.0.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /Users/braddelong/opt/anaconda3/lib/python3.7/site-packages (from matplotlib) (1.3.0)
Requirement already satisfied: certifi>=2020.06.20 in /Users/braddelong/opt/anaconda3/lib/python3.7/site-packages (from matplotlib) (2020.12.5)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /Users/braddelong/opt/anaconda3/lib/python3.7/site-packages (from matplotlib) (2.4.7)
Requirement already satisfied: python-dateutil>=2.1 in /Users/braddelong/opt/anaconda3/lib/python3.7/site-packages (from matplotlib) (2.8.1)
Requirement already satisfied: six in /Users/braddelong/opt/anaconda3/lib/python3.7/site-packages (from cycler>=0.10->matplotlib) (1.15.0)
In [ ]:
# set up the computing environment: ensure that graphs appear inline in the notebook & not in extra windows:

%matplotlib inline


1.7. Submitting your work

All problem sets in the course will be distributed as notebooks like this one. When you finish an assignment, you need to submit it by (a) printing your notebook to .pdf format, and (b) uploading your .pdf to the problem set assignment page. It's fine to submit multiple times.

Don't forget to submit your problem set, even if you haven't finished everything!


2. The Human Biggest-Picture Economic-Growth Overview Table

Now that you have started your notebook and loaded in the standard libraries, we want you to begin working. We want you to load in a table of numbers from the 'longest_run_growth_summary.csv' file (located in the same directory as this Jupyter Notebook file). And we want you to give this table of numbers the name 'longest_run_growth_summary_df'. 'longest_run_growth_summary' to remind you of what the numbers in the table are. '_df' to remind you that the computer represents the table to itself in a format called a Pandas Dataframe. The fact that we have the computer do this is going to save you a lot of typing, and a lot of wrestling with picky and hard-to-remember details.


2.1. The "Levels" Dataframe

You do this by running the next code cell:

In [8]:
longest_run_growth_summary_df = pd.read_csv('longest_run_growth_summary.csv')

And then it is good programming practice to immediately check that the computer did what you expected when you asked it to run the code cell. It may well do something different—and it may well not issue an error message when it does something different:

In [9]:
longest_run_growth_summary_df
row year human_ideas_index income_level population
0 0 -68000 0.03 1200.0 0.1
1 1 -8000 0.16 1200.0 2.5
2 2 -6000 0.20 900.0 7.0
3 3 -3000 0.30 900.0 15.0
4 4 -1000 0.54 900.0 50.0
5 5 1 1.00 900.0 170.0
6 6 1500 1.71 900.0 500.0
7 7 1770 2.57 1100.0 750.0
8 8 1870 3.99 1300.0 1300.0
9 9 2020 87.98 11842.0 7600.0

As you see, if you put an '=' sign into a code cell, Python will assign the value of the expression on the right side of the equals sign to the name on the left side. ('=' is probably not the best symbol to use for this, since it is really a request to assign a value. 'Y <- X' or 'Let Y = X' would probably be better conventions. But we are stuck doing what the programming community has standardized on.)

But if you just write an expression, without an equals sign, then Python thinks you want it to print out the value of that expression on the line below.
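A minimal illustration of the two behaviors (the name 'x' here is just for illustration, not part of the problem set's data):

```python
x = 2 + 3   # '=' assigns: the value 5 is stored under the name x; nothing is displayed
x           # a bare expression on the last line of a cell: its value is echoed below the cell
```

(In a plain Python script outside a notebook, that last line would display nothing; the echoing of a bare final expression is a notebook convenience.)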

To complicate things further, what do you suppose a double equals sign '==' does?

In [10]:
longest_run_growth_summary_df == pd.read_csv('longest_run_growth_summary.csv')
row year human_ideas_index income_level population
0 True True True True True
1 True True True True True
2 True True True True True
3 True True True True True
4 True True True True True
5 True True True True True
6 True True True True True
7 True True True True True
8 True True True True True
9 True True True True True

A '==' examines the elements of the expression on the left, compares them to the corresponding elements of the expression on the right, and then takes on the values 'True' and 'False' depending on whether they are in fact equal. Since each element of 'longest_run_growth_summary_df' is identical to the corresponding element of 'pd.read_csv('longest_run_growth_summary.csv')', the result is a table of 'Trues'.
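The '='/'==' distinction may be easiest to see with plain numbers rather than whole dataframes:

```python
print(4 == 4)   # → True: '==' asks "are these equal?"; it assigns nothing
print(4 == 5)   # → False
```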

And if we want the computer to print an expression below a code cell whether or not the expression is on the last line of the cell, we explicitly use 'print()':

In [11]:
print(5)
print('The computer just printed "5" on the row above')
5
The computer just printed "5" on the row above

Suppose we wanted to tell the computer that you want it to focus on the population column. How do you do that? Well, run the next code cell:

In [12]:
longest_run_growth_summary_df['population']
0       0.1
1       2.5
2       7.0
3      15.0
4      50.0
5     170.0
6     500.0
7     750.0
8    1300.0
9    7600.0
Name: population, dtype: float64

And suppose you wanted to tell the computer to focus on the 7th element of the 'population' column (yes, I know, I know; the 7th element is actually the eighth; this is actually Alan Turing's fault). Well, run the next code cell:

In [13]:
longest_run_growth_summary_df['population'][7]
750.0

Now let us look at our table:

In [14]:
longest_run_growth_summary_df
   row   year  human_ideas_index  income_level  population
0    0 -68000               0.03        1200.0         0.1
1    1  -8000               0.16        1200.0         2.5
2    2  -6000               0.20         900.0         7.0
3    3  -3000               0.30         900.0        15.0
4    4  -1000               0.54         900.0        50.0
5    5      1               1.00         900.0       170.0
6    6   1500               1.71         900.0       500.0
7    7   1770               2.57        1100.0       750.0
8    8   1870               3.99        1300.0      1300.0
9    9   2020              87.98       11842.0      7600.0

The 'year' column of the table tells us which year according to the common calendar that row of the data table corresponds to. The 'row' column tells us the row number of the table. The 'population' column of the table gives our guesses as to the human population in that year in millions.

The 'income_level' column of the table gives our guesses as to the average human living standard per capita in that year in 2020 'international dollars'—that is, taking account of the fact that since prices are generally lower in poor countries a given nominal income buys a higher standard of living there than in rich countries. The numbers from the year 1770 on are real global average estimates.

The numbers for the years from -6000 to 1500 are our guess that back in the Agrarian Age the average human standard of living was at a desperately-poor level that modern development economists would classify at perhaps \$2.50/day. That had to be the case: given that a nutritionally-unstressed pre-artificial birth-control human population triples in fifty years or so, average population growth would have been much, much faster in the Agrarian Age if humanity had then been any richer.

The numbers for the years -68000 and -8000 are our guess that back in the Gatherer-Hunter Age standards of living were somewhat higher: about \$3.50/day. They had to be: the life of a gatherer-hunter was more strenuous, and if people had not been substantially better-nourished than in the Agrarian Age, they would have failed to reproduce themselves.
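A quick arithmetic check that these dollars-per-day classifications line up with the 'income_level' column, which is measured in international dollars per year:

```python
print(2.50 * 365)   # → 912.5: close to the Agrarian-Age income_level of 900
print(3.50 * 365)   # → 1277.5: close to the Gatherer-Hunter-Age income_level of 1200
```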

The 'human_ideas_index' column provides our guesses as to the value of the stock of useful human ideas about technology and organization discovered, invented, and deployed around the globe. It is—arbitrarily—set equal to 1 for the year 1 of the common world calendar. That is a calibration: we have to set the index equal to one at some date. We assume that every 1% increase in average standards of living, with population held constant, is associated with an equiproportional 1% increase in the value of the ideas index. That is also a calibration: it is easier to interpret the index as a value if it scales with living standards and productivity levels. And we assume that every 1% increase in human population, with average standards of living held constant, is associated with a 0.5% increase in the value of the ideas index. That is a guess, or more politely a judgment: a higher human population means smaller average farm sizes and less abundant raw materials available for the typical craftworker; thus human living standards would be lower unless greater natural-resource scarcity were counterbalanced by better ideas about how to produce useful things. The judgment—and it is contestable—is that over the millennia and on average resources have been only half as salient as ideas in determining the economic efficiency of human labor.
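Those two calibrations together mean the ideas index is proportional to income per capita times the square root of population, normalized to equal 1 in year 1. A minimal sketch of the check, with the year-1 and year-2020 numbers retyped from the levels table above:

```python
# ideas index ∝ income_level × population^0.5, normalized so that year 1 = 1
income_1, pop_1 = 900.0, 170.0            # year 1 values from the table
income_2020, pop_2020 = 11842.0, 7600.0   # year 2020 values from the table

ideas_2020 = (income_2020 * pop_2020**0.5) / (income_1 * pop_1**0.5)
print(round(ideas_2020, 2))   # → 87.98, matching the table's human_ideas_index
```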


2.2. The "Growth Rates" Dataframe

Of interest to us are not only the levels of human population, of average income per capita, and of the value of the ideas stock over time. Of interest to us also are the growth rates of these values. So let us have Python calculate those growth rates and stuff them into another dataframe.

Our growth-rate dataframe is going to have six columns:

  1. the 'initial_year' of the period to which the calculated growth rate applies, which will serve as the index of the rows of the dataframe
  2. the 'span' of the period—its length in years
  3. 'h', the proportional rate of growth of the value of the useful-ideas stock
  4. 'g', the growth rate of income per capita and of the efficiency of labor
  5. 'n', the growth rate of the population and the labor force
  6. 'year', the same as 'initial_year', because sometimes Python will throw an error if you try to do arithmetic calculations with elements of a column that is a dataframe's index. (No, I do not know why.)

First we set up four empty lists to hold the results of our calculations:

In [15]:
span = []
g = []
h = []
n = []

Let me digress a little to tell you about lists—things bookended by brackets—in Python:

Lists—and their siblings, numpy arrays—are ordered collections of objects. Lists allow us to store groups of values under one name, and the ordering then lets us retrieve individual objects for easy access and analysis. If you want an in-depth look at the capabilities of lists, take a look at <https://www.tutorialspoint.com/python/python_lists.htm>

To initialize a list, you use brackets. Putting objects separated by commas in between the brackets will add them to the list. For example, we can create and name an empty list:

In [16]:
list_example = []

We can add an object to the end of a list:

In [17]:
list_example = list_example + [5]

Now we have a one-element list. And we can add another element:

In [18]:
list_example = list_example + [10]

to make a two-element list—note that what the '+' does here may not be what you expected.

We can join—"concatenate"—two lists together:

In [19]:
list_example_two = list_example + [1, 3, 6, 'lists', 'are', 'fun', 4]
list_example_two
[5, 10, 1, 3, 6, 'lists', 'are', 'fun', 4]

It is, I think, a mistake for Python to use '+' in this way. In arithmetic, '+' is simply addition. With lists, '+' smashes the two lists on either side together to make a bigger list. This 'overloading' of the '+' operator can be a source of great confusion. For example:

In [20]:
four = 4
print("this '4' is a number:", four, "; so '+' is addition and so", four, "+", four, "=", four + four)

four = [4]
print("this '4' is a list:", four, "; so '+' is list concatenation and so", four, "+", four, "=", four + four)

four = '4'
print("even worse is: this '4' is a string-of-symbols:", four, "; so '+' is symbol concatenation and so", four, "+", four, "=", four + four)
this '4' is a number: 4 ; so '+' is addition and so 4 + 4 = 8
this '4' is a list: [4] ; so '+' is list concatenation and so [4] + [4] = [4, 4]
even worse is: this '4' is a string-of-symbols: 4 ; so '+' is symbol concatenation and so 4 + 4 = 44

which gives you no clue in the output as to why the result is different at all.

To access not the list as a whole but an individual value in the list, simply count from the start of the list, and put the place of the object you want to access in brackets after the name of the list. But you have to start counting from not one but zero. Thus the initial object of a list has index 0, the second object of a list has index 1, and in the list above the eighth object has index 7:
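For example, using the 'list_example_two' list from above (retyped here so this cell stands on its own):

```python
list_example_two = [5, 10, 1, 3, 6, 'lists', 'are', 'fun', 4]
print(list_example_two[0])   # → 5: the initial object has index 0
print(list_example_two[7])   # → fun: the eighth object has index 7
```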

Thus ends the digression.

Now we return to building our growth-rates dataframe. Next, we are going to use a loop <https://www.tutorialspoint.com/python/python_loops.htm> to march through the rows of our dataframe table-to-be. A loop is a construct that executes a related series of calculations over and over again. For example, here is a loop that adds 1 to a number 10 times:

In [21]:
x = 0

for i in range(10):
    x = x+1
    print(x)

And here is a loop that, instead of printing out the number each time through, adds them to a list:

In [22]:
our_list = []
x = 0

for i in range(10):
    x = x+1
    our_list = our_list + [x]


The code cell above should... disturb you. You should understand the first line: assign the value of an empty list to the variable named 'our_list'. You should understand the second line: assign the value of 0 to the variable named 'x'. You should understand the third line: loop through the following instruction lines ten times. And you should understand the fourth line: update the value of our variable 'x' by adding one to it.

But then comes the fifth line: 'our_list = our_list + [x]'. What is going on here? Unless you paid sharp attention up above, you may well not have a clue.

What is going on here is that Python sharply distinguishes between numbers and lists. When a '+' sign is between two numbers, Python thinks '+' means 'add them'. When a '+' sign is beween two lists, Python thinks '+' means 'make a big list by joining the two lists together'—concatenating them. So 'our_list = our_list + [x]' means 'update the list "our_list" by adding on to the end of it the list "[x]"'. What is the list '[x]'? It is the one-element list whose element is the variable 'x'.
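An aside, not needed for this problem set: Python lists also have an append method that does the same job as '+ [x]' while sidestepping the overloaded-'+' ambiguity:

```python
our_list = []
for x in range(1, 11):
    our_list.append(x)   # same effect as: our_list = our_list + [x]
print(our_list)          # → [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```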

What do you think Python would do with 'our_list = our_list + x'? Let us try it and see:

In [23]:
our_list = our_list + x

Python throws out an error message. It is saying: '"our_list" is a list, and so that tells me that the "+" means "make a bigger list", but "x" is a number, and so that tells me that "+" means add things up, so I do not know what to do!'

This has just been a—hopefully painless—lesson on loops and data types.

And with that behind us, here is our loop to calculate our growth rates:

In [24]:
for t in range(9):
    span = span + [longest_run_growth_summary_df['year'][t+1]-longest_run_growth_summary_df['year'][t]]
    h = h + [np.log(longest_run_growth_summary_df['human_ideas_index'][t+1]/longest_run_growth_summary_df['human_ideas_index'][t])/span[t]]
    g = g + [np.log(longest_run_growth_summary_df['income_level'][t+1]/longest_run_growth_summary_df['income_level'][t])/span[t]]
    n = n + [np.log(longest_run_growth_summary_df['population'][t+1]/longest_run_growth_summary_df['population'][t])/span[t]]

In order to calculate a growth rate, we take the end value, divide it by the beginning value, take the natural log of the quotient, and then divide that by the number of years from beginning to end. The loop does that calculation for each of the columns—population, income, and ideas value—and for each row—and adds each number as it is calculated to the appropriate list.
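For example, take one pair of rows from the table: the population grew from 2.5 million in year -8000 to 7.0 million in year -6000, a span of 2000 years. (The variable names here are made up for illustration; they are not the notebook's own.)

```python
import numpy as np

# population (in millions) from the table: 2.5 in year -8000, 7.0 in year -6000
begin_value, end_value, years = 2.5, 7.0, 2000

# end value over beginning value, natural log, divided by the span in years:
growth_rate = np.log(end_value/begin_value)/years
print(round(float(growth_rate), 6))   # 0.000515, matching the 'n' column's -8000 row
```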

Now I want to check your comprehension. So, in the next code cell, between the single quote marks, type your understanding of what each component of the terms in the "h = h..." line of code in the above code cell is doing, and then execute the code cell:

In [25]:
loop_line_explanation = '...'

Now we continue building the growth-rates dataframe. First we need to assemble our lists into an array, then transpose it so that our lists become columns instead of rows, and last stuff the numbers into the long_run_growth_rates_df dataframe:

In [26]:
data_list = np.array([span, h, g, n]).transpose()

long_run_growth_rates_df = pd.DataFrame(
    data=data_list, columns=['span', 'h', 'g', 'n'])

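To see what .transpose() is doing in the cell above, here is a toy sketch with made-up numbers: two row-lists become two columns:

```python
import numpy as np
import pandas as pd

rows = np.array([[1, 2, 3],     # first list: one row of three numbers
                 [4, 5, 6]])    # second list: another row of three numbers
cols = rows.transpose()         # now each original list is a column
print(cols.shape)               # (3, 2): three rows, two columns

toy_df = pd.DataFrame(data=cols, columns=['a', 'b'])
print(toy_df)                   # column 'a' is [1, 2, 3]; column 'b' is [4, 5, 6]
```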
Then we add the starting year and index columns:

In [27]:
long_run_growth_rates_df['year'] = longest_run_growth_summary_df['year'].apply(np.int64)

index_year = longest_run_growth_summary_df['year']

long_run_growth_rates_df['index_year'] = index_year
long_run_growth_rates_df.set_index('index_year', inplace=True)

In [28]:
# now check to see that the long_run_growth_rates_df
# is in fact what you expected and wanted it to be:

span h g n year
-68000 60000.0 0.000028 0.000000 0.000054 -68000
-8000 2000.0 0.000112 -0.000144 0.000515 -8000
-6000 3000.0 0.000135 0.000000 0.000254 -6000
-3000 2000.0 0.000294 0.000000 0.000602 -3000
-1000 1001.0 0.000616 0.000000 0.001223 -1000
1 1499.0 0.000358 0.000000 0.000720 1
1500 270.0 0.001509 0.000743 0.001502 1500
1770 100.0 0.004399 0.001671 0.005500 1770
1870 150.0 0.020622 0.014729 0.011772 1870

And we are done.


2.3. Print Formatting

As we know, we can simply print a dataframe by putting its name by itself on the last line of a code cell. But also note that the printing is not very pretty.

We can make the printing prettier by defining a format dictionary. To do so, we construct an object called format_dict (or whatever other name we choose), and we then feed that object to the dataframe: we tell the dataframe to evaluate itself using its .style attribute, and to use .style's .format() method to understand what the format_dict object is asking it to do:

In [29]:
format_dict = {'year': '{0:.0f}', 'human_ideas_index': '{0:,.2f}', 
    'income_level': '${0:,.0f}', 'population': '{0:,.1f}'}

longest_run_growth_summary_df.style.format(format_dict)

row year human_ideas_index income_level population
0 0 -68000 0.03 $1,200 0.1
1 1 -8000 0.16 $1,200 2.5
2 2 -6000 0.20 $900 7.0
3 3 -3000 0.30 $900 15.0
4 4 -1000 0.54 $900 50.0
5 5 1 1.00 $900 170.0
6 6 1500 1.71 $900 500.0
7 7 1770 2.57 $1,100 750.0
8 8 1870 3.99 $1,300 1,300.0
9 9 2020 87.98 $11,842 7,600.0
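The values in the dictionary are standard Python format-specification strings. For example, applied to a few of the numbers in the table:

```python
# '{0:.0f}'  : no decimal places
print('{0:.0f}'.format(-68000.0))    # -68000

# '{0:,.2f}' : comma grouping, two decimal places
print('{0:,.2f}'.format(87.98165))   # 87.98

# '${0:,.0f}': a leading dollar sign, comma grouping, no decimal places
print('${0:,.0f}'.format(11842.0))   # $11,842
```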


3. Data Visualization

Suppose that you wanted to graph how the human population has changed over time. You could take out a piece of graph paper, look at the table, and start placing dots on the graph paper where the dot's place in the width of the paper corresponds to the number in a particular row in the 'year' column and the dot's place in the height of the paper corresponds to the number in that particular row in the 'population' column. And then you could connect the dots. Or you could get the computer to do it.

First, tell the computer that you want it to focus on the population column:

In [30]:
longest_run_growth_summary_df['population']
0       0.1
1       2.5
2       7.0
3      15.0
4      50.0
5     170.0
6     500.0
7     750.0
8    1300.0
9    7600.0
Name: population, dtype: float64

As you know, putting "longest_run_growth_summary_df['population']" in a code cell is a command to the computer to focus on the 'population' column of the longest_run_growth_summary_df' dataframe table.

Next, we tell the computer we want it to draw graphs so that they appear inline in the notebook and not in separate windows:

And if you run the next code cell:

In [31]:
longest_run_growth_summary_df['population'].plot()
There is your graph: adding '.plot()' to lots of expressions in Python will generate a graph. It will be an ugly graph, probably, so let us make a prettier graph. First, let us tell Python that we want to refer to rows in this dataframe—and use as the x-axis in our graphs—by the 'year' column rather than by the 'row' column:

In [32]:
longest_run_growth_summary_df.set_index('year', inplace=True)
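What .set_index() does, sketched on a hypothetical two-row dataframe (the toy_df name and numbers are made up for illustration): the 'year' column stops being an ordinary column and becomes the row labels, which .plot() will then use as the x-axis:

```python
import pandas as pd

toy_df = pd.DataFrame({'year': [1500, 1770], 'population': [500.0, 750.0]})
toy_df.set_index('year', inplace=True)   # 'year' becomes the index, in place

print(toy_df.index.tolist())    # [1500, 1770]: rows are now labeled by year
print(list(toy_df.columns))     # ['population']: 'year' is no longer a column
```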

Then let us tell Python that we want labels and a title for our graph:

In [33]:
longest_run_growth_summary_df['population'].plot()

plt.title('Human Economic History: Population', size=20)
plt.xlabel('Year', size=12)
plt.ylabel('Human Population (Millions)', size=12)
Text(0, 0.5, 'Human Population (Millions)')

And there we are!

Now we want you to draw the same graph, but for average income per capita, not population. So fill in the ellipsis '...' with what will make that happen:

In [34]:
# longest_run_growth_summary_df['...'].plot() uncomment before deployment
longest_run_growth_summary_df['income_level'].plot() # delete before deployment

plt.title('Human Economic History: Average Income', size=20)
plt.xlabel('Year', size=12)
plt.ylabel('Average Annual Income Per Capita', size=12)
Text(0, 0.5, 'Average Annual Income Per Capita')

Freaky, no?

This is why U.C. Davis economic historian Greg Clark says that there is really only one graph that is important in economic history.

The picture looks the same even if we narrow in our scope to just the positive years of the common calendar:

In [35]:
longest_run_growth_summary_df['income_level'][5:].plot() # rows from year 1 onward

plt.title('Human Economic History: Average Income', size=20)
plt.ylabel('Annual Income per Capita, 2020 Dollars')
Text(0, 0.5, 'Annual Income per Capita, 2020 Dollars')

After the spring of coronavirus, we are used to exponential growth processes—things that explode, but only after a time in which they gather force, and which look like straight-line growth on a graph plotted on a logarithmic scale. Let us plot income levels, populations, and ideas-stock values on log scales and see what we see:

In [52]:
np.log(longest_run_growth_summary_df['income_level']).plot()

plt.title('Human Economic History: Average Income', size=20)
plt.ylabel('Log Annual Income per Capita, 2020 Dollars')
Text(0, 0.5, 'Log Annual Income per Capita, 2020 Dollars')

And now look at the size of the human population on a log scale:

In [37]:

#... uncomment before deployment
#... uncomment before deployment
#... uncomment before deployment

np.log(longest_run_growth_summary_df['population']).plot() # delete before deployment
plt.title('Human Economic History: Population', size=20)   # delete before deployment
plt.xlabel('Year')                                       # delete before deployment
plt.ylabel('Log Population, Millions')                   # delete before deployment
Text(0, 0.5, 'Log Population, Millions')

And the growth of the useful-ideas stock:

In [38]:
#... uncomment before deployment
#... uncomment before deployment
#... uncomment before deployment
#... uncomment before deployment

np.log(longest_run_growth_summary_df['human_ideas_index'][5:12]).plot()  # delete before deployment
plt.title('Human Economic History: Ideas', size=20) # delete before deployment
plt.xlabel('Year')                                  # delete before deployment
plt.ylabel('Log Human Ideas Index')                 # delete before deployment
Text(0, 0.5, 'Log Human Ideas Index')

We see not exponential but superexponential growth: on a log scale, all three series are not straight lines, but rapidly upward-curving ones.
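A toy check of that distinction, with made-up series: the slope of the log of an exponential series is constant (a straight line on a log scale), while the slope of the log of a superexponential series keeps increasing (an upward-curving line):

```python
import numpy as np

exponential = np.array([1, 2, 4, 8, 16], dtype=float)    # doubles every step
superexp = np.array([1, 2, 8, 64, 1024], dtype=float)    # the growth rate itself grows

# slopes between successive points on a log-scale plot:
print(np.diff(np.log(exponential)))  # constant (all ln 2): a straight line
print(np.diff(np.log(superexp)))     # increasing: an upward-curving line
```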


4. The Psychology of Programming

4.1. Why Jump Through All These Hoops Here?

You may feel that we have gone through a lot of extra and unnecessary work to create these dataframes. If you are familiar with a spreadsheet program like Microsoft Excel, you may wonder why we don't just use a spreadsheet to hold, and then do calculations with, the data in a small table like longest_run_growth_summary_df. Indeed, Bob Frankston and Dan Bricklin, who implemented and designed the original VisiCalc <https://en.wikipedia.org/wiki/VisiCalc>, were geniuses. VisiCalc was a tremendously useful advance over earlier mainframe-based report generators, such as ITS's Business Planning Language. And today's Microsoft Excel is not that great an advance over Jonathan Sachs's Lotus 1-2-3, which was itself close to being merely a knockoff of VisiCalc. Why not follow the line of least resistance? Why not do our data analysis and visualization in a spreadsheet?

I do not recommend using spreadsheet programs. In fact, I greatly disrecommend using spreadsheet programs.


This is why:

Do your work in a spreadsheet, and it rapidly becomes impossible to check or understand. A spreadsheet is a uniquely easy framework to work in. A spreadsheet is also a uniquely opaque and incomprehensible framework to assess for correctness.

Since we all make errors, frequently, the ability to look back and assess whether one's calculations are correct is absolutely essential. With spreadsheets, such checking is impossible. And sooner or later, with very high probability, you will make a large and consequential mistake that you will not catch.


4.2. Bewildering Magic...

Do you feel bewildered? As if I have been issuing incomprehensible and arcane commands to some deep and mysterious entities that may or may not respond in the expected way? All who program feel this way some of the time, and most of those who program feel this way most of the time. For example, consider the URL you typed to get to this notebook. It was the arcane: <> which accessed the ps00.ipynb file in the master branch of my lecture-support-2020 repository of files on the <http://github.com/> website, which is in the ingenious and powerful git format. But I have never met anybody able and willing to explain to me how git works, and I do not think cartoonist Randall Munroe has met anyone either. It is all just magic.


And at times you are sure to feel worse than bewildered. You will feel like Python newbie Gandalf feels at this moment:

This "sorcerer's apprentice" <https://www.youtube.com/watch?v=2DX2yVucz24> feeling is remarkably common among programmers. It is explicitly referenced in the introduction to the classic computer science textbook, Abelson, Sussman, & Sussman: Structure and Interpretation of Computer Programs <https://github.com/braddelong/public-files/blob/master/readings/book-abelson-structure.pdf>:

In effect, we conjure the spirits of the computer with our spells. A computational process is indeed much like a sorcerer’s idea of a spirit. It cannot be seen or touched. It is not composed of matter at all. However, it is very real. It can perform intellectual work. It can answer questions. It can affect the world by disbursing money at a bank or by controlling a robot arm in a factory. The programs we use to conjure processes are like a sorcerer’s spells. They are carefully composed from symbolic expressions in arcane and esoteric programming languages that prescribe the tasks we want our processes to perform. A computational process, in a correctly working computer, executes programs precisely and accurately. Thus, like the sorcerer’s apprentice, novice programmers must learn to understand and to anticipate the consequences of their conjuring.... Master software engineers have the ability to organize programs so that they can be reasonably sure that the resulting processes will perform the tasks intended...


4.3. Comment Your Code!

You may recall lines in the code cells above like these:

# set up the computing environment: ensure that graphs appear inline in the notebook & not in extra windows:

This is called a comment. It doesn't make anything happen in Python; Python ignores anything on a line after a #. Instead, it's there to communicate something about the code to you, the human reader. Comments are extremely useful.
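For example, Python treats everything after a '#' on a line as if it were not there:

```python
# this whole line is a comment: Python does nothing with it

x = 3  # everything after the '#' on this line is also ignored
print(x)  # 3: the comments changed nothing about what the code does
```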

Source: <http://imgs.xkcd.com/comics/future_self.png>

Why are comments useful? Because anyone who will read and try to understand your code in the future is guaranteed to be an idiot. You need to explain things to them very simply, as if they were a small child.

They are not really an idiot, of course. It is just that they are not in-the-moment, and do not have the context in their minds that you have when you write your code.

And always keep in mind that the biggest idiot of all is also the one who will be most desperate to understand what you have written: it is yourself, a month or more from now, especially near the end of the semester.


4.4. Maintain Your Machines

These assignments will be very difficult to do on a smartphone.

Understand and keep your laptop running—or understand, keep running, and get really really good at using your tablet. Machines do not need to be expensive: around 150 dollars should do it for a Chromebook. People I know like the Samsung Exynos 5 <https://www.amazon.com/Samsung-Chromebook-Exynos-Dual-Core-XE303C12-A01US/dp/B01LXJZWVF/> or the Lenovo 3 11" <https://www.walmart.com/ip/11-Drive-82BA0000US-Processor-RAM-Celeron-Black-Intel-Solid-Chromebook-4GB-Display-Chrome-4GB-32GB-Lenovo-OS-Dual-Core-N4020-32GB-eMMC-3-11-6-State-O/402347782>.

And have a backup plan: what will you do if your machine breaks and has to go into the shop, or gets stolen?


5. You Are Done!

Now add your comments on the problem set—what you learned, what you failed to learn, how we could make this better—between the single-quote marks in the code cell below, and then be sure that you have run all code cells.

Print this notebook to .pdf, and upload it to the bCourses page. You will then be done with this problem set.

In [ ]:
ps_comments = '...'

Thanks to Umar Maniku, Eric van Dusen, Anaise Jean-Philippe, Marc Dordal i Carrerras, & others for helpful comments. Very substantial elements of this were borrowed from the Berkeley Data 8 <http://data8.org> teaching materials, specifically Lab 01: <https://github.com/data-8/materials-sp20/blob/master/materials/sp20/lab/lab01/lab01.ipynb>

Note: datahub link: <http://datahub.berkeley.edu/user-redirect/interact?account=braddelong&repo=econ-135-s-2021-assignments&branch=main&path=ps00-DRAFT.ipynb>
Note: github link: <https://github.com/braddelong/econ-135-s-2021-assignments/blob/main/ps00-DRAFT.ipynb>

In [ ]:
print(' ')