Created by Nathan Kelber for JSTOR Labs under Creative Commons CC BY License
For questions/comments/improvements, email nathan.kelber@ithaka.org.
____
Description: This notebook introduces Jupyter notebooks and Python for absolute beginners. If you are completely new to text mining, this is the place to start.
Difficulty: Beginner
Knowledge Required: None
Purpose: Learning (Optimized for explanation over code)
Knowledge Recommended: None
Completion time: 15 minutes
Data Format: None
Libraries Used: None
___
Welcome to your first Jupyter notebook. Jupyter notebooks are documents that contain both computer code (like Python) and explanatory text, images, figures, videos, and links. Most importantly, the code in a Jupyter notebook can be executed, modified, and deleted. As you explore this notebook, please feel free to modify the text and the code, and to generally play around with the environment. You can always launch another instance of this notebook, which will restore its original configuration. Later, we will learn how to create and save your own notebooks to share with others.
Similar to the way an essay is composed of paragraphs, Jupyter notebooks are composed of cells. A cell is like a container for a particular kind of content. There are essentially two kinds of cells in Jupyter notebooks: markdown cells and code cells.
A code cell can be distinguished from a markdown cell by the fact that it contains a pair of brackets with a colon to its left, like so [ ]:
# This is a code cell
A markdown cell provides information, but a code cell can be executed to perform an action. The code cell above does not contain any executable content, only a text comment. We can tell the text in the code cell is a comment because it is prefixed by a #. In Python, any time a line is prefaced by a #, that line is a comment and will not be executed if the code is run. In a code cell, comments are also bluish-green in color.
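To make the difference between comments and executable code concrete, here is a minimal sketch. The variable name total is our own choice for illustration, and the comment placed at the end of a line of code goes slightly beyond what is described above: Python ignores everything from the # to the end of that line.

```python
# This whole line is a comment, so Python skips it.
total = 2 + 3  # Everything after the # on this line is also ignored.
# print("This line is commented out and will not run.")
print(total)  # prints 5
```

Only the two lines that contain real code do anything when the cell is run.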
It is traditional in programming education to begin with a program that prints Hello World. In Python, this is a simple task. We will use the print() function, which prints out whatever is inside the parentheses (). We will pass the text "Hello World", written inside quotation marks, to the print function like so:
print("Hello World")
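As a small aside beyond the single line above, the following sketch shows a few other things print() can be given. The variable name greeting is an assumption made for this example, not something the rest of the notebook relies on.

```python
greeting = "Hello World"   # store the text under a name of our choosing
print(greeting)            # prints Hello World
print("Hello", "World")    # several values are printed with a space between them
print(len(greeting))       # prints 11, the number of characters in the text
```

In each case, print() displays whatever value ends up inside its parentheses.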
Now try it yourself in the code cell below. To execute code, we have a couple of options:
Click the code cell you wish to run and then push the triangle-shaped "play" button above.
Click the code cell you wish to run and press Ctrl + Enter (Windows) or Shift + Return (OS X) on your keyboard.
Type print("Hello World") into the box below and then run the cell.
After your code runs, you'll receive any output, and a number will appear in the pair of brackets [ ]: to the left of the code cell to show the order in which the cell was run. If your code is complicated or takes some time to execute, an asterisk * will be displayed in the pair of brackets [*]: while the code executes.
Execute the code cell below, which prints a message, waits five seconds, and then prints "Done". As the program is running, watch the pair of brackets and you will see an asterisk [*]: showing that the code is running.
import time  # Python's built-in module for time-related functions

print('Waiting 5 seconds...')
time.sleep(5)  # pause the program for 5 seconds
print('Done')
If you missed the asterisk, you can run the code cell as many times as you like. Notice that each time you run a code cell, the number increases in the pair of brackets [ ]:. This keeps track of the order in which cells were run. While we will always run code in order from top to bottom, keep in mind that code cells can be run in any order. If you run a code cell at the bottom of a notebook that depends on the output of a code cell at the top, you will probably get an error. When you get an error, it's a good idea to check if you missed a code cell earlier that needed to be run first.
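The sketch below imitates, inside a single cell, what happens when a cell is run before the cell it depends on. The variable name word_count and its value are hypothetical. The try/except wrapper is only there so the example keeps going after the error; in a real notebook you would simply see the error message under the cell.

```python
# Imitate running a "later" cell first: it uses a variable that
# no earlier cell has defined yet.
try:
    print(word_count)
except NameError as error:
    message = str(error)
    print("Error:", message)

# Imitate going back and running the "earlier" cell that defines it.
word_count = 42

# Running the "later" cell again now works.
print(word_count)  # prints 42
```

A NameError like this one is often a sign that a cell higher up in the notebook has not been run yet.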
To create a new cell, click the + at the top of the menu. A new cell will be created immediately underneath the currently selected cell.
By default, a code cell is created. To change the cell type, click on the dropdown menu.
To delete a cell, select the cell (or set of cells), then right-click (Ctrl-click on OS X) and select Delete Cells.
The text in code cells can be edited directly, like text in a regular text box. In order to change the content of a markdown cell, you need to expose the markdown content underneath by double-clicking the cell. This will reveal the plain text of the markdown that creates various elements like headings, links, images, etc. When you want the cell to render again, you can simply run it again by pushing the play button or pressing Ctrl + Enter (Windows) or Shift + Return (OS X) on your keyboard.
If you are familiar with HTML, you can think of markdown as a simplified way to write HTML elements. Basically, it allows you to mark out where headings, italics, bold, and other kinds of basic formatting go. In terms of styling, markdown is very minimalist. If you would like to include an element in your notebook that markdown does not support, you can also use HTML and CSS in your markdown cells.
Here are some basic examples to get you started. Double-click on this cell to see how each was made. There are many markdown cheatsheets available on the web. It can be useful to print one out and keep it handy.
Use asterisks around text to add emphasis, also known as italics
You can also use underscores
A strike-thru effect is created with two tildes ~~
A list of ordered items:
Unordered items:
This is a link to JSTOR.
Create a horizontal rule with three hyphens, asterisks, or underscores.
____
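To illustrate the idea that markdown is a shorthand for HTML, here is a toy sketch in Python that turns the emphasis and strikethrough markers described above into HTML tags. The function name toy_markdown_to_html is hypothetical, and this is not how Jupyter renders markdown; a real markdown renderer handles many more cases (for example, underscores inside words) that this sketch ignores.

```python
import re

def toy_markdown_to_html(text):
    """Convert a tiny subset of markdown to HTML (illustration only)."""
    # Handle bold first so that ** is not mistaken for two italic markers.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    # Single asterisks or underscores mark emphasis (italics).
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    text = re.sub(r"_(.+?)_", r"<em>\1</em>", text)
    # Two tildes on each side mark struck-through text.
    text = re.sub(r"~~(.+?)~~", r"<del>\1</del>", text)
    return text

print(toy_markdown_to_html("*emphasis* and ~~strike~~"))
# prints <em>emphasis</em> and <del>strike</del>
```

Reading the output next to the input shows how much shorter the markdown version is than the HTML it stands for.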