Created by Nathan Kelber and Ted Lawless for JSTOR Labs under Creative Commons CC BY License
For questions/comments/improvements, email nathan.kelber@ithaka.org.
___
Description: This lesson describes operators, expressions, data types, variables, and basic functions. Complete this lesson if you are familiar with Jupyter notebooks or have completed Getting Started with Jupyter Notebooks, but do not have any experience with Python programming. This is part 1 of 3 in the series Python Basics that will prepare you to do text analysis using the Python programming language.
Use Case: For Learners (Detailed explanation, not ideal for researchers)
Difficulty: Beginner
Completion Time: 75 minutes
Knowledge Required:
Knowledge Recommended: None
Data Format: None
Libraries Used: None
Research Pipeline: None
___
Python is one of the fastest-growing programming languages. Learning Python is a great choice because Python is:
The second most-popular language for digital humanities and data science work is R. We plan to create additional support for learning R soon. If you are interested in helping develop open educational resources for R, please reach out to Nathan Kelber (nathan.kelber@ithaka.org).
The skills you'll learn in Python Basics 1-3 are general-purpose Python skills, applicable for any of the text analysis notebooks that you may explore later. They are also widely applicable to many other kinds of tasks in Python beyond text analysis.
Making Mistakes is Important
Every programmer at every skill level gets errors in their code. Making mistakes is how we all learn to program. Programming is a little like solving a puzzle where the goal is to get the desired outcome through a series of attempts. You won't solve the puzzle if you're afraid to test whether the pieces match. An error message will not break your computer. Remember, you can always reload a notebook if it stops working properly or you misplace an important piece of code. Under the Edit menu, there is an option to undo changes. (Alternatively, you can press Command + Z on Mac or Ctrl + Z on Windows.) To learn any skill, you need to be willing to play and experiment. Programming is no different.
The simplest form of Python programming is an expression using an operator. An expression is a simple mathematical statement like:
1 + 1
The operator in this case is +, sometimes called "plus" or "addition". Try this operation in the code box below. Remember to click the "Run" button or press Ctrl + Enter (Windows) or Shift + Return (OS X) on your keyboard to run the code.
# Type the expression in this code block. Then run it.
Python can handle a large variety of expressions. Let's try subtraction in the next code cell.
# Type an expression that uses subtraction in this cell. Then run it.
# Try a multiplication in this cell. Then try a division.
# What happens if you combine them? What if you combine them with addition and/or subtraction?
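When you run, or evaluate, an expression in Python, Python follows the order of operations. (In grade school, you may remember learning the shorthand "PEMDAS".) This means that expressions are evaluated in this order: parentheses, exponents, multiplication and division (from left to right), and then addition and subtraction (from left to right).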
Python can evaluate parentheses and exponents, as well as a number of additional operators you may not have learned in grade school. Here are the main operators you might use, listed from highest to lowest precedence. Operators that share a precedence level (*, /, and %; + and -) are evaluated from left to right.
Operator | Operation | Example | Evaluation |
---|---|---|---|
** | Exponent/Power | 3 ** 3 | 27 |
% | Modulus/Remainder | 34 % 6 | 4 |
/ | Division | 30 / 6 | 5.0 |
* | Multiplication | 7 * 8 | 56 |
- | Subtraction | 18 - 4 | 14 |
+ | Addition | 4 + 3 | 7 |
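You can check the expressions from the table directly in a code cell. The sketch below stores each result in a variable so the values can be printed together. Variables are covered later in this lesson, and the names used here are only illustrative. It also adds a few mixed expressions to show precedence at work.

```python
# Each example from the operator table, stored so the results can be printed
exponent = 3 ** 3        # 27
remainder = 34 % 6       # 4
quotient = 30 / 6        # 5.0, because / always produces a float
product = 7 * 8          # 56
difference = 18 - 4      # 14
total = 4 + 3            # 7
print(exponent, remainder, quotient, product, difference, total)

# Precedence: ** is evaluated before *, and * before +
mixed = 2 + 3 * 4 ** 2       # 4 ** 2 = 16, then 3 * 16 = 48, then 2 + 48 = 50
grouped = (2 + 3) * 4        # parentheses first: 5 * 4 = 20

# Operators with equal precedence are evaluated left to right
left_to_right = 20 / 5 * 2   # (20 / 5) * 2 = 8.0
print(mixed, grouped, left_to_right)
```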
# Try operations in this code cell.
# What happens when you add in parentheses?
All expressions evaluate to a single value. In the above examples, our expressions evaluated to a single numerical value. Numerical values come in two basic forms: integers and floats.
An integer, what we sometimes call a "whole number", is a number without a decimal point that can be positive or negative. When a value uses a decimal, it is called a float or floating-point number. Two numbers that are mathematically equivalent could be in two different data types. For example, mathematically 5 is equal to 5.0, yet the former is an integer while the latter is a float.
Of course, Python can also help us manipulate text. A snippet of text in Python is called a string. A string can be written with single or double quotes. A string can use letters, spaces, line breaks, and numbers. So 5 is an integer, 5.0 is a float, but '5' and '5.0' are strings. A string can also be blank, such as ''.
Familiar Name | Programming name | Examples |
---|---|---|
Whole number | integer | -3, 0, 2, 534 |
Decimal | float | 6.3, -19.23, 5.0, 0.01 |
Text | string | 'Hello world', '1700 butterflies', '', '1823' |
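To check which data type a value has, Python provides the built-in type() function, which this lesson has not introduced yet. Here is a short sketch using the values discussed above:

```python
# type() reports the data type Python assigns to a value
whole = 5
decimal = 5.0
text = '5'
empty = ''

print(type(whole))    # <class 'int'>
print(type(decimal))  # <class 'float'>
print(type(text))     # <class 'str'>
print(type(empty))    # <class 'str'>

# 5 and 5.0 are mathematically equal even though their types differ
print(whole == decimal)  # True
```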
The distinction between these data types may seem unimportant, but Python treats each one differently. For example, an integer can be equal to a float (42 == 42.0 is True). A string is never equal to an integer or a float, however, even when it contains the same digits (15 == '15' is False).
To evaluate whether two values are equal, we can use two equals signs (==) between them. The expression will evaluate to either True or False.
# Run this code cell to determine whether the values are equal
42 == 42.0
# Run this code cell to compare an integer with a string
15 == 'fifteen'
# Run this code cell to compare an integer with a string
15 == '15'
# Combine the strings 'Hello' and 'World'
# Combine three strings
# Try adding a string to an integer
'55' + 23
# Multiply a string by an integer
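Here is one possible set of answers to the string exercises above, shown as a sketch. The + operator joins strings end to end, and * repeats a string a whole number of times. The contrast with 55 + 23 shows why the quotation marks matter.

```python
# + concatenates (joins) strings end to end, with no space added automatically
hello_world = 'Hello' + ' ' + 'World'
print(hello_world)        # Hello World

# Digits inside quotes are text, so + joins them rather than adding them
joined = '55' + '23'
print(joined)             # 5523
added = 55 + 23
print(added)              # 78

# Multiplying a string by an integer repeats it
laugh = 'ha' * 3
print(laugh)              # hahaha
```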
A variable is like a container that stores information. There are many kinds of information that can be stored in a variable, including the data types we have already discussed (integers, floats, and strings). We create (or initialize) a variable with an assignment statement. The assignment statement gives the variable an initial value.
# Initialize an integer variable and add 22
new_integer_variable = 5
new_integer_variable + 22
The value of a variable can be overwritten with a new value.
# Overwrite the value of my_favorite_number when the commented out line of code is executed.
# Remove the # in the line "#my_favorite_number = 9" to turn the line into executable code.
my_favorite_number = 7
#my_favorite_number = 9
my_favorite_number
# Overwriting the value of a variable using its original value
cats_in_house = 1
cats_in_house = cats_in_house + 2
cats_in_house
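Evaluating an expression that uses a variable does not change what the variable holds. Only an assignment with = does that. Here is a minimal sketch that reuses the cats_in_house example:

```python
cats_in_house = 1

# This expression evaluates to 3, but the result is not stored anywhere
cats_in_house + 2
print(cats_in_house)    # 1

# Assigning the result back to the variable overwrites the old value
cats_in_house = cats_in_house + 2
print(cats_in_house)    # 3
```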
# Initialize a string variable and concatenate another string
new_string_variable = 'Hello '
new_string_variable + 'World!'
You can create a variable with almost any name, but there are a few guidelines that are recommended.
If we create a variable that stores the day of the month, it is helpful to give it a name that makes the value stored inside it clear, like day_of_month. From a logical perspective, we could call the variable almost anything (hotdog, rabbit, flat_tire). As long as we are consistent, the code will execute the same. When it comes time to read, modify, and understand the code, however, it will be confusing to you and others. Consider this simple program that lets us change the days variable to compute the number of seconds in that many days.
# Compute the number of seconds in 3 days
days = 3
hours_in_day = 24
minutes_in_hour = 60
seconds_in_minute = 60
days * hours_in_day * minutes_in_hour * seconds_in_minute
We could write a program that is logically the same, but uses confusing variable names.
hotdogs = 60
sasquatch = 24
example = 3
answer = 60
answer * sasquatch * example * hotdogs
This code gives us the same answer as the first example, but it is confusing. Not only does this code use confusing variable names, it also does not include any comments to explain what the code does. It is not clear that we would change example to set a different number of days. It is not even clear what the purpose of the code is. As code gets longer and more complex, having clear variable names and explanatory comments is very important.
# Which of these variable names are acceptable?
# Comment out the variables that are not allowed in Python and run this cell to check if the variable assignment works.
# If you get an error, the variable name is not allowed in Python.
$variable = 1
a variable = 2
a_variable = 3
4variable = 4
variable5 = 5
variable-6 = 6
variAble = 7
Avariable = 8
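After trying the exercise, you can check the names without triggering errors. This sketch uses two standard-library tools that this lesson does not otherwise cover. The string method str.isidentifier() tests whether text is a syntactically valid name, and keyword.iskeyword() flags reserved words such as for or if, which cannot be used as variable names. The for loop and the dictionary used to collect the results are covered in later lessons.

```python
import keyword

# The candidate names from the exercise above, written as strings
candidates = ['$variable', 'a variable', 'a_variable', '4variable',
              'variable5', 'variable-6', 'variAble', 'Avariable']

allowed = {}
for name in candidates:
    # A usable variable name must be a valid identifier and not a reserved keyword
    allowed[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, '->', allowed[name])
```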
The three rules above describe absolute rules of Python variable naming. If you break those rules, your code will create an error and fail to execute properly. There are also style guidelines that, while they won't break your code, are generally advised for making your code readable and understandable. These style guidelines are written in PEP 8, the Style Guide for Python Code (PEP stands for Python Enhancement Proposal).
The current version of the style guide advises that variable names should be written in "lowercase, with words separated by underscores as necessary to improve readability."
If you have written code before, you may be familiar with other styles, but these notebooks will attempt to follow the PEP 8 style guidelines. Ultimately, the most important thing is that your variable names are consistent so that someone who reads your code can follow what it is doing. As your code becomes more complicated, writing detailed comments with # will also become more important.
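Many different kinds of programs need to do very similar operations. Instead of writing the same code over again, you can use a function. Essentially, a function is a small snippet of code that can be quickly referenced. There are three kinds of functions: native functions built into Python, functions that come from libraries (also called modules or packages), and functions you write yourself.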
We'll address functions you write yourself in Python Basics 2. For now, let's look at a few of the native functions. One of the most common functions used in Python is the print() function, which displays a value, such as a string, as output.
# A print function that prints: Hello World!
print('Hello World!')
# Define a string and then print it
our_string = 'Hello World!'
print(our_string)
There is also an input() function for taking user input.
# A program to greet the user by name
print('Hi. What is your name?') # Ask the user for their name
user_name = input() # Take the user's input and put it into the variable user_name
print('Pleased to meet you, ' + user_name) # Print a greeting with the user's name
We defined a string variable user_name to hold the user's input. We then called the print() function to print the concatenation of 'Pleased to meet you, ' and the user's input that was captured in the variable user_name. Remember that we can use a + to concatenate, meaning join these strings together.
Here are a couple more tricks we can use. You can pass a string to the input() function to use as a prompt, and you can use an f-string to insert a variable into a string without using the + operator to concatenate them.
# A program to greet the user by name
user_name = input('Hi. What is your name? ')
print(f'Pleased to meet you, {user_name}')
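Because input() waits for someone to type, the sketch below substitutes a fixed, hypothetical name so it runs on its own. It shows that an f-string produces the same text as concatenation with +. It also shows that the braces can hold an integer or an expression, with no conversion to a string needed.

```python
# A hypothetical value standing in for what input() would return
user_name = 'Ada'

# Concatenation and an f-string build the same greeting
with_plus = 'Pleased to meet you, ' + user_name
with_f_string = f'Pleased to meet you, {user_name}'
print(with_plus == with_f_string)   # True

# Braces in an f-string can also contain numbers and expressions
days = 3
summary = f'{days} days is {days * 24} hours'
print(summary)    # 3 days is 72 hours
```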
We can concatenate many strings together, but we cannot use + to concatenate a string with an integer or a float. The number must first be converted to a string with the str() function.
# Concatenating many strings within a print function
print('Hello, ' + 'all ' + 'these ' + 'strings ' + 'are ' + 'being ' + 'connected ' + 'together.')
# Trying to concatenate a string with an integer causes a TypeError
print('There are ' + 7 + ' continents.')
# Converting the integer to a string with str() makes the concatenation work
print('There are ' + str(7) + ' continents.')
# A program to tell a user how many months old they are
user_age = int(input('How old are you? '))  # Take the user's input and convert it from a string to an integer
number_of_months = user_age * 12  # Define a new variable number_of_months that multiplies the user's age by 12
print('That is more than ' + str(number_of_months) + ' months old!')  # Convert number_of_months to a string so it can be concatenated
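The input() function always returns a string, even when the user types digits. A user's age therefore has to be converted before it can be used in arithmetic. The sketch below substitutes a fixed, hypothetical answer of '30' for the input() call so it runs without typing. It uses the built-in int() and str() functions to convert between text and numbers.

```python
# Stand-in for input('How old are you? '), which always returns a string
user_age = '30'

# Multiplying a string by 12 repeats the text instead of doing arithmetic
repeated = user_age * 12
print(repeated)            # 303030303030303030303030

# int() converts the text to an integer so that * does multiplication
number_of_months = int(user_age) * 12
print(number_of_months)    # 360

# str() converts the integer back to text so that + can concatenate it
message = 'That is more than ' + str(number_of_months) + ' months old!'
print(message)             # That is more than 360 months old!
```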
Congratulations! You have completed Python Basics 1. There are two more lessons in Python Basics:
If you would like to check your understanding of this lesson, you can take this quick quiz.