Reproducibility: when someone else (e.g., your future self) can obtain the same outcomes from the same dataset and analysis.
This semester, we will incorporate the fundamentals of version control, the process by which all changes to code, text, and files are tracked. Version control lets us maintain data and information in support of collaborative projects, and it also makes sure your analyses are preserved.
Before coming to class, you were asked to create a GitHub.com account. Our version control system for the purposes of this course is Git, and GitHub is the web hosting platform for maintaining our Git repositories.
When you open your JupyterLab container, you will see the JupyterLab interface. Documentation on the interface is provided here:
https://jupyterlab.readthedocs.io/en/stable/user/interface.html
A Jupyter Notebook is an analog to an R Markdown document. It too can include text chunks and R code chunks that can be viewed together. A few unique aspects of notebooks over Rmd files are:
Pretty much, after a small learning curve, notebooks and the JupyterLab interface should become fairly intuitive. So, rather than write all this down here, we'll do some hands-on work that will be recorded for your benefit...
1 + 1
1 - 1
2 * 2
1 / 2
1 / 200 * 30
5 + 2 * 3
(5 + 2) * 3
import math  # we need to import the `math` module for this
math.sqrt(25)
math.sin(3)
math.pi
import statistics as stats  # we need to import the `statistics` module for this
stats.mean([5, 4, 6, 4, 6])
stats.median([5, 4, 6, 4, 6])
4 > 5
4 < 5
4 != 5
4 == 5
Python does not use R's <- assignment operator; it just uses =.
x = 3*4
Now, call up the object x.
x
Unlike RStudio, JupyterLab does not have a built-in variable explorer. (There are extensions for this, but we won't go into those here...) However, we can run the %whos command to reveal all named objects in our current session (including packages).
%whos
Python objects can be named with a combination of letters, numbers, and underscores (_), BUT NO PERIODS (.), and a name cannot begin with a number. The best object names are informative. Resist the temptation to call your object something convenient, like "a", "b", and so on. Calling your object something specific means that you can call up that object later and have an idea of what it contains, with less need for specific context.
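If you are unsure whether a name is legal, one quick check is the built-in string method isidentifier(). The candidate names below are made up purely for illustration, so treat this as a small sketch of the naming rules rather than part of the lesson's workflow.

```python
# Hypothetical candidate names, chosen only to illustrate the naming rules.
candidates = ["water_temp_2019", "water.temp", "2019_water_temp", "_private"]

# str.isidentifier() returns True when the string follows Python's naming rules.
name_check = {name: name.isidentifier() for name in candidates}

for name, is_valid in name_check.items():
    print(name, "->", is_valid)
```

Note that isidentifier() only checks the characters used; it does not flag reserved words such as "for", which also cannot be used as object names.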
Informative names are the first illustration of a common data management recommendation: take the time to use best management practices at the outset, and it will save you time in the long term.
Run the first code cell below. Then, type in "long" and press tab. What happens?
long_name_for_illustration = 11
What happens if there is a typo in your code?
Type each of the following in a code cell and run it:
Long_name_for_illustration
longnameforillustration
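Both misspellings should fail, because Python names are case-sensitive and the underscores are part of the name. The sketch below catches the resulting NameError so the cell keeps running; the try/except wrapper and the variable error_message are additions for illustration only.

```python
long_name_for_illustration = 11

try:
    # Capital "L" does not match the name defined above, so the lookup fails.
    Long_name_for_illustration
    error_message = None
except NameError as err:
    error_message = str(err)

print(error_message)
```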
Within your Python code, it is often useful to include notes about your workflow. So that these aren't interpreted by the software as code, precede the notes with a # sign. Your editor will display the comment in a different color to indicate that it will not be run. Comments can be placed on their own lines or at the end of a line of code.
# I am demonstrating a comment here.
1 + 1 # This is a simple math problem
Functions are the major tool in Python. Functions can do virtually unlimited things within the Python universe, but each function requires specific inputs that are provided under specific syntax. We will start with a simple function that is built into Python, len(), which returns the length of an object.
len("ABCDEF")
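What "length" means depends on the object: for a string it is the number of characters, and for a list it is the number of items. A minimal sketch follows; the variable names are made up for illustration.

```python
letters = "ABCDEF"
scores = [5, 4, 6, 4, 6]

string_length = len(letters)  # number of characters: 6
list_length = len(scores)     # number of items: 5

# A single number has no length, so len() raises a TypeError.
try:
    len(5)
    number_has_length = True
except TypeError:
    number_has_length = False

print(string_length, list_length, number_has_length)
```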
To mimic the code in the R counterpart of this document, we actually need two functions in Python. The range() function works like R's seq() function, but it returns a "range" object, not a vector, and it stops just before the end value rather than including it. To convert this to a list (Python's closest built-in analog to an R vector), we pass the range object to the list() function...
list(range(10))
ten_sequence = list(range(10))
ten_sequence
list(range(1,10,2))
?range
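One difference from R's seq() that is easy to trip over: range() counts from 0 by default and never includes its stop value. The sketch below spells out the three ways of calling it; the variable names are made up for illustration.

```python
# range(stop): counts from 0 up to, but not including, stop
zero_to_nine = list(range(10))

# range(start, stop, step): the stop value is still excluded
odd_numbers = list(range(1, 10, 2))

# To end on 10 itself (as R's seq(1, 10) would), stop one past it
one_to_ten = list(range(1, 11))

print(zero_to_nine)
print(odd_numbers)
print(one_to_ten)
```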
The basic form of a function is functionname(), and the packages we will use in this class will use these basic forms. However, there may be situations when you will want to create your own function. Below is a description of how to write functions through the metaphor of creating a recipe (credit: @IsabellaGhement on Twitter).
Writing a function is like writing a recipe. Your function will need a recipe name (functionname). Your recipe ingredients will go inside the parentheses. The recipe steps and end product go in the indented lines below the def line.
→ Note that, unlike R, Python does not use curly braces "{ }" to indicate which code goes into the function. Instead it uses indentation: all indented code will be part of the function's code...
def functionname():
    statement_1
    statement_2
    return result
♦ A single ingredient recipe:
# Write the recipe
def recipe1(x):
    mix = x*2
    return mix
# Bake the recipe
simplemeal = recipe1(5)
# Serve the recipe
simplemeal
♦ Two single ingredient recipes, baked at the same time:
def recipe2(x):
    mix1 = x*2
    mix2 = x/2
    return [mix1,  # we can continue onto the next line because the values are inside [ ]
            mix2]
doublesimplemeal = recipe2(6)
doublesimplemeal
♦ Two double ingredient recipes, baked at the same time:
def recipe3(x, f):
    mix1 = x*f
    mix2 = x/f
    return [mix1, mix2]
doublecomplexmeal = recipe3(x = 5, f = 2)
doublecomplexmeal
# Show the first item in the returned list
doublecomplexmeal[0]
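Note that the first item sits at position 0, not 1 as it would in R. The sketch below repeats recipe3 so that it runs on its own and then pulls out each item by position; the names first_item, second_item, and last_item are made up for illustration.

```python
def recipe3(x, f):
    mix1 = x * f
    mix2 = x / f
    return [mix1, mix2]

doublecomplexmeal = recipe3(x=5, f=2)

first_item = doublecomplexmeal[0]   # position 0 holds mix1
second_item = doublecomplexmeal[1]  # position 1 holds mix2
last_item = doublecomplexmeal[-1]   # negative positions count from the end

print(first_item, second_item, last_item)
```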
♦ Make a recipe based on the ingredients you have
def recipe4(x):
    if x < 3:
        return x*2
    else:
        return x/2
def recipe5(x):
    if x < 3: return x*2
    elif x > 3: return x/2
    else: return x
meal = recipe4(4); meal
meal2 = recipe4(2); meal2
meal3 = recipe5(3); meal3
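To see all three branches of recipe5 side by side, we can feed it one value below 3, one equal to 3, and one above 3. This sketch repeats recipe5 so that it runs on its own; the results dictionary is an addition for illustration.

```python
def recipe5(x):
    if x < 3:
        return x * 2   # fewer than 3: double it
    elif x > 3:
        return x / 2   # more than 3: halve it
    else:
        return x       # exactly 3: leave it alone

# Try one input for each branch and collect the outcomes.
results = {x: recipe5(x) for x in [2, 3, 4]}
print(results)
```

Note that the halved value comes back as a decimal number (2.0), because / always performs true division in Python 3.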