Information Retrieval Lab: Jupyter Tutorial

(Re)sources:

Obligatory Wikipedia excerpts:

Project Jupyter's name is a reference to the three core programming languages supported by Jupyter, which are Julia, Python and R, and also a homage to Galileo's notebooks recording the discovery of the moons of Jupiter.

Jupyter is language agnostic and it supports execution environments (aka kernels) in several dozen languages among which are Julia, R, Haskell, Ruby, and of course Python (via the IPython kernel).

A notebook interface (also called a computational notebook) is a virtual notebook environment used for literate programming.

Literate programming is a programming paradigm introduced by Donald Knuth in which a computer program is given an explanation of its logic in a natural language, such as English, interspersed with snippets of macros and traditional source code, from which compilable source code can be generated.

The literate programming paradigm, as conceived by Knuth, represents a move away from writing computer programs in the manner and order imposed by the computer, and instead enables programmers to develop programs in the order demanded by the logic and flow of their thoughts.

The Jupyter Notebook has two different keyboard input modes. Edit mode allows you to type code or text into a cell and is indicated by a green cell border. Command mode binds the keyboard to notebook level commands and is indicated by a grey cell border with a blue left margin.

Esc: Move from edit mode to command mode

Enter: Move from command mode to edit mode

⚠️ Note that there are two kinds of notebook cells. What you are reading right now is the content of a Markdown Cell. Later, we will enter Python code into a Code Cell.

⚠️ H: List all keyboard shortcuts
⚠️ P: Open the searchable Command Palette

Useful keyboard shortcuts in command mode:

dd: Delete highlighted cell
Space: Scroll notebook down
Shift+Space: Scroll notebook up
(Shift+)L: Toggle (all) line numbers
M: Convert cell to Markdown Cell
Y: Convert cell to Code Cell
C: Copy Cell
V: Paste Cell
X: Cut Cell

Useful keyboard shortcuts in edit mode:

Shift+Enter: Execute highlighted cell then highlight the cell below it
Ctrl+Enter: Execute highlighted cell and stay there
Alt+Enter: Execute highlighted cell, then insert a new cell below and place the cursor in it
Ctrl+]: Indent selection
Ctrl+[: Dedent selection
Ctrl+/: Toggle comment ON/OFF (Both Markdown and Code)
Shift+Tab: Display function docstring

Running Bash Commands:

In a code cell, prefixing a command with ! is equivalent to running it in a terminal. Try running the following commands:

In [ ]:
!whoami
In [ ]:
!pwd
In [ ]:
!ls -lah

You can pass Python variables to your shell commands by prefixing the variable name with $ or by wrapping an expression in {}.

In [ ]:
a = "/"
!ls $a
#!ls {a}

You can even assign the output of a shell command to a Python variable. The result is a list-like object holding one string per line of output.

In [ ]:
a = "/"
b = !ls $a
print(b)
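
Outside a notebook, the capture shown above has a rough standard-library counterpart. The following is a minimal sketch, assuming a child Python process stands in for the shell command so that it runs on any platform. The helper name run_lines is hypothetical and not part of IPython.

```python
import subprocess
import sys

def run_lines(cmd):
    """Run a command and return its standard output as a list of lines."""
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout.splitlines()

# A child Python process stands in for a shell command such as `ls /`.
lines = run_lines([sys.executable, "-c", "print('alpha'); print('beta')"])
print(lines)
```

Like the notebook syntax, this yields one string per output line, which makes it easy to loop over file names or filter results.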

People usually use this to install packages into their environments using the python package manager pip:

In [ ]:
!pip install numpy
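
A bare !pip runs whichever pip comes first on the shell's PATH, which is not necessarily the one that belongs to the kernel's interpreter. Recent IPython versions provide a %pip line magic for this reason. The sketch below shows the same idea in plain Python by invoking pip as a module of sys.executable. It only asks for pip's version, so nothing gets installed.

```python
import subprocess
import sys

# sys.executable is the interpreter that runs the current kernel or script,
# so "-m pip" targets exactly that interpreter's environment.
cmd = [sys.executable, "-m", "pip", "--version"]
result = subprocess.run(cmd, capture_output=True, text=True)

# Print whatever pip reported (stderr if pip is not available).
print(result.stdout or result.stderr)
```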

Notebook Magics:

Magics are commands specific to the IPython kernel, prefixed with % (line magic) or %% (cell magic).

Line Magics:

Prefix a line with % to enable behavior that affects that line.

Cell Magics:

Prefix with %% to enable behavior that affects the entire cell. (Has to be the first thing in the cell)

⚠️ You can use the lsmagic magic to list all available magics:

In [ ]:
%lsmagic

⚠️ You can append a ? to any magic command (or in fact any Python object) to bring up its documentation in the pager. Prepending the ? works as well.

In [ ]:
%time?
In [ ]:
%timeit?
In [ ]:
%prun?
In [ ]:
%load_ext?

⚠️ You can define your own magic commands or install and enable additional ones:

In [ ]:
!pip install -U memory_profiler
In [ ]:
%load_ext memory_profiler
In [ ]:
%memit?
In [ ]:
%mprun?
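
Defining your own line magic can be sketched as follows. This is a minimal example under stated assumptions: the magic name shout and its behaviour are made up for illustration, and the registration step only runs when the code executes inside an IPython session.

```python
def shout(line):
    """Hypothetical line magic: return the rest of the line upper-cased."""
    return line.upper()

try:
    # get_ipython() returns the active shell, or None outside IPython.
    from IPython import get_ipython
    shell = get_ipython()
except ImportError:
    shell = None

if shell is not None:
    # After this call, `%shout hello` works in any cell.
    shell.register_magic_function(shout, magic_kind="line", magic_name="shout")

print(shout("hello"))
```

A line magic receives everything after its name as a single string, which is why shout takes one argument.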
In [ ]:
%time a=[i**2 for i in range(100)];
In [ ]:
%timeit a=[i**2 for i in range(100)];
In [ ]:
%%time
a=[i**2 for i in range(1000)]
b=[i**i for i in range(1000)]
In [ ]:
%%timeit
a=[i**2 for i in range(1000)]
b=[i**i for i in range(1000)]
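
For comparison, a rough plain-Python counterpart of the timing cells above can be written with the standard-library timeit module. This is a sketch, not a reproduction of what the magic does internally. The function name squares and the repeat counts are arbitrary choices.

```python
import timeit

def squares(n=1000):
    """The statement being timed, wrapped in a function."""
    return [i**2 for i in range(n)]

# Five runs of 1000 calls each; the minimum is usually the least noisy figure.
timings = timeit.repeat(squares, number=1000, repeat=5)
best_per_call = min(timings) / 1000

print(f"best of 5: {best_per_call * 1e6:.1f} microseconds per call")
```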
In [ ]:
%prun a=[i**2 for i in range(10000000)]
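
The profiling cell above can be approximated outside the notebook with the standard-library cProfile and pstats modules. The sketch below uses a smaller list than the cell so that it finishes quickly. The function name squares is an arbitrary choice for illustration.

```python
import cProfile
import io
import pstats

def squares(n):
    return [i**2 for i in range(n)]

# Collect profiling data only while the function of interest runs.
profiler = cProfile.Profile()
profiler.enable()
result = squares(100_000)
profiler.disable()

# Format the five most expensive entries by cumulative time into a string.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```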
In [ ]:
%memit a=[i**2 for i in range(10000000)]
In [ ]:
%%writefile?
In [ ]:
%%writefile memory_profiler_demo.py
def list_comp(N=1000):
    a = [i**2 for i in range(N)] # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81,...]
    b = [i**3 for i in range(N)] # [0, 1, 8, 27, 64, 125, 216, 343, 512, 729,...]
    c = [(i+j) for i,j in zip(a,b)] # [0, 2, 12, 36, 80, 150, 252, 392, 576, 810, ...]
    return c
In [ ]:
from memory_profiler_demo import list_comp  # defined in the file written above
list_comp(215)
In [ ]:
from memory_profiler_demo import list_comp
%mprun -f list_comp list_comp(15000)
In [ ]:
%%perl

@list = (1,2,3,4,5);
foreach $a (@list) {
    print "$a\n";
}
In [ ]:
%%bash
for i in 1 2 3 4 5; do
    echo $i
done
In [ ]:
%%svg
<!-- Source: https://www.w3schools.com/graphics/tryit.asp?filename=trysvg_path2 -->        
<svg height="400" width="450">
<path id="lineAB" d="M 100 350 l 150 -300" stroke="red" stroke-width="3" fill="none" />
<path id="lineBC" d="M 250 50 l 150 300" stroke="red" stroke-width="3" fill="none" />
<path d="M 175 200 l 150 0" stroke="green" stroke-width="3" fill="none" />
<path d="M 100 350 q 150 -300 300 0" stroke="blue" stroke-width="5" fill="none" />
<!-- Mark relevant points -->
<g stroke="black" stroke-width="3" fill="black">
<circle id="pointA" cx="100" cy="350" r="3" />
<circle id="pointB" cx="250" cy="50" r="3" />
<circle id="pointC" cx="400" cy="350" r="3" />
</g>
<!-- Label the points -->
<g font-size="30" font-family="sans-serif" fill="black" stroke="none" text-anchor="middle">
<text x="100" y="350" dx="-30">A</text>
<text x="250" y="50" dy="-10">B</text>
<text x="400" y="350" dx="30">C</text>
</g>
</svg>

Notebook Aesthetics and Rich Media:

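The IPython.display module provides a collection of display classes (23 at the time of writing), including HTML, Audio, YouTubeVideo and IFrame.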

The display function is implicitly called on the last expression of a cell

In [12]:
1
2
3
Out[12]:
3
In [15]:
display(3)
3
In [14]:
1
2
3; # Use a ; to suppress that functionality
In [21]:
[1 for thing in range(100)]; # Without the semicolon, the long list is pretty-printed one element per line; try removing it
In [22]:
print([1 for thing in range(100)]) #Print works better in this case
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
In [23]:
display?
In [ ]:
# Make the page wider (targets the .container element of the classic Notebook)
from IPython.display import HTML, display
display(HTML("<style>.container { width:80% !important; }</style>"))
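
The display function picks a rich representation when an object offers one, for example through a _repr_html_ method. The following is a minimal sketch of that mechanism. The Table class is hypothetical and only serves as an illustration.

```python
from html import escape

class Table:
    """A tiny table that a notebook can render as HTML when displayed."""

    def __init__(self, headers, rows):
        self.headers = headers
        self.rows = rows

    def _repr_html_(self):
        # IPython's display machinery calls this method if it exists.
        head = "".join(f"<th>{escape(str(h))}</th>" for h in self.headers)
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"

table = Table(["A", "B"], [[1, 2], [3, 4]])
html = table._repr_html_()
print(html)
```

In a notebook, leaving table as the last expression of a cell should render the table itself rather than the raw markup.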

Displaying Tables:

| A | B | C |
|---|---|---|
| 123 | 456 | 789 |
| 123 | 456 | 789 |
| 123 | 456 | 789 |
| 123 | 456 | 789 |

| Header left | Header center | Header right |
|:---|:---:|---:|
| This is left aligned | centered | This text is right aligned |
| left | also centered | And more |
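
Markdown tables are written with pipes between columns and a separator row whose colons set the alignment of each column. A small sketch of generating such a table from Python data follows. The helper markdown_table and its one-letter alignment codes are assumptions made for this example.

```python
def markdown_table(headers, rows, align):
    """Build a Markdown pipe table; align holds 'l', 'c' or 'r' per column."""
    markers = {"l": ":---", "c": ":---:", "r": "---:"}
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join(markers[a] for a in align) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)

text = markdown_table(["Left", "Center", "Right"], [["a", "b", "c"]], "lcr")
print(text)
```

In a code cell, passing the resulting string to IPython.display.Markdown should render it as a formatted table.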

Displaying equations and Math notation with MathJax:

MathJax Code:

\begin{equation*}
\mathbf{V}_1 \times \mathbf{V}_2 =  \begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
\frac{\partial X}{\partial u} &  \frac{\partial Y}{\partial u} & 0 \\
\frac{\partial X}{\partial v} &  \frac{\partial Y}{\partial v} & 0
\end{vmatrix}
\end{equation*}

Rendered Display:

\begin{equation*} \mathbf{V}_1 \times \mathbf{V}_2 = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial X}{\partial u} & \frac{\partial Y}{\partial u} & 0 \\ \frac{\partial X}{\partial v} & \frac{\partial Y}{\partial v} & 0 \end{vmatrix} \end{equation*}

You can also feature your equations, such as the Cauchy-Schwarz Inequality, $\left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2 \right) \left( \sum_{k=1}^n b_k^2 \right)$ inline as part of a sentence by enclosing the expression in $.

Display an algorithm using Markdown and inline MathJax:

source: https://ai.meta.stackexchange.com/questions/1679/writing-algorithm-formulas-using-mathjax

Algorithm parameters: step size $\alpha \in (0 , 1] , \epsilon > 0$
Initialize $Q ( s, a ), \ \forall s \in S^+ , a \in A ( s ),$ arbitrarily except that $Q ( terminal , \cdot ) = 0$

Loop for each episode:
$\quad$Initialize $S$
$\quad$Loop for each step of episode:
$\qquad$Choose $A$ from $S$ using some policy derived from $Q$ (e.g. $\epsilon$-greedy)
$\qquad$Take action $A$, observe $R, S'$
$\qquad Q(S,A) \leftarrow Q(S,A) + \alpha[R+\gamma \max_a Q(S', a) - Q(S, A)]$
$\qquad S \leftarrow S'$
$\quad$ until $S$ is terminal
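
The algorithm above is tabular Q-learning. A minimal sketch of it in Python follows. The environment is an assumption made for illustration: a chain of five states in which moving right from the start eventually reaches a rewarding terminal state. The values of $\alpha$, $\gamma$, $\epsilon$ and the episode count are arbitrary choices as well.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Deterministic chain: reward 1 only on reaching the terminal state."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return reward, next_state

def choose_action(Q, state, rng):
    """Epsilon-greedy choice with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(Q[state])
    return rng.choice([a for a in ACTIONS if Q[state][a] == best])

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    # Q(terminal, .) stays 0 because the terminal state is never updated.
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0                                    # Initialize S
        while state != N_STATES - 1:                 # until S is terminal
            action = choose_action(Q, state, rng)    # choose A from S
            reward, next_state = step(state, action) # observe R, S'
            target = reward + GAMMA * max(Q[next_state])
            Q[state][action] += ALPHA * (target - Q[state][action])
            state = next_state                       # S <- S'
    return Q

Q = q_learning()
# Greedy action in every non-terminal state after training.
greedy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(greedy)
```

After training, the greedy policy should prefer moving right in every non-terminal state, since that is the shortest route to the reward.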

Play Sound:

In [ ]:
from IPython.display import Audio
import numpy as np
framerate = 44100
t = np.linspace(0,5,framerate*5)
dataleft = np.sin(2*np.pi*220*t)
dataright = np.sin(2*np.pi*224*t)
Audio([dataleft, dataright],rate=framerate)

Embed a Youtube Video:

In [ ]:
from IPython.display import YouTubeVideo
YouTubeVideo('2eCHD6f_phE')

Embed a Photo:

Embed an arbitrary IFrame:

In [ ]:
from IPython.display import IFrame
IFrame('https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf', width=700, height=600)