The IPython Notebook

A Comprehensive Tool for Data Science

Brian E. Granger

Strata 2013

About me

Physics Professor

  • I teach Physics and do research with undergraduates
  • Cal Poly, San Luis Obispo
  • Research in theoretical and computational quantum mechanics and quantum computing
  • A user of scientific and technical computing tools and libraries

Open source hacker

  • I am a developer of tools and libraries for scientific and technical computing
  • Creator and lead developer of the IPython Notebook
  • Creator of PyZMQ (Python bindings to ZeroMQ)
  • Contributor to SymPy (symbolic mathematics library for Python)

CEO and co-founder

  • Chronicle Labs
  • Early stage startup
  • Building a web/cloud-based environment for interactive and collaborative computing

Telling stories with code and data

Data science is about more than just code and data.

It is about telling stories with code and data.

These data-driven stories include other types of content:

  • Narrative text
  • Images and video
  • Equations
  • Plots and other visualizations

We go through different phases as we tell these stories:

  • Individual exploration
  • Collaborative development
  • Production execution
  • Debugging
  • Publication
  • Presentation
  • Education

We need to tell these stories in different contexts:

  • In collaboration with colleagues
  • In written publications
  • In talks/presentations
  • In the classroom
  • In meetings
  • In boardrooms
  • On the internet

How do you tell your data-driven stories?

Telling stories with code and data is super painful

  • A complex set of tools:
    • MATLAB, Mathematica, Excel, Python, R, MPI, Hadoop, sed, awk, grep, Perl, Bash, git, svn, Word, LaTeX, PowerPoint, Keynote, gnuplot, C, C++, Java, Hive, Pig
    • Some of which are expensive and proprietary
    • Users often have to install and maintain all of these tools on all of their computers
  • Painful workflow:
    • Complex pipeline of tools
    • Difficult to reproduce
    • Detached from the final presentation formats
      • Do you ever have to manually copy and paste content into Word, PowerPoint, or HTML?
      • What if the content changes?
    • Leads to an incomprehensible tangle of code, data, emails, documents, etc.
    • Many of the products are not version control friendly
  • Difficult to collaborate with others
  • Painful to communicate computational results

The IPython Notebook

The IPython Notebook is an open source (BSD) tool for telling stories with code and data that are:

  • Interactive
  • Exploratory
  • Collaborative
  • Open
  • Reproducible

The IPython project

  • IPython: open source (BSD) interactive computing environment in Python
  • History:
    • More than 20 person-years of development, more than 150 contributors
  • IPython is the de facto standard environment for interactive work in Python
  • Funded by:
    • Mostly volunteer effort
    • NASA, DOD/DRC, NIH
    • Microsoft, Enthought
    • Alfred P. Sloan Foundation ($1.15 million grant starting in Jan. 2013)
  • Components:
    • IPython Kernel
      • Stateful computation engine
      • Runs code and returns results
      • Uses a language-agnostic, JSON-based message protocol over ZeroMQ/WebSockets
    • Frontends:
      • Terminal Console
      • Qt Console
      • Notebook
    • Parallel computing framework

Terminal Console

This is the classic terminal-based IPython, with Matplotlib for interactive plotting.

Qt Console

The Qt Console adds inline plotting, syntax highlighting, multiline editing, rich output display, and the two-process kernel model.

What is the IPython Notebook?

An open, JSON-based document format

In [1]:
from IPython.nbformat import current  # notebook format reader used in this talk
with open('StrataIPythonSlides.ipynb') as f:
    nb = current.read(f, 'json')  # parse the JSON file into a notebook object
In [2]:
nb.worksheets[0].cells[0:5]
Out[2]:
[{u'cell_type': u'heading',
  u'level': 1,
  u'metadata': {u'slideshow': {u'slide_type': u'slide'}},
  u'source': u'The IPython Notebook'},
 {u'cell_type': u'heading',
  u'level': 1,
  u'metadata': {},
  u'source': u'A Comprehensive Tool for Data Science'},
 {u'cell_type': u'heading',
  u'level': 1,
  u'metadata': {u'slideshow': {u'slide_type': u'fragment'}},
  u'source': u'Brian E. Granger'},
 {u'cell_type': u'heading',
  u'level': 1,
  u'metadata': {},
  u'source': u'Strata 2013'},
 {u'cell_type': u'markdown',
  u'metadata': {u'slideshow': {u'slide_type': u'slide'}},
  u'source': u'<img src="files/figures/calpoly_logo.png" width=400/>'}]
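
Because the format is plain JSON, the standard `json` module is enough to inspect a notebook without IPython installed. The sketch below (Python 3) uses a hand-written miniature document modeled on the cells printed above (`worksheets`, `cells`, `cell_type`, `metadata`, `source`). Real `.ipynb` files carry more fields, so treat the exact layout and the helper name `count_cell_types` as assumptions for illustration.

```python
import json

# Hand-written miniature of a notebook document, modeled on the output above.
# Real .ipynb files contain more fields; this layout is an illustrative assumption.
notebook_text = json.dumps({
    "metadata": {"name": "demo"},
    "worksheets": [{
        "cells": [
            {"cell_type": "heading", "level": 1, "metadata": {},
             "source": "The IPython Notebook"},
            {"cell_type": "heading", "level": 1, "metadata": {},
             "source": "Strata 2013"},
            {"cell_type": "markdown", "metadata": {},
             "source": "Telling stories with code and data"},
        ]
    }]
}, indent=1)


def count_cell_types(text):
    """Count cells by type using nothing but the json module."""
    document = json.loads(text)
    counts = {}
    for worksheet in document["worksheets"]:
        for cell in worksheet["cells"]:
            kind = cell["cell_type"]
            counts[kind] = counts.get(kind, 0) + 1
    return counts


print(count_cell_types(notebook_text))
```

Any language with a JSON parser can do the same, which is what makes the format open in practice.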

Notebook documents

  • Are stored as files in your local directory
  • Can store:
    • Code in any language
    • Text (Markdown)
    • Equations (LaTeX)
    • Images
    • Links to video
    • HTML
  • Can be version controlled
    • Change 1 line of code, get a 1 line diff
  • Can be viewed by anyone online without IPython installed (http://nbviewer.ipython.org/)
  • Can be exported to HTML, Markdown, reStructured Text, LaTeX, PDF
  • Can be viewed as slideshows with live computations
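
The "1 line diff" property follows from how the JSON is laid out on disk. The sketch below (Python 3) is not the real notebook writer: `serialize` is a hypothetical helper that assumes each source line is stored as its own list element and that the JSON is written with stable key order and indentation. Under those assumptions, editing one line of code changes exactly one line of the file.

```python
import difflib
import json


def serialize(cells):
    """Serialize cells deterministically, one source line per JSON line."""
    document = {
        "cells": [
            {"cell_type": "code", "source": cell.splitlines()} for cell in cells
        ]
    }
    return json.dumps(document, indent=1, sort_keys=True).splitlines()


before = serialize(["import math\nradius = 2\nprint(math.pi * radius ** 2)"])
after = serialize(["import math\nradius = 3\nprint(math.pi * radius ** 2)"])

# Drop the two file-header lines, then keep only removed/added lines
# (hunk headers start with "@@" and are filtered out).
diff_lines = list(difflib.unified_diff(before, after, lineterm="", n=0))[2:]
changed = [line for line in diff_lines if line.startswith(("+", "-"))]
print("\n".join(changed))
```

The diff holds one removed line and one added line, which is what a version control system would show for the same edit.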

What is the IPython Notebook?

A web-based UI for writing and running code

We try to make writing code pleasant:

  • Tab completion
  • Integrated help
  • Syntax highlighting
  • Civilized multiline editing
  • Interactive shorthands (aliases, magics)
In [3]:
%pylab inline
Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.kernel.zmq.pylab.backend_inline].
For more information, type 'help(pylab)'.
In [4]:
plot(rand(50))
Out[4]:
[<matplotlib.lines.Line2D at 0x108e14650>]

Not just Python code though! Through cell magics (%%), the Notebook supports running code in other languages:

  • R
  • Octave
  • Cython
  • Bash
  • Perl
  • Ruby
  • Julia
  • etc.
In [5]:
%%bash
echo "Hi there Strata!"
Hi there Strata!
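
Conceptually, a script-style cell magic hands the cell body to another interpreter and shows what comes back. The following is a rough sketch of that idea in Python 3, not IPython's implementation. `run_cell_with` is a hypothetical helper, and the Python interpreter stands in for the "other language" so the example runs on any machine.

```python
import subprocess
import sys


def run_cell_with(command, cell_body):
    """Feed the cell text to another interpreter on stdin and return its stdout."""
    result = subprocess.run(
        command,
        input=cell_body,
        capture_output=True,
        text=True,
        check=True,  # raise if the interpreter exits with an error
    )
    return result.stdout


# "python -" reads the program from stdin. On a machine with bash installed,
# run_cell_with(["bash"], 'echo "Hi there Strata!"') would mirror the %%bash cell.
output = run_cell_with([sys.executable, "-"], 'print("Hi there Strata!")')
print(output, end="")
```

Any program that reads a script from standard input can be plugged in the same way, which is why adding another language this way requires no changes to the Notebook itself.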

The IPython message protocol and the Notebook web application are language independent:

  • The IPython Kernel can be replaced by kernels in other languages
  • Initial work has begun on Ruby, node.js
  • Plans for native R, Julia, MATLAB kernels

This is not just about Python! While we are passionate about Python, we deeply believe that data science is a multi-language enterprise and any tool that ignores that is fatally flawed.

Live demo Notebooks

In the live talk, I will show some live Notebook examples.

If you are viewing this slide show later, here are static versions of IPython's demo notebooks:

IPython Demo Notebooks

What are users doing with IPython Notebooks?

They are:

  • Telling data-driven stories
  • Using the Notebook as their daily working environment for data science
  • Sharing their Notebooks on GitHub and http://nbviewer.ipython.org

We have collected some of the highlights in our notebook gallery

An ecosystem of third party tools

IPython provides an open source foundation for an ecosystem of tools and products

The future

  • Notebook conversion (2012)
    • To and from a wide range of formats
    • HTML, LaTeX, PDF, Markdown, reStructured Text
  • Interactive widgets (2012)
    • The IPython kernel already knows how to send JSON representations of objects to the browser
    • We are developing an architecture that will allow interactive JavaScript widgets and Python objects to pass JSON data back and forth.
    • Imagine all of your d3 visualizations backed by the power of Python/NumPy/SciPy/Pandas/etc.
    • Automatic generation of UI controls
  • Multiuser Notebook server (2013)
    • Small to medium groups of users
    • Trusted users: people you would give shell accounts to

See our development roadmap for details

More information

  • Visit the IPython website
  • IPython is completely open source so you can download it and play with it today
  • The IPython Notebook is a single user web application that you run on your local computer

      $ ipython notebook
  • Quick installation instructions

  • Strata Office Hours
    • Wes McKinney (author of Pandas) and I
    • Today, 3 pm, Expo Hall, Table B
    • All things related to Python and data science
  • Strata BOF on Python
    • Thursday, lunch, Expo Hall, Table 6
  • I am ellisonbg on GitHub, Twitter, Gmail

Chronicle Labs

chronicle.io

Team: founded by the folks who created and lead the IPython project

  • Brian Granger (CEO and co-founder)
  • Fernando Perez (co-founder)
  • Min Ragan-Kelley (co-founder)
  • Stefan van der Walt

Vision: Simple, collaborative and interactive computing in the cloud based on the IPython Notebook

Focus:

  • Scientific and technical computing
  • Data science ≠ Big Data
  • Education

More details coming soon. Visit our website and sign up for further information.

Conclusion

The IPython Notebook provides an open source environment and foundation for telling stories with code and data

It puts the fun back into working with code and data

Formatting

In [1]:
from IPython.display import HTML
HTML("""
<style> 

div.cell {
  width: 940px;
  margin-left: auto;
  margin-right: auto;
}

.rendered_html {
  font-size: 123%;
}

</style>""")