This notebook contains the code examples for Andreas Zeller's keynotes:
Go and
Notebooks – rich, interactive documents that join together code, documentation, and outputs – are all the rage with data scientists. But can they be used for actual software development? In this talk, I share experiences from authoring two interactive textbooks – fuzzingbook.org and debuggingbook.org – and show how notebooks not only serve for exploring and explaining code and data, but also how they can be used as software modules, integrating self-checking documentation, tests, and tutorials all in one place. The resulting software focuses on the essential, is well-documented, highly maintainable, easily extensible, and has a much higher shelf life than the "duct tape and wire” prototypes frequently found in research and beyond.
import bookutils.setup
from Tracer import Tracer
from typing import Any, Callable
from types import FrameType
from inspect import signature, getmembers
class SequenceDiagramTracer(Tracer):
def __init__(self, client='Client', server='Server'):
super().__init__()
self.lines = []
self.client = client
self.server = server
def traceit(self, frame: FrameType, event: str, arg: Any):
if event == 'call':
func = frame.f_code.co_name
args = frame.f_locals
args_line = ", ".join(reversed([var + "=" + repr(args[var])
for var in args]))
line = f'{self.client}->>+{self.server}: {func}({args_line})'
self.lines.append(line)
if event == 'return':
line = f'{self.server}-->>-{self.client}'
if arg is not None:
line += f': {repr(arg)}'
self.lines.append(line)
def _repr_markdown_(self) -> str:
return '\n'.join(['```mermaid'] +
['sequenceDiagram'] +
self.lines +
['```'])
class ClassDiagram():
def __init__(self, cls):
self.cls = cls
def methods_str(self):
members = [(name, fun) for (name, fun) in getmembers(self.cls)
if not name.startswith('_')]
attributes = '\n'.join([f' -{name} = {repr(member)}'
for (name, member) in members if not callable(member)
])
methods = '\n'.join([f' +{name}{str(signature(member)).replace(" -> ", " ")}'
for (name, member) in members if callable(member)
])
return attributes + '\n' + methods
def _repr_markdown_(self) -> str:
return f"""
```mermaid
classDiagram
direction TD
class Server {{
{self.methods_str()}
}}
```"""
We do a bit of Jupyter demo. Double-click on a cell to edit it. Press Shift
+Return
to execute/render it.
factorial(n)
computes the factorial of n, that is $n! = \prod_{i=1}^n i = 1 \times 2 \times \dots \times n$.
def factorial(n: int) -> int:
return 1 if n <= 1 else n * factorial(n - 1)
factorial(3)
6
assert factorial(3) == 6
We define a function middle(x, y, z)
that returns the "middle" of three integers $x$, $y$, and $z$ – i.e. the one that is neither the maximum nor the minimum of the three.
We show how to use notebooks to
Let us define an interface for middle()
:
def middle_i(x: int, y: int, z: int) -> int:
"""Return the middle of three numbers x, y, z"""
...
Here's an (executable) specification of middle()
:
def middle_spec(x: int, y: int, z: int) -> int:
return sorted([x, y, z])[1]
This specification is executable, so we can easily include examples:
middle_spec(5, 3, 7)
5
Or just write the examples as assertions, so we can use them as tests later:
assert middle_spec(5, 4, 7) == 5
Of course, your specification can also include all sorts of diagrams. (Install jupyterlab-markup for this.)
mermaid
sequenceDiagram
Client->>+Server: middle(5, 4, 7)
Server-->>-Client: 5
For the record, this diagram is created with five lines of Markdown:
```mermaid
sequenceDiagram
Client->>+Server: middle(5, 4, 7)
Server-->>-Client: 5
```
Let us now provide an efficient implementation for middle()
:
def middle(x: int, y: int, z: int) -> int:
"""Return the middle of three numbers x, y, z"""
if y < z:
if x < y:
return y
elif x < z:
return x
else:
if x > y:
return y
elif x > z:
return x
return z
Once written, this is executable:
middle(5, 4, 1)
4
Tests and results become part of the doc!
Why do we implement middle()
as above, rather than using the much shorter middle_spec()
code? Because it is about twice as fast:
import time
import random
def middle_benchmark(middle_fun: Callable[[int, int, int], int],
n: int = 1000000) -> float:
"""Return elapsed time for calling `middle_fun` `n` times"""
elapsed = 0.0
for i in range(n):
x = random.randint(0, 1000)
y = random.randint(0, 1000)
z = random.randint(0, 1000)
start = time.perf_counter()
_ = middle_fun(x, y, z)
end = time.perf_counter()
elapsed += (end - start)
return elapsed
middle_elapsed = middle_benchmark(middle)
middle_elapsed
0.15140068903929205
middle_spec_elapsed = middle_benchmark(middle_spec)
middle_spec_elapsed
0.25074575236430974
The document can include all these experiments and their results as a rationale, as above
The document can also discuss and evaluate more alternatives, reproducing the thoughts and experiments of the original programmer
We can have the document check automatically whether the rationale holds:
assert middle_elapsed < middle_spec_elapsed, "Inconsistent doc"
assert middle_elapsed * 1.2 < middle_spec_elapsed, "Inconsistent doc"
This ensures consistency between text and code.
Tests can be written as additional examples on how the code should work:
assert middle(1, 2, 3) == 2
assert middle(3, 1, 2) == 2
assert middle(2, 3, 1) == 2
If a test fails, that's the same as an example failing. (And examples act as tests.)
One can analyze (and report) test performance, again in the document – for instance, measure the coverage of our code (*
= line is covered during testing)
from StatisticalDebugger import CoverageCollector, code_with_coverage
with CoverageCollector() as c:
assert middle(1, 2, 3) == 2
assert middle(3, 1, 2) == 2
assert middle(2, 3, 1) == 2
code_with_coverage(middle, c.coverage())
1 * def middle(x: int, y: int, z: int) -> int: 2 """Return the middle of three numbers x, y, z""" 3 * if y < z: 4 * if x < y: 5 * return y 6 * elif x < z: 7 return x 8 else: 9 * if x > y: 10 return y 11 * elif x > z: 12 * return x 13 * return z
It seems we need to add more tests to cover all lines:
with c:
assert middle(2, 1, 3) == 2
assert middle(3, 2, 1) == 2
And we achieve 100% coverage:
code_with_coverage(middle, c.coverage())
1 * def middle(x: int, y: int, z: int) -> int: 2 """Return the middle of three numbers x, y, z""" 3 * if y < z: 4 * if x < y: 5 * return y 6 * elif x < z: 7 * return x 8 else: 9 * if x > y: 10 * return y 11 * elif x > z: 12 * return x 13 * return z
Assumptions about coverage can be made in the document, too:
assert len(c.coverage()) >= 11
One can check against the spec, again in the document. Here we compare middle()
against middle_spec()
with 100,000 random numbers.
for i in range(100000):
x = random.randint(0, 1000)
y = random.randint(0, 1000)
z = random.randint(0, 1000)
assert middle(x, y, z) == middle_spec(x, y, z)
All these tests can be run (and debugged) right from the document.
One can also include static checks or symbolic verification. Here, we encode the path constraints from the middle()
code for the Z3 constraint solver:
from z3 import *
s = Solver() # Create a Z3 solver with four variables
x, y, z = Int('x'), Int('y'), Int('z')
m = Int('middle')
s.add(Implies(And(y < z, x < y), m == y)) # Encode the middle() constraints
s.add(Implies(And(y < z, x >= y), m == x))
s.add(Implies(And(y >= z, x > y), m == y))
s.add(Implies(And(y >= z, x <= y), m == x))
s.add(Implies(And(Not(x < y), Not(x < z), Not(x > y), Not(x > z)), m == z))
We can actually prove correctness this way: Is it possible that middle()
returns a wrong result? (no.)
s.add(And(x < y, y < z, m != y)) # This shouldn't be possible
s.check()
We can extract architecture diagrams such as a class diagram from code, always kept up to date:
class Server():
state = 42
def middle(x: int, y: int, z: int) -> int:
return middle(x, y, z)
def min(x: int, y: int, z: int) -> int:
return min(x, y, z)
def max(x: int, y: int, z: int) -> int:
return max(x, y, z)
ClassDiagram(Server)
mermaid
classDiagram
direction TD
class Server {
-state = 42
+max(x: int, y: int, z: int) int
+middle(x: int, y: int, z: int) int
+min(x: int, y: int, z: int) int
}
We can extract dynamic diagrams from executions:
with SequenceDiagramTracer() as tracer:
m = middle(30, 50, 20)
tracer
mermaid
sequenceDiagram
Client->>+Server: middle(z=20, y=50, x=30)
Server-->>-Client: 30
Client->>+Server: __del__(self=middle)
Client->>+Server: ref(self=<z3.z3.Context object at 0x166fff3a0>)
Server-->>-Client: <ContextObj object at 0x166ed1540>
Client->>+Server: ref(self=<z3.z3.Context object at 0x166fff3a0>)
Server-->>-Client: <ContextObj object at 0x166ed1540>
Client->>+Server: as_ast(self=middle)
Server-->>-Client: <Ast object at 0x166ed1840>
Client->>+Server: Z3_dec_ref(_elems=<z3.z3core.Elementaries object at 0x166f83b20>, a1=<Ast object at 0x166ed1840>, a0=<ContextObj object at 0x166ed1540>)
Client->>+Server: from_param(obj=<ContextObj object at 0x166ed1540>)
Server-->>-Client: <ContextObj object at 0x166ed1540>
Client->>+Server: from_param(obj=<Ast object at 0x166ed1840>)
Server-->>-Client: <Ast object at 0x166ed1840>
Server-->>-Client
Server-->>-Client
One may even compare this diagram with the one in the specification and flag mismatches.
The document can contain instructions on how to run things. (Of course, these would be executable too, testing the tutorial.)
To use middle
, you need Python 3.9 or later. First install the debuggingbook
module, available via the Python pip
program:
!pip install --quiet debuggingbook
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. pytest-asyncio 0.19.0 requires pytest>=6.1.0, but you have pytest 3.10.1 which is incompatible. pytest-html 3.1.1 requires pytest!=6.0.0,>=5.0, but you have pytest 3.10.1 which is incompatible. pytest-rerunfailures 10.2 requires pytest>=5.3, but you have pytest 3.10.1 which is incompatible. pytest-xdist 2.5.0 requires pytest>=6.2.0, but you have pytest 3.10.1 which is incompatible.
Within debuggingbook
, the StatisticalDebugger
provides a middle()
function, but it is buggy (as it serves to demonstrate statistical debugging). A correct version is available as middle_fixed()
, which you can import as middle()
as follows:
from debuggingbook.StatisticalDebugger import middle_fixed as middle
middle(5, 4, 1)
4
from z3 import *
s = Solver() # Create a Z3 solver with four variables
x, y, z = Int('x'), Int('y'), Int('z')
m = Int('middle')
s.add(Implies(And(y < z, x < y), m == y)) # Encode the middle() constraints
s.add(Implies(And(y < z, x >= y), m == x))
s.add(Implies(And(y >= z, x > y), m == y))
s.add(Implies(And(y >= z, x <= y), m == x))
s.add(Implies(And(Not(x < y), Not(x < z), Not(x > y), Not(x > z)), m == z))
We can actually prove correctness this way: Is it possible that middle()
returns a wrong result? (no.)
s.add(And(x < y, y < z, m != y)) # This shouldn't be possible
s.check()
The document can contain sections with questions and answers. These would be managed by the public, and continuously ensure consistency.
middle()
?¶Question. I use middle
from the StatisticalDebugger
module. However, it doesn't seem to return the correct value. What am I doing wrong? -- novice@python.net
from StatisticalDebugger import middle
middle(2, 1, 3) # should be 2
1
Best Answer (+10). You need to import middle_fixed
instead. -- expert@debugging.com
from StatisticalDebugger import middle_fixed as middle
middle(2, 1, 3) # should be 2
2
Answer (+5) Don't use middle(x, y, z)
-- just write sorted([x, y, z])[1]
-- say.no.to@libraries.com
sorted([2, 1, 3])[1]
2
Comment (-2) Actually, sort()
is more efficient. -- cpluspluslover@programming.com
xyz = list([2, 1, 3]); xyz.sort(); xyz[1]
2
Comment (+2) sort()
takes three lines of code, whereas sorted()
takes one. Also, avoid multiple statements on a line -- haskellfan@beautifulcode.org
Question. When I run the above middle()
code, I get a message
SyntaxError: invalid syntax
. What am I doing wrong? -- appleuser@mac.com
Best Answer (+10). Are you using Python 2.x? Type annotations work with Python 3.6 and later -- updates@python.org
largest()
and smallest()
function too?¶Question. How do I get the greatest (or smallest) of $x$, $y$, and $z$? -- novice@python.net
Best Answer (+10). Try Python max()
and min()
-- guido@python.org
max(2, 1, 3) # should be 3
3
min(2, 1, 3) # should be 1
1
Install jupyterlab-markup
mermaid
graph TD;
A-->B;
A-->C;
B-->D;
C-->D;
from IPython.display import display, Markdown
Markdown("""
```mermaid
graph TD;
A-->B;
B-->A;
A-->C;
B-->D;
C-->D;
```""")
mermaid
graph TD;
A-->B;
B-->A;
A-->C;
B-->D;
C-->D;
Influences:
Our goal: Have a single document that encompasses
in an executable and self-updating way.
I
is for I
llustrate¶Every piece of code should come with an example that illustrates is usage.
L
is for L
avish¶Go beyond typewriter text. Make use of all media modern tools have to offer – diagrams! charts! videos! Tie these to your code and examples, such that they stay in sync.
L
is for L
og¶Do not just describe the final version. Discuss alternatives you have tried (and revoked, and why). This will be helpful for understanding later.
Note down your ideas and plans during development + testing
Also log during debugging. Keep track of your experiments and their results. Turn these into test cases later.
U
is for U
pdate¶Make your document the single place to be updated.
S
is for S
piralize¶Focus on the essentials first, and add details later. Give the reader moments where they can stay and recapitulate what they learned. Make these abstraction layers in your code, such that readers can choose what to use (and what to read, too!) Proceed step by step, illustrating one piece at a time.
T
is for T
utorial¶Allow your readers to learn what your code is about – from a usage perspective, but also from an implementation perspective. Use quizzes, tests, exercises.
R
is for R
eference¶Allow your readers to extract and study those parts they need – interfaces, specs. And, of course, allow them to use your code.
A
is for A
ssist¶Have others contribute to your tutorial – e.g. by providing recipes for specific how-to questions Think of StackOverflow, but constantly tested and updated
StackOverflow is filled with how-to questions. Yet, many of these are outdated over time. Allow for your readers to ask questions (with code that will be executed and tested as part of your work). Make your work a community effort!
T
is for T
est¶Having detailed examples should automatically lead to full coverage of your code. Have assertions on top – to explain what is going on, but also to ensure consistency between code and text.
Save previous outputs of your examples; get notified when things change. Great way to do regression testing!
E
is for E
ase¶Your job is coding, teaching, and testing – all in one place
Ease the life of future readers (which may include you)
Ease the life of people who want to use your code