Notebook

Introduction to Debugging¶

In this book, we want to explore debugging - the art and science of fixing bugs in computer software. In particular, we want to explore techniques that automatically answer questions like: Where is the bug? When does it occur? And how can we repair it? But before we start automating the debugging process, we first need to understand what this process is.

In this chapter, we introduce basic concepts of systematic software debugging and the debugging process, and at the same time get acquainted with Python and interactive notebooks.

In [1]:

from bookutils import YouTubeVideo, quiz
YouTubeVideo("bCHRCehDOq0")

Out[1]:

Prerequisites

The book is meant to be a standalone reference; however, a number of great books on debugging are listed at the end,
Knowing a bit of Python is helpful for understanding the code examples in the book.

Synopsis¶

To use the code provided in this chapter, write

>>> from debuggingbook.Intro_Debugging import <identifier>

and then make use of the following features.

In this chapter, we introduce some basics of how failures come to be as well as a general process for debugging.

A Simple Function¶

In [2]:

# ignore

# If this is the first time you are opening a notebook, please read this.
#
# Interaction basics
# ------------------
#
# * The notebook is composed of _code cells_ (like this one) and
#   _text cells_ (like the one above).
# 
# * To interact with the notebook, click into a cell (or type `Return`) 
#   and start editing. Type `Shift+Return` to execute the code, or 
#   `Esc` to leave without executing it.
# 
# * Since code cells depend on earlier cells, you will have to execute
#   those first - beginning from the top of the notebook.
# 
# * You can change code (and text) at your leisure to try out alternatives.
#   In Jupyter, use `b` to add a new cell below, and `Return` to enter it.
#   Type `x` to delete (cut) a cell, and `z` to undo this.
# 
# * There's a `Help` menu at the top of Jupyter Notebook. Enjoy!
#
#
# `# ignore` markers
# ------------------
#
# * In the notebook, there are some extra code cells starting with `# ignore`
#   (like this one, actually). These are code blocks used to create diagrams,
#   run tests, create or tear down special environments or more - code blocks
#   that are not necessary for reading (but for creation).
#
#
# `# type: ignore` markers
# ------------------------
#
# * In the notebook, some lines come with a comment `# type: ignore`. This tells
#   static code checkers to ignore any typing errors in that line. Most frequently,
#   this is used to mark untyped Python code (as in our examples) such that the
#   static code checker does not mark it as erroneous.

Your Task: Remove HTML Markup¶

Let us start with a simple example. You may have heard of how documents on the Web are made out of text and HTML markup. HTML markup consists of tags in angle brackets that surround the text, providing additional information on how the text should be interpreted. For instance, in the HTML text

This is <em>emphasized</em>.

the word "emphasized" is enclosed in the HTML tags <em> (start) and </em> (end), meaning that it should be interpreted (and rendered) in an emphasized way – typically in italics. In your environment, the HTML text gets rendered as

This is emphasized.

There's HTML tags for pretty much everything – text markup (bold text, ~~strikethrough~~ text), text structure (titles, lists), references (links) to other documents, and many more. These HTML tags shape the Web as we know it.

However, with all the HTML markup, it may become difficult to actually access the text that lies within. We'd like to implement a simple function that removes HTML markup and converts it into text. If our input is

Here's some <strong>strong argument</strong>.

the output should be

Here's some strong argument.

Here's a Python function which does exactly this. It takes a (HTML) string and returns the text without markup.

In [3]:

def remove_html_markup(s):  # type: ignore
    tag = False
    out = ""

    for c in s:
        if c == '<':    # start of markup
            tag = True
        elif c == '>':  # end of markup
            tag = False
        elif not tag:
            out = out + c

    return out

This function works, but not always. Before we start debugging things, let us first explore its code and how it works.

Understanding Python Programs¶

If you're new to Python, you might first have to understand what the above code does. We very much recommend the Python tutorial to get an idea on how Python works. The most important things for you to understand the above code are these three:

Python structures programs through indentation, so the function and for bodies are defined by being indented.
Python is dynamically typed, meaning that the type of variables like c, tag, or out is determined at run-time.
Most of Python's syntactic features are inspired by other common languages, such as control structures (while, if, for), assignments (=), or comparisons (==, !=, <).

With that, you can already understand what the above code does: remove_html_markup() takes a (HTML) string s and then iterates over the individual characters (for c in s). By default, these characters are added to the return string out. However, if remove_html_markup() finds a < character, it sets the tag flag, meaning that all further characters are ignored until a > character is found.

In contrast to other languages, Python makes no difference between strings and characters – there are only strings. As in HTML, strings can be enclosed in single quotes ('a') and in double quotes ("a"). This is useful if you want to specify a string that contains quotes, as in 'She said "hello", and then left' or "The first character is a 'c'"

Running a Function¶

To find out whether remove_html_markup() works correctly, we can test it with a few values. For the string

Here's some <strong>strong argument</strong>.

for instance, it produces the correct value:

In [4]:

remove_html_markup("Here's some <strong>strong argument</strong>.")

Out[4]:

"Here's some strong argument."

Interacting with Notebooks¶

If you are reading this in the interactive notebook, you can try out remove_html_markup() with other values as well. Click on the above cells with the invocation of remove_html_markup() and change the value – say, to remove_html_markup("<em>foo</em>"). Press Shift+Enter (or click on the play symbol) to execute it and see the result. If you get an error message, go to the above cell with the definition of remove_html_markup() and execute this first. You can also run all cells at once; see the Notebook menu for details. (You can actually also change the text by clicking on it, and corect mistaks such as in this sentence.)

Executing a single cell does not execute other cells, so if your cell builds on a definition in another cell that you have not executed yet, you will get an error. You can select Run all cells above from the menu to ensure all definitions are set.

Also keep in mind that, unless overwritten, all definitions are kept across executions. Occasionally, it thus helps to restart the kernel (i.e. start the Python interpreter from scratch) to get rid of older, superfluous definitions.

Testing a Function¶

Since one can change not only invocations, but also definitions, we want to ensure that our function works properly now and in the future. To this end, we introduce tests through assertions – a statement that fails if the given check is false. The following assertion, for instance, checks that the above call to remove_html_markup() returns the correct value:

In [5]:

assert remove_html_markup("Here's some <strong>strong argument</strong>.") == \
    "Here's some strong argument."

If you change the code of remove_html_markup() such that the above assertion fails, you will have introduced a bug.

Oops! A Bug!¶

As nice and simple as remove_html_markup() is, it is buggy. Some HTML markup is not properly stripped away. Consider this HTML tag:

<input type="text" value="<your name>">

This would render as an input field in a form:

If we feed this string into remove_html_markup(), we would expect an empty string as the result. Instead, this is what we get:

In [6]:

remove_html_markup('<input type="text" value="<your name>">')

Out[6]:

'"'

Every time we encounter a bug, this means that our earlier tests have failed. We thus need to introduce another test that documents not only how the bug came to be, but also the result we actually expected.

The assertion we write now fails with an error message. (The ExpectError magic ensures we see the error message, but the rest of the notebook is still executed.)

In [7]:

from ExpectError import ExpectError

In [8]:

with ExpectError():
    assert remove_html_markup('<input type="text" value="<your name>">') == ""

Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_20417/145241764.py", line 2, in <cell line: 1>
    assert remove_html_markup('<input type="text" value="<your name>">') == ""
AssertionError (expected)

With this, we now have our task: Fix the failure as above.

Visualizing Code¶

To properly understand what is going on here, it helps drawing a diagram on how remove_html_markup() works. Technically, remove_html_markup() implements a state machine with two states tag and ¬ tag. We change between these states depending on the characters we process. This is visualized in the following diagram:

In [9]:

from graphviz import Digraph, nohtml

In [10]:

from IPython.display import display

In [11]:

# ignore
PASS = "✔"
FAIL = "✘"

PASS_COLOR = 'darkgreen'  # '#006400' # darkgreen
FAIL_COLOR = 'red4'  # '#8B0000' # darkred

STEP_COLOR = 'peachpuff'
FONT_NAME = 'Raleway'

In [12]:

# ignore
def graph(comment: str ="default") -> Digraph:
    return Digraph(name='', comment=comment, graph_attr={'rankdir': 'LR'},
        node_attr={'style': 'filled',
                   'fillcolor': STEP_COLOR,
                   'fontname': FONT_NAME},
        edge_attr={'fontname': FONT_NAME})

In [13]:

# ignore
state_machine = graph()
state_machine.node('Start', )
state_machine.edge('Start', '¬ tag')
state_machine.edge('¬ tag', '¬ tag', label=" ¬ '<'\\nadd character")
state_machine.edge('¬ tag', 'tag', label="'<'")
state_machine.edge('tag', '¬ tag', label="'>'")
state_machine.edge('tag', 'tag', label="¬ '>'")

In [14]:

# ignore
display(state_machine)

You see that we start in the non-tag state (¬ tag). Here, for every character that is not '<', we add the character and stay in the same state. When we read a '<', though, we end in the tag state (tag) and stay in that state (skipping characters) until we find a closing '>' character.

A First Fix¶

We will now iteratively try to find a correct solution to our bug. On the way we will come up with different interim solutions that bring us closer and closer to the perfect code. Of course, we could come up with a near-perfect solution right away, but each of these interim solutions illustrate some techniques we can use during debugging.

Let us look at the above state machine, and process through our input:

<input type="text" value="<your name>">

So what you can see is: We are interpreting the '>' of "<your name>" as the closing of the tag. However, this is a quoted string, so the '>' should be interpreted as a regular character, not as markup. This is an example of missing functionality: We do not handle quoted characters correctly. We haven't claimed yet to take care of all functionality, so we still need to extend our code.

So we extend the whole thing. We set up a special "quote" state which processes quoted inputs in tags until the end of the quoted string is reached. This is how the state machine looks like:

In [15]:

# ignore
state_machine = graph()
state_machine.node('Start')
state_machine.edge('Start', '¬ quote\\n¬ tag')
state_machine.edge('¬ quote\\n¬ tag', '¬ quote\\n¬ tag',
                   label="¬ '<'\\nadd character")
state_machine.edge('¬ quote\\n¬ tag', '¬ quote\\ntag', label="'<'")
state_machine.edge('¬ quote\\ntag', 'quote\\ntag', label="'\"'")
state_machine.edge('¬ quote\\ntag', '¬ quote\\ntag', label="¬ '\"' ∧ ¬ '>'")
state_machine.edge('quote\\ntag', 'quote\\ntag', label="¬ '\"'")
state_machine.edge('quote\\ntag', '¬ quote\\ntag', label="'\"'")
state_machine.edge('¬ quote\\ntag', '¬ quote\\n¬ tag', label="'>'")

In [16]:

# ignore
display(state_machine)

This is a bit more complex already. Proceeding from left to right, we first have the state ¬ quote ∧ ¬ tag, which is our "standard" state for text. If we encounter a '<', we again switch to the "tagged" state ¬ quote ∧ tag. In this state, however (and only in this state), if we encounter a quotation mark, we switch to the "quotation" state quote ∧ tag, in which we remain until we see another quotation mark indicating the end of the string – and then continue in the "tagged" state ¬ quote ∧ tag until we see the end of the string.

Things get even more complicated as HTML allows both single and double quotation characters. Here's a revised implementation of remove_html_markup() that takes the above states into account:

In [17]:

def remove_html_markup(s):  # type: ignore
    tag = False
    quote = False
    out = ""

    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:
            quote = not quote
        elif not tag:
            out = out + c

    return out

Now, our previous input works well:

In [18]:

remove_html_markup('<input type="text" value="<your name>">')

Out[18]:

''

and our earlier tests also pass:

In [19]:

assert remove_html_markup("Here's some <strong>strong argument</strong>.") == \
    "Here's some strong argument."

In [20]:

assert remove_html_markup('<input type="text" value="<your name>">') == ""

However, the above code still has a bug. In two of these inputs, HTML markup is still not properly stripped:

<b>foo</b>
<b>"foo"</b>
"<b>foo</b>"
<"b">foo</"b">

Can you guess which ones these are?

Again, a simple assertion will reveal the culprits:

In [21]:

with ExpectError():
    assert remove_html_markup('<b>foo</b>') == 'foo'

In [22]:

with ExpectError():
    assert remove_html_markup('<b>"foo"</b>') == '"foo"'

Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_20417/4031927246.py", line 2, in <cell line: 1>
    assert remove_html_markup('<b>"foo"</b>') == '"foo"'
AssertionError (expected)

In [23]:

with ExpectError():
    assert remove_html_markup('"<b>foo</b>"') == '"foo"'

Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_20417/242211734.py", line 2, in <cell line: 1>
    assert remove_html_markup('"<b>foo</b>"') == '"foo"'
AssertionError (expected)

In [24]:

with ExpectError():
    assert remove_html_markup('<"b">foo</"b">') == 'foo'

So, unfortunately, we're not done yet – our function still has errors.

The Devil's Guide to Debugging¶

Let us now discuss a couple of methods that do not work well for debugging. (These "devil's suggestions" are adapted from the 1993 book "Code Complete" from Steve McConnell.)

Printf Debugging¶

When I was a student, I never got any formal training in debugging, so I had to figure this out for myself. What I learned was how to use debugging output; in Python, this would be the print() function. For instance, I would go and scatter print() calls everywhere:

In [25]:

def remove_html_markup_with_print(s):  # type: ignore
    tag = False
    quote = False
    out = ""

    for c in s:
        print("c =", repr(c), "tag =", tag, "quote =", quote)

        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:
            quote = not quote
        elif not tag:
            out = out + c

    return out

This way of inspecting executions is commonly called "Printf debugging", after the C printf() function. Then, running this would allow me to see what's going on in my code:

In [26]:

remove_html_markup_with_print('<b>"foo"</b>')

c = '<' tag = False quote = False
c = 'b' tag = True quote = False
c = '>' tag = True quote = False
c = '"' tag = False quote = False
c = 'f' tag = False quote = True
c = 'o' tag = False quote = True
c = 'o' tag = False quote = True
c = '"' tag = False quote = True
c = '<' tag = False quote = False
c = '/' tag = True quote = False
c = 'b' tag = True quote = False
c = '>' tag = True quote = False

Out[26]:

'foo'

Yes, one sees what is going on – but this is horribly inefficient! Think of a 1,000-character input – you'd have to go through 1,000 lines of logs. It may help you, but it's a total time waster. Plus, you have to enter these statements, remove them again... it's a maintenance nightmare.

(You may even forget printf's in your code, creating a security problem: Mac OS X versions 10.7 to 10.7.3 would log the password in clear because someone had forgotten to turn off debugging output.)

Debugging into Existence¶

I would also try to debug the program into existence. Just change things until they work. Let me see: If I remove the conditions "and not quote" from the program, it would actually work again:

In [27]:

def remove_html_markup_without_quotes(s):  # type: ignore
    tag = False
    quote = False
    out = ""

    for c in s:
        if c == '<':  # and not quote:
            tag = True
        elif c == '>':  # and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:
            quote = not quote
        elif not tag:
            out = out + c

    return out

In [28]:

assert remove_html_markup_without_quotes('<b id="bar">foo</b>') == 'foo'

Cool! Unfortunately, the function still fails on the other input:

In [29]:

with ExpectError():
    assert remove_html_markup_without_quotes('<b>"foo"</b>') == '"foo"'

Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_20417/3864559921.py", line 2, in <cell line: 1>
    assert remove_html_markup_without_quotes('<b>"foo"</b>') == '"foo"'
AssertionError (expected)

So, maybe we can change things again, such that both work? And maybe the other tests we had earlier won't fail? Let's just continue to change things randomly again and again and again.

Oh, and of course, I would never back up earlier versions such that I would be able to keep track of what has changed and when. Especially keep in mind that our tests will never cover all corner cases. Randomly patching code will just lead to harder to find bugs.

Use the Most Obvious Fix¶

My favorite: Use the most obvious fix. This means that you're fixing the symptom, not the problem. In our case, this would be something like:

In [30]:

def remove_html_markup_fixed(s):  # type: ignore
    if s == '<b>"foo"</b>':
        return '"foo"'
    ...

Miracle! Our earlier failing assertion now works! Now we can do the same for the other failing test, too, and we're done. (Rumor has it that some programmers use this technique to get their tests to pass...)

Things to do Instead¶

As with any devil's guide, you get an idea of how to do things by doing the opposite. What this means is:

Understand the code
Fix the problem, not the symptom
Proceed systematically

which is what we will apply for the rest of this chapter.

From Defect to Failure¶

To understand how to systematically debug a program, we first have to understand how failures come to be. The typical debugging situation looks like this:

We have a program (execution), taking some input and producing some output.
The output is in error (✘), meaning an unwanted and unintended deviation from what is correct, right, or true.
The input, in contrast, is assumed to be correct (✔) – that is, it should be properly processed by the program in question.

If the input is incorrect, we wouldn't normally be searching for the bug in our program, but in whatever produced its input. If the input is under control of a third party, though (which typically is the case for system inputs), your program must check its correctness and reject it if it is incorrect. Once your program accepts a system input as valid, it must be properly processed by the program in question.

Hence, the (very simple) situation we're having can be shown as:

In [31]:

# ignore
from typing import List, Optional

In [32]:

# ignore
def execution_diagram(show_steps: bool = True, variables: List[str] = [],
                      steps: int = 3, error_step: int = 666,
                      until: int = 666, fault_path: List[str] = []) -> Digraph:
    dot = graph()

    dot.node('input', shape='none', fillcolor='white', label=f"Input {PASS}",
             fontcolor=PASS_COLOR)
    last_outgoing_states = ['input']

    for step in range(1, min(steps + 1, until)):

        step_color: Optional[str]
        if step == error_step:
            step_label = f'Step {step} {FAIL}'
            step_color = FAIL_COLOR
        else:
            step_label = f'Step {step}'
            step_color = None

        if step >= error_step:
            state_label = f'State {step} {FAIL}'
            state_color = FAIL_COLOR
        else:
            state_label = f'State {step} {PASS}'
            state_color = PASS_COLOR

        state_name = f's{step}'
        outgoing_states = []
        incoming_states = []

        if not variables:
            dot.node(name=state_name, shape='box',
                     label=state_label, color=state_color,
                     fontcolor=state_color)
        else:
            var_labels = []
            for v in variables:
                vpath = f's{step}:{v}'
                if vpath in fault_path:
                    var_label = f'<{v}>{v} ✘'
                    outgoing_states.append(vpath)
                    incoming_states.append(vpath)
                else:
                    var_label = f'<{v}>{v}'
                var_labels.append(var_label)
            record_string = " | ".join(var_labels)
            dot.node(name=state_name, shape='record',
                     label=nohtml(record_string), color=state_color,
                     fontcolor=state_color)

        if not outgoing_states:
            outgoing_states = [state_name]
        if not incoming_states:
            incoming_states = [state_name]

        for outgoing_state in last_outgoing_states:
            for incoming_state in incoming_states:
                if show_steps:
                    dot.edge(outgoing_state, incoming_state,
                             label=step_label, fontcolor=step_color)
                else:
                    dot.edge(outgoing_state, incoming_state)

        last_outgoing_states = outgoing_states

    if until > steps + 1:
        # Show output
        if error_step > steps:
            dot.node('output', shape='none', fillcolor='white',
                     label=f"Output {PASS}", fontcolor=PASS_COLOR)
        else:
            dot.node('output', shape='none', fillcolor='white',
                     label=f"Output {FAIL}", fontcolor=FAIL_COLOR)

        for outgoing_state in last_outgoing_states:
            label = "Execution" if steps == 0 else None
            dot.edge(outgoing_state, 'output', label=label)

    display(dot)

In [33]:

# ignore
execution_diagram(show_steps=False, steps=0, error_step=0)

This situation we see above is what we call a failure: An externally visible error in the program behavior, with the error again being an unwanted and unintended deviation from what is correct, right, or true.

How does this failure come to be? The execution we see above breaks down into several program states, one after the other.

In [34]:

# ignore
for until in range(1, 6):
    execution_diagram(show_steps=False, until=until, error_step=2)

Initially, the program state is still correct (✔). However, at some point in the execution, the state gets an error, also known as a fault. This fault – again an unwanted and unintended deviation from what is correct, right, or true – then propagates along the execution, until it becomes externally visible as a failure. (In reality, there are many, many more states than just this, but these would not fit in a diagram.)

How does a fault come to be? Each of these program states is produced by a step in the program code. These steps take a state as input and produce another state as output. Technically speaking, the program inputs and outputs are also parts of the program state, so the input flows into the first step, and the output is the state produced by the last step.

In [35]:

# ignore
for until in range(1, 6):
    execution_diagram(show_steps=True, until=until, error_step=2)

Now, in the diagram above, Step 2 gets a correct state as input and produces a faulty state as output. The produced fault then propagates across more steps to finally become visible as a failure.

The goal of debugging thus is to search for the step in which the state first becomes faulty. The code associated with this step is again an error – an unwanted and unintended deviation from what is correct, right, or true – and is called a defect. This is what we have to find – and to fix.

Sounds easy, right? Unfortunately, things are not that easy, and that has something to do with the program state. Let us assume our state consists of three variables, var1 to var3, and that Step 2 produces a fault in var2. This fault then propagates to the output:

In [36]:

# ignore
for until in range(1, 6):
    execution_diagram(show_steps=True, variables=['var1', 'var2', 'var3'],
                      error_step=2,
                      until=until, fault_path=['s2:var2', 's3:var2'])

The way these faults propagate is called a cause-effect chain:

The defect in the code causes a fault in the state when executed.
This fault in the state then propagates through further execution steps...
... until it becomes visible as a failure.

Even worse: in most cases the fault often expresses itself first in the control flow rather than the variable contents. Such a deviation from a normal execution is in general even harder to find.

Since the code was originally written by a human, any defect can be related to some original mistake the programmer made. This gives us a number of terms that all are more precise than the general "error" or the colloquial "bug":

A mistake is a human act or decision resulting in an error.
A defect is an error in the program code. Also called bug.
A fault is an error in the program state. Also called infection.
A failure is an externally visible error in the program behavior. Also called malfunction.

The cause-effect chain of events is thus

Mistake → Defect → Fault → ... → Fault → Failure

Note that not every defect also causes a failure, which is despite all testing, there can still be defects in the code looming around until the right conditions are met to trigger them. On the other hand, though, every failure can be traced back to the defect that causes it. Our job is to break the cause-effect chain.

From Failure to Defect¶

To find a defect from a failure, we trace back the faults along their propagation – that is, we find out which faults in the earlier state have caused the later faults. We start from the very end of the execution and then gradually progress backwards in time, examining fault after fault until we find a transition from a correct state to a faulty state – that is, a step in which a correct state comes in and a faulty state comes out. At this point, we have found the origin of the failure – and the defect that causes it.

What sounds like a straight-forward strategy, unfortunately, doesn't always work this way in practice. That is because of the following problems of debugging:

First, program states are actually large, encompassing dozens to thousands of variables, possibly even more. If you have to search all of these manually and check them for faults, you will spend a lot of time for a single state.
Second, you do not always know whether a state is correct or not. While most programs have some form of specification for their inputs and outputs, these do not necessarily exist for intermediate results. If one had a specification that could check each state for correctness (possibly even automatically), debugging would be trivial. Unfortunately, it is not, and that's partly due to the lack of specifications.
Third, executions typically do not come in a handful of steps, as in the diagrams above; instead, they can easily encompass thousands to millions of steps. This means that you will have to examine not just one state, but several, making the problem much worse.

To make your search efficient, you thus have to focus your search – starting with most likely causes and gradually progressing to the less probable causes. This is what we call a debugging strategy. Intermediate assertions of the program state also come in handy, as they make the cause-effect chain smaller.

The Scientific Method¶

Now that we know how failures come to be, let's look into how to systematically find their causes. What we need is a strategy that helps us search for how and when the failure comes to be. For this, we use a process called the scientific method.

When we are debugging a program, we are trying to find the causes of a given effect – very much like natural scientists try to understand why things in nature are as they are and how they come to be. Over thousands of years, scientists have conducted observations and experiments to come to an understanding of how our world works. The process by which experimental scientists operate has been coined "The scientific method". This is how it works:

Formulate a question, as in "Why does this apple fall down?".
Invent a hypothesis based on knowledge obtained while formulating the question, that may explain the observed behavior.
Determining the logical consequences of the hypothesis, formulate a prediction that can support or refute the hypothesis. Ideally, the prediction would distinguish the hypothesis from likely alternatives.
Test the prediction (and thus the hypothesis) in an experiment. If the prediction holds, confidence in the hypothesis increases; otherwise, it decreases.
Repeat Steps 2–4 until there are no discrepancies between hypothesis and predictions and/or observations.

At this point, your hypothesis may be named a theory – that is, a predictive and comprehensive description of some aspect of the natural world. The gravitational theory, for instance, predicts very well how the moon revolves around the earth, and how the earth revolves around the sun. Our debugging problems are of a slightly lesser scale – we'd like a theory of how our failure came to be – but the process is pretty much the same.

In [37]:

# ignore
dot = graph()

dot.node('Hypothesis')
dot.node('Observation')
dot.node('Prediction')
dot.node('Experiment')

dot.edge('Hypothesis', 'Observation',
         label="<Hypothesis<BR/>is <I>supported:</I><BR/>Refine it>",
         dir='back')
dot.edge('Hypothesis', 'Prediction')

dot.node('Problem Report', shape='none', fillcolor='white')
dot.edge('Problem Report', 'Hypothesis')

dot.node('Code', shape='none', fillcolor='white')
dot.edge('Code', 'Hypothesis')

dot.node('Runs', shape='none', fillcolor='white')
dot.edge('Runs', 'Hypothesis')

dot.node('More Runs', shape='none', fillcolor='white')
dot.edge('More Runs', 'Hypothesis')

dot.edge('Prediction', 'Experiment')
dot.edge('Experiment', 'Observation')
dot.edge('Observation', 'Hypothesis',
         label="<Hypothesis<BR/>is <I>rejected:</I><BR/>Seek alternative>")

In [38]:

# ignore
display(dot)

In debugging, we proceed the very same way – indeed, we are treating bugs as if they were natural phenomena. This analogy may sound far-fetched, as programs are anything but natural. Nature, by definition, is not under our control. But bugs are out of our control just as well. Hence, the analogy is not that far-fetched – and we can apply the same techniques for debugging.

Finding a Hypothesis¶

Let us apply the scientific method to our Python program which removes HTML tags. First of all, let us recall the problem – remove_html_markup() works for some inputs, but fails on others.

In [39]:

for i, html in enumerate(['<b>foo</b>',
                          '<b>"foo"</b>',
                          '"<b>foo</b>"',
                          '<b id="bar">foo</b>']):
    result = remove_html_markup(html)
    print("%-2d %-15s %s" % (i + 1, html, result))

1  <b>foo</b>      foo
2  <b>"foo"</b>    foo
3  "<b>foo</b>"    <b>foo</b>
4  <b id="bar">foo</b> foo

Input #1 and #4 work as expected, the others do not. We can write these down in a table, such that we can always look back at our previous results:

Input	Expectation	Output	Outcome
`<b>foo</b>`	`foo`	`foo`	✔
`<b>"foo"</b>`	`"foo"`	`foo`	✘
`"<b>foo</b>"`	`"foo"`	`<b>foo</b>`	✘
`<b id="bar">foo</b>`	`foo`	`foo`	✔

In [40]:

quiz("From the difference between success and failure,"
     " we can already devise some observations about "
     " what is wrong with the output."
     " Which of these can we turn into general hypotheses?",
    [
        "Double quotes (`\"`) are stripped from the tagged input.",
        "Tags in double quotes are not stripped.",
        "The tag `<>` is always stripped from the input.",
        "Four-letter words are stripped."
    ], '[298 % 33, 1234 % 616]')

Out[40]:

Quiz

From the difference between success and failure, we can already devise some observations about what is wrong with the output. Which of these can we turn into general hypotheses?

Double quotes (") are stripped from the tagged input.
Tags in double quotes are not stripped.
The tag <> is always stripped from the input.
Four-letter words are stripped.

Testing a Hypothesis¶

The hypotheses that remain are:

Double quotes are stripped from the tagged input.
Tags in double quotes are not stripped.

These may be two separate issues, but chances are they are tied to each other. Let's focus on 1., because it is simpler. Does it hold for all inputs, even untagged ones? Our hypothesis becomes

Double quotes are stripped from the ~~tagged~~ input.

Let's devise an experiment to validate this. If we feed the string

"foo"

(including the double quotes) into remove_html_markup(), we should obtain

"foo"

as result – that is, the output should be the unchanged input. However, if our hypothesis 1. is correct, we should obtain

foo

as result – that is, "Double quotes are stripped from the input" as predicted by the hypothesis.

We can very easily test this hypothesis:

In [41]:

remove_html_markup('"foo"')

Out[41]:

'foo'

Our hypothesis is confirmed! We can add this to our list of observations.

Input	Expectation	Output	Outcome
`<b>foo</b>`	`foo`	`foo`	✔
`<b>"foo"</b>`	`"foo"`	`foo`	✘
`"<b>foo</b>"`	`"foo"`	`<b>foo</b>`	✘
`<b id="bar">foo</b>`	`foo`	`foo`	✔
`"foo"`	`"foo"`	`foo`	✘

You can try out the hypothesis with more inputs – and it remains valid. Any non-markup input that contains double quotes will have these stripped.

Where does that quote-stripping come from? This is where we need to explore the cause-effect chain. The only place in remove_html_markup() where quotes are handled is this line:

elif c == '"' or c == "'" and tag:
    quote = not quote

So, quotes should be removed only if tag is set. However, tag can be set only if the input contains a markup tag, which is not the case for a simple input like "foo". Hence, what we observe is actually impossible. Yet, it happens.

Refining a Hypothesis¶

Debugging is a game of falsifying assumptions. You assume the code works – it doesn't. You assume the tag flag cannot be set – yet it may be. What do we do? Again, we create a hypothesis:

The error is due to tag being set.

How do we know whether tag is being set? Let me introduce one of the most powerful debugging tools ever invented, the assert statement. The statement

assert cond

evaluates the given condition cond and

if it holds: proceed as usual
if cond does not hold: throw an exception

An assert statement encodes our assumptions and as such, should never fail. If it does, well, then something is wrong. Furthermore, assert statements break the cause-effect chain into smaller pieces. The observable failure occurs earlier in the program execution and thus closer to the initial fault.

Using assert, we can check the value of tag all through the loop:

In [42]:

def remove_html_markup_with_tag_assert(s):  # type: ignore
    tag = False
    quote = False
    out = ""

    for c in s:
        assert not tag  # <=== Just added

        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:
            quote = not quote
        elif not tag:
            out = out + c

    return out

Our expectation is that this assertion would fail. So, do we actually get an exception? Try it out for yourself by uncommenting the following line:

In [43]:

# remove_html_markup_with_tag_assert('"foo"')

In [44]:

quiz("What happens after inserting the above assertion?",
    [
        "The program raises an exception. (i.e., `tag` is set)",
        "The output is as before, i.e., `foo` without quotes."
        " (which means that `tag` is not set)"
    ], 2)

Out[44]:

Quiz

What happens after inserting the above assertion?

The program raises an exception. (i.e., tag is set)
The output is as before, i.e., foo without quotes. (which means that tag is not set)

Here's the solution:

In [45]:

with ExpectError():
    result = remove_html_markup_with_tag_assert('"foo"')
result

Out[45]:

'foo'

Refuting a Hypothesis¶

We did not get an exception, hence we reject our hypothesis:

~~The error is due to tag being set.~~

Again, let's go back to the only place in our code where quotes are handled:

elif c == '"' or c == "'" and tag:
    quote = not quote

Because of the assertion, we already know that tag is always False. Hence, this condition should never hold either.

But maybe there's something wrong with the condition such that it holds? Here's our hypothesis:

The error is due to the quote condition evaluating to true

If the condition evaluates to true, then quote should be set. We could now go and assert that quote is false; but we only care about the condition. So we insert an assertion that assumes that setting the code setting the quote flag is never reached:

In [46]:

def remove_html_markup_with_quote_assert(s):  # type: ignore
    tag = False
    quote = False
    out = ""

    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:
            assert False  # <=== Just added
            quote = not quote
        elif not tag:
            out = out + c

    return out

Our expectation this time again is that the assertion fails. So, do we get an exception this time? Try it out for yourself by uncommenting the following line:

In [47]:

# remove_html_markup_with_quote_assert('"foo"')

In [48]:

quiz("What happens after inserting the 'assert' tag?",
    [
        "The program raises an exception (i.e., the quote condition holds)",
        "The output is still foo (i.e., the quote condition does not hold)"
    ], 29 % 7)

Out[48]:

Quiz

What happens after inserting the 'assert' tag?

The program raises an exception (i.e., the quote condition holds)
The output is still foo (i.e., the quote condition does not hold)

Here's what happens now that we have the assert tag:

In [49]:

with ExpectError():
    result = remove_html_markup_with_quote_assert('"foo"')

Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_20417/904218512.py", line 2, in <cell line: 1>
    result = remove_html_markup_with_quote_assert('"foo"')
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_20417/2877654332.py", line 12, in remove_html_markup_with_quote_assert
    assert False  # <=== Just added
AssertionError (expected)

From this observation, we can deduce that our hypothesis is confirmed:

The error is due to the quote condition evaluating to true (CONFIRMED)

and the condition is actually faulty. It evaluates to True although tag is always False:

elif c == '"' or c == "'" and tag:
    quote = not quote

But this condition holds for single and double quotes. Is there a difference?

Let us see whether our observations generalize towards general quotes:

~~Double~~ quotes are stripped from the input.

We can verify these hypotheses with an additional experiment. We go back to our original implementation (without any asserts), and then check it:

In [50]:

remove_html_markup("'foo'")

Out[50]:

"'foo'"

Surprise: Our hypothesis is rejected and we can add another observation to our table:

Input	Expectation	Output	Outcome
`'foo'`	`'foo'`	`'foo'`	✔

So, the condition

becomes True when a double quote is seen
becomes False (as it should) with single quotes

At this point, you should have enough material to solve the problem. How do we have to fix the condition? Here are four alternatives:

c == "" or c == '' and tag        # Choice 1
c == '"' or c == "'" and not tag  # Choice 2
(c == '"' or c == "'") and tag    # Choice 3
...                               # Something else

In [51]:

quiz("How should the condition read?",
     [
         '''`c == "" or c == '' and tag` (Choice 1)''',
         '''`c == '"' or c == "'" and not tag` (Choice 2)''',
         '''`(c == '"' or c == "'") and tag` (Choice 3)''',
         "Something else"
     ],
     '399 % 4')

Out[51]:

Quiz

How should the condition read?

c == "" or c == '' and tag (Choice 1)
c == '"' or c == "'" and not tag (Choice 2)
(c == '"' or c == "'") and tag (Choice 3)
Something else

Fixing the Bug¶

So, you have spotted the defect: In Python (and most other languages), and takes precedence over or, which is why the condition is wrong. It should read:

(c == '"' or c == "'") and tag

(Actually, good programmers rarely depend on precedence; it is considered good style to use parentheses lavishly.)

So, our hypothesis now has become

The error is due to the quote condition evaluating to True

Is this our final hypothesis? We can check our earlier examples whether they should now work well:

Input	Expectation	Output	Outcome
`<b>foo</b>`	`foo`	`foo`	✔
`<b>"foo"</b>`	`"foo"`	`foo`	✘
`"<b>foo</b>"`	`"foo"`	`<b>foo</b>`	✘
`<b id="bar">foo</b>`	`foo`	`foo`	✔
`"foo"`	`'foo'`	`foo`	✘
`'foo'`	`'foo'`	`'foo'`	✔

In all of these examples, the quote flag should now be set outside of tags; hence, everything should work as expected.

In terms of the scientific process, we now have a theory – a hypothesis that

is consistent with all earlier observations
predicts future observations (in our case: correct behavior)

For debugging, our problems are usually too small for a big word like theory, so we use the word diagnosis instead. So is our diagnosis sufficient to fix the bug? Let us check.

Checking Diagnoses¶

In debugging, you should start to fix your code if and only if you have a diagnosis that shows two things:

Causality. Your diagnosis should explain why and how the failure came to be. Hence, it induces a fix that, when applied, should make the failure disappear.
Incorrectness. Your diagnosis should explain why and how the code is incorrect (which in turn suggests how to correct the code). Hence, the fix it induces not only applies to the given failure, but also to all related failures.

Showing both these aspects requirements – causality and incorrectness – are crucial for a debugging diagnosis:

If you find that you can change some location to make the failure go away, but are not sure why this location is wrong, then your "fix" may apply only to the symptom rather than the source. Your diagnosis explains causality, but not incorrectness.
If you find that there is a defect in some code location, but do not verify whether this defect is related to the failure in question, then your "fix" may not address the failure. Your diagnosis addresses incorrectness, but not causality.

When you do have a diagnosis that explains both causality (how the failure came to be), and incorrectness (how to correct the code accordingly), then (and only then!) is it time to actually fix the code accordingly. After applying the fix, the failure should be gone, and no other failure should occur. If the failure persists, this should come as a surprise. Obviously, there is some other aspect that you haven't considered yet, so you have to go back to the drawing board and add another failing test case to the set of observations.

Fixing the Code¶

All these things considered, let us go and fix remove_html_markup(). We know how the defect causes the failure (by erroneously setting quote outside of tags). We know that the line in question is incorrect (as single and double of quotes should be treated similarly). So, our diagnosis shows both causality and incorrectness, and we can go and fix the code accordingly:

In [52]:

def remove_html_markup(s):  # type: ignore
    tag = False
    quote = False
    out = ""

    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif (c == '"' or c == "'") and tag:  # <-- FIX
            quote = not quote
        elif not tag:
            out = out + c

    return out

We verify that the fix was successful by running our earlier tests. Not only should the previously failing tests now pass, the previously passing tests also should not be affected. Fortunately, all tests now pass:

In [53]:

assert remove_html_markup("Here's some <strong>strong argument</strong>.") == \
    "Here's some strong argument."
assert remove_html_markup(
    '<input type="text" value="<your name>">') == ""
assert remove_html_markup('<b>foo</b>') == 'foo'
assert remove_html_markup('<b>"foo"</b>') == '"foo"'
assert remove_html_markup('"<b>foo</b>"') == '"foo"'
assert remove_html_markup('<b id="bar">foo</b>') == 'foo'

So, our hypothesis was a theory, and our diagnosis was correct. Success!

Alternate Paths¶

A defect may have more than one hypothesis, and each diagnosis can be obtained by many ways. We could also have started with our other hypothesis

Tags in double quotes are not stripped

and by reasoning and experiments, we would have reached the same conclusion that the condition is faulty:

To strip tags, the tag flag must be set (but it is not).
To set the tag flag, the quote variable must not be set (but it is).
The quote flag is set under the given condition (which thus must be faulty).

This gets us to the same diagnosis as above – and, of course, the same fix.

Homework after the Fix¶

After having successfully validated the fix, we still have some homework to make.

Check for further Defect Occurrences¶

First, we may want to check that the underlying mistake was not made elsewhere, too.

For an error as with remove_html_markup(), it may be wise to check other parts of the code (possibly written by the same programmer) whether Boolean formulas show proper precedence. Consider setting up a static program checker or style checker to catch similar mistakes. Usually if you code in an IDE specifically designed for language you are coding in, you will get hints on so called code smells, parts of your code that are typically erroneous because of bad style or other typical mistakes.

Check your Tests¶

If the defect was not found through testing, now is a good time to make sure it will be found the next time. If you use automated tests, add a test that catches the bug (as well as similar ones), such that you can prevent regressions.

Add Assertions¶

To be 100% sure, we could add an assertion to remove_html_markup() that checks the final result for correctness. Unfortunately, writing such an assertion is just as complex as writing the function itself.

There is one assertion, though, which could be placed in the loop body to catch this kind of errors, and which could remain in the code. Which is it?

In [54]:

quiz("Which assertion would have caught the problem?",
     [
        "`assert quote and not tag`",
        "`assert quote or not tag`",
        "`assert tag or not quote`",
        "`assert tag and not quote`"
     ], '3270 - 3267')

Out[54]:

Quiz

Which assertion would have caught the problem?

assert quote and not tag
assert quote or not tag
assert tag or not quote
assert tag and not quote

Indeed, the statement

assert tag or not quote

is correct. This excludes the situation of ¬tag ∧ quote – that is, the tag flag is not set, but the quote flag is. If you remember our state machine from above, this is actually a state that should never exist:

In [55]:

# ignore
display(state_machine)

Here's our function in its "final" state. As software goes, software is never final – and this may also hold for our function, as there is still room for improvement. For this chapter though, we leave it be.

In [56]:

def remove_html_markup(s):  # type: ignore
    tag = False
    quote = False
    out = ""

    for c in s:
        assert tag or not quote

        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif (c == '"' or c == "'") and tag:
            quote = not quote
        elif not tag:
            out = out + c

    return out

Commit the Fix¶

It may sound obvious, but your fix is worth nothing if it doesn't go into production. Be sure to commit your change to the code repository, together with your diagnosis. If your fix has to be approved by a third party, a good diagnosis on why and what happened is immensely helpful.

Close the Bug Report¶

If you systematically track bugs, and your bug is properly tracked, now is the time to mark the issue as "resolved". Check for duplicates of the issue and check whether they are resolved, too. And now, you are finally done:

Time to relax – and look for the next bug!

Become a Better Debugger¶

We have now systematically fixed a bug. In this book, we will explore a number of techniques to make debugging easier – coming up with automated diagnoses, explanations, even automatic repairs, including for our example above. But there are also a number of things you can do to become a better debugger.

Follow the Process¶

If you're an experienced programmer, you may have spotted the problem in remove_html_markup() immediately, and start fixing the code right away. But this is dangerous and risky.

Why is this so? Well, because you should first

try to understand the problem, and
have a full diagnosis before starting to fix away.

You can skip these steps, and jump right to your interactive debugger the very moment you see a failure, happily stepping through their program. This may even work well for simple problems, including this one. The risk, however, is that this narrows your view to just this one execution, which limits your ability to understand all the circumstances of the problem. Even worse: If you start "fixing" the bug without exactly understanding the problem, you may end up with an incomplete solution – as illustrated in "The Devil's Guide to Debugging", above.

Keep a Log¶

A second risk of starting debugging too soon is that it lets you easily deviate from a systematic process. Remember how we wrote down every experiment in a table? How we numbered every hypothesis? This is not just for teaching. Writing these things down explicitly allow you to keep track of all your observations and hypotheses over time.

Input	Expectation	Output	Outcome
`<b>foo</b>`	`foo`	`foo`	✔

Every time you come up with a new hypothesis, you can immediately check it against your earlier observations, which will help you to eliminate unlikely ones from the start. This is a bit like in the classic "Mastermind" board game, in which you have to guess some secret combination of pins, and in which your opponent gives you hints on whether and how your guesses are correct. At any time, you can see your previous guesses (experiments) and the results (observations) you got; any new guess (hypothesis) has to be consistent with the previous observations and experiments.

Master Mind Board Grame

Keeping such a log also allows you to interrupt your debugging session at any time. You can be home in time, sleep over the problem, and resume the next morning with a refreshed mind. You can even hand over the log to someone else, stating your findings so far.

The alternative to having a log is to keep all in memory. This only works for short amounts of time, as it puts a higher and higher cognitive load on your memory as you debug along. After some time, you will forget earlier observations, which leads to mistakes. Worst of all, any interruption will break your concentration and make you forget things, so you can't stop debugging until you're done.

Sure, if you are a real master, you can stay glued to the screen all night. But I'd rather be home in time, thank you.

Rubberducking¶

A great technique to revisit your observations and to come up with new hypotheses is to explain the problem to someone else. In this process, the "someone else" is important, but even more important is that you are explaining the problem to yourself! As Kernighan and Pike \cite{Kernighan1999} put it:

Sometimes it takes no more than a few sentences, followed by an embarrassed "Never mind. I see what's wrong. Sorry to bother you."

The reason why this works is that teaching someone else forces you to take different perspectives, and these help you to resolve the inconsistency between what you assume and what you actually observe.

Since that "someone else" can be totally passive, you can even replace her with an inanimate object to talk to – even a rubber duck. This technique is called rubber duck debugging or rubberducking – the idea is that you explain your problem to a rubber duck first before interrupting one of your co-workers with the problem. Some programmers, when asked for advice, explicitly request that you "explain your problem to the duck first", knowing that this resolves a good fraction of problems.

Rubber duck debugging

The Cost of Debugging¶

And it's not only that debugging takes time – the worst thing is that it is a search process, which can take anything between a few minutes and several hours, sometimes even days and weeks. But even if you never know how much time a bug will take, it's a bit of blessing to use a process which gradually gets you towards its cause.

Debugging Aftermath¶

The fix of a bug may lead to a code construct, a more complex condition or another solution which may look strange on first sight. Often the initial code was easier to understand but did not catch all cases. After fixing a bug it may be beneficial to write down why the code now looks like this and what caused the more complex construct. A comment right at the fixed location will help future programmers to not be confused or even re-introduce the bug while refactoring for better readability. And of course, helpful comments at complex code locations should even exist before something went wrong with the code -- but thats a chapter in another book.

History of Debugging¶

Engineers and programmers have long used the term "bug" for faults in their systems – as if it were something that crept into an otherwise flawless program to cause the effects that none could explain. And from a psychological standpoint, it is far easier to blame some "bug" rather than taking responsibility ourselves. In the end, though, we have to face the fact: We made the bugs, and they are ours to fix.

Having said that, there has been one recorded instance where a real bug has crept into a system. That was on September 9, 1947, when a moth got stuck in the relay of a Harvard Mark II machine. This event was logged, and the log book is now on display at the Smithsonian Natural Museum of American History, as "First actual case of bug being found."

First actual case of bug being found

The actual term "bug", however, is much older. What do you think is its origin?

In [57]:

# ignore
import hashlib

In [58]:

# ignore
bughash = hashlib.md5(b"debug").hexdigest()

In [59]:

quiz('Where has the name "bug" been used to denote disruptive events?',
     [
        'In the early days of Morse telegraphy, referring to a special key '
          'that would send a string of dots',
        'Among radio technicians to describe a device that '
          'converts electromagnetic field variations into acoustic signals',
        "In Shakespeare's " '"Henry VI", referring to a walking spectre',
        'In Middle English, where the word "bugge" is the basis for terms '
          'like "bugbear" and "bugaboo"'
     ], [bughash.index(i) for i in "d42f"])

Out[59]:

Quiz

Where has the name "bug" been used to denote disruptive events?

In the early days of Morse telegraphy, referring to a special key that would send a string of dots
Among radio technicians to describe a device that converts electromagnetic field variations into acoustic signals
In Shakespeare's "Henry VI", referring to a walking spectre
In Middle English, where the word "bugge" is the basis for terms like "bugbear" and "bugaboo"

Need help on this quiz? To learn more about the term "bug" and its origin, have a look at the etymology of the word "bug" in The Jargon File. Also check out the Wikipedia entry on debugging!

Synopsis¶

In this chapter, we introduce some basics of how failures come to be as well as a general process for debugging.

Lessons Learned¶

An error is a deviation from what is correct, right, or true. Specifically,
- A mistake is a human act or decision resulting in an error.
- A defect is an error in the program code. Also called bug.
- A fault is an error in the program state. Also called infection.
- A failure is an externally visible error in the program behavior. Also called malfunction.
In a failing program execution, a mistake by the programmer results in a defect in the code, which creates a fault in the state, which propagates until it results in a failure. Tracing back fault propagation allows identifying the defect that causes the failure.
In debugging, the scientific method allows systematically identifying failure causes by gradually refining and refuting hypotheses based on experiments and observations.
Before fixing the defect, have a complete diagnosis that
- shows causality (how the defect causes the failure)
- shows incorrectness (how the defect is wrong)
You can become a better debugger by
- Following a systematic process like the scientific method
- Keeping a log of your observations and hypotheses
- Making your observations and conclusions explicit by telling them somebody (or something).

Next Steps¶

In the next chapters, we will learn how to

Enjoy!

Background¶

There are several good books on debugging, but these three are especially recommended:

Debugging by Agans \cite{agans2006-debugging} takes a pragmatic approach to debugging, highlighting systematic approaches that help for all kinds of application-specific problems;
Why Programs Fail by Zeller \cite{zeller2009-why-programs-fail} takes a more academic approach, creating theories of how failures come to be and systematic debugging processes;
Effective Debugging by Spinellis \cite{spinellis2016-effective-debugging} aims for a middle ground between the two, creating general recipes and recommendations that easily instantiate towards specific problems.

All these books focus on manual debugging and the debugging process, just like this chapter; for automated debugging, simply read on :-)

Exercises¶

Exercise 1: Get Acquainted with Notebooks and Python¶

Your first exercise in this book is to get acquainted with notebooks and Python, such that you can run the code examples in the book – and try out your own. Here are a few tasks to get you started.

Beginner Level: Run Notebooks in Your Browser¶

The easiest way to get access to the code is to run them in your browser.

From the Web Page, check out the menu at the top. Select Resources $\rightarrow$ Edit as Notebook.
After a short waiting time, this will open a Jupyter Notebook right within your browser, containing the current chapter as a notebook.
You can again scroll through the material, but you click on any code example to edit and run its code (by entering Shift + Return). You can edit the examples as you please.
Note that code examples typically depend on earlier code, so be sure to run the preceding code first.
Any changes you make will not be saved (unless you save your notebook to disk).

For help on Jupyter Notebooks, from the Web Page, check out the Help menu.

Advanced Level: Run Python Code on Your Machine¶

This is useful if you want to make greater changes, but do not want to work with Jupyter.

From the Web Page, check out the menu at the top. Select Resources $\rightarrow$ Download Code.
This will download the Python code of the chapter as a single Python .py file, which you can save to your computer.
You can then open the file, edit it, and run it in your favorite Python environment to re-run the examples.
Most importantly, you can import it into your own code and reuse functions, classes, and other resources.

For help on Python, from the Web Page, check out the Help menu.

Pro Level: Run Notebooks on Your Machine¶

This is useful if you want to work with Jupyter on your machine. This will allow you to also run more complex examples, such as those with graphical output.

From the Web Page, check out the menu at the top. Select Resources $\rightarrow$ All Notebooks.
This will download all Jupyter Notebooks as a collection of .ipynb files, which you can save to your computer.
You can then open the notebooks in Jupyter Notebook or Jupyter Lab, edit them, and run them. To navigate across notebooks, open the notebook 00_Table_of_Contents.ipynb.
You can also download individual notebooks using Select Resources $\rightarrow$ Download Notebook. Running these, however, will require that you have the other notebooks downloaded already.

For help on Jupyter Notebooks, from the Web Page, check out the Help menu.

Boss Level: Contribute!¶

This is useful if you want to contribute to the book with patches or other material. It also gives you access to the very latest version of the book.

From the Web Page, check out the menu at the top. Select Resources $\rightarrow$ GitHub Repo.
This will get you to the GitHub repository which contains all sources of the book, including the latest notebooks.
You can then clone this repository to your disk, such that you get the latest and greatest.
You can report issues and suggest pull requests on the GitHub page.
Updating the repository with git pull will get you updated.

If you want to contribute code or text, check out the Guide for Authors.

Exercise 2: More Bugs!¶

You may have noticed that our remove_html_markup() function is still not working perfectly under all circumstances. The error has something to do with different quotes occurring in the input.

Part 1: Find the Problem¶

What does the problem look like? Set up a test case that demonstrates the problem.

In [60]:

assert(...)

Set up additional test cases as useful.

Solution. The remaining problem stems from the fact that in remove_html_markup(), we do not differentiate between single and double quotes. Hence, if we have a quote within a quoted text, the function may get confused. Notably, a string that begins with a double quote may be interpreted as ending when a single quote is seen, and vice versa. Here's an example of such a string:

<b title="<Shakespeare's play>">foo</b>

When we remove the HTML markup, the > in the string is interpreted as unquoted. Hence, it is interpreted as ending the tag, such that the rest of the tag is not removed.

In [61]:

s = '<b title="<Shakespeare' + "'s play>" + '">foo</b>'
s

Out[61]:

'<b title="<Shakespeare\'s play>">foo</b>'

In [62]:

remove_html_markup(s)

Out[62]:

'"foo'

In [63]:

with ExpectError():
    assert(remove_html_markup(s) == "foo")

Traceback (most recent call last):
  File "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_20417/2099129847.py", line 2, in <cell line: 1>
    assert(remove_html_markup(s) == "foo")
AssertionError (expected)

Part 2: Identify Extent and Cause¶

Using the scientific method, identify the extent and cause of the problem. Write down your hypotheses and log your observations, as in

Input	Expectation	Output	Outcome
(input)	(expectation)	(output)	(outcome)

Solution. The first step is obviously

Input	Expectation	Output	Outcome
foo	foo	"foo	✘

Part 3: Fix the Problem¶

Design a fix for the problem. Show that it satisfies the earlier tests and does not violate any existing test.

Solution. Here's an improved implementation that actually tracks the opening and closing quote by storing the quoting character in the quote variable. (If quote is '', we are not in a string.)

In [64]:

def remove_html_markup_with_proper_quotes(s):  # type: ignore
    tag = False
    quote = ''
    out = ""

    for c in s:
        assert tag or quote == ''

        if c == '<' and quote == '':
            tag = True
        elif c == '>' and quote == '':
            tag = False
        elif (c == '"' or c == "'") and tag and quote == '':
            # beginning of string
            quote = c
        elif c == quote:
            # end of string
            quote = ''
        elif not tag:
            out = out + c

    return out

Python enthusiasts may note that we could also write not quote instead of quote == '', leaving most of the original code untouched. We stick to classic Boolean comparisons here.

The function now satisfies the earlier failing test:

In [65]:

assert(remove_html_markup_with_proper_quotes(s) == "foo")

as well as all our earlier tests:

In [66]:

assert remove_html_markup_with_proper_quotes(
    "Here's some <strong>strong argument</strong>.") == \
    "Here's some strong argument."
assert remove_html_markup_with_proper_quotes(
    '<input type="text" value="<your name>">') == ""
assert remove_html_markup_with_proper_quotes('<b>foo</b>') == 'foo'
assert remove_html_markup_with_proper_quotes('<b>"foo"</b>') == '"foo"'
assert remove_html_markup_with_proper_quotes('"<b>foo</b>"') == '"foo"'
assert remove_html_markup_with_proper_quotes('<b id="bar">foo</b>') == 'foo'