#!/usr/bin/env python # coding: utf-8 # # Isolating Failure-Inducing Changes # # "Yesterday, my program worked. Today, it does not. Why?" In debugging, as elsewhere in software development, code keeps on changing. Thus, it can happen that a piece of code that yesterday was working perfectly, today no longer runs – because we (or others) have made some changes to it that cause it to fail. The good news is that for debugging, we can actually _exploit_ this version history to narrow down _the changes that caused the failure_ – be it by us or by others. # In[ ]: from bookutils import YouTubeVideo YouTubeVideo("hX9ViNEXGL8") # **Prerequisites** # # * You should have read the [Chapter on Delta Debugging](DeltaDebugger.ipynb). # * Knowledge on version control systems (notably git) will be useful. # In[ ]: import bookutils # In[ ]: from bookutils import quiz, print_file, print_content # In[ ]: # ignore from typing import Dict, Callable, TextIO, List, Tuple, Set, Any, Type # ## Synopsis # # # To [use the code provided in this chapter](Importing.ipynb), write # # ```python # >>> from debuggingbook.ChangeDebugger import # ``` # # and then make use of the following features. # # # This chapter introduces a class `ChangeDebugger` that automatically determines failure-inducing code changes. # # ### High-Level Interface # # You are given two Python source codes `source_pass` and `source_fail`, and a function `test()` that works using the definitions in `source_pass`, but raises an exception using the definitions in `source_fail`. Then, you can use `ChangeDebugger` as follows: # # ```python # with ChangeDebugger(source_pass, source_fail) as cd: # test() # cd # ``` # # This will produce the failure-inducing change between `source_pass` and `source_fail`, using [Delta Debugging](DeltaDebugger.ipynb) to determine minimal differences in patches applied. # # Here is an example. The function `test()` passes (raises no exception) if `remove_html_markup()` is defined as follows: # # ```python # >>> print_content(source_pass, '.py') # >>> def test() -> None: # >>> assert remove_html_markup('"foo"') == '"foo"' # >>> exec(source_pass) # >>> test() # ``` # If `remove_html_markup()` is changed as follows, though, then # `test()` raises an exception and fails: # # ```python # >>> print_content(source_fail, '.py') # >>> exec(source_fail) # >>> with ExpectError(AssertionError): # >>> test() # ``` # We can use `ChangeDebugger` to automatically identify the failure-inducing difference: # # ```python # >>> with ChangeDebugger(source_pass, source_fail) as cd: # >>> test() # >>> cd # ``` # The lines prefixed with `+` from are the ones in `source_fail` that cause the failure when added. (They also are the ones that should be fixed.) # # ### Programmatic Interface # # For more details or more automation, use the programmatic interface. The method `min_patches()` returns a triple (`pass_patches`, `fail_patches`, `diffs`) where # # * applying `pass_patches` still make the call pass # * applying `fail_patches` causes the call to fail # * `diffs` is the (minimal) difference between the two. # # The patches come as list of `patch_obj` objects, as defined by Google's [diff-match-patch library](https://github.com/google/diff-match-patch). # # ```python # >>> pass_patches, fail_patches, diffs = cd.min_patches() # ``` # One can apply all patches in `pass_patches` and still not cause the test to fail: # # ```python # >>> for p in pass_patches: # >>> print_patch(p) # ``` # However, as soon as we also apply the patches in `diffs`, we get the failure. (This is also what is shown when we output a `ChangeDebugger`.) # # ```python # >>> for p in diffs: # >>> print_patch(p) # ``` # The full set of methods in `ChangeDebugger` is shown below. # # ### Supporting Functions # # `ChangeDebugger` relies on lower level `patch()` and `diff()` functions. # # To apply patch objects on source code, use the `patch()` function. It takes a source code and a list of patches to be applied. # # ```python # >>> print_content(patch(source_pass, diffs), '.py') # ``` # Conversely, the `diff()` function computes patches between two texts. It returns a list of patch objects that can be applied on text. # # ```python # >>> for p in diff(source_pass, source_fail): # >>> print_patch(p) # ``` # # ## Changes and Bugs # # When you develop software, it is unlikely that you will be able to produce a fully working piece of software right from the beginning – hence the need for debugging. It is just as unlikely, though, that your software will stay _unchanged_ forever. Evolving requirements, the introduction of new technology, changes in the environment all induce software changes – and every such change brings the risk of introducing new bugs. # # To detect such bugs introduced by changes, systematic (and best automatic) _testing_, notably _regression testing_, can be a big help; in fact, the more comprehensive and the more automated your testing strategy is, the more it actually _enables_ evolving your software, because every new test lowers the risk of a change introducing a bug. Extensive testing is what enables agile development – and we're happy to point to our [sibling book on test generation](https://www.fuzzingbook.org) to give you some inspiration on how to do this. # # However, a test can only _detect_ failures, not _fix_ them. The more you change, the more you may need to fix, too. The good news is that there are a number of debugging techniques, manual and automated, that can actually _exploit_ the presence of older, working software versions to effectively narrow down the causes of failure in a new, failing software. # ## Leveraging Version Histories # # The fundamental prerequisite for exploiting older, working software versions is to have such older, working versions in the first place. If your software _always_ failed, then you will have to resort to conventional debugging. But if there is an older _working_ version, you can make use of it. # # We assume that you do have a _version repository_ such as git or SVN, which you use to organize software development and keep older software versions. (If you do _not_ use version control for your project, you are in [debugging hell](Intro_Debugging.ipynb). Go and set it up now, and come back once you're done.) # # If you have a version history, an older working version, and a new failing version, your situation is roughly as depicted in this diagram: # In[ ]: from graphviz import Digraph, nohtml # In[ ]: from IPython.display import display # In[ ]: # ignore PASS = "✔" FAIL = "✘" UNRESOLVED = "?" PASS_COLOR = 'darkgreen' # '#006400' # darkgreen FAIL_COLOR = 'red4' # '#8B0000' # darkred STEP_COLOR = 'peachpuff' FONT_NAME = 'Raleway' # In[ ]: # ignore def graph(comment: str = "default") -> Digraph: return Digraph(name='', comment=comment, graph_attr={ 'rankdir': 'LR', }, node_attr={ 'style': 'filled', 'shape': 'box', 'fillcolor': STEP_COLOR, 'fontname': FONT_NAME, }, edge_attr={ 'fontname': FONT_NAME, }) # In[ ]: # ignore VERSIONS = 8 def display_versions(outcomes: Dict[int, str]) -> Digraph: state_machine = graph() for version_number in range(1, VERSIONS + 1): id = f'v{version_number}' label = f' {outcomes [version_number]}' \ if version_number in outcomes else '' state_machine.node(id, label=f'{id}{label}') if version_number > 1: last_id = f'v{version_number - 1}' state_machine.edge(last_id, id) display(state_machine) # In[ ]: # ignore display_versions({1: PASS, 8: FAIL}) # Somewhere between the old version "v1" ("yesterday") and the current version "v8" ("today"), the software stopped working. But when exactly? And which change was it that caused the failure? # # You may think that this is an easy task: We simply manually test one version after another, thus determining the exact version that first failed. However, # # * this can take a long time (notably in the presence of dozens or even hundreds of versions); # * this may still leave you with _dozens of changes_ applied from one version to another; and # * this is actually a task that can be _fully automated_. # # And these "automated" debugging techniques are what we explore in this chapter. # ## An Example Version History # # As our ongoing example, we start with creating a little version history, using the git version management system. We follow the evolution of the `remove_html_markup()` versions from [the introduction to debugging](Intro_Debugging.ipynb) and [the chapter on assertions](Assertions.ipynb). # ### Create a Working Directory # # We start with creating a working folder (aptly named `my_project`) in which we will do our work. (Note: should you have a folder of that name, it will be deleted and re-initialized). # In[ ]: PROJECT = 'my_project' # In[ ]: import os import shutil # In[ ]: try: shutil.rmtree(PROJECT) except FileNotFoundError: pass os.mkdir(PROJECT) # We choose the project folder as our working directory. Any file we create will be created in that folder. # In[ ]: import sys # In[ ]: sys.path.append(os.getcwd()) os.chdir(PROJECT) # ### Initialize Git # # We set up a local Git repository in our local project folder. # In[ ]: get_ipython().system('git init') # In[ ]: get_ipython().system('git config user.name "Demo User"') # In[ ]: get_ipython().system('git config user.email "demo-user@example.com"') # In[ ]: get_ipython().system('git config advice.detachedHead False') # We are now ready to commit our first version. Here's the initial definition of `remove_html_markup()` from [the introduction to debugging](Intro_Debugging.ipynb). # In[ ]: def remove_html_markup(s): # type: ignore tag = False out = "" for c in s: if c == '<': # start of markup tag = True elif c == '>': # end of markup tag = False elif not tag: out = out + c return out # The function `write_source()` takes a function `fun` and writes its source code into a file of the same name – in our case, `remove_html_markup.py`: # In[ ]: import inspect # In[ ]: def write_source(fun: Callable, filename: str = None) -> None: if filename is None: filename = fun.__name__ + '.py' with open(filename, 'w') as fh: fh.write(inspect.getsource(fun)) # Here is `write_source()` in action: # In[ ]: write_source(remove_html_markup) # In[ ]: print_file('remove_html_markup.py') # With `git add` and `git commit`, we add the file to our version repository. The `-m` option defines a _message_ for the commit; this is how we (and potential co-workers) can later retrieve information on what has changed, and why. (The messages we use here are deliberately kept short.) # In[ ]: get_ipython().system('git add remove_html_markup.py') # In[ ]: get_ipython().system('git commit -m "First version"') # Let us now take the second (buggy) version of `remove_html_markup()` and again write this into our file, thus simulating changing the source code from the first version to the new version: # In[ ]: def remove_html_markup(s): # type: ignore tag = False quote = False out = "" for c in s: if c == '<' and not quote: tag = True elif c == '>' and not quote: tag = False elif c == '"' or c == "'" and tag: quote = not quote elif not tag: out = out + c return out # In[ ]: write_source(remove_html_markup) # We can inspect the differences between the previously committed version and the current one. Lines prefixed with `+` are added; lines prefixed with `-` are deleted. # In[ ]: get_ipython().system('git diff remove_html_markup.py') # We can now commit this second version, adding it to our repository. # In[ ]: get_ipython().system('git commit -m "Second version" remove_html_markup.py') # We create a few more revisions. # ### Excursion: Adding More Revisions # We use the additional definitions for `remove_html_markup()` from [the introduction to debugging](Intro_Debugging.ipynb) as additional versions. # # These also include "debugging" versions with enabled logging statements, as well as "tentative" versions that may or may not fix the discussed issues. In a real version history, such transient versions would typically not show up – or at least not be made available to co-workers. # In[ ]: def remove_html_markup(s): # type: ignore tag = False quote = False out = "" for c in s: print("c =", repr(c), "tag =", tag, "quote =", quote) if c == '<' and not quote: tag = True elif c == '>' and not quote: tag = False elif c == '"' or c == "'" and tag: quote = not quote elif not tag: out = out + c return out # In[ ]: write_source(remove_html_markup) # In[ ]: get_ipython().system('git commit -m "Third version (with debugging output)" remove_html_markup.py') # In[ ]: def remove_html_markup(s): # type: ignore tag = False quote = False out = "" for c in s: if c == '<': # and not quote: tag = True elif c == '>': # and not quote: tag = False elif c == '"' or c == "'" and tag: quote = not quote elif not tag: out = out + c return out # In[ ]: write_source(remove_html_markup) # In[ ]: get_ipython().system('git commit -m "Fourth version (clueless)" remove_html_markup.py') # In[ ]: def remove_html_markup(s): # type: ignore tag = False quote = False out = "" for c in s: assert not tag # <=== Just added if c == '<' and not quote: tag = True elif c == '>' and not quote: tag = False elif c == '"' or c == "'" and tag: quote = not quote elif not tag: out = out + c return out # In[ ]: write_source(remove_html_markup) # In[ ]: get_ipython().system('git commit -m "Fifth version (with assert)" remove_html_markup.py') # In[ ]: def remove_html_markup(s): # type: ignore tag = False quote = False out = "" for c in s: if c == '<' and not quote: tag = True elif c == '>' and not quote: tag = False elif c == '"' or c == "'" and tag: assert False # <=== Just added quote = not quote elif not tag: out = out + c return out # In[ ]: write_source(remove_html_markup) # In[ ]: get_ipython().system('git commit -m "Sixth version (with another assert)" remove_html_markup.py') # In[ ]: def remove_html_markup(s): # type: ignore tag = False quote = False out = "" for c in s: if c == '<' and not quote: tag = True elif c == '>' and not quote: tag = False elif (c == '"' or c == "'") and tag: # <-- FIX quote = not quote elif not tag: out = out + c return out # In[ ]: write_source(remove_html_markup) # In[ ]: get_ipython().system('git commit -m "Seventh version (fixed)" remove_html_markup.py') # ### End of Excursion # Here comes the last version of `remove_html_markup()`, this one from [the chapter on assertions](Assertions.ipynb). # In[ ]: def remove_html_markup(s): # type: ignore tag = False quote = False out = "" for c in s: if c == '<' and not quote: tag = True elif c == '>' and not quote: tag = False elif c == '"' or c == "'" and tag: quote = not quote elif not tag: out = out + c # postcondition assert '<' not in out and '>' not in out return out # In[ ]: write_source(remove_html_markup) # In[ ]: get_ipython().system('git commit -m "Eighth version (with proper assertion)" remove_html_markup.py') # We finally reached the "today" state with the latest version – and find that the latest version has an error. This should come to you as no surprise if you have read the earlier chapters. But if you haven't, you will find that when the argument `s` contains double quotes, these are stripped from the output: # In[ ]: remove_html_markup('"foo"') # Consequently, this test assertion fails: # In[ ]: from ExpectError import ExpectError # In[ ]: with ExpectError(): assert remove_html_markup('"foo"') == '"foo"' # Note that the failure does _not_ occur in the very first version, as introduced above. So the simple question is: # # * What is the change that caused the failure? # ## Accessing Versions # # To find out the failure-inducing change, we first need to be able to access older versions. The command # # ```sh # git log # ``` # # gives us a listing of all commits: # In[ ]: get_ipython().system('git log') # Using the `subprocess` module, we can run `git log` and access its output. # In[ ]: import subprocess # In[ ]: def get_output(command: List[str]) -> str: result = subprocess.run(command, stdout=subprocess.PIPE, universal_newlines=True) return result.stdout # The output of `git log` contains the ID of the version (the so-called *commit hash*) as well as the message provided during the commit. # In[ ]: log = get_output(['git', 'log', '--pretty=oneline']) print(log) # Each hash uniquely identifies a version, and is required to access it. Let us create a list `versions`, where `versions[0]` contains the hash (the id) of the first version, `versions[1]` the second version, and so on. # In[ ]: versions = [line.split()[0] for line in log.split('\n') if line] versions.reverse() versions[0] # We can now check out the first version: # In[ ]: get_ipython().system('git checkout {versions[0]}') # In[ ]: print_file('remove_html_markup.py') # If we read in this definition of `remove_html_markup()` from the first version, we will find that the failure was not yet present: # In[ ]: exec(open('remove_html_markup.py').read()) # In[ ]: remove_html_markup('"foo"') # However, if we check out the _last_ version of that file ... # In[ ]: get_ipython().system('git checkout {versions[7]}') # In[ ]: print_file('remove_html_markup.py') # ... we find that this is the version that no longer works. # In[ ]: exec(open('remove_html_markup.py').read()) # In[ ]: remove_html_markup('"foo"') # ## Manual Bisecting # # As stated above, we could now go and try out one version after another to see with which version the bug was introduced. But again, proceeding in such a _linear_ fashion would be very inefficient. It is much better to proceed by _binary search_: If you know that version $v_n$ passed, and version $v_m$ failed (with $m >n$), then test a version $v' = v_{n + (m - n)/2}$ that is right in the _middle_ between the two. # # * If $v'$ passes, then repeat the process with $v'$ and $v_m$. # * If $v'$ fails, then repeat the process with $v_n$ and $v'$. # # Such *bisecting* quickly progresses towards the failure-inducing version, as it requires you to take only a logarithmic number of tests. In contrast, progressing linearly through the version history requires a test for each version, which is far more effort. # If you use the git version control system, such bisecting is actually a _built-in feature_, coming to you through the `git bisect` command. Let us illustrate how `git bisect` quickly identifies the version that introduced the error. # A bisecting section with `git` starts with the command `bisect start`. # In[ ]: get_ipython().system('git bisect start') # Then, you use `git bisect good` to identify the version that worked, and `git bisect bad` to identify the version that was bad – in our case, the hashes of the first and last version. # In[ ]: get_ipython().system('git bisect good {versions[0]}') # In[ ]: get_ipython().system('git bisect bad {versions[7]}') # We find that `git bisect` automatically has checked out the _middle_ version between the passing and the failing one – in our case, version 4 – and now asks us to assess this version. # In[ ]: # ignore display_versions({1: PASS, 4: UNRESOLVED, 8: FAIL}) # The version is already in our working folder: # In[ ]: print_file('remove_html_markup.py') # We now need to test this version, and let `git bisect` know the outcome – with # # ```sh # git bisect good # ``` # # if the test passes, and with # # ```sh # git bisect bad # ``` # # if the test fails. # It turns out that this version fails: # In[ ]: exec(open('remove_html_markup.py').read()) # In[ ]: remove_html_markup('"foo"') # So we enter `git bisect bad`: # In[ ]: get_ipython().system('git bisect bad') # `git bisect` has chosen version 3 to assess – again in the middle between a passing and a failing version: # In[ ]: # ignore display_versions({1: PASS, 3: UNRESOLVED, 4: FAIL, 8: FAIL}) # So let us test this version and find that it fails, too: # In[ ]: print_file('remove_html_markup.py') # In[ ]: exec(open('remove_html_markup.py').read()) # In[ ]: remove_html_markup('"foo"') # We mark the version as `bad`. `git bisect` then checks out version 2 as the last version to assess. # In[ ]: get_ipython().system('git bisect bad') # In[ ]: # ignore display_versions({1: PASS, 2: UNRESOLVED, 3: FAIL, 4: FAIL, 8: FAIL}) # When we test version 2, we find that it fails as well: # In[ ]: print_file('remove_html_markup.py') # In[ ]: exec(open('remove_html_markup.py').read()) # In[ ]: remove_html_markup('"foo"') # Hence, version 2 is _the version that introduced the error_. # In[ ]: # ignore display_versions({1: PASS, 2: FAIL, 3: FAIL, 4: FAIL, 8: FAIL}) # When we let `git bisect` know that this version fails, it tells us that this version is indeed the "first bad commit": # In[ ]: get_ipython().system('git bisect bad') # By comparing this version against the older one, we can see the lines it introduced – namely (buggy) handling of double quotes: # In[ ]: get_ipython().system('git diff HEAD^') # Now that we have identified the failure-inducing change ("something is wrong in `remove_html_markup()`, and it has to do with quote handling"), we can end our `git bisect` session. `git bisect reset` gets us back to the start, such that we can fix the most recent version. # In[ ]: get_ipython().system('git bisect reset') # ## Automatic Bisecting # Even though manual bisecting can be quick, we can speed things up by writing a _script_ that does the testing for us. With such a script, we can have `git bisect` run fully automatically. # A test script to automate bisecting does the following: # # * It (re)builds the program under test for the given version # * It tests whether the failure is present. # # Its exit code indicates the test outcome: # # * 0 means "good" (the failure did not occur) # * 1 means "bad" (the failure did occur) # * 125 means "undetermined" (we cannot decide if the failure is present or not) # # The latter ("undetermined") case may occur if the program fails to build, or shows some other behavior. # We use a Python script `test.py` that reads in `remove_html_markup.py` and then tests for the presence or absence of the failure. (Since this is Python, we don't have to rebuild things.) # In[ ]: # ignore open('test.py', 'w').write(''' #!/usr/bin/env python from remove_html_markup import remove_html_markup import sys result = remove_html_markup('"foo"') if result == '"foo"': sys.exit(0) # good/pass elif result == 'foo': sys.exit(1) # bad/fail else: sys.exit(125) # unresolved '''); # In[ ]: print_file('test.py') # Right now, we are with version 8 and thus in the "failing" state – our script exits with a code of 1: # In[ ]: get_ipython().system('python ./test.py; echo $?') # Let us use our test script to bisect automatically. As with manual bisecting, we first have to tell `git bisect` which the good (passing) and bad (failing) versions are: # In[ ]: get_ipython().system('git bisect start') # In[ ]: get_ipython().system('git bisect good {versions[0]}') # In[ ]: get_ipython().system('git bisect bad {versions[7]}') # Now, we can make use of our script. `git bisect run