When testing a program, it is not enough to cover its various behaviors; one also needs to check whether the results are as expected. In this chapter, we introduce a technique that allows us to mine function specifications from a set of given executions, resulting in abstract and formal descriptions of what the function expects and what it delivers.
These so-called dynamic invariants produce pre- and postconditions over function arguments and variables from a set of executions. They are useful in a variety of contexts.
Traditionally, dynamic invariants are dependent on the executions they are derived from. However, when paired with comprehensive test generators, they quickly become very precise, as we show in this chapter.
Prerequisites
import fuzzingbook_utils
import Intro_Testing
When implementing a function or program, one usually works against a specification – a set of documented requirements to be satisfied by the code. Such specifications can come in natural language. A formal specification, however, allows the computer to check whether the specification is satisfied.
In the introduction to testing, we have seen how preconditions and postconditions can describe what a function does. Consider the following (simple) square root function:
def my_sqrt(x):
    assert x >= 0  # Precondition
    ...
    assert result * result == x  # Postcondition
    return result
The assertion assert p checks the condition p; if it does not hold, execution is aborted. Here, the actual body is not yet written; we use the assertions as a specification of what my_sqrt() expects and what it delivers.
The topmost assertion is the precondition, stating the requirements on the function arguments. The assertion at the end is the postcondition, stating the properties of the function result (including its relationship with the original arguments). Using these pre- and postconditions as a specification, we can now go and implement a square root function that satisfies them. Once implemented, we can have the assertions check at runtime whether my_sqrt() works as expected; a symbolic or concolic test generator will even specifically try to find inputs where the assertions do not hold. (An assertion can be seen as a conditional branch towards aborting the execution, and any technique that tries to cover all code branches will also try to invalidate as many assertions as possible.)
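To illustrate this branch view, here is a minimal sketch; checked_sqrt() is a hypothetical stand-in for my_sqrt(), using an approximate postcondition, since floating-point results are rarely exact:

```python
def checked_sqrt(x):
    # `assert p` behaves much like an explicit branch towards abortion:
    if not (x >= 0):                           # precondition
        raise AssertionError("precondition violated")
    result = x ** 0.5
    if not (abs(result * result - x) < 1e-6):  # approximate postcondition
        raise AssertionError("postcondition violated")
    return result
```

A coverage-driven test generator that tries to reach both raise statements is, in effect, trying to invalidate both assertions.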
However, not every piece of code is developed with explicit specifications in the first place, let alone with formal pre- and postconditions. (Just take a look at the chapters in this book.) This is a pity: As Ken Thompson famously said, "Without specifications, there are no bugs – only surprises". It is also a problem for testing, since, of course, testing needs some specification to test against. This raises an interesting question: Can we somehow retrofit existing code with "specifications" that properly describe its behavior, allowing developers to simply check them rather than having to write them from scratch? This is what we do in this chapter.
For our Python code, one of the most important "specifications" we need is types. Python being a "dynamically" typed language means that all data types are determined at run time; the code itself does not explicitly state whether a variable is an integer, a string, an array, a dictionary – or whatever. As a writer of Python code, omitting explicit type declarations may save time (and allows for some fun hacks). Whether a lack of types helps humans read and understand code is less clear. For a computer trying to analyze code, the lack of explicit types is detrimental. If, say, a constraint solver sees if x: and cannot know whether x is supposed to be a number or a string, this introduces an ambiguity that multiplies over the entire analysis into a combinatorial explosion. Our first task thus will be to mine static types (as part of the code) from values we observe at run time.
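To see the ambiguity, consider how the very same truthiness check constrains x differently depending on its type (is_set() is a hypothetical mini-example):

```python
def is_set(x):
    if x:    # what this branch means depends on the type of x
        return True
    return False

# For numbers, `if x:` means "x is nonzero" ...
assert is_set(5) and not is_set(0)
# ... while for strings, it means "x is non-empty", a different constraint
assert is_set("0") and not is_set("")
```

A constraint solver that does not know the type of x must explore both interpretations (and more), multiplying the search space at every such branch.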
How can we mine types from executions? The answer is simple: we observe a function during execution and record the types of its arguments and return values. To do so, we can make use of Python's tracing facility, which we already observed in the chapter on coverage. With every call to a function, we retrieve the arguments, their values, and their types.
As an example, consider the full implementation of my_sqrt() from the introduction to testing:
def my_sqrt(x):
    """Computes the square root of x, using the Newton-Raphson method"""
    approx = None
    guess = x / 2
    while approx != guess:
        approx = guess
        guess = (approx + x / approx) / 2
    return approx
my_sqrt() does not come with any assertions that would check types or values:
from ExpectError import ExpectError, ExpectTimeout
with ExpectError():
    my_sqrt("foo")
with ExpectTimeout(1):
    x = my_sqrt(-1)
Our goal is to determine types from normal invocations such as this one:
my_sqrt(25)
We can define a tracer function that tracks the execution of my_sqrt(), listing its arguments and return values:
import sys
class Tracker(object):
    def __init__(self, log=False):
        self._log = log
        self.reset()

    def reset(self):
        self._calls = {}
        self._stack = []

    # Start of `with` block
    def __enter__(self):
        self.original_trace_function = sys.gettrace()
        sys.settrace(self.traceit)
        return self

    # End of `with` block
    def __exit__(self, exc_type, exc_value, tb):
        sys.settrace(self.original_trace_function)
def get_arguments(frame):
    """Return call arguments in the given frame"""
    # When called, all arguments are local variables
    arguments = [(var, frame.f_locals[var]) for var in frame.f_locals]
    arguments.reverse()  # Want same order as call
    return arguments
def simple_call_string(function_name, argument_list):
    """Return function_name(arg[0], arg[1], ...) as a string"""
    return function_name + "(" + \
        ", ".join([var + "=" + repr(value)
                   for (var, value) in argument_list]) + ")"
class CallTracker(Tracker):
    def add_call(self, function_name, arguments, return_value=None):
        """Add given call to list of calls"""
        if function_name not in self._calls:
            self._calls[function_name] = []
        self._calls[function_name].append((arguments, return_value))

    # Tracking function: Record all calls and all args
    def traceit(self, frame, event, arg):
        if event == "call":
            self.trace_call(frame, event, arg)
        elif event == "return":
            self.trace_return(frame, event, arg)
        return self.traceit

    def trace_call(self, frame, event, arg):
        """Record a function call, saving name and arguments on the stack"""
        code = frame.f_code
        function_name = code.co_name
        arguments = get_arguments(frame)
        self._stack.append((function_name, arguments))

        if self._log:
            print(simple_call_string(function_name, arguments))

    def trace_return(self, frame, event, arg):
        """Record a function return, pairing it with the call on the stack"""
        code = frame.f_code
        function_name = code.co_name
        return_value = arg

        called_function_name, called_arguments = self._stack.pop()
        assert function_name == called_function_name

        if self._log:
            print(simple_call_string(function_name, called_arguments),
                  "returns", return_value)

        self.add_call(function_name, called_arguments, return_value)
class CallTracker(CallTracker):
    def calls(self, function_name=None):
        """Return calls for function_name, or all calls if none given"""
        if function_name is None:
            return self._calls
        return self._calls[function_name]
with CallTracker(log=True) as tracker:
    y = my_sqrt(25)
    y = my_sqrt(2.0)
calls = tracker.calls('my_sqrt')
calls
def simple_call_string(function_name, argument_list, return_value=None):
    """Return function_name(arg[0], arg[1], ...) as a string"""
    call = function_name + "(" + \
        ", ".join([var + "=" + repr(value)
                   for (var, value) in argument_list]) + ")"
    if return_value is not None:
        call += " = " + repr(return_value)
    return call
\todo{Import these into Carver}
argument_list, return_value = calls[0]
simple_call_string('my_sqrt', argument_list, return_value)
Python has an elaborate type system:
type(4)
type(2.0)
type([4])
We can retrieve the type of the first argument to my_sqrt():
parameter, value = argument_list[0]
parameter, type(value)
as well as the type of the return value:
type(return_value)
Hence, we see that (so far), my_sqrt() is a function taking (among others) integers and returning floats. We could declare my_sqrt() as:
def my_sqrt_annotated(x: int) -> float:
    return my_sqrt(x)
This is a representation we could feed to a static type checker, allowing it to check whether calls to my_sqrt() actually pass a number. A dynamic type checker could run such checks at runtime. And of course, any symbolic interpretation will greatly profit from the additional annotations.
By default, Python does not do anything with such annotations. However, tools can access annotations from functions and other objects:
my_sqrt_annotated.__annotations__
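As an illustration of what such a tool could do, here is a minimal sketch of a dynamic type checker; check_types() and sqrt_annotated() are hypothetical helpers, not part of the chapter's tools, and the sketch assumes annotations are plain types such as int:

```python
import inspect

def check_types(func, *args):
    """Check positional args against func's annotations, then call func"""
    annotations = func.__annotations__
    parameter_names = list(inspect.signature(func).parameters)
    for name, value in zip(parameter_names, args):
        if name in annotations:
            assert isinstance(value, annotations[name]), \
                f"{name}: expected {annotations[name].__name__}," \
                f" got {type(value).__name__}"
    return func(*args)

def sqrt_annotated(x: int) -> float:  # stand-in for my_sqrt_annotated()
    return x ** 0.5

check_types(sqrt_annotated, 25)       # passes the check
# check_types(sqrt_annotated, "foo")  # would raise AssertionError
```

Real dynamic checkers also handle keyword arguments, return values, and generic types, but the principle is the same: compare observed values against __annotations__ at call time.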
Let us go and annotate functions automatically, based on the types we have seen.
import ast
import inspect
import astunparse
We can get the source of a function (as long as it is defined in the same file):
from fuzzingbook_utils import print_content
from pygments.lexers import PythonLexer
my_sqrt_source = inspect.getsource(my_sqrt)
print_content(my_sqrt_source, '.py')
Parsing this gives us an abstract syntax tree (AST):
my_sqrt_ast = ast.parse(my_sqrt_source)
ast.dump(my_sqrt_ast)
Printing this out as Python code is a bit more readable:
print(astunparse.unparse(my_sqrt_ast))
We want to transform this AST, adding type annotations.
First, a helper function to parse type names:
def parse_type(name):
    class ValueVisitor(ast.NodeVisitor):
        def visit_Expr(self, node):
            self.value_node = node.value

    tree = ast.parse(name)
    name_visitor = ValueVisitor()
    name_visitor.visit(tree)
    return name_visitor.value_node
ast.dump(parse_type('int'))
ast.dump(parse_type('[Any]'))
Now, a helper to actually add type annotations to a function AST. This would be called as TypeTransformer({'x': 'int'}, 'float').visit(ast) to annotate the arguments of my_sqrt(): x with int, and the return type with float. The returned AST can then be unparsed, compiled, or analyzed.
class TypeTransformer(ast.NodeTransformer):
    def __init__(self, argument_types, return_type=None):
        self.argument_types = argument_types
        self.return_type = return_type
        super().__init__()
class TypeTransformer(TypeTransformer):
    def annotate_arg(self, arg):
        """Add annotation to single function argument"""
        arg_name = arg.arg
        if arg_name in self.argument_types:
            arg.annotation = parse_type(self.argument_types[arg_name])
        return arg
class TypeTransformer(TypeTransformer):
    def visit_FunctionDef(self, node):
        """Add annotation to function"""
        # Set argument types
        new_args = []
        for arg in node.args.args:
            new_args.append(self.annotate_arg(arg))

        new_arguments = ast.arguments(
            new_args,
            node.args.vararg,
            node.args.kwonlyargs,
            node.args.kw_defaults,
            node.args.kwarg,
            node.args.defaults
        )

        # Set return type
        node.returns = parse_type(self.return_type)

        return ast.copy_location(ast.FunctionDef(node.name, new_arguments,
                                                 node.body, node.decorator_list,
                                                 node.returns), node)
Does this work? Yes!
new_ast = TypeTransformer({'x': 'int'}, 'float').visit(my_sqrt_ast)
print_content(astunparse.unparse(new_ast), '.py')
with CallTracker() as tracker:
    y = my_sqrt(25)
    y = my_sqrt(2.0)
tracker.calls()
def type_string(value):
    return type(value).__name__
type_string(4)
type_string([])
def annotate_types(calls):
    annotated_functions = {}
    for function_name in calls:
        annotated_functions[function_name] = \
            annotate_types_function(function_name, calls[function_name])
    return annotated_functions
def annotate_types_function(function_name, function_calls):
    function = globals()[function_name]
    function_code = inspect.getsource(function)
    function_ast = ast.parse(function_code)
    return annotate_types_function_ast(function_ast, function_calls)
def annotate_types_function_ast(function_ast, function_calls):
    parameter_types = {}
    return_type = None

    for calls_seen in function_calls:
        args, return_value = calls_seen
        if return_value is not None:
            return_type = type_string(return_value)
        for parameter, value in args:
            parameter_types[parameter] = type_string(value)

    annotated_function_ast = \
        TypeTransformer(parameter_types, return_type).visit(function_ast)
    return annotated_function_ast
print_content(astunparse.unparse(annotate_types(tracker.calls())['my_sqrt']), '.py')
class TypeAnnotator(CallTracker):
    def typed_functions_ast(self, function_name=None):
        """Return the annotated AST(s) for the tracked calls"""
        if function_name is None:
            return annotate_types(self.calls())
        return annotate_types_function(function_name,
                                       self.calls(function_name))

    def typed_functions(self):
        """Return the annotated functions as source code"""
        return ''.join([astunparse.unparse(self.typed_functions_ast(function_name))
                        for function_name in self.calls()])
with TypeAnnotator() as annotator:
    y = my_sqrt(25)
    y = my_sqrt(2.0)
print_content(annotator.typed_functions(), '.py')
def sum3(a, b, c):
    return a + b + c
with TypeAnnotator() as annotator:
    y = my_sqrt(1.0)
    y = sum3(1.0, 2.0, 3.0)
print_content(annotator.typed_functions(), '.py')
with TypeAnnotator() as annotator:
    y = sum3("one", "two", "three")
print_content(annotator.typed_functions(), '.py')
\todo{add}
import SymbolicFuzzer # minor dependency
import GrammarFuzzer # minor dependency
\todo{add}
\todo{add}
\todo{add}