#!/usr/bin/env python
# coding: utf-8
# # Testing Web Applications
#
# In this chapter, we explore how to generate tests for Graphical User Interfaces (GUIs), notably on Web interfaces. We set up a (vulnerable) Web server and demonstrate how to systematically explore its behavior – first with handwritten grammars, then with grammars automatically inferred from the user interface. We also show how to conduct systematic attacks on these servers, notably with code and SQL injection.
# In[1]:
from bookutils import YouTubeVideo
YouTubeVideo('cgtpQ2KLZC8')
# **Prerequisites**
#
# * The techniques in this chapter make use of [grammars for fuzzing](Grammars.ipynb).
# * Basic knowledge of HTML and HTTP is required.
# * Knowledge of SQL databases is helpful.
# ## Synopsis
#
#
# To [use the code provided in this chapter](Importing.ipynb), write
#
# ```python
# >>> from fuzzingbook.WebFuzzer import <identifier>
# ```
#
# and then make use of the following features.
#
#
# This chapter provides a simple (and vulnerable) Web server and two experimental fuzzers that are applied to it.
#
# ### Fuzzing Web Forms
#
# `WebFormFuzzer` demonstrates how to interact with a Web form. Given a URL with a Web form, it automatically extracts a grammar that produces a URL; this URL contains values for all form elements. Support is limited to GET forms and a subset of HTML form elements.
#
# Here's the grammar extracted for our vulnerable Web server:
#
# ```python
# >>> web_form_fuzzer = WebFormFuzzer(httpd_url)
# >>> web_form_fuzzer.grammar['<start>']
# ['<action>?<query>']
# >>> web_form_fuzzer.grammar['<action>']
# ['/order']
# >>> web_form_fuzzer.grammar['<query>']
# ['<item>&<name>&<email>&<city>&<zip>&<terms>&<submit>']
# ```
# Using it for fuzzing yields a path with all form values filled; accessing this path acts like filling out and submitting the form.
#
# ```python
# >>> web_form_fuzzer.fuzz()
# '/order?item=lockset&name=%43+&email=+c%40_+c&city=%37b_4&zip=5&terms=on&submit='
# ```
# Repeated calls to `WebFormFuzzer.fuzz()` invoke the form again and again, each time with different (fuzzed) values.
#
# Internally, `WebFormFuzzer` builds on a helper class named `HTMLGrammarMiner`; you can extend its functionality to include more features.
#
# ### SQL Injection Attacks
#
# `SQLInjectionFuzzer` is an experimental extension of `WebFormFuzzer` whose constructor takes an additional _payload_ – an SQL command to be injected and executed on the server. Otherwise, it is used like `WebFormFuzzer`:
#
# ```python
# >>> sql_fuzzer = SQLInjectionFuzzer(httpd_url, "DELETE FROM orders")
# >>> sql_fuzzer.fuzz()
# "/order?item=lockset&name=+&email=0%404&city=+'+)%3b+DELETE+FROM+orders%3b+--&zip='+OR+1%3d1--'&terms=on&submit="
# ```
# As you can see, the path to be retrieved contains the payload encoded into one of the form field values.
#
# Internally, `SQLInjectionFuzzer` builds on a helper class named `SQLInjectionGrammarMiner`; you can extend its functionality to include more features.
#
# `SQLInjectionFuzzer` is a proof-of-concept on how to build a malicious fuzzer; you should study and extend its code to make actual use of it.
#
# ![](PICS/WebFuzzer-synopsis-1.svg)
#
#
# ## A Web User Interface
#
# Let us start with a simple example. We want to set up a _Web server_ that allows readers of this book to buy fuzzingbook-branded fan articles ("swag"). In reality, we would make use of an existing Web shop (or an appropriate framework) for this purpose. For the purpose of this book, we _write our own Web server_, building on the HTTP server facilities provided by the Python library.
# ### Excursion: Implementing a Web Server
# Our entire Web server is defined in an `HTTPRequestHandler`, which, as the name suggests, handles arbitrary Web page requests.
# In[2]:
from http.server import HTTPServer, BaseHTTPRequestHandler
from http.server import HTTPStatus # type: ignore
# In[3]:
class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):
    """A simple HTTP server"""
    pass
# #### Taking Orders
#
# For our Web server, we need a number of Web pages:
# * We want one page where customers can place an order.
# * We want one page where they see their order confirmed.
# * Additionally, we need pages to display error messages such as "Page Not Found".
# We start with the order form. The dictionary `FUZZINGBOOK_SWAG` holds the items that customers can order, together with long descriptions:
# In[4]:
import bookutils.setup
# In[5]:
from typing import NoReturn, Tuple, Dict, List, Optional, Union
# In[6]:
FUZZINGBOOK_SWAG = {
"tshirt": "One FuzzingBook T-Shirt",
"drill": "One FuzzingBook Rotary Hammer",
"lockset": "One FuzzingBook Lock Set"
}
# This is the HTML code for the order form. The menu for selecting the swag to be ordered is created dynamically from `FUZZINGBOOK_SWAG`. We omit plenty of details such as precise shipping address, payment, shopping cart, and more.
# In[7]:
HTML_ORDER_FORM = """
"""
# This is what the order form looks like:
# In[8]:
from IPython.display import display
# In[9]:
from bookutils import HTML
# In[10]:
HTML(HTML_ORDER_FORM)
# This form is not yet functional, as there is no server behind it; pressing "place order" will lead you to a nonexistent page.
# #### Order Confirmation
#
# Once we have gotten an order, we show a confirmation page, which is instantiated with the customer information submitted before. Here is the HTML and the rendering:
# In[11]:
HTML_ORDER_RECEIVED = """
Thank you for your Fuzzingbook Order!
We will send {item_name} to {name} in {city}, {zip}
A confirmation mail will be sent to {email}.
"""
# In[12]:
HTML(HTML_ORDER_RECEIVED.format(item_name="One FuzzingBook Rotary Hammer",
name="Jane Doe",
email="doe@example.com",
city="Seattle",
zip="98104"))
# #### Terms and Conditions
#
# A website can only be complete if it has the necessary legalese. This page shows some terms and conditions.
# In[13]:
HTML_TERMS_AND_CONDITIONS = """
"""
# In[14]:
HTML(HTML_TERMS_AND_CONDITIONS)
# #### Storing Orders
# To store orders, we make use of a *database*, stored in the file `orders.db`.
# In[15]:
import sqlite3
import os
# In[16]:
ORDERS_DB = "orders.db"
# To interact with the database, we use *SQL commands*. The following commands create a table with five text columns for item, name, email, city, and zip – the exact same fields we also use in our HTML form.
# In[17]:
def init_db():
    if os.path.exists(ORDERS_DB):
        os.remove(ORDERS_DB)
    db_connection = sqlite3.connect(ORDERS_DB)
    db_connection.execute("DROP TABLE IF EXISTS orders")
    db_connection.execute("CREATE TABLE orders "
                          "(item text, name text, email text, "
                          "city text, zip text)")
    db_connection.commit()
    return db_connection
# In[18]:
db = init_db()
# At this point, the database is still empty:
# In[19]:
print(db.execute("SELECT * FROM orders").fetchall())
# We can add entries using the SQL `INSERT` command:
# In[20]:
db.execute("INSERT INTO orders " +
"VALUES ('lockset', 'Walter White', "
"'white@jpwynne.edu', 'Albuquerque', '87101')")
db.commit()
# These values are now in the database:
# In[21]:
print(db.execute("SELECT * FROM orders").fetchall())
# We can also delete entries from the table again (say, after completion of the order):
# In[22]:
db.execute("DELETE FROM orders WHERE name = 'Walter White'")
db.commit()
# In[23]:
print(db.execute("SELECT * FROM orders").fetchall())
# #### Handling HTTP Requests
#
# We have an order form and a database; now we need a Web server which brings it all together. The Python `http.server` module provides everything we need to build a simple HTTP server. An `HTTPRequestHandler` is an object that takes and processes HTTP requests – in particular, `GET` requests for retrieving Web pages.
# We implement the `do_GET()` method that, based on the given path, branches off to serve the requested Web pages. Requesting the path `/` produces the order form; a path beginning with `/order` sends an order to be processed. All other requests end in a `Page Not Found` message.
# In[24]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        try:
            # print("GET " + self.path)
            if self.path == "/":
                self.send_order_form()
            elif self.path.startswith("/order"):
                self.handle_order()
            elif self.path.startswith("/terms"):
                self.send_terms_and_conditions()
            else:
                self.not_found()
        except Exception:
            self.internal_server_error()
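# To see the path-based dispatch of `do_GET()` in isolation, here is a minimal, self-contained sketch. It is not the chapter's server: the handler `DispatchingHandler`, its `reply()` helper, the `fetch()` function, and the plain-text bodies are hypothetical stand-ins. The server binds to port 0 so that the operating system picks a free port, runs in a background thread rather than a separate process, and is queried with the standard `http.client` module.

```python
import http.client
import threading
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Tuple


class DispatchingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that branches on self.path like do_GET() above."""

    def do_GET(self) -> None:
        if self.path == "/":
            self.reply(HTTPStatus.OK, b"order form")
        elif self.path.startswith("/order"):
            self.reply(HTTPStatus.OK, b"order received")
        else:
            self.reply(HTTPStatus.NOT_FOUND, b"not found")

    def reply(self, status: HTTPStatus, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default logging to stderr


def fetch(port: int, path: str) -> Tuple[int, bytes]:
    """Send a GET request for `path` and return (status, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), DispatchingHandler)  # port 0: OS picks a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    results: List[Tuple[int, bytes]] = [
        fetch(port, path) for path in ["/", "/order?item=tshirt", "/nowhere"]]
finally:
    server.shutdown()      # stop serve_forever() in the background thread
    server.server_close()  # release the listening socket

for status, body in results:
    print(status, body)
```

# Each request prints its HTTP status code and body; a path such as `/nowhere` falls through to the `else` branch and yields `404`.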
# ##### Order Form
#
# Accessing the home page (i.e. getting the page at `/`) is simple: We go and serve the `HTML_ORDER_FORM` as defined above.
# In[25]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def send_order_form(self):
        self.send_response(HTTPStatus.OK, "Place your order")
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(HTML_ORDER_FORM.encode("utf8"))
# Likewise, we can send out the terms and conditions:
# In[26]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def send_terms_and_conditions(self):
        self.send_response(HTTPStatus.OK, "Terms and Conditions")
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(HTML_TERMS_AND_CONDITIONS.encode("utf8"))
# ##### Processing Orders
# When the user clicks `Submit` on the order form, the Web browser creates and retrieves a URL of the form
#
# ```
# /order?field_1=value_1&field_2=value_2&field_3=value_3
# ```
#
# where each `field_i` is the name of the field in the HTML form, and `value_i` is the value provided by the user. Values use the CGI encoding we have seen in the [chapter on coverage](Coverage.ipynb) – that is, spaces are converted into `+`, and characters that are not digits or letters are converted into `%nn`, where `nn` is the hexadecimal value of the character.
#
# If Jane Doe `<doe@example.com>` from Seattle orders a T-Shirt, this is the URL the browser creates:
#
# ```
# /order?item=tshirt&name=Jane+Doe&email=doe%40example.com&city=Seattle&zip=98104
# ```
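# Python's standard `urllib.parse.urlencode()` produces this kind of encoding by default, as it applies `quote_plus()` to all keys and values (unlike the description above, it also leaves the characters `_.-~` unencoded). A minimal sketch, assuming the field values from the example URL above:

```python
from urllib.parse import urlencode

# Field values from the example order above (dicts keep insertion order)
fields = {
    "item": "tshirt",
    "name": "Jane Doe",
    "email": "doe@example.com",
    "city": "Seattle",
    "zip": "98104",
}

query = urlencode(fields)  # spaces become '+', '@' becomes '%40'
print("/order?" + query)
# prints /order?item=tshirt&name=Jane+Doe&email=doe%40example.com&city=Seattle&zip=98104
```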
# When processing a query, the attribute `self.path` of the HTTP request handler holds the path accessed – i.e., everything after the host name and port. The helper method `get_field_values()` takes `self.path` and returns a dictionary of values.
# In[27]:
import urllib.parse
# In[28]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def get_field_values(self):
        # Note: this fails to decode non-ASCII characters properly
        query_string = urllib.parse.urlparse(self.path).query
        # fields is { 'item': ['tshirt'], 'name': ['Jane Doe'], ...}
        fields = urllib.parse.parse_qs(query_string, keep_blank_values=True)
        values = {}
        for key in fields:
            values[key] = fields[key][0]
        return values
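# To see what `get_field_values()` computes without running a server, we can apply the same two standard-library calls to a path directly. The function `field_values()` below is a hypothetical standalone copy of the method's logic:

```python
import urllib.parse
from typing import Dict


def field_values(path: str) -> Dict[str, str]:
    """Standalone variant of get_field_values(): keep the first value per field."""
    query_string = urllib.parse.urlparse(path).query
    fields = urllib.parse.parse_qs(query_string, keep_blank_values=True)
    return {key: values[0] for key, values in fields.items()}


print(field_values("/order?item=tshirt&name=Jane+Doe&email=doe%40example.com&zip="))
```

# Two details matter here: `parse_qs()` returns a *list* of values per field (of which we keep only the first), and `keep_blank_values=True` ensures that empty fields such as `zip=` still appear as empty strings instead of being dropped.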
# The method `handle_order()` takes these values from the URL, stores the order, and returns a page confirming the order. If anything goes wrong, the resulting exception propagates to `do_GET()`, which sends an internal server error.
# In[29]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def handle_order(self):
        values = self.get_field_values()
        self.store_order(values)
        self.send_order_received(values)
# Storing the order makes use of the database connection defined above; we create an SQL command instantiated with the values as extracted from the URL.
# In[30]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def store_order(self, values):
        db = sqlite3.connect(ORDERS_DB)
        # The following should be one line
        sql_command = "INSERT INTO orders VALUES ('{item}', '{name}', '{email}', '{city}', '{zip}')".format(**values)
        self.log_message("%s", sql_command)
        db.executescript(sql_command)
        db.commit()
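# Because `store_order()` pastes the raw field values into the SQL text and runs it with `executescript()` (which accepts several statements), a field value containing a quote can close the string literal and append statements of its own. The sketch below reproduces this on a throwaway in-memory database; the malicious `zip` value is a made-up example. For contrast, it also stores the same values using `sqlite3`'s `?` placeholders, which pass them as data rather than as SQL text.

```python
import sqlite3

SCHEMA = "CREATE TABLE orders (item text, name text, email text, city text, zip text)"
TEMPLATE = "INSERT INTO orders VALUES ('{item}', '{name}', '{email}', '{city}', '{zip}')"

values = {
    "item": "tshirt",
    "name": "Mallory",
    "email": "m@example.com",
    "city": "Nowhere",
    # Hypothetical malicious input: close the literal, then delete everything
    "zip": "12345'); DELETE FROM orders; --",
}

# Vulnerable variant: string formatting + executescript(), as in store_order()
db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
db.execute("INSERT INTO orders VALUES "
           "('drill', 'Jane Doe', 'doe@example.com', 'Seattle', '98104')")
db.commit()
db.executescript(TEMPLATE.format(**values))
remaining = db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(remaining)  # 0: the injected DELETE removed all orders

# Safer variant: placeholders keep the value out of the SQL text
safe_db = sqlite3.connect(":memory:")
safe_db.execute(SCHEMA)
safe_db.execute("INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
                (values["item"], values["name"], values["email"],
                 values["city"], values["zip"]))
safe_db.commit()
stored = safe_db.execute("SELECT zip FROM orders").fetchall()
print(stored)  # the zip value is stored literally, quote and all
```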
# After storing the order, we send the confirmation HTML page, which again is instantiated with the values from the URL.
# In[31]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def send_order_received(self, values):
        # Should use html.escape()
        values["item_name"] = FUZZINGBOOK_SWAG[values["item"]]
        confirmation = HTML_ORDER_RECEIVED.format(**values).encode("utf8")
        self.send_response(HTTPStatus.OK, "Order received")
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(confirmation)
# ##### Other HTTP commands
#
# Besides the `GET` command (which does all the heavy lifting), HTTP servers can also support other HTTP commands; we support the `HEAD` command, which returns only the header information of a Web page, without a body. In our case, the headers are always the same: an `OK` status and the `text/html` content type.
# In[32]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def do_HEAD(self):
        # print("HEAD " + self.path)
        self.send_response(HTTPStatus.OK)
        self.send_header("Content-type", "text/html")
        self.end_headers()
# #### Error Handling
#
# We have defined pages for submitting and processing orders; now we also need a few pages for errors that might occur.
# ##### Page Not Found
#
# This page is displayed if a non-existing page (i.e., any path other than `/` that does not start with `/order` or `/terms`) is requested.
# In[33]:
HTML_NOT_FOUND = """
Sorry.
This page does not exist. Try our order form instead.
"""
# In[34]:
HTML(HTML_NOT_FOUND)
# The method `not_found()` takes care of sending this out with the appropriate HTTP status code.
# In[35]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def not_found(self):
        self.send_response(HTTPStatus.NOT_FOUND, "Not found")
        self.send_header("Content-type", "text/html")
        self.end_headers()
        message = HTML_NOT_FOUND
        self.wfile.write(message.encode("utf8"))
# ##### Internal Errors
#
# This page is shown for any internal errors that might occur. For diagnostic purposes, we have it include the traceback of the failing function.
# In[36]:
HTML_INTERNAL_SERVER_ERROR = """
Internal Server Error
The server has encountered an internal error. Go to our order form.
{error_message}
"""
# In[37]:
HTML(HTML_INTERNAL_SERVER_ERROR)
# In[38]:
import sys
import traceback
# In[39]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def internal_server_error(self):
        self.send_response(HTTPStatus.INTERNAL_SERVER_ERROR, "Internal Error")
        self.send_header("Content-type", "text/html")
        self.end_headers()
        exc = traceback.format_exc()
        self.log_message("%s", exc.strip())
        message = HTML_INTERNAL_SERVER_ERROR.format(error_message=exc)
        self.wfile.write(message.encode("utf8"))
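# The traceback that `internal_server_error()` embeds comes from `traceback.format_exc()`, which formats the exception currently being handled. A small sketch, using a hypothetical `lookup_item()` helper that mimics the `FUZZINGBOOK_SWAG[values["item"]]` lookup in `send_order_received()`, shows what a request without an `item` field would produce:

```python
import traceback
from typing import Dict

SWAG = {"tshirt": "One FuzzingBook T-Shirt"}  # stand-in for FUZZINGBOOK_SWAG


def lookup_item(values: Dict[str, str]) -> str:
    """Hypothetical helper: return the item name, or the traceback on failure."""
    try:
        return SWAG[values["item"]]
    except Exception:
        # format_exc() reports the exception being handled, so call it here
        return traceback.format_exc()


print(lookup_item({"item": "tshirt"}))
print(lookup_item({"name": "Jane Doe"}))  # no "item" field: ends in KeyError: 'item'
```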
# #### Logging
#
# Our server runs as a separate process in the background, waiting to receive commands at all times. To see what it is doing, we implement a special logging mechanism. The `HTTPD_MESSAGE_QUEUE` establishes a queue into which one process (the server) can store Python objects, and from which another process (the notebook) can retrieve them. We use this to pass log messages from the server, which we can then display in the notebook.
# For multiprocessing, we use the `multiprocess` module – a variant of the standard Python `multiprocessing` module that also works in notebooks. If you are running this code outside a notebook, you can also use `multiprocessing` instead.
# In[40]:
from multiprocess import Queue # type: ignore
# In[41]:
HTTPD_MESSAGE_QUEUE = Queue()
# Let us place two messages in the queue:
# In[42]:
HTTPD_MESSAGE_QUEUE.put("I am another message")
# In[43]:
HTTPD_MESSAGE_QUEUE.put("I am one more message")
# To distinguish server messages from other parts of the notebook, we format them specially:
# In[44]:
from bookutils import rich_output, terminal_escape
# In[45]:
def display_httpd_message(message: str) -> None:
    if rich_output():
        display(
            HTML(
                '<pre>' +
                message +
                '</pre>'))
    else:
        print(terminal_escape(message))
# In[46]:
display_httpd_message("I am a httpd server message")
# The method `print_httpd_messages()` prints all messages accumulated in the queue so far:
# In[47]:
def print_httpd_messages():
    while not HTTPD_MESSAGE_QUEUE.empty():
        message = HTTPD_MESSAGE_QUEUE.get()
        display_httpd_message(message)
# In[48]:
import time
# In[49]:
time.sleep(1)
print_httpd_messages()
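# The `time.sleep(1)` above is there for a reason: the Python documentation notes that `empty()` on a multiprocessing queue is not reliable, and that after putting an object on an empty queue there may be a short delay before `empty()` returns `False`. One possible alternative, sketched here with the standard `multiprocessing` module (the `multiprocess` queue used in this chapter should offer the same interface), is to keep calling `get()` with a timeout until `queue.Empty` is raised. The function name `drain()` and the timeout value are our own assumptions.

```python
import multiprocessing
import queue
from typing import Any, List


def drain(q: Any, timeout: float = 0.5) -> List[Any]:
    """Collect all messages from q, waiting up to `timeout` seconds for each one."""
    messages = []
    while True:
        try:
            messages.append(q.get(timeout=timeout))
        except queue.Empty:  # multiprocessing queues raise queue.Empty, too
            return messages


q = multiprocessing.Queue()
q.put("I am another message")
q.put("I am one more message")
received = drain(q)
print(received)
```

# The price of this approach is that every call waits `timeout` seconds after the last message before returning.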
# With `clear_httpd_messages()`, we can silently discard all pending messages:
# In[50]:
def clear_httpd_messages() -> None:
    while not HTTPD_MESSAGE_QUEUE.empty():
        HTTPD_MESSAGE_QUEUE.get()
# The method `log_message()` in the request handler makes use of the queue to store its messages:
# In[51]:
class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):
    def log_message(self, format: str, *args) -> None:
        message = ("%s - - [%s] %s\n" %
                   (self.address_string(),
                    self.log_date_time_string(),
                    format % args))
        HTTPD_MESSAGE_QUEUE.put(message)
# In [the chapter on carving](Carver.ipynb), we introduced a `webbrowser()` function which retrieves the contents of the given URL. We now extend it such that it also prints out any log messages produced by the server:
# In[52]:
import requests
# In[53]:
def webbrowser(url: str, mute: bool = False) -> str:
    """Download and return the http/https resource given by the URL"""
    try:
        r = requests.get(url)
        contents = r.text
    finally:
        if not mute:
            print_httpd_messages()
        else:
            clear_httpd_messages()
    return contents
# With `webbrowser()`, we are now ready to get the Web server up and running.
# ### End of Excursion
# ### Running the Server
#
# We run the server on the *local host* – that is, the same machine which also runs this notebook. We check for an accessible port and put the resulting URL in the queue created earlier.
# In[54]:
def run_httpd_forever(handler_class: type) -> NoReturn:  # type: ignore
    host = "127.0.0.1"  # localhost IP
    for port in range(8800, 9000):
        httpd_address = (host, port)
        try:
            httpd = HTTPServer(httpd_address, handler_class)
            break
        except OSError:
            continue
    httpd_url = "http://" + host + ":" + repr(port)
    HTTPD_MESSAGE_QUEUE.put(httpd_url)
    httpd.serve_forever()
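# `run_httpd_forever()` finds a port by trying to bind each candidate in turn, moving on whenever an `OSError` signals that the port is taken. The same idea can be sketched with plain sockets; `first_free_port()` is a hypothetical helper. Binding to port 0 instead lets the operating system choose any free port.

```python
import socket
from typing import Iterable, Optional


def first_free_port(host: str, ports: Iterable[int]) -> Optional[int]:
    """Return the first port in `ports` that can be bound on `host`, or None."""
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind((host, port))
            except OSError:
                continue  # port in use (or not permitted): try the next one
            return port  # the socket is closed again when leaving `with`
    return None


print(first_free_port("127.0.0.1", range(8800, 9000)))

# Alternative: port 0 asks the operating system for any free port
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind(("127.0.0.1", 0))
    print(s.getsockname()[1])
```

# Since the probing socket is closed before anyone uses the port, another process could grab it in between; this is why `run_httpd_forever()` binds the actual `HTTPServer` inside its loop rather than probing first.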
# The function `start_httpd()` runs the server in a separate process, created with the `multiprocess` module. It retrieves the server's URL from the message queue and returns it, such that we can start talking to the server.
# In[55]:
from multiprocess import Process
# In[56]:
def start_httpd(handler_class: type = SimpleHTTPRequestHandler) \
        -> Tuple[Process, str]:
    clear_httpd_messages()
    httpd_process = Process(target=run_httpd_forever, args=(handler_class,))
    httpd_process.start()
    httpd_url = HTTPD_MESSAGE_QUEUE.get()
    return httpd_process, httpd_url
# Let us now start the server and save its URL:
# In[57]:
httpd_process, httpd_url = start_httpd()
httpd_url
# ### Interacting with the Server
#
# Let us now access the server just created.
# #### Direct Browser Access
#
# If you are running the Jupyter notebook server on the local host as well, you can now access the server directly at the given URL. Simply open the address in `httpd_url` by clicking on the link below.
#
# **Note**: This only works if you are running the Jupyter notebook server on the local host.
# In[58]:
def print_url(url: str) -> None:
    if rich_output():
        display(HTML('<pre><a href="%s">%s</a></pre>' % (url, url)))
    else:
        print(terminal_escape(url))
# In[59]:
print_url(httpd_url)
# Even more conveniently, you may be able to interact directly with the server using the window below.
#
# **Note**: This only works if you are running the Jupyter notebook server on the local host.
# In[60]:
from IPython.display import IFrame
# In[61]:
IFrame(httpd_url, '100%', 230)
# After interaction, you can retrieve the messages produced by the server:
# In[62]:
print_httpd_messages()
# We can also see any orders placed in the `orders` database (`db`):
# In[63]:
print(db.execute("SELECT * FROM orders").fetchall())
# And we can clear the order database:
# In[64]:
db.execute("DELETE FROM orders")
db.commit()
# #### Retrieving the Home Page
#
# Even if our browser cannot directly interact with the server, the _notebook_ can. We can, for instance, retrieve the contents of the home page and display them:
# In[65]:
contents = webbrowser(httpd_url)
# In[66]:
HTML(contents)
# #### Placing Orders
#
# To test this form, we can generate URLs with orders and have the server process them.
# The method `urljoin()` puts together a base URL (i.e., the URL of our server) and a path – say, the path towards our order.
# In[67]:
from urllib.parse import urljoin, urlsplit
# In[68]:
urljoin(httpd_url, "/order?foo=bar")
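# How `urljoin()` combines its arguments depends on the path: an absolute path (starting with `/`) replaces the entire path of the base URL, while a relative path is resolved against the base URL's "directory". A short sketch with a hypothetical base URL on port 8800:

```python
from urllib.parse import urljoin

base = "http://127.0.0.1:8800/some/page"  # hypothetical server URL with a path

print(urljoin(base, "/order?foo=bar"))  # http://127.0.0.1:8800/order?foo=bar
print(urljoin(base, "order?foo=bar"))   # http://127.0.0.1:8800/some/order?foo=bar
print(urljoin("http://127.0.0.1:8800", "/order?foo=bar"))  # http://127.0.0.1:8800/order?foo=bar
```

# Using absolute paths such as `/order?...` thus yields the same URL regardless of which page the base URL points to.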
# With `urljoin()`, we can create a full URL that is the same as the one generated by the browser as we submit the order form. Retrieving this URL with `webbrowser()` effectively places the order, as we can see in the server log produced:
# In[69]:
contents = webbrowser(urljoin(httpd_url,
"/order?item=tshirt&name=Jane+Doe&email=doe%40example.com&city=Seattle&zip=98104"))
# The web page returned confirms the order:
# In[70]:
HTML(contents)
# And the order is in the database, too:
# In[71]:
print(db.execute("SELECT * FROM orders").fetchall())
# #### Error Messages
#
# We can also test whether the server correctly responds to invalid requests. Nonexistent pages, for instance, are correctly handled:
# In[72]:
HTML(webbrowser(urljoin(httpd_url, "/some/other/path")))
# You may remember we also have a page for internal server errors. Can we get the server to produce this page? To find this out, we have to test the server thoroughly – which we do in the remainder of this chapter.
# ## Fuzzing Input Forms
#
# After setting up and starting the server, let us now go and systematically test it – first with expected, and then with less expected values.
# ### Fuzzing with Expected Values
#
# Since placing orders is all done by creating appropriate URLs, we define a [grammar](Grammars.ipynb) `ORDER_GRAMMAR` which encodes ordering URLs. It comes with a few sample values for names, email addresses, cities and (random) digits.
# #### Excursion: Implementing cgi_encode()
# To make it easier to define strings that become part of a URL, we define the function `cgi_encode()`, taking a string and automatically encoding it into CGI:
# In[73]:
import string
# In[74]:
def cgi_encode(s: str, do_not_encode: str = "") -> str:
    ret = ""
    for c in s:
        if (c in string.ascii_letters or c in string.digits
                or c in "$-_.+!*'()," or c in do_not_encode):
            ret += c
        elif c == ' ':
            ret += '+'
        else:
            ret += "%%%02x" % ord(c)
    return ret
# In[75]:
s = cgi_encode('Is "DOW30" down .24%?')
s
# The optional parameter `do_not_encode` allows us to skip certain characters from encoding. This is useful when encoding grammar rules:
# In[76]:
cgi_encode("<string>@<string>", "<>")
# `cgi_encode()` is the counterpart of the `cgi_decode()` function defined in the [chapter on coverage](Coverage.ipynb) (with one exception: a literal `+` is left unencoded and thus decodes to a space):
# In[77]:
from Coverage import cgi_decode # minor dependency
# In[78]:
cgi_decode(s)
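# As a quick consistency check, the output of `cgi_encode()` can also be decoded with the standard `urllib.parse.unquote_plus()`, which, like `cgi_decode()`, turns `+` into a space. The sketch copies `cgi_encode()` so that it is self-contained, and also demonstrates the caveat about literal `+` characters.

```python
import string
from urllib.parse import unquote_plus


def cgi_encode(s: str, do_not_encode: str = "") -> str:
    """Copy of cgi_encode() from above."""
    ret = ""
    for c in s:
        if (c in string.ascii_letters or c in string.digits
                or c in "$-_.+!*'()," or c in do_not_encode):
            ret += c
        elif c == ' ':
            ret += '+'
        else:
            ret += "%%%02x" % ord(c)
    return ret


text = 'Is "DOW30" down .24%?'
encoded = cgi_encode(text)
print(encoded)                         # Is+%22DOW30%22+down+.24%25%3f
print(unquote_plus(encoded) == text)   # True: the round trip works here

print(unquote_plus(cgi_encode("1+1")))  # 1 1 (the literal '+' became a space)
```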
# #### End of Excursion
# In the grammar, we make use of `cgi_encode()` to encode strings:
# In[79]:
from Grammars import crange, is_valid_grammar, syntax_diagram, Grammar
# In[80]:
ORDER_GRAMMAR: Grammar = {
    "<start>": ["<order>"],
    "<order>": ["/order?item=<item>&name=<name>&email=<email>&city=<city>&zip=<zip>"],
    "<item>": ["tshirt", "drill", "lockset"],
    "<name>": [cgi_encode("Jane Doe"), cgi_encode("John Smith")],
    "<email>": [cgi_encode("j.doe@example.com"), cgi_encode("j_smith@example.com")],
    "<city>": ["Seattle", cgi_encode("New York")],
    "<zip>": ["<digit>" * 5],
    "<digit>": crange('0', '9')
}
# In[81]:
assert is_valid_grammar(ORDER_GRAMMAR)
# In[82]:
syntax_diagram(ORDER_GRAMMAR)
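# To make concrete how a grammar like `ORDER_GRAMMAR` turns into URLs, here is a deliberately naive random expander. It only sketches the idea and is not the `GrammarFuzzer` used below (which, among other things, controls how far expansions go); `GRAMMAR` is a hypothetical copy of `ORDER_GRAMMAR` with the CGI-encoded strings written out, and `expand()` is our own helper.

```python
import random
import re
from typing import Dict, List

NONTERMINAL = re.compile(r"<[^<> ]*>")

GRAMMAR: Dict[str, List[str]] = {
    "<start>": ["<order>"],
    "<order>": ["/order?item=<item>&name=<name>&email=<email>&city=<city>&zip=<zip>"],
    "<item>": ["tshirt", "drill", "lockset"],
    "<name>": ["Jane+Doe", "John+Smith"],
    "<email>": ["j.doe%40example.com", "j_smith%40example.com"],
    "<city>": ["Seattle", "New+York"],
    "<zip>": ["<digit>" * 5],
    "<digit>": [str(d) for d in range(10)],
}


def expand(grammar: Dict[str, List[str]], symbol: str = "<start>") -> str:
    """Replace the leftmost nonterminal by a random expansion until none is left."""
    result = symbol
    while True:
        match = NONTERMINAL.search(result)
        if match is None:
            return result
        expansion = random.choice(grammar[match.group()])
        result = result[:match.start()] + expansion + result[match.end():]


random.seed(0)
print([expand(GRAMMAR) for _ in range(3)])
```

# This simple loop terminates only because the grammar is not recursive; for recursive grammars, a fuzzer has to limit the number of expansions.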
# Using [one of our grammar fuzzers](GrammarFuzzer.ipynb), we can instantiate this grammar and generate URLs:
# In[83]:
from GrammarFuzzer import GrammarFuzzer
# In[84]:
order_fuzzer = GrammarFuzzer(ORDER_GRAMMAR)
[order_fuzzer.fuzz() for i in range(5)]
# Sending these URLs to the server will have them processed correctly:
# In[85]:
HTML(webbrowser(urljoin(httpd_url, order_fuzzer.fuzz())))
# In[86]:
print(db.execute("SELECT * FROM orders").fetchall())
# ### Fuzzing with Unexpected Values
# We can now see that the server does a good job when faced with "standard" values. But what happens if we feed it non-standard values? To this end, we make use of a [mutation fuzzer](MutationFuzzer.ipynb) which inserts random changes into the URL. Our seed (i.e. the value to be mutated) comes from the grammar fuzzer:
# In[87]:
seed = order_fuzzer.fuzz()
seed
# Mutating this string yields mutations not only in the field values, but also in field names as well as the URL structure.
# In[88]:
from MutationFuzzer import MutationFuzzer  # minor dependency
# In[89]:
mutate_order_fuzzer = MutationFuzzer([seed], min_mutations=1, max_mutations=1)
[mutate_order_fuzzer.fuzz() for i in range(5)]
# Let us fuzz a little until we get an internal server error. We use the Python `requests` module to interact with the Web server such that we can directly access the HTTP status code.
# In[90]:
while True:
    path = mutate_order_fuzzer.fuzz()
    url = urljoin(httpd_url, path)
    r = requests.get(url)
    if r.status_code == HTTPStatus.INTERNAL_SERVER_ERROR:
        break
# That didn't take long. Here's the offending URL:
# In[91]:
url
# In[92]:
clear_httpd_messages()
HTML(webbrowser(url))
# How does the URL cause this internal error? We make use of [delta debugging](Reducer.ipynb) to minimize the failure-inducing path, setting up a `WebRunner` class to define the failure condition:
# In[93]:
failing_path = path
failing_path
# In[94]:
from Fuzzer import Runner
# In[95]:
class WebRunner(Runner):
    """Runner for a Web server"""
    def __init__(self, base_url: Optional[str] = None):
        self.base_url = base_url

    def run(self, url: str) -> Tuple[str, str]:
        if self.base_url is not None:
            url = urljoin(self.base_url, url)
        import requests  # for imports
        r = requests.get(url)
        if r.status_code == HTTPStatus.OK:
            return url, Runner.PASS
        elif r.status_code == HTTPStatus.INTERNAL_SERVER_ERROR:
            return url, Runner.FAIL
        else:
            return url, Runner.UNRESOLVED
# In[96]:
web_runner = WebRunner(httpd_url)
web_runner.run(failing_path)
# This is the minimized path:
# In[97]:
from Reducer import DeltaDebuggingReducer # minor
# In[98]:
minimized_path = DeltaDebuggingReducer(web_runner).reduce(failing_path)
minimized_path
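# The `DeltaDebuggingReducer` used above repeatedly removes chunks of the input and keeps any smaller input that still fails. A compact sketch of this complement-based delta debugging idea on strings follows. Instead of querying the server, it uses a hypothetical predicate `server_would_fail()`, which assumes that the server fails exactly when a path starts with `/order` but contains no `item=` field; this is a simplification of our server's actual behavior.

```python
from typing import Callable


def ddmin(inp: str, fails: Callable[[str], bool]) -> str:
    """Shrink a failing input; at the end, removing any single character makes it pass."""
    assert fails(inp), "the initial input must fail"
    n = 2  # number of chunks to split the input into
    while len(inp) >= 2:
        chunk_length = len(inp) // n
        reduced = False
        for start in range(0, len(inp), chunk_length):
            complement = inp[:start] + inp[start + chunk_length:]
            if fails(complement):
                inp = complement  # keep the smaller failing input
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n == len(inp):
                break  # all single characters tried: done
            n = min(n * 2, len(inp))  # try smaller chunks
    return inp


def server_would_fail(path: str) -> bool:
    """Hypothetical failure condition: an order without an 'item' field."""
    return path.startswith("/order") and "item=" not in path


print(ddmin("/order?itxm=drill&name=Jane+Doe", server_would_fail))  # /order
```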
# It turns out that our server encounters an internal error if we do not supply the requested fields:
# In[99]:
minimized_url = urljoin(httpd_url, minimized_path)
minimized_url
# In[100]:
clear_httpd_messages()
HTML(webbrowser(minimized_url))
# We see that we might have a lot to do to make our Web server more robust against unexpected inputs. The [exercises](#Exercises) give some instructions on what to do.
# ## Extracting Grammars for Input Forms
#
# In our previous examples, we have assumed that we have a grammar that produces valid (or less valid) order queries. However, such a grammar does not need to be specified manually; we can also _extract it automatically_ from a Web page at hand. This way, we can apply our test generators on arbitrary Web forms without a manual specification step.
# ### Searching HTML for Input Fields
#
# The key idea of our approach is to identify all input fields in a form. To this end, let us take a look at how the individual elements in our order form are encoded in HTML:
# In[101]:
html_text = webbrowser(httpd_url)
print(html_text[html_text.find("<form"):html_text.find("</form>") + len("</form>")])
# We see that there are a number of form elements that accept inputs, in particular `<input>`, but also `