This tutorial shows some of the features of backtesting.py, yet another Python package for backtesting trading strategies.
Firstly, what backtesting.py is not: It is not a data source; you bring your own data. It does not support strategies that rely on multiple orders, hedging, position sizing, or multi-asset portfolio rebalancing. Instead, backtesting.py works with a single asset at a time and a single position at a time (long or short), and the position size is (as yet) non-adjustable, corresponding to 100% of available funds. Backtesting.py is not aware of order types; it does not properly simulate a broker, nor can it be connected to one.
As a trade-off, backtesting.py is a blazing fast, small and lightweight backtesting library that uses state-of-the-art Python data structures and procedures. The entire API easily fits into memory banks of a single human individual. It's best suited for optimizing position entry and exit strategies and decisions based on the values of technical indicators, and it's also a versatile interactive trading strategy visualization tool.
You bring your own data. Backtesting.py ingests data as a pandas.DataFrame with columns 'Open', 'High', 'Low', 'Close', and (optionally) 'Volume'. Such data is easily obtainable (see e.g. pandas-datareader, Quandl, findatapy, ...). Your data frames can have other columns, but these are necessary. The DataFrame should ideally be indexed with a datetime index (convert it with pd.to_datetime()), otherwise a simple range index will do.
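As a minimal sketch of the expected input shape, the snippet below builds a tiny frame by hand. The prices are made up for illustration, and the `Date` column name is an assumption; only the capitalized OHLC(V) column names and the datetime index come from the description above.

```python
import pandas as pd

# Made-up prices for illustration only (not real market data)
raw = pd.DataFrame({
    'Date': ['2020-01-02', '2020-01-03', '2020-01-06'],
    'Open': [10.0, 10.5, 10.2],
    'High': [10.8, 10.9, 10.6],
    'Low': [9.9, 10.1, 10.0],
    'Close': [10.5, 10.2, 10.4],
    'Volume': [1000, 1200, 900],
})

# Convert the date strings and use them as the index
raw['Date'] = pd.to_datetime(raw['Date'])
data = raw.set_index('Date')

# Check that the required columns are present
required = ['Open', 'High', 'Low', 'Close']
missing = [c for c in required if c not in data.columns]
assert not missing, 'missing columns: {}'.format(missing)
```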
# Example OHLC data for Google Inc.
from backtesting.test import GOOG
GOOG.tail()
| | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|
| 2013-02-25 | 802.30 | 808.41 | 790.49 | 790.77 | 2303900 |
| 2013-02-26 | 795.00 | 795.95 | 784.40 | 790.13 | 2202500 |
| 2013-02-27 | 794.80 | 804.75 | 791.11 | 799.78 | 2026100 |
| 2013-02-28 | 801.10 | 806.99 | 801.03 | 801.20 | 2265800 |
| 2013-03-01 | 797.80 | 807.14 | 796.15 | 806.19 | 2175400 |
Let's create our first strategy to backtest on these Google data. Let it be a simple moving average (MA) cross-over strategy.
Backtesting.py doesn't contain its own set of technical indicators. In practice, the user should probably use functions from their favorite indicator library, such as TA-Lib, Tulipy, PyAlgoTrade, ... But for this example, we define a simple helper moving average function.
import pandas as pd


def SMA(values, n):
    """
    Return simple moving average of `values`, at
    each step taking into account `n` previous values.
    """
    return pd.Series(values).rolling(n).mean()
Note, this is the exact same helper function as the one used in the project unit tests, so we could just import that instead.
from backtesting.test import SMA
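To see what the helper returns, here is a small check on made-up values. The function is repeated so the snippet runs on its own. Note that the first `n - 1` entries are NaN because the window is not yet full.

```python
import pandas as pd


def SMA(values, n):
    """Simple moving average over the `n` most recent values."""
    return pd.Series(values).rolling(n).mean()


# Made-up input values
result = SMA([1, 2, 3, 4, 5], 3)

# Two leading NaNs, then the means of (1, 2, 3), (2, 3, 4), (3, 4, 5)
print(result.tolist())
```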
A custom strategy needs to extend the backtesting.Strategy class and override its two methods: init() and next().
Method init() is invoked at the beginning, before the strategy is run. Within it, one ideally precomputes, in an efficient, vectorized fashion, whatever indicators and signals the strategy depends on.
Method next() is called iteratively by the backtest instance, once for each data point (data frame row), simulating the incremental availability of each new full candlestick bar. Note, backtesting.py cannot make decisions or trades within candlesticks: any trade is executed on the next candle's open (or on the current candle's close, see Backtest(trade_on_close)). If you need to trade within candlesticks (e.g. daytrading), instead begin with more fine-grained (e.g. hourly) data.
from backtesting import Strategy
from backtesting.lib import crossover


class SmaCross(Strategy):
    # Define the two MA lags as *class variables*
    # for later optimization
    n1 = 10
    n2 = 20

    def init(self):
        # Precompute two moving averages
        self.sma1 = self.I(SMA, self.data.Close, self.n1)
        self.sma2 = self.I(SMA, self.data.Close, self.n2)

    def next(self):
        # If sma1 crosses above sma2, buy the asset
        if crossover(self.sma1, self.sma2):
            self.buy()

        # Else, if sma1 crosses below sma2, sell it
        elif crossover(self.sma2, self.sma1):
            self.sell()
In init() as well as in next(), the data the strategy is simulated on is available as an instance variable self.data.
In init(), we compute indicators indirectly by wrapping them in self.I(). The wrapper is passed a function (here, our SMA function) along with any arguments to call it with (here, our close values and the MA lag). Indicators wrapped in this way will be plotted, and their names, intelligently inferred, will appear in the plot legend.
In next(), we simply check if the faster moving average just crossed over the slower one. If it did and upwards, we go long; if it did and downwards, we close any open long position and go short. Note, there is no position size to adjust; backtesting.py assumes the maximal possible position. We use the backtesting.lib.crossover() function instead of writing more obscure and confusing conditions, such as:
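A hand-written check of this kind might look like the sketch below. It assumes a cross-over means that the first series was below the second on the previous bar and is above it on the current one. The helper name `crossed_above` is hypothetical and not part of backtesting.py, and the indicator values are made up.

```python
def crossed_above(a, b):
    """True if `a` was below `b` one bar ago and is above it now."""
    return a[-2] < b[-2] and a[-1] > b[-1]


# Made-up indicator values; the last element is the current bar
fast = [9.0, 10.0, 12.0]
slow = [10.0, 11.0, 11.5]

went_up = crossed_above(fast, slow)    # fast moved from below to above
went_down = crossed_above(slow, fast)  # the opposite direction
```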
Ugh!
In init(), the whole series of points was available, whereas in next(), the length of self.data and any declared indicator arrays is adjusted on each next() call so that array[-1] (e.g. self.data.Close[-1] or self.sma1[-1]) always contains the most recent value, array[-2] the previous value, etc. (ordinary Python indexing of ascending-sorted 1D arrays).
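The effect can be imitated in plain Python. The loop below only illustrates the growing window and is not how backtesting.py is implemented; the closing prices are made up.

```python
closes = [100.0, 101.5, 99.8, 102.3]  # made-up closing prices

seen = []
for i in range(1, len(closes) + 1):
    window = closes[:i]  # what a next() call would see on bar i
    latest = window[-1]  # most recent value
    previous = window[-2] if len(window) > 1 else None  # value one bar back
    seen.append((len(window), latest, previous))

print(seen)
```

On the first bar there is no previous value yet, which is why strategies that look back several bars need to tolerate short windows.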
Note: self.data and any indicators wrapped with self.I (e.g. self.sma1) are NumPy arrays for performance reasons. If you need pandas.Series, use .to_series() method (e.g. self.data.Close.to_series()) or construct the series manually (e.g. pd.Series(self.data.Close, index=self.data.index)).
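The manual construction can be tried outside of a strategy. In this sketch, `close` and `index` are made-up stand-ins for self.data.Close and self.data.index.

```python
import numpy as np
import pandas as pd

# Stand-ins for self.data.Close and self.data.index (made-up values)
close = np.array([10.5, 10.2, 10.4])
index = pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06'])

# Manual construction, as in the pd.Series(...) form mentioned above
series = pd.Series(close, index=index)

# pandas methods such as pct_change() are now available
changes = series.pct_change()
```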
Let's see how our strategy performs on historical Google data. We begin with 10,000 units of cash and set the broker's commission to a realistic 0.2%.
from backtesting import Backtest
bt = Backtest(GOOG, SmaCross, cash=10000, commission=.002)
bt.run()
Start                     2004-08-19 00:00:00
End                       2013-03-01 00:00:00
Duration                   3116 days 00:00:00
Exposure [%]                            94.29
Equity Final [$]                     69665.12
Equity Peak [$]                      69722.15
Return [%]                             596.65
Buy & Hold Return [%]                  703.46
Max. Drawdown [%]                      -33.61
Avg. Drawdown [%]                       -5.68
Max. Drawdown Duration      689 days 00:00:00
Avg. Drawdown Duration       41 days 00:00:00
# Trades                                   93
Win Rate [%]                            53.76
Best Trade [%]                          56.98
Worst Trade [%]                        -17.03
Avg. Trade [%]                           2.44
Max. Trade Duration         121 days 00:00:00
Avg. Trade Duration          32 days 00:00:00
Expectancy [%]                           6.92
SQN                                      1.77
Sharpe Ratio                             0.22
Sortino Ratio                            0.54
Calmar Ratio                             0.07
_strategy                            SmaCross
dtype: object
The Backtest instance is initialized with data and a strategy class (see API reference for additional options).
When the Backtest.run() method is called, it returns a pandas Series of simulation results and statistics associated with our strategy. We see that this simple strategy returns almost 600% over a period of roughly 8.5 years, with a maximum drawdown of about 34% and the longest drawdown period spanning almost two years.
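The headline figure can be cross-checked by hand. The sketch assumes that Return [%] is the relative change from the starting cash to the final equity, an assumption the numbers in the output above are consistent with.

```python
cash = 10000             # starting cash passed to Backtest
equity_final = 69665.12  # 'Equity Final [$]' from the output above

# Relative change from starting cash to final equity, in percent
return_pct = (equity_final / cash - 1) * 100
print(round(return_pct, 2))
```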
The Backtest.plot() method provides the same results in a more visual form.
bt.plot()
We hard-coded the two lag parameters (n1 and n2) into our strategy above. However, the strategy may work better with 15–30 or some other cross-over. We define the parameters as optimizable by making them class variables.
We optimize the two parameters by calling the Backtest.optimize() method with each parameter as a keyword argument pointing to its pool of values to test. Parameter n1 is tested for values from 5 to 25 and parameter n2 for values from 10 to 65, both in steps of 5 (the upper bounds of range() are exclusive). Some combinations of values of the two parameters are invalid, i.e. n1 should not be larger than or equal to n2. We limit admissible parameter combinations with an ad hoc constraint function, which returns True (admissible) whenever n1 is less than n2. Additionally, we search for the parameter combination that maximizes final equity, and hence return, over the observed period. We could instead choose to optimize any key from the returned stats series.
%%time
stats = bt.optimize(n1=range(5, 30, 5),
                    n2=range(10, 70, 5),
                    maximize='Equity Final [$]',
                    constraint=lambda p: p.n1 < p.n2)
CPU times: user 175 ms, sys: 59.5 ms, total: 234 ms Wall time: 1.33 s
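The set of parameter pairs that such a search has to consider can be enumerated by hand. This is only a sketch of the idea of a constrained grid, not the library's implementation; `SimpleNamespace` stands in for the object passed to the constraint function so that `p.n1` and `p.n2` work as in the lambda.

```python
from itertools import product
from types import SimpleNamespace

n1_values = range(5, 30, 5)   # 5, 10, 15, 20, 25
n2_values = range(10, 70, 5)  # 10, 15, ..., 65


def constraint(p):
    """Admissible only when the fast lag is shorter than the slow lag."""
    return p.n1 < p.n2


# Keep only the admissible (n1, n2) pairs
candidates = [
    (n1, n2)
    for n1, n2 in product(n1_values, n2_values)
    if constraint(SimpleNamespace(n1=n1, n2=n2))
]

print(len(candidates))
```

Of the 60 raw combinations, the constraint leaves 50 to evaluate.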
stats
Start                     2004-08-19 00:00:00
End                       2013-03-01 00:00:00
Duration                   3116 days 00:00:00
Exposure [%]                            98.14
Equity Final [$]                    106429.70
Equity Peak [$]                     109515.30
Return [%]                             964.30
Buy & Hold Return [%]                  703.46
Max. Drawdown [%]                      -43.98
Avg. Drawdown [%]                       -5.70
Max. Drawdown Duration      690 days 00:00:00
Avg. Drawdown Duration       36 days 00:00:00
# Trades                                  152
Win Rate [%]                            51.32
Best Trade [%]                          60.81
Worst Trade [%]                        -20.80
Avg. Trade [%]                           1.90
Max. Trade Duration          83 days 00:00:00
Avg. Trade Duration          21 days 00:00:00
Expectancy [%]                           5.97
SQN                                      1.51
Sharpe Ratio                             0.19
Sortino Ratio                            0.49
Calmar Ratio                             0.04
_strategy               SmaCross(n1=10,n2=15)
dtype: object
We can look into stats._strategy to access the Strategy instance and its optimal parameter values (10 and 15).
bt.plot()
Strategy optimization raised the return on in-sample data from about 597% to about 964% and beat buy & hold (about 703%). In real life, however, always take steps to avoid overfitting before putting real money at risk.
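One common precaution, sketched here as an assumption rather than anything backtesting.py prescribes, is to optimize on an earlier slice of the data and to check the chosen parameters on a later, unseen slice. The frame and the cutoff date below are made up for illustration.

```python
import pandas as pd

# Made-up frame standing in for OHLC data: ten daily rows
index = pd.date_range('2020-01-01', periods=10, freq='D')
frame = pd.DataFrame({'Close': range(10)}, index=index)

# Arbitrary illustrative cutoff between the two slices
cutoff = pd.Timestamp('2020-01-07')

in_sample = frame.loc[frame.index <= cutoff]      # used for optimization
out_of_sample = frame.loc[frame.index > cutoff]   # held back for checking
```

The optimization would then see only the earlier slice, and the later slice would serve as a check on the parameters it picked.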
Learn more by exploring further examples or find more program options in the full API reference.