Backtesting Stock Trading Strategies In Python 🐍
tl;dr If you aren't ready to commit endless hours to building an algo trading bot, you might as well just buy and hold 😐
Preface
We've all wanted to create that one trading bot that can consistently make us big money and beat the market. I believe it's possible but one would have to invest an insane amount of time learning, researching, and building something like this. Nonetheless, it's always fun to play around with money so here we go.
What We'll Be Covering
- Getting free historical stock data from Alphavantage
- Working with a backtesting framework in python, Backtesting.py
- Using technical indicators, pandas-ta
- Coding up a couple of Systematic Trading Strategies
Setting Up The env
You don't need to have this, but I used Conda to create a separate environment for this project.
You can go to the next section if you don't want to work with Conda.
Once you've installed it, you can create a new Conda environment. I'm naming the new environment algo and using Python 3.7:
conda create --name algo python=3.7
After the environment is set up, we can activate it like so:
conda activate algo
If you are using VS Code, you'll want to install the Python extension and switch your Python interpreter to the new Conda env.
Getting Historical Stock Data
We'll be getting our historical stock data from Alphavantage. At the time of writing, their free tier allows 5 API requests per minute and a max of 500 requests a day, which is plenty for what we need.
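If you ever loop over a bunch of symbols, it's easy to blow through a per-minute limit like that. Here's a minimal sketch of a sliding-window limiter, assuming the 5-per-minute figure above. RateLimiter is a hypothetical helper of my own, not something the alpha_vantage library provides, and the clock and sleep arguments are only there so it can be tested without actually waiting.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` calls per `period` seconds (sliding window)."""

    def __init__(self, max_calls=5, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls = deque()  # timestamps of the most recent calls

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        # Forget calls that have already left the window
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)
```

The idea would be to create one limiter and call limiter.wait() right before every API request.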
First, let's install a couple of libraries that we'll be needing for this.
pip install alpha_vantage pandas python-dotenv
- alpha_vantage, a wrapper around the Alphavantage REST API
- pandas, a popular library used for messing around with data
- python-dotenv, for loading in our secret API keys
For this mini-project, I'll be working in a directory called algo. Inside it we'll have two files: .env for our environment variables and algo.py for our code. We'll also want a data/ folder, where we'll store our historical stock data files.
Once you've got your API key you can put it in your .env file:
AV_API_KEY=REPLACE_THIS_WITH_YOUR_API_KEY
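One small gotcha: getenv returns None when a variable is missing, so a typo in the .env file only shows up later as a confusing API error. A tiny guard like the sketch below fails early instead. require_env is a hypothetical helper name of my own, not part of python-dotenv.

```python
from os import getenv


def require_env(name: str) -> str:
    """Return the environment variable `name`, or raise if it is missing or empty."""
    value = getenv(name)
    if not value:
        raise RuntimeError(f'{name} is not set; add it to your .env file')
    return value
```

You would call it after load_dotenv(), for example av_api_key = require_env('AV_API_KEY').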
To get the stock data, all we really need to do is this:
from alpha_vantage.timeseries import TimeSeries
# output_format='pandas' so that the first return value is a DataFrame
ts = TimeSeries(key=av_api_key, output_format='pandas')
df, _ = ts.get_daily_adjusted(symbol, outputsize='full')
However... we want to be nice developers and not hit the Alphavantage server all the time. So let's save the data locally and we can retrieve it locally when needed.
from os import path, getenv

from dotenv import load_dotenv
import pandas as pd
from alpha_vantage.timeseries import TimeSeries

load_dotenv()
av_api_key = getenv('AV_API_KEY')


def download_daily_data(symbol: str, filepath: str):
    ts = TimeSeries(key=av_api_key, output_format='pandas')
    rawDf, _ = ts.get_daily_adjusted(symbol, outputsize='full')  # pylint: disable=unbalanced-tuple-unpacking
    # Doing this to let python know it's a df 🤷♂️
    df = pd.DataFrame(rawDf)
    # Rename the columns before we save
    df.columns = ['open', 'high', 'low', 'close', 'adjusted_close', 'volume', 'dividend_amount', 'split_coefficient']
    df.to_csv(filepath)


def get_daily_data(symbol) -> pd.DataFrame:
    data_file_path = f'./data/{symbol}.csv'
    # Only hit the API when we don't have a local copy yet
    if not path.exists(data_file_path):
        # Argument order must match download_daily_data(symbol, filepath)
        download_daily_data(symbol, data_file_path)
    df = pd.read_csv(
        data_file_path,
        index_col='date',
        usecols=['date', 'open', 'high', 'low', 'close', 'adjusted_close', 'volume'],
        parse_dates=True,
    )
    # Rename them so Backtesting.py is happy
    df.columns = ['Open', 'High', 'Low', 'Close', 'adjusted_close', 'Volume']
    # We reverse the data frame so it goes from oldest to newest
    return df[::-1]


if __name__ == '__main__':
    symbol = 'spy'
    get_daily_data(symbol)
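If you want to convince yourself the "download once, then read from disk" idea works without burning API requests, here's a stripped-down sketch of the same pattern. load_or_download and fake_download are hypothetical stand-ins of my own, and the row written to disk is made up.

```python
import csv
import tempfile
from os import path


def load_or_download(filepath, download):
    """Return rows from `filepath`, calling `download(filepath)` only on a cache miss."""
    if not path.exists(filepath):
        download(filepath)
    with open(filepath, newline='') as f:
        return list(csv.DictReader(f))


calls = []


def fake_download(filepath):
    """Stand-in for the real API call: writes a single made-up row."""
    calls.append(filepath)
    with open(filepath, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['date', 'close'])
        writer.writerow(['2021-04-16', '100.0'])


with tempfile.TemporaryDirectory() as tmp:
    cache_file = path.join(tmp, 'spy.csv')
    first = load_or_download(cache_file, fake_download)
    second = load_or_download(cache_file, fake_download)

print(len(calls))  # 1: the second call was served from disk
```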
Inside the function get_daily_data, I've started preparing the data the way Backtesting.py likes it. Backtesting.py expects the following columns: Open, High, Low, Close, and Volume.
Note: We can also add additional columns we may want to reference in our strategies. This will come in handy in strategy 2.
Now we're all good right? That's what I thought at first, but there is more prep work we're going to need to do.
Formatting Historical Data For Backtesting
You'd be all set if you were trading Forex or crypto, but in my examples I'll be trading stocks.
In the stock trading world, there's a concept of stock splits. This framework however doesn't take that into account, so we'll need to adjust all of the OHLC values.
Lucky for us, Alphavantage provides us with an adjusted_close value which we can use to scale the OHLC data.
Let's create the new helper function and update our main block to call that instead:
def get_daily_data_adjusted(symbol: str) -> pd.DataFrame:
    df = get_daily_data(symbol)
    adjusted_df = pd.DataFrame(index=df.index)
    # Adjust the OHLC
    df['ratio'] = df.adjusted_close / df.Close
    adjusted_df['Open'] = df.Open * df.ratio
    adjusted_df['High'] = df.High * df.ratio
    adjusted_df['Low'] = df.Low * df.ratio
    adjusted_df['Close'] = df.adjusted_close
    adjusted_df['Volume'] = df.Volume
    return adjusted_df


if __name__ == '__main__':
    symbol = 'spy'
    get_daily_data_adjusted(symbol)
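To see what that ratio does, here's the same arithmetic on a single made-up bar from before a hypothetical 2-for-1 split, where the adjusted close is half of the raw close. adjust_bar is just an illustrative helper; volume is left alone, same as in the function above.

```python
def adjust_bar(open_, high, low, close, adjusted_close):
    """Scale one OHLC bar by the ratio adjusted_close / close."""
    ratio = adjusted_close / close
    return {
        'Open': open_ * ratio,
        'High': high * ratio,
        'Low': low * ratio,
        'Close': adjusted_close,
    }


# Made-up prices: raw close of 200 that is worth 100 in today's share terms
bar = adjust_bar(open_=198.0, high=202.0, low=196.0, close=200.0, adjusted_close=100.0)
print(bar)  # {'Open': 99.0, 'High': 101.0, 'Low': 98.0, 'Close': 100.0}
```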
Writing Our First Strategy
Backtesting.py's README.md file has a Simple Moving Average Crossover strategy. We'll code up another one for you to reference that's equally simple.
RSI Strategy
Our first strategy uses the Relative Strength Index (RSI), a very common momentum indicator that's used to measure whether a particular asset is overbought or oversold. The link I've shared goes more in-depth on how to calculate it but we'll be using the pandas-ta library to help crunch these numbers.
The RSI is a value between 0 and 100. People typically treat a reading below 30 as an indication that the asset is oversold and a reading above 70 as overbought. So our strategy will simply trade based on this.
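If you're curious what's going on under the hood, here's a rough pure-Python sketch of the classic RSI calculation with Wilder's smoothing. It's only meant to show the idea: pandas-ta may seed or smooth the averages differently, so don't expect the numbers to match it exactly. The function name rsi_wilder and the choice to return 50 for a completely flat series are my own assumptions.

```python
def rsi_wilder(closes, length=14):
    """Return the RSI of `closes` using Wilder's smoothing; None until enough data."""
    if len(closes) <= length:
        return [None] * len(closes)
    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]

    # Seed with a simple average of the first `length` changes
    avg_gain = sum(gains[:length]) / length
    avg_loss = sum(losses[:length]) / length

    def to_rsi(gain, loss):
        if loss == 0:
            # No losses at all: fully overbought, or neutral if nothing moved
            return 100.0 if gain > 0 else 50.0
        return 100.0 - 100.0 / (1.0 + gain / loss)

    out = [None] * length + [to_rsi(avg_gain, avg_loss)]
    for gain, loss in zip(gains[length:], losses[length:]):
        # Wilder's smoothing: each new change gets a weight of 1/length
        avg_gain = (avg_gain * (length - 1) + gain) / length
        avg_loss = (avg_loss * (length - 1) + loss) / length
        out.append(to_rsi(avg_gain, avg_loss))
    return out
```

A series that only ever rises ends up at 100 and one that only ever falls ends up at 0, which is where the overbought and oversold readings come from.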
Implementing The RSI Strategy
First let's install the backtesting framework along with pandas_ta:
pip install backtesting pandas_ta
Next, import these libraries at the top of our file:
from backtesting import Backtest, Strategy
from pandas_ta import rsi
To create our strategy, we'll have our strategy inherit from Backtesting's Strategy class:
class BasicRsiStrategy(Strategy):
    def init(self):
        self.rsi = self.I(rsi, self.data.df.Close, length=14)

    def next(self):
        today = self.rsi[-1]
        yesterday = self.rsi[-2]
        # Crosses below 30 (oversold, time to buy)
        if yesterday > 30 and today < 30 and not self.position.is_long:
            self.buy()
        # Crosses above 70 (overbought, time to sell)
        elif yesterday < 70 and today > 70 and self.position.size > 0:
            self.position.close()
The init function is where we can precompute our technical indicator data. Inside it we've assigned the RSI indicator to self.rsi. We need to wrap the indicator with the self.I function, which will magically pipe the data values as needed into the next function, so self.rsi[-1] is always the value for the current bar.
The next function is run at each tick (OHLC bar). Since we'll be using daily data, it simulates one trading day at a time.
We also have additional checks, to make sure that we only have one long position (we are not shorting). And when it's time to sell the shares we've bought, we close our long position (selling all the shares we own).
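Here's the same crossing logic pulled out into a plain function so you can poke at it with a list of numbers. It's a simplified sketch: it assumes orders fill immediately, whereas the real backtest only updates the position once the order is filled. rsi_signals is a hypothetical helper and the RSI readings are made up.

```python
def rsi_signals(rsi_values, lower=30, upper=70):
    """Walk the RSI series bar by bar and return (index, action) pairs."""
    signals = []
    in_position = False
    for i in range(1, len(rsi_values)):
        yesterday, today = rsi_values[i - 1], rsi_values[i]
        # Crosses below the lower bound while flat: buy
        if yesterday > lower and today < lower and not in_position:
            signals.append((i, 'buy'))
            in_position = True
        # Crosses above the upper bound while long: sell
        elif yesterday < upper and today > upper and in_position:
            signals.append((i, 'sell'))
            in_position = False
    return signals


# Made-up RSI readings
print(rsi_signals([45, 32, 28, 25, 40, 65, 72, 75, 60, 29]))
# [(2, 'buy'), (6, 'sell'), (9, 'buy')]
```

Notice that staying below 30 (28 to 25) or staying above 70 (72 to 75) doesn't trigger anything; only the bar where the line is crossed does.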
Now let's call our strategy by updating our main block:
if __name__ == '__main__':
    symbol = 'spy'
    strategy = BasicRsiStrategy
    data = get_daily_data_adjusted(symbol)
    bt = Backtest(
        data=data,
        strategy=strategy,
        cash=1000,
        commission=.002,
        exclusive_orders=False,
        trade_on_close=False,
    )
    stats = bt.run()
    print(stats)
    # Creates a nice HTML file with graphs
    bt.plot(
        smooth_equity=True,
        superimpose=True,
    )
This framework uses a % of the trade as commission. This might work for you depending on your broker or the type of asset you're trading. Unfortunately for me, my broker charges a flat rate of $10 per trade and this backtesting framework can't simulate that 🤷♂️
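To get a feel for how far apart the two fee models are: with the 0.2% commission used above, a $1,000 trade costs about $2, and a flat $10 fee only matches that once a trade reaches $5,000. The two helpers below are just illustrative arithmetic with names of my own.

```python
def percent_commission(trade_value, rate=0.002):
    """Commission charged as a fraction of the trade value."""
    return trade_value * rate


def breakeven_trade_value(flat_fee=10.0, rate=0.002):
    """Trade size at which a flat fee costs the same as the percentage commission."""
    return flat_fee / rate


print(percent_commission(1000))
print(breakeven_trade_value())
```

So with only $1,000 of starting cash, a flat $10 per trade would eat roughly 1% of the account on every order, which is five times what the backtest assumes.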
We set trade_on_close to False to have the backtest perform the trade on the next day's open. For some strategies, you might want to trade on the current day's close instead; in that case just change the value to True.
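As a toy model of what that flag changes, the sketch below picks the fill price for a market order placed on a given bar. This is my own simplification of the behaviour described above, not the framework's actual code, and the two bars are made up.

```python
def fill_price(bars, signal_index, trade_on_close=False):
    """Price at which a market order placed on bar `signal_index` would fill.

    Returns None when the order is placed on the last bar and there is no next open.
    """
    if trade_on_close:
        return bars[signal_index]['Close']
    if signal_index + 1 >= len(bars):
        return None
    return bars[signal_index + 1]['Open']


# Made-up bars: the market gaps up overnight after the signal
bars = [{'Open': 100.0, 'Close': 102.0}, {'Open': 103.0, 'Close': 101.0}]
print(fill_price(bars, 0))                       # 103.0
print(fill_price(bars, 0, trade_on_close=True))  # 102.0
```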
RSI Strategy Results
Let's take a look at the stats that were printed:
Start                     1999-11-01 00:00:00
End                       2021-04-16 00:00:00
Duration                  7837 days 00:00:00
Exposure Time [%]         37.618077
Equity Final [$]          1675.520564
Equity Peak [$]           1675.520564
Return [%]                67.552056 <-
Buy & Hold Return [%]     358.976317
Return (Ann.) [%]         2.438276
Volatility (Ann.) [%]     16.252559
Sharpe Ratio              0.150024
Sortino Ratio             0.220649
Calmar Ratio              0.044493
Max. Drawdown [%]         -54.801191
Avg. Drawdown [%]         -3.833282
Max. Drawdown Duration    2928 days 00:00:00
Avg. Drawdown Duration    129 days 00:00:00
# Trades                  18
Win Rate [%]              83.333333 <-
Best Trade [%]            13.096123 <-
Worst Trade [%]           -30.365652 <-
Avg. Trade [%]            3.295865
Max. Trade Duration       725 days 00:00:00
Avg. Trade Duration       163 days 00:00:00
Profit Factor             2.617049
Expectancy [%]            3.859875
SQN                       1.480562
From what we can tell, RSI isn't a strategy we should use by itself. The return over the course of 21 years is 67.55%. The win rate is good (83%), but our winning trades are small (the best is 13.1%) and our losing trades are big (the worst is -30.4%). If we had just held SPY we would have gotten a much better return of 358%.
bt.plot() also auto-generates a pretty graph of the backtest.
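Totals over 21 years are hard to compare by eye, so here's a quick sketch that turns them into a compound annual rate. It assumes 365.25 calendar days per year, so it may not line up exactly with the framework's own Return (Ann.) figure. annualized_return is a helper name of my own; the inputs are the Return, Buy & Hold Return, and Duration values from the stats.

```python
def annualized_return(total_return_pct, days):
    """Compound annual growth rate in percent, using 365.25 calendar days per year."""
    growth = 1 + total_return_pct / 100
    return (growth ** (365.25 / days) - 1) * 100


strategy = annualized_return(67.552056, 7837)
buy_and_hold = annualized_return(358.976317, 7837)
print(f'{strategy:.2f}% vs {buy_and_hold:.2f}% per year')
```

That works out to roughly 2.4% a year for the RSI strategy against roughly 7.4% a year for simply holding.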
The Second Strategy
MA Channel
I came across this strategy while watching David's video at Critical Trading on YouTube. He goes over the traditional Golden Cross and the one we are interested in, the Moving Average (MA) Channel.
His strategy is more complex, allowing for 10 positions, using a scanner for Annual Return of Capital (ROC) ranking and multiple market checks to make sure we are not trading when the market is in a heavy downtrend.
The backtesting framework we are using doesn't allow us to easily do all of this. We are going to simplify it with no scanner for Annual ROC. The framework also only allows for trading 1 asset so we'll only be holding 1 position at a time.
Implementing The Strategy
For our market downtrend indicator we're going to use the S&P 100 data. So we will only buy when the market is in an uptrend (sp100 > sma(sp100)).
We'll need to add the S&P 100's adjusted close into our data frame in order to have it accessible in our strategy.
Let's create a new function to help us with that:
def add_symbol_adjusted_close(symbol: str, full_df: pd.DataFrame) -> pd.DataFrame:
    new_df = get_daily_data(symbol)
    close_df = pd.DataFrame(index=new_df.index)
    close_df[symbol] = new_df.adjusted_close
    return full_df.merge(close_df, left_index=True, right_index=True)
We can use this to pass in our existing data frame along with the new symbol we want to add as a column.
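One thing worth knowing about merge: it does an inner join by default, so only dates present in both data frames survive. That's presumably why the second backtest further down starts on 2000-10-27 instead of 1999-11-01, since the added symbol has less history. Here's a small sketch with made-up prices showing the effect.

```python
import pandas as pd

# Made-up prices: the second symbol starts trading one day later
spy = pd.DataFrame(
    {'Close': [100.0, 101.0, 102.0]},
    index=pd.to_datetime(['2000-10-26', '2000-10-27', '2000-10-30']),
)
oef = pd.DataFrame(
    {'oef': [50.0, 50.5]},
    index=pd.to_datetime(['2000-10-27', '2000-10-30']),
)

# how='inner' is the default, so 2000-10-26 is dropped
merged = spy.merge(oef, left_index=True, right_index=True)
print(merged)
```

If you'd rather keep every row of the original frame, passing how='left' would do that, at the cost of NaN values in the new column.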
Let's update our main block:
if __name__ == '__main__':
    symbol = 'spy'
    strategy = MaHighLowChannel  # Our new strategy
    data = get_daily_data_adjusted(symbol)
    data = add_symbol_adjusted_close('oef', data)
    # You can add more symbols to the data frame as needed.
    # The function returns a new data frame, so reassign it.
    # Example:
    # data = add_symbol_adjusted_close('aapl', data)
    # data = add_symbol_adjusted_close('tsla', data)
    bt = Backtest(
        data=data,
        strategy=strategy,
        cash=1000,
        commission=.002,
        exclusive_orders=False,
        trade_on_close=False,
    )
We are adding OEF because that is an asset that tracks the S&P 100 index. You can add additional stocks as your strategy calls for.
Now for the strategy:
from pandas_ta import sma  # needed in addition to rsi


class MaHighLowChannel(Strategy):
    def init(self):
        self.sp100Ma = self.I(sma, self.data.df.oef, length=200, plot=True)
        # Moving Average High (mah) and Moving Average Low (mal)
        self.mah = self.I(sma, self.data.df.High, length=10, plot=True)
        self.mal = self.I(sma, self.data.df.Low, length=10, plot=True)

    def should_buy(self):
        # If the SP100 close is below the SMA, downtrend.
        # self.data.oef is an array, so [-1] is always the current bar.
        if self.data.oef[-1] <= self.sp100Ma[-1]:
            return False
        # Each of the last 5 lows must be above the MA of the highs
        for i in range(1, 6):
            if self.data.Low[-i] <= self.mah[-i]:
                return False
        return True

    def should_sell(self):
        # Each of the last 5 highs must be below the MA of the lows
        for i in range(1, 6):
            if self.data.High[-i] >= self.mal[-i]:
                return False
        return True

    def next(self):
        if not self.position.is_long and self.should_buy():
            self.buy()
        elif self.position.is_long and self.should_sell():
            self.position.close()
This strategy is also pretty straightforward. I just wanted to show you an example of how to add more data into your strategy via the data frame.
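If the channel rule is hard to picture, here's the buy-side check as plain Python on made-up numbers. It's a sketch of the idea rather than the strategy itself: sma and lows_above_channel are small helpers of my own, and I'm using a 3-bar average with a 2-bar lookback just to keep the example short.

```python
def sma(values, length):
    """Simple moving average; None until `length` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < length:
            out.append(None)
        else:
            window = values[i + 1 - length:i + 1]
            out.append(sum(window) / length)
    return out


def lows_above_channel(lows, ma_high, lookback=5):
    """True if each of the last `lookback` lows is above the moving average of highs."""
    recent = list(zip(lows[-lookback:], ma_high[-lookback:]))
    if len(recent) < lookback or any(m is None for _, m in recent):
        return False
    return all(low > m for low, m in recent)


# Made-up bars: flat for a while, then price breaks out above the channel
highs = [10, 10, 10, 10, 13, 14]
lows = [9, 9, 9, 9, 12, 13]
ma_high = sma(highs, length=3)
print(lows_above_channel(lows, ma_high, lookback=2))  # True
```

The sell side is the mirror image: every recent high has to sit below the moving average of the lows.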
MA Channel Strategy Results
Start                     2000-10-27 00:00:00
End                       2021-04-16 00:00:00
Duration                  7476 days 00:00:00
Exposure Time [%]         67.210567
Equity Final [$]          2995.257031
Equity Peak [$]           2995.257031
Return [%]                199.525703 <-
Buy & Hold Return [%]     342.323678
Return (Ann.) [%]         5.516882
Volatility (Ann.) [%]     10.779165
Sharpe Ratio              0.51181
Sortino Ratio             0.741254
Calmar Ratio              0.274589
Max. Drawdown [%]         -20.09138
Avg. Drawdown [%]         -1.580463
Max. Drawdown Duration    1377 days 00:00:00
Avg. Drawdown Duration    42 days 00:00:00
# Trades                  24
Win Rate [%]              70.833333 <-
Best Trade [%]            32.135879 <-
Worst Trade [%]           -10.8418 <-
Avg. Trade [%]            4.941294
Max. Trade Duration       690 days 00:00:00
Avg. Trade Duration       208 days 00:00:00
Profit Factor             3.513146
Expectancy [%]            5.4784
SQN                       2.499001
The MA Channel is overall better than the RSI Strategy. The win rate is a bit lower but we hold onto our winning trades longer. But this still isn't better than buying and holding.
Next Steps
There are still a lot more things I want to try for Systematic Trading:
- Use the Annual ROC ranking David suggested
- Take multiple smaller positions
- Trade multiple assets
- Utilize different strategies depending on market conditions (ex: downtrend vs uptrend)
- Test out more backtesting frameworks
- Just code up a 💩 ton more strategies
Conclusion
If what they say is true, stonks only go up, then it'll be quite tough to beat the buy and hold strategy. Assuming you pick the right stock though 🌚