Backtesting A Trading Strategy Part 2

How To Retrieve S&P Constituents Historical Data Using Python

Photo by the author

Introduction

Backtesting is a tool to measure the performance of a trading strategy using historical data. The backtesting process consists of three parts: 1. determining the universe of securities where we will invest in (e.g. equity or fixed income? US or emerging markets?); 2. gathering historical data for the universe of securities; and 3. implementing a trading strategy using the historical data collected.

In the previous article, I illustrated the first step in the backtesting process of determining the universe of stocks, namely the S&P 500, S&P MidCap 400 and S&P SmallCap 600 indices. In this article, I will discuss the second step of the backtesting process of collecting historical data for each constituent of the universe of stocks.

Retrieving S&P Constituents Historical Data

Step By Step

  1. Load the S&P tickers which were gathered from the previous article.
  2. Collect the S&P constituents’ 5-year historical data using Python package pandas-datareader from the Investors Exchange (IEX).

You can find the code below on https://github.com/DinodC/backtesting-trading-strategy.

Import packages

import pandas as pd
from pandas import Series, DataFrame
import pickle
import pandas_datareader.data as web

S&P Constituents Tickers

In this section, we load the lists pickled from the last article.

Set an id for each index

id = ['sp500', 'sp400', 'sp600']

Create a dictionary to map each id to a tickers file

input_file = {'sp500': 'sp500_barchart.pickle',
              'sp400': 'sp400_barchart.pickle', 
              'sp600': 'sp600_barchart.pickle'} 

Define a dictionary to map each id to a tickers list

sp500_tickers = []
sp400_tickers = []
sp600_tickers = []
sp_tickers = {'sp500': sp500_tickers,
              'sp400': sp400_tickers,
              'sp600': sp600_tickers}

Fill the tickers lists

for i in input_file:
    with open(input_file[i], 'rb') as f:

        # Update tickers list        
        sp_tickers[i] = pickle.load(f)

        # Sort tickers list
        sp_tickers[i].sort()

    f.close()

S&P Constituents Historical Data

Define dictionary of historical data

sp500_data = pd.DataFrame()
sp400_data = pd.DataFrame()
sp600_data = pd.DataFrame()
sp_data = {'sp500': sp500_data,
           'sp400': sp400_data,
           'sp600': sp600_data}

Set the start and date of the historical data

start_date = '2014-01-01'
end_date = '2020-01-01'

Set the source Investors Exchange(IEX) to be used

source = 'iex'

Create a dictionary to map each id to an output file

output_file = {'sp500': 'sp500_data.pickle',
               'sp400': 'sp400_data.pickle',
               'sp600': 'sp600_data.pickle'}

Retrieve historical data for each constituent of each S&P index

for i in output_file:

    # Retrieve historical data 
    # Note that we set number of tickers to < 100 because DataReader gives error when number of tickers > 100
    data1 = web.DataReader(sp_tickers[i][:98], source, start_date, end_date)
    data2 = web.DataReader(sp_tickers[i][98:198], source, start_date, end_date)
    data3 = web.DataReader(sp_tickers[i][198:298], source, start_date, end_date)
    data4 = web.DataReader(sp_tickers[i][298:398], source, start_date, end_date)
    data5 = web.DataReader(sp_tickers[i][398:498], source, start_date, end_date)
    if i == 'sp400':
        # Concatenate historical data
        sp_data[i] = pd.concat([data1, data2, data3, data4, data5], axis=1, sort=True)
    if i == 'sp500':
        data6 = web.DataReader(sp_tickers[i][498:], source, start_date, end_date)
        # Concatenate historical data        
        sp_data[i] = pd.concat([data1, data2, data3, data4, data5, data6], axis=1, sort=True)
    elif i == 'sp600':
        data6 = web.DataReader(sp_tickers[i][498:598], source, start_date, end_date)
        data7 = web.DataReader(sp_tickers[i][598:], source, start_date, end_date)
        # Concatenate historical data
        sp_data[i] = pd.concat([data1, data2, data3, data4, data5, data6, data7], axis=1, sort=True)        
    else:
        pass

    # Convert index to datetime
    sp_data[i].index = pd.to_datetime(sp_data[i].index)

    # Save historical data to file
    with open(output_file[i], 'wb') as f:
        pickle.dump(sp_data[i], f)
    f.close()

Constituents Close Prices

S&P 500 Index

Look at the dimensions of our DataFrame

sp_data['sp500'].close.shape
(1258, 505)

Check the first rows

sp_data['sp500'].close.head()
SymbolsAAALAAPAAPLABBV
date
2014-06-1140.119940.2868125.069486.024945.0769
2014-06-1239.772638.2958123.225284.586044.6031
2014-06-1339.840738.4672123.740783.660345.0187
2014-06-1639.718139.1150124.008684.503544.8857
2014-06-1740.092739.8867124.941184.393545.1351

5 rows × 505 columns

Check the end rows

sp_data['sp500'].close.tail()
SymbolsAAALAAPAAPLABBV
date
2019-06-0467.9529.12154.61179.6476.75
2019-06-0568.3530.36154.61182.5477.06
2019-06-0669.1630.38154.90185.2277.07
2019-06-0769.5230.92155.35190.1577.43
2019-06-1070.2930.76153.52192.5876.95

5 rows × 505 columns

Descriptive stats

sp_data['sp500'].close.describe()
SymbolsAAALAAPAAPLABBV
count1258.0000001258.0000001258.0000001258.0000001258.000000
mean52.01092441.207727145.418612135.09439966.653293
std13.8665776.36623624.12833938.12761617.721243
min32.25860024.53980079.16870082.74380042.066600
25%39.39350036.586125132.519725103.62265053.170800
50%46.09760040.843300150.457700119.47380058.470800
75%65.59185046.119900161.899875167.84120082.894300
max81.94000057.586600199.159900229.392000116.445400

8 rows × 505 columns

S&P MidCap 400 Index

Look at the dimensions of our DataFrame

sp_data['sp400'].close.shape
(1128, 400)

Check the first rows

sp_data['sp400'].close.head()
SymbolsAANACCACHCACIWACM
2014-05-07NaNNaNNaNNaNNaN
2014-05-08NaNNaNNaNNaNNaN
2014-05-09NaNNaNNaNNaNNaN
2014-05-12NaNNaNNaNNaNNaN
2014-05-13NaNNaNNaNNaNNaN

5 rows × 400 columns

Check the end rows

sp_data['sp400'].close.tail()
SymbolsAANACCACHCACIWACM
2019-06-0454.8846.1733.7232.1033.17
2019-06-0554.9746.9733.3032.2633.29
2019-06-0654.7247.0233.2032.4033.18
2019-06-0755.7247.1733.9132.3433.47
2019-06-1059.2346.8133.8132.7533.45

5 rows × 400 columns

Descriptive stats

sp_data['sp400'].close.describe()
SymbolsAANACCACHCACIWACM
count1258.0000001258.0000001258.0000001258.0000001258.000000
mean35.34469639.07400248.80678122.65606032.409610
std9.8250814.94645313.9031604.0657613.178291
min20.11860027.79390024.75000013.61750023.150000
25%25.95767534.71830038.35500019.47000030.370000
50%34.79230039.72790046.52500022.27500032.465000
75%43.25912543.42245059.71750024.24000034.537500
max59.23000048.35330082.97000035.52000040.130000

8 rows × 400 columns

S&P SmallCap 600 Index

Look at the dimensions of our DataFrame

sp_data['sp600'].close.shape
(1116, 601)

Check the first rows

sp_data['sp600'].close.head()
SymbolsAAOIAAONAATAAWWAAXN
2014-06-10NaNNaNNaNNaNNaN
2014-06-1122.3320.769530.107038.0713.91
2014-06-1222.3020.590429.956637.0514.03
2014-06-1321.8620.379229.965437.2913.87
2014-06-1622.7120.571230.027336.9513.78

5 rows × 601 columns

Check the end rows

sp_data['sp600'].close.tail()
SymbolsAAOIAAONAATAAWWAAXN
date
2019-06-049.1947.8045.1139.7067.63
2019-06-059.2348.0846.3938.6268.00
2019-06-069.3048.1846.3937.9967.97
2019-06-079.5048.8546.6139.8869.18
2019-06-109.6048.6446.5140.3071.91

5 rows × 601 columns

Descriptive stats

sp_data['sp600'].close.describe()
SymbolsAAOIAAONAATAAWWAAXN
count1258.0000001258.0000001258.0000001258.0000001258.000000
mean26.52318829.59527237.38307749.43138731.595862
std17.1169137.7630963.59545810.15675815.917380
min8.38000016.28740029.35230031.40000010.500000
25%14.81000022.30587535.38415040.97000022.285000
50%19.84500030.68460037.50855049.15000025.175000
75%33.92750035.47340039.25435056.54250038.340000
max99.61000051.63140046.77000074.00000074.890000

8 rows × 601 columns

Summary

In this article, we retrieved historical data for every constituent in our universe of stocks – the S&P 500, S&P MidCap 400 and S&P SmallCap 600 indices. The 5-year historical data is relatively straightforward to obtain, and is provided for free by the Investors Exchange. In the next article, we implement a simple trading strategy, and backtest it using the historical data collected.

Advertisement

2 thoughts on “Backtesting A Trading Strategy Part 2

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s