How To Retrieve S&P Constituents Historical Data Using Python

Introduction
Backtesting is a tool to measure the performance of a trading strategy using historical data. The backtesting process consists of three parts: 1. determining the universe of securities where we will invest in (e.g. equity or fixed income? US or emerging markets?); 2. gathering historical data for the universe of securities; and 3. implementing a trading strategy using the historical data collected.
In the previous article, I illustrated the first step in the backtesting process of determining the universe of stocks, namely the S&P 500, S&P MidCap 400 and S&P SmallCap 600 indices. In this article, I will discuss the second step of the backtesting process of collecting historical data for each constituent of the universe of stocks.
Retrieving S&P Constituents Historical Data
Step By Step
- Load the S&P tickers which were gathered from the previous article.
- Collect the S&P constituents’ 5-year historical data using Python package pandas-datareader from the Investors Exchange (IEX).
You can find the code below on https://github.com/DinodC/backtesting-trading-strategy.
Import packages
import pandas as pd
from pandas import Series, DataFrame
import pickle
import pandas_datareader.data as web
S&P Constituents Tickers
In this section, we load the lists pickled from the last article.
Set an id for each index
id = ['sp500', 'sp400', 'sp600']
Create a dictionary to map each id to a tickers file
input_file = {'sp500': 'sp500_barchart.pickle',
'sp400': 'sp400_barchart.pickle',
'sp600': 'sp600_barchart.pickle'}
Define a dictionary to map each id to a tickers list
sp500_tickers = []
sp400_tickers = []
sp600_tickers = []
sp_tickers = {'sp500': sp500_tickers,
'sp400': sp400_tickers,
'sp600': sp600_tickers}
Fill the tickers lists
for i in input_file:
with open(input_file[i], 'rb') as f:
# Update tickers list
sp_tickers[i] = pickle.load(f)
# Sort tickers list
sp_tickers[i].sort()
f.close()
S&P Constituents Historical Data
Define dictionary of historical data
sp500_data = pd.DataFrame()
sp400_data = pd.DataFrame()
sp600_data = pd.DataFrame()
sp_data = {'sp500': sp500_data,
'sp400': sp400_data,
'sp600': sp600_data}
Set the start and date of the historical data
start_date = '2014-01-01'
end_date = '2020-01-01'
Set the source Investors Exchange(IEX) to be used
source = 'iex'
Create a dictionary to map each id to an output file
output_file = {'sp500': 'sp500_data.pickle',
'sp400': 'sp400_data.pickle',
'sp600': 'sp600_data.pickle'}
Retrieve historical data for each constituent of each S&P index
for i in output_file:
# Retrieve historical data
# Note that we set number of tickers to < 100 because DataReader gives error when number of tickers > 100
data1 = web.DataReader(sp_tickers[i][:98], source, start_date, end_date)
data2 = web.DataReader(sp_tickers[i][98:198], source, start_date, end_date)
data3 = web.DataReader(sp_tickers[i][198:298], source, start_date, end_date)
data4 = web.DataReader(sp_tickers[i][298:398], source, start_date, end_date)
data5 = web.DataReader(sp_tickers[i][398:498], source, start_date, end_date)
if i == 'sp400':
# Concatenate historical data
sp_data[i] = pd.concat([data1, data2, data3, data4, data5], axis=1, sort=True)
if i == 'sp500':
data6 = web.DataReader(sp_tickers[i][498:], source, start_date, end_date)
# Concatenate historical data
sp_data[i] = pd.concat([data1, data2, data3, data4, data5, data6], axis=1, sort=True)
elif i == 'sp600':
data6 = web.DataReader(sp_tickers[i][498:598], source, start_date, end_date)
data7 = web.DataReader(sp_tickers[i][598:], source, start_date, end_date)
# Concatenate historical data
sp_data[i] = pd.concat([data1, data2, data3, data4, data5, data6, data7], axis=1, sort=True)
else:
pass
# Convert index to datetime
sp_data[i].index = pd.to_datetime(sp_data[i].index)
# Save historical data to file
with open(output_file[i], 'wb') as f:
pickle.dump(sp_data[i], f)
f.close()
Constituents Close Prices
S&P 500 Index
Look at the dimensions of our DataFrame
sp_data['sp500'].close.shape
(1258, 505)
Check the first rows
sp_data['sp500'].close.head()
Symbols | A | AAL | AAP | AAPL | ABBV |
---|---|---|---|---|---|
date | |||||
2014-06-11 | 40.1199 | 40.2868 | 125.0694 | 86.0249 | 45.0769 |
2014-06-12 | 39.7726 | 38.2958 | 123.2252 | 84.5860 | 44.6031 |
2014-06-13 | 39.8407 | 38.4672 | 123.7407 | 83.6603 | 45.0187 |
2014-06-16 | 39.7181 | 39.1150 | 124.0086 | 84.5035 | 44.8857 |
2014-06-17 | 40.0927 | 39.8867 | 124.9411 | 84.3935 | 45.1351 |
5 rows × 505 columns
Check the end rows
sp_data['sp500'].close.tail()
Symbols | A | AAL | AAP | AAPL | ABBV |
---|---|---|---|---|---|
date | |||||
2019-06-04 | 67.95 | 29.12 | 154.61 | 179.64 | 76.75 |
2019-06-05 | 68.35 | 30.36 | 154.61 | 182.54 | 77.06 |
2019-06-06 | 69.16 | 30.38 | 154.90 | 185.22 | 77.07 |
2019-06-07 | 69.52 | 30.92 | 155.35 | 190.15 | 77.43 |
2019-06-10 | 70.29 | 30.76 | 153.52 | 192.58 | 76.95 |
5 rows × 505 columns
Descriptive stats
sp_data['sp500'].close.describe()
Symbols | A | AAL | AAP | AAPL | ABBV |
---|---|---|---|---|---|
count | 1258.000000 | 1258.000000 | 1258.000000 | 1258.000000 | 1258.000000 |
mean | 52.010924 | 41.207727 | 145.418612 | 135.094399 | 66.653293 |
std | 13.866577 | 6.366236 | 24.128339 | 38.127616 | 17.721243 |
min | 32.258600 | 24.539800 | 79.168700 | 82.743800 | 42.066600 |
25% | 39.393500 | 36.586125 | 132.519725 | 103.622650 | 53.170800 |
50% | 46.097600 | 40.843300 | 150.457700 | 119.473800 | 58.470800 |
75% | 65.591850 | 46.119900 | 161.899875 | 167.841200 | 82.894300 |
max | 81.940000 | 57.586600 | 199.159900 | 229.392000 | 116.445400 |
8 rows × 505 columns
S&P MidCap 400 Index
Look at the dimensions of our DataFrame
sp_data['sp400'].close.shape
(1128, 400)
Check the first rows
sp_data['sp400'].close.head()
Symbols | AAN | ACC | ACHC | ACIW | ACM |
---|---|---|---|---|---|
2014-05-07 | NaN | NaN | NaN | NaN | NaN |
2014-05-08 | NaN | NaN | NaN | NaN | NaN |
2014-05-09 | NaN | NaN | NaN | NaN | NaN |
2014-05-12 | NaN | NaN | NaN | NaN | NaN |
2014-05-13 | NaN | NaN | NaN | NaN | NaN |
5 rows × 400 columns
Check the end rows
sp_data['sp400'].close.tail()
Symbols | AAN | ACC | ACHC | ACIW | ACM |
---|---|---|---|---|---|
2019-06-04 | 54.88 | 46.17 | 33.72 | 32.10 | 33.17 |
2019-06-05 | 54.97 | 46.97 | 33.30 | 32.26 | 33.29 |
2019-06-06 | 54.72 | 47.02 | 33.20 | 32.40 | 33.18 |
2019-06-07 | 55.72 | 47.17 | 33.91 | 32.34 | 33.47 |
2019-06-10 | 59.23 | 46.81 | 33.81 | 32.75 | 33.45 |
5 rows × 400 columns
Descriptive stats
sp_data['sp400'].close.describe()
Symbols | AAN | ACC | ACHC | ACIW | ACM |
---|---|---|---|---|---|
count | 1258.000000 | 1258.000000 | 1258.000000 | 1258.000000 | 1258.000000 |
mean | 35.344696 | 39.074002 | 48.806781 | 22.656060 | 32.409610 |
std | 9.825081 | 4.946453 | 13.903160 | 4.065761 | 3.178291 |
min | 20.118600 | 27.793900 | 24.750000 | 13.617500 | 23.150000 |
25% | 25.957675 | 34.718300 | 38.355000 | 19.470000 | 30.370000 |
50% | 34.792300 | 39.727900 | 46.525000 | 22.275000 | 32.465000 |
75% | 43.259125 | 43.422450 | 59.717500 | 24.240000 | 34.537500 |
max | 59.230000 | 48.353300 | 82.970000 | 35.520000 | 40.130000 |
8 rows × 400 columns
S&P SmallCap 600 Index
Look at the dimensions of our DataFrame
sp_data['sp600'].close.shape
(1116, 601)
Check the first rows
sp_data['sp600'].close.head()
Symbols | AAOI | AAON | AAT | AAWW | AAXN |
---|---|---|---|---|---|
2014-06-10 | NaN | NaN | NaN | NaN | NaN |
2014-06-11 | 22.33 | 20.7695 | 30.1070 | 38.07 | 13.91 |
2014-06-12 | 22.30 | 20.5904 | 29.9566 | 37.05 | 14.03 |
2014-06-13 | 21.86 | 20.3792 | 29.9654 | 37.29 | 13.87 |
2014-06-16 | 22.71 | 20.5712 | 30.0273 | 36.95 | 13.78 |
5 rows × 601 columns
Check the end rows
sp_data['sp600'].close.tail()
Symbols | AAOI | AAON | AAT | AAWW | AAXN |
---|---|---|---|---|---|
date | |||||
2019-06-04 | 9.19 | 47.80 | 45.11 | 39.70 | 67.63 |
2019-06-05 | 9.23 | 48.08 | 46.39 | 38.62 | 68.00 |
2019-06-06 | 9.30 | 48.18 | 46.39 | 37.99 | 67.97 |
2019-06-07 | 9.50 | 48.85 | 46.61 | 39.88 | 69.18 |
2019-06-10 | 9.60 | 48.64 | 46.51 | 40.30 | 71.91 |
5 rows × 601 columns
Descriptive stats
sp_data['sp600'].close.describe()
Symbols | AAOI | AAON | AAT | AAWW | AAXN |
---|---|---|---|---|---|
count | 1258.000000 | 1258.000000 | 1258.000000 | 1258.000000 | 1258.000000 |
mean | 26.523188 | 29.595272 | 37.383077 | 49.431387 | 31.595862 |
std | 17.116913 | 7.763096 | 3.595458 | 10.156758 | 15.917380 |
min | 8.380000 | 16.287400 | 29.352300 | 31.400000 | 10.500000 |
25% | 14.810000 | 22.305875 | 35.384150 | 40.970000 | 22.285000 |
50% | 19.845000 | 30.684600 | 37.508550 | 49.150000 | 25.175000 |
75% | 33.927500 | 35.473400 | 39.254350 | 56.542500 | 38.340000 |
max | 99.610000 | 51.631400 | 46.770000 | 74.000000 | 74.890000 |
8 rows × 601 columns
Summary
In this article, we retrieved historical data for every constituent in our universe of stocks – the S&P 500, S&P MidCap 400 and S&P SmallCap 600 indices. The 5-year historical data is relatively straightforward to obtain, and is provided for free by the Investors Exchange. In the next article, we implement a simple trading strategy, and backtest it using the historical data collected.
2 thoughts on “Backtesting A Trading Strategy Part 2”