The S&P 500, Factor Investing, and Index Benchmarking
a brief practitioner’s look at equity indexing and the monkey-vs-active manager paradox
The S&P500 is the benchmark used to measure virtually every fund and every manager, and S&P500 returns serve as a universal measurement standard, much like the speed of light or the force of gravity. The most common refrain around a successful manager is “s/he has outperformed the index X out of the last Y years”. The S&P500 yardstick is liberally applied to investment companies, mutual funds and hedge funds, even where differences in risk or strategy make the comparison inappropriate.
Within the ETF market and certain academic circles, market capitalization weighted indices such as the S&P500 are labelled old fashioned and in need of replacing with equal weight (i.e. each stock is held in equal proportion), value weight, growth weight or dividend weight indices. “New” weighting schemes vary from simple factor screening to pseudo-active strategies.
While switching from market-capitalization weighting to alternative factor weighting presents some interesting trades, the bigger questions are: (1) “should market capitalization weighting be the standard bearer for equity investments?”, and (2) “is it possible to make index style choices truly independent of stock & sector picking?”
Directly related to the developments around equity indexing is the long-running monkey-vs-active manager paradox: academic studies claim that monkeys throwing darts can consistently outperform the S&P500, while similar studies report that the vast majority of (non-simian) active managers underperform the S&P500.[i] Burton Malkiel is widely credited with creating the monkey parable, but I believe that he uses it to underscore the (Fama) efficient markets aspects of his random walk model rather than as a promotional piece for CGAM - Curious George Asset Management LLC.
In the following brief post, I evaluate the factor weight debate, the monkey debate, and the S&P500’s position as the central equity benchmark.
Indexing - Simple Theory vs. Practice
Academic theory deals in expected returns while brokerage statements are regrettably limited to actual returns.
Claims of outright outperformance by one index over another are usually presented in the context of sterilized back-tested data and “expected returns”, when in reality a carefully created new index presents a different but not necessarily better trade. For a number of years, “equal weight” has been positioned as superior to the S&P500; in practice, equal weight is usually a bet on smaller capitalization stocks (more Russell / less DJIA). Smaller cap stocks have more return variability and expectations of higher returns: bigger expected gains, but more dispersion in outcomes. Choosing small caps over large caps is a positioning trade, and not necessarily a dominant strategy. Today, the large-cap/small-cap selection is more interesting than usual given the new administration’s moves on global trade, foreign exchange, tax policy, and labor – different stocks/different risks.
The following graphic presents a return comparison of “SPY” (the S&P500) with “EQAL” (a Russell 1000 equal weight ETF) – two tickers at the opposite ends of the weighting debate. The left graph is the classic risk/return relationship underpinning CAPM, APT, and basic portfolio selection: higher non-diversifiable risk is associated with higher expected returns. The right graph is a two-year 2015-2016 return comparison of SPY and EQAL: in the bull market of 2016, EQAL outperformed SPY by 2.5%, but in the flat market of 2015, SPY outperformed EQAL by more than 6%. Empirically, small caps tend to do disproportionately well in strong markets.
I replicated the classic monkey experiment, comparing the performance of randomly selected portfolios with the broader index. Rather than recreate the “small stocks always win” results of the typical study, I compared stocks drawn from the S&P100 (^OEX – the largest of the large) with the S&P500. By avoiding small caps in the sample portfolios, we can more accurately assess the impact of market-cap weighting without the complications of large-cap/small-cap basis. Using non-repeating random number sets, I drew 500 random equal-weighted portfolios of 10, 20, 30, 40, and 50 stocks for each of the last five years (500 simulations x 5 portfolio sizes x 5 years). In each case, the equally weighted portfolio was drawn from the S&P 100 at the beginning of the year and held throughout the year. I looked at random draw counts ranging from 100 to 1000, and settled on 500 for ease of replication in a spreadsheet and because results tended to converge after 300 draws.
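The sampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet used for the study: the function name `simulate_random_portfolios`, the `STOCK###` tickers, and the normally distributed returns are all assumptions made for the example, and the universe is synthetic rather than actual S&P 100 data.

```python
import random
import statistics

def simulate_random_portfolios(annual_returns, portfolio_size, n_draws=500, seed=0):
    """Return the annual return of each randomly drawn equal-weighted portfolio.

    annual_returns maps ticker -> that stock's total return for the year.
    Each portfolio is bought with equal weights at the start of the year and
    held, so its return is the simple average of its members' returns.
    """
    rng = random.Random(seed)
    tickers = sorted(annual_returns)
    results = []
    for _ in range(n_draws):
        # rng.sample draws without replacement: no ticker repeats in a portfolio.
        picks = rng.sample(tickers, portfolio_size)
        results.append(statistics.fmean(annual_returns[t] for t in picks))
    return results

# Synthetic stand-in for one year of 100 large-cap returns (NOT real data).
_rng = random.Random(42)
universe = {f"STOCK{i:03d}": _rng.gauss(0.10, 0.20) for i in range(100)}

for size in (10, 20, 30, 40, 50):
    draws = simulate_random_portfolios(universe, size)
    print(size, round(statistics.fmean(draws), 4), round(statistics.stdev(draws), 4))
```

Running the loop once per year of data, and comparing each year's average with the index return, reproduces the structure of the experiment. The numbers themselves depend entirely on the return data supplied.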
A hasty look at the results could be used to support the monkey myth: the average return of the randomly selected 10-stock portfolios (10 stocks selected from the OEX) outperformed the S&P500 in 4 out of 5 years. As further support, the average return of the 10-stock portfolios outperformed the S&P500 in more than 70% of the 20 quarterly periods. I would also argue that my methods were more robust than most monkey-tests: no small-cap bias, rigorously generated numbers, and all draws were performed without replacement – a challenging feature when creating large arrays of random numbers.
I would also argue that the hasty conclusion is wrong.
The following graphic looks at the distribution of returns generated from the simulations of the 10, 20, 30, and 40 stock portfolios. The two biggest takeaways from the graphic are: (i) the mean values of the different portfolios do not show material variation (deviations in most years were less than 30 basis points), but (ii) the variance of returns changes dramatically as the number of stocks selected increases from 10 to 40. The variance of sampled returns for the 30/40 share count portfolios was less than half that of the 10 stock portfolios; most of the diversification benefits are achieved at 30 stocks. Harry Markowitz (Portfolio Selection, 1952) was right.
Using results from the study, the average return of the 30/40/50 stock portfolios for 2016 was approximately 13% with a standard deviation of 2.5%: a 95% confidence lower return boundary is around 8%. By contrast, the average return of the 10 stock portfolios for 2016 was 13.25% with a standard deviation of 5.3%. Importantly, the 95% confidence lower return boundary is under 3%, about 5 points lower than for the larger portfolios.
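The lower boundaries quoted above are consistent with taking the mean return less roughly two standard deviations. The short sketch below makes that arithmetic explicit; the two-standard-deviation multiplier and the implied normal approximation are my assumptions about how the bounds were derived, and the function name `lower_return_bound` is purely illustrative.

```python
def lower_return_bound(mean_pct, stdev_pct, z=2.0):
    """Approximate lower return boundary: mean less z standard deviations.

    z=2.0 is an assumed multiplier (roughly a two-sided 95% interval under
    a normal approximation); it is not stated in the study itself.
    """
    return mean_pct - z * stdev_pct

# 2016 figures quoted in the text, in percent.
large_bound = lower_return_bound(13.0, 2.5)    # 30/40/50 stock portfolios
small_bound = lower_return_bound(13.25, 5.3)   # 10 stock portfolios

print(round(large_bound, 2))                # 8.0
print(round(small_bound, 2))                # 2.65
print(round(large_bound - small_bound, 2))  # 5.35
```

The average returns are nearly identical, so almost all of the roughly 5 point gap comes from the difference in standard deviation.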
Is There a Market Capitalization Rebalancing Effect?
The S&P500 (and other market cap indices) select and weight their constituents with reference to relative market capitalization. Such indices hold proportionately more of the largest equities, and as those large equities rise (and possibly get more expensive), the index’s exposure to them grows. Further, the index does not control for sector concentrations, valuation measures, or correlations.
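A toy example helps show how the two weighting schemes diverge. The tickers and market caps below are hypothetical and chosen only to keep the arithmetic easy to follow; the helper names `cap_weights` and `equal_weights` are likewise illustrative and do not represent any index provider’s methodology.

```python
def cap_weights(market_caps):
    """Weight each stock by its share of total market capitalization."""
    total = sum(market_caps.values())
    return {ticker: cap / total for ticker, cap in market_caps.items()}

def equal_weights(market_caps):
    """Give every stock the same weight, regardless of size."""
    n = len(market_caps)
    return {ticker: 1.0 / n for ticker in market_caps}

# Hypothetical market caps in $bn (illustrative only, not real companies).
caps = {"MEGA": 600.0, "LARGE": 200.0, "MID": 120.0, "SMALL": 80.0}

before = cap_weights(caps)
# Suppose the largest stock rallies 20% while the others are unchanged.
rallied = dict(caps, MEGA=caps["MEGA"] * 1.2)
after = cap_weights(rallied)

print(round(before["MEGA"], 4), round(after["MEGA"], 4))  # 0.6 0.6429
print(equal_weights(caps)["MEGA"])                        # 0.25
```

In the cap-weighted scheme the winner’s weight drifts up with its price, while an equal weight scheme would have to trim it back to 25% at the next rebalance. Moving between the two schemes therefore changes the stock exposure even though the ticker list is identical.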
The S&P500 gets criticized for potential over-concentration in specific stocks and sectors. More philosophical criticisms of the S&P500 (and other market capitalization indices) suggest that they exacerbate price and value moves relating to winners and losers.
The following table presents the annual return results of the 30 (randomly selected) stock portfolios and the S&P500 (^SPX) – returns include dividends. Surprisingly, the average return of the random portfolios outperformed the S&P500 in four of five years, and the average outperformance is 0.50% per annum.
Taking the analysis one step further, we compare the 30 randoms (the “winner” in the above comparison), the S&P100 (^OEX), and the S&P100 Equal Weight Index (^EWI). Consistent with a lot of colorful industry brochures, it looks like equal weight wins the day (see “EWI” returns below) – equal weight beats the S&P500, the S&P100, and even beats the monkeys.
However… And What All This Means
Indices like the S&P500 are used by the majority of retail investors and advisors to avoid the chore and risk of picking individual stocks. Unfortunately, it is effectively impossible to favor one index style over another without a corresponding and maybe inadvertent choice (or change) in stocks and sectors – I consider an AAPL weighting change from 5.0% to 0.5% a stock selection.
Looking only at the limited S&P100 universe, the market capitalization weight S&P100 (^OEX) and the equal weight S&P100 (^EWI) have the same tickers, but very different weightings in those tickers - today. S&P’s website has the complete detail, and I recreate some of the holding differentials below.
Tech occupies a 24% weighting in OEX, but only a 14% weighting in EWI, and Industrials, Utilities and Consumer Discretionary are over-represented in EWI as compared to OEX by 4%, 3%, and 2% respectively. OEX is likely to outperform or underperform largely on the basis of moves in tech names (regardless of the relative merits of market-cap or equal weight), and when all the weighting differentials are accounted for, differences in returns can get complex.
Many indexing schemes will have merit, but it is critically important to appreciate that different index schemes will probably begin with different holdings (even where the “base index” is the same), and those holdings differences are likely to drift over time with some degree of path dependency. All require an unavoidable measure of stock and sector selection.
The fundamental nature of indexing and benchmarking should occupy a dominant position in the industry’s commercial energy and fund researchers’ efforts. Anyone interested in the commercial and research aspects of this area should feel free to contact me.
1. When you choose an index (S&P500, QQQ) or an index style (Market Cap, Equal Wgt, Dividend Wgt) you are also choosing sectors and ultimately stocks – there are very few “all else held constant” investment choices in indices.
2. If you are using the S&P500 for any purpose, keep an eye on its composition and sector drift.
3. When ranking funds and managers, take care to identify fundamental differences in sector weights, stock weights and changes in both.
4. Don’t lose sight of the distinction between “expected” returns and real returns.
5. If you have monkeys throw darts, limit them to large caps and make sure that they throw at least 30 darts.
[i] See “An evaluation of alternative equity indices” by Clare, Motson, Thomas, Cass Business School, 2013