Forecasting ETFs with Machine Learning Algorithms


In this article, the authors apply cutting-edge machine learning algorithms to one of the oldest challenges in finance: predicting returns. For the sake of simplicity, they focus on predicting the direction (either up or down) of several liquid exchange-traded funds (ETFs) and do not attempt to predict the magnitude of price changes. The ETFs serve as asset class proxies. The authors employ approximately five years (from January 2011 to January 2016) of historical daily data obtained through Yahoo Finance. Using supervised learning classification algorithms readily available in Python’s scikit-learn, they apply three powerful techniques: (1) deep neural networks, (2) random forests, and (3) support vector machines (linear and radial basis function). They document the performance of the three algorithms across four information sets: past returns, past volume, dummies for days/months, and a combination of all three. They use a gain criterion to compare classifiers’ performance. First, they find that these algorithms work well over one- to three-month horizons. Short-horizon predictability, over days, is extremely difficult, and thus the results support the short-term random walk hypothesis. Second, they document the importance of cross-sectional and intertemporal volume as a powerful information set. Third, they show that many features are needed for predictability because each feature contributes only a small amount of predictive power. The authors conclude that ETF returns can be predicted with machine learning algorithms, but practitioners should incorporate prior knowledge of markets and intuition on asset class behavior.
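The setup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the prices and volumes are synthetic random data standing in for the Yahoo Finance series, the helper `build_dataset`, the lag count, the 20-day horizon, and all hyperparameters are hypothetical choices, scikit-learn's `MLPClassifier` stands in for the deep neural network, and plain out-of-sample accuracy is reported in place of the authors' gain criterion, which is not defined in this summary.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for one ETF's daily close and volume (NOT real market data).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2011-01-03", "2016-01-04")
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates)))), index=dates)
volume = pd.Series(rng.lognormal(15, 0.3, len(dates)), index=dates)


def build_dataset(close, volume, horizon=20, n_lags=5):
    """Hypothetical helper: lagged returns, lagged volume changes, and
    day-of-week dummies as features; direction of the forward return as label."""
    ret = close.pct_change()
    vol_chg = volume.pct_change()
    feats = {}
    for k in range(n_lags):
        feats[f"ret_lag{k}"] = ret.shift(k)          # information set: past returns
        feats[f"vol_lag{k}"] = vol_chg.shift(k)      # information set: past volume
    dow = pd.get_dummies(
        pd.Series(close.index.dayofweek, index=close.index), prefix="dow"
    ).astype(float)                                  # information set: calendar dummies
    fwd = close.shift(-horizon) / close - 1          # return over the next `horizon` days
    data = pd.DataFrame(feats).join(dow).assign(fwd=fwd).dropna()
    y = (data.pop("fwd") > 0).astype(int)            # 1 = up, 0 = down
    return data, y


horizon = 20
X, y = build_dataset(close, volume, horizon=horizon)

# Chronological split; drop `horizon` rows before the test set so that
# overlapping forward-return labels do not leak across the boundary.
split = int(0.7 * len(X))
X_train, y_train = X.iloc[: split - horizon], y.iloc[: split - horizon]
X_test, y_test = X.iloc[split:], y.iloc[split:]

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm_linear": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "neural_net": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0),
    ),
}

accuracy = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
for name, acc in accuracy.items():
    print(f"{name}: out-of-sample accuracy = {acc:.3f}")
```

Because the inputs here are a random walk, the accuracies should hover near chance; the point of the sketch is the shape of the pipeline (features by information set, a directional label at a chosen horizon, a time-ordered split), not the numbers it prints.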


In this work, we examined the ability of three popular machine learning algorithms to predict ETF returns. Although we restricted our initial analysis to the direction of future price movements, we still obtained valuable results. First, machine learning algorithms do a good job of predicting price changes at the 10- to 60-day horizon. Not surprisingly, these algorithms fail to predict returns on short-term horizons of five days or less. We introduced a gain measure to help assess efficacy across algorithms and horizons. We also segmented our input features into different information sets so as to cast our research in the framework of the efficient markets hypothesis. We find that the volume information set (B) works extremely well across our three algorithms. Moreover, we find that the most important predictive features vary depending on the ETF being predicted. Financial intuition helps us to understand the prediction variables: the complex relationships embedded in the S&P 500, as proxied by SPY, require a more diverse set of features than the top feature set needed to explain GLD or OIH.
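One way to inspect which features matter for a given ETF is to rank a random forest's impurity-based feature importances. The sketch below is an assumption-laden illustration rather than the procedure used in the article: the price panel is synthetic, the helper `top_features`, the trailing-return lookbacks, and the 20-day horizon are hypothetical, and the importances are computed in-sample.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic random-walk prices for three tickers (NOT real market data).
rng = np.random.default_rng(1)
dates = pd.bdate_range("2011-01-03", "2016-01-04")
tickers = ["SPY", "GLD", "OIH"]
steps = rng.normal(0, 0.01, (len(dates), len(tickers)))
prices = pd.DataFrame(100 * np.exp(np.cumsum(steps, axis=0)), index=dates, columns=tickers)


def top_features(prices, target, horizon=20, lookbacks=(5, 20, 60), top_n=3):
    """Hypothetical helper: rank cross-sectional trailing-return features by
    random-forest importance for predicting the target's forward direction."""
    # Features: trailing k-day returns of every ticker in the panel.
    feats = pd.concat(
        {f"{t}_ret{k}": prices[t].pct_change(k) for t in prices.columns for k in lookbacks},
        axis=1,
    )
    fwd = prices[target].shift(-horizon) / prices[target] - 1
    data = feats.assign(fwd=fwd).dropna()
    y = (data.pop("fwd") > 0).astype(int)  # 1 = up, 0 = down
    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(data, y)
    importances = pd.Series(forest.feature_importances_, index=data.columns)
    return importances.sort_values(ascending=False).head(top_n)


ranking = {t: top_features(prices, t) for t in tickers}
for t, top in ranking.items():
    print(t, list(top.index))
```

On real data, comparing the ranked lists across targets is one simple way to see whether, say, SPY draws on a broader mix of features than GLD or OIH; on the synthetic data used here the rankings carry no economic meaning.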

In practice, the information set could be vastly extended to include other important features, such as social media, along the lines of Liew and Budavari [2017], who identified the social media factor. Additionally, the forecasting time horizons could have been extended even further beyond one trading year or shortened to examine intraday performance. However, we leave this more ambitious research agenda to future work.

One interesting application is to use several different horizon models launched at staggered times within a day, thereby gaining slight diversification benefits for the resultant portfolio of strategies.

In sum, we hope that our application of machine learning algorithms motivates others to move this body of knowledge forward. These algorithms possess great potential in their applications to the many problems in finance.


