High-frequency factor models and regressions

https://doi.org/10.1016/j.jeconom.2020.01.007

Abstract

We consider a nonparametric time series regression model. Our framework allows precise estimation of betas without the usual assumption of betas being piecewise constant. This property makes our framework particularly suitable to study individual stocks. We provide an inference framework for all components of the model, including idiosyncratic volatility and idiosyncratic jumps. Our empirical analysis investigates the largest dataset in the high-frequency literature. First, we use all traded stocks from NYSE, AMEX, and NASDAQ stock markets for 1996–2017 to construct the five Fama–French factors and the momentum factor at the 5-minute frequency. Second, we document the key empirical properties across all the stocks and the new factors, and apply the nonparametric time series regression model with the new high-frequency Fama–French factors. We find that this factor model is effective in explaining the systematic component of the risk of individual stocks. In addition, we provide evidence that idiosyncratic jumps are related to idiosyncratic events such as earnings disappointments.

Introduction

Time series regressions of asset returns on Fama–French factors are commonly used in financial economics. This regression model is often estimated using rolling windows of short intervals of time such as one month, due to concerns over the time variation in the factor betas, and other potential forms of nonstationarity of the model. However, when relatively low-frequency data such as daily data is used, the monthly estimates can be noisy. The problem of noisy estimates due to a small number of time series observations can be somewhat alleviated by some additional averaging, for instance by forming portfolios of stocks with similarly estimated factor betas.

An alternative is to use high-frequency data. Because the factor betas are effectively covariance-like quantities, they benefit from the additional data collected at high frequency. Using relatively short time windows, we can decompose the idiosyncratic risk, which is usually measured as the variation of the stock returns once the impact of the common factors has been removed, into its continuous and jump components. Our model can be thought of as a nonparametric continuous-time regression model. The framework we employ allows for general time variation in factor betas, and so is particularly suitable to study individual stocks. We provide the asymptotic theory for the estimators of time-varying betas and each of the idiosyncratic risk components.

Empirically, we perform a much larger scale analysis than what has been done so far in the high-frequency literature. We use all traded stocks from NYSE, AMEX, and NASDAQ stock markets during the period 1996–2017 to construct high-frequency versions of the Fama–French factors (see Fama and French, 1992, 1993, 2015) and the momentum factor (see Carhart, 1997) at the 5-min frequency. We document the key empirical properties across all the stocks and the new factors. We then investigate the potential of the new high-frequency factors in explaining the individual stock returns. Finally, we study the behavior of the individual stock betas and the components of the idiosyncratic risk measures.

An important preliminary step in our empirical analysis is the study of the appropriate sampling frequency for individual stocks. Given the large number of stocks in the sample (5005 on average), the liquidity naturally varies widely across stocks; it also changes over time. For every stock and every month, we select a sampling frequency where the stock is liquid enough and its return does not have significant market microstructure effects. For this purpose, we use the Hausman test of Aït-Sahalia and Xiu (2019a) to check for the absence of statistically significant market microstructure noise at a given frequency, and also take into account the number of zero returns at any potential frequency. We choose between 5-min, 10-min, 30-min, and daily frequencies. We document the variation across stocks and over time of the selected frequencies. Our procedure results in a clear increase towards higher frequencies over time, with the steepest increases following the decimalization in 2001. In the last ten years, some intraday frequency is appropriate for the majority of stocks.
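The zero-return part of this selection can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: it omits the Hausman test entirely, ignores overnight gaps, and the function names and the 25% threshold (`max_zero_share`) are assumptions made only for illustration.

```python
import numpy as np

def zero_return_share(log_prices, step):
    """Share of exactly-zero returns when sampling every `step`-th 5-min price."""
    sampled = np.asarray(log_prices, dtype=float)[::step]
    returns = np.diff(sampled)
    return float(np.mean(returns == 0.0))

def pick_frequency(log_prices_5min, max_zero_share=0.25):
    """Return the finest candidate frequency whose zero-return share is acceptable.

    The threshold is a hypothetical choice, not a value taken from the paper.
    """
    candidates = {"5-min": 1, "10-min": 2, "30-min": 6}  # steps on a 5-min grid
    for label, step in candidates.items():
        if zero_return_share(log_prices_5min, step) <= max_zero_share:
            return label
    return "daily"  # fall back to the coarsest frequency

# Simulated illustration: a liquid stock and one whose price updates every 30 minutes
rng = np.random.default_rng(2)
liquid = np.cumsum(0.001 * rng.standard_normal(780))
stale = np.repeat(np.cumsum(0.001 * rng.standard_normal(130)), 6)
choice_liquid = pick_frequency(liquid)
choice_stale = pick_frequency(stale)
```

In this toy example the stale series has many zero returns at 5 and 10 minutes, so the rule moves it to a coarser grid, which is the qualitative behavior the paragraph above describes.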

Our further empirical findings can be summarized as follows. The estimated high-frequency betas are similar to the standard betas calculated at daily frequency, but the former are more precisely estimated and are more stable across time, especially towards the end of the sample. The additional high-frequency factors are helpful in explaining additional time-series variation in stock returns compared to the high-frequency market factor alone. The idiosyncratic risk estimates at high frequency and daily frequency are also comparable, though again the former seem to be more accurate towards the end of our sample. Finally, we decompose the idiosyncratic risk into idiosyncratic volatility and idiosyncratic jump contributions, and we find that earnings surprises increase idiosyncratic jumps. Moreover, earnings disappointments have a larger effect on the idiosyncratic jumps than do positive earnings surprises. From the perspective of estimating the continuous component of the model using discrete data, jumps can be thought of as outliers in the same spirit as Box and Tiao (1968) and Chang et al. (1988).

The literature on nonparametric regressions at high frequency is closely related. A realized beta estimator, constructed as the ratio of realized covariance to realized variance, was proposed in Barndorff-Nielsen and Shephard (2004) and Andersen et al. (2005). These papers do not allow for jumps, and the implicit regression model has constant betas over the time interval considered, such as a week or a month. When estimating the model on moving windows, the assumption is effectively that of piecewise constant betas. When jumps are also allowed, realized beta still estimates a meaningful quantity, provided the continuous and jump components of factors are constrained to have the same effect on the stock return. Todorov and Bollerslev (2010) maintain the piecewise constant betas assumption, but allow the continuous and jump betas to differ. Bollerslev et al. (2016) consider a closely related model. A regression relationship between jumps alone is considered by Li et al. (2017), who also assume piecewise constant betas.
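For concreteness, the realized beta mentioned above (realized covariance divided by realized variance over one window) can be sketched in Python as follows. The function name and the simulated returns are assumptions for illustration; the sketch covers a single factor, assumes a constant beta within the window, and makes no adjustment for jumps.

```python
import numpy as np

def realized_beta(stock_returns, factor_returns):
    """Realized beta: realized covariance divided by realized variance."""
    y = np.asarray(stock_returns, dtype=float)
    x = np.asarray(factor_returns, dtype=float)
    realized_cov = np.sum(x * y)  # sum of cross-products of intraday returns
    realized_var = np.sum(x * x)  # sum of squared factor returns
    return realized_cov / realized_var

# Simulated month: 21 trading days of 78 five-minute returns, true beta 1.2
rng = np.random.default_rng(0)
n = 21 * 78
factor = 0.001 * rng.standard_normal(n)
stock = 1.2 * factor + 0.0005 * rng.standard_normal(n)
beta_hat = realized_beta(stock, factor)
```

With a constant beta this estimator is an OLS slope without intercept, which is why a time-varying beta calls for the more general framework discussed next.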

By contrast, we allow for general time variation in betas. Nonparametric time series regression models with time-varying betas have been previously considered by Mykland and Zhang (2006) and Reiß et al. (2015). These papers assume there are no jumps and the regressor is scalar. Jacod and Rosenbaum (2013) develop a general inference framework that is useful to study the continuous components of more general regressions with time-varying betas, see also Li et al. (2016), Kalnina and Tewou (2017) and Kalnina and Xiu (2017). Neither these nor other results in the literature permit inference on the idiosyncratic jump risk, which we develop.

Our framework is more general than nonparametric jump-diffusions (see, e.g., Aït-Sahalia et al., 2009, Ang and Kristensen, 2012, and Bandi and Phillips, 2003), since we work with Itô semimartingales. We make no restrictions on the continuous or jump leverage effects, we allow for jumps in levels and volatilities of our processes, and we allow for general time-variations in factor betas. As a result, our betas may well depend on firm characteristics or macroeconomic variables, a specification that is popular in empirical finance, except we do not need to know what those variables are, as long as they satisfy some weak regularity assumptions.

The paper is organized as follows. Section 2 presents the continuous-time regression model and lists our assumptions. Section 3 outlines our identification strategy and presents the estimators. Section 4 presents their asymptotic properties. Section 5 provides Monte Carlo simulation evidence. Section 6 gives the details on the construction of the high-frequency factors, and presents the empirical results. Section 7 concludes. The appendix contains the proofs.

Section snippets

The model

We consider the following nonparametric time series regression model,
$$Y_t = Y_0 + \int_0^t \beta_s^\top \, dX_s^c + \sum_{0 < s \le t} \tilde{\beta}_s^\top \Delta X_s + Z_t, \qquad (1)$$
where $Y$ is the dependent process, $X$ is a $d$-dimensional multivariate covariate process, and $Z$ is the residual process. In the above, $X^c$ denotes the continuous component of $X$, and $\Delta X_s$ denotes its jump (if any) at time $s$. $\beta_t$ and $\tilde{\beta}_t$ are the factor loadings with respect to the continuous and the jump parts of $X$.

This model, cast in continuous time, is analogous to the discrete-time factor model

Econometric strategy

The strategy for estimation is conceptually simple. Similarly to the standard factor model with observable factors, we use OLS-type regressions for the continuous components of the factor model. However, due to the time variation in βs and γs, we have to run regressions using data from a moving window, and then aggregate the spot estimates to obtain estimates for Iβ and IdV. On the other hand, the estimation of the jump idiosyncratic risk IdJ requires estimation of various jump times and sizes,
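The continuous-part strategy (local regressions aggregated over time) can be sketched in Python as follows. This is a simplified illustration rather than the estimator defined in the paper: it uses non-overlapping blocks instead of a moving window, applies no bias correction, and does not estimate the jump component. The function name `integrated_beta`, its arguments, and the simulated parameters are assumptions.

```python
import numpy as np

def integrated_beta(y, X, k_n, u_n, dt):
    """Aggregate block-wise OLS betas into an integrated-beta estimate.

    y: (n,) stock returns; X: (n, d) factor returns; k_n: block length;
    u_n: truncation level; dt: time length of one return interval.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    total = np.zeros(d)
    for start in range(0, n - k_n + 1, k_n):
        yb = y[start:start + k_n]
        Xb = X[start:start + k_n]
        # Treat large increments as jumps and drop them from the local regression
        keep = (np.abs(yb) <= u_n) & (np.linalg.norm(Xb, axis=1) <= u_n)
        if keep.sum() < d:
            continue  # too few usable observations; this block is skipped
        spot_beta, *_ = np.linalg.lstsq(Xb[keep], yb[keep], rcond=None)
        total += spot_beta * k_n * dt  # spot estimate times block duration
    return total

# Simulated illustration: one factor, beta varying smoothly around 1 on [0, 1]
rng = np.random.default_rng(1)
n, k_n = 21 * 78, 78
dt = 1.0 / n
t = np.arange(n) * dt
beta_path = 1.0 + 0.5 * np.sin(2 * np.pi * t)  # integrates to 1 over [0, 1]
x = 0.2 * np.sqrt(dt) * rng.standard_normal(n)
y = beta_path * x + 0.1 * np.sqrt(dt) * rng.standard_normal(n)
y[[200, 900, 1500]] += 0.08                     # idiosyncratic jumps
u_n = 5 * 0.2 * dt ** 0.47                      # threshold uses the known volatility 0.2
ib_hat = integrated_beta(y, x[:, None], k_n, u_n, dt)
```

In practice the volatility entering the threshold would have to be estimated, and the block length and threshold would need to satisfy rate conditions such as those required by the asymptotic theory.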

The asymptotic distributions

This section provides the asymptotic distributions of our estimators, which are useful for constructing the confidence intervals.

Theorem 2

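Suppose Assumption 1, Assumption 2, Assumption 3 hold. In addition, $k_n \asymp \Delta_n^{-\varsigma}$ and $u_n \asymp \Delta_n^{\varpi}$ for some $\varsigma \in \left(\frac{r}{2} \vee \frac{1}{3}, \frac{1}{2}\right)$ and $\varpi \in \left[\frac{1-\varsigma}{2-r}, \frac{1}{2}\right)$. As $\Delta_n \to 0$, we have
$$\Delta_n^{-1/2}\left(\widehat{I\beta}_t - I\beta_t\right) \xrightarrow{\mathcal{L}\text{-}s} W_t^{\beta} \quad \text{and} \quad \Delta_n^{-1/2}\left(\widehat{IdV}_t - IdV_t\right) \xrightarrow{\mathcal{L}\text{-}s} W_t^{\gamma},$$
where $W^{\beta}$ and $W^{\gamma}$ are processes defined on the extension of the original space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, \mathbb{P})$, which, conditionally on $\mathcal{F}$, are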

Monte Carlo simulations

We now examine the finite sample performance of our estimators. For this purpose, we simulate many trajectories over one month for one stock with log-price Yt following a three-factor model, which is a special case of (1).

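The factors $X_t = (X_{1t}, X_{2t}, X_{3t})$ follow the dynamics
$$\begin{pmatrix} dX_{1t} \\ dX_{2t} \\ dX_{3t} \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} dt + \begin{pmatrix} \sigma_{1t} & 0 & 0 \\ 0 & \sigma_{2t} & 0 \\ 0 & 0 & \sigma_{3t} \end{pmatrix} \begin{pmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{13} & \rho_{23} & 1 \end{pmatrix} \begin{pmatrix} dW_{1t} \\ dW_{2t} \\ dW_{3t} \end{pmatrix} + \begin{pmatrix} J_{1t} \\ J_{2t} \\ J_{3t} \end{pmatrix} dN_t.$$
The factor volatilities are driven by Feller’s square root process (a.k.a. the Cox–Ingersoll–Ross model) with the same process $N_t$, $d\sigma_{it}^2 = \tilde{\kappa}_i(\tilde{\alpha}_i - \sigma_{it}$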

Construction of intraday equity factors

We reconstruct the five Fama–French factors (see Fama and French, 1993 and Fama and French, 2015) and the momentum factor (see Carhart, 1997) at the 5-min frequency from January 1, 1996 to December 31, 2017. The construction takes a few steps and requires a combination of three databases. We describe the details below.

Since the daily portfolio constituents for the five Fama–French factors are not publicly available, we start by replicating these factors at the daily frequency to obtain the

Conclusion

This paper shows how to identify and estimate, using high-frequency data, a nonparametric Fama–French factor model under broad assumptions on the data-generating process. We allow for general time-variation in the factor beta processes, which makes our framework particularly suitable for application to individual stocks. The definitions of the estimators are straightforward, but important technical difficulties associated with the classification of jumps into systematic and idiosyncratic

References (27)

  • Aït-Sahalia, Y., et al. (2014). High Frequency Financial Econometrics.
  • Aït-Sahalia, Y., et al. (2019). Principal component analysis of high frequency data. J. Amer. Statist. Assoc.
  • Andersen, T.G., et al. (2005). A framework for exploring the macroeconomic determinants of systematic risk. Amer. Econ. Rev.

    The paper previously circulated under the title “The Idiosyncratic Volatility Puzzle: a Reassessment at High Frequency.” We are grateful to two referees and the Editor for very helpful comments. We thank seminar and conference participants at Duke University, the Stevanovich Center for Financial Mathematics at the University of Chicago, Measuring and Modeling Financial Risk with High Frequency Data in Florence, Econometric Study Group Conference in Bristol, Financial Statistics Conference in Chicago, Canadian Econometric Study Group Conference, Conference on High-Frequency Financial Data in Montreal, Time Series and Financial Econometrics Conference in Montreal, Princeton-QUT-SJTU-SMU econometrics conference, and Financial Econometrics Conference in Toulouse.
