Sunday, June 25, 2017

Instrumental Variables & the Frisch-Waugh-Lovell Theorem

The so-called Frisch-Waugh-Lovell (FWL) Theorem is a standard result that we meet in pretty much any introductory grad. course in econometrics.

The theorem is so-named because (i) in the very first volume of Econometrica, Frisch and Waugh (1933) established it in the particular context of "de-trending" time-series data; and (ii) Lovell (1963) demonstrated that the same result establishes the equivalence of "seasonally adjusting" time-series data (in a particular way), and including seasonal dummy variables in an OLS regression model. (Also, see Lovell, 2008.)

We'll take a look at the statement of the FWL Theorem in a moment. First, though, it's important to note that it's purely an algebraic/geometric result. Although it arises in the context of regression analysis, it has no statistical content, per se.

What's not generally recognized, however, is that the FWL Theorem doesn't rely on the geometry of OLS. In fact, it relies on the geometry of the Instrumental Variables (IV) estimator - of which OLS is a special case, of course. (OLS is just IV in the just-identified case, with the regressors being used as their own instruments.)
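The remark that OLS is IV with the regressors acting as their own instruments is easy to check numerically. Here's a minimal sketch, assuming NumPy and simulated data; the names X, y and Z are purely illustrative, and the just-identified IV estimator is taken to be (Z'X)^{-1}Z'y.

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 100, 3

X = rng.normal(size=(n, k))   # simulated regressors (illustrative only)
y = rng.normal(size=n)        # simulated dependent variable

# OLS via least squares
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Just-identified IV, (Z'X)^{-1} Z'y, with the regressors as their own instruments
Z = X
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)

print(np.allclose(b_ols, b_iv))
```

With Z = X the IV normal equations are just the OLS normal equations, so the two vectors should agree up to rounding error.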

Implicitly, this was shown in an old paper of mine (Giles, 1984) where I extended Lovell's analysis to the context of IV estimation. However, in that paper I didn't spell out the generality of the FWL-IV result.

Let's take a look at all of this.
First, what does the FWL Theorem actually tell us?

Suppose that we have a linear multiple regression model of the following form:

               y = X1β + X2γ + ε     ,                                                            (1)

where there are k1 columns in the X1 regressor sub-matrix, and k2 columns in the X2 regressor sub-matrix. Any or all of these regressors may be random; and both X1 and X2 have full column rank. We don't need to assume anything about the error term, ε. (Remember - the FWL Theorem has nothing to do with the statistical characteristics of our model, or estimators.)

Now, consider two alternative ways of estimating β:

(i)   Just apply OLS to the full model, (1). This yields the following estimator of β -

       b = (X1'M2X1)^{-1} X1'M2y,

where    M2 = I - X2(X2'X2)^{-1} X2'.

(ii)  Proceed in two stages. First, regress y on X2 using OLS, and obtain the prediction vector, y*, and the residual vector, e* = (y - y*) = M2y. Also, regress (each column of) X1 on X2 using OLS, and get the prediction matrix, X1*, and the corresponding matrix of "residuals", E1* = (X1 - X1*) = M2X1. Then, regress e* on E1*, using OLS. This yields the following estimator of β -

      β* = (E1*'E1*)^{-1} E1*'e*.

The FWL Theorem is simply the result that  β* = b.

(You can check this easily enough if this isn't a result that's familiar to you already.)
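One way to do that check is numerically. The following is a minimal sketch, assuming NumPy and simulated data; the variable names and dimensions are illustrative. Note that y is drawn independently of the regressors, which is fine because the result is purely algebraic.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k1, k2 = 200, 2, 3

X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2))
y = rng.normal(size=n)   # no model is assumed for y

# (i) OLS on the full model; the first k1 coefficients estimate beta
X = np.hstack([X1, X2])
b = np.linalg.lstsq(X, y, rcond=None)[0][:k1]

# (ii) Partial X2 out of y and X1, then regress residuals on residuals
M2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
e_star = M2 @ y      # residuals from regressing y on X2
E1_star = M2 @ X1    # residuals from regressing each column of X1 on X2
beta_star = np.linalg.lstsq(E1_star, e_star, rcond=None)[0]

print(np.allclose(b, beta_star))
```

The two estimates should agree up to floating-point rounding, whatever seed or dimensions are used (provided X1 and X2 have full column rank).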

Applications of this result include the following:

(a)  If X2 is a matrix of seasonal dummy variables, then we see that using such regressors is equivalent to "seasonally adjusting" y and the columns of X1 by regressing these variables on the dummies. (One criticism of this is that this sort of seasonal adjustment is very crude, as it pays no attention to the trend and cyclical components of the data.)

(b)  If X2 has just one column, and that variable is a linear time-trend variable, then its inclusion is equivalent to "de-trending" all of the other regressors, and y in the same way.

(c)  If X2 has just one column, and that column is the intercept, then its inclusion is equivalent to expressing y and the columns of X1 in terms of deviations about their sample means, and then estimating the model without the intercept.
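Case (c) is the easiest of these to see in action. Here's a small sketch, again assuming NumPy and simulated data with illustrative names, that compares the slope estimates from a regression with an intercept against those from a no-intercept regression on mean-deviations.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k1 = 100, 2

X1 = rng.normal(size=(n, k1)) + 5.0   # regressors with non-zero means
y = rng.normal(size=n) + 3.0

# Regression including an intercept (here X2 is a column of ones)
X = np.hstack([X1, np.ones((n, 1))])
b_with_const = np.linalg.lstsq(X, y, rcond=None)[0][:k1]

# Same data in deviations about sample means, with no intercept
X1_dev = X1 - X1.mean(axis=0)
y_dev = y - y.mean()
b_dev = np.linalg.lstsq(X1_dev, y_dev, rcond=None)[0]

print(np.allclose(b_with_const, b_dev))
```

Regressing on a column of ones just produces the sample mean as the fitted value, so the "residuals" in the two-stage approach are exactly the mean-deviations.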

Actually, there are some other related results that are often overlooked. For instance, we get the same estimator of β if we regress y itself on E1*. Have a play around with some of the various combinations that come to mind.
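As a starting point for that sort of experimenting, here's a sketch (NumPy, simulated data, illustrative names) of the variation just mentioned: regressing the unadjusted y on E1*. It works because M2 is symmetric and idempotent, so E1*'y = X1'M2y = E1*'e*.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k1, k2 = 150, 2, 2

X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2))
y = rng.normal(size=n)

M2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)
E1_star = M2 @ X1

# Full-model OLS estimate of beta, for reference
b = np.linalg.lstsq(np.hstack([X1, X2]), y, rcond=None)[0][:k1]

# Regress the *unadjusted* y on the residual matrix E1*
b_alt = np.linalg.lstsq(E1_star, y, rcond=None)[0]

print(np.allclose(b, b_alt))
```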

It should be noted that Fiebig and Bartels (1996) extended the FWL Theorem to the situation where we use GLS, rather than OLS. However, what we're concerned with is showing how the above results generalize quite simply from OLS estimation to IV estimation. This isn't something that you'll find in your typical textbook.

If any or all of the regressors in (1) are random, and potentially correlated with the error term, then OLS will be an inconsistent estimator, and instead we'd undoubtedly consider using an IV estimator. Let Z1 and Z2 be matrices of instruments for the columns of X1 and X2. We'll limit our discussion to the just-identified case, where the numbers of instruments and regressors are equal. In practice, we'd want the instruments to be "legitimate" and "non-weak", but neither of these properties is required for the following algebraic results to hold.

Once again, consider two possible estimators of β in (1).

(i)  Just apply IV estimation to the full model, (1). This yields the following estimator of β:

       bIV = (Z1'Q2X1)^{-1} Z1'Q2y,

where    Q2 = I - X2(Z2'X2)^{-1} Z2'.

(ii)  Proceed in two stages. First, regress y on X2 using Z2 as the instrument matrix, and obtain the prediction vector, yIV*, and the residual vector, eIV* = (y - yIV*) = Q2y. Also, regress (each column of) X1 on X2 again using IV with Z2 as the instrument matrix, and get the prediction matrix, X1IV*, and the "residual matrix", E1IV* = (X1 - X1IV*) = Q2X1. Then, regress eIV* on E1IV* using IV with Z1 as the instrument matrix. This yields the following estimator of β:

      β*IV = (Z1'E1IV*)^{-1} Z1'eIV*.

The IV version of the FWL Theorem is simply the result that  β*IV = bIV.

You'll find a proof of this, for the particular scenario mentioned in (a) above, in Giles (1984). It's a simple matter to generalize that proof to the more general situation just discussed!
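Short of working through the proof, the IV version can also be checked numerically. What follows is a minimal sketch, assuming NumPy and simulated data in the just-identified case. The instruments are constructed (as an assumption, purely for convenience) as noisy copies of the regressors so that Z'X is well-conditioned; nothing here requires them to be valid instruments in the statistical sense.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k1, k2 = 200, 2, 3

X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2))
Z1 = X1 + rng.normal(size=(n, k1))   # illustrative instruments for X1
Z2 = X2 + rng.normal(size=(n, k2))   # illustrative instruments for X2
y = rng.normal(size=n)

# (i) IV on the full model: (Z'X)^{-1} Z'y, keeping the first k1 coefficients
X = np.hstack([X1, X2])
Z = np.hstack([Z1, Z2])
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)[:k1]

# (ii) Two stages: IV "residuals" from X2 (instruments Z2), then IV with Z1
Q2 = np.eye(n) - X2 @ np.linalg.solve(Z2.T @ X2, Z2.T)
e_iv = Q2 @ y      # eIV*
E1_iv = Q2 @ X1    # E1IV*
beta_star_iv = np.linalg.solve(Z1.T @ E1_iv, Z1.T @ e_iv)

print(np.allclose(b_iv, beta_star_iv))
```

Unlike M2, the matrix Q2 is idempotent but not symmetric, which is why the order of X2 and Z2 in its definition matters.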

In fact, my earlier paper establishes the equivalence of thirteen variations of the IV estimator, and similar extensions hold for the more general modelling problem discussed here. For instance, check to see what happens if you regress y on E1IV* and X2, using IV with Z1 and Z2 as the instrument matrices.
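Here's one way to run that particular check, as a sketch under the same assumptions as before (NumPy, simulated data, illustrative instruments built as noisy copies of the regressors). Because Z2'Q2 = 0, the instrument-regressor cross-product matrix is block triangular, and the coefficients on E1IV* reproduce bIV, while the coefficients on X2 are those from an IV regression of y on X2 alone.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k1, k2 = 200, 2, 3

X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2))
Z1 = X1 + rng.normal(size=(n, k1))   # illustrative instruments
Z2 = X2 + rng.normal(size=(n, k2))
y = rng.normal(size=n)

Q2 = np.eye(n) - X2 @ np.linalg.solve(Z2.T @ X2, Z2.T)
b_iv = np.linalg.solve(Z1.T @ Q2 @ X1, Z1.T @ Q2 @ y)

# IV regression of y on [E1IV*, X2] with instruments [Z1, Z2]
W = np.hstack([Q2 @ X1, X2])
Z = np.hstack([Z1, Z2])
coef = np.linalg.solve(Z.T @ W, Z.T @ y)
coef_E1, coef_X2 = coef[:k1], coef[k1:]

# IV regression of y on X2 alone, with instruments Z2
g_short = np.linalg.solve(Z2.T @ X2, Z2.T @ y)

print(np.allclose(coef_E1, b_iv), np.allclose(coef_X2, g_short))
```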

Extending all of this to the over-identified case is a bit more challenging, but similar results can be established.

The bottom line: The FWL Theorem is stated in terms of OLS estimation, but this is actually somewhat misleading because it holds in the context of IV estimation (of which OLS is just a special case).


Fiebig, D. G. and R. Bartels (1996). The Frisch-Waugh theorem and generalized least squares. Econometric Reviews, 15, 431-443.
Frisch, R. and F. V. Waugh (1933). Partial time regression as compared with individual trends. Econometrica, 1, 387-401.
Giles, D. E. (1984). Instrumental variables regressions involving seasonal data. Economics Letters, 14, 339-343. (Free download.)
Lovell, M. C. (1963). Seasonal adjustment of economic time series. Journal of the American Statistical Association, 58, 993-1010.
Lovell, M. C. (2008). A simple proof of the FWL (Frisch, Waugh, Lovell) theorem. Journal of Economic Education, 39, 88-91.

© 2017, David E. Giles


  1. I could be wrong, but intuitively I think in the first (ii) above it should be regressing e* on E_1*, and therefore it should be e* in the equation for beta* (instead of y*). My intuition comes from the case where X_2 is uncorrelated with y but X_1 is correlated with y. In that case, y* is zero in expectation, but b is nonzero in expectation.

    1. Thanks Andy - 2 typos. Now fixed. DG