RATS 11.1

Collinearity (or multicollinearity—the two terms are synonyms) describes a situation where a set of variables (such as regressors) has an exact or almost exact linear relationship. If the columns of the \(\bf{X}\) matrix in a regression are exactly (or perfectly) collinear, the \({\bf{X'X}}\) formed from them will not be invertible, and if they are nearly collinear, it may be difficult to invert the matrix using standard methods on a standard computer because of precision issues. Perfect collinearity and near collinearity have very different sources and require very different approaches, so we will split the topic into two parts.
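
To see why exact collinearity rules out inversion: if the columns of \(\bf{X}\) satisfy \({\bf{X}}c = 0\) for some non-zero vector \(c\), then \({\bf{X'X}}c = 0\) as well, so \({\bf{X'X}}\) has a zero eigenvalue and cannot be inverted.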

Perfect Collinearity

In most cases, there is a relatively simple workaround for perfect collinearity—when the inversion routine detects that there is collinearity among the first \(K\) variables, it zeros out row and column \(K\) and just continues on. In effect, this removes the variable that (in the order the regressors were included in the regression) completes the collinear set. For instance, suppose we fall into the dummy variable trap and include CONSTANT and both the MALE and FEMALE dummies in a regression (rather than either the two dummies alone or CONSTANT plus one of them):
 

linreg wage

# constant male female

 

LINREG will go ahead with that and produce the following output:

 

Linear Regression - Estimation by Least Squares

Dependent Variable WAGE

Usable Observations                      3294

Degrees of Freedom                       3292

Centered R^2                        0.0317459

R-Bar^2                             0.0314517

Uncentered R^2                      0.7639932

Mean of Dependent Variable       5.7575850178

Std Error of Dependent Variable  3.2691857840

Standard Error of Estimate       3.2173642756

Sum of Squared Residuals         34076.917047

Regression F(1,3292)                 107.9338

Significance Level of F             0.0000000

Log Likelihood                     -8522.2280

Durbin-Watson Statistic                1.8662

 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  Constant                     5.1469238679 0.0812248211     63.36639  0.00000000

2.  MALE                         1.1660972915 0.1122421588     10.38912  0.00000000

3.  FEMALE                       0.0000000000 0.0000000000      0.00000  0.00000000

 

This is exactly the result you would get if you had (properly) left the FEMALE dummy out of the regression. If you ordered the regressors as CONSTANT FEMALE MALE, you would get a FEMALE coefficient with MALE zeroed out instead. (Note that the degrees of freedom are computed by subtracting only two regressors, not three.)
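
To illustrate, running the same regression with the order of the dummies swapped shifts the zeroed-out coefficient from FEMALE to MALE:

linreg wage
# constant female male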
 

The most common source of perfect collinearity is a regressor which is zero throughout the sample being used. While this can be due to using a subsample over which a variable (usually a dummy) is zero, it occurs more frequently in non-linear estimation when the PARMSET includes a variable which doesn't appear in the function being optimized. Here, the DELTA variable is mistakenly included on the NONLIN instruction even though it doesn't appear in the NLCONST FRML:

 

nonlin beta0 delta beta1 beta2

frml nlconst cons = beta0+beta1*inc^beta2

*

linreg cons

# constant inc

compute beta0=%beta(1),beta1=%beta(2),beta2=1.0

nlls(frml=nlconst)

 

Again, the unnecessary variable shows up in the output with a zero coefficient and zero standard error.

 

Nonlinear Least Squares - Estimation by Gauss-Newton

Convergence in    26 Iterations. Final criterion was  0.0000008 <=  0.0000100

 

Dependent Variable CONS

Quarterly Data From 1960:01 To 2009:04

Usable Observations                       200

Degrees of Freedom                        197

Centered R^2                        0.9987629

R-Bar^2                             0.9987503

Uncentered R^2                      0.9997774

Mean of Dependent Variable       4906.7400000

Std Error of Dependent Variable  2304.4552354

Standard Error of Estimate         81.4634087

Sum of Squared Residuals         1307348.5306

Regression F(2,197)                79523.7556

Significance Level of F             0.0000000

Log Likelihood                     -1162.3071

Durbin-Watson Statistic                0.4081

 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  BETA0                        299.01962998  48.85249050      6.12087  0.00000000

2.  DELTA                          0.00000000   0.00000000      0.00000  0.00000000

3.  BETA1                          0.28931304   0.03433131      8.42709  0.00000000

4.  BETA2                          1.12408725   0.01243615     90.38867  0.00000000

Sometimes a regression with perfect collinearity is intentional, allowing a common regressor or parameter list to be shared by two models which use slightly different free parameters. For instance, in this second case, you might have a related model in which DELTA does appear.
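
A minimal sketch of that idea (the NLCONST2 formula and the GOVT series are hypothetical, included only to show two models sharing one NONLIN parameter list):

nonlin beta0 delta beta1 beta2
frml nlconst  cons = beta0+beta1*inc^beta2
frml nlconst2 cons = beta0+delta*govt+beta1*inc^beta2
*
nlls(frml=nlconst)
nlls(frml=nlconst2)

In the first NLLS, DELTA is reported with a zero coefficient and zero standard error; in the second, it is a free parameter.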

 

Perfect collinearity isn't always easily ignored. In a multivariate regression (as done with SUR or NLSYSTEM), most estimation methods depend upon an estimate of the covariance matrix of the residuals. If the number of equations is larger than the number of usable time periods, an unrestricted estimate of the covariance matrix has to be singular (an \(N \times N\) matrix built from fewer than \(N\) outer products of residual vectors has rank less than \(N\)), and there is no mechanical way to deal with that because the inverse of the matrix is needed to properly weight the information from the different equations. Even if the number of equations is only slightly smaller than the number of usable time periods, the unrestricted covariance matrix can still be singular because the residuals have fewer degrees of freedom (that is, fewer pieces of independent information) than the original data. The only solutions to the singularity problem are:

1. Use fewer equations.

2. Use more data points. Note that the only observations used in one of these regressions are ones for which all the equations have data, so if you have some series with missing data, it's possible that dropping a few equations will also increase the usable sample size.

3. Use a separate estimate of the covariance matrix (input to the instruction using the CV option). For instance, shrinking the off-diagonal elements (multiplying them by, say, .8 while leaving the diagonal alone) will give you a non-singular matrix, as sketched below.
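
The following is a minimal sketch of the shrinkage idea in option 3. It assumes SIGMA0 already holds a (possibly singular) estimate of the residual covariance matrix (computed, say, from single-equation OLS residuals) and that equations EQ1, EQ2 and EQ3 have been defined; those names are hypothetical.

compute n=%rows(sigma0)
dec symm cvshrink(n,n)
ewise cvshrink(i,j)=sigma0(i,j)*%if(i==j,1.0,0.8)
sur(cv=cvshrink) 3
# eq1
# eq2
# eq3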


 

Near Collinearity


At one point, near collinearity was a major issue, particularly in time series analysis, and you can often determine when the early editions of a textbook were written based upon how much space is given to a discussion of it. Computer arithmetic in the old single precision (largely made obsolete by floating point processors that were standard by the early 1990's) could not cope with the high degree of correlation among lags of typical time series data for things like distributed lags and vector autoregressions. (The correlation between \(x_t\) and \(x_{t-1}\) approaches 1 as the data set gets large if \(x\) is, for instance, a random walk.) A famous test case for linear regressions was the Longley data set (from 1967), which would generally require special algorithms for the regression to be computed accurately in single precision. With the almost universal change to double precision, these issues were largely eliminated.
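
To put a number on that parenthetical claim: for a driftless random walk started at zero, \(Var(x_t) = t\sigma^2\) and \(Cov(x_t, x_{t-1}) = (t-1)\sigma^2\), so the correlation between \(x_t\) and \(x_{t-1}\) is \(\sqrt{(t-1)/t}\), which approaches 1 as \(t\) grows.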

 

While you can now safely run time series regressions with blocks of lags of persistent data among your regressors without having to worry about the calculations being wrong, it's important to understand that the use of highly correlated regressors does affect the interpretation of the output. The following is an example of an eight-lag autoregression for (log) U.S. M1:

 

linreg fm1

# constant fm1{1 to 8}

 

Linear Regression - Estimation by Least Squares

Dependent Variable FM1

Monthly Data From 1959:09 To 2006:04

Usable Observations                       560

Degrees of Freedom                        551

Centered R^2                        0.9999585

R-Bar^2                             0.9999579

Uncentered R^2                      0.9999993

Mean of Dependent Variable       6.1448743408

Std Error of Dependent Variable  0.7782903192

Standard Error of Estimate       0.0050493344

Sum of Squared Residuals         0.0140481739

Regression F(8,551)              1660040.8853

Significance Level of F             0.0000000

Log Likelihood                      2171.4903

Durbin-Watson Statistic                1.9983

 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  Constant                      0.002536927  0.001730916      1.46566  0.14331249

2.  FM1{1}                        1.176161443  0.042617338     27.59819  0.00000000

3.  FM1{2}                       -0.089009016  0.065642350     -1.35597  0.17566435

4.  FM1{3}                        0.057594948  0.065793757      0.87539  0.38174523

5.  FM1{4}                       -0.099986781  0.065901004     -1.51723  0.12978289

6.  FM1{5}                        0.020901289  0.065888273      0.31722  0.75119441

7.  FM1{6}                        0.086948615  0.065847199      1.32046  0.18722959

8.  FM1{7}                       -0.136515526  0.065843403     -2.07334  0.03860510

9.  FM1{8}                       -0.016303804  0.042740776     -0.38146  0.70301061


 

There are several things to note about the output. First, there is a tendency for the coefficients to switch signs from one lag to the next. Second, the standard errors will (almost always) be fairly flat in the middle lags (from lag 2 to lag 7) with considerably lower values for the end lags (1 and 8 in this case). Both of these come about because of the high correlation between adjacent lags of a persistent data series such as this one. Almost any single lag can be removed from the regression with relatively little effect on the fit (only two of the eight are individually significant at the 5% level) because its neighboring lags can do a reasonable job of proxying for it. Because the middle lags have two neighbors while the end lags have just one, the middle lags aren't as well-determined, hence the higher standard errors. The sign changes are due to the fact that when the regressors themselves are positively correlated, their estimated coefficients are negatively correlated: move one up and the next one down by the same amount and almost nothing happens to the fit of the regression. The main takeaway is that the individual coefficients in a model like this aren't structural; that is, they do not have any meaning outside of their place in the overall model. Some combinations of the coefficients can be structural (for instance, the sum would form the basis for a unit root test), but not the individual coefficients.
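
As an illustration of a combination that is meaningful, the sum of the eight lag coefficients (the quantity a unit root test would examine) and its standard error can be obtained after the regression with something like the following SUMMARIZE instruction, which sums the coefficients on the regressors listed on the supplementary card:

summarize
# fm1{1 to 8}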

 

One other thing to note (along the same lines): the individual t-statistics don't tell you much about how blocks of lags behave. For instance, of the t's for lags 5, 6, 7 and 8, three are quite insignificant and one is marginally significant (at .05). However, if we test an exclusion of all four together:
 

exclude

# fm1{5 to 8}

 

Null Hypothesis : The Following Coefficients Are Zero

FM1              Lag(s) 5 to 8

F(4,551)=      5.77397 with Significance Level 0.00014765


The result is very highly significant, which would come as a surprise if you looked only at the individual coefficient information.


Copyright © 2026 Thomas A. Doan