LTS Procedure Description
Regression methods are evaluated by several well-known criteria, including unbiasedness, efficiency, and consistency. A relatively new criterion is a high "breakdown point". A high-breakdown regression estimator is robust: it can withstand a great deal of contamination in the sample. In particular, high-breakdown estimators can handle outliers whether they occur in the dependent variable or in the independent variables. Outliers of the latter kind, the "bad leverage points", often overwhelm methods such as least-absolute-errors regression and the M-estimators, which deal effectively only with outliers in the dependent variable.
The maximum breakdown point for a linear regression method is fifty percent; loosely speaking, this means that the method can cope with samples in which as many as half the observations are contaminated. If more than half the data are outliers, no linear regression method can tell the good observations from the bad ones. Ordinary least squares (OLS) has a very low breakdown point (asymptotically zero!), since a single sufficiently bad outlier can throw the OLS line arbitrarily far off target. The same comment applies to the OLS residuals and to the "hat" matrix X(X'X)^(-1)X', two sets of statistics often proposed as diagnostics for identifying anomalous data. Because they are based on the OLS regression, these diagnostics have low breakdown points and can be quite misleading in the presence of outliers.
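To see just how fragile OLS is, consider the following RATS sketch (the data and the planted outlier are invented purely for illustration): fifty observations are generated on the line y = 2 + 3x, and then a single gross outlier is placed in the dependent variable. The fitted coefficients land far from the true values.

    allocate 50
    set x = t
    set y = 2.0 + 3.0*x + %ran(1.0)
    * Plant one gross outlier in the dependent variable
    compute y(50) = 10000.0
    * The OLS estimates are now far from the true values 2 and 3
    linreg y
    # constant x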
In their monograph Robust Regression and Outlier Detection (Wiley, 1987), P. J. Rousseeuw and A. M. Leroy introduce Least Trimmed Squares (LTS), the high-breakdown linear regression method implemented in the LTS.SRC procedure. Using a resampling algorithm, LTS locates observations in the uncontaminated half of the sample and uses these good data to pinpoint the outliers.
Observations whose LTS standardized residuals exceed 2.5 in magnitude are dropped from the sample, and OLS is run on the remaining data to produce final regression estimates. The resampling approach is required because the LTS criterion function is not at all smooth; it typically contains many local minima and therefore cannot be minimized by conventional methods.
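For reference, the criterion in question minimizes the sum of the h smallest squared residuals, rather than the sum of all n of them as OLS does:

    \hat{\beta}_{LTS} = \arg\min_{\beta} \sum_{i=1}^{h} (r^2)_{(i)}(\beta),
    \qquad (r^2)_{(1)} \le (r^2)_{(2)} \le \cdots \le (r^2)_{(n)},

where r_i(\beta) = y_i - x_i'\beta are the residuals and h is roughly n/2 (Rousseeuw and Leroy take h = [n/2] + [(K+1)/2], which yields the maximal breakdown point). Because the set of h observations entering the sum changes as \beta changes, the criterion is non-smooth, hence the need for a resampling search.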
Rousseeuw and Leroy propose drawing a large number of subsamples, each of size K (the number of regression coefficients, including the constant term). They show that, if the number of subsamples is large, at least one of them is virtually certain to be uncontaminated by outliers. The LTS regression is based on these "clean" subsamples.
In the RATS procedure LTS.SRC, the number of subsamples to be drawn is an option set by the user. Three thousand (3000) subsamples should be ample for most purposes, but users are referred to the monograph by Rousseeuw and Leroy for a detailed treatment of this issue.
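The arithmetic behind this advice is simple. If a fraction \varepsilon of the sample is contaminated, a random subsample of size K is entirely clean with probability approximately (1 - \varepsilon)^K, so drawing m subsamples gives

    P(\text{at least one clean subsample}) \approx 1 - \bigl(1 - (1 - \varepsilon)^K\bigr)^m.

Even in the worst case \varepsilon = 0.5 with, say, K = 5 coefficients, a single draw is clean with probability 1/32, and m = 3000 yields 1 - (31/32)^{3000}, which is within about 4 x 10^-42 of one: certainty for all practical purposes. Note, however, that the required m grows rapidly with K.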
Detection of outliers is especially important in time-series models, including autoregressions, where the use of a lagged dependent variable guarantees that an outlier in the dependent variable will also show up as a bad leverage point in the independent variables. The more lags there are, the more the bad observation is propagated through the sample. There is now a substantial literature on high-breakdown regression, including many articles in the Journal of the American Statistical Association during the last ten years. These papers include numerous data sets, both real and hypothetical, which can be used to explore the robustness of OLS, LTS and other regression methods.
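To see the mechanism in the simplest case, consider the first-order autoregression y_t = \beta_0 + \beta_1 y_{t-1} + \varepsilon_t. An outlier at time s first appears in the dependent variable of the period-s equation and then reappears one period later as the regressor y_s in the period-(s+1) equation, that is, as a leverage point; with p lags, the same bad value enters p subsequent equations as a regressor.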
The syntax for the procedure is as follows:
@LTS(option) depvar start end
# list of explanatory variables (do not include CONSTANT)
As noted above, a CONSTANT is automatically included in the full-sample regression, so do NOT include it on the supplementary card.
There is one option:
ITERATIONS=number of subsamples to draw [3000]
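A typical call looks like this (the series names and the monthly sample range are hypothetical):

    @LTS(iterations=3000) y 1980:1 1999:12
    # x1 x2

This regresses y on a constant (supplied automatically by the procedure), x1, and x2 over 1980:1 through 1999:12, drawing 3000 subsamples.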