Examples / CHOWTEST.RPF
The CHOWTEST.RPF example demonstrates several forms of "Chow" tests. The underlying regression is of per capita expenditure (PCEXP) on per capita aid (PCAID) and per capita income (PCINC) for the 50 U.S. states. It tests for a difference in the regression between the large and the small states, where large states are those with a population (POP) of at least 5,000 (in thousands). The PCEXP series needs to be computed as the raw expenditure value (EXPEND) divided by population:
open data states.wks
data(org=col,format=wks) 1 50 expend pcaid pop pcinc
set pcexp = expend/pop
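In equation form, the regression being compared across subsamples is
\[
PCEXP_i = \beta_0 + \beta_1 PCAID_i + \beta_2 PCINC_i + u_i
\]
estimated once for the small states, once for the large states, and once for the pooled sample.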
With this data set (and most cross-section data sets), use the SMPL option to do the subsample regressions. With time series data sets, you can handle most sample splits with different start and end parameters on LINREG.
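As a purely illustrative sketch (the series Y and X, the quarterly CALENDAR and the break date are hypothetical, not part of this example), a time-series split using the range parameters rather than SMPL would look like:
linreg y 1960:1 1979:4
# constant x
linreg y 1980:1 2010:4
# constant x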
Method One: F Test by Separate Regressions
Doing an F test by separate regressions requires running three regressions (each subsample plus the full sample) and constructing an F statistic from their summary statistics. The variables %RSS and %NDF hold the values we need. RSSLARGE, RSSSMALL and RSSPOOL are the residual sums of squares of the large-state, small-state and pooled regressions; similar variables are defined for the degrees of freedom. The values for the unrestricted regression are the sums over the two split samples.
linreg(smpl=pop<5000) pcexp
# constant pcaid pcinc
compute rsssmall=%rss , ndfsmall=%ndf
*
linreg(smpl=pop>=5000) pcexp
# constant pcaid pcinc
compute rsslarge=%rss , ndflarge=%ndf
*
linreg pcexp
# constant pcaid pcinc
compute rsspool=%rss
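In terms of these quantities, the statistic being formed is the standard Chow F statistic
\[
F = \frac{\left(RSS_{pool} - (RSS_{small} + RSS_{large})\right)/k}{\left(RSS_{small} + RSS_{large}\right)/\left(NDF_{small} + NDF_{large}\right)}
\]
where \(k\) is the number of regressors (here 3, which is what %NREG holds after the pooled regression). Under the null of a common regression across the two groups, this is \(F(3,44)\) for this data set.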
The test statistic is then computed and displayed with:
compute rssunr=rsssmall+rsslarge , ndfunr=ndfsmall+ndflarge
compute fstat = ( (rsspool-rssunr)/%nreg ) / (rssunr/ndfunr)
cdf(title="Chow test for difference in large vs small") $
ftest fstat %nreg ndfunr
Method Two: F Test by SWEEP
While it might be overkill in this case, the SWEEP instruction is very useful when you need a regression which is split into more than two categories. To use it, you must create a series with different values for each category in the sample split:
set sizes = %if(pop<5000,1,2)
Then use SWEEP with the GROUP option set to the category series. The first supplementary card has the dependent variable (there could be more than one in a systems estimation) and the second has the explanatory variables. This will do a separate regression on each category. Save the covariance matrix of the residuals (in this case a \(1 \times 1\) matrix, since there is only the one “target” variable):
sweep(group=sizes,cvout=cv)
# pcexp
# constant pcaid pcinc
The sum of squared residuals will be %NOBS times the (1,1) element of the covariance matrix. The number of explanatory variables is in %NREG, while the total number of regressors across categories is in %NREGSYSTEM, so the unrestricted degrees of freedom are the number of observations less that. Note that this calculation reuses the RSSPOOL value from above (for the full sample regression), and that SWEEP itself produces no printed output.
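Spelling out the relationship being used: with \(e_t\) the residuals from the category-by-category regressions and \(n\) the number of observations (%NOBS),
\[
cv(1,1) = \frac{1}{n}\sum_{t=1}^{n} e_t^2 \quad\Rightarrow\quad RSS_{unr} = n \times cv(1,1)
\]
and the unrestricted degrees of freedom are \(n\) less %NREGSYSTEM, which is \(50 - 6 = 44\) here.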
compute rssunr=cv(1,1)*%nobs
compute ndfunr=%nobs-%nregsystem
compute fstat = ( (rsspool-rssunr)/%nreg ) / (rssunr/ndfunr)
cdf(title="Chow test for difference in large vs small") $
ftest fstat %nreg ndfunr
Method Three: Chow Test with Dummy Variables
The Chow test with dummy variables is more generally applicable than the subsample regression method above, but the cost is a more complex procedure, especially if the number of regressors or number of subsamples is large. This is because a separate dummy must be constructed for each regressor in each subsample beyond the first. Thus, there are \((n - 1) \times k\) dummies. This is not much of a problem here, since \(n=2\) and \(k=3\), but had we split the sample four ways, we would need nine SETs. For the more complicated cases, you would probably want to create a VECTOR or RECTANGULAR of SERIES to handle the various interaction terms. ONEBREAK.RPF provides an example of that.
Once the dummies (or more accurately, subsample dummies times regressors) are set up, the procedure is straightforward: estimate the model over the whole sample, including regressors and dummies, and test the joint significance of the dummies.
Because this example uses the ROBUSTERRORS option to correct the covariance matrix for heteroscedasticity, the test statistic will be reported as a \(\chi^2\) with three degrees of freedom. Without that option, it would give results identical to the calculations above.
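For reference, the large-sample F statistic that EXCLUDE reports alongside the \(\chi^2\) is simply the \(\chi^2\) statistic divided by its degrees of freedom,
\[
F(3,\infty) = \chi^2(3)/3
\]
which is why the output at the end shows both 5.365171 and 1.78839 (\(=5.365171/3\)) for the same test.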
We create LARGE as a dummy which is one for the large states. We then set up dummies for PCAID and PCINC:
set large = pop>=5000
*
set dpcaid = pcaid*large
set dpcinc = pcinc*large
This computes the regression with the original explanatory variables plus the dummy and the dummied-out regressors, and tests those added variables (the output for the regression and the test is shown below):
linreg(robusterrors) pcexp
# constant pcaid pcinc large dpcaid dpcinc
exclude(title="Sample Split Test-Robust Standard Errors")
# large dpcaid dpcinc
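Written out, the unrestricted regression being estimated and the hypothesis tested by the EXCLUDE are
\[
PCEXP_i = \beta_0 + \beta_1 PCAID_i + \beta_2 PCINC_i + \gamma_0 LARGE_i + \gamma_1 (LARGE_i \times PCAID_i) + \gamma_2 (LARGE_i \times PCINC_i) + u_i
\]
with \(H_0: \gamma_0 = \gamma_1 = \gamma_2 = 0\), that is, identical coefficients for the large and small states.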
Full Program
open data states.wks
data(org=col,format=wks) 1 50 expend pcaid pop pcinc
set pcexp = expend/pop
*
* The test is for a split between low (<5000) and high (>=5000)
* population states.
*
* Test by subsample regression
*
linreg(smpl=pop<5000) pcexp
# constant pcaid pcinc
compute rsssmall=%rss , ndfsmall=%ndf
*
linreg(smpl=pop>=5000) pcexp
# constant pcaid pcinc
compute rsslarge=%rss , ndflarge=%ndf
*
* Full sample regression
*
linreg pcexp
# constant pcaid pcinc
compute rsspool=%rss
*
compute rssunr=rsssmall+rsslarge , ndfunr=ndfsmall+ndflarge
compute fstat = ( (rsspool-rssunr)/%nreg ) / (rssunr/ndfunr)
cdf(title="Chow test for difference in large vs small") $
ftest fstat %nreg ndfunr
*
* Subsample regressions using SWEEP
* Create a series with different values for each category in the sample
* split.
*
set sizes = %if(pop<5000,1,2)
*
* Use SWEEP with the option GROUP=category series. This will do a
* separate regression on each category. Save the covariance matrix of
* the residuals (in this case a 1x1 matrix, since there is only the one
* "target" variable).
*
sweep(group=sizes,cvout=cv)
# pcexp
# constant pcaid pcinc
*
* The sum of squared residuals will be %nobs * the 1,1 element of the
* covariance matrix. The total number of regressors is in %NREGSYSTEM.
*
compute rssunr=cv(1,1)*%nobs
compute ndfunr=%nobs-%nregsystem
compute fstat = ( (rsspool-rssunr)/%nreg ) / (rssunr/ndfunr)
cdf(title="Chow test for difference in large vs small") $
ftest fstat %nreg ndfunr
*
* Test by dummied regressors
*
set large = pop>=5000 ;* Will be one for large states
*
* Set up dummies for PCAID and PCINC
*
set dpcaid = pcaid*large
set dpcinc = pcinc*large
*
* Compute regression with dummies
*
linreg(robusterrors) pcexp
# constant pcaid pcinc large dpcaid dpcinc
*
* Test dummies
*
exclude(title="Sample Split Test-Robust Standard Errors")
# large dpcaid dpcinc
Output
Linear Regression - Estimation by Least Squares
Dependent Variable PCEXP
Usable Observations 38
Degrees of Freedom 35
Skipped/Missing (from 50) 12
Centered R^2 0.8926884
R-Bar^2 0.8865563
Uncentered R^2 0.9895551
Mean of Dependent Variable 0.7962896001
Std Error of Dependent Variable 0.2649887533
Standard Error of Estimate 0.0892519228
Sum of Squared Residuals 0.2788067004
Regression F(2,35) 145.5765
Significance Level of F 0.0000000
Log Likelihood 39.4620
Durbin-Watson Statistic 1.7713
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -0.759407260 0.110553429 -6.86914 0.00000006
2. PCAID 0.002460820 0.000184289 13.35306 0.00000000
3. PCINC 0.000257004 0.000024969 10.29314 0.00000000
Linear Regression - Estimation by Least Squares
Dependent Variable PCEXP
Usable Observations 12
Degrees of Freedom 9
Skipped/Missing (from 50) 38
Centered R^2 0.9211790
R-Bar^2 0.9036632
Uncentered R^2 0.9960146
Mean of Dependent Variable 0.7875144010
Std Error of Dependent Variable 0.1898159049
Standard Error of Estimate 0.0589153831
Sum of Squared Residuals 0.0312392013
Regression F(2,9) 52.5914
Significance Level of F 0.0000108
Log Likelihood 18.6787
Durbin-Watson Statistic 3.4133
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -0.449805662 0.178785772 -2.51589 0.03299028
2. PCAID 0.003217835 0.000532407 6.04394 0.00019190
3. PCINC 0.000158818 0.000044373 3.57913 0.00593886
Linear Regression - Estimation by Least Squares
Dependent Variable PCEXP
Usable Observations 50
Degrees of Freedom 47
Centered R^2 0.8888016
R-Bar^2 0.8840698
Uncentered R^2 0.9903551
Mean of Dependent Variable 0.7941835523
Std Error of Dependent Variable 0.2472352259
Standard Error of Estimate 0.0841799552
Sum of Squared Residuals 0.3330544481
Regression F(2,47) 187.8340
Significance Level of F 0.0000000
Log Likelihood 54.3399
Durbin-Watson Statistic 1.6067
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -0.696742106 0.092271306 -7.55102 0.00000000
2. PCAID 0.002521007 0.000162125 15.54974 0.00000000
3. PCINC 0.000237730 0.000019886 11.95454 0.00000000
Chow test for difference in large vs small
F(3,44)= 1.08842 with Significance Level 0.36397014
Chow test for difference in large vs small
F(3,44)= 1.08842 with Significance Level 0.36397014
Linear Regression - Estimation by Least Squares
With Heteroscedasticity-Consistent (Eicker-White) Standard Errors
Dependent Variable PCEXP
Usable Observations 50
Degrees of Freedom 44
Centered R^2 0.8964836
R-Bar^2 0.8847204
Uncentered R^2 0.9910214
Mean of Dependent Variable 0.7941835523
Std Error of Dependent Variable 0.2472352259
Standard Error of Estimate 0.0839434200
Sum of Squared Residuals 0.3100459017
Log Likelihood 56.1295
Durbin-Watson Statistic 1.5093
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -0.759407260 0.129018352 -5.88604 0.00000000
2. PCAID 0.002460820 0.000308788 7.96929 0.00000000
3. PCINC 0.000257004 0.000024865 10.33612 0.00000000
4. LARGE 0.309601599 0.207916442 1.48907 0.13646962
5. DPCAID 0.000757015 0.000618384 1.22418 0.22088334
6. DPCINC -0.000098186 0.000047154 -2.08225 0.03731943
Sample Split Test-Robust Standard Errors
Null Hypothesis : The Following Coefficients Are Zero
LARGE
DPCAID
DPCINC
Chi-Squared(3)= 5.365171 or F(3,*)= 1.78839 with Significance Level 0.14692899