Examples / CHOWTEST.RPF
The CHOWTEST.RPF example demonstrates several forms of "Chow" tests. The underlying regression is of per capita expenditure (PCEXP) on per capita aid (PCAID) and per capita income (PCINC) for the 50 U.S. states. It tests for a difference in the regression between the large and the small states, where large states are those with a population (POP) of at least 5,000 (in thousands). The PCEXP series needs to be computed as the raw expenditure value (EXPEND) divided by population:
open data states.wks
data(org=col,format=wks) 1 50 expend pcaid pop pcinc
set pcexp = expend/pop
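In equation form, the regression being compared across subsamples is
\[
PCEXP_i = \beta_0 + \beta_1 PCAID_i + \beta_2 PCINC_i + u_i
\]
estimated once for the small states, once for the large states, and once for the pooled sample.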
With this data set (and most cross-section data sets), use the SMPL option to do the subsample regressions. With time series data sets, you can handle most sample splits with different start and end parameters on LINREG.
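As a purely illustrative sketch (the series Y and X, the quarterly CALENDAR and the break date are hypothetical, not part of this example), a time-series split using the range parameters rather than SMPL would look like:
linreg y 1960:1 1979:4
# constant x
linreg y 1980:1 2010:4
# constant x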
Method One: F Test by Separate Regressions
Doing an F test by separate regressions requires running three regressions (each subsample plus the full sample) and constructing an F statistic from their summary statistics. The variables %RSS and %NDF hold the values we need. RSSLARGE, RSSSMALL and RSSPOOL are the residual sums of squares of the large-state, small-state and pooled regressions; similar variables are defined for the degrees of freedom. The values for the unrestricted regression are the sums over the two split samples.
linreg(smpl=pop<5000) pcexp
# constant pcaid pcinc
compute rsssmall=%rss , ndfsmall=%ndf
*
linreg(smpl=pop>=5000) pcexp
# constant pcaid pcinc
compute rsslarge=%rss , ndflarge=%ndf
*
linreg pcexp
# constant pcaid pcinc
compute rsspool=%rss
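In terms of these quantities, the statistic being formed is the standard Chow F statistic
\[
F = \frac{\left(RSS_{pool} - (RSS_{small} + RSS_{large})\right)/k}{\left(RSS_{small} + RSS_{large}\right)/\left(NDF_{small} + NDF_{large}\right)}
\]
where \(k\) is the number of regressors (here 3, which is what %NREG holds after the pooled regression). Under the null of a common regression across the two groups, this is \(F(3,44)\) for this data set.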
The test statistic is then computed and displayed with:
compute rssunr=rsssmall+rsslarge , ndfunr=ndfsmall+ndflarge
compute fstat = ( (rsspool-rssunr)/%nreg ) / (rssunr/ndfunr)
cdf(title="Chow test for difference in large vs small") $
ftest fstat %nreg ndfunr
Method Two: F Test by SWEEP
While it might be overkill in this case, the SWEEP instruction is very useful when you need a regression which is split into more than two categories. To use it, you must create a series with different values for each category in the sample split:
set sizes = %if(pop<5000,1,2)
Then use SWEEP with the GROUP option set to the category series. The first supplementary card has the dependent variable (there could be more than one in a systems estimation) and the second has the explanatory variables. This will do a separate regression on each category. Save the covariance matrix of the residuals (in this case a \(1 \times 1\) matrix, since there is only the one “target” variable):
sweep(group=sizes,cvout=cv)
# pcexp
# constant pcaid pcinc
The sum of squared residuals will be %NOBS times the (1,1) element of the covariance matrix. The number of explanatory variables is in %NREG, while the total number of regressors across categories is in %NREGSYSTEM, so the unrestricted degrees of freedom are the number of observations less that. Note that this calculation reuses the RSSPOOL value from above (for the full sample regression), and that SWEEP itself produces no printed output.
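Spelling out the relationship being used: with \(e_t\) the residuals from the category-by-category regressions and \(n\) the number of observations (%NOBS),
\[
cv(1,1) = \frac{1}{n}\sum_{t=1}^{n} e_t^2 \quad\Rightarrow\quad RSS_{unr} = n \times cv(1,1)
\]
and the unrestricted degrees of freedom are \(n\) less %NREGSYSTEM, which is \(50 - 6 = 44\) here.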
compute rssunr=cv(1,1)*%nobs
compute ndfunr=%nobs-%nregsystem
compute fstat = ( (rsspool-rssunr)/%nreg ) / (rssunr/ndfunr)
cdf(title="Chow test for difference in large vs small") $
ftest fstat %nreg ndfunr
Method Three: Chow Test with Dummy Variables
The Chow test with dummy variables is more generally applicable than the subsample regression method above, but the cost is a more complex procedure, especially if the number of regressors or number of subsamples is large. This is because a separate dummy must be constructed for each regressor in each subsample beyond the first. Thus, there are \((n - 1) \times k\) dummies. This is not much of a problem here, since \(n=2\) and \(k=3\), but had we split the sample four ways, we would need nine SETs. For the more complicated cases, you would probably want to create a VECTOR or RECTANGULAR of SERIES to handle the various interaction terms. ONEBREAK.RPF provides an example of that.
Once the dummies (or more accurately, subsample dummies times regressors) are set up, the procedure is straightforward: estimate the model over the whole sample, including regressors and dummies, and test the joint significance of the dummies.
Because this example uses the ROBUSTERRORS option to correct the covariance matrix for heteroscedasticity, the test statistic will be reported as a \(\chi^2\) with three degrees of freedom. Without that option, it would give results identical to the calculations above.
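For reference, the large-sample F statistic that EXCLUDE reports alongside the \(\chi^2\) is simply the \(\chi^2\) statistic divided by its degrees of freedom,
\[
F(3,\infty) = \chi^2(3)/3
\]
which is why the output at the end shows both 5.365171 and 1.78839 (\(=5.365171/3\)) for the same test.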
We create LARGE as a dummy which is one for the large states. We then set up dummies for PCAID and PCINC:
set large = pop>=5000
*
set dpcaid = pcaid*large
set dpcinc = pcinc*large
This computes the regression with the original explanatory variables plus the dummy and the dummied-out regressors, and tests those added variables (the output for the regression and the test is shown below):
linreg(robusterrors) pcexp
# constant pcaid pcinc large dpcaid dpcinc
exclude(title="Sample Split Test-Robust Standard Errors")
# large dpcaid dpcinc
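Written out, the unrestricted regression being estimated and the hypothesis tested by the EXCLUDE are
\[
PCEXP_i = \beta_0 + \beta_1 PCAID_i + \beta_2 PCINC_i + \gamma_0 LARGE_i + \gamma_1 (LARGE_i \times PCAID_i) + \gamma_2 (LARGE_i \times PCINC_i) + u_i
\]
with \(H_0: \gamma_0 = \gamma_1 = \gamma_2 = 0\), that is, identical coefficients for the large and small states.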
Full Program
open data states.wks
data(org=col,format=wks) 1 50 expend pcaid pop pcinc
set pcexp = expend/pop
*
* The test is for a split between low (<5000) and high (>=5000)
* population states.
*
* Test by subsample regression
*
linreg(smpl=pop<5000) pcexp
# constant pcaid pcinc
compute rsssmall=%rss , ndfsmall=%ndf
*
linreg(smpl=pop>=5000) pcexp
# constant pcaid pcinc
compute rsslarge=%rss , ndflarge=%ndf
*
* Full sample regression
*
linreg pcexp
# constant pcaid pcinc
compute rsspool=%rss
*
compute rssunr=rsssmall+rsslarge , ndfunr=ndfsmall+ndflarge
compute fstat = ( (rsspool-rssunr)/%nreg ) / (rssunr/ndfunr)
cdf(title="Chow test for difference in large vs small") $
ftest fstat %nreg ndfunr
*
* Subsample regressions using SWEEP
* Create a series with different values for each category in the sample
* split.
*
set sizes = %if(pop<5000,1,2)
*
* Use SWEEP with the option GROUP=category series. This will do a
* separate regression on each category. Save the covariance matrix of
* the residuals (in this case a 1x1 matrix, since there is only the one
* "target" variable).
*
sweep(group=sizes,cvout=cv)
# pcexp
# constant pcaid pcinc
*
* The sum of squared residuals will be %nobs * the 1,1 element of the
* covariance matrix. The total number of regressors is in %NREGSYSTEM.
*
compute rssunr=cv(1,1)*%nobs
compute ndfunr=%nobs-%nregsystem
compute fstat = ( (rsspool-rssunr)/%nreg ) / (rssunr/ndfunr)
cdf(title="Chow test for difference in large vs small") $
ftest fstat %nreg ndfunr
*
* Test by dummied regressors
*
set large = pop>=5000 ;* Will be one for large states
*
* Set up dummies for PCAID and PCINC
*
set dpcaid = pcaid*large
set dpcinc = pcinc*large
*
* Compute regression with dummies
*
linreg(robusterrors) pcexp
# constant pcaid pcinc large dpcaid dpcinc
*
* Test dummies
*
exclude(title="Sample Split Test-Robust Standard Errors")
# large dpcaid dpcinc
Output
Linear Regression - Estimation by Least Squares
Dependent Variable PCEXP
Usable Observations 38
Degrees of Freedom 35
Skipped/Missing (from 50) 12
Centered R^2 0.8926884
R-Bar^2 0.8865563
Uncentered R^2 0.9895551
Mean of Dependent Variable 0.7962896001
Std Error of Dependent Variable 0.2649887533
Standard Error of Estimate 0.0892519228
Sum of Squared Residuals 0.2788067004
Regression F(2,35) 145.5765
Significance Level of F 0.0000000
Log Likelihood 39.4620
Durbin-Watson Statistic 1.7713
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -0.759407260 0.110553429 -6.86914 0.00000006
2. PCAID 0.002460820 0.000184289 13.35306 0.00000000
3. PCINC 0.000257004 0.000024969 10.29314 0.00000000
Linear Regression - Estimation by Least Squares
Dependent Variable PCEXP
Usable Observations 12
Degrees of Freedom 9
Skipped/Missing (from 50) 38
Centered R^2 0.9211790
R-Bar^2 0.9036632
Uncentered R^2 0.9960146
Mean of Dependent Variable 0.7875144010
Std Error of Dependent Variable 0.1898159049
Standard Error of Estimate 0.0589153831
Sum of Squared Residuals 0.0312392013
Regression F(2,9) 52.5914
Significance Level of F 0.0000108
Log Likelihood 18.6787
Durbin-Watson Statistic 3.4133
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -0.449805662 0.178785772 -2.51589 0.03299028
2. PCAID 0.003217835 0.000532407 6.04394 0.00019190
3. PCINC 0.000158818 0.000044373 3.57913 0.00593886
Linear Regression - Estimation by Least Squares
Dependent Variable PCEXP
Usable Observations 50
Degrees of Freedom 47
Centered R^2 0.8888016
R-Bar^2 0.8840698
Uncentered R^2 0.9903551
Mean of Dependent Variable 0.7941835523
Std Error of Dependent Variable 0.2472352259
Standard Error of Estimate 0.0841799552
Sum of Squared Residuals 0.3330544481
Regression F(2,47) 187.8340
Significance Level of F 0.0000000
Log Likelihood 54.3399
Durbin-Watson Statistic 1.6067
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -0.696742106 0.092271306 -7.55102 0.00000000
2. PCAID 0.002521007 0.000162125 15.54974 0.00000000
3. PCINC 0.000237730 0.000019886 11.95454 0.00000000
Chow test for difference in large vs small
F(3,44)= 1.08842 with Significance Level 0.36397014
Chow test for difference in large vs small
F(3,44)= 1.08842 with Significance Level 0.36397014
Linear Regression - Estimation by Least Squares
With Heteroscedasticity-Consistent (Eicker-White) Standard Errors
Dependent Variable PCEXP
Usable Observations 50
Degrees of Freedom 44
Centered R^2 0.8964836
R-Bar^2 0.8847204
Uncentered R^2 0.9910214
Mean of Dependent Variable 0.7941835523
Std Error of Dependent Variable 0.2472352259
Standard Error of Estimate 0.0839434200
Sum of Squared Residuals 0.3100459017
Log Likelihood 56.1295
Durbin-Watson Statistic 1.5093
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. Constant -0.759407260 0.129018352 -5.88604 0.00000000
2. PCAID 0.002460820 0.000308788 7.96929 0.00000000
3. PCINC 0.000257004 0.000024865 10.33612 0.00000000
4. LARGE 0.309601599 0.207916442 1.48907 0.13646962
5. DPCAID 0.000757015 0.000618384 1.22418 0.22088334
6. DPCINC -0.000098186 0.000047154 -2.08225 0.03731943
Sample Split Test-Robust Standard Errors
Null Hypothesis : The Following Coefficients Are Zero
LARGE
DPCAID
DPCINC
Chi-Squared(3)= 5.365171 or F(3,*)= 1.78839 with Significance Level 0.14692899