RATS 11.1

The CHOWTEST.RPF example demonstrates a number of forms of "Chow" tests. The underlying regression is of per capita expenditure (PCEXP) on per capita aid (PCAID) and per capita income (PCINC) for the 50 U.S. states. It tests for a difference in the regression for the large vs the small states. Large states are those with a population (POP) above 5,000 (in thousands). The PCEXP series needs to be computed from the raw expenditure value (EXPEND) divided by population:

 

open data states.wks

data(org=col,format=wks) 1 50 expend pcaid pop pcinc

set pcexp = expend/pop

With this data set (and most cross section data sets), use the option SMPL to do the subsample regressions. With time series data sets, you can handle most sample splits with different start and end parameters on LINREG.

Method One: F Test by Separate Regressions

Doing an F test by separate regressions requires running three regressions (each subsample plus the full sample) and constructing an F statistic from their summary statistics. The variables %RSS and %NDF hold the values we need. RSSLARGE, RSSSMALL and RSSPOOL are the residual sums of squares of the large-state, small-state and pooled regressions. Similar variables are defined for the degrees of freedom. The value for the unrestricted regression is the sum for the split samples.

 

linreg(smpl=pop<5000) pcexp

# constant pcaid pcinc

compute  rsssmall=%rss , ndfsmall=%ndf

*

linreg(smpl=pop>=5000) pcexp

# constant pcaid pcinc

compute  rsslarge=%rss , ndflarge=%ndf

*

linreg pcexp

# constant pcaid pcinc

compute   rsspool=%rss

 

The test statistic is then computed and displayed with
 

compute  rssunr=rsssmall+rsslarge , ndfunr=ndfsmall+ndflarge

compute  fstat = ( (rsspool-rssunr)/%nreg ) / (rssunr/ndfunr)

cdf(title="Chow test for difference in large vs small")  $

  ftest fstat %nreg ndfunr
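
For readers working outside RATS, the same arithmetic can be checked directly. Plugging the residual sums of squares and degrees of freedom reported in the output at the end of this example into the F formula reproduces the test statistic. A minimal Python sketch (the RSS and df values are copied from the regression output below, not recomputed):

```python
# Chow F test from regression summary statistics. The RSS and
# degrees-of-freedom values are taken from the RATS output shown
# later in this example.
rss_small, ndf_small = 0.2788067004, 35   # small states: 38 obs, 3 coefficients
rss_large, ndf_large = 0.0312392013, 9    # large states: 12 obs, 3 coefficients
rss_pool = 0.3330544481                   # full-sample (pooled) regression
nreg = 3                                  # coefficients per regression

rss_unr = rss_small + rss_large           # unrestricted RSS
ndf_unr = ndf_small + ndf_large           # unrestricted degrees of freedom

fstat = ((rss_pool - rss_unr) / nreg) / (rss_unr / ndf_unr)
print(round(fstat, 5))                    # approx. 1.08842, matching F(3,44) in the output
```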

 

Method Two: F Test by SWEEP

 

While it might be overkill in this case, the SWEEP instruction is very useful when a regression is split into more than two categories. To use it, create a series with a different value for each category in the sample split:

 

set sizes = %if(pop<5000,1,2)

 

Then use SWEEP with the option GROUP=category series. The first supplementary card has the dependent variable (there could be more than one in a systems estimation) and the second has the explanatory variables. This will do a separate regression on each category. Save the covariance matrix of the residuals (in this case a \(1 \times 1\) matrix, since there is only the one “target” variable):

 

sweep(group=sizes,cvout=cv)

# pcexp

# constant pcaid pcinc

The sum of squared residuals is %NOBS times the 1,1 element of the covariance matrix. The number of explanatory variables is in %NREG, while the total number of regressors across categories is in %NREGSYSTEM, so the unrestricted degrees of freedom is the number of observations less that. Note that this relies upon the RSSPOOL value from above (for the full-sample regression). SWEEP itself produces no printed output.


 

compute rssunr=cv(1,1)*%nobs

compute ndfunr=%nobs-%nregsystem

compute  fstat = ( (rsspool-rssunr)/%nreg ) / (rssunr/ndfunr)

cdf(title="Chow test for difference in large vs small")  $

  ftest fstat %nreg ndfunr
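
SWEEP's group-wise regression can be mimicked outside RATS by fitting the same equation separately within each category and accumulating the residual sums of squares. A hedged sketch in Python with NumPy (the data here are synthetic; the variable names mirror the example but the numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the example's variables (illustration only).
n = 50
pcaid = rng.uniform(50, 400, n)
pcinc = rng.uniform(1500, 4000, n)
pop = rng.uniform(500, 12000, n)
pcexp = -0.7 + 0.0025 * pcaid + 0.00024 * pcinc + rng.normal(0, 0.08, n)

sizes = np.where(pop < 5000, 1, 2)        # category series, as with SET SIZES
X = np.column_stack([np.ones(n), pcaid, pcinc])

# Separate least-squares fit within each category; accumulate the
# residual sum of squares and count regressors across categories.
rss_unr = 0.0
nreg_system = 0
for g in np.unique(sizes):
    m = sizes == g
    beta, *_ = np.linalg.lstsq(X[m], pcexp[m], rcond=None)
    resid = pcexp[m] - X[m] @ beta
    rss_unr += resid @ resid
    nreg_system += X.shape[1]

ndf_unr = n - nreg_system                 # 50 - 6 = 44, the role of %NREGSYSTEM

# Pooled (full-sample) fit for comparison; the split fits can never
# have a larger RSS, since the pooled model is nested in them.
resid_p = pcexp - X @ np.linalg.lstsq(X, pcexp, rcond=None)[0]
rss_pool = resid_p @ resid_p
```

The pooled residual covariance that SWEEP saves (CV(1,1) above) would correspond to `rss_unr / n` here.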

 

Method Three: Chow Test with Dummy Variables

 

The Chow test with dummy variables is more generally applicable than the subsample regression method above, but the cost is a more complex procedure, especially if the number of regressors or number of subsamples is large. This is because a separate dummy must be constructed for each regressor in each subsample beyond the first. Thus, there are \(\left( {n - 1} \right) \times k\) dummies. This is not much of a problem here, since n=2 and k=3, but had we split the sample four ways, we would need nine SETs. For the more complicated cases, you would probably want to create a VECTOR or RECTANGULAR of SERIES to handle the various interaction terms. ONEBREAK.RPF provides an example of that.

 

Once the dummies (or more accurately, subsample dummies times regressors) are set up, the procedure is straightforward: estimate the model over the whole sample, including regressors and dummies, and test the joint significance of the dummies.

 

Because this example uses the ROBUSTERRORS option to correct the covariance matrix for heteroscedasticity, the test statistic will be reported as a \(\chi ^2\) with three degrees of freedom. Without that option, it will give identical results to the calculations above.

 

We create LARGE as a dummy which is one for the large states. We then set up dummies for PCAID and PCINC:
 

set large = pop>=5000

*

set dpcaid = pcaid*large

set dpcinc = pcinc*large

 

This computes the regression with the original explanatory variables plus the dummy and dummied-out regressors, then tests those added variables (output for the regression and test appears below):

 

linreg(robusterrors) pcexp

# constant pcaid pcinc large dpcaid dpcinc

exclude(title="Sample Split Test-Robust Standard Errors")

# large  dpcaid  dpcinc
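
The equivalence claimed above (without ROBUSTERRORS, the dummy-variable regression reproduces the split-sample calculation) is easy to verify numerically: the RSS of the full regression with dummies equals the sum of the subsample RSS's. A Python sketch with synthetic data (names mirror the example; the numbers are invented, only the algebra is the point):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
pcaid = rng.uniform(50, 400, n)
pcinc = rng.uniform(1500, 4000, n)
pop = rng.uniform(500, 12000, n)
pcexp = -0.7 + 0.0025 * pcaid + 0.00024 * pcinc + rng.normal(0, 0.08, n)

large = (pop >= 5000).astype(float)       # dummy: one for large states
dpcaid = pcaid * large                    # dummied-out regressors
dpcinc = pcinc * large

def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

ones = np.ones(n)
X_full = np.column_stack([ones, pcaid, pcinc, large, dpcaid, dpcinc])
X_base = np.column_stack([ones, pcaid, pcinc])

rss_dummy = rss(X_full, pcexp)

# Same regression run separately on each subsample.
small = large == 0
rss_split = rss(X_base[small], pcexp[small]) + rss(X_base[~small], pcexp[~small])
```

Because the dummied design spans exactly the block-diagonal (separate-regressions) design, `rss_dummy` and `rss_split` agree up to rounding.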

Full Program


 

open data states.wks
data(org=col,format=wks) 1 50 expend pcaid pop pcinc
set pcexp = expend/pop
*
* The test is for a split between low (<5000) and high (>=5000)
* population states.
*
* Test by subsample regression
*
linreg(smpl=pop<5000) pcexp
# constant pcaid pcinc
compute  rsssmall=%rss , ndfsmall=%ndf
*
linreg(smpl=pop>=5000) pcexp
# constant pcaid pcinc
compute  rsslarge=%rss , ndflarge=%ndf
*
* Full sample regression
*
linreg pcexp
# constant pcaid pcinc
compute   rsspool=%rss
*
compute  rssunr=rsssmall+rsslarge , ndfunr=ndfsmall+ndflarge
compute  fstat = ( (rsspool-rssunr)/%nreg ) / (rssunr/ndfunr)
cdf(title="Chow test for difference in large vs small")  $
  ftest fstat %nreg ndfunr
*
* Subsample regressions using SWEEP
* Create a series with different values for each category in the sample
* split.
*
set sizes = %if(pop<5000,1,2)
*
* Use SWEEP with the option GROUP=category series. This will do a
* separate regression on each category. Save the covariance matrix of
* the residuals (in this case a 1x1 matrix, since there is only the one
* "target" variable).
*
sweep(group=sizes,cvout=cv)
# pcexp
# constant pcaid pcinc
*
* The sum of squared residuals will be %nobs * the 1,1 element of the
* covariance matrix. The total number of regressors is in %NREGSYSTEM.
*
compute rssunr=cv(1,1)*%nobs
compute ndfunr=%nobs-%nregsystem
compute  fstat = ( (rsspool-rssunr)/%nreg ) / (rssunr/ndfunr)
cdf(title="Chow test for difference in large vs small")  $
  ftest fstat %nreg ndfunr
*
* Test by dummied regressors
*
set large = pop>=5000         ;* Will be one for large states
*
* Set up dummies for PCAID and PCINC
*
set dpcaid = pcaid*large
set dpcinc = pcinc*large
*
* Compute regression with dummies
*
linreg(robusterrors) pcexp
# constant pcaid pcinc large dpcaid dpcinc
*
* Test dummies
*
exclude(title="Sample Split Test-Robust Standard Errors")
# large  dpcaid  dpcinc
 

Output


 

Linear Regression - Estimation by Least Squares

Dependent Variable PCEXP

Usable Observations                        38

Degrees of Freedom                         35

Skipped/Missing (from 50)                  12

Centered R^2                        0.8926884

R-Bar^2                             0.8865563

Uncentered R^2                      0.9895551

Mean of Dependent Variable       0.7962896001

Std Error of Dependent Variable  0.2649887533

Standard Error of Estimate       0.0892519228

Sum of Squared Residuals         0.2788067004

Regression F(2,35)                   145.5765

Significance Level of F             0.0000000

Log Likelihood                        39.4620

Durbin-Watson Statistic                1.7713


 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  Constant                     -0.759407260  0.110553429     -6.86914  0.00000006

2.  PCAID                         0.002460820  0.000184289     13.35306  0.00000000

3.  PCINC                         0.000257004  0.000024969     10.29314  0.00000000


 


 

Linear Regression - Estimation by Least Squares

Dependent Variable PCEXP

Usable Observations                        12

Degrees of Freedom                          9

Skipped/Missing (from 50)                  38

Centered R^2                        0.9211790

R-Bar^2                             0.9036632

Uncentered R^2                      0.9960146

Mean of Dependent Variable       0.7875144010

Std Error of Dependent Variable  0.1898159049

Standard Error of Estimate       0.0589153831

Sum of Squared Residuals         0.0312392013

Regression F(2,9)                     52.5914

Significance Level of F             0.0000108

Log Likelihood                        18.6787

Durbin-Watson Statistic                3.4133


 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  Constant                     -0.449805662  0.178785772     -2.51589  0.03299028

2.  PCAID                         0.003217835  0.000532407      6.04394  0.00019190

3.  PCINC                         0.000158818  0.000044373      3.57913  0.00593886


 


 

Linear Regression - Estimation by Least Squares

Dependent Variable PCEXP

Usable Observations                        50

Degrees of Freedom                         47

Centered R^2                        0.8888016

R-Bar^2                             0.8840698

Uncentered R^2                      0.9903551

Mean of Dependent Variable       0.7941835523

Std Error of Dependent Variable  0.2472352259

Standard Error of Estimate       0.0841799552

Sum of Squared Residuals         0.3330544481

Regression F(2,47)                   187.8340

Significance Level of F             0.0000000

Log Likelihood                        54.3399

Durbin-Watson Statistic                1.6067


 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  Constant                     -0.696742106  0.092271306     -7.55102  0.00000000

2.  PCAID                         0.002521007  0.000162125     15.54974  0.00000000

3.  PCINC                         0.000237730  0.000019886     11.95454  0.00000000


 


 

Chow test for difference in large vs small

F(3,44)=      1.08842 with Significance Level 0.36397014


 

Chow test for difference in large vs small

F(3,44)=      1.08842 with Significance Level 0.36397014


 

Linear Regression - Estimation by Least Squares

With Heteroscedasticity-Consistent (Eicker-White) Standard Errors

Dependent Variable PCEXP

Usable Observations                        50

Degrees of Freedom                         44

Centered R^2                        0.8964836

R-Bar^2                             0.8847204

Uncentered R^2                      0.9910214

Mean of Dependent Variable       0.7941835523

Std Error of Dependent Variable  0.2472352259

Standard Error of Estimate       0.0839434200

Sum of Squared Residuals         0.3100459017

Log Likelihood                        56.1295

Durbin-Watson Statistic                1.5093


 

    Variable                        Coeff      Std Error      T-Stat      Signif

************************************************************************************

1.  Constant                     -0.759407260  0.129018352     -5.88604  0.00000000

2.  PCAID                         0.002460820  0.000308788      7.96929  0.00000000

3.  PCINC                         0.000257004  0.000024865     10.33612  0.00000000

4.  LARGE                         0.309601599  0.207916442      1.48907  0.13646962

5.  DPCAID                        0.000757015  0.000618384      1.22418  0.22088334

6.  DPCINC                       -0.000098186  0.000047154     -2.08225  0.03731943


 


 

Sample Split Test-Robust Standard Errors


 

Null Hypothesis : The Following Coefficients Are Zero

LARGE

DPCAID

DPCINC

Chi-Squared(3)=      5.365171 or F(3,*)=      1.78839 with Significance Level 0.14692899


 


Copyright © 2026 Thomas A. Doan