PRJ Instruction

PRJ(options) series start end

PRJ (short for PRoJect) computes projected (fitted) values, static forecasts, and/or statistics from truncated Normal distributions using $\bf{X}\beta$ from the previous regression.

Wizard

You can use the Time Series>Single-Equation Forecasts Wizard to forecast univariate models, including the static forecasts produced by PRJ. However, the wizard generates a UFORECAST instruction rather than a PRJ.

Parameters

series	Series for the fitted values. If you use the options described later under “Distribution Statistics”, these are the normalized fitted values ($z_i$ in the notation there).
start, end	Range of entries for which fitted values are to be computed. If you have not used the SMPL instruction to set a range, this defaults to the range of the most recent regression. Note: using the SMPL option on the preceding regression has no effect on the range set by PRJ.

Description

PRJ handles forecasts for only certain types of models and certain situations. Use UFORECAST, FORECAST, or STEPS if you need more flexibility.

You can use PRJ to get fitted values after a LINREG, STWISE, DDV, LDV, AR1, or BOXJENK, although some statistics cannot be computed for AR1 and BOXJENK.

PRJ also has several options for computing distribution statistics from the fitted values. These are important for programming truncated and censored regressions, and for diagnostic tests in probit and related models.

Fitted Values

PRJ takes the coefficients ($\beta$) and the regressors ($X$) from the most recent regression and computes the fitted values (${X_t}\beta$) over entries start through end. For a logit or probit, this gives the index value for each observation.

When you use PRJ outside the regression range, it computes a simple form of forecast called a static forecast: predicting the dependent variable given the values of all the regressors. This is useful only for models with no lagged dependent variables. Note that you must have data available for the right-hand-side variables in order to compute forecasts.
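The arithmetic behind a static forecast is simply $x_t\beta$ evaluated at the known regressor values for each forecast entry. The sketch below shows this in Python rather than RATS. The coefficient values are hypothetical. The AVGPRICE values are the ones entered in the FOODPROD example later in this section.

```python
# Sketch only: PRJ-style fitted values / static forecast, yhat_t = x_t . beta.
# The coefficients are HYPOTHETICAL; the AVGPRICE values are those entered
# with DATA(UNIT=INPUT) in the FOODPROD example.

def fitted_values(rows, beta):
    """Return x_t . beta for each row x_t of regressors."""
    return [sum(x * b for x, b in zip(row, beta)) for row in rows]

beta = [50.0, 0.4]                       # hypothetical CONSTANT and AVGPRICE coefficients
avgprice = [112.3, 112.8, 113.9, 119.3]  # regressor values for 1942:1 to 1945:1
rows = [[1.0, p] for p in avgprice]      # each row: CONSTANT, AVGPRICE
forecasts = fitted_values(rows, beta)
print([round(f, 2) for f in forecasts])
```

Each forecast uses only that entry's regressors. This is why the method breaks down when a regressor is a lagged dependent variable whose future values are unknown.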

Options (fitted values)

Note that these cannot be computed for AR1 and BOXJENK because the covariance matrix for those is from a restricted non-linear model, so these formulas won’t apply.

STDERR=(output) series of standard errors of projection

This computes the series of standard errors of projection: $s\sqrt {1 + {x_t}{{\left( {{\bf{X'X}}} \right)}^{ - 1}}{x'_t}} $

XVX=(output) series of leverage statistics

This computes the series of leverage statistics for the in-sample fitted values, which are useful in various diagnostic tests. The formula is ${x_t}{\left( {{\bf{X'X}}} \right)^{ - 1}}{x'_t}$. Each value is bounded between 0 and 1, and the values are generally $O(1/T)$. Observations with larger values are said to have "high leverage", in the sense that they have greater influence on the regression estimates.
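The STDERR and XVX formulas can be checked with the Python sketch below, which shows the arithmetic only and is not how RATS implements them. It solves $({\bf{X'X}})v = x_t'$ rather than forming the inverse explicitly. The design matrix is hypothetical. The sketch also shows numerically that the in-sample leverages sum to the number of regressors $k$. Their average is therefore $k/T$, which is why typical values are $O(1/T)$.

```python
import math

def solve(a, b):
    """Solve a v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        v[r] = (m[r][n] - sum(m[r][c] * v[c] for c in range(r + 1, n))) / m[r][r]
    return v

def cross_product(X):
    """X'X for a list of regressor rows."""
    k = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]

def leverage(X, x):
    """XVX statistic x (X'X)^{-1} x', computed by solving (X'X) v = x'."""
    v = solve(cross_product(X), x)
    return sum(a * b for a, b in zip(x, v))

def stderr_projection(X, x, s):
    """STDERR statistic s * sqrt(1 + x (X'X)^{-1} x')."""
    return s * math.sqrt(1.0 + leverage(X, x))

# Hypothetical design: CONSTANT plus one regressor with an outlying last value.
X = [[1.0, v] for v in [1.0, 2.0, 3.0, 4.0, 10.0]]
h = [leverage(X, row) for row in X]
print([round(v, 2) for v in h])  # the last observation has the highest leverage
print(round(sum(h), 10))         # equals k = 2, the number of regressors
```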

RESIDS=(output) series for residuals

This computes and saves the residuals: the actual values of the dependent variable minus the fitted values computed by PRJ.

COEFFS=VECTOR of coefficient values to use

You can use this option to compute the fitted values based on the supplied coefficients, rather than the coefficients from the original regression.

Examples (Fitted Values)

smpl 1923:1 1941:1
linreg foodprod
# constant avgprice
prj fitted
scatter(style=symbols) 2
# avgprice foodprod
# avgprice fitted
data(unit=input) 1942:1 1945:1 avgprice
112.3 112.8 113.9 119.3
prj forecast 1942:1 1945:1

The first PRJ computes fitted values over 1923:1 to 1941:1. The SCATTER instruction does an actual vs. fitted plot. The second PRJ forecasts FOODPROD over the period 1942:1 to 1945:1 using the four input values for AVGPRICE.

linreg employ 1947:1 1961:1 resids
# constant year price gnp armed
prj(xvx=px)
set stdresids = resids/sqrt(%seesq*(1-px))
graph(style=symbols,vlabel="Standardized Residual")
# stdresids

This uses PRJ with the XVX option to produce residuals standardized by their individual standard errors; more precisely, these are the internally studentized residuals. The variance of residual $t$ is $\sigma^2(1-h_t)$, where $h_t$ is the leverage, so the standard errors are smaller for high leverage observations. Such observations would be expected to have smaller residuals because of their greater than typical influence on the regression coefficients.
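For a model with only a CONSTANT and one regressor, the SET instruction above can be mirrored in a short Python sketch. This is an illustration of the formula, not RATS code. The data are hypothetical, and the leverage uses the closed form $1/T + (x_t-\bar{x})^2/S_{xx}$, which holds for that two-regressor design only.

```python
import math

def studentized_residuals(x, y):
    """Internally studentized residuals for y = a + b*x estimated by OLS.

    Returns (studentized, residuals, leverages); mirrors
    resids/sqrt(%seesq*(1-px)) for a CONSTANT-plus-one-regressor model.
    """
    T = len(x)
    xbar, ybar = sum(x) / T, sum(y) / T
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    seesq = sum(e * e for e in resid) / (T - 2)           # s^2 with k = 2 regressors
    lev = [1.0 / T + (xi - xbar) ** 2 / sxx for xi in x]  # leverage for this design
    stud = [e / math.sqrt(seesq * (1.0 - h)) for e, h in zip(resid, lev)]
    return stud, resid, lev

# Hypothetical data; the last observation has high leverage.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 20.5]
stud, resid, lev = studentized_residuals(x, y)
print([round(s, 3) for s in stud])
```

The high-leverage last entry has a small raw residual. Dividing by $\sqrt{1-h_t}$ scales it up by much more than the others.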

Distribution Statistics

You can use PRJ with the set of options described below to obtain one or more of the following statistics from a series ${z_i}$ of (standardized) deviates:

Density: $\phi(z_i)$

Distribution: $\Phi(z_i)$

Inverse Mills’ ratio: $\phi(z_i)/\Phi(z_i)$

Derivative of the inverse Mills’ ratio, evaluated at $z_i$.
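The four statistics for the Normal case can be written as a short Python sketch using the standard library's statistics.NormalDist. This is an illustration, not PRJ's implementation. The derivative uses $\phi'(z) = -z\phi(z)$, which gives $\lambda'(z) = -\lambda(z)\,(z + \lambda(z))$ for $\lambda(z) = \phi(z)/\Phi(z)$.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def density(z):
    """phi(z)"""
    return STD_NORMAL.pdf(z)

def cdf(z):
    """Phi(z)"""
    return STD_NORMAL.cdf(z)

def mills(z):
    """Inverse Mills' ratio lambda(z) = phi(z) / Phi(z)."""
    return density(z) / cdf(z)

def dmills(z):
    """Derivative of lambda(z): -lambda(z) * (z + lambda(z))."""
    lam = mills(z)
    return -lam * (z + lam)

for z in (-1.0, 0.0, 1.0):
    print(z, density(z), cdf(z), mills(z), dmills(z))
```

For large negative $z$, $\Phi(z)$ underflows in double precision, so this direct ratio is only suitable for moderate values.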

If observation $i$ is truncated at the value $T_i$, $z_i$ takes the following values:

$z_i = \begin{cases} (X_i\hat\beta - T_i)/\sigma & \text{bottom truncation} \\ (T_i - X_i\hat\beta)/\sigma & \text{top truncation} \end{cases}$
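This sign convention explains why the inverse Mills' ratio $\lambda(z_i)$ is the relevant statistic in both cases. For $y \sim N(X_i\hat\beta, \sigma^2)$, the standard results are $E[y \mid y > T_i] = X_i\hat\beta + \sigma\lambda(z_i)$ under bottom truncation and $E[y \mid y < T_i] = X_i\hat\beta - \sigma\lambda(z_i)$ under top truncation. The Python sketch below only illustrates these formulas. The function names are hypothetical, not PRJ options.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def z_value(xb, t, sigma, top=False):
    """z_i for bottom truncation (default) or top truncation at t."""
    return (t - xb) / sigma if top else (xb - t) / sigma

def truncated_mean(xb, t, sigma, top=False):
    """E[y | y > t] (bottom) or E[y | y < t] (top) for y ~ N(xb, sigma^2)."""
    z = z_value(xb, t, sigma, top)
    lam = STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)  # inverse Mills' ratio at z_i
    return xb - sigma * lam if top else xb + sigma * lam

print(truncated_mean(1.0, 0.0, 2.0))            # bottom truncation at 0
print(truncated_mean(1.0, 0.0, 2.0, top=True))  # top truncation at 0
```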

Options (Distribution Statistics)

DISTRIBUTION=[PROBIT]/LOGIT/EXTREME

This selects the distribution to be used.

DENSITY=series for densities [unused]

CDF=series for distributions [unused]

MILLS=series for (inverse) Mills’ ratios [unused]

DMILLS=series for derivatives of MILLS [unused]

After a (binary choice) DDV estimation, the CDF option generates the series of predicted probabilities of the “1” choice. Use DISTRIBUTION=LOGIT if you want these calculated for the logit, since the default is to compute them for the Normal regardless of which model you estimated with DDV.
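To see why the DISTRIBUTION choice matters, the Python sketch below applies the Normal and logistic CDFs to the same index value. The index of 1.0 is hypothetical. The function only illustrates the two CDFs and is not part of RATS.

```python
import math
from statistics import NormalDist

def predicted_prob(index, distribution="probit"):
    """Probability of the "1" choice given the index x_t . beta."""
    if distribution == "probit":
        return NormalDist().cdf(index)
    if distribution == "logit":
        # numerically stable logistic function
        if index >= 0:
            return 1.0 / (1.0 + math.exp(-index))
        e = math.exp(index)
        return e / (1.0 + e)
    raise ValueError("only 'probit' and 'logit' are sketched here")

index = 1.0  # hypothetical index value from a logit DDV
print(predicted_prob(index, "logit"))   # appropriate for a logit model
print(predicted_prob(index, "probit"))  # what the Normal default gives instead
```

With an index of 1.0 the logistic CDF gives about 0.731, while the Normal CDF gives about 0.841. Leaving the default in place after a logit therefore misstates the probabilities.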

Other Options

SMPL=standard SMPL option [unused].

You can supply a series or a formula that can be evaluated across entry numbers. Entries for which the series or formula is zero or “false” will be skipped, while entries that are non-zero or “true” will be included in the operation.

If the output series already exists, observations of that series not included in the SMPL will be completely unaffected by the PRJ operation.
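The effect of SMPL on an existing output series can be pictured as an entry-by-entry choice between the old value and the newly computed one. In the Python sketch below, all series values are hypothetical. The mask mirrors a formula such as share1>0.

```python
def apply_with_smpl(existing, computed, smpl):
    """Write computed values only where smpl is true; keep existing values elsewhere."""
    return [new if keep else old for old, new, keep in zip(existing, computed, smpl)]

existing = [10.0, 11.0, 12.0, 13.0]  # output series before PRJ (hypothetical)
computed = [1.0, 2.0, 3.0, 4.0]      # values PRJ would compute (hypothetical)
share = [0.0, 0.3, 0.0, 0.5]         # hypothetical series used in the SMPL formula
smpl = [s > 0 for s in share]        # formula evaluated entry by entry
result = apply_with_smpl(existing, computed, smpl)
print(result)
```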

SCALE=standard deviation of Normal [1.0]

UPPER=SERIES or FRML of upper truncation points [unused]

LOWER=SERIES or FRML of lower truncation points [unused]

These options describe the normalization: SCALE supplies $\sigma$, and UPPER or LOWER supplies the truncation points $T_i$ in the formula for $z_i$. The truncation points can differ among observations; for instance, cutoffs may depend on some demographic characteristics. However, a given PRJ instruction can do either “top” truncation (UPPER) or “bottom” truncation (LOWER), but not both, so UPPER and LOWER are mutually exclusive. Note that UPPER and LOWER replace the older TRUNCATE and TOP/[NOTOP] options, which provided the same functionality. Use missing value codes for any entries that are to be treated as unlimited.

ATMEAN/[NOATMEAN]

XVECTOR=the value of $x_i$ [unused]

The ATMEAN and XVECTOR options allow you to compute the index, density, standard error, and predicted probability for a single input set of X’s. The values are returned as the variables %PRJFIT, %PRJDENSITY, %PRJSTDERR and %PRJCDF.

The ATMEAN option does the calculation at the mean of the regressors over the estimation range. With XVECTOR, you provide a vector at which you want the values calculated.
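The Python sketch below shows the kind of calculation ATMEAN performs after a probit. It evaluates the index, density, standard error and CDF at the column means of the regressors, analogous to %MEANV. Two assumptions are made. First, the standard error is that of the index, $\sqrt{\bar{x}V\bar{x}'}$ with $V$ the coefficient covariance matrix. Second, the Normal (probit) distribution is used. The coefficients, covariance matrix and data are all hypothetical.

```python
import math
from statistics import NormalDist

def column_means(rows):
    """Mean of each regressor over the estimation range (cf. %MEANV)."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def at_point(x, beta, vcov):
    """Index, Normal density, standard error of the index (assumed), Normal CDF at x."""
    k = len(x)
    index = sum(xi * bi for xi, bi in zip(x, beta))
    var = sum(x[i] * vcov[i][j] * x[j] for i in range(k) for j in range(k))
    nd = NormalDist()
    return {"fit": index, "density": nd.pdf(index),
            "stderr": math.sqrt(var), "cdf": nd.cdf(index)}

# Hypothetical probit estimates and regressor data (CONSTANT, one regressor).
rows = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]]
beta = [-0.8, 0.4]
vcov = [[0.09, -0.02], [-0.02, 0.01]]
result = at_point(column_means(rows), beta, vcov)
print(result)
```

With XVECTOR, you would pass your own vector in place of column_means(rows).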

Variables Defined

%MEANV	VECTOR of means of the explanatory variables
%PRJCDF	predicted probability produced by the XVECTOR or ATMEAN options (REAL)
%PRJDENSITY	density produced by the XVECTOR or ATMEAN options (REAL)
%PRJFIT	fitted value produced by the XVECTOR or ATMEAN options (REAL)
%PRJSTDERR	standard error of the fitted value produced by the XVECTOR or ATMEAN options (REAL)

Example

This estimates a "Tobit II" (Heckman two-step) model, using the inverse Mills ratio from a preliminary probit model as a regressor in a second-stage linear regression over the entries with SHARE1 > 0.

ddv(noprint) choice1
# constant age nadults nkids nkids2 lnx $
  agelnx nadlnx bluecol whitecol
prj(mills=lambda)
linreg(smpl=share1>0,title="Tobit II") share1
# constant age nadults nkids nkids2 lnx agelnx nadlnx lambda