Evaluating density forecasts

Econometrics questions and discussions
ac_1
Posts: 495
Joined: Thu Apr 15, 2010 6:30 am

Evaluating density forecasts

Unread post by ac_1 »

If I generate a sequence of rolling one-step-ahead point forecasts, it is straightforward to evaluate these via:

(a) Graphical Analysis: Theil’s Prediction Realization Diagram.
(b) Standard Loss Measures: there are plenty.
(c) Sign test.

But how do I evaluate these forecasts from the density-forecast point of view?

Initial research suggests examining the distributional and autocorrelation properties of probability integral transforms.

Can you provide an example of how to proceed with the evaluation?

Thanks!
tclark
Posts: 99
Joined: Wed Nov 08, 2006 3:20 pm

Re: Evaluating density forecasts

Unread post by tclark »

I've pasted below a link to a starting point (by James Mitchell and Kenneth Wallis) on the literature on density evaluation. You are right that PITS are a key measure. But tests based on the normalized transforms can have better power (a result from a paper by Berkowitz). At this point, my sense is that most econometricians view the log predictive score as the best overall indicator of the calibration/accuracy of density forecasts.

http://www.niesr.ac.uk/pdf/DP320.pdf
Todd Clark
Economic Research Dept.
Federal Reserve Bank of Cleveland
ac_1
Posts: 495
Joined: Thu Apr 15, 2010 6:30 am

Re: Evaluating density forecasts

Unread post by ac_1 »

Thanks for the link to the paper, Todd - I'll start reading and then get back with questions on implementation.

I'm not certain whether evaluating forecasts from this perspective is covered in the existing RATS reference manual/user's guide.
ac_1
Posts: 495
Joined: Thu Apr 15, 2010 6:30 am

Re: Evaluating density forecasts

Unread post by ac_1 »

Todd,

I have had a quick read of the Mitchell and Wallis (2010) paper and, without delving into the details, here is an initial overview (so that others can see what is involved):

Idea
Generate p(t) and z(t) values; if p(t) and z(t) look like a random sample from U[0,1] or N[0,1] respectively, the forecasts are said to be 'well calibrated'.

Transform the outcomes into p(t) and z(t) as:

A.
Probability Integral Transforms (PITs): p(t) = F(t)(x(t)) ~ IID U[0,1]
where F(t)(.) is the forecast cumulative distribution function and x(t) is the observed outcome.

p(t) is the forecast CDF evaluated at x(t), i.e. the forecast probability of observing an outcome no greater than that actually realised.

In other words, given outcomes x(t), t=1,...,n, forecasts from a given model can be evaluated by computing the PITs with respect to the forecast density f(t) (the derivative of F(t)), as

p(t) = integral from (-infinity) to x(t) of f(t)(u) du, and testing whether p(t) ~ IID U[0,1].

Diebold, Gunther and Tay (1998) provide a rationale and proof of the IID U[0,1] result.
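As a concrete illustration, here is a minimal sketch in RATS, assuming a Gaussian forecast density and hypothetical series FMEAN, FSTDERR and ACTUAL (forecast mean, forecast standard deviation and realised outcome), over a hypothetical range FSTART to FEND:

Code: Select all

* Minimal sketch, assuming a Gaussian one-step-ahead forecast density.
* FMEAN, FSTDERR and ACTUAL are hypothetical series holding the forecast
* mean, forecast standard deviation and realised outcome each period.
* %CDF is the standard normal CDF, so this evaluates F(t) at x(t).
set pit fstart fend = %cdf((actual-fmean)/fstderr)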

B.
z(t)=invnormal(p(t)) ~ IID N[0,1]

z(t) can be viewed as the standardized value of the outcome x(t).

Also, when the forecast density is Gaussian, z(t) = (point forecast error / forecast standard deviation), given that the mean of the density forecast is taken as the point forecast.

z(t) has certain advantages over p(t) in the sense that:
• there are more tests available for normality.
• it is easier to test for independence/autocorrelation under normality than uniformity.
• the normal likelihood can be used to construct LR tests.
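In RATS, a minimal sketch (continuing the hypothetical names above):

Code: Select all

* Normalized transforms from the PITs; %INVNORMAL is the inverse of the
* standard normal CDF.
set ztrans fstart fend = %invnormal(pit)
* STATISTICS reports summary statistics including skewness and excess
* kurtosis - a first, informal normality check on z(t).
statistics ztrans fstart fend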

Addressing shortcomings in the examples of Gneiting, Balabdaoui and Raftery (2007), Mitchell and Wallis (2010) provide a more realistic example (including a time dimension in their simulated data) in which tests of 'complete calibration' distinguish the 'ideal' forecast from its competitors, whilst emphasising that the present calibration-based evaluation techniques should be augmented by an assessment of sharpness. Their statistical framework is akin to the recipe below.

So a recipe for density forecast evaluation:

(a) Graphical:
• Visual inspection of p(t) & z(t) histograms, with comparison to U[0,1] or N[0,1], respectively.
• [I have also seen plots of the empirical CDF of p(t) against the 45-degree line (the theoretical uniform CDF), with 95% confidence bands from the critical values of the KS statistic.
• Indeed, in a recent paper of yours the p(t) histograms are shown as decile counts of the PITs – how was this done?]

(b) Formal Goodness-of-Fits tests:
• Pearson chi-squared on p(t) values.
• Kolmogorov-Smirnov (KS) - for equality of distributions (most sensitive near the centre of the distribution).
• Anderson-Darling - a modification of the KS test that gives more weight to the tails.
• Doornik & Hansen (2008) - for normality, on z(t) values only.

(c) Independence/autocorrelation tests (to test for departure from the IID hypothesis):
• For p(t) – Ljung-Box (LB) test.
• For z(t) - the parametric LR test of Berkowitz (2001); see the sketch after this list.

(d) Scoring rules, Distance measures and Sharpness:
• Log score (often reported as -logScore, the negative average log predictive density); see the sketch after this list.
• KLIC (Kullback-Leibler information criterion) or distance measure, similar to mean error/bias in point forecast evaluation.
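From the paper, the mechanics of the Berkowitz test and the log score appear to be roughly as follows - a tentative sketch I have not verified, continuing the hypothetical names above (ZTRANS over ZSTART to ZEND for the normalized transforms):

Code: Select all

* Tentative sketch of the Berkowitz (2001) LR test. Under the null,
* z(t) ~ IID N(0,1); the alternative is a Gaussian AR(1) with free
* mean, variance and AR coefficient (3 restrictions).
linreg(noprint) ztrans zstart+1 zend
# constant ztrans{1}
compute logl1 = %logl
* Log likelihood under the null of IID N(0,1), over the same sample;
* %LOGDENSITY(v,u) is the log of the N(0,v) density at u.
sstats zstart+1 zend %logdensity(1.0,ztrans)>>logl0
compute lrstat = 2.0*(logl1-logl0)
cdf chisqr lrstat 3
*
* Average log score under a Gaussian forecast density (hypothetical
* FMEAN/FSTDERR/ACTUAL as above); larger values are better.
sstats(mean) fstart fend %logdensity(fstderr^2,actual-fmean)>>avglogscore
disp "Average log score" avglogscore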


Please correct anything in the above overview - I have not yet absorbed all the details, especially the latter part of (c) and all of (d).

So, for a sequence of one-step-ahead forecasts, how do I generate the p(t) and z(t) values in RATS? That'll get me started!

In the recipe above I am able to achieve everything from (a) up to the LB test in (c).

Thereafter, I need your help with the remaining pieces: the independence test for z(t) (is my tentative sketch above on the right track?), the scoring rule, and the KLIC.
tclark
Posts: 99
Joined: Wed Nov 08, 2006 3:20 pm

Re: Evaluating density forecasts

Unread post by tclark »

Using a sample of N draws of forecasts (obtained by Bayesian Markov Chain Monte Carlo simulation), I computed the PIT as follows:

sstats(mean) 1 N (forecast{0}<=actual(t))>>pit

where *forecast* is a series containing, in entries 1 to N, the draws of the projection for period t, actual(t) is the actual value of the variable in period t, and the result, the PIT, is stored in PIT. Once you have a time series of PITs, you can use the DENSITY and SCATTER instructions to produce a histogram. I used more of a brute-force approach to get the exact decile counts.
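One brute-force way to get the decile counts would be something like this sketch (PSTART/PEND being the hypothetical range of the PIT series; not necessarily exactly what I did):

Code: Select all

* Count the PITs falling in each decile interval ((j-1)/10, j/10].
* SSTATS with no option sums the indicator, giving a count.
do j=1,10
   sstats pstart pend ((pit>(j-1)/10.0).and.(pit<=j/10.0))>>count
   disp "Decile" j "count" count
end do j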

From the PIT time series, I computed the normalized error with

set statser stpt endpt = %invnormal(pit)

where pit is a time series.
Todd Clark
Economic Research Dept.
Federal Reserve Bank of Cleveland
ac_1
Posts: 495
Joined: Thu Apr 15, 2010 6:30 am

Re: Evaluating density forecasts

Unread post by ac_1 »

Thanks for your reply Todd.

Okay,

sstats(mean) 1 N (forecast{0}<=actual(t))>>pit

calculates a single PIT value (from a sample of N draws of the forecast for a given period), as the fraction of draws in which the forecast is less than or equal to the actual.

I need to generate an entire time series of PIT values. In other words, I need assistance with the Bayesian MCMC, or another (simpler) way to generate the PIT values as a time series.
ac_1
Posts: 495
Joined: Thu Apr 15, 2010 6:30 am

Re: Evaluating density forecasts

Unread post by ac_1 »

So, given the actuals and forecasts - and I know that this is NOT correct - I imagined something like the following would generate a sample of PITS (called gibbs).

Please assist, so that I am able to generate a viable series of PITS.


Code: Select all

* Bayesian MCMC
*
compute nburn =100
compute ndraws=1000
*
dec series gibbs
set gibbs 1 ndraws = 0.0
*
do draw=-nburn,ndraws
*
   sstats(mean) 1 draw (forecast{0}<=actual(t))>>pit
*
   if draw<=0
      next
*
   compute gibbs(draw)=pit
*
end do draw
*
prin / gibbs
*
tclark
Posts: 99
Joined: Wed Nov 08, 2006 3:20 pm

Re: Evaluating density forecasts

Unread post by tclark »

What you need to do is loop over time to generate posterior distributions of forecasts, as opposed to looping over draws. For example, at period 2000:Q1, generate 10k draws of a forecast for period 2000:Q2. Then apply sstats (to a series of the 10k draws) as I suggested to get the PIT value for the forecast in period 2000:Q2. Then move to period 2000:Q2, generate 10k draws of a forecast for period 2000:Q3, and apply sstats (to a series of the 10k draws) as I suggested to get the PIT value for the forecast in period 2000:Q3. Etc.
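Schematically (a sketch with hypothetical names, not working code):

Code: Select all

* Schematic only: loop over forecast origins, not over draws at a
* fixed origin.
set pit fbegin+1 fend+1 = %na
do time = fbegin,fend
   * ... estimate the model on data through TIME, then generate NDRAWS
   * simulated one-step-ahead forecasts into entries 1..NDRAWS of the
   * series FDRAWS ...
   sstats(mean) 1 ndraws (fdraws<=actual(time+1))>>pitval
   compute pit(time+1) = pitval
end do time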
Todd Clark
Economic Research Dept.
Federal Reserve Bank of Cleveland
ac_1
Posts: 495
Joined: Thu Apr 15, 2010 6:30 am

Re: Evaluating density forecasts

Unread post by ac_1 »

Still not getting values for PITS!

An example - this is a VAR based on the Canmodel.prg example shipped with RATS.

I am generating one-step-ahead forecasts (that part is okay) but having issues with the 10K draws for the PITS - I get NAs when I display the PITS.

Please correct.

Code: Select all

*
open data oecdsample.rat
calendar(q) 1981
data(format=rats) 1981:1 2006:4 can3mthpcp canexpgdpchs canexpgdpds canm1s canusxsr usaexpgdpch
*
set logcangdp  = log(canexpgdpchs)
set logcandefl = log(canexpgdpds)
set logcanm1   = log(canm1s)
set logusagdp  = log(usaexpgdpch)
set logexrate  = log(canusxsr)
*
system(model=canmodel)
variables logcangdp logcandefl logcanm1 logexrate can3mthpcp logusagdp
lags 1 to 2
det constant
end(system)
*
compute width = 70
compute ibegin = 1999:12
compute iend = 2006:4
*
do iend = ibegin,iend
	estimate(noprint) iend-width+1 iend
		do draw = 1, 10000
			forecast(model=canmodel,results=fcasts,from=iend+1,to=iend+1,noprint)
			sstats(mean) 1 draw (fcasts(1)<=logcangdp(t))>>pit
		end do draw
		disp pit
end do iend
*
prin / fcasts(1) fcasts(2) fcasts(3) fcasts(4) fcasts(5)


tclark
Posts: 99
Joined: Wed Nov 08, 2006 3:20 pm

Re: Evaluating density forecasts

Unread post by tclark »

The SSTATS should be **outside** the do draw loop, and you need to store each forecast draw in a series with observations 1 to 10,000, and apply SSTATS once to the entire set of 10,000 draws to get the PIT in each period t.
Todd Clark
Economic Research Dept.
Federal Reserve Bank of Cleveland
ac_1
Posts: 495
Joined: Thu Apr 15, 2010 6:30 am

Re: Evaluating density forecasts

Unread post by ac_1 »

I have made alterations – I still get NAs when displaying the PITS!

Code: Select all

*
declare vector[series] fdrawS(10000)
*
do iend = ibegin,iend
	estimate(noprint) iend-width+1 iend
		clear fdrawS(10000)
		do draw = 1,10000
			forecast(model=canmodel,results=fcasts,from=iend+1,to=iend+1,noprint)
			set fdrawS(draw) = fcasts(1)
		end do draw
	sstats(mean) 1 10000 (fdrawS(draw)<=logcangdp(t))>>pit
	disp pit
end do iend
moderator
Site Admin
Posts: 269
Joined: Thu Oct 19, 2006 4:33 pm

Re: Evaluating density forecasts

Unread post by moderator »

Just to be clear, Todd's not an employee of Estima, so it's not his responsibility to fix or correct anyone's code. He may offer further suggestions, but he's already gone well above and beyond the call of duty as a fellow RATS user on this one.

We'll take a look at the code and see if we can offer any suggestions.

Thanks,
Tom Maycock
Estima
moderator
Site Admin
Posts: 269
Joined: Thu Oct 19, 2006 4:33 pm

Re: Evaluating density forecasts

Unread post by moderator »

Looks like you have several coding issues here, and one theoretical issue.

First, the coding issues:

1) do iend = ibegin,iend

You don't want the same variable as the loop index variable and to specify the end of the loop. Try something like:

compute last = 2006:4
do iend = ibegin,last

instead.

2) You only need 10,000 draws for each period. So just make fdrawS a series with 10,000 elements, not a vector of 10,000 series.

3) Inside the "draw" loop, store the current forecast value (entry "iend+1" of the FCASTS series) into entry "draw" of the fdrawS series. Here's one way to do that:

set fdrawS draw draw = fcasts(1)(iend+1)

4) It doesn't make sense to refer to entry "draw" on the SSTATS outside of the DO loop (since "draw" will just be equal to the last value from the loop). Also, you need to refer to a specific entry of LOGCANGDP, not entry "t" (which would run from 1 through 10000). So, with that in mind, and with fdrawS being a single series rather than an array of series, instead of:

sstats(mean) 1 10000 (fdrawS(draw)<=logcangdp(t))>>pit

I think you want:

sstats(mean) 1 10000 (fdrawS<=logcangdp(iend+1))>>pit

With the addition of a variable for the number of draws, these changes give you this:

Code: Select all

compute ndraws=10000
set fdrawS 1 ndraws = %na
compute last = ibegin+2

do iend = ibegin,last
   estimate(noprint) iend-width+1 iend
      clear fdrawS
      do draw = 1,ndraws
         forecast(model=canmodel,results=fcasts,from=iend+1,to=iend+1,noprint)
         set fdrawS draw draw = fcasts(1)(iend+1)
      end do draw
   sstats(mean) 1 ndraws (fdrawS<=logcangdp(iend+1))>>pit
   disp pit
end do iend

Which brings us to the theoretical problem: As written, you're just computing actual forecasts, producing the exact same forecast for every draw, rather than doing any sort of simulation (such as the MCMC simulation Todd mentioned). So, this doesn't really produce a meaningful result. You would need to incorporate some simulation method into this for it to actually do anything. Not having read the papers, I assume they spell out a recommended approach?
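For instance (a sketch only, not a recommendation from the papers): replacing FORECAST with SIMULATE inside the draw loop would at least add normally distributed shocks, drawn using the residual covariance matrix from the ESTIMATE, so that each draw produces a different simulated forecast. That captures shock uncertainty, though not the parameter uncertainty that a full Bayesian MCMC would add:

Code: Select all

* Sketch: SIMULATE adds random shocks drawn from N(0,Sigma), with Sigma
* taken from the preceding ESTIMATE, so the draws now differ.
do draw = 1,ndraws
   simulate(model=canmodel,results=fcasts,from=iend+1,to=iend+1)
   set fdrawS draw draw = fcasts(1)(iend+1)
end do draw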

Regards,
Tom Maycock
ac_1
Posts: 495
Joined: Thu Apr 15, 2010 6:30 am

Re: Evaluating density forecasts

Unread post by ac_1 »

Thanks for the reply, Tom - I appreciate you pointing out the errors in the code.

What I initially wanted when starting this thread was a worked example within RATS of how to proceed with the evaluation of density forecasts, given a set of out-of-sample point forecasts (I have seen this type of evaluation in various papers, but not in any standard textbook) - hence Todd's post of the Mitchell and Wallis (2010) paper.

Although it's not a necessity, density forecast evaluation is another way to evaluate the forecasts from a model - though in my view, loss measures and sign/direction tests say more about the quality of a model than whether it is 'well calibrated'.

I am sure that others reading this thread are, like me, interested in density forecast evaluation within RATS.

As far as I can tell, density forecast evaluation is not covered in the RATS reference manual/user's guide. Hence it would be of practical use to have a worked example of this type of evaluation. Thanks.