Evaluating density forecasts

Econometrics questions and discussions

Evaluating density forecasts

If I generate a sequence of rolling one-step-ahead point forecasts, it is straightforward to evaluate these via:

(a) Graphical Analysis: Theil’s Prediction Realization Diagram.
(b) Standard Loss Measures: there are plenty.
(c) Sign test.

But how do I evaluate the forecasts from the density-forecast point of view?

Initial research suggests examining the distributional and autocorrelation properties of probability integral transforms.

Can you provide an example of how to proceed with the evaluation?

Thanks!
ac_1

Posts: 56
Joined: Thu Apr 15, 2010 6:30 am
Location: Surrey, England, UK

Re: Evaluating density forecasts

I've pasted below a link to a starting point (by James Mitchell and Kenneth Wallis) on the literature on density evaluation. You are right that PITS are a key measure. But tests based on the normalized transforms can have better power (a result from a paper by Berkowitz). At this point, my sense is that most econometricians view the log predictive score as the best overall indicator of the calibration/accuracy of density forecasts.

http://www.niesr.ac.uk/pdf/DP320.pdf
Todd Clark
Economic Research Dept.
Federal Reserve Bank of Cleveland
tclark

Posts: 35
Joined: Wed Nov 08, 2006 4:20 pm

Re: Evaluating density forecasts

Thanks for the link to the paper, Todd - I'll start reading, then get back with ideas on implementation.

I'm not certain whether evaluating forecasts from this perspective is covered in the existing RATS reference manual/user's guide.
ac_1

Posts: 56
Joined: Thu Apr 15, 2010 6:30 am
Location: Surrey, England, UK

Re: Evaluating density forecasts

Todd,

I have had a quick read of the Mitchell and Wallis (2010) paper. Without delving into the details, here is an initial surmise/overview (so that others can see what is involved in Mitchell and Wallis (2010)):

Idea
Generate p(t) and z(t) values; if the p(t) and z(t) look like random samples from U[0,1] or N[0,1] respectively, the forecasts are said to be 'well calibrated'.

Transform the outcomes into p(t) and z(t) as:

A.
Probability Integral Transforms (PITs): p(t) = F(t)(x(t)) ~ IID U[0,1],
where F(t)(.) is the cumulative distribution function of the density forecast f(t)(.), and x(t) is the observed outcome.

p(t) is the forecast CDF evaluated at the realised outcome x(t), i.e. the forecast probability of observing an outcome no greater than that actually realised.

In other words, given outcomes x(t), t=1,...,n, forecasts from a given model can be evaluated by computing the PITs with respect to the forecast density f(t), as

p(t) = integral from (–infinity) to x(t) of f(t)(u) du, and testing whether p(t) ~ IID U[0,1].

Diebold, Gunther and Tay (1998) provide a rationale and proof of the IID U[0,1] result.
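Outside RATS, the PIT in A. can be sketched in a few lines of Python. This is a hypothetical example assuming a Gaussian density forecast N(mu, sigma^2); the standard library's `statistics.NormalDist` supplies the CDF:

```python
from statistics import NormalDist

def pit(mu: float, sigma: float, outcome: float) -> float:
    """PIT under a Gaussian density forecast N(mu, sigma^2):
    p(t) = F(t)(x(t)), the forecast CDF evaluated at the outcome."""
    return NormalDist(mu, sigma).cdf(outcome)

# An outcome at the forecast mean has PIT 0.5 by symmetry;
# outcomes in the upper tail have PITs near 1.
print(pit(2.0, 1.0, 2.0))   # 0.5
print(pit(2.0, 1.0, 5.0))   # ~0.9987
```

If the forecast densities are correct, these p(t) values accumulate into a U[0,1] sample over time.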

B.
z(t) = invnormal(p(t)) ~ IID N[0,1], where invnormal is the inverse of the standard normal CDF.

z(t) values can be viewed as the standardized value of the outcome x(t).

Also, z(t) = (point forecast error / forecast standard deviation), given that the mean of the density forecast is taken as the point forecast.

z(t) has certain advantages over p(t) in the sense that:
• there are more tests available for normality.
• it is easier to test for independence/autocorrelation under normality than uniformity.
• the normal likelihood can be used to construct LR tests.
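As a quick check of B. (again a Python sketch, not RATS): for a Gaussian density forecast, applying the inverse standard normal CDF to the PIT recovers exactly the standardized point forecast error, as claimed above. All the numbers here are hypothetical:

```python
from statistics import NormalDist

mu, sigma = 2.0, 3.0   # hypothetical Gaussian density forecast N(mu, sigma^2)
outcome = 5.0          # realised value x(t)

p = NormalDist(mu, sigma).cdf(outcome)   # PIT: p(t) = F(t)(x(t))
z = NormalDist().inv_cdf(p)              # z(t) = invnormal(p(t))

# For a Gaussian forecast, z(t) equals the standardized error:
print(z, (outcome - mu) / sigma)   # both approximately 1.0
```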

Given the shortcomings identified in Gneiting, Balabdaoui and Raftery (2007), Mitchell and Wallis (2010) provide a more realistic example (including a time dimension in their simulated data) with several competing forecasts, in which the evaluation techniques distinguish the 'ideal' forecast from its competitors, while emphasising that the present 'complete calibration' evaluation techniques should be augmented by an assessment of sharpness. Their statistical framework amounts to the following recipe for density forecast evaluation:

(a) Graphical:
• Visual inspection of p(t) & z(t) histograms, with comparison to U[0,1] or N[0,1], respectively.
• [ I have also seen plots of p(t) against a 45 degree line – the theoretical uniform distribution, with 95% confidence intervals from the critical values of KS statistics.
• Indeed, in a recent paper of yours you show p(t) histograms as decile counts of the PIT transforms – how was this done? ]

(b) Formal Goodness-of-Fits tests:
• Pearson chi-squared on p(t) values.
• Kolmogorov-Smirnov (KS) - for equality of distributions (around their means).
• Anderson-Darling - modification of KS test (but around the tails of distributions).
• Doornik & Hansen (2008) - for normality, on z(t) values only.

(c) Independence/autocorrelation tests (to test for departure from the IID hypothesis):
• For p(t) – Ljung-Box (LB) test.
• For z(t) - parametric test of Berkowitz (2001).

(d) Scoring rules, Distance measures and Sharpness:
• -logScore.
• KLIC (Kullback-Leibler information criterion) or distance measure, similar to mean error/bias in point forecast evaluation.
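For (d), a Python sketch (not RATS) of the idea: the log score of a density forecast is the log of the forecast density at the realised outcome, and the average difference in log scores between two competing forecasts estimates their KLIC difference. The Gaussian forecasts and outcomes below are invented purely for illustration:

```python
from math import log
from statistics import NormalDist

def log_score(mu: float, sigma: float, outcome: float) -> float:
    """log f(t)(x(t)) for a Gaussian density forecast N(mu, sigma^2);
    higher is better (-log_score is the loss-function form)."""
    return log(NormalDist(mu, sigma).pdf(outcome))

# Hypothetical outcomes and two competing Gaussian forecasts of them.
outcomes = [0.1, -0.4, 0.3, 0.2, -0.1]
sharp = [log_score(0.0, 0.5, x) for x in outcomes]   # tighter density
vague = [log_score(0.0, 2.0, x) for x in outcomes]   # over-dispersed density

# Mean log-score difference: an estimate of the KLIC gain of the
# sharp forecast over the vague one for these outcomes.
klic_gap = sum(s - v for s, v in zip(sharp, vague)) / len(outcomes)
print(klic_gap > 0)   # True: the sharper, well-centred forecast scores better
```

This is the sense in which the log score rewards both calibration and sharpness at once.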

Please correct any of the above overview. I have not yet absorbed all details, especially for the latter part of (c) and all of (d).

So, for a sequence of one-step ahead forecasts, how do I generate the p(t) and z(t) values in RATS? That’ll get me started!

In the recipe above I am able to achieve everything from (a) up to the LB test in (c).

Thereafter, I need your help with the remaining tests: the independence test for z(t), the scoring rule, and KLIC.
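For the Berkowitz (2001) independence test on z(t), here is a rough Python sketch of the idea as I understand it (not RATS, and all names are my own): fit a Gaussian AR(1) to z(t) by conditional maximum likelihood and compare against the IID N(0,1) null with a likelihood-ratio statistic, chi-squared with 3 degrees of freedom:

```python
from math import log, pi

def berkowitz_lr(z):
    """LR statistic for H0: z(t) ~ IID N(0,1) against a Gaussian AR(1)
    z(t) = mu + rho*z(t-1) + eps, eps ~ N(0, sigma^2).
    Conditional (on the first observation) ML = OLS for (mu, rho),
    with sigma^2 the mean squared residual.  LR ~ chi2(3) under H0."""
    y, x = z[1:], z[:-1]
    m = len(y)
    xbar, ybar = sum(x) / m, sum(y) / m
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    rho = sxy / sxx
    mu = ybar - rho * xbar
    resid = [yi - mu - rho * xi for xi, yi in zip(x, y)]
    sigma2 = sum(e * e for e in resid) / m
    # Maximised conditional log likelihood under the alternative:
    l1 = -0.5 * m * log(2 * pi * sigma2) - 0.5 * m
    # Conditional log likelihood under H0 (mu=0, rho=0, sigma=1):
    l0 = -0.5 * m * log(2 * pi) - 0.5 * sum(yi * yi for yi in y)
    return 2.0 * (l1 - l0)
```

The statistic would then be compared with the chi-squared(3) 5% critical value of about 7.81; a badly miscentred or autocorrelated z(t) series drives it up sharply.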
ac_1

Posts: 56
Joined: Thu Apr 15, 2010 6:30 am
Location: Surrey, England, UK

Re: Evaluating density forecasts

Using a sample of N draws of forecasts (obtained by Bayesian Markov Chain Monte Carlo simulation), I computed the PIT as follows:

sstats(mean) 1 N (forecast{0}<=actual(t))>>pit

where *forecast* is a series from 1 to N with a projection for period t, actual(t) is the actual value of the variable in period t, and the result, the PIT, is stored in PIT. Once you have a time series of PITS, you can use DENSITY and SCATTER commands for a histogram. I used more of a brute force approach to getting the exact decile counts.
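For readers who want to see the logic of that SSTATS line outside RATS, the same computation in Python (illustrative names; `draws` stands in for the N forecast draws for period t):

```python
def empirical_pit(draws, actual):
    """Empirical PIT: the fraction of forecast draws at or below the
    realised value -- the mean of the indicator (draw <= actual),
    which is what the sstats(mean) line computes."""
    return sum(d <= actual for d in draws) / len(draws)

# 4 of these 5 hypothetical draws are <= the actual of 1.2
print(empirical_pit([0.4, 0.9, 1.1, 1.2, 1.8], 1.2))   # 0.8
```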

From the PIT time series, I computed the normalized error with

set statser stpt endpt = %invnormal(pit)

where pit is a time series.
Todd Clark
Economic Research Dept.
Federal Reserve Bank of Cleveland
tclark

Posts: 35
Joined: Wed Nov 08, 2006 4:20 pm

Re: Evaluating density forecasts

Okay,

sstats(mean) 1 N (forecast{0}<=actual(t))>>pit

calculates a single PIT value (from a sample of N draws of the forecast and the actual), as the fraction of draws for which the forecast is less than or equal to the actual.

I need to generate an entire time series of PIT values. In other words, I need assistance with the Bayesian MCMC, or another (simpler) way to generate PIT values as a time series.
ac_1

Posts: 56
Joined: Thu Apr 15, 2010 6:30 am
Location: Surrey, England, UK

Re: Evaluating density forecasts

So, given the actual and forecasts, I know that this is NOT correct, but something like the following will generate a sample of PITS (called gibbs).

Please assist, so that I am able to generate a viable series of PITS.

Code: Select all
```
* Bayesian MCMC
*
compute nburn =100
compute ndraws=1000
*
dec series gibbs
set gibbs 1 ndraws = 0.0
*
do draw=-nburn,ndraws
*
   sstats(mean) 1 draw (forecast{0}<=actual(t))>>pit
*
   if draw<=0
      next
*
   compute gibbs(draw)=pit
*
end do draw
*
prin / gibbs
*
```
ac_1

Posts: 56
Joined: Thu Apr 15, 2010 6:30 am
Location: Surrey, England, UK

Re: Evaluating density forecasts

What you need to do is loop over time to generate posterior distributions of forecasts, as opposed to looping over draws. For example, at period 2000:Q1, generate 10k draws of a forecast for period 2000:Q2. Then apply sstats (to a series of the 10k draws) as I suggested to get the PIT value for the forecast in period 2000:Q2. Then move to period 2000:Q2, generate 10k draws of a forecast for period 2000:Q3, and apply sstats (to a series of the 10k draws) as I suggested to get the PIT value for the forecast in period 2000:Q3. Etc.
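The loop-over-time scheme described above can be sketched in Python (purely illustrative: `draw_forecast` is a stand-in for the MCMC step, and in a real application each period's draws would come from the posterior predictive distribution, not the naive generator assumed here):

```python
import random

def draw_forecast(history, rng):
    """Stand-in for one simulation draw of the one-step-ahead forecast:
    last observed value plus Gaussian noise (an assumption for
    illustration, not an actual posterior simulation)."""
    return history[-1] + rng.gauss(0.0, 1.0)

def rolling_pits(data, first, ndraws=1000, seed=0):
    """For each forecast origin t >= first: simulate ndraws forecasts of
    period t+1 using data through t, then record the empirical PIT of
    the realised data[t+1] -- one PIT per period, giving a time series."""
    rng = random.Random(seed)
    pits = []
    for t in range(first, len(data) - 1):
        draws = [draw_forecast(data[:t + 1], rng) for _ in range(ndraws)]
        actual = data[t + 1]
        pits.append(sum(d <= actual for d in draws) / ndraws)
    return pits

pits = rolling_pits([0.2, -0.1, 0.4, 0.0, 0.3, -0.2, 0.1], first=3)
print(len(pits), all(0.0 <= p <= 1.0 for p in pits))   # 3 True
```

The inner list of draws plays the role of the 10k-draw series that SSTATS is applied to in each period.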
Todd Clark
Economic Research Dept.
Federal Reserve Bank of Cleveland
tclark

Posts: 35
Joined: Wed Nov 08, 2006 4:20 pm

Re: Evaluating density forecasts

Still not getting values for PITS!

An example - this is a VAR from the Canmodel.prg example from RATS.

I am generating one-step ahead forecasts (that's okay) but having issues with the 10K draws for the PITS - I get "NA's" when I display the PITS.

Code: Select all
```
*
open data oecdsample.rat
calendar(q) 1981
data(format=rats) 1981:1 2006:4 can3mthpcp canexpgdpchs canexpgdpds canm1s canusxsr usaexpgdpch
*
set logcangdp  = log(canexpgdpchs)
set logcandefl = log(canexpgdpds)
set logcanm1   = log(canm1s)
set logusagdp  = log(usaexpgdpch)
set logexrate  = log(canusxsr)
*
system(model=canmodel)
variables logcangdp logcandefl logcanm1 logexrate can3mthpcp logusagdp
lags 1 to 2
det constant
end(system)
*
compute width = 70
compute ibegin = 1999:12
compute iend = 2006:4
*
do iend = ibegin,iend
   estimate(noprint) iend-width+1 iend
      do draw = 1, 10000
         forecast(model=canmodel,results=fcasts,from=iend+1,to=iend+1,noprint)
         sstats(mean) 1 draw (fcasts(1)<=logcangdp(t))>>pit
      end do draw
      disp pit
end do iend
*
prin / fcasts(1) fcasts(2) fcasts(3) fcasts(4) fcasts(5)
```
ac_1

Posts: 56
Joined: Thu Apr 15, 2010 6:30 am
Location: Surrey, England, UK

Re: Evaluating density forecasts

The SSTATS should be **outside** the do draw loop, and you need to store each forecast draw in a series with observations 1 to 10,000, and apply SSTATS once to the entire set of 10,000 draws to get the PIT in each period t.
Todd Clark
Economic Research Dept.
Federal Reserve Bank of Cleveland
tclark

Posts: 35
Joined: Wed Nov 08, 2006 4:20 pm

Re: Evaluating density forecasts

Have made alterations – I still get “NA’s” when displaying PITS!

Code: Select all
```
*
declare vector[series] fdrawS(10000)
*
do iend = ibegin,iend
   estimate(noprint) iend-width+1 iend
      clear fdrawS(10000)
      do draw = 1,10000
         forecast(model=canmodel,results=fcasts,from=iend+1,to=iend+1,noprint)
         set fdrawS(draw) = fcasts(1)
      end do draw
   sstats(mean) 1 10000 (fdrawS(draw)<=logcangdp(t))>>pit
   disp pit
end do iend
```
ac_1

Posts: 56
Joined: Thu Apr 15, 2010 6:30 am
Location: Surrey, England, UK

Re: Evaluating density forecasts

Just to be clear, Todd's not an employee of Estima, so it's not his responsibility to fix or correct anyone's code. He may offer further suggestions, but he's already gone well above and beyond the call of duty as a fellow RATS user on this one.

We'll take a look at the code and see if we can offer any suggestions.

Thanks,
Tom Maycock
Estima
moderator

Posts: 306
Joined: Thu Oct 19, 2006 4:33 pm

Re: Evaluating density forecasts

Looks like you have several coding issues here, and one theoretical issue.

First, the coding issues:

1) do iend = ibegin,iend

You don't want the same variable as the loop index variable and to specify the end of the loop. Try something like:

compute last = 2006:4
do iend = ibegin,last

2) You only need 10,000 draws for each period. So just make fdrawS a series with 10,000 elements, not a vector of 10,000 series.

3) Inside the "draw" loop, store the current forecast value (entry "iend+1" of the FCASTS series) into entry "draw" of the fdrawS series. Here's one way to do that:

set fdrawS draw draw = fcasts(1)(iend+1)

4) It doesn't make sense to refer to entry "draw" on the SSTATS outside of the DO loop (since "draw" will just be equal to the last value from the loop). Also, you need to refer to a specific entry of LOGCANGDP, not entry "t" (which would run from 1 through 10000). So, with that in mind and with fdrawS being a single series rather than an array of series, instead of:

sstats(mean) 1 10000 (fdrawS(draw)<=logcangdp(t))>>pit

I think you want:

sstats(mean) 1 10000 (fdrawS<=logcangdp(iend+1))>>pit

With the addition of the use of a variable for the number of draws, these changes give you this:
Code: Select all
```
compute ndraws=10000
set fdrawS 1 ndraws = %na
compute last = ibegin+2
do iend = ibegin,last
   estimate(noprint) iend-width+1 iend
      clear fdrawS
      do draw = 1,ndraws
         forecast(model=canmodel,results=fcasts,from=iend+1,to=iend+1,noprint)
         set fdrawS draw draw = fcasts(1)(iend+1)
      end do draw
   sstats(mean) 1 ndraws (fdrawS<=logcangdp(iend+1))>>pit
   disp pit
end do iend
```

Which brings us to the theoretical problem: As written, you're just computing actual forecasts, producing the exact same forecasts for each draw, rather than doing any sort of simulation (such as the MCMC simulation Todd mentioned). So, this doesn't really produce any useful result. You would need to incorporate some simulation method into this for it to actually do anything. Not having read the papers, I assume they spell out a recommended approach?

Regards,
Tom Maycock
moderator

Posts: 306
Joined: Thu Oct 19, 2006 4:33 pm

Re: Evaluating density forecasts

Thanks for the reply Tom, I appreciate your response and pointing out the errors in the code.

What I initially wanted when starting this thread was a worked example within RATS of how to evaluate density forecasts given a set of out-of-sample point forecasts (I have seen this type of evaluation in various papers, but not in any standard textbook); hence Todd's post of the Mitchell and Wallis (2010) paper.

Although it's not a necessity, density forecast evaluation is another way to evaluate the forecasts from a model. That said, in my view, loss measures and sign/direction tests say more about the quality of a model than whether it is "well calibrated".

I am, as I am sure are others reading this thread, interested in density forecast evaluation within RATS.

As far as I can tell, density forecast evaluation is not covered in the RATS reference manual/user's guide. Hence, an example of this type of evaluation would be of practical use. Thanks.
ac_1

Posts: 56
Joined: Thu Apr 15, 2010 6:30 am
Location: Surrey, England, UK