fitting Beta Distribution to data

Use this forum to post questions about syntax problems or general programming issues. Questions on implementing a particular aspect of econometrics should go in "Econometrics Issues" below.

fitting Beta Distribution to data

Postby Carlo » Wed Dec 14, 2011 11:58 am

Dear all,

I’m trying to fit a unimodal generalized Beta distribution to some data as in:

Engelberg J., Manski C.F. and Williams J. (2009) Comparing the Point Predictions and Subjective Probability Distributions of Professional Forecasters, Journal of Business & Economic Statistics, vol. 27, pages 30-41.

So, suppose you have 10 bins from an individual distribution representing the probability one attach to gdp as follows:

bins: 1 ; 2 ; 3; 4 ; 5 ; 6; 7; 8 ; 9 ; 10
gdp: 6+ ; (5to5.9) ; (4to4.9); (3to3.9) ; (2to2.9) ; (1to1.9) ; (0to0.9) ; (-1to-0.1) ; (-2to-1.1) ; <-2
Prob: 0 ; 0 ; 0 ; 0.02 ; 0.05 ; 0.33 ; 0.5 ; 0.1 ; 0 ; 0

The idea is to fit a beta distribution to the data and then retrieve some quantiles.
I was wondering how can I fix the support of the distribution (‘l’ and ‘r’ in the above paper at pag. 38) and then solve equation (1) and (2) of the paper.
(the function %logbetadensity cannot be used for this)
Thanks a lot,
carlo
Carlo
 
Posts: 4
Joined: Wed Dec 14, 2011 11:03 am

Re: fitting Beta Distribution to data

Postby TomDoan » Wed Dec 14, 2011 4:14 pm

The CDF for the beta is %betainc. That needs to be re-mapped into the proper interval. They're estimating a and b using a least squares fit for the empirical CDF vs the CDF of the generalized beta. The following is an example:

Code: Select all
compute nbin=10
dec vect lowbound(nbin) upbound(nbin)
input lowbound
   .  -2.0 -1.0 0.0 1.0 2.0 3.0 4.0 5.0 6.0
input upbound
 -2.0 -1.0  0.0 1.0 2.0 3.0 4.0 5.0 6.0  .
dec vect prob(nbin)
input prob
 .00 .01 .10 .50 .33 .05 .01 0 0 0
dec vect cprob(nbin)
compute total=0.0
do i=1,nbin
   compute total=total+prob(i)
   compute cprob(i)=total
end do i
*
sstats(min,smpl=prob(t)>0.0) 1 nbin lowbound(t)>>left
sstats(max,smpl=prob(t)>0.0) 1 nbin upbound(t)>>right
*
nonlin a b a>=1.0 b>=1.0
compute a=b=2.0
*
frml gap = remap=(upbound(t)-left)/(right-left),$
  %if(remap<0.0.or.remap>1.0,%na,cprob(t)-%betainc(remap,a,b))
nlls(frml=gap) *
TomDoan
 
Posts: 2720
Joined: Wed Nov 01, 2006 5:36 pm

Re: fitting Beta Distribution to data

Postby Carlo » Tue Jan 10, 2012 7:16 am

Thanks Tom,
It works!
But then, when I have to recover the median and the interquartile range of the estimated distribution, I have to loop over thousands of values like:

dec vec A1(1000000) B1(1000000) B22(1000000)
do i=1,1000000
com A1(i) = %betainc((i*0.00001-left)/(right-left),%beta(1),%beta(2))
com B1(i) =i*0.00001
if A1(i)>0.500.and.A1(i)<0.501 ; com P_50 = b1(i)
if A1(i)>0.250.and.A1(i)<0.2501 ; com P_25 = b1(i)
if A1(i)>0.750.and.A1(i)<0.7501 ; com P_75 = b1(i)
end do

Or there is a fastest and more efficient way to do it in Rats?
(Of course, this methods is also not working when there is a positive probability attached to negative values of lowbound(nbin) upbound(nbin) in your example.
Like:
input prob
0 ; 20 ; 50 ; 20 ; 10 ; 0 ; 0 ; 0 ; 0 ; 0 )
Thanks a lot!
Carlo
Carlo
 
Posts: 4
Joined: Wed Dec 14, 2011 11:03 am

Re: fitting Beta Distribution to data

Postby TomDoan » Tue Jan 10, 2012 12:55 pm

There are two ways to do that in this case. One, which works for general functions, is to use FIND ROOT. If you have the lower and upper bounds and the two shape parameters (you'll have to save the latter out of %beta since those will get overwritten), you can invert the remapped %betainc with:

Code: Select all
nonlin median
compute lower=-6.0,upper=5.0,a=2.0,b=3.0
find root %betainc((median-lower)/(upper-lower),a,b)-.5
end find


For the interquartile range, repeat this with .5 in the FIND replaced with .25 and .75.

In this case, you can also use the fact that the F distribution is itself a remapping of the incomplete beta, so you can use %INVFTEST to do the inversion.

Code: Select all
compute fmedian=%invftest(.5,2.0*b,2.0*a)
compute betamedian=a/(a+b*fmedian)
disp lower+betamedian*(upper-lower)


Again, replace the .5 in the %INVFTEST with .25 and .75 to get the other two quantiles.
TomDoan
 
Posts: 2720
Joined: Wed Nov 01, 2006 5:36 pm


Return to Help With Programming

Who is online

Users browsing this forum: No registered users and 1 guest