RATS 11.1

Sometimes, distributions can’t be derived as simple functions of the elementary distributions such as the normal, gamma and beta. For instance, posterior densities can be the product of two distributions that can’t easily be combined. The rejection method (also called the acceptance-rejection or acceptance method; all three terms are in use) is a tool which can often be used in univariate cases to generate draws, and it is used within RATS itself to generate draws from the normal and gamma distributions.

 

Suppose that we need draws from a \(N(\mu,\sigma^2)\) truncated to the interval \([a,b]\).  (Note: this is for illustration—RATS has a %RANTRUNCATE function to do this in practice). An obvious way to do this is to draw \(N(\mu,\sigma^2)\) deviates, then reject any that fall outside of \([a,b]\). The accepted draws would have the desired distribution, but this process will be inefficient if the probability of the Normal draw falling in \([a,b]\) is fairly low.
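A minimal sketch of this first approach in RATS code (the parameter values are illustrative, not from the text, and %RAN(SIGMA) is assumed to be the standard RATS function returning a mean-zero normal draw with standard deviation SIGMA):

* illustrative parameter values
compute mu=0.5, sigma=1.0, a=0.0, b=2.0
loop
   compute x = mu + %ran(sigma)
   if (x>=a).and.(x<=b)
      break
end loop
* x now holds a draw from N(mu,sigma^2) truncated to [a,b]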

 

An alternative is to draw a random number \(x\) from \(U(a,b)\). Compute the ratio \(\alpha\) between \(f_N \left( x\,|\,\mu ,\sigma ^2 \right)\) and the maximum that \(f_N \left( \bullet \,|\,\mu ,\sigma ^2 \right)\) achieves on \([a,b]\), and accept \(x\) with probability \(\alpha\); otherwise draw again. This is most easily done with the %RANFLIP(p) function, which returns 1 if a random \(U(0,1)\) draw is less than p and 0 otherwise. This process trims away the uniform draws to match the shape of the Normal. It will be more efficient if the density function is fairly constant on \([a,b]\), so the comparison ratio is close to one and almost all draws are accepted.

 

A stylized procedure for the rejection method is

 

loop
   compute x = draw from the proposal density
   compute p = acceptance function
   if %ranflip(p)
      break
end loop

 

The value of x when the loop breaks is the draw. If you know that you have set this up correctly, and you know that the probability of accepting a draw is high, then you can adopt this code as is. However, this loops until a draw is accepted; if you make a bad choice for the proposal density, the acceptance function may produce values very close to zero, and the loop could run effectively forever. A “fail-safed” alternative is

 

do try=1,maximum_tries
   compute x = draw from the proposal density
   compute p = acceptance function
   if %ranflip(p)
      break
end do try

 

This quits the loop if no draw has been accepted within MAXIMUM_TRIES attempts. How that limit should be set will depend upon the situation. If you set it to 100 and find that you’re routinely hitting the limit, there is probably a better way to generate the draws.
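As a concrete illustration (a sketch only, not code from the text), here is the fail-safed loop instantiated for the truncated-normal example with the \(U(a,b)\) proposal and the density-ratio acceptance function. The names and values are hypothetical; %UNIFORM(a,b) and %DENSITY(z) are assumed to be the standard RATS uniform-draw and standard normal density functions, and \(\mu\) is assumed to lie inside \([a,b]\), so the maximum of the density on the interval is at \(\mu\):

compute mu=0.5, sigma=1.0, a=0.0, b=2.0
* maximum of the N(mu,sigma^2) density on [a,b]; attained at mu since mu is inside [a,b]
compute fmax = %density(0.0)/sigma
do try=1,100
   compute x = %uniform(a,b)
   compute p = (%density((x-mu)/sigma)/sigma)/fmax
   if %ranflip(p)
      break
end do try

In practice, you would also set a flag when a draw is accepted, so that after the loop you can tell whether it exited with a valid draw or by exhausting the limit.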

 

To apply the rejection method to the density \(f\), you need to be able to write

\begin{equation} f(x) \propto \alpha (x)g(x) \end{equation}

where the proposal density \(g(x)\) is one from which we can conveniently draw and

\begin{equation} 0 \le \alpha \left( x \right) \le 1 \end{equation}

In our two examples for drawing from the truncated Normal, these are (in order):

\begin{equation} g = N\left( {\mu ,\sigma ^2 } \right),\alpha = I_{\left[ {a,b} \right]} \end{equation}

\begin{equation} g = U\left( {a,b} \right),\quad \alpha  = \frac{f_N \left( x\,|\,\mu ,\sigma ^2 \right)}{\mathop {\max }\limits_{[a,b]} f_N \left( \bullet \,|\,\mu ,\sigma ^2 \right)} \label{eq:boot_gtruncated} \end{equation}

Any density for which \(f\left( x \right)/g\left( x \right)\) is bounded will (theoretically) work as a proposal. However, as a general rule, you want to choose a distribution whose tails are at least as thick as those of the target density. For instance, if you use a Cauchy for \(g\), you will get a small percentage of very extreme draws. If \(f\) is a thin-tailed distribution, \(f/g\) will be very small out there, and the extreme draws will be rejected. In other words, the rejection procedure thins out the Cauchy tails by rejecting most tail draws. If, however, \(f/g\) is quite large in the tails, the only way we can “thicken” the tails is by accepting the tail draws and rejecting most of the draws near the mode.
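As a worked illustration of the boundedness condition (not from the original text): with a standard normal target \(f\) and a standard Cauchy proposal \(g\),

\begin{equation} \frac{f(x)}{g(x)} = \frac{(2\pi )^{ - 1/2} \exp \left( { - x^2 /2} \right)}{\left[ {\pi \left( {1 + x^2 } \right)} \right]^{ - 1} } = \sqrt {\pi /2} \left( {1 + x^2 } \right)\exp \left( { - x^2 /2} \right) \end{equation}

which goes to zero in the tails and so is bounded. With the roles reversed (Cauchy target, normal proposal), the ratio grows like \(\exp \left( {x^2 /2} \right)/\left( {1 + x^2 } \right)\), which is unbounded, so no valid \(\alpha(x)\) exists.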

  

So long as \(f\left( x \right)/g\left( x \right)\) is bounded, we can always choose

\begin{equation} \alpha \left( x \right) = \frac{f\left( x \right)/g\left( x \right)}{\max \left( f\left( \bullet \right)/g\left( \bullet \right) \right)} \label{eq:boot_rejectiontest} \end{equation}

(where the max is taken over the support of \(f\)) and that will give the most efficient rejection system for that particular choice of \(g\). However, the rejection method is often applied in situations where the distribution changes slightly from one draw to the next. It’s quite possible that you would spend more time computing the maximum in order to achieve “efficient” draws than it would take to use a simpler but less efficient choice of \(\alpha(x)\). For instance, if we look at \eqref{eq:boot_gtruncated} above, we could also use

\begin{equation} g = U\left( {a,b} \right),\quad \alpha  = \exp \left( { - \frac{(x - \mu )^2 }{2\sigma ^2 }} \right) \end{equation}

If \(\mu  \in \left[ {a,b} \right]\), this will be exactly the same as \eqref{eq:boot_gtruncated}. If it isn’t, then this will reject more draws than \eqref{eq:boot_gtruncated}, since \(\alpha(x)\) is strictly less than one for all the \(x\)’s that will be generated by the proposal density. Whether the extra work of finding the normalizing constant in \eqref{eq:boot_rejectiontest} is worth it will depend upon the situation. Here, the added cost isn’t high because the maximum will be at one of \(\left\{ {\mu ,a,b} \right\}\). If, however, \(f\) and \(g\) are non-trivial density functions, the maximum might very well only be found by some iterative root-finding technique like Newton’s method. Since finding the normalizing constant needs to be done just once per set of draws, the extra work will generally pay off when you need many draws from the same distribution, but not if you need just one.
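For the truncated-normal case, a hedged sketch (again with illustrative names, and assuming %MAX and %MIN are the usual two-argument maximum and minimum functions) of computing that normalizing constant without any iterative search is to clamp \(\mu\) to the interval: the normal density is unimodal, so its maximum on \([a,b]\) is at whichever of \(\left\{ {\mu ,a,b} \right\}\) is closest to \(\mu\). This also drops the assumption made in the earlier sketch that \(\mu\) lies inside \([a,b]\):

* maximizer of the N(mu,sigma^2) density on [a,b]: mu clamped to the interval
compute xstar = %min(%max(mu,a),b)
compute fmax  = %density((xstar-mu)/sigma)/sigma
* acceptance probability for a candidate x drawn from U(a,b)
compute p = (%density((x-mu)/sigma)/sigma)/fmax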

 

Neither of these rejection systems is actually used by %RANTRUNCATE. Because %RANTRUNCATE can handle intervals that are unbounded, the second method wouldn’t apply, and the first can be quite inefficient. Instead, it uses a different rejection system, with a truncated logistic as the proposal density.


Copyright © 2026 Thomas A. Doan