Hastey

An example Metropolis-Hastings implementation for estimating the relative frequencies of purines and pyrimidines

1. Use Preset 1 to assume a prior p(Θ) that is uniform between 0 and 1, an exploration window δ = 0.1, and our observed data: 2 purines and 5 pyrimidines. Using a random initial value and a 10000-sample burn-in period "calms down" some of the noise. We use the average of all Θ values after burn-in as an estimate of the posterior mean, μ = (1/N) Σ Θ, and (1/N) Σ (Θ - μ)^2 as an estimate of the posterior variance (its square root is the posterior standard deviation). Repeat the simulation to "reassure" yourself that the approximation is good. (Though without infinite time you can never know for sure.)
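The steps above can be sketched in Python. This is a minimal sketch, not the app's actual code; the function and parameter names are my own. With a uniform prior and 2 purines / 5 pyrimidines, the exact posterior is Beta(3, 6), whose mean is 1/3, so the estimate should land near 0.33.

```python
import math
import random

def metropolis_hastings(purines=2, pyrimidines=5, delta=0.1,
                        n_samples=50_000, burn_in=10_000, seed=0):
    """Sample Θ from the posterior ∝ likelihood × prior (uniform prior on [0, 1])."""
    rng = random.Random(seed)

    def log_posterior(theta):
        # Binomial likelihood times a flat prior; zero density outside (0, 1).
        if theta <= 0.0 or theta >= 1.0:
            return float("-inf")
        return purines * math.log(theta) + pyrimidines * math.log(1.0 - theta)

    theta = rng.random()  # random initial value
    chain = []
    for i in range(burn_in + n_samples):
        # Propose within the exploration window δ; the proposal is symmetric,
        # so the acceptance ratio reduces to the posterior ratio.
        proposal = theta + rng.uniform(-delta, delta)
        if math.log(rng.random()) < log_posterior(proposal) - log_posterior(theta):
            theta = proposal
        if i >= burn_in:  # discard the burn-in period
            chain.append(theta)

    mu = sum(chain) / len(chain)
    var = sum((t - mu) ** 2 for t in chain) / len(chain)
    return mu, math.sqrt(var)

mu, std = metropolis_hastings()
```

Re-running with different seeds is the code analogue of "repeat the simulation": the estimates should wobble slightly but stay near the Beta(3, 6) mean and standard deviation.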
2. Use Preset 2 to repeat the sampling with more observed data: 20 purines and 50 pyrimidines. Notice how the posterior histogram has a much smaller variance; more observations make us more confident in our estimate of Θ.
3. Use Preset 3 to repeat the sampling with a small set of observations and a more "Jumpy" prior distribution, defined piecewise: 10/3 - (100/9)Θ for 0 <= Θ <= 0.3, and (100/49)Θ - 30/49 for 0.3 <= Θ <= 1. (This places a difficult minimum of the prior near the true proportion.)
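The piecewise prior above can be written directly as a density (a sketch; `jumpy_prior` is my name for it, not the app's):

```python
def jumpy_prior(theta):
    """The "Jumpy" prior of Preset 3: piecewise linear, dropping to a
    density of 0 at Θ = 0.3 (near the true proportion) and rising again."""
    if 0.0 <= theta <= 0.3:
        return 10.0 / 3.0 - (100.0 / 9.0) * theta
    if 0.3 < theta <= 1.0:
        return (100.0 / 49.0) * theta - 30.0 / 49.0
    return 0.0
```

Both pieces meet at density 0 when Θ = 0.3, and the two triangles each have area 1/2, so the density integrates to 1 as a proper prior should.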
4. Use Preset 4 to repeat the sampling with a larger set of observations (70) and the "Jumpy" prior. Notice how part of the posterior is squished toward 0.3, causing our estimated standard deviation to become slightly more variable across simulations.
5. Comparing Presets 3 and 4, we find that both estimates suffer similarly from the "Jumpy" prior: the minimum of the prior lies very close to the true proportion, preventing either from reaching a good estimate in a reasonable time. However, averaging over many simulations, we find that the smaller standard deviation for Preset 4 (70 observations) keeps the bulk of the simulation nearer to the true proportion for longer. Preset 4 therefore converges to a better estimate faster, while Preset 3 takes many more samples to reach an estimate of the same quality.
6. Of the first four presets, 2 and 4 are the most similar: the part of the distribution that is "squashed" by the "Jumpy" prior has a smaller effect on the larger observation set than on the small one.
7. Use the Rejection Sample Current Prior button to see a histogram of accepted samples from the currently selected prior. To generate these samples, we propose a new Θ from the Uniform distribution U(0,1). We compute the ratio of the current prior(Θ) to C * U(Θ), where the constant C scales the Uniform density so that it "envelops" the current prior, and we randomly accept or reject the proposed Θ with probability equal to that ratio. This continues until the desired number of samples has been accepted, at which point we estimate the values in the Results tab using the equations from Part 1.
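The accept/reject loop described above can be sketched as follows (my own function names, not the app's; the toy prior at the end is just for illustration):

```python
import random

def rejection_sample(prior, c, n_accept=10_000, seed=0):
    """Propose Θ ~ U(0, 1) and accept with probability prior(Θ) / (c * 1.0),
    where c is chosen so that c * 1.0 >= prior(Θ) everywhere on [0, 1]."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_accept:
        theta = rng.random()                  # proposal from U(0, 1)
        if rng.random() < prior(theta) / c:   # accept/reject by the ratio
            accepted.append(theta)
    return accepted

# A toy prior with density 2Θ (so c = 2 envelops it); its true mean is 2/3.
samples = rejection_sample(lambda t: 2.0 * t, c=2.0)
mu = sum(samples) / len(samples)
```

The loop runs until the desired number of samples has been accepted, exactly as the text describes; a larger c always works but wastes more proposals.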
8. Use the Importance Sample Current Prior button to see a histogram of weighted samples from the currently selected prior. To generate these, rather than accepting only some proposed Θ, we accept every newly generated Θ but weight its contribution to our expectation of the posterior statistics. Since we are sampling from a Uniform distribution, each weight measures how "far" the currently selected prior is from the Uniform density at that Θ (a weight of 1 indicates that the current prior and the Uniform are identical in that region). We estimate the posterior mean in the Results tab by averaging Θ * weight over all Θ, and the posterior standard deviation from the weighted average of the squared deviations from the mean.
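A sketch of the weighting scheme, under the assumption that the app's estimates behave like the usual self-normalized form (dividing by the sum of weights rather than by N, which is the standard stabilization; names are my own):

```python
import random

def importance_sample(prior, n=50_000, seed=0):
    """Draw every Θ ~ U(0, 1); since the Uniform density is 1 on [0, 1],
    each draw's weight is simply prior(Θ)."""
    rng = random.Random(seed)
    thetas = [rng.random() for _ in range(n)]
    weights = [prior(t) for t in thetas]
    w_sum = sum(weights)  # self-normalizing keeps the estimates stable
    mu = sum(w * t for w, t in zip(weights, thetas)) / w_sum
    var = sum(w * (t - mu) ** 2 for w, t in zip(weights, thetas)) / w_sum
    return mu, var ** 0.5

# Same toy prior with density 2Θ: true mean 2/3, true std sqrt(1/18) ≈ 0.236.
mu, std = importance_sample(lambda t: 2.0 * t)
```

No sample is discarded; a region where the prior is tall simply contributes more weight than a region where it is short.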

μ = (1/N) Σ Θ
STD = sqrt( (1/N) Σ (Θ - μ)^2 )

By tinkering with the total number of samples across multiple simulations of Rejection and Importance sampling, we find that the posterior estimates settle down after only about 100 samples, and are close to stationary around N = 100000. Unfortunately, our graph somewhat loses coherent meaning when graphing importance samples. (I don't understand the "cliff" visible at 0.24, so pull requests are welcome.)


Simulation controls:
- Number of samples
- Number of observations
- True proportion of purines
- Window size δ
- Burn in