Understanding Computational Bayesian Statistics

Preface.
1 Introduction to Bayesian Statistics. 1.1 The Frequentist Approach to Statistics. 1.2 The Bayesian Approach to Statistics. 1.3 Comparing Likelihood and Bayesian Approaches to Statistics. 1.4 Computational Bayesian Statistics. 1.5 Purpose and Organization of This Book.
2 Monte Carlo Sampling from the Posterior. 2.1 Acceptance-Rejection Sampling. 2.2 Sampling-Importance-Resampling. 2.3 Adaptive Rejection Sampling from a Log-Concave Distribution. 2.4 Why Direct Methods Are Inefficient for High-Dimension Parameter Space.
3 Bayesian Inference. 3.1 Bayesian Inference from the Numerical Posterior. 3.2 Bayesian Inference from Posterior Random Sample.
4 Bayesian Statistics Using Conjugate Priors. 4.1 One-Dimensional Exponential Family of Densities. 4.2 Distributions for Count Data. 4.3 Distributions for Waiting Times. 4.4 Normally Distributed Observations with Known Variance. 4.5 Normally Distributed Observations with Known Mean. 4.6 Normally Distributed Observations with Unknown Mean and Variance. 4.7 Multivariate Normal Observations with Known Covariance Matrix. 4.8 Observations from Normal Linear Regression Model. Appendix: Proof of Poisson Process Theorem.
5 Markov Chains. 5.1 Stochastic Processes. 5.2 Markov Chains. 5.3 Time-Invariant Markov Chains with Finite State Space. 5.4 Classification of States of a Markov Chain. 5.5 Sampling from a Markov Chain. 5.6 Time-Reversible Markov Chains and Detailed Balance. 5.7 Markov Chains with Continuous State Space.
6 Markov Chain Monte Carlo Sampling from Posterior. 6.1 Metropolis-Hastings Algorithm for a Single Parameter. 6.2 Metropolis-Hastings Algorithm for Multiple Parameters. 6.3 Blockwise Metropolis-Hastings Algorithm. 6.4 Gibbs Sampling. 6.5 Summary.
7 Statistical Inference from a Markov Chain Monte Carlo Sample. 7.1 Mixing Properties of the Chain. 7.2 Finding a Heavy-Tailed Matched Curvature Candidate Density. 7.3 Obtaining an Approximate Random Sample for Inference. Appendix: Procedure for Finding the Matched Curvature Candidate Density for a Multivariate Parameter.
8 Logistic Regression. 8.1 Logistic Regression Model. 8.2 Computational Bayesian Approach to the Logistic Regression Model. 8.3 Modelling with the Multiple Logistic Regression Model.
9 Poisson Regression and Proportional Hazards Model. 9.1 Poisson Regression Model. 9.2 Computational Approach to Poisson Regression Model. 9.3 The Proportional Hazards Model. 9.4 Computational Bayesian Approach to Proportional Hazards Model.
10 Gibbs Sampling and Hierarchical Models. 10.1 Gibbs Sampling Procedure. 10.2 The Gibbs Sampler for the Normal Distribution. 10.3 Hierarchical Models and Gibbs Sampling. 10.4 Modelling Related Populations with Hierarchical Models. Appendix: Proof that Improper Jeffrey's Prior Distribution for the Hypervariance Can Lead to an Improper Posterior.
11 Going Forward with Markov Chain Monte Carlo.
A Using the Included Minitab Macros.
B Using the Included R Functions.
References. Topic Index.

Overall, the level of the book is such that it should be accessible to undergraduate students: MCMC methods are reduced to Gibbs, random walk, and independent Metropolis-Hastings algorithms, and convergence assessment is done via autocorrelation graphs, the Gelman and Rubin (1992) intra-/inter-variance criterion, and a forward coupling device. The illustrative chapters cover logistic regression (Chapter 8), Poisson regression (Chapter 9), and normal hierarchical models (Chapter 10). Again, the overall feeling is that the book should be understood by undergraduate students, even though it may make MCMC seem easier than it is by sticking to fairly regular models. In a sense, it is more a book of the (roaring MCMC) '90s, in that it does not incorporate advances from 2000 onward (as seen from the reference list), such as adaptive MCMC or the resurgence of importance sampling via particle systems and sequential Monte Carlo.
"Since we are uncertain about the true values of the parameters, in Bayesian statistics we will consider them to be random variables. This contrasts with the frequentist idea that the parameters are fixed but unknown constants." (Page 3) I find the book's introduction to Bayesian statistics (Chapter 1) somehow unbalanced with statements such as the above and "Statisticians have long known that the Bayesian approach offered clear cut advantages over the frequentist approach," (Page 1) which makes one wonder why there is any frequentist left, or "Clearly, the Bayesian approach is more straightforward [than the frequentist p-value]," (Page 53) because antagonistic presentations are likely to be lost to the neophyte. (I also disagree with the declaration that, for a Bayesian, there is no fixed value for the parameter.) The statement that the MAP estimator is associated with the 0-1 loss function (footnote 4, Page 10) is alas found in many books and papers, thus cannot truly be blamed on the author. That ancillary statistics "only work in exponential families" (footnote 5, Page 13) is either unclear or wrong. The discussion about Bayesian inference in the presence of nuisance parameters (pp. 15-16) is also confusing: "The Bayesian posterior density of u 1 found by marginalizing u 2 out of the joint posterior density, and the profile likelihood function of u 1 turn out to have the same shape" (Page 15) [under a flat prior] sounds wrong to me.

"It is not possible to do any inference about the parameter u from the unscaled posterior." (Page 25)
The chapter about simulation methods (Chapter 2) contains a mistake that one might deem of little importance. However, I do not, and here it is: sampling-importance-resampling is presented as an exact simulation method (Page 34), omitting the bias due to normalizing the importance weights.
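For concreteness, here is a minimal sketch of the method, with a standard normal target and a t(3) proposal of my own choosing rather than the book's example; the normalization step is where the exactness claim breaks down:

```r
## A minimal sketch of sampling-importance-resampling (SIR); the standard
## normal target and t(3) proposal are illustrative choices, not the book's.
set.seed(1)
M <- 1e4                                  # number of proposal draws
theta <- rt(M, df = 3)                    # draws from the t(3) proposal
w <- dnorm(theta) / dt(theta, df = 3)     # unnormalized importance weights
p <- w / sum(w)                           # self-normalization step
resample <- sample(theta, 1000, replace = TRUE, prob = p)
## Because the weights are renormalized by their random sum, each resampled
## value depends on all M draws: the resample is only approximately from the
## target, and SIR is exact solely in the limit as M goes to infinity.
```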
The chapter on conjugate priors (Chapter 4), although fine, feels as if it does not belong to this book, but should rather be in Bolstad's Introduction to Bayesian Statistics, especially as it is on the long side. Chapter 5 gives an introduction to Markov chain theory in the finite state case, with a nice illustration of the differences in convergence time through two 5 × 5 matrices. (But why do we need six decimals?!)
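Incidentally, this comparison boils down to the second-largest eigenvalue modulus; here is a minimal sketch, with a made-up 5 × 5 transition matrix rather than the book's own:

```r
## A minimal sketch, with an illustrative 5 x 5 transition matrix, of reading
## a finite chain's convergence time off its second-largest eigenvalue modulus.
P <- matrix(0.05, 5, 5)
diag(P) <- 0.8                     # "sticky" chain: each row still sums to one
lambda <- sort(Mod(eigen(P)$values), decreasing = TRUE)
lambda[2]                          # the closer to 1, the slower the mixing
```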
"MCMC methods are more efficient than the direct [simulation] procedures for drawing samples from the posterior when we have a large number of parameters." (Page 127)

MCMC methods are presented through two chapters, the second one entitled "Statistical Inference from a Markov Chain Monte Carlo Sample" (Chapter 7), which is a neat idea for covering the analysis of an MCMC output. The presentation is mainly one-dimensional, which makes the recommendation, found throughout the book, to use independent Metropolis-Hastings algorithms [with a t proposal based on the curvature at the mode; see the sketch below] more understandable, if misguided. The presentation of the blockwise Metropolis-Hastings algorithm of Hastings through the formula (Page 145) is a bit confusing, as the update of the conditioners in the conditional kernels is not indicated. (The algorithm that follows is correct, though.) I also disliked the notion that "the sequence of draws from the chain (…) is not a random sample" (Page 161) because of the correlation: the draws are random, if not independent. This relates to the recommendation of heavy thinning, with a gap that "should be the same as the burn-in time" (Page 169), which sounds like a waste of simulation power, as burn-in and thinning of a Markov chain are different features. The author disagrees with the viewpoint (which I hold) that keeping all the draws in the estimates improves the precision: "One school considers that you should use all draws (…) However, it is not clear how good this estimate would be" (Page 168) and "values that were thinned out wouldn't be adding very much to the precision" (Page 169). I did not see any mention of the effective sample size (see the display below), and the burn-in length is determined graphically via autocorrelation graphs, Gelman-Rubin statistics, and a rather fanciful use of coupling from the past (pp. 172-174). (In fact, the criterion is a forward coupling device that only works for independent chains; see Møller and Waagepetersen (2003) and Robert and Casella (2004).)

The final chapters apply MCMC methods to logistic (Chapter 8) and Poisson (Chapter 9) regressions, again using an independent proposal in the Metropolis-Hastings algorithm. (Actually, we also used a proposal based on the MLE solutions for logistic regression in Introducing Monte Carlo Methods with R; however, it was in an importance sampling illustration in Chapter 4.) This is a nice introduction to handling generalized linear models with MCMC. The processing of variable selection (pp. 195-198 and pp. 224-226) could have been conducted in a more rigorous manner had Bayes factors been introduced. It is also a nice idea to conclude with Gibbs sampling applied to hierarchical models (Chapter 10), a feature missing from the first edition of Bayesian Core; however, the chapter crucially misses an advanced example, like mixed linear models. This chapter covers the possible misbehavior of posteriors associated with improper priors, with a bit too strong a warning (see above), and it unnecessarily (in my opinion) ventures into a short description of the empirical Bayes approach (pp. 245-247).
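For reference, here is a minimal sketch of the independent Metropolis-Hastings sampler with a matched-curvature t proposal of the kind the book recommends; the gamma log-posterior, the degrees of freedom, and all names are illustrative choices of mine, not the book's:

```r
## A minimal sketch of independent Metropolis-Hastings with a
## matched-curvature t proposal; target and settings are illustrative.
logpost <- function(theta) dgamma(theta, shape = 3, rate = 2, log = TRUE)
mode <- optimize(logpost, c(0, 10), maximum = TRUE)$maximum
h <- 1e-4                                  # finite-difference step
curv <- (logpost(mode + h) - 2 * logpost(mode) + logpost(mode - h)) / h^2
scale <- sqrt(-1 / curv)                   # matched-curvature scale
N <- 1e4; df <- 4                          # t(4): heavier tails than the target
theta <- numeric(N); theta[1] <- mode
for (t in 2:N) {
  prop <- mode + scale * rt(1, df)         # independent t proposal
  logr <- logpost(prop) - logpost(theta[t - 1]) +
    dt((theta[t - 1] - mode) / scale, df, log = TRUE) -
    dt((prop - mode) / scale, df, log = TRUE)
  theta[t] <- if (log(runif(1)) < logr) prop else theta[t - 1]
}
```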
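And, to make precise the quantity whose absence I lament above, the effective sample size of N correlated draws is

\[
\mathrm{ESS} \;=\; \frac{N}{1 + 2\sum_{k=1}^{\infty} \rho_k}\,,
\]

where ρk denotes the lag-k autocorrelation of the functional of interest: thinning lowers the ρk's but shrinks N much faster, which is why a thinned subsample typically carries a smaller effective sample size than the full chain.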

The style of Understanding Computational Bayesian Statistics is repetitive at times, with sentences from early paragraphs of a chapter reproduced verbatim a few pages later. While the idea of opposing likelihood-based inference to Bayesian inference through a dozen graphs (Chapter 1) is praiseworthy, I fear the impact is weakened by the poor 3-D readability of the graphs. Another praiseworthy idea is the inclusion of a "main points" section at the end of each chapter; however, these should have been more focused in my opinion. Maybe the itemized presentation did not help.
Inevitably (trust me!), there are typing mistakes in the book, and they will most likely be corrected in a future printing/edition. I am, however, puzzled by the high number of "the the" repetitions and by the misspelling (Page 261) of Jeffreys' prior as Jeffrey's prior (maybe a mistake from the copy editor?).