Research Interests
- Bayesian modelling in partially identified models: I am interested in partial identification and how it can be handled within a Bayesian model. This problem arises frequently, as vast amounts of data are constantly collected but are not necessarily a uniformly random sample. A simple example is internet polling: it allows us to obtain a sample of voting preferences, but certain types of people are more likely to vote online than others. This leaves us with a potentially huge dataset whose observations are not quite what we want. We use flexible Bayesian regression models to estimate a partially identified probability function. Our approach permits efficient sensitivity analysis of the posterior impact of priors on the partially identified component of the regression model. The new methodology is illustrated on an important problem where only partially observed data are available: inferring the prevalence of accounting misconduct among publicly traded U.S. businesses. This is joint work with Richard Hahn and Jared Murray; you can find more information in our paper.
- Bayesian modelling in health economics: I have recently been interested in applications of statistics in health economics. Over recent years, Value of Information analysis has become more widespread in health-economic evaluations, specifically as a tool to perform Probabilistic Sensitivity Analysis. This is largely due to methodological advancements allowing for the fast computation of a typical summary known as the Expected Value of Partial Perfect Information (EVPPI). A recent review discussed some estimation methods for calculating the EVPPI, but as research has remained active in the intervening years, that review does not cover some key estimation methods. Our paper therefore presents a comprehensive review of these new methods. We begin by providing the technical details of these computation methods. We then present a case study to compare their estimation performance. We conclude that the most recent development, based on non-parametric regression, offers the best method for calculating the EVPPI efficiently. This means that the EVPPI can now be used practically in health economic evaluations, especially as all the methods are developed in parallel with R. This is joint work with Anna Heath and Gianluca Baio. You can find more information in our review paper and in a paper describing an efficient method of calculating the EVPPI using INLA (Integrated Nested Laplace Approximation).
- Diffusion modelling in cell and animal tracking: I have used diffusion models to describe the trajectories of various organisms. This work was motivated by immune cell data collected by 2-photon microscopy from inside the lymph node of a live mouse. The cells were tracked in 3 dimensions over time, and the scientific interest lay in understanding the structure of the motion and whether it is completely random. This was joint work with Mike West, Melanie Matheu and Mike Cahalan. You can find more information in our paper. I have since used similar methods to describe the motion of Dictyostelium cells.
- Nonparametric Modelling of Spatial Fields and Images: Quantitative Bayesian analysis of immunofluorescence histological imaging using Gaussian mixture modelling. This project follows a recent paper by Ji et al., where the images are treated as a Poisson point process that is observed indirectly due to the presence of noise. The intensity function of the point process is then represented by a mixture model with Dirichlet mixing weights, providing a flexible basis for drawing estimates of a variety of properties pertaining to the images, as well as confidence regions for those estimates. You can find more details about the imaging project here.
- Computational Methods in Big Data Problems: Drawing inferences about low-probability subpopulations in big datasets, specifically in mixture models. The number of observations often renders a full analysis computationally prohibitive; this becomes especially important when the observations of interest are very rare. In collaboration with Dr Cliburn Chan and Professor Mike West, we implemented Sequential Monte Carlo methods to draw targeted subsamples of the data in order to extract as much information as possible about the low-probability regions. More information about the big data methods can be found here.
- A Bayesian Approach to Phylogeographic Clustering: Drawing inferences about the geographical structure of populations using Markov chain Monte Carlo methods. Given a section of the DNA sequences of some individuals of the same species, along with their geographical locations, one of the questions biologists ask is how to split the data into clusters according to their geographical distribution, so that the results are consistent with the genetic history. A fully Bayesian approach to phylogeographic clustering is proposed, using auxiliary variable methods to overcome intractable likelihood issues. You can find more information about the BPEC methods here.
More recently I have been working on modelling animal movement using diffusion processes. I have been working with Yvo Pokern and Tjun-Yee Hoh on developing Bayesian non-parametric methods to draw inferences about 2-dimensional diffusion processes, applied to monkeys and baboons.
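
To make the regression-based EVPPI idea from the health economics item above more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the implementation from the papers: the function name `evppi_regression` is hypothetical, a single parameter of interest is assumed, and a plain polynomial least-squares fit stands in for the non-parametric regression used in practice. The estimate is the average over simulations of the best fitted net benefit, minus the best average net benefit.

```python
import numpy as np


def evppi_regression(net_benefit, phi, degree=3):
    """Regression-based EVPPI estimate for one parameter of interest.

    net_benefit: array of shape (n_sims, n_decisions), simulated net benefits.
    phi:         array of shape (n_sims,), simulated parameter of interest.
    degree:      polynomial degree (an assumption; a stand-in for a
                 non-parametric smoother).
    """
    net_benefit = np.asarray(net_benefit, dtype=float)
    phi = np.asarray(phi, dtype=float)

    # Standardise phi so the polynomial basis is numerically well behaved.
    z = (phi - phi.mean()) / phi.std()
    basis = np.vander(z, degree + 1)  # columns z^degree, ..., z, 1

    # Least-squares fit of each decision's net benefit on the basis;
    # fitted values approximate E[NB_d | phi] for every simulation.
    coef, *_ = np.linalg.lstsq(basis, net_benefit, rcond=None)
    fitted = basis @ coef

    # Mean of the per-simulation best fitted value, minus the best mean.
    return fitted.max(axis=1).mean() - net_benefit.mean(axis=0).max()


# Toy example (all numbers are made up for illustration).
rng = np.random.default_rng(0)
n_sims = 20_000
phi = rng.normal(size=n_sims)      # parameter of interest
psi = rng.normal(size=n_sims)      # remaining uncertainty
nb = np.column_stack([np.zeros(n_sims), phi + psi])  # two decisions
estimate = evppi_regression(nb, phi)
```

In this toy setting the exact EVPPI for `phi` is E[max(0, phi)] = 1/sqrt(2*pi), roughly 0.399, so `estimate` should land close to that value. With several parameters of interest, the polynomial basis would need to be replaced by a multivariate smoother.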