2 Theory
2.1 LGCP
A Poisson point process is a stochastic set of points in a bounded n-dimensional space, where the number of points falling in any region is a Poisson distributed random variable. The LGCP extends the process by explicitly incorporating spatial or temporal correlation into the definition of the intensity of the process. Diggle et al. (2013) is a good reference for these types of models. I’ll give some details below, and then discuss the INLA approach for actually estimating the model.
The Poisson point process is characterized by an intensity, defined as the expected number of observations per unit area. For a given region \(A\), the number of individuals occurring within the region is Poisson distributed, with mean given by the integral of the intensity over the region: \[ \Lambda(A) = \int_A \lambda(s)\,ds \] Sadly, the intensity is an unobservable and unknown process, and therefore must be modelled as a function of covariates. The intensity takes the general parameterization \[ \lambda(s) = \exp[\alpha + \beta'x(s)] \] In the previous blog I discussed incorporating sampling bias, but I will leave that out for now. Some version of it will come back when discussing preferential sampling.
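To make the intensity idea concrete, here's a minimal R sketch that simulates an inhomogeneous Poisson process on the unit square by thinning; the particular intensity surface, \(\lambda(s) = \exp(2 + 3x)\), is just an assumption for illustration, not anything from the models above.

```r
# Simulate an inhomogeneous Poisson process on the unit square by
# thinning: generate a homogeneous process at the maximum intensity,
# then keep each point with probability lambda(s) / lambda_max.
set.seed(1)
lambda     <- function(s) exp(2 + 3 * s[, 1])   # illustrative intensity
lambda_max <- exp(2 + 3)                        # bound on lambda over the square
n_max <- rpois(1, lambda_max)                   # homogeneous count (area = 1)
s     <- cbind(runif(n_max), runif(n_max))      # candidate locations
pts   <- s[runif(n_max) < lambda(s) / lambda_max, , drop = FALSE]
```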
Now, in the LGCP we add a Gaussian Random Field (also described as a Gaussian spatial field, spatial random effect, or latent spatial field; I’m not sure exactly when the difference in description leads to a distinctly different spatial model) to incorporate spatial correlation in the model of intensity: \[ \lambda(s) = \exp[\alpha + \beta'x(s) + f_s(s)] \] The spatial effect is denoted \(f_s(s)\) to distinguish it from temporal contributions. \(f_s(s)\) can be thought of as a continuous spatial effect that is evaluated at certain locations.
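For intuition about \(f_s(s)\), here's a minimal sketch that simulates a Gaussian random field on a coarse grid and adds it to the log intensity; the exponential covariance and the range value are my own illustrative choices, not anything prescribed by the LGCP.

```r
# Simulate one realisation of a Gaussian random field on a 15 x 15 grid
# via a Cholesky factor of an exponential covariance matrix.
set.seed(2)
grid  <- expand.grid(x = seq(0, 1, length.out = 15),
                     y = seq(0, 1, length.out = 15))
D     <- as.matrix(dist(grid))                      # pairwise distances
Sigma <- exp(-D / 0.2)                              # exponential covariance, range 0.2
L     <- t(chol(Sigma + 1e-8 * diag(nrow(Sigma))))  # lower Cholesky factor
f_s   <- drop(L %*% rnorm(nrow(Sigma)))             # one GRF realisation
log_lambda <- 2 + 3 * grid$x + f_s                  # log intensity with spatial effect
```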
The main thing is that, once we incorporate \(f_s(s)\), we are forced to use a covariance structure that depends on the resolution of the grid or mesh used to approximate the continuous spatial effect. Furthermore, we are still stuck with the problem of an intractable integral in the likelihood of the LGCP (as in the case of the Poisson process). The resulting computational complexity makes MCMC estimation of the model prohibitively costly in any region of practical interest, so other approximations to the posterior of the parameters in the LGCP are necessary.
2.2 INLA
The Integrated Nested Laplace Approximation (INLA) was proposed by Rue, Martino, and Chopin (2009) as a fast way to estimate models with a latent Gaussian structure, and the LGCP is certainly an instance of this class. In the following description of the model, I present it nearly identically to the original paper and to several other very well-done sources.
There is nothing new under the sun here; I’m just condensing some resources for my own reference. So, taking notation directly from the Rue paper, INLA is concerned with models whose additive predictor has the general form \[ \eta_i = \alpha + \sum^{n_f}_{j=1}f^{(j)}(u_{ij}) + \sum^{n_{\beta}}_{k=1} \beta_kz_{ki} + \epsilon_i \] where \(\alpha\) is an intercept, the \(f^{(j)}\) are unknown functions of \(\mathbf{u}\), the \(\beta_k\) are the typical fixed effects for covariates \(\mathbf{z}\), and \(\epsilon_i\) is additional noise in the process. Additionally, \(\mathbf{x}\) will refer to the vector of latent Gaussian variables (Gaussian by definition of the prior), and \(\mathbf{\theta}\) to the vector of hyperparameters that we usually deal with in a Bayesian model; the hyperparameters need not be Gaussian. In the book of Blangiardo and Cameletti (2015), the hyperparameters and latent field are denoted by \(\psi\) and \(\theta\) respectively, which is clearer than using \(\mathbf{x}\) to describe the latent field, but here I’m sticking to the original notation.
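To connect the additive predictor to the software I'll use later: in R-INLA the predictor maps directly onto a model formula, with fixed effects entered linearly and each \(f^{(j)}\) declared through `f()`. A hypothetical example (the variable names and the latent model choices are placeholders, not anything from this post's data):

```r
# Hypothetical additive predictor in R-INLA formula notation:
# intercept + two fixed effects + a smooth f of u + an iid noise term.
library(INLA)
formula <- y ~ 1 + z1 + z2 +
  f(u, model = "rw2") +    # an unknown function f^(j)(u), here a random walk
  f(idx, model = "iid")    # unstructured noise, epsilon_i
# fit <- inla(formula, family = "poisson", data = dat)
```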
Still taking the notation from the Rue paper, the joint posterior of the hyperparameters and latent variables is written as \[ \begin{aligned} \pi(\mathbf{x},\mathbf{\theta}|\mathbf{y})\propto\pi(\mathbf{\theta})\pi(\mathbf{x}|\mathbf{\theta})\prod_{i\in I}\pi(y_i|x_i,\mathbf{\theta})\\ \propto\pi(\mathbf{\theta})|Q(\mathbf{\theta})|^{1/2}\exp\bigg[-\frac{1}{2}\mathbf{x}^TQ(\mathbf{\theta})\mathbf{x}+\sum_{i\in I}\log\{\pi(y_i|x_i,\mathbf{\theta})\}\bigg], \end{aligned} \] where \(\pi(y_i|x_i,\mathbf{\theta})\) is the distribution of the response variable and \(|Q(\mathbf{\theta})|^{1/2}\exp\bigg[-\frac{1}{2}\mathbf{x}^TQ(\mathbf{\theta})\mathbf{x}\bigg]\) is the Gaussian prior on the GMRF conditional on \(\mathbf{\theta}\), with mean 0 and precision matrix \(Q(\mathbf{\theta})\).
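As a toy check of this expression, the sketch below evaluates the unnormalised log joint posterior for a one-dimensional latent field with Poisson observations; the sizes, the priors, and the RW1 precision structure are all illustrative assumptions on my part.

```r
# Evaluate log pi(theta) + log pi(x | theta) + log pi(y | x, theta)
# (up to additive constants) for a RW1 latent field with Poisson data.
set.seed(3)
n     <- 50
theta <- 1                                           # log-precision hyperparameter
D     <- diff(diag(n))                               # (n-1) x n first-difference matrix
Q     <- exp(theta) * crossprod(D) + 1e-6 * diag(n)  # regularised RW1 precision
x     <- rnorm(n)                                    # a latent field value
y     <- rpois(n, exp(x))                            # Poisson observations
log_joint <- dnorm(theta, 0, 1, log = TRUE) +        # log pi(theta)
  0.5 * as.numeric(determinant(Q)$modulus) -         # 0.5 * log |Q(theta)|
  0.5 * drop(t(x) %*% Q %*% x) +                     # Gaussian quadratic form
  sum(dpois(y, exp(x), log = TRUE))                  # log pi(y | x, theta)
```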
The main point of INLA is to construct nested approximations of the posterior marginals of the hyperparameters and (even more importantly) of the latent field. The posterior marginal of one component of the hyperparameters is \[ \pi(\theta_j|\mathbf{y}) = \int \pi(\mathbf{\theta}|\mathbf{y})d\mathbf{\theta}_{-j}, \] where the \(-j\) components are integrated out. The posterior marginal of a component of the latent field is \[ \pi(x_i|\mathbf{y}) = \int \pi(x_i|\mathbf{\theta},\mathbf{y})\pi(\mathbf{\theta}|\mathbf{y})d\mathbf{\theta}, \] again sticking to the \(\mathbf{x}\) notation.
Following the INLA procedure, \(\pi(\theta|y)\) is approximated first, and then \(\pi(x_i|\theta,y)\); the integration needed for \(\pi(x_i|y)\) is done numerically. The approximations themselves are Laplace or Gaussian-type approximations. To approximate the first quantity, \(\pi(\theta|y)\), the Laplace approximation of Tierney and Kadane (1986) is used: \[ \tilde{\pi}(\theta|y) \propto \frac{\pi(x,\theta,y)}{\tilde{\pi}_G(x|\theta,y)}\bigg|_{x=x^*(\theta)} = \frac{\pi(y|x,\theta)\pi(x|\theta)\pi(\theta)}{\tilde{\pi}_G(x|\theta,y)}\bigg|_{x=x^*(\theta)}, \] where \(x^*(\theta)\) is the mode of the full conditional of \(x\) for a given \(\theta\). Interestingly, this uses a NESTED approximation, \(\tilde{\pi}_G(x|\theta,y)\): the Gaussian approximation to the full conditional of the latent field, which works well because the latent field is Gaussian a priori.
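To make the Tierney and Kadane idea concrete, here's a generic one-dimensional sketch: find the mode of a log-density with `optim()`, then take the approximating Gaussian's variance from the inverse of the negative Hessian at the mode. The target density is an arbitrary stand-in, not anything from INLA itself.

```r
# Laplace approximation of an arbitrary 1-d log-density: a Gaussian
# centred at the mode with curvature-matched variance.
log_dens <- function(x) dgamma(exp(x), shape = 3, rate = 2, log = TRUE) + x
fit      <- optim(0, function(x) -log_dens(x), method = "BFGS", hessian = TRUE)
mode_hat <- fit$par                       # mode of the target
sd_hat   <- sqrt(1 / fit$hessian[1, 1])   # sd from the negative inverse Hessian
curve(exp(log_dens(x)), -2, 3)                          # target density
curve(dnorm(x, mode_hat, sd_hat), add = TRUE, lty = 2)  # Laplace approximation
```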
The contribution of the paper is to then approximate \(\pi(x_i|\theta, y)\) with a Laplace Approximation (LA) as well, beyond just a simple Gaussian. The paper discusses three approximations to \(\pi(x_i|\theta, y)\): a Gaussian approximation, the full LA, and a simplified LA.
The LA approximation is written as \[ \tilde{\pi}_{LA}(x_i|\theta,y) \propto \frac{\pi(x,\theta,y)}{\tilde{\pi}_{GG}(x_{-i}|x_i,\theta,y)}\bigg|_{x_{-i}=x^*_{-i}(x_i,\theta)}. \] This can be computationally intensive because \(\tilde{\pi}_{GG}(x_{-i}|x_i,\theta,y)\) must be recomputed for each element of \(x\) and each \(\theta\): every evaluation point \(x_i\) requires relocating the conditional mode \(x^*_{-i}(x_i,\theta)\) and refactorizing the precision matrix of an \((n-1)\)-dimensional Gaussian, and this is repeated for every latent variable and every \(\theta\) configuration.
Once the approximations \(\tilde{\pi}(\theta|y)\) and \(\tilde{\pi}(x_i|\theta,y)\) are found, the marginals for each element \(x_i\) can be approximated:
\[ \tilde{\pi}(x_i|y) \approx\sum_j\tilde{\pi}(x_i|\theta^{(j)},y)\tilde{\pi}(\theta^{(j)}|y)\Delta_j \] for integration points \(\theta^{(j)}\), which are found by exploring the posterior for \(\theta\). This is done in a few different ways, for example by creating a standardized \(Z\) variable around the mode and inverse Hessian of \(\log \tilde{\pi}(\theta|y)\). There are quite a few more details to this part that I won’t discuss here, but they may come up when using the INLA software itself (choosing the integration scheme, for instance).
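Here is a toy version of this final integration step, with made-up stand-ins for the two approximated densities, just to show the mechanics of mixing the conditional marginals over a \(\theta\) grid.

```r
# Combine conditional marginals pi(x_i | theta_j, y) over integration
# points theta_j with weights Delta_j; both densities are stand-ins.
theta_grid <- seq(-2, 2, by = 0.5)          # integration points theta^(j)
delta_j    <- rep(0.5, length(theta_grid))  # grid spacing Delta_j
w          <- dnorm(theta_grid)             # stand-in for pi~(theta_j | y)
w          <- w / sum(w * delta_j)          # normalise over the grid
x_vals    <- seq(-4, 4, length.out = 200)
cond_marg <- sapply(theta_grid,             # stand-in for pi~(x_i | theta_j, y)
                    function(th) dnorm(x_vals, mean = 0.3 * th, sd = 1))
post_x    <- cond_marg %*% (w * delta_j)    # pi~(x_i | y) evaluated on x_vals
```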
2.3 SPDE connection
The Stochastic Partial Differential Equation (SPDE) approach is an innovation from Lindgren, Rue, and Lindström (2011) on the INLA method. The approach allows a continuous Gaussian Field (GF) to be represented with a Gaussian Markov Random Field (GMRF). The connection to Rue, Martino, and Chopin (2009) is that Lindgren, Rue, and Lindström (2011) linked the GMRF models of INLA to specific (continuous) GFs. As spatial statisticians, we care about the continuous case more than approximations in discrete space, and the SPDE approach connects the two.
To be slightly more specific, it turns out that a GF with Matérn covariance is the solution to a specific SPDE. The solution to the SPDE is approximated via a set of basis functions defined on a triangulation of the spatial region, and the resulting weights form a GMRF. So this approach allows us to approximate a specific GF with Matérn covariance by a GMRF.
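In the R-INLA software this is exposed through the SPDE model constructors. A minimal sketch, where the simulated coordinates, mesh settings, and PC-prior values are all illustrative assumptions:

```r
# Build a mesh over some points and define a Matern SPDE model with
# penalised-complexity priors on the range and marginal standard deviation.
library(INLA)
set.seed(4)
coords <- matrix(runif(200), ncol = 2)
mesh   <- inla.mesh.2d(loc = coords, max.edge = c(0.1, 0.3), cutoff = 0.02)
spde   <- inla.spde2.pcmatern(
  mesh,
  prior.range = c(0.1, 0.5),   # P(range < 0.1) = 0.5
  prior.sigma = c(1, 0.01)     # P(sigma > 1)  = 0.01
)
```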
2.4 Mesh
The mesh is used to project the discrete GMRF to the continuous GF using spatial weights defined at the vertices of the mesh (did I say that right?). In her tutorial, Moraga says the continuous process is estimated using a weighted average of the process at the vertices of the discrete triangulation mesh.
The projection matrix maps the GMRF from the triangulation nodes to the observation locations: each observation is a weighted average of the values at the mesh vertices, with weights taken from the corresponding row of the projection matrix, \[ S(x_i) \approx \sum^G_{g=1}A_{ig}S_g \] where \(S_g\) is the value of the field at vertex \(g\) of the \(G\) mesh nodes and \(A_{ig}\) are the projection weights.
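Here's what that looks like in R-INLA (the coordinates and mesh settings are again illustrative):

```r
# Build the projection matrix A mapping mesh vertices to observation
# locations; row i holds the weights A_ig for observation i.
library(INLA)
set.seed(5)
coords <- matrix(runif(200), ncol = 2)
mesh   <- inla.mesh.2d(loc = coords, max.edge = c(0.1, 0.3), cutoff = 0.02)
A      <- inla.spde.make.A(mesh = mesh, loc = coords)
dim(A)                     # n_obs x n_mesh_vertices
range(Matrix::rowSums(A))  # each row of weights sums to 1
```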
2.5 Off the grid
Simpson et al. (2016) developed an approach for approximating the likelihood of the LGCP within the INLA framework, using a random field \[ Z(s) = \sum^n_{i=1}z_i\phi_i(s) \] where the \(\phi_i\) are basis functions defined on the triangulation and the \(z_i\) are Gaussian weights.
2.6 Concluding the section
That was a bit of a longer-than-intended dive into INLA. But INLA is definitely a more mathematically complicated method than other approximations, and I wanted to solidify the details for myself. It’s not always that way; sometimes it’s nice to jump straight into the application, but with INLA it seemed prudent to pick over the details. Now, onto using the software with some of my own data and use-cases.
Anyway, having covered the necessary bits of INLA and LGCPs, I’ll now proceed to discuss the modelling procedure using `INLA` and `inlabru`. I’ll compare the software, first for a continuous measurement and then for point data using an LGCP!