3  Poisson process theory

Published

May 24, 2024

Remember the PO data. Now let’s think about how to model it. The points you collected, they actually can be described with a very specific mathematical structure, called a Nonhomogeneous Poisson Process (NHPP). I’ll describe the mathematical details of it now.

The NHPP describes a set of points \(S\) where \(\{s_i\} \subseteq D\) with intensity \(\lambda\). Given an area or quadrat called subregion \(A\) of \(D\) of size \(|A|\) centered around site \(s_i\), we can model the number of individuals, \(N_{s}(A)\) occurring at that site with a Poisson distribution. This distribution will have mean \[ \Lambda(A) = \int_A \lambda(s)ds. \]

Typically, we model the intensity with a log-linear function, for example with parameters \(\alpha\) and \(\beta\), \[ \lambda(s) = \exp[\alpha + \beta'x(s)], \] so that the expected number of observations in region A is now \[ \Lambda(A) = \int_A\exp[\alpha + \beta'x(s)]ds. \] We quickly notice that this integral is intractable and we will need a method of numerical integration. Indeed, this will be discussed and explored here.

We now define the likelihood for the NHPP (Banerjee, 2014, p214): \[ f(s_1, s_2...s_n|N(D)=n) = \prod_i\frac{\lambda(s_i)}{(\lambda(D))^n}, \] and the joint density will be \[ f(s_1, s_2...s_n,N(D)=n) = \prod_i\frac{\lambda(s_i)}{(\lambda(D))^n}\left[(\lambda(D))^n\frac{\exp(-\lambda(D))}{n!}\right], \] where we can see the second term on the right corresponds to the Poisson likelihood for the number of total observations in space \(D\). The likelihood will then be \[ L(\lambda(s)|s_1, s_2,...,s_n) = \prod_i\lambda(s_i)\frac{\exp(-\lambda(D))}{n!}. \]

Notice that we are still working with the points \(s_i\). Consider, however, that we are also going to be dealing with environmental data with environmental covariates (remember the notes you took about the temperature??), which in many cases are going to be discretized at some resolution inherently. For example, satellite data only occurs at the resolution of the raster image. Therefore, we can think of the NHPP as a continuous limit of a discretized set of conditionally independent Poisson random variables, where the NHPP emerges as the discretization grows smaller, i.e., \(\lambda(ds), ds\to 0\). In fact, using this continuous limit is how the likelihood for the NHPP can be derived (Banerjee, 2014. p203-4). Considering that we will be working with covariates in a discretized space, the continuous framework ends up being less practically useful in a sense. However, there are various frameworks for thinking about this type of problem. I have described it quite simply so I can move on. The point being, I will now show how to move from the continuous setting to the discrete.

To move between the continuous and discrete versions of this likelihood, partition \(D\) into a grid of \(c\) cells and take the Poisson likelihood: \[\begin{equation} \label{eq:pp-l} L(\lambda) = \prod_c(\lambda(A_i))^{N(A_i)}\exp(-\lambda(A_i)). \end{equation}\] where \(N(A_i)\) is the number of points occuring in cell \(A_i\). Noticing that we can sum all the exponents, we get \[ L(\lambda) = \prod_i(\lambda(A_i))^{N(A_i)}\exp(-[\lambda(A_1) + \lambda(A_2) + \cdots+\lambda(A_c)) = \prod_i(\lambda(A_i))^{N(A_i)}\exp(-\lambda(D)], \] where \(N(A_i) \in \{0,1\}\) as \(|A_c|\to 0\). The term on the right indeed reduces to \(\eqref{eq:pp-l}\). We can then safely work with the Poisson likelihood given our grid is fine enough.

We still have, however, one issue. We originally defined the mean of the of region \(A\) according to the properties of the NHPP, \[ \Lambda(A) = \int_A\lambda(s)ds= \int_A\exp[\alpha + \beta'x(s)]ds. \] Suddenly we are saying that we can just work with the intensity for region \(A\) without clarifying what we should do about the integral over the functional for intensity. Unfortunately, while we may want the integral over \(s\), we don’t have information at the resolution of \(ds\). All we have are covariates \(\beta'X\). That is, \[ \Lambda(A) = \int_A\exp[\alpha + \beta'x(s)]ds \approx |A|\exp[\alpha + \beta'x]. \] Notice that we no longer have \(x(s)\). We assume the covariate information is relative constant over region \(A\). If this doesn’t hold, our approximation will not be correct. But, given that it is all the information we have, the assumption is built into any modelling procedure and is less an issue with the Poisson process and more just an issue of covariate resolution, which is a common modelling issue in spatial-temporal statistics. As is stated in Banerjee, 2014 (p216) “In the absence of finer covariate resolution, we cannot do better with regard to the ecological fallacy.”

For the demonstration here, we assume \(|A|=1\). However, it is important to include the area to scale the parameters appropriately, as the parameters themselves are invariant to the scale of the data. In a GLM, we call this an offset.

Given this approximation and scaling assumption, we now have \[ \Lambda(A) = \int_A\exp[\alpha + \beta'x(s)]ds \approx |A|\exp[\alpha + \beta'x] = \frac{|D|}{c}\exp[\alpha + \beta'x] \] where \(c\) is the number of cells and \(|D|\) is our total area. Then the counts in subregion \(A_i\) will be distributed as \[ N(A_i) \sim Poisson\left(\frac{|D|}{c}\exp[\alpha + \beta'x_i]\right) = Poisson\left(\exp[\alpha + \beta'x_i]\right) \] if the number of cells is equal to the area \((|A| = 1)\). Based on this, we end up with the log-likelihood for our entire presence-only dataset: \[ l(\alpha, \beta) = \sum_{i\in B}N(A_i)(\alpha + \beta'x_i) - \frac{|D|}{c}\sum_{i\in B}\exp[\alpha + \beta'x_i] - \sum_{i\in B}\log(N(A_i))! \] where \(i\in B\) refers to the set of background points \(B={1,2,...,c}\). It is necessary to use the entire set of background points in order to appropriately estimate the integral (D) = _D(s)ds from the original NHPP likelihood.

With all of that, we have defined our model and the likelihood that we can use for PO data, and we have roughly established the discretized Poisson approximation to the NHPP.