Basic Spatial Statistics Definitions

At the end of the last section, we defined a point pattern as a collection of point locations in space. In this section, we will introduce some definitions and processes that are central to the statistical study of point patterns.

To get ready for all the fun coding we are going to do, open up R and make sure that spatstat is loaded by running

library(spatstat)

Intensity

One of the most important properties of a point pattern is its intensity. Intensity is the expected number of points per unit area (or per unit volume, for 3D point patterns), and it is denoted by the Greek letter \(\lambda\). This concept will keep popping up in our analysis, and it underlies many spatial statistics methods.
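In practice we only observe one pattern, so the intensity of a homogeneous pattern is usually estimated as the observed number of points divided by the area of the window, \(\hat{\lambda} = n / |W|\). Here is that arithmetic as a small base-R sketch using six made-up coordinates in the window \([0,2] \times [0,1.5]\). For an unmarked ppp object, spatstat's intensity() function returns this same estimate.

```r
# Hypothetical coordinates of 6 points observed in the window [0, 2] x [0, 1.5]
x <- c(0.10, 0.45, 0.80, 1.20, 1.55, 1.90)
y <- c(0.20, 1.30, 0.75, 0.40, 1.10, 0.95)

# Area |W| of the rectangular window
area.W <- (2 - 0) * (1.5 - 0)

# Estimated intensity: observed number of points per unit area
lambda.hat <- length(x) / area.W
lambda.hat   # 6 points / 3 square units = 2
```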

Point Processes

In order to study point patterns formally, we need to define what is called a point process: a random mechanism that generates point patterns. Because point processes are random, 100 point patterns generated from the exact same point process could (and almost certainly will) all be different. This is exactly why we are interested in point processes: when analyzing a point pattern, we are usually looking at just one random realization of the process that generated it. If we can figure something out about the point process, then we can draw conclusions about relationships between points that belong to the process itself, and that would therefore also show up in other point patterns generated from the same process. It is important to note, however, that point processes are not always an appropriate way to study point patterns (although for our research, they usually are). For more details about when it is and isn’t appropriate to use point processes, see Spatial Point Patterns: Methods and Applications with R (SPP:MAR) pg. 142.

One of the simplest and most important examples of a point process is the Poisson process. In this process, \(n\) points (denoted \(x_1, x_2, \ldots, x_n\)) are randomly spatially distributed in a region \(W\). The number of points, \(n\), is itself random: it is drawn from a Poisson distribution with mean \(\lambda |W|\), where \(|W|\) is the area of \(W\) (if you are unfamiliar with the Poisson distribution, check out the Wikipedia page). What exactly does “randomly spatially distributed” mean? It means that the coordinates of each point \(x_i\) are drawn from a uniform distribution over \(W\), and that each point location is independent of every other point location (independence just means that the placement of each point is completely unaffected by the locations of the other points). As a result, the expected number of points in a window of area \(|W|\) is \(\lambda |W|\). To generate a realization of the Poisson process with \(\lambda = 100\) points per unit area in the 2D window \(x, y \in [0,1]\), we can run

pois.pp <- rpoispp(lambda = 100, win = owin(c(0,1), c(0,1)))

We can then look at our point pattern:

plot(pois.pp)

Bam! That’s a nice-looking point pattern. Let’s check out how many points are actually in there:

npoints(pois.pp)

## [1] 94

So, even though we specified \(\lambda = 100\), we only have 94 points, because \(n\) is drawn at random from a Poisson distribution (your own count will almost certainly differ). If you’re curious, try generating a few more realizations to see that each point pattern has different point locations and a different value of \(n\). The analogous function to rpoispp() for 3D point patterns is rpoispp3(). It is used much like rpoispp(); run ?rpoispp3 to see its help documentation for more information.
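To connect this output back to the two-step definition of the Poisson process, here is a base-R sketch (no spatstat needed) that builds realizations by hand: draw \(n\) from a Poisson distribution with mean \(\lambda |W|\), then scatter \(n\) independent uniform points over \(W\). sim.poisson() is a hypothetical helper written for this illustration; it mirrors the definition, but we are not claiming it is how rpoispp() works internally. Repeating it many times shows \(n\) changing from realization to realization while averaging close to \(\lambda |W|\).

```r
# sim.poisson() is a hypothetical helper for illustration, not a spatstat function
sim.poisson <- function(lambda, xrange = c(0, 1), yrange = c(0, 1)) {
  area.W <- diff(xrange) * diff(yrange)            # |W| for a rectangular window
  n <- rpois(1, lambda * area.W)                   # step 1: n ~ Poisson(lambda * |W|)
  data.frame(x = runif(n, xrange[1], xrange[2]),   # step 2: independent, uniform
             y = runif(n, yrange[1], yrange[2]))   #         locations over W
}

set.seed(42)
pts <- sim.poisson(lambda = 100)
plot(pts$x, pts$y, asp = 1, pch = 20, xlab = "x", ylab = "y")

# n across 1000 independent realizations: it varies, but averages about 100
counts <- replicate(1000, nrow(sim.poisson(lambda = 100)))
head(counts)
mean(counts)   # close to lambda * |W| = 100
var(counts)    # also close to 100, since a Poisson count's variance equals its mean
```

With spatstat loaded, the same experiment can be run by calling npoints() on several rpoispp() realizations.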

There are many other kinds of point processes; pretty much any procedure you can come up with for randomly generating a point pattern defines one. We will talk about a few other important ones (besides the Poisson process) later in the tutorials.

Complete Spatial Randomness (CSR)

Complete spatial randomness (CSR) is a term used to describe a point pattern that is “completely random” (what a surprise there). As it turns out, CSR is synonymous with the (homogeneous) Poisson process described above. Formally, a point pattern has CSR if it is homogeneous (described in the next section) and its point locations are independent. Because CSR and the homogeneous Poisson process are one and the same, a point pattern has CSR if and only if it is a realization of the homogeneous Poisson process.

CSR point patterns have a few nice properties:

Thinning Property: If we start with a CSR point pattern with intensity \(\lambda\) and apply random “thinning” (removing points), where each point independently has probability \(0 < p < 1\) of remaining in the pattern, then the point pattern after thinning will have CSR with intensity \(p \lambda\).
Superposition Property: If we start with two independent CSR point patterns with intensities \(\lambda_A\) and \(\lambda_B\), respectively, then their superposition (the union of both sets of points) will be a CSR point pattern with intensity \(\lambda = \lambda_A + \lambda_B\).
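Both properties are easy to check numerically. The base-R sketch below works on the unit square, so \(|W| = 1\) and expected counts equal intensities. The helpers rcsr(), thin() and superpose() are hypothetical names made up for this illustration (in spatstat itself, rthin() and superimpose() play these roles). The sketch only checks that average counts match \(p\lambda\) and \(\lambda_A + \lambda_B\), not every aspect of CSR.

```r
set.seed(2024)
# Hypothetical helper: CSR (homogeneous Poisson) pattern on the unit square
rcsr <- function(lambda) {
  n <- rpois(1, lambda)                 # |W| = 1, so the mean count is lambda
  data.frame(x = runif(n), y = runif(n))
}

# Independent thinning: each point survives with probability p
thin <- function(pts, p) pts[runif(nrow(pts)) < p, , drop = FALSE]

# Superposition: pool the points of two independent patterns
superpose <- function(a, b) rbind(a, b)

n.thin <- replicate(500, nrow(thin(rcsr(200), p = 0.25)))
n.sup  <- replicate(500, nrow(superpose(rcsr(60), rcsr(40))))
mean(n.thin)   # close to p * lambda = 0.25 * 200 = 50
mean(n.sup)    # close to lambda_A + lambda_B = 60 + 40 = 100
```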

CSR is a common null hypothesis for point pattern hypothesis testing (which we will get to later in the tutorials), and has many other uses in point pattern spatial statistics. Get familiar with it!

Homogeneity

The definition of CSR uses the term homogeneous. A point pattern is homogeneous if points are equally likely to be located anywhere in the window. This means that the intensity \(\lambda\) is constant over the entire window \(W\).

If a point pattern isn’t homogeneous, then it is inhomogeneous. Inhomogeneity lets us define another important point process: the inhomogeneous Poisson process, a slight modification of the Poisson process discussed earlier. In this modification, the intensity of points at location \(u \in W\) is given by an intensity function \(\lambda(u)\). This means that the expected number of points falling in a small pixel of area \(\Delta u\) at location \(u\) is approximately \(\lambda(u) \Delta u\), and the expected total number of points in window \(W\) is \(\int_W \lambda(u)\, du\).
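As a worked example, take the intensity function used in the simulation below, \(\lambda(x,y) = 1000x^2y^2 + 100\), on the unit square \(W = [0,1]^2\). Because the \(x^2y^2\) term factors into a product of one-dimensional integrals, the expected total number of points is

\[
\int_W \lambda(u)\,du = \int_0^1\!\!\int_0^1 \left(1000\,x^2 y^2 + 100\right) dx\,dy = 1000\left(\int_0^1 x^2\,dx\right)\left(\int_0^1 y^2\,dy\right) + 100 = \frac{1000}{9} + 100 \approx 211.1
\]

That is roughly twice the 100 points expected from a homogeneous Poisson process with \(\lambda = 100\) on the same window.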

Point patterns generated from the inhomogeneous Poisson process still have the property of independence; each point’s location is independent of every other point’s location. These point patterns also still satisfy the thinning property and the superposition property, but the new intensity functions will be \(p \lambda(u)\) and \(\lambda_A(u) + \lambda_B(u)\), respectively.

To simulate a realization of the inhomogeneous Poisson process, we first define an intensity function, and then use rpoispp:

lambda.u <- function(x,y){1000 * x^2 * y^2 + 100}
pois.inh.pp <- rpoispp(lambda = lambda.u, win = owin(c(0,1), c(0,1)))
plot(pois.inh.pp)

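The plot shows that this point pattern’s intensity varies with position in \(W\): points are relatively sparse near the lower-left corner, where \(\lambda(u) \approx 100\), and much denser near the upper-right corner, where \(\lambda(u)\) approaches 1100.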
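The thinning property also gives one standard way to simulate an inhomogeneous Poisson process by hand: simulate a homogeneous pattern with a constant intensity \(\lambda_{max}\) that is at least as large as \(\lambda(u)\) everywhere in \(W\), then keep each point at \(u\) independently with probability \(\lambda(u)/\lambda_{max}\). The base-R sketch below does this for the same lambda.u, whose maximum on the unit square is 1100 at \((1,1)\). sim.inhom() is a hypothetical helper, and we are not claiming it matches rpoispp()'s internals.

```r
# Intensity function from the example above
lambda.u <- function(x, y) { 1000 * x^2 * y^2 + 100 }
lambda.max <- 1100   # upper bound: lambda.u is largest at (1, 1)

# Hypothetical helper: inhomogeneous Poisson on the unit square by thinning
sim.inhom <- function() {
  n <- rpois(1, lambda.max * 1)                   # homogeneous pattern, |W| = 1
  x <- runif(n); y <- runif(n)
  keep <- runif(n) < lambda.u(x, y) / lambda.max  # independent thinning
  data.frame(x = x[keep], y = y[keep])
}

set.seed(7)
pts <- sim.inhom()
plot(pts$x, pts$y, asp = 1, pch = 20, xlab = "x", ylab = "y")

# Average count over many realizations: near 1000/9 + 100 (about 211)
mean(replicate(500, nrow(sim.inhom())))
```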

The inhomogeneous Poisson process is a reasonable model for many real-world point patterns whose density varies across space, as long as the points do not interact with one another.

Stationarity & Isotropy

There are a few important assumptions that are frequently made about point process models in order to perform spatial statistics:

First is stationarity, which is invariance of a point process under translation. There is a helpful description of stationarity in SPP:MAR: “Imagine a sheet of cardboard with a hole in it. When we shift the cardboard around (without changing its directional orientation), and view the point process through the hole, the statistical properties of the observable point process are the same in each position”.
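The cardboard-hole picture can be mimicked in base R: simulate many patterns, count the points visible through a \(0.5 \times 0.5\) square hole at two positions, and compare the average counts. This checks only one statistical property (the mean count), so agreement is consistent with stationarity rather than proof of it. The helpers sim.pp(), count.in() and compare() are hypothetical, and patterns are simulated by independent thinning of a homogeneous Poisson pattern. The inhomogeneous case reuses the intensity function from the inhomogeneous Poisson example above; integrating it over each square gives expected counts of about 27 and 110.

```r
set.seed(11)
# Hypothetical helper: Poisson pattern on the unit square with intensity
# function lam, simulated by thinning a homogeneous pattern of intensity lmax
sim.pp <- function(lam, lmax) {
  n <- rpois(1, lmax)
  x <- runif(n); y <- runif(n)
  keep <- runif(n) < lam(x, y) / lmax
  data.frame(x = x[keep], y = y[keep])
}

# Count the points visible through the hole [x0, x0 + 0.5) x [y0, y0 + 0.5)
count.in <- function(pts, x0, y0) {
  sum(pts$x >= x0 & pts$x < x0 + 0.5 & pts$y >= y0 & pts$y < y0 + 0.5)
}

lam.hom <- function(x, y) rep(100, length(x))     # constant intensity
lam.inh <- function(x, y) 1000 * x^2 * y^2 + 100  # spatially varying intensity

# Mean count through the hole at the lower-left and upper-right positions
compare <- function(lam, lmax) {
  c(lower.left  = mean(replicate(400, count.in(sim.pp(lam, lmax), 0, 0))),
    upper.right = mean(replicate(400, count.in(sim.pp(lam, lmax), 0.5, 0.5))))
}

hom <- compare(lam.hom, 100)    # both near 25: consistent with stationarity
inh <- compare(lam.inh, 1100)   # about 27 vs 110: clearly not stationary
hom
inh
```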

Next is isotropy, which is invariance of a point process under rotation. That is, if we rotate a point process (about the origin), we will not be able to tell the difference between the statistical properties of the rotated and original processes; no direction is special.

A point process can have either or both of these properties, and assuming both is common in many areas of spatial statistics (including those outside the study of point processes and point patterns).

Marks

Occasionally, points in a point pattern will need to be “marked” with extra information. This mark can be categorical or continuous in nature. For example, if you had a point pattern describing the locations of trees in a forest, you could mark each point with the species of tree it is (categorical) or the diameter of the tree (continuous). For APT data, marks are usually the mass/charge ratio (continuous) or chemical identity after ranging (categorical).

To add marks to a point pattern in spatstat, you use the replacement form of the marks() function, marks(X) <- values. For example, let’s first create a point pattern using the Poisson process:

pp.marked <- rpoispp(lambda = 100, win = owin(c(0,1),c(0,1)))

Now, let’s generate a categorical mark of either \(A\), \(B\), or \(C\) for each point in the pattern. We do this using the sample() function, which randomly samples as many times as we want from a vector of values (replace = TRUE allows the same value to be drawn more than once).

mks <- sample(x = c('A','B','C'), size = npoints(pp.marked), replace = TRUE)

We can then add these marks to our point pattern:

marks(pp.marked) <- mks

Now, when we plot our point pattern, we can see where the different marks are located:

plot(pp.marked)

If we ever want to check out the marks in a point pattern (if there are any), we can run:

pp.marked$marks

##  [1] "B" "B" "A" "B" "A" "C" "C" "A" "A" "C" "B" "B" "C" "C" "C" "A" "B"
## [18] "C" "A" "C" "A" "C" "B" "B" "C" "A" "C" "C" "C" "A" "A" "A" "B" "B"
## [35] "C" "C" "A" "B" "A" "C" "A" "A" "B" "C" "B" "A" "B" "B" "A" "C" "C"
## [52] "B" "B" "A" "B" "C" "B" "B" "C" "A" "C" "A" "B" "A" "A" "B" "C" "B"
## [69] "B" "B" "B" "A" "A" "B" "A" "A" "C" "B" "B" "C" "A" "C" "A" "C" "A"
## [86] "C" "B"

or alternatively:

marks(pp.marked)

##  [1] "B" "B" "A" "B" "A" "C" "C" "A" "A" "C" "B" "B" "C" "C" "C" "A" "B"
## [18] "C" "A" "C" "A" "C" "B" "B" "C" "A" "C" "C" "C" "A" "A" "A" "B" "B"
## [35] "C" "C" "A" "B" "A" "C" "A" "A" "B" "C" "B" "A" "B" "B" "A" "C" "C"
## [52] "B" "B" "A" "B" "C" "B" "B" "C" "A" "C" "A" "B" "A" "A" "B" "C" "B"
## [69] "B" "B" "B" "A" "A" "B" "A" "A" "C" "B" "B" "C" "A" "C" "A" "C" "A"
## [86] "C" "B"
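Before any formal analysis, it usually helps to summarize the marks. The base-R sketch below uses simulated, made-up marks for a hypothetical 87-point pattern: a frequency table for categorical marks and summary() for a continuous mark such as a tree diameter. The same table() and summary() calls work directly on marks(pp.marked). Storing categorical marks as a factor (rather than the character vector sample() returns) is, as far as we know, what spatstat expects before it treats a pattern as multitype; treat that as an assumption and check the spatstat documentation.

```r
set.seed(3)
n <- 87  # hypothetical number of points, chosen to match the output above

# Categorical marks, stored as a factor with a fixed set of levels
mks <- factor(sample(x = c('A', 'B', 'C'), size = n, replace = TRUE),
              levels = c('A', 'B', 'C'))
table(mks)       # how many points carry each mark

# Continuous marks, e.g. hypothetical tree diameters (cm) between 10 and 50
diam <- runif(n, min = 10, max = 50)
summary(diam)
```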

We will get into statistics that you can perform with marks later in the tutorials, but if you are interested, you can also check out SPP:MAR pg. 638.