Most species distribution modelling approaches assume that occurrence records are unbiased, which is rarely the case. While model-based control can alleviate some of the effects of sampling bias, it is often desirable to account for some sampling biases through spatial thinning (Aiello‐Lammens et al. 2015). This approach rests on the assumption that oversampled grid cells contribute little more than bias, rather than strengthening any environmental responses. This function provides several methods for thinning occurrence records. Note that thinning removes data prior to any estimation, so it should be used with care (see also Steen et al. 2021).

Usage

thin_observations(
  df,
  background,
  env = NULL,
  method = "random",
  minpoints = 10,
  mindistance = NULL,
  zones = NULL,
  verbose = TRUE
)

Arguments

df

An sf or data.frame object with observed occurrence points. All methods treat presence-only and presence-absence occurrence points equally.

background

A SpatRaster object with the background of the study region. Used for assessing point density.

env

A SpatRaster object with environmental covariates. Needed when method is set to "environmental" or "bias" (Default: NULL).

method

A character naming the thinning method to be applied; see Details for the available options (Default: "random").

minpoints

A numeric giving the minimum number of data points to retain (Default: 10).

mindistance

A numeric for the minimum distance of neighbouring observations (Default: NULL).

zones

A SpatRaster to be supplied when method is set to "zones" (Default: NULL).

verbose

A logical indicating whether to print some statistics about the thinning outcome (Default: TRUE).

Details

Currently implemented thinning methods:

  • "random": Samples at random up to "minpoints" observations across all occupied grid cells. Does not account for any spatial or environmental distance between observations.

  • "bias": This option explicitly removes only those points that are considered biased according to the layer supplied via "env". Points are preferentially thinned from grid cells that are among the 25% most biased (larger values are assumed to indicate greater bias) and that have a high point density. Thins the observations down to "minpoints".

  • "zones": Ensures that at most "minpoints" observations fall into each occupied zone. Careful: If the zones are relatively wide, this can remove quite a few observations.

  • "environmental": This approach creates an observation-wide clustering (k-means) under the assumption that the full environmental niche has been comprehensively sampled and is covered by the covariates provided in "env". A number of observations equal to "minpoints" is then obtained for each cluster.

  • "spatial": Calculates the spatial distance between all observations. Points are then removed iteratively until the minimum distance between the remaining points is reached. The "mindistance" parameter has to be set for this method to work.
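The logic behind the "spatial" and "environmental" methods can be illustrated with a minimal base R sketch. This is not the code used by thin_observations(): the helper names thin_by_distance() and thin_by_environment(), the greedy removal rule, the scaling of covariates and the number of clusters (centers) are all assumptions made for illustration. Coordinates are assumed to be planar, with "mindistance" expressed in the same units.

```r
# Hypothetical helper (not part of the package): greedy distance-based thinning.
# xy: two-column matrix of planar coordinates.
# Returns the row indices of the retained points.
thin_by_distance <- function(xy, mindistance) {
  keep <- seq_len(nrow(xy))
  repeat {
    if (length(keep) < 2) break
    d <- as.matrix(dist(xy[keep, , drop = FALSE]))
    diag(d) <- Inf                       # ignore self-distances
    n_close <- rowSums(d < mindistance)  # neighbours closer than allowed
    if (all(n_close == 0)) break         # all remaining pairs are far enough apart
    # Assumed rule: drop the point with the most close neighbours (first on ties)
    keep <- keep[-which.max(n_close)]
  }
  keep
}

# Hypothetical helper (not part of the package): k-means based thinning.
# env_values: numeric matrix of covariate values, one row per observation.
# Returns the row indices of up to `minpoints` observations per cluster.
thin_by_environment <- function(env_values, minpoints, centers = 3) {
  cl <- stats::kmeans(scale(env_values), centers = centers)$cluster
  idx <- lapply(split(seq_along(cl), cl), function(i) {
    if (length(i) <= minpoints) i else sample(i, minpoints)
  })
  sort(unlist(idx, use.names = FALSE))
}

# Simulated data for demonstration only
set.seed(42)
xy <- cbind(x = runif(200), y = runif(200))
kept <- thin_by_distance(xy, mindistance = 0.1)

env_values <- cbind(temp = rnorm(200), prec = rnorm(200))
kept_env <- thin_by_environment(env_values, minpoints = 10, centers = 3)
```

After thinning, no two points indexed by kept are closer than 0.1 units, and kept_env holds at most 10 observations from each of the 3 clusters. Removing the most crowded point first tends to retain more observations than removing points at random, but other removal rules are equally plausible.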

References

  • Aiello‐Lammens, M. E., Boria, R. A., Radosavljevic, A., Vilela, B., & Anderson, R. P. (2015). spThin: an R package for spatial thinning of species occurrence records for use in ecological niche models. Ecography, 38(5), 541-545.

  • Steen, V. A., Tingley, M. W., Paton, P. W., & Elphick, C. S. (2021). Spatial thinning and class balancing: Key choices lead to variation in the performance of species distribution models with citizen science data. Methods in Ecology and Evolution, 12(2), 216-226.

Examples

if (FALSE) {
 # `points`, `background` and `bias` are assumed to exist already
 # Thin observations at random
 thin_points <- thin_observations(points, background, method = "random")
 # Thin observations using a bias layer
 thin_points <- thin_observations(points, background, method = "bias", env = bias)
}