
Functionality for geographic and environmental thinning
Source: R/utils-spatial.R (thin_observations.Rd)
Most species distribution modelling approaches assume that occurrence records are unbiased, which is rarely the case. While model-based control can alleviate some of the effects of sampling bias, it is often desirable to account for some sampling bias through spatial thinning (Aiello‐Lammens et al. 2015). Thinning rests on the assumption that over-sampled grid cells contribute little more than bias and do not strengthen any estimated environmental responses. This function provides several methods for spatial thinning. Note that thinning effectively removes data prior to any estimation, so it should be used with care (see also Steen et al. 2021).
Usage
thin_observations(
  data,
  background,
  env = NULL,
  method = "random",
  remainpoints = 10,
  mindistance = NULL,
  zones = NULL,
  probs = 0.75,
  global = TRUE,
  centers = NULL,
  verbose = TRUE
)
Arguments
- data
  An sf object with observed occurrence points. All methods treat presence-only and presence-absence occurrence points equally.
- background
  A SpatRaster object with the background of the study region. Used for assessing point density.
- env
  A SpatRaster object with environmental covariates. Needed when method is set to "environmental" or "bias" (Default: NULL).
- method
  A character giving the thinning method to be applied (Default: "random").
- remainpoints
  A numeric giving the minimum number of data points to remain (Default: 10).
- mindistance
  A numeric giving the minimum distance between neighbouring observations (Default: NULL).
- zones
  A SpatRaster to be supplied when the option "zones" is chosen (Default: NULL).
- probs
  A numeric used as the quantile threshold in the "bias" method (Default: 0.75).
- global
  A logical indicating whether the "bias" method takes its quantile threshold from global (entire env raster) or local (extracted at point locations) bias values (Default: TRUE).
- centers
  A numeric used as the number of centers for the "environmental" method (Default: NULL). If not set, it is automatically set to three or nlayers - 1, whichever is larger.
- verbose
  A logical indicating whether to print some statistics about the thinning outcome (Default: TRUE).
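The following base R sketch illustrates what the descriptions of probs, global and centers imply. It is not the package's implementation: the bias values, the cell indices and the helper default_centers() are hypothetical stand-ins invented for illustration.

```r
set.seed(1)

# Hypothetical bias surface: one value per raster cell (stand-in for `env`).
bias_all <- runif(1000)

# Hypothetical cell index of each of 50 observations.
obs_cells <- sample.int(1000, 50)
bias_at_points <- bias_all[obs_cells]

probs <- 0.75

# global = TRUE: the quantile threshold is taken over the entire layer.
thr_global <- quantile(bias_all, probs = probs, names = FALSE)

# global = FALSE: the threshold is taken over values at the point locations.
thr_local <- quantile(bias_at_points, probs = probs, names = FALSE)

# Observations above the threshold are candidates for thinning.
biased_global <- bias_at_points > thr_global
biased_local <- bias_at_points > thr_local

# Assumed reading of the `centers` default: three or nlayers - 1,
# whichever is larger (hypothetical helper, not a package function).
default_centers <- function(nlayers) max(3, nlayers - 1)
default_centers(2)
default_centers(6)
```

With a local threshold, a fixed share of the observations is always flagged, whereas a global threshold flags only those observations that fall in the most biased part of the whole study region.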
Details
All methods only remove points from "over-sampled" grid cells/areas. These are
defined as all cells/areas which either have more points than remainpoints or
more points than the global minimum point count per cell/area (whichever is larger).
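The over-sampling rule above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: cell_id is a made-up vector of grid cell indices per observation, and the thinning step simply draws remainpoints records at random from each over-sampled cell.

```r
set.seed(42)

# Hypothetical toy data: grid cell index of each observation.
cell_id <- c(rep(1, 25), rep(2, 12), rep(3, 4), rep(4, 2))
remainpoints <- 10

# Number of observations per cell.
counts <- table(cell_id)

# Threshold: the larger of remainpoints and the global minimum count per cell.
threshold <- max(remainpoints, min(counts))

# Cells with more points than the threshold count as over-sampled.
oversampled <- names(counts)[counts > threshold]

# Keep `remainpoints` random records in over-sampled cells, all records elsewhere.
keep <- unlist(lapply(names(counts), function(cl) {
  idx <- which(cell_id == as.integer(cl))
  if (cl %in% oversampled) idx[sample.int(length(idx), remainpoints)] else idx
}))

thinned_counts <- table(cell_id[keep])
```

Cells at or below the threshold are left untouched, so sparsely sampled areas never lose records.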
Currently implemented thinning methods:
- "random": Samples at random across all over-sampled grid cells, returning only "remainpoints" from each over-sampled cell. Does not account for any spatial or environmental distance between observations.
- "bias": Explicitly removes only points that are considered biased (based on "env"). Points are only thinned from grid cells that are above the bias quantile (larger values equal greater bias). Thins the observations, returning "remainpoints" from each over-sampled and biased cell.
- "zones": Thins observations from each zone that is above the over-sampled threshold and returns "remainpoints" for each zone. Careful: if the zones are relatively wide, this can remove quite a few observations.
- "environmental": Creates an observation-wide clustering (k-means) under the assumption that the full environmental niche has been comprehensively sampled and is covered by the provided covariates env. Each over-sampled cluster is then thinned to "remainpoints" points.
- "spatial": Calculates the spatial distance between all observations. Points are then removed iteratively until no remaining pair of points is closer than the minimum distance. The "mindistance" parameter has to be set for this method to work.
References
Aiello‐Lammens, M. E., Boria, R. A., Radosavljevic, A., Vilela, B., & Anderson, R. P. (2015). spThin: an R package for spatial thinning of species occurrence records for use in ecological niche models. Ecography, 38(5), 541-545.
Steen, V. A., Tingley, M. W., Paton, P. W., & Elphick, C. S. (2021). Spatial thinning and class balancing: Key choices lead to variation in the performance of species distribution models with citizen science data. Methods in Ecology and Evolution, 12(2), 216-226.