Functionality for geographic and environmental thinning
Source: R/utils-spatial.R (thin_observations.Rd)
For most species distribution modelling approaches it is assumed that occurrence records are unbiased, which is rarely the case. While model-based control can alleviate some of the effects of sampling bias, it is often desirable to account for some sampling biases through spatial thinning (Aiello‐Lammens et al. 2015). This approach rests on the assumption that oversampled grid cells contribute little more than bias, rather than strengthening any environmental responses. This function provides several methods for applying spatial thinning. Note that thinning removes data prior to any estimation, so its use should be considered with care (see also Steen et al. 2021).
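The notion of an "oversampled grid cell" can be made concrete by counting observations per cell. The following is a minimal base-R sketch, not the package's implementation. It assumes planar coordinates in columns x and y, and a regular grid anchored at the origin with a hypothetical cell size "res" that stands in for the "background" raster. The helper cell_counts() is hypothetical.

```r
# Assign each occurrence point to a cell of a regular grid and count points
# per cell. Assumption: planar x/y coordinates and a grid anchored at (0, 0)
# with a hypothetical cell size 'res' (a stand-in for the background raster).
cell_counts <- function(df, res) {
  col <- floor(df$x / res)
  row <- floor(df$y / res)
  cell_id <- paste(col, row, sep = "_")
  counts <- table(cell_id)
  # Named integer vector: number of points in each occupied cell
  stats::setNames(as.integer(counts), names(counts))
}

pts <- data.frame(
  x = c(0.1, 0.2, 0.3, 0.4, 2.5, 5.5),
  y = c(0.1, 0.3, 0.2, 0.4, 2.5, 5.5)
)
counts <- cell_counts(pts, res = 1)
print(counts)
# Cells holding more points than a chosen cap are candidates for thinning
names(counts)[counts > 2]
```

Here four of the six points share one cell. Under the assumption above, those points would add density in that cell rather than new environmental information.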
Usage
thin_observations(
  df,
  background,
  env = NULL,
  method = "random",
  minpoints = 10,
  mindistance = NULL,
  zones = NULL,
  verbose = TRUE
)
Arguments
- df
An sf or data.frame object with observed occurrence points. All methods treat presence-only and presence-absence occurrence points equally.
- background
A SpatRaster object with the background of the study region. Used for assessing point density.
- env
A SpatRaster object with environmental covariates. Needed when method is set to "environmental" or "bias" (Default: NULL).
- method
A character giving the thinning method to apply, one of "random", "bias", "zones", "environmental" or "spatial" (Default: "random").
- minpoints
A numeric giving the minimum number of data points to retain (Default: 10).
- mindistance
A numeric giving the minimum distance between neighbouring observations. Required when method is set to "spatial" (Default: NULL).
- zones
A SpatRaster to be supplied when method is set to "zones" (Default: NULL).
- verbose
A logical indicating whether to print some statistics about the thinning outcome (Default: TRUE).
Details
Currently implemented thinning methods:
- "random": Samples at random up to "minpoints" observations across all occupied grid cells. Does not account for any spatial or environmental distance between observations.
- "bias": Explicitly removes only points that are considered biased, as indicated by the covariate supplied via "env". Points are preferentially thinned from grid cells that are among the 25% most biased (larger values are assumed to indicate greater bias) and have a high point density. Thins the observations up to "minpoints".
- "zones": Ensures that at most "minpoints" observations are retained within each occupied zone (supplied via "zones"). Careful: if the zones are relatively large, this can remove a substantial number of observations.
- "environmental": Creates an observation-wide clustering (k-means) under the assumption that the full environmental niche has been comprehensively sampled and is covered by the provided covariates "env". A number of observations equal to "minpoints" is then retained for each cluster.
- "spatial": Calculates the spatial distance between all observations. Points are then removed iteratively until no two remaining observations are closer than "mindistance". The "mindistance" parameter has to be set for this method to work.
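The "random" method can be pictured as a random per-cell cap. The base-R sketch below is illustrative, not the package code. It assumes each observation has already been assigned a grid-cell identifier. It reads "minpoints" as the maximum number of points kept per occupied cell, which is one interpretation of the description above. The helper thin_random_per_cell() is hypothetical.

```r
# Randomly keep at most 'minpoints' observations within each occupied cell.
# Assumption: 'cell_id' gives the grid cell (or zone) of each observation.
thin_random_per_cell <- function(cell_id, minpoints = 10) {
  keep <- integer(0)
  for (cid in unique(cell_id)) {
    idx <- which(cell_id == cid)
    if (length(idx) > minpoints) {
      # sample.int avoids the sample(x) pitfall when x has length one
      idx <- idx[sample.int(length(idx), minpoints)]
    }
    keep <- c(keep, idx)
  }
  sort(keep)  # row indices of the retained observations
}

set.seed(42)
cell_id <- c(rep("a", 8), rep("b", 2), "c")
kept <- thin_random_per_cell(cell_id, minpoints = 3)
table(cell_id[kept])
```

Passing zone identifiers instead of grid-cell identifiers gives a per-zone cap, which roughly corresponds to the behaviour described for the "zones" method.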
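Distance-based thinning as described for the "spatial" method can be sketched in base R. The sketch assumes Euclidean distances on planar coordinates. The rule for which point to drop is also an assumption: at each step it removes the point with the most neighbours closer than "mindistance", with ties going to the first such point. The description above does not specify this rule. The helper thin_spatial() is hypothetical.

```r
# Iteratively drop points until no two retained points are closer than
# 'mindistance'. Assumptions: Euclidean distance on planar coordinates; at
# each step the point with the most too-close neighbours is removed (ties go
# to the first such point). This is one heuristic, not necessarily the
# package's rule.
thin_spatial <- function(x, y, mindistance) {
  d <- as.matrix(stats::dist(cbind(x, y)))
  diag(d) <- Inf  # ignore self-distances
  keep <- seq_along(x)
  repeat {
    sub <- d[keep, keep, drop = FALSE]
    too_close <- rowSums(sub < mindistance)
    if (all(too_close == 0)) break
    keep <- keep[-which.max(too_close)]
  }
  keep  # indices of retained points
}

x <- c(0, 0.5, 1, 10, 10.2)
y <- c(0, 0, 0, 10, 10)
kept <- thin_spatial(x, y, mindistance = 0.8)
kept  # returns 1 3 5
```

Dropping the most crowded point first tends to retain more observations than dropping points at random, but the result depends on the tie-breaking rule.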
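The "environmental" method clusters observations in covariate space and retains up to "minpoints" per cluster. The base-R sketch below uses stats::kmeans. Several choices are assumptions not stated above: the number of clusters "k" is supplied by the user, covariates are standardised with scale(), and observations within a cluster are sampled at random. The helper thin_environmental() is hypothetical.

```r
# Cluster observations in environmental space with k-means and keep up to
# 'minpoints' observations per cluster. Assumptions: 'env_values' is a numeric
# matrix of covariate values at each observation, the number of clusters 'k'
# is chosen by the user, and points within a cluster are sampled at random.
thin_environmental <- function(env_values, k, minpoints = 10) {
  km <- stats::kmeans(scale(env_values), centers = k, nstart = 10)
  keep <- integer(0)
  for (cl in seq_len(k)) {
    idx <- which(km$cluster == cl)
    if (length(idx) > minpoints) {
      idx <- idx[sample.int(length(idx), minpoints)]
    }
    keep <- c(keep, idx)
  }
  sort(keep)  # row indices of the retained observations
}

set.seed(1)
env_values <- cbind(
  temperature   = c(rnorm(30, 10), rnorm(5, 25)),
  precipitation = c(rnorm(30, 500, 20), rnorm(5, 1500, 20))
)
kept <- thin_environmental(env_values, k = 2, minpoints = 4)
length(kept)
```

In this toy example, the densely sampled environmental condition (30 points) is reduced to the same count as the rarely sampled one (5 points). This shows why the method assumes that the covariates cover the full niche.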
References
Aiello‐Lammens, M. E., Boria, R. A., Radosavljevic, A., Vilela, B., & Anderson, R. P. (2015). spThin: an R package for spatial thinning of species occurrence records for use in ecological niche models. Ecography, 38(5), 541-545.
Steen, V. A., Tingley, M. W., Paton, P. W., & Elphick, C. S. (2021). Spatial thinning and class balancing: Key choices lead to variation in the performance of species distribution models with citizen science data. Methods in Ecology and Evolution, 12(2), 216-226.