
Add a control to a BiodiversityDistribution object to train ensembles of small models
Source: R/add_control_esm.R
Estimating species distributions from few occurrences is challenging. Yet this is common for many rare species, for which seldom more than 5-10 occurrences are observed, and for species modelled over a relatively coarse grid (e.g., when observations are aggregated to coarse grid cells). In such cases, more complex algorithms often overfit and thus overpredict the distribution (Breiner et al. 2015, 2018).
One solution is to use ensembles of small models (ESM): multiple separate models, each fitted on a small subset of the supplied covariates. After inference, the models are combined via ensemble()
to produce a final distribution and a set of coefficients.
Several studies have shown that this is an effective way to model species with few occurrences and that it can outperform more complex models (Breiner et al. 2018; Erickson and Smith 2023).
See Details for the implementation in this package.
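As a rough illustration of the general ESM idea, the base R sketch below fits a binomial glm() to every pair of covariates on simulated data and averages the predicted probabilities with AIC-based model weights. It is not the ibis.iSDM implementation: the simulated data, the covariate names (bio1 to bio4) and the AIC weighting are assumptions chosen purely for illustration.

```r
# Hypothetical sketch of the ESM idea in base R (not the ibis.iSDM implementation).
set.seed(42)
n <- 200
covs <- data.frame(bio1 = rnorm(n), bio2 = rnorm(n),
                   bio3 = rnorm(n), bio4 = rnorm(n))

# Simulated presence/absence with few presences (low intercept)
eta <- -3 + 1.2 * covs$bio1 - 0.8 * covs$bio2
pa <- rbinom(n, size = 1, prob = plogis(eta))
df <- cbind(pa = pa, covs)

# Every combination of n_covs covariates becomes one small model
n_covs <- 2
combos <- combn(names(covs), n_covs, simplify = FALSE)

fit_small <- function(vars) {
  glm(reformulate(vars, response = "pa"), family = binomial(), data = df)
}
small_models <- lapply(combos, fit_small)

# Predicted probabilities: one column per small model
preds <- sapply(small_models, predict, newdata = df, type = "response")

# AIC-based weights (an assumption; other performance-based weights are common)
aics <- sapply(small_models, AIC)
w <- exp(-0.5 * (aics - min(aics)))
w <- w / sum(w)

# Weighted ensemble of the small-model predictions
ensemble_pred <- as.vector(preds %*% w)
```

Because the weights sum to one, the ensemble prediction stays a probability while each small model only ever estimates two coefficients plus an intercept.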
Usage
add_control_esm(x, n_covs = 2)
# S4 method for class 'BiodiversityDistribution'
add_control_esm(x, n_covs = 2)
Arguments
- x
A distribution() (i.e. BiodiversityDistribution) object.
- n_covs
A numeric value giving the number of covariates in each small model (Default: 2).
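Assuming that every combination of n_covs covariates out of the supplied predictors is fitted as a separate small model (an assumption about how combinations are enumerated), the number of small models grows quickly with n_covs. The hypothetical helper n_small_models() below makes this explicit with choose().

```r
# Hypothetical helper: number of small models if every combination of
# n_covs out of n_predictors covariates is fitted (assumed enumeration).
n_small_models <- function(n_predictors, n_covs = 2) {
  stopifnot(n_covs >= 1, n_covs <= n_predictors)
  choose(n_predictors, n_covs)
}

n_small_models(10, 2)  # 45
n_small_models(10, 3)  # 120
```

With 10 predictors, pairs already give 45 small models and triplets 120, which is one reason to keep n_covs small.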
Value
Adds a control option to a distribution object that enables the modelling of ensembles of small models in train().
Details
To make ensembles of small models (ESM) work, the distribution object is assigned a flag indicating that the model should be trained as an ensemble of small models within train(). The number of covariates in each small model can be controlled via the n_covs parameter.
After inference and prediction, the coefficients of all small models are extracted and likewise combined in an ensemble. A dummy engine_glm() model is then trained with exactly these combined coefficients for each covariate. Note that this necessarily implies that the coefficients are not directly comparable to those of a single model. Furthermore, only linear modelling is supported for (re)capturing the coefficients.
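The sketch below illustrates one way coefficients could be combined across small models: a weighted mean per covariate over the models that contain it, followed by a linear predictor computed from these fixed coefficients without any refitting. Equal model weights, the simulated data and the weighted-mean rule are assumptions; the package itself fits a dummy engine_glm() model, which is not reproduced here. The sketch also hints at why such coefficients differ from those of a single model: each covariate's value is averaged over models that omit other, possibly correlated, covariates.

```r
# Hypothetical sketch: combining small-model coefficients per covariate
# (assumes a weighted mean; not necessarily what ibis.iSDM does internally).
set.seed(1)
n <- 200
covs <- data.frame(bio1 = rnorm(n), bio2 = rnorm(n), bio3 = rnorm(n))
pa <- rbinom(n, 1, plogis(-2 + 1 * covs$bio1 - 0.5 * covs$bio2))
df <- cbind(pa = pa, covs)

# One binomial GLM per pair of covariates
combos <- combn(names(covs), 2, simplify = FALSE)
fits <- lapply(combos, function(v) glm(reformulate(v, "pa"), binomial(), df))

# Equal model weights for simplicity (an assumption)
w <- rep(1 / length(fits), length(fits))

terms_all <- c("(Intercept)", names(covs))
combined <- sapply(terms_all, function(term) {
  # Coefficient of `term` in each model; NA where the covariate was not included
  b <- sapply(fits, function(m) unname(coef(m)[term]))
  keep <- !is.na(b)
  sum(w[keep] * b[keep]) / sum(w[keep])
})

# Linear predictor from the fixed, combined coefficients (no refitting)
X <- cbind(1, as.matrix(covs))
lp <- as.vector(X %*% combined)
pred <- plogis(lp)
```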
Note
Apart from ensembles of small models, consider also filtering the predictors before supplying them to a distribution object. This can be done via the function predictor_filter().
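For intuition only, a simple greedy filter that drops covariates strongly correlated with an already retained one could look like the base R sketch below. The helper filter_correlated() and its threshold of 0.7 are hypothetical and do not describe the options of predictor_filter().

```r
# Hypothetical greedy correlation filter in base R
# (illustration only; not the predictor_filter() implementation).
filter_correlated <- function(covs, threshold = 0.7) {
  keep <- character(0)
  for (v in names(covs)) {
    # Keep v only if it is not strongly correlated with any already kept covariate
    if (length(keep) == 0 ||
        all(abs(cor(covs[[v]], covs[keep])) < threshold)) {
      keep <- c(keep, v)
    }
  }
  covs[keep]
}

set.seed(3)
a <- rnorm(100)
covs <- data.frame(a = a, b = a + rnorm(100, sd = 0.1), c = rnorm(100))
names(filter_correlated(covs))
```

Here b is nearly a copy of a and is therefore dropped, while the independent covariate c is kept.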
References
Breiner, F. T., Guisan, A., Bergamini, A., & Nobis, M. P. (2015). Overcoming limitations of modelling rare species by using ensembles of small models. Methods in Ecology and Evolution, 6(10), 1210-1218.
Breiner, F. T., Nobis, M. P., Bergamini, A., & Guisan, A. (2018). Optimizing ensembles of small models for predicting the distribution of species with few occurrences. Methods in Ecology and Evolution, 9(4), 802-808.
Erickson, K. D., & Smith, A. B. (2023). Modeling the rarest of the rare: a comparison between multi‐species distribution models, ensembles of small models, and single‐species models at extremely low sample sizes. Ecography, 2023(6), e06500.
Examples
if (FALSE) { # \dontrun{
x <- distribution(background) |>
add_biodiversity_poipa(species) |>
add_predictors(covariates) |>
add_control_esm(n_covs = 2)
} # }