This function evaluates a model based on either the fitted point data or a supplied independent point dataset. Currently only point datasets are supported. Validating integrated models requires more work.
Usage
# S4 method for ANY,character,sf,character,character
validate(mod,method,point,layer,point_column,...)
# S4 method for SpatRaster,character,sf,character
validate(mod,method,point,point_column,...)
Arguments
- mod
A fitted BiodiversityDistribution object with set predictors. Alternatively, a SpatRaster can be provided directly; in this case the point layer also needs to be provided.
- method
Should the validation be conducted on the continuous prediction or a (previously calculated) thresholded layer in binary format? Note that different metrics are computed depending on the method. See Details.
- layer
In case multiple layers exist, which one to use? (Default: 'mean').
- point
An sf object with type POINT or MULTIPOINT.
- point_column
A character vector with the name of the column containing the independent observations. (Default: 'observed').
- ...
Other parameters that are passed on. Currently unused.
Details
The 'validate' function calculates different validation metrics depending on the output type. The output metrics for each type are defined as follows:
Continuous:
- 'n' = Number of observations.
- 'rmse' = Root Mean Square Error, $$ \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_{i} - x_{i})^{2}} $$
- 'mae' = Mean Absolute Error, $$ \frac{ \sum_{i=1}^{N} |y_{i} - x_{i}| }{N} $$
- 'logloss' = Log loss, TBD
- 'normgini' = Normalized Gini index, TBD
- 'cont.boyce' = Continuous Boyce index, TBD
Here $y_i$ is the predicted value, $x_i$ the observed value and $N$ the number of observations.
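As an illustration of the two continuous formulas given above, here is a minimal base R sketch. The vectors `y` (predicted) and `x` (observed) are made-up values, and the code is only a hand calculation of the definitions; it is not the internal implementation of 'validate'.

```r
# Hypothetical predicted (y) and observed (x) values, for illustration only
y <- c(0.9, 0.2, 0.6, 0.4)
x <- c(1, 0, 1, 1)

n    <- length(x)              # 'n': number of observations
rmse <- sqrt(mean((y - x)^2))  # 'rmse': root of the mean squared difference
mae  <- mean(abs(y - x))       # 'mae': mean of the absolute differences

print(c(n = n, rmse = rmse, mae = mae))
```

Both metrics are in the units of the prediction. The RMSE is never smaller than the MAE, because squaring gives large differences more weight.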
Discrete:
- 'n' = Number of observations.
- 'auc' = Area under the curve, TBD
- 'overall.accuracy' = Overall accuracy, TBD
- 'true.presence.ratio' = True presence ratio or Jaccard index, TBD
- 'precision' = Precision, TBD
- 'sensitivity' = Sensitivity, TBD
- 'specificity' = Specificity, TBD
- 'tss' = True Skill Statistic, TBD
- 'f1' = F1 score (harmonic mean of precision and sensitivity), $$ \frac{2TP}{2TP + FP + FN} $$
- 'logloss' = Log loss, TBD
- 'expected.accuracy' = Expected accuracy, $$ \frac{TP + FP}{N} \times \frac{TP + FN}{N} + \frac{TN + FN}{N} \times \frac{TN + FP}{N} $$
- 'kappa' = Kappa value, $$ \frac{2 (TP \times TN - FN \times FP)}{(TP + FP) \times (FP + TN) + (TP + FN) \times (FN + TN)} $$
- 'brier.score' = Brier score, $$ \frac{ \sum_{i=1}^{N} (y_{i} - x_{i})^{2} }{N} $$, where $y_i$ is the predicted presence or absence and $x_i$ the observed one.
Here TP is the number of true positives, TN the number of true negatives, FP the number of false positives and FN the number of false negatives.
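The discrete formulas given above can be checked by hand with a small base R sketch. The binary vectors `y` (predicted) and `x` (observed) are hypothetical, and all variable names are chosen for this illustration only; the sketch follows the written definitions and does not claim to mirror how 'validate' computes them internally.

```r
# Hypothetical binary predictions (y) and observations (x); 1 = presence
y <- c(1, 1, 0, 0, 1, 0, 1, 0)
x <- c(1, 0, 0, 1, 1, 0, 1, 0)

# Confusion matrix counts
TP <- sum(y == 1 & x == 1)  # true positives
FP <- sum(y == 1 & x == 0)  # false positives
FN <- sum(y == 0 & x == 1)  # false negatives
TN <- sum(y == 0 & x == 0)  # true negatives
N  <- TP + FP + FN + TN

f1 <- 2 * TP / (2 * TP + FP + FN)
expected_accuracy <- ((TP + FP) / N) * ((TP + FN) / N) +
  ((TN + FN) / N) * ((TN + FP) / N)
kappa_value <- 2 * (TP * TN - FN * FP) /
  ((TP + FP) * (FP + TN) + (TP + FN) * (FN + TN))
brier_score <- mean((y - x)^2)

print(c(f1 = f1, expected.accuracy = expected_accuracy,
        kappa = kappa_value, brier.score = brier_score))
```

The variable is named `kappa_value` rather than `kappa` to avoid masking the base R function of that name. As a consistency check, the kappa formula above agrees with the common form (observed accuracy minus expected accuracy) divided by (1 minus expected accuracy).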
References
Liu, C., White, M., Newell, G., 2013. Selecting thresholds for the prediction of species occurrence with presence-only data. J. Biogeogr. 40, 778–789. https://doi.org/10.1111/jbi.12058
Hirzel, A.H., Le Lay, G., Helfer, V., Randin, C., Guisan, A., 2006. Evaluating the ability of habitat suitability models to predict species presences. Ecol. Modell. 199, 142–152.
Examples
if (FALSE) {
# Assuming that mod is a distribution object and has a thresholded layer
mod <- threshold(mod, method = "TSS")
validate(mod, method = "discrete")
}