This function conducts a model evaluation based either on the fitted point data or on any supplied independent point data. Currently only point datasets are supported; validating integrated models will require further work.
Usage
validate(
mod,
method = "continuous",
layer = "mean",
point = NULL,
point_column = "observed",
field_occurrence = NULL,
...
)
# S4 method for class 'ANY'
validate(
mod,
method = "continuous",
layer = "mean",
point = NULL,
point_column = "observed",
field_occurrence = NULL,
...
)
# S4 method for class 'SpatRaster'
validate(
mod,
method = "continuous",
layer = NULL,
point = NULL,
point_column = "observed",
field_occurrence = NULL,
...
)
Arguments
- mod
A fitted BiodiversityDistribution object with set predictors. Alternatively, a SpatRaster can be provided directly; in this case the point layer must also be provided.
- method
Should the validation be conducted on the continuous prediction ("continuous") or on a previously calculated thresholded layer in binary format ("discrete")? Different metrics are computed depending on the method. See Details.
- layer
In case multiple layers exist, which one to use? (Default: 'mean'; NULL for the SpatRaster method).
- point
A sf object of type POINT or MULTIPOINT with independent observations. If not supplied, the point data used for fitting are used instead.
- point_column
A character vector with the name of the column containing the independent observations. (Default: 'observed').
- field_occurrence
(Deprecated) A character field pointing to the name of the column with the independent observations. Identical to "point_column".
- ...
Other parameters that are passed on. Currently unused.
Details
The 'validate' function calculates different validation metrics depending on the chosen method (continuous or discrete).
The output metrics for each method are defined as follows, where TP stands for true positives, TN for true negatives, FP for false positives, FN for false negatives and N for the number of observations.
Continuous:
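- 'n' = Number of observations.
- 'rmse' = Root Mean Square Error, $$ \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_{i} - y_{i})^2} $$, where $$\hat{y}_{i}$$ is the predicted and $$y_{i}$$ the observed value.
- 'mae' = Mean Absolute Error, $$ \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_{i} - y_{i} \right| $$
- 'logloss' = Log loss, TBD.
- 'normgini' = Normalized Gini index, TBD.
- 'cont.boyce' = Continuous Boyce index (Hirzel et al. 2006), the ratio of predicted against expected frequency calculated over a moving window, $$ F_{i} = \frac{P_{i}}{E_{i}} $$, where $$ P_{i} = \frac{p_{i}}{\sum_{j=1}^{b} p_{j}} $$ and $$ E_{i} = \frac{a_{i}}{\sum_{j=1}^{b} a_{j}} $$. Here $$p_{i}$$ is the number of observed presences falling into window $$i$$, $$a_{i}$$ the number of cells falling into that window, and $$b$$ the number of windows.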
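For illustration, the continuous metrics can be sketched in base R. The helpers rmse(), mae() and boyce_index() below are hypothetical and are not the package's internal implementation. The window width (one tenth of the prediction range) and the number of windows (101) are assumptions, and the Boyce value is summarised, as is common following Hirzel et al. (2006), as the Spearman rank correlation between the predicted-to-expected ratios and the window midpoints.

```r
# Hypothetical helpers illustrating the continuous metrics (not ibis.iSDM code).
# 'pred' are predicted values, 'obs' the observed values.
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))
mae  <- function(pred, obs) mean(abs(pred - obs))

# Continuous Boyce index over a moving window (assumed defaults).
# 'pred_at_obs': predictions at the observed presence points
# 'pred_bg'    : predictions of all cells (the expected distribution)
boyce_index <- function(pred_at_obs, pred_bg, n_windows = 101) {
  rng <- range(pred_bg, na.rm = TRUE)
  width <- diff(rng) / 10                           # assumed window width
  starts <- seq(rng[1], rng[2] - width, length.out = n_windows)
  # p_i: presences per window; a_i: cells per window
  p <- sapply(starts, function(s)
    sum(pred_at_obs >= s & pred_at_obs <= s + width, na.rm = TRUE))
  a <- sapply(starts, function(s)
    sum(pred_bg >= s & pred_bg <= s + width, na.rm = TRUE))
  keep <- a > 0                                     # skip empty windows
  P <- p[keep] / sum(p[keep])                       # predicted frequency
  E <- a[keep] / sum(a[keep])                       # expected frequency
  ratio <- P / E
  mids <- starts[keep] + width / 2
  # Summarise as the Spearman correlation between ratio and window midpoint
  stats::cor(ratio, mids, method = "spearman")
}

rmse(c(1, 2, 3), c(1, 2, 5))                 # sqrt(4/3)
bg  <- seq(0, 1, length.out = 1001)          # uniform background predictions
obs <- sqrt(seq(0, 1, length.out = 200))     # presences skewed towards high values
boyce_index(obs, bg)                         # strongly positive for this pattern
```

A positive index indicates that presences occur more often than expected by chance in highly predicted areas; values near zero indicate a model no better than random.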
Discrete:
- 'n' = Number of observations.
- 'auc' = Area under the curve, i.e. the area under the receiver operating characteristic (ROC) curve, which relates the true positive rate to the false positive rate.
- 'overall.accuracy' = Overall Accuracy, the proportion of correctly classified observations, $$ \frac{TP + TN}{N} $$
- 'true.presence.ratio' = True presence ratio or Jaccard index, $$ \frac{TP}{TP+FP+FN} $$
- 'precision' = Precision or positive predictive value, $$ \frac{TP}{TP+FP} $$
- 'sensitivity' = Sensitivity, ratio of true positives against all observed positives, $$ \frac{TP}{TP+FN} $$
- 'specificity' = Specificity, ratio of true negatives against all observed negatives, $$ \frac{TN}{TN+FP} $$
- 'tss' = True Skill Statistic (Allouche et al. 2006), sensitivity + specificity - 1.
- 'f1' = F1 score, the harmonic mean of precision and sensitivity, $$ \frac{2TP}{2TP + FP + FN} $$
- 'logloss' = Log loss, TBD.
- 'expected.accuracy' = Expected Accuracy, $$ \frac{TP + FP}{N} \times \frac{TP + FN}{N} + \frac{TN + FN}{N} \times \frac{TN + FP}{N} $$
- 'kappa' = Kappa value, $$ \frac{2 (TP \times TN - FN \times FP)}{(TP + FP) \times (FP + TN) + (TP + FN) \times (FN + TN)} $$
- 'brier.score' = Brier score, $$ \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_{i} - y_{i})^{2} $$, where $$\hat{y}_{i}$$ is the predicted presence or absence and $$y_{i}$$ the observed one.
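The discrete metrics listed above can all be derived from the four cells of the confusion matrix. The base R helper discrete_metrics() below is a hypothetical sketch using the standard definitions, not the package's internal implementation; it assumes binary (0/1) predictions and observations without missing values and leaves out 'auc' and 'logloss'.

```r
# Hypothetical helper (not ibis.iSDM code): discrete validation metrics from
# binary predictions 'pred' and observations 'obs' (both coded 0/1, no NAs).
discrete_metrics <- function(pred, obs) {
  # Confusion matrix cells
  TP <- sum(pred == 1 & obs == 1)
  TN <- sum(pred == 0 & obs == 0)
  FP <- sum(pred == 1 & obs == 0)
  FN <- sum(pred == 0 & obs == 1)
  N  <- TP + TN + FP + FN
  sens <- TP / (TP + FN)
  spec <- TN / (TN + FP)
  # Agreement expected by chance, used by the kappa statistic
  ea <- ((TP + FP) / N) * ((TP + FN) / N) + ((TN + FN) / N) * ((TN + FP) / N)
  c(
    n                   = N,
    overall.accuracy    = (TP + TN) / N,
    true.presence.ratio = TP / (TP + FP + FN),   # Jaccard index
    precision           = TP / (TP + FP),
    sensitivity         = sens,
    specificity         = spec,
    tss                 = sens + spec - 1,
    f1                  = 2 * TP / (2 * TP + FP + FN),
    expected.accuracy   = ea,
    kappa               = 2 * (TP * TN - FN * FP) /
      ((TP + FP) * (FP + TN) + (TP + FN) * (FN + TN)),
    brier.score         = mean((pred - obs)^2)
  )
}

# Example: 10 sites, 4 observed presences
obs  <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
pred <- c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0)
round(discrete_metrics(pred, obs), 3)
```

In this example TP = 3, TN = 5, FP = 1 and FN = 1. The kappa formula and the more familiar form (overall accuracy - expected accuracy) / (1 - expected accuracy) agree, both giving 7/12.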
References
Allouche, O., Tsoar, A., & Kadmon, R. (2006). Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS). Journal of Applied Ecology, 43(6), 1223–1232.
Liu, C., White, M., & Newell, G. (2013). Selecting thresholds for the prediction of species occurrence with presence-only data. Journal of Biogeography, 40, 778–789. https://doi.org/10.1111/jbi.12058
Hirzel, A. H., Le Lay, G., Helfer, V., Randin, C., & Guisan, A. (2006). Evaluating the ability of habitat suitability models to predict species presences. Ecological Modelling, 199(2), 142–152.
Examples
if (FALSE) { # \dontrun{
# Assuming that mod is a fitted distribution object, first add a thresholded layer
mod <- threshold(mod, method = "TSS")
validate(mod, method = "discrete")
} # }