
This function evaluates a fitted model either against the point data used for fitting or against any supplied independent point data. Currently only point datasets are supported; validating integrated models requires further work.

Usage

validate(
  mod,
  method = "continuous",
  layer = "mean",
  point = NULL,
  point_column = "observed",
  ...
)

# S4 method for ANY
validate(
  mod,
  method = "continuous",
  layer = "mean",
  point = NULL,
  point_column = "observed",
  ...
)

# S4 method for SpatRaster
validate(
  mod,
  method = "continuous",
  layer = NULL,
  point = NULL,
  point_column = "observed",
  ...
)

Arguments

mod

A fitted BiodiversityDistribution object with predictors set. Alternatively, a SpatRaster can be supplied directly; in that case the point argument must also be provided.

method

Should the validation be conducted on the continuous prediction ("continuous") or on a previously calculated thresholded layer in binary format ("discrete")? Different metrics are computed depending on the method. See Details.

layer

If multiple layers exist, which one should be used? (Default: 'mean' for fitted models; NULL for the SpatRaster method).

point

An sf object with geometry type POINT or MULTIPOINT.

point_column

A character string giving the name of the column that contains the independent observations (Default: 'observed').

...

Other parameters that are passed on. Currently unused.

Value

Returns a tidy tibble with the validation results.

Details

The 'validate' function calculates different validation metrics depending on the output type. The output metrics for each type are defined as follows.

Continuous:

  • 'n' = Number of observations.

  • 'rmse' = Root Mean Square Error, $$ \sqrt {\frac{1}{N} \sum_{i=1}^{N} (\hat{y_{i}} - y_{i})^2} $$

  • 'mae' = Mean Absolute Error, $$ \frac{1}{N} \sum_{i=1}^{N} \lvert \hat{y_{i}} - y_{i} \rvert $$

  • 'logloss' = Log loss, TBD

  • 'normgini' = Normalized Gini index, TBD

  • 'cont.boyce' = Continuous Boyce index, TBD
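As an illustration of the continuous metrics, the following base-R sketch computes n, RMSE and MAE exactly as written above, plus a log loss. It assumes that the observed values are 0/1 and that the predictions are probabilities in [0, 1]. The log-loss form used here (binary cross-entropy with probabilities clipped by a hypothetical eps argument) is an assumption, since its definition is still TBD above. The function name continuous_metrics is hypothetical; this is not the package's own implementation.

```r
# Sketch only: continuous validation metrics for 0/1 observations and
# predicted probabilities. Hypothetical helper, not ibis.iSDM code.
continuous_metrics <- function(observed, predicted, eps = 1e-15) {
  stopifnot(length(observed) == length(predicted))
  # Keep only points with both an observation and a prediction
  ok   <- !is.na(observed) & !is.na(predicted)
  y    <- observed[ok]
  yhat <- predicted[ok]

  # Assumed log-loss form: clip probabilities away from 0 and 1
  # so that log() stays finite
  p <- pmin(pmax(yhat, eps), 1 - eps)

  data.frame(
    n       = length(y),
    rmse    = sqrt(mean((yhat - y)^2)),
    mae     = mean(abs(yhat - y)),
    logloss = -mean(y * log(p) + (1 - y) * log(1 - p))
  )
}

obs  <- c(1, 0, 1, 1, 0)
pred <- c(0.9, 0.2, 0.6, 0.8, 0.4)
continuous_metrics(obs, pred)
```

For these toy values, MAE is 1.3 / 5 = 0.26 and RMSE is the square root of 0.41 / 5.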

Discrete:

  • 'n' = Number of observations.

  • 'auc' = Area under the curve, TBD

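  • 'overall.accuracy' = Overall Accuracy, $$ \frac{TP + TN}{N} $$

  • 'true.presence.ratio' = True presence ratio or Jaccard index, TBD

  • 'precision' = Precision, $$ \frac{TP}{TP + FP} $$

  • 'sensitivity' = Sensitivity, $$ \frac{TP}{TP + FN} $$

  • 'specificity' = Specificity, $$ \frac{TN}{TN + FP} $$

  • 'tss' = True Skill Statistic, $$ \frac{TP}{TP + FN} + \frac{TN}{TN + FP} - 1 $$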

  • 'f1' = F1 score (harmonic mean of precision and sensitivity), $$ \frac{2TP}{2TP + FP + FN} $$

  • 'logloss' = Log loss, TBD

  • 'expected.accuracy' = Expected Accuracy, $$ \frac{TP + FP}{N} \times \frac{TP + FN}{N} + \frac{TN + FN}{N} \times \frac{TN + FP}{N} $$

  • 'kappa' = Kappa value, $$ \frac{2 (TP \times TN - FN \times FP)}{(TP + FP) \times (FP + TN) + (TP + FN) \times (FN + TN)} $$

  • 'brier.score' = Brier score, $$ \frac{1}{N} \sum_{i=1}^{N} (y_{i} - x_{i})^{2} $$, where $y_i$ is the predicted presence or absence and $x_i$ the observed value.

Here TP denotes true positives, TN true negatives, FP false positives and FN false negatives, and N is the number of observations.
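The confusion-matrix metrics can be sketched in base R as follows, assuming that observed and predicted are 0/1 vectors (1 = presence) at the validation points. The function discrete_metrics is hypothetical. It uses the formulas given above for F1, expected accuracy, kappa and the Brier score, and it assumes the standard textbook definitions for overall accuracy, precision, sensitivity, specificity and TSS. AUC, log loss and the true presence ratio are left out. This is not the package's own implementation.

```r
# Sketch only: discrete validation metrics from 0/1 observations and
# 0/1 predictions (1 = presence). Hypothetical helper, not ibis.iSDM code.
discrete_metrics <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted))
  # Keep only points with both an observation and a prediction
  ok  <- !is.na(observed) & !is.na(predicted)
  obs <- observed[ok]
  prd <- predicted[ok]

  # Cells of the 2 x 2 confusion matrix
  TP <- sum(prd == 1 & obs == 1)
  TN <- sum(prd == 0 & obs == 0)
  FP <- sum(prd == 1 & obs == 0)
  FN <- sum(prd == 0 & obs == 1)
  N  <- TP + TN + FP + FN

  # These evaluate to NaN (0/0) if a class is missing entirely
  sensitivity <- TP / (TP + FN)
  specificity <- TN / (TN + FP)

  data.frame(
    n                 = N,
    overall.accuracy  = (TP + TN) / N,
    precision         = TP / (TP + FP),
    sensitivity       = sensitivity,
    specificity       = specificity,
    tss               = sensitivity + specificity - 1,
    f1                = 2 * TP / (2 * TP + FP + FN),
    expected.accuracy = ((TP + FP) / N) * ((TP + FN) / N) +
                        ((TN + FN) / N) * ((TN + FP) / N),
    kappa             = 2 * (TP * TN - FN * FP) /
                        ((TP + FP) * (FP + TN) + (TP + FN) * (FN + TN)),
    brier.score       = mean((prd - obs)^2)
  )
}

obs  <- c(1, 1, 1, 0, 0, 0, 1, 0)
pred <- c(1, 1, 0, 0, 1, 0, 1, 0)
discrete_metrics(obs, pred)
```

With TP = 3, TN = 3, FP = 1 and FN = 1, expected accuracy is 0.5, and the kappa formula gives 16 / 32 = 0.5. This agrees with the usual (observed - expected) / (1 - expected) form: (0.75 - 0.5) / 0.5.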

Note

If you use the Boyce Index, please cite the original Hirzel et al. (2006) paper.
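The continuous Boyce index is not spelled out in Details. One way to sketch the moving-window approach described by Hirzel et al. (2006) works as follows. Overlapping windows slide across the range of predicted suitability. For each window, the share of presences that fall in it is divided by the share of background cells that fall in it, giving the predicted-to-expected (P/E) ratio. The index is then the Spearman rank correlation between window position and that ratio. The helper boyce_index is hypothetical, and so are its defaults (a window width of one tenth of the range and 100 window positions). They are not necessarily what validate uses.

```r
# Sketch only: continuous Boyce index via a moving window.
# `presence`   = predicted suitability at presence points
# `background` = predicted suitability of all (or sampled) raster cells
# Hypothetical helper with assumed defaults, not ibis.iSDM code.
boyce_index <- function(presence, background, n_windows = 100,
                        window_width = NULL) {
  presence   <- presence[!is.na(presence)]
  background <- background[!is.na(background)]
  lo <- min(background)
  hi <- max(background)
  # Assumed default: a window spanning one tenth of the prediction range
  if (is.null(window_width)) window_width <- (hi - lo) / 10

  # Lower bounds of overlapping windows sliding across the range
  starts <- seq(lo, hi - window_width, length.out = n_windows)

  pe_ratio <- vapply(starts, function(s) {
    # Predicted frequency: share of presences inside the window
    p <- mean(presence >= s & presence <= s + window_width)
    # Expected frequency: share of background cells inside the window
    e <- mean(background >= s & background <= s + window_width)
    if (e == 0) NA_real_ else p / e
  }, numeric(1))

  mids <- starts + window_width / 2
  keep <- !is.na(pe_ratio)
  # Spearman rank correlation between window midpoints and P/E ratios
  stats::cor(mids[keep], pe_ratio[keep], method = "spearman")
}

set.seed(1)
background <- runif(10000)       # simulated suitability of all cells
presence   <- sqrt(runif(2000))  # presences skewed towards high suitability
boyce_index(presence, background)
```

Values close to 1 indicate that presences become increasingly over-represented as suitability rises. Values near 0 or below indicate a model no better than, or worse than, random.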

References

  • Liu, C., White, M., & Newell, G. (2013). Selecting thresholds for the prediction of species occurrence with presence-only data. Journal of Biogeography, 40, 778–789. https://doi.org/10.1111/jbi.12058

  • Hirzel, A. H., Le Lay, G., Helfer, V., Randin, C., & Guisan, A. (2006). Evaluating the ability of habitat suitability models to predict species presences. Ecological Modelling, 199(2), 142–152.

Examples

if (FALSE) {
 # Assuming that mod is a distribution object and has a thresholded layer
 mod <- threshold(mod, method = "TSS")
 validate(mod, method = "discrete")
 }
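The example assumes that a thresholded layer was created with threshold(mod, method = "TSS"). The following base-R sketch illustrates the idea behind such a cut-off; it does not show how the package's threshold() is implemented. It scans the observed prediction values as candidate thresholds and keeps the one that maximizes TSS = sensitivity + specificity - 1, a criterion of the kind compared by Liu et al. (2013). The helper tss_threshold and the toy data are hypothetical.

```r
# Sketch only: pick the cut-off maximising TSS for continuous predictions
# at points with known presence (1) / absence (0).
# Hypothetical helper, not the ibis.iSDM threshold() function.
tss_threshold <- function(observed, predicted) {
  ok <- !is.na(observed) & !is.na(predicted)
  y  <- observed[ok]
  p  <- predicted[ok]
  # Every distinct predicted value is a candidate cut-off
  candidates <- sort(unique(p))
  tss <- vapply(candidates, function(t) {
    pred_bin <- as.integer(p >= t)
    sens <- sum(pred_bin == 1 & y == 1) / sum(y == 1)
    spec <- sum(pred_bin == 0 & y == 0) / sum(y == 0)
    sens + spec - 1
  }, numeric(1))
  best <- which.max(tss)  # first maximum if several cut-offs tie
  list(threshold = candidates[best], tss = tss[best])
}

obs  <- c(1, 1, 1, 0, 0, 0, 1)
pred <- c(0.9, 0.7, 0.6, 0.5, 0.2, 0.1, 0.3)
res  <- tss_threshold(obs, pred)
# Binarise with the chosen cut-off before a discrete validation
binary <- as.integer(pred >= res$threshold)
```

Here the cut-off 0.6 gives sensitivity 3/4 and specificity 1, so TSS is 0.75. The binarised values can then be compared with the observations using the discrete metrics described under Details.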