
This function evaluates a model using either the fitted point data or any supplied independent point data. Currently only point datasets are supported; validation of integrated models requires further work.

Usage

validate(
  mod,
  method = "continuous",
  layer = "mean",
  point = NULL,
  point_column = "observed",
  field_occurrence = NULL,
  ...
)

# S4 method for class 'ANY'
validate(
  mod,
  method = "continuous",
  layer = "mean",
  point = NULL,
  point_column = "observed",
  field_occurrence = NULL,
  ...
)

# S4 method for class 'SpatRaster'
validate(
  mod,
  method = "continuous",
  layer = NULL,
  point = NULL,
  point_column = "observed",
  field_occurrence = NULL,
  ...
)

Arguments

mod

A fitted BiodiversityDistribution object with set predictors. Alternatively, a SpatRaster can be supplied directly, in which case the point layer must also be provided.

method

Should the validation be conducted on the continuous prediction ('continuous', the default) or on a previously calculated thresholded layer in binary format ('discrete')? Different metrics are computed depending on the method. See Details.

layer

If multiple layers exist, which one should be used? (Default: 'mean').

point

An sf object with geometry type POINT or MULTIPOINT.

point_column

A character vector with the name of the column containing the independent observations. (Default: 'observed').

field_occurrence

(Deprecated) A character field giving the name of the column with the independent observations. Identical to "point_column", which should be used instead.

...

Other parameters that are passed on. Currently unused.

Value

Returns a tidy tibble with the validation results.

Details

The 'validate' function calculates different validation metrics depending on the chosen method ('continuous' or 'discrete').

The output metrics for each type are defined as follows, where TP stands for true positives, TN for true negatives, FP for false positives and FN for false negatives.

Continuous:

  • 'n' = Number of observations.

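  • 'rmse' = Root Mean Square Error, $$ \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_{i} - y_{i})^{2}} $$, where $$\hat{y}_{i}$$ is the predicted and $$y_{i}$$ the observed value at point i, and N is the number of observations.

  • 'mae' = Mean Absolute Error, $$ \frac{1}{N} \sum_{i=1}^{N} |\hat{y}_{i} - y_{i}| $$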

  • 'logloss' = Log loss, TBD

  • 'normgini' = Normalized Gini index, TBD

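  • 'cont.boyce' = Continuous Boyce index, based on the ratio of predicted to expected frequency of presences, calculated over a moving window of suitability classes: $$ F_{i} = \frac{P_{i}}{E_{i}} $$, where $$ P_{i} = \frac{p_{i}}{\sum_{j=1}^{b} p_{j}} $$ is the predicted frequency of validation presences $$p_{i}$$ in class i, $$ E_{i} = \frac{a_{i}}{\sum_{j=1}^{b} a_{j}} $$ is the expected frequency given the area $$a_{i}$$ (number of cells) of class i, and b is the number of classes.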
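The continuous metrics above can be sketched in base R as follows. This is a minimal illustration, not the ibis.iSDM implementation: `continuous_metrics()` and `boyce_fixed_bins()` are hypothetical helpers, and the Boyce sketch deliberately uses fixed, equal-width suitability bins instead of the moving window described above, summarising the $$F_{i}$$ ratios with a Spearman rank correlation against bin suitability as in Hirzel et al. (2006).

```r
# Hypothetical helper (not part of ibis.iSDM): RMSE and MAE for predictions
# and observations taken at the same points.
continuous_metrics <- function(pred, obs) {
  stopifnot(length(pred) == length(obs))
  ok <- !is.na(pred) & !is.na(obs)  # drop points lacking a prediction or observation
  pred <- pred[ok]
  obs <- obs[ok]
  data.frame(
    n    = length(obs),
    rmse = sqrt(mean((pred - obs)^2)),
    mae  = mean(abs(pred - obs))
  )
}

# Hypothetical, simplified Boyce index with fixed equal-width bins
# (assumption: the documented index uses a moving window instead).
boyce_fixed_bins <- function(pred_presence, pred_background, nbins = 10) {
  breaks <- seq(min(pred_background), max(pred_background), length.out = nbins + 1)
  p <- table(cut(pred_presence, breaks, include.lowest = TRUE))    # presences per class
  a <- table(cut(pred_background, breaks, include.lowest = TRUE))  # cells per class
  P <- as.numeric(p) / sum(p)  # predicted frequency P_i
  E <- as.numeric(a) / sum(a)  # expected frequency E_i
  keep <- E > 0                # classes without cells have no defined ratio
  F_ratio <- P[keep] / E[keep]
  mids <- (head(breaks, -1) + tail(breaks, -1)) / 2
  # Spearman correlation between F_i and class suitability
  cor(F_ratio, mids[keep], method = "spearman")
}
```

A value close to 1 indicates that presences become increasingly over-represented at higher predicted suitability.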

Discrete:

  • 'n' = Number of observations.

  • 'auc' = Area under the curve, i.e. the integral of the receiver operating characteristic (ROC) curve, which relates the true positive rate to the false positive rate.

  • 'overall.accuracy' = Overall accuracy, the proportion of correctly classified observations, $$ \frac{TP + TN}{n} $$

  • 'true.presence.ratio' = True presence ratio or Jaccard index, $$ \frac{TP}{TP+FP+FN} $$

  • 'precision' = Precision, or positive predictive value, $$ \frac{TP}{TP+FP} $$

  • 'sensitivity' = Sensitivity, the ratio of true positives to all observed positives, $$ \frac{TP}{TP+FN} $$

  • 'specificity' = Specificity, the ratio of true negatives to all observed negatives, $$ \frac{TN}{TN+FP} $$

  • 'tss' = True Skill Statistic, sensitivity + specificity - 1

  • 'f1' = F1 score, the harmonic mean of precision and sensitivity, $$ \frac{2TP}{2TP + FP + FN} $$

  • 'logloss' = Log loss, TBD

  • 'expected.accuracy' = Expected accuracy, $$ \frac{TP + FP}{N} \times \frac{TP + FN}{N} + \frac{TN + FN}{N} \times \frac{TN + FP}{N} $$

  • 'kappa' = Kappa value, $$ \frac{2 (TP \times TN - FN \times FP)}{(TP + FP) \times (FP + TN) + (TP + FN) \times (FN + TN)} $$

  • 'brier.score' = Brier score, $$ \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_{i} - y_{i})^{2} $$, where $$\hat{y}_{i}$$ is the predicted and $$y_{i}$$ the observed presence or absence.
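The confusion-matrix metrics above can be sketched in base R from binary (0/1) predictions and observations. This is a minimal illustration, not the ibis.iSDM implementation: `discrete_metrics()` is a hypothetical helper, it returns a base data.frame rather than a tibble, AUC and log loss are omitted, and kappa is computed in the equivalent form $$ (p_{o} - p_{e}) / (1 - p_{e}) $$ from overall and expected accuracy.

```r
# Hypothetical helper (not part of ibis.iSDM): discrete validation metrics
# from binary predictions and observations coded as 0/1.
discrete_metrics <- function(pred, obs) {
  stopifnot(length(pred) == length(obs), all(c(pred, obs) %in% c(0, 1)))
  tp <- sum(pred == 1 & obs == 1)
  tn <- sum(pred == 0 & obs == 0)
  fp <- sum(pred == 1 & obs == 0)
  fn <- sum(pred == 0 & obs == 1)
  n  <- tp + tn + fp + fn
  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)
  overall <- (tp + tn) / n
  # Agreement expected by chance, from the row and column totals
  expected <- ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
  data.frame(
    n = n,
    overall.accuracy = overall,
    true.presence.ratio = tp / (tp + fp + fn),  # Jaccard index
    precision = tp / (tp + fp),
    sensitivity = sensitivity,
    specificity = specificity,
    tss = sensitivity + specificity - 1,
    f1 = 2 * tp / (2 * tp + fp + fn),
    expected.accuracy = expected,
    kappa = (overall - expected) / (1 - expected),
    brier.score = mean((pred - obs)^2)
  )
}
```

For example, `discrete_metrics(pred = c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0), obs = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0))` corresponds to TP = 3, FN = 1, FP = 1 and TN = 5.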

Note

If you use the Boyce Index, please cite the original Hirzel et al. (2006) paper.

References

  • Allouche, O., Tsoar, A., & Kadmon, R. (2006). Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS). Journal of Applied Ecology, 43(6), 1223–1232.

  • Liu, C., White, M., & Newell, G. (2013). Selecting thresholds for the prediction of species occurrence with presence-only data. Journal of Biogeography, 40, 778–789. https://doi.org/10.1111/jbi.12058

  • Hirzel, A. H., Le Lay, G., Helfer, V., Randin, C., & Guisan, A. (2006). Evaluating the ability of habitat suitability models to predict species presences. Ecological Modelling, 199(2), 142–152.

Examples

if (FALSE) { # \dontrun{
 # Assuming that mod is a fitted distribution object; threshold() adds the binary layer used by method = "discrete"
 mod <- threshold(mod, method = "TSS")
 validate(mod, method = "discrete")
 } # }