
Allows the estimation of eXtreme Gradient Boosting (XGBoost) models with tree-based or linear boosters. The XGBoost engine is a flexible yet powerful engine with many customization options, supporting single-class and multi-class regression and classification tasks. For a full list of options, users are advised to consult the xgboost::xgb.train help file and https://xgboost.readthedocs.io.

Usage

engine_xgboost(
  x,
  booster = "gbtree",
  iter = 8000L,
  learning_rate = 0.001,
  gamma = 6,
  reg_lambda = 0,
  reg_alpha = 0,
  max_depth = 2,
  subsample = 0.75,
  colsample_bytree = 0.4,
  min_child_weight = 3,
  nthread = getOption("ibis.nthread"),
  ...
)

Arguments

x

A distribution() (i.e. BiodiversityDistribution) object.

booster

A character specifying the booster to use, either "gbtree" or "gblinear" (Default: "gbtree").

iter

numeric value giving the maximum number of boosting iterations for cross-validation (Default: 8000L).

learning_rate

numeric value indicating the learning rate (eta). Lower values are generally better but computationally more costly (Default: 0.001).

gamma

numeric A regularization parameter in the model; larger values make the algorithm more conservative (Default: 6). Also see the "reg_lambda" parameter for the L2 regularization on the weights.

reg_lambda

numeric L2 regularization term on weights (Default: 0).

reg_alpha

numeric L1 regularization term on weights (Default: 0).

max_depth

numeric The maximum depth of a tree (Default: 2).

subsample

numeric The ratio used for subsampling to prevent overfitting. Also used for creating a random testing dataset (Default: 0.75).

colsample_bytree

numeric Sub-sample ratio of columns when constructing each tree (Default: 0.4).

min_child_weight

numeric The minimum sum of instance weights (hessian) needed in a child node; broadly related to the number of instances necessary for each node (Default: 3).

nthread

numeric The number of CPU threads to use (Default: getOption("ibis.nthread")).

...

Other, not further specified parameters.

Value

An Engine.

Details

The default parameters have been set relatively conservatively so as to reduce overfitting.
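
For instance, a deliberately less conservative configuration could raise the learning rate and tree depth while lowering the number of iterations. The values below are purely illustrative and assume the same background object used in the Examples section:

# Illustrative values only: a faster, less regularized configuration.
# A larger learning rate typically requires fewer boosting iterations.
x <- distribution(background) |>
  engine_xgboost(
    booster = "gbtree",
    iter = 2000L,
    learning_rate = 0.01,
    gamma = 3,
    max_depth = 4
  )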

XGBoost supports the specification of monotonic constraints on certain variables, as sketched below. Within ibis this is possible via XGBPrior. However, such constraints are available only for the "gbtree" base learners.
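
As a minimal sketch, a monotonic constraint could be attached before the engine is set. This assumes that XGBPrior() takes a variable name plus a monotonicity hint and that priors are collected and added via priors() and add_priors(); consult the XGBPrior help file for the exact signature. The covariate "forest.cover" is hypothetical:

# Illustrative sketch only; see ?XGBPrior for the exact interface.
p <- priors(XGBPrior("forest.cover", hint = "increasing"))  # hypothetical covariate

x <- distribution(background) |>
  add_priors(p) |>
  engine_xgboost(booster = "gbtree")  # constraints require the tree booster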

Note

"Machine learning is statistics minus any checking of models and assumptions" ~ Brian D. Ripley, useR! 2004, Vienna

References

  • Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System", 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016, https://arxiv.org/abs/1603.02754

Examples

if (FALSE) {
# Add xgboost as an engine
x <- distribution(background) |> engine_xgboost(iter = 4000)
}
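
Because the linear booster fits a regularized linear model rather than an ensemble of trees, tree-specific settings such as max_depth, gamma, colsample_bytree and min_child_weight have no effect when booster = "gblinear"; regularization is instead controlled through reg_lambda and reg_alpha. A sketch with illustrative values:

if (FALSE) {
# Linear booster: tree-specific parameters are ignored, so use the
# L1/L2 regularization terms instead.
x <- distribution(background) |>
  engine_xgboost(
    booster = "gblinear",
    iter = 2000L,
    learning_rate = 0.01,
    reg_lambda = 1
  )
}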