
Allows the estimation of eXtreme Gradient Boosting (XGBoost) models with tree-based or linear boosters. The XGBoost engine is a flexible yet powerful engine with many customization options, supporting single-class and multi-class regression and classification tasks. For a full list of options, users are advised to consult the xgboost::xgb.train help file and https://xgboost.readthedocs.io.

Usage

engine_xgboost(
  x,
  booster = "gbtree",
  iter = 8000L,
  learning_rate = 0.001,
  gamma = 6,
  reg_lambda = 0,
  reg_alpha = 0,
  max_depth = 2,
  subsample = 0.75,
  colsample_bytree = 0.4,
  min_child_weight = 3,
  nthread = getOption("ibis.nthread"),
  ...
)

Arguments

x

A distribution() (i.e. BiodiversityDistribution) object.

booster

A character specifying the booster to use, either "gbtree" or "gblinear" (Default: "gbtree").

iter

numeric value giving the maximum number of boosting iterations for cross-validation (Default: 8000L).

learning_rate

numeric value indicating the learning rate (eta). Lower values are generally better but computationally more costly (Default: 0.001).

gamma

numeric A regularization parameter in the model; larger values make the algorithm more conservative (Default: 6). Also see the "reg_lambda" parameter for the L2 regularization on the weights.

reg_lambda

numeric L2 regularization term on weights (Default: 0).

reg_alpha

numeric L1 regularization term on weights (Default: 0).

max_depth

numeric The maximum depth of a tree (Default: 2).

subsample

numeric The ratio used for subsampling to prevent overfitting. Also used for creating a random testing dataset (Default: 0.75).

colsample_bytree

numeric Sub-sample ratio of columns when constructing each tree (Default: 0.4).

min_child_weight

numeric The minimum sum of instance weights (hessian) needed in a child node; broadly related to the number of instances necessary for each node (Default: 3).

nthread

numeric The number of CPU threads to use (Default: getOption("ibis.nthread")).

...

Other, not further specified parameters.

Value

An Engine.

Details

The default parameters have been set relatively conservatively so as to reduce overfitting.
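
For instance, a deliberately less conservative configuration could raise the learning rate and tree depth while lowering the number of iterations. The values below are purely illustrative and assume the same background object used in the Examples section:

# Illustrative values only: a faster, less regularized configuration.
# A larger learning rate typically requires fewer boosting iterations.
x <- distribution(background) |>
  engine_xgboost(
    booster = "gbtree",
    iter = 2000L,
    learning_rate = 0.01,
    gamma = 3,
    max_depth = 4
  )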

XGBoost supports the specification of monotonic constraints on certain variables, as sketched below. Within ibis this is possible via XGBPrior. However, such constraints are available only for the "gbtree" base learners.
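
As a minimal sketch, a monotonic constraint could be attached before the engine is set. This assumes that XGBPrior() takes a variable name plus a monotonicity hint and that priors are collected and added via priors() and add_priors(); consult the XGBPrior help file for the exact signature. The covariate "forest.cover" is hypothetical:

# Illustrative sketch only; see ?XGBPrior for the exact interface.
p <- priors(XGBPrior("forest.cover", hint = "increasing"))  # hypothetical covariate

x <- distribution(background) |>
  add_priors(p) |>
  engine_xgboost(booster = "gbtree")  # constraints require the tree booster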

Note

"Machine learning is statistics minus any checking of models and assumptions" ~ Brian D. Ripley, useR! 2004, Vienna

References

  • Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System", 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016, https://arxiv.org/abs/1603.02754

Examples

if (FALSE) {
# Add xgboost as an engine
x <- distribution(background) |> engine_xgboost(iter = 4000)
}
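
Because the linear booster fits a regularized linear model rather than an ensemble of trees, tree-specific settings such as max_depth, gamma, colsample_bytree and min_child_weight have no effect when booster = "gblinear"; regularization is instead controlled through reg_lambda and reg_alpha. A sketch with illustrative values:

if (FALSE) {
# Linear booster: tree-specific parameters are ignored, so use the
# L1/L2 regularization terms instead.
x <- distribution(background) |>
  engine_xgboost(
    booster = "gblinear",
    iter = 2000L,
    learning_rate = 0.01,
    reg_lambda = 1
  )
}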