% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/lightgbm.R
\name{lightgbm}
\alias{lightgbm}
\title{Train a LightGBM model}
\usage{
lightgbm(
  data,
  label = NULL,
  weights = NULL,
  params = list(),
  nrounds = 100L,
  verbose = 1L,
  eval_freq = 1L,
  early_stopping_rounds = NULL,
  init_model = NULL,
  callbacks = list(),
  serializable = TRUE,
  objective = "auto",
  init_score = NULL,
  num_threads = NULL,
  colnames = NULL,
  categorical_feature = NULL,
  ...
)
}
\arguments{
\item{data}{a \code{lgb.Dataset} object, used for training. Some functions, such as \code{\link{lgb.cv}},
may allow you to pass other types of data like \code{matrix} and then separately supply
\code{label} as a keyword argument.}

\item{label}{Vector of labels, used if \code{data} is not an \code{\link{lgb.Dataset}}}

\item{weights}{Sample / observation weights for rows in the input data. If \code{NULL}, will assume that all
observations / rows have the same importance / weight.

\emph{Changed from 'weight', in version 4.0.0}}

\item{params}{a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
the "Parameters" section of the documentation} for a list of parameters and valid values.}

\item{nrounds}{number of training rounds}

\item{verbose}{Verbosity of output. If <= 0 and \code{valids} has been provided, printing of
evaluation results during training is also disabled.}

\item{eval_freq}{Evaluation output frequency. Only effective when \code{verbose > 0} and \code{valids}
has been provided.}

\item{early_stopping_rounds}{int. Activates early stopping. When this parameter is non-null,
training will stop if the evaluation of any metric on any validation set
fails to improve for \code{early_stopping_rounds} consecutive boosting rounds.
If training stops early, the returned model will have attribute \code{best_iter}
set to the iteration number of the best iteration.}

\item{init_model}{Path to a model file or an \code{lgb.Booster} object. Training will continue from this model.}

\item{callbacks}{List of callback functions that are applied at each iteration.}

\item{serializable}{whether to make the resulting objects serializable through functions such as
\code{save} or \code{saveRDS} (see section "Model serialization").}

\item{objective}{Optimization objective (e.g. \code{"regression"}, \code{"binary"}, etc.).
For a list of accepted objectives, see
\href{https://lightgbm.readthedocs.io/en/latest/Parameters.html#objective}{
the "objective" item of the "Parameters" section of the documentation}.

If passing \code{"auto"} and \code{data} is not of type \code{lgb.Dataset}, the objective will
be determined according to what is passed for \code{label}:\itemize{
\item If passing a factor with two levels, will use objective \code{"binary"}.
\item If passing a factor with more than two levels, will use objective \code{"multiclass"}
(note that parameter \code{num_class} in this case will also be determined automatically from
\code{label}).
\item Otherwise (or if passing \code{lgb.Dataset} as input), will use objective \code{"regression"}.
}

\emph{New in version 4.0.0}}

\item{init_score}{Initial score: the base prediction that LightGBM will boost from.

\emph{New in version 4.0.0}}

\item{num_threads}{Number of parallel threads to use. For best speed, this should be set to the number of
physical cores in the CPU - in a typical x86-64 machine, this corresponds to half the
number of maximum threads.

Be aware that using too many threads can result in speed degradation in smaller datasets
(see the parameters documentation for more details).

If passing zero, will use the default number of threads configured for OpenMP
(typically controlled through an environment variable \code{OMP_NUM_THREADS}).

If passing \code{NULL} (the default), will try to use the number of physical cores in the
system, but be aware that getting the number of cores detected correctly requires package
\code{RhpcBLASctl} to be installed.

This parameter gets overridden by \code{num_threads} and its aliases under \code{params}
if passed there.

\emph{New in version 4.0.0}}

\item{colnames}{Character vector of features. Only used if \code{data} is not an \code{\link{lgb.Dataset}}.}

\item{categorical_feature}{categorical features. This can either be a character vector of feature
names or an integer vector with the indices of the features (e.g.
\code{c(1L, 10L)} to say "the first and tenth columns").
Only used if \code{data} is not an \code{\link{lgb.Dataset}}.}

\item{...}{Additional arguments passed to \code{\link{lgb.train}}. For example
\itemize{
    \item{\code{valids}: a list of \code{lgb.Dataset} objects, used for validation}
    \item{\code{obj}: objective function, can be character or custom objective function. Examples include
               \code{regression}, \code{regression_l1}, \code{huber},
               \code{binary}, \code{lambdarank}, \code{multiclass}}
    \item{\code{eval}: evaluation function, can be (a list of) character or custom eval function}
    \item{\code{record}: Boolean, TRUE will record iteration message to \code{booster$record_evals}}
    \item{\code{reset_data}: Boolean, setting it to TRUE (not the default value) will transform the booster model
                      into a predictor model which frees up memory and the original datasets}
}}
}
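\section{Thread Count Precedence Sketch}{

A minimal sketch of the precedence rule described for \code{num_threads}, assuming the
\code{agaricus.train} example data bundled with the package is available. The thread counts and
other parameter values are arbitrary illustrations, not recommendations.

\preformatted{
library(lightgbm)

# Example data shipped with the lightgbm package
data(agaricus.train, package = "lightgbm")

model <- lightgbm(
  data = agaricus.train$data,
  label = agaricus.train$label,
  objective = "binary",
  nrounds = 2L,
  # Keyword argument: requests 4 threads ...
  num_threads = 4L,
  # ... but the value given under params takes precedence, per the
  # description of num_threads, so training is configured with 1 thread
  params = list(num_threads = 1L),
  verbose = -1L
)
}

Passing the value in only one place avoids having to reason about which setting wins.
}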
\value{
a trained \code{lgb.Booster}
}
\description{
High-level R interface to train a LightGBM model. Unlike \code{\link{lgb.train}}, this function
is focused on compatibility with other statistics and machine learning interfaces in R.
This focus on compatibility means that this interface may experience more frequent breaking API changes
than \code{\link{lgb.train}}.
For efficiency-sensitive applications, or for applications where breaking API changes across releases
are very expensive, use \code{\link{lgb.train}}.
}
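\section{Basic Usage Sketch}{

A minimal sketch of a typical call, assuming the \code{agaricus.train} and \code{agaricus.test}
example data sets bundled with the package are available. The parameter values and object names
are illustrative choices, not recommendations.

\preformatted{
library(lightgbm)

# Example data shipped with the lightgbm package
data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")

# data is a (sparse) matrix rather than an lgb.Dataset,
# so the labels are supplied separately through label
model <- lightgbm(
  data = agaricus.train$data,
  label = agaricus.train$label,
  objective = "binary",
  params = list(num_leaves = 4L, learning_rate = 0.5),
  nrounds = 5L,
  verbose = -1L
)

# Predicted probabilities for the held-out rows
preds <- predict(model, agaricus.test$data)
}

The objective is stated explicitly here. With the default \code{objective = "auto"} and a numeric
\code{label}, the rules listed under \code{objective} would select \code{"regression"} instead.
}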
\section{Early Stopping}{

"Early stopping" refers to stopping the training process if the model's performance on a given
validation set does not improve for several consecutive iterations.

If multiple arguments are given to \code{eval}, their order will be preserved. If you enable
early stopping by setting \code{early_stopping_rounds} in \code{params}, by default all
metrics will be considered for early stopping.

If you want to only consider the first metric for early stopping, pass
\code{first_metric_only = TRUE} in \code{params}. Note that if you also specify \code{metric}
in \code{params}, that metric will be considered the "first" one. If you omit \code{metric},
a default metric will be used based on your choice for the parameter \code{obj} (keyword argument)
or \code{objective} (passed into \code{params}).
}
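\section{Early Stopping Sketch}{

A minimal sketch of early stopping on a validation set, assuming the \code{agaricus.train} and
\code{agaricus.test} example data sets bundled with the package are available. The metric choices,
round counts and the name \code{valid} are illustrative assumptions.

\preformatted{
library(lightgbm)

# Example data shipped with the lightgbm package
data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")

dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)
dvalid <- lgb.Dataset.create.valid(
  dtrain,
  agaricus.test$data,
  label = agaricus.test$label
)

model <- lightgbm(
  data = dtrain,
  params = list(
    objective = "binary",
    # Two metrics are evaluated; with first_metric_only = TRUE only the
    # first one listed (binary_logloss) is considered for early stopping
    metric = c("binary_logloss", "auc"),
    first_metric_only = TRUE
  ),
  nrounds = 50L,
  early_stopping_rounds = 5L,
  # valids is forwarded to lgb.train() through ...
  valids = list(valid = dvalid),
  verbose = -1L
)

# Iteration number of the best iteration on the validation set
model$best_iter
}

Without \code{first_metric_only = TRUE}, both metrics would be considered for early stopping.
}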