[R-package] clarify parameter documentation (fixes #4193) (#4202)

* [R-package] clarify parameter documentation

* fixes to braces

* linting
This commit is contained in:
James Lamb 2021-04-19 15:41:27 -05:00 коммит произвёл GitHub
Родитель 8e126c80ba
Коммит 53602afa47
Не найден ключ, соответствующий данной подписи
Идентификатор ключа GPG: 4AEE18F83AFDEB23
10 изменённых файлов: 79 добавлений и 45 удалений

Просмотреть файл

@ -730,11 +730,22 @@ Dataset <- R6::R6Class(
#' @description Construct \code{lgb.Dataset} object from dense matrix, sparse matrix
#' or local file (that was created previously by saving an \code{lgb.Dataset}).
#' @param data a \code{matrix} object, a \code{dgCMatrix} object or a character representing a filename
#' @param params a list of parameters
#' @param reference reference dataset
#' @param params a list of parameters. See
#' \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html#dataset-parameters}{
#' The "Dataset Parameters" section of the documentation} for a list of parameters
#' and valid values.
#' @param reference reference dataset. When LightGBM creates a Dataset, it does some preprocessing like binning
#' continuous features into histograms. If you want to apply the same bin boundaries from an existing
#' dataset to new \code{data}, pass that existing Dataset to this argument.
#' @param colnames names of columns
#' @param categorical_feature categorical features
#' @param free_raw_data TRUE for need to free raw data after construct
#' @param categorical_feature categorical features. This can either be a character vector of feature
#' names or an integer vector with the indices of the features (e.g.
#' \code{c(1L, 10L)} to say "the first and tenth columns").
#' @param free_raw_data LightGBM constructs its data format, called a "Dataset", from tabular data.
#' By default, that Dataset object on the R side does not keep a copy of the raw data.
#' This reduces LightGBM's memory consumption, but it means that the Dataset object
#' cannot be changed after it has been constructed. If you'd prefer to be able to
#' change the Dataset object after construction, set \code{free_raw_data = FALSE}.
#' @param info a list of information of the \code{lgb.Dataset} object
#' @param ... other information to pass to \code{info} or parameters pass to \code{params}
#'

Просмотреть файл

@ -5,19 +5,20 @@
#' @param valids a list of \code{lgb.Dataset} objects, used for validation
#' @param record Boolean, TRUE will record iteration message to \code{booster$record_evals}
#' @param colnames feature names, if not null, will use this to overwrite the names in dataset
#' @param categorical_feature list of str or int
#' type int represents index,
#' type str represents feature names
#' @param categorical_feature categorical features. This can either be a character vector of feature
#' names or an integer vector with the indices of the features (e.g.
#' \code{c(1L, 10L)} to say "the first and tenth columns").
#' @param callbacks List of callback functions that are applied at each iteration.
#' @param reset_data Boolean, setting it to TRUE (not the default value) will transform the
#' booster model into a predictor model which frees up memory and the
#' original datasets
#' @param ... other parameters, see Parameters.rst for more information. A few key parameters:
#' @param ... other parameters, see \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
#' the "Parameters" section of the documentation} for more information. A few key parameters:
#' \itemize{
#' \item{\code{boosting}: Boosting type. \code{"gbdt"}, \code{"rf"}, \code{"dart"} or \code{"goss"}.}
#' \item{\code{num_leaves}: Maximum number of leaves in one tree.}
#' \item{\code{max_depth}: Limit the max depth for tree model. This is used to deal with
#' overfit when #data is small. Tree still grow by leaf-wise.}
#' overfitting. Tree still grow by leaf-wise.}
#' \item{\code{num_threads}: Number of threads for LightGBM. For the best speed, set this to
#' the number of real CPU cores(\code{parallel::detectCores(logical = FALSE)}),
#' not the number of threads (most CPU using hyper-threading to generate 2 threads

Просмотреть файл

@ -5,10 +5,11 @@
#' @param data a \code{lgb.Dataset} object, used for training. Some functions, such as \code{\link{lgb.cv}},
#' may allow you to pass other types of data like \code{matrix} and then separately supply
#' \code{label} as a keyword argument.
#' @param early_stopping_rounds int. Activates early stopping. Requires at least one validation data
#' and one metric. If there's more than one, will check all of them
#' except the training data. Returns the model with (best_iter + early_stopping_rounds).
#' If early stopping occurs, the model will have 'best_iter' field.
#' @param early_stopping_rounds int. Activates early stopping. When this parameter is non-null,
#' training will stop if the evaluation of any metric on any validation set
#' fails to improve for \code{early_stopping_rounds} consecutive boosting rounds.
#' If training stops early, the returned model will have attribute \code{best_iter}
#' set to the iteration number of the best iteration.
#' @param eval evaluation function(s). This can be a character vector, function, or list with a mixture of
#' strings and functions.
#'
@ -48,7 +49,8 @@
#' @param obj objective function, can be character or custom objective function. Examples include
#' \code{regression}, \code{regression_l1}, \code{huber},
#' \code{binary}, \code{lambdarank}, \code{multiclass}, \code{multiclass}
#' @param params List of parameters
#' @param params a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
#' the "Parameters" section of the documentation} for a list of parameters and valid values.
#' @param verbose verbosity for output, if <= 0, also will disable the print of evaluation during training
#' @section Early Stopping:
#'

Просмотреть файл

@ -2,7 +2,7 @@
#' @title saveRDS for \code{lgb.Booster} models
#' @description Attempts to save a model using RDS. Has an additional parameter (\code{raw})
#' which decides whether to save the raw model or not.
#' @param object R object to serialize.
#' @param object \code{lgb.Booster} object to serialize.
#' @param file a connection or the name of the file where the R object is saved to or read from.
#' @param ascii a logical. If TRUE or NA, an ASCII representation is written; otherwise (default),
#' a binary one is used. See the comments in the help for save.

Просмотреть файл

@ -18,15 +18,26 @@ lgb.Dataset(
\arguments{
\item{data}{a \code{matrix} object, a \code{dgCMatrix} object or a character representing a filename}
\item{params}{a list of parameters}
\item{params}{a list of parameters. See
\href{https://lightgbm.readthedocs.io/en/latest/Parameters.html#dataset-parameters}{
The "Dataset Parameters" section of the documentation} for a list of parameters
and valid values.}
\item{reference}{reference dataset}
\item{reference}{reference dataset. When LightGBM creates a Dataset, it does some preprocessing like binning
continuous features into histograms. If you want to apply the same bin boundaries from an existing
dataset to new \code{data}, pass that existing Dataset to this argument.}
\item{colnames}{names of columns}
\item{categorical_feature}{categorical features}
\item{categorical_feature}{categorical features. This can either be a character vector of feature
names or an integer vector with the indices of the features (e.g.
\code{c(1L, 10L)} to say "the first and tenth columns").}
\item{free_raw_data}{TRUE for need to free raw data after construct}
\item{free_raw_data}{LightGBM constructs its data format, called a "Dataset", from tabular data.
By default, that Dataset object on the R side does not keep a copy of the raw data.
This reduces LightGBM's memory consumption, but it means that the Dataset object
cannot be changed after it has been constructed. If you'd prefer to be able to
change the Dataset object after construction, set \code{free_raw_data = FALSE}.}
\item{info}{a list of information of the \code{lgb.Dataset} object}

Просмотреть файл

@ -29,7 +29,8 @@ lgb.cv(
)
}
\arguments{
\item{params}{List of parameters}
\item{params}{a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
the "Parameters" section of the documentation} for a list of parameters and valid values.}
\item{data}{a \code{lgb.Dataset} object, used for training. Some functions, such as \code{\link{lgb.cv}},
may allow you to pass other types of data like \code{matrix} and then separately supply
@ -104,10 +105,11 @@ the \code{nfold} and \code{stratified} parameters are ignored.}
names or an integer vector with the indices of the features (e.g.
\code{c(1L, 10L)} to say "the first and tenth columns").}
\item{early_stopping_rounds}{int. Activates early stopping. Requires at least one validation data
and one metric. If there's more than one, will check all of them
except the training data. Returns the model with (best_iter + early_stopping_rounds).
If early stopping occurs, the model will have 'best_iter' field.}
\item{early_stopping_rounds}{int. Activates early stopping. When this parameter is non-null,
training will stop if the evaluation of any metric on any validation set
fails to improve for \code{early_stopping_rounds} consecutive boosting rounds.
If training stops early, the returned model will have attribute \code{best_iter}
set to the iteration number of the best iteration.}
\item{callbacks}{List of callback functions that are applied at each iteration.}

Просмотреть файл

@ -24,7 +24,8 @@ lgb.train(
)
}
\arguments{
\item{params}{List of parameters}
\item{params}{a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
the "Parameters" section of the documentation} for a list of parameters and valid values.}
\item{data}{a \code{lgb.Dataset} object, used for training. Some functions, such as \code{\link{lgb.cv}},
may allow you to pass other types of data like \code{matrix} and then separately supply
@ -82,14 +83,15 @@ may allow you to pass other types of data like \code{matrix} and then separately
\item{colnames}{feature names, if not null, will use this to overwrite the names in dataset}
\item{categorical_feature}{list of str or int
type int represents index,
type str represents feature names}
\item{categorical_feature}{categorical features. This can either be a character vector of feature
names or an integer vector with the indices of the features (e.g.
\code{c(1L, 10L)} to say "the first and tenth columns").}
\item{early_stopping_rounds}{int. Activates early stopping. Requires at least one validation data
and one metric. If there's more than one, will check all of them
except the training data. Returns the model with (best_iter + early_stopping_rounds).
If early stopping occurs, the model will have 'best_iter' field.}
\item{early_stopping_rounds}{int. Activates early stopping. When this parameter is non-null,
training will stop if the evaluation of any metric on any validation set
fails to improve for \code{early_stopping_rounds} consecutive boosting rounds.
If training stops early, the returned model will have attribute \code{best_iter}
set to the iteration number of the best iteration.}
\item{callbacks}{List of callback functions that are applied at each iteration.}
@ -97,12 +99,13 @@ If early stopping occurs, the model will have 'best_iter' field.}
booster model into a predictor model which frees up memory and the
original datasets}
\item{...}{other parameters, see Parameters.rst for more information. A few key parameters:
\item{...}{other parameters, see \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
the "Parameters" section of the documentation} for more information. A few key parameters:
\itemize{
\item{\code{boosting}: Boosting type. \code{"gbdt"}, \code{"rf"}, \code{"dart"} or \code{"goss"}.}
\item{\code{num_leaves}: Maximum number of leaves in one tree.}
\item{\code{max_depth}: Limit the max depth for tree model. This is used to deal with
overfit when #data is small. Tree still grow by leaf-wise.}
overfitting. Tree still grow by leaf-wise.}
\item{\code{num_threads}: Number of threads for LightGBM. For the best speed, set this to
the number of real CPU cores(\code{parallel::detectCores(logical = FALSE)}),
not the number of threads (most CPU using hyper-threading to generate 2 threads

Просмотреть файл

@ -10,10 +10,11 @@
may allow you to pass other types of data like \code{matrix} and then separately supply
\code{label} as a keyword argument.}
\item{early_stopping_rounds}{int. Activates early stopping. Requires at least one validation data
and one metric. If there's more than one, will check all of them
except the training data. Returns the model with (best_iter + early_stopping_rounds).
If early stopping occurs, the model will have 'best_iter' field.}
\item{early_stopping_rounds}{int. Activates early stopping. When this parameter is non-null,
training will stop if the evaluation of any metric on any validation set
fails to improve for \code{early_stopping_rounds} consecutive boosting rounds.
If training stops early, the returned model will have attribute \code{best_iter}
set to the iteration number of the best iteration.}
\item{eval}{evaluation function(s). This can be a character vector, function, or list with a mixture of
strings and functions.
@ -59,7 +60,8 @@ If early stopping occurs, the model will have 'best_iter' field.}
\code{regression}, \code{regression_l1}, \code{huber},
\code{binary}, \code{lambdarank}, \code{multiclass}, \code{multiclass}}
\item{params}{List of parameters}
\item{params}{a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
the "Parameters" section of the documentation} for a list of parameters and valid values.}
\item{verbose}{verbosity for output, if <= 0, also will disable the print of evaluation during training}
}

Просмотреть файл

@ -28,7 +28,8 @@ may allow you to pass other types of data like \code{matrix} and then separately
\item{weight}{vector of response values. If not NULL, will set to dataset}
\item{params}{List of parameters}
\item{params}{a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
the "Parameters" section of the documentation} for a list of parameters and valid values.}
\item{nrounds}{number of training rounds}
@ -36,10 +37,11 @@ may allow you to pass other types of data like \code{matrix} and then separately
\item{eval_freq}{evaluation output frequency, only effect when verbose > 0}
\item{early_stopping_rounds}{int. Activates early stopping. Requires at least one validation data
and one metric. If there's more than one, will check all of them
except the training data. Returns the model with (best_iter + early_stopping_rounds).
If early stopping occurs, the model will have 'best_iter' field.}
\item{early_stopping_rounds}{int. Activates early stopping. When this parameter is non-null,
training will stop if the evaluation of any metric on any validation set
fails to improve for \code{early_stopping_rounds} consecutive boosting rounds.
If training stops early, the returned model will have attribute \code{best_iter}
set to the iteration number of the best iteration.}
\item{save_name}{File name to use when writing the trained model to disk. Should end in ".model".}

Просмотреть файл

@ -15,7 +15,7 @@ saveRDS.lgb.Booster(
)
}
\arguments{
\item{object}{R object to serialize.}
\item{object}{\code{lgb.Booster} object to serialize.}
\item{file}{a connection or the name of the file where the R object is saved to or read from.}