diff --git a/R-package/R/lgb.Dataset.R b/R-package/R/lgb.Dataset.R index 4a8e838f8..c8ffb8370 100644 --- a/R-package/R/lgb.Dataset.R +++ b/R-package/R/lgb.Dataset.R @@ -730,11 +730,22 @@ Dataset <- R6::R6Class( #' @description Construct \code{lgb.Dataset} object from dense matrix, sparse matrix #' or local file (that was created previously by saving an \code{lgb.Dataset}). #' @param data a \code{matrix} object, a \code{dgCMatrix} object or a character representing a filename -#' @param params a list of parameters -#' @param reference reference dataset +#' @param params a list of parameters. See +#' \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html#dataset-parameters}{ +#' The "Dataset Parameters" section of the documentation} for a list of parameters +#' and valid values. +#' @param reference reference dataset. When LightGBM creates a Dataset, it does some preprocessing like binning +#' continuous features into histograms. If you want to apply the same bin boundaries from an existing +#' dataset to new \code{data}, pass that existing Dataset to this argument. #' @param colnames names of columns -#' @param categorical_feature categorical features -#' @param free_raw_data TRUE for need to free raw data after construct +#' @param categorical_feature categorical features. This can either be a character vector of feature +#' names or an integer vector with the indices of the features (e.g. +#' \code{c(1L, 10L)} to say "the first and tenth columns"). +#' @param free_raw_data LightGBM constructs its data format, called a "Dataset", from tabular data. +#' By default, that Dataset object on the R side does not keep a copy of the raw data. +#' This reduces LightGBM's memory consumption, but it means that the Dataset object +#' cannot be changed after it has been constructed. If you'd prefer to be able to +#' change the Dataset object after construction, set \code{free_raw_data = FALSE}. 
#' @param info a list of information of the \code{lgb.Dataset} object #' @param ... other information to pass to \code{info} or parameters pass to \code{params} #' diff --git a/R-package/R/lgb.train.R b/R-package/R/lgb.train.R index 5e54a7b32..1f0216110 100644 --- a/R-package/R/lgb.train.R +++ b/R-package/R/lgb.train.R @@ -5,19 +5,20 @@ #' @param valids a list of \code{lgb.Dataset} objects, used for validation #' @param record Boolean, TRUE will record iteration message to \code{booster$record_evals} #' @param colnames feature names, if not null, will use this to overwrite the names in dataset -#' @param categorical_feature list of str or int -#' type int represents index, -#' type str represents feature names +#' @param categorical_feature categorical features. This can either be a character vector of feature +#' names or an integer vector with the indices of the features (e.g. +#' \code{c(1L, 10L)} to say "the first and tenth columns"). #' @param callbacks List of callback functions that are applied at each iteration. #' @param reset_data Boolean, setting it to TRUE (not the default value) will transform the #' booster model into a predictor model which frees up memory and the #' original datasets -#' @param ... other parameters, see Parameters.rst for more information. A few key parameters: +#' @param ... other parameters, see \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{ +#' the "Parameters" section of the documentation} for more information. A few key parameters: #' \itemize{ #' \item{\code{boosting}: Boosting type. \code{"gbdt"}, \code{"rf"}, \code{"dart"} or \code{"goss"}.} #' \item{\code{num_leaves}: Maximum number of leaves in one tree.} #' \item{\code{max_depth}: Limit the max depth for tree model. This is used to deal with -#' overfit when #data is small. Tree still grow by leaf-wise.} +#' overfitting. Trees still grow leaf-wise.} #' \item{\code{num_threads}: Number of threads for LightGBM. 
For the best speed, set this to #' the number of real CPU cores(\code{parallel::detectCores(logical = FALSE)}), #' not the number of threads (most CPU using hyper-threading to generate 2 threads diff --git a/R-package/R/lightgbm.R b/R-package/R/lightgbm.R index 6af8da66a..e2df9063e 100644 --- a/R-package/R/lightgbm.R +++ b/R-package/R/lightgbm.R @@ -5,10 +5,11 @@ #' @param data a \code{lgb.Dataset} object, used for training. Some functions, such as \code{\link{lgb.cv}}, #' may allow you to pass other types of data like \code{matrix} and then separately supply #' \code{label} as a keyword argument. -#' @param early_stopping_rounds int. Activates early stopping. Requires at least one validation data -#' and one metric. If there's more than one, will check all of them -#' except the training data. Returns the model with (best_iter + early_stopping_rounds). -#' If early stopping occurs, the model will have 'best_iter' field. +#' @param early_stopping_rounds int. Activates early stopping. When this parameter is non-null, +#' training will stop if the evaluation of any metric on any validation set +#' fails to improve for \code{early_stopping_rounds} consecutive boosting rounds. +#' If training stops early, the returned model will have attribute \code{best_iter} +#' set to the iteration number of the best iteration. #' @param eval evaluation function(s). This can be a character vector, function, or list with a mixture of #' strings and functions. #' @@ -48,7 +49,8 @@ #' @param obj objective function, can be character or custom objective function. Examples include #' \code{regression}, \code{regression_l1}, \code{huber}, #' \code{binary}, \code{lambdarank}, \code{multiclass}, \code{multiclass} -#' @param params List of parameters +#' @param params a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{ +#' the "Parameters" section of the documentation} for a list of parameters and valid values. 
#' @param verbose verbosity for output, if <= 0, also will disable the print of evaluation during training #' @section Early Stopping: #' diff --git a/R-package/R/saveRDS.lgb.Booster.R b/R-package/R/saveRDS.lgb.Booster.R index 1a0a657fa..64b1f78ec 100644 --- a/R-package/R/saveRDS.lgb.Booster.R +++ b/R-package/R/saveRDS.lgb.Booster.R @@ -2,7 +2,7 @@ #' @title saveRDS for \code{lgb.Booster} models #' @description Attempts to save a model using RDS. Has an additional parameter (\code{raw}) #' which decides whether to save the raw model or not. -#' @param object R object to serialize. +#' @param object \code{lgb.Booster} object to serialize. #' @param file a connection or the name of the file where the R object is saved to or read from. #' @param ascii a logical. If TRUE or NA, an ASCII representation is written; otherwise (default), #' a binary one is used. See the comments in the help for save. diff --git a/R-package/man/lgb.Dataset.Rd b/R-package/man/lgb.Dataset.Rd index 67f129bdc..4a5abcf78 100644 --- a/R-package/man/lgb.Dataset.Rd +++ b/R-package/man/lgb.Dataset.Rd @@ -18,15 +18,26 @@ lgb.Dataset( \arguments{ \item{data}{a \code{matrix} object, a \code{dgCMatrix} object or a character representing a filename} -\item{params}{a list of parameters} +\item{params}{a list of parameters. See +\href{https://lightgbm.readthedocs.io/en/latest/Parameters.html#dataset-parameters}{ +The "Dataset Parameters" section of the documentation} for a list of parameters +and valid values.} -\item{reference}{reference dataset} +\item{reference}{reference dataset. When LightGBM creates a Dataset, it does some preprocessing like binning +continuous features into histograms. If you want to apply the same bin boundaries from an existing +dataset to new \code{data}, pass that existing Dataset to this argument.} \item{colnames}{names of columns} -\item{categorical_feature}{categorical features} +\item{categorical_feature}{categorical features. 
This can either be a character vector of feature +names or an integer vector with the indices of the features (e.g. +\code{c(1L, 10L)} to say "the first and tenth columns").} -\item{free_raw_data}{TRUE for need to free raw data after construct} +\item{free_raw_data}{LightGBM constructs its data format, called a "Dataset", from tabular data. +By default, that Dataset object on the R side does not keep a copy of the raw data. +This reduces LightGBM's memory consumption, but it means that the Dataset object +cannot be changed after it has been constructed. If you'd prefer to be able to +change the Dataset object after construction, set \code{free_raw_data = FALSE}.} \item{info}{a list of information of the \code{lgb.Dataset} object} diff --git a/R-package/man/lgb.cv.Rd b/R-package/man/lgb.cv.Rd index 65dd46658..ec606d880 100644 --- a/R-package/man/lgb.cv.Rd +++ b/R-package/man/lgb.cv.Rd @@ -29,7 +29,8 @@ lgb.cv( ) } \arguments{ -\item{params}{List of parameters} +\item{params}{a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{ +the "Parameters" section of the documentation} for a list of parameters and valid values.} \item{data}{a \code{lgb.Dataset} object, used for training. Some functions, such as \code{\link{lgb.cv}}, may allow you to pass other types of data like \code{matrix} and then separately supply @@ -104,10 +105,11 @@ the \code{nfold} and \code{stratified} parameters are ignored.} names or an integer vector with the indices of the features (e.g. \code{c(1L, 10L)} to say "the first and tenth columns").} -\item{early_stopping_rounds}{int. Activates early stopping. Requires at least one validation data -and one metric. If there's more than one, will check all of them -except the training data. Returns the model with (best_iter + early_stopping_rounds). -If early stopping occurs, the model will have 'best_iter' field.} +\item{early_stopping_rounds}{int. Activates early stopping. 
When this parameter is non-null, +training will stop if the evaluation of any metric on any validation set +fails to improve for \code{early_stopping_rounds} consecutive boosting rounds. +If training stops early, the returned model will have attribute \code{best_iter} +set to the iteration number of the best iteration.} \item{callbacks}{List of callback functions that are applied at each iteration.} diff --git a/R-package/man/lgb.train.Rd b/R-package/man/lgb.train.Rd index 2bfde1135..40c7135d3 100644 --- a/R-package/man/lgb.train.Rd +++ b/R-package/man/lgb.train.Rd @@ -24,7 +24,8 @@ lgb.train( ) } \arguments{ -\item{params}{List of parameters} +\item{params}{a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{ +the "Parameters" section of the documentation} for a list of parameters and valid values.} \item{data}{a \code{lgb.Dataset} object, used for training. Some functions, such as \code{\link{lgb.cv}}, may allow you to pass other types of data like \code{matrix} and then separately supply @@ -82,14 +83,15 @@ may allow you to pass other types of data like \code{matrix} and then separately \item{colnames}{feature names, if not null, will use this to overwrite the names in dataset} -\item{categorical_feature}{list of str or int -type int represents index, -type str represents feature names} +\item{categorical_feature}{categorical features. This can either be a character vector of feature +names or an integer vector with the indices of the features (e.g. +\code{c(1L, 10L)} to say "the first and tenth columns").} -\item{early_stopping_rounds}{int. Activates early stopping. Requires at least one validation data -and one metric. If there's more than one, will check all of them -except the training data. Returns the model with (best_iter + early_stopping_rounds). -If early stopping occurs, the model will have 'best_iter' field.} +\item{early_stopping_rounds}{int. Activates early stopping. 
When this parameter is non-null, +training will stop if the evaluation of any metric on any validation set +fails to improve for \code{early_stopping_rounds} consecutive boosting rounds. +If training stops early, the returned model will have attribute \code{best_iter} +set to the iteration number of the best iteration.} \item{callbacks}{List of callback functions that are applied at each iteration.} @@ -97,12 +99,13 @@ If early stopping occurs, the model will have 'best_iter' field.} booster model into a predictor model which frees up memory and the original datasets -\item{...}{other parameters, see Parameters.rst for more information. A few key parameters: +\item{...}{other parameters, see \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{ +the "Parameters" section of the documentation} for more information. A few key parameters: \itemize{ \item{\code{boosting}: Boosting type. \code{"gbdt"}, \code{"rf"}, \code{"dart"} or \code{"goss"}.} \item{\code{num_leaves}: Maximum number of leaves in one tree.} \item{\code{max_depth}: Limit the max depth for tree model. This is used to deal with - overfit when #data is small. Tree still grow by leaf-wise.} + overfitting. Trees still grow leaf-wise.} \item{\code{num_threads}: Number of threads for LightGBM. For the best speed, set this to the number of real CPU cores(\code{parallel::detectCores(logical = FALSE)}), not the number of threads (most CPU using hyper-threading to generate 2 threads diff --git a/R-package/man/lgb_shared_params.Rd b/R-package/man/lgb_shared_params.Rd index 89786e44f..e5288e045 100644 --- a/R-package/man/lgb_shared_params.Rd +++ b/R-package/man/lgb_shared_params.Rd @@ -10,10 +10,11 @@ may allow you to pass other types of data like \code{matrix} and then separately supply \code{label} as a keyword argument.} -\item{early_stopping_rounds}{int. Activates early stopping. Requires at least one validation data -and one metric. 
If there's more than one, will check all of them -except the training data. Returns the model with (best_iter + early_stopping_rounds). -If early stopping occurs, the model will have 'best_iter' field.} +\item{early_stopping_rounds}{int. Activates early stopping. When this parameter is non-null, +training will stop if the evaluation of any metric on any validation set +fails to improve for \code{early_stopping_rounds} consecutive boosting rounds. +If training stops early, the returned model will have attribute \code{best_iter} +set to the iteration number of the best iteration.} \item{eval}{evaluation function(s). This can be a character vector, function, or list with a mixture of strings and functions. @@ -59,7 +60,8 @@ If early stopping occurs, the model will have 'best_iter' field.} \code{regression}, \code{regression_l1}, \code{huber}, \code{binary}, \code{lambdarank}, \code{multiclass}, \code{multiclass}} -\item{params}{List of parameters} +\item{params}{a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{ +the "Parameters" section of the documentation} for a list of parameters and valid values.} \item{verbose}{verbosity for output, if <= 0, also will disable the print of evaluation during training} } diff --git a/R-package/man/lightgbm.Rd b/R-package/man/lightgbm.Rd index dfd9b2aa8..6512dbc6b 100644 --- a/R-package/man/lightgbm.Rd +++ b/R-package/man/lightgbm.Rd @@ -28,7 +28,8 @@ may allow you to pass other types of data like \code{matrix} and then separately \item{weight}{vector of response values. If not NULL, will set to dataset} -\item{params}{List of parameters} +\item{params}{a list of parameters. 
See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{ +the "Parameters" section of the documentation} for a list of parameters and valid values.} \item{nrounds}{number of training rounds} @@ -36,10 +37,11 @@ may allow you to pass other types of data like \code{matrix} and then separately \item{eval_freq}{evaluation output frequency, only effect when verbose > 0} -\item{early_stopping_rounds}{int. Activates early stopping. Requires at least one validation data -and one metric. If there's more than one, will check all of them -except the training data. Returns the model with (best_iter + early_stopping_rounds). -If early stopping occurs, the model will have 'best_iter' field.} +\item{early_stopping_rounds}{int. Activates early stopping. When this parameter is non-null, +training will stop if the evaluation of any metric on any validation set +fails to improve for \code{early_stopping_rounds} consecutive boosting rounds. +If training stops early, the returned model will have attribute \code{best_iter} +set to the iteration number of the best iteration.} \item{save_name}{File name to use when writing the trained model to disk. Should end in ".model".} diff --git a/R-package/man/saveRDS.lgb.Booster.Rd b/R-package/man/saveRDS.lgb.Booster.Rd index ac9a0386c..76f1165f8 100644 --- a/R-package/man/saveRDS.lgb.Booster.Rd +++ b/R-package/man/saveRDS.lgb.Booster.Rd @@ -15,7 +15,7 @@ saveRDS.lgb.Booster( ) } \arguments{ -\item{object}{R object to serialize.} +\item{object}{\code{lgb.Booster} object to serialize.} \item{file}{a connection or the name of the file where the R object is saved to or read from.}
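The arguments whose documentation this patch rewrites (`reference`, `categorical_feature`, `free_raw_data`, `early_stopping_rounds`) can be seen together in a minimal usage sketch. This is illustrative only, not part of the patch: the simulated matrix, column choices, and parameter values below are hypothetical, and it assumes the lightgbm package is installed.

```r
library(lightgbm)

# Simulated data, purely for illustration.
set.seed(708L)
X <- matrix(rnorm(1000L), ncol = 10L)
y <- as.numeric(X[, 1L] > 0)

# Keep a copy of the raw data so the Dataset can still be changed after
# construction, and mark the first and tenth columns as categorical by index.
dtrain <- lgb.Dataset(
  data = X[1:80, ]
  , label = y[1:80]
  , free_raw_data = FALSE
  , categorical_feature = c(1L, 10L)
)

# Reuse dtrain's bin boundaries for the validation data via `reference`.
dvalid <- lgb.Dataset(
  data = X[81:100, ]
  , label = y[81:100]
  , reference = dtrain
)

# Stop training if the validation metric fails to improve for 5 consecutive
# boosting rounds; the returned booster then carries `best_iter`.
bst <- lgb.train(
  params = list(objective = "binary", num_leaves = 31L)
  , data = dtrain
  , nrounds = 100L
  , valids = list(valid = dvalid)
  , early_stopping_rounds = 5L
)
print(bst$best_iter)
```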