[R-package] clarify parameter documentation (fixes #4193) (#4202)

* [R-package] clarify parameter documentation
* fixes to braces
* linting
2021-04-19 15:41:27 -05:00 · 2021-04-19 15:41:27 -05:00 · 53602afa47
--- a/R-package/R/lgb.Dataset.R
+++ b/R-package/R/lgb.Dataset.R
@@ -730,11 +730,22 @@ Dataset <- R6::R6Class(
 #' @description Construct \code{lgb.Dataset} object from dense matrix, sparse matrix
 #'              or local file (that was created previously by saving an \code{lgb.Dataset}).
 #' @param data a \code{matrix} object, a \code{dgCMatrix} object or a character representing a filename
-#' @param params a list of parameters
-#' @param reference reference dataset
+#' @param params a list of parameters. See
+#'               \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html#dataset-parameters}{
+#'               The "Dataset Parameters" section of the documentation} for a list of parameters
+#'               and valid values.
+#' @param reference reference dataset. When LightGBM creates a Dataset, it does some preprocessing like binning
+#'                  continuous features into histograms. If you want to apply the same bin boundaries from an existing
+#'                  dataset to new \code{data}, pass that existing Dataset to this argument.
 #' @param colnames names of columns
-#' @param categorical_feature categorical features
-#' @param free_raw_data TRUE for need to free raw data after construct
+#' @param categorical_feature categorical features. This can either be a character vector of feature
+#'                            names or an integer vector with the indices of the features (e.g.
+#'                            \code{c(1L, 10L)} to say "the first and tenth columns").
+#' @param free_raw_data LightGBM constructs its data format, called a "Dataset", from tabular data.
+#'                      By default, that Dataset object on the R side does not keep a copy of the raw data.
+#'                      This reduces LightGBM's memory consumption, but it means that the Dataset object
+#'                      cannot be changed after it has been constructed. If you'd prefer to be able to
+#'                      change the Dataset object after construction, set \code{free_raw_data = FALSE}.
 #' @param info a list of information of the \code{lgb.Dataset} object
 #' @param ... other information to pass to \code{info} or parameters pass to \code{params}
 #'
--- a/R-package/R/lgb.train.R
+++ b/R-package/R/lgb.train.R
@@ -5,19 +5,20 @@
 #' @param valids a list of \code{lgb.Dataset} objects, used for validation
 #' @param record Boolean, TRUE will record iteration message to \code{booster$record_evals}
 #' @param colnames feature names, if not null, will use this to overwrite the names in dataset
-#' @param categorical_feature list of str or int
-#'                            type int represents index,
-#'                            type str represents feature names
+#' @param categorical_feature categorical features. This can either be a character vector of feature
+#'                            names or an integer vector with the indices of the features (e.g.
+#'                            \code{c(1L, 10L)} to say "the first and tenth columns").
 #' @param callbacks List of callback functions that are applied at each iteration.
 #' @param reset_data Boolean, setting it to TRUE (not the default value) will transform the
 #'                   booster model into a predictor model which frees up memory and the
 #'                   original datasets
-#' @param ... other parameters, see Parameters.rst for more information. A few key parameters:
+#' @param ... other parameters, see \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
+#'            the "Parameters" section of the documentation} for more information. A few key parameters:
 #'            \itemize{
 #'                \item{\code{boosting}: Boosting type. \code{"gbdt"}, \code{"rf"}, \code{"dart"} or \code{"goss"}.}
 #'                \item{\code{num_leaves}: Maximum number of leaves in one tree.}
 #'                \item{\code{max_depth}: Limit the max depth for tree model. This is used to deal with
-#'                                 overfit when #data is small. Tree still grow by leaf-wise.}
+#'                                 overfitting. Tree still grow by leaf-wise.}
 #'                \item{\code{num_threads}: Number of threads for LightGBM. For the best speed, set this to
 #'                             the number of real CPU cores(\code{parallel::detectCores(logical = FALSE)}),
 #'                             not the number of threads (most CPU using hyper-threading to generate 2 threads
--- a/R-package/R/lightgbm.R
+++ b/R-package/R/lightgbm.R
@@ -5,10 +5,11 @@
 #' @param data a \code{lgb.Dataset} object, used for training. Some functions, such as \code{\link{lgb.cv}},
 #'             may allow you to pass other types of data like \code{matrix} and then separately supply
 #'             \code{label} as a keyword argument.
-#' @param early_stopping_rounds int. Activates early stopping. Requires at least one validation data
-#'                              and one metric. If there's more than one, will check all of them
-#'                              except the training data. Returns the model with (best_iter + early_stopping_rounds).
-#'                              If early stopping occurs, the model will have 'best_iter' field.
+#' @param early_stopping_rounds int. Activates early stopping. When this parameter is non-null,
+#'                              training will stop if the evaluation of any metric on any validation set
+#'                              fails to improve for \code{early_stopping_rounds} consecutive boosting rounds.
+#'                              If training stops early, the returned model will have attribute \code{best_iter}
+#'                              set to the iteration number of the best iteration.
 #' @param eval evaluation function(s). This can be a character vector, function, or list with a mixture of
 #'             strings and functions.
 #'
@@ -48,7 +49,8 @@
 #' @param obj objective function, can be character or custom objective function. Examples include
 #'            \code{regression}, \code{regression_l1}, \code{huber},
 #'            \code{binary}, \code{lambdarank}, \code{multiclass}, \code{multiclass}
-#' @param params List of parameters
+#' @param params a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
+#'               the "Parameters" section of the documentation} for a list of parameters and valid values.
 #' @param verbose verbosity for output, if <= 0, also will disable the print of evaluation during training
 #' @section Early Stopping:
 #'
--- a/R-package/R/saveRDS.lgb.Booster.R
+++ b/R-package/R/saveRDS.lgb.Booster.R
@@ -2,7 +2,7 @@
 #' @title saveRDS for \code{lgb.Booster} models
 #' @description Attempts to save a model using RDS. Has an additional parameter (\code{raw})
 #'              which decides whether to save the raw model or not.
-#' @param object R object to serialize.
+#' @param object \code{lgb.Booster} object to serialize.
 #' @param file a connection or the name of the file where the R object is saved to or read from.
 #' @param ascii a logical. If TRUE or NA, an ASCII representation is written; otherwise (default),
 #'              a binary one is used. See the comments in the help for save.
--- a/R-package/man/lgb.Dataset.Rd
+++ b/R-package/man/lgb.Dataset.Rd
@@ -18,15 +18,26 @@ lgb.Dataset(
 \arguments{
 \item{data}{a \code{matrix} object, a \code{dgCMatrix} object or a character representing a filename}

-\item{params}{a list of parameters}
+\item{params}{a list of parameters. See
+\href{https://lightgbm.readthedocs.io/en/latest/Parameters.html#dataset-parameters}{
+The "Dataset Parameters" section of the documentation} for a list of parameters
+and valid values.}

-\item{reference}{reference dataset}
+\item{reference}{reference dataset. When LightGBM creates a Dataset, it does some preprocessing like binning
+continuous features into histograms. If you want to apply the same bin boundaries from an existing
+dataset to new \code{data}, pass that existing Dataset to this argument.}

 \item{colnames}{names of columns}

-\item{categorical_feature}{categorical features}
+\item{categorical_feature}{categorical features. This can either be a character vector of feature
+names or an integer vector with the indices of the features (e.g.
+\code{c(1L, 10L)} to say "the first and tenth columns").}

-\item{free_raw_data}{TRUE for need to free raw data after construct}
+\item{free_raw_data}{LightGBM constructs its data format, called a "Dataset", from tabular data.
+By default, that Dataset object on the R side does not keep a copy of the raw data.
+This reduces LightGBM's memory consumption, but it means that the Dataset object
+cannot be changed after it has been constructed. If you'd prefer to be able to
+change the Dataset object after construction, set \code{free_raw_data = FALSE}.}

 \item{info}{a list of information of the \code{lgb.Dataset} object}

--- a/R-package/man/lgb.cv.Rd
+++ b/R-package/man/lgb.cv.Rd
@@ -29,7 +29,8 @@ lgb.cv(
 )
 }
 \arguments{
-\item{params}{List of parameters}
+\item{params}{a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
+the "Parameters" section of the documentation} for a list of parameters and valid values.}

 \item{data}{a \code{lgb.Dataset} object, used for training. Some functions, such as \code{\link{lgb.cv}},
 may allow you to pass other types of data like \code{matrix} and then separately supply
@@ -104,10 +105,11 @@ the \code{nfold} and \code{stratified} parameters are ignored.}
 names or an integer vector with the indices of the features (e.g.
 \code{c(1L, 10L)} to say "the first and tenth columns").}

-\item{early_stopping_rounds}{int. Activates early stopping. Requires at least one validation data
-and one metric. If there's more than one, will check all of them
-except the training data. Returns the model with (best_iter + early_stopping_rounds).
-If early stopping occurs, the model will have 'best_iter' field.}
+\item{early_stopping_rounds}{int. Activates early stopping. When this parameter is non-null,
+training will stop if the evaluation of any metric on any validation set
+fails to improve for \code{early_stopping_rounds} consecutive boosting rounds.
+If training stops early, the returned model will have attribute \code{best_iter}
+set to the iteration number of the best iteration.}

 \item{callbacks}{List of callback functions that are applied at each iteration.}

--- a/R-package/man/lgb.train.Rd
+++ b/R-package/man/lgb.train.Rd
@@ -24,7 +24,8 @@ lgb.train(
 )
 }
 \arguments{
-\item{params}{List of parameters}
+\item{params}{a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
+the "Parameters" section of the documentation} for a list of parameters and valid values.}

 \item{data}{a \code{lgb.Dataset} object, used for training. Some functions, such as \code{\link{lgb.cv}},
 may allow you to pass other types of data like \code{matrix} and then separately supply
@@ -82,14 +83,15 @@ may allow you to pass other types of data like \code{matrix} and then separately

 \item{colnames}{feature names, if not null, will use this to overwrite the names in dataset}

-\item{categorical_feature}{list of str or int
-type int represents index,
-type str represents feature names}
+\item{categorical_feature}{categorical features. This can either be a character vector of feature
+names or an integer vector with the indices of the features (e.g.
+\code{c(1L, 10L)} to say "the first and tenth columns").}

-\item{early_stopping_rounds}{int. Activates early stopping. Requires at least one validation data
-and one metric. If there's more than one, will check all of them
-except the training data. Returns the model with (best_iter + early_stopping_rounds).
-If early stopping occurs, the model will have 'best_iter' field.}
+\item{early_stopping_rounds}{int. Activates early stopping. When this parameter is non-null,
+training will stop if the evaluation of any metric on any validation set
+fails to improve for \code{early_stopping_rounds} consecutive boosting rounds.
+If training stops early, the returned model will have attribute \code{best_iter}
+set to the iteration number of the best iteration.}

 \item{callbacks}{List of callback functions that are applied at each iteration.}

@@ -97,12 +99,13 @@ If early stopping occurs, the model will have 'best_iter' field.}
 booster model into a predictor model which frees up memory and the
 original datasets}

-\item{...}{other parameters, see Parameters.rst for more information. A few key parameters:
+\item{...}{other parameters, see \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
+the "Parameters" section of the documentation} for more information. A few key parameters:
 \itemize{
    \item{\code{boosting}: Boosting type. \code{"gbdt"}, \code{"rf"}, \code{"dart"} or \code{"goss"}.}
    \item{\code{num_leaves}: Maximum number of leaves in one tree.}
    \item{\code{max_depth}: Limit the max depth for tree model. This is used to deal with
-                     overfit when #data is small. Tree still grow by leaf-wise.}
+                     overfitting. Tree still grow by leaf-wise.}
    \item{\code{num_threads}: Number of threads for LightGBM. For the best speed, set this to
                 the number of real CPU cores(\code{parallel::detectCores(logical = FALSE)}),
                 not the number of threads (most CPU using hyper-threading to generate 2 threads
--- a/R-package/man/lgb_shared_params.Rd
+++ b/R-package/man/lgb_shared_params.Rd
@@ -10,10 +10,11 @@
 may allow you to pass other types of data like \code{matrix} and then separately supply
 \code{label} as a keyword argument.}

-\item{early_stopping_rounds}{int. Activates early stopping. Requires at least one validation data
-and one metric. If there's more than one, will check all of them
-except the training data. Returns the model with (best_iter + early_stopping_rounds).
-If early stopping occurs, the model will have 'best_iter' field.}
+\item{early_stopping_rounds}{int. Activates early stopping. When this parameter is non-null,
+training will stop if the evaluation of any metric on any validation set
+fails to improve for \code{early_stopping_rounds} consecutive boosting rounds.
+If training stops early, the returned model will have attribute \code{best_iter}
+set to the iteration number of the best iteration.}

 \item{eval}{evaluation function(s). This can be a character vector, function, or list with a mixture of
            strings and functions.
@@ -59,7 +60,8 @@ If early stopping occurs, the model will have 'best_iter' field.}
 \code{regression}, \code{regression_l1}, \code{huber},
 \code{binary}, \code{lambdarank}, \code{multiclass}, \code{multiclass}}

-\item{params}{List of parameters}
+\item{params}{a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
+the "Parameters" section of the documentation} for a list of parameters and valid values.}

 \item{verbose}{verbosity for output, if <= 0, also will disable the print of evaluation during training}
 }
--- a/R-package/man/lightgbm.Rd
+++ b/R-package/man/lightgbm.Rd
@@ -28,7 +28,8 @@ may allow you to pass other types of data like \code{matrix} and then separately

 \item{weight}{vector of response values. If not NULL, will set to dataset}

-\item{params}{List of parameters}
+\item{params}{a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
+the "Parameters" section of the documentation} for a list of parameters and valid values.}

 \item{nrounds}{number of training rounds}

@@ -36,10 +37,11 @@ may allow you to pass other types of data like \code{matrix} and then separately

 \item{eval_freq}{evaluation output frequency, only effect when verbose > 0}

-\item{early_stopping_rounds}{int. Activates early stopping. Requires at least one validation data
-and one metric. If there's more than one, will check all of them
-except the training data. Returns the model with (best_iter + early_stopping_rounds).
-If early stopping occurs, the model will have 'best_iter' field.}
+\item{early_stopping_rounds}{int. Activates early stopping. When this parameter is non-null,
+training will stop if the evaluation of any metric on any validation set
+fails to improve for \code{early_stopping_rounds} consecutive boosting rounds.
+If training stops early, the returned model will have attribute \code{best_iter}
+set to the iteration number of the best iteration.}

 \item{save_name}{File name to use when writing the trained model to disk. Should end in ".model".}

--- a/R-package/man/saveRDS.lgb.Booster.Rd
+++ b/R-package/man/saveRDS.lgb.Booster.Rd
@@ -15,7 +15,7 @@ saveRDS.lgb.Booster(
 )
 }
 \arguments{
-\item{object}{R object to serialize.}
+\item{object}{\code{lgb.Booster} object to serialize.}

 \item{file}{a connection or the name of the file where the R object is saved to or read from.}