From b4535b8d843c9f0881d9f30a0869def4a1aec83f Mon Sep 17 00:00:00 2001
From: Nikita Titov
Date: Sat, 9 Jun 2018 05:04:04 +0300
Subject: [PATCH] [docs] refined parameters section in quick start guide doc
 (#1427)

---
 docs/Parameters.rst  |   2 +-
 docs/Quick-Start.rst | 176 ++-----------------------------------------
 2 files changed, 9 insertions(+), 169 deletions(-)

diff --git a/docs/Parameters.rst b/docs/Parameters.rst
index 1f26bbd9f..7091a351e 100644
--- a/docs/Parameters.rst
+++ b/docs/Parameters.rst
@@ -23,7 +23,7 @@ Parameters can be set both in config file and command line.
 By using command line, parameters should not have spaces before and after ``=``.
 By using config files, one line can only contain one parameter. You can use ``#`` to comment.
 
-If one parameter appears in both command line and config file, LightGBM will use the parameter in command line.
+If one parameter appears in both command line and config file, LightGBM will use the parameter from the command line.
 
 .. start params list
 
diff --git a/docs/Quick-Start.rst b/docs/Quick-Start.rst
index 21efb29b4..fe93a3b4c 100644
--- a/docs/Quick-Start.rst
+++ b/docs/Quick-Start.rst
@@ -42,151 +42,17 @@ And it needs an additional `query data <./Parameters.rst#io-parameters>`_ for ra
 
 Also, weight and query data could be specified as columns in training data in the same manner as label.
 
-Parameter Quick Look
---------------------
+Parameters Quick Look
+---------------------
+
+The parameters format is ``key1=value1 key2=value2 ...``.
 
-The parameter format is ``key1=value1 key2=value2 ...``.
 Parameters can be set both in config file and command line.
+If one parameter appears in both command line and config file, LightGBM will use the parameter from the command line.
 
-Some important parameters:
-
-- ``config``, default=\ ``""``, type=string, alias=\ ``config_file``
-
-  - path to config file
-
-- ``task``, default=\ ``train``, type=enum, options=\ ``train``, ``predict``, ``convert_model``
-
-  - ``train``, alias=\ ``training``, for training
-
-  - ``predict``, alias=\ ``prediction``, ``test``, for prediction
-
-  - ``convert_model``, for converting model file into if-else format, see more information in `IO Parameters <./Parameters.rst#io-parameters>`__
-
-- ``application``, default=\ ``regression``, type=enum,
-  options=\ ``regression``, ``regression_l1``, ``huber``, ``fair``, ``poisson``, ``quantile``, ``mape``, ``gammma``, ``tweedie``,
-  ``binary``, ``multiclass``, ``multiclassova``, ``xentropy``, ``xentlambda``, ``lambdarank``,
-  alias=\ ``objective``, ``app``
-
-  - regression application
-
-    - ``regression_l2``, L2 loss, alias=\ ``regression``, ``mean_squared_error``, ``mse``, ``l2_root``, ``root_mean_squared_error``, ``rmse``
-
-    - ``regression_l1``, L1 loss, alias=\ ``mean_absolute_error``, ``mae``
-
-    - ``huber``, `Huber loss`_
-
-    - ``fair``, `Fair loss`_
-
-    - ``poisson``, `Poisson regression`_
-
-    - ``quantile``, `Quantile regression`_
-
-    - ``mape``, `MAPE loss`_, alias=\ ``mean_absolute_percentage_error``
-
-    - ``gamma``, Gamma regression with log-link. It might be useful, e.g., for modeling insurance claims severity, or for any target that might be `gamma-distributed`_
-
-    - ``tweedie``, Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any target that might be `tweedie-distributed`_
-
-  - ``binary``, binary `log loss`_ classification application
-
-  - multi-class classification application
-
-    - ``multiclass``, `softmax`_ objective function, alias=\ ``softmax``
-
-    - ``multiclassova``, `One-vs-All`_ binary objective function, alias=\ ``multiclass_ova``, ``ova``, ``ovr``
-
-    - ``num_class`` should be set as well
-
-  - cross-entropy application
-
-    - ``xentropy``, objective function for cross-entropy (with optional linear weights), alias=\ ``cross_entropy``
-
-    - ``xentlambda``, alternative parameterization of cross-entropy, alias=\ ``cross_entropy_lambda``
-
-    - the label is anything in interval [0, 1]
-
-  - ``lambdarank``, `lambdarank`_ application
-
-    - the label should be ``int`` type in lambdarank tasks, and larger number represent the higher relevance (e.g. 0:bad, 1:fair, 2:good, 3:perfect)
-
-    - ``label_gain`` can be used to set the gain(weight) of ``int`` label
-
-    - all values in ``label`` must be smaller than number of elements in ``label_gain``
-
-- ``boosting``, default=\ ``gbdt``, type=enum,
-  options=\ ``gbdt``, ``rf``, ``dart``, ``goss``,
-  alias=\ ``boost``, ``boosting_type``
-
-  - ``gbdt``, traditional Gradient Boosting Decision Tree
-
-  - ``rf``, Random Forest
-
-  - ``dart``, `Dropouts meet Multiple Additive Regression Trees`_
-
-  - ``goss``, Gradient-based One-Side Sampling
-
-- ``data``, default=\ ``""``, type=string, alias=\ ``train``, ``train_data``
-
-  - training data, LightGBM will train from this data
-
-- ``valid``, default=\ ``""``, type=multi-string, alias=\ ``test``, ``valid_data``, ``test_data``
-
-  - validation/test data, LightGBM will output metrics for these data
-
-  - support multi validation data, separate by ``,``
-
-- ``num_iterations``, default=\ ``100``, type=int,
-  alias=\ ``num_iteration``, ``num_tree``, ``num_trees``, ``num_round``, ``num_rounds``, ``num_boost_round``, ``n_estimators``
-
-  - number of boosting iterations
-
-- ``learning_rate``, default=\ ``0.1``, type=double, alias=\ ``shrinkage_rate``
-
-  - shrinkage rate
-
-- ``num_leaves``, default=\ ``31``, type=int, alias=\ ``num_leaf``
-
-  - number of leaves in one tree
-
-- ``tree_learner``, default=\ ``serial``, type=enum, options=\ ``serial``, ``feature``, ``data``, ``voting``, alias=\ ``tree``
-
-  - ``serial``, single machine tree learner
-
-  - ``feature``, alias=\ ``feature_parallel``, feature parallel tree learner
-
-  - ``data``, alias=\ ``data_parallel``, data parallel tree learner
-
-  - ``voting``, alias=\ ``voting_parallel``, voting parallel tree learner
-
-  - refer to `Parallel Learning Guide <./Parallel-Learning-Guide.rst>`__ to get more details
-
-- ``num_threads``, default=\ ``OpenMP_default``, type=int, alias=\ ``num_thread``, ``nthread``
-
-  - number of threads for LightGBM
-
-  - for the best speed, set this to the number of **real CPU cores**,
-    not the number of threads (most CPU using `hyper-threading`_ to generate 2 threads per CPU core)
-
-  - for parallel learning, should not use full CPU cores since this will cause poor performance for the network
-
-- ``max_depth``, default=\ ``-1``, type=int
-
-  - limit the max depth for tree model.
-    This is used to deal with over-fitting when ``#data`` is small.
-    Tree still grows by leaf-wise
-
-  - ``< 0`` means no limit
-
-- ``min_data_in_leaf``, default=\ ``20``, type=int, alias=\ ``min_data_per_leaf`` , ``min_data``, ``min_child_samples``
-
-  - minimal number of data in one leaf. Can be used this to deal with over-fitting
-
-- ``min_sum_hessian_in_leaf``, default=\ ``1e-3``, type=double,
-  alias=\ ``min_sum_hessian_per_leaf``, ``min_sum_hessian``, ``min_hessian``, ``min_child_weight``
-
-  - minimal sum hessian in one leaf. Like ``min_data_in_leaf``, it can be used to deal with over-fitting
-
-For all parameters, please refer to `Parameters <./Parameters.rst>`__.
+The most important parameters, which new users should look at first, are located in the `Core Parameters <./Parameters.rst#core-parameters>`__
+section and at the top of the `Learning Control Parameters <./Parameters.rst#learning-control-parameters>`__
+section of the full, detailed list of `LightGBM's parameters <./Parameters.rst>`__.
 
 Run LightGBM
 ------------
@@ -228,29 +94,3 @@ Examples
 .. _LibSVM: https://www.csie.ntu.edu.tw/~cjlin/libsvm/
 
 .. _Expo data: http://stat-computing.org/dataexpo/2009/
-
-.. _Huber loss: https://en.wikipedia.org/wiki/Huber_loss
-
-.. _Fair loss: https://www.kaggle.com/c/allstate-claims-severity/discussion/24520
-
-.. _Poisson regression: https://en.wikipedia.org/wiki/Poisson_regression
-
-.. _Quantile regression: https://en.wikipedia.org/wiki/Quantile_regression
-
-.. _MAPE loss: https://en.wikipedia.org/wiki/Mean_absolute_percentage_error
-
-.. _log loss: https://en.wikipedia.org/wiki/Cross_entropy
-
-.. _softmax: https://en.wikipedia.org/wiki/Softmax_function
-
-.. _One-vs-All: https://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest
-
-.. _lambdarank: https://papers.nips.cc/paper/2971-learning-to-rank-with-nonsmooth-cost-functions.pdf
-
-.. _Dropouts meet Multiple Additive Regression Trees: https://arxiv.org/abs/1505.01866
-
-.. _hyper-threading: https://en.wikipedia.org/wiki/Hyper-threading
-
-.. _gamma-distributed: https://en.wikipedia.org/wiki/Gamma_distribution#Applications
-
-.. _tweedie-distributed: https://en.wikipedia.org/wiki/Tweedie_distribution#Applications
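
Below is a minimal illustration of the parameter handling this patch documents: the ``key1=value1 key2=value2 ...`` format, one parameter per line in config files with ``#`` starting a comment, and command-line values taking precedence over config-file values. The file name and parameter values are illustrative only and are not part of the patch.

A config file, e.g. ``train.conf``::

    # one parameter per line; '#' starts a comment
    task = train
    objective = binary
    data = binary.train
    num_leaves = 31
    learning_rate = 0.1

Run LightGBM with this config while overriding one parameter on the command line (parameters there should not have spaces before and after ``=``)::

    "./lightgbm" config=train.conf learning_rate=0.05

Training proceeds with ``learning_rate=0.05``: since the parameter appears in both places, the command-line value overrides the ``learning_rate = 0.1`` line from the config file.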