509 commits

Author SHA1 Message Date
david-cortes f3ea1ad724
[python-package] Use scikit-learn interpretation of negative `n_jobs` and change default to number of cores (#5105)
* use joblib formula for negative n_jobs

* correction for n_jobs calculation

* use more robust cpu_count from joblib

* change default n_jobs to number of cores

* fix detection of num_threads under parameters

* better handling of n_jobs at prediction time

* fix incorrect usage of list.pop

* correct pop/remove yet again

* Update python-package/lightgbm/sklearn.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update tests/python_package_test/test_sklearn.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update tests/python_package_test/test_sklearn.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* add comments clarifying negative n_jobs

* fix CI (code taken from PR comment)

* change default to n_jobs=None in dask interface

* corrections for handling of n_jobs

* linter

* corrections for predict-time n_jobs

* linter

* add more comments about n_jobs values

* linter

* more corrections

* linter

* linter

* linter

* Update python-package/lightgbm/compat.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/sklearn.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/sklearn.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/sklearn.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/sklearn.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* workaround for passing test about outputs with multiple threads

* Update tests/python_package_test/test_sklearn.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update tests/python_package_test/test_sklearn.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2022-06-19 03:30:15 +03:00
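The commit above adopts scikit-learn's reading of negative `n_jobs`. The sketch below is not LightGBM's code. It shows the joblib-style formula, which maps a negative value to `n_cpus + 1 + n_jobs` and clamps the result to at least one thread, so `-1` means all cores and `-2` means all but one. The helper `resolve_n_jobs` and its explicit `n_cpus` argument are hypothetical and exist only to make the behaviour easy to test. In this sketch, `None` falls back to the core count and non-negative values pass through unchanged.

```python
import os
from typing import Optional


def resolve_n_jobs(n_jobs: Optional[int], n_cpus: Optional[int] = None) -> int:
    """Hypothetical helper: map a scikit-learn style n_jobs to a thread count."""
    if n_cpus is None:
        # os.cpu_count() may return None on exotic platforms.
        n_cpus = os.cpu_count() or 1
    if n_jobs is None:
        # Default in this sketch: use every available core.
        return n_cpus
    if n_jobs < 0:
        # joblib formula: -1 -> all cores, -2 -> all but one, ...
        # never go below a single thread.
        return max(n_cpus + 1 + n_jobs, 1)
    # Non-negative values are passed through unchanged.
    return n_jobs
```

On an 8-core machine, `resolve_n_jobs(-2, n_cpus=8)` gives 7. A very negative value such as `-20` is clamped to 1 rather than producing a nonsensical thread count.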
James Lamb 11110c540e
[python-package] remove `Booster.set_attr()` and `Booster.attr()` (#5272) 2022-06-12 20:18:26 +03:00
shiyu1994 f1328d5c5f
Clear split info buffer in cost efficient gradient boosting before every iteration (fix partially #3679) (#5164)
* clear split info buffer in cegb_ before every iteration

* check nullable of cegb_ in serial_tree_learner.cpp

* add a test case for checking the split buffer in CEGB

* switch to Threading::For instead of raw OpenMP

* apply review suggestions

* apply review comments

* remove device cpu
2022-06-07 22:03:10 -05:00
Nikita Titov 27d9ad2e8e
[tests][python] Make test that checks original pandas data isn't modified more strict (#5267)
* Update test_basic.py

* Address review comment
2022-06-05 17:34:10 -05:00
José Morales 65b3db1c9f
[python-package] make a shallow copy on dataframe rename (fixes #4596) (#5254)
* dont copy dataframe on rename

* test with feature_name and 'auto'
2022-06-05 04:38:57 +03:00
Nikita Titov a4478f7e35
[python] Fix training on subset constructed without params (#5213)
* Update basic.py

* Update test_engine.py

* Add return type annotation
2022-05-24 03:45:42 +03:00
José Morales c000b8cc68
[python-package] make a shallow copy when replacing categorical features with codes (fixes #4596) (#5225) 2022-05-22 09:28:28 +08:00
José Morales 5b664b67c4
[python-package][R-package] allow using feature names when retrieving number of bins (#5116)
* allow using feature names when retrieving number of bins

* unname vector

* use default feature names when not defined

* lint

* apply suggestions

* remove extra comma

* add test with categorical feature

* make feature names sync more transparent
2022-05-16 21:45:13 -05:00
Nikita Titov 6de9bafaeb
Fix potential overflow "Multiplication result converted to larger type" (#5189)
* Update dataset_loader.cpp

* Update gbdt.h

* Update regression_objective.hpp

* Update linker_topo.cpp

* Update xentropy_objective.hpp

* Update regression_objective.hpp

* investigate inf test failure

* avoid overflow in regression objective

* remove `test_inf_handle` test

Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
2022-05-10 10:56:24 +08:00
José Morales f53fa6912e
[c-api] check number of features when retrieving number of bins (#5183)
* check number of features when retrieving number of bins

* check for negative values

* lint
2022-04-30 18:44:40 +03:00
Nikita Titov 56ccea4243
[tests] replace `fobj` with `custom objective` in test comments and make tests stricter (#5173) 2022-04-24 19:20:30 +03:00
Miguel Trejo Marrufo 416ecd5a8d
[python-package] remove 'fobj' in favor of passing custom objective function in params (fixes #3244) (#5052)
* feat: support custom metrics in params

* feat: support objective in params

* test: custom objective and metric

* fix: imports are incorrectly sorted

* feat: convert eval metrics str and set to list

* feat: convert single callable eval_metric to list

* test: single callable objective in params

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* feat: callable fobj in basic cv function

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* test: cv support objective callable

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* fix: assert in cv_res

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* docs: objective callable in params

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* recover test_boost_from_average_with_single_leaf_trees

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* linters fail

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* remove metrics helper functions

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* feat: choose objective through _choose_param_values

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* test: test objective through _choose_param_values

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* test: test objective is callable in train

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* test: parametrize choose_param_value with objective aliases

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* test: cv booster metric is none

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* fix: if string and callable choose callable

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* test train uses custom objective metrics

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* test: cv uses custom objective metrics

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* refactor: remove fobj parameter in train and cv

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* refactor: objective through params in sklearn API

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* custom objective function in advanced_example

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* fix whitespace lint

* objective is none not a particular case for predict method

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* replace scipy.expit with custom implementation

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* test: set num_boost_round value to 20

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* fix: custom objective default_value is none

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* refactor: remove self._fobj

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* custom_objective default value is None

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* refactor: variables name reference dummy_obj

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* linter errors

* fix: process objective parameter when calling predict

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>

* linter errors

* fix: objective is None during predict call

Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
2022-04-21 22:41:49 -05:00
Nikita Titov 4ae3d1387d
[python] make `reset_parameter` callback pickleable (#5109) 2022-03-31 22:33:55 +03:00
Nikita Titov 60244e4a41
[python] make `record_evaluation` callback pickleable (#5107)
* make `log_evaluation` callback pickleable

* make callback tests stricter

* make `record_evaluation` callback picklable
2022-03-31 01:09:38 +03:00
Nikita Titov 8b33e776cc
[python] make `log_evaluation` callback pickleable (#5101)
* make `log_evaluation` callback pickleable

* make callback tests stricter
2022-03-30 21:52:46 +03:00
RustingSword 60e72d5f4e
[python] allow to register any custom logger (fixes #4783) (#4880)
* [python] allow to register any custom logger

* allow customizable logging method name; add unit test

* [python] allow to register any custom logger

* allow customizable logging method name; add unit test

* update tests

* fix lint error

* remove unused method

* fix docstring style

Co-authored-by: gongxudong <gongxudong@kuaishou.com>
2022-03-29 02:01:43 +03:00
shiyu1994 6b56a90cd1
[CUDA] New CUDA version Part 1 (#4630)
* new cuda framework

* add histogram construction kernel

* before removing multi-gpu

* new cuda framework

* tree learner cuda kernels

* single tree framework ready

* single tree training framework

* remove comments

* boosting with cuda

* optimize for best split find

* data split

* move boosting into cuda

* parallel synchronize best split point

* merge split data kernels

* before code refactor

* use tasks instead of features as units for split finding

* refactor cuda best split finder

* fix configuration error with small leaves in data split

* skip histogram construction of too small leaf

* skip split finding of invalid leaves

stop when no leaf to split

* support row wise with CUDA

* copy data for split by column

* copy data from host to CPU by column for data partition

* add synchronize best splits for one leaf from multiple blocks

* partition dense row data

* fix sync best split from task blocks

* add support for sparse row wise for CUDA

* remove useless code

* add l2 regression objective

* sparse multi value bin enabled for CUDA

* fix cuda ranking objective

* support for number of items <= 2048 per query

* speedup histogram construction by interleaving global memory access

* split optimization

* add cuda tree predictor

* remove comma

* refactor objective and score updater

* before use struct

* use structure for split information

* use structure for leaf splits

* return CUDASplitInfo directly after finding best split

* split with CUDATree directly

* use cuda row data in cuda histogram constructor

* clean src/treelearner/cuda

* gather shared cuda device functions

* put shared CUDA functions into header file

* change smaller leaf from <= back to < for consistent result with CPU

* add tree predictor

* remove useless cuda_tree_predictor

* predict on CUDA with pipeline

* add global sort algorithms

* add global argsort for queries with many items in ranking tasks

* remove limitation of maximum number of items per query in ranking

* add cuda metrics

* fix CUDA AUC

* remove debug code

* add regression metrics

* remove useless file

* don't use mask in shuffle reduce

* add more regression objectives

* fix cuda mape loss

add cuda xentropy loss

* use template for different versions of BitonicArgSortDevice

* add multiclass metrics

* add ndcg metric

* fix cross entropy objectives and metrics

* fix cross entropy and ndcg metrics

* add support for customized objective in CUDA

* complete multiclass ova for CUDA

* separate cuda tree learner

* use shuffle based prefix sum

* clean up cuda_algorithms.hpp

* add copy subset on CUDA

* add bagging for CUDA

* clean up code

* copy gradients from host to device

* support bagging without using subset

* add support of bagging with subset for CUDAColumnData

* add support of bagging with subset for dense CUDARowData

* refactor copy sparse subrow

* use copy subset for column subset

* add reset train data and reset config for CUDA tree learner

add destructors for cuda tree learner

* add USE_CUDA ifdef to cuda tree learner files

* check that dataset doesn't contain CUDA tree learner

* remove printf debug information

* use full new cuda tree learner only when using single GPU

* disable all CUDA code when using CPU version

* recover main.cpp

* add cpp files for multi value bins

* update LightGBM.vcxproj

* update LightGBM.vcxproj

fix lint errors

* fix lint errors

* fix lint errors

* update Makevars

fix lint errors

* fix the case with 0 feature and 0 bin

fix split finding for invalid leaves

create cuda column data when loaded from bin file

* fix lint errors

hide GetRowWiseData when cuda is not used

* recover default device type to cpu

* fix na_as_missing case

fix cuda feature meta information

* fix UpdateDataIndexToLeafIndexKernel

* create CUDA trees when needed in CUDADataPartition::UpdateTrainScore

* add refit by tree for cuda tree learner

* fix test_refit in test_engine.py

* create set of large bin partitions in CUDARowData

* add histogram construction for columns with a large number of bins

* add find best split for categorical features on CUDA

* add bitvectors for categorical split

* cuda data partition split for categorical features

* fix split tree with categorical feature

* fix categorical feature splits

* refactor cuda_data_partition.cu with multi-level templates

* refactor CUDABestSplitFinder by grouping task information into struct

* pre-allocate space for vector split_find_tasks_ in CUDABestSplitFinder

* fix misuse of reference

* remove useless changes

* add support for path smoothing

* virtual destructor for LightGBM::Tree

* fix overlapped cat threshold in best split infos

* reset histogram pointers in data partition and split finder in ResetConfig

* comment useless parameter

* fix reverse case when na is missing and default bin is zero

* fix mfb_is_na and mfb_is_zero and is_single_feature_column

* remove debug log

* fix cat_l2 when one-hot

fix gradient copy when data subset is used

* switch shared histogram size according to CUDA version

* gpu_use_dp=true when cuda test

* revert modification in config.h

* fix setting of gpu_use_dp=true in .ci/test.sh

* fix linter errors

* fix linter error

remove useless change

* recover main.cpp

* separate cuda_exp and cuda

* fix ci bash scripts

add description for cuda_exp

* add USE_CUDA_EXP flag

* switch off USE_CUDA_EXP

* revert changes in python-packages

* more careful separation for USE_CUDA_EXP

* fix CUDARowData::DivideCUDAFeatureGroups

fix set fields for cuda metadata

* revert config.h

* fix test settings for cuda experimental version

* skip some tests due to unsupported features or differences in implementation details for CUDA Experimental version

* fix lint issue by adding a blank line

* fix lint errors by resorting imports

* fix lint errors by resorting imports

* fix lint errors by resorting imports

* merge cuda.yml and cuda_exp.yml

* update python version in cuda.yml

* remove cuda_exp.yml

* remove unrelated changes

* fix compilation warnings

fix cuda exp ci task name

* recover task

* use multi-level template in histogram construction

check split only in debug mode

* ignore NVCC related lines in parameter_generator.py

* update job name for CUDA tests

* apply review suggestions

* Update .github/workflows/cuda.yml

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update .github/workflows/cuda.yml

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update header

* remove useless TODOs

* remove [TODO(shiyu1994): constrain the split with min_data_in_group] and record in #5062

* #include <LightGBM/utils/log.h> for USE_CUDA_EXP only

* fix include order

* fix include order

* remove extra space

* address review comments

* add warning when cuda_exp is used together with deterministic

* add comment about gpu_use_dp in .ci/test.sh

* revert changing order of included headers

Co-authored-by: Yu Shi <shiyu1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2022-03-23 10:39:23 +08:00
James Lamb b857ee10cc
clarify no-meaningful-features warning in Dataset construction (fixes #5081) (#5083)
* clarify no-meaningful-features warning in Dataset construction (fixes #5081)

* update tests
2022-03-22 13:10:38 +08:00
Antoni Baum f77e0adf59
[python] make `early_stopping` callback pickleable (#5012)
* Turn `early_stopping` into a Callable class

* Fix

* Lint

* Remove print

* Fix order

* Revert "Lint"

This reverts commit 7ca8b55757.

* Apply suggestion from code review

* Nit

* Lint

* Move callable class outside the func for pickling

* Move _pickle and _unpickle to tests utils

* Add early stopping callback picklability test

* Nit

* Fix

* Lint

* Improve type hint

* Lint

* Lint

* Add cloudpickle to test_windows

* Update tests/python_package_test/test_engine.py

* Fix

* Apply suggestions from code review
2022-03-16 23:03:53 -05:00
José Morales d10372e2e0
[c-api][python-package][R-package] expose feature num bin (#5048)
* expose FeatureNumBin in C api

* parametrize min_data_in_bin and add test with max_bin_by_feature

* include feature_num_bin in R package

* add suggestion from review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update error message and lint

* lint

* add call method

* minor improvements in tests

* add suggestions from review

* lint

* rename argument to feature in python and r packages

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2022-03-15 06:39:40 +03:00
José Morales 9a4e70687d
[python-package] [R-package] propagate the best iteration of cvbooster into the individual boosters (#5066) 2022-03-12 22:14:43 +03:00
shiyu1994 f6d654b737
[fix] fix duplicate added initial scores for single-leaf trees (fixes #4708)
* fix duplicate added initial scores for single-leaf trees

* add test case

* Fix import in Python test

* commit python suggestions

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2022-03-08 21:34:17 -06:00
Nikita Titov 01568cf59a
[tests][python] move tests that use `train()` function defined in `engine.py` from `test_basic.py` to `test_engine.py` (#5034)
* Update test_basic.py

* Update test_engine.py

* Update test_engine.py
2022-03-01 22:19:39 +08:00
José Morales f185695617
[python-package] add support for pandas nullable types (fixes #4173) (#4927)
* map nullable dtypes to regular float dtypes

* cast x3 to float after introducing missing values

* add test for regular dtypes

* use .astype and then values. update nullable_dtypes test and include test for regular numpy dtypes

* more specific allowed dtypes. test no copy when single float dtype df

* use np.find_common_type. set np.float128 to None when it isn't supported

* set default as type(None)

* move tests that use lgb.train to test_engine

* include np.float32 when finding common dtype

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* add linebreak

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2022-02-23 22:27:03 -06:00
José Morales d670a4d655
[python-package] use 2d collections for predictions, grads and hess in multiclass custom objective (#4925)
* reshape predictions, grad and hess in multiclass custom objective

* add sklearn test. move custom obj to utils. docs for numpy

* use num_model_per_iteration to get num_classes

* update docs and dask multiclass custom objective test

* move reshaping to __inner_predict. add test for feval

* add missing note. remove extra line
2022-02-23 11:54:04 +08:00
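After this change, a multiclass custom objective works with 2-D arrays of shape `(n_samples, n_classes)` instead of a flat vector. The sketch below is a softmax cross-entropy objective written against that layout, assuming the scikit-learn style `(y_true, y_pred)` signature where `y_pred` holds raw scores. The gradient `p - onehot(y)` is exact. The diagonal Hessian `p * (1 - p)` is a common simplification and not necessarily what LightGBM's built-in multiclass objective uses. The function names are hypothetical.

```python
import numpy as np


def _softmax(raw: np.ndarray) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    shifted = raw - raw.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def multiclass_softmax_objective(y_true, y_pred):
    """Hypothetical custom objective operating on 2-D raw scores.

    y_true: shape (n_samples,), integer class labels.
    y_pred: shape (n_samples, n_classes), raw (untransformed) scores.
    Returns grad and hess, both shaped like y_pred.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    labels = np.asarray(y_true, dtype=int)
    n_samples = y_pred.shape[0]
    prob = _softmax(y_pred)
    one_hot = np.zeros_like(prob)
    one_hot[np.arange(n_samples), labels] = 1.0
    grad = prob - one_hot          # exact gradient of softmax cross-entropy
    hess = prob * (1.0 - prob)     # diagonal Hessian approximation
    return grad, hess
```

Because gradients and Hessians keep the `(n_samples, n_classes)` shape, each column can be read directly as the per-class quantity, with no manual reshaping of a flattened vector.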
José Morales 9fc348af6f
[python-package] make record_evaluation compatible with cv (fixes #4943) (#4947)
* make record_evaluation compatible with cv

* test multiple metrics in cv

* lint

* fix cv with train metric. save stdv as well

* always add dataset prefix to cv_agg

* remove unused function
2022-02-16 02:23:04 +03:00
Nikita Titov a3e073ad3a
[tests][python] remove compatibility code for old versions in tests (#4978)
* Update test_dask.py

* Update test_engine.py

* Update test_sklearn.py

* Update test_sklearn.py

* Update test_sklearn.py

* Update test_sklearn.py

* Update test_sklearn.py

* Update test_sklearn.py

* Update test_engine.py

* Update test_sklearn.py

* Update test_sklearn.py

* Update test_sklearn.py
2022-02-13 00:47:28 +03:00
Miguel Trejo Marrufo e6a2f7162c
[python-package] support customizing Dataset creation in Booster.refit() (fixes #3038) (#4894)
* feat: refit additional kwargs for dataset and predict

* test: kwargs for refit method

* fix: __init__ got multiple values for argument

* fix: pycodestyle E302 error

* refactor: dataset_params to avoid breaking change

* refactor: expose all Dataset params in refit

* feat: dataset_params updates new_params

* fix: remove unnecessary params to test

* test: parameters input are the same

* docs: address StrikeRUS changes

* test: refit test changes in train dataset

* test: set init_score and decay_rate to zero
2022-01-22 23:17:16 +03:00
James Lamb a06fadfb7a
[dask] add support for custom objective functions (fixes #3934) (#4920)
* add test for custom objective with regressor

* add test for custom binary classification objective with classifier

* isort

* got tests working for multiclass

* update docs

* train deeper model for classifier

* Apply suggestions from code review

Co-authored-by: José Morales <jmoralz92@gmail.com>

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update multiclass tests

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* fix multiclass probabilities

* linting

Co-authored-by: José Morales <jmoralz92@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2022-01-17 23:30:26 +03:00
Yaqub Alwan af5b40e1f6
[python] raise an informative error instead of segfaulting when custom objective produces incorrect output (#4815)
* fix for bad grads causing segfault

* adjust checking criteria to properly reflect reality of multi-class classifiers

* fix styling

* Line break before operator

* Update python-package/lightgbm/basic.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/basic.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* add a note to the C-API docs

* rearrange text slightly

* add some tests to python package

* Update include/LightGBM/c_api.h

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* PR comments

* match argument is a regex and our expression has brackets ..

* rework tests

* isorting imports

* updating test to reflect that the Python API does not take preds/labels as a fobj function

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-12-30 13:12:00 +08:00
Nikita Titov ce486e5b45
[python] remove `early_stopping_rounds` argument of `train()` and `cv()` functions (#4908) 2021-12-26 17:20:49 +03:00
Nikita Titov e4c0ca5f5d
[python] remove `evals_result` argument of `train()` function (#4882) 2021-12-23 04:57:09 +03:00
José Morales 8a34b1af2d
[tests][python-package] change boston dataset to synthetic dataset in tests that don't check score (#4895)
* change boston dataset to synthetic dataset in tests that don't evaluate score

* format imports
2021-12-21 02:41:39 +03:00
Nikita Titov 8e729af38d
[python] reset storage in record evaluation callback each time before starting training (#4885)
* Update test_sklearn.py

* Update python_package.yml

* Update python_package.yml

* Update callback.py

* Update callback.py
2021-12-18 17:30:35 +03:00
Nikita Titov 729ac43c25
[python][sklearn] do not replace empty dict with `None` for `evals_result_` (#4884)
* Update sklearn.py

* Update sklearn.py

* Update test_sklearn.py
2021-12-18 17:28:55 +03:00
Nikita Titov 9f13a9c897
[python] remove `verbose_eval` argument of `train()` and `cv()` functions (#4878)
* remove `verbose_eval` argument

* update example Notebook
2021-12-12 21:02:15 +03:00
Nikita Titov 8066261899
[python] remove `verbose` argument of `model_from_string()` method of Booster class (#4877) 2021-12-10 20:26:35 -06:00
Nikita Titov f71328d410
[python][sklearn] Remove `early_stopping_rounds` argument of `fit()` method (#4846) 2021-12-11 01:21:19 +03:00
Nikita Titov d82743465c
[python] reset storages in early stopping callback after finishing training (#4868) 2021-12-10 03:02:07 +03:00
James Lamb 630f2e78af
[python-package][dask] handle failures parsing worker host names (#4852)
* [python-package][dask] handle failures parsing worker host names

* add tests

* revert local testing changes
2021-12-06 12:56:59 -06:00
Nikita Titov 12915d5813
[python][sklearn] unify values of `best_iteration` for sklearn and standard APIs (#4845)
* unify values of `best_iteration` for sklearn and standard APIs

* update Dask test
2021-12-04 23:10:28 -06:00
Nikita Titov cf38071b6a
Add C API function that returns all parameter names with their aliases (#4829)
* add C API function that returns all param names with aliases

* add C API function that returns all param names with aliases

* add R code

* test R code

* remove debug CI

* fix R lint

* refactor

* run CI

* fix R

* fix

* revert CI checks

* revert changes in docs

* Try to make function `const`

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* add `const` in cpp file

* address review comments and sync with `master`

Co-authored-by: James Lamb <jaylamb20@gmail.com>
2021-12-02 21:23:46 -06:00
Nikita Titov f57ef6f479
[python][sklearn] respect parameters for predictions in `init()` and `set_params()` methods (#4822)
* in predict(), respect params set via `set_params()` after fit()

* continue

* add test

* fix return name

* hotfix

* simplify
2021-12-02 04:58:26 +03:00
Nikita Titov b31d5a4392
[tests][dask] fix argument names in custom eval function in Dask test (#4833)
* fix argument types in custom eval function for Dask estimators

* revert changes to docstrings

* fix argument names in Dask test
2021-12-02 04:56:58 +03:00
Nikita Titov 4072e9f793
[python][sklearn] remove `verbose` argument from `fit()` method (#4832) 2021-12-01 02:32:41 +03:00
Nikita Titov 2caf945f9d
[python] Remove `silent` argument (#4800)
* Update test_plotting.py

* Update dask.py

* Update sklearn.py

* Update test_sklearn.py

* Update basic.py

* Update engine.py

* Update test_engine.py

* Update basic.py

* Update basic.py

* Update engine.py
2021-11-21 01:09:38 +03:00
chjinche b0137debe6
Add customized parser support (#4782)
* add customized parser support

* fix typo of parser_config_file description

* make delimiter as parameter of JoinedLines
2021-11-16 14:27:23 +08:00
José Morales 99e0a4bd7b
[python-package] early stopping min_delta (fixes #2526) (#4580)
* initial changes

* initial version

* better handling of cases

* warn only with positive threshold

* remove early_stopping_threshold from high-level functions

* remove remaining early_stopping_threshold

* update test to use callback

* better handling of cases

* rename threshold to min_delta

enhance parameter description

update tests

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* reduce num_boost_round in tests

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* trigger ci

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>
2021-11-10 16:17:06 +03:00
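The `min_delta` option added here changes what counts as an improvement: a new score must beat the best score so far by more than `min_delta`, otherwise the round counts toward the patience limit. The sketch below is a minimal, hypothetical illustration of that rule, assuming a lower-is-better metric. The class `EarlyStopper`, its arguments and the `run` driver are not LightGBM's API. The class is callable, in the same spirit as the pickleable callbacks above.

```python
from typing import List, Optional, Tuple


class EarlyStopper:
    """Hypothetical patience-based early stopping with a min_delta threshold.

    Assumes a lower-is-better metric (e.g. a loss).
    """

    def __init__(self, stopping_rounds: int, min_delta: float = 0.0) -> None:
        if stopping_rounds <= 0:
            raise ValueError("stopping_rounds must be positive")
        if min_delta < 0:
            raise ValueError("min_delta must be non-negative")
        self.stopping_rounds = stopping_rounds
        self.min_delta = min_delta
        self.best_score = float("inf")
        self.best_iter = -1
        self.rounds_without_improvement = 0

    def __call__(self, iteration: int, score: float) -> bool:
        """Record one evaluation; return True when training should stop."""
        if score < self.best_score - self.min_delta:
            # Improvement larger than min_delta: reset the patience counter.
            self.best_score = score
            self.best_iter = iteration
            self.rounds_without_improvement = 0
        else:
            self.rounds_without_improvement += 1
        return self.rounds_without_improvement >= self.stopping_rounds


def run(scores: List[float], **kwargs) -> Tuple[Optional[int], int]:
    """Feed scores in order; return (stop iteration or None, best iteration)."""
    stopper = EarlyStopper(**kwargs)
    for i, score in enumerate(scores):
        if stopper(i, score):
            return i, stopper.best_iter
    return None, stopper.best_iter
```

With scores `[1.0, 0.9, 0.89, 0.885, 0.88]`, `stopping_rounds=2` and `min_delta=0.05`, the tiny gains after iteration 1 do not count, so training stops at iteration 3 with best iteration 1. With `min_delta=0.0`, every decrease counts and training never stops early.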
Nikita Titov 0a4d190828
[python][sklearn] respect objective aliases (#4758)
* respect objective aliases

* Update test_sklearn.py

* revert removal of blank lines

* add argument name which is being overwritten in warning message
2021-11-10 16:15:39 +03:00
tongwu-msft 33a2f9ec05
Always respect forced splits, even when feature_fraction < 1.0 (fixes #4601) (#4725)
* issue fix #4601

* fix issue 4601 it2

* add tests for issue 4601

* fix warning

* fix warning

* add new line at end

* remove last line at end

* fix lint warning

* address comments

* address comments

* address comments

* fix address

* address comments

* revert seed

* fix recursive force split issue

* fix build error

* fix lint warning
2021-11-10 09:30:54 +08:00
Zhiyuan He b1facf5050
Suppress categorical warning (fixes #3379) 2021-11-08 10:06:50 +08:00
Nikita Titov cebdc2a8c4
[ci][tests][python] remove assertion for `filename` attribute that is no longer true with new version of graphviz (#4778) 2021-11-07 20:33:18 +03:00
Nikita Titov aab212a782
[python][sklearn] add `n_estimators_` and `n_iter_` post-fit attributes (#4753)
* add n_estimators_ and n_iter_ post-fit attributes

* address review comments
2021-11-05 20:29:49 +03:00
Nikita Titov 798dc1d419
[tests] [python] add test for non-serializable callback (#4741) 2021-10-28 23:25:22 -05:00
Nikita Titov d130bb198b
fix behavior for default objective and metric (#4660) 2021-10-13 11:44:22 +08:00
José Morales 29857c8adb
[tests][python-package] refactor list_to_1d_numpy test to run without pandas installed (#4639)
Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>
2021-10-07 19:30:20 +03:00
Nikita Titov b78175b746
[python] add placeholders to titles in plotting functions (#4614) 2021-09-23 18:50:20 +03:00
José Morales f1f5ba15c2
[python-package] Support 2d collections as input for `init_score` in multiclass classification task (#4150)
* initial implementation of init_score for multiclass classification

* check for 1d or 2d collection in init_score

* remove dataset import

* initial comments

* update dask test and docstrings

* update docstrings

* move logic to set_field. reshape back on get_field

* add type hints and update docstrings for dask. fix Dataset.set_field

* revert wrong docstrings and type hints

* add extra comma for consistency

* prefix private functions with underscore

add type hints to new functions

make commas consistent in dask and basic

* add missing spaces after type hint

* remove shape condition for dataframe in is_2d_collection

Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>
2021-09-17 21:44:03 +03:00
Nikita Titov 54facc4d72
[python] rename `print_evaluation()` into `log_evaluation()` (#4604)
* Update __init__.py

* Update Python-API.rst

* Update engine.py

* Update test_utilities.py

* Update sklearn.py

* Update callback.py

* Update callback.py

* Update callback.py
2021-09-16 01:26:02 +03:00
Nikita Titov 86bda6f061
[RFC][python] deprecate advanced args of `train()` and `cv()` functions and sklearn wrapper (#4574)
* deprecate advanced args of `train()` and `cv()`

* update Dask test

* improve deducing

* address review comments
2021-09-12 22:19:03 +03:00
Nikita Titov 79463dfb11
[python] [sklearn] respect `eval_at` aliases in keyword arguments (#4599) 2021-09-09 22:33:39 -05:00
José Morales 5857ef5e38
[tests][dask] Use workers hostname in tests (fixes #4594) (#4595)
Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>
2021-09-09 19:09:14 +03:00
James Lamb 4bf9f95455
[ci] skip Dask tests on QEMU builds (#4600) 2021-09-09 14:45:50 +03:00
Nikita Titov 3942126592
add 'auto' value for `importance_type` param in plotting (#4570) 2021-08-31 20:24:15 -05:00
Xavier Dupré 11d7608f2d
[python] add parameter object_hook to method dump_model (#4533)
* add parameter object_hook to function dump_model (python API)

* eol

* fix syntax

* lint

* better documentation

* Update python-package/lightgbm/basic.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-08-24 01:48:16 +03:00
José Morales cfe8eb17c9
[tests][dask] reduce number of collisions tests (#4501)
* reduce number of collisions tests

* measure tests execution time

* measure tests execution time in bdist task

* remove durations in bdist task
2021-08-09 20:04:01 +03:00
José Morales 5fe27d5942
[dask] find all needed ports in each host at once (fixes #4458) (#4498)
* find all needed ports in each worker at once

* lint

* better naming

* use _HostWorkers in test
2021-08-03 17:24:10 -05:00
Nikita Titov 661bde103a
[python][tests] refactor tests with Sequence input (#4495) 2021-07-31 22:38:31 +03:00
Chen Yufei 1d21d1ad4c
[python] support Dataset.get_data for Sequence input. (#4472)
* [python] support Dataset.get_data for Sequence input.

* Tweaks according to review comments.

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Add test cases.

* fix import order in test_basic.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-07-30 23:49:13 +03:00
Nikita Titov 96583ab589
[python] migrate to pathlib in setup.py and use `absolute()` on paths first (#4444)
* use absolute() on paths first

* migrate to pathlib in setup.py
2021-07-10 16:18:50 +03:00
Nikita Titov d05f54701e
[tests][python] added tests for early stop in prediction in ranking task (#4457) 2021-07-09 23:07:36 -05:00
Nikita Titov 7f9959fe1c
[tests] clarify RuntimeError in distributed tests (#4452) 2021-07-07 08:29:53 -05:00
Nikita Titov 90342e929b
[python] allow to pass some params as pathlib.Path objects (#4440)
* allow to pass some params as pathlib.Path objects

* fix lint

* improve indentation
2021-07-07 14:31:06 +03:00
James Lamb b09da434f0
[dask] Make output of feature contribution predictions for sparse matrices match those from sklearn estimators (fixes #3881) (#4378)
* test_classifier working

* adding tests

* docs

* tests

* revert unnecessary changes in tests

* test output type

* linting

* linting

* use from_delayed() instead

* docstring pycodestyle is happy with

* isort

* put pytest skips back

* respect sparse return type

* fix doc

* remove unnecessary dask_array_concatenate()

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update predict_proba() docstring

* remove unnecessary np.array()

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* fix assertion

* fix test use of len()

* restore np.array() in tests

* use np.asarray() instead

* use toarray()

* remove empty functions in compat

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-07-07 14:27:06 +03:00
James Lamb e36cc9c171
[python-package] use toarray() instead of todense() in tests and examples (#4446) 2021-07-07 01:12:47 +03:00
Nikita Titov ec1debcee8
[python] migrate to pathlib in distributed tests (#4443) 2021-07-05 18:47:24 -05:00
Nikita Titov 7eac5a6381
[python] minor refactoring of Python code (#4442)
* Update test_sklearn.py

* Update test_basic.py

* Update dask.py

* Update basic.py

* Update basic.py

* Update basic.py

* Update basic.py

* Update callback.py
2021-07-04 22:58:41 -05:00
Nikita Titov 03469ae59b
[tests][python] refactor file loading routine in C API test (#4437)
* refactor file loading in C API test

* continue
2021-07-04 17:10:48 -05:00
Nikita Titov 29052c5dc6
[tests] fix deprecation numpy warning (#4439) 2021-07-04 17:00:25 -05:00
James Lamb 26cc160abc
[python-package] convert string concatenation to f-strings in test_engine.py (fixes #4136) (#4436)
* [python-package] convert string concatenation to f-strings in test_engine.py (fixes #4136)

* Update tests/python_package_test/test_engine.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* revert get_workflow_status changes

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-07-04 15:10:32 -05:00
jmoralez b699fa68cb
[tests][cli] distributed training (#4254)
* include distributed tests

* remove github action file

* try CI

* build shared library and fix linting error

* ignore files created for testing. add type hints and check with mypy. include docstrings

* lint

* use pre_partition and write separate model files. remove mypy

* update docs

* remove ci. lower rtol. pass num_machines in config

* write predict.conf in the predict method. more robust port setup. use subprocess.run and check returncode

* add paths to tests and binary. remove lgb dependency. update .gitignore.

* lint

* allow to pass executable dir as argument to pytest

* pass execfile to pytest instead of execdir

* add suggestions

* use os.path and add type hint to predict_config

* Update tests/distributed/_test_distributed.py

Co-authored-by: James Lamb <jaylamb20@gmail.com>
2021-07-04 00:10:09 -05:00
Nikita Titov cff80442e1
[python] migrate to pathlib in python tests (#4435) 2021-07-03 23:31:41 -05:00
Chen Yufei c359896e9b
[python-package] Create Dataset from multiple data files (#4089)
* [python-package] create Dataset from sampled data.

* [python-package] create Dataset from List[Sequence].

1. Use random access for data sampling
2. Support reading data from multiple input files
3. Read data in batches so that the whole dataset does not have to be held in memory

* [python-package] example: create Dataset from multiple HDF5 file.

* fix: revert is_class implementation for seq

* fix: unwanted memory view reference for seq

* fix: seq is_class accepts sklearn matrices

* fix: requirements for example

* fix: pycode

* feat: print static code linting stage

* fix: linting: avoid shell str regex conversion

* code style: doc style

* code style: isort

* fix ci dependency: h5py on windows

* [py] remove rm files in test seq
https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623

* docs(python): init_from_sample summary

https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389

* remove dataset dump sample data debugging code.

* remove typo fix.

Create separate PR for this.

* fix typo in src/c_api.cpp

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* style(linting): py3 type hint for seq

* test(basic): os.path style path handling

* Revert "feat: print static code linting stage"

This reverts commit 10bd79f7f8.

* feat(python): sequence on validation set

* minor(python): comment

* minor(python): test option hint

* style(python): fix code linting

* style(python): add pydoc for ref_dataset

* doc(python): sequence

Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* revert(python): sequence class abc

* chore(python): remove rm_files

* Remove useless static_assert.

* refactor: test_basic test for sequence.

* fix lint complaint.

* remove dataset._dump_text in sequence test.

* Fix reverting typo fix.

* Apply suggestions from code review

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Fix type hint, code and doc style.

* fix failing test_basic.

* Remove TODO about keep constant in sync with cpp.

* Install h5py only when running python-examples.

* Fix lint complaint.

* Apply suggestions from code review

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Doc fixes, remove unused params_str in __init_from_seqs.

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Remove unnecessary conda install in windows ci script.

* Keep param as example in dataset_from_multi_hdf5.py

* Add _get_sample_count function to remove code duplication.

* Use batch_size parameter in generate_hdf.

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix after applying suggestions.

* Fix test, check idx is instance of numbers.Integral.

* Update python-package/lightgbm/basic.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Expose Sequence class in Python-API doc.

* Handle Sequence object not having batch_size.

* Fix isort lint complaint.

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update docstring to mention Sequence as data input.

* Remove get_one_line in test_basic.py

* Make Sequence an abstract class.

* Reduce number of tests for test_sequence.

* Add c_api: LGBM_SampleCount, fix potential bug in LGBMSampleIndices.

* empty commit to trigger ci

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t.

Also rename total_nrow to num_total_row in c_api.h for consistency.

* Doc about Sequence in docs/Python-Intro.rst.

* Fix: basic.py change LGBM_SampleIndices out_len to int32.

* Add create_valid test case with Dataset from Sequence.

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Apply suggestions from code review

Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT.

* Update python-package/lightgbm/basic.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: Willian Zhang <willian@willian.email>
Co-authored-by: Willian Z <Willian@Willian-Zhang.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-07-02 15:17:17 +03:00
Nikita Titov 189a80181e
fix compiler warning about types conversion in cpp tests (#4418) 2021-06-29 14:46:25 +08:00
Frank Fineis b5502d19b2
[dask] add support for eval sets and custom eval functions (#4101)
* es WiP, need to add eval_sample_weight and eval_group

* add weight, group to dask es. WiP.

* dask es reorg

* Update python-package/lightgbm/dask.py

_train_part model.fit args to lines

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py

_train_part model.fit args to lines, pt2

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py

_train_part model.fit args to lines pt3

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py

dask_model.fit args to lines

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py

use is instead of id()

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* applying changes to eval_set PR WiP

* dask support for eval_names, eval_metric, eval_stopping_rounds

* add evals_result checks and other eval_set attribute-related test checks. need to merge master - WiP

* fix lint errors in test_dask.py

* drop group_shape from _lgbmmodel_doc_fit.format for non-rankers, add support for eval_at for dask ranker

* add eval_at to test_dask eval_set ranker tests

* add back group_shape to lgbmmodel docs, tighten tests

* drop random eval weights from early stopping, probably causing training to terminate too early

* add eval data templates to sklearn fit docs, add eval data docs to dask

* add n_features to _create_data, eval_set tests stop w/ desirable tree counts

* import alphabetically

* add back get_worker for eval_set error handling

* test_dask argmin typo

* push forgotten eval_names bugfix

* eval_stopping_rounds -> early_stopping_rounds, fix failing non-es test

* change default eval_at to tuple 1-5

* re-drop get_worker

* drop early stopping support from eval_set commits, move eval_set worker check prior to client.submit

* add eval_class_weight and eval_init_score to lightgbm/dask, WiP

* clean up eval_set tests, allow user to specify fewer eval_names, clswghts than eval_sets

* remove redundant backslash

* lint fixes

* fix eval_at, eval_metric duplication, let eval_at be Iterable not just Tuple

* use all data_outputs for test_eval_set tests

* undo newlines from first pr

* add custom_eval_metric test, correct issue with eval_at and metric names

* move _constant_metric outside of test

* dataset reference names instead of __strings__

* add padding to eval_set parts so that each part has the same len(eval_set)

* eval set code clean up

* revert n_evals to be max len eval_set across all parts on worker

* pylint errors in _DatasetNames

* more pylint fixes

* pylinting...

* add by pytest.mark, mistakenly deleted during merge conflict resolution

* address code review comments

* add _pad_eval_names to handle nondeterministic evals_result_ valid set names

* change not evaluated evals_result_ test criteria

* address fit eval docs issues, switch _DatasetNames to Enum

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update eval_metrics, eval_at dask fit docstr to match sklearn, make tests reflect that l2 (rmse), logloss in evals_result_ by default

* address eval_set dict keys naming in docstr and training eval_set naming issue

* in test_dask check for obj-default metric names in eval_results, remove check for training key

* lint fixes for _pad_eval_names

* remove unnecessary line break in _pad_eval_names docstr

* use Enum.member syntax not Enum.member.name

* remove str from supported eval_at types

* add whitespace and remove DaskDataframes mention from eval_ param docstrs in _train

* remove "of shape = [n_samples]" from group_shape docs

* add eval_at base_doc in DaskLGBMRanker.fit

* remove excess paren from eval_names docs in _train

* make requested changes to test_dask.py

* remove Optional() wrapper on eval_at

* add _lgbmmodel_doc_custom_eval_note to dask.py fit.__doc__

* fix ordering of .sklearn imports to attempt lint fix

* dask custom eval note to f-string pt1

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* dask custom eval note to f-string pt 2

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* dask custom eval note to f-string pt 3

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-06-27 22:30:07 -05:00
Nikita Titov 45ac271ba9
[python] replace numpy.zeros with numpy.empty for the speedup (#4410) 2021-06-27 15:58:25 +03:00
James Lamb db3915c25c
[tests][dask] add missing compute() in Dask test (#4412) 2021-06-27 15:54:14 +03:00
James Lamb 8116d880f7
[dask] pass additional predict() parameters through when input is a Dask Array (#4399)
* [dask] pass predict() kwargs through when input is a Dask Array

* add tests

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* add prediction early stopping params

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-06-26 16:01:32 +03:00
Nikita Titov aab8fc18a2
fix param aliases (#4387) 2021-06-26 15:07:37 +03:00
kruda c7134fa7cc
Fixed issue https://github.com/microsoft/LightGBM/issues/4272 and added tests for partition (#4280) 2021-06-18 11:09:41 -05:00
Chen Yufei f126db6470
Log warning instead of fatal error when parsing a float causes under/overflow (#4336)
* Log warning instead of fatal error when parsing a float causes under/overflow.

For texts that resolve to infinity, under or overflow should be
accepted.

* Remove outdated unit test.

* empty commit to trigger ci
2021-06-18 09:03:39 -05:00
Nikita Titov c738c83bbd
[tests] replace pytest.parametrize (#4377)
* replace pytest.parametrize

* add informative message for assert
2021-06-15 18:51:41 +03:00
Nikita Titov c3b9363d02
[tests][python] fix f-string in test_dask.py (#4373) 2021-06-12 18:24:20 +03:00
sayantan sadhu d677d6c647
[python] improve the syntax of the f-strings in the file: tests/python_package_test/test_dask.py (#4358)
* updated the old syntax with fstrings

* Updated the strings with + catenation to fstrings

* Updated the strings with + catenation to fstrings

* Update tests/python_package_test/test_dask.py

Co-authored-by: James Lamb <jaylamb20@gmail.com>
2021-06-09 11:58:18 -05:00
Weston King-Leatham 9143003df6
[python-package] change to f-strings in test_plotting.py (#4359) 2021-06-08 21:16:31 -05:00
sayantan sadhu bab58d0e90
[python-package] updated test_consistency.py to use f-strings (#4348) 2021-06-07 21:03:52 +03:00
Belinda Trotta 1b5bec0047
Add linear leaf models to json output (fixes #4186) (#4329)
* Add linear leaf models to json output

* Add closing bracket

* Move test into test_engine.py and add asserts

* Update tests/python_package_test/test_engine.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update tests/python_package_test/test_engine.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update tests/python_package_test/test_engine.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-06-03 21:32:08 +10:00
sayantan sadhu da3465cbf1
[python] improve the syntax of the f-strings in the file: tests/python_package_test/test_basic.py (#4312) 2021-05-21 10:19:40 -05:00
Nikita Titov a372ed5032
[dask] run Dask tests on aarch64 architecture (#3996)
* run Dask tests on aarch64 architecture

* make random Dask test to fail

* Revert "make random Dask test to fail"

This reverts commit c43c98507f.

* empty commit

* empty commit

* empty commit

* empty commit

Co-authored-by: James Lamb <jaylamb20@gmail.com>
2021-05-21 10:18:53 -05:00
Nikita Titov 237ac299fc
[python] handle arbitrary length feature names in Python-package (#4293)
* handle arbitrary length feature names in Python-package

* added tests
2021-05-21 15:19:37 +03:00
Nikita Titov 272fedb95a
[tests][python] Handle data types more accurately in C API test (#4297) 2021-05-20 15:22:18 +03:00
sayantan sadhu b423cb47fe
Improved the syntax of the fstrings (#4294) 2021-05-16 18:13:16 -05:00
Chen Yufei f83180883a
Precise text file parsing (#4081)
* New build option: USE_PRECISE_TEXT_PARSER.

Use fast_double_parser for text file parsing. For each number, fallback
to strtod in case of parse failure.

* Add benchmark for CSVParser with Atof and AtofPrecise.

* Fix lint complaint.

* Fix typo in open result error message.

* Revert "Fix lint complaint."

This reverts commit 92ab0b6bce9f17d7be9eaeb20f19d4a0a36f0387.

* Revert "Add benchmark for CSVParser with Atof and AtofPrecise."

This reverts commit 4f8639abd06c679d4382eb715a1793afd94df3d2.

* Use AtofPrecise in Common::__StringToTHelper.

* [option] precise_float_parser: precise float number parsing for text input.

* Remove USE_PRECISE_TEXT_PARSER compile option.

* test: add test for Common::AtofPrecise.

* test: remove ChunkedArrayTest with 0 length.

This triggers Log::Fatal which aborts the test program.

* fix lint, add copyright.

* Revert "test: remove ChunkedArrayTest with 0 length."

This reverts commit 346c76affe9e78b6ca2738c4a56dbb9c00f31102.

* Use LightGBM::Common::Sign

* save precise_float_parser in model file.

* Fix error checking in AtofPrecise. Add more test cases.

* Remove test case that can't pass under macOS.

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-05-07 11:00:48 +08:00
Andrew Ziem e79716e0b6
Correct spelling (#4250)
* Correct spelling

Most changes were in comments, and there were a few changes to literals for log output.

There were no changes to variable names, function names, IDs, or functionality.

* Clarify a phrase in a comment

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Clarify a phrase in a comment

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Clarify a phrase in a comment

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Correct spelling

Most are code comments, but one case is a literal in a logging message.

There are a few grammar fixes too.

Co-authored-by: James Lamb <jaylamb20@gmail.com>
2021-05-04 10:10:55 -05:00
James Lamb 086f0785a1
[ci][python-package] remove unused import in tests (#4233) 2021-04-28 16:59:45 +03:00
Nikita Titov 211ef7878f
[ci] run cpp tests at CI (#4166)
* run cpp tests at CI

* Update docs/Installation-Guide.rst

Co-authored-by: James Lamb <jaylamb20@gmail.com>

Co-authored-by: James Lamb <jaylamb20@gmail.com>
2021-04-16 16:22:46 +03:00
Christoph Aymanns 9e1d7fa1bb
enforce interaction constraints with monotone_constraints_method = intermediate/advanced (#4043)
* add test for interaction constraints and monotone constraints

* enforce interaction constraints in RecomputeBestSplitForLeaf

* code formatting

* code formatting

* move interaction constraint test to test_engine

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-04-11 16:44:15 +03:00
jmoralez 965b9fc97a
[tests][dask] replace client fixture with cluster fixture (#4159)
* replace client fixture with cluster fixture

* wait on persist before rebalance
2021-04-05 22:32:47 +03:00
jmoralez d517ba12f2
[tests][dask] Add voting_parallel algorithm in tests (fixes #3834) (#4088)
* include voting_parallel tree_learner in test_regressor, test_classifier and test_ranker

* remove test for warnings and test for error when using feature_parallel

* use real names for tree_learner in test and include test for aliases. use the error message in the test for error in feature parallel

* split all tests with rf in test_classifier

* remove task parametrization for tree_learner aliases test. smaller input data from feature_parallel error

* define task for tree_learner aliases
2021-04-01 08:51:24 -05:00
jmoralez 46a20ab0ed
use dy_true mean in denominator for r2_score (#4151) 2021-04-01 08:06:27 -05:00
James Lamb 1ce4b22b8c
[dask] make random port search more resilient to random collisions (fixes #4057) (#4133)
* [dask] make random port search more resilient to random collisions

* linting

* more reliable ports check

* address review comments

* add error message
2021-03-31 09:25:16 -05:00
jmoralez f879018b50
[tests][dask] test all boosting types (fixes #3896) (#4119)
* test all boosting types

* lint

* bring scores comparison back and set y as second argument in assert_eq
2021-03-30 15:21:31 +03:00
Nikita Titov 7bf81f8c6d
[ci] apply cpplint to cpp tests (#4092)
* Update chunked_array.hpp

* Update ChunkedArray_API_extensions.i

* Update StringArray.i

* apply cpplint to cpp tests

* Update test_chunked_array to please cpplint (#4121)

* Update test_chunked_array to please cpplint

* Simplify var name

* Add comment

Co-authored-by: Alberto Ferreira <AlbertoEAF@users.noreply.github.com>
2021-03-28 15:52:42 +03:00
Nikita Titov d32ee23a74
[ci] remove output parametrization from two Dask tests (#4123)
* Update test_dask.py

* Update test_dask.py
2021-03-28 00:10:23 +03:00
jmoralez fe1b80a5c1
[dask] Include support for raw_score in predict (fixes #3793) (#4024)
* include test for prediction with raw_score

* close client

* initial comments

* update data creation and include ranking task

* linting

* update _create_data

* compare unique raw_predictions with values in leaves_df
2021-03-27 18:20:41 +03:00
jmoralez 8cc6eefcef
[tests][dask] Create an informative categorical feature (#4113)
* make one categorical variable informative. increase n_samples. reduce n_features for regression

* adjust tolerances in checks
2021-03-26 14:40:31 -05:00
Alberto Ferreira 4ded1342ae
[SWIG] Add streaming data support + cpp tests (#3997)
* [feature] Add ChunkedArray to SWIG

* Add ChunkedArray
* Add ChunkedArray_API_extensions.i
* Add SWIG class wrappers

* Address some review comments

* Fix linting issues

* Move test to tests/test_ChunkedArray_manually.cpp

* Add test note

* Move ChunkedArray to include/LightGBM/utils/

* Declare more explicit types of ChunkedArray in the SWIG API.

* Port ChunkedArray tests to googletest

* Please C++ linter

* Address StrikerRUS' review comments

* Update SWIG doc & disable ChunkedArray<int64_t>

* Use CHECK_EQ instead of assert

* Change include order (linting)

* Rename ChunkedArray -> chunked_array files

* Change header guards

* Address last comments from StrikerRUS
2021-03-21 15:07:21 +03:00
Nikita Titov 1f4a084230
[tests][dask] simplify code in Dask tests (#4075)
* simplify Dask tests code

* enable CI

* disable CI
2021-03-15 20:02:55 -05:00
James Lamb 39c85dd97d
[dask] [ci] fix flaky network-setup test (#4071) 2021-03-15 15:52:32 -05:00
Philip Hyunsu Cho bcf443b568
Add CMake option to enable sanitizers and build gtest (#3555)
* Add CMake option to enable sanitizer

* Set up gtest

* Address reviewer's feedback

* Address reviewer's feedback

* Update CMakeLists.txt

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-03-13 00:53:08 +03:00
James Lamb 296397df7b
[dask] raise more informative error for duplicates in 'machines' (fixes #4057) (#4059)
* [dask] raise more informative error for duplicates in 'machines'

* uncomment

* avoid test failure

* Revert "avoid test failure"

This reverts commit 9442bdf00f.
2021-03-10 12:02:27 -06:00
jmoralez 1d7b54d30f
[dask] include multiclass-classification task in tests (#4048)
* include multiclass-classification task and task_to_model_factory dicts

* define centers coordinates. flatten init_scores within each partition for multiclass-classification

* include issue comment and fix linting error
2021-03-09 21:58:38 -06:00
jmoralez 37e987828d
[dask] Include support for init_score (#3950)
* include support for init_score

* use dataframe from init_score and test difference with and without init_score in local model

* revert refactoring

* initial docs. test between distributed models with and without init_score

* remove ranker from tests

* test value for root node and change docs

* comma

* re-include parametrize

* fix incorrect merge

* use single init_score and the booster_ attribute

* use np.float64 instead of float
2021-03-04 11:50:08 -06:00
James Lamb 2a00b6ffbc
[dask] [ci] add support for scikit-learn 0.24+ in tests (fixes #4031) (#4032)
* [dask] [ci] add support for scikit-learn 0.24+ in tests (fixes #4031)

* Update tests/python_package_test/test_dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* try upgrading miktexsetup

* they changed the executable name UGH

* more changes for executable name

* another path change

* changing package mirrors

* undo experiments

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-03-02 16:29:08 +03:00
Nikita Titov 3ab6bbf9f3
[tests][dask] simplify fit calls in Dask tests (#4018)
* simplify fit calls in Dask tests

* Update .vsts-ci.yml

* Update .vsts-ci.yml
2021-02-24 08:17:55 -06:00
jmoralez 5dacd603ba
[dask][python-package] include support for column array as label (#3943)
* include support for column array as label

* remove nested ifs

* fix linting errors

* include tests for sklearn regressors

* include docstring for numpy_1d_array_to_dtype

* include . at end of docstring

* remove pandas import and test for regression, classification and ranking

* check predictions of sklearn models as well

* test training only in dask. drop pandas series tests

* use PANDAS_INSTALLED and pd_Series

* inline imports

* use col array in fit for test_dask

* include review comments
2021-02-24 14:47:49 +03:00
Nikita Titov 86a085f7ca
[tests][python] Add test for single leaf in linear tree (#4015)
* Update test_engine.py

* Update python_package.yml

* Update python_package.yml

* Update test_engine.py

* hotfix
2021-02-24 18:46:05 +11:00
jmoralez 0e57657585
[dask] use random ports in network setup (#3823)
* use socket.bind with port 0 and client.run to find random open ports

* include test for found ports

* find random open ports as default

* parametrize local_listen_port. type hint to _find_random_open_port. find open ports only on workers with data.

* make indentation consistent and pass list of workers to client.run

* remove socket import

* change random port implementation

* fix test
2021-02-23 22:14:12 -06:00
James Lamb 1f73f55938
[dask] allow tight control over ports (#3994)
* [dask] allow tight control over ports

* getting there, getting there

* fix params maybe

* fixing params

* remove unnecessary stuff

* fix tests

* fixes

* some minor changes

* fix flaky test

* linting

* more linting

* clarify parameter description

* add warning

* revert docs change

* Update python-package/lightgbm/dask.py

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* trying to fix stuff

* this is working

* update tests

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* indent

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-02-23 23:48:53 +03:00
imjwang eb5f471bc1
[tests][dask] add scikit-learn compatibility tests (fixes #3894) (#3947)
* add test_dask.py

* Update tests/python_package_test/test_dask.py

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* clients

* remove ports

* safe sklearn checks

* safe sklearn checks

* fix whitespace

* fix whitespace-try 2

* fix whitespace-try 3

* isort

* isort

* sklearn_checks_to_learn

Co-authored-by: James Lamb <jaylamb20@gmail.com>
2021-02-18 05:28:39 +03:00
James Lamb a3f4831d75
[tests][dask] make find-open-port test more reliable (#3993)
* [dask] make find-open-port test more reliable

* use listen_port fixture

* Apply suggestions from code review
2021-02-18 03:59:35 +03:00
Nikita Titov 75b9b0d3c8
[ci][python] hotfix imports order (#3992) 2021-02-17 01:18:37 +03:00
Nikita Titov 1413c060b0
Run tests and build Python wheels for aarch64 architecture (#3948)
* Update setup.sh

* Update test.sh

* Update test_dask.py

* Update test_engine.py

* Update .vsts-ci.yml
2021-02-16 23:35:37 +03:00
Nikita Titov d6ebd063ff
[ci][python] run isort in CI linting job (#3990)
* run isort in CI linting job

* workaround conda compatibility issues
2021-02-16 20:09:13 +03:00
Zhuyi Xue 1248d55f0d
[ci][python] apply isort to tests/python_package_test/test_engine.py #3958 (#3981) 2021-02-16 15:02:36 +03:00
Zhuyi Xue 9445b2ca26
[ci][python] apply isort to tests/python_package_test/test_basic.py #3958 (#3977) 2021-02-16 03:06:09 +03:00
Zhuyi Xue d64fcbe080
[ci][python] apply isort to tests/python_package_test/test_consistency.py #3958 (#3978) 2021-02-16 03:04:03 +03:00
Zhuyi Xue cac97d0c51
[ci][python] apply isort to tests/python_package_test/test_plotting.py #3958 (#3982) 2021-02-16 01:10:58 +03:00
Zhuyi Xue 0cb94fa59a
[ci][python] apply isort to tests/python_package_test/test_utilities.py #3958 (#3984) 2021-02-16 01:08:48 +03:00
Zhuyi Xue cdfe97f5d7
[ci][python] apply isort to tests/cpp_test/test.py #3958 (#3976) 2021-02-15 20:57:33 +03:00
Zhuyi Xue 07d7b7972f
[ci][python] apply isort to tests/c_api_test/test_.py #3958 (#3975) 2021-02-15 20:56:46 +03:00
Zhuyi Xue 219f613a76
[ci][python] apply isort to tests/python_package_test/test_dual.py #3958 (#3980) 2021-02-15 20:56:00 +03:00
Zhuyi Xue 1a294c87ff
[ci][python] apply isort to tests/python_package_test/test_dask.py #3958 (#3979) 2021-02-15 17:50:19 +03:00
James Lamb 18d57934b0
[dask] test that Dask automatically treats 'category' columns as categorical features (#3932) 2021-02-10 01:03:33 +03:00
James Lamb 06ed4337e0
[dask] [docs] Fix inaccuracies in API docs for Dask module (fixes #3871) (#3930)
* got fit() working

* add predict()

* predict_proba()

* remove custom objective docs

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* fix capitalization

* Update tests/python_package_test/test_dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-02-09 15:28:10 -06:00
jmoralez 7b47ab8fad
[dask] test training when a worker has no data (#3897)
* include test for training when a worker has no data

* test single partition against local model for all tasks and outputs

* remove futures_of

* include james' comments

* remove product import
2021-02-08 20:48:37 -06:00
James Lamb 37485fff5d
[dask] Add support for 'pred_leaf' in Dask estimators (fixes #3792) (#3919)
* fix tests

* fix tests

* fix test comments

* simplify tests

* Apply suggestions from code review
2021-02-07 13:17:28 -06:00
GOusignu 6f127847dc
[dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (#3911)
* [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes microsoft#3907)
2021-02-07 11:38:48 -06:00
James Lamb fc6b71e08e
[dask] Support Dask dataframes with 'category' columns (fixes #3861) (#3908)
* add support for pandas categorical columns

* remove commented code

* quotes

* syntax error

* fix shape for ranker test

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update tests/python_package_test/test_dask.py

* trying

* fix tests

* remove unnecessary debugging stuff

* skip accuracy checks on categorical

* use category columns as categorical features

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-02-07 01:19:49 +03:00
Nikita Titov b1e000c045
[dask] remove unused private _client attribute (#3904)
* Update test_dask.py

* Update dask.py

* Update .vsts-ci.yml

* Revert "Update .vsts-ci.yml"

This reverts commit 98422be5b5.
2021-02-03 10:44:08 -06:00