LightGBM

Commit graph

Author	SHA1	Message	Date
david-cortes	f3ea1ad724	[python-package] Use scikit-learn interpretation of negative `n_jobs` and change default to number of cores (#5105 ) * use joblib formula for negative n_jobs * correction for n_jobs calculation * use more robust cpu_count from joblib * change default n_jobs to number of cores * fix detection of num_threads under parameters * better handling of n_jobs at prediction time * fix incorrect usage of list.pop * correct pop/remove yet again * Update python-package/lightgbm/sklearn.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update tests/python_package_test/test_sklearn.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update tests/python_package_test/test_sklearn.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * add comments clarifying negative n_jobs * fix CI (code taken from PR comment) * change default to n_jobs=None in dask interface * corrections for handling of n_jobs * linter * corrections for predict-time n_jobs * linter * add more comments about n_jobs values * linter * more corrections * linter * linter * linter * Update python-package/lightgbm/compat.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/sklearn.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/sklearn.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/sklearn.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/sklearn.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * workaround for passing test about outputs with multiple threads * Update tests/python_package_test/test_sklearn.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update tests/python_package_test/test_sklearn.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2022-06-19 03:30:15 +03:00
James Lamb	11110c540e	[python-package] remove `Booster.set_attr()` and `Booster.attr()` (#5272 )	2022-06-12 20:18:26 +03:00
shiyu1994	f1328d5c5f	Clear split info buffer in cost efficient gradient boosting before every iteration (fix partially #3679 ) (#5164 ) * clear split info buffer in cegb_ before every iteration * check nullable of cegb_ in serial_tree_learner.cpp * add a test case for checking the split buffer in CEGB * swith to Threading::For instead of raw OpenMP * apply review suggestions * apply review comments * remove device cpu	2022-06-07 22:03:10 -05:00
Nikita Titov	27d9ad2e8e	[tests][python] Make test that checks original pandas data isn't modified more strict (#5267 ) * Update test_basic.py * Address review comment	2022-06-05 17:34:10 -05:00
José Morales	65b3db1c9f	[python-package] make a shallow copy on dataframe rename (fixes #4596 ) (#5254 ) * dont copy dataframe on rename * test with feature_name and 'auto'	2022-06-05 04:38:57 +03:00
Nikita Titov	a4478f7e35	[python] Fix training on subset constructed without params (#5213 ) * Update basic.py * Update test_engine.py * Add return type annotation	2022-05-24 03:45:42 +03:00
José Morales	c000b8cc68	[python-package] make a shallow copy when replacing categorical features with codes (fixes #4596 ) (#5225 )	2022-05-22 09:28:28 +08:00
José Morales	5b664b67c4	[python-package][R-package] allow using feature names when retrieving number of bins (#5116 ) * allow using feature names when retrieving number of bins * unname vector * use default feature names when not defined * lint * apply suggestions * remove extra comma * add test with categorical feature * make feature names sync more transparent	2022-05-16 21:45:13 -05:00
Nikita Titov	6de9bafaeb	Fix potential overflow "Multiplication result converted to larger type" (#5189 ) * Update dataset_loader.cpp * Update gbdt.h * Update regression_objective.hpp * Update linker_topo.cpp * Update xentropy_objective.hpp * Update regression_objective.hpp * investigate inf test failure * avoid overflow in regression objective * remove `test_inf_handle` test Co-authored-by: Guolin Ke <guolin.ke@outlook.com>	2022-05-10 10:56:24 +08:00
José Morales	f53fa6912e	[c-api] check number of features when retrieving number of bins (#5183 ) * check number of features when retrieving number of bins * check for negative values * lint	2022-04-30 18:44:40 +03:00
Nikita Titov	56ccea4243	[tests] replace `fobj` with `custom objective` in test comments and make tests stricter (#5173 )	2022-04-24 19:20:30 +03:00
Miguel Trejo Marrufo	416ecd5a8d	[python-package] remove 'fobj' in favor of passing custom objective function in params (fixes #3244 ) (#5052 ) * feat: support custom metrics in params * feat: support objective in params * test: custom objective and metric * fix: imports are incorrectly sorted * feat: convert eval metrics str and set to list * feat: convert single callable eval_metric to list * test: single callable objective in params Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * feat: callable fobj in basic cv function Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * test: cv support objective callable Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * fix: assert in cv_res Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * docs: objective callable in params Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * recover test_boost_from_average_with_single_leaf_trees Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * linters fail Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * remove metrics helper functions Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * feat: choose objective through _choose_param_values Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * test: test objective through _choose_param_values Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * test: test objective is callabe in train Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * test: parametrize choose_param_value with objective aliases Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * test: cv booster metric is none Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * fix: if string and callable choose callable Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * test train uses custom objective metrics Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * test: cv uses custom objective metrics 
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * refactor: remove fobj parameter in train and cv Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * refactor: objective through params in sklearn API Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * custom objective function in advanced_example Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * fix whitespackes lint * objective is none not a particular case for predict method Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * replace scipy.expit with custom implementation Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * test: set num_boost_round value to 20 Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * fix: custom objective default_value is none Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * refactor: remove self._fobj Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * custom_objective default value is None Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * refactor: variables name reference dummy_obj Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * linter errors * fix: process objective parameter when calling predict Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com> * linter errors * fix: objective is None during predict call Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>	2022-04-21 22:41:49 -05:00
Nikita Titov	4ae3d1387d	[python] make `reset_parameter` callback pickleable (#5109 )	2022-03-31 22:33:55 +03:00
Nikita Titov	60244e4a41	[python] make `record_evaluation` callback pickleable (#5107 ) * make `log_evaluation` callback pickleable * make callback tests stricter * make `record_evaluation` callback picklable	2022-03-31 01:09:38 +03:00
Nikita Titov	8b33e776cc	[python] make `log_evaluation` callback pickleable (#5101 ) * make `log_evaluation` callback pickleable * make callback tests stricter	2022-03-30 21:52:46 +03:00
RustingSword	60e72d5f4e	[python] allow to register any custom logger (fixes #4783 ) (#4880 ) * [python] allow to register any custom logger * allow customizable logging method name; add unit test * [python] allow to register any custom logger * allow customizable logging method name; add unit test * update tests * fix lint error * remove unused method * fix docstring style Co-authored-by: gongxudong <gongxudong@kuaishou.com>	2022-03-29 02:01:43 +03:00
shiyu1994	6b56a90cd1	[CUDA] New CUDA version Part 1 (#4630 ) * new cuda framework * add histogram construction kernel * before removing multi-gpu * new cuda framework * tree learner cuda kernels * single tree framework ready * single tree training framework * remove comments * boosting with cuda * optimize for best split find * data split * move boosting into cuda * parallel synchronize best split point * merge split data kernels * before code refactor * use tasks instead of features as units for split finding * refactor cuda best split finder * fix configuration error with small leaves in data split * skip histogram construction of too small leaf * skip split finding of invalid leaves stop when no leaf to split * support row wise with CUDA * copy data for split by column * copy data from host to CPU by column for data partition * add synchronize best splits for one leaf from multiple blocks * partition dense row data * fix sync best split from task blocks * add support for sparse row wise for CUDA * remove useless code * add l2 regression objective * sparse multi value bin enabled for CUDA * fix cuda ranking objective * support for number of items <= 2048 per query * speedup histogram construction by interleaving global memory access * split optimization * add cuda tree predictor * remove comma * refactor objective and score updater * before use struct * use structure for split information * use structure for leaf splits * return CUDASplitInfo directly after finding best split * split with CUDATree directly * use cuda row data in cuda histogram constructor * clean src/treelearner/cuda * gather shared cuda device functions * put shared CUDA functions into header file * change smaller leaf from <= back to < for consistent result with CPU * add tree predictor * remove useless cuda_tree_predictor * predict on CUDA with pipeline * add global sort algorithms * add global argsort for queries with many items in ranking tasks * remove limitation of maximum number of items 
per query in ranking * add cuda metrics * fix CUDA AUC * remove debug code * add regression metrics * remove useless file * don't use mask in shuffle reduce * add more regression objectives * fix cuda mape loss add cuda xentropy loss * use template for different versions of BitonicArgSortDevice * add multiclass metrics * add ndcg metric * fix cross entropy objectives and metrics * fix cross entropy and ndcg metrics * add support for customized objective in CUDA * complete multiclass ova for CUDA * separate cuda tree learner * use shuffle based prefix sum * clean up cuda_algorithms.hpp * add copy subset on CUDA * add bagging for CUDA * clean up code * copy gradients from host to device * support bagging without using subset * add support of bagging with subset for CUDAColumnData * add support of bagging with subset for dense CUDARowData * refactor copy sparse subrow * use copy subset for column subset * add reset train data and reset config for CUDA tree learner add deconstructors for cuda tree learner * add USE_CUDA ifdef to cuda tree learner files * check that dataset doesn't contain CUDA tree learner * remove printf debug information * use full new cuda tree learner only when using single GPU * disable all CUDA code when using CPU version * recover main.cpp * add cpp files for multi value bins * update LightGBM.vcxproj * update LightGBM.vcxproj fix lint errors * fix lint errors * fix lint errors * update Makevars fix lint errors * fix the case with 0 feature and 0 bin fix split finding for invalid leaves create cuda column data when loaded from bin file * fix lint errors hide GetRowWiseData when cuda is not used * recover default device type to cpu * fix na_as_missing case fix cuda feature meta information * fix UpdateDataIndexToLeafIndexKernel * create CUDA trees when needed in CUDADataPartition::UpdateTrainScore * add refit by tree for cuda tree learner * fix test_refit in test_engine.py * create set of large bin partitions in CUDARowData * add histogram 
construction for columns with a large number of bins * add find best split for categorical features on CUDA * add bitvectors for categorical split * cuda data partition split for categorical features * fix split tree with categorical feature * fix categorical feature splits * refactor cuda_data_partition.cu with multi-level templates * refactor CUDABestSplitFinder by grouping task information into struct * pre-allocate space for vector split_find_tasks_ in CUDABestSplitFinder * fix misuse of reference * remove useless changes * add support for path smoothing * virtual destructor for LightGBM::Tree * fix overlapped cat threshold in best split infos * reset histogram pointers in data partition and spllit finder in ResetConfig * comment useless parameter * fix reverse case when na is missing and default bin is zero * fix mfb_is_na and mfb_is_zero and is_single_feature_column * remove debug log * fix cat_l2 when one-hot fix gradient copy when data subset is used * switch shared histogram size according to CUDA version * gpu_use_dp=true when cuda test * revert modification in config.h * fix setting of gpu_use_dp=true in .ci/test.sh * fix linter errors * fix linter error remove useless change * recover main.cpp * separate cuda_exp and cuda * fix ci bash scripts add description for cuda_exp * add USE_CUDA_EXP flag * switch off USE_CUDA_EXP * revert changes in python-packages * more careful separation for USE_CUDA_EXP * fix CUDARowData::DivideCUDAFeatureGroups fix set fields for cuda metadata * revert config.h * fix test settings for cuda experimental version * skip some tests due to unsupported features or differences in implementation details for CUDA Experimental version * fix lint issue by adding a blank line * fix lint errors by resorting imports * fix lint errors by resorting imports * fix lint errors by resorting imports * merge cuda.yml and cuda_exp.yml * update python version in cuda.yml * remove cuda_exp.yml * remove unrelated changes * fix compilation warnings 
fix cuda exp ci task name * recover task * use multi-level template in histogram construction check split only in debug mode * ignore NVCC related lines in parameter_generator.py * update job name for CUDA tests * apply review suggestions * Update .github/workflows/cuda.yml Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update .github/workflows/cuda.yml Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * update header * remove useless TODOs * remove [TODO(shiyu1994): constrain the split with min_data_in_group] and record in #5062 * #include <LightGBM/utils/log.h> for USE_CUDA_EXP only * fix include order * fix include order * remove extra space * address review comments * add warning when cuda_exp is used together with deterministic * add comment about gpu_use_dp in .ci/test.sh * revert changing order of included headers Co-authored-by: Yu Shi <shiyu1994@qq.com> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2022-03-23 10:39:23 +08:00
James Lamb	b857ee10cc	clarify no-meaningful-features warning in Dataset construction (fixes #5081 ) (#5083 ) * clarify no-meaningful-features warning in Dataset construction (fixes #5081) * update tests	2022-03-22 13:10:38 +08:00
Antoni Baum	f77e0adf59	[python] make `early_stopping` callback pickleable (#5012 ) * Turn `early_stopping` into a Callable class * Fix * Lint * Remove print * Fix order * Revert "Lint" This reverts commit `7ca8b55757`. * Apply suggestion from code review * Nit * Lint * Move callable class outside the func for pickling * Move _pickle and _unpickle to tests utils * Add early stopping callback picklability test * Nit * Fix * Lint * Improve type hint * Lint * Lint * Add cloudpickle to test_windows * Update tests/python_package_test/test_engine.py * Fix * Apply suggestions from code review	2022-03-16 23:03:53 -05:00
José Morales	d10372e2e0	[c-api][python-package][R-package] expose feature num bin (#5048 ) * expose FeatureNumBin in C api * parametrize min_data_in_bin and add test with max_bin_by_feature * include feature_num_bin in R package * add suggestion from review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * update error message and lint * lint * add call method * minor improvements in tests * add suggestions from review * lint * rename argument to feature in python and r packages Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2022-03-15 06:39:40 +03:00
José Morales	9a4e70687d	[python-package] [R-package] propagate the best iteration of cvbooster into the individual boosters (#5066 )	2022-03-12 22:14:43 +03:00
shiyu1994	f6d654b737	[fix] fix duplicate added initial scores for single-leaf trees (#fixes #4708 ) * fix duplicate added initial scores for single-leaf trees * add test case * Fix import in Python test * commit python suggestions Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2022-03-08 21:34:17 -06:00
Nikita Titov	01568cf59a	[tests][python] move tests that use `train()` function defined in `engine.py` from `test_basic.py` to `test_engine.py` (#5034 ) * Update test_basic.py * Update test_engine.py * Update test_engine.py	2022-03-01 22:19:39 +08:00
José Morales	f185695617	[python-package] add support for pandas nullable types (fixes #4173 ) (#4927 ) * map nullable dtypes to regular float dtypes * cast x3 to float after introducing missing values * add test for regular dtypes * use .astype and then values. update nullable_dtypes test and include test for regular numpy dtypes * more specific allowed dtypes. test no copy when single float dtype df * use np.find_common_type. set np.float128 to None when it isn't supported * set default as type(None) * move tests that use lgb.train to test_engine * include np.float32 when finding common dtype * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * add linebreak Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2022-02-23 22:27:03 -06:00
José Morales	d670a4d655	[python-package] use 2d collections for predictions, grads and hess in multiclass custom objective (#4925 ) * reshape predictions, grad and hess in multiclass custom objective * add sklearn test. move custom obj to utils. docs for numpy * use num_model_per_iteration to get num_classes * update docs and dask multiclass custom objective test * move reshaping to __inner_predict. add test for feval * add missing note. remove extra line	2022-02-23 11:54:04 +08:00
José Morales	9fc348af6f	[python-package] make record_evaluation compatible with cv (fixes #4943 ) (#4947 ) * make record_evaluation compatible with cv * test multiple metrics in cv * lint * fix cv with train metric. save stdv as well * always add dataset prefix to cv_agg * remove unused function	2022-02-16 02:23:04 +03:00
Nikita Titov	a3e073ad3a	[tests][python] remove compatibility code for old versions in tests (#4978 ) * Update test_dask.py * Update test_engine.py * Update test_sklearn.py * Update test_sklearn.py * Update test_sklearn.py * Update test_sklearn.py * Update test_sklearn.py * Update test_sklearn.py * Update test_engine.py * Update test_sklearn.py * Update test_sklearn.py * Update test_sklearn.py	2022-02-13 00:47:28 +03:00
Miguel Trejo Marrufo	e6a2f7162c	[python-package] support customizing Dataset creation in Booster.refit() (fixes #3038 ) (#4894 ) * feat: refit additional kwargs for dataset and predict * test: kwargs for refit method * fix: __init__ got multiple values for argument * fix: pycodestyle E302 error * refactor: dataset_params to avoid breaking change * refactor: expose all Dataset params in refit * feat: dataset_params updates new_params * fix: remove unnecessary params to test * test: parameters input are the same * docs: address StrikeRUS changes * test: refit test changes in train dataset * test: set init_score and decay_rate to zero	2022-01-22 23:17:16 +03:00
James Lamb	a06fadfb7a	[dask] add support for custom objective functions (fixes #3934 ) (#4920 ) * add test for custom objective with regressor * add test for custom binary classification objective with classifier * isort * got tests working for multiclass * update docs * train deeper model for classifier * Apply suggestions from code review Co-authored-by: José Morales <jmoralz92@gmail.com> * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * update multiclass tests * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * fix multiclass probabilities * linting Co-authored-by: José Morales <jmoralz92@gmail.com> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2022-01-17 23:30:26 +03:00
Yaqub Alwan	af5b40e1f6	[python] raise an informative error instead of segfaulting when custom objective produces incorrect output (#4815 ) * fix for bad grads causing segfault * adjust checking criteria to properly reflect reality of multi-class classifiers * fix styling * Line break before operator * Update python-package/lightgbm/basic.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/basic.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * add a note to the C-API docs * rearrange text s;ightly * add some tests to python package * Update include/LightGBM/c_api.h Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * PR comments * match argument is a regex and our expression has brackets .. * rework tests * isorting imports * updating test to relfect that the python APi does not take pres/labels as a fobj function Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-12-30 13:12:00 +08:00
Nikita Titov	ce486e5b45	[python] remove `early_stopping_rounds` argument of `train()` and `cv()` functions (#4908 )	2021-12-26 17:20:49 +03:00
Nikita Titov	e4c0ca5f5d	[python] remove `evals_result` argument of `train()` function (#4882 )	2021-12-23 04:57:09 +03:00
José Morales	8a34b1af2d	[tests][python-package] change boston dataset to synthetic dataset in tests that don't check score (#4895 ) * change boston dataset to synthetic dataset in tests that don't evaluate score * format imports	2021-12-21 02:41:39 +03:00
Nikita Titov	8e729af38d	[python] reset storage in record evaluation callback each time before starting training (#4885 ) * Update test_sklearn.py * Update python_package.yml * Update python_package.yml * Update callback.py * Update callback.py	2021-12-18 17:30:35 +03:00
Nikita Titov	729ac43c25	[python][sklearn] do not replace empty dict with `None` for `evals_result_` (#4884 ) * Update sklearn.py * Update sklearn.py * Update test_sklearn.py	2021-12-18 17:28:55 +03:00
Nikita Titov	9f13a9c897	[python] remove `verbose_eval` argument of `train()` and `cv()` functions (#4878 ) * remove `verbose_eval` argument * update example Notebook	2021-12-12 21:02:15 +03:00
Nikita Titov	8066261899	[python] remove `verbose` argument of `model_from_string()` method of Booster class (#4877 )	2021-12-10 20:26:35 -06:00
Nikita Titov	f71328d410	[python][sklearn] Remove `early_stopping_rounds` argument of `fit()` method (#4846 )	2021-12-11 01:21:19 +03:00
Nikita Titov	d82743465c	[python] reset storages in early stopping callback after finishing training (#4868 )	2021-12-10 03:02:07 +03:00
James Lamb	630f2e78af	[python-package][dask] handle failures parsing worker host names (#4852 ) * [python-package][dask] handle failures parsing work host names * add tests * revert local testing changes	2021-12-06 12:56:59 -06:00
Nikita Titov	12915d5813	[python][sklearn] unify values of `best_iteration` for sklearn and standard APIs (#4845 ) * unify values of `best_iteration` for sklearn and standard APIs * update Dask test	2021-12-04 23:10:28 -06:00
Nikita Titov	cf38071b6a	Add C API function that returns all parameter names with their aliases (#4829 ) * add C API function that returns all param names with aliases * add C API function that returns all param names with aliases * add R code * test R code * remove debug CI * fix R lint * refactor * run CI * fix R * fix * revert CI checks * revert changes in docs * Try to make function `const` Co-authored-by: James Lamb <jaylamb20@gmail.com> * add `const` in cpp file * address review comments and sync with `master` Co-authored-by: James Lamb <jaylamb20@gmail.com>	2021-12-02 21:23:46 -06:00
Nikita Titov	f57ef6f479	[python][sklearn] respect parameters for predictions in `init()` and `set_params()` methods (#4822 ) * in predict(), respect params set via `set_params()` after fit() * continue * add test * fix return name * hotfix * simplify	2021-12-02 04:58:26 +03:00
Nikita Titov	b31d5a4392	[tests][dask] fix argument names in custom eval function in Dask test (#4833 ) * fix argument types in custom eval function for Dask estimators * revert changes to docstrings * fix argument names in Dask test	2021-12-02 04:56:58 +03:00
Nikita Titov	4072e9f793	[python][sklearn] remove `verbose` argument from `fit()` method (#4832 )	2021-12-01 02:32:41 +03:00
Nikita Titov	2caf945f9d	[python] Remove `silent` argument (#4800 ) * Update test_plotting.py * Update dask.py * Update sklearn.py * Update test_sklearn.py * Update basic.py * Update engine.py * Update test_engine.py * Update basic.py * Update basic.py * Update engine.py	2021-11-21 01:09:38 +03:00
chjinche	b0137debe6	Add customized parser support (#4782 ) * add customized parser support * fix typo of parser_config_file description * make delimiter as parameter of JoinedLines	2021-11-16 14:27:23 +08:00
José Morales	99e0a4bd7b	[python-package] early stopping min_delta (fixes #2526 ) (#4580 ) * initial changes * initial version * better handling of cases * warn only with positive threshold * remove early_stopping_threshold from high-level functions * remove remaining early_stopping_threshold * update test to use callback * better handling of cases * rename threshold to min_delta enhance parameter description update tests * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * reduce num_boost_round in tests * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * trigger ci Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>	2021-11-10 16:17:06 +03:00
Nikita Titov	0a4d190828	[python][sklearn] respect objective aliases (#4758 ) * respect objective aliases * Update test_sklearn.py * revert removal of blank lines * add argument name which is being overwritten in warning message	2021-11-10 16:15:39 +03:00
tongwu-msft	33a2f9ec05	Always respect forced splits, even when feature_fraction < 1.0 (fixes #4601 ) (#4725 ) * issue fix #4601 * fix issue 4601 it2 * add tests for issue 4601 * fix warning * fix warning * add new line at end * remove last line at end * fix lint warning * address comments * address comments * address comments * fix address * address comments * revert seed * fix recursive force split issue * fix build error * fix lint warning	2021-11-10 09:30:54 +08:00
Zhiyuan He	b1facf5050	Suppress categorical warning (fixes #3379 )	2021-11-08 10:06:50 +08:00
Nikita Titov	cebdc2a8c4	[ci][tests][python] remove assertion for `filename` attribute that is no longer true with new version of graphviz (#4778 )	2021-11-07 20:33:18 +03:00
Nikita Titov	aab212a782	[python][sklearn] add `n_estimators_` and `n_iter_` post-fit attributes (#4753 ) * add n_estimators_ and n_iter_ post-fit attributes * address review comments	2021-11-05 20:29:49 +03:00
Nikita Titov	798dc1d419	[tests] [python] add test for non-serializable callback (#4741 )	2021-10-28 23:25:22 -05:00
Nikita Titov	d130bb198b	fix behavior for default objective and metric (#4660 )	2021-10-13 11:44:22 +08:00
José Morales	29857c8adb	[tests][python-package] refactor list_to_1d_numpy test to run without pandas installed (#4639 ) Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>	2021-10-07 19:30:20 +03:00
Nikita Titov	b78175b746	[python] add placeholders to titles in plotting functions (#4614 )	2021-09-23 18:50:20 +03:00
José Morales	f1f5ba15c2	[python-package] Support 2d collections as input for `init_score` in multiclass classification task (#4150 ) * initial implementation of init_score for multiclass classification * check for 1d or 2d collection in init_score * remove dataset import * initial comments * update dask test and docstrings * update docstrings * move logic to set_field. reshape back on get_field * add type hints and update docstrings for dask. fix Dataset.set_field * revert wrong docstrings and type hints * add extra comma for consistency * prefix private functions with underscore add type hints to new functions make commas consistent in dask and basic * add missing spaces after type hint * remove shape condition for dataframe in is_2d_collection Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>	2021-09-17 21:44:03 +03:00
Nikita Titov	54facc4d72	[python] rename `print_evaluation()` into `log_evaluation()` (#4604 ) * Update __init__.py * Update Python-API.rst * Update engine.py * Update test_utilities.py * Update sklearn.py * Update callback.py * Update callback.py * Update callback.py	2021-09-16 01:26:02 +03:00
Nikita Titov	86bda6f061	[RFC][python] deprecate advanced args of `train()` and `cv()` functions and sklearn wrapper (#4574 ) * deprecate advanced args of `train()` and `cv()` * update Dask test * improve deducing * address review comments	2021-09-12 22:19:03 +03:00
Nikita Titov	79463dfb11	[python] [sklearn] respect `eval_at` aliases in keyword arguments (#4599 )	2021-09-09 22:33:39 -05:00
José Morales	5857ef5e38	[tests][dask] Use workers hostname in tests (fixes #4594 ) (#4595 ) Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>	2021-09-09 19:09:14 +03:00
James Lamb	4bf9f95455	[ci] skip Dask tests on QEMU builds (#4600 )	2021-09-09 14:45:50 +03:00
Nikita Titov	3942126592	add 'auto' value for `importance_type` param in plotting (#4570 )	2021-08-31 20:24:15 -05:00
Xavier Dupré	11d7608f2d	[python] add parameter object_hook to method dump_model (#4533 ) * add parameter object_hook to function dump_model (python API) * eol * fix syntax * lint * better documentation * Update python-package/lightgbm/basic.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: xavier dupré <xavier.dupre@gmail.com> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-08-24 01:48:16 +03:00
José Morales	cfe8eb17c9	[tests][dask] reduce number of collisions tests (#4501 ) * reduce number of collisions tests * measure tests execution time * measure tests execution time in bdist task * remove durations in bdist task	2021-08-09 20:04:01 +03:00
José Morales	5fe27d5942	[dask] find all needed ports in each host at once (fixes #4458 ) (#4498 ) * find all needed ports in each worker at once * lint * better naming * use _HostWorkers in test	2021-08-03 17:24:10 -05:00
Nikita Titov	661bde103a	[python][tests] refactor tests with Sequence input (#4495 )	2021-07-31 22:38:31 +03:00
Chen Yufei	1d21d1ad4c	[python] support Dataset.get_data for Sequence input. (#4472 ) * [python] support Dataset.get_data for Sequence input. * Tweaks according to review comments. * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Add test cases. * fix import order in test_basic.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-07-30 23:49:13 +03:00
Nikita Titov	96583ab589	[python] migrate to pathlib in setup.py and use `absolute()` on paths first (#4444 ) * use absolute() on paths first * migrate to pathlib in setup.py	2021-07-10 16:18:50 +03:00
Nikita Titov	d05f54701e	[tests][python] added tests for early stop in prediction in ranking task (#4457 )	2021-07-09 23:07:36 -05:00
Nikita Titov	7f9959fe1c	[tests] clarify RuntimeError in distributed tests(#4452 )	2021-07-07 08:29:53 -05:00
Nikita Titov	90342e929b	[python] allow to pass some params as pathlib.Path objects (#4440 ) * allow to pass some params as pathlib.Path objects * fix lint * improve indentation	2021-07-07 14:31:06 +03:00
James Lamb	b09da434f0	[dask] Make output of feature contribution predictions for sparse matrices match those from sklearn estimators (fixes #3881 ) (#4378 ) * test_classifier working * adding tests * docs * tests * revert unnecessary changes in tests * test output type * linting * linting * use from_delayed() instead * docstring pycodestyle is happy with * isort * put pytest skips back * respect sparse return type * fix doc * remove unnecessary dask_array_concatenate() * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * update predict_proba() docstring * remove unnecessary np.array() * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * fix assertion * fix test use of len() * restore np.array() in tests * use np.asarray() instead * use toarray() * remove empty functions in compat Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-07-07 14:27:06 +03:00
James Lamb	e36cc9c171	[python-package] use toarray() instead of todense() in tests and examples (#4446 )	2021-07-07 01:12:47 +03:00
Nikita Titov	ec1debcee8	[python] migrate to pathlib in distributed tests (#4443 )	2021-07-05 18:47:24 -05:00
Nikita Titov	7eac5a6381	[python] minor refactoring of Python code (#4442 ) * Update test_sklearn.py * Update test_basic.py * Update dask.py * Update basic.py * Update basic.py * Update basic.py * Update basic.py * Update callback.py	2021-07-04 22:58:41 -05:00
Nikita Titov	03469ae59b	[tests][python] refactor file loading routine in C API test (#4437 ) * refactor file loading in C API test * continue	2021-07-04 17:10:48 -05:00
Nikita Titov	29052c5dc6	[tests] fix deprecation numpy warning (#4439 )	2021-07-04 17:00:25 -05:00
James Lamb	26cc160abc	[python-package] convert string concatenation to f-strings in test_engine.py (fixes #4136 ) (#4436 ) * [python-package] convert string concatenation to f-strings in test_engine.py (fixes #4136) * Update tests/python_package_test/test_engine.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * revert get_workflow_status changes Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-07-04 15:10:32 -05:00
jmoralez	b699fa68cb	[tests][cli] distributed training (#4254 ) * include distributed tests * remove github action file * try CI * build shared library and fix linting error * ignore files created for testing. add type hints and check with mypy. include docstrings * lint * use pre_partition and write separate model files. remove mypy * update docs * remove ci. lower rtol. pass num_machines in config * write predict.conf in the predict method. more robust port setup. use subprocess.run and check returncode * add paths to tests and binary. remove lgb dependency. update .igtignore. * lint * allow to pass executable dir as argument to pytest * pass execfile to pytest instead of execdir * add suggestions * use os.path and add type hint to predict_config * Update tests/distributed/_test_distributed.py Co-authored-by: James Lamb <jaylamb20@gmail.com>	2021-07-04 00:10:09 -05:00
Nikita Titov	cff80442e1	[python] migrate to pathlib in python tests (#4435 )	2021-07-03 23:31:41 -05:00
Chen Yufei	c359896e9b	[python-package] Create Dataset from multiple data files (#4089 ) * [python-package] create Dataset from sampled data. * [python-package] create Dataset from List[Sequence]. 1. Use random access for data sampling 2. Support read data from multiple input files 3. Read data in batch so no need to hold all data in memory * [python-package] example: create Dataset from multiple HDF5 file. * fix: revert is_class implementation for seq * fix: unwanted memory view reference for seq * fix: seq is_class accepts sklearn matrices * fix: requirements for example * fix: pycode * feat: print static code linting stage * fix: linting: avoid shell str regex conversion * code style: doc style * code style: isort * fix ci dependency: h5py on windows * [py] remove rm files in test seq https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623 * docs(python): init_from_sample summary https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389 * remove dataset dump sample data debugging code. * remove typo fix. Create separate PR for this. * fix typo in src/c_api.cpp Co-authored-by: James Lamb <jaylamb20@gmail.com> * style(linting): py3 type hint for seq * test(basic): os.path style path handling * Revert "feat: print static code linting stage" This reverts commit `10bd79f7f8`. * feat(python): sequence on validation set * minor(python): comment * minor(python): test option hint * style(python): fix code linting * style(python): add pydoc for ref_dataset * doc(python): sequence Co-authored-by: shiyu1994 <shiyu_k1994@qq.com> * revert(python): sequence class abc * chore(python): remove rm_files * Remove useless static_assert. * refactor: test_basic test for sequence. * fix lint complaint. * remove dataset._dump_text in sequence test. * Fix reverting typo fix. * Apply suggestions from code review Co-authored-by: James Lamb <jaylamb20@gmail.com> * Fix type hint, code and doc style. * fix failing test_basic. 
* Remove TODO about keep constant in sync with cpp. * Install h5py only when running python-examples. * Fix lint complaint. * Apply suggestions from code review Co-authored-by: James Lamb <jaylamb20@gmail.com> * Doc fixes, remove unused params_str in __init_from_seqs. * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Remove unnecessary conda install in windows ci script. * Keep param as example in dataset_from_multi_hdf5.py * Add _get_sample_count function to remove code duplication. * Use batch_size parameter in generate_hdf. * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Fix after applying suggestions. * Fix test, check idx is instance of numbers.Integral. * Update python-package/lightgbm/basic.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Expose Sequence class in Python-API doc. * Handle Sequence object not having batch_size. * Fix isort lint complaint. * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update docstring to mention Sequence as data input. * Remove get_one_line in test_basic.py * Make Sequence an abstract class. * Reduce number of tests for test_sequence. * Add c_api: LGBM_SampleCount, fix potential bug in LGBMSampleIndices. * empty commit to trigger ci * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t. Also rename total_nrow to num_total_row in c_api.h for consistency. * Doc about Sequence in docs/Python-Intro.rst. * Fix: basic.py change LGBM_SampleIndices out_len to int32. * Add create_valid test case with Dataset from Sequence. * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Apply suggestions from code review Co-authored-by: shiyu1994 <shiyu_k1994@qq.com> * Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT. 
* Update python-package/lightgbm/basic.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: Willian Zhang <willian@willian.email> Co-authored-by: Willian Z <Willian@Willian-Zhang.com> Co-authored-by: James Lamb <jaylamb20@gmail.com> Co-authored-by: shiyu1994 <shiyu_k1994@qq.com> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-07-02 15:17:17 +03:00
Nikita Titov	189a80181e	fix compiler warning about types conversion in cpp tests (#4418 )	2021-06-29 14:46:25 +08:00
Frank Fineis	b5502d19b2	[dask] add support for eval sets and custom eval functions (#4101 ) * es WiP, need to add eval_sample_weight and eval_group * add weight, group to dask es. WiP. * dask es reorg * Update python-package/lightgbm/dask.py _train_part model.fit args to lines Co-authored-by: James Lamb <jaylamb20@gmail.com> * Update tests/python_package_test/test_dask.py _train_part model.fit args to lines, pt2 Co-authored-by: James Lamb <jaylamb20@gmail.com> * Update python-package/lightgbm/dask.py _train_part model.fit args to lines pt3 Co-authored-by: James Lamb <jaylamb20@gmail.com> * Update tests/python_package_test/test_dask.py dask_model.fit args to lines Co-authored-by: James Lamb <jaylamb20@gmail.com> * Update tests/python_package_test/test_dask.py Co-authored-by: James Lamb <jaylamb20@gmail.com> * Update python-package/lightgbm/dask.py use is instead of id() Co-authored-by: James Lamb <jaylamb20@gmail.com> * Update python-package/lightgbm/dask.py Co-authored-by: James Lamb <jaylamb20@gmail.com> * Update python-package/lightgbm/dask.py Co-authored-by: James Lamb <jaylamb20@gmail.com> * Update python-package/lightgbm/dask.py Co-authored-by: James Lamb <jaylamb20@gmail.com> * Update tests/python_package_test/test_dask.py Co-authored-by: James Lamb <jaylamb20@gmail.com> * Update tests/python_package_test/test_dask.py Co-authored-by: James Lamb <jaylamb20@gmail.com> * Update python-package/lightgbm/dask.py Co-authored-by: James Lamb <jaylamb20@gmail.com> * Update python-package/lightgbm/dask.py Co-authored-by: James Lamb <jaylamb20@gmail.com> * Update python-package/lightgbm/dask.py Co-authored-by: James Lamb <jaylamb20@gmail.com> * applying changes to eval_set PR WiP * dask support for eval_names, eval_metric, eval_stopping_rounds * add evals_result checks and other eval_set attribute-related test checks. 
need to merge master - WiP * fix lint errors in test_dask.py * drop group_shape from _lgbmmodel_doc_fit.format for non-rankers, add support for eval_at for dask ranker * add eval_at to test_dask eval_set ranker tests * add back group_shape to lgbmmmodel docs, tighten tests * drop random eval weights from early stopping, probably causing training to terminate too early * add eval data templates to sklearn fit docs, add eval data docs to dask * add n_features to _create_data, eval_set tests stop w/ desirable tree counts * import alphabetically * add back get_worker for eval_set error handling * test_dask argmin typo * push forgotten eval_names bugfix * eval_stopping_rounds -> early_stopping_rounds, fix failing non-es test * change default eval_at to tuple 1-5 * re-drop get_worker * drop early stopping support from eval_set commits, move eval_set worker check prior to client.submit * add eval_class_weight and eval_init_score to lightgbm/dask, WiP * clean up eval_set tests, allow user to specify fewer eval_names, clswghts than eval_sets * remove redundant backslash * lint fixes * fix eval_at, eval_metric duplication, let eval_at be Iterable not just Tuple * use all data_outputs for test_eval_set tests * undo newlines from first pr * add custom_eval_metric test, correct issue with eval_at and metric names * move _constant_metric outside of test * dataset reference names instead of __strings__ * add padding to eval_set parts makes each part has same len(eval_set) * eval set code clean up * revert n_evals to be max len eval_set across all parts on worker * pylint errors in _DatasetNames * more pylint fixes * pylinting... 
* add by pytest.mark, mistakenly deleted during merge conflict resolution * address code review comments * add _pad_eval_names to handle nondeterministic evals_result_ valid set names * change not evaluated evals_result_ test criteria * address fit eval docs issues, switch _DatasetNames to Enum * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita 
Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * update eval_metrics, eval_at dask fit docstr to match sklearn, make tests reflect that l2 (rmse), logloss in evals_result_ by default * address eval_set dict keys naming in docstr and training eval_set naming issue * in test_dask check for obj-default metric names in eval_results, remove check for training key * lint fixes for _pad_eval_names * remove unnecessary breaklinen in _pad_eval_names docstr * use Enum.member syntax not Enum.member.name * remove str from supported eval_at types * add whitespace and remove DaskDataframes mention from eval_ param docstrs in _train * remove "of shape = [n_samples]" from group_shape docs * add eval_at base_doc in DaskLGBMRanker.fit * remove excess paren from eval_names docs in _train * make requested changes to test_dask.py * remove Optional() wrapper on eval_at * add _lgbmmodel_doc_custom_eval_note to dask.py fit.__doc__ * fix ordering of .sklearn imports to attempt lint fix * dask custom eval note to f-string pt1 Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * dask custom eval note to f-string pt 2 Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * dask custom eval note to f-string pt 3 Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: James Lamb <jaylamb20@gmail.com> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-06-27 22:30:07 -05:00
Nikita Titov	45ac271ba9	[python] replace numpy.zeros with numpy.empty for the speedup (#4410 )	2021-06-27 15:58:25 +03:00
James Lamb	db3915c25c	[tests][dask] add missing compute() in Dask test (#4412 )	2021-06-27 15:54:14 +03:00
James Lamb	8116d880f7	[dask] pass additional predict() parameters through when input is a Dask Array (#4399 ) * [dask] pass predict() kwargs through when input is a Dask Array * add tests * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * add prediction early stopping params Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-06-26 16:01:32 +03:00
Nikita Titov	aab8fc18a2	fix param aliases (#4387 )	2021-06-26 15:07:37 +03:00
kruda	c7134fa7cc	Fixed issue https://github.com/microsoft/LightGBM/issues/4272 and added tests for partition (#4280 )	2021-06-18 11:09:41 -05:00
Chen Yufei	f126db6470	Log warning instead of fatal when parsing float get under/overflow (#4336 ) * Log warning instead of fatal when parsing float get under/overflow. For texts that resolve to infinity, under or overflow should be accepted. * Remove outdated unit test. * empty commit to trigger ci	2021-06-18 09:03:39 -05:00
Nikita Titov	c738c83bbd	[tests] replace pytest.parametrize (#4377 ) * replace pytest.parametrize * add informative message for assert	2021-06-15 18:51:41 +03:00
Nikita Titov	c3b9363d02	[tests][python] fix f-string in test_dask.py (#4373 )	2021-06-12 18:24:20 +03:00
sayantan sadhu	d677d6c647	[python] improving the syntax of the fstring in the file : tests/python_package_test/test_dask.py (#4358 ) * updated the old syntax with fstrings * Updated the strings with + catenation to fstrings * Updated the strings with + catenation to fstrings * Update tests/python_package_test/test_dask.py Co-authored-by: James Lamb <jaylamb20@gmail.com>	2021-06-09 11:58:18 -05:00
Weston King-Leatham	9143003df6	[python-package] change to f-strings in test_plotting.py (#4359 )	2021-06-08 21:16:31 -05:00
sayantan sadhu	bab58d0e90	[python-package] updated test_consistency.py to use f-strings (#4348 )	2021-06-07 21:03:52 +03:00
Belinda Trotta	1b5bec0047	Add linear leaf models to json output (fixes #4186 ) (#4329 ) * Add linear leaf models to json output * Add closing bracket * Move test into test_engine.py and add asserts * Update tests/python_package_test/test_engine.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update tests/python_package_test/test_engine.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update tests/python_package_test/test_engine.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-06-03 21:32:08 +10:00
sayantan sadhu	da3465cbf1	[python] improving the syntax of the fstring in the file : tests/python_package_test/test_basic.py (#4312 )	2021-05-21 10:19:40 -05:00
Nikita Titov	a372ed5032	[dask] run Dask tests on aarch64 architecture (#3996 ) * run Dask tests on aarch64 architecture * make random Dask test to fail * Revert "make random Dask test to fail" This reverts commit `c43c98507f`. * empty commit * empty commit * empty commit * empty commit Co-authored-by: James Lamb <jaylamb20@gmail.com>	2021-05-21 10:18:53 -05:00
Nikita Titov	237ac299fc	[python] handle arbitrary length feature names in Python-package (#4293 ) * handle arbitrary length feature names in Python-package * added tests	2021-05-21 15:19:37 +03:00
Nikita Titov	272fedb95a	[tests][python] Handle data types more accurate in C API test (#4297 )	2021-05-20 15:22:18 +03:00
sayantan sadhu	b423cb47fe	Improved the syntax of the fstrings (#4294 )	2021-05-16 18:13:16 -05:00
Chen Yufei	f83180883a	Precise text file parsing (#4081 ) * New build option: USE_PRECISE_TEXT_PARSER. Use fast_double_parser for text file parsing. For each number, fallback to strtod in case of parse failure. * Add benchmark for CSVParser with Atof and AtofPrecise. * Fix lint complaint. * Fix typo in open result error message. * Revert "Fix lint complaint." This reverts commit 92ab0b6bce9f17d7be9eaeb20f19d4a0a36f0387. * Revert "Add benchmark for CSVParser with Atof and AtofPrecise." This reverts commit 4f8639abd06c679d4382eb715a1793afd94df3d2. * Use AtofPrecise in Common::__StringToTHelper. * [option] precise_float_parser: precise float number parsing for text input. * Remove USE_PRECISE_TEXT_PARSER compile option. * test: add test for Common::AtofPrecise. * test: remove ChunkedArrayTest with 0 length. This triggers Log::Fatal which aborts the test program. * fix lint, add copyright. * Revert "test: remove ChunkedArrayTest with 0 length." This reverts commit 346c76affe9e78b6ca2738c4a56dbb9c00f31102. * Use LightGBM::Common::Sign * save precise_float_parser in model file. * Fix error checking in AtofPrecise. Add more test cases. * Remove test case that can't pass under macOS. * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-05-07 11:00:48 +08:00
Andrew Ziem	e79716e0b6	Correct spelling (#4250 ) * Correct spelling Most changes were in comments, and there were a few changes to literals for log output. There were no changes to variable names, function names, IDs, or functionality. * Clarify a phrase in a comment Co-authored-by: James Lamb <jaylamb20@gmail.com> * Clarify a phrase in a comment Co-authored-by: James Lamb <jaylamb20@gmail.com> * Clarify a phrase in a comment Co-authored-by: James Lamb <jaylamb20@gmail.com> * Correct spelling Most are code comments, but one case is a literal in a logging message. There are a few grammar fixes too. Co-authored-by: James Lamb <jaylamb20@gmail.com>	2021-05-04 10:10:55 -05:00
James Lamb	086f0785a1	[ci][python-package] remove unused import in tests (#4233 )	2021-04-28 16:59:45 +03:00
Nikita Titov	211ef7878f	[ci] run cpp tests at CI (#4166 ) * run cpp tests at CI * Update docs/Installation-Guide.rst Co-authored-by: James Lamb <jaylamb20@gmail.com> Co-authored-by: James Lamb <jaylamb20@gmail.com>	2021-04-16 16:22:46 +03:00
Christoph Aymanns	9e1d7fa1bb	enforce interaction constraints with monotone_constraints_method = intermediate/advanced (#4043 ) * add test for interaction constraints and monotone constraints * enforce interaction constraints in RecomputeBestSplitForLeaf * code formatting * code formatting * move interaction constraint test to test_engine * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-04-11 16:44:15 +03:00
jmoralez	965b9fc97a	[tests][dask] replace client fixture with cluster fixture (#4159 ) * replace client fixture with cluster fixture * wait on persist before rebalance	2021-04-05 22:32:47 +03:00
jmoralez	d517ba12f2	[tests][dask] Add voting_parallel algorithm in tests (fixes #3834 ) (#4088 ) * include voting_parallel tree_learner in test_regressor, test_classifier and test_ranker * remove test for warnings and test for error when using feature_parallel * use real names for tree_learner intest and include test for aliases. use the error message in the test for error in feature parallel * split all tests with rf in test_classifier * remove task parametrization for tree_learner aliases test. smaller input data from feature_parallel error * define task for tree_learner aliases	2021-04-01 08:51:24 -05:00
jmoralez	46a20ab0ed	use dy_true mean in denominator for r2_score (#4151 )	2021-04-01 08:06:27 -05:00
James Lamb	1ce4b22b8c	[dask] make random port search more resilient to random collisions (fixes #4057 ) (#4133 ) * [dask] make random port search more resilient to random collisions * linting * more reliable ports check * address review comments * add error message	2021-03-31 09:25:16 -05:00
jmoralez	f879018b50	[tests][dask] test all boosting types (fixes #3896 ) (#4119 ) * test all boosting types * lint * bring scores comparison back and set y as second argument in assert_eq	2021-03-30 15:21:31 +03:00
Nikita Titov	7bf81f8c6d	[ci] apply cpplint to cpp tests (#4092 ) * Update chunked_array.hpp * Update ChunkedArray_API_extensions.i * Update StringArray.i * apply cpplint to cpp tests * Update test_chunked_array to please cpplint (#4121) * Update test_chunked_array to please cpplint * Simplify var name * Add comment Co-authored-by: Alberto Ferreira <AlbertoEAF@users.noreply.github.com>	2021-03-28 15:52:42 +03:00
Nikita Titov	d32ee23a74	[ci] remove output parametrization from two Dask tests (#4123 ) * Update test_dask.py * Update test_dask.py	2021-03-28 00:10:23 +03:00
jmoralez	fe1b80a5c1	[dask] Include support for raw_score in predict (fixes #3793 ) (#4024 ) * include test for prediction with raw_score * close client * initial comments * update data creation and include ranking task * linting * update _create_data * compare unique raw_predictions with values in leaves_df	2021-03-27 18:20:41 +03:00
jmoralez	8cc6eefcef	[tests][dask] Create an informative categorical feature (#4113 ) * make one categorical variable informative. increase n_samples. reduce n_features for regression * adjust tolerances in checks	2021-03-26 14:40:31 -05:00
Alberto Ferreira	4ded1342ae	[SWIG] Add streaming data support + cpp tests (#3997 ) * [feature] Add ChunkedArray to SWIG * Add ChunkedArray * Add ChunkedArray_API_extensions.i * Add SWIG class wrappers * Address some review comments * Fix linting issues * Move test to tests/test_ChunkedArray_manually.cpp * Add test note * Move ChunkedArray to include/LightGBM/utils/ * Declare more explicit types of ChunkedArray in the SWIG API. * Port ChunkedArray tests to googletest * Please C++ linter * Address StrikerRUS' review comments * Update SWIG doc & disable ChunkedArray<int64_t> * Use CHECK_EQ instead of assert * Change include order (linting) * Rename ChunkedArray -> chunked_array files * Change header guards * Address last comments from StrikerRUS	2021-03-21 15:07:21 +03:00
Nikita Titov	1f4a084230	[tests][dask] simplify code in Dask tests (#4075 ) * simplify Dask tests code * enable CI * disable CI	2021-03-15 20:02:55 -05:00
James Lamb	39c85dd97d	[dask] [ci] fix flaky network-setup test (#4071 )	2021-03-15 15:52:32 -05:00
Philip Hyunsu Cho	bcf443b568	Add CMake option to enable sanitizers and build gtest (#3555 ) * Add CMake option to enable sanitizer * Set up gtest * Address reviewer's feedback * Address reviewer's feedback * Update CMakeLists.txt Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-03-13 00:53:08 +03:00
James Lamb	296397df7b	[dask] raise more informative error for duplicates in 'machines' (fixes #4057 ) (#4059 ) * [dask] raise more informative error for duplicates in 'machines' * uncomment * avoid test failure * Revert "avoid test failure" This reverts commit `9442bdf00f`.	2021-03-10 12:02:27 -06:00
jmoralez	1d7b54d30f	[dask] include multiclass-classification task in tests (#4048 ) * include multiclass-classification task and task_to_model_factory dicts * define centers coordinates. flatten init_scores within each partition for multiclass-classification * include issue comment and fix linting error	2021-03-09 21:58:38 -06:00
jmoralez	37e987828d	[dask] Include support for init_score (#3950 ) * include support for init_score * use dataframe from init_score and test difference with and without init_score in local model * revert refactoring * initial docs. test between distributed models with and without init_score * remove ranker from tests * test value for root node and change docs * comma * re-include parametrize * fix incorrect merge * use single init_score and the booster_ attribute * use np.float64 instead of float	2021-03-04 11:50:08 -06:00
James Lamb	2a00b6ffbc	[dask] [ci] add support for scikit-learn 0.24+ in tests (fixes #4031 ) (#4032 ) * [dask] [ci] add support for scikit-learn 0.24+ in tests (fixes #4031) * Update tests/python_package_test/test_dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * try upgrading mixtexsetup * they changed the executable name UGH * more changes for executable name * another path change * changing package mirrors * undo experiments Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-03-02 16:29:08 +03:00
Nikita Titov	3ab6bbf9f3	[tests][dask] simplify fit calls in Dask tests (#4018 ) * simplify fit calls in Dask tests * Update .vsts-ci.yml * Update .vsts-ci.yml	2021-02-24 08:17:55 -06:00
jmoralez	5dacd603ba	[dask][python-package] include support for column array as label (#3943 ) * include support for column array as label * remove nested ifs * fix linting errors * include tests for sklearn regressors * include docstring for numpy_1d_array_to_dtype * include . at end of docstring * remove pandas import and test for regression, classification and ranking * check predictions of sklearn models as well * test training only in dask. drop pandas series tests * use PANDAS_INSTALLED and pd_Series * inline imports * use col array in fit for test_dask * include review comments	2021-02-24 14:47:49 +03:00
Nikita Titov	86a085f7ca	[tests][python] Add test for single leaf in linear tree (#4015 ) * Update test_engine.py * Update python_package.yml * Update python_package.yml * Update test_engine.py * hotfix	2021-02-24 18:46:05 +11:00
jmoralez	0e57657585	[dask] use random ports in network setup (#3823 ) * use socket.bind with port 0 and client.run to find random open ports * include test for found ports * find random open ports as default * parametrize local_listen_port. type hint to _find_random_open_port. fid open ports only on workers with data. * make indentation consistent and pass list of workers to client.run * remove socket import * change random port implementation * fix test	2021-02-23 22:14:12 -06:00
James Lamb	1f73f55938	[dask] allow tight control over ports (#3994 ) * [dask] allow tight control over ports * getting there, getting there * fix params maybe * fixing params * remove unnecessary stuff * fix tests * fixes * some minor changes * fix flaky test * linting * more linting * clarify parameter description * add warning * revert docs change * Update python-package/lightgbm/dask.py * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * trying to fix stuff * this is working * update tests * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * indent Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-02-23 23:48:53 +03:00
imjwang	eb5f471bc1	[tests][dask] add scikit-learn compatibility tests (fixes #3894 ) (#3947 ) * add test_dask.py * Update tests/python_package_test/test_dask.py Co-authored-by: James Lamb <jaylamb20@gmail.com> * clients * remove ports * safe sklearn checks * safe sklearn checks * fix whitespace * fix whitespace-try 2 * fix whitespace-try 3 * isort * isort * sklearn_checks_to_learn Co-authored-by: James Lamb <jaylamb20@gmail.com>	2021-02-18 05:28:39 +03:00
James Lamb	a3f4831d75	[tests][dask] make find-open-port test more reliable (#3993 ) * [dask] make find-open-port test more reliable * use listen_port fixture * Apply suggestions from code review	2021-02-18 03:59:35 +03:00
Nikita Titov	75b9b0d3c8	[ci][python] hotfix imports order (#3992 )	2021-02-17 01:18:37 +03:00
Nikita Titov	1413c060b0	Run tests and build Python wheels for aarch64 architecture (#3948 ) * Update setup.sh * Update test.sh * Update test_dask.py * Update test_engine.py * Update .vsts-ci.yml	2021-02-16 23:35:37 +03:00
Nikita Titov	d6ebd063ff	[ci][python] run isort in CI linting job (#3990 ) * run isort in CI linting job * workaround conda compatibility issues	2021-02-16 20:09:13 +03:00
Zhuyi Xue	1248d55f0d	[ci][python] apply isort to tests/python_package_test/test_engine.py #3958 (#3981 )	2021-02-16 15:02:36 +03:00
Zhuyi Xue	9445b2ca26	[ci][python] apply isort to tests/python_package_test/test_basic.py #3958 (#3977 )	2021-02-16 03:06:09 +03:00
Zhuyi Xue	d64fcbe080	[ci][python] apply isort to tests/python_package_test/test_consistency.py #3958 (#3978 )	2021-02-16 03:04:03 +03:00
Zhuyi Xue	cac97d0c51	[ci][python] apply isort to tests/python_package_test/test_plotting.py #3958 (#3982 )	2021-02-16 01:10:58 +03:00
Zhuyi Xue	0cb94fa59a	[ci][python] apply isort to tests/python_package_test/test_utilities.py #3958 (#3984 )	2021-02-16 01:08:48 +03:00
Zhuyi Xue	cdfe97f5d7	[ci][python] apply isort to tests/cpp_test/test.py #3958 (#3976 )	2021-02-15 20:57:33 +03:00
Zhuyi Xue	07d7b7972f	[ci][python] apply isort to tests/c_api_test/test_.py #3958 (#3975 )	2021-02-15 20:56:46 +03:00
Zhuyi Xue	219f613a76	[ci][python] apply isort to tests/python_package_test/test_dual.py #3958 (#3980 )	2021-02-15 20:56:00 +03:00
Zhuyi Xue	1a294c87ff	[ci][python] apply isort to tests/python_package_test/test_dask.py #3958 (#3979 )	2021-02-15 17:50:19 +03:00
James Lamb	18d57934b0	[dask] test that Dask automatically treats 'category' columns as categorical features (#3932 )	2021-02-10 01:03:33 +03:00
James Lamb	06ed4337e0	[dask] [docs] Fix inaccuracies in API docs for Dask module (fixes #3871 ) (#3930 ) * got fit() working * add predict() * predict_proba() * remove custom objective docs * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * fix capitalization * Update tests/python_package_test/test_dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-02-09 15:28:10 -06:00
jmoralez	7b47ab8fad	[dask] test training when a worker has no data (#3897 ) * include test for training when a worker has no data * test single partition against local model for all tasks and outputs * remove futures_of * include james' comments * remove product import	2021-02-08 20:48:37 -06:00
James Lamb	37485fff5d	[dask] Add support for 'pred_leaf' in Dask estimators (fixes #3792 ) (#3919 ) * fix tests * fix tests * fix test comments * simplify tests * Apply suggestions from code review	2021-02-07 13:17:28 -06:00
GOusignu	6f127847dc	[dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (#3911 ) * [dask] Add unit tests that signatures are the same between Dask and scikit-learn estimators (fixes microsoft#3907)	2021-02-07 11:38:48 -06:00
James Lamb	fc6b71e08e	[dask] Support Dask dataframes with 'category' columns (fixes #3861 ) (#3908 ) * add support for pandas categorical columns * remove commented code * quotes * syntax error * fix shape for ranker test * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update tests/python_package_test/test_dask.py * trying * fix tests * remove unnecessary debugging stuff * skip accuracy checks on categorical * use category columns as categorical features Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-02-07 01:19:49 +03:00
Nikita Titov	b1e000c045	[dask] remove unused private _client attribute (#3904 ) * Update test_dask.py * Update dask.py * Update .vsts-ci.yml * Revert "Update .vsts-ci.yml" This reverts commit `98422be5b5`.	2021-02-03 10:44:08 -06:00

509 commits