Commit graph

2388 commits

Author SHA1 Message Date
shiyu1994 ec4bd1e0a4
set is_linear_ to false when it is absent from the model file (fix #3778) (#4056) 2021-03-13 00:44:18 +03:00
Nikita Titov e5c3f7e755
[docs] add Yu Shi to repo maintainers (#4060)
* Update FAQ.rst

* Update CODEOWNERS
2021-03-10 20:30:11 -06:00
Nikita Titov 8d0669fb4d
set 'pending' commit status for R Solaris optional workflow (#4061) 2021-03-10 18:29:00 -06:00
James Lamb 15853a7a02
[dask] add tutorial documentation (fixes #3814, fixes #3838) (#4030)
* [dask] add tutorial documentation (fixes #3814, fixes #3838)

* add notes on saving the model

* quick start examples

* add examples

* fix timeouts in examples

* remove notebook

* fill out prediction section

* table of contents

* add line back

* linting

* isort

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* move examples under python-guide

* remove unused pickle import

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-03-10 13:34:43 -06:00
James Lamb 296397df7b
[dask] raise more informative error for duplicates in 'machines' (fixes #4057) (#4059)
* [dask] raise more informative error for duplicates in 'machines'

* uncomment

* avoid test failure

* Revert "avoid test failure"

This reverts commit 9442bdf00f.
2021-03-10 12:02:27 -06:00
marcelonieva7 b75a43a05b
Update index.rst (#4029)
Add alt text to logo image

Co-authored-by: James Lamb <jaylamb20@gmail.com>
2021-03-10 09:32:46 -06:00
jmoralez 1d7b54d30f
[dask] include multiclass-classification task in tests (#4048)
* include multiclass-classification task and task_to_model_factory dicts

* define centers coordinates. flatten init_scores within each partition for multiclass-classification

* include issue comment and fix linting error
2021-03-09 21:58:38 -06:00
James Lamb 13680d89a1
[ci] add CMake + R 3.6 test back (fixes #3469) (#4053)
* [ci] add CMake + R 3.6 test back (fixes #3469)

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update .ci/test_r_package_windows.ps1

* -Wait and remove rtools40

* empty commit

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-03-09 18:54:01 -06:00
James Lamb 85bda857c0
[ci] fix R CMD CHECK note about example timings (fixes #4049) (#4055)
* [ci] fix R CMD CHECK note about example timings (fixes #4049)

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* empty commit

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-03-09 16:37:06 -06:00
James Lamb 4e9c976867
[ci] prevent getting incompatible dask and distributed versions (#4054)
* [ci] prevent getting incompatible dask and distributed versions

* Update .ci/test.sh

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* empty commit

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-03-09 11:20:53 -06:00
James Lamb 3a5e3c001f
[ci] ignore untitle Jupyter notebooks in .gitignore (#4047) 2021-03-05 15:20:55 +03:00
jmoralez 37e987828d
[dask] Include support for init_score (#3950)
* include support for init_score

* use dataframe from init_score and test difference with and without init_score in local model

* revert refactoring

* initial docs. test between distributed models with and without init_score

* remove ranker from tests

* test value for root node and change docs

* comma

* re-include parametrize

* fix incorrect merge

* use single init_score and the booster_ attribute

* use np.float64 instead of float
2021-03-04 11:50:08 -06:00
shiyu1994 19f357726c
[docs] update description of deterministic parameter (#4027)
* update description of deterministic parameter to require using with force_row_wise or force_col_wise

* Update include/LightGBM/config.h

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update docs

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-03-04 15:33:54 +03:00
James Lamb 87c37bf04f
[ci] [R-package] upgrade to R 4.0.4 in CI (#4042) 2021-03-03 14:12:27 +08:00
Subham Agrawal f92aa54fa0
[docs] Add alt text to image in Parameters-Tuning.rst (#4035)
* [docs] Add alt text to image in Parameters-Tuning.rst

Add alt text to Leaf-wise growth image, as part of #4028

* Update docs/Parameters-Tuning.rst

Co-authored-by: James Lamb <jaylamb20@gmail.com>

Co-authored-by: James Lamb <jaylamb20@gmail.com>
2021-03-02 13:43:01 -06:00
James Lamb 2a00b6ffbc
[dask] [ci] add support for scikit-learn 0.24+ in tests (fixes #4031) (#4032)
* [dask] [ci] add support for scikit-learn 0.24+ in tests (fixes #4031)

* Update tests/python_package_test/test_dask.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* try upgrading mixtexsetup

* they changed the executable name UGH

* more changes for executable name

* another path change

* changing package mirrors

* undo experiments

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-03-02 16:29:08 +03:00
Qingyun Wu 6356e659af
[docs] Add FLAML for efficient hyperparameter optimization (#4013)
* add FLAML for HPO in DOC

* add FLAML for HPO

* revise FLAML phasing

* Update docs/Parameters-Tuning.rst

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update README.md

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-02-24 08:25:01 -06:00
Nikita Titov 3ab6bbf9f3
[tests][dask] simplify fit calls in Dask tests (#4018)
* simplify fit calls in Dask tests

* Update .vsts-ci.yml

* Update .vsts-ci.yml
2021-02-24 08:17:55 -06:00
jmoralez 5dacd603ba
[dask][python-package] include support for column array as label (#3943)
* include support for column array as label

* remove nested ifs

* fix linting errors

* include tests for sklearn regressors

* include docstring for numpy_1d_array_to_dtype

* include . at end of docstring

* remove pandas import and test for regression, classification and ranking

* check predictions of sklearn models as well

* test training only in dask. drop pandas series tests

* use PANDAS_INSTALLED and pd_Series

* inline imports

* use col array in fit for test_dask

* include review comments
2021-02-24 14:47:49 +03:00
Nikita Titov 86a085f7ca
[tests][python] Add test for single leaf in linear tree (#4015)
* Update test_engine.py

* Update python_package.yml

* Update python_package.yml

* Update test_engine.py

* hotfix
2021-02-24 18:46:05 +11:00
jmoralez 0e57657585
[dask] use random ports in network setup (#3823)
* use socket.bind with port 0 and client.run to find random open ports

* include test for found ports

* find random open ports as default

* parametrize local_listen_port. type hint to _find_random_open_port. find open ports only on workers with data.

* make indentation consistent and pass list of workers to client.run

* remove socket import

* change random port implementation

* fix test
2021-02-23 22:14:12 -06:00
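
The "socket.bind with port 0" technique named in the commit above can be sketched as follows. This is a minimal illustration using only the standard library, not the code from the pull request: the function name `find_random_open_port` is hypothetical, and the commit log itself notes that the implementation was changed later in the same PR.

```python
import socket


def find_random_open_port() -> int:
    """Ask the operating system for a currently unused TCP port.

    Hypothetical helper for illustration; not LightGBM's implementation.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Binding to port 0 tells the OS to pick any free port.
        s.bind(("", 0))
        # getsockname() returns (host, port) for AF_INET sockets.
        port = s.getsockname()[1]
    # The socket is closed here, so the port is released again.
    return port


port = find_random_open_port()
print(f"OS-assigned open port: {port}")
```

One caveat of this approach is that the port is only known to be free at the moment of the call; another process may claim it before the caller binds to it, so callers that need a hard guarantee should be prepared to retry.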
Nikita Titov 7777852a19
[dask] Reuse addresses saved in variable (#4016) 2021-02-24 04:05:41 +03:00
James Lamb 1f73f55938
[dask] allow tight control over ports (#3994)
* [dask] allow tight control over ports

* getting there, getting there

* fix params maybe

* fixing params

* remove unnecessary stuff

* fix tests

* fixes

* some minor changes

* fix flaky test

* linting

* more linting

* clarify parameter description

* add warning

* revert docs change

* Update python-package/lightgbm/dask.py

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* trying to fix stuff

* this is working

* update tests

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* indent

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-02-23 23:48:53 +03:00
Belinda Trotta b09c1ff70d
[DOCS] Update docs to note that pred_contrib is not available for linear trees (#4006)
* Update docs to note that pred_contrib is not available for linear trees

* Add warning in code

* Change warning to error
2021-02-23 17:52:13 +03:00
James Lamb 7171558444
[doc] Reorganize documentation on distributed learning (fixes #3596) (#3951)
* rework distributed learning page

* more references

* more changes

* more changes

* add anchors for old links

* revert changes from #4000

* fix links

* more links

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update docs/Parallel-Learning-Guide.rst

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-02-21 20:43:02 -06:00
mjmckp 605c97b5ee
Fix evalution of linear trees with a single leaf. (#3987)
* Fix index out-of-range exception generated by BaggingHelper on small datasets.

Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero.

* Update goss.hpp

* Update goss.hpp

* Add API method LGBM_BoosterPredictForMats which runs prediction on a data set given as an array of pointers to rows (as opposed to the existing method LGBM_BoosterPredictForMat, which requires data given as a contiguous array)

* Fix incorrect upstream merge

* Add link to LightGBM.NET

* Fix indenting to 2 spaces

* Dummy edit to trigger CI

* Dummy edit to trigger CI

* remove duplicate functions from merge

* Fix evalution of linear trees with a single leaf.

Note that trees without linear models at the leaf always handle num_leaves = 1 as a special case and directly output the leaf value.  Linear trees were missing this special case handling, and hence would have the following issues:
 * Calling Tree::Predict or Tree::PredictByMap would cause an access violation exception attempting to access the first value of the empty split_feature_ array in GetLeaf.
 * PredictionFunLinear would either cause an access violation or go into an infinite loop when attempting to do the equivalent of GetLeaf.

Note also that PredictionFun does not need the same changes as PredictionFunLinear, since both are only called by Tree::AddPredictionToScore, which has a special case for (!is_linear_ && num_leaves_ <= 1) that precludes calling PredictionFun.

Co-authored-by: matthew-peacock <matthew.peacock@whiteoakam.com>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
2021-02-21 17:15:16 -06:00
James Lamb b1d382ee0c
[ci] prefer older binary to new source for R packages on Mac builds (fixes #4008) (#4010)
* [ci] prefer older binary to new source for R packages

* back to binary

* preserve choice on Linux
2021-02-21 17:11:17 -06:00
James Lamb 646267d265
[dask] use more specific method names on _DaskLGBMModel (#4004) 2021-02-20 14:39:55 +03:00
mjmckp 7f91dc66f9
Use high precision conversion from double to string in Tree::ToString() for new linear tree members (#3938)
* Fix index out-of-range exception generated by BaggingHelper on small datasets.

Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero.

* Update goss.hpp

* Update goss.hpp

* Add API method LGBM_BoosterPredictForMats which runs prediction on a data set given as an array of pointers to rows (as opposed to the existing method LGBM_BoosterPredictForMat, which requires data given as a contiguous array)

* Fix incorrect upstream merge

* Add link to LightGBM.NET

* Fix indenting to 2 spaces

* Dummy edit to trigger CI

* Dummy edit to trigger CI

* remove duplicate functions from merge

* In Tree::ToString() method, print double values for linear tree models with high precision, so that the tree may be accurately reproduced elsewhere (LightGBM.Net in particular)

* Need to use more precise StringToArray instead of StringToArrayFast when parsing double valued arrays for linear trees, to ensure models round-trip via string or file correctly.

Co-authored-by: matthew-peacock <matthew.peacock@whiteoakam.com>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
2021-02-20 07:28:18 +08:00
James Lamb 7880b79fde
[docs] Change some 'parallel learning' references to 'distributed learning' (#4000)
* [docs] Change some 'parallel learning' references to 'distributed learning'

* found a few more

* one more reference
2021-02-19 09:47:30 -06:00
James Lamb 0ee4d37fb5
remove commented-out code in cross-entropy metric source (#3999) 2021-02-18 23:09:17 -06:00
imjwang eb5f471bc1
[tests][dask] add scikit-learn compatibility tests (fixes #3894) (#3947)
* add test_dask.py

* Update tests/python_package_test/test_dask.py

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* clients

* remove ports

* safe sklearn checks

* safe sklearn checks

* fix whitespace

* fix whitespace-try 2

* fix whitespace-try 3

* isort

* isort

* sklearn_checks_to_learn

Co-authored-by: James Lamb <jaylamb20@gmail.com>
2021-02-18 05:28:39 +03:00
James Lamb a3f4831d75
[tests][dask] make find-open-port test more reliable (#3993)
* [dask] make find-open-port test more reliable

* use listen_port fixture

* Apply suggestions from code review
2021-02-18 03:59:35 +03:00
mjmckp 5321fef67b
Fix for CreatePredictor function and VS2017 Debug build (#3937)
* Fix index out-of-range exception generated by BaggingHelper on small datasets.

Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero.

* Update goss.hpp

* Update goss.hpp

* Add API method LGBM_BoosterPredictForMats which runs prediction on a data set given as an array of pointers to rows (as opposed to the existing method LGBM_BoosterPredictForMat, which requires data given as a contiguous array)

* Fix incorrect upstream merge

* Add link to LightGBM.NET

* Fix indenting to 2 spaces

* Dummy edit to trigger CI

* Dummy edit to trigger CI

* remove duplicate functions from merge

* Fix for CreatePredictor function: for VS2017 in Debug build, the previous version would end up giving an uninitialised prediction function that would throw access violation exceptions when invoked.

Co-authored-by: matthew-peacock <matthew.peacock@whiteoakam.com>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
2021-02-17 15:17:11 +03:00
Alex Ford de8c610512
Optimize array-from-ctypes in basic.py (#3927)
Approximately 80% of runtime when loading "low column count, high row
count" DataFrames into Datasets is consumed in `np.fromiter`, called
as part of the `Dataset.get_field` method.

This is a particularly pernicious hotspot, as unlike other ctypes-based
methods this is a hot loop over a Python iterator and causes
significant GIL contention in multi-threaded applications.

Replace `np.fromiter` with a direct call to `np.ctypeslib.as_array`,
which allows a single-shot `copy` of the underlying array.

This reduces the load time of a ~35 million row categorical dataframe
with 1 column from ~5 seconds to ~1 second, and allows multi-threaded
execution.
2021-02-16 23:23:48 -06:00
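
The difference between the two approaches described in the commit above can be illustrated with a small sketch. The helper names `from_pointer_iter` and `from_pointer_bulk` and the five-element buffer are assumptions made for illustration; they are not taken from `basic.py`, and the sketch makes no claim about the timings quoted in the commit.

```python
import ctypes

import numpy as np


def from_pointer_iter(ptr, length):
    """Element-by-element conversion: one Python-level step per value."""
    return np.fromiter((ptr[i] for i in range(length)),
                       dtype=np.float32, count=length)


def from_pointer_bulk(ptr, length):
    """Wrap the C buffer as an array view, then copy it in one shot."""
    view = np.ctypeslib.as_array(ptr, shape=(length,))
    # copy() detaches the result from the C memory behind the pointer.
    return view.copy()


# Stand-in for a buffer that a C library would hand back (hypothetical data).
buf = (ctypes.c_float * 5)(1.0, 2.0, 3.0, 4.0, 5.0)
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_float))

slow = from_pointer_iter(ptr, 5)
fast = from_pointer_bulk(ptr, 5)
print(np.array_equal(slow, fast))
```

Both helpers return an independent `float32` array with the same contents; the bulk version avoids running a Python loop over every element, which is the cost the commit message attributes to `np.fromiter`.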
Nikita Titov 75b9b0d3c8
[ci][python] hotfix imports order (#3992) 2021-02-17 01:18:37 +03:00
Nikita Titov 1413c060b0
Run tests and build Python wheels for aarch64 architecture (#3948)
* Update setup.sh

* Update test.sh

* Update test_dask.py

* Update test_engine.py

* Update .vsts-ci.yml
2021-02-16 23:35:37 +03:00
Nikita Titov d6ebd063ff
[ci][python] run isort in CI linting job (#3990)
* run isort in CI linting job

* workaround conda compatibility issues
2021-02-16 20:09:13 +03:00
Zhuyi Xue 4ae59494ab
[ci][python] apply isort to python-package/lightgbm/compat.py #3958 (#3968) 2021-02-16 15:05:33 +03:00
Zhuyi Xue 6110bd1585
[ci][python] apply isort to python-package/lightgbm/engine.py #3958 (#3970) 2021-02-16 15:04:21 +03:00
Zhuyi Xue 1248d55f0d
[ci][python] apply isort to tests/python_package_test/test_engine.py #3958 (#3981) 2021-02-16 15:02:36 +03:00
Zhuyi Xue af0c226057
[ci][python] apply isort to python-package/lightgbm/basic.py #3958 (#3967) 2021-02-16 05:22:12 +03:00
Zhuyi Xue 9b64b9c91b
[ci][python] apply isort to python-package/lightgbm/__init__.py #3958 (#3966) 2021-02-16 04:03:00 +03:00
Zhuyi Xue acb677416a
[ci][python] apply isort to python-package/lightgbm/sklearn.py #3958 (#3973) 2021-02-16 03:37:16 +03:00
Zhuyi Xue 9445b2ca26
[ci][python] apply isort to tests/python_package_test/test_basic.py #3958 (#3977) 2021-02-16 03:06:09 +03:00
Zhuyi Xue d64fcbe080
[ci][python] apply isort to tests/python_package_test/test_consistency.py #3958 (#3978) 2021-02-16 03:04:03 +03:00
Zhuyi Xue cac97d0c51
[ci][python] apply isort to tests/python_package_test/test_plotting.py #3958 (#3982) 2021-02-16 01:10:58 +03:00
Zhuyi Xue 0cb94fa59a
[ci][python] apply isort to tests/python_package_test/test_utilities.py #3958 (#3984) 2021-02-16 01:08:48 +03:00
Zhuyi Xue 332b0db5ef
[ci][python] apply isort to python-package/setup.py #3958 (#3974) 2021-02-15 23:44:21 +03:00
Zhuyi Xue e9ea85bd06
[ci][python] apply isort to python-package/lightgbm/plotting.py #3958 (#3972) 2021-02-15 23:41:21 +03:00