LightGBM

Commit graph

Author	SHA1	Message	Date
James Lamb	668bf5dadf	[python-package] deprecate support for H2O 'datatable' (#6670 ) Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2024-10-15 08:32:14 -05:00
James Lamb	e0cda880fc	[python-package] remove uses of deprecated NumPy random number generation APIs, require 'numpy>=1.17.0' (#6468 )	2024-06-03 20:17:40 -05:00
Nikita Titov	ce486e5b45	[python] remove `early_stopping_rounds` argument of `train()` and `cv()` functions (#4908 )	2021-12-26 17:20:49 +03:00
James Lamb	417ba19217	[docs] Clarify the fact that predict() on a file does not support saved Datasets (fixes #4034 ) (#4545 ) * documentation changes * add list of supported formats to error message * add unit tests * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * update per review comments * make references consistent Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-08-24 21:33:13 -05:00
Chen Yufei	c359896e9b	[python-package] Create Dataset from multiple data files (#4089 ) * [python-package] create Dataset from sampled data. * [python-package] create Dataset from List[Sequence]. 1. Use random access for data sampling 2. Support read data from multiple input files 3. Read data in batch so no need to hold all data in memory * [python-package] example: create Dataset from multiple HDF5 file. * fix: revert is_class implementation for seq * fix: unwanted memory view reference for seq * fix: seq is_class accepts sklearn matrices * fix: requirements for example * fix: pycode * feat: print static code linting stage * fix: linting: avoid shell str regex conversion * code style: doc style * code style: isort * fix ci dependency: h5py on windows * [py] remove rm files in test seq https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623 * docs(python): init_from_sample summary https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389 * remove dataset dump sample data debugging code. * remove typo fix. Create separate PR for this. * fix typo in src/c_api.cpp Co-authored-by: James Lamb <jaylamb20@gmail.com> * style(linting): py3 type hint for seq * test(basic): os.path style path handling * Revert "feat: print static code linting stage" This reverts commit `10bd79f7f8`. * feat(python): sequence on validation set * minor(python): comment * minor(python): test option hint * style(python): fix code linting * style(python): add pydoc for ref_dataset * doc(python): sequence Co-authored-by: shiyu1994 <shiyu_k1994@qq.com> * revert(python): sequence class abc * chore(python): remove rm_files * Remove useless static_assert. * refactor: test_basic test for sequence. * fix lint complaint. * remove dataset._dump_text in sequence test. * Fix reverting typo fix. * Apply suggestions from code review Co-authored-by: James Lamb <jaylamb20@gmail.com> * Fix type hint, code and doc style. * fix failing test_basic. 
* Remove TODO about keep constant in sync with cpp. * Install h5py only when running python-examples. * Fix lint complaint. * Apply suggestions from code review Co-authored-by: James Lamb <jaylamb20@gmail.com> * Doc fixes, remove unused params_str in __init_from_seqs. * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Remove unnecessary conda install in windows ci script. * Keep param as example in dataset_from_multi_hdf5.py * Add _get_sample_count function to remove code duplication. * Use batch_size parameter in generate_hdf. * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Fix after applying suggestions. * Fix test, check idx is instance of numbers.Integral. * Update python-package/lightgbm/basic.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Expose Sequence class in Python-API doc. * Handle Sequence object not having batch_size. * Fix isort lint complaint. * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update docstring to mention Sequence as data input. * Remove get_one_line in test_basic.py * Make Sequence an abstract class. * Reduce number of tests for test_sequence. * Add c_api: LGBM_SampleCount, fix potential bug in LGBMSampleIndices. * empty commit to trigger ci * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t. Also rename total_nrow to num_total_row in c_api.h for consistency. * Doc about Sequence in docs/Python-Intro.rst. * Fix: basic.py change LGBM_SampleIndices out_len to int32. * Add create_valid test case with Dataset from Sequence. * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Apply suggestions from code review Co-authored-by: shiyu1994 <shiyu_k1994@qq.com> * Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT. 
* Update python-package/lightgbm/basic.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: Willian Zhang <willian@willian.email> Co-authored-by: Willian Z <Willian@Willian-Zhang.com> Co-authored-by: James Lamb <jaylamb20@gmail.com> Co-authored-by: shiyu1994 <shiyu_k1994@qq.com> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-07-02 15:17:17 +03:00
Andrew Ziem	e79716e0b6	Correct spelling (#4250 ) * Correct spelling Most changes were in comments, and there were a few changes to literals for log output. There were no changes to variable names, function names, IDs, or functionality. * Clarify a phrase in a comment Co-authored-by: James Lamb <jaylamb20@gmail.com> * Clarify a phrase in a comment Co-authored-by: James Lamb <jaylamb20@gmail.com> * Clarify a phrase in a comment Co-authored-by: James Lamb <jaylamb20@gmail.com> * Correct spelling Most are code comments, but one case is a literal in a logging message. There are a few grammar fixes too. Co-authored-by: James Lamb <jaylamb20@gmail.com>	2021-05-04 10:10:55 -05:00
Gaurav Chopra	c10b0430b8	[docs] fix typo: one-hot coding should be one-hot encoding (#3898 ) * Update Python-Intro.rst * Update docs/Python-Intro.rst Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: James Lamb <jaylamb20@gmail.com> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-02-08 01:16:18 +03:00
Ray Bell	78d31d9ae3	[docs][python] add conda-forge install instructions (#3544 ) * DOC: add conda-forge install instructions * DOC: add conda-forge instructions * DOC: fix hyperlink * DOC: point to installation guide * add detailed * Update python-package/README.rst Co-authored-by: James Lamb <jaylamb20@gmail.com> * Update python-package/README.rst Co-authored-by: James Lamb <jaylamb20@gmail.com> * rm characters * add pip install * add : * Update python-package/README.rst Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/README.rst Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * remove pip from header * channel Co-authored-by: James Lamb <jaylamb20@gmail.com> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>	2021-01-11 20:01:25 +03:00
Guolin Ke	0d45ebd65c	[docs] Simplify the python installation instruction (#3378 ) * Update Python-Intro.rst * Update README.rst	2020-09-11 23:10:30 +03:00
Nikita Titov	c633c6c2af	[python] Re-enable scikit-learn 0.22+ support (#2949 ) * Revert "specify the last supported version of scikit-learn (#2637)" This reverts commit `d100277649`. * ban scikit-learn 0.22.0 and skip broken test * fix updated test * fix lint test * Revert "fix lint test" This reverts commit `8b4db0805f`.	2020-04-10 12:53:21 +09:00
Nikita Titov	d100277649	specify the last supported version of scikit-learn (#2637 )	2019-12-19 19:00:29 +08:00
Nikita Titov	4848776fb1	[docs] clarified support of LibSVM zero-based format files (#2504 )	2019-10-14 13:41:17 +03:00
leasunhy	6a1a538f45	[docs] remove duplicated param in Python-Intro.rst (#2181 ) `num_round` is redundant here because it will be overrideen by `num_trees` in the `param` dictionary.	2019-05-18 20:51:55 +03:00
Nikita Titov	f91e5644a3	[python] added ability to pass first_metric_only in params (#2175 ) * added ability to pass first_metric_only in params * simplified tests * fixed test * fixed punctuation	2019-05-15 15:44:37 +03:00
Guolin Ke	94fbe5bb9f	[docs] updated Microsoft GitHub URL (#2152 ) * fix travis badge * updated GitHub Microsoft URL	2019-05-08 13:51:28 +08:00
Nikita Titov	b3c31c4015	[docs] Python wrapper doesn't support params in form of list of pairs (#2078 ) * fixed Python intro * fixed typos * scikit-learn added support of https	2019-04-10 13:26:12 +03:00
sheikheddy	fe115bbb72	[docs] Fix typo in Python-Intro.rst (#2074 )	2019-04-03 00:21:40 +03:00
James Lamb	572ae40038	[docs] Small aesthetic improvements to RTD docs (#2060 ) * Small aesthetic improvements to RTD docs * fixed markdown table in Development-Guide * removed unnecessary blank line in conf.py	2019-03-26 22:55:17 +03:00
kenmatsu4	011cc90a77	[python] Use first_metric_only flag for early_stopping function. (#2049 ) * Use first_metric_only flag for early_stopping function. In order to apply early stopping with only first metric, applying first_metric_only flag for early_stopping function. * upcate comment * Revert "upcate comment" This reverts commit `1e75a1a415`. * added test * fixed docstring * cut comment and save one line * document new feature	2019-03-25 13:18:22 +08:00
Nikita Titov	c5cfe3e3e4	[python] update DataTable handling (#2020 )	2019-02-21 19:13:09 +03:00
Harry Moreno	7ebf80f8cf	Fix wording (#2015 )	2019-02-18 14:39:59 +08:00
Harry Moreno	a777aedd9d	Change variable name test_data > validation_data (#2018 ) * it is confusing to name validation data `test_data` especially as terms like train, validation, test splits are common in ML. Change variable name in python quick start.	2019-02-18 14:39:09 +08:00
Guolin Ke	2c9d332057	[python] convert datatable to numpy directly (#1970 ) * convert datatable to numpy directly * fix according to comments * updated more docstrings * simplified isinstance check * Update compat.py	2019-02-05 00:48:22 +08:00
Nikita Titov	f3dce7e6e7	[docs] corrected misleading note about best_iteration (#1758 ) * removed misleading note about best_iteration * Update engine.py * Update Python-Intro.rst * Updated Engine.py * Updated Python-Intro.rst * add article 'the best', break huge line and remove excess empty line	2018-10-16 23:02:50 +08:00
Alex	ac6951d368	[docs] fixed some typos and grammatical errors (#1738 )	2018-10-10 16:34:16 +08:00
Nikita Titov	536f5ddeb0	[docs] minor docs enhancements (#1647 ) * added links to corresponding params in Quick-Start guide * updated description of possible input types in python * clarify list of numpy arrays input type in docs	2018-09-08 16:06:09 +08:00
Nikita Titov	cd6d058386	various improvements around metric param and early_stopping_rounds param description (#1589 ) * bring consistency and clearness into early_stopping_rounds desc, metric desc and implementation * hotfix * hotfix * used NDCG as default metric for lambdarank task * fixed missed methods at ReadTheDocs and changed default eval_metric * leaved only unique metrics * fixed comment	2018-08-27 14:46:18 +03:00
Nikita Titov	a39c848e64	[docs][python] made OS detection more reliable and little docs improvements (#1414 ) * added missed description of plot_example in python_guide folder and fixed consistency for packages naming * more reliable OS detection * fixed grammar * made pylint happy	2018-06-03 12:46:59 +03:00
Zach Kurtz	af40156146	[docs] Edits for grammer and clarity (#1389 ) * A nitpicky grammer edit with minor clarifications added. * fix link * strike s * try a different optimal-split link, clarify experimental details * smoothing the FAQ * edit Features.rst * several minor edits throughout docs * historgram-based	2018-05-26 22:14:38 +03:00
Fujii Hironori	e2a0de5082	[docs][python][R] early_stopping_rounds doesn't check all of eval_set (#1393 ) The document of `early_stopping_rounds` says it will check all of eval_set. But, this is not true. It doesn't check the dataset specified as the training data. This change appends an extra phrase "except the training data" to all of the sentences "If there's more than one, will check all of them" in documents.	2018-05-24 13:01:46 +03:00
Misha Lisovyi	d1fd52e9d8	[python][docs] add info on adaptive learning rate in the sklearn API (#1354 ) * add info on adaptive learning rate in the sklearn API * adjust learning rate documentation following the PR discussion * fix early stopping documentation * improve wording * fixing trailing spaces	2018-05-05 09:09:58 +08:00
Darío Hereñú	819df01278	[docs] Typo on #119 (#1166 )	2018-01-01 17:59:28 +03:00
Nikita Titov	968a353f4a	fixed typos (#1155 )	2017-12-30 08:42:20 +08:00
Nikita Titov	4aa3296739	[docs] documentation improvement (#976 ) * fixed typos and hotfixes * converted gcc-tips.Rmd; added ref to gcc-tips * renamed files * renamed Advanced-Topics * renamed README * renamed Parameters-Tuning * renamed FAQ * fixed refs to FAQ * fixed undecodable source characters * renamed Features * renamed Quick-Start * fixed undecodable source characters in Features * renamed Python-Intro * renamed GPU-Tutorial * renamed GPU-Windows * fixed markdown * fixed undecodable source characters in GPU-Windows * renamed Parameters * fixed markdown * removed recommonmark dependence * hotfixes * added anchors to links * fixed 404 * fixed typos * added more anchors * removed sphinxcontrib-napoleon dependence * removed outdated line in Travis config * fixed max-width of the ReadTheDocs theme * added horizontal align to images	2017-10-12 21:34:23 +09:00

34 commits