Commit Graph

34 Commits

Author SHA1 Message Date
James Lamb 668bf5dadf
[python-package] deprecate support for H2O 'datatable' (#6670)
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2024-10-15 08:32:14 -05:00
James Lamb e0cda880fc
[python-package] remove uses of deprecated NumPy random number generation APIs, require 'numpy>=1.17.0' (#6468) 2024-06-03 20:17:40 -05:00
Nikita Titov ce486e5b45
[python] remove `early_stopping_rounds` argument of `train()` and `cv()` functions (#4908) 2021-12-26 17:20:49 +03:00
James Lamb 417ba19217
[docs] Clarify the fact that predict() on a file does not support saved Datasets (fixes #4034) (#4545)
* documentation changes

* add list of supported formats to error message

* add unit tests

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update per review comments

* make references consistent

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-08-24 21:33:13 -05:00
Chen Yufei c359896e9b
[python-package] Create Dataset from multiple data files (#4089)
* [python-package] create Dataset from sampled data.

* [python-package] create Dataset from List[Sequence].

1. Use random access for data sampling
2. Support read data from multiple input files
3. Read data in batch so no need to hold all data in memory

* [python-package] example: create Dataset from multiple HDF5 file.

* fix: revert is_class implementation for seq

* fix: unwanted memory view reference for seq

* fix: seq is_class accepts sklearn matrices

* fix: requirements for example

* fix: pycode

* feat: print static code linting stage

* fix: linting: avoid shell str regex conversion

* code style: doc style

* code style: isort

* fix ci dependency: h5py on windows

* [py] remove rm files in test seq
https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623

* docs(python): init_from_sample summary

https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389

* remove dataset dump sample data debugging code.

* remove typo fix.

Create separate PR for this.

* fix typo in src/c_api.cpp

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* style(linting): py3 type hint for seq

* test(basic): os.path style path handling

* Revert "feat: print static code linting stage"

This reverts commit 10bd79f7f8.

* feat(python): sequence on validation set

* minor(python): comment

* minor(python): test option hint

* style(python): fix code linting

* style(python): add pydoc for ref_dataset

* doc(python): sequence

Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* revert(python): sequence class abc

* chore(python): remove rm_files

* Remove useless static_assert.

* refactor: test_basic test for sequence.

* fix lint complaint.

* remove dataset._dump_text in sequence test.

* Fix reverting typo fix.

* Apply suggestions from code review

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Fix type hint, code and doc style.

* fix failing test_basic.

* Remove TODO about keep constant in sync with cpp.

* Install h5py only when running python-examples.

* Fix lint complaint.

* Apply suggestions from code review

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Doc fixes, remove unused params_str in __init_from_seqs.

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Remove unnecessary conda install in windows ci script.

* Keep param as example in dataset_from_multi_hdf5.py

* Add _get_sample_count function to remove code duplication.

* Use batch_size parameter in generate_hdf.

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix after applying suggestions.

* Fix test, check idx is instance of numbers.Integral.

* Update python-package/lightgbm/basic.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Expose Sequence class in Python-API doc.

* Handle Sequence object not having batch_size.

* Fix isort lint complaint.

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update docstring to mention Sequence as data input.

* Remove get_one_line in test_basic.py

* Make Sequence an abstract class.

* Reduce number of tests for test_sequence.

* Add c_api: LGBM_SampleCount, fix potential bug in LGBMSampleIndices.

* empty commit to trigger ci

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t.

Also rename total_nrow to num_total_row in c_api.h for consistency.

* Doc about Sequence in docs/Python-Intro.rst.

* Fix: basic.py change LGBM_SampleIndices out_len to int32.

* Add create_valid test case with Dataset from Sequence.

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Apply suggestions from code review

Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT.

* Update python-package/lightgbm/basic.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: Willian Zhang <willian@willian.email>
Co-authored-by: Willian Z <Willian@Willian-Zhang.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-07-02 15:17:17 +03:00
Andrew Ziem e79716e0b6
Correct spelling (#4250)
* Correct spelling

Most changes were in comments, and there were a few changes to literals for log output.

There were no changes to variable names, function names, IDs, or functionality.

* Clarify a phrase in a comment

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Clarify a phrase in a comment

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Clarify a phrase in a comment

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Correct spelling

Most are code comments, but one case is a literal in a logging message.

There are a few grammar fixes too.

Co-authored-by: James Lamb <jaylamb20@gmail.com>
2021-05-04 10:10:55 -05:00
Gaurav Chopra c10b0430b8
[docs] fix typo: one-hot coding should be one-hot encoding (#3898)
* Update Python-Intro.rst

* Update docs/Python-Intro.rst

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-02-08 01:16:18 +03:00
Ray Bell 78d31d9ae3
[docs][python] add conda-forge install instructions (#3544)
* DOC: add conda-forge install instructions

* DOC: add conda-forge instructions

* DOC: fix hyperlink

* DOC: point to installation guide

* add detailed

* Update python-package/README.rst

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/README.rst

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* rm characters

* add pip install

* add :

* Update python-package/README.rst

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/README.rst

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* remove pip from header

* channel

Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2021-01-11 20:01:25 +03:00
Guolin Ke 0d45ebd65c
[docs] Simplify the python installation instruction (#3378)
* Update Python-Intro.rst

* Update README.rst
2020-09-11 23:10:30 +03:00
Nikita Titov c633c6c2af
[python] Re-enable scikit-learn 0.22+ support (#2949)
* Revert "specify the last supported version of scikit-learn (#2637)"

This reverts commit d100277649.

* ban scikit-learn 0.22.0 and skip broken test

* fix updated test

* fix lint test

* Revert "fix lint test"

This reverts commit 8b4db0805f.
2020-04-10 12:53:21 +09:00
Nikita Titov d100277649 specify the last supported version of scikit-learn (#2637) 2019-12-19 19:00:29 +08:00
Nikita Titov 4848776fb1
[docs] clarified support of LibSVM zero-based format files (#2504) 2019-10-14 13:41:17 +03:00
leasunhy 6a1a538f45 [docs] remove duplicated param in Python-Intro.rst (#2181)
`num_round` is redundant here because it will be overridden by `num_trees` in the `param` dictionary.
2019-05-18 20:51:55 +03:00
Nikita Titov f91e5644a3
[python] added ability to pass first_metric_only in params (#2175)
* added ability to pass first_metric_only in params

* simplified tests

* fixed test

* fixed punctuation
2019-05-15 15:44:37 +03:00
Guolin Ke 94fbe5bb9f
[docs] updated Microsoft GitHub URL (#2152)
* fix travis badge

* updated GitHub Microsoft URL
2019-05-08 13:51:28 +08:00
Nikita Titov b3c31c4015
[docs] Python wrapper doesn't support params in form of list of pairs (#2078)
* fixed Python intro

* fixed typos

* scikit-learn added support of https
2019-04-10 13:26:12 +03:00
sheikheddy fe115bbb72 [docs] Fix typo in Python-Intro.rst (#2074) 2019-04-03 00:21:40 +03:00
James Lamb 572ae40038 [docs] Small aesthetic improvements to RTD docs (#2060)
* Small aesthetic improvements to RTD docs

* fixed markdown table in Development-Guide

* removed unnecessary blank line in conf.py
2019-03-26 22:55:17 +03:00
kenmatsu4 011cc90a77 [python] Use first_metric_only flag for early_stopping function. (#2049)
* Use first_metric_only flag for early_stopping function.

In order to apply early stopping with only first metric, applying first_metric_only flag for early_stopping function.

* upcate comment

* Revert "upcate comment"

This reverts commit 1e75a1a415.

* added test

* fixed docstring

* cut comment and save one line

* document new feature
2019-03-25 13:18:22 +08:00
Nikita Titov c5cfe3e3e4
[python] update DataTable handling (#2020) 2019-02-21 19:13:09 +03:00
Harry Moreno 7ebf80f8cf Fix wording (#2015) 2019-02-18 14:39:59 +08:00
Harry Moreno a777aedd9d Change variable name test_data > validation_data (#2018)
* it is confusing to name validation data `test_data` especially as terms like train, validation, test splits are common in ML. Change variable name in python quick start.
2019-02-18 14:39:09 +08:00
Guolin Ke 2c9d332057
[python] convert datatable to numpy directly (#1970)
* convert datatable to numpy directly

* fix according to comments

* updated more docstrings

* simplified isinstance check

* Update compat.py
2019-02-05 00:48:22 +08:00
Nikita Titov f3dce7e6e7 [docs] corrected misleading note about best_iteration (#1758)
* removed misleading note about best_iteration

* Update engine.py

* Update Python-Intro.rst

* Updated Engine.py

* Updated Python-Intro.rst

* add article 'the best', break huge line and remove excess empty line
2018-10-16 23:02:50 +08:00
Alex ac6951d368 [docs] fixed some typos and grammatical errors (#1738) 2018-10-10 16:34:16 +08:00
Nikita Titov 536f5ddeb0 [docs] minor docs enhancements (#1647)
* added links to corresponding params in Quick-Start guide

* updated description of possible input types in python

* clarify list of numpy arrays input type in docs
2018-09-08 16:06:09 +08:00
Nikita Titov cd6d058386
various improvements around metric param and early_stopping_rounds param description (#1589)
* bring consistency and clearness into early_stopping_rounds desc, metric desc and implementation

* hotfix

* hotfix

* used NDCG as default metric for lambdarank task

* fixed missed methods at ReadTheDocs and changed default eval_metric

* leaved only unique metrics

* fixed comment
2018-08-27 14:46:18 +03:00
Nikita Titov a39c848e64
[docs][python] made OS detection more reliable and little docs improvements (#1414)
* added missed description of plot_example in python_guide folder and fixed consistency for packages naming

* more reliable OS detection

* fixed grammar

* made pylint happy
2018-06-03 12:46:59 +03:00
Zach Kurtz af40156146 [docs] Edits for grammer and clarity (#1389)
* A nitpicky grammer edit with minor clarifications added.

* fix link

* strike s

* try a different optimal-split link, clarify experimental details

* smoothing the FAQ

* edit Features.rst

* several minor edits throughout docs

* historgram-based
2018-05-26 22:14:38 +03:00
Fujii Hironori e2a0de5082 [docs][python][R] early_stopping_rounds doesn't check all of eval_set (#1393)
The document of `early_stopping_rounds` says it will check all of
eval_set. But, this is not true. It doesn't check the dataset
specified as the training data.

This change appends an extra phrase "except the training data" to all
of the sentences "If there's more than one, will check all of them" in
documents.
2018-05-24 13:01:46 +03:00
Misha Lisovyi d1fd52e9d8 [python][docs] add info on adaptive learning rate in the sklearn API (#1354)
* add info on adaptive learning rate in the sklearn API

* adjust learning rate documentation following the PR discussion

* fix early stopping documentation

* improve wording

* fixing trailing spaces
2018-05-05 09:09:58 +08:00
Darío Hereñú 819df01278 [docs] Typo on #119 (#1166) 2018-01-01 17:59:28 +03:00
Nikita Titov 968a353f4a fixed typos (#1155) 2017-12-30 08:42:20 +08:00
Nikita Titov 4aa3296739 [docs] documentation improvement (#976)
* fixed typos and hotfixes

* converted gcc-tips.Rmd; added ref to gcc-tips

* renamed files

* renamed Advanced-Topics

* renamed README

* renamed Parameters-Tuning

* renamed FAQ

* fixed refs to FAQ

* fixed undecodable source characters

* renamed Features

* renamed Quick-Start

* fixed undecodable source characters in Features

* renamed Python-Intro

* renamed GPU-Tutorial

* renamed GPU-Windows

* fixed markdown

* fixed undecodable source characters in GPU-Windows

* renamed Parameters

* fixed markdown

* removed recommonmark dependence

* hotfixes

* added anchors to links

* fixed 404

* fixed typos

* added more anchors

* removed sphinxcontrib-napoleon dependence

* removed outdated line in Travis config

* fixed max-width of the ReadTheDocs theme

* added horizontal align to images
2017-10-12 21:34:23 +09:00