* [python-package] create Dataset from sampled data.
* [python-package] create Dataset from List[Sequence].
1. Use random access for data sampling
2. Support reading data from multiple input files
3. Read data in batches, so there is no need to hold all data in memory
* [python-package] example: create Dataset from multiple HDF5 files.
* fix: revert is_class implementation for seq
* fix: unwanted memory view reference for seq
* fix: seq is_class accepts sklearn matrices
* fix: requirements for example
* fix: pycode
* feat: print static code linting stage
* fix: linting: avoid shell str regex conversion
* code style: doc style
* code style: isort
* fix ci dependency: h5py on windows
* [py] remove rm files in test seq
https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623
* docs(python): init_from_sample summary
https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389
* remove dataset dump sample data debugging code.
* remove typo fix.
Create separate PR for this.
* fix typo in src/c_api.cpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* style(linting): py3 type hint for seq
* test(basic): os.path style path handling
* Revert "feat: print static code linting stage"
This reverts commit 10bd79f7f8.
* feat(python): sequence on validation set
* minor(python): comment
* minor(python): test option hint
* style(python): fix code linting
* style(python): add pydoc for ref_dataset
* doc(python): sequence
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
* revert(python): sequence class abc
* chore(python): remove rm_files
* Remove useless static_assert.
* refactor: test_basic test for sequence.
* fix lint complaint.
* remove dataset._dump_text in sequence test.
* Fix reverting typo fix.
* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Fix type hint, code and doc style.
* fix failing test_basic.
* Remove TODO about keeping constant in sync with cpp.
* Install h5py only when running python-examples.
* Fix lint complaint.
* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Doc fixes, remove unused params_str in __init_from_seqs.
* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Remove unnecessary conda install in windows ci script.
* Keep param as example in dataset_from_multi_hdf5.py
* Add _get_sample_count function to remove code duplication.
* Use batch_size parameter in generate_hdf.
* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Fix after applying suggestions.
* Fix test, check idx is instance of numbers.Integral.
* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Expose Sequence class in Python-API doc.
* Handle Sequence object not having batch_size.
* Fix isort lint complaint.
* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Update docstring to mention Sequence as data input.
* Remove get_one_line in test_basic.py
* Make Sequence an abstract class.
* Reduce number of tests for test_sequence.
* Add c_api: LGBM_SampleCount, fix potential bug in LGBM_SampleIndices.
* empty commit to trigger ci
* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t.
Also rename total_nrow to num_total_row in c_api.h for consistency.
* Doc about Sequence in docs/Python-Intro.rst.
* Fix: basic.py change LGBM_SampleIndices out_len to int32.
* Add create_valid test case with Dataset from Sequence.
* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Apply suggestions from code review
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
* Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT.
* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Willian Zhang <willian@willian.email>
Co-authored-by: Willian Z <Willian@Willian-Zhang.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
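The three goals listed for the List[Sequence] input (random access for sampling, multiple input sources, batched reads) can be illustrated with a minimal standard-library sketch. This is not the LightGBM implementation: `ListBackedSequence`, `sample_rows`, `iter_batches` and the fallback batch size of 4096 are hypothetical names and values chosen for illustration only.

```python
import numbers
import random
from typing import Iterator, List


class ListBackedSequence:
    """Hypothetical random-access data source (a stand-in, not lightgbm.Sequence)."""

    def __init__(self, rows: List[List[float]], batch_size: int = 2) -> None:
        self._rows = rows
        self.batch_size = batch_size

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, idx):
        # A single integer returns one row; a slice returns a batch of rows.
        if isinstance(idx, (numbers.Integral, slice)):
            return self._rows[idx]
        raise TypeError(f"index must be an integer or a slice, got {type(idx).__name__}")


def sample_rows(seqs: List[ListBackedSequence], sample_count: int, seed: int = 0) -> List[List[float]]:
    """Pick rows by random access across several sources without reading them in full."""
    total = sum(len(s) for s in seqs)
    rng = random.Random(seed)
    picked = sorted(rng.sample(range(total), min(sample_count, total)))
    out = []
    for global_idx in picked:
        local_idx = global_idx
        # Map the global row index to (source, local index).
        for s in seqs:
            if local_idx < len(s):
                out.append(s[local_idx])
                break
            local_idx -= len(s)
    return out


def iter_batches(seq) -> Iterator[List[List[float]]]:
    """Yield the source one batch at a time, so only one batch is held at once."""
    # Assumed fallback when the object has no batch_size attribute.
    batch_size = getattr(seq, "batch_size", None) or 4096
    for start in range(0, len(seq), batch_size):
        yield seq[start:start + batch_size]
```

With this shape, sampling only touches the selected rows, and a consumer that walks `iter_batches` never needs more than `batch_size` rows in memory at a time.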
* centralize Python-package logging in one place
* continue
* fix test name
* removed unused import
* enhance test
* fix lint
* hotfix test
* workaround for GPU test
* remove custom logger from Dask-package
* replace one log func with flags by multiple funcs
* [python] add return_cvbooster flag to cv function and rename _CVBooster to make public (#283,#2105)
* [python] Reduce expected metric of unit testing
* [docs] add the CVBooster to the documentation
* [python] reflect the review comments
- Add some clarifications to the documentation
- Rename CVBooster.append to make private
- Decrease iteration rounds of testing to save CI time
- Use CVBooster as root member of lgb
* [python] add more checks in testing for cv
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* [python] add docstring for instance attributes of CVBooster
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* [python] fix docstring
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Revert "specify the last supported version of scikit-learn (#2637)"
This reverts commit d100277649.
* ban scikit-learn 0.22.0 and skip broken test
* fix updated test
* fix lint test
* Revert "fix lint test"
This reverts commit 8b4db0805f.
* 🎨 `sphinx.ext.autosummary` for generating Python-API summaries
Add `docs/.gitignore` to not track autosummary stubs
Add `sphinx.ext.autosummary` in `docs/conf.py`
Add 'members' and 'inherited-members' as default parameters
Add 'autosummary = True' for setting output with `:toctree:`
Add `.. autosummary::` tags to replace `.. autoclass::`
Previously the `Python-API.rst` dumped all of the Python API onto
a single page.
This replaces the Python-API documentation with an index listing
all modules, and paginates all functions and classes onto
separate pages.
* ✏️ Corrections following feedback
Drop `docs/.gitignore` to use the general `.gitignore`
Add `show-inheritance` to `autodoc_default_flags` in `docs/conf.py`
Fix `both` to `class` in `autoclass_content` in `docs/conf.py`
* ✏️ Replacing deprecated Sphinx parameter
Fix deprecated `autodoc_default_flags` to `autodoc_default_options`
* ✏️ Adding `autodoc_default_flags` to support early Sphinx versions
Add `autodoc_default_flags` with parameters from
`autodoc_default_options`
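The settings named in the commits above might combine into a `docs/conf.py` fragment like the one below. This is a sketch under assumptions, not the file as committed: the commit text mentions `autosummary = True`, while the sketch uses Sphinx's `autosummary_generate` option as a guess at what was meant, and listing `sphinx.ext.autodoc` in `extensions` is also assumed.

```python
# docs/conf.py (fragment, illustrative only)

extensions = [
    'sphinx.ext.autodoc',      # assumed to be enabled already
    'sphinx.ext.autosummary',  # added to generate per-object summary pages
]

# Assumption: generate stub pages for entries listed under `.. autosummary::`
# directives that use the `:toctree:` option.
autosummary_generate = True

# Options for newer Sphinx versions.
autodoc_default_options = {
    'members': True,
    'inherited-members': True,
    'show-inheritance': True,
}

# Same parameters in the older, deprecated form for early Sphinx versions.
autodoc_default_flags = list(autodoc_default_options)

# Use only the class docstring (`class`, changed from `both`).
autoclass_content = 'class'
```

Deriving `autodoc_default_flags` from the keys of `autodoc_default_options` keeps the two settings from drifting apart.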
* added plot_split_value_histogram function
* updated init module
* added plot split value histogram example
* added plot_split_value_histogram to notebook
* added test
* fixed pylint
* updated API docs
* fixed grammar
* set y ticks to int value in a more efficient way
* bring consistency and clarity to early_stopping_rounds desc, metric desc and implementation
* hotfix
* hotfix
* used NDCG as default metric for lambdarank task
* fixed missed methods at ReadTheDocs and changed default eval_metric
* left only unique metrics
* fixed comment
* fixed Python-API references
* moved Features section to ReadTheDocs
* fixed index of ReadTheDocs
* moved Experiments section to ReadTheDocs
* fixed capital letter
* fixed citing
* moved Parallel Learning section to ReadTheDocs
* fixed markdown
* fixed Python-API
* fixed link to Quick-Start
* fixed gpu docker README
* moved Installation Guide from wiki to ReadTheDocs
* removed references to wiki
* fixed capital letters in headings
* hotfixes
* fixed non-Unicode symbols and reference to Python API
* fixed citing references
* fixed links in .md files
* fixed links in .rst files
* store images locally in the repo
* fixed missed word
* fixed indent in Experiments.rst
* fixed 'Duplicate implicit target name' message, which is
resolved by adding anchors
* less verbose
* prevented mailto: ref creation
* fixed indents
* fixed 404
* fixed 403
* fixed 301
* fixed fake anchors
* fixed file extensions
* fixed Sphinx warnings
* added StrikerRUS profile link to FAQ
* added henry0312 profile link to FAQ