* [Python-package] FIX fix the metrics of classify module
* feat: support custom metrics in params
* feat: support objective in params
* test: custom objective and metric
* fix: imports are incorrectly sorted
* feat: convert eval metrics str and set to list
* feat: convert single callable eval_metric to list
* test: single callable objective in params
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
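The conversions described in the commits above (a string, a set, or a single callable `eval_metric` turned into a list) can be sketched as follows. This is only an illustration: `normalize_eval_metric` is a hypothetical helper name, not a function from the LightGBM code base, and sorting set members is an assumption made here to get deterministic output.

```python
from typing import Any, List


def normalize_eval_metric(eval_metric: Any) -> List[Any]:
    """Return eval_metric as a list (hypothetical helper, not LightGBM's own code)."""
    if eval_metric is None:
        return []
    # A single metric name or a single callable becomes a one-element list.
    if isinstance(eval_metric, str) or callable(eval_metric):
        return [eval_metric]
    # Sets are unordered, so sort by string form to get a stable order (assumption).
    if isinstance(eval_metric, (set, frozenset)):
        return sorted(eval_metric, key=str)
    # Lists, tuples and other iterables are copied into a new list.
    return list(eval_metric)
```

Downstream code can then iterate over the result without checking which of the accepted input shapes the caller used.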
* feat: callable fobj in basic cv function
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* test: cv support objective callable
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* fix: assert in cv_res
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* docs: objective callable in params
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* recover test_boost_from_average_with_single_leaf_trees
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* linters fail
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* remove metrics helper functions
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* feat: choose objective through _choose_param_values
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* test: test objective through _choose_param_values
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* test: test objective is callable in train
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* test: parametrize choose_param_value with objective aliases
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* test: cv booster metric is none
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* fix: if string and callable choose callable
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* test train uses custom objective metrics
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* test: cv uses custom objective metrics
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* refactor: remove fobj parameter in train and cv
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* refactor: objective through params in sklearn API
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* custom objective function in advanced_example
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* fix whitespace lint
* objective is none not a particular case for predict method
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* replace scipy.expit with custom implementation
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* test: set num_boost_round value to 20
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* fix: custom objective default_value is none
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* refactor: remove self._fobj
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* custom_objective default value is None
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* refactor: variables name reference dummy_obj
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* linter errors
* fix: process objective parameter when calling predict
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
* linter errors
* fix: objective is None during predict call
Signed-off-by: Miguel Trejo <armando.trejo.marrufo@gmail.com>
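To illustrate what the custom-objective commits above deal with, here is a minimal sketch of a logistic-loss objective together with a hand-written sigmoid of the kind that could stand in for `scipy.special.expit`. All names (`sigmoid`, `logistic_objective`) are hypothetical. Plain Python lists and an explicit `labels` argument are used to keep the example dependency-free, whereas a real LightGBM objective callable works on NumPy arrays and reads the labels from the training Dataset.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (stand-in for scipy.special.expit)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)


def logistic_objective(
    raw_scores: Sequence[float], labels: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Return per-sample gradient and Hessian of the binary log loss.

    Hypothetical sketch: gradient is p - y and Hessian is p * (1 - p),
    where p = sigmoid(raw_score).
    """
    grad: List[float] = []
    hess: List[float] = []
    for score, label in zip(raw_scores, labels):
        p = sigmoid(score)
        grad.append(p - label)
        hess.append(p * (1.0 - p))
    return grad, hess
```

Because a custom objective produces raw scores, predictions from a model trained this way would need the same `sigmoid` applied to turn them into probabilities.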
* [python-package] create Dataset from sampled data.
* [python-package] create Dataset from List[Sequence].
1. Use random access for data sampling
2. Support read data from multiple input files
3. Read data in batch so no need to hold all data in memory
* [python-package] example: create Dataset from multiple HDF5 file.
* fix: revert is_class implementation for seq
* fix: unwanted memory view reference for seq
* fix: seq is_class accepts sklearn matrices
* fix: requirements for example
* fix: pycode
* feat: print static code linting stage
* fix: linting: avoid shell str regex conversion
* code style: doc style
* code style: isort
* fix ci dependency: h5py on windows
* [py] remove rm files in test seq
https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623
* docs(python): init_from_sample summary
https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389
* remove dataset dump sample data debugging code.
* remove typo fix.
Create separate PR for this.
* fix typo in src/c_api.cpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* style(linting): py3 type hint for seq
* test(basic): os.path style path handling
* Revert "feat: print static code linting stage"
This reverts commit 10bd79f7f8.
* feat(python): sequence on validation set
* minor(python): comment
* minor(python): test option hint
* style(python): fix code linting
* style(python): add pydoc for ref_dataset
* doc(python): sequence
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
* revert(python): sequence class abc
* chore(python): remove rm_files
* Remove useless static_assert.
* refactor: test_basic test for sequence.
* fix lint complaint.
* remove dataset._dump_text in sequence test.
* Fix reverting typo fix.
* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Fix type hint, code and doc style.
* fix failing test_basic.
* Remove TODO about keep constant in sync with cpp.
* Install h5py only when running python-examples.
* Fix lint complaint.
* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Doc fixes, remove unused params_str in __init_from_seqs.
* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Remove unnecessary conda install in windows ci script.
* Keep param as example in dataset_from_multi_hdf5.py
* Add _get_sample_count function to remove code duplication.
* Use batch_size parameter in generate_hdf.
* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Fix after applying suggestions.
* Fix test, check idx is instance of numbers.Integral.
* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Expose Sequence class in Python-API doc.
* Handle Sequence object not having batch_size.
* Fix isort lint complaint.
* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Update docstring to mention Sequence as data input.
* Remove get_one_line in test_basic.py
* Make Sequence an abstract class.
* Reduce number of tests for test_sequence.
* Add c_api: LGBM_SampleCount, fix potential bug in LGBMSampleIndices.
* empty commit to trigger ci
* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t.
Also rename total_nrow to num_total_row in c_api.h for consistency.
* Doc about Sequence in docs/Python-Intro.rst.
* Fix: basic.py change LGBM_SampleIndices out_len to int32.
* Add create_valid test case with Dataset from Sequence.
* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Apply suggestions from code review
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
* Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT.
* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Willian Zhang <willian@willian.email>
Co-authored-by: Willian Z <Willian@Willian-Zhang.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
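The three properties listed for the Sequence-based Dataset (random access for sampling, multiple input sources, batched reads) can be sketched without LightGBM at all. `ListSequence`, `sample_rows`, `iter_batches` and the fallback batch size of 4096 are hypothetical names and values chosen for this illustration; the real `Sequence` class in the Python package may differ in its details.

```python
import numbers
from typing import Iterable, Iterator, List


class ListSequence:
    """Hypothetical in-memory data source with random access and a batch size."""

    def __init__(self, rows: List[List[float]], batch_size: int = 2) -> None:
        self._rows = rows
        self.batch_size = batch_size

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, idx):
        # An integer index returns one row, a slice returns a batch of rows.
        if isinstance(idx, numbers.Integral):
            return self._rows[idx]
        if isinstance(idx, slice):
            return self._rows[idx]
        raise TypeError(f"index must be an integer or slice, got {type(idx).__name__}")


def sample_rows(seq, indices: Iterable[int]) -> List[List[float]]:
    """Fetch only the sampled rows through random access."""
    return [seq[i] for i in indices]


def iter_batches(seqs: Iterable) -> Iterator[List[List[float]]]:
    """Yield rows batch by batch across several sources, never holding all data at once."""
    for seq in seqs:
        # Fall back to an assumed default when the object has no batch_size.
        batch_size = getattr(seq, "batch_size", None) or 4096
        for start in range(0, len(seq), batch_size):
            yield seq[start:start + batch_size]
```

A file-backed source, such as one wrapping an HDF5 dataset, would implement the same `__getitem__` and `__len__` methods so that only the requested rows are read from disk.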
* Improved the syntax of the fstrings
* Improved the strings to fstrings
* Reverted back the white space.
* Update examples/python-guide/advanced_example.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Correct spelling
Most changes were in comments, and there were a few changes to literals for log output.
There were no changes to variable names, function names, IDs, or functionality.
* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Correct spelling
Most are code comments, but one case is a literal in a logging message.
There are a few grammar fixes too.
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Add Eigen library.
* Working for simple test.
* Apply changes to config params.
* Handle nan data.
* Update docs.
* Add test.
* Only load raw data if boosting=gbdt_linear
* Remove unneeded code.
* Minor updates.
* Update to work with sk-learn interface.
* Update to work with chunked datasets.
* Throw error if we try to create a Booster with an already-constructed dataset having incompatible parameters.
* Save raw data in binary dataset file.
* Update docs and fix parameter checking.
* Fix dataset loading.
* Add test for regularization.
* Fix bugs when saving and loading tree.
* Add test for load/save linear model.
* Remove unneeded code.
* Fix case where not enough leaf data for linear model.
* Simplify code.
* Speed up code.
* Speed up code.
* Simplify code.
* Speed up code.
* Fix bugs.
* Working version.
* Store feature data column-wise (not fully working yet).
* Fix bugs.
* Speed up.
* Speed up.
* Remove unneeded code.
* Small speedup.
* Speed up.
* Minor updates.
* Remove unneeded code.
* Fix bug.
* Fix bug.
* Speed up.
* Speed up.
* Simplify code.
* Remove unneeded code.
* Fix bug, add more tests.
* Fix bug and add test.
* Only store numerical features
* Fix bug and speed up using templates.
* Speed up prediction.
* Fix bug with regularisation
* Visual studio files.
* Working version
* Only check nans if necessary
* Store coeff matrix as an array.
* Align cache lines
* Align cache lines
* Preallocate coefficient calculation matrices
* Small speedups
* Small speedup
* Reverse cache alignment changes
* Change to dynamic schedule
* Update docs.
* Refactor so that linear tree learner is not a separate class.
* Add refit capability.
* Speed up
* Small speedups.
* Speed up add prediction to score.
* Fix bug
* Fix bug and speed up.
* Speed up dataload.
* Speed up dataload
* Use vectors instead of pointers
* Fix bug
* Add OMP exception handling.
* Change return type of LGBM_BoosterGetLinear to bool
* Change return type of LGBM_BoosterGetLinear back to int, only parameter type needed to change
* Remove unused internal_parent_ property of tree
* Remove unused parameter to CreateTreeLearner
* Remove reference to LinearTreeLearner
* Minor style issues
* Remove unneeded check
* Reverse temporary testing change
* Fix Visual Studio project files
* Restore LightGBM.vcxproj.filters
* Speed up
* Speed up
* Simplify code
* Update docs
* Simplify code
* Initialise storage space for max num threads
* Move Eigen to include directory and delete unused files
* Remove old files.
* Fix so it compiles with mingw
* Fix gpu tree learner
* Change AddPredictionToScore back to const
* Fix python lint error
* Fix C++ lint errors
* Change eigen to a submodule
* Update comment
* Add the eigen folder
* Try to fix build issues with eigen
* Remove eigen files
* Add eigen as submodule
* Fix include paths
* Exclude eigen files from Python linter
* Ignore eigen folders for pydocstyle
* Fix C++ linting errors
* Fix docs
* Fix docs
* Exclude eigen directories from doxygen
* Update manifest to include eigen
* Update build_r to include eigen files
* Fix compiler warnings
* Store raw feature data as float
* Use float for calculating linear coefficients
* Remove eigen directory from GLOB
* Don't compile linear model code when building R package
* Fix doxygen issue
* Fix lint issue
* Fix lint issue
* Remove unneeded code
* Restore deleted lines
* Restore deleted lines
* Change return type of has_raw to bool
* Update docs
* Rename some variables and functions for readability
* Make tree_learner parameter const in AddScore
* Fix style issues
* Pass vectors as const reference when setting tree properties
* Make temporary storage of serial_tree_learner mutable so we can make the object's methods const
* Remove get_raw_size, use num_numeric_features instead
* Fix typo
* Make contains_nan_ and any_nan_ properties immutable again
* Remove data_has_nan_ property of tree
* Remove temporary test code
* Make linear_tree a dataset param
* Fix lint error
* Make LinearTreeLearner a separate class
* Fix lint errors
* Fix lint error
* Add linear_tree_learner.o
* Simulate omp_get_max_threads if openmp is not available
* Update PushOneData to also store raw data.
* Cast size to int
* Fix bug in ReshapeRaw
* Speed up code with multithreading
* Use OMP_NUM_THREADS
* Speed up with multithreading
* Update to use ArrayToString
* Fix tests
* Fix test
* Fix bug introduced in merge
* Minor updates
* Update docs
* Revert "specify the last supported version of scikit-learn (#2637)"
This reverts commit d100277649.
* ban scikit-learn 0.22.0 and skip broken test
* fix updated test
* fix lint test
* Revert "fix lint test"
This reverts commit 8b4db0805f.
* Implementation of XE_NDCG loss function for ranking.
* Add citation
* Check in example usage for xe_ndcg loss.
* Seed the generator when a seed is provided in the config. Add unit-tests for xe_ndcg
* Update documentation
* Fix indentation
* Address issues raised by reviewers.
* Clean up include statements.
* Fix issues raised by reviewers.
* Regenerate parameters.rst
* Add a note to explain that reproducing xe_ndcg results requires num_threads to be one.
* Introduce objective_seed and use that in rank_xendcg instead of directly using seed
* Change default value of objective_seed
* Fix bug where small values of max_bin cause crash.
* Revert "Fix bug where small values of max_bin cause crash."
This reverts commit fe5c8e2547.
* Add auc-mu multiclass metric.
* Fix bug where scores are equal.
* Merge.
* Change name to auc_mu everywhere (instead of auc-mu).
* Fix comparison between signed and unsigned int.
* Change name to AUC-mu in docs and output messages.
* Improve test.
* Use prefix increment.
* Update R package.
* Fix style issues.
* Tidy up test code.
* Read all lines first then process.
* Allow passing AUC-mu weights directly as a list in parameters.
* Remove unused code, improve example and docs.
- Add reference to documentation on the query data format.
- Refer readers to the official install instructions.
- Change command to use relative path to the `lightgbm` binary built at
the project's root when following build instructions.