* add quantized training (first stage)
* add histogram construction functions for integer gradients
* add stochastic rounding
* update docs
* fix compilation errors by adding template instantiations
* update files for compilation
* fix compilation of gpu version
* initialize gradient discretizer before share states
* add a test case for quantized training
* add quantized training for data distributed training
* Delete origin.pred
* Delete ifelse.pred
* Delete LightGBM_model.txt
* remove useless changes
* fix lint error
* remove debug loggings
* fix mismatch of vector and allocator types
* remove changes in main.cpp
* fix bugs with uninitialized gradient discretizer
* initialize ordered gradients in gradient discretizer
* disable quantized training with gpu and cuda
fix msvc compilation errors and warnings
* fix bug in data parallel tree learner
* make quantized training test deterministic
* make quantized training in test case more accurate
* refactor test_quantized_training
* fix leaf splits initialization with quantized training
* check distributed quantized training result
* add parameter data_sample_strategy
* abstract GOSS as a sample strategy (GOSS1), together with original GOSS (Normal Bagging has not been abstracted, so do NOT use it now)
* abstract Bagging as a subclass (BAGGING), but original Bagging members in GBDT are still kept
* fix some variables
* remove GOSS(as boost) and Bagging logic in GBDT
* rename GOSS1 to GOSS(as sample strategy)
* add warning about using GOSS as boosting_type
* fix a small bug involving a stray `;`
* remove CHECK when "gradients != nullptr"
* rename DataSampleStrategy to avoid confusion
* remove and add some comments, following convention
* fix bug about GBDT::ResetConfig (ObjectiveFunction inconsistency bet…
* add std::ignore to avoid compiler warnings (and potential failures)
* update Makevars and vcxproj
* handle constant hessian
move resize of gradient vectors out of sample strategy
* mark override for IsHessianChange
* fix lint errors
* rerun parameter_generator.py
* update config_auto.cpp
* delete redundant blank line
* update num_data_ when train_data_ is updated
set gradients and hessians when GOSS
* check bagging_freq is not zero
* reset config_ value
merge ResetBaggingConfig and ResetGOSS
* remove useless check
* add tests in test_engine.py
* remove whitespace in blank line
* remove arguments verbose_eval and evals_result
* Update tests/python_package_test/test_engine.py
reduce num_boost_round
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Update tests/python_package_test/test_engine.py
reduce num_boost_round
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Update tests/python_package_test/test_engine.py
reduce num_boost_round
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Update tests/python_package_test/test_engine.py
reduce num_boost_round
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Update tests/python_package_test/test_engine.py
reduce num_boost_round
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Update tests/python_package_test/test_engine.py
reduce num_boost_round
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Update src/boosting/sample_strategy.cpp
modify warning about setting goss as `boosting_type`
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Update tests/python_package_test/test_engine.py
replace load_boston() with make_regression()
remove value checks of mean_squared_error in test_sample_strategy_with_boosting()
* Update tests/python_package_test/test_engine.py
add value checks of mean_squared_error in test_sample_strategy_with_boosting()
* Modify warning about using goss as boosting type
* Update tests/python_package_test/test_engine.py
add random_state=42 for make_regression()
reduce the threshold of mean_squared_error
* Update src/boosting/sample_strategy.cpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* remove goss from boosting types in documentation
* Update src/boosting/bagging.hpp
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Update src/boosting/bagging.hpp
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Update src/boosting/goss.hpp
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Update src/boosting/goss.hpp
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* rename GOSS with GOSSStrategy
* update doc
* address comments
* fix table in doc
* Update include/LightGBM/config.h
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* update documentation
* update test case
* revert useless change in test_engine.py
* add tests for evaluation results in test_sample_strategy_with_boosting
* include <string>
* change to assert_allclose in test_goss_boosting_and_strategy_equivalent
* more tolerance in result checking, due to minor difference in results of gpu versions
* change == to np.testing.assert_allclose
* fix test case
* set gpu_use_dp to true
* change --report to --report-level for rstcheck
* use gpu_use_dp=true in test_goss_boosting_and_strategy_equivalent
* revert unexpected changes of non-ascii characters
* revert unexpected changes of non-ascii characters
* remove useless changes
* allocate gradients_pointer_ and hessians_pointer when necessary
* add spaces
* remove redundant virtual
* include <LightGBM/utils/log.h> for USE_CUDA
* check for in test_goss_boosting_and_strategy_equivalent
* check for identity in test_sample_strategy_with_boosting
* remove cuda option in test_sample_strategy_with_boosting
* Update tests/python_package_test/test_engine.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Update tests/python_package_test/test_engine.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* ResetGradientBuffers after ResetSampleConfig
* ResetGradientBuffers after ResetSampleConfig
* ResetGradientBuffers after bagging
* remove useless code
* check objective_function_ instead of gradients
* enable rf with goss
simplify params in test cases
* remove useless changes
* allow rf with feature subsampling alone
* change position of ResetGradientBuffers
* check for dask
* add parameter types for data_sample_strategy
Co-authored-by: Guangda Liu <v-guangdaliu@microsoft.com>
Co-authored-by: Yu Shi <shiyu_k1994@qq.com>
Co-authored-by: GuangdaLiu <90019144+GuangdaLiu@users.noreply.github.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
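The entries above move GOSS from a `boosting_type` to a `data_sample_strategy`. For reference, here is a minimal plain-Python sketch of the GOSS sampling step itself. It keeps the `top_rate` fraction of rows with the largest |gradient|, randomly samples `other_rate` of the remaining rows, and up-weights the sampled rows by `(1 - top_rate) / other_rate`. The parameter names mirror LightGBM's `top_rate` and `other_rate`. `goss_sample` is a hypothetical helper for illustration and is not LightGBM's C++ `GOSSStrategy`.

```python
import random
from typing import List, Tuple


def goss_sample(gradients: List[float], top_rate: float, other_rate: float,
                rng: random.Random) -> Tuple[List[int], List[float]]:
    """Hypothetical GOSS sketch: return selected row indices and weight multipliers."""
    n = len(gradients)
    top_k = int(top_rate * n)
    other_k = int(other_rate * n)
    # Rank rows by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top = order[:top_k]
    rest = order[top_k:]
    # Uniformly sample a subset of the small-gradient rows.
    sampled = rng.sample(rest, min(other_k, len(rest)))
    # Amplify the sampled rows so the gradient sum stays roughly unbiased;
    # the same multiplier would be applied to both gradients and hessians.
    amplify = (1.0 - top_rate) / other_rate
    indices = top + sampled
    weights = [1.0] * len(top) + [amplify] * len(sampled)
    return indices, weights


grads = [0.9, -0.05, 0.01, -0.8, 0.02, 0.03, -0.04, 0.5, 0.06, -0.07]
idx, w = goss_sample(grads, top_rate=0.2, other_rate=0.1, rng=random.Random(0))
print(idx, w)
```

With 10 rows, `top_rate=0.2` keeps the two largest-|g| rows (indices 0 and 3) at weight 1, and `other_rate=0.1` adds one random row from the rest with weight about 8.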
* Run OpenCL tests against POCL instead of the AMD App SDK
* Update .ci/setup.sh
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Run Linux gpu source on default Python version
* [docs] Update GPU Targets Table
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* new cuda framework
* add histogram construction kernel
* before removing multi-gpu
* new cuda framework
* tree learner cuda kernels
* single tree framework ready
* single tree training framework
* remove comments
* boosting with cuda
* optimize for best split find
* data split
* move boosting into cuda
* parallel synchronize best split point
* merge split data kernels
* before code refactor
* use tasks instead of features as units for split finding
* refactor cuda best split finder
* fix configuration error with small leaves in data split
* skip histogram construction of too small leaf
* skip split finding of invalid leaves
stop when no leaf to split
* support row wise with CUDA
* copy data for split by column
* copy data from host to CPU by column for data partition
* add synchronize best splits for one leaf from multiple blocks
* partition dense row data
* fix sync best split from task blocks
* add support for sparse row wise for CUDA
* remove useless code
* add l2 regression objective
* sparse multi value bin enabled for CUDA
* fix cuda ranking objective
* support for number of items <= 2048 per query
* speedup histogram construction by interleaving global memory access
* split optimization
* add cuda tree predictor
* remove comma
* refactor objective and score updater
* before use struct
* use structure for split information
* use structure for leaf splits
* return CUDASplitInfo directly after finding best split
* split with CUDATree directly
* use cuda row data in cuda histogram constructor
* clean src/treelearner/cuda
* gather shared cuda device functions
* put shared CUDA functions into header file
* change smaller leaf from <= back to < for consistent result with CPU
* add tree predictor
* remove useless cuda_tree_predictor
* predict on CUDA with pipeline
* add global sort algorithms
* add global argsort for queries with many items in ranking tasks
* remove limitation of maximum number of items per query in ranking
* add cuda metrics
* fix CUDA AUC
* remove debug code
* add regression metrics
* remove useless file
* don't use mask in shuffle reduce
* add more regression objectives
* fix cuda mape loss
add cuda xentropy loss
* use template for different versions of BitonicArgSortDevice
* add multiclass metrics
* add ndcg metric
* fix cross entropy objectives and metrics
* fix cross entropy and ndcg metrics
* add support for customized objective in CUDA
* complete multiclass ova for CUDA
* separate cuda tree learner
* use shuffle based prefix sum
* clean up cuda_algorithms.hpp
* add copy subset on CUDA
* add bagging for CUDA
* clean up code
* copy gradients from host to device
* support bagging without using subset
* add support of bagging with subset for CUDAColumnData
* add support of bagging with subset for dense CUDARowData
* refactor copy sparse subrow
* use copy subset for column subset
* add reset train data and reset config for CUDA tree learner
add destructors for cuda tree learner
* add USE_CUDA ifdef to cuda tree learner files
* check that dataset doesn't contain CUDA tree learner
* remove printf debug information
* use full new cuda tree learner only when using single GPU
* disable all CUDA code when using CPU version
* recover main.cpp
* add cpp files for multi value bins
* update LightGBM.vcxproj
* update LightGBM.vcxproj
fix lint errors
* fix lint errors
* fix lint errors
* update Makevars
fix lint errors
* fix the case with 0 feature and 0 bin
fix split finding for invalid leaves
create cuda column data when loaded from bin file
* fix lint errors
hide GetRowWiseData when cuda is not used
* recover default device type to cpu
* fix na_as_missing case
fix cuda feature meta information
* fix UpdateDataIndexToLeafIndexKernel
* create CUDA trees when needed in CUDADataPartition::UpdateTrainScore
* add refit by tree for cuda tree learner
* fix test_refit in test_engine.py
* create set of large bin partitions in CUDARowData
* add histogram construction for columns with a large number of bins
* add find best split for categorical features on CUDA
* add bitvectors for categorical split
* cuda data partition split for categorical features
* fix split tree with categorical feature
* fix categorical feature splits
* refactor cuda_data_partition.cu with multi-level templates
* refactor CUDABestSplitFinder by grouping task information into struct
* pre-allocate space for vector split_find_tasks_ in CUDABestSplitFinder
* fix misuse of reference
* remove useless changes
* add support for path smoothing
* virtual destructor for LightGBM::Tree
* fix overlapped cat threshold in best split infos
* reset histogram pointers in data partition and split finder in ResetConfig
* comment useless parameter
* fix reverse case when na is missing and default bin is zero
* fix mfb_is_na and mfb_is_zero and is_single_feature_column
* remove debug log
* fix cat_l2 when one-hot
fix gradient copy when data subset is used
* switch shared histogram size according to CUDA version
* gpu_use_dp=true when cuda test
* revert modification in config.h
* fix setting of gpu_use_dp=true in .ci/test.sh
* fix linter errors
* fix linter error
remove useless change
* recover main.cpp
* separate cuda_exp and cuda
* fix ci bash scripts
add description for cuda_exp
* add USE_CUDA_EXP flag
* switch off USE_CUDA_EXP
* revert changes in python-packages
* more careful separation for USE_CUDA_EXP
* fix CUDARowData::DivideCUDAFeatureGroups
fix set fields for cuda metadata
* revert config.h
* fix test settings for cuda experimental version
* skip some tests due to unsupported features or differences in implementation details for CUDA Experimental version
* fix lint issue by adding a blank line
* fix lint errors by resorting imports
* fix lint errors by resorting imports
* fix lint errors by resorting imports
* merge cuda.yml and cuda_exp.yml
* update python version in cuda.yml
* remove cuda_exp.yml
* remove unrelated changes
* fix compilation warnings
fix cuda exp ci task name
* recover task
* use multi-level template in histogram construction
check split only in debug mode
* ignore NVCC related lines in parameter_generator.py
* update job name for CUDA tests
* apply review suggestions
* Update .github/workflows/cuda.yml
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Update .github/workflows/cuda.yml
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* update header
* remove useless TODOs
* remove [TODO(shiyu1994): constrain the split with min_data_in_group] and record in #5062
* #include <LightGBM/utils/log.h> for USE_CUDA_EXP only
* fix include order
* fix include order
* remove extra space
* address review comments
* add warning when cuda_exp is used together with deterministic
* add comment about gpu_use_dp in .ci/test.sh
* revert changing order of included headers
Co-authored-by: Yu Shi <shiyu1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
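The categorical-split entries above mention bitvectors. The following minimal sketch shows the idea under an assumption: a split's category set is packed into 32-bit words, so a membership test is a word lookup plus a bit shift. This is similar in spirit to how LightGBM stores categorical thresholds, but the exact layout is not confirmed here. `build_bitset` and `in_bitset` are hypothetical names.

```python
from typing import Iterable, List


def build_bitset(categories: Iterable[int]) -> List[int]:
    """Hypothetical helper: pack non-negative category ids into 32-bit words."""
    cats = list(categories)
    if not cats:
        return []
    words = [0] * (max(cats) // 32 + 1)
    for c in cats:
        words[c // 32] |= 1 << (c % 32)
    return words


def in_bitset(words: List[int], category: int) -> bool:
    """True if `category` is set; ids outside the stored words count as absent."""
    if category < 0:
        return False
    word = category // 32
    if word >= len(words):
        return False
    return ((words[word] >> (category % 32)) & 1) == 1


split_set = build_bitset([1, 5, 40])
# In this sketch, a row goes to the left child when its category is in the set.
print([in_bitset(split_set, c) for c in (1, 2, 40, 1000)])
```

Treating out-of-range ids as "not in the set" keeps unseen categories from indexing past the array, which matters when the bitset lives in fixed-size device memory.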
* docs: weight parameter non-negative
* docs: weights non-negative only for train data
* docs: weights should be non-negative for validation data
* typo in html render
* docs: brief weights non-negative description
* clarify that categoricals will be converted to ints and not that they should be ints in the input data
* update remaining sections
* update config.h
* add suggestions
* [python] add type hints in docs/conf.py
* more specific hint for sphinx app
* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* [docs] [R-package] use CRAN-style builds when building pkgdown site
* install with --with-keep.source
* empty commit
* set new_process = FALSE to get a better traceback
* copy pkgdown config
* [python-package] create Dataset from sampled data.
* [python-package] create Dataset from List[Sequence].
1. Use random access for data sampling
2. Support reading data from multiple input files
3. Read data in batch so no need to hold all data in memory
* [python-package] example: create Dataset from multiple HDF5 files.
* fix: revert is_class implementation for seq
* fix: unwanted memory view reference for seq
* fix: seq is_class accepts sklearn matrices
* fix: requirements for example
* fix: pycode
* feat: print static code linting stage
* fix: linting: avoid shell str regex conversion
* code style: doc style
* code style: isort
* fix ci dependency: h5py on windows
* [py] remove rm files in test seq
https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623
* docs(python): init_from_sample summary
https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389
* remove dataset dump sample data debugging code.
* remove typo fix.
Create separate PR for this.
* fix typo in src/c_api.cpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* style(linting): py3 type hint for seq
* test(basic): os.path style path handling
* Revert "feat: print static code linting stage"
This reverts commit 10bd79f7f8.
* feat(python): sequence on validation set
* minor(python): comment
* minor(python): test option hint
* style(python): fix code linting
* style(python): add pydoc for ref_dataset
* doc(python): sequence
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
* revert(python): sequence class abc
* chore(python): remove rm_files
* Remove useless static_assert.
* refactor: test_basic test for sequence.
* fix lint complaint.
* remove dataset._dump_text in sequence test.
* Fix reverting typo fix.
* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Fix type hint, code and doc style.
* fix failing test_basic.
* Remove TODO about keep constant in sync with cpp.
* Install h5py only when running python-examples.
* Fix lint complaint.
* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Doc fixes, remove unused params_str in __init_from_seqs.
* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Remove unnecessary conda install in windows ci script.
* Keep param as example in dataset_from_multi_hdf5.py
* Add _get_sample_count function to remove code duplication.
* Use batch_size parameter in generate_hdf.
* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Fix after applying suggestions.
* Fix test, check idx is instance of numbers.Integral.
* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Expose Sequence class in Python-API doc.
* Handle Sequence object not having batch_size.
* Fix isort lint complaint.
* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Update docstring to mention Sequence as data input.
* Remove get_one_line in test_basic.py
* Make Sequence an abstract class.
* Reduce number of tests for test_sequence.
* Add c_api: LGBM_SampleCount, fix potential bug in LGBMSampleIndices.
* empty commit to trigger ci
* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t.
Also rename total_nrow to num_total_row in c_api.h for consistency.
* Doc about Sequence in docs/Python-Intro.rst.
* Fix: basic.py change LGBM_SampleIndices out_len to int32.
* Add create_valid test case with Dataset from Sequence.
* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Apply suggestions from code review
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
* Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT.
* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Willian Zhang <willian@willian.email>
Co-authored-by: Willian Z <Willian@Willian-Zhang.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
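The Sequence entries describe three mechanisms: random access for sampling, multiple input files, and batched reads. The following standalone sketch shows that access pattern without importing lightgbm. It assumes only the interface the entries describe: `__len__`, `__getitem__` (int or slice), and an optional `batch_size`, with 4096 assumed as the fallback. `InMemorySeq`, `sample_rows` and `iter_batches` are hypothetical names; a real implementation would subclass `lightgbm.Sequence` and read from something like an HDF5 file.

```python
import random
from typing import Iterator, List


class InMemorySeq:
    """Hypothetical stand-in for one Sequence (e.g. one HDF5 file)."""
    batch_size = 4  # rows read per batch

    def __init__(self, rows: List[List[float]]):
        self.rows = rows

    def __getitem__(self, idx):
        # Supports both a single row index and a slice of rows.
        return self.rows[idx]

    def __len__(self) -> int:
        return len(self.rows)


def sample_rows(seqs, sample_count: int, seed: int = 0) -> List[List[float]]:
    """Randomly sample rows across several sequences via random access."""
    total = sum(len(s) for s in seqs)
    rng = random.Random(seed)
    picked = sorted(rng.sample(range(total), min(sample_count, total)))
    out = []
    for global_idx in picked:
        local = global_idx
        # Map the global row index to (sequence, local index).
        for s in seqs:
            if local < len(s):
                out.append(s[local])
                break
            local -= len(s)
    return out


def iter_batches(seqs) -> Iterator[list]:
    """Yield consecutive batches so only one batch is held in memory at a time."""
    for s in seqs:
        # Objects without batch_size fall back to an assumed default.
        step = getattr(s, "batch_size", 4096)
        for start in range(0, len(s), step):
            yield s[start:start + step]


seqs = [InMemorySeq([[float(i)] for i in range(10)]),
        InMemorySeq([[float(i)] for i in range(10, 16)])]
print([len(b) for b in iter_batches(seqs)])
```

Sampling by sorted global index keeps reads roughly sequential within each file, which tends to be friendlier to on-disk formats than fully random order.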
* New build option: USE_PRECISE_TEXT_PARSER.
Use fast_double_parser for text file parsing. For each number, fall back
to strtod in case of parse failure.
* Add benchmark for CSVParser with Atof and AtofPrecise.
* Fix lint complaint.
* Fix typo in open result error message.
* Revert "Fix lint complaint."
This reverts commit 92ab0b6bce9f17d7be9eaeb20f19d4a0a36f0387.
* Revert "Add benchmark for CSVParser with Atof and AtofPrecise."
This reverts commit 4f8639abd06c679d4382eb715a1793afd94df3d2.
* Use AtofPrecise in Common::__StringToTHelper.
* [option] precise_float_parser: precise float number parsing for text input.
* Remove USE_PRECISE_TEXT_PARSER compile option.
* test: add test for Common::AtofPrecise.
* test: remove ChunkedArrayTest with 0 length.
This triggers Log::Fatal which aborts the test program.
* fix lint, add copyright.
* Revert "test: remove ChunkedArrayTest with 0 length."
This reverts commit 346c76affe9e78b6ca2738c4a56dbb9c00f31102.
* Use LightGBM::Common::Sign
* save precise_float_parser in model file.
* Fix error checking in AtofPrecise. Add more test cases.
* Remove test case that can't pass under macOS.
* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
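The precise parser entries describe a two-tier design: a fast path (fast_double_parser) with a per-number fallback to strtod when the fast path fails. The following Python sketch mirrors only that control flow and not the C++ internals of `Common::AtofPrecise`. The fast path accepts only plain decimals such as `-12.5` and uses Python's correctly rounded `int / int` division. Python's `float()` stands in for strtod. `fast_parse` and `atof_precise` are hypothetical names.

```python
import re
from typing import Optional

_SIMPLE_DECIMAL = re.compile(r"([+-]?)(\d+)(?:\.(\d*))?")


def fast_parse(text: str) -> Optional[float]:
    """Hypothetical fast path: plain decimals only; None signals failure."""
    m = _SIMPLE_DECIMAL.fullmatch(text)
    if m is None:
        return None  # exponents, nan, inf, etc. are left to the fallback
    sign, int_part, frac_part = m.group(1), m.group(2), m.group(3) or ""
    try:
        # int / int true division in Python is correctly rounded.
        value = int(int_part + frac_part) / 10 ** len(frac_part)
    except OverflowError:
        return None  # out of float range; let the fallback decide (inf)
    return -value if sign == "-" else value


def atof_precise(text: str) -> float:
    """Try the fast path first; fall back to the general parser on failure."""
    value = fast_parse(text)
    if value is None:
        value = float(text)  # general fallback, analogous to strtod
    return value


print([atof_precise(s) for s in ("0.1", "-2.50", "1.5e3", ".5")])
```

Both tiers round correctly here, so the choice of path affects only speed, not the parsed value. That is the property the precise option is after.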
* Correct spelling
Most changes were in comments, and there were a few changes to literals for log output.
There were no changes to variable names, function names, IDs, or functionality.
* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Correct spelling
Most are code comments, but one case is a literal in a logging message.
There are a few grammar fixes too.
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* run cpp tests at CI
* Update docs/Installation-Guide.rst
Co-authored-by: James Lamb <jaylamb20@gmail.com>