Commit Graph

68 Commits

Author SHA1 Message Date
Nikita Titov f8230aeaa7
[ci] set `MinimumVisualStudioVersion` to MSVS 2015 (#6689) 2024-10-24 22:44:41 -04:00
Nikita Titov 99e27b8190
[ci] fix build of LightGBM with MSVS GUI (#6690)
Update LightGBM.vcxproj
2024-10-21 18:34:51 +03:00
James Lamb 631e0a2a7b
[ci] prevent trailing whitespace, ensure files end with newline (#6373) 2024-03-18 23:24:14 -05:00
shiyu1994 776c5c3c49
[c++][fix] Support Quantized Training with Categorical Features on CPU (#6301)
* support quantized training with categorical features on cpu

* remove white spaces

* add tests for quantized training with categorical features

* skip tests for cuda version

* fix cases when only 1 data block in row-wise quantized histogram construction with 8 inner bits

* remove useless capture

* fix compilation warnings

revert useless changes

* revert useless change

* separate functions in feature histogram into cpp file

* add feature_histogram.o in Makevars
2024-02-23 13:41:21 +08:00
James Lamb 5516533c63
[c++] include OpenMP-control files in MSBuild solution file (fixes #6238) (#6251) 2024-01-09 10:18:44 -06:00
James Lamb d2121aa34d
update MSBuild solution to Windows SDK v10.0, add inet_pton define (fixes #5856) (#5884) 2023-05-31 15:13:49 -05:00
shiyu1994 17ecfab335
Add quantized training (CPU part) (#5800)
* add quantized training (first stage)

* add histogram construction functions for integer gradients

* add stochastic rounding

* update docs

* fix compilation errors by adding template instantiations

* update files for compilation

* fix compilation of gpu version

* initialize gradient discretizer before share states

* add a test case for quantized training

* add quantized training for data distributed training

* Delete origin.pred

* Delete ifelse.pred

* Delete LightGBM_model.txt

* remove useless changes

* fix lint error

* remove debug loggings

* fix mismatch of vector and allocator types

* remove changes in main.cpp

* fix bugs with uninitialized gradient discretizer

* initialize ordered gradients in gradient discretizer

* disable quantized training with gpu and cuda

fix msvc compilation errors and warnings

* fix bug in data parallel tree learner

* make quantized training test deterministic

* make quantized training in test case more accurate

* refactor test_quantized_training

* fix leaf splits initialization with quantized training

* check distributed quantized training result
2023-05-05 16:41:48 +08:00
Scott Votaw 0f7983b6c3
feature: Add serialization of reference dataset (#5427)
* Add serialization of reference dataset

* lint and missing file

* Fixes from reviewers

* responded to comments

* revert sdk change
2023-02-14 14:52:50 +08:00
Yifei Liu fffd066cb3
Decouple Boosting Types (fixes #3128) (#4827)
* add parameter data_sample_strategy

* abstract GOSS as a sample strategy (GOSS1), together with original GOSS (Normal Bagging has not been abstracted, so do NOT use it now)

* abstract Bagging as a subclass (BAGGING), but original Bagging members in GBDT are still kept

* fix some variables

* remove GOSS(as boost) and Bagging logic in GBDT

* rename GOSS1 to GOSS(as sample strategy)

* add warning about use GOSS as boosting_type

* a little ; bug

* remove CHECK when "gradients != nullptr"

* rename DataSampleStrategy to avoid confusion

* remove and add some comments, following convention

* fix bug about GBDT::ResetConfig (ObjectiveFunction inconsistencty bet…

* add std::ignore to avoid compiler warnings (and potential failures)

* update Makevars and vcxproj

* handle constant hessian

move resize of gradient vectors out of sample strategy

* mark override for IsHessianChange

* fix lint errors

* rerun parameter_generator.py

* update config_auto.cpp

* delete redundant blank line

* update num_data_ when train_data_ is updated

set gradients and hessians when GOSS

* check bagging_freq is not zero

* reset config_ value

merge ResetBaggingConfig and ResetGOSS

* remove useless check

* add tests in test_engine.py

* remove whitespace in blank line

* remove arguments verbose_eval and evals_result

* Update tests/python_package_test/test_engine.py

reduce num_boost_round

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_engine.py

reduce num_boost_round

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_engine.py

reduce num_boost_round

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_engine.py

reduce num_boost_round

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_engine.py

reduce num_boost_round

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_engine.py

reduce num_boost_round

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update src/boosting/sample_strategy.cpp

modify warning about setting goss as `boosting_type`

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_engine.py

replace load_boston() with make_regression()

remove value checks of mean_squared_error in test_sample_strategy_with_boosting()

* Update tests/python_package_test/test_engine.py

add value checks of mean_squared_error in test_sample_strategy_with_boosting()

* Modify warning about using goss as boosting type

* Update tests/python_package_test/test_engine.py

add random_state=42 for make_regression()

reduce the threshold of mean_square_error

* Update src/boosting/sample_strategy.cpp

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* remove goss from boosting types in documentation

* Update src/boosting/bagging.hpp

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update src/boosting/bagging.hpp

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update src/boosting/goss.hpp

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update src/boosting/goss.hpp

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* rename GOSS with GOSSStrategy

* update doc

* address comments

* fix table in doc

* Update include/LightGBM/config.h

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update documentation

* update test case

* revert useless change in test_engine.py

* add tests for evaluation results in test_sample_strategy_with_boosting

* include <string>

* change to assert_allclose in test_goss_boosting_and_strategy_equivalent

* more tolerance in result checking, due to minor difference in results of gpu versions

* change == to np.testing.assert_allclose

* fix test case

* set gpu_use_dp to true

* change --report to --report-level for rstcheck

* use gpu_use_dp=true in test_goss_boosting_and_strategy_equivalent

* revert unexpected changes of non-ascii characters

* revert unexpected changes of non-ascii characters

* remove useless changes

* allocate gradients_pointer_ and hessians_pointer when necessary

* add spaces

* remove redundant virtual

* include <LightGBM/utils/log.h> for USE_CUDA

* check for  in test_goss_boosting_and_strategy_equivalent

* check for identity in test_sample_strategy_with_boosting

* remove cuda  option in test_sample_strategy_with_boosting

* Update tests/python_package_test/test_engine.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update tests/python_package_test/test_engine.py

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* ResetGradientBuffers after ResetSampleConfig

* ResetGradientBuffers after ResetSampleConfig

* ResetGradientBuffers after bagging

* remove useless code

* check objective_function_ instead of gradients

* enable rf with goss

simplify params in test cases

* remove useless changes

* allow rf with feature subsampling alone

* change position of ResetGradientBuffers

* check for dask

* add parameter types for data_sample_strategy

Co-authored-by: Guangda Liu <v-guangdaliu@microsoft.com>
Co-authored-by: Yu Shi <shiyu_k1994@qq.com>
Co-authored-by: GuangdaLiu <90019144+GuangdaLiu@users.noreply.github.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2022-12-28 14:09:11 +08:00
shiyu1994 9489f878b3
Add default definition for GetColWiseData and GetColWiseData (#5413)
* add default definition for GetColWiseData and GetColWiseData

* fix warnings of template instantiation

* remove files in Makevars and LightGBM.vcxproj
2022-08-16 14:15:49 +08:00
Belinda Trotta 44d37184d1
Use double precision in threaded calculation of linear tree coefficients (fixes #5226) (#5368) 2022-07-29 11:09:08 -05:00
shiyu1994 6b56a90cd1
[CUDA] New CUDA version Part 1 (#4630)
* new cuda framework

* add histogram construction kernel

* before removing multi-gpu

* new cuda framework

* tree learner cuda kernels

* single tree framework ready

* single tree training framework

* remove comments

* boosting with cuda

* optimize for best split find

* data split

* move boosting into cuda

* parallel synchronize best split point

* merge split data kernels

* before code refactor

* use tasks instead of features as units for split finding

* refactor cuda best split finder

* fix configuration error with small leaves in data split

* skip histogram construction of too small leaf

* skip split finding of invalid leaves

stop when no leaf to split

* support row wise with CUDA

* copy data for split by column

* copy data from host to CPU by column for data partition

* add synchronize best splits for one leaf from multiple blocks

* partition dense row data

* fix sync best split from task blocks

* add support for sparse row wise for CUDA

* remove useless code

* add l2 regression objective

* sparse multi value bin enabled for CUDA

* fix cuda ranking objective

* support for number of items <= 2048 per query

* speedup histogram construction by interleaving global memory access

* split optimization

* add cuda tree predictor

* remove comma

* refactor objective and score updater

* before use struct

* use structure for split information

* use structure for leaf splits

* return CUDASplitInfo directly after finding best split

* split with CUDATree directly

* use cuda row data in cuda histogram constructor

* clean src/treelearner/cuda

* gather shared cuda device functions

* put shared CUDA functions into header file

* change smaller leaf from <= back to < for consistent result with CPU

* add tree predictor

* remove useless cuda_tree_predictor

* predict on CUDA with pipeline

* add global sort algorithms

* add global argsort for queries with many items in ranking tasks

* remove limitation of maximum number of items per query in ranking

* add cuda metrics

* fix CUDA AUC

* remove debug code

* add regression metrics

* remove useless file

* don't use mask in shuffle reduce

* add more regression objectives

* fix cuda mape loss

add cuda xentropy loss

* use template for different versions of BitonicArgSortDevice

* add multiclass metrics

* add ndcg metric

* fix cross entropy objectives and metrics

* fix cross entropy and ndcg metrics

* add support for customized objective in CUDA

* complete multiclass ova for CUDA

* separate cuda tree learner

* use shuffle based prefix sum

* clean up cuda_algorithms.hpp

* add copy subset on CUDA

* add bagging for CUDA

* clean up code

* copy gradients from host to device

* support bagging without using subset

* add support of bagging with subset for CUDAColumnData

* add support of bagging with subset for dense CUDARowData

* refactor copy sparse subrow

* use copy subset for column subset

* add reset train data and reset config for CUDA tree learner

add deconstructors for cuda tree learner

* add USE_CUDA ifdef to cuda tree learner files

* check that dataset doesn't contain CUDA tree learner

* remove printf debug information

* use full new cuda tree learner only when using single GPU

* disable all CUDA code when using CPU version

* recover main.cpp

* add cpp files for multi value bins

* update LightGBM.vcxproj

* update LightGBM.vcxproj

fix lint errors

* fix lint errors

* fix lint errors

* update Makevars

fix lint errors

* fix the case with 0 feature and 0 bin

fix split finding for invalid leaves

create cuda column data when loaded from bin file

* fix lint errors

hide GetRowWiseData when cuda is not used

* recover default device type to cpu

* fix na_as_missing case

fix cuda feature meta information

* fix UpdateDataIndexToLeafIndexKernel

* create CUDA trees when needed in CUDADataPartition::UpdateTrainScore

* add refit by tree for cuda tree learner

* fix test_refit in test_engine.py

* create set of large bin partitions in CUDARowData

* add histogram construction for columns with a large number of bins

* add find best split for categorical features on CUDA

* add bitvectors for categorical split

* cuda data partition split for categorical features

* fix split tree with categorical feature

* fix categorical feature splits

* refactor cuda_data_partition.cu with multi-level templates

* refactor CUDABestSplitFinder by grouping task information into struct

* pre-allocate space for vector split_find_tasks_ in CUDABestSplitFinder

* fix misuse of reference

* remove useless changes

* add support for path smoothing

* virtual destructor for LightGBM::Tree

* fix overlapped cat threshold in best split infos

* reset histogram pointers in data partition and split finder in ResetConfig

* comment useless parameter

* fix reverse case when na is missing and default bin is zero

* fix mfb_is_na and mfb_is_zero and is_single_feature_column

* remove debug log

* fix cat_l2 when one-hot

fix gradient copy when data subset is used

* switch shared histogram size according to CUDA version

* gpu_use_dp=true when cuda test

* revert modification in config.h

* fix setting of gpu_use_dp=true in .ci/test.sh

* fix linter errors

* fix linter error

remove useless change

* recover main.cpp

* separate cuda_exp and cuda

* fix ci bash scripts

add description for cuda_exp

* add USE_CUDA_EXP flag

* switch off USE_CUDA_EXP

* revert changes in python-packages

* more careful separation for USE_CUDA_EXP

* fix CUDARowData::DivideCUDAFeatureGroups

fix set fields for cuda metadata

* revert config.h

* fix test settings for cuda experimental version

* skip some tests due to unsupported features or differences in implementation details for CUDA Experimental version

* fix lint issue by adding a blank line

* fix lint errors by resorting imports

* fix lint errors by resorting imports

* fix lint errors by resorting imports

* merge cuda.yml and cuda_exp.yml

* update python version in cuda.yml

* remove cuda_exp.yml

* remove unrelated changes

* fix compilation warnings

fix cuda exp ci task name

* recover task

* use multi-level template in histogram construction

check split only in debug mode

* ignore NVCC related lines in parameter_generator.py

* update job name for CUDA tests

* apply review suggestions

* Update .github/workflows/cuda.yml

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update .github/workflows/cuda.yml

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update header

* remove useless TODOs

* remove [TODO(shiyu1994): constrain the split with min_data_in_group] and record in #5062

* #include <LightGBM/utils/log.h> for USE_CUDA_EXP only

* fix include order

* fix include order

* remove extra space

* address review comments

* add warning when cuda_exp is used together with deterministic

* add comment about gpu_use_dp in .ci/test.sh

* revert changing order of included headers

Co-authored-by: Yu Shi <shiyu1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2022-03-23 10:39:23 +08:00
Nikita Titov 6bb6164e3c
Move compute and eigen libraries to external_libs folder (#3809)
* move all submodules to external_libs folder

* Update .Rbuildignore

* Update MANIFEST.in

* Update .appveyor.yml

* Update CMakeLists.txt

* Update build_r.R

* Update test.sh

* Update setup.py

* Update CMakeLists.txt

* Update test.sh

* Update setup.py

* Update conf.py

* Update MANIFEST.in

* Update LightGBM.vcxproj

* continue

* test

* test

* Update setup.py

* hotfix

* revert CI tests
2021-01-22 17:45:43 +03:00
James Lamb 6cb968af2e
[python-package] remove unused Eigen files, compile with EIGEN_MPL2_ONLY (fixes #3684) (#3685)
* [python-package] remove unused Eigen files (fixes #3684)

* more changes

* add EIGEN_MPL2_ONLY in VS solution file

* fix VS project

* remove EIGEN_MPL2_ONLY define in linear_tree_learner

Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>
2020-12-30 02:16:34 +03:00
Belinda Trotta fcfd4132e6
Trees with linear models at leaves (#3299)
* Add Eigen library.

* Working for simple test.

* Apply changes to config params.

* Handle nan data.

* Update docs.

* Add test.

* Only load raw data if boosting=gbdt_linear

* Remove unneeded code.

* Minor updates.

* Update to work with sk-learn interface.

* Update to work with chunked datasets.

* Throw error if we try to create a Booster with an already-constructed dataset having incompatible parameters.

* Save raw data in binary dataset file.

* Update docs and fix parameter checking.

* Fix dataset loading.

* Add test for regularization.

* Fix bugs when saving and loading tree.

* Add test for load/save linear model.

* Remove unneeded code.

* Fix case where not enough leaf data for linear model.

* Simplify code.

* Speed up code.

* Speed up code.

* Simplify code.

* Speed up code.

* Fix bugs.

* Working version.

* Store feature data column-wise (not fully working yet).

* Fix bugs.

* Speed up.

* Speed up.

* Remove unneeded code.

* Small speedup.

* Speed up.

* Minor updates.

* Remove unneeded code.

* Fix bug.

* Fix bug.

* Speed up.

* Speed up.

* Simplify code.

* Remove unneeded code.

* Fix bug, add more tests.

* Fix bug and add test.

* Only store numerical features

* Fix bug and speed up using templates.

* Speed up prediction.

* Fix bug with regularisation

* Visual studio files.

* Working version

* Only check nans if necessary

* Store coeff matrix as an array.

* Align cache lines

* Align cache lines

* Preallocation coefficient calculation matrices

* Small speedups

* Small speedup

* Reverse cache alignment changes

* Change to dynamic schedule

* Update docs.

* Refactor so that linear tree learner is not a separate class.

* Add refit capability.

* Speed up

* Small speedups.

* Speed up add prediction to score.

* Fix bug

* Fix bug and speed up.

* Speed up dataload.

* Speed up dataload

* Use vectors instead of pointers

* Fix bug

* Add OMP exception handling.

* Change return type of LGBM_BoosterGetLinear to bool

* Change return type of LGBM_BoosterGetLinear back to int, only parameter type needed to change

* Remove unused internal_parent_ property of tree

* Remove unused parameter to CreateTreeLearner

* Remove reference to LinearTreeLearner

* Minor style issues

* Remove unneeded check

* Reverse temporary testing change

* Fix Visual Studio project files

* Restore LightGBM.vcxproj.filters

* Speed up

* Speed up

* Simplify code

* Update docs

* Simplify code

* Initialise storage space for max num threads

* Move Eigen to include directory and delete unused files

* Remove old files.

* Fix so it compiles with mingw

* Fix gpu tree learner

* Change AddPredictionToScore back to const

* Fix python lint error

* Fix C++ lint errors

* Change eigen to a submodule

* Update comment

* Add the eigen folder

* Try to fix build issues with eigen

* Remove eigen files

* Add eigen as submodule

* Fix include paths

* Exclude eigen files from Python linter

* Ignore eigen folders for pydocstyle

* Fix C++ linting errors

* Fix docs

* Fix docs

* Exclude eigen directories from doxygen

* Update manifest to include eigen

* Update build_r to include eigen files

* Fix compiler warnings

* Store raw feature data as float

* Use float for calculating linear coefficients

* Remove eigen directory from GLOB

* Don't compile linear model code when building R package

* Fix doxygen issue

* Fix lint issue

* Fix lint issue

* Remove unneeded code

* Restore deleted lines

* Restore deleted lines

* Change return type of has_raw to bool

* Update docs

* Rename some variables and functions for readability

* Make tree_learner parameter const in AddScore

* Fix style issues

* Pass vectors as const reference when setting tree properties

* Make temporary storage of serial_tree_learner mutable so we can make the object's methods const

* Remove get_raw_size, use num_numeric_features instead

* Fix typo

* Make contains_nan_ and any_nan_ properties immutable again

* Remove data_has_nan_ property of tree

* Remove temporary test code

* Make linear_tree a dataset param

* Fix lint error

* Make LinearTreeLearner a separate class

* Fix lint errors

* Fix lint error

* Add linear_tree_learner.o

* Simulate omp_get_max_threads if openmp is not available

* Update PushOneData to also store raw data.

* Cast size to int

* Fix bug in ReshapeRaw

* Speed up code with multithreading

* Use OMP_NUM_THREADS

* Speed up with multithreading

* Update to use ArrayToString

* Fix tests

* Fix test

* Fix bug introduced in merge

* Minor updates

* Update docs
2020-12-24 14:01:23 +08:00
shiyu1994 0655d67cc1
Optimization of row-wise histogram construction (#3522)
* store without offset in multi_val_dense_bin

* fix offset bug

* add comment for offset

* add comment for bin type selection

* faster operations for offset

* keep most freq bin in histogram for multi val dense

* use original feature iterators

* consider 9 cases (3 x 3) for multi val bin construction

* fix dense bin setting

* fix bin data in multi val group

* fix offset of the first feature histogram

* use float hist buf

* avx in histogram construction

* use avx for hist construction without prefetch

* vectorize bin extraction

* use only 128 vec

* use avx2

* use vectorization for sparse row wise

* add bit size for multi val dense bin

* float with no vectorization

* change multithreading strategy to dynamic

* remove intrinsic header

* fix dense multi val col copy

* remove bit size

* use large enough block size when the bin number is large

* calc min block size by sparsity

* rescale gradients

* rollback gradients scaling

* single precision histogram buffer as an option

* add float hist buffer with thread buffer

* fix setting zero in hist data

* fix hist begin pointer in tree learners

* remove debug logs

* remove omp simd

* update Makevars of R-package

* fix feature group binary storing

* two row wise for double hist buffer

* add subfeature for two row wise

* remove useless code and fix two row wise

* refactor code

* grouping the dense feature groups can get sparse multi val bin

* clean format problems

* one thread for two blocks in sep row wise

* use ordered gradients for sep row wise

* fix grad ptr

* ordered grad with combined block for sep row wise

* fix block threading

* use the same min block size

* rollback share min block size

* remove logs

* Update src/io/dataset.cpp

Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

* fix parameter description

* remove sep_row_wise

* remove check codes

* add check for empty multi val bin

* fix lint error

* rollback changes in config.h

* Apply suggestions from code review

Co-authored-by: Ubuntu <shiyu@gbdt-04.ren3kv4wanvufliwrpy4k03lsf.xx.internal.cloudapp.net>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
2020-11-13 23:26:38 +08:00
James Lamb 81d761133f
[ci] [R-package] Fix memory leaks found by valgrind (#3443)
* fix int64 write error

* attempt

* [WIP] [ci] [R-package] Add CI job that runs valgrind tests

* update all-successful

* install

* executable

* fix redirect stuff

* Apply suggestions from code review

Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

* more flags

* add mc to msvc proj

* fix memory leak in mc

* Update monotone_constraints.hpp

* Update r_package.yml

* remove R_INT64_PTR

* disable openmp

* Update gbdt_model_text.cpp

* Update gbdt_model_text.cpp

* Apply suggestions from code review

* try to free vector

* free more memories.

* Update src/boosting/gbdt_model_text.cpp

* fix using

* try the UNPROTECT(1);

* fix a const pointer

* fix Common

* reduce UNPROTECT

* remove UNPROTECT(1);

* fix null handle

* fix predictor

* use NULL after free

* fix a leaking in test

* try more fixes

* test the effect of tests

* throw exception in Fatal

* add test back

* Apply suggestions from code review

* comment some tests

* Apply suggestions from code review

* Apply suggestions from code review

* trying to comment out tests

* Update openmp_wrapper.h

* Apply suggestions from code review

* Update configure

* Update configure.ac

* trying to uncomment

* more comments

* more uncommenting

* more uncommenting

* fix comment

* more uncommenting

* uncomment fully-commented out stuff

* try uncommenting more dataset tests

* uncommenting more tests

* ok getting closer

* more uncommenting

* free dataset

* skipping a test, more uncommenting

* more skipping

* re-enable OpenMP

* allow on OpenMP thing

* move valgrind to comment-only job

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* changes from code review

* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* linting

* issue comments too

* remove issue_comment

Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2020-10-17 23:04:14 -05:00
Guolin Ke 69a2691a59
fix vc project (#3304) 2020-08-13 15:30:56 +03:00
Joan Fontanals 1c35c3b9ed
Change locking strategy of Booster, allow for share and unique locks (#2760)
* Add capability to get possible max and min values for a model

* Change implementation to have return value in tree.cpp, change naming to upper and lower bound, move implementation to gbdt.cpp

* Update include/LightGBM/c_api.h

Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Change iteration to avoid potential overflow, add bindings to R and Python and a basic test

* Adjust test values

* Consider const correctness and multithreading protection

* Put everything possible as const

* Include shared_mutex, for now as unique_lock

* Update test values

* Put everything possible as const

* Include shared_mutex, for now as unique_lock

* Make PredictSingleRow const and share the lock with other reading threads

* Update test values

* Add test to check that model is exactly the same in all platforms

* Try to parse the model to get the expected values

* Try to parse the model to get the expected values

* Fix implementation, num_leaves can be lower than the leaf_value_ size

* Do not check for num_leaves to be smaller than actual size and get back to test with hardcoded value

* Change test order

* Add gpu_use_dp option in test

* Remove helper test method

* Remove TODO

* Add preprocessing option to compile with c++17

* Update python-package/setup.py

Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Remove unwanted changes

* Move option

* Fix problems introduced by conflict fix

* Avoid switching to c++17 and use yamc mutex library to access shared lock functionality

* Add extra yamc include

* Change header order

* some lint fix

* change include order and remove some extra blank lines

* Further fix lint issues

* Update c_api.cpp

* Further fix lint issues

* Move yamc include files to a new yamc folder

* Use standard unique_lock

* Update windows/LightGBM.vcxproj

Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

* Update windows/LightGBM.vcxproj.filters

Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

* Update windows/LightGBM.vcxproj.filters

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update windows/LightGBM.vcxproj.filters

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update windows/LightGBM.vcxproj.filters

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix problems coming from merge conflict resolution

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: joanfontanals <jfontanals@ntent.com>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
2020-07-19 03:44:14 +03:00
James Lamb 84bdf25798
[R-package] adding routine registration in R package (fixes #1910) (#2911) 2020-04-01 11:18:52 -05:00
Nikita Titov e32d34ab6e
[ci] fix VS project files (#2924)
* Update LightGBM.vcxproj

* Update LightGBM.vcxproj.filters
2020-03-19 14:38:39 +03:00
Guolin Ke bcad692e26
Speed-up "Split" and some code refactorings (#2883)
* commit

* fix msvc

* fix format
2020-03-08 10:15:38 +08:00
Nikita Titov 532722b916
fixed cpplint errors and VS project files (#2873) 2020-03-06 21:29:47 +08:00
Guolin Ke 77d92b7cde
speed up `FindBestThresholdFromHistogram` (#2867)
* speed up for const hessian

* rename template

* some refactorings

* refine

* refine

* simplify codes

* fix random in feature histogram

* code refine

* refine

* try fix

* make gcc happy

* remove timer

* rollback some changes

* more templates

* fix a bug

* reduce the cost of timer

* fix gpu

* fix bug

* fix gpu
2020-03-05 14:47:30 +08:00
Guolin Ke 8d90bbe314
Debug flags (#2825)
* add debug mode in cmake

* add debug dll

* Apply suggestions from code review

Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* fix naming

* Apply suggestions from code review

* Apply suggestions from code review

* Update LightGBM.sln

* refine

* run MPI job in debug mode

* document USE_DEBUG and USE_TIMETAG

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2020-03-01 08:15:25 +08:00
Guolin Ke e676af2366
Code refactoring for ranking objective & Faster ndcg_xendcg (#2801)
* code refactoring

* update vcproject

* refine

* fix test

* Update tests/python_package_test/test_sklearn.py

* fix test
2020-02-26 13:04:56 +08:00
Guolin Ke 84fb5e518a
No LinkTimeCodeGeneration in VS (#2812)
* add Link Time Code Generation

* remove ltcg

* Apply suggestions from code review
2020-02-26 11:54:40 +08:00
Nikita Titov 5de42f8453
removed AVX2 from VS config and reset PlatformToolset values back (#2731) 2020-02-03 13:18:28 +08:00
Guolin Ke 509c2e50c2
Support both row-wise and col-wise multi-threading (#2699)
* commit

* fix a bug

* fix bug

* reset to track changes

* refine the auto choose logic

* sort the time stats output

* fix include

* change  multi_val_bin_sparse_threshold

* add cmake

* add _mm_malloc and _mm_free for cross platform

* fix cmake bug

* timer for split

* try to fix cmake

* fix tests

* refactor DataPartition::Split

* fix test

* typo

* formating

* Revert "formating"

This reverts commit 5b8de4f7fb.

* add document

* [R-package] Added tests on use of force_col_wise and force_row_wise in training (#2719)

* naming

* fix gpu code

* Update include/LightGBM/bin.h

Co-Authored-By: James Lamb <jaylamb20@gmail.com>

* Update src/treelearner/ocl/histogram16.cl

* test: swap compilers for CI

* fix omp

* not avx2

* no aligned for feature histogram

* Revert "refactor DataPartition::Split"

This reverts commit 256e6d9641.

* slightly refactor data partition

* reduce the memory cost

Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2020-02-02 12:42:17 +08:00
sbruch 86530988a0
Implementation of XE_NDCG_MART for the ranking task (#2620)
* Implementation of XE_NDCG loss function for ranking.

* Add citation

* Check in example usage for xe_ndcg loss.

* Seed the generator when a seed is provided in the config. Add unit-tests for xe_ndcg

* Update documentation

* Fix indentation

* Address issues raised by reviewers.

* Clean up include statements.

* Fix issues raised by reviewers.

* Regenerate parameters.rst

* Add a note to explain that reproducing xe_ndcg results requires num_threads to be one.

* Introduce objective_seed and use that in rank_xendcg instead of directly using seed

* Change default value of objective_seed
2020-01-30 11:14:11 +08:00
Guolin Ke 70fc45b0e7
code refactoring: cost effective gradient boosting (#2407)
* refactoring

* fix style

* fix style

* Update cost_effective_gradient_boosting.hpp

* Update serial_tree_learner.cpp

* Update serial_tree_learner.h

* fix style

* update vc project

* Update cost_effective_gradient_boosting.hpp
2019-09-26 23:35:05 +08:00
Guolin Ke 94fbe5bb9f
[docs] updated Microsoft GitHub URL (#2152)
* fix travis badge

* updated GitHub Microsoft URL
2019-05-08 13:51:28 +08:00
Guolin Ke dc6995742a
Refine config object (#1381)
* [WIP] refine config

* [wip] ready for the auto code generate

* auto generate config codes

* use with to open file

* fix bug

* fix pylint

* fix bug

* fix pylint

* fix bugs.

* tmp for failed test.

* fix tests.

* added nthreads alias

* added new aliases from new config.h

* fixed duplicated alias

* refactored parameter_generator.py

* added new aliases from config.h and removed remaining old names

* fix bugs & some miss alias

* added aliases

* add more descriptions.

* add comment.
2018-05-20 18:21:30 +08:00
Guolin Ke a0659a3ddf fix vs build 2018-04-26 17:50:37 +08:00
Guolin Ke 5dcb4b905c fix vs build 2018-02-27 14:15:11 +08:00
Guolin Ke 1a35083a72 add file to vc project. 2018-01-12 11:48:21 +08:00
wxchan 2b20569e30 fix protobuf on vs build (#1004)
* [optional] support protobuf

* fix windows/LightGBM.vcxproj

* add doc

* fix doc

* fix vs support (#2)

* fix vs support

* fix cmake
2017-10-26 09:16:07 +08:00
Guolin Ke 1ef3d43ecd Revert "[optional] support protobuf (#908)" (#1002)
This reverts commit 53b99854aa.
2017-10-20 21:12:09 +08:00
wxchan 53b99854aa [optional] support protobuf (#908) 2017-10-19 14:49:54 +08:00
Guolin Ke 6d0eae0c7b clean code for Boosting. 2017-08-29 18:39:57 +08:00
Guolin Ke 6c4a9750cf clean code for the split of bins and leaves. 2017-08-20 19:30:01 +08:00
Qiwei Ye cc2cfe6d65 Upgrade VC120 to VC140 for better compatibility of C99 and C++ 2017-07-24 13:35:44 +08:00
Guolin Ke f98d75fc7e Compile R package by custom tool chain. (#584)
* add R's library file to vs project and cmake.

* support using dll built by vs.

* better search for the library file.

* remove mingw related doc.

* update document.

* Let R handle the library compile.

* try fix build from github.

* Update README.md

* cleaner build.

* fix the install problem in linux.

* Update README.md
2017-06-05 20:09:42 +08:00
cbecker 993bbd5f91 Add prediction early stopping (#550)
* Add early stopping for prediction

* Fix GBDT if-else prediction with early stopping

* Small C++ embellishments to early stopping API and functions

* Fix early stopping efficiency issue by creating a singleton for no early stopping

* Python improvements to early stopping API

* Add assertion check for binary and multiclass prediction score length

* Update vcxproj and vcxproj.filters with new early stopping files

* Remove inline from PredictRaw(), the linker was not able to find it otherwise
2017-05-29 23:09:58 +08:00
Guolin Ke 14f429f26d fix vs build. 2017-05-03 11:46:36 +08:00
Guolin Ke fb96b717a8 Link Boost by static library. 2017-04-25 12:17:26 +08:00
Guolin Ke 062bfa7964 Revert "[WIP]faster histogram sum up" (#422)
* Revert "python-package: support valid_names in scikit-learn API (#420)"

This reverts commit de39dbcf3d.

* Revert "faster histogram sum up (#418)"

This reverts commit 98c7c2a35a.
2017-04-17 10:16:23 +08:00
Guolin Ke 98c7c2a35a faster histogram sum up (#418)
* some refactor.

* two stage sum up to reduce sum up error.

* add more two-stage sumup.

* some refactor.

* add alignment.

* change name to aligned_allocator.

* remove some useless sumup.

* fix a warning.

* add -march=native .

* remove the padding of gradients.

* no alignment.

* fix test.

* change KNumSumupGroup to 32768.

* change gcc flags.
2017-04-16 09:10:35 +08:00
Guolin Ke bfb0217a02 Move all prediction transform to the objective. (#383)
* many refactors.

* remove multi_loglossova.

* fix tests.

* avoid using lambda function.

* fix some format.

* reduce branching.
2017-04-06 19:14:58 +08:00
Guolin Ke 4d033831f5 not use exception in command line due to catch-miss of multi-threading exceptions. 2017-03-13 13:41:21 +08:00