* Add integrated OpenCL build on Linux
* Build integrated OpenCL Linux wheel in CI
* Fix test_dual.py on Linux arm64
* Enable integrated OpenCL Linux wheel arm64 testing in CI
* Update documentation
* Add comment about gpu_use_dp
* add missing fi dropped in merge conflict resolution
* install opencl-headers on bdist task
* use new CI image for x86_64
* update check_dynamic_dependencies script
* use main CI image
Co-authored-by: James Lamb <jaylamb20@gmail.com>
* add binary objective for cuda_exp
* include <string> and <vector>
* exchange include ordering
* fix length of score to copy in evaluation
* fix EvalOneMetric
* fix cuda binary objective and prediction when boosting on gpu
* Add white space
* fix BoostFromScore for CUDABinaryLogloss
* update log in test_register_logger
* include <algorithm>
* simplify shared memory buffer
* Extract streaming to own PR
* small merge fixes and cleanup
* linting fixes
* fix cast warning
* Fix accidental deletion during branch transfer
* responded to initial triage comments
* Added more tests to use create-from-samples APIs
* added mutex and adjusted nclasses logic
* Fix thread-safety for pushing data to sparse bins through Push APIs
* lint and doc fixes
* Small SWIG fix
* nit fix
* Responded to StrikerRUS comments
* fix breaking change after merge with master
* Extract streaming to own PR
* small merge fixes and cleanup
* Fix accidental deletion during branch transfer
* responded to initial triage comments
* Added more tests to use create-from-samples APIs
* Fix rstcheck call in ci
* remove TODOs
* Extract streaming to own PR
* small merge fixes and cleanup
* Fix accidental deletion during branch transfer
* responded to initial triage comments
* Added more tests to use create-from-samples APIs
* Small SWIG fix
* remove ci change
* responded to shiyu1994 comments
* responded to StrikerRUS comments
* Fixes from StrikerRUS comments
* initial work for boosting and evaluation with CUDA
* fix compatibility with CPU code
* fix creating objective without USE_CUDA_EXP
* fix static analysis errors
* fix static analysis errors
* new cuda framework
* add histogram construction kernel
* before removing multi-gpu
* new cuda framework
* tree learner cuda kernels
* single tree framework ready
* single tree training framework
* remove comments
* boosting with cuda
* optimize for best split find
* data split
* move boosting into cuda
* parallel synchronize best split point
* merge split data kernels
* before code refactor
* use tasks instead of features as units for split finding
* refactor cuda best split finder
* fix configuration error with small leaves in data split
* skip histogram construction of too small leaf
* skip split finding of invalid leaves
* stop when no leaf to split
* support row wise with CUDA
* copy data for split by column
* copy data from host to device by column for data partition
* add synchronize best splits for one leaf from multiple blocks
* partition dense row data
* fix sync best split from task blocks
* add support for sparse row wise for CUDA
* remove useless code
* add l2 regression objective
* sparse multi value bin enabled for CUDA
* fix cuda ranking objective
* support for number of items <= 2048 per query
* speedup histogram construction by interleaving global memory access
* split optimization
* add cuda tree predictor
* remove comma
* refactor objective and score updater
* before using struct
* use structure for split information
* use structure for leaf splits
* return CUDASplitInfo directly after finding best split
* split with CUDATree directly
* use cuda row data in cuda histogram constructor
* clean src/treelearner/cuda
* gather shared cuda device functions
* put shared CUDA functions into header file
* change smaller leaf from <= back to < for consistent result with CPU
* add tree predictor
* remove useless cuda_tree_predictor
* predict on CUDA with pipeline
* add global sort algorithms
* add global argsort for queries with many items in ranking tasks
* remove limitation of maximum number of items per query in ranking
* add cuda metrics
* fix CUDA AUC
* remove debug code
* add regression metrics
* remove useless file
* don't use mask in shuffle reduce
* add more regression objectives
* fix cuda mape loss
* add cuda xentropy loss
* use template for different versions of BitonicArgSortDevice
* add multiclass metrics
* add ndcg metric
* fix cross entropy objectives and metrics
* fix cross entropy and ndcg metrics
* add support for customized objective in CUDA
* complete multiclass ova for CUDA
* separate cuda tree learner
* use shuffle based prefix sum
* clean up cuda_algorithms.hpp
* add copy subset on CUDA
* add bagging for CUDA
* clean up code
* copy gradients from host to device
* support bagging without using subset
* add support of bagging with subset for CUDAColumnData
* add support of bagging with subset for dense CUDARowData
* refactor copy sparse subrow
* use copy subset for column subset
* add reset train data and reset config for CUDA tree learner
* add destructors for cuda tree learner
* add USE_CUDA ifdef to cuda tree learner files
* check that dataset doesn't contain CUDA tree learner
* remove printf debug information
* use full new cuda tree learner only when using single GPU
* disable all CUDA code when using CPU version
* recover main.cpp
* add cpp files for multi value bins
* update LightGBM.vcxproj
* update LightGBM.vcxproj
* fix lint errors
* fix lint errors
* fix lint errors
* update Makevars
* fix lint errors
* fix the case with 0 feature and 0 bin
* fix split finding for invalid leaves
* create cuda column data when loaded from bin file
* fix lint errors
* hide GetRowWiseData when cuda is not used
* recover default device type to cpu
* fix na_as_missing case
* fix cuda feature meta information
* fix UpdateDataIndexToLeafIndexKernel
* create CUDA trees when needed in CUDADataPartition::UpdateTrainScore
* add refit by tree for cuda tree learner
* fix test_refit in test_engine.py
* create set of large bin partitions in CUDARowData
* add histogram construction for columns with a large number of bins
* add find best split for categorical features on CUDA
* add bitvectors for categorical split
* cuda data partition split for categorical features
* fix split tree with categorical feature
* fix categorical feature splits
* refactor cuda_data_partition.cu with multi-level templates
* refactor CUDABestSplitFinder by grouping task information into struct
* pre-allocate space for vector split_find_tasks_ in CUDABestSplitFinder
* fix misuse of reference
* remove useless changes
* add support for path smoothing
* virtual destructor for LightGBM::Tree
* fix overlapped cat threshold in best split infos
* reset histogram pointers in data partition and split finder in ResetConfig
* comment useless parameter
* fix reverse case when na is missing and default bin is zero
* fix mfb_is_na and mfb_is_zero and is_single_feature_column
* remove debug log
* fix cat_l2 when one-hot
* fix gradient copy when data subset is used
* switch shared histogram size according to CUDA version
* set gpu_use_dp=true in cuda tests
* revert modification in config.h
* fix setting of gpu_use_dp=true in .ci/test.sh
* fix linter errors
* fix linter error
* remove useless change
* recover main.cpp
* separate cuda_exp and cuda
* fix ci bash scripts
* add description for cuda_exp
* add USE_CUDA_EXP flag
* switch off USE_CUDA_EXP
* revert changes in python-packages
* more careful separation for USE_CUDA_EXP
* fix CUDARowData::DivideCUDAFeatureGroups
* fix set fields for cuda metadata
* revert config.h
* fix test settings for cuda experimental version
* skip some tests due to unsupported features or differences in implementation details for CUDA Experimental version
* fix lint issue by adding a blank line
* fix lint errors by re-sorting imports
* fix lint errors by re-sorting imports
* fix lint errors by re-sorting imports
* merge cuda.yml and cuda_exp.yml
* update python version in cuda.yml
* remove cuda_exp.yml
* remove unrelated changes
* fix compilation warnings
* fix cuda exp ci task name
* recover task
* use multi-level template in histogram construction
* check split only in debug mode
* ignore NVCC related lines in parameter_generator.py
* update job name for CUDA tests
* apply review suggestions
* Update .github/workflows/cuda.yml
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Update .github/workflows/cuda.yml
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* update header
* remove useless TODOs
* remove [TODO(shiyu1994): constrain the split with min_data_in_group] and record in #5062
* #include <LightGBM/utils/log.h> for USE_CUDA_EXP only
* fix include order
* fix include order
* remove extra space
* address review comments
* add warning when cuda_exp is used together with deterministic
* add comment about gpu_use_dp in .ci/test.sh
* revert changing order of included headers
Co-authored-by: Yu Shi <shiyu1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* cmake: use object library to avoid duplicate compilation.
* debug: verbose make log for building r package.
* Include /usr/local/include for AppleClang.
* Revert "debug: verbose make log for building r package."
* update cmake comment and fix indentation
* debug cmake USE_DEBUG.
* Revert "debug cmake USE_DEBUG."
* Add -fPIC for building shared library.
* Always set -fPIC for non MSVC compiler.
* debug: print exception in setup.py
* debug: print cmake output for vs build.
* debug: set opencl related target_xxx on lightgbm_objs.
* Define compile definitions, link libraries on lightgbm_objs.
* Add PUBLIC to target_link_libraries to expose library dependency.
* Use target_link_libraries on object library.
This should propagate usage requirements.
* Fix CUDA linking.
Linking object library (lightgbm_objs) to object library (histograms)
does not link objects.
* Use PUBLIC link for lightgbm lib.
* Set cuda related properties on final targets.
* Remove debugging changes.
Revert "debug: print exception in setup.py"
Revert "debug: print cmake output for vs build."
etc.
* Remove -D_lightgbm_EXPORTS.
* Revert to add -fPIC only for NOT USE_DEBUG.
* Enable PIC for shared lib.
* Fix enable PIC.
* Use -fPIC for shared lib.
* testlightgbm depends only on object files.
* tweak build for R.
* Try to remove OpenMP related include dir settings.
* link with openmp for capi object library.
* Use PUBLIC for _lightgbm target_link_libraries.
* Try removing exports definition.
* fix typo
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* fix typo
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Add some comments for cmake code.
* Try to fix cmake warnings CUDA.
* revert accidentally committed R-package path change.
* Try to fix cmake CUDA warnings, set for _lightgbm target.
* Try to fix cmake CUDA warnings, set for lightgbm target.
* empty commit to trigger ci
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>