Commit graph

83 commits

Author SHA1 Message Date
Nikita Titov 41152eab4b [python][docs] reworked predict method in sklearn wrapper and docs improvements (#1351)
* fixed docs

* reworked predict method of sklearn wrapper

* fixed encapsulation

* added test

* fixed consistency between docstring and params docs

* fixed verbose

* replaced predict_proba with predict in test

* fixed verbose again

* fixed fraction params descriptions

* added description of skip_drop and drop_rate constraints

* fixed subsample_freq consistency with C++ default value

* fixed nice look of params list

* made force splits json file example clickable

* fixed nice look of metrics list and added comma

* reduced warning in test about same param specified twice

* replaced pred_parameter with **kwargs in predict method

* added test for **kwargs in predict method

* fixed warnings

* fixed pylint
2018-05-10 17:48:29 +08:00
Nikita Titov 21487d8a28 [ci][python] updated pep8 to pycodestyle (#1358)
* updated pep8 to pycodestyle

* fixed E722 do not use bare 'except'

* fixed W605 invalid escape sequence '\*'

* fixed W504 line break after binary operator

* ignore W605 invalid escape sequence '\*' in nuget builder

* made pycodestyle happy
2018-05-08 12:23:35 +08:00
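The E722 and W605 fixes named in the commit above can be illustrated with a minimal sketch. The helper names and the regex below are hypothetical and are not taken from the LightGBM source; they only show the general shape of each fix.

```python
import re


def parse_int(text, default=0):
    """Return int(text), or default when text is not a valid integer."""
    try:
        return int(text)
    except (TypeError, ValueError):
        # E722: name the expected exceptions instead of using a bare 'except:'
        return default


# W605: '\*' inside a normal string literal is an invalid escape sequence.
# A raw string passes the backslash through to the regex engine unchanged.
STAR_PATTERN = re.compile(r'\*')


def count_stars(text):
    """Count literal asterisks in text."""
    return len(STAR_PATTERN.findall(text))
```

Catching specific exceptions keeps `KeyboardInterrupt` and `SystemExit` from being swallowed, which is the usual motivation for the E722 check.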
Guolin Ke e005cdb049 Monotone Constraint (#1314) 2018-04-18 11:12:36 +08:00
ebernhardson 7e186a5783 Experimental support for HDFS (#1243)
* Read and write datasets from hdfs.
* Only enabled when cmake is run with -DUSE_HDFS:BOOL=TRUE
* Introduces VirtualFile(Reader|Writer) to abstract VFS differences
2018-02-27 12:53:21 +08:00
Guolin Ke 1e61f24f72 try to fix problem with multi-dimensional sliced object. (#1210) 2018-01-24 23:46:55 +08:00
Guolin Ke 5a89a76df3 fix early stopping edge case (#1133)
* fix early stopping edge case

* fix message.

* fix tests

* fix GPU tests.
2017-12-23 11:55:53 +08:00
Guolin Ke 8a5ec366aa Speed up saving and loading model (#1083)
* remove protobuf

* add version number

* remove pmml script

* use float for split gain

* fix warnings

* refine the read model logic of gbdt

* fix compile error

* improve decode speed

* fix some bugs

* fix double accuracy problem

* fix bug

* multi-thread save model

* speed up save model to string

* parallel save/load model

* fix some warnings.

* fix warnings.

* fix a bug

* remove debug output

* fix doc

* fix max_bin warning in tests.

* fix max_bin warning

* fix pylint

* clean code for stringToArray

* clean code for TToString

* remove max_bin

* replace "class" with typename
2017-11-26 16:07:06 +08:00
wxchan bc0579c81b add init_score & test cpp and python result consistency (#1007)
* add init_score & test cpp and python result consistency

* try fix common.h

* Fix tests (#3)

* update atof

* fix bug

* fix tests.

* fix bug

* fix dtypes

* fix categorical feature override

* fix protobuf on vs build (#1004)

* [optional] support protobuf

* fix windows/LightGBM.vcxproj

* add doc

* fix doc

* fix vs support (#2)

* fix vs support

* fix cmake

* fix #1012

* [python] add network config api  (#1019)

* add network

* update doc

* add float tolerance in bin finder.

* fix a bug

* update tests

* add double tolerance on tree model

* fix tests

* simplify the double comparison

* fix lightsvm zero base

* move double tolerance to the bin finder.

* fix pylint

* clean test.sh

* add sklearn test

* remove underline

* clean codes

* set random_state=None

* add last line

* fix doc

* rename file

* try fix test
2017-11-09 23:24:20 +08:00
Nikita Titov b9dc51a6c5 [python] fixed stratifiedkfold for non-classifying tasks (#1016)
* Update test_engine.py

* Update test_engine.py
2017-10-24 10:56:58 +08:00
Guolin Ke 087ec475b2 Use one-vs-other for small categorical features.
commit c9e123f24fcbb159c04e6694c7f830530bb2f27e
Author: Guolin Ke <i@yumumu.me>
Date:   Wed Oct 18 10:00:19 2017 +0800

    change default max_cat_to_onehot

commit 805a5c3125b9979d634922e1708877fa0fec80c6
Author: Guolin Ke <i@yumumu.me>
Date:   Tue Oct 17 22:57:18 2017 +0800

    use one hot coding for the small cats
2017-10-18 10:00:55 +08:00
Guolin Ke db9ec2176c reduce parameters in categorical split 2017-10-17 01:58:15 +08:00
Guolin Ke eadc7b9d3f Refine categorical features (#993)
* many fixes for categorical feature

* add l2 to categorical split.

* remove useless file

* update version

* add cat_l2

* update appveyor version

* remove file

* fix tests.

* change default cat_l2 value

* fix a bug in bin finder

* change default cat_smooth_ratio
2017-10-16 14:55:25 +08:00
Guolin Ke ef221275d1 fix #991 (#992)
* refine categorical split

* a bug fix

* fix a bug
2017-10-14 00:01:38 +08:00
ChenZhiyong cc11525d26 refine categorical split (#919)
* refine categorical split

* add test
2017-09-28 12:29:18 +08:00
Nikita Titov 0350a9a6ff [python] bring pandas support to the sklearn wrapper back (#904)
* added test for sklearn handle categorical features

* use raw X, y in sklearn wrapper in case of pandas.DataFrame

* fixed probs
2017-09-19 14:55:03 +08:00
Scott Lundberg 67c2bdf905 Fix feature attributions for regression models and add Python bindings (#861)
* Fix feature attributions for regression models and add Python bindings

* Address pylint issue

* Lazy fix missing tree depth info
2017-09-16 23:03:07 +08:00
Nikita Titov 8984111f05 [python] [setup] improving installation (#880)
* disabled logs from compilers; fixed #874

* fixed safe clear_fplder

* added windows folder to manifest.in

* added windows folder to build

* added library path

* added compilation with MSBuild from .sln-file

* fixed unknown PlatformToolset returns exitcode 0

* hotfix

* updated Readme

* removed return

* added installation with mingw test to appveyor

* let's test appveyor with both VS 2015 and VS 2017; but MinGW isn't installed on VS 2017 image

* fixed built-in name 'file'

* simplified appveyor

* removed excess data_files

* fixed unreadable paths

* separated exceptions for cmake and mingw

* refactored silent_call

* don't create artifacts with VS 2015 and mingw

* be more precise with python versioning in Travis

* removed unnecessary if statement

* added classifiers for PyPI and python versions badge

* changed python version in travis

* added support of scikit-learn 0.18.x

* added more python versions to Travis

* added more python versions to Appveyor

* reduced number of tests in Travis

* Travis trick is not needed anymore

* attempt to fix according to https://github.com/Microsoft/LightGBM/pull/880#discussion_r137438856
2017-09-08 18:17:00 +08:00
Nikita Titov db8b6b00a1 [python] fixed sklearn test on python 2.7 (#888)
* fixed sklearn test on python 2.7

* commit to show that problem has been solved

* come back to python 3.6

* removed warnings check
2017-09-05 21:14:14 +08:00
Nikita Titov 015c8fff72 [python] improved sklearn interface (#870)
* improved sklearn interface; added sklearns' tests

* moved best_score into the if statement

* improved docstrings; simplified LGBMCheckConsistentLength

* fixed typo

* pylint

* updated example

* fixed Ranker interface

* added missed boosting_type

* fixed more comfortable autocomplete without unused objects

* removed check for None of eval_at

* fixed according to review

* fixed typo

* added description of fit return type

* dictionary->dict for short

* markdown cleanup
2017-09-05 18:19:45 +08:00
wxchan 603bffcfac [MRG] expose feature importance to c_api (#860)
* expose feature importance to c_api

* support type=gain

* remove dump model from examples and tests temporarily because it's unstable

* use double instead of float
2017-08-24 23:09:43 +08:00
Nikita Titov 3f0061ca5f [python] parameters renaming for sklearn naming convention (#854)
* updated scikit-learn interface

* fixed better description

* updated set_params()

* removed backward compatibility

* removed excess lines

* replaced pop with setdefault

* added deprecated warnings

* added tests
2017-08-23 13:25:30 +08:00
Mikhail Korobov 6be7aa7ab8 TST check that single-leaf trees don't cause segfaults (#852) 2017-08-20 23:40:57 +08:00
wxchan c8142e3037 [MRG] [python] check params for num_boost_round & early_stopping_rounds (#806)
* check params

* add test case

* fix pylint
2017-08-18 19:07:57 +08:00
Guolin Ke 4e9b589bfd update tests. 2017-08-18 19:01:21 +08:00
j-mark-hou e7c53270a0 added test for training when both train and valid are subsets of a si… (#759)
* added test for training when both train and valid are subsets of a single lgb.Dataset object

* pep8 changes

* more pep8

* added test involving subsets of subsets of lgb.Dataset objects

* minor fix to construction of X matrix

* even more pep8

* simplified test further
2017-08-18 18:52:01 +08:00
Guolin Ke 00cb04a255 Better missing value handle (#747)
* finish the data loading part

* allow prediction.

* fix bug for decision type.

* finish split finding part

* fix bugs.

* bug fixed. add a test .

* fix pep8 .

* update documents.

* fix test bugs.

* fix a format

* fix import error in python test.

* disable missing handle in categorical features.

* fix a bug.

* add more tests.

* fix pep8

* fix bugs.

* remove the missing handle code for categorical feature.
2017-07-30 20:09:41 +08:00
Guolin Ke 6a7470a2b0 Add Random Forest Mode (#678)
* add draft of RF.

* fix score bugs.

* fix scores.

* fix tests.

* update document

* fix GetPredictAt
2017-07-11 19:44:46 +08:00
Guolin Ke 6d4c7b03b7 Support early stopping of prediction in CLI (#565)
* fix multi-threading.

* fix name style.

* support in CLI version.

* remove warnings.

* Not default parameters.

* fix if...else... .

* fix bug.

* fix warning.

* refine c_api.

* fix R-package.

* fix R's warning.

* fix tests.

* fix pep8 .
2017-05-30 18:28:17 +08:00
cbecker 993bbd5f91 Add prediction early stopping (#550)
* Add early stopping for prediction

* Fix GBDT if-else prediction with early stopping

* Small C++ embellishments to early stopping API and functions

* Fix early stopping efficiency issue by creating a singleton for no early stopping

* Python improvements to early stopping API

* Add assertion check for binary and multiclass prediction score length

* Update vcxproj and vcxproj.filters with new early stopping files

* Remove inline from PredictRaw(), the linker was not able to find it otherwise
2017-05-29 23:09:58 +08:00
Tsukasa OMOTO babf01c2f6 python: use pytest for tests (#498)
https://docs.pytest.org/
2017-05-11 13:27:18 +08:00
wxchan 35440b9cb9 [python-package] change default best_iteration to 0 (#495)
* make test fail

* change default best_iteration to 0

* fix test

* change data_splitter to folds in cv

* update docs
2017-05-06 23:37:41 +08:00
wxchan a39141e10b re-write test cases: remove global template (#479) 2017-05-02 10:13:06 +08:00
wxchan ef408f552a lambdarank cv (#459) 2017-04-26 15:05:26 +08:00
wxchan 7339ed648c replace whitespaces with underlines in feature name (#426)
* change whitespace to underline in feature names

* add test

* fix bug

* fix bug

* warning -> fatal
2017-04-18 11:03:32 +08:00
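The renaming described in the commit above can be sketched in a few lines. This is an illustration only: the helper name is hypothetical, and treating every whitespace character (not just spaces) as replaceable is an assumption, not something the commit message states.

```python
import re


def sanitize_feature_names(names):
    """Replace each whitespace character in every feature name with an underscore.

    Hypothetical helper; assumes any character matched by \\s counts as whitespace.
    """
    return [re.sub(r'\s', '_', name) for name in names]
```

For example, `sanitize_feature_names(["age group", "income"])` returns `["age_group", "income"]`.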
Guolin Ke f6b25ac98d fix test. 2017-04-17 11:17:37 +08:00
wxchan 45c1c6e8c1 add best score (#413) 2017-04-15 19:04:35 +08:00
Laurae ba99bcddc6 Switch RMSE to MSE (true L2 loss) (#408)
* RMSE (L2) -> MSE (true L2)

* Remove sqrt unneeded reference

* Square L2 test (RMSE to MSE)

* No square root on test

* Attempt to add RMSE
2017-04-13 18:43:41 +08:00
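The two metrics named in the commit above differ only by a square root. A minimal sketch in plain Python follows; the function names are illustrative and are not LightGBM API.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))
```

Because the square root is monotonic, ranking models by either metric gives the same order; only the reported scale changes.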
Huan Zhang a5f11d47ef Use only one thread in test_basic.py (#412) 2017-04-13 12:20:29 +08:00
Huan Zhang 0bb4a825af Initial GPU acceleration support for LightGBM (#368)
* add dummy gpu solver code

* initial GPU code

* fix crash bug

* first working version

* use asynchronous copy

* use a better kernel for root

* parallel read histogram

* sparse features now work, but no acceleration, compute on CPU

* compute sparse feature on CPU simultaneously

* fix big bug; add gpu selection; add kernel selection

* better debugging

* clean up

* add feature scatter

* Add sparse_threshold control

* fix a bug in feature scatter

* clean up debug

* temporarily add OpenCL kernels for k=64,256

* fix up CMakeList and definition USE_GPU

* add OpenCL kernels as string literals

* Add boost.compute as a submodule

* add boost dependency into CMakeList

* fix opencl pragma

* use pinned memory for histogram

* use pinned buffer for gradients and hessians

* better debugging message

* add double precision support on GPU

* fix boost version in CMakeList

* Add a README

* reconstruct GPU initialization code for ResetTrainingData

* move data to GPU in parallel

* fix a bug during feature copy

* update gpu kernels

* update gpu code

* initial port to LightGBM v2

* speedup GPU data loading process

* Add 4-bit bin support to GPU

* re-add sparse_threshold parameter

* remove kMaxNumWorkgroups and allows an unlimited number of features

* add feature mask support for skipping unused features

* enable kernel cache

* use GPU kernels without feature masks when all features are used

* REAdme.

* REAdme.

* update README

* fix typos (#349)

* change compile to gcc on Apple as default

* clean vscode related file

* refine api of constructing from sampling data.

* fix bug in the last commit.

* more efficient algorithm to sample k from n.

* fix bug in filter bin

* change to boost from average output.

* fix tests.

* only stop training when all classes are finished in multi-class.

* limit the max tree output. change hessian in multi-class objective.

* robust tree model loading.

* fix test.

* convert the probabilities to raw score in boost_from_average of classification.

* fix the average label for binary classification.

* Add boost_from_average to docs (#354)

* don't use "ConvertToRawScore" for self-defined objective function.

* boost_from_average doesn't seem to work well in binary classification. remove it.

* For a better jump link (#355)

* Update Python-API.md

* for a better jump in page

A space is needed between `#` and the header's content according to GitHub's markdown format [guideline](https://guides.github.com/features/mastering-markdown/)

After adding the spaces, we can jump to the exact position in the page by clicking the link.

* fixed something mentioned by @wxchan

* Update Python-API.md

* add FitByExistingTree.

* adapt GPU tree learner for FitByExistingTree

* avoid NaN output.

* update boost.compute

* fix typos (#361)

* fix broken links (#359)

* update README

* disable GPU acceleration by default

* fix image url

* cleanup debug macro

* remove old README

* do not save sparse_threshold_ in FeatureGroup

* add details for new GPU settings

* ignore submodule when doing pep8 check

* allocate workspace for at least one thread during building Feature4

* move sparse_threshold to class Dataset

* remove duplicated code in GPUTreeLearner::Split

* Remove duplicated code in FindBestThresholds and BeforeFindBestSplit

* do not rebuild ordered gradients and hessians for sparse features

* support feature groups in GPUTreeLearner

* Initial parallel learners with GPU support

* add option device, cleanup code

* clean up FindBestThresholds; add some omp parallel

* constant hessian optimization for GPU

* Fix GPUTreeLearner crash when there is zero feature

* use np.testing.assert_almost_equal() to compare lists of floats in tests

* travis for GPU
2017-04-09 21:53:14 +08:00
Laurae 21861cd49f [Python-package]: Fix RandomState issue #376 (#377)
* Python: Fix RandomState issue #376

* Add test case for Python's Shuffle=True
2017-04-02 09:00:50 +08:00
wxchan 6ed335df29 refine early stopping and add a test case (#369) 2017-03-28 10:37:38 +08:00
Guolin Ke b38a19a489 fix test. 2017-03-24 21:01:34 +08:00
Guolin Ke 2e962c779f fix tests. 2017-03-23 00:46:36 +08:00
Guolin Ke ef77806934 Add categorical feature support back. 2017-03-01 21:00:46 +08:00
Guolin Ke 4f77bd2860 update to v2. 2017-03-01 20:59:35 +08:00
wxchan 13d4581b96 add data_splitter to cv (#298)
* add data_splitter for cv

* update gitignore

* clean code
2017-02-18 20:46:44 +08:00
wxchan eef4d2d0c8 refine plotting library (#282)
* refine plot

* use warnings

* refine  logic

* revert 'move to compat.py'
2017-02-03 14:38:47 +08:00
wxchan ab4ed7254c add feature name (#280) 2017-02-02 15:36:44 +08:00
wxchan 58565547e8 [python-package] add plot metrics (#266)
* add plot metrics

* move 'raise Exception' to check_not_tuple_of_2_elements

* rename 'plot_metrics' to 'plot_metric'

* fix misleading message/docs

* change 'Metrics' in title to 'Metric'

* fix misleading comment
2017-01-28 19:10:02 +08:00
wxchan 8980fc7220 [python-package] add plot tree (#262)
* add plot tree

* add docs

* add example

* add test

* fix test

* fix decision type

* add show_info

* use feature name if available
2017-01-25 19:03:00 +08:00