Commit Graph

75 Commits

Author SHA1 Message Date
Guolin Ke 4e9b589bfd update tests. 2017-08-18 19:01:21 +08:00
wxchan a4ab155368 [python] refine: solve several trivial issues (#753)
* refine python codes

* fix appveyor test

* add note to feature_importances
2017-08-18 18:52:22 +08:00
j-mark-hou e7c53270a0 added test for training when both train and valid are subsets of a si… (#759)
* added test for training when both train and valid are subsets of a single lgb.Dataset object

* pep8 changes

* more pep8

* added test involving subsets of subsets of lgb.Dataset objects

* minor fix to construction of X matrix

* even more pep8

* simplified test further
2017-08-18 18:52:01 +08:00
Guolin Ke 00cb04a255 Better missing value handle (#747)
* finish the data loading part

* allow prediction.

* fix bug for decision type.

* finish split finding part

* fix bugs.

* bug fixed. add a test.

* fix pep8.

* update documents.

* fix test bugs.

* fix a format

* fix import error in python test.

* disable missing handle in categorical features.

* fix a bug.

* add more tests.

* fix pep8

* fix bugs.

* remove the missing handle code for categorical feature.
2017-07-30 20:09:41 +08:00
Guolin Ke 6a7470a2b0 Add Random Forest Mode (#678)
* add draft of RF.

* fix score bugs.

* fix scores.

* fix tests.

* update document

* fix GetPredictAt
2017-07-11 19:44:46 +08:00
Guolin Ke 80c641cd17 [python] Submit to PyPI (#635)
* add make command to the python package.

* Update README.rst

* Update README.rst

* Update README.rst

* fix tests.

* fix unix build

* update readme

* fix setup.py

* update travis

* Update .travis.yml

* Update test.py

* some fixes.

* check the 64-bit python

* fix build.

* refine MANIFEST.in

* update Manifest.in

* add more build options.

* Add fatal in cmake

* fix a endif.

* fix bugs.

* fix pep8

* add test for the pip package build

* add test pip install in travis.

* fix version with pre-compile dll

* fix readme.rst

* update readme
2017-06-20 13:17:02 +08:00
Guolin Ke 4d2aa8403c Add Appveyor for windows CI (#634)
* add appveyor

* add nuget and artifacts

* Update appveyor.yml

* remove python27 test
2017-06-18 19:44:57 +08:00
Guolin Ke 6d4c7b03b7 Support early stopping of prediction in CLI (#565)
* fix multi-threading.

* fix name style.

* support in CLI version.

* remove warnings.

* Not default parameters.

* fix if...else... .

* fix bug.

* fix warning.

* refine c_api.

* fix R-package.

* fix R's warning.

* fix tests.

* fix pep8.
2017-05-30 18:28:17 +08:00
cbecker 993bbd5f91 Add prediction early stopping (#550)
* Add early stopping for prediction

* Fix GBDT if-else prediction with early stopping

* Small C++ embellishments to early stopping API and functions

* Fix early stopping efficiency issue by creating a singleton for no early stopping

* Python improvements to early stopping API

* Add assertion check for binary and multiclass prediction score length

* Update vcxproj and vcxproj.filters with new early stopping files

* Remove inline from PredictRaw(), the linker was not able to find it otherwise
2017-05-29 23:09:58 +08:00
Tsukasa OMOTO babf01c2f6 python: use pytest for tests (#498)
https://docs.pytest.org/
2017-05-11 13:27:18 +08:00
wxchan 35440b9cb9 [python-package] change default best_iteration to 0 (#495)
* make test fail

* change default best_iteration to 0

* fix test

* change data_splitter to folds in cv

* update docs
2017-05-06 23:37:41 +08:00
wxchan 1ad85be5c1 [MRG] auto replace gbdt::prediction with if-else prediction (#482)
* auto replace gbdt::prediction

* add test if_else_prediction

* not override gbdt_prediction.cpp

* close ifstream

* re-order .travis.yml
2017-05-02 23:07:03 +08:00
wxchan a39141e10b re-write test cases: remove global template (#479) 2017-05-02 10:13:06 +08:00
wxchan ef408f552a lambdarank cv (#459) 2017-04-26 15:05:26 +08:00
wxchan 7339ed648c replace whitespaces with underlines in feature name (#426)
* change whitespace to underline in feature names

* add test

* fix bug

* fix bug

* warning -> fatal
2017-04-18 11:03:32 +08:00
Guolin Ke f6b25ac98d fix test. 2017-04-17 11:17:37 +08:00
wxchan 45c1c6e8c1 add best score (#413) 2017-04-15 19:04:35 +08:00
Laurae ba99bcddc6 Switch RMSE to MSE (true L2 loss) (#408)
* RMSE (L2) -> MSE (true L2)

* Remove sqrt unneeded reference

* Square L2 test (RMSE to MSE)

* No square root on test

* Attempt to add RMSE
2017-04-13 18:43:41 +08:00
Huan Zhang a5f11d47ef Use only one thread in test_basic.py (#412) 2017-04-13 12:20:29 +08:00
Huan Zhang 0bb4a825af Initial GPU acceleration support for LightGBM (#368)
* add dummy gpu solver code

* initial GPU code

* fix crash bug

* first working version

* use asynchronous copy

* use a better kernel for root

* parallel read histogram

* sparse features now works, but no acceleration, compute on CPU

* compute sparse feature on CPU simultaneously

* fix big bug; add gpu selection; add kernel selection

* better debugging

* clean up

* add feature scatter

* Add sparse_threshold control

* fix a bug in feature scatter

* clean up debug

* temporarily add OpenCL kernels for k=64,256

* fix up CMakeList and definition USE_GPU

* add OpenCL kernels as string literals

* Add boost.compute as a submodule

* add boost dependency into CMakeList

* fix opencl pragma

* use pinned memory for histogram

* use pinned buffer for gradients and hessians

* better debugging message

* add double precision support on GPU

* fix boost version in CMakeList

* Add a README

* reconstruct GPU initialization code for ResetTrainingData

* move data to GPU in parallel

* fix a bug during feature copy

* update gpu kernels

* update gpu code

* initial port to LightGBM v2

* speedup GPU data loading process

* Add 4-bit bin support to GPU

* re-add sparse_threshold parameter

* remove kMaxNumWorkgroups and allows an unlimited number of features

* add feature mask support for skipping unused features

* enable kernel cache

* use GPU kernels without feature masks when all features are used

* REAdme.

* REAdme.

* update README

* fix typos (#349)

* change compile to gcc on Apple as default

* clean vscode related file

* refine api of constructing from sampling data.

* fix bug in the last commit.

* more efficient algorithm to sample k from n.

* fix bug in filter bin

* change to boost from average output.

* fix tests.

* only stop training when all classes are finished in multi-class.

* limit the max tree output. change hessian in multi-class objective.

* robust tree model loading.

* fix test.

* convert the probabilities to raw score in boost_from_average of classification.

* fix the average label for binary classification.

* Add boost_from_average to docs (#354)

* don't use "ConvertToRawScore" for self-defined objective function.

* boost_from_average doesn't seem to work well in binary classification. remove it.

* For a better jump link (#355)

* Update Python-API.md

* for a better jump in page

A space is needed between `#` and the header's content according to GitHub's markdown format [guideline](https://guides.github.com/features/mastering-markdown/)

After adding the spaces, we can jump to the exact position in the page by clicking the link.

* fixed something mentioned by @wxchan

* Update Python-API.md

* add FitByExistingTree.

* adapt GPU tree learner for FitByExistingTree

* avoid NaN output.

* update boost.compute

* fix typos (#361)

* fix broken links (#359)

* update README

* disable GPU acceleration by default

* fix image url

* cleanup debug macro

* remove old README

* do not save sparse_threshold_ in FeatureGroup

* add details for new GPU settings

* ignore submodule when doing pep8 check

* allocate workspace for at least one thread during building Feature4

* move sparse_threshold to class Dataset

* remove duplicated code in GPUTreeLearner::Split

* Remove duplicated code in FindBestThresholds and BeforeFindBestSplit

* do not rebuild ordered gradients and hessians for sparse features

* support feature groups in GPUTreeLearner

* Initial parallel learners with GPU support

* add option device, cleanup code

* clean up FindBestThresholds; add some omp parallel

* constant hessian optimization for GPU

* Fix GPUTreeLearner crash when there is zero feature

* use np.testing.assert_almost_equal() to compare lists of floats in tests

* travis for GPU
2017-04-09 21:53:14 +08:00
Laurae 21861cd49f [Python-package]: Fix RandomState issue #376 (#377)
* Python: Fix RandomState issue #376

* Add test case for Python's Shuffle=True
2017-04-02 09:00:50 +08:00
wxchan 6ed335df29 refine early stopping and add a test case (#369) 2017-03-28 10:37:38 +08:00
Guolin Ke b38a19a489 fix test. 2017-03-24 21:01:34 +08:00
Guolin Ke 2e962c779f fix tests. 2017-03-23 00:46:36 +08:00
Guolin Ke ef77806934 Add categorical feature support back. 2017-03-01 21:00:46 +08:00
Guolin Ke 4f77bd2860 update to v2. 2017-03-01 20:59:35 +08:00
wxchan 13d4581b96 add data_splitter to cv (#298)
* add data_splitter for cv

* update gitignore

* clean code
2017-02-18 20:46:44 +08:00
wxchan eef4d2d0c8 refine plotting library (#282)
* refine plot

* use warnings

* refine logic

* revert 'move to compat.py'
2017-02-03 14:38:47 +08:00
wxchan ab4ed7254c add feature name (#280) 2017-02-02 15:36:44 +08:00
wxchan 58565547e8 [python-package] add plot metrics (#266)
* add plot metrics

* move 'raise Exception' to check_not_tuple_of_2_elements

* rename 'plot_metrics' to 'plot_metric'

* fix misleading message/docs

* change 'Metrics' in title to 'Metric'

* fix misleading comment
2017-01-28 19:10:02 +08:00
wxchan 8980fc7220 [python-package] add plot tree (#262)
* add plot tree

* add docs

* add example

* add test

* fix test

* fix decision type

* add show_info

* use feature name if available
2017-01-25 19:03:00 +08:00
wxchan a4a0235d17 use json instead of repr/eval for pandas_categorical (#247)
* use json instead of repr/eval for pandas_categorical

* fix json dumps with numpy data

* add more test cases
2017-01-23 19:06:36 +08:00
wxchan abaefb54ce [python-package] add plot importance (#237)
* add plot importance

* add plot example
2017-01-20 13:53:18 +08:00
wxchan 57d552726f fix bug for pandas auto categorical_feature (#218)
* fix bug for categorical_feature

* add test on load model with categorical feature

* add unseen category in test dataset

* save/load pandas_categorical to model

* fix logic

* cast pandas columns to string

* add load pandas_categorical from file to _InnerPredictor init
2017-01-16 17:01:48 +08:00
ClimbsRocks b0f7aa508a sklearn compatibility update- renames .feature_importance_ to .feature_importances_ 2017-01-12 00:01:37 -08:00
wxchan 6c248d37c3 support pandas categorical (#193)
* support pandas categorical

* refine logic

* make default=auto

* fix train/valid categorical codes

* add test

* unify set _predictor

* fix tests

* fix warning

* support feature_name=int
2017-01-12 15:27:35 +08:00
wxchan 7f4610a8ad refine pmml.py (#179)
* add pmml to test

* refine pmml.py

* use ~n instead of -n-1

* change map to list comprehension

* fix check

* fix 'use ~n instead of -n-1'

* fix exception
2017-01-10 00:27:52 +08:00
wxchan 1b7643ba60 `_is_constructed` -> `handle is not None`; add FAQ for docs (#173)
* use handle is not None for _is_constructed

* sort imports; clean code; move FAQ to docs
2017-01-08 16:51:04 +08:00
Guolin Ke 551d59ca71 R package (#168)
* finish R's c_api

* clean code

* fix sizeof pointer in 32bit system.

* add predictor class

* add Dataset class

* format code

* add booster

* add type check for expose function

* add a simple callback

* add all callbacks

* finish the basic training logic

* update docs

* add a simple training interface

* add basic test

* adapt the changes in c_api

* add test for Dataset

* add test for custom obj/eval functions

* fix python test

* fix bug in metadata init

* fix R CMD check
2017-01-08 09:13:36 +08:00
wxchan e29ab9f682 fix reset parameter; re-define CVBooster (#166)
* fix reset parameter

* redefine CVBooster

* env.model won't be None

* update env.params
2017-01-06 21:37:35 +08:00
wxchan f893fbf6f2 simplify Dataset class (#163)
* simplify Dataset class

* simplify check output; fix deprecated warning
2017-01-05 12:40:30 +08:00
Guolin Ke 21ee59476e fix double to string precision (std::numeric_limits<double>::digits10 + 2) 2017-01-04 16:26:39 +08:00
wxchan dd425973a5 python code style with pep8 (#161)
* format python code with pep8

* **DO NOT MERGE** deliberately break rules to see what will happen during check

* Revert "**DO NOT MERGE** deliberately break rules to see what will happen during check"

This reverts commit 0db93cd7a4.

* fix format in test.py

* add docs for pep-8
2017-01-04 15:19:12 +08:00
Guolin Ke cf4edf0e39 add test for check model file persistence 2017-01-04 09:31:04 +08:00
wxchan 1c6c7046f0 add @property to sklearn interface (#155)
* add @property to sklearn interface

* add deprecated; fix binary_metric
2017-01-03 13:44:26 +08:00
Guolin Ke f6024c8bd3 fix test for continued train, due to default saved number of model is best_iteration now 2017-01-02 12:38:10 +08:00
Guolin Ke 28972b8667 [python-package] fix tmp file access problem in windows 2017-01-02 10:56:46 +08:00
wxchan a034ceeb3a support pickle (#151)
* support pickle

* add pickle/joblib test; change test_basic to unittest

* remove file for deepcopy

* fix tests

* test basic predict from file

* Revert "test basic predict from file"

This reverts commit 60d2c31585.

* test predict from file

* use tempfile for copy & pickle

* use tempfile w/o binary mode

* clean test
2017-01-01 21:05:44 +08:00
Guolin Ke 72c2d79087 some refine for c_api (#152)
1. add csc support
2. some data type from float to double
2016-12-31 14:48:19 +08:00
wxchan bd7274baee add callbacks to sklearn interface (#150) 2016-12-31 11:12:00 +08:00