Fast Retraining
In this repo we compare two of the fastest boosted decision tree libraries: XGBoost and LightGBM. We evaluate them on datasets from several domains and of different sizes.
On July 25, 2017, we published a blog post evaluating both libraries and discussing the benchmark results. The post is Lessons Learned From Benchmarking Fast Machine Learning Algorithms.
Installation and Setup
The installation instructions can be found in INSTALL.md.
Project
In the folder experiments you can find the different experiments of the project. We developed 6 experiments with the CPU and GPU versions of the libraries.
- Airline
- BCI
- Football
- Planet Kaggle
- Fraud Detection
- HIGGS
The folder experiments/libs contains the common code for the project.
Benchmark
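The following table summarizes the training times (in seconds) and the resulting time ratios for the benchmarks performed in the experiments. Each cell shows the CPU value followed by the GPU value in parentheses; a dash marks a configuration with no reported result.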
Dataset | Experiment | Data size | Features | xgb time: CPU (GPU) | xgb_hist time: CPU (GPU) | lgb time: CPU (GPU) | ratio xgb/lgb: CPU (GPU) | ratio xgb_hist/lgb: CPU (GPU) |
---|---|---|---|---|---|---|---|---|
Football | Link CPU Link GPU | 19673 | 46 | 2.27 (7.09) | 2.47 (4.58) | 0.58 (0.97) | 3.90 (7.26) | 4.25 (4.69) |
Fraud Detection | Link CPU Link GPU | 284807 | 30 | 4.34 (5.80) | 2.01 (1.64) | 0.66 (0.29) | 6.58 (19.74) | 3.04 (5.58) |
BCI | Link CPU Link GPU | 20497 | 2048 | 11.51 (12.93) | 41.84 (42.69) | 7.31 (2.76) | 1.57 (4.67) | 5.72 (15.43) |
Planet Kaggle | Link CPU Link GPU | 40479 | 2048 | 313.89 (-) | 2115.28 (2028.43) | 194.57 (317.68) | 1.61 (-) | 10.87 (6.38) |
HIGGS | Link CPU Link GPU | 11000000 | 28 | 2996.16 (-) | 121.21 (114.88) | 119.34 (71.87) | 25.10 (-) | 1.01 (1.59) |
Airline | Link CPU Link GPU | 115069017 | 13 | - (-) | 1242.09 (1271.91) | 1056.20 (645.40) | - (-) | 1.17 (1.97) |
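To make the ratio columns concrete, here is a minimal sketch of how a training time could be measured and turned into a ratio. It is not the code used in the experiments: the `timer` and `time_ratio` helpers are hypothetical names introduced for illustration, and the timed workload is a placeholder rather than an actual XGBoost or LightGBM training call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(results, label):
    """Store the wall-clock duration of the enclosed block in results[label].

    Hypothetical helper; the repository may measure time differently.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def time_ratio(numerator_seconds, denominator_seconds):
    """Return how many times longer the first run took than the second."""
    if denominator_seconds <= 0:
        raise ValueError("denominator time must be positive")
    return numerator_seconds / denominator_seconds


# Placeholder workload standing in for a model training call.
timings = {}
with timer(timings, "placeholder"):
    sum(range(100_000))

# Football, CPU: xgb time divided by lgb time, using the rounded table values.
football_cpu_ratio = time_ratio(2.27, 0.58)
print(f"xgb/lgb ratio (Football, CPU): {football_cpu_ratio:.2f}")
```

Ratios recomputed from the rounded times in the table can differ from the reported ratios in the second decimal place, presumably because the reported ratios were derived from unrounded timings.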
The next table summarizes the predictive performance of each library, measured with the F1 score.
Dataset | Experiment | Data size | Features | xgb F1: CPU (GPU) | xgb_hist F1: CPU (GPU) | lgb F1: CPU (GPU) |
---|---|---|---|---|---|---|
Football | Link Link | 19673 | 46 | 0.458 (0.470) | 0.460 (0.472) | 0.459 (0.470) |
Fraud Detection | Link Link | 284807 | 30 | 0.824 (0.821) | 0.802 (0.814) | 0.813 (0.811) |
BCI | Link Link | 20497 | 2048 | 0.110 (0.093) | 0.142 (0.120) | 0.137 (0.138) |
Planet Kaggle | Link Link | 40479 | 2048 | 0.805 (-) | 0.822 (0.822) | 0.822 (0.821) |
HIGGS | Link Link | 11000000 | 28 | 0.763 (-) | 0.767 (0.767) | 0.768 (0.767) |
Airline | Link Link | 115069017 | 13 | - (-) | 0.741 (0.745) | 0.732 (0.745) |
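For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for a binary problem from raw counts. The function name `f1_from_counts` is hypothetical, and this README does not state how the scores above were averaged (for example on multi-class or multi-label datasets), so treat it as an illustration of the definition only.

```python
def f1_from_counts(true_pos, false_pos, false_neg):
    """Binary F1 score from confusion-matrix counts.

    F1 = 2 * precision * recall / (precision + recall),
    which simplifies to 2*TP / (2*TP + FP + FN).
    Returns 0.0 when there are no positive predictions or labels at all.
    """
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        return 0.0
    return 2 * true_pos / denominator


# Illustrative counts, not taken from any experiment in this repo.
example_f1 = f1_from_counts(true_pos=8, false_pos=2, false_neg=2)
print(f"F1: {example_f1:.3f}")
```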
The experiments were run on an Azure NV24 VM with 24 cores, 224 GB of memory, and 4 NVIDIA M60 GPUs. Both the CPU and GPU runs used Ubuntu 16.04.
Contributing
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.