LightGBM/python-package/README.rst

276 строки
15 KiB
ReStructuredText
Исходник Обычный вид История

LightGBM Python-package
2016-12-01 11:32:47 +03:00
=======================
|License| |Python Versions| |PyPI Version| |Downloads| |API Docs|
2016-12-01 11:32:47 +03:00
Installation
------------
Preparation
'''''''''''
32-bit Python is not supported. Please install 64-bit version. If you have a strong need to install with 32-bit Python, refer to `Build 32-bit Version with 32-bit Python section <#build-32-bit-version-with-32-bit-python>`__.
`setuptools <https://pypi.org/project/setuptools>`_ is needed.
Install from `PyPI <https://pypi.org/project/lightgbm>`_
''''''''''''''''''''''''''''''''''''''''''''''''''''''''
.. code:: sh
pip install lightgbm
You may need to install `wheel <https://pythonwheels.com>`_ via ``pip install wheel`` first.
Compiled library that is included in the wheel file supports both **GPU** and **CPU** versions out of the box. This feature is experimental and available only for **Windows** and **Linux** currently. To use **GPU** version you only need to install OpenCL Runtime libraries. For NVIDIA and AMD GPU they are included in the ordinary drivers for your graphics card, so no action is required. If you would like your AMD or Intel CPU to act like a GPU (for testing and debugging) you can install `AMD APP SDK <https://github.com/microsoft/LightGBM/releases/download/v2.0.12/AMD-APP-SDKInstaller-v3.0.130.135-GA-windows-F-x64.exe>`_ on **Windows** and `PoCL <http://portablecl.org>`_ on **Linux**. Many modern Linux distributions provide packages for PoCL, look for ``pocl-opencl-icd`` on Debian-based distributions and ``pocl`` on RedHat-based distributions.
For **Windows** users, `VC runtime <https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads>`_ is needed if **Visual Studio** (2015 or newer) is not installed.
For **Linux** users, **glibc** >= 2.14 is required for LightGBM ``<=3.3.3`` and **glibc** >= 2.28 is required for newer versions. Also, in some rare cases, when you hit ``OSError: libgomp.so.1: cannot open shared object file: No such file or directory`` error during importing LightGBM, you need to install OpenMP runtime library separately (use your package manager and search for ``lib[g|i]omp`` for doing this).
For **macOS** (we provide wheels for 3 newest macOS versions) users:
- Starting from version 2.2.1, the library file in distribution wheels is built by the **Apple Clang** (Xcode_8.3.3 for versions 2.2.1 - 2.3.1, Xcode_9.4.1 for versions 2.3.2 - 3.3.2 and Xcode_11.7 from version 4.0.0) compiler. This means that you don't need to install the **gcc** compiler anymore. Instead of that you need to install the **OpenMP** library, which is required for running LightGBM on the system with the **Apple Clang** compiler. You can install the **OpenMP** library by the following command: ``brew install libomp``.
- For version smaller than 2.2.1 and not smaller than 2.1.2, **gcc-8** with **OpenMP** support must be installed first. Refer to `Installation Guide <https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst#gcc>`__ for installation of **gcc-8** with **OpenMP** support.
- For version smaller than 2.1.2, **gcc-7** with **OpenMP** is required.
Build from Sources
******************
2017-07-06 04:29:29 +03:00
.. code:: sh
2017-06-25 05:21:19 +03:00
pip install --no-binary :all: lightgbm
2017-06-25 05:21:19 +03:00
For **Linux** and **macOS** users, installation from sources requires installed `CMake`_.
2017-06-25 05:21:19 +03:00
For **Linux** users, **glibc** >= 2.28 is required. Also, in some rare cases you may need to install OpenMP runtime library separately (use your package manager and search for ``lib[g|i]omp`` for doing this).
For **macOS** users, you can perform installation either with **Apple Clang** or **gcc**.
- In case you prefer **Apple Clang**, you should install **OpenMP** (details for installation can be found in `Installation Guide <https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst#apple-clang>`__) first and **CMake** version 3.16 or higher is required.
- In case you prefer **gcc**, you need to install it (details for installation can be found in `Installation Guide <https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst#gcc>`__) and specify compilers by running ``export CXX=g++-7 CC=gcc-7`` (replace "7" with version of **gcc** installed on your machine) first.
2017-07-06 04:29:29 +03:00
For **Windows** users, **Visual Studio** (or `VS Build Tools <https://visualstudio.microsoft.com/downloads/>`_) is needed. If you get any errors during installation, you may need to install `CMake`_ (version 3.8 or higher).
2017-07-06 04:29:29 +03:00
Build Threadless Version
~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: sh
pip install lightgbm --install-option=--nomp
All requirements, except the **OpenMP** requirement, from `Build from Sources section <#build-from-sources>`__ apply for this installation option as well.
It is **strongly not recommended** to use this version of LightGBM!
Build MPI Version
~~~~~~~~~~~~~~~~~
.. code:: sh
pip install lightgbm --install-option=--mpi
All requirements from `Build from Sources section <#build-from-sources>`__ apply for this installation option as well.
For **Windows** users, compilation with **MinGW-w64** is not supported and `CMake`_ (version 3.8 or higher) is strongly required.
**MPI** libraries are needed: details for installation can be found in `Installation Guide <https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst#build-mpi-version>`__.
Build GPU Version
~~~~~~~~~~~~~~~~~
2017-07-22 12:44:21 +03:00
.. code:: sh
pip install lightgbm --install-option=--gpu
All requirements from `Build from Sources section <#build-from-sources>`__ apply for this installation option as well.
For **Windows** users, `CMake`_ (version 3.8 or higher) is strongly required.
**Boost** and **OpenCL** are needed: details for installation can be found in `Installation Guide <https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst#build-gpu-version>`__. Almost always you also need to pass ``OpenCL_INCLUDE_DIR``, ``OpenCL_LIBRARY`` options for **Linux** and ``BOOST_ROOT``, ``BOOST_LIBRARYDIR`` options for **Windows** to **CMake** via ``pip`` options, like
.. code:: sh
pip install lightgbm --install-option=--gpu --install-option="--opencl-include-dir=/usr/local/cuda/include/" --install-option="--opencl-library=/usr/local/cuda/lib64/libOpenCL.so"
All available options:
- boost-root
- boost-dir
- boost-include-dir
- boost-librarydir
- opencl-include-dir
- opencl-library
For more details see `FindBoost <https://cmake.org/cmake/help/latest/module/FindBoost.html>`__ and `FindOpenCL <https://cmake.org/cmake/help/latest/module/FindOpenCL.html>`__.
Build CUDA Version
~~~~~~~~~~~~~~~~~~
.. code:: sh
pip install lightgbm --install-option=--cuda
All requirements from `Build from Sources section <#build-from-sources>`__ apply for this installation option as well, and `CMake`_ (version 3.16 or higher) is strongly required.
**CUDA** library (version 10.0 or higher) is needed: details for installation can be found in `Installation Guide <https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst#build-cuda-version-experimental>`__.
To use the CUDA version within Python, pass ``{"device": "cuda"}`` respectively in parameters.
[CUDA] New CUDA version Part 1 (#4630) * new cuda framework * add histogram construction kernel * before removing multi-gpu * new cuda framework * tree learner cuda kernels * single tree framework ready * single tree training framework * remove comments * boosting with cuda * optimize for best split find * data split * move boosting into cuda * parallel synchronize best split point * merge split data kernels * before code refactor * use tasks instead of features as units for split finding * refactor cuda best split finder * fix configuration error with small leaves in data split * skip histogram construction of too small leaf * skip split finding of invalid leaves stop when no leaf to split * support row wise with CUDA * copy data for split by column * copy data from host to CPU by column for data partition * add synchronize best splits for one leaf from multiple blocks * partition dense row data * fix sync best split from task blocks * add support for sparse row wise for CUDA * remove useless code * add l2 regression objective * sparse multi value bin enabled for CUDA * fix cuda ranking objective * support for number of items <= 2048 per query * speedup histogram construction by interleaving global memory access * split optimization * add cuda tree predictor * remove comma * refactor objective and score updater * before use struct * use structure for split information * use structure for leaf splits * return CUDASplitInfo directly after finding best split * split with CUDATree directly * use cuda row data in cuda histogram constructor * clean src/treelearner/cuda * gather shared cuda device functions * put shared CUDA functions into header file * change smaller leaf from <= back to < for consistent result with CPU * add tree predictor * remove useless cuda_tree_predictor * predict on CUDA with pipeline * add global sort algorithms * add global argsort for queries with many items in ranking tasks * remove limitation of maximum number of items per query in ranking * add cuda metrics * fix CUDA AUC * remove debug code * add regression metrics * remove useless file * don't use mask in shuffle reduce * add more regression objectives * fix cuda mape loss add cuda xentropy loss * use template for different versions of BitonicArgSortDevice * add multiclass metrics * add ndcg metric * fix cross entropy objectives and metrics * fix cross entropy and ndcg metrics * add support for customized objective in CUDA * complete multiclass ova for CUDA * separate cuda tree learner * use shuffle based prefix sum * clean up cuda_algorithms.hpp * add copy subset on CUDA * add bagging for CUDA * clean up code * copy gradients from host to device * support bagging without using subset * add support of bagging with subset for CUDAColumnData * add support of bagging with subset for dense CUDARowData * refactor copy sparse subrow * use copy subset for column subset * add reset train data and reset config for CUDA tree learner add deconstructors for cuda tree learner * add USE_CUDA ifdef to cuda tree learner files * check that dataset doesn't contain CUDA tree learner * remove printf debug information * use full new cuda tree learner only when using single GPU * disable all CUDA code when using CPU version * recover main.cpp * add cpp files for multi value bins * update LightGBM.vcxproj * update LightGBM.vcxproj fix lint errors * fix lint errors * fix lint errors * update Makevars fix lint errors * fix the case with 0 feature and 0 bin fix split finding for invalid leaves create cuda column data when loaded from bin file * fix lint errors hide GetRowWiseData when cuda is not used * recover default device type to cpu * fix na_as_missing case fix cuda feature meta information * fix UpdateDataIndexToLeafIndexKernel * create CUDA trees when needed in CUDADataPartition::UpdateTrainScore * add refit by tree for cuda tree learner * fix test_refit in test_engine.py * create set of large bin partitions in CUDARowData * add histogram construction for columns with a large number of bins * add find best split for categorical features on CUDA * add bitvectors for categorical split * cuda data partition split for categorical features * fix split tree with categorical feature * fix categorical feature splits * refactor cuda_data_partition.cu with multi-level templates * refactor CUDABestSplitFinder by grouping task information into struct * pre-allocate space for vector split_find_tasks_ in CUDABestSplitFinder * fix misuse of reference * remove useless changes * add support for path smoothing * virtual destructor for LightGBM::Tree * fix overlapped cat threshold in best split infos * reset histogram pointers in data partition and spllit finder in ResetConfig * comment useless parameter * fix reverse case when na is missing and default bin is zero * fix mfb_is_na and mfb_is_zero and is_single_feature_column * remove debug log * fix cat_l2 when one-hot fix gradient copy when data subset is used * switch shared histogram size according to CUDA version * gpu_use_dp=true when cuda test * revert modification in config.h * fix setting of gpu_use_dp=true in .ci/test.sh * fix linter errors * fix linter error remove useless change * recover main.cpp * separate cuda_exp and cuda * fix ci bash scripts add description for cuda_exp * add USE_CUDA_EXP flag * switch off USE_CUDA_EXP * revert changes in python-packages * more careful separation for USE_CUDA_EXP * fix CUDARowData::DivideCUDAFeatureGroups fix set fields for cuda metadata * revert config.h * fix test settings for cuda experimental version * skip some tests due to unsupported features or differences in implementation details for CUDA Experimental version * fix lint issue by adding a blank line * fix lint errors by resorting imports * fix lint errors by resorting imports * fix lint errors by resorting imports * merge cuda.yml and cuda_exp.yml * update python version in cuda.yml * remove cuda_exp.yml * remove unrelated changes * fix compilation warnings fix cuda exp ci task name * recover task * use multi-level template in histogram construction check split only in debug mode * ignore NVCC related lines in parameter_generator.py * update job name for CUDA tests * apply review suggestions * Update .github/workflows/cuda.yml Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update .github/workflows/cuda.yml Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * update header * remove useless TODOs * remove [TODO(shiyu1994): constrain the split with min_data_in_group] and record in #5062 * #include <LightGBM/utils/log.h> for USE_CUDA_EXP only * fix include order * fix include order * remove extra space * address review comments * add warning when cuda_exp is used together with deterministic * add comment about gpu_use_dp in .ci/test.sh * revert changing order of included headers Co-authored-by: Yu Shi <shiyu1994@qq.com> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
2022-03-23 05:39:23 +03:00
Build HDFS Version
~~~~~~~~~~~~~~~~~~
.. code:: sh
pip install lightgbm --install-option=--hdfs
All requirements from `Build from Sources section <#build-from-sources>`__ apply for this installation option as well.
**HDFS** library is needed: details for installation can be found in `Installation Guide <https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst#build-hdfs-version>`__.
Note that the installation process of HDFS version was tested only on **Linux**.
Build with MinGW-w64 on Windows
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: sh
pip install lightgbm --install-option=--mingw
`CMake`_ and `MinGW-w64 <https://www.mingw-w64.org/>`_ should be installed first.
It is recommended to use **Visual Studio** for its better multithreading efficiency in **Windows** for many-core systems
(see `Question 4 <https://github.com/microsoft/LightGBM/blob/master/docs/FAQ.rst#4-i-am-using-windows-should-i-use-visual-studio-or-mingw-for-compiling-lightgbm>`__ and `Question 8 <https://github.com/microsoft/LightGBM/blob/master/docs/FAQ.rst#8-cpu-usage-is-low-like-10-in-windows-when-using-lightgbm-on-very-large-datasets-with-many-core-systems>`__).
2016-12-01 11:32:47 +03:00
Build 32-bit Version with 32-bit Python
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: sh
pip install lightgbm --install-option=--bit32
By default, installation in environment with 32-bit Python is prohibited. However, you can remove this prohibition on your own risk by passing ``bit32`` option.
It is **strongly not recommended** to use this version of LightGBM!
Build with Time Costs Output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: sh
pip install lightgbm --install-option=--time-costs
Use this option to make LightGBM output time costs for different internal routines, to investigate and benchmark its performance.
Install from `conda-forge channel <https://anaconda.org/conda-forge/lightgbm>`_
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
If you use ``conda`` to manage Python dependencies, you can install LightGBM using ``conda install``.
We strongly recommend installation from the ``conda-forge`` channel and not from the ``default`` one due to many reasons. The main ones are less time delay for new releases, greater number of supported architectures and better handling of dependency conflicts, especially workaround for OpenMP is crucial for LightGBM. More details can be found in `this comment <https://github.com/microsoft/LightGBM/issues/4948#issuecomment-1013766397>`_.
**Note**: The `lightgbm conda-forge feedstock <https://github.com/conda-forge/lightgbm-feedstock>`_ is not maintained by LightGBM maintainers.
.. code:: sh
conda install -c conda-forge lightgbm
Install from GitHub
'''''''''''''''''''
2016-12-01 11:32:47 +03:00
All requirements from `Build from Sources section <#build-from-sources>`__ apply for this installation option as well.
2017-07-06 04:29:29 +03:00
For **Windows** users, if you get any errors during installation and there is the warning ``WARNING:LightGBM:Compilation with MSBuild from existing solution file failed.`` in the log, you should install `CMake`_ (version 3.8 or higher).
2017-06-25 05:21:19 +03:00
.. code:: sh
git clone --recursive https://github.com/microsoft/LightGBM.git
cd LightGBM/python-package
# export CXX=g++-7 CC=gcc-7 # macOS users, if you decided to compile with gcc, don't forget to specify compilers (replace "7" with version of gcc installed on your machine)
python setup.py install
Note: ``sudo`` (or administrator rights in **Windows**) may be needed to perform the command.
Run ``python setup.py install --nomp`` to disable **OpenMP** support. All requirements from `Build Threadless Version section <#build-threadless-version>`__ apply for this installation option as well.
Run ``python setup.py install --mpi`` to enable **MPI** support. All requirements from `Build MPI Version section <#build-mpi-version>`__ apply for this installation option as well.
Run ``python setup.py install --mingw``, if you want to use **MinGW-w64** on **Windows** instead of **Visual Studio**. All requirements from `Build with MinGW-w64 on Windows section <#build-with-mingw-w64-on-windows>`__ apply for this installation option as well.
Run ``python setup.py install --gpu`` to enable GPU support. All requirements from `Build GPU Version section <#build-gpu-version>`__ apply for this installation option as well. To pass additional options to **CMake** use the following syntax: ``python setup.py install --gpu --opencl-include-dir=/usr/local/cuda/include/``, see `Build GPU Version section <#build-gpu-version>`__ for the complete list of them.
Run ``python setup.py install --cuda`` to enable CUDA support. All requirements from `Build CUDA Version section <#build-cuda-version>`__ apply for this installation option as well.
Run ``python setup.py install --hdfs`` to enable HDFS support. All requirements from `Build HDFS Version section <#build-hdfs-version>`__ apply for this installation option as well.
2016-12-01 11:32:47 +03:00
Run ``python setup.py install --bit32``, if you want to use 32-bit version. All requirements from `Build 32-bit Version with 32-bit Python section <#build-32-bit-version-with-32-bit-python>`__ apply for this installation option as well.
Run ``python setup.py install --time-costs``, if you want to output time costs for different internal routines. All requirements from `Build with Time Costs Output section <#build-with-time-costs-output>`__ apply for this installation option as well.
If you get any errors during installation or due to any other reasons, you may want to build dynamic library from sources by any method you prefer (see `Installation Guide <https://github.com/microsoft/LightGBM/blob/master/docs/Installation-Guide.rst>`__) and then just run ``python setup.py install --precompile``.
Build Wheel File
****************
You can use ``python setup.py bdist_wheel`` instead of ``python setup.py install`` to build wheel file and use it for installation later. This might be useful for systems with restricted or completely without network access.
Install Dask-package
''''''''''''''''''''
.. warning::
Dask-package is only tested on Linux.
To install all additional dependencies required for Dask-package, you can append ``[dask]`` to LightGBM package name:
.. code:: sh
pip install lightgbm[dask]
Or replace ``python setup.py install`` with ``pip install -e .[dask]`` if you are installing the package from source files.
Troubleshooting
---------------
In case you are facing any errors during the installation process, you can examine ``$HOME/LightGBM_compilation.log`` file, in which all operations are logged, to get more details about occurred problem. Also, please attach this file to the issue on GitHub to help faster indicate the cause of the error.
Refer to `FAQ <https://github.com/microsoft/LightGBM/tree/master/docs/FAQ.rst>`_.
Examples
--------
Refer to the walk through examples in `Python guide folder <https://github.com/microsoft/LightGBM/tree/master/examples/python-guide>`_.
Development Guide
-----------------
The code style of Python-package follows `PEP 8 <https://www.python.org/dev/peps/pep-0008/>`_.
The package's documentation strings (docstrings) are written in the `numpydoc style <https://numpydoc.readthedocs.io/en/latest/format.html>`_.
To check that a contribution to the package matches its style expectations, run the following from the root of the repo.
.. code:: sh
sh .ci/lint-python.sh .
.. |License| image:: https://img.shields.io/github/license/microsoft/lightgbm.svg
:target: https://github.com/microsoft/LightGBM/blob/master/LICENSE
.. |Python Versions| image:: https://img.shields.io/pypi/pyversions/lightgbm.svg?logo=python&logoColor=white
:target: https://pypi.org/project/lightgbm
.. |PyPI Version| image:: https://img.shields.io/pypi/v/lightgbm.svg?logo=pypi&logoColor=white
:target: https://pypi.org/project/lightgbm
.. |Downloads| image:: https://pepy.tech/badge/lightgbm
:target: https://pepy.tech/project/lightgbm
.. |API Docs| image:: https://readthedocs.org/projects/lightgbm/badge/?version=latest
:target: https://lightgbm.readthedocs.io/en/latest/Python-API.html
.. _CMake: https://cmake.org/