Mirror of https://github.com/microsoft/CBM.git
Parent
c093687595
Commit
c2ac332f56
@@ -0,0 +1,43 @@
+name: Publish Python 🐍 distributions 📦 to PyPI and TestPyPI
+
+on: push
+
+jobs:
+  build-n-publish:
+    name: Build and publish Python 🐍 distributions 📦 to PyPI and TestPyPI
+    runs-on: ubuntu-18.04
+
+    steps:
+    - uses: actions/checkout@master
+    - name: Set up Python 3.8
+      uses: actions/setup-python@v1
+      with:
+        python-version: 3.8
+
+    - name: Install pypa/build
+      run: >-
+        python -m
+        pip install
+        build
+        --user
+
+    - name: Build a binary wheel and a source tarball
+      run: >-
+        python -m
+        build
+        --sdist
+        --wheel
+        --outdir dist/
+        .
+
+    - name: Publish distribution 📦 to Test PyPI
+      uses: pypa/gh-action-pypi-publish@master
+      with:
+        password: ${{ secrets.TEST_PYPI_API_TOKEN }}
+        repository_url: https://test.pypi.org/legacy/
+
+    - name: Publish distribution 📦 to PyPI
+      if: startsWith(github.ref, 'refs/tags')
+      uses: pypa/gh-action-pypi-publish@master
+      with:
+        password: ${{ secrets.PYPI_API_TOKEN }}
62
README.md
@@ -1,14 +1,60 @@
-# Project
+# Cyclic Boosting Machines

-> This repo has been populated by an initial template to help get you started. Please
-> make sure to update the content to build a great experience for community-building.
+This is an efficient, scikit-learn-compatible implementation of the machine learning algorithm [Cyclic Boosting -- an explainable supervised machine learning algorithm](https://arxiv.org/abs/2002.03425), intended specifically for predicting count data such as sales and demand.

-As the maintainer of this project, please make a few updates:
+## Usage

-- Improving this README.MD file to provide a great experience
-- Updating SUPPORT.MD with content about this project's support experience
-- Understanding the security reporting process in SECURITY.MD
-- Remove this section from the README
+```python
+import cbm
+import numpy as np
+
+x_train: np.ndarray = ...  # will be cast to uint8, so make sure you featurize beforehand
+y_train: np.ndarray = ...  # will be cast to uint32
+
+model = cbm.CBM()
+model.fit(x_train, y_train)
+
+x_test: np.ndarray = ...
+y_pred = model.predict(x_test)
+```
+
+## Explainability
+
+The CBM model predicts by multiplying the global mean by the weight estimate of the bin that each feature value falls into. The weights can therefore be interpreted as a percentage increase or decrease relative to the global mean. For example, a weight of 1.2 for the bin _Monday_ of the feature _Day-of-Week_ can be interpreted as a 20% increase of the target.
+
+<img src="https://render.githubusercontent.com/render/math?math=\hat{y}_i = \mu \cdot \prod^{p}_{j=1} f^k_j"> with <img src="https://render.githubusercontent.com/render/math?math=k = \{x_{j,i} \in b^k_j \}">
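
As a rough illustration of the formula above, the prediction for a single sample can be sketched in plain Python. The function name `predict_one` and all numbers are hypothetical and are not part of the `cbm` package; the sketch only assumes that `weights[j][k]` holds the factor for bin `k` of feature `j`.

```python
from math import prod

def predict_one(global_mean, weights, bins):
    """Return global_mean times the weight of the active bin of every feature.

    weights[j][k] is the factor for bin k of feature j; bins[j] is the bin
    index that the sample's value of feature j falls into.
    """
    return global_mean * prod(weights[j][k] for j, k in enumerate(bins))

# Hypothetical fitted values, for illustration only.
global_mean = 10.0
weights = [
    [1.2, 0.9],       # feature 0, e.g. Day-of-Week: bin 0 = Monday
    [0.5, 1.0, 1.5],  # feature 1: three bins
]

# A Monday sample whose second feature falls into bin 2.
y_hat = predict_one(global_mean, weights, bins=[0, 2])
print(round(y_hat, 6))
```

With these made-up numbers the Monday weight of 1.2 raises the prediction by 20% and the second feature's weight of 1.5 raises it by a further 50%, so the result is roughly 18 for a global mean of 10.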
+
+```python
+model = cbm.CBM()
+model.fit(x_train, y_train)
+
+import matplotlib.pyplot as plt
+
+fig, axes = plt.subplots(2,
+                         int(np.ceil(x_train.shape[1] / 2)),
+                         figsize=(25, 20),
+                         sharex=True)
+
+for feature in np.arange(x_train.shape[1]):  # x_train is assumed to be a pandas DataFrame of categorical columns here
+    w = model.weights[feature]
+
+    ax = axes[feature % 2, feature // 2]
+    (ax.barh(x_train.iloc[:, feature].cat.categories.astype(str),
+             np.array(w) - 1,  # subtract 1 so the bars extend in both directions from zero
+            )
+    )
+
+    ax.set_title(x_train.columns[feature])
+    ax.xaxis.set_tick_params(which='both', labelbottom=True)
+
+fig.tight_layout()
+```
+
+## Featurization
+
+Categorical features can be passed as 0-based indices. At the moment a maximum of 255 categories is supported.
+
+Continuous features need to be discretized. [pandas.qcut](https://pandas.pydata.org/docs/reference/api/pandas.qcut.html) for equal-sized bins or [numpy.interp](https://numpy.org/doc/stable/reference/generated/numpy.interp.html) for equidistant bins have yielded good results for us.
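
The two discretization strategies mentioned above can be sketched with NumPy alone. The helper names are hypothetical, and the quantile step uses `numpy.quantile` with `numpy.digitize` as a stand-in for `pandas.qcut`. The sketch assumes the input values are not all identical and produces 0-based `uint8` bin indices, matching the cast described in the usage example.

```python
import numpy as np

MAX_BINS = 255  # the README states a maximum of 255 categories per feature

def equal_frequency_bins(values, n_bins):
    """Quantile-based binning: each bin receives roughly the same number of samples."""
    assert 1 <= n_bins <= MAX_BINS
    # Inner quantile edges only; the outermost bins are open-ended.
    inner_edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, inner_edges).astype(np.uint8)

def equal_width_bins(values, n_bins):
    """Equidistant binning: map [min, max] linearly onto [0, n_bins] with numpy.interp."""
    assert 1 <= n_bins <= MAX_BINS
    scaled = np.interp(values, (values.min(), values.max()), (0, n_bins))
    # Truncate to an integer index; the maximum value would land in bin n_bins, so clip it.
    return np.minimum(scaled.astype(np.int64), n_bins - 1).astype(np.uint8)

# Toy continuous feature, discretized both ways and stacked into a feature matrix.
values = np.arange(100, dtype=float)
x_train = np.column_stack([
    equal_frequency_bins(values, 4),
    equal_width_bins(values, 4),
])
```

For evenly spread data like this toy input the two strategies give the same bins; they differ on skewed data, where quantile bins keep the per-bin sample counts balanced while equidistant bins keep the bin widths equal.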

 ## Contributing

@@ -0,0 +1,5 @@
+[build-system]
+# These are the assumed default build requirements from pip:
+# https://pip.pypa.io/en/stable/reference/pip/#pep-517-and-518-support
+requires = ["setuptools>=40.8.0", "wheel"]
+build-backend = "setuptools.build_meta"
2
setup.py
@@ -15,7 +15,7 @@ class get_pybind_include(object):


 setup(
-    name="cbm",
+    name="cyclicbm",
     version="0.0.1",
     description="Cyclic Boosting Machines",
     url="https://github.com/Microsoft/CBM",