update readme

add CI/CD
2021-09-17 15:10:14 +02:00 · 2021-09-17 15:10:14 +02:00 · c2ac332f56
--- a/.github/workflows/publish-to-test-pypi.yml
+++ b/.github/workflows/publish-to-test-pypi.yml
@@ -0,0 +1,43 @@
+name: Publish Python 🐍 distributions 📦 to PyPI and TestPyPI
+
+on: push
+
+jobs:
+  build-n-publish:
+    name: Build and publish Python 🐍 distributions 📦 to PyPI and TestPyPI
+    runs-on: ubuntu-18.04
+
+    steps:
+    - uses: actions/checkout@master
+    - name: Set up Python 3.8
+      uses: actions/setup-python@v1
+      with:
+        python-version: 3.8
+
+    - name: Install pypa/build
+      run: >-
+        python -m
+        pip install
+        build
+        --user
+
+    - name: Build a binary wheel and a source tarball
+      run: >-
+        python -m
+        build
+        --sdist
+        --wheel
+        --outdir dist/
+        .
+
+    - name: Publish distribution 📦 to Test PyPI
+      uses: pypa/gh-action-pypi-publish@master
+      with:
+        password: ${{ secrets.TEST_PYPI_API_TOKEN }}
+        repository_url: https://test.pypi.org/legacy/
+
+    - name: Publish distribution 📦 to PyPI
+      if: startsWith(github.ref, 'refs/tags')
+      uses: pypa/gh-action-pypi-publish@master
+      with:
+        password: ${{ secrets.PYPI_API_TOKEN }}
--- a/README.md
+++ b/README.md
@@ -1,14 +1,60 @@
-# Project
+# Cyclic Boosting Machines

-> This repo has been populated by an initial template to help get you started. Please
-> make sure to update the content to build a great experience for community-building.
+This is an efficient, scikit-learn-compatible implementation of the machine learning algorithm [Cyclic Boosting -- an explainable supervised machine learning algorithm](https://arxiv.org/abs/2002.03425), specifically for predicting count data such as sales and demand.

-As the maintainer of this project, please make a few updates:
+## Usage

-- Improving this README.MD file to provide a great experience
-- Updating SUPPORT.MD with content about this project's support experience
-- Understanding the security reporting process in SECURITY.MD
-- Remove this section from the README
+```python
+import cbm
+import numpy as np
+
+x_train: np.ndarray = ...  # will be cast to uint8, so featurize (bin) your features beforehand
+y_train: np.ndarray = ...  # will be cast to uint32
+
+model = cbm.CBM()
+model.fit(x_train, y_train)
+
+x_test: np.ndarray = ...  # featurized the same way as x_train
+y_pred = model.predict(x_test)
+```
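The usage snippet casts inputs to `uint8`, so feature values outside 0 to 255 cannot be represented. Assuming the cast behaves like NumPy's `astype(np.uint8)` (an assumption; the cast happens inside `cbm`), out-of-range integers would wrap around silently instead of raising. The hypothetical helper `check_uint8_features` below is not part of `cbm`; it is a minimal sketch of one way to catch this before calling `fit` or `predict`:

```python
import numpy as np


def check_uint8_features(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper (not part of cbm): validate featurized inputs.

    Integer values outside [0, 255] would wrap around under a plain
    astype(np.uint8), so they are rejected explicitly instead.
    """
    x = np.asarray(x)
    if not np.issubdtype(x.dtype, np.integer):
        raise TypeError(f"expected integer bin indices, got dtype {x.dtype}")
    if x.size and (x.min() < 0 or x.max() > 255):
        raise ValueError("feature values must be bin indices in [0, 255]")
    return x.astype(np.uint8)


# Without the check, a plain cast maps 256 to 0 and 300 to 44.
raw = np.array([[3, 256], [1, 300]])
print(raw.astype(np.uint8))
```

Calling `check_uint8_features(x_train)` before `model.fit` turns a silent wrap-around into an explicit error.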
+
+## Explainability
+
+CBM predicts by multiplying the global mean of the target by one weight per feature, namely the weight of the bin that the sample's value of that feature falls into. The weights can therefore be read as relative increases or decreases from the global mean: e.g., a weight of 1.2 for the bin _Monday_ of the feature _Day-of-Week_ means a 20% increase of the target.
+
+$$\hat{y}_i = \mu \cdot \prod_{j=1}^{p} f^k_j \quad \text{with} \quad k = \lbrace x_{j,i} \in b^k_j \rbrace$$
+
+where $\mu$ is the global mean, $p$ the number of features, and $f^k_j$ the weight of bin $b^k_j$, i.e. the bin $k$ of feature $j$ into which the value $x_{j,i}$ of sample $i$ falls.
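To make the multiplicative structure concrete, the sketch below evaluates the prediction formula by hand. The values of `mu` and `weights` are made up for illustration, not output of a fitted `cbm.CBM`, and this is plain NumPy rather than how the library computes predictions internally:

```python
import numpy as np

# Illustrative values only (assumed, not produced by cbm).
mu = 10.0  # global mean of the target
# weights[j][k] plays the role of f_j^k: the factor for bin k of feature j
weights = [
    np.array([0.8, 1.0, 1.2]),  # feature 0, e.g. a Day-of-Week-like feature with 3 bins
    np.array([1.5, 0.9]),       # feature 1 with 2 bins
]


def predict_row(x_row: np.ndarray) -> float:
    """y_hat = mu * prod_j f_j^{k_j}, where k_j is the bin index of feature j."""
    factors = [weights[j][k] for j, k in enumerate(x_row)]
    return mu * float(np.prod(factors))


x = np.array([[2, 0], [0, 1]], dtype=np.uint8)  # bin indices per sample
y_hat = np.array([predict_row(row) for row in x])

# A weight w reads as a relative change of (w - 1) * 100 percent.
pct = [(w - 1.0) * 100 for w in (1.2, 0.9)]
```

The first row picks factors 1.2 and 1.5, so its prediction is 10 × 1.2 × 1.5 = 18; each factor above 1 raises the prediction and each factor below 1 lowers it.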
+
+```python
+import cbm
+import matplotlib.pyplot as plt
+import numpy as np
+
+# x_train is assumed here to be a pandas DataFrame with categorical columns,
+# since the bin labels below are read via .iloc, .cat.categories and .columns.
+model = cbm.CBM()
+model.fit(x_train, y_train)
+
+n_features = x_train.shape[1]
+fig, axes = plt.subplots(2,
+                         int(np.ceil(n_features / 2)),
+                         figsize=(25, 20),
+                         sharex=True,
+                         squeeze=False)  # keep axes 2-D even with only 1 or 2 features
+
+for feature in range(n_features):
+    w = model.weights[feature]
+    ax = axes[feature % 2, feature // 2]
+
+    # subtract 1 so bars extend left/right from zero (a weight of 1.0 means no effect)
+    ax.barh(x_train.iloc[:, feature].cat.categories.astype(str),
+            np.array(w) - 1)
+
+    ax.set_title(x_train.columns[feature])
+    ax.xaxis.set_tick_params(which='both', labelbottom=True)
+
+fig.tight_layout()
+```
+
+## Featurization
+
+Categorical features can be passed as 0-based indices, with a maximum of 255 categories supported at the moment.
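One way to turn a string-valued categorical column into 0-based indices is `np.unique` with `return_inverse=True`, sketched below on a made-up Day-of-Week column. pandas users can get the same from `Series.cat.codes`, which marks missing values as -1, so those would need handling first.

```python
import numpy as np

day_of_week = np.array(["Mon", "Tue", "Mon", "Sun", "Tue"])  # illustrative data

# np.unique returns the sorted distinct values and, with return_inverse=True,
# the 0-based index of each element into that sorted array.
categories, codes = np.unique(day_of_week, return_inverse=True)

if len(categories) > 255:
    raise ValueError("too many categories for a uint8 feature")

codes = codes.astype(np.uint8)
```

Keeping `categories` around lets you map each bin index back to its label, e.g. when plotting the weights.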
+
+Continuous features need to be discretized first. Using [pandas.qcut](https://pandas.pydata.org/docs/reference/api/pandas.qcut.html) for equal-frequency (quantile) bins or [numpy.interp](https://numpy.org/doc/stable/reference/generated/numpy.interp.html) for equal-width bins has yielded good results for us.
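A NumPy-only sketch of both discretization strategies, on synthetic log-normal "prices" (the data and `n_bins = 10` are assumptions for illustration). The quantile variant should give bins comparable to `pandas.qcut(price, n_bins, labels=False)` for data without ties; the equal-width variant uses `numpy.interp` to rescale values linearly onto `[0, n_bins]` before flooring.

```python
import numpy as np

rng = np.random.default_rng(0)
price = rng.lognormal(mean=1.0, sigma=0.5, size=1_000)  # synthetic continuous feature
n_bins = 10

# Equal-frequency (quantile) bins: interior edges at the 10%, 20%, ..., 90% quantiles.
# side="left" puts values equal to an edge in the lower bin (right-closed bins, like qcut).
inner_edges = np.quantile(price, np.linspace(0, 1, n_bins + 1)[1:-1])
quantile_bins = np.searchsorted(inner_edges, price, side="left").astype(np.uint8)

# Equal-width bins via numpy.interp: map [min, max] linearly onto [0, n_bins], floor,
# and clip so the maximum lands in the last bin instead of an extra bin n_bins.
scaled = np.interp(price, (price.min(), price.max()), (0, n_bins))
width_bins = np.minimum(np.floor(scaled), n_bins - 1).astype(np.uint8)
```

Quantile bins keep roughly the same number of samples per bin, which tends to help with skewed features like prices; equal-width bins keep the bin boundaries easy to interpret but can leave tail bins nearly empty.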

 ## Contributing

--- a/pyproject.toml
+++ b/pyproject.toml
@@ -0,0 +1,5 @@
+[build-system]
+# These are the assumed default build requirements from pip:
+# https://pip.pypa.io/en/stable/reference/pip/#pep-517-and-518-support
+requires = ["setuptools>=40.8.0", "wheel"]
+build-backend = "setuptools.build_meta"
--- a/setup.py
+++ b/setup.py
@@ -15,7 +15,7 @@ class get_pybind_include(object):


 setup(
-    name="cbm",
+    name="cyclicbm",
     version="0.0.1",
     description="Cyclic Boosting Machines",
     url="https://github.com/Microsoft/CBM",