NimbusML
nimbusml is a Python module that provides Python bindings for ML.NET.
ML.NET was originally developed in Microsoft Research and is used across many product groups in Microsoft, including Windows, Bing, PowerPoint, and Excel. nimbusml was built to enable data science teams that are more familiar with Python to take advantage of ML.NET's functionality and performance.
nimbusml enables training ML.NET pipelines or integrating ML.NET components directly into scikit-learn pipelines. It adheres to existing scikit-learn conventions, allowing simple interoperability between nimbusml and scikit-learn components, while adding a suite of fast, highly optimized, and scalable algorithms, transforms, and components written in C++ and C#.
See the examples below showing interoperability with scikit-learn. A more detailed example in the documentation shows how to use a nimbusml component in a scikit-learn pipeline, and how to create a pipeline using only nimbusml components.
nimbusml supports numpy.ndarray, scipy.sparse.csr_matrix, and pandas.DataFrame as inputs. In addition, nimbusml supports streaming from files with FileDataStream without loading the dataset into memory, which allows training on data significantly exceeding memory.
Documentation can be found here and additional notebook samples can be found here.
Installation
nimbusml runs on Windows, Linux, and macOS.
nimbusml requires a 64-bit version of Python 2.7, 3.5, 3.6, or 3.7.
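The version and 64-bit requirements above can be checked before installing. The following is a minimal sketch using only the standard library; the helper names pointer_bits and check_interpreter are made up for illustration and are not part of nimbusml, and the version list is simply copied from this README.

```python
import struct
import sys

# Versions listed in this README; adjust if the supported list changes.
SUPPORTED_VERSIONS = {(2, 7), (3, 5), (3, 6), (3, 7)}


def pointer_bits():
    """Return the interpreter's pointer width in bits (64 on a 64-bit build)."""
    return struct.calcsize("P") * 8


def check_interpreter(version=None, bits=None):
    """Return a list of problems; an empty list means the interpreter qualifies.

    Both arguments default to the running interpreter and exist only so the
    check can be exercised with other values.
    """
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    bits = pointer_bits() if bits is None else bits
    problems = []
    if version not in SUPPORTED_VERSIONS:
        problems.append("Python %d.%d is not in the supported list" % version)
    if bits != 64:
        problems.append("interpreter is %d-bit, a 64-bit build is required" % bits)
    return problems


if __name__ == "__main__":
    # Print one line per problem; print nothing if the interpreter qualifies.
    for problem in check_interpreter():
        print(problem)
```

Running the script with the interpreter you plan to install into prints nothing when both requirements are met.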
Install nimbusml using pip with:
pip install nimbusml
nimbusml has been reported to work on Windows 10, macOS 10.13, Ubuntu 14.04, Ubuntu 16.04, Ubuntu 18.04, CentOS 7, and RHEL 7.
Examples
Here is an example of how to train a model to predict sentiment from text samples (based on this ML.NET example). The full code for this example is here.
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import FastTreesBinaryClassifier
from nimbusml.feature_extraction.text import NGramFeaturizer

train_file = get_dataset('gen_twittertrain').as_filepath()
test_file = get_dataset('gen_twittertest').as_filepath()

train_data = FileDataStream.read_csv(train_file, sep='\t')
test_data = FileDataStream.read_csv(test_file, sep='\t')

pipeline = Pipeline([  # nimbusml pipeline
    NGramFeaturizer(columns={'Features': ['Text']}),
    FastTreesBinaryClassifier(feature=['Features'], label='Label')
])

# fit and predict
pipeline.fit(train_data)
results = pipeline.predict(test_data)
Instead of creating a nimbusml pipeline, you can also integrate components into scikit-learn pipelines:
from sklearn.pipeline import Pipeline
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import FastTreesBinaryClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

train_file = get_dataset('gen_twittertrain').as_filepath()
test_file = get_dataset('gen_twittertest').as_filepath()

train_data = pd.read_csv(train_file, sep='\t')
test_data = pd.read_csv(test_file, sep='\t')

pipeline = Pipeline([  # sklearn pipeline
    ('tfidf', TfidfVectorizer()),  # sklearn transform
    ('clf', FastTreesBinaryClassifier())  # nimbusml learner
])

# fit and predict
pipeline.fit(train_data["Text"], train_data["Label"])
results = pipeline.predict(test_data["Text"])
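Once predictions are available, a quick sanity check is to compare them with the known labels. The sketch below is plain Python and independent of nimbusml; the accuracy helper is a hypothetical name introduced here, and it assumes the predictions and labels are flat, equal-length sequences in the same row order, which may not hold for every pipeline's output.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual label."""
    predicted = list(predicted)
    actual = list(actual)
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("cannot compute accuracy of an empty sequence")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / float(len(predicted))


# Made-up labels for illustration only; not output from the pipeline above.
example_predicted = [1, 0, 1, 1]
example_actual = [1, 0, 0, 1]
print(accuracy(example_predicted, example_actual))  # prints 0.75
```

Under the same assumption, a call such as accuracy(results, test_data["Label"]) would score the scikit-learn pipeline above on the test set.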
Many additional examples and tutorials can be found in the documentation.
Building
To build nimbusml from source, please visit our developer guide.
Contributing
The contributions guide can be found here.
Support
If you have an idea for a new feature or encounter a problem, please open an issue in this repository or ask your question on Stack Overflow.
License
NimbusML is licensed under the MIT license.