Python machine learning package providing simple interoperability between ML.NET and scikit-learn components.

data-science machine-learning ml mlnet nimbusml python scikit-learn


Gani Nazirov 32e2d67749 Upgrade version (#122 ) * package System.Drawings.Common.dll as its missing in dotnetcore2 * typo * Add png for Image examples * try linux fix * rollback scikit learn version * test * debug * rollback test * rollback * fix fontconfig err * fix tests * print platform * get os names * test * test * fix linux * Upgrade version		2019-06-03 22:41:25 -07:00
.github/ISSUE_TEMPLATE	Update issue templates	2018-11-01 15:28:53 -07:00
build	Package System.Drawing.Common.dll as its missing in dotnetcore2 (#120 )	2019-06-03 22:17:59 -07:00
docs	add configuration for python 3.7 (#101 )	2019-04-11 13:37:24 -07:00
src	Upgrade version (#122 )	2019-06-03 22:41:25 -07:00
.gitignore	fix tests	2018-10-23 14:48:52 -07:00
.vsts-ci.yml	add configuration for python 3.7 (#101 )	2019-04-11 13:37:24 -07:00
CODE_OF_CONDUCT.md	Create CODE_OF_CONDUCT.md	2018-11-01 15:22:26 -07:00
CONTRIBUTING.md	Fixing link in CONTRIBUTING.md (#44 )	2018-11-02 13:48:34 -07:00
LICENSE	Update LICENSE	2018-10-19 10:45:04 -07:00
PULL_REQUEST_TEMPLATE.md	Create PULL_REQUEST_TEMPLATE.md	2018-11-01 15:25:38 -07:00
README.md	Removing 3.7 for now as its not in PyPI	2019-05-07 12:00:11 -07:00
THIRD-PARTY-NOTICES.txt	Add THIRD-PARTY-NOTICES.txt and move CONTRIBUTING.md to root. (#40 )	2018-10-31 10:22:17 -07:00
build.cmd	Fix latest Windows build issues. (#105 )	2019-05-26 20:18:35 -07:00
build.sh	Upgrade the pytest-remotedata package to fix missing attribute error. (#121 )	2019-06-03 22:18:28 -07:00
nimbusml.sln	add configuration for python 3.7 (#101 )	2019-04-11 13:37:24 -07:00
nuget.config	Initial checkin	2018-10-19 09:57:24 -07:00
version.txt	Upgrade version (#122 )	2019-06-03 22:41:25 -07:00


NimbusML

nimbusml is a Python module that provides experimental Python bindings for ML.NET.

ML.NET was originally developed in Microsoft Research and is used across many Microsoft product groups, such as Windows, Bing, PowerPoint, and Excel. nimbusml was built so that data science teams who are more familiar with Python can take advantage of ML.NET's functionality and performance.

This package enables training ML.NET pipelines and integrating ML.NET components directly into scikit-learn pipelines. It supports numpy.ndarray, scipy.sparse.csr_matrix, and pandas.DataFrame as inputs.
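As an illustration of those input types, the sketch below builds the same small feature matrix as a `numpy.ndarray`, a `scipy.sparse.csr_matrix`, and a `pandas.DataFrame`. The values and column names are made up for the example, and no nimbusml call is made here; check the documentation for which formats a specific nimbusml component accepts.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy feature matrix (hypothetical values): 3 samples x 2 features.
X_dense = np.array([[0.0, 1.5],
                    [2.0, 0.0],
                    [0.0, 0.0]])

# Same data in compressed sparse row (CSR) format: only non-zero entries are stored.
X_sparse = csr_matrix(X_dense)

# Same data as a DataFrame with hypothetical column names.
X_frame = pd.DataFrame(X_dense, columns=["feature_a", "feature_b"])

print(type(X_dense).__name__, X_dense.shape)
print(type(X_sparse).__name__, X_sparse.shape, "non-zeros:", X_sparse.nnz)
print(type(X_frame).__name__, X_frame.shape)
```

The CSR form pays off when most entries are zero, as with bag-of-words text features.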

Documentation can be found here and additional notebook samples can be found here.

Installation

nimbusml runs on Windows, Linux, and macOS.

nimbusml requires a 64-bit version of Python 2.7, 3.5, or 3.6.
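A minimal, standard-library-only sketch for checking whether the current interpreter meets this requirement before installing. The helper `interpreter_is_supported` is hypothetical (it is not part of nimbusml), and the version list is copied from this README, so it may lag behind later releases.

```python
from __future__ import print_function  # keeps print() working on Python 2.7

import struct
import sys

# Python versions listed in this README as supported, as (major, minor).
SUPPORTED_VERSIONS = {(2, 7), (3, 5), (3, 6)}


def interpreter_is_supported(version_info=None, pointer_bits=None):
    """Hypothetical helper: True if the interpreter matches the stated requirements."""
    if version_info is None:
        version_info = sys.version_info
    if pointer_bits is None:
        # Size of a C pointer in bits: 64 on a 64-bit Python build.
        pointer_bits = struct.calcsize("P") * 8
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS and pointer_bits == 64


if __name__ == "__main__":
    bits = struct.calcsize("P") * 8
    print("Python %d.%d, %d-bit" % (sys.version_info[0], sys.version_info[1], bits))
    print("Meets the README requirements:", interpreter_is_supported())
```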

Install nimbusml using pip with:

pip install nimbusml

nimbusml has been reported to work on Windows 10, macOS 10.13, Ubuntu 14.04, Ubuntu 16.04, Ubuntu 18.04, CentOS 7, and RHEL 7.

Examples

Here is an example of how to train a model to predict sentiment from text samples (based on this ML.NET example). The full code for this example is here.

from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import FastTreesBinaryClassifier
from nimbusml.feature_extraction.text import NGramFeaturizer

# local file paths to the sample Twitter sentiment datasets from nimbusml.datasets
train_file = get_dataset('gen_twittertrain').as_filepath()
test_file = get_dataset('gen_twittertest').as_filepath()

train_data = FileDataStream.read_csv(train_file, sep='\t')
test_data = FileDataStream.read_csv(test_file, sep='\t')

pipeline = Pipeline([ # nimbusml pipeline
    NGramFeaturizer(columns={'Features': ['Text']}),
    FastTreesBinaryClassifier(feature=['Features'], label='Label')
])

# fit and predict
pipeline.fit(train_data)
results = pipeline.predict(test_data)
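To give a rough idea of what a text featurizer such as `NGramFeaturizer` produces, the standard-library sketch below turns a sentence into counts of word n-grams (unigrams and bigrams). It is a conceptual illustration only, not nimbusml's implementation: the real featurizer has its own tokenization, normalization, and weighting options, described in the documentation. The function `word_ngram_counts` is hypothetical.

```python
from collections import Counter


def word_ngram_counts(text, max_n=2):
    """Count word n-grams of length 1..max_n in a lower-cased, whitespace-split text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n consecutive tokens across the sentence.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


features = word_ngram_counts("I love this phone I love it")
print(features["i love"])  # prints 2
```

Each distinct n-gram becomes one feature column, which is why such feature vectors are usually large and sparse.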

Instead of creating a nimbusml pipeline, you can also integrate nimbusml components into scikit-learn pipelines:

from sklearn.pipeline import Pipeline
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import FastTreesBinaryClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

# local file paths to the sample Twitter sentiment datasets from nimbusml.datasets
train_file = get_dataset('gen_twittertrain').as_filepath()
test_file = get_dataset('gen_twittertest').as_filepath()

train_data = pd.read_csv(train_file, sep='\t')
test_data = pd.read_csv(test_file, sep='\t')

pipeline = Pipeline([ # sklearn pipeline
    ('tfidf', TfidfVectorizer()), # sklearn transform
    ('clf', FastTreesBinaryClassifier()) # nimbusml learner
])

# fit and predict
pipeline.fit(train_data["Text"], train_data["Label"])
results = pipeline.predict(test_data["Text"])
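In the scikit-learn pipeline above, the 'tfidf' step passes a sparse matrix, not a DataFrame, to the 'clf' step. The sketch below uses scikit-learn alone, with a few made-up sentences standing in for the Twitter data, to show that `TfidfVectorizer` output is a CSR matrix. CSR is one of the input formats nimbusml accepts.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up example texts standing in for the "Text" column.
texts = [
    "great product, works well",
    "terrible, broke after a day",
    "works as described",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# X is what the next pipeline step (e.g. the nimbusml learner above) receives:
# one row per text, one column per vocabulary term, stored in CSR format.
print(issparse(X), X.format, X.shape)
```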

Many additional examples and tutorials can be found in the documentation.

Building

To build nimbusml from source, please see our developer guide.

Contributing

The contributions guide can be found here. Given the experimental nature of this project, support will be provided on a best-effort basis. We suggest opening an issue for discussion before starting a PR with big changes.

Support

If you have an idea for a new feature or encounter a problem, please open an issue in this repository or ask your question on Stack Overflow.

License

NimbusML is licensed under the MIT license.