PRESC: Performance and Robustness Evaluation for Statistical Classifiers

CircleCI build status · Join the chat at https://gitter.im/PRESC-outreachy/community

PRESC is a toolkit for the evaluation of machine learning classification models. Its goal is to provide insights into model performance which extend beyond standard scalar accuracy-based measures and into areas which tend to be underexplored in application, including:

  • Generalizability of the model to unseen data for which the training set may not be representative
  • Sensitivity to statistical error and methodological choices
  • Performance evaluation localized to meaningful subsets of the feature space
  • In-depth analysis of misclassifications and their distribution in the feature space

More details about the specific features we are considering are presented in the project roadmap. We believe that these evaluations are essential for developing confidence in the selection and tuning of machine learning models intended to address user needs, and are an important prerequisite for building trustworthy AI.

PRESC also includes a package for creating copies of machine learning classifiers (ML Classifier Copies).

As a tool, PRESC is intended for use by ML engineers to assist in the development and updating of models. It is usable in the following ways:

  • As a standalone tool which produces a graphical report evaluating a given model and dataset
  • As a Python package/API which can be integrated into an existing pipeline (see the sketch below)
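
For instance, a minimal sketch of this package-style usage might look as follows. The scikit-learn parts are standard; the presc import paths and call signatures are assumptions based on the project documentation and may differ from the current API, so treat them as illustrative and check the docs.

# Illustrative sketch only: the presc import paths and signatures below are
# assumptions and may differ from the current API; consult the documentation.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

from presc.dataset import Dataset              # assumed import path
from presc.model import ClassificationModel    # assumed import path
from presc.report.runner import ReportRunner   # assumed import path

# Any labelled tabular dataset works; the CSV path and label column are placeholders.
df = pd.read_csv("datasets/some_dataset.csv")
train_df, test_df = train_test_split(df, test_size=0.3, random_state=0)
train_dataset = Dataset(train_df, label_col="label")
test_dataset = Dataset(test_df, label_col="label")

# Wrap a scikit-learn pipeline in PRESC's model abstraction and train it.
pipeline = Pipeline([("scaler", StandardScaler()), ("clf", SVC())])
model = ClassificationModel(pipeline, train_dataset, should_train=True)  # assumed signature

# Produce the standalone graphical report for this model/dataset pair.
ReportRunner(output_path="presc_report").run(model=model, test_dataset=test_dataset)  # assumed signature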

A further goal is to use PRESC:

  • As a step in a Continuous Integration workflow: evaluations run as a part of CI, for example on regular model updates, and fail if metrics produce unacceptable values (a rough sketch of such a gate is shown below).
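
Since this is a goal rather than a built-in feature, the following is only a rough illustration of what such a gate could look like: a pytest-style check that recomputes a held-out metric and fails the build when it drops below an agreed threshold. All names, paths and thresholds are placeholders.

# Hypothetical CI gate (not a PRESC feature): fail the build if held-out accuracy
# drops below a floor. Run under pytest as part of the CI pipeline.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

ACCURACY_FLOOR = 0.90  # agreed minimum for this model; tune per project

def test_model_meets_accuracy_floor():
    df = pd.read_csv("datasets/some_dataset.csv")   # placeholder path
    X, y = df.drop(columns=["label"]), df["label"]  # placeholder label column
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    assert acc >= ACCURACY_FLOOR, f"accuracy {acc:.3f} is below the floor {ACCURACY_FLOOR}"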

For the time being, the following are considered out of scope:

  • User-facing evaluations, e.g. explanations
  • Evaluations which depend explicitly on domain context or value judgements of features, e.g. protected demographic attributes. For example, a domain expert could use PRESC to study misclassifications across such protected groups, but the PRESC evaluations themselves should be agnostic to such determinations.
  • Analyses which do not involve the model, e.g. class imbalance in the training data

There is a considerable body of recent academic research addressing these topics, as well as a number of open-source projects solving related problems. Where possible, we plan to offer integration with existing tools which align with our vision and goals.

Documentation

Project documentation is available here and provides much more detail, including:

  • Getting set up
  • Running a report
  • Computing evaluations
  • Configuration
  • Package API

Examples

An example script demonstrating how to run a report is available here.

There are a number of notebooks and explorations in the examples/ dir, but they are not guaranteed to run or be up to date: the package has recently undergone major changes and these examples have not all been updated yet.

Some well-known datasets are provided in CSV format in the datasets/ dir for exploration purposes.
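
For example, a quick way to peek at one of these CSVs before fitting anything (the filename is a placeholder, and the comments note where the label column is assumed):

# List the bundled CSVs and inspect one of them (filename is a placeholder).
from pathlib import Path
import pandas as pd

print(sorted(p.name for p in Path("datasets").glob("*.csv")))

df = pd.read_csv("datasets/some_dataset.csv")
print(df.head())
print(df.iloc[:, -1].value_counts())  # class balance, assuming the label is the last column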

Notes for contributors

Contributions are welcome. We are using the repo issues to manage project tasks in alignment with the roadmap, as well as hosting discussions. You can also reach out on Gitter.

We recommend that submissions for new feature implementations include a Jupyter notebook demonstrating their application to a real-world dataset and model.

This repo adheres to Python black formatting, which is enforced by a pre-commit hook (see below).

Along with code contributions, we welcome general feedback:

  • Testing out the package functionality. Try running the report on a classification model and dataset. You can also try running individual evaluations in a Jupyter notebook.
    • If you don't have a dataset or classification model to work with, you can use one of the datasets in the repo and create a classifier using scikit-learn (see the sketch after this list). Some examples are given in the examples/ dir.
    • If you can apply PRESC to a classification problem you have already been working on, we'd be very excited to hear your feedback. If your data & model can be considered public, you are welcome to submit any artifacts to our examples/ dir.
  • Please open issues for any bugs you encounter (including things that don't work as you expect or aren't well explained).
    • If you want to offer a PR for a fix, that is welcome too.
  • We would welcome any feedback on the general approach, the evaluations described in the roadmap, the results you get from running PRESC, etc., including similar projects you're familiar with. You can start a discussion by opening an issue.
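
As a starting point for the first item above, the sketch below fits a small scikit-learn classifier on one of the bundled CSVs; paths and column names are placeholders. The fitted model and test split can then be wrapped in PRESC as in the earlier usage sketch.

# Quick-start sketch: train a simple classifier on one of the CSVs in datasets/.
# The path and label column name are placeholders; adjust them to the file you pick.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("datasets/some_dataset.csv")
X, y = df.drop(columns=["label"]), df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))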

The development of the ML Classifier Copies package is being carried out in the branch model-copying.

Setting up a dev environment

Make sure you have conda (e.g. Miniconda) installed. conda init should be run during installation to set the PATH properly.

Set up and activate the environment. This will also enable a pre-commit hook to verify that code conforms to flake8 and black formatting rules. On Windows, these commands should be run from the Anaconda command prompt.

$ conda env create -f environment.yml
$ conda activate presc
$ python -m pip install -e .
$ pre-commit install

To run tests:

$ pytest

Acknowledgements

This project is maintained by Mozilla's Data Science team. We have also received code contributions from participants in outreach programs such as Outreachy, and we are grateful for their support.

The ML Classifier Copying package is being funded through the NGI0 Discovery Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 825322.