Introduction
Updates that may improve an AI system’s accuracy can also introduce new and unanticipated errors that damage user trust. Such errors can also break trust between software components and machine learning models, because they are propagated and compounded throughout larger integrated AI systems. The Backward Compatibility ML library is an open-source project for evaluating AI system updates, with the goal of increasing system reliability and human trust in the AI predictions that inform actions.
The Backward Compatibility ML project has two components:
- A series of loss functions in which users can vary the weight assigned to the dissonance factor and explore performance/capability tradeoffs during machine learning optimization.
- Visualization widgets that help users examine metrics and error data in detail. They provide a view of error intersections between models and incompatibility distribution across classes.
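To make the two components more concrete, here is a minimal sketch in plain Python. It is not the library's API: the function names (`compatibility_loss`, `error_regions`), the weight name `lambda_c`, and the exact form of the dissonance term are assumptions chosen for illustration. The dissonance term here penalizes the updated model `h2` only on examples the previous model `h1` classified correctly, which is one possible way to weight newly introduced errors. The second function counts the error intersections that a Venn-style view of two models would display.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label (clipped to avoid log(0))."""
    return -math.log(max(probs[label], 1e-12))


def predict(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


def compatibility_loss(h1_probs, h2_probs, labels, lambda_c):
    """Mean loss of h2 plus lambda_c times a dissonance penalty.

    Assumption: dissonance is h2's loss restricted to the examples
    that h1 already got right, so new errors cost more than old ones.
    """
    base = 0.0
    dissonance = 0.0
    for p1, p2, y in zip(h1_probs, h2_probs, labels):
        loss = cross_entropy(p2, y)
        base += loss
        if predict(p1) == y:
            dissonance += loss
    return (base + lambda_c * dissonance) / len(labels)


def error_regions(h1_probs, h2_probs, labels):
    """Count examples by which model(s) misclassified them."""
    counts = {"h1_only": 0, "h2_only": 0, "both": 0, "neither": 0}
    for p1, p2, y in zip(h1_probs, h2_probs, labels):
        h1_wrong = predict(p1) != y
        h2_wrong = predict(p2) != y
        if h1_wrong and h2_wrong:
            counts["both"] += 1
        elif h1_wrong:
            counts["h1_only"] += 1
        elif h2_wrong:
            counts["h2_only"] += 1
        else:
            counts["neither"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy data: two classes, four examples.
    labels = [0, 1, 1, 0]
    h1 = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]]
    h2 = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.9, 0.1]]
    for lam in (0.0, 0.5, 1.0):
        print(lam, compatibility_loss(h1, h2, labels, lam))
    print(error_regions(h1, h2, labels))
```

Setting `lambda_c` to 0 recovers the ordinary mean loss of the updated model. Larger values push optimization toward keeping the predictions that the previous model already got right, at a possible cost in raw accuracy.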
Getting Started
- Set up a Python virtual environment or Conda environment and activate it.
- From within the root folder of this project do
  pip install -r requirements.txt
- From within the root folder do
  npm install
- From within the root folder of this project do
  npm run build && pip install -e .
  or
  NODE_ENV=production npx webpack && pip install -e .
- You should now be able to import the backwardcompatibilityml module and use it.
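A quick way to confirm that the editable install succeeded is to ask Python whether it can locate the module. The helper name `is_installed` below is hypothetical. The sketch relies only on the standard library and on the module name given above.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    name = "backwardcompatibilityml"
    if is_installed(name):
        print(f"{name} is importable from this environment.")
    else:
        print(f"{name} was not found; check that the virtual environment is active.")
```

If the module is not found, the most likely cause is that the install step ran in a different environment from the one that is currently active.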
Examples
Start your Jupyter Notebook server and load the example notebook under the examples folder to see how the backwardcompatibilityml module is used.
To demo the widget, open the notebook compatibility-analysis.ipynb.
Tests
To run tests, make sure that you are in the project root folder and do:
pip install -r dev-requirements.txt
pytest tests/
npm install
npm run test
Development environment
To run the widget and APIs in a development environment:
- Open a new terminal, then from within the root folder do
  npm start
  This will host the widget with webpack-dev-server.
  - To customize the host IP, run
    webpack-dev-server --hot --mode development --host <ip>
    instead of npm start. npm start uses 0.0.0.0, which makes the server accessible externally.
- Open a new terminal, then from within the root folder do
  flask run
  This will start the Flask server for the APIs used by the widget.
The widget can be loaded in the web browser at localhost:3000. It will be loaded independently from a Jupyter notebook. The APIs will be hosted at localhost:5000. If you are running the server within a VM, you might need to use your machine's local IP instead of localhost.
Changes to the CSS or TypeScript code will be hot loaded automatically in the browser. Flask will run in debug mode and automatically restart whenever the Python code is changed.
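If the widget does not load, it can help to check that both development servers are accepting connections. The following sketch uses only the standard library. The helper name `port_is_open` is hypothetical, and the ports are the ones mentioned above (3000 for the widget, 5000 for the APIs). Replace `localhost` with your machine's local IP when running inside a VM.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host = "localhost"
    # Ports taken from the development setup described above.
    for name, port in (("widget (webpack-dev-server)", 3000), ("Flask APIs", 5000)):
        state = "reachable" if port_is_open(host, port) else "not reachable"
        print(f"{name} at {host}:{port} is {state}")
```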
Contributing
See the CONTRIBUTING page for contribution guidelines.
Research and Acknowledgements
This project materializes and implements ideas from ongoing research on Backward Compatibility in Machine Learning and Model Comparison. Here is a list of development and research contributors:
Current Project Leads: Xavier Fernandes, Juan Lema, Besmira Nushi
Research Contributors: Gagan Bansal, Megha Srivastava, Besmira Nushi, Ece Kamar, Eric Horvitz, Dan Weld, Shital Shah
References
"Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff." Gagan Bansal, Besmira Nushi, Ece Kamar, Daniel S Weld, Walter S Lasecki, Eric Horvitz; AAAI 2019. Pdf
@inproceedings{bansal2019updates,
  title={Updates in human-ai teams: Understanding and addressing the performance/compatibility tradeoff},
  author={Bansal, Gagan and Nushi, Besmira and Kamar, Ece and Weld, Daniel S and Lasecki, Walter S and Horvitz, Eric},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={33},
  pages={2429--2437},
  year={2019}
}
"An Empirical Analysis of Backward Compatibility in Machine Learning Systems." Megha Srivastava, Besmira Nushi, Ece Kamar, Shital Shah, Eric Horvitz; KDD 2020. Pdf
@inproceedings{srivastava2020empirical,
  title={An Empirical Analysis of Backward Compatibility in Machine Learning Systems},
  author={Srivastava, Megha and Nushi, Besmira and Kamar, Ece and Shah, Shital and Horvitz, Eric},
  booktitle={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  pages={3272--3280},
  year={2020}
}
"Towards Accountable AI: Hybrid Human-Machine Analyses for Characterizing System Failure." Besmira Nushi, Ece Kamar, Eric Horvitz; HCOMP 2018. Pdf
@article{nushi2018towards,
  title={Towards accountable ai: Hybrid human-machine analyses for characterizing system failure},
  author={Nushi, Besmira and Kamar, Ece and Horvitz, Eric},
  journal={Proceedings of the Sixth AAAI Conference on Human Computation and Crowdsourcing},
  pages={126--135},
  year={2018}
}
Microsoft Open Source Code of Conduct
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
License
This project is licensed under the terms of the MIT license. See LICENSE.txt for additional details.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.