measure-noise

Measure how our data deviates from a normal distribution.


Install

pip install moz-measure-noise

Usage

The deviance() function returns a (description, score) pair describing how the samples deviate from a normal distribution, and by how much. It is intended to screen samples before applying the t-test and other statistics that assume a normal distribution.

  • SKEWED - samples are heavily to one side of the mean
  • OUTLIERS - there are more outliers than would be expected from a normal distribution
  • MODAL - few samples are near the mean (probably bimodal)
  • OK - no egregious deviation from normal
  • N/A - not enough data to make a conclusion (aka OK)
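
As a rough illustration of what these labels refer to, the sketch below computes two classical shape statistics with the standard library only. This is not the algorithm used by deviance(), and the function name `moments` is made up for this example. Skewness far from zero corresponds loosely to SKEWED, a large positive excess kurtosis (heavy tails) to OUTLIERS, and a strongly negative excess kurtosis to MODAL.

```python
def moments(samples):
    """Return (skewness, excess_kurtosis) from population moments.

    Illustration only: the real deviance() may use different tests.
    """
    n = len(samples)
    if n < 3:
        raise ValueError("need at least 3 samples")
    mean = sum(samples) / n
    # Second, third and fourth central moments
    m2 = sum((x - mean) ** 2 for x in samples) / n
    if m2 == 0:
        raise ValueError("samples have zero variance")
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    skewness = m3 / m2 ** 1.5
    # A normal distribution has kurtosis 3, so subtract it
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis
```

For example, a sample split into two tight clusters such as [1, 1, 1, 1, 9, 9, 9, 9] has zero skewness and an excess kurtosis of -2, the kind of shape the MODAL label describes.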

Example

>>> from measure_noise import deviance
>>> desc, score = deviance([1, 2, 3, 4, 5, 6, 7, 8])
>>> desc
'OK'
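
One way to use the result as a screen before a t-test is sketched below. The helper name `is_safe_for_t_test` is hypothetical and not part of the package. The deviance function is passed in as an argument so the helper stays independent of how the package is imported. Treating N/A as acceptable follows the "(aka OK)" note in the list above.

```python
def is_safe_for_t_test(samples, deviance_fn):
    """Return True when deviance_fn reports no egregious deviation.

    deviance_fn is expected to return a (description, score) pair,
    as described in the Usage section.
    """
    desc, _score = deviance_fn(samples)
    # "N/A" means too little data to conclude anything, treated as OK
    return desc in ("OK", "N/A")
```

With the package installed, this would be called as is_safe_for_t_test(samples, deviance), using the deviance imported in the example above.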

Development

For Linux/OSX

git clone https://github.com/mozilla/measure-noise.git
cd measure-noise
python -m venv .venv      
source .venv/bin/activate
pip install -r requirements.txt
pip install -r tests/requirements.txt
export PYTHONPATH=.:vendor
python -m unittest discover tests 

For Windows

git clone https://github.com/mozilla/measure-noise.git
cd measure-noise
python.exe -m pip install virtualenv  
python.exe -m virtualenv .venv 
.venv\Scripts\activate
pip install -r requirements.txt
pip install -r tests\requirements.txt
set PYTHONPATH=.;vendor
python -m unittest discover tests 

Analysis Configuration

The analysis is controlled by the config.json file, but it is missing the secrets required to connect to the Treeherder RO database and BigQuery. You must provide references to secrets by making your own config file.

Please look at the existing resources/config-<user>.json files. You may make a copy of one, and update it with references to the secrets on your dev machine. For example, resources/config-kyle.json looks like

{
    "$ref": "../config.json",
    "database.$ref": "~/private.json#treeherder",
    "deviant_summary.account_info.$ref": "file:///e:/moz-fx-dev-ekyle-treeherder-a838a7718652.json",
    "constants.measure_noise.analysis.SCATTER_RANGE": "month",
    "constants.measure_noise.analysis.TREEHERDER_RANGE": "90day"
}

You may be able to guess where each of my secrets is found on my machine. For example, I have a private.json file in my user directory. Here is an example of the contents:

// Example contents of ~/private.json
{
    "treeherder": {
        "host": "treeherder-prod-ro.cd3i3txkp6c6.us-east-1.rds.amazonaws.com",
        "port" : 3306,
        "username": "username",
        "password": "password"
    }
}

Feel free to add your resources/config-<user>.json file and make a PR with it. Remember, never put secrets in your project directory; use references to secrets instead.

You may use {"$ref":"env://MY_ENV_VARIABLE"} to read a value from an environment variable.
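
To make the idea of an env:// reference concrete, here is a minimal sketch of how such references could be expanded. It is not the configuration loader this project uses, and the real loader may behave differently (for example, in how it converts types or handles missing variables). The function name `resolve_env_refs` and the variable name DB_PASSWORD are made up for this illustration.

```python
import os


def resolve_env_refs(node, environ=os.environ):
    """Replace {"$ref": "env://NAME"} nodes with the value of NAME.

    Illustration only: handles env:// references and nothing else.
    """
    if isinstance(node, dict):
        ref = node.get("$ref")
        if len(node) == 1 and isinstance(ref, str) and ref.startswith("env://"):
            # Raises KeyError if the variable is not set
            return environ[ref[len("env://"):]]
        return {key: resolve_env_refs(value, environ) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_env_refs(item, environ) for item in node]
    return node


# The secret stays out of the config file; only its name appears there.
config = {"database": {"port": 3306, "password": {"$ref": "env://DB_PASSWORD"}}}
resolved = resolve_env_refs(config, environ={"DB_PASSWORD": "example-only"})
```

The point of the design is the same as with file references: the config file can be committed safely because it names where a secret lives, not the secret itself.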

Running Analysis

Ensure you are in the main project directory, and point to your config file.

Linux/OSX

export PYTHONPATH=.:vendor
python measure_noise/analysis.py --config=resources/config-<user>.json

Windows

set PYTHONPATH=.;vendor
python measure_noise\analysis.py --config=resources\config-kyle.json

Some other options are

  • --id - show specific signature_ids
  • --download=<filename> - download all deviant statistics to a file
  • --deviant=<int> - number of series to show with the most deviant noise
  • --outliers=<int> - number of series to show with the most outliers
  • --modal=<int> - number of series to show that seem bimodal
  • --skewed=<int> - number of series to show where the mean and median differ
  • --ok=<int> - number of series to show that are the worst, but still good enough
  • --noise=<int> - number of series to show with the largest relative standard deviation
  • --extra=<int> - number of series to show where Perfherder has detected steps, but MWU has not
  • --missing=<int> - number of series to show where Perfherder missed alerting

Post Analysis

analysis.py fills a local SQLite database (as per the config file). It can be used to look up other series that may be of interest, or to feed yet another program.
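
A quick way to start exploring that database is to list its tables with Python's built-in sqlite3 module. This sketch makes no assumptions about the schema analysis.py creates, and the function name `list_tables` is made up for this example. Pass it whatever database path your config file specifies.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

From there, ordinary SELECT statements against the listed tables can pull out the series of interest.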

Windows

You must download the scipy and numpy binary packages.

Branch Development and Updating

You can merge your patches with PRs into the dev branch. Merges into master happen on any version update (reflecting a new package on PyPI).

To upload to PyPI, you need contributor access to the moz-measure-noise package on PyPI, as well as to the corresponding package on TestPyPI.

If you want to upload a package to PyPI, first ensure the changes work locally using the Development instructions above. Mark the version as a development release by appending .dev to the end of the version number. Next, ensure you have the build and twine modules available:

python3 -m pip install build
python3 -m pip install twine

Now build your packages:

python3 -m build

And upload to the test server:

twine upload -r testpypi dist/*

Then, you should be able to install it with the following in a virtualenv (the --extra-index-url is required to pull in requirements from the main PyPI server):

python3.8 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple moz-measure-noise==2.59.0.1

Test the package and ensure your changes are working.

To create a new package on the official page, increase the version number, commit it to the dev branch, and merge the dev branch into master:

# Increase version number in setup.py, and commit and push to dev
...

# Merge changes from dev
git checkout master
git merge dev
git push origin master

# Remove old/possibly bad archives
rm -rf build/*
rm -rf dist/*

# Build and upload
python3 -m build
twine upload dist/*

Then you'll be able to install your newest package with: python3.8 -m pip install moz-measure-noise