python_mozaggregator/script/validation
Anthony Miyaguchi 9bc450318e Add script for validating a copy of the database 2020-03-05 11:11:42 -08:00
README.md
entrypoint.sh
fetch_credentials.sh
fetch_stats.py
results.png
validate.py
validate_data_ref.py
validate_data_test.py

README.md

Validation against copies of the mozaggregator database

These scripts check whether two databases contain the same aggregate data for a given day. They require boto3 to be configured correctly on the host machine.

script/validation/entrypoint.sh <reference host> <test host> <date in YYYYMMDD>
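
As a minimal sketch of the calling convention above, the following Python helper checks that the date argument is a real calendar date in YYYYMMDD form and assembles the argument list for the entrypoint. The helper name `build_validation_command` and the example host names are hypothetical and not part of the repository.

```python
import re
from datetime import datetime


def build_validation_command(reference_host, test_host, date):
    """Return the argv list for entrypoint.sh (hypothetical helper).

    Raises ValueError if `date` is not a valid YYYYMMDD string.
    """
    # Require exactly eight digits so that short forms such as "2020229"
    # are rejected before parsing.
    if not re.fullmatch(r"\d{8}", date):
        raise ValueError(f"date must be in YYYYMMDD format, got {date!r}")
    # strptime raises ValueError for impossible dates such as 20200230.
    datetime.strptime(date, "%Y%m%d")
    return ["script/validation/entrypoint.sh", reference_host, test_host, date]


# Example with placeholder host names (assumptions, not real hosts).
cmd = build_validation_command("ref.example.com", "test.example.com", "20200229")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.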

Results

Error is the percentage difference between the test and reference databases, computed for each of three statistics:

  • Number of aggregate rows for the reference date
  • Sum of all histogram sums for the reference date
  • Sum of all histogram counts for the reference date

  submission_date     test      ref  err
0        20200229  2936771  2936771  0.0
1        20200301  2826848  2826848  0.0

              sum_test              sum_ref sum_err
0  2626701384304806794  2626701384304806794       0
1  2015714821244959696  2015714821244959696       0

   count_test   count_ref count_err
0  4192687908  4192687908         0
1  3461041798  3461041798         0
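
The error columns in the tables above can be read as a percentage difference between a test value and its reference value. The sketch below shows one plausible way to compute it, assuming the error is the absolute difference relative to the reference. The exact formula used by `validate.py` is not stated here, so the function `percent_error` is an illustration only.

```python
def percent_error(test, ref):
    """Percentage difference of `test` relative to `ref` (assumed definition)."""
    if ref == 0:
        # No meaningful relative error when the reference is zero:
        # report 0 if both are zero, otherwise infinity.
        return 0.0 if test == 0 else float("inf")
    # Multiply before dividing so integer inputs stay exact until the end.
    return 100 * abs(test - ref) / ref


# Row counts taken from the first table above.
row_errors = [
    percent_error(2936771, 2936771),  # 20200229
    percent_error(2826848, 2826848),  # 20200301
]
```

Identical values in both databases give an error of zero, which matches the `err`, `sum_err`, and `count_err` columns shown.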

The plot shows the GC_MS histogram for the reference date and the prior date in both databases.

[Figure: results.png]