- Bug fix: if the map cache .rda file can't be written, log an error rather
than stopping analysis
- bin/test.sh: for decode-dist and decode-assoc smoke tests, write the testdata
in a layout that is more easily exported
- add DEP_PYTHON and DEP_FAST_EM environment variables
- scripts/g3export: script to export analysis code and generated testdata
- test.sh: Change to h=1 to trigger the bug more reliably
- move validation of inputs to Decode(), instead of decode_dist.R, so we
get it whenever we call Decode()
- Check dimensions in CreateAssocStringMap, for a better error message
- Require 'params' when calling ReadMapFile/LoadMapFile
- Log a message when entries are removed from the map
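The cache-write fix in the first item above can be sketched as follows. This is only an illustration in Python, not the R code in the repository: `save_cache_best_effort` is a hypothetical name, and pickle stands in for the .rda format.

```python
import logging
import pickle


def save_cache_best_effort(path, obj):
    """Try to write obj to path; on failure, log an error and keep going.

    Hypothetical helper: returns True if the cache was written and False
    otherwise, so the caller can continue the analysis either way.
    """
    try:
        with open(path, "wb") as f:
            pickle.dump(obj, f)
    except OSError as e:
        # The cache is only an optimization, so a failed write is logged
        # instead of aborting the run.
        logging.error("Couldn't write map cache %s: %s", path, e)
        return False
    return True
```

The caller can ignore the return value; the point is that a read-only or missing cache directory no longer stops the run.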
Fixed the boolean decode pipeline to avoid using the Boolean map; this was
causing a few errors in the association pipeline when TRUE values were close
to zero.
It can be selected by passing --em-executable as fast_em.sh (wrapper for
Python) instead of the binary compiled from fast_em.cc.
See comments at the top of fast_em.py.
- association_test.R: test all three implementations
- bin/test.sh
  - Add demo for TensorFlow
  - Add demo of early convergence
  - Put the output from each implementation in its own directory.
- fast_em.R: Test exit code when shelling out to --em-executable
- Write number of EM iterations in the assoc-metrics.json output (all
implementations). This changes the protocol between the R driver and EM
implementation.
- Clean up the log output from fast_em.cc
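The exit-code check mentioned above for fast_em.R might look roughly like this. It is a Python sketch under stated assumptions, not the R implementation: `run_em` is a hypothetical name, and the real arguments passed to --em-executable are not shown in this changelog.

```python
import subprocess


def run_em(em_executable, args):
    """Run the EM executable and fail loudly on a non-zero exit status.

    Hypothetical wrapper: em_executable is whatever was passed as
    --em-executable; args is a placeholder for its real arguments.
    """
    result = subprocess.run([em_executable] + list(args))
    if result.returncode != 0:
        # Without this check, a crashed EM process could go unnoticed
        # and stale or missing output could be read as if it were valid.
        raise RuntimeError(
            "%s exited with status %d" % (em_executable, result.returncode))
    return result.returncode
```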
Code cleanup.
- Add bin/decode-assoc command line tool for automation. It defines and
reads a new schema, so we know whether each variable is string or boolean,
and what its encoding parameters are.
- bin/test.sh: Add tests for bin/decode-assoc, e.g. for the (string x
boolean) problem
- Add Boolean RAPPOR encoding (no hashing step) to the Python client
with Encoder.encode_bits
- Add association testdata generation to rappor_sim.py
- Optimizations
  - In association.R, remove from GetCondProb() any computation that
    doesn't depend on the report (e.g. 100x-1000x speedup for the
    joint conditional stage).
  - Use mclapply in R for the steps that did lapply() over all reports
    (N). The number of cores is controlled by decode-assoc --num-cores.
  - Provide an alternative implementation of EM in C++ in
    analysis/cpp/fast_em.cc, for a ~100x speedup (analysis/R/fast_em.R
    is a wrapper that does serialization and a system() call).
- Fixed bugs in the k=1 case, due to R matrix/vector confusion (adapted
from Ananth's changes)
- Allow different variables in association analysis to have different
parameters (adapted from Ananth's changes, but tests still pass)
- Add hacky support for boolean vars (adapted from Ananth's changes)
- Code cleanup in association.R. Rename variables to be clearer.
- Minor refactoring of decode_dist.R to resemble decode_assoc.R
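The GetCondProb() optimization listed above amounts to hoisting report-independent work out of the per-report loop. Here is a minimal Python sketch of that idea, not the code in association.R. All names are hypothetical; `p_star` and `q_star` are assumed to be the probabilities of reporting a 1 when the underlying bit is 0 or 1, respectively.

```python
import math


def bit_one_probs(candidate_bits, p_star, q_star):
    """Report-independent part: P(reported bit = 1) for each candidate bit."""
    return [q_star if b else p_star for b in candidate_bits]


def cond_prob(report_bits, one_probs):
    """P(report | candidate), using the precomputed per-bit probabilities."""
    return math.prod(
        pr if r else 1.0 - pr for r, pr in zip(report_bits, one_probs))


def cond_prob_table(reports, candidates, p_star, q_star):
    # Computed once per candidate, outside the loop over all N reports.
    tables = [bit_one_probs(c, p_star, q_star) for c in candidates]
    # Only the cheap per-report product remains inside the loop.
    return [[cond_prob(r, t) for t in tables] for r in reports]
```

With N reports and M candidates, the per-candidate tables are built M times rather than N x M times. The speedups quoted in the list are for the R code, not this sketch.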