- reran on our Databricks cluster
- added compatibility code so it runs under Python 2
- added some description
- print out each count as it is computed
- cleaned up final print statements
- reran to capture outputs
- added compatibility code so it runs under Python 2
- refactored analysis to create a cached base dataset at the start and run analyses against that
- load session replay sites from file rather than Spark temp table
- refactored analysis to drop (page, script) duplicates at the top level
- restructured the top summary section and updated all the numbers
- refactored the suffix analysis to use a single joined DF
Script URLs were not being counted correctly, so a number of analysis
results were incorrect. This fixes those errors.
Also adds an analysis that looks at the percentage of function calls
created using eval in the sample, to compare with the Spark results.
- How many function calls are created using eval.
- How many web pages use eval.
- How many scripts with functions created using eval are hosted on
  different domains than the web pages that use them.
The previous analysis of eval usage lacked an initial preamble and
problem justification, only looked at one sample, and compared website
domains rather than the companies that control those domains. This
change fixes those issues and makes a number of other improvements,
including:
- Adding a summary of results in the analysis introduction section to
  make the analysis easier to read.
- Giving justification for why each analysis is being performed.
- Giving an interpretation of the results so that their significance
  can be more easily understood.