Commit graph

31 commits

Author SHA1 Message Date
Dave Zeber 12f0f2696c Cleaned up analysis dir with project folders 2018-06-05 18:39:12 -05:00
David Zeber d0a5e3c36e
Merge pull request #13 from mozilla/notebook-refactor
Notebook refactor
2018-06-05 18:30:47 -05:00
mlopatka 9a5e973210
Update schema.md 2018-05-15 15:12:36 +02:00
mlopatka a968657861
Update schema.md 2018-05-15 15:11:41 +02:00
mlopatka 085243fd29
Update schema.md 2018-05-15 15:05:31 +02:00
mlopatka 93a8a9de26
Update schema.md 2018-05-15 14:26:41 +02:00
Dave Zeber b1e1c28d2e Fixed typo in replay notebook 2018-05-11 13:17:42 -05:00
Dave Zeber 2b9f1331a2 Removed .py file for cryptojacking analysis 2018-05-10 17:39:23 -05:00
Dave Zeber a1137d5054 Minor updates to evercookie notebook
- added compatibility code to run from Python 2
- updated code for running on our DB cluster
2018-05-10 17:33:34 -05:00
Dave Zeber 40425fafe0 Minor updates to Spark eval analysis notebook:
- reran on our DataBricks cluster
- added compatibility code to run from Python 2
- added some description
- print out each count as they are computed
- cleaned up final print statements
2018-05-10 17:03:58 -05:00
Dave Zeber cc8e0b6fed Updates to session replay notebook:
- reran so as to capture outputs
- added compatibility code to run from Python 2
- refactored analysis to create a cached base dataset at the start and run analyses against that
- load session replay sites from file rather than Spark temp table
- refactored analysis to drop (page, script) duplicates at the top level
- restructured the top summary section and updated all the numbers
- refactored the suffix analysis to use a single joined DF
2018-05-10 16:28:18 -05:00
Dave Zeber 05a6bba792 Minor cleanup of the cryptojacking notebook 2018-05-09 14:14:18 -05:00
mlopatka 658b3ecfbc
Merge pull request #10 from Tyler-R/FixEvalUsageAnalysis
Fix miscounting of scripts URLs.
2018-04-22 14:44:23 +02:00
Dave Zeber e42d285e8d Updated cryptojacking notebook:
- added some compatibility code for running in Python 2
- added rendered HTML showing cell outputs
2018-04-21 12:52:34 +02:00
Tyler-R 63b1de6ea0 Fix miscounting of scripts URLs and confirm spark analysis.
Script urls were not being counted correctly so a number of analysis
results were incorrect.  This fixes those errors.

Also adds an analysis that looks at the % of function calls that are
created using eval in the sample to compare with the spark results.
2018-04-20 17:40:00 -07:00
mlopatka 02b434249c
Merge pull request #9 from koosoku/session-replay
Modify the session replay analysis to use spark and databricks
2018-04-19 18:31:50 +02:00
Kyle Kung efb71a9ef0 Modify the session replay analysis to use spark and databricks 2018-04-19 11:56:26 -04:00
mlopatka 9e23f8582b
Merge pull request #4 from Alexander1994/cookie_data_cleanup
Cleaned up Extract cookie notebook and added scraper to pull cookie info with a simple cache
2018-04-18 13:59:07 +02:00
mlopatka 7afbf81970
Merge pull request #6 from Alexander1994/databricks_evercookie_search
Databricks evercookie search script added
2018-04-18 13:58:39 +02:00
David Zeber 6812ac3531
Merge pull request #7 from Tyler-R/EvalAnalysisSpark
Add analysis of eval usage for the whole data set, using Spark.
2018-04-18 13:56:28 +02:00
David Zeber a1d6d863e3
Merge pull request #5 from vvnjin/cryptojacking
Add finished cryptojacking analysis notebook with markdown and source…
2018-04-18 13:56:00 +02:00
Tyler-R 55423a5b36 Add analysis of eval usage, using Spark, that examines:
- How many function calls are created using eval.
- How many web pages use eval.
- How many scripts with functions created using eval are hosted on
different domains than the web page that uses them.
2018-04-13 22:13:50 -07:00
Vivian Jin de957ea139 Fix typo in notebook. 2018-04-13 15:08:12 -07:00
Alex 4e1f491c6c data bricks evercookie search script added 2018-04-13 11:28:27 -03:00
Vivian Jin c6e067a5c0 Add finished cryptojacking analysis notebook with markdown and source files. 2018-04-12 17:09:00 -07:00
Alex 85946e745d cookie data analysis cleaned up and md added 2018-04-12 11:09:35 -03:00
mlopatka b28eb8c07a
Merge pull request #3 from Tyler-R/EvalMultipleSampling
Update the analysis of eval usage.
2018-04-06 14:12:09 +02:00
Tyler-R a7f96446cf Update the analysis of eval usage.
The previous analysis of eval usage was lacking an initial preamble and
problem justification, only looked at 1 sample, and was comparing
website domains instead of comparing what company controlled the domains. This change fixes those issues and makes a number of other improvements including:

- Adding a summary of results in the analysis introduction section to
make the analysis easier to read.
- Giving justification for why each analysis is being performed.
- Giving an interpretation of the results so that their significance can be
more easily understood.
2018-04-04 00:49:48 -07:00
mlopatka 08350e069a Update to most recent state of old repo. 2018-03-26 19:42:10 +02:00
mlopatka 81e5cfba55 UCOSP project fresh/historyless commit. 2018-03-26 11:15:13 +02:00
mlopatka 3cc44edbe6
Initial commit 2018-03-26 11:07:55 +02:00