Commit graph

2086 commits

Author SHA1 Message Date
Georgia Kokkinou 889b54d429
Save full Firefox profile (#917)
* Save full Firefox profile

Save the whole Firefox profile directory instead of only saving a few of
its subcomponents. Remove an unused import of shutil from
profile_commands.py.

Additionally, remove the `extension_port.txt` file after reading the
port from it, to prevent reading stale port information when a browser
is restarted after a crash.

Finally, remove a part of the documentation that references the old way
of dumping the profile and update a leftover reference to the
`log_directory` config option.

Closes #62.

* Test saving full profile

Add a test that checks that attempting to save an incomplete profile
raises an error. Also, extend `test_saving` to check that a few basic
files and directories of the Firefox profile are present in the archived
profile.
2021-05-07 13:33:17 +02:00
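The stale-port fix described in #917 (delete `extension_port.txt` once the port has been read) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name `read_port_once` is hypothetical, and the sketch assumes the file contains only the port number.

```python
from pathlib import Path


def read_port_once(port_file: Path) -> int:
    """Read a port number from port_file, then delete the file.

    Hypothetical helper: removing the file after a successful read means
    a browser restarted after a crash cannot pick up the old port.
    """
    port = int(port_file.read_text().strip())
    # Only reached if parsing succeeded; the file is gone afterwards.
    port_file.unlink()
    return port
```

A second call on the same path raises `FileNotFoundError`, which makes a missing port file visible instead of silently reusing outdated data.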
Stefan Zabka dafc26cb56
Blocking on startup (#902)
* Refactored BrowserManager into class

* Completing refactor

* Improved type annotations

* Blocking on OPENWPM_STARTUP_SUCCESS.txt

* Cherrypicked changes from logging_extension_test

* Intermediate

* Implemented test

* Refactoring shutdown control flow

* Refactored status_queue check in close_browser_manager

* Updated log message

* Update openwpm/browser_manager.py

Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>

* Guarded property access

* Fixed timeout code

* Fixing up docstrings

Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
2021-05-03 12:57:56 +02:00
Stefan Zabka 9943a218ca
Fixing extension logging (#912)
* Combined log_directory and log_file to log_path

* Updated documentation

* Fixed tests

* Implemented test, need to change CSP

* Extension logging restored and tested

* Renamed extra to custom_params

* Reverting stackdump changes
2021-04-30 19:38:08 +02:00
Stefan Zabka 05e5dcb0a5
Combined log_directory and log_file to log_path (#911)
* Combined log_directory and log_file to log_path

* Updated documentation

* Fixed tests
2021-04-30 16:52:22 +02:00
Ayush Anand 18e12864ce
Added npm script info under scripts (#908) 2021-04-29 16:36:30 +02:00
Stefan Zabka 262e4f2847
Browser manager as class (#901)
* Refactored BrowserManager into class

* Completing refactor

* Improved type annotations
2021-04-28 14:45:36 +02:00
Georgia Kokkinou 3619b55682
Update profile tests (#893)
* Re-enable test_profile_saved_when_launch_crashes

Update `test_profile_saved_when_launch_crashes` so that it does not
depend on the no longer supported proxy to make browser restarts fail.
Instead, set the `FIREFOX_BINARY` environment variable to point to a
wrong path.

Also, fix a bug in `kill_browser_manager()`, which would cause OpenWPM
to crash because of a `psutil.NoSuchProcess` exception and without
archiving the browser profile, whenever a browser restart failed to
launch geckodriver.

Finally, make `kill_process_and_children()` use the timeout set via its
arguments, which it previously ignored.

* Update docstring of dump_profile

Add a note for callers that they should make sure Firefox is closed, in
order to prevent ending up with a corrupted database in the archived
profile.

* Update test_browser_profile_coverage

Remove the buggy and outdated for loop that determined whether a url is
expected to be missing from the places.sqlite database of the browser
profile, as we have not observed any missing urls when running this
test.
2021-04-28 13:04:10 +02:00
Stefan Zabka a159496e22
Fixed default directories in ManagerParams (#903)
Our default was to create a literal folder called `~` instead of writing to the home directory.
2021-04-27 16:29:15 +03:00
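The problem fixed in #903 is easy to reproduce in plain Python: a path that starts with `~` is not expanded automatically. The sketch below shows the general technique with `pathlib`; the function name `resolve_data_directory` is hypothetical and is not taken from `ManagerParams`.

```python
from pathlib import Path


def resolve_data_directory(raw: str) -> Path:
    """Expand a leading '~' so the path points into the home directory.

    Without expanduser(), Path("~/openwpm") is a relative path whose first
    component is a directory literally named '~'.
    """
    return Path(raw).expanduser()
```

Calling `mkdir()` on the unexpanded path would create the literal `~` folder in the current working directory, which matches the behaviour described in the commit message.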
Stefan Zabka cc10baebee
Warning against using v0.14.1 and v0.14.0 (#899) 2021-04-26 17:07:35 +02:00
Georgia Kokkinou 07d89ca7f4
Generate module index (#900)
* Fix documentation module index

Populate the module index by setting up Sphinx to automatically run
sphinx-apidoc for every build. Also, move readthedocs dependencies under
docs/ and make prune-environment.py automatically generate the
environment-rtd.yaml file whenever we run repin.sh.

* Fix black and mypy errors
2021-04-26 17:05:30 +02:00
Stefan Zabka 6efeade359
Updated NPM dependencies (#888) 2021-04-20 15:30:42 +02:00
Stefan Zabka 3ed53b5429
Created environment-rtd.yaml (#894)
* Created environment-rtd.yaml

* Removed all non-sphinx dependencies
2021-04-20 16:10:22 +03:00
Stefan Zabka 625d81460b
Introducing Sphinx (#863)
We can now use Sphinx to generate documentation in a variety of output formats, including HTML.
With this new infrastructure we are also able to generate documentation on readthedocs.io.

Co-authored-by: jhabarsingh <jhabarsinghbhati23@gmail.com>
Co-authored-by: Cyrus <cyruskarsan@gmail.com>
Co-authored-by: cyruskarsan <55566678+cyruskarsan@users.noreply.github.com>
Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
Co-authored-by: ankushduacodes <61025943+ankushduacodes@users.noreply.github.com>
Co-authored-by: Mollie Bakal <bakalm@umich.edu>
Co-authored-by: MollieBakal <molliebakal@gmail.com>
Co-authored-by: jhabarsingh <43932986+jhabarsingh@users.noreply.github.com>
Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>
2021-04-15 12:04:55 +02:00
Stefan Zabka e72ed2339c
Removing localtest.me (#886)
* Removing localtest.me

As it has been highly unreliable when running
local tests (returning DnsNotFound errors)

* Fixing tests

* Switched to localhost

* Localtest.me to localhost
2021-04-13 17:39:03 +02:00
Stefan Zabka e20fc6a29e
Moved _issue_command to BrowserManagerHandle (#882)
* Renamed Browser to BrowserManagerHandler

* Renamed TaskManager._issue_command to BrowserManagerHandle.execute_command_sequence

* Fixing stuff

* Apply suggestions from code review

Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>

* tm to task_manager

* Found and renamed only mention of  in the docs

Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>
2021-04-12 16:22:59 +03:00
Georgia Kokkinou b8f6262cb2
Clean up some unused files (#881)
* Remove some unused files

* Fix minor typos
2021-04-06 14:26:49 +02:00
Stefan Zabka 8030c2c470
Remove python to js string (#879)
Introduced `cleaned_js_instrument_settings` in BrowserParamsInternal to hold the expanded config dict.
Propagating the `js_instrument_settings` through the extension as an object for as long as possible.
2021-03-30 14:22:02 +02:00
Steven Englehardt bfc4644a71
Merge pull request #864 from boolean5/restore-stateful-crawls
Restore stateful crawling support
2021-03-29 11:34:05 -04:00
Stefan Zabka 358c8a7337
Restoring Docker build (#871)
GitHub Action based on https://github.com/usha-mandya/SimpleWhaleDemo/blob/master/.github/workflows/github_registry.yml

Everything on the master branch gets published as `:latest` and version tags (pattern `vx.y.z`) get published as `:x.y.z`
2021-03-29 14:08:05 +02:00
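The tagging rule from #871 (master becomes `:latest`, a `vx.y.z` tag becomes `:x.y.z`) can be expressed as a small mapping function. This is only an illustration of the rule, not the workflow itself: the function name is hypothetical, and it assumes that `x`, `y` and `z` are plain integers and that refs use Git's `refs/heads/...` and `refs/tags/...` form.

```python
import re
from typing import Optional

# Assumption: version tags look like v1.2.3 with purely numeric parts.
VERSION_TAG = re.compile(r"^v(\d+\.\d+\.\d+)$")


def image_tag_for_ref(ref: str) -> Optional[str]:
    """Return the Docker image tag for a Git ref, or None if unpublished."""
    if ref == "refs/heads/master":
        return "latest"
    if ref.startswith("refs/tags/"):
        match = VERSION_TAG.match(ref[len("refs/tags/"):])
        if match:
            # Drop the leading 'v' so v0.15.0 is published as 0.15.0.
            return match.group(1)
    return None
```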
Stefan Zabka 3241482608
Adjusted check for debug mode in loggingdb.js (#878)
This way the default config in index.js and the check in loggingdb.js
match up
2021-03-29 12:30:39 +03:00
Stefan Zabka 9a8dea34d9
Hotfixed GetCommand (#875)
`webdriver.switch_to.alert`, unlike most other variants of the `switch_to` API, is not a function but a property.
Calling it led to `TypeError: 'Alert' object is not callable` whenever there actually was an alert to switch to.
This PR fixes that behaviour.
2021-03-23 22:38:09 +01:00
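The error fixed in #875 can be reproduced without a browser. The classes below are simplified stand-ins written for this example, not Selenium's implementation; they only mirror the fact that `switch_to.alert` is a property that returns an `Alert` object.

```python
class Alert:
    """Stand-in for an alert object (hypothetical, for illustration)."""

    def __init__(self, text: str) -> None:
        self.text = text


class SwitchTo:
    """Stand-in exposing `alert` as a property, like Selenium's switch_to."""

    def __init__(self, alert: Alert) -> None:
        self._alert = alert

    @property
    def alert(self) -> Alert:
        return self._alert


def call_alert_as_function(switch_to: SwitchTo) -> str:
    """Reproduce the bug: the property value is called like a method."""
    try:
        switch_to.alert()  # type: ignore[operator]
    except TypeError as error:
        return str(error)
    return "no error"
```

Accessing `switch_to.alert.text` works as expected, while `call_alert_as_function` returns the "not callable" message quoted in the commit description.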
Georgia Kokkinou 37271ba62d
Remove unnecessary import 2021-03-22 19:22:17 +02:00
Georgia Kokkinou 9a21d86e8f
Simplify PatchedGeckoDriverService class
Make `PatchedGeckoDriverService` class subclass
selenium.webdriver.firefox.service.Service instead of
selenium.webdriver.common.service.Service, so that we only have to keep
track of the changes in the `__init__()` method of the former class.
2021-03-22 15:57:01 +02:00
Georgia Kokkinou 49965db164
Use local test server in profile tests 2021-03-22 12:54:24 +02:00
Georgia Kokkinou 06b83596ab
Do not copy tar before extracting in load_profile 2021-03-21 23:59:51 +02:00
Georgia Kokkinou e536c630cc
Add comment for Marionette port race condition 2021-03-21 23:59:51 +02:00
Georgia Kokkinou 9e8298c455
Fix test_browser_profile_coverage
Use the public suffix + 1 instead of the public suffix when comparing
the domains in the crawl database with those in the profile history.
Also, update an incorrectly formed query to the crawl database.
2021-03-21 23:59:51 +02:00
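To illustrate the "public suffix + 1" comparison mentioned in the `test_browser_profile_coverage` fix: the PS+1 of a hostname is the public suffix plus one more label to its left. The sketch below is a toy version that takes a hand-written suffix set as a parameter; the real Public Suffix List also has wildcard and exception rules that are ignored here, and the function name is hypothetical.

```python
from typing import Optional, Set


def ps_plus_one(hostname: str, public_suffixes: Set[str]) -> Optional[str]:
    """Return the public suffix plus one label, or None if there is none.

    Toy implementation: `public_suffixes` is a small caller-supplied set,
    not the full Public Suffix List.
    """
    labels = hostname.lower().rstrip(".").split(".")
    # Scanning from the left finds the longest matching suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in public_suffixes:
            if i == 0:
                return None  # the hostname is itself a public suffix
            return ".".join(labels[i - 1:])
    return None
```

Comparing on the public suffix alone would treat every `.com` site as the same domain, which is why the test compares PS+1 values instead.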
Georgia Kokkinou c51f9e56bf
Fix minor typos 2021-03-21 23:59:18 +02:00
Georgia Kokkinou a19b12478b
Remove `reset=True` from tests 2021-03-19 21:14:51 +02:00
Georgia Kokkinou 3b4219d0f9
Add some type annotations 2021-03-19 21:09:20 +02:00
Steven Englehardt ab01a2f6bd
Remove commented out code; monitor speculative connections (#872) 2021-03-19 11:50:21 +01:00
Georgia Kokkinou 1e16513370
Improve profile dumping logic
Move the core implementation of profile dumping into a `dump_profile`
function, which can be used both internally when closing or restarting a
crashed browser and from the `execute()` method of `DumpProfileCommand`.
Also, make compression the default in `DumpProfileCommand`. Finally, do
not compress the tar archive of the crashed browser's profile when
restarting from a crash. We should avoid the extra compression/
decompression step as this is a short-lived tar file.
2021-03-16 17:10:50 +02:00
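The trade-off described in the profile dumping commit (compress archives that are kept, skip compression for the short-lived tar used during a crash restart) comes down to the mode passed to `tarfile.open`. The following is a minimal sketch using only the standard library; `archive_directory` and its parameters are hypothetical names, not the signature of `dump_profile`.

```python
import tarfile
from pathlib import Path


def archive_directory(source_dir: Path, tar_path: Path, compress: bool) -> None:
    """Write source_dir into a tar archive, gzip-compressed if requested."""
    mode = "w:gz" if compress else "w"
    with tarfile.open(tar_path, mode) as tar:
        # Store entries under the directory's own name, not its full path.
        tar.add(source_dir, arcname=source_dir.name)
```

Both variants can be read back with `tarfile.open(tar_path)`, since the default read mode detects compression transparently.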
Stefan Zabka b6849c71f5
Release preparations (#866)
* Release preparations

* Apply suggestions from code review

Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>

* Bumped typedoc version number

Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
2021-03-16 15:23:05 +01:00
Stefan Zabka 7624476b3b
Updated wording for profiles (#865)
* Updated wording for profiles

Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>
2021-03-16 14:22:53 +01:00
Georgia Kokkinou 3f7efc2490
Skip test_browser_profile_coverage locally 2021-03-15 16:01:47 +02:00
Georgia Kokkinou 9ea8e8a051
Rename temp dir of crashed browser's profile tar 2021-03-15 15:37:12 +02:00
Georgia Kokkinou aa1de922c9
Add reminder to update geckodriver prefs 2021-03-15 15:15:28 +02:00
Georgia Kokkinou f5bacaed84
Update manual_test.py
Running manual_test.py resulted in an error because the `xpi()` fixture
was called directly. Apply the fix suggested in
https://docs.pytest.org/en/stable/deprecations.html#calling-fixtures-directly
Also, use a custom profile instead of `FirefoxProfile` and update some
docstrings.
2021-03-15 13:44:36 +02:00
Georgia Kokkinou 1d3de72292
Reference our own issue instead of geckodriver's 2021-03-15 12:23:06 +02:00
Georgia Kokkinou d2aff836f4
Remove unused status string "Proxy Ready" 2021-03-15 11:09:44 +02:00
Georgia Kokkinou 403185a38a
Simplify profile location handling
1. In `deploy_firefox` do not use `driver.capabilities["moz:profile"]`
to get the profile location. Custom profiles, unlike profiles created
via `FirefoxProfile`, are used in-place, so we already know the
location.

2. In `launch_browser_manager`, `spawned_profile_path` and
`driver_profile_path` point to the same location now that we are using a
custom profile. Replace them with a single `browser_profile_path`
variable.

3. Rename `prof_folder` and `browser_profile_folder` to
`browser_profile_path` for consistency.

4. Improve naming of the temporary Firefox profile.
2021-03-15 11:03:10 +02:00
Georgia Kokkinou 2237822eab
Do not intercept profile location from logs 2021-03-08 09:58:48 +02:00
Georgia Kokkinou a355dc840d
Create user.js manually in custom profile
Geckodriver has a bug that makes it write the browser preferences we
set, as well as its own default browser preferences, to a user.js file
in the wrong profile directory when using a custom profile:
https://github.com/mozilla/geckodriver/issues/1844. As a temporary
workaround until this issue gets fixed, we create the user.js file
ourselves. In order to do this, we keep a copy of geckodriver's default
preferences in our code.

Closes #423
2021-03-03 13:16:15 +02:00
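The workaround described in this commit, writing `user.js` ourselves, relies on the file's simple format: one `user_pref("name", value);` line per preference. The sketch below shows one way to produce such a file; the function name is hypothetical, it assumes values are booleans, integers or strings, and it does not include geckodriver's default preferences.

```python
import json
from pathlib import Path
from typing import Dict, Union

PrefValue = Union[bool, int, str]


def write_user_js(profile_dir: Path, prefs: Dict[str, PrefValue]) -> Path:
    """Write the given preferences to user.js inside profile_dir."""
    # json.dumps yields true/false, bare integers and quoted strings,
    # which are also valid JavaScript literals.
    lines = [
        f"user_pref({json.dumps(name)}, {json.dumps(value)});"
        for name, value in prefs.items()
    ]
    user_js = profile_dir / "user.js"
    user_js.write_text("\n".join(lines) + "\n")
    return user_js
```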
Georgia Kokkinou 7f51e50f44
Pass service_args to geckodriver
Fix a bug in PatchedGeckoDriverService that caused geckodriver not to
receive the service_args passed when starting the browser.
PatchedGeckoDriverService is a modified version of Selenium's Service
class and this bug has been fixed in the original version.
2021-03-03 13:16:15 +02:00
Georgia Kokkinou 51c0849cbd
Use custom browser profile
Use a custom profile by setting it as an argument via the Options class,
instead of using the FirefoxProfile class. This way geckodriver does not
delete it when crashing or closing. Also, remove some unused arguments
from the function that configures privacy settings in Firefox. Finally,
remove the code that clears driver.profile before calling driver.quit(),
as driver.profile is always None when using a custom profile.
2021-03-03 13:15:56 +02:00
Georgia Kokkinou daa6dba4e3
Enable stateful crawling and tests
Reenable stateful crawling and profile tests. Also, update the docs now
that stateful crawling is supported. Currently, stateful crawling is
broken, as geckodriver deletes the browser profile when closing or
crashing before we can archive it.
2021-03-01 17:18:00 +02:00
Stefan Zabka cb95ecc05f
Now handling all constraint notations in unpinned environment.yamls (#860)
* Now handling all constraint notations in unpinned environment.yamls

* Addressing review comments
2021-02-23 11:36:40 +01:00
vringar 9aaa3c0b87 Removed test_gcp.py 2021-02-22 18:38:05 +01:00
Stefan Zabka b29c3f4052
Data Aggregator Rewrite (#753)
* First steps in the rewrite

* Fixed import paths

* One giant refactor

* Fixing tests

* Adding mypy

* Removed mypy from pre-commit workflow

* First draft on DataAggregator

* Wrote a DataAggregator that starts and shuts down

* Created tests and added more empty types

* Got demo.py working

* Created sql_provider

* Cleaned up imports in TaskManager

* Added async

* Fixed minor bugs

* First steps at porting arrow

* Introduced TableName and different Task handling

* Added more failing tests

* First first completes others don't

* It works

* Started working on arrow_provider

* Implemented ArrowProvider

* Added logger fixture

* Fixed test_storage_controller

* Fixing OpenWPMTest.visit()

* Moved test/storage_providers to test/storage

* Fixing up tests

* Moved automation to openwpm

* Readded datadir to .gitignore

* Ran repin.sh

* Fixed formatting

* Let's see if this works

* Fixed imports

* Got arrow_memory_provider working

* Starting to rewrite tests

* Setting up fixtures

* Attempting to fix all the tests

* Still fixing tests

* Broken content saving

* Added node

* Fixed screenshot tests

* Fixing more tests

* Fixed tests

* Implemented local_storage.py

* Cleaned up flush_cache

* Fixing more tests

* Wrote test for LocalArrowProvider

* Introduced tests for local_storage_provider.py

* Asserting test dir is empty

* Creating subfolder for different aggregators

* New dependencies and init()

* Everything is terribly broken

* Figured out finalize_visit_id

* Running two event loops kinda works???

* Rearming the event

* Introduced mypy

* Downgraded black in pre-commit

* Modifying the database directly

* Fixed formatting

* Made mypy a lil stricter

* Fixing docs and config printing

* Realising I've been using the wrong with

* Trying to figure arrow_storage

* Moving lock initialization in in_memory_storage

* Fixing tests

* Fixing up tests and adding more typechecking

* Fixed num_browsers in test_cache_hits_recorded

* Parametrized unstructured

* String fix

* Added failing test

* New test

* Review changes with Steven

* Fixed repin.sh and test_arrow_cache

* Minor change

* Fixed prune-environment.py

* Removing references to DataAggregator

* Fixed test_seed_persistance

* More paths

* Fixed test display shutdown

* Made cache test more robust

* Update crawler.py

Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>

* Slimming down ManagerParams

* Fixing more tests

* Update test/storage/test_storage_controller.py

Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>

* Purging references to DataAggregator

* Reverted changes to .travis.yml

* Demo.py saves locally again

* Readjusting test paths

* Expanded comment on initialize to reference #846

* Made token optional in finalize_visit_id

* Simplified test parametrization

* Fixed callback semantics change

* Removed test_parse_http_stack_trace_str

* Added DataSocket

* WIP need to fix path encoding

* Fixed path encoding

* Added task and crawl to schema

* Fixed paths in GitHub actions

* Refactored completion handling

* Fix tests

* Trying to fix tests on CI

* Removed redundant setting of tag

* Removing references to S3

* Purging more DataAggregator references

* Cranking up logging to figure out test failure

* Moved test_values into a fixture

* Fixing GcpUnstructuredProvider

* Fixed paths for future crawls

* Renamed sqllite to official sqlite

* Restored demo.py

* Update openwpm/commands/profile_commands.py

Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>

* Restored previous behaviour of DumpProfileCommand

Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>

* Removed leftovers

* Cleaned up comments

* Expanded lock check

* Fixed more stuff

* More comment updates

* Update openwpm/socket_interface.py

Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>

* Removed outdated comment

* Using config_encoder

* Renamed tar_location to tar_path

* Removed references to database_name in docs

* Cleanup

* Moved screenshot_path and source_dump_path to ManagerParamsInternal

* Fixed imports

* Fixing up comments

* Fixing up comments

* More docs

* updated dependencies

* Fixed test_task_manager

* Reupgraded to python 3.9.1

* Restoring crawl_reference in mp_logger

* Removed unused imports

* Apply suggestions from code review

Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>

* Cleaned up socket handling

* Fixed TaskManager.__exit__

* Moved validation code into config.py

* Removed comment

* Removed comment

* Removed comment

Co-authored-by: Steven Englehardt <senglehardt@mozilla.com>
Co-authored-by: Georgia Kokkinou <geor5ko@gmail.com>
2021-02-22 17:51:32 +01:00
Stefan Zabka a81d80ad59
Pin python to 3.8.6 (#859)
* Update npm

* Repin

* Make sure python makes it into pinned environment

Co-authored-by: Sarah Bird <birdsarah@mozilla.com>
2021-02-16 18:34:21 +01:00