Commit Graph

19 Commits

Author SHA1 Message Date
englehardt 451c3babf9 Allow stateful crawling, but provide a warning about profile loss. 2019-07-01 18:11:36 -07:00
Fredrik Wollsén fe11c53620 Enable navigation and js instruments for demo crawl 2019-06-27 16:44:54 +03:00
englehardt 01f2fba875 Remove dump_profile_cookies command.
This command is no longer necessary with the new instrumentation. It was
broken by the latest Firefox upgrade, so it no longer makes sense to
keep it around.
2019-06-12 08:07:55 -07:00
englehardt 78e6b3fb04 Fix isort failures 2019-04-16 10:47:17 -07:00
Nihanth Subramanya 16ee6b52f7 Update macos install script 2019-04-08 16:38:56 +02:00
englehardt e3cef0c65b Fixing isort issues, reclassifying six 2018-08-15 10:30:21 -04:00
Stephen Donner 99984e538e First big round of flake8 + isort fixes 2018-07-31 23:48:06 -07:00
englehardt 9dd05ecc83 PEP8 Fixes 2017-10-04 15:39:35 -04:00
Zack Weinberg 1c5d9356c0 Apply python-modernize + some hand tidy-ups.
This should get us 90% of the way to Python 3 support.
2017-03-09 11:00:54 -05:00
englehardt 5d86590149 Making extension-based HTTP instrumentation default and deprecating
proxy instrumentation.

The naming of sql tables and browser params have been updated to reflect
that the extension HTTP instrumentation is preferred to the proxy. A few other
notable changes:
(1) Extension HTTP instrumentation is preferred, but still off-by-default
(2) The proxy is now off-by-default and shouldn't be used.
(3) browser_params['save_javascript'] uses the extension, proxy-based
    javascript saving is controlled with browser_params['save_javascript_proxy']
(4) The "post processing pipeline" (which was only used to parse HTTP
    cookies) has been removed and the TaskManager::close API updated.
2016-12-05 12:32:55 -05:00
dreisman c5fae6a49d Added some comments to emphasize that some commands will close the current tab 2016-10-27 10:56:03 -04:00
dreisman 4cc559fd8e Added test of simple commands and table integrity, bug fixes 2016-05-04 11:46:15 -04:00
dreisman 4b107d17c5 Fixed tests for command_sequence compatibility 2016-05-03 16:54:42 -04:00
dreisman c77b5c7109 Merged old command sequences pull request to be more modern 2016-05-03 12:21:26 -04:00
englehardt dffffe606a Adding optional sleep argument to get command. Close #59 2016-04-07 13:18:57 -04:00
shivamagarwal-iitb 150b3862c3 Added file to sequence commands in one batch for one top url site visit
Added support for command sequence in task manager

Replacing top_url with visit id for http_request, http_response, flash_cookies and profile_cookies tables

Changed task manager to execute a command sequence

Task manager updated to rectify Reading single entry from the table

Changed iteration for commands of command sequence

Add visit id in the command sequence instead of browser manager

Added visit id to proxy, command executor and browser manager

Added visit id in extension

Changes task manager to close after completing the command sequence

Fixed small changes
2015-12-03 20:30:03 -05:00
englehardt d8be257f0c Correcting some mistakes in the comments 2015-10-25 12:28:33 -07:00
englehardt 407a03c72a Adding in a manager_params dictionary
The goal of this change is to provide a better way of passing
per-crawl parameters. Since the browser_params dictionary is meant for
per-browser settings, adding in any additional parameters (like
logger_address) would require adding the address to all copies of the
dictionary. With manager_params, a single additional dictionary is
passed around to hold the crawl-wide configuration settings. This
includes things like the location of the crawl database and the log
file.
2015-09-14 11:05:50 -04:00
englehardt c1449783de 0.1.1 release 2014-07-01 12:37:17 -04:00
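
The split described in commit 407a03c72a (one crawl-wide manager_params dictionary alongside per-browser browser_params dictionaries) can be illustrated with a minimal sketch. The key names, file names, and the describe_crawl helper below are hypothetical and chosen only for illustration; they are not taken from the project's actual API.

```python
import copy

# Crawl-wide settings live in ONE dictionary shared by the whole crawl.
# Key names here are hypothetical, for illustration only.
manager_params = {
    "database_name": "crawl-data.sqlite",  # location of the crawl database
    "log_file": "crawl.log",               # location of the log file
}

# Per-browser settings: one independent copy per browser instance.
default_browser_params = {"headless": True}  # hypothetical setting
NUM_BROWSERS = 3
browser_params = [
    copy.deepcopy(default_browser_params) for _ in range(NUM_BROWSERS)
]

# A per-browser override touches only that browser's dictionary.
browser_params[1]["headless"] = False


def describe_crawl(manager_params, browser_params):
    """Return one summary line for the crawl, then one line per browser."""
    lines = [
        "database={} log={}".format(
            manager_params["database_name"], manager_params["log_file"]
        )
    ]
    for index, params in enumerate(browser_params):
        lines.append("browser {}: headless={}".format(index, params["headless"]))
    return lines


for line in describe_crawl(manager_params, browser_params):
    print(line)
```

With this layout, a new crawl-wide setting is added in one place instead of being copied into every browser's dictionary, which is the motivation the commit message gives.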