proxy instrumentation.
The naming of SQL tables and browser params has been updated to reflect
that the extension HTTP instrumentation is preferred to the proxy. A few other
notable changes:
(1) Extension HTTP instrumentation is preferred, but is still off by default.
(2) The proxy is now off by default and shouldn't be used.
(3) browser_params['save_javascript'] now uses the extension; proxy-based
javascript saving is controlled with browser_params['save_javascript_proxy'].
(4) The "post processing pipeline" (which was only used to parse HTTP
cookies) has been removed and the TaskManager::close API has been updated.
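As a rough illustration of point (3), the two javascript-saving flags might be
set as in the sketch below. Only the two key names come from the notes above;
the boolean values and the helper function are assumptions made for the
example, not part of the actual code base.

```python
# Hypothetical per-browser settings. Only the two key names appear in the
# change notes; the boolean values are assumed for illustration.
browser_params = {
    'save_javascript': True,         # extension-based javascript saving
    'save_javascript_proxy': False,  # proxy-based javascript saving (discouraged)
}


def javascript_saving_backends(params):
    """Return the saving backends enabled by params (illustrative helper)."""
    backends = []
    if params.get('save_javascript', False):
        backends.append('extension')
    if params.get('save_javascript_proxy', False):
        backends.append('proxy')
    return backends
```

Keeping the two flags separate means the proxy path has to be enabled
explicitly, which matches the proxy being off by default.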
Added file to sequence commands in one batch for one top url site visit
Added support for command sequence in task manager
Replaced top_url with visit id in the http_request, http_response, flash_cookies and profile_cookies tables
Changed task manager to execute a command sequence
Updated task manager to fix reading a single entry from the table
Changed iteration over the commands of a command sequence
Added visit id in the command sequence instead of the browser manager
Added visit id to proxy, command executor and browser manager
Added visit id in extension
Changed task manager to close after completing the command sequence
Made small fixes
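The command sequence idea in the entries above can be sketched roughly as
follows. The class name, method names and the integer visit id counter are all
hypothetical and chosen for illustration; the entries above only say that
commands for one top url visit are batched and that the visit id lives in the
command sequence.

```python
import itertools

# Assumption: visit ids are simple increasing integers.
_visit_ids = itertools.count(1)


class CommandSequence:
    """Hypothetical batch of commands for one top url site visit."""

    def __init__(self, url):
        self.url = url
        # The visit id is owned by the sequence, not by the browser manager.
        self.visit_id = next(_visit_ids)
        self.commands = []

    def add(self, name, *args):
        """Queue a command; returns self so calls can be chained."""
        self.commands.append((name, args))
        return self


def execute(sequence, run_command):
    """Run each queued command in order, tagging it with the visit id."""
    return [run_command(sequence.visit_id, name, args)
            for name, args in sequence.commands]
```

Because every command in the batch carries the same visit id, rows written to
different tables during one visit can be joined on that id rather than on
top_url.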
The goal of this change is to provide a better way of passing
per-crawl parameters. Since the browser_params dictionary is meant for
per-browser settings, adding in any additional parameters (like
logger_address) would require adding the address to all copies of the
dictionary. With manager_params, a single additional dictionary is
passed around to hold the crawl-wide configuration settings. This
includes things like the location of the crawl database and the log
file.
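A minimal sketch of the split described above, assuming illustrative key names:
only logger_address, browser_params and manager_params are named in this
message, so the other keys, the values and the helper function are
hypothetical.

```python
import copy

# Crawl-wide settings: a single dictionary shared by the whole crawl.
manager_params = {
    'database_name': 'crawl-data.sqlite',  # assumed key and value
    'log_file': 'crawl.log',               # assumed key and value
    'logger_address': None,                # set once, at runtime
}

# Per-browser settings: one independent copy per browser instance.
default_browser_params = {'save_javascript': False}
num_browsers = 3  # assumed number of browsers
browser_params = [copy.deepcopy(default_browser_params)
                  for _ in range(num_browsers)]


def set_logger_address(params, address):
    """Record the logger address in the crawl-wide dictionary only."""
    params['logger_address'] = address
```

With this layout, a crawl-wide value such as the logger address is written in
one place and none of the per-browser copies need to be touched.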