There were some more bugs as a result of the boto
to boto3 migration
that weren't covered by existing tests. Now they
are fixed, and covered.
Hopefully I got everything this time.
Closes#2805 from ashb/AIRFLOW-1839-s3-hook_loadsa-tests
In the migration of S3Hook to boto3 the connection
ID parameter changed
to `aws_conn_id`. This fixes the uses of
`s3_conn_id` in the code base
and adds a note to UPDATING.md about the change.
In correcting the tests for S3ToHiveTransfer I noticed that
S3Hook.get_key was returning a dictionary, rather than the S3.Object
mentioned in its docstring. The important thing that was missing was
the ability to get the key name from the return of a call to
get_wildcard_key.
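For illustration, a minimal sketch of what the corrected return type allows
(bucket and key names below are made up, not taken from the PR):

```python
from airflow.hooks.S3_hook import S3Hook

# The connection parameter is aws_conn_id after the boto3 migration.
hook = S3Hook(aws_conn_id='aws_default')
obj = hook.get_wildcard_key('data/part-*.csv', bucket_name='my-bucket')
if obj is not None:
    # A boto3 S3.Object is returned, so the matched key name is
    # available directly on the result.
    print(obj.key)
```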
Closes#2795 from ashb/AIRFLOW-1795-s3hook_boto3_fixes
Set the correct fields to enable the visualisation
of the rendering
of the Druid indexing spec. Add some tests to make
sure that the
templating is working :-)
Closes#2783 from Fokko/AIRFLOW-1811-fix-druid-operator
Allow users to pass in the Slack token through a connection, which
provides better security. This enables users to expose the token only
to workers instead of to both workers and schedulers.
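A hedged sketch of the intended usage; the parameter and connection names
below are illustrative assumptions, not taken from the diff:

```python
from airflow.operators.slack_operator import SlackAPIPostOperator

# The token lives in the Airflow connection (e.g. its password field), so it
# is only resolved on the worker that runs the task, not on the scheduler.
notify = SlackAPIPostOperator(
    task_id='notify_slack',
    slack_conn_id='slack_default',  # assumed parameter/connection names
    channel='#data-eng',
    text='Pipeline finished',
)
```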
Closes#2789 from yrqls21/add_conn_supp_in_slack_op
Adds a postgres_to_gcs operator to contrib so that a user can copy a
dump from postgres to Google Cloud Storage. Tests write to local
NamedTemporaryFiles so we correctly test serializing encoded ndjson in
both python3 and python2.7.
Uses `__future__.unicode_literals` and replaces calling `json.dumps`
with `json.dump` followed by `tmp_file_handle.write` to write JSON lines
to the ndjson file. Under python3 the serialized JSON is a unicode
string rather than a byte string, so we encode it to `utf-8`, which is
compatible with BigQuery (see:
https://cloud.google.com/bigquery/docs/loading-data#loading_encoded_data).
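A rough, illustrative version of that write pattern (not the operator's
exact code; the sample row is made up):

```python
from __future__ import unicode_literals

import json
import sys
from tempfile import NamedTemporaryFile

rows = [{'id': 1, 'name': 'café'}]

with NamedTemporaryFile(delete=False) as tmp_file_handle:
    for row in rows:
        s = json.dumps(row)
        if sys.version_info[0] >= 3:
            # Under python3 the serialized JSON is a unicode string;
            # BigQuery expects utf-8 encoded newline-delimited JSON.
            s = s.encode('utf-8')
        tmp_file_handle.write(s)
        tmp_file_handle.write(b'\n')
```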
Since S3Hook is reimplemented based on the AwsHook
using boto3, its package dependencies need to be
updated as well.
Closes#2790 from m1racoli/fix-setup-s3
The SSH Operator will throw an empty "SSH operator error" when running
commands that do not immediately log something to the terminal. This is
due to a call to stdout.channel.recv when the channel currently has a
0-size buffer, either because the command has not yet logged anything,
or never will (e.g. `sleep 5`).
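A hedged sketch of the guard (names are illustrative, not the operator's
exact fix): only read from the paramiko channel when data is actually
available.

```python
import time


def read_output(channel, log):
    # Poll until the remote command exits; recv only when bytes are waiting,
    # so commands that print nothing (e.g. `sleep 5`) never trigger an empty read.
    while not channel.exit_status_ready():
        if channel.recv_ready():
            log.info(channel.recv(1024).decode('utf-8'))
        else:
            time.sleep(0.1)
```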
Make code PEP8 compliant
Closes#2785 from RJKeevil/fix-ssh-operator-no-terminal-output
Execution dates can contain special characters that need to be URL
encoded. In the case of timezone information, that information is lost
if it is not URL encoded.
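For example, the `+` in a timezone offset is lost unless the execution
date is URL encoded before being put into a link:

```python
try:
    from urllib.parse import quote_plus  # python3
except ImportError:
    from urllib import quote_plus  # python2

execution_date = '2017-11-10T00:00:00+01:00'
print(quote_plus(execution_date))  # 2017-11-10T00%3A00%3A00%2B01%3A00
```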
Closes#2779 from bolkedebruin/AIRFLOW-1801
The change from boto2 to boto3 in S3Hook caused this to break (the
return type of `hook.get_key()` changed). There's a better method
designed for that which we should use anyway.
This wasn't caught by the tests as the mocks
weren't updated. Rather
than mocking the return of the hook I have changed
it to use "moto"
(already in use elsewhere in the tests) to mock at
the S3 layer, not
our hook.
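Roughly, a test can then set up a fake bucket with moto and exercise the
real log-reading code against it (bucket and key names below are made up):

```python
import boto3
from moto import mock_s3


@mock_s3
def test_s3_task_log():
    s3 = boto3.resource('s3', region_name='us-east-1')
    s3.create_bucket(Bucket='my-log-bucket')
    s3.Object('my-log-bucket', 'dag/task/2017-01-01/1.log').put(Body=b'log line')
    # ...exercise the S3 log handler against the fake bucket here...
```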
Closes#2773 from ashb/AIRFLOW-1756-s3-logging-boto3-fix
With the switch to Boto3 we now need the content
to be bytes, not a
string. On Python2 there is no difference, but for
Python3 this matters.
And since there were no real tests covering the
S3Hook I've added some
basic ones.
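As a minimal illustration of the difference described above (bucket and
key names are placeholders):

```python
import boto3

s3 = boto3.client('s3')
# Encode string data to bytes before uploading: a no-op difference on
# Python2, but required under Python3 per the change above.
body = 'hello world'.encode('utf-8')
s3.put_object(Bucket='my-bucket', Key='greeting.txt', Body=body)
```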
Closes#2771 from ashb/AIRFLOW-1797
python-daemon declares its docutils dependency in a setup_requires
clause, and 'python setup.py install' fails since it misses
that dependency.
Closes#2765 from wrp/docutils
The DruidOperator allows you to template the intervals field, which is
important when you are doing backfills with Druid. This field was
missing from the constructor and Airflow threw a warning.
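A hedged sketch of what declaring a templated field looks like; the class
and field names here are illustrative, not the exact DruidOperator code:

```python
from airflow.models import BaseOperator


class MyDruidLikeOperator(BaseOperator):
    # Fields listed here are rendered by Jinja before execute() runs,
    # e.g. intervals='{{ ds }}/{{ tomorrow_ds }}' for backfills.
    template_fields = ('intervals', 'index_spec_str')

    def __init__(self, intervals=None, index_spec_str=None, *args, **kwargs):
        super(MyDruidLikeOperator, self).__init__(*args, **kwargs)
        self.intervals = intervals
        self.index_spec_str = index_spec_str
```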
Closes#2764 from Fokko/patch-1
Logging functionality for SSHOperator was added in
[AIRFLOW-1712] but it
only logged stdout.
This commit also logs stderr to log.warning.
Closes#2761 from OpringaoDoTurno/stderr_in_ssh
Fixes two bugs in the Task Instances view: batch clear was not working
for task instances in the RUNNING state, and all batch operations
failed when manually triggered task instances were selected, because
those have a different execution date format.
Closes#2759 from yrqls21/fix-ti-batch-clear-n-set-state-bugs
Fix long task output lines with unicode hanging the parent process.
Tasks whose output gets piped into a file in the parent airflow process
would hang if they produced long lines with unicode characters.
Closes#2758 from aoen/ddavydov--fix_unicode_output_string
This commit adopts the `provide_session` helper in almost the entire
codebase. This ensures sessions are handled and closed consistently.
In particular, this ensures we don't forget to close and thus leak
database connections.
As an additional change, the `provide_session` helper has been extended
to also roll back and close created connections under error conditions.
As an additional helper, this commit also introduces a context manager
that provides the same functionality as the `provide_session` decorator.
This is helpful in cases where the scope of a session should be smaller
than the entire method where it is being used.
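Illustrative usage of both forms; the module path and the context manager
name below are assumptions for the sketch, not confirmed from the diff:

```python
from airflow.models import TaskInstance
from airflow.utils.db import create_session, provide_session


@provide_session
def count_task_instances(dag_id, session=None):
    # The decorator injects (and afterwards closes) a session if the caller
    # did not supply one; under error conditions it rolls the session back.
    return session.query(TaskInstance).filter(TaskInstance.dag_id == dag_id).count()


# Context-manager form, for a scope narrower than a whole function:
with create_session() as session:
    session.query(TaskInstance).filter(TaskInstance.dag_id == 'example_dag').delete()
```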
Closes#2739 from StephanErb/session_close
Make use of paramiko's set_keepalive method to send keepalive packets
every keepalive_interval seconds. This will prevent long-running queries
with no terminal output from being terminated as idle, for example by an
intermediate NAT. Enabled by default with a 30 second interval.
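Roughly what the hook does after opening the connection, per the
description above (host details are placeholders):

```python
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect('example.com', username='airflow')

# Send a keepalive packet every 30 seconds so idle-looking sessions are
# not dropped, e.g. by an intermediate NAT.
client.get_transport().set_keepalive(30)
```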
Closes#2749 from RJKeevil/add-sshhook-keepalive
https://github.com/spulec/moto/pull/1048 introduced `docker` as a
dependency in Moto, causing a conflict as Airflow uses `docker-py`. Since
the two packages do not work together, Moto is pinned to the version
prior to that change.
Some docstrings were missing spaces, causing them
to render strangely
in documentation. This corrects the issue by
adding in the spaces.
Closes#2667 from cjonesy/master
The new logging framework was not properly capturing stdout/stderr
output. Redirection to the correct logging facility is required.
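A hedged sketch of the redirection idea; the writer class and logger names
are illustrative, not the code from this change:

```python
import logging
import sys


class StreamToLogger(object):
    """File-like object that forwards writes to a logger."""

    def __init__(self, logger, level):
        self.logger = logger
        self.level = level

    def write(self, message):
        if message.strip():
            self.logger.log(self.level, message.rstrip())

    def flush(self):
        pass


sys.stdout = StreamToLogger(logging.getLogger('airflow.task.stdout'), logging.INFO)
sys.stderr = StreamToLogger(logging.getLogger('airflow.task.stderr'), logging.WARNING)
```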
Closes#2745 from bolkedebruin/redirect_std
Previously the experimental API was either wide open (allow any request)
or secured behind Kerberos. This adds a third option: deny all.
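The deny-all behaviour can then be selected in `airflow.cfg`; the exact
backend module path below is an assumption:

```
[api]
auth_backend = airflow.api.auth.backend.deny_all
```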
Closes#2737 from ashb/exp-api-securable
The web interface should not use the experimental API as the
authentication options differ between the two. Additionally, rather than
having an API call to get the last run data we can easily include it in
the generated HTML response. One less round-trip, fewer endpoints, and
less time before the page has fully rendered.
This is originally based on @NielsZeilemaker's PR for the same Jira
issue (#2734).
Closes#2738 from ashb/no-exp-api-from-web-interface
Until now, the dag processor had its own logging implementation,
making it hard to adjust for certain use cases
like working
in a container.
This patch moves everything to the standard
logging framework.
Closes#2728 from bolkedebruin/AIRFLOW-1018
Adds a RedshiftHook class, allowing for management of AWS Redshift
clusters and snapshots using the boto3 library. Also adds a new test
file and unit tests for the class methods.
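For context, a hedged sketch of the underlying boto3 call such a hook
wraps (the cluster identifier is made up, and the hook's own method
names may differ):

```python
import boto3

redshift = boto3.client('redshift', region_name='us-east-1')
resp = redshift.describe_clusters(ClusterIdentifier='my-cluster')
print(resp['Clusters'][0]['ClusterStatus'])
```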
Closes#2717 from andyxhadji/1695
Certain schemas for group membership return a
string
instead of a list. Instead of using a check we now
use the entries API from ldap3.
Closes#2731 from bolkedebruin/AIRFLOW-1711
Add 'exclude-packages' and 'repositories' as options to
SparkSubmitOperator, as they were missing.
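Illustrative usage with the new options (paths and coordinates are
placeholders; parameter names assume the usual snake_case mapping of the
spark-submit flags):

```python
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

spark_job = SparkSubmitOperator(
    task_id='spark_job',
    application='/path/to/app.jar',
    packages='com.example:lib:1.0',
    exclude_packages='com.example:conflicting-dep',
    repositories='https://repo.example.com/maven',
)
```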
Closes#2725 from kretes/AIRFLOW-1757-spark-new-options
Before initializing the logging framework, we want
to set the python
path so the logging config can be found.
Closes#2721 from Fokko/AIRFLOW-1731-import-pythonpath