Add AWS Redshift Cluster Sensor to contrib, along with corresponding unit tests. Additionally, updated the Redshift Hook cluster_status method to better handle the cluster_not_found exception, added unit tests, and corrected linting errors.
Closes #2849 from andyxhadji/AIRFLOW-1888
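For illustration, a minimal sketch of how a cluster_status method can map the missing-cluster error to a sentinel value, assuming a boto3 Redshift client; the function and error-code handling below are an approximation, not the exact hook code:

```python
from botocore.exceptions import ClientError


def cluster_status(redshift_client, cluster_identifier):
    """Return the cluster status, or 'cluster_not_found' if it does not exist."""
    try:
        response = redshift_client.describe_clusters(
            ClusterIdentifier=cluster_identifier)
        return response['Clusters'][0]['ClusterStatus']
    except ClientError as exc:
        # the exact error code may differ ('ClusterNotFound' vs.
        # 'ClusterNotFoundFault'); treat a missing cluster as a normal state
        if 'ClusterNotFound' in exc.response['Error']['Code']:
            return 'cluster_not_found'
        raise
```

A sensor's poke method can then simply compare this value against the cluster state it is waiting for.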
s3_endpoint_url is a legacy name from when AwsHook was only used to connect to S3. The more general endpoint_url matches how this piece of information is effectively used elsewhere.
Closes #2848 from villasv/AIRFLOW-1887
Rather than having try_number+1 in various places, try_number now automatically contains the right value for the next run of the task instance, and handles the case where try_number is accessed while the task is currently running.
This showed up as a bug where the logs from running operators would show up in the next log file (2.log for the first try).
Closes #2832 from ashb/AIRFLOW-1873-task-operator-log-try-number
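A rough sketch of the property-based approach, assuming the raw counter is stored in a private _try_number attribute (names are illustrative, not the exact TaskInstance code):

```python
from airflow.utils.state import State


class TaskInstanceSketch(object):
    def __init__(self):
        self._try_number = 0   # raw counter, incremented when a try starts
        self.state = None

    @property
    def try_number(self):
        # while running, report the current try; otherwise report the try
        # that will be used the next time the task instance is run
        if self.state == State.RUNNING:
            return self._try_number
        return self._try_number + 1
```

Log filenames and operator logs can then use try_number directly instead of sprinkling +1 in various places.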
Previously logging was set up outside a TaskInstance; this puts everything inside. It also properly closes the logging.
Closes #2837 from bolkedebruin/AIRFLOW-1879
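A hedged sketch of the shape of this change, assuming handlers that expose a set_context method as Airflow's task log handlers do; the wrapper function itself is illustrative:

```python
import logging


def run_with_task_logging(task_instance, run_callable):
    logger = logging.getLogger('airflow.task')
    handlers = [h for h in logger.handlers if hasattr(h, 'set_context')]
    for handler in handlers:
        handler.set_context(task_instance)   # route output to this TI's log
    try:
        run_callable()
    finally:
        for handler in handlers:
            handler.close()                  # properly close the log on exit
```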
Previously operators logged under airflow.operators or airflow.contrib.operators. This unifies them under airflow.task.operators, allowing the task log to pick them up without 'double' logging.
Closes #2838 from bolkedebruin/AIRFLOW-1881
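Illustrative only: the effect is that operator loggers become children of airflow.task, so handlers attached there also capture operator output. A minimal sketch of the naming scheme (not the exact LoggingMixin code):

```python
import logging


class OperatorLogSketch(object):
    @property
    def log(self):
        # child of 'airflow.task', so the task log handler picks it up
        name = 'airflow.task.operators.{}.{}'.format(
            self.__class__.__module__, self.__class__.__name__)
        return logging.getLogger(name)
```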
Previously setting the context was not propagated to the parent loggers. Unfortunately, for a logger that is not explicitly defined, the returned logger is shallow, i.e. it does not have handlers defined. So to set the context it is necessary to walk the tree.
Closes #2831 from bolkedebruin/fix_logging
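A sketch of the tree walk, close in spirit to the fix; handlers that do not know about Airflow's context are simply skipped:

```python
def set_context(logger, value):
    """Walk up the logger hierarchy and set the context on every handler."""
    while logger:
        for handler in logger.handlers:
            try:
                handler.set_context(value)
            except AttributeError:
                # plain stdlib handlers have no set_context; ignore them
                pass
        if not logger.propagate:
            break
        logger = logger.parent   # shallow loggers carry no handlers themselves
```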
The cmd is first copied before the password is masked. This ensures that the original cmd isn't changed: replacing the password with a masked value would otherwise also replace the password in the original command, since it is passed by reference.
Closes #2817 from Fokko/AIRFLOW-1850-copy-cmd-before-replacing-password
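A minimal illustration of the fix (the masking helper is hypothetical): operate on a copy so the caller's list stays intact.

```python
from copy import copy


def mask_password_in_cmd(cmd, password):
    masked_cmd = copy(cmd)              # never mutate the original list
    for i, part in enumerate(masked_cmd):
        if part == password:
            masked_cmd[i] = '*' * 8     # mask only in the copy
    return masked_cmd
```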
This change enables the scheduler to recover from temporary database
errors and downtimes. The same holds true for the webserver if run
without its regular worker refresh.
The reconnect logic is based on a truncated binary exponential backoff
to ensure reconnect attempts don't overload the database.
Included changes:
* Switch to recommended pessimistic disconnect handling for engines
http://docs.sqlalchemy.org/en/rel_1_1/core/pooling.html#disconnect-handling-pessimistic
* Remove legacy pool-based disconnect handling.
* Ensure event handlers are registered for each newly created engine.
Engines are re-initialized in child processes so this is crucial for
correctness.
This commit is based on a contribution by @vklogin
https://github.com/apache/incubator-airflow/pull/2744
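The pessimistic handling boils down to the recipe from the linked SQLAlchemy documentation, roughly as below; the connection URL is a placeholder and the listener must be registered on every freshly created engine:

```python
from sqlalchemy import create_engine, event, exc, select

engine = create_engine('postgresql://airflow:airflow@localhost/airflow')


@event.listens_for(engine, 'engine_connect')
def ping_connection(connection, branch):
    if branch:
        return  # 'branch' connections reuse an already checked parent
    try:
        connection.scalar(select([1]))      # cheap liveness probe on checkout
    except exc.DBAPIError as err:
        if err.connection_invalidated:
            # the pool discarded the stale connection; retry on a fresh one
            connection.scalar(select([1]))
        else:
            raise
```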
When a forked process or the entire interpreter terminates, we have
to close all pooled database connections. The database can run out
of connections otherwise. At a minimum, it will print errors in its
log file.
By using an atexit handler we ensure that connections are closed
for each interpreter and Gunicorn worker termination. Only usages
of multiprocessing.Process require special handling as those
terminate via os._exit() which does not run finalizers.
This commit is based on a contribution by @dhuang
https://github.com/apache/incubator-airflow/pull/2767
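A hedged sketch of the atexit-based cleanup; the engine and session setup is only illustrative, the relevant part is the registered finalizer:

```python
import atexit

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine('postgresql://airflow:airflow@localhost/airflow')
Session = scoped_session(sessionmaker(bind=engine))


def dispose_orm():
    """Return pooled connections to the database before the process exits."""
    Session.remove()     # close the scoped session
    engine.dispose()     # drop the engine's connection pool


# runs on normal interpreter and Gunicorn worker exit; processes that
# terminate via os._exit() bypass this, hence the special handling above
atexit.register(dispose_orm)
```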
All file descriptors except 0, 1 and 2 will be closed before the
child process is executed. This is the default on Python 3.2 and
above. This patch ensures consistent behaviour for older Python
versions.
Resources will be released once the main thread disposes
them, independent of the longevity of its subprocesses.
Background information:
* https://www.python.org/dev/peps/pep-0446/
* https://bugs.python.org/issue7213
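Concretely, this amounts to passing close_fds explicitly, as sketched below with an illustrative command; on Python 3.2+ this is already the default:

```python
import subprocess

proc = subprocess.Popen(
    ['airflow', 'run', '--raw', 'example_dag', 'example_task', '2017-01-01'],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    close_fds=True,   # don't leak the parent's descriptors (except 0, 1, 2)
)
```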
In situations where a database is heavily loaded with connections it
can be beneficial for operators to (temporarily) reduce the connection
footprint of Airflow on the database. This is particularly important
when Airflow or self-made extensions do not dispose the connection
pool when terminating.
Disabling the connection pool comes with a slowdown but that may be
acceptable in many deployment scenarios.
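Underneath, disabling the pool means creating the engine with SQLAlchemy's NullPool, roughly as below (placeholder URL); each session then opens and closes its own connection:

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

engine = create_engine(
    'postgresql://airflow:airflow@localhost/airflow',
    poolclass=NullPool,   # no pooled connections kept around
)
```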
Converting to naive time is required in order to make sure crons run at exact times. E.g. if you specify a run at 8:00 pm every day, you do not want it to suddenly run at 7:00 pm due to DST.
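A hedged sketch of the idea (zone and dates are illustrative): drop the tzinfo and feed croniter the local wall-clock time, so a daily 8:00 pm schedule survives the DST switch:

```python
from datetime import datetime

import pytz
from croniter import croniter

local_tz = pytz.timezone('Europe/Amsterdam')
aware = local_tz.localize(datetime(2017, 10, 28, 20, 0))  # last 8 pm before DST ends

naive = aware.replace(tzinfo=None)                        # keep wall-clock time only
next_run = croniter('0 20 * * *', naive).get_next(datetime)
print(next_run)   # 2017-10-29 20:00 local time, not 19:00
```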
As a user it would be nice to properly log the exceptions thrown in the collect_dags function, to make it possible to debug the faulty DAGs.
Closes #2803 from Fokko/AIRFLOW-1838-Properly-log-collect-dags-exception
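The pattern is roughly the one below (names are illustrative): catch per-file failures, log the traceback, and keep collecting the remaining DAGs.

```python
import logging

log = logging.getLogger(__name__)


def collect_dags(dag_files, process_file):
    """Import each DAG file and log a full traceback for the faulty ones."""
    for filepath in dag_files:
        try:
            process_file(filepath)
        except Exception:
            # log.exception records the stack trace of the faulty DAG file
            log.exception('Failed to import: %s', filepath)
```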
Fixed the incorrect superclass name in the GCS-to-GCS copy operator from `GoogleCloudStorageOperatorToGoogleCloudStorageOperator` to `GoogleCloudStorageToGoogleCloudStorageOperator`.
Closes #2812 from kaxil/patch-2
If the page was scrolled before the dialog was displayed, then the grey background would not cover the whole page correctly.
Closes #2813 from ashb/AIRFLOW-1845-modal-background-on-long-pages
Add two UI improvements: 1) the links from "DAG Runs" to the DAG graph view now include the execution_date, so you land on the expected DAG run instead of the last one; 2) a new link with the same behaviour is added for the "Run Id" column.
Closes #2801 from abij/AIRFLOW-1229
Copies an object from a Google Cloud Storage bucket to another Google Cloud Storage bucket, with renaming if required.
Closes #2808 from litdeviant/gcs_to_gcs
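A hypothetical usage sketch; the module path and constructor arguments below are assumptions based on the description, not necessarily the merged operator's exact signature:

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.gcs_to_gcs import \
    GoogleCloudStorageToGoogleCloudStorageOperator

dag = DAG('gcs_copy_example', start_date=datetime(2017, 11, 1),
          schedule_interval=None)

copy_and_rename = GoogleCloudStorageToGoogleCloudStorageOperator(
    task_id='copy_report',
    source_bucket='source-bucket',                 # placeholder bucket names
    source_object='reports/2017-11-01.csv',
    destination_bucket='archive-bucket',
    destination_object='archive/report.csv',       # renamed on copy
    dag=dag,
)
```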
The documentation stated it's a String type, but in the code it was a union of String and Bool. Changed it to a pure string by substituting None for False, since the operator and hook code only checks for the presence of a value in the variable. This makes it more predictable by using the simpler String type.
Closes #2807 from litdeviant/gcs-operator-hook