Commit graph

4496 commits

Author SHA1 Message Date
Andy Hadjigeorgiou 4936a80773 [AIRFLOW-1888] Add AWS Redshift Cluster Sensor
Add AWS Redshift Cluster Sensor to contrib, along with corresponding unit
tests. Additionally, updated Redshift Hook cluster_status method to better
handle cluster_not_found exception, added unit tests, and corrected linting
errors.

Closes #2849 from andyxhadji/AIRFLOW-1888
2017-12-08 10:16:44 +01:00
Victor Villas 9ad6d1202d [AIRFLOW-1887] Renamed endpoint url variable
s3_endpoint_url is a legacy name from when AwsHook was only used to connect
to S3. endpoint_url is more general and is what is effectively used
elsewhere for this piece of information.

Closes #2848 from villasv/AIRFLOW-1887
2017-12-07 13:54:34 +00:00
Ash Berlin-Taylor 4b4e504eea [AIRFLOW-1873] Set TI.try_number to right value depending on TI state
Rather than having try_number+1 in various places, try_number will now
automatically contain the right value for when the TI will next be run,
and handle the case where try_number is accessed while the task is
currently running.

This showed up as a bug where the logs from running operators would show
up in the next log file (2.log for the first try).

Closes #2832 from ashb/AIRFLOW-1873-task-operator-log-try-number
2017-12-07 13:31:46 +00:00
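The try_number behaviour described above can be sketched as a property, assuming a hypothetical `TaskInstanceSketch` class with plain string states and a private `_try_number` counter of tries already started. This is an illustration of the idea, not Airflow's `TaskInstance`, which carries more states and persists to the database.

```python
RUNNING = "running"
SUCCESS = "success"


class TaskInstanceSketch:
    """Hypothetical stand-in for a task instance (not Airflow's class)."""

    def __init__(self, state=None):
        self.state = state
        self._try_number = 0  # tries started so far

    @property
    def try_number(self):
        # While running, report the try in progress; otherwise report
        # the number the next run will use, so callers never add 1.
        if self.state == RUNNING:
            return self._try_number
        return self._try_number + 1

    def start(self):
        # Starting a run consumes the next try number.
        self._try_number += 1
        self.state = RUNNING


ti = TaskInstanceSketch()
print(ti.try_number)  # 1: the first run will be try 1
ti.start()
print(ti.try_number)  # 1: logs of the running try belong in 1.log
ti.state = SUCCESS
print(ti.try_number)  # 2: the next run would be try 2
```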
Guillermo Rodriguez Cano ad4f751111 [AIRFLOW-1891] Fix non-ascii typo in default configuration template
Closes #2851 from wileeam/non-utf8-typo-default-cfg
2017-12-07 11:51:15 +01:00
Clinton Boys 09f7142d41 Update README.md
Added Playbuzz to list of companies using Airflow.

Closes #2828 from clintonboys/patch-5
2017-12-06 09:54:41 +01:00
Bolke de Bruin 301ce6b4f0 [AIRFLOW-1879] Handle ti log entirely within ti
Previously logging was set up outside a TaskInstance; this puts everything
inside. It also properly closes the logging.

Closes #2837 from bolkedebruin/AIRFLOW-1879
2017-12-06 09:46:53 +01:00
William Pursell 06b41fbe1b [AIRFLOW-1869] Write more error messages into gcs and file logs
Closes #2826 from wrp/gcs-log
2017-12-05 11:24:35 -08:00
William Pursell a9ceca5e04 [AIRFLOW-1876] Write subtask id to task log header
Closes #2835 from wrp/subtask-id
2017-12-05 11:22:50 -08:00
Paulius ff0d75f062 [AIRFLOW-1554] Fix wrong DagFileProcessor termination method call
Closes #2821 from pdambrauskas/fix/wrong_termination_call
2017-12-05 19:39:25 +01:00
Alexis Rosuel c0c71cac63 add hostnfly as users of airflow
Closes #2845 from alexisrosuel/master
2017-12-05 17:13:47 +01:00
Bolke de Bruin bdafb12f8d [AIRFLOW-342] Do not use amqp, rpc as result backend
amqp and rpc (and redis most likely) cannot store results for tasks long
enough.

Closes #2830 from bolkedebruin/AIRFLOW-342
2017-12-05 10:14:50 +01:00
Bolke de Bruin aa737a582c [AIRFLOW-966] Make celery broker_transport_options configurable
Required for changing the visibility timeout and other options needed for
Redis/SQS.

Closes #2842 from bolkedebruin/AIRFLOW-966
2017-12-05 10:13:05 +01:00
Bolke de Bruin 97383f76d0 [AIRFLOW-1881] Make operator log in task log
Previously operators logged under airflow.operators or
airflow.contrib.operators. This unifies them under airflow.task.operators,
allowing the task log to pick them up and avoiding 'double' logging.

Closes #2838 from bolkedebruin/AIRFLOW-1881
2017-12-05 09:19:11 +01:00
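Why the airflow.task.operators namespace matters follows from Python's logger hierarchy: records propagate to handlers attached to ancestor loggers. A minimal stdlib sketch, where `ListHandler` is a hypothetical in-memory stand-in for the task log handler:

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps records in memory."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.name}: {record.getMessage()}")


# Attach the handler only to the task logger, as a task log would be.
task_log_handler = ListHandler()
task_logger = logging.getLogger("airflow.task")
task_logger.setLevel(logging.INFO)
task_logger.addHandler(task_log_handler)

# A logger under airflow.task.operators inherits the INFO level and
# propagates its records up to the airflow.task handler.
logging.getLogger("airflow.task.operators").info("running operator")

# The old namespace sits beside airflow.task, not under it, so its
# records never pass through the task log handler.
old_logger = logging.getLogger("airflow.operators")
old_logger.addHandler(logging.NullHandler())  # avoid lastResort output
old_logger.warning("not in task log")

print(task_log_handler.messages)
```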
Kaxil Naik 28c2d8d90a [AIRFLOW-XXX] Added DataReply to the list of Airflow Users
Closes #2841 from kaxil/patch-1
2017-12-04 14:55:02 -08:00
Kaxil Naik 8d2f430732 [AIRFLOW-1883] Get File Size for objects in Google Cloud Storage
Closes #2840 from kaxil/Get_File_Size
2017-12-04 14:10:37 -08:00
Bolke de Bruin 1359d87352 Merge pull request #2822 from StephanErb/db_robustness 2017-12-02 16:22:50 +01:00
Bolke de Bruin 406d738b1c [AIRFLOW-1872] Set context for all handlers including parents
Previously setting the context was not propagated to the parent loggers.
Unfortunately, in the case of a logger that is not explicitly defined, the
returned logger is shallow, i.e. it does not have handlers defined. So to
set the context it is required to walk the tree.

Closes #2831 from bolkedebruin/fix_logging
2017-12-02 09:56:13 +01:00
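The tree walk described above can be sketched with the standard `logging` module. `ContextAwareHandler` and the `set_context(logger, value)` helper are hypothetical names for this illustration; the walk stops at a logger whose `propagate` is False, because its records would never reach handlers further up.

```python
import logging


class ContextAwareHandler(logging.Handler):
    """Hypothetical handler that needs per-task context before emitting."""

    def __init__(self):
        super().__init__()
        self.context = None

    def set_context(self, context):
        self.context = context

    def emit(self, record):
        pass  # a real handler would write to a context-specific file


def set_context(logger, value):
    """Set context on every supporting handler from `logger` upwards."""
    current = logger
    while current is not None:
        for handler in current.handlers:
            if hasattr(handler, "set_context"):
                handler.set_context(value)
        # Climb only while records would propagate to the parent.
        current = current.parent if current.propagate else None


parent = logging.getLogger("sketch.task")
handler = ContextAwareHandler()
parent.addHandler(handler)

# A child logger that was never configured is shallow: no handlers.
child = logging.getLogger("sketch.task.operators")
set_context(child, "dag_id/task_id/2017-12-02")
print(handler.context)
```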
Kaxil Naik 3e321790d5 [AIRFLOW-1855][AIRFLOW-1866] Add GCS Copy Operator to copy multiple files
Closes #2819 from kaxil/master
2017-12-01 15:51:30 -08:00
Bolke de Bruin b9c82c0400 [AIRFLOW-1870] Enable flake8 tests
Flake8 tests now run for diffs

Closes #2829 from bolkedebruin/use_flake8
2017-11-30 15:57:17 +01:00
Fokko Driesprong 8e7b0abed2 [AIRFLOW-1785] Enable Python 3 tests
Enable tests under Python 3 to make sure that tests run under Python 3.

Closes #2755 from Fokko/AIRFLOW-1785-Enable-Python3-tests
2017-11-29 15:31:53 +01:00
Fokko Driesprong 4135c82bf4 [AIRFLOW-1850] Copy cmd before masking
The cmd is copied before the password is masked. This ensures that the
original cmd isn't changed: since the command is passed by reference,
replacing the password with a masked value in place would also replace it
in the original command.

Closes #2817 from Fokko/AIRFLOW-1850-copy-cmd-before-replacing-password
2017-11-29 15:05:39 +01:00
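A minimal sketch of copy-before-mask, assuming the command is a list of strings; `masked_for_logging` and the example command are hypothetical:

```python
import copy


def masked_for_logging(cmd, password, mask="********"):
    """Return a copy of `cmd` with the password masked; `cmd` is untouched."""
    # Copy first: lists are passed by reference, so masking in place
    # would also change the command that is actually executed.
    masked = copy.copy(cmd)
    for i, part in enumerate(masked):
        masked[i] = part.replace(password, mask)
    return masked


cmd = ["some-cli", "--user", "airflow", "--password", "s3cr3t"]
print(masked_for_logging(cmd, "s3cr3t"))
print(cmd)  # still contains the real password for execution
```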
Stephan Erb 94deac34ec [AIRFLOW-1665] Reconnect on database errors
This change enables the scheduler to recover from temporary database
errors and downtimes. The same holds true for the webserver if run
without its regular worker refresh.

The reconnect logic is based on a truncated exponential binary backoff
to ensure reconnect attempts don't overload the database.

Included changes:

* Switch to recommended pessimistic disconnect handling for engines
  http://docs.sqlalchemy.org/en/rel_1_1/core/pooling.html#disconnect-handling-pessimistic
* Remove legacy pool-based disconnect handling.
* Ensure event handlers are registered for each newly created engine.
  Engines are re-initialized in child processes so this is crucial for
  correctness.

This commit is based on a contribution by @vklogin
https://github.com/apache/incubator-airflow/pull/2744
2017-11-29 12:29:47 +01:00
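One common formulation of truncated binary exponential backoff is sketched below: after failure number n, wait a random number of slots in [0, 2^n - 1], with n capped so waits stay bounded. The function names, slot length, and cap are assumptions for illustration, `ConnectionError` stands in for a database disconnect error, and this is not the code from the commit.

```python
import random
import time


def backoff_delay(attempt, slot_time=0.1, max_exponent=6, rng=random):
    """Seconds to wait after failure number `attempt` (1-based)."""
    exponent = min(attempt, max_exponent)  # truncation keeps waits bounded
    return rng.randint(0, 2 ** exponent - 1) * slot_time


def call_with_reconnect(fn, max_attempts=5, sleep=time.sleep):
    """Call fn(), retrying on ConnectionError with randomized backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(backoff_delay(attempt))


# Usage: a hypothetical query that fails twice, then succeeds.
failures = iter([True, True, False])


def flaky_query():
    if next(failures):
        raise ConnectionError("database unavailable")
    return "ok"


result = call_with_reconnect(flaky_query, sleep=lambda seconds: None)
print(result)
```

Randomizing the wait spreads out reconnect attempts from many processes, so they do not all hit a recovering database at the same moment.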
Stephan Erb 6bf1a6edaf [AIRFLOW-1559] Dispose SQLAlchemy engines on exit
When a forked process or the entire interpreter terminates, we have
to close all pooled database connections. The database can run out
of connections otherwise. At a minimum, it will print errors in its
log file.

By using an atexit handler we ensure that connections are closed
for each interpreter and Gunicorn worker termination. Only usages
of multiprocessing.Process require special handling as those
terminate via os._exit() which does not run finalizers.

This commit is based on a contribution by @dhuang
https://github.com/apache/incubator-airflow/pull/2767
2017-11-29 09:49:57 +01:00
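The atexit approach can be sketched with the standard library alone. `PoolSketch` and `dispose_orm` are hypothetical stand-ins for a pooled database engine and its cleanup function; the wrapper shows the extra step the text mentions for `multiprocessing.Process` children, which exit via `os._exit()` and therefore skip atexit handlers.

```python
import atexit


class PoolSketch:
    """Hypothetical stand-in for a pooled database engine."""

    def __init__(self):
        self.open_connections = ["conn-1", "conn-2"]

    def dispose(self):
        # Close every pooled connection so the database can reclaim it.
        self.open_connections.clear()


pool = PoolSketch()


def dispose_orm():
    """Hypothetical cleanup hook: release all pooled connections."""
    pool.dispose()


# Runs when the interpreter exits normally; disposing twice is harmless.
atexit.register(dispose_orm)


def run_in_child(work):
    """Wrap a multiprocessing.Process target: the child ends via
    os._exit(), which skips atexit handlers, so clean up explicitly."""
    try:
        work()
    finally:
        dispose_orm()


run_in_child(lambda: None)
print(pool.open_connections)
```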
Stephan Erb 5a303ebbc5 [AIRFLOW-1559] Close file handles in subprocesses
All file descriptors except 0, 1 and 2 will be closed before the
child process is executed. This is the default on Python 3.2 and
above. This patch ensures consistent behaviour for older Python
versions.

Resources will be released once the main thread disposes
them, independent of the longevity of its subprocesses.

Background information:

* https://www.python.org/dev/peps/pep-0446/
* https://bugs.python.org/issue7213
2017-11-29 09:46:42 +01:00
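Passing `close_fds=True` to `subprocess.Popen` requests the Python 3.2+ default explicitly, so older interpreters behave the same way. A small sketch (the child command is illustrative):

```python
import subprocess
import sys

# close_fds=True: the child keeps only stdin, stdout and stderr (0, 1, 2).
# This is the default on Python 3.2+; passing it explicitly makes older
# Python versions behave the same way.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('child ran')"],
    stdout=subprocess.PIPE,
    close_fds=True,
)
out, _ = proc.communicate()
print(out.decode().strip())
```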
Stephan Erb 3bde95e599 [AIRFLOW-1559] Make database pooling optional
In situations where a database is heavily loaded with connections it
can be beneficial for operators to (temporarily) reduce the connection
footprint of Airflow on the database. This is particularly important
when Airflow or self-made extensions do not dispose the connection
pool when terminating.

Disabling the connection pool comes with a slowdown but that may be
acceptable in many deployment scenarios.
2017-11-29 08:50:34 +01:00
Crystal Qian 02112197c6 [AIRFLOW-1848][Airflow-1848] Fix DataFlowPythonOperator py_file extension doc comment
Closes #2816 from cjqian/1848
2017-11-28 10:57:36 +01:00
Bolke de Bruin d99053106e Merge pull request #2781 from bolkedebruin/AIRFLOW-1802 2017-11-27 21:39:28 +01:00
Igors Vaitkus d8115e982b [AIRFLOW-1843] Add Google Cloud Storage Sensor with prefix
Sensor for checking if there are any files in a bucket at a certain prefix

Closes #2809 from litdeviant/gcs_prefix_sensor
2017-11-27 11:35:01 -08:00
Chris Riccomini eff68882b2 Merge pull request #2786 from x/postgres_to_bigquery_operator 2017-11-27 10:55:57 -08:00
Bolke de Bruin f1ab56cc6a [AIRFLOW-1803] Time zone documentation 2017-11-27 15:54:27 +01:00
Bolke de Bruin 518a41acf3 [AIRFLOW-1826] Update views to use timezone aware objects 2017-11-27 15:54:27 +01:00
Bolke de Bruin f43c0e9ba5 [AIRFLOW-1827] Fix api endpoint date parsing 2017-11-27 15:54:27 +01:00
Bolke de Bruin 8aadc31125 [AIRFLOW-1806] Use naive datetime when using cron 2017-11-27 15:54:27 +01:00
Bolke de Bruin 9624f5f24e [AIRFLOW-1809] Update tests to use timezone aware objects 2017-11-27 15:54:27 +01:00
Bolke de Bruin dcac3e97a4 [AIRFLOW-1806] Use naive datetime for cron scheduling
Converting to naive time is required to make sure crons run at
exact times. E.g. if you specify a run at 8:00pm every day, you do
not want it to suddenly run at 7:00pm due to DST.
2017-11-27 15:54:27 +01:00
Bolke de Bruin 2f168634aa [AIRFLOW-1807] Force use of time zone aware db fields
This change will check if all date times being stored are
indeed timezone aware.
2017-11-27 15:54:27 +01:00
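A minimal sketch of such a check, using the standard definition of a naive datetime (no `tzinfo`, or a `tzinfo` whose `utcoffset()` returns None). `ensure_aware` is a hypothetical helper; in a database layer it would run before each value is stored.

```python
from datetime import datetime, timezone


def ensure_aware(value):
    """Reject naive datetimes before they are written to the database."""
    if value is not None and (
        value.tzinfo is None or value.tzinfo.utcoffset(value) is None
    ):
        raise ValueError("naive datetime is disallowed")
    return value


aware = ensure_aware(datetime(2017, 11, 27, 15, 54, tzinfo=timezone.utc))
print(aware.isoformat())

try:
    ensure_aware(datetime(2017, 11, 27, 15, 54))  # no tzinfo
except ValueError as exc:
    print(exc)
```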
Bolke de Bruin c857436b75 [AIRFLOW-1808] Convert all utcnow() to time zone aware
datetime.utcnow() does not set time zone information.
2017-11-27 15:54:20 +01:00
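A timezone-aware replacement for `datetime.utcnow()` can be sketched with `datetime.now(timezone.utc)`; the `utcnow` helper name here is an assumption for illustration:

```python
from datetime import datetime, timezone


def utcnow():
    """Current UTC time as a timezone-aware datetime.

    datetime.utcnow() returns a naive value (tzinfo is None);
    datetime.now(timezone.utc) attaches UTC explicitly.
    """
    return datetime.now(timezone.utc)


now = utcnow()
print(now.tzinfo, now.utcoffset())
```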
Bolke de Bruin a47255fb2d [AIRFLOW-1804] Add time zone configuration options
Time zone defaults to UTC as is the default now in order
to maintain backwards compatibility.
2017-11-27 15:53:03 +01:00
Bolke de Bruin b658c78f67 [AIRFLOW-1802] Convert database fields to timezone aware 2017-11-27 15:53:03 +01:00
Bolke de Bruin 59aba30649 [AIRFLOW-XXX] Add dask lock files to excludes 2017-11-27 15:47:12 +01:00
Hugo Prudente 68d3a80dcb [AIRFLOW-1790] Add support for AWS Batch operator
Closes #2762 from hprudent/aws-batch
2017-11-26 19:23:29 +01:00
lindsey anne 2728cde34b [AIRFLOW-XXX] Update README.md
Closes #2780 from runongirlrunon/patch-1
2017-11-26 19:17:45 +01:00
William Pursell d4816667e5 [AIRFLOW-1820] Remove timestamp from metric name
Closes #2792 from wrp/datetime
2017-11-26 19:15:41 +01:00
Sanjay P 0422157864 [AIRFLOW-1810] Remove unused mysql import in migrations.
Closes #2782 from MortalViews/master
2017-11-26 19:13:31 +01:00
Fokko Driesprong f5df0d3437 [AIRFLOW-1838] Properly log collect_dags exception
As a user it would be nice to properly log the exceptions thrown in the
collect_dags function, to help debug faulty DAGs.

Closes #2803 from Fokko/AIRFLOW-1838-Properly-log-collect-dags-exception
2017-11-26 15:38:41 +01:00
Kaxil Naik 4247ff0228 [AIRFLOW-1842] Fixed Super class name for the gcs to gcs copy operator
Fixed incorrect Super class name in gcs to gcs copy operator from
`GoogleCloudStorageOperatorToGoogleCloudStorageOperator` to
`GoogleCloudStorageToGoogleCloudStorageOperator`.

Closes #2812 from kaxil/patch-2
2017-11-26 15:36:42 +01:00
Ash Berlin-Taylor 87c6c83525 [AIRFLOW-1845] Modal background now covers long or tall pages
If the page was scrolled before the dialog was displayed then the grey
background would not cover the whole page correctly.

Closes #2813 from ashb/AIRFLOW-1845-modal-background-on-long-pages
2017-11-26 15:35:38 +01:00
Alexander Bij d76bf76de2 [AIRFLOW-1229] Add link to Run Id, incl execution_date
Add two UI improvements. 1: the links from "DAG runs" to the DAG graph view
include the execution_date, so you land on the expected DAG run instead of
the last DAG run. 2: a new link with the same behaviour is added for the
column "Run Id".

Closes #2801 from abij/AIRFLOW-1229
2017-11-23 10:46:21 +01:00
Igors Vaitkus 149195845d [AIRFLOW-1842] Add gcs to gcs copy operator with renaming if required
Copies an object from a Google Cloud Storage bucket to another Google Cloud
Storage bucket, with renaming if required.

Closes #2808 from litdeviant/gcs_to_gcs
2017-11-23 08:51:40 +01:00
Igors Vaitkus cbd6e70411 [AIRFLOW-1841] change False to None in operator and hook
Documentation stated it's a String type, but in code it was a union of
String and Bool. Changed it to a pure string by substituting None for
False, since the operator and hook code only checks for the presence of a
value in the variable. This makes it more predictable by using a simpler
String type.

Closes #2807 from litdeviant/gcs-operator-hook
2017-11-23 08:50:01 +01:00