Commit Graph

117 Commits

Author SHA1 Message Date
GRANT NICHOLAS cde3a5fecd [AIRFLOW-1517] Add minikube for kubernetes integration tests
Add better support for minikube integration tests. By default, minikube integration tests run with Kubernetes 1.7 and Kubernetes 1.8.
2018-01-11 15:28:32 -08:00
Daniel Imberman 78ff2fc180 [AIRFLOW-1517] Kubernetes Operator 2017-12-26 08:45:31 -08:00
Fokko Driesprong 815270bb56 [AIRFLOW-1911] Rename celeryd_concurrency
There are still celeryd_concurrency occurrences left in the code.
These need to be renamed to worker_concurrency to make the config
consistent with Celery.

Closes #2870 from Fokko/AIRFLOW-1911-update-airflow-config
2017-12-12 13:47:55 +01:00
Bolke de Bruin 22453d037e [AIRFLOW-1908] Fix celery broker options config load
Options were set to visibility timeout instead of broker_options
directly. Furthermore, options should be int, float, bool or string,
not all string.

Closes #2867 from bolkedebruin/AIRFLOW-1908
2017-12-12 12:44:06 +01:00
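The type requirement described in this commit can be illustrated with a small sketch. It assumes the options arrive as strings from a config file; `coerce_option` and the sample keys other than `visibility_timeout` are hypothetical names for illustration, not Airflow's actual implementation.

```python
def coerce_option(value: str):
    """Convert a config string to bool, int, or float where it parses cleanly.

    Hypothetical helper: anything that does not parse stays a string.
    """
    lowered = value.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


# Sample values as they might be read from a config section (all strings).
raw_options = {
    "visibility_timeout": "21600",
    "retry": "true",
    "rate": "0.5",
    "region": "eu-west-1",
}
broker_options = {key: coerce_option(val) for key, val in raw_options.items()}
```

Trying `int` before `float` keeps whole numbers as integers, which matters for options such as timeouts that a broker may expect as integer seconds.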
Fokko Driesprong 30076f1e45 [AIRFLOW-1840] Make celery configuration congruent with Celery 4
Explicitly set the celery backend from the config and align the config
with the celery config, as the mismatch might be confusing.

Closes #2806 from Fokko/AIRFLOW-1840-Fix-celery-config
2017-12-11 18:56:29 +01:00
Bolke de Bruin b9c82c0400 [AIRFLOW-1870] Enable flake8 tests
Flake8 tests now run for diffs

Closes #2829 from bolkedebruin/use_flake8
2017-11-30 15:57:17 +01:00
Bolke de Bruin 518a41acf3 [AIRFLOW-1826] Update views to use timezone aware objects 2017-11-27 15:54:27 +01:00
Stefanie Grunwald a61d9444cd [AIRFLOW-1669] Fix Docker and pin Moto to 1.1.19
https://github.com/spulec/moto/pull/1048 introduced `docker` as a
dependency in Moto, causing a conflict as Airflow uses `docker-py`. As
both packages don't work together, Moto is pinned to the version
prior to that change.
2017-11-02 14:23:32 +01:00
Maxime Beauchemin b464d23a6d [AIRFLOW-1698] Remove SCHEDULER_RUNS env var in systemd
In the very early days, the Airflow scheduler
needed to be restarted
every so often to take new DAG_FOLDERS mutations
into account properly. This is no longer
required.

Closes #2677 from mistercrunch/scheduler_runs
2017-10-18 21:55:57 +02:00
fenglu-g 7cb818bbac [AIRFLOW-1723] Support sendgrid in email backend
Closes #2695 from fenglu-g/master
2017-10-18 12:27:14 -07:00
Dan Davydov 21e94c7d15 [AIRFLOW-1697] Mode to disable charts endpoint 2017-10-10 11:33:50 -07:00
Bolke de Bruin 65f3b468a2 [AIRFLOW-1527] Refactor celery config
The celery config is currently part of the celery executor definition.
This is really inflexible for users wanting to change it. In addition
Celery 4 is moving to lowercase.

Closes #2542 from bolkedebruin/upgrade_celery
2017-09-25 11:19:16 -07:00
Bolke de Bruin fa1dc1eb20 Revert "[AIRFLOW-1368] Automatically remove Docker container on exit"
This reverts commit 46c86a5cd2.
2017-09-24 19:35:28 +02:00
Nathaniel Varona 46c86a5cd2 [AIRFLOW-1368] Automatically remove Docker container on exit
Closes #2411 from nathanielvarona/docker-operator
2017-09-22 10:15:23 -07:00
Fokko Driesprong eb2f589099 [AIRFLOW-1604] Rename logger to log
In most popular languages, the variable name log is the de facto
standard for logging. Rename LoggingMixin.py to logging_mixin.py
to comply with the Python standard.

Using .logger emits a deprecation warning.

Closes #2604 from Fokko/AIRFLOW-1604-logger-to-log
2017-09-19 10:17:14 +02:00
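The deprecation behavior described in this commit could look roughly like the following. This is a minimal sketch, not the code from the commit: the class name `LoggingMixinSketch` and the warning text are assumptions.

```python
import logging
import warnings


class LoggingMixinSketch:
    """Hypothetical mixin exposing `.log` and a deprecated `.logger` alias."""

    @property
    def log(self):
        # Name the logger after the concrete class using the mixin.
        cls = self.__class__
        return logging.getLogger(cls.__module__ + "." + cls.__name__)

    @property
    def logger(self):
        # Old attribute name: still works, but warns the caller.
        warnings.warn(
            "Use .log instead of .logger",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.log
```

Keeping the old name as a thin alias lets existing callers keep working while the warning points them at the new attribute.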
Fokko Driesprong de99aa20f4 [AIRFLOW-1324] Generalize Druid operator and hook
Make the druid operator and hook more specific. This allows us to
have a more flexible configuration, for example ingest parquet.
Also get rid of the PyDruid extension since it is more focused on
querying druid, rather than ingesting data. Just requests is
sufficient to submit an indexing job. Add a test to the hive_to_druid
operator to make sure it behaves as we expect. Furthermore, cleaned
up the docstring a bit.

Closes #2378 from Fokko/AIRFLOW-1324-make-more-general-druid-hook-and-operator
2017-08-18 21:34:03 +02:00
Jay fe0edeaab5 [AIRFLOW-756][AIRFLOW-751] Replace ssh hook, operator & sftp operator with paramiko based
Closes #1999 from jhsenjaliya/AIRFLOW-756
2017-07-20 22:07:45 +02:00
Bolke de Bruin fb21bcbcc1 Re-enable caching for hadoop components 2017-06-16 08:41:54 -04:00
Bolke de Bruin 38b2747c5b Pin Hive and Hadoop to a specific version and create writable warehouse dir 2017-06-15 19:22:09 -04:00
Kengo Seki 0f55477ccb [AIRFLOW-1172] Support nth weekday of the month cron expression
Closes #2321 from sekikn/AIRFLOW-1172
2017-06-14 17:59:02 -07:00
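In cron dialects that support it, an "nth weekday of the month" field is commonly written with a `#`, for example `1#2` for the second Monday. The date arithmetic behind such an expression can be sketched with the standard library alone. The function name `nth_weekday` is hypothetical, and the sketch uses Python's weekday numbering (0 = Monday), which differs from cron's (0 = Sunday).

```python
import calendar
from datetime import date


def nth_weekday(year, month, weekday, n):
    """Return the date of the n-th given weekday in a month, or None.

    weekday follows Python's convention: 0 = Monday ... 6 = Sunday.
    """
    first_weekday, days_in_month = calendar.monthrange(year, month)
    # Days from the 1st of the month to the first occurrence of `weekday`.
    offset = (weekday - first_weekday) % 7
    day = 1 + offset + 7 * (n - 1)
    return date(year, month, day) if day <= days_in_month else None


second_monday = nth_weekday(2017, 6, 0, 2)
fifth_monday = nth_weekday(2017, 6, 0, 5)  # June 2017 has only four Mondays
```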
Sumit Maheshwari 6be02475f8 [AIRFLOW-1192] Some enhancements to qubole_operator
1. Upgrade qds_sdk version to latest
2. Add support to run Zeppelin Notebooks
3. Move out initialization of QuboleHook from init()

Closes #2322 from msumit/AIRFLOW-1192
2017-06-07 09:09:50 +02:00
Stanislav Kudriashev d2d3e49ca0 [AIRFLOW-1201] Update deprecated 'nose-parameterized'
The 'parameterized' package should be used now.

Closes #2298 from skudriashev/airflow-1201
2017-05-16 11:34:52 +02:00
Chris Riccomini 3e9c666e8e [AIRFLOW-1203] Pin Google API client version to fix OAuth issue
Closes #2296 from criccomini/AIRFLOW-1203
2017-05-15 14:42:09 -07:00
Niels Zeilemaker ac9ccb1518 [AIRFLOW-1179] Fix Pandas 0.2x breaking Google BigQuery change
Closes #2279 from NielsZeilemaker/AIRFLOW-1179
2017-05-09 09:42:32 -07:00
Chris Riccomini 94f9822ffd [AIRFLOW-1138] Add missing licenses to files in scripts directory
Closes #2253 from criccomini/AIRFLOW-1138
2017-04-21 13:16:54 -07:00
Henk Griffioen 219c506414 [AIRFLOW-1094] Run unit tests under contrib in Travis
Rename all unit tests under tests/contrib to start
with test_* and fix
broken unit tests so that they run for the Python
2 and 3 builds.

Closes #2234 from hgrif/AIRFLOW-1094
2017-04-17 10:04:36 +02:00
Henk Griffioen f1bc5f38ac [AIRFLOW-1065] Add functionality for Azure Blob Storage over wasb://
This PR implements a hook to interface with Azure
storage over wasb://
via azure-storage; adds sensors to check for blobs
or prefixes; and
adds an operator to transfer a local file to the
Blob Storage.

Design is similar to that of the S3Hook in
airflow.operators.S3_hook.

Closes #2216 from hgrif/AIRFLOW-1065
2017-04-05 09:56:23 +02:00
Xiangrui Meng 70f1bf10a5 [AIRFLOW-1067] use example.com in examples
We use airflow@airflow.com in examples. However,
https://airflow.com
is owned by a company named Airflow (selling fans,
etc). We should use
airflow@example.com instead. That domain is
created for this purpose.

Closes #2217 from mengxr/AIRFLOW-1067
2017-04-04 09:22:37 -07:00
Bolke de Bruin 15fd4d98d1 Merge branch 'AIRFLOW-719' into AIRFLOW-719-3 2017-04-04 11:55:20 +02:00
Bolke de Bruin eb705fd55c [AIRFLOW-719] Fix race condition in ShortCircuit, Branch and LatestOnly
The ShortCircuitOperator, Branchoperator and LatestOnlyOperator
were all arbitrarily changing the states of TaskInstances without locking
them in the database. As the scheduler checks the state of dag runs
asynchronously, the dag run state could be set to failed while the
operators are updating the downstream tasks.

A better fix would be to use the dag run itself in the context of the
Operator.
2017-04-03 10:38:12 +02:00
Alexander Bij 6393366a78 [AIRFLOW-840] Make ticket renewer python3 compatible
The subprocess output is returned as bytes when universal_newlines
is set to False (the default). This fails in Py3 but works fine in
Py2. Includes a working unit test.

Closes #2158 from abij/AIRFLOW-840
2017-03-28 16:50:10 -07:00
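The bytes versus text difference behind this fix can be reproduced with a short sketch. The child command here is a stand-in that just prints a line; it is not the ticket renewer's actual command.

```python
import subprocess
import sys

# Stand-in child process that prints one line of text.
cmd = [sys.executable, "-c", "print('ticket renewed')"]

# Default (universal_newlines=False): Python 3 returns bytes.
raw = subprocess.check_output(cmd)

# With universal_newlines=True the output is decoded to str.
text = subprocess.check_output(cmd, universal_newlines=True)

# Code that treats `raw` as str (e.g. raw.split("\n")) raises TypeError
# on Python 3, so either decode explicitly or request text output.
decoded = raw.decode()
```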
Alex Guziel fe9ebe3ccf [AIRFLOW-1047] Sanitize strings passed to Markup
We add the Apache-licensed bleach library and use it to sanitize HTML
passed to Markup (which is supposed to be already escaped). This avoids
some XSS issues with unsanitized user input being displayed.

Closes #2193 from saguziel/aguziel-xss
2017-03-28 16:40:32 -07:00
Bolke de Bruin 4f52db317f [AIRFLOW-911] Add coloring and timing to tests
Closes #2106 from bolkedebruin/profile_tests
2017-02-25 22:10:14 +01:00
Jeremiah Lowin 6e22102782 [AIRFLOW-862] Add DaskExecutor
Adds a DaskExecutor for running Airflow tasks
in Dask clusters.

Closes #2067 from jlowin/dask-executor
2017-02-12 16:06:31 -05:00
Jeremiah Lowin bbfd43df46 [AIRFLOW-863] Example DAGs should have recent start dates
Avoid unnecessary backfills by having start dates of just a few days
ago. Adds a utility function airflow.utils.dates.days_ago().

Closes #2068 from jlowin/example-start-date
2017-02-12 15:37:56 -05:00
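A helper of this kind can be sketched in a few lines. The version below is an illustration only: the name `days_ago_sketch`, the `now` parameter, and the truncation to midnight are assumptions, and the actual `airflow.utils.dates.days_ago()` may behave differently.

```python
from datetime import datetime, timedelta


def days_ago_sketch(n, now=None):
    """Return midnight of the day `n` days before `now` (hypothetical helper)."""
    now = now or datetime.now()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=n)


# A start date two days before a fixed reference time.
start_date = days_ago_sketch(2, now=datetime(2017, 2, 12, 15, 37, 56))
```

A relative start date like this keeps example DAGs from scheduling a long run of historical intervals the first time they are enabled.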
Dan Davydov b56cb5cc97 [AIRFLOW-219][AIRFLOW-398] Cgroups + impersonation
Submitting on behalf of plypaul

Please accept this PR that addresses the following
issues:
- https://issues.apache.org/jira/browse/AIRFLOW-219
- https://issues.apache.org/jira/browse/AIRFLOW-398

Testing Done:
- Running on Airbnb prod (though on a different mergebase) for many months

Credits:
Impersonation Work: georgeke did most of the work
but plypaul did quite a bit of work too.
Cgroups: plypaul did most of the work, I just did
some touch up/bug fixes (see commit history,
cgroups + impersonation commit is actually plypaul's
not mine)

Closes #1934 from aoen/ddavydov/cgroups_and_impersonation_after_rebase
2017-01-18 18:11:06 -08:00
Bolke de Bruin 3ac2fba888 Merge branch 'AIRFLOW-760' 2017-01-16 22:23:36 +01:00
Jay 44798e0d4d [AIRFLOW-683] Add jira hook, operator and sensor
Closes #1950 from jhsenjaliya/AIRFLOW-683
2017-01-16 17:46:21 +01:00
Bolke de Bruin f3e18fbe02 [AIRFLOW-760] Update systemd config 2017-01-14 21:32:27 +01:00
Bolke de Bruin 19ed9001b9 [AIRFLOW-740] Pin jinja2 to < 2.9.0
Jinja2 2.9.1 seems to have a conflict with flask-admin.
2017-01-07 19:53:01 +01:00
Vijay Bhat 7fa86f72c6 [AIRFLOW-673] Add operational metrics test for SchedulerJob
Extend SchedulerJob to instrument the execution
performance of task instances contained in each
DAG.
We want to know if any DAG is starved of resources,
and this will be reflected in the stats printed
out at the end of the test run.

This test is for instrumenting the operational impact of
https://github.com/apache/incubator-airflow/pull/1906

Closes #1919 from vijaysbhat/scheduler_perf_tool
2017-01-03 08:13:06 -05:00
Bolke de Bruin d5ac6bd9d0 [AIRFLOW-489] Add API Framework
This implements a framework for API calls to Airflow. Currently
all access is done by cli or web ui. Especially in the context
of the cli this raises security concerns which can be alleviated
with a secured API call over the wire.

Secondly integration with other systems is a bit harder if you have
to call a cli. For public facing endpoints JSON is used.

As an example the trigger_dag functionality is now made into a
API call.

Backwards compat is retained by switching to a LocalClient.
2016-11-27 19:44:31 +01:00
Li Xuanji dedc54eeaf [AIRFLOW-640] Install and enable nose-ignore-docstring
Closes #1896 from zodiac/nose-ignore-docstring
2016-11-20 17:38:24 -08:00
Li Xuanji ca6dbc6485 [AIRFLOW-639] Alphasort package names
Closes #1895 from zodiac/alphasort_requirements
2016-11-20 17:06:47 -08:00
Bolke de Bruin 910c0ddd78 [AIRFLOW-504] Store fractional seconds in MySQL tables
Both utcnow() and now() return fractional seconds. These
are sometimes used in primary keys (e.g. in task_instance).
If MySQL is not configured to store these fractional seconds,
a primary key lookup might fail (e.g. at session.merge), resulting in
a duplicate entry being added or worse.

Postgres does store fractional seconds if left unconfigured;
sqlite needs to be examined.
2016-11-13 22:43:17 +01:00
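The failure mode described in this commit can be mimicked without a database. The sketch below uses a plain dictionary as a stand-in for a table keyed on a timestamp, and simulates a column without fractional-second precision by dropping the microseconds (a real server may round instead of truncate; truncation is assumed here for simplicity).

```python
from datetime import datetime

stored_rows = {}

# Timestamp as produced in Python, including microseconds.
execution_date = datetime(2016, 11, 13, 22, 43, 17, 123456)

# Simulated storage: the column keeps whole seconds only.
stored_key = execution_date.replace(microsecond=0)
stored_rows[stored_key] = "task_instance row"

# Looking the row up with the original in-memory value misses it,
# so a merge-style operation would insert a second row.
found = stored_rows.get(execution_date)
```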
David Gingrich ff45d8f221 [AIRFLOW-512] Fix 'bellow' typo in docs & comments
Dear Airflow Maintainers,

Please accept this PR that addresses the following
issues:
- https://issues.apache.org/jira/browse/AIRFLOW-512

Testing Done:
- N/A, but ran core tests: `./run_unit_tests.sh tests.core:CoreTest -s`

Closes #1800 from dgingrich/master
2016-09-16 09:45:12 -07:00
Bolke de Bruin 2c3d0fdbe9 Merge remote-tracking branch 'apache/master' 2016-08-09 15:09:51 +02:00
Bolke de Bruin 1d67d6293e [AIRFLOW-404] Retry download if unpacking fails for hive
The Travis cache can contain faulty files. This results in builds
that fail, as they depend on certain components being
available, e.g. hive. This addresses the issue for hive by
redownloading if unpacking fails.
2016-08-09 15:00:25 +02:00
Li Xuanji 9d254a317d [AIRFLOW-276] Gunicorn rolling restart
- Tell gunicorn to prepend `[ready]` to worker process name once worker is ready (to serve requests) - in particular this happens after DAGs folder is parsed
- Airflow cli runs gunicorn as a child process instead of `execvp`-ing over itself
- Airflow cli monitors gunicorn worker processes and restarts them by sending TTIN/TTOU signals to the gunicorn master process
- Fix bug where `conf.get('webserver', 'workers')` and `conf.get('webserver', 'webserver_worker_timeout')` were ignored

- Alternatively, https://github.com/apache/incubator-airflow/pull/1684/files does the same thing but the worker-restart script is provided separately for the user to run

- Start airflow, observe that workers are restarted
- Add new dags to dags folder and check that they show up
- Run `siege` against airflow while server is restarting and confirm that all requests succeed
- Run with configuration set to `batch_size = 0`, `batch_size = 1` and `batch_size = 4`

Closes #1685 from zodiac/xuanji_gunicorn_rolling_restart_2
2016-08-08 11:26:38 -07:00
Paul Yang fdb7e94914 [AIRFLOW-160] Parse DAG files through child processes
Instead of parsing the DAG definition files in the same process as the
scheduler, this change parses the files in a child process. This helps
to isolate the scheduler from bad user code.

Closes #1636 from plypaul/plypaul_schedule_by_file_rebase_master
2016-07-31 12:49:39 -07:00
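The isolation idea in this commit can be illustrated with a small sketch that runs each definition file in a separate interpreter. This is not the scheduler's actual mechanism: `parse_in_child`, the file names, and the use of `subprocess` are assumptions chosen to keep the example self-contained.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def parse_in_child(path, timeout=30):
    """Run a definition file in a child interpreter and report success.

    Hypothetical helper: a crash or hard exit in user code only ends
    the child, never the calling process.
    """
    result = subprocess.run(
        [sys.executable, str(path)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0, result.stderr


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good_dag.py"
    bad = Path(tmp) / "bad_dag.py"
    good.write_text("x = 1\n")
    # User code that kills its own process outright.
    bad.write_text("import os\nos._exit(3)\n")

    good_ok, _ = parse_in_child(good)
    bad_ok, _ = parse_in_child(bad)
```

The parent keeps running after the bad file exits, which is the property the commit message describes as isolating the scheduler from bad user code.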