Commit Graph

288 Commits

Author SHA1 Message Date
Daniel van der Ende a67e4390d2 [AIRFLOW-815] Add prev/next execution dates to template variables
This patch adds the previous/next execution dates
to the default variables available in a template.

Closes #2033 from danielvdende/add-execution-dates
2017-01-29 12:41:39 +01:00
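A minimal sketch of what the previous/next execution dates in [AIRFLOW-815] could look like for a fixed-interval schedule. The function name `template_dates` and the key names are assumptions for illustration (the commit message does not list the variable names), and cron-based schedules are not covered here.

```python
from datetime import datetime, timedelta


def template_dates(execution_date, interval):
    """Build a hypothetical template context for a fixed-interval schedule."""
    return {
        "execution_date": execution_date,
        # One interval before and after the current execution date.
        "prev_execution_date": execution_date - interval,
        "next_execution_date": execution_date + interval,
    }


context = template_dates(datetime(2017, 1, 29), timedelta(days=1))
```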
Dan Davydov b56cb5cc97 [AIRFLOW-219][AIRFLOW-398] Cgroups + impersonation
Submitting on behalf of plypaul

Please accept this PR that addresses the following
issues:
- https://issues.apache.org/jira/browse/AIRFLOW-219
- https://issues.apache.org/jira/browse/AIRFLOW-398

Testing Done:
- Running on Airbnb prod (though on a different
mergebase) for many months

Credits:
Impersonation Work: georgeke did most of the work
but plypaul did quite a bit of work too.
Cgroups: plypaul did most of the work, I just did
some touch up/bug fixes (see commit history,
cgroups + impersonation commit is actually
plypaul's, not mine)

Closes #1934 from aoen/ddavydov/cgroups_and_impersonation_after_rebase
2017-01-18 18:11:06 -08:00
Benjamin Tallman 1caaceb388 [AIRFLOW-558] Add Support for dag.catchup=(True|False) Option
Added a dag.catchup option and modified the
scheduler to look at the value when scheduling
DagRuns
(by moving dag.start_date up to
dag.previous_schedule),
and added a config option catchup_by_default
(defaults to True) that allows users to set this
to False for all
dags modifying the existing DAGs

In addition, we added a test to jobs.py
(test_dag_catchup_option)

Closes #1830 from btallman/NoBackfill_clean_feature
2017-01-13 12:39:55 +01:00
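The effect of dag.catchup described in [AIRFLOW-558] can be illustrated with a small standalone sketch. This is not the scheduler's code: `runs_to_schedule` is a hypothetical helper, and it assumes a fixed timedelta interval where a run is created once its interval has fully elapsed.

```python
from datetime import datetime, timedelta


def runs_to_schedule(start_date, interval, now, catchup=True):
    """Return the execution dates a scheduler would create up to `now`."""
    dates = []
    current = start_date
    # A run is created only after its whole interval has passed.
    while current + interval <= now:
        dates.append(current)
        current += interval
    if catchup or not dates:
        return dates
    # With catchup disabled, only the most recent completed interval is kept,
    # which mimics moving the start date up to the previous schedule.
    return dates[-1:]


start = datetime(2017, 1, 1)
now = datetime(2017, 1, 5, 12)
all_runs = runs_to_schedule(start, timedelta(days=1), now, catchup=True)
latest_only = runs_to_schedule(start, timedelta(days=1), now, catchup=False)
```

With catchup enabled the sketch backfills every missed interval; with it disabled, older intervals are skipped.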
Ben Hoyt 2933655885 [AIRFLOW-662] Change seasons to months in project description
Things like "winter 2016" are ambiguous even in
the northern hemisphere, and mean something
totally different in the southern hemisphere
(July/August/September). Changing the season-based
dates to months. References:

* http://incubator.apache.org/projects/airflow.html
* http://nerds.airbnb.com/airflow/
* https://github.com/apache/incubator-airflow/graphs/contributors

Closes #1913 from benhoyt/patch-1
2016-11-30 16:23:35 -08:00
Bolke de Bruin 6fb94630c1 Merge branch 'api_v3' 2016-11-27 20:13:26 +01:00
Bolke de Bruin d5ac6bd9d0 [AIRFLOW-489] Add API Framework
This implements a framework for API calls to Airflow. Currently
all access is done by cli or web ui. Especially in the context
of the cli this raises security concerns which can be alleviated
with a secured API call over the wire.

Secondly integration with other systems is a bit harder if you have
to call a cli. For public facing endpoints JSON is used.

As an example the trigger_dag functionality is now made into a
API call.

Backwards compat is retained by switching to a LocalClient.
2016-11-27 19:44:31 +01:00
Vincent Poulain 98197d9568 [AIRFLOW-345] Add contrib ECSOperator
Closes #1894 from poulainv/ecs_operator
2016-11-23 10:49:57 -08:00
Maxime Beauchemin e79ea3b8f0 [AIRFLOW-612] Move resources/articles links to wiki
Closes #1863 from mistercrunch/docs_links
2016-11-02 12:24:11 -07:00
Duy-Minh TRAN f3af6f44eb [AIRFLOW-96] s3_conn_id using environment variable
Dear Airflow Maintainers,

Please accept this PR that addresses the following
issues:
- [AIRFLOW-96](https://issues.apache.org/jira/browse/AIRFLOW-96): allow parameter "s3_conn_id" of
S3KeySensor and S3PrefixSensor to be defined using
an environment variable.

Actually, S3KeySensor and S3PrefixSensor use the
S3hook, which extends BaseHook. BaseHook has
get_connection, which looks a connection up :
- in environment variables first
- and then in the database

Closes #1517 from dm-tran/fix-jira-airflow-96
2016-10-20 21:30:34 +05:30
Arthur Wiedmer 916f1eb2fe Merge pull request #1402 from lauralorenz/schedule_interval_default_args_docs 2016-10-17 09:46:57 -07:00
lauralorenz 80d3c8d461 [AIRFLOW-575] Clarify tutorial and FAQ about `schedule_interval` always inheriting from DAG object
- Update the tutorial with a comment helping to explain the use of default_args and
include all the possible parameters in line
- Clarify in the FAQ the possibility of an unexpected default `schedule_interval` in case
airflow users mistakenly try to overwrite the default `schedule_interval` in a DAG's
`default_args` parameter
2016-10-17 12:36:38 -04:00
Mike Lyons a66cf75e23 [AIRFLOW-500] Use id for github allowed teams
The team string is not unique across an organization
and therefore we should use the long id instead.

Closes #1788 from mylons/master
2016-10-08 23:27:27 +02:00
George Leslie-Waksman eb5982d4aa [AIRFLOW-333][AIRFLOW-258] Fix non-module plugin components
* Distinguish between module and non-module plugin components
* Fix handling of non-module plugin components

  * admin views, flask blueprints, and menu links need to not be
    wrapped in modules

* Fix improper use of zope.deprecation.deprecated

  * zope.deprecation.deprecated does NOT support classes as
    first parameter
  * deprecating classes must be handled by calling the deprecate
    function on the class name

* Added tests for plugin loading
* Updated plugin documentation to match test
plugin
* Updated executors to always load plugins
* More logging

Closes #1738 from gwax/plugin_module_fixes
2016-10-01 23:43:20 -07:00
Daniel Zohar c02425d483 [AIRFLOW-530] Update docs to reflect connection environment var has to be in uppercase
Dear Airflow Maintainers,

Please accept this PR that addresses the following
issues:
https://issues.apache.org/jira/browse/AIRFLOW-530

Right now, the documentation does not clearly
state that connection names are converted to
uppercase form when searched in the environment
(https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/base_hook.py#L60-L60).
This is confusing as the best practice in Airflow
seems to be to define connections in lower case
form.

Closes #1811 from danielzohar/connection_env_var
2016-10-01 00:33:50 -07:00
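The lookup behaviour touched on in [AIRFLOW-530] and [AIRFLOW-96] (environment first, then the database, with the connection name uppercased for the environment search) can be sketched as follows. The `AIRFLOW_CONN_` prefix reflects my understanding of Airflow's convention and is not stated in the commit messages; `DATABASE_CONNECTIONS` and `get_connection_uri` are hypothetical stand-ins, not Airflow APIs.

```python
import os

# Hypothetical stand-in for connections stored in the metadata database.
DATABASE_CONNECTIONS = {"my_s3": "s3://db-configured"}


def get_connection_uri(conn_id, environ=None):
    """Look a connection up in the environment first, then in the database."""
    environ = os.environ if environ is None else environ
    # The connection id is uppercased before the environment is searched,
    # so a lowercase id still maps to an uppercase variable name.
    env_key = "AIRFLOW_CONN_" + conn_id.upper()
    if env_key in environ:
        return environ[env_key]
    return DATABASE_CONNECTIONS.get(conn_id)


from_env = get_connection_uri("my_s3", {"AIRFLOW_CONN_MY_S3": "s3://from-env"})
from_db = get_connection_uri("my_s3", {})
```

A variable named in lowercase would simply not be found, and the lookup would fall through to the database.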
George Leslie-Waksman edf033be65 [AIRFLOW-198] Implement latest_only_operator
Dear Airflow Maintainers,

Please accept this PR that addresses the following
issues:
- https://issues.apache.org/jira/browse/AIRFLOW-198

Testing Done:
- Local testing of dag operation with
LatestOnlyOperator
- Unit test added

Closes #1752 from gwax/latest_only
2016-09-27 17:07:14 -07:00
Casey Ching b28cedb98d [AIRFLOW-91] Add SSL config option for the webserver
SSL can now be enabled by providing certificate and key in the usual
ways (config file or CLI options). Providing the cert and key will
automatically enable SSL. The web server port will not automatically
change.

The Security page in the docs now includes an SSL section with basic
setup information.

Closes #1760 from caseyching/master
2016-09-19 15:55:10 +02:00
David Gingrich ff45d8f221 [AIRFLOW-512] Fix 'bellow' typo in docs & comments
Dear Airflow Maintainers,

Please accept this PR that addresses the following
issues:
- https://issues.apache.org/jira/browse/AIRFLOW-512

Testing Done:
- N/A, but ran core tests: `./run_unit_tests.sh
tests.core:CoreTest -s`

Closes #1800 from dgingrich/master
2016-09-16 09:45:12 -07:00
Alex Van Boxel c08b52aadb [AIRFLOW-159] Add cloud integration section + GCP documentation
Closes #1773 from alexvanboxel/feature/gcp-docs
2016-09-04 15:15:07 +02:00
Alex Van Boxel 86fe23f111 [AIRFLOW-477][AIRFLOW-478] Restructure security section for clarity
Closes #1775 from alexvanboxel/docs/security
2016-09-04 15:13:18 +02:00
Dan Davydov f360414774 [AIRFLOW-149] Task Dependency Engine + Why Isn't My Task Running View
Here is the original PR with Max's LGTM:
https://github.com/aoen/incubator-airflow/pull/1
Since then I have made some fixes but this PR is essentially the same.
It could definitely use more eyes as there are likely still issues.

**Goals**
- Simplify, consolidate, and make consistent the logic of whether or not
  a task should be run
- Provide a view/better logging that gives insight into why a task
  instance is not currently running (no more viewing the scheduler logs
  to find out why a task instance isn't running for the majority of
  cases):
![image](https://cloud.githubusercontent.com/assets/1592778/17637621/aa669f5e-6099-11e6-81c2-d988d2073aac.png)

**Notable Functional Changes**
- Webserver view + task_failing_deps CLI command to explain why a given
  task instance isn't being run by the scheduler
- Running a backfill in the command line and running a task in the UI
  will now display detailed error messages based on which dependencies
  were not met for a task instead of appearing to succeed but actually
  failing silently
- Maximum task concurrency and pools are now respected by backfills
- Backfill now has the equivalent of the old force flag to run even for
  successful tasks
  This will break one use case:
  Using pools to restrict some resource on airflow executors themselves
  (rather than an external resource like a DB), e.g. some task uses 60%
  of cpu on a worker so we restrict that task's pool size to 1 to
  prevent two of the tasks from running on the same host. When
  backfilling a task of this type, now the backfill will wait on the
  pool to have slots open up before running the task even though we
  don't need to do this if backfilling on a different host outside of
  the pool. I think breaking this use case is OK since the use case is a
  hack due to not having a proper resource isolation solution (e.g.
  mesos should be used in this case instead).
- To make things less confusing for users, there is now a "ignore all
  dependencies" option for running tasks, "ignore dependencies" has been
  renamed to "ignore task dependencies", and "force" has been renamed to
  "ignore task instance state". The new "Ignore all dependencies" flag
  will ignore the following:
  - task instance's pool being full
  - execution date for a task instance being in the future
  - a task instance being in the retry waiting period
  - the task instance's task ending prior to the task instance's
    execution date
  - task instance is already queued
  - task instance has already completed
  - task instance is in the shutdown state
  - WILL NOT IGNORE task instance is already running
- SLA miss emails will now include all tasks that did not finish for a
  particular DAG run, even if the tasks didn't run because
  depends_on_past was not met for a task
- Tasks with pools won't get queued automatically the first time they
  reach a worker; if they are ready to run they will be run immediately
- Running a task via the UI or via the command line (backfill/run
  commands) will now log why a task could not get run if one of its
  dependencies isn't met. For tasks kicked off via the web UI this
  means that tasks don't silently fail to get queued despite a
  successful message in the UI.
- Queuing a task into a pool that doesn't exist will now get stopped in
  the scheduler instead of a worker

**Follow Up Items**
- Update the docs to reference the new explainer views/CLI command

Closes #1729 from aoen/ddavydov/blockedTIExplainerRebasedMaster
2016-08-26 15:07:44 -07:00
Ananya Mishra df848a5564 [AIRFLOW-444] Add Google authentication backend
Add Google authentication backend.
Add Google authentication information to security
docs.

Dear Airflow Maintainers,

Please accept this PR that addresses the following
issues:
- https://issues.apache.org/jira/browse/AIRFLOW-444

Testing Done:
- Tested Google authentication backend locally
with no issues

This is mostly an adaptation of the GHE
authentication backend.

Closes #1747 from ananya77041/google_auth_backend
2016-08-19 16:12:58 -07:00
jlowin 7662cd8ce4 [AIRFLOW-328][AIRFLOW-371] Remove redundant default configuration & fix unit test configuration
AIRFLOW-328
https://issues.apache.org/jira/browse/AIRFLOW-328
Previously, Airflow had both a default template for airflow.cfg AND a
dictionary of default values. Frequently, these get out of sync (an
option in one has a different value than in the other, or isn’t present
in the other). This commit removes the default dict and uses the
airflow.cfg template to provide defaults. The ConfigParser first reads
the template, loading all the options it contains, and then reads the
user’s actual airflow.cfg to overwrite the default values with any new
ones.

AIRFLOW-371
https://issues.apache.org/jira/browse/AIRFLOW-371
Calling test_mode() didn't actually change Airflow's configuration! This actually wasn't an issue in unit tests because the unit test run script was hardcoded to point at the unittest.cfg file, but it needed to be fixed.

[AIRFLOW-371] Make test_mode() functional

Previously, calling test_mode() didn’t actually
do anything.

This PR renames it to load_test_config() (to
avoid confusion, ht @r39132).

In addition, manually entering test_mode after
Airflow launches might be too late — some
options have already been loaded (DAGS_FOLDER,
etc.). This makes it so setting
tests/unit_test_mode OR the equivalent env var
(AIRFLOW__TESTS__UNIT_TEST_MODE) will load the
test config immediately, prior to loading the
rest of Airflow.

Closes #1677 from jlowin/Simplify-config
2016-08-12 10:34:50 -07:00
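The layering described in [AIRFLOW-328] (read the default template first, then let the user's airflow.cfg overwrite it) can be sketched with the standard library's ConfigParser. The option values below are illustrative assumptions, not Airflow's real defaults, and `load_config` is a hypothetical helper.

```python
from configparser import ConfigParser

# Illustrative stand-ins for the default template and a user's airflow.cfg.
DEFAULT_TEMPLATE = """
[core]
dags_folder = /default/dags
parallelism = 32
"""

USER_CONFIG = """
[core]
dags_folder = /home/me/dags
"""


def load_config(default_text, user_text):
    """Read defaults first, then let the user's settings overwrite them."""
    parser = ConfigParser()
    parser.read_string(default_text)  # every option gets a default value
    parser.read_string(user_text)     # later reads replace earlier values
    return parser


conf = load_config(DEFAULT_TEMPLATE, USER_CONFIG)
```

Because both layers come from the same kind of source, an option cannot exist in one place with a value that silently differs from a separate defaults dictionary.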
Maxime Beauchemin a737506bba [AIRFLOW-410] Add 2 Q/A to the FAQ in the docs
Also changed the markup of Questions as sections to be directly linkable.

I made sure the `rst` rendered nicely here:
<img width="690" alt="screen shot 2016-08-10 at 9 53 27 am" src="https://cloud.githubusercontent.com/assets/487433/17562690/c2dd3da0-5ee0-11e6-841b-9569eac4bf9a.png">

r39132  aoen plypaul

[AIRFLOW-410] Adding 2 Q/A to the FAQ in the docs

Typos

Closes #1720 from mistercrunch/docs_faqs
2016-08-11 15:51:38 -07:00
Li Xuanji f1abffa380 [AIRFLOW-402] Remove NamedHivePartitionSensor static check, add docs
Addresses the following issues:
[https://issues.apache.org/jira/browse/AIRFLOW-402](https://issues.apache.org/jira/browse/AIRFLOW-402)

Closes #1711 from zodiac/fix_named_hive_partition_sensor
2016-08-09 15:39:02 -07:00
Jamie Alessio de4b7c62fb [AIRFLOW-397] Documentation: Fix typo "instatiating" to "instantiating" 2016-08-04 12:45:17 -07:00
Ajay Yadava 968ba9c534 [AIRFLOW-322] Fix typo in FAQ section
Closes #1693 from ajayyadava/322
2016-08-03 08:01:44 -04:00
Peter Pang 7dbc3cd40e [AIRFLOW-331] modify the LDAP authentication config lines in 'Security' sample codes
Closes #1674 from impangt/master
2016-07-27 14:33:39 -07:00
Maxime Beauchemin c08b02229c [AIRFLOW-298] fix incubator diclaimer in docs
Closes #1640 from mistercrunch/disclaimer_tweaks

[AIRFLOW-298] fix incubator diclaimer in docs
2016-07-01 15:20:28 -07:00
Maxime Beauchemin 4a84a578a5 Add an Apache Incubator Disclaimer and mocking modules
Closes #1634 from mistercrunch/mock_docs

Adding an Apache Incubator Disclaimer and mocking modules
2016-06-29 13:39:15 -07:00
Chris Riccomini dc84fdecdf [AIRFLOW-285] Use Airflow 2.0 style imports for all remaining hooks/operators 2016-06-28 13:34:47 -07:00
Alex Van Boxel 54f1c11b6f [AIRFLOW-162] Allow variable to be accessible into templates
Closes #1540 from alexvanboxel/AIRFLOW-162

AIRFLOW-162 Allow variable to be accessible into templates
2016-06-21 10:29:19 -07:00
Ajay Yadav 3ffa656d97 [AIRFLOW-248] Add Apache license header to all files
- Added Apache license header for files with extension (.service, .in, .mako, .properties, .ini, .sh, .ldif, .coveragerc, .cfg, .yml, .conf, .sql, .css, .js, .html, .xml).
- Added/Replaced shebang on all .sh files with portable version - #!/usr/bin/env bash.
- Skipped third party css and js files. Skipped all minified js files as well.

Closes #1598 from ajayyadava/248
2016-06-21 08:15:42 -07:00
Bolke de Bruin 901e8f2a95 Merge branch 'align_startdate' 2016-06-11 13:47:27 +02:00
Bolke de Bruin f69eec3b44 [AIRFLOW-68] Align start_date with the schedule_interval
This particular issue arises because of an alignment issue between
start_date and schedule_interval. This can only happen with cron-based
schedule_intervals that describe absolute points in time (like “1am”) as
opposed to time deltas (like “every hour”)

In the past (and in the docs) we have simply said that users must make
sure the two params agree. But this is counter intuitive. As in these
cases, start_date is sort of like telling the scheduler to
“start paying attention” as opposed to “this is my first execution date”.

This patch changes the behavior of the scheduler. The next run date of
the dag will be treated as "start_date + interval" unless the start_date
is on the (previous) interval in which case the start_date will be the
next run date.
2016-06-07 10:52:09 +02:00
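One reading of the alignment rule in [AIRFLOW-68], sketched for a hypothetical "daily at a fixed hour" schedule standing in for a cron expression: if start_date already falls on a schedule point it is the first run date, otherwise the first run is the next schedule point after it. `first_run_date` is an illustrative helper, not the scheduler's implementation.

```python
from datetime import datetime, timedelta


def first_run_date(start_date, hour):
    """First run for a hypothetical 'every day at `hour`:00' schedule."""
    scheduled_today = start_date.replace(
        hour=hour, minute=0, second=0, microsecond=0
    )
    if start_date == scheduled_today:
        # start_date is aligned with the schedule, so it is the first run.
        return start_date
    if start_date < scheduled_today:
        # Today's schedule point is still ahead of start_date.
        return scheduled_today
    # Today's schedule point has passed; wait for tomorrow's.
    return scheduled_today + timedelta(days=1)


aligned = first_run_date(datetime(2016, 6, 1, 1, 0), hour=1)
unaligned = first_run_date(datetime(2016, 6, 1, 15, 0), hour=1)
```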
Maxime Beauchemin b85fd61d15 [AIRFLOW-9] Improving docs to meet Apache's standards 2016-06-06 18:14:34 -07:00
Sumit Maheshwari dce08f68bc [AIRFLOW-155] Documentation of Qubole Operator
Dear Airflow Maintainers,

Please accept this PR that addresses the following issues:
- *https://issues.apache.org/jira/browse/AIRFLOW-155*

Thanks,
Sumit

Author: Sumit Maheshwari <sumitm@qubole.com>

Closes #1560 from msumit/AIRFLOW-155.
2016-06-01 14:57:33 -07:00
Matthew Chen 3e3094157e AIRFLOW-45: Support Hidden Airflow Variables 2016-05-25 08:45:24 -07:00
Mark Reid 8d72975734 docfix: Fix a couple of minor typos. 2016-05-23 09:16:38 -03:00
Chris Riccomini abc43c1445 Merge branch '1493' 2016-05-17 08:14:30 -07:00
Hervé Werner 150568228b [AIRFLOW-121] Documenting dag doc_md feature 2016-05-17 09:57:23 +02:00
Maxime Beauchemin aeb5a07ff9 Docs tweaks while generating the docs 2016-05-03 22:13:35 -07:00
Chris Riccomini 844eb2c8d0 AIRFLOW-15: Remove gcloud 2016-04-28 13:45:09 -07:00
Matt Pelland 11c34c4353 Implement a Cloudant hook 2016-04-19 16:11:54 -04:00
Joy Gao a5ad871a36 Support list/get/set variables in the CLI 2016-04-16 00:11:46 -07:00
bolkedebruin 975b90ec3c Add support for zipped dags
Currently dags are being read directly from the filesystem. Any
hierarchy (python namespaces, modules) needs to be reflected on
the filesystem. This makes it hard to manage dags and their
dependencies.

This patch adds support for dags in zip files. It will add
the zip to sys.path and then it will read the zip file and
try to import any files as modules that are in the root of
the zip.

Please note that any module contained within the zip will
overwrite existing modules in the same namespace.
2016-04-14 09:03:42 +02:00
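The mechanism described in the zipped-dags commit (add the zip to sys.path, then import the files in its root as modules) can be sketched with the standard library alone. `import_zip_root_modules` and the module name `zipped_example_dag` are hypothetical; placing the zip at the front of sys.path is an assumption, and the patch itself may differ in detail.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_zip_root_modules(zip_path):
    """Add the zip to sys.path and import every .py file in its root."""
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    importlib.invalidate_caches()
    modules = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Skip anything inside a directory and anything that is not Python.
            if "/" in name or not name.endswith(".py"):
                continue
            modules.append(importlib.import_module(name[:-3]))
    return modules


# Usage: build a throwaway zip containing one module, then import it.
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "example_dags.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("zipped_example_dag.py", "DAG_ID = 'zipped_example'\n")

loaded = import_zip_root_modules(zip_path)
```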
Jeremiah Lowin 96891596f9 Merge pull request #1318 from jlowin/infer_dag
Syntactic Sugar! Dag inference, operator composition, and a big docs update
2016-04-12 16:45:17 -04:00
jlowin fb0c5775cd Add DAG inference, deferral, and context manager
- Operators can be created without DAGs, but the DAG can be added at
any time thereafter (by assigning to the ‘dag’ attribute). Once a DAG
is assigned, it can not be removed or reassigned.

- Operators can infer DAGs from other operators. Setting a relationship
will also set the DAG, if possible. Operators from different DAGs and
operators with no DAGs can not be chained.

- DAGs can be used as context managers. When “inside” a DAG context
manager, the default DAG for all new Operators is that DAG (unless they
specify a different one)

- Unit tests

- Add default owner for Operators

- Support composing operators with >> and <<

Three special cases:
  op1 >> op2 is equivalent to op1.set_downstream(op2)
  op1 << op2 is equivalent to op1.set_upstream(op2)
  dag >> op1 (in any order or direction) means op1.dag = dag

These can be chained:
  dag >> op1 >> op2 << op3

- Update concepts documentation
2016-04-12 14:27:53 -04:00
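The >> and << composition described in this commit can be illustrated with a toy class. This `Operator` is a hypothetical stand-in, not Airflow's BaseOperator, and it leaves out DAG inference entirely. Each shift returns the right-hand operand, which is what makes chaining possible.

```python
class Operator:
    """Toy operator that only tracks upstream/downstream task ids."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()
        self.downstream = set()

    def set_downstream(self, other):
        self.downstream.add(other.task_id)
        other.upstream.add(self.task_id)

    def set_upstream(self, other):
        other.set_downstream(self)

    def __rshift__(self, other):
        # op1 >> op2 means op1.set_downstream(op2)
        self.set_downstream(other)
        return other

    def __lshift__(self, other):
        # op1 << op2 means op1.set_upstream(op2)
        self.set_upstream(other)
        return other


op1, op2, op3 = Operator("op1"), Operator("op2"), Operator("op3")
# Evaluated left to right: (op1 >> op2) returns op2, then op2 << op3.
op1 >> op2 << op3
```

After the chained expression, op2 has both op1 and op3 upstream.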
Sean McIntyre cab436a69a Update plugins.rst for clarity on the example (#1309)
The plugins tutorial was lacking in the following ways:

1. I wasn't sure where my template should live
2. I wasn't aware that both the TestView and Blueprint were necessary

In lieu of a code refactor, here's my suggestion on how to make the documentation more helpful from the perspective of someone who doesn't have experience with Flask Blueprints and Flask Admin, which can prevent the deep-dive into the code and supporting libs that I just did!
2016-04-10 22:40:03 -07:00
Jeremiah Lowin 2e66205e8c Merge pull request #1283 from clickthisnick/chore-remove-trailing-spaces
CHORE - Remove Trailing Spaces
2016-04-08 17:33:34 -04:00
jgao54 9410715c81 Add HipchatOperator 2016-04-07 20:47:40 +02:00