Commit Graph

427 Commits

Author SHA1 Message Date
Xiaodong c0c63ae2a4 [AIRFLOW-2839] Refine Doc Concepts->Connections (#3678) 2018-08-05 19:08:15 +01:00
Tao Feng da4f254283 [AIRFLOW-XXX] Add Feng Tao to committers list (#3689) 2018-08-03 20:42:30 +01:00
Cameron Moberg b4f43e6c48 [AIRFLOW-2658] Add GCP specific k8s pod operator (#3532)
Executes a task in a Kubernetes pod in the specified Google Kubernetes
Engine cluster. This makes it easier to interact with the GCP Kubernetes
Engine service because it encapsulates acquiring credentials.
2018-08-02 20:44:16 +01:00
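
A rough sketch of how such a GCP-specific pod operator might be used; the import path, class name (GKEPodOperator), and argument names are assumptions inferred from this commit message rather than a verified API, and the project/cluster values are hypothetical.

```
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.gcp_container_operator import GKEPodOperator

dag = DAG("gke_pod_example", start_date=datetime(2018, 8, 1), schedule_interval=None)

# Run a container in an existing GKE cluster; credentials are acquired by the
# operator itself, which is the convenience this commit describes.
run_in_gke = GKEPodOperator(
    task_id="run_in_gke",
    project_id="my-gcp-project",    # hypothetical project
    location="europe-west1-b",
    cluster_name="my-gke-cluster",  # hypothetical cluster
    name="airflow-pod",
    namespace="default",
    image="ubuntu:16.04",
    dag=dag,
)
```
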
Xiaodong b120427b65 [AIRFLOW-2820] Add Web UI trigger in doc "Scheduling & Triggers"
The documentation page "Scheduling & Triggers" only mentioned the CLI
method to manually trigger a DAG run.

However, the manual trigger feature in the Web UI should be mentioned as
well (it may be even more frequently used by users).
2018-08-01 14:08:21 -07:00
Marcus Rehm 9983466fd1 [AIRFLOW-2795] Oracle to Oracle Transfer Operator (#3639) 2018-07-31 21:22:40 +02:00
Bolke de Bruin af15f1150d [AIRFLOW-2816] Fix license text in docs/license.rst 2018-07-28 13:20:26 +02:00
Amir Shahatit 98c7080361 Fix Typo in Scheduler documentation
Closes #3618 from amir656/patch-1
2018-07-21 13:33:29 +01:00
Marcus Rehm 52c745da71 [AIRFLOW-2596] Add Oracle to Azure Datalake Transfer Operator
Closes #3613 from marcusrehm/oracle_to_azure_datalake_transfer
2018-07-20 22:46:59 +02:00
Ivan Arozamena ee4fc35774 [AIRFLOW-2749] Add feature to delete BQ Dataset
Closes #3598 from MENA1717/Add-bq-op
2018-07-17 13:56:05 +01:00
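
A hedged sketch of what deleting a dataset with this feature might look like; the import path and operator/argument names are assumptions based on the contrib BigQuery operators of this period, and the project and dataset ids are hypothetical.

```
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryDeleteDatasetOperator

dag = DAG("bq_cleanup_example", start_date=datetime(2018, 7, 1), schedule_interval=None)

# Drop a scratch dataset at the end of a pipeline run.
delete_dataset = BigQueryDeleteDatasetOperator(
    task_id="delete_temp_dataset",
    project_id="my-gcp-project",  # hypothetical
    dataset_id="temp_dataset",    # hypothetical
    dag=dag,
)
```
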
Matthew Thorley 6b7645261b [AIRFLOW-2710] Clarify fernet key value in documentation
Closes #3574 from padwasabimasala/AIRFLOW-2710
2018-07-08 20:52:51 +02:00
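
For context, a Fernet key like the one this doc change clarifies can be generated with the cryptography package; this snippet is illustrative rather than taken from the commit.

```
from cryptography.fernet import Fernet

# Generate a new Fernet key and print it so it can be pasted into the
# [core] fernet_key setting in airflow.cfg (or exported as
# AIRFLOW__CORE__FERNET_KEY).
fernet_key = Fernet.generate_key()
print(fernet_key.decode())
```
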
Tim Swast 89c1f530da [AIRFLOW-2682] Add how-to guides for bash and python operators
Closes #3552 from tswast/airflow-2682-bash-python-how-to
2018-06-29 14:15:16 +02:00
Kevin Yang 284dbdb60a [AIRFLOW-2359] Add set failed for DagRun and task in tree view
Closes #3255 from yrqls21/kevin_yang_add_set_failed
2018-06-28 13:30:36 -07:00
Kaxil Naik 7961ee8f08 [AIRFLOW-2663] Add instructions to install SSH dependencies
Closes #3536 from kaxil/patch-1
2018-06-22 16:35:48 +02:00
Kengo Seki 5f49ebf018 [AIRFLOW-2640] Add Cassandra table sensor
Just like a partition sensor for Hive, this PR adds a sensor that waits
for a table to be created in a Cassandra cluster.

Closes #3518 from sekikn/AIRFLOW-2640
2018-06-20 20:36:32 +02:00
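
A hedged sketch of waiting on a Cassandra table from a DAG; the module path, class name, and the "keyspace.table" argument format are assumptions based on this commit message, and the connection and table names are hypothetical.

```
from datetime import datetime

from airflow import DAG
from airflow.contrib.sensors.cassandra_table_sensor import CassandraTableSensor

dag = DAG("cassandra_wait_example", start_date=datetime(2018, 6, 1), schedule_interval=None)

# Block downstream tasks until the table exists in the cluster.
wait_for_table = CassandraTableSensor(
    task_id="wait_for_table",
    table="my_keyspace.my_table",          # hypothetical keyspace.table
    cassandra_conn_id="cassandra_default",
    poke_interval=60,
    dag=dag,
)
```
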
niels 3dade5413f [AIRFLOW-2559] Azure Fileshare hook
Closes #3457 from NielsZeilemaker/fileshare_hook
2018-06-18 22:23:53 +01:00
Cameron Moberg dc38b2f46d [AIRFLOW-2613] Fix Airflow searching .zip bug
When Airflow was populating a DagBag from a .zip file, if a single file
in the root directory did not contain the strings 'airflow' and 'DAG',
it would ignore the entire .zip file.

Also added a small amount of logging so the user is not bombarded with
info about their .py files being skipped.

Closes #3505 from Noremac201/dag_name
2018-06-17 19:16:12 +01:00
Kengo Seki 4d153ad4e8 [AIRFLOW-2627] Add a sensor for Cassandra
Closes #3510 from sekikn/AIRFLOW-2627
2018-06-17 19:10:48 +01:00
Cameron Moberg 7255589f95 [AIRFLOW-2562] Add Google Kubernetes Engine Operators
Add Google Kubernetes Engine create_cluster and delete_cluster
operators. This allows users to use Airflow to create or delete
clusters on Google Cloud Platform.

Closes #3477 from Noremac201/gke_create
2018-06-15 20:44:29 +01:00
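
A hedged sketch of pairing the new create_cluster and delete_cluster operators; the class names, the shape of the body argument, and the project/zone values are assumptions drawn from this commit message rather than a verified API.

```
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.gcp_container_operator import (
    GKEClusterCreateOperator,
    GKEClusterDeleteOperator,
)

dag = DAG("gke_cluster_example", start_date=datetime(2018, 6, 1), schedule_interval=None)

# Spin up a short-lived cluster, then tear it down at the end of the DAG.
create_cluster = GKEClusterCreateOperator(
    task_id="create_cluster",
    project_id="my-gcp-project",  # hypothetical
    location="us-central1-a",
    body={"name": "scratch-cluster", "initial_node_count": 1},
    dag=dag,
)

delete_cluster = GKEClusterDeleteOperator(
    task_id="delete_cluster",
    project_id="my-gcp-project",
    location="us-central1-a",
    name="scratch-cluster",
    dag=dag,
)

create_cluster >> delete_cluster
```
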
Tim Swast 0f4d681f6f [AIRFLOW-2512][AIRFLOW-2522] Use google-auth instead of oauth2client
* Updates the GCP hooks to use the google-auth library and removes
  dependencies on the deprecated oauth2client package.
* Removes inconsistent handling of the scope parameter for different
  auth methods.

Note: using google-auth for credentials requires a newer version of the
google-api-python-client package, so this commit also updates the
minimum version for that.

To avoid some annoying warnings about the discovery cache not being
supported, the discovery cache is disabled explicitly as recommended
here: https://stackoverflow.com/a/44518587/101923

Tested by running:

    nosetests tests/contrib/operators/test_dataflow_operator.py \
        tests/contrib/operators/test_gcs*.py \
        tests/contrib/operators/test_mlengine_*.py \
        tests/contrib/operators/test_pubsub_operator.py \
        tests/contrib/hooks/test_gcp*.py \
        tests/contrib/hooks/test_gcs_hook.py \
        tests/contrib/hooks/test_bigquery_hook.py

and also tested by running some GCP-related DAGs
locally, such as the
Dataproc DAG example at
https://cloud.google.com/composer/docs/quickstart

Closes #3488 from tswast/google-auth
2018-06-12 23:53:21 +01:00
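
Outside of Airflow, the google-auth plus disabled-discovery-cache pattern described above looks roughly like this; it is a sketch of the approach, not the hooks' actual code, and the chosen service and version are only illustrative.

```
import google.auth
from googleapiclient.discovery import build

# Obtain application-default credentials via google-auth (rather than the
# deprecated oauth2client) and build a discovery client with the discovery
# cache disabled, as the linked StackOverflow answer recommends.
credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
service = build("bigquery", "v2", credentials=credentials, cache_discovery=False)
print("Built client for project:", project_id)
```
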
renzofrigato be3d551f72 [AIRFLOW-1115] fix github oauth api URL
Closes #3469 from renzofrigato/airflow_1115
2018-06-11 15:14:02 -07:00
Andy Cooper 9e1d8ee837 [AIRFLOW-83] add mongo hook and operator
Closes #3440 from andscoop/AIRFLOW_83_add_mongo_hooks_and_operators
2018-06-05 23:30:02 +01:00
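
A minimal, hedged sketch of the kind of hook usage this commit enables; the import path and the idea that get_conn() returns a pymongo client are assumptions based on the commit, and the connection, database, and collection names are hypothetical.

```
from airflow.contrib.hooks.mongo_hook import MongoHook

# The hook is used only to obtain a pymongo client; everything after
# get_conn() is plain pymongo.
hook = MongoHook()          # assumes the default Mongo connection exists
client = hook.get_conn()    # a pymongo.MongoClient

doc = client["my_db"]["events"].find_one({"status": "new"})
print(doc)
```
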
Charles Caygill 817296a7be [AIRFLOW-XXX] Fix doc typos
Closes #3459 from ccayg-sainsburys/master
2018-06-04 11:15:38 -07:00
Chao-Han Tsai 2800c8e556 [AIRFLOW-2526] dag_run.conf can override params
Make sure you have checked _all_ steps below.

### JIRA
- [x] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
    - https://issues.apache.org/jira/browse/AIRFLOW-2526
    - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a JIRA issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
params can be overridden by the dictionary passed through `airflow backfill -c`

```
from airflow.operators.bash_operator import BashOperator

templated_command = """
    echo "text = {{ params.text }}"
"""

bash_operator = BashOperator(
    task_id='bash_task',
    bash_command=templated_command,
    dag=dag,  # assumes a DAG object defined elsewhere
    params={
        "text": "normal processing"
    })
```

In a scheduled daily run it prints:
```
text = normal processing
```

When triggered with `airflow trigger_dag -c '{"text": "override success"}'`, it prints
```
text = override success
```

### Tests
- [ ] My PR adds the following unit tests __OR__
does not need testing for this extremely good
reason:

### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds
documentation that describes how to use it.
    - When adding new operators/hooks/sensors, the
autoclass documentation generation needs to be
added.

### Code Quality
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

Closes #3422 from milton0825/params-overridden-through-cli
2018-06-01 11:22:10 -07:00
Tao feng b81bd08a33 [AIRFLOW-2538] Update faq doc on how to reduce airflow scheduler latency
Make sure you have checked _all_ steps below.

### JIRA
- [x] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
    - https://issues.apache.org/jira/browse/AIRFLOW-2538
    - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a JIRA issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
Update the FAQ doc on how to reduce Airflow scheduler latency. This comes from our internal production setting, which also aligns with Maxime's email (https://lists.apache.org/thread.html/%3CCAHEEp7WFAivyMJZ0N+0Zd1T3nvfyCJRudL3XSRLM4utSigR3dQmail.gmail.com%3E).

### Tests
- [ ] My PR adds the following unit tests __OR__
does not need testing for this extremely good
reason:

### Commits
- [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

### Documentation
- [ ] In case of new functionality, my PR adds
documentation that describes how to use it.
    - When adding new operators/hooks/sensors, the
autoclass documentation generation needs to be
added.

### Code Quality
- [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

Closes #3434 from feng-tao/update_faq
2018-05-31 22:01:59 -07:00
Chao-Han Tsai d5d97dc971 [AIRFLOW-2536] docs about how to deal with airflow initdb failure
Add docs to faq.rst explaining how to deal with the MySQL exception
"Global variable explicit_defaults_for_timestamp needs to be on (1)".

Closes #3429 from milton0825/fix-docs
2018-05-29 20:29:27 +01:00
Tim Swast 4c0d67f0d0 [AIRFLOW-2523] Add how-to for managing GCP connections
I'd like to have how-to guides for all connection types, or at least
for the different categories of connection types. I found it difficult
to figure out how to manage a GCP connection, so this commit adds a
how-to guide for it.

Also, since creating and editing connections really aren't all that
different, the PR renames the "creating connections" how-to to
"managing connections".

Closes #3419 from tswast/howto
2018-05-25 09:37:29 +01:00
Chao-Han Tsai 66f00bbf7b [AIRFLOW-2510] Introduce new macros: prev_ds and next_ds
Closes #3418 from milton0825/introduce-next_ds-prev_ds
2018-05-25 10:13:49 +02:00
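
A small sketch of the new macros inside a templated command; only {{ prev_ds }} and {{ next_ds }} come from this commit, while the DAG and task are hypothetical.

```
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG("macro_example", start_date=datetime(2018, 5, 1), schedule_interval="@daily")

# Print the previous, current, and next execution dates for the run.
show_window = BashOperator(
    task_id="show_window",
    bash_command='echo "prev={{ prev_ds }} current={{ ds }} next={{ next_ds }}"',
    dag=dag,
)
```
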
Kengo Seki e4e7b55ad7 [AIRFLOW-2518] Fix broken ToC links in integration.rst
Closes #3412 from sekikn/AIRFLOW-2518
2018-05-24 21:55:19 +01:00
Tim Swast 084bc91367 [AIRFLOW-2509] Separate config docs into how-to guides
Also moves how-to style instructions for logging
from "integration" page
to a "Writing Logs" how-to.

Closes #3400 from tswast/howto
2018-05-23 10:08:53 +01:00
roc fff87b5cfd [AIRFLOW-2397] Support affinity policies for Kubernetes executor/operator
KubernetesPodOperator now accepts a dict-type parameter called
"affinity", which represents a group of affinity scheduling rules
(nodeAffinity, podAffinity, podAntiAffinity).

API reference: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#affinity-v1-core

Closes #3369 from imroc/AIRFLOW-2397
2018-05-19 00:47:53 +02:00
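
A hedged sketch of passing an affinity dict to KubernetesPodOperator as described above; the dict follows the Kubernetes Affinity API shape linked in the commit, while the node label, image, and namespace are hypothetical.

```
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

dag = DAG("k8s_affinity_example", start_date=datetime(2018, 5, 1), schedule_interval=None)

# Require the pod to land on nodes labelled disktype=ssd (hypothetical label).
affinity = {
    "nodeAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [{
                "matchExpressions": [{
                    "key": "disktype",
                    "operator": "In",
                    "values": ["ssd"],
                }]
            }]
        }
    }
}

task = KubernetesPodOperator(
    task_id="pinned_pod",
    name="pinned-pod",
    namespace="default",
    image="ubuntu:16.04",
    cmds=["bash", "-cx"],
    arguments=["echo hello"],
    affinity=affinity,
    dag=dag,
)
```
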
Tao feng 8a2cd08ce8 [AIRFLOW-2479] Improve doc FAQ section
Closes #3373 from feng-tao/airflow-2478
2018-05-19 00:38:27 +02:00
Joy Gao f5115b7e6a [AIRFLOW-2458] Add cassandra-to-gcs operator
Closes #3354 from jgao54/cassandra-to-gcs
2018-05-18 02:02:57 +01:00
Marcus Rehm 7c233179e9 [AIRFLOW-2420] Azure Data Lake Hook
Add AzureDataLakeHook as a first step to enable Airflow to connect to
Azure Data Lake.

The hook has a simple interface to upload and download files, with all
parameters available in the Azure Data Lake SDK, and also a
check_for_file method to query whether a file exists in the data lake.

[AIRFLOW-2420] Add functionality for Azure Data Lake

Make sure you have checked _all_ steps below.

### JIRA
- [x] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW-2420) issues and references them in the PR title.
    - https://issues.apache.org/jira/browse/AIRFLOW-2420

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
       This PR creates Azure Data Lake hook (adl_hook.AdlHook) and all the setup required to create a new Azure Data Lake connection.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
       Adds tests to airflow.hooks.adl_hook.py in tests.hooks.test_adl_hook.py

### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds
documentation that describes how to use it.
    - When adding new operators/hooks/sensors, the
autoclass documentation generation needs to be
added.

### Code Quality
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

Closes #3333 from marcusrehm/master
2018-05-15 10:30:54 -07:00
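
A hedged sketch of the upload/download/check_for_file interface this commit describes; the exact method and argument names are assumptions based on the commit message, and the connection id and paths are hypothetical.

```
from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook

hook = AzureDataLakeHook(azure_data_lake_conn_id="azure_data_lake_default")

# Upload a local file, verify it exists in the data lake, then pull a copy back.
hook.upload_file(local_path="/tmp/report.csv", remote_path="raw/report.csv")
if hook.check_for_file("raw/report.csv"):
    hook.download_file(local_path="/tmp/report_copy.csv", remote_path="raw/report.csv")
```
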
Sakshi Bansal c1d583f91a [AIRFLOW-2213] Add Qubole check operator
Closes #3300 from sakshi2894/AIRFLOW-2213
2018-05-15 14:51:35 +05:30
Kengo Seki b76d560ce1 [AIRFLOW-2465] Fix wrong module names in the doc
Closes #3359 from sekikn/AIRFLOW-2465
2018-05-15 08:54:04 +02:00
Daniel Imberman 8fa0bbd56e [AIRFLOW-2460] Users can now use volume mounts and volumes
This applies when launching pods using the k8s operator.

Closes #3356 from dimberman/k8s-mounts
2018-05-14 21:59:59 +02:00
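
A hedged sketch of mounting a volume into a pod launched by the operator, in the spirit of this change; the Volume/VolumeMount import paths and constructor arguments are assumptions, and the claim name and paths are hypothetical.

```
from datetime import datetime

from airflow import DAG
from airflow.contrib.kubernetes.volume import Volume
from airflow.contrib.kubernetes.volume_mount import VolumeMount
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

dag = DAG("k8s_volume_example", start_date=datetime(2018, 5, 1), schedule_interval=None)

# A persistent-volume-claim backed volume, mounted read-write at /data.
volume = Volume(
    name="data-volume",
    configs={"persistentVolumeClaim": {"claimName": "data-claim"}},
)
volume_mount = VolumeMount("data-volume", mount_path="/data", sub_path=None, read_only=False)

task = KubernetesPodOperator(
    task_id="pod_with_volume",
    name="pod-with-volume",
    namespace="default",
    image="ubuntu:16.04",
    cmds=["bash", "-cx"],
    arguments=["ls /data"],
    volumes=[volume],
    volume_mounts=[volume_mount],
    dag=dag,
)
```
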
Kaxil Naik cb9ba02cfe [AIRFLOW-XXX] Updated contributors list
Closes #3358 from kaxil/patch-3
2018-05-14 20:47:33 +01:00
Cristòfol Torrens 9c915c1c8b [AIRFLOW-2461] Add support for cluster scaling on dataproc operator
Closes #3357 from piffall/master
2018-05-14 16:38:28 +01:00
Bolke de Bruin 648e1e6930 [AIRFLOW-2425] Add lineage support
Add lineage support by having inlets and outlets that are made
available to dependent upstream or downstream tasks.

If configured to do so, it can send lineage data to a backend. Apache
Atlas is supported out of the box.

Closes #3321 from bolkedebruin/lineage_exp
2018-05-14 09:09:25 +02:00
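
A heavily hedged sketch of declaring inlets and outlets on tasks; the File dataset class and the {"datasets": [...]} argument shape are assumptions drawn from the lineage documentation of this era rather than from this commit's diff, and the paths are hypothetical.

```
from datetime import datetime

from airflow import DAG
from airflow.lineage.datasets import File
from airflow.operators.bash_operator import BashOperator

dag = DAG("lineage_example", start_date=datetime(2018, 5, 1), schedule_interval=None)

raw = File("/data/raw/{{ ds }}/events.csv")
clean = File("/data/clean/{{ ds }}/events.csv")

# The extract task produces the raw file; the transform task consumes it and
# produces the clean file, so lineage flows from extract to transform.
extract = BashOperator(
    task_id="extract",
    bash_command="echo extract",
    outlets={"datasets": [raw]},
    dag=dag,
)
transform = BashOperator(
    task_id="transform",
    bash_command="echo transform",
    inlets={"datasets": [raw]},
    outlets={"datasets": [clean]},
    dag=dag,
)
extract >> transform
```
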
Jordan Zucker 4d43b78f11 [AIRFLOW-2333] Add Segment Hook and TrackEventOperator
Add support for Segment with an accompanying hook
and an
operator for sending track events

Closes #3335 from jzucker2/add-segment-support
2018-05-11 09:25:19 +02:00
Kengo Seki 686e805e67 [AIRFLOW-2446] Add S3ToRedshiftTransfer into the "Integration" doc
This PR adds a previously undocumented AWS-related operator to the
"Integration" section and fixes some obsolete descriptions.

Closes #3340 from sekikn/AIRFLOW-2446
2018-05-10 20:02:49 +02:00
Luke Bodeen e5f2a38d6a [AIRFLOW-1978] Add WinRM windows operator and hook
Closes #3316 from cloneluke/winrm_connector2
2018-05-08 11:12:59 -07:00
Kengo Seki 6f6884641f [AIRFLOW-XXX] Fix wrong table header in scheduler.rst
Closes #3306 from sekikn/table_header
2018-05-02 23:47:33 -07:00
Sergio Ballesteros 12ab796b11 [AIRFLOW-2394] default cmds and arguments in kubernetes operator
Commands and arguments to the docker image in the kubernetes operator

Closes #3289 from ese/k8soperator
2018-05-02 15:43:51 +02:00
Moe Nadal a67c13e44c [AIRFLOW-2401] Document the use of variables in Jinja template
Closes #2847 from moe-nadal-ck/patch-1
2018-04-30 15:06:10 -07:00
Tao feng 700c0f488f [AIRFLOW-2389] Create a pinot db api hook
Closes #3274 from feng-tao/pinot_db_hook
2018-04-30 08:41:43 +02:00
Bovard Doerschuk-Tiberi 2a8bb0e1b7 [AIRFLOW-1835] Update docs: Variable file is json
Searching through all the documentation I couldn't
find anywhere
that explained what file format it expected for
uploading settings.

Closes #2802 from bovard/variable_files_are_json
2018-04-25 14:21:35 -07:00
Tristram Oaten fd6f1d1a07 [AIRFLOW-2041] Correct Syntax in python examples
I parsed it with the ol' eyeball compiler. Someone
could flake8 it better, perhaps.
Changes:

 - correct `def` syntax on line 50
 - use literal dict on line 67

Closes #2479 from 0atman/patch-1
2018-04-24 23:04:38 -07:00
Agraj Mangal 1f86299cf9 [AIRFLOW-2068] MesosExecutor allows optional Docker image
In its current form, MesosExecutor schedules tasks
on mesos slaves which
just contain airflow commands assuming that the
mesos slaves already
have airflow installed and configured on them.
This assumption goes
against the Mesos philosophy of having a
heterogeneous cluster.

Since Mesos provides an option to pull a Docker image before running
the actual task/command, this improvement changes mesos_executor.py to
specify an optional docker image containing airflow which can be pulled
on slaves before running the actual airflow command. This also opens
the door for an optimization of resources in a future PR, by allowing
the specification of CPU and memory needed for each airflow task.

Closes #3008 from agrajm/AIRFLOW-2068
2018-04-23 22:22:35 +02:00
Fokko Driesprong e30a1f451a [AIRFLOW-2357] Add persistent volume for the logs
The logs are kept inside of the worker pod. By attaching a persistent
disk we keep the logs and make them available for the webserver.

- Remove the requirements.txt since we don't want to maintain another
  dependency file
- Fix some small casing stuff
- Removed some unused code
- Add missing shebang lines
- Started on some docs
- Fixed the logging

Closes #3252 from Fokko/airflow-2357-pd-for-logs
2018-04-23 18:43:24 +02:00