3020 commits

Author SHA1 Message Date
Ash Berlin-Taylor 623d5cdaff
Spend less time waiting for LocalTaskJob's subprocess to finish (#11373)
* Spend less time waiting for LocalTaskJob's subprocess to finish

This is about a 20% speed-up for short-running tasks!

This change doesn't affect the "duration" reported in the TI table, but
it does affect the time before the slot is freed up in the executor,
which in turn affects overall task/DAG throughput.

(All these tests are with the same BashOperator tasks, just running `echo 1`.)

**Before**

```
Task airflow.executors.celery_executor.execute_command[5e0bb50c-de6b-4c78-980d-f8d535bbd2aa] succeeded in 6.597011625010055s: None
Task airflow.executors.celery_executor.execute_command[0a39ec21-2b69-414c-a11b-05466204bcb3] succeeded in 6.604327297012787s: None

```

**After**

```
Task airflow.executors.celery_executor.execute_command[57077539-e7ea-452c-af03-6393278a2c34] succeeded in 1.7728257849812508s: None
Task airflow.executors.celery_executor.execute_command[9aa4a0c5-e310-49ba-a1aa-b0760adfce08] succeeded in 1.7124666879535653s: None
```

**After, including change from #11372**

```
Task airflow.executors.celery_executor.execute_command[35822fc6-932d-4a8a-b1d5-43a8b35c52a5] succeeded in 0.5421732050017454s: None
Task airflow.executors.celery_executor.execute_command[2ba46c47-c868-4c3a-80f8-40adaf03b720] succeeded in 0.5469810889917426s: None
```
2020-10-13 10:00:16 +01:00
Kaxil Naik 2345cd1f03
Fix Hardcoded Airflow version (#11483)
This test would fail, or need fixing, whenever we release a new Airflow
version.
2020-10-13 02:05:35 +01:00
Kaxil Naik 4e32546faf
Mask Password in Log table when using the CLI (#11468) 2020-10-12 19:27:01 +01:00
Jarek Potiuk 358e61d7d2
Move the test_process_dags_queries_count test to quarantine (#11455)
The test (test_process_dags_queries_count)
randomly produces a higher query count. Example here:

https://github.com/apache/airflow/runs/1239572585#step:6:421
2020-10-12 11:48:54 +02:00
Tomek Urbaszek 02ce45cafe
Refactor celery worker command (#11336)
This commit does a small refactor of the way we start the Celery worker.
This will make it easier to migrate to Celery 5.0.
2020-10-12 11:21:27 +02:00
Kaxil Naik d305876bee
Remove redundant None provided as default to dict.get() (#11448) 2020-10-12 00:31:35 +01:00
eladkal c3e340584b
Change prefix of AwsDynamoDB hook module (#11209)
* align import path of AwsDynamoDBHook in aws providers

Co-authored-by: Tomek Urbaszek <turbaszek@gmail.com>
2020-10-11 20:49:23 +01:00
John Bampton b786327041
Fix spelling in CeleryExecutor (#11407) 2020-10-11 18:06:26 +02:00
Ephraim Anierobi 686e0ee7df
Fix incorrect typing, remove hardcoded argument values and improve code in AzureContainerInstancesOperator (#11408) 2020-10-11 16:48:51 +02:00
Jarek Potiuk 5bc5994c2c
Split tests to more sub-types (#11402)
We seem to have a problem with running all tests at once, most
likely due to resource problems in our CI, so it makes
sense to split the tests into more batches. This is not yet a full
implementation of selective tests, but it moves in that direction
by splitting into Core/Providers/API/CLI tests. The full selective
tests approach will be implemented as part of issue #10507.

This split is possible thanks to #10422, which moved building the image
to a separate workflow. This way each image is built only once
and uploaded to a shared registry, from which it is quickly
downloaded rather than built by every job separately. As a result
we can have many more jobs, since there is very little per-job
overhead before the tests start running.
2020-10-11 07:40:31 -07:00
Joshua Carp bd204bb91b
Optionally set null marker in csv exports in BaseSQLToGCSOperator (#11409) 2020-10-11 11:48:54 +02:00
Jarek Potiuk 9416bedf8e
Moving the test to quarantine. (#11405)
I've seen the test being flaky and failing intermittently several times.

Moving it to quarantine for now.
2020-10-10 21:29:42 -07:00
John Bampton 7959df94cf
Fix spelling (#11404) 2020-10-10 20:47:22 +02:00
John Bampton 0620aaa0f8
Fix spelling (#11401) 2020-10-10 18:47:10 +02:00
Michał Misiewicz b7404b079a
KubernetesPodOperator should retry log tailing in case of interruption (#11325)
* KubernetesPodOperator can retry log tailing in case of interruption

* fix failing test

* change read_pod_logs method formatting

* KubernetesPodOperator retry log tailing based on last read log timestamp

* fix test_parse_log_line test formatting

* add docstring to parse_log_line method

* fix kubernetes integration test
2020-10-09 15:59:47 -07:00
Jarek Potiuk 6fe020e105
Add tests for Custom cluster policy (#11381)
The custom ClusterPolicyViolation was added in #10282.
This one adds a more comprehensive test for it.

Co-authored-by: Jacob Ferriero <jferriero@google.com>
2020-10-10 00:57:10 +02:00
Ash Berlin-Taylor 73b9163a8f
Fully support running more than one scheduler concurrently (#10956)
* Fully support running more than one scheduler concurrently.

This PR implements scheduler HA as proposed in AIP-15. The high level
design is as follows:

- Move all scheduling decisions into SchedulerJob (requiring DAG
  serialization in the scheduler)
- Use row-level locks to ensure schedulers don't stomp on each other
  (`SELECT ... FOR UPDATE`)
- Use `SKIP LOCKED` for better performance when multiple schedulers are
  running. (MySQL < 8 and MariaDB don't support this.)
- Scheduling decisions are not tied to the parsing speed, but can
  operate just on the database

*DagFileProcessorProcess*:

Previously this component did more than just parse the DAG files, which
is all its name implies. It was also responsible for creating
DagRuns and for making scheduling decisions for TIs, moving them from the
"None" to the "scheduled" state.

This commit changes it so that the DagFileProcessorProcess now will
update the SerializedDAG row for this DAG, and make no scheduling
decisions itself.

To make the scheduler's job easier (so that it can make as many
decisions as possible without having to load the possibly-large
SerializedDAG row) we store/update some columns on the DagModel table:

- `next_dagrun`: The execution_date of the next dag run that should be created (or
  None)
- `next_dagrun_create_after`: The earliest point at which the next dag
  run can be created

Pre-computing these values (and updating them every time the DAG is
parsed) reduces the overall load on the DB, as many decisions can be taken
by selecting just these two columns from the small DagModel row.

When `max_active_runs` has been reached, or for `@once` schedules, these columns will be set to
null, meaning "don't create any dag runs".
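As an illustration of why the two pre-computed columns are enough for the "should I create a run?" decision, here is a simplified sketch. It assumes a fixed `timedelta` schedule, models `@once` as `interval=None`, and follows the description above in returning `(None, None)` when no run should be created. `next_dagrun_info` and `should_create_dagrun` are hypothetical helpers, not Airflow functions, and real scheduling also handles cron expressions, catchup, `end_date` and time zones.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# (next_dagrun, next_dagrun_create_after); (None, None) means "don't create".
DagRunInfo = Tuple[Optional[datetime], Optional[datetime]]


def next_dagrun_info(
    last_execution_date: Optional[datetime],
    start_date: datetime,
    interval: Optional[timedelta],  # hypothetical: None stands in for "@once"
    active_runs: int,
    max_active_runs: int,
) -> DagRunInfo:
    """Simplified model of the two DagModel columns, recomputed on each parse."""
    if active_runs >= max_active_runs:
        return None, None
    if interval is None:
        # "@once": a single run at start_date, then nothing more.
        return (start_date, start_date) if last_execution_date is None else (None, None)
    next_dagrun = start_date if last_execution_date is None else last_execution_date + interval
    # A run covers [execution_date, execution_date + interval), so it is only
    # created once that period has ended.
    return next_dagrun, next_dagrun + interval


def should_create_dagrun(info: DagRunInfo, now: datetime) -> bool:
    """Decide from the two small columns alone, without loading the serialized DAG."""
    _, create_after = info
    return create_after is not None and now >= create_after
```

The scheduler-side check needs only `next_dagrun_create_after`, which is the point of storing it on the small DagModel row.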

*SchedulerJob*

The SchedulerJob used to only queue/send tasks to the executor after
they were parsed, and returned from the DagFileProcessorProcess.

This PR breaks the link between parsing and enqueuing of tasks. Instead
of looking at DAGs as they are parsed, we now:

- store a new datetime column, `last_scheduling_decision`, on the DagRun
  table, signifying when a scheduler last examined a DagRun
- Each time around the loop the scheduler will get (and lock) the next
  _n_ DagRuns via `DagRun.next_dagruns_to_examine`, prioritising DagRuns
  which haven't been touched by a scheduler in the longest period
- SimpleTaskInstance etc have been almost entirely removed now, as we
  use the serialized versions
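The selection step described in the list above can be pictured with an in-memory sketch. The real query runs in the database with row-level locks. Here a plain set stands in for "rows locked by another scheduler", and a sort emulates "never examined first, then longest since last examined". `DagRunRow` and this `next_dagruns_to_examine` function are hypothetical stand-ins, not the `DagRun.next_dagruns_to_examine` implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Set


@dataclass
class DagRunRow:
    run_id: str
    last_scheduling_decision: Optional[datetime]  # None = never examined


def next_dagruns_to_examine(
    rows: Iterable[DagRunRow], locked_by_others: Set[str], max_number: int
) -> List[DagRunRow]:
    """In-memory stand-in for an ordered, limited SELECT ... FOR UPDATE SKIP LOCKED."""
    # SKIP LOCKED: rows held by another scheduler are skipped rather than waited on.
    available = [row for row in rows if row.run_id not in locked_by_others]
    # Never-examined runs first, then those examined longest ago.
    available.sort(
        key=lambda row: (
            row.last_scheduling_decision is not None,
            row.last_scheduling_decision or datetime.min,
        )
    )
    return available[:max_number]
```

With two schedulers, the second one (running while the first holds its locks) gets the next-oldest runs instead of blocking, which is the performance benefit attributed to `SKIP LOCKED` above.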

* Move callbacks execution from Scheduler loop to DagProcessorProcess

* Don’t run verify_integrity if the Serialized DAG hasn’t changed

dag_run.verify_integrity is slow, and we don't want to call it every time, just when the dag structure changes (which we can know now thanks to DAG Serialization)

* Add escape hatch to disable newly added "SELECT ... FOR UPDATE" queries

We are worried that these extra uses of row-level locking will cause
problems on MySQL 5.x (most likely deadlocks), so we are providing users
an "escape hatch" to make these queries non-locking. This
means that only a single scheduler should be run, but being able to run
one is better than having the scheduler crash.

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-10-09 22:44:27 +01:00
Daniel Imberman 49aad025b5
Users can specify sub-secrets and paths k8spodop (#11369)
Allows users to specify items for specific key path projections
when using the airflow.kubernetes.secret.Secret class
2020-10-09 09:00:09 -07:00
Kaxil Naik ff1a2aaff8
Set start_date, end_date & duration for tasks failing without DagRun (#11358) 2020-10-09 15:21:39 +01:00
Ash Berlin-Taylor fe0bf6e1f0
Reduce "start-up" time for tasks in CeleryExecutor (#11372)
This is similar to #11327, but for Celery this time.

The impact is not quite as pronounced here (for simple DAGs at least),
but it cuts the average queued-to-start delay from 1.5s to 0.4s.
2020-10-09 13:18:32 +01:00
Tobiasz Kędzierski 8baf657fc2
Fix regression in DataflowTemplatedJobStartOperator (#11167) 2020-10-09 10:21:16 +02:00
Vijayant 422b61a9dd
Adding ElastiCache Hook for creating, describing and deleting replication groups (#8701) 2020-10-09 09:19:26 +01:00
Sumit Maheshwari 5605d1063b
Fix DagBag bug when a dag has invalid schedule_interval (#11344) 2020-10-09 13:29:41 +05:30
Kaxil Naik 27e637fbe3
Bugfix: Error in SSHOperator when command is None (#11361)
closes https://github.com/apache/airflow/issues/10656
2020-10-09 08:35:39 +01:00
Ash Berlin-Taylor 4839a5bc6e
Reduce "start-up" time for tasks in LocalExecutor (#11327)
Spawning a whole new Python process and then re-loading all of Airflow
is expensive. Although this time fades into insignificance for long-running
tasks, the delay gives a "bad" experience to new users who
are just trying out Airflow for the first time.

For the LocalExecutor this cuts the "queued time" down from 1.5s to 0.1s
on average.
2020-10-08 17:37:51 +01:00
Michał Słowikowski 832a7850f1
Add Azure Blob Storage to GCS transfer operator (#11321) 2020-10-08 12:16:50 +02:00
Satyasheel 5d007fd2ff
Strict type check for azure hooks (#11342) 2020-10-08 09:36:35 +02:00
FHoffmannCode b0fcf67559
Add AzureFileShareToGCSOperator (#10991) 2020-10-07 11:08:58 +02:00
Kishore Vancheeshwaran bbc3cea057
Move latest_only_operator.py to latest_only.py (#11178) (#11304) 2020-10-07 00:15:28 +01:00
amaterasu-coder dd98b21494
Add acl_policy parameter to GCSToS3Operator (#10804) (#10829) 2020-10-06 13:09:01 +02:00
Cooper Gillan 03ff067152
Add type annotations to ZendeskHook, update unit test (#10888)
* Add type annotations to ZendeskHook

__What__

* Add correct type annotations to ZendeskHook and each method
* Update one unit test to pass an empty dictionary rather than
None, since the argument should be a dictionary

__Why__

* Building out type annotations is good for the code base
* The query parameter is accessed by index at one point, which
means that it cannot be None; it should instead default to
an empty dictionary if not provided

* Remove useless return
2020-10-06 11:32:53 +01:00
Ephraim Anierobi c51016b0b8
Add LocalToAzureDataLakeStorageOperator (#10814) 2020-10-05 22:40:19 +02:00
Ash Berlin-Taylor c9efa56550
Access task type via the property, not dundervars (#11274)
We don't currently create TIs from serialized DAGs, but we are about to
start, at which point some of these cases would have shown
"SerializedBaseOperator" rather than the _real_ class name.

The other changes are just for "consistency" -- we should always get the
task type from this property, not via `__class__.__name__`.

I haven't set up a pre-commit rule for this, as this dunder
accessor is also used elsewhere on objects that are not BaseOperator
instances, and detecting that is hard to do in a pre-commit rule.
2020-10-05 11:32:42 +01:00
Ephraim Anierobi fd682fd70a
fix job deletion (#11272) 2020-10-05 09:39:50 +02:00
Kaxil Naik 6dce7a6c26
Enable MySQL 8 CI jobs (#11247)
closes https://github.com/apache/airflow/issues/11164
2020-10-04 13:45:05 +02:00
Tomek Urbaszek f697ff2381
Move test tools from tests.utils to tests.test_utils (#10889) 2020-10-03 14:27:06 +02:00
Ephraim Anierobi 4210618789
Ensure target_dedicated_nodes or enable_auto_scale is set in AzureBatchOperator (#11251) 2020-10-03 10:59:51 +01:00
Arunvel Sriram e4125666b5
Add option to bulk clear DAG Runs in Browse DAG Runs page (#11226)
closes: #11076
2020-10-03 10:30:08 +01:00
Daniel Imberman 7338912a78
Add task adoption to CeleryKubernetesExecutor (#11244)
Routes task adoption based on queue name to CeleryExecutor
or KubernetesExecutor

Co-authored-by: Daniel Imberman <daniel@astronomer.io>
2020-10-02 11:51:11 -07:00
Ryan Hamilton 24d0ecf4ee
Airflow 2.0 UI Overhaul/Refresh (#11195)
Resolves #10953.

A refreshed UI for the 2.0 release. The existing "theming" is a bit long in the tooth, and this PR attempts to give it a modern look and some freshness to complement all of the new features under the hood.

The majority of the changes to UI have been done through updates to the Bootstrap theme contained in bootstrap-theme.css. These are simply overrides to the default stylings that are packaged with Bootstrap.
2020-10-02 15:58:58 +01:00
Jed Cunningham c74b3ac79a
Optional import error tracebacks in web ui (#10663)
This PR allows for partial import error tracebacks to be exposed on the UI, if enabled. This extra context can be very helpful for users without access to the parsing logs to determine why their DAGs are failing to import properly.
2020-10-01 21:48:48 +02:00
Daniel Imberman 3ca11eb9b0
Kubernetes executor can adopt tasks from other schedulers (#10996)
* KubernetesExecutor can adopt tasks from other schedulers

* simplify

* recreate tables properly

* fix pylint

Co-authored-by: Daniel Imberman <daniel@astronomer.io>
2020-10-01 12:07:38 -07:00
James Timmins 427a4a8f01
Replace get accessible dag ids (#11027) 2020-10-01 17:37:00 +01:00
Michał Słowikowski 00ffedb8c4
Add amazon glacier to GCS transfer operator (#10947)
Add Amazon Glacier to GCS transfer operator, Glacier job operator and sensor.
2020-09-30 14:59:26 +02:00
Daniel Imberman 9860719c72
[AIRFLOW-5545] Fixes recursion in DAG cycle testing (#6175)
* Fixes an issue where cycle detection uses recursion and overflows the stack after about 1000 tasks

(cherry picked from commit 63f1a180a17729aa937af642cfbf4ddfeccd1b9f)

* reduce test length

* slightly more efficient

* Update airflow/utils/dag_cycle_tester.py

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>

* slightly more efficient

* actually works this time

Co-authored-by: Daniel Imberman <daniel@astronomer.io>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-09-29 11:34:55 -07:00
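The recursion problem described in the commit above disappears if the depth-first search keeps its own explicit stack. Below is a minimal sketch of that general technique (three-colour DFS over a `task_id -> downstream task_ids` mapping). It is not the code in `airflow/utils/dag_cycle_tester.py`, and `has_cycle` is a hypothetical name.

```python
from typing import Dict, Iterable

WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current path / finished


def has_cycle(downstream: Dict[str, Iterable[str]]) -> bool:
    """Iterative DFS cycle check; depth is bounded by memory, not the recursion limit."""
    color = {node: WHITE for node in downstream}
    for root in downstream:
        if color[root] != WHITE:
            continue
        color[root] = GREY
        # Each stack entry keeps its own iterator, so we resume where we left off.
        stack = [(root, iter(downstream[root]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                state = color.get(child, WHITE)
                if state == GREY:
                    return True  # back edge: child is already on the current path
                if state == WHITE:
                    color[child] = GREY
                    stack.append((child, iter(downstream.get(child, ()))))
                    break
            else:
                # All children explored: leave the current path.
                color[node] = BLACK
                stack.pop()
    return False
```

A linear chain of several thousand tasks, which would exceed CPython's default recursion limit of 1000 in a recursive version, is handled without issue.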
Omair Khan 68e0eb6976
in_container bats pre-commit hook and updated bats-tests hook (#11179) 2020-09-29 11:59:06 +02:00
Ash Berlin-Taylor 6694eaa831
Show the location of the queries when the assert_queries_count fails. (#11186)
Example output (I forced one of the existing tests to fail)

```
E   AssertionError: The expected number of db queries is 3. The current number is 2.
E
E   Recorded query locations:
E   	scheduler_job.py:_run_scheduler_loop>scheduler_job.py:_emit_pool_metrics>pool.py:slots_stats:94:	1
E   	scheduler_job.py:_run_scheduler_loop>scheduler_job.py:_emit_pool_metrics>pool.py:slots_stats:101:	1
```

This makes it a bit easier to see what the queries are, without having
to re-run with full query tracing and then analyze the logs.
2020-09-28 19:39:21 +01:00
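The idea of recording where each query came from can be sketched with the standard library alone. This is an analog, not Airflow's test utility. It assumes an SQLite connection in autocommit mode (`isolation_level=None`, so no implicit `BEGIN` statements are counted). It uses `sqlite3.Connection.set_trace_callback` to see each executed statement and `traceback.extract_stack` to find the calling function.

```python
import sqlite3
import traceback
from collections import Counter
from contextlib import contextmanager


@contextmanager
def assert_queries_count(conn: sqlite3.Connection, expected: int):
    """Count statements run on `conn` and report their call sites on mismatch."""
    locations: Counter = Counter()

    def record(statement: str) -> None:
        # extract_stack()[-1] is this callback; [-2] is the code that called execute().
        caller = traceback.extract_stack()[-2]
        locations[f"{caller.name}:{caller.lineno}"] += 1

    conn.set_trace_callback(record)
    try:
        yield
    finally:
        conn.set_trace_callback(None)
    actual = sum(locations.values())
    if actual != expected:
        detail = "\n".join(f"\t{loc}:\t{count}" for loc, count in locations.items())
        raise AssertionError(
            f"The expected number of db queries is {expected}. "
            f"The current number is {actual}.\n\nRecorded query locations:\n{detail}"
        )
```

On failure, the message lists `function:line` pairs with per-location counts, similar in spirit to the example output in the commit above.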
Ephraim Anierobi cb52fb0ae1
Add example DAG and system test for MySQLToGCSOperator (#10990) 2020-09-27 19:05:04 +02:00
Logan Attwood 37798f0d2a
Do not silently allow the use of undefined variables in jinja2 templates (#11016)
This can have *extremely* bad consequences. After this change, a jinja2
template like the one below will cause the task instance to fail, if the
DAG being executed is not a sub-DAG. This may also display an error on
the Rendered tab of the Task Instance page.

task_instance.xcom_pull('z', key='return_value', dag_id=dag.parent_dag.dag_id)

Prior to the change in this commit, the above template would pull the
latest value for task_id 'z', for the given execution_date, from *any DAG*.
If your task_ids between DAGs are all unique, or if DAGs using the same
task_id always have different execution_date values, this will appear to
act like dag_id=None.

Our current theory is that SQLAlchemy/Python doesn't behave as expected when
comparing `jinja2.Undefined` to `None`.
2020-09-25 09:15:28 +02:00
Nadim Younes 68fa29bff0
Added support for encrypted private keys in SSHHook (#11097)
* Added support for encrypted private keys in SSHHook

* Fixed Styling issues and added unit testing

* fixed last pylint styling issue by adding newline to the end of the file

* re-fixed newline issue for pylint checks

* fixed pep8 styling issues and black formatted files to pass static checks

* added comma as per suggestion to fix static check

Co-authored-by: Nadim Younes <nyounes@kobo.com>
2020-09-25 07:02:16 +02:00
Tomek Urbaszek daf8f31080
Add template fields renderers for better UI rendering (#11061)
This PR adds the ability to define template_fields_renderers for an
operator. This way users can specify which lexer should be used to
render a particular field. This is especially useful for custom
operators and gives more flexibility than predefined keywords.

Co-authored-by: Kamil Olszewski <34898234+olchas@users.noreply.github.com>
Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
2020-09-23 15:31:40 +02:00
yuqian90 423a382678
SkipMixin: Add missing session.commit() and test (#10421) 2020-09-22 21:08:12 +01:00
yuqian90 e59ad5b2c6
Make Skipmixin handle empty branch properly (#10751)
closes: #10725

Make sure SkipMixin.skip_all_except() properly handles empty branches, i.e. a DAG where "branch" points both to "task1" and directly to "join", and "task1" also points to "join". When "task1" is followed, "join" must not be skipped even though it is considered to be immediately downstream of "branch".
2020-09-22 20:48:26 +01:00
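The rule described in the commit above can be sketched like this: a direct downstream task of the branch is skipped only if it is not reachable from any followed task. This is a simplified model over a plain `task_id -> downstream task_ids` dict. `tasks_to_skip` is hypothetical and is not the SkipMixin implementation.

```python
from typing import Dict, Iterable, List


def tasks_to_skip(
    branch_task: str, downstream: Dict[str, Iterable[str]], followed: Iterable[str]
) -> List[str]:
    """Direct downstream tasks of branch_task that no followed path leads to."""
    # Collect every task reachable from the followed tasks, including themselves.
    reachable = set()
    stack = list(followed)
    while stack:
        task = stack.pop()
        if task in reachable:
            continue
        reachable.add(task)
        stack.extend(downstream.get(task, ()))
    return sorted(t for t in downstream.get(branch_task, ()) if t not in reachable)
```

In the "empty branch" DAG, following `task1` keeps `join` (it is reachable via `task1`). Following `join` directly skips only `task1`.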
James Timmins fbd994a4cf
Add permissions for stable API (#10594)
Related Github Issue: https://github.com/apache/airflow/issues/8112
2020-09-22 17:23:59 +01:00
Jarek Potiuk 1ebd3a631c
Pandas behaviour for None changed in 1.1.2 (#11004)
In Pandas version 1.1.2, an experimental NaN value started to be
returned instead of None in a number of places. That broke our tests.

Fixing the tests also requires Pandas to be updated to >=1.1.2.
2020-09-22 14:23:49 +02:00
Kaxil Naik cb979f9f21
Get Airflow configs with sensitive data from CloudSecretManagerBackend (#11024) 2020-09-22 08:17:58 +01:00
Daniel Imberman f4513c0389
Revert "KubernetesJobWatcher no longer inherits from Process (#11017)" (#11065)
This reverts commit 1539bd051c.
2020-09-21 15:28:00 -07:00
Jarek Potiuk 3db4d3b04d
All versions in CI yamls are not hard-coded any more (#10959)
GitHub Actions allows using the `fromJson` function to read arrays
or even more complex JSON objects into the CI workflow YAML files.

This, combined with `::set-output` commands, allows us to read the
list of allowed versions, as well as the default ones, from the
environment variables configured in
./scripts/ci/libraries/initialization.sh

This means that we can have one place in which versions are
configured. We also need to do it in "breeze-complete", as this is
a standalone script that should not source anything. We added
BATS tests to verify that the versions in breeze-complete
correspond with those defined in initialization.sh.

Also, we no longer limit tests in regular PRs: we run
all combinations of available versions. Our tests run quite a
bit faster now, so we should be able to run more complete
matrices. We can still exclude individual values of the matrices
if this is too much.

MySQL 8 is disabled in Breeze for now. I plan a separate follow-up
PR where we will run MySQL 8 tests (they have not been run so far).
2020-09-21 20:02:04 +02:00
Kaxil Naik 2410f592a4
Get Airflow configs with sensitive data from AWS Systems Manager (#11023)
Adds support to AWS SSM for feature added in https://github.com/apache/airflow/pull/9645
2020-09-19 19:05:42 +01:00
Shekhar Singh 9edfcb7ac4
Support extra_args in S3Hook and GCSToS3Operator (#11001) 2020-09-19 02:03:21 +01:00
yuqian90 49c193fb87
[AIP-34] TaskGroup: A UI task grouping concept as an alternative to SubDagOperator (#10153)
This commit introduces TaskGroup, which is a simple UI task grouping concept.

- TaskGroups can be collapsed/expanded in Graph View when clicked
- TaskGroups can be nested
- TaskGroups can be put upstream/downstream of tasks or other TaskGroups with >> and << operators
- The search box, hovering, and focusing in Graph View treat TaskGroups properly. E.g. searching for tasks also highlights the TaskGroup that contains a matching task_id. When a TaskGroup is expanded/collapsed, the affected TaskGroup is put in focus and moved to the centre of the graph.


What this commit does not do:

- This commit does not change or remove SubDagOperator. Although TaskGroup is intended as an alternative for SubDagOperator, deprecating SubDagOperator will need to be discussed/implemented in the future.
- This PR only implements TaskGroup handling in the Graph View. In places such as Tree View, it will look as if TaskGroup does not exist and all tasks are in the same flat DAG.

GitHub Issue: #8078
AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator
2020-09-19 01:51:37 +01:00
Daniel Imberman 1539bd051c
KubernetesJobWatcher no longer inherits from Process (#11017)
multiprocessing.Process is set up in a very unfortunate manner
that makes it practically impossible to test a class that inherits from
Process or to use any of its internal functions. For this reason we decided
to separate the actual process-based functionality into a class member.
2020-09-18 11:33:22 -07:00
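Here is a minimal sketch of the composition approach described in the commit above: the watcher owns a `multiprocessing.Process` instead of subclassing it, so its work method can be unit-tested in-process. `JobWatcher`, `run` and `handle_event` are hypothetical names used only for illustration, not the KubernetesJobWatcher API.

```python
import multiprocessing
from typing import Callable, Iterable, Optional


class JobWatcher:
    """Hypothetical watcher that owns a Process instead of subclassing it."""

    def __init__(self, handle_event: Callable[[str], None]) -> None:
        self.handle_event = handle_event
        self._process: Optional[multiprocessing.Process] = None

    def run(self, events: Iterable[str]) -> None:
        # Plain method: unit tests can call it directly, no child process needed.
        for event in events:
            self.handle_event(event)

    def start(self, events: Iterable[str]) -> None:
        # Only this method touches multiprocessing. With the "spawn" start
        # method, self (including handle_event) and events must be picklable.
        self._process = multiprocessing.Process(target=self.run, args=(list(events),))
        self._process.start()

    def join(self, timeout: Optional[float] = None) -> None:
        if self._process is not None:
            self._process.join(timeout)
```

Tests exercise `run()` directly with a recording callback. Only production code calls `start()`.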
Shubham Joshi 966a06d96b
Fetching databricks host from connection if not supplied in extras. (#10762)
* Fetching databricks host from connection if not supplied in extras.

* Fixing formatting issue in databricks test

Co-authored-by: joshi95 <shubham@playsimple.in>
2020-09-18 13:15:11 +02:00
Daniel Imberman cba51d49ee
Simplify the K8sExecutor and K8sPodOperator (#10393)
* Simplify Airflow on Kubernetes Story

Removes thousands of lines of code that essentially amount to us
re-creating the Kubernetes API. This will offer a faster, simpler
KubernetesExecutor for 2.0

* Fix podgen tests

* fix documentation

* simplify validate function

* @mik-laj comments

* spellcheck

* spellcheck

* Update airflow/executors/kubernetes_executor.py

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-09-17 08:40:20 -07:00
Jarek Potiuk 82a9477cd3
The test_find_not_should_ignore_path is now in heisentests (#10989)
It seems that the test_find_not_should_ignore_path test has some
dependency on side-effects from other tests.

See #10988 - we are moving this test to heisentests until we
solve the issue.
2020-09-17 14:46:36 +02:00
Kaxil Naik e066260ef8
Improve the Error message in Breeze for invalid params (#10980)
Changed `Is` to `Passed`

Before:

```

ERROR:  Allowed backend: [ sqlite mysql postgres ]. Is: 'dpostgres'.

Switch to supported value with --backend flag.
```

After:

```

ERROR:  Allowed backend: [ sqlite mysql postgres ]. Passed: 'dpostgres'.

Switch to supported value with --backend flag.
```
2020-09-17 03:21:47 +01:00
Ash Berlin-Taylor 59dad1a4ea
Allow CeleryExecutor to "adopt" an orphaned queued or running task (#10949)
This can happen when a task is enqueued by one executor, and then that
scheduler dies/exits.

The default fallback behaviour is unchanged: queued tasks are
cleared and then later rescheduled.

But for Celery, we can do better -- if we record the Celery-generated
task_id, we can then re-create the AsyncResult objects for orphaned
tasks at a later date.

However, since Celery reports any AsyncResult it doesn't know about as "PENDING",
even if the task isn't currently in the broker queue, we need to apply a
timeout to "unblock" these tasks in case they never actually made it to
the Celery broker.

This all means that we can adopt tasks that have been enqueued by another
CeleryExecutor if it dies, without having to clear the task and slow
down. This is especially useful as the task may have already started
running, and while clearing it would stop it, it's better if we don't
have to reset it!

Co-authored-by: Kaxil Naik <kaxilnaik@apache.org>
2020-09-16 20:10:30 +01:00
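The timeout rule for adopted tasks described in the commit above can be sketched as a pure function. Any adopted task whose Celery state is still `PENDING` after the timeout is assumed never to have reached the broker and is handed back for resetting. `AdoptedTask` and `tasks_to_reset` are hypothetical, and the 10-minute timeout in the usage below is an arbitrary example value, not an Airflow default.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class AdoptedTask:
    key: str             # hypothetical TI identifier
    celery_task_id: str  # the recorded Celery-generated task_id
    adopted_at: datetime


def tasks_to_reset(
    adopted: List[AdoptedTask],
    celery_states: Dict[str, str],
    now: datetime,
    timeout: timedelta,
) -> List[str]:
    """Adopted tasks still PENDING after `timeout` are assumed lost."""
    # Celery cannot distinguish "waiting in the broker" from "unknown";
    # both look like PENDING, hence the timeout.
    return [
        task.key
        for task in adopted
        if celery_states.get(task.celery_task_id, "PENDING") == "PENDING"
        and now - task.adopted_at > timeout
    ]
```

Tasks that moved past `PENDING` (e.g. `STARTED`) stay adopted. Only stale `PENDING` ones are reset.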
Ephraim Anierobi 76545bb3d6
Add example dag and system test for S3ToGCSOperator (#10951) 2020-09-16 19:36:08 +02:00
Robert Grizzell 2aec99c228
Fix empty asctime field in JSON formatted logs (#10515) 2020-09-16 17:50:27 +01:00
Daniel Imberman 1294e15d44
KubernetesPodOperator template fix (#10963)
* Ensure that K8sPodOperator can pull namespace from pod_template_file

Fixes a bug where users who run K8sPodOperator could not run because
the operator was expecting a namespace parameter

* add test

* self.pod

* Update airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>

* don't create pod until run

* spellcheck

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
2020-09-16 07:58:32 -07:00
Kaxil Naik 905cdd502a
Add a default for DagModel.default_view (#10897)
fixes https://github.com/apache/airflow/issues/10283
2020-09-16 00:23:47 +01:00
Denis Evseev f7da7d94b4
Fix ExternalTaskMarker serialized fields (#10924)
Co-authored-by: Denis Evseev <xOnelinx@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-09-15 23:40:41 +01:00
John Bampton ce19657ec6
Fix case of GitHub. (#10955)
Changed `Github` to `GitHub`.
2020-09-15 14:49:27 -04:00
Kaxil Naik d43bb75367
Remove test dependency from TestApiKerberos (#10950)
TestApiKerberos::test_trigger_dag previously depended on `example_bash_operator` existing in the database.

If none of the other tests wrote it to the DB, or if one of them cleared it from the DB, this test failed.
2020-09-15 14:19:29 +01:00
Ping Zhang 96165185f1
Add CeleryKubernetesExecutor (#10901)
It combines CeleryExecutor and KubernetesExecutor, allowing users
to route their tasks to either Kubernetes or Celery based on the queue
defined on a task.
2020-09-15 09:42:55 +02:00
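Queue-based routing, which this executor uses for running tasks and (per #11244 above) for task adoption, reduces to a single comparison. In the sketch below, `executor_for` and `partition_by_executor` are hypothetical helpers. The queue name `"kubernetes"` is an illustrative assumption for the configurable Kubernetes queue.

```python
from typing import Dict, Iterable, List, Tuple

# Assumption: the queue that routes a task to Kubernetes is configurable;
# "kubernetes" is used here purely as an illustrative value.
KUBERNETES_QUEUE = "kubernetes"


def executor_for(queue: str, kubernetes_queue: str = KUBERNETES_QUEUE) -> str:
    """Pick the executor for a task based only on its queue."""
    return "KubernetesExecutor" if queue == kubernetes_queue else "CeleryExecutor"


def partition_by_executor(
    tasks: Iterable[Tuple[str, str]], kubernetes_queue: str = KUBERNETES_QUEUE
) -> Dict[str, List[str]]:
    """Split (task_id, queue) pairs, e.g. to hand orphaned tasks to the right executor."""
    routed: Dict[str, List[str]] = {"CeleryExecutor": [], "KubernetesExecutor": []}
    for task_id, queue in tasks:
        routed[executor_for(queue, kubernetes_queue)].append(task_id)
    return routed
```

Every queue other than the designated one falls through to Celery, so existing Celery queues keep working unchanged.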
Jed Cunningham b628067b42
Minor refactor of the login methods in tests.www.test_views (#10918)
- Instead of supporting only an Admin user in the base test class, you can also use a normal User or Viewer
- Only add users when they are being used so we can do a little less in the setup phase (minor speedup in TestDagACLView)
2020-09-14 23:54:23 +02:00
Tomek Urbaszek 5d6d5a2f7d
Allow to specify path to kubeconfig in KubernetesHook (#10453) 2020-09-14 18:16:53 +02:00
Dmytro Usenko 4e1f3a69db
[AIRFLOW-10645] Add AWS Secrets Manager Hook (#10655) 2020-09-14 08:54:48 -07:00
Tomek Urbaszek eaa49b2257
Fix chain methods for XComArg (#10827)
The __lshift__ and __rshift__ methods should return `other`, not `self`.
This PR fixes the XComArg implementation to support chains like this one:
BaseOperator >> XComArg >> BaseOperator

Related to: #10153
2020-09-14 13:13:04 +02:00
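Why returning `other` matters: `a >> b >> c` is evaluated as `(a >> b) >> c`, so the left operand of the second `>>` is whatever the first call returned. Below is a minimal sketch with a hypothetical `Node` class (not XComArg or BaseOperator) showing the correct behaviour.

```python
from typing import List


class Node:
    """Hypothetical stand-in for anything that supports >> / << chaining."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.downstream: List["Node"] = []

    def __rshift__(self, other: "Node") -> "Node":
        # self -> other. Returning `other` makes a >> b >> c mean a -> b -> c;
        # returning `self` would instead wire a -> c on the second step.
        self.downstream.append(other)
        return other

    def __lshift__(self, other: "Node") -> "Node":
        # other -> self. Returning `other` lets a << b << c mean c -> b -> a.
        other.downstream.append(self)
        return other
```

With `return self`, the chain `a >> b >> c` would leave `b` with no downstream task and attach `c` to `a`.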
Ash Berlin-Taylor 9e42a97f3f
Mark task as failed when it fails sending in Celery (#10881)
If a task failed hard on Celery _before_ being able to execute the
Airflow code, the task would end up stuck in the queued state. This change
makes it get retried.

This was discovered while load testing the HA work (but is unrelated to the HA
changes), where I swamped the kube-dns pod, meaning the worker was
sometimes unable to resolve the DB name via DNS, so the state in the DB
was never updated.
2020-09-14 10:40:14 +01:00
Jarek Potiuk b2dc346062
Make breeze-complete Google Shell Guide compatible (#10708)
Also added unit tests for breeze-complete
Part of #10576
2020-09-14 10:21:09 +02:00
Jarek Potiuk 791f9044fe
Adds the maintain-heart-rate to quarantine. (#10922)
The test occasionally fails, moving it to quarantine for now.
2020-09-14 10:18:54 +02:00
tszerszen 12a652f534
Fix parameter name collision in AutoMLBatchPredictOperator #10723 (#10869)
Rename `params` to `prediction_params` to avoid
clash with BaseOperator arguments
2020-09-13 17:05:57 +02:00
Kaxil Naik f77a11d5b1
Add Secrets backend for Microsoft Azure Key Vault (#10898) 2020-09-13 16:45:21 +02:00
Kaxil Naik 92eafc01ed
Parameterize tests in hashicorp/hooks/test_vault.py (#10903)
Some of the tests could be parameterized, leaving fewer lines to maintain with the same level of testing
2020-09-12 22:01:47 +01:00
Kaxil Naik ee42aaeaa2
Fix typo in the word 'instance' (#10902)
`instnace` -> `instance`
2020-09-12 20:08:43 +01:00
Kaxil Naik f383bb3416
Fix separated strings in test_secrets_manager.py (#10900)
"airflow.providers.amazon.aws.secrets.secrets_manager." "SecretsManagerBackend.get_conn_uri"

to

"airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend.get_conn_uri"
2020-09-12 18:31:38 +02:00
Daniel Cohen 2e8b4ece36
Pass conf to subdags (#9956) 2020-09-12 11:58:17 +01:00
tszerszen 41a62735ed
Add on_kill method to BigQueryInsertJobOperator (#10866)
* Add on_kill method to BigQueryInsertJobOperator
* BigQueryInsertJobOperator pylint disable=too-many-arguments
2020-09-11 20:48:16 +02:00
Daniel Imberman 56bd9b7d6b
Modify helm chart to use pod_template_file (#10872)
* Modify helm chart to use pod_template_file

Since we are deprecating most k8sexecutor arguments
we should use the pod_template_file when launching airflow
using the KubernetesExecutor

* fix tests

* one more nit

* fix dag command

* fix pylint
2020-09-11 10:47:59 -07:00
Anmol Dhingra c58d60635d
Update qubole_hook to not remove pool as an arg for qubole_operator (#10820) 2020-09-11 12:30:02 +05:30
Miller Tracy b9dc3c51ba
Added Plexus as an Airflow provider (#10591) 2020-09-10 19:54:38 +02:00
tszerszen 68cc7273bf
Add on_kill method to DataprocSubmitJobOperator (#10847) 2020-09-10 19:07:08 +02:00
Ash Berlin-Taylor 1a95361122
Fix and unquarantine TestDagFileProcessorAgent.test_parse_once (#10862)
The SmartSensor PR introduced slightly different behaviour in
list_py_files when it is given a file path directly.

Prior to that PR, if given a file path it would not include examples.

After that PR was merged, it would return that path and the example dags
(assuming they were enabled.)
2020-09-10 17:04:14 +01:00
Ash Berlin-Taylor 63b6e53ffd
Detect orphaned task instances by SchedulerJob id and heartbeat (#10729)
Once HA mode for the scheduler lands, we can no longer reset orphaned
tasks by looking at the tasks in (the memory of) the current executor.

This changes it to keep track of which (Scheduler)Job queued/scheduled a
TaskInstance (the new "queued_by_job_id" column stored against
TaskInstance table), and then we can use the existing heartbeat
mechanism for jobs to notice when a TI should be reset.

As part of this, the existing implementation of
`reset_state_for_orphaned_tasks` has been moved out of BaseJob into
BackfillJob, as only this and SchedulerJob had these methods, and the
SchedulerJob version now operates differently.
2020-09-10 17:01:41 +01:00
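The detection rule in the commit above can be sketched as a join between TIs and job heartbeats: a scheduled or queued TI whose `queued_by_job_id` points at a job that has not heartbeated recently is treated as orphaned. `TaskInstanceRow` and `find_orphaned` are hypothetical. The set of resettable states and the 30-second threshold in the usage are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List

# Assumption for this sketch: only these states are candidates for resetting.
RESETTABLE_STATES = ("scheduled", "queued")


@dataclass
class TaskInstanceRow:
    key: str
    state: str
    queued_by_job_id: int  # the SchedulerJob that queued/scheduled this TI


def find_orphaned(
    tis: List[TaskInstanceRow],
    job_heartbeats: Dict[int, datetime],
    now: datetime,
    heartbeat_timeout: timedelta,
) -> List[str]:
    """TIs whose queuing job has stopped heartbeating."""
    alive = {job_id for job_id, beat in job_heartbeats.items() if now - beat <= heartbeat_timeout}
    return [
        ti.key
        for ti in tis
        if ti.state in RESETTABLE_STATES and ti.queued_by_job_id not in alive
    ]
```

Because the decision only needs the TI table and the job table, any scheduler can run it, which is what makes it compatible with running several schedulers.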
Jarek Potiuk ff72327614
Move parse_once to quarantine (#10857) 2020-09-10 13:20:23 +01:00
Kaxil Naik ce66bc944d
Add test for Health Endpoint when there is an exception (#10846) 2020-09-10 01:00:40 +01:00
Kaxil Naik ee8b02a14f
Add missing assert call in test_dbapi_hook.py (#10842)
The `assert` call was missing, so the statement didn't test anything and wouldn't fail if the condition was false
2020-09-09 23:59:16 +01:00
Kaxil Naik 9549274d11
Upgrade black to 20.8b1 (#10818) 2020-09-09 09:06:24 +01:00
Daniel Imberman 20481c3caf
Add pod_override setting for KubernetesExecutor (#10756)
* Add podOverride setting for KubernetesExecutor

Users of the KubernetesExecutor will now have a "podOverride"
option in the executor_config. This option will allow users to
modify the pod launched by the KubernetesExecutor using a
`kubernetes.client.models.V1Pod` class. This is the first step
in deprecating the traditional executor_config.

* Fix k8s tests

* fix docs
2020-09-08 15:56:59 -07:00