Commit graph

3020 Commits

Author SHA1 Message Date
Tomek Urbaszek daf8f31080
Add template fields renderers for better UI rendering (#11061)
This PR adds possibility to define template_fields_renderers for an
operator. In this way users will be able to provide information
what lexer should be used for rendering a particular field. This is
super useful for custom operator and gives more flexibility than
predefined keywords.

Co-authored-by: Kamil Olszewski <34898234+olchas@users.noreply.github.com>
Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
2020-09-23 15:31:40 +02:00
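As a rough illustration of the idea (not Airflow's actual rendering code), the sketch below models an operator that declares which lexer the UI should use for each templated field. The attribute name `template_fields_renderers` comes from the commit; the class `MyQueryOperator`, the helper `renderer_for`, and the fallback name `"text"` are hypothetical, and the class deliberately does not inherit from any Airflow base class.

```python
class MyQueryOperator:
    """Hypothetical operator-like class; it does not inherit from Airflow's BaseOperator."""

    template_fields = ("sql", "config")
    # Map each templated field to the lexer the UI should use when rendering it.
    template_fields_renderers = {"sql": "sql", "config": "json"}

    def __init__(self, sql, config):
        self.sql = sql
        self.config = config


def renderer_for(operator, field, default="text"):
    """Return the lexer name declared for a field, or a default if none is declared."""
    renderers = getattr(operator, "template_fields_renderers", {})
    return renderers.get(field, default)


op = MyQueryOperator(sql="SELECT 1", config='{"retries": 3}')
print(renderer_for(op, "sql"))
print(renderer_for(op, "config"))
print(renderer_for(op, "other"))
```

Keeping the mapping on the class means a custom operator can choose any lexer name per field instead of relying on a fixed set of field-name keywords.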
yuqian90 423a382678
SkipMixin: Add missing session.commit() and test (#10421) 2020-09-22 21:08:12 +01:00
yuqian90 e59ad5b2c6
Make Skipmixin handle empty branch properly (#10751)
closes: #10725

Make sure SkipMixin.skip_all_except() handles empty branches like this properly. When "task1" is followed, "join" must not be skipped even though it is considered to be immediately downstream of "branch".
2020-09-22 20:48:26 +01:00
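The empty-branch case can be sketched with a toy graph. This is a simplified model under stated assumptions, not the SkipMixin code: `tasks_to_skip` and the dictionary representation of the DAG are hypothetical. The rule it illustrates is that a direct downstream task of the branch is skipped only if it is neither followed nor reachable from a followed task.

```python
def tasks_to_skip(downstream, branch_task, followed):
    """Return direct downstream tasks of branch_task that should be skipped.

    downstream maps a task id to the list of its direct downstream task ids.
    """
    # Collect everything reachable from the followed tasks.
    reachable = set()
    stack = list(followed)
    while stack:
        task = stack.pop()
        for child in downstream.get(task, []):
            if child not in reachable:
                reachable.add(child)
                stack.append(child)

    return sorted(
        task
        for task in downstream.get(branch_task, [])
        if task not in followed and task not in reachable
    )


# "branch" points at "task1" and, through an empty branch, directly at "join".
dag = {"branch": ["task1", "join"], "task1": ["join"], "join": []}
print(tasks_to_skip(dag, "branch", ["task1"]))  # "join" is kept
print(tasks_to_skip(dag, "branch", ["join"]))   # "task1" is skipped
```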
James Timmins fbd994a4cf
Add permissions for stable API (#10594)
Related Github Issue: https://github.com/apache/airflow/issues/8112
2020-09-22 17:23:59 +01:00
Jarek Potiuk 1ebd3a631c
Pandas behaviour for None changed in 1.1.2 (#11004)
In Pandas version 1.1.2 experimental NAN value started to be
returned instead of None in a number of places. That broke our tests.

Fixing the tests also requires Pandas to be updated to >=1.1.2
2020-09-22 14:23:49 +02:00
Kaxil Naik cb979f9f21
Get Airflow configs with sensitive data from CloudSecretManagerBackend (#11024) 2020-09-22 08:17:58 +01:00
Daniel Imberman f4513c0389
Revert "KubernetesJobWatcher no longer inherits from Process (#11017)" (#11065)
This reverts commit 1539bd051c.
2020-09-21 15:28:00 -07:00
Jarek Potiuk 3db4d3b04d
All versions in CI yamls are not hard-coded any more (#10959)
GitHub Actions allow to use `fromJson` method to read arrays
or even more complex json objects into the CI workflow yaml files.

This, connected with `::set-output` commands, allows reading the
list of allowed versions as well as default ones from the
environment variables configured in
./scripts/ci/libraries/initialization.sh

This means that we can have one place in which versions are
configured. We also need to do it in "breeze-complete" as this is
a standalone script that should not source anything. We added
BATS tests to verify that the versions in breeze-complete
correspond with those defined in initialization.sh.

Also we do not limit tests any more in regular PRs now - we run
all combinations of available versions. Our tests run quite a
bit faster now so we should be able to run more complete
matrixes. We can still exclude individual values of the matrixes
if this is too much.

MySQL 8 is disabled from breeze for now. I plan a separate follow
up PR where we will run MySQL 8 tests (they were not run so far)
2020-09-21 20:02:04 +02:00
Kaxil Naik 2410f592a4
Get Airflow configs with sensitive data from AWS Systems Manager (#11023)
Adds support to AWS SSM for feature added in https://github.com/apache/airflow/pull/9645
2020-09-19 19:05:42 +01:00
Shekhar Singh 9edfcb7ac4
Support extra_args in S3Hook and GCSToS3Operator (#11001) 2020-09-19 02:03:21 +01:00
yuqian90 49c193fb87
[AIP-34] TaskGroup: A UI task grouping concept as an alternative to SubDagOperator (#10153)
This commit introduces TaskGroup, which is a simple UI task grouping concept.

- TaskGroups can be collapsed/expanded in Graph View when clicked
- TaskGroups can be nested
- TaskGroups can be put upstream/downstream of tasks or other TaskGroups with >> and << operators
- Search box, hovering, focusing in Graph View treats TaskGroup properly. E.g. searching for tasks also highlights TaskGroup that contains matching task_id. When TaskGroup is expanded/collapsed, the affected TaskGroup is put in focus and moved to the centre of the graph.


What this commit does not do:

- This commit does not change or remove SubDagOperator. Although TaskGroup is intended as an alternative for SubDagOperator, deprecating SubDagOperator will need to be discussed/implemented in the future.
- This PR only implemented TaskGroup handling in the Graph View. In places such as Tree View, it will look as if TaskGroup does not exist and all tasks are in the same flat DAG.

GitHub Issue: #8078
AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator
2020-09-19 01:51:37 +01:00
Daniel Imberman 1539bd051c
KubernetesJobWatcher no longer inherits from Process (#11017)
multiprocessing.Process is set up in a very unfortunate manner
that pretty much makes it impossible to test a class that inherits from
Process or use any of its internal functions. For this reason we decided
to separate the actual process-based functionality into a class member
2020-09-18 11:33:22 -07:00
Shubham Joshi 966a06d96b
Fetching databricks host from connection if not supplied in extras. (#10762)
* Fetching databricks host from connection if not supplied in extras.

* Fixing formatting issue in databricks test

Co-authored-by: joshi95 <shubham@playsimple.in>
2020-09-18 13:15:11 +02:00
Daniel Imberman cba51d49ee
Simplify the K8sExecutor and K8sPodOperator (#10393)
* Simplify Airflow on Kubernetes Story

Removes thousands of lines of code that essentially amount to us
re-creating the Kubernetes API. Will offer a faster, simpler
KubernetesExecutor for 2.0

* Fix podgen tests

* fix documentation

* simplify validate function

* @mik-laj comments

* spellcheck

* spellcheck

* Update airflow/executors/kubernetes_executor.py

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-09-17 08:40:20 -07:00
Jarek Potiuk 82a9477cd3
The test_find_not_should_ignore_path is now in heisentests (#10989)
It seems that the test_find_not_should_ignore_path test has some
dependency on side-effects from other tests.

See #10988 - we are moving this test to heisentests until we
solve the issue.
2020-09-17 14:46:36 +02:00
Kaxil Naik e066260ef8
Improve the Error message in Breeze for invalid params (#10980)
Changed `Is` to `Passed`

Before:

```

ERROR:  Allowed backend: [ sqlite mysql postgres ]. Is: 'dpostgres'.

Switch to supported value with --backend flag.
```

After:

```

ERROR:  Allowed backend: [ sqlite mysql postgres ]. Passed: 'dpostgres'.

Switch to supported value with --backend flag.
```
2020-09-17 03:21:47 +01:00
Ash Berlin-Taylor 59dad1a4ea
Allow CeleryExecutor to "adopt" an orphaned queued or running task (#10949)
This can happen when a task is enqueued by one executor, and then that
scheduler dies/exits.

The default fallback behaviour is unchanged -- that queued tasks are
cleared and then later rescheduled.

But for Celery, we can do better -- if we record the Celery-generated
task_id, we can then re-create the AsyncResult objects for orphaned
tasks at a later date.

However, since Celery just reports all AsyncResult as "PENDING", even if
they aren't tasks currently in the broker queue, we need to apply a
timeout to "unblock" these tasks in case they never actually made it to
the Celery broker.

This all means that we can adopt tasks that have been enqueued by another
CeleryExecutor if it dies, without having to clear the task and slow
down. This is especially useful as the task may have already started
running, and while clearing it would stop it, it's better if we don't
have to reset it!

Co-authored-by: Kaxil Naik <kaxilnaik@apache.org>
2020-09-16 20:10:30 +01:00
Ephraim Anierobi 76545bb3d6
Add example dag and system test for S3ToGCSOperator (#10951) 2020-09-16 19:36:08 +02:00
Robert Grizzell 2aec99c228
Fix empty asctime field in JSON formatted logs (#10515) 2020-09-16 17:50:27 +01:00
Daniel Imberman 1294e15d44
KubernetesPodOperator template fix (#10963)
* Ensure that K8sPodOperator can pull namespace from pod_template_file

Fixes a bug where users who run K8sPodOperator could not run because
the operator was expecting a namespace parameter

* add test

* self.pod

* Update airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>

* don't create pod until run

* spellcheck

Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
2020-09-16 07:58:32 -07:00
Kaxil Naik 905cdd502a
Add a default for DagModel.default_view (#10897)
fixes https://github.com/apache/airflow/issues/10283
2020-09-16 00:23:47 +01:00
Denis Evseev f7da7d94b4
Fix ExternalTaskMarker serialized fields (#10924)
Co-authored-by: Denis Evseev <xOnelinx@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-09-15 23:40:41 +01:00
John Bampton ce19657ec6
Fix case of GitHub. (#10955)
Changed `Github` to `GitHub`.
2020-09-15 14:49:27 -04:00
Kaxil Naik d43bb75367
Remove test dependency from TestApiKerberos (#10950)
TestApiKerberos::test_trigger_dag previously was dependent that the `example_bash_operator` exist in the Database.

If one of the other tests didn't write it to the DB or if one of the other tests cleared it from the DB, this test failed.
2020-09-15 14:19:29 +01:00
Ping Zhang 96165185f1
Add CeleryKubernetesExecutor (#10901)
it consists of CeleryExecutor and KubernetesExecutor, which allows users
to route their tasks to either Kubernetes or Celery based on the queue
defined on a task
2020-09-15 09:42:55 +02:00
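The routing idea can be sketched as follows. This is a minimal model, assuming a dedicated queue name selects the Kubernetes side; `RecordingExecutor`, `QueueRoutingExecutor`, and the queue name `"kubernetes"` are hypothetical stand-ins and not the executor's real interface.

```python
class RecordingExecutor:
    """Hypothetical stand-in for a real executor; it only records queued task ids."""

    def __init__(self, name):
        self.name = name
        self.queued = []

    def queue_task(self, task_id):
        self.queued.append(task_id)


class QueueRoutingExecutor:
    """Route each task to one of two executors based on the task's queue."""

    def __init__(self, celery_executor, kubernetes_executor, kubernetes_queue="kubernetes"):
        self.celery_executor = celery_executor
        self.kubernetes_executor = kubernetes_executor
        self.kubernetes_queue = kubernetes_queue

    def _router(self, queue):
        # Tasks on the dedicated queue go to Kubernetes, everything else to Celery.
        if queue == self.kubernetes_queue:
            return self.kubernetes_executor
        return self.celery_executor

    def queue_task(self, task_id, queue):
        executor = self._router(queue)
        executor.queue_task(task_id)
        return executor.name


celery = RecordingExecutor("celery")
kubernetes = RecordingExecutor("kubernetes")
router = QueueRoutingExecutor(celery, kubernetes)
print(router.queue_task("light_task", queue="default"))
print(router.queue_task("heavy_task", queue="kubernetes"))
```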
Jed Cunningham b628067b42
Minor refactor of the login methods in tests.www.test_views (#10918)
- Instead of supporting only an Admin user in the base test class, you can also use a normal User or Viewer
- Only add users when they are being used so we can do a little less in the setup phase (minor speedup in TestDagACLView)
2020-09-14 23:54:23 +02:00
Tomek Urbaszek 5d6d5a2f7d
Allow to specify path to kubeconfig in KubernetesHook (#10453) 2020-09-14 18:16:53 +02:00
Dmytro Usenko 4e1f3a69db
[AIRFLOW-10645] Add AWS Secrets Manager Hook (#10655) 2020-09-14 08:54:48 -07:00
Tomek Urbaszek eaa49b2257
Fix chain methods for XComArg (#10827)
__lshift__ and __rshift__ methods should return other, not self.
This PR fixes the XComArg implementation to support a chain like this one:
BaseOperator >> XComArg >> BaseOperator

Related to: #10153
2020-09-14 13:13:04 +02:00
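Why the return value matters can be shown with a toy class. This is a sketch, not the XComArg code: `Node` and `BuggyNode` are hypothetical, and dependencies are recorded as plain lists of names. Python evaluates `a >> b >> c` as `(a >> b) >> c`, so whatever `__rshift__` returns is the left operand of the next link.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.downstream = []

    def set_downstream(self, other):
        self.downstream.append(other.name)

    def __rshift__(self, other):
        # a >> b: b runs after a. Return `other` so the next >> attaches to b.
        self.set_downstream(other)
        return other

    def __lshift__(self, other):
        # a << b: a runs after b. Return `other` so the next << attaches to b.
        other.set_downstream(self)
        return other


class BuggyNode(Node):
    def __rshift__(self, other):
        self.set_downstream(other)
        return self  # bug: the chain keeps attaching to the first node


a, b, c = Node("a"), Node("b"), Node("c")
a >> b >> c
print(a.downstream, b.downstream)

x, y, z = BuggyNode("x"), BuggyNode("y"), BuggyNode("z")
x >> y >> z
print(x.downstream, y.downstream)
```

With the correct version the chain is linear; with the buggy one both later nodes hang off the first node.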
Ash Berlin-Taylor 9e42a97f3f
Mark task as failed when it fails sending in Celery (#10881)
If a task failed hard on celery, _before_ being able to execute the
airflow code the task would end up stuck in queued state. This change
makes it get retried.

This was discovered in load testing the HA work (but unrelated to HA
changes), where I swamped the kube-dns pod, meaning the worker was
sometimes unable to resolve the db name via DNS, so the state in the DB
was never updated
2020-09-14 10:40:14 +01:00
Jarek Potiuk b2dc346062
Make breeze-complete Google Shell Guide compatible (#10708)
Also added unit tests for breeze-complete
Part of #10576
2020-09-14 10:21:09 +02:00
Jarek Potiuk 791f9044fe
Adds the maintain-heart-rate to quarantine. (#10922)
The test occasionally fails, moving it to quarantine for now.
2020-09-14 10:18:54 +02:00
tszerszen 12a652f534
Fix parameter name collision in AutoMLBatchPredictOperator #10723 (#10869)
Rename `params` to `prediction_params` to avoid
clash with BaseOperator arguments
2020-09-13 17:05:57 +02:00
Kaxil Naik f77a11d5b1
Add Secrets backend for Microsoft Azure Key Vault (#10898) 2020-09-13 16:45:21 +02:00
Kaxil Naik 92eafc01ed
Parameterize tests in hashicorp/hooks/test_vault.py (#10903)
Some of the tests were parameterizable, so there are fewer lines to maintain with the same level of testing
2020-09-12 22:01:47 +01:00
Kaxil Naik ee42aaeaa2
Fix typo in the word 'instance' (#10902)
`instnace` -> `instance`
2020-09-12 20:08:43 +01:00
Kaxil Naik f383bb3416
Fix separated strings in test_secrets_manager.py (#10900)
"airflow.providers.amazon.aws.secrets.secrets_manager." "SecretsManagerBackend.get_conn_uri"

to

"airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend.get_conn_uri"
2020-09-12 18:31:38 +02:00
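For context, Python joins adjacent string literals at compile time, so the two spellings above denote the same string and the change is presumably about readability only. A quick check (the variable names are illustrative):

```python
# Two adjacent literals are concatenated by the compiler into one string.
split_path = (
    "airflow.providers.amazon.aws.secrets.secrets_manager."
    "SecretsManagerBackend.get_conn_uri"
)
joined_path = "airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend.get_conn_uri"

print(split_path == joined_path)
```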
Daniel Cohen 2e8b4ece36
Pass conf to subdags (#9956) 2020-09-12 11:58:17 +01:00
tszerszen 41a62735ed
Add on_kill method to BigQueryInsertJobOperator (#10866)
* Add on_kill method to BigQueryInsertJobOperator
* BigQueryInsertJobOperator pylint disable=too-many-arguments
2020-09-11 20:48:16 +02:00
Daniel Imberman 56bd9b7d6b
Modify helm chart to use pod_template_file (#10872)
* Modify helm chart to use pod_template_file

Since we are deprecating most k8sexecutor arguments
we should use the pod_template_file when launching airflow
using the KubernetesExecutor

* fix tests

* one more nit

* fix dag command

* fix pylint
2020-09-11 10:47:59 -07:00
Anmol Dhingra c58d60635d
Update qubole_hook to not remove pool as an arg for qubole_operator (#10820) 2020-09-11 12:30:02 +05:30
Miller Tracy b9dc3c51ba
Added Plexus as an Airflow provider (#10591) 2020-09-10 19:54:38 +02:00
tszerszen 68cc7273bf
Add on_kill method to DataprocSubmitJobOperator (#10847) 2020-09-10 19:07:08 +02:00
Ash Berlin-Taylor 1a95361122
Fix and unquarantine TestDagFileProcessorAgent.test_parse_once (#10862)
The SmartSensor PR introduced slightly different behaviour in
list_py_files when it is given a file path directly.

Prior to that PR, if given a file path it would not include examples.

After that PR was merged, it would return that path and the example dags
(assuming they were enabled.)
2020-09-10 17:04:14 +01:00
Ash Berlin-Taylor 63b6e53ffd
Detect orphaned task instances by SchedulerJob id and heartbeat (#10729)
Once HA mode for scheduler lands, we can no longer reset orphaned
task by looking at the tasks in (the memory of) the current executor.

This changes it to keep track of which (Scheduler)Job queued/scheduled a
TaskInstance (the new "queued_by_job_id" column stored against
TaskInstance table), and then we can use the existing heartbeat
mechanism for jobs to notice when a TI should be reset.

As part of this the existing implementation of
`reset_state_for_orphaned_tasks` has been moved out of BaseJob in to
BackfillJob -- as only this and SchedulerJob had these methods, and the
SchedulerJob version now operates differently
2020-09-10 17:01:41 +01:00
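The detection rule described above can be sketched without a database. This is a simplified model under stated assumptions: `find_orphaned`, the tuple and dictionary shapes, and the 30-second timeout are hypothetical; only the idea of recording which job queued a task instance and checking that job's heartbeat comes from the commit.

```python
from datetime import datetime, timedelta


def find_orphaned(task_instances, jobs, now, heartbeat_timeout):
    """Return ids of task instances whose queuing job is no longer heartbeating.

    task_instances: list of (ti_id, queued_by_job_id) tuples.
    jobs: dict mapping job id to the datetime of its latest heartbeat.
    """
    orphaned = []
    for ti_id, job_id in task_instances:
        last_beat = jobs.get(job_id)
        # A job that is unknown or has not heartbeated recently is treated as dead.
        if last_beat is None or now - last_beat > heartbeat_timeout:
            orphaned.append(ti_id)
    return orphaned


now = datetime(2020, 9, 10, 17, 0, 0)
jobs = {1: now - timedelta(seconds=5), 2: now - timedelta(minutes=10)}
tis = [("ti_a", 1), ("ti_b", 2), ("ti_c", 3)]
print(find_orphaned(tis, jobs, now, timedelta(seconds=30)))
```

Because the check relies only on stored job ids and heartbeats, any scheduler can run it, which is the property needed once several schedulers are active.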
Jarek Potiuk ff72327614
Move parse_once to quarantine (#10857) 2020-09-10 13:20:23 +01:00
Kaxil Naik ce66bc944d
Add test for Health Endpoint when there is an exception (#10846) 2020-09-10 01:00:40 +01:00
Kaxil Naik ee8b02a14f
Add missing assert call in test_dbapi_hook.py (#10842)
`assert` call was missing, so the statement tested nothing and would not fail if the condition was false
2020-09-09 23:59:16 +01:00
Kaxil Naik 9549274d11
Upgrade black to 20.8b1 (#10818) 2020-09-09 09:06:24 +01:00
Daniel Imberman 20481c3caf
Add pod_override setting for KubernetesExecutor (#10756)
* Add podOverride setting for KubernetesExecutor

Users of the KubernetesExecutor will now have a "podOverride"
option in the executor_config. This option will allow users to
modify the pod launched by the KubernetesExecutor using a
`kubernetes.client.models.V1Pod` class. This is the first step
in deprecating the traditional executor_config.

* Fix k8s tests

* fix docs
2020-09-08 15:56:59 -07:00
Yingbo Wang ac943c9e18
[AIRFLOW-3964][AIP-17] Consolidate and de-dup sensor tasks using Smart Sensor (#5499)
Co-authored-by: Yingbo Wang <yingbo.wang@airbnb.com>
2020-09-08 22:47:59 +01:00
Kamil Breguła ff41361e0e
Add task logging handler to airflow info command (#10771) 2020-09-08 22:12:55 +02:00
Jarek Potiuk 2811851f80
Move Impersonation test back to quarantine (#10809)
Seems that TestImpersonation is not stable even in isolation
Moving it back to quarantine for now.
2020-09-08 21:33:44 +02:00
Kamil Breguła 961131d51c
All files in providers package have unit tests (#10799) 2020-09-08 13:55:35 +02:00
Ephraim Anierobi 078bfaf60a
Extract missing gcs_to_local example DAG from gcs example (#10767)
Co-authored-by: Kamil Breguła <kamil.bregula@polidea.com>
2020-09-08 13:08:06 +02:00
Ephraim Anierobi 3c3342f1fd
Add unit test for AzureCosmosDocumentSensor (#10765) 2020-09-08 12:21:22 +02:00
Jarek Potiuk 4de67a6731
Move dev docker images to airflow registry (#9652)
Part of #9401
2020-09-08 10:07:10 +02:00
Joshua Carp 2934220dc9
Always return a list from S3Hook list methods (#10774) 2020-09-08 09:49:34 +02:00
Dmitri Kuksik 10ce31127f
Deprecate using global as the default region in Google Dataproc operators and hooks (#10772)
The region parameter is required for some of Google Dataproc operators
and it should be provided by users to avoid creating data-intensive 
tasks in any default location.
2020-09-08 08:46:29 +02:00
Jarek Potiuk b746f33fc6
Removes stable tests from quarantine (#10768)
We've observed the tests for last couple of weeks and it seems
most of the tests marked with "quarantine" marker are succeeding
in a stable way (https://github.com/apache/airflow/issues/10118)
The removed tests have success ratio of > 95% (20 runs without
problems) and this has been verified a week ago as well,
so it seems they are rather stable.

There are literally few that are either failing or causing
the Quarantined builds to hang. I manually reviewed the
master tests that failed for last few weeks and added the
tests that are causing the build to hang.

Seems that stability has improved - which might be caused
by some temporary problems when we marked the quarantined builds
or too "generous" way of marking test as quarantined, or
maybe improvement comes from the #10368 as the docker engine
and machines used to run the builds in GitHub experience far
less load (image builds are executed in separate builds) so
it might be that resource usage is decreased. Another reason
might be GitHub Actions stability improvements.

Or simply those tests are more stable when run in isolation.

We might still add failing tests back as soon as we see them behave
in a flaky way.

The remaining quarantined tests that need to be fixed:
 * test_local_run (often hangs the build)
 * test_retry_handling_job
 * test_clear_multiple_external_task_marker
 * test_should_force_kill_process
 * test_change_state_for_tis_without_dagrun
 * test_cli_webserver_background

We also move some of those tests to the "heisentests" category.
Those tests run fine in isolation but fail
the builds when run with all other tests:
 * TestImpersonation tests

We might find that those heisentests can be fixed but for
now we are going to run them in isolation.

Also - since those quarantined tests are failing more often
the "num runs" to track for those has been decreased to 10
to keep track of 10 last runs only.
2020-09-08 07:36:12 +02:00
Mateusz Kukieła f14f379716
[AIRFLOW-10672] Refactor BigQueryToGCSOperator to use new method (#10773)
Makes BigQueryToGCSOperator to use BigQueryHook.insert_job method

Committer: Mateusz Kukieła <mateuszkukiela@gmail.com>
2020-09-07 16:18:16 +02:00
Tomek Urbaszek c8ee455685
Refactor DataprocCreateCluster operator to use simpler interface (#10403)
DataprocCreateCluster requires now:
- cluster config
- cluster name
- project id

In this way users don't have to pass project_id two times 
(in cluster definition and as parameter). The cluster object 
is built in create_cluster hook method
2020-09-07 12:21:00 +02:00
Kamil Breguła ddee0aa4fb
Simplify load connection in LocalFilesystemBackend (#10638) 2020-09-06 20:56:03 +02:00
Jed Cunningham 59f9a4116a
Add permission "extra_links" for Viewer role and above (#10719)
This change adds 'can extra links on Airflow' to the Viewer role and above. Currently, only Admins can see extra links by default.
2020-09-06 18:26:08 +02:00
Varun Dhussa ece685b5b8
Asynchronous execution of Dataproc jobs with a Sensor (#10673) 2020-09-05 13:11:37 +01:00
Kaxil Naik 7f0271f820
Improve test coverage for ConfObject in dag_run_schema (#10738)
Adds test to verify that string can be passed to conf and ConfObject._deserialize works.
2020-09-05 08:55:12 +02:00
Kaxil Naik a1a312ee1b
Fix typo in test_dag_run_schema.py (#10739) 2020-09-05 08:54:17 +02:00
Kaxil Naik 5b683f09c0
Improve test coverage for test_common_schema.py (#10740)
Adds test that an error is raised with a specific message when an unknown object type is passed
2020-09-05 08:53:43 +02:00
Antonio Davide Calì 6e3d7b63d3
Add masterConfig parameter to MLEngineStartTrainingJobOperator (#10578)
Co-authored-by: antonio-davide-cali <antonio.davide.cali@ikea.com>
2020-09-04 23:58:24 +02:00
Jarek Potiuk e4de7288a3
Switches to better BATS asserts (#10718)
BATS has additional libraries of asserts that make it much more
straightforward and nicer to write tests for bash scripts.

There is no dockerfile from BATS that contains those, so we
had to build our own (but it follows the same structure
as #9652, where we keep our dev docker image
sources inside our repository and the generated docker images
in "apache/airflow:<tool>-CALVER-TOOLVER" format).

We have more BATS unit tests to add - following #10576
and this change will be of great help.
2020-09-04 22:25:29 +02:00
Daniel Imberman 828f7303b7
Add generate_yaml command to easily test KubernetesExecutor before deploying pods (#10677)
* Add generate_template command for kubernetes_executor

* move import

* fix test failure

* Address @mik-laj comments

* Address @mik-laj comments

* Use current dir

* add docs

* fix test
2020-09-03 18:04:23 -07:00
Kamil Breguła ab5235ee12
Unify command names in CLI (#10720)
* Unify command names in CLI

* fixup! Unify command names in CLI
2020-09-04 01:25:39 +02:00
Ash Berlin-Taylor de0d7d52ac
Make test_trigger_rule_dep tests re-runnable (#10712)
If we run this test
(TestTriggerRuleDep::test_get_states_count_upstream_ti specifically)
more than once without clearing the DB in between it would fail due to a
unique constraint violation.
2020-09-03 17:19:30 +01:00
Ash Berlin-Taylor a01d986f6a
Don't commit when explicitly passed a session to TI.set_state (#10710)
The `@provide_session` wrapper will already commit the transaction when
returned, unless an explicit session is passed in -- removing this
parameter changes the behaviour to be:

- If session explicitly passed in: don't commit (caller's
  responsibility)
- If no session passed in, `@provide_session` will commit for us already.
2020-09-03 17:18:32 +01:00
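The commit rule in the two bullets above can be modelled with a small decorator. This is a simplified sketch, not Airflow's `provide_session`: `FakeSession` and `created_sessions` are hypothetical, and the wrapper only handles `session` passed as a keyword argument.

```python
import functools


class FakeSession:
    """Hypothetical stand-in for a database session; it only counts commits."""

    def __init__(self):
        self.commits = 0

    def commit(self):
        self.commits += 1


created_sessions = []


def provide_session(func):
    """Supply a session if the caller passed none, and commit only in that case."""

    @functools.wraps(func)
    def wrapper(*args, session=None, **kwargs):
        if session is not None:
            # Caller owns the session and is responsible for committing.
            return func(*args, session=session, **kwargs)
        session = FakeSession()
        created_sessions.append(session)
        result = func(*args, session=session, **kwargs)
        session.commit()
        return result

    return wrapper


@provide_session
def set_state(state, session=None):
    # No commit here: the wrapper or the caller decides when to commit.
    return state


own_session = FakeSession()
set_state("failed", session=own_session)
print(own_session.commits)

set_state("failed")
print(created_sessions[-1].commits)
```

Leaving the commit to whoever created the session lets a caller group several state changes into one transaction.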
Kaxil Naik 9ac882e6cc
[AIRFLOW-5948] Replace SimpleDag with SerializedDag (#7694) 2020-09-03 16:52:27 +01:00
Tomek Urbaszek 913397c1c6
Make Cloud Build system tests setup runnable (#10692)
This change fixes the error: open(quickstart.sh): Permission denied
that was raised during git add.
2020-09-03 13:20:10 +02:00
Aaditya Sharma 36aa88ffc1
Add jupytercmd and fix task failure when notify set as true in qubole operator (#10599)
Add jupytercmd in Qubole Operator, which fires a JupyterNotebookCommand to the Jupyter notebooks running on the user's QDS account. Along with this, we have fixed a minor bug that caused tasks to fail when --notify is set in Qubole Operator.

Co-authored-by: Aaditya Sharma <asharma@qubole.com>
2020-09-03 15:00:19 +05:30
Jarek Potiuk 4e09cb53ea
Add packages to function names in bash (#10670) (#10696)
Inspired by the Google Shell Guide where they mentioned
separating package names with :: I realized that this was
one of the missing pieces in the bash scripts of ours.

While we already had packages (in libraries folders)
it's been difficult to realise which function is where.

With introducing packages - equal to the library file name
we are *almost* at a level of a structured language - and
it's easier to find the functions if you are looking for them.

Way easier in fact.

Part of #10576

(cherry picked from commit cc551ba793)
(cherry picked from commit 2bba276f0f06a5981bdd7e4f0e7e5ca2fe84f063)
2020-09-02 21:58:37 +02:00
Jarek Potiuk 649ce4ba9d
Implement Google Shell Conventions for breeze script (#10695)
* Implement Google Shell Conventions for breeze script … (#10651)

Part of #10576

First (and the biggest) of the series of commits to introduce
Google Shell Conventions in our bash scripts.

This is about the biggest and the most complex breeze script
so it is rather huge but it is difficult to split it into
smaller pieces.

The rules implemented (from the conventions):

 * constants and exported variables are CAPITALIZED, where
   local/temporary variables are lowercase

 * following the shell guide, once all the variables are set to their
   final values (either from exported variables, calculation or --switches
   ) I have a single function that makes all the variables read-only. That
   helped to clean up a lot of places where the same functions were called
   several times, or where variables were defined in a few places. Now the
   behavior should be rather consistent and we should easily catch some
   duplications

 * function headers (following the guide) explaining arguments,
   variables expected, variables modified in the functions used.

 * setting the variables as read-only also helped to clean-up the "ifs"
   where we often had ":=}" in variables and != "" or == "". Those are
   replaced with `=}` and tests are replaced with `-n` and `-z` - also
   following the shell guide (readonly helped to detect and clean all
   such cases). This also should be much more robust in the future.

 * reorganized initialization of those constants and variables - simplified
   a few places where initialization was overlapping. It should be much more
   straightforward and clean now

 * a number of internal function breeze variables are "local" - this is
   helpful in accidental variables overwriting and keeping stuff localized

 * trap_add function is separated out to help in cases where we had
   several traps handling the same signals.

(cherry picked from commit 46c8d6714c)
(cherry picked from commit c822fd7b4bf2a9c5a9bb3c6e783cbea9dac37246)

* fixup! Implement Google Shell Conventions for breeze script … (#10651)
2020-09-02 21:55:50 +02:00
Kaxil Naik 9a10f83ab0
Revert recent breeze changes (#10651 & #10670) (#10694)
* Revert "Add packages to function names in bash (#10670)"

This reverts commit cc551ba793.

* Revert "Implement Google Shell Conventions for breeze script … (#10651)"

This reverts commit 46c8d6714c.
2020-09-02 17:27:36 +01:00
Kamil Breguła 0d9e421f16
Unify command names in CLI (#10669)
* Unify command names in CLI
2020-09-02 08:43:41 -04:00
Jarek Potiuk cc551ba793
Add packages to function names in bash (#10670)
Inspired by the Google Shell Guide where they mentioned
separating package names with :: I realized that this was
one of the missing pieces in the bash scripts of ours.

While we already had packages (in libraries folders)
it's been difficult to realise which function is where.

With introducing packages - equal to the library file name
we are *almost* at a level of a structured language - and
it's easier to find the functions if you are looking for them.

Way easier in fact.

Part of #10576
2020-09-01 13:40:06 +02:00
Michał Słowikowski 804548d58f
Add Dataprep operators (#10304)
Add DataprepGetJobGroupOperator and DataprepRunJobGroupOperator
for Dataprep service.

Co-authored-by: Tomek Urbaszek <tomasz.urbaszek@polidea.com>
2020-09-01 12:59:13 +02:00
Shoichi Kagawa f40ac9b151
Add placement_strategy option (#9444) 2020-09-01 01:50:08 +02:00
Ephraim Anierobi aa2db70494
Unify error messages and complete type field in response (#10333)
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
2020-08-31 15:36:52 +02:00
Marco Aguiar e6a0a5374d
Display conf as a JSON in the DagRun list view (#10644)
Co-authored-by: Marco Aguiar <marco@DESKTOP-8IVSCHM.localdomain>
2020-08-31 15:31:58 +02:00
Jarek Potiuk 46c8d6714c
Implement Google Shell Conventions for breeze script … (#10651)
Part of #10576

First (and the biggest) of the series of commits to introduce
Google Shell Conventions in our bash scripts.

This is about the biggest and the most complex breeze script
so it is rather huge but it is difficult to split it into
smaller pieces.

The rules implemented (from the conventions):

 * constants and exported variables are CAPITALIZED, where
   local/temporary variables are lowercase

 * following the shell guide, once all the variables are set to their
   final values (either from exported variables, calculation or --switches
   ) I have a single function that makes all the variables read-only. That
   helped to clean up a lot of places where the same functions were called
   several times, or where variables were defined in a few places. Now the
   behavior should be rather consistent and we should easily catch some
   duplications

 * function headers (following the guide) explaining arguments,
   variables expected, variables modified in the functions used.

 * setting the variables as read-only also helped to clean-up the "ifs"
   where we often had ":=}" in variables and != "" or == "". Those are
   replaced with `=}` and tests are replaced with `-n` and `-z` - also
   following the shell guide (readonly helped to detect and clean all
   such cases). This also should be much more robust in the future.

 * reorganized initialization of those constants and variables - simplified
   a few places where initialization was overlapping. It should be much more
   straightforward and clean now

 * a number of internal function breeze variables are "local" - this is
   helpful in accidental variables overwriting and keeping stuff localized

 * trap_add function is separated out to help in cases where we had
   several traps handling the same signals.
2020-08-31 13:24:53 +02:00
Masato Ohba 11c00bc820
Fix typos: duplicated "the" (#10647) 2020-08-30 09:57:24 +02:00
Kamil Breguła 8e0d9f09d9
Add airflow cheat-sheet command (#10619) 2020-08-28 21:25:29 +02:00
Tomek Urbaszek 5ae82a56da
Fix Google DLP example and improve ops idempotency (#10608) 2020-08-28 16:35:47 +02:00
Kamil Breguła 3867f76625
Update Google Cloud branding (#10615) 2020-08-28 12:19:27 +02:00
Kaxil Naik 725bf330ef
Revert Clean up DAG serializations based on last_updated (#7424) (#10613)
This PR reverts the behavior of https://github.com/apache/airflow/pull/7424
2020-08-27 20:56:41 +01:00
Anton Bryzgalov 2e56ee7b22
DockerOperator extra_hosts argument support added (#10546) 2020-08-27 11:36:04 +01:00
Beni Ben zikry 1e5aa4465c
Spark-on-K8S sensor - add driver logs (#10023) 2020-08-26 18:14:20 +02:00
Ping Zhang db378c09b7
[k8s] Store the raw ti key info to pod annotations (#10568)
The value of annotations can store the raw dag_id, task_id and
execution_date so that k8s executor can easily map pod event back
to the task instance
2020-08-26 07:53:37 -07:00
Jarek Potiuk 8a7c37281c
Untangle cyclic deps configuration <> secrets (#10559) 2020-08-26 16:38:57 +02:00
Kaxil Naik fdd9b6f65b
Enable Black on Providers Packages (#10543) 2020-08-25 17:39:04 +01:00
Kaxil Naik 4c6b7595de
Fix failing Black test on connexion (#10547) 2020-08-25 12:57:07 +01:00
Kaxil Naik 7c0d6ab9f4
Enable Black on Connexion API folders (#10545) 2020-08-25 12:10:20 +01:00
Ephraim Anierobi d6ce8c8561
Add update mask to patch dag endpoint (#10535) 2020-08-25 12:56:19 +02:00
Kaxil Naik d760265452
PyDocStyle: No whitespaces allowed surrounding docstring text (#10533) 2020-08-25 09:50:21 +01:00
Steven Yu bfefcce0c9
Updated REST API call so GET requests pass payload in query string instead of request body (#10462)
* Updated REST API call so GET requests pass payload in query string instead of request body

* Updated comparisons to use in to follow better standards

* Added whitespace for pylint failure

* Update Databricks hooks tests to reflect new payload

* Fixed trailing whitespace in unit test

Co-authored-by: Steven Yu <steven@databricks.com>
2020-08-25 01:59:36 +02:00
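The change in where the payload travels can be sketched with the standard library alone. This is an illustration, not the Databricks hook code: `build_request` and the `example.invalid` URL are hypothetical, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_request(method, url, payload):
    """Build a request, placing the payload according to the HTTP method."""
    if method == "GET":
        # GET: encode the payload into the query string and send no body.
        return Request(f"{url}?{urlencode(payload)}", method="GET")
    # Other methods: send the payload as a JSON body.
    body = json.dumps(payload).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method=method)


get_req = build_request("GET", "https://example.invalid/api/runs", {"run_id": 42})
print(get_req.full_url, get_req.data)

post_req = build_request("POST", "https://example.invalid/api/runs", {"run_id": 42})
print(post_req.full_url, post_req.data)
```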
Kaxil Naik 0e0aefb8f0
Fix TestAWSDataSyncOperatorUpdate.__init__ method (#10536)
`__init` -> `__init__`
2020-08-25 00:39:57 +01:00
Kaxil Naik 6bed074b2d
Remove unreachable code in test_user_command.py (#10526) 2020-08-25 00:31:06 +01:00
Jarek Potiuk 2f2d8dbfaf
Remove all "noinspection" comments native to IntelliJ (#10525)
We have already fixed a lot of problems that were marked
with those, also IntelliJ has gotten a bit smarter about not
detecting false positives as well as understanding more
pylint annotations. Wherever the problem remained
we replaced it with # noqa comments - as it is
also well understood by IntelliJ.
2020-08-25 00:01:37 +02:00
Kamil Olszewski fef73b91d8
Fix impersonation related bug in bigtable tests (#10521)
Co-authored-by: Kamil Olszewski <kamil.olszewski@polidea.com>
2020-08-24 20:13:09 +01:00
Kamil Olszewski 3734876d98
Implement impersonation in google operators (#10052)
Co-authored-by: Kamil Olszewski <kamil.olszewski@polidea.com>
2020-08-24 13:47:59 +02:00
Derrick Qin b0598b5351
Add support for creating multiple replicated clusters in Bigtable hook and operator (#10475)
* Add support for creating multiple Bigtable replicas

* Flake8 fix
2020-08-24 11:44:22 +02:00
Jarek Potiuk 82369fadde
Removed the prerequisite for perf-kit path augmentation (#10492) 2020-08-23 15:50:25 +02:00
Kaxil Naik ef8df17348
Fix typo in Facebook Ads Provider (#10484)
`missings_keys` -> `missing_keys`
2020-08-22 21:54:19 +02:00
Jarek Potiuk 7ee7d7cf3f
Move perf_kit to tests.utils (#10470)
Perf_kit was a separate folder and it was a problem when we tried to
build it from Docker-embedded sources, because there was a hidden,
implicit dependency between tests (conftest) and perf.

Perf_kit is now moved to tests to be available in the CI image
also when we run tests without the sources mounted.
This is changing back in #10441 and we need to move perf_kit
for it to work.
2020-08-22 21:53:07 +02:00
Gabriel Montañola c6358045f9
Fixes S3ToRedshift COPY query (#10436)
* fix: 🐛 Wrong S3 URI on COPY query

The S3 URI on COPY query was appending the target Redshift table to the
S3 object key.

* test: 💍 Fixed typo on test query

The COPY query that the operator used is the same query the test uses.
2020-08-22 21:19:37 +02:00
Kaxil Naik 44a36b9ab3
Use assertEqual instead of assertTrue in tests/utils/test_dates.py for proper diff (#10457)
assertEqual will show the proper diff instead of just a "False is not True" error
2020-08-22 10:43:26 +02:00
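The difference in failure output mentioned in #10457 can be reproduced with a small self-contained sketch. The test case and the helper `failure_messages` below are hypothetical and unrelated to tests/utils/test_dates.py; they only compare the messages the two assertion styles produce.

```python
import unittest


def failure_messages():
    """Run two deliberately failing tests and return their failure text by test name."""

    class DiffDemo(unittest.TestCase):
        def test_with_assert_true(self):
            # Only the boolean result reaches unittest, so the values are lost.
            self.assertTrue([1, 2, 3] == [1, 2, 4])

        def test_with_assert_equal(self):
            # unittest sees both operands and can report how they differ.
            self.assertEqual([1, 2, 3], [1, 2, 4])

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DiffDemo)
    result = unittest.TestResult()
    suite.run(result)
    return {test.id().split(".")[-1]: text for test, text in result.failures}


messages = failure_messages()
```

The `assertTrue` failure text only states that the expression was false, whereas the `assertEqual` failure text names both lists and the first differing element.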
Kaxil Naik 904c1d825a
Test exact match of Executor name (#10465)
Use `self.assertEqual` instead of `self.assertIn` to do an exact match of string name instead of partial match
2020-08-22 10:35:01 +02:00
Kaxil Naik 4a77211ab8
Remove redundant checks in test_views.py (#10464)
- `self.check_content_in_response` already checks that response code is 200
- `self.assertEqual(None, ...)` -> `self.assertIsNone(...)`
- Fix typo: "succcess" -> `success`
2020-08-22 10:32:02 +02:00
Tomek Urbaszek fdd68ec653
Make system test work with 1.10 (#10444) 2020-08-21 16:45:37 +02:00
Omair Khan 1e371864cc
Add update endpoint for DAG (#9101) (#9740) 2020-08-21 12:28:21 +02:00
Felix Uellendall 2f552233f5
Add AzureBaseHook (#9747)
- refactor/change azure_container_instance to use AzureBaseHook
- add info to operators-and-hooks-ref.rst
- add howto docs for connecting to azure
- add auth mechanism via json config
- add azure conn type
2020-08-21 11:45:23 +02:00
Ignacio Peluffo 27d08b76a2
Amazon SES Hook (#10391)
* Add Amazon SES hook

* Add SES Hook to operators-and-hooks documentation.

* Fix arguments for parent class constructor call (PR feedback)

* Fix indentation in operators-and-hooks documentation

* Fix mypy error for argument on call to parent class constructor

* Simplify logic on constructor (PR feedback)

* Add custom headers and other relevant options to hook

* Change pylint exception rule to apply it only to function instead of module (PR feedback)

* Fix spellcheck error

* Vendorize airflow.utils.email

* fixup! Vendorize airflow.utils.email

Co-authored-by: Kamil Breguła <kamil.bregula@polidea.com>
2020-08-21 09:32:25 +02:00
Craig Chatfield 88c7d2e526
Dataflow operators don't always create a virtualenv (#10373) 2020-08-21 02:28:37 +02:00
Daniel Imberman e195c6a3d2
Make KubernetesExecutor recognize kubernetes_labels (#10412)
KubernetesExecutor needs to inject `kubernetes_labels` configs
into the worker_config
2020-08-20 00:06:56 +01:00
Daniel Imberman f76938c171
Make Kubernetes tests pass locally (#10407)
* Make Kubernetes tests pass locally

Currently, all Kubernetes tests pass only within breeze.

This PR makes them read the local path so they can pass on any
system.

* static tests
2020-08-19 15:49:12 -07:00
Jarek Potiuk db446f2677
Replaced aliases for common tools with functions. (#10402)
This allows for all the kinds of verbosity we want, including
writing outputs to output files, and it also works out-of-the-box
in git-commit non-interactive shell scripts. Also as a side effect
we have mocked tools in bats tests, which will allow us to write
more comprehensive unit tests for the bash scripts of ours
(this is a long overdue task).

Part of #10368
2020-08-19 15:23:57 +02:00
Kaxil Naik 3bc37013f6
Add back 'refresh_all' method in airflow/www/views.py (#10328)
closes https://github.com/apache/airflow/issues/9749
2020-08-19 10:59:36 +01:00
QP Hou 541c47c998
Add basic auth API auth backend (#10356) 2020-08-19 09:44:17 +01:00
Cooper Gillan d6f6d53bcd
Expand JenkinsJobTriggerOperator unit tests (#10353)
Using the parameterized library, add unit test coverage
for JenkinsJobTriggerOperator parameters, covering parameters
as strings or as a list of strings.
2020-08-18 23:53:27 +01:00
Kamil Breguła 083c3c129b
Simplified GCSTaskHandler configuration (#10365) 2020-08-18 16:24:26 +02:00
Ping Zhang 439f7dc1d1
Use check_output to capture in celery task (#10310)
See: https://docs.python.org/3/library/subprocess.html#subprocess.CalledProcessError

check_call does not set output on the subprocess.CalledProcessError, so the value logged by log.error(e.output) is always None.

With check_output, a CalledProcessError carries the captured output, so the error output is logged correctly.
2020-08-18 12:55:46 +02:00
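The behaviour described in #10310 can be checked with a short standard-library sketch. It is not the Celery executor's code: `run_with_check_call`, `run_with_check_output` and the failing command are hypothetical and exist only to show what `CalledProcessError.output` contains in each case.

```python
import subprocess
import sys

# A child process that prints a message and exits with a non-zero status.
FAILING_CMD = [sys.executable, "-c", "import sys; print('boom'); sys.exit(3)"]


def run_with_check_call(cmd):
    """Return e.output after a failure; check_call never captures output."""
    try:
        subprocess.check_call(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.CalledProcessError as e:
        return e.output  # None: nothing was captured
    return b""


def run_with_check_output(cmd):
    """Return e.output after a failure; check_output attaches the captured bytes."""
    try:
        return subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as e:
        return e.output  # combined stdout/stderr of the failed process


call_output = run_with_check_call(FAILING_CMD)
captured_output = run_with_check_output(FAILING_CMD)
```

`call_output` is `None`, while `captured_output` holds the bytes the failed process wrote, which is what makes the logged error useful.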
Jubeen Lee dea345b05c
Fix AwsGlueJobSensor to stop running after the Glue job finished (#9022)
* Extract get_job_state and fix poke of AwsGlueJobSensor

* Save hook and reuse in GlueJobSensor

* Add descriptions for some functions

* Fix tests according to changed function definition

* Fix too long line

* Add type hints and apply review

* Fix type error

Co-authored-by: JB Lee <jb.lee@sendbird.com>
2020-08-17 18:41:50 +02:00
Omair Khan 1ae5bdf23e
Add test for GCSTaskHandler (#9600) (#9861)
Co-authored-by: Kamil Breguła <kamil.bregula@polidea.com>
2020-08-17 10:53:57 +02:00
Ryan Yuan 382c1011b6
Add Bigtable Update Instance Hook/Operator (#10340)
Add Bigtable Update Instance Hook/Operator
2020-08-16 05:52:14 +02:00
Tomek Urbaszek be46d20fb4
Improve idempotency of BigQueryInsertJobOperator (#9590)
Co-authored-by: Jacob Ferriero <jferriero@google.com>
2020-08-15 10:30:22 +02:00
Kaxil Naik 5c2bb7b0b0
Webserver: Sanitize values passed to origin param (#10334) 2020-08-15 04:26:48 +01:00
yuqian90 4454224b68
Fix clear future recursive when ExternalTaskMarker is used (#9515) 2020-08-14 23:39:57 +01:00
mhenc 47387a69e6
Catch Permission Denied exception when getting secret from GCP Secret Manager. (#10326) 2020-08-14 17:26:05 +02:00
Kaxil Naik 2d4e44c04e
Respect DAG Serialization setting when running sync_perm (#10321)
We run this on Webserver startup. When DAG Serialization is enabled, we expect that no DAG files are required, but because of this bug the files were still looked for.
2020-08-13 21:38:49 +02:00
Jens Larsson 2f0613b0c2
Implement Google BigQuery Table Partition Sensor (#10218) 2020-08-13 15:23:46 +02:00
Jacob Ferriero 7f76b8b942
Add ClusterPolicyViolation support to airflow local settings (#10282)
This change allows users to raise exceptions other than `DagCycleException` (namely `AirflowClusterPolicyViolation`) as part of Cluster Policies.

This can be helpful for running checks on tasks / DAGs (e.g. asserting that a task has a non-airflow owner) and refusing to run tasks that aren't compliant with these checks.

This is meant more as a tool for airflow admins to prevent user mistakes (especially in shared Airflow infrastructure with newbies) than as a strong technical control for security/compliance posture.
2020-08-12 23:06:29 +01:00
David Cavaletto f6734b3b85
Enable Sphinx spellcheck for doc generation (#10280) 2020-08-12 21:30:37 +01:00
Kaxil Naik adce6f0296
Use Hash of Serialized DAG to determine DAG is changed or not (#10227)
closes #10116
2020-08-11 22:31:55 +01:00
Ephraim Anierobi 0ee437547b
Add unittest for WasbTaskHandler (#10284) 2020-08-11 18:18:49 +02:00
Daniel Imberman 3c374a42c0
Add reconcile_metadata to reconcile_pods (#10266)
metadata objects require a more complex merge strategy
than a simple "merge pods" for merging labels and other
features
2020-08-11 07:49:44 -07:00
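Why metadata needs its own merge step can be sketched with plain dictionaries. This is an assumption-laden illustration, not Airflow's `reconcile_metadata`: the dict layout, the choice of `labels` and `annotations` as the merged maps, and the rule that the client side wins on conflicts are all hypothetical.

```python
def reconcile_metadata(base, client):
    """Merge two metadata dicts; client values win, but label maps are merged key by key.

    Hypothetical sketch using plain dicts instead of Kubernetes client objects.
    """
    if base is None:
        return client
    if client is None:
        return base
    # Scalar fields: a shallow merge where the client side overrides the base.
    merged = {**base, **client}
    # Map-valued fields: merge per key so base-only entries are not dropped.
    for key in ("labels", "annotations"):
        merged[key] = {**(base.get(key) or {}), **(client.get(key) or {})}
    return merged


base_meta = {"name": "base-pod", "labels": {"team": "data", "tier": "batch"}}
client_meta = {"name": "client-pod", "labels": {"tier": "interactive"}}
merged_meta = reconcile_metadata(base_meta, client_meta)
```

A plain shallow merge would replace the whole `labels` map with the client's and lose `team`; the per-key merge keeps it while still letting the client override `tier`.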
Kamil Breguła 422e3f1d5d
Add Authentication for Stable API (#10267) 2020-08-11 16:23:10 +02:00
Jarek Potiuk 19bc97d0ce
Revert "Add Amazon SES hook (#10004)" (#10276)
This reverts commit f06fe616e6.
2020-08-10 16:30:40 +02:00
Ignacio Peluffo f06fe616e6
Add Amazon SES hook (#10004)
- refactor airflow.utils.email and add typing
2020-08-10 11:58:55 +02:00
Michał Słowikowski ef088314f8
Added DataprepGetJobsForJobGroupOperator (#10246) 2020-08-09 22:45:40 +02:00
Kamil Breguła db8d06a696
Disable sentry integration by default (#10212)
* Disable sentry integration by default
2020-08-09 13:21:41 +02:00
Kamil Breguła e2ec5ef665
Update example on docs/howto/connection/index.rst (#10236)
* Update example on docs/howto/connection/index.rst

* fixup! Update example on docs/howto/connection/index.rst
2020-08-09 12:25:15 +02:00
Kamil Breguła 12eed9d960
Add system tests for CloudSecretManagerBackend (#10235)
* Add system tests for CloudSecretManagerBackend

* fixup! Add system tests for CloudSecretManagerBackend
2020-08-08 19:00:41 +02:00
Cooper Gillan c29533888f
Add labels param to Google MLEngine Operators (#10222) 2020-08-08 02:47:50 +02:00