Commit graph

3020 commits

Author SHA1 Message Date
Yingbo Wang ac943c9e18
[AIRFLOW-3964][AIP-17] Consolidate and de-dup sensor tasks using Smart Sensor (#5499)
Co-authored-by: Yingbo Wang <yingbo.wang@airbnb.com>
2020-09-08 22:47:59 +01:00
Kamil Breguła ff41361e0e
Add task logging handler to airflow info command (#10771) 2020-09-08 22:12:55 +02:00
Jarek Potiuk 2811851f80
Move Impersonation test back to quarantine (#10809)
It seems that TestImpersonation is not stable even in isolation.
Moving it back to quarantine for now.
2020-09-08 21:33:44 +02:00
Kamil Breguła 961131d51c
All files in providers package have unit tests (#10799) 2020-09-08 13:55:35 +02:00
Ephraim Anierobi 078bfaf60a
Extract missing gcs_to_local example DAG from gcs example (#10767)
Co-authored-by: Kamil Breguła <kamil.bregula@polidea.com>
2020-09-08 13:08:06 +02:00
Ephraim Anierobi 3c3342f1fd
Add unit test for AzureCosmosDocumentSensor (#10765) 2020-09-08 12:21:22 +02:00
Jarek Potiuk 4de67a6731
Move dev docker images to airflow registry (#9652)
Part of #9401
2020-09-08 10:07:10 +02:00
Joshua Carp 2934220dc9
Always return a list from S3Hook list methods (#10774) 2020-09-08 09:49:34 +02:00
Dmitri Kuksik 10ce31127f
Deprecate using global as the default region in Google Dataproc operators and hooks (#10772)
The region parameter is required for some of the Google Dataproc operators
and it should be provided by users to avoid creating data-intensive
tasks in any default location.
2020-09-08 08:46:29 +02:00
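A minimal sketch of the deprecation pattern described above, assuming a hypothetical `resolve_region` helper (this is not the actual Dataproc hook code): when no region is passed, it still falls back to `global` but emits a `DeprecationWarning` so users start providing the region explicitly.

```python
import warnings
from typing import Optional


def resolve_region(region: Optional[str] = None) -> str:
    """Hypothetical helper: return the region, warning when the legacy default is used."""
    if region is None:
        # Keep the old behaviour for now, but tell the caller it will go away.
        warnings.warn(
            "Using 'global' as the default region is deprecated; "
            "please pass the region explicitly.",
            DeprecationWarning,
            stacklevel=2,
        )
        return "global"
    return region
```

Warning instead of raising keeps existing DAGs running while still surfacing the problem.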
Jarek Potiuk b746f33fc6
Removes stable tests from quarantine (#10768)
We've observed the tests for the last couple of weeks and it seems
most of the tests marked with the "quarantine" marker are succeeding
in a stable way (https://github.com/apache/airflow/issues/10118).
The removed tests have a success ratio of > 95% (20 runs without
problems) and this was verified a week ago as well,
so it seems they are rather stable.

There are only a few that are either failing or causing
the Quarantined builds to hang. I manually reviewed the
master tests that failed over the last few weeks and added the
tests that are causing the build to hang.

It seems that stability has improved. This might be caused
by temporary problems at the time we marked the quarantined builds,
or by a too "generous" way of marking tests as quarantined, or
the improvement may come from #10368: the docker engine
and machines used to run the builds in GitHub experience far
less load (image builds are executed in separate builds), so
resource usage might have decreased. Another reason
might be GitHub Actions stability improvements.

Or simply those tests are more stable when run in isolation.

We might still add failing tests back as soon as we see them behave
in a flaky way.

The remaining quarantined tests that need to be fixed:
 * test_local_run (often hangs the build)
 * test_retry_handling_job
 * test_clear_multiple_external_task_marker
 * test_should_force_kill_process
 * test_change_state_for_tis_without_dagrun
 * test_cli_webserver_background

We also move some of those tests to the "heisentests" category.
Those tests run fine in isolation but fail
the builds when run with all other tests:
 * TestImpersonation tests

We might find that those heisentests can be fixed, but for
now we are going to run them in isolation.

Also, since those quarantined tests are failing more often,
the "num runs" to track for them has been decreased to 10,
so only the last 10 runs are tracked.
2020-09-08 07:36:12 +02:00
Mateusz Kukieła f14f379716
[AIRFLOW-10672] Refactor BigQueryToGCSOperator to use new method (#10773)
Makes BigQueryToGCSOperator use the BigQueryHook.insert_job method

Committer: Mateusz Kukieła <mateuszkukiela@gmail.com>
2020-09-07 16:18:16 +02:00
Tomek Urbaszek c8ee455685
Refactor DataprocCreateCluster operator to use simpler interface (#10403)
DataprocCreateCluster now requires:
- cluster config
- cluster name
- project id

In this way users don't have to pass project_id twice
(in the cluster definition and as a parameter). The cluster object
is built in the create_cluster hook method.
2020-09-07 12:21:00 +02:00
Kamil Breguła ddee0aa4fb
Simplify load connection in LocalFilesystemBackend (#10638) 2020-09-06 20:56:03 +02:00
Jed Cunningham 59f9a4116a
Add permission "extra_links" for Viewer role and above (#10719)
This change adds 'can extra links on Airflow' to the Viewer role and above. Currently, only Admins can see extra links by default.
2020-09-06 18:26:08 +02:00
Varun Dhussa ece685b5b8
Asynchronous execution of Dataproc jobs with a Sensor (#10673) 2020-09-05 13:11:37 +01:00
Kaxil Naik 7f0271f820
Improve test coverage for ConfObject in dag_run_schema (#10738)
Adds a test to verify that a string can be passed to conf and that ConfObject._deserialize works.
2020-09-05 08:55:12 +02:00
Kaxil Naik a1a312ee1b
Fix typo in test_dag_run_schema.py (#10739) 2020-09-05 08:54:17 +02:00
Kaxil Naik 5b683f09c0
Improve test coverage for test_common_schema.py (#10740)
Adds a test that an error is raised with a specific message when an unknown object type is passed
2020-09-05 08:53:43 +02:00
Antonio Davide Calì 6e3d7b63d3
Add masterConfig parameter to MLEngineStartTrainingJobOperator (#10578)
Co-authored-by: antonio-davide-cali <antonio.davide.cali@ikea.com>
2020-09-04 23:58:24 +02:00
Jarek Potiuk e4de7288a3
Switches to better BATS asserts (#10718)
BATS has additional libraries of asserts that make tests for bash
scripts much more straightforward and nicer to write.

There is no BATS Docker image that contains those, so we
had to build our own (it follows the same structure
as #9652, where we keep our dev Docker image
sources inside our repository and the generated Docker images
in "apache/airflow:<tool>-CALVER-TOOLVER" format).

We have more BATS unit tests to add, following #10576,
and this change will be of great help.
2020-09-04 22:25:29 +02:00
Daniel Imberman 828f7303b7
Add generate_yaml command to easily test KubernetesExecutor before deploying pods (#10677)
* Add generate_template command for kubernetes_executor

* move import

* fix test failure

* Address @mik-laj comments

* Address @mik-laj comments

* Use current dir

* add docs

* fix test
2020-09-03 18:04:23 -07:00
Kamil Breguła ab5235ee12
Unify command names in CLI (#10720)
* Unify command names in CLI

* fixup! Unify command names in CLI
2020-09-04 01:25:39 +02:00
Ash Berlin-Taylor de0d7d52ac
Make test_trigger_rule_dep tests re-runnable (#10712)
If we run this test
(TestTriggerRuleDep::test_get_states_count_upstream_ti specifically)
more than once without clearing the DB in between it would fail due to a
unique constraint violation.
2020-09-03 17:19:30 +01:00
Ash Berlin-Taylor a01d986f6a
Don't commit when explicitly passed a session to TI.set_state (#10710)
The `@provide_session` wrapper will already commit the transaction when
the function returns, unless an explicit session is passed in. Removing this
parameter changes the behaviour to be:

- If session explicitly passed in: don't commit (caller's
  responsibility)
- If no session passed in, `@provide_session` will commit for us already.
2020-09-03 17:18:32 +01:00
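A simplified sketch of the commit semantics described above, using a hypothetical `FakeSession` and a cut-down `provide_session`; Airflow's real decorator differs in detail (for example, it also detects a session passed positionally), and `set_state` here is a hypothetical stand-in for `TI.set_state`.

```python
import functools


class FakeSession:
    """Stand-in for an SQLAlchemy session that only records calls."""

    def __init__(self):
        self.committed = False
        self.closed = False

    def commit(self):
        self.committed = True

    def close(self):
        self.closed = True


def provide_session(func):
    """Create, commit and close a session only if the caller did not pass one."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        session = kwargs.pop("session", None)
        if session is not None:
            # Caller owns the session: committing is the caller's responsibility.
            return func(*args, session=session, **kwargs)
        session = FakeSession()
        try:
            result = func(*args, session=session, **kwargs)
            session.commit()  # only reached if func did not raise
            return result
        finally:
            session.close()

    return wrapper


@provide_session
def set_state(state, session=None):
    # Hypothetical stand-in for TI.set_state: changes state, never commits itself.
    session.last_state = state
    return session
```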
Kaxil Naik 9ac882e6cc
[AIRFLOW-5948] Replace SimpleDag with SerializedDag (#7694) 2020-09-03 16:52:27 +01:00
Tomek Urbaszek 913397c1c6
Make Cloud Build system tests setup runnable (#10692)
This change fixes the error `open(quickstart.sh): Permission denied`
that was raised during `git add`.
2020-09-03 13:20:10 +02:00
Aaditya Sharma 36aa88ffc1
Add jupytercmd and fix task failure when notify set as true in qubole operator (#10599)
Add jupytercmd in Qubole Operator, which fires a JupyterNotebookCommand to the Jupyter notebooks running on the user's QDS account. Along with this, we have fixed a minor bug that caused tasks to fail when --notify is set in Qubole Operator.

Co-authored-by: Aaditya Sharma <asharma@qubole.com>
2020-09-03 15:00:19 +05:30
Jarek Potiuk 4e09cb53ea
Add packages to function names in bash (#10670) (#10696)
Inspired by the Google Shell Guide, where they mention
separating package names with ::, I realized that this was
one of the missing pieces in our bash scripts.

While we already had packages (in libraries folders)
it was difficult to tell which function is where.

By introducing packages (equal to the library file name)
we are *almost* at the level of a structured language, and
it's easier to find the functions if you are looking for them.

Way easier in fact.

Part of #10576

(cherry picked from commit cc551ba793)
(cherry picked from commit 2bba276f0f06a5981bdd7e4f0e7e5ca2fe84f063)
2020-09-02 21:58:37 +02:00
Jarek Potiuk 649ce4ba9d
Implement Google Shell Conventions for breeze script (#10695)
* Implement Google Shell Conventions for breeze script … (#10651)

Part of #10576

First (and the biggest) of the series of commits to introduce
Google Shell Conventions in our bash scripts.

This is about the biggest and the most complex breeze script
so it is rather huge but it is difficult to split it into
smaller pieces.

The rules implemented (from the conventions):

 * constants and exported variables are CAPITALIZED, whereas
   local/temporary variables are lowercase

 * following the shell guide, once all the variables are set to their
   final values (either from exported variables, calculation or --switches),
   I have a single function that makes all the variables read-only. That
   helped to clean up a lot of places where the same functions were called
   several times, or where variables were defined in a few places. Now the
   behavior should be rather consistent and we should easily catch some
   duplications

 * function headers (following the guide) explaining arguments,
   variables expected, variables modified in the functions used.

 * setting the variables as read-only also helped to clean-up the "ifs"
   where we often had ":=}" in variables and != "" or == "". Those are
   replaced with `=}` and tests are replaced with `-n` and `-z` - also
   following the shell guide (readonly helped to detect and clean all
   such cases). This also should be much more robust in the future.

 * reorganized initialization of those constants and variables - simplified
   a few places where initialization was overlapping. It should be much more
   straightforward and clean now

 * a number of internal breeze function variables are "local" - this
   helps avoid accidental variable overwriting and keeps stuff localized

 * trap_add function is separated out to help in cases where we had
   several traps handling the same signals.

(cherry picked from commit 46c8d6714c)
(cherry picked from commit c822fd7b4bf2a9c5a9bb3c6e783cbea9dac37246)

* fixup! Implement Google Shell Conventions for breeze script … (#10651)
2020-09-02 21:55:50 +02:00
Kaxil Naik 9a10f83ab0
Revert recent breeze changes (#10651 & #10670) (#10694)
* Revert "Add packages to function names in bash (#10670)"

This reverts commit cc551ba793.

* Revert "Implement Google Shell Conventions for breeze script … (#10651)"

This reverts commit 46c8d6714c.
2020-09-02 17:27:36 +01:00
Kamil Breguła 0d9e421f16
Unify command names in CLI (#10669)
* Unify command names in CLI
2020-09-02 08:43:41 -04:00
Jarek Potiuk cc551ba793
Add packages to function names in bash (#10670)
Inspired by the Google Shell Guide, where they mention
separating package names with ::, I realized that this was
one of the missing pieces in our bash scripts.

While we already had packages (in libraries folders)
it was difficult to tell which function is where.

By introducing packages (equal to the library file name)
we are *almost* at the level of a structured language, and
it's easier to find the functions if you are looking for them.

Way easier in fact.

Part of #10576
2020-09-01 13:40:06 +02:00
Michał Słowikowski 804548d58f
Add Dataprep operators (#10304)
Add DataprepGetJobGroupOperator and DataprepRunJobGroupOperator
for Dataprep service.

Co-authored-by: Tomek Urbaszek <tomasz.urbaszek@polidea.com>
2020-09-01 12:59:13 +02:00
Shoichi Kagawa f40ac9b151
Add placement_strategy option (#9444) 2020-09-01 01:50:08 +02:00
Ephraim Anierobi aa2db70494
Unify error messages and complete type field in response (#10333)
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
2020-08-31 15:36:52 +02:00
Marco Aguiar e6a0a5374d
Display conf as a JSON in the DagRun list view (#10644)
Co-authored-by: Marco Aguiar <marco@DESKTOP-8IVSCHM.localdomain>
2020-08-31 15:31:58 +02:00
Jarek Potiuk 46c8d6714c
Implement Google Shell Conventions for breeze script … (#10651)
Part of #10576

First (and the biggest) of the series of commits to introduce
Google Shell Conventions in our bash scripts.

This is about the biggest and the most complex breeze script
so it is rather huge but it is difficult to split it into
smaller pieces.

The rules implemented (from the conventions):

 * constants and exported variables are CAPITALIZED, whereas
   local/temporary variables are lowercase

 * following the shell guide, once all the variables are set to their
   final values (either from exported variables, calculation or --switches),
   I have a single function that makes all the variables read-only. That
   helped to clean up a lot of places where the same functions were called
   several times, or where variables were defined in a few places. Now the
   behavior should be rather consistent and we should easily catch some
   duplications

 * function headers (following the guide) explaining arguments,
   variables expected, variables modified in the functions used.

 * setting the variables as read-only also helped to clean-up the "ifs"
   where we often had ":=}" in variables and != "" or == "". Those are
   replaced with `=}` and tests are replaced with `-n` and `-z` - also
   following the shell guide (readonly helped to detect and clean all
   such cases). This also should be much more robust in the future.

 * reorganized initialization of those constants and variables - simplified
   a few places where initialization was overlapping. It should be much more
   straightforward and clean now

 * a number of internal breeze function variables are "local" - this
   helps avoid accidental variable overwriting and keeps stuff localized

 * trap_add function is separated out to help in cases where we had
   several traps handling the same signals.
2020-08-31 13:24:53 +02:00
Masato Ohba 11c00bc820
Fix typos: duplicated "the" (#10647) 2020-08-30 09:57:24 +02:00
Kamil Breguła 8e0d9f09d9
Add airflow cheat-sheet command (#10619) 2020-08-28 21:25:29 +02:00
Tomek Urbaszek 5ae82a56da
Fix Google DLP example and improve ops idempotency (#10608) 2020-08-28 16:35:47 +02:00
Kamil Breguła 3867f76625
Update Google Cloud branding (#10615) 2020-08-28 12:19:27 +02:00
Kaxil Naik 725bf330ef
Revert Clean up DAG serializations based on last_updated (#7424) (#10613)
This PR reverts the behavior of https://github.com/apache/airflow/pull/7424
2020-08-27 20:56:41 +01:00
Anton Bryzgalov 2e56ee7b22
DockerOperator extra_hosts argument support added (#10546) 2020-08-27 11:36:04 +01:00
Beni Ben zikry 1e5aa4465c
Spark-on-K8S sensor - add driver logs (#10023) 2020-08-26 18:14:20 +02:00
Ping Zhang db378c09b7
[k8s] Store the raw ti key info to pod annotations (#10568)
Annotation values can store the raw dag_id, task_id and
execution_date, so that the k8s executor can easily map a pod event back
to the task instance
2020-08-26 07:53:37 -07:00
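Kubernetes label values are limited to 63 characters from a restricted character set, while annotation values are free-form strings, which is why raw ids and timestamps fit better in annotations. A minimal round-trip sketch, assuming a hypothetical `TaskInstanceKey` and hypothetical annotation key names (the executor's real keys and types may differ):

```python
from datetime import datetime, timezone
from typing import Dict, NamedTuple


class TaskInstanceKey(NamedTuple):
    """Hypothetical key; Airflow's own key type may differ."""

    dag_id: str
    task_id: str
    execution_date: datetime
    try_number: int


def key_to_annotations(key: TaskInstanceKey) -> Dict[str, str]:
    # Annotation values are arbitrary strings, so raw ids and full timestamps fit.
    return {
        "dag_id": key.dag_id,
        "task_id": key.task_id,
        "execution_date": key.execution_date.isoformat(),
        "try_number": str(key.try_number),
    }


def annotations_to_key(annotations: Dict[str, str]) -> TaskInstanceKey:
    # Reverse mapping applied when a pod event arrives (fromisoformat needs Python 3.7+).
    return TaskInstanceKey(
        dag_id=annotations["dag_id"],
        task_id=annotations["task_id"],
        execution_date=datetime.fromisoformat(annotations["execution_date"]),
        try_number=int(annotations["try_number"]),
    )
```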
Jarek Potiuk 8a7c37281c
Untangle cyclic deps configuration <> secrets (#10559) 2020-08-26 16:38:57 +02:00
Kaxil Naik fdd9b6f65b
Enable Black on Providers Packages (#10543) 2020-08-25 17:39:04 +01:00
Kaxil Naik 4c6b7595de
Fix failing Black test on connexion (#10547) 2020-08-25 12:57:07 +01:00
Kaxil Naik 7c0d6ab9f4
Enable Black on Connexion API folders (#10545) 2020-08-25 12:10:20 +01:00
Ephraim Anierobi d6ce8c8561
Add update mask to patch dag endpoint (#10535) 2020-08-25 12:56:19 +02:00
Kaxil Naik d760265452
PyDocStyle: No whitespaces allowed surrounding docstring text (#10533) 2020-08-25 09:50:21 +01:00
Steven Yu bfefcce0c9
Updated REST API call so GET requests pass payload in query string instead of request body (#10462)
* Updated REST API call so GET requests pass payload in query string instead of request body

* Updated comparisons to use `in`, following better standards

* Added whitespace for pylint failure

* Update Databricks hooks tests to reflect new payload

* Fixed trailing whitespace in unit test

Co-authored-by: Steven Yu <steven@databricks.com>
2020-08-25 01:59:36 +02:00
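A standard-library sketch of the change in behaviour: for GET the payload is URL-encoded into the query string, for other methods it is sent as a JSON body. `build_api_request` is a hypothetical helper, not the Databricks hook's actual code, and the URLs in the usage are placeholders.

```python
import json
from typing import Any, Dict
from urllib.parse import urlencode
from urllib.request import Request


def build_api_request(method: str, url: str, payload: Dict[str, Any]) -> Request:
    """Hypothetical helper: GET payload goes in the query string, others in a JSON body."""
    method = method.upper()
    if method == "GET":
        # GET requests should not carry a body; encode the payload as query parameters.
        query = urlencode(payload)
        return Request(f"{url}?{query}" if query else url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return Request(url, data=body, method=method, headers={"Content-Type": "application/json"})


get_req = build_api_request("GET", "https://example.com/api/2.0/jobs/runs/get", {"run_id": 42})
print(get_req.full_url)  # https://example.com/api/2.0/jobs/runs/get?run_id=42
```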
Kaxil Naik 0e0aefb8f0
Fix TestAWSDataSyncOperatorUpdate.__init__ method (#10536)
`__init` -> `__init__`
2020-08-25 00:39:57 +01:00
Kaxil Naik 6bed074b2d
Remove unreachable code in test_user_command.py (#10526) 2020-08-25 00:31:06 +01:00
Jarek Potiuk 2f2d8dbfaf
Remove all "noinspection" comments native to IntelliJ (#10525)
We have already fixed a lot of problems that were marked
with those, and IntelliJ has gotten a bit smarter about not
reporting false positives as well as understanding more
pylint annotations. Wherever the problem remained
we replaced it with # noqa comments, as they are
also well understood by IntelliJ.
2020-08-25 00:01:37 +02:00
Kamil Olszewski fef73b91d8
Fix impersonation related bug in bigtable tests (#10521)
Co-authored-by: Kamil Olszewski <kamil.olszewski@polidea.com>
2020-08-24 20:13:09 +01:00
Kamil Olszewski 3734876d98
Implement impersonation in google operators (#10052)
Co-authored-by: Kamil Olszewski <kamil.olszewski@polidea.com>
2020-08-24 13:47:59 +02:00
Derrick Qin b0598b5351
Add support for creating multiple replicated clusters in Bigtable hook and operator (#10475)
* Add support for creating multiple Bigtable replicas

* Flake8 fix
2020-08-24 11:44:22 +02:00
Jarek Potiuk 82369fadde
Removed the prerequisite for perf-kit path augmentation (#10492) 2020-08-23 15:50:25 +02:00
Kaxil Naik ef8df17348
Fix typo in Facebook Ads Provider (#10484)
`missings_keys` -> `missing_keys`
2020-08-22 21:54:19 +02:00
Jarek Potiuk 7ee7d7cf3f
Move perf_kit to tests.utils (#10470)
Perf_kit was a separate folder and it was a problem when we tried to
build it from Docker-embedded sources, because there was a hidden,
implicit dependency between tests (conftest) and perf.

Perf_kit is now moved to tests to be available in the CI image
also when we run tests without the sources mounted.
This is changing back in #10441 and we need to move perf_kit
for it to work.
2020-08-22 21:53:07 +02:00
Gabriel Montañola c6358045f9
Fixes S3ToRedshift COPY query (#10436)
* fix: 🐛 Wrong S3 URI on COPY query

The S3 URI on COPY query was appending the target Redshift table to the
S3 object key.

* test: 💍 Fixed typo on test query

The COPY query that the operator used is the same query the test uses.
2020-08-22 21:19:37 +02:00
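A minimal sketch of how the corrected FROM URI could be assembled, assuming a hypothetical `build_copy_query` helper; it is not the operator's actual code, and the authorization clause (IAM role or credentials) that Redshift requires is omitted for brevity, so the string is not runnable on Redshift as-is.

```python
def build_copy_query(schema: str, table: str, s3_bucket: str, s3_key: str, copy_options: str = "") -> str:
    """Hypothetical sketch: the S3 URI is built from the bucket and key only."""
    s3_uri = f"s3://{s3_bucket}/{s3_key}"
    # The target table appears only after COPY, never appended to the S3 object key.
    query = f"COPY {schema}.{table} FROM '{s3_uri}' {copy_options}".strip()
    return query + ";"


print(build_copy_query("public", "events", "my-bucket", "exports/events.csv", "CSV"))
# COPY public.events FROM 's3://my-bucket/exports/events.csv' CSV;
```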
Kaxil Naik 44a36b9ab3
Use assertEqual instead of assertTrue in tests/utils/test_dates.py for proper diff (#10457)
assertEqual will show the proper diff instead of just a "False is not true" error
2020-08-22 10:43:26 +02:00
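The difference in failure messages can be seen directly with the standard `unittest` module; `_Probe` and `failure_message` are throwaway helpers written only for this illustration.

```python
import unittest


class _Probe(unittest.TestCase):
    def runTest(self):  # lets _Probe() be instantiated without a test runner
        pass


def failure_message(check):
    """Run one assertion on a throwaway TestCase and return its failure message."""
    try:
        check(_Probe())
    except AssertionError as err:
        return str(err)
    return None


true_msg = failure_message(lambda t: t.assertTrue(3 == 4))
equal_msg = failure_message(lambda t: t.assertEqual(3, 4))
print(true_msg)   # False is not true
print(equal_msg)  # 3 != 4
```

assertTrue only sees the boolean result of the comparison, so both operands are lost; assertEqual receives them and can report what actually differed.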
Kaxil Naik 904c1d825a
Test exact match of Executor name (#10465)
Use `self.assertEqual` instead of `self.assertIn` to do an exact match of string name instead of partial match
2020-08-22 10:35:01 +02:00
Kaxil Naik 4a77211ab8
Remove redundant checks in test_views.py (#10464)
- `self.check_content_in_response` already checks that response code is 200
- `self.assertEqual(None, ...)` -> `self.assertIsNone(...)`
- Fix typo: "succcess" -> `success`
2020-08-22 10:32:02 +02:00
Tomek Urbaszek fdd68ec653
Make system test work with 1.10 (#10444) 2020-08-21 16:45:37 +02:00
Omair Khan 1e371864cc
Add update endpoint for DAG (#9101) (#9740) 2020-08-21 12:28:21 +02:00
Felix Uellendall 2f552233f5
Add AzureBaseHook (#9747)
- refactor/change azure_container_instance to use AzureBaseHook
- add info to operators-and-hooks-ref.rst
- add howto docs for connecting to azure
- add auth mechanism via json config
- add azure conn type
2020-08-21 11:45:23 +02:00
Ignacio Peluffo 27d08b76a2
Amazon SES Hook (#10391)
* Add Amazon SES hook

* Add SES Hook to operators-and-hooks documentation.

* Fix arguments for parent class constructor call (PR feedback)

* Fix indentation in operators-and-hooks documentation

* Fix mypy error for argument on call to parent class constructor

* Simplify logic on constructor (PR feedback)

* Add custom headers and other relevant options to hook

* Change pylint exception rule to apply it only to function instead of module (PR feedback)

* Fix spellcheck error

* Vendorize airflow.utils.email

* fixup! Vendorize airflow.utils.email

Co-authored-by: Kamil Breguła <kamil.bregula@polidea.com>
2020-08-21 09:32:25 +02:00
Craig Chatfield 88c7d2e526
Dataflow operators don't always create a virtualenv (#10373) 2020-08-21 02:28:37 +02:00
Daniel Imberman e195c6a3d2
Make KubernetesExecutor recognize kubernetes_labels (#10412)
KubernetesExecutor needs to inject `kubernetes_labels` configs
into the worker_config
2020-08-20 00:06:56 +01:00
Daniel Imberman f76938c171
Make Kubernetes tests pass locally (#10407)
* Make Kubernetes tests pass locally

Currently the Kubernetes tests all pass only within breeze.

This PR makes them read the local path so they can pass in any
system.

* static tests
2020-08-19 15:49:12 -07:00
Jarek Potiuk db446f2677
Replaced aliases for common tools with functions. (#10402)
This allows for all the kinds of verbosity we want, including
writing outputs to output files, and it also works out-of-the-box
in git-commit non-interactive shell scripts. Also as a side effect
we have mocked tools in bats tests, which will allow us to write
more comprehensive unit tests for our bash scripts
(this is a long overdue task).

Part of #10368
2020-08-19 15:23:57 +02:00
Kaxil Naik 3bc37013f6
Add back 'refresh_all' method in airflow/www/views.py (#10328)
closes https://github.com/apache/airflow/issues/9749
2020-08-19 10:59:36 +01:00
QP Hou 541c47c998
Add basic auth API auth backend (#10356) 2020-08-19 09:44:17 +01:00
Cooper Gillan d6f6d53bcd
Expand JenkinsJobTriggerOperator unit tests (#10353)
Using the parameterized library, add unit test coverage
for JenkinsJobTriggerOperator parameters, covering parameters
as strings or as a list of strings.
2020-08-18 23:53:27 +01:00
Kamil Breguła 083c3c129b
Simplified GCSTaskHandler configuration (#10365) 2020-08-18 16:24:26 +02:00
Ping Zhang 439f7dc1d1
Use check_output to capture in celery task (#10310)
See: https://docs.python.org/3/library/subprocess.html#subprocess.CalledProcessError

check_call does not set output on subprocess.CalledProcessError, so e.output in the log.error(e.output) call is always None.

By using check_output, when there is a CalledProcessError, it will correctly log the error output
2020-08-18 12:55:46 +02:00
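The behaviour described above can be reproduced with the standard library alone. In this sketch the current Python interpreter stands in for the command the Celery task actually runs (an assumption made only for illustration).

```python
import subprocess
import sys

# Stand-in for the task command: prints a message and exits with status 1.
FAILING_CMD = [sys.executable, "-c", "import sys; print('task failed'); sys.exit(1)"]


def error_output_via_check_call():
    try:
        # check_call never captures output; DEVNULL just keeps this demo quiet.
        subprocess.check_call(FAILING_CMD, stdout=subprocess.DEVNULL)
    except subprocess.CalledProcessError as err:
        return err.output
    return "no error"


def error_output_via_check_output():
    try:
        # check_output captures stdout (and stderr, merged here) into err.output.
        subprocess.check_output(FAILING_CMD, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as err:
        return err.output
    return "no error"


print(error_output_via_check_call())    # None
print(error_output_via_check_output())  # b'task failed\n' (line ending is platform-dependent)
```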
Jubeen Lee dea345b05c
Fix AwsGlueJobSensor to stop running after the Glue job finished (#9022)
* Extract get_job_state and fix poke of AwsGlueJobSensor

* Save hook and reuse in GlueJobSensor

* Add descriptions for some functions

* Fix tests according to changed function definition

* Fix too long line

* Add type hints and apply review

* Fix type error

Co-authored-by: JB Lee <jb.lee@sendbird.com>
2020-08-17 18:41:50 +02:00
Omair Khan 1ae5bdf23e
Add test for GCSTaskHandler (#9600) (#9861)
Co-authored-by: Kamil Breguła <kamil.bregula@polidea.com>
2020-08-17 10:53:57 +02:00
Ryan Yuan 382c1011b6
Add Bigtable Update Instance Hook/Operator (#10340)
Add Bigtable Update Instance Hook/Operator
2020-08-16 05:52:14 +02:00
Tomek Urbaszek be46d20fb4
Improve idempotency of BigQueryInsertJobOperator (#9590)
Co-authored-by: Jacob Ferriero <jferriero@google.com>
2020-08-15 10:30:22 +02:00
Kaxil Naik 5c2bb7b0b0
Webserver: Sanitize values passed to origin param (#10334) 2020-08-15 04:26:48 +01:00
yuqian90 4454224b68
Fix clear future recursive when ExternalTaskMarker is used (#9515) 2020-08-14 23:39:57 +01:00
mhenc 47387a69e6
Catch Permission Denied exception when getting secret from GCP Secret Manager. (#10326) 2020-08-14 17:26:05 +02:00
Kaxil Naik 2d4e44c04e
Respect DAG Serialization setting when running sync_perm (#10321)
We run this on webserver startup. When DAG Serialization is enabled, we expect that no DAG files are required, but because of this bug the files were still looked up.
2020-08-13 21:38:49 +02:00
Jens Larsson 2f0613b0c2
Implement Google BigQuery Table Partition Sensor (#10218) 2020-08-13 15:23:46 +02:00
Jacob Ferriero 7f76b8b942
Add ClusterPolicyViolation support to airflow local settings (#10282)
This change will allow users to raise exceptions other than `DagCycleException` (namely `AirflowClusterPolicyViolation`) as part of Cluster Policies.

This can be helpful for running checks on tasks / DAGs (e.g. asserting a task has a non-airflow owner) and refusing to run tasks that aren't compliant with these checks.

This is meant as a tool for Airflow admins to prevent user mistakes (especially in shared Airflow infrastructure with newbies) rather than as a strong technical control for security/compliance posture.
2020-08-12 23:06:29 +01:00
David Cavaletto f6734b3b85
Enable Sphinx spellcheck for doc generation (#10280) 2020-08-12 21:30:37 +01:00
Kaxil Naik adce6f0296
Use Hash of Serialized DAG to determine DAG is changed or not (#10227)
closes #10116
2020-08-11 22:31:55 +01:00
Ephraim Anierobi 0ee437547b
Add unittest for WasbTaskHandler (#10284) 2020-08-11 18:18:49 +02:00
Daniel Imberman 3c374a42c0
Add reconcile_metadata to reconcile_pods (#10266)
metadata objects require a more complex merge strategy
than a simple "merge pods" for merging labels and other
features
2020-08-11 07:49:44 -07:00
Kamil Breguła 422e3f1d5d
Add Authentication for Stable API (#10267) 2020-08-11 16:23:10 +02:00
Jarek Potiuk 19bc97d0ce
Revert "Add Amazon SES hook (#10004)" (#10276)
This reverts commit f06fe616e6.
2020-08-10 16:30:40 +02:00
Ignacio Peluffo f06fe616e6
Add Amazon SES hook (#10004)
- refactor airflow.utils.email and add typing
2020-08-10 11:58:55 +02:00
Michał Słowikowski ef088314f8
Added DataprepGetJobsForJobGroupOperator (#10246) 2020-08-09 22:45:40 +02:00
Kamil Breguła db8d06a696
Disable sentry integration by default (#10212)
* Disable sentry integration by default
2020-08-09 13:21:41 +02:00
Kamil Breguła e2ec5ef665
Update example on docs/howto/connection/index.rst (#10236)
* Update example on docs/howto/connection/index.rst

* fixup! Update example on docs/howto/connection/index.rst
2020-08-09 12:25:15 +02:00
Kamil Breguła 12eed9d960
Add system tests for CloudSecretManagerBackend (#10235)
* Add system tests for CloudSecretManagerBackend

* fixup! Add system tests for CloudSecretManagerBackend
2020-08-08 19:00:41 +02:00
Cooper Gillan c29533888f
Add labels param to Google MLEngine Operators (#10222) 2020-08-08 02:47:50 +02:00
Kamil Breguła 8a655cfeba
Add airflow connections get command (#10214) 2020-08-08 02:43:05 +02:00
Sumit Maheshwari 2102122875
Handle IntegrityError while creating TIs (#10136)
While triggering a DAG (trigger_dag) from the UI, the DagRun gets created first and then the webserver starts creating TIs. Meanwhile, the scheduler also picks up the DagRun and starts creating the TIs, which results in an IntegrityError as the primary key constraint gets violated. This happens when a DAG has a large number of tasks.

Also, the TIs list is replaced with a set for faster lookups in DAGs with many tasks.
2020-08-07 18:25:10 +05:30
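A hypothetical sketch of the general pattern (not Airflow's actual `DagRun` code), using an in-memory SQLite table with the same kind of composite primary key: existing task ids are loaded into a set for O(1) membership checks, and an `IntegrityError` from a row that another process inserted first is treated as "already created".

```python
import sqlite3
from typing import Iterable


def create_missing_task_instances(
    conn: sqlite3.Connection, dag_id: str, run_date: str, task_ids: Iterable[str]
) -> int:
    """Hypothetical sketch: insert only missing TIs and tolerate a concurrent creator."""
    # A set of existing task_ids gives fast lookups for DAGs with many tasks.
    existing = {
        row[0]
        for row in conn.execute(
            "SELECT task_id FROM task_instance WHERE dag_id = ? AND execution_date = ?",
            (dag_id, run_date),
        )
    }
    created = 0
    for task_id in task_ids:
        if task_id in existing:
            continue
        try:
            conn.execute(
                "INSERT INTO task_instance (dag_id, task_id, execution_date) VALUES (?, ?, ?)",
                (dag_id, task_id, run_date),
            )
            created += 1
        except sqlite3.IntegrityError:
            # Someone else (e.g. the scheduler) inserted this row first; nothing to do.
            pass
    conn.commit()
    return created
```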
Shekhar Singh d2540e6592
Add airflow connections export command (#9856) (#10081) 2020-08-07 13:27:11 +02:00
Jarek Potiuk 9e3b7d9a1e
Pylint checks should be way faster now (#10207)
* Pylint checks should be way faster now

Instead of running separate pylint checks for tests and main source
we are running a single check now. This is possible thanks to a
nice hack - we have pylint plugin that injects the right
"# pylint: disable=" comment for all test files while reading
the file content by astroid (just before tokenization)

Thanks to that we can also separate out pylint checks
to a separate job in CI - this way all pylint checks will
be run in parallel to all other checks effectively halfing
the time needed to get the static check feedback and potentially
cancelling other jobs much faster.

* fixup! Pylint checks should be way faster now
2020-08-07 11:07:15 +02:00
Leon Yuan 24c8e4c2d6
Changes to all the constructors to remove the args argument (#10163) 2020-08-06 13:42:51 +01:00
j-y-matsubara 73ad5a4ba8
Fix BaseSensorOperator soft_fail mode to respect downstream tasks trigger_rule (#8867)
Fixes BaseSensorOperator to respect the trigger_rule of downstream tasks when soft_fail=True is set.
2020-08-06 13:08:01 +02:00
Tomek Urbaszek 010322692e
Improve handling Dataproc cluster creation with ERROR state (#9593)
Handle cluster in DELETING state

Extend tests

fixup! Extend tests

fixup! fixup! Extend tests

fixup! fixup! fixup! Extend tests
2020-08-06 10:31:35 +02:00
QP Hou 1e36666695
prevent DAG callback exception from crashing scheduler (#10096) 2020-08-06 10:31:10 +02:00
Ephraim Anierobi 1437cb7495
Add correct signatures for operators in google provider package (#10144) 2020-08-04 19:08:34 +02:00
Cooper Gillan 4a0fdb6308
Use conn_name_attr for SqliteHook connection (#10156)
The DbApiHook allows for a conn_name_attr to be changed in subclasses;
however, SqliteHook's `get_conn` method always read the main class
attribute. Find the correct attribute and use it to establish the
connection.

Allow attr setting outside init for test case

Closes #10147
2020-08-04 14:13:41 +02:00
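A minimal sketch of resolving the connection id through `conn_name_attr`, assuming a hypothetical, heavily simplified `DbApiHookSketch` (the real DbApiHook and SqliteHook do more, including opening the connection).

```python
class DbApiHookSketch:
    """Hypothetical, simplified stand-in for DbApiHook's connection-id handling."""

    conn_name_attr = "sqlite_conn_id"
    default_conn_name = "sqlite_default"

    def __init__(self, **kwargs):
        # Store the connection id under whatever attribute name the class declares.
        setattr(self, self.conn_name_attr, kwargs.get(self.conn_name_attr, self.default_conn_name))

    def get_conn_id(self) -> str:
        # Look the id up via conn_name_attr instead of hard-coding self.sqlite_conn_id.
        return getattr(self, self.conn_name_attr)


class CustomSqliteHook(DbApiHookSketch):
    # A subclass that renames the attribute still resolves the right id.
    conn_name_attr = "custom_conn_id"
    default_conn_name = "custom_default"
```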
Johan Eklund 000287753b
Improve Typing coverage of amazon/aws/athena (#10025)
Co-authored-by: Johan Eklund <jeklund@zynga.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-08-03 21:58:11 +01:00
Ephraim Anierobi 201823b91a
Add Legacy command displaying new CLI counterparts (#10115) 2020-08-03 19:10:30 +02:00
Anike Arni 53ada6e791
Add S3KeysUnchangedSensor (#9817)
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
2020-08-03 18:49:03 +02:00
Jarek Potiuk 76c3e21a25
Moved webserver background to Quarantine (#10114) 2020-08-03 13:10:24 +02:00
Tomek Urbaszek 6efa1b9cb7
Add additional Cloud Datastore operators (#10032)
This PR adds more operators for Google Cloud Datastore
service. It also adds missing tests and how-to guides.
2020-08-03 12:39:05 +02:00
Kaxil Naik 4e3799fec4
[AIRFLOW-4541] Replace os.mkdirs usage with pathlib.Path(path).mkdir (#10117)
`makedirs` is used in `airflow.utils.file.mkdirs` - it is now replaced with pathlib, as we run on Python 3.5+
2020-08-02 11:46:03 +01:00
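A minimal sketch of a pathlib-based `mkdirs`, assuming the same `(path, mode)` shape as the helper named above; it is an illustration, not the exact Airflow implementation.

```python
import tempfile
from pathlib import Path


def mkdirs(path: str, mode: int = 0o755) -> None:
    """Create path and any missing parents; do nothing if it already exists."""
    # mode applies to the leaf directory only; missing parents get default
    # permissions, and both are subject to the process umask.
    Path(path).mkdir(mode=mode, parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, "logs", "dag_id", "task_id")
    mkdirs(str(target))
    mkdirs(str(target))  # calling again is a no-op rather than an error
    print(target.is_dir())  # True
```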
HasanJ 1d68cd2929
Make conn_id unique in Connections table (#9067) 2020-08-02 12:25:09 +02:00
Ryan Yuan 85c56b1737
Add missing params to GCP Pub/Sub creation_subscription (#10106)
Add missing params to GCP Pub/Sub creation_subscription hook/operator
2020-08-02 10:59:46 +02:00
Kamil Olszewski b79466c12f
Fix sensor not providing arguments for GCSHook (#10074)
Co-authored-by: Kamil Olszewski <kamil.olszewski@polidea.com>
2020-08-02 09:06:12 +02:00
Kamil Olszewski 4ee35d0279
Fix hook not passing gcp_conn_id to base class (#10075)
Co-authored-by: Kamil Olszewski <kamil.olszewski@polidea.com>
2020-08-02 09:05:06 +02:00
Shekhar Singh ca3fa76b17
Add unit tests for mlengine_prediction_summary (#10022) 2020-08-02 08:59:07 +02:00
Cooper Gillan 2b8dea64e9
Fix typo in Athena sensor retries (#10079)
Understanding that it is an attribute name, which could have downstream
consequences, correct the spelling of max_retries and reword some of the
docstring.
2020-08-01 07:22:04 +02:00
Tomek Urbaszek 4c84661adb
Split Display Video 360 example into smaller DAGs (#10077) 2020-07-31 16:00:46 +02:00
Kaxil Naik 03c4351744
Allow `image` in `KubernetesPodOperator` to be templated (#10068)
fixes https://github.com/apache/airflow/issues/10063
2020-07-31 14:25:08 +01:00
Felix Uellendall 3f2eee15f9
Fix PythonVirtualenvOperator not working with Airflow context (#9394)
- automatically add dill requirement if use_dill=True
- add howto docs
- refactor

Co-authored-by: Luis Magana <maganaluis@users.noreply.github.com>
2020-07-30 10:32:52 +02:00
chipmyersjr ba2d6408e6
Add typing for jira provider (#10005) 2020-07-29 17:26:08 +02:00
guptamyr 1508c43ec9
Adding new SageMaker operator for ProcessingJobs (#9594) 2020-07-29 12:24:31 +02:00
Kamil Breguła c70c38e9ef
Move e-mail operator to core (#10013) 2020-07-29 00:27:12 +02:00
Tomek Urbaszek c12e33efa9
Use consistent message in SchedulerJob._process_executor_events (#9929) 2020-07-27 13:50:50 +02:00
Kamil Breguła 1d9a634d1d
Add airflow config get-value command (#9932) 2020-07-27 11:11:32 +02:00
Shekhar Singh f149ca9ecf
Add unit tests for samba provider (#9959) 2020-07-27 11:09:19 +02:00
Shekhar Singh 81b87d48ed
Add unit tests for GcpBodyFieldSanitizer in Google providers (#9996) 2020-07-27 01:50:47 +02:00
Shekhar Singh 42fbf9df47
Add unit tests for MsSqlHook (#10006) 2020-07-26 20:55:38 +02:00
Shekhar Singh 0142abb198
Add unit tests for GcpBodyFieldValidator in google cloud providers (#10003) 2020-07-26 10:35:33 +02:00
Kaxil Naik 7d24b088cd
Stop using start_date in default_args in example_dags (2) (#9985) 2020-07-25 19:57:32 +01:00
Alok Shenoy 7cc1c8bc00
Updates the slack WebClient call to use the instance variable - token (#9995)
Co-authored-by: Alok Shenoy <ashenoy@coursera.org>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-07-25 20:54:20 +02:00
Kaxil Naik 8b10a4b35e
Stop using start_date in default_args in example_dags (#9982) 2020-07-25 00:16:25 +01:00
jayrumi 0bf330ba86
Add get_blobs_list method to WasbHook (#9950) 2020-07-24 14:39:23 +02:00
zikun 243b704f47
Add DateTimeSensor (#9697)
* Add DateTimeSensor
2020-07-23 18:53:10 +02:00
Kamil Breguła 33f0cd2657
apply_defaults keeps the function signature for mypy (#9784) 2020-07-22 22:36:27 +02:00
Kamil Breguła 39a0288a47
Add Google Authentication for experimental API (#9848) 2020-07-22 22:33:55 +02:00
Dani Hodovic ac93419d1d
Add response_filter parameter to SimpleHttpOperator (#9885) 2020-07-22 14:03:23 +02:00
retornam c2db0dfeb1
Stricter rules in mypy (#9705) (#9906)
Signed-off-by: Raymond Etornam <retornam@users.noreply.github.com>
2020-07-22 13:56:02 +02:00
Stijn De Haes 1427e4acb4
Update Spark submit operator for Spark 3 support (#8730)
In Spark 3 the exit code is logged with a lowercase
e; in Spark 2 it used an uppercase E.

Also made the exception a bit clearer when running
on Kubernetes.
2020-07-22 10:12:43 +01:00
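The Spark 2 / Spark 3 difference can be absorbed by accepting either case for the first letter. A minimal sketch, assuming the relevant log line contains `Exit code: N` (Spark 2) or `exit code: N` (Spark 3); `parse_exit_code` is a hypothetical helper, not the operator's actual method.

```python
import re
from typing import Optional

# [eE] accepts both the Spark 2 ("Exit code") and Spark 3 ("exit code") spelling.
_EXIT_CODE_RE = re.compile(r"\s*[eE]xit code: (\d+)")


def parse_exit_code(line: str) -> Optional[int]:
    """Return the exit code found in a log line, or None if there is none."""
    match = _EXIT_CODE_RE.search(line)
    return int(match.group(1)) if match else None
```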
chipmyersjr f60940d3ec
Add unit test for test_sql_to_gcs (#9920) 2020-07-22 08:48:31 +02:00
Nathan Hadfield c4244e18bb
Fix calling `get_client` in BigQueryHook.table_exists (#9916)
Pass the `project_id` argument to the `get_client` method;
otherwise this call always falls back to the default connection id.
2020-07-22 08:10:56 +02:00
Kamil Olszewski 5eacc16420
Add support for impersonation in GCP hooks (#9915)
Co-authored-by: Kamil Olszewski <kamil.olszewski@polidea.com>
2020-07-22 01:02:32 +02:00
Tomek Urbaszek bff713750f
Add function to get current context (#9631)
Support for getting current context at any code location that runs
under the scope of BaseOperator.execute function. This functionality
is part of AIP-31.

Co-authored-by: Jonathan Shir <jonathan.shir@databand.ai>
2020-07-21 18:45:09 +02:00
Shekhar Singh eb1aedd2df
Add unit tests for CassandraTableSensor, CassandraRecordSensor and WebHdfsSensor (#9874) 2020-07-21 14:14:29 +02:00
Tomek Urbaszek 95632ce8ed
Fix dag.clear usages after change from #9824 (#9909)
#9824 introduced changes in the signature of dag.clear(...),
but not all call sites were adjusted.
2020-07-21 12:47:39 +02:00
Tomek Urbaszek 1cfdebf5f8
Fix insert_job method of BigQueryHook (#9899)
The method should submit the job and wait for the result.
Closes: #9897
2020-07-21 12:16:21 +02:00
zikun 9c518fe937
TimeSensor should respect DAG timezone (#9882) 2020-07-20 17:19:08 +01:00
Kaxil Naik 84b85d8acc
Update Serialized DAGs in Webserver when DAGs are Updated (#9851)
Before this change, if DAG Serialization was enabled, the Webserver would not update the DAGs once they were fetched from the DB. The default worker_refresh_interval was `30`, so updated DAGs were only pulled (when needed) after the gunicorn workers were restarted.

This change will allow us to have a larger worker_refresh_interval (e.g. 30 mins or even 1 day)
2020-07-20 12:45:18 +01:00
Kaxil Naik 1a32c45126
Don't Update Serialized DAGs in DB if DAG didn't change (#9850)
We should not update the "last_updated" column unnecessarily. This is the first of a few optimizations to DAG Serialization that would also aid in DAG Versioning
2020-07-20 12:31:05 +01:00
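One way to avoid touching "last_updated" when nothing changed is to compare a hash of the serialized DAG before writing. A minimal sketch under stated assumptions: the serialized DAG is a JSON-compatible dict, an in-memory dict stands in for the database table, and `dag_hash`, `write_if_changed` and `store` are hypothetical names, not Airflow's API.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict


def dag_hash(serialized: Dict[str, Any]) -> str:
    # sort_keys makes the hash independent of dict insertion order
    payload = json.dumps(serialized, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def write_if_changed(store: Dict[str, Dict[str, Any]], dag_id: str,
                     serialized: Dict[str, Any]) -> bool:
    """Store `serialized` under `dag_id` only if its hash differs.

    Returns True when a write happened (and last_updated was bumped).
    """
    new_hash = dag_hash(serialized)
    row = store.get(dag_id)
    if row is not None and row["dag_hash"] == new_hash:
        return False  # unchanged: leave last_updated alone
    store[dag_id] = {
        "data": serialized,
        "dag_hash": new_hash,
        "last_updated": datetime.now(timezone.utc),
    }
    return True
```

Storing the hash next to the data also gives a cheap way to detect changes later, which is the kind of building block DAG Versioning could reuse.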
Kamil Breguła 9126f7061f
Deprecate experimental API (#9888) 2020-07-20 12:03:46 +02:00
vanka56 5013fda8f0
Add drop_partition functionality for HiveMetastoreHook (#9472) 2020-07-20 09:37:48 +02:00
Johan Eklund 297e34afa0
Add log of affected sql rows in PostgresOperator (#9841)
Co-authored-by: Johan Eklund <jeklund@zynga.com>
Co-authored-by: Tomek Urbaszek <turbaszek@gmail.com>
2020-07-17 09:06:07 +02:00
Andy 9c68e7cc6f
Add Snowflake support to SQL operator and sensor (#9843)
* Add Snowflake support to SQL operator and sensor
* Add test for conn_type to valid hook mapping
* Improve code quality for conn type mapping test
2020-07-17 09:04:14 +02:00
Jarek Potiuk faec41ec9a
Group CI scripts in subdirectories (#9653)
Reviewed the scripts and removed some of the old unused ones.
2020-07-16 18:05:35 +02:00
Kamil Breguła f4067b65a5
Fix Experimental API Client (#9849) 2020-07-16 15:41:10 +02:00
Kaxil Naik 31cab8ffbb
Fix DagRun.conf when using trigger_dag API (#9853)
fixes https://github.com/apache/airflow/issues/9852
2020-07-16 12:04:05 +01:00
Mariusz Strzelecki 2577f9334a
Fix S3FileTransformOperator to support S3 Select transformation only (#8936)
Documentation for S3FileTransformOperator states that users
can skip the transformation script if an S3 Select expression is
specified, but in this case the created file is always
zero bytes long.

This fix changes the behaviour so that, when no transformation
is given, the source file (a result of S3 Select) is uploaded.
2020-07-16 10:46:01 +02:00
Kaxil Naik d008ff669d
Rename DagBag.store_serialized_dags to DagBag.read_dags_from_db (#9838) 2020-07-15 22:28:04 +01:00
Chao-Han Tsai b01d95ec22
Change DAG.clear to take dag_run_state (#9824)
* Change DAG.clear to take dag_run_state

* fix lint

* fix tests

* assign var

* extend original clause
2020-07-15 13:08:18 -07:00
Nathan Hadfield 770de53eb5
BigQueryTableExistenceSensor needs to specify keyword arguments (#9832) 2020-07-15 20:49:50 +02:00
Kaxil Naik 2d124417e6
Fix Writing Serialized Dags to DB (#9836) 2020-07-15 18:35:59 +01:00
Zachary Manesiotis 2d8dbacdf6
Add CloudVisionDeleteReferenceImageOperator (#9698) 2020-07-15 15:14:10 +02:00
Shoichi Kagawa 52b6efe1ec
Add option to delete by prefix to S3DeleteObjectsOperator (#9350)
Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
2020-07-15 14:57:18 +02:00
Sam Wheating 9f017951b9
Add Google Deployment Manager Hook (#9159)
Co-authored-by: Ephraim Anierobi <4122866+ephraimbuddy@users.noreply.github.com>
2020-07-15 12:51:07 +02:00
yongheng.liu a2c5389a60
Add kylin operator (#9149)
Co-authored-by: yongheng.liu <yongheng.liu@kyligence.io>
2020-07-14 18:25:05 +02:00
royberkoweee ed5004cca7
Allow `replace` flag in gcs_to_gcs operator. (#9667)
* Allow `replace` flag in gcs_to_gcs operator.
If we are not replacing, list all files in the destination GCS bucket and only keep those files which are present in the source GCS bucket but not in the destination GCS bucket.
2020-07-14 18:21:31 +02:00
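The filtering rule for `replace=False` is a set difference on object names. A minimal sketch, assuming object names are compared as-is between the two buckets (no prefix rewriting); `objects_to_copy` is a hypothetical helper, not the operator's code.

```python
from typing import Iterable, List


def objects_to_copy(source_objects: Iterable[str],
                    destination_objects: Iterable[str],
                    replace: bool) -> List[str]:
    """Return the source object names that should be copied.

    With replace=True everything is copied (existing destination objects
    are overwritten); with replace=False only objects missing from the
    destination bucket are kept. Source order is preserved.
    """
    sources = list(source_objects)
    if replace:
        return sources
    existing = set(destination_objects)
    return [name for name in sources if name not in existing]
```

Building a set from the destination listing keeps each membership check O(1), which matters when the destination bucket holds many objects.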
Kaxil Naik 0eb5020fda
Remove unnecessary comprehension (#9805) 2020-07-14 09:04:14 +01:00
Tim Healy 68925904e4
Add multiple file upload functionality to GCS hook (#8849)
Co-authored-by: Timothy Healy <healz@timothys-air.lan>
2020-07-13 22:33:38 +02:00
Tobiasz Kędzierski d31e8a3250
Add DAG Source endpoint (#9322) 2020-07-13 19:50:03 +02:00
Chao-Han Tsai 7f64f2d00b
Backfill reset_dagruns set DagRun to NONE state (#9756) 2020-07-13 10:33:15 -07:00
Kamil Breguła 2b12c304f6
Improve typing coverage in scheduler_job.py (#9783) 2020-07-13 11:11:33 +02:00
takunnithan 5ddbbf1f59
Add API Endpoint - DagRuns Batch (#9556)
Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com>
2020-07-13 10:54:10 +02:00
Jarek Potiuk 43cb059e96
Fixes failing formatting of DAG file containing {} in docstring (#9779) 2020-07-12 18:42:36 +02:00
Kanthi 815a4697dc
Unit tests for Jenkins hook (#9767) 2020-07-12 18:41:19 +02:00
Mauricio De Diana 1de78e8f97
Add Google Stackdriver link (#9765) 2020-07-12 14:32:00 +02:00
Kanthi a6b04d7b9a
Add tests for yandex hook (#9665) 2020-07-11 17:47:07 +02:00
Kamil Breguła aee000c0eb
Check project structure in sensors/transfers directories (#9764) 2020-07-11 16:14:22 +02:00
Kamil Breguła 092d33f298
Fix StackdriverTaskHandler + add system tests (#9761)
Co-authored-by: Tomek Urbaszek <tomasz.urbaszek@polidea.com>
2020-07-11 16:11:15 +02:00
Jacek Kołodziej 0873070e08
Mask other forms of password arguments in SparkSubmitOperator (#9615)
This is a follow-up to #6917 before modifying the masking code.
Related: #9595.
2020-07-11 12:49:15 +02:00
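To illustrate what "other forms of password arguments" means, here is a simplified masking sketch. It is an assumption-laden illustration, not the operator's actual regex: it masks the value that follows any argument containing "password" or "secret" (case-insensitive), whether written as `key=value` or `key value`, and does not handle quoted values containing spaces. `mask_cmd` is a hypothetical name.

```python
import re
from typing import List

# Group 1: a token containing "password" or "secret", followed by "=" or
# whitespace. The value after it (up to the next whitespace) is replaced.
_SECRET_RE = re.compile(r"(\S*?(?:password|secret)\S*?(?:=|\s+))\S+",
                        re.IGNORECASE)


def mask_cmd(cmd: List[str]) -> str:
    """Join a command for logging, hiding password/secret values."""
    return _SECRET_RE.sub(r"\1******", " ".join(cmd))
```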
chipmyersjr 3cc5756d04
Add unit tests for mlengine_operator_utils (#9702) 2020-07-11 00:26:00 +02:00
Tomek Urbaszek ecf2f8499b
Use namedtuple for TaskInstanceKeyType (#9712)
* Use namedtuple for TaskInstanceKeyType
2020-07-10 15:05:51 +02:00
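A `NamedTuple` keeps the plain-tuple behaviour (unpacking, equality, hashing) while adding named fields. A minimal sketch; the four fields are an assumption about what identifies a task instance try, not a copy of the Airflow definition.

```python
from datetime import datetime
from typing import NamedTuple


class TaskInstanceKeyType(NamedTuple):
    """Identifies one try of one task instance (fields are assumed)."""

    dag_id: str
    task_id: str
    execution_date: datetime
    try_number: int
```

Existing code that unpacks the key as a 4-tuple keeps working, while new code can read `key.task_id` instead of `key[1]`.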
Ash Berlin-Taylor dcdc7c1fa9
Pre-create Celery db result tables before running Celery worker (#9719)
Otherwise at large scale this can end up with some tasks failing as they
try to create the result table at the same time.

This was always possible before, just exceedingly rare, but in large
scale performance testing where I create a lot of tasks quickly
(especially in my HA testing) I hit this a few times.

This is also only a problem for fresh installs/clean DBs, as once these
tables exist the possible race goes away.

This is the same fix from #8909, just for runtime, not test time.
2020-07-09 19:40:17 +01:00
Kamil Breguła 8517af696f
Fix warning about incompatible plugins (#9704)
One condition was incorrect and caused a warning when the plugin targets both admin and FAB Flask.
2020-07-09 17:57:08 +02:00
Aneesh Joseph 13a827d80f
Ensure Kerberos token is valid in SparkSubmitOperator before running `yarn kill` (#9044)
Run kinit before yarn kill if a keytab and principal are provided
2020-07-09 10:39:16 +01:00
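The ordering described above can be sketched as building the command list up front. A minimal sketch, assuming the standard `kinit -kt <keytab> <principal>` and `yarn application -kill <application_id>` invocations; `build_kill_commands` is hypothetical and does not reproduce how the operator actually obtains the Kerberos ticket.

```python
from typing import List, Optional


def build_kill_commands(application_id: str,
                        keytab: Optional[str] = None,
                        principal: Optional[str] = None) -> List[List[str]]:
    """Return the commands to run, in order, to kill a YARN application.

    When both a keytab and a principal are given, `kinit` comes first so
    that `yarn application -kill` runs with a valid Kerberos ticket.
    """
    commands: List[List[str]] = []
    if keytab and principal:
        commands.append(["kinit", "-kt", keytab, principal])
    commands.append(["yarn", "application", "-kill", application_id])
    return commands
```

Each command could then be executed with `subprocess.run(cmd, check=True)`, so a failed `kinit` stops the sequence before the kill is attempted.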
Kamil Breguła 8b94ace597
Add read-only endpoints for DAG Model (#9045)
Co-authored-by: Tomek Urbaszek <turbaszek@gmail.com>
Co-authored-by: Tomek Urbaszek <tomasz.urbaszek@polidea.com>
2020-07-09 07:28:34 +02:00
Vinay G B dfe8337ca2
YAML file supports extra json parameters (#9549)
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
Co-authored-by: Vinay <vinay@synctactic.ai>
2020-07-08 17:29:49 +02:00
Kaxil Naik 2f31b3060e
Get Airflow configs with sensitive data from Secret Backends (#9645) 2020-07-08 13:29:54 +01:00
Omair Khan 7a4988a3c7
Add Dag Runs CRUD endpoints (#9473) 2020-07-08 13:12:26 +02:00
lindsable 07b81029eb
Allow AWSAthenaHook to get more than 1000/first page of results (#6075)
Co-authored-by: Dylan Joss <dylanjoss@gmail.com>
2020-07-08 11:41:52 +01:00
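Getting past the first page means following `NextToken` until it is no longer returned. A minimal sketch, assuming `fetch_page(token)` is a hypothetical callable returning a GetQueryResults-shaped dict (`{"ResultSet": {"Rows": [...]}, "NextToken": ...}`, with `NextToken` omitted on the last page); it is not the hook's actual code.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_result_rows(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield rows from every page, following NextToken until it is absent."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page["ResultSet"]["Rows"]
        token = page.get("NextToken")
        if not token:
            break
```

With boto3, the same loop can be delegated to the client's paginator for `get_query_results` instead of handling tokens by hand.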
Ephraim Anierobi 23f80f34ad
Move gcs & wasb task handlers to their respective provider packages (#9714) 2020-07-08 11:30:16 +02:00
chamcca 564192c162
Add AWS StepFunctions integrations to the aws provider (#8749) 2020-07-08 11:25:16 +02:00
Omair Khan c713d92d88
Add health API endpoint (#8144) (#9277) 2020-07-08 09:36:50 +02:00
Tomek Urbaszek 4ad3bb53ff
Fix _process_executor_events method to use in-memory try_number (#9692) 2020-07-07 16:54:43 +02:00
Kaxil Naik 631ac484f1
Some Pylint fixes in airflow/models/taskinstance.py (#9674) 2020-07-06 20:32:02 +01:00
Ephraim Anierobi e764ea5811
Update FlaskAppBuilder to v3 (#9648) 2020-07-06 20:45:13 +02:00