The region parameter is required for some of the Google Dataproc operators
and it should be provided by users explicitly, to avoid creating
data-intensive tasks in an unintended default location.
We've observed the tests for the last couple of weeks and it seems
most of the tests marked with the "quarantine" marker are succeeding
in a stable way (https://github.com/apache/airflow/issues/10118).
The removed tests have a success ratio of > 95% (20 runs without
problems), and this was verified a week ago as well,
so it seems they are rather stable.
There are literally a few that are either failing or causing
the Quarantined builds to hang. I manually reviewed the
master tests that failed over the last few weeks and added the
tests that are causing the build to hang.
It seems that stability has improved - which might be caused
by some temporary problems present when we marked the quarantined builds,
or by a too "generous" way of marking tests as quarantined, or
maybe the improvement comes from #10368, as the docker engine
and machines used to run the builds in GitHub experience far
less load (image builds are executed in separate builds), so
resource usage might be decreased. Another reason
might be GitHub Actions stability improvements.
Or simply those tests are more stable when run in isolation.
We might still add failing tests back as soon as we see them behaving
in a flaky way.
The remaining quarantined tests that need to be fixed:
* test_local_run (often hangs the build)
* test_retry_handling_job
* test_clear_multiple_external_task_marker
* test_should_force_kill_process
* test_change_state_for_tis_without_dagrun
* test_cli_webserver_background
We also move some of those tests to the "heisentests" category.
Those tests run fine in isolation but fail
the builds when run with all other tests:
* TestImpersonation tests
We might find that those heisentests can be fixed, but for
now we are going to run them in isolation.
Also - since those quarantined tests are failing more often,
the "num runs" to track for them has been decreased to 10,
to keep track of the last 10 runs only.
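For context, a minimal sketch of how a test ends up in this category,
assuming the pytest marker is literally named `quarantined` (the CI jobs
then select or deselect it with `-m`):

```python
import pytest


@pytest.mark.quarantined
def test_sometimes_flaky_scheduler_behaviour():
    # Runs only in the dedicated "Quarantined" CI job, selected with:
    #   pytest -m quarantined
    # and excluded from the normal test suites with:
    #   pytest -m "not quarantined"
    assert True
```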
DataprocCreateCluster now requires:
- cluster config
- cluster name
- project id
This way users don't have to pass project_id twice
(in the cluster definition and as a parameter). The cluster object
is built in the hook's create_cluster method.
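A hedged sketch of the resulting call, with placeholder values; the
Cluster object is assembled from these pieces inside the hook, so
project_id is no longer repeated in the cluster definition:

```python
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateClusterOperator,
)

create_cluster = DataprocCreateClusterOperator(
    task_id="create_cluster",
    project_id="my-project",        # placeholder
    cluster_name="etl-cluster",     # placeholder
    region="europe-west1",          # explicit region, no implicit default location
    cluster_config={
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
    },
)
```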
BATS has additional libraries of asserts that make it much more
straightforward and pleasant to write tests for bash scripts.
There is no Dockerfile from BATS that contains those, so we
had to build our own (but it follows the same structure
as #9652 - where we keep our dev docker image
sources inside our repository and the generated docker images
in the "apache/airflow:<tool>-CALVER-TOOLVER" format).
We have more BATS unit tests to add - following #10576 -
and this change will be of great help.
If we run this test
(TestTriggerRuleDep::test_get_states_count_upstream_ti specifically)
more than once without clearing the DB in between, it fails due to a
unique constraint violation.
The `@provide_session` wrapper will already commit the transaction when
returned, unless an explicit session is passed in -- removing this
parameter changes the behaviour to be:
- If session explicitly passed in: don't commit (caller's
responsibility)
- If no session passed in, `@provide_session` will commit for us already.
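A hedged sketch of the two code paths described above (function and
model names are illustrative, not the actual test code):

```python
from airflow.utils.session import provide_session


@provide_session
def set_state(task_instance, state, session=None):
    # Illustrative body - updates a row via the supplied session.
    task_instance.state = state
    session.merge(task_instance)


# Path 1: caller owns the session, so committing is the caller's responsibility:
#   with create_session() as session:
#       set_state(ti, "success", session=session)
#
# Path 2: no session passed in - @provide_session creates one and commits on return:
#   set_state(ti, "success")
```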
Add jupytercmd in the Qubole Operator, which fires a JupyterNotebookCommand to the Jupyter notebooks running on the user's QDS account. Along with this, we have fixed a minor bug that caused tasks to fail when --notify is set in the Qubole Operator.
Co-authored-by: Aaditya Sharma <asharma@qubole.com>
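A hypothetical usage sketch, assuming the new command type is selected
via command_type="jupytercmd"; the notebook-related keyword arguments
below are placeholders, not confirmed parameter names:

```python
from airflow.providers.qubole.operators.qubole import QuboleOperator

run_notebook = QuboleOperator(
    task_id="run_notebook",
    command_type="jupytercmd",      # new command type added by this change
    qubole_conn_id="qubole_default",
    # The kwargs below are illustrative placeholders for the notebook
    # location and target cluster; check the QDS SDK for the exact names.
    path="notebooks/daily_report.ipynb",
    cluster_label="default",
    notify=True,                    # the --notify flag whose bug was fixed here
)
```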
Inspired by the Google Shell Guide, which mentions
separating package names with ::, I realized that this was
one of the missing pieces in our bash scripts.
While we already had packages (in the libraries folders),
it's been difficult to tell which function lives where.
By introducing packages - equal to the library file name -
we are *almost* at the level of a structured language, and
it's easier to find the functions you are looking for.
Way easier, in fact.
Part of #10576
(cherry picked from commit cc551ba793)
(cherry picked from commit 2bba276f0f06a5981bdd7e4f0e7e5ca2fe84f063)
* Implement Google Shell Conventions for breeze script … (#10651)
Part of #10576
First (and the biggest) of the series of commits to introduce
Google Shell Conventions in our bash scripts.
This one covers breeze - about the biggest and most complex
script we have - so it is rather huge, but it is difficult to split it
into smaller pieces.
The rules implemented (from the conventions):
* constants and exported variables are CAPITALIZED, whereas
local/temporary variables are lowercase
* following the shell guide, once all the variables are set to their
final values (either from exported variables, calculation or --switches),
there is a single function that makes all the variables read-only. That
helped to clean up a lot of places where the same functions were called
several times, or where variables were defined in a few places. Now the
behavior should be rather consistent and we should easily catch such
duplications
* function headers (following the guide) explaining the arguments,
variables expected, and variables modified by the functions
* setting the variables as read-only also helped to clean up the "ifs"
where we often had ":=}" in variables and != "" or == "" comparisons. Those
are replaced with `=}`, and the tests are replaced with `-n` and `-z` - also
following the shell guide (readonly helped to detect and clean up all
such cases). This should also be much more robust in the future.
* reorganized initialization of those constants and variables - simplified
a few places where initialization was overlapping. It should be much more
straightforward and clean now
* a number of internal function breeze variables are "local" - this
helps avoid accidental variable overwriting and keeps things localized
* trap_add function is separated out to help in cases where we had
several traps handling the same signals.
(cherry picked from commit 46c8d6714c)
(cherry picked from commit c822fd7b4bf2a9c5a9bb3c6e783cbea9dac37246)
* fixup! Implement Google Shell Conventions for breeze script … (#10651)
* Revert "Add packages to function names in bash (#10670)"
This reverts commit cc551ba793.
* Revert "Implement Google Shell Conventions for breeze script … (#10651)"
This reverts commit 46c8d6714c.
Inspired by the Google Shell Guide, which mentions
separating package names with ::, I realized that this was
one of the missing pieces in our bash scripts.
While we already had packages (in the libraries folders),
it's been difficult to tell which function lives where.
By introducing packages - equal to the library file name -
we are *almost* at the level of a structured language, and
it's easier to find the functions you are looking for.
Way easier, in fact.
Part of #10576
Part of #10576
First (and the biggest) of the series of commits to introduce
Google Shell Conventions in our bash scripts.
This one covers breeze - about the biggest and most complex
script we have - so it is rather huge, but it is difficult to split it
into smaller pieces.
The rules implemented (from the conventions):
* constants and exported variables are CAPITALIZED, whereas
local/temporary variables are lowercase
* following the shell guide, once all the variables are set to their
final values (either from exported variables, calculation or --switches),
there is a single function that makes all the variables read-only. That
helped to clean up a lot of places where the same functions were called
several times, or where variables were defined in a few places. Now the
behavior should be rather consistent and we should easily catch such
duplications
* function headers (following the guide) explaining the arguments,
variables expected, and variables modified by the functions
* setting the variables as read-only also helped to clean up the "ifs"
where we often had ":=}" in variables and != "" or == "" comparisons. Those
are replaced with `=}`, and the tests are replaced with `-n` and `-z` - also
following the shell guide (readonly helped to detect and clean up all
such cases). This should also be much more robust in the future.
* reorganized initialization of those constants and variables - simplified
a few places where initialization was overlapping. It should be much more
straightforward and clean now
* a number of internal function breeze variables are "local" - this
helps avoid accidental variable overwriting and keeps things localized
* trap_add function is separated out to help in cases where we had
several traps handling the same signals.
* Updated REST API call so GET requests pass payload in query string instead of request body
* Updated comparisons to use `in` to follow better standards
* Added whitespace for pylint failure
* Update Databricks hooks tests to reflect new payload
* Fixed trailing whitespace in unit test
Co-authored-by: Steven Yu <steven@databricks.com>
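A minimal sketch of the difference, using the generic `requests` library
rather than the exact hook code (endpoint and payload are placeholders):

```python
import requests

endpoint = "https://example.cloud.databricks.com/api/2.0/jobs/runs/get"
payload = {"run_id": 42}

# Before: GET with the payload in the request body - some servers/proxies ignore it.
#   requests.get(endpoint, json=payload)

# After: GET with the payload encoded in the query string.
response = requests.get(endpoint, params=payload)
```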
We have already fixed a lot of problems that were marked
with those; IntelliJ has also gotten a bit smarter about not
detecting false positives, as well as understanding more
pylint annotations. Wherever the problem remained
we replaced it with # noqa comments - as these are
also well understood by IntelliJ.
Perf_kit was a separate folder and it was a problem when we tried to
build it from Docker-embedded sources, because there was a hidden,
implicit dependency between tests (conftest) and perf.
Perf_kit is now moved to tests, to be available in the CI image
also when we run tests without the sources mounted.
This is changing back in #10441 and we need to move perf_kit
for it to work.
* fix: 🐛 Wrong S3 URI on COPY query
The S3 URI on COPY query was appending the target Redshift table to the
S3 object key.
* test: 💍 Fixed typo on test query
The COPY query that the operator used is the same query the test uses.
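A hedged sketch of the corrected query construction (names and the
credentials clause are placeholders); the point is that the S3 URI must
contain only the object key, not the target table:

```python
schema, table = "public", "events"
s3_bucket, s3_key = "my-bucket", "exports/events.csv"

# Buggy behaviour appended the table to the key, roughly:
#   FROM 's3://my-bucket/exports/events.csv/public.events'
copy_query = f"""
    COPY {schema}.{table}
    FROM 's3://{s3_bucket}/{s3_key}'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
    CSV;
"""
```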
- refactor/change azure_container_instance to use AzureBaseHook
- add info to operators-and-hooks-ref.rst
- add howto docs for connecting to azure
- add auth mechanism via json config
- add azure conn type
* Add Amazon SES hook
* Add SES Hook to operators-and-hooks documentation.
* Fix arguments for parent class constructor call (PR feedback)
* Fix indentation in operators-and-hooks documentation
* Fix mypy error for argument on call to parent class constructor
* Simplify logic on constructor (PR feedback)
* Add custom headers and other relevant options to hook
* Change pylint exception rule to apply it only to function instead of module (PR feedback)
* Fix spellcheck error
* Vendorize airflow.utils.email
* fixup! Vendorize airflow.utils.email
Co-authored-by: Kamil Breguła <kamil.bregula@polidea.com>
* Make Kubernetes tests pass locally
Currently the Kubernetes tests all pass only within Breeze.
This PR makes them read the local path so they can pass on any
system.
* static tests
This allows for all the kinds of verbosity we want, including
writing outputs to output files, and it also works out-of-the-box
in non-interactive git-commit shell scripts. As a side effect
we also have mocked tools in the bats tests, which will allow us to write
more comprehensive unit tests for our bash scripts
(this is a long overdue task).
Part of #10368
Using the parameterized library, add unit test coverage
for JenkinsJobTriggerOperator parameters, covering parameters
as strings or as a list of strings.
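A hedged sketch of the pattern (test name and parameter values are
illustrative, not the shipped test):

```python
import unittest

from parameterized import parameterized

from airflow.providers.jenkins.operators.jenkins_job_trigger import (
    JenkinsJobTriggerOperator,
)


class TestJenkinsJobTriggerOperator(unittest.TestCase):
    @parameterized.expand([
        ("string_param", "--verbose"),
        ("list_param", ["--verbose", "--debug"]),
    ])
    def test_operator_accepts_parameters(self, _, parameters):
        # The first tuple element only names the generated test case;
        # the second is passed in as the `parameters` argument.
        operator = JenkinsJobTriggerOperator(
            task_id="trigger_job",
            job_name="a_job_on_jenkins",
            jenkins_connection_id="jenkins_default",
            parameters=parameters,
        )
        assert operator.parameters == parameters
```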
* Extract get_job_state and fix poke of AwsGlueJobSensor
* Save hook and reuse in GlueJobSensor
* Add descriptions for some functions
* Fix tests according to changed function definition
* Fix too long line
* Add type hints and apply review
* Fix type error
Co-authored-by: JB Lee <jb.lee@sendbird.com>
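A hedged sketch of the shape of the fix - cache the hook and reuse the
extracted get_job_state in poke (names follow the commit description;
treat the module path, class names and signatures as assumptions):

```python
# Illustrative-only sketch, not the actual provider code.
class GlueJobSensorSketch:
    def __init__(self, job_name, run_id, aws_conn_id="aws_default"):
        self.job_name = job_name
        self.run_id = run_id
        self.aws_conn_id = aws_conn_id
        self._hook = None  # hook is created once and reused across pokes

    def get_hook(self):
        if self._hook is None:
            from airflow.providers.amazon.aws.hooks.glue import AwsGlueJobHook
            self._hook = AwsGlueJobHook(aws_conn_id=self.aws_conn_id)
        return self._hook

    def poke(self, context):
        # get_job_state is the helper extracted into the hook by this change.
        state = self.get_hook().get_job_state(self.job_name, self.run_id)
        return state == "SUCCEEDED"
```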
We run this on Webserver startup, and when DAG Serialization is enabled we expect that no files are required - but because of this bug the files were still looked for.
This change will allow users to throw other exceptions (namely `AirflowClusterPolicyViolation`) than `DagCycleException` as part of Cluster Policies.
This can be helpful for running checks on tasks / DAGs (e.g. asserting that a task has a non-airflow owner) and refusing to run tasks that aren't compliant with these checks.
This is meant more as a tool for airflow admins to prevent user mistakes (especially in shared Airflow infrastructure with newbies) than as a strong technical control for security/compliance posture.
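A hedged sketch of what such a policy might look like in
airflow_local_settings.py; the owner check is an illustrative example,
and the hook name (policy vs task_policy) depends on the Airflow version:

```python
from airflow.exceptions import AirflowClusterPolicyViolation


def task_policy(task):
    """Reject tasks that keep the default 'airflow' owner."""
    if task.owner == "airflow":
        raise AirflowClusterPolicyViolation(
            f"Task {task.task_id} in DAG {task.dag_id} must set a non-default owner."
        )
```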
While doing a trigger_dag from the UI, the DagRun gets created first and then the WebServer starts creating TIs. Meanwhile, the Scheduler also picks up the DagRun and starts creating the TIs, which results in an IntegrityError as the primary key constraint gets violated. This happens when a DAG has a large number of tasks.
Also, replace the TIs array with a set for faster lookups for DAGs with many tasks.
* Pylint checks should be way faster now
Instead of running separate pylint checks for tests and the main sources,
we now run a single check. This is possible thanks to a
nice hack - we have a pylint plugin that injects the right
"# pylint: disable=" comment for all test files while the file
content is read by astroid (just before tokenization).
Thanks to that we can also separate out the pylint checks
into a separate job in CI - this way all pylint checks will
run in parallel with all other checks, effectively halving
the time needed to get static check feedback and potentially
cancelling other jobs much faster.
* fixup! Pylint checks should be way faster now
The DbApiHook allows for a conn_name_attr to be changed in subclasses,
however SqliteHook's `get_conn` method always calls the main class
attribute. Find the correct attribute and use it to establish the
connection.
Allow attr setting outside init for the test case.
Closes #10147
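A hedged sketch of the corrected lookup - resolve the connection id
through the class's conn_name_attr instead of hard-coding the attribute
(import path as of the time of this change; the class body is illustrative,
not the shipped hook):

```python
import sqlite3

from airflow.hooks.dbapi_hook import DbApiHook


class SqliteHookSketch(DbApiHook):
    """Illustrative only - mirrors the shape of the fix."""

    conn_name_attr = "sqlite_conn_id"
    default_conn_name = "sqlite_default"

    def get_conn(self):
        # Look up whichever attribute the (sub)class declares as conn_name_attr,
        # rather than always reading a fixed attribute on the base class.
        conn_id = getattr(self, self.conn_name_attr)
        airflow_conn = self.get_connection(conn_id)
        return sqlite3.connect(airflow_conn.host)
```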
Understanding that it is an attribute name, which could have downstream
consequences, correct the spelling of max_retries and reword some of the
docstring.
In Spark 3 the exit code is logged with a lowercase
'e', while Spark 2 used an uppercase 'E'.
Also made the exception a bit clearer when running
on Kubernetes.
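A minimal sketch of a match that accepts both spellings (the exact
pattern used in the hook may differ):

```python
import re

EXIT_CODE_PATTERN = re.compile(r"[eE]xit code: (\d+)")

for line in ["Exit code: 1", "exit code: 1"]:  # Spark 2 vs Spark 3 style
    match = EXIT_CODE_PATTERN.search(line)
    if match and int(match.group(1)) != 0:
        print(f"Spark job failed with exit code {match.group(1)}")
```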
Support for getting current context at any code location that runs
under the scope of BaseOperator.execute function. This functionality
is part of AIP-31.
Co-authored-by: Jonathan Shir <jonathan.shir@databand.ai>
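A hedged usage sketch (the import path shown is the one that eventually
shipped; at the time of the change it may have lived elsewhere):

```python
from airflow.operators.python import get_current_context


def business_logic():
    # Works anywhere in the call stack beneath BaseOperator.execute,
    # without threading the context through every function signature.
    context = get_current_context()
    print(f"Running for execution date {context['ds']}")
```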
Before this change, if DAG Serialization was enabled the Webserver would not update the DAGs once they were fetched from the DB. The default worker_refresh_interval was `30`, so whenever the gunicorn workers were restarted they pulled the updated DAGs as needed.
This change will allow us to have a larger worker_refresh_interval (e.g. 30 mins or even 1 day).
We should not update the "last_updated" column unnecessarily. This is the first of a few optimizations to DAG Serialization that will also aid DAG Versioning.
Documentation for S3FileTransformOperator states that users
can skip the transformation script if an S3 Select expression is
specified, but in this case the created file is always
zero bytes long.
This fix changes the behaviour so that, when no transformation
is given, the source file (the result of S3 Select) is uploaded.
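A hedged example of the now-working combination - an S3 Select
expression with no transform script (bucket/key values and the
expression are placeholders):

```python
from airflow.providers.amazon.aws.operators.s3_file_transform import (
    S3FileTransformOperator,
)

select_only = S3FileTransformOperator(
    task_id="filter_rows",
    source_s3_key="s3://source-bucket/input.csv",
    dest_s3_key="s3://dest-bucket/filtered.csv",
    select_expression="SELECT s.* FROM s3object s WHERE s._1 = 'active'",
    # No transform_script: the S3 Select result itself is now uploaded,
    # instead of a zero-byte file.
    replace=True,
)
```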
* Allow `replace` flag in gcs_to_gcs operator.
If we are not replacing, list all files in the destination GCS bucket and copy only those files which are present in the source GCS bucket but not in the destination GCS bucket.
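A hedged example (class and argument names as in the Google provider;
bucket and object values are placeholders):

```python
from airflow.providers.google.cloud.transfers.gcs_to_gcs import GCSToGCSOperator

sync_new_files_only = GCSToGCSOperator(
    task_id="sync_new_files_only",
    source_bucket="source-bucket",
    source_object="data/*.csv",
    destination_bucket="dest-bucket",
    destination_object="data/",
    replace=False,  # skip objects that already exist in the destination bucket
)
```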
Otherwise at large scale this can end up with some tasks failing as they
try to create the result table at the same time.
This was always possible before, just exceedingly rare, but in large
scale performance testing where I create a lot of tasks quickly
(especially in my HA testing) I hit this a few times.
This is also only a problem for fresh installs/clean DBs, as once these
tables exist the possible race goes away.
This is the same fix from #8909, just for runtime, not test time.