Example output (I forced one of the existing tests to fail)
```
E AssertionError: The expected number of db queries is 3. The current number is 2.
E
E Recorded query locations:
E scheduler_job.py:_run_scheduler_loop>scheduler_job.py:_emit_pool_metrics>pool.py:slots_stats:94: 1
E scheduler_job.py:_run_scheduler_loop>scheduler_job.py:_emit_pool_metrics>pool.py:slots_stats:101: 1
```
This makes it a bit easier to see what the queries are, without having
to re-run with full query tracing and then analyze the logs.
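A rough sketch of how such an assertion can be put together with
SQLAlchemy events (this is not the exact test helper; the name
`assert_queries_count` and the frame filtering below are simplified):
```python
from collections import Counter
from contextlib import contextmanager
import traceback

from sqlalchemy import event


@contextmanager
def assert_queries_count(engine, expected_count):
    """Count queries executed inside the block and report where they came from."""
    locations = Counter()

    def after_cursor_execute(conn, cursor, statement, parameters, context, executemany):
        # Drop SQLAlchemy's internal frames and this listener's own frame,
        # keeping the nearest caller as a short "file:line" tag.
        stack = [f for f in traceback.extract_stack() if "sqlalchemy" not in f.filename]
        caller = stack[-2]
        locations[f"{caller.filename.rsplit('/', 1)[-1]}:{caller.lineno}"] += 1

    event.listen(engine, "after_cursor_execute", after_cursor_execute)
    try:
        yield
    finally:
        event.remove(engine, "after_cursor_execute", after_cursor_execute)

    count = sum(locations.values())
    recorded = "\n".join(f"\t{loc}: {n}" for loc, n in locations.items())
    assert count == expected_count, (
        f"The expected number of db queries is {expected_count}. "
        f"The current number is {count}.\n\nRecorded query locations:\n{recorded}"
    )
```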
Some Airflow users use Kerberos to authenticate their worker
workflows. Airflow has basic Kerberos support for some of the
operators, and it can refresh the temporary Kerberos tokens via
the `airflow kerberos` command.
This change adds support for a Kerberos sidecar that connects
to the Kerberos Key Distribution Center and retrieves the
token using a Keytab that should be deployed as a Kubernetes
Secret. It uses a shared volume to share the temporary token.
The nice thing about setting it up as a sidecar is that the
Keytab is never shared with the workers - the secret is only
mounted by the sidecar, and the workers only have access to
the temporary token.
Depends on #11129
* Allow overrides for pod_template_file
A pod_template_file should be treated as a *template*, not a
steadfast rule.
This PR ensures that users can override individual values set by the
pod_template_file so that the same file can be used for multiple
tasks, as sketched below.
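For example, a per-task override like the following (a hedged sketch
using the Airflow 2.0 `executor_config`/`pod_override` convention;
names, values and import paths here are illustrative) changes only the
resource requests, while everything else still comes from the
pod_template_file:
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from kubernetes.client import models as k8s

with DAG("pod_override_example", start_date=datetime(2020, 1, 1), schedule_interval=None) as dag:
    heavy_task = DummyOperator(
        task_id="heavy_task",
        executor_config={
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        k8s.V1Container(
                            name="base",  # the main worker container defined by the template
                            resources=k8s.V1ResourceRequirements(
                                requests={"memory": "2Gi"},
                            ),
                        )
                    ]
                )
            )
        },
    )
```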
* fix podtemplatetest
* fix name
The webserver did not have a Kubernetes Service Account defined and
while we do not strictly need to use the service account for
anything now, having the Service Account defined allows various
capabilities to be configured for the webserver.
For example, when you are in the GCP environment, you can map
the Kubernetes service account to a GCP one using
Workload Identity, without the need to define any secrets
or perform additional authentication.
Then you can have that GCP service account get
the permissions to write logs to a GCS bucket. Similar mechanisms
exist in AWS, and it also opens up on-premises configurations.
See more at
https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
Co-authored-by: Jacob Ferriero <jferriero@google.com>
This is the second step of making the Production Docker Image more
corporate-environment friendly, by making MySQL client installation
optional. Installing the MySQL client on Debian requires reaching out
to Oracle deb repositories, which might not be approved by security
teams when you build the images. Also, not everyone needs the MySQL
client, or they might want to install their own MySQL or MariaDB
client from their own repositories.
This change separates the installation step out into a
script (with a prod/dev installation option). The prod/dev separation
is needed because MySQL needs to be installed with dev libraries
in the "Build" segment of the image (requiring build essentials
etc.), but in the "Final" segment of the image only runtime libraries
are needed.
Part of #11171
Depends on #11173.
This is the first step of implementing a corporate-environment
friendly way of building images, where in a corporate
environment it might not be possible to install the packages
using the GitHub cache initially.
Part of #11171
The previous query generated SQL like this:
```
WHERE (task_id = ? AND dag_id = ? AND execution_date = ?) OR (task_id = ? AND dag_id = ? AND execution_date = ?)
```
This is fine for one or maybe even 100 TIs, but when testing DAGs at
extreme size (over 21k tasks!) this query was taking forever (162s on
Postgres, 172s on MySQL 5.7).
By changing the query to this:
```
WHERE task_id IN (?,?) AND dag_id = ? AND execution_date = ?
```
the time is reduced to 1s! (1.03s on Postgres, 1.19s on MySQL)
Even on 100 TIs the reduction is large, but the overall time is not
significant (0.01451s -> 0.00626s on Postgres).
Times include SQLAlchemy query construction time (but not the time for
calling filter_for_tis, so it is a like-for-like comparison), not just
the DB query time:
```python
ipdb> start_filter_20k = time.monotonic(); result_filter_20k = session.query(TI).filter(tis_filter).all(); end_filter_20k = time.monotonic()
ipdb> end_filter_20k - start_filter_20k
172.30647455298458
ipdb> in_filter = TI.dag_id == self.dag_id, TI.execution_date == self.execution_date, TI.task_id.in_([o.task_id for o in old_states.keys()]);
ipdb> start_20k_custom = time.monotonic(); result_custom_20k = session.query(TI).filter(in_filter).all(); end_20k_custom = time.monotonic()
ipdb> end_20k_custom - start_20k_custom
1.1882996069907676
```
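For reference, here is a simplified sketch of the IN-based filter
construction (not the actual `filter_for_tis` implementation; it
assumes all TIs share a single dag_id and execution_date, as in the
timing above, and the helper name is made up):
```python
from typing import Iterable

from sqlalchemy import and_

from airflow.models.taskinstance import TaskInstance as TI


def build_ti_filter(dag_id: str, execution_date, tis: Iterable):
    """Build one IN clause on task_id instead of a big OR of per-TI conjunctions."""
    return and_(
        TI.dag_id == dag_id,
        TI.execution_date == execution_date,
        TI.task_id.in_([ti.task_id for ti in tis]),
    )
```
`session.query(TI).filter(build_ti_filter(...)).all()` then returns the
same rows as the OR-based query, but with a single index-friendly
predicate.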
I have also removed the check that was ensuring everything was of the
same type (all TaskInstance or all TaskInstanceKey) as it felt needless
- both types have the three required fields, so the "duck-typing"
approach at runtime (crash if it doesn't have the required property)
plus mypy checks felt Good Enough.
There was a problem with the Mac version of the pgbouncer exporter
created and released previously. This commit releases the latest
version, making sure that Linux Go is used to build the pgbouncer
binary.
From https://docs.python.org/3/library/typing.html#typing.Optional
```
Optional[X] is equivalent to Union[X, None].
```
> Note that this is not the same concept as an optional argument, which is one that has a default. An optional argument with a default does not require the Optional qualifier on its type annotation just because it is optional.
There were incorrect usages where the default was already set to
a string or int value but Optional was still used.
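A minimal illustration of the pattern being fixed (function and
parameter names here are made up):
```python
from typing import Optional


def fetch(timeout: Optional[int] = 30):
    # Misleading: the default is never None, so Optional is not warranted.
    return timeout


def fetch_fixed(timeout: int = 30):
    # The annotation now matches the actual default.
    return timeout


def lookup(key: Optional[str] = None):
    # Correct use of Optional: the default really is None.
    return key
```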
There was a problem with the user in the Git Sync mode of the Helm
Chart, in connection with the git-sync image and the official Airflow
image. Since we are using the official image, most of the
containers run as the "50000" user, but the user in the git-sync
image is 65533, so we have to set it as the
default. We also exposed that value as a parameter, so that
another image could be used here as well.
The celery flower and worker commands have changed in Airflow 2.0.
The Helm Chart supported only the 1.10 versions of those commands, and
this PR fixes it by adding both variants of them.
This can have *extremely* bad consequences. After this change, a jinja2
template like the one below will cause the task instance to fail if the
DAG being executed is not a sub-DAG. This may also display an error on
the Rendered tab of the Task Instance page.
task_instance.xcom_pull('z', key='return_value', dag_id=dag.parent_dag.dag_id)
Prior to the change in this commit, the above template would pull the
latest value for task_id 'z', for the given execution_date, from *any DAG*.
If your task_ids between DAGs are all unique, or if DAGs using the same
task_id always have different execution_date values, this will appear to
act like dag_id=None.
Our current theory is that SQLAlchemy/Python doesn't behave as expected
when comparing `jinja2.Undefined` to `None`.
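A small standalone illustration of that theory (plain jinja2, no
Airflow involved):
```python
from jinja2 import Undefined

value = Undefined(name="parent_dag")

print(value is None)   # False -- an Undefined object is not None
print(value == None)   # False -- Undefined only compares equal to another Undefined
print(bool(value))     # False -- yet it is falsy, which is easy to mistake for None
```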
* Added support for encrypted private keys in SSHHook
* Fixed Styling issues and added unit testing
* fixed last pylint styling issue by adding newline to the end of the file
* re-fixed newline issue for pylint checks
* fixed pep8 styling issues and black formatted files to pass static checks
* added comma as per suggestion to fix static check
Co-authored-by: Nadim Younes <nyounes@kobo.com>
In very rare cases, the waiting job might not be cancelled when
the "Build Image" job fails or gets cancelled on its own.
In the "Build Image" workflow we have this step:
- name: "Canceling the CI Build source workflow in case of failure!"
  if: cancelled() || failure()
  uses: potiuk/cancel-workflow-runs@v2
  with:
    token: ${{ secrets.GITHUB_TOKEN }}
    cancelMode: self
    sourceRunId: ${{ github.event.workflow_run.id }}
But when this step fails or gets cancelled on its own before
the cancel is triggered, the "wait for image" steps could
run for up to 6 hours.
This change sets a 50-minute timeout for those jobs.
Fixes #11114
* Update initialize-database.rst
Remove ambiguity in the language as only MySQL, Postgres and SQLite are supported backends.
* Update docs/howto/initialize-database.rst
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Co-authored-by: Xiaodong DENG <xd.deng.r@gmail.com>
This PR adds the possibility to define template_fields_renderers for an
operator. In this way users will be able to specify which
lexer should be used for rendering a particular field. This is
super useful for custom operators and gives more flexibility than the
predefined keywords.
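For illustration, a hypothetical custom operator using the new
attribute (the field names and lexer keys below are examples):
```python
from airflow.models.baseoperator import BaseOperator


class MyTransformOperator(BaseOperator):
    # Render "query" with the SQL lexer and "config" as JSON in the UI.
    template_fields = ("query", "config")
    template_fields_renderers = {"query": "sql", "config": "json"}

    def __init__(self, *, query: str, config: dict, **kwargs):
        super().__init__(**kwargs)
        self.query = query
        self.config = config

    def execute(self, context):
        self.log.info("Running query %s with config %s", self.query, self.config)
```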
Co-authored-by: Kamil Olszewski <34898234+olchas@users.noreply.github.com>
Co-authored-by: Felix Uellendall <feluelle@users.noreply.github.com>
* Avoid redundant SET conversion
get_accessible_dag_ids() returns a SET, so no need to apply set() again
* Add type annotation for get_accessible_dag_ids()
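A tiny sketch of the annotation change (the parameter list is elided
and the class is a stand-in; only the set-returning annotation
matters):
```python
from typing import Set


class SecurityManagerSketch:
    def get_accessible_dag_ids(self) -> Set[str]:
        # Already a set -- callers should not re-wrap the result in set().
        return {"example_dag"}
```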
closes: #10725
Make sure SkipMixin.skip_all_except() handles empty branches (where the branching task connects directly to the join task) properly. When "task1" is followed, "join" must not be skipped even though it is considered to be immediately downstream of "branch".
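A minimal sketch of the DAG shape described above (the operator choice
and 1.10-style import paths are illustrative):
```python
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import BranchPythonOperator
from airflow.utils.dates import days_ago

with DAG("branch_with_empty_path", start_date=days_ago(1), schedule_interval=None) as dag:
    branch = BranchPythonOperator(task_id="branch", python_callable=lambda: "task1")
    task1 = DummyOperator(task_id="task1")
    join = DummyOperator(task_id="join")

    branch >> task1 >> join  # the followed path
    branch >> join           # the "empty" path straight to the join task
```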
I noticed that when there are no setup.py changes, the constraints
are not upgraded automatically. This is because of the docker
caching strategy used - it simply does not even know that the
upgrade of pip should happen.
I believe it is really good (from a security and incremental-updates
point of view) to attempt the upgrade at every successful merge. Note
that the upgrade will not be committed if any of the tests fail, and
this only happens on every merge to master or on a scheduled run.
This way we will have more frequent but smaller constraint changes.
Depends on #10828