This change allows BaseBranchOperator to do an XCom push of the branch it chooses to follow.
It also adds support for using the do_xcom_push parameter.
The change returns the result received from running choose_branch().
Closes: #13704
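A minimal sketch of the idea (simplified, not the exact upstream code): returning the chosen branch from execute() lets the standard do_xcom_push handling push it to XCom.

```python
from airflow.models import BaseOperator
from airflow.models.skipmixin import SkipMixin


class BranchSketchOperator(BaseOperator, SkipMixin):
    """Illustrative sketch of a branching operator that pushes its choice to XCom."""

    def choose_branch(self, context):
        raise NotImplementedError

    def execute(self, context):
        branches_to_execute = self.choose_branch(context)
        self.skip_all_except(context["ti"], branches_to_execute)
        # Returning the value means it gets pushed to XCom when do_xcom_push=True.
        return branches_to_execute
```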
* Correct the logic for webserver choosing number of workers to spawn (#13469)
A key consequence of this fix is that the webserver will properly
exit when the gunicorn master dies and stops responding to signals.
This race condition resulted in task success and failure callbacks being
called more than once. Here is the order of events that could lead to
this issue:
* task started running within process 2
* (process 1) local_task_job checked the task return code, which was None
* (process 2) task exited with failure state, task state updated as failed in DB
* (process 2) task failure callback invoked through taskinstance.handle_failure method
* (process 1) local_task_job heartbeat noticed the task state set to
failure, mistook it for the state being updated externally, and also invoked the task
failure callback
To avoid this race condition, we need to make sure task callbacks are
only invoked within a single process.
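A minimal sketch of the single-process approach, with illustrative names (not the actual LocalTaskJob code); shown here with the supervising process as the single owner of the callback, which is an illustrative choice:

```python
from airflow.utils.state import State


def handle_task_exit(local_task_job, return_code):
    # Illustrative sketch: called in process 1 once the task runner
    # (process 2) has exited; process 2 no longer runs the callback itself,
    # so the heartbeat never mistakes the state change for an external one.
    ti = local_task_job.task_instance
    ti.refresh_from_db()
    if return_code != 0 or ti.state == State.FAILED:
        # The only place where the failure callback runs.
        if ti.task.on_failure_callback:
            ti.task.on_failure_callback({"ti": ti, "task_instance": ti})
```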
This config setting is documented as 0 == unlimited, but in my HA
scheduler work I rewrote the code that used it and mistakenly didn't
keep this behaviour.
This re-introduces the correct behaviour and also adds a test so that it
stays working in the future.
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
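The behaviour being restored is just the 0-means-unlimited translation; a tiny sketch (the setting itself is not named here, so the names below are illustrative):

```python
def effective_limit(configured_value: int):
    # The setting is documented as 0 == unlimited, so 0 must translate to
    # "no cap" rather than a literal limit of zero.
    return None if configured_value == 0 else configured_value


def apply_limit(items, configured_value: int):
    limit = effective_limit(configured_value)
    return items if limit is None else items[:limit]
```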
* Fix S3ToSnowflakeOperator to support uploading all files in the specified stage
Currently, users have to specify each file to upload via
the "s3_keys" parameter when using S3ToSnowflakeOperator.
But the `COPY INTO` statement, which S3ToSnowflakeOperator
leverages internally, allows omitting this parameter
so that users can load all files in the specified stage.
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#syntax
This PR makes S3ToSnowflakeOperator's s3_keys parameter optional
so as to support this functionality.
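Roughly, the generated `COPY INTO` statement only needs a files clause when s3_keys is given; a sketch of the idea (not the operator's exact SQL-building code):

```python
def build_copy_into_sql(table, stage, file_format, s3_keys=None):
    # Sketch only: real quoting/formatting in the operator may differ.
    sql = f"COPY INTO {table} FROM @{stage} file_format={file_format}"
    if s3_keys:
        files = ", ".join(f"'{key}'" for key in s3_keys)
        sql += f" files=({files})"
    # Without s3_keys, Snowflake loads every file found in the stage.
    return sql
```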
When the production image is built for development purposes, by default
it installs all providers from sources, but not all dependencies
are installed for all providers. Many providers require extra
dependencies, and when you try to import those packages via the
providers manager, they fail to import and print warnings.
Those warnings are now turned into debug messages when
AIRFLOW_INSTALLATION_METHOD=".", which is set when the
production image is built locally from sources. This is especially
helpful when you use a locally built production image to
run K8S tests - otherwise the logs are flooded with
warnings.
This problem does not happen in CI, because there the
production image is by default built from locally prepared packages
and does not contain sources for providers that are not
installed via packages.
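The gist, as a sketch (reading AIRFLOW_INSTALLATION_METHOD as an environment variable and the helper name are assumptions for illustration):

```python
import logging
import os

log = logging.getLogger(__name__)


def report_provider_import_error(package_name: str, error: Exception) -> None:
    # When the image is built locally from sources, missing optional provider
    # dependencies are expected, so avoid flooding the logs with warnings.
    if os.environ.get("AIRFLOW_INSTALLATION_METHOD") == ".":
        log.debug("Could not import provider package %s: %s", package_name, error)
    else:
        log.warning("Could not import provider package %s: %s", package_name, error)
```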
When a dag file is executed via the Dag File Processors and multiple callbacks are
created, either via zombies or executor events, the dag file is added to
the _file_path_queue and the manager launches a new process to
process it, which it should not, since the dag file is already being
processed. This eventually bypasses _parallelism, especially when
it takes a long time to process some dag files, because self._processors
is just a dict keyed by file path, so multiple processors with the same key
count as one and parallelism is exceeded.
This addresses the same issue as https://github.com/apache/airflow/pull/11875,
but it does not exclude file paths that were recently processed or that
are at the run limit (which is only used in tests) when callbacks are sent by the
Agent. This is by design, as the execution of callbacks is critical. There is
one caveat to avoid duplicate processors: if a processor for a file path already
exists, that file path is removed from the queue. The processor for the file path
carrying the callback will still be run when the file path is added again in the
next loop.
Tests are added to check the same.
closes https://github.com/apache/airflow/issues/13047
closes https://github.com/apache/airflow/pull/11875
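A sketch of the queueing guard (the loop is simplified and the `_create_processor` helper is illustrative):

```python
def start_new_processes(manager):
    # Simplified sketch of the manager loop that launches dag file processors.
    while manager._parallelism > len(manager._processors) and manager._file_path_queue:
        file_path = manager._file_path_queue.pop(0)
        if file_path in manager._processors:
            # A processor for this file is already running: skip it instead of
            # starting a duplicate. The path gets queued again on a later loop,
            # so the pending callback is still executed.
            continue
        processor = manager._create_processor(file_path)  # illustrative helper
        processor.start()
        manager._processors[file_path] = processor
```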
* Add JSON linter to Variable/DAG Trigger UIs
Adding codemirror and jshint to lint the text input for adding/editing a Variable and for the config when triggering a DAG.
Remove JSON linter for add/edit Variables
Variable values can be either plain text or JSON, which makes linting more complicated and not worth it for now.
* Add JSON linter to DAG Trigger UI
Adding codemirror and jshint to lint the text input for config when triggering a DAG.
Update trigger dag conf test:
Fixed the failing test by adding `id="json"` to the expected HTML in the `test_trigger_dag_params_conf` test.
* Allow setting of API response (CORS) headers via config
* Fix RST syntax
* Register the function for the API only instead of all views in the app
* Add missing/required property
* Update spelling dictionary
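A sketch of the approach; the config option names and the init function are assumptions, the point being that the hook reads headers from config and is registered for the API app only:

```python
from flask import Flask

from airflow.configuration import conf


def init_api_cors(api_app: Flask) -> None:
    # Option names below are illustrative.
    allow_origin = conf.get("api", "access_control_allow_origin", fallback="")
    allow_headers = conf.get("api", "access_control_allow_headers", fallback="")
    allow_methods = conf.get("api", "access_control_allow_methods", fallback="")

    @api_app.after_request
    def add_cors_headers(response):
        # Registered only on the API app, not on every webserver view.
        if allow_origin:
            response.headers["Access-Control-Allow-Origin"] = allow_origin
        if allow_headers:
            response.headers["Access-Control-Allow-Headers"] = allow_headers
        if allow_methods:
            response.headers["Access-Control-Allow-Methods"] = allow_methods
        return response
```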
In https://github.com/apache/airflow/pull/13163 I attempted to only run
callback requests when they are defined on the DAG. But I just found out
that while we were storing the task-level callbacks as strings in the Serialized
JSON, we were not storing the DAG-level callbacks, hence they defaulted to None
when the Serialized DAG was deserialized, which meant that the DAG callbacks
were not run.
This PR fixes it. We don't need to store DAG-level callbacks as strings, as
we don't display them in the webserver and the actual contents are not used anywhere
in the Scheduler itself. The Scheduler just checks whether the callbacks are defined and sends
the request to DagFileProcessorProcess to run with the actual DAG file. So instead of storing
the actual callbacks as strings, which would have resulted in a larger JSON blob, I have
added properties that determine whether a callback is defined
(`dag.has_on_success_callback` and `dag.has_on_failure_callback`).
Note: SLA callbacks don't have this issue, as we currently check whether SLAs are defined on
any tasks; if so, we send the request to DagFileProcessorProcess, which then executes
the SLA callback defined on the DAG.
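A sketch of what the serialized representation boils down to (simplified; the real serializer handles this alongside the other DAG attributes):

```python
def serialize_dag_callback_flags(dag) -> dict:
    # Only record whether the callbacks exist; the callables themselves stay
    # out of the serialized JSON and are looked up from the DAG file later.
    return {
        "has_on_success_callback": dag.on_success_callback is not None,
        "has_on_failure_callback": dag.on_failure_callback is not None,
    }


def needs_callback_request(serialized: dict, succeeded: bool) -> bool:
    key = "has_on_success_callback" if succeeded else "has_on_failure_callback"
    return serialized.get(key, False)
```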
With the previous default of `0`, the CPU usage mostly stays around 100%.
Since in Airflow 2.0.0 the scheduling decisions have been moved out of
DagFileProcessor into the Scheduler, we can keep this number high.
closes https://github.com/apache/airflow/issues/13637
As part of Airflow 2.0.0 and Scheduler HA, we updated the logic
of what happens in DagFileProcessor and SchedulerJob.
This PR updates the docstrings to match the code.
Custom providers with custom connections can define
extra widgets and fields; however, there were problems with
those custom fields in Airflow 2.0.0:
* When a connection type was a subset of another connection
type (for example jdbc and jdbcx), widgets from the
'subset' connection type also appeared in the 'superset'
one due to prefix matching in the JavaScript.
* Each connection, when saved, received the 'full set' of extra
fields from other connection types (with empty values).
This problem is likely present in Airflow 1.10 as well, but due
to the limited number of supported connections it had no
real implications besides a slightly bigger dictionary
stored in the 'extra' field.
* The extra field values were not saved for custom connections.
Only the predefined connection types could save extras in
the extra field.
This PR fixes it by:
* adding `__` matching in the JavaScript so that only full connection
types are matched, not prefixes
* saving only the fields matching extra__<conn_type> when the
connection is saved
* removing the filtering on 'known' connection types (the above
filtering on `extra__` results in an empty extra for
connections that do not have any extra fields defined).
Fixes #13597
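A sketch of the saving side (function name and form layout are illustrative):

```python
def extract_conn_type_extras(conn_type: str, form_data: dict) -> dict:
    # The prefix ends with a double underscore, so fields for "jdbc" no longer
    # match a connection type such as "jdbcx", and only fields belonging to
    # this connection type end up in the saved extras.
    prefix = f"extra__{conn_type}__"
    return {key: value for key, value in form_data.items() if key.startswith(prefix)}
```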
This change was unintentional: https://github.com/apache/airflow/pull/7205
just changed it to work with Breeze. Since we had `16` as the default in 1.10.x,
and to get better performance and keep in line with `dag_concurrency` and
`max_active_runs_per_dag`, I think `16` makes more sense.
The provider.yaml file contains more information than is required at
runtime (specifically about documentation building). Those
fields are not needed at runtime and their presence is optional.
Also, the runtime check of the provider information should be more
relaxed and allow for future compatibility (without setting
'additionalProperties' to false). This way we can add new,
optional fields to provider.yaml without worrying about breaking
the future compatibility of providers with future Airflow versions.
This change restores 'additionalProperties': false in the
main, development-focused provider.yaml schema and introduces a
new runtime schema that is used to verify the provider info when
providers are discovered by Airflow.
This 'runtime' version should change very rarely, as adding
a new required property to it breaks the compatibility of
providers with already released versions of Airflow.
We also trim down the provider.yaml file when preparing provider
packages so that it only contains the fields that are required by the
runtime schema.
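A sketch of the runtime-side check (the schema file name and loading details are assumptions):

```python
import json

import jsonschema


def validate_provider_info(provider_info: dict) -> None:
    # The runtime schema deliberately leaves out "additionalProperties": false,
    # so provider.yaml content carrying newer optional fields still validates
    # on already released Airflow versions.
    with open("provider_info.schema.json") as schema_file:
        runtime_schema = json.load(schema_file)
    jsonschema.validate(instance=provider_info, schema=runtime_schema)
```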