This change allows BaseBranchOperator to do an XCom push of the branch it chooses to follow.
It also adds support for using the do_xcom_push parameter.
The change returns the result received from running choose_branch().
Closes: #13704
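A minimal sketch of the idea (simplified, not the exact upstream code): returning the chosen branch from execute() lets the standard do_xcom_push handling push it to XCom.

```python
from airflow.models import BaseOperator
from airflow.models.skipmixin import SkipMixin


class BranchSketchOperator(BaseOperator, SkipMixin):
    """Illustrative sketch of a branching operator that pushes its choice to XCom."""

    def choose_branch(self, context):
        raise NotImplementedError

    def execute(self, context):
        branches_to_execute = self.choose_branch(context)
        self.skip_all_except(context["ti"], branches_to_execute)
        # Returning the value means it gets pushed to XCom when do_xcom_push=True.
        return branches_to_execute
```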
* Correct the logic for webserver choosing number of workers to spawn (#13469)
A key consequence of this fix is that the webserver will properly
exit when the gunicorn master dies and stops responding to signals.
This race condition resulted in task success and failure callbacks being
called more than once. Here is the order of events that could lead to
this issue:
* task started running within process 2
* (process 1) local_task_job checked the task return code, which was None
* (process 2) task exited with failure state, task state updated as failed in DB
* (process 2) task failure callback invoked through taskinstance.handle_failure method
* (process 1) local_task_job heartbeat noticed the task state set to
failure, mistook it for the state being updated externally, and also invoked the task
failure callback
To avoid this race condition, we need to make sure task callbacks are
only invoked within a single process.
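A minimal sketch of the single-process approach, with illustrative names (not the actual LocalTaskJob code); shown here with the supervising process as the single owner of the callback, which is an illustrative choice:

```python
from airflow.utils.state import State


def handle_task_exit(local_task_job, return_code):
    # Illustrative sketch: called in process 1 once the task runner
    # (process 2) has exited; process 2 no longer runs the callback itself,
    # so the heartbeat never mistakes the state change for an external one.
    ti = local_task_job.task_instance
    ti.refresh_from_db()
    if return_code != 0 or ti.state == State.FAILED:
        # The only place where the failure callback runs.
        if ti.task.on_failure_callback:
            ti.task.on_failure_callback({"ti": ti, "task_instance": ti})
```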
This config setting is documented as 0 == unlimited, but in my HA
scheduler work I rewrote the code that used it and mistakenly didn't
keep this behaviour.
This re-introduces the correct behaviour and also adds a test so that it
stays working in the future.
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
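The behaviour being restored is just the 0-means-unlimited translation; a tiny sketch (the setting itself is not named here, so the names below are illustrative):

```python
def effective_limit(configured_value: int):
    # The setting is documented as 0 == unlimited, so 0 must translate to
    # "no cap" rather than a literal limit of zero.
    return None if configured_value == 0 else configured_value


def apply_limit(items, configured_value: int):
    limit = effective_limit(configured_value)
    return items if limit is None else items[:limit]
```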
* Fix S3ToSnowflakeOperator to support uploading all files in the specified stage
Currently, users have to specify each file to upload via
the "s3_keys" parameter when using S3ToSnowflakeOperator.
But the `COPY INTO` statement, which S3ToSnowflakeOperator
leverages internally, allows omitting this parameter
so that users can load all files in the specified stage.
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#syntax
This PR makes S3ToSnowflakeOperator's s3_keys parameter optional
so as to support this functionality.
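Roughly, the generated `COPY INTO` statement only needs a files clause when s3_keys is given; a sketch of the idea (not the operator's exact SQL-building code):

```python
def build_copy_into_sql(table, stage, file_format, s3_keys=None):
    # Sketch only: real quoting/formatting in the operator may differ.
    sql = f"COPY INTO {table} FROM @{stage} file_format={file_format}"
    if s3_keys:
        files = ", ".join(f"'{key}'" for key in s3_keys)
        sql += f" files=({files})"
    # Without s3_keys, Snowflake loads every file found in the stage.
    return sql
```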
When the production image is built for development purposes, by default
it installs all providers from sources, but not all dependencies
are installed for all providers. Many providers require extra
dependencies, and when you try to import those packages via the
providers manager, they fail to import and print warnings.
Those warnings are now turned into debug messages when
AIRFLOW_INSTALLATION_METHOD=".", which is set when the
production image is built locally from sources. This is especially
helpful when you use a locally built production image to
run K8S tests - otherwise the logs are flooded with
warnings.
This problem does not happen in CI, because there the
production image is by default built from locally prepared packages
and does not contain sources for providers that are not
installed via packages.
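The gist, as a sketch (reading AIRFLOW_INSTALLATION_METHOD as an environment variable and the helper name are assumptions for illustration):

```python
import logging
import os

log = logging.getLogger(__name__)


def report_provider_import_error(package_name: str, error: Exception) -> None:
    # When the image is built locally from sources, missing optional provider
    # dependencies are expected, so avoid flooding the logs with warnings.
    if os.environ.get("AIRFLOW_INSTALLATION_METHOD") == ".":
        log.debug("Could not import provider package %s: %s", package_name, error)
    else:
        log.warning("Could not import provider package %s: %s", package_name, error)
```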
When a dag file is executed via the Dag File Processors and multiple callbacks are
created, either via zombies or executor events, the dag file is added to
the _file_path_queue and the manager launches a new process to
process it, which it should not, since the dag file is already being
processed. This eventually bypasses _parallelism, especially when
it takes a long time to process some dag files, because self._processors
is just a dict keyed by file path, so multiple processors with the same key
count as one and parallelism is exceeded.
This addresses the same issue as https://github.com/apache/airflow/pull/11875,
but it does not exclude file paths that were recently processed or that
are at the run limit (which is only used in tests) when callbacks are sent by the
Agent. This is by design, as the execution of callbacks is critical. There is
one caveat to avoid duplicate processors: if a processor for a file path already
exists, that file path is removed from the queue. The processor for the file path
carrying the callback will still be run when the file path is added again in the
next loop.
Tests are added to check the same.
closes https://github.com/apache/airflow/issues/13047
closes https://github.com/apache/airflow/pull/11875
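A sketch of the queueing guard (the loop is simplified and the `_create_processor` helper is illustrative):

```python
def start_new_processes(manager):
    # Simplified sketch of the manager loop that launches dag file processors.
    while manager._parallelism > len(manager._processors) and manager._file_path_queue:
        file_path = manager._file_path_queue.pop(0)
        if file_path in manager._processors:
            # A processor for this file is already running: skip it instead of
            # starting a duplicate. The path gets queued again on a later loop,
            # so the pending callback is still executed.
            continue
        processor = manager._create_processor(file_path)  # illustrative helper
        processor.start()
        manager._processors[file_path] = processor
```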
* Add JSON linter to Variable/DAG Trigger UIs
Adding codemirror and jshint to lint the text input for adding/editing a Variable and for the config when triggering a DAG.
Remove JSON linter for add/edit Variables
Variable values can be either plain text or JSON, which makes linting more complicated and not worth it for now.
* Add JSON linter to DAG Trigger UI
Adding codemirror and jshint to lint the text input for config when triggering a DAG.
Update trigger dag conf test:
Fixed the failing test by adding `id="json"` to the expected HTML in the `test_trigger_dag_params_conf` test.
* Allow setting of API response (CORS) headers via config
* Fix RST syntax
* Register the function for the API only instead of all views in the app
* Add missing/required property
* Update spelling dictionary
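A sketch of the approach; the config option names and the init function are assumptions, the point being that the hook reads headers from config and is registered for the API app only:

```python
from flask import Flask

from airflow.configuration import conf


def init_api_cors(api_app: Flask) -> None:
    # Option names below are illustrative.
    allow_origin = conf.get("api", "access_control_allow_origin", fallback="")
    allow_headers = conf.get("api", "access_control_allow_headers", fallback="")
    allow_methods = conf.get("api", "access_control_allow_methods", fallback="")

    @api_app.after_request
    def add_cors_headers(response):
        # Registered only on the API app, not on every webserver view.
        if allow_origin:
            response.headers["Access-Control-Allow-Origin"] = allow_origin
        if allow_headers:
            response.headers["Access-Control-Allow-Headers"] = allow_headers
        if allow_methods:
            response.headers["Access-Control-Allow-Methods"] = allow_methods
        return response
```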
In https://github.com/apache/airflow/pull/13163 I attempted to only run
callback requests when they are defined on the DAG. But I just found out
that while we were storing the task-level callbacks as strings in the Serialized
JSON, we were not storing the DAG-level callbacks, hence they defaulted to None
when the Serialized DAG was deserialized, which meant that the DAG callbacks
were not run.
This PR fixes it. We don't need to store DAG-level callbacks as strings, as
we don't display them in the webserver and the actual contents are not used anywhere
in the Scheduler itself. The Scheduler just checks whether the callbacks are defined and sends
the request to DagFileProcessorProcess to run with the actual DAG file. So instead of storing
the actual callbacks as strings, which would have resulted in a larger JSON blob, I have
added properties that determine whether a callback is defined
(`dag.has_on_success_callback` and `dag.has_on_failure_callback`).
Note: SLA callbacks don't have this issue, as we currently check whether SLAs are defined on
any tasks; if so, we send the request to DagFileProcessorProcess, which then executes
the SLA callback defined on the DAG.
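A sketch of what the serialized representation boils down to (simplified; the real serializer handles this alongside the other DAG attributes):

```python
def serialize_dag_callback_flags(dag) -> dict:
    # Only record whether the callbacks exist; the callables themselves stay
    # out of the serialized JSON and are looked up from the DAG file later.
    return {
        "has_on_success_callback": dag.on_success_callback is not None,
        "has_on_failure_callback": dag.on_failure_callback is not None,
    }


def needs_callback_request(serialized: dict, succeeded: bool) -> bool:
    key = "has_on_success_callback" if succeeded else "has_on_failure_callback"
    return serialized.get(key, False)
```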
With the previous default of `0`, the CPU usage mostly stays around 100%.
Since in Airflow 2.0.0 the scheduling decisions have been moved out of
DagFileProcessor into the Scheduler, we can keep this number high.
closes https://github.com/apache/airflow/issues/13637
As part of Airflow 2.0.0 and Scheduler HA, we updated the logic
of what happens in DagFileProcessor and SchedulerJob.
This PR updates the docstrings to match the code.
Custom providers with custom connections can define
extra widgets and fields; however, there were problems with
those custom fields in Airflow 2.0.0:
* When a connection type was a subset of another connection
type (for example jdbc and jdbcx), widgets from the
'subset' connection type also appeared in the 'superset'
one due to prefix matching in the JavaScript.
* Each connection, when saved, received the 'full set' of extra
fields from other connection types (with empty values).
This problem is likely present in Airflow 1.10 as well, but due
to the limited number of supported connections it had no
real implications besides a slightly bigger dictionary
stored in the 'extra' field.
* The extra field values were not saved for custom connections.
Only the predefined connection types could save extras in
the extra field.
This PR fixes it by:
* adding `__` matching in the JavaScript so that only full connection
types are matched, not prefixes
* saving only the fields matching extra__<conn_type> when the
connection is saved
* removing the filtering on 'known' connection types (the above
filtering on `extra__` results in an empty extra for
connections that do not have any extra fields defined).
Fixes #13597
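A sketch of the saving side (function name and form layout are illustrative):

```python
def extract_conn_type_extras(conn_type: str, form_data: dict) -> dict:
    # The prefix ends with a double underscore, so fields for "jdbc" no longer
    # match a connection type such as "jdbcx", and only fields belonging to
    # this connection type end up in the saved extras.
    prefix = f"extra__{conn_type}__"
    return {key: value for key, value in form_data.items() if key.startswith(prefix)}
```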
This change was unintentional: https://github.com/apache/airflow/pull/7205
just changed it to work with Breeze. Since we had `16` as the default in 1.10.x,
and to get better performance and keep in line with `dag_concurrency` and
`max_active_runs_per_dag`, I think `16` makes more sense.
The provider.yaml file contains more information than is required at
runtime (specifically about documentation building). Those
fields are not needed at runtime and their presence is optional.
Also, the runtime check of the provider information should be more
relaxed and allow for future compatibility (without setting
'additionalProperties' to false). This way we can add new,
optional fields to provider.yaml without worrying about breaking
the future compatibility of providers with future Airflow versions.
This change restores 'additionalProperties': false in the
main, development-focused provider.yaml schema and introduces a
new runtime schema that is used to verify the provider info when
providers are discovered by Airflow.
This 'runtime' version should change very rarely, as adding
a new required property to it breaks the compatibility of
providers with already released versions of Airflow.
We also trim down the provider.yaml file when preparing provider
packages so that it only contains the fields that are required by the
runtime schema.
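A sketch of the runtime-side check (the schema file name and loading details are assumptions):

```python
import json

import jsonschema


def validate_provider_info(provider_info: dict) -> None:
    # The runtime schema deliberately leaves out "additionalProperties": false,
    # so provider.yaml content carrying newer optional fields still validates
    # on already released Airflow versions.
    with open("provider_info.schema.json") as schema_file:
        runtime_schema = json.load(schema_file)
    jsonschema.validate(instance=provider_info, schema=runtime_schema)
```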