`[webserver] secret_key` is also a secret, like the Fernet key. Allowing
it to be set via _CMD or _SECRET lets users keep it in an external secret store.
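A minimal sketch (not Airflow's implementation; the helper name and paths are made up) of how a `_CMD` option resolves a secret: the configured value is a shell command whose stdout becomes the actual secret, so the secret itself never has to live in airflow.cfg or an environment variable.
```
import subprocess

def resolve_secret_from_cmd(command: str) -> str:
    """Run the configured *_cmd command and use its stdout as the secret."""
    return subprocess.check_output(command, shell=True, text=True).strip()

# e.g. with `secret_key_cmd = cat /run/secrets/webserver_secret_key` in the
# [webserver] section (hypothetical path), resolution would look like:
# secret_key = resolve_secret_from_cmd("cat /run/secrets/webserver_secret_key")
```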
Custom operators inheriting from DummyOperator will now be set
straight to success instead of going to the scheduled state,
provided they don't have callbacks set.
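For illustration, a hedged example (assuming the Airflow 2.0 import path) of such a custom operator; with this change, tasks using it and defining no callbacks are marked success by the scheduler without ever being queued:
```
from airflow.operators.dummy import DummyOperator

class StageMarkerOperator(DummyOperator):
    """Hypothetical marker operator: groups tasks in the graph, does no work."""
```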
closes https://github.com/apache/airflow/issues/11393
This commit unifies the mechanism for rendering tabular data.
It gives users the option to either display a tabular representation
of the data or render it as a valid JSON or YAML payload.
Closes: #12699
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
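A minimal sketch of the idea (the function name is made up; `tabulate` and `PyYAML` are assumed to be available): one rendering path that can emit a plain table, JSON, or YAML depending on the requested output format.
```
import json
from typing import Dict, List

import yaml
from tabulate import tabulate

def format_output(rows: List[Dict], output: str = "table") -> str:
    """Render rows as a plain-text table, JSON, or YAML."""
    if output == "json":
        return json.dumps(rows, indent=4)
    if output == "yaml":
        return yaml.safe_dump(rows)
    return tabulate(rows, headers="keys", tablefmt="plain")
```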
* Move config item 'worker_precheck' from section [core] to [celery]
This configuration option is ONLY applicable to the Celery worker,
so it should be in the [celery] section rather than [core].
* Add to deprecation/migration automatic list
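After the move, reading the option looks roughly like this (a sketch; Airflow's config deprecation machinery is assumed to redirect the old [core] location to the new one with a warning):
```
from airflow.configuration import conf

# The option now lives in [celery]; reads of the old [core] location are
# expected to be redirected (with a deprecation warning) by the config layer.
worker_precheck = conf.getboolean("celery", "worker_precheck", fallback=False)
```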
The chart has two jobs (migrate-database & create-user).
These jobs are run post-install and post-upgrade and only deleted on success.
So if for some reason (a quick reinstall or upgrade) a job fails or gets stuck, helm
will fail because the job already exists.
This commit sets the `helm.sh/hook-delete-policy` to `before-hook-creation,hook-succeeded`
so helm will always delete the jobs before creating them again.
Without this commit, the webserver throws an error when
xcom_pickling is enabled in the Airflow config by setting `enable_xcom_pickling = True`
(the default is `False`).
Example error:
```
> return pickle.loads(result.value)
E _pickle.UnpicklingError: invalid load key, '{'.
airflow/models/xcom.py:250: UnpicklingError
--------------------------------------------------
```
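A minimal sketch (not the actual fix) of one way to avoid this: when pickling is enabled but a row was stored as JSON (for example, written before the setting was flipped), fall back to JSON instead of raising UnpicklingError.
```
import json
import pickle

def deserialize_xcom_value(raw: bytes):
    """Try pickle first, fall back to JSON for rows stored as JSON."""
    try:
        return pickle.loads(raw)
    except pickle.UnpicklingError:
        return json.loads(raw.decode("utf-8"))
```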
Previously the output of installing the remaining packages when testing
provider imports was only shown on error. However, it is useful
to know what's going on even if it clutters the log.
Note that this installation is only needed until we include
apache-beam in the installed packages on CI.
Related to #12703
This PR always shows the output.
* Refine the DB query logic in www.views.task_stats()
- Given that filter_dag_ids is either allowed_dag_ids or the intersection of allowed_dag_ids and selected_dag_ids,
filter_dag_ids should ALWAYS be applied in the SQL query, regardless of whether selected_dag_ids is None.
Currently, if selected_dag_ids is None, the query actually fetches the full result (and only 'filters' at the end).
This means more (unnecessary) data travels between Airflow and the DB.
- When we join tables A and B on A.id == B.id (an INNER join by default), and we already ensure that every A.id is in a specific list,
then every id in the result table is implicitly guaranteed to be in that list as well.
This is why the two redundant .filter() calls are removed.
A minor performance improvement should be expected, and the change also makes the code cleaner (see the sketch below).
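A simplified sketch of the resulting query shape (names and columns are trimmed down; not the exact view code): the dag_id filter is applied directly in SQL, and no post-query filtering or redundant join filters are needed.
```
from sqlalchemy import func

from airflow.models import TaskInstance

def task_state_counts(session, filter_dag_ids):
    """Count task instances per (dag_id, state), filtered in the database."""
    return (
        session.query(TaskInstance.dag_id, TaskInstance.state, func.count())
        .filter(TaskInstance.dag_id.in_(filter_dag_ids))  # always applied
        .group_by(TaskInstance.dag_id, TaskInstance.state)
        .all()
    )
```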
* Adds support for Hook discovery from providers
This PR extends provider discovery with a mechanism for retrieving
the mapping from connection type to hook.
Fixes #12456
* fixup! Adds support for Hook discovery from providers
* fixup! fixup! Adds support for Hook discovery from providers
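A minimal sketch (with assumed metadata keys and attribute names) of how a connection-type-to-hook mapping can be built from provider metadata: each provider declares the hook classes it ships, and the discovery code resolves each one to the connection type it handles.
```
from importlib import import_module

# e.g. {"postgres": <PostgresHook class>} -- illustrative only
CONN_TYPE_TO_HOOK = {}

def register_hooks(provider_info: dict) -> None:
    """Import each declared hook class and index it by its connection type."""
    for class_path in provider_info.get("hook-class-names", []):
        module_path, class_name = class_path.rsplit(".", 1)
        hook_class = getattr(import_module(module_path), class_name)
        conn_type = getattr(hook_class, "conn_type", None)
        if conn_type:
            CONN_TYPE_TO_HOOK[conn_type] = hook_class
```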
`time.time() - start`, or `timezone.utcnow() - start_dttm`, will work
fine in 99% of cases, but they have one fatal flaw:
they both operate on system time, and that can go backwards.
While this might be surprising, it can happen -- usually due to clocks
being adjusted.
And while it might seem rare, for long-running processes it is more
common than we might expect. Even though most of these durations are harmless
to get wrong (they only end up in logs), it is better to be safe than sorry.
Also, replacing the `utcnow()` style makes the code much lighter weight -
creating a datetime object is a comparatively expensive operation, and
computing the difference between two even more so, _especially_ when compared to
just subtracting two floats.
To make the common case of computing a duration for a block easier,
I have made `Stats.timer()` return an object that has a
`duration` field.
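A minimal sketch (the class name is made up) of both points: a duration measured with a monotonic clock, which cannot jump backwards when the system clock is adjusted, exposed as a `duration` field on a context manager, similar in spirit to what `Stats.timer()` now returns.
```
import time

class Timer:
    """Context manager exposing the elapsed time of a block as `duration`."""

    duration = None

    def __enter__(self):
        self._start = time.perf_counter()  # monotonic: unaffected by clock changes
        return self

    def __exit__(self, *exc):
        self.duration = time.perf_counter() - self._start

with Timer() as timer:
    time.sleep(0.1)  # placeholder for the timed block
print(f"took {timer.duration:.3f}s")
```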
If a task is "manually" set to up_for_retry (via the UI for instance) it
might not have an end date, and much of the logic about computing
retries assumes that it does.
Without this, manually setting a running task to up_for_retry makes
it impossible to view the TaskInstance details page (as it
tries to print the is_premature property), and the NotInRetryPeriod
TIDep also fails - both with an exception:
> File "airflow/models/taskinstance.py", line 882, in next_retry_datetime
> return self.end_date + delay
> TypeError: unsupported operand type(s) for +: 'NoneType' and 'datetime.timedelta'
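A hedged sketch (not the actual patch) of the kind of guard this needs: fall back to the current time when no end_date was ever recorded.
```
from airflow.utils import timezone

def next_retry_datetime(end_date, delay):
    """Compute the next retry time, tolerating a missing end_date."""
    base = end_date or timezone.utcnow()
    return base + delay
```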
Using `pkg_resources.iter_entry_points` validates the version
constraints, and if any of them fail it throws an exception for that
entry point.
This sounds nice, but is a huge mis-feature.
So instead of that, switch to using importlib.metadata (well, its
backport importlib_metadata), which just gives us the entry points - no
other verification of requirements is performed.
This has two advantages:
1. providers and plugins load much more reliably.
2. it's faster too
Closes #12692
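A minimal sketch of the importlib.metadata approach (using `airflow.plugins`, the plugin entry point group): entry points are simply enumerated and loaded, and no distribution requirement checks can blow up the whole loading step.
```
try:
    import importlib.metadata as importlib_metadata  # Python 3.8+
except ImportError:
    import importlib_metadata  # the backport

def load_plugin_entry_points(group: str = "airflow.plugins"):
    """Yield loaded entry points without validating version constraints."""
    for dist in importlib_metadata.distributions():
        for entry_point in dist.entry_points:
            if entry_point.group == group:
                yield entry_point.load()  # no requirement verification here
```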
It was added to make Snowflake happy, but it is in fact not needed as
a package requirement, and the Google provider complains when
the version of pyarrow is too low.
Also, when the PyArrow limitation is removed, we have to put the limit
on the importlib_resources library back.
This change upgrades setup.py and setup.cfg to provide a non-conflicting,
`pip check`-valid set of constraints for the CI image.
Fixes #10854
Co-authored-by: Tomek Urbaszek <turbaszek@apache.org>
This PR implements discovering and reading provider information from
packages (using entry_points) and - if found - from local
provider.yaml files for the built-in Airflow providers,
when they are present in the airflow.providers packages.
The provider.yaml files - if found - take precedence over the
package-provided ones.
Add displaying provider information in CLI
Closes: #12470
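A minimal sketch (assumed entry point group and return shape) of the package-based part of this discovery: each provider package exposes an `apache_airflow_provider` entry point whose target returns a metadata dictionary.
```
try:
    import importlib.metadata as importlib_metadata  # Python 3.8+
except ImportError:
    import importlib_metadata  # the backport

def discover_provider_info():
    """Collect provider metadata dicts from installed provider packages."""
    providers = {}
    for dist in importlib_metadata.distributions():
        for entry_point in dist.entry_points:
            if entry_point.group != "apache_airflow_provider":
                continue
            info = entry_point.load()()  # entry point resolves to a callable
            providers[info.get("package-name", dist.metadata["Name"])] = info
    return providers
```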