As discussed in AIP-21
* Rename airflow.sensors.external_task_sensor to airflow.sensors.external_task
* Rename airflow.sensors.sql_sensor to airflow.sensors.sql
* Rename airflow.contrib.sensors.weekday_sensor to airflow.sensors.weekday
As discussed in AIP-21
* Rename airflow.hooks.base_hook to airflow.hooks.base
* Rename airflow.hooks.dbapi_hook to airflow.hooks.dbapi
* Rename airflow.sensors.base_sensor_operator to airflow.sensors.base
* Rename airflow.sensors.date_time_sensor to airflow.sensors.date_time
* Rename airflow.sensors.time_delta_sensor to airflow.sensors.time_delta
Co-authored-by: Kaxil Naik <kaxilnaik@apache.org>
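For both sets of renames above, migration is a matter of updating import paths; a small sketch (class names assumed unchanged, and the old paths are expected to keep working through deprecation shims for a transition period):
```python
# Before (Airflow 1.10-era module paths):
# from airflow.hooks.base_hook import BaseHook
# from airflow.sensors.base_sensor_operator import BaseSensorOperator
# from airflow.sensors.external_task_sensor import ExternalTaskSensor

# After the renames listed above:
from airflow.hooks.base import BaseHook
from airflow.sensors.base import BaseSensorOperator
from airflow.sensors.external_task import ExternalTaskSensor
```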
Connection form behaviour depends on the connection type. Since we've
separated providers into separate packages, the connection form should
be extendable by each provider. This PR implements both:
* extra fields added by provider
* configurable behaviour per provider
This PR will be followed by separate documentation on how to write your
provider.
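As a rough sketch of the shape this takes (the hook, connection type, and field names below are hypothetical, and the exact hook method names are the ones the follow-up documentation will describe), a provider's hook contributes extra form fields and per-connection-type form behaviour:
```python
from wtforms import StringField
from wtforms.validators import Optional

class MyServiceHook:  # hypothetical provider hook
    conn_type = "my_service"

    @staticmethod
    def get_connection_form_widgets():
        # Extra, provider-specific fields added to the connection form.
        return {
            "extra__my_service__region": StringField("Region", validators=[Optional()]),
        }

    @staticmethod
    def get_ui_field_behaviour():
        # Per-connection-type behaviour of the standard form fields.
        return {
            "hidden_fields": ["schema", "port"],
            "relabeling": {"login": "API Key"},
            "placeholders": {"host": "https://api.example.com"},
        }
```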
Also, this change triggers (in tests only) the Snowflake annoyance
described in #12881, so we had to xfail the Presto test where
monkeypatching of Snowflake causes the test to fail.
Part of #11429
In order to allow a plugin-provided macro to be used at templating time,
it needs to be exposed through the airflow.macros module.
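A minimal sketch of what that looks like from the plugin side (plugin and macro names here are made up):
```python
from datetime import date

from airflow.plugins_manager import AirflowPlugin

def days_to_now(ds: str) -> int:
    """Hypothetical macro: days between the ds date and today."""
    return (date.today() - date.fromisoformat(ds)).days

class MyMacrosPlugin(AirflowPlugin):
    name = "my_macros_plugin"
    macros = [days_to_now]

# Once integrated, the macro is reachable via the airflow.macros module and
# therefore in templates, roughly as {{ macros.my_macros_plugin.days_to_now(ds) }}.
```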
* Add cleanup logic to test_registering_plugin_macros
This test-case has side-effects in the sense that the symbol table of
the airflow.macros module is altered when integrate_macros_plugins() is
invoked. This commit adds a finalizer to the test case that ensures the
module is reloaded completely, in order to prevent impact on other
tests.
* Integrate plugin-provided macros in subprocesses
When Airflow is available in a virtual environment, and when this
environment runs at least Python 3, then plugin-provided macros should
be made available to the Python callable that is being executed in this
environment.
* Document macros limitation
Plugin-provided macros can no longer be used with
PythonVirtualenvOperator on Python 2.
`entry_point.module_name` -- `EntryPoint` does not have a `module_name`
attribute.
This commit also makes importlib_metadata conditional, as it is not
needed on Python 3.9.
The previous behaviour led to "bad" data being written in the DB -- for
example:
```json
"dag": {
"tasks": [
"serialization_failed"
],
```
(`tasks` should be a list of dictionaries. It clearly isn't.)
Instead of doing this, we throw an error that is captured and shown
using the existing import_error mechanism for DAGs. This almost
certainly happens because a user has done "something interesting".
So far, the production images of Airflow were built from sources
when they were built on CI. This PR changes that to build the
airflow + providers packages first and install them,
rather than using sources as the installation mechanism.
Part of #12261
Without this change sensors in "reschedule" mode were being instantly
rescheduled because they didn't have the extra dep that
BaseSensorOperator added.
To fix that we need to include deps in the serialization format (but to
save space only when they are not the default list). As of this PR,
only built-in deps are allowed -- a custom dep will result in a DAG
parse error.
We can fix that for 2.0.x, as I think it is a _very_ uncommon thing to
do.
Fixes #12783
Dags with a schedule interval of None or `@once` don't have a following
schedule, so we can't realistically calculate this metric.
Additionally, this changes the emitted metric from seconds to
milliseconds -- all timers to statsd should be in milliseconds -- this
is what Statsd and apps that consume data from there expect. See #10629
for more details.
This will be a "breaking" change from 1.10.14, which the metric was
back-ported to, but where it was (incorrectly) emitted in seconds.
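For illustration (the helper below is not the scheduler's actual code; the metric name follows the existing dagrun.schedule_delay convention), passing a timedelta to the statsd timer lets the client report milliseconds:
```python
from datetime import timedelta

from airflow.stats import Stats

def emit_schedule_delay(dag_id: str, delay: timedelta) -> None:
    # A timedelta (rather than hand-computed seconds) is reported by the
    # statsd client in milliseconds, which is what statsd timers expect.
    Stats.timing(f"dagrun.schedule_delay.{dag_id}", delay)
```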
This can still show "None" if the dag is not yet in the metadata DB --
showing either True or False there would give a false impression
(especially False -- as if it doesn't exist in the DB it can't be
unpaused yet!)
If you try to run `airflow config list` with an old config you upgraded
from 1.8, it would fail for any sections that have been removed from the
default config -- `ldap` for instance.
This would also be a problem if the user makes a typo in a config
section, or is using the airflow config for storing their own
information.
While I was changing this code, I also removed the use of private
methods/variable access in favour of the public API.
* Dagrun object doesn't exist in the TriggerDagRunOperator
fixes https://github.com/apache/airflow/issues/12587
Fixes issue where dag_run object is not populated if the dag_run already
exists and is reset
* change to get_last_dag_run
* Update airflow/operators/dagrun_operator.py
Co-authored-by: Tomek Urbaszek <turbaszek@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
This PR is a follow-up to #12375 and #12704; it improves handling
of some errors in CLI commands to avoid showing users too much traceback,
and uses SystemExit consistently.
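The pattern, sketched generically (exception and function names are placeholders, not the real CLI code):
```python
class ExpectedUserError(Exception):
    """Stand-in for an error a command anticipates (bad args, missing DAG, ...)."""

def do_the_work(args):
    raise ExpectedUserError("dag 'example' not found")

def cli_command(args):
    # Expected failures become a short message plus a non-zero exit code via
    # SystemExit, instead of dumping a full traceback at the user.
    try:
        do_the_work(args)
    except ExpectedUserError as err:
        raise SystemExit(f"Error: {err}")
```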
Custom operators inheriting from DummyOperator will now be set
straight to success instead of going to a scheduled state,
if they don't have callbacks set.
closes https://github.com/apache/airflow/issues/11393
This commit unifies the mechanism of rendering output of tabular
data. This gives users the possibility to either display a tabular
representation of data or render it as a valid json or yaml payload.
Closes: #12699
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
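A condensed sketch of the idea (the function below is illustrative, not the actual console code in Airflow):
```python
import json
from typing import Any, Dict, List

import yaml
from tabulate import tabulate

def render(data: List[Dict[str, Any]], output: str = "table") -> str:
    """Render a list of records as a table, or as a valid json/yaml payload."""
    if output == "json":
        return json.dumps(data, indent=2)
    if output == "yaml":
        return yaml.safe_dump(data)
    return tabulate(data, headers="keys", tablefmt="plain")

print(render([{"dag_id": "example", "is_paused": False}], output="yaml"))
```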
* Move config item 'worker_precheck' from section [core] to [celery]
This configuration is ONLY applicable for Celery Worker.
So it should be in section [celery], rather than [core]
* Add to deprecation/migration automatic list
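For code that reads the option, it now lives under `[celery]` (a sketch; the deprecation list keeps the old `[core]` location working with a warning):
```python
from airflow.configuration import conf

# Reads [celery] worker_precheck; the deprecation/migration list maps the old
# [core] worker_precheck location onto this with a deprecation warning.
worker_precheck = conf.getboolean("celery", "worker_precheck", fallback=False)
```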
Without this commit, the Webserver throws an error when
xcom_pickling is enabled in the airflow config by setting `enable_xcom_pickling = True`
(the default is `False`).
Example error:
```
> return pickle.loads(result.value)
E _pickle.UnpicklingError: invalid load key, '{'.
airflow/models/xcom.py:250: UnpicklingError
--------------------------------------------------
```
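The intent of the fix, sketched (not the exact method in airflow/models/xcom.py): choose the decoder based on the setting instead of assuming one format:
```python
import json
import pickle

from airflow.configuration import conf

def deserialize_xcom_value(blob: bytes):
    # Sketch: follow core.enable_xcom_pickling, so JSON-serialized rows are not
    # fed to pickle.loads (which fails with "invalid load key, '{'") and
    # pickled rows are not fed to json.loads.
    if conf.getboolean("core", "enable_xcom_pickling"):
        return pickle.loads(blob)
    return json.loads(blob.decode("utf-8"))
```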
* Adds support for Hook discovery from providers
This PR extends provider discovery with a mechanism
for retrieving the mapping from connection type to hook.
Fixes #12456
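After discovery, the connection-type-to-hook mapping is reachable through the providers manager; a rough sketch (the exact shape of the values has varied between releases):
```python
from airflow.providers_manager import ProvidersManager

# Maps a connection type (e.g. "postgres") to the hook information registered
# by whichever provider ships that connection type.
hooks = ProvidersManager().hooks
print(sorted(hooks))
```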
`time.time() - start`, or `timezone.utcnow() - start_dttm` will work
fine in 99% of cases, but it has one fatal flaw:
They both operate on system time, and that can go backwards.
While this might be surprising, it can happen -- usually due to clocks
being adjusted.
And while it might seem rare, for long-running processes it is more
common than we might expect. Most of these durations are harmless to get
wrong (they mostly just feed logs), but it is better to be safe than sorry.
Also the `utcnow()` style I have replaced will be much lighter weight -
creating a date time object is a comparatively expensive operation, and
computing a diff between two even more so, _especially_ when compared to
just subtracting two floats.
To make the "common" case easier of wanting to compute a duration for a
block, I have made `Stats.timer()` return an object that has a
`duration` field.
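A minimal sketch of the idea using only the standard library (the real `Stats.timer()` wraps the statsd client, so details differ):
```python
import time

class Timer:
    """Context manager measuring a duration that cannot go backwards."""

    def __enter__(self):
        # time.monotonic() is unaffected by system clock adjustments, unlike
        # time.time() or timezone.utcnow().
        self._start = time.monotonic()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.duration = time.monotonic() - self._start
        return False

with Timer() as timer:
    time.sleep(0.1)  # stand-in for the block being measured
print(f"took {timer.duration:.3f}s")
```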
If a task is "manually" set to up_for_retry (via the UI for instance) it
might not have an end date, and much of the logic about computing
retries assumes that it does.
Without this, manually setting a running task to up_for_retry will make
it impossible to view the TaskInstance details page (as it
tries to print the is_premature property), and the NotInRetryPeriod
TIDep also fails -- both with an exception:
> File "airflow/models/taskinstance.py", line 882, in next_retry_datetime
> return self.end_date + delay
> TypeError: unsupported operand type(s) for +: 'NoneType' and 'datetime.timedelta'
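The defensive shape of the fix, sketched standalone (assuming that falling back to "now" is acceptable when end_date was never set; the real change lives in TaskInstance and the TIDep):
```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def next_retry_datetime(end_date: Optional[datetime], retry_delay: timedelta) -> datetime:
    # A task manually flipped to up_for_retry may have no end_date, so guard
    # against adding a timedelta to None.
    if end_date is None:
        end_date = datetime.now(timezone.utc)
    return end_date + retry_delay

print(next_retry_datetime(None, timedelta(minutes=5)))
```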
Using `pkg_resources.iter_entry_points` validates the version
constraints, and if any fail it will throw an Exception for that
entrypoint.
This sounds nice, but is a huge mis-feature.
So instead of that, switch to using importlib.metadata (well, its
backport importlib_metadata), which just gives us the entry points -- no
other verification of requirements is performed.
This has two advantages:
1. providers and plugins load much more reliably.
2. it's faster too
Closes #12692
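Sketched side by side (the `airflow.plugins` group is the existing plugin entry point group; the loop is illustrative, not the exact loader code):
```python
# pkg_resources-style loading validated each distribution's requirements, so a
# single broken pin could break an unrelated plugin:
#
#   from pkg_resources import iter_entry_points
#   for ep in iter_entry_points("airflow.plugins"):
#       plugin_class = ep.load()  # could raise VersionConflict et al.

# importlib_metadata just hands back the entry points:
import importlib_metadata

for dist in importlib_metadata.distributions():
    for ep in dist.entry_points:
        if ep.group == "airflow.plugins":
            plugin_class = ep.load()  # no requirement verification on the way
```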
This change upgrades setup.py and setup.cfg to provide a non-conflicting,
`pip check`-valid set of constraints for the CI image.
Fixes #10854
Co-authored-by: Tomek Urbaszek <turbaszek@apache.org>
This PR implements discovering and reading provider information from
packages (using entry_points) and - if found - from local
provider.yaml files for the built-in airflow providers,
when they are found in the airflow.providers packages.
The provider.yaml files - if found - take precedence over the
package-provided ones.
Add displaying provider information in CLI
Closes: #12470
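A rough sketch of what a provider package exposes (the `apache_airflow_provider` entry point group is the one used for discovery; the exact keys in the returned dictionary are illustrative here):
```python
# Inside the provider package: a callable returning provider metadata.
def get_provider_info():
    return {
        "package-name": "apache-airflow-providers-example",  # illustrative values
        "name": "Example",
        "description": "Hypothetical provider used for illustration.",
        "versions": ["1.0.0"],
    }

# The package's setup.cfg then points the entry point at that callable:
#
# [options.entry_points]
# apache_airflow_provider =
#     provider_info = example_provider:get_provider_info
```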