This race condition resulted in task success and failure callbacks being
called more than once. Here is the order of events that could lead to
this issue:
* task started running within process 2
* (process 1) local_task_job checked the task return code, which returned None
* (process 2) task exited with a failure state; task state updated to failed in the DB
* (process 2) task failure callback invoked through the taskinstance.handle_failure method
* (process 1) local_task_job heartbeat noticed the task state set to failed, mistook it for the state being updated externally, and also invoked the task failure callback
To avoid this race condition, we need to make sure task callbacks are
only invoked within a single process.
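A minimal sketch of the resulting rule, with hypothetical names (this is not the actual LocalTaskJob code): the supervising process should only treat a failed state as "set externally" -- and run the callback itself -- when the task process has not already exited and handled its own callback.
```python
from airflow.utils.state import State


def should_run_failure_callback_here(ti_state, task_return_code):
    """Hypothetical guard for the supervising process (process 1).

    If the task process (process 2) has exited, it already invoked
    taskinstance.handle_failure, so the supervisor must not call it again.
    """
    return ti_state == State.FAILED and task_return_code is None
```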
In https://github.com/apache/airflow/pull/13163 I attempted to only run callback requests when they are defined on the DAG. But I just found out that while we were storing the task-level callbacks as strings in the Serialized JSON, we were not storing DAG-level callbacks, and hence they defaulted to None when the Serialized DAG was deserialized, which meant that the DAG callbacks were not run.
This PR fixes that. We don't need to store DAG-level callbacks as strings, since we don't display them in the Webserver and their actual contents are not used anywhere in the Scheduler itself. The Scheduler just checks whether the callbacks are defined and sends a request to DagFileProcessorProcess to run them with the actual DAG file. So instead of storing the actual callbacks as strings, which would have resulted in a larger JSON blob, I have added properties to determine whether a callback is defined or not
(`dag.has_on_success_callback` and `dag.has_on_failure_callback`).
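A rough sketch of the idea, not the exact serialization code: the boolean properties are derived from the callbacks, and only these flags need to end up in the serialized JSON.
```python
class DagSketch:
    """Illustrative stand-in for DAG showing the boolean properties."""

    def __init__(self, on_success_callback=None, on_failure_callback=None):
        self.on_success_callback = on_success_callback
        self.on_failure_callback = on_failure_callback

    @property
    def has_on_success_callback(self) -> bool:
        # Only this flag is serialized, never the callable itself.
        return self.on_success_callback is not None

    @property
    def has_on_failure_callback(self) -> bool:
        return self.on_failure_callback is not None
```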
Note: SLA callbacks don't have this issue, as we currently check whether SLAs are defined on any tasks; if they are, we send a request to DagFileProcessorProcess, which then executes the SLA callback defined on the DAG.
- Use getattr() instead of __dict__ as __dict__ doesn't return
correct values for properties.
- Avoid unnecessary condition checks (the removed condition checks are covered by _comps)
If no on_*_callback is defined on the DAG, callbacks should not be registered and sent to the DAG Processor.
This will reduce the KeyError mentioned in https://github.com/apache/airflow/issues/13047
As discussed in AIP-21
* Rename airflow.sensors.external_task_sensor to airflow.sensors.external_task
* Rename airflow.sensors.sql_sensor to airflow.sensors.sql
* Rename airflow.contrib.sensors.weekday_sensor to airflow.sensors.weekday
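For example, after the renames the sensors are imported from the new module paths (class names are assumed unchanged):
```python
from airflow.sensors.external_task import ExternalTaskSensor
from airflow.sensors.sql import SqlSensor
from airflow.sensors.weekday import DayOfWeekSensor
```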
Connection form behaviour depends on the connection type. Since we've
separated providers into separate packages, the connection form should
be extendable by each provider. This PR implements both:
* extra fields added by provider
* configurable behaviour per provider
This PR will be followed by separate documentation on how to write your
provider.
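As a hedged sketch of what "extendable by each provider" looks like (method names follow the convention this PR introduces; treat the exact signatures and the `extra__...` field naming as assumptions), a provider's hook can declare extra form widgets and per-type field behaviour:
```python
from flask_babel import lazy_gettext
from wtforms import StringField


class MyProviderHook:  # hypothetical provider hook
    conn_type = "my_provider"

    @staticmethod
    def get_connection_form_widgets() -> dict:
        # Extra fields added by the provider; values end up in Connection.extra.
        return {"extra__my_provider__region": StringField(lazy_gettext("Region"))}

    @staticmethod
    def get_ui_field_behaviour() -> dict:
        # Configurable behaviour per provider: hide unused fields, relabel others.
        return {
            "hidden_fields": ["port", "schema"],
            "relabeling": {"host": "Account URL"},
        }
```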
This change also triggers (in tests only) the Snowflake annoyance described in #12881, so we had to xfail the Presto test where monkeypatching of Snowflake causes the test to fail.
Part of #11429
This change increases the maximum amount of code one can store in dag_code in MySQL. The limit for TEXT is 64KB, whereas for MEDIUMTEXT it is 16MB.
Fixes #12776
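A hedged sketch of the kind of Alembic migration this involves (the dialect check and column name are illustrative, not the exact migration file):
```python
from alembic import op
from sqlalchemy.dialects import mysql


def upgrade():
    conn = op.get_bind()
    if conn.dialect.name == "mysql":
        # TEXT caps out at 64KB; MEDIUMTEXT allows up to 16MB of DAG source.
        op.alter_column(
            "dag_code",
            "source_code",
            existing_type=mysql.TEXT,
            type_=mysql.MEDIUMTEXT,
            existing_nullable=False,
        )
```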
The previous behaviour led to "bad" data being written in the DB -- for
example:
```json
"dag": {
"tasks": [
"serialization_failed"
],
```
(`tasks` should be a list of dictionaries. It clearly isn't.)
Instead of doing this, we throw an error that is captured and shown using the existing import_error mechanism for DAGs. This almost certainly happens because a user has done "something interesting".
Without this change sensors in "reschedule" mode were being instantly
rescheduled because they didn't have the extra dep that
BaseSensorOperator added.
To fix that, we need to include deps in the serialization format (but, to save space, only when they are not the default list). As of this PR, only built-in deps are allowed -- a custom dep will result in a DAG parse error.
We can fix that for 2.0.x, as I think it is a _very_ uncommon thing to do.
Fixes #12783
DAGs with a schedule interval of None or `@once` don't have a following schedule, so we can't realistically calculate this metric.
Additionally, this changes the emitted metric from seconds to
milliseconds -- all timers to statsd should be in milliseconds -- this
is what Statsd and apps that consume data from there expect. See #10629
for more details.
This will be a "breaking" change from 1.10.14, which the metric was back-ported to but where it was (incorrectly) emitted in seconds.
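A hedged sketch of the intent (metric name and delay definition are illustrative, not the exact scheduler code): skip DAGs without a following schedule and hand statsd a timedelta, which is recorded in milliseconds.
```python
from airflow.stats import Stats


def emit_schedule_delay(dag, dag_run):
    if dag.schedule_interval in (None, "@once"):
        return  # no following schedule, so the metric cannot be calculated
    delay = dag_run.start_date - dag_run.execution_date
    # statsd timers take a timedelta (or a number of milliseconds)
    Stats.timing(f"dagrun.schedule_delay.{dag.dag_id}", delay)
```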
This can still show "None" if the DAG is not yet in the metadata DB -- showing either True or False there would give a false impression (especially False, as if it doesn't exist in the DB it can't be unpaused yet!)
* Cleanup & improvement around scheduling
- Remove unneeded code line
- Remove stale docstring
- Fix wrong docstring
- Fix stale doc image link in docstring
- avoid unnecessary loop in DagRun.schedule_tis()
- Minor improvement on DAG.deactivate_stale_dags()
which is invoked inside SchedulerJob
* Revert one change, because we plan to have a dedicated project-wide PR for this issue
* One more fix: dagbag.read_dags_from_db = True in DagFileProcess.process_file() is not needed anymore
Custom operators inheriting from DummyOperator will now, instead of going to a scheduled state, be set straight to success if they don't have callbacks set.
closes https://github.com/apache/airflow/issues/11393
Without this commit, the Webserver throws an error when XCom pickling is enabled in the Airflow config by setting `enable_xcom_pickling = True` (the default is `False`).
Example error:
```
> return pickle.loads(result.value)
E _pickle.UnpicklingError: invalid load key, '{'.
airflow/models/xcom.py:250: UnpicklingError
--------------------------------------------------
```
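The underlying requirement, as a hedged sketch (not the exact XCom code): deserialization has to honour the same `enable_xcom_pickling` setting that serialization used, otherwise JSON bytes get fed to `pickle.loads()` as in the traceback above.
```python
import json
import pickle

from airflow.configuration import conf


def deserialize_xcom_value(stored_value: bytes):
    if conf.getboolean("core", "enable_xcom_pickling"):
        return pickle.loads(stored_value)
    return json.loads(stored_value.decode("utf-8"))
```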
* Adds support for Hook discovery from providers
This PR extends provider discovery with a mechanism for retrieving the mapping from connection type to hook.
Fixes #12456
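With the discovery in place, the connection-type-to-hook mapping is exposed through the providers manager; a hedged usage sketch (the exact attribute and the shape of its values are assumptions):
```python
from airflow.providers_manager import ProvidersManager

manager = ProvidersManager()
# e.g. maps "postgres" to the hook registered by the Postgres provider
print(manager.hooks.get("postgres"))
```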
`time.time() - start`, or `timezone.utcnow() - start_dttm`, will work fine in 99% of cases, but they have one fatal flaw: they both operate on system time, and that can go backwards.
While this might be surprising, it can happen -- usually due to clocks being adjusted.
And while it might seem rare, for long-running processes it is more common than we might expect. Most of these durations are harmless to get wrong (they only end up in logs), but it is better to be safe than sorry.
Also, the replacement for the `utcnow()` style will be much lighter weight -- creating a datetime object is a comparatively expensive operation, and computing the difference between two of them even more so, _especially_ compared to just subtracting two floats.
To make the common case -- wanting to compute a duration for a block of code -- easier, I have made `Stats.timer()` return an object that has a `duration` field.
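A minimal sketch of the pattern (this mirrors the described usage, not the actual `Stats.timer()` implementation): durations come from a monotonic clock, and the timer object exposes a `duration` field after the block finishes.
```python
import time


class Timer:
    """Context manager that measures a block using a monotonic clock."""

    def __enter__(self):
        self._start = time.perf_counter()  # monotonic; unaffected by clock changes
        return self

    def __exit__(self, *exc):
        self.duration = time.perf_counter() - self._start
        return False


with Timer() as timer:
    time.sleep(0.1)  # stand-in for the timed block
print(f"took {timer.duration:.3f}s")
```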
If a task is "manually" set to up_for_retry (via the UI for instance) it
might not have an end date, and much of the logic about computing
retries assumes that it does.
Without this, manually setting a running task to up_for_retry makes it impossible to view the TaskInstance details page (as it tries to print the is_premature property), and also makes the NotInRetryPeriod TIDep fail -- both with an exception:
> File "airflow/models/taskinstance.py", line 882, in next_retry_datetime
> return self.end_date + delay
> TypeError: unsupported operand type(s) for +: 'NoneType' and 'datetime.timedelta'
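A hedged sketch of the guard (not the literal taskinstance.py change): fall back to "now" when the end date is missing, so the retry-period arithmetic never sees None.
```python
from datetime import timedelta

from airflow.utils import timezone


def next_retry_datetime(end_date, retry_delay: timedelta):
    # A task manually set to up_for_retry may have no end_date yet.
    end_date = end_date or timezone.utcnow()
    return end_date + retry_delay
```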
Because MySQL 8 tests were not being executed (fixed in #12591), the description added for the connection table was not compatible with MySQL 8 with the utf8mb4 character set.
This change adds a migration, and fixes the previous migration, to make it compatible.
Fixes a bug where Airflow would attempt to set child tasks to schedulable when users run `airflow task test`. This caused an error, as Airflow ran a DB query for a task that had not been recorded.
This changes the XComArg string representation from 'task_instance.pull(...)' to '{{ task_instance.xcom_pull(...) }}' so users can use XComArgs with f-strings (and other string formatting) in a simpler way. Instead of writing f'echo {{{{ {op.output} }}}}' they can simply write f'echo {op.output}'.
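A hedged usage sketch (the DAG wrapper and task ids are illustrative):
```python
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.utils.dates import days_ago

with DAG("xcomarg_fstring_example", start_date=days_ago(1), schedule_interval=None) as dag:
    producer = BashOperator(task_id="producer", bash_command="echo hello")
    consumer = BashOperator(
        task_id="consumer",
        # f-string interpolation now yields "echo {{ task_instance.xcom_pull(...) }}"
        bash_command=f"echo {producer.output}",
    )
```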
The init_on_load method used the deserialize_value method, which in the case of custom XCom backends may perform requests to external services (for example, downloading files from buckets).
This is problematic because the request would be sent wherever we query XCom (for example, when listing XComs in the web UI). This PR proposes implementing orm_deserialize_value, which allows overriding this behavior. By default we use BaseXCom.deserialize_value.
closes: #12315
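A hedged sketch of a custom backend using the new hook (the download helper is hypothetical): the expensive call stays in deserialize_value, while orm_deserialize_value returns something cheap for listings.
```python
from airflow.models.xcom import BaseXCom


def download_from_bucket(reference):
    """Hypothetical helper standing in for an external-service download."""
    return f"<contents of {reference}>"


class CustomXComBackend(BaseXCom):
    @staticmethod
    def deserialize_value(result):
        # Potentially expensive: talks to an external service.
        return download_from_bucket(result.value)

    def orm_deserialize_value(self):
        # Used when XComs are listed (e.g. in the web UI); no external requests.
        return f"reference: {self.value!r}"
```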
* K8s yaml templates not rendered by k8sexecutor
There is a bug in the YAML template rendering caused by the logic that YAML templates are only generated when the current executor is the K8s executor. This is a problem because the templates are generated by the task pod, which is itself running a LocalExecutor. This change also generates a "base" template if the TaskInstance has not run yet.
* fix tests
* fix taskinstance test
* fix taskinstance
* fix pod generator tests
* fix podgen
* Update tests/kubernetes/test_pod_generator.py
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
* @ashb comment
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
This commit adds the new concept of dag_policy, which is checked once for every DAG when creating the DagBag. It also improves the documentation around cluster policies.
closes: #12179
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
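A hedged example of a dag_policy in airflow_local_settings.py (the specific rule enforced here is illustrative):
```python
from airflow.exceptions import AirflowClusterPolicyViolation


def dag_policy(dag):
    # Checked once for every DAG when the DagBag is created.
    if not dag.tags:
        raise AirflowClusterPolicyViolation(
            f"DAG {dag.dag_id} must have at least one tag"
        )
```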
This function allows users of the k8s executor to get previews
of their tasks via the Airflow UI before they launch
Co-authored-by: Ryan Hamilton <ryan@ryanahamilton.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Since #7694 these haven't really been needed, but we hadn't removed them yet.
No UPDATING.md note for this as I think it's extremely unlikely anyone
was using this directly -- it's very much an implementation detail
relating to DAG/SimpleDag.
Hooks do not need to live under the "airflow.hooks" namespace for them to work -- so remove the ability to create them there via plugins. Using them as normal Python imports is good enough!
We still allow them to be "registered" to support dynamically populating the connections list in the UI (which won't be done for 2.0).
Closes #9507