Without this change sensors in "reschedule" mode were being instantly
rescheduled because they didn't have the extra dep that
BaseSensorOperator added.
To fix that we need to include deps in the serialization format (but, to
save space, only when they are not the default list). As of this PR,
only built-in deps are allowed -- a custom dep will result in a DAG
parse error.
We can fix that in 2.0.x, as I think custom deps are a _very_ uncommon
thing to use.
Fixes #12783
We have a number of custom forms that have required fields that weren't
explicitly marked as required.
This allowed you to submit the Connection form (for example) with
nothing as the Conn Id, leading to an empty string being used as the
connection id. This marks that and all the other required fields as
required.
We also replace DataRequired with InputRequired. The previous one
(DataRequired) tested the truthiness of the value, rather than just
checking that a value was submitted.
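For illustration, a minimal WTForms sketch of the difference (the field and form names here are simplified stand-ins for the real Airflow forms):
```
from wtforms import Form, StringField
from wtforms.validators import InputRequired


class ConnectionForm(Form):  # simplified stand-in for the real form
    # InputRequired only checks that input was submitted for the field;
    # DataRequired would additionally test the truthiness of the coerced
    # value (so e.g. 0 would be rejected).
    conn_id = StringField("Conn Id", validators=[InputRequired()])
```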
When Airflow is not installed as a package (for example for local
development from sources) there is no package metadata.
Many of our unit tests use the version field, and they fail if they
are run within a virtual environment where Airflow is not installed
as a package (for example, this is the default setting in IntelliJ).
This PR adds a fallback to read the Airflow version from setup in
case it cannot be read from the package metadata.
Dags with a schedule interval of None, or `@once`, don't have a following
schedule, so we can't realistically calculate this metric.
Additionally, this changes the emitted metric from seconds to
milliseconds -- all timers sent to StatsD should be in milliseconds, as
that is what StatsD and the apps that consume its data expect. See #10629
for more details.
This will be a "breaking" change from 1.10.14, where the metric was
back-ported to, but was (incorrectly) emitting seconds.
Importing anything from airflow.models pulls in _a lot_ of airflow, so
delay imports until the functions are called, or make use of
`TYPE_CHECKING` so the import doesn't actually happen at runtime.
**Before**: mean 2.58s (with a lot of variance)
```
airflow ❯ for i in 1 2 3; do time airflow --help >/dev/null; done
airflow --help > /dev/null 2.00s user 1.39s system 176% cpu 1.928 total
airflow --help > /dev/null 2.84s user 1.43s system 151% cpu 2.817 total
airflow --help > /dev/null 3.00s user 1.37s system 145% cpu 3.009 total
```
**After**: 0.526s
```
airflow --help > /dev/null 0.39s user 0.04s system 99% cpu 0.435 total
airflow --help > /dev/null 0.40s user 0.05s system 99% cpu 0.446 total
airflow --help > /dev/null 0.64s user 0.05s system 99% cpu 0.698 total
```
This also has an advantage in development where a syntax error doesn't
fail with a slightly confusing error message about "unable to configure
logger 'task'".
This can still show "None" if the dag is not yet in the metadata DB --
showing either True or False there would give a false impression
(especially False -- as if it doesn't exist in the DB it can't be
unpaused yet!)
If you try to run `airflow config list` with an old config you upgraded
from 1.8, it would fail for any sections that have been removed from the
default config -- `ldap` for instance.
This would also be a problem if the user makes a typo in a config
section, or is using the airflow config for storing their own
information.
While I was changing this code, I also removed the use of private
methods/variable access in favour of public API
* Cleanup & improvement around scheduling
- Remove unneeded code line
- Remove stale docstring
- Fix wrong docstring
- Fix stale doc image link in docstring
- avoid unnecessary loop in DagRun.schedule_tis()
- Minor improvement on DAG.deactivate_stale_dags()
which is invoked inside SchedulerJob
* Revert one change, because we plan to have a dedicated project-wide PR for this issue
* One more fix: dagbag.read_dags_from_db = True in DagFileProcess.process_file() is not needed anymore
* Dagrun object doesn't exist in the TriggerDagRunOperator
fixes https://github.com/apache/airflow/issues/12587
Fixes issue where dag_run object is not populated if the dag_run already
exists and is reset
* change to get_last_dag_run
* Update airflow/operators/dagrun_operator.py
Co-authored-by: Tomek Urbaszek <turbaszek@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Tomek Urbaszek <turbaszek@gmail.com>
This PR is a follow-up to #12375 and #12704; it improves the handling
of some errors in CLI commands, to avoid showing users too much
traceback, and uses SystemExit consistently.
`[webserver] secret_key` is also a secret, like the Fernet key. Allowing
it to be set via _CMD or _SECRET lets users keep it in an external secret store.
Custom operators inheriting from DummyOperator will now be set straight
to success instead of going to a scheduled state, as long as they don't
have callbacks set.
closes https://github.com/apache/airflow/issues/11393
This commit unifies the mechanism of rendering output of tabular
data. This gives users the possibility to either display a tabular
representation of data or render it as a valid JSON or YAML payload.
Closes: #12699
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
* Move config item 'worker_precheck' from section [core] to [celery]
This configuration is ONLY applicable for Celery Worker.
So it should be in section [celery], rather than [core]
* Add to deprecation/migration automatic list
Without this commit, the Webserver throws an error when
enabling xcom_pickling in the airflow_config by setting `enable_xcom_pickling = True`
(the default is `False`).
Example error:
```
> return pickle.loads(result.value)
E _pickle.UnpicklingError: invalid load key, '{'.
airflow/models/xcom.py:250: UnpicklingError
--------------------------------------------------
```
* Refine the DB query logic in www.views.task_stats()
- Given that filter_dag_ids is either allowed_dag_ids, or the intersection of allowed_dag_ids and selected_dag_ids,
filter_dag_ids should ALWAYS be included in the SQL query, no matter whether selected_dag_ids is None or not.
Currently, if selected_dag_ids is None, the query actually fetches the full result (and then 'filters' at the end).
This means more (unnecessary) data travels between Airflow and the DB.
- When we join tables A and B with A.id == B.id (the default is an INNER join), if we have already confirmed that ALL A.id values are in a specific list,
then implicitly ALL ids in the result table are guaranteed to be in that list as well.
This is why the two redundant .filter() chunks are removed.
A minor performance improvement should be expected.
Meanwhile, this change makes the code cleaner.
* Adds support for Hook discovery from providers
This PR extends providers discovery with the mechanism
of retrieving mapping of connections from type to hook.
Fixes #12456
* fixup! Adds support for Hook discovery from providers
* fixup! fixup! Adds support for Hook discovery from providers
`time.time() - start`, or `timezone.utcnow() - start_dttm`, will work
fine in 99% of cases, but it has one fatal flaw:
both operate on system time, and that can go backwards.
While this might be surprising, it can happen -- usually due to clocks
being adjusted.
And while it might seem rare, for long-running processes it is more
common than we might expect. Even though most of these durations are
harmless to get wrong (they only feed logs), it is better to be safe
than sorry.
Replacing the `utcnow()` style also makes this much lighter weight --
creating a datetime object is a comparatively expensive operation, and
computing a diff between two of them even more so, _especially_ when
compared to just subtracting two floats.
To make the "common" case easier of wanting to compute a duration for a
block, I have made `Stats.timer()` return an object that has a
`duration` field.
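A small sketch of the two ideas (the metric name is illustrative, and `Stats.timer()` exposing a `duration` field is as described in this change):
```
import time

from airflow.stats import Stats


def do_work():
    time.sleep(0.1)


# Monotonic clocks cannot go backwards, unlike time.time() or utcnow().
start = time.monotonic()
do_work()
elapsed = time.monotonic() - start

# The "common case": time a block and read the duration afterwards.
with Stats.timer("my_block_duration") as timer:  # illustrative metric name
    do_work()
print(timer.duration)
```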
If a task is "manually" set to up_for_retry (via the UI for instance) it
might not have an end date, and much of the logic about computing
retries assumes that it does.
Without this, manually setting a running task to up_for_retry makes
it impossible to view the TaskInstance details page (as it
tries to print the is_premature property), and also makes the
NotInRetryPeriod TIDep fail -- both with an exception:
> File "airflow/models/taskinstance.py", line 882, in next_retry_datetime
> return self.end_date + delay
> TypeError: unsupported operand type(s) for +: 'NoneType' and 'datetime.timedelta'
Using `pkg_resources.iter_entry_points` validates the version
constraints, and if any fail it will throw an Exception for that
entrypoint.
This sounds nice, but is a huge mis-feature.
So instead of that, switch to using importlib.metadata (well, its
backport importlib_metadata), which just gives us the entrypoints -- no
other verification of requirements is performed.
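Roughly, the loading then looks like this (a sketch only; the entrypoint group name is illustrative and the dict-style `.get()` access matches the backport's older API):
```
try:
    import importlib.metadata as importlib_metadata
except ImportError:
    import importlib_metadata  # the backport, for older Pythons

# Entrypoints are returned as-is; no version constraints are checked here,
# unlike pkg_resources.iter_entry_points.
for entry_point in importlib_metadata.entry_points().get("airflow.plugins", []):
    plugin = entry_point.load()
```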
This has two advantages:
1. providers and plugins load much more reliably.
2. it's faster too
Closes #12692
This change upgrades setup.py and setup.cfg to provide non-conflicting
`pip check` valid set of constraints for CI image.
Fixes #10854
Co-authored-by: Tomek Urbaszek <turbaszek@apache.org>
Co-authored-by: Tomek Urbaszek <turbaszek@apache.org>
This PR implements discovering and reading provider information from
packages (using entry_points) and -- if found -- from local
provider.yaml files for the built-in airflow providers,
when they are found in the airflow.providers packages.
The provider.yaml files -- if found -- take precedence over the
package-provided ones.
Add displaying provider information in CLI
Closes: #12470
- Remove the stale internal method `_get_security_context_val`.
It was added in PR #5429 but, from what I can see, it's not needed anymore.
- Avoid hard-coding when we can (we already have `core_section` specified, so we can avoid using `'core'`)
- Narrow down what we import from `airflow.settings`
By listening on the engine's `commit` we were picking up _all_ session
commit calls, even from sessions other than the one passed to
`prohibit_commit`, which was not intended.
This changes it to listen to before_commit, which is session-specific
rather than engine-"global", and also adds tests which were lacking
beforehand.
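A simplified sketch of the session-scoped approach (not the actual Airflow helper, which also allows explicitly-permitted commits):
```
from contextlib import contextmanager

from sqlalchemy import event


@contextmanager
def prohibit_commit(session):
    def fail(session_):
        raise RuntimeError("Unexpected commit inside prohibit_commit block")

    # "before_commit" is registered on this session only, unlike the
    # engine-wide "commit" event used previously.
    event.listen(session, "before_commit", fail)
    try:
        yield
    finally:
        event.remove(session, "before_commit", fail)
```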
Because MySQL 8 tests were not being executed (fixed in #12591), the
description column added to the connection table was not compatible
with MySQL 8 using the utf8mb4 character set.
This change adds a migration, and fixes the previous migration,
to make it compatible.
From Airflow 2.0, the `max_threads` config under the `[scheduler]` section has been renamed to `parsing_processes`.
This is to align the name with the actual code, where the Scheduler launches the number of processes defined by
`[scheduler] parsing_processes` to parse DAG files, calculate the next DagRun date for each DAG,
serialize them and store them in the DB.
Fixes a bug where Airflow would attempt to set child tasks to schedulable
for test tasks when users run `airflow task test`. This causes an error
as Airflow runs a DB seek for a task that has not been recorded.
https://github.com/apache/airflow/pull/9067 made conn_id unique,
and this is effective from 2.0.*.
Due to this change, BaseHook.get_connections() will return a list of length 1, or raise an exception.
In such a case, we should simply always get the only element
from the result of BaseHook.get_connections(),
and drop random.choice() in BaseHook.get_connection(), which was only applicable to the earlier
behaviour (multiple connections were allowed for a single conn_id).
The dags view uses onclick events for dagrun and taskinstance links.
This breaks url previews, copying urls, opening links in a new tab, etc.
This patch uses svg anchors with href attributes instead of onclick
events so that these links behave like normal links.
Documentation fixes/improvements:
- For Variables set by Environment Variable,
it was highlighted that they may not appear in the web UI,
but this was not highlighted for Connections set by Environment Variable.
This PR adds this note (in docs/howto/connection/index.rst).
- Fix wrong docstring of airflow.secrets.base_secrets.BaseSecretsBackend.get_variable().
- The Secret Backends didn't properly mention Variables in the docstrings
(all the focus was put on Connections only). This PR addresses this.
- A few other minor changes.
## Housekeeping for www/security.py
- correct type hint for dag_id (str rather than int)
- Use DAG name without prefix "DAG:" in logging (line 644)
- avoid unnecessary duplicated operation (line 653)
## Clean-up the logic in update_admin_perm_view()
Because RESOURCE_DAG_PREFIX is "DAG:" and RESOURCE_DAG is "DAGs",
if we have view_menu_id.in_(pv_ids),
we can be sure that view_menu_id != all_dag_view.id.
By making this change, we have cleaner logic, and can avoid some round-trips to the DB.
This is a follow up to #10023 - it seems that passing the defined namespace to the log call was missed in one of the rebases done during the PR.
Without this fix, logging will fail when the k8s connection uses a different namespace than the one SparkApplication(s) are actually submitted to.
The pool import command returns an exit code of zero in a few different
error cases. This causes problems for scripts that invoke the command,
since commands that actually failed will appear to have worked. This
patch returns a nonzero code if the pool file doesn't exist, if the file
isn't valid json, or if any of the pools in the file is invalid.
Fixes a bug when calling `/api/v1/dags/~/dagRuns/~/taskInstances/list` with dag_ids as parameter.
The schema had defined `start_date`, `end_date` and `state` as non-nullable, but they are optional.
This is the same fix as in #12461, but we didn't notice it as the tests
failed after 50 failures.
It also turns out that the k8s API doesn't take a V1NodeSelector and instead
just takes a dict.
Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
* Improve www.security.get_accessible_dags() and webserver performance
- the performance of get_accessible_dags() is improved by returning as early as possible
- the changes made in www.views.py are based on the fact that the check
on permissions.RESOURCE_DAG is already done in get_accessible_dags(),
which is in turn invoked by get_accessible_dag_ids().
* Fix-up. Incorporate the changes suggested by jhtimmins with minor change
Co-Authored-By: jhtimmins <jameshtimmins@gmail.com>
* Fix backwards compatibility further
This PR ensures that node_selector, affinity, and tolerations are all
converted into k8s API objects before they are sent to the
pod_mutation_hook. this fixes an inconsistency that would force airflow
engineers to consider both cases when writing their pod_mutation_hook
* nit
* Make the KubernetesPodOperator backwards compatible
This PR significantly reduces the pain of upgrading to Airflow 2.0
for users of the KubernetesPodOperator. Users will be allowed to
continue using the airflow.kubernetes custom classes
* spellcheck
* spelling
* clean up unnecessary files in 1.10
* clean up unnecessary files in 1.10
* clean up unnecessary files in 1.10
This changes XComArg string representation from 'task_instance.pull(...)'
to '{{ task_instance.xcom_pull(...) }}' so users can use XComArgs with
f-string (and other) in simpler way. Instead of doing
f'echo {{{{ {op.output} }}}}' they can simply do f'echo {op.output}'.
[ldap] section in airflow.cfg is not applicable anymore in 2.0 and master,
because the LDAP authentication (for webserver and API) is handled by FAB,
and the configuration for this is handled by webserver_config.py file.
The init_on_load method used the deserialize_value method which,
in the case of custom XCom backends, may perform requests to external
services (for example downloading a file from a bucket).
This is problematic because wherever we query XCom the request would be
sent (for example when listing XComs in the web UI). This PR proposes
implementing orm_deserialize_value, which allows overriding this
behavior. By default we use BaseXCom.deserialize_value.
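A hypothetical custom backend might then look like this (the storage details are made up; only the hook names come from this change):
```
from airflow.models.xcom import BaseXCom


class RemoteXComBackend(BaseXCom):  # hypothetical custom backend
    @staticmethod
    def serialize_value(value):
        # e.g. upload large values to a bucket and store only a reference
        return BaseXCom.serialize_value(value)

    @staticmethod
    def deserialize_value(result):
        # e.g. download the real value back from the bucket
        return BaseXCom.deserialize_value(result)

    def orm_deserialize_value(self):
        # Used when the ORM renders the row (e.g. the XCom list view in the
        # web UI) -- return something cheap instead of calling out remotely.
        return BaseXCom.orm_deserialize_value(self)
```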
closes: #12315
This commit adds the provide_file_and_upload context manager,
which works similarly to provide_file. Users can use it to
avoid the boilerplate of creating a temporary file and then
uploading its content to GCS.
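A short usage sketch, assuming the context manager accepts the bucket and object name as shown (the names here are illustrative):
```
from airflow.providers.google.cloud.hooks.gcs import GCSHook

hook = GCSHook()

# Work with a local temporary file; when the block exits, the hook
# uploads its contents to the given GCS object.
with hook.provide_file_and_upload(
    bucket_name="my-bucket", object_name="reports/latest.csv"
) as tmp_file:
    tmp_file.write(b"col_a,col_b\n1,2\n")
    tmp_file.flush()
```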
Loading plugins, particularly from setuptools entry points can be slow,
and since by default this happens per-task, it can slow down task
execution unexpectedly.
By having this log message users can know the source of the delay
(The change to test_standard_task_runner was to remove logging-config
side effects from that test)
Some CLI commands simply print messages when the requests fail.
The issue is that the exit code for these commands is 0 when it should be non-zero.
Pursuing very detailed status codes may not make sense here,
but we can at least ensure we return a non-zero status by using raise SystemExit().
A more proper exit status ensures people can better make use of the CLI.
(A few minor string-formatting issues are fixed here as well.)
A crucial feature of functions decorated with @task is the ability
to invoke them multiple times in a single DAG. To do this we
generate a custom task_id for each invocation. However, this didn't
work within a TaskGroup, as the task_id is already altered by adding
the group_id prefix. This PR fixes it.
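A small sketch of what now works (the DAG and ids are illustrative):
```
from datetime import datetime

from airflow import DAG
from airflow.decorators import task
from airflow.utils.task_group import TaskGroup

with DAG("taskgroup_decorator_demo", start_date=datetime(2021, 1, 1), schedule_interval=None):

    @task
    def add_one(value: int) -> int:
        return value + 1

    with TaskGroup("transform"):
        # Each invocation gets its own generated task_id, even though the
        # group already prefixes ids with "transform.".
        first = add_one(1)
        second = add_one(first)
```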
closes: #12309
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
* Fix full_pod_spec for k8spodoperator
Fixes a bug where the `full_pod_spec` argument is never factored
into the KubernetesPodOperator. The new order of operations is as
follows:
1. Check whether there is a pod_template_file and, if so, create the initial pod from it; otherwise start with an empty pod
2. If there is a full_pod_spec, reconcile the pod_template_file pod and the full_pod_spec pod
3. Reconcile with any of the argument overrides
* add tests
Because `2c6edca13270` (Resource based permissions) & `849da589634d` (Prefix DAG permissions)
were run before `92c57b58940d_add_fab_tables.py` and `03afc6b6f902_increase_length_of_fab_ab_view_menu_.py`,
the FAB tables were already created because those migrations imported `from airflow.www.app import create_app`
which calls the following lines that creates tables:
0e7f62418b/flask_appbuilder/security/sqla/manager.py (L86-L97)
Previously:
```
INFO [alembic.runtime.migration] Running upgrade bef4f3d11e8b -> 98271e7606e2, Add scheduling_decision to DagRun and DAG
INFO [alembic.runtime.migration] Running upgrade 98271e7606e2 -> 52d53670a240, fix_mssql_exec_date_rendered_task_instance_fields_for_MSSQL
INFO [alembic.runtime.migration] Running upgrade 52d53670a240 -> 849da589634d, Prefix DAG permissions.
[2020-11-14 02:35:43,055] {manager.py:727} WARNING - No user yet created, use flask fab command to do it.
[2020-11-14 02:35:46,790] {migration.py:515} INFO - Running upgrade 849da589634d -> 364159666cbd, Add creating_job_id to DagRun table
[2020-11-14 02:35:46,794] {migration.py:515} INFO - Running upgrade 364159666cbd -> 2c6edca13270, Resource based permissions.
[2020-11-14 02:35:46,795] {app.py:87} INFO - User session lifetime is set to 43200 minutes.
[2020-11-14 02:35:46,806] {manager.py:727} WARNING - No user yet created, use flask fab command to do it.
[2020-11-14 02:35:48,221] {migration.py:515} INFO - Running upgrade 2c6edca13270 -> 45ba3f1493b9, add-k8s-yaml-to-rendered-templates
[2020-11-14 02:35:48,226] {migration.py:515} INFO - Running upgrade 45ba3f1493b9 -> 92c57b58940d, Create FAB Tables
[2020-11-14 02:35:48,227] {migration.py:515} INFO - Running upgrade 92c57b58940d -> 03afc6b6f902, Increase length of FAB ab_view_menu.name column
```
Now:
```
INFO [alembic.runtime.migration] Running upgrade bef4f3d11e8b -> 98271e7606e2, Add scheduling_decision to DagRun and DAG
INFO [alembic.runtime.migration] Running upgrade 98271e7606e2 -> 52d53670a240, fix_mssql_exec_date_rendered_task_instance_fields_for_MSSQL
INFO [alembic.runtime.migration] Running upgrade 52d53670a240 -> 364159666cbd, Add creating_job_id to DagRun table
INFO [alembic.runtime.migration] Running upgrade 364159666cbd -> 45ba3f1493b9, add-k8s-yaml-to-rendered-templates
INFO [alembic.runtime.migration] Running upgrade 45ba3f1493b9 -> 92c57b58940d, Create FAB Tables
INFO [alembic.runtime.migration] Running upgrade 92c57b58940d -> 03afc6b6f902, Increase length of FAB ab_view_menu.name column
INFO [alembic.runtime.migration] Running upgrade 03afc6b6f902 -> 849da589634d, Prefix DAG permissions.
[2020-11-14 02:57:18,886] {manager.py:727} WARNING - No user yet created, use flask fab command to do it.
[2020-11-14 02:57:22,380] {migration.py:515} INFO - Running upgrade 849da589634d -> 2c6edca13270, Resource based permissions.
```
- Use a context manager to encapsulate task logging setup and teardown
- Create a copy, not a reference, of the handlers list
- Remove logging.shutdown(), it simply should not be called
Closes #12090
* K8s yaml templates not rendered by k8sexecutor
There is a bug in the yaml template rendering caused by the logic that
yaml templates are only generated when the current executor is the
k8sexecutor. This is a problem as the templates are generated by the
task pod, which is itself running a LocalExecutor. Also generates a
"base" template if this taskInstance has not run yet.
* fix tests
* fix taskinstance test
* fix taskinstance
* fix pod generator tests
* fix podgen
* Update tests/kubernetes/test_pod_generator.py
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
* @ashb comment
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
Rather than import the backend Task model directly, use the class that the backend actually uses. This could have been customised, and there is no reason not to use this reference.
This commit adds the new concept of dag_policy, which is checked
once for every DAG when creating the DagBag. It also improves
documentation around cluster policies.
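For example, a dag_policy defined in airflow_local_settings.py might look like this (a hedged sketch; the tag rule and the use of AirflowClusterPolicyViolation are assumptions for illustration):
```
# airflow_local_settings.py
from airflow.exceptions import AirflowClusterPolicyViolation
from airflow.models.dag import DAG


def dag_policy(dag: DAG) -> None:
    """Called once per DAG while the DagBag is being built."""
    if not dag.tags:
        raise AirflowClusterPolicyViolation(
            f"DAG {dag.dag_id} must have at least one tag"
        )
```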
closes: #12179
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
If you have configured S3 logs, but there is a problem then this is never
surfaced to the UI (nor the webserver logs) making this very hard to
debug.
This PR exposes some of these errors to the user.
Co-authored-by: Joao Ponte <jpe@plista.com>
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
- adding the install Databricks API to the Databricks hook (api/2.0/libraries/install)
- adding the uninstall Databricks API to the Databricks hook (api/2.0/libraries/uninstall)
* Fixes an issue that was causing an empty list to be sent to the BigQuery client `list_rows` method, resulting in no schema being returned.
* Added a test to check that providing an empty list for `selected_fields` results in `list_rows` being called with `None`.
* Add wait_for_completion option to dag run operator.
* Add wait_for_completion option to dag run operator.
* Change code format to pass sanity check.
* Simplify the logic to check dag run state.
* Move sleep in the beginning of loop and update pydoc.
* Change elif to if on checking allowed_states
Co-authored-by: Kaz Ukigai <kukigai@apple.com>
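A hedged usage sketch of the new option (DAG ids and intervals are illustrative; `wait_for_completion` and `allowed_states` come from this change, while `poke_interval` is an assumed name for the sleep between state checks):
```
from datetime import datetime

from airflow import DAG
from airflow.operators.dagrun_operator import TriggerDagRunOperator

with DAG("upstream_dag", start_date=datetime(2021, 1, 1), schedule_interval=None):
    trigger = TriggerDagRunOperator(
        task_id="trigger_downstream",
        trigger_dag_id="downstream_dag",
        wait_for_completion=True,   # block until the triggered run reaches a final state
        poke_interval=30,           # assumed: seconds between state checks
        allowed_states=["success"],
    )
```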
If stderr is not a TTY, rich was hard-wrapping warning messages at 80
characters:
```
/home/ash/code/airflow/airflow/airflow/configuration.py:328 DeprecationWarning:
The remote_logging option in [core] has been moved to the remote_logging option
in [logging] - the old setting has been used, but please update your config.
```
After
```
/home/ash/code/airflow/airflow/airflow/configuration.py:328 DeprecationWarning: The remote_logging option in [core] has been moved to the remote_logging option in [logging] - the old setting has been used, but please update your config.
```
`rich.print()` doesn't take a `soft_wrap` option, so I had to create a
`rich.console.Console` object -- and it seems best to cache those.
Before this commit:
```
...airflow/configuration.py:328 DeprecationWarning: The remote_logging option in has been moved to the remote_logging option in - the old setting has been used, but please update your config.
```
After this commit:
```
...airflow/configuration.py:328 DeprecationWarning: The remote_logging option in [core] has been moved to the remote_logging option in [logging] - the old setting has been used, but please update your config.
```
As this file is _always_ imported by anything in airflow, but warnings
are quite rare, I have also delayed the import.
* Unify user session lifetime configuration
* align with new linting rules
* exit app when removed args are provided in conf
* add one more test
* extract stopping gunicorn to method
* add docstring to stop_webserver method
* use lazy formatting
* exit webserver when removed options are provided
* align with markdown lint
* Move unify user session lifetime configuration section to master
* add new line
* remove quotes
Resolves #12254
A bug introduced in #11815. The function that updates the button URLs was failing when trying to update the "K8s Pod Spec" button, which is conditionally displayed (if k8s_or_k8scelery_executor). This fix adds a check to confirm the button exists before attempting to update it.
Core example DAGs should not depend on any non-core dependency
like providers packages.
closes: #12247
Co-authored-by: Xiaodong DENG <xd.deng.r@gmail.com>
The built-in functions all() and any() in Python support
short-circuiting (evaluation stops as soon as the overall return value
of the function is known), but this behavior is lost if you pass a list
comprehension instead of a generator expression. This affects performance.
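A quick illustration of the difference:
```
def is_big(n):
    print(f"checking {n}")
    return n > 10


values = [100, 2, 3]

# List comprehension: every element is evaluated before any() sees the list.
any([is_big(v) for v in values])   # prints "checking" three times

# Generator expression: any() stops after the first truthy result.
any(is_big(v) for v in values)     # prints "checking 100" only
```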
A bad rebase in #12082 deleted this file by mistake.
The missing file was also the reason the documentation build
needed to exclude these files.
Fixes #12239
This PR proposes using a custom showwarning function that
provides users with better information about warnings, using
the rich library to highlight them.
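Roughly along these lines (a sketch, not the exact implementation; the formatting and markup are illustrative):
```
import warnings

from rich.console import Console

console = Console(stderr=True)


def custom_show_warning(message, category, filename, lineno, file=None, line=None):
    # Highlight the location and warning class instead of the plain default output.
    console.print(
        f"[bold yellow]{filename}:{lineno}[/bold yellow] {category.__name__}: {message}",
        soft_wrap=True,
    )


warnings.showwarning = custom_show_warning
```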
This PR standardises the callable signatures in PythonOperator, PythonSensor, ExternalTaskSensor, SimpleHttpOperator and HttpSensor.
The callable facilities in PythonOperator have been refactored into airflow.utils.helper.make_kwargs_callable, and it is used in those other places to make them work the same way.
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
This function allows users of the k8s executor to get previews
of their tasks via the Airflow UI before they launch
Co-authored-by: Ryan Hamilton <ryan@ryanahamilton.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Since #7694 these haven't really been needed, but we hadn't removed them
yet.
No UPDATING.md note for this as I think it's extremely unlikely anyone
was using this directly -- it's very much an implementation detail
relating to DAG/SimpleDag.
* Filter dags by owner
* Separate links for multiple owners
* Minor style change
Co-authored-by: Ryan Hamilton <ryan@ryanahamilton.com>
Co-authored-by: Ryan Hamilton <ryan@ryanahamilton.com>
The change in #10806 made airflow work with implicit packages
when "airflow" got imported. This is a good change; however,
it has some unforeseen consequences. The 'provider_packages'
scripts copy all the providers code for backports, in order
to refactor it into the empty "airflow" directory in the
provider_packages folder. The #10806 change turned that
empty folder into an 'airflow' package, because it was in the
same directory as the provider_packages scripts.
Moving the scripts to dev solves this problem.
This change makes it so that certain operations in the scheduler are
called on a regular interval, instead of only once at start up, or every
time around the loop:
- adopt_or_reset_orphaned_tasks (detecting SchedulerJobs that died) was
previously only called on start up.
- _clean_tis_without_dagrun was previously called every time around the
scheduling loop, but it doesn't need to be done every time, as
this is a relatively rare cleanup operation
- _emit_pool_metrics doesn't need to be called _every_ time around the
loop, once every 5 seconds is enough.
This uses the built in ["sched" module][sched] to handle the "timers".
[sched]: https://docs.python.org/3/library/sched.html
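A simplified sketch of the pattern (intervals and callables are illustrative, not the scheduler's real ones):
```
import sched
import time

timers = sched.scheduler()  # uses time.monotonic and time.sleep by default


def emit_pool_metrics():
    print("emitting pool metrics")
    # re-arm so the callback fires again in ~5 seconds
    timers.enter(5.0, 1, emit_pool_metrics)


timers.enter(5.0, 1, emit_pool_metrics)

while True:  # the scheduling loop
    # ... per-loop scheduling work ...
    timers.run(blocking=False)  # fire any timers that are due, without waiting
    time.sleep(1.0)
```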
* Move metrics configuration to new section
* fixup! Move metrics configuration to new section
* fixup! fixup! Move metrics configuration to new section
* Apply suggestions from code review
Co-authored-by: Xiaodong DENG <xd.deng.r@gmail.com>
* fixup! Apply suggestions from code review
Co-authored-by: Xiaodong DENG <xd.deng.r@gmail.com>
* Add authentication to AWS with Google credentials
* fixup! Add authentication to AWS with Google credentials
* fixup! fixup! Add authentication to AWS with Google credentials
* fixup! fixup! fixup! Add authentication to AWS with Google credentials
If a user has set `[webserver] update_fab_perms = False` and runs the `airflow sync-perm` command to sync all permissions, they will receive the following error:
```
webserver_1 | [2020-11-07 15:13:07,431] {decorators.py:113} WARNING - Access is Denied for: can_index on: Airflow
```
and if the user was created before and some perms were already synced, the user won't be able to find the Security menu & Configurations view
If for some reason (network blip, Redis is down) AirflowTaskTimeout is raised (controlled by `[celery] operation_timeout`) when publishing a task to the broker, Airflow will now by default retry publishing the message at least 3 times, controlled by `[celery] task_publish_max_retries`.
Hooks do not need to live under the "airflow.hooks" namespace for them to
work -- so remove the ability to create them under there via plugins.
Using them as normal Python imports is good enough!
We still allow them to be "registered" to support dynamically populating
the connections list in the UI (which won't be done for 2.0).
Closes #9507
It is unnecessary to use an if statement to check the maximum of two values and then assign the value to a name. Just using the max built-in is straightforward and more readable.
* Add pod_template_override to executor_config
Users will be able to override the base pod_template_file on a per-task
basis.
* change docstring
* fix doc
* fix static checks
* add description
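A hedged example of what the per-task override might look like (the executor_config key and file path here are assumptions based on the PR title, not confirmed API):
```
from airflow.operators.bash import BashOperator

task = BashOperator(
    task_id="needs_custom_pod",
    bash_command="echo hello",
    # Assumed key name: point this one task at its own pod template file
    # instead of the base pod_template_file used by the executor.
    executor_config={"pod_template_file": "/opt/airflow/pod_templates/custom.yaml"},
)
```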
The original code loops over a space which could be smaller, and meanwhile the IF checks are not necessary.
This change aims for:
- a MINOR performance improvement
- cleaner code
In order to further reduce intra-dag task scheduling lag we add an
optimization: when a task has just finished executing (success or
failure) we can look at the downstream tasks of just that task, and then
make scheduling decisions for those tasks there -- we've already got the
dag loaded, and we know they are likely actionable as we just finished.
We should set tasks to scheduled if we can (but no further, i.e. not to
queued, as the scheduler has to make that decision with info about the
Pool usage etc.).
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
Previously we added a retry in DagFileProcessor.process_file to
retry dagbag.sync_to_db. However, this meant that if anyone called
dagbag.sync_to_db separately they also needed to manage retrying it
themselves. This caused failures in CI for MySQL.
resolves https://github.com/apache/airflow/issues/11543