* Add podOverride setting for KubernetesExecutor
Users of the KubernetesExecutor will now have a "podOverride"
option in the executor_config. This option allows users to
modify the pod launched by the KubernetesExecutor using a
`kubernetes.client.models.V1Pod` object. This is the first step
in deprecating the traditional executor_config.
* Fix k8s tests
* Fix docs
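The podOverride option from the first item can be pictured as layering a partial pod on top of the base pod the executor would otherwise launch. The sketch below is not Airflow's implementation: plain dicts stand in for `V1Pod` objects, and the hypothetical `apply_pod_override` helper assumes a simple rule where nested mappings are merged and every other value (including lists such as `containers`) is replaced wholesale.

```python
from copy import deepcopy
from typing import Any, Dict


def apply_pod_override(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of ``base`` with ``override`` layered on top (hypothetical).

    Nested dicts are merged recursively; any other value in ``override``
    (including lists) replaces the corresponding value in ``base``.
    """
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_pod_override(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


# Plain dicts used as a stand-in for V1Pod objects.
base_pod = {
    "metadata": {"name": "worker", "labels": {"app": "airflow"}},
    "spec": {"containers": [{"name": "base", "image": "apache/airflow"}]},
}
override = {"metadata": {"labels": {"team": "data"}}}

merged = apply_pod_override(base_pod, override)
```

Copying first keeps the base pod untouched, so the same base can be reused for every task while each task supplies only the fields it wants to change.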
The change from #10769 accidentally turned the integration tests
into a far longer run of the unit tests (we effectively ran the
unit tests twice and did not run the integration tests).
This fixes the problem by removing the readonly status from
INTEGRATIONS and setting it only after the integrations have
been set.
Until pre-commit can export the list of all configured
checks, we need to keep that list updated manually.
We check both the pre-commit list in breeze-complete and
the descriptions in STATIC_CODE_CHECKS.rst.
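A minimal sketch of such a consistency check, assuming the check ids have already been extracted from the pre-commit configuration, from breeze-complete and from STATIC_CODE_CHECKS.rst (the parsing step is omitted). The `find_undocumented_checks` helper and the sample ids are hypothetical, not the actual Airflow script.

```python
from typing import Dict, Iterable, Set


def find_undocumented_checks(
    configured: Iterable[str],
    breeze_complete: Iterable[str],
    documented: Iterable[str],
) -> Dict[str, Set[str]]:
    """Report pre-commit check ids missing from either manually kept list."""
    configured_set = set(configured)
    return {
        # Configured in pre-commit but not offered by breeze-complete.
        "missing_in_breeze_complete": configured_set - set(breeze_complete),
        # Configured in pre-commit but not described in STATIC_CODE_CHECKS.rst.
        "missing_in_static_code_checks": configured_set - set(documented),
    }


report = find_undocumented_checks(
    configured=["flake8", "hadolint", "mypy"],
    breeze_complete=["flake8", "mypy"],
    documented=["flake8", "hadolint", "mypy"],
)
```

A non-empty set in either entry would be the signal to fail the check and ask the contributor to update the corresponding list.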
The region parameter is required for some of the Google Dataproc operators.
Users should provide it explicitly to avoid creating data-intensive
tasks in an arbitrary default location.
We've observed the tests for the last couple of weeks, and most of
the tests marked with the "quarantine" marker seem to succeed
in a stable way (https://github.com/apache/airflow/issues/10118).
The removed tests have a success ratio of > 95% (20 runs without
problems), and this was also verified a week ago,
so they seem to be rather stable.
Only a few tests are either failing or causing
the quarantined builds to hang. I manually reviewed the
master tests that failed over the last few weeks and added the
tests that cause the build to hang.
Stability seems to have improved. This might be because of
temporary problems at the time we marked the quarantined builds,
or a too "generous" approach to marking tests as quarantined.
The improvement might also come from #10368, as the Docker engine
and the machines used to run the builds in GitHub experience far
less load (image builds are executed in separate builds), so
resource usage may have decreased. Another reason might be
stability improvements in GitHub Actions.
Or those tests may simply be more stable when run in isolation.
We can still add tests back to quarantine as soon as we see them
behave in a flaky way.
The remaining quarantined tests that need to be fixed:
* test_local_run (often hangs the build)
* test_retry_handling_job
* test_clear_multiple_external_task_marker
* test_should_force_kill_process
* test_change_state_for_tis_without_dagrun
* test_cli_webserver_background
We also move some of those tests to the "heisentests" category.
Those tests run fine in isolation but fail
the builds when run together with all other tests:
* TestImpersonation tests
We might find that those heisentests can be fixed, but for
now we are going to run them in isolation.
Also, since the quarantined tests fail more often,
the "num runs" to track for them has been decreased to 10,
so only the last 10 runs are tracked.
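Tracking only the last N runs amounts to a fixed-size rolling window. The sketch below assumes a hypothetical `RunTracker` class (not the actual CI tooling) and reuses the "> 95%" success-ratio threshold mentioned above; a test counts as stable only once the window is full.

```python
from collections import deque
from typing import Deque


class RunTracker:
    """Track pass/fail results of the last ``num_runs`` runs of one test (hypothetical)."""

    def __init__(self, num_runs: int = 10) -> None:
        self.results: Deque[bool] = deque(maxlen=num_runs)

    def record(self, passed: bool) -> None:
        # Once the window is full, the oldest result drops off automatically.
        self.results.append(passed)

    def success_ratio(self) -> float:
        if not self.results:
            return 0.0
        return sum(self.results) / len(self.results)

    def is_stable(self, threshold: float = 0.95) -> bool:
        # Require a full window before declaring a test stable.
        full = len(self.results) == self.results.maxlen
        return full and self.success_ratio() > threshold
```

With a window of 10, a single failure drags the ratio down to 0.9, so a quarantined test has to pass ten times in a row before it can be considered stable again.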
When rebuilding the image during commit, the kill command (which is
just a preventive measure) failed to find the spinner job to kill
and thereby failed the rebuild step in pre-commit.
This is now fixed.
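The underlying idea, that a preventive kill should not fail when its target has already exited, can be illustrated in Python. This is only an analogue of the shell fix, assuming a POSIX system; `kill_if_running` is a hypothetical helper, not part of the pre-commit scripts.

```python
import os
import signal


def kill_if_running(pid: int) -> bool:
    """Send SIGTERM to ``pid``; return False instead of failing if it is already gone."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        # The job already exited: there is nothing to stop and no reason to fail.
        return False
    return True
```

The return value still lets a caller log whether anything was actually stopped, without turning a harmless "no such process" into a failed build step.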
DataprocCreateCluster now requires:
- cluster config
- cluster name
- project id
This way users don't have to pass project_id twice
(in the cluster definition and as a parameter). The cluster object
is built in the create_cluster hook method.
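A minimal sketch of assembling the cluster object from the three required inputs, so that project_id is supplied only once. The `build_cluster` helper is hypothetical and is not the hook's code; the field names are assumed to follow the Dataproc v1 Cluster resource (`project_id`, `cluster_name`, `config`).

```python
from typing import Any, Dict


def build_cluster(
    project_id: str, cluster_name: str, cluster_config: Dict[str, Any]
) -> Dict[str, Any]:
    """Assemble a Dataproc cluster body from the required inputs (hypothetical)."""
    return {
        "project_id": project_id,  # passed once, not repeated inside the config
        "cluster_name": cluster_name,
        "config": cluster_config,
    }


cluster = build_cluster(
    "my-project",
    "example-cluster",
    {"master_config": {"num_instances": 1}},
)
```

Keeping project_id and cluster_name outside the config is what removes the duplication: the user-supplied config describes only the cluster shape.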
The constants in the test script were initialized after their
readonly status had been set.
This mainly affected the default values of those constants (but this
was already handled by _script_init.sh). More importantly,
INTEGRATIONS were not properly initialized, which caused some
integration tests to be skipped.
The docker(), helm() and kubectl() functions replace the real tools
to get verbose behaviour (we can print the exact command being
executed for those). But when 'set +e' was set before the command
was called, indicating that errors in those functions should be
ignored, this did not happen: the functions set 'set -e' just
before returning the non-zero value, effectively exiting the
script right after. This made the first-time experience
poor.
The fix also corrects the handling of stdout and stderr in those
functions. Previously they were joined so that they could be
printed to OUTPUT_FILE, but this lost the stdout/stderr
distinction. Now both stdout and stderr are written to the
output file, and they are also redirected to stdout/stderr
respectively, so that 2>/dev/null works as expected.
While fixing this, it turned out that one of the remove_images
methods was no longer used, so it was merged with the breeze one.
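The intended behaviour of those wrappers (log both streams, keep them separate, and leave the decision about failure to the caller) can be sketched in Python. This is an analogue of the shell functions, not their code; `run_verbose` and its parameters are hypothetical.

```python
import subprocess
import sys
from typing import List


def run_verbose(cmd: List[str], output_file: str) -> int:
    """Run ``cmd``, log both streams to ``output_file``, and keep them separate.

    The exit code is returned rather than raised, so the caller decides whether
    a failure is fatal (the analogue of honouring ``set +e``).
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    with open(output_file, "a") as log:
        log.write(f"$ {' '.join(cmd)}\n")
        log.write(proc.stdout)
        log.write(proc.stderr)
    # Re-emit each stream to where it came from, so that silencing only
    # stderr (like 2>/dev/null) still works for callers.
    sys.stdout.write(proc.stdout)
    sys.stderr.write(proc.stderr)
    return proc.returncode
```

This version buffers output until the command finishes and does not preserve the interleaving of the two streams; a streaming variant would need to read both pipes concurrently.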
The hadolint check only checked the "main dir" Dockerfile,
but we have more Dockerfiles now. All of them are now checked.
The following problems are fixed:
* DL3000 Use absolute WORKDIR
* DL4000 MAINTAINER is deprecated
* DL4006 Set the SHELL option -o pipefail before RUN with a pipe in it.
* SC2046 Quote this to prevent word splitting.
The following problems are ignored:
* DL3018 Pin versions in apk add. Instead of `apk add <package>` use `apk add <package>=<version>`
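A sketch of building a single hadolint invocation that covers every Dockerfile in the tree while ignoring DL3018. It assumes that Dockerfiles match the `Dockerfile*` pattern and that hadolint accepts several files plus one `--ignore RULE` option per ignored rule; `hadolint_command` is hypothetical and is not the actual pre-commit hook.

```python
from pathlib import Path
from typing import List, Sequence


def hadolint_command(root: Path, ignored: Sequence[str] = ("DL3018",)) -> List[str]:
    """Build one hadolint command line covering every Dockerfile under ``root``."""
    # Assumed naming convention: Dockerfile, Dockerfile.ci, ...
    dockerfiles = sorted(
        str(path) for path in root.rglob("Dockerfile*") if path.is_file()
    )
    cmd = ["hadolint"]
    for rule in ignored:
        # One --ignore option per rule code.
        cmd += ["--ignore", rule]
    return cmd + dockerfiles
```

Discovering the files instead of hard-coding one path is what keeps newly added Dockerfiles from silently escaping the check.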
If the pod restarts before the sleep time is over, the trim command never runs. I think it's better to reorder the commands so that the delete is executed first and only then the container goes to sleep. At the moment the sleep lasts 15 minutes, but people who want a longer sleep time will simply increase the EVERY value and can then run into this bug.
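The reordering can be sketched as a loop that trims first and sleeps afterwards, so a restart during the sleep never skips a trim. The real change is to a container's shell command; `trim_then_sleep` and its parameters are hypothetical, with `every_seconds` playing the role of EVERY (15 minutes is 900 seconds).

```python
import time
from typing import Callable, Optional


def trim_then_sleep(
    trim: Callable[[], None],
    every_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
    iterations: Optional[int] = None,
) -> None:
    """Run ``trim`` at the start of each cycle, then sleep (hypothetical).

    ``iterations=None`` loops forever, like the container command would.
    """
    done = 0
    while iterations is None or done < iterations:
        trim()  # delete first, so a restart mid-sleep cannot skip it
        sleep(every_seconds)
        done += 1
```

Injecting `sleep` keeps the sketch testable without waiting; in a container the default `time.sleep` would be used.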