* Fix error with quick-failing tasks in KubernetesPodOperator
Addresses an issue with the KubernetesPodOperator where tasks that die
quickly are not patched with "already_checked" because they never make
it to the monitoring logic.
* static fix
This is the same fix as in #12461, but we didn't notice it as the tests
failed after 50 failures.
It also turns out that the k8s API doesn't take a V1NodeSelector and instead
just takes a dict.
Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
* Make the KubernetesPodOperator backwards compatible
This PR significantly reduces the pain of upgrading to Airflow 2.0
for users of the KubernetesPodOperator. Users will be allowed to
continue using the airflow.kubernetes custom classes
* spellcheck
* spelling
* clean up unecessary files in 1.10
* clean up unecessary files in 1.10
* clean up unecessary files in 1.10
* Fix full_pod_spec for k8spodoperator
Fixes a bug where the `full_pod_spec` argument is never factored
into the kubernetespodoperator. The new order of operations is as
follows:
1. Check to see if there is a pod_template_file and if so create the initial pod, else start with empty pod
2. if there is a full_pod_spec , reconcile the pod_template_file pod and the full_pod_spec pod
3. reconcile with any of the argument overrides
* add tests
The K9s is fantastic tool that helps to debug a running k8s
instance. It is terminal-based windowed CLI that makes you
several times more productive comparing to using kubectl
commands. We've integrated k9s (it is run as a docker container
and downloaded on demand). We've also separated out KUBECONFIG
of the integrated kind cluster so that it does not mess with
kubernetes configuration you might already have.
Also - together with that the "surrounding" of the kubernetes
tests were simplified and improved so that the k9s integration
can be utilized well. Instead of kubectl port forwarding (which
caused multitude of problems) we are now utilizing kind's
portMapping feature + custom NodePort resource that maps
port 8080 to 30007 NodePort which in turn maps it to 8080
port of the Webserver. This way we do not have to establish
an external kubectl port forward which is prone to error and
management - everything is brought up when Airflow gets
deployed to the Kind Cluster and shuts down when the Kind
cluster is stopped.
Yet another problem fixed was killing of postgres by one of the
kubernetes tests ('test_integration_run_dag_with_scheduler_failure').
Instead of just killing the scheduler it killed all pods - including
the Postgres one (it was named 'airflow-postgres.*'). That caused
various problems, as the database could be left in a strange state.
I changed the tests to do what it claimed was doing - so killing only the
scheduler during the test. This seemed to improve the stability
of tests immensely in my local setup.
Seems that port forwarding during kubernetes tests started to behave
erratically - seems that kubectl port forward sometimes might hang
indefinitely rather than connect or fail.
We change the strategy a bit to try to allocate
increasing port numbers in case something like that happens.
* KubernetesPodOperator can retry log tailing in case of interruption
* fix failing test
* change read_pod_logs method formatting
* KubernetesPodOperator retry log tailing based on last read log timestamp
* fix test_parse_log_line test formatting
* add docstring to parse_log_line method
* fix kubernetes integration test
* Allow overrides for pod_template_file
A pod_template_file should be treated as a *template* not a steadfast
rule.
This PR ensures that users can override individual values set by the
pod_template_file s.t. the same file can be used for multiple tasks.
* fix podtemplatetest
* fix name
* Simplify Airflow on Kubernetes Story
Removes thousands of lines of code that essentially ammount to us
re-creating the Kubernetes API. Will offer a faster, simpler
KubernetesExecutor for 2.0
* Fix podgen tests
* fix documentation
* simplify validate function
* @mik-laj comments
* spellcheck
* spellcheck
* Update airflow/executors/kubernetes_executor.py
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
* Ensure that K8sPodOperator can pull namespace from pod_template_file
Fixes a bug where users who run K8sPodOperator could not run because
the operator was expecting a namespace parameter
* add test
* self.pod
* Update airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
* don't create pod until run
* spellcheck
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
* Make grace_period_seconds option on K8sPodOperator
This PR allows users to choose whether they want to gracefully kill
pods when they delete tasks in the UI or if they would like to
immediately kill them.
* Update airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
This PR ensures that when a user kills a KubernetesPodOperator task
in the airflow UI, that the associated pod is also killed using the
on_kill method.
We have already fixed a lot of problems that were marked
with those, also IntelluiJ gotten a bit smarter on not
detecting false positives as well as understand more
pylint annotation. Wherever the problem remained
we replaced it with # noqa comments - as it is
also well understood by IntelliJ.
All Pod Operator tests were using the same task id which made it
flaky as sometimes failed tests were still being deleted while
subsequent success tests already started. That lead to random 404
(not found) errors when the failed test was picked up first.
The Kubernetes tests are now run using Helm chart
rather than the custom templates we used to have.
The Helm Chart uses locally build production image
so the tests are testing not only Airflow but also
Helm Chart and a Production image - all at the
same time. Later on we will add more tests
covering more functionalities of both Helm Chart
and Production Image. This is the first step to
get all of those bundle together and become
testable.
This change introduces also 'shell' sub-command
for Breeze's kind-cluster command and
EMBEDDED_DAGS build args for production image -
both of them useful to run the Kubernetes tests
more easily - without building two images
and with an easy-to-iterate-over-tests
shell command - which works without any
other development environment.
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Co-authored-by: Daniel Imberman <daniel@astronomer.io>
Tests requiring Kubernetes Cluster are now moved out of
the regular CI tests and moved to "kubernetes_tests" folder
so that they can be run entirely on host without having
the CI image built at all. They use production image
to run the tests on KinD cluster and we add tooling
to start/stop/deploy the application to the KinD cluster
automatically - for both CI testing and local development.
This is a pre-requisite to convert the tests to convert the
tests to use the official Helm Chart and Docker images or
Apache Airflow.
It closes#8782