We already changed the Airflow DAG default owner to 'airflow'
in https://github.com/apache/airflow/pull/4151 but some
of our example DAGs and docs are still using
owner = 'Airflow'. This patch unifies them.
It is:
- quicker to install
- easier to get repeatable results
- smaller (130MB/15k files vs 190MB/23k files)
- nicer to use (has better help)
With this change you should be able to simply run `pytest` to run all the tests in the main airflow directory.
This consists of two changes:
* moving pytest.ini to the main airflow directory
* skipping collecting kubernetes tests when ENV != kubernetes
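The second change could look roughly like the following conftest.py fragment. This is a minimal sketch, not the code from this patch: the directory `tests/integration/kubernetes` and the helper `kubernetes_collect_ignore` are hypothetical, while `collect_ignore` is the standard pytest conftest variable listing paths to skip during collection.

```python
import os

# Hypothetical location of the Kubernetes tests, relative to this conftest.py.
KUBERNETES_TESTS_DIR = "tests/integration/kubernetes"


def kubernetes_collect_ignore(environ):
    """Return the paths pytest should skip for the given environment mapping."""
    if environ.get("ENV") != "kubernetes":
        return [KUBERNETES_TESTS_DIR]
    return []


# pytest reads this module-level list from conftest.py during collection.
collect_ignore = kubernetes_collect_ignore(os.environ)
```

With ENV unset or set to anything other than `kubernetes`, a plain `pytest` run never imports the Kubernetes test modules.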
Cyclic imports are detected seemingly at random by pylint checks when some
of the PRs are run in CI.
The detection was not deterministic because pylint usually uses as many
processors as are available and splits the list of .py files between the
separate pylint processes - depending on how the split is done, the pylint
check might or might not detect the cycle. The cycle is always detected when
all files are checked together.
In order to make it more deterministic, all pylint and mypy errors were resolved
in the whole executors package and in dag_processor.
At the same time plugins_manager has also been moved out of the executors
and all of the operators/hooks/sensors/macros because it was also causing
cyclic dependencies, and it is far easier to untangle those dependencies
in executors when the initialisation of all plugins is moved to plugins_manager.
Additionally require_serial is set in the pre-commit configuration to
make sure cycle detection is deterministic.
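Why the split matters can be shown with a toy model. This is only a sketch under stated assumptions: the import graph, the round-robin split and the helper names are hypothetical, and this is not pylint's actual algorithm. The point is that a checker that only sees the edges between files in its own chunk misses a cycle whose members land in different chunks.

```python
# Hypothetical import graph: module -> modules it imports.
IMPORTS = {
    "executors/base.py": ["plugins_manager.py"],
    "plugins_manager.py": ["executors/base.py"],
    "utils/helpers.py": [],
    "models/dag.py": ["utils/helpers.py"],
}


def has_cycle(files):
    """Detect an import cycle using only edges between the given files."""
    visible = set(files)
    state = {}  # file -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "visiting":
            return True
        if state.get(node) == "done":
            return False
        state[node] = "visiting"
        for dep in IMPORTS[node]:
            if dep in visible and visit(dep):
                return True
        state[node] = "done"
        return False

    return any(visit(f) for f in files)


def check_in_chunks(files, jobs):
    """Split files round-robin across `jobs` workers and check each chunk alone."""
    chunks = [files[i::jobs] for i in range(jobs)]
    return any(has_cycle(chunk) for chunk in chunks)
```

In this model a single worker always sees the cycle, while two workers see it only if both modules of the cycle happen to fall into the same chunk, which is the reason for forcing serial execution.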
* [AIRFLOW-6142] Fix different local/Travis pylint results
Sometimes Pylint on Travis CI still gives different results than Pylint run
locally. This was happening because we were using
AIRFLOW_MOUNT_SOURCE_DIR_FOR_STATIC_CHECKS="true" for static checks. This is
needed for the checklicence check only - just to make sure that all source files
(including scripts etc.) are mounted to the container.
However, this makes the environment slightly different when it comes to pylint
checks. We would like to have it exactly identical when run locally and in CI,
so for static checks we should rather use
AIRFLOW_MOUNT_HOST_VOLUMES_FOR_STATIC_CHECKS="true" for all checks but the
checklicence one - same as used locally.
This way running:
pre-commit run pylint --all-files
should always give the same results locally and in Travis.
* Update scripts/ci/_utils.sh
Co-Authored-By: Felix Uellendall <feluelle@users.noreply.github.com>
- `security_context` was missing from docs of `KubernetesPodOperator`
- `KubernetesPodOperator` kwarg `in_cluster` erroneously defaults to
False in comparison to `default_args.py`, also default `do_xcom_push`
was overwritten to False in contradiction to `BaseOperator`
- `KubernetesPodOperator` kwarg `resources` is erroneously passed to
`base_operator`, instead should only go to `PodGenerator`. The two
have different syntax. (both on `master` and `v1-10-test` branches)
- `kubernetes/pod.py`: classes do not have `__slots__`
so they would accept arbitrary values in `setattr`
- Reduce the number of times the pod object is copied before execution
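The effect of `__slots__` mentioned above can be sketched as follows. The class `Resources` here is a hypothetical, reduced stand-in and not the actual class from `kubernetes/pod.py`; it only illustrates that a class with `__slots__` rejects attributes that were not declared.

```python
class Resources:
    """Hypothetical stand-in for a class in kubernetes/pod.py."""

    __slots__ = ("request_memory", "request_cpu")

    def __init__(self, request_memory=None, request_cpu=None):
        self.request_memory = request_memory
        self.request_cpu = request_cpu


resources = Resources(request_memory="128Mi")
resources.request_cpu = "500m"  # declared slot: accepted

try:
    resources.reqest_cpu = "1"  # misspelled name: rejected with AttributeError
    typo_accepted = True
except AttributeError:
    typo_accepted = False
```

Without `__slots__` the misspelled assignment would silently create a new attribute and the intended value would never reach the pod.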
The list of tests for autocomplete is now generated automatically when you enter Breeze.
It takes about 40 seconds to generate the list; until it is done there are
no autocompletions, but they appear as soon as the list is ready.
* [AIRFLOW-5704] Improve Kind Kubernetes scripts for local testing
* Fixed problem that Kubernetes tests were testing latest master
rather than what came from the local sources.
* Moved Kubernetes scripts to 'in_container' dir where they belong now
* Kubernetes tests are now better suited for running locally
* Kubernetes cluster is not deleted until environment is stopped
* Kubernetes image is built outside of the container and passed as .tar
* Kubectl version name is corrected in the Dockerfile
* Kubernetes Version can be used to select Kubernetes version
* Running kubernetes scripts is now easy in Breeze
* Instructions on how to run Kubernetes tests are updated
* Better flags in Breeze are used to run Kubernetes environment/tests
* The old "bare" environment is replaced by --no-deps switch
This change is a further step in simplifying the set of scripts
used by CI. The separate checklicence image was implemented as an
optimisation of the licence check time. The image to download was
small and could be downloaded slightly faster in CI. However that
made all the management scripts more complex and led to having
separate jobs for check licence and static checks. That actually led
to longer Travis jobs - because a new machine had to
be spun up for the checklicence check only.
With this change, the CI image is the only one left and it is slightly
bigger (with RAT tool added) but the same image is used for all the
tests - unit tests, static checks and checklicence checks.
This also makes it easier to manage the images and decreases update
overhead on the developers using Breeze.
The slim image gave only a very small gain when executing the tests in CI. The
image was significantly smaller, but then for local development and testing
you needed both the full CI and the SLIM-CI images.
This made the scripts and docker images needlessly complex - especially
in the wake of the coming Production image it turned out to be a premature
optimisation. While it sped up some static check jobs in Travis
(slightly - by 10-20 seconds), it increased (by minutes) the time developers
needed to get a working environment and to keep it updated every time
an update was needed.
Also having two separately managed images made it rather complex to join
some of the Travis CI jobs (there is a follow-up change getting rid
of the Checklicence image).
With this change both static checks and tests are executed using single
image. That also opens doors for further simplification of the scripts
and easier implementation of production image.
This is needed to keep breeze --help in sync with the documentation.
It makes it easier for the follow-up changes needed for production
image to keep the docs in sync with the code.
All files are mounted in CI now and checked using the RAT tool,
as opposed to only the runtime-needed files. This is enabled for the CI
build only, as mounting all local files to Docker (especially on Mac)
has a big performance penalty when running the checks (the slow osxfs
volume and thousands of small generated node_modules files make the
check run for a number of minutes). The RAT checks will by default
use the selective volumes but on CI they will mount the whole
source directory.
Also the latest version of the RAT tool is used now and the list
of checked files is additionally printed as output of the RAT
check so that we are sure the files we expect to be there are
actually verified.
This PR reimplements Kubernetes integration testing using kind,
a tool for running local Kubernetes clusters using Docker container
"nodes". The "nodes" are deployed to a separate docker daemon
(dind) started through docker-compose.
Replace the default unload_option 'PARALLEL OFF' with 'HEADER', as Redshift
handles headers in parallel mode now.
Co-authored-by: Shreya Chakraborty <shrechak@users.noreply.github.com>
The `aws_default` by default specifies the `region_name` to be
`us-east-1` in its `extra` field. This causes trouble when the desired
AWS account uses a different region as this default value has priority
over the $AWS_REGION and $AWS_DEFAULT_REGION environment variables,
gets passed directly to `botocore` and does not seem to be documented.
This commit removes the default region name from the `aws_default`'s
extra field. This means that it will have to be set manually, which
would follow the "explicit is better than implicit" philosophy.
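The precedence problem described above can be modelled as follows. This is a minimal sketch: `resolve_region` is a hypothetical helper that mirrors only the ordering stated in this message (the connection's `extra` wins over the environment), the relative order of the two environment variables is an assumption, and none of this is the actual hook or `botocore` code.

```python
import json


def resolve_region(connection_extra, environ):
    """Pick a region: the connection's `extra` wins over environment variables."""
    extra = json.loads(connection_extra) if connection_extra else {}
    return (
        extra.get("region_name")
        or environ.get("AWS_REGION")           # order of the two variables
        or environ.get("AWS_DEFAULT_REGION")   # is an assumption
    )


env = {"AWS_DEFAULT_REGION": "eu-west-1"}

# Old default: the extra field silently overrides the environment.
old = resolve_region('{"region_name": "us-east-1"}', env)

# Without a region in extra, nothing overrides the environment any more.
new = resolve_region(None, env)
```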
This change optimises further image building and removes unnecessary
verbosity in building the images for CI builds.
After this change is merged, only the necessary images are built for
each type of check:
* Tests -> only CI
* Static checks (with/without pylint) -> Only CI_SLIM
* Docs -> only CI_SLIM
* Licence checks -> Only CHECKLICENCE
Previously only the right images were built in ci_before_install.sh,
but in case of static checks the pre-commit build image step
also rebuilt the CHECKLICENCE and CI images. This was not necessary
and took very long in case of the CRON job - it caused the CRON job to
fail after 10 minutes of inactivity.
* [AIRFLOW-5147] extended character set for k8s worker pods annotations
* updated UPDATING.md with new breaking changes
* excluded pylint too-many-statement check from constructor due to its nature
This commit adds full interactivity to pre-commits. Whenever you run pre-commit
and it detects that the image should be rebuilt, an interactive question will
pop up instead of failing the build and asking to rebuild with REBUILD=yes.
This is much nicer from the user perspective. You can choose whether to:
1) Rebuild the image (which will take some time)
2) Not rebuild the image (this will use the old image with hope it's OK)
3) Quit.
The answer to that question is carried across all images that need rebuilding.
There is a special "build" pre-commit hook that takes care of that.
Note that this interactive question cannot be asked if you run only a
single pre-commit hook with Dockerfile, because it can run multiple processes
and they could start building in parallel. This is not desired, so instead we fail
such builds.
This is needed so that you can easily kill such checks with ^C.
Not doing it might cause your docker containers to run for a long time
and take up precious resources.
We have fairly complex python version detection in our CI scripts.
They have to handle several cases:
1) Running builds on DockerHub (we cannot pass different environment
variables there, so we detect python version based on the image
name being built (airflow:master-python3.7 -> PYTHON_VERSION=3.7))
2) Running builds on Travis CI. We use python version determined
from default python3 version available on the path. This way we
do not have to specify PYTHON_VERSION separately in each job,
we just specify which host python version is used for that job.
This makes a nice UI experience where you see python version in
Travis UI.
3) Running builds locally via scripts where we can pass PYTHON_VERSION
as environment variable.
4) Running builds locally for the first time with Breeze. By default
we determine the version based on default python3 version we have
in the host system (3.5, 3.6 or 3.7) and we use this one.
5) Selecting python version with Breeze's --python switch. This will
override python version but it will also store the last used version
of python in .build directory so that it is automatically used next
time.
This change adds the necessary explanations to the code that works for
all the cases and fixes some of the edge-cases we had. It also
extracts the code to a common directory.
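The cases listed above can be sketched as a single fallback chain. This is an illustration only: the real detection lives in the CI shell scripts, the function and parameter names below are hypothetical, and the order in which the sources are tried is an assumption rather than something this change states.

```python
import re


def version_from_image_name(image_name):
    """Extract '3.7' from a name such as 'airflow:master-python3.7'."""
    match = re.search(r"python(\d+\.\d+)$", image_name)
    return match.group(1) if match else None


def detect_python_version(breeze_switch=None, env_version=None,
                          image_name=None, stored_version=None,
                          host_version=None):
    """Return the first available version; the ordering is an assumption."""
    candidates = [
        breeze_switch,    # case 5: Breeze --python switch
        env_version,      # case 3: PYTHON_VERSION environment variable
        version_from_image_name(image_name) if image_name else None,  # case 1
        stored_version,   # case 5: last used version remembered in .build
        host_version,     # cases 2 and 4: default python3 on the host
    ]
    for candidate in candidates:
        if candidate:
            return candidate
    raise ValueError("could not determine the Python version")
```

For example, `detect_python_version(image_name="airflow:master-python3.7")` yields `"3.7"`, matching case 1.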
1. Issue old conf method deprecation warnings properly and remove current old conf method usages.
2. Unify the way to use conf as `from airflow.configuration import conf`
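The deprecation pattern from point 1 can be sketched like this. `ConfigStandIn`, its values and the warning text are hypothetical placeholders and not the real `airflow.configuration` code; only the standard library `warnings.warn` call is the actual API.

```python
import warnings


class ConfigStandIn:
    """Hypothetical, heavily reduced stand-in for the configuration object."""

    def __init__(self, values):
        self._values = values

    def get(self, section, key):
        return self._values[section][key]


# Preferred access path: the shared `conf` object.
conf = ConfigStandIn({"core": {"executor": "SequentialExecutor"}})


def get(section, key):
    """Old module-level accessor: still works, but warns the caller."""
    warnings.warn(
        "Calling 'get' on the configuration module is deprecated. "
        "Use 'conf.get' instead.",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this wrapper
    )
    return conf.get(section, key)
```

Callers that switch to `conf.get(...)` get the same value without triggering the warning.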
- change the order of arguments for `has_mail_attachment`, `retrieve_mail_attachments` and `download_mail_attachments`
- add `get_conn` function
- refactor code
- fix pylint issues
- add imap_mail_filter arg to ImapAttachmentToS3Operator
- add mail_filter arg to ImapAttachmentSensor
- remove superfluous tests
- change the order of arguments in the sensors + operators __init__
Change SubDagOperator to use Airflow scheduler to schedule
tasks in subdags instead of backfill.
In the past, SubDagOperator relied on the backfill scheduler
to schedule tasks in the subdags. Tasks in the parent DAG
are scheduled via the Airflow scheduler while tasks in
a subdag are scheduled via backfill, which complicates
the scheduling logic and makes it difficult to maintain
the two scheduling code paths.
This PR simplifies how tasks in subdags are scheduled.
SubDagOperator is responsible for creating a DagRun for the subdag
and waiting until all the tasks in the subdag finish. The Airflow
scheduler picks up the DagRun created by SubDagOperator, then
creates and schedules the tasks accordingly.
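The new division of work can be illustrated with a toy model. Everything below is hypothetical (the `Toy*` classes, the polling loop, the one-task-per-loop scheduler) and deliberately avoids the real Airflow classes; it only shows that the operator creates the run and waits, while the scheduler does the actual task scheduling.

```python
class ToyDagRun:
    """Hypothetical record of one subdag run and its pending tasks."""

    def __init__(self, dag_id, task_ids):
        self.dag_id = dag_id
        self.pending = list(task_ids)
        self.finished = []

    @property
    def done(self):
        return not self.pending


class ToyScheduler:
    """Toy scheduler: each loop runs one pending task per known DagRun."""

    def __init__(self):
        self.dag_runs = []

    def run_one_loop(self):
        for dag_run in self.dag_runs:
            if dag_run.pending:
                dag_run.finished.append(dag_run.pending.pop(0))


class ToySubDagOperator:
    """Toy operator: creates the DagRun, then only polls for completion."""

    def __init__(self, subdag_id, task_ids):
        self.subdag_id = subdag_id
        self.task_ids = task_ids

    def execute(self, scheduler, max_polls=10):
        dag_run = ToyDagRun(self.subdag_id, self.task_ids)
        scheduler.dag_runs.append(dag_run)  # hand the run over to the scheduler
        for _ in range(max_polls):
            if dag_run.done:
                return dag_run
            scheduler.run_one_loop()  # stands in for waiting between polls
        raise TimeoutError("subdag did not finish in time")
```

The operator itself never touches individual tasks, so only one scheduling code path remains.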
TRAVIS_BRANCH is set to TAG when a TAG build runs. We should always
use the branch, and we already have our current branch in
hooks/_default_branch.sh so we can use it from there.
This seems to be the only way, as Travis does not pass the branch
in any variable - mainly because it is not known what branch we
are in when building a TAG build.
The latest python will only be pulled by DockerHub when building
master/v1-10-test - which means that it will eventually catch
up with the latest python security releases but it will not
slow down the CI builds.
Most of the values I've removed here are the current defaults, so we
don't need to specify them again.
The reason I am removing them is that `email_backend` of
`airflow.utils.send_email_smtp` has been incorrect since 1.7.2(!) but
hasn't mattered until #5379 somehow triggered it. By removing the
default values it should make it easier to update in future.
Note: The order of arguments has changed for `check_for_prefix`.
The `bucket_name` is now optional. It falls back to the `connection schema` attribute.
- refactor code
- complete docs
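The fallback for `bucket_name` can be sketched as below. `resolve_bucket_name` is a hypothetical helper and not the hook's actual implementation, and raising when neither value is available is an assumption made for the example.

```python
def resolve_bucket_name(bucket_name=None, connection_schema=None):
    """Prefer the explicit bucket name, else fall back to the connection schema."""
    if bucket_name:
        return bucket_name
    if connection_schema:
        return connection_schema
    # Assumption: with neither value available there is nothing sensible to use.
    raise ValueError("bucket_name was not given and the connection has no schema")
```

For example, `resolve_bucket_name(connection_schema="my-bucket")` returns `"my-bucket"`, while an explicit `bucket_name` always takes priority.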
This change fixes autodoc generated documentation problems but also
leaves generated .rst files in _api folder so that it is easier to
debug and fix problems like that in the future.