The `aws_default` connection specifies `region_name` as `us-east-1` in its
`extra` field by default. This causes trouble when the desired AWS account
uses a different region, because this default value has priority
over the $AWS_REGION and $AWS_DEFAULT_REGION environment variables,
gets passed directly to `botocore` and does not seem to be documented.
This commit removes the default region name from the `aws_default`
connection's `extra` field. This means that it will have to be set explicitly,
which follows the "explicit is better than implicit" philosophy.
This change further optimises image building and removes unnecessary
verbosity when building the images for CI builds.
After this change is merged, only the necessary images are built for
each type of check:
* Tests -> only CI
* Static checks (with/without pylint) -> only CI_SLIM
* Docs -> only CI_SLIM
* Licence checks -> only CHECKLICENCE
Previously only the right images were built in ci_before_install.sh,
but for static checks the pre-commit build-image step also rebuilt
the CHECKLICENCE and CI images - which was not necessary and very
slow in the case of the cron job - this caused the cron job to
fail after 10 minutes of inactivity.
* [AIRFLOW-5147] extended character set for k8s worker pod annotations
* updated UPDATING.md with new breaking changes
* excluded pylint too-many-statements check from the constructor due to its nature
This commit adds full interactivity to pre-commits. Whenever you run pre-commit
and it detects that the image should be rebuilt, an interactive question will
pop up instead of failing the build and asking you to rebuild with REBUILD=yes.
This is much nicer from the user perspective. You can choose whether to:
1) Rebuild the image (which will take some time)
2) Not rebuild the image (this will use the old image with hope it's OK)
3) Quit.
The answer to that question is carried across all images that need rebuilding.
A special "build" pre-commit hook takes care of that.
Note that this interactive question cannot be asked if you run only a
single pre-commit hook with the Dockerfile, because pre-commit can run multiple
processes and you could start building in parallel. This is not desired, so instead
we fail such builds.
This is needed so that you can easily kill such checks with ^C.
Not doing it might cause your Docker containers to run for a long time
and take up precious resources.
We have fairly complex python version detection in our CI scripts.
They have to handle several cases:
1) Running builds on DockerHub (we cannot pass different environment
variables there, so we detect the python version based on the image
name being built: airflow:master-python3.7 -> PYTHON_VERSION=3.7)
2) Running builds on Travis CI. We use python version determined
from default python3 version available on the path. This way we
do not have to specify PYTHON_VERSION separately in each job,
we just specify which host python version is used for that job.
This makes a nice UI experience where you see python version in
Travis UI.
3) Running builds locally via scripts where we can pass PYTHON_VERSION
as environment variable.
4) Running builds locally for the first time with Breeze. By default
we determine the version based on the default python3 version we have
on the host system (3.5, 3.6 or 3.7) and we use this one.
5) Selecting the python version with Breeze's --python switch. This will
override the python version, but it will also store the last used version
of python in the .build directory so that it is automatically used next
time.
This change adds necessary explanations to the code that works for
all the cases and fixes some of the edge-cases we had. It also
extracts the code to common directory.
1. Issue deprecation warnings properly for the old conf methods and remove existing usages of them.
2. Unify the way to use conf as `from airflow.configuration import conf`
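For illustration, the unified access pattern looks roughly like this (the section and key names are just examples):

```python
from airflow.configuration import conf

dags_folder = conf.get("core", "dags_folder")
parallelism = conf.getint("core", "parallelism")
```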
- changes the order of arguments for `has_mail_attachment`, `retrieve_mail_attachments` and `download_mail_attachments`
- add `get_conn` function
- refactor code
- fix pylint issues
- add imap_mail_filter arg to ImapAttachmentToS3Operator
- add mail_filter arg to ImapAttachmentSensor
- remove superfluous tests
- changes the order of arguments in the sensors + operators __init__
Change SubDagOperator to use Airflow scheduler to schedule
tasks in subdags instead of backfill.
In the past, SubDagOperator relied on the backfill scheduler
to schedule tasks in the subdags. Tasks in the parent DAG
are scheduled via the Airflow scheduler while tasks in
a subdag are scheduled via backfill, which complicates
the scheduling logic and makes it difficult to maintain
the two scheduling code paths.
This PR simplifies how tasks in subdags are scheduled.
SubDagOperator is responsible for creating a DagRun for the subdag
and waiting until all the tasks in the subdag finish. The Airflow
scheduler picks up the DagRun created by SubDagOperator and
creates and schedules the tasks accordingly.
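For context, a minimal (hypothetical) SubDagOperator declaration looks like this; with this change the scheduler, not backfill, runs the tasks of the DagRun it creates. DAG ids, dates and the `create_subdag` factory are made up:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator


def create_subdag(parent_dag_id, child_id, start_date):
    # Subdag ids follow the "<parent>.<child>" convention.
    subdag = DAG(dag_id="{}.{}".format(parent_dag_id, child_id),
                 start_date=start_date, schedule_interval=None)
    DummyOperator(task_id="subdag_task", dag=subdag)
    return subdag


with DAG("parent_dag", start_date=datetime(2019, 1, 1),
         schedule_interval=None) as dag:
    section_1 = SubDagOperator(
        task_id="section_1",
        subdag=create_subdag("parent_dag", "section_1", datetime(2019, 1, 1)),
    )
```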
TRAVIS_BRANCH is set to the tag when a tag build runs. We should always
use the branch, and we already have our current branch in
hooks/_default_branch.sh, so we can use it from there.
This seems to be the only way, as Travis does not pass the branch
in any variable - mainly because we do not know what branch we
are on when building a tag build.
The latest python will only be pulled by DockerHub when building
master/v1-10-test - which means that it will eventually catch
up with the latest python security releases but it will not
slow down the CI builds.
Most of the values I've removed here are the current defaults, so we
don't need to specify them again.
The reason I am removing them is that the default `email_backend` value of
`airflow.utils.send_email_smtp` has been incorrect since 1.7.2(!) but
hasn't mattered until #5379 somehow triggered it. Removing the
default values should make them easier to update in the future.
Note: The order of arguments has changed for `check_for_prefix`.
The `bucket_name` is now optional. It falls back to the `connection schema` attribute.
- refactor code
- complete docs
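An illustrative sketch of the new `check_for_prefix` calling convention described above (bucket and prefix names are made up; keyword arguments sidestep the changed order):

```python
from airflow.hooks.S3_hook import S3Hook

hook = S3Hook(aws_conn_id="aws_default")
# bucket_name may now be omitted, in which case the hook falls back to the
# connection's schema attribute.
exists = hook.check_for_prefix(prefix="data/2019/", delimiter="/",
                               bucket_name="my-bucket")
```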
This change fixes autodoc generated documentation problems but also
leaves generated .rst files in _api folder so that it is easier to
debug and fix problems like that in the future.
When using offsets potentially larger than JavaScript can handle, they can get parsed incorrectly on the client, resulting in the offset query getting stuck on a certain number. This patch ensures that we return a string to the client so it is not parsed as a number. When we run the query, we ensure the offset is cast to an integer.
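A simplified sketch of the idea (not the actual view code); `fetch_logs` is a stand-in for the real query:

```python
import json


def get_logs(raw_offset, fetch_logs):
    offset = int(raw_offset)  # run the query with a real integer
    logs, next_offset = fetch_logs(offset)
    # return the offset as a string so JavaScript never parses a big integer
    return json.dumps({"message": logs,
                       "metadata": {"offset": str(next_offset)}})
```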
Add unnecessary prefix_ in config for elasticsearch section
* Move k8s executor from contrib folder
Considering that the k8s executor is now fully supported by core
committers, we should move it from contrib to the primary executor
directory.
There were a few ways of getting the AIRFLOW_HOME directory used
throughout the code base, giving possibly conflicting answers if they
weren't kept in sync:
- the AIRFLOW_HOME environment variable
- core/airflow_home from the config
- settings.AIRFLOW_HOME
- configuration.AIRFLOW_HOME
Since the home directory is used to compute the default path of the
config file to load, specifying the home directory again in the config
file didn't make any sense to me, and I have deprecated that.
This commit makes everything in the code base use
`settings.AIRFLOW_HOME` as the source of truth, and deprecates the
core/airflow_home config option.
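In other words (a minimal sketch), any code that needs the home directory should do:

```python
from airflow import settings

print(settings.AIRFLOW_HOME)  # resolved once, from $AIRFLOW_HOME or ~/airflow
```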
There was an import cycle from settings -> logging_config ->
module_loading -> settings that needed to be broken on Python 2 - so I
have moved all adjusting of sys.path into the settings module.
(This issue caused me a problem where the RBAC UI wouldn't work as it
didn't find the right webserver_config.py.)
Something about the tests or how we run them changed and we ended up
with a lot more lines appearing in the output, taking us over Travis' "will
display in the UI" limit, making it harder to debug failures. This isn't a long
term fix, but improves things while we fix the tests for the better.
Newer versions of Kube return "failed" events for the sidecar container
when the ^C causes the python process to exit with 1.
Kube 1.13 runs a different number of kube-dns pods (2 by default, 1.9
and 1.10 ran only 1) so the setup scripts needed changing a little bit.
To get a Kube 1.13 cluster I had to upgrade minikube, and it no longer
works on a dist without systemd installed (#systemdsucks) so I had to
update the travis dist to xenial which is no bad thing!
This version of minikube doesn't need the localkube bootstrapper set
anymore, it handles driver=none much more gracefully, and some of the
permissions set up for context files/keys needed to be updated.
* Support setting global k8s affinity and toleration configuration in the airflow config file.
* Copy annotations as dict, not list
* Update airflow/contrib/kubernetes/pod.py
Co-Authored-By: kppullin <kevin.pullin@gmail.com>
To help move away from Minikube, we need to remove the dependency on
a local docker registry and move towards a solution that can be used
in any kubernetes cluster. Custom image names allow users to use
systems like docker, artifactory and gcr
When running integration tests on a k8s cluster vs. Minikube
I discovered that we were actually using an invalid permission
structure for our persistent volume. This commit fixes that.
* Refactor Kubernetes operator with git-sync
Currently the implementation of git-sync is broken because:
- git-sync clones the repository in /tmp and not in the airflow-dags volume
- git-sync adds a link to point to the required revision, but it is not
taken into account in AIRFLOW__CORE__DAGS_FOLDER
A DAGs/logs hostPath volume has been added (needed if Airflow runs on
Kubernetes in a local environment).
To avoid false positives in CI, `load_examples` is set to `False`,
otherwise DAGs from `airflow/example_dags` are always loaded. This
way it is possible to test `import`s in DAGs.
Remove `worker_dags_folder` config:
`worker_dags_folder` is redundant and can lead to confusion.
In WorkerConfiguration `self.kube_config.dags_folder` defines the path of
the dags and can be set in the worker using airflow_configmap
Refactor worker_configuration.py
Use a docker container to run setup.py
Compile web assets
Fix codecov application path
* Fix kube_config.dags_in_image
* Read `dags_in_image` config value as a boolean
This PR is a minor fix for #3683
The dags_in_image config value is read as a string. However, the existing code expects this to be a boolean.
For example, in worker_configuration.py there is the statement: if not self.kube_config.dags_in_image:
Since the value is a non-empty string ('False') rather than a boolean, it is truthy, so the `not` check evaluates to False
and the logic to add the dags_volume_claim volume mount is skipped.
This results in the CI tests failing because the dag volume is missing in the k8s pod definition.
This PR reads dags_in_image using conf.getboolean to fix this error.
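Roughly, the fix boils down to reading the value as a boolean (sketch, not the exact diff):

```python
from airflow.configuration import conf

# conf.get() would return the string 'False', which is truthy;
# conf.getboolean() returns a real bool, so the check below behaves as intended.
dags_in_image = conf.getboolean("kubernetes", "dags_in_image")
if not dags_in_image:
    pass  # add the dags_volume_claim volume mount here
```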
Rebased on 457ad83e4e, before the previous
dags_in_image commit was reverted.
* Revert "Revert [AIRFLOW-2770] [AIRFLOW-3505] (#4318)"
This reverts commit 77c368fd22.
* Revert "[AIRFLOW-3505] replace 'dags_in_docker' with 'dags_in_image' (#4311)"
This reverts commit 457ad83e4e.
* Revert "[AIRFLOW-2770] kubernetes: add support for dag folder in the docker image (#3683)"
This reverts commit e9a09d408e.
The password stays a None value, not the string "None", when no password is set through the web admin interface.
This is a fix for Redis connections that do not expect authorisation from clients.
The current `airflow flower` doesn't come with any authentication.
This may expose essential information in an untrusted environment.
This commit adds support for HTTP basic authentication to Airflow Flower.
Ref:
https://flower.readthedocs.io/en/latest/auth.html
This adds ASF license headers to all the .rst and .md files with the
exception of the Pull Request template (as that is included verbatim
when opening a Pull Request on Github which would be messy)
* [AIRFLOW-3178] Don't mask defaults() function from ConfigParser
ConfigParser (the base class for AirflowConfigParser) expects defaults()
to be a function - so when we re-assign it to be a property some of the
methods from ConfigParser no longer work.
* [AIRFLOW-3178] Correctly escape percent signs when creating temp config
Otherwise we have a problem when we come to use those values.
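The underlying ConfigParser behaviour that makes the escaping necessary (values below are illustrative): a literal `%` must be written as `%%`, otherwise interpolation fails.

```python
from configparser import ConfigParser

parser = ConfigParser()
parser.read_string("[core]\nsql_alchemy_conn = mysql://user:p%%40ss@host/db\n")
print(parser.get("core", "sql_alchemy_conn"))  # mysql://user:p%40ss@host/db
```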
* [AIRFLOW-3178] Use os.chmod instead of shelling out
There's no need to run another process for a built-in Python function.
This also removes a possible race condition that could make the temporary
config file readable by users other than the airflow or run-as user.
The exact behaviour would depend on the umask we run under, and the
primary group of our user; likely this would mean the file was readable
by members of the airflow group (which in most cases would be just the
airflow user). To remove any such possibility we chmod the file
before we write to it.
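A minimal sketch of the pattern described above: restrict the temp file with os.chmod before any sensitive content is written to it.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.chmod(path, 0o600)  # readable/writable by the owner only
with os.fdopen(fd, "w") as handle:
    handle.write("[core]\n# sensitive values go here\n")
```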
- Update outdated cli command to create user
- Remove `airflow/example_dags_kubernetes` as the dag already exists in `contrib/example_dags/`
- Update the path to copy K8s dags
The recent update to the CI image changed the default
python from python2 to python3. The PythonVirtualenvOperator
tests expected python2 as default and fail due to
serialisation errors.
One of the goals for tests is to be self-contained. This means that
they should not depend on anything external, such as loading data.
This PR uses setUp and tearDown to load the data into MySQL
and remove it afterwards. This removes the actual bash mysql commands
and will make it easier to dockerize the whole test suite in the future.
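A hedged sketch of the pattern (connection id, table and rows are made up):

```python
import unittest

from airflow.hooks.mysql_hook import MySqlHook


class TestWithMySqlFixture(unittest.TestCase):
    def setUp(self):
        # load the fixture data instead of shelling out to the mysql client
        self.hook = MySqlHook(mysql_conn_id="airflow_db")
        self.hook.run("CREATE TABLE IF NOT EXISTS test_table (value INT)")
        self.hook.run("INSERT INTO test_table VALUES (1), (2), (3)")

    def tearDown(self):
        # clean up so the test leaves no trace behind
        self.hook.run("DROP TABLE IF EXISTS test_table")
```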
The current dockerised CI pipeline doesn't run minikube and the
Kubernetes integration tests. This starts a Kubernetes cluster
using minikube and runs k8s integration tests using docker-compose.
- Add missing variables and use codecov instead of coveralls.
The reason it wasn't working was missing environment variables.
The codecov library heavily depends on environment variables in
the CI to determine how to push the reports to codecov.
- Remove the explicit passing of the variables in `tox.ini`
since it is already done in `docker-compose.yml`;
having to maintain this in two places makes it brittle.
- Remove the empty codecov yml since codecov was complaining that
it was unable to parse it.
Airflow tests depend on many external services and other custom setup,
which makes it hard for contributors to work on this codebase. CI
builds have also been unreliable, and it is hard to reproduce the
causes. Having contributors trying to emulate the build environment
every time makes it easier to get to an "it works on my machine" sort
of situation.
This implements a dockerised version of the current build pipeline.
This setup has a few advantages:
* TravisCI tests are reproducible locally
* The same build setup can be used to create a local development environment
- Dictionary creation should be written with a dictionary literal
- Python's default arguments are evaluated once when the function is defined, not each time the function is called (like it is in, say, Ruby). This means that if you use a mutable default argument and mutate it, you will have mutated that object for all future calls to the function as well (see the sketch after this list).
- Calls to set() that can be replaced by a set literal are now replaced by a set literal
- Replace list literals
- Some of the static methods hadn't been marked static
- Remove redundant parentheses
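The mutable-default-argument pitfall mentioned above, shown as a short contrast:

```python
def append_bad(item, items=[]):     # the list is created once, at definition time
    items.append(item)
    return items


def append_good(item, items=None):  # create a fresh list on every call instead
    if items is None:
        items = []
    items.append(item)
    return items


print(append_bad(1), append_bad(2))    # [1, 2] [1, 2] -- shared state!
print(append_good(1), append_good(2))  # [1] [2]
```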
Fix scripts/ci/kubernetes/minikube/start_minikube.sh
as follows:
- Make minikube version configurable via
environment variable
- Remove unused variables for readability
- Reorder some lines to remove warnings
- Replace ineffective `return` with `exit`
- Add -E to `sudo minikube` so that non-root
users can use this script locally
By default one of Apache Airflow's dependencies pulls in a GPL
library. Airflow should not install (and upgrade) without an explicit choice.
This is part of the Apache requirements as we cannot depend on Category X
software.
* Updates the GCP hooks to use the google-auth library and removes
dependencies on the deprecated oauth2client package.
* Removes inconsistent handling of the scope parameter for different
auth methods.
Note: using google-auth for credentials requires a newer version of the
google-api-python-client package, so this commit also updates the
minimum version for that.
To avoid some annoying warnings about the discovery cache not being
supported, we disable the discovery cache explicitly, as recommended here:
https://stackoverflow.com/a/44518587/101923
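The workaround looks roughly like this (the service name and auth flow are only an example, not the exact hook code):

```python
import google.auth
from googleapiclient.discovery import build

credentials, _ = google.auth.default()
service = build("bigquery", "v2", credentials=credentials,
                cache_discovery=False)  # silences the discovery cache warning
```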
Tested by running:
nosetests tests/contrib/operators/test_dataflow_operator.py \
  tests/contrib/operators/test_gcs*.py \
  tests/contrib/operators/test_mlengine_*.py \
  tests/contrib/operators/test_pubsub_operator.py \
  tests/contrib/hooks/test_gcp*.py \
  tests/contrib/hooks/test_gcs_hook.py \
  tests/contrib/hooks/test_bigquery_hook.py
and also tested by running some GCP-related DAGs locally, such as the
Dataproc DAG example at https://cloud.google.com/composer/docs/quickstart
Closes #3488 from tswast/google-auth
Add lineage support by having inlets and outlets that are made
available to dependent upstream or downstream tasks.
If configured to do so, it can send lineage data to a backend.
Apache Atlas is supported out of the box.
Closes #3321 from bolkedebruin/lineage_exp
[AIRFLOW-2424] Add dagrun status endpoint and
increase k8s test coverage
[AIRFLOW-2424] Added minikube fixes by @kimoonkim
[AIRFLOW-2424] modify endpoint to remove 'status'
Closes #3320 from dimberman/add-kubernetes-test
[AIRFLOW-1899] Add full deployment
- Made home directory configurable
- Documentation fix
- Add licenses
[AIRFLOW-1899] Tests for the Kubernetes Executor
Add an integration test for the Kubernetes executor, done by
spinning up different versions of Kubernetes and running a DAG
by invoking the REST API.
Closes #3301 from Fokko/fix-kubernetes-executor
The logs are kept inside of the worker pod. By
attaching a persistent
disk we keep the logs and make them available for
the webserver.
- Remove the requirements.txt since we don't want to maintain another
dependency file
- Fix some small casing stuff
- Removed some unused code
- Add missing shebang lines
- Started on some docs
- Fixed the logging
Closes #3252 from Fokko/airflow-2357-pd-for-logs
Handle too old resource versions and throw exceptions on errors
- K8s API errors will now throw Airflow exceptions
- Add scheduler uuid to worker pod labels to match the two
* Added in executor_config to the task_instance table and the base_operator table
* Fix test; bump up number of examples
* Fix up comments from PR
* Exclude the kubernetes example dag from a test
* Fix dict -> KubernetesExecutorConfig
* fixed up executor_config comment and type hint
Add kubernetes config section in airflow.cfg and Inject GCP secrets upon executor start. (#17)
Update Airflow to Pass configuration to k8s containers, add some Py3 … (#9)
* Update Airflow to Pass configuration to k8s containers, add some Py3 compat., create git-sync pod
* Undo changes to display-source config setter for to_dict
* WIP Secrets and Configmaps
* Improve secrets support for multiple secrets. Add support for registry secrets. Add support for RBAC service accounts.
* Swap order of variables, overlooked very basic issue
* Secret env var names must be upper
* Update logging
* Revert spothero test code in setup.py
* WIP Fix tests
* Worker should be using local executor
* Consolidate worker setup and address code review comments
* reconfigure airflow script to use new secrets method
The python-cloudant release 2.8 is broken and
causes our CI to fail.
In the setup.py we install cloudant version <2.0
and in our CI pipeline
we install the latest version.
Closes #3051 from Fokko/fd-fix-cloudant
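A hedged sketch of what such a pin looks like in setup.py (the exact bounds and surrounding code are illustrative):

```python
from setuptools import setup

cloudant = ["cloudant>=0.5.9,<2.0"]  # keep off the broken 2.x line

setup(
    name="example",
    extras_require={"cloudant": cloudant},
)
```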
sla_miss and task_instances cannot have NULL execution_dates. The
timezone migration scripts forgot to set this properly. In addition,
to make sure MySQL does not set "ON UPDATE CURRENT_TIMESTAMP" or
MariaDB "DEFAULT 0000-00-00 00:00:00", we now check if
explicit_defaults_for_timestamp is turned on and otherwise fail the
database upgrade.
Closes #2969, #2857
Closes #2979 from bolkedebruin/AIRFLOW-1895
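A hedged sketch of the kind of check described above (the real one lives in the database upgrade path; the DSN is illustrative):

```python
from sqlalchemy import create_engine

engine = create_engine("mysql://user:pass@localhost/airflow")
with engine.connect() as conn:
    # MySQL exposes the setting as a global variable we can query directly.
    enabled = conn.execute(
        "SELECT @@explicit_defaults_for_timestamp").scalar()
    if not enabled:
        raise RuntimeError(
            "explicit_defaults_for_timestamp must be turned on before "
            "upgrading the database")
```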