This PR improves building the production image from local packages,
in preparation for moving provider requirements out of setup.cfg.
Previously, a `pip download` step was executed in the CI scripts
in order to download all the packages that were needed. However,
this had two problems:
1) `pip download` was executed outside of the Dockerfile, in CI scripts,
which means that any change to the requirements there could not
be exercised in the 'workflow_run' event - because the main branch
version of the CI scripts is used there. We want to add extra
requirements when installing Airflow, so in order to be able to
change them, those requirements should be added in the Dockerfile.
This will be done in the follow-up PR #13409.
2) Packages downloaded with `pip download` get a "file" version
rather than the regular `==` version when you run `pip freeze`/`pip check`.
This looks odd and, while you can figure out the version
from the file name, packages installed with `pip install` from
PyPI look much more normal. The airflow package and the provider
packages will still get the "file" form, but this is OK because
we are building those packages from sources and they are not yet
available on PyPI.
Example:
adal==1.2.5
aiohttp==3.7.3
alembic==1.4.3
amqp==2.6.1
apache-airflow @ file:///docker-context-files/apache_airflow-2.1.0.dev0-py3-none-any.whl
apache-airflow-providers-amazon @ file:///docker-context-files/apache_airflow_providers_amazon-1.0.0-py3-none-any.whl
apache-airflow-providers-celery @ file:///docker-context-files/apache_airflow_providers_celery-1.0.0-py3-none-any.whl
...
With this PR, we do not `pip download` all packages; instead
we prepare the airflow + provider packages as .whl files and
install them from there (all the dependencies are installed
from PyPI).
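As a rough sketch of the new flow (the exact commands live in the
Breeze/CI scripts; the paths and flags below are illustrative):

```bash
# Build airflow (and, analogously, each provider package) into a local
# wheel placed in the docker build context:
python setup.py bdist_wheel --dist-dir ./docker-context-files

# During the image build, install from the local wheels only; all
# transitive dependencies are still resolved from PyPI:
pip install /docker-context-files/*.whl
```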
It seems that we are hitting, more and more often, one of Ash's
favourite bugs: DNS. Quote: "It's always DNS".
It looks like there is a race condition with docker compose
that causes services that started fast enough (before DNS)
to get a different reverse-DNS IP lookup (usually it is
just `<SERVICE>` but sometimes it is
`<DOCKER_COMPOSE_APP>_<SERVICE>_1_<NETWORK>`).
This produces misleading messages in the logs that can
make analysis of such problems difficult, which is why
we chose to get rid of the reverse lookup and give
each service more time to check whether it is ready.
Netcat, unfortunately, performs both forward and reverse
lookups when given a name - a forward lookup to find the
IP address and a reverse lookup to write information to the
log about the host it connected to - and if it sees
that the original and reverse-looked-up names do not match,
even if it manages to connect, it returns an error:
`DNS fwd/rev mismatch` - which is very misleading.
This change performs the following:
1) We look up the host name in Python via gethostbyname
2) We set -n in netcat to disable ANY DNS use
3) We feed netcat the IP address
4) We standardized all waiting times to be up to 50 seconds
This way we should get rid of the DNS fwd/rev mismatch once
and for all.
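A minimal sketch of this waiting pattern (the service name, port and
retry loop are illustrative, not the exact script):

```bash
HOST="postgres"
PORT=5432
# 1) resolve the name once in Python; 2)+3) pass the bare IP to netcat
# with -n, so netcat never performs any DNS lookup (forward or reverse):
IP=$(python -c "import socket; print(socket.gethostbyname('${HOST}'))")
# 4) retry for up to 50 seconds before giving up:
for _ in $(seq 1 50); do
    nc -z -n "${IP}" "${PORT}" && break
    sleep 1
done
```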
* Set minimum SQLite version supported.
Some users reported that some older versions of SQLite do not
work with Airflow 2.0. This happens, for example, with the latest
sqlite available by default on RHEL7 (the sqlite version available
in a fully updated system there is 7 (!) years old).
Example of such an issue: #13397.
We are not sure which 'minimum' version is supported, but
in the Breeze environment based on Debian Buster we have
version 3.27.2 in a fully updated system. This should be our
baseline.
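For reference, a quick way to check which SQLite version a given
environment provides (not part of this PR, just a handy check):

```bash
# Prints the SQLite version the Python interpreter is linked against -
# in a fully updated Debian Buster (the Breeze base) this is 3.27.2:
python -c "import sqlite3; print(sqlite3.sqlite_version)"
```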
* Update README.md
Co-authored-by: Xiaodong DENG <xd.deng.r@gmail.com>
* links were old / dead
* run_duration was removed from scheduler
* clarify related notes on backward compat in helm chart values
Co-authored-by: Daniel Standish <dstandish@techstyle.com>
The previous change to add persist-credentials (#13389) wrongly added
persist-credentials to the python-setup action rather than the checkout
action. Also, one of the checkout actions used master rather than
the v2 tag.
This PR disables persisting credentials in the GitHub Actions checkout.
This is a result of a discussion on builds@apache.org:
https://lists.apache.org/thread.html/r435c45dfc28ec74e28314aa9db8a216a2b45ff7f27b15932035d3f65%40%3Cbuilds.apache.org%3E
It turns out that, contrary to the documentation, actions (specifically
the checkout action) can use GITHUB_TOKEN without it being specified as
an input in the yaml file, and the GitHub checkout action
leaves the repository with credentials stored locally that
enable pushing to the GitHub repository by any step in the same
job. This was thought to be forbidden initially (and the
documentation clearly says that the action must have the
GITHUB_TOKEN passed to it in the .yaml workflow in order to
use it). But apparently it behaves differently.
This leaves open an attack vector where, for example,
any pip package installed in the following steps could push
arbitrary changes to the GitHub repository of Apache Airflow.
Security incidents have been reported to both GitHub and
the Apache Security team, but in the meantime we add configuration
to remove the credentials after the checkout step.
https://docs.github.com/en/free-pro-team@latest/actions/reference/authentication-in-a-workflow#using-the-github_token-in-a-workflow
> Using the GITHUB_TOKEN in a workflow
> To use the GITHUB_TOKEN secret, you *must* reference it in your workflow
> file. Using a token might include passing the token as an input to an
> action that requires it, or making authenticated GitHub API calls.
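For illustration, the credentials in question end up in the checked-out
repository's local git config (the config key below reflects how the
checkout action stores the auth header; treat it as informational):

```bash
# Show the credential entry left behind by the checkout action:
git config --local --get-all http.https://github.com/.extraheader
# Removing it manually is effectively what the added configuration
# makes unnecessary - with credential persisting disabled, the token
# is not left behind after the step finishes:
git config --local --unset-all http.https://github.com/.extraheader
```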
Previously, the UPGRADE_TO_LATEST_CONSTRAINTS variable controlled
whether the CI image uses the latest dependencies rather than
fixed constraints. This PR brings it also to the PROD image.
The name of the ARG is changed to UPGRADE_TO_NEWER_DEPENDENCIES,
as this corresponds better with the intention.
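An illustrative build invocation with the renamed ARG (the image
name/tag is made up for the example):

```bash
docker build . \
    --build-arg UPGRADE_TO_NEWER_DEPENDENCIES="true" \
    --tag airflow-prod:example
```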
This is a complete refactor of the setup.py providers/dependencies.
It much better reflects the current setup, where most of
the extras reflect providers 1-to-1, but some extras do
not have their own providers.
The pre-commits that were verifying setup versus documentation
can now be vastly simplified (there is no more need to parse the
comments; we can import setup.py variables directly rather
than parse the file via regexps). We can also better categorize the
extras - separate out (and verify) whether we correctly
described deprecated extras, and mark extras that install
additional providers as such.
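A sketch of what the simplification enables - a check script can import
the structures from setup.py directly (the variable name
EXTRAS_REQUIREMENTS is an assumption for illustration):

```bash
# Run from the repository root, where setup.py lives:
python -c "import setup; print(sorted(setup.EXTRAS_REQUIREMENTS))"
```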
Fixes: #13309
It seems that for quite some time (since 1.10.4) the "ldap" extra
has been missing the python-ldap dependency.
https://issues.apache.org/jira/browse/AIRFLOW-5261
Also, LDAP seems to be popular enough to be added as a default
extra in the production image.
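With the fix, installing the extra brings in the dependency again, e.g.:

```bash
pip install "apache-airflow[ldap]"  # now pulls in python-ldap
```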
Fixes: #13306
The recently added log grouping hides error messages in case
there is an error in tests. You need to manually unfold the last test
step, which is somewhat hidden - it is followed by several
'dump-container' logs.
This change adds a clear error message showing the exact log
group that you need to unfold in case you want to look for
a problem.
The PROD image is now verified by several checks (sketched below):
* whether all expected providers are installed
* whether `pip check` reports no conflicts
* whether imports work for expected features
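A rough sketch of what such verification amounts to (the image tag,
provider and import below are examples, not the exact check list):

```bash
docker run --rm airflow-prod:example bash -c '
    # expected providers are installed
    pip freeze | grep apache-airflow-providers-amazon &&
    # no dependency conflicts
    pip check &&
    # imports work for expected features
    python -c "import airflow.providers.amazon.aws.hooks.s3"
'
```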
Part of #13315