Airflow and provider packages need to be installed together to
make sure that constraints are taken into account and that airflow
does not get reinstalled from PyPI when an eager upgrade runs.
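As an illustration (the package names, version, and constraints URL are
examples, not the exact command from this change), installing both in a
single pip invocation with a constraint file looks roughly like this:
    pip install "apache-airflow[amazon]==2.0.0" \
        --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.0.0/constraints-3.8.txt"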
(cherry picked from commit bc6f5ea088)
The latest change #13422 modified the way production images are
prepared and removed extras from the installed airflow - this caused
the production image verification check to fail.
This change restores extras when airflow is installed from packages.
(cherry picked from commit 3a731108f5)
Curl has a sophisticated back-off mechanism when trying to connect,
which sometimes causes it to hang for a very long time
when the first few attempts to connect fail with a 'soft' error.
Similarly, when curl starts a transfer after connecting but the
other party hangs, the client curl call might hang as well.
This causes various problems - for example, waiting for
images in the CI build sometimes gets cancelled because the curl
command that checks for the image fails - example:
https://github.com/apache/airflow/pull/13413/checks?check_run_id=1635401914
This change adds appropriate timeouts to all curl commands we
use in CI/manual operations. In many cases we implemented
retries, so those cases should stop happening altogether,
but even in the no-retry case, a failing curl is better than a hang.
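A hedged sketch of the kind of timeouts added (the URL and the exact
values are examples only):
    curl --connect-timeout 30 --max-time 60 --retry 3 --fail --silent \
        "https://registry.example.com/v2/airflow/manifests/latest"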
(cherry picked from commit 0909ddfd24)
When no airflow files change, selective tests only run basic
tests, but this is wrong, because many .py files live
outside of the airflow folder.
In this case we should enable image building, because only then
is the full set of static checks executed.
This bug caused, for example, #13403 to succeed even though it failed
static checks after merge.
(cherry picked from commit 1fe83a435d)
This fixes a failing pylint error introduced in #13403. This error
also triggered another pylint problem related to c-extensions.
(cherry picked from commit 6e1a6ff3c8)
This PR improves building production image from local packages,
in preparation for moving provider requirements out of setup.cfg.
Previously a `pip download` step was executed in the CI scripts
in order to download all the packages that were needed. However,
this had two problems:
1) PIP download was executed outside of the Dockerfile, in CI scripts,
which means that any change to requirements there could not
be executed in the 'workflow_run' event - because the main branch
version of the CI scripts is used there. We want to add extra
requirements when installing airflow, so in order to be able to
change them, those requirements should be added in the Dockerfile.
This will be done in the follow-up #13409 PR.
2) Packages downloaded with pip download have a "file" version
rather than a regular == version when you run pip freeze/check.
This looks weird and, while you can figure out the version
from the file name, when you `pip install` them they look
much more normal. The airflow package and provider packages
will still get the "file" form, but this is OK because we are
building those packages from sources and they are not yet
available on PyPI.
Example:
adal==1.2.5
aiohttp==3.7.3
alembic==1.4.3
amqp==2.6.1
apache-airflow @ file:///docker-context-files/apache_airflow-2.1.0.dev0-py3-none-any.whl
apache-airflow-providers-amazon @ file:///docker-context-files/apache_airflow_providers_amazon-1.0.0-py3-none-any.whl
apache-airflow-providers-celery @ file:///docker-context-files/apache_airflow_providers_celery-1.0.0-py3-none-any.whl
...
With this PR, we do not `pip download` all packages; instead
we prepare the airflow + provider packages as .whl files and
install them from there (all the dependencies are installed
from PyPI).
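A rough sketch of the new flow (paths are illustrative):
    # build airflow and provider wheels locally instead of pip download
    python setup.py bdist_wheel --dist-dir docker-context-files/
    # install from local wheels; remaining dependencies come from PyPI
    pip install docker-context-files/*.whl \
        --constraint docker-context-files/constraints.txt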
(cherry picked from commit e436883583)
It seems that we are hitting, more and more often, one of Ash's
favourite bugs: DNS. Quote: "It's always DNS".
It looks like there is a race condition with docker compose
that causes services that started fast enough (before DNS)
to get a different reverse-DNS IP lookup (usually it is
just `<SERVICE>` but sometimes it is
`<DOCKER_COMPOSE_APP>_<SERVICE>_1_<NETWORK>`).
This produces misleading messages in the log that might
make analysis of such problems difficult, which is why
we chose to get rid of the reverse lookup and allow
each service more time to check whether it is ready.
Netcat, unfortunately, performs both forward and reverse
lookups when given a name - a forward lookup to find the
IP address and a reverse lookup to write information to the
log about the host it connected to - and if it sees
that the original and reverse-looked-up names do not match,
even if it manages to connect, it returns an error:
`DNS fwd/rev mismatch` - which is very misleading.
This change performs the following:
1) We look up the host name in python via gethostbyname
2) We set -n in netcat to disable ANY DNS use
3) We feed netcat with the IP address
4) We've standardized all waiting times to be up to 50 seconds
This way we should get rid of the DNS fwd/rev mismatch once
and for all.
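A minimal sketch of the resulting check, assuming a hypothetical
`postgres` service on port 5432:
    # resolve the IP in python, bypassing netcat's DNS logic entirely
    HOST_IP=$(python -c "import socket; print(socket.gethostbyname('postgres'))")
    # -z: only check the port, -n: no DNS at all, -w 5: 5 second timeout
    nc -z -n -w 5 "${HOST_IP}" 5432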
(cherry picked from commit ae625b4483)
* Set minimum SQLite version supported.
Some users reported that some older versions of SQLite do not
work with Airflow 2.0. This happens for example with the latest
sqlite available by default on RHEL7 (the sqlite version available
in a fully updated system there is 7 (!) years old).
Example of such an issue: #13397.
It is not certain which 'minimum' version is supported, but
in the Breeze environment based on debian buster we have
version 3.27.2 in a fully updated system. This should be our
baseline.
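To check which SQLite version an environment actually uses, both of
these work (the second shows the library linked into the python
interpreter):
    sqlite3 --version
    python -c "import sqlite3; print(sqlite3.sqlite_version)"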
* Update README.md
Co-authored-by: Xiaodong DENG <xd.deng.r@gmail.com>
(cherry picked from commit 670056311a)
* links were old / dead
* run_duration was removed from scheduler
* clarify related notes on backward compat in helm chart values
Co-authored-by: Daniel Standish <dstandish@techstyle.com>
(cherry picked from commit 028d8e8efb)
The previous change adding persist-credentials in #13389 wrongly added
persist-credentials to the python-setup action rather than the checkout
action. Also, one of the checkout actions used master rather than
the v2 tag.
(cherry picked from commit 85ac03f58c)
This PR disables persisting credentials in Github Actions checkout.
This is a result of a discussion on builds@apache.org:
https://lists.apache.org/thread.html/r435c45dfc28ec74e28314aa9db8a216a2b45ff7f27b15932035d3f65%40%3Cbuilds.apache.org%3E
It turns out that, contrary to the documentation, actions (specifically
the checkout action) can use GITHUB_TOKEN without it being specified as
an input in the yaml file, and the GitHub checkout action
leaves the repository with credentials stored locally that
enable pushing to the GitHub repository by any step in the same
job. This was initially thought to be forbidden (and the
documentation clearly says that the action must have the
GITHUB_TOKEN passed to it in the .yaml workflow in order to
use it), but apparently it behaves differently.
This leaves open an attack vector where, for example,
any PIP package installed in the following steps could push
changes to the GitHub repository of Apache Airflow.
Security incidents have been reported to both the GitHub and
Apache security teams, but in the meantime we add configuration
to remove credentials after the checkout step.
https://docs.github.com/en/free-pro-team@latest/actions/reference/authentication-in-a-workflow#using-the-github_token-in-a-workflow
> Using the GITHUB_TOKEN in a workflow
> To use the GITHUB_TOKEN secret, you *must* reference it in your workflow
> file. Using a token might include passing the token as an input to an
> action that requires it, or making authenticated GitHub API calls.
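For reference, the credentials that checkout persists are stored in the
local git config as an HTTP extra header, so any later step in the job
can use them - a hedged sketch of inspecting and removing them by hand:
    # show the persisted auth header (if any) left by actions/checkout
    git config --local --get-all http.https://github.com/.extraheader
    # remove it - persist-credentials: false avoids storing it at all
    git config --local --unset-all http.https://github.com/.extraheader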
(cherry picked from commit d079b913d2)
- Removed redundant comma
- Used list-table so that modifications are easy
- Added syntax highlighting for config code-block
(cherry picked from commit a4a3d3f262)
Previously, the UPGRADE_TO_LATEST_CONSTRAINTS variable controlled
whether the CI image uses the latest dependencies rather than
fixed constraints. This PR brings it also to the PROD image.
The name of the ARG is changed to UPGRADE_TO_NEWER_DEPENDENCIES,
as this corresponds better with the intention.
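Usage is then the same for both images - for example (the tag name is
illustrative):
    docker build . \
        --build-arg UPGRADE_TO_NEWER_DEPENDENCIES=true \
        --tag airflow:upgraded-deps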
(cherry picked from commit 82fa048c12)
This is a complete refactor of the setup.py providers/dependencies.
It much better reflects the current setup, where most of
the extras reflect providers 1-1, but some extras do
not have their own providers.
The pre-commits that were verifying setup versus documentation
can now be vastly simplified (no more need to parse the
comments - we can import setup.py variables directly rather
than parse the file via regexps). Also, we can better categorize the
extras - separate out (and verify) whether we correctly
described deprecated extras, and mark extras that install
additional providers as such.
Fixes: #13309
(cherry picked from commit 0d214575a1)
It seems that for quite some time (since 1.10.4) the "ldap" extra
has been missing the python-ldap dependency.
https://issues.apache.org/jira/browse/AIRFLOW-5261
Also, LDAP seems to be popular enough to be added as a default
extra in the production image.
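After the fix, installing the extra pulls in the missing dependency:
    pip install "apache-airflow[ldap]"   # now also installs python-ldap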
Fixes #13306
(cherry picked from commit d23ac9b235)
The recently added log grouping hides error messages in case
there is an error in the tests. You need to manually unfold the last
test step, which is somewhat hidden - it is followed by several
'dump-container' logs.
This change adds a clear error message showing the exact log
group that you need to unfold in case you want to look for
a problem.
(cherry picked from commit f7d354df1c)
The PROD image is now verified by several checks:
* whether all expected providers are installed
* whether pip-check shows no conflicts
* whether imports are working for expected features
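A hedged sketch of what such verification can look like from the
outside (the image tag and imported module are examples only):
    docker run --rm --entrypoint /bin/bash apache/airflow:2.0.0 \
        -c "pip check && python -c 'import airflow.providers.postgres.hooks.postgres'"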
Part of #13315
(cherry picked from commit 3b4290d055)
* saves approx. 1 second and avoids an error message when using >= 1.10.14
Co-authored-by: Daniel Standish <dstandish@techstyle.com>
(cherry picked from commit 641f63c2c4)
This change introduces improvements in the way logs are displayed
in CI jobs, and in the amount of logs produced in general for CI jobs,
thanks to much smarter cache usage.
Logs in all CI jobs are now grouped into groups, which are folded
by default when no error is generated in such a group. A similar
solution has already been used in the docs job, and it improved
both readability and speed of loading of the logs in CI after
recent improvements in the GitHub UI (previously the speed of loading
the logs was not improved by groups).
Also, cache usage has been reviewed and fixed in a number of places,
which will result in much shorter setup times for static checks
and the kubernetes virtualenv, but also far shorter logs generated by
the cache setup (we are using the restore-keys feature that implements
an incremental approach to cache building, even though cache keys in
GitHub Actions are immutable).
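The grouping relies on GitHub Actions workflow commands - a minimal
example of folding a noisy step:
    echo "::group::Pulling images"
    docker pull apache/airflow:latest   # output folded unless the group is expanded
    echo "::endgroup::"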
(cherry picked from commit d41c6a46b1)
Some older versions of PIP (including the one in dockerhub!) treat
all env variables starting with PIP_ as a way to pass
options. Setting PIP_VERSION to 20.2.4 and exporting it causes the
error "ValueError: invalid truth value '20.2.4'", because pip
has no --version option that takes a value and treats it
as it would --verbose
¯\_(ツ)_/¯
You can read more about it here:
https://github.com/pypa/pip/issues/4528
This PR renames the variable to avoid this side effect.
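A sketch of the failure mode and the fix (the new variable name is
illustrative):
    export PIP_VERSION=20.2.4
    pip install --upgrade pip    # older pip: ValueError: invalid truth value '20.2.4'
    unset PIP_VERSION
    export AIRFLOW_PIP_VERSION=20.2.4    # no PIP_ prefix clash with pip options
    pip install --upgrade "pip==${AIRFLOW_PIP_VERSION}"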
(cherry picked from commit 8fed541192)
* When hook names are too long, the pre-commit display becomes very ugly, with many blank lines
Co-authored-by: Daniel Standish <dstandish@techstyle.com>
(cherry picked from commit 91acdbea05)