Inspired by the Google Shell Guide, which recommends
separating package names with ::, I realized that this was one
of the missing pieces in our bash scripts.
While we already had packages (in the libraries folders), it
has been difficult to tell which function lives where.
By introducing packages - named after the library file - we are
*almost* at the level of a structured language, and it is way
easier to find the functions when you are looking for them.
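A minimal sketch of the convention (the file and function names
below are hypothetical, not the actual breeze code):

```bash
# In a library file such as libraries/_build_images.sh, functions
# carry the library file name as a "package" prefix:
function build_images::confirm_image_rebuild() {
    echo "Rebuilding the CI image..."
}

# At the call site the prefix tells you immediately which library
# file to open:
build_images::confirm_image_rebuild
```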
Part of #10576
(cherry picked from commit cc551ba793)
(cherry picked from commit 2bba276f0f06a5981bdd7e4f0e7e5ca2fe84f063)
* Implement Google Shell Conventions for breeze script … (#10651)
Part of #10576
First (and the biggest) of the series of commits to introduce
the Google Shell Conventions in our bash scripts.
This one covers breeze - about the biggest and most complex of
our scripts - so it is rather huge, but it is difficult to
split into smaller pieces.
The rules implemented (from the conventions) - see the
illustrative sketch after this list:
* constants and exported variables are CAPITALIZED, while
local/temporary variables are lowercase
* following the shell guide, once all the variables are set to
their final values (whether from exported variables,
calculation, or --switches), a single function makes all the
variables read-only. That helped to clean up a lot of places
where the same function was called several times or where
variables were defined in a few places. The behavior should now
be consistent and we can easily catch such duplications
* function headers (following the guide) explain the
arguments, the variables expected, and the variables modified
by each function
* setting the variables as read-only also helped to clean up
the "ifs" where we often had `:=}` defaults combined with
!= "" or == "" comparisons. The defaults are replaced with `=}`
and the tests with `-n` and `-z`, also following the shell
guide (readonly helped to detect and clean up all such cases).
This should also be much more robust in the future
* reorganized the initialization of those constants and
variables, simplifying a few places where initialization was
overlapping. It should be much more straightforward and clean
now
* a number of internal breeze function variables are now
"local" - this helps to avoid accidental variable overwriting
and keeps things localized
* a trap_add function is separated out to help in cases where
we had several traps handling the same signals.
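A minimal, illustrative sketch of these conventions (names are
hypothetical, not the actual breeze code):

```bash
#!/usr/bin/env bash

# Constants and exported variables are CAPITALIZED; `=` (rather
# than `:=`) assigns the default only when the variable is unset.
export AIRFLOW_SOURCES="${AIRFLOW_SOURCES=$(pwd)}"

#######################################
# Adds a trap without overwriting traps already set for the signal.
# Arguments:
#   $1 - command to run on the signal
#   $2... - signals to trap
#######################################
function traps::add_trap() {
    local trap_to_add="${1}"
    shift
    local signal
    for signal in "${@}"; do
        # Extract any existing trap command and append to it.
        local existing_trap
        existing_trap=$(trap -p "${signal}" | awk -F"'" '{print $2}')
        # shellcheck disable=SC2064
        trap "${existing_trap:+${existing_trap};}${trap_to_add}" "${signal}"
    done
}

#######################################
# Freezes variables once their final values are known, so any
# accidental re-assignment later fails loudly.
# Globals:
#   AIRFLOW_SOURCES - made read-only
#######################################
function initialization::make_constants_read_only() {
    readonly AIRFLOW_SOURCES
}

# Tests use -z / -n instead of comparing with "":
if [[ -z "${AIRFLOW_SOURCES}" ]]; then
    echo "AIRFLOW_SOURCES could not be determined" >&2
    exit 1
fi
```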
(cherry picked from commit 46c8d6714c)
(cherry picked from commit c822fd7b4bf2a9c5a9bb3c6e783cbea9dac37246)
* CI Images are now pre-built and stored in the registry
With this change we utilize the latest pull_request_target
event type from GitHub Actions and build the CI image only once
(per version) for the entire run.
This saves from 2 to 10 minutes per job (!) depending on how
much of the Docker image needs to be rebuilt.
It works in such a way that the image is built only in the
build-or-wait step. In case of direct push or scheduled runs,
the build-or-wait step builds the CI image and pushes it to the
GitHub registry. In case of pull_request runs, the
build-or-wait step waits until the separate build-ci-image.yml
workflow builds and pushes the image, and it only moves forward
once the image is ready. A sketch of the waiting logic follows.
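A minimal sketch of the "wait" side of that step, assuming a
hypothetical image name (the real workflow derives it from the
run's matrix and commit):

```bash
#!/usr/bin/env bash
# Poll the GitHub registry until the separately built CI image for
# this commit appears, then proceed.
image="docker.pkg.github.com/apache/airflow/master-python3.6-ci:${GITHUB_SHA}"

until docker pull "${image}" >/dev/null 2>&1; do
    echo "Waiting for ${image} to be pushed by build-ci-image.yml ..."
    sleep 30
done
echo "${image} is ready - continuing."
```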
This has numerous advantages:
1) Each job that requires the CI image is much faster, because
instead of pulling + rebuilding the image it only pulls the
image that was built once. This saves around 2 minutes per job
in regular builds, but in case of Python patch-level updates or
newly added requirements it can save up to 10 minutes per
job (!)
2) While the images are being rebuilt, we only block one job
waiting for all the images. The tests start running in parallel
only when all images are ready, so we are not blocking other
runs from running.
3) The whole run uses THE SAME image. Previously we could have
some variations, because the images were built at different
times and releases of dependencies in between could make
different jobs in the same run use slightly different images.
This does not happen any more.
4) When we push an image to GitHub or DockerHub, we push the
very same image that was built and tested. Previously the
pushed image could be slightly different from the one used for
testing (for the same reason).
5) The same holds for the production images. We now
consistently build and push the same images across the board.
6) Documentation building is split into two parallel jobs -
docs building and spell checking - which decreases the elapsed
time of the docs build.
7) Last but not least, we keep the history of all the images -
the image tags contain the SHA of the commit. This means we can
simply download and run the image locally to reproduce any
problem anyone had in their PR (!) - see the sketch below. This
is super useful for helping others debug their problems.
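For example, reproducing a PR problem locally could look like
this (the image path and SHA are assumptions; the real ones
depend on the registry layout):

```bash
# The tag embeds the commit SHA of the PR you want to reproduce.
commit_sha="abcd1234"  # hypothetical SHA
image="docker.pkg.github.com/apache/airflow/master-python3.6-ci:${commit_sha}"
docker pull "${image}"
docker run -it "${image}" /bin/bash
```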
This cleans up the document building process and replaces it
with a breeze-only one. The original instructions with
`pip install -e .[doc]` stopped working, so there is no point
keeping them.
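For example, the docs can now be built via Breeze (a sketch;
the exact command name is an assumption - check Breeze's help):

```bash
# Build the documentation inside the Breeze environment instead
# of `pip install -e .[doc]` + sphinx on the host:
./breeze build-docs
```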
Extracted from #10368
* Constraint files are now maintained automatically
* No need to generate requirements when setup.py changes
* requirements are kept in separate orphan branches, not in the
main repo
* merges to master verify that the latest requirements work and
push the tested requirements to the orphaned branches
* we keep the history of requirement changes and can label them
individually for each version (by constraint-1.10.n tag name)
* consistently changed all references to say 'constraints'
rather than 'requirements' (see the example after this list)
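For example, installing Airflow against a tagged constraints
file from an orphan branch could look like this (the versions
and URL below are illustrative):

```bash
# Constraints live in orphan branches and are tagged per released
# version; the version numbers here are examples only.
pip install apache-airflow==1.10.12 \
    --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-1.10.12/constraints-3.7.txt"
```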
The Kubernetes tests are now run using the Helm chart rather
than the custom templates we used to have. The Helm Chart uses
the locally built production image, so the tests exercise not
only Airflow but also the Helm Chart and the production image -
all at the same time. Later on we will add more tests covering
more functionality of both the Helm Chart and the production
image. This is the first step towards getting all of those
bundled together and testable.
This change also introduces a 'shell' sub-command for Breeze's
kind-cluster command and an EMBEDDED_DAGS build arg for the
production image - both useful for running the Kubernetes tests
more easily: without building two images, and with an
easy-to-iterate-over-tests shell command that works without any
other development environment.
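Iterating over the Kubernetes tests could then look like this
(only the 'shell' sub-command is named above; the other
sub-commands are assumptions based on the tooling description):

```bash
# Start a KinD cluster and deploy Airflow to it via the Helm
# chart, then open a shell prepared for the Kubernetes tests:
./breeze kind-cluster start    # assumed sub-command
./breeze kind-cluster deploy   # assumed sub-command
./breeze kind-cluster shell    # introduced by this change
```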
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Co-authored-by: Daniel Imberman <daniel@astronomer.io>
Local caching is now the default strategy when building the
production image.
You can still change it to pulled - similar to CI builds - by
providing the right build flag, and that is what CI uses by
default. The flags in Breeze are now updated to be more
explanatory and friendly (build-cache-*), and a flag for the
"disabled" cache option is added as well.
Also, the Dockerfile and Dockerfile.ci files are no longer
needed in the docker context. They used to be needed when we
built the Kubernetes image inside the container, but since we
now use the production image directly, we do not need them any
more.
Combining the local default strategy with removing the
Dockerfiles from the context has the nice effect that you can
iterate much faster on the production image, without triggering
rebuilds of half of the docker image as soon as a Dockerfile
changes.
For a long time, the way the entrypoint worked in the CI
scripts was wrong. It was convoluted and little short of black
magic; it did not allow passing multiple test targets and
required separate execute-command scripts in Breeze.
This is all now straightened out: both the production and the
CI image always use the right entrypoint by default, and we can
simply pass parameters to the image as usual, without escaping
strings.
This also allowed us to remove some Breeze commands and rename
several Breeze flags to make them more meaningful.
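With the fixed entrypoint, passing several test targets is just
passing arguments (the image name and targets below are
hypothetical):

```bash
docker run -it apache/airflow:master-python3.6-ci \
    tests/core/test_core.py tests/models/test_dag.py
```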
Both the CI and PROD images now have embedded scripts for log
cleaning.
A history of image releases is added for the 1.10.10-*
alpha-quality images.
* Improved cloud tools available in the trimmed-down CI container
The tools now have shebangs, which makes them callable from
Python tools. Also, /opt/airflow is now mounted from the host
Airflow sources, which makes it possible for the tools to copy
files directly to/from the sources of Airflow.
It also contains one small change for Linux users: the files
created by the dockerized gcloud are owned by root, so the
directories mounted from the host are fixed when you exit the
tool - their ownership is changed back to the host user.
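A sketch of that ownership fix, run when the tool exits (the
path is illustrative):

```bash
# Files written by the dockerized gcloud belong to root; hand the
# mounted directories back to the host user on exit.
sudo chown -R "$(id -u):$(id -g)" "${HOME}/.config/gcloud"
```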
It is fairly common to say whitelisting and blacklisting to
describe desirable and undesirable things in cyber security.
But just because it is common does not mean it is right.
The terminology only makes sense if you equate white with
'good, permitted, safe' and black with 'bad, dangerous,
forbidden'. There are some obvious problems with this.
You may not see why this matters. If you're not adversely affected by
racial stereotyping yourself, then please count yourself lucky. For some
of your friends and colleagues (and potential future colleagues), this
really is a change worth making.
From now on, we will use 'allow list' and 'deny list' in place
of 'whitelist' and 'blacklist' wherever possible - which, in
fact, is clearer and less ambiguous. So as well as being more
inclusive of all, this is a net benefit to understandability.
(Words mostly borrowed from
<https://www.ncsc.gov.uk/blog-post/terminology-its-not-black-and-white>)
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Tests requiring a Kubernetes cluster are now moved out of the
regular CI tests into the "kubernetes_tests" folder, so that
they can be run entirely on the host, without building the CI
image at all. They use the production image to run the tests on
a KinD cluster, and we add tooling to start/stop/deploy the
application to the KinD cluster automatically - for both CI
testing and local development.
This is a prerequisite to converting the tests to use the
official Helm Chart and Docker images of Apache Airflow.
Closes #8782
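Running the tests on the host is then plain pytest, assuming
the KinD cluster is already up (e.g. via the tooling above):

```bash
# No CI image needed - the tests talk to the KinD cluster directly.
pytest kubernetes_tests/
```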
Breeze had a --push-images switch to also push images to the
repo, but it was often necessary to build and push images
separately. There is now a separate push-image command to push
an already built image, and with the --registry-cache switch
you can choose to push to the cache registry in GitHub rather
than to DockerHub.
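For example (push-image and --registry-cache come from the
description above; build-image is an assumed companion command):

```bash
./breeze build-image                  # build the image locally first
./breeze push-image --registry-cache  # push it to the GitHub cache registry
```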
We now have a mechanism to keep the release notes for the
backport operators updated in an automated way.
It really nicely generates all the necessary information:
* summary of requirements for each backport package
* list of dependencies (including the extras needed to install
them) when a package depends on other provider packages
* table of new hooks/operators/sensors/protocols/secrets
* table of moved hooks/operators/sensors/protocols/secrets with
information about where they were moved from
* changelog of all the changes to the provider package (this
will be updated automatically with an incremental changelog
whenever we decide to release separate packages)
The system is fully automated - we will be able to produce the
release notes automatically (per package) whenever we decide to
release a new version of a package in the future.
* Help for breeze commands now lists the relevant flags.