Commit graph

204 commits

Author SHA1 Message Date
Jarek Potiuk 3447b55ba5
More stable kubernetes port forwarding (#11538)
Seems that port forwarding during kubernetes tests started to behave
erratically - seems that kubectl port forward sometimes might hang
indefinitely rather than connect or fail.
We change the strategy a bit to try to allocate
increasing port numbers in case something like that happens.
2020-10-15 11:05:58 +02:00
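The retry strategy described in #11538 (move on to the next port number when a forward hangs) could look roughly like the sketch below. This is not the project's actual script: the service name `svc/airflow-webserver`, the `airflow` namespace, the starting port and the timings are all assumptions made for illustration.

```python
import socket
import subprocess
import time
from typing import List


def port_is_open(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def candidate_ports(start: int, attempts: int) -> List[int]:
    """Local ports to try, in increasing order."""
    return [start + offset for offset in range(attempts)]


def forward_with_retries(start_port: int = 8080, attempts: int = 10, wait_seconds: float = 5.0):
    """Try `kubectl port-forward` on increasing local ports until one connects."""
    for local_port in candidate_ports(start_port, attempts):
        # Service name and namespace are hypothetical placeholders.
        proc = subprocess.Popen(
            ["kubectl", "port-forward", "svc/airflow-webserver",
             f"{local_port}:8080", "--namespace", "airflow"]
        )
        deadline = time.monotonic() + wait_seconds
        while time.monotonic() < deadline:
            if proc.poll() is not None:
                break  # kubectl exited on its own: treat as a failed attempt
            if port_is_open(local_port):
                return local_port, proc  # forwarding works; caller owns the process
            time.sleep(0.5)
        # Either kubectl failed or it hung without connecting: clean up and move on.
        if proc.poll() is None:
            proc.kill()
        proc.wait()
    raise RuntimeError(f"port forwarding failed on ports {candidate_ports(start_port, attempts)}")
```

Killing the hung process before moving to the next port keeps a stuck `kubectl` from holding the old port, which is why each attempt uses a fresh port number rather than retrying the same one.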
Jarek Potiuk bcf0557827
Fixes remaining test-type strategy problems (#11522)
The test-type strategy matrix was not deleted entirely when the tests
were combined back in #11504
2020-10-14 11:36:45 +02:00
Jarek Potiuk 4297abab26
Combine back multiple test types into single jobs (#11504)
Seems that splitting the tests into many small jobs has a bad
effect - since we only have queue size = 180 for the whole Apache
organisation, we are competing with other projects for the jobs
and with the jobs being so short we got starved much more than if
we had long jobs. Therefore we are re-combining the test types into
single jobs per Python version/Database version and run all the
tests sequentially on those machines.
2020-10-13 20:51:08 +02:00
Jarek Potiuk 16e7129719
Added support for provider packages for Airflow 2.0 (#11487)
* Separate changes/readmes for backport and regular providers

We have now separate release notes for backport provider
packages and regular provider packages.

They have different versioning - backport provider
packages with CALVER, regular provider packages with
semver.

* Added support for provider packages for Airflow 2.0

This change consists of the following changes:

* adds provider package support for 2.0
* adds generation of package readme and change notes
* versions are for now hard-coded to 0.0.1 for first release
* adds automated tests for installation of the packages
* rename backport package readmes/changes to BACKPORT_*
* adds regular package readmes/changes
* updates documentation on generating the provider packages
* adds CI tests for the packages
* maintains backport packages generation with --backports flag

Fixes #11421
Fixes #11424
2020-10-13 16:33:00 +01:00
Jarek Potiuk f124d3f4eb
Enables back duplicate cancelling on push/schedule (#11471)
We disabled duplicate cancelling on push/schedule in #11397
but then it causes a lot of extra strain in case several commits
are merged in quick succession. The master merges are always
full builds and take a lot of time, but if we merge PRs
quickly, the subsequent merge cancels the previous ones.

This has the negative consequence that we might not know who
broke the master build, but this happens rarely enough that the pain
is worth the much less strained queue in GitHub Actions.
2020-10-12 17:39:47 +02:00
Jarek Potiuk 32f2a45819
Rename backport packages to provider packages (#11459)
In preparation for adding provider packages to 2.0 line we
are renaming backport packages to provider packages.

We want to implement this in stages - first to rename the
packages, then split-out backport/2.0 providers as part of
the #11421 issue.
2020-10-12 16:29:48 +02:00
Jarek Potiuk 697465df8d
Increase timeout for waiting for images (#11460)
Now, when we have many more jobs to run, it might happen that
when a lot of PRs are submitted one-after-the-other there might
be longer waiting time for building the image.

There is only one waiting job per image type, so it does not
cost a lot to wait a bit longer, in order to avoid cancellation
after 50 minutes of waiting.
2020-10-12 12:24:53 +02:00
Jarek Potiuk 369bbf0427
Selective tests - depends on files changed in the commit. (#11417)
This is the final step of implementing #10507 - selective tests.
Depending on the files changed by the incoming commit, only a subset of
the tests is executed. The conditions below are evaluated in the
sequence in which they are listed:

* In case of "push" and "schedule" type of events, all tests
  are executed.

* If no important files and folders changed - no tests are executed.
  This is a typical case for doc-only changes.

* If any of the environment files (Dockerfile/setup.py etc.) changed, all
  tests are executed.

* If no "core/other" files are changed, only the relevant types
  of tests are executed:

  * API - if any of the API files/tests changed
  * CLI - if any of the CLI files/tests changed
  * WWW - if any of the WWW files/tests changed
  * Providers - if any of the Providers files/tests changed

* Integration Heisentests, Quarantined, Postgres and MySQL
  runs are always run unless all tests are skipped like in
  case of doc-only changes.

* If "Kubernetes" related files/tests are changed, the
  "Kubernetes" tests with Kind are run. Note that those tests
  are run separately using Host environment and those tests
  are stored in "kubernetes_tests" folder.

* If some of the core/other files change, all tests are run. This
  is calculated by subtracting all the file counts calculated
  above from the total count of important files.

Fixes: #10507
2020-10-12 00:28:11 +02:00
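The decision sequence described in #11417 can be sketched as a single function. This is only an illustration of the rules as listed in the commit message: the path prefixes, the `Core` and `Kubernetes` labels and the exact set of always-run names are assumptions, not the patterns the real CI scripts use.

```python
from typing import Iterable, Set

# All path prefixes below are hypothetical stand-ins for the real patterns.
ENVIRONMENT_FILES = ("Dockerfile", "setup.py")
IMPORTANT_PREFIXES = ("airflow/", "tests/", "kubernetes_tests/", "scripts/")
TYPE_PREFIXES = {
    "API": ("airflow/api/", "tests/api/"),
    "CLI": ("airflow/cli/", "tests/cli/"),
    "WWW": ("airflow/www/", "tests/www/"),
    "Providers": ("airflow/providers/", "tests/providers/"),
}
KUBERNETES_PREFIXES = ("airflow/kubernetes/", "kubernetes_tests/")
ALWAYS_RUN = {"Integration", "Heisentests", "Quarantined", "Postgres", "MySQL"}
ALL_TYPES = {"Core", "Kubernetes", *TYPE_PREFIXES, *ALWAYS_RUN}


def select_test_types(event: str, changed_files: Iterable[str]) -> Set[str]:
    """Return the test types to run for an event and a list of changed files."""
    files = set(changed_files)
    if event in ("push", "schedule"):
        return set(ALL_TYPES)  # full builds always run everything

    important = {f for f in files if f in ENVIRONMENT_FILES or f.startswith(IMPORTANT_PREFIXES)}
    if not important:
        return set()  # e.g. doc-only change
    if any(f in ENVIRONMENT_FILES for f in important):
        return set(ALL_TYPES)

    selected: Set[str] = set()
    matched: Set[str] = set()
    for test_type, prefixes in TYPE_PREFIXES.items():
        hits = {f for f in important if f.startswith(prefixes)}
        if hits:
            selected.add(test_type)
            matched |= hits
    kubernetes_hits = {f for f in important if f.startswith(KUBERNETES_PREFIXES)}
    if kubernetes_hits:
        selected.add("Kubernetes")
        matched |= kubernetes_hits

    # "core/other" files are what is left after subtracting the matched counts.
    if len(important) - len(matched) > 0:
        return set(ALL_TYPES)
    return selected | ALWAYS_RUN
```

For example, a pull request touching only `airflow/cli/commands.py` (a made-up path) would select `CLI` plus the always-run set, while a change to `airflow/models/dag.py` falls into "core/other" and selects everything.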
Jarek Potiuk 5bc5994c2c
Split tests to more sub-types (#11402)
We seem to have a problem with running all tests at once - most
likely due to some resource problems in our CI, therefore it makes
sense to split the tests into more batches. This is not yet full
implementation of selective tests but it is going in this direction
by splitting to Core/Providers/API/CLI tests. The full selective
tests approach will be implemented as part of #10507 issue.

This split is possible thanks to #10422 which moved building image
to a separate workflow - this way each image is only built once
and it is uploaded to a shared registry, where it is quickly
downloaded from rather than built by all the jobs separately - this
way we can have many more jobs as there is very little per-job
overhead before the tests start running.
2020-10-11 07:40:31 -07:00
Jarek Potiuk 4de8f85eec
Fixes SHA used for cancel-workflow-action (#11400)
The SHA of cancel-workflow-action in #11397 was pointing to previous
(3.1) version of the action. This PR fixes it to point to the
right (3.2) version.
2020-10-11 13:54:00 +02:00
Jarek Potiuk 076fe88a1d
Fixes cancelling of too many workflows. (#11403)
A problem was introduced in #11397 where a bit too many "Build Image"
jobs are being cancelled by a subsequent Build Image run. For now it
cancels all the Build Image jobs that are running :(.
2020-10-10 18:33:06 +02:00
Jarek Potiuk a34f5ee76d
Fixes automated upgrade to latest constraints. (#11399)
A wrong "if" query in the GitHub action meant that the upgrade to latest
constraints did not work for a while.
2020-10-10 15:09:10 +02:00
Jarek Potiuk 401a579dd1
Push and schedule duplicates are not cancelled. (#11397)
The push and schedule builds should not be cancelled even if
they are duplicates. By seeing which of the master merges
failed, we have better visibility on which merge caused
a problem and we can trace its origin faster even if the builds
will take longer overall.

Scheduled builds also serve their purpose and they should
always be run to completion.
2020-10-10 13:51:58 +02:00
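The rule from #11397 (cancel duplicates, but never for push or schedule events) can be sketched as below. The `Run` record and the idea of grouping duplicates by event and branch are assumptions for illustration; the real cancel action works on GitHub API data and may group runs differently.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Run:
    run_id: int   # assumed to increase with time
    event: str    # e.g. "pull_request", "push", "schedule"
    branch: str


def runs_to_cancel(runs: Iterable[Run],
                   protected_events: Tuple[str, ...] = ("push", "schedule")) -> List[Run]:
    """Return older duplicate runs, skipping events that must run to completion."""
    runs = list(runs)
    latest: Dict[Tuple[str, str], Run] = {}
    for run in runs:
        key = (run.event, run.branch)
        if key not in latest or run.run_id > latest[key].run_id:
            latest[key] = run
    return [
        run for run in runs
        if run.event not in protected_events
        and run.run_id != latest[(run.event, run.branch)].run_id
    ]
```

Passing an empty `protected_events` tuple gives the opposite trade-off: every older duplicate is cancelled regardless of event type, which lowers queue pressure at the cost of less visibility into which merge broke the build.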
Jarek Potiuk 666e81ab4a
Bump cache version for kubernetes tests (#11355)
Seems that the k8s cache for virtualenv got broken during the
recent problems. This commit bumps the cache version so that the cache
is created afresh
2020-10-08 19:10:46 +02:00
Jarek Potiuk 9dc32a3d8a
Better message when Building Image fails or gets cancelled. (#11333) 2020-10-08 13:09:34 +02:00
Fai b4baa2b04b
Add environment variables documentation to cli-ref.rst. (#10970)
Co-authored-by: Fai Hegberg <faihegberg@Fais-MacBook-Pro.local>
2020-10-07 21:43:48 +01:00
Jarek Potiuk fe59f26223
Pin versions of "untrusted" 3rd-party GitHub Actions (#11319)
According to https://docs.github.com/en/free-pro-team@latest/actions/learn-github-actions/security-hardening-for-github-actions#using-third-party-actionsa
it's best practice not to use tags in case of untrusted
3rd-party actions in order to avoid potential attacks.
2020-10-07 13:23:41 +02:00
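A check for the pinning rule from #11319 could be sketched as follows: it scans workflow text for `uses:` references and reports third-party actions that are not pinned to a full 40-character commit SHA. Treating the `actions/` organisation as trusted is an assumption of this sketch, and `some/action` in the usage below is a made-up name.

```python
import re
from typing import List, Tuple

USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s#]+)")
SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str,
                     trusted_prefixes: Tuple[str, ...] = ("actions/",)) -> List[str]:
    """Return `uses:` references to untrusted actions that use a tag or branch."""
    result = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if not match:
            continue
        reference = match.group(1).strip("'\"")
        # Local actions and docker images are not version-pinned this way.
        if reference.startswith("./") or reference.startswith("docker://"):
            continue
        name, _, version = reference.partition("@")
        if name.startswith(trusted_prefixes):
            continue
        if not SHA_RE.match(version):
            result.append(reference)
    return result


example = """
steps:
  - uses: actions/checkout@v2
  - uses: potiuk/cancel-workflow-runs@v2
  - name: "Pinned third-party action"
    uses: some/action@0123456789abcdef0123456789abcdef01234567
"""
print(unpinned_actions(example))
```

A tag such as `@v2` can be moved by the action's owner to point at different code, whereas a commit SHA cannot, which is the reason for preferring SHAs for actions you do not control.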
Jarek Potiuk 975558be11
Replaces deprecated set-env with env file (#11292)
Github Actions deprecated the set-env action due to moderate security
vulnerability they found.

https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/

This commit replaces set-env with env file as explained in

https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-commands-for-github-actions#environment-files
2020-10-06 09:50:59 +02:00
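As a rough illustration of the env-file mechanism referred to in #11292: instead of printing a `::set-env name=...::value` command to stdout, a step appends `NAME=value` lines to the file whose path GitHub Actions provides in the `GITHUB_ENV` environment variable. The helper name and the `env_file` override (used here so the sketch can run outside of Actions) are assumptions.

```python
import os
from typing import Optional


def export_env(name: str, value: str, env_file: Optional[str] = None) -> None:
    """Make NAME=value visible to later steps by appending it to the env file."""
    path = env_file or os.environ["GITHUB_ENV"]
    if "\n" in value:
        # Multi-line values need the documented delimiter syntax, not handled here.
        raise ValueError("multi-line values are not supported by this sketch")
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"{name}={value}\n")
```

The variable becomes available to subsequent steps of the same job, not to the step that wrote it.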
Jarek Potiuk a33a91951b
Switched to Run Checks for Building Images. (#11276)
Replaces the annoying comments with "workflow_run" links
with Run Checks. Now we will be able to see the "Build Image"
checks in the "Checks" section including their status and direct
link to the steps running the image builds as "Details" link.

Unfortunately Github Actions do not handle well the links to
details - even if you provide details_url to link to the other
run, the "Build Image" checks appear in the original workflow,
that's why we had to introduce another link in the summary of
the Build Image check that links to the actual workflow.
2020-10-05 18:35:24 +02:00
Jarek Potiuk a4478f5665
Improve running and cancelling of the PR-triggered builds. (#11268)
The PR builds are now better handled with regards to both
running (using merge-request) and canceling (with cancel notifications).

First of all we are using merged commit from the PR, not the original commit
from the PR.

Secondly - the workflow run notifies the original PR with comment
stating that the image is being built in a separate workflow -
including the link to that workflow.

Thirdly - when canceling duplicate PRs or PRs with failed
jobs, the workflow will add a comment to the PR stating the
reason why the PR is being cancelled.

Last but not least, we also add a cancel job for the CodeQL duplicate
runs. They run for ~ 12 minutes so it makes perfect sense to
also cancel those CodeQL jobs for which someone pushed fixups in
quick succession.

Fixes: #10471
2020-10-04 22:53:18 +02:00
Jarek Potiuk 1b9e59c31a
Limits CodeQL workflow to run only in the Apache Airflow repo (#11264)
It has been raised quite a few times that workflow added in forked
repositories might be pretty invasive for the forks - especially
when it comes to scheduled workflows as they might eat quota
or at least jobs for those organisations/people who fork
repositories.

This is not strictly necessary because GitHub recently recognized this as being
a problem and introduced new rules for scheduled workflows. But for people who
have already forked, it would be nice to not run those actions. It is enough
that the CodeQL check is done when a PR is opened to the "apache/airflow"
repository.

Quote from the emails received from GitHub (no public URL explaining it yet):

> Scheduled workflows will be disabled by default in forks of public repos and in
public repos with no activity for 60 consecutive days.  We’re making two
changes to the usage policy for GitHub Actions. These changes will enable
GitHub Actions to scale with the incredible adoption we’ve seen from the GitHub
community. Here’s a quick overview:

> * Starting today, scheduled workflows will be disabled by default in new forks of
public repositories.
> * Scheduled workflows will be disabled in public repos with
no activity for 60 consecutive days.
2020-10-04 13:44:17 +02:00
Tomek Urbaszek 7c66936985
Add Github Code Scanning (#11211)
Github just released Github Code Scanning:
https://github.blog/2020-09-30-code-scanning-is-now-available/
2020-10-03 15:11:11 +02:00
Jarek Potiuk e252a6064f
Adds timeout in CI/PROD waiting jobs (#11117)
In very rare cases, the waiting job might not be cancelled when
the "Build Image" job fails or gets cancelled on its own.

In the "Build Image" workflow we have this step:

- name: "Canceling the CI Build source workflow in case of failure!"
  if: cancelled() || failure()
  uses: potiuk/cancel-workflow-runs@v2
  with:
    token: ${{ secrets.GITHUB_TOKEN }}
    cancelMode: self
    sourceRunId: ${{ github.event.workflow_run.id }}

But when this step fails or gets cancelled on its own before
cancel is triggered, the "wait for image" steps could
run for up to 6 hours.

This change sets 50 minutes timeout for those jobs.

Fixes #11114
2020-09-24 10:46:43 +02:00
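The idea behind #11117, a waiting step that gives up after a fixed time instead of running until the platform limit, can be sketched as a generic polling loop. The function and parameter names are hypothetical; in the workflow itself the limit is set on the job rather than coded by hand.

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool],
             timeout_seconds: float,
             poll_seconds: float = 1.0,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `check` until it returns True or the timeout elapses.

    Returns True on success and False on timeout, so the caller can fail
    the job explicitly instead of waiting indefinitely.
    """
    deadline = clock() + timeout_seconds
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

With the 50-minute limit from the commit message this would be called as `wait_for(image_is_ready, timeout_seconds=50 * 60, poll_seconds=30)`, where `image_is_ready` is a placeholder for whatever checks the registry.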
Jarek Potiuk 52fdb62314
Requirements might get upgraded without setup.py change (#10784)
I noticed that when there are no setup.py changes, the constraints
are not upgraded automatically. This is because of the docker
caching strategy used - it simply does not even know that the
upgrade of pip should happen.

I believe it is really good (from the security and incremental updates
POV) to attempt to upgrade at every successful merge (note that
the upgrade will not be committed if any of the tests fail, and this
only happens on every merge to master or on a scheduled run).

This way we will have more frequent but smaller constraint changes.

Depends on #10828
2020-09-22 16:22:53 +02:00
Jarek Potiuk cea9e829b3
Improves deletion of old artifacts. (#11079)
We introduced deletion of the old artifacts as this was
the suspected culprit of Kubernetes Job failures. It turned out
eventually that those Kubernetes Job failures were caused by
the #11017 change, but it's good to do housekeeping of the
artifacts anyway.

The delete workflow action introduced in a hurry had three problems:

* it runs for every fork if they sync master. This is a bit
  too invasive

* it fails continuously after 10 - 30 minutes every time
  as we have too many old artifacts to delete (GitHub has
  90 days retention policy so we have likely tens of
  thousands of artifacts to delete)

* it runs every hour and it causes occasional API rate limit
  exhaustion (because we have too many artifacts to loop through)

This PR introduces filtering by repository and changes the frequency
of deletion to 4 times a day. A back-of-the-envelope calculation
caps each of the 4 daily runs at 2500 artifacts to delete, so we have a low risk
of reaching the 5000 API calls/hr rate limit. It also adds a script that we are
running manually to delete those excessive artifacts now. Eventually,
when the number of artifacts goes down, the regular job should delete
maybe a few hundred artifacts appearing within the 6-hour window
in normal circumstances, and it should stop failing then.
2020-09-22 14:31:14 +02:00
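The back-of-the-envelope calculation from #11079 can be written out explicitly. The numbers 4 runs per day, 2500 deletions per run and 5000 calls per hour come from the commit message; the cost model (one API call per deletion plus one call per listing page of 100 artifacts) is an assumption of this sketch.

```python
import math

RATE_LIMIT_PER_HOUR = 5000    # from the commit message
RUNS_PER_DAY = 4              # from the commit message
MAX_DELETES_PER_RUN = 2500    # from the commit message
PAGE_SIZE = 100               # assumed listing page size


def calls_per_run(deletes: int = MAX_DELETES_PER_RUN, page_size: int = PAGE_SIZE) -> int:
    """API calls for one run: listing pages plus one call per deleted artifact."""
    return math.ceil(deletes / page_size) + deletes


hours_between_runs = 24 / RUNS_PER_DAY
headroom = RATE_LIMIT_PER_HOUR - calls_per_run()

print(f"runs every {hours_between_runs:g} h, {calls_per_run()} calls per run, "
      f"{headroom} calls of hourly headroom")
```

Under these assumptions a single run uses roughly half of the hourly budget, and runs are six hours apart, so two runs never share the same rate-limit window.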
Tomek Urbaszek 29d62977d3
Fix s.apache.org Slack link (#11078)
Remove ending / from s.apache.org Slack link
2020-09-22 11:33:49 +02:00
Kaxil Naik e3a590075e
Replace Airflow Slack Invite old link to short link (#11071)
Follow up to https://github.com/apache/airflow/pull/10034

https://apache-airflow-slack.herokuapp.com/ to https://s.apache.org/airflow-slack/
2020-09-22 10:46:44 +02:00
Jarek Potiuk c362d691fc
Add Workflow to delete old artifacts (#11064) 2020-09-21 20:59:18 +02:00
Jarek Potiuk 3db4d3b04d
All versions in CI yamls are not hard-coded any more (#10959)
GitHub Actions allows using the `fromJson` function to read arrays
or even more complex JSON objects into the CI workflow yaml files.

This, combined with `::set-output` commands, allows reading the
list of allowed versions as well as default ones from the
environment variables configured in
./scripts/ci/libraries/initialization.sh

This means that we can have one place in which versions are
configured. We also need to do it in "breeze-complete". As this is
a standalone script that should not source anything, we added
BATS tests to verify that the versions in breeze-complete
correspond with those defined in initialization.sh.

Also we do not limit tests any more in regular PRs now - we run
all combinations of available versions. Our tests run quite a
bit faster now so we should be able to run more complete
matrixes. We can still exclude individual values of the matrixes
if this is too much.

MySQL 8 is disabled from breeze for now. I plan a separate follow
up PR where we will run MySQL 8 tests (they were not run so far)
2020-09-21 20:02:04 +02:00
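The mechanism described in #10959 relies on a step emitting a JSON array as a step output, which the workflow then parses with `fromJson`. A minimal sketch of the emitting side is shown below; it uses the set-output workflow command the commit message refers to (GitHub has since deprecated that command in favour of an output file), and the output name `python-versions` and the version list are made up for illustration.

```python
import json
from typing import List


def set_output_command(name: str, values: List[str]) -> str:
    """Build the workflow command that exposes `values` as a JSON array output."""
    # Compact separators keep the whole value on a single line.
    payload = json.dumps(values, separators=(",", ":"))
    return f"::set-output name={name}::{payload}"


# Printing the command to stdout is what makes the runner record the output.
print(set_output_command("python-versions", ["3.6", "3.7", "3.8"]))
```

On the workflow side, a job matrix can then reference the value with an expression of the form `${{ fromJson(needs.<job-id>.outputs.<output-name>) }}`, so the list of versions lives in one place.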
Daniel Imberman 7e112b18e0
Only gather KinD logs if tests fail (#11058) 2020-09-21 18:56:17 +02:00
John Bampton ce19657ec6
Fix case of GitHub. (#10955)
Changed `Github` to `GitHub`.
2020-09-15 14:49:27 -04:00
Jarek Potiuk 14f27635f6
Fixes retrieval of correct branch in non-master related builds (#10912)
When we ported the new CI mechanism to v1-10-test it turned out
that we have to correct the retrieval of DEFAULT_BRANCH
and DEFAULT_CONSTRAINTS_BRANCH.

Since we are building the images using the "master" scripts, we need to
make sure the branches are retrieved from _initialization.sh of the
incoming PR, not from the one in the master branch.

Additionally versions 2.7 and 3.5 builds have to be merged to
master and excluded when the build is run targeting master branch.
2020-09-15 15:24:33 +02:00
Jarek Potiuk 83ed6bdb3f
Cache for kubernetes tests is updateable (#10945)
The cache in Github Actions is immutable - once you create it
it cannot be modified. That's why cache keys should contain
hash of all files that are used to create the cache.

Kubernetes cache key did not contain it, and as a side effect
the cache from master kubernetes setup.py was used in the v1-10-test
after the breeze changes were cherry-picked.
2020-09-15 02:24:15 +02:00
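The point made in #10945, that an immutable cache needs a key derived from the content of every file used to build it, can be illustrated with a small sketch. The key layout (`prefix-version-digest`) and the function name are assumptions; in a workflow the same effect is usually achieved with the built-in `hashFiles` expression.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def cache_key(prefix: str, version: str, files: Iterable[str]) -> str:
    """Build a cache key that changes whenever any input file changes."""
    digest = hashlib.sha256()
    for path in sorted(str(p) for p in files):
        # Hash both the path and the content so renames also change the key.
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(Path(path).read_bytes())
        digest.update(b"\0")
    return f"{prefix}-{version}-{digest.hexdigest()}"
```

With such a key, a branch whose `setup.py` differs from master gets its own cache entry instead of silently reusing the one created on master; the explicit `version` component still allows a manual bump when a cache gets corrupted.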
Kaxil Naik 67402b72a9
Fix grammar in Bug Report Template (#10936)
`This questions` -> `These questions`
2020-09-14 16:38:42 +01:00
Jarek Potiuk b746f33fc6
Removes stable tests from quarantine (#10768)
We've observed the tests for last couple of weeks and it seems
most of the tests marked with "quarantine" marker are succeeding
in a stable way (https://github.com/apache/airflow/issues/10118)
The removed tests have success ratio of > 95% (20 runs without
problems) and this has been verified a week ago as well,
so it seems they are rather stable.

There are literally a few that are either failing or causing
the Quarantined builds to hang. I manually reviewed the
master tests that failed for the last few weeks and added the
tests that are causing the build to hang.

Seems that stability has improved - which might be caused
by some temporary problems when we marked the quarantined builds
or too "generous" way of marking test as quarantined, or
maybe improvement comes from the #10368 as the docker engine
and machines used to run the builds in GitHub experience far
less load (image builds are executed in separate builds) so
it might be that resource usage is decreased. Another reason
might be Github Actions stability improvements.

Or simply those tests are more stable when run in isolation.

We might still add failing tests back as soon as we see them behave
in a flaky way.

The remaining quarantined tests that need to be fixed:
 * test_local_run (often hangs the build)
 * test_retry_handling_job
 * test_clear_multiple_external_task_marker
 * test_should_force_kill_process
 * test_change_state_for_tis_without_dagrun
 * test_cli_webserver_background

We also move some of those tests to the "heisentests" category.
Those tests run fine in isolation but fail
the builds when run with all other tests:
 * TestImpersonation tests

We might find that those heisentests can be fixed but for
now we are going to run them in isolation.

Also - since those quarantined tests are failing more often
the "num runs" to track for those has been decreased to 10
to keep track of 10 last runs only.
2020-09-08 07:36:12 +02:00
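The bookkeeping described in #10768 (keep only the last N runs of a test and call it stable when the success ratio over 20 runs is above 95%) can be sketched with a bounded queue. The class and method names are hypothetical; the actual tracking is done by the project's own tooling.

```python
from collections import deque
from typing import Optional


class RunHistory:
    """Remember the outcome of the most recent `max_runs` runs of one test."""

    def __init__(self, max_runs: int) -> None:
        self._runs = deque(maxlen=max_runs)  # older results fall off automatically

    def record(self, passed: bool) -> None:
        self._runs.append(bool(passed))

    def __len__(self) -> int:
        return len(self._runs)

    def success_ratio(self) -> Optional[float]:
        if not self._runs:
            return None
        return sum(self._runs) / len(self._runs)

    def is_stable(self, threshold: float = 0.95, min_runs: int = 20) -> bool:
        """Stable means enough recorded runs and a ratio strictly above the threshold."""
        return len(self._runs) >= min_runs and self.success_ratio() > threshold
```

With a window of 20 runs a single failure gives exactly 95%, which is not above the threshold, so in this sketch "> 95% over 20 runs" is equivalent to 20 runs without problems, matching the wording of the commit message.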
Kaxil Naik a1032805bc
Change the name of Static Check without pylint (#10690) 2020-09-03 10:13:35 +01:00
Jarek Potiuk 596bc13379
Adds 'cncf.kubernetes' package back to backport provider packages. (#10659) 2020-08-31 14:45:58 +02:00
Jarek Potiuk e565368f2e
Nightly tag push is not skipped in scheduled builds (#10597)
With recent refactors, nightly tag was not pushed on
scheduled event because it was depending on pushing images
to github registry. Pushing images to github registry is
skipped on scheduled builds, so pushing tag was also skipped.
2020-08-27 15:40:07 +02:00
Jarek Potiuk be77f8e448
Add a possibility to switch back to building images by secret (#10509)
You can now define secret in your own fork:

AIRFLOW_GITHUB_REGISTRY_WAIT_FOR_IMAGE

If you set it to "false", it skips building images in separate
workflow_run - images will be built in the jobs run in the
CI Build run and they won't be pushed to the registry.

Note - you can't have secrets starting with GITHUB_, that's why
the AIRFLOW_* prefix
2020-08-25 01:38:28 +02:00
Jarek Potiuk 570f75149f
Sets default timeout for the job waiting for images (#10517)
In normal circumstances those jobs will wait for a short time
(4-15 minutes depending on the state of the base image).
However, there might be cases when there are a lot of jobs,
or when there are queueing problems in GitHub, so that
the "Build Images" job is queued and does not start quickly.

This happened on 24th of August 2020 for example when several
jobs failed because the "Build Image" was queued and only
run after the "CI Build" job timed out.

Usually those situations tend to be resolved by GitHub support
or they resolve themselves as the jobs finish and
free the queue. However, in those cases we should give the
waiting job as much time as GitHub Actions allows by default
for the job to run (360 minutes). This does no harm - we can
always cancel those jobs manually and they are just two
jobs running so it should not cause any problem.

Note that if a contributor sees that the job has been running for
a long time, they will likely push an amended
commit, which will also cancel such a waiting job, so
long-running waiting jobs are even less likely.
2020-08-24 22:03:35 +02:00
Jarek Potiuk 4fa7df5de9
Mounting from sources is disabled for tests (#10472)
We had to enable mounting from sources for a short while
because we had to find a way to add new scripts to the
"workflow_run" workflow we have. This also requires
the #10470 to be merged - perf_kit to be moved to tests.utils because
it was in a separate directory and image without mounting sources
could not run the tests.

It also partially addresses the #10445 problem where
there was difference between sources in the image and coming
from the master. This comes from GitHub running merge on
non-conflicting changes in the PR and something that will
be addressed shortly.

The issue #10471 discusses this in detail.
2020-08-24 14:24:13 +02:00
Tomek Urbaszek c8c3f8b8b4
Remove old configuration from BoringCyborg (#10490)
Signed-off-by: Tomek Urbaszek <tomasz.urbaszek@polidea.com>
2020-08-23 12:16:56 +02:00
Jarek Potiuk 93ba98ce92
Optimise production image building during k8s tests on CI (#10476)
We do not have to rebuild PROD images now because we changed
the strategy of preparing the image for k8s tests. Instead of
embedding dags during build with the EMBEDDED_DAGS build arg, we
are now extending the image with a FROM: clause and adding dags
on top of the PROD base image. We've been rebuilding the
image twice during each k8s run - once in "Prepare PROD image"
and once in "Deploy airflow to cluster". Neither is needed
and both lasted ~ 2m 30s, so we should save around 5m for every
K8S job (~30% as the whole K8S test job is around 15m).
2020-08-22 21:35:55 +02:00
Jarek Potiuk 0c4f7cd0d0
Change Support Request template to a link to Slack (#10480)
* Change Support Request template to a link to Slack

* Update .github/ISSUE_TEMPLATE/config.yml

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-08-22 20:01:42 +02:00
Jarek Potiuk 9774903088
Fixes quoting bug introduced in #10473 (#10477)
Copy pasted curly quotes from a web page <facepalm>
2020-08-22 17:31:48 +02:00
Jarek Potiuk ce9cc1b089
Stops running workflow_run for scheduled runs in forks (#10473)
Even though we disabled all scheduled runs in forks in #10448,
there is still one build that runs regularly for forks and
should be disabled: the "workflow_run"
executed for the nightly scheduled "CI Build" run, which still gets
triggered.

This change skips those runs in forks when the "source event"
is "schedule".
2020-08-22 16:12:43 +02:00
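The condition introduced in #10473 boils down to a small predicate: a triggered run is skipped only when it executes in a fork and the run that triggered it came from a schedule. The sketch below states that rule in code; the function name and the `user/airflow` fork name in the usage are illustrative, and in the workflow itself the same check is written as an `if:` expression.

```python
def should_run_workflow(repository: str, source_event: str,
                        upstream: str = "apache/airflow") -> bool:
    """Skip scheduled-triggered runs in forks; run everything else."""
    in_fork = repository != upstream
    return not (in_fork and source_event == "schedule")


print(should_run_workflow("user/airflow", "schedule"))
print(should_run_workflow("user/airflow", "pull_request"))
```

Pull-request and push events still run in forks, so contributors keep the ability to test their own branches.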
Jarek Potiuk 1cf1af664f
Do not override in_container scripts when building the image (#10442)
After #10368, we've changed the way we build the images
on CI. We are overriding the ci scripts that we use
to build the image with the scripts taken from master
to not give rogue PR authors the possibility to run
something with the write credentials.

However, we should not override the in_container scripts,
because they become part of the image, so we should use
those that came with the PR. That's why we have to move
the "in_container" scripts out of the "ci" folder and
only override the "ci" folder with the one from
master. We've made sure that those scripts in ci
are self-contained and they do not need to reach outside of
that folder.

Also the static checks are done with local files mounted
on CI because we want to check all the files - not only
those that are embedded in the container.
2020-08-21 17:21:57 +02:00
Jarek Potiuk 5bf47e3554
Be nice to fork repositories when it comes to scheduled events (#10448)
Scheduled CI runs now only run in the 'apache/airflow' repository, not in its forks
2020-08-21 16:20:30 +02:00
Felix Uellendall 2f552233f5
Add AzureBaseHook (#9747)
- refactor/change azure_container_instance to use AzureBaseHook
- add info to operators-and-hooks-ref.rst
- add howto docs for connecting to azure
- add auth mechanism via json config
- add azure conn type
2020-08-21 11:45:23 +02:00
Jarek Potiuk c35a01037a
Switch to released cancel-workflow-runs action (#10423)
Follow up after #10368
2020-08-20 12:15:42 +02:00