Commit graph

11488 commits

Author SHA1 Message Date
Vivek Bhojawala adf7755eaa
Add extra field to get_connection REST endpoint (#13885) 2021-01-25 10:52:16 +01:00
Jarek Potiuk 31b956c6c2
Removes files from docker-context-files if not used (#13830)
In case docker-context files are not used during build, they
should be cleaned just before the build to make sure that
docker context does not contain extra files here. Otherwise
files left from previous runs might be in the context and cause
cache invalidation if you are building the images locally.
2021-01-25 10:08:11 +01:00
Marshall Mamiya 6d55f329f9
AWS Glue Crawler Integration (#13072)
This change integrates an AWS glue crawler operator, hook and sensor that can be used to trigger glue crawlers from Airflow.

Co-authored-by: Kamil Breguła <kamil.bregula@polidea.com>
2021-01-25 08:24:07 +01:00
Mahesh Panati c4b723f324
Added Aviva Plc to INTHEWILD.md (#13875) 2021-01-24 22:20:23 +01:00
Aakcht 326c74e608
Fix Kerberos envs for workers in Helm Chart (#13828) 2021-01-24 18:09:40 +01:00
Vladimir Mikhaylov dfbccd3b1f
Fix TaskNotFound in log endpoint (#13872) 2021-01-24 14:49:27 +01:00
Loïc Messal f473ca7130
Replace `google_cloud_storage_conn_id` by `gcp_conn_id` when using `GCSHook` (#13851)
google_cloud_storage_conn_id parameter has been deprecated by GCSHook, 
and should be replaced by gcp_conn_id parameter. google_cloud_storage_conn_id 
was still in use in many Operators.

GCSHook emits a DeprecationWarning message every time one of those operators uses
google_cloud_storage_conn_id. This PR avoids triggering the DeprecationWarning when using
GCSHook in the codebase.

Co-authored-by: Kamil Breguła <kamil.bregula@polidea.com>
2021-01-24 13:59:36 +01:00
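The deprecation behaviour described in #13851 can be illustrated with a minimal sketch. `ExampleHook` and `count_deprecations` are hypothetical stand-ins written for this illustration, not Airflow's `GCSHook`; only the two parameter names come from the commit message, and the default connection id is an assumption.

```python
import warnings


class ExampleHook:
    """Hypothetical stand-in for a hook with a renamed parameter (not the real GCSHook)."""

    def __init__(self, gcp_conn_id="google_cloud_default", google_cloud_storage_conn_id=None):
        if google_cloud_storage_conn_id is not None:
            # The old name still works, but every use emits a DeprecationWarning.
            warnings.warn(
                "google_cloud_storage_conn_id is deprecated; pass gcp_conn_id instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            gcp_conn_id = google_cloud_storage_conn_id
        self.gcp_conn_id = gcp_conn_id


def count_deprecations(**hook_kwargs):
    """Build the hook and count the DeprecationWarnings raised while doing so."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        ExampleHook(**hook_kwargs)
    return sum(issubclass(w.category, DeprecationWarning) for w in caught)
```

Callers that pass `gcp_conn_id` produce no warning, which is the effect the PR aims for inside the codebase; callers that still pass the old name are warned once per instantiation.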
Kaxil Naik 910ba25d6f
Fix spellings (#13867) 2021-01-24 07:40:50 +01:00
Kaxil Naik 702e3eee21
Fix link and section in CONTRIBUTORS_QUICK_START.rst (#13868) 2021-01-24 05:36:59 +01:00
Tomek Urbaszek c90485c5aa
Fix link to Apache Airflow docs in webserver (#13250) 2021-01-24 05:31:11 +01:00
Kaxil Naik 008b36592e
Improve BREEZE.rst docs (#13869) 2021-01-24 05:28:08 +01:00
Tobiasz Kędzierski 50548e154d
Improve environment variables in GCP Secret Manager test (#13844) 2021-01-24 02:26:55 +01:00
Jarek Potiuk 8ac6deaa39
Fix PyPI spelling (#13864) 2021-01-23 22:07:22 +01:00
Jarek Potiuk 4403b6d7b2
Quarantine test that often fails (same as already quarantined no pid) (#13845) 2021-01-23 21:29:21 +01:00
Ephraim Anierobi 94b1531230
Upgrade azure blob to v12 (#12188) 2021-01-23 13:52:13 +01:00
Joshua Carp a9ac2b040b
Switch to f-strings using flynt. (#13732) 2021-01-23 06:19:38 +01:00
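As a small illustration of the kind of rewrite #13732 refers to, the sketch below shows the same message built with `%` formatting, with `str.format`, and with an f-string. The variable names and message are made up for this example and are not taken from the Airflow codebase.

```python
name = "example_dag"
retries = 3

# Older styles that a conversion to f-strings typically replaces.
before_percent = "DAG %s will retry %d times" % (name, retries)
before_format = "DAG {} will retry {} times".format(name, retries)

# Equivalent f-string.
after = f"DAG {name} will retry {retries} times"
```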
Tobiasz Kędzierski 9592be88e5
Fix Google Spanner example dag (#13842)
* Fix Google Spanner example dag

Some tasks require an upstream task.

Without an upstream task they try to perform operations on resources which do not exist yet

* fixup! Fix Google Spanner example dag
2021-01-22 19:13:50 +01:00
Tobiasz Kędzierski 3fe80741df
Fix GCP Secret Manager system test (#13848) 2021-01-22 19:13:07 +01:00
Tobiasz Kędzierski af52fdb511
Improve environment variables in GCP Dataflow system test (#13841)
It will help to parametrize system tests
2021-01-22 19:12:27 +01:00
Jarek Potiuk 0e540ab28d
Update information about branching strategy vs. production images (#13813)
Some users were not aware that we are not releasing images from
`stable` branch. This change clarifies branching strategy used
and what they can expect from the reference image published in
DockerHub.
2021-01-22 19:05:46 +01:00
Jarek Potiuk b1548a2a23
Allows for more than one warning in deprecation message (#13836)
Sometimes in our tests we get more than one deprecation
warning. It is likely caused by transitive warnings
from importing other external libraries.

In order to get rid of those side effects, we are now
accepting more than one warning and we expect that at least
one of the warnings will come from the file being tested
2021-01-22 13:05:45 +01:00
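The check described in #13836 (accept several warnings, require that at least one comes from the file under test) can be sketched with the standard `warnings` module. The function names and file paths below are hypothetical and only serve the illustration; the actual test helper in Airflow may look different.

```python
import warnings


def emit_warnings():
    """Simulate a transitive warning from an external library plus the one under test."""
    warnings.warn_explicit(
        "third-party noise", DeprecationWarning,
        filename="/site-packages/other_lib.py", lineno=1,
    )
    warnings.warn_explicit(
        "old import path is deprecated", DeprecationWarning,
        filename="/repo/tests/test_example.py", lineno=10,
    )


def warnings_from_file(func, expected_filename):
    """Run func, record every warning, and keep those raised from expected_filename."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func()
    return [w for w in caught if w.filename == expected_filename]
```

A test would then assert that the returned list is non-empty instead of asserting that exactly one warning was recorded.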
Tobiasz Kędzierski e7946f1cb7
Improve environment variables in GCP Datafusion system test (#13837)
It will help to parametrize system tests
2021-01-22 13:05:17 +01:00
Tobiasz Kędzierski 61c1d6ec6c
Improve environment variables in GCP Memorystore system test (#13833) 2021-01-22 12:54:00 +01:00
Tobiasz Kędzierski 202f66093a
Improve environment variables in GCP Life Sciences system test (#13834) 2021-01-22 12:51:18 +01:00
Jarek Potiuk 1b9e3d1c28
Revert "Fix error with quick-failing tasks in KubernetesPodOperator (#13621)" (#13835)
This reverts commit 94d3ed61d6.

Co-authored-by: Daniel Imberman <daniel.imberman@gmail.com>
2021-01-22 12:24:20 +01:00
Jarek Potiuk df1503ea0a
Fixed image separator for Github Package registry image (#13825)
#13726 introduced the possibility of using Github Container
Registry, but for building from Package Registry there was a
mistake - only visible after merging - that caused
master to fail (the name of the build image contained / rather than -)

This PR fixes it.
2021-01-22 09:28:31 +01:00
Daniel Imberman 94d3ed61d6
Fix error with quick-failing tasks in KubernetesPodOperator (#13621)
* Fix error with quick-failing tasks in KubernetesPodOperator

Addresses an issue with the KubernetesPodOperator where tasks that die
quickly are not patched with "already_checked" because they never make
it to the monitoring logic.

* static fix
2021-01-21 12:57:35 -08:00
Vladimir Mikhaylov 10b8ecc86f
Add params to the DAG details endpoint (#13790) 2021-01-21 16:42:19 +01:00
Jarek Potiuk 2c6c7fdb23
Adds capability of switching to Github Container Registry (#13726)
* Adds capability of switching to Github Container Registry

Currently we are using GitHub Packages to cache images for the
build. GitHub Packages are "legacy" storage of binary artifacts
for GitHub and as of September 2020 they introduced Github
Container Registry as more stable, easier to manage replacement
for container storage. It includes complete self-management
of the images including permission management, public access,
retention management and many more.

More about it here:

https://github.blog/2020-09-01-introducing-github-container-registry/

Recently we started to experience unstable behaviour of the
Github Packages ('unknown blob' and manifest v1 vs. v2 when
pushing images to it). So together with ASF we proposed to
enable Github Container Registry and it happened as of
January 2020.

More about it in https://issues.apache.org/jira/browse/INFRA-20959

We are currently in the testing phase, especially when it
comes to management of permissions - the model of permission
management is not the same for Container Registry as it was
for GitHub Packages (it was per-repository in GitHub Packages,
but it is organization-wide in the Container Registry).

This PR introduces an option to use GitHub Container Registry
rather than GitHub Packages. It is implemented at both the CI
level and the Breeze level, allowing seamless switching between
those two solutions:

In Breeze (which we use to test pushing/pulling the images)
--github-registry option was added with `ghcr.io` (Github Container
Registry) or `docker.pkg.github.com` (GitHub Packages).

In CI the same can be achieved by setting GITHUB_REGISTRY value
(same values possible as for --github-registry Breeze parameter)

* fixup! Adds capability of switching to Github Container Registry
2021-01-21 16:16:09 +01:00
Tobiasz Kędzierski 9d9ef1addc
Improve environment variables in GCS system test (#13792) 2021-01-21 12:15:08 +01:00
Tobiasz Kędzierski 70bf307f38
Add How To Guide for Dataflow (#13461) 2021-01-21 11:41:36 +01:00
Kaxil Naik f7fe363255
Fix Deprecation for configuration.getsection (#13804) 2021-01-21 06:26:34 +01:00
Ashmeet Lamba 3e25795099
BaseBranchOperator will push to xcom by default. (#13704) (#13763)
This change allows BaseBranchOperator to do an XCom push of the branch it chooses to follow.
It will also add support to use the do_xcom_push parameter.

The added change returns the result received by running choose_branch().

Closes: #13704
2021-01-21 01:16:32 +00:00
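The behaviour described in #13763 (return the result of `choose_branch()` so that it is pushed to XCom unless `do_xcom_push` is disabled) can be sketched as follows. Everything here is a simplified, hypothetical stand-in: `SketchBranchOperator`, `WeekdayBranch`, `run` and the dictionary used as an XCom store are assumptions for illustration, not Airflow's implementation.

```python
class SketchBranchOperator:
    """Hypothetical, simplified stand-in; not Airflow's BaseBranchOperator."""

    def __init__(self, do_xcom_push=True):
        self.do_xcom_push = do_xcom_push

    def choose_branch(self, context):
        raise NotImplementedError

    def execute(self, context):
        # Returning the chosen branch is what makes it available for an XCom push.
        return self.choose_branch(context)


class WeekdayBranch(SketchBranchOperator):
    def choose_branch(self, context):
        return "weekday_task" if context["weekday"] < 5 else "weekend_task"


def run(operator, context, xcom_store):
    """Mimic a runner that stores the return value when do_xcom_push is set."""
    result = operator.execute(context)
    if operator.do_xcom_push and result is not None:
        xcom_store["return_value"] = result
    return result
```

With the default `do_xcom_push=True` the chosen branch ends up in the store; with `do_xcom_push=False` the branch is still chosen but nothing is stored.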
Griffin Cosgrove 3fd5ef3555
Add missing logos for integrations (#13717)
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
2021-01-21 01:22:34 +01:00
Andrii Soldatenko 29730d7200
Add acl_policy to S3CopyObjectOperator (#13773)
closes https://github.com/apache/airflow/issues/13774

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2021-01-20 15:16:25 +00:00
Jennifer Melot 9923d606d2
Use DAG context manager in examples (#13297) 2021-01-20 13:16:12 +01:00
Kaxil Naik b4c8a0406e
Fix SQL syntax to check duplicate connections (#13783)
closes https://github.com/apache/airflow/issues/13679
2021-01-20 08:23:28 +01:00
André Amaral 1602ec97c8
Add a new argument for HttpSensor to accept a list of http status code to Continue Poking (#13499)
closes: #13451
2021-01-20 00:02:08 +00:00
drago-f5a 7a742cb033
Change log level from debug to info when spawning new gunicorn workers (#13780) 2021-01-19 23:38:55 +00:00
Brent Bovenzi d65cf77552
Add description to hint if conn_type is missing (#13778)
- add plaintext description to add/edit conn_type to make sure people remember to install necessary provider packages
2021-01-19 23:38:29 +00:00
drago-f5a 8a4bd3c73e
Fix webserver exiting when gunicorn master crashes (#13518)
* Correct the logic for webserver choosing number of workers to spawn (#13469)

A key consequence of this fix is that webserver will properly
exit when gunicorn master dies and stops responding to signals.
2021-01-19 22:23:40 +00:00
Jarek Potiuk 18d9320c26
Remove chmod +x for installation script for docker build. (#13772)
We introduced chmod a+x for installation scripts in Dockerfiles,
but this turned out to be a bad idea. It was meant to accommodate
building on Azure DevOps, which has a filesystem that does not
keep the executable bit. The side effect is that the
layer of the script is invalidated when the permission is changed
to +x on Linux. The problem is that the script has locally (on
checkout) different permissions depending on the umask setting.

Therefore changing permissions for the image to a+x is not the best approach.

Instead we are running the scripts with bash directly, which does
not require changing of executable bit.
2021-01-19 21:49:50 +01:00
Jarek Potiuk bc026cf696
Adds automated user creation in production image (#13728)
* Adds automated user creation in the production image

This PR implements automated user creation for the production image
controlled by environment variables.

This is a solution for anyone who would like to make a quick test
of the production image and would like to:

* init/upgrade the DB automatically
* create a user

This is particularly useful for internal SQLite db initialization
but can also be used to initialize the user in docker-compose
or similar cases where there is no equivalent of init containers
that are usually used to perform the initialization.

Closes #860
2021-01-19 15:45:29 +01:00
JavierLopezT c065d32189
AllowDiskUse parameter and docs in MongotoS3Operator (#12033)
Co-authored-by: RosterIn <48057736+RosterIn@users.noreply.github.com>
Co-authored-by: javier.lopez <javier.lopez@promocionesfarma.com>
2021-01-19 13:25:53 +01:00
Kamil Breguła 66a16e5cb3
Dividing contributors guide into expert and beginner parts (#13742) 2021-01-19 11:52:27 +01:00
Jarek Potiuk a03b54545d
Increase timeouts for tests (#13756)
We are getting close to the previous timeouts for tests and some
tests are crossing the 80 minutes.

While we should speed it up in general, for now increasing
timeouts should do the job.
2021-01-19 11:48:52 +01:00
Jarek Potiuk c82f89f52c
Update installation notes to warn against common problems. (#13727)
We have recently seen a number of issues created by users who
tried to install airflow with poetry or pip-tools or who had
successes with using the latest pip 20.3.3. This change aims
to update the 'note' content and make sure installation
instructions are consistent everywhere, so that new users
are warned against using anything other than pip and that they
are aware of potential problems with 'pip 20.3' and ways
to mitigate the problems.

This responds to the needs of confused users such as
one in https://github.com/apache/airflow/issues/13711#issuecomment-761694781
2021-01-19 11:46:06 +01:00
QP Hou f1d4f54b34
Fix race conditions in task callback invocations (#10917)
This race condition resulted in task success and failure callbacks being
called more than once. Here is the order of events that could lead to
this issue:

* task started running within process 2
* (process 1) local_task_job checked for task return code, returns None
* (process 2) task exited with failure state, task state updated as failed in DB
* (process 2) task failure callback invoked through taskinstance.handle_failure method
* (process 1) local_task_job heartbeat noticed task state set to
  failure, mistook it for the state being updated externally, and also invoked the task
  failure callback

To avoid this race condition, we need to make sure task callbacks are
only invoked within a single process.
2021-01-18 23:39:41 +00:00
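The sequence of events listed in #10917 can be replayed in a toy, single-process simulation. This is only a sketch of the symptom and of the stated goal (one owner for the callback); the class, the function names and the `state_set_by_task` flag are hypothetical, and the mechanism Airflow actually uses to enforce this is not described in the commit message.

```python
class TaskRecord:
    """Toy stand-in for a task instance row in the metadata DB."""

    def __init__(self):
        self.state = "running"
        self.failure_callback_calls = 0

    def failure_callback(self):
        self.failure_callback_calls += 1


def task_exits_with_failure(record):
    """(process 2) The task marks itself failed and runs its own callback."""
    record.state = "failed"
    record.failure_callback()


def heartbeat(record, state_set_by_task):
    """(process 1) The supervising job notices a terminal state.

    It only runs the callback if it believes the state was changed externally.
    """
    if record.state == "failed" and not state_set_by_task:
        record.failure_callback()


def simulate(supervisor_knows_task_finished):
    record = TaskRecord()
    task_exits_with_failure(record)
    heartbeat(record, state_set_by_task=supervisor_knows_task_finished)
    return record.failure_callback_calls
```

When the supervisor mistakes the task's own state change for an external update, the callback runs twice; when only one side owns the callback, it runs once.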
Kaxil Naik 2fefad4788
Use the correct link for Apache Airflow Dockerhub repo (#13752)
https://hub.docker.com/repository/docker/apache/airflow requires auth while https://hub.docker.com/r/apache/airflow does not
2021-01-18 22:15:13 +00:00
Kaxil Naik 6410f07106
Add __repr__ for Executors (#13753)
Before:

```python
>>> from airflow.executors.local_executor import LocalExecutor
>>> LocalExecutor()
<airflow.executors.local_executor.LocalExecutor object at 0x7f49b47f8d68>
```

After:

```python
>>> from airflow.executors.local_executor import LocalExecutor
>>> LocalExecutor()
LocalExecutor(parallelism=32)
```
2021-01-18 22:10:18 +00:00
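A `__repr__` producing output in the style shown for #13753 could look like the sketch below. The classes are hypothetical stand-ins; the default `parallelism=32` is taken from the example output above, and the real executors may build their representation differently.

```python
class SketchExecutor:
    """Hypothetical base class; not Airflow's BaseExecutor."""

    def __init__(self, parallelism=32):
        self.parallelism = parallelism

    def __repr__(self):
        # Use the concrete class name so subclasses get a meaningful repr for free.
        return f"{self.__class__.__name__}(parallelism={self.parallelism})"


class SketchLocalExecutor(SketchExecutor):
    pass
```

Using `self.__class__.__name__` in the base class means each subclass reports its own name without overriding `__repr__`.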