Commit graph

10400 Commits

Author SHA1 Message Date
John Bampton 75071831ba
Remove redundant parentheses from Python files (#10967) 2020-10-10 15:08:38 +02:00
Jarek Potiuk 401a579dd1
Push and schedule duplicates are not cancelled. (#11397)
The push and schedule builds should not be cancelled even if
they are duplicates. By seeing which of the master merges
failed, we have better visibility on which merge caused
a problem and we can trace its origin faster even if the builds
will take longer overall.

Scheduled builds also serve their purpose and they should
always be run to completion.
2020-10-10 13:51:58 +02:00
Jarek Potiuk 04973904c3
Constraints and PIP packages can be installed from local sources (#11382)
* Constraints and PIP packages can be installed from local sources

This is the final part of implementing #11171 based on feedback
from enterprise customers we worked with. They want to have
a capability of building the image using binary wheel packages
that are locally available and the official Dockerfile. This means
that besides the official APT sources the Dockerfile build should
not need GitHub, nor any other external files pulled from outside
including the PIP repository.

This change also includes documentation on how to prepare a set of
such binaries ready for inspection and review by security teams
in an Enterprise environment. Such sets of "known-working-binary-whl"
files can then be separately committed, tracked and scrutinized
in an artifact repository of such an Enterprise.

Fixes: #11171

* Update docs/production-deployment.rst
2020-10-10 12:58:09 +02:00
Daniel Imberman 8640fb6c10
fix tests (#11368) 2020-10-09 16:56:56 -07:00
Michał Misiewicz b7404b079a
KubernetesPodOperator should retry log tailing in case of interruption (#11325)
* KubernetesPodOperator can retry log tailing in case of interruption

* fix failing test

* change read_pod_logs method formatting

* KubernetesPodOperator retry log tailing based on last read log timestamp

* fix test_parse_log_line test formatting

* add docstring to parse_log_line method

* fix kubernetes integration test
2020-10-09 15:59:47 -07:00
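The retry behaviour described in the KubernetesPodOperator commit above (#11325) can be illustrated with a minimal sketch. It is not the operator's actual code: `tail_with_retry`, the `read_logs` callable, its "return lines newer than this timestamp" contract, and the use of `ConnectionError` as the interruption signal are all assumptions made for illustration. Only the idea of parsing a timestamp from each log line and resuming from the last one read comes from the commit message.

```python
from typing import Callable, Iterator, List, Optional, Tuple


def parse_log_line(line: str) -> Tuple[Optional[str], str]:
    """Split a 'TIMESTAMP message' line; return (None, line) if there is no timestamp."""
    timestamp, separator, message = line.partition(" ")
    if not separator:
        return None, line
    return timestamp, message


def tail_with_retry(
    read_logs: Callable[[Optional[str]], Iterator[str]],
    max_retries: int = 3,
) -> List[str]:
    """Collect log messages, resuming after the last seen timestamp on interruption.

    read_logs(since) is a hypothetical callable that yields lines strictly
    newer than `since` (or all lines when `since` is None).
    """
    messages: List[str] = []
    last_timestamp: Optional[str] = None
    attempts = 0
    while True:
        try:
            for line in read_logs(last_timestamp):
                timestamp, message = parse_log_line(line)
                if timestamp is not None:
                    last_timestamp = timestamp
                messages.append(message)
            return messages
        except ConnectionError:
            # The stream broke: retry from the last timestamp that was read.
            attempts += 1
            if attempts > max_retries:
                raise
```

Resuming from the last timestamp, rather than restarting the stream, avoids re-emitting lines that were already read before the interruption.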
Jarek Potiuk 6fe020e105
Add tests for Custom cluster policy (#11381)
The custom ClusterPolicyViolation was added in #10282.
This one adds a more comprehensive test for it.

Co-authored-by: Jacob Ferriero <jferriero@google.com>
2020-10-10 00:57:10 +02:00
John Bampton 39fc961eec
Fix case of JavaScript. (#10957) 2020-10-10 00:50:31 +02:00
Daniel Imberman 3164025a7a
Fix airflow_local_settings.py showing up as directory (#10999)
Fixes a bug where the airflow_local_settings.py mounts as a volume
if there is no value (this causes k8sExecutor pods to fail)
2020-10-10 00:49:45 +02:00
Đặng Minh Dũng 298052fcee
[airflow/providers/cncf/kubernetes] correct hook methods name (#11008) 2020-10-10 00:48:47 +02:00
mucio 7b0a2f5d8e
Replaced basestring with str in the Exasol hook (#11360) 2020-10-10 00:44:59 +02:00
Jarek Potiuk d752575e78
Revert "Revert "Adds --install-wheels flag to breeze command line (#11317)" (#11348)" (#11356)
This reverts commit f67e6cb805.
2020-10-10 00:41:11 +02:00
Ash Berlin-Taylor 73b9163a8f
Fully support running more than one scheduler concurrently (#10956)
* Fully support running more than one scheduler concurrently.

This PR implements scheduler HA as proposed in AIP-15. The high level
design is as follows:

- Move all scheduling decisions into SchedulerJob (requiring DAG
  serialization in the scheduler)
- Use row-level locks to ensure schedulers don't stomp on each other
  (`SELECT ... FOR UPDATE`)
- Use `SKIP LOCKED` for better performance when multiple schedulers are
  running. (MySQL < 8 and MariaDB don't support this)
- Scheduling decisions are not tied to the parsing speed, but can
  operate just on the database

*DagFileProcessorProcess*:

Previously this component was responsible for more than just parsing the
DAG files as its name might imply. It also was responsible for creating
DagRuns, and also making scheduling decisions of TIs, sending them from
"None" to "scheduled" state.

This commit changes it so that the DagFileProcessorProcess now will
update the SerializedDAG row for this DAG, and make no scheduling
decisions itself.

To make the scheduler's job easier (so that it can make as many
decisions as possible without having to load the possibly-large
SerializedDAG row) we store/update some columns on the DagModel table:

- `next_dagrun`: The execution_date of the next dag run that should be created (or
  None)
- `next_dagrun_create_after`: The earliest point at which the next dag
  run can be created

Pre-computing these values (and updating them every time the DAG is
parsed) reduces the overall load on the DB as many decisions can be taken
by selecting just these two columns/the small DagModel row.

In case of max_active_runs, or `@once` these columns will be set to
null, meaning "don't create any dag runs"

*SchedulerJob*

The SchedulerJob used to only queue/send tasks to the executor after
they were parsed, and returned from the DagFileProcessorProcess.

This PR breaks the link between parsing and enqueuing of tasks. Instead
of looking at DAGs as they are parsed, we now:

- Store a new datetime column, `last_scheduling_decision`, on the DagRun
  table, signifying when a scheduler last examined a DagRun
- Each time around the loop the scheduler will get (and lock) the next
  _n_ DagRuns via `DagRun.next_dagruns_to_examine`, prioritising DagRuns
  which haven't been touched by a scheduler in the longest period
- SimpleTaskInstance etc have been almost entirely removed now, as we
  use the serialized versions

* Move callbacks execution from Scheduler loop to DagProcessorProcess

* Don’t run verify_integrity if the Serialized DAG hasn’t changed

dag_run.verify_integrity is slow, and we don't want to call it every time, just when the dag structure changes (which we can know now thanks to DAG Serialization)

* Add escape hatch to disable newly added "SELECT ... FOR UPDATE" queries

We are worried that these extra uses of row-level locking will cause
problems on MySQL 5.x (most likely deadlocks) so we are providing users
an "escape hatch" to be able to make these queries non-locking -- this
means that only a single scheduler should be run, but being able to run
one is better than having the scheduler crash.

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
2020-10-09 22:44:27 +01:00
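The locking scheme described in the scheduler HA commit above (#10956) can be sketched as a query builder. This is an illustration under assumptions, not Airflow's implementation: the function name, its parameters, and the `dag_run` table name are hypothetical. Only the `last_scheduling_decision` column, the `SELECT ... FOR UPDATE` and `SKIP LOCKED` clauses, and the "escape hatch" that disables locking come from the commit message.

```python
def next_dagruns_query(
    limit: int,
    use_row_level_locking: bool = True,
    supports_skip_locked: bool = True,
) -> str:
    """Build SQL that picks the DagRuns examined least recently.

    Rows never examined (NULL timestamp) sort first, then the oldest
    timestamps, so every run eventually gets a scheduler's attention.
    """
    sql = (
        "SELECT id FROM dag_run "
        "ORDER BY last_scheduling_decision IS NOT NULL, last_scheduling_decision "
        f"LIMIT {int(limit)}"
    )
    if not use_row_level_locking:
        # Escape hatch: no locks, so only one scheduler should be running.
        return sql
    # Lock the selected rows so two schedulers never act on the same DagRun.
    sql += " FOR UPDATE"
    if supports_skip_locked:
        # Skip rows another scheduler already holds instead of waiting for them.
        sql += " SKIP LOCKED"
    return sql
```

With `SKIP LOCKED`, a second scheduler moves on to the next unlocked rows rather than blocking, which is why the commit notes better performance where the database supports it.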
Jarek Potiuk e198077f3e
Add pypirc initialization (#11386)
This PR needs to be merged first in order to handle #11385,
which requires .pypirc to be created before the Dockerfile gets built.

This means that the script change needs to be merged to master
first in this PR.
2020-10-09 22:55:03 +02:00
Jarek Potiuk 29a145cd69
Add capability of adding service account annotations to Helm Chart (#11387)
We can now add annotations to the service accounts in a generic
way. This allows, for example, adding Workload Identity in a GKE
environment, but it is not limited to that.

Co-authored-by: Jacob Ferriero <jferriero@google.com>
Co-authored-by: Kamil Breguła <kamil.bregula@polidea.com>
2020-10-09 22:54:21 +02:00
Daniel Imberman 49aad025b5
Users can specify sub-secrets and paths k8spodop (#11369)
Allows users to specify items for specific key path projections
when using the airflow.kubernetes.secret.Secret class
2020-10-09 09:00:09 -07:00
Tomek Urbaszek eb5fea7b64
Replace nuke with useful information on error page (#11346)
This PR replaces the nuke ASCII art with text about reporting a bug.
As we are no longer using ASCII art, this PR removes it.
2020-10-09 16:27:39 +02:00
Kaxil Naik ff1a2aaff8
Set start_date, end_date & duration for tasks failing without DagRun (#11358) 2020-10-09 15:21:39 +01:00
Ash Berlin-Taylor fe0bf6e1f0
Reduce "start-up" time for tasks in CeleryExecutor (#11372)
This is similar to #11327, but for Celery this time.

The impact is not quite as pronounced here (for simple dags at least)
but takes the average queued to start delay from 1.5s to 0.4s
2020-10-09 13:18:32 +01:00
Satyasheel d2754ef769
Strict type check for Microsoft (#11359) 2020-10-09 10:31:53 +01:00
Tobiasz Kędzierski 8baf657fc2
Fix regression in DataflowTemplatedJobStartOperator (#11167) 2020-10-09 10:21:16 +02:00
Vijayant 422b61a9dd
Adding ElastiCache Hook for creating, describing and deleting replication groups (#8701) 2020-10-09 09:19:26 +01:00
Sumit Maheshwari 5605d1063b
Fix DagBag bug when a dag has invalid schedule_interval (#11344) 2020-10-09 13:29:41 +05:30
Kaxil Naik 7f674c685d
Use only-if-needed upgrade strategy for PRs (#11363)
Currently, upgrading dependencies in setup.py still runs with previous versions of the package for the PR, which fails.

This will change to upgrade only the package that is required for the PRs.
2020-10-09 09:57:51 +02:00
Kamil Breguła 7541c88eaf
Always use Airflow db in FAB (#11364) 2020-10-09 09:55:31 +02:00
Kaxil Naik 27e637fbe3
Bugfix: Error in SSHOperator when command is None (#11361)
closes https://github.com/apache/airflow/issues/10656
2020-10-09 08:35:39 +01:00
venkatesh selvaraj 11eb649d4a
Fix to make y-axis of Tries chart visible (#10071)
Co-authored-by: Venkatesh Selvaraj <venkateshselvaraj@pinterest.com>
2020-10-08 20:17:50 +01:00
Jarek Potiuk f5b7bbcb92
Better diagnostics when there are problems with Kerberos (#11353) 2020-10-08 21:08:11 +02:00
Jarek Potiuk 666e81ab4a
Bump cache version for kubernetes tests (#11355)
Seems that the k8s cache for virtualenv got broken during the
recent problems. This commit bumps the cache version to start
afresh.
2020-10-08 19:10:46 +02:00
Ash Berlin-Taylor 4839a5bc6e
Reduce "start-up" time for tasks in LocalExecutor (#11327)
Spawning a whole new python process and then re-loading all of Airflow
is expensive. Although this time fades to insignificance for long
running tasks, this delay gives a "bad" experience for new users when
they are just trying out Airflow for the first time.

For the LocalExecutor this cuts the "queued time" down from 1.5s to 0.1s
on average.
2020-10-08 17:37:51 +01:00
Kaxil Naik a1f888507f
Improve instructions to install Airflow Version (#11339)
The instructions can be replaced by `./breeze start-airflow` command
2020-10-08 17:19:31 +01:00
Kaxil Naik ba60836456
Fix command to run tmux with breeze in BREEZE.rst (#11340)
`breeze --start-airflow` -> `breeze start-airflow`
2020-10-08 08:47:56 -07:00
Ash Berlin-Taylor f67e6cb805
Revert "Adds --install-wheels flag to breeze command line (#11317)" (#11348)
This reverts commit de07d135ae.
2020-10-08 14:35:04 +01:00
Jarek Potiuk 9dc32a3d8a
Better message when Building Image fails or gets cancelled. (#11333) 2020-10-08 13:09:34 +02:00
Michał Słowikowski 832a7850f1
Add Azure Blob Storage to GCS transfer operator (#11321) 2020-10-08 12:16:50 +02:00
Kaxil Naik 625afa2af2
Improve Committer's guide docs (#11338) 2020-10-08 10:24:07 +01:00
Tomek Urbaszek 4d95d9c71b
Improve code quality of SLA mechanism in SchedulerJob (#11257) 2020-10-08 10:44:46 +02:00
Jarek Potiuk de07d135ae
Adds --install-wheels flag to breeze command line (#11317)
If this flag is specified it will look for wheel packages placed in dist
folder and it will install the wheels from there after installing
Airflow. This is useful for testing backport packages as well as in the
future for testing provider packages for 2.0.
2020-10-08 10:06:53 +02:00
Satyasheel 5d007fd2ff
Strict type check for azure hooks (#11342) 2020-10-08 09:36:35 +02:00
Kaxil Naik 2bac4810a4
Update link for Announcement Page (#11337) 2020-10-07 22:40:20 +01:00
Fai b4baa2b04b
Add environment variables documentation to cli-ref.rst. (#10970)
Co-authored-by: Fai Hegberg <faihegberg@Fais-MacBook-Pro.local>
2020-10-07 21:43:48 +01:00
Jarek Potiuk d404cb06dd
Moves Committer's guide to CONTRIBUTING.rst (#11314)
I decided to move it to CONTRIBUTING.rst as it is important
documentation on what policies we have agreed to as a community, and
it is also a great resource for contributors to learn what
the committer's responsibilities are.

Fixes: #10179
2020-10-07 21:14:55 +02:00
Jarek Potiuk fe59f26223
Pin versions of "untrusted" 3rd-party GitHub Actions (#11319)
According to https://docs.github.com/en/free-pro-team@latest/actions/learn-github-actions/security-hardening-for-github-actions#using-third-party-actionsa
it's best practice not to use tags in case of untrusted
3rd-party actions in order to avoid potential attacks.
2020-10-07 13:23:41 +02:00
Ash Berlin-Taylor d86cf37a35
Automatically upgrade old default navbar color (#11322)
As part of #11195 we re-styled the UI, changing a lot of the default
colours to make them look more modern. However for anyone upgrading and
keeping their airflow.cfg from 1.10 to 2.0 they would end up with things
looking a bit ugly, as the old navbar color would be kept.

This uses the existing config value upgrade feature to automatically
change the old default colour in to the new default colour.
2020-10-07 11:21:14 +01:00
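The automatic upgrade described in the navbar commit above (#11322) can be sketched as a lookup of old default values. This is only an illustration: the `DEPRECATED_VALUES` table, the `upgrade_value` function, and both colour codes are placeholders, not values or names taken from Airflow. The sketch shows the general idea that a value is rewritten only when it exactly matches the old default, so a colour the user chose deliberately is left alone.

```python
import re
from typing import Dict, Pattern, Tuple

# Placeholder colours for illustration; these are NOT Airflow's real defaults.
OLD_DEFAULT = "#111111"
NEW_DEFAULT = "#222222"

# (section, key) -> (pattern matching the old default, replacement value)
DEPRECATED_VALUES: Dict[Tuple[str, str], Tuple[Pattern[str], str]] = {
    ("webserver", "navbar_color"): (
        re.compile(r"\A" + re.escape(OLD_DEFAULT) + r"\Z", re.IGNORECASE),
        NEW_DEFAULT,
    ),
}


def upgrade_value(section: str, key: str, value: str) -> str:
    """Return the new default if `value` is exactly the old default, else `value`."""
    rule = DEPRECATED_VALUES.get((section, key))
    if rule is None:
        return value
    pattern, replacement = rule
    # The pattern is anchored at both ends, so custom values never match.
    return pattern.sub(replacement, value)
```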
FHoffmannCode b0fcf67559
Add AzureFileShareToGCSOperator (#10991) 2020-10-07 11:08:58 +02:00
Jarek Potiuk e2655f60b3
Prints nicer message in case of git push errors (#11320)
We started to get more often "unknown blob" kind of errors when
pushing the images to GitHub Registry. While this is clearly a
GitHub issue, its frequency of occurrence and unclear message
make it a good candidate to write an additional message with
instructions to the users, especially now that they have
an easy way to get to that information via status checks and
links leading to the log file, when this problem happens during
image building process.

This way users will know that they should simply rebase or
amend/force-push their change to fix it.
2020-10-07 10:30:16 +02:00
Tomek Urbaszek 47b05a87f0
Improve handling of job_id in BigQuery operators (#11287)
Make autogenerated job_id more unique by using microseconds and hash of configuration. Replace dots in job_id.
Closes: #11280
2020-10-07 10:08:08 +02:00
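The job_id scheme described in the BigQuery commit above (#11287) can be sketched as follows. This is a minimal illustration, not the operator's actual code: the function name, the `airflow_` prefix, the choice of MD5, and the exact layout of the identifier are assumptions. Only the three ingredients come from the commit message: microsecond precision, a hash of the configuration, and replacing dots.

```python
import hashlib
import json
from datetime import datetime
from typing import Any, Dict


def generate_job_id(
    dag_id: str,
    task_id: str,
    execution_date: datetime,
    configuration: Dict[str, Any],
) -> str:
    """Build a job id that differs per run time and per job configuration."""
    # Sorting keys makes the hash independent of dict ordering.
    config_json = json.dumps(configuration, sort_keys=True)
    config_hash = hashlib.md5(config_json.encode("utf-8")).hexdigest()
    # %f adds microseconds, so runs within the same second get distinct ids.
    exec_part = execution_date.strftime("%Y_%m_%dT%H_%M_%S_%f")
    job_id = f"airflow_{dag_id}_{task_id}_{exec_part}_{config_hash}"
    # Dots can appear in dag or task ids; replace them as the commit describes.
    return job_id.replace(".", "_")
```

Two tasks with the same ids and execution date but different queries then produce different job ids, because the configuration hash differs.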
Jarek Potiuk 18dcac8a01
Add remaining community guidelines to CONTRIBUTING.rst (#11312)
We are cleaning up the docs from CWiki and this is what's left of
community guidelines that were maintained there.

Fixes #10181
2020-10-07 05:33:47 +02:00
Jarek Potiuk 22c6a843d7
Adds --no-rbac-ui flag for Breeze airflow 1.10 installation (#11315)
When installing airflow 1.10 via breeze we now enable rbac
by default, but we can disable it with --no-rbac-ui flag.

This is useful to test different variants of 1.10 when testing
release candidates in connection with the 'start-airflow'
command.
2020-10-07 01:00:00 +01:00
Kishore Vancheeshwaran bbc3cea057
Move latest_only_operator.py to latest_only.py (#11178) (#11304) 2020-10-07 00:15:28 +01:00
Kaxil Naik 4af7804549
Bump tenacity to 6.2 (#11313) 2020-10-06 21:52:35 +01:00