d.dag_id is not a valid attribute. In order to use the dag_id variable
in a closure callback, it needs to be passed in through a function so the
right value can be captured for each iteration of the for loop.
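A minimal sketch of the pattern, with illustrative names rather than the
actual code from this change: the callback is built by a factory function,
so each iteration captures its own dag_id.
```
def make_callback(dag_id):
    def callback(context):
        print(f"DAG {dag_id} failed")
    return callback

callbacks = {}
for dag_id in ["example_a", "example_b"]:
    # Without the factory, every callback would see the loop's final dag_id.
    callbacks[dag_id] = make_callback(dag_id)

callbacks["example_a"]({})  # prints "DAG example_a failed"
```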
When you build from scratch and some transient requirements
fail, the initial installation step might fail.
We now use the latest valid constraints from the DEFAULT_BRANCH
branch to avoid this.
After preparing the 2020.5.19 release candidate and
reviewing the packages, some changes turned out to be necessary.
Therefore the date was changed to 2020.5.20 with the following
fixes:
* cncf.kubernetes.example_dags were hard-coded and added for all
packages; they have been removed
* Version suffix is only used to rename the binary packages, not for
the version itself
* Release process description is updated with the release process
* Package version is consistent - leading 0s are skipped in month
and day
* add feature for skipping writing to file
* add missing SalesforceHook method to return a DataFrame only
The write_object_to_file function is split: object_to_df returns a DataFrame, and write_object_to_file then uses object_to_df as its first step before exporting to a file (see the sketch after this list)
* fixed exception message
* fix review comments - removed filename check for None
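A rough sketch of the split described above; the class name and signatures
are assumptions for illustration, not the exact SalesforceHook API.
```
import pandas as pd

class SalesforceHookSketch:
    def object_to_df(self, query_results):
        # Step 1: convert raw query results into a DataFrame and return it.
        return pd.DataFrame(query_results)

    def write_object_to_file(self, query_results, filename=None):
        # Step 2: reuse object_to_df first, then optionally export to a file.
        df = self.object_to_df(query_results)
        if filename is not None:
            df.to_csv(filename, index=False)
        return df
```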
* Monitor k8sPodOperator pods by labels
To prevent situations where the scheduler starts a
second k8sPodOperator pod after a restart, we now check
for existing pods using Kubernetes labels (a sketch of the idea
follows after this commit's notes).
* Update airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
* Update airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
* add docs
* Update airflow/kubernetes/pod_launcher.py
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Daniel Imberman <daniel@astronomer.io>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
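A hedged sketch of the label-based check, not the operator's actual
implementation; the label keys and function name are illustrative.
```
from kubernetes import client, config

def find_existing_pod(namespace, dag_id, task_id, execution_date):
    # Load credentials from inside the cluster (use load_kube_config() locally).
    config.load_incluster_config()
    v1 = client.CoreV1Api()
    # If a pod with matching labels already exists, monitor it instead of
    # launching a second one after a scheduler restart.
    selector = f"dag_id={dag_id},task_id={task_id},execution_date={execution_date}"
    pods = v1.list_namespaced_pod(namespace, label_selector=selector)
    return pods.items[0] if pods.items else None
```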
* use preferred boolean check idiom
Co-Authored-By: Jarek Potiuk <jarek@potiuk.com>
* add test coverage for AirflowFailException
* add docs for some exception usage patterns (a short illustration follows after this commit's notes)
* autoformatting
* remove extraneous newline, poke travis build
* clean up TaskInstance.handle_failure
Try to reduce nesting and repetition of logic for different conditions.
Also try to tighten the scope of the exception handling ... it looks
like the large block that catches an Exception and logs it as a failure
to send an email may have been swallowing some TypeErrors coming out
of composing a log info message and calling strftime on
start_date and end_date when they're set to None; this is why I've added
lines in the test to set those values on the TaskInstance objects.
* let sphinx generate docs for exceptions module
* keep session kwarg last in handle_failure
* explain allowed_top_level
* add black-box tests for retry/fail immediately cases
* don't lose safety measures in logging date attrs
* fix flake8 too few blank lines
* grammar nitpick
* add import to AirflowFailException example
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
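A brief illustration of the usage pattern these docs and tests cover; the
task function is made up for the example. Raising AirflowFailException fails
the task immediately without going through the remaining retries.
```
from airflow.exceptions import AirflowFailException

def check_credentials(**context):
    credentials_valid = False  # assume this was looked up somewhere
    if not credentials_valid:
        # Retrying won't help here, so fail the task outright.
        raise AirflowFailException("Invalid credentials; retries will not help")
```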
By default GitHub Actions checks out only the latest commit, but in order to
see if there are any changes since the last README was generated
we need to see the whole history, so we fetch it all.
We also skip generating a new README when there is only one
commit in the history since the last release. The nature of README
generation is that the commit containing the README itself will never
be in the list of commits for the previous release, so there is
always at least one commit more than those listed in the README.
Moto's mock_lambda _actually runs the code_ in a docker container. This
is useful if you are testing a Lambda function but is massively overkill
for testing that we make a request to a function -- Airflow doesn't care
what the function does.
This is our slowest individual test in CI right now, taking 20s on
Github Actions.
In debugging another test I noticed that the scheduler was spending a
long time waiting for a "simple" dag to be parsed. Upon closer
inspection the parsing process itself was done in a few milliseconds;
we just weren't harvesting the results in a timely fashion.
This change uses the `sentinel` attribute of multiprocessing.Connection
(added in Python 3.3) to be able to wait for all the processes, so that
as soon as one has finished we get woken up and can immediately harvest
and pass on the parsed dags.
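A minimal sketch of the general technique, waiting on the parent ends of the
pipes with multiprocessing.connection.wait (illustrative only; the real
harvesting loop does more bookkeeping).
```
import multiprocessing
from multiprocessing.connection import wait

def parse(path, conn):
    # Stand-in for parsing a dag file and sending the result back.
    conn.send(f"parsed {path}")
    conn.close()

if __name__ == "__main__":
    pending = {}
    for path in ["a.py", "b.py", "c.py"]:
        parent_conn, child_conn = multiprocessing.Pipe(duplex=False)
        proc = multiprocessing.Process(target=parse, args=(path, child_conn))
        proc.start()
        pending[parent_conn] = proc

    # Block until at least one child has a result, instead of polling on a
    # fixed timer; harvest each result as soon as it is ready.
    while pending:
        for conn in wait(list(pending)):
            proc = pending.pop(conn)
            print(conn.recv())
            proc.join()
```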
This makes test_scheduler_job.py about twice as quick, and also reduces
the time the scheduler spends between tasks.
In real workloads, or where there are lots of dags, this likely won't
equate to such a huge speed up, but it does for our (synthetic) elastic
performance test dag.
These were the timings for the dag to run all the tasks in a single dag
run to completion, with PERF_SCHEDULE_INTERVAL='1d' and PERF_DAGS_COUNT=1
in each case:
PERF_SHAPE=linear PERF_TASKS_COUNT=12:
**Before**: 45.4166s
**After**: 16.9499s
PERF_SHAPE=linear PERF_TASKS_COUNT=24:
**Before**: 82.6426s
**After**: 34.0672s
PERF_SHAPE=binary_tree PERF_TASKS_COUNT=24:
**Before**: 20.3802s
**After**: 9.1400s
PERF_SHAPE=grid PERF_TASKS_COUNT=24:
**Before**: 27.4735s
**After**: 11.5607s
If you have many more dag **files**, this likely won't be your bottleneck.
We now have a mechanism to keep release notes updated for the
backport operators in an automated way.
It really nicely generates all the necessary information:
* summary of requirements for each backport package
* list of dependencies (including extras to install them) when package
depends on other providers packages
* table of new hooks/operators/sensors/protocols/secrets
* table of moved hooks/operators/sensors/protocols/secrets with
information where they were moved from
* changelog of all the changes to the provider package (this will be
automatically updated with an incremental changelog whenever we decide to
release separate packages)
The system is fully automated - we will be able to produce release notes
automatically (per package) whenever we decide to release a new version of
a package in the future.
This was causing it to be picked up as a `<dl>/<dd>` containing a list,
instead of a paragraph and a list.
```
<dl class="simple">
<dt>This will create a hook, and an operator accessible at:</dt>
<dd>
<ul class="simple">
<li><p><code><span class="pre">airflow.hooks.my_namespace.MyHook</span></code></p></li>
<li><p><code><span class="pre">airflow.operators.my_namespace.MyOperator</span></code></p></li>
</ul>
</dd>
</dl>
```
* Set conf vals as env vars so the spawned process can access the values.
* Create a custom env_vars context manager to control simple environment variables (sketched after this list).
* Use env_vars instead of conf_vars when using .
* When creating temporary environment variables, remove them afterwards if they did not exist before.
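A rough sketch of what such a context manager might look like; this is an
assumption for illustration, not the exact helper added to the test utilities.
```
import os
from contextlib import contextmanager

@contextmanager
def env_vars(overrides):
    missing = object()  # sentinel for "variable was not set before"
    original = {key: os.environ.get(key, missing) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, old in original.items():
            if old is missing:
                # The variable didn't exist before, so remove it again.
                os.environ.pop(key, None)
            else:
                os.environ[key] = old

# Real environment variables are inherited by any spawned process.
with env_vars({"AIRFLOW__CORE__DAGS_FOLDER": "/tmp/dags"}):
    ...
```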
Some of our tests (when I was looking at another change) were using the
ProcessorAgent to run and test the behaviour of our ProcessorManager in
certain cases. Having that extra process in the middle is not critical
for the tests, and makes it harder to debug the problem if
something breaks.
To make this possible I have made a small refactor to the loop of
DagFileProcessorManager (to give us a method we can call in tests that
doesn't do `os.setsid`).
I would like to create and use a pytest fixture as a parameter, but
fixtures cannot be used in unittest.TestCase methods:
> unittest.TestCase methods cannot directly receive fixture arguments as
> implementing that is likely to inflict on the ability to run general
> unittest.TestCase test suites.
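A common workaround, sketched here as an assumption about the wiring: an
autouse fixture still runs around unittest.TestCase tests and can attach
values via request.instance, even though the test methods themselves cannot
take fixture parameters.
```
import unittest
import pytest

@pytest.fixture(autouse=True)
def attach_tmp_path(request, tmp_path):
    # For unittest-style tests, request.instance is the TestCase instance.
    if request.instance is not None:
        request.instance.tmp_path = tmp_path

class MyTest(unittest.TestCase):
    def test_uses_fixture_value(self):
        # The attribute was set by the autouse fixture above.
        self.assertTrue(self.tmp_path.exists())
```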
When preparing backport releases I found that rabbitmq was not
included in the "devel_ci" extras. It turned out that librabbitmq was
not installing on Python 3.7, and the reason turned out to be
that librabbitmq has not been maintained for two years already and
has been replaced by the py-amqp library. The Python py-amqp
library has been improved using Cython compilation, so it
became production ready and librabbitmq has been abandoned.
We are switching to the py-amqp library here and adding
rabbitmq back to the "devel_ci" dependencies.
Details in: https://github.com/celery/librabbitmq/issues/153
* Refactor BigQuery check operators
This commit applies some code formatting to the existing BigQuery
check operators. It also adds a location parameter to
BigQueryIntervalCheckOperator and BigQueryValueCheckOperator.
* fixup! Refactor BigQuery check operators
* Access function to be pickled as attribute, not method, to avoid error.
* Access type attribute to allow pickling.
* Use getattr instead of type(self) to fix linting error.
Previously, tasks in the SUCCESS or SKIPPED state satisfied the
depends_on_past check, but only tasks in the SUCCESS state
satisfied the wait_for_downstream check. This inconsistency in behavior
made the API less intuitive to users.
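For reference, a minimal DAG showing where the two flags are set; this
example is illustrative only and does not come from the change itself.
```
from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

with DAG("flag_example", start_date=datetime(2020, 1, 1),
         schedule_interval="@daily") as dag:
    task = DummyOperator(
        task_id="example",
        depends_on_past=True,      # wait on this task's previous dag run
        wait_for_downstream=True,  # also wait on its downstream tasks there
    )
```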
I wanted an option to run a specific number of dag runs to completion,
so this feature lets me control the end_date of the dag without having
to know exactly what value it would have.
The "is string" check is more simplistic than it needs to be, but it's
Good Enough for now for an optional feature.