Make sure you have checked _all_ steps below.
### JIRA
- [x] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-2538
- In case you are fixing a typo in the documentation, you can prepend your commit with \[AIRFLOW-XXX\]; code changes always need a JIRA issue.
### Description
- [x] Here are some details about my PR, including
screenshots of any UI changes:
Update the FAQ doc on how to reduce Airflow scheduler latency. This comes from our internal production setting, which also aligns with Maxime's email (https://lists.apache.org/thread.html/%3CCAHEEp7WFAivyMJZ0N+0Zd1T3nvfyCJRudL3XSRLM4utSigR3dQmail.gmail.com%3E).
### Tests
- [ ] My PR adds the following unit tests __OR__
does not need testing for this extremely good
reason:
### Commits
- [ ] My commits all reference JIRA issues in
their subject lines, and I have squashed multiple
commits if they address the same issue. In
addition, my commits follow the guidelines from
"[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not
"adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"
### Documentation
- [ ] In case of new functionality, my PR adds
documentation that describes how to use it.
- When adding new operators/hooks/sensors, the
autoclass documentation generation needs to be
added.
### Code Quality
- [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
Closes #3434 from feng-tao/update_faq
Limit the number of DAG runs shown in the drop-down. Add the base-date and number-of-runs widgets known from other views, which allow a form of paging through all DAG runs.
### JIRA
- [x] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-2517
- In case you are fixing a typo in the documentation, you can prepend your commit with \[AIRFLOW-XXX\]; code changes always need a JIRA issue.
### Description
- [x] Here are some details about my PR, including
screenshots of any UI changes:
In backfill, we can provide key-value pairs
through CLI and those pairs can be accessed
through macros. This is just like the way
`trigger_dag -c` works [1].
Let's walk through an example.
In the airflow CLI we specify a key-value pair.
```
airflow backfill hello_world -s 2018-02-01 -e 2018-02-08 -c '{"text": "some text"}'
```
In the DAG file, I have a `BashOperator` that contains a templated command, and I want `{{ dag_run.conf.text }}` to resolve to the text I passed in the CLI.
```python
templated_command = """
echo "ds = {{ ds }}"
echo "prev_ds = {{ macros.datetime.strftime(prev_execution_date, "%Y-%m-%d") }}"
echo "next_ds = {{ macros.datetime.strftime(next_execution_date, "%Y-%m-%d") }}"
echo "text_through_conf = {{ dag_run.conf.text }}"
"""
bash_operator = BashOperator(
    task_id='bash_task',
    bash_command=templated_command,
    dag=dag,
)
```
Rendered Bash command in the Airflow UI:
<img width="1246" alt="screen shot 2018-05-22 at 4 33 59 pm" src="https://user-images.githubusercontent.com/6065051/40395666-04c41574-5dde-11e8-9ec2-c0312b7203e6.png">
[1] https://airflow.apache.org/cli.html#trigger_dag
### Tests
- [x] My PR adds the following unit tests __OR__
does not need testing for this extremely good
reason:
### Commits
- [x] My commits all reference JIRA issues in
their subject lines, and I have squashed multiple
commits if they address the same issue. In
addition, my commits follow the guidelines from
"[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not
"adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"
### Documentation
- [x] In case of new functionality, my PR adds
documentation that describes how to use it.
- When adding new operators/hooks/sensors, the
autoclass documentation generation needs to be
added.
### Code Quality
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
Closes #3406 from milton0825/backfill-support-conf
When using a CeleryExecutor with SQLAlchemy specified in `broker_url`, such as:
broker_url = sqla+mysql://airflow:airflow@localhost:3306/airflow
do not pass invalid options to the SQLAlchemy backend.
- In default_airflow.cfg, comment out visibility_timeout in [celery_broker_transport_options]. The user can specify the correct values in this section for the Celery broker transport that they choose; visibility_timeout is only valid for the Redis and SQS Celery brokers.
- Move the SSL options out of [celery_broker_transport_options], where they were wrongly placed, into the [celery] section where they belong.
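A minimal sketch of what the relevant sections might look like after this change (only `broker_url` and `visibility_timeout` come from the description above; the comment placement and timeout value are illustrative assumptions):

```
[celery]
broker_url = sqla+mysql://airflow:airflow@localhost:3306/airflow
# SSL options now live here in the [celery] section, not in the
# broker transport options.

[celery_broker_transport_options]
# Commented out by default; only set visibility_timeout when using
# the Redis or SQS brokers, where it is a valid option.
# visibility_timeout = 21600
```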
Closes #3417 from rodrigc/AIRFLOW-2519
Add docs to faq.rst explaining how to deal with "Exception: Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql".
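For reference, the MySQL global the exception refers to can be enabled in the server configuration (this is a standard MySQL setting, independent of this PR):

```
[mysqld]
explicit_defaults_for_timestamp = 1
```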
Closes #3429 from milton0825/fix-docs
[AIRFLOW-2521] Backfill: make variable names and logging messages more accurate
The term "kicked_off" in logging and the variable "started" are used to refer to `running` task instances. Let's clarify the variable names and messages here.
Also fixes unit tests.
Closes #3416 from mistercrunch/kicked_off_running
For now, PostgresHook.copy_expert supports "COPY TO" but not "COPY FROM", because it opens the file in write mode and doesn't commit the operation. This PR fixes that by opening the file in read/write mode and committing the operation at the end.
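The idea can be sketched roughly as follows (a hypothetical simplification, not the actual hook code): open the local file in a mode usable for both directions, and commit when done.

```python
def copy_expert(conn, sql, filename):
    # Hypothetical sketch: 'a+' creates the file if it is missing without
    # truncating an existing one, and supports both reading (COPY FROM)
    # and writing (COPY TO).
    with open(filename, 'a+') as f:
        f.seek(0)
        cur = conn.cursor()
        cur.copy_expert(sql, f)
        # Commit so that a COPY FROM actually persists its rows.
        conn.commit()
```

Here `conn` stands in for any DB-API connection whose cursor exposes psycopg2's `copy_expert`.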
I'd like to have how-to guides for all connection types, or at least for the different categories of connection types. I found it difficult to figure out how to manage a GCP connection, so this commit adds a how-to guide for it.
Also, since creating and editing connections really aren't all that different, the PR renames the "creating connections" how-to to "managing connections".
Closes #3419 from tswast/howto
I used backfill recently and it logged an overwhelming number of messages telling me, at every tick, which tasks were not ready to run. These messages are not useful and should be muted by default.
I understand that this may be helpful in the context of `airflow run` when dependencies aren't met, so I decided to manage a flag instead of simply demoting the messages to `logging.debug`.
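A hypothetical sketch of the flag-based approach described above (the function and flag names are illustrative, not the actual Airflow code):

```python
import logging

def report_not_ready(logger, task_id, verbose=False):
    # Keep the per-tick "not ready" chatter at DEBUG by default;
    # callers such as `airflow run` can opt back in with verbose=True.
    level = logging.INFO if verbose else logging.DEBUG
    logger.log(level, "Task %s is not ready to run yet", task_id)
    return level
```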
Closes #3414 from mistercrunch/backfill_less_verbose
Update imports for Docker's new API version (>= 2.0.0).
Change the dependency from the docker-py package to docker, and update the test cases to match the new Docker client class.
Closes #3407 from Noremac201/fixer
Currently, if you have an operator with a `template_fields` argument that is a dictionary, e.g.:
template_fields = ([dict_args])
and you populate that dictionary with a field that is an integer in a DAG, e.g.:
...
dict_args = {'ds': '{{ ds }}', 'num_times': 5}
...
then Airflow will give you the following error:
{base_task_runner.py:95} INFO - Subtask: airflow.exceptions.AirflowException: Type '<type 'int'>' used for parameter 'dict_args[num_times]' is not supported for templating
This fix resolves that issue by returning numbers immediately without attempting to template them.
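The idea behind the fix can be sketched like this (illustrative only; `str.format` stands in for Airflow's Jinja rendering):

```python
def render(value, context):
    # Strings are templated; dicts are rendered recursively; numbers
    # and other non-string values are returned as-is instead of
    # raising an "unsupported type for templating" error.
    if isinstance(value, str):
        return value.format(**context)
    if isinstance(value, dict):
        return {k: render(v, context) for k, v in value.items()}
    return value
```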
Closes #3410 from ArgentFalcon/support_numeric_template_fields
Implement MySqlHook.bulk_dump, since the opposite operation, bulk_load, is already implemented.
This PR also addresses some flake8 warnings.
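For context, MySQL's native dump counterpart to `LOAD DATA INFILE` looks roughly like this (a generic MySQL statement with made-up names, not necessarily the exact SQL the hook emits):

```
SELECT * INTO OUTFILE '/tmp/my_table.tsv'
FROM my_table;
```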
Closes #3385 from sekikn/AIRFLOW-2472
- The SFTP sensor uses the SFTP hook and passes `sftp_conn_id` to an `sftp_conn_id` parameter which doesn't exist. The solution is to remove the parameter name, so the value defaults to the first parameter, which in this case is `ftp_conn_id`.
Closes #3392 from kaxil/AIRFLOW-2498
- Changed triple single-quote docstrings to double-quote characters to be consistent with the docstring convention in PEP 257.
Closes #3396 from kaxil/AIRFLOW-2502
Without the devel extra, the docs do not build. The build fails due to the missing mock package.
Closes #3395 from tswast/airflow-2501-docs-contributing
This PR fixes HiveCliHook.load_df to pass load_file the parameters `create` and `recreate`, which are currently ignored, as part of kwargs.
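A hypothetical sketch of the change (simplified stand-ins, not the real hook methods): forward the keyword arguments instead of dropping them.

```python
def load_file(filepath, table, create=True, recreate=False):
    # Stand-in for HiveCliHook.load_file; just records what it received.
    return {"filepath": filepath, "table": table,
            "create": create, "recreate": recreate}

def load_df(df, table, **kwargs):
    filepath = "/tmp/df.csv"  # hypothetical temp file written from df
    # The fix: pass create/recreate (and anything else) through to load_file.
    return load_file(filepath, table, **kwargs)
```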
Closes #3390 from sekikn/AIRFLOW-2471
HiveCliHook.load_df currently cannot handle a DataFrame that contains datetime values.
This PR enhances it to work with datetime, fixes a bug introduced by AIRFLOW-2441, and addresses some flake8 issues.
Closes #3364 from sekikn/AIRFLOW-2448
KubernetesPodOperator now accepts a dict-type parameter called "affinity", which represents a group of affinity scheduling rules (nodeAffinity, podAffinity, podAntiAffinity).
API reference: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#affinity-v1-core
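An illustrative value for the new parameter, following the Kubernetes Affinity API shape linked above (the node label key and value are made up for this example):

```python
affinity = {
    'nodeAffinity': {
        'requiredDuringSchedulingIgnoredDuringExecution': {
            'nodeSelectorTerms': [{
                'matchExpressions': [{
                    'key': 'kubernetes.io/hostname',  # hypothetical label
                    'operator': 'In',
                    'values': ['node-1'],
                }],
            }],
        },
    },
}
# Would be passed as KubernetesPodOperator(..., affinity=affinity)
```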
Closes #3369 from imroc/AIRFLOW-2397