Add debug logging around number of queued files to
process in the
scheduler. This makes it easy to see when there
are bottlenecks due to parallelism and how long it
takes for all files to be processed.
Closes #2968 from aoen/ddavydov--add_more_scheduler_metrics
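A minimal sketch of the kind of debug logging described above, not the scheduler's actual code. The function name `process_queue`, the logger name, and the `process_file` callback are hypothetical and chosen only for illustration.

```python
import logging
import time

log = logging.getLogger("scheduler_sketch")  # hypothetical logger name


def process_queue(file_paths, process_file):
    """Process files one by one, logging queue depth and total duration."""
    queue = list(file_paths)
    start = time.monotonic()
    while queue:
        # The queue depth shows whether work is piling up behind
        # the available parallelism.
        log.debug("%d file(s) queued for processing", len(queue))
        process_file(queue.pop(0))
    elapsed = time.monotonic() - start
    log.debug("Processed all files in %.2f seconds", elapsed)
    return elapsed
```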
We capture the standard output and error streams
so that they're handled
by the configured logger. However, sometimes, when
developing dags or
Airflow code itself, it is useful to put pdb
breakpoints in code
triggered using an `airflow run`. Such a flow
would of course require
not redirecting the output and error streams to
the logger.
This patch enables that by adding a flag to the
`airflow run`
subcommand. Note that this does not require
`--local`.
Closes #2957 from yati-sagade/ysagade/airflow-2015
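The idea can be sketched as follows. This is an illustration under assumptions, not the patch itself: `run_task`, the `interactive` parameter (a stand-in for the new flag, whose name is not given here), and the logger name are all hypothetical.

```python
import contextlib
import io
import logging

log = logging.getLogger("task_run_sketch")  # hypothetical logger name


def run_task(task_callable, interactive=False):
    """Run task_callable, sending its output to the logger unless interactive."""
    if interactive:
        # Leave stdout/stderr attached to the terminal so that a pdb
        # breakpoint inside the task can read and write normally.
        return task_callable()

    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        result = task_callable()
    # Forward everything that was captured to the configured logger.
    for line in out.getvalue().splitlines():
        log.info(line)
    for line in err.getvalue().splitlines():
        log.warning(line)
    return result
```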
sla_miss and task_instances cannot have NULL
execution_dates. The timezone
migration scripts forgot to set this properly. In
addition, to make sure
MySQL does not set "ON UPDATE CURRENT_TIMESTAMP"
or MariaDB "DEFAULT
0000-00-00 00:00:00", we now check if
explicit_defaults_for_timestamp is turned
on and otherwise fail the database upgrade.
Closes #2969, #2857
Closes #2979 from bolkedebruin/AIRFLOW-1895
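A rough sketch of such a check, assuming a DB-API cursor connected to MySQL or MariaDB. The function name and the error message are hypothetical; the migration's real code may differ.

```python
def check_explicit_defaults_for_timestamp(cursor):
    """Raise if explicit_defaults_for_timestamp is not turned on.

    `cursor` is assumed to be a DB-API cursor for a MySQL/MariaDB connection.
    """
    cursor.execute("SELECT @@explicit_defaults_for_timestamp")
    row = cursor.fetchone()
    if not row or int(row[0]) != 1:
        # Abort the upgrade rather than let the server add implicit
        # TIMESTAMP defaults to the migrated columns.
        raise RuntimeError(
            "explicit_defaults_for_timestamp must be turned on "
            "before upgrading the database"
        )
```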
Add ability to create a BigQuery External Table.
- Add new method create_external_table() in
BigQueryHook()
- Add parameters to existing
GoogleCloudStorageToBigQueryOperator()
Closes #2948 from kaxil/external_table
Allows a default BigQuery dialect to be specified
at the hook level, which is threaded through to
the
underlying cursors.
This allows standard SQL dialect to be used,
while maintaining compatibility with the
`DbApiHook` interface.
Addresses AIRFLOW-1267 and AIRFLOW-1874
Closes #2964 from ji-han/master
Make sure you have checked _all_ steps below.
### JIRA
- [x] My PR addresses the following [Airflow 2017](https://issues.apache.org/jira/browse/AIRFLOW-2017/) issues and references them in the PR title.
For example, "[AIRFLOW-2017] My Airflow PR"
- https://issues.apache.org/jira/browse/AIRFLOW-2017
### Description
- [x] Here are some details about my PR, including
screenshots of any UI changes:
Currently we're not getting the output logs from the
postgres operator that you would get if
you ran a psql command. That information is held in
an attribute of the postgres conn called
[notices](http://initd.org/psycopg/docs/connection.html#connection.notices),
which is never printed.
We just need to print the contents of this attribute
to get that output in the airflow logs, which makes
debugging easier, among other things.
I've included some images for before and after
pictures.
**BEFORE**
<img width="1146" alt="screen shot 2018-01-19 at 4 46 59 pm" src="https://user-images.githubusercontent.com/10408007/35178405-6f6a1da8-fd3d-11e7-8f50-0dbd567d8ab4.png">
**AFTER**
<img width="1147" alt="screen shot 2018-01-19 at 4 46 25 pm" src="https://user-images.githubusercontent.com/10408007/35178406-74ea4ae6-fd3d-11e7-9551-631eac6bfe7b.png">
### Tests
- [x] My PR adds the following unit tests __OR__
does not need testing for this extremely good
reason:
There is nothing to test: the only change to the
current implementation is the
addition of logging.
### Commits
- [x] My commits all reference JIRA issues in
their subject lines, and I have squashed multiple
commits if they address the same issue. In
addition, my commits follow the guidelines from
"[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not
"adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
Closes #2959 from Acehaidrey/AIRFLOW-2017
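A minimal sketch of the logging step described in this PR. It relies only on the psycopg2 connection attribute `notices`, a list of message strings; the helper name `log_notices` and the logger name are hypothetical.

```python
import logging

log = logging.getLogger("postgres_notices_sketch")  # hypothetical logger name


def log_notices(conn, logger=log):
    """Write every server notice collected on `conn` to `logger`.

    `conn` is assumed to expose a psycopg2-style `notices` list of strings.
    """
    for notice in conn.notices:
        # Notices usually end with a newline; strip it for cleaner log lines.
        logger.info(notice.rstrip())
```

Calling such a helper after the SQL has been executed would put the same messages that psql prints into the task log.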
Moving the sensors to separate files increases
readability of the
code. It also reduces the amount of code in the big
core.py file.
Closes #2875 from Fokko/AIRFLOW-1889-move-sensors-to-separate-package
Changes the `task_ids` parameter of xcom_pull from
required to optional.
This parameter has always allowed None to be
passed, but because it was a
required parameter, None had to be passed explicitly.
With this change, we're no longer forced to pass
it.
Closes #2902 from bcb/make-xcom-pull-task-ids-optional
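The effect of the change can be illustrated with a simplified stand-in. These functions are not the real `xcom_pull` method; the in-memory `XCOMS` list, the function names, and the filtering logic are assumptions made for the example.

```python
# Hypothetical in-memory stand-in for stored XCom rows.
XCOMS = [
    {"task_id": "extract", "key": "return_value", "value": 1},
    {"task_id": "load", "key": "return_value", "value": 2},
]


def xcom_pull_before(task_ids, key="return_value"):
    """Old shape: task_ids is required, even when the caller wants None."""
    return [
        x["value"]
        for x in XCOMS
        if x["key"] == key and (task_ids is None or x["task_id"] in task_ids)
    ]


def xcom_pull_after(task_ids=None, key="return_value"):
    """New shape: task_ids defaults to None, so it can be omitted."""
    return xcom_pull_before(task_ids, key=key)
```

With the old shape, a caller who does not want to filter by task has to write `xcom_pull_before(None)`; with the new one, `xcom_pull_after()` is enough.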
This enables Airflow and Celery Flower to live
below root. Draws on the work of Geatan Semet
(@Stibbons).
This closes #2723 and closes #2818
Closes #2952 from bolkedebruin/AIRFLOW-1755
Improved task generation performance significantly
by using sets of
task_ids and dag_ids instead of lists when
calculating total priority
weight.
Closes #2941 from wongwill86/performance-latest
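A sketch of why sets help here, not the code from this commit. The helper names, the `edges` mapping of task_id to direct downstream task_ids, and the `weights` mapping are hypothetical.

```python
def downstream_ids(task_id, edges):
    """Collect all transitive downstream task_ids of task_id as a set."""
    seen = set()
    stack = list(edges.get(task_id, ()))
    while stack:
        current = stack.pop()
        # Set membership is O(1) on average; the same test against a
        # list scans the whole list on every lookup.
        if current in seen:
            continue
        seen.add(current)
        stack.extend(edges.get(current, ()))
    return seen


def priority_weight_total(task_id, edges, weights):
    """Own weight plus the weight of every distinct downstream task."""
    return weights[task_id] + sum(
        weights[t] for t in downstream_ids(task_id, edges)
    )
```

Using a set also means that a task reachable through several paths is visited and counted only once.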
Clarify and upgrade HiveOperator. Document that
the hql parameter can take the path of a hive
script, relative to the dag file and
templated or not. Add
ability to template hiveconf variables. Add a
default value for the map reduce job name and
add the updated hiveconf var for queue.
Closes #2752 from wolfier/AIRFLOW-1770
On the Task Instances page, TIs in the 'Scheduled'
state are not visible against the white
background. Change the background color to tan for
better UX.
Closes #2933 from wolfier/AIRFLOW-1994
Collaborator authorship was lost when splitting up a PR. This commit adds back the code that was removed in the previous commit in order to restore authorship.