In PR #2532 (AIRFLOW-1520), the AWS credential code
was refactored into a general AWS hook. When that
change was made, the existing assume-role code was
removed, leaving only ID/secret credentials as an
option. Our DAGs rely on role assumption to access
external S3 buckets, so this change re-adds role
assumption via STS.
Additionally, to make this easier, I changed
_get_credentials to return a working boto3 session,
which the public methods use to initialize clients
and resources. This seemed a better route than
adding another return value to an already long list.
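A minimal sketch of the resulting flow, assuming the role ARN comes from the connection extras (names here are illustrative, not the exact hook code):

```python
import boto3

def _get_credentials(role_arn=None, region_name=None):
    """Return a ready-to-use boto3 session, assuming a role when given."""
    if role_arn:
        # Trade the base credentials for temporary STS credentials.
        sts_client = boto3.client('sts')
        response = sts_client.assume_role(
            RoleArn=role_arn,
            RoleSessionName='airflow_aws_hook',
        )
        credentials = response['Credentials']
        return boto3.session.Session(
            aws_access_key_id=credentials['AccessKeyId'],
            aws_secret_access_key=credentials['SecretAccessKey'],
            aws_session_token=credentials['SessionToken'],
            region_name=region_name,
        )
    # No role configured: fall back to the default credential chain.
    return boto3.session.Session(region_name=region_name)
```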
Closes#2918 from CannibalVox/aws_hook_support_sts
Because Popen returns bytes rather than str in
Python 3, we need to join its output using bytes.
This simple fix joins with a bytes separator instead
of a str one. Python 2.7 is unaffected.
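A minimal illustration of the difference (not the actual kinit invocation):

```python
import subprocess

proc = subprocess.Popen(['klist'], stdout=subprocess.PIPE)
lines = proc.stdout.readlines()  # on Python 3 these are bytes, not str

joined = b'\n'.join(lines)  # works on both Python 2.7 and 3
# '\n'.join(lines) would raise TypeError on Python 3:
# sequence item 0: expected str instance, bytes found
```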
Closes#2988 from tobes/kerberos-python3
- Add missing operator in `code.rst` and
`integration.rst`
- Fix documentation in DataProc operator
- Minor doc fix in GCS operators
- Fix code blocks & links in docstrings for
BigQuery, DataProc, DataFlow, MLEngine, GCS hooks
& operators
Closes#3003 from kaxil/doc_update
Modified the condition to check whether
`quote_character` is set (not None) rather than
truthy. This allows setting `quote_character` to an
empty string when the data doesn't contain quoted
sections.
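The gist of the change (the configuration key shown here is illustrative):

```python
# Before: a truthiness test, so an empty string was silently dropped.
if quote_character:
    configuration['load']['quote'] = quote_character

# After: only skip the option when it was not set at all.
if quote_character is not None:
    configuration['load']['quote'] = quote_character
```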
Closes#2996 from kaxil/bq_hook_quote_fix
Add debug logging around the number of queued files
to process in the
scheduler. This makes it easy to see when there
are bottlenecks due to parallelism and how long it
takes for all files to be processed.
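Roughly, at the debug level (names here are stand-ins for the scheduler's internals):

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("scheduler")

file_path_queue = ["/dags/a.py", "/dags/b.py"]  # stand-in for the real queue
log.debug("%s file paths queued for processing", len(file_path_queue))
```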
Closes#2968 from aoen/ddavydov--add_more_scheduler_metrics
We capture the standard output and error streams
so that they're handled
by the configured logger. However, sometimes, when
developing DAGs or
Airflow code itself, it is useful to put pdb
breakpoints in code
triggered using an `airflow run`. Such a flow
would of course require
not redirecting the output and error streams to
the logger.
This patch enables that by adding a flag to the
`airflow run`
subcommand. Note that this does not require
`--local`.
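A sketch of the idea, assuming the flag surfaces as `args.interactive` (the real argument wiring may differ):

```python
import contextlib

def run_task(args, task_callable, log_stream):
    """Run a task, capturing stdout/stderr unless running interactively."""
    with contextlib.ExitStack() as stack:
        if not args.interactive:
            # Redirect both streams into the configured logger's stream.
            stack.enter_context(contextlib.redirect_stdout(log_stream))
            stack.enter_context(contextlib.redirect_stderr(log_stream))
        # With the flag set, stdout/stderr stay attached to the terminal,
        # so pdb.set_trace() breakpoints work as expected.
        task_callable()
```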
Closes#2957 from yati-sagade/ysagade/airflow-2015
sla_miss and task_instances cannot have NULL
execution_dates, but the timezone migration scripts
forgot to enforce this. In addition, to make sure
MySQL does not add "ON UPDATE CURRENT_TIMESTAMP"
and MariaDB does not add "DEFAULT
0000-00-00 00:00:00", we now check whether
explicit_defaults_for_timestamp is turned on and
otherwise fail the database upgrade.
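Conceptually, the upgrade guard does something like this (SQLAlchemy 1.x-style string execution; the connection URI is assumed):

```python
from sqlalchemy import create_engine

engine = create_engine("mysql://user:pass@localhost/airflow")
with engine.connect() as conn:
    value = conn.execute(
        "SELECT @@explicit_defaults_for_timestamp"
    ).scalar()
    if not value:
        raise RuntimeError(
            "explicit_defaults_for_timestamp must be enabled in MySQL "
            "before the database upgrade can run"
        )
```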
Closes #2969, #2857
Closes#2979 from bolkedebruin/AIRFLOW-1895
Add ability to create a BigQuery External Table.
- Add new method create_external_table() in
BigQueryHook()
- Add parameters to existing
GoogleCloudStorageToBigQueryOperator()
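A hypothetical usage sketch; parameter names are illustrative and may not match the final signatures:

```python
from airflow.contrib.hooks.bigquery_hook import BigQueryHook

hook = BigQueryHook(bigquery_conn_id='bigquery_default')
hook.create_external_table(
    external_project_dataset_table='my-project.my_dataset.my_table',
    schema_fields=[{'name': 'id', 'type': 'INTEGER', 'mode': 'REQUIRED'}],
    source_uris=['gs://my-bucket/data/*.csv'],
    source_format='CSV',
)
```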
Closes#2948 from kaxil/external_table
Allows a default BigQuery SQL dialect to be
specified at the hook level, which is threaded
through to the underlying cursors. This lets
standard SQL be used while maintaining
compatibility with the `DbApiHook` interface.
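For example (the flag name `use_legacy_sql` is an assumption here):

```python
from airflow.contrib.hooks.bigquery_hook import BigQueryHook

# Default the whole hook to standard SQL; cursors inherit the dialect.
hook = BigQueryHook(use_legacy_sql=False)
rows = hook.get_records('SELECT 1')  # DbApiHook interface is unchanged
```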
Addresses AIRFLOW-1267 and AIRFLOW-1874
Closes#2964 from ji-han/master
Make sure you have checked _all_ steps below.
### JIRA
- [x] My PR addresses the following [AIRFLOW-2017](https://issues.apache.org/jira/browse/AIRFLOW-2017) issue and references it in the PR title. For example, "[AIRFLOW-2017] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-2017
### Description
- [x] Here are some details about my PR, including
screenshots of any UI changes:
Currently we're not getting the output logs of the
postgres operator that you would otherwise get if
you ran a psql command. That's because the postgres
connection has an attribute called [notices](http://initd.org/psycopg/docs/connection.html#connection.notices)
which contains this information.
We just need to log the contents of this attribute
to get that output into the Airflow logs, which
makes debugging easier, among other things.
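Roughly, with psycopg2 (the DSN here is illustrative):

```python
import logging
import psycopg2

log = logging.getLogger(__name__)

conn = psycopg2.connect("dbname=airflow")
with conn.cursor() as cur:
    cur.execute("DO $$ BEGIN RAISE NOTICE 'hello'; END $$;")

# psycopg2 accumulates server messages here; logging them reproduces
# the output psql would have shown.
for notice in conn.notices:
    log.info(notice)
```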
I've included before and after images.
**BEFORE**
<img width="1146" alt="screen shot 2018-01-19 at 4
46 59 pm" src="https://user-images.githubuserconte
nt.com/10408007/35178405-6f6a1da8-fd3d-11e7-8f50-0
dbd567d8ab4.png">
**AFTER**
<img width="1147" alt="screen shot 2018-01-19 at 4
46 25 pm" src="https://user-images.githubuserconte
nt.com/10408007/35178406-74ea4ae6-fd3d-11e7-9551-6
31eac6bfe7b.png">
### Tests
- [x] My PR adds the following unit tests __OR__
does not need testing for this extremely good
reason:
There isn't anything to test; nothing changes in
the current implementation besides the addition of
logging.
### Commits
- [x] My commits all reference JIRA issues in
their subject lines, and I have squashed multiple
commits if they address the same issue. In
addition, my commits follow the guidelines from
"[How to write a good git commit
message](http://chris.beams.io/posts/git-
commit/)":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not
"adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
Closes#2959 from Acehaidrey/AIRFLOW-2017
Moving the sensors to separate files increases the
readability of the code. It also reduces the size
of the big core.py file.
Closes#2875 from Fokko/AIRFLOW-1889-move-sensors-to-separate-package
Changes the `task_ids` parameter of `xcom_pull`
from required to optional. The parameter has always
accepted None, but because it was required, callers
still had to pass it explicitly. With this change,
it can simply be omitted.
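A sketch of the signature change (other parameters elided):

```python
# Before: task_ids was required, so callers passed None explicitly.
def xcom_pull(self, task_ids, dag_id=None, key='return_value'):
    ...

# After: task_ids defaults to None and can simply be omitted.
def xcom_pull(self, task_ids=None, dag_id=None, key='return_value'):
    ...
```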
Closes#2902 from bcb/make-xcom-pull-task-ids-optional
This enables Airflow and Celery Flower to live
below root. It draws on the work of Gaetan Semet
(@Stibbons).
This closes #2723 and closes #2818
Closes#2952 from bolkedebruin/AIRFLOW-1755
Improved task generation performance significantly
by using sets of
task_ids and dag_ids instead of lists when
calculating total priority
weight.
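The win comes from membership-test complexity; a toy comparison:

```python
# 'in' is O(n) on a list but O(1) on a set, which adds up when the
# priority weight of every task is computed against many task ids.
task_ids_list = ['task_%d' % i for i in range(10000)]
task_ids_set = set(task_ids_list)

'needle' in task_ids_list  # scans the whole list
'needle' in task_ids_set   # single hash lookup
```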
Closes#2941 from wongwill86/performance-latest