* [AIRFLOW-5740] Fix transient failure in Slack test
The transient failure is caused by dict ordering.
* We were comparing strings
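A minimal sketch of the underlying idea, assuming a JSON payload comparison (the names below are illustrative, not the actual Slack test code): comparing parsed dicts instead of serialized strings removes the dependence on key order.
```
import json

# Hypothetical sketch of the pattern, not the actual Slack test code:
# compare parsed dicts instead of serialized strings so key order cannot
# cause transient failures.
expected = {"channel": "#general", "username": "airflow"}
actual_payload = '{"username": "airflow", "channel": "#general"}'

# String comparison depends on the order the keys were serialized in;
# dict comparison does not, so this assertion is stable across runs.
assert json.loads(actual_payload) == expected
```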
* discussion on the original PR suggested removing the private_key option as an init param
* with this PR, the key can still be provided through extras, but not as an init param (see the sketch after this list)
* also add support for private_key in the tunnel -- missing in the original PR for this issue
* remove the test related to the private_key init param
* use a context manager to auto-close the socket listener so tests can be re-run
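A minimal sketch of providing the key through connection extras instead of the constructor; the `private_key` extras field, connection id, and host below are assumptions for illustration.
```
# Hypothetical sketch: supply the private key via connection extras instead of
# the removed init param. The "private_key" extras key, connection id and host
# are assumptions for illustration only.
import json

from airflow.models import Connection

ssh_conn = Connection(
    conn_id="ssh_default",
    conn_type="ssh",
    host="remote.example.com",
    login="airflow",
    extra=json.dumps({"private_key": "-----BEGIN RSA PRIVATE KEY-----\n..."}),
)
```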
* [AIRFLOW-5451] SparkSubmitHook: don't set the default namespace
We only want to set the namespace if it isn't the default.
https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration
The default is already set by Spark, so we don't want to pass it again when
it is the default. This also allows us to pass the namespace via the
conf dict; otherwise the namespace would be set twice (see the sketch below).
* Fix tests as well
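A minimal sketch of the idea, not the actual SparkSubmitHook code; the helper name is hypothetical.
```
# Hypothetical helper illustrating the behaviour: only pass
# spark.kubernetes.namespace when a namespace was explicitly set, so a
# namespace supplied via the conf dict is not duplicated or overridden.
def build_namespace_args(namespace=None):
    """Return extra spark-submit arguments for the Kubernetes namespace."""
    if namespace:  # skip when unset; Spark then applies its own default
        return ["--conf", "spark.kubernetes.namespace={}".format(namespace)]
    return []
```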
- add documentation to integration.rst
The hook provides:
- a get_conn function to authenticate to the Google API via an Airflow connection
- a query function to dynamically query all data available for a specific endpoint and given parameters (either one page of data or all data can be retrieved)
The transfer operator provides (see the usage sketch below):
- basic transfers between the Google API and S3
- passing an XCom variable to dynamically set the endpoint params for a request
- exposing the response data to XCom, raising an exception when it exceeds MAX_XCOM_SIZE
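A hypothetical usage sketch; the operator name, import path, and parameter names are assumptions for illustration, not the exact API.
```
# Hypothetical usage sketch; the operator name, import path and parameter
# names are assumptions for illustration, not the exact API.
from airflow.operators.google_api_to_s3_transfer import GoogleApiToS3Transfer

sheets_to_s3 = GoogleApiToS3Transfer(
    task_id="google_sheets_to_s3",
    google_api_service_name="sheets",
    google_api_service_version="v4",
    google_api_endpoint_path="sheets.spreadsheets.values.get",
    google_api_endpoint_params={"spreadsheetId": "example-sheet-id", "range": "A:Z"},
    s3_destination_key="s3://example-bucket/sheets/values.json",
)
```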
Co-authored-by: louisguitton <louisguitton@users.noreply.github.com>
We have fairly complex python version detection in our CI scripts.
They have to handle several cases:
1) Running builds on DockerHub. We cannot pass different environment
variables there, so we detect the Python version from the image name
being built (airflow:master-python3.7 -> PYTHON_VERSION=3.7).
2) Running builds on Travis CI. We use the Python version determined
from the default python3 version available on the PATH. This way we
do not have to specify PYTHON_VERSION separately in each job;
we just specify which host Python version is used for that job.
This makes for a nice experience where you see the Python version in
the Travis UI.
3) Running builds locally via scripts, where we can pass PYTHON_VERSION
as an environment variable.
4) Running builds locally for the first time with Breeze. By default
we determine the version from the default python3 version on the
host system (3.5, 3.6 or 3.7) and use that one.
5) Selecting the Python version with Breeze's --python switch. This
overrides the Python version and also stores the last used version
in the .build directory so that it is automatically used next time.
This change adds the necessary explanations to the code that handles
all of these cases and fixes some of the edge cases we had. It also
extracts the code to a common directory. A rough sketch of the
detection precedence follows.
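An illustrative Python sketch of that precedence only; the real logic lives in the bash CI scripts, and the function name, file path, and exact ordering here are simplified assumptions.
```
# Illustrative sketch only; the actual logic lives in the bash CI scripts and
# the function name, file path and exact ordering here are simplified.
import os
import re
import sys


def detect_python_version(image_name=None, last_used_file=".build/PYTHON_VERSION"):
    # Case 1, DockerHub: derive the version from the image name
    # (airflow:master-python3.7 -> 3.7).
    if image_name:
        match = re.search(r"python(\d+\.\d+)$", image_name)
        if match:
            return match.group(1)
    # Case 3, local scripts: explicit PYTHON_VERSION environment variable.
    if os.environ.get("PYTHON_VERSION"):
        return os.environ["PYTHON_VERSION"]
    # Case 5, Breeze: version stored by the --python switch on a previous run.
    if os.path.isfile(last_used_file):
        with open(last_used_file) as f:
            return f.read().strip()
    # Cases 2 and 4: fall back to the host's default python3 version.
    return "{}.{}".format(sys.version_info.major, sys.version_info.minor)
```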
* [AIRFLOW-5049] Add validation for src_fmt_configs in bigquery hook
Adds validation for the src_fmt_configs argument in the bigquery hook. Otherwise, wrong src_fmt_configs would be silently ignored, which is undesirable.
* [AIRFLOW-5049] Update - Add validation for src_fmt_configs in bigquery hook
Adds a common method for validating src_fmt_configs (see the sketch below).
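A minimal sketch of the validation idea; the real helper in the hook may differ in name and signature.
```
# Minimal sketch of the validation; the helper name and signature here are
# illustrative and may differ from the actual bigquery hook method.
def _validate_src_fmt_configs(source_format, src_fmt_configs, valid_configs):
    """Raise if src_fmt_configs contains keys that are not valid for source_format."""
    for config in src_fmt_configs:
        if config not in valid_configs:
            raise ValueError(
                "{} is not a valid src_fmt_configs for type {}.".format(config, source_format)
            )
    return src_fmt_configs
```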
`spark2-submit` supports a `--proxy-user` parameter, which should be handled by SparkSubmitOperator.
```
$ spark2-submit --help 2>&1 | grep proxy
--proxy-user NAME User to impersonate when submitting the application.
```
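A hypothetical usage sketch; the `proxy_user` parameter name and the import path are assumptions illustrating how the operator could forward `--proxy-user` to spark-submit.
```
# Hypothetical usage sketch; the proxy_user parameter name and import path are
# assumptions illustrating how the operator could forward --proxy-user.
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

submit_as_analyst = SparkSubmitOperator(
    task_id="spark_submit_as_analyst",
    application="/path/to/app.py",
    proxy_user="analyst",  # intended to render as: spark-submit --proxy-user analyst ...
)
```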
1. Issue deprecation warnings for the old conf methods properly and remove current usages of the old conf methods.
2. Unify the way conf is used as `from airflow.configuration import conf`.
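A short sketch of the unified pattern; the section and option names queried here are just examples.
```
# Sketch of the unified usage; the section/option names are examples only.
from airflow.configuration import conf

dags_folder = conf.get("core", "dags_folder")
parallelism = conf.getint("core", "parallelism")
```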
To tag and track GCP resources spawned from Airflow, we have
been adding Airflow-specific label(s) to GCP API service calls
whenever possible and applicable.
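A hypothetical illustration of the labelling pattern; the label key, value, and job configuration shape are examples rather than the exact calls Airflow makes.
```
# Hypothetical illustration of the labelling pattern; the label key/value and
# the BigQuery job configuration shape are examples only.
job_configuration = {
    "query": {"query": "SELECT 1"},
    "labels": {"airflow-version": "v1-10-6"},  # example tag marking the job as spawned by Airflow
}
```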