* [AIRFLOW-5617] Add fallback for connection's project id in MLEngine hook
The `aws_default` connection specifies `region_name` as `us-east-1` by
default in its `extra` field. This causes trouble when the desired
AWS account uses a different region, because this default value takes
priority over the $AWS_REGION and $AWS_DEFAULT_REGION environment
variables, gets passed directly to `botocore`, and does not seem to be
documented.
This commit removes the default region name from the `aws_default`
connection's extra field. This means that it will have to be set
manually, which follows the "explicit is better than implicit"
philosophy.
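For anyone who relied on the old default, a minimal sketch of setting the region explicitly (the region value is an example; the same JSON can be pasted into the connection's Extra field in the UI, or AWS_DEFAULT_REGION can be exported in the environment):

```python
# Sketch only, assuming connections are managed in code; the region value
# below is illustrative.
import json

from airflow import settings
from airflow.models import Connection

session = settings.Session()
conn = session.query(Connection).filter(Connection.conn_id == "aws_default").one()
conn.extra = json.dumps({"region_name": "eu-west-1"})  # must now be set explicitly
session.commit()
```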
Since we switched to using sub-processes to parse the DAG files sometime
back in 2016(!), the metrics we have been emitting about dag bag size and
parsing have been incorrect.
We have also been emitting metrics from the webserver, which is going to
become wrong as we move towards a stateless webserver.
To fix both of these issues I have stopped emitting the metrics from
models.DagBag and only emit them from inside the
DagFileProcessorManager.
(There was also a bug in the `dag.loading-duration.*` metric we were
emitting from the DagBag code, where the "dag_file" part of that metric
was empty. I have fixed that even though the metric is now deprecated.
The webserver was emitting the right metric, though, so many people
wouldn't have noticed.)
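For illustration, a rough sketch of the kind of emission that now happens in DagFileProcessorManager rather than in models.DagBag. The function name and arguments are placeholders, and the `Stats` import location varies by Airflow version; only the metric names mentioned above are taken from the change itself:

```python
# Rough sketch only: emit DAG parsing metrics from the manager process,
# not from models.DagBag / the webserver. dag_file and duration stand in
# for values the manager already tracks.
from airflow.settings import Stats  # newer releases: from airflow.stats import Stats


def report_parse_result(dag_file, dag_count, duration):
    Stats.gauge("dagbag_size", dag_count)
    # The per-file loading-duration metric keeps the file name in the key
    # (previously empty when emitted from DagBag), though it is deprecated.
    Stats.timing("dag.loading-duration.{}".format(dag_file), duration)
```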
* [AIRFLOW-5147] Extended character set for k8s worker pod annotations
* Updated UPDATING.md with new breaking changes
* Excluded the pylint too-many-statements check from the constructor due to its nature
- change the order of arguments for `has_mail_attachment`, `retrieve_mail_attachments` and `download_mail_attachments`
- add a `get_conn` function
- refactor code
- fix pylint issues
- add an `imap_mail_filter` arg to ImapAttachmentToS3Operator
- add a `mail_filter` arg to ImapAttachmentSensor (see the sketch after this list)
- remove superfluous tests
- change the order of arguments in the sensors' and operators' `__init__`
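A minimal sketch of exercising the new `mail_filter` argument on the sensor. Everything beyond the argument name itself is an assumption: the import path, the remaining parameters and their values, and the surrounding `dag` object are illustrative only:

```python
# Sketch only: the attachment name, filter string, conn id and dag object
# are placeholders; only mail_filter is taken from the change above.
from airflow.contrib.sensors.imap_attachment_sensor import ImapAttachmentSensor

wait_for_report = ImapAttachmentSensor(
    task_id="wait_for_report",
    attachment_name="report.csv",
    mail_filter='(SINCE "01-Jan-2019")',  # new argument from this change
    conn_id="imap_default",
    dag=dag,  # assumes a DAG defined elsewhere
)
```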
Change SubDagOperator to use the Airflow scheduler to schedule
tasks in subdags instead of backfill.
In the past, SubDagOperator relied on the backfill scheduler
to schedule tasks in its subdags. Tasks in the parent DAG
are scheduled via the Airflow scheduler, while tasks in
a subdag are scheduled via backfill, which complicates
the scheduling logic and makes it harder to maintain
the two scheduling code paths.
This PR simplifies how tasks in subdags are scheduled.
SubDagOperator is responsible for creating a DagRun for the subdag
and waiting until all the tasks in the subdag finish. The Airflow
scheduler picks up the DagRun created by SubDagOperator and
creates and schedules the tasks accordingly.
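How SubDagOperator is used from a DAG file does not change; a minimal sketch, where the DAG ids, schedule and the subdag factory are illustrative:

```python
# Sketch only: the user-facing API is unchanged; the difference is that the
# subdag's tasks are now picked up by the scheduler instead of a backfill.
from datetime import datetime

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator


def make_subdag(parent_dag_id, child_dag_id, args):
    # Subdag ids follow the "<parent>.<child>" convention.
    subdag = DAG(
        dag_id="{}.{}".format(parent_dag_id, child_dag_id),
        default_args=args,
        schedule_interval="@daily",
    )
    DummyOperator(task_id="subtask", dag=subdag)
    return subdag


args = {"start_date": datetime(2019, 1, 1)}
dag = DAG(dag_id="parent_dag", default_args=args, schedule_interval="@daily")

section_1 = SubDagOperator(
    task_id="section_1",
    subdag=make_subdag("parent_dag", "section_1", args),
    dag=dag,
)
```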
Note: The order of arguments has changed for `check_for_prefix`.
The `bucket_name` is now optional. It falls back to the connection's `schema` attribute.
- refactor code
- complete docs
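A minimal sketch of calling it after the change; passing the arguments by keyword sidesteps the reordering, and the bucket and prefix values are illustrative:

```python
# Sketch only: keyword arguments make the argument-order change harmless.
# Omitting bucket_name falls back to the connection's schema attribute.
from airflow.hooks.S3_hook import S3Hook

hook = S3Hook(aws_conn_id="aws_default")
has_data = hook.check_for_prefix(
    prefix="data/2019-09-01/",
    delimiter="/",
    bucket_name="my-bucket",  # now optional
)
```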
Support for different Celery pool implementations was added
in https://github.com/apache/airflow/pull/4308,
but it is not yet reflected in default_airflow.cfg, which is
the main portal to config options for most users.
`non_pooled_task_slot_count` and `non_pooled_backfill_task_slot_count`
are removed in favor of a real pool, e.g. `default_pool`.
By default, tasks run in `default_pool`.
`default_pool` is initialized with 128 slots and users can change the
number of slots through the UI/CLI. `default_pool` cannot be removed.
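For illustration, a sketch of how a task ends up in a pool; the task ids, command and pool name are examples:

```python
# Sketch only: tasks without an explicit pool now run in default_pool.
# Setting pool explicitly still works as before; names below are examples.
from airflow.operators.bash_operator import BashOperator

in_default_pool = BashOperator(
    task_id="in_default_pool",
    bash_command="echo hello",
    dag=dag,  # assumes a DAG defined elsewhere
)

in_custom_pool = BashOperator(
    task_id="in_custom_pool",
    bash_command="echo hello",
    pool="etl_pool",  # a pool created via the UI/CLI
    dag=dag,
)
```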
google-cloud-storage 1.16 introduced a breaking change where the
signature of client.get_bucket changed from (bucket_name) to
(bucket_or_name). Calls that pass the bucket name as a named argument to
this method now fail. This commit makes all such calls positional to work
around this.
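For example, a sketch of the workaround (the bucket name is illustrative):

```python
# Sketch only: google-cloud-storage >= 1.16 renamed the parameter from
# bucket_name to bucket_or_name, so client.get_bucket(bucket_name="...")
# raises a TypeError. Passing the name positionally works on both old
# and new versions.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-bucket")  # positional, version-agnostic
```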
When using potentially larger offsets than JavaScript can handle, they can get parsed incorrectly on the client, resulting in the offset query getting stuck on a certain number. This patch ensures that we return a string to the client so it is not parsed as a number. When we run the query, we ensure the offset is set as an integer.
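A rough sketch of the idea; the function and field names are illustrative, not the actual view code:

```python
# Rough sketch only: offsets can exceed JavaScript's safe-integer range
# (2**53 - 1), so the response carries them as strings and the server
# converts back to an int before running the query. Names are illustrative.
def build_log_metadata(next_offset):
    return {"offset": str(next_offset)}  # a string survives JSON -> JS intact


def read_logs_from(metadata, fetch):
    offset = int(metadata["offset"])  # back to an integer for the query
    return fetch(offset=offset)
```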
Add unnecessary prefix_ in config for elasticsearch section
- upgrade the cloudant version from `>=0.5.9,<2.0` to `>=2.0`
- remove the use of the `schema` attribute in the connection
- remove the `db` function since the database object can also be retrieved by calling `cloudant_session['database_name']` (see the sketch below)
- update docs
- refactor code
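A minimal sketch of retrieving a database with the 2.x client instead of the removed `db` function; the database name is illustrative and the default connection id is assumed:

```python
# Sketch only: with cloudant >= 2.0 the session object is subscriptable,
# so the removed db() helper is no longer needed. Names are examples.
from airflow.contrib.hooks.cloudant_hook import CloudantHook

hook = CloudantHook()  # assumes the default cloudant connection
cloudant_session = hook.get_conn()
my_db = cloudant_session["database_name"]  # replaces the old db() helper
```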
- update the default version used for connecting to the Admin API from v1beta1 to v1
- move the establishment of the connection to the function calls instead of the hook init
- change the get_conn signature to be able to pass an is_admin arg to set up an admin connection
- rename the GoogleCloudBaseHook._authorize function to GoogleCloudBaseHook.authorize
- rename the `partialKeys` argument of the `allocate_ids` function to `partial_keys` (see the sketch below)
- add tests
- update docs
- refactor code
Move the `version` attribute from `get_conn` to `__init__`
- revert renaming of authorize function
- improve docs
- refactor code
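A sketch of the renamed argument; the partial-key payload follows the Datastore REST format, and the kind and default connection id are illustrative assumptions:

```python
# Sketch only: the keyword is now partial_keys (snake_case). The key payload
# shape follows the Datastore REST API; the kind and conn id are examples.
from airflow.contrib.hooks.datastore_hook import DatastoreHook

hook = DatastoreHook()  # assumes the default Datastore connection
keys = hook.allocate_ids(
    partial_keys=[{"path": [{"kind": "Task"}]}],  # was partialKeys before
)
```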
* [AIRFLOW-4172] Fix changes for driver class path option in Spark Submit Operator
Some commands for installing extra packages, like
`pip install apache-airflow[devel]`, cause errors
in certain situations/shells (for example zsh, where the square
brackets are treated as glob characters). We should fix them
by adding quotation marks, like
`pip install 'apache-airflow[devel]'`.
There were a few ways of getting the AIRFLOW_HOME directory used
throughout the code base, giving possibly conflicting answers if they
weren't kept in sync:
- the AIRFLOW_HOME environment variable
- core/airflow_home from the config
- settings.AIRFLOW_HOME
- configuration.AIRFLOW_HOME
Since the home directory is used to compute the default path of the
config file to load, specifying the home directory again in the config
file didn't make any sense to me, so I have deprecated that.
This commit makes everything in the code base use
`settings.AIRFLOW_HOME` as the source of truth, and deprecates the
core/airflow_home config option.
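In code that looks like the sketch below; the path being built is just an example:

```python
# Sketch only: resolve paths from the single source of truth instead of
# reading AIRFLOW_HOME from the environment or core/airflow_home directly.
import os

from airflow import settings

webserver_config = os.path.join(settings.AIRFLOW_HOME, "webserver_config.py")
```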
There was an import cycle from settings -> logging_config ->
module_loading -> settings that needed to be broken on Python 2, so I
have moved all adjusting of sys.path into the settings module.
(This issue caused me a problem where the RBAC UI wouldn't work as it
didn't find the right webserver_config.py.)