* [AIRFLOW-3752] Add/remove user from role via the CLI
Update the `users` subcommand to enable two new actions:
- `--add-role`: Make the user a member of the given role
- `--remove-role`: Remove the user's membership in the given role
For installations that use an external identity provider (e.g., Google
OAuth) the username is typically a long ID string. For the sake of
convenience, we allow the CLI operator to reference the target user
via either their `username` or their `email` (but not both).
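As an illustration, invocations might look like the following (flag spellings other than `--add-role`/`--remove-role` are assumptions, not taken from this change):

```
airflow users --add-role --username jdoe --role Admin
airflow users --remove-role --email jdoe@example.com --role Admin
```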
* Update argparse spec
Accidentally left off this update to the argparse spec in the last
commit.
* Add unit tests
* Fix lint failures
Local users were always superusers; this adds a column to the DB (defaulting to false,
which is going to cause a bit of upgrade pain for people, but defaulting to not being an
admin is the only secure default.)
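A minimal sketch of what such a column could look like on the user model (the attribute name is an assumption, not taken from this change):

```python
from sqlalchemy import Boolean, Column

# Defaults to False: existing local users must be promoted explicitly after upgrade.
superuser = Column(Boolean(), nullable=False, default=False)
```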
Started with "habe", "serever" and "certificiate" needing to be:
"have", "server", and "certificate".
Ran a check, ignoring British and US accepted spellings.
Kept jargon, e.g. admin, async, auth, backend, config, dag, s3, utils, etc.
Took exception to: "num of dag run" meaning "number of dag runs",
"upness" normally being a property of quarks,
"url" being lower-case, and
the sftp example having an excess file ending.
Python documentation hyphenates "built-in" and upper-cases "PYTHONPATH".
Gave up on mixed use of "dag" and "DAG" as well as long line lengths.
This adds ASF license headers to all the .rst and .md files, with the
exception of the Pull Request template (as that is included verbatim
when opening a Pull Request on GitHub, which would be messy).
This updates the scheduler_heartbeat metric from a gauge to a counter to
better support the statsd_exporter for use with Prometheus. A counter
allows users to track the rate of the heartbeat and integrates better
with the exporter. With a gauge, a crashed or stopped scheduler no longer
emits the metric, yet the statsd_exporter continues to report 1 as its
value. A counter fixes that issue: it changes continually while the
scheduler is healthy, so a lack of change indicates a problem with the
scheduler.
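A minimal sketch of the difference using the `statsd` Python client (the metric name is from this change; the client setup is illustrative):

```python
from statsd import StatsClient

statsd = StatsClient(host="localhost", port=8125, prefix="airflow")

# Before: a gauge, which statsd_exporter keeps reporting at its last
# value even after the scheduler dies.
statsd.gauge("scheduler_heartbeat", 1)

# After: a counter, so a healthy scheduler produces a steadily increasing
# value and a flat rate signals a problem.
statsd.incr("scheduler_heartbeat", 1)
```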
Add statsd change notice in UPDATING.md
Fix artifacts in default_airflow.cfg
- fixed incorrect instructions in UPDATING.md regarding core.log_filename_template and elasticsearch.elasticsearch_log_id_template
- removed comments referencing "additional curly braces" from
default_airflow.cfg since they're irrelevant to the rendered airflow.cfg
By default one of Apache Airflow's dependencies pulls in a GPL
library. Airflow should not install (or upgrade) without an explicit choice.
This is part of the Apache requirements, as we cannot depend on Category X
software.
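For illustration, the opt-in at install time could look like the following (the environment variable names here are an assumption, not taken from this commit):

```
# choose the non-GPL unidecode implementation
SLUGIFY_USES_TEXT_UNIDECODE=yes pip install apache-airflow

# or explicitly accept the GPL dependency
AIRFLOW_GPL_UNIDECODE=yes pip install apache-airflow
```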
Make sure you have checked _all_ steps below.
### JIRA
- [x] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-2267
- In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\]; code changes always need a JIRA issue.
### Description
- [x] Here are some details about my PR, including
screenshots of any UI changes:
Provide DAG-level access for Airflow. The detailed design can be found at https://docs.google.com/document/d/1qs26lE9kAuCY0Qa0ga-80EQ7d7m4s-590lhjtMBjmxw/edit#
### Tests
- [x] My PR adds the following unit tests __OR__
does not need testing for this extremely good
reason:
Unit tests are added.
### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not
"adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
Closes#3197 from feng-tao/airflow-2267
The new names are in line with Celery 4, but if anyone upgrades Airflow
without following the UPDATING.md instructions (which we can probably
assume most people won't do until something stops working) their workers
would suddenly just start failing. That's bad.
This will issue a warning but carry on working as expected. We can
remove the deprecation settings (but leave the code in config) after
this release has been made.
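A rough sketch of the kind of back-compat shim described (the option names and helper are illustrative, not the actual Airflow code):

```python
import warnings

# Hypothetical mapping from Celery 4 option names to their deprecated
# Celery 3 spellings.
DEPRECATED_ALIASES = {"worker_concurrency": "celeryd_concurrency"}

def get_option(config, section, key):
    """Resolve a config option, honouring deprecated spellings with a warning."""
    old_key = DEPRECATED_ALIASES.get(key)
    if old_key and config.has_option(section, old_key):
        warnings.warn(
            "Option [{}] {} is deprecated; use {} instead".format(section, old_key, key),
            DeprecationWarning,
        )
        return config.get(section, old_key)
    return config.get(section, key)
```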
Closes#3549 from ashb/AIRFLOW-1840-back-compat
- Improved the retry timing for jobs below 60s
- Renamed the `queue` property to `job_queue` to prevent a conflict between the AWS Batch queue and the CeleryExecutor queue (see the sketch below)
- Added a breaking-change note to UPDATING.md on master
- Fixed an infinite loop in the operator
- Added a documentation warning about the breaking change
- Fixed the commit to keep it in line with Airflow guidelines
- Fixed a logging typo
- Rebased onto master
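A hedged illustration of the rename's effect on operator usage (parameter names other than `job_queue` are assumptions):

```python
from airflow.contrib.operators.awsbatch_operator import AWSBatchOperator

task = AWSBatchOperator(
    task_id="batch_job",
    job_name="my-job",
    job_definition="my-job-def",
    job_queue="my-batch-queue",  # was `queue`, which clashed with the Celery arg
    queue="celery-queue",        # BaseOperator's CeleryExecutor queue stays usable
)
```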
Closes#3436 from hprudent/master
Closes#3279 from feng-tao/reduce-tree-view
This introduces a new configuration variable to set the default
number of dag runs displayed in the tree view. For large DAGs,
rendering every dag run could otherwise cause timeouts in the webserver.
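For illustration, the knob would be set in airflow.cfg roughly like this (the option name here is an assumption, not taken from this change):

```
[webserver]
default_dag_run_display_number = 25
```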
Currently, data is ordered by the first column in descending order,
and the header row comes first only if the first column is an integer.
This fix puts the header as the first row regardless of the first
column's data type.
Closes#3180 from sathyaprakashg/AIRFLOW-2254
The Google Cloud operators use both google_cloud_storage_default and
google_cloud_default as a default conn_id. This is confusing, and the
google_cloud_storage_default conn_id isn't initialized by default in db.py.
Therefore we rename google_cloud_storage_default to google_cloud_default
for simplicity and convenience.
Closes#3141 from Fokko/airflow-2226
sla_miss and task_instances cannot have NULL execution_dates. The
timezone migration scripts forgot to set this properly. In addition,
to make sure MySQL does not set "ON UPDATE CURRENT_TIMESTAMP" or
MariaDB "DEFAULT 0000-00-00 00:00:00", we now check whether
explicit_defaults_for_timestamp is turned on and otherwise fail the
database upgrade.
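A minimal sketch of such a guard using SQLAlchemy (illustrative, not the actual migration code; the DSN is hypothetical):

```python
from sqlalchemy import create_engine

engine = create_engine("mysql://user:pass@localhost/airflow")  # hypothetical DSN
with engine.connect() as conn:
    value = conn.execute("SELECT @@explicit_defaults_for_timestamp").scalar()
    if not value:
        raise Exception(
            "Global variable explicit_defaults_for_timestamp needs to be "
            "on (1) for MySQL; aborting the database upgrade."
        )
```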
Closes#2969, #2857. Closes#2979 from bolkedebruin/AIRFLOW-1895
Explicitly set the celery backend from the config and align the config
with the celery config, as the mismatch might be confusing.
Closes#2806 from Fokko/AIRFLOW-1840-Fix-celery-config
In the migration of S3Hook to boto3 the connection ID parameter changed
to `aws_conn_id`. This fixes the uses of `s3_conn_id` in the code base
and adds a note to UPDATING.md about the change.
In correcting the tests for S3ToHiveTransfer I noticed that
S3Hook.get_key was returning a dictionary, rather than the S3.Object
mentioned in its docstring. The important thing that was missing was
the ability to get the key name from the return value of a call
to get_wildcard_key.
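A hedged sketch of the fixed behaviour (the import path and signatures reflect the 1.x contrib layout and may differ by release; bucket and key names are hypothetical):

```python
from airflow.hooks.S3_hook import S3Hook

hook = S3Hook(aws_conn_id="aws_default")  # `s3_conn_id` is the old, removed name

# get_key now returns a boto3 S3.Object rather than a dict...
obj = hook.get_key("data/file.csv", bucket_name="my-bucket")

# ...so the matched key name is available from a wildcard lookup.
match = hook.get_wildcard_key("data/*.csv", bucket_name="my-bucket")
print(match.key)
```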
Closes#2795 from ashb/AIRFLOW-1795-s3hook_boto3_fixes
Before initializing the logging framework, we want to set the Python
path so the logging config can be found.
Closes#2721 from Fokko/AIRFLOW-1731-import-pythonpath
Change the configuration of the logging to make use of the Python
logging module and make it easily configurable. Some settings are no
longer needed since they can easily be implemented in the config file.
Closes#2631 from Fokko/AIRFLOW-1611-customize-logging-in-airflow
Clean up the way of logging within Airflow. Remove the old logging.py
and move to the airflow.utils.log.* interface. Remove setting up
logging outside of the settings/configuration code. Move away from
string formatting to logging_function(msg, *args).
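The last point refers to lazy %-style interpolation, where the logging framework only formats the message if the record is actually emitted; a small illustration:

```python
import logging

log = logging.getLogger(__name__)
dag_id = "example_dag"

# Before: the string is built eagerly, even if INFO is disabled.
log.info("Processing {}".format(dag_id))

# After: arguments are passed through, and formatting is deferred
# until a handler actually emits the record.
log.info("Processing %s", dag_id)
```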
Closes#2592 from Fokko/AIRFLOW-1582-Improve-logging-structure
PickleType in XCom allows remote code execution. In order to deprecate
it without changing the MySQL table schema, change PickleType to
LargeBinary, because they both map to the blob type in MySQL. Add
"enable_pickling" to the function signature to control using either
pickle or JSON; "enable_pickling" should also be added to the core
section of airflow.cfg.
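A rough sketch of the serialization switch described here (the function and flag handling are illustrative, not the actual XCom code):

```python
import json
import pickle

def serialize_value(value, enable_pickling):
    """Serialize an XCom value to bytes for the LargeBinary column."""
    if enable_pickling:
        # Legacy behaviour: flexible, but allows remote code execution on load.
        return pickle.dumps(value)
    # Safer default: JSON can only reconstruct plain data structures.
    return json.dumps(value).encode("utf-8")
```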
Picked up where https://github.com/apache/incubator-airflow/pull/2132
left off. Took this PR, fixed merge conflicts, added
documentation/tests, fixed broken tests/operators, and fixed the
Python 3 issues.
Closes#2518 from aoen/disable-pickle-type
This PR updates the Airflow configuration documentation to include a
recent change that splits task logs by try number (#2383).
Closes#2467 from AllisonWang/allison--update-doc
subprocess.Popen forks before doing execv. This makes it difficult
for some manager daemons (like supervisord) to send kill signals.
This patch uses os.execve directly; os.execve takes over the current
process and thus responds correctly to signals.
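A small illustration of the difference (the command path and arguments are hypothetical):

```python
import os

# Forking approach (previous behaviour): subprocess.Popen forks first,
# so the supervisor ends up signalling the wrapper, not the command:
#     subprocess.Popen(["airflow", "scheduler"])

# Exec approach (this patch): replace the current process image in
# place, so signals from a manager daemon reach the command directly.
os.execve("/usr/local/bin/airflow", ["airflow", "scheduler"], os.environ)
```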
* Resolves residue in ISSUE-852
Airflow spawns children in the form of a webserver, scheduler, and executors.
If the parent gets terminated (SIGTERM) it needs to properly propagate the
signals to the children, otherwise they will get orphaned and end up as
zombie processes. This patch resolves that issue.
In addition, Airflow does not store the PID of its services, which prevents
them from being managed by traditional Unix system services like rc.d /
upstart / systemd and the like. This patch adds the "--pid" flag. By default
it stores the PID in ~/airflow/airflow-<service>.pid
Lastly, the patch adds support for different log file locations: log,
stdout, and stderr (respectively: --log-file, --stdout, --stderr). By
default these are stored in ~/airflow/airflow-<service>.log/out/err.
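A condensed sketch of the signal-propagation idea (illustrative, not the actual daemon code):

```python
import os
import signal

def propagate_sigterm(signum, frame):
    """Forward SIGTERM to the whole process group so children exit too."""
    # Restore the default handler first so the re-delivered signal
    # terminates this process instead of re-entering the handler.
    signal.signal(signum, signal.SIG_DFL)
    os.killpg(os.getpgid(0), signum)

signal.signal(signal.SIGTERM, propagate_sigterm)
```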
* Resolves ISSUE-852
BaseOperator silently accepts any arguments. This deprecates the
behavior with a warning that says it will be forbidden in Airflow 2.0.
This PR also turns on DeprecationWarnings by default, which in turn
revealed that inspect.getargspec is deprecated. Here it is replaced by
`inspect.signature` (Python 3) or `funcsigs.signature` (Python 2).
Lastly, this brought to light that example_http_operator was
passing an illegal argument.
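A hedged sketch of the deprecation pattern described (the names and warning text are illustrative, not the actual BaseOperator code):

```python
import warnings

class BaseOperator(object):
    def __init__(self, task_id, **kwargs):
        if kwargs:
            # Previously these extras were silently ignored; now they warn.
            warnings.warn(
                "Invalid arguments were passed to BaseOperator: {}. "
                "Passing unrecognized arguments will be forbidden in "
                "Airflow 2.0.".format(list(kwargs)),
                category=PendingDeprecationWarning,
                stacklevel=2,
            )
        self.task_id = task_id
```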