By default `find_packages()` will find _any_ valid Python package, including those under `tests`. We don't want to install the test packages onto the Python path, so exclude them.
Closes #2597 from ashb/AIRFLOW-1594-dont-install-tests
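A minimal sketch of the `exclude=` pattern described above, run against a throwaway directory tree (the package names `mypkg` and `tests` are illustrative, not Airflow's actual layout):

```python
# Demonstrate find_packages(exclude=...) keeping test packages out of
# the result, so they would not be installed onto the Python path.
import os
import tempfile

from setuptools import find_packages

root = tempfile.mkdtemp()
for pkg in ("mypkg", "tests"):
    os.makedirs(os.path.join(root, pkg))
    open(os.path.join(root, pkg, "__init__.py"), "w").close()

# Without exclude, find_packages() would also pick up "tests".
found = find_packages(where=root, exclude=["tests", "tests.*"])
print(found)
```

The `tests.*` glob matters: it excludes sub-packages of `tests` as well, which a bare `"tests"` entry would miss.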
Clean up the way logging is done within Airflow. Remove the old logging.py and move to the airflow.utils.log.* interface. Remove logging setup from outside the settings/configuration code. Move away from string formatting to logging_function(msg, *args).
Closes #2592 from Fokko/AIRFLOW-1582-Improve-logging-structure
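The `logging_function(msg, *args)` style mentioned above defers string interpolation to the logging framework. A small sketch (the logger name and capture handler are illustrative):

```python
# Pass the format string and its arguments separately; the framework
# interpolates only when the record is actually rendered/emitted.
import logging

records = []

class _Capture(logging.Handler):
    def emit(self, record):
        records.append(record)

log = logging.getLogger("airflow.task.example")  # name is illustrative
log.setLevel(logging.INFO)
log.addHandler(_Capture())

task_id = "example_task"
# Old, eager style: log.info("Running task %s" % task_id)
# New, lazy style:
log.info("Running task %s", task_id)
```

Besides skipping interpolation when the level is disabled, this keeps the raw template and args on the record, which log aggregators can use.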
The string formatting should be done on the string, not on the exception that is being raised.
Closes #2583 from Fokko/AIRFLOW-1580-error-in-checkout-operator
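An illustration of the bug class being fixed, with plain `Exception` standing in for Airflow's exception type and a hypothetical message:

```python
# .format() must be called on the message string, not on the exception
# object that wraps it.
branch = "missing_branch"

# Buggy (misplaced parenthesis): evaluates Exception(...).format(branch),
# which raises AttributeError before the intended exception is raised:
#   raise Exception("Branch {} not found").format(branch)

# Fixed: format the string first, then build the exception.
err = Exception("Branch {} not found".format(branch))
print(err)
```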
The `to` field may sometimes need to be template-able, e.g. when you have a DAG that uses XCom to find the user to send the information to (for instance, a user submits a form, and based on the LDAP user we send that specific user the information). It's a rather easy fix to add `to` to the template-able options.
Closes #2577 from Acehaidrey/AIRFLOW-1574
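A minimal sketch of what "template-able" means here: the operator lists the field in `template_fields`, and the render step substitutes context values into it. The class below is illustrative, not Airflow's real `EmailOperator`, and `str.format` stands in for Jinja to stay dependency-free:

```python
# Fields listed in template_fields get rendered against the task
# context before execute() runs; adding "to" to the tuple is the fix.
class EmailLikeOperator:
    template_fields = ("to", "subject", "html_content")

    def __init__(self, to, subject, html_content):
        self.to = to
        self.subject = subject
        self.html_content = html_content

    def render(self, context):
        for field in self.template_fields:
            setattr(self, field, getattr(self, field).format(**context))

op = EmailLikeOperator(
    to="{user_email}",  # e.g. a value pulled from XCom in a real DAG
    subject="Report ready",
    html_content="Hello {user_email}",
)
op.render({"user_email": "someone@example.com"})
print(op.to)
```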
Logging in SparkSqlOperator does not work as intended. spark-sql internally redirects all logs to stdout (including stderr), which causes the current two-iterator logging to get stuck on the stderr pipe. This situation can lead to a deadlock: stderr can grow too big and start to block until it is consumed, which only happens when the process ends, so the process stalls.
Closes #2563 from Fokko/AIRFLOW-1562-Spark-sql-loggin-contains-deadlock
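The general fix pattern for this class of deadlock is to merge stderr into stdout and consume a single pipe, so no unread pipe can fill its OS buffer and block the child. A sketch with a stand-in child process:

```python
# Merge stderr into stdout (stderr=subprocess.STDOUT) and read one
# stream; iterating two pipes in sequence risks the unread one filling
# up and blocking the child until it is drained.
import subprocess
import sys

cmd = [
    sys.executable, "-c",
    "import sys; print('out line'); print('err line', file=sys.stderr)",
]

proc = subprocess.Popen(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,  # single stream: no second pipe to fill
)
lines = [line.decode().rstrip() for line in proc.stdout]
proc.wait()
print(lines)
```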
Dear Airflow maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
### JIRA
- [/] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-108
### Description
- [/] Here are some details about my PR, including screenshots of any UI changes:
  Adding an entry to the companies list in the README.md file.
### Tests
- [/] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Documentation change only.
### Commits
- [/] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  2. Subject is limited to 50 characters
  3. Subject does not end with a period
  4. Subject uses the imperative mood ("add", not "adding")
  5. Body wraps at 72 characters
  6. Body explains "what" and "why", not "how"
Closes #2554 from r39132/master
Added to currently **officially** using Airflow: [California Data Collaborative](http://californiadatacollaborative.org), powered by [ARGO Labs](http://www.argolabs.org)
Dear Airflow maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
- [x] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW-1384)
  - The California Data Collaborative is a unique coalition of forward-thinking municipal water managers in California who, along with ARGO, a startup non-profit that builds, operates, and maintains data infrastructures, are pioneering new standards for collaborating around and administering water data for millions of Californians.
    ARGO has deployed a hosted version of Airflow on AWS, and it is used to orchestrate data pipelines that parse water use data from participating utilities to power analytics. Furthermore, ARGO also uses Airflow to power a data infrastructure for citywide street maintenance via https://github.com/ARGO-SQUID
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: a change to README.md does not require unit testing.
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  2. Subject is limited to 50 characters
  3. Subject does not end with a period
  4. Subject uses the imperative mood ("add", not "adding")
  5. Body wraps at 72 characters
  6. Body explains "what" and "why", not "how"
Update README.md
Added to the currently **officially** using Airflow section of README.md: [California Data Collaborative](https://github.com/California-Data-Collaborative), powered by [ARGO Labs](http://www.argolabs.org)
Added CaDC/ARGO Labs to README.md
Please consider adding [Argo Labs](http://www.argolabs.org) to the Airflow users section.
**Context**
- The California Data Collaborative is a unique coalition of forward-thinking municipal water managers in California who, along with ARGO, a startup non-profit that builds, operates, and maintains data infrastructures, are pioneering new standards for collaborating around and administering water data for millions of Californians.
- ARGO has deployed a hosted version of Airflow on AWS, and it is used to orchestrate data pipelines that parse water use data from participating utilities to power analytics. Furthermore, ARGO also uses Airflow to power a data infrastructure for citywide street maintenance via https://github.com/ARGO-SQUID
Closes #2421 from vr00n/patch-3
template_fields contained only one entry, written as a plain string, which Python interpreted as a sequence of characters. That was breaking the render_template function (see ticket AIRFLOW-1521).
Closes #2534 from moe-nadal-ck/AIRFLOW-1521/fix_table_delete_operator_template_fields_list
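The underlying Python gotcha: a one-element "tuple" written without the trailing comma is just a parenthesized string, so iterating over it yields characters rather than field names. The field name `table` is illustrative:

```python
# Parentheses alone do not make a tuple; the comma does.
wrong = ('table')    # this is the str 'table'
right = ('table',)   # this is a one-element tuple

print(list(wrong))   # characters, which breaks template rendering
print(list(right))   # the intended single field name
```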
Make the Druid operator and hook more specific. This allows us to have a more flexible configuration, for example ingesting Parquet. Also get rid of the PyDruid extension, since it is more focused on querying Druid than on ingesting data; plain requests is sufficient to submit an indexing job. Add a test to the hive_to_druid operator to make sure it behaves as we expect. Furthermore, cleaned up the docstring a bit.
Closes #2378 from Fokko/AIRFLOW-1324-make-more-general-druid-hook-and-operator
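A hedged sketch of what "just requests is sufficient" looks like: POSTing an ingestion spec to the Druid overlord's task endpoint. The host, port, and spec below are illustrative, and the actual `requests` call is shown but not executed:

```python
# Build the JSON body for a Druid indexing task submission; the
# overlord accepts it at /druid/indexer/v1/task.
import json

druid_url = "http://druid-overlord:8090/druid/indexer/v1/task"  # illustrative host
index_spec = {
    "type": "index_hadoop",
    "spec": {"dataSchema": {"dataSource": "example_datasource"}},
}

body = json.dumps(index_spec)
headers = {"Content-Type": "application/json"}
# In a real hook, roughly:
#   import requests
#   resp = requests.post(druid_url, data=body, headers=headers)
#   task_id = resp.json()["task"]
print(druid_url)
```

This is also why PyDruid could be dropped: submitting an indexing job is a single HTTP POST, with polling of the task status afterwards.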
There were unhandled cases for exceptions when importing Fernet in models.py. This seems to be a remnant of a previous refactor, replacing logic that depended on the definition of a global variable for Fernet if it was imported correctly. We now catch all exceptions from the get_fernet function, given that other functions already handle it that way and the only error-handling option here is to not use encryption.
Closes #2527 from edgarRd/erod-fernet-error-handling
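A sketch of the pattern described: any failure to obtain a Fernet object is treated as "encryption unavailable" rather than propagating. `get_fernet` and `encrypt_if_possible` here are stand-ins, not Airflow's actual functions:

```python
# Broad catch around get_fernet(): import errors, missing keys, and
# bad keys all degrade to storing the value unencrypted.
def get_fernet():
    # In Airflow this imports cryptography and builds a Fernet from
    # the configured key; either step can raise. Simulated failure:
    raise ImportError("cryptography is not installed")

def encrypt_if_possible(value):
    try:
        fernet = get_fernet()
    except Exception:  # deliberately broad: any failure means no crypto
        return value, False
    return fernet.encrypt(value.encode()), True

stored, is_encrypted = encrypt_if_possible("my connection password")
print(is_encrypted)
```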
There was a merge conflict on the migration hash for the down revision when two commits that included migrations were merged. This commit restores the chain of revisions for the migrations, pointing to the last one. The job_id index migration was regenerated from the top migration.
Closes #2524 from edgarRd/erod-ti-jobid-index-fix
Views showing model listings had large page sizes, which made page loading really slow client-side, mostly due to DOM processing and JS plugin rendering. The page size was also inconsistent across some listings. This commit introduces a configurable page size, defaulting to page_size = 100. The same page size is applied to all the model views controlled by flask_admin, for consistency.
Closes #2497 from edgarRd/erod-ui-page-size-conf
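A minimal sketch of the "configurable with a default of 100" behavior, using stdlib `configparser`; the section and key names are illustrative and not necessarily Airflow's exact configuration keys:

```python
# Read the page size from config, falling back to 100 when the key is
# absent; the resulting value would be passed to every model view.
import configparser

cfg = configparser.ConfigParser()
cfg.read_string("""
[webserver]
page_size = 50
""")

page_size = cfg.getint("webserver", "page_size", fallback=100)
default_page_size = cfg.getint("webserver", "missing_key", fallback=100)
print(page_size, default_page_size)
```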
The job_id column is unindexed in TaskInstance, yet it was used as the default sort column in TaskInstanceView. This commit adds the migration required to create the index on task_instance.job_id on future db upgrades.
Closes #2520 from edgarRd/erod-ti-jobid-index
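The effect of the migration, sketched with stdlib sqlite3 rather than Alembic so it is self-contained; the table shape and index name are illustrative:

```python
# Create an index on task_instance.job_id so sorting/filtering by it
# no longer requires a full table scan.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (task_id TEXT, job_id INTEGER)")
conn.execute("CREATE INDEX ti_job_id ON task_instance (job_id)")

# PRAGMA index_list rows are (seq, name, unique, origin, partial).
indexes = [row[1] for row in conn.execute("PRAGMA index_list('task_instance')")]
print(indexes)
```

In the real codebase this lives in an Alembic migration (`op.create_index(...)` in `upgrade()`, `op.drop_index(...)` in `downgrade()`) so existing databases pick it up on upgrade.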
PickleType in XCom allows remote code execution. In order to deprecate it without changing the MySQL table schema, change PickleType to LargeBinary, because they both map to the blob type in MySQL. Add "enable_pickling" to the function signature to control whether pickle or JSON is used. "enable_pickling" should also be added to the core section of airflow.cfg.
Picked up where https://github.com/apache/incubator-airflow/pull/2132 left off. Took this PR, fixed merge conflicts, added documentation/tests, fixed broken tests/operators, and fixed the python3 issues.
Closes #2518 from aoen/disable-pickle-type
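A sketch of the dual serialization path: with `enable_pickling` the value is pickled, otherwise it is JSON-encoded, and both produce bytes that fit the same BLOB/LargeBinary column. The function name is illustrative, not Airflow's actual XCom code:

```python
# Both branches yield bytes for the blob column, but only the JSON
# branch is safe to deserialize: json.loads cannot execute code,
# whereas pickle.loads can.
import json
import pickle

def serialize_value(value, enable_pickling):
    if enable_pickling:
        return pickle.dumps(value)
    return json.dumps(value).encode("utf-8")

payload = {"rows": 42}
as_pickle = serialize_value(payload, enable_pickling=True)
as_json = serialize_value(payload, enable_pickling=False)
print(type(as_pickle), type(as_json))
```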
The details here: a PR for this JIRA already exists (https://github.com/apache/incubator-airflow/pull/2318). The issue is that in Python 2.7, not all literals are automatically unicode as they are in Python 3. That is the root cause, and it can be fixed simply by explicitly stating that all literals should be treated as unicode, via an import from the `__future__` module. https://stackoverflow.com/questions/3235386/python-using-format-on-a-unicode-escaped-string also explains this same solution, which I found helpful.
Closes #2496 from Acehaidrey/master
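The fix referred to above is this one line; under Python 2 it makes every string literal in the module unicode (matching Python 3 behavior), and on Python 3 it is a no-op. The strings below are illustrative:

```python
# Must appear at the top of the module, before other statements.
# On Python 2 this makes "héllo {}" a unicode literal, so .format()
# with non-ASCII content works; on Python 3 it changes nothing.
from __future__ import unicode_literals

greeting = "héllo {}".format("wörld")
print(greeting)
```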