Adding a doc entry for Connections
Parent: a6ed0fa017
Commit: 3b75027df1
3
TODO.md
@@ -1,7 +1,7 @@
TODO
-----
#### UI
* Run button / backfill wizard
* Backfill form
* Add templating to adhoc queries
* Charts: better error handling

@@ -16,7 +16,6 @@ TODO
#### Backend
* Add a run_only_latest flag to BaseOperator, runs only most recent task instance where deps are met
* Pickle all the THINGS!
* Add priority_weight(Int) to BaseOperator, +@property subtree_priority
* Distributed scheduler
* Add decorator to timeout imports on master process [lib](https://github.com/pnpnpn/timeout-decorator)
* Raise errors when setting dependencies on task in foreign DAGs

@@ -4,7 +4,7 @@ Concepts
Operators
'''''''''

Operators allows to generate a certain type of task on the graph. There
Operators allow for generating a certain type of task on the graph. There
are 3 main types of operators:

- **Sensor:** Waits for events to happen, it could be a file appearing
@@ -58,11 +58,38 @@ arbitrary sets of tasks. The list of pools is managed in the UI
(``Menu -> Admin -> Pools``) by giving the pools a name and assigning
it a number of worker slots. Tasks can then be associated with
one of the existing pools by using the ``pool`` parameter when
creating tasks (aka instantiating operators).
creating tasks (instantiating operators).

The ``pool`` parameter can
be used in conjunction with ``priority_weight`` to define priorities
in the queue, and which tasks get executed first as slots open up in the
pool. The default ``priority_weight`` is ``1``, and can be bumped to any
number. When sorting the queue to evaluate which task should be executed
next, we use the task's ``priority_weight`` summed with the
``priority_weight`` of all tasks downstream from this task. This way you can
bump a specific important task and the whole path to that task gets
prioritized accordingly.
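
As a rough illustration of the queue ordering described above, the sketch below sums each task's ``priority_weight`` with that of every task downstream of it and sorts the queue by that total. It is plain Python rather than Airflow code: the ``downstream`` mapping, the task names, and the ``all_downstream`` / ``effective_weight`` helpers are hypothetical, and the scheduler's actual implementation may differ.

```python
# Hypothetical DAG: task name -> names of its directly downstream tasks.
downstream = {
    "extract": ["transform"],
    "transform": ["load"],
    "load": [],
    "cleanup": [],
}

# Per-task priority_weight; the documented default is 1.
priority_weight = {"extract": 1, "transform": 1, "load": 10, "cleanup": 1}


def all_downstream(task):
    """Return every task reachable downstream of `task`, each counted once."""
    seen = set()
    stack = list(downstream[task])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(downstream[current])
    return seen


def effective_weight(task):
    """Own priority_weight plus the priority_weight of all downstream tasks."""
    return priority_weight[task] + sum(
        priority_weight[t] for t in all_downstream(task)
    )


# Tasks with the highest effective weight come first in the queue.
queue = sorted(downstream, key=effective_weight, reverse=True)
print(queue)  # ['extract', 'transform', 'load', 'cleanup']
```

Bumping ``load`` to ``10`` here lifts ``extract`` and ``transform`` ahead of ``cleanup``, because both sit on the path leading to ``load``.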

Tasks will be scheduled as usual while the slots fill up. Once capacity is
reached, runnable tasks get queued and their state will show as such in the
UI. As slots free up, queued up tasks start running.
UI. As slots free up, queued up tasks start running based on the
``priority_weight`` (of the task and its descendants).

Note that by default tasks aren't assigned to any pool and their
execution parallelism is only limited to the executor's setting.

Connections
'''''''''''

The connection information to external systems is stored in the Airflow
metadata database and managed in the UI (``Menu -> Admin -> Connections``).
A ``conn_id`` is defined there, with hostname / login / password / schema
information attached to it. Airflow pipelines can then simply refer
to the centrally managed ``conn_id`` without having to hard code any
of this information anywhere.

Many connections with the same ``conn_id`` can be defined. When that
is the case, and when a **hook** uses the ``get_connection`` method
from ``BaseHook``, Airflow will choose one connection randomly, allowing
for some basic load balancing and some fault tolerance when used in
conjunction with retries.
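
The random selection among connections sharing a ``conn_id`` can be pictured with the minimal sketch below. It does not call Airflow: the in-memory ``connections`` list stands in for rows of the metadata database, and ``pick_connection``, the ``conn_id`` values, and the host names are hypothetical names chosen for illustration only.

```python
import random

# Hypothetical stand-in for connection rows in the metadata database.
# Two entries share the same conn_id, as the text above allows.
connections = [
    {"conn_id": "mysql_replica", "host": "db1.example.com"},
    {"conn_id": "mysql_replica", "host": "db2.example.com"},
    {"conn_id": "hive_default", "host": "hive.example.com"},
]


def pick_connection(conn_id, rng=random):
    """Pick one connection at random among those matching `conn_id`."""
    candidates = [c for c in connections if c["conn_id"] == conn_id]
    if not candidates:
        raise KeyError("no connection defined for conn_id %r" % conn_id)
    return rng.choice(candidates)


# Repeated lookups (for example across task retries) may land on
# different hosts, which is where the basic load balancing comes from.
for _ in range(3):
    print(pick_connection("mysql_replica")["host"])
```

If one of the hosts is down, a retried task has a chance of drawing the other one on its next attempt, which is the fault tolerance the paragraph refers to.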

@@ -1,7 +1,7 @@
Data Profiling
==============

Part of being a productive data ninja is about having the right weapons to
Part of being productive with data is about having the right weapons to
profile the data you are working with. Airflow provides a simple query
interface to write sql and get results quickly, and a charting application
letting you visualize data.
@@ -24,7 +24,7 @@ You can even use the same templating and macros available when writing
airflow pipelines, parameterizing your queries and modifying parameters
directly in the URL.

These charts ain't Tableau, but they're easy to create, modify and share.
These charts are basic, but they're easy to create, modify and share.

Chart Screenshot
................