Update DAG Serialization docs (#10711)
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Parent: 9ac882e6cc
Commit: f7fad67df3
|
@@ -33,7 +33,8 @@ With **DAG Serialization** we aim to decouple the webserver from DAG parsing
|
||||||
which would make the Webserver very light-weight.
|
which would make the Webserver very light-weight.
|
||||||
|
|
||||||
As shown in the image above, when using this feature,
|
As shown in the image above, when using this feature,
|
||||||
the Scheduler parses the DAG files, serializes them in JSON format and saves them in the Metadata DB.
|
the Scheduler parses the DAG files, serializes them in JSON format and saves them in the Metadata DB
|
||||||
|
as :class:`airflow.models.serialized_dag.SerializedDagModel` model.
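The round trip described above (the Scheduler serializes a DAG to JSON and stores it, the Webserver reads it back) can be sketched with the standard library alone. This is only an illustration: the ``metadata_db`` dictionary, the ``scheduler_save`` and ``webserver_load`` helpers, and the simplified DAG dictionary are hypothetical stand-ins and are not Airflow's ``SerializedDagModel`` API.

```python
import json

# Hypothetical stand-in for the serialized_dag table: dag_id -> JSON string.
metadata_db = {}


def scheduler_save(dag):
    """Serialize a (simplified) DAG description to JSON and store it."""
    metadata_db[dag["dag_id"]] = json.dumps(dag, sort_keys=True)


def webserver_load(dag_id):
    """Read the stored JSON and de-serialize it, without parsing any DAG file."""
    return json.loads(metadata_db[dag_id])


# A made-up DAG description used only for this illustration.
example_dag = {
    "dag_id": "example",
    "tasks": [
        {"task_id": "extract", "downstream": ["load"]},
        {"task_id": "load", "downstream": []},
    ],
}

scheduler_save(example_dag)
restored = webserver_load("example")
```

The point of the sketch is that the reader side only needs the stored JSON, not the original Python file.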
|
||||||
|
|
||||||
The Webserver now instead of having to parse the DAG file again, reads the
|
The Webserver now instead of having to parse the DAG file again, reads the
|
||||||
serialized DAGs in JSON, de-serializes them, creates the DagBag and uses it
|
serialized DAGs in JSON, de-serializes them, creates the DagBag and uses it
|
||||||
|
@@ -44,9 +45,15 @@ instead of loading an entire DagBag when the WebServer starts we only load each
|
||||||
Serialized Dag table. This helps reduce Webserver startup time and memory. The reduction is notable
|
Serialized Dag table. This helps reduce Webserver startup time and memory. The reduction is notable
|
||||||
when you have a large number of DAGs.
|
when you have a large number of DAGs.
|
||||||
|
|
||||||
Below is the screenshot of the ``serialized_dag`` table in Metadata DB:
|
You can enable the source code to be stored in the database to make it completely independent from DAG files.
|
||||||
|
This is not necessary if your files are embedded in a Docker image or you can otherwise provide
|
||||||
|
them to the webserver. The data is stored in the :class:`airflow.models.dagcode.DagCode` model.
|
||||||
|
|
||||||
.. image:: img/serialized_dag_table.png
|
The last element is rendering template fields. When serialization is enabled, templates are not rendered
|
||||||
|
to requests, but a copy of the field contents is saved before the task is executed on worker.
|
||||||
|
The data is stored in the :class:`airflow.models.renderedtifields.RenderedTaskInstanceFields` model.
|
||||||
|
To limit the excessive growth of the database, only the most recent entries are kept and older entries
|
||||||
|
are purged.
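The "keep only the most recent entries" behaviour can be illustrated with a small in-memory sketch. It is not how Airflow implements the purge; the ``RenderedFieldsStore`` class and its methods are hypothetical, and "most recent" here means most recently saved rather than latest execution date.

```python
from collections import deque


class RenderedFieldsStore:
    """Hypothetical in-memory stand-in for a rendered-fields table."""

    def __init__(self, max_per_task):
        self.max_per_task = max_per_task
        self._rows = {}  # (dag_id, task_id) -> deque of (execution_date, fields)

    def save(self, dag_id, task_id, execution_date, rendered_fields):
        key = (dag_id, task_id)
        if key not in self._rows:
            # A bounded deque silently drops the oldest entry once it is full.
            self._rows[key] = deque(maxlen=self.max_per_task)
        self._rows[key].append((execution_date, rendered_fields))

    def entries(self, dag_id, task_id):
        return list(self._rows.get((dag_id, task_id), ()))


store = RenderedFieldsStore(max_per_task=3)
for day in range(1, 6):
    store.save("example_dag", "t1", f"2020-09-0{day}", {"bash_command": f"echo {day}"})

kept_dates = [date for date, _ in store.entries("example_dag", "t1")]
```

After five saves with a limit of three, only the last three entries remain for that task.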
|
||||||
|
|
||||||
Enable Dag Serialization
|
Enable Dag Serialization
|
||||||
------------------------
|
------------------------
|
||||||
|
@@ -62,16 +69,19 @@ Add the following settings in ``airflow.cfg``:
|
||||||
# You can also update the following default configurations based on your needs
|
# You can also update the following default configurations based on your needs
|
||||||
min_serialized_dag_update_interval = 30
|
min_serialized_dag_update_interval = 30
|
||||||
min_serialized_dag_fetch_interval = 10
|
min_serialized_dag_fetch_interval = 10
|
||||||
|
max_num_rendered_ti_fields_per_task = 30
|
||||||
|
|
||||||
* ``store_serialized_dags``: This flag decides whether to serialise DAGs and persist them in DB.
|
* ``store_serialized_dags``: This option decides whether to serialise DAGs and persist them in DB.
|
||||||
If set to True, Webserver reads from DB instead of parsing DAG files
|
If set to True, Webserver reads from DB instead of parsing DAG files
|
||||||
* ``store_dag_code``: This flag decides whether to persist DAG files code in DB.
|
* ``store_dag_code``: This option decides whether to persist DAG files code in DB.
|
||||||
If set to True, Webserver reads file contents from DB instead of trying to access files in a DAG folder.
|
If set to True, Webserver reads file contents from DB instead of trying to access files in a DAG folder.
|
||||||
* ``min_serialized_dag_update_interval``: This flag sets the minimum interval (in seconds) after which
|
* ``min_serialized_dag_update_interval``: This flag sets the minimum interval (in seconds) after which
|
||||||
the serialized DAG in DB should be updated. This helps in reducing database write rate.
|
the serialized DAG in DB should be updated. This helps in reducing database write rate.
|
||||||
* ``min_serialized_dag_fetch_interval``: This flag controls how often a SerializedDAG will be re-fetched
|
* ``min_serialized_dag_fetch_interval``: This option controls how often a SerializedDAG will be re-fetched
|
||||||
from the DB when it's already loaded in the DagBag in the Webserver. Setting this higher will reduce
|
from the DB when it's already loaded in the DagBag in the Webserver. Setting this higher will reduce
|
||||||
load on the DB, but at the expense of displaying a possibly stale cached version of the DAG.
|
load on the DB, but at the expense of displaying a possibly stale cached version of the DAG.
|
||||||
|
* ``max_num_rendered_ti_fields_per_task``: This option controls maximum number of Rendered Task Instance
|
||||||
|
Fields (Template Fields) per task to store in the Database.
|
||||||
|
|
||||||
If you are updating Airflow from <1.10.7, please do not forget to run ``airflow db upgrade``.
|
If you are updating Airflow from <1.10.7, please do not forget to run ``airflow db upgrade``.
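The trade-off behind ``min_serialized_dag_fetch_interval`` (fewer DB reads versus a possibly stale DAG) can be sketched as a time-based cache. This is a minimal sketch under stated assumptions: ``CachedDagLoader``, its ``fetch_from_db`` callback and the injectable ``clock`` are hypothetical names for illustration and are not part of Airflow's DagBag.

```python
import time


class CachedDagLoader:
    """Hypothetical cache that re-fetches a DAG only after a minimum interval."""

    def __init__(self, fetch_from_db, min_fetch_interval, clock=time.monotonic):
        self._fetch = fetch_from_db          # callable: dag_id -> DAG object
        self._interval = min_fetch_interval  # seconds
        self._clock = clock                  # injectable so the sketch is testable
        self._cache = {}                     # dag_id -> (fetched_at, dag)

    def get_dag(self, dag_id):
        now = self._clock()
        cached = self._cache.get(dag_id)
        if cached is not None and now - cached[0] < self._interval:
            # Still fresh enough: serve the cached copy, which may be stale.
            return cached[1]
        dag = self._fetch(dag_id)
        self._cache[dag_id] = (now, dag)
        return dag


# Usage with a fake clock and a fake DB fetch that counts its calls.
fetch_calls = []
fake_now = [0.0]


def fake_fetch(dag_id):
    fetch_calls.append(dag_id)
    return {"dag_id": dag_id, "version": len(fetch_calls)}


loader = CachedDagLoader(fake_fetch, min_fetch_interval=10, clock=lambda: fake_now[0])
first = loader.get_dag("example")
fake_now[0] = 5.0
second = loader.get_dag("example")   # within the interval: served from cache
fake_now[0] = 10.0
third = loader.get_dag("example")    # interval elapsed: fetched again
```

A larger interval means fewer calls to the fetch function, at the cost of serving an older copy for longer.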
|
||||||
|
|
||||||
|
|
Binary file: docs/img/serialized_dag_table.png
Binary file not shown. Size before: 406 KiB