.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements. See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership. The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied. See the License for the
   specific language governing permissions and limitations
   under the License.

DAG Serialization
=================

To make the Airflow Webserver stateless (almost!), Airflow >= 1.10.7 supports
DAG Serialization and DB Persistence.

.. image:: img/dag_serialization.png

Without DAG Serialization and persistence in the DB, both the Webserver and the Scheduler
need access to the DAG files, and both of them parse the DAG files.

With **DAG Serialization**, we aim to decouple the Webserver from DAG parsing,
which makes the Webserver very lightweight.

As shown in the image above, when this feature is enabled, the Scheduler parses the DAG files,
serializes them in JSON format and saves them in the Metadata DB.

Instead of parsing the DAG files again, the Webserver now reads the serialized DAGs in JSON,
deserializes them, creates the DagBag from them and uses it to render the UI.

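The Scheduler-to-Webserver round trip described above can be pictured with the following minimal sketch. It is not Airflow's implementation: it assumes a DAG reduced to a plain dict, uses an in-memory SQLite database as a stand-in for the Metadata DB, and the functions ``scheduler_write`` and ``webserver_read``, as well as the table's columns, are hypothetical.

.. code-block:: python

    import json
    import sqlite3
    import time

    # Hypothetical, simplified DAG: a plain dict. Real Airflow DAGs are Python
    # objects with a much richer serialized form.
    dag = {
        "dag_id": "example_dag",
        "schedule_interval": "@daily",
        "tasks": [
            {"task_id": "extract", "downstream": ["load"]},
            {"task_id": "load", "downstream": []},
        ],
    }

    # In-memory stand-in for the Metadata DB; the columns are illustrative only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE serialized_dag "
        "(dag_id TEXT PRIMARY KEY, data TEXT, last_updated REAL)"
    )


    def scheduler_write(conn, dag):
        """Scheduler side: serialize the parsed DAG to JSON and upsert the row."""
        conn.execute(
            "INSERT OR REPLACE INTO serialized_dag (dag_id, data, last_updated) "
            "VALUES (?, ?, ?)",
            (dag["dag_id"], json.dumps(dag), time.time()),
        )
        conn.commit()


    def webserver_read(conn, dag_id):
        """Webserver side: load the JSON row and deserialize it; no DAG file needed."""
        row = conn.execute(
            "SELECT data FROM serialized_dag WHERE dag_id = ?", (dag_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


    scheduler_write(conn, dag)
    restored = webserver_read(conn, "example_dag")
    print(restored["tasks"][0]["task_id"])  # extract

The Webserver-side function never imports or executes the DAG file; it only needs read access to the database.
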
One of the key features implemented as part of DAG Serialization is that, instead of loading
an entire DagBag when the Webserver starts, we load each DAG on demand from the
Serialized DAG table. This reduces Webserver startup time and memory usage, and the reduction
is notable when you have a large number of DAGs.

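On-demand loading can be sketched as a small cache in front of the table. ``LazyDagBag`` below is a hypothetical class, not Airflow's ``DagBag``, and the two-column SQLite table is again an illustrative stand-in for the Metadata DB: nothing is deserialized at startup, and each DAG is read and cached the first time it is requested.

.. code-block:: python

    import json
    import sqlite3


    class LazyDagBag:
        """Hypothetical DagBag-like cache that loads DAGs only when requested."""

        def __init__(self, conn):
            self.conn = conn
            self.dags = {}  # dag_id -> deserialized DAG, filled on demand

        def get_dag(self, dag_id):
            if dag_id not in self.dags:
                row = self.conn.execute(
                    "SELECT data FROM serialized_dag WHERE dag_id = ?", (dag_id,)
                ).fetchone()
                if row is None:
                    return None  # unknown DAG: nothing to cache
                self.dags[dag_id] = json.loads(row[0])
            return self.dags[dag_id]


    # Illustrative table holding 1000 serialized DAGs.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE serialized_dag (dag_id TEXT PRIMARY KEY, data TEXT)")
    conn.executemany(
        "INSERT INTO serialized_dag VALUES (?, ?)",
        [(f"dag_{i}", json.dumps({"dag_id": f"dag_{i}"})) for i in range(1000)],
    )

    bag = LazyDagBag(conn)  # "startup": no DAG is loaded yet
    print(len(bag.dags))  # 0
    bag.get_dag("dag_42")
    print(len(bag.dags))  # 1

In this sketch the startup cost does not grow with the number of rows in the table; only the DAGs that are actually requested get deserialized.
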
Below is a screenshot of the ``serialized_dag`` table in the Metadata DB:

.. image:: img/serialized_dag_table.png

Enable DAG Serialization
------------------------

Add the following settings to ``airflow.cfg``:

.. code-block:: ini

    [core]
    store_serialized_dags = True
    min_serialized_dag_update_interval = 30

* ``store_serialized_dags``: Whether to serialize DAGs and persist them in the DB.
  If set to ``True``, the Webserver reads DAGs from the DB instead of parsing the DAG files.
* ``min_serialized_dag_update_interval``: The minimum interval (in seconds) after which
  the serialized DAG in the DB should be updated. This reduces the database write rate.

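The sketch below reads the two ``[core]`` settings with Python's standard ``configparser`` and shows the kind of check that ``min_serialized_dag_update_interval`` implies: a DAG is re-written only if its stored copy is at least that many seconds old. The ``should_write`` helper is hypothetical and simplified; it is not Airflow's code.

.. code-block:: python

    import configparser
    import time

    # The [core] settings shown above, parsed with the standard library.
    cfg = configparser.ConfigParser()
    cfg.read_string(
        "[core]\n"
        "store_serialized_dags = True\n"
        "min_serialized_dag_update_interval = 30\n"
    )

    store_serialized = cfg.getboolean("core", "store_serialized_dags")
    min_interval = cfg.getint("core", "min_serialized_dag_update_interval")
    print(store_serialized, min_interval)  # True 30


    def should_write(last_updated, now, min_interval):
        """Hypothetical check: skip the DB write if the stored row is too recent."""
        return last_updated is None or now - last_updated >= min_interval


    now = time.time()
    print(should_write(None, now, min_interval))      # True: never written
    print(should_write(now - 10, now, min_interval))  # False: written 10 s ago
    print(should_write(now - 45, now, min_interval))  # True: older than 30 s

A larger interval means fewer writes to the Metadata DB, at the cost of the Webserver seeing DAG changes later.
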
If you are upgrading Airflow from a version earlier than 1.10.7, do not forget to run ``airflow db upgrade``.

Limitations
-----------

The Webserver still needs access to the DAG files in the following cases,
which is why we said "almost" stateless:

* The **Rendered Template** tab still has to parse the Python file, as it needs all the details, such as
  the execution date and even the data passed by upstream tasks using XCom.
* **Code View** reads the DAG file and shows it using Pygments.
  However, it does not need to parse the Python file, so it is still a lightweight operation.

Using a different JSON Library
------------------------------

To use a JSON library other than the standard ``json`` library, such as ``ujson``, define a
``json`` variable in your local Airflow settings file (``airflow_local_settings.py``) as follows:

.. code:: python

    # airflow_local_settings.py
    import ujson

    # Bind ujson to the name json so Airflow uses it instead of the stdlib json module
    json = ujson

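If the same ``airflow_local_settings.py`` has to work on machines where ``ujson`` may not be installed, one possible variant (an assumption on our part, not something Airflow requires) falls back to the standard library:

.. code-block:: python

    # airflow_local_settings.py (hypothetical variant with a fallback)
    try:
        # Use ujson when it is installed ...
        import ujson as json
    except ImportError:
        # ... otherwise keep the standard library module, so the settings
        # file still imports cleanly
        import json

Either way, the module ends up exposing a ``json`` name with ``dumps`` and ``loads``, which is what the setting above relies on.
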