
# Define a custom plugin

## Full example


```python
from aztk.spark.models import ClusterConfiguration
from aztk.spark.models.plugins import (
    PluginConfiguration,
    PluginFile,
    PluginPort,
    PluginTarget,
    PluginTargetRole,
)

cluster_config = ClusterConfiguration(
    # ... other config
    plugins=[
        PluginConfiguration(
            name="my-custom-plugin",
            files=[
                PluginFile("file.sh", "/my/local/path/to/file.sh"),
                PluginFile("data/one.json", "/my/local/path/to/data/one.json"),
                PluginFile("data/two.json", "/my/local/path/to/data/two.json"),
            ],
            # Must be one of the files defined above and match its target path
            execute="file.sh",
            env=dict(
                SOME_ENV_VAR="foo"
            ),
            args=["arg1"],  # These arguments are passed to your execute script
            ports=[
                PluginPort(internal=1234),               # Internal only (e.g. for node-to-node communication)
                PluginPort(internal=2345, public=True),  # Also opened publicly (reachable when you SSH in), e.g. for a UI
            ],

            # Pick where you want the plugin to run
            target=PluginTarget.Host,          # Run the script on the host; the default is to run in the Spark container
            target_role=PluginTargetRole.All,  # Run on the master only, the workers only, or all nodes
        )
    ]
)
```
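
Once defined, the plugin travels with the rest of the cluster configuration. A minimal sketch of handing it to the SDK, assuming the `aztk.spark.Client` described in the SDK documentation and a `secrets_config` you have built elsewhere:

```python
import aztk.spark

# `secrets_config` is assumed to be a secrets configuration built elsewhere
client = aztk.spark.Client(secrets_config)

# Plugin files are uploaded and the execute script runs during node setup
cluster = client.create_cluster(cluster_config)
```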

## Parameters

### PluginConfiguration

- `name` *required* | `string`

  Name of your plugin (this is also used to create a folder, so a simple name using only letters, dashes, and underscores is recommended)

- `files` *required* | `List[PluginFile|TextPluginFile]`

  List of files to upload

- `execute` *required* | `str`

  Script to execute. This script must be one of the files defined above, and its value must match that file's target path

- `args` *optional* | `List[str]`

  List of arguments to be passed to your execute script

- `env` *optional* | `dict`

  Environment variables made available in the script (these can be used to pass values to your script instead of `args`)

- `ports` *optional* | `List[PluginPort]`

  List of ports to open if the script is running in a container. A port can also be marked as public, in which case it is accessible when you SSH into the master node.

- `target` *optional* | `PluginTarget`

  Where the execute script should run. Possible values are `PluginTarget.SparkContainer` (default) and `PluginTarget.Host`

- `target_role` *optional* | `PluginTargetRole`

  Whether the plugin should run only on the master, only on the workers, or on all nodes (see the sketch after this list). You can also use the environment variables described below to have different master/worker behavior within one script
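
For instance, to run a setup script on the host OS of the master node only, combine `target` and `target_role` as follows. A minimal sketch; the plugin name and file paths are illustrative, and `PluginTargetRole.Master` is assumed to exist alongside the `All` value shown above:

```python
from aztk.spark.models.plugins import (
    PluginConfiguration, PluginFile, PluginTarget, PluginTargetRole)

master_host_plugin = PluginConfiguration(
    name="master-host-setup",                           # illustrative name
    files=[PluginFile("setup.sh", "/local/setup.sh")],  # illustrative paths
    execute="setup.sh",
    target=PluginTarget.Host,             # run on the host, not in the Spark container
    target_role=PluginTargetRole.Master,  # run on the master node only
)
```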

### PluginFile

- `target` *required* | `str`

  Where the file should be placed, relative to the plugin working directory

- `local_path` *required* | `str`

  Path to the local file you want to upload (this could come from the plugin's parameters)

### TextPluginFile

- `target` *required* | `str`

  Where the file should be placed, relative to the plugin working directory

- `content` *required* | `str | io.StringIO`

  Content of the file, as either a string or an `io.StringIO` object
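
`TextPluginFile` lets you provide a file's content directly instead of pointing at a file on disk. A minimal sketch based on the parameters above, assuming `TextPluginFile` is importable from the same module as `PluginFile`; the file name and content are illustrative:

```python
import io

from aztk.spark.models.plugins import TextPluginFile

# Content is built in memory rather than read from a local file
settings = io.StringIO('{"verbose": true}')

files = [
    # Dropped at <plugin working directory>/data/settings.json on the node
    TextPluginFile("data/settings.json", settings),
]
```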

### PluginPort

- `internal` *required* | `int`

  Internal port to open on the Docker container

- `public` *optional* | `bool`

  Whether the port should also be opened publicly (default: `False`)

## Environment variables available in the plugin

AZTK provides a few environment variables that can be used in your plugin script:

- `AZTK_IS_MASTER`: whether the plugin is running on the master node. Either `true` or `false`
- `AZTK_IS_WORKER`: whether a worker is set up on the current node (this can also be the master if `worker_on_master` is set to `true`). Either `true` or `false`
- `AZTK_MASTER_IP`: internal IP of the master node
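
Your execute script can branch on these variables to behave differently on the master and on the workers. A minimal sketch, written in Python for illustration (a shell script would read `$AZTK_IS_MASTER` and friends instead):

```python
import os

# AZTK sets these variables before running your execute script
is_master = os.environ.get("AZTK_IS_MASTER") == "true"
master_ip = os.environ.get("AZTK_MASTER_IP")

if is_master:
    print("Running master-side setup")
else:
    print("Running worker-side setup, master is at {}".format(master_ip))
```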

## Debug your plugin

When your plugin is not working as expected, there are a few things you can do to investigate issues.

Check the logs. You can either use the debug tool or BatchLabs, and navigate to `startup/wd/logs/plugins`.

- If you see a file named `<your-plugin-name>.txt` under that folder, your plugin started correctly, and you can check this file to see what your execute script logged.

- If this file doesn't exist, the script was not run on this node. There could be multiple reasons for this:

  - If you want your plugin to run in the Spark container, check the `startup/wd/logs/docker.log` file for information
  - If you want your plugin to run on the host, check `startup/stdout.txt` and `startup/stderr.txt`

  The logs may show that you picked the wrong target or target role for the plugin, which would explain why it is not running on this node.