Define a custom plugin

Full example


from aztk.spark.models import ClusterConfiguration
from aztk.spark.models.plugins import PluginConfiguration, PluginFile, PluginPort, PluginTarget, PluginTargetRole

cluster_config = ClusterConfiguration(
    # ... other cluster configuration
    plugins=[
        PluginConfiguration(
            name="my-custom-plugin",
            files=[
                PluginFile("file.sh", "/my/local/path/to/file.sh"),
                PluginFile("data/one.json", "/my/local/path/to/data/one.json"),
                PluginFile("data/two.json", "/my/local/path/to/data/two.json"),
            ],
            execute="file.sh",  # Must be one of the files listed above and match its target path
            env=dict(
                SOME_ENV_VAR="foo",
            ),
            args=["arg1"],  # Passed as arguments to the execute script
            ports=[
                PluginPort(internal=1234),               # Internal only (for example, for node-to-node communication)
                PluginPort(internal=2345, public=True),  # Also open publicly (reachable when you ssh in); useful for a UI
            ],

            # Pick where you want the plugin to run
            target=PluginTarget.Host,          # Run the script on the host; the default is to run in the Spark container
            target_role=PluginTargetRole.All,  # Run on the master, the workers, or all nodes; see the environment
                                               # variables below for different master/worker behavior
        )
    ]
)

Parameters

PluginConfiguration

| Name | Required? | Type | Description |
|------|-----------|------|-------------|
| name | required | str | Name of your plugin. It is used to create a folder, so a simple name made only of letters, dashes and underscores is recommended |
| files | required | List[Union[PluginFile, TextPluginFile]] | Files to upload for the plugin |
| execute | required | str | Script to execute. It must be one of the files defined above and must match that file's remote (target) path |
| args | optional | List[str] | Arguments passed to the execute script |
| env | optional | dict | Environment variables made available to the script. These can be used instead of args to pass values to the script |
| ports | optional | List[PluginPort] | Ports to open if the script runs in a container. A port can also be marked public, which makes it accessible when you ssh into the master node |
| target | optional | PluginTarget | Where the execute script runs. Possible values are PluginTarget.SparkContainer (default) and PluginTarget.Host |
| target_role | optional | PluginTargetRole | Whether the plugin runs on the master, the workers, or all nodes. The environment variables described below can be used to vary behavior between master and workers |
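
Two of the rules above (a simple plugin name, and an execute value that matches one of the file targets) are easy to get wrong. The following is a minimal sketch of a pre-flight check, assuming you hold the name, the list of file targets and the execute value as plain Python values. The helper check_plugin is hypothetical and not part of aztk.

```python
import re

# Hypothetical helper, not part of aztk.
# Accepts only letters, dashes and underscores, as recommended for plugin names.
_SIMPLE_NAME = re.compile(r"^[A-Za-z_-]+$")


def check_plugin(name, file_targets, execute):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    if not _SIMPLE_NAME.match(name):
        problems.append(
            f"name {name!r} should only contain letters, dashes and underscores"
        )
    if execute not in file_targets:
        problems.append(
            f"execute {execute!r} is not one of the file targets {sorted(file_targets)}"
        )
    return problems


targets = ["file.sh", "data/one.json", "data/two.json"]
print(check_plugin("my-custom-plugin", targets, "file.sh"))
print(check_plugin("my plugin", targets, "run.sh"))
```

The first call reports no problems; the second reports both the name and the missing execute target.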

PluginFile

| Name | Required? | Type | Description |
|------|-----------|------|-------------|
| target | required | str | Where the file should be dropped, relative to the plugin working directory |
| local_path | required | str | Path to the local file you want to upload (it could come from the plugin's parameters) |

TextPluginFile

| Name | Required? | Type | Description |
|------|-----------|------|-------------|
| target | required | str | Where the file should be dropped, relative to the plugin working directory |
| content | required | str or io.StringIO | Content of the file |
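
Because content accepts an io.StringIO as well as a plain string, a file can be generated in memory instead of being written to disk first. The sketch below only uses the standard library; the helper build_settings_file, the settings values and the target path in the final comment are made up for illustration, and the commented TextPluginFile call assumes the constructor takes target and content in that order.

```python
import io
import json


def build_settings_file(settings):
    """Serialize a settings dict into an in-memory text buffer."""
    buffer = io.StringIO()
    json.dump(settings, buffer, indent=2, sort_keys=True)
    buffer.seek(0)  # rewind so a consumer reads from the start
    return buffer


# Illustrative values only.
content = build_settings_file({"threshold": 3, "mode": "fast"})
print(content.getvalue())

# With aztk installed, the buffer could then be handed to the plugin, e.g.:
#   TextPluginFile("data/settings.json", content)   # assumed argument order
```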

PluginPort

| Name | Required? | Type | Description |
|------|-----------|------|-------------|
| internal | required | int | Internal port to open on the Docker container |
| public | optional | bool | Whether the port should be open publicly (default: False) |

Environment variables available in the plugin

AZTK provides a few environment variables that can be used in your plugin script:

  • AZTK_IS_MASTER: Whether the plugin is running on the master node
  • AZTK_IS_WORKER: Whether a worker is set up on the current node (this can also be the master node if worker_on_master is set to true)
  • AZTK_MASTER_IP: Internal IP of the master
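
If your execute script calls into Python, the variables above could be read roughly as follows. This is a sketch under assumptions: the helper read_aztk_env is hypothetical, the flags are assumed to be exported as the string "true" when set (check the values on your cluster), and the sample mapping stands in for the real environment.

```python
import os


def read_aztk_env(environ=os.environ):
    """Collect the AZTK variables listed above into a plain dict."""

    def flag(name):
        # Assumption: a flag counts as set when its value is the string "true".
        return environ.get(name, "").strip().lower() == "true"

    return {
        "is_master": flag("AZTK_IS_MASTER"),
        "is_worker": flag("AZTK_IS_WORKER"),
        "master_ip": environ.get("AZTK_MASTER_IP"),  # None when not set
    }


# Made-up sample values; on a cluster node, call read_aztk_env() with no argument.
sample = {
    "AZTK_IS_MASTER": "true",
    "AZTK_IS_WORKER": "false",
    "AZTK_MASTER_IP": "10.0.0.4",
}
info = read_aztk_env(sample)
if info["is_master"]:
    print("running master-only setup")
if info["is_worker"]:
    print("running worker setup, master at", info["master_ip"])
```

Passing the environment in as a mapping keeps the branching logic testable outside a cluster.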