Define a custom plugin
Full example
from aztk.spark.models import ClusterConfiguration  # import path assumed; adjust to your AZTK version
from aztk.spark.models.plugins import PluginConfiguration, PluginFile, PluginPort, PluginTarget, PluginTargetRole

cluster_config = ClusterConfiguration(
    ...,  # Other config
    plugins=[
        PluginConfiguration(
            name="my-custom-plugin",
            files=[
                PluginFile("file.sh", "/my/local/path/to/file.sh"),
                PluginFile("data/one.json", "/my/local/path/to/data/one.json"),
                PluginFile("data/two.json", "/my/local/path/to/data/two.json"),
            ],
            execute="file.sh",  # Must be one of the files defined in the file list and match its target path
            env=dict(
                SOME_ENV_VAR="foo"
            ),
            args=["arg1"],  # These arguments are passed to your execute script
            ports=[
                PluginPort(internal=1234),  # Internal only (for node communication, for example)
                PluginPort(internal=2345, public=True),  # Open the port to the public (reachable when you ssh in). Used for a UI, for example
            ],
            # Pick where you want the plugin to run
            target=PluginTarget.Host,  # The script will run on the host. The default is to run in the Spark container
            target_role=PluginTargetRole.All,  # Run on the master, on the workers, or on all nodes. Use environment variables (see below) for different master/worker config
        )
    ]
)
Parameters
PluginConfiguration
name | required | str
Name of your plugin. This is used to create a folder, so a simple name made of letters, dashes, and underscores only is recommended.
files | required | List[PluginFile|TextPluginFile]
List of files to upload.
execute | required | str
Script to execute. This script must be defined in the files above and must match its remote path.
args | optional | List[str]
List of arguments to be passed to your execute script.
env | optional | dict
Environment variables to make available in the script. This can be used to pass arguments to your script instead of args.
ports | optional | List[PluginPort]
List of ports to open if the script is running in a container. A port can also be marked public, in which case it is accessible when you ssh into the master node.
target | optional | PluginTarget
Defines where the execute script runs. Possible values are PluginTarget.SparkContainer (default) and PluginTarget.Host.
target_role | optional | PluginTargetRole
Whether the plugin should run only on the master, only on the workers, or on all nodes. You can use environment variables (see below) to have different master/worker configuration.
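Two of the rules above (a simple plugin name, and an execute value that matches one of the file targets) are easy to get wrong, so it can help to check them before creating a cluster. The following is a minimal sketch of such a check in plain Python. It does not use or reflect any AZTK API: `check_plugin` is a hypothetical helper, and allowing digits in the name is an assumption on top of the "letters, dashes, underscores" recommendation.

```python
import re

# Assumption: digits are accepted in addition to letters, dashes, and underscores.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def check_plugin(name, file_targets, execute):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: `file_targets` is the list of target paths given
    to the plugin files, `execute` is the script the plugin should run.
    """
    problems = []
    if not SAFE_NAME.fullmatch(name):
        problems.append(
            f"name {name!r} should contain only letters, digits, dashes, and underscores"
        )
    if execute not in file_targets:
        problems.append(
            f"execute {execute!r} does not match any file target: {sorted(file_targets)}"
        )
    return problems


if __name__ == "__main__":
    targets = ["file.sh", "data/one.json", "data/two.json"]
    print(check_plugin("my-custom-plugin", targets, "file.sh"))
    print(check_plugin("my plugin!", targets, "run.sh"))
```

Returning a list of messages rather than raising on the first problem lets you report every issue in one pass.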
PluginFile
target | required | str
Where the file should be dropped, relative to the plugin working directory.
local_path | required | str
Path to the local file you want to upload (this could come from the plugin's parameters).
TextPluginFile
target | required | str
Where the file should be dropped, relative to the plugin working directory.
content | required | str | io.StringIO
Content of the file to upload, given as a string or an io.StringIO (this could come from the plugin's parameters).
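Since content is typed as str or io.StringIO, a file can be generated in memory rather than read from disk. The sketch below shows one way to build such content with the standard library only. `render_json_content` and the settings it serializes are hypothetical names used for illustration; how AZTK consumes the value is not shown here.

```python
import io
import json


def render_json_content(settings):
    """Serialize `settings` into an in-memory text buffer (hypothetical helper)."""
    buffer = io.StringIO()
    json.dump(settings, buffer, indent=2, sort_keys=True)
    buffer.seek(0)  # rewind so a later reader starts from the beginning
    return buffer


# Example values are made up for illustration.
content = render_json_content({"mode": "batch", "workers": 4})

if __name__ == "__main__":
    print(content.getvalue())
```

Either the buffer itself or the string returned by `content.getvalue()` matches the documented types for the content parameter.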
PluginPort
internal | required | int
Internal port to open on the Docker container.
public | optional | bool
Whether the port should be open publicly (default: False).
Environment variables available in the plugin
AZTK provides a few environment variables that can be used in your plugin script:
- AZTK_IS_MASTER: Whether the plugin is running on the master node. Can be either true or false.
- AZTK_IS_WORKER: Whether a worker is set up on the current node (this might also be the master if you have worker_on_master set to true). Can be either true or false.
- AZTK_MASTER_IP: Internal IP of the master.
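If your execute script is written in Python, or calls into a Python helper, these variables can be read from the process environment. The following is a rough sketch under a few assumptions: the function names are hypothetical, a missing variable is treated as false, and the comparison against "true" is made case-insensitive as a precaution.

```python
import os


def read_aztk_env(environ=None):
    """Collect the AZTK variables described above into a dict.

    Assumption: an unset variable counts as false / unknown.
    """
    environ = os.environ if environ is None else environ

    def flag(key):
        return environ.get(key, "false").strip().lower() == "true"

    return {
        "is_master": flag("AZTK_IS_MASTER"),
        "is_worker": flag("AZTK_IS_WORKER"),
        "master_ip": environ.get("AZTK_MASTER_IP"),
    }


def describe_role(info):
    """Summarize the node role; a node can be both master and worker."""
    if info["is_master"] and info["is_worker"]:
        return "master+worker"
    if info["is_master"]:
        return "master"
    if info["is_worker"]:
        return "worker"
    return "unknown"


if __name__ == "__main__":
    info = read_aztk_env()
    print(f"role={describe_role(info)} master_ip={info['master_ip']}")
```

Passing the environment as an argument keeps the logic easy to exercise locally with a plain dict before running it on a cluster.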
Debug your plugin
When your plugin is not working as expected, there are a few things you can do to investigate:
- Check the logs. You can use either the debug tool or BatchLabs.
- Navigate to startup/wd/logs/plugins
- If you see a file named <your-plugin-name>.txt under that folder, your plugin started correctly, and you can check this file to see what your execute script logged.
- If this file doesn't exist, the script was not run on this node. There could be multiple reasons for this:
  - If you want your plugin to run on the Spark container, check the startup/wd/logs/docker.log file for information about this.
  - If you want your plugin to run on the host, check startup/stdout.txt and startup/stderr.txt.
  The log could mention that you picked the wrong target or target role for that plugin, which is why the plugin is not running on this node.
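The checks above can be sketched as a small script, assuming you have already downloaded a node's files into a local directory that mirrors the layout shown (startup/...). `diagnose_plugin` and the `target` strings "host" and "spark_container" are hypothetical names chosen for this illustration, not part of AZTK.

```python
from pathlib import Path


def diagnose_plugin(node_files_root, plugin_name, target="spark_container"):
    """Return (started, files_to_read) for one node's downloaded files.

    started is True when the plugin log exists. Otherwise files_to_read
    lists the logs worth checking for the given target.
    """
    root = Path(node_files_root)
    plugin_log = root / "startup" / "wd" / "logs" / "plugins" / f"{plugin_name}.txt"
    if plugin_log.is_file():
        # The plugin started; its log holds what the execute script printed.
        return True, [plugin_log]
    if target == "host":
        # Host plugins: look at the startup task output.
        return False, [root / "startup" / "stdout.txt", root / "startup" / "stderr.txt"]
    # Spark container plugins: look at the docker log.
    return False, [root / "startup" / "wd" / "logs" / "docker.log"]


if __name__ == "__main__":
    started, files = diagnose_plugin(".", "my-custom-plugin", target="host")
    print("started" if started else "not started", [str(f) for f in files])
```

The function only tells you where to look. Reading the listed files is still needed to find out whether a wrong target or target role kept the plugin from running.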