# Define a custom plugin

## Full example
```python
from aztk.spark.models import ClusterConfiguration
from aztk.spark.models.plugins import (
    PluginConfiguration,
    PluginFile,
    PluginPort,
    PluginTarget,
    PluginTargetRole,
)

cluster_config = ClusterConfiguration(
    # ... other config ...
    plugins=[
        PluginConfiguration(
            name="my-custom-plugin",
            files=[
                PluginFile("file.sh", "/my/local/path/to/file.sh"),
                PluginFile("data/one.json", "/my/local/path/to/data/one.json"),
                PluginFile("data/two.json", "/my/local/path/to/data/two.json"),
            ],
            # Must be one of the files defined in the file list and match its target path
            execute="file.sh",
            env=dict(
                SOME_ENV_VAR="foo"
            ),
            args=["arg1"],  # These arguments are passed to your execute script
            ports=[
                PluginPort(internal=1234),               # Internal only (e.g. for node-to-node communication)
                PluginPort(internal=2345, public=True),  # Also opened to the public (when SSHed in), e.g. for a UI
            ],
            # Pick where you want the plugin to run
            target=PluginTarget.Host,          # Run on the host; the default is to run in the Spark container
            target_role=PluginTargetRole.All,  # Run on the master only, the workers only, or all nodes
        )
    ]
)
```
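As a rough sketch (not part of AZTK), if the script that `execute` points at were a small Python program instead of `file.sh`, it could pick up the `args` and `env` values from the configuration above like this:

```python
import os
import sys


def read_plugin_inputs(argv, environ):
    """Collect the two inputs a plugin execute script receives:
    positional arguments (from `args`) and environment variables (from `env`)."""
    args = argv[1:]  # e.g. ["arg1"] with the configuration above
    some_env_var = environ.get("SOME_ENV_VAR")  # e.g. "foo"
    return args, some_env_var


if __name__ == "__main__":
    args, value = read_plugin_inputs(sys.argv, os.environ)
    print(f"args={args} SOME_ENV_VAR={value}")
```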
## Parameters

### PluginConfiguration
| Name | Required? | Type | Description |
|------|-----------|------|-------------|
| `name` | required | `string` | Name of your plugin (used to create a folder; a name using only letters, dashes, and underscores is recommended) |
| `files` | required | `List[PluginFile \| TextPluginFile]` | List of files to upload for this plugin |
| `execute` | required | `str` | Script to execute. This script must be defined in `files` above, and `execute` must match its remote path |
| `args` | optional | `List[str]` | List of arguments passed to your execute script |
| `env` | optional | `dict` | Environment variables made available to the script (can be used to pass values to your script instead of `args`) |
| `ports` | optional | `List[PluginPort]` | List of ports to open if the script runs in a container. A port can also be marked public, making it accessible when SSHed into the master node |
| `target` | optional | `PluginTarget` | Where the execute script should run. Possible values are `PluginTarget.SparkContainer` (default) and `PluginTarget.Host` |
| `target_role` | optional | `PluginTargetRole` | Whether the plugin should run only on the master, only on the workers, or on all nodes. You can also use environment variables (see below) to have different master/worker behavior |
### PluginFile

| Name | Required? | Type | Description |
|------|-----------|------|-------------|
| `target` | required | `str` | Where the file should be placed, relative to the plugin working directory |
| `local_path` | required | `str` | Path to the local file you want to upload (this could come from the plugin's parameters) |
### TextPluginFile

| Name | Required? | Type | Description |
|------|-----------|------|-------------|
| `target` | required | `str` | Where the file should be placed, relative to the plugin working directory |
| `content` | required | `str \| io.StringIO` | Content of the file, as a string or `io.StringIO` |
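For example, file content can be built in memory with the standard `io.StringIO` and handed to `TextPluginFile`. The `TextPluginFile(...)` call below is left as a comment since it is a sketch based on the table above, and the file name and settings are purely illustrative:

```python
import io

# Build file content in memory rather than uploading a file from disk
content = io.StringIO()
content.write("spark.eventLog.enabled true\n")
content.write("spark.eventLog.dir /tmp/spark-events\n")

# Sketch, per the table above: drop this content at conf/extra.conf
# relative to the plugin working directory:
# TextPluginFile("conf/extra.conf", content)
print(content.getvalue())
```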
### PluginPort

| Name | Required? | Type | Description |
|------|-----------|------|-------------|
| `internal` | required | `int` | Internal port to open on the Docker container |
| `public` | optional | `bool` | Whether the port should also be opened publicly (default: `False`) |
## Environment variables available in the plugin

AZTK provides a few environment variables that can be used in your plugin script:

- `AZTK_IS_MASTER`: whether the plugin is running on the master node
- `AZTK_IS_WORKER`: whether a worker is set up on the current node (this may also be a master if `worker_on_master` is set to `true`)
- `AZTK_MASTER_IP`: internal IP of the master node
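As a hedged sketch, a plugin execute script written in Python could branch on these variables roughly as below. The truthiness check is an assumption (it accepts common string encodings such as `"1"` or `"true"`); check the actual values AZTK sets on your nodes:

```python
import os


def is_truthy(value):
    """Interpret common string encodings of a boolean env var.
    (Assumption: AZTK exposes these flags as strings such as "1" or "true".)"""
    return str(value).strip().lower() in ("1", "true", "yes")


def describe_node(environ):
    """Summarize this node's role from the AZTK-provided environment variables."""
    is_master = is_truthy(environ.get("AZTK_IS_MASTER", ""))
    is_worker = is_truthy(environ.get("AZTK_IS_WORKER", ""))
    master_ip = environ.get("AZTK_MASTER_IP", "unknown")
    role = "master" if is_master else "worker" if is_worker else "unknown"
    return f"role={role} master_ip={master_ip}"


if __name__ == "__main__":
    print(describe_node(os.environ))
```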