# Define a custom plugin
## Full example
```py
from aztk.spark.models import ClusterConfiguration
from aztk.spark.models.plugins import PluginConfiguration, PluginFile, PluginPort, PluginTarget, PluginTargetRole

cluster_config = ClusterConfiguration(
    # ... other configuration
    plugins=[
        PluginConfiguration(
            name="my-custom-plugin",
            files=[
                # PluginFile(target, local_path)
                PluginFile("file.sh", "/my/local/path/to/file.sh"),
                PluginFile("data/one.json", "/my/local/path/to/data/one.json"),
                PluginFile("data/two.json", "/my/local/path/to/data/two.json"),
            ],
            # Must be one of the files defined in the list above and match its target path
            execute="file.sh",
            env=dict(
                SOME_ENV_VAR="foo"
            ),
            # These arguments are passed to your execute script
            args=["arg1"],
            ports=[
                # Internal only (for communication between nodes, for example)
                PluginPort(internal=1234),
                # Also reachable when you SSH into the master node (used for a UI, for example)
                PluginPort(internal=2345, public=True),
            ],
            # Pick where the plugin runs. The default is the Spark container; here it runs on the host
            target=PluginTarget.Host,
            # Run on the master, the workers, or all nodes. Environment variables
            # (see below) let one script behave differently on master and workers
            target_role=PluginTargetRole.All,
        )
    ]
)
```
## Parameters
### `PluginConfiguration`
#### `name` | `required` | `str`
Name of your plugin. This is used to create a folder, so a simple name made only of letters, dashes and underscores is recommended.
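
The naming recommendation above can be checked before building the configuration. This is only a minimal sketch: `is_simple_plugin_name` is a hypothetical helper, not part of AZTK, and the pattern encodes nothing more than the recommendation stated here.

```python
import re

# Hypothetical helper, not part of AZTK: encodes the recommendation above
# (letters, dashes and underscores only).
_SIMPLE_NAME = re.compile(r"[A-Za-z_-]+")


def is_simple_plugin_name(name: str) -> bool:
    """Return True if the name uses only letters, dashes and underscores."""
    return _SIMPLE_NAME.fullmatch(name) is not None


print(is_simple_plugin_name("my-custom-plugin"))
print(is_simple_plugin_name("my plugin!"))
```

The first call accepts the name used in the full example; the second rejects a name containing a space and punctuation.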
#### `files` | `required` | `List[PluginFile|TextPluginFile]`
List of files to upload.
#### `execute` | `required` | `str`
Script to execute. This script must be defined in `files` above, and the value must match its remote (target) path.
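
The rule that `execute` must match the target path of an uploaded file is easy to get wrong by passing the local path instead. As an illustration, here is a small pre-flight check; it is a sketch that uses plain `(target, local_path)` tuples in place of `PluginFile` objects, and `execute_matches_a_file` is a hypothetical name rather than an AZTK function.

```python
from typing import Iterable, Tuple


def execute_matches_a_file(execute: str, files: Iterable[Tuple[str, str]]) -> bool:
    """Check that `execute` equals the target path of one of the files.

    `files` holds (target, local_path) pairs standing in for PluginFile
    entries. This is a hypothetical pre-flight check, not AZTK code.
    """
    targets = {target for target, _local_path in files}
    return execute in targets


files = [
    ("file.sh", "/my/local/path/to/file.sh"),
    ("data/one.json", "/my/local/path/to/data/one.json"),
]
print(execute_matches_a_file("file.sh", files))
print(execute_matches_a_file("/my/local/path/to/file.sh", files))
```

The first call passes because `file.sh` is a target path; the second fails because it uses the local path.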
#### `args` | `optional` | `List[str]`
List of arguments passed to your execute script.
#### `env` | `optional` | `dict`
Dictionary of environment variables made available to the script. This can be used instead of `args` to pass values to your script.
#### `ports` | `optional` | `List[PluginPort]`
List of ports to open if the script is running in a container. A port can also be marked as public, in which case it is accessible when you SSH into the master node.
#### `target` | `optional` | `PluginTarget`
Defines where the execute script runs. Possible values are `PluginTarget.SparkContainer` (default) and `PluginTarget.Host`.
#### `target_role` | `optional` | `PluginTargetRole`
Whether the plugin should run on the master only, on the workers only, or on all nodes. You can use environment variables (see below) to apply a different configuration on the master and on the workers.
### `PluginFile`
#### `target` | `required` | `str`
Where the file should be placed, relative to the plugin working directory.
#### `local_path` | `required` | `str`
Path to the local file you want to upload (this could come from the plugin's parameters).
### `TextPluginFile`
#### `target` | `required` | `str`
Where the file should be placed, relative to the plugin working directory.
#### `content` | `required` | `str|io.StringIO`
Content of the file to upload, given as a string or an `io.StringIO` (this could be built from the plugin's parameters).
### `PluginPort`
#### `internal` | `required` | `int`
Internal port to open on the Docker container.
#### `public` | `optional` | `bool`
Whether the port should be open publicly (default: `False`).
## Environment variables available in the plugin
AZTK provides a few environment variables that can be used in your plugin script:
* `AZTK_IS_MASTER`: Whether the plugin is running on the master node. Either `true` or `false`.
* `AZTK_IS_WORKER`: Whether a worker is set up on the current node (this may also be the master if you have `worker_on_master` set to true). Either `true` or `false`.
* `AZTK_MASTER_IP`: Internal IP address of the master.
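
To show how these variables might drive a different master and worker configuration, here is a minimal sketch of a plugin script. It assumes the execute script is written in Python and that an interpreter is available where the plugin runs. The name `describe_node`, its return shape, and the choice to treat an unset variable as `false` are illustrative assumptions, not AZTK behavior.

```python
import os


def describe_node(env=None) -> dict:
    """Read the AZTK-provided variables listed above into a small dict.

    `env` defaults to os.environ; passing a plain dict makes the function
    easy to try outside a cluster. Treating a missing variable as "false"
    is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "is_master": env.get("AZTK_IS_MASTER", "false") == "true",
        "is_worker": env.get("AZTK_IS_WORKER", "false") == "true",
        "master_ip": env.get("AZTK_MASTER_IP"),
    }


if __name__ == "__main__":
    node = describe_node()
    if node["is_master"]:
        print("Applying master configuration")
    if node["is_worker"]:
        print("Applying worker configuration; master is at", node["master_ip"])
```

Both branches can run on the same node, which matches the case where `worker_on_master` is enabled.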
## Debug your plugin
When your plugin is not working as expected, there are a few things you can do to investigate.

Start by checking the logs, using either the debug tool or [BatchLabs](https://github.com/Azure/BatchLabs), and navigate to `startup/wd/logs/plugins`.
![](misc/plugin-logs.png)
* If you see a file named `<your-plugin-name>.txt` under that folder, your plugin started correctly, and you can check this file to see what your execute script logged.
* If this file doesn't exist, the script was not run on this node. There could be multiple reasons for this:
  - If you want your plugin to run in the Spark container, check the `startup/wd/logs/docker.log` file for information about this.
  - If you want your plugin to run on the host, check `startup/stdout.txt` and `startup/stderr.txt`.

  The log may mention that you picked the wrong target or target role for that plugin, which would explain why it is not running on this node.
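
The decision steps above can be sketched as a small helper that works on a local copy of a node's files. This is an illustration only: `logs_to_check` is a hypothetical function, the `"host"` and `"spark_container"` strings are a convention invented for this sketch, and it assumes you have already downloaded the node's `startup` folder to `node_root`.

```python
from pathlib import Path
from typing import List


def logs_to_check(node_root: Path, plugin_name: str, target: str) -> List[Path]:
    """Return the log files worth reading for one node, following the steps above.

    Assumptions: `node_root` is a local copy of a node's files containing the
    `startup` folder, and `target` is either "host" or "spark_container".
    """
    startup = node_root / "startup"
    plugin_log = startup / "wd" / "logs" / "plugins" / f"{plugin_name}.txt"
    if plugin_log.exists():
        # The plugin started on this node: its own output is the place to look.
        return [plugin_log]
    if target == "host":
        # Plugins targeting the host: inspect the node startup output.
        return [startup / "stdout.txt", startup / "stderr.txt"]
    # Plugins targeting the Spark container: inspect the Docker log.
    return [startup / "wd" / "logs" / "docker.log"]


for path in logs_to_check(Path("downloaded-node"), "my-custom-plugin", "host"):
    print(path)
```

The helper only returns paths to read next; it does not parse the logs or diagnose a wrong target or target role.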