In the Azure Distributed Data Engineering Toolkit, a Job is an entity that runs against an automatically provisioned and managed cluster. Jobs run a collection of Spark applications and persist the outputs.
Creating a Job starts with defining the necessary properties in your `.aztk/job.yaml` file. Jobs have one or more applications to run as well as values that define the Cluster the applications will run on.
### Job.yaml
Each Job has one or more applications, given as a list in `job.yaml`. Applications are defined using the following properties:
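The exact schema can vary between AZTK releases, but a sketch of an application entry looks roughly like this (the field names other than `name` and `application`, as well as the `pipy100` name and `/path/to/pi.py` path, are illustrative assumptions):

```yaml
applications:
  - name: pipy100                 # required: unique application name
    application: /path/to/pi.py   # required: path to the Spark application to run
    application_args:             # optional: arguments passed to the application
      - 100
    main_class:                   # optional: main class for Java/Scala applications
    jars: []                      # optional: additional jars to ship with the application
    py_files: []                  # optional: additional Python files to ship
    files: []                     # optional: additional files to ship
    driver_memory:                # optional: Spark driver and executor settings
    executor_memory:
```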
_Please note: the only required fields are `name` and `application`. All other fields may be removed or left blank._
NOTE: The application name can only contain alphanumeric characters, hyphens, and underscores, and cannot be longer than 64 characters. Each application **must** have a unique name.
Jobs also require a definition of the cluster on which the Applications will run. The following properties define a cluster:
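As a sketch, the cluster portion of `job.yaml` might look like the following; the key names (`cluster_configuration`, `vm_size`, `size`) are assumptions and may differ between AZTK versions:

```yaml
cluster_configuration:
  vm_size: standard_f2    # Azure VM size for each node
  size: 3                 # number of dedicated nodes
```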
_Please note: for more information about Azure VM sizes, see [Azure Batch Pricing](https://azure.microsoft.com/en-us/pricing/details/batch/). For more information about Docker repositories, see [Docker](./12-docker-image.html)._
_Please note: including a Spark Configuration is optional. Spark Configuration values defined as part of an application will take precedence over the values specified in these files._
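The files in question are typically the ones in your `.aztk` directory; a sketch of the Spark Configuration section, assuming the default layout:

```yaml
spark_configuration:
  spark_defaults_conf: .aztk/spark-defaults.conf
  spark_env_sh: .aztk/spark-env.sh
  core_site_xml: .aztk/core-site.xml
```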
Below we will define a simple, functioning job definition.
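A sketch of the full `job.yaml` (the `job:` wrapper and the `/path/to/pi.py` application paths are illustrative assumptions):

```yaml
# .aztk/job.yaml
job:
  id: pipy
  cluster_configuration:
    vm_size: standard_f2
    size: 3                        # 3 dedicated nodes
  applications:
    - name: pipy100
      application: /path/to/pi.py
      application_args:
        - 100
    - name: pipy200
      application: /path/to/pi.py
      application_args:
        - 200
```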
Once submitted, this Job will run two applications, pipy100 and pipy200, on an automatically provisioned Cluster of 3 dedicated Standard_F2 Azure VMs. Immediately after both pipy100 and pipy200 have completed, the Cluster will be destroyed. Application logs will be persisted and available.
NOTE: The Job id (`--id`) can only contain alphanumeric characters, hyphens, and underscores, and cannot be longer than 64 characters. Each Job **must** have a unique id.
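Submitting the Job might then look like the following; the `--configuration` flag for pointing at the `job.yaml` file is an assumption, so check `aztk spark job submit --help` for the exact flags in your version:

```sh
aztk spark job submit --id pipy --configuration .aztk/job.yaml
```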
#### Low priority nodes
You can create your Job with [low-priority](https://docs.microsoft.com/en-us/azure/batch/batch-low-pri-vms) VMs at an 80% discount by using `--size-low-pri` instead of `--size`. Note that these are great for experimental use, but can be taken away at any time. We recommend against this option for long-running Jobs or critical workloads.
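In the Job's cluster definition this might look like the sketch below; the `size_low_pri` key is only an assumption mirroring the CLI flag and may be named differently in your AZTK version:

```yaml
cluster_configuration:
  vm_size: standard_f2
  size_low_pri: 3    # assumed key: 3 low-priority nodes instead of dedicated ones
```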
### Listing Jobs
You can list all Jobs currently running in your account by running:
```sh
aztk spark job list
```
### Viewing a Job
To view details about a particular Job, run:
```sh
aztk spark job get --id <your_job_id>
```
For example, here the Job 'pipy' has two applications which have already completed.
### Deleting a Job
Deleting a Job also permanently deletes any data or logs associated with that cluster. If you wish to persist this data, use the `--keep-logs` flag.
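For example, assuming your AZTK version exposes an `aztk spark job delete` subcommand, the following deletes a Job while keeping its logs:

```sh
aztk spark job delete --id <your_job_id> --keep-logs
```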
__You are only charged for the Job while it is active. Jobs handle provisioning and destroying their own infrastructure, so you are only charged for the time that your applications are running.__
### Stopping a Job
To stop a Job, run:
```sh
aztk spark job stop --id <your_job_id>
```
Stopping a Job will end any currently running Applications and will prevent any new Applications from running.