Refactor documentation structure (#64)

* Refactor docs

Add more screenshots
Add links at top of README and FAQs for easy access
Consolidate and categorize advanced configurations
Add known issues and mitigation steps
Add a doc with instructions to convert from existing
Delete quickstart and germline pipeline instructions
Add links to CoA common workflows
GCP WDL to CoA WDL
Restructure README & example workflow

* Incorporate feedback

Categorize README links at the top
Fix "preemptible" attribute use guidance; add other unsupported attribute to list
Update screenshots
Grammar and spelling changes
This commit is contained in:
Jass Bagga 2020-04-14 10:04:45 -07:00 committed by GitHub
Parent 5f2a468afc
Commit 5a0e3b09c8
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
17 changed files: 610 additions and 421 deletions

README.md: 172 changes (View file)

@@ -1,14 +1,168 @@
# Welcome to Cromwell on Azure
Get started in a few quick steps! See our [Quickstart](docs/quickstart-cromwell-on-azure.md) guide<br/>
## What is Cromwell on Azure?
[Cromwell](https://cromwell.readthedocs.io/en/stable/) is a workflow management system for scientific workflows, orchestrating the computing tasks needed for genomics analysis. Originally developed by the [Broad Institute](https://github.com/broadinstitute/cromwell), Cromwell is used in the GATK Best Practices genome analysis pipeline. Cromwell supports running scripts at various scales, including your local machine, a local computing cluster, and on the cloud. <br/>
Cromwell on Azure configures all Azure resources needed to run workflows through Cromwell on the Azure cloud, and uses the [GA4GH TES](https://cromwell.readthedocs.io/en/develop/backends/TES/) backend for orchestrating the tasks that create a workflow. The installation sets up a VM host to run the Cromwell server and uses Azure Batch to spin up virtual machines that run each task in a workflow. Cromwell on Azure supports workflows written in the Workflow Description Language or WDL [format](https://cromwell.readthedocs.io/en/stable/LanguageSupport/).<br/>
#### Getting started
* What is [Cromwell on Azure?](#Cromwell-on-Azure) <br/>
* Deploy Cromwell on Azure now using this [guide](#Deploy-your-instance-of-Cromwell-on-Azure)<br/>
* A brief [demo video](https://youtu.be/QlRQ63n_mKw) on how to run workflows using Cromwell on Azure<br/>
#### Running workflows
* Prepare, start or abort your workflow [using this guide](docs/managing-your-workflow.md/#Managing-your-workflow)<br/>
* Here is an example workflow to [convert FASTQ files to uBAM files](docs/example-fastq-to-ubam.md/#Example-workflow-to-convert-FASTQ-files-to-uBAM-files)<br/>
* Have an existing WDL file that you want to run on Azure? [Modify your existing WDL with these adaptations for Azure](docs/change-existing-WDL-for-Azure.md/#How-to-modify-an-existing-WDL-file-to-run-on-Cromwell-on-Azure)<br/>
* Want to run commonly used workflows? [Find links to ready-to-use workflows here](#Run-Common-Workflows)<br/>
#### Questions?
* See our [Troubleshooting Guide](docs/troubleshooting-guide.md/#FAQs,-advanced-troubleshooting-and-known-issues-for-Cromwell-on-Azure) for more information.<br/>
* Known issues and work-arounds are [documented here](docs/troubleshooting-guide.md/#Known-Issues-And-Mitigation)<br/>
If you are running into an issue and cannot find any information in the troubleshooting guide, please open a GitHub issue!<br/>
![Logo](/docs/screenshots/logo.png)
## Cromwell on Azure
[Cromwell](https://cromwell.readthedocs.io/en/stable/) is a workflow management system for scientific workflows, orchestrating the computing tasks needed for genomics analysis. Originally developed by the [Broad Institute](https://github.com/broadinstitute/cromwell), Cromwell is also used in the GATK Best Practices genome analysis pipeline. Cromwell supports running scripts at various scales, including your local machine, a local computing cluster, and on the cloud. <br/>
Cromwell on Azure configures all Azure resources needed to run workflows through Cromwell on the Azure cloud, and uses the [GA4GH TES](https://cromwell.readthedocs.io/en/develop/backends/TES/) backend for orchestrating the tasks that create a workflow. The installation sets up a VM host to run the Cromwell server and uses Azure Batch to spin up virtual machines that run each task in a workflow. Cromwell on Azure supports workflows written in the [Workflow Description Language or WDL](https://openwdl.org/) format.<br/>
## Deploy your instance of Cromwell on Azure
### Prerequisites
1. You will need an [Azure Subscription](https://portal.azure.com/) to deploy Cromwell on Azure.
2. You must have the proper [Azure role assignments](https://docs.microsoft.com/en-us/azure/role-based-access-control/overview) to deploy Cromwell on Azure. To check your current role assignments, please follow [these instructions](https://docs.microsoft.com/en-us/azure/role-based-access-control/check-access). You must have one of the following combinations of [role assignments](https://docs.microsoft.com/en-us/azure/role-based-access-control/built-in-roles):
1. `Owner` of the subscription<br/>
2. `Contributor` and `User Access Administrator` of the subscription
3. `Owner` of the resource group.
*Note: this level of access will result in a warning during deployment, and will not use the latest VM pricing data. [Learn more](/docs/troubleshooting-guide.md/#How-are-Batch-VMs-selected-to-run-tasks-in-a-workflow?). Also, you must specify the resource group name during deployment with this level of access (see below).*
4. Note: if you only have `Service Administrator` as a role assignment, please assign yourself as `Owner` of the subscription.
3. Install the [Azure Command Line Interface (az cli)](https://docs.microsoft.com/en-us/cli/azure/?view=azure-cli-latest), a command line experience for managing Azure resources.
4. Run `az login` to authenticate with Azure.
### Download the deployment executable
Download the required executable from [Releases](https://github.com/microsoft/CromwellOnAzure/releases). Choose the runtime of your choice from `win-x64`, `linux-x64`, or `osx-x64`.<br/>
*Optional: build the executable yourself. Clone the [Cromwell on Azure repository](https://github.com/microsoft/CromwellOnAzure) and build the solution in Visual Studio 2019. Note that [VS 2019](https://visualstudio.microsoft.com/vs/) and [.NET Core SDK 3.0 and 2.2.x](https://dotnet.microsoft.com/download/dotnet-core) are required prerequisites. Build and [publish](https://docs.microsoft.com/en-us/dotnet/core/tools/dotnet-publish?tabs=netcore21#synopsis) the `deploy-cromwell-on-azure` project [as a self-contained deployment with your target RID](https://docs.microsoft.com/en-us/dotnet/core/deploying/#self-contained-deployments-scd) to produce the executable*
### Run the deployment executable
1. **Linux and OS X only**: assign execute permissions to the file by running the following command on the terminal:<br/>
`chmod +x <fileName>`. Replace `<fileName>` with the correct name: `deploy-cromwell-on-azure-linux` or `deploy-cromwell-on-azure-osx.app`
1. You must specify the following parameters:
1. `SubscriptionId` (**required**)
1. This can be obtained by navigating to the [subscriptions blade in the Azure portal](https://portal.azure.com/#blade/Microsoft_Azure_Billing/SubscriptionsBlade)
1. `RegionName` (**required**)
1. Specifies the region you would like to use for your Cromwell on Azure instance. To find a list of all available regions, run `az account list-locations` on the command line or in PowerShell and use the desired region's "name" property for `RegionName`.
1. `MainIdentifierPrefix` (*optional*)
1. This string will be used to prefix the name of your Cromwell on Azure resource group and associated resources. If not specified, the default value of "coa" followed by random characters is used as a prefix for the resource group and all Azure resources created for your Cromwell on Azure instance. After installation, you can search for your resources using the `MainIdentifierPrefix` value.<br/>
1. `ResourceGroupName` (*optional*, **required** when you only have owner-level access of the *resource group*)
1. Specifies the name of a pre-existing resource group that you wish to deploy into.
Run the following at the command line or terminal after navigating to where your executable is saved:
```
.\deploy-cromwell-on-azure.exe --SubscriptionId <Your subscription ID> --RegionName <Your region> --MainIdentifierPrefix <Your string>
```
**Example:**
```
.\deploy-cromwell-on-azure.exe --SubscriptionId 00000000-0000-0000-0000-000000000000 --RegionName westus2 --MainIdentifierPrefix coa
```
Deployment can take up to 25 minutes to complete. **At installation, a user is created to allow managing the host VM with username "vmadmin". The password is randomly generated and shown during installation. You may want to save the username, password and resource group name to allow for advanced debugging later.**
Prepare, start or abort a workflow using instructions [here](docs/managing-your-workflow.md).
### Cromwell on Azure deployed resources
Once deployed, Cromwell on Azure configures the following Azure resources:
* [Host VM](https://azure.microsoft.com/en-us/services/virtual-machines/) - runs [Ubuntu 16.04 LTS](https://github.com/microsoft/CromwellOnAzure/blob/421ccd163bfd53807413ed696c0dab31fb2478aa/src/deploy-cromwell-on-azure/Configuration.cs#L16) and [Docker Compose with four containers](https://github.com/microsoft/CromwellOnAzure/blob/master/src/deploy-cromwell-on-azure/scripts/docker-compose.yml) (Cromwell, MySQL, TES, TriggerService). [Blobfuse](https://github.com/Azure/azure-storage-fuse) is used to mount the default storage account as a local file system available to the four containers. Also created are an OS and data disk, network interface, public IP address, virtual network, and network security group. [Learn more](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/)
* [Batch account](https://docs.microsoft.com/en-us/azure/batch/) - The Azure Batch account is used by TES to spin up the virtual machines that run each task in a workflow. After deployment, create an Azure support request to increase your core quotas if you plan on running large workflows. [Learn more](https://docs.microsoft.com/en-us/azure/batch/batch-quota-limit#resource-quotas)
* [Storage account](https://docs.microsoft.com/en-us/azure/storage/) - The Azure Storage account is mounted to the host VM using [blobfuse](https://github.com/Azure/azure-storage-fuse), which enables [Azure Block Blobs](https://docs.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs) to be mounted as a local file system available to the four containers running in Docker. By default, it includes the following Blob containers - `cromwell-executions`, `cromwell-workflow-logs`, `inputs`, `outputs`, and `workflows`.
* [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview) - This contains logs from TES and the Trigger Service to enable debugging.
* [Cosmos DB](https://docs.microsoft.com/en-us/azure/cosmos-db/introduction) - This database is used by TES, and includes information and metadata about each TES task that is run as part of a workflow.
![Cromwell-On-Azure](/docs/screenshots/cromwellonazure.png)
All of these resources will be grouped under a single resource group in your account, which you can view on the [Azure Portal](https://portal.azure.com). **Note that your specific resource group name, host VM name and host VM password for username "vmadmin" are printed to the screen during deployment. You can store these for your future use, or you can reset the VM's password at a later date via the Azure Portal.**<br/>
You can [follow these steps](/docs/troubleshooting-guide.md/#Use-input-data-files-from-an-existing-Storage-account-that-my-lab-or-team-is-currently-using) if you wish to mount a different Azure Storage account that you manage or own, to your Cromwell on Azure instance.
### Connect to existing Azure resources I own that are not part of the Cromwell on Azure instance by default
Cromwell on Azure uses [managed identities](https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/overview) to allow the host VM to connect to Azure resources in a simple and secure manner. At the time of installation, a managed identity is created and associated with the host VM. You can find the identity via the Azure Portal by searching for the VM name in Azure Active Directory, under "All Applications". Or you may use Azure CLI `show` command as described [here](https://docs.microsoft.com/en-us/cli/azure/vm/identity?view=azure-cli-latest#az-vm-identity-show).
To allow the host VM to connect to **custom** Azure resources like Storage Account, Batch Account etc. you can use the [Azure Portal](https://docs.microsoft.com/en-us/azure/role-based-access-control/role-assignments-portal) or [Azure CLI](https://docs.microsoft.com/en-us/azure/role-based-access-control/role-assignments-cli) to find the managed identity of the host VM and add it as a Contributor to the required Azure resource.<br/>
![Add Role](/docs/screenshots/add-role.png)
For convenience, some configuration files are hosted on your Cromwell on Azure Storage account, in the "configuration" container - `containers-to-mount` and `cromwell-application.conf`. You can modify and save these files using the Azure Portal UI "Edit Blob" option, or simply upload a new file to replace the existing one.
![Edit Configuration](/docs/screenshots/edit-config.png)
See [this section of the troubleshooting guide for details on how to connect a different storage account, batch account, or a private Azure Container Registry](/docs/troubleshooting-guide.md/#Customizing-your-Cromwell-on-Azure-instance).<br/>
For these changes to take effect, be sure to restart your Cromwell on Azure VM through the Azure Portal UI or run `sudo reboot`.
![Restart VM](/docs/screenshots/restartVM.png)
### Hello World WDL test
As part of the Cromwell on Azure deployment, a "Hello World" workflow is automatically run as a check. The input files for this workflow are found in the `inputs` container, and the output files can be found in the `cromwell-executions` container of your default storage account.
Once the workflow completes successfully, you can find the trigger JSON file that started it in the `succeeded` directory of the `workflows` container.<br/>
Hello World WDL file:
```
task hello {
  String name

  command {
    echo 'Hello ${name}!'
  }
  output {
    File response = stdout()
  }
  runtime {
    docker: 'ubuntu:16.04'
  }
}

workflow test {
  call hello
}
```
Hello World inputs.json file:
```
{
  "test.hello.name": "World"
}
```
Hello World trigger JSON file as seen in your storage account's `workflows` container in the `succeeded` directory:
```
{
  "WorkflowUrl": "/<storageaccountname>/inputs/test/test.wdl",
  "WorkflowInputsUrl": "/<storageaccountname>/inputs/test/test.json",
  "WorkflowOptionsUrl": null,
  "WorkflowDependenciesUrl": null
}
```
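A trigger JSON like the one above can also be built programmatically before uploading it to the `workflows` container. A minimal sketch in Python, assuming the `/<storageaccountname>/<container>/<blob>` path convention shown above (the account name and blob paths are illustrative, and the upload step itself is out of scope here):

```python
import json

def build_trigger(storage_account, wdl_path, inputs_path,
                  options_path=None, dependencies_path=None):
    """Assemble a Cromwell on Azure trigger JSON payload.

    Paths are relative blob paths of the form
    /<storageaccountname>/<container>/<blob>, matching the example above.
    """
    return {
        "WorkflowUrl": f"/{storage_account}/{wdl_path}",
        "WorkflowInputsUrl": f"/{storage_account}/{inputs_path}",
        "WorkflowOptionsUrl": options_path,
        "WorkflowDependenciesUrl": dependencies_path,
    }

# Hypothetical storage account name, for illustration only
trigger = build_trigger("mystorageaccount",
                        "inputs/test/test.wdl",
                        "inputs/test/test.json")
print(json.dumps(trigger, indent=2))
```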
## Run Common Workflows
Run Broad Institute of MIT and Harvard's Best Practices Pipelines on Cromwell on Azure:
[Data pre-processing for variant discovery](https://github.com/microsoft/gatk4-data-processing-azure)<br/>
[Germline short variant discovery (SNPs + Indels)](https://github.com/microsoft/five-dollar-genome-analysis-pipeline-azure)<br/>
[Somatic short variant discovery (SNVs + Indels)](https://github.com/microsoft/gatk4-somatic-snvs-indels-azure)<br/>

View file

@@ -1,147 +0,0 @@
# Advanced configuration and debugging for Cromwell on Azure
This article describes advanced features that allow customization and debugging of Cromwell on Azure.
## Expand data disk for MySQL database storage for Cromwell
To ensure that no data is corrupted in the MySQL-backed storage for Cromwell, Cromwell on Azure mounts the MySQL files onto an Azure Managed Data Disk of size 32 GB. If you need to increase the size of this data disk, follow the instructions [here](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/expand-disks#expand-an-azure-managed-disk).
## Connect to the host VM
To get logs from all the docker containers or to use the Cromwell REST API endpoints, you may want to connect to the Linux host VM. At installation, a user is created to allow managing the host VM with username "vmadmin". The password is randomly generated and shown during installation. If you need to reset your VM password, you can do this using the Azure Portal or by following these [instructions](https://docs.microsoft.com/en-us/azure/virtual-machines/troubleshooting/reset-password).
![Reset password](/docs/screenshots/resetpassword.PNG)
To connect to your host VM, you can either
1. Construct your ssh connection string if you have the VM name `ssh vmadmin@<hostname>` OR
2. Navigate to the Connect button on the Overview blade of your Azure VM instance, then copy the ssh connection string.
Paste the ssh connection string in a command line, PowerShell or terminal application to log in.
![Connect with SSH](/docs/screenshots/connectssh.PNG)
### How to get container logs to debug issues
The host VM is running multiple docker containers that enable Cromwell on Azure - mysql, broadinstitute/cromwell, cromwellonazure/tes, cromwellonazure/triggerservice. On rare occasions, you may want to debug and diagnose issues with the docker containers. After logging in to the VM, run:
```
sudo docker ps
```
This command will list the names of all the docker containers currently running. To get logs for a particular container, run:
```
sudo docker logs <containerName>
```
### Access the Cromwell REST API directly from Linux host VM
Cromwell is run in server mode on the Linux host VM and can be accessed via curl as described below:
***Get all workflows***<br/>
`curl -X GET "http://localhost:8000/api/workflows/v1/query" -H "accept: application/json"`<br/>
***Get specific workflow's status by id***<br/>
`curl -X GET "http://localhost:8000/api/workflows/v1/{id}/status" -H "accept: application/json"`<br/>
***Get call-caching difference between two workflow calls***<br/>
`curl -X GET "http://localhost:8000/api/workflows/v1/callcaching/diff?workflowA={workflowId1}&callA={workflowName.callName1}&workflowB={workflowId2}&callB={workflowName.callName2}" -H "accept: application/json"`<br/>
You can perform other Cromwell API calls following a similar pattern. To see all available API endpoints, see Cromwell's REST API reference [here](https://cromwell.readthedocs.io/en/stable/api/RESTAPI/).
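The same endpoints can be called from a script on the host VM. A minimal sketch using only the Python standard library, against the same `localhost:8000` server the curl examples above use (the workflow id is illustrative; `get_status` only works where the Cromwell server is reachable, so it is defined but not invoked here):

```python
import json
import urllib.request

# Base URL matches the curl examples above (Cromwell server on the host VM)
CROMWELL = "http://localhost:8000/api/workflows/v1"

def status_url(workflow_id):
    """URL of the per-workflow status endpoint, as in the curl example."""
    return f"{CROMWELL}/{workflow_id}/status"

def get_status(workflow_id):
    """Fetch a workflow's status; run this on the host VM itself."""
    req = urllib.request.Request(status_url(workflow_id),
                                 headers={"accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```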
## Connect to custom Azure resources
Cromwell on Azure uses [managed identities](https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/overview) to allow the host VM to connect to Azure resources in a simple and secure manner. At the time of installation, a managed identity is created and associated with the host VM. You can find the identity via the Azure Portal by searching for the VM name in Azure Active Directory, under "All Applications". Or you may use Azure CLI `show` command as described [here](https://docs.microsoft.com/en-us/cli/azure/vm/identity?view=azure-cli-latest#az-vm-identity-show).
To allow the host VM to connect to **custom** Azure resources like Storage Account, Batch Account etc. you can use the [Azure Portal](https://docs.microsoft.com/en-us/azure/role-based-access-control/role-assignments-portal) or [Azure CLI](https://docs.microsoft.com/en-us/azure/role-based-access-control/role-assignments-cli) to find the managed identity of the host VM and add it as a Contributor to the required Azure resource.<br/>
For convenience, some configuration files are hosted on your Cromwell on Azure Storage account, in the "configuration" container - `containers-to-mount` and `cromwell-application.conf`. You can modify and save these files using the Azure Portal UI "Edit Blob" option, or simply upload a new file to replace the existing one.
For these changes to take effect, be sure to restart your Cromwell on Azure VM through the Azure Portal UI or run `sudo reboot`.
![Restart VM](/docs/screenshots/restartVM.png)
### Use private docker containers
Cromwell on Azure supports private docker images for your WDL tasks hosted on [Azure Container Registry or ACR](https://docs.microsoft.com/en-us/azure/container-registry/).
To allow the host VM to use an ACR, add the VM identity as a Contributor to the Container Registry via Azure Portal or Azure CLI.<br/>
### Mount another storage account
Navigate to the "configuration" container in the Cromwell on Azure Storage account. Replace YOURSTORAGEACCOUNTNAME with your storage account name and YOURCONTAINERNAME with your container name in the `containers-to-mount` file below:
```
/YOURSTORAGEACCOUNTNAME/YOURCONTAINERNAME/
```
Add this to the end of the file and save your changes.<br/>
To allow the host VM to write to a Storage account, add the VM Identity as a Contributor to the Storage Account via Azure Portal or Azure CLI.<br/>
Alternatively, you can choose to add a [SAS URL for your desired container](https://docs.microsoft.com/en-us/azure/storage/common/storage-sas-overview) to the end of the `containers-to-mount` file. This is also applicable if your VM cannot be granted Contributor access to the Storage account because the two resources are in different Azure tenants.
```
https://<yourstorageaccountname>.blob.core.windows.net:443/<yourcontainername>?<sastoken>
```
When using the newly mounted storage account in your inputs JSON file, use the path `"/container-mountpath/blobName"`, where `container-mountpath` is `/YOURSTORAGEACCOUNTNAME/YOURCONTAINERNAME/`.
For these changes to take effect, be sure to restart your Cromwell on Azure VM through the Azure Portal UI.
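The path mapping described above can be sketched as a small helper (hypothetical, not part of Cromwell on Azure; the SAS variant assumes a container mounted via SAS URL is exposed under the same `/<account>/<container>` convention):

```python
from urllib.parse import urlparse

def to_mount_path(storage_account, container, blob_name):
    """Path of a blob as seen inside the host VM after blobfuse mounts the
    container, i.e. /<storageaccountname>/<containername>/<blobName>."""
    return f"/{storage_account}/{container}/{blob_name}"

def sas_url_to_mount_path(sas_url):
    """Derive the mount path from a SAS URL of the standard form
    https://<account>.blob.core.windows.net:443/<container>?<sastoken>."""
    u = urlparse(sas_url)
    account = u.netloc.split(".")[0]       # "<account>" before ".blob..."
    container = u.path.strip("/")
    return f"/{account}/{container}/"
```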
### Change batch account
Log on to the host VM using the ssh connection string as described above. Replace `BatchAccountName` environment variable for the "tes" service in the `docker-compose.yml` file with the name of the desired Batch account and save your changes.<br/>
```
cd /cromwellazure/
sudo nano docker-compose.yml
# Modify the BatchAccountName and save the file
```
To allow the host VM to use a batch account, add the VM identity as a Contributor to the Azure Batch account via Azure Portal or Azure CLI.<br/>
To allow the host VM to read prices and information about types of machines available for the batch account, add the VM identity as a Billing Reader to the subscription with the configured Batch Account.
For these changes to take effect, be sure to restart your Cromwell on Azure VM through the Azure Portal UI or run `sudo reboot`.
### Use dedicated VMs for all your tasks
By default, the environment variable `UsePreemptibleVmsOnly` is set to true, so low-priority Azure Batch nodes are always used.<br/>
If you prefer to use dedicated Azure Batch nodes, log on to the host VM using the ssh connection string as described above. Set the `UsePreemptibleVmsOnly` environment variable for the "tes" service to "false" in the `docker-compose.yml` file and save your changes.<br/>
```
cd /cromwellazure/
sudo nano docker-compose.yml
# Modify UsePreemptibleVmsOnly to false and save the file
```
For these changes to take effect, be sure to restart your Cromwell on Azure VM through the Azure Portal UI or run `sudo reboot`.
### Connect Cromwell on Azure to a managed instance of Azure MySQL
**Create Azure MySQL server**<br/>
Create a managed instance for Azure MySQL following instructions [here](https://docs.microsoft.com/en-us/azure/mysql/quickstart-create-mysql-server-database-using-azure-portal). In the Connection security settings, add a rule to allow the host VM public IP address access to the MySQL server.<br/>
**Create a user for Cromwell**<br/>
Connect with admin username/password to the MySQL database. Replace "yourMySQLServerName" with your MySQL server name and run the following:
```
CREATE USER 'cromwell'@'localhost' IDENTIFIED BY 'cromwell';
GRANT ALL PRIVILEGES ON cromwell_db.* TO 'cromwell'@'localhost' WITH GRANT OPTION;
CREATE USER 'cromwell'@'%' IDENTIFIED BY 'cromwell';
GRANT ALL PRIVILEGES ON cromwell_db.* TO 'cromwell'@'%' WITH GRANT OPTION;
CREATE USER 'cromwell'@'yourMySQLServerName.mysql.database.azure.com' IDENTIFIED BY 'cromwell';
GRANT ALL PRIVILEGES ON cromwell_db.* TO 'cromwell'@'yourMySQLServerName.mysql.database.azure.com' WITH GRANT OPTION;
CREATE DATABASE cromwell_db;
FLUSH PRIVILEGES;
**Connect Cromwell to the database by modifying the Cromwell configuration file**<br/>
Navigate to the "configuration" container in the Cromwell on Azure Storage account. Replace "yourMySQLServerName" with your MySQL server name in the `cromwell-application.conf` file under the database connection settings. <br/>
```
cd /cromwell-app-config
sudo nano cromwell-application.conf
```
Find the database section and make the changes:
```
database {
  db.url = "jdbc:mysql://<yourMySQLServerName>.mysql.database.azure.com:3306/cromwell_db?useSSL=false&rewriteBatchedStatements=true&allowPublicKeyRetrieval=true&serverTimezone=UTC"
  db.user = "cromwell@yourMySQLServerName"
  db.password = "cromwell"
  db.driver = "com.mysql.cj.jdbc.Driver"
  profile = "slick.jdbc.MySQLProfile$"
  db.connectionTimeout = 15000
}
```
For these changes to take effect, be sure to restart your Cromwell on Azure VM through the Azure Portal UI or run `sudo reboot`.
Learn more about Cromwell configuration options [here](https://cromwell.readthedocs.io/en/stable/Configuring/).<br/>

View file

@@ -0,0 +1,62 @@
# How to modify an existing WDL file to run on Cromwell on Azure
For any pipeline, you can create a [WDL](https://software.broadinstitute.org/wdl/) file that calls your tools in Docker containers. Please note that Cromwell on Azure only supports tasks with Docker containers defined for security reasons.<br/>
In order to run a WDL file, you must create or modify a workflow so that the runtime attributes of its tasks are compliant with the [TES or Task Execution Schemas](https://cromwell.readthedocs.io/en/develop/backends/TES/):
```
runtime {
  cpu: 1
  memory: "2 GB"
  disk: "10 GB"
  docker: "<dockerImageName>"
  maxRetries: 0
}
```
Ensure that the attributes `memory` and `disk` (note: use the singular form `disk`, NOT `disks`) have units. Supported units from Cromwell:
> KB - "KB", "K", "KiB", "Ki"<br/>
> MB - "MB", "M", "MiB", "Mi"<br/>
> GB - "GB", "G", "GiB", "Gi"<br/>
> TB - "TB", "T", "TiB", "Ti"<br/>
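The suffix table above can be normalized with a small helper. This is only a sketch; Cromwell's own parser is authoritative, and for simplicity all four spellings of each unit are treated here as binary multiples (Cromwell may distinguish decimal "GB" from binary "GiB"):

```python
# Build a lookup covering every spelling Cromwell accepts, e.g. for G:
# "GB", "G", "GiB", "Gi" -- all mapped to 1024**3 in this sketch.
UNITS = {}
for power, letter in enumerate(["K", "M", "G", "T"], start=1):
    for suffix in (letter + "B", letter, letter + "iB", letter + "i"):
        UNITS[suffix] = 1024 ** power

def to_bytes(spec):
    """Parse a Cromwell-style size string such as '2 GB' or '10 GiB'."""
    value, unit = spec.split()
    return int(float(value) * UNITS[unit])
```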
The `preemptible` attribute is a boolean (not an integer). You can specify `preemptible` as `true` or `false` for each task. When set to `true` Cromwell on Azure will use a [low-priority batch VM](https://docs.microsoft.com/en-us/azure/batch/batch-low-pri-vms#use-cases-for-low-priority-vms) to run the task.<br/>
`bootDiskSizeGb` and `zones` attributes are not supported by the TES backend.<br/>
Each of these runtime attributes is specific to your workflow and the tasks within it. The default values for the resource requirements are as set above.<br/>
Learn more about Cromwell's runtime attributes [here](https://cromwell.readthedocs.io/en/develop/RuntimeAttributes).
## Runtime attributes comparison with a GCP WDL file
The left panel shows a WDL file created for GCP; the right panel shows the modified WDL that runs on Azure.
![Runtime Attributes](/docs/screenshots/runtime.PNG)
## Using maxRetries to replace the preemptible attribute
For a GCP WDL, `preemptible` is an integer specifying the number of retries when using the flag. For Cromwell on Azure, if you want to use the `preemptible` attribute but don't use `maxRetries` for a task, consider also adding `maxRetries` to keep the retry functionality. Remember that for each task in a workflow, you can use either a low-priority Batch VM (the default configuration) or a dedicated VM, by setting `preemptible` to `true` or `false` respectively.
![Preemptible Attribute](/docs/screenshots/preemptible.PNG)
You can choose to ALWAYS run dedicated VMs for every task by modifying the `docker-compose.yml` setting `UsePreemptibleVmsOnly`, as described in [this section](/docs/troubleshooting-guide.md/#How-can-I-configure-my-Cromwell-on-Azure-instance-to-use-dedicated-Batch-VMs-to-avoid-getting-preempted?). The `preemptible` runtime attribute overrides the environment variable setting.
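The conversion described above can be sketched as a helper that rewrites a runtime-attribute mapping (a hypothetical illustration, not a supported tool; note that a GCP `disks` attribute is simply dropped here and must be rewritten by hand as a `disk` attribute with units):

```python
def gcp_runtime_to_azure(runtime):
    """Rewrite GCP-style runtime attributes for Cromwell on Azure:
    - integer `preemptible: N` becomes boolean `preemptible: true`
      plus `maxRetries: N`, keeping the retry behaviour;
    - attributes unsupported by the TES backend are dropped
      (`disks` must then be re-added manually as `disk` with units)."""
    azure = {k: v for k, v in runtime.items()
             if k not in ("bootDiskSizeGb", "zones", "disks")}
    preemptible = azure.pop("preemptible", None)
    if isinstance(preemptible, int) and not isinstance(preemptible, bool):
        azure["preemptible"] = preemptible > 0
        azure.setdefault("maxRetries", preemptible)
    elif preemptible is not None:
        azure["preemptible"] = preemptible  # already a boolean: pass through
    return azure
```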
## Accompanying index files for BAM or VCF files
If a tool you are using within a task assumes that an index file for your data (BAM or VCF file) is located in the same folder, add an index file to the list of parameters when defining and calling the task to ensure the accompanying index file is copied to the correct location for access:
![Index file parameter](/docs/screenshots/index_1.PNG)
![Index file called in task](/docs/screenshots/index_2.PNG)
## Calculating disk_size when scattering
In the current implementation, the entire input file is passed to each task created by the WDL `scatter` [operation](https://support.terra.bio/hc/en-us/articles/360037128572?id=6716). If you calculate the `disk_size` runtime attribute dynamically within the task, use the full size of the input file instead of dividing by the number of shards, to allow enough disk space to perform the task. Do not forget to add the size of index files if you added them as parameters:
![Disk size scatter](/docs/screenshots/disk_size_scatter.PNG)
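The sizing rule above can be expressed as a short helper (a sketch; the function name and the 20 GB working headroom are illustrative assumptions, not values from the workflow):

```python
import math

def scatter_disk_size_gb(input_bytes, index_bytes=0, margin_gb=20):
    """Disk size for a scattered task: use the FULL input size (each shard
    receives the whole file), plus any index files passed as parameters,
    plus working headroom -- never input_bytes divided by the shard count."""
    total = input_bytes + index_bytes
    return math.ceil(total / 1024 ** 3) + margin_gb
```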

View file

@@ -0,0 +1,60 @@
# Example workflow to convert FASTQ files to uBAM files
This document describes how to run an example workflow that converts input FASTQ files to uBAM files. It may be combined with other WDL (Workflow Description Language) files and used as part of a larger workflow.
## Run a sample workflow
In this example, we will run a workflow written in WDL that converts FASTQ files to uBAM for chromosome 21.
## Access input data
You can find publicly available paired end reads for chromosome 21 hosted here:
[https://msgenpublicdata.blob.core.windows.net/inputs/chr21/chr21.read1.fq.gz](https://msgenpublicdata.blob.core.windows.net/inputs/chr21/chr21.read1.fq.gz)
[https://msgenpublicdata.blob.core.windows.net/inputs/chr21/chr21.read2.fq.gz](https://msgenpublicdata.blob.core.windows.net/inputs/chr21/chr21.read2.fq.gz)
You can use these input file URLs directly as they are publicly available.<br/>
Alternatively, you can choose to upload the data into the "inputs" container in your Cromwell on Azure storage account associated with your host VM.
You can do this directly from the Azure Portal, or use various tools including [Microsoft Azure Storage Explorer](https://azure.microsoft.com/features/storage-explorer/), [blobporter](https://github.com/Azure/blobporter), or [AzCopy](https://docs.microsoft.com/azure/storage/common/storage-use-azcopy?toc=%2fazure%2fstorage%2fblobs%2ftoc.json). <br/>
## Configure your Cromwell on Azure trigger JSON, inputs JSON and WDL files
You can find an inputs JSON file and a sample WDL for converting FASTQ to uBAM format in this [GitHub repo](https://github.com/microsoft/CromwellOnAzure/blob/master/samples/quickstart). The chr21 FASTQ files are hosted on the public Azure Storage account container.<br/>
You can use the "msgenpublicdata" storage account directly as a relative path, as in the example below.<br/>
The inputs JSON file should contain the following:
```
{
"FastqToUbamSingleSample.sample_name": "chr21",
"FastqToUbamSingleSample.library_name": "Pond001",
"FastqToUbamSingleSample.group_name": "GrA",
"FastqToUbamSingleSample.platform": "illumina",
"FastqToUbamSingleSample.platform_unit": "GrA.chr21",
"FastqToUbamSingleSample.fastq_pair": [
"/msgenpublicdata/inputs/chr21/chr21.read1.fq.gz",
"/msgenpublicdata/inputs/chr21/chr21.read2.fq.gz"
]
}
```
The input path consists of three parts: the storage account name, the blob container name, and the file path with extension. For example, the path to a file in the "inputs" container of the storage account "msgenpublicdata" looks like
`"/msgenpublicdata/inputs/chr21/chr21.read1.fq.gz"`
If you chose to host these files on your own storage account, replace "msgenpublicdata/inputs" with your `<storageaccountname>/<containername>`. <br/>
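Converting a blob's https URL into this relative-path form can be done mechanically. A minimal Python sketch (the helper name is ours, not part of Cromwell on Azure):

```python
from urllib.parse import urlparse

def blob_url_to_relative_path(url):
    """Convert an Azure blob https URL into the /<account>/<container>/<blob>
    form that Cromwell on Azure accepts for mounted storage accounts."""
    parsed = urlparse(url)
    account = parsed.netloc.split(".")[0]        # "<account>.blob.core.windows.net"
    return "/{}{}".format(account, parsed.path)  # path already starts with /<container>/

print(blob_url_to_relative_path(
    "https://msgenpublicdata.blob.core.windows.net/inputs/chr21/chr21.read1.fq.gz"))
# /msgenpublicdata/inputs/chr21/chr21.read1.fq.gz
```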
Alternatively, you can use http or https paths for your input files [using shared access signatures (SAS)](https://docs.microsoft.com/en-us/azure/storage/common/storage-sas-overview) for files in a private Azure Storage account container or refer to any public file location.
*Please note, [Cromwell engine currently does not support http(s) paths](https://github.com/broadinstitute/cromwell/issues/4184#issuecomment-425981166) in the JSON inputs file that accompanies a WDL. **Ensure that your workflow WDL does not perform any WDL operations/input expressions that require Cromwell to download the http(s) inputs on the host machine.***
[A sample trigger JSON file can be downloaded from this GitHub repo](https://github.com/microsoft/CromwellOnAzure/blob/master/samples/quickstart/FastqToUbamSingleSample.chr21.json) and includes
```
{
"WorkflowUrl": "https://raw.githubusercontent.com/microsoft/CromwellOnAzure/master/samples/quickstart/FastqToUbamSingleSample.wdl",
"WorkflowInputsUrl": "https://raw.githubusercontent.com/microsoft/CromwellOnAzure/master/samples/quickstart/FastqToUbamSingleSample.chr21.inputs.json",
"WorkflowOptionsUrl": null,
"WorkflowDependenciesUrl": null
}
```
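The same trigger file can be generated programmatically. A minimal Python sketch (the helper name and output file name are ours) that writes the trigger shown above:

```python
import json

def make_trigger(workflow_url, inputs_url, options_url=None, dependencies_url=None):
    """Build a Cromwell on Azure trigger dict; unused fields stay null."""
    return {
        "WorkflowUrl": workflow_url,
        "WorkflowInputsUrl": inputs_url,
        "WorkflowOptionsUrl": options_url,
        "WorkflowDependenciesUrl": dependencies_url,
    }

trigger = make_trigger(
    "https://raw.githubusercontent.com/microsoft/CromwellOnAzure/master/samples/quickstart/FastqToUbamSingleSample.wdl",
    "https://raw.githubusercontent.com/microsoft/CromwellOnAzure/master/samples/quickstart/FastqToUbamSingleSample.chr21.inputs.json",
)
with open("FastqToUbamSingleSample.chr21.json", "w") as f:
    json.dump(trigger, f, indent=2)
```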

# Germline alignment and variant calling pipeline
This tutorial walks through how to run the germline alignment and variant calling pipeline, based on [Best Practices Genome Analysis Pipeline by Broad Institute of MIT and Harvard](https://software.broadinstitute.org/gatk/best-practices/workflow?id=11165), on Cromwell on Azure.
> This WDL pipeline implements data pre-processing and initial variant calling (GVCF generation) according to the GATK Best Practices (June 2016) for germline SNP and Indel discovery in human whole-genome sequencing data.
Learn more [here](https://github.com/microsoft/five-dollar-genome-analysis-pipeline-azure)
### Input data
All the required input files for the tutorial are on a publicly hosted Azure Storage account.
You can find the sample WDL, JSON inputs and the `WholeGenomeGermlineSingleSample.hg38.json` trigger file for the pipeline on the [GitHub repo](https://github.com/microsoft/five-dollar-genome-analysis-pipeline-azure).
You can use the "msgenpublicdata" storage account directly as a relative path, like in the [JSON inputs file](https://raw.githubusercontent.com/microsoft/five-dollar-genome-analysis-pipeline-azure/master-azure/WholeGenomeGermlineSingleSample.hg38.inputs.json).
This is an example trigger JSON file:
```
{
"WorkflowUrl": "https://raw.githubusercontent.com/microsoft/five-dollar-genome-analysis-pipeline-azure/az1.1.0/WholeGenomeGermlineSingleSample.wdl",
"WorkflowInputsUrl": "https://raw.githubusercontent.com/microsoft/five-dollar-genome-analysis-pipeline-azure/az1.1.0/WholeGenomeGermlineSingleSample.hg38.inputs.json",
"WorkflowOptionsUrl": null,
"WorkflowDependenciesUrl": null
}
```
See more instructions on how to run a workflow on Cromwell on Azure [here](quickstart-cromwell-on-azure.md).
## Start a WDL workflow
To start a WDL workflow, go to your Cromwell on Azure Storage account associated with your host VM. In the "workflows" container, create the directory "new" and place the trigger file in that folder. This initiates a Cromwell workflow, and returns a workflow id that is appended to the trigger JSON file name and transferred over to the "inprogress" directory in the Workflows container.<br/>
![directory](/docs/screenshots/newportal.PNG)
![directory2](/docs/screenshots/newexplorer.PNG)
For example, a trigger JSON file named `task1.json` in the "new" directory will be moved to the "inprogress" directory with the modified name `task1.guid.json`. This GUID is a workflow ID assigned by Cromwell.<br/>
Once your workflow completes, you can view the output files of your workflow in the "cromwell-executions" container within your Azure Storage account. Additional output files from the Cromwell endpoint, including metadata and the timing file, are found in the "outputs" container. To learn more about Cromwell's metadata and timing information, visit the [Cromwell documentation](https://cromwell.readthedocs.io/en/stable/).<br/>

# Managing your workflow
[Prepare](#Prepare-your-workflow) your workflow <br/>
[Start](#Start-your-workflow) your workflow <br/>
[Get your workflow Id](#Get-the-Cromwell-workflow-ID) for your workflow <br/>
[Abort](#Abort-your-workflow) an in-progress workflow <br/>
## Prepare your workflow
### How to prepare a Workflow Description Language (WDL) file that runs a workflow on Cromwell on Azure
For any pipeline, you can create a [WDL](https://software.broadinstitute.org/wdl/) file that calls your tools in Docker containers. Please note that Cromwell on Azure only supports tasks with Docker containers defined for security reasons.<br/>
In order to run a WDL file, you must create or modify a workflow so that each task's runtime attributes are compliant with the [TES (Task Execution Schemas)](https://cromwell.readthedocs.io/en/develop/backends/TES/):
```
runtime {
    cpu: 1
    memory: "2 GB"
    disk: "10 GB"
    docker: "<your Docker image>"
    maxRetries: 0
}
```
Ensure that the attributes `memory` and `disk` (note: use the singular form `disk`, NOT `disks`) have units. Cromwell supports the following units:
> KB - "KB", "K", "KiB", "Ki"<br/>
> MB - "MB", "M", "MiB", "Mi"<br/>
> GB - "GB", "G", "GiB", "Gi"<br/>
> TB - "TB", "T", "TiB", "Ti"<br/>
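A small Python sketch of the unit table above, useful for sanity-checking attribute values before submitting a workflow. It assumes the usual convention that "KB"/"K" are decimal (1000^n) and "KiB"/"Ki" are binary (1024^n); the helper name is ours:

```python
import re

# Unit table: "KB"/"K" decimal, "KiB"/"Ki" binary (assumed convention).
_UNITS = {}
for n, (dec, bin_) in enumerate(
        [("KB", "KiB"), ("MB", "MiB"), ("GB", "GiB"), ("TB", "TiB")], start=1):
    _UNITS[dec] = _UNITS[dec[0]] = 1000 ** n    # "KB", "K", ...
    _UNITS[bin_] = _UNITS[bin_[:2]] = 1024 ** n  # "KiB", "Ki", ...

def parse_size(attr):
    """Parse a runtime attribute value like '2 GB' into bytes,
    raising if the unit is missing or unsupported."""
    m = re.fullmatch(r"\s*([\d.]+)\s*([A-Za-z]+)\s*", attr)
    if not m or m.group(2) not in _UNITS:
        raise ValueError(f"memory/disk needs a number and a supported unit: {attr!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2)])

print(parse_size("2 GB"))    # 2000000000
print(parse_size("10 GiB"))  # 10737418240
```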
The `preemptible` attribute is a boolean (not an integer). You can specify `preemptible` as `true` or `false` for each task. When set to `true`, Cromwell on Azure will use a [low-priority batch VM](https://docs.microsoft.com/en-us/azure/batch/batch-low-pri-vms#use-cases-for-low-priority-vms) to run the task.<br/>
The `bootDiskSizeGb` and `zones` attributes are not supported by the TES backend.<br/>
Each of these runtime attributes is specific to your workflow and the tasks within it. The default values for resource requirements are as shown above.<br/>
Learn more about Cromwell's runtime attributes [here](https://cromwell.readthedocs.io/en/develop/RuntimeAttributes).
### How to prepare an inputs JSON file to use in your workflow
To specify inputs to any workflow, you may want to use a JSON file that lets you customize the inputs to the workflow WDL file.<br/>
For files hosted on an Azure Storage account that is connected to your Cromwell on Azure instance, the input path consists of three parts: the storage account name, the blob container name, and the file path with extension, in the following format:
```
/<storageaccountname>/<containername>/<blobName>
```
For example, the path to a file in the "inputs" container of the storage account "msgenpublicdata" looks like
`"/msgenpublicdata/inputs/chr21.read1.fq.gz"`
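Assembling this path from its three parts is trivial but easy to get wrong by hand; a one-line Python helper (the name is ours) makes the format explicit:

```python
def input_path(account, container, blob_name):
    """Build the /<storageaccountname>/<containername>/<blobName> form."""
    return f"/{account}/{container}/{blob_name}"

print(input_path("msgenpublicdata", "inputs", "chr21.read1.fq.gz"))
# /msgenpublicdata/inputs/chr21.read1.fq.gz
```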
### Configure your Cromwell on Azure trigger JSON file
To run a workflow using Cromwell on Azure, you will need to specify the location of your WDL file and inputs JSON file in a Cromwell on Azure-specific trigger JSON file, which also includes any workflow options and dependencies. Submitting this trigger file initiates the Cromwell workflow.
All trigger JSON files include the following information:
- The "WorkflowUrl" is the url for your WDL file.
- The "WorkflowInputsUrl" is the url for your input JSON file.
- The "WorkflowOptionsUrl" is only used with some WDL files. If you are not using it, set this to `null`.
- The "WorkflowDependenciesUrl" is only used with some WDL files. If you are not using it, set this to `null`.
Your trigger file should be configured as follows:
```
{
"WorkflowUrl": <URL path to your WDL file in quotes>,
"WorkflowInputsUrl": <URL path to your input json file in quotes>,
"WorkflowOptionsUrl": <URL path to your workflow options json in quotes>,
"WorkflowDependenciesUrl": <URL path to your workflow dependencies file in quotes>
}
```
When using a WDL and inputs JSON file hosted in your private Azure Storage account's blob containers, you can find the specific URL by clicking on the file in the Azure portal to view the blob's properties. The URL path for the "WorkflowUrl" of a test WDL file will look like:
```
https://<storageaccountname>.blob.core.windows.net/inputs/test/test.wdl
```
You can also use the `/<storageaccountname>/<containername>/<blobName>` format for any storage account that is mounted to your Cromwell on Azure instance. By default, Cromwell on Azure mounts a storage account to your instance, which is found in your resource group after a successful deployment. You can [follow these steps](/docs/troubleshooting-guide.md/#Use-input-data-files-from-an-existing-Storage-account-that-my-lab-or-team-is-currently-using) to mount a different storage account that you manage or own, to your Cromwell on Azure instance.
Alternatively, you can use any http or https path to a TES compliant WDL and inputs.json [using shared access signatures (SAS)](https://docs.microsoft.com/en-us/azure/storage/common/storage-sas-overview) for files in a private Azure Storage account container or refer to any public file location like raw GitHub URLs.
## Start your workflow
To start a WDL workflow, go to your Cromwell on Azure Storage account associated with your host VM. In the `workflows` container, place the trigger JSON file in the "new" virtual directory (note: virtual directories do not exist on their own, they are just part of a blob's name). This initiates a Cromwell workflow, and returns a workflow ID that is appended to the trigger JSON file name and transferred to the "inprogress" directory in the `workflows` container.<br/>
This can be done programmatically using the [Azure Storage SDKs](https://azure.microsoft.com/en-us/downloads/), or manually via the [Azure Portal](https://portal.azure.com) or [Azure Storage Explorer](https://azure.microsoft.com/en-us/features/storage-explorer/).
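As a programmatic example, the steps above can be sketched with the Azure Storage Python SDK. This is a hedged sketch assuming the azure-storage-blob v12 API; the function names and the connection-string placeholder are ours, not part of Cromwell on Azure:

```python
def trigger_blob_name(file_name):
    # Trigger files go under the "new" virtual directory of the
    # "workflows" container; that is what starts the workflow.
    return f"new/{file_name}"

def submit_trigger(connection_string, local_trigger_path, file_name):
    """Upload a trigger JSON into workflows/new to start the workflow."""
    # Requires the azure-storage-blob package (v12 API assumed).
    from azure.storage.blob import BlobServiceClient
    service = BlobServiceClient.from_connection_string(connection_string)
    blob = service.get_blob_client(container="workflows",
                                   blob=trigger_blob_name(file_name))
    with open(local_trigger_path, "rb") as data:
        blob.upload_blob(data, overwrite=True)

# Example call (placeholder connection string, not real credentials):
# submit_trigger("DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey",
#                "task1.json", "task1.json")
```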
### Via the Azure Portal
![Select a blob to upload from the portal](screenshots/newportal.PNG)<br/>
### Via Azure Storage Explorer
![Select a blob to upload from Azure Storage Explorer](screenshots/newexplorer.PNG)
For example, a trigger JSON file named `task1.json` in the "new" directory will be moved to the "inprogress" directory with the modified name `task1.uuid.json`. This UUID is a workflow ID assigned by Cromwell.<br/>
Once your workflow completes, you can view the output files of your workflow in the `cromwell-executions` container within your Azure Storage Account. Additional output files from the Cromwell endpoint, including metadata and the timing file, are found in the `outputs` container. To learn more about Cromwell's metadata and timing information, visit the [Cromwell documentation](https://cromwell.readthedocs.io/en/stable/).<br/>
## Get the Cromwell workflow ID
The Cromwell workflow ID is generated by Cromwell once the workflow is in progress, and it is appended to the trigger JSON file name.<br/>
For example, placing a trigger JSON file named `task1.json` in the "new" directory will initiate the workflow. Once the workflow begins, the JSON file will be moved to the "inprogress" directory in the "workflows" container with the modified name `task1.uuid.json`, where the UUID is the Cromwell workflow ID.
## Abort your workflow
To abort a workflow that is in-progress, go to your Cromwell on Azure Storage account associated with your host VM. In the `workflows` container, place an empty file in the "abort" virtual directory named `cromwellID.json`, where "cromwellID" is the Cromwell workflow ID you wish to abort.
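The abort blob's name can be derived from the workflow ID. A one-line Python sketch (the helper name and the example ID are ours):

```python
def abort_blob_name(workflow_id):
    """An empty workflows/abort/<workflowId>.json blob requests an abort."""
    return f"abort/{workflow_id}.json"

print(abort_blob_name("00000000-0000-0000-0000-000000000000"))
# abort/00000000-0000-0000-0000-000000000000.json
```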

# Overview
This quickstart describes how to deploy Cromwell on Azure and run a sample workflow.
The main steps are:
1. **Deploy.** Download prerequisites and use the deployment executable to configure the Azure resources needed to run Cromwell on Azure.
1. **Prepare your workflow.** Create a JSON trigger file with required URLs for your workflow.
1. **Execute.** Upload the trigger file so that Cromwell starts running your workflow.
# Deployment
## Prerequisites
1. You will need an [Azure Subscription](https://portal.azure.com/) to deploy Cromwell on Azure, if you don't already have one.
1. You must have the proper [Azure role assignments](https://docs.microsoft.com/en-us/azure/role-based-access-control/overview) to deploy Cromwell on Azure. To check your current role assignments, please follow [these instructions](https://docs.microsoft.com/en-us/azure/role-based-access-control/check-access). You must have one of the following combinations of [role assignments](https://docs.microsoft.com/en-us/azure/role-based-access-control/built-in-roles):
1. `Owner` of the subscription<br/>
1. `Contributor` and `User Access Administrator` of the subscription
1. `Owner` of the resource group.
1. *Note: this level of access will result in a warning during deployment, and will not use the latest VM pricing data. [Learn more](https://github.com/microsoft/CromwellOnAzure/blob/master/docs/troubleshooting-guide.md#dynamic-cost-optimization-and-ratecard-api-access). Also, you must specify the resource group name during deployment with this level of access (see below).*
1. Note: if you only have `Service Administrator` as a role assignment, please assign yourself as `Owner` of the subscription.
1. Install the [Azure Command Line Interface (az cli)](https://docs.microsoft.com/en-us/cli/azure/?view=azure-cli-latest), a command line experience for managing Azure resources.
1. Run `az login` to authenticate with Azure
## Download the deployment executable
1. Download the required executable from [Releases](https://github.com/microsoft/CromwellOnAzure/releases). Choose the runtime for your platform: `win-x64`, `linux-x64`, or `osx-x64`
1. *Optional: build the executable yourself. Clone the [Cromwell on Azure repository](https://github.com/microsoft/CromwellOnAzure) and build the solution in Visual Studio 2019. Note that [VS 2019](https://visualstudio.microsoft.com/vs/) and [.NET Core SDK 3.0 and 2.2.x](https://dotnet.microsoft.com/download/dotnet-core) are required prerequisites. Build and [publish](https://docs.microsoft.com/en-us/dotnet/core/tools/dotnet-publish?tabs=netcore21#synopsis) the `deploy-cromwell-on-azure` project [as a self-contained deployment with your target RID](https://docs.microsoft.com/en-us/dotnet/core/deploying/#self-contained-deployments-scd) to produce the executable*
## Run the deployment executable
1. **Linux and OS X only**: assign execute permissions to the file by running the following command on the terminal:<br/>
`chmod +x <fileName>`
1. Replace `<fileName>` with the correct name: `deploy-cromwell-on-azure-linux` or `deploy-cromwell-on-azure-osx.app`
1. You must specify the following parameters:
1. `SubscriptionId` (**required**)
1. This can be obtained by navigating to the [subscriptions blade in the Azure portal](https://portal.azure.com/#blade/Microsoft_Azure_Billing/SubscriptionsBlade)
1. `RegionName` (**required**)
1. Specifies the region you would like to use for your Cromwell on Azure instance. To find a list of all available regions, run `az account list-locations` on the command line or in PowerShell and use the desired region's "name" property for `RegionName`.
1. `MainIdentifierPrefix` (*optional*)
1. This string will be used to prefix the name of your Cromwell on Azure resource group and associated resources. If not specified, the default value of "coa" followed by random characters is used as a prefix for the resource group and all Azure resources created for your Cromwell on Azure instance. After installation, you can search for your resources using the `MainIdentifierPrefix` value.<br/>
1. `ResourceGroupName` (*optional*, **required** when you only have owner-level access of the *resource group*)
1. Specifies the name of a pre-existing resource group that you wish to deploy into.
Run the following at the command line or terminal after navigating to where your executable is saved:
```
.\deploy-cromwell-on-azure.exe --SubscriptionId <Your subscription ID> --RegionName <Your region> --MainIdentifierPrefix <Your string>
```
**Example:**
```
.\deploy-cromwell-on-azure.exe --SubscriptionId 00000000-0000-0000-0000-000000000000 --RegionName westus2 --MainIdentifierPrefix coa
```
Deployment can take up to 40 minutes to complete.
## Cromwell on Azure deployed resources
Once deployed, Cromwell on Azure configures the following Azure resources:
* [Host VM](https://azure.microsoft.com/en-us/services/virtual-machines/) - runs [Ubuntu 16.04 LTS](https://github.com/microsoft/CromwellOnAzure/blob/421ccd163bfd53807413ed696c0dab31fb2478aa/src/deploy-cromwell-on-azure/Configuration.cs#L16) and [Docker Compose with four containers](https://github.com/microsoft/CromwellOnAzure/blob/master/src/deploy-cromwell-on-azure/scripts/docker-compose.yml) (Cromwell, MySQL, TES, TriggerService). [Blobfuse](https://github.com/Azure/azure-storage-fuse) is used to mount the default storage account as a local file system available to the four containers. Also created are an OS and data disk, network interface, public IP address, virtual network, and network security group. [Learn more](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/)
* [Batch account](https://docs.microsoft.com/en-us/azure/batch/) - The Azure Batch account is used by TES to spin up the virtual machines that run each task in a workflow. After deployment, create an Azure support request to increase your core quotas if you plan on running large workflows. [Learn more](https://docs.microsoft.com/en-us/azure/batch/batch-quota-limit#resource-quotas)
* [Storage account](https://docs.microsoft.com/en-us/azure/storage/) - The Azure Storage account is mounted to the host VM using [blobfuse](https://github.com/Azure/azure-storage-fuse), which enables [Azure Block Blobs](https://docs.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs) to be mounted as a local file system available to the four containers running in Docker. By default, it includes the following Blob containers - `cromwell-executions`, `cromwell-workflow-logs`, `inputs`, `outputs`, and `workflows`.
* [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview) - This contains logs from TES and the Trigger Service to enable debugging.
* [Cosmos DB](https://docs.microsoft.com/en-us/azure/cosmos-db/introduction) - This database is used by TES, and includes information and metadata about each TES task that is run as part of a workflow.
All of these resources will be grouped under a single resource group in your account, which you can view on the [Azure Portal](https://portal.azure.com). Note that your specific resource group name, host VM name and host VM password for username "vmadmin" are printed to the screen during deployment. You can store these for your future use, or you can reset the VM's password at a later date via the Azure Portal.<br/>
Note that as part of the Cromwell on Azure deployment, a "Hello World" workflow is automatically run. The input files for this workflow are found in the `inputs` container, and the output files can be found in the `cromwell-executions` container.<br/>
# Run a sample workflow
To run a workflow using Cromwell on Azure, you will need to upload your input files and your WDL file to Azure Storage. You will also need to generate a Cromwell on Azure-specific trigger file which includes the path to your WDL and inputs file, and any workflow options and dependencies. Submitting this trigger file initiates the Cromwell workflow. In this example, we will run a sample workflow written in WDL that converts FASTQ files to uBAM for chromosome 21.
## Access input data
You can find publicly available paired end reads for chromosome 21 hosted here:
[https://msgenpublicdata.blob.core.windows.net/inputs/chr21/chr21.read1.fq.gz](https://msgenpublicdata.blob.core.windows.net/inputs/chr21/chr21.read1.fq.gz)
[https://msgenpublicdata.blob.core.windows.net/inputs/chr21/chr21.read2.fq.gz](https://msgenpublicdata.blob.core.windows.net/inputs/chr21/chr21.read2.fq.gz)
You can use these input files directly as they are publicly available.<br/>
Alternatively, you can choose to upload the data into the "inputs" container in your Cromwell on Azure storage account associated with your host VM.
You can do this directly from the Azure Portal, or use various tools including [Microsoft Azure Storage Explorer](https://azure.microsoft.com/features/storage-explorer/), [blobporter](https://github.com/Azure/blobporter), or [AzCopy](https://docs.microsoft.com/azure/storage/common/storage-use-azcopy?toc=%2fazure%2fstorage%2fblobs%2ftoc.json). <br/>
## Get inputs JSON and WDL files
You can find an inputs JSON file and a sample WDL for converting FASTQ to uBAM format in this [GitHub repo](https://github.com/microsoft/CromwellOnAzure/blob/master/samples/quickstart). The chr21 FASTQ files are hosted on the public Azure Storage account container.<br/>
You can use the "msgenpublicdata" storage account directly as a relative path, as in the example below.<br/>
The inputs JSON file should contain the following:
```
{
"FastqToUbamSingleSample.sample_name": "chr21",
"FastqToUbamSingleSample.library_name": "Pond001",
"FastqToUbamSingleSample.group_name": "GrA",
"FastqToUbamSingleSample.platform": "illumina",
"FastqToUbamSingleSample.platform_unit": "GrA.chr21",
"FastqToUbamSingleSample.fastq_pair": [
"/msgenpublicdata/inputs/chr21/chr21.read1.fq.gz",
"/msgenpublicdata/inputs/chr21/chr21.read2.fq.gz"
]
}
```
The input path consists of three parts: the storage account name, the blob container name, and the file path with extension. For example, the path to a file in the "inputs" container of the storage account "msgenpublicdata" looks like
`"/msgenpublicdata/inputs/chr21/chr21.read1.fq.gz"`
If you chose to host these files on your own storage account, replace "msgenpublicdata/inputs" with your `<storageaccountname>/<containername>`. <br/>
Alternatively, you can use http or https paths for your input files [using shared access signatures (SAS)](https://docs.microsoft.com/en-us/azure/storage/common/storage-sas-overview) for files in a private Azure Storage account container or refer to any public file location.
*Please note, [Cromwell engine currently does not support http(s) paths](https://github.com/broadinstitute/cromwell/issues/4184#issuecomment-425981166) in the JSON inputs file that accompanies a WDL. **Ensure that your workflow WDL does not perform any WDL operations/input expressions that require Cromwell to download the http(s) inputs on the host machine.***
## Configure your Cromwell on Azure trigger file
Cromwell on Azure uses a JSON trigger file to note the paths to all input information and to initiate the workflow. [A sample trigger file can be downloaded from this GitHub repo](https://github.com/microsoft/CromwellOnAzure/blob/master/samples/quickstart/FastqToUbamSingleSample.chr21.json) and includes the following information:
- The "WorkflowUrl" is the url for your WDL file.
- The "WorkflowInputsUrl" is the url for your input JSON file.
- The "WorkflowOptionsUrl" is only used with some WDL files. If you are not using it set this to `null`.
- The "WorkflowDependenciesUrl" is only used with some WDL files. If you are not using it set this to `null`.
Your trigger file should be configured as follows:
```
{
"WorkflowUrl": <URL path to your WDL in quotes>,
"WorkflowInputsUrl": <URL path to your input json in quotes>,
"WorkflowOptionsUrl": null,
"WorkflowDependenciesUrl": null
}
```
When using a WDL and inputs JSON file hosted in your private Azure Storage account's blob containers, you can find the specific URL by clicking on the file in the Azure portal to view the blob's properties. The URL path for the "WorkflowUrl" of a test WDL file will look like:
`https://<storageaccountname>.blob.core.windows.net/inputs/test/test.wdl`
Alternatively, you can use any http or https path to a TES compliant WDL and inputs.json [using shared access signatures (SAS)](https://docs.microsoft.com/en-us/azure/storage/common/storage-sas-overview) for files in a private Azure Storage account container or refer to any public file location.
You can also host your WDL and JSON inputs files on your storage account container and use the `/<storageaccountname>/<containername>/<blobName>` format.
## Start a WDL workflow
To start a WDL workflow, go to your Cromwell on Azure Storage account associated with your host VM. In the `workflows` container, place the trigger file in the "new" virtual directory (note: virtual directories do not exist on their own, they are just part of a blob's name). This initiates a Cromwell workflow, and returns a workflow ID that is appended to the trigger JSON file name and transferred to the "inprogress" directory in the `workflows` container.<br/>
This can be done programmatically using the [Azure Storage SDKs](https://azure.microsoft.com/en-us/downloads/), or manually via the [Azure Portal](https://portal.azure.com) or [Azure Storage Explorer](https://azure.microsoft.com/en-us/features/storage-explorer/).
### Via the Azure Portal
![Select a blob to upload from the portal](screenshots/newportal.PNG)<br/>
### Via Azure Storage Explorer
![Select a blob to upload from Azure Storage Explorer](screenshots/newexplorer.PNG)
For example, a trigger JSON file named `task1.json` in the "new" directory will be moved to the "inprogress" directory with the modified name `task1.uuid.json`. This UUID is a workflow ID assigned by Cromwell.<br/>
Once your workflow completes, you can view the output files of your workflow in the `cromwell-executions` container within your Azure Storage Account. Additional output files from the Cromwell endpoint, including metadata and the timing file, are found in the `outputs` container. To learn more about Cromwell's metadata and timing information, visit the [Cromwell documentation](https://cromwell.readthedocs.io/en/stable/).<br/>
## Abort an in-progress workflow
To abort a workflow that is in-progress, go to your Cromwell on Azure Storage account associated with your host VM. In the `workflows` container, place an empty file in the "abort" virtual directory named `cromwellID.json`, where "cromwellID" is the Cromwell workflow ID you wish to abort.

# FAQs, advanced troubleshooting and known issues for Cromwell on Azure
This article answers FAQs, describes advanced features that allow customization and debugging of Cromwell on Azure, and explains how to diagnose, debug, and work around [known issues](#Known-Issues-And-Mitigation). We are actively tracking these as bugs to be fixed in upcoming releases!
1. Setup
* I am trying to [set up Cromwell on Azure for multiple users on the same subscription](#Setup-Cromwell-on-Azure-for-multiple-users-in-the-same-Azure-subscription), what are the ways I can do this?
* I ran the [Cromwell on Azure installer and it failed](#Debug-my-Cromwell-on-Azure-installation-that-ran-into-an-error). How can I fix it?
* How can I [upgrade my Cromwell on Azure instance?](#Upgrade-my-Cromwell-on-Azure-instance)
2. Analysis
* I submitted a job and it failed almost immediately. What [happened to it](#Job-failed-immediately)?
* How do I [set up my own WDL](#Set-up-my-own-WDL) to run on Cromwell?
* How can I see [how far along my workflow has progressed?](#Check-all-tasks-running-for-a-workflow-using-Batch-account)
* My workflow failed at task X. Where should I look to determine why it failed?
* [Which tasks failed?](#Find-which-tasks-failed-in-a-workflow)
* Some tasks are stuck or my workflow is stuck in the "inprogress" directory in the "workflows" container. [Were there Azure infrastructure issues?](#Make-sure-there-are-no-Azure-infrastructure-errors)
3. Customizing your instance
* How do I [use input data files for my workflows from a different Azure Storage account](#Use-input-data-files-from-an-existing-Storage-account-that-my-lab-or-team-is-currently-using) that my lab or team is currently using?
* Can I connect a different [batch account with previously increased quotas](#Use-a-batch-account-for-which-I-have-already-requested-or-received-increased-cores-quota-from-Azure-Support) to run my workflows?
* How can I [use private Docker containers for my workflows?](#Use-private-docker-containers-hosted-on-Azure)
* A lot of tasks for my workflows run longer than 24 hours and have been randomly stopped. How can I [run all my tasks on dedicated batch VMs](#Configure-my-Cromwell-on-Azure-instance-to-always-use-dedicated-Batch-VMs-to-avoid-getting-preempted)?
* Can I get [direct access to Cromwell's REST API?](#Access-the-Cromwell-REST-API-directly-from-Linux-host-VM)
4. Performance & Optimization
* How can I figure out how much Cromwell on Azure costs me?
* How much am I [paying for my Cromwell on Azure instance?](#Cost-analysis-for-Cromwell-on-Azure)
* How are [batch VMs selected to run tasks in a workflow?](#How-Cromwell-on-Azure-selects-batch-VMs-to-run-tasks-in-a-workflow)
* Do you have guidance on how to [optimize my WDLs](#Optimize-my-WDLs)?
5. Miscellaneous
* I cannot find my issue in this document and [want more information](#Get-container-logs-to-debug-issues) from Cromwell, MySQL, or TES Docker container logs.
* I am running a large amount of workflows and [MySQL storage disk is full](#I-am-running-a-large-amount-of-workflows-and-MySQL-storage-disk-is-full)
## Known Issues And Mitigation
### I am trying to use files with SAS tokens but run into file access issues
There is currently a bug (which we are tracking) in a dependency tool we use to get files from Azure Storage to the VM to perform a task. For now, follow these steps as a workaround if you are running into errors getting access to your files using SAS tokens on Cromwell on Azure.
If you followed [these](https://docs.microsoft.com/en-us/azure/storage/common/storage-sas-overview#get-started-with-sas) instructions to create a SAS URL, youll get something similar to
```
https://YourStorageAccount.blob.core.windows.net/inputs?sv=2018-03-28&si=inputs-key&sr=c&sig=somestring
```
Focus on this part: **si=inputs-key&sr=c** <br/>
Manually change order of `sr` and `si` fields to get something similar to
```
https://YourStorageAccount.blob.core.windows.net/inputs?sv=2018-03-28&sr=c&si=inputs-key&sig=somestring
```
After the change, **sr=c&si=inputs-key** should be the order in your SAS URL. <br/>
Update all the SAS URLs similarly and retry your workflow.
### All TES tasks for my workflow are done running, but the trigger JSON file is still in the "inprogress" directory in the workflows container
The root cause is most likely memory pressure on the host Linux VM because [blobfuse](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-how-to-mount-container-linux#overview) processes grow to consume all physical memory.
You may see the following Cromwell container logs as a symptom:
> Cromwell shutting down because it cannot access the database:
> Shutting down cromid-5bd1d24 as at least 15 minutes of heartbeat write errors have occurred between 2020-02-18T22:03:01.110Z and 2020-02-18T22:19:01.111Z (16.000016666666667 minutes)
OR
> Failed to instantiate Cromwell System. Shutting down Cromwell.
liquibase.exception.LockException: Could not acquire change log lock. Currently locked by 012ec19c3285 (172.18.0.4) since 2/19/20 4:10 PM
To mitigate, resize the host VM in your resource group to a machine size with at least 14 GB of memory/RAM. Any workflows still in progress will not be affected.
![Resize VM](/docs/screenshots/resize-VM.png)
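If you prefer the Azure CLI over the Portal, the resize can be done with `az vm resize`. The resource names and VM size below are placeholders (Standard_D4s_v3 has 16 GiB of RAM, comfortably above the 14 GB guidance):

```shell
# Placeholders - substitute your own resource group, VM name, and size.
rg='MyCromwellRG'
vm='MyCromwellVM'
size='Standard_D4s_v3'
cmd="az vm resize --resource-group $rg --name $vm --size $size"
echo "$cmd"   # review, then run once the names are verified
```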
## Setup
### Setup Cromwell on Azure for multiple users in the same Azure subscription
This section is COMING SOON.
### Debug my Cromwell on Azure installation that ran into an error
If the Cromwell on Azure installer encounters errors, the logs are printed to the terminal. Most errors are caused by insufficient permissions to create resources in Azure on your behalf, or by intermittent Azure failures. When an error occurs, the installation process terminates and begins deleting any resources already created in the Resource Group. <br/>
Deleting all the resources in the Resource Group may take a while, but as soon as the logs show that the batch account was deleted, you may exit the current process with Ctrl+C (or Command+C) in your terminal/command prompt/PowerShell. The deletion of the remaining Azure resources continues in the background on Azure. After fixing any user errors from the previous attempt, such as permissions, re-run the installer.
If you see an issue that is unrelated to your permissions, and re-trying the installer does not fix it, please file a bug on our GitHub issues.
### Upgrade my Cromwell on Azure instance
Starting in version 1.x, for convenience, some configuration files are hosted on your Cromwell on Azure storage account in the "configuration" container: `containers-to-mount` and `cromwell-application.conf`. You can modify and save these files using the Azure Portal UI "Edit Blob" option, or simply upload a new file to replace the existing one. If the "configuration" container does not exist in your storage account, create it first; then [follow these steps](https://github.com/microsoft/CromwellOnAzure/releases/tag/1.3.0) to upgrade your Cromwell on Azure instance.
## Analysis
### Job failed immediately
If a workflow you start has a task that failed immediately and led to workflow failure, be sure to check your input JSON files. Follow the instructions [here](managing-your-workflow.md/#How-to-prepare-an-inputs-JSON-file-to-use-in-your-workflow) and check out an example WDL and inputs JSON file [here](example-fastq-to-ubam.md/#Configure-your-Cromwell-on-Azure-trigger-JSON,-inputs-JSON-and-WDL-files) to ensure there are no errors in defining your input files.
> For files hosted on an Azure Storage account that is connected to your Cromwell on Azure instance, the input path consists of 3 parts: the storage account name, the blob container name, and the file path with extension, in this format:
```
/<storageaccountname>/<containername>/<blobName>
```
> An example file path for the "inputs" container in the storage account "msgenpublicdata" looks like
`"/msgenpublicdata/inputs/chr21.read1.fq.gz"`
Example WDL file:
```
task hello {
  String name

  command {
    echo 'Hello ${name}!'
  }
  output {
    File response = stdout()
  }
  runtime {
    docker: 'ubuntu:16.04'
  }
}

workflow test {
  call hello
}
```
Example inputs.json file:
```
{
    "test.hello.name": "World"
}
```
Another possibility is that you are trying to use a storage account that hasn't been mounted to your Cromwell on Azure instance - either by [default during setup](../README.md/#Cromwell-on-Azure-deployed-resources) or by following these steps to [mount a different storage account](#Use-input-data-files-from-an-existing-Storage-account-that-my-lab-or-team-is-currently-using). <br/>
Check out these [known issues and mitigation](#Known-Issues-And-Mitigation) for more commonly seen issues caused by bugs we are actively tracking.
### Set up my own WDL
To get started you can view this [Hello World sample](../README.md/#Hello-World-WDL-test), an [example WDL](example-fastq-to-ubam.md) to convert FASTQ to UBAM, or [follow these steps](change-existing-WDL-for-Azure.md) to convert an existing public WDL for other clouds to run on Azure. <br/>
There are also links to ready-to-try WDLs for common workflows [here](../README.md/#Run-Common-Omics-Workflows).
**Instructions to write a WDL file for a pipeline from scratch are COMING SOON.**
## How to define the runtime attributes in a WDL workflow for Cromwell on Azure
To run a WDL file, create or modify your workflow so that each task's runtime attributes are compliant with the [TES or Task Execution Schemas](https://cromwell.readthedocs.io/en/develop/backends/TES/):
```
runtime {
cpu: 1
memory: 2 GB
disk: 10 GB
docker:
maxRetries: 0
}
```
Ensure that the attributes `memory` and `disk` (note: use the singular form for `disk` NOT `disks`) have units. Supported units from Cromwell:
> KB - "KB", "K", "KiB", "Ki"<br/>
> MB - "MB", "M", "MiB", "Mi"<br/>
> GB - "GB", "G", "GiB", "Gi"<br/>
> TB - "TB", "T", "TiB", "Ti"<br/>
`preemptible` and `zones` attributes are currently not being passed through Broad's Cromwell to the TES backend, and hence are not supported.<br/>
Each of these runtime attributes is specific to your workflow and the tasks within it. The default values for resource requirements are those shown above.<br/>
Learn more about Cromwell's runtime attributes [here](https://cromwell.readthedocs.io/en/develop/RuntimeAttributes).
## How to get the Cromwell workflow ID
The Cromwell workflow ID is generated by Cromwell once the workflow is in progress, and it is appended to the trigger JSON file name.<br/>
For example, placing a trigger JSON file with the name `task1.json` in the "new" directory will initiate the workflow. Once the workflow begins, the JSON file will be moved to the "inprogress" directory in the "workflows" container with the modified name `task1.guid.json`.
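For scripting, the workflow ID can be recovered from the renamed trigger file. This is a sketch assuming the `name.guid.json` pattern described above (the GUID below is made up):

```shell
# Extract the Cromwell workflow ID from a trigger file name like task1.<guid>.json
f='task1.8b1e9c3d-2f47-4a1b-9d2e-0c6a1b2c3d4e.json'
tmp="${f%.json}"            # drop the .json extension
workflow_id="${tmp##*.}"    # keep everything after the last remaining dot
echo "$workflow_id"
# 8b1e9c3d-2f47-4a1b-9d2e-0c6a1b2c3d4e
```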
## How to abort a workflow when using Cromwell on Azure
To abort a workflow, upload an empty JSON file to the "workflows" container named `abort/ID.json` where ID is the Cromwell workflow ID.
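As a sketch, the abort file can be created and uploaded with the Azure CLI. The storage account name and workflow ID are placeholders, and the upload command is shown commented out so you can substitute your own values first:

```shell
# Build the abort trigger for a given Cromwell workflow ID (placeholder below).
workflow_id='00000000-0000-0000-0000-000000000000'
blob_name="abort/${workflow_id}.json"
printf '{}' > empty.json    # an empty JSON file is sufficient
echo "$blob_name"
# abort/00000000-0000-0000-0000-000000000000.json

# Then upload it to the "workflows" container (assumes Azure CLI login):
# az storage blob upload --account-name YourStorageAccount \
#     --container-name workflows --name "$blob_name" --file empty.json
```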
### Check all tasks running for a workflow using batch account
Each task in a workflow starts an Azure Batch VM. To see currently active tasks, navigate to your Azure Batch account connected to Cromwell on Azure on Azure Portal. Click on "Jobs" and then search for the Cromwell `workflowId` to see all tasks associated with a workflow.<br/>
![Batch account](screenshots/batch-account.png)
### Find which tasks failed in a workflow
[Cosmos DB](https://azure.microsoft.com/en-us/services/cosmos-db/) stores information about all tasks in a workflow. For monitoring or debugging any workflow you may choose to query the database.<br/>
Navigate to your Cosmos DB instance on Azure Portal. Click on the "Data Explorer" menu item, click on the "TES" container, and select "Items". <br/>
![Cosmos DB SQL query](screenshots/cosmosdb.PNG)
You can write a [SQL query](https://docs.microsoft.com/en-us/azure/cosmos-db/tutorial-query-sql-api) to get all tasks that have not completed successfully in a workflow using the following query, replacing `workflowId` with the id returned from Cromwell for your workflow:
```
SELECT * FROM c where startswith(c.description,"workflowId") AND c.state != "COMPLETE"
```
OR
```
SELECT * FROM c where startswith(c.id,"<first 9 character of the workflowId>") AND c.state != "COMPLETE"
```
### Make sure there are no Azure infrastructure errors
When working with Cromwell on Azure, you may run into issues with Azure Batch or Storage accounts. For instance, a file path cannot be found, or a WDL workflow fails for an unknown reason. For these scenarios, consider debugging or collecting more information using [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview).<br/>
Navigate to your Application Insights instance on Azure Portal. Click on the "Logs (Analytics)" menu item under the "Monitoring" section to get all logs from Cromwell on Azure's TES backend.<br/>
You can explore exceptions or logs to find the reason for failure, and use time ranges or [Kusto Query Language](https://docs.microsoft.com/en-us/azure/kusto/query/) to narrow your search.<br/>
## Customizing your Cromwell on Azure instance
### Connect to the host VM that runs all the Docker containers
To get logs from all the Docker containers or to use the Cromwell REST API endpoints, you may want to connect to the Linux host VM. At installation, a user is created to allow managing the host VM with username "vmadmin". The password is randomly generated and shown during installation. If you need to reset your VM password, you can do this using the Azure Portal or by following these [instructions](https://docs.microsoft.com/en-us/azure/virtual-machines/troubleshooting/reset-password).
![Reset password](/docs/screenshots/resetpassword.PNG)
To connect to your host VM, you can either
1. Construct your ssh connection string if you have the VM name `ssh vmadmin@<hostname>` OR
2. Navigate to the Connect button on the Overview blade of your Azure VM instance, then copy the ssh connection string.
Paste the ssh connection string in a command line, PowerShell or terminal application to log in.
![Connect with SSH](/docs/screenshots/connectssh.PNG)
### Use input data files from an existing storage account that my lab or team is currently using
Navigate to the "configuration" container in the Cromwell on Azure Storage account. Replace YOURSTORAGEACCOUNTNAME with your storage account name and YOURCONTAINERNAME with your container name in the `containers-to-mount` file below:
```
/YOURSTORAGEACCOUNTNAME/YOURCONTAINERNAME/
```
Add this line to the end of the file and save your changes.<br/>
To allow the host VM to write to a storage account, [add the VM identity as a Contributor](/README.md/#Connect-to-existing-Azure-resources-I-own-that-are-not-part-of-the-Cromwell-on-Azure-instance-by-default) to the Storage Account via Azure Portal or Azure CLI.<br/>
Alternatively, you can choose to add a [SAS url for your desired container](https://docs.microsoft.com/en-us/azure/storage/common/storage-sas-overview) to the end of the `containers-to-mount` file. This is also applicable if your VM cannot be granted Contributor access to the storage account because the two resources are in different Azure tenants.
```
https://<yourstorageaccountname>.blob.core.windows.net:443/<yourcontainername>?<sastoken>
```
When using the newly mounted storage account in your inputs JSON file, use the path `"/container-mountpath/blobName"`, where `container-mountpath` is `/YOURSTORAGEACCOUNTNAME/YOURCONTAINERNAME/`.
For these changes to take effect, be sure to restart your Cromwell on Azure VM through the Azure Portal UI.
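To illustrate the path mapping, this sketch builds an inputs-JSON path from hypothetical storage account and container names:

```shell
# Hypothetical storage account and container mounted via containers-to-mount.
storage='mylabdata'
container='inputs'
blob='sample/chr21.read1.fq.gz'
path="/$storage/$container/$blob"   # the value to use in your inputs JSON
echo "$path"
# /mylabdata/inputs/sample/chr21.read1.fq.gz
```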
### Use a batch account for which I have already requested or received increased cores quota from Azure Support
[Log on to the host VM](#Connect-to-the-host-VM-that-runs-all-the-docker-containers) using the ssh connection string as described in the instructions. Replace `BatchAccountName` environment variable for the "tes" service in the `docker-compose.yml` file with the name of the desired batch account and save your changes.<br/>
```
cd /cromwellazure/
sudo nano docker-compose.yml
# Modify the BatchAccountName to your Batch Account name and save the file
```
To allow the host VM to use a batch account, [add the VM identity as a Contributor](../README.md/#Connect-to-existing-Azure-resources-I-own-that-are-not-part-of-the-Cromwell-on-Azure-instance-by-default) to the Azure batch account via Azure Portal or Azure CLI.<br/>
To allow the host VM to read prices and information about types of machines available for the batch account, [add the VM identity as a Billing Reader](../README.md/#Connect-to-existing-Azure-resources-I-own-that-are-not-part-of-the-Cromwell-on-Azure-instance-by-default) to the subscription with the configured Batch Account.
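If you use the Azure CLI rather than the Portal, role assignments of this kind generally take the form below. The principal ID and resource scope are placeholders that you would replace with your VM identity's principal ID and your batch account's resource ID:

```shell
# Placeholder principal ID (the VM's managed identity) and batch account scope.
principal='00000000-0000-0000-0000-000000000000'
scope='/subscriptions/<sub-id>/resourceGroups/MyRG/providers/Microsoft.Batch/batchAccounts/mybatch'
cmd="az role assignment create --assignee $principal --role Contributor --scope $scope"
echo "$cmd"   # review, substitute real values, then run
```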
For these changes to take effect, be sure to restart your Cromwell on Azure VM through the Azure Portal UI or run `sudo reboot`.
### Use private Docker containers hosted on Azure
Cromwell on Azure supports private Docker images for your WDL tasks hosted on [Azure Container Registry or ACR](https://docs.microsoft.com/en-us/azure/container-registry/).
To allow the host VM to use an ACR, [add the VM identity as a Contributor](../README.md/#Connect-to-existing-Azure-resources-I-own-that-are-not-part-of-the-Cromwell-on-Azure-instance-by-default) to the Container Registry via Azure Portal or Azure CLI.<br/>
### Configure my Cromwell on Azure instance to always use dedicated batch VMs to avoid getting preempted
By default, the environment variable `UsePreemptibleVmsOnly` is set to true, so low-priority Azure Batch nodes are always used.<br/>
If you prefer to use dedicated Azure Batch nodes for all tasks, [log on to the host VM](#Connect-to-the-host-VM-that-runs-all-the-docker-containers) using the ssh connection string as described in the instructions. Set the `UsePreemptibleVmsOnly` environment variable for the "tes" service to "false" in the `docker-compose.yml` file and save your changes.<br/>
```
cd /cromwellazure/
sudo nano docker-compose.yml
# Modify UsePreemptibleVmsOnly to false and save the file
```
Note: the `preemptible` boolean flag in a task's "runtime" attributes section is intended to override the `UsePreemptibleVmsOnly` setting for that task; however, as noted in the runtime attributes guidance above, the `preemptible` attribute is currently not passed through Cromwell to the TES backend, so `UsePreemptibleVmsOnly` applies to all tasks.<br/>
For these changes to take effect, be sure to restart your Cromwell on Azure VM through the Azure Portal UI or run `sudo reboot`.
### Access the Cromwell REST API directly from Linux host VM
Cromwell is run in server mode on the Linux host VM. After [logging in to the host VM](#Connect-to-the-host-VM-that-runs-all-the-docker-containers), it can be accessed via curl as described below:
***Get all workflows***<br/>
`curl -X GET "http://localhost:8000/api/workflows/v1/query" -H "accept: application/json"`<br/>
***Get specific workflow's status by id***<br/>
`curl -X GET "http://localhost:8000/api/workflows/v1/{id}/status" -H "accept: application/json"`<br/>
***Get call-caching difference between two workflow calls***<br/>
`curl -X GET "http://localhost:8000/api/workflows/v1/callcaching/diff?workflowA={workflowId1}&callA={workflowName.callName1}&workflowB={workflowId2}&callB={workflowName.callName2}" -H "accept: application/json"`<br/>
You can perform other Cromwell API calls following a similar pattern. To see all available API endpoints, see Cromwell's REST API [here](https://cromwell.readthedocs.io/en/stable/api/RESTAPI/).
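Following the same pattern, other endpoints from Cromwell's REST API can be called. For example, a workflow can also be aborted over REST (the workflow ID below is a placeholder):

```shell
# Build the abort endpoint URL for a workflow (placeholder workflow ID).
id='00000000-0000-0000-0000-000000000000'
url="http://localhost:8000/api/workflows/v1/${id}/abort"
echo "$url"
# http://localhost:8000/api/workflows/v1/00000000-0000-0000-0000-000000000000/abort

# From the host VM:
# curl -X POST "$url" -H "accept: application/json"
```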
## Performance and Optimization
### Cost analysis for Cromwell on Azure
To learn more about your Cromwell on Azure Resource Group's cost, navigate to the "Cost Analysis" menu item in the "Cost Management" section of your Azure Resource Group on the Azure Portal. More information [here](https://docs.microsoft.com/en-us/azure/cost-management/quick-acm-cost-analysis).<br/>
![RG cost analysis](screenshots/rgcost.PNG)
You can also use the [Pricing Calculator](https://azure.microsoft.com/en-us/pricing/calculator/) to estimate your monthly cost.
### How Cromwell on Azure selects batch VMs to run tasks in a workflow
VM price data is used to select the cheapest per hour VM for a task's runtime requirements, and is also stored in the TES database to allow calculation of total workflow cost. VM price data is obtained from the [Azure RateCard API](https://docs.microsoft.com/en-us/previous-versions/azure/reference/mt219005(v=azure.100)). Accessing the Azure RateCard API requires the VM's [Billing Reader](https://docs.microsoft.com/en-us/azure/role-based-access-control/built-in-roles#billing-reader) role to be assigned to your Azure subscription scope. If you don't have [Owner](https://docs.microsoft.com/en-us/azure/role-based-access-control/built-in-roles#owner), or both [Contributor](https://docs.microsoft.com/en-us/azure/role-based-access-control/built-in-roles#contributor) and [User Access Administrator](https://docs.microsoft.com/en-us/azure/role-based-access-control/built-in-roles#user-access-administrator) roles assigned to your Azure subscription, the deployer will not be able to complete this on your behalf - you will need to contact your Azure subscription administrator(s) to complete this for you. You will see a warning in the TES logs indicating that default VM prices are being used until this is resolved.
### Optimize my WDLs
This section is COMING SOON.
## Miscellaneous
### Get container logs to debug issues
The host VM is running multiple Docker containers that enable Cromwell on Azure - mysql, broadinstitute/cromwell, cromwellonazure/tes, cromwellonazure/triggerservice. On rare occasions, you may want to debug and diagnose issues with the Docker containers. After [logging in to the host VM](#Connect-to-the-host-VM-that-runs-all-the-docker-containers), run:
```
sudo docker ps
```
This command will list the names of all the Docker containers currently running. To get logs for a particular container, run:
```
sudo docker logs 'containerName'
```
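`docker logs` also supports tailing and following output, which is often more convenient for long-running containers. The container name below is a placeholder; use the name reported by `sudo docker ps`:

```shell
# Tail the last 100 lines and keep following new output (placeholder name).
name='tes'
cmd="sudo docker logs --tail 100 --follow $name"
echo "$cmd"
```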
### I am running a large amount of workflows and MySQL storage disk is full
To ensure that no data is corrupted in the MySQL-backed storage for Cromwell, Cromwell on Azure mounts the MySQL files onto an Azure Managed Data Disk with a size of 32 GB. If you need to increase the size of this data disk, follow the instructions [here](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/expand-disks#expand-an-azure-managed-disk).
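For reference, the disk-resize step in the linked instructions uses `az disk update`. The resource names and new size below are placeholders, and you still need to expand the partition and filesystem inside the VM as the linked document describes:

```shell
# Placeholders - substitute your resource group and the MySQL data disk's name.
rg='MyCromwellRG'
disk='MyMySQLDataDisk'
cmd="az disk update --resource-group $rg --name $disk --size-gb 64"
echo "$cmd"   # detach or stop the VM first, per the linked instructions
```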