MIT License

Copyright (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md

# Repository setup required :wave:

Please visit the website URL :point_right: for this repository to complete the setup and configure access controls.
## Azure Synapse 1-click POC environment with pre-populated dataset, pipeline, notebook

This 1-click deployment lets you deploy a proof-of-concept environment of Azure Synapse Analytics with a dataset (New York Taxi Trips & Fares data), a pipeline (ingest, merge, aggregate), and a notebook (Spark ML prediction).

## Prerequisites

You need the Owner role (or Contributor role) on the Azure subscription the template is being deployed into. This is required to create a separate proof-of-concept resource group and to delegate the roles necessary for this proof of concept. Refer to the [official documentation](https://docs.microsoft.com/en-us/azure/role-based-access-control/role-assignments-steps) for RBAC role assignments.

## Deployment Steps

1. Fork [this GitHub repository](https://github.com/Azure/Test-Drive-Synapse-Link-For-DataVerse-With-1-Click) into your GitHub account.
**If you don't fork the repo:**

+ **The pre-populated dataset, pipeline and notebook will not be deployed**
+ **You will get a GitHub publishing error**

<!-- ![Fork](https://raw.githubusercontent.com/Azure/Test-Drive-Synapse-Link-For-DataVerse-With-1-Click/main/images/4.gif) -->

2. Click the 'Deploy To Azure' button below to deploy all the resources.

[![Deploy To Azure](https://raw.githubusercontent.com/Azure/azure-quickstart-templates/master/1-CONTRIBUTION-GUIDE/images/deploytoazure.svg?sanitize=true)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure%2FTest-Drive-Azure-Synapse-with-a-1-click-POC%2Fmain%2Fazuredeploy.json)
- Provide the values for:

  - Resource group (create new)
  - Region
  - Company Tla
  - Option (true or false) for Allow All Connections
  - Option (true or false) for Spark Deployment
  - Spark Node Size (Small, Medium, Large) if Spark Deployment is set to true
  - Sql Administrator Login
  - Sql Administrator Login Password
  - Sku
  - Option (true or false) for Metadata Sync
  - Frequency
  - Time Zone
  - Resume Time
  - Pause Time
  - Option (Enabled or Disabled) for Transparent Data Encryption
  - Github Username (the account into which [this github repository](https://github.com/Azure/Test-Drive-Synapse-Link-For-DataVerse-With-1-Click) was forked)

- Click 'Review + Create'.
- On successful validation, click 'Create'.
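If you prefer the command line to the portal button, the same template can be deployed with the Azure CLI. This is a sketch only: the resource group name, region, and parameter values below are illustrative placeholders, not values taken from this repository.

```shell
# Illustrative values -- adjust for your environment.
rg="synapse-poc-rg"       # placeholder resource group name
location="eastus"         # placeholder region

# Guarded so the script is a no-op where the Azure CLI is not installed.
if command -v az >/dev/null 2>&1; then
  az group create --name "$rg" --location "$location"
  az deployment group create \
    --resource-group "$rg" \
    --template-file azuredeploy.json \
    --parameters companyTla=CON \
                 githubUsername="<your-github-username>" \
                 sqlAdministratorLogin="<sql-admin-login>" \
                 sqlAdministratorLoginPassword="<sql-admin-password>"
fi
```

Parameters not passed on the command line fall back to the defaults declared in `azuredeploy.json`.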
## Azure Services being deployed

This template deploys the resources necessary to run an Azure Synapse proof of concept.
The following resources are deployed with this template, along with some RBAC role assignments:

- An Azure Synapse workspace
- An Azure Synapse SQL pool
- An optional Apache Spark pool
- An Azure Data Lake Storage Gen2 account
- A new file system inside the storage account, to be used by Azure Synapse
- A Logic App to pause the SQL pool on a defined schedule
- A Logic App to resume the SQL pool on a defined schedule
- A key vault to store the secrets

<!-- The data pipeline inside the Synapse workspace gets New York Taxi trip and fare data, joins them, and performs aggregations on them to give the final aggregated results. Other resources include datasets, linked services and dataflows. All resources are completely parameterized and all the secrets are stored in the key vault. These secrets are fetched inside the linked services using a key vault linked service. The Logic App checks for active queries; if there are any, it waits 5 minutes and checks again until there are none before pausing. -->
## Post Deployment

- The current Azure user needs the ["Storage Blob Data Contributor" role](https://docs.microsoft.com/en-us/azure/synapse-analytics/get-started-add-admin#azure-rbac-role-assignments-on-the-workspaces-primary-storage-account) on the newly created Azure Data Lake Storage Gen2 account to avoid 403 permission errors.
- After the deployment is complete, click 'Go to resource group'.
- You'll see all the resources deployed in the resource group.
- Click on the newly deployed Synapse workspace.
- Click the 'Open' link inside the box labelled 'Open Synapse Studio'.
- Click 'Log into Github' after the workspace is opened, and provide your credentials for the GitHub account holding the forked repository.
- After logging in to your GitHub account, click the 'Integrate' icon in the left panel. A blade will appear from the right side of the screen.
- Make sure the 'main' branch is selected as the 'Working branch' and click 'Save'.

![PostDeployment-1](https://raw.githubusercontent.com/Azure/Test-Drive-Synapse-Link-For-DataVerse-With-1-Click/main/images/1.gif)
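The "Storage Blob Data Contributor" assignment from the first step above can also be scripted. This is a sketch with placeholder names: substitute the resource group and Data Lake Storage account from your own deployment.

```shell
# Illustrative placeholders -- substitute the names from your deployment.
rg="synapse-poc-rg"
storage_account="<datalake-account-name>"

if command -v az >/dev/null 2>&1; then
  # Object id of the signed-in user and the current subscription id.
  assignee=$(az ad signed-in-user show --query id --output tsv)
  subscription=$(az account show --query id --output tsv)

  az role assignment create \
    --assignee "$assignee" \
    --role "Storage Blob Data Contributor" \
    --scope "/subscriptions/$subscription/resourceGroups/$rg/providers/Microsoft.Storage/storageAccounts/$storage_account"
fi
```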
- Now open the pipeline named 'TripFaresDataPipeline'.
- Click the 'Parameters' tab at the bottom of the window.
- Update the following parameter values. ___(You can copy the resource names from the recently deployed resource group.)___
  - SynapseWorkspaceName (make sure the workspace name is a fully qualified domain name, i.e. workspaceName.database.windows.net)
  - SQLDedicatedPoolName
  - SQLLoginUsername
  - KeyVaultName
  - DatalakeAccountName

![PostDeployment-2](https://raw.githubusercontent.com/Azure/Test-Drive-Synapse-Link-For-DataVerse-With-1-Click/main/images/2.gif)
- After the parameters are updated, click 'Commit all'.
- After a successful commit, click 'Publish'. A blade will appear from the right side of the window.
- Click 'Ok'.

![PostDeployment-3](https://raw.githubusercontent.com/Azure/Test-Drive-Synapse-Link-For-DataVerse-With-1-Click/main/images/3.gif)

- To trigger the pipeline, click 'Add trigger' in the top panel and then click 'Trigger now'.
- Confirm the pipeline parameters' values and click 'Ok'.
- You can check the pipeline status under 'Pipeline runs' in the 'Monitor' tab on the left panel.

![PostDeployment-4](https://raw.githubusercontent.com/Azure/Test-Drive-Synapse-Link-For-DataVerse-With-1-Click/main/images/5.gif)
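The same trigger-and-monitor steps can be driven from the Azure CLI's `synapse` extension. A sketch, with the workspace name as a placeholder; only the pipeline name ('TripFaresDataPipeline') comes from this repository.

```shell
workspace="<synapse-workspace-name>"   # placeholder

if command -v az >/dev/null 2>&1; then
  # Start a run of the pipeline published above.
  az synapse pipeline create-run \
    --workspace-name "$workspace" \
    --name "TripFaresDataPipeline"

  # List recent runs; roughly the 'Pipeline runs' view in the Monitor tab.
  az synapse pipeline-run query-by-workspace \
    --workspace-name "$workspace" \
    --last-updated-after  "2021-01-01T00:00:00Z" \
    --last-updated-before "2030-01-01T00:00:00Z"
fi
```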
- To run the notebook (if a Spark pool is deployed), click the 'Develop' tab on the left panel.
- Under the 'Notebooks' dropdown on the left side of the screen, click the notebook named 'Data Exploration and ML Modeling - NYC taxi predict using Spark MLlib'.
- Click 'Run all' to run the notebook. (It might take a few minutes to start the session.)

![PostDeployment-5](https://raw.githubusercontent.com/Azure/Test-Drive-Synapse-Link-For-DataVerse-With-1-Click/main/images/6.gif)

- Once published, all the resources will be available in live mode.
- To switch from git mode to live mode, click the dropdown at the top left corner and select 'Switch to live mode'.

![PostDeployment-6](https://raw.githubusercontent.com/Azure/Test-Drive-Synapse-Link-For-DataVerse-With-1-Click/main/images/liveMode.PNG)
## Steps for PowerBI integration

**Pre-requisites**

A PowerBI workspace must already exist. Note that you can't use the default workspace ('My workspace'); create a new PowerBI workspace or use any workspace other than 'My workspace'.

Create a PowerBI workspace --> https://docs.microsoft.com/en-us/power-bi/collaborate-share/service-create-the-new-workspaces

**Link Azure Synapse workspace to PowerBI workspace**

- In the Synapse workspace, go to Manage --> Linked services.
- Click the PowerBIWorkspaceTripsFares linked service.
- From the drop-down list, select your PowerBI workspace, then Save and publish.

![20211014134407](https://user-images.githubusercontent.com/88354448/137524650-9d066921-d057-4a08-8d55-4f8c02eb3690.gif)
- Download [NYCTaxiCabTripAndFare.pbit](https://github.com/Azure/Test-Drive-Synapse-Link-For-DataVerse-With-1-Click/tree/main/synapsepoc/PowerBITemplate/NYCTaxiCabTripAndFare.pbit) from the PowerBITemplate folder.
- Provide the ServerName, DatabaseName, and login credentials. The ServerName and DatabaseName can be found in the connection strings:
  - To get the connection string, click on the dedicated SQL pool.
  - In the left-hand menu, click on connection strings.
  - Copy the ServerName and DatabaseName from the connection string, paste them into PowerBI, and click 'Load'.
- Select 'Database' (instead of the default 'Windows'), provide the user name and password, and click 'Connect'.

![20211014140340](https://user-images.githubusercontent.com/88354448/137524802-c720137f-9f9c-4c84-93b9-35c5ef0ce759.gif)

- Change the sensitivity level to 'Public' and **save** the dashboard.
- Publish the dashboard to the PowerBI workspace you created by clicking 'Publish' and selecting the workspace.
- In the Synapse workspace, navigate to Develop --> PowerBI --> Refresh.
- You will see the PowerBI report you published to the PowerBI workspace in Synapse.

![20211014144422](https://user-images.githubusercontent.com/88354448/137524861-ac32c4dc-856f-41e9-8f01-8dfa0cc7baae.gif)
azuredeploy.json

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "companyTla": {
      "type": "string",
      "metadata": {
        "description": "This is a Three Letter Acronym for your company name. 'CON' for Contoso for example."
      }
    },
    "allowAllConnections": {
      "type": "string",
      "allowedValues": [
        "true",
        "false"
      ],
      "defaultValue": "true"
    },
    "sparkDeployment": {
      "type": "string",
      "defaultValue": "true",
      "allowedValues": [
        "true",
        "false"
      ],
      "metadata": {
        "description": "'True' deploys an Apache Spark pool as well as a SQL pool. 'False' does not deploy an Apache Spark pool."
      }
    },
    "sparkNodeSize": {
      "type": "string",
      "defaultValue": "Medium",
      "allowedValues": [
        "Small",
        "Medium",
        "Large"
      ],
      "metadata": {
        "description": "This parameter will determine the node size if SparkDeployment is true"
      }
    },
    "sqlAdministratorLogin": {
      "type": "string",
      "metadata": {
        "description": "The username of the SQL Administrator"
      }
    },
    "sqlAdministratorLoginPassword": {
      "type": "securestring",
      "metadata": {
        "description": "The password for the SQL Administrator"
      }
    },
    "sku": {
      "type": "string",
      "defaultValue": "DW100c",
      "allowedValues": [
        "DW100c",
        "DW200c",
        "DW300c",
        "DW400c",
        "DW500c",
        "DW1000c",
        "DW1500c",
        "DW2000c",
        "DW2500c",
        "DW3000c"
      ],
      "metadata": {
        "description": "Select the SKU of the SQL pool."
      }
    },
    "metadataSync": {
      "type": "bool",
      "defaultValue": false,
      "metadata": {
        "description": "Choose whether you want to synchronise metadata."
      }
    },
    "Frequency": {
      "type": "string",
      "defaultValue": "Weekdays",
      "allowedValues": [
        "Daily",
        "Weekdays"
      ],
      "metadata": {
        "description": "Choose whether to run schedule every day of the week, or only on weekdays"
      }
    },
    "timeZone": {
      "type": "string",
      "defaultValue": "Eastern Standard Time",
      "allowedValues": [
        "Dateline Standard Time",
        "Samoa Standard Time",
        "Hawaiian Standard Time",
        "Alaskan Standard Time",
        "Pacific Standard Time",
        "Mountain Standard Time",
        "Mexico Standard Time 2",
        "Central Standard Time",
        "Canada Central Standard Time",
        "Mexico Standard Time",
        "Central America Standard Time",
        "Eastern Standard Time",
        "Atlantic Standard Time",
        "Newfoundland and Labrador Standard Time",
        "E. South America Standard Time",
        "S.A. Eastern Standard Time",
        "Greenland Standard Time",
        "Mid-Atlantic Standard Time",
        "Azores Standard Time",
        "Cape Verde Standard Time",
        "GMT Standard Time",
        "Greenwich Standard Time",
        "Central Europe Standard Time",
        "Central European Standard Time",
        "Romance Standard Time",
        "W. Europe Standard Time",
        "W. Central Africa Standard Time",
        "E. Europe Standard Time",
        "Egypt Standard Time",
        "FLE Standard Time",
        "GTB Standard Time",
        "Israel Standard Time",
        "South Africa Standard Time",
        "Russian Standard Time",
        "Arab Standard Time",
        "E. Africa Standard Time",
        "Arabic Standard Time",
        "Iran Standard Time",
        "Arabian Standard Time",
        "Caucasus Standard Time",
        "Transitional Islamic State of Afghanistan Standard Time",
        "Ekaterinburg Standard Time",
        "West Asia Standard Time",
        "India Standard Time",
        "Nepal Standard Time",
        "Central Asia Standard Time",
        "Sri Lanka Standard Time",
        "Myanmar Standard Time",
        "North Asia Standard Time",
        "China Standard Time",
        "Singapore Standard Time",
        "Taipei Standard Time",
        "North Asia East Standard Time",
        "Korea Standard Time",
        "Tokyo Standard Time",
        "Yakutsk Standard Time",
        "Tasmania Standard Time",
        "Vladivostok Standard Time",
        "West Pacific Standard Time",
        "Central Pacific Standard Time",
        "Fiji Islands Standard Time",
        "New Zealand Standard Time",
        "Tonga Standard Time"
      ],
      "metadata": {
        "description": "Timezone for the schedule. Consult https://msdn.microsoft.com/en-us/library/ms912391(v=winembedded.11).aspx for more information"
      }
    },
    "ResumeTime": {
      "type": "string",
      "defaultValue": "09:00 PM ( 21:00 )",
      "allowedValues": [
        "12:00 AM ( 0:00 )",
        "01:00 AM ( 1:00 )",
        "02:00 AM ( 2:00 )",
        "03:00 AM ( 3:00 )",
        "04:00 AM ( 4:00 )",
        "05:00 AM ( 5:00 )",
        "06:00 AM ( 6:00 )",
        "07:00 AM ( 7:00 )",
        "08:00 AM ( 8:00 )",
        "09:00 AM ( 9:00 )",
        "10:00 AM ( 10:00 )",
        "11:00 AM ( 11:00 )",
        "12:00 PM ( 12:00 )",
        "01:00 PM ( 13:00 )",
        "02:00 PM ( 14:00 )",
        "03:00 PM ( 15:00 )",
        "04:00 PM ( 16:00 )",
        "05:00 PM ( 17:00 )",
        "06:00 PM ( 18:00 )",
        "07:00 PM ( 19:00 )",
        "08:00 PM ( 20:00 )",
        "09:00 PM ( 21:00 )",
        "10:00 PM ( 22:00 )",
        "11:00 PM ( 23:00 )"
      ],
      "metadata": {
        "description": "Time of Day when the data warehouse will be resumed"
      }
    },
    "PauseTime": {
      "type": "string",
      "defaultValue": "05:00 PM ( 17:00 )",
      "allowedValues": [
        "12:00 AM ( 0:00 )",
        "01:00 AM ( 1:00 )",
        "02:00 AM ( 2:00 )",
        "03:00 AM ( 3:00 )",
        "04:00 AM ( 4:00 )",
        "05:00 AM ( 5:00 )",
        "06:00 AM ( 6:00 )",
        "07:00 AM ( 7:00 )",
        "08:00 AM ( 8:00 )",
        "09:00 AM ( 9:00 )",
        "10:00 AM ( 10:00 )",
        "11:00 AM ( 11:00 )",
        "12:00 PM ( 12:00 )",
        "01:00 PM ( 13:00 )",
        "02:00 PM ( 14:00 )",
        "03:00 PM ( 15:00 )",
        "04:00 PM ( 16:00 )",
        "05:00 PM ( 17:00 )",
        "06:00 PM ( 18:00 )",
        "07:00 PM ( 19:00 )",
        "08:00 PM ( 20:00 )",
        "09:00 PM ( 21:00 )",
        "10:00 PM ( 22:00 )",
        "11:00 PM ( 23:00 )"
      ],
      "metadata": {
        "description": "Time of day when the data warehouse will be paused"
      }
    },
    "githubUsername": {
      "type": "string",
      "metadata": {
        "description": "Username of your github account hosting synapse workspace resources"
      }
    }
  },
  "variables": {
    "_artifactsLocation": "[deployment().properties.templateLink.uri]",
    "location": "[resourceGroup().location]",
    "deploymentType": "poc",
    "synapseName": "[toLower(concat(parameters('companyTla'),uniquestring(resourceGroup().id),variables('deploymentType')))]",
    "dlsName": "[toLower(concat(parameters('companyTla'),uniquestring(resourceGroup().id),variables('deploymentType')))]",
    "dlsFsName": "[toLower(concat('dls',parameters('companyTla'),variables('deploymentType'),'fs1'))]",
    "sqlPoolName": "[toLower(concat(variables('workspaceName'),'p1'))]",
    "workspaceName": "[toLower(concat(variables('synapseName'),'ws1'))]",
    "sparkPoolName": "[toLower('ws1sparkpool1')]",
    "keyVaultName": "[toLower(concat('kv',parameters('companyTla'),uniquestring(resourceGroup().id),variables('deploymentType')))]",
    "storageAccountId": "[resourceId('Microsoft.Storage/storageAccounts', variables('dlsName'))]",
    "logicApps": [
      "SynapsePauseSchedule",
      "SynapseResumeSchedule"
    ]
  },
  "resources": [
    {
      "name": "keyVaultDeployment",
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2018-05-01",
      "dependsOn": [
        "[resourceId('Microsoft.Storage/storageAccounts', variables('dlsName'))]"
      ],
      "properties": {
        "mode": "Incremental",
        "templateLink": {
          "uri": "[uri(variables('_artifactsLocation'), concat('nestedtemplates/keyvault.json'))]"
        },
        "parameters": {
          "location": { "value": "[variables('location')]" },
          "keyVaultName": { "value": "[variables('keyVaultName')]" },
          "sqlAdministratorLoginPassword": { "value": "[parameters('sqlAdministratorLoginPassword')]" },
          "datalakeAccountAccessKeyValue": { "value": "[listKeys(variables('storageAccountId'), '2019-06-01').keys[0].value]" }
        }
      }
    },
    {
      "name": "keyVaultPermissionsUpdate",
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2018-05-01",
      "dependsOn": [
        "[variables('workspaceName')]",
        "[resourceId('Microsoft.Resources/deployments','keyVaultDeployment')]"
      ],
      "properties": {
        "mode": "Incremental",
        "templateLink": {
          "uri": "[uri(variables('_artifactsLocation'), concat('nestedtemplates/keyvaultpermissionsupdate.json'))]"
        },
        "parameters": {
          "location": { "value": "[variables('location')]" },
          "keyVaultName": { "value": "[variables('keyVaultName')]" },
          "workspaceName": { "value": "[variables('workspaceName')]" }
        }
      }
    },
    {
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2020-06-01",
      "name": "logicAppPauseDeployment",
      "properties": {
        "mode": "Incremental",
        "templateLink": {
          "uri": "[uri(variables('_artifactsLocation'), concat('nestedtemplates/pausetemplate.json'))]"
        },
        "parameters": {
          "logicAppName": { "value": "[variables('logicApps')[0]]" },
          "Frequency": { "value": "[parameters('Frequency')]" },
          "companyTla": { "value": "[parameters('companyTla')]" },
          "deploymentType": { "value": "[variables('deploymentType')]" },
          "timeZone": { "value": "[parameters('timeZone')]" },
          "PauseTime": { "value": "[parameters('PauseTime')]" },
          "location": { "value": "[variables('location')]" }
        }
      }
    },
    {
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2020-06-01",
      "name": "logicAppResumeDeployment",
      "properties": {
        "mode": "Incremental",
        "templateLink": {
          "uri": "[uri(variables('_artifactsLocation'), concat('nestedtemplates/resumetemplate.json'))]"
        },
        "parameters": {
          "logicAppName": { "value": "[variables('logicApps')[1]]" },
          "Frequency": { "value": "[parameters('Frequency')]" },
          "companyTla": { "value": "[parameters('companyTla')]" },
          "deploymentType": { "value": "[variables('deploymentType')]" },
          "timeZone": { "value": "[parameters('timeZone')]" },
          "ResumeTime": { "value": "[parameters('ResumeTime')]" },
          "location": { "value": "[variables('location')]" }
        }
      }
    },
    {
      "type": "Microsoft.Storage/storageAccounts",
      "apiVersion": "2019-06-01",
      "name": "[variables('dlsName')]",
      "location": "[variables('location')]",
      "sku": {
        "name": "Standard_LRS"
      },
      "kind": "StorageV2",
      "properties": {
        "accessTier": "Hot",
        "supportsHttpsTrafficOnly": true,
        "isHnsEnabled": true
      },
      "resources": [
        {
          "name": "[concat('default/', variables('dlsFsName'))]",
          "type": "blobServices/containers",
          "apiVersion": "2019-06-01",
          "dependsOn": [
            "[variables('dlsName')]"
          ],
          "properties": {
            "publicAccess": "None"
          }
        }
      ]
    },
    {
      "type": "Microsoft.Synapse/workspaces",
      "apiVersion": "2019-06-01-preview",
      "name": "[variables('workspaceName')]",
      "location": "[variables('location')]",
      "identity": {
        "type": "SystemAssigned"
      },
      "dependsOn": [
        "[variables('dlsName')]",
        "[variables('dlsFsName')]"
      ],
      "properties": {
        "defaultDataLakeStorage": {
          "accountUrl": "[reference(variables('dlsName')).primaryEndpoints.dfs]",
          "filesystem": "[variables('dlsFsName')]"
        },
        "sqlAdministratorLogin": "[parameters('sqlAdministratorLogin')]",
        "sqlAdministratorLoginPassword": "[parameters('sqlAdministratorLoginPassword')]",
        "managedVirtualNetwork": "default",
        "workspaceRepositoryConfiguration": {
          "type": "WorkspaceGitHubConfiguration",
          "hostName": "https://github.com",
          "accountName": "[parameters('githubUsername')]",
          "repositoryName": "Test-Drive-Synapse-Link-For-DataVerse-With-1-Click",
          "rootFolder": "/synapsepoc",
          "collaborationBranch": "main"
        }
      },
      "resources": [
        {
          "condition": "[equals(parameters('allowAllConnections'),'true')]",
          "type": "firewallrules",
          "apiVersion": "2019-06-01-preview",
          "name": "allowAll",
          "location": "[variables('location')]",
          "dependsOn": [ "[variables('workspaceName')]" ],
          "properties": {
            "startIpAddress": "0.0.0.0",
            "endIpAddress": "255.255.255.255"
          }
        },
        {
          "type": "firewallrules",
          "apiVersion": "2019-06-01-preview",
          "name": "AllowAllWindowsAzureIps",
          "location": "[variables('location')]",
          "dependsOn": [ "[variables('workspaceName')]" ],
          "properties": {
            "startIpAddress": "0.0.0.0",
            "endIpAddress": "0.0.0.0"
          }
        },
        {
          "type": "managedIdentitySqlControlSettings",
          "apiVersion": "2019-06-01-preview",
          "name": "default",
          "location": "[variables('location')]",
          "dependsOn": [ "[variables('workspaceName')]" ],
          "properties": {
            "grantSqlControlToManagedIdentity": {
              "desiredState": "Enabled"
            }
          }
        }
      ]
    },
    {
      "type": "Microsoft.Synapse/workspaces/sqlPools",
      "apiVersion": "2019-06-01-preview",
      "name": "[concat(variables('workspaceName'), '/', variables('sqlPoolName'))]",
      "location": "[variables('location')]",
      "sku": {
        "name": "[parameters('sku')]"
      },
      "dependsOn": [
        "[variables('workspaceName')]"
      ],
      "properties": {
        "createMode": "Default",
        "collation": "SQL_Latin1_General_CP1_CI_AS"
      },
      "resources": [
        {
          "condition": "[parameters('metadataSync')]",
          "type": "metadataSync",
          "apiVersion": "2019-06-01-preview",
          "name": "config",
          "location": "[variables('location')]",
          "dependsOn": [
            "[variables('sqlPoolName')]"
          ],
          "properties": {
            "Enabled": "[parameters('metadataSync')]"
          }
        }
      ]
    },
    {
      "condition": "[equals(parameters('sparkDeployment'),'true')]",
      "type": "Microsoft.Synapse/workspaces/bigDataPools",
      "apiVersion": "2019-06-01-preview",
      "name": "[concat(variables('workspaceName'), '/', variables('sparkPoolName'))]",
      "location": "[variables('location')]",
      "dependsOn": [
        "[variables('workspaceName')]"
      ],
      "properties": {
        "nodeCount": 5,
        "nodeSizeFamily": "MemoryOptimized",
        "nodeSize": "[parameters('sparkNodeSize')]",
        "autoScale": {
          "enabled": true,
          "minNodeCount": 3,
          "maxNodeCount": 40
        },
        "autoPause": {
          "enabled": true,
          "delayInMinutes": 15
        },
        "sparkVersion": "2.4"
      }
    },
    {
      "scope": "[concat('Microsoft.Storage/storageAccounts/', variables('dlsName'))]",
      "type": "Microsoft.Authorization/roleAssignments",
      "apiVersion": "2020-04-01-preview",
      "name": "[guid(uniqueString(variables('dlsName')))]",
      "location": "[variables('location')]",
      "dependsOn": [
        "[variables('workspaceName')]"
      ],
      "properties": {
        "roleDefinitionId": "[resourceId('Microsoft.Authorization/roleDefinitions', 'ba92f5b4-2d11-453d-a403-e96b0029c9fe')]",
        "principalId": "[reference(resourceId('Microsoft.Synapse/workspaces', variables('workspaceName')), '2019-06-01-preview', 'Full').identity.principalId]",
        "principalType": "ServicePrincipal"
      }
    },
    {
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2020-06-01",
      "name": "MSIRBACOnResourceGroup0",
      "dependsOn": [
        "logicAppResumeDeployment",
        "logicAppPauseDeployment"
      ],
      "properties": {
        "mode": "Incremental",
        "templateLink": {
          "uri": "[uri(variables('_artifactsLocation'), concat('nestedtemplates/logicapproleassignments.json'))]"
        },
        "parameters": {
          "logicAppName": { "value": "[variables('logicApps')[0]]" }
        }
      }
    },
    {
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2020-06-01",
      "name": "MSIRBACOnResourceGroup1",
      "dependsOn": [
        "logicAppResumeDeployment",
        "logicAppPauseDeployment"
      ],
      "properties": {
        "mode": "Incremental",
        "templateLink": {
          "uri": "[uri(variables('_artifactsLocation'), concat('nestedtemplates/logicapproleassignments.json'))]"
        },
        "parameters": {
          "logicAppName": { "value": "[variables('logicApps')[1]]" }
        }
      }
    }
  ]
}
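The template above can be validated against a resource group before deploying. A sketch: the resource group name is a placeholder, and the template URI is the one encoded in the 'Deploy To Azure' button in the README.

```shell
rg="synapse-poc-rg"   # placeholder resource group
template_uri="https://raw.githubusercontent.com/Azure/Test-Drive-Azure-Synapse-with-a-1-click-POC/main/azuredeploy.json"

if command -v az >/dev/null 2>&1; then
  az deployment group validate \
    --resource-group "$rg" \
    --template-uri "$template_uri" \
    --parameters companyTla=CON \
                 githubUsername="<your-github-username>" \
                 sqlAdministratorLogin="<sql-admin-login>" \
                 sqlAdministratorLoginPassword="<sql-admin-password>"
fi
```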
azuredeploy.parameters.json

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "companyTla": {
      "value": "GEN-UNIQUE-3"
    },
    "sqlAdministratorLogin": {
      "value": "GEN-UNIQUE-6"
    },
    "sqlAdministratorLoginPassword": {
      "value": "GEN-PASSWORD"
    }
  }
}
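The `GEN-UNIQUE-*` and `GEN-PASSWORD` values are placeholders used by the Azure quickstart CI and must be replaced before real use. A sketch of deploying with a parameters file, assuming it is saved under the quickstart convention name `azuredeploy.parameters.json`; the resource group name and override values are illustrative.

```shell
rg="synapse-poc-rg"   # placeholder

if command -v az >/dev/null 2>&1; then
  az deployment group create \
    --resource-group "$rg" \
    --template-file azuredeploy.json \
    --parameters @azuredeploy.parameters.json \
    --parameters githubUsername="<your-github-username>" \
    --parameters sqlAdministratorLoginPassword="<sql-admin-password>"
fi
```

Later `--parameters` flags override values from the file, which is how the placeholder password is replaced.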
metadata.json

{
  "$schema": "https://aka.ms/azure-quickstart-templates-metadata-schema#",
  "itemDisplayName": "Azure Synapse Proof-of-Concept",
  "description": "This template creates a proof of concept environment for Azure Synapse, including SQL Pools and optional Apache Spark Pools",
  "summary": "Azure Synapse Proof-of-Concept",
  "validationType": "Manual",
  "githubUsername": "JamJarchitect",
  "dateUpdated": "2020-09-10",
  "type": "QuickStart",
  "environments": [
    "AzureCloud"
  ]
}
nestedtemplates/keyvault.json

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "keyVaultName": {
      "type": "string",
      "metadata": {
        "description": "Specifies the name of the key vault."
      }
    },
    "location": {
      "type": "string",
      "metadata": {
        "description": "Specifies the Azure location where the key vault should be created."
      }
    },
    "datalakeAccountAccessKeyValue": {
      "type": "securestring"
    },
    "sqlAdministratorLoginPassword": {
      "type": "securestring",
      "metadata": {
        "description": "Your password must be at least 8 characters in length. Your password must contain characters from three of the following categories – English uppercase letters, English lowercase letters, numbers (0-9), and non-alphanumeric characters (!, $, #, %, etc.). Your password cannot contain all or part of the login name. Part of a login name is defined as three or more consecutive alphanumeric characters."
      }
    }
  },
  "variables": {
    "sqlSecretName": "synapseSqlLoginPassword",
    "sqlSecretValue": "[parameters('sqlAdministratorLoginPassword')]",
    "adlsSecretName": "adlsAccessKey",
    "adlsSecretValue": "[parameters('datalakeAccountAccessKeyValue')]",
    "tenantId": "[subscription().tenantId]"
  },
  "resources": [
    {
      "type": "Microsoft.KeyVault/vaults",
      "apiVersion": "2019-09-01",
      "name": "[parameters('keyVaultName')]",
      "location": "[parameters('location')]",
      "properties": {
        "tenantId": "[variables('tenantId')]",
        "enableSoftDelete": false,
        "accessPolicies": [],
        "sku": {
          "name": "Standard",
          "family": "A"
        },
        "networkAcls": {
          "defaultAction": "Allow",
          "bypass": "AzureServices"
        }
      }
    },
    {
      "type": "Microsoft.KeyVault/vaults/secrets",
      "apiVersion": "2019-09-01",
      "name": "[concat(parameters('keyVaultName'), '/', variables('sqlSecretName'))]",
      "location": "[parameters('location')]",
      "dependsOn": [
        "[resourceId('Microsoft.KeyVault/vaults', parameters('keyVaultName'))]"
      ],
      "properties": {
        "value": "[variables('sqlSecretValue')]"
      }
    },
    {
      "type": "Microsoft.KeyVault/vaults/secrets",
      "apiVersion": "2019-09-01",
      "name": "[concat(parameters('keyVaultName'), '/', variables('adlsSecretName'))]",
      "location": "[parameters('location')]",
      "dependsOn": [
        "[resourceId('Microsoft.KeyVault/vaults', parameters('keyVaultName'))]"
      ],
      "properties": {
        "value": "[variables('adlsSecretValue')]"
      }
    }
  ],
  "outputs": {
    "dbSecretUri": {
      "type": "string",
      "value": "[resourceId('Microsoft.KeyVault/vaults', parameters('keyVaultName'))]"
    }
  }
}
|
|
@ -0,0 +1,61 @@
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "keyVaultName": {
      "type": "string",
      "metadata": {
        "description": "Specifies the name of the key vault."
      }
    },
    "location": {
      "type": "string",
      "metadata": {
        "description": "Specifies the Azure location where the key vault should be created."
      }
    },
    "workspaceName": {
      "type": "string",
      "metadata": {
        "description": "The name you provide will be appended with a unique string to make it globally unique. The name can contain only letters, numbers, and hyphens; the first and last characters must be a letter or number. Spaces are not allowed."
      }
    }
  },
  "variables": {
    "tenantId": "[subscription().tenantId]"
  },
  "resources": [
    {
      "type": "Microsoft.KeyVault/vaults",
      "apiVersion": "2019-09-01",
      "name": "[parameters('keyVaultName')]",
      "location": "[parameters('location')]",
      "properties": {
        "tenantId": "[variables('tenantId')]",
        "accessPolicies": [
          {
            "tenantId": "[variables('tenantId')]",
            "objectId": "[reference(concat('Microsoft.Synapse/workspaces/', parameters('workspaceName')), '2019-06-01-preview', 'Full').identity.principalId]",
            "permissions": {
              "keys": [],
              "secrets": [
                "Get"
              ],
              "certificates": []
            }
          }
        ],
        "sku": {
          "name": "Standard",
          "family": "A"
        },
        "networkAcls": {
          "defaultAction": "Allow",
          "bypass": "AzureServices"
        }
      }
    }
  ]
}
@ -0,0 +1,27 @@
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "logicAppName": {
      "type": "string"
    }
  },
  "variables": {
    "contributor": "b24988ac-6180-42a0-ab88-20f7382dd24c",
    "roleDefinitionId": "[resourceId('Microsoft.Authorization/roleDefinitions', variables('contributor'))]",
    "roleAssignmentName": "[guid(parameters('logicAppName'), resourceGroup().id, variables('roleDefinitionId'))]"
  },
  "resources": [
    {
      "type": "Microsoft.Authorization/roleAssignments",
      "apiVersion": "2020-04-01-preview",
      "name": "[variables('roleAssignmentName')]",
      "properties": {
        "principalId": "[reference(resourceId('Microsoft.Logic/workflows', parameters('logicAppName')), '2019-05-01', 'full').identity.principalId]",
        "roleDefinitionId": "[variables('roleDefinitionId')]",
        "scope": "[resourceGroup().id]",
        "principalType": "ServicePrincipal"
      }
    }
  ]
}
@ -0,0 +1,302 @@
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "location": { "type": "string" },
    "companyTla": { "type": "string" },
    "deploymentType": { "type": "string" },
    "LogicAppName": { "type": "string" },
    "Frequency": { "type": "string" },
    "timeZone": { "type": "string" },
    "PauseTime": { "type": "string" }
  },
  "variables": {
    "pauseTimeHour": "[split(substring(parameters('PauseTime'), 11, 5), ':')[0]]",
    "recurrenceHours": [ "[variables('pauseTimeHour')]" ],
    "recurrenceMinutes": [ 0 ],
    "pauseTimeString": "[substring(parameters('PauseTime'), 0, 8)]",
    "dailySchedule": [ "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday" ],
    "weekdaySchedule": [ "Monday", "Tuesday", "Wednesday", "Thursday", "Friday" ],
    "recurrenceSchedule": "[if(equals(parameters('Frequency'), 'Weekdays'), variables('weekdaySchedule'), variables('dailySchedule'))]",
    "synapseWorkspaceName": "[toLower(concat(variables('synapseName'),'ws1'))]",
    "synapseName": "[toLower(concat(parameters('companyTla'),parameters('deploymentType')))]",
    "synapseSQLPoolName": "[toLower(concat(variables('workspaceName'),'p1'))]",
    "workspaceName": "[toLower(concat(variables('synapseName'),'ws1'))]",
    "getRESTAPI": "subscriptions/@{variables('RestAPIVariables')['SubscriptionId']}/resourceGroups/@{variables('RestAPIVariables')['ResourceGroupName']}/providers/Microsoft.Synapse/workspaces/@{variables('RestAPIVariables')['workspaceName']}/sqlPools/@{variables('RestAPIVariables')['sqlPoolName']}?api-version=2019-06-01-preview",
    "pauseRESTAPI": "subscriptions/@{variables('RestAPIVariables')['SubscriptionId']}/resourceGroups/@{variables('RestAPIVariables')['ResourceGroupName']}/providers/Microsoft.Synapse/workspaces/@{variables('RestAPIVariables')['workspaceName']}/sqlPools/@{variables('RestAPIVariables')['sqlPoolName']}/pause?api-version=2019-06-01-preview",
    "aqcRESTAPI": "subscriptions/@{variables('RestAPIVariables')['SubscriptionId']}/resourceGroups/@{variables('RestAPIVariables')['ResourceGroupName']}/providers/Microsoft.Synapse/workspaces/@{variables('RestAPIVariables')['WorkspaceName']}/sqlpools/@{variables('RestAPIVariables')['SQLPoolName']}/dataWarehouseUserActivities/current?api-version=2019-06-01-preview",
    "managementEndpoint": "[environment().resourceManager]"
  },
  "resources": [
    {
      "type": "Microsoft.Logic/workflows",
      "apiVersion": "2019-05-01",
      "name": "[parameters('LogicAppName')]",
      "location": "[parameters('location')]",
      "identity": {
        "type": "SystemAssigned"
      },
      "properties": {
        "state": "Enabled",
        "definition": {
          "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
          "contentVersion": "1.0.0.0",
          "actions": {
            "Initialize_API_variables": {
              "type": "InitializeVariable",
              "inputs": {
                "variables": [
                  {
                    "name": "RestAPIVariables",
                    "type": "Object",
                    "value": {
                      "workspaceName": "[variables('synapseWorkspaceName')]",
                      "sqlPoolName": "[variables('synapseSQLPoolName')]",
                      "ResourceGroupName": "[resourceGroup().name]",
                      "SubscriptionId": "[subscription().subscriptionId]",
                      "TenantId": "[subscription().tenantId]",
                      "ScheduleTimeZone": "[parameters('timeZone')]",
                      "PauseTime": "[variables('pauseTimeString')]"
                    }
                  }
                ]
              }
            },
            "Initialize_ActiveQueryCount_variable": {
              "type": "InitializeVariable",
              "inputs": {
                "variables": [
                  {
                    "name": "ActiveQueryCount",
                    "type": "Integer",
                    "value": 1
                  }
                ]
              },
              "runAfter": {
                "Initialize_API_variables": [ "Succeeded" ]
              }
            },
            "Get_Synapse_state": {
              "type": "Http",
              "inputs": {
                "method": "GET",
                "uri": "[concat(variables('managementEndpoint'),variables('getRESTAPI'))]",
                "authentication": { "type": "ManagedServiceIdentity" }
              },
              "runAfter": {
                "Initialize_ActiveQueryCount_variable": [ "Succeeded" ]
              }
            },
            "Parse_JSON": {
              "type": "ParseJson",
              "inputs": {
                "content": "@body('Get_Synapse_state')",
                "schema": {
                  "type": "object",
                  "properties": {
                    "id": { "type": "string" },
                    "location": { "type": "string" },
                    "name": { "type": "string" },
                    "properties": {
                      "type": "object",
                      "properties": {
                        "collation": { "type": "string" },
                        "creationDate": { "type": "string" },
                        "maxSizeBytes": { "type": "integer" },
                        "provisioningState": { "type": "string" },
                        "restorePointInTime": { "type": "string" },
                        "status": { "type": "string" }
                      }
                    },
                    "sku": {
                      "type": "object",
                      "properties": {
                        "capacity": { "type": "integer" },
                        "name": { "type": "string" }
                      }
                    },
                    "type": { "type": "string" }
                  }
                }
              },
              "runAfter": {
                "Get_Synapse_state": [ "Succeeded" ]
              }
            },
            "PauseSynapseIfOnline": {
              "type": "If",
              "expression": {
                "and": [
                  {
                    "equals": [
                      "@body('Get_Synapse_state')['properties']['status']",
                      "Online"
                    ]
                  }
                ]
              },
              "actions": {
                "Until_ZeroActiveQueries": {
                  "type": "Until",
                  "expression": "@equals(variables('ActiveQueryCount'), 0)",
                  "limit": {
                    "count": 3,
                    "timeout": "PT3H"
                  },
                  "actions": {
                    "GetActiveQueryCount": {
                      "type": "Http",
                      "inputs": {
                        "method": "GET",
                        "uri": "[concat(variables('managementEndpoint'),variables('aqcRESTAPI'))]",
                        "authentication": { "type": "ManagedServiceIdentity" }
                      }
                    },
                    "Update_ActiveQueryCount_variable": {
                      "type": "SetVariable",
                      "inputs": {
                        "name": "ActiveQueryCount",
                        "value": "@body('GetActiveQueryCount')['properties']['activeQueriesCount']"
                      },
                      "runAfter": {
                        "GetActiveQueryCount": [ "Succeeded" ]
                      }
                    },
                    "Wait5minsIfActiveQuery": {
                      "type": "If",
                      "expression": {
                        "and": [
                          {
                            "greater": [ "@variables('ActiveQueryCount')", 0 ]
                          }
                        ]
                      },
                      "actions": {
                        "Wait_5mins": {
                          "type": "Wait",
                          "inputs": {
                            "interval": {
                              "count": 5,
                              "unit": "Minute"
                            }
                          }
                        }
                      },
                      "runAfter": {
                        "Update_ActiveQueryCount_variable": [ "Succeeded" ]
                      }
                    }
                  }
                },
                "Pause_SQL_Pool": {
                  "type": "Http",
                  "inputs": {
                    "method": "POST",
                    "uri": "[concat(variables('managementEndpoint'),variables('pauseRESTAPI'))]",
                    "authentication": { "type": "ManagedServiceIdentity" }
                  },
                  "runAfter": {
                    "Until_ZeroActiveQueries": [ "Succeeded" ]
                  }
                }
              },
              "runAfter": {
                "Parse_JSON": [ "Succeeded" ]
              }
            }
          },
          "triggers": {
            "Recurrence": {
              "type": "Recurrence",
              "recurrence": {
                "frequency": "Week",
                "interval": 1,
                "timeZone": "[parameters('timeZone')]",
                "startTime": "2019-01-01T00:00:00Z",
                "schedule": {
                  "weekDays": "[variables('recurrenceSchedule')]",
                  "hours": "[variables('recurrenceHours')]",
                  "minutes": "[variables('recurrenceMinutes')]"
                }
              }
            }
          }
        }
      }
    }
  ]
}
@ -0,0 +1,220 @@
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "location": { "type": "string" },
    "companyTla": { "type": "string" },
    "deploymentType": { "type": "string" },
    "LogicAppName": { "type": "string" },
    "Frequency": { "type": "string" },
    "timeZone": { "type": "string" },
    "ResumeTime": { "type": "string" }
  },
  "variables": {
    "resumeTimeHour": "[split(substring(parameters('ResumeTime'), 11, 5), ':')[0]]",
    "recurrenceHours": [ "[variables('resumeTimeHour')]" ],
    "recurrenceMinutes": [ 0 ],
    "dailySchedule": [ "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday" ],
    "weekdaySchedule": [ "Monday", "Tuesday", "Wednesday", "Thursday", "Friday" ],
    "recurrenceSchedule": "[if(equals(parameters('Frequency'), 'Weekdays'), variables('weekdaySchedule'), variables('dailySchedule'))]",
    "resumeTimeString": "[substring(parameters('ResumeTime'), 0, 8)]",
    "synapseWorkspaceName": "[toLower(concat(variables('synapseName'),'ws1'))]",
    "synapseName": "[toLower(concat(parameters('companyTla'),parameters('deploymentType')))]",
    "synapseSQLPoolName": "[toLower(concat(variables('workspaceName'),'p1'))]",
    "workspaceName": "[toLower(concat(variables('synapseName'),'ws1'))]",
    "managementEndpoint": "[environment().resourceManager]",
    "getRESTAPI": "subscriptions/@{variables('RestAPIVariables')['SubscriptionId']}/resourceGroups/@{variables('RestAPIVariables')['ResourceGroupName']}/providers/Microsoft.Synapse/workspaces/@{variables('RestAPIVariables')['workspaceName']}/sqlPools/@{variables('RestAPIVariables')['sqlPoolName']}?api-version=2019-06-01-preview",
    "resumeRESTAPI": "subscriptions/@{variables('RestAPIVariables')['SubscriptionId']}/resourceGroups/@{variables('RestAPIVariables')['ResourceGroupName']}/providers/Microsoft.Synapse/workspaces/@{variables('RestAPIVariables')['workspaceName']}/sqlPools/@{variables('RestAPIVariables')['sqlPoolName']}/resume?api-version=2019-06-01-preview"
  },
  "resources": [
    {
      "type": "Microsoft.Logic/workflows",
      "apiVersion": "2019-05-01",
      "name": "[parameters('LogicAppName')]",
      "location": "[parameters('location')]",
      "identity": {
        "type": "SystemAssigned"
      },
      "properties": {
        "state": "Enabled",
        "definition": {
          "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
          "contentVersion": "1.0.0.0",
          "actions": {
            "Initialize_API_variables": {
              "type": "InitializeVariable",
              "inputs": {
                "variables": [
                  {
                    "name": "RestAPIVariables",
                    "type": "Object",
                    "value": {
                      "workspaceName": "[variables('synapseWorkspaceName')]",
                      "sqlPoolName": "[variables('synapseSQLPoolName')]",
                      "ResourceGroupName": "[resourceGroup().name]",
                      "SubscriptionId": "[subscription().subscriptionId]",
                      "TenantId": "[subscription().tenantId]",
                      "ScheduleTimeZone": "[parameters('timeZone')]",
                      "ResumeTime": "[variables('resumeTimeString')]"
                    }
                  }
                ]
              }
            },
            "Get_Synapse_state": {
              "type": "Http",
              "inputs": {
                "method": "GET",
                "uri": "[concat(variables('managementEndpoint'),variables('getRESTAPI'))]",
                "authentication": { "type": "ManagedServiceIdentity" }
              },
              "runAfter": {
                "Initialize_API_variables": [ "Succeeded" ]
              }
            },
            "Parse_JSON": {
              "type": "ParseJson",
              "inputs": {
                "content": "@body('Get_Synapse_state')",
                "schema": {
                  "type": "object",
                  "properties": {
                    "id": { "type": "string" },
                    "location": { "type": "string" },
                    "name": { "type": "string" },
                    "properties": {
                      "type": "object",
                      "properties": {
                        "collation": { "type": "string" },
                        "creationDate": { "type": "string" },
                        "maxSizeBytes": { "type": "integer" },
                        "provisioningState": { "type": "string" },
                        "restorePointInTime": { "type": "string" },
                        "status": { "type": "string" }
                      }
                    },
                    "sku": {
                      "type": "object",
                      "properties": {
                        "capacity": { "type": "integer" },
                        "name": { "type": "string" }
                      }
                    },
                    "type": { "type": "string" }
                  }
                }
              },
              "runAfter": {
                "Get_Synapse_state": [ "Succeeded" ]
              }
            },
            "ResumeSynapseIfPaused": {
              "type": "If",
              "expression": {
                "and": [
                  {
                    "equals": [
                      "@body('Get_Synapse_state')['properties']['status']",
                      "Paused"
                    ]
                  }
                ]
              },
              "actions": {
                "Resume_SQL_Pool": {
                  "type": "Http",
                  "inputs": {
                    "method": "POST",
                    "uri": "[concat(variables('managementEndpoint'),variables('resumeRESTAPI'))]",
                    "authentication": { "type": "ManagedServiceIdentity" }
                  }
                }
              },
              "runAfter": {
                "Parse_JSON": [ "Succeeded" ]
              }
            }
          },
          "triggers": {
            "Recurrence": {
              "type": "Recurrence",
              "recurrence": {
                "frequency": "Week",
                "interval": 1,
                "timeZone": "[parameters('timeZone')]",
                "startTime": "2019-01-01T00:00:00Z",
                "schedule": {
                  "weekDays": "[variables('recurrenceSchedule')]",
                  "hours": "[variables('recurrenceHours')]",
                  "minutes": "[variables('recurrenceMinutes')]"
                }
              }
            }
          }
        }
      }
    }
  ]
}
@ -0,0 +1 @@
# synapse-test
@ -0,0 +1,6 @@
{
  "name": "WorkspaceSystemIdentity",
  "properties": {
    "type": "ManagedIdentity"
  }
}
@ -0,0 +1,45 @@
{
  "name": "tripFaresDataTransformations",
  "properties": {
    "folder": {
      "name": "TripFaresDataFlow"
    },
    "type": "MappingDataFlow",
    "typeProperties": {
      "sources": [
        {
          "dataset": {
            "referenceName": "tripDataSink",
            "type": "DatasetReference"
          },
          "name": "TripDataCSV"
        },
        {
          "dataset": {
            "referenceName": "faresDataSink",
            "type": "DatasetReference"
          },
          "name": "FaresDataCSV"
        }
      ],
      "sinks": [
        {
          "dataset": {
            "referenceName": "azureSynapseAnalyticsTable",
            "type": "DatasetReference"
          },
          "name": "SynapseAnalyticsSink"
        }
      ],
      "transformations": [
        {
          "name": "AggregateByPaymentType"
        },
        {
          "name": "InnerJoinWithTripFares"
        }
      ],
      "script": "source(output(\n\t\tmedallion as string,\n\t\thack_license as string,\n\t\tvendor_id as string,\n\t\trate_code as string,\n\t\tstore_and_fwd_flag as string,\n\t\tpickup_datetime as string,\n\t\tdropoff_datetime as string,\n\t\tpassenger_count as string,\n\t\ttrip_time_in_secs as string,\n\t\ttrip_distance as string,\n\t\tpickup_longitude as string,\n\t\tpickup_latitude as string,\n\t\tdropoff_longitude as string,\n\t\tdropoff_latitude as string\n\t),\n\tallowSchemaDrift: true,\n\tvalidateSchema: false,\n\tinferDriftedColumnTypes: true,\n\tignoreNoFilesFound: false) ~> TripDataCSV\nsource(output(\n\t\tmedallion as string,\n\t\thack_license as string,\n\t\tvendor_id as string,\n\t\tpickup_datetime as string,\n\t\tpayment_type as string,\n\t\tfare_amount as string,\n\t\tsurcharge as string,\n\t\tmta_tax as string,\n\t\ttip_amount as string,\n\t\ttolls_amount as string,\n\t\ttotal_amount as string\n\t),\n\tallowSchemaDrift: true,\n\tvalidateSchema: false,\n\tinferDriftedColumnTypes: true,\n\tignoreNoFilesFound: false) ~> FaresDataCSV\nInnerJoinWithTripFares aggregate(groupBy(payment_type),\n\taverage_fare = avg(toInteger(total_amount)),\n\t\ttotal_trip_distance = sum(toInteger(trip_distance))) ~> AggregateByPaymentType\nTripDataCSV, FaresDataCSV join(TripDataCSV@medallion == FaresDataCSV@medallion\n\t&& TripDataCSV@hack_license == FaresDataCSV@hack_license\n\t&& TripDataCSV@vendor_id == FaresDataCSV@vendor_id\n\t&& TripDataCSV@pickup_datetime == FaresDataCSV@pickup_datetime,\n\tjoinType:'inner',\n\tbroadcast: 'auto')~> InnerJoinWithTripFares\nAggregateByPaymentType sink(allowSchemaDrift: true,\n\tvalidateSchema: false,\n\tdeletable:false,\n\tinsertable:true,\n\tupdateable:false,\n\tupsertable:false,\n\trecreate:true,\n\tformat: 'table',\n\tstaged: false,\n\tskipDuplicateMapInputs: true,\n\tskipDuplicateMapOutputs: true,\n\terrorHandlingOption: 'stopOnFirstError') ~> SynapseAnalyticsSink"
    }
  }
}
@ -0,0 +1,47 @@
{
  "name": "AzureSynapseAnalyticsFaresData",
  "properties": {
    "linkedServiceName": {
      "referenceName": "TripFaresSynapseAnalyticsLinkedService",
      "type": "LinkedServiceReference",
      "parameters": {
        "SynapseWorkspaceName": {
          "value": "@dataset().SynapseWorkspaceName",
          "type": "Expression"
        },
        "SQLDedicatedPoolName": {
          "value": "@dataset().SQLDedicatedPoolName",
          "type": "Expression"
        },
        "keyVaultName": {
          "value": "@dataset().keyVaultName",
          "type": "Expression"
        },
        "SQLLoginUsername": {
          "value": "@dataset().SQLLoginUsername",
          "type": "Expression"
        }
      }
    },
    "parameters": {
      "SynapseWorkspaceName": { "type": "string" },
      "SQLDedicatedPoolName": { "type": "string" },
      "keyVaultName": { "type": "string" },
      "SQLLoginUsername": { "type": "string" }
    },
    "annotations": [],
    "type": "AzureSqlDWTable",
    "schema": [],
    "typeProperties": {
      "table": "FaresData"
    }
  }
}
@ -0,0 +1,12 @@
{
  "name": "AzureSynapseAnalyticsTable1",
  "properties": {
    "linkedServiceName": {
      "referenceName": "TripFaresSynapseAnalyticsLinkedService",
      "type": "LinkedServiceReference"
    },
    "annotations": [],
    "type": "AzureSqlDWTable",
    "schema": []
  }
}
@ -0,0 +1,47 @@
{
  "name": "AzureSynapseAnalyticsTripsData",
  "properties": {
    "linkedServiceName": {
      "referenceName": "TripFaresSynapseAnalyticsLinkedService",
      "type": "LinkedServiceReference",
      "parameters": {
        "SynapseWorkspaceName": {
          "value": "@dataset().SynapseWorkspaceName",
          "type": "Expression"
        },
        "SQLDedicatedPoolName": {
          "value": "@dataset().SQLDedicatedPoolName",
          "type": "Expression"
        },
        "keyVaultName": {
          "value": "@dataset().keyVaultName",
          "type": "Expression"
        },
        "SQLLoginUsername": {
          "value": "@dataset().SQLLoginUsername",
          "type": "Expression"
        }
      }
    },
    "parameters": {
      "SynapseWorkspaceName": { "type": "string" },
      "SQLDedicatedPoolName": { "type": "string" },
      "keyVaultName": { "type": "string" },
      "SQLLoginUsername": { "type": "string" }
    },
    "annotations": [],
    "type": "AzureSqlDWTable",
    "schema": [],
    "typeProperties": {
      "table": "TripsData"
    }
  }
}
@ -0,0 +1,47 @@
{
  "name": "azureSynapseAnalyticsSchema",
  "properties": {
    "linkedServiceName": {
      "referenceName": "TripFaresSynapseAnalyticsLinkedService",
      "type": "LinkedServiceReference",
      "parameters": {
        "SynapseWorkspaceName": {
          "value": "@dataset().SynapseWorkspaceName",
          "type": "Expression"
        },
        "SQLDedicatedPoolName": {
          "value": "@dataset().SQLDedicatedPoolName",
          "type": "Expression"
        },
        "keyVaultName": {
          "value": "@dataset().keyVaultName",
          "type": "Expression"
        },
        "SQLLoginUsername": {
          "value": "@dataset().SQLLoginUsername",
          "type": "Expression"
        }
      }
    },
    "parameters": {
      "SynapseWorkspaceName": { "type": "string" },
      "SQLDedicatedPoolName": { "type": "string" },
      "keyVaultName": { "type": "string" },
      "SQLLoginUsername": { "type": "string" }
    },
    "folder": {
      "name": "TripFareDatasets"
    },
    "annotations": [],
    "type": "AzureSqlDWTable",
    "schema": []
  }
}
@ -0,0 +1,57 @@
{
  "name": "azureSynapseAnalyticsTable",
  "properties": {
    "linkedServiceName": {
      "referenceName": "TripFaresSynapseAnalyticsLinkedService",
      "type": "LinkedServiceReference",
      "parameters": {
        "SynapseWorkspaceName": {
          "value": "@dataset().SynapseWorkspaceName",
          "type": "Expression"
        },
        "SQLDedicatedPoolName": {
          "value": "@dataset().SQLDedicatedPoolName",
          "type": "Expression"
        },
        "keyVaultName": {
          "value": "@dataset().keyVaultName",
          "type": "Expression"
        },
        "SQLLoginUsername": {
          "value": "@dataset().SQLLoginUsername",
          "type": "Expression"
        }
      }
    },
    "parameters": {
      "SchemaName": { "type": "string" },
      "SynapseWorkspaceName": { "type": "string" },
      "SQLDedicatedPoolName": { "type": "string" },
      "keyVaultName": { "type": "string" },
      "SQLLoginUsername": { "type": "string" }
    },
    "folder": {
      "name": "TripFareDatasets"
    },
    "annotations": [],
    "type": "AzureSqlDWTable",
    "schema": [],
    "typeProperties": {
      "schema": {
        "value": "@dataset().SchemaName",
        "type": "Expression"
      },
      "table": "AggregateTaxiData"
    }
  }
}
@ -0,0 +1,91 @@
{
  "name": "faresDataSink",
  "properties": {
    "linkedServiceName": {
      "referenceName": "TripFaresDataLakeStorageLinkedService",
      "type": "LinkedServiceReference",
      "parameters": {
        "keyVaultName": {
          "value": "@dataset().keyVaultName",
          "type": "Expression"
        },
        "datalakeAccountName": {
          "value": "@dataset().datalakeAccountName",
          "type": "Expression"
        }
      }
    },
    "parameters": {
      "keyVaultName": {
        "type": "string",
        "defaultValue": "kvmsft"
      },
      "datalakeAccountName": {
        "type": "string",
        "defaultValue": "adlsmsft"
      }
    },
    "folder": {
      "name": "TripFareDatasets"
    },
    "annotations": [],
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileName": "fares-data.csv",
        "fileSystem": "public"
      },
      "columnDelimiter": ",",
      "escapeChar": "\\",
      "firstRowAsHeader": true,
      "quoteChar": "\""
    },
    "schema": [
      { "name": "medallion", "type": "String" },
      { "name": "hack_license", "type": "String" },
      { "name": "vendor_id", "type": "String" },
      { "name": "pickup_datetime", "type": "String" },
      { "name": "payment_type", "type": "String" },
      { "name": "fare_amount", "type": "String" },
      { "name": "surcharge", "type": "String" },
      { "name": "mta_tax", "type": "String" },
      { "name": "tip_amount", "type": "String" },
      { "name": "tolls_amount", "type": "String" },
      { "name": "total_amount", "type": "String" }
    ]
  }
}
@ -0,0 +1,24 @@
{
  "name": "faresDataSource",
  "properties": {
    "linkedServiceName": {
      "referenceName": "HttpServerTripFareDataLinkedService",
      "type": "LinkedServiceReference"
    },
    "folder": {
      "name": "TripFareDatasets"
    },
    "annotations": [],
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "HttpServerLocation"
      },
      "columnDelimiter": ",",
      "escapeChar": "\\",
      "firstRowAsHeader": true,
      "quoteChar": "\""
    },
    "schema": []
  }
}
@ -0,0 +1,101 @@
{
  "name": "tripDataSink",
  "properties": {
    "linkedServiceName": {
      "referenceName": "TripFaresDataLakeStorageLinkedService",
      "type": "LinkedServiceReference",
      "parameters": {
        "keyVaultName": {
          "value": "@dataset().keyVaultName",
          "type": "Expression"
        },
        "datalakeAccountName": {
          "value": "@dataset().datalakeAccountName",
          "type": "Expression"
        }
      }
    },
    "parameters": {
      "datalakeAccountName": { "type": "string" },
      "keyVaultName": { "type": "string" }
    },
    "folder": {
      "name": "TripFareDatasets"
    },
    "annotations": [],
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileName": "trip-data.csv",
        "fileSystem": "public"
      },
      "columnDelimiter": ",",
      "escapeChar": "\\",
      "firstRowAsHeader": true,
      "quoteChar": "\""
    },
    "schema": [
      { "name": "medallion", "type": "String" },
      { "name": "hack_license", "type": "String" },
      { "name": "vendor_id", "type": "String" },
      { "name": "rate_code", "type": "String" },
      { "name": "store_and_fwd_flag", "type": "String" },
      { "name": "pickup_datetime", "type": "String" },
      { "name": "dropoff_datetime", "type": "String" },
      { "name": "passenger_count", "type": "String" },
      { "name": "trip_time_in_secs", "type": "String" },
      { "name": "trip_distance", "type": "String" },
      { "name": "pickup_longitude", "type": "String" },
      { "name": "pickup_latitude", "type": "String" },
      { "name": "dropoff_longitude", "type": "String" },
      { "name": "dropoff_latitude", "type": "String" }
    ]
  }
}
@ -0,0 +1,24 @@
{
    "name": "tripsDataSource",
    "properties": {
        "linkedServiceName": {
            "referenceName": "HttpServerTripDataLinkedService",
            "type": "LinkedServiceReference"
        },
        "folder": { "name": "TripFareDatasets" },
        "annotations": [],
        "type": "DelimitedText",
        "typeProperties": {
            "location": { "type": "HttpServerLocation" },
            "columnDelimiter": ",",
            "escapeChar": "\\",
            "firstRowAsHeader": true,
            "quoteChar": "\""
        },
        "schema": []
    }
}
@ -0,0 +1,20 @@
{
    "name": "AutoResolveIntegrationRuntime",
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "AutoResolve",
                "dataFlowProperties": {
                    "computeType": "General",
                    "coreCount": 8,
                    "timeToLive": 0
                }
            }
        },
        "managedVirtualNetwork": {
            "type": "ManagedVirtualNetworkReference",
            "referenceName": "default"
        }
    }
}
@ -0,0 +1,16 @@
{
    "name": "HttpServerTripDataLinkedService",
    "properties": {
        "annotations": [],
        "type": "HttpServer",
        "typeProperties": {
            "url": "https://raw.githubusercontent.com/Azure/Test-Drive-Synapse-Link-For-DataVerse-With-1-Click/main/tripDataAndFaresCSV/trip-data.csv",
            "enableServerCertificateValidation": true,
            "authenticationType": "Anonymous"
        },
        "connectVia": {
            "referenceName": "AutoResolveIntegrationRuntime",
            "type": "IntegrationRuntimeReference"
        }
    }
}
@ -0,0 +1,16 @@
{
    "name": "HttpServerTripFareDataLinkedService",
    "properties": {
        "annotations": [],
        "type": "HttpServer",
        "typeProperties": {
            "url": "https://raw.githubusercontent.com/Azure/Test-Drive-Synapse-Link-For-DataVerse-With-1-Click/main/tripDataAndFaresCSV/fares-data.csv",
            "enableServerCertificateValidation": true,
            "authenticationType": "Anonymous"
        },
        "connectVia": {
            "referenceName": "AutoResolveIntegrationRuntime",
            "type": "IntegrationRuntimeReference"
        }
    }
}
@ -0,0 +1,11 @@
{
    "name": "PowerBIWorkspaceTripsFares",
    "properties": {
        "annotations": [],
        "type": "PowerBIWorkspace",
        "typeProperties": {
            "workspaceID": "",
            "tenantID": "72f988bf-86f1-41af-91ab-2d7cd011db47"
        }
    }
}
@ -0,0 +1,37 @@
{
    "name": "TripFaresDataLakeStorageLinkedService",
    "properties": {
        "parameters": {
            "keyVaultName": { "type": "string" },
            "datalakeAccountName": { "type": "string" }
        },
        "annotations": [],
        "type": "AzureBlobFS",
        "typeProperties": {
            "url": "@{concat('https://',linkedService().datalakeAccountName,'.dfs.core.windows.net')}",
            "accountKey": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "keyVaultLinkedservice",
                    "type": "LinkedServiceReference",
                    "parameters": {
                        "keyVaultName": { "value": "@linkedService().keyVaultName", "type": "Expression" }
                    }
                },
                "secretName": "adlsAccessKey"
            }
        },
        "connectVia": {
            "referenceName": "AutoResolveIntegrationRuntime",
            "type": "IntegrationRuntimeReference"
        }
    },
    "type": "Microsoft.Synapse/workspaces/linkedservices"
}
@ -0,0 +1,43 @@
{
    "name": "TripFaresSynapseAnalyticsLinkedService",
    "properties": {
        "parameters": {
            "SynapseWorkspaceName": { "type": "string" },
            "SQLDedicatedPoolName": { "type": "string" },
            "keyVaultName": { "type": "string" },
            "SQLLoginUsername": { "type": "string" }
        },
        "annotations": [],
        "type": "AzureSqlDW",
        "typeProperties": {
            "connectionString": "Integrated Security=False;Encrypt=True;Connection Timeout=30;Data Source=@{linkedService().SynapseWorkspaceName};Initial Catalog=@{linkedService().SQLDedicatedPoolName};User ID=@{linkedService().SQLLoginUsername}",
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "keyVaultLinkedservice",
                    "type": "LinkedServiceReference",
                    "parameters": {
                        "keyVaultName": { "value": "@linkedService().keyVaultName", "type": "Expression" }
                    }
                },
                "secretName": "synapseSqlLoginPassword"
            }
        },
        "connectVia": {
            "referenceName": "AutoResolveIntegrationRuntime",
            "type": "IntegrationRuntimeReference"
        }
    },
    "type": "Microsoft.Synapse/workspaces/linkedservices"
}
@ -0,0 +1,15 @@
{
    "name": "keyVaultLinkedservice",
    "properties": {
        "parameters": {
            "keyVaultName": { "type": "string" }
        },
        "annotations": [],
        "type": "AzureKeyVault",
        "typeProperties": {
            "baseUrl": "@{concat('https://',linkedService().keyVaultName,'.vault.azure.net/')}"
        }
    }
}
@ -0,0 +1,4 @@
{
    "name": "default",
    "type": "Microsoft.Synapse/workspaces/managedVirtualNetworks"
}
@ -0,0 +1,325 @@
{
    "name": "Data Exploration and ML Modeling - NYC taxi predict using Spark MLlib",
    "properties": {
        "nbformat": 4,
        "nbformat_minor": 2,
        "bigDataPool": {
            "referenceName": "ws1sparkpool1",
            "type": "BigDataPoolReference"
        },
        "sessionProperties": {
            "driverMemory": "56g",
            "driverCores": 8,
            "executorMemory": "56g",
            "executorCores": 8,
            "numExecutors": 2,
            "conf": {
                "spark.dynamicAllocation.enabled": "false",
                "spark.dynamicAllocation.minExecutors": "2",
                "spark.dynamicAllocation.maxExecutors": "2"
            }
        },
        "metadata": {
            "saveOutput": true,
            "synapse_widget": { "version": "0.1" },
            "kernelspec": {
                "name": "synapse_pyspark",
                "display_name": "Synapse PySpark"
            },
            "language_info": { "name": "python" },
            "a365ComputeOptions": {
                "id": "/subscriptions/4eeedd72-d937-4243-86d1-c3982a84d924/resourceGroups/nashahzsfin/providers/Microsoft.Synapse/workspaces/mfstspdjvzuh3xeu2pocws1/bigDataPools/ws1sparkpool1",
                "name": "ws1sparkpool1",
                "type": "Spark",
                "endpoint": "https://mfstspdjvzuh3xeu2pocws1.dev.azuresynapse.net/livyApi/versions/2019-11-01-preview/sparkPools/ws1sparkpool1",
                "auth": {
                    "type": "AAD",
                    "authResource": "https://dev.azuresynapse.net"
                },
                "sparkVersion": "2.4",
                "nodeCount": 5,
                "cores": 8,
                "memory": 56,
                "automaticScaleJobs": false
            },
            "sessionKeepAliveTimeout": 30
        },
        "cells": [
            {
                "cell_type": "markdown",
                "source": [
                    "# Predict NYC Taxi Tips using Spark ML and Azure Open Datasets\n",
                    "\n",
                    "This notebook ingests, visualizes, and prepares data, then trains a model based on an Open Dataset that tracks NYC Yellow Taxi trips and various attributes around them.\n",
                    "The goal is to predict, for a given trip, whether there will be a tip or not.\n",
                    "\n",
                    "https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-machine-learning-mllib-notebook\n"
                ]
            },
            {
                "cell_type": "code",
                "source": [
                    "import matplotlib.pyplot as plt\n",
                    "\n",
                    "from pyspark.sql.functions import unix_timestamp\n",
                    "\n",
                    "from pyspark.sql import SparkSession\n",
                    "from pyspark.sql.types import *\n",
                    "from pyspark.sql.functions import *\n",
                    "\n",
                    "from pyspark.ml import Pipeline\n",
                    "from pyspark.ml import PipelineModel\n",
                    "from pyspark.ml.feature import RFormula\n",
                    "from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorIndexer\n",
                    "from pyspark.ml.classification import LogisticRegression\n",
                    "from pyspark.mllib.evaluation import BinaryClassificationMetrics\n",
                    "from pyspark.ml.evaluation import BinaryClassificationEvaluator"
                ],
                "execution_count": 1
            },
            {
                "cell_type": "markdown",
                "source": [
                    "## Ingest Data\n",
                    "\n",
                    "Get a sample of the NYC yellow taxi data to make it faster and easier to evaluate different approaches to data prep for the modeling phase later in the notebook."
                ]
            },
            {
                "cell_type": "code",
                "source": [
                    "# Import NYC yellow cab data from Azure Open Datasets\n",
                    "from azureml.opendatasets import NycTlcYellow\n",
                    "\n",
                    "from datetime import datetime\n",
                    "from dateutil import parser\n",
                    "\n",
                    "end_date = parser.parse('2018-05-08 00:00:00')\n",
                    "start_date = parser.parse('2018-05-01 00:00:00')\n",
                    "\n",
                    "nyc_tlc = NycTlcYellow(start_date=start_date, end_date=end_date)\n",
                    "nyc_tlc_df = nyc_tlc.to_spark_dataframe()"
                ],
                "execution_count": 2
            },
            {
                "cell_type": "code",
                "source": [
                    "# To make development easier, faster, and less expensive, downsample for now\n",
                    "sampled_taxi_df = nyc_tlc_df.sample(True, 0.001, seed=1234)"
                ],
                "execution_count": 3
            },
            {
                "cell_type": "markdown",
                "source": [
                    "## Exploratory Data Analysis\n",
                    "\n",
                    "Look at the data and evaluate its suitability for use in a model; do this via some basic charts focused on tip values and relationships."
                ]
            },
            {
                "cell_type": "code",
                "source": [
                    "# The charting package needs a Pandas dataframe or NumPy array, so do the conversion\n",
                    "sampled_taxi_pd_df = sampled_taxi_df.toPandas()\n",
                    "\n",
                    "# Look at tips by amount: count histogram\n",
                    "ax1 = sampled_taxi_pd_df['tipAmount'].plot(kind='hist', bins=25, facecolor='lightblue')\n",
                    "ax1.set_title('Tip amount distribution')\n",
                    "ax1.set_xlabel('Tip Amount ($)')\n",
                    "ax1.set_ylabel('Counts')\n",
                    "plt.suptitle('')\n",
                    "plt.show()\n",
                    "\n",
                    "# How many passengers tipped by various amounts\n",
                    "ax2 = sampled_taxi_pd_df.boxplot(column=['tipAmount'], by=['passengerCount'])\n",
                    "ax2.set_title('Tip amount by Passenger count')\n",
                    "ax2.set_xlabel('Passenger count')\n",
                    "ax2.set_ylabel('Tip Amount ($)')\n",
                    "plt.suptitle('')\n",
                    "plt.show()\n",
                    "\n",
                    "# Look at the relationship between fare and tip amounts\n",
                    "ax = sampled_taxi_pd_df.plot(kind='scatter', x='fareAmount', y='tipAmount', c='blue', alpha=0.10, s=2.5*(sampled_taxi_pd_df['passengerCount']))\n",
                    "ax.set_title('Tip amount by Fare amount')\n",
                    "ax.set_xlabel('Fare Amount ($)')\n",
                    "ax.set_ylabel('Tip Amount ($)')\n",
                    "plt.axis([-2, 80, -2, 20])\n",
                    "plt.suptitle('')\n",
                    "plt.show()"
                ],
                "execution_count": 4
            },
            {
                "cell_type": "markdown",
                "source": [
                    "## Data Prep and Featurization\n",
                    "\n",
                    "It's clear from the visualizations above that there are a number of outliers in the data. These will need to be filtered out. In addition, there are extra variables that are not going to be useful in the model we build at the end.\n",
                    "\n",
                    "Finally, there is a need to create some new (derived) variables that will work better with the model.\n"
                ]
            },
            {
                "cell_type": "code",
                "source": [
                    "taxi_df = sampled_taxi_df.select('totalAmount', 'fareAmount', 'tipAmount', 'paymentType', 'rateCodeId', 'passengerCount'\\\n",
                    "                , 'tripDistance', 'tpepPickupDateTime', 'tpepDropoffDateTime'\\\n",
                    "                , date_format('tpepPickupDateTime', 'hh').alias('pickupHour')\\\n",
                    "                , date_format('tpepPickupDateTime', 'EEEE').alias('weekdayString')\\\n",
                    "                , (unix_timestamp(col('tpepDropoffDateTime')) - unix_timestamp(col('tpepPickupDateTime'))).alias('tripTimeSecs')\\\n",
                    "                , (when(col('tipAmount') > 0, 1).otherwise(0)).alias('tipped')\n",
                    "                )\\\n",
                    "        .filter((sampled_taxi_df.passengerCount > 0) & (sampled_taxi_df.passengerCount < 8)\\\n",
                    "                & (sampled_taxi_df.tipAmount >= 0) & (sampled_taxi_df.tipAmount <= 25)\\\n",
                    "                & (sampled_taxi_df.fareAmount >= 1) & (sampled_taxi_df.fareAmount <= 250)\\\n",
                    "                & (sampled_taxi_df.tipAmount < sampled_taxi_df.fareAmount)\\\n",
                    "                & (sampled_taxi_df.tripDistance > 0) & (sampled_taxi_df.tripDistance <= 100)\\\n",
                    "                & (sampled_taxi_df.rateCodeId <= 5)\n",
                    "                & (sampled_taxi_df.paymentType.isin({\"1\", \"2\"}))\n",
                    "                )"
                ],
                "execution_count": 5
            },
            {
                "cell_type": "markdown",
                "source": [
                    "## Data Prep and Featurization Part 2\n",
                    "\n",
                    "Having created new variables, it's now possible to drop the columns they were derived from, so that the dataframe that goes into the model is the smallest, in terms of number of variables, that is required.\n",
                    "\n",
                    "Also create some more features based on new columns from the first round.\n"
                ]
            },
            {
                "cell_type": "code",
                "source": [
                    "taxi_featurised_df = taxi_df.select('totalAmount', 'fareAmount', 'tipAmount', 'paymentType', 'passengerCount'\\\n",
                    "                , 'tripDistance', 'weekdayString', 'pickupHour', 'tripTimeSecs', 'tipped'\\\n",
                    "                , when((taxi_df.pickupHour <= 6) | (taxi_df.pickupHour >= 20), \"Night\")\\\n",
                    "                    .when((taxi_df.pickupHour >= 7) & (taxi_df.pickupHour <= 10), \"AMRush\")\\\n",
                    "                    .when((taxi_df.pickupHour >= 11) & (taxi_df.pickupHour <= 15), \"Afternoon\")\\\n",
                    "                    .when((taxi_df.pickupHour >= 16) & (taxi_df.pickupHour <= 19), \"PMRush\")\\\n",
                    "                    .otherwise(0).alias('trafficTimeBins')\n",
                    "                )\\\n",
                    "        .filter((taxi_df.tripTimeSecs >= 30) & (taxi_df.tripTimeSecs <= 7200))"
                ],
                "execution_count": 6
            },
            {
                "cell_type": "markdown",
                "source": [
                    "## Encoding\n",
                    "\n",
                    "Different ML algorithms support different types of input; in this example, Logistic Regression is being used for binary classification. This means that any categorical (string) variables must be converted to numbers.\n",
                    "\n",
                    "The process is not as simple as a \"map\"-style function, as the relationship between the numbers can introduce a bias in the resulting model. The approach is to index the variable and then encode it using a standard approach called One Hot Encoding.\n",
                    "\n",
                    "This approach requires the encoder to \"learn\"/fit a model over the data in the Spark instance and then transform based on what was learnt.\n"
                ]
            },
            {
                "cell_type": "code",
                "source": [
                    "# The sample uses an algorithm that only works with numeric features, so convert them so they can be consumed\n",
                    "sI1 = StringIndexer(inputCol=\"trafficTimeBins\", outputCol=\"trafficTimeBinsIndex\")\n",
                    "en1 = OneHotEncoder(dropLast=False, inputCol=\"trafficTimeBinsIndex\", outputCol=\"trafficTimeBinsVec\")\n",
                    "sI2 = StringIndexer(inputCol=\"weekdayString\", outputCol=\"weekdayIndex\")\n",
                    "en2 = OneHotEncoder(dropLast=False, inputCol=\"weekdayIndex\", outputCol=\"weekdayVec\")\n",
                    "\n",
                    "# Create a new dataframe that has had the encodings applied\n",
                    "encoded_final_df = Pipeline(stages=[sI1, en1, sI2, en2]).fit(taxi_featurised_df).transform(taxi_featurised_df)"
                ],
                "execution_count": 7
            },
            {
                "cell_type": "markdown",
                "source": [
                    "## Generation of Testing and Training Data Sets\n",
                    "Simple split: 70% for training and 30% for testing the model. Playing with this ratio may result in different models.\n"
                ]
            },
            {
                "cell_type": "code",
                "source": [
                    "# Decide on the split between training and testing data from the dataframe\n",
                    "trainingFraction = 0.7\n",
                    "testingFraction = (1-trainingFraction)\n",
                    "seed = 1234\n",
                    "\n",
                    "# Split the dataframe into test and training dataframes\n",
                    "train_data_df, test_data_df = encoded_final_df.randomSplit([trainingFraction, testingFraction], seed=seed)"
                ],
                "execution_count": 8
            },
            {
                "cell_type": "markdown",
                "source": [
                    "## Train the Model\n",
                    "\n",
                    "Train the Logistic Regression model and then evaluate it using Area under ROC as the metric."
                ]
            },
            {
                "cell_type": "code",
                "source": [
                    "## Create a new LR object for the model\n",
                    "logReg = LogisticRegression(maxIter=10, regParam=0.3, labelCol='tipped')\n",
                    "\n",
                    "## The formula for the model\n",
                    "classFormula = RFormula(formula=\"tipped ~ pickupHour + weekdayVec + passengerCount + tripTimeSecs + tripDistance + fareAmount + paymentType + trafficTimeBinsVec\")\n",
                    "\n",
                    "## Undertake training and create an LR model\n",
                    "lrModel = Pipeline(stages=[classFormula, logReg]).fit(train_data_df)\n",
                    "\n",
                    "## Saving the model is optional, but it's another form of inter-session cache\n",
                    "datestamp = datetime.now().strftime('%m-%d-%Y-%s')\n",
                    "fileName = \"lrModel_\" + datestamp\n",
                    "logRegDirfilename = fileName\n",
                    "lrModel.save(logRegDirfilename)\n",
                    "\n",
                    "## Predict tip 1/0 (yes/no) on the test dataset; evaluate using AUROC\n",
                    "predictions = lrModel.transform(test_data_df)\n",
                    "predictionAndLabels = predictions.select(\"label\", \"prediction\").rdd\n",
                    "metrics = BinaryClassificationMetrics(predictionAndLabels)\n",
                    "print(\"Area under ROC = %s\" % metrics.areaUnderROC)"
                ],
                "execution_count": 10
            },
            {
                "cell_type": "markdown",
                "source": [
                    "## Evaluate and Visualize\n",
                    "\n",
                    "Plot the actual ROC curve to develop a better understanding of the model.\n"
                ]
            },
            {
                "cell_type": "code",
                "source": [
                    "## Plot the ROC curve; no need for pandas, as this uses the modelSummary object\n",
                    "modelSummary = lrModel.stages[-1].summary\n",
                    "\n",
                    "plt.plot([0, 1], [0, 1], 'r--')\n",
                    "plt.plot(modelSummary.roc.select('FPR').collect(),\n",
                    "         modelSummary.roc.select('TPR').collect())\n",
                    "plt.xlabel('False Positive Rate')\n",
                    "plt.ylabel('True Positive Rate')\n",
                    "plt.show()"
                ],
                "execution_count": null
            }
        ]
    }
}
@ -0,0 +1,492 @@
{
    "name": "TripFaresDataPipeline",
    "properties": {
        "activities": [
            {
                "name": "IngestTripDataIntoADLS",
                "description": "Copies the trip data CSV file from the git repo and loads it into the ADLS.",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "0.00:10:00",
                    "retry": 3,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "DelimitedTextSource",
                        "storeSettings": { "type": "HttpReadSettings", "requestMethod": "GET" },
                        "formatSettings": { "type": "DelimitedTextReadSettings" }
                    },
                    "sink": {
                        "type": "DelimitedTextSink",
                        "storeSettings": { "type": "AzureBlobFSWriteSettings" },
                        "formatSettings": {
                            "type": "DelimitedTextWriteSettings",
                            "quoteAllText": true,
                            "fileExtension": ".txt"
                        }
                    },
                    "enableStaging": false,
                    "translator": {
                        "type": "TabularTranslator",
                        "typeConversion": true,
                        "typeConversionSettings": {
                            "allowDataTruncation": true,
                            "treatBooleanAsNumber": false
                        }
                    }
                },
                "inputs": [
                    { "referenceName": "tripsDataSource", "type": "DatasetReference" }
                ],
                "outputs": [
                    {
                        "referenceName": "tripDataSink",
                        "type": "DatasetReference",
                        "parameters": {
                            "datalakeAccountName": { "value": "@pipeline().parameters.datalakeAccountName", "type": "Expression" },
                            "keyVaultName": { "value": "@pipeline().parameters.KeyVaultName", "type": "Expression" }
                        }
                    }
                ]
            },
            {
                "name": "IngestTripFaresDataIntoADLS",
                "description": "Copies the trip fare data CSV file from the git repo and loads it into the ADLS.",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "0.00:10:00",
                    "retry": 3,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "DelimitedTextSource",
                        "storeSettings": { "type": "HttpReadSettings", "requestMethod": "GET" },
                        "formatSettings": { "type": "DelimitedTextReadSettings" }
                    },
                    "sink": {
                        "type": "DelimitedTextSink",
                        "storeSettings": { "type": "AzureBlobFSWriteSettings" },
                        "formatSettings": {
                            "type": "DelimitedTextWriteSettings",
                            "quoteAllText": true,
                            "fileExtension": ".txt"
                        }
                    },
                    "enableStaging": false,
                    "translator": {
                        "type": "TabularTranslator",
                        "typeConversion": true,
                        "typeConversionSettings": {
                            "allowDataTruncation": true,
                            "treatBooleanAsNumber": false
                        }
                    }
                },
                "inputs": [
                    { "referenceName": "faresDataSource", "type": "DatasetReference" }
                ],
                "outputs": [
                    {
                        "referenceName": "faresDataSink",
                        "type": "DatasetReference",
                        "parameters": {
                            "keyVaultName": { "value": "@pipeline().parameters.KeyVaultName", "type": "Expression" },
                            "datalakeAccountName": { "value": "@pipeline().parameters.datalakeAccountName", "type": "Expression" }
                        }
                    }
                ]
            },
            {
                "name": "JoinAndAggregateData",
                "description": "Reads the raw data from both CSV files inside the ADLS, performs the desired transformations (inner join and aggregation), and writes the transformed data into the Synapse SQL pool.",
                "type": "ExecuteDataFlow",
                "dependsOn": [
                    {
                        "activity": "Create Schema If Does Not Exists",
                        "dependencyConditions": [ "Succeeded" ]
                    }
                ],
                "policy": {
                    "timeout": "0.00:30:00",
                    "retry": 3,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "dataflow": {
                        "referenceName": "tripFaresDataTransformations",
                        "type": "DataFlowReference",
                        "datasetParameters": {
                            "TripDataCSV": {
                                "datalakeAccountName": { "value": "@pipeline().parameters.datalakeAccountName", "type": "Expression" },
                                "keyVaultName": { "value": "@pipeline().parameters.KeyVaultName", "type": "Expression" }
                            },
                            "FaresDataCSV": {
                                "keyVaultName": { "value": "@pipeline().parameters.KeyVaultName", "type": "Expression" },
                                "datalakeAccountName": { "value": "@pipeline().parameters.datalakeAccountName", "type": "Expression" }
                            },
                            "SynapseAnalyticsSink": {
                                "SchemaName": { "value": "@pipeline().parameters.SchemaName", "type": "Expression" },
                                "SynapseWorkspaceName": { "value": "@pipeline().parameters.SynapseWorkspaceName", "type": "Expression" },
                                "SQLDedicatedPoolName": { "value": "@pipeline().parameters.SQLDedicatedPoolName", "type": "Expression" },
                                "keyVaultName": { "value": "@pipeline().parameters.KeyVaultName", "type": "Expression" },
                                "SQLLoginUsername": { "value": "@pipeline().parameters.SQLLoginUsername", "type": "Expression" }
                            }
                        }
                    },
                    "compute": {
                        "coreCount": 8,
                        "computeType": "General"
                    },
                    "traceLevel": "Fine"
                }
            },
            {
                "name": "Create Schema If Does Not Exists",
                "description": "Creates the schema inside the SQL dedicated pool. Schema name comes from the pipeline parameter 'SchemaName'.",
                "type": "Lookup",
                "dependsOn": [
                    {
                        "activity": "IngestTripDataIntoADLS",
                        "dependencyConditions": [ "Succeeded" ]
                    },
                    {
                        "activity": "IngestTripFaresDataIntoADLS",
                        "dependencyConditions": [ "Succeeded" ]
                    }
                ],
                "policy": {
                    "timeout": "0.00:05:00",
                    "retry": 3,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "SqlDWSource",
                        "sqlReaderQuery": {
                            "value": "IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = '@{pipeline().parameters.SchemaName}')\nBEGIN\nEXEC('CREATE SCHEMA @{pipeline().parameters.SchemaName}')\nselect Count(*) from sys.symmetric_keys;\nEND\nELSE\nBEGIN\n select Count(*) from sys.symmetric_keys;\nEND",
                            "type": "Expression"
                        },
                        "queryTimeout": "02:00:00",
                        "partitionOption": "None"
                    },
                    "dataset": {
                        "referenceName": "azureSynapseAnalyticsSchema",
                        "type": "DatasetReference",
                        "parameters": {
                            "SynapseWorkspaceName": { "value": "@pipeline().parameters.SynapseWorkspaceName", "type": "Expression" },
                            "SQLDedicatedPoolName": { "value": "@pipeline().parameters.SQLDedicatedPoolName", "type": "Expression" },
                            "keyVaultName": { "value": "@pipeline().parameters.KeyVaultName", "type": "Expression" },
                            "SQLLoginUsername": { "value": "@pipeline().parameters.SQLLoginUsername", "type": "Expression" }
                        }
                    },
                    "firstRowOnly": false
                }
            },
            {
                "name": "Copy data Trips Data",
                "type": "Copy",
                "dependsOn": [
                    {
                        "activity": "Create Schema If Does Not Exists",
                        "dependencyConditions": [ "Succeeded" ]
                    }
                ],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "DelimitedTextSource",
                        "storeSettings": { "type": "HttpReadSettings", "requestMethod": "GET" },
                        "formatSettings": { "type": "DelimitedTextReadSettings" }
                    },
                    "sink": {
                        "type": "SqlDWSink",
                        "preCopyScript": "IF (EXISTS (SELECT *\n  FROM INFORMATION_SCHEMA.TABLES\n  WHERE TABLE_SCHEMA = 'dbo'\n  AND TABLE_NAME = 'TripsData'))\nBEGIN \n  Truncate table TripsData;\nEnd\n",
                        "allowPolyBase": true,
                        "polyBaseSettings": {
                            "rejectValue": 0,
                            "rejectType": "value",
                            "useTypeDefault": true
                        },
                        "tableOption": "autoCreate",
                        "disableMetricsCollection": false
                    },
                    "enableStaging": true,
                    "stagingSettings": {
                        "linkedServiceName": {
                            "referenceName": "TripFaresDataLakeStorageLinkedService",
                            "type": "LinkedServiceReference",
                            "parameters": {
                                "keyVaultName": { "value": "@pipeline().parameters.KeyVaultName", "type": "Expression" },
                                "datalakeAccountName": { "value": "@pipeline().parameters.datalakeAccountName", "type": "Expression" }
                            }
                        }
                    }
                },
                "inputs": [
                    { "referenceName": "tripsDataSource", "type": "DatasetReference" }
                ],
                "outputs": [
                    {
                        "referenceName": "AzureSynapseAnalyticsTripsData",
                        "type": "DatasetReference",
                        "parameters": {
                            "SynapseWorkspaceName": { "value": "@pipeline().parameters.SynapseWorkspaceName", "type": "Expression" },
                            "SQLDedicatedPoolName": { "value": "@pipeline().parameters.SQLDedicatedPoolName", "type": "Expression" },
                            "keyVaultName": { "value": "@pipeline().parameters.KeyVaultName", "type": "Expression" },
                            "SQLLoginUsername": { "value": "@pipeline().parameters.SQLLoginUsername", "type": "Expression" }
                        }
                    }
                ]
            },
            {
                "name": "Copy data Fares Data",
                "type": "Copy",
                "dependsOn": [
                    {
                        "activity": "Create Schema If Does Not Exists",
                        "dependencyConditions": [ "Succeeded" ]
                    }
                ],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "DelimitedTextSource",
                        "storeSettings": { "type": "HttpReadSettings", "requestMethod": "GET" },
                        "formatSettings": { "type": "DelimitedTextReadSettings" }
                    },
                    "sink": {
                        "type": "SqlDWSink",
                        "preCopyScript": "IF (EXISTS (SELECT *\n  FROM INFORMATION_SCHEMA.TABLES\n  WHERE TABLE_SCHEMA = 'dbo'\n  AND TABLE_NAME = 'FaresData'))\nBEGIN \n  Truncate table FaresData;\nEnd\n",
                        "allowPolyBase": true,
                        "polyBaseSettings": {
                            "rejectValue": 0,
                            "rejectType": "value",
                            "useTypeDefault": true
                        },
                        "tableOption": "autoCreate",
                        "disableMetricsCollection": false
                    },
                    "enableStaging": true,
                    "stagingSettings": {
                        "linkedServiceName": {
                            "referenceName": "TripFaresDataLakeStorageLinkedService",
                            "type": "LinkedServiceReference",
                            "parameters": {
                                "keyVaultName": { "value": "@pipeline().parameters.KeyVaultName", "type": "Expression" },
                                "datalakeAccountName": { "value": "@pipeline().parameters.datalakeAccountName", "type": "Expression" }
                            }
                        }
                    }
                },
                "inputs": [
                    { "referenceName": "faresDataSource", "type": "DatasetReference" }
                ],
                "outputs": [
                    {
                        "referenceName": "AzureSynapseAnalyticsFaresData",
                        "type": "DatasetReference",
                        "parameters": {
                            "SynapseWorkspaceName": { "value": "@pipeline().parameters.SynapseWorkspaceName", "type": "Expression" },
                            "SQLDedicatedPoolName": { "value": "@pipeline().parameters.SQLDedicatedPoolName", "type": "Expression" },
                            "keyVaultName": { "value": "@pipeline().parameters.KeyVaultName", "type": "Expression" },
                            "SQLLoginUsername": { "value": "@pipeline().parameters.SQLLoginUsername", "type": "Expression" }
                        }
                    }
                ]
            }
        ],
        "parameters": {
            "SchemaName": {
                "type": "string",
                "defaultValue": "tripFares"
            },
            "SynapseWorkspaceName": {
                "type": "string",
                "defaultValue": "<synapse-workspace-name>.database.windows.net"
            },
            "SQLDedicatedPoolName": {
                "type": "string",
                "defaultValue": "<sql-dedicated-pool-name>"
            },
            "SQLLoginUsername": {
                "type": "string",
                "defaultValue": "<sql-login-username>"
            },
            "KeyVaultName": {
                "type": "string",
                "defaultValue": "<keyvault-name>"
            },
            "datalakeAccountName": {
                "type": "string",
                "defaultValue": "<datalake-account-name>"
            }
        },
        "folder": { "name": "TripFaresDataPipeline" },
        "annotations": []
    }
}