README.md
|
|
|||
|
||||
# Technical Reference Implementation for Enterprise BI and Reporting
|
||||
|
||||
Azure offers a rich data and analytics platform for customers and ISVs seeking to build scalable BI and reporting solutions. However, customers face pragmatic challenges in building the right infrastructure for enterprise-grade production systems. They have to evaluate the various products for security, scale, performance, and geo-availability requirements. They have to understand service features and their interoperability, and plan to address any perceived gaps with custom software. This takes time and effort, and often the end-to-end system architecture they design is sub-optimal. Consequently, the promise and expectations set during proof-of-concept (POC) stages do not translate to robust production systems in the expected time to market.
|
||||
|
||||
This TRI addresses this customer pain by providing a reference implementation that:
|
||||
- is pre-built based on selected, stable Azure components proven to work in enterprise BI and reporting scenarios,
- can be easily configured and deployed to an Azure subscription within a few hours,
- is bundled with software to handle all the operational essentials for a full-fledged production system, and
- is tested end to end against large workloads.
|
||||
|
||||
Once deployed, the TRI can be used as-is, or customized to fit the application needs, using the technical documentation that is provided with the TRI. This enables the customer to build the solution that delivers the business goals based on a robust and functional infrastructure.
|
||||
|
||||
# Audience
|
||||
Business decision makers and evaluators can review the content in the **Solution Overview LINK TBD** folder to understand the benefits of using the TRI. For more information on how to tailor the TRI for your needs, **connect with one of our partners LINK TBD**.
|
||||
|
||||
|
||||
|
||||
It is recommended that the TRI be reviewed and deployed by a person who is familiar with operational concepts of data warehousing, business intelligence, and analytics. Knowledge of Azure is a plus, but not mandatory. The technical guides provide pointers to Azure documentation for all the resources employed in this TRI.
|
||||
|
||||
# How to Deploy
|
||||
The TRI can be deployed from http://gallery.cortanaintelligence.com/azure-arch-enterprise-bi-and-reporting
|
||||
Before you deploy, follow the prerequisites. Then click on the Deploy button on the right pane, and provide the configuration parameters based on your requirements. The deployment will create an Azure resource group with **these components TBD**.
|
||||
|
||||
# Architecture
|
||||
|
||||
![Architecture](./img/azure-arch-enterprise-bi-and-reporting.png)
|
||||
|
||||
The TRI is designed with the key assumption that the data to be ingested into the system has already been ETL-processed for reporting and analytics.
|
||||
|
||||
The TRI has 4 stages: Ingestion, Processing, Analysis and Reporting, and Consumption.
|
||||
1. A data generator, provided in place of the customer's data source, queries the job manager for a staging [Azure Blob](https://docs.microsoft.com/en-us/azure/storage/) storage location. The job manager returns a handle to an ephemeral blob, and the data generator pushes data files into this storage. [Configure the data ingestion](https://msdata.visualstudio.com/AlgorithmsAndDataScience/TRIEAD/_git/CIPatterns?_aConfiguringDataIngestion.md) module to load actual customer data.
|
||||
|
|
|||
6. For reporting, SSRS generates the report from data in the SQL DW via SSAS Direct Query. SSAS also offers row level security for the data fetched from SQL DW.
|
||||
7. You can schedule report generation with SSRS using the Report Builder client tool. The generated reports are stored on the SSRS servers. You can enable email-based delivery of reports to users.
|
||||
|
||||
# Technical Guides
|
||||
|
||||
# How to Delete a Deployment
|
||||
The TRI creates the end-to-end system in a dedicated resource group provided by you. Log in to http://portal.azure.com and delete this resource group from your subscription.
|
||||
|
||||
|
|
|
|
|||
# Configuring Logical Data Warehouses
|
||||
The TRI implements data load orchestration into multiple parallel data warehouses for redundancy and high availability.
|
||||
|
||||
![Architecture](./ConfiguringSQLDWforTRI.png)
|
||||
|
||||
## Data Availability and Orchestration features
|
||||
|
||||
**TODO - Dev Team - Review the following and confirm if they apply for TRI-1**
|
||||
|
||||
The logical Data Warehouse architecture and orchestration address these requirements:
|
||||
|
||||
1. Each logical data warehouse (LDW) consists of a single physical data warehouse by default. More replicas per LDW can be configured for scalability and high availability.
|
||||
2. The SQL DW data refresh cycle can be configured by the user; one option is to use 8 hours, which implies loading the physical data warehouses 3 times a day.
|
||||
3. Adding new schemas and data files to the SQL DW is a simple, scriptable process. The TRI assumes 100-500 data files being sent in every day, but this can vary day to day.
|
||||
4. The job manager is both "table" aware and "date/time" aware, so that execution of a report can be deferred until data representing a given period of time has been applied for that table.
|
||||
5. Surrogate keys are not utilized and no surrogate key computation is applied during data upload.
|
||||
6. All data files are expected to be applied using "INSERT" operations. There is no support for uploading "DELETE" datasets. Datasets must be deleted by hand; no special accommodation is made in the architecture for DELETE or UPDATE.
|
||||
7. All fact tables in the data warehouse (and the DIMENSION_HISTORY tables) are expected to follow the Kimball [Additive Accumulating Snapshot Fact Table](http://www.kimballgroup.com/2008/11/fact-tables/) approach. A "reversal flag" approach is recommended to indicate whether a fact is to be removed, with offsetting numeric values. For example, a cancelled order is stored with a value of $100 on day 1 and the reversal flag set to false, and stored with a value of -$100 on day 2 with the reversal flag set to true.
|
||||
8. All fact tables will have DW_ARCHIVAL_DATE column set so that out-of-time analysis and aggregation can be performed. The values for the DW_ARCHIVAL_DATE will be set by the Data Generator that computes the change set for the LDW each local-timezone day.
|
||||
9. The job manager does not prioritize data loads, and provides only a minimal dependency tracking for golden dimensions and aggregates. “Golden Dimensions” are tables that must be loaded before other tables (dimension, fact or aggregate) into the physical EDWs.
|
||||
10. Dimension tables must be re-calculated and refreshed after every load of a dimension table with >0 records. A stored procedure to re-create the current dimension table after a load of dimension table history records is sufficient.
|
||||
11. The Admin GUI provides DW load status.
|
||||
12. Data availability can be controlled using manual overrides.
|
||||
|
||||
## Relationship with Tabular Models
|
||||
|
||||
The TRI also meets the following requirements for the tabular model generation in relation to the SQL DW:
|
||||
|
||||
1. An optional stored procedure runs on tables to produce aggregate results after a load. The aggregate tables will also be tracked in the job manager. A set of tabular model caches will be refreshed with the results of the incremental dataset changes.
|
||||
2. Tabular model refreshes do not need to be applied synchronously with the logical data warehouse flip; however, there will be minimal (data volume dependent) delay between the tabular model refresh and the application of updates as viewed by a customer.
|
||||
3. Dependencies from the tabular model caches will be known to the Job Manager. Only the tabular model caches that are impacted by a dataset change will get re-evaluated and their read-only instances updated.
|
||||
4. The system is designed to refresh 10-100 tabular model caches 3 times daily, with each tabular model holding approximately 10 GB of data.
|
||||
|
||||
## Logical Data Warehouse Status and Availability
|
||||
A set of control tables associates physical DWs with tables, schemas, and time ranges, and records dataset auditing information (start date, end date, row count, file size, checksum) supplied in a separate audit file.
|
||||
|
||||
The LDW load and read data sets iterate through three states:
|
||||
- Load: The LDW set is processing uploaded data files to “catch-up” to the latest and greatest data.
|
||||
- Load-Defer: The LDW is not processing updates nor serving customers; it is a hot standby with "best available" data staleness for disaster recovery purposes. **TODO - Confirm if we have this state**
|
||||
- Primary: The LDW is up-to-date and serving requests but not receiving any additional data loads.
|
||||
|
||||
It is recommended that the data files that are loaded into physical DW instances have the following naming structure:
|
||||
|
||||
- Data File: `startdatetime-enddatetime-schema.tablename.data.csv`
|
||||
- Audit file: `startdatetime-enddatetime-schema.tablename.data.audit.json`
|
||||
|
||||
This provides sufficient information to determine the intent of a file should it appear outside of the expected system paths. The purpose of the audit file is to record the row count, start/end date, file size, and checksum. Audit files must always appear next to their data files in the same working directory. Orphaned data or audit files should not be loaded.
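
As an illustration of the convention (the table name, date range, paths, and checksum algorithm below are hypothetical choices, not fixed by the TRI), here is a minimal PowerShell sketch that names a data file per the pattern above and emits its companion audit file:

```PowerShell
# Minimal sketch: name a data file per the recommended convention and write its
# companion audit file next to it. Table name, date range, and checksum algorithm
# are illustrative choices only.
$schema    = "dbo"
$tableName = "FactResellerSales"
$start     = (Get-Date "2017-09-12T18:00:00Z").ToUniversalTime()
$end       = (Get-Date "2017-09-12T21:00:00Z").ToUniversalTime()
$stamp     = "{0:yyyyMMddTHHmmssZ}-{1:yyyyMMddTHHmmssZ}" -f $start, $end

$dataFile  = "$stamp-$schema.$tableName.data.csv"
$auditFile = "$stamp-$schema.$tableName.data.audit.json"

# Assume the change set has already been written to $dataFile (no header row).
$audit = @{
    StartDate = $start.ToString("o")
    EndDate   = $end.ToString("o")
    RowCount  = (Get-Content $dataFile).Count
    FileSize  = (Get-Item $dataFile).Length
    Checksum  = (Get-FileHash $dataFile -Algorithm MD5).Hash
}

# The audit file must sit next to its data file in the same working directory.
$audit | ConvertTo-Json | Set-Content $auditFile
```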
|
||||
|
||||
## Advanced Topics
|
||||
|
||||
### Anatomy of a Logical SQL DW Flip
|
||||
|
||||
Here is an example schedule showing how the logical DW flips occur, with physical data warehouses located in different Azure availability regions. The DW control logic in the job manager performs the flip operation on the schedule only if the current time is past the scheduled time and the conditions for safe and healthy operation of the scheduled event are fulfilled.
|
||||
|
||||
- Active - is the state when the LDW is actively serving user queries.
- Load - is the state when the LDW is being loaded with data via data load jobs.
- Standby - is the state when the administrator has paused the LDW (i.e., the physical data warehouses in the LDW) for planned maintenance, when no data is available to be loaded, or for other reasons.
|
||||
|
||||
|
||||
| PST | EST | UTC | LDW 1 - US West | LDW 2 - US East | Data scenario |
|
||||
|:----|:----|:----|:------|:------|:-------------------------|
|
||||
|00:00 | 03:00 | 08:00 | Active | Load | Batch 1 is loaded into LDW 2 from BLOB via dynamic ADF pipelines |
|
||||
|08:00 | 11:00 | 16:00 | Load | Active | Batch 2 data is loaded into LDW 1, while LDW 2 becomes the reader/primary (NOTE: Any incomplete ADF pipelines may continue to load LDW 2 until completion; Query connections and performance may be impacted in LDW2) |
|
||||
|16:00 | 19:00 |24:00 | Active | Load | Batch 3 is loaded into LDW 2, while LDW 1 becomes the primary |
|
||||
|20:00 | 23:00 |04:00 | Active | PAUSE | Admin pauses the Loader LDW 4 hours into the loading cycle |
|
||||
|
||||
### Data Warehouse Flip Operation

The transition of an LDW from Load to Active and vice versa, a.k.a. the "flip operation", is performed every T hours, where T is configurable by the user.

The flip operation is executed through the following steps:

1. Once the current UTC time is past the end time of the current flip interval of T hours, a flip operation is initiated, which transitions the currently Active LDW to Load status and the next-to-be-Active Load LDW to Active status. If there is no LDW in Load state, no flip happens. If there is more than one LDW in Load state, the next LDW in sequence after the currently Active LDW is picked as the one to be flipped to Active state.
|
||||
2. Once a flip operation is initiated, the following conditions are checked before a Load LDW can be switched to Active state:

a. Each PDW in the Load LDW is transitioned to StopLoading state: no new load jobs are started for the PDW, and it waits for its current load jobs to complete.

b. A StopLoading PDW is transitioned to ScaleToActive state once its current load jobs have completed; its DWU capacity is then scaled up to a higher capacity for servicing requests.

c. A ScaleToActive PDW is transitioned to Active state when it can actively serve user queries.
|
||||
3. Once each PDW in the next-to-be-Active LDW is flipped to Active state, the direct query nodes pointing to the PDWs of the previously Active LDW are switched to point to the newly Active ones.

4. The above steps happen in a staggered manner so that Direct Query nodes don't change PDW connections all at once. This ensures that no existing user connections are dropped. A connection drain time is allowed, during which a Direct Query node stops accepting new requests but completes processing its existing requests before it flips to the newly Active PDW.

5. Once all the PDWs have switched to Active state, the Active PDWs of the previously Active LDW are transitioned into Load state after being scaled down to a lower DWU capacity.

6. A record is inserted in the database containing the timestamp when the next flip operation will be initiated, and all the above steps are repeated once the current UTC time is past that timestamp.
|
||||
|
||||
**Are there any timing instructions for the Admin to restart the process?**
|
||||
The flip interval of T hours is a configurable property and can be set by the Admin by updating a ControlServer database property. When the next flip time comes around, this value will be used to set the next flip interval.

If the Admin wants to flip immediately, the end timestamp of the current flip interval must be updated to the current UTC time in the LDWExpectedStates database table; the flip operation will then be initiated within the next couple of minutes.
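
A hedged sketch of that immediate-flip override, assuming the job manager (ControlServer) database is reachable with Invoke-Sqlcmd; the server name, database name, credentials, column name (EndTimeUtc), and WHERE clause are assumptions, and only the LDWExpectedStates table name comes from the description above:

```PowerShell
# Hypothetical sketch: pull the end of the current flip interval back to "now" (UTC)
# so the next ASDQ daemon poll initiates a flip. The LDWExpectedStates table is
# documented above; the EndTimeUtc column name and the WHERE clause are assumptions.
$forceFlip = @"
UPDATE dbo.LDWExpectedStates
SET    EndTimeUtc = SYSUTCDATETIME()
WHERE  EndTimeUtc > SYSUTCDATETIME();   -- only the currently open interval
"@

Invoke-Sqlcmd -ServerInstance "jobmanager-sql.database.windows.net" `
              -Database "ControlServer" `
              -Username "username" -Password "password" `
              -Query $forceFlip
# The ASDQ daemons poll every few minutes, so the flip should start shortly afterwards.
```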
|
||||
|
||||
**What other situations will require Admin intervention?**
|
||||
The flip operation requires a Load PDW to satisfy certain conditions before it can be made Active. These are explained in steps 2.a-2.c of the Data Warehouse Flip Operation. If load jobs get stuck, or if scaling takes a long time, the flip operation will be halted. If all the Direct Query nodes die, the flip operation will not be triggered either, because the ASDQ daemons are what currently initiate the flip operation. Admin intervention is required to address these situations.
|
||||
|
||||
**What should the Admin NOT do during a flip operation?**
|
||||
Once a flip operation is started, the Admin should not try to change the state of PDWs or LDWs manually. Because these states are maintained in the job manager's database, any mismatch between them and the real state will throw off the flip operation. If any of the PDWs dies, the Admin needs to get it back into the state that was last recorded in the database.
|
|
|
|||
# Data Warehouse Flip Operation
|
||||
|
||||
The logical data warehouse (LDW) sets (each set being a group of physical data warehouses grouped by availability region) iterate through the "Load" and "Active" states when the system is running. An LDW can also be in the "Standby" state if it is not being actively used in the data warehousing process. The three states are defined below:
|
||||
- Load: The LDW set is processing uploaded data files to "catch-up" to the latest and greatest data.
|
||||
- Standby: The LDW is not processing updates nor serving customers; it is a hot standby with "best available" data staleness for disaster recovery purposes.
|
||||
- Active: The LDW is up-to-date and serving requests but not receiving any additional data loads.
|
||||
|
||||
### Anatomy of a Logical Datawarehouse Flip
|
||||
|
||||
The transition of an LDW from Load to Active and vice versa, a.k.a. the "flip operation", is performed every T hours, where T is configurable by the user. The flip operation is triggered by daemons that run as scheduled tasks on each of the Analysis Services Direct Query (ASDQ) nodes.
|
||||
|
||||
|
||||
Every few minutes, the daemon running on each ASDQ node asks the job manager whether an LDW flip needs to happen.
|
||||
|
||||
1. The job manager maintains a database table, "LDWExpectedStates", which stores the start and end times of the current flip interval. It consists of records that define which LDW is in Load state and which is in Active state, and until what time they are supposed to be in those states.

2. When queried by an ASDQ daemon, the job manager checks this table to see whether the current UTC time is past the end time of the current flip interval; if not, it responds with a no-op. If the current UTC time is past the end time, a flip operation needs to be executed and the following steps are carried out.
|
||||
|
||||
a. The LDW that needs to be Active in the next flip interval is determined, and the LDWExpectedStates table is populated with the start and end times of the next flip interval. The end timestamp is determined by adding T hours to the start time, which is the current UTC time. If there is no LDW in Load state, no flip happens. If there is more than one LDW in Load state, the next LDW in sequence after the currently Active LDW is picked as the one to be flipped to Active state.

b. The state of the next-to-be-Active LDW is switched to Active, and the state transitions of its PDWs from Load to Active are initiated.
|
||||
|
||||
3. The PDW state transition from Load to Active goes through a few intermediate states, as follows:

a. Load: The PDW is processing uploaded data files to "catch up" to the latest and greatest data.

b. StopLoading: The PDW will not accept any new data load jobs but waits until its current load jobs complete.

c. ScaleUpToActive: The PDW has completed all its assigned load jobs and is being scaled up to Active DWU capacity.

d. Active: The PDW is up-to-date and serving requests but not receiving any additional data loads.
|
||||
|
||||
4. Once a PDW is changed to Active state, the job manager checks whether there is at least one DQ node in the "DQ Alias group" that is still serving active queries. A "DQ Alias group" is the group of DQ nodes that point to the same PDW instance in an LDW. Multiple DQ nodes can point to the same PDW; this makes it possible to increase the availability of DQs if needed, assuming the PDW can support concurrent queries from all these DQs. Checking that at least one DQ is in active state ensures new requests do not get dropped. If this check succeeds, a "Transition" response is sent to the DQ node, which stops accepting new connections from the DQ LoadBalancer and drains off the existing connections. Once the grace time is over, the DQ changes its connection string to point to the newly Active PDW and reports to the job manager, which then allows other ASDQs to start their transitions.
|
||||
|
||||
5. Once all the DQs in a "DQ Alias group" have flipped to a different PDW, the group's original PDW is transitioned to a Load state after scaling down its DWU capacity.
|
||||
6. After all the Active PDWs of the previously Active LDW have been transitioned to Load state, the state of the LDW is changed to Load state. This marks the end of the flip operation.
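
The daemon itself ships with the TRI as a scheduled task; the following is only a conceptual PowerShell sketch of the loop described in steps 1-6, with a hypothetical endpoint route, response shape, and Set-DirectQueryConnection helper:

```PowerShell
# Conceptual sketch of the per-node ASDQ daemon loop (all endpoint and field names
# below are illustrative assumptions, not the TRI's actual API).
while ($true) {
    $reply = Invoke-RestMethod -Uri "$ControlServerUri/odata/FlipRequests" `
                               -Method Get -Headers $authenticationHeader
    if ($reply.Action -eq "Transition") {
        # Stop accepting new connections from the load balancer and let existing
        # connections drain for the configured grace period.
        Start-Sleep -Seconds $reply.ConnectionDrainSeconds
        # Repoint this DQ node at the newly Active PDW (hypothetical helper), then
        # report back so the job manager can let the next node in the alias group go.
        Set-DirectQueryConnection -Server $reply.NewActivePdw
        Invoke-RestMethod -Uri "$ControlServerUri/odata/FlipRequests" -Method Post `
                          -Headers $authenticationHeader `
                          -Body (@{ Status = "ChangeCompleted" } | ConvertTo-Json) `
                          -ContentType "application/json"
    }
    Start-Sleep -Seconds 60   # daemon schedule (1 minute in the example below)
}
```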
|
||||
|
||||
Here is an example schedule showing how the flip operation occurs using the following configuration:

- 2 LDWs: LDW01, LDW02
- 2 PDWs: PDW01-LDW01 (in LDW01), PDW01-LDW02 (in LDW02)
- 2 DQ nodes: DQ01 (points to PDW01-LDW01), DQ02 (points to PDW01-LDW01)
- ASDQ daemon schedule: 1 minute
- Connection drain time: 10 minutes
|
||||
|
||||
| UTC | LDW01 | LDW02 | PDW01-LDW01 | PDW01-LDW02 | DQ01 | DQ02 |
|
||||
|:----|:----|:----|:----|:----|:----|:----|
|
||||
|00:00 | Active | Load | Active | Load | Normal : PDW01-LDW01 | Normal : PDW01-LDW01 |
|
||||
|00:01 | Active | Load | Active | StopLoading | Normal : PDW01-LDW01 | Normal : PDW01-LDW01 |
|
||||
|00:03 | Active | Load | Active | ScaleUpToActive | Normal : PDW01-LDW01 | Normal : PDW01-LDW01 |
|
||||
|00:05 | Active | Active | Active | Active | Normal : PDW01-LDW01 | Normal : PDW01-LDW01 |
|
||||
|00:06 | Active | Active | Active | Active | Transition : PDW01-LDW01 | Normal : PDW01-LDW01 |
|
||||
|00:16 | Active | Active | Active | Active | ChangeCompleted : PDW01-LDW02 | Normal : PDW01-LDW01 |
|
||||
|00:26 | Active | Active | Active | Active | ChangeCompleted : PDW01-LDW02 | Transition : PDW01-LDW01 |
|
||||
|00:27 | Active | Active | ScaleDownToLoad | Active | Normal : PDW01-LDW02 | Normal : PDW01-LDW02 |
|
||||
|00:29 | Load | Active | Load | Active | Normal : PDW01-LDW02 | Normal : PDW01-LDW02 |
|
||||
|
|
|
|||
# Analysis Services for Interactive BI
|
||||
|
||||
The TRI helps you operationalize and manage tabular models in Analysis Services for interactive BI. The read-only AS servers are configured to handle interactive BI query load from client connections via a front-end load balancer. Analysis Services, tabular models, and their characteristics are explained [here](https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models/tabular-models-ssas).
|
||||
|
||||
The SSAS Model Cache in this TRI consists of six components; their roles are described in the [architectural overview](../CiqsInstaller/CiqsInstaller/core/HomePage.md):
|
||||
- Tabular Models
|
||||
- SSAS Partition Builder
|
||||
- SSAS Read Only Cache servers
|
||||
- Job Manager that coordinates the tabular model refresh
|
||||
- Azure Blob that stores the tabular models for refresh
|
||||
- Load Balancers that handle client connections
|
||||
|
||||
![SSAS Tabular Models Cache](../img/SSAS-Model-Cache.png)
|
||||
|
||||
## Tabular Model Creation
|
||||
|
||||
Tabular models are Analysis Services databases that run in-memory, or act as a pass-through for backend data sources. They support cached summaries and drilldowns of large amounts of data, thanks to columnar storage that offers 10x or more data compression. This makes them ideal for interactive BI applications. See [this article](https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models/tabular-models-ssas) for more details. Typically, tabular models hold only a subset of the big data held in upstream data warehouses or data marts, in terms of both the number of entities and data size. There is a large corpus of best-practices information for tabular model design and tuning, including this [excellent article](https://msdn.microsoft.com/en-us/library/dn751533.aspx) on the lifecycle of an enterprise-grade tabular model.
|
||||
|
||||
You can use tools such as the [SSDT tabular model designer](https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models/tabular-model-designer-ssas) available with Visual Studio 2015 (or greater) to create your tabular models. Set the compatibility level of the tabular models at 1200 or higher (latest is 1400 as of this writing) and the query mode to In-Memory. See [here](https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models/tabular-model-solutions-ssas-tabular) for details on tabular model creation.
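
If you prefer scripting over the SSDT designer flow described above, an empty model skeleton at compatibility level 1200 with In-Memory (import) storage can also be created with a TMSL script; here is a minimal sketch using the Invoke-ASCmd cmdlet (the server and database names are hypothetical, and tables and data sources still have to be added afterwards):

```PowerShell
# Minimal sketch: create an empty tabular database at compatibility level 1200 with
# In-Memory (import) storage via TMSL. Server/database names are placeholders.
$tmsl = @"
{
  "createOrReplace": {
    "object": { "database": "AdventureWorks" },
    "database": {
      "name": "AdventureWorks",
      "compatibilityLevel": 1200,
      "model": { "defaultMode": "import", "culture": "en-US" }
    }
  }
}
"@

Invoke-ASCmd -Server "ssaspbvm00" -Query $tmsl
```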
|
||||
|
||||
## Tabular Model Partition Processing
|
||||
|
||||
Partition creation is automated using the open source [AsPartitionProcessing tool](https://github.com/Microsoft/Analysis-Services/tree/master/AsPartitionProcessing). Many of the configurable options for partition building directly correspond to the configuration of this tool. Refer to AsPartitionProcessing tool's [whitepaper](https://github.com/Microsoft/Analysis-Services/blob/master/AsPartitionProcessing/Automated%20Partition%20Management%20for%20Analysis%20Services%20Tabular%20Models.pdf) for further documentation.
|
||||
|
||||
## Tabular model configuration for continuous incremental refresh at scale
|
||||
The various orchestration components of the TRI refer to four configuration tables to enable continuous and incremental model refresh.
|
||||
|
||||
You can provide configuration inputs for two of these tables:
|
||||
|
||||
| TableName | Description |
|
||||
|:----------|:------------|
|
||||
|**TabularModel**|Lists tabular models with their server and database names, to be used by the daemon on the Partition Builder servers to connect to the SSAS server and the database for refresh.|
|
||||
|**TabularModelTablePartitions**|This table specifies which model a tabular model table is part of, and the source (DW) table to which the tabular model table is bound. It also provides the column that will be used in refreshing the tabular model, the lower and upper bounds of the data held in the tabular model, and the strategy for processing the SSAS partitions.|
|
||||
|
||||
### TabularModel
|
||||
Provide the unique <_server, database_> pairs in the table. This information uniquely identifies each tabular model for the Job Manager, and in turn, will be used by the daemon on the Partition Builder nodes to connect to the SSAS server and the database before refresh.
|
||||
|
||||
_Example_:
|
||||
|
||||
```json
|
||||
{
|
||||
"AnalysisServicesServer":"ssaspbvm00",
|
||||
"AnalysisServicesDatabase":"AdventureWorks",
|
||||
"IntegratedAuth":true,
|
||||
"MaxParallelism":4,
|
||||
"CommitTimeout":-1
|
||||
}
|
||||
```
|
||||
|
||||
* **AnalysisServicesServer** : SSAS VM name or Azure AS URL.
|
||||
* **AnalysisServicesDatabase** : Name of the database.
|
||||
* **IntegratedAuth** : Boolean flag indicating whether the connection to the DW is made using integrated authentication or SQL authentication.
|
||||
* **MaxParallelism** : Maximum number of threads on which to run processing commands in parallel during partition building.
|
||||
* **CommitTimeout** : Cancels processing (after specified time in seconds) if write locks cannot be obtained. -1 will use the server default.
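
How the rows get into these configuration tables is deployment-specific; as a hedged sketch, assuming the configuration lives in a dbo.TabularModel table in the job manager (Control Server) database and that its columns mirror the JSON fields above, a model could be registered like this:

```PowerShell
# Hypothetical sketch: register a tabular model with the job manager. The server,
# database, table, and column names are assumptions mirroring the fields above.
$registerModel = @"
INSERT INTO dbo.TabularModel
    (AnalysisServicesServer, AnalysisServicesDatabase, IntegratedAuth, MaxParallelism, CommitTimeout)
VALUES
    ('ssaspbvm00', 'AdventureWorks', 1, 4, -1);
"@

Invoke-Sqlcmd -ServerInstance "jobmanager-sql.database.windows.net" `
              -Database "ControlServer" `
              -Username "username" -Password "password" `
              -Query $registerModel
```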
|
||||
|
||||
### TabularModelTablePartitions
|
||||
|
||||
_Example_:
|
||||
```json
|
||||
{
|
||||
"AnalysisServicesTable":"FactResellerSales",
|
||||
"SourceTableName":"[dbo].[FactResellerSales]",
|
||||
"SourcePartitionColumn":"OrderDate",
|
||||
"TabularModel_FK":1,
|
||||
"DWTable_FK":"dbo.FactResellerSales",
|
||||
"DefaultPartition":"FactResellerSales",
|
||||
"ProcessStrategy":"ProcessDefaultPartition",
|
||||
"MaxDate":"2156-01-01T00:00:00Z",
|
||||
"LowerBoundary":"2010-01-01T00:00:00Z",
|
||||
"UpperBoundary":"2011-12-31T23:59:59Z",
|
||||
"Granularity":"Daily",
|
||||
"NumberOfPartitionsFull":0,
|
||||
"NumberOfPartitionsForIncrementalProcess":0
|
||||
}
|
||||
```
|
||||
|
||||
* **AnalysisServicesTable** : The table to be partitioned.
|
||||
* **SourceTableName** : The source table in the DW database.
|
||||
* **SourcePartitionColumn** : The source column of the source table.
|
||||
* **TabularModel_FK** : Foreign Key reference to the TabularModel.
|
||||
* **DWTable_FK** : Foreign Key reference to the DWTable.
|
||||
* **DefaultPartition** : Name of the default ("template") partition used when _ProcessStrategy_ is "ProcessDefaultPartition".
|
||||
* **MaxDate** : The maximum date that needs to be accounted for in the partitioning configuration.
|
||||
* **LowerBoundary** : The lower boundary of the partition date range.
|
||||
* **UpperBoundary** : The upper boundary of the partition date range.
|
||||
* **ProcessStrategy** : Strategy used for processing the partition; "RollingWindow" or "ProcessDefaultPartition". The default partition can be specified using the "DefaultPartition" property; otherwise a "template" partition with the same name as the table is assumed to be present.
|
||||
* **Granularity** : Partition granularity of "Daily", "Monthly", or "Yearly".
|
||||
* **NumberOfPartitionsFull** : Count of all partitions in the rolling window. For example, a rolling window of 10 years partitioned by month would require 120 partitions.
|
||||
* **NumberOfPartitionsForIncrementalProcess** : Count of hot partitions where the data can change. For example, it may be necessary to refresh the most recent 3 months of data every day. This only applies to the most recent partitions.
|
||||
|
||||
|
||||
Provide one of the following values for the partitioning strategy in the _ProcessStrategy_ column:
|
||||
|
||||
- _ModelProcessStrategy.ProcessDefaultPartition_ (Default)
|
||||
- _ModelProcessStrategy.RollingWindow_
|
||||
|
||||
If you choose _ModelProcessStrategy.ProcessDefaultPartition_:
|
||||
|
||||
- Confirm that the tabular model contains a partition with the same name as the tabular model table. This partition is always used for the data load of the date slice, even if there are other partitions in the table.
|
||||
- Provide values for _SourceTableName_ and _SourcePartitionColumn_.
|
||||
- Provide values for _LowerBoundary, UpperBoundary_ to provide the start and end time span for the data in the tabular model.
|
||||
|
||||
If you choose _ModelProcessStrategy.RollingWindow_:
|
||||
- Confirm that the table partitions are defined on time-granularity-based ranges: Daily, Monthly, or Yearly.
|
||||
- Provide values for the columns _MaxDate_, _NumberOfPartitionsFull_ and _NumberOfPartitionsforIncrementalProcess_.
|
||||
|
||||
In both cases, confirm that the value provided in _SourcePartitionColumn_ represents a column of type DateTime in the source DW table. Tabular model refresh is incremental on time, for a specific date range.
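
A quick way to verify this (the server, credentials, table, and column picked here are hypothetical) is to check the column's type in the source DW before wiring up the configuration:

```PowerShell
# Hypothetical check: confirm the SourcePartitionColumn is a date/time typed column
# in the source DW table. Server, credentials, table, and column are placeholders.
$checkColumnType = @"
SELECT c.name AS column_name, t.name AS type_name
FROM sys.columns c
JOIN sys.types  t ON c.user_type_id = t.user_type_id
WHERE c.object_id = OBJECT_ID('dbo.FactResellerSales')
  AND c.name = 'OrderDate';
"@

Invoke-Sqlcmd -ServerInstance "bd044-pdw01-ldw01.database.windows.net" -Database "dw" `
              -Username "username" -Password "password" -Query $checkColumnType
```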
|
||||
|
||||
Next, the following two tables are **read-only**. You should **not** change or update these tables or their values, but you can view them for troubleshooting and/or understanding how the model refresh happens.
|
||||
|
||||
| TableName | Description |
|
||||
|:----------|:------------|
|
||||
|**TabularModelPartitionStates**|In this table, the Job Manager tracks the source and target context for all data slices that are to be refreshed or processed in a tabular model, the start and end dates of each data slice, and the Blob URI where the processed tabular model backups will be stored.|
|
||||
|**TabularModelNodeAssignments**|In this table, the partition builder tracks the refresh state of each AS Read-Only node for each tabular model. It is used to indicate the maximum date for an entity for which the data has been processed. Each of the SSAS Read-Only nodes provides its current state here - in terms of latest data by date for every entity.
|
||||
|
||||
### TabularModelPartitionStates
|
||||
|
||||
This table helps track all the data slices that are to be refreshed or processed in a tabular model. Each piece of incremental data loaded into the DW is specified with a start and end date, defining the data slice. Each new data slice will trigger a new partition to be built.
|
||||
|
||||
---
|
||||
_Example_:
|
||||
```json
|
||||
{
|
||||
"ProcessStatus":"Purged",
|
||||
"TabularModelTablePartition_FK":4,
|
||||
"StartDate":"2017-09-12T18:00:00Z",
|
||||
"EndDate":"2017-09-12T21:00:00Z",
|
||||
"PartitionUri":"https://edw.blob.core.windows.net/data/AdventureWorks-backup-20170912T090408Z.abf",
|
||||
"ArchiveUri":"https://edw.blob.core.windows.net/data",
|
||||
"SourceContext":"{\"Name\":\"AzureDW\",\"Description\":\"Data source connection Azure SQL DW\",\"DataSource\":\"bd044-pdw01-ldw01.database.windows.net\",\"InitialCatalog\":\"dw\",\"ConnectionUserName\":\"username\",\"ConnectionUserPassword\":\"password\",\"ImpersonationMode\":\"ImpersonateServiceAccount\"}",
|
||||
"TargetContext":"{\"ModelConfigurationID\":1,\"AnalysisServicesServer\":\"ssaspbvm00\",\"AnalysisServicesDatabase\":\"AdventureWorks\",\"ProcessStrategy\":1,\"IntegratedAuth\":true,\"MaxParallelism\":4,\"CommitTimeout\":-1,\"InitialSetup\":false,\"IncrementalOnline\":true,\"TableConfigurations\":[{\"TableConfigurationID\":1,\"AnalysisServicesTable\":\"FactSalesQuota\",\"PartitioningConfigurations\":[{\"DWTable\":null,\"AnalysisServicesTable\":\"FactSalesQuota\",\"Granularity\":0,\"NumberOfPartitionsFull\":0,\"NumberOfPartitionsForIncrementalProcess\":0,\"MaxDate\":\"2156-01-01T00:00:00\",\"LowerBoundary\":\"2017-09-12T18:00:00\",\"UpperBoundary\":\"2017-09-12T21:00:00\",\"SourceTableName\":\"[dbo].[FactSalesQuota]\",\"SourcePartitionColumn\":\"Date\",\"TabularModel_FK\":1,\"DWTable_FK\":\"dbo.FactSalesQuota\",\"DefaultPartition\":\"FactSalesQuota\",\"Id\":4,\"CreationDate\":\"2017-09-12T17:17:28.8225494\",\"CreatedBy\":null,\"LastUpdatedDate\":\"2017-09-12T17:17:28.8225494\",\"LastUpdatedBy\":null}],\"DefaultPartitionName\":\"FactSalesQuota\"}]}"
|
||||
}
|
||||
```
|
||||
* **ProcessStatus**: Current status of the partition being processed. _Queued_, _Dequeued_, _Ready_, or _Purged_.
|
||||
* **TabularModelTablePartition_FK**: Foreign key referencing the TabularModelTablePartition.
|
||||
* **StartDate**: The start date of the current refresh data slice.
|
||||
* **EndDate**: The end date of the current refresh data slice.
|
||||
* **PartitionUri**: Uri of the resulting partitioned backup datafile.
|
||||
* **ArchiveUri**: Uri of the location to place the partitioned backup datafile.
|
||||
* **SourceContext**: JSON Object specifying the source DW connection information.
|
||||
* **TargetContext**: JSON Object that maps to the AsPartitionProcessing client's "ModelConfiguration" contract.
|
||||
|
||||
Each row of this table represents a partition that should be built and the configuration that will be used to execute the partition builder client. Various components in the Control server can create a “work item” for the partition builder which uses the information in the attributes to process that data slice. It is evident from the fields in this table that all work items exist in the context of a _TabularModelTablePartition_ entity.
|
||||
|
||||
Each row contains the start and end date of the data slice. Each entity to be partitioned clearly defines a StartDate and EndDate for the date slice to be processed. Note that this date range is produced by the producer of this entity. In a typical case, it represents the date range for which a tabular model needs to be refreshed, where the range is simply the date range of the data slice in a _DWTableAvailabilityRange_ entity.
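
For troubleshooting, the pending work items can be inspected read-only through the job manager's OData interface, in the same way the data ingestion script queries other entity sets; the entity-set route and filter below are assumptions that mirror the table and field names above:

```PowerShell
# Hypothetical read-only troubleshooting query: list partition work items that have
# not been purged yet. $ControlServerUri and $authenticationHeader are obtained the
# same way as in the data ingestion script; the entity-set name is an assumption.
$uri = "$ControlServerUri/odata/TabularModelPartitionStates?" + '$filter=ProcessStatus%20ne%20''Purged'''

$response = Invoke-RestMethod -Uri $uri -Method Get -Headers $authenticationHeader
$response.value |
    Select-Object TabularModelTablePartition_FK, ProcessStatus, StartDate, EndDate |
    Format-Table -AutoSize
```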
|
||||
|
||||
#### Source context - DataSourceInfo sample contract object
|
||||
The source context indicates which data source to connect to in order to fetch the data for the slice (i.e., all information required to connect to a DW table), and contains serialized data source (DW) connection information that the tabular model uses to update or set its connection string dynamically. This data contract maps directly to the partition processing client's DataSourceInfo contract.
|
||||
|
||||
```json
|
||||
{
|
||||
"Name":"AzureDW",
|
||||
"Description":"Data source connection Azure SQL DW",
|
||||
"DataSource":"pdw01-ldw01.database.windows.net",
|
||||
"InitialCatalog":"dw",
|
||||
"ConnectionUserName":"username",
|
||||
"ConnectionUserPassword":"password",
|
||||
"ImpersonationMode":"ImpersonateServiceAccount"
|
||||
}
|
||||
```
|
||||
#### Target context - ModelConfiguration sample contract object
|
||||
The TargetContext is a serialized representation of the ModelConfiguration contract that the partition processing client expects. It is a simplified representation of the tables and partitions that are to be processed by the client. A sample contract object looks like this:
|
||||
|
||||
```json
|
||||
{
|
||||
"ModelConfigurationID":1,
|
||||
"AnalysisServicesServer":"ssaspbvm00",
|
||||
"AnalysisServicesDatabase":"AdventureWorks",
|
||||
"ProcessStrategy":1,
|
||||
"IntegratedAuth":true,
|
||||
"MaxParallelism":4,
|
||||
"CommitTimeout":-1,
|
||||
"InitialSetup":false,
|
||||
"IncrementalOnline":true,
|
||||
"TableConfigurations":[
|
||||
{
|
||||
"TableConfigurationID":1,
|
||||
"AnalysisServicesTable":"FactSalesQuota",
|
||||
"PartitioningConfigurations":[
|
||||
{
|
||||
"DWTable":null,
|
||||
"AnalysisServicesTable":"FactSalesQuota",
|
||||
"Granularity":0,
|
||||
"NumberOfPartitionsFull":0,
|
||||
"NumberOfPartitionsForIncrementalProcess":0,
|
||||
"MaxDate":"2156-01-01T00:00:00",
|
||||
"LowerBoundary":"2017-09-12T18:00:00",
|
||||
"UpperBoundary":"2017-09-12T21:00:00",
|
||||
"SourceTableName":"[dbo].[FactSalesQuota]",
|
||||
"SourcePartitionColumn":"Date",
|
||||
"TabularModel_FK":1,
|
||||
"DWTable_FK":"dbo.FactSalesQuota",
|
||||
"DefaultPartition":"FactSalesQuota"
|
||||
}],
|
||||
"DefaultPartitionName":"FactSalesQuota"
|
||||
}]
|
||||
}
|
||||
```
|
||||
### TabularModelNodeAssignment
|
||||
This table contains entities that represent the refresh state of each of the supported tabular model tables. The Analysis Services Read-Only nodes use this table to figure out which backup file from the _TabularModelPartitionState_ entity to restore on each of the nodes. The Partition Builder node logs an entry that points to the maximum date ceiling for which data has been refreshed, on a per tabular model table basis.
|
||||
|
||||
_Example_:
|
||||
```json
|
||||
{
|
||||
"Name":"ssaspbvm00",
|
||||
"Type":"ASPB",
|
||||
"TabularModelTablePartition_FK":1,
|
||||
"State":"Building",
|
||||
"LatestPartitionDate":"2017-09-14T18:00:00Z"
|
||||
}
|
||||
```
|
||||
* **Name**: The name of the virtual machine node.
|
||||
* **Type**: The type of the virtual machine node.
|
||||
* _ASRO_: SSAS Read-only
|
||||
* _ASPB_: SSAS Partition Builder
|
||||
* **TabularModelTablePartition_FK**: Foreign key referencing the _TabularModelTablePartition_ table.
|
||||
* **State**: Current state of the node.
|
||||
* _Normal_
|
||||
* _Transition_
|
||||
* _Building_
|
||||
* **LatestPartitionDate**: Latest partition build date for the node.
|
||||
|
||||
---
|
||||
|
||||
|
|
|
|||
# Summary
|
||||
This page summarizes prerequisites for EDW Reporting TRA deployment.
|
||||
|
||||
# VNET
|
||||
|
||||
Most of the resources provisioned will be placed in a pre-existing Azure VNET. Therefore, we require an Azure VNET resource and a domain controller to be deployed in the subscription where the EDW Reporting TRA will be deployed. Customers who already have a functioning Azure VNET can skip this section. For customers new to Azure, the guide below will show how to easily deploy prerequisites in their subscription.
|
||||
|
||||
## Provisioning Azure VNet resource
|
||||
|
||||
First, we will create new Azure VNET and VPN Gateway resources. Navigate to the `<source root>\edw\deployment` directory and run the command below. Note that it might take up to 45 minutes to complete.
|
||||
|
||||
```PowerShell
|
||||
Login-AzureRmAccount
|
||||
|
||||
.\DeployVPN.ps1 -SubscriptionName "My Subscription" -Location "westus" -EDWAddressPrefix "10.254.0.0/16" -EDWGatewaySubnetPrefix "10.254.1.0/24" -OnpremiseVPNClientSubnetPrefix "192.168.200.0/24" -ResourceGroupName "ContosoVNetGroup" -VNetName "ContosoVNet" -VNetGatewayName "ContosoGateway" -RootCertificateName "ContosoRootCertificate" -ChildCertificateName "ContosoChildCertificate"
|
||||
```
|
||||
|
||||
In addition to provisioning Azure VNET and VPN Gateway resources, the script above will also create a self-signed root certificate and a client certificate for the VPN gateway. The root certificate is used for generating and signing client certificates on the client side, and for validating those client certificates on the VPN gateway side.
|
||||
|
||||
To enable people in your organization to connect to the newly provisioned VNET via the VPN gateway, you will need to export the two certificates. You can use the commands below to generate the PFX files. The two files can then be shared and installed on the machines of users who need VPN access.
|
||||
|
||||
```PowerShell
|
||||
$rootCert = Get-ChildItem -Path cert:\CurrentUser\My | ?{ $_.Subject -eq "CN=ContosoRootCertificate" }
|
||||
$childCert = Get-ChildItem -Path cert:\CurrentUser\My | ?{ $_.Subject -eq "CN=ContosoChildCertificate" }
|
||||
|
||||
$type = [System.Security.Cryptography.X509Certificates.X509ContentType]::Pfx
|
||||
$securePassword = ConvertTo-SecureString -String "Welcome1234!" -Force -AsPlainText
|
||||
|
||||
Export-PfxCertificate -Cert $rootCert -FilePath "ContosoRootCertificate.pfx" -Password $securePassword -Verbose
|
||||
Export-PfxCertificate -Cert $childCert -FilePath "ContosoChildCertificate.pfx" -Password $securePassword -Verbose
|
||||
```
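
On each user's machine, the PFX files can then be imported with Import-PfxCertificate; a minimal sketch follows (the store locations shown are typical choices and may vary with your PKI policy):

```PowerShell
# Install the exported certificates on a client machine that needs VPN access.
# Run from the folder holding the PFX files; the password must match the export.
$securePassword = ConvertTo-SecureString -String "Welcome1234!" -Force -AsPlainText

Import-PfxCertificate -FilePath "ContosoRootCertificate.pfx" `
                      -CertStoreLocation Cert:\CurrentUser\Root `
                      -Password $securePassword

Import-PfxCertificate -FilePath "ContosoChildCertificate.pfx" `
                      -CertStoreLocation Cert:\CurrentUser\My `
                      -Password $securePassword
```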
|
||||
|
||||
## Provisioning the Domain Controller
|
||||
|
||||
The next step is to deploy a Domain Controller VM and set up a new domain. All VMs provisioned during the EDW TRA deployment will join the domain managed by the domain controller. To do that, run the PowerShell script below.
|
||||
|
||||
```PowerShell
|
||||
.\DeployDC.ps1 -SubscriptionName "My Subscription" -Location "westus" -ExistingVNETResourceGroupName "ContosoVNetGroup" -ExistingVNETName "ContosoVNet" -DomainName "contosodomain.ms" -DomainUserName "edwadmin" -DomainUserPassword "Welcome1234!"
|
||||
```
|
||||
|
||||
The script above will provision an Azure VM and promote it to serve as the domain controller for the VNET. In addition, it will reconfigure the VNET to use the newly provisioned VM as its DNS server.
|
|
|
|||
# Configuring Power BI Services and Gateway to enable Interactive BI
|
|
|
|||
Steps:
|
||||
1. Stop all DataGen schedules.
2. Drop the AdventureWorks tables in all the physical data warehouses - both loader and reader (see the sketch after this list).
|
||||
3. Size your data warehouse
|
||||
4. Optionally override the initial setting for flip time
|
||||
5. Create fact and dimension tables in all the physical data warehouses - both loader and reader.
|
||||
6. Insert entries in the mapping DW-Table in the Job Manager SQL Database.
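
As an illustration of step 2 (all server names, credentials, and the table list are hypothetical; substitute your own loader and reader PDWs), the AdventureWorks sample tables could be dropped with Invoke-Sqlcmd:

```PowerShell
# Hypothetical sketch for step 2: drop the AdventureWorks sample tables on every
# physical data warehouse (loader and reader). Names below are placeholders only.
$pdwServers = @("bd044-pdw01-ldw01.database.windows.net",
                "bd044-pdw01-ldw02.database.windows.net")
$sampleTables = @("dbo.FactResellerSales", "dbo.FactSalesQuota", "dbo.DimReseller")

foreach ($server in $pdwServers) {
    foreach ($table in $sampleTables) {
        Invoke-Sqlcmd -ServerInstance $server -Database "dw" `
                      -Username "username" -Password "password" `
                      -Query "IF OBJECT_ID('$table') IS NOT NULL DROP TABLE $table;"
    }
}
```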
|
|
|
|||
# Configuring Data Ingestion
|
||||
This page provides the steps to configure data ingestion in the Enterprise Reporting and BI TRI.
|
||||
|
||||
Once the TRI is deployed, these are your two options to ingest your ETL-processed data into the system:
|
||||
1. Modify the code provided in the TRI to ingest your data
|
||||
2. Integrate your existing pipeline and store into the TRI
|
||||
|
||||
## Modify the Data Generator code
|
||||
|
||||
The TRI deploys a dedicated VM for data generation, with a PowerShell script placed in the VM. This script gets called by the Job Manager at a regular, configurable cadence. You can modify this script as follows:
|
||||
|
||||
1. **Install the VPN client:** This has multiple steps:
|
||||
- Confirm that your client machine has the two certificates installed for VPN connectivity to the VM (see [prerequisites](https://msdata.visualstudio.com/AlgorithmsAndDataScience/TRIEAD/_git/CIPatterns?_a=preview&path=%2Fdoc%2FPrerequisites.md)).
|
||||
- Login to http://portal.azure.com, and find the Resource Group that corresponds to the VNet setup. Pick the **Virtual Network** resource, and then the **Virtual Network Gateway** in that resource.
|
||||
- Click on **Point-to-site configuration**, and **Download the VPN client** to the client machine.
|
||||
- Install the 64-bit (Amd64) or 32-bit (x86) version based on your Windows operating system. The modal dialog that pops up after you launch the application may show up with a single **Don't run** button. Click on **More**, and choose **Run anyway**.
|
||||
- Finally, choose the relevant VPN connection from **Network & Internet Settings**. This should set you up for the next step.
|
||||
|
||||
2. **Get the IP address for data generator VM:** From the portal, open the resource group in which the TRI is deployed (this will be different than the VNET resource group), and look for a VM with the string 'dg' in its name.
|
||||
|
||||
**TODO - The Filter input does not search for substrings - so the user will have to provide the exact prefix of the name or scroll through the VMs. Suggest that we make this easier with a 'DataGenerator' in the string.**
|
||||
|
||||
Choose (i.e. click on) the VM, click on **Networking** tab for that specific VM, and find the private IP address that you can remote to.
|
||||
|
||||
3. **Connect to the VM**: Remote Desktop to the VM using the IP address with the admin account and password that you specified as part of the pre-deployment checklist.
|
||||
|
||||
**TODO - where can I find this information in the Azure Resource Group itself? This is relevant because the DevOps persona who deployed the TRI (using CIQS) may be different than the Developer who is trying to implement the data load (who knows nothing about CIQS).**
|
||||
|
||||
4. **Confirm that prerequisites are installed in the VM**: Install **AzCopy** if it is not already present in the VM (see [here](https://azure.microsoft.com/en-us/blog/azcopy-5-1-release/)). Confirm that GenData.exe is present.
|
||||
|
||||
5. **Modify the code as per your requirements and run it:** The PowerShell script ``GenerateAndUploadDataData.ps1`` is located in the VM at ``C:\EDW\datagen-artifacts``.
|
||||
|
||||
**TODO - Replace EDW\datagen-artifacts with C:\Enterprise_BI_and_Reporting_TRI\DataGenerator**
|
||||
|
||||
**TODO - Explain the OData configuration for any random client to use this.**
|
||||
|
||||
**TODO - Does the code below follow https://github.com/PoshCode/PowerShellPracticeAndStyle#table-of-contents It is not a P0 that it should, but customers would expect that from a Microsoft product.**
|
||||
|
||||
### TODO - CODE REVIEW COMMENTS BELOW
|
||||
1. Parameters - we may want to unify the terminology with the pre-deployment questionnaire. There are items like certificate thumbprint, AAD domain for control server authentication, AAD app URI, etc. We need to use the same terminology for these parameters that we use in the Pre-Deployment questionnaire; otherwise, the user will misconstrue these to be new prerequisites.
|
||||
2. We have to agree on one term - 'Control Server' or 'Job Manager'. Job Manager is plastered all over our architecture diagrams - so if that is what we want to call it, I will change the diagrams. Let us be consistent - **Tyler**, let me know.
3. $DATA_DIR - rename it to something generic, such as 'DataFileLocation'.
4. Is it GenData.exe or GetData.exe?
5. Rename $DATA_SLICE to $DATA_SLICE_IN_HOURS.
6. Why do we truncate the timestamp in $OUT_DATE_FORMAT? Why not use yyyy-MM-dd-HH:mm:ss?
7. $OUT_DIR to $OUTPUT_DIRECTORY.
8. $OUT_DATE_FORMAT to $OUTPUT_DATE_FORMAT.
9. $GEN_EXE to $DATAGEN_EXE.
10. I moved some initialization code to AFTER the functions - if the functions break because of positional dependency, those initialization labels should ideally be parameterized.
11. Need some more clarity on the "Post processing workflow" section on the filename formats.
|
||||
|
||||
```Powershell
|
||||
|
||||
# ------------------------ Parameters ---------------------------------------------------------
|
||||
Param(
|
||||
[parameter(ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true, Mandatory=$true, HelpMessage="Control Server Uri, example: http://localhost:33009")]
|
||||
[string]$ControlServerUri,
|
||||
[parameter(ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true, Mandatory=$true, HelpMessage="Certificate thumbprint for control server authentication.")]
|
||||
[string]$CertThumbprint,
|
||||
[parameter(ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true, Mandatory=$true, HelpMessage="AAD domain for control server authentication.")]
|
||||
[string]$AADTenantDomain,
|
||||
[parameter(ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true, Mandatory=$true, HelpMessage="Control server AAD app uri for control server authentication.")]
|
||||
[string]$ControlServerIdentifierUris,
|
||||
[parameter(ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true, Mandatory=$true, HelpMessage="AAD application to use for control server authentication.")]
|
||||
[string]$AADApplicationId
|
||||
)
|
||||
|
||||
# ------------------------- Functions ----------------------------------------------------------
|
||||
# Generate random data for AdventureWorks Data Warehouse
|
||||
|
||||
Function Generate-Data
|
||||
{
|
||||
$DATA_DIR='Adventure Works 2014 Warehouse Data' # Directory where data files are located
|
||||
$FILE_LIST='files.txt' # List of files to read to generate random data
|
||||
$GEN_EXE='bin\GetData.exe' # GenData.exe location
|
||||
$DATA_SLICE=3 # Data slice period in hours
|
||||
$DATE_FORMAT='yyyy-MM-dd H:mm:ss' # Date format to be used in data
|
||||
$OUT_DATE_FORMAT='yyyyMMddH' # Date format of the output directory
|
||||
$OUT_DIR='out' # Name of the output directory
|
||||
|
||||
echo "Running data generation" >> $LOG_FILE
|
||||
|
||||
# Get the current date, round it as ADF slices accept and calculate slice dates
|
||||
$curDate = Get-Date
|
||||
$roundDiffHour = ([int]($curDate.Hour/$DATA_SLICE))*$DATA_SLICE
|
||||
if($roundDiffHour -ne 0)
|
||||
{
|
||||
$roundDiffHour = $curDate.Hour-$roundDiffHour
|
||||
}
|
||||
else
|
||||
{
|
||||
$roundDiffHour = $curDate.Hour
|
||||
}
|
||||
$curDate = $curDate.AddHours(-$roundDiffHour)
|
||||
$prevDate = $curDate.AddHours(-$DATA_SLICE)
|
||||
$curDateStr = Get-Date $curDate -format $DATE_FORMAT
|
||||
$prevDateStr = Get-Date $prevDate -format $DATE_FORMAT
|
||||
|
||||
# Date to be used for output directory
|
||||
$curDateOutStr = Get-Date $curDate -format $OUT_DATE_FORMAT
|
||||
$prevDateOutStr = Get-Date $prevDate -format $OUT_DATE_FORMAT
|
||||
$outputDir="$OUT_DIR\$prevDateOutStr" + "_" + $curDateOutStr
|
||||
|
||||
# Create output directory
|
||||
New-Item $outputDir -type directory | Out-Null
|
||||
echo "Created dir $outputDir" >> $LOG_FILE
|
||||
|
||||
$jobHash = @{}
|
||||
|
||||
# Read the files list
|
||||
$files = Import-Csv $DATA_DIR\$FILE_LIST
|
||||
$processes = @()
|
||||
foreach ($file in $files)
|
||||
{
|
||||
# For each listed file generate random data
|
||||
$fileName = $file.FileName
|
||||
$sizeRequired = $file.SizeRequiredinMB
|
||||
|
||||
echo "Processing file $fileName" >> $LOG_FILE
|
||||
|
||||
$stdOutLog = [System.IO.Path]::GetTempFileName()
|
||||
$process = (`
|
||||
Start-Process `
|
||||
-FilePath "$directorypath\$GEN_EXE" `
|
||||
-ArgumentList "`"$directorypath\$DATA_DIR\$fileName`" `"$directorypath\$outputDir\$fileName`" `"$prevDateStr`" $sizeRequired" `
|
||||
-PassThru `
|
||||
-RedirectStandardOutput $stdOutLog)
|
||||
|
||||
$processes += @{Process=$process; LogFile=$stdOutLog}
|
||||
}
|
||||
# Aggregate the data into an output file
|
||||
foreach($process in $processes)
|
||||
{
|
||||
$process.Process.WaitForExit()
|
||||
Get-Content $process.LogFile | Out-File $LOG_FILE -Append
|
||||
Remove-Item $process.LogFile -ErrorAction Ignore
|
||||
}
|
||||
}
|
||||
|
||||
Function Archive-File
|
||||
(
|
||||
[Parameter(Mandatory=$true)]
|
||||
[string]$sourcefileToArchive,
|
||||
|
||||
[Parameter(Mandatory=$true)]
|
||||
[string]$archivalTargetFile
|
||||
)
|
||||
{
|
||||
# Move the processed data from source to the archival folder
|
||||
try
|
||||
{
|
||||
$sourceFolder = [io.path]::GetDirectoryName($sourcefileToArchive)
|
||||
$archivalTargetFolder = [io.path]::GetDirectoryName($archivalTargetFile)
|
||||
If (Test-Path $sourceFolder)
|
||||
{
|
||||
If(!(Test-Path $archivalTargetFolder))
|
||||
{
|
||||
New-Item -ItemType directory -Path $archivalTargetFolder
|
||||
}
|
||||
|
||||
Move-Item -Path $sourcefileToArchive -Destination $archivalTargetFile -Force
|
||||
}
|
||||
}
|
||||
catch
|
||||
{
|
||||
echo "Error moving $sourceFolder to archive file $archivalTargetFolder" >> $LOG_FILE
|
||||
echo $error >> $LOG_FILE
|
||||
}
|
||||
}
|
||||
|
||||
Function GetAccessTokenClientCertBased([string] $certThumbprint, [string] $tenant, [string] $resource, [string] $clientId)
|
||||
{
|
||||
[System.Reflection.Assembly]::LoadFile("$PSScriptRoot\Microsoft.IdentityModel.Clients.ActiveDirectory.dll") | Out-Null # adal
|
||||
[System.Reflection.Assembly]::LoadFile("$PSScriptRoot\Microsoft.IdentityModel.Clients.ActiveDirectory.Platform.dll") | Out-Null # adal
|
||||
|
||||
$cert = Get-childitem Cert:\LocalMachine\My | where {$_.Thumbprint -eq $certThumbprint}
|
||||
|
||||
$authContext = new-object Microsoft.IdentityModel.Clients.ActiveDirectory.AuthenticationContext("https://login.windows.net/$tenant")
|
||||
|
||||
$assertioncert = new-object Microsoft.IdentityModel.Clients.ActiveDirectory.ClientAssertionCertificate($clientId, $cert)
|
||||
$result = $authContext.AcquireToken($resource, $assertioncert)
|
||||
|
||||
$authHeader = @{
|
||||
'Content-Type' = 'application\json'
|
||||
'Authorization' = $result.CreateAuthorizationHeader()
|
||||
}
|
||||
|
||||
return $authHeader
|
||||
}
|
||||
|
||||
# ----------------------- Initialization ------------------------------------------------------
|
||||
$invocation = (Get-Variable MyInvocation).Value
|
||||
$directorypath = Split-Path $invocation.MyCommand.Path
|
||||
|
||||
# Create logs folder if it does not already exist
|
||||
New-Item -ItemType Directory -Force -Path "$directorypath\logs"
|
||||
|
||||
# Log file name
|
||||
$LOG_FILE="logs\createdwtableavailabilityranges.$(get-date -Format yyyy-MM-ddTHH.mm.ss)-log.txt"
|
||||
|
||||
# Obtain bearer authentication header
|
||||
$authenticationHeader = GetAccessTokenClientCertBased -certThumbprint $CertThumbprint `
|
||||
-tenant $AADTenantDomain `
|
||||
-resource $ControlServerIdentifierUris `
|
||||
-clientId $AADApplicationId
|
||||
|
||||
# Set the working directory
|
||||
$invocation = (Get-Variable MyInvocation).Value
|
||||
$directorypath = Split-Path $invocation.MyCommand.Path
|
||||
Set-Location $directorypath
|
||||
|
||||
# On-prem data file location
|
||||
$source = "\\localhost\generated_data"
|
||||
|
||||
# On-prem source file archival location
|
||||
$archivalSource = "\\localhost\archive"
|
||||
|
||||
# The blob container to upload the datasets to
|
||||
$currentStorageSASURI = ''
|
||||
|
||||
# AzCopy path. AzCopy must be installed. Update path if installed in non-default location
|
||||
$azCopyPath = "C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy\AzCopy.exe"
|
||||
|
||||
# Control server URI to fetch the storage details for uploading
|
||||
# Fetch only the current storage
|
||||
$getCurrentStorageAccountURI = $ControlServerUri + '/odata/StorageAccounts?$filter=IsCurrent%20eq%20true'
|
||||
|
||||
# DWTableAvailabilityRanges endpoint
|
||||
$dwTableAvailabilityRangesURI = $ControlServerUri + '/odata/DWTableAvailabilityRanges'
|
||||
|
||||
# Data contract for DWTableAvailabilityRanges' request body
|
||||
$dwTableAvailabilityRangeContract = @{
|
||||
DWTableName=""
|
||||
StorageAccountName=""
|
||||
ColumnDelimiter="|"
|
||||
FileUri=""
|
||||
StartDate=""
|
||||
EndDate=""
|
||||
}
|
||||
|
||||
# --------------------- Loading to Blob ------------------------------------------------------
|
||||
|
||||
# Generate random data for AdventureWorks DW
|
||||
Generate-Data
|
||||
|
||||
# Invoke the Control Server to fetch the latest blob container to upload the files to
|
||||
try
|
||||
{
|
||||
$response = Invoke-RestMethod -Uri $getCurrentStorageAccountURI -Method Get -Headers $authenticationHeader
|
||||
if($response -and $response.value -and $response.value.SASToken){
|
||||
$currentStorageSASURI = $response.value.SASToken
|
||||
$storageAccountName = $response.value.Name
|
||||
echo "Current storage location - $currentStorageSASURI" >> $LOG_FILE
|
||||
} else{
|
||||
$errMessage = "Could not find SAS token in the response from Control Server." + $response.ToString()
|
||||
echo $errMessage >> $LOG_FILE
|
||||
exit 1
|
||||
}
|
||||
}
|
||||
catch
|
||||
{
|
||||
echo "Error fetching current storage account from Control Server" >> $LOG_FILE
|
||||
echo $error >> $LOG_FILE
|
||||
exit 2
|
||||
}
|
||||
|
||||
# Create a custom AzCopy log file stamped with current timestamp
|
||||
# IMPORTANT: Creation of DWTableAvailabilityRanges entry depends on this step.
|
||||
$azCopyLogFileName = [io.path]::combine($Env:TEMP,
|
||||
-join("AzCopy-", $((get-date).ToUniversalTime()).ToString("yyyyMMddThhmmssZ"), '.log'))
|
||||
|
||||
If (Test-Path $azCopyLogFileName)
|
||||
{
|
||||
Remove-Item $azCopyLogFileName
|
||||
echo "Deleted existing AzCopy log file $azCopyLogFileName" >> $LOG_FILE
|
||||
}
|
||||
|
||||
# Create empty log file in the same location
|
||||
$azCopyLogFile = New-Item $azCopyLogFileName -ItemType file
|
||||
|
||||
# Execute AzCopy to upload files.
|
||||
echo "Begin uploading data files to storage location using azcopy log at $azCopyLogFile" >> $LOG_FILE
|
||||
|
||||
& "$azCopyPath" /source:""$source"" /Dest:""$currentStorageSASURI"" /S /Y /Z /V:""$azCopyLogFile""
|
||||
|
||||
echo "Completed uploading data files to storage location" >> $LOG_FILE
|
||||
|
||||
# ----------- Post-upload logic to set DWTableAvailabilityRanges -----------------------------
|
||||
#
|
||||
# Why do we need this: AzCopy outputs a log file. We have to read this log file to figure out
|
||||
# if upload succeeded or not (regardless of whether this is for a single file or not).
|
||||
# We let AzCopy optimize the upload by simply giving it the root share
|
||||
#
|
||||
# Sample AzCopy log content shown below:
|
||||
#
|
||||
#[2017/02/09 23:06:36.816+00:00][VERBOSE] Finished transfer: \\localhost\generated_data\2017020616_2017020619\FactCallCenter.csv => https://datadroptxjoynbi.blob.core.windows.net/data/2017020616_2017020619/FactCallCenter.csv
|
||||
#[2017/02/09 23:06:37.438+00:00][VERBOSE] Start transfer: \\localhost\generated_data\201702097_2017020910\FactSalesQuota.csv => https://datadroptxjoynbi.blob.core.windows.net/data/201702097_2017020910/FactSalesQuota.csv
|
||||
#[2017/02/09 23:06:43.623+00:00][VERBOSE] Finished transfer: \\localhost\generated_data\2017020613_2017020616\FactSalesQuota.csv => https://datadroptxjoynbi.blob.core.windows.net/data/2017020613_2017020616/FactSalesQuota.csv
|
||||
#[2017/02/09 23:06:46.078+00:00][VERBOSE] Start transfer: \\localhost\generated_data\201702097_2017020910\FactSurveyResponse.csv => https://datadroptxjoynbi.blob.core.windows.net/data/201702097_2017020910/FactSurveyResponse.csv
|
||||
#
|
||||
# Post-processing workflow
|
||||
# ------------------------
|
||||
# 1. Inspect AzCopy log file to find all files that have finished transfer successfully,
|
||||
# from ('Finished transfer:') entries.
|
||||
# 2. For each line in the file, extract the upload URI
|
||||
# 3. Construct all the required file segments for creating DWTableAvailabilityRanges
|
||||
# Last segment - File name of format - <dwTableName>.csv,
|
||||
# Last but one segment - Folder name of format - <startdate>_<enddate>
|
||||
# 4. Reformat/reconstruct the JSON body for the DWTableAvailabilityRanges using these values
|
||||
# 5. Once DWTableAvailabilityRanges is created, move the files out to an archival share
|
||||
# (where they can be further processed or deleted)
|
||||
# ---------------------------------------------------------------------------------------------
|
||||
|
||||
$transferredFiles = select-string -Path $azCopyLogFile -Pattern '\b.*Finished transfer:\s*([^\b]*)' -AllMatches | % { $_.Matches } | % { $_.Value }
|
||||
foreach($file in $transferredFiles)
|
||||
{
|
||||
echo "Begin publish to Control Server for - $file" >> $LOG_FILE
|
||||
|
||||
# Extract url
|
||||
$successFileUri = $file | %{ [Regex]::Matches($_, "(http[s]?|[s]?ftp[s]?)(:\/\/)([^\s,]+)") } | %{ $_.Value }
|
||||
|
||||
# URI of successfully uploaded blob
|
||||
$uri = New-Object -type System.Uri -argumentlist $successFileUri
|
||||
|
||||
# Extract segments - filename
|
||||
$fileNameSegment = $uri.Segments[$uri.Segments.Length-1].ToString()
|
||||
$dwTableName = [io.path]::GetFileNameWithoutExtension($fileNameSegment) # Assumes file name is table name.<format>
|
||||
|
||||
# Extract segments - start & end date
|
||||
$startEndDateSegment = $uri.Segments[$uri.Segments.Length-2].ToString() -replace ".$"
|
||||
|
||||
# **************************************************************************************
|
||||
# Date fields need special formatting.
|
||||
#
|
||||
# 1. Folder name has date of format yyyyMMddHH or yyyyMMddH
|
||||
# 2. Convert string to a valid DateTime based on #1
|
||||
# 3. Reconvert to OData supported DateTimeOffset format string using the
|
||||
# 's' and 'zzz' formatter options
|
||||
# **************************************************************************************
|
||||
# Start date
|
||||
$startDateStr = $startEndDateSegment.Split('_')[0]
|
||||
[datetime]$startDate = New-Object DateTime
|
||||
if(![DateTime]::TryParseExact($startDateStr, "yyyyMMddHH", [System.Globalization.CultureInfo]::InvariantCulture, [System.Globalization.DateTimeStyles]::AdjustToUniversal, [ref]$startDate))
|
||||
{
|
||||
[DateTime]::TryParseExact($startDateStr, "yyyyMMddH", [System.Globalization.CultureInfo]::InvariantCulture, [System.Globalization.DateTimeStyles]::AdjustToUniversal, [ref]$startDate)
|
||||
}
|
||||
$startDateFormatted = $startDate.ToString("s") + $startDate.ToString("zzz")
|
||||
|
||||
# End date
|
||||
$endDateStr = $startEndDateSegment.Split('_')[1]
|
||||
[datetime]$endDate = New-Object DateTime
|
||||
if(![DateTime]::TryParseExact($endDateStr, "yyyyMMddHH", [System.Globalization.CultureInfo]::InvariantCulture, [System.Globalization.DateTimeStyles]::AdjustToUniversal, [ref]$endDate)){
|
||||
[DateTime]::TryParseExact($endDateStr, "yyyyMMddH", [System.Globalization.CultureInfo]::InvariantCulture, [System.Globalization.DateTimeStyles]::AdjustToUniversal, [ref]$endDate)
|
||||
}
|
||||
$endDateFormatted = $endDate.ToString("s") + $endDate.ToString("zzz")
|
||||
|
||||
#Construct DWTableAvailabilityRange request body
|
||||
$dwTableAvailabilityRangeContract['DWTableName'] = $dwTableName
|
||||
$dwTableAvailabilityRangeContract['FileUri'] = $successFileUri.ToString()
|
||||
$dwTableAvailabilityRangeContract['StorageAccountName'] = $storageAccountName
|
||||
$dwTableAvailabilityRangeContract['StartDate'] = $startDateFormatted
|
||||
$dwTableAvailabilityRangeContract['EndDate'] = $endDateFormatted
|
||||
|
||||
$dwTableAvailabilityRangeJSONBody = $dwTableAvailabilityRangeContract | ConvertTo-Json
|
||||
|
||||
# Create DWTableAvailabilityRanges entry for the current file
|
||||
try
|
||||
{
|
||||
echo "Begin DWTableAvailabilityRanges creation for file - $fileNameSegment with body $dwTableAvailabilityRangeJSONBody" >> $LOG_FILE
|
||||
$response = Invoke-RestMethod $dwTableAvailabilityRangesURI -Method Post -Body $dwTableAvailabilityRangeJSONBody -ContentType 'application/json' -Headers $authenticationHeader
|
||||
echo "DWTableAvailabilityRanges creation successful" >> $LOG_FILE
|
||||
}
|
||||
catch
|
||||
{
|
||||
echo "Error creating DWTableAvailabilityRanges on Control Server" >> $LOG_FILE
|
||||
echo $error >> $LOG_FILE
|
||||
exit 1
|
||||
}
|
||||
|
||||
$sourcefileToArchive = [io.path]::Combine($source, $startEndDateSegment, $fileNameSegment)
|
||||
$archivalTargetFolder = [io.path]::Combine($archivalSource, $startEndDateSegment)
|
||||
$archivalTargetFile = [io.path]::Combine($archivalTargetFolder, $fileNameSegment)
|
||||
Archive-File `
|
||||
-sourcefileToArchive $sourcefileToArchive `
|
||||
-archivalTargetFile $archivalTargetFile
|
||||
|
||||
echo "Completed publish to Control Server for - $file" >> $LOG_FILE
|
||||
|
||||
}
|
||||
|
||||
echo "Completed publish to Control Server for all files" >> $LOG_FILE
|
||||
|
||||
# Cleanup: Remove any empty folders from the source share
|
||||
Get-ChildItem -Path $source -Recurse `
|
||||
| Where { $($_.Attributes) -match "Directory" -and $_.GetFiles().Count -eq 0 } `
|
||||
| Foreach { Remove-Item $_.FullName -Recurse -Force }
|
||||
|
||||
# Cleanup: Delete the AzCopy log file if the processing went through successfully.
|
||||
# IMPORTANT: Undeleted log files will therefore indicate an issue in post-processing,
|
||||
# so they can be reprocessed if need be.
|
||||
Remove-Item $azCopyLogFileName
|
||||
|
||||
exit 0
|
||||
```
|
||||
|
||||
## Data file nomenclature
|
||||
The data files that are generated in Azure blobs for loading into the SQL DW should use the following recommended file names:
|
||||
Data file: `<startdatetime>-<enddatetime>-<schema>.<tablename>.data.orc`
|
||||
Audit file: `<startdatetime>-<enddatetime>-<schema>.<tablename>.data.audit.json`
|
||||
|
||||
This naming provides sufficient information to determine the intent of a file should it appear outside of the expected system paths. The audit file contains the row count, start/end date, file size, and checksum of its data file. Audit files must always appear next to their data files in the same working directory; orphaned data or audit files should not be loaded.
|
||||
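As an illustration of this convention, the sketch below constructs the data and audit file names for one time slice and writes the accompanying audit JSON. The slice boundaries, schema, table name, audit field names, and checksum algorithm are illustrative assumptions, not values mandated by the TRI.

```powershell
# Illustrative only: slice boundaries, schema and table name are hypothetical values.
$startDateTime = "2017090618"                  # <startdatetime>
$endDateTime   = "2017090621"                  # <enddatetime>
$schema        = "dbo"                         # <schema>
$tableName     = "FactSalesQuota"              # <tablename>

$dataFile  = "$startDateTime-$endDateTime-$schema.$tableName.data.orc"
$auditFile = "$startDateTime-$endDateTime-$schema.$tableName.data.audit.json"

# Audit payload: row count, start/end date, file size and checksum of the data file.
# Assumes the data file has already been generated in the current folder.
$audit = @{
    RowCount  = 12345                                         # value supplied by the data generator (assumption)
    StartDate = $startDateTime
    EndDate   = $endDateTime
    FileSize  = (Get-Item $dataFile).Length
    Checksum  = (Get-FileHash $dataFile -Algorithm MD5).Hash  # checksum algorithm is an assumption
}

# Write the audit file next to its data file, as required above.
$audit | ConvertTo-Json | Out-File $auditFile
```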
|
|
@ -0,0 +1,225 @@
|
|||
# Analysis Services for Interactive BI
|
||||
|
||||
The TRI helps you operationalize and manage tabular models in Analysis Services for interactive BI. The read-only AS servers are configured to handle interactive BI query load from client connections via a front-end load balancer. Analysis Services, tabular models, and their characteristics are explained [here](https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models/tabular-models-ssas).
|
||||
|
||||
The SSAS Model Cache in this TRI consists of six components, with their roles described in the [architectural overview](../CiqsInstaller/CiqsInstaller/core/HomePage.md):
|
||||
- Tabular Models
|
||||
- SSAS Partition Builder
|
||||
- SSAS Read Only Cache servers
|
||||
- Job Manager that coordinates the tabular model refresh
|
||||
- Azure Blob that stores the tabular models for refresh
|
||||
- Load balancers that handle client connections
|
||||
|
||||
![SSAS Tabular Models Cache](../img/SSAS-Model-Cache.png)
|
||||
|
||||
## Tabular Model Creation
|
||||
|
||||
Tabular models are Analysis Services databases that run in-memory or act as a pass-through for backend data sources. They support cached summaries and drilldowns of large amounts of data, thanks to columnar storage that offers 10x or more data compression. This makes them ideal for interactive BI applications. See [this article](https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models/tabular-models-ssas) for more details. Typically, tabular models hold only a subset of the big data held in the upstream data warehouse or data marts, in terms of both the number of entities and data size. There is a large corpus of best-practices information for tabular model design and tuning, including this [excellent article](https://msdn.microsoft.com/en-us/library/dn751533.aspx) on the lifecycle of an enterprise-grade tabular model.
|
||||
|
||||
You can use tools such as the [SSDT tabular model designer](https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models/tabular-model-designer-ssas) available with Visual Studio 2015 (or greater) to create your tabular models. Set the compatibility level of the tabular models to 1200 or higher (the latest is 1400 as of this writing) and the query mode to In-Memory. See [here](https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models/tabular-model-solutions-ssas-tabular) for details on tabular model creation.
|
||||
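If you want to sanity-check a deployed model against these guidelines, the hedged sketch below uses the Tabular Object Model (Microsoft.AnalysisServices.Tabular) to report the compatibility level and default query mode, and to confirm that each table has a same-named ("template") partition as expected by the default processing strategy described later. The assembly path, server name, and database name are assumptions for illustration.

```powershell
# Load the Tabular Object Model; the DLL path is environment-specific (assumption).
Add-Type -Path "C:\Program Files\Microsoft SQL Server\140\SDK\Assemblies\Microsoft.AnalysisServices.Tabular.dll"

$server = New-Object Microsoft.AnalysisServices.Tabular.Server
$server.Connect("ssaspbvm00")                                  # example server name

# Pick the model database to inspect (example name).
$db = $server.Databases | Where-Object { $_.Name -eq "AdventureWorks" }

"CompatibilityLevel : $($db.CompatibilityLevel)"               # expect 1200 or higher
"DefaultMode        : $($db.Model.DefaultMode)"                # expect Import (in-memory)

# Confirm every table has a partition named after the table itself.
foreach ($table in $db.Model.Tables)
{
    $template = $table.Partitions | Where-Object { $_.Name -eq $table.Name }
    if (-not $template)
    {
        Write-Warning "Table $($table.Name) has no same-named template partition"
    }
}

$server.Disconnect()
```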
|
||||
## Tabular Model Partition Processing
|
||||
|
||||
Partition creation is automated using the open source [AsPartitionProcessing tool](https://github.com/Microsoft/Analysis-Services/tree/master/AsPartitionProcessing). Many of the configurable options for partition building directly correspond to the configuration of this tool. Refer to AsPartitionProcessing tool's [whitepaper](https://github.com/Microsoft/Analysis-Services/blob/master/AsPartitionProcessing/Automated%20Partition%20Management%20for%20Analysis%20Services%20Tabular%20Models.pdf) for further documentation.
|
||||
|
||||
## Tabular model configuration for continuous incremental refresh at scale
|
||||
The various orchestration components of the TRI refer to four configuration tables to enable continuous and incremental model refresh.
|
||||
|
||||
You can provide configuration inputs for two of these tables:
|
||||
|
||||
| TableName | Description |
|
||||
|:----------|:------------|
|
||||
|**TabularModel**|Lists tabular models with their server and database names, to be used by the daemon on the Partition Builder servers to connect to the SSAS server and the database for refresh.|
|
||||
|**TabularModelTablePartitions**|This table specifies which model a tabular model table is part of, and the source (DW) table to which it is bound. It also provides the column used to refresh the tabular model, the lower and upper bounds of the data held in the model, and the strategy for processing the SSAS partitions.|
|
||||
|
||||
### TabularModel
|
||||
Provide the unique <_server, database_> pairs in the table. This information uniquely identifies each tabular model for the Job Manager, and in turn, will be used by the daemon on the Partition Builder nodes to connect to the SSAS server and the database before refresh.
|
||||
|
||||
_Example_:
|
||||
|
||||
```json
|
||||
{
|
||||
"AnalysisServicesServer":"ssaspbvm00",
|
||||
"AnalysisServicesDatabase":"AdventureWorks",
|
||||
"IntegratedAuth":true,
|
||||
"MaxParallelism":4,
|
||||
"CommitTimeout":-1
|
||||
}
|
||||
```
|
||||
|
||||
* **AnalysisServicesServer** : SSAS VM name or Azure AS URL.
|
||||
* **AnalysisServicesDatabase** : Name of the database.
|
||||
* **IntegratedAuth** : Boolean flag indicating whether the connection to the DW is made using integrated authentication or SQL authentication.
|
||||
* **MaxParallelism** : Maximum number of threads on which to run processing commands in parallel during partition building.
|
||||
* **CommitTimeout** : Cancels processing (after the specified time in seconds) if write locks cannot be obtained; -1 uses the server default.
|
||||
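How these configuration rows are created depends on your deployment. If the Control Server exposes this table through the same OData pattern used for DWTableAvailabilityRanges in the data loading script, an entry could be registered with a call like the hedged sketch below; the `/odata/TabularModels` endpoint name is an assumption based on that pattern, and the bearer header is obtained the same way as in the script.

```powershell
# Assumes $ControlServerUri and a certificate-based bearer header ($authenticationHeader),
# obtained as in the data loading script (GetAccessTokenClientCertBased).
# The endpoint name below is hypothetical.
$tabularModelUri = $ControlServerUri + '/odata/TabularModels'

$tabularModel = @{
    AnalysisServicesServer   = "ssaspbvm00"
    AnalysisServicesDatabase = "AdventureWorks"
    IntegratedAuth           = $true
    MaxParallelism           = 4
    CommitTimeout            = -1
} | ConvertTo-Json

Invoke-RestMethod -Uri $tabularModelUri -Method Post -Body $tabularModel `
                  -ContentType 'application/json' -Headers $authenticationHeader
```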
|
||||
### TabularModelTablePartitions
|
||||
|
||||
_Example_:
|
||||
```json
|
||||
{
|
||||
"AnalysisServicesTable":"FactResellerSales",
|
||||
"SourceTableName":"[dbo].[FactResellerSales]",
|
||||
"SourcePartitionColumn":"OrderDate",
|
||||
"TabularModel_FK":1,
|
||||
"DWTable_FK":"dbo.FactResellerSales",
|
||||
"DefaultPartition":"FactResellerSales",
|
||||
"ProcessStrategy":"ProcessDefaultPartition",
|
||||
"MaxDate":"2156-01-01T00:00:00Z",
|
||||
"LowerBoundary":"2010-01-01T00:00:00Z",
|
||||
"UpperBoundary":"2011-12-31T23:59:59Z",
|
||||
"Granularity":"Daily",
|
||||
"NumberOfPartitionsFull":0,
|
||||
"NumberOfPartitionsForIncrementalProcess":0
|
||||
}
|
||||
```
|
||||
|
||||
* **AnalysisServicesTable** : The table to be partitioned.
|
||||
* **SourceTableName** : The source table in the DW database.
|
||||
* **SourcePartitionColumn** : The source column of the source table.
|
||||
* **TabularModel_FK** : Foreign Key reference to the TabularModel.
|
||||
* **DWTable_FK** : Foreign Key reference to the DWTable.
|
||||
* **DefaultPartition** : Name of the default ("template") partition that data is loaded into when _ProcessStrategy_ is _ProcessDefaultPartition_. If not specified, a partition with the same name as the table is assumed.
|
||||
* **MaxDate** : The maximum date that needs to be accounted for in the partitioning configuration.
|
||||
* **LowerBoundary** : The lower boundary of the partition date range.
|
||||
* **UpperBoundary** : The upper boundary of the partition date range.
|
||||
* **ProcessStrategy** : Strategy used for processing the partition; "RollingWindow" or "ProcessDefaultPartition". The default partition can be specified using the "DefaultPartition" property; otherwise, a "template" partition with the same name as the table is assumed to be present.
|
||||
* **Granularity** : Partition granularity of "Daily", "Monthly", or "Yearly".
|
||||
* **NumberOfPartitionsFull** : Count of all partitions in the rolling window. For example, a rolling window of 10 years partitioned by month would require 120 partitions.
|
||||
* **NumberOfPartitionsForIncrementalProcess** : Count of hot partitions where the data can change. For example, it may be necessary to refresh the most recent 3 months of data every day. This only applies to the most recent partitions.
|
||||
|
||||
|
||||
Provide one of the following values for the partitioning strategy in the _ProcessStrategy_ column:
|
||||
|
||||
- _ModelProcessStrategy.ProcessDefaultPartition_ (Default)
|
||||
- _ModelProcessStrategy.RollingWindow_
|
||||
|
||||
If you choose _ModelProcessStrategy.ProcessDefaultPartition_:
|
||||
|
||||
- Confirm that the tabular model contains a partition with the same name as the tabular model table. This partition is always used for the data load of the date slice, even if there are other partitions in the table.
|
||||
- Provide values for _SourceTableName_ and _SourcePartitionColumn_.
|
||||
- Provide values for _LowerBoundary, UpperBoundary_ to provide the start and end time span for the data in the tabular model.
|
||||
|
||||
If you choose _ModelProcessStrategy.RollingWindow_:
|
||||
- Confirm that the table partitions are defined on time-granularity-based ranges: Daily, Monthly, or Yearly.
|
||||
- Provide values for the columns _MaxDate_, _NumberOfPartitionsFull_ and _NumberOfPartitionsForIncrementalProcess_.
|
||||
|
||||
In both cases, confirm that the value provided in _SourcePartitionColumn_ represents a column of type DateTime in the source DW table. Tabular model refresh is incremental over time, for a specific date range.
|
||||
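To make the relationship between _Granularity_, _NumberOfPartitionsFull_ and _NumberOfPartitionsForIncrementalProcess_ concrete, the small sketch below reproduces the arithmetic of the example above (a 10-year monthly rolling window) and lists the most recent "hot" partition ranges that would be reprocessed incrementally. This is illustrative arithmetic only, not the AsPartitionProcessing tool's actual partition-naming logic, and the values are examples.

```powershell
# Illustrative arithmetic for a monthly rolling window (all values are examples).
$granularity                   = "Monthly"
$yearsInWindow                 = 10
$numberOfPartitionsFull        = $yearsInWindow * 12     # 120 partitions, as in the example above
$numberOfPartitionsIncremental = 3                       # refresh the most recent 3 months every day

# Enumerate the month boundaries of the hot (incrementally processed) partitions, newest first.
$currentMonth = Get-Date -Day 1 -Hour 0 -Minute 0 -Second 0
for ($i = 0; $i -lt $numberOfPartitionsIncremental; $i++)
{
    $start = $currentMonth.AddMonths(-$i)
    $end   = $start.AddMonths(1).AddSeconds(-1)
    "{0:yyyy-MM-dd} .. {1:yyyy-MM-dd}  (incremental)" -f $start, $end
}

"Granularity: $granularity; total partitions in the full window: $numberOfPartitionsFull"
```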
|
||||
Next, the following two tables are **read-only**. You should **not** change or update these tables or their values, but you can view them for troubleshooting and to understand how the model refresh happens.
|
||||
|
||||
| TableName | Description |
|
||||
|:----------|:------------|
|
||||
|**TabularModelPartitionStates**|In this table, the Job Manager tracks the source and target context for all data slices that are to be refreshed or processed in a tabular model, the start and end dates of each data slice, and the Blob URI where the processed tabular model backups will be stored.|
|
||||
|**TabularModelNodeAssignments**|In this table, the partition builder tracks the refresh state of each AS Read-Only node for each tabular model. It indicates the maximum date for an entity for which the data has been processed. Each of the SSAS Read-Only nodes provides its current state here, in terms of the latest data by date for every entity.|
|
||||
|
||||
### TabularModelPartitionStates
|
||||
|
||||
This table helps track all the data slices that are to be refreshed or processed in a tabular model. Each piece of incremental data loaded into the DW is specified with a start and end date, defining the data slice. Each new data slice will trigger a new partition to be built.
|
||||
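Because this table is read-only, the most common interaction with it is inspection while troubleshooting. Assuming the Control Server exposes it through the same `/odata/<EntitySet>` pattern used by the other endpoints in this TRI (the endpoint name below is an assumption), the queued work items could be listed like so:

```powershell
# Hypothetical endpoint, following the /odata/<EntitySet> pattern used elsewhere in the TRI.
# $ControlServerUri and $authenticationHeader are obtained as in the data loading script.
$partitionStatesUri = $ControlServerUri + '/odata/TabularModelPartitionStates?$filter=ProcessStatus%20eq%20%27Queued%27'

$response = Invoke-RestMethod -Uri $partitionStatesUri -Method Get -Headers $authenticationHeader
$response.value | Select-Object TabularModelTablePartition_FK, StartDate, EndDate, ProcessStatus
```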
|
||||
---
|
||||
_Example_:
|
||||
```json
|
||||
{
|
||||
"ProcessStatus":"Purged",
|
||||
"TabularModelTablePartition_FK":4,
|
||||
"StartDate":"2017-09-12T18:00:00Z",
|
||||
"EndDate":"2017-09-12T21:00:00Z",
|
||||
"PartitionUri":"https://edw.blob.core.windows.net/data/AdventureWorks-backup-20170912T090408Z.abf",
|
||||
"ArchiveUri":"https://edw.blob.core.windows.net/data",
|
||||
"SourceContext":"{\"Name\":\"AzureDW\",\"Description\":\"Data source connection Azure SQL DW\",\"DataSource\":\"bd044-pdw01-ldw01.database.windows.net\",\"InitialCatalog\":\"dw\",\"ConnectionUserName\":\"username\",\"ConnectionUserPassword\":\"password\",\"ImpersonationMode\":\"ImpersonateServiceAccount\"}",
|
||||
"TargetContext":"{\"ModelConfigurationID\":1,\"AnalysisServicesServer\":\"ssaspbvm00\",\"AnalysisServicesDatabase\":\"AdventureWorks\",\"ProcessStrategy\":1,\"IntegratedAuth\":true,\"MaxParallelism\":4,\"CommitTimeout\":-1,\"InitialSetup\":false,\"IncrementalOnline\":true,\"TableConfigurations\":[{\"TableConfigurationID\":1,\"AnalysisServicesTable\":\"FactSalesQuota\",\"PartitioningConfigurations\":[{\"DWTable\":null,\"AnalysisServicesTable\":\"FactSalesQuota\",\"Granularity\":0,\"NumberOfPartitionsFull\":0,\"NumberOfPartitionsForIncrementalProcess\":0,\"MaxDate\":\"2156-01-01T00:00:00\",\"LowerBoundary\":\"2017-09-12T18:00:00\",\"UpperBoundary\":\"2017-09-12T21:00:00\",\"SourceTableName\":\"[dbo].[FactSalesQuota]\",\"SourcePartitionColumn\":\"Date\",\"TabularModel_FK\":1,\"DWTable_FK\":\"dbo.FactSalesQuota\",\"DefaultPartition\":\"FactSalesQuota\",\"Id\":4,\"CreationDate\":\"2017-09-12T17:17:28.8225494\",\"CreatedBy\":null,\"LastUpdatedDate\":\"2017-09-12T17:17:28.8225494\",\"LastUpdatedBy\":null}],\"DefaultPartitionName\":\"FactSalesQuota\"}]}"
|
||||
}
|
||||
```
|
||||
* **ProcessStatus**: Current status of the partition being processed. _Queued_, _Dequeued_, _Ready_, or _Purged_.
|
||||
* **TabularModelTablePartition_FK**: Foreign key referencing the TabularModelTablePartition.
|
||||
* **StartDate**: The start date of the current refresh data slice.
|
||||
* **EndDate**: The end date of the current refresh data slice.
|
||||
* **PartitionUri**: URI of the resulting partitioned backup data file.
|
||||
* **ArchiveUri**: URI of the location where the partitioned backup data file is placed.
|
||||
* **SourceContext**: JSON Object specifying the source DW connection information.
|
||||
* **TargetContext**: JSON object that maps to the AsPartitionProcessing client's "ModelConfiguration" contract.
|
||||
|
||||
Each row of this table represents a partition that should be built and the configuration that will be used to execute the partition builder client. Various components on the Control Server can create a “work item” for the partition builder, which uses the information in these attributes to process that data slice. As the fields indicate, every work item exists in the context of a _TabularModelTablePartition_ entity.
|
||||
|
||||
Each entity to be partitioned defines a StartDate and EndDate for the data slice to be processed. Note that this date range is set by the producer of the entity. In a typical case, it represents the date range for which a tabular model needs to be refreshed, which is simply the date range of the data slice in a _DWTableAvailabilityRange_ entity.
|
||||
|
||||
#### Source context - DataSourceInfo sample contract object
|
||||
The source context indicates which data source to connect to in order to fetch the data for the slice (that is, all the information required to connect to a DW table). It contains serialized data source (DW) connection information that the tabular model uses to set or update its connection string dynamically. This data contract maps directly to the partition processing client’s DataSourceInfo contract.
|
||||
|
||||
```json
|
||||
{
|
||||
"Name":"AzureDW",
|
||||
"Description":"Data source connection Azure SQL DW",
|
||||
"DataSource":"pdw01-ldw01.database.windows.net",
|
||||
"InitialCatalog":"dw",
|
||||
"ConnectionUserName":"username",
|
||||
"ConnectionUserPassword":"password",
|
||||
"ImpersonationMode":"ImpersonateServiceAccount"
|
||||
}
|
||||
```
|
||||
#### Target context - ModelConfiguration sample contract object
|
||||
The TargetContext is a serialized representation of the ModelConfiguration contract that the partition processing client expects. It is a simplified representation of the tables and partitions that are to be processed by the client. A sample contract object is shown below:
|
||||
|
||||
```json
|
||||
{
|
||||
"ModelConfigurationID":1,
|
||||
"AnalysisServicesServer":"ssaspbvm00",
|
||||
"AnalysisServicesDatabase":"AdventureWorks",
|
||||
"ProcessStrategy":1,
|
||||
"IntegratedAuth":true,
|
||||
"MaxParallelism":4,
|
||||
"CommitTimeout":-1,
|
||||
"InitialSetup":false,
|
||||
"IncrementalOnline":true,
|
||||
"TableConfigurations":[
|
||||
{
|
||||
"TableConfigurationID":1,
|
||||
"AnalysisServicesTable":"FactSalesQuota",
|
||||
"PartitioningConfigurations":[
|
||||
{
|
||||
"DWTable":null,
|
||||
"AnalysisServicesTable":"FactSalesQuota",
|
||||
"Granularity":0,
|
||||
"NumberOfPartitionsFull":0,
|
||||
"NumberOfPartitionsForIncrementalProcess":0,
|
||||
"MaxDate":"2156-01-01T00:00:00",
|
||||
"LowerBoundary":"2017-09-12T18:00:00",
|
||||
"UpperBoundary":"2017-09-12T21:00:00",
|
||||
"SourceTableName":"[dbo].[FactSalesQuota]",
|
||||
"SourcePartitionColumn":"Date",
|
||||
"TabularModel_FK":1,
|
||||
"DWTable_FK":"dbo.FactSalesQuota",
|
||||
"DefaultPartition":"FactSalesQuota"
|
||||
}],
|
||||
"DefaultPartitionName":"FactSalesQuota"
|
||||
}]
|
||||
}
|
||||
```
|
||||
### TabularModelNodeAssignments
|
||||
This table contains entities that represent the refresh state of each supported tabular model table. The Analysis Services Read-Only nodes use this table to determine which backup file from the _TabularModelPartitionState_ entity to restore on each node. The Partition Builder node logs an entry that points to the maximum date ceiling for which data has been refreshed, on a per-table basis.
|
||||
|
||||
_Example_:
|
||||
```json
|
||||
{
|
||||
"Name":"ssaspbvm00",
|
||||
"Type":"ASPB",
|
||||
"TabularModelTablePartition_FK":1,
|
||||
"State":"Building",
|
||||
"LatestPartitionDate":"2017-09-14T18:00:00Z"
|
||||
}
|
||||
```
|
||||
* **Name**: The name of the virtual machine node.
|
||||
* **Type**: The type of the virtual machine node.
|
||||
* _ASRO_: SSAS Read-only
|
||||
* _ASPB_: SSAS Partition Builder
|
||||
* **TabularModelTablePartition_FK**: Foreign key referencing the _TabularModelTablePartition_ table.
|
||||
* **State**: Current state of the node.
|
||||
* _Normal_
|
||||
* _Transition_
|
||||
* _Building_
|
||||
* **LatestPartitionDate**: Latest partition build date for the node.
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -0,0 +1,73 @@
|
|||
# Configuring Reporting Services
|
||||
**SQL Server Reporting Services (SSRS)** is one of the Microsoft SQL Server services, alongside SSAS and SSIS. It is a server-based report generating system that can create, deploy, and manage traditional and mobile-ready paginated reports via a modern web portal.
|
||||
|
||||
> Read more about [SQL Server Reporting Service](https://en.wikipedia.org/wiki/SQL_Server_Reporting_Services) and find [documentation](https://docs.microsoft.com/en-us/sql/reporting-services/create-deploy-and-manage-mobile-and-paginated-reports) here.
|
||||
|
||||
# Table Of Contents
|
||||
1. [Connect to SSRS Web Portal](#connect-to-ssrs-web-portal)
|
||||
2. [Subscribing to Reports via Email](#subscribing-to-reports-via-email)
|
||||
|
||||
|
||||
### Connect to SSRS Web Portal
|
||||
Deploying the solution provisions two SSRS virtual machines fronted by an [Azure Load Balancer](https://azure.microsoft.com/en-us/services/load-balancer/) for high availability and performance. Follow these steps to connect to the SSRS admin web portal.
|
||||
1. Obtain the **SSRS** load balancer URL from the deployment summary page.
|
||||
- For instance `http://<unique_name_prefix>ssrslb.ciqsedw.ms/reports`.
|
||||
![ssrs-url](./reportingserver_assets/ssrs-url.png)
|
||||
2. Point your web browser to the SSRS load balancer URL.
|
||||
3. Enter the admin credentials on the prompt.
|
||||
- The username **MUST** be a user that can authenticate against the SQL Server, in the format **domain\username**. For instance `ciqsedw\edwadmin`.
|
||||
- Password is the SSRS admin password.
|
||||
![authentication](./reportingserver_assets/authentication.png)
|
||||
4. If everything is configured correctly, you are now authenticated and can access the reports and data sources.
|
||||
![Home](./reportingserver_assets/ssrs-home.png)
|
||||
|
||||
|
||||
### Subscribing to Reports via Email
|
||||
This step isn't automated by the solution; however, users can manually configure email subscriptions in a few steps. The following requirements must be met:
|
||||
- Create a SendGrid SMTP username and password on [Azure](https://portal.azure.com).
|
||||
- Enter the created credentials on the **SSRS servers** to enable email delivery.
|
||||
|
||||
|
||||
##### 1. Create SendGrid SMTP credentials on Azure
|
||||
1. Go to the [Azure portal](https://portal.azure.com).
|
||||
2. Search for **SendGrid Email Delivery** in the marketplace.
|
||||
3. Create a new SendGrid account
|
||||
![SendGrid Account](./reportingserver_assets/sendgrid-smtp.png)
|
||||
4. Find the SendGrid account created under your subscription.
|
||||
5. Go to **All settings** -> **Configurations** and note the following:
|
||||
- Username
|
||||
- SMTP Server address
|
||||
- Password
|
||||
![Configuration Parameters](./reportingserver_assets/sendgrid-config.png)
|
||||
|
||||
> **NOTE:** The password is the same one specified when the SendGrid account was created.
|
||||
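Before wiring the credentials into Reporting Services, you can optionally verify them from one of the SSRS servers. The hedged sketch below checks SMTP reachability and sends a test message through SendGrid; the port and the sender/recipient addresses are assumptions for illustration, and `Send-MailMessage` (though deprecated) is sufficient for a one-off test.

```powershell
# Values below are illustrative; replace with your SendGrid credentials and addresses.
$smtpServer = "smtp.sendgrid.net"
$smtpPort   = 587                                   # typical SMTP submission port (assumption)

# 1. Confirm the SSRS server can reach SendGrid.
Test-NetConnection -ComputerName $smtpServer -Port $smtpPort

# 2. Send a test mail using the SendGrid username and password created above.
$credential = Get-Credential -Message "SendGrid SMTP username and password"
Send-MailMessage -SmtpServer $smtpServer -Port $smtpPort -UseSsl `
                 -Credential $credential `
                 -From "reports@contoso.com" -To "admin@contoso.com" `
                 -Subject "SSRS SMTP test" -Body "SendGrid SMTP relay is working."
```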
|
||||
##### 2. Enter SendGrid credentials into Reporting Server
|
||||
1. Log in remotely to both of your **SSRS servers**.
|
||||
2. Open the **Reporting Services Configuration Manager**
|
||||
3. Connect to the server instance.
|
||||
![SSRS Instance](./reportingserver_assets/ssrs-instance.png)
|
||||
4. In the left pane, click **E-mail Settings** and fill out the following:
|
||||
- Sender email address
|
||||
- SMTP Server (smtp.sendgrid.net)
|
||||
- Username
|
||||
- Password/Confirm Password.
|
||||
![SSRS Information](./reportingserver_assets/ssrs-email.png)
|
||||
5. Click Apply.
|
||||
|
||||
> **Note:** There is no need to restart the Reporting Services service; it picks up the most recent configuration.
|
||||
|
||||
##### 3. Subscribe to email report delivery on the Reporting Services web portal
|
||||
1. Right click on any paginated report you want to subscribe to.
|
||||
2. Click on **Subscribe**
|
||||
![Subscribe](./reportingserver_assets/subscribe-1.png)
|
||||
3. When the page loads, make sure the following options are set correctly:
|
||||
- The **Owner** field points to a user that can query the SQL Server.
|
||||
- Select **Destination (Deliver the report to:)** as Email.
|
||||
- Create a schedule for report delivery
|
||||
- Fill out the **Delivery options (E-mail)** fields
|
||||
- Click on **Create subscription**
|
||||
![Create Subscriptions](./reportingserver_assets/subscribe-2.png)
|
||||
4. If the subscription was successful, the page reloads to the home page.
|
||||
![Home Page](./reportingserver_assets/ssrs-home.png)
|
||||
5. Find your existing subscriptions, per report, by clicking the gear icon at the top right-hand side of the page and selecting **My Subscriptions**.
|