Providing a text version of the readme.

KayUnkroth 2017-08-02 14:13:19 -07:00
Parent 5e89a21551
Commit f8f55a7a7f
2 changed files: 22 additions and 0 deletions

Binary
UsqlScripts/readme.rtf

Binary file not shown.

22
UsqlScripts/readme.txt Normal file

@@ -0,0 +1,22 @@
U-SQL Scripts for Processing a TPC-DS Data Set
The U-SQL scripts for processing a TPC-DS data set demonstrate how to use Azure Data Lake Analytics to prepare raw data for import into an Azure Analysis Services data model. For a detailed discussion, see the blog article “Using Azure Analysis Services on Top of Azure Data Lake Storage” on the Analysis Services Team Blog.
To use these scripts, the TPC-DS data set must be generated by using the dsdgen tool, which can be downloaded as source code from the TPC-DS web site. Run the dsdgen tool with /PARALLEL 100 and /CHILD ids ranging from 1 to 100 to generate the source files with the expected file naming conventions. Then place the source files in an Azure Blob Storage account, as discussed in “Building an Azure Analysis Services Model on Top of Azure Blob Storage—Part 2” on the Analysis Services Team Blog. Finally, edit the U-SQL scripts and replace the storage account placeholder (@<blob storage account name>) with your actual storage account.
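The dsdgen invocations described above can be sketched as a small Python helper that only builds and prints the 100 command lines. This is a minimal sketch, not part of the repository: the executable name "dsdgen" and the function name dsdgen_commands are assumptions, and only the /PARALLEL and /CHILD options named in this readme are used. Any other options your dsdgen build needs (for example, a scale factor or output directory) are deliberately left out.

```python
# Sketch: build one dsdgen command line per child id.
# Assumption: the executable is called "dsdgen" and is on the PATH.
# Only the /PARALLEL and /CHILD options from the readme are emitted.

PARALLEL = 100  # total number of chunks, as stated in the readme


def dsdgen_commands(parallel=PARALLEL, exe="dsdgen"):
    """Return one argument list per child id, from 1 to `parallel` inclusive."""
    return [
        [exe, "/PARALLEL", str(parallel), "/CHILD", str(child)]
        for child in range(1, parallel + 1)
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in dsdgen_commands():
        print(" ".join(cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere. The printed lines can be pasted into a batch file once any additional options have been added.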
The subfolders containing the U-SQL scripts highlight different scenarios:
* all_single   These scripts create a single csv file per table containing all the source data.
* large_multiple   These scripts create 4 csv files for each of the large tables (catalog_returns, catalog_sales, inventory, store_returns, store_sales, web_returns, and web_sales) and a single csv file for each of the remaining tables.
* last_available_year   These scripts create a single csv file per table containing only the source data for the last year in the data set, which is the year 2003.
* modelling    These scripts create a data set for modelling purposes with a single csv file per table containing up to 100 rows of data.
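Before running the scripts in any of these subfolders, the storage account placeholder has to be replaced in each script. The following Python sketch shows one way this could be automated. It rests on several assumptions that should be checked against the actual files: that the scripts use the .usql extension, that the placeholder appears literally as <blob storage account name>, and that the files are UTF-8 encoded. The function names fill_placeholder and update_scripts are hypothetical.

```python
from pathlib import Path

# Assumption: the placeholder text appears literally like this in the scripts.
PLACEHOLDER = "<blob storage account name>"

# Subfolder names as listed in the readme.
SCENARIO_FOLDERS = ["all_single", "large_multiple", "last_available_year", "modelling"]


def fill_placeholder(text, account):
    """Replace every occurrence of the placeholder with the account name."""
    return text.replace(PLACEHOLDER, account)


def update_scripts(root, account):
    """Rewrite scripts under each scenario folder; return the paths that changed."""
    changed = []
    for folder in SCENARIO_FOLDERS:
        folder_path = Path(root, folder)
        if not folder_path.is_dir():
            continue  # skip scenario folders that are not present
        # Assumption: the U-SQL scripts use the .usql file extension.
        for script in sorted(folder_path.glob("*.usql")):
            original = script.read_text(encoding="utf-8")
            updated = fill_placeholder(original, account)
            if updated != original:
                script.write_text(updated, encoding="utf-8")
                changed.append(script)
    return changed
```

For example, update_scripts("UsqlScripts", "mystorageaccount") would rewrite the scripts in place, where "mystorageaccount" stands in for your own account name. Working on a copy of the folder first is advisable, since the files are overwritten.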