Changing folder names to be more self-explanatory

This commit is contained in:
deguhath 2017-09-09 19:13:29 -07:00
Parent 5276101e33
Commit 5d4e8f4c1a
13 changed files with 9 additions and 12 deletions

@ -1,8 +0,0 @@
The _**Data**_ directory in the project Git repository is the place to store small sample datasets, **NOT** the entire datasets. If your client does not allow you to store even sample data in the GitHub repository, store, if possible, a sample dataset with all confidential fields hashed. If even that is not allowed, do not store sample data here, but still fill in the table in each sub-directory.
The small sample datasets make your data preprocessing, feature engineering, and modeling scripts runnable, so that others can quickly run the scripts that process or model the data and understand what they do.
Each directory contains a Markdown file that lists all datasets in that directory. Please provide a link to the full dataset for anyone who needs to access it.

@ -5,10 +5,7 @@ This is a general project directory structure for Team Data Science Process deve
[Team Data Science Process (TDSP)](https://github.com/Azure/Microsoft-TDSP) is an agile, iterative data science methodology to improve collaboration and team learning. It is supported through a lifecycle definition, standard project structure, artifact templates, and [tools](https://github.com/Azure/Azure-TDSP-Utilities) for productive data science.
**NOTE:** In this directory structure, the **Data folder is NOT supposed to contain raw or processed data**, which could be big in size. It is only supposed to contain INFORMATION or DOCUMENTATION about the data, such as:
1. Data definitions
2. Why and how raw data is converted into processed data
3. Location of the data in storage (Azure Blob storage containers, Azure Data Lake, SQL Server, etc.)
**NOTE:** In this directory structure, the **Sample_Data folder is NOT supposed to contain LARGE raw or processed data**. It is only supposed to contain **small sample** data sets that can be used to test the code.
The two documents under Docs/Project, namely the [Charter](./Docs/Project/Charter.md) and the [Exit Report](./Docs/Project/Exit%20Report.md), are particularly important. They help define the project at the start of an engagement and provide a final report to the customer or client.

Sample_Data/README.md
@ -0,0 +1,8 @@
The **Sample_Data** directory in the project Git repository is the place to store small **SAMPLE** datasets, **NOT** the entire datasets. If your client does not allow you to store even sample data in the GitHub repository, store, if possible, a sample dataset with all confidential fields hashed. If even that is not allowed, do not store sample data here, but still fill in the table in each sub-directory.
The small sample datasets make your data preprocessing, feature engineering, and modeling scripts runnable, so that others can quickly run the scripts that process or model the data and understand what they do.
Each directory contains a Markdown file that lists all datasets in that directory. Please provide a link to the full dataset for anyone who needs to access it.
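The following is a minimal sketch of one way to produce such a sample with confidential fields hashed. It assumes the full dataset fits in memory as a list of dicts (for example, as read with `csv.DictReader`). The helper names `hash_field` and `make_sample`, the column names, and the key are all hypothetical. It uses a keyed hash (HMAC-SHA256) rather than a plain hash, since a plain hash of low-entropy values such as IDs or emails can be reversed by enumerating candidates; the key should be kept out of the repository.

```python
import csv
import hashlib
import hmac
import io
import random


def hash_field(value: str, key: bytes) -> str:
    """Replace a confidential value with a keyed SHA-256 digest (HMAC).

    Equal inputs map to equal digests, so joins on a hashed column still
    line up, but the original value is not stored in the sample.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def make_sample(rows, confidential_fields, n, key, seed=0):
    """Draw up to n random rows and hash their confidential fields."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    picked = rng.sample(rows, min(n, len(rows)))
    sample = []
    for row in picked:
        row = dict(row)  # copy so the full dataset is left untouched
        for field in confidential_fields:
            if field in row:
                row[field] = hash_field(row[field], key)
        sample.append(row)
    return sample


# Hypothetical full dataset; in practice this would come from csv.DictReader.
full = [
    {"customer_id": f"C{i:03d}", "email": f"user{i}@example.com", "amount": str(10 * i)}
    for i in range(5)
]

# The key is a placeholder; load the real one from outside the repository.
sample = make_sample(full, ["customer_id", "email"], n=2, key=b"replace-with-secret")

# Write the sample as CSV (here to a string buffer instead of a file).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["customer_id", "email", "amount"])
writer.writeheader()
writer.writerows(sample)
print(buf.getvalue())
```

Because the hash is deterministic for a given key, the same customer hashes to the same value across sample files, which keeps joins between sample tables working.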
