Add code for creating a dataset using path on datastore (#73)

* Add code for creating a dataset using path on datastore

* OutputFileDatasetConfig example

* Example for registering from local

* update warning

* address comments

* Fix import
fuhuifang 2021-06-24 16:10:13 -07:00 committed by GitHub
Parent f2d62d7040
Commit b3156223e9
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
1 changed file: 40 additions and 0 deletions


@@ -251,6 +251,46 @@ config.run_config.data_references[data_ref.data_reference_name] = data_ref.to_co
## Create Dataset
### From local data
You can create and register a dataset directly from a folder on your local machine. Note that `src_dir` must point to a **folder**, not a file.
:warning: Method `upload_directory`: This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
```python
from azureml.core import Dataset
# upload the data to datastore and create a FileDataset from it
folder_data = Dataset.File.upload_directory(src_dir="path/to/folder", target=(datastore, "self-defined/path/on/datastore"))
dataset = folder_data.register(workspace=ws, name="<dataset_name>")
```
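Because `src_dir` must be a folder, it can help to check the path locally before starting an upload. The following is a minimal sketch of such a check; the helper name `require_folder` is hypothetical and is not part of the Azure ML SDK.

```python
from pathlib import Path


def require_folder(src_dir):
    """Return src_dir as a Path, or raise if it is not an existing folder.

    Hypothetical helper: it only inspects the local filesystem and does not
    call the Azure ML SDK.
    """
    path = Path(src_dir)
    if not path.is_dir():
        raise NotADirectoryError(
            f"src_dir must be an existing folder, got: {src_dir}"
        )
    return path
```

The returned path could then be passed on as `src_dir=str(require_folder("path/to/folder"))`.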
### From a datastore
The code snippet below shows how to create a `Dataset` from a relative path on `datastore`. The path can point to either a folder (e.g. `local/test/`) or a single file (e.g. `local/test/data.tsv`).
```python
from azureml.core import Dataset
# create input dataset
data = Dataset.File.from_files(path=(datastore, "path/on/datastore"))
dataset = data.register(workspace=ws, name="<dataset_name>")
```
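Once a dataset is registered, it can be looked up by name and inspected to confirm which files the path picked up. This is a minimal sketch, assuming `ws` is an existing `Workspace` and the dataset is a `FileDataset`; the helper name `list_registered_files` is hypothetical.

```python
def list_registered_files(ws, dataset_name):
    """Return the file paths contained in a registered FileDataset."""
    # Imported lazily so the helper can be defined without the SDK installed
    from azureml.core import Dataset

    # Look up the latest registered version by name
    dataset = Dataset.get_by_name(ws, name=dataset_name)
    # to_path() lists one path per file in the dataset
    return dataset.to_path()
```

For a dataset created from a folder, the list should contain one entry per file under that folder; for a single-file path it should contain just that file.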
### From outputs using `OutputFileDatasetConfig`
```python
from azureml.core import ScriptRunConfig
from azureml.data import OutputFileDatasetConfig
output_data = OutputFileDatasetConfig(
    destination=(datastore, "path/on/datastore"),
    name="<output_name>",
)
# register your OutputFileDatasetConfig as a dataset when the run completes;
# use the returned object below so the registration is attached to the run's output
output_data_dataset = output_data.register_on_complete(
    name="<dataset_name>", description="<dataset_description>"
)
config = ScriptRunConfig(
    source_directory=".",
    script="run.py",
    arguments=["--output_dir", output_data_dataset.as_mount()],
)
```
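The run configuration passes the mounted output location to `run.py` as `--output_dir`. The following is a minimal sketch of what such a script might contain, assuming it only needs to write a single file; `write_output`, `result.txt`, and the file contents are hypothetical placeholders.

```python
import argparse
import os


def write_output(output_dir, filename="result.txt", text="done\n"):
    """Write one file into output_dir and return its path."""
    # Create the folder if it does not exist yet
    os.makedirs(output_dir, exist_ok=True)
    out_path = os.path.join(output_dir, filename)
    with open(out_path, "w") as f:
        f.write(text)
    return out_path


def main(argv=None):
    """Parse --output_dir and write the placeholder output into it."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_dir", required=True)
    args = parser.parse_args(argv)
    return write_output(args.output_dir)
```

In an actual `run.py`, the script would end with a call to `main()` so that the arguments supplied by the run are parsed.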
#### Upload to datastore