ENH: Add a "hello world" test script (#948)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
This commit is contained in:
Anton Schwaighofer 2024-08-13 10:59:20 +01:00 committed by GitHub
Parent 3ecea4b23f
Commit ca7a4d8f15
No key found matching this signature
GPG key ID: B5690EEEBB952194
6 changed files: 543 additions and 5 deletions


@@ -1,5 +1,17 @@
#!/bin/bash
# Read input file from argument 1, default to primary_deps.yml
input_file="primary_deps.yml"
output_file="environment.yml"
if [ "$#" -gt 0 ]; then
input_file=$1
echo "Using input file: $input_file"
fi
if [ "$#" -gt 1 ]; then
output_file=$2
echo "Using output file: $output_file"
fi
os_name=$(uname)
if [[ ! $os_name == *"Linux"* ]]; then
echo "ERROR: cannot run environment locking in non-linux environment. Windows users can do this using WSL - https://docs.microsoft.com/en-us/windows/wsl/install"
@@ -9,7 +21,7 @@ else
fi
# get environment name from primary dependencies YAML file
-name_line="$(cat primary_deps.yml | grep 'name:')"
+name_line="$(cat $input_file | grep 'name:')"
IFS=':' read -ra name_arr <<< "$name_line"
env_name="${name_arr[1]}"
@@ -19,7 +31,7 @@ echo "Building Conda environment: $env_name"
export CONDA_ALWAYS_YES="true"
conda activate base
conda env remove --name $env_name
-conda env create --file primary_deps.yml
+conda env create --file $input_file
# export new environment to environment.yml
echo "Exporting environment $env_name to environment.tmp1"
@@ -39,7 +51,7 @@ while IFS='' read -r line; do
fi
done < environment.tmp1 > environment.tmp2
echo "Creating final environment.yml with warning line"
-echo "# WARNING - DO NOT EDIT THIS FILE MANUALLY" > environment.yml
-echo "# To update, please modify 'primary_deps.yml' and then run the locking script 'create_and_lock_environment.sh'">> environment.yml
-cat environment.tmp2 >> environment.yml
+echo "# WARNING - DO NOT EDIT THIS FILE MANUALLY" > $output_file
+echo "# To update, please modify '$input_file' and then run the locking script 'create_and_lock_environment.sh'">> $output_file
+cat environment.tmp2 >> $output_file
rm environment.tmp1 environment.tmp2
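The argument handling in the diff above defaults to `primary_deps.yml` and `environment.yml` and overrides them from the first and second positional arguments. A minimal standalone sketch of that resolution logic (with hypothetical file names, not from the repo):

```shell
#!/bin/bash
# Sketch of the positional-argument defaulting used by create_and_lock_environment.sh
input_file="primary_deps.yml"
output_file="environment.yml"
# Simulate calling the script with two arguments (assumed example names)
set -- my_deps.yml my_env.yml
if [ "$#" -gt 0 ]; then
    input_file=$1
fi
if [ "$#" -gt 1 ]; then
    output_file=$2
fi
echo "$input_file -> $output_file"
```

Calling with no arguments leaves both defaults in place; one argument overrides only the input file.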


@@ -87,3 +87,22 @@ test_all: pip_test call_flake8 call_mypy call_pytest_and_coverage
example: pip_local
echo 'edit src/health/azure/examples/elevate_this.py to reference your compute_cluster_name'
cd src/health/azure/examples; python elevate_this.py --azureml --message 'running example from makefile'
# Create a local Conda environment
env:
conda env create --file environment.yml
# Install Conda from scratch
conda:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b
rm Miniconda3-latest-Linux-x86_64.sh
conda update -y -n base conda
conda install -y -n base conda-libmamba-solver
conda config --set solver libmamba
env_hello_world_lock:
../create_and_lock_environment.sh primary_deps_hello_world.yml environment_hello_world.yml
env_hello_world:
conda env create --file environment_hello_world.yml

hi-ml-azure/README.md Normal file

@@ -0,0 +1,68 @@
# HI-ML-Azure
This folder contains the source code for PyPI package `hi-ml-azure`.
## Testing an AzureML setup
To test if your AzureML setup is correct, follow these steps to set up Python on your local machine:
- Change the working directory to `<repo_root>/hi-ml-azure`
- Run `make conda` to install MiniConda
- Run `make env_hello_world` to build a simple Python environment with the necessary packages
- Run `conda activate hello_world` to activate the environment
Then follow these steps to test the AzureML setup:
- Download the `config.json` file from your AzureML workspace and place it in folder `<repo_root>/hi-ml-azure`
There is a `Download config.json` button once you expand the dropdown menu on the top-right of your AzureML workspace.
This is not in the core Azure portal, but only visible once you open `AzureML Studio` from the portal.
The file `config.json` should look like this:
```json
{
"subscription_id": "your-subscription-id",
"resource_group": "your-resource-group",
"workspace_name": "your-workspace-name"
}
```
- To run the test script, you must have created a compute cluster in your AzureML workspace.
You can do this by clicking on `Compute` in the left-hand menu, selecting the "Compute clusters" tab, and
then `+ New` to create a new compute cluster. To run the test script, it is sufficient to use a cheap CPU-only VM
type, like `STANDARD_DS3_V2`. Give the cluster a name, and use the same name in the script below.
- Log into Azure by running `az login` in the terminal.
- Start the test script via `python hello_world.py --cluster <your_compute_cluster_name>`.
If successful, this will print out "Successfully queued run..." at the end, and a "Run URL" that points to your job.
- Open the "Run URL" that was printed on the console in the browser to monitor the run.
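As a quick local sanity check (not part of hi-ml itself), the downloaded `config.json` can be validated for the three required keys with a few lines of stdlib Python:

```python
import json

# The three keys an AzureML workspace config file must contain
REQUIRED_KEYS = {"subscription_id", "resource_group", "workspace_name"}

def validate_config(text: str) -> bool:
    """Return True if the config JSON contains all keys AzureML expects."""
    config = json.loads(text)
    return REQUIRED_KEYS.issubset(config)

sample = '{"subscription_id": "s", "resource_group": "r", "workspace_name": "w"}'
print(validate_config(sample))  # True
```

If this prints `False`, re-download `config.json` from the workspace rather than editing it by hand.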
## Testing access to OpenAI from an AzureML job
Requirements:
- Your compute cluster has a managed identity assigned.
- You have an OpenAI deployment URL and model name.
- The managed identity has "Cognitive Services OpenAI User" access to the OpenAI deployment.
Run the following command to test access to OpenAI from an AzureML job:
```shell
python hello_world.py --cluster <your_compute_cluster_name> --openai_url <URL> --openai_model <Model>
```
If successful, this will print out the response from OpenAI.
## Testing access to datasets
Requirements:
- You have a datastore registered in your AzureML workspace
- You have a dataset registered in your AzureML workspace. For example, upload an empty file to a folder `hello_world`
in your storage account, and register this folder as a dataset called `hello_world` in your AzureML workspace.
- You have a folder in your storage account for an output dataset. For that, upload an empty file to a folder
`hello_world_output` and register this folder as a dataset `hello_world_output` in your AzureML workspace.
Then run the following command to test access to datasets:
```shell
python hello_world.py --cluster <your_compute_cluster_name> --input_dataset hello_world --output_dataset hello_world_output
```
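On success, `hello_world.py` writes a timestamped text file into the mounted output folder. The file-naming scheme can be reproduced locally (this sketch uses `datetime.now(timezone.utc)`, which formats identically to the `datetime.utcnow()` call in the script):

```python
from datetime import datetime, timezone

# Same format string that hello_world.py uses for the output file name
timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H%M%S")
file_name = f"hello_world_{timestamp}.txt"
print(file_name)
```

Including seconds in the name means repeated runs do not overwrite each other's output files.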


@@ -0,0 +1,283 @@
# WARNING - DO NOT EDIT THIS FILE MANUALLY
# To update, please modify 'primary_deps_openai.yml' and then run the locking script 'create_and_lock_environment.sh'
name: hello_world
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- ca-certificates=2024.3.11=h06a4308_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.4.4=h6a678d5_1
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- ncurses=6.4=h6a678d5_0
- openssl=3.0.14=h5eee18b_0
- pip=23.3.1=py39h06a4308_0
- python=3.9.18
- readline=8.2=h5eee18b_0
- setuptools=69.5.1=py39h06a4308_0
- sqlite=3.45.3=h5eee18b_0
- tk=8.6.14=h39e8969_0
- wheel=0.43.0=py39h06a4308_0
- xz=5.4.6=h5eee18b_1
- zlib=1.2.13=h5eee18b_1
- pip:
- absl-py==2.1.0
- adal==1.2.7
- alembic==1.13.2
- aniso8601==9.0.1
- annotated-types==0.7.0
- antlr4-python3-runtime==4.13.1
- anyio==4.4.0
- applicationinsights==0.11.10
- argcomplete==3.3.0
- attrs==23.2.0
- azure-ai-ml==1.17.0
- azure-appconfiguration==1.1.1
- azure-batch==14.2.0
- azure-cli==2.61.0
- azure-cli-core==2.61.0
- azure-cli-telemetry==1.1.0
- azure-common==1.1.28
- azure-core==1.30.2
- azure-cosmos==3.2.0
- azure-data-tables==12.4.0
- azure-datalake-store==0.0.53
- azure-graphrbac==0.60.0
- azure-identity==1.17.1
- azure-keyvault-administration==4.4.0b2
- azure-keyvault-certificates==4.7.0
- azure-keyvault-keys==4.9.0b3
- azure-keyvault-secrets==4.7.0
- azure-mgmt-advisor==9.0.0
- azure-mgmt-apimanagement==4.0.0
- azure-mgmt-appconfiguration==3.0.0
- azure-mgmt-appcontainers==2.0.0
- azure-mgmt-applicationinsights==1.0.0
- azure-mgmt-authorization==4.0.0
- azure-mgmt-batch==17.3.0
- azure-mgmt-batchai==7.0.0b1
- azure-mgmt-billing==6.0.0
- azure-mgmt-botservice==2.0.0
- azure-mgmt-cdn==12.0.0
- azure-mgmt-cognitiveservices==13.5.0
- azure-mgmt-compute==31.0.0
- azure-mgmt-containerinstance==10.1.0
- azure-mgmt-containerregistry==10.3.0
- azure-mgmt-containerservice==30.0.0
- azure-mgmt-core==1.4.0
- azure-mgmt-cosmosdb==9.4.0
- azure-mgmt-databoxedge==1.0.0
- azure-mgmt-datalake-nspkg==3.0.1
- azure-mgmt-datalake-store==0.5.0
- azure-mgmt-datamigration==10.0.0
- azure-mgmt-devtestlabs==4.0.0
- azure-mgmt-dns==8.0.0
- azure-mgmt-eventgrid==10.2.0b2
- azure-mgmt-eventhub==10.1.0
- azure-mgmt-extendedlocation==1.0.0b2
- azure-mgmt-hdinsight==9.0.0
- azure-mgmt-imagebuilder==1.3.0
- azure-mgmt-iotcentral==10.0.0b2
- azure-mgmt-iothub==3.0.0
- azure-mgmt-iothubprovisioningservices==1.1.0
- azure-mgmt-keyvault==10.3.0
- azure-mgmt-kusto==0.3.0
- azure-mgmt-loganalytics==13.0.0b4
- azure-mgmt-managedservices==1.0.0
- azure-mgmt-managementgroups==1.0.0
- azure-mgmt-maps==2.0.0
- azure-mgmt-marketplaceordering==1.1.0
- azure-mgmt-media==9.0.0
- azure-mgmt-monitor==5.0.1
- azure-mgmt-msi==7.0.0
- azure-mgmt-netapp==10.1.0
- azure-mgmt-network==25.4.0
- azure-mgmt-nspkg==3.0.2
- azure-mgmt-policyinsights==1.1.0b4
- azure-mgmt-privatedns==1.0.0
- azure-mgmt-rdbms==10.2.0b17
- azure-mgmt-recoveryservices==3.0.0
- azure-mgmt-recoveryservicesbackup==9.1.0
- azure-mgmt-redhatopenshift==1.4.0
- azure-mgmt-redis==14.3.0
- azure-mgmt-resource==23.1.1
- azure-mgmt-search==9.1.0
- azure-mgmt-security==6.0.0
- azure-mgmt-servicebus==8.2.0
- azure-mgmt-servicefabric==2.1.0
- azure-mgmt-servicefabricmanagedclusters==2.0.0b6
- azure-mgmt-servicelinker==1.2.0b2
- azure-mgmt-signalr==2.0.0b1
- azure-mgmt-sql==4.0.0b16
- azure-mgmt-sqlvirtualmachine==1.0.0b5
- azure-mgmt-storage==21.1.0
- azure-mgmt-synapse==2.1.0b5
- azure-mgmt-trafficmanager==1.0.0
- azure-mgmt-web==7.2.0
- azure-monitor-query==1.2.0
- azure-multiapi-storage==1.2.0
- azure-nspkg==3.0.2
- azure-storage-blob==12.19.0
- azure-storage-common==1.4.2
- azure-storage-file-datalake==12.14.0
- azure-storage-file-share==12.16.0
- azure-synapse-accesscontrol==0.5.0
- azure-synapse-artifacts==0.18.0
- azure-synapse-managedprivateendpoints==0.4.0
- azure-synapse-spark==0.2.0
- azureml-core==1.56.0
- azureml-dataprep==5.1.6
- azureml-dataprep-native==41.0.0
- azureml-dataprep-rslex==2.22.2
- azureml-dataset-runtime==1.56.0
- azureml-mlflow==1.56.0
- azureml-telemetry==1.56.0
- azureml-tensorboard==1.56.0
- azureml-train-core==1.56.0
- azureml-train-restclients-hyperdrive==1.56.0
- backports-tempfile==1.0
- backports-weakref==1.0.post1
- bcrypt==4.1.3
- blinker==1.8.2
- cachetools==5.3.3
- certifi==2024.6.2
- cffi==1.16.0
- chardet==5.2.0
- charset-normalizer==3.3.2
- click==8.1.7
- cloudpickle==2.2.1
- colorama==0.4.6
- conda-merge==0.2.0
- contextlib2==21.6.0
- contourpy==1.2.1
- cryptography==42.0.8
- cycler==0.12.1
- decorator==5.1.1
- deprecated==1.2.14
- distro==1.9.0
- docker==7.1.0
- entrypoints==0.4
- exceptiongroup==1.2.1
- fabric==3.2.2
- flask==3.0.3
- fonttools==4.53.0
- fusepy==3.0.1
- gitdb==4.0.11
- gitpython==3.1.43
- google-api-core==2.19.1
- google-auth==2.30.0
- googleapis-common-protos==1.63.2
- graphene==3.3
- graphql-core==3.2.3
- graphql-relay==3.2.0
- greenlet==3.0.3
- grpcio==1.64.1
- gunicorn==22.0.0
- h11==0.14.0
- httpcore==1.0.5
- httpx==0.27.0
- humanfriendly==10.0
- idna==3.7
- importlib-metadata==7.1.0
- importlib-resources==6.4.0
- invoke==2.2.0
- isodate==0.6.1
- itsdangerous==2.2.0
- javaproperties==0.5.2
- jeepney==0.8.0
- jinja2==3.1.4
- jmespath==1.0.1
- joblib==1.4.2
- jsondiff==2.0.0
- jsonpickle==3.2.2
- jsonschema==4.22.0
- jsonschema-specifications==2023.12.1
- kiwisolver==1.4.5
- knack==0.11.0
- mako==1.3.5
- markdown==3.6
- markupsafe==2.1.5
- marshmallow==3.21.3
- matplotlib==3.9.0
- mlflow==2.14.1
- mlflow-skinny==2.14.1
- msal==1.28.0
- msal-extensions==1.2.0b1
- msrest==0.7.1
- msrestazure==0.6.4
- ndg-httpsclient==0.5.1
- numpy==1.23.5
- oauthlib==3.2.2
- openai==1.35.5
- opencensus==0.11.4
- opencensus-context==0.1.3
- opencensus-ext-azure==1.1.13
- opencensus-ext-logging==0.1.1
- opentelemetry-api==1.25.0
- opentelemetry-sdk==1.25.0
- opentelemetry-semantic-conventions==0.46b0
- packaging==24.1
- pandas==2.2.2
- param==1.13.0
- paramiko==3.4.0
- pathspec==0.12.1
- pillow==10.3.0
- pkginfo==1.11.1
- portalocker==2.10.0
- proto-plus==1.24.0
- protobuf==3.20.3
- psutil==5.9.8
- pyarrow==15.0.2
- pyasn1==0.6.0
- pyasn1-modules==0.4.0
- pycomposefile==0.0.31
- pycparser==2.22
- pydantic==2.7.4
- pydantic-core==2.18.4
- pydash==8.0.1
- pygithub==1.59.1
- pygments==2.18.0
- pyjwt==2.8.0
- pynacl==1.5.0
- pyopenssl==24.1.0
- pyparsing==3.1.2
- pysocks==1.7.1
- python-dateutil==2.9.0.post0
- pytz==2024.1
- pyyaml==6.0.1
- querystring-parser==1.2.4
- referencing==0.35.1
- requests==2.32.3
- requests-oauthlib==2.0.0
- rpds-py==0.18.1
- rsa==4.9
- ruamel-yaml==0.18.6
- ruamel-yaml-clib==0.2.8
- scikit-learn==1.5.0
- scipy==1.13.1
- scp==0.13.6
- secretstorage==3.3.3
- semver==2.13.0
- six==1.16.0
- smmap==5.0.1
- sniffio==1.3.1
- sqlalchemy==2.0.31
- sqlparse==0.5.0
- sshtunnel==0.1.5
- strictyaml==1.7.3
- tabulate==0.9.0
- tensorboard==2.17.0
- tensorboard-data-server==0.7.2
- threadpoolctl==3.5.0
- tqdm==4.66.4
- typing-extensions==4.12.2
- tzdata==2024.1
- urllib3==2.2.2
- websocket-client==1.3.3
- werkzeug==3.0.3
- wrapt==1.16.0
- xmltodict==0.13.0
- zipp==3.19.2

hi-ml-azure/hello_world.py Normal file

@@ -0,0 +1,146 @@
# ------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License (MIT). See LICENSE in the repo root for license information.
# ------------------------------------------------------------------------------------------
"""
Simple 'hello world' script to elevate to AML using our `submit_to_azure_if_needed` function.
Invoke like this:
python hello_world.py --cluster <name_of_compute_cluster>
"""
import os
import sys
from argparse import ArgumentParser
from typing import Callable, Union
from pathlib import Path
from datetime import datetime
from azure.identity import get_bearer_token_provider, AzureCliCredential, ManagedIdentityCredential
import openai
# Add hi-ml packages to sys.path so that AML can find them
himl_azure_root = Path(__file__).resolve().parent
folders_to_add = [himl_azure_root / "src"]
for folder in folders_to_add:
if folder.is_dir():
sys.path.insert(0, str(folder))
from health_azure import submit_to_azure_if_needed, DatasetConfig, is_running_in_azure_ml
from health_azure.logging import logging_to_stdout
# The default scope for Azure Cognitive Services. Tokens are retrieved for this scope, and later used instead
# of the API key.
AZURE_COGNITIVE_SERVICES = "https://cognitiveservices.azure.com"
ENV_AZUREML_IDENTITY_ID = "DEFAULT_IDENTITY_CLIENT_ID"
def get_credential() -> Union[AzureCliCredential, ManagedIdentityCredential]:
"""Get the appropriate Azure credential based on the environment. The credential is a managed identity when running
in AzureML, otherwise Azure CLI credential."""
if is_running_in_azure_ml():
return ManagedIdentityCredential(client_id=os.environ[ENV_AZUREML_IDENTITY_ID])
return AzureCliCredential()
def get_azure_token_provider() -> Callable[[], str]:
"""Get a token provider for Azure Cognitive Services. The bearer token provider gets authentication tokens and
refreshes them automatically upon expiry.
"""
credential = get_credential()
credential.get_token(AZURE_COGNITIVE_SERVICES)
return get_bearer_token_provider(credential, AZURE_COGNITIVE_SERVICES)
def main() -> None:
"""
Write out the given message, in an AzureML 'experiment' if required.
First call submit_to_azure_if_needed.
"""
parser = ArgumentParser()
parser.add_argument("-c", "--cluster", type=str, required=True, help="The name of the compute cluster to run on")
parser.add_argument("--openai_url", type=str, required=False, help="The URL of the OpenAI endpoint to use")
parser.add_argument("--openai_model", type=str, required=False, help="The OpenAI deployment to use")
parser.add_argument("--input_dataset", type=str, required=False, help="The name of the input dataset to enumerate")
parser.add_argument("--output_dataset", type=str, required=False, help="The name of the output dataset")
args = parser.parse_args()
input_datasets: list[DatasetConfig] = []
output_datasets: list[DatasetConfig] = []
if args.input_dataset:
input_datasets.append(DatasetConfig(name=args.input_dataset, use_mounting=True))
if args.output_dataset:
output_datasets.append(DatasetConfig(name=args.output_dataset, use_mounting=True))
logging_to_stdout()
run_info = submit_to_azure_if_needed(
compute_cluster_name=args.cluster,
strictly_aml_v1=True,
submit_to_azureml=True,
workspace_config_file=himl_azure_root / "config.json",
snapshot_root_directory=himl_azure_root,
input_datasets=input_datasets,
output_datasets=output_datasets,
conda_environment_file=himl_azure_root / "environment_hello_world.yml",
)
print("Hello Chris! This is your first successful AzureML run :-)")
any_error = False
if args.input_dataset:
try:
input_dataset = run_info.input_datasets[0]
assert input_dataset is not None
if input_dataset.is_file():
print(f"Input dataset is a single file: {input_dataset}")
elif input_dataset.is_dir():
print(f"Files in input dataset folder {input_dataset}:")
for file in input_dataset.glob("*"):
print(file)
else:
print(f"Input dataset is neither a file nor a folder: {input_dataset}")
any_error = True
except Exception as e:
print(f"Failed to read input dataset: {e}")
any_error = True
else:
print("No input dataset was specified.")
if args.output_dataset:
try:
output_dataset = run_info.output_datasets[0]
assert output_dataset is not None
timestamp = datetime.utcnow().strftime("%Y-%m-%dT%H%M%S")
output_file = output_dataset / f"hello_world_{timestamp}.txt"
output_text = f"Calling all dentists, the private song group starts at {timestamp}!"
print(f"Writing to output file: {output_text}")
output_file.write_text(output_text)
except Exception as e:
print(f"Failed to write output dataset: {e}")
any_error = True
else:
print("No output dataset was specified.")
if args.openai_url and args.openai_model:
try:
print(f"OpenAI URL: {args.openai_url}")
token_provider = get_azure_token_provider()
openai.api_version = "2023-12-01-preview"
openai.azure_endpoint = args.openai_url
openai.azure_ad_token_provider = token_provider
prompt = "Write a 4 line poem using the words 'private', 'dentist', 'song' and 'group'. "
print(f"Prompt: {prompt}")
response = openai.chat.completions.create(
model=args.openai_model,
messages=[{"role": "user", "content": prompt}],
max_tokens=50,
temperature=0.8,
)
content = response.choices[0].message.content
print(f"Response from OpenAI: {content}")
except Exception as e:
print(f"Failed to connect to OpenAI: {e}")
any_error = True
if any_error:
raise RuntimeError("There were errors during the run.")
if __name__ == "__main__":
main()
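The command-line interface defined in `main()` can be exercised offline, without any Azure credentials, by handing `parse_args` an explicit argument list. A minimal sketch mirroring the parser above (cluster name is a hypothetical example):

```python
from argparse import ArgumentParser

# Mirror of the arguments defined in hello_world.py's main()
parser = ArgumentParser()
parser.add_argument("-c", "--cluster", type=str, required=True)
parser.add_argument("--openai_url", type=str, required=False)
parser.add_argument("--openai_model", type=str, required=False)
parser.add_argument("--input_dataset", type=str, required=False)
parser.add_argument("--output_dataset", type=str, required=False)

args = parser.parse_args(["--cluster", "cpu-cluster", "--input_dataset", "hello_world"])
print(args.cluster, args.input_dataset, args.openai_url)  # cpu-cluster hello_world None
```

Omitted optional arguments come back as `None`, which is why the script can guard each feature with `if args.input_dataset:`-style checks.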


@@ -0,0 +1,10 @@
# This environment definition contains all packages required to run the hello_world script
name: hello_world
channels:
- defaults
dependencies:
- pip=23.3
- python=3.9.18
- pip:
- -r run_requirements.txt
- openai