* meta-llama-3.1 client samples

* black formatting

* Update langchain.ipynb

---------

Co-authored-by: shail2208 <59747407+shail2208@users.noreply.github.com>
Thasmika Gokal 2024-07-25 10:56:13 -07:00 committed by GitHub
Parent 0f676621ca
Commit b7dc63b182
4 changed files: 926 additions and 0 deletions


@ -0,0 +1,232 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Use LangChain with Meta-Llama-3.1-405B-Instruct for Azure AI\n",
"\n",
"You can use Meta-Llama-3.1-405B-Instruct - or any of the Llama 3.1 models - as a serverless API endpoint through Azure AI Studio and Azure ML with `langchain` to create more sophisticated intelligent applications. Models available on Azure as a serverless API endpoint:\n",
"\n",
"- `Meta-Llama-3.1-405B-Instruct`\n",
"- `Meta-Llama-3.1-70B-Instruct`\n",
"- `Meta-Llama-3.1-8B-Instruct`\n",
"\n",
"The above models are supported by cross-regional support for any region in the US.\n",
"\n",
"> Review the [documentation](https://aka.ms/meta-llama-3.1-405B-instruct-azure-ai-studio-docs) for Meta-Llama-3.1-405B-Instruct and other Llama 3.1 models through AI Studio and for ML Studio for details on how to provision inference endpoints, regional availability, pricing and inference schema reference."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"Before we start, there are certain steps we need to take to deploy the models:\n",
"\n",
"* Register for a valid Azure account with subscription \n",
"* Make sure you have access to [Azure AI Studio](https://learn.microsoft.com/en-us/azure/ai-studio/what-is-ai-studio?tabs=home)\n",
"* Create a project and resource group\n",
"* Select `Meta-Llama-3.1-405B-Instruct`.\n",
"\n",
" > Notice that some models may not be available in all the regions in Azure AI and Azure Machine Learning. On those cases, you can create a workspace or project in the region where the models are available and then consume it with a connection from a different one. To learn more about using connections see [Consume models with connections](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deployments-connections)\n",
"\n",
"* Deploy with \"Pay-as-you-go\"\n",
"\n",
"Once deployed successfully, you should be assigned for an API endpoint and a security key for inference.\n",
"\n",
"For more information, you should consult Azure's official documentation [here](https://aka.ms/meta-llama-3.1-405B-instruct-azure-ai-studio-docs) for model deployment and inference.\n",
"\n",
"To complete this tutorial, you will need to:\n",
"\n",
"* Install `langchain` and `langchain_community`\n",
"\n",
" ```bash\n",
" pip install langchain langchain_community\n",
" ```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example\n",
"\n",
"The following example demonstrate how to create a chain that uses `Meta-Llama-3.1-405B-Instruct` model deployed through Azure AI and Azure ML. The chain has been configured with a `ConversationBufferMemory`. This example has been adapted from [LangChain official documentation](https://python.langchain.com/docs/modules/memory/adding_memory)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install langchain langchain_community"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"name": "imports"
},
"outputs": [],
"source": [
"from langchain.chains import LLMChain\n",
"from langchain.memory import ConversationBufferMemory\n",
"from langchain.prompts import (\n",
" ChatPromptTemplate,\n",
" HumanMessagePromptTemplate,\n",
" MessagesPlaceholder,\n",
")\n",
"from langchain.schema import SystemMessage\n",
"from langchain_community.chat_models.azureml_endpoint import (\n",
" AzureMLChatOnlineEndpoint,\n",
" AzureMLEndpointApiType,\n",
" LlamaChatContentFormatter,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's create an instance of our `AzureMLChatOnlineEndpoint` model. This class allow us to get access to any model deployed in Azure AI or Azure ML. For completion models use class `langchain_community.llms.azureml_endpoint.AzureMLOnlineEndpoint` with `LlamaContentFormatter` as the `content_formatter`.\n",
"\n",
"- `endpoint`: Use the endpoint URL from your deployment. Include `/v1` at the end of the endpoint URL.\n",
"- `api_key`: Use your API key."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"name": "chat_client"
},
"outputs": [],
"source": [
"chat_model = AzureMLChatOnlineEndpoint(\n",
" endpoint_url=\"https://<endpoint-name>.<region>.inference.ai.azure.com/v1/chat/completions\",\n",
" endpoint_api_type=AzureMLEndpointApiType.serverless,\n",
" endpoint_api_key=\"<key>\",\n",
" content_formatter=LlamaChatContentFormatter(),\n",
")"
]
},
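{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before building the chain, you can optionally verify that the client is configured correctly with a standalone call. This is a minimal sketch, assuming the endpoint and key above are valid:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.schema import HumanMessage\n",
"\n",
"# Quick smoke test of the client; requires a valid endpoint and key above\n",
"chat_model.invoke([HumanMessage(content=\"Hello!\")])"
]
},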
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the below prompt, we have two input keys: one for the actual input (`human_input`), and another for the input from the `Memory` class (`chat_history`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"name": "prompt-template"
},
"outputs": [],
"source": [
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" SystemMessage(\n",
" content=\"You are a chatbot having a conversation with a human. You love making references to pop culture in your answers.\"\n",
" ),\n",
" MessagesPlaceholder(variable_name=\"chat_history\"),\n",
" HumanMessagePromptTemplate.from_template(\"{human_input}\"),\n",
" ]\n",
")\n",
"\n",
"memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We create the chain as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"name": "chain"
},
"outputs": [],
"source": [
"chat_llm_chain = LLMChain(\n",
" llm=chat_model,\n",
" prompt=prompt,\n",
" memory=memory,\n",
" verbose=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see how it works:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"name": "chat_invoke"
},
"outputs": [],
"source": [
"chat_llm_chain.predict(human_input=\"Hi there my friend\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"name": "chat_invoke_2"
},
"outputs": [],
"source": [
"chat_llm_chain.predict(\n",
" human_input=\"Who is the most renowned architect in the world? Provide a short poem that describes their work in the style of Shakespeare with Iambic pentimeter and a rhythm pattern of ABABCC\"\n",
")"
]
},
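{
"cell_type": "markdown",
"metadata": {},
"source": [
"To confirm that `ConversationBufferMemory` has accumulated both exchanges, you can inspect the stored history directly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Inspect the conversation history stored by the memory\n",
"memory.load_memory_variables({})"
]
},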
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Aditional resources\n",
"\n",
"Here are some additional reference: \n",
"\n",
"* [Plan and manage costs (marketplace)](https://learn.microsoft.com/azure/ai-studio/how-to/costs-plan-manage#monitor-costs-for-models-offered-through-the-azure-marketplace)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "jupyter",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@ -0,0 +1,180 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Use LiteLLM with Meta-Llama-3.1-405B-Instruct for Azure AI Models-as-a-Service\n",
"\n",
"You can use Meta-Llama-3.1-405B-Instruct - or any of the Llama 3.1 models - as a serverless API endpoint through Azure AI Studio and Azure ML with `litellm` to create more sophisticated intelligent applications. Models available on Azure as a serverless API endpoint:\n",
"\n",
"- `Meta-Llama-3.1-405B-Instruct`\n",
"- `Meta-Llama-3.1-70B-Instruct`\n",
"- `Meta-Llama-3.1-8B-Instruct`\n",
"\n",
"The above models are supported by cross-regional support for any region in the US.\n",
"\n",
"> Review the [documentation](https://aka.ms/meta-llama-3.1-405B-instruct-azure-ai-studio-docs) for Meta-Llama-3.1-405B-Instruct and other Llama 3.1 models through AI Studio and for ML Studio for details on how to provision inference endpoints, regional availability, pricing and inference schema reference."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"Before we start, there are certain steps we need to take to deploy the models:\n",
"\n",
"* Register for a valid Azure account with subscription \n",
"* Make sure you have access to [Azure AI Studio](https://learn.microsoft.com/en-us/azure/ai-studio/what-is-ai-studio?tabs=home)\n",
"* Create a project and resource group\n",
"* Select `Meta-Llama-3.1-405B-Instruct`.\n",
"\n",
" > Notice that some models may not be available in all the regions in Azure AI and Azure Machine Learning. On those cases, you can create a workspace or project in the region where the models are available and then consume it with a connection from a different one. To learn more about using connections see [Consume models with connections](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deployments-connections)\n",
"\n",
"* Deploy with \"Pay-as-you-go\"\n",
"\n",
"Once deployed successfully, you should be assigned for an API endpoint and a security key for inference.\n",
"\n",
"For more information, you should consult Azure's official documentation [here](https://aka.ms/meta-llama-3.1-405B-instruct-azure-ai-studio-docs) for model deployment and inference.\n",
"\n",
"To complete this tutorial, you will need to:\n",
"\n",
"* Install `litellm`\n",
"\n",
" ```bash\n",
" pip install litellm\n",
" ```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example\n",
"\n",
"The following is an example about how to use `litellm` with `Meta-Llama-3.1-405B-Instruct` model deployed through Azure AI and Azure ML."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install litellm"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"name": "imports"
},
"outputs": [],
"source": [
"import litellm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You will need to have a Endpoint url and Authentication Key associated with that endpoint. This can be acquired from previous steps. To work with `litellm`, configure the client as follows:\n",
"\n",
"- `base_url`: Use the endpoint URL from your deployment. Include the `/v1` in the URL.\n",
"- `api_key`: Use your API key."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"name": "chat_client"
},
"outputs": [],
"source": [
"client = litellm.LiteLLM(\n",
" base_url=\"<your-maas-endpoint>\",\n",
" api_key=\"<your-api-key>\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the client to create chat completions requests:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"name": "chat_invoke"
},
"outputs": [],
"source": [
"response = client.chat.completions.create(\n",
" messages=[\n",
" {\n",
" \"content\": \"Who is the most renowned architect in the world? Provide a short poem that describes their work in the style of Shakespeare with Iambic pentimeter and a rhythm pattern of ABABCC\",\n",
" \"role\": \"user\",\n",
" }\n",
" ],\n",
" model=\"azure_ai/meta-llama3-latest\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The generated text can be accessed as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"name": "chat_response"
},
"outputs": [],
"source": [
"print(response.choices[0].message.content)"
]
},
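{
"cell_type": "markdown",
"metadata": {},
"source": [
"The client can also stream the response token by token. The following is a minimal sketch, assuming the endpoint and the `litellm` client support the OpenAI-style `stream` parameter; each chunk carries an incremental `delta` of the generated text:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"response = client.chat.completions.create(\n",
"    messages=[{\"role\": \"user\", \"content\": \"Write a short haiku about the sea\"}],\n",
"    model=\"azure_ai/meta-llama3-latest\",\n",
"    stream=True,\n",
")\n",
"\n",
"# Print incremental text as chunks arrive\n",
"for chunk in response:\n",
"    delta = chunk.choices[0].delta.content\n",
"    if delta:\n",
"        print(delta, end=\"\")"
]
},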
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Aditional resources\n",
"\n",
"Here are some additional reference: \n",
"\n",
"* [Plan and manage costs (marketplace)](https://learn.microsoft.com/azure/ai-studio/how-to/costs-plan-manage#monitor-costs-for-models-offered-through-the-azure-marketplace)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@ -0,0 +1,193 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Use OpenAI SDK with Meta-Llama-3.1-405B-Instruct in Azure AI and Azure ML\n",
"\n",
"Use `openai` SDK to consume `Meta-Llama-3.1-405B-Instruct` deployments through Azure AI and Azure ML, which offers an API compatible with the OpenAI Chat Completion API. It allows customers and users to transition seamlessly from OpenAI models to Llama 3.1 LLMs. \n",
"\n",
"The API can be directly used with OpenAI's client libraries or third-party tools, like LangChain or LiteLLM.\n",
"\n",
"The example below shows how to make this transition using the OpenAI Python Library. Notice that the Azure AI chat completions API supports only a portion of the parameters supported by OpenAI API.\n",
"\n",
"Models available on Azure as a serverless API endpoint:\n",
"\n",
"- `Meta-Llama-3.1-405B-Instruct`\n",
"- `Meta-Llama-3.1-70B-Instruct`\n",
"- `Meta-Llama-3.1-8B-Instruct`\n",
"\n",
"The above models are supported by cross-regional support for any region in the US.\n",
"\n",
"> Review the [documentation](https://aka.ms/meta-llama-3.1-405B-instruct-azure-ai-studio-docs) for Meta-Llama-3.1-405B-Instruct and other Llama 3.1 models through AI Studio and for ML Studio for details on how to provision inference endpoints, regional availability, pricing and inference schema reference."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"Before we start, there are certain steps we need to take to deploy the models:\n",
"\n",
"* Register for a valid Azure account with subscription \n",
"* Make sure you have access to [Azure AI Studio](https://learn.microsoft.com/en-us/azure/ai-studio/what-is-ai-studio?tabs=home)\n",
"* Create a project and resource group\n",
"* Select `Meta-Llama-3.1-405B-Instruct`.\n",
"\n",
" > Notice that some models may not be available in all the regions in Azure AI and Azure Machine Learning. On those cases, you can create a workspace or project in the region where the models are available and then consume it with a connection from a different one. To learn more about using connections see [Consume models with connections](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deployments-connections)\n",
"\n",
"* Deploy with \"Pay-as-you-go\"\n",
"\n",
"Once deployed successfully, you should be assigned for an API endpoint and a security key for inference.\n",
"\n",
"For more information, you should consult Azure's official documentation [here](https://aka.ms/meta-llama-3.1-405B-instruct-azure-ai-studio-docs) for model deployment and inference.\n",
"\n",
"\n",
"* Install `openai`:\n",
"\n",
" ```bash\n",
" pip install openai\n",
" ```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example\n",
"\n",
"The following is an example about how to use `openai` with a `Meta-Llama-3.1-405B-Instruct` deployed in Azure AI and Azure ML:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install openai"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"name": "imports"
},
"outputs": [],
"source": [
"from openai import OpenAI"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You will need to have a Endpoint url and Authentication Key associated with that endpoint. This can be acquired from previous steps. \n",
"To work with `openai`, configure the client as follows:\n",
"\n",
"- `base_url`: Use the endpoint URL from your deployment. Include `/v1` as part of the URL.\n",
"- `api_key`: Use your API key."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"name": "chat_client"
},
"outputs": [],
"source": [
"client = OpenAI(\n",
" base_url=\"<your-maas-endpoint>\",\n",
" api_key=\"<your-api-key>\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the client to create chat completions requests:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"name": "chat_invoke"
},
"outputs": [],
"source": [
"response = client.chat.completions.create(\n",
" messages=[\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": \"Who is the most renowned architect in the world? Provide a short poem that describes their work in the style of Shakespeare with Iambic pentimeter and a rhythm pattern of ABABCC\",\n",
" }\n",
" ],\n",
" model=\"azureai\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The generated text can be accessed as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"name": "chat_response"
},
"outputs": [],
"source": [
"print(response.choices[0].message.content)"
]
},
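{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `openai` client can also stream tokens as they are generated. A minimal sketch, assuming the deployment supports the OpenAI-style `stream` parameter:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"response = client.chat.completions.create(\n",
"    messages=[{\"role\": \"user\", \"content\": \"Write a short haiku about the sea\"}],\n",
"    model=\"azureai\",\n",
"    stream=True,\n",
")\n",
"\n",
"# Each streamed chunk carries an incremental delta of the message\n",
"for chunk in response:\n",
"    if chunk.choices and chunk.choices[0].delta.content:\n",
"        print(chunk.choices[0].delta.content, end=\"\")"
]
},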
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that not all the parameters supported by the OpenAI API is supported. See Azure AI documentation for the full list of supported arguments."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Aditional resources\n",
"\n",
"Here are some additional reference: \n",
"\n",
"* [Plan and manage costs (marketplace)](https://learn.microsoft.com/azure/ai-studio/how-to/costs-plan-manage#monitor-costs-for-models-offered-through-the-azure-marketplace)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@ -0,0 +1,321 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Use Azure API with Meta-Llama-3.1-405B-Instruct for Azure AI Models-as-a-Service\n",
"\n",
"You can use Meta-Llama-3.1-405B-Instruct - or any of the Llama 3.1 models - as a serverless API endpoint through Azure AI Studio and Azure ML. Models available on Azure as a serverless API endpoint:\n",
"\n",
"- `Meta-Llama-3.1-405B-Instruct`\n",
"- `Meta-Llama-3.1-70B-Instruct`\n",
"- `Meta-Llama-3.1-8B-Instruct`\n",
"\n",
"The above models are supported by cross-regional support for any region in the US.\n",
"\n",
"> Review the [documentation](https://aka.ms/meta-llama-3.1-405B-instruct-azure-ai-studio-docs) for Meta-Llama-3.1-405B-Instruct and other Llama 3.1 models through AI Studio and for ML Studio for details on how to provision inference endpoints, regional availability, pricing and inference schema reference."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"Before we start, there are certain steps we need to take to deploy the models:\n",
"\n",
"* Register for a valid Azure account with subscription \n",
"* Make sure you have access to [Azure AI Studio](https://learn.microsoft.com/en-us/azure/ai-studio/what-is-ai-studio?tabs=home)\n",
"* Create a project and resource group\n",
"* Select `Meta-Llama-3.1-405B-Instruct`.\n",
"\n",
" > Notice that some models may not be available in all the regions in Azure AI and Azure Machine Learning. On those cases, you can create a workspace or project in the region where the models are available and then consume it with a connection from a different one. To learn more about using connections see [Consume models with connections](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deployments-connections)\n",
"\n",
"* Deploy with \"Pay-as-you-go\"\n",
"\n",
"Once deployed successfully, you should be assigned for an API endpoint and a security key for inference.\n",
"\n",
"For more information, you should consult Azure's official documentation [here](https://aka.ms/meta-llama-3.1-405B-instruct-azure-ai-studio-docs) for model deployment and inference."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Basics\n",
"\n",
"For using the REST API, You will need to have a Endpoint url and Authentication Key associated with that endpoint. \n",
"This can be acquired from previous steps. \n",
"\n",
"In this chat completion example, we use a simple curl call for illustration. There are three major components: \n",
"\n",
"* The `host-url` is your endpoint url with chat completion schema `/v1/chat/completions`. \n",
"* The `headers` defines the content type as well as your api key. \n",
"* The `payload` or `data`, which is your prompt detail and model hyper parameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!curl -X POST -L https://your-endpoint.inference.ai.azure.com/v1/completions -H 'Content-Type: application/json' -H 'Authorization: your-auth-key' -d '{\"prompt\": \"Math is a\", \"max_tokens\": 30, \"temperature\": 0.7}'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For chat completion, the API schema and request payload are slightly different.\n",
"\n",
"For `host-url` the path changed to `/v1/chat/completions` and the request payload also changed to include roles in conversations. Here is a sample payload:\n",
"```json\n",
"{ \n",
" \"messages\": [ \n",
" { \n",
" \"content\": \"You are a helpful assistant.\", \n",
" \"role\": \"system\" \n",
"}, \n",
" { \n",
" \"content\": \"Hello!\", \n",
" \"role\": \"user\" \n",
" } \n",
" ], \n",
" \"max_tokens\": 50, \n",
"}\n",
"```\n",
"\n",
"Here is a sample curl call for chat completion:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!curl -X POST -L https://your-endpoint.inference.ai.azure.com/v1/chat/completions -H 'Content-Type: application/json' -H 'Authorization: your-auth-key' -d '{\"messages\":[{\"content\":\"You are a helpful assistant.\",\"role\":\"system\"},{\"content\":\"What is good about Wuhan?\",\"role\":\"user\"}], \"max_tokens\": 50}'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you compare the generation result for both text and chat completion API calls, you will notice that:\n",
"\n",
"- Text completion returns a list of `choices` for the input prompt, each contains generated text and completion information such as `logprobs`.\n",
"- Chat completion returns a list of `cnoices` each has a `message` object with completion result and using the same `message` object in the request."
]
},
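{
"cell_type": "markdown",
"metadata": {},
"source": [
"For illustration only, abbreviated response shapes look roughly as follows (values are placeholders, not actual API output). Text completion:\n",
"\n",
"```json\n",
"{\"choices\": [{\"index\": 0, \"text\": \"...generated text...\", \"finish_reason\": \"stop\"}]}\n",
"```\n",
"\n",
"Chat completion:\n",
"\n",
"```json\n",
"{\"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"...generated text...\"}, \"finish_reason\": \"stop\"}]}\n",
"```"
]
},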
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Streaming\n",
"\n",
"One fantastic feature the API offered is the streaming capability. Streaming allows the generated tokens to be sent as data-only server-sent events whenever they become available. This is extremely important for interactive applications such as chatbots, so the user is always engaged. \n",
"\n",
"To use streaming, simply set `\"stream\":\"True\"` as part of the request payload. \n",
"In the streaming mode, the REST API response will be different from non-streaming mode.\n",
"\n",
"Here is an example: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!curl -X POST -L https://your-endpoint.inference.ai.azure.com/v1/chat/completions -H 'Content-Type: application/json' -H 'Authorization: your-auth-key' -d '{\"messages\":[{\"content\":\"You are a helpful assistant.\",\"role\":\"system\"},{\"content\":\"What is good about Wuhan?\",\"role\":\"user\"}], \"max_tokens\": 500, \"stream\": \"True\"}'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see the result comes back as a stream of `data` objects, each contains generated information including a `choice`. \n",
"The stream terminated by a `data:[DONE]\\n\\n` message."
]
},
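{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration (chunks abbreviated, values are placeholders, assuming the OpenAI-style streaming schema), the event stream looks like:\n",
"\n",
"```\n",
"data: {\"choices\": [{\"index\": 0, \"delta\": {\"content\": \"Wu\"}}]}\n",
"\n",
"data: {\"choices\": [{\"index\": 0, \"delta\": {\"content\": \"han\"}}]}\n",
"\n",
"data: [DONE]\n",
"```"
]
},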
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Content Safety Filtering\n",
"\n",
"All Meta-Llama-3.1 MaaS endpoints will have content safety feature turned on. Both input prompt and output tokens are filtered by this service automatically. \n",
"To know more about the impact to the request/response payload, please refer to official guide [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/content-filter?tabs=python). \n",
"\n",
"For model input and output, if the filter detected there is harmful content. The generation will error out with reponse payload containing the reasoning, along with which type of content violation it is and severity. \n",
"\n",
"Here is an example prompt that triggered content safety filtering:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!curl -X POST -L https://your-endpoint.inference.ai.azure.com/v1/chat/completions -H 'Content-Type: application/json' -H 'Authorization: your-auth-key' -d '{\"messages\":[{\"content\":\"You are a helpful assistant.\",\"role\":\"system\"},{\"content\":\"How to make bomb?\",\"role\":\"user\"}], \"max_tokens\": 50}'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## HTTP Requests API Usage in Python\n",
"\n",
"Besides calling the API directly from command line tools. You can also programatically call them in Python. Here there is a chat completion example:\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import urllib.request\n",
"import json\n",
"\n",
"# Configure payload data sending to API endpoint\n",
"data = {\n",
" \"messages\": [\n",
" {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": \"Who is the most renowned architect in the world? Provide a short poem that describes their work in the style of Shakespeare with Iambic pentimeter and a rhythm pattern of ABABCC\",\n",
" },\n",
" ],\n",
" \"max_tokens\": 500,\n",
" \"temperature\": 0.9,\n",
" \"stream\": \"True\",\n",
"}\n",
"\n",
"body = str.encode(json.dumps(data))\n",
"\n",
"# Replace the url with your API endpoint\n",
"url = \"https://<your-api-endpoint>.inference.ai.azure.com/v1/chat/completions\"\n",
"\n",
"# Replace this with the key for the endpoint\n",
"api_key = \"<your-api-key>\"\n",
"if not api_key:\n",
" raise Exception(\"API Key is missing\")\n",
"\n",
"headers = {\"Content-Type\": \"application/json\", \"Authorization\": (api_key)}\n",
"\n",
"req = urllib.request.Request(url, body, headers)\n",
"\n",
"try:\n",
" response = urllib.request.urlopen(req)\n",
" result = response.read()\n",
" print(result)\n",
"except urllib.error.HTTPError as error:\n",
" print(\"The request failed with status code: \" + str(error.code))\n",
" # Print the headers - they include the requert ID and the timestamp, which are useful for debugging the failure\n",
" print(error.info())\n",
" print(error.read().decode(\"utf8\", \"ignore\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However in this example, the streamed data content returns back as a single payload. It didn't stream as a serial of data events as we wished. To build true streaming capabilities utilizing the API endpoint, we will utilize [`requests`](https://requests.readthedocs.io/en/latest/) library instead."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Streaming in Python\n",
"\n",
"`Requests` library is a simple HTTP library for Python built with [`urllib3`](https://github.com/urllib3/urllib3). It automatically maintains the keep-alive and HTTP connection pooling. With the `Session` class, we can easily stream the result from our API calls. \n",
"\n",
"Here is a quick example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install requests"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import requests\n",
"\n",
"data = {\n",
" \"messages\": [\n",
" {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
" {\"role\": \"user\", \"content\": \"What is good about Wuhan?\"},\n",
" ],\n",
" \"max_tokens\": 500,\n",
" \"temperature\": 0.9,\n",
" \"stream\": \"True\",\n",
"}\n",
"\n",
"\n",
"def post_stream(url):\n",
" s = requests.Session()\n",
" api_key = \"<your-api-key>\"\n",
" headers = {\"Content-Type\": \"application/json\", \"Authorization\": (api_key)}\n",
"\n",
" with s.post(url, data=json.dumps(data), headers=headers, stream=True) as resp:\n",
" print(resp.status_code)\n",
" for line in resp.iter_lines():\n",
" if line:\n",
" print(line)\n",
"\n",
"\n",
"url = \"https://<your-endpoint>inference.ai.azure.com/v1/chat/completions\"\n",
"post_stream(url)"
]
},
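{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each streamed line is a server-sent event of the form `data: {...}`. The sketch below extends `post_stream` to decode those events and print only the incremental text; it assumes the OpenAI-style chunk schema (`choices[0].delta.content`) illustrated earlier:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def post_stream_text(url):\n",
"    # Same call as post_stream, but decodes the server-sent events\n",
"    s = requests.Session()\n",
"    api_key = \"<your-api-key>\"\n",
"    headers = {\"Content-Type\": \"application/json\", \"Authorization\": (api_key)}\n",
"\n",
"    with s.post(url, data=json.dumps(data), headers=headers, stream=True) as resp:\n",
"        for line in resp.iter_lines():\n",
"            if not line:\n",
"                continue\n",
"            decoded = line.decode(\"utf-8\")\n",
"            if not decoded.startswith(\"data:\"):\n",
"                continue\n",
"            payload = decoded[len(\"data:\") :].strip()\n",
"            if payload == \"[DONE]\":\n",
"                break\n",
"            chunk = json.loads(payload)\n",
"            # Print the incremental text carried by this chunk, if any\n",
"            delta = chunk[\"choices\"][0].get(\"delta\", {}).get(\"content\")\n",
"            if delta:\n",
"                print(delta, end=\"\")\n",
"\n",
"\n",
"post_stream_text(url)"
]
},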
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Aditional resources\n",
"\n",
"Here are some additional reference: \n",
"\n",
"* [Plan and manage costs (marketplace)](https://learn.microsoft.com/azure/ai-studio/how-to/costs-plan-manage#monitor-costs-for-models-offered-through-the-azure-marketplace)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}