Add Autogen and RAG Agent to MSTICpy (#793)

* Add dependencies and ensure that .rst docs are included in build packages
* Establish dev environment and pass pre-commit checks
* Add unit test for finding .rst files, delete old Jupyter Notebook
* Add RAG Agent summary to gettingstarted, update %%ask docstrings
  - Moved example usage of %%ask to the docstring for the cell magic function
  - Added a section about the RAG agent and its usage to the Package Summary
  - Updated example notebook RagAgent.ipynb to remove unnecessary imports
* Ensure that RST files are returned as strings, not Pathlib.Path
* Read Autogen config from msticpyconfig.yaml
* Read Autogen config from msticpyconfig.yaml
* Inject API_KEY envar into config
* Silence group chat
* Reformat RAG Agent code snippets in PackageSummary.rst
* Reenable pre-commit check_reqs
* Fixes for Sync
* Abstract logic from magic class
* PR changes, refactoring magic class
* Formatting fixes, remove unused arguments
* minor fixes
* Fix some pipeline errors
* Pipeline fixes
* Fix test for pkg import failure for autogen
* Update test module names to align with recent changes
* Remove explicit TypeAlias to support Py3.8
* py3.8 issues with union operator backport from future
* Fix imports and type annotations for pipeline
* Fix mypy type issues
* Add importlib-resources to requirements for py3.8
* fix isort
* Add importlib-resources to requirements-all.txt
* Add importlib_resources to EXTRAS
* Add importlib-resources to conda reqs
* Update pkg_import test to handle required python versions
* Fix python version check
* API docs generated for aiagents
  Also fixing a couple of errors in PackageSummary.rst
* Remove extra test file

---------

Co-authored-by: Eileen Garcia <t-egarcia@microsoft.com>
Co-authored-by: Ryan Cobb <ryan@detect.dev>
Co-authored-by: ryan-detect-dot-dev <106877876+ryan-detect-dot-dev@users.noreply.github.com>
Co-authored-by: ianhelle <ianhelle@microsoft.com>
2024-08-07 12:15:34 -07:00 · 2024-08-07 12:15:34 -07:00 · 032c1080cc
--- a/conda/conda-reqs-pip.txt
+++ b/conda/conda-reqs-pip.txt
@@ -7,3 +7,5 @@ passivetotal>=2.5.3
 sumologic-sdk>=0.1.11
 splunk-sdk>=1.6.0,!=2.0.0
 packaging>=24.0
+pyautogen[retrievechat]>=0.2.28
+importlib-resources >= 6.4.0; python_version <= "3.8"
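Note on the `importlib-resources` pin above: the marker restricts the backport to Python 3.8, where the standard-library `importlib.resources.files` API is not available. The sketch below shows one way the bundled `.rst` docs could be located and returned as strings rather than `pathlib.Path` objects, as the commit message describes. The helper names `package_root` and `find_rst_files` are assumptions made for illustration; they are not taken from this commit.

```python
import sys
from pathlib import Path
from typing import List

# importlib.resources.files() is in the standard library from Python 3.9;
# on Python 3.8 the importlib_resources backport offers the same API.
if sys.version_info >= (3, 9):
    from importlib import resources as pkg_resources
else:  # only reached on Python 3.8
    import importlib_resources as pkg_resources


def package_root(package: str) -> Path:
    """Return the on-disk folder of an installed, unzipped package (hypothetical helper)."""
    return Path(str(pkg_resources.files(package)))


def find_rst_files(root: Path) -> List[str]:
    """Return every .rst file under `root` as a sorted list of path strings (hypothetical helper)."""
    return sorted(str(path) for path in root.rglob("*.rst"))
```

Returning plain strings keeps the result usable by callers that expect string paths, which is presumably why the commit converts away from `Path`.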
--- a/docs/notebooks/RagAgent.ipynb
+++ b/docs/notebooks/RagAgent.ipynb
@@ -0,0 +1,716 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Example Notebook for RAG (Retrieval-Augmented Generation) Agent Usage"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Query the RAG agent using the cell magic `%%ask` command"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# %load_ext msticpy.aiagents.mp_docs_rag_magic\n",
+    "# Or use:\n",
+    "%reload_ext msticpy.aiagents.mp_docs_rag_magic"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "2024-07-30 15:48:19,414 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - \u001b[32mUse the existing collection `MSTICpy_Docs_2.12.0`.\u001b[0m\n",
+      "2024-07-30 15:48:27,518 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Found 384 chunks.\u001b[0m\n"
+     ]
+    },
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Question**: What are the three things that I need to connect to Microsoft Sentinel Query Provider?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Answer**: To connect to the Microsoft Sentinel Query Provider, you need the following three things:\n",
+       "\n",
+       "1. A `QueryProvider` instance.\n",
+       "2. The data environment string (\"MSSentinel\" for Microsoft Sentinel).\n",
+       "3. A connection string or authentication parameters.\n",
+       "\n",
+       "Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\data_acquisition\\DataProviders.rst"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%ask \n",
+    "What are the three things that I need to connect to Microsoft Sentinel Query Provider?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Question**: How do I connect to the M365 Defender query provider?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Answer**: To connect to the M365 Defender query provider, you need to follow these steps:\n",
+       "\n",
+       "1. Ensure your connection details are specified in the `msticpyconfig.yaml` file.\n",
+       "\n",
+       "2. Create a `QueryProvider` instance for M365 Defender.\n",
+       "\n",
+       "3. Call the `connect()` method on the instance.\n",
+       "\n",
+       "Here's an example:\n",
+       "\n",
+       "```python\n",
+       "from msticpy.data import QueryProvider\n",
+       "\n",
+       "# Create a QueryProvider instance\n",
+       "mdatp_prov = QueryProvider(\"M365D\")\n",
+       "\n",
+       "# Connect to the M365 Defender instance using the configured details\n",
+       "mdatp_prov.connect()\n",
+       "```\n",
+       "\n",
+       "If you have multiple instances configured, specify the instance name when calling `connect()`:\n",
+       "\n",
+       "```python\n",
+       "mdatp_prov.connect(instance=\"Tenant2\")\n",
+       "```\n",
+       "\n",
+       "If you prefer to pass connection parameters directly, use keyword arguments:\n",
+       "\n",
+       "```python\n",
+       "# Collect credentials\n",
+       "ten_id = input('Tenant ID')\n",
+       "client_id = input('Client ID')\n",
+       "client_secret = input('Client Secret')\n",
+       "\n",
+       "# Create a QueryProvider instance\n",
+       "mdatp_prov = QueryProvider('M365D')\n",
+       "\n",
+       "# Connect using collected credentials\n",
+       "mdatp_prov.connect(tenant_id=ten_id, client_id=client_id, client_secret=client_secret)\n",
+       "```\n",
+       "\n",
+       "Alternatively, you can use a connection string:\n",
+       "\n",
+       "```python\n",
+       "# Define a connection string\n",
+       "conn_str = (\n",
+       "    \"tenant_id='243bb6be-4136-4b64-9055-fb661594199a'; \"\n",
+       "    \"client_id='a5b24e23-a96a-4472-b729-9e5310c83e20'; \"\n",
+       "    \"client_secret='[PLACEHOLDER]'\"\n",
+       ")\n",
+       "\n",
+       "# Create a QueryProvider instance\n",
+       "mdatp_prov = QueryProvider('M365D')\n",
+       "\n",
+       "# Connect using the connection string\n",
+       "mdatp_prov.connect(conn_str)\n",
+       "```\n",
+       "\n",
+       "Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\data_acquisition\\DataProv-MSDefender.rst"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%ask\n",
+    "How do I connect to the M365 Defender query provider?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Question**: What do I need to add to my msticpyconfig.yaml config for the Azure Resource Graph query provider?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Answer**: To add Azure Resource Graph to your `msticpyconfig.yaml` configuration, include the following under the `Azure` section:\n",
+       "\n",
+       "```yaml\n",
+       "Azure:\n",
+       "  auth_methods:\n",
+       "  - cli\n",
+       "  - interactive\n",
+       "  cloud: global\n",
+       "```\n",
+       "\n",
+       "For more information on configuring `msticpyconfig.yaml`, refer to the MSTICPy documentation.\n",
+       "\n",
+       "Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\data_acquisition\\ResourceGraphDriver.rst"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%ask\n",
+    "What do I need to add to my msticpyconfig.yaml config for the Azure Resource Graph query provider?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### A response of `UPDATE CONTEXT` indicates that the agents are unable to answer the query with the information retrieved by the RAG agent."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Question**: Does the Splunk query provider support device code authentication?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Answer**: UPDATE CONTEXT"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%ask\n",
+    "Does the Splunk query provider support device code authentication?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Question**: How can I plot IP addresses in this dataframe on a map?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Answer**: To plot IP addresses in a DataFrame on a map using MSTICpy's FoliumMap, you can use the `mp_plot.folium_map` pandas accessor. Here's an example:\n",
+       "\n",
+       "```python\n",
+       "# Plotting IP addresses using the mp_plot.folium_map accessor\n",
+       "geo_loc_df.mp_plot.folium_map(ip_column=\"IPAddress\")\n",
+       "```\n",
+       "\n",
+       "This will display an interactive map with markers based on the IP addresses in the \"IPAddress\" column of your DataFrame.\n",
+       "\n",
+       "Sources: C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\visualization\\\\FoliumMap.rst"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%ask \n",
+    "How can I plot IP addresses in this dataframe on a map?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Question**: How do I create a new custom data provider with msticpy?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Answer**: To create a new custom data provider with MSTICpy, follow these main steps:\n",
+       "\n",
+       "1. **Write the driver class:** Derive it from `DriverBase` and implement the methods `__init__`, `connect`, `query`, and optionally `query_with_results`.\n",
+       "2. **Customize the driver (optional):** Expose attributes via `QueryProvider`, and implement custom parameter formatting and query parameter substitution if needed.\n",
+       "3. **Register the driver:** Update the `DataEnvironment` enum and add an entry to the driver dynamic load table.\n",
+       "4. **Add queries:** Create a folder named after your `DataEnvironment` and add your query files there.\n",
+       "5. **Add settings definition:** Define settings in a YAML configuration file.\n",
+       "6. **Create documentation:** Document the configuration and use of the data provider.\n",
+       "7. **Create unit tests:** Add unit tests using mocks to simulate service responses.\n",
+       "\n",
+       "For detailed guidance on these steps, refer to the provided MSTICpy documentation related to data providers.\n",
+       "\n",
+       "Sources: WritingDataProviders.rst, PluginFramework.rst, ExtendingMsticpy.rst"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%ask \n",
+    "How do I create a new custom data provider with msticpy?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Question**: How do I list which TI providers are currently enabled?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Answer**: ### Step 1: Intent\n",
+       "The user's intent is to get help with **question answering**.\n",
+       "\n",
+       "### Step 2: Answer\n",
+       "To list which Threat Intelligence (TI) providers are currently enabled in MSTICpy, you can inspect the configuration typically found in the `msticpyconfig.yaml` file under the `TIProviders` section. This configuration file determines which providers are set up and whether they are marked as primary/secondary.\n",
+       "\n",
+       "Sources: `C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\extending\\\\WritingTIAndContextProviders.rst`"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%ask \n",
+    "How do I list which TI providers are currently enabled?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Question**: How do I lookup threat intelligence for multiple IP addresses at once?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Answer**: Step 1: User's intent is to generate code for performing threat intelligence lookups for multiple IP addresses at once.\n",
+       "\n",
+       "Step 2:\n",
+       "```python\n",
+       "from msticpy.context.ip_utils import ip_whois\n",
+       "\n",
+       "# List of IP addresses to lookup\n",
+       "ip_list = [\"123.1.2.3\", \"124.5.6.7\"]\n",
+       "\n",
+       "# Performing Whois lookup for multiple IP addresses\n",
+       "whois_data = ip_whois(ip_list)\n",
+       "print(whois_data)\n",
+       "```\n",
+       "\n",
+       "Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\data_acquisition\\IPWhois.rst"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%ask \n",
+    "How do I lookup threat intelligence for multiple IP addresses at once?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Question**: How do I use pivot functions?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Answer**: To use pivot functions in MSTICpy, you have two primary options: creating persistent pivot function definitions in YAML files or adding ad hoc pivot functions directly in code. Here's a brief overview of both methods:\n",
+       "\n",
+       "**1. Persistent Pivot Function Definitions**\n",
+       "\n",
+       "- Define your pivot function properties in a YAML file with a top-level element `pivot_providers`.\n",
+       "- Example YAML definition:\n",
+       "\n",
+       "```yaml\n",
+       "pivot_providers:\n",
+       "  who_is:\n",
+       "    src_module: msticpy.context.ip_utils\n",
+       "    src_func_name: get_whois_df\n",
+       "    func_new_name: whois\n",
+       "    input_type: dataframe\n",
+       "    entity_map:\n",
+       "      IpAddress: Address\n",
+       "    func_df_param_name: data\n",
+       "    func_df_col_param_name: ip_column\n",
+       "    func_out_column_name: query\n",
+       "    func_static_params:\n",
+       "      all_columns: True\n",
+       "      show_progress: False\n",
+       "    func_input_value_arg: ip_address\n",
+       "```\n",
+       "\n",
+       "- Load and register the definition using:\n",
+       "\n",
+       "```python\n",
+       "from msticpy.init.pivot_core.pivot import Pivot\n",
+       "Pivot.register_pivot_providers(pivot_reg_path=path_to_your_yaml, namespace=globals(), def_container=\"my_container\", force_container=True)\n",
+       "```\n",
+       "\n",
+       "**2. Ad Hoc Pivot Functions in Code**\n",
+       "\n",
+       "- Add a function as a pivot using the `add_pivot_function` method:\n",
+       "\n",
+       "```python\n",
+       "def my_func(input: str):\n",
+       "    return input.upper()\n",
+       "\n",
+       "Pivot.add_pivot_function(\n",
+       "    func=my_func,\n",
+       "    container=\"change_case\",\n",
+       "    input_type=\"value\",\n",
+       "    entity_map={\"Host\": \"HostName\"},\n",
+       "    func_input_value_arg=\"input\",\n",
+       "    func_new_name=\"upper_name\",\n",
+       ")\n",
+       "```\n",
+       "\n",
+       "- Alternatively, use the `PivotRegistration` class:\n",
+       "\n",
+       "```python\n",
+       "from msticpy.init.pivot_core.pivot_register import PivotRegistration\n",
+       "\n",
+       "def my_func(input: str):\n",
+       "    return input.upper()\n",
+       "\n",
+       "piv_reg = PivotRegistration(\n",
+       "    input_type=\"value\",\n",
+       "    entity_map={\"Host\": \"HostName\"},\n",
+       "    func_input_value_arg=\"input\",\n",
+       "    func_new_name=\"upper_name\"\n",
+       ")\n",
+       "Pivot.add_pivot_function(my_func, piv_reg, container=\"change_case\")\n",
+       "```\n",
+       "\n",
+       "**Running Pivots in DataFrame Pipelines:**\n",
+       "\n",
+       "```python\n",
+       "(\n",
+       "    my_df\n",
+       "    .query(\"UserCount > 1\")\n",
+       "    .mp_pivot.run(IpAddress.util.whois, column=\"Ioc\")\n",
+       "    .drop_duplicates()\n",
+       ")\n",
+       "```\n",
+       "\n",
+       "- Use `mp_pivot.run` to integrate pivot functions into DataFrame processing pipelines.\n",
+       "- Join input and output DataFrames with the `join` parameter in `mp_pivot.run`.\n",
+       "\n",
+       "**Debugging Tools:**\n",
+       "\n",
+       "- `mp_pivot.display` for intermediate results.\n",
+       "- `mp_pivot.tee` for creating snapshots.\n",
+       "- `mp_pivot.tee_exec` for executing intermediate operations (e.g., plotting).\n",
+       "\n",
+       "Sources: C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\extending\\\\PivotFunctions.rst, C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\data_analysis\\\\PivotFunctions.rst, C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\api\\\\msticpy.init.pivot.rst"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%ask \n",
+    "How do I use pivot functions?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Question**: Which columns do I need in a dataframe to plot process trees?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Answer**: To plot process trees, the required columns in a DataFrame are typically:\n",
+       "\n",
+       "1. `ParentProcessName`\n",
+       "2. `Process`\n",
+       "\n",
+       "Additional attributes such as `SubjectUserName`, `SubjectDomainName`, `SubjectLogonId`, `NewProcessName`, `CommandLine`, and `TimeGenerated` can be used for more detailed visualization and analysis.\n",
+       "\n",
+       "Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\visualization\\NetworkGraph.rst"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%ask \n",
+    "Which columns do I need in a dataframe to plot process trees?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Question**: What kind of visualizations does msticpy support?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Answer**: MSTICPy supports various visualizations including interactive timelines, process trees, multi-dimensional Morph Charts, data viewers, matrix plots, network plots, and several others listed under the `msticpy.vis` package.\n",
+       "\n",
+       "Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\index.rst, C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\visualization\\MorphCharts.rst"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%ask \n",
+    "What kind of visualizations does msticpy support?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Question**: How do I add a new query for Microsoft 365 Defender to msticpy?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Answer**: To add a new query for Microsoft 365 Defender (M365D) to MSTICPy, you should use the `QueryProvider` class. Here's a step-by-step guide on how to achieve it:\n",
+       "\n",
+       "1. **Initialize the `QueryProvider` for M365D**:\n",
+       "   ```python\n",
+       "   from msticpy.data import QueryProvider\n",
+       "\n",
+       "   mdatp_prov = QueryProvider(\"M365D\")\n",
+       "   ```\n",
+       "\n",
+       "2. **Connect to the M365 Defender API**:\n",
+       "   ```python\n",
+       "   mdatp_prov.connect()\n",
+       "   ```\n",
+       "\n",
+       "3. **Add your new query**:\n",
+       "   You can add new queries to the query store of `QueryProvider`. Here’s an example of how to define and add a new query:\n",
+       "   ```python\n",
+       "   new_query = \"\"\"\n",
+       "   DeviceEvents\n",
+       "   | where ActionType == \"FileCreated\"\n",
+       "   | limit 10\n",
+       "   \"\"\"\n",
+       "   mdatp_prov.add_query(\"GetRecentFileCreatedEvents\", new_query)\n",
+       "   ```\n",
+       "\n",
+       "4. **Run the newly added query**:\n",
+       "   ```python\n",
+       "   results = mdatp_prov.exec_query(\"GetRecentFileCreatedEvents\")\n",
+       "   print(results)\n",
+       "   ```\n",
+       "\n",
+       "In summary, you need to instantiate a `QueryProvider` object for M365D, connect to the API, add the new query, and then execute the query.\n",
+       "\n",
+       "Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\data_acquisition\\DataProv-MSDefender.rst"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%ask \n",
+    "How do I add a new query for Microsoft 365 Defender to msticpy?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Question**: Which msticpy module contains the code related to visualizing network graphs?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/markdown": [
+       "\n",
+       "**Answer**: The MSTICpy module that contains the code related to visualizing network graphs is `msticpy.vis.network_plot`.\n",
+       "\n",
+       "Sources: C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\api\\\\msticpy.vis.network_plot.rst"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%ask\n",
+    "Which msticpy module contains the code related to visualizing network graphs?"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "internshipenv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.14"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
--- a/docs/source/api/msticpy.aiagents.config_utils.rst
+++ b/docs/source/api/msticpy.aiagents.config_utils.rst
@@ -0,0 +1,7 @@
+msticpy.aiagents.config\_utils module
+=====================================
+
+.. automodule:: msticpy.aiagents.config_utils
+   :members:
+   :undoc-members:
+   :show-inheritance:
--- a/docs/source/api/msticpy.aiagents.mp_docs_rag_magic.rst
+++ b/docs/source/api/msticpy.aiagents.mp_docs_rag_magic.rst
@@ -0,0 +1,7 @@
+msticpy.aiagents.mp\_docs\_rag\_magic module
+============================================
+
+.. automodule:: msticpy.aiagents.mp_docs_rag_magic
+   :members:
+   :undoc-members:
+   :show-inheritance:
--- a/docs/source/api/msticpy.aiagents.rag_agents.rst
+++ b/docs/source/api/msticpy.aiagents.rag_agents.rst
@@ -0,0 +1,7 @@
+msticpy.aiagents.rag\_agents module
+===================================
+
+.. automodule:: msticpy.aiagents.rag_agents
+   :members:
+   :undoc-members:
+   :show-inheritance:
--- a/docs/source/api/msticpy.aiagents.rst
+++ b/docs/source/api/msticpy.aiagents.rst
@@ -0,0 +1,17 @@
+msticpy.aiagents package
+========================
+
+.. automodule:: msticpy.aiagents
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+Submodules
+----------
+
+.. toctree::
+   :maxdepth: 4
+
+   msticpy.aiagents.config_utils
+   msticpy.aiagents.mp_docs_rag_magic
+   msticpy.aiagents.rag_agents
--- a/docs/source/api/msticpy.context.tiproviders.binaryedge.rst
+++ b/docs/source/api/msticpy.context.tiproviders.binaryedge.rst
@@ -0,0 +1,7 @@
+msticpy.context.tiproviders.binaryedge module
+=============================================
+
+.. automodule:: msticpy.context.tiproviders.binaryedge
+   :members:
+   :undoc-members:
+   :show-inheritance:
--- a/docs/source/api/msticpy.context.tiproviders.rst
+++ b/docs/source/api/msticpy.context.tiproviders.rst
@@ -15,6 +15,7 @@ Submodules
   msticpy.context.tiproviders.abuseipdb
   msticpy.context.tiproviders.alienvault_otx
   msticpy.context.tiproviders.azure_sent_byoti
+   msticpy.context.tiproviders.binaryedge
   msticpy.context.tiproviders.crowdsec
   msticpy.context.tiproviders.greynoise
   msticpy.context.tiproviders.ibm_xforce
--- a/docs/source/api/msticpy.rst
+++ b/docs/source/api/msticpy.rst
@@ -12,6 +12,7 @@ Subpackages
 .. toctree::
   :maxdepth: 4

+   msticpy.aiagents
   msticpy.analysis
   msticpy.auth
   msticpy.common
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -217,6 +217,7 @@ autoapi_dirs = ["../../msticpy"]

 autodoc_mock_imports = [
    "adal",
+    "autogen",
    "azure",
    "azure.common.exceptions",
    "azure.core.exceptions",
@@ -245,6 +246,7 @@ autodoc_mock_imports = [
    "dns",
    "folium",
    "geoip2",
+    "importlib_resources",
    "ipwhois",
    "IPython",
    "ipywidgets",
--- a/docs/source/getting_started/PackageSummary.rst
+++ b/docs/source/getting_started/PackageSummary.rst
@@ -318,7 +318,7 @@ of known malicious commands as well as suspicious patterns of behaviour.
 domain_utils
 ~~~~~~~~~~~~

-:py:mod:`msticpy.transform.domain_utils`
+:py:mod:`msticpy.context.domain_utils`

 Module to support investigation of domain names and URLs with functions to
 validate a domain name and screenshot a URL.
@@ -334,6 +334,68 @@ without compromising privacy.

 See :doc:`../data_acquisition/DataMasking`

+
+
+
+Agentic Module
+-----------------
+aiagents Module: RAG Agent
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+:py:mod:`msticpy.aiagents.mp_docs_rag_magic`
+
+The **aiagents** module in MSTICpy introduces the **Retrieval-Augmented Generation (RAG) Agent**.
+This agent is designed for Q&A about MSTICpy. It enhances InfoSec investigations and
+threat hunting in Jupyter Notebooks by providing contextually relevant responses using large language
+models and external knowledge retrieval from the MSTICpy documentation.
+
+You can invoke the RAG agent in a Jupyter Notebook by adding the following:
+
+.. code-block:: ipython
+
+   # Load the RAG cell magic
+   %load_ext msticpy.aiagents.mp_docs_rag_magic
+
+.. code-block:: ipython
+
+   # The following command should be in a separate cell and be the first line of the cell
+   %%ask
+   What are the three things that I need to connect to Azure Query Provider?
+
+You can also invoke the RAG Agent outside of a Jupyter Notebook as follows:
+
+.. code-block:: python
+
+   # Import the RAG agent helpers
+   from msticpy.aiagents.rag_agents import (
+      ask_question,
+      get_retrieval_assistant_agent,
+      get_retrieval_user_proxy_agent,
+   )
+
+   # Initialize the agents
+   assistant_agent = get_retrieval_assistant_agent()
+   user_proxy_agent = get_retrieval_user_proxy_agent()
+
+   # Define your question
+   question = "Your MSTICpy-related question here"
+
+   # Query the agents
+   response = ask_question(
+      assistant_agent,
+      user_proxy_agent,
+      question=question,
+   )
+
+   print(response.summary)
+
+
+See also:
+
+- :doc:`Autogen LLM Configurations in msticpyconfig <../getting_started/msticpyconfig>`
+- Sample notebook - `RAG Agent Example Notebook <https://github.com/microsoft/msticpy/blob/master/docs/notebooks/RagAgent.ipynb>`__
+
+
 Supported Platforms and Packages
 --------------------------------

@@ -341,3 +403,4 @@ Supported Platforms and Packages
 -  Requires Python 3.8 or later
 -  See `requirements.txt <https://github.com/microsoft/msticpy/blob/master/requirements.txt>`__
   for more details and version requirements.
+
--- a/docs/source/getting_started/msticpyconfig.rst
+++ b/docs/source/getting_started/msticpyconfig.rst
@@ -171,6 +171,42 @@ as geo-location providers.
          DBFolder: "~/.msticpy"
        Provider: "GeoLiteLookup"

+Autogen
+~~~~~~~
+This section allows you to specify `settings for the autogen module
+<https://microsoft.github.io/autogen/docs/topics/llm_configuration>`__.
+`Autogen` is a top-level section in the configuration file. This
+section is used as the `llm_config` argument when instantiating
+an agent from the `autogen` package.
+
+When using Entra ID authentication with Autogen, you can specify the
+`azure_ad_token_provider` as the string `token_provider`. This will
+automatically use `DefaultAzureCredential` to obtain bearer tokens
+rather than using an API key. This is the preferred method of
+authenticating to Azure OpenAI endpoints because there are no
+secrets to manage.
+
+If using an API key, you can specify `api_key` as the name of
+an environment variable.
+
+Here is an example with two entries for the same Azure OpenAI endpoint,
+one authenticating with Entra ID and one with an API key:
+
+.. code:: yaml
+
+  Autogen:
+    config_list:
+      - model: gpt-4o
+        api_type: azure
+        api_version: 2024-02-15-preview
+        base_url: https://my-aoai-endpoint.openai.azure.com/
+        azure_ad_token_provider: token_provider
+      - model: gpt-4o
+        api_type: azure
+        api_version: 2024-02-15-preview
+        base_url: https://my-aoai-endpoint.openai.azure.com/
+        api_key: SOME_ENVIRONMENT_VARIABLE
+
 DataProviders
 ~~~~~~~~~~~~~~
 This section is similar to the previous two sections, allowing you
--- a/msticpy/aiagents/__init__.py
+++ b/msticpy/aiagents/__init__.py
@ -0,0 +1,6 @@
+# -------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for
+# license information.
+# --------------------------------------------------------------------------
+"""Modules related to AI agents used in MSTICpy."""
--- a/msticpy/aiagents/config_utils.py
+++ b/msticpy/aiagents/config_utils.py
@ -0,0 +1,116 @@
+# -------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for
+# license information.
+# --------------------------------------------------------------------------
+"""Utility Modules related to AI agents used in MSTICpy."""
+
+import os
+from typing import Callable, Dict, List, Union
+
+from azure.identity import DefaultAzureCredential, get_bearer_token_provider
+
+from ..common.exceptions import MsticpyUserConfigError
+from ..common.pkg_config import get_config
+
+
+ConfigItem = Dict[str, Union[str, Callable]]
+ConfigList = List[ConfigItem]
+Config = Dict[str, Union[str, float, ConfigList]]
+
+token_provider = get_bearer_token_provider(
+    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
+)
+
+
+def inject_token_provider_callable(
+    config: ConfigItem,
+) -> ConfigItem:
+    """Replace autogen configuration `azure_ad_token_provider` with a token provider callable.
+
+    Parameters
+    ----------
+    config : ConfigItem
+        Autogen LLM configuration.
+
+    Returns
+    -------
+    ConfigItem
+        Autogen LLM configuration with the token provider callable.
+    """
+    if "azure_ad_token_provider" in config:
+        config["azure_ad_token_provider"] = token_provider
+
+    return config
+
+
+def inject_environment_variable(config: ConfigItem) -> ConfigItem:
+    """Replace autogen configuration `api_key` with the value of an environment variable.
+
+    Parameters
+    ----------
+    config : ConfigItem
+        Autogen LLM configuration.
+
+    Returns
+    -------
+    ConfigItem
+        Autogen LLM configuration with the environment variable value.
+    """
+    if "api_key" in config:
+        api_key = os.environ.get(str(config["api_key"]), None)
+        if not api_key:
+            raise MsticpyUserConfigError(
+                f"Environment variable {config['api_key']} specified, but not found!"
+            )
+        config["api_key"] = api_key
+
+    return config
+
+
+def get_autogen_config_from_msticpyconfig() -> Config:
+    """Get Autogen configuration from msticpyconfig.yaml.
+
+    See `https://microsoft.github.io/autogen/docs/topics/llm_configuration`
+    for more information on the structure of the LLM configuration object.
+
+    Please note that a configuration list is required, rather than a single configuration.
+
+    Returns
+    -------
+    Config
+        Autogen configuration.
+
+    Raises
+    ------
+    MsticpyUserConfigError
+        Autogen settings not found in msticpyconfig.yaml configuration
+    MsticpyUserConfigError
+        Autogen.config_list setting not found in msticpyconfig.yaml configuration
+    """
+    autogen_config = get_config("Autogen", None)
+    if not autogen_config:
+        raise MsticpyUserConfigError(
+            "Autogen settings not found in msticpyconfig.yaml configuration!"
+        )
+
+    if "config_list" not in autogen_config:
+        raise MsticpyUserConfigError(
+            "Autogen.config_list setting not found in msticpyconfig.yaml configuration!"
+        )
+
+    injectors = [
+        inject_token_provider_callable,
+        inject_environment_variable,
+    ]
+
+    config_list = []
+    for config in autogen_config["config_list"]:
+        for injector in injectors:
+            config = injector(config)
+        config_list.append(config)
+
+    return {
+        **autogen_config,
+        "config_list": config_list,
+    }
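
Reviewer sketch (not part of the patch): the environment-variable injection above can be exercised in isolation as follows. This is a minimal stand-alone approximation, assuming a hypothetical helper name `resolve_api_key` and a hypothetical variable name `DEMO_AOAI_KEY`. Unlike the patched function it returns a copy rather than mutating the entry, and it raises `ValueError` as a stand-in for `MsticpyUserConfigError`.

```python
import os
from typing import Any, Dict


def resolve_api_key(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with `api_key` replaced by the env var's value."""
    if "api_key" not in config:
        return dict(config)
    env_name = str(config["api_key"])
    value = os.environ.get(env_name)
    if not value:
        # The patch raises MsticpyUserConfigError here; ValueError is a stand-in.
        raise ValueError(f"Environment variable {env_name} specified, but not found!")
    return {**config, "api_key": value}


# Hypothetical variable name and placeholder value, for illustration only.
os.environ["DEMO_AOAI_KEY"] = "not-a-real-key"
entry = {"model": "gpt-4o", "api_type": "azure", "api_key": "DEMO_AOAI_KEY"}
resolved = resolve_api_key(entry)
```

Returning a copy keeps the parsed settings untouched, which may matter if the same configuration object is read more than once; whether that applies to `get_config` is not shown in this diff.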
--- a/msticpy/aiagents/mp_docs_rag_magic.py
+++ b/msticpy/aiagents/mp_docs_rag_magic.py
@ -0,0 +1,69 @@
+# -------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for
+# license information.
+# --------------------------------------------------------------------------
+"""Integrates MSTICpy with IPython custom magic commands for retrieval-augmented generation."""
+
+import io
+from contextlib import redirect_stdout
+
+from IPython.core.magic import Magics, cell_magic, magics_class
+from IPython.display import display_markdown
+
+from .rag_agents import (
+    ask_question,
+    get_retrieval_assistant_agent,
+    get_retrieval_user_proxy_agent,
+)
+
+
+@magics_class
+class DocsRagMagic(Magics):
+    """Implement a class to provide RAG Magic functionalities for MSTICpy."""
+
+    def __init__(self, shell):
+        super().__init__(shell)
+        self.assistant_agent = get_retrieval_assistant_agent()
+        self.user_proxy_agent = get_retrieval_user_proxy_agent()
+
+    @cell_magic
+    def ask(self, _, cell: str):
+        """
+        Enable the user to ask a question to the RAG agent using a cell magic function.
+
+        It calls the ask_question function with the cell content as the question.
+
+        Parameters
+        ----------
+        cell : str
+            The content of the cell. This is used as the question to ask the RAG agent.
+
+        Example Usage
+        -------------
+        To ask a question to the RAG agent, use the cell magic
+        command followed by the question in the cell.
+        For example:
+
+        %%ask
+        Which msticpy module contains the code related to visualizing network graphs?
+        """
+        question = cell.strip()
+        output = io.StringIO()
+        with redirect_stdout(output):
+            response = ask_question(
+                self.assistant_agent,
+                self.user_proxy_agent,
+                question=question,
+            )
+
+        answer = response.summary
+
+        display_markdown(f"\n**Question**: {question}", raw=True)
+        display_markdown(f"\n**Answer**: {answer}", raw=True)
+
+
+# Register the magic class with IPython
+def load_ipython_extension(ipython):
+    """Register the magic class with IPython."""
+    ipython.register_magics(DocsRagMagic)
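
Reviewer sketch (not part of the patch): the `%%ask` magic silences the agents' console chatter by wrapping the call in `redirect_stdout`. The pattern can be seen on its own below, assuming a hypothetical stand-in function `noisy_answer` in place of `ask_question`.

```python
import io
from contextlib import redirect_stdout


def noisy_answer(question: str) -> str:
    """Hypothetical stand-in for ask_question: prints chatter, returns an answer."""
    print("agent chatter that should not reach the notebook cell")
    return f"answer to: {question}"


buffer = io.StringIO()
with redirect_stdout(buffer):
    # Everything printed inside this block goes to `buffer`, not the console.
    answer = noisy_answer("Which module draws network graphs?")

captured = buffer.getvalue()
```

Note that `redirect_stdout` only captures writes to `sys.stdout`; anything written to stderr or through a logging handler would still be shown.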
--- a/msticpy/aiagents/rag_agents.py
+++ b/msticpy/aiagents/rag_agents.py
@ -0,0 +1,157 @@
+# -------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See License.txt in the project root for
+# license information.
+# --------------------------------------------------------------------------
+"""
+Module for MSTICpy documentation utilities and retrieval agent configuration.
+
+Includes functions to find documentation files and to set up retrieval
+agents that assist security analysts by answering questions based on MSTICpy documentation.
+"""
+
+import sys
+
+from pathlib import Path
+from typing import List, Optional
+
+from autogen.agentchat.chat import ChatResult
+from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
+from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent
+
+from .._version import VERSION
+from ..common.exceptions import MsticpyUserConfigError
+from .config_utils import get_autogen_config_from_msticpyconfig
+
+if sys.version_info < (3, 9):
+    import importlib_resources as pkg_resources
+else:
+    import importlib.resources as pkg_resources
+
+
+def find_rst_files() -> List[str]:
+    """
+    Find all .rst files in the docs/source directory of 'msticpy' package.
+
+    Returns
+    -------
+    list of str
+        List of paths to .rst files in the docs/source directory.
+    """
+    # Get the path to the docs/source directory of the package
+    docs_path = Path(pkg_resources.files("msticpy")).parent / "docs" / "source"
+
+    # Find all .rst files in the docs/source directory
+    rst_files = [str(fp) for fp in docs_path.rglob("*.rst")]
+
+    return rst_files
+
+
+def get_retrieval_assistant_agent(system_message: str = "") -> RetrieveAssistantAgent:
+    """
+    Create and return a RetrieveAssistantAgent.
+
+    Parameters
+    ----------
+    system_message : str, optional
+        Custom system message for the assistant.
+
+    Returns
+    -------
+    RetrieveAssistantAgent
+        Configured RetrieveAssistantAgent instance.
+    """
+    if not system_message:
+        system_message = (
+            "You are a helpful assistant to security analysts using MSTICpy."
+        )
+    return RetrieveAssistantAgent(
+        name="assistant",
+        system_message=system_message,
+        llm_config=get_autogen_config_from_msticpyconfig(),
+    )
+
+
+def get_retrieval_user_proxy_agent(
+    customized_prompt: Optional[str] = None,
+) -> RetrieveUserProxyAgent:
+    """
+    Create and return a RetrieveUserProxyAgent.
+
+    Parameters
+    ----------
+    customized_prompt : Optional[str], optional
+        Custom prompt for the assistant agent, by default None.
+
+    Returns
+    -------
+    RetrieveUserProxyAgent
+        Configured RetrieveUserProxyAgent instance.
+
+    Raises
+    ------
+    MsticpyUserConfigError
+        Autogen settings not found in msticpyconfig.yaml configuration
+    """
+    rst_files = find_rst_files()
+    autogen_config = get_autogen_config_from_msticpyconfig()
+
+    default_model = None
+    if "config_list" in autogen_config and isinstance(
+        autogen_config["config_list"], list
+    ):
+        if autogen_config["config_list"]:
+            default_config = autogen_config["config_list"][0]
+            if "model" in default_config:
+                default_model = default_config["model"]
+
+    if not default_model:
+        raise MsticpyUserConfigError(
+            "Could not find a valid default Autogen model in msticpyconfig.yaml configuration!"
+        )
+
+    return RetrieveUserProxyAgent(
+        name="ragproxyagent",
+        human_input_mode="NEVER",
+        max_consecutive_auto_reply=1,
+        is_termination_msg=lambda x: True,
+        retrieve_config={
+            "task": "default",
+            "docs_path": rst_files,
+            "chunk_token_size": 2000,
+            "customized_prompt": customized_prompt,
+            "model": default_model,
+            "vector_db": "chroma",
+            "collection_name": f"MSTICpy_Docs_{VERSION}",
+            "get_or_create": True,
+        },
+        code_execution_config=False,
+    )
+
+
+def ask_question(
+    assistant_agent: RetrieveAssistantAgent,
+    user_proxy_agent: RetrieveUserProxyAgent,
+    question: str,
+) -> ChatResult:
+    """
+    Ask a question using the assistant and user proxy agents.
+
+    Parameters
+    ----------
+    assistant_agent : RetrieveAssistantAgent
+        The assistant agent to use.
+    user_proxy_agent : RetrieveUserProxyAgent
+        The user proxy agent to use.
+    question : str
+        The question to ask.
+
+    Returns
+    -------
+    ChatResult
+        The result of the chat interaction.
+    """
+    assistant_agent.reset()
+    return user_proxy_agent.initiate_chat(
+        assistant_agent, message=user_proxy_agent.message_generator, problem=question
+    )
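
Reviewer sketch (not part of the patch): `get_retrieval_user_proxy_agent` takes the model name from the first `config_list` entry and fails if none is present. A compact, self-contained version of that lookup might look like the following; the helper name `first_model` is hypothetical and does not exist in the patch.

```python
from typing import Any, Dict, Optional


def first_model(autogen_config: Dict[str, Any]) -> Optional[str]:
    """Return the `model` of the first config_list entry, or None if unavailable."""
    config_list = autogen_config.get("config_list")
    if isinstance(config_list, list) and config_list:
        first = config_list[0]
        if isinstance(first, dict):
            return first.get("model")
    return None


with_model = first_model({"config_list": [{"model": "gpt-4o"}, {"model": "other"}]})
empty_list = first_model({"config_list": []})
no_list = first_model({"temperature": 0.0})
```

A caller would still need to raise an error when the result is `None`, as the patched function does.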
--- a/requirements-all.txt
+++ b/requirements-all.txt
@ -23,6 +23,7 @@ folium>=0.9.0
 geoip2>=2.9.0
 httpx>=0.23.0, <1.0.0
 html5lib
+importlib-resources >= 6.4.0; python_version <= "3.8"
 ipython >= 7.1.1; python_version < "3.8"
 ipython >= 7.23.1; python_version >= "3.8"
 ipywidgets>=7.4.2, <9.0.0
@ -43,6 +44,7 @@ packaging>=24.0
 pandas>=1.4.0, <3.0.0
 panel>=0.14.4
 passivetotal>=2.5.3
+pyautogen[retrievechat]>=0.2.28
 pydantic>=1.8.0, <3.0.0
 pygments>=2.0.0
 pyjwt>=2.3.0
--- a/requirements.txt
+++ b/requirements.txt
@ -16,6 +16,7 @@ folium>=0.9.0
 geoip2>=2.9.0
 httpx>=0.23.0, <1.0.0
 html5lib
+importlib-resources >= 6.4.0; python_version <= "3.8"
 ipython >= 7.1.1; python_version < "3.8"
 ipython >= 7.23.1; python_version >= "3.8"
 ipywidgets>=7.4.2, <9.0.0
--- a/setup.py
+++ b/setup.py
@ -53,6 +53,7 @@ EXTRAS = {
    "sql2kql": ["mo-sql-parsing>=8, <9.0.0"],
    "riskiq": ["passivetotal>=2.5.3"],
    "panel": ["panel>=0.14.4"],
+    "aiagents": ["pyautogen[retrievechat]>=0.2.28"],
 }
 extras_all = [
    extra for name, extras in EXTRAS.items() for extra in extras if name != "dev"
@ -73,5 +74,8 @@ EXTRAS["sentinel"] = sorted(_combine_extras(["azure", "kql", "keyvault"]))

 if __name__ == "__main__":
    setuptools.setup(
-        install_requires=INSTALL_REQUIRES, extras_require=EXTRAS, version=__version__
+        install_requires=INSTALL_REQUIRES,
+        extras_require=EXTRAS,
+        version=__version__,
+        package_data={"docs": ["msticpy/docs/source/**/*.rst"]},
    )
--- a/tests/aiagents/test_rag_agents.py
+++ b/tests/aiagents/test_rag_agents.py
@ -0,0 +1,18 @@
+from pathlib import Path
+
+from msticpy.aiagents.rag_agents import find_rst_files
+
+
+def test_find_rst_files_not_empty():
+    rst_files = find_rst_files()
+
+    # Assert that the list of .rst files is not empty
+    assert rst_files, "The list of .rst files is empty."
+
+
+def test_find_specific_rst_file():
+    # Optionally, we can also check for specific .rst files
+    rst_files = find_rst_files()
+
+    rst_file_names = [Path(file).name for file in rst_files]
+    assert "DataAcquisition.rst" in rst_file_names, "The specific file is not found."
--- a/tests/test_pkg_imports.py
+++ b/tests/test_pkg_imports.py
@ -27,6 +27,8 @@ EXTRAS_EXCEPTIONS = {
    "msticnb",
    "pygeohash",
    "pyperclip",
+    "autogen",
+    "importlib_resources",
 }
 CONDA_PKG_EXCEPTIONS = {
    "vt-py",
@ -69,7 +71,12 @@ def test_missing_pkgs_req():
    missing_reqs = {
        req.strip() for reqs in mod_imports.values() for req in reqs.missing_reqs
    }
-    missing_reqs = missing_reqs - EXTRAS_EXCEPTIONS
+    # Remove any missing modules that are part of an extra
+    missing_reqs = {
+        req
+        for req in missing_reqs
+        if not any(req.startswith(p) for p in EXTRAS_EXCEPTIONS)
+    }
    if missing_reqs:
        print(
            "Missing packages:\n",
@ -138,10 +145,10 @@ def test_conda_reqs(extras_from_setup):
            conda_reqs_pip_dict.pop(pkg_name)

    if conda_reqs_dict:
-        print("Extra items found in conda-reqs.txt", conda_reqs_pip_dict)
+        print("Extra items found in conda-reqs.txt", conda_reqs_dict)
    check.is_false(conda_reqs_dict, "no extra items in conda-reqs.txt")
    if conda_reqs_pip_dict:
-        print("Extra items found in conda-reqs-pip.txt", conda_reqs_dict)
+        print("Extra items found in conda-reqs-pip.txt", conda_reqs_pip_dict)
    check.is_false(conda_reqs_pip_dict, "no extra items in conda-reqs-pip.txt")


@ -153,4 +160,8 @@ def _get_reqs_from_file(reqs_file):
        for req in reqs_lines
        if req.strip() and not req.strip().startswith("#")
    ]
-    return {req.name.casefold(): req.specifier for req in reqs}
+    return {
+        req.name.casefold(): req.specifier
+        for req in reqs
+        if req.marker is None or req.marker.evaluate()
+    }
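
Reviewer sketch (not part of the patch): the change in `test_missing_pkgs_req` replaces an exact set difference with a prefix match, so dotted submodule names are also excluded. The sample names below are hypothetical values chosen for illustration, not output from the real test.

```python
EXTRAS_EXCEPTIONS = {"autogen", "importlib_resources"}

# Hypothetical sample of reported missing requirements.
missing_reqs = {"autogen.agentchat.chat", "importlib_resources", "requests"}

# Old behaviour: exact-name set difference leaves the dotted name behind.
exact = missing_reqs - EXTRAS_EXCEPTIONS

# New behaviour: drop anything whose name starts with an exception prefix.
prefixed = {
    req
    for req in missing_reqs
    if not any(req.startswith(p) for p in EXTRAS_EXCEPTIONS)
}
```

One trade-off of `startswith` is that it also matches unrelated names sharing a prefix (for example, a package called `autogenerated` would be excluded too), so a stricter check on the first dotted component could be considered.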