Add Autogen and RAG Agent to MSTICpy (#793)

* Add dependencies and ensure that .rst docs are included in build packages

* Establish dev environment and pass pre-commit checks

* Add unit test for finding .rst files, delete old Jupyter Notebook

* Add RAG Agent summary to gettingstarted, update %%ask docstrings

- Moved example usage of %%ask to the docstring for the cell magic function

- Added a section about the RAG agent and its usage to the Package Summary

- Updated example notebook RagAgent.ipynb to remove unnecessary imports

* Ensure that RST files are returned as strings, not Pathlib.Path

* Read Autogen config from msticpyconfig.yaml

* Inject API_KEY envar into config

* Silence group chat

* Reformat RAG Agent code snippets in PackageSummary.rst

* Reenable pre-commit check_reqs

* Fixes for Sync

* Abstract logic from magic class

* PR changes, refactoring magic class

* Formatting fixes, remove unused arguments

* minor fixes

* Fix some pipeline errors

* Pipeline fixes

* Fix test for pkg import failure for autogen

* Update test module names to align with recent changes

* Remove explicit TypeAlias to support Py3.8

* py3.8 issues with union operator backport from future

* Fix imports and type annotations for pipeline

* Fix mypy type issues

* Add importlib-resources to requirements for py3.8

* fix isort

* Add importlib-resources to requirements-all.txt

* Add importlib_resources to EXTRAS

* Add importlib-resources to conda reqs

* Update pkg_import test to handle required python versions

* Fix python version check

* API docs generated for aiagents
Also fixing a couple of errors in PackageSummary.rst

* Remove extra test file

---------

Co-authored-by: Eileen Garcia <t-egarcia@microsoft.com>
Co-authored-by: Ryan Cobb <ryan@detect.dev>
Co-authored-by: ryan-detect-dot-dev <106877876+ryan-detect-dot-dev@users.noreply.github.com>
Co-authored-by: ianhelle <ianhelle@microsoft.com>
Eileen Garcia 2024-08-07 12:15:34 -07:00, committed by GitHub
Parent 8f9bf3ba1c
Commit 032c1080cc
No known key found for this signature
GPG key ID: B5690EEEBB952194
21 changed files with 1256 additions and 6 deletions

@ -7,3 +7,5 @@ passivetotal>=2.5.3
sumologic-sdk>=0.1.11
splunk-sdk>=1.6.0,!=2.0.0
packaging>=24.0
pyautogen[retrievechat]>=0.2.28
importlib-resources >= 6.4.0; python_version <= "3.8"

@ -0,0 +1,716 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Example Notebook for RAG (Retrieval-Augmented Generation) Agent Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Query the RAG agent using the cell magic `%%ask` command"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# %load_ext msticpy.aiagents.mp_docs_rag_magic\n",
"# Or use:\n",
"%reload_ext msticpy.aiagents.mp_docs_rag_magic"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-07-30 15:48:19,414 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - \u001b[32mUse the existing collection `MSTICpy_Docs_2.12.0`.\u001b[0m\n",
"2024-07-30 15:48:27,518 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Found 384 chunks.\u001b[0m\n"
]
},
{
"data": {
"text/markdown": [
"\n",
"**Question**: What are the three things that I need to connect to Microsoft Sentinel Query Provider?"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"**Answer**: To connect to the Microsoft Sentinel Query Provider, you need the following three things:\n",
"\n",
"1. A `QueryProvider` instance.\n",
"2. The data environment string (\"MSSentinel\" for Microsoft Sentinel).\n",
"3. A connection string or authentication parameters.\n",
"\n",
"Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\data_acquisition\\DataProviders.rst"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%ask \n",
"What are the three things that I need to connect to Microsoft Sentinel Query Provider?"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"\n",
"**Question**: How do I connect to the M365 Defender query provider?"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"**Answer**: To connect to the M365 Defender query provider, you need to follow these steps:\n",
"\n",
"1. Ensure your connection details are specified in the `msticpyconfig.yaml` file.\n",
"\n",
"2. Create a `QueryProvider` instance for M365 Defender.\n",
"\n",
"3. Call the `connect()` method on the instance.\n",
"\n",
"Here's an example:\n",
"\n",
"```python\n",
"from msticpy.data import QueryProvider\n",
"\n",
"# Create a QueryProvider instance\n",
"mdatp_prov = QueryProvider(\"M365D\")\n",
"\n",
"# Connect to the M365 Defender instance using the configured details\n",
"mdatp_prov.connect()\n",
"```\n",
"\n",
"If you have multiple instances configured, specify the instance name when calling `connect()`:\n",
"\n",
"```python\n",
"mdatp_prov.connect(instance=\"Tenant2\")\n",
"```\n",
"\n",
"If you prefer to pass connection parameters directly, use keyword arguments:\n",
"\n",
"```python\n",
"# Collect credentials\n",
"ten_id = input('Tenant ID')\n",
"client_id = input('Client ID')\n",
"client_secret = input('Client Secret')\n",
"\n",
"# Create a QueryProvider instance\n",
"mdatp_prov = QueryProvider('M365D')\n",
"\n",
"# Connect using collected credentials\n",
"mdatp_prov.connect(tenant_id=ten_id, client_id=client_id, client_secret=client_secret)\n",
"```\n",
"\n",
"Alternatively, you can use a connection string:\n",
"\n",
"```python\n",
"# Define a connection string\n",
"conn_str = (\n",
" \"tenant_id='243bb6be-4136-4b64-9055-fb661594199a'; \"\n",
" \"client_id='a5b24e23-a96a-4472-b729-9e5310c83e20'; \"\n",
" \"client_secret='[PLACEHOLDER]'\"\n",
")\n",
"\n",
"# Create a QueryProvider instance\n",
"mdatp_prov = QueryProvider('M365D')\n",
"\n",
"# Connect using the connection string\n",
"mdatp_prov.connect(conn_str)\n",
"```\n",
"\n",
"Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\data_acquisition\\DataProv-MSDefender.rst"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%ask\n",
"How do I connect to the M365 Defender query provider?"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"\n",
"**Question**: What do I need to add to my msticpyconfig.yaml config for the Azure Resource Graph query provider?"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"**Answer**: To add Azure Resource Graph to your `msticpyconfig.yaml` configuration, include the following under the `Azure` section:\n",
"\n",
"```yaml\n",
"Azure:\n",
" auth_methods:\n",
" - cli\n",
" - interactive\n",
" cloud: global\n",
"```\n",
"\n",
"For more information on configuring `msticpyconfig.yaml`, refer to the MSTICPy documentation.\n",
"\n",
"Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\data_acquisition\\ResourceGraphDriver.rst"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%ask\n",
"What do I need to add to my msticpyconfig.yaml config for the Azure Resource Graph query provider?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### A response of `UPDATE_CONTEXT` indicates that the agents are unable to answer the query with the information retrieved by the RAG agent."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"\n",
"**Question**: Does the Splunk query provider support device code authentication?"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"**Answer**: UPDATE CONTEXT"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%ask\n",
"Does the Splunk query provider support device code authentication?"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"\n",
"**Question**: How can I plot IP addresses in this dataframe on a map?"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"**Answer**: To plot IP addresses in a DataFrame on a map using MSTICpy's FoliumMap, you can use the `mp_plot.folium_map` pandas accessor. Here's an example:\n",
"\n",
"```python\n",
"# Plotting IP addresses using the mp_plot.folium_map accessor\n",
"geo_loc_df.mp_plot.folium_map(ip_column=\"IPAddress\")\n",
"```\n",
"\n",
"This will display an interactive map with markers based on the IP addresses in the \"IPAddress\" column of your DataFrame.\n",
"\n",
"Sources: C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\visualization\\\\FoliumMap.rst"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%ask \n",
"How can I plot IP addresses in this dataframe on a map?"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"\n",
"**Question**: How do I create a new custom data provider with msticpy?"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"**Answer**: To create a new custom data provider with MSTICpy, follow these main steps:\n",
"\n",
"1. **Write the driver class:** Derive it from `DriverBase` and implement the methods `__init__`, `connect`, `query`, and optionally `query_with_results`.\n",
"2. **Customize the driver (optional):** Expose attributes via `QueryProvider`, and implement custom parameter formatting and query parameter substitution if needed.\n",
"3. **Register the driver:** Update the `DataEnvironment` enum and add an entry to the driver dynamic load table.\n",
"4. **Add queries:** Create a folder named after your `DataEnvironment` and add your query files there.\n",
"5. **Add settings definition:** Define settings in a YAML configuration file.\n",
"6. **Create documentation:** Document the configuration and use of the data provider.\n",
"7. **Create unit tests:** Add unit tests using mocks to simulate service responses.\n",
"\n",
"For detailed guidance on these steps, refer to the provided MSTICpy documentation related to data providers.\n",
"\n",
"Sources: WritingDataProviders.rst, PluginFramework.rst, ExtendingMsticpy.rst"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%ask \n",
"How do I create a new custom data provider with msticpy?"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"\n",
"**Question**: How do I list which TI providers are currently enabled?"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"**Answer**: ### Step 1: Intent\n",
"The user's intent is to get help with **question answering**.\n",
"\n",
"### Step 2: Answer\n",
"To list which Threat Intelligence (TI) providers are currently enabled in MSTICpy, you can inspect the configuration typically found in the `msticpyconfig.yaml` file under the `TIProviders` section. This configuration file determines which providers are set up and whether they are marked as primary/secondary.\n",
"\n",
"Sources: `C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\extending\\\\WritingTIAndContextProviders.rst`"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%ask \n",
"How do I list which TI providers are currently enabled?"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"\n",
"**Question**: How do I lookup threat intelligence for multiple IP addresses at once?"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"**Answer**: Step 1: User's intent is to generate code for performing threat intelligence lookups for multiple IP addresses at once.\n",
"\n",
"Step 2:\n",
"```python\n",
"from msticpy.context.ip_utils import ip_whois\n",
"\n",
"# List of IP addresses to lookup\n",
"ip_list = [\"123.1.2.3\", \"124.5.6.7\"]\n",
"\n",
"# Performing Whois lookup for multiple IP addresses\n",
"whois_data = ip_whois(ip_list)\n",
"print(whois_data)\n",
"```\n",
"\n",
"Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\data_acquisition\\IPWhois.rst"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%ask \n",
"How do I lookup threat intelligence for multiple IP addresses at once?"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"\n",
"**Question**: How do I use pivot functions?"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"**Answer**: To use pivot functions in MSTICpy, you have two primary options: creating persistent pivot function definitions in YAML files or adding ad hoc pivot functions directly in code. Here's a brief overview of both methods:\n",
"\n",
"**1. Persistent Pivot Function Definitions**\n",
"\n",
"- Define your pivot function properties in a YAML file with a top-level element `pivot_providers`.\n",
"- Example YAML definition:\n",
"\n",
"```yaml\n",
"pivot_providers:\n",
" who_is:\n",
" src_module: msticpy.context.ip_utils\n",
" src_func_name: get_whois_df\n",
" func_new_name: whois\n",
" input_type: dataframe\n",
" entity_map:\n",
" IpAddress: Address\n",
" func_df_param_name: data\n",
" func_df_col_param_name: ip_column\n",
" func_out_column_name: query\n",
" func_static_params:\n",
" all_columns: True\n",
" show_progress: False\n",
" func_input_value_arg: ip_address\n",
"```\n",
"\n",
"- Load and register the definition using:\n",
"\n",
"```python\n",
"from msticpy.init.pivot_core.pivot import Pivot\n",
"Pivot.register_pivot_providers(pivot_reg_path=path_to_your_yaml, namespace=globals(), def_container=\"my_container\", force_container=True)\n",
"```\n",
"\n",
"**2. Ad Hoc Pivot Functions in Code**\n",
"\n",
"- Add a function as a pivot using the `add_pivot_function` method:\n",
"\n",
"```python\n",
"def my_func(input: str):\n",
" return input.upper()\n",
"\n",
"Pivot.add_pivot_function(\n",
" func=my_func,\n",
" container=\"change_case\",\n",
" input_type=\"value\",\n",
" entity_map={\"Host\": \"HostName\"},\n",
" func_input_value_arg=\"input\",\n",
" func_new_name=\"upper_name\",\n",
")\n",
"```\n",
"\n",
"- Alternatively, use the `PivotRegistration` class:\n",
"\n",
"```python\n",
"from msticpy.init.pivot_core.pivot_register import PivotRegistration\n",
"\n",
"def my_func(input: str):\n",
" return input.upper()\n",
"\n",
"piv_reg = PivotRegistration(\n",
" input_type=\"value\",\n",
" entity_map={\"Host\": \"HostName\"},\n",
" func_input_value_arg=\"input\",\n",
" func_new_name=\"upper_name\"\n",
")\n",
"Pivot.add_pivot_function(my_func, piv_reg, container=\"change_case\")\n",
"```\n",
"\n",
"**Running Pivots in DataFrame Pipelines:**\n",
"\n",
"```python\n",
"(\n",
" my_df\n",
" .query(\"UserCount > 1\")\n",
" .mp_pivot.run(IpAddress.util.whois, column=\"Ioc\")\n",
" .drop_duplicates()\n",
")\n",
"```\n",
"\n",
"- Use `mp_pivot.run` to integrate pivot functions into DataFrame processing pipelines.\n",
"- Join input and output DataFrames with the `join` parameter in `mp_pivot.run`.\n",
"\n",
"**Debugging Tools:**\n",
"\n",
"- `mp_pivot.display` for intermediate results.\n",
"- `mp_pivot.tee` for creating snapshots.\n",
"- `mp_pivot.tee_exec` for executing intermediate operations (e.g., plotting).\n",
"\n",
"Sources: C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\extending\\\\PivotFunctions.rst, C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\data_analysis\\\\PivotFunctions.rst, C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\api\\\\msticpy.init.pivot.rst"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%ask \n",
"How do I use pivot functions?"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"\n",
"**Question**: Which columns do I need in a dataframe to plot process trees?"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"**Answer**: To plot process trees, the required columns in a DataFrame are typically:\n",
"\n",
"1. `ParentProcessName`\n",
"2. `Process`\n",
"\n",
"Additional attributes such as `SubjectUserName`, `SubjectDomainName`, `SubjectLogonId`, `NewProcessName`, `CommandLine`, and `TimeGenerated` can be used for more detailed visualization and analysis.\n",
"\n",
"Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\visualization\\NetworkGraph.rst"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%ask \n",
"Which columns do I need in a dataframe to plot process trees?"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"\n",
"**Question**: What kind of visualizations does msticpy support?"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"**Answer**: MSTICPy supports various visualizations including interactive timelines, process trees, multi-dimensional Morph Charts, data viewers, matrix plots, network plots, and several others listed under the `msticpy.vis` package.\n",
"\n",
"Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\index.rst, C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\visualization\\MorphCharts.rst"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%ask \n",
"What kind of visualizations does msticpy support?"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"\n",
"**Question**: How do I add a new query for Microsoft 365 Defender to msticpy?"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"**Answer**: To add a new query for Microsoft 365 Defender (M365D) to MSTICPy, you should use the `QueryProvider` class. Here's a step-by-step guide on how to achieve it:\n",
"\n",
"1. **Initialize the `QueryProvider` for M365D**:\n",
" ```python\n",
" from msticpy.data import QueryProvider\n",
"\n",
" mdatp_prov = QueryProvider(\"M365D\")\n",
" ```\n",
"\n",
"2. **Connect to the M365 Defender API**:\n",
" ```python\n",
" mdatp_prov.connect()\n",
" ```\n",
"\n",
"3. **Add your new query**:\n",
" You can add new queries to the query store of `QueryProvider`. Here's an example of how to define and add a new query:\n",
" ```python\n",
" new_query = \"\"\"\n",
" DeviceEvents\n",
" | where ActionType == \"FileCreated\"\n",
" | limit 10\n",
" \"\"\"\n",
" mdatp_prov.add_query(\"GetRecentFileCreatedEvents\", new_query)\n",
" ```\n",
"\n",
"4. **Run the newly added query**:\n",
" ```python\n",
" results = mdatp_prov.exec_query(\"GetRecentFileCreatedEvents\")\n",
" print(results)\n",
" ```\n",
"\n",
"In summary, you need to instantiate a `QueryProvider` object for M365D, connect to the API, add the new query, and then execute the query.\n",
"\n",
"Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\data_acquisition\\DataProv-MSDefender.rst"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%ask \n",
"How do I add a new query for Microsoft 365 Defender to msticpy?"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"\n",
"**Question**: Which msticpy module contains the code related to visualizing network graphs?"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/markdown": [
"\n",
"**Answer**: The MSTICpy module that contains the code related to visualizing network graphs is `msticpy.vis.network_plot`.\n",
"\n",
"Sources: C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\api\\\\msticpy.vis.network_plot.rst"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%ask\n",
"Which msticpy module contains the code related to visualizing network graphs?"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "internshipenv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,7 @@
msticpy.aiagents.config\_utils module
=====================================
.. automodule:: msticpy.aiagents.config_utils
:members:
:undoc-members:
:show-inheritance:

@ -0,0 +1,7 @@
msticpy.aiagents.mp\_docs\_rag\_magic module
============================================
.. automodule:: msticpy.aiagents.mp_docs_rag_magic
:members:
:undoc-members:
:show-inheritance:

@ -0,0 +1,7 @@
msticpy.aiagents.rag\_agents module
===================================
.. automodule:: msticpy.aiagents.rag_agents
:members:
:undoc-members:
:show-inheritance:

@ -0,0 +1,17 @@
msticpy.aiagents package
========================
.. automodule:: msticpy.aiagents
:members:
:undoc-members:
:show-inheritance:
Submodules
----------
.. toctree::
:maxdepth: 4
msticpy.aiagents.config_utils
msticpy.aiagents.mp_docs_rag_magic
msticpy.aiagents.rag_agents

@ -0,0 +1,7 @@
msticpy.context.tiproviders.binaryedge module
=============================================
.. automodule:: msticpy.context.tiproviders.binaryedge
:members:
:undoc-members:
:show-inheritance:

@ -15,6 +15,7 @@ Submodules
msticpy.context.tiproviders.abuseipdb
msticpy.context.tiproviders.alienvault_otx
msticpy.context.tiproviders.azure_sent_byoti
msticpy.context.tiproviders.binaryedge
msticpy.context.tiproviders.crowdsec
msticpy.context.tiproviders.greynoise
msticpy.context.tiproviders.ibm_xforce

@ -12,6 +12,7 @@ Subpackages
.. toctree::
:maxdepth: 4
msticpy.aiagents
msticpy.analysis
msticpy.auth
msticpy.common

@ -217,6 +217,7 @@ autoapi_dirs = ["../../msticpy"]
autodoc_mock_imports = [
"adal",
"autogen",
"azure",
"azure.common.exceptions",
"azure.core.exceptions",
@ -245,6 +246,7 @@ autodoc_mock_imports = [
"dns",
"folium",
"geoip2",
"importlib_resources",
"ipwhois",
"IPython",
"ipywidgets",

@ -318,7 +318,7 @@ of known malicious commands as well as suspicious patterns of behaviour.
domain_utils
~~~~~~~~~~~~
:py:mod:`msticpy.transform.domain_utils`
:py:mod:`msticpy.context.domain_utils`
Module to support investigation of domain names and URLs with functions to
validate a domain name and screenshot a URL.
@ -334,6 +334,68 @@ without compromising privacy.
See :doc:`../data_acquisition/DataMasking`
Agentic Module
--------------

aiagents Module: RAG Agent
~~~~~~~~~~~~~~~~~~~~~~~~~~

:py:mod:`msticpy.aiagents.mp_docs_rag_magic`

The **aiagents** module in MSTICpy introduces the **Retrieval-Augmented Generation (RAG) Agent**.
This agent answers questions about MSTICpy. It supports InfoSec investigations and
threat hunting in Jupyter Notebooks by combining a large language model with
passages retrieved from the MSTICpy documentation, so that responses are grounded in the docs.

You can invoke the RAG agent in a Jupyter Notebook by adding the following:

.. code-block:: ipython

    # Load the RAG cell magic
    %load_ext msticpy.aiagents.mp_docs_rag_magic

.. code-block:: ipython

    # The following command should be in a separate cell and be the first line of the cell
    %%ask
    What are the three things that I need to connect to Azure Query Provider?

You can also invoke the RAG Agent outside of a Jupyter Notebook as follows:

.. code-block:: python

    # Import the necessary functions
    from msticpy.aiagents.rag_agents import (
        ask_question,
        get_retrieval_assistant_agent,
        get_retrieval_user_proxy_agent,
    )

    # Initialize the agents
    assistant_agent = get_retrieval_assistant_agent()
    user_proxy_agent = get_retrieval_user_proxy_agent()

    # Define your question
    question = "Your MSTICpy-related question here"

    # Query the agents
    response = ask_question(
        assistant_agent,
        user_proxy_agent,
        question=question,
    )

    print(response.summary)

See also:

- :doc:`Autogen LLM Configurations in msticpyconfig <../getting_started/msticpyconfig>`
- Sample notebook - `RAG Agent Example Notebook <https://github.com/microsoft/msticpy/blob/master/docs/notebooks/RagAgent.ipynb>`__

Supported Platforms and Packages
--------------------------------
@ -341,3 +403,4 @@ Supported Platforms and Packages
- Requires Python 3.8 or later
- See `requirements.txt <https://github.com/microsoft/msticpy/blob/master/requirements.txt>`__
for more details and version requirements.

@ -171,6 +171,42 @@ as geo-location providers.
DBFolder: "~/.msticpy"
Provider: "GeoLiteLookup"
Autogen
~~~~~~~

This section allows you to specify `settings for the autogen module
<https://microsoft.github.io/autogen/docs/topics/llm_configuration>`__.
`Autogen` is a top-level section in the configuration file. This
section is used as the `llm_config` argument when instantiating
an agent from the `autogen` package.

When using Entra ID authentication with Autogen, you can specify the
`azure_ad_token_provider` as the string `token_provider`. This will
automatically use `DefaultAzureCredential` to obtain bearer tokens
rather than using an API key. This is the preferred method of
authenticating to Azure OpenAI endpoints because there are no
secrets to manage.

If using an API key, set `api_key` to the name of the
environment variable that holds the key.

Here is an example with two `config_list` entries for an Azure OpenAI
endpoint, one using Entra ID and one using an API key:

.. code:: yaml

    Autogen:
      config_list:
        - model: gpt-4o
          api_type: azure
          api_version: 2024-02-15-preview
          base_url: https://my-aoai-endpoint.openai.azure.com/
          azure_ad_token_provider: token_provider
        - model: gpt-4o
          api_type: azure
          api_version: 2024-02-15-preview
          base_url: https://my-aoai-endpoint.openai.azure.com/
          api_key: SOME_ENVIRONMENT_VARIABLE

DataProviders
~~~~~~~~~~~~~~
This section is similar to the previous two sections, allowing you

@ -0,0 +1,6 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for
# license information.
# --------------------------------------------------------------------------
"""Modules related to AI agents used in MSTICpy."""

@ -0,0 +1,116 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for
# license information.
# --------------------------------------------------------------------------
"""Utility Modules related to AI agents used in MSTICpy."""
import os
from typing import Callable, Dict, List, Union

from azure.identity import DefaultAzureCredential, get_bearer_token_provider

from ..common.exceptions import MsticpyUserConfigError
from ..common.pkg_config import get_config

ConfigItem = Dict[str, Union[str, Callable]]
ConfigList = List[ConfigItem]
Config = Dict[str, Union[str, float, ConfigList]]

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)


def inject_token_provider_callable(
    config: ConfigItem,
) -> ConfigItem:
    """Replace autogen configuration `azure_ad_token_provider` with a token provider callable.

    Parameters
    ----------
    config : ConfigItem
        Autogen LLM configuration.

    Returns
    -------
    ConfigItem
        Autogen LLM configuration with the token provider callable.

    """
    if "azure_ad_token_provider" in config:
        config["azure_ad_token_provider"] = token_provider

    return config


def inject_environment_variable(config: ConfigItem) -> ConfigItem:
    """Replace autogen configuration `api_key` with the value of an environment variable.

    Parameters
    ----------
    config : ConfigItem
        Autogen LLM configuration.

    Returns
    -------
    ConfigItem
        Autogen LLM configuration with the environment variable value.

    """
    if "api_key" in config:
        api_key = os.environ.get(str(config["api_key"]), None)
        if not api_key:
            raise MsticpyUserConfigError(
                f"Environment variable {config['api_key']} specified, but not found!"
            )
        config["api_key"] = api_key

    return config


def get_autogen_config_from_msticpyconfig() -> Config:
    """Get Autogen configuration from msticpyconfig.yaml.

    See `https://microsoft.github.io/autogen/docs/topics/llm_configuration`
    for more information on the structure of the LLM configuration object.

    Please note that a configuration list is required, rather than a single configuration.

    Returns
    -------
    Config
        Autogen configuration.

    Raises
    ------
    MsticpyUserConfigError
        Autogen settings not found in msticpyconfig.yaml configuration
    MsticpyUserConfigError
        Autogen.config_list setting not found in msticpyconfig.yaml configuration

    """
    autogen_config = get_config("Autogen", None)
    if not autogen_config:
        raise MsticpyUserConfigError(
            "Autogen settings not found in msticpyconfig.yaml configuration!"
        )

    if "config_list" not in autogen_config:
        raise MsticpyUserConfigError(
            "Autogen.config_list setting not found in msticpyconfig.yaml configuration!"
        )

    injectors = [
        inject_token_provider_callable,
        inject_environment_variable,
    ]

    config_list = []
    for config in autogen_config["config_list"]:
        for injector in injectors:
            config = injector(config)
        config_list.append(config)

    return {
        **autogen_config,
        "config_list": config_list,
    }
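
The injector chain in `get_autogen_config_from_msticpyconfig` can be tried in isolation. The following is a minimal sketch rather than the module itself. It assumes a plain list of dicts in place of the `Autogen.config_list` setting, raises `KeyError` instead of `MsticpyUserConfigError`, and uses a dummy `fake_token_provider` and a hypothetical environment variable `DEMO_AOAI_KEY` so that it runs without Azure credentials.

```python
import os
from typing import Callable, Dict, List, Union

ConfigItem = Dict[str, Union[str, Callable]]


def fake_token_provider() -> str:
    """Stand-in for the azure.identity bearer token provider (assumption)."""
    return "dummy-token"


def inject_token_provider(config: ConfigItem) -> ConfigItem:
    """Swap the placeholder string for a callable."""
    if "azure_ad_token_provider" in config:
        config = {**config, "azure_ad_token_provider": fake_token_provider}
    return config


def inject_env_var(config: ConfigItem) -> ConfigItem:
    """Resolve `api_key` from the environment variable that it names."""
    if "api_key" in config:
        var_name = str(config["api_key"])
        value = os.environ.get(var_name)
        if not value:
            raise KeyError(f"Environment variable {var_name} not found")
        config = {**config, "api_key": value}
    return config


def resolve_config_list(config_list: List[ConfigItem]) -> List[ConfigItem]:
    """Apply every injector to every entry, in order."""
    injectors = [inject_token_provider, inject_env_var]
    resolved = []
    for config in config_list:
        for injector in injectors:
            config = injector(config)
        resolved.append(config)
    return resolved


# Hypothetical variable name and value, set here only for the demo.
os.environ["DEMO_AOAI_KEY"] = "not-a-real-key"
raw = [
    {"model": "gpt-4o", "azure_ad_token_provider": "token_provider"},
    {"model": "gpt-4o", "api_key": "DEMO_AOAI_KEY"},
]
resolved = resolve_config_list(raw)
```

Unlike the module, which updates each entry in place, the sketch copies each entry, so the raw settings keep the placeholder string and the variable name.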

@ -0,0 +1,69 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for
# license information.
# --------------------------------------------------------------------------
"""Integrates MSTICpy with IPython custom magic commands for retrieval-augmented generation."""
import io
from contextlib import redirect_stdout

from IPython.core.magic import Magics, cell_magic, magics_class
from IPython.display import display_markdown

from .rag_agents import (
    ask_question,
    get_retrieval_assistant_agent,
    get_retrieval_user_proxy_agent,
)


@magics_class
class DocsRagMagic(Magics):
    """Implement a class to provide RAG Magic functionalities for MSTICpy."""

    def __init__(self, shell):
        super().__init__(shell)
        self.assistant_agent = get_retrieval_assistant_agent()
        self.user_proxy_agent = get_retrieval_user_proxy_agent()

    @cell_magic
    def ask(self, _, cell: str):
        """
        Enable the user to ask a question to the RAG agent using a cell magic function.

        It calls the ask_question function with the cell content as the question.

        Parameters
        ----------
        cell : str
            The content of the cell. This is used as the question to ask the RAG agent.

        Example Usage
        -------------
        To ask a question to the RAG agent, use the cell magic
        command followed by the question in the cell.
        For example:

        %%ask
        Which msticpy module contains the code related to visualizing network graphs?

        """
        question = cell.strip()
        output = io.StringIO()
        with redirect_stdout(output):
            response = ask_question(
                self.assistant_agent,
                self.user_proxy_agent,
                question=question,
            )
        answer = response.summary

        display_markdown(f"\n**Question**: {question}", raw=True)
        display_markdown(f"\n**Answer**: {answer}", raw=True)


# Register the magic class with IPython
def load_ipython_extension(ipython):
    """Register the magic class with IPython."""
    ipython.register_magics(DocsRagMagic)
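
The `ask` magic silences the group chat by redirecting stdout into a buffer while the agents run. A minimal sketch of that pattern, using only the standard library, is shown here. The function `noisy_answer` is a hypothetical stand-in for `ask_question`, and `ask_quietly` is an illustrative name that does not exist in the module.

```python
import io
from contextlib import redirect_stdout
from typing import Tuple


def noisy_answer(question: str) -> str:
    """Hypothetical stand-in for a chat call that prints progress to stdout."""
    print(f"[agent chatter] retrieving context for: {question}")
    return f"Answer to: {question}"


def ask_quietly(question: str) -> Tuple[str, str]:
    """Run the noisy call and return its result plus the captured chatter."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        answer = noisy_answer(question.strip())
    return answer, buffer.getvalue()


answer, captured = ask_quietly("  How do I use pivot functions?  ")
```

Only stdout is redirected. Messages written to stderr, such as the autogen INFO log lines in the sample notebook output, still appear in the cell.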

@@ -0,0 +1,157 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for
# license information.
# --------------------------------------------------------------------------
"""
Module for MSTICpy documentation utilities and retrieval agent configuration.

Includes functions to find documentation files and to set up retrieval
agents that assist security analysts by answering questions based on MSTICpy documentation.
"""
import sys
from pathlib import Path
from typing import List, Optional

from autogen.agentchat.chat import ChatResult
from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

from .._version import VERSION
from ..common.exceptions import MsticpyUserConfigError
from .config_utils import get_autogen_config_from_msticpyconfig

if sys.version_info < (3, 9):
    import importlib_resources as pkg_resources
else:
    import importlib.resources as pkg_resources


def find_rst_files() -> List[str]:
    """
    Find all .rst files in the docs/source directory of 'msticpy' package.

    Returns
    -------
    list of str
        List of paths to .rst files in the docs/source directory.

    """
    # Get the path to the docs/source directory of the package
    docs_path = Path(pkg_resources.files("msticpy")).parent / "docs" / "source"

    # Find all .rst files in the docs/source directory
    rst_files = list(str(fp) for fp in docs_path.rglob("*.rst"))

    return rst_files


def get_retrieval_assistant_agent(system_message: str = "") -> RetrieveAssistantAgent:
    """
    Create and return a RetrieveAssistantAgent.

    Parameters
    ----------
    system_message : str, optional
        Custom system message for the assistant.

    Returns
    -------
    RetrieveAssistantAgent
        Configured RetrieveAssistantAgent instance.

    """
    if not system_message:
        system_message = (
            "You are a helpful assistant to security analysts using MSTICpy."
        )
    return RetrieveAssistantAgent(
        name="assistant",
        system_message=system_message,
        llm_config=get_autogen_config_from_msticpyconfig(),
    )


def get_retrieval_user_proxy_agent(
    customized_prompt: Optional[str] = None,
) -> RetrieveUserProxyAgent:
    """
    Create and return a RetrieveUserProxyAgent.

    Parameters
    ----------
    customized_prompt : Optional[str], optional
        Custom prompt for the assistant agent, by default None.

    Returns
    -------
    RetrieveUserProxyAgent
        Configured RetrieveUserProxyAgent instance.

    Raises
    ------
    MsticpyUserConfigError
        Autogen settings not found in msticpyconfig.yaml configuration

    """
    rst_files = find_rst_files()

    autogen_config = get_autogen_config_from_msticpyconfig()

    default_model = None
    if "config_list" in autogen_config and isinstance(
        autogen_config["config_list"], list
    ):
        if autogen_config["config_list"]:
            default_config = autogen_config["config_list"][0]
            if "model" in default_config:
                default_model = default_config["model"]

    if not default_model:
        raise MsticpyUserConfigError(
            "Could not find a valid default Autogen model in msticpyconfig.yaml configuration!"
        )

    return RetrieveUserProxyAgent(
        name="ragproxyagent",
        human_input_mode="NEVER",
        max_consecutive_auto_reply=1,
        is_termination_msg=lambda x: True,
        retrieve_config={
            "task": "default",
            "docs_path": rst_files,
            "chunk_token_size": 2000,
            "customized_prompt": customized_prompt,
            "model": default_model,
            "vector_db": "chroma",
            "collection_name": f"MSTICpy_Docs_{VERSION}",
            "get_or_create": True,
        },
        code_execution_config=False,
    )


def ask_question(
    assistant_agent: RetrieveAssistantAgent,
    user_proxy_agent: RetrieveUserProxyAgent,
    question: str,
) -> ChatResult:
    """
    Ask a question using the assistant and user proxy agents.

    Parameters
    ----------
    assistant_agent : RetrieveAssistantAgent
        The assistant agent to use.
    user_proxy_agent : RetrieveUserProxyAgent
        The user proxy agent to use.
    question : str
        The question to ask.

    Returns
    -------
    ChatResult
        The result of the chat interaction.

    """
    assistant_agent.reset()
    return user_proxy_agent.initiate_chat(
        assistant_agent, message=user_proxy_agent.message_generator, problem=question
    )
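
Reviewer note: `get_retrieval_user_proxy_agent` takes the default model from the first entry of `config_list` and raises `MsticpyUserConfigError` when none is found. The sketch below isolates that lookup on a plain dictionary so its edge cases are easy to see. `get_default_model` is a hypothetical helper, not a function in this commit, and the model names are placeholders rather than values from any real `msticpyconfig.yaml`.

```python
from typing import Any, Dict, Optional


def get_default_model(autogen_config: Dict[str, Any]) -> Optional[str]:
    """Return the model of the first config_list entry, or None if absent."""
    config_list = autogen_config.get("config_list")
    # Mirror the checks in the diff: config_list must be a non-empty list.
    if isinstance(config_list, list) and config_list:
        default_config = config_list[0]
        if "model" in default_config:
            return default_config["model"]
    return None


# Illustrative settings; the model names are placeholders.
sample_config = {"config_list": [{"model": "model-a"}, {"model": "model-b"}]}
default_model = get_default_model(sample_config)
missing_model = get_default_model({"config_list": []})
```

A caller would then raise its configuration error when the result is empty or `None`, as the function in the diff does.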


@@ -23,6 +23,7 @@ folium>=0.9.0
 geoip2>=2.9.0
 httpx>=0.23.0, <1.0.0
 html5lib
+importlib-resources >= 6.4.0; python_version <= "3.8"
 ipython >= 7.1.1; python_version < "3.8"
 ipython >= 7.23.1; python_version >= "3.8"
 ipywidgets>=7.4.2, <9.0.0
@@ -43,6 +44,7 @@ packaging>=24.0
 pandas>=1.4.0, <3.0.0
 panel>=0.14.4
 passivetotal>=2.5.3
+pyautogen[retrievechat]>=0.2.28
 pydantic>=1.8.0, <3.0.0
 pygments>=2.0.0
 pyjwt>=2.3.0


@@ -16,6 +16,7 @@ folium>=0.9.0
 geoip2>=2.9.0
 httpx>=0.23.0, <1.0.0
 html5lib
+importlib-resources >= 6.4.0; python_version <= "3.8"
 ipython >= 7.1.1; python_version < "3.8"
 ipython >= 7.23.1; python_version >= "3.8"
 ipywidgets>=7.4.2, <9.0.0


@@ -53,6 +53,7 @@ EXTRAS = {
     "sql2kql": ["mo-sql-parsing>=8, <9.0.0"],
     "riskiq": ["passivetotal>=2.5.3"],
     "panel": ["panel>=0.14.4"],
+    "aiagents": ["pyautogen[retrievechat]>=0.2.28"],
 }
 extras_all = [
     extra for name, extras in EXTRAS.items() for extra in extras if name != "dev"
@@ -73,5 +74,8 @@ EXTRAS["sentinel"] = sorted(_combine_extras(["azure", "kql", "keyvault"]))
 if __name__ == "__main__":
     setuptools.setup(
-        install_requires=INSTALL_REQUIRES, extras_require=EXTRAS, version=__version__
+        install_requires=INSTALL_REQUIRES,
+        extras_require=EXTRAS,
+        version=__version__,
+        package_data={"docs": ["msticpy/docs/source/**/*.rst"]},
     )


@@ -0,0 +1,18 @@
from pathlib import Path

from msticpy.aiagents.rag_agents import find_rst_files


def test_find_rst_files_not_empty():
    rst_files = find_rst_files()

    # Assert that the list of .rst files is not empty
    assert rst_files, "The list of .rst files is empty."


def test_find_specific_rst_file():
    # Optionally, we can also check for specific .rst files
    rst_files = find_rst_files()
    rst_file_names = [Path(file).name for file in rst_files]

    assert "DataAcquisition.rst" in rst_file_names, "The specific file is not found."
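
Reviewer note: the tests above depend on the `docs/source` tree being present next to the installed package. The recursive `*.rst` discovery itself can be exercised without that dependency. The sketch below does so against a temporary directory; `find_rst_files_in` is a hypothetical variant that takes the docs path as an argument, and the file layout is invented for illustration.

```python
import tempfile
from pathlib import Path
from typing import List


def find_rst_files_in(docs_path: Path) -> List[str]:
    """Return every .rst file under docs_path as a string path."""
    return sorted(str(fp) for fp in docs_path.rglob("*.rst"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Invented layout: one top-level file, one nested file, one non-.rst file.
    (root / "data_acquisition").mkdir()
    (root / "index.rst").write_text("Index\n=====\n")
    (root / "data_acquisition" / "DataAcquisition.rst").write_text("Data\n====\n")
    (root / "notes.txt").write_text("not documentation")

    rst_files = find_rst_files_in(root)
    rst_file_names = [Path(file).name for file in rst_files]
```

Returning strings rather than `Path` objects matches the commit note that the `.rst` files are returned as strings.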


@@ -27,6 +27,8 @@ EXTRAS_EXCEPTIONS = {
     "msticnb",
     "pygeohash",
     "pyperclip",
+    "autogen",
+    "importlib_resources",
 }
 CONDA_PKG_EXCEPTIONS = {
     "vt-py",
@@ -69,7 +71,12 @@ def test_missing_pkgs_req():
     missing_reqs = {
         req.strip() for reqs in mod_imports.values() for req in reqs.missing_reqs
     }
-    missing_reqs = missing_reqs - EXTRAS_EXCEPTIONS
+    # Remove any missing modules that are part of an extra
+    missing_reqs = {
+        req
+        for req in missing_reqs
+        if not any(req.startswith(p) for p in EXTRAS_EXCEPTIONS)
+    }
     if missing_reqs:
         print(
             "Missing packages:\n",
@@ -138,10 +145,10 @@ def test_conda_reqs(extras_from_setup):
         conda_reqs_pip_dict.pop(pkg_name)
     if conda_reqs_dict:
-        print("Extra items found in conda-reqs.txt", conda_reqs_pip_dict)
+        print("Extra items found in conda-reqs.txt", conda_reqs_dict)
     check.is_false(conda_reqs_dict, "no extra items in conda-reqs.txt")
     if conda_reqs_pip_dict:
-        print("Extra items found in conda-reqs-pip.txt", conda_reqs_dict)
+        print("Extra items found in conda-reqs-pip.txt", conda_reqs_pip_dict)
     check.is_false(conda_reqs_pip_dict, "no extra items in conda-reqs-pip.txt")
@@ -153,4 +160,8 @@ def _get_reqs_from_file(reqs_file):
         for req in reqs_lines
         if req.strip() and not req.strip().startswith("#")
     ]
-    return {req.name.casefold(): req.specifier for req in reqs}
+    return {
+        req.name.casefold(): req.specifier
+        for req in reqs
+        if req.marker is None or req.marker.evaluate()
+    }
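
Reviewer note: the change in `test_missing_pkgs_req` replaces an exact set difference with a prefix match against `EXTRAS_EXCEPTIONS`. The sketch below shows how the two filters differ. It assumes that a missing requirement can be reported as a dotted module name such as `autogen.agentchat`; that input and the name `not_a_real_pkg` are invented for illustration and not taken from a real test run.

```python
EXTRAS_EXCEPTIONS = {"autogen", "importlib_resources"}

# Hypothetical missing-module names, including a dotted submodule.
missing_reqs = {"autogen.agentchat", "importlib_resources", "not_a_real_pkg"}

# Old behaviour: only exact matches are removed.
by_difference = missing_reqs - EXTRAS_EXCEPTIONS

# New behaviour: anything starting with an excepted name is removed.
by_prefix = {
    req
    for req in missing_reqs
    if not any(req.startswith(p) for p in EXTRAS_EXCEPTIONS)
}
```

Under that assumption the set difference would still flag `autogen.agentchat`, while the prefix filter leaves only `not_a_real_pkg`. The trade-off is that a prefix match also excludes any unrelated package whose name merely begins with an excepted name.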