mirror of https://github.com/microsoft/msticpy.git
Add Autogen and RAG Agent to MSTICpy (#793)
* Add dependencies and ensure that .rst docs are included in build packages
* Establish dev environment and pass pre-commit checks
* Add unit test for finding .rst files, delete old Jupyter Notebook
* Add RAG Agent summary to gettingstarted, update %%ask docstrings
  - Moved example usage of %%ask to the docstring for the cell magic function
  - Added a section about the RAG agent and its usage to the Package Summary
  - Updated example notebook RagAgent.ipynb to remove unnecessary imports
* Ensure that RST files are returned as strings, not Pathlib.Path
* Read Autogen config from msticpyconfig.yaml
* Read Autogen config from msticpyconfig.yaml
* Inject API_KEY envar into config
* Silence group chat
* Reformat RAG Agent code snippets in PackageSummary.rst
* Reenable pre-commit check_reqs
* Fixes for Sync
* Abstract logic from magic class
* PR changes, refactoring magic class
* Formatting fixes, remove unused arguments
* minor fixes
* Fix some pipeline errors
* Pipeline fixes
* Fix test for pkg import failure for autogen
* Update test module names to align with recent changes
* Remove explicit TypeAlias to support Py3.8
* py3.8 issues with union operator backport from future
* Fix imports and type annotations for pipeline
* Fix mypy type issues
* Add importlib-resources to requirements for py3.8
* fix isort
* Add importlib-resources to requirements-all.txt
* Add importlib_resources to EXTRAS
* Add importlib-resources to conda reqs
* Update pkg_import test to handle required python versions
* Fix python version check
* API docs generated for aiagents
  Also fixing a couple of errors in PackageSummary.rst
* Remove extra test file

---------

Co-authored-by: Eileen Garcia <t-egarcia@microsoft.com>
Co-authored-by: Ryan Cobb <ryan@detect.dev>
Co-authored-by: ryan-detect-dot-dev <106877876+ryan-detect-dot-dev@users.noreply.github.com>
Co-authored-by: ianhelle <ianhelle@microsoft.com>
This commit is contained in:
Parent
8f9bf3ba1c
Commit
032c1080cc
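The commit message mentions finding the packaged .rst docs and returning them as strings rather than `pathlib.Path` objects. The sketch below illustrates that idea only. The helper name `find_rst_files` and its signature are assumptions made for this example, not the function added in this commit.

```python
from pathlib import Path
from typing import List


def find_rst_files(root: str) -> List[str]:
    """Return every .rst file under root as a plain string path (hypothetical helper)."""
    # rglob walks the directory tree recursively; str() converts each Path
    # so that callers receive strings rather than Path objects.
    return sorted(str(path) for path in Path(root).rglob("*.rst") if path.is_file())
```

Returning strings keeps the result usable by consumers that expect plain path strings instead of `Path` instances.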
|
@ -7,3 +7,5 @@ passivetotal>=2.5.3
|
|||
sumologic-sdk>=0.1.11
|
||||
splunk-sdk>=1.6.0,!=2.0.0
|
||||
packaging>=24.0
|
||||
pyautogen[retrievechat]>=0.2.28
|
||||
importlib-resources >= 6.4.0; python_version <= "3.8"
|
|
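The `python_version <= "3.8"` marker above installs the `importlib-resources` backport only on older interpreters. Code that needs the resource API on every supported version could select its import by version, roughly as sketched here. The use of `files` and the stdlib `json` package as the lookup target are assumptions for illustration; the commit's own import logic may differ.

```python
import sys

if sys.version_info >= (3, 9):
    # Python 3.9+ provides importlib.resources.files in the standard library
    from importlib.resources import files
else:
    # Older interpreters rely on the importlib-resources backport
    from importlib_resources import files

# Locate the root of an installed package (the stdlib "json" package is used
# here purely as a stand-in for a package that ships data files).
package_root = files("json")
init_file = package_root.joinpath("__init__.py")
```

The same `files(...).joinpath(...)` call works with either import, so the rest of the code does not need to know which branch was taken.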
@ -0,0 +1,716 @@
|
|||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Example Notebook for RAG (Retrieval-Augmented Generation) Agent Usage"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Query the RAG agent using the cell magic `%%ask` command"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# %load_ext msticpy.aiagents.mp_docs_rag_magic\n",
|
||||
"# Or use:\n",
|
||||
"%reload_ext msticpy.aiagents.mp_docs_rag_magic"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"2024-07-30 15:48:19,414 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - \u001b[32mUse the existing collection `MSTICpy_Docs_2.12.0`.\u001b[0m\n",
|
||||
"2024-07-30 15:48:27,518 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Found 384 chunks.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Question**: What are the three things that I need to connect to Microsoft Sentinel Query Provider?"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Answer**: To connect to the Microsoft Sentinel Query Provider, you need the following three things:\n",
|
||||
"\n",
|
||||
"1. A `QueryProvider` instance.\n",
|
||||
"2. The data environment string (\"MSSentinel\" for Microsoft Sentinel).\n",
|
||||
"3. A connection string or authentication parameters.\n",
|
||||
"\n",
|
||||
"Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\data_acquisition\\DataProviders.rst"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%ask \n",
|
||||
"What are the three things that I need to connect to Microsoft Sentinel Query Provider?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Question**: How do I connect to the M365 Defender query provider?"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Answer**: To connect to the M365 Defender query provider, you need to follow these steps:\n",
|
||||
"\n",
|
||||
"1. Ensure your connection details are specified in the `msticpyconfig.yaml` file.\n",
|
||||
"\n",
|
||||
"2. Create a `QueryProvider` instance for M365 Defender.\n",
|
||||
"\n",
|
||||
"3. Call the `connect()` method on the instance.\n",
|
||||
"\n",
|
||||
"Here's an example:\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"from msticpy.data import QueryProvider\n",
|
||||
"\n",
|
||||
"# Create a QueryProvider instance\n",
|
||||
"mdatp_prov = QueryProvider(\"M365D\")\n",
|
||||
"\n",
|
||||
"# Connect to the M365 Defender instance using the configured details\n",
|
||||
"mdatp_prov.connect()\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"If you have multiple instances configured, specify the instance name when calling `connect()`:\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"mdatp_prov.connect(instance=\"Tenant2\")\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"If you prefer to pass connection parameters directly, use keyword arguments:\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"# Collect credentials\n",
|
||||
"ten_id = input('Tenant ID')\n",
|
||||
"client_id = input('Client ID')\n",
|
||||
"client_secret = input('Client Secret')\n",
|
||||
"\n",
|
||||
"# Create a QueryProvider instance\n",
|
||||
"mdatp_prov = QueryProvider('M365D')\n",
|
||||
"\n",
|
||||
"# Connect using collected credentials\n",
|
||||
"mdatp_prov.connect(tenant_id=ten_id, client_id=client_id, client_secret=client_secret)\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Alternatively, you can use a connection string:\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"# Define a connection string\n",
|
||||
"conn_str = (\n",
|
||||
" \"tenant_id='243bb6be-4136-4b64-9055-fb661594199a'; \"\n",
|
||||
" \"client_id='a5b24e23-a96a-4472-b729-9e5310c83e20'; \"\n",
|
||||
" \"client_secret='[PLACEHOLDER]'\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Create a QueryProvider instance\n",
|
||||
"mdatp_prov = QueryProvider('M365D')\n",
|
||||
"\n",
|
||||
"# Connect using the connection string\n",
|
||||
"mdatp_prov.connect(conn_str)\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\data_acquisition\\DataProv-MSDefender.rst"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%ask\n",
|
||||
"How do I connect to the M365 Defender query provider?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Question**: What do I need to add to my msticpyconfig.yaml config for the Azure Resource Graph query provider?"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Answer**: To add Azure Resource Graph to your `msticpyconfig.yaml` configuration, include the following under the `Azure` section:\n",
|
||||
"\n",
|
||||
"```yaml\n",
|
||||
"Azure:\n",
|
||||
" auth_methods:\n",
|
||||
" - cli\n",
|
||||
" - interactive\n",
|
||||
" cloud: global\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"For more information on configuring `msticpyconfig.yaml`, refer to the MSTICPy documentation.\n",
|
||||
"\n",
|
||||
"Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\data_acquisition\\ResourceGraphDriver.rst"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%ask\n",
|
||||
"What do I need to add to my msticpyconfig.yaml config for the Azure Resource Graph query provider?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### A response of `UPDATE_CONTEXT` indicates that the agents are unable to answer the query with the information retrieved by the RAG agent."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Question**: Does the Splunk query provider support device code authentication?"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Answer**: UPDATE CONTEXT"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%ask\n",
|
||||
"Does the Splunk query provider support device code authentication?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Question**: How can I plot IP addresses in this dataframe on a map?"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Answer**: To plot IP addresses in a DataFrame on a map using MSTICpy's FoliumMap, you can use the `mp_plot.folium_map` pandas accessor. Here's an example:\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"# Plotting IP addresses using the mp_plot.folium_map accessor\n",
|
||||
"geo_loc_df.mp_plot.folium_map(ip_column=\"IPAddress\")\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"This will display an interactive map with markers based on the IP addresses in the \"IPAddress\" column of your DataFrame.\n",
|
||||
"\n",
|
||||
"Sources: C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\visualization\\\\FoliumMap.rst"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%ask \n",
|
||||
"How can I plot IP addresses in this dataframe on a map?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Question**: How do I create a new custom data provider with msticpy?"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Answer**: To create a new custom data provider with MSTICpy, follow these main steps:\n",
|
||||
"\n",
|
||||
"1. **Write the driver class:** Derive it from `DriverBase` and implement the methods `__init__`, `connect`, `query`, and optionally `query_with_results`.\n",
|
||||
"2. **Customize the driver (optional):** Expose attributes via `QueryProvider`, and implement custom parameter formatting and query parameter substitution if needed.\n",
|
||||
"3. **Register the driver:** Update the `DataEnvironment` enum and add an entry to the driver dynamic load table.\n",
|
||||
"4. **Add queries:** Create a folder named after your `DataEnvironment` and add your query files there.\n",
|
||||
"5. **Add settings definition:** Define settings in a YAML configuration file.\n",
|
||||
"6. **Create documentation:** Document the configuration and use of the data provider.\n",
|
||||
"7. **Create unit tests:** Add unit tests using mocks to simulate service responses.\n",
|
||||
"\n",
|
||||
"For detailed guidance on these steps, refer to the provided MSTICpy documentation related to data providers.\n",
|
||||
"\n",
|
||||
"Sources: WritingDataProviders.rst, PluginFramework.rst, ExtendingMsticpy.rst"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%ask \n",
|
||||
"How do I create a new custom data provider with msticpy?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Question**: How do I list which TI providers are currently enabled?"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Answer**: ### Step 1: Intent\n",
|
||||
"The user's intent is to get help with **question answering**.\n",
|
||||
"\n",
|
||||
"### Step 2: Answer\n",
|
||||
"To list which Threat Intelligence (TI) providers are currently enabled in MSTICpy, you can inspect the configuration typically found in the `msticpyconfig.yaml` file under the `TIProviders` section. This configuration file determines which providers are set up and whether they are marked as primary/secondary.\n",
|
||||
"\n",
|
||||
"Sources: `C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\extending\\\\WritingTIAndContextProviders.rst`"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%ask \n",
|
||||
"How do I list which TI providers are currently enabled?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Question**: How do I lookup threat intelligence for multiple IP addresses at once?"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Answer**: Step 1: User's intent is to generate code for performing threat intelligence lookups for multiple IP addresses at once.\n",
|
||||
"\n",
|
||||
"Step 2:\n",
|
||||
"```python\n",
|
||||
"from msticpy.context.ip_utils import ip_whois\n",
|
||||
"\n",
|
||||
"# List of IP addresses to lookup\n",
|
||||
"ip_list = [\"123.1.2.3\", \"124.5.6.7\"]\n",
|
||||
"\n",
|
||||
"# Performing Whois lookup for multiple IP addresses\n",
|
||||
"whois_data = ip_whois(ip_list)\n",
|
||||
"print(whois_data)\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\data_acquisition\\IPWhois.rst"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%ask \n",
|
||||
"How do I lookup threat intelligence for multiple IP addresses at once?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Question**: How do I use pivot functions?"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Answer**: To use pivot functions in MSTICpy, you have two primary options: creating persistent pivot function definitions in YAML files or adding ad hoc pivot functions directly in code. Here's a brief overview of both methods:\n",
|
||||
"\n",
|
||||
"**1. Persistent Pivot Function Definitions**\n",
|
||||
"\n",
|
||||
"- Define your pivot function properties in a YAML file with a top-level element `pivot_providers`.\n",
|
||||
"- Example YAML definition:\n",
|
||||
"\n",
|
||||
"```yaml\n",
|
||||
"pivot_providers:\n",
|
||||
" who_is:\n",
|
||||
" src_module: msticpy.context.ip_utils\n",
|
||||
" src_func_name: get_whois_df\n",
|
||||
" func_new_name: whois\n",
|
||||
" input_type: dataframe\n",
|
||||
" entity_map:\n",
|
||||
" IpAddress: Address\n",
|
||||
" func_df_param_name: data\n",
|
||||
" func_df_col_param_name: ip_column\n",
|
||||
" func_out_column_name: query\n",
|
||||
" func_static_params:\n",
|
||||
" all_columns: True\n",
|
||||
" show_progress: False\n",
|
||||
" func_input_value_arg: ip_address\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"- Load and register the definition using:\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"from msticpy.init.pivot_core.pivot import Pivot\n",
|
||||
"Pivot.register_pivot_providers(pivot_reg_path=path_to_your_yaml, namespace=globals(), def_container=\"my_container\", force_container=True)\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"**2. Ad Hoc Pivot Functions in Code**\n",
|
||||
"\n",
|
||||
"- Add a function as a pivot using the `add_pivot_function` method:\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"def my_func(input: str):\n",
|
||||
" return input.upper()\n",
|
||||
"\n",
|
||||
"Pivot.add_pivot_function(\n",
|
||||
" func=my_func,\n",
|
||||
" container=\"change_case\",\n",
|
||||
" input_type=\"value\",\n",
|
||||
" entity_map={\"Host\": \"HostName\"},\n",
|
||||
" func_input_value_arg=\"input\",\n",
|
||||
" func_new_name=\"upper_name\",\n",
|
||||
")\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"- Alternatively, use the `PivotRegistration` class:\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"from msticpy.init.pivot_core.pivot_register import PivotRegistration\n",
|
||||
"\n",
|
||||
"def my_func(input: str):\n",
|
||||
" return input.upper()\n",
|
||||
"\n",
|
||||
"piv_reg = PivotRegistration(\n",
|
||||
" input_type=\"value\",\n",
|
||||
" entity_map={\"Host\": \"HostName\"},\n",
|
||||
" func_input_value_arg=\"input\",\n",
|
||||
" func_new_name=\"upper_name\"\n",
|
||||
")\n",
|
||||
"Pivot.add_pivot_function(my_func, piv_reg, container=\"change_case\")\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"**Running Pivots in DataFrame Pipelines:**\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"(\n",
|
||||
" my_df\n",
|
||||
" .query(\"UserCount > 1\")\n",
|
||||
" .mp_pivot.run(IpAddress.util.whois, column=\"Ioc\")\n",
|
||||
" .drop_duplicates()\n",
|
||||
")\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"- Use `mp_pivot.run` to integrate pivot functions into DataFrame processing pipelines.\n",
|
||||
"- Join input and output DataFrames with the `join` parameter in `mp_pivot.run`.\n",
|
||||
"\n",
|
||||
"**Debugging Tools:**\n",
|
||||
"\n",
|
||||
"- `mp_pivot.display` for intermediate results.\n",
|
||||
"- `mp_pivot.tee` for creating snapshots.\n",
|
||||
"- `mp_pivot.tee_exec` for executing intermediate operations (e.g., plotting).\n",
|
||||
"\n",
|
||||
"Sources: C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\extending\\\\PivotFunctions.rst, C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\data_analysis\\\\PivotFunctions.rst, C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\api\\\\msticpy.init.pivot.rst"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%ask \n",
|
||||
"How do I use pivot functions?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Question**: Which columns do I need in a dataframe to plot process trees?"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Answer**: To plot process trees, the required columns in a DataFrame are typically:\n",
|
||||
"\n",
|
||||
"1. `ParentProcessName`\n",
|
||||
"2. `Process`\n",
|
||||
"\n",
|
||||
"Additional attributes such as `SubjectUserName`, `SubjectDomainName`, `SubjectLogonId`, `NewProcessName`, `CommandLine`, and `TimeGenerated` can be used for more detailed visualization and analysis.\n",
|
||||
"\n",
|
||||
"Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\visualization\\NetworkGraph.rst"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%ask \n",
|
||||
"Which columns do I need in a dataframe to plot process trees?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Question**: What kind of visualizations does msticpy support?"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Answer**: MSTICPy supports various visualizations including interactive timelines, process trees, multi-dimensional Morph Charts, data viewers, matrix plots, network plots, and several others listed under the `msticpy.vis` package.\n",
|
||||
"\n",
|
||||
"Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\index.rst, C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\visualization\\MorphCharts.rst"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%ask \n",
|
||||
"What kind of visualizations does msticpy support?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Question**: How do I add a new query for Microsoft 365 Defender to msticpy?"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Answer**: To add a new query for Microsoft 365 Defender (M365D) to MSTICPy, you should use the `QueryProvider` class. Here's a step-by-step guide on how to achieve it:\n",
|
||||
"\n",
|
||||
"1. **Initialize the `QueryProvider` for M365D**:\n",
|
||||
" ```python\n",
|
||||
" from msticpy.data import QueryProvider\n",
|
||||
"\n",
|
||||
" mdatp_prov = QueryProvider(\"M365D\")\n",
|
||||
" ```\n",
|
||||
"\n",
|
||||
"2. **Connect to the M365 Defender API**:\n",
|
||||
" ```python\n",
|
||||
" mdatp_prov.connect()\n",
|
||||
" ```\n",
|
||||
"\n",
|
||||
"3. **Add your new query**:\n",
|
||||
" You can add new queries to the query store of `QueryProvider`. Here’s an example of how to define and add a new query:\n",
|
||||
" ```python\n",
|
||||
" new_query = \"\"\"\n",
|
||||
" DeviceEvents\n",
|
||||
" | where ActionType == \"FileCreated\"\n",
|
||||
" | limit 10\n",
|
||||
" \"\"\"\n",
|
||||
" mdatp_prov.add_query(\"GetRecentFileCreatedEvents\", new_query)\n",
|
||||
" ```\n",
|
||||
"\n",
|
||||
"4. **Run the newly added query**:\n",
|
||||
" ```python\n",
|
||||
" results = mdatp_prov.exec_query(\"GetRecentFileCreatedEvents\")\n",
|
||||
" print(results)\n",
|
||||
" ```\n",
|
||||
"\n",
|
||||
"In summary, you need to instantiate a `QueryProvider` object for M365D, connect to the API, add the new query, and then execute the query.\n",
|
||||
"\n",
|
||||
"Sources: C:\\Users\\t-egarcia\\Documents\\Forked MSTICpy Repo\\msticpy\\docs\\source\\data_acquisition\\DataProv-MSDefender.rst"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%ask \n",
|
||||
"How do I add a new query for Microsoft 365 Defender to msticpy?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Question**: Which msticpy module contains the code related to visualizing network graphs?"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/markdown": [
|
||||
"\n",
|
||||
"**Answer**: The MSTICpy module that contains the code related to visualizing network graphs is `msticpy.vis.network_plot`.\n",
|
||||
"\n",
|
||||
"Sources: C:\\\\Users\\\\t-egarcia\\\\Documents\\\\Forked MSTICpy Repo\\\\msticpy\\\\docs\\\\source\\\\api\\\\msticpy.vis.network_plot.rst"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%ask\n",
|
||||
"Which msticpy module contains the code related to visualizing network graphs?"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "internshipenv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.14"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
|
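The notebook above notes that an answer of `UPDATE CONTEXT` means the agents could not answer from the retrieved documentation. A caller that post-processes answers might check for that marker before displaying them. This is a minimal sketch; `needs_more_context` is a hypothetical helper and is not part of the msticpy API.

```python
def needs_more_context(answer: str) -> bool:
    """Return True if the answer is only the 'UPDATE CONTEXT' marker (hypothetical helper)."""
    # Normalise whitespace, case and underscores so that both
    # "UPDATE CONTEXT" and "UPDATE_CONTEXT" are recognised.
    normalised = answer.strip().upper().replace("_", " ")
    return normalised == "UPDATE CONTEXT"
```

When the check returns True, rephrasing the question with more specific terms is one way to give the retriever a better chance of finding relevant chunks.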
@ -0,0 +1,7 @@
|
|||
msticpy.aiagents.config\_utils module
|
||||
=====================================
|
||||
|
||||
.. automodule:: msticpy.aiagents.config_utils
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
|
@ -0,0 +1,7 @@
|
|||
msticpy.aiagents.mp\_docs\_rag\_magic module
|
||||
============================================
|
||||
|
||||
.. automodule:: msticpy.aiagents.mp_docs_rag_magic
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
|
@ -0,0 +1,7 @@
|
|||
msticpy.aiagents.rag\_agents module
|
||||
===================================
|
||||
|
||||
.. automodule:: msticpy.aiagents.rag_agents
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
|
@ -0,0 +1,17 @@
|
|||
msticpy.aiagents package
|
||||
========================
|
||||
|
||||
.. automodule:: msticpy.aiagents
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
Submodules
|
||||
----------
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 4
|
||||
|
||||
msticpy.aiagents.config_utils
|
||||
msticpy.aiagents.mp_docs_rag_magic
|
||||
msticpy.aiagents.rag_agents
|
|
@ -0,0 +1,7 @@
|
|||
msticpy.context.tiproviders.binaryedge module
|
||||
=============================================
|
||||
|
||||
.. automodule:: msticpy.context.tiproviders.binaryedge
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
|
@ -15,6 +15,7 @@ Submodules
|
|||
msticpy.context.tiproviders.abuseipdb
|
||||
msticpy.context.tiproviders.alienvault_otx
|
||||
msticpy.context.tiproviders.azure_sent_byoti
|
||||
msticpy.context.tiproviders.binaryedge
|
||||
msticpy.context.tiproviders.crowdsec
|
||||
msticpy.context.tiproviders.greynoise
|
||||
msticpy.context.tiproviders.ibm_xforce
|
||||
|
|
|
@ -12,6 +12,7 @@ Subpackages
|
|||
.. toctree::
|
||||
:maxdepth: 4
|
||||
|
||||
msticpy.aiagents
|
||||
msticpy.analysis
|
||||
msticpy.auth
|
||||
msticpy.common
|
||||
|
|
|
@ -217,6 +217,7 @@ autoapi_dirs = ["../../msticpy"]
|
|||
|
||||
autodoc_mock_imports = [
|
||||
"adal",
|
||||
"autogen",
|
||||
"azure",
|
||||
"azure.common.exceptions",
|
||||
"azure.core.exceptions",
|
||||
|
@ -245,6 +246,7 @@ autodoc_mock_imports = [
|
|||
"dns",
|
||||
"folium",
|
||||
"geoip2",
|
||||
"importlib_resources",
|
||||
"ipwhois",
|
||||
"IPython",
|
||||
"ipywidgets",
|
||||
|
|
|
@ -318,7 +318,7 @@ of known malicious commands as well as suspicious patterns of behaviour.
|
|||
domain_utils
|
||||
~~~~~~~~~~~~
|
||||
|
||||
:py:mod:`msticpy.transform.domain_utils`
|
||||
:py:mod:`msticpy.context.domain_utils`
|
||||
|
||||
Module to support investigation of domain names and URLs with functions to
|
||||
validate a domain name and screenshot a URL.
|
||||
|
@ -334,6 +334,68 @@ without compromising privacy.
|
|||
|
||||
See :doc:`../data_acquisition/DataMasking`
|
||||
|
||||
|
||||
|
||||
|
||||
Agentic Module
|
||||
-----------------
|
||||
aiagents Module: RAG Agent
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
:py:mod:`msticpy.aiagents.mp_docs_rag_magic`
|
||||
|
||||
The **aiagents** module in MSTICpy introduces the **Retrieval-Augmented Generation (RAG) Agent**.
|
||||
This agent is designed for Q&A about MSTICpy. It enhances InfoSec investigations and
|
||||
threat hunting in Jupyter Notebooks by providing contextually relevant responses using large language
|
||||
models and external knowledge retrieval from the MSTICpy documentation.
|
||||
|
||||
You can invoke the RAG agent in a Jupyter Notebook by adding the following:
|
||||
|
||||
.. code-block:: ipython
|
||||
|
||||
# Load the RAG cell magic
|
||||
%load_ext msticpy.aiagents.mp_docs_rag_magic
|
||||
|
||||
.. code-block:: ipython
|
||||
|
||||
# The following command should be in a separate cell and be the first line of the cell
|
||||
%%ask
|
||||
What are the three things that I need to connect to Azure Query Provider?
|
||||
|
||||
You can also invoke the RAG Agent outside of a Jupyter Notebook as follows:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# Import the necessary module
|
||||
from msticpy.aiagents.rag_agents import (
|
||||
ask_question,
|
||||
get_retrieval_assistant_agent,
|
||||
get_retrieval_user_proxy_agent,
|
||||
)
|
||||
|
||||
# Initialize the agents
|
||||
assistant_agent = get_retrieval_assistant_agent()
|
||||
user_proxy_agent = get_retrieval_user_proxy_agent()
|
||||
|
||||
# Define your question
|
||||
question = "Your MSTICpy-related question here"
|
||||
|
||||
# Query the agents
|
||||
response = ask_question(
|
||||
assistant_agent,
|
||||
user_proxy_agent,
|
||||
question=question,
|
||||
)
|
||||
|
||||
print(response.summary)
|
||||
|
||||
|
||||
See also:
|
||||
|
||||
- :doc:`Autogen LLM Configurations in msticpyconfig <../getting_started/msticpyconfig>`
|
||||
- Sample notebook - `RAG Agent Example Notebook <https://github.com/microsoft/msticpy/blob/master/docs/notebooks/RagAgent.ipynb>`__
|
||||
|
||||
|
||||
Supported Platforms and Packages
|
||||
--------------------------------
|
||||
|
||||
|
@ -341,3 +403,4 @@ Supported Platforms and Packages
|
|||
- Requires Python 3.8 or later
|
||||
- See `requirements.txt <https://github.com/microsoft/msticpy/blob/master/requirements.txt>`__
|
||||
for more details and version requirements.
|
||||
|
||||
|
|
|
@ -171,6 +171,42 @@ as geo-location providers.
|
|||
DBFolder: "~/.msticpy"
|
||||
Provider: "GeoLiteLookup"
|
||||
|
||||
Autogen
|
||||
~~~~~~~
|
||||
This section allows you to specify `settings for the autogen module
|
||||
<https://microsoft.github.io/autogen/docs/topics/llm_configuration>`__.
|
||||
`Autogen` is a top-level section in the configuration file. This
|
||||
section is used as the `llm_config` argument when instantiating
|
||||
an agent from the `autogen` package.
|
||||
|
||||
When using Entra ID authentication with Autogen, you can specify the
|
||||
`azure_ad_token_provider` as the string `token_provider`. This will
|
||||
automatically use `DefaultAzureCredential` to obtain bearer tokens
|
||||
rather than using an API key. This is the preferred method of
|
||||
authenticating to Azure OpenAI endpoints because there are no
|
||||
secrets to manage.
|
||||
|
||||
If using an API key, you can specify `api_key` as the name of
|
||||
an environment variable.
|
||||
|
||||
Here is an example with two Azure OpenAI endpoint entries, one authenticating
|
||||
with Entra ID and the other with an API key:
|
||||
|
||||
.. code:: yaml
|
||||
|
||||
Autogen:
|
||||
config_list:
|
||||
- model: gpt-4o
|
||||
api_type: azure
|
||||
api_version: 2024-02-15-preview
|
||||
base_url: https://my-aoai-endpoint.openai.azure.com/
|
||||
azure_ad_token_provider: token_provider
|
||||
- model: gpt-4o
|
||||
api_type: azure
|
||||
api_version: 2024-02-15-preview
|
||||
base_url: https://my-aoai-endpoint.openai.azure.com/
|
||||
api_key: SOME_ENVIRONMENT_VARIABLE
|
||||
|
||||
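The `api_key` indirection described above can be illustrated with a small standalone sketch. It is not the MSTICpy implementation: `resolve_api_key` and the variable name `DEMO_AOAI_KEY` are hypothetical, and a plain `KeyError` stands in for the package's own configuration error.

```python
import os
from typing import Dict


def resolve_api_key(entry: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `entry` with `api_key` replaced by the named env var's value."""
    if "api_key" not in entry:
        return dict(entry)
    env_name = entry["api_key"]
    value = os.environ.get(env_name)
    if not value:
        # Fail loudly rather than send the variable *name* as if it were a key.
        raise KeyError(f"Environment variable {env_name} is not set")
    return {**entry, "api_key": value}


# Hypothetical variable name, set here only so the example is runnable.
os.environ["DEMO_AOAI_KEY"] = "not-a-real-key"
entry = {"model": "gpt-4o", "api_type": "azure", "api_key": "DEMO_AOAI_KEY"}
resolved = resolve_api_key(entry)
```

The configuration file then only ever holds the variable name, so it can be shared or committed without exposing the secret itself.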
DataProviders
|
||||
~~~~~~~~~~~~~~
|
||||
This section is similar to the previous two sections, allowing you
|
||||
|
|
|
@ -0,0 +1,6 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for
# license information.
# --------------------------------------------------------------------------
"""Modules related to AI agents used in MSTICpy."""

@ -0,0 +1,116 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for
# license information.
# --------------------------------------------------------------------------
"""Utility Modules related to AI agents used in MSTICpy."""

import os
from typing import Callable, Dict, List, Union

from azure.identity import DefaultAzureCredential, get_bearer_token_provider

from ..common.exceptions import MsticpyUserConfigError
from ..common.pkg_config import get_config


ConfigItem = Dict[str, Union[str, Callable]]
ConfigList = List[ConfigItem]
Config = Dict[str, Union[str, float, ConfigList]]

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)


def inject_token_provider_callable(
    config: ConfigItem,
) -> ConfigItem:
    """Replace autogen configuration `azure_ad_token_provider` with a token provider callable.

    Parameters
    ----------
    config : ConfigItem
        Autogen LLM configuration.

    Returns
    -------
    ConfigItem
        Autogen LLM configuration with the token provider callable.
    """
    if "azure_ad_token_provider" in config:
        config["azure_ad_token_provider"] = token_provider

    return config


def inject_environment_variable(config: ConfigItem) -> ConfigItem:
    """Replace autogen configuration `api_key` with the value of an environment variable.

    Parameters
    ----------
    config : ConfigItem
        Autogen LLM configuration.

    Returns
    -------
    ConfigItem
        Autogen LLM configuration with the environment variable value.
    """
    if "api_key" in config:
        api_key = os.environ.get(str(config["api_key"]), None)
        if not api_key:
            raise MsticpyUserConfigError(
                f"Environment variable {config['api_key']} specified, but not found!"
            )
        config["api_key"] = api_key

    return config


def get_autogen_config_from_msticpyconfig() -> Config:
    """Get Autogen configuration from msticpyconfig.yaml.

    See `https://microsoft.github.io/autogen/docs/topics/llm_configuration`
    for more information on the structure of the LLM configuration object.

    Please note that a configuration list is required, rather than a single configuration.

    Returns
    -------
    Config
        Autogen configuration.

    Raises
    ------
    MsticpyUserConfigError
        Autogen settings not found in msticpyconfig.yaml configuration
    MsticpyUserConfigError
        Autogen.config_list setting not found in msticpyconfig.yaml configuration
    """
    autogen_config = get_config("Autogen", None)
    if not autogen_config:
        raise MsticpyUserConfigError(
            "Autogen settings not found in msticpyconfig.yaml configuration!"
        )

    if "config_list" not in autogen_config:
        raise MsticpyUserConfigError(
            "Autogen.config_list setting not found in msticpyconfig.yaml configuration!"
        )

    injectors = [
        inject_token_provider_callable,
        inject_environment_variable,
    ]

    config_list = []
    for config in autogen_config["config_list"]:
        for injector in injectors:
            config = injector(config)
        config_list.append(config)

    return {
        **autogen_config,
        "config_list": config_list,
    }

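One detail of the injector loop in this file is that each injector modifies the dictionary it receives. The sketch below shows a copy-before-inject variant. It assumes, without having checked `get_config`, that the configuration source may return the same dictionaries on every call, in which case a second pass would treat an already-resolved key as a variable name. `apply_injectors` and `mark_resolved` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List

ConfigItem = Dict[str, str]
Injector = Callable[[ConfigItem], ConfigItem]


def apply_injectors(items: List[ConfigItem], injectors: List[Injector]) -> List[ConfigItem]:
    """Run every injector over a shallow copy of each item, leaving inputs untouched."""
    result = []
    for item in items:
        current = dict(item)  # shallow copy so the caller's dict is not mutated
        for injector in injectors:
            current = injector(current)
        result.append(current)
    return result


def mark_resolved(item: ConfigItem) -> ConfigItem:
    """Toy injector (hypothetical): rewrites `api_key` in place."""
    if "api_key" in item:
        item["api_key"] = "resolved:" + item["api_key"]
    return item


source = [{"model": "gpt-4o", "api_key": "SOME_ENVIRONMENT_VARIABLE"}]
first = apply_injectors(source, [mark_resolved])
second = apply_injectors(source, [mark_resolved])
```

Because the source list is never modified, running the pipeline twice gives the same result both times.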
@ -0,0 +1,69 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for
# license information.
# --------------------------------------------------------------------------
"""Integrates MSTICpy with IPython custom magic commands for retrieval-augmented generation."""

import io
from contextlib import redirect_stdout

from IPython.core.magic import Magics, cell_magic, magics_class
from IPython.display import display_markdown

from .rag_agents import (
    ask_question,
    get_retrieval_assistant_agent,
    get_retrieval_user_proxy_agent,
)


@magics_class
class DocsRagMagic(Magics):
    """Implement a class to provide RAG Magic functionalities for MSTICpy."""

    def __init__(self, shell):
        super().__init__(shell)
        self.assistant_agent = get_retrieval_assistant_agent()
        self.user_proxy_agent = get_retrieval_user_proxy_agent()

    @cell_magic
    def ask(self, _, cell: str):
        """
        Enable the user to ask a question to the RAG agent using a cell magic function.

        It calls the ask_question function with the cell content as the question.

        Parameters
        ----------
        cell : str
            The content of the cell. This is used as the question to ask the RAG agent.

        Example Usage
        -------------
        To ask a question to the RAG agent, use the cell magic
        command followed by the question in the cell.
        For example:

        %%ask
        Which msticpy module contains the code related to visualizing network graphs?
        """
        question = cell.strip()
        output = io.StringIO()
        with redirect_stdout(output):
            response = ask_question(
                self.assistant_agent,
                self.user_proxy_agent,
                question=question,
            )

        answer = response.summary

        display_markdown(f"\n**Question**: {question}", raw=True)
        display_markdown(f"\n**Answer**: {answer}", raw=True)


# Register the magic class with IPython
def load_ipython_extension(ipython):
    """Register the magic class with IPython."""
    ipython.register_magics(DocsRagMagic)

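The `ask` magic in this file silences the agents' console chatter by wrapping the call in `redirect_stdout`. A minimal standalone sketch of that pattern follows; `noisy_answer` is a hypothetical stand-in for the agent call and is not part of MSTICpy or Autogen.

```python
import io
from contextlib import redirect_stdout


def noisy_answer(question: str) -> str:
    """Stand-in (hypothetical) for a call that prints progress while it works."""
    print("retrieving documents...")
    print("calling the model...")
    return f"answer to: {question}"


buffer = io.StringIO()
with redirect_stdout(buffer):
    # Everything printed inside this block goes to `buffer`, not the notebook.
    answer = noisy_answer("Which module draws network graphs?")

captured = buffer.getvalue()
```

Note that `redirect_stdout` only intercepts writes to `sys.stdout`; output sent to `sys.stderr` or produced by child processes is not captured by this approach.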
@ -0,0 +1,157 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for
# license information.
# --------------------------------------------------------------------------
"""
Module for MSTICpy documentation utilities and retrieval agent configuration.

Includes functions to find documentation files and to set up retrieval
agents that assist security analysts by answering questions based on MSTICpy documentation.
"""

import sys

from pathlib import Path
from typing import List, Optional

from autogen.agentchat.chat import ChatResult
from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

from .._version import VERSION
from ..common.exceptions import MsticpyUserConfigError
from .config_utils import get_autogen_config_from_msticpyconfig

if sys.version_info < (3, 9):
    import importlib_resources as pkg_resources
else:
    import importlib.resources as pkg_resources


def find_rst_files() -> List[str]:
    """
    Find all .rst files in the docs/source directory of 'msticpy' package.

    Returns
    -------
    list of str
        List of paths to .rst files in the docs/source directory.
    """
    # Get the path to the docs/source directory of the package
    docs_path = Path(pkg_resources.files("msticpy")).parent / "docs" / "source"

    # Find all .rst files in the docs/source directory
    rst_files = list(str(fp) for fp in docs_path.rglob("*.rst"))

    return rst_files


def get_retrieval_assistant_agent(system_message: str = "") -> RetrieveAssistantAgent:
    """
    Create and return a RetrieveAssistantAgent.

    Parameters
    ----------
    system_message : str, optional
        Custom system message for the assistant.

    Returns
    -------
    RetrieveAssistantAgent
        Configured RetrieveAssistantAgent instance.
    """
    if not system_message:
        system_message = (
            "You are a helpful assistant to security analysts using MSTICpy."
        )
    return RetrieveAssistantAgent(
        name="assistant",
        system_message=system_message,
        llm_config=get_autogen_config_from_msticpyconfig(),
    )


def get_retrieval_user_proxy_agent(
    customized_prompt: Optional[str] = None,
) -> RetrieveUserProxyAgent:
    """
    Create and return a RetrieveUserProxyAgent.

    Parameters
    ----------
    customized_prompt : Optional[str], optional
        Custom prompt for the assistant agent, by default None.

    Returns
    -------
    RetrieveUserProxyAgent
        Configured RetrieveUserProxyAgent instance.

    Raises
    ------
    MsticpyUserConfigError
        Autogen settings not found in msticpyconfig.yaml configuration
    """
    rst_files = find_rst_files()
    autogen_config = get_autogen_config_from_msticpyconfig()

    default_model = None
    if "config_list" in autogen_config and isinstance(
        autogen_config["config_list"], list
    ):
        if autogen_config["config_list"]:
            default_config = autogen_config["config_list"][0]
            if "model" in default_config:
                default_model = default_config["model"]

    if not default_model:
        raise MsticpyUserConfigError(
            "Could not find a valid default Autogen model in msticpyconfig.yaml configuration!"
        )

    return RetrieveUserProxyAgent(
        name="ragproxyagent",
        human_input_mode="NEVER",
        max_consecutive_auto_reply=1,
        is_termination_msg=lambda x: True,
        retrieve_config={
            "task": "default",
            "docs_path": rst_files,
            "chunk_token_size": 2000,
            "customized_prompt": customized_prompt,
            "model": default_model,
            "vector_db": "chroma",
            "collection_name": f"MSTICpy_Docs_{VERSION}",
            "get_or_create": True,
        },
        code_execution_config=False,
    )


def ask_question(
    assistant_agent: RetrieveAssistantAgent,
    user_proxy_agent: RetrieveUserProxyAgent,
    question: str,
) -> ChatResult:
    """
    Ask a question using the assistant and user proxy agents.

    Parameters
    ----------
    assistant_agent : RetrieveAssistantAgent
        The assistant agent to use.
    user_proxy_agent : RetrieveUserProxyAgent
        The user proxy agent to use.
    question : str
        The question to ask.

    Returns
    -------
    ChatResult
        The result of the chat interaction.
    """
    assistant_agent.reset()
    return user_proxy_agent.initiate_chat(
        assistant_agent, message=user_proxy_agent.message_generator, problem=question
    )

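The default-model lookup inside `get_retrieval_user_proxy_agent` takes the `model` of the first `config_list` entry. As a rough standalone sketch of that selection rule (the helper name `first_model` is hypothetical, and the real function raises a configuration error instead of returning `None`):

```python
from typing import Any, Dict, Optional


def first_model(llm_config: Dict[str, Any]) -> Optional[str]:
    """Return the `model` of the first `config_list` entry, or None if unavailable."""
    config_list = llm_config.get("config_list")
    if isinstance(config_list, list) and config_list:
        first = config_list[0]
        if isinstance(first, dict):
            return first.get("model")
    return None


with_model = first_model({"config_list": [{"model": "gpt-4o"}, {"model": "other"}]})
empty_list = first_model({"config_list": []})
missing = first_model({})
```

Under this rule the order of entries in `config_list` matters: only the first entry decides which model name is passed to the retrieval configuration.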
@ -23,6 +23,7 @@ folium>=0.9.0
geoip2>=2.9.0
httpx>=0.23.0, <1.0.0
html5lib
importlib-resources >= 6.4.0; python_version <= "3.8"
ipython >= 7.1.1; python_version < "3.8"
ipython >= 7.23.1; python_version >= "3.8"
ipywidgets>=7.4.2, <9.0.0

@ -43,6 +44,7 @@ packaging>=24.0
pandas>=1.4.0, <3.0.0
panel>=0.14.4
passivetotal>=2.5.3
pyautogen[retrievechat]>=0.2.28
pydantic>=1.8.0, <3.0.0
pygments>=2.0.0
pyjwt>=2.3.0

@ -16,6 +16,7 @@ folium>=0.9.0
geoip2>=2.9.0
httpx>=0.23.0, <1.0.0
html5lib
importlib-resources >= 6.4.0; python_version <= "3.8"
ipython >= 7.1.1; python_version < "3.8"
ipython >= 7.23.1; python_version >= "3.8"
ipywidgets>=7.4.2, <9.0.0

6
setup.py
@ -53,6 +53,7 @@ EXTRAS = {
    "sql2kql": ["mo-sql-parsing>=8, <9.0.0"],
    "riskiq": ["passivetotal>=2.5.3"],
    "panel": ["panel>=0.14.4"],
    "aiagents": ["pyautogen[retrievechat]>=0.2.28"],
}
extras_all = [
    extra for name, extras in EXTRAS.items() for extra in extras if name != "dev"
@ -73,5 +74,8 @@ EXTRAS["sentinel"] = sorted(_combine_extras(["azure", "kql", "keyvault"]))

 if __name__ == "__main__":
     setuptools.setup(
-        install_requires=INSTALL_REQUIRES, extras_require=EXTRAS, version=__version__
+        install_requires=INSTALL_REQUIRES,
+        extras_require=EXTRAS,
+        version=__version__,
+        package_data={"docs": ["msticpy/docs/source/**/*.rst"]},
     )

@ -0,0 +1,18 @@
from pathlib import Path

from msticpy.aiagents.rag_agents import find_rst_files


def test_find_rst_files_not_empty():
    rst_files = find_rst_files()

    # Assert that the list of .rst files is not empty
    assert rst_files, "The list of .rst files is empty."


def test_find_specific_rst_file():
    # Optionally, we can also check for specific .rst files
    rst_files = find_rst_files()

    rst_file_names = [Path(file).name for file in rst_files]
    assert "DataAcquisition.rst" in rst_file_names, "The specific file is not found."

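The tests above depend on the real package layout. The recursive search that `find_rst_files` relies on can also be exercised against a temporary directory, as in the hedged sketch below; `find_files_with_suffix` is a hypothetical helper written for this example, not a function from MSTICpy.

```python
import tempfile
from pathlib import Path
from typing import List


def find_files_with_suffix(root: Path, suffix: str = ".rst") -> List[str]:
    """Recursively collect files under `root` ending in `suffix`, as sorted strings."""
    return sorted(str(path) for path in root.rglob(f"*{suffix}") if path.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "api").mkdir()
    (root / "index.rst").write_text("Index\n=====\n")
    (root / "api" / "module.rst").write_text("Module\n======\n")
    (root / "notes.txt").write_text("not documentation\n")
    # Keep only the file names so the result does not depend on the temp path.
    found_names = [Path(p).name for p in find_files_with_suffix(root)]
```

Returning strings rather than `Path` objects matches what the commit message says about passing the file list on as plain strings.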
@ -27,6 +27,8 @@ EXTRAS_EXCEPTIONS = {
    "msticnb",
    "pygeohash",
    "pyperclip",
    "autogen",
    "importlib_resources",
}
CONDA_PKG_EXCEPTIONS = {
    "vt-py",
@ -69,7 +71,12 @@ def test_missing_pkgs_req():
     missing_reqs = {
         req.strip() for reqs in mod_imports.values() for req in reqs.missing_reqs
     }
-    missing_reqs = missing_reqs - EXTRAS_EXCEPTIONS
+    # Remove any missing modules that are part of an extra
+    missing_reqs = {
+        req
+        for req in missing_reqs
+        if not any(req.startswith(p) for p in EXTRAS_EXCEPTIONS)
+    }
     if missing_reqs:
         print(
             "Missing packages:\n",
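The hunk above swaps a plain set difference for a prefix-based filter. The sketch below uses hypothetical sample data to show why the two behave differently when a missing requirement is a submodule path such as `autogen.agentchat`; `drop_by_prefix` is an illustrative helper, not code from the test module.

```python
from typing import Set

# Hypothetical sample data; the real sets come from the test module.
EXCEPTION_PREFIXES = {"autogen", "importlib_resources"}
missing = {"autogen", "autogen.agentchat", "importlib_resources", "requests"}


def drop_by_prefix(names: Set[str], prefixes: Set[str]) -> Set[str]:
    """Keep only names that do not start with any of the given prefixes."""
    return {name for name in names if not any(name.startswith(p) for p in prefixes)}


# Set difference removes exact matches only, so the submodule path survives.
by_difference = missing - EXCEPTION_PREFIXES
# The prefix filter removes the submodule path as well.
by_prefix = drop_by_prefix(missing, EXCEPTION_PREFIXES)
```

One trade-off of prefix matching is that it would also drop an unrelated package whose name merely begins with one of the prefixes.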
@ -138,10 +145,10 @@ def test_conda_reqs(extras_from_setup):
|
|||
conda_reqs_pip_dict.pop(pkg_name)

     if conda_reqs_dict:
-        print("Extra items found in conda-reqs.txt", conda_reqs_pip_dict)
+        print("Extra items found in conda-reqs.txt", conda_reqs_dict)
         check.is_false(conda_reqs_dict, "no extra items in conda-reqs.txt")
     if conda_reqs_pip_dict:
-        print("Extra items found in conda-reqs-pip.txt", conda_reqs_dict)
+        print("Extra items found in conda-reqs-pip.txt", conda_reqs_pip_dict)
         check.is_false(conda_reqs_pip_dict, "no extra items in conda-reqs-pip.txt")


@ -153,4 +160,8 @@ def _get_reqs_from_file(reqs_file):
         for req in reqs_lines
         if req.strip() and not req.strip().startswith("#")
     ]
-    return {req.name.casefold(): req.specifier for req in reqs}
+    return {
+        req.name.casefold(): req.specifier
+        for req in reqs
+        if req.marker is None or req.marker.evaluate()
+    }
