Adding Sentinel Transformations Library
This commit is contained in:
Parent
3ac16eeb0d
Commit
b04bf6252b
Tools/Transformations-Library/Filtering/FilteringFieldsDCR.json
@@ -0,0 +1,66 @@
{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "dataCollectionRuleName": {
            "type": "String",
            "metadata": {
                "description": "Specifies the name of the Data Collection Rule to create."
            }
        },
        "location": {
            "defaultValue": "westus2",
            "allowedValues": [
                "westus2",
                "eastus2",
                "eastus2euap"
            ],
            "type": "String",
            "metadata": {
                "description": "Specifies the location in which to create the Data Collection Rule."
            }
        },
        "workspaceResourceId": {
            "type": "String",
            "metadata": {
                "description": "Specifies the Azure resource ID of the Log Analytics workspace to use."
            }
        }
    },
    "resources": [
        {
            "type": "Microsoft.Insights/dataCollectionRules",
            "apiVersion": "2021-09-01-preview",
            "name": "[parameters('dataCollectionRuleName')]",
            "location": "[parameters('location')]",
            "kind": "WorkspaceTransforms",
            "properties": {
                "destinations": {
                    "logAnalytics": [
                        {
                            "workspaceResourceId": "[parameters('workspaceResourceId')]",
                            "name": "clv2ws1"
                        }
                    ]
                },
                "dataFlows": [
                    {
                        "streams": [
                            "Microsoft-Table-AWSVPCFlow"
                        ],
                        "destinations": [
                            "clv2ws1"
                        ],
                        "transformKql": "source | project-away Version, InterfaceId"
                    }
                ]
            }
        }
    ],
    "outputs": {
        "dataCollectionRuleId": {
            "type": "String",
            "value": "[resourceId('Microsoft.Insights/dataCollectionRules', parameters('dataCollectionRuleName'))]"
        }
    }
}
Tools/Transformations-Library/Filtering/FilteringRowsDCR.json
@@ -0,0 +1,66 @@
{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "dataCollectionRuleName": {
            "type": "String",
            "metadata": {
                "description": "Specifies the name of the Data Collection Rule to create."
            }
        },
        "location": {
            "defaultValue": "westus2",
            "allowedValues": [
                "westus2",
                "eastus2",
                "eastus2euap"
            ],
            "type": "String",
            "metadata": {
                "description": "Specifies the location in which to create the Data Collection Rule."
            }
        },
        "workspaceResourceId": {
            "type": "String",
            "metadata": {
                "description": "Specifies the Azure resource ID of the Log Analytics workspace to use."
            }
        }
    },
    "resources": [
        {
            "type": "Microsoft.Insights/dataCollectionRules",
            "apiVersion": "2021-09-01-preview",
            "name": "[parameters('dataCollectionRuleName')]",
            "location": "[parameters('location')]",
            "kind": "WorkspaceTransforms",
            "properties": {
                "destinations": {
                    "logAnalytics": [
                        {
                            "workspaceResourceId": "[parameters('workspaceResourceId')]",
                            "name": "clv2ws1"
                        }
                    ]
                },
                "dataFlows": [
                    {
                        "streams": [
                            "Microsoft-Table-AWSVPCFlow"
                        ],
                        "destinations": [
                            "clv2ws1"
                        ],
                        "transformKql": "source | where Action contains 'REJECT'"
                    }
                ]
            }
        }
    ],
    "outputs": {
        "dataCollectionRuleId": {
            "type": "String",
            "value": "[resourceId('Microsoft.Insights/dataCollectionRules', parameters('dataCollectionRuleName'))]"
        }
    }
}
Tools/Transformations-Library/Filtering/README.md
@@ -0,0 +1,50 @@
# Filtering at ingestion time

Filtering incoming logs is essential to avoid noise in our telemetry and to keep ingestion costs under control.

This folder contains two examples of how to achieve filtering: dropping fields (FilteringFieldsDCR.json) or dropping entire rows (FilteringRowsDCR.json).

## Dropping fields

This is about removing fields that don't add any value to our security operations. The way to achieve this in *transformKql* is very simple:

```kusto
source
| project-away Version, InterfaceId
```

Using `project-away` prevents the specified fields from being ingested into the workspace. Note that the columns themselves don't disappear from the table; the fields simply arrive empty, so their ingestion cost is avoided.
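If you want to preview the effect before deploying the DCR, you can run the same projection in Log Analytics against rows that are already in the workspace. This sketch assumes the AWS VPC Flow logs land in the `AWSVPCFlow` table, which is what the stream name in FilteringFieldsDCR.json points to:

```kusto
// Preview the effect of the transformation on data already in the workspace
AWSVPCFlow
| take 10
| project-away Version, InterfaceId
```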
Deploy this DCR:

[![Deploy this DCR to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure%2FAzure-Sentinel%2Fmaster%2FTools%2FTransformations-Library%2FFiltering%2FFilteringFieldsDCR.json)

## Dropping rows

This is about discarding entire rows (records) when certain conditions are met. Example:

```kusto
source | where Action contains 'REJECT'
```

In this example we're working with a table of firewall traffic, and we only want to keep records where the action taken by the firewall was to reject the traffic. We use the `where` operator to do that.
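The same pattern extends to multiple conditions. Here is a sketch (illustrative only, not part of FilteringRowsDCR.json) that keeps rejected traffic but also drops records whose source address falls in the private 10.0.0.0/8 range:

```kusto
// Illustrative only: combine several row filters in one transformation
source
| where Action contains 'REJECT'
| where not(ipv4_is_in_range(SrcAddr, '10.0.0.0/8'))
```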
Deploy this DCR:

[![Deploy this DCR to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure%2FAzure-Sentinel%2Fmaster%2FTools%2FTransformations-Library%2FFiltering%2FFilteringRowsDCR.json)

## Multiple workspaces for independent entities

There are situations where you have multiple Sentinel workspaces, each owned by an independent entity. In those cases, customers want each entity to see only its own logs and not logs from other entities. This is fine for most data sources, but it can be a challenge for tenant-level sources like Office 365 or Azure AD.

For these situations, you can multi-home the data source (e.g. Office 365) so it sends to multiple workspaces, and then filter out at ingestion time the data that doesn't belong to each entity. For this to work, the events must contain something that identifies the owning entity, such as a different domain or a country code.

![image](../Media/AAD_multi-ws.png)

In this case, you would need to apply a filtering transformation in the default DCR of every workspace involved. Each transformation would look something like this (replacing the country name):

```kusto
OfficeActivity | where OrganizationName == 'contoso-<country_name>.onmicrosoft.com'
```

This will of course vary for each implementation and data type, but it gives an idea of how to do it.
Tools/Transformations-Library/Masking/MaskingDCR.json
@@ -0,0 +1,91 @@
{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "dataCollectionRuleName": {
            "type": "String",
            "metadata": {
                "description": "Specifies the name of the Data Collection Rule to create."
            }
        },
        "location": {
            "defaultValue": "westus2",
            "allowedValues": [
                "westus2",
                "eastus2",
                "eastus2euap"
            ],
            "type": "String",
            "metadata": {
                "description": "Specifies the location in which to create the Data Collection Rule."
            }
        },
        "workspaceResourceId": {
            "type": "String",
            "metadata": {
                "description": "Specifies the Azure resource ID of the Log Analytics workspace to use."
            }
        },
        "endpointResourceId": {
            "type": "String",
            "metadata": {
                "description": "Specifies the Azure resource ID of the Data Collection Endpoint to use."
            }
        }
    },
    "resources": [
        {
            "type": "Microsoft.Insights/dataCollectionRules",
            "apiVersion": "2021-09-01-preview",
            "name": "[parameters('dataCollectionRuleName')]",
            "location": "[parameters('location')]",
            "properties": {
                "dataCollectionEndpointId": "[parameters('endpointResourceId')]",
                "streamDeclarations": {
                    "Custom-CustomerData": {
                        "columns": [
                            {
                                "name": "Time",
                                "type": "datetime"
                            },
                            {
                                "name": "SSN",
                                "type": "string"
                            },
                            {
                                "name": "Email",
                                "type": "string"
                            }
                        ]
                    }
                },
                "destinations": {
                    "logAnalytics": [
                        {
                            "workspaceResourceId": "[parameters('workspaceResourceId')]",
                            "name": "clv2ws1"
                        }
                    ]
                },
                "dataFlows": [
                    {
                        "streams": [
                            "Custom-CustomerData"
                        ],
                        "destinations": [
                            "clv2ws1"
                        ],
                        "transformKql": "source | extend parsedSSN = split(SSN,'-') | extend SSN = iif(SSN matches regex @'^\\d{3}-\\d{2}-\\d{4}$' and not( SSN matches regex @'^(000|666|9)-\\d{2}-\\d{4}$') and not( SSN matches regex @'^\\d{3}-00-\\d{4}$') and not (SSN matches regex @'^\\d{3}-\\d{2}-0000$' ),strcat('XXX','-', 'XX','-',parsedSSN[2]), 'Invalid SSN') | extend Email = iif(Email matches regex @'^[a-zA-Z0-9+_.-]+@[a-zA-Z0-9.-]+$','PII data removed', Email) |project-away parsedSSN | project TimeGenerated = Time, SSN, Email",
                        "outputStream": "Custom-CustomerData_CL"
                    }
                ]
            }
        }
    ],
    "outputs": {
        "dataCollectionRuleId": {
            "type": "String",
            "value": "[resourceId('Microsoft.Insights/dataCollectionRules',parameters('dataCollectionRuleName'))]"
        }
    }
}
Tools/Transformations-Library/Masking/README.md
@@ -0,0 +1,35 @@
# Masking at ingestion time

Masking and obfuscation can be very useful to hide specific content.

The JSON file in this folder contains a Data Collection Rule with a *transformKql* that performs two different kinds of masking. Below we break the query down into those two parts.

## Masking all but the last 4 digits of a Social Security Number

The first masking is done on a field called *SSN*, which is expected to contain a Social Security Number. The goal is to replace the first 5 digits of the SSN with Xs, but only when the SSN is valid. If the SSN is invalid, the field should contain an *Invalid SSN* message instead. This is the part of the transformation that does that:

```kusto
source
| extend parsedSSN = split(SSN,'-')
| extend SSN = iif(SSN matches regex @'^\d{3}-\d{2}-\d{4}$'
    and not(SSN matches regex @'^(000|666|9)-\d{2}-\d{4}$')
    and not(SSN matches regex @'^\d{3}-00-\d{4}$')
    and not(SSN matches regex @'^\d{3}-\d{2}-0000$'), strcat('XXX','-','XX','-',parsedSSN[2]), 'Invalid SSN')
```

First we use `split` to break the SSN field on the `-` separator and store the result in a temporary field. We then use `iif` to do the replacement only when the field matches a valid SSN. Inside the `iif`, we use `matches regex` to check the pattern of a Social Security Number (including exclusions for invalid ranges). If the SSN is valid, we use `strcat` to keep the last 4 digits and replace the first 5 with Xs. If it's not valid, we replace the value with *Invalid SSN*. Finally, we remove the intermediary field.
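To sanity-check the expression before putting it into a DCR, you can run it in Log Analytics against a made-up value (the SSN below is a sample, not real data):

```kusto
// Quick check of the SSN masking logic against a made-up value
print SSN = '123-45-6789'
| extend parsedSSN = split(SSN, '-')
| extend SSN = iif(SSN matches regex @'^\d{3}-\d{2}-\d{4}$'
    and not(SSN matches regex @'^(000|666|9)-\d{2}-\d{4}$')
    and not(SSN matches regex @'^\d{3}-00-\d{4}$')
    and not(SSN matches regex @'^\d{3}-\d{2}-0000$'), strcat('XXX','-','XX','-',parsedSSN[2]), 'Invalid SSN')
| project-away parsedSSN
// Expected result: SSN = XXX-XX-6789
```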
## Removing Personally Identifiable Information

This second masking completely replaces a field that contains an email address. If the value is not a valid email address, we leave it as is. Here is the transformation:

```kusto
source
| extend Email = iif(Email matches regex @'^[a-zA-Z0-9+_.-]+@[a-zA-Z0-9.-]+$','PII data removed', Email)
```

This one is simpler: if the field matches the regex we replace the whole value; if not, we leave the field content as is.
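As before, you can validate the behavior against a couple of made-up values before deploying (both rows below are hypothetical):

```kusto
// Hypothetical sample rows to validate the email masking expression
datatable (Email: string) [
    'alice@contoso.com',
    'not-an-email'
]
| extend Email = iif(Email matches regex @'^[a-zA-Z0-9+_.-]+@[a-zA-Z0-9.-]+$', 'PII data removed', Email)
// Expected: the first row becomes 'PII data removed', the second stays unchanged
```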
Deploy this DCR (it includes both the SSN and email maskings):

[![Deploy this DCR to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure%2FAzure-Sentinel%2Fmaster%2FTools%2FTransformations-Library%2FMasking%2FMaskingDCR.json)
Tools/Transformations-Library/Media/AAD_multi-ws.png
Binary file not shown. (Width: | Height: | Size: 39 KiB)
Tools/Transformations-Library/README.md
@@ -0,0 +1,27 @@
# Microsoft Sentinel Transformations Library

This repository contains samples for multiple scenarios that are possible thanks to the new Log Analytics Custom Logs v2 and pipeline transformation features.

### Filtering

Ingestion-time transformations allow you to drop specific fields from events, or even entire events, that you don't need to keep in the workspace.

1. [Dropping fields](./Filtering#dropping-fields)
2. [Dropping entire records](./Filtering#dropping-rows)
3. [Multiple workspaces for independent entities](./Filtering#multiple-workspaces-for-independent-entities)

### Enrichment/Tagging

Adding additional context to an event can greatly help analysts in their scoping and investigation process.

1. [Enriching an event or a field in the event with additional meaningful information](./Tagging#enriching-an-event-with-additional-meaningful-information)
2. [Translating a value into a customer’s business-related value (Geo, Departments,…)](./Tagging#translating-a-value-into-a-customers-business-related-value)

### PII Masking/Obfuscation

Another scenario is obfuscation or masking of PII. This can be Social Security Numbers, email addresses, phone numbers, etc.

1. [Masking all but the last 4 digits of an SSN](./Masking#masking-all-but-the-last-4-digits-of-a-social-security-number)
2. [Removing email addresses](./Masking#removing-personally-identifiable-information)
Tools/Transformations-Library/Tagging/EnrichmentDCR.json
@@ -0,0 +1,66 @@
{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "dataCollectionRuleName": {
            "type": "String",
            "metadata": {
                "description": "Specifies the name of the Data Collection Rule to create."
            }
        },
        "location": {
            "defaultValue": "westus2",
            "allowedValues": [
                "westus2",
                "eastus2",
                "eastus2euap"
            ],
            "type": "String",
            "metadata": {
                "description": "Specifies the location in which to create the Data Collection Rule."
            }
        },
        "workspaceResourceId": {
            "type": "String",
            "metadata": {
                "description": "Specifies the Azure resource ID of the Log Analytics workspace to use."
            }
        }
    },
    "resources": [
        {
            "type": "Microsoft.Insights/dataCollectionRules",
            "apiVersion": "2021-09-01-preview",
            "name": "[parameters('dataCollectionRuleName')]",
            "location": "[parameters('location')]",
            "kind": "WorkspaceTransforms",
            "properties": {
                "destinations": {
                    "logAnalytics": [
                        {
                            "workspaceResourceId": "[parameters('workspaceResourceId')]",
                            "name": "clv2ws1"
                        }
                    ]
                },
                "dataFlows": [
                    {
                        "streams": [
                            "Microsoft-Table-AWSVPCFlow"
                        ],
                        "destinations": [
                            "clv2ws1"
                        ],
                        "transformKql": "source | extend Int_Ext_IP_CF = case(toint(case(substring(SrcAddr,0,3) contains '.', substring(SrcAddr,0,2), substring(SrcAddr,0,3))) >100, 'Internal IP', 'External IP')"
                    }
                ]
            }
        }
    ],
    "outputs": {
        "dataCollectionRuleId": {
            "type": "String",
            "value": "[resourceId('Microsoft.Insights/dataCollectionRules', parameters('dataCollectionRuleName'))]"
        }
    }
}
Tools/Transformations-Library/Tagging/README.md
@@ -0,0 +1,30 @@
# Tagging at ingestion time

Tagging each record with additional information can be extremely useful to add context and ease the resolution of security incidents. We have seen this feature used by customers that need to identify each record with the owning entity or team. Another use case is adding more information about an event, for example, adding a text description to an event ID.

## Enriching an event with additional meaningful information

In this case we add context to the event. This example tags all events with the type of IP address found in the *SrcAddr* field.

```kusto
source
| extend Int_Ext_IP_CF = case(toint(case(substring(SrcAddr,0,3) contains '.', substring(SrcAddr,0,2), substring(SrcAddr,0,3))) > 100, 'Internal IP', 'External IP')
```

Here we check whether the first octet of the source IP address is greater than 100, in which case we tag the record as an internal IP; otherwise we tag it as external.
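An equivalent, arguably easier-to-read way to extract the first octet is with `split` (shown only as an illustration; the deployed DCR uses the `substring` version above):

```kusto
// Illustrative alternative: read the first octet with split() instead of substring()
source
| extend firstOctet = toint(split(SrcAddr, '.')[0])
| extend Int_Ext_IP_CF = iff(firstOctet > 100, 'Internal IP', 'External IP')
| project-away firstOctet
```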
Deploy this DCR:

[![Deploy this DCR to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure%2FAzure-Sentinel%2Fmaster%2FTools%2FTransformations-Library%2FTagging%2FEnrichmentDCR.json)

## Translating a value into a customer’s business-related value

Here we do two things: extract contents from a dynamic (JSON) field into separate fields, and use internal information to add company division names.

```kusto
let divisions = parse_json('{"US": "HQ-WW","IL": "CyberEMEA"}');
source
| extend division_CF = divisions[Location], city_CF = tostring(LocationDetails.city)
| extend countryOrRegion_CF = tostring(LocationDetails.countryOrRegion)
| extend state_CF = tostring(LocationDetails.state)
```

We start by creating a collection of JSON key-value pairs, which we then use to populate the division custom field. The remaining custom fields are extracted from an existing JSON field in the original record. Doing this at ingestion time avoids extra parsing effort when querying these logs during investigations.
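A minimal way to check the lookup logic outside a DCR is to run it against made-up rows shaped like the sign-in data this is meant for (all sample values below are hypothetical):

```kusto
// Hypothetical sample rows to validate the division lookup before using it in a DCR
let divisions = parse_json('{"US": "HQ-WW","IL": "CyberEMEA"}');
datatable (Location: string, LocationDetails: dynamic) [
    'US', dynamic({"city": "Redmond", "state": "WA", "countryOrRegion": "US"}),
    'IL', dynamic({"city": "Tel Aviv", "state": "Tel Aviv", "countryOrRegion": "IL"})
]
| extend division_CF = tostring(divisions[Location]), city_CF = tostring(LocationDetails.city)
| extend countryOrRegion_CF = tostring(LocationDetails.countryOrRegion), state_CF = tostring(LocationDetails.state)
```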