README.md
This folder contains sample data for different data connectors that all Microsoft Sentinel contributions can use.
Sample Data Contribution Guidance
Sample data is extremely useful when troubleshooting issues, supporting and/or enhancing the Data Connectors with more Security-focused content (such as Analytics, Hunting Queries, Workbooks, etc.). So, for every data connector committed, authors must also upload the following three (3) files:
Expected file name | Source | Expected samples in the file | Expected file extension |
---|---|---|---|
ProductName_RawLogs | Product | Should contain raw logs directly from the source of the logs | .txt* (for CEF/Syslog-based Data Connectors) or .json (for API-based Data Connectors) |
ProductName_IngestedLogs | Log Analytics Workspace | Should contain logs exported after ingestion into a Log Analytics Workspace | .csv* for all Data Connectors |
ProductName_Schema | Log Analytics Workspace | Should have the schema exported from Log Analytics | .csv* for all Data Connectors |
Note: Replace "ProductName" with the actual name of the Product or data connector.
*Guidance on how to extract these files is below.
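The naming rules in the table can be checked before opening a PR. Below is a minimal sketch, assuming the three files sit in one folder and that "Contoso" stands in for the real product name; the function name `check_sample_files` is hypothetical and not part of any Sentinel tooling.

```python
from pathlib import PurePath

# Expected name suffix -> allowed extensions, taken from the table above.
EXPECTED = {
    "_RawLogs": {".txt", ".json"},
    "_IngestedLogs": {".csv"},
    "_Schema": {".csv"},
}


def check_sample_files(filenames, product_name):
    """Return a list of problems; an empty list means all three files look right."""
    problems = []
    # Map each file's stem (name without extension) to its lower-cased extension.
    by_stem = {PurePath(name).stem: PurePath(name).suffix.lower() for name in filenames}
    for suffix, allowed in EXPECTED.items():
        stem = product_name + suffix
        if stem not in by_stem:
            problems.append(f"missing file: {stem}")
        elif by_stem[stem] not in allowed:
            problems.append(f"{stem}: unexpected extension {by_stem[stem]!r}")
    return problems


# Hypothetical product name used only for illustration.
print(check_sample_files(
    ["Contoso_RawLogs.json", "Contoso_IngestedLogs.csv", "Contoso_Schema.csv"],
    "Contoso",
))
```

The check only looks at names and extensions; it does not inspect file contents.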
Important: Contributors must upload log samples of all types of events that are generated by the product and captured by the data connector. These events may include different event results and response actions that the product generates. It's also important to ensure that log details include fields and/or values that can be normalized. These fields include, but are not limited to, usernames, IP addresses, IDs, and hostnames. Please refer to the Advanced Security Information Model (ASIM) documentation for more details.
Log Format Guidance
Raw logs (directly from the source)
The format of the raw data file depends on the type of connector: JSON (.json) for API-based Data Connectors, or text (.txt) for Syslog/CEF-based Data Connectors. In both cases, the column names or property names should match the property names of the data type.
Below is a sample of the CEF formatted logs in their raw form:
Mar 20 10:12:18 192.168.1.5 CEF: 0|Check Point|VPN-1 & FireWall-1|Check Point|geo_protection|Log|Unknown|act=Drop cs3Label=Protection Type cs3=geo_protection deviceDirection=0 rt=1584698718000 spt=58429 dpt=27016 ifname=eth0 logid=65536 loguid={0x5e74955f,0x0,0x501a8c0,0x19633097} origin=192.168.1.5 originsicname=cn=cp_mgmt,o=FlemingGW..y76ath sequencenum=2 version=5 dst=192.168.1.5 dst_country=Internal inspection_information=Geo-location inbound enforcement inspection_profile=Default Geo Policy product=VPN-1 & FireWall-1 proto=17 src=123.113.101.36 src_country=Other
Mar 20 10:12:19 192.168.1.5 CEF: 0|Check Point|VPN-1 & FireWall-1|Check Point|geo_protection|Log|Unknown|act=Drop cs3Label=Protection Type cs3=geo_protection deviceDirection=0 rt=1584698718000 spt=58429 dpt=27019 ifname=eth0 logid=65536 loguid={0x5e749560,0x0,0x501a8c0,0x19633097} origin=192.168.1.5 originsicname=cn=cp_mgmt,o=FlemingGW..y76ath sequencenum=3 version=5 dst=192.168.1.5 dst_country=Internal inspection_information=Geo-location inbound enforcement inspection_pro^C
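To see which fields a raw CEF sample carries (and therefore which values could be normalized), the records above can be split into header and extension fields. Below is a minimal sketch, not a full CEF parser: it assumes the seven pipe-separated header fields are followed by space-separated `key=value` pairs, and it does not handle escaped `\|` or `\=` sequences. The `SAMPLE` string is a shortened copy of the first record above.

```python
import re

HEADER_FIELDS = ["version", "device_vendor", "device_product", "device_version",
                 "event_class_id", "name", "severity"]

# A key is a run of word characters followed by "="; a value runs until the
# next " key=" or the end of the line, so values may contain spaces.
EXTENSION_RE = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_cef(line):
    """Split one CEF record into (header dict, extension dict)."""
    _, _, cef = line.partition("CEF:")
    parts = cef.strip().split("|", 7)
    if len(parts) < 8:
        raise ValueError("not a complete CEF record")
    header = dict(zip(HEADER_FIELDS, parts[:7]))
    extension = dict(EXTENSION_RE.findall(parts[7]))
    return header, extension


# Shortened version of the first sample record above.
SAMPLE = ("Mar 20 10:12:18 192.168.1.5 CEF: 0|Check Point|VPN-1 & FireWall-1|Check Point|"
          "geo_protection|Log|Unknown|act=Drop cs3Label=Protection Type cs3=geo_protection "
          "spt=58429 dpt=27016 src=123.113.101.36")

header, extension = parse_cef(SAMPLE)
print(header["device_vendor"], extension["cs3Label"])
```

Running this over every line of a raw log file gives a quick inventory of the extension keys present in the sample.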
Below is a sample of a syslog message in its raw form:
<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] BOMAn application event log entry.
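The sample above appears to follow the RFC 5424 layout, where the number in angle brackets encodes facility and severity as `facility * 8 + severity`. Below is a minimal sketch that splits the header fields and decodes that value; it assumes single-space separators and leaves the structured data and message text unparsed in `rest`.

```python
import re

# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID, then everything else.
RFC5424_RE = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) (?P<rest>.*)"
)


def parse_syslog_header(line):
    """Return the header fields of an RFC 5424 style message as a dict."""
    match = RFC5424_RE.match(line)
    if match is None:
        raise ValueError("not an RFC 5424 style message")
    fields = match.groupdict()
    pri = int(fields.pop("pri"))
    # PRI = facility * 8 + severity, so divmod recovers both parts.
    fields["facility"], fields["severity"] = divmod(pri, 8)
    return fields


fields = parse_syslog_header(
    '<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 '
    '[exampleSDID@32473 iut="3"] An application event log entry.'
)
print(fields["facility"], fields["severity"], fields["hostname"])
```

For the sample, a PRI of 165 decodes to facility 20 and severity 5.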
Raw logs from API-based connectors can be extracted with an API client (such as Postman) by making an API call to the product and capturing the response. Below is a sample API response captured in its raw form:
[
  {
    "ts": "2020-03-20T16:00:10.144989Z",
    "eventType": "File Scanned",
    "clientName": "COMPUTER-M-V78J",
    "clientMac": "10:dd:b1:eb:88:f8",
    "clientIp": "192.168.128.2",
    "srcIp": "192.168.128.2",
    "destIp": "119.192.233.48",
    "protocol": "http",
    "uri": "http://www.favorite-icons.com/program/FavoriteIconsUninstall.exe",
    "canonicalName": "PUA.Win.Dropper.Kraddare::1201",
    "destinationPort": 80,
    "fileHash": "3ec1b9a95fe62aa25fc959643a0f227b76d253094681934daaf628d3574b3463",
    "fileType": "MS_EXE",
    "fileSizeBytes": 193688,
    "disposition": "Malicious",
    "action": "Blocked"
  },
  {
    "ts": "2022-03-08T01:18:30.072163Z",
    "eventType": "IDS Alert",
    "deviceMac": "ac:17:c8:21:1c:70",
    "clientMac": "",
    "srcIp": "45.137.23.246:42101",
    "destIp": "84.14.28.183:9034",
    "protocol": "udp/ip",
    "priority": "1",
    "classification": "9",
    "blocked": false,
    "message": "SERVER-OTHER RealTek UDPServer command injection attempt",
    "signature": "1:58853:1",
    "sigSource": "ids-vrt-balanced",
    "ruleId": "meraki:intrusion/snort/GID/1/SID/58853"
  }
]
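As an alternative to a GUI client, the same capture can be scripted. Below is a minimal sketch using only the Python standard library; the URL, the `Authorization` header, and the helper names `fetch_raw` and `save_raw_logs` are assumptions for illustration, and the real endpoint and authentication scheme depend on the product. The body is written to disk unchanged so that the file stays a true raw sample.

```python
import json
import urllib.request


def fetch_raw(url, headers=None):
    """Return the response body as bytes, exactly as the API sent it."""
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request) as response:
        return response.read()


def save_raw_logs(body, path):
    """Check that the body is valid JSON, then write it to disk unchanged.

    Returns the number of events if the top level is a list, otherwise 1.
    """
    events = json.loads(body)  # raises ValueError on malformed JSON
    with open(path, "wb") as handle:
        handle.write(body)
    return len(events) if isinstance(events, list) else 1


# Hypothetical usage; replace the URL and header with the product's own.
# body = fetch_raw("https://api.example.com/v1/events",
#                  {"Authorization": "Bearer <token>"})
# save_raw_logs(body, "ProductName_RawLogs.json")
```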
Post-ingestion logs
The post-ingestion logs are exported from Log Analytics using the Export option in the query window. The file format is CSV, as exported from Log Analytics, irrespective of the data connector type. These logs help in understanding how the information from the raw logs has been mapped to fields.
Schema
The schema, like the post-ingestion logs, can be exported from Log Analytics using the Export option in the query window. The exported file is a CSV. It is needed to understand the schema of the table that the logs are ingested into.
Log Extraction Guidance
Extracting ingested logs from Log Analytics Workspace
Ingested logs can be extracted by running a KQL query in the Logs window in Microsoft Sentinel/Log Analytics Workspace. A basic query that returns all logs ingested by a Data Connector will give you the logs along with the defined schema. After you run the query, click Export and then click Export to CSV - all columns.
Extracting raw logs for CEF/Syslog based connectors
There are several ways to capture the original data that comes from syslog devices and is ingested into a syslog-ng or rsyslog server. One way is to capture the traffic on the syslog-ng or rsyslog server on port 514. You can use the following command to capture the traffic into a pcap file:
sudo tcpdump -s 0 -Ani any port 514 -vv -w /var/log/syslog.pcap
Once you have the pcap file, you can view the events with the "tcpick" utility and export them into a readable format:
tcpick -C -yP -r /var/log/syslog.pcap > sampledata.log
nano sampledata.log
Extracting the schema
To extract the schema of the table into a CSV file, run the following query in a Log Analytics query window:
TableName | getschema
Note: Replace "TableName" in the above query with the actual name of the table before executing it in Log Analytics. The query returns the schema of the table, which can then be exported to a CSV file using the Export option as described above for post-ingestion logs.
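Once both CSV files are exported, it can help to confirm that every column in the ingested-logs file is described by the schema file. Below is a minimal sketch, assuming the schema export keeps the `ColumnName` and `ColumnType` headers that `getschema` returns, and assuming that a trailing bracketed suffix such as " [UTC]" on a column header in the log export should be ignored; the function names are hypothetical.

```python
import csv
import io
import re


def schema_columns(schema_csv_text):
    """Map each column name in the exported schema to its column type."""
    reader = csv.DictReader(io.StringIO(schema_csv_text))
    return {row["ColumnName"]: row["ColumnType"] for row in reader}


def unknown_columns(ingested_csv_text, schema):
    """List header names in the ingested-logs CSV that the schema does not describe."""
    header = next(csv.reader(io.StringIO(ingested_csv_text)))
    # Assumption: drop a trailing bracketed suffix such as " [UTC]".
    names = [re.sub(r"\s*\[[^\]]*\]$", "", name) for name in header]
    return [name for name in names if name not in schema]


# Tiny inline stand-ins for the two exported files.
schema = schema_columns(
    "ColumnName,ColumnOrdinal,DataType,ColumnType\n"
    "TimeGenerated,0,System.DateTime,datetime\n"
    "SourceIP,1,System.String,string\n"
)
print(unknown_columns("TimeGenerated [UTC],SourceIP,Extra\n2020-03-20,192.0.2.1,x\n", schema))
```

When reading the real files from disk, opening them with `encoding="utf-8-sig"` avoids a stray byte order mark in the first header name, if the export includes one.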
Sample data upload to GitHub
Once you've gathered all three files, submit them via a GitHub PR. All three files must reside inside a folder called "Sample Data" within the Solution folder. Example folder structure - "Azure-Sentinel/Solutions//Sample Data/".
Important: Please ensure all sample data has been scrubbed to remove any sensitive or personally identifiable information (PII) that may exist in the logs. The intent is to understand the "what" and "how" from the logs, not the "who".
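Part of the scrubbing can be automated. Below is a minimal sketch that replaces IPv4 addresses and email addresses with consistent placeholders, so that relationships between events remain visible while the original values are removed. The patterns and the `make_scrubber` helper are illustrative assumptions; they do not cover usernames, hostnames, MAC addresses, or other identifiers, so a manual review is still needed.

```python
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def make_scrubber():
    """Return a scrub(text) function that reuses one placeholder per original value."""
    ip_map, email_map = {}, {}

    def replace_ip(match):
        # 192.0.2.0/24 is reserved for documentation (RFC 5737).
        # Placeholders start to repeat after 254 distinct addresses.
        return ip_map.setdefault(match.group(), f"192.0.2.{len(ip_map) % 254 + 1}")

    def replace_email(match):
        return email_map.setdefault(match.group(), f"user{len(email_map) + 1}@example.com")

    def scrub(text):
        text = EMAIL_RE.sub(replace_email, text)
        return IPV4_RE.sub(replace_ip, text)

    return scrub


scrub = make_scrubber()
print(scrub("src=123.113.101.36 dst=192.168.1.5 origin=192.168.1.5 suser=alice@contoso.com"))
```

Using one scrubber instance for all three files keeps the placeholders consistent between the raw and ingested samples. Note that the IPv4 pattern also matches other dotted numeric strings, such as four-part version numbers.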