⭐ More by the Microsoft Core Networking team
Find more from the Core Networking team using the MSFTNet topic
What's New in v2.2
For more information, please see What's New
Getting Started
Description
Validate-DCB v2.1 is a PowerShell-based unit test tool that allows you to:
✔️ Validate the expected configuration on one to N number of systems or clusters
✔️ Validate the configuration meets best practices
Additional benefits include:
✔️ The configuration doubles as DCB documentation for the expected configuration of your systems.
✔️ Answer "What Changed?" when faced with an operational issue (see Test Results)
✔️ [New with version 2] Deploy the configuration to nodes
ℹ️ Note: This tool does not modify your system unless you specify the -Deploy parameter. As such, you can re-validate the configuration as many times as desired.
Overview
RDMA over Converged Ethernet (RoCE) requires Data Center Bridging (DCB) technologies to make a network fabric lossless. The configuration requirements are complex and error-prone, requiring exact settings and adherence to best practices across:
➡️ Each Windows Node
➡️ Each network port RDMA traffic passes through on the fabric
This tool aims to validate the DCB configuration on the Windows nodes by taking an expected configuration as input and unit testing each Windows system.
❗ Important: Validation of the network fabric is out of scope for this tool.
Here's a quick introductory video from Microsoft Premier Field Engineer, Jan Mortensen.
Scenarios
Validate-DCB will provide configuration validation for one or more nodes or clusters across a variety of scenarios including:
➡️ Native RDMA Adapters (Mode 1)
➡️ Host vNIC RDMA (Mode 2) with vNICs in the parent partition
➡️ Combination scenarios with both Native RDMA and Host Virtual NICs
➡️ Multiple virtual switches with RDMA enabled adapters
⚠️ For step-by-step configuration instructions, please see the Converged NIC Guide. Alternatively, you can use the deployment options in version 2
Test Overview
Test Types
Currently all tests in Validate-DCB are unit tests. That is, they break down and check individual configuration items one by one, rather than a holistic or functional test. In the future, we may incorporate integration/acceptance testing.
Tests are broken down into two types:
➡️ Global - Tests the TestHost, each SUT, and the configuration file for prerequisites. If anything fails here, Validate-DCB will not move on to the actual DCB tests.
➡️ Modal - Tests each SUT for RDMA and configuration best practices
For more information, please see Test Details
Test Results
Testing with Azure DevOps and a CI/CD pipeline
Besides the on-screen feedback provided by the tool, results of the tests are stored in NUnitXML format in the \Results folder. These results can be retained for historical comparison and take part in a CI/CD pipeline as shown in Building a Continuous Integration and Continuous Deployment pipeline with DSC.
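As a rough sketch of how stored results might be consumed outside the tool, the snippet below parses an NUnit-style XML document and lists the failing test cases. The inline sample only approximates the layout of a results file; the name and result attributes, the sample content, and the Get-FailedTestCase helper are assumptions for illustration, so check them against a real file in \Results before relying on this.

```powershell
# Illustrative sample only: a real file from the \Results folder may differ.
$sampleXml = @'
<test-results name="Pester" total="2" failures="1">
  <test-suite name="[Modal Unit]" result="Failure">
    <results>
      <test-case name="[SUT: TK5-3WP07R0511] Interface status must be Up" result="Success" />
      <test-case name="[SUT: TK5-3WP07R0511] Adapter must be attached to the VMSwitch" result="Failure" />
    </results>
  </test-suite>
</test-results>
'@

# Hypothetical helper: returns the name of every failing test case
function Get-FailedTestCase {
    param([Parameter(Mandatory)][xml] $Results)

    # XPath selects each test-case element whose result attribute is 'Failure'
    $Results.SelectNodes("//test-case[@result='Failure']") |
        ForEach-Object { $_.GetAttribute('name') }
}

$failed = @(Get-FailedTestCase -Results ([xml]$sampleXml))
$failed
```

For a file on disk, the same helper could be fed with [xml](Get-Content -Raw -Path <your results file>).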
Simple report using Power BI
You can also use Power BI to make displaying results easy. For more information, please see Using the Results or see this video from Microsoft Premier Field Engineer, Jan Mortensen.
Interpreting Test Results
Validate-DCB may not work with other languages. In this case, use the tests as guidance on how to verify your configuration.
How Test Output is Constructed
Tests are constructed hierarchically. Describe blocks contain one or more Context blocks. Context blocks contain one or more tests. This is Pester terminology, which is outside the scope of this documentation. Pester is a PowerShell-based unit testing framework included in-box with Windows 10, Server 2016, and Server 2019.
While we have future plans to include more sections, currently the only two possible describe blocks are:
➡️ [Global Unit] tests requirements or prerequisites to run the modal tests
➡️ [Modal Unit] tests a node's configuration or best practices
A context block is a group of one or more tests. For example, Validate-DCB may test a physical NetAdapter's Advanced Properties including the VLANID or NetworkDirect (RDMA in driver terms) settings. These would be grouped in the same context.
Describe or Context Titles
Each Describe, Context, and Test includes a title enclosed in square brackets [ ]. The information inside these square brackets is intended to guide you to the details needed to either resolve a failing test or understand what just passed. Let's use this as an example:
➡️ Describing [Modal Unit] contains unit tests for the RDMA modes of operation (NDK mode 1 or 2)
➡️ Context can be broken down as follows:
↪️ [Modal Unit] – The describe block this Context is within
↪️ [VMSwitch.RDMAEnabledAdapters] – The section of the config file currently being tested
↪️ [SUT: TK5-3WP07R0511] – The hostname of the current System Under Test
In this example, the current context is used for testing an adapter that is expected to be enabled for RDMA and connected to a VMSwitch.
This adapter exists below the VMSwitch section of the configuration file.
✅ Note: During runtime, a variable named $ConfigData contains the information from the config file. With a debugger attached, you can walk the variable like this:
[DBG]: PS C:\> $ConfigData.AllNodes.VMSwitch.RDMAEnabledAdapters
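To make the shape of that expression concrete, here is a minimal, hypothetical stand-in for $ConfigData built from nested hashtables. Only the AllNodes, VMSwitch, and RDMAEnabledAdapters names and the sample values come from this document; the NodeName and Name keys are assumptions for illustration, and a real configuration holds many more settings.

```powershell
# Hypothetical stand-in; the real $ConfigData is populated from your configuration file.
$ConfigData = @{
    AllNodes = @(
        @{
            NodeName = 'TK5-3WP07R0511'               # assumed key name
            VMSwitch = @(
                @{
                    Name                = 'VMSTest'   # assumed key name
                    RDMAEnabledAdapters = @(
                        @{ Name = 'RoCE-01' },
                        @{ Name = 'RoCE-02' }
                    )
                }
            )
        }
    )
}

# Member-access enumeration walks every node and every switch,
# returning the adapter entries defined beneath them
$adapters = $ConfigData.AllNodes.VMSwitch.RDMAEnabledAdapters
$adapterNames = @($adapters | ForEach-Object { $_.Name })
$adapterNames
```

The same dotted walk works at any depth, so stopping at $ConfigData.AllNodes.VMSwitch shows the switch entries instead.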
Passing Tests
If your system passes a test you will see green text similar to this:
+ [SUT: TK5-3WP07R0511]-[VMSwitch: VMSTest]-[RDMAEnabledAdapter: RoCE-01]-[Noun: NetAdapter] Interface status must be "Up"
Using the test above as an example, you can interpret this passing test as:
▶️ The SUT named TK5-3WP07R0511
↪️ is expecting the RDMAEnabledAdapter named RoCE-01
↪️ intended to back the VMSwitch named VMSTest
✔️ to have an interface operational status of "Up"
You can verify this using the PowerShell noun identified in the test (in the example, this is NetAdapter).
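The bracketed segments of a test title follow a regular [Key: Value] pattern, so they can be pulled apart mechanically. The sketch below is not part of Validate-DCB; it simply shows one way to split the example title into its fields with a regular expression.

```powershell
$title = '[SUT: TK5-3WP07R0511]-[VMSwitch: VMSTest]-[RDMAEnabledAdapter: RoCE-01]-[Noun: NetAdapter] Interface status must be "Up"'

# Capture each "[Key: Value]" segment into an ordered dictionary
$fields = [ordered]@{}
$pattern = '\[(?<key>[^\]:]+):\s*(?<value>[^\]]+)\]'
foreach ($match in [regex]::Matches($title, $pattern)) {
    $fields[$match.Groups['key'].Value] = $match.Groups['value'].Value
}

$fields
```

With the noun and adapter name in hand, a matching manual check on the SUT would be along the lines of Get-NetAdapter -Name 'RoCE-01' and comparing its Status property with "Up".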
Failing Tests
If your system is incorrectly configured, the test will provide an error message on-screen.
Unlike most PowerShell scripts, red error messages do not indicate an exception or failing code. Rather, this typically indicates a failing test. In other words, it highlights something you need to fix.
Failing tests give information to identify the misconfiguration. In the failing test shown below (red output), the RDMAEnabledAdapter named RoCE-02 on SUT named TK5-3WP07R0511 was expected to be attached to the VMSwitch named VMSTest.
As you can see above, the Enabled property corresponding to the:
By running Get-NetAdapterBinding on the SUT you can see this for yourself.
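One way to look at the bindings is sketched below, assuming you are running on the SUT and the adapter is named as in the example. Show-AdapterBinding is a hypothetical helper, not part of Validate-DCB; it only wraps Get-NetAdapterBinding with a guard so the snippet does nothing harmful on systems without that cmdlet.

```powershell
# Hypothetical helper: lists every binding on an adapter with its enabled state
function Show-AdapterBinding {
    param([Parameter(Mandatory)][string] $AdapterName)

    # Get-NetAdapterBinding is only present on Windows systems with the NetAdapter module
    if (-not (Get-Command -Name Get-NetAdapterBinding -ErrorAction SilentlyContinue)) {
        Write-Warning 'Get-NetAdapterBinding is not available on this system.'
        return
    }

    Get-NetAdapterBinding -Name $AdapterName |
        Select-Object -Property Name, DisplayName, ComponentID, Enabled
}

# Adapter name taken from the failing test in the example
Show-AdapterBinding -AdapterName 'RoCE-02'
```

Compare the Enabled column for the binding named in the failing test with the value the test expected.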
Here's another video from Microsoft Premier Field Engineer, Jan Mortensen, who reviews and validates errors found with Validate-DCB
Reviewing the Tests
You may also find it useful to review the code generating the failing test. To do this, navigate in the folder structure to the file and line specified in the test failure, for example:
This message identifies the file and line number of the failing test.
Now navigate to the file and review the code.
If you’re still stuck and want to review the variables during runtime, you can set a breakpoint on the line above the one specified in the test failure (the test failed at line 490, so the breakpoint is set at line 489, as shown here):
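Breakpoints can also be set from the console with Set-PSBreakpoint. The sketch below uses a small temporary script as a stand-in for the real test file, so the file name, its content, and the line numbers are all hypothetical; it only demonstrates setting, listing, and removing a line breakpoint.

```powershell
# Stand-in for the test file named in the failure message (hypothetical content)
$testFile = Join-Path ([System.IO.Path]::GetTempPath()) 'Sample.Tests.ps1'
Set-Content -Path $testFile -Value @(
    '$expected = "Up"',
    '$actual = "Down"',
    'if ($actual -ne $expected) { Write-Output "mismatch" }'
)

# Break on the line above the one of interest (line 3 here, so break at line 2)
$breakpoint = Set-PSBreakpoint -Script $testFile -Line 2

# Confirm the breakpoint is registered for this script
Get-PSBreakpoint -Script $testFile | Select-Object -Property Script, Line

# Clean up the breakpoint and the temporary file
$breakpoint | Remove-PSBreakpoint
Remove-Item -Path $testFile
```

For the example in this section, the equivalent call would pass the file from the failure message to -Script and 489 to -Line, and Validate-DCB would then be rerun in the same session.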
⚠️ If searching for a test in the code, please be aware that parentheses typically indicate variables that are being expanded. All other test descriptions should be searchable.
For example, in this test description the exact driver version is specific to a particular NIC manufacturer (in this case 1.90.19240.0) and therefore, you cannot search for this in the test as it’s an expanded variable.
Resolving Test Failures
To complete our example above, we need to resolve the configuration issue. To do this, we'll attach the adapter(s) to the VMSwitch so the binding is now enabled.
Getting Started
Installation
Validate-DCB is now published in the PowerShell Gallery. Please use Install-Module Validate-DCB from a system with internet connectivity.
For disconnected systems, use Save-Module -Name Validate-DCB -Path c:\temp\Validate-DCB, then move the modules in c:\temp\Validate-DCB to your disconnected system. Here's a video from Microsoft Premier Field Engineer, Jan Mortensen.
Requirements
- TestHost: Windows 10, Windows Server 2016, or Windows Server 2019. The TestHost can also be a SUT if it is the appropriate OS.
- System Under Test (SUT): Windows Server 2016 or Windows Server 2019
- Configuration File: This is a file that defines the expected configuration on the SUTs.
Configuration File
Regardless of the scenario, you need a configuration file to define the expected configuration on your systems. Validate-DCB then checks that each system matches the expected configuration. With Validate-DCB v2.1 we recommend using the user interface to create the configuration for you. To do this, run Validate-DCB without parameters. For more information on customizing your own file, please see: Customize your Config
Running Validate-DCB
To begin testing, complete the wizard mentioned in the previous section or run Validate-DCB -ConfigFilePath <Path to your configuration file>.ps1 if you have an existing configuration file you wish to use.
Additionally, you can connect Validate-DCB with your Azure Automation account to first deploy the configuration (then validate).
ℹ️ Note: For full parameter help use:
Get-Help Validate-DCB
Here are a few tips on the parameters.
Parameter | Description |
---|---|
TestScope | Determines the describe block to be run. You can use this to only run certain describe blocks. For example: Use Global if you just want to set up a test host or validate that your systems are ready to be tested. Use Modal if you already know that all the prerequisites are met. |
LaunchUI | Use this parameter to launch a user interface that helps create a configuration file. |
ExampleConfig | Use this to select one of the pre-defined configuration files that will test a system in Mode 1 or Mode 2. For more information on the example configuration guides, please see Examples. For details about the configuration for these modes, please review the Converged NIC Guide |
ConfigFilePath | Use this parameter to specify the path to a custom configuration file. |
ContinueOnFailure | If a test fails in one of the Describe blocks, Validate-DCB exits prior to moving to the next Describe block allowing you to correct the issue. Use this to attempt all tests even if a test failure is detected. |
Deploy | Use this parameter to deploy the configuration to all specified nodes prior to validating the configuration |
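As a hedged illustration of how these parameters might be combined, the sketch below collects them in a hashtable and splats them onto the command. The configuration path is hypothetical, and whether these particular parameters may be used together is an assumption; Get-Help Validate-DCB is the authority on valid combinations.

```powershell
# Hypothetical path; point this at your own configuration file
$params = @{
    ConfigFilePath    = 'C:\Configs\MyDcbConfig.ps1'
    TestScope         = 'Global'   # check prerequisites only
    ContinueOnFailure = $true      # attempt all tests even after a failure
}

# Only invoke the tool when the module is installed on this system
if (Get-Command -Name Validate-DCB -ErrorAction SilentlyContinue) {
    Validate-DCB @params
}
else {
    Write-Warning 'Validate-DCB is not installed; see the Installation section.'
}
```

Splatting keeps the parameter set in one place, so switching TestScope to Modal for a later run is a one-line change.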