+ Fix build issue from publics directory layout change
+ Fix comments mentioning pu32PubExp
+ Add -mspeculative-load-hardening to module compiler flags when compiled with clang
Related work items: #33693362
Internal RBG state in SymCrypt for Open Enclave - seeded via RDSEED, our SP 800-90B compliant entropy source.
Small random byte request cache logic copied from Windows codebase. One RBG state, will reseed once the counter gets high enough
+ Documentation issues identified externally and in code review
+ Add note that Import DH and RSA keys do not check primality
but may do so in future
+ In `SymCryptEcDsaSignEx` return an error
if the produced signature is not valid when `piK` is provided
+ Signed overshift identified by UB sanitizer
+ Unittest memory leaks identified by leak sanitizer
+ Technical buffer overrun identified by memory sanitizer
+ Rejig ADO pipelines to run Debug, Release, and Sanitize modes
using pipeline template to avoid too much duplication
+ Sanitize mode enables Address, Leak, and Undefined Behavior
sanitizers on a version of the build that is otherwise much like a
debug build. This should enable us to identify many forms of subtle
bugs at the point of merging a PR.
+ I've only enabled Sanitizers in Linux builds so far - there is
address sanitizer support in MSVC too which we could similarly
enable
+ There are known alignment issues identified by UB sanitizer. We rely
on compilers to sanely translate use of unaligned pointers to 2B, 4B,
and 8B types with unaligned loads/stores, and trying to avoid this
will have an impact on performance and code size for no real gain
+ For now avoid using the alignment sanitizer
Related work items: #26527318, #26527332
This change adds a new SymCrypt shared object module for Linux. The shared object module implements integrity verification for FIPS compliance by reading its own memory at runtime, reversing any relocations, calculating the HMAC-SHA256 digest of the module contents in memory, and comparing the digest to a known-good value which is injected into the module (outside the FIPS boundary) post compilation by a Python script.
Related work items: #30397153, #30397542, #30397643, #30397707, #30397781, #32407416
+ For review, it may be easier to look at the individual commits, as I tried to keep each issue in a separate commit.
+ Tighten up EcDsaSignEx. In a codepath which is only intended to be exercised by KATs there was the ability to branch on uninitialized memory due to lack of validation.
+ Throw an INVALID_ARGUMENT error rather than read uninitialized memory.
+ Also update the documentation for SymCryptEcpointScalarMul to make further errors of this type less likely in future, and add a debug check to catch any cases which do make through to a running binary.
+ Avoid signed overshift in LOAD_LSB/MSB macros for unknown platforms
+ Fix arm64 build - something changed which meant intrin.h was not being included for arm64
+ Tighten the CFB block cipher mode to avoid an infinite loop if the specified cbShift amount is 0
+ The documentation already says only 1 or blockSize are supported, so just default to blockSize when it is not 1
+ I double checked in BCrypt and I think there is no risk of DOS from this today, but makes sense to avoid the issue at the SymCrypt level
+ Update SymCryptEcurveIsSame to also compare the curve types. Also add documentation note that callers may want to additionally compare the curves' G values (we don't do this in SymCryptEcurveIsSame as checking the Gs for equality requires scratch)
+ Update SymCryptMontgomeryDoubleAndAdd to additionally handle a first source point which is not normalized, and optionally perform an additional Modular multiplication to have the correct result in this case
+ It is OK to optionally perform the multiplication, as the value of the normalized flag should be non-secret
+ This avoids a few issues we had previously, where we forcibly normalized the source point in place and had inconsistent handling of the case where a source point was the point at infinity.
+ It also appears to give around 15% performance uplift for ECDH (ModInv is more expensive than a few extra ModMuls) :)
+ Fix SymCryptTwistedEdwardsIsZero check to only return true when the value of Y * Z is 1 (which implies X is 0 when the point is on the curve)
+ Previously we returned true when X is 0, which actually selects 2 points on the curve (one of which is never in the subgroup generated by a valid distinguished point)
+ Avoid technically undefined behavior in SymCryptTlsCbcHmacVerify
+ Don't modify pbData to point before the buffer it refers to and then index from the modified pointer, instead modify the indices before using them with the unchanged pbData pointer
+ Avoid technically undefined behavior with memcpy across SymCrypt
+ Don't invoke memcpy with source == destination
+ Don't invoke memcpy with source or destination == NULL, and count == 0
+ Revert recent Arm AES change that regressed performance on LITTLE cores with no appreciable impact on code size (the intent was a loop would be a bit smaller and not impact on performance)
+ Avoid potential for leaking mask in SymCryptEcpointMaskedCopy
+ We use t...
+ Add validation flags for DlKey and EcKey Import and Generate
+ Enables callers to perform the correct amount of validation to be compliant with SP800-56arev3
+ Add support for named safe-prime groups in DH, as shortest path to supporting compliance using FFC
+ This will enable BCrypt callers to explicitly or implicitly use named safe-prime groups groups and have the requisite validation, rather than having to create a new in-memory representation of DH keys to expose the prime Q
+ Add tests to for validity of the named safe-prime group constants, so the unit tests should pick up on inadvertent changes, or errors in definitions of new safe-prime groups if/when they may be added
+ Enable the strictest validation that makes sense in all existing tests to exercise validation code.
+ Also randomly exercise the use of all the different valid flag combinations in tests
Related work items: #26526301, #26527152, #31528841, #31528936, #31529044, #31782556
Publishing a PR as I've got to a state where we have good improvements
across the board and want to have feedback on whether this is a good way
to structure the code changes.
+ Implement stitched functions for AES-GCM xmm/neon
+ For AES-GCM enc/dec part functions, there is still a reasonable amount
of shared code to handle partial blocks. This should be rarely hit,
and I've kept the logic in aes-default.c - we only dispatch to the
stitched functions when we have full blocks to both en/decrypt and
authenticate
+ For Xmm we are quite register constrained, so phrasing such that we
can use a maximum of 16 registers at a time is important (otherwise we
can easily lose any benefit from spilling/loading unnecessarily)
+ In the hot loop we have 8 registers containing AES-CTR state, 3
registers containing GHASH accumulators and have 5 registers
remaining for temporary values (round keys, multiplication
constants, byte reversal constant, data to authenticate)
+ In earlier attempts to phrase this I was trying to avoid having to
reload the data to authenticate (it always has to be in a variable
in AES-CTR as the ciphertext), but doing this means having to unroll
less and is not worth it - it is better to just (re)load data to
authenticate 1 block at a time in parallel with doing some number of
AES rounds
+ For Neon I noticed we have been doing a redundant EXT in the inner
loop of GHASH (we swap top and bottom halves of the same v register
twice to do byte reversal and then compute top half EOR bottom half
for karatsuba multiplication). Rephrased to avoid this.
+ Otherwise the phrasing is very similar to Xmm. Could investigate
avoiding the reloading of GHASH data as we have very low register
pressure given 32 ASIMD registers. Currently do not have access to a
machine which merges AES instruction pairs (only have access to a
TX2), so I'm not going to look into this too much right now.
+ Still to do:
+ Expose specific AesGcm encrypt/decrypt functions from SymCrypt which
further reduce overheads (can avoid some checks which are not avoidable when
only have an optimized partial gcm function)
+ Implement SymCryptAesGcmXXcryptStitchedYmm to use VAES / VPCLMULQDQ
Depending on feedback I can either include these todos in another iteration of this PR,
or complete this and create a separate PR to handle these cases.
Related work items: #31417357, #31417376
+ Use offsetof in place of macro casting pointer to value with not
guaranteed results for all compilers (requires stddef.h)
+ Updated header documentation for RdRand/Rdseed to indicate that
SymCrypt does not validate the randomness of the results of
RdRand/RdSeed
+ Update some SAL annotations and whitespace in Gcm/Ccm
+ Clear the Ccm state in a failed DecryptFinal call rather than in the
full Decrypt function (so we have the same behaviour of clearing the
Ccm state for incremental processing on a failed decryption)
+ Check whether the result of EcpointScalarMul and EcpointMultiScalarMul
is a representation of the zero point, and if so ensure the result is
the canonical zero point of the point representation.
+ For EcpointScalarMul, do this in a constant time way - do not care
in the the MultiScalar case (as we only have implemented the non
side-channel safe version for now)
+ More consistently use bytesToProcess as a variable name in Gcm and Ccm
and make AddMacData functions the same shape
Related work items: #30307538, #30786227, #30965195
+ Move XORs off critical path for modes dominated by long chain of
dependent AES. We can merge XOR of data, first round key and last round
key
+ Gives performance uplift for AesCbc encryption and AesCbcMac
+ The performance uplift varies depending upon the latency of XOR and AES
+ For Skylake server AES takes 4 cycles and XOR 1 cycles, so we see an uplift
between ~3.5% (AES256) and ~5% (AES128b key)
+ For IceLake client AES take 3 cycles and XOR takes 1 cycle (and there appears to
be an additional 2 cycle penalty for mixing AES and XOR instructions), so we see
an uplift between ~9.5% and ~13%
+ For older uarch, the change will be less pronounced
+ Also tidy up some rogue SAL annotations and remove some function declarations
that are never defined
+ I want to do the same change for aes-neon but I can make that change
once I have access to an Arm64 machine again
Related work items: #31244822
+ Various whitespace changes to improve consistency
+ Handle rare sizes of buffers in Xmm / Neon directly rather than falling back to C version of AES
+ Call Xmm version for tail cases of Ymm / Zmm
+ Use VPCLMULQDQ in MUL_ALPHA8 and avoid use of temporaries
+ To be tested on Arm / IceLake
Product code:
- Added SymCryptChaCha20Poly1305Encrypt function declaration to symcrypt.h.
- Added SymCryptChaCha20Poly1305Decrypt function declaration to symcrypt.h.
- Added new file chacha20_poly1305.c.
- Created function definition SymCryptChaCha20Poly1305Encrypt.
- Created function definition SymCryptChaCha20Poly1305Decrypt.
- Created function definition SymCryptChaCha20Poly1305ComputeTag.
- Created CHACHA20_POLY1305_MAX_DATA_SIZE define.
Test code:
- Added unit tests for CHACHA20_POLY1305 authenticated encryption algorithm.
- Added the one (and only) test case for CHACHA20_POLY1305 from RFC 8439 to kat_authenc.dat.
- Added "partial" field to kat_authenc.dat to determine if algorithm supports partial encryption/decryption.
- Modified testAuthEncRandom to handle algorithms that do not support partial encryption/decryption.
- Added mode type: ModeNone for authenticated encryption algorithms without a mode.
- Fixed typo in testdsa.cpp that resulted in a buffer overread which prevented unit tests from running successfully with app verifier enabled.
Testing:
- Built all flavors using scbuild.
- Ran symcryptunittest.exe on x86 (fre|chk) and amd64 (fre|chk) with page heap and app verifier enabled.
Related work items: #25803596
This change adds a CMake toolchain file for compiling SymCrypt for Windows x86 targets, using an AMD64 compiler host. I had to disable testing YMM register saving/restoring for the unit test, because the latest CRT's implementation of memcpy uses the XMM registers even on x86, which causes the test to fail. I disabled the test across the board rather than only for x86, because the AMD64 implementation was already ignoring changed XMM values, so it didn't really validate anything. I updated the relevant comments to reflect this change.
Also added some comments to the build scripts and fixed compilation errors in printtable.cpp on Linux and x86 Windows.
Related work items: #15886191
First stage of modularizing the test code. I wrote new test code for RSA signing and RSA encryption.
Added conditional compilation switches to remove the various implementations (RSA32, CNG, CAPI, Bignum, …)
Old test code is still around, some in files called "old-*" and some commented out. We can remove it once we are happy with the new tests.
Fixed one bug found by the new tests.
Related work items: #15886191
The previous fixed restored the error code when the output buffer is too small, but we shouldn't return an error code if the output buffer is NULL.
Related work items: #23466960
The build system was designed for a single-branch Source Depot world. Changes include:
- Change release numbering to <API version>.<minor version>; start at <API version> = 100 to avoid aliasing.
- build system no longer updates repo.
- Separate build command for increasing version #.
- Include detailed build info in library for source tracking
- Include build info in CAB filename.
Fixed a few minor issues as well.
- Fix spelling mistakes from a GitHub pull request
- Add verification that header & lib are the same version
Related work items: #20681107