In commit dfdb73e10 I accidentally renamed them from "aes128-cbc" to
"aes128" (and ditto for the other two key lengths), probably because
of the confusing names of the C-level identifiers for those vtables.
Now restored to the versions actually described in RFC 4253.
In the bitsliced implementation, the addition of 0x63 as the last
operation inside the S-box actually costs cycles during encryption -
four bitslice inversions - which can be easily eliminated by detaching
the constant, moving it forward past the ShiftRows and MixColumns
(with both of which it commutes) until it's adjacent to the next round
key addition, and then folding it into the round key during key setup.
I had this idea while I was originally writing this implementation,
but deferred actually doing it because it made all the intermediate
results harder to check against the standard test vectors. Now the
code is working and stable, this is the moment to come back and do it.
That flag was missing from all the CBC vtables' flags fields, because
my recent rewrite forgot to put it in. As a result the SSH_MSG_IGNORE
defence against CBC length oracle attacks was not being enabled.
All those functions like aes256_encrypt_pubkey and des_decrypt_xdmauth
previously lived in the same source files as the ciphers they were
based on, and provided an alternative API to the internals of that
cipher's implementation. But there was no _need_ for them to have that
privileged access to the internals, because they didn't do anything
you couldn't access through the standard API. It was just a historical
oddity with no real benefit, whose main effect is to make refactoring
painful.
So now all of those functions live in a new file sshauxcrypt.c, and
they all work through the same vtable system as all other cipher
clients, by making an instance of the right cipher and configuring it
in the appropriate way. This should make them completely independent
of any internal changes to the cipher implementations they're based
on.
The aim of this reorganisation is to make it easier to test all the
ciphers in PuTTY in a uniform way. It was inconvenient that there were
two separate vtable systems for the ciphers used in SSH-1 and SSH-2
with different functionality.
Now there's only one type, called ssh_cipher. But really it's the old
ssh2_cipher, just renamed: I haven't made any changes to the API on
the SSH-2 side. Instead, I've removed ssh1_cipher completely, and
adapted the SSH-1 BPP to use the SSH-2 style API.
(The relevant differences are that ssh1_cipher encapsulated both the
sending and receiving directions in one object - so now ssh1bpp has to
make a separate cipher instance per direction - and that ssh1_cipher
automatically initialised the IV to all zeroes, which ssh1bpp now has
to do by hand.)
The previous ssh1_cipher vtable for single-DES has been removed
completely, because when converted into the new API it became
identical to the SSH-2 single-DES vtable; so now there's just one
vtable for DES-CBC which works in both protocols. The other two SSH-1
ciphers each had to stay separate, because 3DES is completely
different between SSH-1 and SSH-2 (three layers of CBC structure
versus one), and Blowfish varies in endianness and key length between
the two.
(Actually, while I'm here, I've only just noticed that the SSH-1
Blowfish cipher mis-describes itself in log messages as Blowfish-128.
In fact it passes the whole of the input key buffer, which has length
SSH1_SESSION_KEY_LENGTH == 32 bytes == 256 bits. So it's actually
Blowfish-256, and has been all along!)
The refactored sshaes.c gives me a convenient slot to drop in a second
hardware-accelerated AES implementation, similar to the existing one
but using Arm NEON intrinsics in place of the x86 AES-NI ones.
This needed a minor structural change, because Arm systems are often
heterogeneous, containing more than one type of CPU which won't
necessarily all support the same set of architecture features. So you
can't test at run time for the presence of AES acceleration by
querying the CPU you're running on - even if you found a way to do it,
the answer wouldn't be reliable once the OS started migrating your
process between CPUs. Instead, you have to ask the OS itself, because
only that knows about _all_ the CPUs on the system. So that means the
aes_hw_available() mechanism has to extend a tentacle into each
platform subdirectory.
The trickiest part was the nest of ifdefs that tries to detect whether
the compiler can support the necessary parts. I had successful
test-compiles on several compilers, and was able to run the code
directly on an AArch64 tablet (so I know it passes cryptsuite), but
it's likely that at least some Arm platforms won't be able to build it
because of some path through the ifdefs that I haven't been able to
test yet.
Ahem. I went to all the effort of setting up a wrapper function that
would store the result of the first call to aes_hw_available(), and
managed to forget to make it set the flag that said it _had_ stored
the result. So the underlying query function was being called every
time.
The 32-bit x86 Windows build can crash with an alignment fault the
first time it tries to write into the key schedule array, because it
turns out that the x86 VS C library's malloc doesn't guarantee 16-byte
alignment on the returned block even though there is a machine type
that needs it.
To avoid having to faff with non-portable library APIs, I solve the
problem locally in aes_hw_new, by over-allocating enough to guarantee
that an aligned block of the right size must exist somewhere in the
region.
sshaes.c is more or less completely changed by this commit.
Firstly, I've changed the top-level structure. In the old structure,
there were three levels of indirection controlling what an encryption
function would actually do: first the ssh2_cipher vtable, then a
subsidiary set of function pointers within that to select the software
or hardware implementation, and then inside the main encryption
function, a switch on the key length to jump into the right place in
the unrolled loop of cipher rounds.
That was all a bit untidy. So now _all_ of that is done by means of
just one selection system, namely the ssh2_cipher vtable. The software
and hardware implementations of a given SSH cipher each have their own
separate vtable, e.g. ssh2_aes256_sdctr_sw and ssh2_aes256_sdctr_hw;
this allows them to have their own completely different state
structures too, and not have to try to coexist awkwardly in the same
universal AESContext with workaround code to align things correctly.
The old implementation-agnostic vtables like ssh2_aes256_sdctr still
exist, but now they're mostly empty, containing only the constructor
function, which will decide whether AES-NI is currently available and
then choose one of the other _real_ vtables to instantiate.
As well as the cleaner data representation, this also means the
vtables can have different description strings, which means the Event
Log will indicate which AES implementation is actually in use; it
means the SW and HW vtables are available for testcrypt to use
(although actually using them is left for the next commit); and in
principle it would also make it easy to support a user override for
the automatic SW/HW selection (in case anyone turns out to want one).
The AES-NI implementation has been reorganised to fit into the new
framework. One thing I've done is to de-optimise the key expansion:
instead of having a separate blazingly fast loop-unrolled key setup
function for each key length, there's now just one, which uses AES
intrinsics for the actual transformations of individual key words, but
wraps them in a common loop structure for all the key lengths which
has a clear correspondence to the cipher spec. (Sorry to throw away
your work there, Pavel, but this isn't an application where key setup
really _needs_ to be hugely fast, and I decided I prefer a version I
can understand and debug.)
The software AES implementation is also completely replaced with one
that uses a bit-sliced representation, i.e. the cipher state is split
across eight integers in such a way that each logical byte of the
state occupies a single bit in each of those integers. The S-box
lookup is done by a long string of AND and XOR operations on the eight
bits (removing the potential cache side channel from a lookup table),
and this representation allows 64 S-box lookups to be done in parallel
simply by extending those AND/XOR operations to be bitwise ones on a
whole word. So now we can perform four AES encryptions or decryptions
in parallel, at least when the cipher mode permits it (which SDCTR and
CBC decryption both do).
The result is slower than the old implementation, but (a) not by as
much as you might think - those parallel S-boxes are surprisingly
competitive with 64 separate table lookups; (b) the compensation is
that now it should run in constant time with no data-dependent control
flow or memory addressing; and (c) in any case the really fast
hardware implementation will supersede it for most users.
The old names like ssh_aes128 and ssh_aes128_ctr reflect the SSH
protocol IDs, which is all very well, but I think a more important
principle is that it should be easy for me to remember which cipher
mode each one refers to. So I've renamed them so that they all end in
_cbc and _sdctr.
(I've left alone the string identifiers used by testcrypt, for the
moment. Perhaps I'll go back and change those later.)
All access to AES throughout the code is now done via the ssh2_cipher
vtable interface. All code that previously made direct calls to the
underlying functions (for encrypting and decrypting private key files)
now does it by instantiating an ssh2_cipher.
This removes constraints on the AES module's internal structure, and
allows me to reorganise it as much as I like.
The IV-incrementing code seems to have had several bugs. It was not
propagating a carry from bit 31 to 32 of the 128-bit integer,
apparently because of having arranged the 32-bit words wrongly within
the vector register; also, when it tried to implement carry
propagation from bit 63 to 64 by checking the low 64 bits for zero, it
checked the _high_ bits instead, leading to a spurious extra addition
in the low half if the high half happened to be zero.
This must surely have been able to cause mysterious decryption
failures about once every 2^32 cipher blocks = 64Gb of data
transferred. I suppose that must be a large enough number that either
no users of the snapshot builds have encountered the problem, or the
ones who did dismissed it as computers being randomly flaky.
The revised version now passes all the same tests as the software
implementation.
I noticed a few of these in the course of preparing the previous
commit. I must have been writing that idiom out by hand for _ages_
before it became totally habitual to #define it as 'lenof' in every
codebase I touch. Now I've gone through and replaced all the old
verbosity with nice terse lenofs.
This is the commit that f3295e0fb _should_ have been. Yesterday I just
added some typedefs so that I didn't have to wear out my fingers
typing 'struct' in new code, but what I ought to have done is to move
all the typedefs into defs.h with the rest, and then go through
cleaning up the legacy 'struct's all through the existing code.
But I was mostly trying to concentrate on getting the test suite
finished, so I just did the minimum. Now it's time to come back and do
it better.
Previously, lots of individual ssh2_cipheralg structures were declared
static, and only available to the rest of the code via a smaller
number of 'ssh2_ciphers' objects that wrapped them into lists. But I'm
going to want to access individual ciphers directly in the testing
system I'm currently working on, so I'm giving all those objects
external linkage and declaring them in ssh.h.
Also, I've made up an entirely new one, namely exposing MD5 as an
instance of the general ssh_hashalg abstraction, which it has no need
to be for the purposes of actually using it in SSH. But, again, this
will let me treat it the same as all the other hashes in the test
system.
No functional change, for the moment.
Taking a leaf out of the LLVM code base: this macro still includes an
assert(false) so that the message will show up in a typical build, but
it follows it up with a call to a function explicitly marked as no-
return.
So this ought to do a better job of convincing compilers that once a
code path hits this function it _really doesn't_ have to still faff
about with making up a bogus return value or filling in a variable
that 'might be used uninitialised' in the following code that won't be
reached anyway.
I've gone through the existing code looking for the assert(false) /
assert(0) idiom and replaced all the ones I found with the new macro,
which also meant I could remove a few pointless return statements and
variable initialisations that I'd already had to put in to placate
compiler front ends.
This fixes a batch of clang-analyzer warnings of the form 'you
declared / assigned this variable and then never use it'. It doesn't
fix _all_ of them - some are there so that when I add code in the
future _it_ can use the variable without me having to remember to
start setting it - but these are the ones I thought it would make the
code better instead of worse to fix.
In commit 884a7df94 I claimed that all my trait-like vtable systems
now had the generic object type being a struct rather than a bare
vtable pointer (e.g. instead of 'Socket' being a typedef for a pointer
to a const Socket_vtable, it's a typedef for a struct _containing_ a
vtable pointer).
In fact, I missed a few. This commit converts ssh_key, ssh2_cipher and
ssh1_cipher into the same form as the rest.
My normal habit these days, in new code, is to treat int and bool as
_almost_ completely separate types. I'm still willing to use C's
implicit test for zero on an integer (e.g. 'if (!blob.len)' is fine,
no need to spell it out as blob.len != 0), but generally, if a
variable is going to be conceptually a boolean, I like to declare it
bool and assign to it using 'true' or 'false' rather than 0 or 1.
PuTTY is an exception, because it predates the C99 bool, and I've
stuck to its existing coding style even when adding new code to it.
But it's been annoying me more and more, so now that I've decided C99
bool is an acceptable thing to require from our toolchain in the first
place, here's a quite thorough trawl through the source doing
'boolification'. Many variables and function parameters are now typed
as bool rather than int; many assignments of 0 or 1 to those variables
are now spelled 'true' or 'false'.
I managed this thorough conversion with the help of a custom clang
plugin that I wrote to trawl the AST and apply heuristics to point out
where things might want changing. So I've even managed to do a decent
job on parts of the code I haven't looked at in years!
To make the plugin's work easier, I pushed platform front ends
generally in the direction of using standard 'bool' in preference to
platform-specific boolean types like Windows BOOL or GTK's gboolean;
I've left the platform booleans in places they _have_ to be for the
platform APIs to work right, but variables only used by my own code
have been converted wherever I found them.
In a few places there are int values that look very like booleans in
_most_ of the places they're used, but have a rarely-used third value,
or a distinction between different nonzero values that most users
don't care about. In these cases, I've _removed_ uses of 'true' and
'false' for the return values, to emphasise that there's something
more subtle going on than a simple boolean answer:
- the 'multisel' field in dialog.h's list box structure, for which
the GTK front end in particular recognises a difference between 1
and 2 but nearly everything else treats as boolean
- the 'urgent' parameter to plug_receive, where 1 vs 2 tells you
something about the specific location of the urgent pointer, but
most clients only care about 0 vs 'something nonzero'
- the return value of wc_match, where -1 indicates a syntax error in
the wildcard.
- the return values from SSH-1 RSA-key loading functions, which use
-1 for 'wrong passphrase' and 0 for all other failures (so any
caller which already knows it's not loading an _encrypted private_
key can treat them as boolean)
- term->esc_query, and the 'query' parameter in toggle_mode in
terminal.c, which _usually_ hold 0 for ESC[123h or 1 for ESC[?123h,
but can also hold -1 for some other intervening character that we
don't support.
In a few places there's an integer that I haven't turned into a bool
even though it really _can_ only take values 0 or 1 (and, as above,
tried to make the call sites consistent in not calling those values
true and false), on the grounds that I thought it would make it more
confusing to imply that the 0 value was in some sense 'negative' or
bad and the 1 positive or good:
- the return value of plug_accepting uses the POSIXish convention of
0=success and nonzero=error; I think if I made it bool then I'd
also want to reverse its sense, and that's a job for a separate
piece of work.
- the 'screen' parameter to lineptr() in terminal.c, where 0 and 1
represent the default and alternate screens. There's no obvious
reason why one of those should be considered 'true' or 'positive'
or 'success' - they're just indices - so I've left it as int.
ssh_scp_recv had particularly confusing semantics for its previous int
return value: its call sites used '<= 0' to check for error, but it
never actually returned a negative number, just 0 or 1. Now the
function and its call sites agree that it's a bool.
In a couple of places I've renamed variables called 'ret', because I
don't like that name any more - it's unclear whether it means the
return value (in preparation) for the _containing_ function or the
return value received from a subroutine call, and occasionally I've
accidentally used the same variable for both and introduced a bug. So
where one of those got in my way, I've renamed it to 'toret' or 'retd'
(the latter short for 'returned') in line with my usual modern
practice, but I haven't done a thorough job of finding all of them.
Finally, one amusing side effect of doing this is that I've had to
separate quite a few chained assignments. It used to be perfectly fine
to write 'a = b = c = TRUE' when a,b,c were int and TRUE was just a
the 'true' defined by stdbool.h, that idiom provokes a warning from
gcc: 'suggest parentheses around assignment used as truth value'!
The annoying int64.h is completely retired, since C99 guarantees a
64-bit integer type that you can actually treat like an ordinary
integer. Also, I've replaced the local typedefs uint32 and word32
(scattered through different parts of the crypto code) with the
standard uint32_t.
Ian Jackson points out that the Linux kernel has a macro of this name
with the same purpose, and suggests that it's a good idea to use the
same name as they do, so that at least some people reading one code
base might recognise it from the other.
I never really thought very hard about what order FROMFIELD's
parameters should go in, and therefore I'm pleasantly surprised to
find that my order agrees with the kernel's, so I don't have to
permute every call site as part of making this change :-)
The helper functions mm_shuffle_pd_i0 and mm_shuffle_pd_i1 need the
FUNC_ISA macro (which expands to __attribute__((target("sse4.1,aes")))
when building with clang) in order to avoid a build error complaining
that their use of the _mm_shuffle_pd intrinsic is illegal without at
least sse2.
This build error is new in the recently released clang 7.0.0, compared
to the svn trunk revision I was previously building with. But it
certainly seems plausible to me, so I assume there's been some
pre-release tightening up of the error reporting. In any case, those
helper functions are only ever called from other functions with the
same attribute, so it shouldn't cause trouble.
This is more or less the same job as the SSH-1 case, only more
extensive, because we have a wider range of ciphers.
I'm a bit disappointed about the AES case, in particular, because I
feel as if it ought to have been possible to arrange to combine this
layer of vtable dispatch with the subsidiary one that selects between
hardware and software implementations of the underlying cipher. I may
come back later and have another try at that, in fact.
This is a cleanup I started to notice a need for during the BinarySink
work. It removes a lot of faffing about casting things to char * or
unsigned char * so that some API will accept them, even though lots of
such APIs really take a plain 'block of raw binary data' argument and
don't care what C thinks the signedness of that data might be - they
may well reinterpret it back and forth internally.
So I've tried to arrange for all the function call APIs that ought to
have a void * (or const void *) to have one, and those that need to do
pointer arithmetic on the parameter internally can cast it back at the
top of the function. That saves endless ad-hoc casts at the call
sites.
A user reported that the new hardware AES implementation wasn't
working, and sent an event log suggesting that it was being run in CBC
mode - which is unusual enough these days that that may well have been
its first test.
I wasn't looking forward to debugging the actual AES intrinsics code,
but fortunately, I didn't have to, because an eyeball review spotted a
nice simple error in the CBC decrypt function in which the wrong local
variable was being stored into the IV variable on exit from the
function. Testing against a local CBC-only server reproduced the
reported failure and suggested that this fixed it.
__clang_major__ and __clang_minor__ macros may be overriden
in Apple and other compilers. Instead of them, we use
__has_attribute(target) to check whether Clang supports per-function
targeted build and __has_include() to check if there are intrinsic
header files
The new AES routines are compiled into the code on any platform where
the compiler can be made to generate the necessary AES-NI and SSE
instructions. But not every CPU will support those instructions, so
the pure-software routines haven't gone away: both sets of functions
sit side by side in the code, and at key setup time we check the CPUID
bitmap to decide which set to select.
(This reintroduces function pointers into AESContext, replacing the
ones that we managed to remove a few commits ago.)
The outer routines are the ones which handle the CBC encrypt, CBC
decrypt and SDCTR cipher modes. Previously each of those had to be
able to dispatch to one of the per-block-size core routines, which
made it worth dividing the system up into two layers. But now there's
only one set of core routines, they may as well be inlined into the
outer ones.
Also as part of this commit, the nasty undef/redef of MAKEWORD and
LASTWORD have been removed, and the different macro definitions now
have different macro _names_, to make it clearer which one is used
where.
They're not really part of AES at all, in that they were part of the
Rijndael design but not part of the subset standardised by NIST. More
relevantly, they're not used by any SSH cipher definition, so they're
just adding complexity to the code which is about to get in the way of
refactoring it.
Removing them means there's only one pair of core encrypt/decrypt
functions, so the 'encrypt' and 'decrypt' function pointer fields can
be completely removed from AESContext.
Apparently I forgot to edit that when I originally imported this AES
implementation into PuTTY's SSH code from the more generically named
source file in which I'd originally developed it.
The revamp of key generation in commit e460f3083 made the assumption
that you could decide how many bytes of key material to generate by
converting cipher->keylen from bits to bytes. This is a good
assumption for all ciphers except DES/3DES: since the SSH DES key
setup ignores one bit in every byte of key material it's given, you
need more bytes than its keylen field would have you believe. So
currently the DES ciphers aren't being keyed correctly.
The original keylen field is used for deciding how big a DH group to
request, and on that basis I think it still makes sense to keep it
reflecting the true entropy of a cipher key. So it turns out we need
two _separate_ key length fields per cipher - one for the real
entropy, and one for the much more obvious purpose of knowing how much
data to ask for from ssh2_mkkey.
A compensatory advantage, though, is that we can now measure the
latter directly in bytes rather than bits, so we no longer have to
faff about with dividing by 8 and rounding up.
zero but does it in such a way that over-clever compilers hopefully
won't helpfully optimise the call away if you do it just before
freeing something or letting it go out of scope. Use this for
(hopefully) every memset whose job is to destroy sensitive data that
might otherwise be left lying around in the process's memory.
[originally from svn r9586]
default preferred cipher), add code to inject SSH_MSG_IGNOREs to randomise
the IV when using CBC-mode ciphers. Each cipher has a flag to indicate
whether it needs this workaround, and the SSH packet output maze has gained
some extra complexity to implement it.
[originally from svn r5659]
"rijndael128-cbc" names for AES. These are in the IANA namespace, but
never appeared in any secsh-transport draft, and no version of OpenSSH
has supported them without also supporting the aes*-cbc names.
"rijndael-cbc@lysator.liu.se" gets to live because it's in the private
namespace.
[originally from svn r5607]
aes128-ctr, aes192-ctr, and aes256-ctr. blowfish-ctr and 3des-ctr are
present but disabled, since I haven't tested them yet.
In addition, change the user-visible names of ciphers (as displayed in the
Event Log) to include the mode name and, in Blowfish's case, the key size.
[originally from svn r5605]
malloc functions, which automatically cast to the same type they're
allocating the size of. Should prevent any future errors involving
mallocing the size of the wrong structure type, and will also make
life easier if we ever need to turn the PuTTY core code from real C
into C++-friendly C. I haven't touched the Mac frontend in this
checkin because I couldn't compile or test it.
[originally from svn r3014]
objects to incomplete static array declarations, which I introduced to work
around a bug in SC/MrC. Use #ifdefs to decide whether to enable the workaround
or not.
[originally from svn r2488]
arrya as full definitions, and hence gets upset when it finds a full definition
later. This is a bug (see K&R2 A10.2), but an easy one to work around by
making the tentative definitions incomplete, so I've done that.
[originally from svn r2462]