Pavel says there was no specific reason they avoided it, and compiling
with Debian Jessie's GCC 4.9.2 produces a binary that I've no reason
to believe won't work, although I haven't tested it on a real or
emulated CPU that supports the instructions.
Since the rewrite of hardware SHA support in cbbd464fd7, we've been
attempting to build with SHA-NI support on x86 with some GCC 4.x,
including Ubuntu 14.04's 4.8.x, whereas before we only tried it with
GCC 5.x and above. Revert to that.
(I think that GCC has had some support for this extension since 4.9.0 --
the "sha" attribute went in in upstream commit fc975a4090 -- and it
at least compiles with 4.9.2, but I'm assuming Pavel had a good reason
for sticking to 5+ in 5a38b293bd.)
Similarly to the 'AES (unaccelerated)' naming scheme I added in the
AES rewrite, the hash functions that have multiple implementations now
each come with an annotation saying which one they are.
This was more tricky for hashes than for ciphers, because the
annotation for a hash has to be a separate string literal from the
base text name, so that it can propagate into the name field for each
HMAC wrapper without looking silly.
Similarly to my recent addition of NEON-accelerated AES, these new
implementations drop in alongside the SHA-NI ones, under a different
set of ifdefs. All the details of selection and detection are
essentially the same as they were for the AES code.
The new structure of those modules is along similar lines to the
recent rewrite of AES, with selection of HW vs SW implementation being
done by the main vtable instead of a subsidiary function pointer
within it, freedom for each implementation to define its state
structure however is most convenient, and space to drop in other
hardware-accelerated implementations.
I've removed the centralised test for compiler SHA-NI support in
ssh.h, and instead duplicated it between the two SHA modules, on the
grounds that once you start considering an open-ended set of hardware
accelerators, the two hashes _need_ not go together.
I've also added an extra test in cryptsuite that checks the point at
which the end-of-hash padding switches to adding an extra cipher
block. That was just because I was rewriting that padding code, was
briefly worried that I might have got an off-by-one error in that part
of it, and couldn't see any existing test that gave me confidence I
hadn't.
Keeping that information alongside the hashes themselves seems more
sensible than having the HMAC code know that fact about everything it
can work with.
This replaces all the separate HMAC-implementing wrappers in the
various source files implementing the underlying hashes.
The new HMAC code also correctly handles the case of a key longer than
the underlying hash's block length, by replacing it with its own hash.
This means I can reinstate the test vectors in RFC 6234 which exercise
that case, which I didn't add to cryptsuite before because they'd have
failed.
It also allows me to remove the ad-hoc code at the call site in
cproxy.c which turns out to have been doing the same thing - I think
that must have been the only call site where the question came up
(since MAC keys invented by the main SSH-2 BPP are always shorter than
that).
Similar to the versions in ssh_cipheralg and ssh_keyalg, this allows a
set of vtables to share function pointers while providing varying
constant data that the shared function can use to vary its behaviour.
As an initial demonstration, I've used this to recombine the four
trivial text_name methods for the HMAC-SHA1 variants. I'm about to use
it for something more sensible, though.
All the hash-specific state structures, and the functions that
directly accessed them, are now local to the source files implementing
the hashes themselves. Everywhere we previously used those types or
functions, we're now using the standard ssh_hash or ssh2_mac API.
The 'simple' functions (hmacmd5_simple, SHA_Simple etc) are now a pair
of wrappers in sshauxcrypt.c, each of which takes an algorithm
structure and can do the same conceptual thing regardless of what it
is.
The aim of this reorganisation is to make it easier to test all the
ciphers in PuTTY in a uniform way. It was inconvenient that there were
two separate vtable systems for the ciphers used in SSH-1 and SSH-2
with different functionality.
Now there's only one type, called ssh_cipher. But really it's the old
ssh2_cipher, just renamed: I haven't made any changes to the API on
the SSH-2 side. Instead, I've removed ssh1_cipher completely, and
adapted the SSH-1 BPP to use the SSH-2 style API.
(The relevant differences are that ssh1_cipher encapsulated both the
sending and receiving directions in one object - so now ssh1bpp has to
make a separate cipher instance per direction - and that ssh1_cipher
automatically initialised the IV to all zeroes, which ssh1bpp now has
to do by hand.)
The previous ssh1_cipher vtable for single-DES has been removed
completely, because when converted into the new API it became
identical to the SSH-2 single-DES vtable; so now there's just one
vtable for DES-CBC which works in both protocols. The other two SSH-1
ciphers each had to stay separate, because 3DES is completely
different between SSH-1 and SSH-2 (three layers of CBC structure
versus one), and Blowfish varies in endianness and key length between
the two.
(Actually, while I'm here, I've only just noticed that the SSH-1
Blowfish cipher mis-describes itself in log messages as Blowfish-128.
In fact it passes the whole of the input key buffer, which has length
SSH1_SESSION_KEY_LENGTH == 32 bytes == 256 bits. So it's actually
Blowfish-256, and has been all along!)
This is the commit that f3295e0fb _should_ have been. Yesterday I just
added some typedefs so that I didn't have to wear out my fingers
typing 'struct' in new code, but what I ought to have done is to move
all the typedefs into defs.h with the rest, and then go through
cleaning up the legacy 'struct's all through the existing code.
But I was mostly trying to concentrate on getting the test suite
finished, so I just did the minimum. Now it's time to come back and do
it better.
This makes the API more flexible, so that it's not restricted to
taking a key of precisely the length specified in the ssh2_macalg
structure. Instead, ssh2bpp looks up that length to construct the
MAC's key.
Some MACs (e.g. Poly1305) will only _work_ with a single key length.
But this way, I can run standard test vectors against MACs that can
take a variable length (e.g. everything in the HMAC family).
Taking a leaf out of the LLVM code base: this macro still includes an
assert(false) so that the message will show up in a typical build, but
it follows it up with a call to a function explicitly marked as no-
return.
So this ought to do a better job of convincing compilers that once a
code path hits this function it _really doesn't_ have to still faff
about with making up a bogus return value or filling in a variable
that 'might be used uninitialised' in the following code that won't be
reached anyway.
I've gone through the existing code looking for the assert(false) /
assert(0) idiom and replaced all the ones I found with the new macro,
which also meant I could remove a few pointless return statements and
variable initialisations that I'd already had to put in to placate
compiler front ends.
My normal habit these days, in new code, is to treat int and bool as
_almost_ completely separate types. I'm still willing to use C's
implicit test for zero on an integer (e.g. 'if (!blob.len)' is fine,
no need to spell it out as blob.len != 0), but generally, if a
variable is going to be conceptually a boolean, I like to declare it
bool and assign to it using 'true' or 'false' rather than 0 or 1.
PuTTY is an exception, because it predates the C99 bool, and I've
stuck to its existing coding style even when adding new code to it.
But it's been annoying me more and more, so now that I've decided C99
bool is an acceptable thing to require from our toolchain in the first
place, here's a quite thorough trawl through the source doing
'boolification'. Many variables and function parameters are now typed
as bool rather than int; many assignments of 0 or 1 to those variables
are now spelled 'true' or 'false'.
I managed this thorough conversion with the help of a custom clang
plugin that I wrote to trawl the AST and apply heuristics to point out
where things might want changing. So I've even managed to do a decent
job on parts of the code I haven't looked at in years!
To make the plugin's work easier, I pushed platform front ends
generally in the direction of using standard 'bool' in preference to
platform-specific boolean types like Windows BOOL or GTK's gboolean;
I've left the platform booleans in places they _have_ to be for the
platform APIs to work right, but variables only used by my own code
have been converted wherever I found them.
In a few places there are int values that look very like booleans in
_most_ of the places they're used, but have a rarely-used third value,
or a distinction between different nonzero values that most users
don't care about. In these cases, I've _removed_ uses of 'true' and
'false' for the return values, to emphasise that there's something
more subtle going on than a simple boolean answer:
- the 'multisel' field in dialog.h's list box structure, for which
the GTK front end in particular recognises a difference between 1
and 2 but nearly everything else treats as boolean
- the 'urgent' parameter to plug_receive, where 1 vs 2 tells you
something about the specific location of the urgent pointer, but
most clients only care about 0 vs 'something nonzero'
- the return value of wc_match, where -1 indicates a syntax error in
the wildcard.
- the return values from SSH-1 RSA-key loading functions, which use
-1 for 'wrong passphrase' and 0 for all other failures (so any
caller which already knows it's not loading an _encrypted private_
key can treat them as boolean)
- term->esc_query, and the 'query' parameter in toggle_mode in
terminal.c, which _usually_ hold 0 for ESC[123h or 1 for ESC[?123h,
but can also hold -1 for some other intervening character that we
don't support.
In a few places there's an integer that I haven't turned into a bool
even though it really _can_ only take values 0 or 1 (and, as above,
tried to make the call sites consistent in not calling those values
true and false), on the grounds that I thought it would make it more
confusing to imply that the 0 value was in some sense 'negative' or
bad and the 1 positive or good:
- the return value of plug_accepting uses the POSIXish convention of
0=success and nonzero=error; I think if I made it bool then I'd
also want to reverse its sense, and that's a job for a separate
piece of work.
- the 'screen' parameter to lineptr() in terminal.c, where 0 and 1
represent the default and alternate screens. There's no obvious
reason why one of those should be considered 'true' or 'positive'
or 'success' - they're just indices - so I've left it as int.
ssh_scp_recv had particularly confusing semantics for its previous int
return value: its call sites used '<= 0' to check for error, but it
never actually returned a negative number, just 0 or 1. Now the
function and its call sites agree that it's a bool.
In a couple of places I've renamed variables called 'ret', because I
don't like that name any more - it's unclear whether it means the
return value (in preparation) for the _containing_ function or the
return value received from a subroutine call, and occasionally I've
accidentally used the same variable for both and introduced a bug. So
where one of those got in my way, I've renamed it to 'toret' or 'retd'
(the latter short for 'returned') in line with my usual modern
practice, but I haven't done a thorough job of finding all of them.
Finally, one amusing side effect of doing this is that I've had to
separate quite a few chained assignments. It used to be perfectly fine
to write 'a = b = c = TRUE' when a,b,c were int and TRUE was just a
the 'true' defined by stdbool.h, that idiom provokes a warning from
gcc: 'suggest parentheses around assignment used as truth value'!
The annoying int64.h is completely retired, since C99 guarantees a
64-bit integer type that you can actually treat like an ordinary
integer. Also, I've replaced the local typedefs uint32 and word32
(scattered through different parts of the crypto code) with the
standard uint32_t.
Ian Jackson points out that the Linux kernel has a macro of this name
with the same purpose, and suggests that it's a good idea to use the
same name as they do, so that at least some people reading one code
base might recognise it from the other.
I never really thought very hard about what order FROMFIELD's
parameters should go in, and therefore I'm pleasantly surprised to
find that my order agrees with the kernel's, so I don't have to
permute every call site as part of making this change :-)
hmacmd5_do_hmac and hmac_sha1_simple should be consistently referring
to input memory blocks as 'const void *', but one had pointlessly
typed the pointer as 'const unsigned char *' and the other had missed
out the consts.
The new version of ssh_hash has the same nice property as ssh2_mac,
that I can make the generic interface object type function directly as
a BinarySink so that clients don't have to call h->sink() and worry
about the separate sink object they get back from that.
This piece of tidying-up has come out particularly well in terms of
saving tedious repetition and boilerplate. I've managed to remove
three pointless methods from every MAC implementation by means of
writing them once centrally in terms of the implementation-specific
methods; another method (hmacmd5_sink) vanished because I was able to
make the interface type 'ssh2_mac' be directly usable as a BinarySink
by way of a new delegation system; and because all the method
implementations can now find their own vtable, I was even able to
merge a lot of keying and output functions that had previously
differed only in length parameters by having them look up the lengths
in whatever vtable they were passed.
This is more or less the same job as the SSH-1 case, only more
extensive, because we have a wider range of ciphers.
I'm a bit disappointed about the AES case, in particular, because I
feel as if it ought to have been possible to arrange to combine this
layer of vtable dispatch with the subsidiary one that selects between
hardware and software implementations of the underlying cipher. I may
come back later and have another try at that, in fact.
Just as I did a few commits ago with the low-level SHA_Bytes type
functions, the ssh_hash and ssh_mac abstract types now no longer have
a direct foo->bytes() update method at all. Instead, each one has a
foo->sink() function that returns a BinarySink with the same lifetime
as the hash context, and then the caller can feed data into that in
the usual way.
This lets me get rid of a couple more duplicate marshalling routines
in ssh.c: hash_string(), hash_uint32(), hash_mpint().
In fact, those functions don't even exist any more. The only way to
get data into a primitive hash state is via the new put_* system. Of
course, that means put_data() is a viable replacement for every
previous call to one of the per-hash update functions - but just
mechanically doing that would have missed the opportunity to simplify
a lot of the call sites.
I've finally got tired of all the code throughout PuTTY that repeats
the same logic about how to format the SSH binary primitives like
uint32, string, mpint. We've got reasonably organised code in ssh.c
that appends things like that to 'struct Packet'; something similar in
sftp.c which repeats a lot of the work; utility functions in various
places to format an mpint to feed to one or another hash function; and
no end of totally ad-hoc stuff in functions like public key blob
formatters which actually have to _count up_ the size of data
painstakingly, then malloc exactly that much and mess about with
PUT_32BIT.
It's time to bring all of that into one place, and stop repeating
myself in error-prone ways everywhere. The new marshal.h defines a
system in which I centralise all the actual marshalling functions, and
then layer a touch of C macro trickery on top to allow me to (look as
if I) pass a wide range of different types to those functions, as long
as the target type has been set up in the right way to have a write()
function.
This commit adds the new header and source file, and sets up some
general centralised types (strbuf and the various hash-function
contexts like SHA_State), but doesn't use the new calls for anything
yet.
(I've also renamed some internal functions in import.c which were
using the same names that I've just defined macros over. That won't
last long - those functions are going to go away soon, so the changed
names are strictly temporary.)
Some old CPUs do not support CPUID to be called with eax=7
To prevent failures, call CPUID with eax=0 to get the highest possible
eax value (leaf) and compare it to 7.
GCC does this check internally with __get_cpuid_count function
Thanks to Jeffrey Walton for noticing.
Clang generates an internal failure if the same function
has different target attributes in definition and declaration.
To work around that, we made a proxy predeclared function
without target attribute.
SHA1-NI code is conditionally enabled if CPU supports SHA extensions.
The procedure is based on Jeffrey Walton's SHA1 implementation:
https://github.com/noloader/SHA-Intrinsics
These pointers will be required in next commits
where subroutines with new instructions are introduced.
Depending on CPUID dynamic check, pointers will refer to old
SW-only implementations or to new instructions subroutines
The key derivation code has been assuming (though non-critically, as
it happens) that the size of the MAC output is the same as the size of
the MAC key. That isn't even a good assumption for the HMAC family,
due to HMAC-SHA1-96 and also the bug-compatible versions of HMAC-SHA1
that only use 16 bytes of key material; so now we have an explicit
key-length field separate from the MAC-length field.
This permits a hash state to be cloned in the middle of being used, so
that multiple strings with the same prefix can be hashed without
having to repeat all the computation over the prefix.
Having done that, we'll also sometimes need to free a hash state that
we aren't generating actual hash output from, so we need a free method
as well.
Now that we have modes in which the MAC verification happens before
any other crypto operation and hence will be the only thing seen by an
attacker, it seems like about time we got round to doing it in a
cautious way that tries to prevent the attacker from using our memcmp
as a timing oracle.
So, here's an smemeq() function which has the semantics of !memcmp but
attempts to run in time dependent only on the length parameter. All
the MAC implementations now use this in place of !memcmp to verify the
MAC on input data.
This causes the initial length field of the SSH-2 binary packet to be
unencrypted (with the knock-on effect that now the packet length not
including MAC must be congruent to 4 rather than 0 mod the cipher
block size), and then the MAC is applied over the unencrypted length
field and encrypted ciphertext (prefixed by the sequence number as
usual). At the cost of exposing some information about the packet
lengths to an attacker (but rarely anything they couldn't have
inferred from the TCP headers anyway), this closes down any
possibility of a MITM using the client as a decryption oracle, unless
they can _first_ fake a correct MAC.
ETM mode is enabled by means of selecting a different MAC identifier,
all the current ones of which are constructed by appending
"-etm@openssh.com" to the name of a MAC that already existed.
We currently prefer the original SSH-2 binary packet protocol (i.e. we
list all the ETM-mode MACs last in our KEXINIT), on the grounds that
it's better tested and more analysed, so at the moment the new mode is
only activated if a server refuses to speak anything else.
briefly worried that it might not be doing what I thought it was
doing, but examining these diagnostics shows that it is after all, and
now I've written them it would be a shame not to keep them for future
use.
[originally from svn r9938]
zero but does it in such a way that over-clever compilers hopefully
won't helpfully optimise the call away if you do it just before
freeing something or letting it go out of scope. Use this for
(hopefully) every memset whose job is to destroy sensitive data that
might otherwise be left lying around in the process's memory.
[originally from svn r9586]
under SSH-2, don't risk looking at the length field of an incoming packet
until we've successfully MAC'ed the packet.
This requires a change to the MAC mechanics so that we can calculate MACs
incrementally, and output a MAC for the packet so far while still being
able to add more data to the packet later.
[originally from svn r8334]
discussed. Use Barrett and Silverman's convention of "SSH-1" for SSH protocol
version 1 and "SSH-2" for protocol 2 ("SSH1"/"SSH2" refer to ssh.com
implementations in this scheme). <http://www.snailbook.com/terms.html>
[originally from svn r5480]
malloc functions, which automatically cast to the same type they're
allocating the size of. Should prevent any future errors involving
mallocing the size of the wrong structure type, and will also make
life easier if we ever need to turn the PuTTY core code from real C
into C++-friendly C. I haven't touched the Mac frontend in this
checkin because I couldn't compile or test it.
[originally from svn r3014]