This makes it easier to compile in multiple debugging modes, or on
Windows, without having to constantly paste annoying test-compile
commands out of comments in sshbn.c.
The new binary is compiled into the build directory, but not shipped
by 'make install', just like fuzzterm. Unlike fuzzterm, though, testbn
is also compiled on Windows, for which we didn't already have a
mechanism for building unshipped binaries; I've done the very simplest
thing for the moment, of providing a target in Makefile.vc to delete
them.
In order to comply with the PuTTY makefile system's constraint of
never compiling the same object multiple times with different ifdefs,
I've also moved testbn's main() out into its own source file.
As I mentioned in the previous commit, I'm going to want PuTTY to be
able to run sensibly when compiled with 64-bit Visual Studio,
including handling bignums in 64-bit chunks for speed. Unfortunately,
64-bit VS does not provide any type we can use as BignumDblInt in that
situation (unlike 64-bit gcc and clang, which give us __uint128_t).
The only facilities it provides are compiler intrinsics to access an
add-with-carry operation and a 64x64->128 multiplication (the latter
delivering its product in two separate 64-bit output chunks).
Hence, here's a substantial rework of the bignum code to make it
implement everything in terms of _those_ primitives, rather than
depending throughout on having BignumDblInt available to use ad-hoc.
BignumDblInt does still exist, for the moment, but now it's an
internal implementation detail of sshbn.h, only declared inside a new
set of macros implementing arithmetic primitives, and not accessible
to any code outside sshbn.h (which confirms that I really did catch
all uses of it and remove them).
The resulting code is surprisingly nice-looking, actually. You'd
expect more hassle and roundabout circumlocutions when you drop down
to using a more basic set of primitive operations, but actually, in
many cases it's turned out shorter to write things in terms of the new
BignumADC and BignumMUL macros - because almost all my uses of
BignumDblInt were implementing those operations anyway, taking several
lines at a time, and now they can do each thing in just one line.
The biggest headache was Poly1305: I wasn't able to find any sensible
way to adapt the existing Python script that generates the various
per-int-size implementations of arithmetic mod 2^130-5, and so I had
to rewrite it from scratch instead, with nothing in common with the
old version beyond a handful of comments. But even that seems to have
worked out nicely: the new version has much more legible descriptions
of the high-level algorithms, by virtue of having a 'Multiprecision'
type which wraps up the division into words, and yet Multiprecision's
range analysis allows it to automatically drop out special cases such
as multiplication by 5 being much easier than multiplication by
another multi-word integer.
DIVMOD_WORD is a portability hazard, because implementing it requires
either a way to get direct access to the x86 DIV instruction or
equivalent (be it inline assembler or a compiler intrinsic), or else
an integer type we can use as BignumDblInt. But I'm starting to think
about porting to 64-bit Visual Studio with a 64-bit BignumInt, and in
that situation neither of those options will be available.
I could write a piece of _out_-of-line x86-64 assembler in a separate
source file and put a function call in DIVMOD_WORD, but instead I've
decided to solve the problem in a more futureproof way: remove
DIVMOD_WORD totally and write a division function that doesn't need it
at all, solving not only today's porting headache but all future ones
in this area.
The new implementation works by precomputing (a good enough
approximation to) the leading word of the reciprocal of the modulus,
and then getting each word of quotient by multiplying by that
reciprocal, where we previously used DIVMOD_WORD to divide by the
leading word of the actual modulus. The reciprocal itself is computed
outside internal_mod() and passed in as a parameter, allowing me to
save time by only computing it once when I'm about to do a modpow.
To some extent this complicates the implementation: the advantage of
DIVMOD_WORD was that it yielded a full word q of quotient every time
it was used, so the subtraction of q*m from the input could be done in
a nicely word-aligned way. But the reciprocal multiply approach yields
_almost_ a full word of quotient, because you have to make the
reciprocal a bit short to avoid overflow at multiplication time. For a
start, this means we have to do fractionally more iterations of the
main loop; but more painfully, we can no longer depend on the
subtraction of q*m at every step being word-aligned, and instead we
have to be prepared to do it at any bit shift.
But the flip side is that once we've implemented that, the rest of the
algorithm becomes a lot less full of horrible special cases: in
particular, we can now completely throw away the horribleness at all
the call sites where we shift the modulus up by a fractional word to
set its top bit, and then have to do a little dance to get the last
few bits of quotient involving a second call to internal_mod.
So there are points both for and against the new implementation in
simplicity terms; but I think on balance it's more comprehensible than
the old one, and a quick timing test suggests it also ends up a touch
faster overall - the new testbn gets through the output of
testdata/bignum.py in 4.034s where the old one took 4.392s.
I'm about to rewrite the division code, so it'll be useful to have a
way to test it directly, particularly one which exercises difficult
cases such as extreme values of the leading word and remainders just
above and below zero.
In many places I was using an 'unsigned int', or an implicit int by
virtue of writing an undecorated integer literal, where what was
really wanted was a BignumInt. In particular, this substitution breaks
in any situation where BignumInt is _larger_ than unsigned - which it
is shortly about to be.
This allows files other than sshbn.c to work with the primitives
necessary to build multi-word arithmetic functions satisfying all of
PuTTY's portability constraints.
snew(), and most of the bignum functions, are deliberately written to
fail an assertion and terminate the program rather than return NULL,
so there's no point carefully checking their every return value for
NULL. This removes a huge amount of pointless error-checking code, and
makes the elliptic curve arithmetic almost legible in places :-)
I've kept error checks after modinv(), because that can return NULL if
asked to invert zero. bigsub() can also fail in principle, because our
bignums are non-negative only, but in the couple of cases where it's
used there's a preceding compare that should prevent it, so I've just
added assertions.
Having found a lot of unfixed constness issues in recent development,
I thought perhaps it was time to get proactive, so I compiled the
whole codebase with -Wwrite-strings. That turned up a huge load of
const problems, which I've fixed in this commit: the Unix build now
goes cleanly through with -Wwrite-strings, and the Windows build is as
close as I could get it (there are some lingering issues due to
occasional Windows API functions like AcquireCredentialsHandle not
having the right constness).
Notable fallout beyond the purely mechanical changing of types:
- the stuff saved by cmdline_save_param() is now explicitly
dupstr()ed, and freed in cmdline_run_saved.
- I couldn't make both string arguments to cmdline_process_param()
const, because it intentionally writes to one of them in the case
where it's the argument to -pw (in the vain hope of being at least
slightly friendly to 'ps'), so elsewhere I had to temporarily
dupstr() something for the sake of passing it to that function
- I had to invent a silly parallel version of const_cmp() so I could
pass const string literals in to lookup functions.
- stripslashes() in pscp.c and psftp.c has the annoying strchr nature
We were allocating a new array in which to make up a random number
every time we went round the loop, and not freeing any of them. Now we
allocate a single array to use for all loop iterations, and clear and
free it properly afterwards.
Patch due to Tim Kosse.
bn_restore_invariant (and the many loops that duplicate it) leaves a
single zero word in a bignum representing 0, whereas the constant
'Zero' does not have any data words at all. Cope with this in
bignum_cmp.
(It would be a better plan to decide on one representation and stick
with it, but this is the less disruptive fix for the moment.)
[originally from svn r9996]
now check that all the modular functions (modpow, modinv, modmul,
bigdivmod) have nonzero moduli, and that modinv also has a nonzero
thing to try to invert.
[originally from svn r9987]
bignum code's test harness. Thanks to Sup Yut Sum for fixing this in
TortoisePlink and Sven Strickroth for bringing it to my attention.
[originally from svn r9731]
zero but does it in such a way that over-clever compilers hopefully
won't helpfully optimise the call away if you do it just before
freeing something or letting it go out of scope. Use this for
(hopefully) every memset whose job is to destroy sensitive data that
might otherwise be left lying around in the process's memory.
[originally from svn r9586]
of array indices. You'd hope that compilers could automatically turn
the one representation into the other if it was faster to do so, but
apparently not: even on gcc -O3, this source transformation gains over
15% performance.
[originally from svn r9105]
routines into their callers, where they'll be done once for a whole
modpow rather than many times within each multiply. Doesn't save much
time as far as I can see - perhaps a couple of percent, one second in
the minute it takes to run the new bignum test suite - but seems like
a sensible idea anyway on general principles.
[originally from svn r9103]
fallback for when Montgomery is inapplicable.
(I may also at some point switch to using it for small exponents, if
speed testing should reveal that there's a noticeable threshold beyond
which preparing the Montgomery setup is uneconomical.)
[originally from svn r9099]
testdata/bignum.py is twice the size of the rest of the PuTTY source
put together, so I'm not checking it in.
This reveals bugs in the new multiplication code, which I have yet to
fix.
[originally from svn r9097]
exponentiation by replacing the modulo operation by a cleverly chosen
multiplication. This was not worth doing in the previous state of the
code (because my multiply was about as slow as my modulo), but now
that multiplication has been sped up by the Karatsuba optimisation,
Montgomery becomes worthwhile.
[originally from svn r9094]
setting BignumInt to 32 bits. gcc defines _LP64 on x86-64 and
presumably on other 64-bit architectures, so I've conditioned my
defines on that in the hope that they won't need redoing for the next
few such architectures.
I've also added a set for _LLP64, but it's untested as yet.
[originally from svn r9092]
a zero input.
This shouldn't matter for PuTTY, as these routines are only used in PuTTYgen,
to output SSH-1 format public key exponents/moduli, which should be nonzero.
[originally from svn r6731]
don't do a function call for each divmod, and don't rely on details of
the calling convention.
(This didn't actually make any measurable difference to runtime in any
of my tests, but we may as well keep it as it's neater.)
Also document some general caveats of the divmod macro.
[originally from svn r6475]
[r6469 == d72e4b718c]
a patch by Lionel Fourquaux. Seems to be about a factor of four improvement
(see wishlist item for details).
I don't claim to understand this in detail, so I can't vouch for its
correctness, but it didn't fall over immediately. It also produces some
compiler warnings, unfortunately.
[originally from svn r6469]
[this svn revision also touched putty-wishlist]
discussed. Use Barrett and Silverman's convention of "SSH-1" for SSH protocol
version 1 and "SSH-2" for protocol 2 ("SSH1"/"SSH2" refer to ssh.com
implementations in this scheme). <http://www.snailbook.com/terms.html>
[originally from svn r5480]