misc.c has always contained a combination of things that are tied
tightly into the PuTTY code base (e.g. they use the conf system, or
work with our sockets abstraction) and things that are pure standalone
utility functions like nullstrcmp() which could quite happily be
dropped into any C program without causing a link failure.
Now the latter kind of standalone utility code lives in the new source
file utils.c, whose only external dependency is on memory.c (for snew,
sfree etc), which in turn requires the user to provide an
out_of_memory() function. So it should now be much easier to link test
programs that use PuTTY's low-level functions without also pulling in
half its bulky infrastructure.
In the process, I came across a memory allocation logging system
enabled by -DMALLOC_LOG that looks long since bit-rotted; in any case
we have much more advanced tools for that kind of thing these days,
like valgrind and Leak Sanitiser, so I've just removed it rather than
trying to transplant it somewhere sensible. (We can always pull it
back out of the version control history if really necessary, but I
haven't used it in at least a decade.)
The other slightly silly thing I did was to give bufchain a function
pointer field that points to queue_idempotent_callback(), and disallow
direct setting of the 'ic' field in favour of calling
bufchain_set_callback which will fill that pointer in too. That allows
the bufchain system to live in utils.c rather than misc.c, so that
programs can use it without also having to link in the callback system
or provide an annoying stub of that function. In fact that's just
allowed me to remove stubs of that kind from PuTTYgen and Pageant!
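The function-pointer indirection described above can be sketched roughly as follows. All the `_sketch` names and the one-field IdempotentCallback are stand-ins invented for illustration, not the real declarations; the point is only that the utility layer never names the callback system's function directly.

```c
#include <stddef.h>

/* Stand-in for the real callback object (hypothetical, one field only). */
typedef struct IdempotentCallback { int queued; } IdempotentCallback;

/* Utility layer: stores a pointer to the queueing function instead of
 * calling it by name, so it has no link-time dependency on it. */
typedef struct bufchain_sketch {
    size_t buffersize;
    IdempotentCallback *ic;
    void (*queue_idempotent_callback)(IdempotentCallback *ic);
} bufchain_sketch;

void bufchain_sketch_init(bufchain_sketch *ch)
{
    ch->buffersize = 0;
    ch->ic = NULL;
    ch->queue_idempotent_callback = NULL;
}

void bufchain_sketch_set_callback_inner(
    bufchain_sketch *ch, IdempotentCallback *ic,
    void (*fn)(IdempotentCallback *ic))
{
    ch->ic = ic;
    ch->queue_idempotent_callback = fn;
}

/* Called when data is added: only goes through the stored pointer. */
void bufchain_sketch_note_data(bufchain_sketch *ch, size_t len)
{
    ch->buffersize += len;
    if (ch->ic)
        ch->queue_idempotent_callback(ch->ic);
}

/* Callback layer: the only place that knows the real function. */
void queue_idempotent_callback_sketch(IdempotentCallback *ic)
{
    ic->queued = 1;
}

void bufchain_sketch_set_callback(bufchain_sketch *ch, IdempotentCallback *ic)
{
    bufchain_sketch_set_callback_inner(
        ch, ic, queue_idempotent_callback_sketch);
}
```

A program that never sets a callback on any bufchain then needs neither the callback system nor a stub for it.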
Taking a leaf out of the LLVM code base: this macro still includes an
assert(false) so that the message will show up in a typical build, but
it follows it up with a call to a function explicitly marked as no-
return.
So this ought to do a better job of convincing compilers that once a
code path hits this function it _really doesn't_ have to still faff
about with making up a bogus return value or filling in a variable
that 'might be used uninitialised' in the following code that won't be
reached anyway.
I've gone through the existing code looking for the assert(false) /
assert(0) idiom and replaced all the ones I found with the new macro,
which also meant I could remove a few pointless return statements and
variable initialisations that I'd already had to put in to placate
compiler front ends.
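A minimal sketch of the assert-then-noreturn pattern, assuming a C11 compiler for `_Noreturn` (an older toolchain would need a compiler-specific attribute instead). The names UNREACHABLE, unreachable_internal and colour_name are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: never returns, and is declared as such. */
static _Noreturn void unreachable_internal(const char *msg)
{
    fprintf(stderr, "unreachable code reached: %s\n", msg);
    abort();
}

/* (false && msg) is always false; the string is there only so that it
 * shows up in the assertion-failure text. If NDEBUG compiles the assert
 * away, the no-return call is still present. */
#define UNREACHABLE(msg) (assert(false && msg), unreachable_internal(msg))

typedef enum { COLOUR_RED, COLOUR_GREEN } Colour;

const char *colour_name(Colour c)
{
    switch (c) {
      case COLOUR_RED:
        return "red";
      case COLOUR_GREEN:
        return "green";
    }
    UNREACHABLE("bad Colour value");
    /* No dummy 'return NULL;' is needed after this point. */
}
```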
In the past, I've had a lot of macros which you call with double
parentheses, along the lines of debug(("format string", params)), so
that the inner parens protect the commas and permit the macro to treat
the whole printf-style argument list as one macro argument.
That's all very well, but it's a bit inconvenient (it doesn't leave
you any way to implement such a macro by prepending another argument
to the list), and now this code base's rules allow C99isms, I can
switch all those macros to using a single pair of parens, using the
C99 ability to say '...' in the parameter list of the #define and get
at the corresponding suffix of the arguments as __VA_ARGS__.
So I'm doing it. I've made the following printf-style macros variadic:
bpp_logevent, ppl_logevent, ppl_printf and debug.
While I'm here, I've also fixed up a collection of conditioned-out
calls to debug() in the Windows front end which were clearly expecting
a macro with a different calling syntax, because they had an integer
parameter first. If I ever have a need to condition those back in,
they should actually work now.
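For comparison, here is a sketch of the two macro styles side by side. log_printf, last_msg, old_debug and new_debug are hypothetical names for this example; the second macro shows the 'prepend another argument' trick that the double-parenthesis form cannot do.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical sink: formats into a static buffer so the result can be
 * inspected. A real one would write to a log or the debug console. */
char last_msg[128];

void log_printf(const char *prefix, const char *fmt, ...)
{
    char body[96];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(body, sizeof body, fmt, ap);
    va_end(ap);
    snprintf(last_msg, sizeof last_msg, "%s%s", prefix, body);
}

/* Old style: called as old_debug(("x=%d", x)). The whole parenthesised
 * list is one macro argument, so nothing can be prepended to it. */
#define old_debug(args) (printf args)

/* C99 style: called as new_debug("x=%d", x). The macro can put its own
 * argument in front of the caller's list. */
#define new_debug(...) log_printf("debug: ", __VA_ARGS__)
```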
The ad-hoc code that received data from the SCP or SFTP server
predated even not-very-modern conveniences such as bufchain, and was
quite horrible and cumbersome.
Particularly nasty was the part where ssh_scp_recv set a _global_
pointer variable to the buffer it was in the middle of writing to, and
then recursed and expected a callback to use that pointer. That caused
clang-analyzer to grumble at me, in a particular case where the output
buffer was in the ultimate caller's stack frame; even though I'm
confident the code _worked_, I can't blame clang for being unhappy!
So now we do things the modern and much simpler way: the callback when
data comes in just puts it on a bufchain, and the top-level
ssh_scp_recv repeatedly waits until data arrives in the bufchain and
then copies it to the output buffer.
Notably toint(), which ought to compile down to the identity function
in any case so you don't really want to put in a pointless call
overhead, and make_ptrlen() (and a couple of its wrappers) which is
standing in for what ought to be a struct-literal syntax.
My normal habit these days, in new code, is to treat int and bool as
_almost_ completely separate types. I'm still willing to use C's
implicit test for zero on an integer (e.g. 'if (!blob.len)' is fine,
no need to spell it out as blob.len != 0), but generally, if a
variable is going to be conceptually a boolean, I like to declare it
bool and assign to it using 'true' or 'false' rather than 0 or 1.
PuTTY is an exception, because it predates the C99 bool, and I've
stuck to its existing coding style even when adding new code to it.
But it's been annoying me more and more, so now that I've decided C99
bool is an acceptable thing to require from our toolchain in the first
place, here's a quite thorough trawl through the source doing
'boolification'. Many variables and function parameters are now typed
as bool rather than int; many assignments of 0 or 1 to those variables
are now spelled 'true' or 'false'.
I managed this thorough conversion with the help of a custom clang
plugin that I wrote to trawl the AST and apply heuristics to point out
where things might want changing. So I've even managed to do a decent
job on parts of the code I haven't looked at in years!
To make the plugin's work easier, I pushed platform front ends
generally in the direction of using standard 'bool' in preference to
platform-specific boolean types like Windows BOOL or GTK's gboolean;
I've left the platform booleans in places they _have_ to be for the
platform APIs to work right, but variables only used by my own code
have been converted wherever I found them.
In a few places there are int values that look very like booleans in
_most_ of the places they're used, but have a rarely-used third value,
or a distinction between different nonzero values that most users
don't care about. In these cases, I've _removed_ uses of 'true' and
'false' for the return values, to emphasise that there's something
more subtle going on than a simple boolean answer:
- the 'multisel' field in dialog.h's list box structure, for which
the GTK front end in particular recognises a difference between 1
and 2 but nearly everything else treats as boolean
- the 'urgent' parameter to plug_receive, where 1 vs 2 tells you
something about the specific location of the urgent pointer, but
most clients only care about 0 vs 'something nonzero'
- the return value of wc_match, where -1 indicates a syntax error in
the wildcard.
- the return values from SSH-1 RSA-key loading functions, which use
-1 for 'wrong passphrase' and 0 for all other failures (so any
caller which already knows it's not loading an _encrypted private_
key can treat them as boolean)
- term->esc_query, and the 'query' parameter in toggle_mode in
terminal.c, which _usually_ hold 0 for ESC[123h or 1 for ESC[?123h,
but can also hold -1 for some other intervening character that we
don't support.
In a few places there's an integer that I haven't turned into a bool
even though it really _can_ only take values 0 or 1 (and, as above,
tried to make the call sites consistent in not calling those values
true and false), on the grounds that I thought it would make it more
confusing to imply that the 0 value was in some sense 'negative' or
bad and the 1 positive or good:
- the return value of plug_accepting uses the POSIXish convention of
0=success and nonzero=error; I think if I made it bool then I'd
also want to reverse its sense, and that's a job for a separate
piece of work.
- the 'screen' parameter to lineptr() in terminal.c, where 0 and 1
represent the default and alternate screens. There's no obvious
reason why one of those should be considered 'true' or 'positive'
or 'success' - they're just indices - so I've left it as int.
ssh_scp_recv had particularly confusing semantics for its previous int
return value: its call sites used '<= 0' to check for error, but it
never actually returned a negative number, just 0 or 1. Now the
function and its call sites agree that it's a bool.
In a couple of places I've renamed variables called 'ret', because I
don't like that name any more - it's unclear whether it means the
return value (in preparation) for the _containing_ function or the
return value received from a subroutine call, and occasionally I've
accidentally used the same variable for both and introduced a bug. So
where one of those got in my way, I've renamed it to 'toret' or 'retd'
(the latter short for 'returned') in line with my usual modern
practice, but I haven't done a thorough job of finding all of them.
Finally, one amusing side effect of doing this is that I've had to
separate quite a few chained assignments. It used to be perfectly fine
to write 'a = b = c = TRUE' when a,b,c were int and TRUE was just a
#define for 1. But now that a,b,c are bool and the value assigned is
the 'true' defined by stdbool.h, that idiom provokes a warning from
gcc: 'suggest parentheses around assignment used as truth value'!
The annoying int64.h is completely retired, since C99 guarantees a
64-bit integer type that you can actually treat like an ordinary
integer. Also, I've replaced the local typedefs uint32 and word32
(scattered through different parts of the crypto code) with the
standard uint32_t.
A function to compare two strings _both_ in ptrlen form (I've had
ptrlen_eq_string for ages, but for some reason, never quite needed
ptrlen_eq_ptrlen). A function to ask whether one ptrlen starts with
another (and, optionally, return a ptrlen giving the remaining part of
the longer string). And the va_list version of logeventf, which I
really ought to have written in the first place by sheer habit, even
if it was only needed by logeventf itself.
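The two ptrlen comparison helpers might look something like this. The struct layout and the exact signatures are assumptions made for the sketch, not copied from the real header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout: a pointer plus a byte count. */
typedef struct ptrlen {
    const void *ptr;
    size_t len;
} ptrlen;

bool ptrlen_eq_ptrlen(ptrlen a, ptrlen b)
{
    /* The len == 0 test avoids handing a possibly-NULL pointer to memcmp. */
    return a.len == b.len &&
        (a.len == 0 || memcmp(a.ptr, b.ptr, a.len) == 0);
}

/* Returns true if 'whole' begins with 'prefix'. If 'tail' is non-NULL,
 * it receives the part of 'whole' after the prefix. */
bool ptrlen_startswith(ptrlen whole, ptrlen prefix, ptrlen *tail)
{
    if (whole.len < prefix.len)
        return false;
    if (prefix.len && memcmp(whole.ptr, prefix.ptr, prefix.len) != 0)
        return false;
    if (tail) {
        tail->ptr = (const char *)whole.ptr + prefix.len;
        tail->len = whole.len - prefix.len;
    }
    return true;
}
```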
One to make one from a NUL-terminated string, and another to make one
from a strbuf. I've switched over all the obvious cases where I should
have been using these functions.
LogContext is now the owner of the logevent() function that back ends
and so forth are constantly calling. Previously, logevent was owned by
the Frontend, which would store the message into its list for the GUI
Event Log dialog (or print it to standard error, or whatever) and then
pass it _back_ to LogContext to write to the currently open log file.
Now it's the other way round: LogContext gets the message from the
back end first, writes it to its log file if it feels so inclined, and
communicates it back to the front end.
This means that lots of parts of the back end system no longer need to
have a pointer to a full-on Frontend; the only thing they needed it
for was logging, so now they just have a LogContext (which many of
them had to have anyway, e.g. for logging SSH packets or session
traffic).
LogContext itself also doesn't get a full Frontend pointer any more:
it now talks back to the front end via a little vtable of its own
called LogPolicy, which contains the method that passes Event Log
entries through, the old askappend() function that decides whether to
truncate a pre-existing log file, and an emergency function for
printing an especially prominent message if the log file can't be
created. One minor nice effect of this is that console and GUI apps
can implement that last function subtly differently, so that Unix
console apps can write it with a plain \n instead of the \r\n
(harmless but inelegant) that the old centralised implementation
generated.
One other consequence of this is that the LogContext has to be
provided to backend_init() so that it's available to backends from the
instant of creation, rather than being provided via a separate API
call a couple of function calls later, because backends have typically
started doing things that need logging (like making network
connections) before the call to backend_provide_logctx. Fortunately,
there's no case in the whole code base where we don't already have
logctx by the time we make a backend (so I don't actually remember why
I ever delayed providing one). So that shortens the backend API by one
function, which is always nice.
While I'm tidying up, I've also moved the printf-style logeventf() and
the handy logevent_and_free() into logging.c, instead of having copies
of them scattered around other places. This has also let me remove
some stub functions from a couple of outlying applications like
Pageant. Finally, I've removed the pointless "_tag" at the end of
LogContext's official struct name.
Now there's a centralised routine in misc.c to do the sanitisation,
which copies data on to an outgoing bufchain. This allows me to remove
from_backend_untrusted() completely from the frontend API, simplifying
code in several places.
Two use cases for untrusted-terminal-data sanitisation were in the
terminal.c prompts handler, and in the collection of SSH-2 userauth
banners. Both of those were writing output to a bufchain anyway, so
it was very convenient to just replace a bufchain_add with
sanitise_term_data and then not have to worry about it again.
There was also a simplistic sanitiser in uxcons.c, which I've now
replaced with a call to the good one - and in wincons.c there was a
FIXME saying I ought to get round to that, which now I have!
I'm quite surprised I haven't needed this for anything else yet. I
suppose if I had it, I could have written most of my ptrlen_eq_strings
in terms of it, and saved a lot of gratuitous runtime strlens.
This should make it easier to do formatted-string based logging
outside ssh.c, because I can wrap up a local macro in any source file
I like that expands to logevent_and_free(wherever my Frontend is,
dupprintf(macro argument)).
It caused yet another stub function to be needed in testbn, but there
we go.
(Also, while I'm here, removed a redundant declaration of logevent
itself from ssh.h. The one in putty.h is all we need.)
The tree234 storing currently active port forwardings - both local and
remote - now lives in portfwd.c, as does the complicated function that
updates it based on a Conf listing the new set of desired forwardings.
Local port forwardings are passed to ssh.c via the same route as
before - once the listening port receives a connection and portfwd.c
knows where it should be directed to (in particular, after the SOCKS
exchange, if any), it calls ssh_send_port_open.
Remote forwardings are now initiated by calling ssh_rportfwd_alloc,
which adds an entry to the rportfwds tree (which _is_ still in ssh.c,
and still confusingly sorted by a different criterion depending on SSH
protocol version) and sends out the appropriate protocol request.
ssh_rportfwd_remove cancels one again, sending a protocol request too.
Those functions look enough like ssh_{alloc,remove}_sharing_rportfwd
that I've merged those into the new pair as well - now allocating an
rportfwd allows you to specify either a destination host/port or a
sharing context, and returns a handy pointer you can use to cancel the
forwarding later.
There are several old functions that the previous commits have removed
all, or nearly all, of the references to. match_ssh_id is superseded
by ptrlen_eq_string; get_ssh_{string,uint32} is yet another replicated
set of decode functions (this time _partly_ centralised into misc.c);
the old APIs for the SSH-1 RSA decode functions are gone (together
with their last couple of holdout clients), as are
ssh{1,2}_{read,write}_bignum and ssh{1,2}_bignum_length.
Particularly odd was the use of ssh1_{read,write}_bignum in the SSH-2
Diffie-Hellman implementation. I'd completely forgotten I did that!
Now replaced with a raw bignum_from_bytes, which is simpler anyway.
This wraps up a (pointer, length) pair into a convenient struct that
lets me return it by value from a function, and also pass it through
to other functions in one go.
Ideally quite a lot of this code base could be switched over to using
ptrlen in place of separate pointer and length variables or function
parameters. (In fact, in my personal ideal conception of C, the usual
string type would be of this form, and all the string.h functions
would operate on ptrlens instead of zero-terminated 'char *'.)
For the moment, I'm just introducing it to make some upcoming
refactoring less inconvenient. Bulk migration of existing code to
ptrlen is a project for another time.
Along with the type itself, I've provided a convenient system of
including the contents of a ptrlen in a printf; a constructor function
that wraps up a pointer and length so you can make a ptrlen on the fly
in mid-expression; a function to compare a ptrlen against an ordinary
C string (which I mostly expect to use with string literals); and a
function 'mkstr' to make a dynamically allocated C string out of one.
That last function replaces a function of the same name in sftp.c,
which I'm promoting to a whole-codebase facility and adjusting its
API.
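As an illustration of the facilities listed above, here is a self-contained sketch. The names follow the text where it gives them (mkstr) and are otherwise guesses; in particular the PTRLEN_PRINTF macro name and the struct layout are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct ptrlen {
    const void *ptr;
    size_t len;
} ptrlen;

/* Constructor usable in mid-expression. */
ptrlen make_ptrlen(const void *ptr, size_t len)
{
    ptrlen pl;
    pl.ptr = ptr;
    pl.len = len;
    return pl;
}

/* Expands to the two arguments that a "%.*s" conversion expects. */
#define PTRLEN_PRINTF(pl) (int)(pl).len, (const char *)(pl).ptr

/* Compare against an ordinary NUL-terminated C string. */
bool ptrlen_eq_string(ptrlen pl, const char *str)
{
    size_t len = strlen(str);
    return pl.len == len && (len == 0 || memcmp(pl.ptr, str, len) == 0);
}

/* Make a dynamically allocated C string; the caller frees it. */
char *mkstr(ptrlen pl)
{
    char *s = malloc(pl.len + 1);
    if (!s)
        abort();
    if (pl.len)
        memcpy(s, pl.ptr, pl.len);
    s[pl.len] = '\0';
    return s;
}
```

Usage would be along the lines of printf("name=%.*s\n", PTRLEN_PRINTF(pl)).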
This removes a lot of pointless duplications of those constants.
Of course, _ideally_, I should upgrade to C99 bool throughout the code
base, replacing TRUE and FALSE with true and false and tagging
variables explicitly as bool when that's what they semantically are.
But that's a much bigger piece of work, and shouldn't block this
trivial cleanup!
This simplifies the client code both in ssh.c and in the client side
of Pageant.
I've cheated a tiny bit by preparing agent requests in a strbuf that
has space reserved at the front for the packet frame, which makes life
easier for the code that sends them off.
I've finally got tired of all the code throughout PuTTY that repeats
the same logic about how to format the SSH binary primitives like
uint32, string, mpint. We've got reasonably organised code in ssh.c
that appends things like that to 'struct Packet'; something similar in
sftp.c which repeats a lot of the work; utility functions in various
places to format an mpint to feed to one or another hash function; and
no end of totally ad-hoc stuff in functions like public key blob
formatters which actually have to _count up_ the size of data
painstakingly, then malloc exactly that much and mess about with
PUT_32BIT.
It's time to bring all of that into one place, and stop repeating
myself in error-prone ways everywhere. The new marshal.h defines a
system in which I centralise all the actual marshalling functions, and
then layer a touch of C macro trickery on top to allow me to (look as
if I) pass a wide range of different types to those functions, as long
as the target type has been set up in the right way to have a write()
function.
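A much-simplified sketch of the layering idea: one set of marshalling functions written against an abstract sink, plus macros that accept any type embedding that sink. Everything here (membuf, the sink field, the offsetof-based downcast, the sink_put_* names) is an illustrative assumption and not the actual contents of marshal.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Abstract sink: anything with a write() function. */
typedef struct BinarySink BinarySink;
struct BinarySink {
    void (*write)(BinarySink *bs, const void *data, size_t len);
};

/* One concrete target type: a fixed-size memory buffer. */
typedef struct membuf {
    unsigned char data[64];
    size_t len;
    BinarySink sink;               /* embedded sink makes it a valid target */
} membuf;

void membuf_write(BinarySink *bs, const void *data, size_t len)
{
    /* Recover the containing membuf from the embedded sink. */
    membuf *mb = (membuf *)((char *)bs - offsetof(membuf, sink));
    assert(len <= sizeof mb->data - mb->len);
    memcpy(mb->data + mb->len, data, len);
    mb->len += len;
}

/* Centralised marshalling functions, written once. */
void sink_put_uint32(BinarySink *bs, unsigned long val)
{
    unsigned char b[4];
    b[0] = (unsigned char)((val >> 24) & 0xFF);
    b[1] = (unsigned char)((val >> 16) & 0xFF);
    b[2] = (unsigned char)((val >> 8) & 0xFF);
    b[3] = (unsigned char)(val & 0xFF);
    bs->write(bs, b, 4);
}

void sink_put_string(BinarySink *bs, const void *data, size_t len)
{
    sink_put_uint32(bs, (unsigned long)len);   /* SSH string: length first */
    bs->write(bs, data, len);
}

/* Macro layer: looks as if it takes the target object itself. */
#define SINK_UPCAST(obj) (&(obj)->sink)
#define put_uint32(obj, val) sink_put_uint32(SINK_UPCAST(obj), val)
#define put_string(obj, data, len) sink_put_string(SINK_UPCAST(obj), data, len)
```

A hash context could become a target in the same way, by embedding a sink whose write() feeds the hash.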
This commit adds the new header and source file, and sets up some
general centralised types (strbuf and the various hash-function
contexts like SHA_State), but doesn't use the new calls for anything
yet.
(I've also renamed some internal functions in import.c which were
using the same names that I've just defined macros over. That won't
last long - those functions are going to go away soon, so the changed
names are strictly temporary.)
Now, instead of being a black box that you shovel strings into and
eventually extract a final answer, it exposes enough structure fields
to the world that you can append things to it _and_ look inside its
current contents. For convenience, it exports its internal pointer as
both a char * and an unsigned char *.
This centralises a few things that multiple header files were
previously defining, and were protecting against each other's
redefinition with ifdefs - small things like structs and typedefs. Now
all those things are in a defs.h which is by definition safe to
include _first_ (out of all the codebase-local headers) and only need
to be defined once.
bufchain_fetch_consume is a one-stop function that moves a given
number of bytes out of the head of a bufchain into an output buffer,
removing them from the bufchain in the process.
That function will fail an assertion (just like bufchain_fetch) if the
bufchain doesn't actually _have_ at least that many bytes to read, so
I also provide bufchain_try_fetch_consume which will return a success
or failure status.
Nothing uses these functions yet, but they will.
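The relationship between the asserting and the status-returning variant can be sketched on a toy byte queue. The bytequeue type and the bq_* names are stand-ins invented here; the real bufchain is a chain of blocks, not a flat array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a bufchain: a flat buffer with start/end offsets. */
typedef struct bytequeue {
    unsigned char data[64];
    size_t start, end;
} bytequeue;

size_t bq_size(const bytequeue *q) { return q->end - q->start; }

void bq_add(bytequeue *q, const void *p, size_t n)
{
    assert(n <= sizeof q->data - q->end);
    memcpy(q->data + q->end, p, n);
    q->end += n;
}

/* Copy n bytes from the head and remove them; the caller must already
 * know that they are there, otherwise the assertion fails. */
void bq_fetch_consume(bytequeue *q, void *out, size_t n)
{
    assert(bq_size(q) >= n);
    memcpy(out, q->data + q->start, n);
    q->start += n;
}

/* Same job, but reports failure instead of asserting, and leaves the
 * queue untouched if there is not enough data. */
bool bq_try_fetch_consume(bytequeue *q, void *out, size_t n)
{
    if (bq_size(q) < n)
        return false;
    bq_fetch_consume(q, out, n);
    return true;
}
```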
This shows the build platform (32- vs 64-bit in particular, and also
whether Unix GTK builds were compiled with or without the X11 pieces),
what compiler was used to build the binary, and any interesting build
options that might have been set on the make command line (especially,
but not limited to, the security-damaging ones like NO_SECURITY or
UNPROTECT). This will probably be useful all over the place, but in
particular it should allow the different Windows binaries to be told
apart!
Commits 21101c739 and 2eb952ca3 laid the groundwork for this, by
allowing the various About boxes to contain free text and also
ensuring they could be copied and pasted easily as part of a bug
report.
I'm faintly surprised I haven't needed this before. Basically it's an
allocating string formatter, like dupprintf, except that it
concatenates on to the end of a previous string. You instantiate a
strbuf, then repeatedly call strbuf_catf to append pieces of formatted
output to it, and then you can extract the whole string and free it
(separately or both in one step).
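A minimal sketch of an appending formatter of this kind, built on C99 vsnprintf. The strbuf_sketch type and the sb_* names are assumptions for the example; only the general shape (instantiate, append repeatedly, extract) comes from the description above.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct strbuf_sketch {
    char *s;            /* always NUL-terminated */
    size_t len, size;   /* used length and allocated size */
} strbuf_sketch;

strbuf_sketch *sb_new(void)
{
    strbuf_sketch *sb = malloc(sizeof *sb);
    if (!sb)
        abort();
    sb->size = 16;
    sb->len = 0;
    sb->s = malloc(sb->size);
    if (!sb->s)
        abort();
    sb->s[0] = '\0';
    return sb;
}

void sb_catf(strbuf_sketch *sb, const char *fmt, ...)
{
    va_list ap;
    int n;

    /* First pass: measure how many characters the output needs. */
    va_start(ap, fmt);
    n = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (n < 0)
        abort();

    if (sb->len + (size_t)n + 1 > sb->size) {
        sb->size = (sb->len + (size_t)n + 1) * 2;
        sb->s = realloc(sb->s, sb->size);
        if (!sb->s)
            abort();
    }

    /* Second pass: format directly on to the end of the buffer. */
    va_start(ap, fmt);
    vsnprintf(sb->s + sb->len, (size_t)n + 1, fmt, ap);
    va_end(ap);
    sb->len += (size_t)n;
}

/* Extract the string and free the wrapper in one step. */
char *sb_to_str(strbuf_sketch *sb)
{
    char *s = sb->s;
    free(sb);
    return s;
}
```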
This was defined in misc.h, and also in network.h (because one
function prototype needed to refer to it in the latter), leading to a
build failure if any source file inconveniently included both those
headers.
Fixed by guarding each copy of the typedef with a #ifdef.
ssh_pkt_getstring can return (NULL,0) if the input packet is too short
to contain a valid string.
In quite a few places we were passing the returned pointer,length pair
to a printf function with "%.*s" type format, which seems in practice
to have not been dereferencing the pointer but the C standard doesn't
actually guarantee that. In one place we were doing the same job by
hand with memcpy, and apparently that _can_ dereference the pointer in
practice (so a server could have caused a NULL-dereference crash by
sending an appropriately malformed "x11" type channel open request).
And also I spotted a logging call in the "forwarded-tcpip" channel
open handler which had forgotten the field width completely, so it was
erroneously relying on the string happening to be NUL-terminated in
the received packet.
I've tightened all of this up in general by normalising (NULL,0) to
("",0) before calling printf("%.*s"), and replacing the two even more
broken cases with the corrected version of that same idiom.
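The normalisation idiom amounts to something like the following sketch. format_field is a hypothetical helper, and the label text is made up; the only point is that printf never sees a NULL pointer, even with a precision of zero.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a (pointer, length) pair that may be (NULL, 0). */
void format_field(char *out, size_t outsize, const char *str, int len)
{
    if (!str) {
        /* Normalise (NULL,0) to ("",0) before it reaches "%.*s". */
        str = "";
        len = 0;
    }
    snprintf(out, outsize, "field=%.*s", len, str);
}
```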
The initial test for a line ending with "PRIVATE KEY-----" failed to
take into account the possibility that the line might be shorter than
that. Fixed by introducing a new library function strendswith(), and
strstartswith() for good measure, and using that.
Thanks to Hanno Böck for spotting this, with the aid of AFL.
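One plausible shape for the two helpers is shown below. The function names come from the text above, but the bodies and the bool return type are a sketch rather than the committed code. The length check in strendswith is exactly what the original test was missing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

bool strstartswith(const char *s, const char *t)
{
    /* strncmp stops at s's terminator, so a short s is safe. */
    return strncmp(s, t, strlen(t)) == 0;
}

bool strendswith(const char *s, const char *t)
{
    size_t slen = strlen(s), tlen = strlen(t);
    /* Check the lengths first: s may be shorter than the suffix. */
    return slen >= tlen && strcmp(s + (slen - tlen), t) == 0;
}
```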
(cherry picked from commit fa7b23ce90)
Conflicts:
misc.c
misc.h
(cherry-picker's note: the conflicts were only due to other functions
introduced on trunk just next to the ones introduced by this commit)
PuTTY's main mb_to_wc() function is all very well for embedding in
fiddly data pipelines, but for the simple job of turning a C string
into a C wide string, really I want something much more like
dupprintf. So here is one.
I've had to put it in a new separate source file miscucs.c rather than
throwing it into misc.c, because misc.c is linked into tools that
don't also include a module providing the internal Unicode API (winucs
or uxucs). The new miscucs.c appears only in Unicode-using tools.
(cherry picked from commit 7762d71226)
Now that we have modes in which the MAC verification happens before
any other crypto operation and hence will be the only thing seen by an
attacker, it seems like about time we got round to doing it in a
cautious way that tries to prevent the attacker from using our memcmp
as a timing oracle.
So, here's an smemeq() function which has the semantics of !memcmp but
attempts to run in time dependent only on the length parameter. All
the MAC implementations now use this in place of !memcmp to verify the
MAC on input data.
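A sketch of a comparison with those semantics: it accumulates the XOR of every byte pair instead of returning at the first mismatch. The name smemeq_sketch marks it as an illustration; whether a given compiler preserves the data-independent timing is not something the C language guarantees.

```c
#include <stddef.h>

/* Returns 1 if the two buffers are equal and 0 otherwise, like !memcmp,
 * but always examines all 'len' bytes. */
int smemeq_sketch(const void *av, const void *bv, size_t len)
{
    const unsigned char *a = (const unsigned char *)av;
    const unsigned char *b = (const unsigned char *)bv;
    unsigned val = 0;

    while (len-- > 0)
        val |= (unsigned)(*a++ ^ *b++);

    /* val is in 0..255. 0x100 - val has bit 8 set only when val == 0,
     * so this maps 0 -> 1 and anything else -> 0 without a branch. */
    return (int)((0x100 - val) >> 8);
}
```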
(cherry picked from commit 9d5a164021)
Cherry-picker's notes: the above commit comment isn't really true on
this branch, since the ETM packet protocol changes haven't been
cherry-picked. But it seemed silly to deliberately leave out even a
small safety measure.
I'm finding missing constifications all over the place this week.
Turns out that dmemdump() has been taking a non-const memory pointer
ever since the beginning, and it's never come up until now. How silly.
I'm about to use these in a new piece of code, but they may come in
helpful elsewhere as well. match_ssh_id in particular wraps an idiom
that's quite common in the rest of the codebase.
This option is available from the command line as '-hostkey', and is
also configurable through the GUI. When enabled, it completely
replaces all of the automated host key management: the server's host
key will be checked against the manually configured list, and the
connection will be allowed or disconnected on that basis, and the host
key store in the registry will not be either consulted or updated.
The main aim is to provide a means of automatically running Plink,
PSCP or PSFTP deep inside Windows services where HKEY_CURRENT_USER
isn't available to have stored the right host key in. But it also
permits you to specify a list of multiple host keys, which means a
second use case for the same mechanism will probably be round-robin
DNS names that select one of several servers with different host keys.
Host keys can be specified as the standard MD5 fingerprint or as an
SSH-2 base64 blob, and are canonicalised on input. (The base64 blob is
more unwieldy, especially with Windows command-line length limits, but
provides a means of specifying the _whole_ public key in case you
don't trust MD5. I haven't bothered to provide an analogous mechanism
for SSH-1, on the basis that anyone worrying about MD5 should have
stopped using SSH-1 already!)
[originally from svn r10220]
I'm about to need to refer to it from a source file that won't
necessarily always be linked against sshpubk.c, so it needs to live
somewhere less specialist. Now it sits alongside base64_encode_atom
(already in misc.c for another reason), which is neater anyway.
[originally from svn r10218]
These are intended to make it easier to handle strings of the form
"hostname:port" or other colon-separated things including hostnames
(such as the -L and -R command-line option arguments), even though the
hostname part might be a square-bracketed IPv6 address literal
containing colons that have to _not_ be treated as separating the
top-level string components.
Three of these functions have semantics as much like existing C
library functions as I could make them (host_strchr, host_strrchr,
host_strcspn) so that it wouldn't be too error-prone to replace
existing C functions with them at lots of call sites. The fourth
function (host_strduptrim) just strips square brackets off anything
that looks like an IPv6 literal.
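To show the kind of bracket-awareness involved, here is a simplified sketch of two of the four functions. The `_sketch` versions are assumptions: they do not validate that the bracketed text really looks like an IPv6 literal, and unlike strchr the search never matches the brackets themselves or the terminating NUL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Like strchr, but ignores any match inside a [...] section. */
const char *host_strchr_sketch(const char *s, int c)
{
    bool in_brackets = false;

    for (; *s; s++) {
        if (*s == '[')
            in_brackets = true;
        else if (*s == ']')
            in_brackets = false;
        else if (*s == c && !in_brackets)
            return s;
    }
    return NULL;
}

/* Duplicate a hostname, stripping one enclosing pair of square
 * brackets if present. The caller frees the result. */
char *host_strduptrim_sketch(const char *s)
{
    size_t len = strlen(s);
    char *out;

    if (len >= 2 && s[0] == '[' && s[len - 1] == ']') {
        s++;
        len -= 2;
    }
    out = malloc(len + 1);
    if (!out)
        abort();
    memcpy(out, s, len);
    out[len] = '\0';
    return out;
}
```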
[originally from svn r10119]
I've enabled gcc's format-string checking on dupprintf, by declaring
it in misc.h to have the appropriate GNU-specific attribute. This
pointed out a selection of warnings, which I've fixed.
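The declaration pattern looks roughly like this. PRINTF_LIKE and dupprintf_sketch are hypothetical names for the example; the attribute itself is the GNU `format(printf, ...)` attribute, which gcc and clang both accept.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Expands to the GNU attribute where available, and to nothing
 * elsewhere, so other compilers still accept the declaration. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, ellipsis_index) \
    __attribute__((format(printf, fmt_index, ellipsis_index)))
#else
#define PRINTF_LIKE(fmt_index, ellipsis_index)
#endif

/* Argument 1 is the format string; the variadic part starts at 2. */
char *dupprintf_sketch(const char *fmt, ...) PRINTF_LIKE(1, 2);

char *dupprintf_sketch(const char *fmt, ...)
{
    va_list ap;
    int n;
    char *out;

    va_start(ap, fmt);
    n = vsnprintf(NULL, 0, fmt, ap);      /* measure */
    va_end(ap);
    if (n < 0)
        abort();

    out = malloc((size_t)n + 1);
    if (!out)
        abort();

    va_start(ap, fmt);
    vsnprintf(out, (size_t)n + 1, fmt, ap);
    va_end(ap);
    return out;
}
```

With the attribute in place, a call such as dupprintf_sketch("%s", 42) draws a format warning from gcc under -Wall instead of misbehaving at run time.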
[originally from svn r10084]
of the GET_32BIT macros and then used as length fields. Missing bounds
checks against zero have been added, and also I've introduced a helper
function toint() which casts from unsigned to int in such a way as to
avoid C undefined behaviour, since I'm not sure I trust compilers any
more to do the obviously sensible thing.
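A conversion of that kind can be written without ever performing an out-of-range unsigned-to-int cast, for example as in the sketch below. The name toint_sketch flags it as an illustration; the final branch is a defensive fallback for integer representations where the second test cannot succeed.

```c
#include <limits.h>

int toint_sketch(unsigned u)
{
    if (u <= (unsigned)INT_MAX) {
        /* Value fits in int: the cast is well defined. */
        return (int)u;
    } else if (u >= (unsigned)INT_MIN) {
        /* (unsigned)INT_MIN is defined by modular arithmetic; the
         * difference fits in int, so every step here is in range. */
        return INT_MIN + (int)(u - (unsigned)INT_MIN);
    } else {
        return INT_MIN;   /* fallback for unusual representations */
    }
}
```

A length read from the wire as unsigned can then be converted and rejected if negative, e.g. 'len = toint_sketch(raw); if (len < 0) goto error;', where raw and error are placeholders.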
[originally from svn r9918]