In the past, I've had a lot of macros which you call with double
parentheses, along the lines of debug(("format string", params)), so
that the inner parens protect the commas and permit the macro to treat
the whole printf-style argument list as one macro argument.
That's all very well, but it's a bit inconvenient (it doesn't leave
you any way to implement such a macro by prepending another argument
to the list), and now this code base's rules allow C99isms, I can
switch all those macros to using a single pair of parens, using the
C99 ability to say '...' in the parameter list of the #define and get
at the corresponding suffix of the arguments as __VA_ARGS__.
So I'm doing it. I've made the following printf-style macros variadic:
bpp_logevent, ppl_logevent, ppl_printf and debug.
While I'm here, I've also fixed up a collection of conditioned-out
calls to debug() in the Windows front end which were clearly expecting
a macro with a different calling syntax, because they had an integer
parameter first. If I ever have a need to condition those back in,
they should actually work now.
It's just silly to have _two_ systems for traversing a string of
comma-separated protocol ids. I think the new get_commasep_word
technique for looping over the elements of a string is simpler and
more general than the old membership-testing approach, and also it's
necessary for the modern KEX untangling system (which has to be able
to loop over one string, even if it used a membership test to check
things in the other). So this commit rewrites the two remaining uses
of in_commasep_string to use get_commasep_word instead, and deletes
the former.
My normal habit these days, in new code, is to treat int and bool as
_almost_ completely separate types. I'm still willing to use C's
implicit test for zero on an integer (e.g. 'if (!blob.len)' is fine,
no need to spell it out as blob.len != 0), but generally, if a
variable is going to be conceptually a boolean, I like to declare it
bool and assign to it using 'true' or 'false' rather than 0 or 1.
PuTTY is an exception, because it predates the C99 bool, and I've
stuck to its existing coding style even when adding new code to it.
But it's been annoying me more and more, so now that I've decided C99
bool is an acceptable thing to require from our toolchain in the first
place, here's a quite thorough trawl through the source doing
'boolification'. Many variables and function parameters are now typed
as bool rather than int; many assignments of 0 or 1 to those variables
are now spelled 'true' or 'false'.
I managed this thorough conversion with the help of a custom clang
plugin that I wrote to trawl the AST and apply heuristics to point out
where things might want changing. So I've even managed to do a decent
job on parts of the code I haven't looked at in years!
To make the plugin's work easier, I pushed platform front ends
generally in the direction of using standard 'bool' in preference to
platform-specific boolean types like Windows BOOL or GTK's gboolean;
I've left the platform booleans in places they _have_ to be for the
platform APIs to work right, but variables only used by my own code
have been converted wherever I found them.
In a few places there are int values that look very like booleans in
_most_ of the places they're used, but have a rarely-used third value,
or a distinction between different nonzero values that most users
don't care about. In these cases, I've _removed_ uses of 'true' and
'false' for the return values, to emphasise that there's something
more subtle going on than a simple boolean answer:
- the 'multisel' field in dialog.h's list box structure, for which
the GTK front end in particular recognises a difference between 1
and 2 but nearly everything else treats as boolean
- the 'urgent' parameter to plug_receive, where 1 vs 2 tells you
something about the specific location of the urgent pointer, but
most clients only care about 0 vs 'something nonzero'
- the return value of wc_match, where -1 indicates a syntax error in
the wildcard.
- the return values from SSH-1 RSA-key loading functions, which use
-1 for 'wrong passphrase' and 0 for all other failures (so any
caller which already knows it's not loading an _encrypted private_
key can treat them as boolean)
- term->esc_query, and the 'query' parameter in toggle_mode in
terminal.c, which _usually_ hold 0 for ESC[123h or 1 for ESC[?123h,
but can also hold -1 for some other intervening character that we
don't support.
In a few places there's an integer that I haven't turned into a bool
even though it really _can_ only take values 0 or 1 (and, as above,
tried to make the call sites consistent in not calling those values
true and false), on the grounds that I thought it would make it more
confusing to imply that the 0 value was in some sense 'negative' or
bad and the 1 positive or good:
- the return value of plug_accepting uses the POSIXish convention of
0=success and nonzero=error; I think if I made it bool then I'd
also want to reverse its sense, and that's a job for a separate
piece of work.
- the 'screen' parameter to lineptr() in terminal.c, where 0 and 1
represent the default and alternate screens. There's no obvious
reason why one of those should be considered 'true' or 'positive'
or 'success' - they're just indices - so I've left it as int.
ssh_scp_recv had particularly confusing semantics for its previous int
return value: its call sites used '<= 0' to check for error, but it
never actually returned a negative number, just 0 or 1. Now the
function and its call sites agree that it's a bool.
In a couple of places I've renamed variables called 'ret', because I
don't like that name any more - it's unclear whether it means the
return value (in preparation) for the _containing_ function or the
return value received from a subroutine call, and occasionally I've
accidentally used the same variable for both and introduced a bug. So
where one of those got in my way, I've renamed it to 'toret' or 'retd'
(the latter short for 'returned') in line with my usual modern
practice, but I haven't done a thorough job of finding all of them.
Finally, one amusing side effect of doing this is that I've had to
separate quite a few chained assignments. It used to be perfectly fine
to write 'a = b = c = TRUE' when a,b,c were int and TRUE was just a
the 'true' defined by stdbool.h, that idiom provokes a warning from
gcc: 'suggest parentheses around assignment used as truth value'!
This commit includes <stdbool.h> from defs.h and deletes my
traditional definitions of TRUE and FALSE, but other than that, it's a
100% mechanical search-and-replace transforming all uses of TRUE and
FALSE into the C99-standardised lowercase spellings.
No actual types are changed in this commit; that will come next. This
is just getting the noise out of the way, so that subsequent commits
can have a higher proportion of signal.
ssh2connection.c now knows how to unmarshal the message formats for
all the channel requests we'll need to handle when we're the server
and a client sends them. Each one is translated into a call to a new
method in the Channel vtable, which is implemented by a trivial
'always fail' routine in every channel type we know about so far.
This is a major code reorganisation in preparation for making this
code base into one that can build an SSH server as well as a client.
(Mostly for purposes of using the server as a regression test suite
for the client, though I have some other possible uses in mind too.
However, it's currently no part of my plan to harden the server to the
point where it can sensibly be deployed in a hostile environment.)
In this preparatory commit, I've broken up the SSH-2 transport and
connection layers, and the SSH-1 connection layer, into multiple
source files, with each layer having its own header file containing
the shared type definitions. In each case, the new source file
contains code that's specific to the client side of the protocol, so
that a new file can be swapped in in its place when building the
server.
Mostly this is just a straightforward moving of code without changing
it very much, but there are a couple of actual changes in the process:
The parsing of SSH-2 global-request and channel open-messages is now
done by a new pair of functions in the client module. For channel
opens, I've invented a new union data type to be the return value from
that function, representing either failure (plus error message),
success (plus Channel instance to manage the new channel), or an
instruction to hand the channel over to a sharing downstream (plus a
pointer to the downstream in question).
Also, the tree234 of remote port forwardings in ssh2connection is now
initialised on first use by the client-specific code, so that's where
its compare function lives. The shared ssh2connection_free() still
takes responsibility for freeing it, but now has to check if it's
non-null first.
The outer shell of the ssh2_lportfwd_open method, for making a
local-to-remote port forwarding, is still centralised in
ssh2connection.c, but the part of it that actually constructs the
outgoing channel-open message has moved into the client code, because
that will have to change depending on whether the channel-open has to
have type direct-tcpip or forwarded-tcpip.
In the SSH-1 connection layer, half the filter_queue method has moved
out into the new client-specific code, but not all of it -
bidirectional channel maintenance messages are still handled
centrally. One exception is SSH_MSG_PORT_OPEN, which can be sent in
both directions, but with subtly different semantics - from server to
client, it's referring to a previously established remote forwarding
(and must be rejected if there isn't one that matches it), but from
client to server it's just a "direct-tcpip" request with no prior
context. So that one is in the client-specific module, and when I add
the server code it will have its own different handler.
Some kinds of channel, even after they've sent EOF in both directions,
still have something to do before they initiate the CLOSE mechanism
and wind up the channel completely. For example, a session channel
with a subprocess running inside it will want to be sure to send the
"exit-status" or "exit-signal" notification, even if that happens
after bidirectional EOF of the data channels.
Previously, the SSH-2 connection layer had the standard policy that
once EOF had been both sent and received, it would start the final
close procedure. There's a method chan_want_close() by which a Channel
could vary this policy in one direction, by indicating that it wanted
the close procedure to commence after EOF was sent in only one
direction. Its parameters are a pair of booleans saying whether EOF
has been sent, and whether it's been received.
Now chan_want_close can vary the policy in the other direction as
well: if it returns FALSE even when _both_ parameters are true, the
connection layer will honour that, and not send CHANNEL_CLOSE. If it
does that, the Channel is responsible for indicating when it _does_
want close later, by calling sshfwd_initiate_close.
The function takes the two KEXINIT packets in their string form,
together with a list of mappings from names to known algorithm
implementations, and returns the selected one of each kind, along with
all the other necessary auxiliary stuff.
I've introduced a new POD struct type 'ssh_ttymodes' which stores an
encoding of everything you can specify in the "pty-req" packet or the
SSH-1 equivalent. This allows me to split up
write_ttymodes_to_packet_from_conf() into two separate functions, one
to parse all the ttymode data out of a Conf (and a Seat for fallback)
and return one of those structures, and the other to write it into an
SSH packet.
While I'm at it, I've moved the special case of terminal speeds into
the same mechanism, simplifying the call sites in both versions of the
SSH protocol.
The new master definition of all terminal modes lives in a header
file, with an ifdef around each item, so that later on I'll be able to
include it in a context that only enumerates the modes supported by
the particular target Unix platform.
Each of the new subroutines corresponds to one of the channel types
for which we know how to parse a CHANNEL_OPEN, and has a collection of
parameters corresponding to the fields of that message structure.
ssh2_connection_filter_queue now confines itself to parsing the
message, calling one of those functions, and constructing an
appropriate reply message if any.
Instead of the central code in ssh2_connection_filter_queue doing both
the job of parsing the channel request and deciding whether it's
acceptable, each Channel vtable now has a method for every channel
request type we recognise.
This is a new vtable-based abstraction which is passed to a backend in
place of Frontend, and it implements only the subset of the Frontend
functions needed by a backend. (Many other Frontend functions still
exist, notably the wide range of things called by terminal.c providing
platform-independent operations on the GUI terminal window.)
The purpose of making it a vtable is that this opens up the
possibility of creating a backend as an internal implementation detail
of some other activity, by providing just that one backend with a
custom Seat that implements the methods differently.
For example, this refactoring should make it feasible to directly
implement an SSH proxy type, aka the 'jump host' feature supported by
OpenSSH, aka 'open a secondary SSH session in MAINCHAN_DIRECT_TCP
mode, and then expose the main channel of that as the Socket for the
primary connection'. (Which of course you can already do by spawning
'plink -nc' as a separate proxy process, but this would permit it in
the _same_ process without anything getting confused.)
I've centralised a full set of stub methods in misc.c for the new
abstraction, which allows me to get rid of several annoying stubs in
the previous code. Also, while I'm here, I've moved a lot of
duplicated modalfatalbox() type functions from application main
program files into wincons.c / uxcons.c, which I think saves
duplication overall. (A minor visible effect is that the prefixes on
those console-based fatal error messages will now be more consistent
between applications.)
LogContext is now the owner of the logevent() function that back ends
and so forth are constantly calling. Previously, logevent was owned by
the Frontend, which would store the message into its list for the GUI
Event Log dialog (or print it to standard error, or whatever) and then
pass it _back_ to LogContext to write to the currently open log file.
Now it's the other way round: LogContext gets the message from the
back end first, writes it to its log file if it feels so inclined, and
communicates it back to the front end.
This means that lots of parts of the back end system no longer need to
have a pointer to a full-on Frontend; the only thing they needed it
for was logging, so now they just have a LogContext (which many of
them had to have anyway, e.g. for logging SSH packets or session
traffic).
LogContext itself also doesn't get a full Frontend pointer any more:
it now talks back to the front end via a little vtable of its own
called LogPolicy, which contains the method that passes Event Log
entries through, the old askappend() function that decides whether to
truncate a pre-existing log file, and an emergency function for
printing an especially prominent message if the log file can't be
created. One minor nice effect of this is that console and GUI apps
can implement that last function subtly differently, so that Unix
console apps can write it with a plain \n instead of the \r\n
(harmless but inelegant) that the old centralised implementation
generated.
One other consequence of this is that the LogContext has to be
provided to backend_init() so that it's available to backends from the
instant of creation, rather than being provided via a separate API
call a couple of function calls later, because backends have typically
started doing things that need logging (like making network
connections) before the call to backend_provide_logctx. Fortunately,
there's no case in the whole code base where we don't already have
logctx by the time we make a backend (so I don't actually remember why
I ever delayed providing one). So that shortens the backend API by one
function, which is always nice.
While I'm tidying up, I've also moved the printf-style logeventf() and
the handy logevent_and_free() into logging.c, instead of having copies
of them scattered around other places. This has also let me remove
some stub functions from a couple of outlying applications like
Pageant. Finally, I've removed the pointless "_tag" at the end of
LogContext's official struct name.
I haven't needed these until now, but I'm about to need to inspect the
entire contents of a packet queue before deciding whether to process
the first item on it.
I've changed the single 'vtable method' in packet queues from get(),
which returned the head of the queue and optionally popped it, to
after() which does the same bug returns the item after a specified
tree node. So if you pass the special end node to after(), then it
behaves like get(), but now you can also use it to retrieve the
successor of a packet.
(Orthogonality says that you can also _pop_ the successor of a packet
by calling after() with prev != pq.end and pop == TRUE. I don't have a
use for that one yet.)
Ian Jackson points out that the Linux kernel has a macro of this name
with the same purpose, and suggests that it's a good idea to use the
same name as they do, so that at least some people reading one code
base might recognise it from the other.
I never really thought very hard about what order FROMFIELD's
parameters should go in, and therefore I'm pleasantly surprised to
find that my order agrees with the kernel's, so I don't have to
permute every call site as part of making this change :-)
I've tried to separate out as many individually coherent changes from
this work as I could into their own commits, but here's where I run
out and have to commit the rest of this major refactoring as a
big-bang change.
Most of ssh.c is now no longer in ssh.c: all five of the main
coroutines that handle layers of the SSH-1 and SSH-2 protocols now
each have their own source file to live in, and a lot of the
supporting functions have moved into the appropriate one of those too.
The new abstraction is a vtable called 'PacketProtocolLayer', which
has an input and output packet queue. Each layer's main coroutine is
invoked from the method ssh_ppl_process_queue(), which is usually
(though not exclusively) triggered automatically when things are
pushed on the input queue. In SSH-2, the base layer is the transport
protocol, and it contains a pair of subsidiary queues by which it
passes some of its packets to the higher SSH-2 layers - first userauth
and then connection, which are peers at the same level, with the
former abdicating in favour of the latter at the appropriate moment.
SSH-1 is simpler: the whole login phase of the protocol (crypto setup
and authentication) is all in one module, and since SSH-1 has no
repeat key exchange, that setup layer abdicates in favour of the
connection phase when it's done.
ssh.c itself is now about a tenth of its old size (which all by itself
is cause for celebration!). Its main job is to set up all the layers,
hook them up to each other and to the BPP, and to funnel data back and
forth between that collection of modules and external things such as
the network and the terminal. Once it's set up a collection of packet
protocol layers, it communicates with them partly by calling methods
of the base layer (and if that's ssh2transport then it will delegate
some functionality to the corresponding methods of its higher layer),
and partly by talking directly to the connection layer no matter where
it is in the stack by means of the separate ConnectionLayer vtable
which I introduced in commit 8001dd4cb, and to which I've now added
quite a few extra methods replacing services that used to be internal
function calls within ssh.c.
(One effect of this is that the SSH-1 and SSH-2 channel storage is now
no longer shared - there are distinct struct types ssh1_channel and
ssh2_channel. That means a bit more code duplication, but on the plus
side, a lot fewer confusing conditionals in the middle of half-shared
functions, and less risk of a piece of SSH-1 escaping into SSH-2 or
vice versa, which I remember has happened at least once in the past.)
The bulk of this commit introduces the five new source files, their
common header sshppl.h and some shared supporting routines in
sshcommon.c, and rewrites nearly all of ssh.c itself. But it also
includes a couple of other changes that I couldn't separate easily
enough:
Firstly, there's a new handling for socket EOF, in which ssh.c sets an
'input_eof' flag in the BPP, and that responds by checking a flag that
tells it whether to report the EOF as an error or not. (This is the
main reason for those new BPP_READ / BPP_WAITFOR macros - they can
check the EOF flag every time the coroutine is resumed.)
Secondly, the error reporting itself is changed around again. I'd
expected to put some data fields in the public PacketProtocolLayer
structure that it could set to report errors in the same way as the
BPPs have been doing, but in the end, I decided propagating all those
data fields around was a pain and that even the BPPs shouldn't have
been doing it that way. So I've reverted to a system where everything
calls back to functions in ssh.c itself to report any connection-
ending condition. But there's a new family of those functions,
categorising the possible such conditions by semantics, and each one
has a different set of detailed effects (e.g. how rudely to close the
network connection, what exit status should be passed back to the
whole application, whether to send a disconnect message and/or display
a GUI error box).
I don't expect this to be immediately perfect: of course, the code has
been through a big upheaval, new bugs are expected, and I haven't been
able to do a full job of testing (e.g. I haven't tested every auth or
kex method). But I've checked that it _basically_ works - both SSH
protocols, all the different kinds of forwarding channel, more than
one auth method, Windows and Linux, connection sharing - and I think
it's now at the point where the easiest way to find further bugs is to
let it out into the wild and see what users can spot.
This is a convenient place for it because it abstracts away the
difference in disconnect packet formats between SSH-1 and -2, so when
I start restructuring, I'll be able to call it even from places that
don't know which version of SSH they're running.
Now, instead of writing each packet straight on to the raw output
bufchain by calling the BPP's format_packet function, the higher
protocol layers will put the packets on to a queue, which will
automatically trigger a callback (using the new mechanism for
embedding a callback in any packet queue) to make the BPP format its
queue on to the raw-output bufchain. That in turn triggers a second
callback which moves the data to the socket.
This means in particular that the CBC ignore-message workaround can be
moved into the new BPP routine to process the output queue, which is a
good place for it because then it can easily arrange to only put an
ignore message at the start of any sequence of packets that are being
formatted as a single output blob.
This means that someone putting things on a packet queue doesn't need
to separately hold a pointer to someone who needs notifying about it,
or remember to call the notification function every time they push
things on the queue. It's all taken care of automatically, without
having to put extra stuff at the call sites.
The precise semantics are that the callback will be scheduled whenever
_new_ packets appear on the queue, but not when packets are removed.
(Because the expectation is that the callback is notifying whoever is
consuming the queue.)
This is essentially trivial, because the only thing it needed from the
Ssh structure was the Conf. So the version in sshcommon.c just takes
an actual Conf as an argument, and now it doesn't need access to the
big structure definition any more.
This is a new idea I've had to make memory-management of PktIn even
easier. The idea is that a PktIn is essentially _always_ an element of
some linked-list queue: if it's not one of the queues by which packets
move through ssh.c, then it's a special 'free queue' which holds
packets that are unowned and due to be freed.
pq_pop() on a PktInQueue automatically relinks the packet to the free
queue, and also triggers an idempotent callback which will empty the
queue and really free all the packets on it. Hence, you can pop a
packet off a real queue, parse it, handle it, and then just assume
it'll get tidied up at some point - the only constraint being that you
have to finish with it before returning to the application's main loop.
The exception is that it's OK to pq_push() the packet back on to some
other PktInQueue, because a side effect of that will be to _remove_ it
from the free queue again. (And if _all_ the incoming packets get that
treatment, then when the free-queue handler eventually runs, it may
find it has nothing to do - which is harmless.)
Now I've got a list macro defining all the packet types we recognise,
I can use it to write a test for 'is this a recognised code?', and use
that in turn to centralise detection of completely unrecognised codes
into the binary packet protocol, where any such messages will be
handled entirely internally and never even be seen by the next level
up. This lets me remove another big pile of boilerplate in ssh.c.
This allows me to share just one definition of the packet types
between the enum declarations in ssh.h and the string translation
functions in sshcommon.c. No functional change.
The style of list macro is slightly unusual; instead of the
traditional 'X-macro' in which LIST(X) expands to invocations of the
form X(list element), this is an 'X-y macro', where LIST(X,y) expands
to invocations of the form X(y, list element). That style makes it
possible to wrap the list macro up in another macro and pass a
parameter through from the wrapper to the per-element macro. I'm not
using that facility just yet, but I will in the next commit.
Commit 6a5d4d083 introduced a foolish list-handling bug: concatenating
a non-empty queue to an empty queue would set the tail of the output
list to the _head_ of the non-empty one, instead of to its tail. Of
course, you don't notice this until you have more than one packet in
the queue in question!
That function _did_ depend on ssh.c's internal facilities, namely the
layout of 'struct ssh_channel'. In place of that, it now takes an
extra integer argument telling it where to find the channel id in
whatever data structure you give it a tree of - so now I can split up
the SSH-1 and SSH-2 channel handling without losing the services of
that nice channel-number allocator.
While I'm at it, I've brought it all into a single function: the
parsing of data from Conf, the list of modes, and even the old
callback system for writing to the destination buffer is now a simple
if statement that formats mode parameters as byte or uint32 depending
on SSH version. Also, the terminal speeds and the end byte are part of
the same setup, so it's all together in one place instead of scattered
all over ssh.c.
It doesn't really have to be in ssh.c sharing that file's internal
data structures; it's as much an independent object implementation as
any of the less trivial Channel instances. So it's another thing we
can get out of that too-large source file.
It's really just a concatenator for a pair of linked lists, but
unhelpfully restricted in which of the lists it replaces with the
output. Better to have a three-argument function that puts the output
wherever you like, whether it overlaps either or neither one of the
inputs.
These are essentially data-structure maintenance, and it seems silly
to have them be part of the same file that manages the topmost
structure of the SSH connection.