NFC: I'm just moving a small piece of code out into a separate
function, which does processing on incoming SSH-2 packets that is
completely independent of the packet type. (Specifically, we count up
the total amount of data so far transferred, and use it to trigger a
rekey when we get over the per-session-key data limit.)
The aim is that I'll be able to call this function from a central
location that's not SSH-2 specific, by using a function pointer that
points to this function in SSH-2 mode or is null in SSH-1 mode.
I've completely removed the top-level coroutine ssh_gotdata(), and
replaced it with a system in which ssh_receive (which is a plug
function, i.e. called directly from the network code) simply adds the
incoming data to a new bufchain called ssh->incoming_data, and then
queues an idempotent callback to ensure that whatever function is
currently responsible for the top-level handling of wire data will be
invoked in the near future.
So the decisions that ssh_gotdata was previously making are now made
by changing which function is invoked by that idempotent callback:
when we finish doing SSH greeting exchange and move on to the packet-
structured main phase of the protocol, we just change
ssh->current_incoming_data_fn and ensure that the new function gets
called to take over anything still outstanding in the queue.
This simplifies the _other_ end of the API of the rdpkt functions. In
the previous commit, they stopped returning their 'struct Packet'
directly, and instead put it on a queue; in this commit, they're no
longer receiving a (data, length) pair in their parameter list, and
instead, they're just reading from ssh->incoming_data. So now, API-
wise, they take no arguments at all except the main 'ssh' state
structure.
It's not just the rdpkt functions that needed to change, of course.
The SSH greeting handlers have also had to switch to reading from
ssh->incoming_data, and are quite substantially rewritten as a result.
(I think they look simpler in the new style, personally.)
This new bufchain takes over from the previous queued_incoming_data,
which was only used at all in cases where we throttled the entire SSH
connection. Now, data is unconditionally left on the new bufchain
whether we're throttled or not, and the only question is whether we're
currently bothering to read it; so all the special-purpose code to
read data from a bufchain and pass it to rdpkt can go away, because
rdpkt itself already knows how to do that job.
One slightly fiddly point is that we now have to defer processing of
EOF from the SSH server: if we have data already in the incoming
bufchain and then the server slams the connection shut, we want to
process the data we've got _before_ reacting to the remote EOF, just
in case that data gives us some reason to change our mind about how we
react to the EOF, or a last-minute important piece of data we might
need to log.
Each of the coroutines that parses the incoming wire data into a
stream of 'struct Packet' now delivers those packets to a PacketQueue
called ssh->pq_full (containing the full, unfiltered stream of all
packets received on the SSH connection), replacing the old API in
which each coroutine would directly return a 'struct Packet *' to its
caller, or NULL if it didn't have one ready yet.
This simplifies the function-call API of the rdpkt coroutines (they
now return void). It increases the complexity at the other end,
because we've now got a function ssh_process_pq_full (scheduled as an
idempotent callback whenever rdpkt appends anything to the queue)
which pulls things out of the queue and passes them to ssh->protocol.
But that's only a temporary complexity increase; by the time I finish
the upcoming stream of refactorings, there won't be two chained
functions there any more.
One small workaround I had to add in this commit is a flag called
'pending_newkeys', which ssh2_rdpkt sets when it's just returned an
SSH_MSG_NEWKEYS packet, and then waits for the transport layer to
process the NEWKEYS and set up the new encryption context before
processing any more wire data. This wasn't necessary before, because
the old architecture was naturally synchronous - ssh2_rdpkt would
return a NEWKEYS, which would be immediately passed to
do_ssh2_transport, which would finish processing it immediately, and
by the time ssh2_rdpkt was next called, the keys would already be in
place.
This change adds a big while loop around the whole of each rdpkt
function, so it's easiest to read it as a whitespace-ignored diff.
NFC for the moment, because the bufchain is always specially
constructed to hold exactly the same data that would have been passed
in to the function as a (pointer,length) pair. But this API change
allows get_userpass_input to express the idea that it consumed some
but not all of the data in the bufchain, which means that later on
I'll be able to point the same function at a longer-lived bufchain
containing the full stream of keyboard input and avoid dropping
keystrokes that arrive too quickly after the end of an interactive
password prompt.
The crWaitUntil macros have do-while type semantics, i.e. they always
crReturn _at least_ once, and then perhaps more times if their
termination condition is still not met. But sometimes a coroutine will
want to wait for a condition that may _already_ be true - the key
examples being non-emptiness of a bufchain or a PacketQueue, which may
already be non-empty in spite of you having just removed something
from its head.
In that situation, it's obviously more convenient not to bother with a
crReturn in the first place than to do one anyway and have to fiddle
about with toplevel callbacks to make sure we resume later. So here's
a new pair of macros crMaybeWaitUntil{,V}, which have the semantics of
while rather than do-while, i.e. they test the condition _first_ and
don't return at all if it's already met.
This is a set of convenience wrappers around the existing toplevel
callback function, which arranges to avoid scheduling a second call to
a callback function if one is already in the queue.
Just like the last few commits, this is a piece of infrastructure that
nothing is yet using. But it will.
This is another piece of not-yet-used infrastructure, which later on
will simplify my life when I start processing PacketQueues and adding
some of their packets to other PacketQueues, because this way the code
can unref every packet removed from the source queue in the same way,
whether or not the packet is actually finished with.
bufchain_fetch_consume is a one-stop function that moves a given
number of bytes out of the head of a bufchain into an output buffer,
removing them from the bufchain in the process.
That function will fail an assertion (just like bufchain_fetch) if the
bufchain doesn't actually _have_ at least that many bytes to read, so
I also provide bufchain_try_fetch_consume which will return a success
or failure status.
Nothing uses these functions yet, but they will.
This is just a linked list of 'struct Packet' with a convenience API
on the front. As yet it's unused, so ssh.c will currently not compile
with gcc -Werror unless you also add -Wno-unused-function. But all the
functions I've added here will be used in later commits in the current
patch series, so that's only a temporary condition.
NFC: this is a preliminary refactoring, intended to make my life
easier when I start changing around the APIs used to pass user
keyboard input around. The fewer functions even _have_ such an API,
the less I'll have to do at that point.
Previously it was local, which _mostly_ worked, except that if the SSH
host key needed verifying via a non-modal dialog box, there could be a
crReturn in between writing it and reading it.
It's pretty tempting to suggest that because nobody has noticed this
before, SSH-1 can't be needed any more! But actually I suspect the
intervening crReturn has only appeared since the last release,
probably around November when I was messing about with GTK dialog box
modality. (I observed the problem just now on the GTK build, while
trying to check that a completely different set of changes hadn't
broken SSH-1.)
It hasn't been used since 2012, when commit 8e0ab8be5 introduced a new
method of getting the do_ssh2_authconn coroutine started, and didn't
notice that the variable we were previously using was now completely
unused.
Changing the window's font size with Alt-< or Alt-> was not setting
any of the flags that make drawing_area_setup consider itself to have
been non-spuriously called, so the real window would enlarge without
the backing surface also doing so.
Since Pageant contains its own passphrase prompt system rather than
delegating it to another process, it's not trivial to use it in other
contexts. But having gone to the effort of coming up with my own
askpass system that (I think) does a better job at not revealing the
length of the password, I _want_ to use it in other contexts where a
GUI passphrase or password prompt is needed. Solution: an --askpass
option.
Mostly for debugging purposes, because I'm tired of having to use
'setsid' to force Pageant to select the GUI passphrase prompt when I'm
trying to fix bugs in gtkask.c. But I can also imagine situations in
which the ability to force a GUI prompt window might be useful to end
users, for example if the process does _technically_ have a
controlling terminal but it's not a user-visible one (say, in the back
end of some automation tool like expect(1)).
For symmetry, I also provide an option to force the tty prompt. That's
less obviously useful, because that's already the preferred prompt
type when both methods are available - so the only use for it would be
if you wanted to ensure that Pageant didn't _accidentally_ try to
launch a GUI prompt, and aborted with an error if it couldn't use a
tty prompt.
I've found Unix Pageant's GTK password prompt to be a bit flaky on
Ubuntu 18.04. Part of the reason for that seems to be (I _think_) that
GTK has changed its internal order of setting things up, so that you
can no longer call gtk_widget_show_now() and expect that when it
returns everything is ready to do a gdk_seat_grab. Another part is
that - completely mysteriously as far as I can see - a _failed_
gdk_seat_grab(GDK_SEAT_CAPABILITY_KEYBOARD) has the side effect of
calling gdk_window_hide on the window you gave it!
So I've done a considerable restructuring that means we no longer
attempt to do the keyboard grab synchronously in gtk_askpass_setup.
Instead, we make keyboard grab attempts during the run of gtk_main,
scheduling each one on a timer if the previous attempt fails.
This means I need a visual indication of 'not ready for you to type
anything yet', which I've arranged by filling in the three drawing
areas to mid-grey. At the point when the keyboard grab completes and
the window becomes receptive to input, they turn into the usual one
black and two white.
Thanks to Jiri Kaspar for sending this patch (apart from the new docs
section, which is in my own words), which implements a feature we've
had as a wishlist item ('utf8-plus-vt100') for a long time.
I was actually surprised it was possible to implement it in so few
lines of code! I'd forgotten, or possibly never noticed in the first
place, that even in UTF-8 mode PuTTY not only accepts but still
_processes_ all the ISO 2022 control sequences and shift characters,
and keeps running track of all the same state in term->cset and
term->cset_attrs that it tracks in IS0-2022-enabled modes. It's just
that in UTF-8 mode, at the very last minute when a character+attribute
pair is about to be written into the terminal's character buffer, it
deliberately ignores the contents of those variables.
So all that was needed was a new flag checked at that last moment
which causes it not quite to ignore them after all, and bingo,
utf8-plus-vt100 is supported. And it works no matter which ISO 2022
sequences you're using; whether you're using ESC ( 0 to select the
line drawing set directly into GL and ESC ( B to get back when you're
done, or whether you send a preliminary ESC ( B ESC ) 0 to get GL/GR
to be ASCII and line drawing respectively so you can use SI and SO as
one-byte mode switches thereafter, both work just as well.
This implementation strategy has a couple of consequences, which I
don't think matter very much one way or the other but I document them
just in case they turn out to be important later:
- if an application expecting this mode has already filled your
terminal window with lqqqqqqqqk, then enabling this mode in Change
Settings won't retroactively turn them into the line drawing
characters you wanted, because no memory is preserved in the screen
buffer of what the ISO 2022 state was when they were printed. So
the application still has to do a screen refresh.
- on the other hand, if you already sent the ESC ( 0 or whatever to
put the terminal _into_ line drawing mode, and then you turn on
this mode in Change Settings, you _will_ still be in line drawing
mode, because the system _does_ remember your current ISO 2022
state at all times, whether it's currently applying it to output
printing characters or not.
In GTK 3.10 and above, high-DPI support is arranged by each window
having a property called a 'scale factor', which translates logical
pixels as seen by most of the GTK API (widget and window sizes and
positions, coordinates in the "draw" event, etc) into the physical
pixels on the screen. This is handled more or less transparently,
except that one side effect is that your Cairo-based drawing code had
better be able to cope with that scaling without getting confused.
PuTTY's isn't, because we do all our serious drawing on a separate
Cairo surface we made ourselves, and then blit subrectangles of that
to the window during updates. This has two bad consequences. Firstly,
our surface has a size derived from what GTK told us the drawing area
size is, i.e. corresponding to GTK's _logical_ pixels, so when the
scale factor is (say) 2, our drawing takes place at half size and then
gets scaled up by the final blit in the draw event, making it look
blurry and unpleasant. Secondly, those final blits seem to end up
offset by half a pixel, so that a second blit over the same
subrectangle doesn't _quite_ completely wipe out the previously
blitted data - so there's a ghostly rectangle left behind everywhere
the cursor has been.
It's not that GTK doesn't _let_ you find out the scale factor; it's
just that it's in an out-of-the-way piece of API that you have to call
specially. So now we do: our backing surface is now created at a pixel
resolution matching the screen's real pixels, and we translate GTK's
scale factor into an ordinary cairo_scale() before we commence
drawing. So we still end up drawing the same text at the same size -
and this strategy also means that non-text elements like cursor
outlines and underlining will be scaled up with the screen DPI rather
than stubbornly staying one physical pixel thick - but now it's nice
and sharp at full screen resolution, and the subrectangle blits in the
draw event are back to affecting the exact set of pixels we expect
them to.
One silly consequence is that, immediately after removing the last
one, I've installed a handler for the GTK "configure-event" signal!
That's because the GTK 3 docs claim that that's how you get notified
that your scale factor has changed at run time (e.g. if you
reconfigure the scale factor of a whole monitor in the GNOME settings
dialog). Actually in practice I seem to find out via the "draw" event
before "configure" bothers to tell me, but now I've got a usefully
idempotent function for 'check whether the scale factor has changed
and sort it out if so', I don't see any harm in calling it from
anywhere it _might_ be useful.
Every time I do my standard re-test against all three major versions
of GTK, I have to annoyingly remember that the GTK1 headers contain
code that depends on the old gcc language standard, and manually add
this flag on the configure command line. Time to put it where it
belongs, in configure.ac so I don't have to remember it again.
I've been using that signal since the very first commit of this source
file, as a combined way to be notified when the size of the drawing
area changes (typically due to user window resizing actions) and also
when the drawing area is first created and available to be drawn on.
Unfortunately, testing on Ubuntu 18.04, I ran into an oddity, in which
the call to gtk_widget_show(inst->window) in new_session_window() has
the side effect of delivering a spurious configure_event on the
drawing area with size 1x46 pixels. This causes the terminal to resize
itself to 1 column wide, and the mistake isn't rectified until a
followup configure-event arrives after new_session_window returns to
the GTK main loop. But that means terminal output can occur between
those two configure events (the connection-sharing "Reusing a shared
connection to host.name" is a good example), and when it does, it gets
embarrassingly wrapped at one character per line down the left column.
I briefly tried to bodge around this by trying to heuristically guess
which configure events were real and which were spurious, but I have
no faith in that strategy continuing to work. I think a better
approach is to abandon configure-event completely, and move to a
system in which the two purposes I was using it for are handled by two
_different_ GTK signals, namely "size-allocate" (for knowing when we
get resized) and "realize" (for knowing when the drawing area
physically exists for us to start setting up Cairo or GDK machinery).
The result seems to have fixed the silly one-column wrapping bug, and
retained the ability to handle window resizes, on every GTK version I
have conveniently available to test on, including GTK 3 both before
and after these spurious configures started to happen.
GTK 3 PuTTY/pterm has always assumed that if it was compiled with
_support_ for talking to the raw X11 layer underneath GTK and GDK,
then it was entitled to expect that raw X11 layer to exist at all
times, i.e. that GDK_DISPLAY_XDISPLAY would return a meaningful X
display that it could do useful things with. So if you ran it over the
GDK Wayland backend, it would immediately segfault.
Modern GTK applications need to cope with multiple GDK backends at run
time. It's fine for GTK PuTTY to _contain_ the code to find and use
underlying X11 primitives like the display and the X window id, but it
should be prepared to find that it's running on Wayland (or something
else again!) so those functions don't return anything useful - in
which case it should degrade gracefully to the subset of functionality
that can be accessed through backend-independent GTK calls.
Accordingly, I've centralised the use of GDK_DISPLAY_XDISPLAY into a
support function get_x_display() in gtkmisc.c, which starts by
checking that there actually is one first. All previous direct uses of
GDK_*_XDISPLAY now go via that function, and check the result for NULL
afterwards. (To save faffing about calling that function too many
times, I'm also caching the display pointer in more places, and
passing it as an extra argument to various subfunctions, mostly in
gtkfont.c.)
Similarly, the get_windowid() function that retrieves the window id to
put in the environment of pterm's child process has to be prepared for
there not to be a window id.
This isn't a complete fix for all Wayland-related problems. The other
one I'm currently aware of is that the default font is "server:fixed",
which is a bad default now that it won't be available on all backends.
And I expect that further problems will show up with more testing. But
it's a start.
The 2-minutely check to see whether new GSS credentials need to be
forwarded to the server is pointless if we're not even in the mode
where we _have_ forwarded a previous set.
This was made obvious by the overly verbose diagnostic fixed in the
previous commit, so it's a good thing that bug was temporarily there!
Now we don't generate that message as a side effect of the periodic
check for new GSS credentials; we only generate it as part of the much
larger slew of messages that happen during a rekey.
Commit d515e4f1a went through a lot of very different shapes before it
was finally pushed. In some of them, GSS kex had its own value in the
kex enumeration, but it was used in ssh.c but not in config.c
(because, as in the final version, it wasn't configured by the same
drag-list system as the rest of them). So we had to distinguish the
set of key exchange ids known to the program as a whole from the set
controllable in the configuration.
In the final version, GSS kex ended up even more separated from the
kex enumeration than that: the enum value KEX_GSS_SHA1_K5 isn't used
at all. Instead, GSS key exchange appears in the list at the point of
translation from the list of enum values into the list of pointers to
data structures full of kex methods.
But after all the changes, everyone involved forgot to revert the part
of the patch which split KEX_MAX in two and introduced the pointless
value KEX_GSS_SHA1_K5! Better late than never: I'm reverting it now,
to avoid confusion, and because I don't have any reason to think the
distinction will be useful for any other purpose.
Simon tells me it was left over from an abandoned configuration design
for GSS key exchange. Let's get rid of it before it starts cluttering
snapshot users' saved sessions.
The former has advantages in terms of keeping Kerberos credentials up
to date, but it also does something sufficiently weird to the usual
SSH host key system that I think it's worth making sure users have a
means of turning it off separately from the less intrusive GSS
userauth.
This was originally sent in as part of the GSSAPI patch, but I've
extracted into a separate commit because that patch was more than
complicated enough by itself.
This is a heavily edited (by me) version of a patch originally due to
Nico Williams and Viktor Dukhovni. Their comments:
* Don't delegate credentials when rekeying unless there's a new TGT
or the old service ticket is nearly expired.
* Check for the above conditions more frequently (every two minutes
by default) and rekey when we would delegate credentials.
* Do not rekey with very short service ticket lifetimes; some GSSAPI
libraries may lose the race to use an almost expired ticket. Adjust
the timing of rekey checks to try to avoid this possibility.
My further comments:
The most interesting thing about this patch to me is that the use of
GSS key exchange causes a switch over to a completely different model
of what host keys are for. This comes from RFC 4462 section 2.1: the
basic idea is that when your session is mostly bidirectionally
authenticated by the GSSAPI exchanges happening in initial kex and
every rekey, host keys become more or less vestigial, and their
remaining purpose is to allow a rekey to happen if the requirements of
the SSH protocol demand it at an awkward moment when the GSS
credentials are not currently available (e.g. timed out and haven't
been renewed yet). As such, there's no need for host keys to be
_permanent_ or to be a reliable identifier of a particular host, and
RFC 4462 allows for the possibility that they might be purely
transient and only for this kind of emergency fallback purpose.
Therefore, once PuTTY has done a GSS key exchange, it disconnects
itself completely from the permanent host key cache functions in
storage.h, and instead switches to a _transient_ host key cache stored
in memory with the lifetime of just that SSH session. That cache is
populated with keys received from the server as a side effect of GSS
kex (via the optional SSH2_MSG_KEXGSS_HOSTKEY message), and used if
later in the session we have to fall back to a non-GSS key exchange.
However, in practice servers we've tested against do not send a host
key in that way, so we also have a fallback method of populating the
transient cache by triggering an immediate non-GSS rekey straight
after userauth (reusing the code path we also use to turn on OpenSSH
delayed encryption without the race condition).
This is a preliminary refactoring for an upcoming change which will
need to affect every use of schedule_timer to wait for the next rekey:
those calls to schedule_timer are now centralised into a function that
does an organised piece of thinking about when the next timer should
be.
A side effect of this change is that the translation from
CONF_ssh_rekey_time to an actual tick count is now better proofed
against integer overflow (just in case the user entered a completely
silly value).
Used the wrong kind of brackets when initialising the actual hash (as
opposed to hash ref) %disc_reasons. Not sure how I didn't notice the
warning in yesterday's testing!
I'm increasingly wishing I'd written this parsing program in Python,
and yet another reason why is that using argparse for the command-line
handling makes it a lot harder to forget to write the --help text when
you add an extra option.
This makes it more feasible to use logparse.pl as an output filter on
a PuTTY SSH log file and discard the original file.
In particular, ever since commit b4fde270c, I've been finding it
useful when testing new code to direct my SSH logs to a named pipe and
have another terminal window give a real-time dump of them by running
'while cat $named_pipe; do :; done'. Now I can replace the 'cat' in
that shell command with 'logparse.pl -ve' and still get the Event Log
messages as well as the unpacked contents of all the packets.
This includes picking apart the various asymmetric crypto formats
(public keys, signatures, elliptic-curve point encodings) as far as
possible, but since the verbose decoder system in logparse.pl
currently has to work without benefit of statefulness, it's not always
possible - some of the ECC formats depend for their decoding on
everyone remembering _which_ ECC protocol was negotiated by the
KEXINITs.
The variable in question holds the return value of strchr when its
input string was const, so it ought logically to be const itself even
though the official prototype of strchr permits it not to be.
The type code for an mpint in the input format string is "m", not
"mpint". This hasn't come up yet as far as I can see, but as and when
I add verbose dump routines for packet types that involve asymmetric
crypto, it will.
This allows me to request a verbose dump of the contents of some
particular packet type, or for all packet types.
Currently, the only packet type for which I've written a verbose dump
function is KEXINIT, but the framework is there to add further verbose
dumpers as and when they're needed.
Colin Watson reports that on pre-releases of Ubuntu 18.04, configure
events which don't actually involve a change of window size show up
annoyingly often. Our handling of configure events involves throwing
away the backing Cairo surface, making a fresh blank one, and
scheduling a top-level callback to get terminal.c to do a repaint and
populate the new surface; so a draw event before that callback occurs
causes the window contents to flicker off and on again, not to mention
wasting a lot of time.
The simplest solution is to spot spurious configures, and respond by
not throwing away the previous Cairo surface in the first place.
My theory that this report was completely obsolete seems to have been
scuppered, in the most infuriating way possible: a user sent a report
from 0.70 of a null-pointer crash happening moments _before_ that
check, because the compressed line pointer passed to decompressline()
was NULL.
So there's still some need for this thing after all, and moreover, it
should be happening just before that decompressline() call as well as
after it!