We currently discard console and network input when the guest has no
input buffers. This patch changes that, so that we simply stop
listening to that fd until the guest refills its input buffers.
This is particularly important because hvc_console without interrupts
does backoff polling and so often lose characters if we discard.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Implements virtio-based console, network and block servers. The block
server uses a thread so it's async, which is an improvement over the
old synchronous implementation (but a little more complex).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This makes lguest able to use the virtio devices.
We change the device descriptor page from a simple array to a variable
length "type, config_len, status, config data..." format, and
implement virtio_config_ops to read from that config data.
We use the virtio ring implementation for an efficient Guest <-> Host
virtqueue mechanism, and the new LHCALL_NOTIFY hypercall to kick the
host when it changes.
We also use LHCALL_NOTIFY on kernel addresses for very very early
console output. We could have another hypercall, but this hack works
quite well.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch gets rid of the old lguest host I/O infrastructure and
replaces it with a single hypercall "LHCALL_NOTIFY" which takes an
address.
The main change is the removal of io.c: that mainly did inter-guest
I/O, which virtio doesn't yet support.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This gets rid of the lguest bus, drivers and DMA mechanism, to make
way for a generic virtio mechanism.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These helper routines supply most of the virtqueue_ops for hypervisors
which want to use a ring for virtio. Unlike the previous lguest
implementation:
1) The rings are variable sized (2^n-1 elements).
2) They have an unfortunate limit of 65535 bytes per sg element.
3) The page numbers are always 64 bit (PAE anyone?)
4) They no longer place used[] on a separate page, just a separate
cacheline.
5) We do a modulo on a variable. We could be tricky if we cared.
6) Interrupts and notifies are suppressed using flags within the rings.
Users need only get the ring pages and provide a notify hook (KVM
wants the guest to allocate the rings, lguest does it sanely).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dor Laor <dor.laor@qumranet.com>
This adds the logic to convert the virtio ids into module aliases, and
includes a modalias entry in sysfs and the env var to make probing work.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is an hvc-based virtio console driver. It's suboptimal becuase
hvc expects to have raw access to interrupts and virtio doesn't assume
that, so it currently polls.
There are two solutions: expose hvc's "kick" interface, or wean off hvc.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The block driver uses scatter-gather lists with sg[0] being the
request information (struct virtio_blk_outhdr) with the type, sector
and inbuf id. The next N sg entries are the bio itself, then the last
sg is the status byte. Whether the N entries are in or out depends on
whether it's a read or a write.
We accept the normal (SCSI) ioctls: they get handed through to the other
side which can then handle it or reply that it's unsupported. It's
not clear that this actually works in general, since I don't know
if blk_pc_request() requests have an accurate rq_data_dir().
Although we try to reply -ENOTTY on unsupported commands, ioctl(fd,
CDROMEJECT) returns success to userspace. This needs a separate
patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <jens.axboe@oracle.com>
The network driver uses two virtqueues: one for input packets and one
for output packets. This has nice locking properties (ie. we don't do
any for recv vs send).
TODO:
1) Big packets.
2) Multi-client devices (maybe separate driver?).
3) Resolve freeing of old xmit skbs (Christian Borntraeger)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: netdev@vger.kernel.org
This attempts to implement a "virtual I/O" layer which should allow
common drivers to be efficiently used across most virtual I/O
mechanisms. It will no-doubt need further enhancement.
The virtio drivers add buffers to virtio queues; as the buffers are consumed
the driver "interrupt" callbacks are invoked.
There is also a generic implementation of config space which drivers can query
to get setup information from the host.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dor Laor <dor.laor@qumranet.com>
Cc: Arnd Bergmann <arnd@arndb.de>
1) This allows us to get alot closer to booting bzImages.
2) It means we don't have to know page_offset.
3) The Guest needs to modify the boot pagetables to create the
PAGE_OFFSET mapping before jumping to C code.
4) guest_pa() walks the page tables rather than using page_offset.
5) We don't use page_offset to figure out whether to emulate: it was
always kinda quesationable, and won't work for instructions done
before remapping (bzImage unpacking in particular).
6) We still want the kernel address for tlb flushing: have the initial
hypercall give us that, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(Based on Ron Minnich's LGUEST_PLAN9_SYSCALL patch).
This patch allows Guests to specify what system call vector they want,
and we try to reserve it. We only allow one non-Linux system call
vector, to try to avoid DoS on the Host.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is my first step in the migration of page_tables.c to the kernel
types and functions/macros (2.6.23-rc3). Seems to be working OK.
Signed-off-by: Matias Zabaljauregui <matias.zabaljauregui@cern.ch>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Move setup_regs() to lguest_arch_setup_regs() in i386_core.c given
that this is very architecture specific.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Apply Clue 2x4 to lguest userland<->kernel handling code and the
lguest launcher. Pointers are not to be passed in u32's!
Basic rule of thumb: Anything passing u32's back and forth should be
passing unsigned longs to be portable to 64 bit archs.
For those who forgotten already, I repeat: NO POINTERS IN u32!
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Clean up the hypercall code to make the code in hypercalls.c
architecture independent. First process the common hypercalls and
then call lguest_arch_do_hcall() if the call hasn't been handled.
Rename struct hcall_ring to hcall_args.
This patch requires the previous patch which reorganize the layout of
struct lguest_regs on i386 so they match the layout of struct
hcall_args.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently we look at the "trapnum" to see if the Guest wants a
hypercall. But once the hypercall is done we have to reset trapnum to
a bogus value, otherwise if we exit to userspace and return, we'd run
the same hypercall twice (that was a nasty bug to find!).
This has two main effects:
1) When Jes's patch changes the hypercall args to be a generic "struct
hcall_args" we simply change the type of "lg->hcall". It's set by
arch code, so if it has to copy args or something it can do so, and
point "hcall" into lg->arch somewhere.
2) Async hypercalls only get run when an actual hypercall is pending.
This simplfies the code a little and is a more logical semantic.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Move eax next to ebx/ecx/edx in struct lguest_regs on i386, so they
will be located together and allow it to map directly to a struct
hcall_ring entry (which will be renamed struct hcall_args as in a
subsequent patch).
This is in preparation for making the code hcall code architecture
independent.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Separate i386 architecture specific from core.c and move it to
x86/core.c and add x86/lguest.h header file to match.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This simplifies the code a little, in preparation for allowing
alternate system call vectors in guests (Plan 9 uses 0x40).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Back when we had all the Guest state in the switcher, we had a fixed
array of them. This is no longer necessary.
If we switch the network code to using random_ether_addr (46 bits is
enough to avoid clashes), we can get rid of the concept of "guest id"
altogether.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In order to avoid problematic special linking of the Launcher, we give
the Host an offset: this means we can use any memory region in the
Launcher as Guest memory rather than insisting on mmap() at 0.
The result is quite pleasing: a number of casts are replaced with
simple additions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Plan9 kernel binaries don't neatly align their ELF sections to our
page boundaries.
Signed-off-by: Ronald G. Minnich <rminnich@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
lguest uses a "switcher" shim mapped high to bounce between host and
guest. As lguest becomes less i386-centric, we separate this code
into a subdir.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Lguest has two sides: host support (to launch guests) and guest
support (replacement boot path and paravirt_ops). This moves the
guest side to arch/x86/lguest where it's closer to related code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Move architecture specific portion of lg_hcall code to asm-i386/lg_hcall.h
and have it included from linux/lguest.h.
[Changed to asm-i386/lguest_hcall.h so documentation finds it -RR]
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jes Sorensen <jes@sgi.com>
Currently lguest will spend a lot of of time waking up the host, as it
cannot go tickless (if the [host] TSC has been marked unstable). On my
laptop I was getting ~40% of wakeups from lguest.
With this patch applied, my laptop is much happier!
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
lguest_launcher.h uses "u32" not "__u32", which sets a bad example. Fix that,
and include <linux/types.h>.
This means we need to use -I on the Launcher build line so types.h is found.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
o Describe the new split configurations
o Highlight code documentation in drivers/lguest/README
o Point out necessity of having a getty on /dev/hvc0
o Remove gratuitous "m" in example
o Don't discuss I/O model here, stick to user documentation.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use copy_to_user() when copying a struct timespec to the guest -
put_user() cannot handle two long's in one go on a 64bit arch.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
These two callsites should really be errx instead of err, since there is
no errno associated with them in the moment they are issued.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Glauber de Oliveira Costa <gcosta@redhat.com>
To actually write a bootloader (or, say, the lguest launcher)
currently requires duplication of these structures. Making them
includable from userspace is much nicer.
We merge the common userspace-required definitions of e820_32/64.h
into e820.h for export.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1) Group all the "guest OS" support options together, under a PARAVIRT_GUEST
menu.
2) Make those options select CONFIG_PARAVIRT, as suggested by Andi.
3) Make kconfig help titles consistent.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Fix mnt_flush_task() misplaced kernel-doc.
Fix typos in some of the doc text.
Warning(linux-2.6.23-git17//fs/proc/base.c:2280): No description found for parameter 'mnt'
Warning(linux-2.6.23-git17//fs/proc/base.c:2280): No description found for parameter 'pid'
Warning(linux-2.6.23-git17//fs/proc/base.c:2280): No description found for parameter 'tgid'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix kernel-doc for auditsc parameter changes.
Warning(linux-2.6.23-git17//kernel/auditsc.c:1623): No description found for parameter 'dentry'
Warning(linux-2.6.23-git17//kernel/auditsc.c:1666): No description found for parameter 'dentry'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 238e4f142c ("ide: add
IDE_HFLAG_NO_LBA48 and IDE_HFLAG_NO_LBA48_DMA host flags") caused a
regression because the host_flags in struct hwif_s wasn't expanded to
cope with the fact that the host flags no longer fit in 16 bits.
Signed-off-by: David S. Miller <davem@davemloft.net>
[ I hate having to add good commit descriptions. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/cooloney/blackfin-2.6:
Blackfin arch: use KBUILD_CFLAGS and KBUILD_AFLAGS in Makefile
Blackfin arch: Javier Herrer writes: fix building when icache and dcache is disabled
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
KVM: Use new smp_call_function_mask() in kvm_flush_remote_tlbs()
sched: don't clear PF_VCPU in scheduler
KVM: Improve local apic timer wraparound handling
KVM: Fix local apic timer divide by zero
KVM: Move kvm_guest_exit() after local_irq_enable()
KVM: x86 emulator: fix access registers for instructions with ModR/M byte and Mod = 3
KVM: VMX: Force vm86 mode if setting flags during real mode
KVM: x86 emulator: implement 'movnti mem, reg'
KVM: VMX: Reset mmu context when entering real mode
KVM: VMX: Handle NMIs before enabling interrupts and preemption
KVM: MMU: Set shadow pte atomically in mmu_pte_write_zap_pte()
KVM: x86 emulator: fix repne/repnz decoding
KVM: x86 emulator: fix merge screwup due to emulator split
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits)
[IPSEC] IPV6: Fix to add tunnel mode SA correctly.
[NET]: Cut off the queue_mapping field from sk_buff
[NET]: Hide the queue_mapping field inside netif_subqueue_stopped
[NET]: Make and use skb_get_queue_mapping
[NET]: Use the skb_set_queue_mapping where appropriate
[INET]: Use MODULE_ALIAS_NET_PF_PROTO_TYPE where possible.
[INET]: Let inet_diag and friends autoload
[NIU]: Cleanup PAGE_SIZE checks a bit
[NET]: Fix SKB_WITH_OVERHEAD calculation
[ATM]: Fix clip module reload crash.
[TG3]: Update version to 3.85
[TG3]: PCI command adjustment
[TG3]: Add management FW version to ethtool report
[TG3]: Add 5723 support
[Bluetooth] Convert RFCOMM to use kthread API
[Bluetooth] Add constant for Bluetooth socket options level
[Bluetooth] Add support for handling simple eSCO links
[Bluetooth] Add address and channel attribute to RFCOMM TTY device
[Bluetooth] Fix wrong argument in debug code of HIDP
[Bluetooth] Add generic driver for Bluetooth USB devices
...
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Enable restart support for lite5200 board
[POWERPC] Add restart support for mpc52xx based platforms
[POWERPC] Update device tree binding for mpc5200 gpt
[POWERPC] Add mpc52xx_find_and_map_path(), refactor utility functions
[POWERPC] bestcomm: Restrict bus prefetch bugfix to original mpc5200 silicon.