// SPDX-License-Identifier: GPL-2.0-only
/*
 * Copyright (c) 2009, Microsoft Corporation.
 *
 * Authors:
 *   Haiyang Zhang <haiyangz@microsoft.com>
 *   Hank Janssen <hjanssen@microsoft.com>
 */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/io.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/hyperv.h>
#include <linux/random.h>
#include <linux/clockchips.h>
#include <linux/delay.h>
#include <linux/interrupt.h>
#include <clocksource/hyperv_timer.h>
#include <asm/mshyperv.h>
#include "hyperv_vmbus.h"

/* The one and only */
struct hv_context hv_context;

/*
 * hv_init - Main initialization routine.
 *
 * This routine must be called before any other routines in here are called
 */
int hv_init(void)
{
	hv_context.cpu_context = alloc_percpu(struct hv_per_cpu_context);
	if (!hv_context.cpu_context)
		return -ENOMEM;
	return 0;
}

/*
 * Functions for allocating and freeing memory with size and
 * alignment HV_HYP_PAGE_SIZE. These functions are needed because
 * the guest page size may not be the same as the Hyper-V page
 * size. We depend upon kmalloc() aligning power-of-two size
 * allocations to the allocation size boundary, so that the
 * allocated memory appears to Hyper-V as a page of the size
 * it expects.
 */

void *hv_alloc_hyperv_page(void)
{
	BUILD_BUG_ON(PAGE_SIZE < HV_HYP_PAGE_SIZE);

	if (PAGE_SIZE == HV_HYP_PAGE_SIZE)
		return (void *)__get_free_page(GFP_KERNEL);
	else
		return kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
}

void *hv_alloc_hyperv_zeroed_page(void)
{
	if (PAGE_SIZE == HV_HYP_PAGE_SIZE)
		return (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
	else
		return kzalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
}

void hv_free_hyperv_page(unsigned long addr)
{
	if (PAGE_SIZE == HV_HYP_PAGE_SIZE)
		free_page(addr);
	else
		kfree((void *)addr);
}

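/*
 * Illustrative sketch only, not part of the driver: a userspace model of
 * the invariant that the comment above hv_alloc_hyperv_page() relies on,
 * namely that a hypervisor page must be exactly one hypervisor page long
 * and aligned to that same size. aligned_alloc() stands in for the kernel
 * allocators here. MODEL_HYP_PAGE_SIZE (assumed to be 4 KiB) and the
 * model_* helpers are hypothetical names introduced for this example.
 */
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for HV_HYP_PAGE_SIZE; 4 KiB is an assumption. */
#define MODEL_HYP_PAGE_SIZE 4096u

/* Allocate one zeroed buffer whose size and alignment are both one page. */
static void *model_alloc_hyperv_zeroed_page(void)
{
	/* aligned_alloc() requires the size to be a multiple of the alignment. */
	void *p = aligned_alloc(MODEL_HYP_PAGE_SIZE, MODEL_HYP_PAGE_SIZE);

	if (p)
		memset(p, 0, MODEL_HYP_PAGE_SIZE);
	return p;
}

/* Return 1 if addr lies on a hypervisor page boundary, 0 otherwise. */
static int model_is_hyp_page_aligned(const void *addr)
{
	return ((uintptr_t)addr & (MODEL_HYP_PAGE_SIZE - 1)) == 0;
}
```
/*
 * A caller would check model_is_hyp_page_aligned() on the returned pointer
 * before handing the buffer to anything that expects a whole page.
 */
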
/*
 * hv_post_message - Post a message using the hypervisor message IPC.
 *
 * This involves a hypercall.
 */
int hv_post_message(union hv_connection_id connection_id,
		  enum hv_message_type message_type,
		  void *payload, size_t payload_size)
{
	struct hv_input_post_message *aligned_msg;
	struct hv_per_cpu_context *hv_cpu;
	u64 status;

	if (payload_size > HV_MESSAGE_PAYLOAD_BYTE_COUNT)
		return -EMSGSIZE;

	hv_cpu = get_cpu_ptr(hv_context.cpu_context);
	aligned_msg = hv_cpu->post_msg_page;
	aligned_msg->connectionid = connection_id;
	aligned_msg->reserved = 0;
	aligned_msg->message_type = message_type;
	aligned_msg->payload_size = payload_size;
	memcpy((void *)aligned_msg->payload, payload, payload_size);

	if (hv_isolation_type_snp())
		status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
				(void *)aligned_msg, NULL,
				sizeof(*aligned_msg));
	else
		status = hv_do_hypercall(HVCALL_POST_MESSAGE,
				aligned_msg, NULL);

	/* Preemption must remain disabled until after the hypercall
	 * so some other thread can't get scheduled onto this cpu and
	 * corrupt the per-cpu post_msg_page
	 */
	put_cpu_ptr(hv_cpu);

	return hv_result(status);
}

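/*
 * Illustrative sketch only, not part of the driver: the size check and
 * copy that hv_post_message() performs before the hypercall, modelled in
 * userspace. The struct layout, the 240-byte limit and all model_* names
 * are assumptions made for this example; the real definitions live in the
 * Hyper-V headers.
 */
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed stand-in for HV_MESSAGE_PAYLOAD_BYTE_COUNT. */
#define MODEL_PAYLOAD_BYTES 240u

/* Hypothetical, simplified image of the post-message input block. */
struct model_post_message {
	uint32_t connection_id;
	uint32_t reserved;
	uint32_t message_type;
	uint32_t payload_size;
	uint8_t payload[MODEL_PAYLOAD_BYTES];
};

/*
 * Fill msg from the caller's payload. Oversized payloads are rejected
 * before anything is written, mirroring the early -EMSGSIZE return.
 */
static int model_fill_message(struct model_post_message *msg,
			      uint32_t connection_id, uint32_t message_type,
			      const void *payload, size_t payload_size)
{
	if (payload_size > MODEL_PAYLOAD_BYTES)
		return -EMSGSIZE;

	msg->connection_id = connection_id;
	msg->reserved = 0;
	msg->message_type = message_type;
	msg->payload_size = (uint32_t)payload_size;
	memcpy(msg->payload, payload, payload_size);
	return 0;
}
```
/*
 * In the driver the buffer is a per-CPU page, which is why preemption
 * stays disabled from the fill until the hypercall has returned.
 */
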
int hv_synic_alloc(void)
{
	int cpu;
	struct hv_per_cpu_context *hv_cpu;

	/*
	 * First, zero all per-cpu memory areas so hv_synic_free() can
	 * detect what memory has been allocated and cleanup properly
	 * after any failures.
	 */
	for_each_present_cpu(cpu) {
		hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);
		memset(hv_cpu, 0, sizeof(*hv_cpu));
	}

	hv_context.hv_numa_map = kcalloc(nr_node_ids, sizeof(struct cpumask),
					 GFP_KERNEL);
	if (hv_context.hv_numa_map == NULL) {
		pr_err("Unable to allocate NUMA map\n");
		goto err;
	}

	for_each_present_cpu(cpu) {
		hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);

		tasklet_init(&hv_cpu->msg_dpc,
			     vmbus_on_msg_dpc, (unsigned long) hv_cpu);

		/*
		 * In an SNP isolation VM the Synic message and event pages
		 * are allocated by the paravisor, so skip allocating them
		 * here.
		 */
		if (!hv_isolation_type_snp()) {
			hv_cpu->synic_message_page =
				(void *)get_zeroed_page(GFP_ATOMIC);
			if (hv_cpu->synic_message_page == NULL) {
				pr_err("Unable to allocate SYNIC message page\n");
				goto err;
			}

			hv_cpu->synic_event_page =
				(void *)get_zeroed_page(GFP_ATOMIC);
			if (hv_cpu->synic_event_page == NULL) {
				pr_err("Unable to allocate SYNIC event page\n");
				goto err;
			}
		}

		hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
		if (hv_cpu->post_msg_page == NULL) {
			pr_err("Unable to allocate post msg page\n");
			goto err;
		}
	}

	return 0;
err:
	/*
	 * Any memory allocations that succeeded will be freed when
	 * the caller cleans up by calling hv_synic_free()
	 */
	return -ENOMEM;
}

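/*
 * Illustrative sketch only, not part of the driver: the "zero everything
 * first, then allocate, and let one free routine clean up whatever exists"
 * pattern used by hv_synic_alloc() and hv_synic_free(), modelled in
 * userspace. calloc()/free() stand in for the page allocator; the fail_at
 * parameter, MODEL_NR_CPUS and all model_* names are hypothetical and exist
 * only to let the example inject an allocation failure.
 */
```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_NR_CPUS 4
#define MODEL_PAGE_SIZE 4096u

struct model_cpu_ctx {
	void *message_page;
	void *event_page;
	void *post_msg_page;
};

static struct model_cpu_ctx model_ctx[MODEL_NR_CPUS];
static int model_alloc_count;

/* Allocate one zeroed page, or fail on the fail_at-th call (-1: never). */
static void *model_page_alloc(int fail_at)
{
	if (model_alloc_count++ == fail_at)
		return NULL;
	return calloc(1, MODEL_PAGE_SIZE);
}

/* On failure, return early and leave cleanup to model_synic_free(). */
static int model_synic_alloc(int fail_at)
{
	int cpu;

	/* Zero first so the free routine can tell what was allocated. */
	memset(model_ctx, 0, sizeof(model_ctx));
	model_alloc_count = 0;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		struct model_cpu_ctx *ctx = &model_ctx[cpu];

		ctx->message_page = model_page_alloc(fail_at);
		if (!ctx->message_page)
			return -ENOMEM;
		ctx->event_page = model_page_alloc(fail_at);
		if (!ctx->event_page)
			return -ENOMEM;
		ctx->post_msg_page = model_page_alloc(fail_at);
		if (!ctx->post_msg_page)
			return -ENOMEM;
	}
	return 0;
}

/* Safe after a partial failure: free(NULL) is a no-op. */
static void model_synic_free(void)
{
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		free(model_ctx[cpu].event_page);
		free(model_ctx[cpu].message_page);
		free(model_ctx[cpu].post_msg_page);
	}
	memset(model_ctx, 0, sizeof(model_ctx));
}
```
/*
 * The design choice is that the error path does no unwinding of its own;
 * the caller always pairs a failed allocation with the free routine.
 */
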
void hv_synic_free(void)
{
	int cpu;

	for_each_present_cpu(cpu) {
		struct hv_per_cpu_context *hv_cpu
			= per_cpu_ptr(hv_context.cpu_context, cpu);

		free_page((unsigned long)hv_cpu->synic_event_page);
		free_page((unsigned long)hv_cpu->synic_message_page);
		free_page((unsigned long)hv_cpu->post_msg_page);
	}

	kfree(hv_context.hv_numa_map);
}

/*
 * hv_synic_init - Initialize the Synthetic Interrupt Controller.
 *
 * If it is already initialized by another entity (i.e., x2v shim), we need to
 * retrieve the initialized message and event pages.  Otherwise, we create and
 * initialize the message and event pages.
 */
void hv_synic_enable_regs(unsigned int cpu)
{
	struct hv_per_cpu_context *hv_cpu
		= per_cpu_ptr(hv_context.cpu_context, cpu);
	union hv_synic_simp simp;
	union hv_synic_siefp siefp;
	union hv_synic_sint shared_sint;
	union hv_synic_scontrol sctrl;

	/* Setup the Synic's message page */
	simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
	simp.simp_enabled = 1;

	if (hv_isolation_type_snp()) {
		hv_cpu->synic_message_page
			= memremap(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
				   HV_HYP_PAGE_SIZE, MEMREMAP_WB);
		if (!hv_cpu->synic_message_page)
			pr_err("Failed to map synic message page.\n");
	} else {
		simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
			>> HV_HYP_PAGE_SHIFT;
	}

	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);

	/* Setup the Synic's event page */
	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
	siefp.siefp_enabled = 1;

	if (hv_isolation_type_snp()) {
		hv_cpu->synic_event_page =
			memremap(siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT,
				 HV_HYP_PAGE_SIZE, MEMREMAP_WB);

		if (!hv_cpu->synic_event_page)
			pr_err("Failed to map synic event page.\n");
	} else {
		siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
			>> HV_HYP_PAGE_SHIFT;
	}

	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);

	/* Setup the shared SINT. */
	if (vmbus_irq != -1)
		enable_percpu_irq(vmbus_irq, 0);
	shared_sint.as_uint64 = hv_get_register(HV_REGISTER_SINT0 +
					VMBUS_MESSAGE_SINT);

	shared_sint.vector = vmbus_interrupt;
	shared_sint.masked = false;

	/*
	 * On architectures where Hyper-V doesn't support AEOI (e.g., ARM64),
	 * it doesn't provide a recommendation flag and AEOI must be disabled.
	 */
#ifdef HV_DEPRECATING_AEOI_RECOMMENDED
	shared_sint.auto_eoi =
			!(ms_hyperv.hints & HV_DEPRECATING_AEOI_RECOMMENDED);
#else
	shared_sint.auto_eoi = 0;
#endif
	hv_set_register(HV_REGISTER_SINT0 + VMBUS_MESSAGE_SINT,
				shared_sint.as_uint64);

	/* Enable the global synic bit */
	sctrl.as_uint64 = hv_get_register(HV_REGISTER_SCONTROL);
	sctrl.enable = 1;

	hv_set_register(HV_REGISTER_SCONTROL, sctrl.as_uint64);
}

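/*
 * Illustrative sketch only, not part of the driver: how a page address is
 * folded into a SIMP-style register value, written with explicit shifts
 * and masks instead of the driver's bitfield unions. The bit layout used
 * here (bit 0 enable, bits 1..11 preserved, bits 12..63 page frame number)
 * and the 12-bit page shift are assumptions for this example, and all
 * model_* names are hypothetical.
 */
```c
#include <assert.h>
#include <stdint.h>

#define MODEL_HYP_PAGE_SHIFT	12
#define MODEL_SIMP_ENABLE	(1ULL << 0)
#define MODEL_SIMP_PRESERVED	0xFFEULL	/* bits 1..11 */

/* Guest physical address to hypervisor page frame number. */
static uint64_t model_gpa_to_pfn(uint64_t gpa)
{
	return gpa >> MODEL_HYP_PAGE_SHIFT;
}

/* Read-modify-write: keep preserved bits, set the page and the enable bit. */
static uint64_t model_simp_enable(uint64_t old_reg, uint64_t page_gpa)
{
	uint64_t reg = old_reg & MODEL_SIMP_PRESERVED;

	reg |= model_gpa_to_pfn(page_gpa) << MODEL_HYP_PAGE_SHIFT;
	reg |= MODEL_SIMP_ENABLE;
	return reg;
}

/* Clear the enable bit and the page, as the non-isolated disable path does. */
static uint64_t model_simp_disable(uint64_t old_reg)
{
	return old_reg & MODEL_SIMP_PRESERVED;
}
```
/*
 * Reading the register first and writing back the modified value keeps
 * any bits the guest does not own unchanged.
 */
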
int hv_synic_init(unsigned int cpu)
{
	hv_synic_enable_regs(cpu);

	hv_stimer_legacy_init(cpu, VMBUS_MESSAGE_SINT);

	return 0;
}

/*
 * hv_synic_cleanup - Cleanup routine for hv_synic_init().
 */
void hv_synic_disable_regs(unsigned int cpu)
{
	struct hv_per_cpu_context *hv_cpu
		= per_cpu_ptr(hv_context.cpu_context, cpu);
	union hv_synic_sint shared_sint;
	union hv_synic_simp simp;
	union hv_synic_siefp siefp;
	union hv_synic_scontrol sctrl;

	shared_sint.as_uint64 = hv_get_register(HV_REGISTER_SINT0 +
					VMBUS_MESSAGE_SINT);

	shared_sint.masked = 1;

	/* Need to correctly cleanup in the case of SMP!!! */
	/* Disable the interrupt */
	hv_set_register(HV_REGISTER_SINT0 + VMBUS_MESSAGE_SINT,
				shared_sint.as_uint64);

	simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
	/*
	 * In an isolation VM, the SIMP and SIEFP pages are allocated by
	 * the paravisor. These pages will also be used by the kdump
	 * kernel, so just reset the enable bit here and keep the page
	 * addresses.
	 */
	simp.simp_enabled = 0;
	if (hv_isolation_type_snp())
		memunmap(hv_cpu->synic_message_page);
	else
		simp.base_simp_gpa = 0;

	hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);

	siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
	siefp.siefp_enabled = 0;

	if (hv_isolation_type_snp())
		memunmap(hv_cpu->synic_event_page);
	else
		siefp.base_siefp_gpa = 0;

	hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);

	/* Disable the global synic bit */
	sctrl.as_uint64 = hv_get_register(HV_REGISTER_SCONTROL);
	sctrl.enable = 0;
	hv_set_register(HV_REGISTER_SCONTROL, sctrl.as_uint64);

	if (vmbus_irq != -1)
		disable_percpu_irq(vmbus_irq);
}

#define HV_MAX_TRIES 3
/*
 * Scan the event flags page of 'this' CPU looking for any bit that is set.  If we find one
 * bit set, then sleep for 10 to 20 milliseconds and scan again, sleeping at most
 * HV_MAX_TRIES times. Return 'true', if there is still any set bit after this operation;
 * 'false', otherwise.
 *
 * If a bit is set, that means there is a pending channel interrupt.  The expectation is
 * that the normal interrupt handling mechanism will find and process the channel interrupt
 * "very soon", and in the process clear the bit.
 */
static bool hv_synic_event_pending(void)
{
	struct hv_per_cpu_context *hv_cpu = this_cpu_ptr(hv_context.cpu_context);
	union hv_synic_event_flags *event =
		(union hv_synic_event_flags *)hv_cpu->synic_event_page + VMBUS_MESSAGE_SINT;
	unsigned long *recv_int_page = event->flags; /* assumes VMBus version >= VERSION_WIN8 */
	bool pending;
	u32 relid;
	int tries = 0;

retry:
	pending = false;
	for_each_set_bit(relid, recv_int_page, HV_EVENT_FLAGS_COUNT) {
		/* Special case - VMBus channel protocol messages */
		if (relid == 0)
			continue;
		pending = true;
		break;
	}
	if (pending && tries++ < HV_MAX_TRIES) {
		usleep_range(10000, 20000);
		goto retry;
	}
	return pending;
}

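/*
 * Illustrative sketch only, not part of the driver: the bounded
 * scan-and-retry loop of hv_synic_event_pending(), modelled in userspace.
 * The sleep is replaced by a caller-supplied callback so the behaviour can
 * be exercised without timing. MODEL_FLAGS_COUNT (assumed 2048 bits), the
 * callback type and all model_* names are hypothetical.
 */
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_TRIES		3
#define MODEL_FLAGS_COUNT	2048	/* number of flag bits, assumed */
#define MODEL_FLAG_WORDS	(MODEL_FLAGS_COUNT / 64)

/* Stand-in for the sleep; it may also simulate the handler clearing bits. */
typedef void (*model_wait_fn)(uint64_t *flags);

static void model_wait_noop(uint64_t *flags)
{
	(void)flags;
}

static void model_wait_clears(uint64_t *flags)
{
	memset(flags, 0, MODEL_FLAG_WORDS * sizeof(flags[0]));
}

/* True if any bit other than bit 0 is set; bit 0 is the special case. */
static bool model_any_channel_bit_set(const uint64_t *flags)
{
	size_t bit;

	for (bit = 1; bit < MODEL_FLAGS_COUNT; bit++)
		if (flags[bit / 64] & (1ULL << (bit % 64)))
			return true;
	return false;
}

/* Scan, and while something is pending wait and rescan a bounded number of times. */
static bool model_event_pending(uint64_t *flags, model_wait_fn wait)
{
	int tries = 0;
	bool pending;

	for (;;) {
		pending = model_any_channel_bit_set(flags);
		if (!pending || tries++ >= MODEL_MAX_TRIES)
			break;
		wait(flags);
	}
	return pending;
}
```
/*
 * With MODEL_MAX_TRIES waits the page is scanned at most
 * MODEL_MAX_TRIES + 1 times before the function gives up.
 */
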
int hv_synic_cleanup(unsigned int cpu)
{
	struct vmbus_channel *channel, *sc;
	bool channel_found = false;

	if (vmbus_connection.conn_state != CONNECTED)
		goto always_cleanup;

	/*
	 * Hyper-V does not provide a way to change the connect CPU once
	 * it is set; we must prevent the connect CPU from going offline
	 * while the VM is running normally. But in the panic or kexec()
	 * path where the vmbus is already disconnected, the CPU must be
	 * allowed to shut down.
	 */
	if (cpu == VMBUS_CONNECT_CPU)
		return -EBUSY;

	/*
	 * Search for channels which are bound to the CPU we're about to
	 * cleanup.  In case we find one and vmbus is still connected, we
	 * fail; this will effectively prevent CPU offlining.
	 *
	 * TODO: Re-bind the channels to different CPUs.
	 */
	mutex_lock(&vmbus_connection.channel_mutex);
	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
		if (channel->target_cpu == cpu) {
			channel_found = true;
			break;
		}
		list_for_each_entry(sc, &channel->sc_list, sc_list) {
			if (sc->target_cpu == cpu) {
				channel_found = true;
				break;
			}
		}
		if (channel_found)
			break;
	}
	mutex_unlock(&vmbus_connection.channel_mutex);

	if (channel_found)
		return -EBUSY;

	/*
	 * channel_found == false means that any channels that were previously
	 * assigned to the CPU have been reassigned elsewhere with a call of
	 * vmbus_send_modifychannel().  Scan the event flags page looking for
	 * bits that are set and waiting with a timeout for vmbus_chan_sched()
	 * to process such bits.  If bits are still set after this operation
	 * and VMBus is connected, fail the CPU offlining operation.
	 */
	if (vmbus_proto_version >= VERSION_WIN10_V4_1 && hv_synic_event_pending())
		return -EBUSY;

always_cleanup:
	hv_stimer_legacy_cleanup(cpu);

	hv_synic_disable_regs(cpu);

	return 0;
}