WSL2-Linux-Kernel/kernel
Stephane Eranian e050e3f0a7 perf: Fix broken interrupt rate throttling
This patch fixes the sampling interrupt throttling mechanism.

It was broken in v3.2. Events were not being unthrottled. The
unthrottling mechanism required that events be checked at each
timer tick.

This patch solves this problem and also separates:

  - unthrottling
  - multiplexing
  - frequency-mode period adjustments

Not all of them need to be executed at each timer tick.

This third version of the patch is based on my original patch +
PeterZ proposal (https://lkml.org/lkml/2012/1/7/87).

At each timer tick, for each context:

  - if the current CPU has throttled events, we unthrottle events

  - if context has frequency-based events, we adjust sampling periods

  - if we have reached the jiffies interval, we multiplex (rotate)

We decoupled rotation (multiplexing) from frequency-mode sampling
period adjustments.  They should not necessarily happen at the same
rate. Multiplexing is subject to jiffies_interval (currently at 1
but could be higher once the tunable is exposed via sysfs).

We have grouped frequency-mode adjustment and unthrottling into the
same routine to minimize code duplication. When throttled while in
frequency mode, we scan the events only once.

We have fixed the threshold enforcement code in __perf_event_overflow().
There was a bug whereby it would allow more than the authorized rate
because an increment of hwc->interrupts was not executed at the right
place.

The patch was tested with low sampling limit (2000) and fixed periods,
frequency mode, overcommitted PMU.

On a 2.1GHz AMD CPU:

 $ cat /proc/sys/kernel/perf_event_max_sample_rate
 2000

We set a rate of 3000 samples/sec (2.1GHz/3000 = 700000):

 $ perf record -e cycles,cycles -c 700000  noploop 10
 $ perf report -D | tail -21

 Aggregated stats:
           TOTAL events:      80086
            MMAP events:         88
            COMM events:          2
            EXIT events:          4
        THROTTLE events:      19996
      UNTHROTTLE events:      19996
          SAMPLE events:      40000

 cycles stats:
           TOTAL events:      40006
            MMAP events:          5
            COMM events:          1
            EXIT events:          4
        THROTTLE events:       9998
      UNTHROTTLE events:       9998
          SAMPLE events:      20000

 cycles stats:
           TOTAL events:      39996
        THROTTLE events:       9998
      UNTHROTTLE events:       9998
          SAMPLE events:      20000

For 10s, the cap is 2x2000x10 = 40000 samples.
We get exactly that: 20000 samples/event.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: <stable@kernel.org> # v3.2+
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120126160319.GA5655@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-27 12:06:39 +01:00
..
debug module: struct module_ref should contains long fields 2012-01-13 09:32:14 +10:30
events perf: Fix broken interrupt rate throttling 2012-01-27 12:06:39 +01:00
gcov gcov: disable CONSTRUCTORS for UML 2011-07-26 16:49:45 -07:00
irq module_param: make bool parameters really bool (core code) 2012-01-13 09:32:18 +10:30
power PM / Hibernate: Correct additional pages number calculation 2012-01-19 23:23:10 +01:00
sched kernel-doc: fix kernel-doc warnings in sched 2012-01-23 08:44:54 -08:00
time Merge branch 'rcu/fixes-for-v3.2' into rcu/urgent 2012-01-16 09:41:18 -08:00
trace Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-01-15 11:26:35 -08:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks arch:Kconfig.locks Remove unused config option. 2011-04-10 17:01:05 +02:00
Kconfig.preempt sched: Isolate preempt counting in its own config option 2011-06-10 15:15:40 +02:00
Makefile PM: Make sysrq-o be available for CONFIG_PM unset 2012-01-14 00:33:03 +01:00
acct.c Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-01-08 12:19:57 -08:00
async.c kernel/async: remove redundant declaration. 2012-01-13 09:32:18 +10:30
audit.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit 2012-01-17 16:41:31 -08:00
audit.h audit: remove AUDIT_SETUP_CONTEXT as it isn't used 2012-01-17 16:16:57 -05:00
audit_tree.c audit_tree,rcu: Convert call_rcu(__put_tree) to kfree_rcu() 2011-07-20 14:10:11 -07:00
audit_watch.c kill path_lookup() 2011-03-14 09:15:23 -04:00
auditfilter.c audit: allow interfield comparison in audit rules 2012-01-17 16:17:01 -05:00
auditsc.c kernel-doc: fix new warnings in auditsc.c 2012-01-23 08:44:53 -08:00
backtracetest.c
bounds.c memcg: remove direct page_cgroup-to-page pointer 2011-03-23 19:46:28 -07:00
capability.c Revert "capabitlies: ns_capable can use the cap helpers rather than lsm call" 2012-01-17 10:19:41 -08:00
cgroup.c Merge branch 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2012-01-09 12:59:24 -08:00
cgroup_freezer.c Merge branch 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2012-01-09 12:59:24 -08:00
compat.c kernel: Fix files explicitly needing EXPORT_SYMBOL infrastructure 2011-10-31 19:30:05 -04:00
configs.c kernel/configs.c: include MODULE_*() when CONFIG_IKCONFIG_PROC=n 2011-07-25 20:57:15 -07:00
cpu.c Merge branch 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm 2012-01-08 13:10:57 -08:00
cpu_pm.c cpu_pm: call notifiers during suspend 2011-09-23 12:05:29 +05:30
cpuset.c Merge branch 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2012-01-09 12:59:24 -08:00
crash_dump.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
cred.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
delayacct.c KVM: Steal time implementation 2011-07-14 12:59:14 +03:00
dma.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
elfcore.c
exec_domain.c
exit.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit 2012-01-17 16:41:31 -08:00
extable.c extable, core_kernel_data(): Make sure all archs define _sdata 2011-05-20 08:56:56 +02:00
fork.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit 2012-01-17 16:41:31 -08:00
freezer.c freezer: kill unused set_freezable_with_signal() 2011-11-23 09:28:17 -08:00
futex.c futex: Fix uninterruptible loop due to gate_area 2011-12-31 11:48:28 -08:00
futex_compat.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
groups.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
hrtimer.c Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2011-11-28 08:43:52 -08:00
hung_task.c hung_task: fix false positive during vfork 2012-01-03 16:14:32 -08:00
irq_work.c kernel: fix two implicit header assumptions in irq_work.c 2011-10-31 09:20:12 -04:00
itimer.c [S390] cputime: add sparse checking and cleanup 2011-12-15 14:56:19 +01:00
jump_label.c Merge remote-tracking branch 'tip/perf/core' into kvm-updates/3.3 2011-12-27 11:22:24 +02:00
kallsyms.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-25 17:52:22 -07:00
kexec.c kdump: crashk_res init check for /sys/kernel/kexec_crash_size 2012-01-12 20:13:11 -08:00
kfifo.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
kmod.c Merge branch 'pm-sleep' into pm-for-linus 2011-12-25 23:42:20 +01:00
kprobes.c kprobes: initialize before using a hlist 2012-01-23 08:38:48 -08:00
ksysfs.c kernel: ksysfs.c is implicitly using stat.h 2011-10-31 09:20:13 -04:00
kthread.c freezer: kill unused set_freezable_with_signal() 2011-11-23 09:28:17 -08:00
latencytop.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
lockdep.c Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-01-06 08:02:58 -08:00
lockdep_internals.h
lockdep_proc.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
lockdep_states.h
module.c error: implicit declaration of function 'module_flags_taint' 2012-01-15 16:21:07 -08:00
mutex-debug.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mutex-debug.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mutex.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
notifier.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
nsproxy.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
padata.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
panic.c panic: don't print redundant backtraces on oops 2012-01-12 20:13:11 -08:00
params.c module_param: avoid bool abuse, add bint for special cases. 2012-01-13 09:32:17 +10:30
pid.c sysctl: add the kernel.ns_last_pid control 2012-01-12 20:13:11 -08:00
pid_namespace.c sysctl: add the kernel.ns_last_pid control 2012-01-12 20:13:11 -08:00
posix-cpu-timers.c [S390] cputime: add sparse checking and cleanup 2011-12-15 14:56:19 +01:00
posix-timers.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
printk.c module_param: make bool parameters really bool (core code) 2012-01-13 09:32:18 +10:30
profile.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
ptrace.c Merge branch 'for-linus' of git://selinuxproject.org/~jmorris/linux-security 2012-01-14 18:36:33 -08:00
range.c range: fix bogus misuse of module.h to get printk() 2011-10-31 09:20:11 -04:00
rcu.h rcu: Deconfuse dynticks entry-exit tracing 2011-12-11 10:31:42 -08:00
rcupdate.c rcu: Detect illegal rcu dereference in extended quiescent state 2011-12-11 10:31:30 -08:00
rcutiny.c rcu: Augment rcu_batch_end tracing for idle and callback state 2011-12-11 10:32:22 -08:00
rcutiny_plugin.h rcu: Apply ACCESS_ONCE() to rcu_boost() return value 2011-12-11 10:33:19 -08:00
rcutorture.c rcu: Add missing __cpuinit annotation in rcutorture code 2012-01-16 09:44:05 -08:00
rcutree.c rcu: Augment rcu_batch_end tracing for idle and callback state 2011-12-11 10:32:22 -08:00
rcutree.h rcu: Keep invoking callbacks if CPU otherwise idle 2011-12-11 10:32:09 -08:00
rcutree_plugin.h rcu: Apply ACCESS_ONCE() to rcu_boost() return value 2011-12-11 10:33:19 -08:00
rcutree_trace.c rcu: Track idleness independent of idle tasks 2011-12-11 10:31:24 -08:00
relay.c switch debugfs to umode_t 2012-01-03 22:54:56 -05:00
res_counter.c net: introduce res_counter_charge_nofail() for socket allocations 2012-01-22 15:08:46 -05:00
resource.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
rtmutex-debug.c lockdep, rtmutex, bug: Show taint flags on error 2011-12-06 08:16:49 +01:00
rtmutex-debug.h
rtmutex-tester.c rtmutex-tester: convert sysdev_class to a regular subsystem 2011-12-14 14:54:22 -08:00
rtmutex.c Revert "rcu: Permit rt_mutex_unlock() with irqs disabled" 2011-12-11 10:33:18 -08:00
rtmutex.h
rtmutex_common.h rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rwsem.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
seccomp.c seccomp: audit abnormal end to a process due to seccomp 2012-01-17 16:16:55 -05:00
semaphore.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
signal.c user namespace: make signal.c respect user namespaces 2012-01-10 16:30:54 -08:00
smp.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
softirq.c rcu: Fix early call to rcu_idle_enter() 2011-12-11 10:31:38 -08:00
spinlock.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
srcu.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
stacktrace.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
stop_machine.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
sys.c c/r: prctl: add PR_SET_MM codes to set up mm_struct entries 2012-01-12 20:13:13 -08:00
sys_ni.c Cross Memory Attach 2011-10-31 17:30:44 -07:00
sysctl.c x86: Panic on detection of stack overflow 2011-12-05 11:37:47 +01:00
sysctl_binary.c binary_sysctl(): fix memory leak 2011-12-20 10:25:04 -08:00
sysctl_check.c xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
taskstats.c Make TASKSTATS require root access 2011-09-19 17:04:37 -07:00
test_kprobes.c
time.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
timeconst.pl
timer.c Merge branch 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-01-06 07:53:34 -08:00
tracepoint.c tracepoints/module: Fix disabling tracepoints with taint CRAP or OOT 2012-01-16 11:35:57 -05:00
tsacct.c [S390] cputime: add sparse checking and cleanup 2011-12-15 14:56:19 +01:00
uid16.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
up.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
user-return-notifier.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
user.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
user_namespace.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
utsname.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
utsname_sysctl.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
wait.c lockdep/waitqueues: Add better annotation 2011-12-21 10:07:39 +01:00
watchdog.c watchdog: move watchdog_*_all_cpus under CONFIG_SYSCTL 2011-10-31 17:30:53 -07:00
workqueue.c workqueue: make alloc_workqueue() take printf fmt and args for name 2012-01-10 16:30:54 -08:00
workqueue_sched.h