WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Frederic Weisbecker	ee375328f5	posix-cpu-timers: Recalc next expiration when timer_settime() ends up not queueing There are several scenarios that can result in posix_cpu_timer_set() not queueing the timer but still leaving the threadgroup cputime counter running or keeping the tick dependency around for a random amount of time. 1) If timer_settime() is called with a 0 expiration on a timer that is already disabled, the process wide cputime counter will be started and won't ever get a chance to be stopped by stop_process_timer() since no timer is actually armed to be processed. The following snippet is enough to trigger the issue. void trigger_process_counter(void) { timer_t id; struct itimerspec val = { }; timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); timer_settime(id, TIMER_ABSTIME, &val, NULL); timer_delete(id); } 2) If timer_settime() is called with a 0 expiration on a timer that is already armed, the timer is dequeued but not really disarmed. So the process wide cputime counter and the tick dependency may still remain a while around. The following code snippet keeps this overhead around for one week after the timer deletion: void trigger_process_counter(void) { timer_t id; struct itimerspec val = { }; val.it_value.tv_sec = 604800; timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); timer_settime(id, 0, &val, NULL); timer_delete(id); } 3) If the timer was initially deactivated, this call to timer_settime() with an early expiration may have started the process wide cputime counter even though the timer hasn't been queued and armed because it has fired early and inline within posix_cpu_timer_set() itself. As a result the process wide cputime counter may never stop until a new timer is ever armed in the future. The following code snippet can reproduce this: void trigger_process_counter(void) { timer_t id; struct itimerspec val = { }; signal(SIGALRM, SIG_IGN); timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); val.it_value.tv_nsec = 1; timer_settime(id, TIMER_ABSTIME, &val, NULL); } 4) If the timer was initially armed with a former expiration value before this call to timer_settime() and the current call sets an early deadline that has already expired, the timer fires inline within posix_cpu_timer_set(). In this case it must have been dequeued before firing inline with its new expiration value, yet it hasn't been disarmed in this case. So the process wide cputime counter and the tick dependency may still be around for a while even after the timer fired. The following code snippet can reproduce this: void trigger_process_counter(void) { timer_t id; struct itimerspec val = { }; signal(SIGALRM, SIG_IGN); timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); val.it_value.tv_sec = 100; timer_settime(id, TIMER_ABSTIME, &val, NULL); val.it_value.tv_sec = 0; val.it_value.tv_nsec = 1; timer_settime(id, TIMER_ABSTIME, &val, NULL); } Fix all these issues with triggering the related base next expiration recalculation on the next tick. This also implies to re-evaluate the need to keep around the process wide cputime counter and the tick dependency, in a similar fashion to disarm_timer(). Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-7-frederic@kernel.org	2021-08-10 17:09:59 +02:00
Frederic Weisbecker	175cc3ab28	posix-cpu-timers: Force next_expiration recalc after timer deletion A timer deletion only dequeues the timer but it doesn't shutdown the related costly process wide cputimer counter and the tick dependency. The following code snippet keeps this overhead around for one week after the timer deletion: void trigger_process_counter(void) { timer_t id; struct itimerspec val = { }; val.it_value.tv_sec = 604800; timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id); timer_settime(id, 0, &val, NULL); timer_delete(id); } Make sure the next target's tick recalculates the nearest expiration and clears the process wide counter and tick dependency if necessary. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210726125513.271824-3-frederic@kernel.org	2021-08-10 17:09:59 +02:00
Thomas Gleixner	1fb497dd00	posix-cpu-timers: Provide mechanisms to defer timer handling to task_work Running posix CPU timers in hard interrupt context has a few downsides: - For PREEMPT_RT it cannot work as the expiry code needs to take sighand lock, which is a 'sleeping spinlock' in RT. The original RT approach of offloading the posix CPU timer handling into a high priority thread was clumsy and provided no real benefit in general. - For fine grained accounting it's just wrong to run this in context of the timer interrupt because that way a process specific CPU time is accounted to the timer interrupt. - Long running timer interrupts caused by a large amount of expiring timers which can be created and armed by unpriviledged user space. There is no hard requirement to expire them in interrupt context. If the signal is targeted at the task itself then it won't be delivered before the task returns to user space anyway. If the signal is targeted at a supervisor process then it might be slightly delayed, but posix CPU timers are inaccurate anyway due to the fact that they are tied to the tick. Provide infrastructure to schedule task work which allows splitting the posix CPU timer code into a quick check in interrupt context and a thread context expiry and signal delivery function. This has to be enabled by architectures as it requires that the architecture specific KVM implementation handles pending task work before exiting to guest mode. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20200730102337.783470146@linutronix.de	2020-08-06 16:50:59 +02:00
Eric W. Biederman	55e8c8eb2c	posix-cpu-timers: Store a reference to a pid not a task posix cpu timers do not handle the death of a process well. This is most clearly seen when a multi-threaded process calls exec from a thread that is not the leader of the thread group. The posix cpu timer code continues to pin the old thread group leader and is unable to find the siglock from there. This results in posix_cpu_timer_del being unable to delete a timer, posix_cpu_timer_set being unable to set a timer. Further to compensate for the problems in posix_cpu_timer_del on a multi-threaded exec all timers that point at the multi-threaded task are stopped. The code for the timers fundamentally needs to check if the target process/thread is alive. This needs an extra level of indirection. This level of indirection is already available in struct pid. So replace cpu.task with cpu.pid to get the needed extra layer of indirection. In addition to handling things more cleanly this reduces the amount of memory a timer can pin when a process exits and then is reaped from a task_struct to the vastly smaller struct pid. Fixes: `e0a7021710` ("posix-cpu-timers: workaround to suppress the problems with mt exec") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/87wo86tz6d.fsf@x220.int.ebiederm.org	2020-03-04 09:54:55 +01:00
Thomas Gleixner	00d9e47f8e	posix-cpu-timers: Always clear head pointer on dequeue The head pointer in struct cpu_timer is checked to be NULL in posix_cpu_timer_del() when the delete raced with the exit cleanup. The works correctly as long as the timer is actually dequeued via posix_cpu_timers_exit*(). But if the timer was dequeued due to expiry the head pointer is still set and triggers the warning. In fact keeping the head pointer around after any dequeue is pointless as is has no meaning at all after that. Clear the head pointer always on dequeue and remove the unused requeue function while at it. Fixes: `60bda037f1` ("posix-cpu-timers: Utilize timerqueue for storage") Reported-by: syzbot+55acd54b57bb4b3840a4@syzkaller.appspotmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190905120539.707986830@linutronix.de	2019-09-05 21:16:22 +02:00
Thomas Gleixner	8f2edb4a78	posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build The rework of the posix-cpu-timers patch series dropped the empty declaration of struct cpu_timer for the CONFIG_POSIX_TIMERS=n case which causes the build to fail: ./include/linux/posix-timers.h:218:20: error: field 'cpu' has incomplete type Add it back. Fixes: `60bda037f1` ("posix-cpu-timers: Utilize timerqueue for storage") Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2019-08-29 08:25:21 +02:00
Thomas Gleixner	60bda037f1	posix-cpu-timers: Utilize timerqueue for storage Using a linear O(N) search for timer insertion affects execution time and D-cache footprint badly with a larger number of timers. Switch the storage to a timerqueue which is already used for hrtimers and alarmtimers. It does not affect the size of struct k_itimer as it.alarm is still larger. The extra list head for the expiry list will go away later once the expiry is moved into task work context. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908272129220.1939@nanos.tec.linutronix.de	2019-08-28 11:50:43 +02:00
Thomas Gleixner	244d49e306	posix-cpu-timers: Move state tracking to struct posix_cputimers Put it where it belongs and clean up the ifdeffery in fork completely. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190821192922.743229404@linutronix.de	2019-08-28 11:50:42 +02:00
Thomas Gleixner	2bbdbdae05	posix-cpu-timers: Get rid of zero checks Deactivation of the expiry cache is done by setting all clock caches to 0. That requires to have a check for zero in all places which update the expiry cache: if (cache == 0 \|\| new < cache) cache = new; Use U64_MAX as the deactivated value, which allows to remove the zero checks when updating the cache and reduces it to the obvious check: if (new < cache) cache = new; This also removes the weird workaround in do_prlimit() which was required to convert a RLIMIT_CPU value of 0 (immediate expiry) to 1 because handing in 0 to the posix CPU timer code would have effectively disarmed it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192922.275086128@linutronix.de	2019-08-28 11:50:40 +02:00
Thomas Gleixner	87dc64480f	posix-cpu-timers: Restructure expiry array Now that the abused struct task_cputime is gone, it's more natural to bundle the expiry cache and the list head of each clock into a struct and have an array of those structs. Follow the hrtimer naming convention of 'bases' and rename the expiry cache to 'nextevt' and adapt all usage sites. Generates also better code .text size shrinks by 80 bytes. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908262021140.1939@nanos.tec.linutronix.de	2019-08-28 11:50:39 +02:00
Thomas Gleixner	46b883995c	posix-cpu-timers: Remove cputime_expires The last users of the magic struct cputime based expiry cache are gone. Remove the leftovers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192921.790209622@linutronix.de	2019-08-28 11:50:38 +02:00
Thomas Gleixner	bbc9bae1e4	posix-cpu-timers: Remove the odd field rename defines The last users of the odd define based renaming of struct task_cputime members are gone. Good riddance. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192921.499058279@linutronix.de	2019-08-28 11:50:37 +02:00
Thomas Gleixner	11b8462f7e	posix-cpu-timers: Provide array based access to expiry cache Using struct task_cputime for the expiry cache is a pretty odd choice and comes with magic defines to rename the fields for usage in the expiry cache. struct task_cputime is basically a u64 array with 3 members, but it has distinct members. The expiry cache content is different than the content of task_cputime because expiry[PROF] = task_cputime.stime + task_cputime.utime expiry[VIRT] = task_cputime.utime expiry[SCHED] = task_cputime.sum_exec_runtime So there is no direct mapping between task_cputime and the expiry cache and the #define based remapping is just a horrible hack. Having the expiry cache array based allows further simplification of the expiry code. To avoid an all in one cleanup which is hard to review add a temporary anonymous union into struct task_cputime which allows array based access to it. That requires to reorder the members. Add a build time sanity check to validate that the members are at the same place. The union and the build time checks will be removed after conversion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192921.105793824@linutronix.de	2019-08-28 11:50:35 +02:00
Thomas Gleixner	3a245c0f11	posix-cpu-timers: Move expiry cache into struct posix_cputimers The expiry cache belongs into the posix_cputimers container where the other cpu timers information is. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192921.014444012@linutronix.de	2019-08-28 11:50:35 +02:00
Thomas Gleixner	2b69942f90	posix-cpu-timers: Create a container struct Per task/process data of posix CPU timers is all over the place which makes the code hard to follow and requires ifdeffery. Create a container to hold all this information in one place, so data is consolidated and the ifdeffery can be confined to the posix timer header file and removed from places like fork. As a first step, move the cpu_timers list head array into the new struct and clean up the initializers and simplify fork. The remaining #ifdef in fork will be removed later. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190821192920.819418976@linutronix.de	2019-08-28 11:50:33 +02:00
Thomas Gleixner	dce3e8fd03	posix-cpu-timers: Remove tsk argument from run_posix_cpu_timers() It's always current. Don't give people wrong ideas. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190819143801.945469967@linutronix.de	2019-08-21 20:27:16 +02:00
Thomas Gleixner	ce03f61346	posix-timers: Cleanup forward declarations and includes - Rename struct siginfo to kernel_siginfo as that is used and required - Add a forward declaration for task_struct and remove sched.h include - Remove timex.h include as it is not needed Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/20190819143801.472005793@linutronix.de	2019-08-20 22:09:52 +02:00
Sebastian Andrzej Siewior	5d99b32a00	posix-timers: Move rcu_head out of it union Timer deletion on PREEMPT_RT is prone to priority inversion and live locks. The hrtimer code has a synchronization mechanism for this. Posix CPU timers will grow one. But that mechanism cannot be invoked while holding the k_itimer lock because that can deadlock against the running timer callback. So the lock must be dropped which allows the timer to be freed. The timer free can be prevented by taking RCU readlock before dropping the lock, but because the rcu_head is part of the 'it' union a concurrent free will overwrite the hrtimer on which the task is trying to synchronize. Move the rcu_head out of the union to prevent this. [ tglx: Fixed up kernel-doc. Rewrote changelog ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190730223828.965541887@linutronix.de	2019-08-01 20:51:25 +02:00
Thomas Gleixner	16118794ed	posix-cpu-timers: Remove private interval storage Posix CPU timers store the interval in private storage for historical reasons (it_interval used to be a non scalar representation on 32bit systems). This is gone and there is no reason for duplicated storage anymore. Use it_interval everywhere. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "H.J. Lu" <hjl.tools@gmail.com> Link: https://lkml.kernel.org/r/20190111133500.945255655@linutronix.de	2019-01-15 16:36:13 +01:00
Eric W. Biederman	ae7795bc61	signal: Distinguish between kernel_siginfo and siginfo Linus recently observed that if we did not worry about the padding member in struct siginfo it is only about 48 bytes, and 48 bytes is much nicer than 128 bytes for allocating on the stack and copying around in the kernel. The obvious thing of only adding the padding when userspace is including siginfo.h won't work as there are sigframe definitions in the kernel that embed struct siginfo. So split siginfo in two; kernel_siginfo and siginfo. Keeping the traditional name for the userspace definition. While the version that is used internally to the kernel and ultimately will not be padded to 128 bytes is called kernel_siginfo. The definition of struct kernel_siginfo I have put in include/signal_types.h A set of buildtime checks has been added to verify the two structures have the same field offsets. To make it easy to verify the change kernel_siginfo retains the same size as siginfo. The reduction in size comes in a following change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-10-03 16:47:43 +02:00
Thomas Gleixner	78c9c4dfbf	posix-timers: Sanitize overrun handling The posix timer overrun handling is broken because the forwarding functions can return a huge number of overruns which does not fit in an int. As a consequence timer_getoverrun(2) and siginfo::si_overrun can turn into random number generators. The k_clock::timer_forward() callbacks return a 64 bit value now. Make k_itimer::ti_overrun[_last] 64bit as well, so the kernel internal accounting is correct. 3Remove the temporary (int) casts. Add a helper function which clamps the overrun value returned to user space via timer_getoverrun(2) or siginfo::si_overrun limited to a positive value between 0 and INT_MAX. INT_MAX is an indicator for user space that the overrun value has been clamped. Reported-by: Team OWL337 <icytxw@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Link: https://lkml.kernel.org/r/20180626132705.018623573@linutronix.de	2018-07-02 11:33:25 +02:00
Nick Desaulniers	29f1b2b0fe	posix-timers: Prevent UB from shifting negative signed value Shifting a negative signed number is undefined behavior. Looking at the macros MAKE_PROCESS_CPUCLOCK and FD_TO_CLOCKID, it seems that the subexpression: (~(clockid_t) (pid) << 3) where clockid_t resolves to a signed int, which once negated, is undefined behavior to shift the value of if the results thus far are negative. It was further suggested to make these macros into inline functions. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Nick Desaulniers <nick.desaulniers@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dimitri Sivanich <sivanich@hpe.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-kselftest@vger.kernel.org Cc: Shuah Khan <shuah@kernel.org> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Link: https://lkml.kernel.org/r/1514517100-18051-1-git-send-email-nick.desaulniers@gmail.com	2018-01-04 14:57:10 +01:00
Greg Kroah-Hartman	b24413180f	License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a /uapi/ one with no licensing information in it, - file was a /uapi/ one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non /uapi/ files that summary was: SPDX license identifier # files ---------------------------------------------------\|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a /uapi/ path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------\|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the /uapi/ ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------\|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-02 11:10:55 +01:00
Deepa Dinamani	d5b7ffbfbd	time: introduce {get,put}_itimerspec64 As we change the user space type for the timerfd and posix timer functions to newer data types, we need some form of conversion helpers to avoid duplicating that logic. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-06-25 21:58:46 -04:00
Al Viro	edbeda4632	time/posix-timers: Move the compat copyouts to the nanosleep implementations Turn restart_block.nanosleep.{rmtp,compat_rmtp} into a tagged union (kind = 1 -> native, kind = 2 -> compat, kind = 0 -> nothing) and make the places doing actual copyout handle compat as well as native (that will become a helper in the next commit). Result: compat wrappers, messing with reassignments, etc. are gone. [ tglx: Folded in a variant of Peter Zijlstras enum patch ] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170607084241.28657-6-viro@ZenIV.linux.org.uk	2017-06-14 00:00:42 +02:00
Thomas Gleixner	21e55c1f83	posix-timers: Add active flag to k_itimer Keep track of the activation state of posix timers. This is a preparatory change for making common_timer_get() usable by both hrtimer and alarm timer implementations. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.967783982@linutronix.de	2017-06-04 15:40:26 +02:00
Thomas Gleixner	96fe3b072f	posix-timers: Rename do_schedule_next_timer That function is a misnomer. Rename it with a proper prefix to posixtimer_rearm(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.811362578@linutronix.de	2017-06-04 15:40:25 +02:00
Thomas Gleixner	d97bb75ddd	posix-timers: Store k_clock pointer in k_itimer Having the k_clock pointer in the k_itimer struct avoids the lookup in several code pathes and makes the next steps of unification of the hrtimer and alarmtimer based posix timers simpler. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.641222072@linutronix.de	2017-06-04 15:40:25 +02:00
Thomas Gleixner	80105cd0e6	posix-timers: Move interval out of the union Preparatory patch to unify the alarm timer and hrtimer based posix interval timer handling. The interval is used as a criteria for rearming decisions so moving it out of the clock specific data structures allows later unification. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.563922908@linutronix.de	2017-06-04 15:40:24 +02:00
Thomas Gleixner	bab0aae9dc	posix-timers: Move posix-timer internals to core None of these declarations is required outside of kernel/time. Move them to an internal header. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170530211656.394803853@linutronix.de	2017-06-04 15:40:23 +02:00
Thomas Gleixner	03676b41a8	posix-timers: Cleanup struct k_itimer As a preparation for further changes, cleanup the formatting of the k_itimer structure and add kernel doc comments. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/20170530211656.316574129@linutronix.de	2017-06-04 15:40:23 +02:00
Christoph Hellwig	31ea70e030	posix-timers: Move the do_schedule_next_timer declaration Having it in asm-generic/siginfo.h doesn't make any sense as it is in no way architecture specific. Move it to posix-timers.h instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-ia64@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Cc: sparclinux@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Link: http://lkml.kernel.org/r/20170603190102.28866-4-hch@lst.de	2017-06-04 15:11:46 +02:00
Dimitri Sivanich	f822798e3c	posix-timers: Remove mmtimer leftovers After removing mmtimer, the mmtimer struct can be removed from the k_itimer struct. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Russ Anderson <rja@sgi.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Mike Travis <mike.travis@hpe.com> Cc: Nate Zimmer <nzimmer@sgi.com> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20170526130534.GE30788@hpe.com	2017-05-27 12:58:46 +02:00
Christoph Hellwig	d3ba5a9a34	posix-timers: Make posix_clocks immutable There are no more modular users providing a posix clock. The register function is now pointless so the posix clock array can be initialized statically at compile time and the array including the various k_clock structs can be marked 'const'. Inspired by changes in the Grsecurity patch set, but done proper. [ tglx: Massaged changelog and fixed the POSIX_TIMER=n case ] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mike Travis <mike.travis@hpe.com> Cc: Dimitri Sivanich <sivanich@hpe.com> Link: http://lkml.kernel.org/r/20170526090311.3377-3-hch@lst.de	2017-05-27 09:46:35 +02:00
Deepa Dinamani	ad19638463	time: Change k_clock nsleep() to use timespec64 struct timespec is not y2038 safe on 32 bit machines. Replace uses of struct timespec with struct timespec64 in the kernel. The syscall interfaces themselves will be changed in a separate series. Note that the restart_block parameter for nanosleep has also been left unchanged and will be part of syscall series noted above. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: y2038@lists.linaro.org Cc: john.stultz@linaro.org Cc: arnd@arndb.de Link: http://lkml.kernel.org/r/1490555058-4603-8-git-send-email-deepa.kernel@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-14 21:49:56 +02:00
Deepa Dinamani	5f252b3256	time: Change k_clock timer_set() and timer_get() to use timespec64 struct timespec is not y2038 safe on 32 bit machines. Replace uses of struct timespec with struct timespec64 in the kernel. struct itimerspec internally uses struct timespec. Use struct itimerspec64 which uses struct timespec64. The syscall interfaces themselves will be changed in a separate series. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: y2038@lists.linaro.org Cc: john.stultz@linaro.org Cc: arnd@arndb.de Link: http://lkml.kernel.org/r/1490555058-4603-7-git-send-email-deepa.kernel@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-14 21:49:56 +02:00
Deepa Dinamani	0fe6afe383	time: Change k_clock clock_set() to use timespec64 struct timespec is not y2038 safe on 32 bit machines. Replace uses of struct timespec with struct timespec64 in the kernel. The syscall interfaces themselves will be changed in a separate series. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: y2038@lists.linaro.org Cc: john.stultz@linaro.org Cc: arnd@arndb.de Link: http://lkml.kernel.org/r/1490555058-4603-6-git-send-email-deepa.kernel@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-14 21:49:56 +02:00
Deepa Dinamani	d2e3e0ca5d	time: Change k_clock clock_getres() to use timespec64 struct timespec is not y2038 safe on 32 bit machines. Replace uses of struct timespec with struct timespec64 in the kernel. The syscall interfaces themselves will be changed in a separate series. The clock_getres() interface has also been changed to use timespec64 even though this particular interface is not affected by the y2038 problem. This helps verification for internal kernel code for y2038 readiness by getting rid of time_t/ timeval/ timespec completely. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: y2038@lists.linaro.org Cc: john.stultz@linaro.org Cc: arnd@arndb.de Link: http://lkml.kernel.org/r/1490555058-4603-5-git-send-email-deepa.kernel@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-14 21:49:55 +02:00
Deepa Dinamani	3c9c12f4b4	time: Change k_clock clock_get() to use timespec64 struct timespec is not y2038 safe on 32 bit machines. Replace uses of struct timespec with struct timespec64 in the kernel. The syscall interfaces themselves will be changed in a separate series. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: y2038@lists.linaro.org Cc: john.stultz@linaro.org Cc: arnd@arndb.de Link: http://lkml.kernel.org/r/1490555058-4603-4-git-send-email-deepa.kernel@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-14 21:49:55 +02:00
Frederic Weisbecker	858cf3a8c5	timers/itimer: Convert internal cputime_t units to nsec Use the new nsec based cputime accessors as part of the whole cputime conversion from cputime_t to nsecs. Also convert itimers to use nsec based internal counters. This simplifies it and removes the whole game with error/inc_error which served to deal with cputime_t random granularity. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-20-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-01 09:13:55 +01:00
Frederic Weisbecker	ebd7e7fc4b	timers/posix-timers: Convert internals to use nsecs Use the new nsec based cputime accessors as part of the whole cputime conversion from cputime_t to nsecs. Also convert posix-cpu-timers to use nsec based internal counters to simplify it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1485832191-26889-19-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-02-01 09:13:54 +01:00
Frederic Weisbecker	b78783000d	posix-cpu-timers: Migrate to use new tick dependency mask model Instead of providing asynchronous checks for the nohz subsystem to verify posix cpu timers tick dependency, migrate the latter to the new mask. In order to keep track of the running timers and expose the tick dependency accordingly, we must probe the timers queuing and dequeuing on threads and process lists. Unfortunately it implies both task and signal level dependencies. We should be able to further optimize this and merge all that on the task level dependency, at the cost of a bit of complexity and may be overhead. Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2016-03-02 16:44:27 +01:00
Frederic Weisbecker	55ccb616a6	posix_cpu_timer: consolidate expiry time type The posix cpu timer expiry time is stored in a union of two types: a 64 bits field if we rely on scheduler precise accounting, or a cputime_t if we rely on jiffies. This results in quite some duplicate code and special cases to handle the two types. Just unify this into a single 64 bits field. cputime_t can always fit into it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Olivier Langlois <olivier@trillion01.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2013-07-03 16:16:20 +02:00
Linus Torvalds	534c97b095	Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull 'full dynticks' support from Ingo Molnar: "This tree from Frederic Weisbecker adds a new, (exciting! :-) core kernel feature to the timer and scheduler subsystems: 'full dynticks', or CONFIG_NO_HZ_FULL=y. This feature extends the nohz variable-size timer tick feature from idle to busy CPUs (running at most one task) as well, potentially reducing the number of timer interrupts significantly. This feature got motivated by real-time folks and the -rt tree, but the general utility and motivation of full-dynticks runs wider than that: - HPC workloads get faster: CPUs running a single task should be able to utilize a maximum amount of CPU power. A periodic timer tick at HZ=1000 can cause a constant overhead of up to 1.0%. This feature removes that overhead - and speeds up the system by 0.5%-1.0% on typical distro configs even on modern systems. - Real-time workload latency reduction: CPUs running critical tasks should experience as little jitter as possible. The last remaining source of kernel-related jitter was the periodic timer tick. - A single task executing on a CPU is a pretty common situation, especially with an increasing number of cores/CPUs, so this feature helps desktop and mobile workloads as well. The cost of the feature is mainly related to increased timer reprogramming overhead when a CPU switches its tick period, and thus slightly longer to-idle and from-idle latency. Configuration-wise a third mode of operation is added to the existing two NOHZ kconfig modes: - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named as a config option. This is the traditional Linux periodic tick design: there's a HZ tick going on all the time, regardless of whether a CPU is idle or not. - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the periodic tick when a CPU enters idle mode. - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the tick when a CPU is idle, also slows the tick down to 1 Hz (one timer interrupt per second) when only a single task is running on a CPU. The .config behavior is compatible: existing !CONFIG_NO_HZ and CONFIG_NO_HZ=y settings get translated to the new values, without the user having to configure anything. CONFIG_NO_HZ_FULL is turned off by default. This feature is based on a lot of infrastructure work that has been steadily going upstream in the last 2-3 cycles: related RCU support and non-periodic cputime support in particular is upstream already. This tree adds the final pieces and activates the feature. The pull request is marked RFC because: - it's marked 64-bit only at the moment - the 32-bit support patch is small but did not get ready in time. - it has a number of fresh commits that came in after the merge window. The overwhelming majority of commits are from before the merge window, but still some aspects of the tree are fresh and so I marked it RFC. - it's a pretty wide-reaching feature with lots of effects - and while the components have been in testing for some time, the full combination is still not very widely used. That it's default-off should reduce its regression abilities and obviously there are no known regressions with CONFIG_NO_HZ_FULL=y enabled either. - the feature is not completely idempotent: there is no 100% equivalent replacement for a periodic scheduler/timer tick. In particular there's ongoing work to map out and reduce its effects on scheduler load-balancing and statistics. This should not impact correctness though, there are no known regressions related to this feature at this point. - it's a pretty ambitious feature that with time will likely be enabled by most Linux distros, and we'd like you to make input on its design/implementation, if you dislike some aspect we missed. Without flaming us to crisp! :-) Future plans: - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off the periodic tick altogether when there's a single busy task on a CPU. We'd first like 1 Hz to be exposed more widely before we go for the 0 Hz target though. - once we reach 0 Hz we can remove the periodic tick assumption from nr_running>=2 as well, by essentially interrupting busy tasks only as frequently as the sched_latency constraints require us to do - once every 4-40 msecs, depending on nr_running. I am personally leaning towards biting the bullet and doing this in v3.10, like the -rt tree this effort has been going on for too long - but the final word is up to you as usual. More technical details can be found in Documentation/timers/NO_HZ.txt" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) sched: Keep at least 1 tick per second for active dynticks tasks rcu: Fix full dynticks' dependency on wide RCU nocb mode nohz: Protect smp_processor_id() in tick_nohz_task_switch() nohz_full: Add documentation. cputime_nsecs: use math64.h for nsec resolution conversion helpers nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config nohz: Reduce overhead under high-freq idling patterns nohz: Remove full dynticks' superfluous dependency on RCU tree nohz: Fix unavailable tick_stop tracepoint in dynticks idle nohz: Add basic tracing nohz: Select wide RCU nocb for full dynticks nohz: Disable the tick when irq resume in full dynticks CPU nohz: Re-evaluate the tick for the new task after a context switch nohz: Prepare to stop the tick on irq exit nohz: Implement full dynticks kick nohz: Re-evaluate the tick from the scheduler IPI sched: New helper to prevent from stopping the tick in full dynticks sched: Kick full dynticks CPU that have more than one task enqueued. perf: New helper to prevent full dynticks CPUs from stopping tick perf: Kick full dynticks CPU if events rotation is needed ...	2013-05-05 13:23:27 -07:00
Frederic Weisbecker	555347f6c0	posix_timers: New API to prevent from stopping the tick when timers are running Bring a new helper that the full dynticks infrastructure can call in order to know if it can safely stop the tick from the posix cpu timers subsystem point of view. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>	2013-04-19 16:31:40 +02:00
Pavel Emelyanov	5ed67f05f6	posix timers: Allocate timer id per process (v2) Currently kernel generates IDs for posix timers in a global manner -- there's a kernel-wide IDR tree from which IDs are created. This makes it impossible to recreate a timer with a desired ID (in particular this is done by the CRIU checkpoint-restore project) -- since these IDs are global it may happen, that at the time we recreate a timer, the ID we want for it is already busy by some other timer. In order to address this, replace the IDR tree with a global hash table for timers and makes timer IDs unique per signal_struct (to which timers are linked anyway). With this, two timers belonging to different processes may have equal IDs and we can recreate either of them with the ID we want. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Matthew Helsley <matt.helsley@gmail.com> Link: http://lkml.kernel.org/r/513D9FF5.9010004@parallels.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-04-17 20:51:01 +02:00
John Stultz	9e26476243	alarmtimers: Remove period from alarm structure Now that periodic alarmtimers are managed by the handler function, remove the period value from the alarm structure and let the handlers manage the interval on their own. CC: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>	2011-08-10 14:55:26 -07:00
Eric Dumazet	8af088710d	posix-timers: RCU conversion Ben Nagy reported a scalability problem with KVM/QEMU that hit very hard a single spinlock (idr_lock) in posix-timers code, on its 48 core machine. Even on a 16 cpu machine (2x4x2), a single test can show 98% of cpu time used in ticket_spin_lock, from lock_timer Ref: http://www.spinics.net/lists/kvm/msg51526.html Switching to RCU is quite easy, IDR being already RCU ready. idr_lock should be locked only for an insert/delete, not a lookup. Benchmark on a 2x4x2 machine, 16 processes calling timer_gettime(). Before : real 1m18.669s user 0m1.346s sys 1m17.180s After : real 0m3.296s user 0m1.366s sys 0m1.926s Reported-by: Ben Nagy <ben@iagu.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Ben Nagy <ben@iagu.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Richard Cochran <richard.cochran@omicron.at> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-05-24 12:10:51 +02:00
John Stultz	9a7adcf5c6	timers: Posix interface for alarm-timers This patch exposes alarm-timers to userland via the posix clock and timers interface, using two new clockids: CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM. Both clockids behave identically to CLOCK_REALTIME and CLOCK_BOOTTIME, respectively, but timers set against the _ALARM suffixed clockids will wake the system if it is suspended. Some background can be found here: https://lwn.net/Articles/429925/ The concept for Alarm-timers was inspired by the Android Alarm driver (by Arve Hjønnevåg) found in the Android kernel tree. See: http://android.git.kernel.org/?p=kernel/common.git;a=blob;f=drivers/rtc/alarm.c;h=1250edfbdf3302f5e4ea6194847c6ef4bb7beb1c;hb=android-2.6.36 While the in-kernel interface is pretty similar between alarm-timers and Android alarm driver, the user-space interface for the Android alarm driver is via ioctls to a new char device. As mentioned above, I've instead chosen to export this functionality via the posix interface, as it seemed a little simpler and avoids creating duplicate interfaces to things like CLOCK_REALTIME and CLOCK_MONOTONIC under alternate names (ie:ANDROID_ALARM_RTC and ANDROID_ALARM_SYSTEMTIME). The semantics of the Android alarm driver are different from what this posix interface provides. For instance, threads other then the thread waiting on the Android alarm driver are able to modify the alarm being waited on. Also this interface does not allow the same wakelock semantics that the Android driver provides (ie: kernel takes a wakelock on RTC alarm-interupt, and holds it through process wakeup, and while the process runs, until the process either closes the char device or calls back in to wait on a new alarm). One potential way to implement similar semantics may be via the timerfd infrastructure, but this needs more research. There may also need to be some sort of sysfs system level policy hooks that allow alarm timers to be disabled to keep them from firing at inappropriate times (ie: laptop in a well insulated bag, mid-flight). CC: Arve Hjønnevåg <arve@android.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Alessandro Zummo <a.zummo@towertech.it> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: John Stultz <john.stultz@linaro.org>	2011-04-26 14:01:46 -07:00
Richard Cochran	0606f422b4	posix clocks: Introduce dynamic clocks This patch adds support for adding and removing posix clocks. The clock lifetime cycle is patterned after usb devices. Each clock is represented by a standard character device. In addition, the driver may optionally implement custom character device operations. The posix clock and timer system calls listed below now work with dynamic posix clocks, as well as the traditional static clocks. The following system calls are affected: - clock_adjtime (brand new syscall) - clock_gettime - clock_getres - clock_settime - timer_create - timer_delete - timer_gettime - timer_settime [ tglx: Adapted to the posix-timer cleanup. Moved clock_posix_dynamic to posix-clock.c and made all referenced functions static ] Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Acked-by: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20110201134420.164172635@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-02-02 15:28:20 +01:00

1 2

70 Коммитов