WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Heiko Carstens	457f218095	s390/uaccess: rework uaccess code - fix locking issues The current uaccess code uses a page table walk in some circumstances, e.g. in case of the in atomic futex operations or if running on old hardware which doesn't support the mvcos instruction. However it turned out that the page table walk code does not correctly lock page tables when accessing page table entries. In other words: a different cpu may invalidate a page table entry while the current cpu inspects the pte. This may lead to random data corruption. Adding correct locking however isn't trivial for all uaccess operations. Especially copy_in_user() is problematic since that requires to hold at least two locks, but must be protected against ABBA deadlock when a different cpu also performs a copy_in_user() operation. So the solution is a different approach where we change address spaces: User space runs in primary address mode, or access register mode within vdso code, like it currently already does. The kernel usually also runs in home space mode, however when accessing user space the kernel switches to primary or secondary address mode if the mvcos instruction is not available or if a compare-and-swap (futex) instruction on a user space address is performed. KVM however is special, since that requires the kernel to run in home address space while implicitly accessing user space with the sie instruction. So we end up with: User space: - runs in primary or access register mode - cr1 contains the user asce - cr7 contains the user asce - cr13 contains the kernel asce Kernel space: - runs in home space mode - cr1 contains the user or kernel asce -> the kernel asce is loaded when a uaccess requires primary or secondary address mode - cr7 contains the user or kernel asce, (changed with set_fs()) - cr13 contains the kernel asce In case of uaccess the kernel changes to: - primary space mode in case of a uaccess (copy_to_user) and uses e.g. the mvcp instruction to access user space. However the kernel will stay in home space mode if the mvcos instruction is available - secondary space mode in case of futex atomic operations, so that the instructions come from primary address space and data from secondary space In case of kvm the kernel runs in home space mode, but cr1 gets switched to contain the gmap asce before the sie instruction gets executed. When the sie instruction is finished cr1 will be switched back to contain the user asce. A context switch between two processes will always load the kernel asce for the next process in cr1. So the first exit to user space is a bit more expensive (one extra load control register instruction) than before, however keeps the code rather simple. In sum this means there is no need to perform any error prone page table walks anymore when accessing user space. The patch seems to be rather large, however it mainly removes the the page table walk code and restores the previously deleted "standard" uaccess code, with a couple of changes. The uaccess without mvcos mode can be enforced with the "uaccess_primary" kernel parameter. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2014-04-03 14:31:04 +02:00
Heiko Carstens	db85eaeb52	s390/bitops: fix comment Fix some numbers in the comments describing the layout of the bit maps. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2014-02-21 08:50:20 +01:00
Heiko Carstens	56f15e518c	s390/uaccess: introduce 'uaccesspt' kernel parameter The uaccesspt kernel parameter allows to enforce using the uaccess page table walk variant. This is mainly for debugging purposes, so this mode can also be enabled on machines which support the mvcos instruction. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2014-02-21 08:50:17 +01:00
Heiko Carstens	ca04ddbf53	s390/setup: get rid of MACHINE_HAS_MVCOS machine flag MACHINE_HAS_MVCOS is used exactly once when the machine is brought up. There is no need to cache the flag in the machine_flags. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2014-02-21 08:50:15 +01:00
Heiko Carstens	211deca6bf	s390/uaccess: consistent types The types 'size_t' and 'unsigned long' have been used randomly for the uaccess functions. This looks rather confusing. So let's change all functions to use unsigned long instead and get rid of size_t in order to have a consistent interface. The only exception is strncpy_from_user() which uses 'long' since it may return a signed value (-EFAULT). Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2014-02-21 08:50:15 +01:00
Heiko Carstens	4f41c2b456	s390/uaccess: get rid of indirect function calls There are only two uaccess variants on s390 left: the version that is used if the mvcos instruction is available, and the page table walk variant. So there is no need for expensive indirect function calls. By default the mvcos variant will be called. If the mvcos instruction is not available it will call the page table walk variant. For minimal performance impact the "if (mvcos_is_available)" is implemented with a jump label, which will be a six byte nop on machines with mvcos. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2014-02-21 08:50:14 +01:00
Heiko Carstens	cfa785e623	s390/uaccess: normalize order of parameters of indirect uaccess function calls For some unknown reason the indirect uaccess functions on s390 implement a different parameter order than what is usual. e.g.: unsigned long copy_to_user(void to, const void from, unsigned long n); vs. size_t (copy_to_user)(size_t n, void __user to, const void *from); Let's get rid of this confusing parameter reordering. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2014-02-21 08:50:13 +01:00
Heiko Carstens	0ff2fe5236	s390/uaccess: remove dead extern declarations, make functions static Remove some dead uaccess extern declarations and also make some functions static, since they are only used locally. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2014-01-22 14:02:17 +01:00
Heiko Carstens	b03b467944	s390/uaccess: test if current->mm is set before walking page tables If get_fs() == USER_DS we better test if current->mm is not zero before walking page tables. The page table walk code would try to lock mm->page_table_lock, however if mm is zero this might crash. Now it is arguably incorrect trying to access userspace if current->mm is zero, however we have seen that and s390 would be the only architecture which would crash in such a case. So we better make the page table walk code a bit more robust and report always a fault instead. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2014-01-22 14:02:16 +01:00
Hendrik Brueckner	b4a960159e	s390: Fix misspellings using 'codespell' tool Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2014-01-16 16:40:13 +01:00
Heiko Carstens	71a86ef055	s390/uaccess: add missing page table walk range check When translating a user space address, the address must be checked against the ASCE limit of the process. If the address is larger than the maximum address that is reachable with the ASCE, an ASCE type exception must be generated. The current code simply ignored the higher order bits. This resulted in an address wrap around in user space instead of an exception in user space. Cc: stable@vger.kernel.org # v3.9+ Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-11-25 09:15:38 +01:00
Martin Schwidefsky	e258d719ff	s390/uaccess: always run the kernel in home space Simplify the uaccess code by removing the user_mode=home option. The kernel will now always run in the home space mode. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-10-24 17:16:57 +02:00
Heiko Carstens	7d7c7b24e4	s390/bitops: rename find_first_bit_left() to find_first_bit_inv() find_first_bit_left() and friends have nothing to do with the normal LSB0 bit numbering for big endian machines used in Linux (least significant bit has bit number 0). Instead they use MSB0 bit numbering, where the most signficant bit has bit number 0. So rename find_first_bit_left() and friends to find_first_bit_inv(), to avoid any confusion. Also provide inv versions of set_bit, clear_bit and test_bit. This also removes the confusing use of e.g. set_bit() in airq.c which uses a "be_to_le" bit number conversion, which could imply that instead set_bit_le() could be used. But that is entirely wrong since the _le bitops variant uses yet another bit numbering scheme. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-10-24 17:16:56 +02:00
Heiko Carstens	746479cdcb	s390/bitops: use generic find bit functions / reimplement _left variant Just like all other architectures we should use out-of-line find bit operations, since the inline variant bloat the size of the kernel image. And also like all other architecures we should only supply optimized variants of the __ffs, ffs, etc. primitives. Therefore this patch removes the inlined s390 find bit functions and uses the generic out-of-line variants instead. The optimization of the primitives follows with the next patch. With this patch also the functions find_first_bit_left() and find_next_bit_left() have been reimplemented, since logically, they are nothing else but a find_first_bit()/find_next_bit() implementation that use an inverted __fls() instead of __ffs(). Also the restriction that these functions only work on machines which support the "flogr" instruction is gone now. This reduces the size of the kernel image (defconfig, -march=z9-109) by 144,482 bytes. Alone the size of the function build_sched_domains() gets reduced from 7 KB to 3,5 KB. We also git rid of unused functions like find_first_bit_le()... Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-10-24 17:16:55 +02:00
Martin Schwidefsky	8c071b0f19	s390/time: correct use of store clock fast The result of the store-clock-fast (STCKF) instruction is a bit fuzzy. It can happen that the value stored on one CPU is smaller than the value stored on another CPU, although the order of the stores is the other way around. This can cause deltas of get_tod_clock() values to become negative when they should not be. We need to be more careful with store-clock-fast, this patch partially reverts git commit e4b7b4238e666682555461fa52eecd74652f36bb "time: always use stckf instead of stck if available". The get_tod_clock() function now uses the store-clock-extended (STCKE) instruction. get_tod_clock_fast() can be used if the fuzziness of store-clock-fast is acceptable e.g. for wait loops local to a CPU. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-10-22 09:16:40 +02:00
Martin Schwidefsky	0587d409ec	s390/time: return with irqs disabled from psw_idle Modify the psw_idle waiting logic in entry[64].S to return with interrupts disabled. This avoids potential issues with udelay and interrupt loops as interrupts are not reenabled after clock comparator interrupts. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-08-28 09:19:23 +02:00
Martin Schwidefsky	e509861105	s390/mm: cleanup page table definitions Improve the encoding of the different pte types and the naming of the page, segment table and region table bits. Due to the different pte encoding the hugetlbfs primitives need to be adapted as well. To improve compatability with common code make the huge ptes use the encoding of normal ptes. The conversion between the pte and pmd encoding for a huge pte is done with set_huge_pte_at and huge_ptep_get. Overall the code is now easier to understand. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-08-22 12:20:06 +02:00
Heiko Carstens	12d8471315	s390/uaccess: add "fallthrough" comments Add "fallthrough" comments so nobody wonders if a break statement is missing. Reported-by: Joe Perches <joe@perches.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-05-02 15:50:19 +02:00
Stephen Boyd	446f24d119	Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS The help text for this config is duplicated across the x86, parisc, and s390 Kconfig.debug files. Arnd Bergman noted that the help text was slightly misleading and should be fixed to state that enabling this option isn't a problem when using pre 4.4 gcc. To simplify the rewording, consolidate the text into lib/Kconfig.debug and modify it there to be more explicit about when you should say N to this config. Also, make the text a bit more generic by stating that this option enables compile time checks so we can cover architectures which emit warnings vs. ones which emit errors. The details of how an architecture decided to implement the checks isn't as important as the concept of compile time checking of copy_from_user() calls. While we're doing this, remove all the copy_from_user_overflow() code that's duplicated many times and place it into lib/ so that any architecture supporting this option can get the function for free. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Helge Deller <deller@gmx.de> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:09 -07:00
Heiko Carstens	ea81531de2	s390/uaccess: fix page table walk When translating user space addresses to kernel addresses the follow_table() function had two bugs: - PROT_NONE mappings could be read accessed via the kernel mapping. That is e.g. putting a filename into a user page, then protecting the page with PROT_NONE and afterwards issuing the "open" syscall with a pointer to the filename would incorrectly succeed. - when walking the page tables it used the pgd/pud/pmd/pte primitives which with dynamic page tables give no indication which real level of page tables is being walked (region2, region3, segment or page table). So in case of an exception the translation exception code passed to __handle_fault() is not necessarily correct. This is not really an issue since __handle_fault() doesn't evaluate the code. Only in case of e.g. a SIGBUS this code gets passed to user space. If user space can do something sane with the value is a different question though. To fix these issues don't use any Linux primitives. Only walk the page tables like the hardware would do it, however we leave quite some checks away since we know that we only have full size page tables and each index is within bounds. In theory this should fix all issues... Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-04-02 08:53:08 +02:00
Heiko Carstens	b7fef2dd72	s390/uaccess: fix clear_user_pt() The page table walker variant of clear_user() is supposed to copy the contents of the empty zero page to user space. However since `238ec4ef` "[S390] zero page cache synonyms" empty_zero_page is not anymore the page itself but contains the pointer to the empty zero pages. Therefore the page table walker variant of clear_user() copied the address of the first empty zero page and afterwards more or less random data to user space instead of clearing the given user space range. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-03-21 13:35:38 +01:00
Heiko Carstens	066c437359	s390/uaccess: fix kernel ds access for page table walk When the kernel resides in home space and the mvcos instruction is not available uaccesses for kernel ds happen via simple strnlen() or memcpy() calls. This however can break badly, since uaccesses in kernel space may fail as well, especially if CONFIG_DEBUG_PAGEALLOC is turned on. To fix this implement strnlen_kernel() and copy_in_kernel() functions which can only be used by the page table uaccess functions. These two functions detect invalid memory accesses and return the correct length of processed data.. Both functions are more or less a copy of the std variants without sacf calls. Fixes ipl crashes on 31 bit machines as well on 64 bit machines without mvcos. Caused by changing the default address space of the kernel being home space. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-02-28 09:37:12 +01:00
Heiko Carstens	225cf8d69c	s390/uaccess: fix strncpy_from_user string length check The "standard" and page table walk variants of strncpy_from_user() first check the length of the to be copied string in userspace. The string is then copied to kernel space and the length returned to the caller. However userspace can modify the string at any time while the kernel checks for the length of the string or copies the string. In result the returned length of the string is not necessarily correct. Fix this by copying in a loop which mimics the mvcos variant of strncpy_from_user(), which handles this correctly. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-02-28 09:37:11 +01:00
Heiko Carstens	f45655f6a6	s390/uaccess: fix strncpy_from_user/strnlen_user zero maxlen case If the maximum length specified for the to be accessed string for strncpy_from_user() and strnlen_user() is zero the following incorrect values would be returned or incorrect memory accesses would happen: strnlen_user_std() and strnlen_user_pt() incorrectly return "1" strncpy_from_user_pt() would incorrectly access "dst[maxlen - 1]" strncpy_from_user_mvcos() would incorrectly return "-EFAULT" Fix all these oddities by adding early checks. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-02-28 09:37:08 +01:00
Heiko Carstens	d7b788cd06	s390/uaccess: shorten strncpy_from_user/strnlen_user Always stay within page boundaries when copying from user within strlen_user_mvcos()/strncpy_from_user_mvcos(). This allows to shorten the code a bit and may prevent unnecessary faults, since we copy quite large amounts of memory to kernel space. Also directly call the mvcos variants of copy_from_user() to avoid indirect branches. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-02-28 09:37:07 +01:00
Martin Schwidefsky	abf09bed3c	s390/mm: implement software dirty bits The s390 architecture is unique in respect to dirty page detection, it uses the change bit in the per-page storage key to track page modifications. All other architectures track dirty bits by means of page table entries. This property of s390 has caused numerous problems in the past, e.g. see git commit `ef5d437f71` "mm: fix XFS oops due to dirty pages without buffers on s390". To avoid future issues in regard to per-page dirty bits convert s390 to a fault based software dirty bit detection mechanism. All user page table entries which are marked as clean will be hardware read-only, even if the pte is supposed to be writable. A write by the user process will trigger a protection fault which will cause the user pte to be marked as dirty and the hardware read-only bit is removed. With this change the dirty bit in the storage key is irrelevant for Linux as a host, but the storage key is still required for KVM guests. The effect is that page_test_and_clear_dirty and the related code can be removed. The referenced bit in the storage key is still used by the page_test_and_clear_young primitive to provide page age information. For page cache pages of mappings with mapping_cap_account_dirty there will not be any change in behavior as the dirty bit tracking already uses read-only ptes to control the amount of dirty pages. Only for swap cache pages and pages of mappings without mapping_cap_account_dirty there can be additional protection faults. To avoid an excessive number of additional faults the mk_pte primitive checks for PageDirty if the pgprot value allows for writes and pre-dirties the pte. That avoids all additional faults for tmpfs and shmem pages until these pages are added to the swap cache. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-02-14 15:55:23 +01:00
Heiko Carstens	1aae0560d1	s390/time: rename tod clock access functions Fix name clash with some common code device drivers and add "tod" to all tod clock access function names. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2013-02-14 15:55:10 +01:00
Gerald Schaefer	156152f84e	s390/mm: use pmd_large() instead of pmd_huge() Without CONFIG_HUGETLB_PAGE, pmd_huge() will always return 0. So pmd_large() should be used instead in places where both transparent huge pages and hugetlbfs pages can occur. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2012-10-26 16:44:23 +02:00
Heiko Carstens	535c611ddd	s390/string: provide asm lib functions for memcpy and memcmp Our memcpy and memcmp variants were implemented by calling the corresponding gcc builtin variants. However gcc is free to replace a call to __builtin_memcmp with a call to memcmp which, when called, will result in an endless recursion within memcmp. So let's provide asm variants and also fix the variants that are used for uncompressing the kernel image. In addition remove all other occurences of builtin function calls. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2012-09-26 15:44:50 +02:00
Gerald Schaefer	4db84d4f07	s390/mm: fix user access page-table walk code The s390 page-table walk code, used for user copy and futex, currently cannot handle huge pages. As far as user copy is concerned, that is not really a problem because those functions will only be used on old hardware that has no huge page support. But the futex code will also use pagetable walk functions on current hardware when user space runs in primary space mode. So, if a futex sits in a huge page, the futex operation on it will result in a page fault loop or even data corruption. This patch adds the code for resolving huge page mappings in the user access pagetable walk code on s390. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2012-09-17 09:44:24 +02:00
Martin Schwidefsky	27f6b41662	s390/vtimer: rework virtual timer interface The current virtual timer interface is inherently per-cpu and hard to use. The sole user of the interface is appldata which uses it to execute a function after a specific amount of cputime has been used over all cpus. Rework the virtual timer interface to hook into the cputime accounting. This makes the interface independent from the CPU timer interrupts, and makes the virtual timers global as opposed to per-cpu. Overall the code is greatly simplified. The downside is that the accuracy is not as good as the original implementation, but it is still good enough for appldata. Reviewed-by: Jan Glauber <jang@linux.vnet.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2012-07-20 11:15:08 +02:00
Heiko Carstens	a53c8fab3f	s390/comments: unify copyright messages and remove file names Remove the file name from the comment at top of many files. In most cases the file name was wrong anyway, so it's rather pointless. Also unify the IBM copyright statement. We did have a lot of sightly different statements and wanted to change them one after another whenever a file gets touched. However that never happened. Instead people start to take the old/"wrong" statements to use as a template for new files. So unify all of them in one go. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2012-07-20 11:15:04 +02:00
Heiko Carstens	f4815ac6c9	s390/headers: replace __s390x__ with CONFIG_64BIT where possible Replace __s390x__ with CONFIG_64BIT in all places that are not exported to userspace or guarded with #ifdef __KERNEL__. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2012-05-24 10:10:10 +02:00
Martin Schwidefsky	4c1051e37a	[S390] rework idle code Whenever the cpu loads an enabled wait PSW it will appear as idle to the underlying host system. The code in default_idle calls vtime_stop_cpu which does the necessary voodoo to get the cpu time accounting right. The udelay code just loads an enabled wait PSW. To correct this rework the vtime_stop_cpu/vtime_start_cpu logic and move the difficult parts to entry[64].S, vtime_stop_cpu can now be called from anywhere and vtime_start_cpu is gone. The correction of the cpu time during wakeup from an enabled wait PSW is done with a critical section in entry[64].S. As vtime_start_cpu is gone, s390_idle_check can be removed as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2012-03-11 11:59:28 -04:00
Martin Schwidefsky	8b646bd759	[S390] rework smp code Define struct pcpu and merge some of the NR_CPUS arrays into it, including __cpu_logical_map, current_set and smp_cpu_state. Split smp related functions to those operating on physical cpus and the functions operating on a logical cpu number. Make the functions for physical cpus use a pointer to a struct pcpu. This hides the knowledge about cpu addresses in smp.c, entry[64].S and swsusp_asm64.S, thus remove the sigp.h header. The PSW restart mechanism is used to start secondary cpus, calling a function on an online cpu, calling a function on the ipl cpu, and for the nmi signal. Replace the different assembler functions with a single function restart_int_handler. The new entry point calls a function whose pointer is stored in the lowcore of the target cpu and it can wait for the source cpu to stop. This covers all existing use cases. Overall the code is now simpler and there are ~380 lines less code. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2012-03-11 11:59:28 -04:00
Martin Schwidefsky	3c52e49d7c	[S390] sparse: fix sparse warnings with __user pointers Use __force to quiet sparse warnings about user address space. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2011-10-30 15:16:46 +01:00
Martin Schwidefsky	b50511e41a	[S390] cleanup psw related bits and pieces Split out addressing mode bits from PSW_BASE_BITS, rename PSW_BASE_BITS to PSW_MASK_BASE, get rid of psw_user32_bits, remove unused function enabled_wait(), introduce PSW_MASK_USER, and drop PSW_MASK_MERGE macros. Change psw_kernel_bits / psw_user_bits to contain only the bits that are always set in the respective mode. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2011-10-30 15:16:43 +01:00
Jan Glauber	144d634a21	[S390] fix s390 assembler code alignments The alignment is missing for various global symbols in s390 assembly code. With a recent gcc and an instruction like stgrl this can lead to a specification exception if the instruction uses such a mis-aligned address. Specify the alignment explicitely and while add it define __ALIGN for s390 and use the ENTRY define to save some lines of code. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2011-07-24 10:48:21 +02:00
Heiko Carstens	b396637841	[S390] delay: implement ndelay Implement ndelay() on s390 as well. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2011-05-26 09:48:25 +02:00
Michel Lespinasse	8d7718aa08	futex: Sanitize futex ops argument types Change futex_atomic_op_inuser and futex_atomic_cmpxchg_inatomic prototypes to use u32 types for the futex as this is the data type the futex core code uses all over the place. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Darren Hart <darren@dvhart.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20110311025058.GD26122@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-03-11 12:23:31 +01:00
Michel Lespinasse	37a9d912b2	futex: Sanitize cmpxchg_futex_value_locked API The cmpxchg_futex_value_locked API was funny in that it returned either the original, user-exposed futex value OR an error code such as -EFAULT. This was confusing at best, and could be a source of livelocks in places that retry the cmpxchg_futex_value_locked after trying to fix the issue by running fault_in_user_writeable(). This change makes the cmpxchg_futex_value_locked API more similar to the get_futex_value_locked one, returning an error code and updating the original value through a reference argument. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> [tile] Acked-by: Tony Luck <tony.luck@intel.com> [ia64] Acked-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michal Simek <monstr@monstr.eu> [microblaze] Acked-by: David Howells <dhowells@redhat.com> [frv] Cc: Darren Hart <darren@dvhart.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20110311024851.GC26122@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-03-11 12:23:08 +01:00
Martin Schwidefsky	e4d82692f4	[S390] missing sacf in uaccess The uaccess functions copy_in_user_std and clear_user_std fail to switch back from secondary space mode to primary space mode with sacf in case of an unresolvable page fault. We need to make sure that the switch back to primary mode is done in all cases, otherwise the code following the uaccess inline assembly will crash. Reported-by: Alexander Graf <agraf@suse.de> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2011-01-31 11:30:20 +01:00
Heiko Carstens	545b288dcb	[S390] time: let local_tick_enable/disable() reprogram the clock comparator Let local_tick_enable/disable() reprogram the clock comparator so the function names make semantically more sense. Also that way the functions are more symmetric since normally each local_tick_enable() call usually would have a subsequent call to set_clock_comparator() anyway. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2011-01-05 12:47:25 +01:00
Heiko Carstens	e8129c6421	[S390] nmi: fix clock comparator revalidation On each machine check all registers are revalidated. The save area for the clock comparator however only contains the upper most seven bytes of the former contents, if valid. Therefore the machine check handler uses a store clock instruction to get the current time and writes that to the clock comparator register which in turn will generate an immediate timer interrupt. However within the lowcore the expected time of the next timer interrupt is stored. If the interrupt happens before that time the handler won't be called. In turn the clock comparator won't be reprogrammed and therefore the interrupt condition stays pending which causes an interrupt loop until the expected time is reached. On NOHZ machines this can result in unresponsive machines since the time of the next expected interrupted can be a couple of days in the future. To fix this just revalidate the clock comparator register with the expected value. In addition the special handling for udelay must be changed as well. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2010-11-25 09:52:59 +01:00
Heiko Carstens	596a95cdf0	[S390] uaccess: make sure copy_from_user_overflow is builtin If there is no in kernel image caller modules will suffer: ERROR: "copy_from_user_overflow" [net/core/pktgen.ko] undefined! ERROR: "copy_from_user_overflow" [net/can/can-raw.ko] undefined! ERROR: "copy_from_user_overflow" [fs/cifs/cifs.ko] undefined! Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2010-03-08 12:25:29 +01:00
Gerald Schaefer	59b6978745	[S390] spinlock: check virtual cpu running status This patch introduces a new function that checks the running status of a cpu in a hypervisor. This status is not virtualized, so the check is only correct if running in an LPAR. On acquiring a spinlock, if the cpu holding the lock is scheduled by the hypervisor, we do a busy wait on the lock. If it is not scheduled, we yield over to that cpu. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2010-02-26 22:37:31 +01:00
Heiko Carstens	1dcec254af	[S390] uaccess: implement strict user copy checks Same as on x86 and sparc, besides the fact that enabling the option will just emit compile time warnings instead of errors. Keeps allyesconfig kernels compiling. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2010-02-26 22:37:29 +01:00
Heiko Carstens	fb380aadfe	[S390] Move __cpu_logical_map to smp.c Finally move it to the place where it belongs to and make get rid of it for !CONFIG_SMP. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2010-01-13 20:44:45 +01:00
Thomas Gleixner	e5931943d0	locking: Convert raw_rwlock functions to arch_rwlock Name space cleanup for rwlock functions. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org	2009-12-14 23:55:32 +01:00
Thomas Gleixner	fb3a6bbc91	locking: Convert raw_rwlock to arch_rwlock Not strictly necessary for -rt as -rt does not have non sleeping rwlocks, but it's odd to not have a consistent naming convention. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org	2009-12-14 23:55:32 +01:00
Thomas Gleixner	0199c4e68d	locking: Convert __raw_spin* functions to arch_spin* Name space cleanup. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org	2009-12-14 23:55:32 +01:00
Thomas Gleixner	445c89514b	locking: Convert raw_spinlock to arch_spinlock The raw_spin* namespace was taken by lockdep for the architecture specific implementations. raw_spin_* would be the ideal name space for the spinlocks which are not converted to sleeping locks in preempt-rt. Linus suggested to convert the raw_ to arch_ locks and cleanup the name space instead of using an artifical name like core_spin, atomic_spin or whatever No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org	2009-12-14 23:55:32 +01:00
Gerald Schaefer	6c1e3e7943	[S390] Use do_exception() in pagetable walk usercopy functions. The pagetable walk usercopy functions have used a modified copy of the do_exception() function for fault handling. This lead to inconsistencies with recent changes to do_exception(), e.g. performance counters. This patch changes the pagetable walk usercopy code to call do_exception() directly, eliminating the redundancy. A new parameter is added to do_exception() to specify the fault address. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-12-07 12:51:34 +01:00
Martin Schwidefsky	b11b533427	[S390] Improve address space mode selection. Introduce user_mode to replace the two variables switch_amode and s390_noexec. There are three valid combinations of the old values: 1) switch_amode == 0 && s390_noexec == 0 2) switch_amode == 1 && s390_noexec == 0 3) switch_amode == 1 && s390_noexec == 1 They get replaced by 1) user_mode == HOME_SPACE_MODE 2) user_mode == PRIMARY_SPACE_MODE 3) user_mode == SECONDARY_SPACE_MODE The new kernel parameter user_mode=[primary,secondary,home] lets you choose the address space mode the user space processes should use. In addition the CONFIG_S390_SWITCH_AMODE config option is removed. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-12-07 12:51:33 +01:00
Gerald Schaefer	af9d2ff9af	[S390] Add EX_TABLE for addressing exception in usercopy functions. This patch adds an EX_TABLE entry to mvc{p\|s\|os} usercopy functions that may be called with KERNEL_DS. In combination with collaborative memory management, kernel pages marked as unused may trigger an adressing exception in the usercopy functions. This fixes an unhandled addressing exception bug where strncpy_from_user() is used with len > strnlen and KERNEL_DS, crossing a page boundary to an unused page. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-10-06 10:35:10 +02:00
Heiko Carstens	0cd6a403e8	[S390] Provide arch specific mdelay implementation. Use an own implementation instead of the common code udelay loop. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-10-06 10:35:08 +02:00
Christian Borntraeger	78d81f2f84	[S390] Fix enabled udelay for short delays. When udelay() gets called with a delay that would expire before the next clock event it reprograms the clock comparator. When the interrupt happens the clock comparator won't be resetted therefore the interrupt condition doesn't get cleared. The result is an endless timer interrupt loop until the next clock event would expire (stored in lowcore). So udelay() usually would wait much longer for small delays than it should. Fix this by disabling the local tick which makes sure that the clock comparator will be resetted when a timer interrupt happens. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-10-06 10:35:08 +02:00
Heiko Carstens	5075baca2e	[S390] add __ucmpdi2() helper function Provide __ucmpdi2() helper function on 31 bit so we don't run again and again in compile errors like this one: kernel/built-in.o: In function `T.689': perf_counter.c:(.text+0x56c86): undefined reference to `__ucmpdi2' Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-07-07 16:37:53 +02:00
Heiko Carstens	bb8c29caff	[S390] udelay: disable lockdep to avoid false positives Our udelay implementation enables interrupts to receive a special timer interrupt regardless of the context it is called from. This might lead to false positive lockdep reports. Since lockdep isn't aware of the fact that only a single interrupt source is enabled it warns about possible deadlocks that in reality won't happen, like the one below. To fix this disable lockdep before enabling interrupts. [ 254.040888] ================================= [ 254.040904] [ INFO: inconsistent lock state ] [ 254.040910] 2.6.30 #9 [ 254.040914] --------------------------------- [ 254.040920] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. [ 254.040927] swapper/0 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 254.040934] (sch->lock){?.-...}, at: [<00000000002e4778>] ccw_device_timeout+0x48/0x2f0 [ 254.040961] {IN-HARDIRQ-W} state was registered at: [ 254.040969] [<0000000000096f74>] __lock_acquire+0x9d4/0x188c [ 254.040985] [<0000000000097f68>] lock_acquire+0x13c/0x16c [ 254.040998] [<00000000004527e0>] _spin_lock+0x74/0xb8 [ 254.041016] [<0000000000457eb2>] do_IRQ+0xde/0x208 [ 254.041031] [<000000000002d190>] io_return+0x0/0x8 [ 254.041049] [<0000000000029faa>] vtime_stop_cpu+0xbe/0x114 [ 254.041066] irq event stamp: 259629 [ 254.041076] hardirqs last enabled at (259628): [<000000000045238e>] _spin_unlock_irq+0x5e/0x9c [ 254.041095] hardirqs last disabled at (259629): [<000000000045292e>] _spin_lock_irq+0x4a/0xc4 [ 254.041126] softirqs last enabled at (259614): [<000000000006500e>] __do_softirq+0x296/0x2b0 [ 254.041137] softirqs last disabled at (259619): [<0000000000024cf6>] do_softirq+0x102/0x108 [ 254.041147] [ 254.041148] other info that might help us debug this: [ 254.041153] 2 locks held by swapper/0: [ 254.041157] #0: (&priv->timer){+.-...}, at: [<000000000006bf9a>] run_timer_softirq+0x19a/0x340 [ 254.041170] #1: (sch->lock){?.-...}, at: [<00000000002e4778>] ccw_device_timeout+0x48/0x2f0 [ 254.041182] [ 254.041310] Call Trace: [ 254.041313] ([<00000000000174fc>] show_trace+0x16c/0x170) [ 254.041321] [<0000000000017578>] show_stack+0x78/0x104 [ 254.041327] [<000000000044d0ca>] dump_stack+0xc6/0xd4 [ 254.041342] [<00000000000949b4>] print_usage_bug+0x1c8/0x1fc [ 254.041353] [<0000000000094e8a>] mark_lock+0x4a2/0x670 [ 254.041364] [<00000000000950e2>] mark_held_locks+0x8a/0xb4 [ 254.041375] [<0000000000095398>] trace_hardirqs_on_caller+0x74/0x1ac [ 254.041388] [<00000000000954fa>] trace_hardirqs_on+0x2a/0x38 [ 254.041402] [<000000000025f1ec>] __udelay_disabled+0xac/0xfc [ 254.041419] [<000000000025f432>] __udelay+0x12a/0x148 [ 254.041433] [<00000000002d64d8>] cio_commit_config+0x170/0x290 [ 254.041451] [<00000000002d6978>] cio_disable_subchannel+0x120/0x1cc [ 254.041468] [<00000000002e32a4>] ccw_device_recog_done+0x54/0x2f4 [ 254.041485] [<00000000002e3638>] ccw_device_sense_id_done+0x50/0x90 [ 254.041508] [<00000000002e615a>] snsid_callback+0xfa/0x3a8 [ 254.041515] [<00000000002dd96c>] ccwreq_stop+0x80/0x90 [ 254.041523] [<00000000002dda8e>] ccw_request_timeout+0xc2/0xd0 [ 254.041530] [<00000000002e2f70>] ccw_device_request_event+0x58/0x90 [ 254.041537] [<00000000002e47ae>] ccw_device_timeout+0x7e/0x2f0 [ 254.041555] [<000000000006c02a>] run_timer_softirq+0x22a/0x340 [ 254.041566] [<0000000000064eb0>] __do_softirq+0x138/0x2b0 [ 254.041578] [<0000000000024cf6>] do_softirq+0x102/0x108 [ 254.041590] [<00000000000647ce>] irq_exit+0xee/0x114 [ 254.041603] [<0000000000457d88>] do_extint+0x130/0x17c [ 254.041617] [<000000000002d41e>] ext_no_vtime+0x1e/0x22 [ 254.041631] [<0000000000029faa>] vtime_stop_cpu+0xbe/0x114 [ 254.041646] ([<0000000000029f58>] vtime_stop_cpu+0x6c/0x114) [ 254.041662] [<000000000001d842>] cpu_idle+0x122/0x1c0 [ 254.041679] [<00000000004482c6>] start_secondary+0xce/0xe0 [ 254.041696] [<0000000000000000>] 0x0 [ 254.041715] [<0000000000000000>] 0x0 [ 254.041745] INFO: lockdep is turned off. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-07-07 16:37:51 +02:00
Linus Torvalds	d06063cc22	Move FAULT_FLAG_xyz into handle_mm_fault() callers This allows the callers to now pass down the full set of FAULT_FLAG_xyz flags to handle_mm_fault(). All callers have been (mechanically) converted to the new calling convention, there's almost certainly room for architectures to clean up their code and then add FAULT_FLAG_RETRY when that support is added. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-21 13:08:22 -07:00
Heiko Carstens	ce58ae6f7f	[S390] implement interrupt-enabling rwlocks arch backend for `f5f7eac41d` "Allow rwlocks to re-enable interrupts". Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-06-12 10:27:29 +02:00
Heiko Carstens	1edad85b16	[S390] use compiler builtin versions of strlen/strcpy/strcat Use builtin variants if gcc 4 or newer is used to compile the kernel. Generates better code than the asm variants. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-03-26 15:24:24 +01:00
Heiko Carstens	1485c5c884	[S390] move EXPORT_SYMBOLs to definitions Move all EXPORT_SYMBOLs to their corresponding definitions. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-03-26 15:24:11 +01:00
Gerald Schaefer	2887fc5aa6	[S390] Dont check for pfn_valid() in uaccess_pt.c pfn_valid() actually checks for a valid struct page and not for a valid pfn. Using xip mappings w/o struct pages, this will result in -EFAULT returned by the (page table walk) user copy functions, even though there is valid memory. Those user copy functions don't need a struct page, so this patch just removes the pfn_valid() check. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-03-18 13:28:13 +01:00
Martin Schwidefsky	4fa81ed277	[S390] __div64_31 broken for CONFIG_MARCH_G5 The implementation of __div64_31 for G5 machines is broken. The comments in __div64_31 are correct, only the code does not do what the comments say. The part "If the remainder has overflown subtract base and increase the quotient" is only partially realized, the base is subtracted correctly but the quotient is only increased if the dividend had the last bit set. Using the correct instruction fixes the problem. Cc: stable@kernel.org Reported-by: Frans Pop <elendil@planet.nl> Tested-by: Frans Pop <elendil@planet.nl> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-03-18 13:28:12 +01:00
Heiko Carstens	5a0d0e6537	[S390] Move private simple udelay function to arch/s390/lib/delay.c. Move cio's private simple udelay function to lib/delay.c and turn it into something much more readable. So we have all implementations at one place. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-10-10 21:33:58 +02:00
Heiko Carstens	d3d238c774	[S390] nohz: Fix __udelay. This fixes a regression that came with `934b2857cc` ("[S390] nohz/sclp: disable timer on synchronous waits."). If udelay() gets called from a disabled context it sets the clock comparator to a value where it expects the next interrupt. When the interrupt happens the clock comparator gets not reset and therefore the interrupt condition doesn't get cleared. The result is an endless timer interrupt loop. In addition this patch fixes also the following: rcutorture reveals that our __udelay implementation is still buggy, since it might schedule tasklets, but prevents their execution: NOHZ: local_softirq_pending 42 NOHZ: local_softirq_pending 02 NOHZ: local_softirq_pending 142 NOHZ: local_softirq_pending 02 To fix this we make sure that only the clock comparator interrupt is enabled when the enabled wait psw is loaded. Also no code gets called anymore which might schedule tasklets. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-10-03 21:55:54 +02:00
Heiko Carstens	934b2857cc	[S390] nohz/sclp: disable timer on synchronous waits. sclp_sync_wait wait synchronously for an sclp interrupt and disables timer interrupts. However on the irq enter paths there is an extra check if a timer interrupt would be due and calls the timer callback. This would schedule softirqs in the wrong context. So introduce local_tick_enable/disable which prevents this. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-08-01 16:39:30 +02:00
Heiko Carstens	ccf183e469	[S390] uaccess_mvcos: #ifdef config dependent code. arch/s390/lib/uaccess_mvcos.c:166: warning: 'strnlen_user_mvcos' defined but not used arch/s390/lib/uaccess_mvcos.c:186: warning: 'strncpy_from_user_mvcos' defined but not used Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-04-30 13:38:46 +02:00
Mathieu Desnoyers	47494f6a84	[S390] remove -traditional Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> CC: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-04-30 13:38:44 +02:00
Heiko Carstens	3f12ebce6a	[S390] uaccess: Always access the correct address space. The current uaccess page table walk code assumes at a few places that any access is a user space access. This is not correct if somebody has issued a set_fs(KERNEL_DS) in advance. Add code which checks which address space we are in and with this make sure we access the correct address space. This way we get also rid of the dirty if (!currrent-mm) return -EFAULT; hack in futex_atomic_cmpxchg_pt. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2008-04-17 07:47:06 +02:00
Heiko Carstens	5a62b19219	[S390] Convert s390 to GENERIC_CLOCKEVENTS. This way we get rid of s390's NO_IDLE_HZ and use the generic dynticks variant instead. In addition we get high resolution timers for free. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2008-04-17 07:47:05 +02:00
Heiko Carstens	504e75d0ed	[S390] futex: let futex_atomic_cmpxchg_pt survive early functional tests. `a0c1e9073e` "futex: runtime enable pi and robust functionality" introduces a test wether futex in atomic stuff works or not. It does that by writing to address 0 of the kernel address space. This will crash on older machines where addressing mode switching is enabled but where the mvcos instruction is not available. Page table walking is done by hand and therefore the code tries to access current->mm which is NULL. Therefore add an extra check, so we survive the early test. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-03-20 17:33:46 +01:00
Heiko Carstens	d5b02b3ff1	[S390] Fix futex_atomic_cmpxchg_std inline assembly. Add missing exception table entry so that the kernel can handle proctection exceptions as well on the cs instruction. Currently only specification exceptions are handled correctly. The missing entry allows user space to crash the kernel. Cc: stable <stable@kernel.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-02-19 15:29:35 +01:00
Hisashi Hifumi	894cdde26b	[S390] do local_irq_restore while spinning in spin_lock_irqsave. In s390's spin_lock_irqsave, interrupts remain disabled while spinning. In other architectures like x86 and powerpc, interrupts are re-enabled while spinning if IRQ is not masked before spin_lock_irqsave is called. The following patch re-enables interrupts through local_irq_restore while spinning for a lock acquisition. This can improve system response. [heiko.carstens@de.ibm.com: removed saving of pc] Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-01-26 14:11:31 +01:00
Heiko Carstens	3b4beb3175	[S390] Remove owner_pc member from raw_spinlock_t. Used to contain the address of the holder of the lock. But since the spinlock code is not inlined anymore all locks contain the same address anyway. And since in addtition nobody complained about that for ages its obviously unused. So remove it. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-01-26 14:11:14 +01:00
Martin Schwidefsky	190a1d722a	[S390] 4level-fixup cleanup Get independent from asm-generic/4level-fixup.h Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-10-22 12:52:49 +02:00
Martin Schwidefsky	e4aa402e7a	[S390] Introduce follow_table in uaccess_pt.c Define and use follow_table inline in uaccess_pt.c to simplify the code. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-10-22 12:52:49 +02:00
Serge E. Hallyn	b460cbc581	pid namespaces: define is_global_init() and is_container_init() is_init() is an ambiguous name for the pid==1 check. Split it into is_global_init() and is_container_init(). A cgroup init has it's tsk->pid == 1. A global init also has it's tsk->pid == 1 and it's active pid namespace is the init_pid_ns. But rather than check the active pid namespace, compare the task structure with 'init_pid_ns.child_reaper', which is initialized during boot to the /sbin/init process and never changes. Changelog: 2.6.22-rc4-mm2-pidns1: - Use 'init_pid_ns.child_reaper' to determine if a given task is the global init (/sbin/init) process. This would improve performance and remove dependence on the task_pid(). 2.6.21-mm2-pidns2: - [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc, ppc,avr32}/traps.c for the _exception() call to is_global_init(). This way, we kill only the cgroup if the cgroup's init has a bug rather than force a kernel panic. [akpm@linux-foundation.org: fix comment] [sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c] [bunk@stusta.de: kernel/pid.c: remove unused exports] [sukadev@us.ibm.com: Fix capability.c to work with threaded init] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: Pavel Emelianov <xemul@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Herbert Poetzel <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Nick Piggin	83c54070ee	mm: fault feedback #2 This patch completes Linus's wish that the fault return codes be made into bit flags, which I agree makes everything nicer. This requires requires all handle_mm_fault callers to be modified (possibly the modifications should go further and do things like fault accounting in handle_mm_fault -- however that would be for another patch). [akpm@linux-foundation.org: fix alpha build] [akpm@linux-foundation.org: fix s390 build] [akpm@linux-foundation.org: fix sparc build] [akpm@linux-foundation.org: fix sparc64 build] [akpm@linux-foundation.org: fix ia64 build] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Bryan Wu <bryan.wu@analog.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Matthew Wilcox <willy@debian.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Still apparently needs some ARM and PPC loving - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
Martin Schwidefsky	8a88367088	[S390] Bogomips calculation for 64 bit. The bogomips calculation triggered via reading from /proc/cpuinfo can return incorrect values if the qrnnd assembly is called with a pointer in %r2 with any of the upper 32 bits set. Fix this by using 64 bit division / remainder operation provided by gcc instead of calling the assembly. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-07-10 11:24:47 +02:00
David S. Miller	cb8c181f28	[S390]: Fix build on 31-bit. Allow s390 to properly override the generic __div64_32() implementation by: 1) Using obj-y for div64.o in s390's makefile instead of lib-y 2) Adding the weak attribute to the generic implementation. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:53 -07:00
Martin Schwidefsky	bf6f6aa46f	[S390] prevent softirqs if delay is called disabled The new delay implementation uses the clock comparator and an external interrupt even if it is called disabled for interrupts. To do this all external interrupt source except clock comparator are switched of before enabling external interrupts. The external interrupt at the end of the delay period may not execute softirqs or we can end up in a dead-lock. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-02-21 10:55:00 +01:00
Heiko Carstens	4d284cac76	[S390] Avoid excessive inlining. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-02-05 21:18:53 +01:00
Martin Schwidefsky	31ee4b2f40	[S390] Calibrate delay and bogomips. Preset the bogomips number to the cpu capacity value reported by store system information in SYSIB 1.2.2. This value is constant for a particular machine model and can be used to determine relative performance differences between machines. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-02-05 21:18:31 +01:00
Martin Schwidefsky	d54853ef8c	[S390] ETR support. This patch adds support for clock synchronization to an external time reference (ETR). The external time reference sends an oscillator signal and a synchronization signal every 2^20 microseconds to keep the TOD clocks of all connected servers in sync. For availability two ETR units can be connected to a machine. If the clock deviates for more than the sync-check tolerance all cpus get a machine check that indicates that the clock is out of sync. For the lovely details how to get the clock back in sync see the code below. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-02-05 21:18:19 +01:00
Gerald Schaefer	c1821c2e97	[S390] noexec protection This provides a noexec protection on s390 hardware. Our hardware does not have any bits left in the pte for a hw noexec bit, so this is a different approach using shadow page tables and a special addressing mode that allows separate address spaces for code and data. As a special feature of our "secondary-space" addressing mode, separate page tables can be specified for the translation of data addresses (storage operands) and instruction addresses. The shadow page table is used for the instruction addresses and the standard page table for the data addresses. The shadow page table is linked to the standard page table by a pointer in page->lru.next of the struct page corresponding to the page that contains the standard page table (since page->private is not really private with the pte_lock and the page table pages are not in the LRU list). Depending on the software bits of a pte, it is either inserted into both page tables or just into the standard (data) page table. Pages of a vma that does not have the VM_EXEC bit set get mapped only in the data address space. Any try to execute code on such a page will cause a page translation exception. The standard reaction to this is a SIGSEGV with two exceptions: the two system call opcodes 0x0a77 (sys_sigreturn) and 0x0aad (sys_rt_sigreturn) are allowed. They are stored by the kernel to the signal stack frame. Unfortunately, the signal return mechanism cannot be modified to use an SA_RESTORER because the exception unwinding code depends on the system call opcode stored behind the signal stack frame. This feature requires that user space is executed in secondary-space mode and the kernel in home-space mode, which means that the addressing modes need to be switched and that the noexec protection only works for user space. After switching the addressing modes, we cannot use the mvcp/mvcs instructions anymore to copy between kernel and user space. A new mvcos instruction has been added to the z9 EC/BC hardware which allows to copy between arbitrary address spaces, but on older hardware the page tables need to be walked manually. Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-02-05 21:18:17 +01:00
Heiko Carstens	2b67fc4606	[S390] Get rid of a lot of sparse warnings. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-02-05 21:16:47 +01:00
Heiko Carstens	d8ad075ef6	[S390] don't call handle_mm_fault() if in an atomic context. There are several places in the futex code where a spin_lock is held and still uaccesses happen. Deadlocks are avoided by increasing the preempt count. The pagefault handler will then not take any locks but will immediately search the fixup tables. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-01-09 10:18:50 +01:00
Heiko Carstens	22155914b6	[S390] uaccess_pt: add missing down_read() and convert to is_init(). Doesn't seem to be a good idea to duplicate code :) Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2006-12-08 15:53:49 +01:00
Peter Zijlstra	a866374aec	[PATCH] mm: pagefault_{disable,enable}() Introduce pagefault_{disable,enable}() and use these where previously we did manual preempt increments/decrements to make the pagefault handler do the atomic thing. Currently they still rely on the increased preempt count, but do not rely on the disabled preemption, this might go away in the future. (NOTE: the extra barrier() in pagefault_disable might fix some holes on machines which have too many registers for their own good) [heiko.carstens@de.ibm.com: s390 fix] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Nick Piggin <npiggin@suse.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:21 -08:00
Gerald Schaefer	59f35d53fd	[S390] Add dynamic size check for usercopy functions. Use a wrapper for copy_to/from_user to chose the best usercopy method. The mvcos instruction is better for sizes greater than 256 bytes, if mvcos is not available a page table walk is better for sizes greater than 1024 bytes. Also removed the redundant copy_to/from_user_std_small functions. Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2006-12-04 15:40:45 +01:00
Martin Schwidefsky	3c1fcfe229	[PATCH] Directed yield: direct yield of spinlocks for s390. Use the new diagnose 0x9c in the spinlock implementation for s390. It yields the remaining timeslice of the virtual cpu that tries to acquire a lock to the virtual cpu that is the current holder of the lock. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:22 -07:00
Martin Schwidefsky	94c12cc7d1	[S390] Inline assembly cleanup. Major cleanup of all s390 inline assemblies. They now have a common coding style. Quite a few have been shortened, mainly by using register asm variables. Use of the EX_TABLE macro helps as well. The atomic ops, bit ops and locking inlines new use the Q-constraint if a newer gcc is used. That results in slightly better code. Thanks to Christian Borntraeger for proof reading the changes. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2006-09-28 16:56:43 +02:00
Martin Schwidefsky	52149ba6b0	[S390] user readable uninitialised kernel memory. A user space program can read uninitialised kernel memory by appending to a file from a bad address and then reading the result back. The cause is the copy_from_user function that does not clear the remaining bytes of the kernel buffer after it got a fault on the user space address. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2006-09-28 16:56:03 +02:00
Martin Schwidefsky	d9f7a745d5	[S390] __div64_32 for 31 bit. The clocksource infrastructure introduced with commit `ad596171ed` broke 31 bit s390. The reason is that the do_div() primitive for 31 bit always had a restriction: it could only divide an unsigned 64 bit integer by an unsigned 31 bit integer. The clocksource code now uses do_div() with a base value that has the most significant bit set. The result is that clock->cycle_interval has a funny value which causes the linux time to jump around like mad. The solution is "obvious": implement a proper __div64_32 function for 31 bit s390. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2006-09-28 16:55:39 +02:00
Gerald Schaefer	6c2a9e6df6	[S390] Use alternative user-copy operations for new hardware. This introduces new user-copy operations which are optimized for copying more than 256 Bytes on new hardware. Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2006-09-20 15:59:44 +02:00
Gerald Schaefer	d02765d1af	[S390] Make user-copy operations run-time configurable. Introduces a struct uaccess_ops which allows setting user-copy operations at run-time. Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2006-09-20 15:59:42 +02:00
Martin Schwidefsky	af313e5a4f	[S390] broken copy_in_user function. The copy_in_user primitive does not work as advertised. If the source and target area are available copy_in_user copies one byte too much. If one of the memory areas is not available it does not copy as much data as it can, but up to 257 bytes less. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2006-08-30 14:33:30 +02:00
Heiko Carstens	d2c993d845	[S390] Fix sparse warnings. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2006-07-12 16:41:55 +02:00

1 2 3 4

161 Коммитов