WSL2-Linux-Kernel/arch/x86/lib/memcpy_32.c

#include <linux/string.h>
#include <linux/module.h>

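/*
 * <linux/string.h> may define memcpy and memset as macros; undefine
 * them so the out-of-line functions below can be defined and exported.
 */
#undef memcpy
#undef memset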

void *memcpy(void *to, const void *from, size_t n)
{
#ifdef CONFIG_X86_USE_3DNOW
	return __memcpy3d(to, from, n);
#else
	return __memcpy(to, from, n);
#endif
}
EXPORT_SYMBOL(memcpy);

void *memset(void *s, int c, size_t count)
{
	return __memset(s, c, count);
}
EXPORT_SYMBOL(memset);

void *memmove(void *dest, const void *src, size_t n)
{
	int d0, d1, d2;

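	if (dest < src) {
		/* The regions do not overlap, so memcpy() is safe. */
		if ((dest + n) < src)
			return memcpy(dest, src, n);
		else
			/* dest is below src: copy forward, one byte at a time. */
			__asm__ __volatile__(
				"rep\n\t"
				"movsb\n\t"
				: "=&c" (d0), "=&S" (d1), "=&D" (d2)
				: "0" (n),
				  "1" (src),
				  "2" (dest)
				: "memory");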
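	} else {
		/* The regions do not overlap, so memcpy() is safe. */
		if ((src + n) < dest)
			return memcpy(dest, src, n);
		else
			/*
			 * dest is at or above src: set the direction flag,
			 * copy backward starting from the last byte, then
			 * clear the direction flag again.
			 */
			__asm__ __volatile__(
				"std\n\t"
				"rep\n\t"
				"movsb\n\t"
				"cld"
				: "=&c" (d0), "=&S" (d1), "=&D" (d2)
				: "0" (n),
				  "1" (n-1+src),
				  "2" (n-1+dest)
				: "memory");
	}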

	return dest;
}
EXPORT_SYMBOL(memmove);
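
The direction choice that memmove() makes above can be illustrated in portable C. The following is a minimal user-space sketch, not the kernel's implementation: the name `sketch_memmove` is hypothetical, the byte loops stand in for the `rep movsb` sequences, and the non-overlapping fast path through memcpy() is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration of an overlap-safe copy.
 * Copies n bytes from src to dest and returns dest.
 */
void *sketch_memmove(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	if ((uintptr_t)d < (uintptr_t)s) {
		/* dest is below src: copying low to high never reads a
		 * byte that this loop has already overwritten. */
		while (n--)
			*d++ = *s++;
	} else {
		/* dest is at or above src: start one past the last byte
		 * and copy high to low, like the std/cld variant. */
		d += n;
		s += n;
		while (n--)
			*--d = *--s;
	}
	return dest;
}
```

Calling `sketch_memmove(buf + 2, buf, 6)` on a buffer that holds `abcdefgh` takes the backward branch and leaves `ababcdef`; a forward copy in that case would overwrite source bytes before reading them.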
Commits touching this file, from the blame view:

Linux-2.6.12-rc2
2005-04-17 02:20:36 +04:00

Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!

x86, mem: Optimize memcpy by avoiding memory false dependence
2010-06-28 23:24:25 +04:00

All read operations after the allocation stage can run speculatively, while all write operations run in program order. If the addresses differ, a read may run before an older write; otherwise it waits until the write commits. However, the CPU does not check every address bit, so a read can fail to recognize that two addresses differ even when they are in different pages. For example, if rsi is 0xf004 and rdi is 0xe008, the following sequence incurs a large latency penalty:

1. movq (%rsi), %rax
2. movq %rax, (%rdi)
3. movq 8(%rsi), %rax
4. movq %rax, 8(%rdi)

If %rsi and %rdi really were in the same memory page, there would be a TRUE read-after-write dependence, because instruction 2 writes 0x008 and instruction 3 reads 0x00c, so the two accesses partially overlap. In fact they are in different pages and there is no conflict, but without checking every address bit the CPU can assume they are in the same page. Instruction 3 then has to wait for instruction 2 to write its data from the write buffer into the cache before it can load from the cache; the time the read spends waiting is equal to that of an mfence instruction. We can avoid this by reordering the operations as follows:

1. movq 8(%rsi), %rax
2. movq %rax, 8(%rdi)
3. movq (%rsi), %rax
4. movq %rax, (%rdi)

Instruction 3 reads 0x004 and instruction 2 writes 0x010, so there is no dependence. On Core2 this gives a 1.83x speedup over the original instruction sequence.

This patch first handles small sizes (less than 20 bytes), then jumps to different copy modes. In our micro-benchmark, sizes from 1 to 127 bytes improve by up to 2X, and 1024 bytes improves by up to 1.5X on Corei7. (We use our own micro-benchmark and will do further testing according to your requirements.)

Signed-off-by: Ma Ling <ling.ma@intel.com>
LKML-Reference: <1277753065-18610-1-git-send-email-ling.ma@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

x86, mem: Don't implement forward memmove() as memcpy()
2010-08-24 01:11:12 +04:00

memmove() allows the source and destination regions to overlap, but memcpy() has no such requirement. Therefore, explicitly implement memmove() in both the forward and backward directions, to give us the ability to optimize memcpy().

Signed-off-by: Ma Ling <ling.ma@intel.com>
LKML-Reference: <C10D3FB0CD45994C8A51FEC1227CE22F0E483AD86A@shsmsx502.ccr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
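
The false-dependence arithmetic in the memcpy commit message can be checked with a small model. This is only a sketch under a stated assumption: the CPU is modeled as comparing just the low 12 bits of each address (the offset within a 4 KiB page), which is a simplification of the behavior the commit message describes, and the helper name `page_offsets_overlap` is hypothetical. Accesses that wrap across a page boundary are ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model (assumption): a load is treated as dependent on an
 * older store when their byte ranges overlap after both addresses are
 * truncated to the 12-bit page offset.
 */
bool page_offsets_overlap(uintptr_t load, uintptr_t store, size_t width)
{
	uintptr_t l = load & 0xfff;   /* page offset of the load  */
	uintptr_t s = store & 0xfff;  /* page offset of the store */

	/* Half-open ranges [l, l + width) and [s, s + width) intersect. */
	return l < s + width && s < l + width;
}
```

With rsi = 0xf004 and rdi = 0xe008, the original order has the load from 8(%rsi) at 0xf00c following the store to (%rdi) at 0xe008, and `page_offsets_overlap(0xf00c, 0xe008, 8)` is true. In the reordered sequence the load from (%rsi) at 0xf004 follows the store to 8(%rdi) at 0xe010, and `page_offsets_overlap(0xf004, 0xe010, 8)` is false.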