mirror of https://github.com/microsoft/X-Mem.git
Major revision! Works for GNU/Linux on x86, x86-64, x86-64 AVX, ARMv7, and hopefully ARMv8 (no CPU to test, but compiles). On Windows, works for x86, x86-64, x86-64 AVX. ARM port on Windows still needs to be tested.
Parent: 754272aa65
Commit: 43ca5d63d4

Doxyfile (2 lines changed)
@@ -38,7 +38,7 @@ PROJECT_NAME = X-Mem
 # could be handy for archiving the generated documentation or if some version
 # control system is used.

-PROJECT_NUMBER = 2.1.16
+PROJECT_NUMBER = 2.2.0

 # Using the PROJECT_BRIEF tag one can provide an optional one line description
 # for a project that appears at the top of each page and should give viewer a

README.md (12 lines changed)
@@ -1,7 +1,7 @@
 README
 ------------------------------------------------------------------------------------------------------------

-X-Mem: Extensible Memory Benchmarking Tool v2.1.16
+X-Mem: Extensible Memory Benchmarking Tool v2.2.0
 ------------------------------------------------------------------------------------------------------------

 The flexible open-source research tool for characterizing memory hierarchy throughput, latency, and power.
@@ -50,13 +50,13 @@ Flexibility: Easy reconfiguration for different combinations of tests
 - Multi-threading support
 - Large page support

-Extensibility: modularity via C++ object-oriented principles
+Extensibility: Modularity via C++ object-oriented principles
 - Supports rapid addition of new benchmark kernel routines
 - Example: stream triad algorithm, impact of false sharing, etc. are possible with minor changes

 Cross-platform: Currently implemented for Windows and GNU/Linux on x86, x86-64, and x86-64 with AVX extensions CPUs
 - Designed to allow straightforward porting to other operating systems and ISAs
-- ARM port under development
+- ARM port under development (currently implemented for GNU/Linux for 32-bit and 64-bit ARM)

 Memory throughput:
 - Accurate measurement of sustained memory throughput to all levels of cache and memory
@@ -79,7 +79,7 @@ Documentation:

 INCLUDED EXTENSIONS (under src/include/ext and src/ext directories):
 - Loaded latency benchmark variant with load delays inserted as nop instructions between memory instructions.
-This is done for 64 and 256-bit chunks on x86-64 with AVX extensions, forward sequential read load threads only at the moment.
+This is done for 32, 64, 128, and 256-bit load chunk sizes where applicable using the forward sequential read pattern.

 For feature requests, please refer to the contact information at the end of this README.

@@ -91,8 +91,8 @@ There are a few runtime prerequisites in order for the software to run correctly

 HARDWARE:

-- Intel x86 or x86-64 CPU with optional support for AVX extensions. AMD CPUs should also work although this has not been tested.
-- COMING SOON: ARM CPUs
+- Intel x86, x86-64, or x86-64 with AVX CPU. AMD CPUs should also work although this has not been tested.
+- ARM Cortex-A series processors with VFP and NEON extensions. Specifically tested on ARM Cortex A9 (32-bit) which is ARMv7. 64-bit builds for ARMv8-A should also work but have not been tested.

 WINDOWS:
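
The INCLUDED EXTENSIONS entry describes load threads that insert nop instructions between memory reads. Below is a minimal sketch of that idea, assuming GCC or Clang (for `__asm__ __volatile__`) on a CPU with a `nop` mnemonic, which covers x86 and ARM. The names `nop_once`, `nop_run`, and `delayed_forward_read` are hypothetical and not X-Mem's API; the project's real kernels (`forwSequentialRead_Word32_DelayN`) are written out per delay with `UNROLL` macros and `my_nopN()` helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// One nop instruction (GCC/Clang inline-asm syntax; assumption).
inline void nop_once() { __asm__ __volatile__("nop"); }

// Expands to one nop_once() call per pack element. Once inlined (e.g. at -O2)
// this is straight-line nops with no loop counter of its own.
template <std::size_t... I>
inline void nop_run(std::index_sequence<I...>) {
    ((static_cast<void>(I), nop_once()), ...);  // C++17 fold over the comma operator
}

// Hypothetical delay-injected kernel: read each 32-bit word in forward order,
// then spend Delay nops before the next load. Returns the number of loads.
template <std::size_t Delay>
std::size_t delayed_forward_read(const std::uint32_t* start, const std::uint32_t* end) {
    assert(start <= end);
    std::size_t loads = 0;
    for (const volatile std::uint32_t* p = start; p < end; ++p) {
        std::uint32_t val = *p;  // volatile access: the load cannot be elided
        static_cast<void>(val);
        nop_run(std::make_index_sequence<Delay>{});
        ++loads;
    }
    return loads;
}
```

Generating the nops at compile time rather than in a counted runtime loop avoids adding increment, compare, and branch instructions to the delay itself; that is presumably also why X-Mem defines one kernel per delay value.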

X-Mem_Developer_Manual.pdf: binary file not shown.
bin/xmem-linux-arm: binary file not shown.
bin/xmem-linux-arm64: binary file not shown.
bin/xmem-linux-x64: binary file not shown.
bin/xmem-linux-x64_avx: binary file not shown.
bin/xmem-linux-x86: binary file not shown.

@@ -654,9 +654,13 @@ bool BenchmarkManager::runExtDelayInjectedLoadedLatencyBenchmark() {

     //Put the enumerations into vectors to make constructing benchmarks more loopable
     std::vector<chunk_size_t> chunks;
+    chunks.push_back(CHUNK_32b);
 #ifdef HAS_WORD_64
     chunks.push_back(CHUNK_64b);
 #endif
+#ifdef HAS_WORD_128
+    chunks.push_back(CHUNK_128b);
+#endif
 #ifdef HAS_WORD_256
     chunks.push_back(CHUNK_256b);
 #endif
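
The BenchmarkManager hunk builds its chunk-size list from `HAS_WORD_*` macros, so each build offers only the load widths its target supports, and it now adds `CHUNK_32b` unconditionally. The sketch below reproduces that pattern in self-contained form. Everything prefixed `SKETCH_`, and the mapping from compiler-predefined macros (`__x86_64__`, `_M_X64`, `__aarch64__`, `__ARM_NEON`, `__AVX__`) to word widths, is an assumption for illustration, not X-Mem's actual header logic.

```cpp
#include <cassert>
#include <vector>

// Hypothetical feature detection from compiler-predefined macros; X-Mem's
// real HAS_WORD_* definitions live in its own headers and may differ.
#if defined(__x86_64__) || defined(_M_X64) || defined(__aarch64__)
#define SKETCH_HAS_WORD_64
#endif
#if defined(__ARM_NEON)
#define SKETCH_HAS_WORD_128
#endif
#if defined(__AVX__)
#define SKETCH_HAS_WORD_256
#endif

enum SketchChunk { SKETCH_CHUNK_32b, SKETCH_CHUNK_64b, SKETCH_CHUNK_128b, SKETCH_CHUNK_256b };

// Same shape as the hunk: 32-bit always, wider chunks only when compiled in.
std::vector<SketchChunk> available_chunks() {
    std::vector<SketchChunk> chunks;
    chunks.push_back(SKETCH_CHUNK_32b);
#ifdef SKETCH_HAS_WORD_64
    chunks.push_back(SKETCH_CHUNK_64b);
#endif
#ifdef SKETCH_HAS_WORD_128
    chunks.push_back(SKETCH_CHUNK_128b);
#endif
#ifdef SKETCH_HAS_WORD_256
    chunks.push_back(SKETCH_CHUNK_256b);
#endif
    assert(!chunks.empty());  // every build has at least the 32-bit chunk
    return chunks;
}
```

Because the 32-bit entry is pushed unconditionally, a build that defines none of the wider word types still has at least one chunk size to test.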
@@ -55,12 +55,11 @@ Configurator::Configurator(
     __runThroughput(true),
     __working_set_size_per_thread(DEFAULT_WORKING_SET_SIZE_PER_THREAD),
     __num_worker_threads(DEFAULT_NUM_WORKER_THREADS),
-#ifndef HAS_WORD_64
-    __use_chunk_32b(true),
-#endif
 #ifdef HAS_WORD_64
     __use_chunk_32b(false),
     __use_chunk_64b(true),
+#else
+    __use_chunk_32b(true),
 #endif
 #ifdef HAS_WORD_128
     __use_chunk_128b(false),
@@ -70,8 +69,7 @@ Configurator::Configurator(
 #endif
 #ifdef HAS_NUMA
     __numa_enabled(true),
-#endif
-#ifndef HAS_NUMA
+#else
     __numa_enabled(false),
 #endif
     __iterations(1),
@@ -97,85 +95,6 @@ Configurator::Configurator(
 {
 }

-//TODO: delete this monstrosity
-/*Configurator::Configurator(
-    bool runExtensions,
-#ifdef EXT_DELAY_INJECTED_LOADED_LATENCY_BENCHMARK
-    bool run_ext_delay_injected_loaded_latency_benchmark,
-#endif
-#ifdef EXT_STREAM_BENCHMARK
-    bool run_ext_stream_benchmark,
-#endif
-    bool runLatency,
-    bool runThroughput,
-    size_t working_set_size_per_thread,
-    uint32_t num_worker_threads,
-    bool use_chunk_32b,
-    bool use_chunk_64b,
-    bool use_chunk_128b,
-    bool use_chunk_256b,
-    bool numa_enabled,
-    uint32_t iterations_per_test,
-    bool use_random_access_pattern,
-    bool use_sequential_access_pattern,
-    uint32_t starting_test_index,
-    std::string filename,
-    bool use_output_file,
-    bool verbose,
-    bool use_large_pages,
-    bool use_reads,
-    bool use_writes,
-    bool use_stride_p1,
-    bool use_stride_n1,
-    bool use_stride_p2,
-    bool use_stride_n2,
-    bool use_stride_p4,
-    bool use_stride_n4,
-    bool use_stride_p8,
-    bool use_stride_n8,
-    bool use_stride_p16,
-    bool use_stride_n16
-    ) :
-    __configured(true),
-    __runExtensions(runExtensions),
-#ifdef EXT_DELAY_INJECTED_LOADED_LATENCY_BENCHMARK
-    __run_ext_delay_injected_loaded_latency_benchmark(run_ext_delay_injected_loaded_latency_benchmark),
-#endif
-#ifdef EXT_STREAM_BENCHMARK
-    __run_ext_stream_benchmark(run_ext_stream_benchmark),
-#endif
-    __runLatency(runLatency),
-    __runThroughput(runThroughput),
-    __working_set_size_per_thread(working_set_size_per_thread),
-    __num_worker_threads(num_worker_threads),
-    __use_chunk_32b(use_chunk_32b),
-    __use_chunk_64b(use_chunk_64b),
-    __use_chunk_128b(use_chunk_128b),
-    __use_chunk_256b(use_chunk_256b),
-    __numa_enabled(numa_enabled),
-    __iterations(iterations_per_test),
-    __use_random_access_pattern(use_random_access_pattern),
-    __use_sequential_access_pattern(use_sequential_access_pattern),
-    __starting_test_index(starting_test_index),
-    __filename(filename),
-    __use_output_file(use_output_file),
-    __verbose(verbose),
-    __use_large_pages(use_large_pages),
-    __use_reads(use_reads),
-    __use_writes(use_writes),
-    __use_stride_p1(use_stride_p1),
-    __use_stride_n1(use_stride_n1),
-    __use_stride_p2(use_stride_p2),
-    __use_stride_n2(use_stride_n2),
-    __use_stride_p4(use_stride_p4),
-    __use_stride_n4(use_stride_n4),
-    __use_stride_p8(use_stride_p8),
-    __use_stride_n8(use_stride_n8),
-    __use_stride_p16(use_stride_p16),
-    __use_stride_n16(use_stride_n16)
-{
-}*/
-
 int32_t Configurator::configureFromInput(int argc, char* argv[]) {
     if (__configured) { //If this object was already configured, cannot override from user inputs. This is to prevent an invalid state.
         std::cerr << "WARNING: Something bad happened when configuring X-Mem. This is probably not your fault." << std::endl;
@@ -293,8 +212,12 @@ int32_t Configurator::configureFromInput(int argc, char* argv[]) {
     }

     //Check NUMA selection
-    if (options[NUMA_DISABLE]) //NUMA is not supported currently on anything but x86-64 systems anyway.
+    if (options[NUMA_DISABLE]) {
         __numa_enabled = false;
+#ifndef HAS_NUMA
+        std::cerr << "WARNING: NUMA is not supported on this build, so the NUMA-disable option has no effect." << std::endl;
+#endif
+    }

     //Check if large pages should be used for allocation of memory under test.
     if (options[USE_LARGE_PAGES]) {
@@ -639,18 +562,24 @@ int32_t Configurator::configureFromInput(int argc, char* argv[]) {
         std::cout << std::endl;
         std::cout << "---> Number of worker threads: ";
         std::cout << __num_worker_threads << std::endl;
-#ifdef HAS_NUMA
         std::cout << "---> NUMA enabled: ";
+#ifdef HAS_NUMA
         if (__numa_enabled)
             std::cout << "yes" << std::endl;
         else
             std::cout << "no" << std::endl;
+#else
+        std::cout << "not supported" << std::endl;
 #endif
         std::cout << "---> Large pages: ";
+#ifdef HAS_LARGE_PAGES
         if (__use_large_pages)
             std::cout << "yes" << std::endl;
         else
             std::cout << "no" << std::endl;
+#else
+        std::cout << "not supported" << std::endl;
+#endif
         std::cout << "---> Iterations: ";
         std::cout << __iterations << std::endl;
         std::cout << "---> Starting test index: ";
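
The Configurator hunks replace an `#ifdef`/`#ifndef` pair with a single `#ifdef`/`#else`, so every build initializes each member exactly once, and the settings summary now reports "not supported" for NUMA and large pages when the build lacks them. A reduced sketch of both patterns follows. `SketchConfig`, `disableNuma`, `numaStatus`, and the `SKETCH_HAS_NUMA` macro are hypothetical stand-ins for `Configurator`, its option handling, and `HAS_NUMA`.

```cpp
#include <cassert>
#include <string>

// Hypothetical, reduced stand-in for Configurator. Define SKETCH_HAS_NUMA to
// model a build with NUMA support.
class SketchConfig {
public:
    SketchConfig() :
#ifdef SKETCH_HAS_NUMA
        numa_enabled_(true)   // support compiled in: enabled by default
#else
        numa_enabled_(false)  // no support: the flag can never become true
#endif
    {}

    // Handles a "disable NUMA" option. Returns false when the option is a
    // no-op because NUMA support was not compiled in, so the caller can warn.
    bool disableNuma() {
        numa_enabled_ = false;
#ifdef SKETCH_HAS_NUMA
        return true;
#else
        return false;
#endif
    }

    // Three-way status, as in the settings printout: yes / no / not supported.
    std::string numaStatus() const {
#ifdef SKETCH_HAS_NUMA
        return numa_enabled_ ? "yes" : "no";
#else
        assert(!numa_enabled_);
        return "not supported";
#endif
    }

private:
    bool numa_enabled_;
};
```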
@@ -120,6 +120,61 @@ bool DelayInjectedLoadedLatencyBenchmark::_run_core() {
     SequentialFunction load_kernel_dummy_fptr = NULL;
     if (_num_worker_threads > 1) { //If we only have one worker thread, it is used for latency measurement only, and no load threads will be used.
         switch (_chunk_size) {
+            case CHUNK_32b:
+                switch (__delay) {
+                    case 0:
+                        load_kernel_fptr = &forwSequentialRead_Word32; //not an extended kernel
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32; //not an extended kernel
+                        break;
+                    case 1:
+                        load_kernel_fptr = &forwSequentialRead_Word32_Delay1;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay1;
+                        break;
+                    case 2:
+                        load_kernel_fptr = &forwSequentialRead_Word32_Delay2;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay2;
+                        break;
+                    case 4:
+                        load_kernel_fptr = &forwSequentialRead_Word32_Delay4;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay4;
+                        break;
+                    case 8:
+                        load_kernel_fptr = &forwSequentialRead_Word32_Delay8;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay8;
+                        break;
+                    case 16:
+                        load_kernel_fptr = &forwSequentialRead_Word32_Delay16;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay16;
+                        break;
+                    case 32:
+                        load_kernel_fptr = &forwSequentialRead_Word32_Delay32;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay32;
+                        break;
+                    case 64:
+                        load_kernel_fptr = &forwSequentialRead_Word32_Delay64;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay64;
+                        break;
+                    case 128:
+                        load_kernel_fptr = &forwSequentialRead_Word32_Delay128;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay128;
+                        break;
+                    case 256:
+                        load_kernel_fptr = &forwSequentialRead_Word32_Delay256;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay256;
+                        break;
+                    case 512:
+                        load_kernel_fptr = &forwSequentialRead_Word32_Delay512;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay512plus;
+                        break;
+                    case 1024:
+                        load_kernel_fptr = &forwSequentialRead_Word32_Delay1024;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay512plus;
+                        break;
+                    default:
+                        std::cerr << "ERROR: Failed to find appropriate benchmark kernel." << std::endl;
+                        return false;
+                }
+                break;
 #ifdef HAS_WORD_64
             case CHUNK_64b:
                 switch (__delay) {
@@ -177,6 +232,63 @@ bool DelayInjectedLoadedLatencyBenchmark::_run_core() {
                 }
                 break;
 #endif
+#ifdef HAS_WORD_128
+            case CHUNK_128b:
+                switch (__delay) {
+                    case 0:
+                        load_kernel_fptr = &forwSequentialRead_Word128; //not an extended kernel
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128; //not an extended kernel
+                        break;
+                    case 1:
+                        load_kernel_fptr = &forwSequentialRead_Word128_Delay1;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay1;
+                        break;
+                    case 2:
+                        load_kernel_fptr = &forwSequentialRead_Word128_Delay2;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay2;
+                        break;
+                    case 4:
+                        load_kernel_fptr = &forwSequentialRead_Word128_Delay4;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay4;
+                        break;
+                    case 8:
+                        load_kernel_fptr = &forwSequentialRead_Word128_Delay8;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay8;
+                        break;
+                    case 16:
+                        load_kernel_fptr = &forwSequentialRead_Word128_Delay16;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay16;
+                        break;
+                    case 32:
+                        load_kernel_fptr = &forwSequentialRead_Word128_Delay32;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay32;
+                        break;
+                    case 64:
+                        load_kernel_fptr = &forwSequentialRead_Word128_Delay64;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay64;
+                        break;
+                    case 128:
+                        load_kernel_fptr = &forwSequentialRead_Word128_Delay128;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay128plus;
+                        break;
+                    case 256:
+                        load_kernel_fptr = &forwSequentialRead_Word128_Delay256;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay128plus;
+                        break;
+                    case 512:
+                        load_kernel_fptr = &forwSequentialRead_Word128_Delay512;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay128plus;
+                        break;
+                    case 1024:
+                        load_kernel_fptr = &forwSequentialRead_Word128_Delay1024;
+                        load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay128plus;
+                        break;
+                    default:
+                        std::cerr << "ERROR: Failed to find appropriate benchmark kernel." << std::endl;
+                        return false;
+                }
+                break;
+#endif
 #ifdef HAS_WORD_256
             case CHUNK_256b:
                 switch (__delay) {
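
In `_run_core`, each chunk width selects a (kernel, dummy kernel) pair through a nested `switch` with one case per supported delay, and any other delay falls through to the `default:` error path. One possible alternative, shown only as a sketch and not as X-Mem's design, keeps each width's delay list in a table. The stub kernels `read_w32` and `dummy_w32`, `KernelPair`, and `find_kernels` are hypothetical, and only the 32-bit cases are modeled.

```cpp
#include <cassert>
#include <cstdint>

// Same shape as a SequentialFunction kernel: (start, end) -> int32_t.
using SketchFn = std::int32_t (*)(void*, void*);

// Hypothetical stubs for forwSequentialRead_Word32_DelayN and its dummy.
// They only report which delay they stand for, so lookups can be checked.
template <unsigned D> std::int32_t read_w32(void*, void*) { return static_cast<std::int32_t>(D); }
template <unsigned D> std::int32_t dummy_w32(void*, void*) { return static_cast<std::int32_t>(D); }

struct KernelPair {
    unsigned delay;
    SketchFn kernel;
    SketchFn dummy;
};

// One row per supported delay. Delays 512 and 1024 share one dummy, as the
// Delay512plus dummy does in the switch.
const KernelPair kWord32Kernels[] = {
    {0, &read_w32<0>, &dummy_w32<0>},       {1, &read_w32<1>, &dummy_w32<1>},
    {2, &read_w32<2>, &dummy_w32<2>},       {4, &read_w32<4>, &dummy_w32<4>},
    {8, &read_w32<8>, &dummy_w32<8>},       {16, &read_w32<16>, &dummy_w32<16>},
    {32, &read_w32<32>, &dummy_w32<32>},    {64, &read_w32<64>, &dummy_w32<64>},
    {128, &read_w32<128>, &dummy_w32<128>}, {256, &read_w32<256>, &dummy_w32<256>},
    {512, &read_w32<512>, &dummy_w32<512>}, {1024, &read_w32<1024>, &dummy_w32<512>},
};

// Fills *kernel and *dummy and returns true for a supported delay; returns
// false otherwise, like the switch's default case.
bool find_kernels(unsigned delay, SketchFn* kernel, SketchFn* dummy) {
    assert(kernel != nullptr && dummy != nullptr);
    for (const KernelPair& row : kWord32Kernels) {
        if (row.delay == delay) {
            *kernel = row.kernel;
            *dummy = row.dummy;
            return true;
        }
    }
    return false;
}
```

The table keeps the supported delays and the shared `Delay512plus` dummy visible in one place; the linear search is negligible because it runs once, before the benchmark starts.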
@@ -52,6 +52,96 @@ using namespace xmem;

 /* -------------------- DUMMY BENCHMARK ROUTINES ------------------------- */

+int32_t xmem::dummy_forwSequentialLoop_Word32_Delay1(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL512(wordptr++;)
+        placeholder = 0;
+    }
+    return placeholder;
+}
+
+int32_t xmem::dummy_forwSequentialLoop_Word32_Delay2(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL256(wordptr++;)
+        placeholder = 0;
+    }
+    return placeholder;
+}
+
+int32_t xmem::dummy_forwSequentialLoop_Word32_Delay4(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL128(wordptr++;)
+        placeholder = 0;
+    }
+    return placeholder;
+}
+
+int32_t xmem::dummy_forwSequentialLoop_Word32_Delay8(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL64(wordptr++;)
+        placeholder = 0;
+    }
+    return placeholder;
+}
+
+int32_t xmem::dummy_forwSequentialLoop_Word32_Delay16(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL32(wordptr++;)
+        placeholder = 0;
+    }
+    return placeholder;
+}
+
+int32_t xmem::dummy_forwSequentialLoop_Word32_Delay32(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL16(wordptr++;)
+        placeholder = 0;
+    }
+    return placeholder;
+}
+
+int32_t xmem::dummy_forwSequentialLoop_Word32_Delay64(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL8(wordptr++;)
+        placeholder = 0;
+    }
+    return placeholder;
+}
+
+int32_t xmem::dummy_forwSequentialLoop_Word32_Delay128(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL4(wordptr++;)
+        placeholder = 0;
+    }
+    return placeholder;
+}
+
+int32_t xmem::dummy_forwSequentialLoop_Word32_Delay256(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL2(wordptr++;)
+        placeholder = 0;
+    }
+    return placeholder;
+}
+
+int32_t xmem::dummy_forwSequentialLoop_Word32_Delay512plus(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        wordptr++;
+        placeholder = 0;
+    }
+    return placeholder;
+}
+
 #ifdef HAS_WORD_64
 int32_t xmem::dummy_forwSequentialLoop_Word64_Delay1(void* start_address, void* end_address) {
     volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
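
Each `dummy_forwSequentialLoop_*` routine added above walks the same address range with the same unroll factor as its kernel but never loads the data, which suggests it exists to measure loop overhead. Assuming that is its role (this diff does not show the call site), the overhead could be removed by timing both routines over the same range and subtracting, as in this sketch. `time_ns` and `net_of_overhead` are hypothetical helpers, and X-Mem's own timing code may differ, for example by using cycle counters rather than `std::chrono`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Same shape as a SequentialFunction kernel: (start, end) -> int32_t.
using SketchFn = std::int32_t (*)(void*, void*);

// Wall-clock duration of one call in nanoseconds (steady_clock is monotonic).
std::int64_t time_ns(SketchFn fn, void* start, void* end) {
    const auto t0 = std::chrono::steady_clock::now();
    fn(start, end);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}

// Kernel time minus the matching dummy's loop overhead. Clamped at zero,
// since timer noise on short runs can make the dummy look slower.
std::int64_t net_of_overhead(std::int64_t kernel_ns, std::int64_t dummy_ns) {
    assert(kernel_ns >= 0 && dummy_ns >= 0);
    const std::int64_t net = kernel_ns - dummy_ns;
    return net > 0 ? net : 0;
}
```

A caller would time the kernel and its dummy over the same buffer, ideally repeating each several times, and pass both results to `net_of_overhead`.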
@@ -135,6 +225,80 @@ int32_t xmem::dummy_forwSequentialLoop_Word64_Delay256plus(void* start_address,
 }
 #endif

+#ifdef HAS_WORD_128
+int32_t xmem::dummy_forwSequentialLoop_Word128_Delay1(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
+        UNROLL128(wordptr++;)
+        placeholder = 0;
+    }
+    return placeholder;
+}
+
+int32_t xmem::dummy_forwSequentialLoop_Word128_Delay2(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
+        UNROLL64(wordptr++;)
+        placeholder = 0;
+    }
+    return placeholder;
+}
+
+int32_t xmem::dummy_forwSequentialLoop_Word128_Delay4(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
+        UNROLL32(wordptr++;)
+        placeholder = 0;
+    }
+    return placeholder;
+}
+
+int32_t xmem::dummy_forwSequentialLoop_Word128_Delay8(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
+        UNROLL16(wordptr++;)
+        placeholder = 0;
+    }
+    return placeholder;
+}
+
+int32_t xmem::dummy_forwSequentialLoop_Word128_Delay16(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
+        UNROLL8(wordptr++;)
+        placeholder = 0;
+    }
+    return placeholder;
+}
+
+int32_t xmem::dummy_forwSequentialLoop_Word128_Delay32(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
+        UNROLL4(wordptr++;)
+        placeholder = 0;
+    }
+    return placeholder;
+}
+
+int32_t xmem::dummy_forwSequentialLoop_Word128_Delay64(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
+        UNROLL2(wordptr++;)
+        placeholder = 0;
+    }
+    return placeholder;
+}
+
+int32_t xmem::dummy_forwSequentialLoop_Word128_Delay128plus(void* start_address, void* end_address) {
+    volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
+    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
+        wordptr++;
+        placeholder = 0;
+    }
+    return placeholder;
+}
+#endif
+
 #ifdef HAS_WORD_256
 int32_t xmem::dummy_forwSequentialLoop_Word256_Delay1(void* start_address, void* end_address) {
     volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
@@ -202,6 +366,94 @@ int32_t xmem::dummy_forwSequentialLoop_Word256_Delay64plus(void* start_address,

 /* -------------------- CORE BENCHMARK ROUTINES -------------------------- */

+int32_t xmem::forwSequentialRead_Word32_Delay1(void* start_address, void* end_address) {
+    register Word32_t val;
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL512(val = *wordptr++; my_nop();)
+    }
+    return 0;
+}
+
+int32_t xmem::forwSequentialRead_Word32_Delay2(void* start_address, void* end_address) {
+    register Word32_t val;
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL256(val = *wordptr++; my_nop2();)
+    }
+    return 0;
+}
+
+int32_t xmem::forwSequentialRead_Word32_Delay4(void* start_address, void* end_address) {
+    register Word32_t val;
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL128(val = *wordptr++; my_nop4();)
+    }
+    return 0;
+}
+
+int32_t xmem::forwSequentialRead_Word32_Delay8(void* start_address, void* end_address) {
+    register Word32_t val;
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL64(val = *wordptr++; my_nop8();)
+    }
+    return 0;
+}
+
+int32_t xmem::forwSequentialRead_Word32_Delay16(void* start_address, void* end_address) {
+    register Word32_t val;
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL32(val = *wordptr++; my_nop16();)
+    }
+    return 0;
+}
+
+int32_t xmem::forwSequentialRead_Word32_Delay32(void* start_address, void* end_address) {
+    register Word32_t val;
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL16(val = *wordptr++; my_nop32();)
+    }
+    return 0;
+}
+
+int32_t xmem::forwSequentialRead_Word32_Delay64(void* start_address, void* end_address) {
+    register Word32_t val;
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL8(val = *wordptr++; my_nop64();)
+    }
+    return 0;
+}
+
+int32_t xmem::forwSequentialRead_Word32_Delay128(void* start_address, void* end_address) {
+    register Word32_t val;
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL4(val = *wordptr++; my_nop128();)
+    }
+    return 0;
+}
+
+int32_t xmem::forwSequentialRead_Word32_Delay256(void* start_address, void* end_address) {
+    register Word32_t val;
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        UNROLL2(val = *wordptr++; my_nop256();)
+    }
+    return 0;
+}
+
+int32_t xmem::forwSequentialRead_Word32_Delay512(void* start_address, void* end_address) {
+    register Word32_t val;
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        val = *wordptr++; my_nop512();
+    }
+    return 0;
+}
+
+int32_t xmem::forwSequentialRead_Word32_Delay1024(void* start_address, void* end_address) {
+    register Word32_t val;
+    for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
+        val = *wordptr++; my_nop1024();
+    }
+    return 0;
+}
+
 #ifdef HAS_WORD_64
 int32_t xmem::forwSequentialRead_Word64_Delay1(void* start_address, void* end_address) {
     register Word64_t val;
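
Comparing the dummy and core routines in this diff shows a consistent unroll rule. For 32-bit words, `UNROLL512` at Delay1 halves with each doubling of the delay, down to a single access at Delay512; for 128-bit words, `UNROLL128` at Delay1 halves down to one access at `Delay128plus`; and the `Word64_Delay256plus` and `Word256_Delay64plus` dummies fit the same pattern. In every case, word width in bits × delay × unroll factor equals 16384, with the unroll factor floored at 1. This is an inference from the code shown, not a documented X-Mem rule; `implied_unroll` and `shared_dummy_threshold` are hypothetical helpers that encode it.

```cpp
#include <cassert>

// Word widths, in bits, of the Word32/64/128/256 types used in this diff.
inline bool is_supported_width(unsigned width_bits) {
    return width_bits == 32 || width_bits == 64 || width_bits == 128 || width_bits == 256;
}

// Unroll factor implied by the kernels in this diff (an inference, not an
// X-Mem API): width_bits * delay * unroll == 16384, floored at 1.
inline unsigned implied_unroll(unsigned width_bits, unsigned delay) {
    assert(is_supported_width(width_bits) && delay >= 1);
    const unsigned width_times_delay = width_bits * delay;
    return width_times_delay >= 16384u ? 1u : 16384u / width_times_delay;
}

// Smallest delay whose dummy loop is not unrolled, i.e. where the shared
// "...plus" dummy (Delay512plus, Delay128plus, ...) takes over.
inline unsigned shared_dummy_threshold(unsigned width_bits) {
    assert(is_supported_width(width_bits));
    return 16384u / width_bits;
}
```

Assuming `my_nopN()` emits N nops, as its name suggests, a 32-bit Delay1 iteration issues 512 loads and 512 nops, while a Delay256 iteration issues 2 loads and 512 nops. The nop count per loop iteration therefore stays at 16384 / width until the unroll factor bottoms out at 1.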
@@ -292,6 +544,150 @@ int32_t xmem::forwSequentialRead_Word64_Delay1024(void* start_address, void* end
}
#endif

#ifdef HAS_WORD_128
int32_t xmem::forwSequentialRead_Word128_Delay1(void* start_address, void* end_address) {
#ifdef _WIN32
    return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
    register Word128_t val;
    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
        UNROLL128(val = *wordptr++; my_nop();)
    }
    return 0;
#endif
}

int32_t xmem::forwSequentialRead_Word128_Delay2(void* start_address, void* end_address) {
#ifdef _WIN32
    return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
    register Word128_t val;
    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
        UNROLL64(val = *wordptr++; my_nop2();)
    }
    return 0;
#endif
}

int32_t xmem::forwSequentialRead_Word128_Delay4(void* start_address, void* end_address) {
#ifdef _WIN32
    return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
    register Word128_t val;
    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
        UNROLL32(val = *wordptr++; my_nop4();)
    }
    return 0;
#endif
}

int32_t xmem::forwSequentialRead_Word128_Delay8(void* start_address, void* end_address) {
#ifdef _WIN32
    return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
    register Word128_t val;
    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
        UNROLL16(val = *wordptr++; my_nop8();)
    }
    return 0;
#endif
}

int32_t xmem::forwSequentialRead_Word128_Delay16(void* start_address, void* end_address) {
#ifdef _WIN32
    return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
    register Word128_t val;
    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
        UNROLL8(val = *wordptr++; my_nop16();)
    }
    return 0;
#endif
}

int32_t xmem::forwSequentialRead_Word128_Delay32(void* start_address, void* end_address) {
#ifdef _WIN32
    return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
    register Word128_t val;
    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
        UNROLL4(val = *wordptr++; my_nop32();)
    }
    return 0;
#endif
}

int32_t xmem::forwSequentialRead_Word128_Delay64(void* start_address, void* end_address) {
#ifdef _WIN32
    return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
    register Word128_t val;
    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
        UNROLL2(val = *wordptr++; my_nop64();)
    }
    return 0;
#endif
}

int32_t xmem::forwSequentialRead_Word128_Delay128(void* start_address, void* end_address) {
#ifdef _WIN32
    return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
    register Word128_t val;
    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
        val = *wordptr++; my_nop128();
    }
    return 0;
#endif
}

int32_t xmem::forwSequentialRead_Word128_Delay256(void* start_address, void* end_address) {
#ifdef _WIN32
    return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
    register Word128_t val;
    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
        val = *wordptr++; my_nop256();
    }
    return 0;
#endif
}

int32_t xmem::forwSequentialRead_Word128_Delay512(void* start_address, void* end_address) {
#ifdef _WIN32
    return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
    register Word128_t val;
    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
        val = *wordptr++; my_nop512();
    }
    return 0;
#endif
}

int32_t xmem::forwSequentialRead_Word128_Delay1024(void* start_address, void* end_address) {
#ifdef _WIN32
    return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
    register Word128_t val;
    for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
        val = *wordptr++; my_nop1024();
    }
    return 0;
#endif
}
#endif

#ifdef HAS_WORD_256
int32_t xmem::forwSequentialRead_Word256_Delay1(void* start_address, void* end_address) {
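The Word128 kernels in this hunk pair each delay with a shrinking unroll factor. Delay1 uses UNROLL128, Delay2 uses UNROLL64, and so on down to UNROLL2 for Delay64. From Delay128 upward the loop body is not unrolled at all. The Word32 kernels follow the same pattern: UNROLL2 at Delay256 and no unrolling at Delay512.

In each unrolled case, the loop body executes a fixed number of nops per iteration. The sketch below encodes that relationship so the pairings can be checked. `nops_per_iteration` and `unroll_factor` are hypothetical helpers, not part of X-Mem. The Word32 (512) and Word128 (128) budgets are read off the kernels. The Word64 (256) and Word256 (64) budgets are inferred only from the names of the `...Delay256plus` and `...Delay64plus` dummy loops.

```cpp
#include <cstdint>

// Hypothetical: nops executed per loop iteration by the unrolled kernels.
// Word32 (512) and Word128 (128) match the kernels in this diff; Word64 (256)
// and Word256 (64) are inferred from the dummy-loop names, not from code.
constexpr uint32_t nops_per_iteration(uint32_t word_bits) {
    return word_bits == 32  ? 512u
         : word_bits == 64  ? 256u
         : word_bits == 128 ? 128u
         : word_bits == 256 ? 64u
         : 0u; // unsupported word width
}

// Hypothetical: unroll factor paired with a delay. It is the per-iteration nop
// budget divided by the delay, or 1 (no unrolling) once a single access plus
// its delay already fills the budget. A delay of 0 is rejected with 0.
constexpr uint32_t unroll_factor(uint32_t word_bits, uint32_t delay) {
    return delay == 0 ? 0u
         : delay >= nops_per_iteration(word_bits) ? 1u
         : nops_per_iteration(word_bits) / delay;
}

// Pairings that appear in the kernels of this commit.
static_assert(unroll_factor(128, 1) == 128, "Word128_Delay1 uses UNROLL128");
static_assert(unroll_factor(128, 2) == 64, "Word128_Delay2 uses UNROLL64");
static_assert(unroll_factor(128, 64) == 2, "Word128_Delay64 uses UNROLL2");
static_assert(unroll_factor(128, 128) == 1, "Word128_Delay128 is not unrolled");
static_assert(unroll_factor(32, 256) == 2, "Word32_Delay256 uses UNROLL2");
static_assert(unroll_factor(32, 512) == 1, "Word32_Delay512 is not unrolled");
```

Under this reading, each word width's `...plus` dummy loop starts exactly where `unroll_factor` first reaches 1. From that delay upward, every kernel performs one access per iteration, and the loop bookkeeping no longer changes with the delay. That appears to be why a single dummy loop can cover all of the larger delays.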
@@ -139,81 +139,6 @@ namespace xmem {
         */
        Configurator();

        //TODO: delete this monstrosity
        /**
         * @brief Specialized constructor for when you don't want to get config from input, and you want to pass it in directly.
         * @param runExtensions Indicates if user-defined code should be run in addition to other standard functionality.
         * @param run_ext_delay_injected_loaded_latency_benchmark If true, and this extension is included at compile-time, run the delay-injected loaded latency benchmark extension.
         * @param run_ext_stream_benchmark If true, and this extension is included at compile-time, run the STREAM-like benchmark extension.
         * @param runLatency Indicates latency benchmarks should be run.
         * @param runThroughput Indicates throughput benchmarks should be run.
         * @param working_set_size_per_thread The total size of memory to test in all benchmarks, in bytes, per thread. This MUST be a multiple of 4KB pages.
         * @param num_worker_threads The number of threads to use in throughput benchmarks, loaded latency benchmarks, and stress tests.
         * @param use_chunk_32b If true, include 32-bit chunks for relevant benchmarks.
         * @param use_chunk_64b If true, include 64-bit chunks for relevant benchmarks.
         * @param use_chunk_128b If true, include 128-bit chunks for relevant benchmarks.
         * @param use_chunk_256b If true, include 256-bit chunks for relevant benchmarks.
         * @param numa_enable If true, then test all combinations of CPU/memory NUMA nodes.
         * @param iterations_per_test For each unique benchmark test, this is the number of times to repeat it.
         * @param use_random_access_pattern If true, use random-access patterns in throughput benchmarks.
         * @param use_sequential_access_pattern If true, use sequential-access patterns in throughput benchmarks.
         * @param starting_test_index Numerical index to use for the first test. This is an aid for end-user interpreting and post-processing of result CSV file, if relevant.
         * @param filename Output filename to use.
         * @param use_output_file If true, use the provided output filename.
         * @param verbose If true, then X-Mem should be more verbose in its console reporting.
         * @param use_large_pages If true, then X-Mem will attempt to force usage of large pages.
         * @param use_reads If true, then throughput benchmarks should use reads.
         * @param use_writes If true, then throughput benchmarks should use writes.
         * @param use_stride_p1 If true, include stride of +1 for relevant benchmarks.
         * @param use_stride_n1 If true, include stride of -1 for relevant benchmarks.
         * @param use_stride_p2 If true, include stride of +2 for relevant benchmarks.
         * @param use_stride_n2 If true, include stride of -2 for relevant benchmarks.
         * @param use_stride_p4 If true, include stride of +4 for relevant benchmarks.
         * @param use_stride_n4 If true, include stride of -4 for relevant benchmarks.
         * @param use_stride_p8 If true, include stride of +8 for relevant benchmarks.
         * @param use_stride_n8 If true, include stride of -8 for relevant benchmarks.
         * @param use_stride_p16 If true, include stride of +16 for relevant benchmarks.
         * @param use_stride_n16 If true, include stride of -16 for relevant benchmarks.
         */
        /*Configurator(
            bool runExtensions,
#ifdef EXT_DELAY_INJECTED_LATENCY_BENCHMARK
            bool run_ext_delay_injected_loaded_latency_benchmark,
#endif
#ifdef EXT_STREAM_BENCHMARK
            bool run_ext_stream_benchmark,
#endif
            bool runLatency,
            bool runThroughput,
            size_t working_set_size_per_thread,
            uint32_t num_worker_threads,
            bool use_chunk_32b,
            bool use_chunk_64b,
            bool use_chunk_128b,
            bool use_chunk_256b,
            bool numa_enable,
            uint32_t iterations_per_test,
            bool use_random_access_pattern,
            bool use_sequential_access_pattern,
            uint32_t starting_test_index,
            std::string filename,
            bool use_output_file,
            bool verbose,
            bool use_large_pages,
            bool use_reads,
            bool use_writes,
            bool use_stride_p1,
            bool use_stride_n1,
            bool use_stride_p2,
            bool use_stride_n2,
            bool use_stride_p4,
            bool use_stride_n4,
            bool use_stride_p8,
            bool use_stride_n8,
            bool use_stride_p16,
            bool use_stride_n16
        );*/

        /**
         * @brief Configures the tool based on user's command-line inputs.
         * @param argc The argc from main().
@@ -49,7 +49,7 @@

namespace xmem {

#define VERSION "2.1.16"
#define VERSION "2.2.0"

#if !defined(_WIN32) && !defined(__gnu_linux__)
#error Neither Windows/GNULinux build environments were detected!
@@ -30,6 +30,9 @@
#ifndef __DELAY_INJECTED_BENCHMARK_KERNELS_H
#define __DELAY_INJECTED_BENCHMARK_KERNELS_H

//Headers
#include <common.h>

//Libraries
#include <cstdint>
#if defined(_WIN32) && defined(ARCH_INTEL_AVX)
@@ -41,11 +44,11 @@

//Helper macros for inserting delays with nops.
#ifdef _WIN32
#define my_nop() __nop()
#define my_nop() __nop() //Works on both x86 family and ARM family as-is
#endif

#ifdef __gnu_linux__
#define my_nop() asm("nop")
#define my_nop() asm("nop") //Works on both x86 family and ARM family as-is
#endif

#define my_nop2() my_nop(); my_nop()
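This hunk shows only `my_nop2()`, but the kernels call delays up to `my_nop1024()`. The sketch below assumes every higher level is built the way `my_nop2()` is, as two copies of the level below. That doubling definition is an assumption, since those macros are not visible here.

To make the expansion observable, the sketch defines `my_nop()` as a counter increment instead of `__nop()` or `asm("nop")`. `nop_count`, the `count_nops_*` helpers, and `check_nop_expansion` are hypothetical illustration names, not X-Mem code.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the real my_nop(): instead of emitting a nop instruction
// (__nop() on Windows, asm("nop") on GNU/Linux), bump a counter so the
// number of expanded calls can be observed. Illustration only.
static uint32_t nop_count = 0;
#define my_nop() (++nop_count)

// Doubling scheme: each level is two copies of the level below.
// my_nop2() matches the header in this diff; the higher levels are assumed.
#define my_nop2() my_nop(); my_nop()
#define my_nop4() my_nop2(); my_nop2()
#define my_nop8() my_nop4(); my_nop4()
#define my_nop16() my_nop8(); my_nop8()
#define my_nop32() my_nop16(); my_nop16()
#define my_nop64() my_nop32(); my_nop32()
#define my_nop128() my_nop64(); my_nop64()
#define my_nop256() my_nop128(); my_nop128()
#define my_nop512() my_nop256(); my_nop256()
#define my_nop1024() my_nop512(); my_nop512()

// Reset the counter, expand one delay macro, and report how many my_nop()
// statements it produced.
uint32_t count_nops_2()    { nop_count = 0; my_nop2();    return nop_count; }
uint32_t count_nops_64()   { nop_count = 0; my_nop64();   return nop_count; }
uint32_t count_nops_1024() { nop_count = 0; my_nop1024(); return nop_count; }

// Each macro should expand to exactly as many nops as its name says.
void check_nop_expansion() {
    assert(count_nops_2() == 2);
    assert(count_nops_64() == 64);
    assert(count_nops_1024() == 1024);
}
```

The expansion is purely textual, so `my_nop1024()` becomes 1024 consecutive statements with no loop around them. The injected delay therefore adds no branch or counter overhead of its own. The cost is larger code size in the long-delay kernels.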
@@ -68,7 +71,88 @@ namespace xmem {
     ***********************************************************************/

    /* -------------------- DUMMY BENCHMARK ROUTINES -------------------------- */

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word32_Delay1(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word32_Delay2(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word32_Delay4(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word32_Delay8(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word32_Delay16(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word32_Delay32(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word32_Delay64(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word32_Delay128(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word32_Delay256(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word32_Delay512plus(void* start_address, void* end_address);

#ifdef HAS_WORD_64
    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 64 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
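The dummy routines declared in this hunk measure loop bookkeeping (pointer increments, comparisons, branches) without the memory accesses and delays. That lets the bookkeeping time be separated from the kernel timings.

The sketch below shows one way to apply that correction. It assumes the real kernel and its dummy counterpart are timed over the same region. How X-Mem itself combines the two measurements is not shown in this diff. `Kernel`, `time_ns`, and `net_ns_per_access` are hypothetical names, not X-Mem's API.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Same signature as the kernels and dummy loops declared in this header.
using Kernel = int32_t (*)(void*, void*);

// Hypothetical helper: time one pass of a kernel over [start, end) in ns.
double time_ns(Kernel kernel, void* start, void* end) {
    assert(kernel != nullptr);
    auto t0 = std::chrono::steady_clock::now();
    kernel(start, end);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count();
}

// The kernel time covers bookkeeping + memory accesses + injected delays; the
// dummy time covers bookkeeping only. Subtract, clamp negative differences
// (timer noise) to 0, and average over the number of accesses.
constexpr double net_ns_per_access(double kernel_ns, double dummy_ns, std::size_t accesses) {
    return accesses == 0 ? 0.0
         : (kernel_ns > dummy_ns ? kernel_ns - dummy_ns : 0.0) / static_cast<double>(accesses);
}
```

A single pass over a small region is dominated by timer resolution. In practice one would time many repetitions over a large working set before subtracting.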
@@ -140,9 +224,77 @@ namespace xmem {
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word64_Delay256plus(void* start_address, void* end_address);
#endif

#ifdef HAS_WORD_128
    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 128 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word128_Delay1(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 64 loops except for the memory access and delays themselves.
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 128 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word128_Delay2(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 128 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word128_Delay4(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 128 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word128_Delay8(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 128 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word128_Delay16(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 128 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word128_Delay32(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 128 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word128_Delay64(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 128 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word128_Delay128plus(void* start_address, void* end_address);
#endif

#ifdef HAS_WORD_256
    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 256 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
@@ -150,7 +302,7 @@ namespace xmem {
    int32_t dummy_forwSequentialLoop_Word256_Delay1(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 64 loops except for the memory access and delays themselves.
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 256 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
@@ -158,7 +310,7 @@ namespace xmem {
    int32_t dummy_forwSequentialLoop_Word256_Delay2(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 64 loops except for the memory access and delays themselves.
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 256 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
@@ -166,7 +318,7 @@ namespace xmem {
    int32_t dummy_forwSequentialLoop_Word256_Delay4(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 64 loops except for the memory access and delays themselves.
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 256 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
@@ -174,7 +326,7 @@ namespace xmem {
    int32_t dummy_forwSequentialLoop_Word256_Delay8(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 64 loops except for the memory access and delays themselves.
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 256 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
@@ -182,7 +334,7 @@ namespace xmem {
    int32_t dummy_forwSequentialLoop_Word256_Delay16(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 64 loops except for the memory access and delays themselves.
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 256 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
@@ -190,15 +342,105 @@ namespace xmem {
    int32_t dummy_forwSequentialLoop_Word256_Delay32(void* start_address, void* end_address);

    /**
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 64 loops except for the memory access and delays themselves.
     * @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 256 loops except for the memory access and delays themselves.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t dummy_forwSequentialLoop_Word256_Delay64plus(void* start_address, void* end_address);
#endif

    /* --------------------- CORE BENCHMARK ROUTINES --------------------------- */

    /**
     * @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 1 delay (nop) is inserted between memory instructions.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t forwSequentialRead_Word32_Delay1(void* start_address, void* end_address);

    /**
     * @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 2 delays (nops) are inserted between memory instructions.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t forwSequentialRead_Word32_Delay2(void* start_address, void* end_address);

    /**
     * @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 4 delays (nops) are inserted between memory instructions.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t forwSequentialRead_Word32_Delay4(void* start_address, void* end_address);

    /**
     * @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 8 delays (nops) are inserted between memory instructions.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t forwSequentialRead_Word32_Delay8(void* start_address, void* end_address);

    /**
     * @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 16 delays (nops) are inserted between memory instructions.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t forwSequentialRead_Word32_Delay16(void* start_address, void* end_address);

    /**
     * @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 32 delays (nops) are inserted between memory instructions.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t forwSequentialRead_Word32_Delay32(void* start_address, void* end_address);

    /**
     * @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 64 delays (nops) are inserted between memory instructions.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t forwSequentialRead_Word32_Delay64(void* start_address, void* end_address);

    /**
     * @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 128 delays (nops) are inserted between memory instructions.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t forwSequentialRead_Word32_Delay128(void* start_address, void* end_address);

    /**
     * @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 256 delays (nops) are inserted between memory instructions.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t forwSequentialRead_Word32_Delay256(void* start_address, void* end_address);

    /**
     * @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 512 delays (nops) are inserted between memory instructions.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t forwSequentialRead_Word32_Delay512(void* start_address, void* end_address);

    /**
     * @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 1024 delays (nops) are inserted between memory instructions.
     * @param start_address The beginning of the memory region of interest.
     * @param end_address The end of the memory region of interest.
     * @returns Undefined.
     */
    int32_t forwSequentialRead_Word32_Delay1024(void* start_address, void* end_address);

#ifdef HAS_WORD_64
    /**
     * @brief Walks over the allocated memory forward sequentially, reading in 64-bit chunks. 1 delay (nop) is inserted between memory instructions.
     * @param start_address The beginning of the memory region of interest.
@ -286,7 +528,99 @@ namespace xmem {
|
|||
* @returns Undefined.
|
||||
*/
|
||||
int32_t forwSequentialRead_Word64_Delay1024(void* start_address, void* end_address);
|
||||
#endif
|
||||
|
||||
#ifdef HAS_WORD_128
|
||||
/**
|
||||
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 1 delays (nops) are inserted between memory instructions.
|
||||
* @param start_address The beginning of the memory region of interest.
|
||||
* @param end_address The end of the memory region of interest.
|
||||
* @returns Undefined.
|
||||
*/
|
||||
int32_t forwSequentialRead_Word128_Delay1(void* start_address, void* end_address);
|
||||
|
||||
/**
|
||||
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 2 delays (nops) are inserted between memory instructions.
|
||||
* @param start_address The beginning of the memory region of interest.
|
||||
* @param end_address The end of the memory region of interest.
|
||||
* @returns Undefined.
|
||||
*/
|
||||
int32_t forwSequentialRead_Word128_Delay2(void* start_address, void* end_address);
|
||||
|
||||
/**
|
||||
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 4 delays (nops) are inserted between memory instructions.
|
||||
* @param start_address The beginning of the memory region of interest.
|
||||
* @param end_address The end of the memory region of interest.
|
||||
* @returns Undefined.
|
||||
*/
|
||||
int32_t forwSequentialRead_Word128_Delay4(void* start_address, void* end_address);
|
||||
|
||||
/**
|
||||
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 8 delays (nops) are inserted between memory instructions.
|
||||
* @param start_address The beginning of the memory region of interest.
|
||||
* @param end_address The end of the memory region of interest.
|
||||
* @returns Undefined.
|
||||
*/
|
||||
int32_t forwSequentialRead_Word128_Delay8(void* start_address, void* end_address);

/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 16 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay16(void* start_address, void* end_address);

/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 32 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay32(void* start_address, void* end_address);

/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 64 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay64(void* start_address, void* end_address);

/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 128 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay128(void* start_address, void* end_address);

/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 256 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay256(void* start_address, void* end_address);

/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 512 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay512(void* start_address, void* end_address);

/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 1024 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay1024(void* start_address, void* end_address);
#endif

#ifdef HAS_WORD_256
/**
* @brief Walks over the allocated memory forward sequentially, reading in 256-bit chunks. 1 delay (nop) is inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
@@ -374,6 +708,7 @@ namespace xmem {
* @returns Undefined.
*/
int32_t forwSequentialRead_Word256_Delay1024(void* start_address, void* end_address);
#endif
};

#endif
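The Word128 declarations above differ only in how many nops separate consecutive loads. Below is a minimal, portable C++14 sketch of what one such kernel could look like; it is not X-Mem's implementation. The name `forwSequentialRead_Word128_DelaySketch`, the `emit_nop`/`emit_nops` helpers, and the XOR-checksum return value are all hypothetical. It assumes GCC/Clang inline assembly or the MSVC `__nop()` intrinsic is available to emit a nop, and it models each 128-bit chunk as two 64-bit volatile loads rather than a single SSE/AVX/NEON load.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

#if defined(_MSC_VER)
#include <intrin.h>
// MSVC (assumption): the __nop() intrinsic emits a single no-op instruction.
inline void emit_nop() { __nop(); }
#elif defined(__GNUC__) || defined(__clang__)
// GCC/Clang (assumption): volatile inline asm keeps the nop from being optimized away.
inline void emit_nop() { __asm__ __volatile__("nop"); }
#else
// Unknown compiler: no portable nop exists, so this sketch emits nothing.
inline void emit_nop() {}
#endif

// Hypothetical helper: calls emit_nop() once per element of I, in order.
// Pack expansion inside a braced initializer is evaluated left to right and
// needs no loop counter or branch. The leading 0 keeps the array non-empty.
template <std::size_t... I>
inline void emit_nops(std::index_sequence<I...>) {
    int expand[] = {0, ((void)I, emit_nop(), 0)...};
    (void)expand;
}

// Hypothetical sketch: reads [start_address, end_address) forward in 128-bit
// chunks and executes Delay nops after each chunk. Returns an XOR checksum of
// the data read (the declared X-Mem kernels document their return as undefined).
template <std::size_t Delay>
std::uint64_t forwSequentialRead_Word128_DelaySketch(const void* start_address,
                                                     const void* end_address) {
    static_assert(Delay >= 1, "the declared variants insert at least one nop");
    // Assumption: 128-bit chunks start on a 16-byte boundary.
    assert(reinterpret_cast<std::uintptr_t>(start_address) % 16 == 0);

    // volatile forces every load to be issued even though values are only XORed.
    const volatile std::uint64_t* p =
        static_cast<const volatile std::uint64_t*>(start_address);
    const volatile std::uint64_t* const end =
        static_cast<const volatile std::uint64_t*>(end_address);

    std::uint64_t checksum = 0;
    // One iteration = one 128-bit chunk read as two 64-bit halves, then Delay nops.
    // A trailing partial chunk (fewer than 16 bytes) is skipped.
    while (end - p >= 2) {
        checksum ^= p[0] ^ p[1];
        emit_nops(std::make_index_sequence<Delay>{});
        p += 2;
    }
    return checksum;
}
```

For example, calling `forwSequentialRead_Word128_DelaySketch<4>(buf, buf + 8)` on a 16-byte-aligned `std::uint64_t buf[8]` performs four 128-bit reads, each followed by four nops. Expanding the nops from a template parameter pack instead of a loop keeps extra compare-and-branch instructions out of the delay, but back-to-back nops only result once the compiler inlines `emit_nop` (that is, with optimization enabled).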