Major revision! Works for GNU/Linux on x86, x86-64, x86-64 AVX, ARMv7, and hopefully ARMv8 (no CPU to test, but compiles). On Windows, works for x86, x86-64, x86-64 AVX. ARM port on Windows still needs to be tested.

Mark Gottscho 2015-04-24 17:26:38 -07:00
Parent 754272aa65
Commit 43ca5d63d4
16 changed files with 885 additions and 184 deletions

View file

@ -38,7 +38,7 @@ PROJECT_NAME = X-Mem
# could be handy for archiving the generated documentation or if some version
# control system is used.
-PROJECT_NUMBER = 2.1.16
+PROJECT_NUMBER = 2.2.0
# Using the PROJECT_BRIEF tag one can provide an optional one line description
# for a project that appears at the top of each page and should give viewer a

View file

@ -1,7 +1,7 @@
README
------------------------------------------------------------------------------------------------------------
X-Mem: Extensible Memory Benchmarking Tool v2.1.16
X-Mem: Extensible Memory Benchmarking Tool v2.2.0
------------------------------------------------------------------------------------------------------------
The flexible open-source research tool for characterizing memory hierarchy throughput, latency, and power.
@ -50,13 +50,13 @@ Flexibility: Easy reconfiguration for different combinations of tests
- Multi-threading support
- Large page support
Extensibility: modularity via C++ object-oriented principles
Extensibility: Modularity via C++ object-oriented principles
- Supports rapid addition of new benchmark kernel routines
- Example: stream triad algorithm, impact of false sharing, etc. are possible with minor changes
Cross-platform: Currently implemented for Windows and GNU/Linux on x86, x86-64, and x86-64 with AVX extensions CPUs
- Designed to allow straightforward porting to other operating systems and ISAs
- ARM port under development
- ARM port under development (currently implemented for GNU/Linux for 32-bit and 64-bit ARM)
Memory throughput:
- Accurate measurement of sustained memory throughput to all levels of cache and memory
@ -79,7 +79,7 @@ Documentation:
INCLUDED EXTENSIONS (under src/include/ext and src/ext directories):
- Loaded latency benchmark variant with load delays inserted as nop instructions between memory instructions.
This is done for 64 and 256-bit chunks on x86-64 with AVX extensions, forward sequential read load threads only at the moment.
This is done for 32, 64, 128, and 256-bit load chunk sizes where applicable using the forward sequential read pattern.
For feature requests, please refer to the contact information at the end of this README.
@ -91,8 +91,8 @@ There are a few runtime prerequisites in order for the software to run correctly
HARDWARE:
- Intel x86 or x86-64 CPU with optional support for AVX extensions. AMD CPUs should also work although this has not been tested.
- COMING SOON: ARM CPUs
- Intel x86, x86-64, or x86-64 with AVX CPU. AMD CPUs should also work although this has not been tested.
- ARM Cortex-A series processors with VFP and NEON extensions. Specifically tested on ARM Cortex A9 (32-bit) which is ARMv7. 64-bit builds for ARMv8-A should also work but have not been tested.
WINDOWS:

Binary data
X-Mem_Developer_Manual.pdf

Binary file not shown.

Binary data
bin/xmem-linux-arm

Binary file not shown.

Binary data
bin/xmem-linux-arm64

Binary file not shown.

Binary data
bin/xmem-linux-x64

Binary file not shown.

Binary data
bin/xmem-linux-x64_avx

Binary file not shown.

Binary data
bin/xmem-linux-x86

Binary file not shown.

View file

@ -654,9 +654,13 @@ bool BenchmarkManager::runExtDelayInjectedLoadedLatencyBenchmark() {
//Put the enumerations into vectors to make constructing benchmarks more loopable
std::vector<chunk_size_t> chunks;
chunks.push_back(CHUNK_32b);
#ifdef HAS_WORD_64
chunks.push_back(CHUNK_64b);
#endif
#ifdef HAS_WORD_128
chunks.push_back(CHUNK_128b);
#endif
#ifdef HAS_WORD_256
chunks.push_back(CHUNK_256b);
#endif
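The hunk above makes the list of chunk sizes depend on which word widths the build supports. A minimal, self-contained sketch of that pattern follows. The enum and macro names mirror the diff, but the standalone function `supportedChunks` and the manual `#define` are assumptions made for illustration; in X-Mem the macros presumably come from the build configuration.

```cpp
#include <vector>

// Hypothetical stand-in for X-Mem's chunk_size_t; only the names come from the diff.
enum chunk_size_t { CHUNK_32b, CHUNK_64b, CHUNK_128b, CHUNK_256b };

// Assumption: pretend this build supports 64-bit words but not 128- or 256-bit ones.
#define HAS_WORD_64

// Collect the chunk sizes this build can exercise. 32-bit is always included.
std::vector<chunk_size_t> supportedChunks() {
    std::vector<chunk_size_t> chunks;
    chunks.push_back(CHUNK_32b);
#ifdef HAS_WORD_64
    chunks.push_back(CHUNK_64b);
#endif
#ifdef HAS_WORD_128
    chunks.push_back(CHUNK_128b);
#endif
#ifdef HAS_WORD_256
    chunks.push_back(CHUNK_256b);
#endif
    return chunks;
}
```

With only `HAS_WORD_64` defined, the returned vector holds the 32-bit and 64-bit entries, so a benchmark loop over it never requests a kernel that was compiled out.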

View file

@ -55,12 +55,11 @@ Configurator::Configurator(
__runThroughput(true),
__working_set_size_per_thread(DEFAULT_WORKING_SET_SIZE_PER_THREAD),
__num_worker_threads(DEFAULT_NUM_WORKER_THREADS),
-#ifndef HAS_WORD_64
-__use_chunk_32b(true),
-#endif
#ifdef HAS_WORD_64
+__use_chunk_32b(false),
__use_chunk_64b(true),
+#else
+__use_chunk_32b(true),
#endif
#ifdef HAS_WORD_128
__use_chunk_128b(false),
@ -70,8 +69,7 @@ Configurator::Configurator(
#endif
#ifdef HAS_NUMA
__numa_enabled(true),
-#endif
-#ifndef HAS_NUMA
+#else
__numa_enabled(false),
#endif
__iterations(1),
@ -97,85 +95,6 @@ Configurator::Configurator(
{
}
//TODO: delete this monstrosity
/*Configurator::Configurator(
bool runExtensions,
#ifdef EXT_DELAY_INJECTED_LOADED_LATENCY_BENCHMARK
bool run_ext_delay_injected_loaded_latency_benchmark,
#endif
#ifdef EXT_STREAM_BENCHMARK
bool run_ext_stream_benchmark,
#endif
bool runLatency,
bool runThroughput,
size_t working_set_size_per_thread,
uint32_t num_worker_threads,
bool use_chunk_32b,
bool use_chunk_64b,
bool use_chunk_128b,
bool use_chunk_256b,
bool numa_enabled,
uint32_t iterations_per_test,
bool use_random_access_pattern,
bool use_sequential_access_pattern,
uint32_t starting_test_index,
std::string filename,
bool use_output_file,
bool verbose,
bool use_large_pages,
bool use_reads,
bool use_writes,
bool use_stride_p1,
bool use_stride_n1,
bool use_stride_p2,
bool use_stride_n2,
bool use_stride_p4,
bool use_stride_n4,
bool use_stride_p8,
bool use_stride_n8,
bool use_stride_p16,
bool use_stride_n16
) :
__configured(true),
__runExtensions(runExtensions),
#ifdef EXT_DELAY_INJECTED_LOADED_LATENCY_BENCHMARK
__run_ext_delay_injected_loaded_latency_benchmark(run_ext_delay_injected_loaded_latency_benchmark),
#endif
#ifdef EXT_STREAM_BENCHMARK
__run_ext_stream_benchmark(run_ext_stream_benchmark),
#endif
__runLatency(runLatency),
__runThroughput(runThroughput),
__working_set_size_per_thread(working_set_size_per_thread),
__num_worker_threads(num_worker_threads),
__use_chunk_32b(use_chunk_32b),
__use_chunk_64b(use_chunk_64b),
__use_chunk_128b(use_chunk_128b),
__use_chunk_256b(use_chunk_256b),
__numa_enabled(numa_enabled),
__iterations(iterations_per_test),
__use_random_access_pattern(use_random_access_pattern),
__use_sequential_access_pattern(use_sequential_access_pattern),
__starting_test_index(starting_test_index),
__filename(filename),
__use_output_file(use_output_file),
__verbose(verbose),
__use_large_pages(use_large_pages),
__use_reads(use_reads),
__use_writes(use_writes),
__use_stride_p1(use_stride_p1),
__use_stride_n1(use_stride_n1),
__use_stride_p2(use_stride_p2),
__use_stride_n2(use_stride_n2),
__use_stride_p4(use_stride_p4),
__use_stride_n4(use_stride_n4),
__use_stride_p8(use_stride_p8),
__use_stride_n8(use_stride_n8),
__use_stride_p16(use_stride_p16),
__use_stride_n16(use_stride_n16)
{
}*/
int32_t Configurator::configureFromInput(int argc, char* argv[]) {
if (__configured) { //If this object was already configured, cannot override from user inputs. This is to prevent an invalid state.
std::cerr << "WARNING: Something bad happened when configuring X-Mem. This is probably not your fault." << std::endl;
@ -293,8 +212,12 @@ int32_t Configurator::configureFromInput(int argc, char* argv[]) {
}
//Check NUMA selection
-if (options[NUMA_DISABLE]) //NUMA is not supported currently on anything but x86-64 systems anyway.
+if (options[NUMA_DISABLE]) {
__numa_enabled = false;
+#ifndef HAS_NUMA
+std::cerr << "WARNING: NUMA is not supported on this build, so the NUMA-disable option has no effect." << std::endl;
+#endif
+}
//Check if large pages should be used for allocation of memory under test.
if (options[USE_LARGE_PAGES]) {
@ -639,18 +562,24 @@ int32_t Configurator::configureFromInput(int argc, char* argv[]) {
std::cout << std::endl;
std::cout << "---> Number of worker threads: ";
std::cout << __num_worker_threads << std::endl;
-#ifdef HAS_NUMA
std::cout << "---> NUMA enabled: ";
+#ifdef HAS_NUMA
if (__numa_enabled)
std::cout << "yes" << std::endl;
else
std::cout << "no" << std::endl;
+#else
+std::cout << "not supported" << std::endl;
#endif
std::cout << "---> Large pages: ";
+#ifdef HAS_LARGE_PAGES
if (__use_large_pages)
std::cout << "yes" << std::endl;
else
std::cout << "no" << std::endl;
+#else
+std::cout << "not supported" << std::endl;
+#endif
std::cout << "---> Iterations: ";
std::cout << __iterations << std::endl;
std::cout << "---> Starting test index: ";
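The configuration summary in the hunk above now prints a label on every build and reports "not supported" when a feature was compiled out. A small sketch of that three-way reporting, assuming a build without NUMA support; `numaStatus` is a hypothetical helper and not a function in X-Mem.

```cpp
#include <string>

// Assumption: HAS_NUMA is deliberately left undefined, as on a build without NUMA support.
// #define HAS_NUMA

// Returns the text the configuration summary would show after "NUMA enabled: ".
std::string numaStatus(bool numa_enabled) {
#ifdef HAS_NUMA
    return numa_enabled ? "yes" : "no";
#else
    (void)numa_enabled; // the runtime flag is irrelevant when the feature is compiled out
    return "not supported";
#endif
}
```

Keeping the label outside the `#ifdef` and only switching the value means the summary has the same set of rows on every platform, which should make the console output easier to compare across builds.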

View file

@ -1,7 +1,7 @@
README
------------------------------------------------------------------------------------------------------------
X-Mem: Extensible Memory Benchmarking Tool v2.1.16
X-Mem: Extensible Memory Benchmarking Tool v2.2.0
------------------------------------------------------------------------------------------------------------
The flexible open-source research tool for characterizing memory hierarchy throughput, latency, and power.
@ -50,13 +50,13 @@ Flexibility: Easy reconfiguration for different combinations of tests
- Multi-threading support
- Large page support
Extensibility: modularity via C++ object-oriented principles
Extensibility: Modularity via C++ object-oriented principles
- Supports rapid addition of new benchmark kernel routines
- Example: stream triad algorithm, impact of false sharing, etc. are possible with minor changes
Cross-platform: Currently implemented for Windows and GNU/Linux on x86, x86-64, and x86-64 with AVX extensions CPUs
- Designed to allow straightforward porting to other operating systems and ISAs
- ARM port under development
- ARM port under development (currently implemented for GNU/Linux for 32-bit and 64-bit ARM)
Memory throughput:
- Accurate measurement of sustained memory throughput to all levels of cache and memory
@ -79,7 +79,7 @@ Documentation:
INCLUDED EXTENSIONS (under src/include/ext and src/ext directories):
- Loaded latency benchmark variant with load delays inserted as nop instructions between memory instructions.
This is done for 64 and 256-bit chunks on x86-64 with AVX extensions, forward sequential read load threads only at the moment.
This is done for 32, 64, 128, and 256-bit load chunk sizes where applicable using the forward sequential read pattern.
For feature requests, please refer to the contact information at the end of this README.
@ -91,8 +91,8 @@ There are a few runtime prerequisites in order for the software to run correctly
HARDWARE:
- Intel x86 or x86-64 CPU with optional support for AVX extensions. AMD CPUs should also work although this has not been tested.
- COMING SOON: ARM CPUs
- Intel x86, x86-64, or x86-64 with AVX CPU. AMD CPUs should also work although this has not been tested.
- ARM Cortex-A series processors with VFP and NEON extensions. Specifically tested on ARM Cortex A9 (32-bit) which is ARMv7. 64-bit builds for ARMv8-A should also work but have not been tested.
WINDOWS:

View file

@ -120,6 +120,61 @@ bool DelayInjectedLoadedLatencyBenchmark::_run_core() {
SequentialFunction load_kernel_dummy_fptr = NULL;
if (_num_worker_threads > 1) { //If we only have one worker thread, it is used for latency measurement only, and no load threads will be used.
switch (_chunk_size) {
case CHUNK_32b:
switch (__delay) {
case 0:
load_kernel_fptr = &forwSequentialRead_Word32; //not an extended kernel
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32; //not an extended kernel
break;
case 1:
load_kernel_fptr = &forwSequentialRead_Word32_Delay1;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay1;
break;
case 2:
load_kernel_fptr = &forwSequentialRead_Word32_Delay2;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay2;
break;
case 4:
load_kernel_fptr = &forwSequentialRead_Word32_Delay4;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay4;
break;
case 8:
load_kernel_fptr = &forwSequentialRead_Word32_Delay8;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay8;
break;
case 16:
load_kernel_fptr = &forwSequentialRead_Word32_Delay16;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay16;
break;
case 32:
load_kernel_fptr = &forwSequentialRead_Word32_Delay32;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay32;
break;
case 64:
load_kernel_fptr = &forwSequentialRead_Word32_Delay64;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay64;
break;
case 128:
load_kernel_fptr = &forwSequentialRead_Word32_Delay128;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay128;
break;
case 256:
load_kernel_fptr = &forwSequentialRead_Word32_Delay256;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay256;
break;
case 512:
load_kernel_fptr = &forwSequentialRead_Word32_Delay512;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay512plus;
break;
case 1024:
load_kernel_fptr = &forwSequentialRead_Word32_Delay1024;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word32_Delay512plus;
break;
default:
std::cerr << "ERROR: Failed to find appropriate benchmark kernel." << std::endl;
return false;
}
break;
#ifdef HAS_WORD_64
case CHUNK_64b:
switch (__delay) {
@ -177,6 +232,63 @@ bool DelayInjectedLoadedLatencyBenchmark::_run_core() {
}
break;
#endif
#ifdef HAS_WORD_128
case CHUNK_128b:
switch (__delay) {
case 0:
load_kernel_fptr = &forwSequentialRead_Word128; //not an extended kernel
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128; //not an extended kernel
break;
case 1:
load_kernel_fptr = &forwSequentialRead_Word128_Delay1;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay1;
break;
case 2:
load_kernel_fptr = &forwSequentialRead_Word128_Delay2;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay2;
break;
case 4:
load_kernel_fptr = &forwSequentialRead_Word128_Delay4;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay4;
break;
case 8:
load_kernel_fptr = &forwSequentialRead_Word128_Delay8;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay8;
break;
case 16:
load_kernel_fptr = &forwSequentialRead_Word128_Delay16;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay16;
break;
case 32:
load_kernel_fptr = &forwSequentialRead_Word128_Delay32;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay32;
break;
case 64:
load_kernel_fptr = &forwSequentialRead_Word128_Delay64;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay64;
break;
case 128:
load_kernel_fptr = &forwSequentialRead_Word128_Delay128;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay128plus;
break;
case 256:
load_kernel_fptr = &forwSequentialRead_Word128_Delay256;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay128plus;
break;
case 512:
load_kernel_fptr = &forwSequentialRead_Word128_Delay512;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay128plus;
break;
case 1024:
load_kernel_fptr = &forwSequentialRead_Word128_Delay1024;
load_kernel_dummy_fptr = &dummy_forwSequentialLoop_Word128_Delay128plus;
break;
default:
std::cerr << "ERROR: Failed to find appropriate benchmark kernel." << std::endl;
return false;
}
break;
#endif
#ifdef HAS_WORD_256
case CHUNK_256b:
switch (__delay) {
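Each nested switch above maps a delay value to a pair of function pointers: the load kernel and its matching dummy kernel. One possible way to express the same mapping more compactly is a lookup table. The sketch below is an alternative illustration, not the code in this commit. The stub kernels, `KernelPair`, and `findKernels` are hypothetical, and the `SequentialFunction` signature is inferred from the kernel definitions elsewhere in this diff.

```cpp
#include <cstdint>
#include <map>

// Signature inferred from the kernels in this diff: int32_t f(void*, void*).
typedef int32_t (*SequentialFunction)(void*, void*);

// Stub kernels standing in for the real routines; each only returns a tag value.
int32_t readDelay1(void*, void*) { return 1; }
int32_t dummyDelay1(void*, void*) { return -1; }
int32_t readDelay2(void*, void*) { return 2; }
int32_t dummyDelay2(void*, void*) { return -2; }

// Hypothetical pairing of a load kernel with its dummy counterpart.
struct KernelPair {
    SequentialFunction load;
    SequentialFunction dummy;
};

// Looks up the kernel pair for a delay. Returns false for delays that have no
// entry, which corresponds to the "default:" error branch of the switch.
bool findKernels(uint32_t delay, KernelPair& out) {
    static const std::map<uint32_t, KernelPair> table = {
        {1, {&readDelay1, &dummyDelay1}},
        {2, {&readDelay2, &dummyDelay2}},
    };
    auto it = table.find(delay);
    if (it == table.end())
        return false;
    out = it->second;
    return true;
}
```

A table like this keeps each delay's two pointers on one line, which may make mismatched pairs easier to spot than in a long switch. The trade-off is one table per chunk size instead of one case block.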

View file

@ -52,6 +52,96 @@ using namespace xmem;
/* -------------------- DUMMY BENCHMARK ROUTINES ------------------------- */
int32_t xmem::dummy_forwSequentialLoop_Word32_Delay1(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL512(wordptr++;)
placeholder = 0;
}
return placeholder;
}
int32_t xmem::dummy_forwSequentialLoop_Word32_Delay2(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL256(wordptr++;)
placeholder = 0;
}
return placeholder;
}
int32_t xmem::dummy_forwSequentialLoop_Word32_Delay4(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL128(wordptr++;)
placeholder = 0;
}
return placeholder;
}
int32_t xmem::dummy_forwSequentialLoop_Word32_Delay8(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL64(wordptr++;)
placeholder = 0;
}
return placeholder;
}
int32_t xmem::dummy_forwSequentialLoop_Word32_Delay16(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL32(wordptr++;)
placeholder = 0;
}
return placeholder;
}
int32_t xmem::dummy_forwSequentialLoop_Word32_Delay32(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL16(wordptr++;)
placeholder = 0;
}
return placeholder;
}
int32_t xmem::dummy_forwSequentialLoop_Word32_Delay64(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL8(wordptr++;)
placeholder = 0;
}
return placeholder;
}
int32_t xmem::dummy_forwSequentialLoop_Word32_Delay128(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL4(wordptr++;)
placeholder = 0;
}
return placeholder;
}
int32_t xmem::dummy_forwSequentialLoop_Word32_Delay256(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL2(wordptr++;)
placeholder = 0;
}
return placeholder;
}
int32_t xmem::dummy_forwSequentialLoop_Word32_Delay512plus(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
wordptr++;
placeholder = 0;
}
return placeholder;
}
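The Word32 dummy routines above pair each delay with an unroll factor so that their product stays at 512 (Delay1 with UNROLL512, Delay2 with UNROLL256, and so on) until the delay reaches 512, after which the loop body is not unrolled at all. The 128-bit routines in this commit follow the same pattern with a base of 128. A sketch of that relationship, inferred from the routine names only; `unrollFactor` is a hypothetical helper and does not exist in X-Mem.

```cpp
#include <cstdint>

// Hypothetical helper: the unroll factor paired with a given delay.
// `base` is the unroll factor used at delay 1 (512 for Word32, 128 for Word128
// in this diff). Assumes delay >= 1 and that both values are powers of two.
uint32_t unrollFactor(uint32_t base, uint32_t delay) {
    return delay >= base ? 1u : base / delay;
}
```

For example, `unrollFactor(512, 256)` gives 2, matching `dummy_forwSequentialLoop_Word32_Delay256`, and any delay of 512 or more gives 1, matching the shared `Delay512plus` routine.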
#ifdef HAS_WORD_64
int32_t xmem::dummy_forwSequentialLoop_Word64_Delay1(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
@ -135,6 +225,80 @@ int32_t xmem::dummy_forwSequentialLoop_Word64_Delay256plus(void* start_address,
}
#endif
#ifdef HAS_WORD_128
int32_t xmem::dummy_forwSequentialLoop_Word128_Delay1(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
UNROLL128(wordptr++;)
placeholder = 0;
}
return placeholder;
}
int32_t xmem::dummy_forwSequentialLoop_Word128_Delay2(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
UNROLL64(wordptr++;)
placeholder = 0;
}
return placeholder;
}
int32_t xmem::dummy_forwSequentialLoop_Word128_Delay4(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
UNROLL32(wordptr++;)
placeholder = 0;
}
return placeholder;
}
int32_t xmem::dummy_forwSequentialLoop_Word128_Delay8(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
UNROLL16(wordptr++;)
placeholder = 0;
}
return placeholder;
}
int32_t xmem::dummy_forwSequentialLoop_Word128_Delay16(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
UNROLL8(wordptr++;)
placeholder = 0;
}
return placeholder;
}
int32_t xmem::dummy_forwSequentialLoop_Word128_Delay32(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
UNROLL4(wordptr++;)
placeholder = 0;
}
return placeholder;
}
int32_t xmem::dummy_forwSequentialLoop_Word128_Delay64(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
UNROLL2(wordptr++;)
placeholder = 0;
}
return placeholder;
}
int32_t xmem::dummy_forwSequentialLoop_Word128_Delay128plus(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
wordptr++;
placeholder = 0;
}
return placeholder;
}
#endif
#ifdef HAS_WORD_256
int32_t xmem::dummy_forwSequentialLoop_Word256_Delay1(void* start_address, void* end_address) {
volatile int32_t placeholder = 0; //Try our best to defeat compiler optimizations
@ -202,6 +366,94 @@ int32_t xmem::dummy_forwSequentialLoop_Word256_Delay64plus(void* start_address,
/* -------------------- CORE BENCHMARK ROUTINES -------------------------- */
int32_t xmem::forwSequentialRead_Word32_Delay1(void* start_address, void* end_address) {
register Word32_t val;
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL512(val = *wordptr++; my_nop();)
}
return 0;
}
int32_t xmem::forwSequentialRead_Word32_Delay2(void* start_address, void* end_address) {
register Word32_t val;
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL256(val = *wordptr++; my_nop2();)
}
return 0;
}
int32_t xmem::forwSequentialRead_Word32_Delay4(void* start_address, void* end_address) {
register Word32_t val;
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL128(val = *wordptr++; my_nop4();)
}
return 0;
}
int32_t xmem::forwSequentialRead_Word32_Delay8(void* start_address, void* end_address) {
register Word32_t val;
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL64(val = *wordptr++; my_nop8();)
}
return 0;
}
int32_t xmem::forwSequentialRead_Word32_Delay16(void* start_address, void* end_address) {
register Word32_t val;
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL32(val = *wordptr++; my_nop16();)
}
return 0;
}
int32_t xmem::forwSequentialRead_Word32_Delay32(void* start_address, void* end_address) {
register Word32_t val;
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL16(val = *wordptr++; my_nop32();)
}
return 0;
}
int32_t xmem::forwSequentialRead_Word32_Delay64(void* start_address, void* end_address) {
register Word32_t val;
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL8(val = *wordptr++; my_nop64();)
}
return 0;
}
int32_t xmem::forwSequentialRead_Word32_Delay128(void* start_address, void* end_address) {
register Word32_t val;
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL4(val = *wordptr++; my_nop128();)
}
return 0;
}
int32_t xmem::forwSequentialRead_Word32_Delay256(void* start_address, void* end_address) {
register Word32_t val;
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
UNROLL2(val = *wordptr++; my_nop256();)
}
return 0;
}
int32_t xmem::forwSequentialRead_Word32_Delay512(void* start_address, void* end_address) {
register Word32_t val;
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
val = *wordptr++; my_nop512();
}
return 0;
}
int32_t xmem::forwSequentialRead_Word32_Delay1024(void* start_address, void* end_address) {
register Word32_t val;
for (volatile Word32_t* wordptr = static_cast<Word32_t*>(start_address), *endptr = static_cast<Word32_t*>(end_address); wordptr < endptr;) {
val = *wordptr++; my_nop1024();
}
return 0;
}
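The core routines above interleave each 32-bit load with a fixed number of no-op instructions. The following is a simplified sketch of that idea, not the commit's implementation. The definition of `my_nop()` is not shown in this diff, so `SKETCH_NOP` is an assumed stand-in, and `delayedForwardRead` is a hypothetical name. It also uses a runtime inner loop for the no-ops, which adds branch overhead that the fully unrolled kernels in the commit avoid.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: one no-op instruction per invocation on GCC/Clang; a no-op
// expression elsewhere. The real my_nop() is defined outside this diff.
#if defined(__GNUC__)
#define SKETCH_NOP() __asm__ __volatile__("nop")
#else
#define SKETCH_NOP() ((void)0)
#endif

// Reads every 32-bit word in [start_address, end_address) and issues `Delay`
// no-ops after each load. Returns the number of words read so the sketch can
// be checked; the kernels in the commit return 0 instead.
template <unsigned Delay>
std::size_t delayedForwardRead(void* start_address, void* end_address) {
    std::size_t words = 0;
    for (volatile uint32_t* p = static_cast<uint32_t*>(start_address),
                          * end = static_cast<uint32_t*>(end_address);
         p < end;) {
        uint32_t val = *p++;   // volatile load, so the compiler must perform it
        (void)val;
        for (unsigned i = 0; i < Delay; ++i)
            SKETCH_NOP();
        ++words;
    }
    return words;
}
```

Calling `delayedForwardRead<4>(buf, buf + 16)` on a 16-word buffer performs 16 loads with four no-ops after each one.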
#ifdef HAS_WORD_64
int32_t xmem::forwSequentialRead_Word64_Delay1(void* start_address, void* end_address) {
register Word64_t val;
@ -292,6 +544,150 @@ int32_t xmem::forwSequentialRead_Word64_Delay1024(void* start_address, void* end
}
#endif
#ifdef HAS_WORD_128
int32_t xmem::forwSequentialRead_Word128_Delay1(void* start_address, void* end_address) {
#ifdef _WIN32
return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
register Word128_t val;
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
UNROLL128(val = *wordptr++; my_nop();)
}
return 0;
#endif
}
int32_t xmem::forwSequentialRead_Word128_Delay2(void* start_address, void* end_address) {
#ifdef _WIN32
return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
register Word128_t val;
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
UNROLL64(val = *wordptr++; my_nop2();)
}
return 0;
#endif
}
int32_t xmem::forwSequentialRead_Word128_Delay4(void* start_address, void* end_address) {
#ifdef _WIN32
return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
register Word128_t val;
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
UNROLL32(val = *wordptr++; my_nop4();)
}
return 0;
#endif
}
int32_t xmem::forwSequentialRead_Word128_Delay8(void* start_address, void* end_address) {
#ifdef _WIN32
return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
register Word128_t val;
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
UNROLL16(val = *wordptr++; my_nop8();)
}
return 0;
#endif
}
int32_t xmem::forwSequentialRead_Word128_Delay16(void* start_address, void* end_address) {
#ifdef _WIN32
return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
register Word128_t val;
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
UNROLL8(val = *wordptr++; my_nop16();)
}
return 0;
#endif
}
int32_t xmem::forwSequentialRead_Word128_Delay32(void* start_address, void* end_address) {
#ifdef _WIN32
return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
register Word128_t val;
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
UNROLL4(val = *wordptr++; my_nop32();)
}
return 0;
#endif
}
int32_t xmem::forwSequentialRead_Word128_Delay64(void* start_address, void* end_address) {
#ifdef _WIN32
return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
register Word128_t val;
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
UNROLL2(val = *wordptr++; my_nop64();)
}
return 0;
#endif
}
int32_t xmem::forwSequentialRead_Word128_Delay128(void* start_address, void* end_address) {
#ifdef _WIN32
return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
register Word128_t val;
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
val = *wordptr++; my_nop128();
}
return 0;
#endif
}
int32_t xmem::forwSequentialRead_Word128_Delay256(void* start_address, void* end_address) {
#ifdef _WIN32
return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
register Word128_t val;
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
val = *wordptr++; my_nop256();
}
return 0;
#endif
}
int32_t xmem::forwSequentialRead_Word128_Delay512(void* start_address, void* end_address) {
#ifdef _WIN32
return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
register Word128_t val;
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
val = *wordptr++; my_nop512();
}
return 0;
#endif
}
int32_t xmem::forwSequentialRead_Word128_Delay1024(void* start_address, void* end_address) {
#ifdef _WIN32
return 0; //TODO: Not yet implemented for Windows.
#endif
#ifdef __gnu_linux__
register Word128_t val;
for (volatile Word128_t* wordptr = static_cast<Word128_t*>(start_address), *endptr = static_cast<Word128_t*>(end_address); wordptr < endptr;) {
val = *wordptr++; my_nop1024();
}
return 0;
#endif
}
#endif
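The 128-bit kernels above return early under `_WIN32` and run the read loop under `__gnu_linux__`. On a platform that defines neither macro, no branch is compiled and the function would reach its end without returning a value. One possible way to give every configuration a return path is an `#if`/`#elif`/`#else` chain, sketched here with a hypothetical function name and a plain 32-bit word so the example stays self-contained. The `-1` result for unknown platforms is an assumption, not a convention taken from X-Mem.

```cpp
#include <cstdint>

// Hypothetical kernel skeleton showing one return path per platform branch.
int32_t sketchForwardRead(void* start_address, void* end_address) {
#if defined(_WIN32)
    (void)start_address;
    (void)end_address;
    return 0; // mirrors the "not yet implemented for Windows" stub in the diff
#elif defined(__gnu_linux__)
    for (volatile uint32_t* p = static_cast<uint32_t*>(start_address),
                          * end = static_cast<uint32_t*>(end_address);
         p < end;) {
        uint32_t val = *p++; // volatile load, so the compiler must perform it
        (void)val;
    }
    return 0;
#else
    (void)start_address;
    (void)end_address;
    return -1; // assumption: report "unsupported platform" to the caller
#endif
}
```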
#ifdef HAS_WORD_256
int32_t xmem::forwSequentialRead_Word256_Delay1(void* start_address, void* end_address) {

View file

@ -139,81 +139,6 @@ namespace xmem {
*/
Configurator();
//TODO: delete this monstrosity
/**
* @brief Specialized constructor for when you don't want to get config from input, and you want to pass it in directly.
* @param runExtensions Indicates if user-defined code should be run in addition to other standard functionality.
* @param run_ext_delay_injected_loaded_latency_benchmark If true, and this extension is included at compile-time, run the delay-injected loaded latency benchmark extension.
* @param run_ext_stream_benchmark If true, and this extension is included at compile-time, run the STREAM-like benchmark extension.
* @param runLatency Indicates latency benchmarks should be run.
* @param runThroughput Indicates throughput benchmarks should be run.
* @param working_set_size_per_thread The total size of memory to test in all benchmarks, in bytes, per thread. This MUST be a multiple of 4KB pages.
* @param num_worker_threads The number of threads to use in throughput benchmarks, loaded latency benchmarks, and stress tests.
* @param use_chunk_32b If true, include 32-bit chunks for relevant benchmarks.
* @param use_chunk_64b If true, include 64-bit chunks for relevant benchmarks.
* @param use_chunk_128b If true, include 128-bit chunks for relevant benchmarks.
* @param use_chunk_256b If true, include 256-bit chunks for relevant benchmarks.
* @param numa_enable If true, then test all combinations of CPU/memory NUMA nodes.
* @param iterations_per_test For each unique benchmark test, this is the number of times to repeat it.
* @param use_random_access_pattern If true, use random-access patterns in throughput benchmarks.
* @param use_sequential_access_pattern If true, use sequential-access patterns in throughput benchmarks.
* @param starting_test_index Numerical index to use for the first test. This is an aid for end-user interpreting and post-processing of result CSV file, if relevant.
* @param filename Output filename to use.
* @param use_output_file If true, use the provided output filename.
* @param verbose If true, then X-Mem should be more verbose in its console reporting.
* @param use_large_pages If true, then X-Mem will attempt to force usage of large pages.
* @param use_reads If true, then throughput benchmarks should use reads.
* @param use_writes If true, then throughput benchmarks should use writes.
* @param use_stride_p1 If true, include stride of +1 for relevant benchmarks.
* @param use_stride_n1 If true, include stride of -1 for relevant benchmarks.
* @param use_stride_p2 If true, include stride of +2 for relevant benchmarks.
* @param use_stride_n2 If true, include stride of -2 for relevant benchmarks.
* @param use_stride_p4 If true, include stride of +4 for relevant benchmarks.
* @param use_stride_n4 If true, include stride of -4 for relevant benchmarks.
* @param use_stride_p8 If true, include stride of +8 for relevant benchmarks.
* @param use_stride_n8 If true, include stride of -8 for relevant benchmarks.
* @param use_stride_p16 If true, include stride of +16 for relevant benchmarks.
* @param use_stride_n16 If true, include stride of -16 for relevant benchmarks.
*/
/*Configurator(
bool runExtensions,
#ifdef EXT_DELAY_INJECTED_LATENCY_BENCHMARK
bool run_ext_delay_injected_loaded_latency_benchmark,
#endif
#ifdef EXT_STREAM_BENCHMARK
bool run_ext_stream_benchmark,
#endif
bool runLatency,
bool runThroughput,
size_t working_set_size_per_thread,
uint32_t num_worker_threads,
bool use_chunk_32b,
bool use_chunk_64b,
bool use_chunk_128b,
bool use_chunk_256b,
bool numa_enable,
uint32_t iterations_per_test,
bool use_random_access_pattern,
bool use_sequential_access_pattern,
uint32_t starting_test_index,
std::string filename,
bool use_output_file,
bool verbose,
bool use_large_pages,
bool use_reads,
bool use_writes,
bool use_stride_p1,
bool use_stride_n1,
bool use_stride_p2,
bool use_stride_n2,
bool use_stride_p4,
bool use_stride_n4,
bool use_stride_p8,
bool use_stride_n8,
bool use_stride_p16,
bool use_stride_n16
);*/
/**
* @brief Configures the tool based on user's command-line inputs.
* @param argc The argc from main().

View file

@ -49,7 +49,7 @@
namespace xmem {
#define VERSION "2.1.16"
#define VERSION "2.2.0"
#if !defined(_WIN32) && !defined(__gnu_linux__)
#error Neither a Windows nor a GNU/Linux build environment was detected!

View file

@ -30,6 +30,9 @@
#ifndef __DELAY_INJECTED_BENCHMARK_KERNELS_H
#define __DELAY_INJECTED_BENCHMARK_KERNELS_H
//Headers
#include <common.h>
//Libraries
#include <cstdint>
#if defined(_WIN32) && defined(ARCH_INTEL_AVX)
@ -41,11 +44,11 @@
//Helper macros for inserting delays with nops.
#ifdef _WIN32
#define my_nop() __nop()
#define my_nop() __nop() //Works on both x86 family and ARM family as-is
#endif
#ifdef __gnu_linux__
#define my_nop() asm("nop")
#define my_nop() asm("nop") //Works on both x86 family and ARM family as-is
#endif
#define my_nop2() my_nop(); my_nop()
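The `my_nop2()` definition suggests the larger delay macros (up to `my_nop1024()`, used in the kernels) are built by repeated doubling. The sketch below assumes that pattern and swaps the nop for a counter increment so the number of expansions can be observed. `g_steps`, `sketch_step*` and `countSteps8` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for my_nop(): bumps a counter instead of emitting a nop.
static int g_steps = 0;
#define sketch_step() (++g_steps)

// Each level expands to two copies of the level below, giving 2^k steps.
#define sketch_step2() sketch_step(); sketch_step()
#define sketch_step4() sketch_step2(); sketch_step2()
#define sketch_step8() sketch_step4(); sketch_step4()

// Expands sketch_step8() once inside a braced body, as the kernels do in their loops.
int countSteps8() {
    g_steps = 0;
    {
        sketch_step8();
    }
    return g_steps;
}
```

Because each macro expands to several statements rather than one, it is only safe inside a braced body, which is how the kernel loops use it. A quick check is `assert(countSteps8() == 8);`.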
@ -68,7 +71,88 @@ namespace xmem {
***********************************************************************/
/* -------------------- DUMMY BENCHMARK ROUTINES -------------------------- */
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word32_Delay1(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word32_Delay2(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word32_Delay4(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word32_Delay8(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word32_Delay16(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word32_Delay32(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word32_Delay64(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word32_Delay128(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word32_Delay256(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 32 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word32_Delay512plus(void* start_address, void* end_address);
#ifdef HAS_WORD_64
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 64 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
@ -140,9 +224,77 @@ namespace xmem {
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word64_Delay256plus(void* start_address, void* end_address);
#endif
#ifdef HAS_WORD_128
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 128 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word128_Delay1(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 64 loops except for the memory access and delays themselves.
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 128 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word128_Delay2(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 128 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word128_Delay4(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 128 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word128_Delay8(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 128 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word128_Delay16(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 128 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word128_Delay32(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 128 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word128_Delay64(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 128 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word128_Delay128plus(void* start_address, void* end_address);
#endif
#ifdef HAS_WORD_256
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 256 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
@ -150,7 +302,7 @@ namespace xmem {
int32_t dummy_forwSequentialLoop_Word256_Delay1(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 64 loops except for the memory access and delays themselves.
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 256 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
@ -158,7 +310,7 @@ namespace xmem {
int32_t dummy_forwSequentialLoop_Word256_Delay2(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 64 loops except for the memory access and delays themselves.
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 256 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
@ -166,7 +318,7 @@ namespace xmem {
int32_t dummy_forwSequentialLoop_Word256_Delay4(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 64 loops except for the memory access and delays themselves.
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 256 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
@ -174,7 +326,7 @@ namespace xmem {
int32_t dummy_forwSequentialLoop_Word256_Delay8(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 64 loops except for the memory access and delays themselves.
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 256 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
@ -182,7 +334,7 @@ namespace xmem {
int32_t dummy_forwSequentialLoop_Word256_Delay16(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 64 loops except for the memory access and delays themselves.
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 256 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
@ -190,15 +342,105 @@ namespace xmem {
int32_t dummy_forwSequentialLoop_Word256_Delay32(void* start_address, void* end_address);
/**
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 64 loops except for the memory access and delays themselves.
* @brief Used for measuring the time spent doing everything in delay-injected forward sequential Word 256 loops except for the memory access and delays themselves.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t dummy_forwSequentialLoop_Word256_Delay64plus(void* start_address, void* end_address);
#endif
/* --------------------- CORE BENCHMARK ROUTINES --------------------------- */
/**
* @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 1 delay (nop) is inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word32_Delay1(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 2 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word32_Delay2(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 4 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word32_Delay4(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 8 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word32_Delay8(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 16 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word32_Delay16(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 32 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word32_Delay32(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 64 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word32_Delay64(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 128 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word32_Delay128(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 256 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word32_Delay256(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 512 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word32_Delay512(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 32-bit chunks. 1024 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word32_Delay1024(void* start_address, void* end_address);
#ifdef HAS_WORD_64
/**
* @brief Walks over the allocated memory forward sequentially, reading in 64-bit chunks. 1 delay (nop) is inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
@ -286,7 +528,99 @@ namespace xmem {
* @returns Undefined.
*/
int32_t forwSequentialRead_Word64_Delay1024(void* start_address, void* end_address);
#endif
#ifdef HAS_WORD_128
/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 1 delay (nop) is inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay1(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 2 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay2(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 4 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay4(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 8 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay8(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 16 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay16(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 32 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay32(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 64 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay64(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 128 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay128(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 256 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay256(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 512 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay512(void* start_address, void* end_address);
/**
* @brief Walks over the allocated memory forward sequentially, reading in 128-bit chunks. 1024 delays (nops) are inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
* @param end_address The end of the memory region of interest.
* @returns Undefined.
*/
int32_t forwSequentialRead_Word128_Delay1024(void* start_address, void* end_address);
#endif
#ifdef HAS_WORD_256
/**
* @brief Walks over the allocated memory forward sequentially, reading in 256-bit chunks. 1 delay (nop) is inserted between memory instructions.
* @param start_address The beginning of the memory region of interest.
@ -374,6 +708,7 @@ namespace xmem {
* @returns Undefined.
*/
int32_t forwSequentialRead_Word256_Delay1024(void* start_address, void* end_address);
#endif
};
#endif
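The dummy routines are documented as measuring everything in the loop except the memory accesses and delays. One plausible use, sketched here as an assumption rather than as X-Mem's actual timing code, is to time the core kernel and its dummy counterpart over the same region and subtract. `KernelFn`, `timeKernelNs` and `adjustedNs` are hypothetical names; only the `int32_t (void*, void*)` signature comes from the declarations above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Signature shared by the core and dummy kernels declared in this header.
using KernelFn = int32_t (*)(void*, void*);

// Hypothetical helper: wall-clock time of one kernel call, in nanoseconds.
inline int64_t timeKernelNs(KernelFn fn, void* start_address, void* end_address) {
    const auto t0 = std::chrono::steady_clock::now();
    fn(start_address, end_address);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}

// Hypothetical helper: remove loop overhead, clamping at zero when timing noise
// makes the dummy run appear longer than the core run.
inline int64_t adjustedNs(int64_t core_ns, int64_t dummy_ns) {
    return core_ns > dummy_ns ? core_ns - dummy_ns : 0;
}
```

With a core kernel such as `forwSequentialRead_Word32_Delay4` and its matching dummy `dummy_forwSequentialLoop_Word32_Delay4`, the adjusted time would be `adjustedNs(timeKernelNs(core, s, e), timeKernelNs(dummy, s, e))`.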