Yahoo! Cloud Serving Benchmark

Version 0.1.2


Implementing new workloads - overview

A workload represents the load that a given application will put on the database system. For benchmarking purposes, we must define workloads that are relatively simple compared to real applications, so that we can better reason about the benchmarking results we get. However, a workload should be detailed enough so that once we measure the database's performance, we know what kinds of applications might experience similar performance.

In the context of YCSB, a workload defines both a data set, which is a set of records to be loaded into the database, and a transaction set, which is the set of read and write operations against the database. Creating the transactions requires understanding the structure of the records, which is why both the data and the transactions must be defined in the workload.

For a complete benchmark, multiple important (but distinct) workloads might be grouped together into a workload package. The CoreWorkload package included with the YCSB client is an example of such a collection of workloads.

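Typically a workload consists of two files: a workload class file, which contains the java code that creates the data records and generates the transactions against them, and a parameter file, which tunes the specifics of the workload.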

For example, a workload class file might generate some combination of read and update operations against the database. The parameter file might specify whether the mix of reads and updates is 50/50, 80/20, etc.

There are two ways to create a new workload or package of workloads.

Option 1: new parameter files

The core workloads included with YCSB are defined by a set of parameter files (workloada, workloadb, etc.). You can create your own parameter file with new values for the read/write mix, request distribution, etc. For example, the workloada file has the following contents:

workload=site.ycsb.workloads.CoreWorkload

readallfields=false

readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0

requestdistribution=zipfian

Creating a new file that changes any of these values will produce a new workload with different characteristics. The full set of properties that can be specified is described in the core properties documentation.
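Because a parameter file uses the key=value properties format, it can be loaded with java.util.Properties and sanity-checked before a run. The sketch below parses a hypothetical read-heavy variant of workloada (95% reads, 5% updates) and reports whether the operation proportions add up to 1, as they do in workloada. The class ParameterFileCheck is made up for illustration and is not part of YCSB.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: parse a YCSB-style parameter file and sum its
// operation proportions. Not part of YCSB.
class ParameterFileCheck {
    static final String[] PROPORTION_KEYS = {
        "readproportion", "updateproportion", "scanproportion", "insertproportion"
    };

    // Loads properties from text; StringReader does no real I/O.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // Sums the four proportion properties, treating a missing key as 0.
    static double totalProportion(Properties p) {
        double total = 0.0;
        for (String key : PROPORTION_KEYS) {
            total += Double.parseDouble(p.getProperty(key, "0"));
        }
        return total;
    }

    public static void main(String[] args) {
        // A hypothetical read-heavy variant of workloada: 95% reads, 5% updates.
        String readHeavy = String.join("\n",
            "workload=site.ycsb.workloads.CoreWorkload",
            "readallfields=false",
            "readproportion=0.95",
            "updateproportion=0.05",
            "scanproportion=0",
            "insertproportion=0",
            "requestdistribution=zipfian");
        double total = totalProportion(parse(readHeavy));
        System.out.println(Math.abs(total - 1.0) < 1e-9
            ? "proportions sum to 1" : "proportions sum to " + total);
    }
}
```

Comparing with a small tolerance rather than exact equality avoids surprises from floating-point rounding when adding values such as 0.95 and 0.05.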

Option 2: new java class

The workload java class will be created by the YCSB Client at runtime and will use an instance of the DB interface layer to generate the actual operations against the database. Thus, the java class only needs to decide (based on settings in the parameter file) what records to create for the data set, and what reads, updates, etc. to generate for the transaction phase. The YCSB Client takes care of creating the workload java class, passing it to a worker thread for execution, deciding how many records to create or how many operations to execute, and measuring the resulting performance.
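The division of labour described above can be pictured with a toy driver. In this sketch, SketchWorkload and SketchClient are hypothetical stand-ins (not YCSB classes), and a shared AtomicInteger plays the role of the database. It shows only one workload object per worker thread and the splitting of the operation count; the real client also handles loading, throttling and measurement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for a workload: each call performs one operation
// against a "database", here just a shared counter.
interface SketchWorkload {
    void doTransaction(AtomicInteger db);
}

// Toy driver: the client creates one workload object per worker thread,
// decides how many operations each thread runs, and waits for them.
// The workload only decides what a single operation does.
class SketchClient {
    static int run(Supplier<SketchWorkload> factory, int threads, int operations) {
        AtomicInteger db = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            SketchWorkload workload = factory.get(); // one object per thread
            // Spread the operations as evenly as possible across threads.
            int share = operations / threads + (t < operations % threads ? 1 : 0);
            Thread worker = new Thread(() -> {
                for (int i = 0; i < share; i++) {
                    workload.doTransaction(db);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return db.get(); // total operations performed
    }
}
```

For example, SketchClient.run(() -> c -> c.incrementAndGet(), 4, 1000) runs 1000 operations spread over four threads, each with its own workload object.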

If the CoreWorkload (or some other existing package) does not have the ability to generate the workload you desire, you can create a new workload java class. This is done using the following steps:

Step 1. Extend site.ycsb.Workload

The base class of all workload classes is site.ycsb.Workload. This is an abstract class, so you create a new workload that extends this base class. Your class must have a public no-argument constructor, because the workload will be created in a factory using the no-argument constructor. The YCSB Client will create one Workload object for each worker thread, so if you run the Client with multiple threads, multiple workload objects will be created.
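Because the workload is created by a factory, the factory only knows the class name and has to call the no-argument constructor reflectively. The sketch below illustrates that requirement. BaseWorkload is a hypothetical base class standing in for site.ycsb.Workload, and WorkloadFactory is not the YCSB Client's actual loading code.

```java
// Hypothetical minimal base class standing in for site.ycsb.Workload.
abstract class BaseWorkload {
    abstract String describe();
}

// A custom workload with the required public no-argument constructor.
class MyWorkload extends BaseWorkload {
    public MyWorkload() {
        // Parameters are not available yet; they arrive later via init().
    }

    @Override
    String describe() {
        return "my custom workload";
    }
}

// Sketch of factory-style creation from a class name, which is why the
// no-argument constructor is required.
class WorkloadFactory {
    static BaseWorkload create(String className) {
        try {
            Class<? extends BaseWorkload> cls =
                Class.forName(className).asSubclass(BaseWorkload.class);
            // Throws NoSuchMethodException if there is no no-argument constructor.
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot create workload " + className, e);
        }
    }
}
```

Calling WorkloadFactory.create(MyWorkload.class.getName()) returns a new MyWorkload; a class that is missing or does not extend the base class is rejected with an IllegalArgumentException.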

Step 2. Write code to initialize your workload class

The parameter file will be passed to the workload object after the constructor has been called, so if you use any parameter properties, you must initialize your workload from them in either the init() or the initThread() method. In either case, you can access the parameter properties through the Properties object passed to both methods. These properties include all properties defined in any property file passed to the YCSB Client or defined on the client command line.
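A minimal sketch of parameter-driven initialization, assuming a workload that reads its settings in init() and creates per-thread state in initThread(). The class ConfiguredWorkload, the property myworkload.batchsize and the exact method signatures are hypothetical; readproportion is borrowed from the core workload parameter file, with an illustrative default.

```java
import java.util.Properties;
import java.util.Random;

// Hypothetical workload showing where parameter properties are read.
class ConfiguredWorkload {
    double readProportion;
    int batchSize;

    // Called once after construction, with every client property.
    void init(Properties p) {
        // "readproportion" mirrors the core workload property; 0.95 is an
        // illustrative default. "myworkload.batchsize" is a made-up property.
        readProportion = Double.parseDouble(p.getProperty("readproportion", "0.95"));
        batchSize = Integer.parseInt(p.getProperty("myworkload.batchsize", "1"));
        if (readProportion < 0.0 || readProportion > 1.0) {
            throw new IllegalArgumentException("readproportion must be in [0, 1]");
        }
    }

    // Called once per worker thread; the returned object is that thread's
    // private state (here, its own random number generator).
    Object initThread(Properties p, int threadId, int threadCount) {
        return new Random(threadId);
    }
}
```

Validating values in init() means a bad parameter file fails immediately, before any operations are issued.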

Step 3. Write any cleanup code

The cleanup() method is called once for all workload instances, after the workload has completed.

Step 4. Define the records to be inserted

The YCSB Client will call the doInsert() method once for each record to be inserted into the database. So you should implement this method to create and insert a single record. The DB object you can use to perform the insert will be passed to the doInsert() method.
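The sketch below shows one way to honour the one-call-one-record contract of doInsert(): take the next key, fill a few fields with random values, and hand the record to the database layer. RecordStore is a hypothetical stand-in for the DB object, and the table name, key prefix, field count and field length are illustrative choices rather than values taken from YCSB.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the DB interface layer: insert one record.
interface RecordStore {
    boolean insert(String table, String key, Map<String, String> values);
}

// Sketch of doInsert(): each call creates and inserts exactly one record.
class InsertingWorkload {
    static final String TABLE = "usertable";
    static final int FIELD_COUNT = 3;
    static final int FIELD_LENGTH = 8;

    private final AtomicLong nextKey = new AtomicLong(); // key sequence
    private final Random random = new Random(42);

    boolean doInsert(RecordStore db) {
        String key = "user" + nextKey.getAndIncrement();
        Map<String, String> values = new HashMap<>();
        for (int i = 0; i < FIELD_COUNT; i++) {
            values.put("field" + i, randomString(FIELD_LENGTH));
        }
        return db.insert(TABLE, key, values);
    }

    // Builds a string of random lowercase letters.
    private String randomString(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }
}
```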

Step 5. Define the transactions

The YCSB Client will call the doTransaction() method once for every transaction that is to be executed, so you should implement this method to execute a single transaction, using the DB object passed in to access the database. Your implementation can choose between different types of transactions and can make multiple calls to the DB interface layer; however, each invocation of the method should be one logical transaction. When you run the client, you specify the number of operations to execute: if you request 1000 operations, doTransaction() will be executed 1000 times.
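To illustrate a doTransaction() that chooses between transaction types, the sketch below performs either a read or an update on one key, picking a read with probability readproportion. KeyValueDb and MixedWorkload are hypothetical stand-ins; keys are chosen uniformly for brevity, unlike the zipfian request distribution used in workloada.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Random;

// Hypothetical stand-in for the DB interface layer.
interface KeyValueDb {
    boolean read(String key, Map<String, String> result);
    boolean update(String key, Map<String, String> values);
}

// Sketch of doTransaction(): one call is one logical transaction, chosen at
// random as a read or an update according to readproportion.
class MixedWorkload {
    private double readProportion;
    private int recordCount;
    private final Random random = new Random();

    public MixedWorkload() { }

    void init(Properties p) {
        readProportion = Double.parseDouble(p.getProperty("readproportion", "0.5"));
        recordCount = Integer.parseInt(p.getProperty("recordcount", "1000"));
    }

    boolean doTransaction(KeyValueDb db) {
        // Uniform key choice, for brevity only.
        String key = "user" + random.nextInt(recordCount);
        if (random.nextDouble() < readProportion) {
            return db.read(key, new HashMap<>());
        }
        Map<String, String> values = new HashMap<>();
        values.put("field0", "updated-value");
        return db.update(key, values);
    }
}
```

Over many calls the share of reads approaches readproportion; with readproportion=0.5, roughly half of 10,000 calls end up as reads.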

Note that you do not have to do any throttling of your transactions (or record insertions) to achieve the target throughput. The YCSB Client will do the throttling for you.

Note also that it is allowable to insert records inside the doTransaction() method. You might do this if you wish the database to grow during the workload. In this case, the initial dataset is constructed using calls to the doInsert() method, while additional records are inserted from within doTransaction().
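A minimal sketch of a data set that grows during the transaction phase, assuming that doInsert() and doTransaction() draw keys from the same counter so that new records never overwrite loaded ones. GrowingWorkload is hypothetical, a Map stands in for the DB object, and insertProportion would normally come from the parameter file. If several workload objects run at once, the counter would have to be shared between them.

```java
import java.util.Map;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical workload whose data set grows during the run.
class GrowingWorkload {
    // Would normally be read from the parameter file in init().
    double insertProportion = 0.1;
    // One key counter shared by doInsert() and doTransaction().
    private final AtomicLong keySequence = new AtomicLong();
    private final Random random = new Random();

    // Load phase: one new record per call.
    boolean doInsert(Map<String, String> db) {
        String key = "user" + keySequence.getAndIncrement();
        return db.putIfAbsent(key, "initial") == null;
    }

    // Transaction phase: sometimes insert a new record, otherwise read.
    boolean doTransaction(Map<String, String> db) {
        if (random.nextDouble() < insertProportion) {
            String key = "user" + keySequence.getAndIncrement();
            return db.putIfAbsent(key, "inserted") == null;
        }
        long issued = keySequence.get();
        if (issued == 0) {
            return false; // nothing has been inserted yet
        }
        // Read one of the keys handed out so far.
        String key = "user" + Math.floorMod(random.nextLong(), issued);
        return db.containsKey(key);
    }
}
```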

Step 6. Measure latency, if necessary

The YCSB client will automatically measure the latency and throughput of database operations, even for workloads that you define. However, the client will only measure the latency of individual calls to the database, not of more complex transactions. Consider, for example, a workload that reads a record, modifies it, and writes the changes back to the database. The YCSB client will automatically measure the latency of the read operation and, separately, the latency of the update operation. If you would like to measure the latency of the entire read-modify-write transaction, you will need to add an additional timing step to your code.

Measurements are gathered using the Measurements.measure() call. There is a singleton instance of Measurements, which can be obtained using the Measurements.getMeasurements() static method. For each metric you are measuring, you need to assign a string tag; this tag will label the resulting average, min, max, histogram etc. measurements output by the tool at the end of the workload. For example, consider the following code:

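// Time the whole read-modify-write sequence as one unit.
long st = System.currentTimeMillis();
db.read(TABLENAME, keyname, fields, new HashMap<>());
db.update(TABLENAME, keyname, values);
long en = System.currentTimeMillis();
// Report the elapsed milliseconds under a custom tag.
Measurements.getMeasurements().measure("READ-MODIFY-WRITE", (int) (en - st));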
In this code, the calls to System.currentTimeMillis() are used to time the read and write transaction. Then, the call to measure() reports the latency to the measurement component.

Using this pattern, your custom measurements will be gathered and aggregated using the same mechanism that is used to gather measurements for individual READ, UPDATE etc. operations.
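To make the aggregation idea concrete, here is a toy, in-memory stand-in that keeps a count, minimum, maximum and average per tag and can time a composite operation as one unit. ToyMeasurements is hypothetical; it mirrors only the shape of the Measurements.measure(tag, latency) call described above, not how YCSB stores or reports results (which also include histograms).

```java
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for per-tag latency aggregation. Not YCSB's Measurements
// class: it keeps only count, min, max and sum (no histogram).
class ToyMeasurements {
    static final class Stats {
        long count;
        long sum;
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;

        void add(int latency) {
            count++;
            sum += latency;
            min = Math.min(min, latency);
            max = Math.max(max, latency);
        }

        double average() {
            return count == 0 ? 0.0 : (double) sum / count;
        }
    }

    private final Map<String, Stats> byTag = new TreeMap<>();

    // Records one latency sample under the given tag.
    synchronized void measure(String tag, int latency) {
        byTag.computeIfAbsent(tag, t -> new Stats()).add(latency);
    }

    // Returns the statistics for a tag, or null if nothing was recorded.
    synchronized Stats get(String tag) {
        return byTag.get(tag);
    }

    // Times a composite operation (such as a read-modify-write) as one unit,
    // in milliseconds, matching the System.currentTimeMillis() example.
    void timed(String tag, Runnable operation) {
        long start = System.currentTimeMillis();
        operation.run();
        measure(tag, (int) (System.currentTimeMillis() - start));
    }
}
```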

Step 7. Use it with the YCSB Client

Make sure that the classes for your implementation (or a jar containing those classes) are available on your CLASSPATH, as well as any libraries/jar files used by your implementation. Now, when you run the YCSB Client, specify the "workload" property to provide the fully qualified classname of your workload class. For example:
workload=com.foo.YourWorkloadClass

YCSB - Yahoo! Research - Contact cooperb@yahoo-inc.com.