diff --git a/doc/index.html b/doc/index.html index 936790ed..a2ef77a3 100644 --- a/doc/index.html +++ b/doc/index.html @@ -11,8 +11,7 @@
- % cd ycsb - % ant --If your build was successful, you should be able to run the Client to get the usage message: -
- % java -cp build/ycsb.jar com.yahoo.ycsb.Client -Usage: java com.yahoo.ycsb.Client [options] -Options: - -threads n: execute using n threads (default: 1) - can also be specified as the - "threadcount" property using -p - -target n: attempt to do n operations per second (default: unlimited) - can also - be specified as the "target" property using -p - -load: run the loading phase of the workload - -t: run the transactions phase of the workload (default) - -db dbname: specify the name of the DB to use (default: com.yahoo.ycsb.BasicDB) - - can also be specified as the "db" property using -p - -P propertyfile: load properties from the given file. Multiple files can - be specified, and will be processed in the order specified - -p name=value: specify a property to be passed to the DB and workloads; - multiple properties can be specified, and override any - values in the propertyfile - -s: show status during run (default: no status) - -l label: use label for status (e.g. to label one experiment out of a whole batch) - -Required properties: - recordcount: how many records in the database - operationcount: how many transactions to execute - workload: the name of the workload class to use (e.g. com.yahoo.ycsb.workloads.CoreWorkload) --
-You must also create or set up tables/keyspaces/storage buckets to store records. The details vary according to each database system, and depend on the workload you wish -to run. Before the YCSB Client runs, the tables must be created, since the Client itself will not request to create the tables. This is because for some systems, there -is a manual (human-operated) step to create tables, and for other systems, the table must be created before the database cluster is started. -
-The tables that must be created depend on the workload. For CoreWorkload, the YCSB Client will assume that there is a "table" called "usertable" with a -flexible schema: columns can be added at runtime as desired. This "usertable" can be mapped into whatever storage container is appropriate. For example, in MySQL you would "CREATE TABLE," in -Cassandra you would define a keyspace in the Cassandra configuration, and so on. The database interface layer (described in step 2) will receive requests for -reading or writing records in "usertable" and translate them into requests for the actual storage you have allocated. This may mean that you have to provide -information for the database interface layer to help it understand what the structure of the underlying storage is. For example, in Cassandra, you must define -"column families" in addition to keyspaces. Thus, it is necessary to create a column family and give the family some name (for example, you might use "values"). Then, -the database interface layer will need to know to refer to the "values" column family, either because the string "values" is passed in as a property, or because it is -hardcoded in the database interface layer. -
-The YCSB Client is distributed with a simple dummy interface layer, com.yahoo.ycsb.BasicDB. This layer just prints the operations it would have executed to System.out. It can be useful for ensuring -that the client is operating properly, and for debugging your workloads. -
-Other sample DB interface layer classes are distributed in src/com/yahoo/ycsb/db. To build those classes, run: -
-% ant dbcompile --Note that these classes will not be built using a normal ant execution, and not included in the resulting ycsb.jar. Thus, to use these classes, you will need -to 1. have them on your classpath, and 2. have any required libraries also on your classpath. For example, the Cassandra database interface layer is com.yahoo.ycsb.db.CassandraClient, and requires the libthrift.jar -to be accessible on your classpath. -
-For more details about implementing a DB interface layer, see here. -
-You specify both the java class and the parameter file on the command line when you run the YCSB Client. The Client will dynamically load your workload class, pass -it the properties from the parameters file (and any additional properties specified on the command line) and then execute the workload. This happens both for the loading and transaction phases, -as the same properties and workload logic applies to both. For example, if the loading phase creates records with 10 fields, then the transaction phase must know that there are 10 fields -it can query and modify. -
-The CoreWorkload is a package of standard workloads that is distributed with the YCSB and can be used directly. In particular, the CoreWorkload defines a simple mix of read/insert/update/scan operations. The relative -frequency of each operation is defined in the parameter file, as are other properties of the workload. Thus, by changing the parameter file, a variety of different concrete workloads can be executed. For more details -on the CoreWorkload, see here. -
-If the CoreWorkload does not satisfy your needs, you can define your own workload by subclassing the com.yahoo.ycsb.Workload class. Details for doing this are here. - -
-For example, consider the benchmark workload A (more details about the standard workloads are XXX here XXX). To load the standard dataset: - -
-% java -cp build/ycsb.jar com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.BasicDB -P workloads/workloada --A few notes about this command: -
-The standard workload parameter files create very small databases; for example, workloada creates only 1,000 records. This is useful while debugging your setup. However, -to run an actual benchmark you'll want to generate a much larger database. For example, imagine you want to load 100 million records. Then, you will need to -override the default "recordcount" property in the workloada file. This can be done in one of two ways: -
-recordcount=100000000 --Then, run the client as follows: -
-% java -cp build/ycsb.jar com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.BasicDB -P workloads/workloada -P large.dat --The client will load both property files, but will use the value of recordcount from the last file it loaded, i.e. large.dat. -
-% java -cp build/ycsb.jar com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.BasicDB -P workloads/workloada -p recordcount=100000000 --
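The two override routes above follow a "last one wins" rule: property files given with -P are applied in order, and values given with -p override anything from a file. The Python sketch below only illustrates that precedence rule; it is not the Client's code, and the name resolve_properties and the dictionaries standing in for property files are hypothetical.

```python
def resolve_properties(property_files, command_line_props):
    """Merge properties so that later sources override earlier ones.

    property_files: list of dicts, in the order given with -P (hypothetical stand-ins).
    command_line_props: dict of name=value pairs given with -p.
    """
    merged = {}
    for props in property_files:
        merged.update(props)           # a later -P file overrides an earlier one
    merged.update(command_line_props)  # -p values override any file
    return merged


# Stand-ins for the files; the real workloada file holds more properties.
workloada = {"recordcount": "1000"}
large_dat = {"recordcount": "100000000"}

via_file = resolve_properties([workloada, large_dat], {})
via_flag = resolve_properties([workloada], {"recordcount": "100000000"})
```

Both routes end with the same effective recordcount, which is why either one can be used to scale up the load.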
-Because a large database load will take a long time, you may wish to 1. require the Client to produce status, and 2. direct any output to a datafile. Thus, you might -execute the following to load your database: -
-% java -cp build/ycsb.jar com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.BasicDB -P workloads/workloada -P large.dat -s > load.dat --The "-s" parameter makes the Client print a status report to System.err. Thus, the output of this command might be: -
-% java -cp build/ycsb.jar com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.BasicDB -P workloads/workloada -P large.dat -s > load.dat -Loading workload... (might take a few minutes in some cases for large data sets) -Starting test. - 0 sec: 0 operations - 10 sec: 61731 operations; 6170.6317473010795 operations/sec - 20 sec: 129054 operations; 6450.76477056883 operations/sec -... --This status output will help you to see how quickly load operations are executing (so you can estimate the completion time of the load) as well as verify that the -load is making progress. -
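As a rough illustration of how a status line can be turned into a completion estimate, the Python sketch below extrapolates from the operations completed so far. It assumes the load rate stays constant for the rest of the run, which may not hold in practice; the function name estimate_remaining_seconds is hypothetical and not part of YCSB.

```python
def estimate_remaining_seconds(total_records, operations_done, elapsed_seconds):
    """Extrapolate the time left in a load, assuming a constant rate."""
    if operations_done <= 0 or elapsed_seconds <= 0:
        raise ValueError("need at least one completed operation and a positive elapsed time")
    rate = operations_done / elapsed_seconds  # operations per second so far
    return (total_records - operations_done) / rate


# Figures taken from the sample status output: 129054 operations after 20 seconds,
# loading 100 million records in total.
remaining = estimate_remaining_seconds(100_000_000, 129_054, 20)
print(f"about {remaining / 3600:.1f} hours remaining")
```

With the sample figures, the estimate comes out at a little over four hours.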
-When the load completes, the Client will report statistics about the performance of the load. These statistics are the same as in the transaction phase, so see below for information on -interpreting those statistics. -
-To execute the workload, you can use the following command: -
-% java -cp build/ycsb.jar com.yahoo.ycsb.Client -t -db com.yahoo.ycsb.BasicDB -P workloads/workloada -P large.dat -s > transactions.dat --The main difference in this invocation is that we used the "-t" parameter to tell the Client to run the transaction phase instead of the loading phase. -If you used BasicDB and examine the resulting transactions.dat file, you will see a combination of read and update requests, as well as statistics about the execution. -
-Typically you will want to use the "-threads" and "-target" parameters to control the amount of offered load. For example, we might want 10 threads attempting a total of 100 operations per second (i.e. 10 operations/sec per thread). As long -as the average latency of operations is not above 100 ms, each thread will be able to carry out its intended 10 operations per second. In general, you need enough threads so that no thread is attempting more operations per second than is possible; otherwise, -your achieved throughput will be less than the specified target throughput. For this example, we can execute: -
-% java -cp build/ycsb.jar com.yahoo.ycsb.Client -t -db com.yahoo.ycsb.BasicDB -P workloads/workloada -P large.dat -s -threads 10 -target 100 > transactions.dat --Note in this example we have used the "-threads 10" command line parameter to specify 10 threads, and "-target 100" command line parameter to specify 100 operations per second as the target. -Alternatively, both values can be set in your parameters file using the "threadcount" and "target" properties respectively. For example: -
-threadcount=10 -target=100 --
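The relationship between thread count, target throughput, and latency can be sketched in Python. This is a simplified model that assumes each thread issues one operation at a time and that every operation takes the average latency; the function names are hypothetical and do not come from YCSB.

```python
def per_thread_rate(target_ops_per_sec, threads):
    """Operations per second each thread must sustain to meet the target."""
    return target_ops_per_sec / threads


def max_latency_ms_to_meet_target(target_ops_per_sec, threads):
    """Largest average latency at which each thread can still keep up."""
    return 1000.0 / per_thread_rate(target_ops_per_sec, threads)


def achievable_throughput(target_ops_per_sec, threads, avg_latency_ms):
    """Throughput actually reached under the simplified model."""
    if avg_latency_ms <= 0:
        return float(target_ops_per_sec)  # negligible latency: the target is the only limit
    capacity = threads * 1000.0 / avg_latency_ms  # most the threads can do together
    return min(float(target_ops_per_sec), capacity)
```

For 10 threads and a target of 100 operations per second, each thread must sustain 10 operations per second, so the target is reachable as long as the average latency stays at or below 100 ms. At an average latency of 200 ms the same 10 threads would top out at 50 operations per second under this model.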
- -At the end of the run, the Client will report performance statistics on System.out. In the above example, these statistics will be written to the transactions.dat file. -The default is to produce average, min, max, 95th and 99th percentile latency for each operation type (read, update, etc.), a count of the return codes for each operation, and a histogram of latencies for each -operation. The return codes are defined by your database interface layer, and allow you to see if there were any errors during the workload. For the above example, we might get output like: -
-[OVERALL],RunTime(ms), 10110 -[OVERALL],Throughput(ops/sec), 98.91196834817013 -[UPDATE], Operations, 491 -[UPDATE], AverageLatency(ms), 0.054989816700611 -[UPDATE], MinLatency(ms), 0 -[UPDATE], MaxLatency(ms), 1 -[UPDATE], 95thPercentileLatency(ms), 1 -[UPDATE], 99thPercentileLatency(ms), 1 -[UPDATE], Return=0, 491 -[UPDATE], 0, 464 -[UPDATE], 1, 27 -[UPDATE], 2, 0 -[UPDATE], 3, 0 -[UPDATE], 4, 0 -... --This output indicates: -
-While a histogram of latencies is often useful, sometimes a timeseries is more useful. To request a time series, specify the "measurementtype=timeseries" property on the Client command line -or in a properties file. By default, the Client will report average latency for each interval of 1000 milliseconds. You can specify a different granularity for reporting -using the "timeseries.granularity" property. For example: -
-% java -cp build/ycsb.jar com.yahoo.ycsb.Client -t -db com.yahoo.ycsb.BasicDB -P workloads/workloada -P large.dat -s -threads 10 -target 100 -p measurementtype=timeseries -p timeseries.granularity=2000 > transactions.dat --will report a time series, with readings averaged every 2,000 milliseconds (i.e. 2 seconds). The result will be: -
-[OVERALL],RunTime(ms), 10077 -[OVERALL],Throughput(ops/sec), 9923.58836955443 -[UPDATE], Operations, 50396 -[UPDATE], AverageLatency(ms), 0.04339630129375347 -[UPDATE], MinLatency(ms), 0 -[UPDATE], MaxLatency(ms), 338 -[UPDATE], Return=0, 50396 -[UPDATE], 0, 0.10264765784114054 -[UPDATE], 2000, 0.026989343690867442 -[UPDATE], 4000, 0.0352882703777336 -[UPDATE], 6000, 0.004238958990536277 -[UPDATE], 8000, 0.052813085033008175 -[UPDATE], 10000, 0.0 -[READ], Operations, 49604 -[READ], AverageLatency(ms), 0.038242883638416256 -[READ], MinLatency(ms), 0 -[READ], MaxLatency(ms), 230 -[READ], Return=0, 49604 -[READ], 0, 0.08997245741099663 -[READ], 2000, 0.02207505518763797 -[READ], 4000, 0.03188493260913297 -[READ], 6000, 0.004869141813755326 -[READ], 8000, 0.04355329949238579 -[READ], 10000, 0.005405405405405406 --This output shows separate time series for update and read operations, with data reported every 2000 milliseconds. The data reported for a time point is the average over the previous 2000 milliseconds only. -(In this case we used 100,000 operations and a target of 10,000 operations per second for a more interesting output.) -A note about latency measurements: the Client measures the end to end latency of executing a particular operation against the database. That is, it starts a timer before calling the appropriate method in the DB interface layer -class, and stops the timer when the method returns. Thus latencies include: executing inside the interface layer, network latency to the database server, and database execution time. They do not include delays introduced for throttling -the target throughput. That is, if you specify a target of 10 operations per second (and a single thread) then the Client will only execute an operation every 100 milliseconds. If the operation takes 12 milliseconds, then the client -will wait for an additional 88 milliseconds before trying the next operation. 
However, the reported latency will not include this wait time; a latency of 12 milliseconds, not 100, will be reported. + +
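The throttling arithmetic in the note above can be written out as a small Python sketch. It models a single thread only and is not the Client's implementation; the function name throttle_schedule is hypothetical.

```python
def throttle_schedule(target_ops_per_sec, operation_latency_ms):
    """Split one throttled interval into operation time and wait time."""
    interval_ms = 1000.0 / target_ops_per_sec                # time budget per operation
    wait_ms = max(0.0, interval_ms - operation_latency_ms)   # idle time added by throttling
    return {
        "interval_ms": interval_ms,
        "wait_ms": wait_ms,
        # Only the operation itself is measured; the wait is excluded.
        "reported_latency_ms": operation_latency_ms,
    }


schedule = throttle_schedule(target_ops_per_sec=10, operation_latency_ms=12)
```

For a target of 10 operations per second and a 12 ms operation, the interval is 100 ms, the wait is 88 ms, and the reported latency is still 12 ms. If an operation takes longer than the interval, the wait drops to zero and the thread falls behind its target.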