Mirror of https://github.com/Azure/YCSB.git

[scylla] enable token aware LB by default, improve the docs (#1507)

* [scylla] token awareness is enabled by default

  See https://docs.datastax.com/en/developer/java-driver/3.10/manual/load_balancing/ :

  > If you don't explicitly configure the policy,
  > you get the default, which is a datacenter-aware,
  > token-aware policy.

* [scylla] driver: update to 3.10.2-scylla-1
* [scylla] log used consistency level
* [scylla] doc: add latency correction section
* [scylla] doc: dump and merge histograms
* [scylla] doc: don't configure connections manually

  https://github.com/scylladb/java-driver/commits/3.10.2-scylla

* [scylla] doc: details to sections 1, 2, 4, 5

Parent: a2d512771a
Commit: ce3eb9ce51
pom.xml (2 lines changed)

@@ -143,7 +143,7 @@ LICENSE file.
     <rocksdb.version>6.2.2</rocksdb.version>
     <s3.version>1.10.20</s3.version>
     <seaweed.client.version>1.4.1</seaweed.client.version>
-    <scylla.cql.version>3.10.1-scylla-0</scylla.cql.version>
+    <scylla.cql.version>3.10.2-scylla-1</scylla.cql.version>
     <solr7.version>7.7.2</solr7.version>
     <tarantool.version>1.6.5</tarantool.version>
     <thrift.version>0.8.0</thrift.version>
scylla/README.md (136 lines changed)

@@ -53,7 +53,6 @@ Create a keyspace, and a table as mentioned above. Load data with:
     -p readproportion=0 -p updateproportion=0 \
     -p fieldcount=10 -p fieldlength=128 \
     -p insertstart=0 -p insertcount=1000000000 \
-    -p cassandra.coreconnections=14 -p cassandra.maxconnections=14 \
     -p cassandra.username=cassandra -p cassandra.password=cassandra \
     -p scylla.hosts=ip1,ip2,ip3,...
@@ -63,27 +62,32 @@ Use as following:
     -target 120000 -threads 840 -p recordcount=1000000000 \
     -p fieldcount=10 -p fieldlength=128 \
     -p operationcount=50000000 \
-    -p scylla.coreconnections=280 -p scylla.maxconnections=280 \
     -p scylla.username=cassandra -p scylla.password=cassandra \
-    -p scylla.hosts=ip1,ip2,ip3,... \
-    -p scylla.tokenaware=true
+    -p scylla.hosts=ip1,ip2,ip3,...

 ## On choosing meaningful configuration

 ### 1. Load target

-You want to test how a database handles load. To get the performance picture
-you would want to look at the latency distribution and utilization under the
-constant load. To select load use `-target` to state desired throughput level.
+Suppose you want to test how a database handles an OLTP load.
+
+In this case, to get the performance picture you want to look at the latency
+distribution and utilization at a sustained throughput that is independent
+of the processing speed. This kind of system is called an open-loop system.
+Use the `-target` flag to state the desired request arrival rate.

 For example `-target 120000` means that we expect YCSB workers to generate
-120,000 requests per second (RPS, QPS or TPS) to the database.
+120,000 requests per second (RPS, QPS or TPS) overall to the database.

-Why is this important? Because without setting target throughput you will be
-looking only on the system equilibrium point that in the face of constantly
-varying latency will not allow you to see either throughput, nor latency.
+Why is this important? First, we want to look at the latency at some sustained
+throughput target, not vice versa. Second, without a throughput target, the
+system+loader pair will converge to a closed-loop system that has completely
+different characteristics from what we wanted to measure. The load will settle
+at the system equilibrium point. You will be able to find the throughput, which
+will depend on the number of loader threads (workers), but not the latency -
+only the service time. This is not what we intended to measure.

-For more information check out these resources on the coordinated omission problem:
+For more information check out these resources on the coordinated omission problem.

 See
 [[1]](http://highscalability.com/blog/2015/10/5/your-load-generator-is-probably-lying-to-you-take-the-red-pi.html)
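The open-loop behaviour that `-target` enforces is easier to see in code. Below is a minimal, self-contained sketch of fixed-rate pacing with latency correction; it illustrates the idea only and is not YCSB's implementation - `operation()` is a hypothetical stand-in for a database call.

    import java.util.concurrent.TimeUnit;

    public class OpenLoopPacer {
      // Hypothetical stand-in for one database operation (~2 ms of service time).
      static void operation() throws InterruptedException {
        Thread.sleep(2);
      }

      public static void main(String[] args) throws InterruptedException {
        final long targetOpsPerSec = 1000; // like -target 1000
        final long period = TimeUnit.SECONDS.toNanos(1) / targetOpsPerSec;
        long intended = System.nanoTime();

        for (int i = 0; i < 10; i++) {
          intended += period; // the schedule is fixed by the target, not by the loader
          long now = System.nanoTime();
          if (intended > now) {
            TimeUnit.NANOSECONDS.sleep(intended - now); // wait for the intended start
          }
          long actualStart = System.nanoTime();
          operation();
          long done = System.nanoTime();

          System.out.printf("op: %d us, intended: %d us%n",
              (done - actualStart) / 1000,  // service time, what "op" reports
              (done - intended) / 1000);    // corrected latency, what "intended" reports
        }
      }
    }

If `operation()` stalls, the following intended start times stay on schedule, so the corrected numbers keep growing while the service time barely moves - exactly the gap that coordinated omission hides.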
@@ -92,7 +96,51 @@ See
 and [[this]](https://www.youtube.com/watch?v=lJ8ydIuPFeU)
 great talk by Gil Tene.

-### 2. Parallelism factor and threads
+### 2. Latency correction
+
+To measure latency, it is not enough to just set a target.
+The latencies must be measured with a correction, since we apply
+a closed-class loader to an open-class problem. This is what YCSB
+calls an Intended operation.
+
+Intended operations have points in time at which they were intended to be
+executed, according to the schedule defined by the load target (`-target`).
+We must correct the measurement if we did not manage to execute an operation
+in time.
+
+A fair measurement consists of the operation latency and its correction
+to the point of its intended execution. Even if you don't want a completely
+fair measurement, use "both":
+
+    -p measurement.interval=both
+
+Other options are "op" and "intended". "op" is the default.
+
+Another flag that affects measurement quality is the histogram type,
+`-p measurementtype`, but it has long defaulted to "hdrhistogram",
+which should be fine for most use cases.
+
+### 3. Latency percentiles and multiple loaders
+
+Latency percentiles can't be averaged. Don't fall into this trap.
+Neither averaged latencies nor averaged p99s make any sense.
+
+If you run a single loader instance, look for P99 - the 99th percentile.
+If you run multiple loaders, dump the resulting histograms with:
+
+    -p measurement.histogram.verbose=true
+
+or
+
+    -p hdrhistogram.fileoutput=true
+    -p hdrhistogram.output.path=file.hdr
+
+merge them manually, and extract the required percentiles from the
+joined result.
+
+Remember that running multiple workloads may distort the original
+distributions those workloads were intended to produce.
+
+### 4. Parallelism factor and threads

 Scylla utilizes [thread-per-core](https://www.scylladb.com/product/technology/) architecture design.
 That means that a Node consists of shards that are mapped to the CPU cores 1-per-core.
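When several loaders each dump their own histogram, the merge can be done with the HdrHistogram library that backs YCSB's `hdrhistogram` measurement type. A minimal sketch, assuming two interval logs produced via `hdrhistogram.fileoutput=true` (the file names are placeholders):

    import java.io.File;
    import org.HdrHistogram.EncodableHistogram;
    import org.HdrHistogram.Histogram;
    import org.HdrHistogram.HistogramLogReader;

    public class MergeHdr {
      public static void main(String[] args) throws Exception {
        Histogram merged = new Histogram(3); // 3 significant value digits

        for (String name : new String[] {"loader1.hdr", "loader2.hdr"}) {
          HistogramLogReader reader = new HistogramLogReader(new File(name));
          EncodableHistogram interval;
          while ((interval = reader.nextIntervalHistogram()) != null) {
            merged.add((Histogram) interval); // sum every interval from every loader
          }
        }

        // Extract the required percentiles from the joined result.
        System.out.println("p99   = " + merged.getValueAtPercentile(99.0));
        System.out.println("p99.9 = " + merged.getValueAtPercentile(99.9));
      }
    }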
@@ -113,41 +161,45 @@ of shards, and the number of nodes in the cluster. For example:

     =>

-    threads = K * shards * nodes = K * 14 * nodes
-
-    for i3.4xlarge where
-
-    - K is parallelism factor >= 1,
+    threads = K * shards per node * nodes
+
+    - K is the parallelism factor:
+
+      K >= Target Throughput / QPS per Worker / Shards per node / Nodes / Workers per shard >= 1
+
+      where
+
+      Target Throughput = -target
+      QPS per Worker    = 1000 [ms/second] / Latency in ms expected at the target percentile
+      Shards per node   = vCPU per cluster node - 2
+      Nodes             = the number of nodes in the cluster
+      Workers per shard = Target Throughput / Shards per node / Nodes / QPS per Worker

     - Nodes is the number of nodes in the cluster.

 For example for 3 nodes `i3.4xlarge` and `-threads 840` means
 `K = 20`, `shards = 14`, and `threads = 14 * 20 * 3`.

 Thus, `K` - the parallelism factor - must be selected first. If you
 don't know what you want out of it, start with 1.

 To pick the desired parallelism factor it is useful to start from the desired
 `target` parameter. It is better if the `target` is a multiple of `threads`.

 Another concern is that for high throughput scenarios you would probably
 want to keep the shards' incoming queues non-empty. For that, your parallelism
 factor must be at least 2.

-### 3. Number of connections
+### 5. Number of connections

-Both `scylla.coreconnections` and `scylla.maxconnections` define limits
-per node. When you see `-p scylla.coreconnections=280 -p scylla.maxconnections=280`
-that means 280 connections per node.
+If you use the original Cassandra drivers, you need to pick the proper number
+of connections per host. Scylla drivers do not require this to be configured
+and by default create a connection per shard. For example, if your node has
+16 vCPU and thus 14 shards, Scylla drivers will create 14 connections
+per host. An excess of connections may result in degraded latency.

-Number of connections must be a multiple of:
-
-- number of _shards_
-- parallelism factor `K`
+The database client protocol is asynchronous and allows queueing requests in
+a single connection. The default queue limit is 1024 for local keys and 256
+for remote ones. The current binding implementation does not require tuning this.

-For example, for `i3.4xlarge` that has 14 shards per node and `K = 20`
-it makes sense to pick `connections = shards * K = 14 * 20 = 280`.
+Both `scylla.coreconnections` and `scylla.maxconnections` define limits per node.
+When you see `-p scylla.coreconnections=14 -p scylla.maxconnections=14` that means
+14 connections per node.
+
+Pick the number of connections per host to be divisible by the number of _shards_.

-### 4. Other considerations
+### 6. Other considerations

 Consistency levels do not change the consistency model or its strength.
 Even with `-p scylla.writeconsistencylevel=ONE` the data will be written
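As a sanity check, the thread-count arithmetic above in code, with the constants of the i3.4xlarge example (16 vCPU per node, 2 reserved for interrupt handling):

    public class ThreadCount {
      public static void main(String[] args) {
        int nodes = 3;              // cluster size
        int shardsPerNode = 16 - 2; // i3.4xlarge: 16 vCPU, 2 for interrupts -> 14
        int k = 20;                 // parallelism factor; start with 1 if unsure

        int threads = k * shardsPerNode * nodes;
        System.out.println("-threads " + threads); // 20 * 14 * 3 = 840
      }
    }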
@@ -159,16 +211,13 @@ latency picture a bit but would not affect utilization.
 Remember that you can't measure CPU utilization with Scylla by normal
 Unix tools. Check out Scylla's own metrics to see the real reactor utilization.

-Always use [token aware](https://www.scylladb.com/2019/03/27/best-practices-for-scylla-applications/)
-load balancing `-p scylla.tokenaware=true`.
-
 For best performance it is crucial to evenly load all available shards.

-### 5. Expected performance target
+### 7. Expected performance target

 You can expect about 12500 uOPS / core (shard), where uOPS are basic
 read and write operations post replication. Don't forget that usually
-`Core = 2 * vCPU` for HT systems.
+`Core = 2 vCPU` for HT systems.

 For example if we insert a row with RF = 3 we can count at least 3 writes -
 1 write for each replica. That is, 1 transaction = 3 uOPS.
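To make the arithmetic concrete, a rough capacity ceiling for the example cluster used throughout this README (3 nodes of i3.4xlarge, 14 shards per node, RF = 3; the 12500 uOPS/shard figure is the estimate from the text above):

    public class CapacityEstimate {
      public static void main(String[] args) {
        int nodes = 3, shardsPerNode = 14;
        int uopsPerShard = 12_500; // basic post-replication ops per shard
        int rf = 3;                // each insert costs RF micro-operations

        long uops = (long) nodes * shardsPerNode * uopsPerShard; // 525,000 uOPS
        long writesPerSec = uops / rf;                           // ~175,000 writes/s
        System.out.println("ceiling ~" + writesPerSec + " TPS");
      }
    }

Against that ~175,000 TPS ceiling, the `-target 120000` used in the examples leaves reasonable headroom.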
@@ -235,12 +284,9 @@ of 3 nodes of i3.4xlarge (16 vCPU per node) and target of 120000 is:
   * Default is false
   * https://docs.scylladb.com/using-scylla/tracing/

-- `scylla.tokenaware`
-  - Enable token awareness
-  - Default value is false.
-
-- `scylla.tokenaware_local_dc`
-  - Restrict Round Robin child policy with the local dc nodes
+- `scylla.local_dc`
+  - Specify the local datacenter for a multi-dc setup.
+  - When set, the binding defaults to the LOCAL_QUORUM consistency level.
   - Default value is empty.

 - `scylla.lwt`
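As an illustration, a multi-dc run passes the new property alongside the usual flags; the host list and dc name are placeholders, and the command shape assumes the standard `bin/ycsb` launcher:

    bin/ycsb run scylla -s -P workloads/workloada \
        -p scylla.hosts=ip1,ip2,ip3 \
        -p scylla.local_dc=dc1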
scylla/src/main/java/site/ycsb/db/scylla/ScyllaCQLClient.java

@@ -16,10 +16,10 @@
 package site.ycsb.db.scylla;

 import com.datastax.driver.core.*;
-import com.datastax.driver.core.policies.LoadBalancingPolicy;
-import com.datastax.driver.core.querybuilder.*;
-import com.datastax.driver.core.policies.TokenAwarePolicy;
-import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
+import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
+import com.datastax.driver.core.policies.LoadBalancingPolicy;
+import com.datastax.driver.core.policies.TokenAwarePolicy;
+import com.datastax.driver.core.querybuilder.*;
 import site.ycsb.ByteArrayByteIterator;
 import site.ycsb.ByteIterator;
 import site.ycsb.DB;
@@ -86,9 +86,7 @@ public class ScyllaCQLClient extends DB {

   public static final String SCYLLA_LWT = "scylla.lwt";

-  public static final String TOKEN_AWARE = "scylla.tokenaware";
-  public static final String TOKEN_AWARE_DEFAULT = "false";
-  public static final String TOKEN_AWARE_LOCAL_DC = "scylla.tokenaware_local_dc";
+  public static final String TOKEN_AWARE_LOCAL_DC = "scylla.local_dc";

   public static final String TRACING_PROPERTY = "scylla.tracing";
   public static final String TRACING_PROPERTY_DEFAULT = "false";
@@ -162,17 +160,22 @@ public class ScyllaCQLClient extends DB {
             .addContactPoints(hosts);
       }

-      if (Boolean.parseBoolean(getProperties().getProperty(TOKEN_AWARE, TOKEN_AWARE_DEFAULT))) {
-        LoadBalancingPolicy child;
-        String localDc = getProperties().getProperty(TOKEN_AWARE_LOCAL_DC);
-        if (localDc != null && !localDc.isEmpty()) {
-          child = DCAwareRoundRobinPolicy.builder().withLocalDc(localDc).build();
-          LOGGER.info("Using shard awareness with local DC: {}\n", localDc);
-        } else {
-          child = DCAwareRoundRobinPolicy.builder().build();
-          LOGGER.info("Using shard awareness\n");
-        }
-        builder = builder.withLoadBalancingPolicy(new TokenAwarePolicy(child));
-      }
+      final String localDC = getProperties().getProperty(TOKEN_AWARE_LOCAL_DC);
+      if (localDC != null && !localDC.isEmpty()) {
+        final LoadBalancingPolicy local = DCAwareRoundRobinPolicy.builder().withLocalDc(localDC).build();
+        final TokenAwarePolicy tokenAware = new TokenAwarePolicy(local);
+        builder = builder.withLoadBalancingPolicy(tokenAware);
+
+        LOGGER.info("Using local datacenter with token awareness: {}\n", localDC);
+
+        // If not overridden explicitly, use LOCAL_QUORUM
+        if (getProperties().getProperty(READ_CONSISTENCY_LEVEL_PROPERTY) == null) {
+          readConsistencyLevel = ConsistencyLevel.LOCAL_QUORUM;
+        }
+
+        if (getProperties().getProperty(WRITE_CONSISTENCY_LEVEL_PROPERTY) == null) {
+          writeConsistencyLevel = ConsistencyLevel.LOCAL_QUORUM;
+        }
+      }

       cluster = builder.build();
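For context on why the old `scylla.tokenaware` branch could be dropped: in the 3.x driver, leaving the load balancing policy unset already yields the token-aware, datacenter-aware default quoted in the commit message. A minimal sketch of that equivalence (the contact point is a placeholder):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class DefaultLbDemo {
      public static void main(String[] args) {
        // No withLoadBalancingPolicy(...): the 3.x driver falls back to
        // Policies.defaultLoadBalancingPolicy(), a token-aware wrapper
        // around a DC-aware round-robin policy.
        Cluster implicit = Cluster.builder()
            .addContactPoint("ip1")
            .build();

        // Spelled out, that default is equivalent to:
        Cluster explicit = Cluster.builder()
            .addContactPoint("ip1")
            .withLoadBalancingPolicy(
                new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build()))
            .build();

        implicit.close();
        explicit.close();
      }
    }

The explicit branch in the diff is therefore only needed when `scylla.local_dc` pins the local datacenter.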
@@ -224,6 +227,10 @@ public class ScyllaCQLClient extends DB {
       } else {
        LOGGER.info("Not using LWT\n");
       }
+
+      LOGGER.info("Read consistency: {}, Write consistency: {}\n",
+          readConsistencyLevel.name(),
+          writeConsistencyLevel.name());
     } catch (Exception e) {
       throw new DBException(e);
     }