<!--
Copyright (c) 2015-2017 YCSB contributors. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you
may not use this file except in compliance with the License. You
may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied. See the License for the specific language governing
permissions and limitations under the License. See accompanying
LICENSE file.
-->
# HBase (1.y) Driver for YCSB
This driver is a binding for the YCSB facilities to operate against an HBase 1 cluster, using a shaded client that tries to avoid leaking third-party libraries.
# Testing HBase
## 1. Start an HBase Server
You need to start a single node or a cluster to point the client at. Please see [Apache HBase Reference Guide](http://hbase.apache.org/book.html) for more details and instructions.
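For a quick local test, a standalone server is enough. A minimal sketch, assuming an extracted HBase distribution in `/HBASE-HOME-DIR`:
```
cd /HBASE-HOME-DIR
bin/start-hbase.sh   # starts a standalone HBase backed by the local filesystem
bin/hbase shell      # then run `status` in the shell to verify the server is up
```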
## 2. Set up YCSB
Download the [latest YCSB](https://github.com/brianfrankcooper/YCSB/releases/latest) release tarball, extract it, and follow the instructions.
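For example, with release `0.17.0` (the latest at the time of writing):
```
curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.17.0/ycsb-0.17.0.tar.gz
tar xfvz ycsb-0.17.0.tar.gz
cd ycsb-0.17.0
```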
## 3. Create an HBase table for testing
For best results, use the pre-splitting strategy recommended in [HBASE-4163](https://issues.apache.org/jira/browse/HBASE-4163):
```
hbase(main):001:0> n_splits = 200 # HBase recommends (10 * number of regionservers)
hbase(main):002:0> create 'usertable', 'family', {SPLITS => (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}}
```
*Failing to do so will cause all writes to initially target a single region server*.
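You can sanity-check the split from the shell afterwards; `list_regions` is available in recent HBase shell versions (1.4 and later), so this sketch may not work on older releases:
```
hbase(main):003:0> list_regions 'usertable'
```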
## 4. Run the Workload
Before you can actually run the workload, you need to "load" the data first.
You should specify an HBase config directory (or any other directory containing your `hbase-site.xml`), a table name, and a column family (`-cp` is used to set the Java classpath and `-p` is used to set various properties).
```
bin/ycsb load hbase1 -P workloads/workloada -cp /HBASE-HOME-DIR/conf -p table=usertable -p columnfamily=family
```
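For larger experiments you will typically override the record count and use several client threads. A sketch with illustrative values (`-threads` sets the client thread count, `-s` prints periodic status):
```
bin/ycsb load hbase1 -P workloads/workloada -cp /HBASE-HOME-DIR/conf -p table=usertable -p columnfamily=family -p recordcount=1000000 -threads 16 -s
```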
Then, you can run the workload:
```
bin/ycsb run hbase1 -P workloads/workloada -cp /HBASE-HOME-DIR/conf -p table=usertable -p columnfamily=family
```
Please see the general instructions in the `doc` folder if you are not sure how it all works. You can apply additional properties (as seen in the next section) like this:
```
bin/ycsb run hbase1 -P workloads/workloada -cp /HBASE-HOME-DIR/conf -p table=usertable -p columnfamily=family -p clientbuffering=true
```
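Core YCSB flags can shape the run phase the same way; for example, a sketch that caps the target throughput and the number of operations (values are illustrative):
```
bin/ycsb run hbase1 -P workloads/workloada -cp /HBASE-HOME-DIR/conf -p table=usertable -p columnfamily=family -p operationcount=1000000 -threads 16 -target 5000 -s
```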
## Configuration Options
The following options can be configured using `-p` (or collected in a properties file, as sketched at the end of this section):
* `columnfamily`: The HBase column family to target.
* `debug` : If true, debugging logs are activated. The default is false.
* `hbase.usepagefilter` : If true, HBase
[PageFilter](https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html)s
are used to limit the number of records consumed in a scan operation. The default is true.
* `principal`: If testing needs to be done against a secure HBase cluster using a Kerberos keytab,
this property can be used to pass the principal in the keytab file.
* `keytab`: The Kerberos keytab file name and location can be passed through this property.
* `clientbuffering`: Whether or not to use client side buffering and batching of write operations. This can significantly improve performance and defaults to false.
* `writebuffersize`: The maximum amount, in bytes, of data to buffer on the client side before a flush is forced. The default is 12MB. Only used when `clientbuffering` is true.
* `durability`: Whether or not writes should be appended to the WAL. Bypassing the WAL can improve throughput but data cannot be recovered in the event of a crash. The default is true.
Additional HBase settings should be provided in the `hbase-site.xml` file located in your `/HBASE-HOME-DIR/conf` directory. Typically this will be `/etc/hbase/conf`.
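If you pass many options, it can be cleaner to collect them in a properties file and supply it with an additional `-P`. A sketch, where `hbase1.properties` is a hypothetical file name and the values are illustrative:
```
# hbase1.properties (hypothetical)
table=usertable
columnfamily=family
clientbuffering=true
writebuffersize=4194304
```
You can then run `bin/ycsb run hbase1 -P workloads/workloada -P hbase1.properties -cp /HBASE-HOME-DIR/conf`.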
## Bigtable
Google's Bigtable service provides an implementation of the HBase API for migrating existing applications. Users can perform load tests against Bigtable using this binding.
### 1. Set up a Bigtable Cluster
Login to the Google Cloud Console and follow the [Creating Cluster](https://cloud.google.com/bigtable/docs/creating-cluster) steps. Make a note of your cluster name, zone and project ID.
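If you prefer the command line over the console, a `gcloud` sketch follows; the instance and cluster names here are placeholders, and these flags have changed across `gcloud` versions, so check `gcloud bigtable instances create --help` first:
```
gcloud bigtable instances create ycsb-test --display-name="YCSB test" --cluster=ycsb-cluster --cluster-zone=us-central1-b --cluster-num-nodes=3
```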
### 2. Launch the Bigtable Shell
From the Cloud Console, launch a shell and follow the [Quickstart](https://cloud.google.com/bigtable/docs/quickstart) up to step 4 where you launch the HBase shell.
### 3. Create a Table
For best results, use the pre-splitting strategy recommended in [HBASE-4163](https://issues.apache.org/jira/browse/HBASE-4163):
```
hbase(main):001:0> n_splits = 200 # HBase recommends (10 * number of regionservers)
hbase(main):002:0> create 'usertable', 'cf', {SPLITS => (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}}
```
Make a note of the column family; in this example it's `cf`.
### 4. Download the Bigtable Client Jar with required dependencies:
```
mvn -N dependency:copy -Dartifact=com.google.cloud.bigtable:bigtable-hbase-1.x-hadoop:1.0.0 -DoutputDirectory=target/bigtable-deps
mvn -N dependency:copy -Dartifact=io.dropwizard.metrics:metrics-core:3.1.2 -DoutputDirectory=target/bigtable-deps
```
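Both jars should now be in `target/bigtable-deps`; a quick check (the file names follow from the artifact coordinates above):
```
ls target/bigtable-deps
# bigtable-hbase-1.x-hadoop-1.0.0.jar  metrics-core-3.1.2.jar
```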
Alternatively, download the latest `bigtable-hbase-1.x-hadoop` jar from [Maven](http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22com.google.cloud.bigtable%22%20AND%20a%3A%22bigtable-hbase-1.x-hadoop%22) to your host.
### 5. Download JSON Credentials
Follow these instructions for [Generating a JSON key](https://cloud.google.com/bigtable/docs/installing-hbase-shell#service-account) and save it to your host.
### 6. Create or Edit hbase-site.xml
If you have an existing HBase configuration directory with an `hbase-site.xml` file, edit the file as shown below. If not, create a directory called `conf` under the `hbase1` directory and create a file in it named `hbase-site.xml`. Provide the following settings in the XML file, making sure to replace the bracketed examples with the proper values from your Cloud console.
```
<configuration>
  <property>
    <name>hbase.client.connection.impl</name>
    <value>com.google.cloud.bigtable.hbase1_x.BigtableConnection</value>
  </property>
  <property>
    <name>google.bigtable.project.id</name>
    <value>[YOUR-PROJECT-ID]</value>
  </property>
  <property>
    <name>google.bigtable.instance.id</name>
    <value>[YOUR-INSTANCE-ID]</value>
  </property>
  <property>
    <name>google.bigtable.auth.service.account.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>google.bigtable.auth.json.keyfile</name>
    <value>[PATH-TO-YOUR-KEY-FILE]</value>
  </property>
</configuration>
```
If you have an existing HBase config directory, make sure to add it to the class path via `-cp <PATH_TO_BIGTABLE_JAR>:<CONF_DIR>`.
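For example, assuming the `conf` directory from step 6 lives at `/path/to/your/conf` (a placeholder path):
```
bin/ycsb load hbase1 -p columnfamily=cf -cp 'target/bigtable-deps/*:/path/to/your/conf' -P workloads/workloada
```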
### 7. Execute a Workload
Switch to the root of the YCSB repo, choose the workload you want to run, and `load` it first. On the command line you must provide the column family and put the Bigtable client dependencies on the class path.
```
bin/ycsb load hbase1 -p columnfamily=cf -cp 'target/bigtable-deps/*' -P workloads/workloada
```
The `load` step only executes inserts into the datastore. After loading data, run the same workload to mix reads with writes.
```
bin/ycsb run hbase1 -p columnfamily=cf -cp 'target/bigtable-deps/*' -P workloads/workloada
```
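YCSB writes its measurements to stdout, so it is common to redirect each phase to a file for later comparison:
```
bin/ycsb load hbase1 -p columnfamily=cf -cp 'target/bigtable-deps/*' -P workloads/workloada > outputLoad.txt
bin/ycsb run hbase1 -p columnfamily=cf -cp 'target/bigtable-deps/*' -P workloads/workloada > outputRun.txt
```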