Merge pull request #219 from kasun04/master

Kafka Compaction sample
This commit is contained in:
Eric Lam (MSFT) 2022-11-16 15:00:51 -08:00 committed by GitHub
Parent d224d5b1c2 e1ce3c6931
Commit 436fdb4fac
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
10 changed files: 499 additions and 0 deletions


@ -0,0 +1,109 @@
# Use Kafka Compaction with Azure Event Hubs
This quickstart shows how to use Kafka compaction with Azure Event Hubs. With the log compaction feature of Event Hubs, you can use a key-based retention mechanism in which Event Hubs retains the last known value for each event key of an event hub or Kafka topic.
In this quickstart, the example producer application publishes a series of events and then publishes updated events for some of the same keys. Once the compaction job for the event hub/topic completes, the consumer should therefore see only the most recent event for each key.
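To make the retention rule concrete, here is a minimal in-memory sketch of key-based compaction. It does not describe how the Event Hubs compaction job is implemented; the names `CompactionSketch` and `compact` are hypothetical, and the log is modeled as a plain list of key/value pairs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: models a log as an ordered list of {key, value} pairs.
class CompactionSketch {

    // Keeps only the most recent value for each key.
    static Map<String, String> compact(List<String[]> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] event : log) {
            // Remove first so the map's iteration order follows the last write of each key.
            latest.remove(event[0]);
            latest.put(event[0], event[1]);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<String[]> log = new ArrayList<>();
        log.add(new String[] {"Key-0", "V1_0"});
        log.add(new String[] {"Key-1", "V1_1"});
        log.add(new String[] {"Key-0", "V2_0"}); // update for Key-0
        System.out.println(compact(log)); // prints {Key-1=V1_1, Key-0=V2_0}
    }
}
```

The older `V1_0` event for `Key-0` disappears because a newer event with the same key exists, while `Key-1` keeps its only event.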
## Prerequisites
If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?ref=microsoft.com&utm_source=microsoft.com&utm_medium=docs&utm_campaign=visualstudio) before you begin.
In addition:
* [Java Development Kit (JDK) 1.7+](http://www.oracle.com/technetwork/java/javase/downloads/index.html)
    * On Ubuntu, run `apt-get install default-jdk` to install the JDK.
    * Be sure to set the JAVA_HOME environment variable to point to the folder where the JDK is installed.
* [Download](http://maven.apache.org/download.cgi) and [install](http://maven.apache.org/install.html) a Maven binary archive
    * On Ubuntu, you can run `apt-get install maven` to install Maven.
* [Git](https://www.git-scm.com/downloads)
    * On Ubuntu, you can run `sudo apt-get install git` to install Git.
## Create an Event Hubs namespace
An Event Hubs namespace is required to send or receive from any Event Hubs service. See [Create Kafka-enabled Event Hubs](https://docs.microsoft.com/azure/event-hubs/event-hubs-create-kafka-enabled) for instructions on getting an Event Hubs Kafka endpoint. Make sure to copy the Event Hubs connection string for later use.
## Create a compact event hub/Kafka topic
You can create a new event hub inside the namespace that you created in the previous step. To create an event hub/Kafka topic that has log compaction enabled, make sure you set the *compaction policy* to *compaction* and provide the desired value for *tombstone retention time*. See [Create an event hub](https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-create#create-an-event-hub) for instructions on how to create an event hub using the Azure portal.
### FQDN
For these samples, you will need the connection string from the portal as well as the FQDN that points to your Event Hubs namespace. **The FQDN can be found within your connection string as follows**:
`Endpoint=sb://`**`mynamespace.servicebus.windows.net`**`/;SharedAccessKeyName=XXXXXX;SharedAccessKey=XXXXXX`
If your Event Hubs namespace is deployed on a non-Public cloud, your domain name may differ (e.g. \*.servicebus.chinacloudapi.cn, \*.servicebus.usgovcloudapi.net, or \*.servicebus.cloudapi.de).
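If you would rather derive the FQDN in code than copy it by hand, the following sketch shows one way to do it. It assumes the connection string has the `Endpoint=sb://...` shape shown above; the class and method names are hypothetical and are not part of the sample.

```java
import java.net.URI;

// Illustrative helper: pulls the namespace FQDN out of an Event Hubs connection string.
class ConnectionStringSketch {

    // Returns the host part of the Endpoint=sb://<fqdn>/ segment.
    static String extractFqdn(String connectionString) {
        for (String part : connectionString.split(";")) {
            String trimmed = part.trim();
            if (trimmed.startsWith("Endpoint=")) {
                String endpoint = trimmed.substring("Endpoint=".length());
                String host = URI.create(endpoint).getHost();
                if (host != null) {
                    return host;
                }
            }
        }
        throw new IllegalArgumentException("No Endpoint=sb://... segment found");
    }

    // The Kafka endpoint in this quickstart is the FQDN on port 9093.
    static String bootstrapServer(String connectionString) {
        return extractFqdn(connectionString) + ":9093";
    }

    public static void main(String[] args) {
        String cs = "Endpoint=sb://mynamespace.servicebus.windows.net/;"
                + "SharedAccessKeyName=XXXXXX;SharedAccessKey=XXXXXX";
        System.out.println(bootstrapServer(cs)); // prints mynamespace.servicebus.windows.net:9093
    }
}
```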
## Clone the example project
Now that you have a Kafka-enabled Event Hubs connection string, clone the Azure Event Hubs for Kafka repository and navigate to the `tutorials/compaction/java` subfolder:
```bash
git clone https://github.com/Azure/azure-event-hubs-for-kafka.git
cd azure-event-hubs-for-kafka/tutorials/compaction/java
```
## Producer
Use the provided producer example to send messages to the Event Hubs service.
The producer application publishes 100 events (each value prefixed with `V1_`) using the keys `Key-0` to `Key-99`. It then publishes a second set of updated events (each value prefixed with `V2_`) for the keys `Key-0` to `Key-49`.
Therefore, once the Kafka topic is compacted, the consumer application should see only the updated events for the keys `Key-0` to `Key-49`, while the remaining keys keep their original `V1_` events.
### Provide an Event Hubs Kafka endpoint
#### producer.config
Update the `bootstrap.servers` and `sasl.jaas.config` values in `producer/src/main/resources/producer.config` to direct the producer to the Event Hubs Kafka endpoint with the correct authentication.
```config
bootstrap.servers=mynamespace.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=XXXXXX;SharedAccessKey=XXXXXX";
```
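The same four settings can also be assembled in code. The sketch below uses only `java.util.Properties` and mirrors the config file above; the class name `KafkaEndpointConfigSketch` and its `build` method are hypothetical. Note that the username is the literal string `$ConnectionString`, exactly as in the config file, and the connection string itself is the password.

```java
import java.util.Properties;

// Illustrative only: builds the same settings as producer.config programmatically.
class KafkaEndpointConfigSketch {

    static Properties build(String fqdn, String connectionString) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", fqdn + ":9093");
        p.setProperty("security.protocol", "SASL_SSL");
        p.setProperty("sasl.mechanism", "PLAIN");
        // The username is the literal text $ConnectionString; the password is the connection string.
        p.setProperty("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"$ConnectionString\" password=\"" + connectionString + "\";");
        return p;
    }

    public static void main(String[] args) {
        Properties p = build("mynamespace.servicebus.windows.net",
                "Endpoint=sb://mynamespace.servicebus.windows.net/;"
                + "SharedAccessKeyName=XXXXXX;SharedAccessKey=XXXXXX");
        // Print only the non-secret setting.
        System.out.println(p.getProperty("bootstrap.servers"));
    }
}
```

A `Properties` object built this way could be passed to a Kafka client constructor in place of the values loaded from the file; keeping the secret out of source control is then up to the caller.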
### Run producer from command line
This sample is configured to send messages to the topic `contoso-compacted`. To use a different topic, change the `TOPIC` constant in `producer/src/main/java/TestProducer.java`.
To run the producer from the command line, generate the JAR and then run from within Maven (alternatively, generate the JAR using Maven, then run in Java by adding the necessary Kafka JAR(s) to the classpath):
```bash
mvn clean package
mvn exec:java -Dexec.mainClass="TestProducer"
```
The producer will now begin sending events to the Kafka-enabled Event Hub at topic `contoso-compacted` (or whatever topic you chose) and printing the events to stdout.
## Consumer
Before running the consumer, wait a few minutes so that the topic compaction job can complete. Then use the provided consumer example to receive messages from the Kafka API of Event Hubs. If the compaction job has successfully completed, you should see only the updated events (values with the `V2_` prefix) for the keys `Key-0` to `Key-49`.
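One informal way to judge whether compaction has finished is to look for keys that show up more than once in the consumer output. The sketch below is a hypothetical helper, not part of the sample; it assumes you have collected the consumed keys into a list, and it treats any repeated key as a hint that older events for that key are still present.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative check: reports keys that were consumed more than once.
class CompactionCheckSketch {

    static Set<String> duplicateKeys(List<String> consumedKeys) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (String key : consumedKeys) {
            // add() returns false when the key was already seen.
            if (!seen.add(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("Key-0", "Key-1", "Key-0");
        System.out.println(duplicateKeys(keys)); // prints [Key-0]
    }
}
```

An empty result is consistent with a fully compacted topic, although it does not prove that the compaction job has run.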
### Provide an Event Hubs Kafka endpoint
#### consumer.config
Change the `bootstrap.servers` and `sasl.jaas.config` values in `consumer/src/main/resources/consumer.config` to direct the consumer to the Event Hubs endpoint with the correct authentication.
```config
bootstrap.servers=mynamespace.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=XXXXXX;SharedAccessKey=XXXXXX";
```
### Run consumer from command line
This sample is configured to receive messages from the topic `contoso-compacted`. To use a different topic, change the `TOPIC` constant in `consumer/src/main/java/TestConsumer.java`.
To run the consumer from the command line, generate the JAR and then run from within Maven (alternatively, generate the JAR using Maven, then run in Java by adding the necessary Kafka JAR(s) to the classpath):
```bash
mvn clean package
mvn exec:java -Dexec.mainClass="TestConsumer"
```
If the Kafka-enabled Event Hub has incoming events (for instance, if your example producer is also running), then the consumer should now begin receiving events from topic `contoso-compacted` (or whatever topic you chose).
By default, Kafka consumers will read from the end of the stream rather than the beginning. This means any events queued before you begin running your consumer will not be read. If you started your consumer but it isn't receiving any events, try running your producer again while your consumer is polling. Alternatively, you can use Kafka's [`auto.offset.reset` consumer config](https://kafka.apache.org/documentation/#newconsumerconfigs) to make your consumer read from the beginning of the stream!
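As a rough sketch of the `auto.offset.reset` option, the snippet below loads consumer settings into a `java.util.Properties` object and sets the value to `earliest` when the file does not already define it. The class and method names are hypothetical, and the config text is passed as a string only to keep the example self-contained. Keep in mind that Kafka applies this setting only when the consumer group has no committed offset for a partition.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Illustrative only: adds auto.offset.reset=earliest unless the config already sets it.
class OffsetResetSketch {

    static Properties withEarliestReset(String configText) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(configText));
        if (p.getProperty("auto.offset.reset") == null) {
            p.setProperty("auto.offset.reset", "earliest");
        }
        return p;
    }

    public static void main(String[] args) throws IOException {
        String config = "group.id=$Default\nrequest.timeout.ms=60000\n";
        Properties p = withEarliestReset(config);
        System.out.println(p.getProperty("auto.offset.reset")); // prints earliest
    }
}
```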


@ -0,0 +1,68 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.example.app</groupId>
<artifactId>event-hubs-kafka-java-consumer-for-compaction</artifactId>
<packaging>jar</packaging>
<version>1.0-SNAPSHOT</version>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
</properties>
<dependencies>
<!--v1.0-->
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>1.0.0</version>
</dependency>
<!--v1.1-->
<!-- <dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>1.1.0</version>
</dependency> -->
</dependencies>
<build>
<defaultGoal>install</defaultGoal>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.6.1</version>
<configuration>
<source>1.7</source>
<target>1.7</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-resources-plugin</artifactId>
<version>3.0.2</version>
<configuration>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>3.1.0</version>
<executions>
<execution>
<goals>
<goal>java</goal>
</goals>
</execution>
</executions>
<configuration>
<mainClass>TestConsumer</mainClass>
</configuration>
</plugin>
</plugins>
</build>
</project>


@ -0,0 +1,20 @@
//Copyright (c) Microsoft Corporation. All rights reserved.
//Licensed under the MIT License.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TestConsumer {
    //Change constant to receive messages from the desired topic
    private final static String TOPIC = "contoso-compacted";
    private final static int NUM_THREADS = 1;

    public static void main(String... args) throws Exception {
        //Run NUM_THREADS consumer threads; each one polls until the process is stopped
        final ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);
        for (int i = 0; i < NUM_THREADS; i++){
            executorService.execute(new TestConsumerThread(TOPIC));
        }
    }
}


@ -0,0 +1,73 @@
//Copyright (c) Microsoft Corporation. All rights reserved.
//Licensed under the MIT License.
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.io.FileReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Collections;
import java.util.Properties;

public class TestConsumerThread implements Runnable {
    private final String TOPIC;
    //Each consumer needs a unique client ID per thread
    private static int id = 0;

    public TestConsumerThread(final String TOPIC){
        this.TOPIC = TOPIC;
    }

    public void run (){
        final Consumer<String, String> consumer = createConsumer();
        System.out.println("Polling");
        try {
            while (true) {
                final ConsumerRecords<String, String> consumerRecords = consumer.poll(1000);
                for(ConsumerRecord<String, String> cr : consumerRecords) {
                    //Print only the value prefix (for example "V1_"); the rest of the payload is filler.
                    //A null value (tombstone) or a value of three characters or fewer is printed as-is.
                    final String value = cr.value();
                    final String prefix = (value != null && value.length() > 3) ? value.substring(0, 3) : value;
                    System.out.printf("Consumer Record(key, value):(%s, %s)\n", cr.key(), prefix);
                }
                consumer.commitAsync();
            }
        } catch (CommitFailedException e) {
            System.out.println("CommitFailedException: " + e);
        } finally {
            consumer.close();
        }
    }

    private Consumer<String, String> createConsumer() {
        try {
            final Properties properties = new Properties();
            synchronized (TestConsumerThread.class) {
                properties.put(ConsumerConfig.CLIENT_ID_CONFIG, "KafkaExampleConsumer#" + id);
                id++;
            }
            properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            //Get remaining properties from config file; the reader is closed once loading finishes
            try (FileReader reader = new FileReader("src/main/resources/consumer.config")) {
                properties.load(reader);
            }

            // Create the consumer using properties.
            final Consumer<String, String> consumer = new KafkaConsumer<>(properties);

            // Subscribe to the topic.
            consumer.subscribe(Collections.singletonList(TOPIC));
            return consumer;
        } catch (FileNotFoundException e){
            System.out.println("FileNotFoundException: " + e);
            System.exit(1);
            return null; //unreachable
        } catch (IOException e){
            System.out.println("IOException: " + e);
            System.exit(1);
            return null; //unreachable
        }
    }
}


@ -0,0 +1,6 @@
bootstrap.servers=mynamespace.servicebus.windows.net:9093
group.id=$Default
request.timeout.ms=60000
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=XXXXXX;SharedAccessKey=XXXXXX";


@ -0,0 +1,97 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.example.app</groupId>
<artifactId>event-hubs-kafka-java-producer-for-compaction</artifactId>
<packaging>jar</packaging>
<version>1.0-SNAPSHOT</version>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
</properties>
<dependencies>
<!--v1.0-->
<!-- <dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>1.0.0</version>
</dependency> -->
<!--v2.3.0-->
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>2.3.0</version>
</dependency>
<!-- Jackson is now a provided dependency of kafka-clients -->
<!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients/2.6.0 -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.12.6.1</version>
</dependency>
<!-- enable logging -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.25</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.25</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
<!--v1.1-->
<!-- <dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>1.1.0</version>
</dependency> -->
</dependencies>
<build>
<defaultGoal>install</defaultGoal>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.6.1</version>
<configuration>
<source>1.7</source>
<target>1.7</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-resources-plugin</artifactId>
<version>3.0.2</version>
<configuration>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>3.1.0</version>
<executions>
<execution>
<goals>
<goal>java</goal>
</goals>
</execution>
</executions>
<configuration>
<mainClass>TestProducer</mainClass>
</configuration>
</plugin>
</plugins>
</build>
</project>


@ -0,0 +1,64 @@
//Copyright (c) Microsoft Corporation. All rights reserved.
//Licensed under the MIT License.
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TestDataReporter implements Runnable {
    //Number of keys published in the first pass (values prefixed with "V1_")
    private static final int NUM_MESSAGES = 100;
    //Number of keys republished in the second pass (values prefixed with "V2_")
    private static final int NUM_UPDATED_MESSAGES = 50;
    final int oneMb = 1024 * 1024;
    final int kafkaSerializationOverhead = 512;
    private final String TOPIC;
    private Producer<String, String> producer;

    public TestDataReporter(final Producer<String, String> producer, String TOPIC) {
        this.producer = producer;
        this.TOPIC = TOPIC;
    }

    @Override
    public void run() {
        //Filler payload of just under 1 MB, appended to every event value
        byte[] data = new byte[oneMb - kafkaSerializationOverhead];
        Arrays.fill(data, (byte)'a');
        String largeDummyValue = new String(data, StandardCharsets.UTF_8);

        //First pass: one event per key
        for(int i = 0; i < NUM_MESSAGES; i++) {
            System.out.println("Publishing event: Key-" + i );
            send("Key-" + i, "V1_" + i + largeDummyValue);
        }

        //Second pass: updated events for the first NUM_UPDATED_MESSAGES keys
        for(int i = 0; i < NUM_UPDATED_MESSAGES; i++) {
            System.out.println("Publishing updated event: Key-" + i );
            send("Key-" + i, "V2_" + i + largeDummyValue);
        }

        //Wait for buffered sends to complete before reporting
        producer.flush();
        System.out.println("Finished sending " + (NUM_MESSAGES + NUM_UPDATED_MESSAGES) + " messages from thread #" + Thread.currentThread().getId() + "!");
    }

    //Sends one event asynchronously and exits the process if the send fails
    private void send(String key, String value) {
        final ProducerRecord<String, String> record = new ProducerRecord<String, String>(TOPIC, key, value);
        producer.send(record, new Callback() {
            public void onCompletion(RecordMetadata metadata, Exception exception) {
                if (exception != null) {
                    System.out.println(exception);
                    System.exit(1);
                }
            }
        });
    }
}


@ -0,0 +1,50 @@
//Copyright (c) Microsoft Corporation. All rights reserved.
//Licensed under the MIT License.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;
import java.io.FileReader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TestProducer {
    //Change constant to send messages to the desired topic; this example uses 'contoso-compacted'
    private final static String TOPIC = "contoso-compacted";
    private final static int NUM_THREADS = 1;

    public static void main(String... args) throws Exception {
        //Create Kafka Producer
        final Producer<String, String> producer = createProducer();
        Thread.sleep(5000);
        final ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);

        //Run NUM_THREADS TestDataReporters
        for (int i = 0; i < NUM_THREADS; i++)
            executorService.execute(new TestDataReporter(producer, TOPIC));
    }

    private static Producer<String, String> createProducer() {
        //Load connection settings from the config file; the reader is closed once loading finishes
        try (FileReader reader = new FileReader("src/main/resources/producer.config")) {
            Properties properties = new Properties();
            properties.load(reader);
            properties.put(ProducerConfig.CLIENT_ID_CONFIG, "KafkaExampleProducer");
            properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            return new KafkaProducer<>(properties);
        } catch (Exception e){
            System.out.println("Failed to create producer with exception: " + e);
            //Exit with a non-zero status so callers can detect the failure
            System.exit(1);
            return null; //unreachable
        }
    }
}


@ -0,0 +1,8 @@
# Root logger option
log4j.rootLogger=DEBUG, stdout
# Direct log messages to stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n


@ -0,0 +1,4 @@
bootstrap.servers=mynamespace.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=XXXXXX;SharedAccessKey=XXXXXX";