Граф коммитов

432 Коммитов

Автор SHA1 Сообщение Дата
David Arthur 51b8f26e2c Fix missing reference in kafka.py (#7715)
Also fix a default value for a dictionary arg
2020-04-13 10:57:50 -07:00
Brian Bushree efc131ab14 [MINOR] allow additional JVM args in KafkaService (#7297)
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Vikas Singh <vikas@confluent.io>
2020-04-13 10:57:38 -07:00
bill 054f334579 Bump version to 2.4.2-SNAPSHOT 2020-03-11 21:46:48 -04:00
bill c57222ae8c Bump version to 2.4.1 2020-03-02 19:32:42 -05:00
Brian Bushree bd4feef3dd MINOR: add wait_for_assigned_partitions to console-consumer (#8192)
what/why
the throttling_test was broken by this PR (#7785) since it depends on the consumer having partitions-assigned before starting the producer

this PR provides the ability to wait for partitions to be assigned in the console consumer before considering it started.

caveat
this does not support starting up the JmxTool inside the console-consumer for custom metrics while using this wait_until_partitions_assigned flag since the code assumes one JmxTool running per node.

I think a proper fix for this would be to make JmxTool its own standalone single-node service

alternatives
we could use the EndToEnd test suite which uses the verifiable producer/consumer under the hood but I found that there were more changes necessary to get this working unfortunately (specifically doesn't seem like this test suite plays nicely with the ProducerPerformanceService)

Reviewers: Mathew Wong <mwong@confluent.io>, Bill Bejeck <bbejeck.com>
2020-02-29 19:46:04 -05:00
Matthew Wong e60eb69afa throttle consumer timeout increase (#8188)
The test_throttled_reassignment test fails because the consumer that is used to validate reassignment does not start on time to consume all messages. This does not seem like an issue with the throttling of the reassignment, since increasing the timeout allowed the test to pass multiple consecutive runs locally.

This test seemed to rely on the default JmxTool for the console consumer that was removed in this commit: 179d0d7
The console consumer would check to see if it had partitions assigned to it before beginning to consume. Although the test occasionally failed with the JmxTool, it began to fail much more after the removal.

Error messages of failures followed the below format with varying numbers of missed messages. They are the first messages by the producer.

535 acked message did not make it to the Consumer. They are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19...plus 515 more. Total Acked: 192792, Total Consumed: 192259. We validated that the first 535 of these missing messages correctly made it into Kafka's data files. This suggests they were lost on their way to the consumer.
In the scope of the test, this error suggests that the test is falling into the race condition described in produce_consume_validate.py, which has the timeout to prevent the consumer from missing initial messages.

This can serve as a temporary fix until the logic of consumer startup is addressed further.

Reviewers: Jason Gustafson <jason@confluent.io>, Bill Bejeck <bbejeck@gmail.com>
2020-02-27 17:50:11 -05:00
Brian Bushree 179d0d73d6 MINOR: Disable JmxTool in kafkatest console-consumer by default (#7785)
Do not initialize `JmxTool` by default when running console consumer. In order to support this, we remove `has_partitions_assigned` and its only usage in an assertion inside `ProduceConsumeValidateTest`, which did not seem to contribute much to the validation.

Reviewers: David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>
2020-01-09 16:54:37 -08:00
A. Sophie Blee-Goldman 69f8fad99c HOTFIX: fix system test race condition (#7836)
In some system tests a Streams app is started and then prints a message to stdout, which the system test waits for to confirm the node has successfully been brought up. It then greps for certain log messages in a retriable loop.

But waiting on the Streams app to start/print to stdout does not mean the log file has been created yet, so the grep may return an error. Although this occurs in a retriable loop it is assumed that grep will not fail, and the result is piped to wc and then blindly converted to an int in the python function, which fails since the error message is a string (throws ValueError)

We should catch the ValueError and return a 0 so it can try again rather than immediately crash

Reviewers: Bill Bejeck <bbejeck@gmail.com>, John Roesler <vvcephei@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>
2020-01-06 18:34:25 -08:00
Manikumar Reddy c37fe3c671 Bump version to 2.4.1-SNAPSHOT 2019-12-14 08:07:29 +05:30
Manikumar Reddy bc5022e37a Merge tag '2.4.0-rc4' into 2.4
2.4.0-rc4
2019-12-14 08:07:11 +05:30
Randall Hauch d087eae684 MINOR: Simplify the timeout logic to handle protocol in Connect distributed system tests (#7806) 2019-12-10 12:31:29 -06:00
Manikumar Reddy 77a89fcf8d Bump version to 2.4.0 2019-12-09 22:16:25 +05:30
Randall Hauch 82d2446cd6 MINOR: Bump system test version from 2.2.1 to 2.2.2 (#7765)
Author: Randall Hauch <rhauch@gmail.com>
Reviewer: Ismael Juma <ismael@confluent.io>
2019-12-06 15:45:23 -06:00
Randall Hauch 0657879296 MINOR: Increase the timeout in one of Connect's distributed system tests (#7789)
Author: Randall Hauch <rhauch@gmail.com>
Reviewers: Nigel Liang <nigel@nigelliang.com>, Roesler <john@confluent.io>
2019-12-06 15:35:11 -06:00
David Arthur 0e14bc3432 Actually run the delete topic command in kafka.py (#7776)
Reviewed-By: Jason Gustafson <jason@confluent.io>
2019-12-05 08:15:51 +05:30
David Arthur 5ebd1e4232 KAFKA-9123 Test a large number of replicas (#7621)
Two tests using 50k replicas on 8 brokers:
* Do a rolling restart with clean shutdown, delete topics
* Run produce bench and consumer bench on a subset of topics

Reviewed-By: David Jacot <djacot@confluent.io>, Vikas Singh <vikas@confluent.io>, Jason Gustafson <jason@confluent.io>
2019-11-26 11:15:13 -05:00
David Arthur b8536cc7c6 KAFKA-8981 Add rate limiting to NetworkDegradeSpec (#7446) 2019-11-26 11:14:43 -05:00
Bruno Cadonna 0bc40c63e1 MINOR: Fix Streams EOS system tests by adding clean-up of state dir (#7693)
Recently, system tests test_rebalance_[simple|complex] failed
repeatedly with a verfication error. The cause was most probably
the missing clean-up of a state directory of one of the processors.

A node is cleaned up when a service on that node is started and when
a test is torn down.

If the clean-up flag clean_node_enabled of a EOS Streams service is
unset, the clean-up of the node is skipped.

The clean-up flag of processor1 in the EOS tests should stay set before
its first start, so that the node is cleaned before the service is started.
Afterwards for the multiple restarts of processor1 the cleans-up flag should
be unset to re-use the local state.

After the multiple restarts are done, the clean-up flag of processor1 should
again be set to trigger node clean-up during the test teardown.

A dirty node can lead to test failures when tests from Streams EOS tests are
scheduled on the same node, because the state store would not start empty
since it reads the local state that was not cleaned up.

Reviewers: Matthias J. Sax <mjsax@apache.org>, Andrew Choi <andchoi@linkedin.com>, Bill Bejeck <bbejeck@gmail.com>
2019-11-21 14:07:16 -05:00
Jason Gustafson fba3e7e1f9 KAFKA-9079: Fix reset logic in transactional message copier
The consumer's `committed` API does not return an entry in the response map for a requested partition if there is no committed offset. The transactional message copier, which is used in the transaction system test, did not account for this. If the first transaction attempted by the copier was randomly aborted, then we would not seek to the beginning as expected, which means we would fail to copy some of the records.

This patch fixes the problem by iterating over the assignment rather than the result of `committed` when resetting offsets. It also adds enables additional logging in the transaction message copier service to make finding problems easier in the future.

Author: Jason Gustafson <jason@confluent.io>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>

Closes #7653 from hachikuji/fix-transaction-system-test

(cherry picked from commit 903d66e2f9)
Signed-off-by: Manikumar Reddy <manikumar@confluent.io>
2019-11-06 16:00:10 +05:30
A. Sophie Blee-Goldman 6fc17bcadd HOTFIX: fix bug in VP test where it greps for the wrong log message (#7642)
We added more version info to the log messages but must not have cleanly backported the test update -- looking back it seems like older versions might also need a fix (separate PR as the version numbers are different in 2.1-2.3)

Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2019-11-04 16:14:25 -08:00
Boyang Chen 9dce3e7535 KAFKA-8972 (2.4 blocker): correctly release lost partitions during consumer.unsubscribe() (#7441)
Inside onLeavePrepare we would look into the assignment and try to revoke the owned tasks and notify users via RebalanceListener#onPartitionsRevoked, and then clear the assignment.

However, the subscription's assignment is already cleared in this.subscriptions.unsubscribe(); which means user's rebalance listener would never be triggered. In other words, from consumer client's pov nothing is owned after unsubscribe, but from the user caller's pov the partitions are not revoked yet. For callers like Kafka Streams which rely on the rebalance listener to maintain their internal state, this leads to inconsistent state management and failure cases.

Before KIP-429 this issue is hidden away since every time the consumer re-joins the group later, it would still revoke everything anyways regardless of the passed-in parameters of the rebalance listener; with KIP-429 this is easier to reproduce now.

Our fixes are following:

• Inside unsubscribe, first do onLeavePrepare / maybeLeaveGroup and then subscription.unsubscribe. This we we are guaranteed that the streams' tasks are all closed as revoked by then.
• [Optimization] If the generation is reset due to fatal error from join / hb response etc, then we know that all partitions are lost, and we should not trigger onPartitionRevoked, but instead just onPartitionsLost inside onLeavePrepare. This is because we don't want to commit for lost tracks during rebalance which is doomed to fail as we don't have any generation info.

Reviewers: Matthias J. Sax <matthias@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2019-10-29 10:42:13 -07:00
Konstantine Karantasis 99926bde1f KAFKA-9078: Fix Connect system test after adding MM2 connector classes
MM2 added a few connector classes in Connect's classpath and given that the assertion in the Connect REST system tests need to be adjusted to account for these additions.

This fix makes sure that the loaded Connect plugins are a superset of the expected by the test connectors.

Testing: The change is straightforward. The fix was tested with local system test runs.

Author: Konstantine Karantasis <konstantine@confluent.io>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>

Closes #7578 from kkonstantine/minor-fix-connect-test-after-mm2-classes
2019-10-23 15:48:07 +05:30
Bill Bejeck f38d47bf07 KAFKA-8496: System test for KIP-429 upgrades and compatibility (#7529)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2019-10-16 22:30:00 -07:00
A. Sophie Blee-Goldman de202a6697 KAFKA-8743: Flaky Test Repartition{WithMerge}OptimizingIntegrationTest (#7472)
All four flavors of the repartition/optimization tests have been reported as flaky and failed in one place or another:
* RepartitionOptimizingIntegrationTest.shouldSendCorrectRecords_OPTIMIZED
* RepartitionOptimizingIntegrationTest.shouldSendCorrectRecords_NO_OPTIMIZATION
* RepartitionWithMergeOptimizingIntegrationTest.shouldSendCorrectRecords_OPTIMIZED
* RepartitionWithMergeOptimizingIntegrationTest.shouldSendCorrectRecords_NO_OPTIMIZATION

They're pretty similar so it makes sense to knock them all out at once. This PR does three things:

* Switch to in-memory stores wherever possible
* Name all operators and update the Topology accordingly (not really a flaky test fix, but had to update the topology names anyway because of the IM stores so figured might as well)
* Port to TopologyTestDriver -- this is the "real" fix, should make a big difference as these repartition tests required multiple roundtrips with the Kafka cluster (while using only the default timeout)

Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2019-10-10 16:24:09 -07:00
A. Sophie Blee-Goldman 133c33fde1 KAFKA-8179: Part 7, cooperative rebalancing in Streams (#7386)
Key improvements with this PR:

* tasks will remain available for IQ during a rebalance (but not during restore)
* continue restoring and processing standby tasks during a rebalance
* continue processing active tasks during rebalance until the RecordQueue is empty*
* only revoked tasks must suspended/closed
* StreamsPartitionAssignor tries to return tasks to their previous consumers within a client
* but do not try to commit, for now (pending KAFKA-7312)


Reviewers: John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2019-10-07 09:30:02 -07:00
A. Sophie Blee-Goldman 3b05dc685b MINOR: just remove leader on trunk like we did on 2.3 (#7447)
Small follow-up to trunk PR #7423

While debugging the 2.3 VP PR we realized we should remove the leader-tracking from the VP system test altogether. We'd already merged the corresponding trunk PR so I made a quick new PR for trunk (also fixes a missed version bump in one of the log messages)

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2019-10-04 15:58:11 -07:00
Chris Egerton 791d0d61bf KAFKA-8804: Secure internal Connect REST endpoints (#7310)
Implemented KIP-507 to secure the internal Connect REST endpoints that are only for intra-cluster communication. A new V2 of the Connect subprotocol enables this feature, where the leader generates a new session key, shares it with the other workers via the configuration topic, and workers send and validate requests to these internal endpoints using the shared key.

Currently the internal `POST /connectors/<connector>/tasks` endpoint is the only one that is secured.

This change adds unit tests and makes some small alterations to system tests to target the new `sessioned` Connect subprotocol. A new integration test ensures that the endpoint is actually secured (i.e., requests with missing/invalid signatures are rejected with a 400 BAD RESPONSE status).

Author: Chris Egerton <chrise@confluent.io>
Reviewed: Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>
2019-10-02 17:06:57 -05:00
A. Sophie Blee-Goldman 8da69936a7 KAFKA-8649: Send latest commonly supported version in assignment (#7423)
Instead of sending the leader's version and having older members try to blindly upgrade.

The only other real change here is that we will also set the VERSION_PROBING error code and return early from onAssignment when we are upgrading our used subscription version (not just downgrading it) since this implies the whole group has finished the rolling upgrade and all members should rejoin with the new subscription version.

Also piggy-backing on a fix for a potentially dangerous edge case, where every thread of an instance is assigned the same set of active tasks.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2019-10-02 08:54:32 -07:00
Rajini Sivaram 0d31272b35
KAFKA-8848; Update system tests to use new AclAuthorizer (#7374)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2019-09-24 10:30:17 +01:00
Randall Hauch ada35d5ff4 Add recent versions of Kafka to the matrix of ConnectDistributedTest (#7024)
Reviewers: Arjun Satish <arjun@confluent.io>, Konstantine Karantasis <k.karantasis@gmail.com>
2019-09-18 10:21:21 -07:00
vinoth chandar ffef0871c2 KAFKA-7149 : Reducing streams assignment data size (#7185)
* Leader instance uses dictionary encoding on the wire to send topic partitions
* Topic names (most expensive component) are mapped to an integer using the dictionary
* Follower instances receive the dictionary, decode topic names back
* Purely an on-the-wire optimization, no in-memory structures changed
* Test case added for version 5 AssignmentInfo

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2019-09-05 13:50:55 -07:00
Colin Patrick McCabe a225347ff2
KAFKA-8840: Fix bug where ClientCompatibilityFeaturesTest fails when running multiple iterations (#7260)
Fix a bug where ClientCompatibilityFeaturesTest fails when running multiple iterations.

Also, fix a typo in tests/docker/Dockerfile.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2019-08-30 16:07:59 -07:00
Matthias J. Sax 4d1ee26a13
KAFKA-8594: Add version 2.3 to Streams system tests (#7131)
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>
2019-08-21 10:26:57 -07:00
David Arthur ff9e95cb09 MINOR: Add fetch from follower system test (#7166)
This adds a basic system test that enables rack-aware brokers with the rack-aware replica selector for fetch from followers (KIP-392). The test asserts that the follower was read from at least once and that all the messages that were produced were successfully consumed. 

Reviewers: Jason Gustafson <jason@confluent.io>
2019-08-13 12:33:05 -07:00
Arjun Satish 794637232c KAFKA-8774: Regex can be found anywhere in config value (#7197)
Corrected the AbstractHerder to correctly identify task configs that contain variables for externalized secrets. The original method incorrectly used `matcher.matches()` instead of `matcher.find()`. The former method expects the entire string to match the regex, whereas the second one can find a pattern anywhere within the input string (which fits this use case more correctly).

Added unit tests to cover various cases of a config with externalized secrets, and updated system tests to cover case where config value contains additional characters besides secret that requires regex pattern to be found anywhere in the string (as opposed to complete match).

Author: Arjun Satish <arjun@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
2019-08-13 09:40:12 -05:00
Matthias J. Sax e9a35fe02e
MINOR: Bump system test version from 2.2.0 to 2.2.1 (#6873)
Reviewers: Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>
2019-08-09 14:33:20 -07:00
Rajini Sivaram de8ce78a90
MINOR: Tolerate limited data loss for upgrade tests with old message format (#7102)
To avoid transient system test failures, tolerate a small amount of data loss due to truncation in upgrade system tests using older message format prior to KIP-101, where data loss was possible.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2019-07-31 16:19:36 +01:00
Brian Bushree e5f7220b23 MINOR: kafkatest - adding whitelist for interbroker sasl configs (#7093) 2019-07-22 09:38:28 +01:00
Ismael Juma d67495d6a7
KAFKA-8634: Update ZooKeeper to 3.5.5 (#6802)
ZooKeeper 3.5.5 is the first stable release in the 3.5.x series. The key new feature
in is TLS support, but there are a few more noteworthy features:

* Dynamic reconfiguration
* Local sessions
* New node types: Container, TTL
* Ability to remove watchers
* Multi-threaded commit processor
* Upgraded to Netty 4.1

See the release notes for more detail:
https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html

In addition to the version bump, we:

* Add `commons-cli` dependency as it's required by `ZooKeeperMain`, but specified as
`provided` in their pom.
* Remove unnecessary `ZooKeeperMainWrapper`, the bug it worked around was fixed
upstream a long time ago.
* Ignore non zero exit in one system test invocation of `ZooKeeperMain`.
`ZooKeeperMainWrapper` always returned `0` and `ZooKeeperService.query` relies
on that for correct behavior.

Reviewers: Jason Gustafson <jason@confluent.io>
2019-07-10 09:45:10 -07:00
Ismael Juma 57903be496
MINOR: Remove zkclient dependency (#7036)
ZkUtils was removed so we don't need this anymore.

Also:
* Fix ZkSecurityMigrator and ReplicaManagerTest not to
reference ZkClient classes.
* Remove references to zkclient in various `log4j.properties`
and `import-control.xml`.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
2019-07-05 07:50:32 -07:00
David Arthur 23beeea34b KAFKA-8443; Broker support for fetch from followers (#6832)
Follow on to #6731, this PR adds broker-side support for [KIP-392](https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica) (fetch from followers). 

Changes:
* All brokers will handle FetchRequest regardless of leadership
* Leaders can compute a preferred replica to return to the client
* New ReplicaSelector interface for determining the preferred replica
* Incremental fetches will include partitions with no records if the preferred replica has been computed
* Adds new JMX to expose the current preferred read replica of a partition in the consumer

Two new conditions were added for completing a delayed fetch. They both relate to communicating the high watermark to followers without waiting for a timeout:
* For regular fetches, if the high watermark changes within a single fetch request 
* For incremental fetch sessions, if the follower's high watermark is lower than the leader

A new JMX attribute `preferred-read-replica` was added to the `kafka.consumer:type=consumer-fetch-manager-metrics,client-id=some-consumer,topic=my-topic,partition=0` object. This was added to support the new system test which verifies that the fetch from follower behavior works end-to-end. This attribute could also be useful in the future when debugging problems with the consumer.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>
2019-07-04 08:18:51 -07:00
Brian Bushree 5287036b38 MINOR: system tests - avoid 'sasl.enabled.mechanisms' in listener overrides (#7018)
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2019-07-03 12:17:17 +01:00
Randall Hauch 6f91096c7d MINOR: Fix version for ConnectDistributed system test, remove 0.9.0.1 compatibility test (#7023)
Connect tests were using String version for KafkaService instead of the expected KafkaVersion object. This broke due to recent changes to KafkaVersion. It turns out that the tests with String version were running compatibility tests against `dev` brokers rather than the older broker versions they were expecting to run against. When version was fixed, tests using 0.9.0.1 brokers started failing since new clients are not compatible with 0.9.0.1 brokers. So this PR fixes version parameter and removes the two tests against 0.9.0.1 brokers.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
2019-07-02 19:10:41 +01:00
Colin Patrick McCabe 3d2d87abd1
MINOR: Add compatibility tests for 2.3.0 (#6995)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2019-06-28 09:25:08 -07:00
Brian Bushree 357aedeb1b MINOR: Support listener config overrides in system tests (#6981)
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2019-06-27 18:10:43 +01:00
Stanislav Vodetskyi 594d043037 MINOR: Fix failing upgrade test by supporting both security.inter.broker.protocol and inter.broker.listener.name depending on kafka version (#7000)
Reviewers: Brian Bushree <bbushree@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
2019-06-27 17:50:17 +01:00
Stanislav Vodetskyi f51d7d3c93 KAFKA-8557: system tests - add support for (optional) interbroker listener with the same security protocol as client listeners (#6938)
Reviewers: Brian Bushree <bbushree@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
2019-06-21 17:51:43 +01:00
David Arthur d7a5e31ca2
KAFKA-8519 Add trogdor action to slow down a network (#6912)
This adds a new Trogdor fault spec for inducing network latency on a network device for system testing. It operates very similarly to the existing network partition spec by executing the `tc` linux utility.
2019-06-21 11:30:05 -04:00
Boyang Chen e981b82601 KAFKA-8500; Static member rejoin should always update member.id (#6899)
This PR fixes a bug in static group membership. Previously we limit the `member.id` replacement in JoinGroup to only cases when the group is in Stable. This is error-prone and could potentially allow duplicate consumers reading from the same topic. For example, imagine a case where two unknown members join in the `PrepareRebalance` stage at the same time. 

The PR fixes the following things:

1. Replace `member.id` at any time we see a known static member rejoins group with unknown member.id
2. Immediately fence any ongoing join/sync group callback to early terminate the duplicate member.
3. Clearly handle Dead/Empty cases as exceptional.
4. Return old leader id upon static member leader rejoin to avoid trivial member assignment being triggered.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
2019-06-12 08:41:58 -07:00
Jason Gustafson 2feb44ebc8 MINOR: Fix race condition on shutdown of verifiable producer
We've seen `ReplicaVerificationToolTest.test_replica_lags` fail occasionally due to errors such as the following:
```
RemoteCommandError: ubuntuworker7: Command 'kill -15 2896' returned non-zero exit status 1. Remote error message: bash: line 0: kill: (2896) - No such process
```
The problem seems to be a shutdown race condition when using `max_messages` with the producer. The process may already be gone which will cause the signal to fail.

Author: Jason Gustafson <jason@confluent.io>

Reviewers: Gwen Shapira

Closes #6906 from hachikuji/fix-failing-replicat-verification-test
2019-06-07 16:56:21 -07:00