Skip to content

Conversation

@clarkwtc
Copy link
Contributor

@clarkwtc clarkwtc commented Mar 4, 2025

Migrate AdminClientRebootstrapTest to the new test infra and remove the old Scala test.

The test results
Screenshot 2025-03-04 at 10 27 21 PM

Reviewers: TengYao Chi kitingiao@gmail.com, David Arthur mumrah@gmail.com

@github-actions github-actions bot added triage PRs from the community core Kafka Broker labels Mar 4, 2025
Copy link
Contributor

@frankvicky frankvicky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM.
I have a nit comment.

Comment on lines 76 to 78
var topic = "topic";
try (var admin = clusterInstance.admin()) {
admin.createTopics(List.of(new NewTopic(topic, BROKER_COUNT, (short) 2)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inline topic

Comment on lines 68 to 71
.setTypes(Set.of(Type.KRAFT))
.setBrokers(BROKER_COUNT)
.setAdminClientProperties(rebootstrapProperties)
.setServerProperties(serverProperties).build();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: here and above there's too much whitespace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your comments.
I've fixed it.

server1.awaitShutdown();

// Only the server 0 is available for the admin client during the bootstrap.
admin.listTopics().names().get();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you've just translated an existing test, but it's good to use timeouts with these get() calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, adding timeouts to prevent get() calls from blocking.


// Only the server 0 is available for the admin client during the bootstrap.
admin.listTopics().names().get();
TestUtils.waitForCondition(() -> admin.listTopics().names().get().contains(topic),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This get() will still block until the future completes. What I was suggestion was that we do something like .get(5, TimeUnit.MINUTES) or some reasonable timeout. This would prevent the test from getting stuck if something went wrong.

Adding the waitForCondition is a good improvement. It will make this test more reliable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Adding the waitForCondition only waits for get() to complete, your suggestion prevents the testing blocking.
I've fixed it.

@github-actions github-actions bot removed the triage PRs from the community label Mar 5, 2025
Copy link
Member

@mumrah mumrah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks!

@mumrah mumrah merged commit 870db5d into apache:trunk Mar 6, 2025
24 checks passed
Copy link
Member

@chia7712 chia7712 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clarkwtc sorry for late review. I leave some major comments. Please take a look, thanks!

return ClusterConfig.defaultBuilder()
.setTypes(Set.of(Type.KRAFT))
.setBrokers(BROKER_COUNT)
.setAdminClientProperties(rebootstrapProperties)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is no-op, and we should use clusterInstance.admin(Map) to build admin with custom configs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

open https://issues.apache.org/jira/browse/KAFKA-18944 to remove those unused setters

var topic = "topic";
var timeout = 5;
try (var admin = clusterInstance.admin()) {
admin.createTopics(List.of(new NewTopic(topic, BROKER_COUNT, (short) 2)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BROKER_COUNT should be used to define replicas rather than partitions, since our goal is to ensure each broker has a replica.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, these are different definitions and should not reuse variables in partitions.

}

@ClusterTemplate(value = "generator")
public void testRebootstrap(ClusterInstance clusterInstance) throws InterruptedException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After rewriting, this test no longer covers the scenario. We need to verify that the admin can "rebootstrap" when all "known" brokers are unavailable. Therefore, we should shut down one broker before creating the admin to ensure it's aware of "one" broker. Then, we shut down the broker the admin is aware of and restart the other one.

  1. Shut down broker0.
  2. Create the admin with bootstrap=broker0,broker1.
  3. The admin is aware of broker1.
  4. Run some admin APIs to ensure everything is fine.
  5. Shut down broker1 and restart broker0.
  6. The admin can't connect to broker1, so it starts to rebootstrap to find other available brokers.
  7. The admin finds broker0.
  8. Run some admin APIs to ensure everything is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for redefining this test to cover the scenario.
We will move forward with this PR: #19187

chia7712 pushed a commit that referenced this pull request Mar 9, 2025
Remove unused `saslServerProperties`, `saslClientProperties`,
`adminClientProperties`, `producerProperties`, and `consumerProperties`
in ClusterConfig.

First, I quickly fixed the unused adminClientProperties, and then I will
move on to #19094 to fix the related
issues.

Pass AdminClientRebootstrapTest
<img width="1398" alt="Screenshot 2025-03-09 at 12 54 57 PM"
src="https://github.com/user-attachments/assets/73c50376-6602-493d-8abd-0eb2bb304114"
/>

Pass ClusterConfigTest
<img width="1117" alt="Screenshot 2025-03-09 at 12 55 28 PM"
src="https://github.com/user-attachments/assets/b4da59da-dfdf-4698-9077-5086854360ab"
/>

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants