-
Notifications
You must be signed in to change notification settings - Fork 611
Upgrade Kafka to 3.8.0 #2180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade Kafka to 3.8.0 #2180
Conversation
This looks great, hopefully someone can review this given 3.8.0 has been released for a month now. |
@mhratson With this, CC will be able to be compiled and to be able to running with Kafka 3.8.0. |
@mhratson can merge it, if he thinks it is OK. |
…essary dependencies # Conflicts: # gradle.properties
…onfigs zk admin client method
its value hasn't changed, only where it was stored, this way it's backward compatible
rebased & resolved conflict |
Looks good to me. |
* Upgrading kafka to 3.8.0 - config properties rewriting and adding necessary dependencies * Upgrading kafka to 3.8.0 - using alternative for removed getAllTopicConfigs zk admin client method * Upgrading kafka to 3.8.0 - adding 3.8 zk client creation way * Upgrading kafka to 3.8.0 - adding 3.8 network client creation way * replication/quota/topic log constants moved in 3.8 again its value hasn't changed, only where it was stored, this way it's backward compatible * Update usages of Metadata to conform to kafka 3.7 interface --------- Co-authored-by: David Simon <[email protected]>
* Upgrading kafka to 3.8.0 - config properties rewriting and adding necessary dependencies * Upgrading kafka to 3.8.0 - using alternative for removed getAllTopicConfigs zk admin client method * Upgrading kafka to 3.8.0 - adding 3.8 zk client creation way * Upgrading kafka to 3.8.0 - adding 3.8 network client creation way * replication/quota/topic log constants moved in 3.8 again its value hasn't changed, only where it was stored, this way it's backward compatible * Update usages of Metadata to conform to kafka 3.7 interface --------- Co-authored-by: David Simon <[email protected]>
Hi @akatona84 @egyedt @bgrishinko , fyi we may need to revert this PR unfortunately because it break our internal build. The reason is LinkedIn today is still using kafka 3.0.*, which has some incompatibility with this change. I think the general guidance is when we make API related changes we check the kafka version and decide which to use at runtime based on the kafka version. See #2254 as an example. Sorry for any inconvenience. |
@CCisGG , could u point where was the incompatibility happened? Maybe I'll try to make it better. I might have missed that. |
Hi @akatona84, we are still identifying the incompatibilities. One example is: Line 17 in f472590
Basically we are using kafka 3.0 internally, and our unit tests is throwing runtime exception:
I'm assuming the this is a class introduced at kafka 3.0+ version. There might be many other similar incompatibilities. Unfortunately we are still at kafka 3.0.* branch, so we would ask the contributors to make changes compatible to kafka 3.0. I'm seeing that you are changing many config classes in this PR, and I'm not sure a good way to verify if these changes are compatible to kafka 3.0, but we at least already know ZkConfigs is not compatible. |
Hi @CCisGG, @akatona84, I would prefer a backward compatible solution instead of a full revert of this change ("Upgrade Kafka to 3.8.0" - commit). If you share all of the incompatibilities, then we can try to create a backward compatible commit! Thanks in advance! |
@egyedt Thanks for the response. I think fixing forward is probably better than revert this PR at this moment. Let's do it. I'm seeing many config class changes in this PR are not backwards compatible, e.g. import org.apache.kafka.coordinator.group.GroupCoordinatorConfig; We need to let CC to choose old config class at runtime on old kafka versions. Could you guys help with it? |
Hi @CCisGG, would it be possible to isolate new API usage changes like these on a separate |
@kyguy Good point. We are actually considering that option and will discuss with the team today. |
@kyguy @CCisGG I think it would be a good solution to create a new branch that is compatible with a 4.0 version as we would consider the 4.0 migration in the near future and would contribute some changes back. We currently have some similar changes for Kafka 3.9 too in #2247, so please let us know what should we do about it as based on your discussions we'd either need to make it runtime compatible with 3.0 or make the PR against another branch. |
Hi @viktorsomogyi @kyguy and other folks, to support the majority of users who are using newer kafka versions, we decided to leave main branch to work with kafka 3.8+. Hopefully all the changes you make to support kafka 3.8, 3.9, 4.0 would be compatible naturally with the main branch. For our case, we created the migrate_to_kafka_3_0 branch, which will support kafka 3.0 to 3.7. We will create CC release 3.0.* from that branch. Hopefully it will make things easier for you guys to add 3.8,3.9,4.0 support, as you don't need to worry about the compatibility issue with older kafka versions. |
* Upgrading kafka to 3.8.0 - config properties rewriting and adding necessary dependencies # Conflicts: # gradle.properties * Upgrading kafka to 3.8.0 - using alternative for removed getAllTopicConfigs zk admin client method * Upgrading kafka to 3.8.0 - adding 3.8 zk client creation way * Upgrading kafka to 3.8.0 - adding 3.8 network client creation way * replication/quota/topic log constants moved in 3.8 again its value hasn't changed, only where it was stored, this way it's backward compatible * Update usages of Metadata to conform to kafka 3.7 interface --------- Co-authored-by: David Simon <[email protected]>
* Upgrading kafka to 3.8.0 - config properties rewriting and adding necessary dependencies # Conflicts: # gradle.properties * Upgrading kafka to 3.8.0 - using alternative for removed getAllTopicConfigs zk admin client method * Upgrading kafka to 3.8.0 - adding 3.8 zk client creation way * Upgrading kafka to 3.8.0 - adding 3.8 network client creation way * replication/quota/topic log constants moved in 3.8 again its value hasn't changed, only where it was stored, this way it's backward compatible * Update usages of Metadata to conform to kafka 3.7 interface --------- Co-authored-by: David Simon <[email protected]>
* Upgrade Kafka to 3.8.0 (linkedin#2180) * Upgrading kafka to 3.8.0 - config properties rewriting and adding necessary dependencies * Upgrading kafka to 3.8.0 - using alternative for removed getAllTopicConfigs zk admin client method * Upgrading kafka to 3.8.0 - adding 3.8 zk client creation way * Upgrading kafka to 3.8.0 - adding 3.8 network client creation way * replication/quota/topic log constants moved in 3.8 again its value hasn't changed, only where it was stored, this way it's backward compatible * Update usages of Metadata to conform to kafka 3.7 interface --------- Co-authored-by: David Simon <[email protected]> * Use literal config name for listeners and broker.id config (linkedin#2169) --------- Co-authored-by: Andras Katona <[email protected]> Co-authored-by: David Simon <[email protected]> Co-authored-by: Henry Haiying Cai <[email protected]>
* Upgrade simplekdc to 2.1.0 (linkedin#2186) This PR resolves linkedin#2178 Upgrading simplekdc version to "2.1.0" which supports a change that can correctly use security classes based on what version of IBM Semeru JDK(if applicable) is being used. There is no regression observed using Semeru, OpenJDK and Temurin JDKs. This newer version(released on 14 August 2024) also caters vulnerability in deps mentioned linkedin#2179 as **org.jboss.xnio:xnio-api** is updated to **3.8.16**[^1] [^1]:https://github.com/apache/directory-kerby/releases/tag/kerby-all-2.1.0#:~:text=Bump%20org.jboss.xnio%3Axnio%2Dapi%20from%203.8.15.Final%20to%203.8.16.Final). * remove unused KafkaSampleStore#_skipSampleStoreTopicRackAwarenessCheck (linkedin#2183) left over from linkedin#1572 (6ae3f41) * Test logging fix, by default log4j2 looks for log4j2.properties file (linkedin#2181) `log4j.properties` files are ignored in the test resources, after renamed, finally I was able to change the loglevels while unit/integration testing. I'm not sure if it was the issue on issue linkedin#2152, but this would be the fix for tests. Prod should work with the log4j.properties file as that is passed with -Dlog4j.configurationFile java opt * fix typo in comment (linkedin#2189) Fix 'the the' in the comments * new PR template (linkedin#2191) ## Summary Why: Improve PR quality and review-ability. What: modifies current PR template to be structured and require more details when submitting PRs. ## Expected Behavior PR must come with sufficient details to address or explain the issue. ## Actual Behavior PR template only requires link to the issue: ``` This PR resolves #<Replace-Me-With-The-Issue-Number-Addressed-By-This-PR>. ``` ## Steps to reproduce 1. either create a new PR or 2. see [the current template](https://github.com/linkedin/cruise-control/blob/c5545ef04618b5b42290edda2ee63eb6bfa2e1a6/docs/pull_request_template.md) ## Known Workarounds People voluntarily provide additional details ## Additional Evidence - n/a ## Categorization - [x] refactor * CI workflow with Github Actions (linkedin#2192) ## Summary ### Why 1. GIthub Actions workflow are native GH workflows 2. Github Actions do not require additional non-github accounts unlike CircleCI 3. plenty of compute resources[^0] available for OSS projects 4. unlike CircleCI resource limits (don't have details) [^0]:https://docs.github.com/en/actions/administering-github-actions/usage-limits-billing-and-administration#availability ### What 1. creates CI workflow `ci.yaml` 2. creates Artifactory workflow: `artifactory.yaml` Workflow structure is documented in the spec[^1] [^1]:https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions ## Expected Behavior CI is expected to 1. execute unit tests 1. execute integration tests 1. execute hw platform unit tests 1. publish artifacts to the artifactory when a tag is published 1. provide ability to re-run tests on failures 1. report results to corresponding PR/branch which is to be used as quality gates for PR merging. ## Actual Behavior 1. current Circle CI integration provides [1] [2] [3] [4] from the expected behavior 4. but re-run-ing checks requires additional efforts like logging in into the Circle CI 5. which slows PR feedback loop as users may not have CircleCI credentials and knowledge of the system [1]:https://github.com/linkedin/cruise-control/blob/a298df86095532264f13ca7490cfabb8ff68839f/.circleci/config.yml#L51-L53 [2]:https://github.com/linkedin/cruise-control/blob/a298df86095532264f13ca7490cfabb8ff68839f/.circleci/config.yml#L51-L53 [3]:https://github.com/linkedin/cruise-control/blob/a298df86095532264f13ca7490cfabb8ff68839f/.circleci/config.yml#L5-L34 [4]:https://github.com/linkedin/cruise-control/blob/a298df86095532264f13ca7490cfabb8ff68839f/.circleci/config.yml#L94-L103 ## Steps to reproduce 1. see failed PR checks, ie linkedin#2133 ## Known Workarounds 1. asking PR authors to trigger build ## Migration Plan 1. add GH Actions integration along with CircleCI 2. confirm GH Actions provide equivalent or better functionality 3. remove CircleCI integration 4. ensure publishing via GH actions works ## Categorization - [x] refactor * Update README.md * Set Embedded Zookeeper listen on 127.0.0.1 (linkedin#2196) ## Summary 1. Why: when on VPN, I can't run Cruise Control tests as ZK is binding to local real ip address and local network is restricted. 2. What: changing to bind to 127.0.0.1 fixes it (got the idea from Kafka embedded ZK setup. I think it won't make any difference how automation or human would run the tests, pls correct me if I'm wrong. * Add "documentation" category to PR template (linkedin#2195) ## Summary 1. Why: to categorize documentation PRs 2. What: adds "documentation" category to the PR template ## Expected Behavior - when users make documentation changes - they should be able to specify documentation as a change category ## Actual Behavior - no documentation category to specify * Add missing documentation for minNumBrokersViolateMetricLimit ## Summary 1. Why: Documentation for **min.num.brokers.violate.metric.limit.to.decrease.cluster.concurrency** is missing. 2. What: document the setting * Add more logging to help debugging the time spent on goal based operation (linkedin#2202) * Add more logging to help debugging the time spent on goal based operation * Update cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/async/progress/OperationProgress.java Co-authored-by: Maryan Hratson <[email protected]> * Update cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/servlet/handler/async/runnable/GoalBasedOperationRunnable.java Co-authored-by: Maryan Hratson <[email protected]> * Update cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/servlet/handler/async/runnable/GoalBasedOperationRunnable.java Co-authored-by: Maryan Hratson <[email protected]> --------- Co-authored-by: Maryan Hratson <[email protected]> * Fix the Circle CI error by removing the test-multi-arch job (linkedin#2203) * Remove test-multi-arch job in Circle CI * Remove test-multi-arch definitions * Improve per task observability through additional logging (linkedin#2204) * Fix the issue that uuid is null in the log after execution * Add more logging to track the task with its UUID * Rename "task" to "User task" in logging * Reformat logging * Fixing Unexpected method calls: HttpSession.invalidate (linkedin#2201) ## Summary 1. Why: The test failed sometimes with unexpected method calls. 2. What: The fix is preparing the test to accept invalidate method call too ## Expected Behavior Tests are running without failure ## Actual Behavior Tests are failing sometimes with unexpected method call. ## Steps to Reproduce 1. setup repeated run on e.g. `testCreateUserTask` in IDE 2. observe failure after multiple successful runs (for me it was failing after around 250 successful runs) ## Additional evidence ``` java.lang.AssertionError: On mock #2 (zero indexed): Unexpected method calls: HttpSession.invalidate() at org.easymock.EasyMock.getAssertionError(EasyMock.java:2230) at org.easymock.EasyMock.verify(EasyMock.java:2058) at com.linkedin.kafka.cruisecontrol.servlet.UserTaskManagerTest.testCreateUserTask(UserTaskManagerTest.java:59) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.ParentRunner.run(ParentRunner.java:413) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40) at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33) at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94) at com.sun.proxy.$Proxy5.processTestClass(Unknown Source) at org.gradle.api.internal.tasks.testing.worker.TestWorker$2.run(TestWorker.java:176) at org.gradle.api.internal.tasks.testing.worker.TestWorker.executeAndMaintainThreadName(TestWorker.java:129) at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:100) at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:60) at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56) at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:113) at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:65) at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69) at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74) ``` ## Categorization - [x] bugfix - [ ] new feature - [ ] refactor - [ ] CVE - [ ] other * Update the README (linkedin#2216) This is a minor improvement to the README.md. * fix: Fix CVEs (linkedin#2220) Update dependencies to fix CVEs: Zookeeper, Netty, Jetty, Nimbus JOSE+JWT * Fix: intra.broker.goals cannot be configured as default.goals (linkedin#2221) * Kerberos auth to local rules support (linkedin#2043) * Expose AdminClient exception when failing to describe the cluster (linkedin#2222) * Fix PartitionSizeAnomalyFinder, to be able to handle custom SELF_HEALING_PARTITION_SIZE_THRESHOLD_MB values (linkedin#2212) * Upgrade Kafka to 3.8.0 (linkedin#2180) * Upgrading kafka to 3.8.0 - config properties rewriting and adding necessary dependencies # Conflicts: # gradle.properties * Upgrading kafka to 3.8.0 - using alternative for removed getAllTopicConfigs zk admin client method * Upgrading kafka to 3.8.0 - adding 3.8 zk client creation way * Upgrading kafka to 3.8.0 - adding 3.8 network client creation way * replication/quota/topic log constants moved in 3.8 again its value hasn't changed, only where it was stored, this way it's backward compatible * Update usages of Metadata to conform to kafka 3.7 interface --------- Co-authored-by: David Simon <[email protected]> * Rectify docker run command for s390x (linkedin#2249) * Make startup more robust and prevent auto topic creation when using CruiseControlMetricsReporterSampler (linkedin#2211) * Update license to reflect the latest status (linkedin#2256) * Catch NoSuchFileException on load failed brokers list (linkedin#2255) * Replace deprecated methods to support Kafka 4.0.0 (linkedin#2254) * Upgrade Kafka to 3.8.0 (linkedin#2180) * Upgrading kafka to 3.8.0 - config properties rewriting and adding necessary dependencies * Upgrading kafka to 3.8.0 - using alternative for removed getAllTopicConfigs zk admin client method * Upgrading kafka to 3.8.0 - adding 3.8 zk client creation way * Upgrading kafka to 3.8.0 - adding 3.8 network client creation way * replication/quota/topic log constants moved in 3.8 again its value hasn't changed, only where it was stored, this way it's backward compatible * Update usages of Metadata to conform to kafka 3.7 interface --------- Co-authored-by: David Simon <[email protected]> * Use literal config name for listeners and broker.id config (linkedin#2169) * Disabling test and integration-test in github-workflows --------- Co-authored-by: yasiribmcon <[email protected]> Co-authored-by: Lee Dongjin <[email protected]> Co-authored-by: Andras Katona <[email protected]> Co-authored-by: wonjong-yoo <[email protected]> Co-authored-by: Maryan Hratson <[email protected]> Co-authored-by: ik <[email protected]> Co-authored-by: Allen Wang <[email protected]> Co-authored-by: Hao Geng <[email protected]> Co-authored-by: Aswin A <[email protected]> Co-authored-by: Kondrat Bertalan <[email protected]> Co-authored-by: Tamas Barnabas Egyed <[email protected]> Co-authored-by: harmadasg <[email protected]> Co-authored-by: David Simon <[email protected]> Co-authored-by: Paolo Patierno <[email protected]> Co-authored-by: Shubham Rawat <[email protected]> Co-authored-by: Henry Haiying Cai <[email protected]> Co-authored-by: Daniel Vaseekaran <[email protected]>
No description provided.