Working set of changes to upgrade CDAP to Guava 32 and compiled with Java 17#16139
Working set of changes to upgrade CDAP to Guava 32 and compiled with Java 17#16139aviachar wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive set of shadow compatibility classes and core transaction management components to maintain support for legacy dependencies like Apache Tephra and Twill under Guava 32+. Key changes include the addition of custom Guava shims, Zookeeper client implementations with retry and namespacing logic, and the core Tephra transaction management system. Review feedback suggests optimizing snapshot persistence in LocalFileTransactionStateStorage by utilizing BufferedOutputStream for better I/O performance and replacing File.renameTo() with the more robust java.nio.file.Files.move() API.
| // save the snapshot to a temporary file | ||
| File snapshotTmpFile = new File(snapshotDir, TMP_SNAPSHOT_FILE_PREFIX + snapshot.getTimestamp()); | ||
| LOG.debug("Writing snapshot to temporary file {}", snapshotTmpFile); | ||
| OutputStream out = new java.io.FileOutputStream(snapshotTmpFile); |
There was a problem hiding this comment.
Writing large snapshots to a FileOutputStream without buffering can be inefficient. It is recommended to wrap the stream in a BufferedOutputStream to improve I/O performance, especially since a BUFFER_SIZE constant is already defined in this class.
| OutputStream out = new java.io.FileOutputStream(snapshotTmpFile); | |
| OutputStream out = new java.io.BufferedOutputStream(new java.io.FileOutputStream(snapshotTmpFile), BUFFER_SIZE); |
| if (!snapshotTmpFile.renameTo(finalFile)) { | ||
| throw new IOException("Failed renaming temporary snapshot file " + snapshotTmpFile.getName() + " to " + | ||
| finalFile.getName()); | ||
| } |
There was a problem hiding this comment.
File.renameTo() is platform-dependent and can fail silently or behave inconsistently across different file systems (e.g., failing if the destination exists on some platforms). Since this project is targeting Java 17, it is safer and more reliable to use java.nio.file.Files.move() with StandardCopyOption.REPLACE_EXISTING to ensure the snapshot update is consistent and handles existing files correctly.
try {
java.nio.file.Files.move(snapshotTmpFile.toPath(), finalFile.toPath(),
java.nio.file.StandardCopyOption.REPLACE_EXISTING);
} catch (java.io.IOException e) {
throw new java.io.IOException("Failed renaming temporary snapshot file " + snapshotTmpFile.getName() + " to " +
finalFile.getName(), e);
}
Master Combined Implementation Plan - CDAP Core & Apache Kafka Plugins Upgrades to Java 17 and Guava 32
This master document details the design patterns, architectural adjustments, custom compatibility shims, and Maven dependency trees implemented to upgrade both CDAP Core and the standalone Apache Kafka Plugins repository components to target Java 17 and run under Guava 32 (
32.0.0-jrein Core,32.1.3-jrein Plugins).Executive Summary & Core Architectural Strategy
Migrating CDAP to Java 17 and Guava 32 introduces strict language constraints, class-file major version 61 assemblies, and strict, non-idempotent service lifecycle execution. To achieve binary compatibility without breaking un-upgradable legacy transitives (e.g. Apache Twill and the retired Apache Tephra projects), we designed and deployed three advanced compile/runtime shims, refactored global packaging plugins, and adopted a safe-lifecycle design pattern.
High-Level Key Modifications
graph TD A[CDAP Core & Kafka Plugins JVM Upgrade] --> B[Java 17 & Guava 32 Target] B --> C[CDAP Core: Guava 32.0.0-jre] B --> D[Kafka Plugins: Guava 32.1.3-jre] C --> E[Service Lifecycle Refactoring] C --> F[Twill Classpath/ZK Shims] C --> G[Tephra Shadow Interface/Stopwatch Shims] C --> I[maven-jar-plugin Upgrade to 3.5.0] C --> M[Legacy Guava io.Input/OutputSupplier Stubs] C --> N[GSON Modular Reflection Environment Fixes] C --> O[Wrangler Connection & BQ Registry Refactoring] C --> P[Standalone E2E Sanity Verification Suite] D --> J[commons-text 1.10.0 Classpath Override] D --> K[Embedded Kafka Server Lifecycle Upgrade]1. CDAP Core Upgrades
1.1. Safe-Start Service Lifecycle Design Pattern
Migrating Guava to version 32 introduces strict execution boundaries on
com.google.common.util.concurrent.Serviceobjects. Under modern Guava, callingstartAsync()on any service that is not in theNEWstate immediately throws anIllegalStateException. In CDAP's shared Guice container testing environments, multiple overlapping services attempt to start/stop singletons.Solution implemented:
private boolean metricsCollectionServiceStartedByMe = false;).startAsync()invocations with checks ensuring state matchesService.State.NEW.stopAsync()orstopQuietly()operations to only trigger if the current component was the one that originally started the service.1.2. Apache Tephra Service Shadow Compatibility Interface
Retired binary dependencies like Apache Tephra (
tephra-core) compiled against legacy Guava interfaces expectcom.google.common.util.concurrent.Serviceto contain non-async synchronous lifecycles (startAndWait()andstopAndWait()) and list-return structures (start()andstop()). Modern Guava stripped these APIs, resulting in runtimeNoSuchMethodErrorcrashes.Solution implemented:
com.google.common.util.concurrent.Serviceinsidecdap-commonat Service.java.cdap-commonmodule source, shadowing the official Guava class during compilation and classloading at runtime.defaultimplementations that emulate the older asynchronous behaviors internally:default ListenableFuture<State> start() { ... startAsync(); }default State startAndWait() { startAsync().awaitRunning(); return state(); }default ListenableFuture<State> stop() { ... stopAsync(); }default State stopAndWait() { stopAsync().awaitTerminated(); return state(); }1.3. Apache Twill ServiceListenerAdapter Classpath Shim
Apache Twill (
twill-core) bundlesorg.apache.twill.internal.ServiceListenerAdapterwhich implements Guava's listener. In legacy Guava,Service.Listenerwas defined as a Java interface (which Twill implemented viaimplements). In Guava 32, this was refactored into an abstract class (requiring class extension viaextends). This mismatch results in a fatalIncompatibleClassChangeErrorwhen starting standalone environments.Solution implemented:
org.apache.twill.internal.ServiceListenerAdapteras extendingcom.google.common.util.concurrent.Service.Listenerrather than implementing it.twill-core.jardependency on the JVM classpath.1.4. Shading & Relocations in
cdap-cliTo prevent legacy Guava classes bundled within the fat
cdap-cliassembly from leaking and polluting the classpath of downstream modules or test environments, we implemented strict shading.Solution implemented:
maven-shade-pluginto relocate thecom.google.commonnamespace toio.cdap.cdap.shaded.com.google.common.java.util.function.Supplierand standard maps/lists) instead of Guava functional APIs. This ensures that unshaded downstream modules do not encounter signature mismatches when interacting with the shaded command classes.1.5. Globals & Sub-module
maven-jar-pluginRefactoringUnder Java 17, class files are compiled into major version 61 bytecodes. The legacy version
2.4ofmaven-jar-pluginutilized by CDAP has a Plexus Archiver that crashes when parsing version 61 structures, resulting in packaging errors.Solution implemented:
maven-jar-pluginversion3.5.0globally inside the<pluginManagement>block of the rootpom.xmlat pom.xml.<version>2.4</version>overrides from all 25+ sub-modulepom.xmlfiles (includingcdap-common/pom.xmlandcdap-app-fabric/pom.xml), enabling them to cleanly inherit the upgraded version settings from the parent.1.6. Apache Tephra Stopwatch Shadow Compatibility Class
Precompiled third-party libraries (such as Apache Tephra
TransactionManager) reference Guava'scom.google.common.base.Stopwatchclass.Stopwatchexposed a public parameterless constructor (@Deprecated public Stopwatch()).Stopwatch()), causing an immediate runtimejava.lang.IllegalAccessErrorwhen precompiled classes callnew Stopwatch().elapsedTime(TimeUnit)andelapsedMillis().com.google.common.base.Stopwatchinside thecdap-commonmodule at Stopwatch.java.Ticker, making the deprecated constructors public again.Duration elapsed()) and retains legacyelapsedTime(TimeUnit)andelapsedMillis()methods.IllegalAccessErrororNoSuchMethodError.1.7. Apache Twill DefaultZKClientService Shadow Compatibility Class
Precompiled ZooKeeper client configurations in Apache Twill (
twill-zookeeper) implement interfaces extending Guava'scom.google.common.util.concurrent.Service. Under modern Guava 32, theServiceinterface introduces abstract lifecycle control methods (e.g.,startAsync(),stopAsync(),awaitRunning(),awaitTerminated(), andfailureCause()). Because Twill was compiled against older Guava, its implementations are missing these method declarations, resulting in compilation and runtime link errors.Solution implemented:
org.apache.twill.internal.zookeeper.DefaultZKClientServiceinside thecdap-commonmodule at DefaultZKClientService.java.serviceDelegatewhich extends Guava's modernAbstractService):failureCause(),startAsync(),stopAsync(),awaitRunning(),awaitTerminated().Futures.addCallback()) inside the creation routines to explicitly passMoreExecutors.directExecutor(), matching Guava 32's deprecation of the two-argument convenience API.twill-zookeeper.jarat classloading and compilation times.1.8. Legacy Guava InputSupplier and OutputSupplier Dummy Shims
Multiple compiled third-party libraries reference legacy interfaces
com.google.common.io.InputSupplierandcom.google.common.io.OutputSupplier. These interfaces were deprecated and completely removed in Guava 32, causing immediate runtimeClassNotFoundExceptionor compilation failure in transitively linked modules.Solution implemented:
cdap-commonmatching the original Guava packages:getInput(),getOutput()), satisfying runtime class reference resolving without shading the downstream dependency trees.1.9. Apache Twill ZooKeeper Client Subclasses Shadow Shims
Precompiled subclasses of Twill's ZooKeeper client system (
twill-zookeeper) contain internal async callback registrations that call the legacy two-argumentFutures.addCallback(future, callback)signature. Since Guava 32 completely removed the two-argument convenience API in favor of explicit executor execution, invoking these precompiled routines triggers a fataljava.lang.NoSuchMethodErrorat runtime.Solution implemented:
cdap-common:Futures.addCallbackinvocation in these files to explicitly specify eitherMoreExecutors.directExecutor()or their correspondingThreads.SAME_THREAD_EXECUTOR, matching Guava 32's modern runtime signatures.twill-zookeeper.jaron the classpath at runtime.1.10. ZK Client Refcounted Wrapper Deadlock Fix in
KafkaClientModuleIn CDAP's shared Guice container test environments, multiple components (e.g.
BrokerServiceandKafkaClientService) share a reference-countedZKClientServicecreated by KafkaClientModule.java.ForwardingZKClientServicewrapper decrements a reference count onstopAsync(). It only stops the physical ZK Client when the refcount reaches 0.awaitTerminated()was hard-delegated to the underlying physical service.stopAsync().awaitTerminated()(which is a blocking operation), because the refcount is still > 0, the physical service is not stopped, causingawaitTerminated()to block the test thread indefinitely, resulting in a circular deadlock.awaitTerminated()andawaitTerminated(timeout, unit)inside the anonymous wrapper at KafkaClientModule.java.startedCount.get() == 0before blocking on the physical service. If the logical wrapper's refcount is still > 0, it returns immediately, matching the legacy non-blockingListenableFuturebehavior and completely resolving the container deadlock.1.11. Standalone Sandbox Bootstrapping & Guava Service Lifecycle Modernization
In step with the platform Guava 32 upgrades, the main bootstrap entrypoint for standalone setups,
StandaloneMain.java, must be updated.startAndWait()andstopAndWait()) to their proper asynchronous non-blocking equivalents (startAsync().awaitRunning()andstopAsync().awaitTerminated()), maintaining runtime thread compliance with modern Guava specifications.1.12. Wrangler Service Modular Reflection (GSON) Fix under JDK 17
When booting the CDAP Standalone Sandbox under Java 17, the newly updated services environment throws a critical reflection barrier error during the Wrangler (
dataprep) application deployment.java.time.Instant). Under Java 17's strict modules architecture (strong encapsulation properties), this triggers a fataljava.lang.reflect.InaccessibleObjectException(modulejava.basedoes not openjava.timeto unnamed module), causing theCapabilityManagementServiceto completely abort the dataprep app deployment. As a consequence, Wrangler REST queries return a global404 Not Foundresponse and the service reports a503 Service Unavailableerror at the health endpoint.--add-opens java.base/java.time=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.lang.invoke=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.util.concurrent=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.security/java.security=ALL-UNNAMEDto theJAVA_OPTSconfiguration inside the standalone environment launcher.1.13. Wrangler vs. Pipeline Studio Connection Storage Isolation
Within CDAP architecture, Pipeline Studio and Wrangler (Dataprep) utilize distinct REST endpoints and isolated internal database backends to persist registered user data:
connections_storesystem database table. Endpoints map to:/api/v3/namespaces/system/apps/pipeline/services/studio/methods/v1/contexts/default/connections.connectionsdatabase table. Endpoints map to:/api/v3/namespaces/system/apps/dataprep/services/service/methods/contexts/default/connections.1.14. Wrangler BigQuery Connection Property & Authentication Mapping
When registering a BigQuery connection directly via the Wrangler API to explore datasets, the property mapping configuration must be carefully realigned to conform to Wrangler's specific handlers:
datasetProjectproperty with secondary settings (e.g.project: "auto-detect"). Wrangler's backend instead maps properties directly using standard Google Cloud Client SDK signatures, requiring explicit target projects to be labeled asprojectId(camelCase). SupplyingdatasetProjectto Wrangler's endpoint causes the SDK to look up datasets under the sandbox's local execution project instead of the target project, listing 0 datasets."service-account-keyfile": "auto-detect". Passing this string literal to Wrangler's REST connections endpoint causes the service to try loading a literal local file named"auto-detect", raising a fatal I/O failure. To fall back to system Application Default Credentials (ADC) under Wrangler, all keyfile/path properties must be completely omitted from the payload properties.2. Apache Kafka Plugins Upgrades
2.1. Java 17 Target Configurations
We upgraded the standalone
hydrator-plugins/kafka-pluginsrepository to align its build targets and compiler specifications with Java 17.Solution implemented:
<guava.version>from13.0.1to32.1.3-jreand configuredmaven-compiler-pluginto compile with<source>17</source>and<target>17</target>.--add-opensflags within themaven-surefire-pluginarguments in pom.xml to enable JDK 17 deep reflection:2.2. Transitive Classpath Overrides (
commons-text)During pipeline testing, Spark components trigger a runtime classpath conflict resulting in
NoSuchMethodErrordue to transitive dependencies loading an older version ofcommons-textlacking DNS parsing lookups.Solution implemented:
commons-textversion1.10.0inside the<dependencyManagement>block ofkafka-plugins/pom.xmlto force the resolution of modern, compliant library classes across the hydrator runtime.2.3. Embedded Kafka Test Server Lifecycle Refactoring
To support modern Guava 32 lifecycles in the test suites, we refactored the embedded Kafka test server boundaries across the following test suite files:
kafkaServer.startAndWait()andkafkaServer.stopAndWait()calls with modern asynchronous non-throwing boundaries:kafkaServer.startAsync().awaitRunning();kafkaServer.stopAsync().awaitTerminated();KafkaStreamingSourceStateStoreFailureTest.javaKafkaStreamingSourceStateStoreTest.javaKafkaStreamingSourceStateStoreRecoveryTest.javaKafkaStreamingSourceBegginingOffsetTest.javaKafkaStreamingSourceLastOffsetTest.javaKafkaSinkAndAlertsPublisherTest.javaKafkaStreamingSourceTest.javaAbstractKafkaBatchSourceTest.javaKafkaStreamingSourceSpecificOffsetTest.java3. Verification Outcomes & Results
All upgrades have been successfully built, validated, and verified locally.
3.1. CDAP Core Build Status
02:30 min. Compiled classes successfully assembled and installed into the local cache usingmaven-jar-plugin:3.5.0.3.2. Kafka Plugins Compilation Status
kafka-plugins:41.48 secondswith Java 17 targeted class file generations.3.3. Kafka Plugins Test Validation
KafkaSinkAndAlertsPublisherTest: BUILD SUCCESS (2/2 Tests Passed) in02:38 min.KafkaBatchSourceTest: BUILD SUCCESS (2/2 Tests Passed) in02:02 min.IllegalStateExceptionorNoSuchMethodErrorcrashes.3.4. E2E Sandbox Developer Sanity Verification Suite
To confirm integration reliability across the complete system (AppFabric, Spark, Core Plugins, and Local Datasets) under the Java 17 and Guava 32 upgrades, a Python-based developer sanity validation script is deployed.
core-pluginsmodule.RUNNING->COMPLETED).curl -s http://localhost:11015/v3/namespaces/default/apps/SanityCopyPipeline/workflows/DataPipelineWorkflow/runs/<run_id>/logsCOMPLETED).