
Commit 5b80d39

docs: Update version to 0.19.1, enhance foreign key features, and improve documentation (#130)
* docs: Update version to 0.19.1, enhance foreign key features, and improve documentation

  - Bump version in various files to 0.19.1.
  - Introduce cardinality and nullability controls for foreign key relationships.
  - Add new documentation for foreign key enhancements and update existing links.
  - Improve logging levels for better debugging.
  - Update Dockerfile and example configurations to reflect the new version.
  - Add feature catalog for standardized documentation of all features.

* chore: Add spark.metrics.executorMetricsSource.enabled configuration

  - Updated configuration files to include "spark.metrics.executorMetricsSource.enabled" set to "false" for improved metrics handling.
  - Ensured consistency across application and test configurations by aligning the settings in Constants.scala and application.conf.
  - Modified SparkSuite to reflect the new configuration, enhancing test environment setup.

* chore: Update Java distribution and enhance Spark configurations

  - Changed Java distribution from Oracle to Temurin in build and check workflows for improved compatibility.
  - Added "spark.executor.processTreeMetrics.enabled" configuration to multiple files to enhance Spark metrics handling.
  - Ensured consistency across application and test configurations by aligning the settings in Constants.scala, application.conf, and application-integration.conf.
1 parent 176baa3 commit 5b80d39

46 files changed

Lines changed: 13737 additions & 46 deletions


.github/workflows/build.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -35,7 +35,7 @@ jobs:
           java-version: '17'
           java-package: jdk
           architecture: x64
-          distribution: oracle
+          distribution: temurin
       - name: Login to DockerHub
         uses: docker/login-action@v2
         with:
```

.github/workflows/check.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -15,7 +15,7 @@ jobs:
           java-version: '17'
           java-package: jdk
           architecture: x64
-          distribution: oracle
+          distribution: temurin
       - name: Gradle build with cache
         uses: burrunan/gradle-cache-action@v1
         with:
```

.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -21,6 +21,7 @@ site
 # Python/virtualenvs used by docs
 .venv
 .python-version
+__pycache__
 
 app/docs
 app/out
```

README.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -76,7 +76,7 @@ Check results at `docker/data/custom/report/index.html`.
 ### UI
 
 ```shell
-docker run -d -p 9898:9898 -e DEPLOY_MODE=standalone --name datacaterer datacatering/data-caterer:0.19.0
+docker run -d -p 9898:9898 -e DEPLOY_MODE=standalone --name datacaterer datacatering/data-caterer:0.19.1
 ```
 
 Open [http://localhost:9898](http://localhost:9898).
````

api/src/main/scala/io/github/datacatering/datacaterer/api/model/Constants.scala

Lines changed: 3 additions & 1 deletion
```diff
@@ -288,7 +288,9 @@ object Constants {
     "spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled" -> "true",
     "spark.hadoop.fs.hdfs.impl" -> "org.apache.hadoop.hdfs.DistributedFileSystem",
     "spark.hadoop.fs.file.impl" -> "com.globalmentor.apache.hadoop.fs.BareLocalFileSystem",
-    "spark.sql.extensions" -> "io.delta.sql.DeltaSparkSessionExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
+    "spark.sql.extensions" -> "io.delta.sql.DeltaSparkSessionExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
+    "spark.metrics.executorMetricsSource.enabled" -> "false",
+    "spark.executor.processTreeMetrics.enabled" -> "false"
   )
 
   //jdbc defaults
```
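
For context, the two new entries disable Spark's executor metrics source and process-tree metrics by default. The sketch below shows one way such a defaults map can be applied to a `SparkSession`, with user-supplied config winning on key collisions; `buildSparkSession` is a hypothetical helper for illustration, not part of the Data Caterer API.

```scala
import org.apache.spark.sql.SparkSession

object SparkConfigOverride {
  // Subset of the defaults from Constants.scala shown in the diff above.
  val defaultRuntimeConfig: Map[String, String] = Map(
    "spark.metrics.executorMetricsSource.enabled" -> "false",
    "spark.executor.processTreeMetrics.enabled" -> "false"
  )

  // Later map entries win, so user-supplied config overrides the defaults.
  def buildSparkSession(userConfig: Map[String, String]): SparkSession = {
    val merged = defaultRuntimeConfig ++ userConfig
    merged
      .foldLeft(SparkSession.builder().master("local[*]").appName("data-caterer")) {
        case (builder, (key, value)) => builder.config(key, value)
      }
      .getOrCreate()
  }
}
```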

app/build.gradle.kts

Lines changed: 1 addition & 0 deletions
```diff
@@ -543,6 +543,7 @@ tasks.shadowJar {
     val newTransformer = com.github.jengelman.gradle.plugins.shadow.transformers.AppendingTransformer()
     newTransformer.resource = "reference.conf"
     transformers.add(newTransformer)
+    mergeServiceFiles()
 }
 
 // Configure Scoverage only when it's applied (configuration cache disabled)
```

app/src/main/scala/io/github/datacatering/datacaterer/core/plan/ForeignKeyUniquenessProcessor.scala

Lines changed: 1 addition & 1 deletion
```diff
@@ -40,7 +40,7 @@ class ForeignKeyUniquenessProcessor(val dataCatererConfiguration: DataCatererCon
     validations: List[ValidationConfiguration]
   ): (Plan, List[Task], List[ValidationConfiguration]) = {
 
-    LOGGER.info("ForeignKeyUniquenessProcessor starting...")
+    LOGGER.debug("ForeignKeyUniquenessProcessor starting...")
 
     // Extract foreign keys from plan's sink options
     val foreignKeys = plan.sinkOptions.map(_.foreignKeys).getOrElse(List())
```

app/src/main/scala/io/github/datacatering/datacaterer/core/plan/PlanProcessor.scala

Lines changed: 16 additions & 4 deletions
```diff
@@ -1,7 +1,7 @@
 package io.github.datacatering.datacaterer.core.plan
 
 import io.github.datacatering.datacaterer.api.PlanRun
-import io.github.datacatering.datacaterer.api.model.Constants.{DATA_CATERER_INTERFACE_JAVA, DATA_CATERER_INTERFACE_SCALA, DATA_CATERER_INTERFACE_YAML, PLAN_CLASS, PLAN_STAGE_EXTRACT_METADATA, PLAN_STAGE_PARSE_PLAN}
+import io.github.datacatering.datacaterer.api.model.Constants.{DATA_CATERER_INTERFACE_JAVA, DATA_CATERER_INTERFACE_SCALA, DATA_CATERER_INTERFACE_YAML, DEFAULT_STEP_TYPE, FORMAT, PLAN_CLASS, PLAN_STAGE_EXTRACT_METADATA, PLAN_STAGE_PARSE_PLAN}
 import io.github.datacatering.datacaterer.api.model.{DataCatererConfiguration, Plan, Task, ValidationConfiguration}
 import io.github.datacatering.datacaterer.core.activity.{PlanRunPostPlanProcessor, PlanRunPrePlanProcessor}
 import io.github.datacatering.datacaterer.core.config.ConfigParser
@@ -118,7 +118,7 @@ object PlanProcessor {
       basePlan, baseTasks, baseValidations, dataCatererConfiguration, resolvedInterface
     )
 
-    LOGGER.info(s"After pre-processors: num-tasks=${finalTasks.size}")
+    LOGGER.info(s"After pre-processors: num-tasks=${finalTasks.size}, task-names=${finalTasks.map(_.name).mkString(", ")}")
 
     // Step 4: Generate data with the final modified plan/tasks
     val dataGeneratorProcessor = new DataGeneratorProcessor(dataCatererConfiguration)
@@ -358,7 +358,13 @@ class YamlPlanRun(
 
     // Merge connection config into each step's options (connection config as base, step options override)
     val stepsWithConnectionConfig = task.steps.map(step => {
-      step.copy(options = connectionConfig ++ step.options)
+      val mergedOptions = connectionConfig ++ step.options
+      val optionsWithFormat = if (!mergedOptions.contains(FORMAT) && step.`type` != DEFAULT_STEP_TYPE) {
+        mergedOptions + (FORMAT -> step.`type`)
+      } else {
+        mergedOptions
+      }
+      step.copy(options = optionsWithFormat)
     })
 
     task.copy(steps = stepsWithConnectionConfig)
@@ -386,7 +392,13 @@ class UnifiedPlanRun(
     val connectionConfig = dataCatererConfig.connectionConfigByName.getOrElse(dataSourceName, Map())
 
     val stepsWithConnectionConfig = task.steps.map(step => {
-      step.copy(options = connectionConfig ++ step.options)
+      val mergedOptions = connectionConfig ++ step.options
+      val optionsWithFormat = if (!mergedOptions.contains(FORMAT) && step.`type` != DEFAULT_STEP_TYPE) {
+        mergedOptions + (FORMAT -> step.`type`)
+      } else {
+        mergedOptions
+      }
+      step.copy(options = optionsWithFormat)
     })
 
     task.copy(steps = stepsWithConnectionConfig)
```
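
The same merge-and-infer logic is added to both `YamlPlanRun` and `UnifiedPlanRun` above: connection config forms the base, step options override it, and a missing `format` falls back to the step type. Below is the same behaviour as a standalone sketch with simplified types; `Step` here is a stand-in, and the constant values are assumptions, since the real definitions live in the api module's model and Constants.scala.

```scala
// Simplified stand-in for the Data Caterer Step model class.
case class Step(`type`: String, options: Map[String, String])

object StepOptionMerge {
  // Assumed values; the actual constants are defined in Constants.scala and
  // DEFAULT_STEP_TYPE may differ in the real codebase.
  val FORMAT = "format"
  val DEFAULT_STEP_TYPE = "json"

  // Connection config is the base, step options win on collisions; when no
  // explicit format is set and the step type is not the default, the step
  // type is used as the format.
  def mergeOptions(connectionConfig: Map[String, String], step: Step): Map[String, String] = {
    val merged = connectionConfig ++ step.options
    if (!merged.contains(FORMAT) && step.`type` != DEFAULT_STEP_TYPE) merged + (FORMAT -> step.`type`)
    else merged
  }
}

// Example: a "csv" step with no explicit format picks up format=csv.
// StepOptionMerge.mergeOptions(Map("path" -> "/tmp/out"), Step("csv", Map()))
//   => Map("path" -> "/tmp/out", "format" -> "csv")
```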

app/src/test/resources/application-integration.conf

Lines changed: 4 additions & 1 deletion
```diff
@@ -110,7 +110,10 @@ runtime {
     "spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled" = "true",
     "spark.hadoop.fs.hdfs.impl" = "org.apache.hadoop.hdfs.DistributedFileSystem",
     "spark.hadoop.fs.file.impl" = "com.globalmentor.apache.hadoop.fs.BareLocalFileSystem",
-    "spark.sql.extensions" = "io.delta.sql.DeltaSparkSessionExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
+    "spark.sql.extensions" = "io.delta.sql.DeltaSparkSessionExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
+    "spark.metrics.executorMetricsSource.enabled" = "false",
+    "spark.ui.enabled" = "false",
+    "spark.executor.processTreeMetrics.enabled" = "false"
   }
 }
 
```

app/src/test/resources/application.conf

Lines changed: 4 additions & 1 deletion
```diff
@@ -40,7 +40,10 @@ runtime{
     "spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled": "true",
     "spark.hadoop.fs.hdfs.impl": "org.apache.hadoop.hdfs.DistributedFileSystem",
     "spark.hadoop.fs.file.impl": "com.globalmentor.apache.hadoop.fs.BareLocalFileSystem",
-    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
+    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
+    "spark.metrics.executorMetricsSource.enabled": "false",
+    "spark.ui.enabled": "false",
+    "spark.executor.processTreeMetrics.enabled": "false"
   }
 }
 
```
