Commit 13e391a

Merge pull request #106 from data-catering/feature/api-doc
Add in pre/post processing scripts to run script, add in missing chna…
2 parents 3f665fd + e82b070 commit 13e391a

21 files changed

Lines changed: 380 additions & 3 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ and deep dive into issues [from the generated report](https://data.catering/late
 
 1. Docker
 ```shell
-docker run -d -i -p 9898:9898 -e DEPLOY_MODE=standalone --name datacaterer datacatering/data-caterer:0.16.8
+docker run -d -i -p 9898:9898 -e DEPLOY_MODE=standalone --name datacaterer datacatering/data-caterer:0.16.9
 ```
 [Open localhost:9898](http://localhost:9898).
 1. [Run Scala/Java examples](#run-scalajava-examples)

docs/docs/deployment.md

Lines changed: 43 additions & 0 deletions
@@ -28,6 +28,49 @@ Then you can run the following:
 docker build -t <my_image_name>:<my_image_tag> .
 ```
 
+### Docker Pre and Post Processing Scripts
+
+Data Caterer supports running custom scripts before and after the main data generation process when deployed via Docker. This is useful for setup tasks, cleanup operations, notifications, or integrating with external systems.
+
+#### Configuration
+
+Configure pre and post processing scripts using environment variables:
+
+| Environment Variable       | Description                                                | Default   |
+| -------------------------- | ---------------------------------------------------------- | --------- |
+| `PRE_PROCESSOR_SCRIPT`     | Path to script to run before Data Caterer execution        | (empty)   |
+| `POST_PROCESSOR_SCRIPT`    | Path to script to run after Data Caterer execution         | (empty)   |
+| `POST_PROCESSOR_CONDITION` | When to run post processor: `success`, `failure`, `always` | `success` |
+
+#### Usage Example
+
+```shell
+docker run -d \
+  -e PRE_PROCESSOR_SCRIPT="/opt/app/scripts/setup.sh" \
+  -e POST_PROCESSOR_SCRIPT="/opt/app/scripts/cleanup.sh" \
+  -e POST_PROCESSOR_CONDITION="always" \
+  -v /path/to/scripts:/opt/app/scripts \
+  datacatering/data-caterer:0.16.9
+```
+
+#### Script Execution Behavior
+
+- **Pre-processor**: Runs before Data Caterer starts
+    - If the script fails, Data Caterer execution is stopped
+    - Script must be executable and return exit code 0 for success
+- **Post-processor**: Runs after Data Caterer completes, based on condition:
+    - `success`: Only runs if Data Caterer exit code is 0
+    - `failure`: Only runs if Data Caterer exit code is non-zero
+    - `always`: Runs regardless of Data Caterer exit code
+    - If post-processor fails, the original Data Caterer exit code is preserved
+
+#### Error Handling
+
+- Scripts are executed with bash and include comprehensive error logging
+- Missing script files generate warnings but don't stop execution
+- All script output is logged with clear prefixes (pre/post processor)
+- The final exit code is always Data Caterer's original exit code
+
 ## Helm
 
 [Link to sample helm on GitHub here](https://github.com/data-catering/data-caterer/tree/main/example/helm/data-caterer)
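
The post-processor rules added to `docs/docs/deployment.md` in this commit can be sketched in plain bash. This is only an illustration of the documented behaviour, not the image's actual entrypoint: the function names `should_run_post_processor` and `run_with_post_processor`, the log wording, and the choice to skip the post-processor for an unrecognised condition value are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the documented post-processor behaviour.
# Function names and log messages are illustrative, not Data Caterer's own.

# Decide whether the post-processor should run.
#   $1 = POST_PROCESSOR_CONDITION (success | failure | always; defaults to success)
#   $2 = exit code of the main Data Caterer process
should_run_post_processor() {
  local condition="${1:-success}" exit_code="$2"
  case "$condition" in
    success) [ "$exit_code" -eq 0 ] ;;
    failure) [ "$exit_code" -ne 0 ] ;;
    always)  return 0 ;;
    *)       return 1 ;;  # assumption: unknown values skip the post-processor
  esac
}

# Run a main command, then the post-processor script if the condition matches.
#   $1 = condition, $2 = path to post-processor script, remaining args = main command
# The main command's exit code is returned, whatever the post-processor does.
run_with_post_processor() {
  local condition="$1" post_script="$2"
  shift 2
  local main_exit=0
  "$@" || main_exit=$?
  if [ -n "$post_script" ] && should_run_post_processor "$condition" "$main_exit"; then
    if [ -f "$post_script" ]; then
      bash "$post_script" || echo "[post-processor] script failed, keeping exit code $main_exit" >&2
    else
      echo "[post-processor] WARN: $post_script not found, skipping" >&2
    fi
  fi
  return "$main_exit"
}
```

For example, `run_with_post_processor always /opt/app/scripts/cleanup.sh false` returns 1 (the exit code of `false`) whether or not `cleanup.sh` exists or succeeds, which mirrors the documented rule that the final exit code is always Data Caterer's own.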

docs/get-started/quick-start.md

Lines changed: 1 addition & 1 deletion
@@ -85,7 +85,7 @@ cd data-caterer-example && ./run.sh simple-json.yaml
 
 1. Docker
 ```shell
-docker run -d -i -p 9898:9898 -e DEPLOY_MODE=standalone --name datacaterer datacatering/data-caterer:0.16.8
+docker run -d -i -p 9898:9898 -e DEPLOY_MODE=standalone --name datacaterer datacatering/data-caterer:0.16.9
 ```
 2. [Open localhost:9898](http://localhost:9898)
 

docs/use-case/changelog/0.15.0.md

Lines changed: 5 additions & 0 deletions
@@ -11,12 +11,17 @@ Deployed: 20-02-2025
 Latest feature and fixes for Data Catering include:
 
 - Add in `rabbitmq` as a data source
+    - [Check RabbitMQ documentation here](../../docs/guide/data-source/messaging/rabbitmq.md)
 - Add in `bigquery` as a data source
+    - [Check BigQuery documentation here](../../docs/guide/data-source/database/bigquery.md)
 - Allow for empty sequences to be generated for per field counts
+    - [Check count per field documentation here](../../docs/generator/count.md#per-field)
 - Calculate number of records generated based on foreign key definitions
+    - [Check foreign key documentation here](../../docs/generator/foreign-key.md)
 - Unpersist DataFrame after generating data to avoid OOM errors
 - Update to use `jakarta.jms` v3.1.x
 - Use `sol-jms-jakarta` for JMS messaging to Solace
+    - [Check Solace documentation here](../../docs/guide/data-source/messaging/solace.md)
 - Introduce `uuid` [data generation for random unique strings](../../docs/generator/data-generator.md#string)
 - Introduce `oneOfWeighted` data generation for weighted random selection from set of values
     - Can be used for [fields](../../docs/generator/data-generator.md#all-data-types) or [record count](../../docs/generator/count.md#weighted)

docs/use-case/changelog/0.15.1.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+---
+title: "Data Caterer 0.15.1 release notes"
+description: "Ensure increment starting number uses long data type, clear rdds from memory, fix unique data logic, and add additional tests."
+image: "https://data.catering/diagrams/logo/data_catering_logo.svg"
+---
+
+# 0.15.1
+
+Deployed: 17-09-2025
+
+Latest feature and fixes for Data Catering include:
+
+- Ensure increment starting number uses `long` data type
+- Ensure all rdds are cleared from memory after each batch
+- Fix unique data logic
+    - Add in additional flag to enable/disable unique check only per batch
+- Add additional tests

docs/use-case/changelog/0.15.2.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+---
+title: "Data Caterer 0.15.2 release notes"
+description: "Various performance improvements including bloom filters for unique value checking, updated libraries, and new API configurations."
+image: "https://data.catering/diagrams/logo/data_catering_logo.svg"
+---
+
+# 0.15.2
+
+Deployed: 18-09-2025
+
+Latest feature and fixes for Data Catering include:
+
+- Various performance improvements
+    - Don't call `df.rdd` when zipping with index in foreign key logic
+    - Don't call `df.rdd` when checking for unique values
+    - When passing metadata to nested fields, don't re-create dataframe
+    - Use `unionByName` instead of checking if dataframe is empty then running `union`
+- Set `enableSinkMetadata` to false by default
+    - [Check configuration documentation here](../../docs/configuration.md#flags)
+- New unique value checking logic using bloom filters
+    - Add `uniqueBloomFilterNumItems` to generation config
+    - Add `uniqueBloomFilterFalsePositiveProbability` to generation config
+    - [Check unique generation tuning documentation here](../../docs/configuration.md#unique-generation-tuning)
+- Update default Spark memory settings
+- Update `netty` and `jsonsmart` libraries due to vulnerabilities
+- Add `enableUniqueCheckOnlyInBatch` to Scala and Java API
+    - [Check configuration documentation here](../../docs/configuration.md#flags)
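
As background for the two Bloom filter settings introduced in 0.15.2, the textbook sizing formulas are worth having at hand. This assumes `uniqueBloomFilterNumItems` is the expected number of distinct values $n$ and `uniqueBloomFilterFalsePositiveProbability` is the target false positive rate $p$; that mapping is inferred from the option names and is not confirmed by this diff, and the actual implementation may round or cap these values differently.

```latex
% Standard Bloom filter sizing: m bits and k hash functions
% for n expected items and false positive probability p.
m = -\frac{n \ln p}{(\ln 2)^2}, \qquad k = \frac{m}{n} \ln 2

% Illustrative values only (not Data Caterer defaults): n = 10^6, p = 0.01
m \approx \frac{10^6 \times 4.605}{0.4805} \approx 9.59 \times 10^6 \ \text{bits} \approx 1.2\ \text{MB},
\qquad k \approx 9.59 \times 0.693 \approx 6.6 \ (\text{rounded to } 7)
```

Each halving of $p$ adds about 1.44 bits per item, so a tighter false positive rate costs memory roughly linearly in $n$, while underestimating $n$ pushes the real false positive rate above the configured target.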

docs/use-case/changelog/0.15.3.md

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+---
+title: "Data Caterer 0.15.3 release notes"
+description: "Ensure correct field options are parsed from YAML to real-time data sources."
+image: "https://data.catering/diagrams/logo/data_catering_logo.svg"
+---
+
+# 0.15.3
+
+Deployed: 19-09-2025
+
+Latest feature and fixes for Data Catering include:
+
+- Ensure correct field options are parsed from YAML to real-time data sources

docs/use-case/changelog/0.15.4.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+---
+title: "Data Caterer 0.15.4 release notes"
+description: "Fix bug with using messageHeaders from YAML not being parsed correctly."
+image: "https://data.catering/diagrams/logo/data_catering_logo.svg"
+---
+
+# 0.15.4
+
+Deployed: 20-09-2025
+
+Latest feature and fixes for Data Catering include:
+
+- Fix bug with using `messageHeaders` from YAML not being parsed correctly
+    - [Check message headers documentation here](../../docs/guide/data-source/messaging/rabbitmq.md#message-headers)
+    - [Check Kafka documentation here](../../docs/guide/data-source/messaging/kafka.md)
+    - [Check Solace documentation here](../../docs/guide/data-source/messaging/solace.md)

docs/use-case/changelog/0.16.1.md

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+---
+title: "Data Caterer 0.16.1 release notes"
+description: "Fixed issue relating to matching tasks from YAML and tasks from metadata."
+image: "https://data.catering/diagrams/logo/data_catering_logo.svg"
+---
+
+# 0.16.1
+
+Deployed: 21-09-2025
+
+Latest feature and fixes for Data Catering include:
+
+- Fixed issue relating to matching tasks from YAML and tasks from metadata

docs/use-case/changelog/0.16.2.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+---
+title: "Data Caterer 0.16.2 release notes"
+description: "Add referenceMode to data generation config, fix bug when using nested fields with foreign key relationships, change order of data generation precedence, and add enableFastGeneration for optimizations."
+image: "https://data.catering/diagrams/logo/data_catering_logo.svg"
+---
+
+# 0.16.2
+
+Deployed: 22-09-2025
+
+Latest feature and fixes for Data Catering include:
+
+- Add in `referenceMode` to data generation config
+    - `enableReferenceMode` to step and connection task builders and YAML
+    - [Check configuration documentation here](../../docs/configuration.md)
+- Fix bug when using nested fields with foreign key relationships
+    - [Check foreign key documentation here](../../docs/generator/foreign-key.md)
+- Change order of data generation precedence
+    - Now order is `oneOf`, `sql`, `expression`, `regex`, `random`
+    - Previous order was `regex`, `oneOf`, `expression`, `sql`, `random`
+    - [Check data generation documentation here](../../docs/generator/data-generator.md)
+- Fix bug for deeply nested SQL fields not being applied correctly
+    - [Check SQL field generation documentation here](../../docs/generator/data-generator.md#sql)
+- Add in `enableFastGeneration` for automatically applying optimizations for faster completion of data generation
+    - [Check fast generation mode documentation here](../../docs/configuration.md#fast-generation-mode)
