Commit d5316a9
feat(tokenization): implement atomic batch tokenize and detokenize endpoints (#119)
- Added support for atomic batch processing in the Tokenization Engine.
- Implemented CreateBatch and GetBatchByTokens in both PostgreSQL and MySQL repositories with optimized batch retrieval.
- Added TokenizeBatch and DetokenizeBatch to the usecase layer, ensuring atomicity via TxManager.
- Exposed new REST API endpoints:
  - POST /v1/tokenization/keys/:name/tokenize-batch
  - POST /v1/tokenization/detokenize-batch
- Defined batch request/response DTOs with validation (enforcing a 100-item limit).
- Updated the metrics decorator to track latency and operations for batch tasks.
- Injected TxManager into the tokenization usecase through the DI container.
- Added comprehensive unit tests for DTOs, usecases, and handlers.
- Expanded integration flow tests to verify success and atomicity across both database engines.
- Updated OpenAPI specification and Engine documentation with batch usage examples.
1 parent 6f6eb2b commit d5316a9

26 files changed

Lines changed: 1941 additions & 2 deletions
Lines changed: 5 additions & 0 deletions

@@ -0,0 +1,5 @@

```markdown
# Track batch_tokenization_20260309 Context

- [Specification](./spec.md)
- [Implementation Plan](./plan.md)
- [Metadata](./metadata.json)
```
Lines changed: 8 additions & 0 deletions

@@ -0,0 +1,8 @@

```json
{
  "track_id": "batch_tokenization_20260309",
  "type": "feature",
  "status": "new",
  "created_at": "2026-03-09T14:30:00Z",
  "updated_at": "2026-03-09T14:30:00Z",
  "description": "Add Batch Tokenize/Detokenize Endpoints (wrap existing single-item logic in a loop with transaction)"
}
```
Lines changed: 49 additions & 0 deletions

@@ -0,0 +1,49 @@

```markdown
# Implementation Plan: Batch Tokenize/Detokenize Endpoints

This plan outlines the steps to implement batch tokenization and detokenization endpoints in the Secrets manager.

## Phase 1: Domain and Repository Layer

- [x] Task: Define Batch Tokenization Interfaces [e8a8cee]
    - [ ] Add `CreateBatch` to `TokenRepository` interface in `internal/tokenization/domain/token.go`.
    - [ ] Add `GetBatchByTokens` to `TokenRepository` interface in `internal/tokenization/domain/token.go`.
- [x] Task: Implement Batch Repository Methods (PostgreSQL) [517777c]
    - [ ] Implement `CreateBatch` in `internal/tokenization/repository/postgresql/token_repository.go`.
    - [ ] Implement `GetBatchByTokens` in `internal/tokenization/repository/postgresql/token_repository.go`.
    - [ ] Write integration tests for these methods (tagged with `//go:build integration`).
- [x] Task: Implement Batch Repository Methods (MySQL) [cc03816]
    - [ ] Implement `CreateBatch` in `internal/tokenization/repository/mysql/token_repository.go`.
    - [ ] Implement `GetBatchByTokens` in `internal/tokenization/repository/mysql/token_repository.go`.
    - [ ] Write integration tests for these methods (tagged with `//go:build integration`).
- [x] Task: Implement Batch Usecase Logic [191cb29]
    - [ ] Add `TokenizeBatch` to `TokenizationUsecase` in `internal/tokenization/usecase/tokenization_usecase.go`.
    - [ ] Add `DetokenizeBatch` to `TokenizationUsecase` in `internal/tokenization/usecase/tokenization_usecase.go`.
    - [ ] Ensure both methods use `TxManager` for atomicity.
    - [ ] Implement the loop over existing single-item logic.
    - [ ] Write unit tests for the new usecase methods.
- [x] Task: Conductor - User Manual Verification 'Phase 1: Domain and Repository Layer' (Protocol in workflow.md)

## Phase 2: HTTP Layer

- [x] Task: Define Request/Response DTOs [cc85bfe]
    - [ ] Create `TokenizeBatchRequest` and `TokenizeBatchResponse` in `internal/tokenization/http/dto.go` (or equivalent).
    - [ ] Create `DetokenizeBatchRequest` and `DetokenizeBatchResponse` in `internal/tokenization/http/dto.go`.
    - [ ] Implement validation rules (e.g., max 100 items).
- [x] Task: Implement HTTP Handlers [ee3290b]
    - [ ] Implement `TokenizeBatch` handler in `internal/tokenization/http/tokenization_handler.go`.
    - [ ] Implement `DetokenizeBatch` handler in `internal/tokenization/http/tokenization_handler.go`.
    - [ ] Write unit tests for the new handlers in `internal/tokenization/http/tokenization_handler_test.go`.
- [x] Task: Register Routes [ee3290b]
    - [ ] Add the new batch routes to the router in `internal/tokenization/http/tokenization_handler.go` (or `internal/app/di_tokenization.go`).
- [x] Task: Conductor - User Manual Verification 'Phase 2: HTTP Layer' (Protocol in workflow.md)

## Phase 3: Documentation and Integration Testing

- [x] Task: Update Integration Flow Tests [efa3c2c]
    - [ ] Add batch operation test cases to `test/integration/tokenization_flow_test.go`.
    - [ ] Verify atomicity by intentionally failing one item in a batch.
- [x] Task: Update OpenAPI Specification [fa50f71]
    - [ ] Add the new batch endpoints to `docs/openapi.yaml`.
- [x] Task: Update Engine Documentation [9faab2e]
    - [ ] Update `docs/engines/tokenization.md` with examples of batch requests and responses.
- [x] Task: Conductor - User Manual Verification 'Phase 3: Documentation and Integration Testing' (Protocol in workflow.md)
```
Lines changed: 39 additions & 0 deletions

@@ -0,0 +1,39 @@

```markdown
# Specification: Batch Tokenize/Detokenize Endpoints

## Overview
This track introduces batch processing capabilities to the Tokenization Engine. Currently, tokenization and detokenization are performed on a single item at a time. This feature will add new endpoints to allow clients to tokenize or detokenize multiple items in a single request, wrapped in a database transaction for atomicity.

## Functional Requirements
- **New Endpoints:**
    - `POST /v1/tokenization/keys/:name/tokenize-batch`: Batch tokenize a list of values using a named key.
    - `POST /v1/tokenization/detokenize-batch`: Batch detokenize a list of tokens.
- **Batch Limit:** A configurable limit of 100 items per batch request will be enforced to ensure performance and prevent resource exhaustion.
- **Atomicity:** Both batch endpoints MUST be atomic. If any single item in the batch fails (e.g., validation error, database failure), the entire request MUST fail, and any database changes MUST be rolled back.
- **Request/Response Formats:**
    - `tokenize-batch`:
        - Request: `{"values": ["val1", "val2", ...]}`
        - Response: `{"tokens": ["token1", "token2", ...]}`
    - `detokenize-batch`:
        - Request: `{"tokens": ["token1", "token2", ...]}`
        - Response: `{"values": ["val1", "val2", ...]}`
- **Documentation:**
    - Update `docs/engines/tokenization.md` to include batch operations.
    - Update `docs/openapi.yaml` with the new endpoint definitions.

## Non-Functional Requirements
- **Performance:** Batch processing should be more efficient than multiple single-item calls by reducing network round-trips and utilizing a single database transaction.
- **Security:** Standard capability validation (`tokenize` or `detokenize`) must be enforced for the batch operations.

## Acceptance Criteria
- [ ] Clients can successfully tokenize up to 100 values in a single call.
- [ ] Clients can successfully detokenize up to 100 tokens in a single call.
- [ ] If any value in a `tokenize-batch` request is invalid, the entire request returns an error (400 Bad Request) and no tokens are created.
- [ ] If any token in a `detokenize-batch` request is invalid, the entire request returns an error (400 Bad Request) and no values are returned.
- [ ] The batch limit is enforced and returns a 400 Bad Request if exceeded.
- [ ] Unit tests cover new domain logic, usecase methods, and HTTP handlers.
- [ ] Integration tests in `test/integration/tokenization_flow_test.go` cover batch operations for both PostgreSQL and MySQL.
- [ ] Documentation (`docs/engines/tokenization.md`) and OpenAPI spec (`docs/openapi.yaml`) are updated.

## Out of Scope
- Partial success/failure handling for batch requests.
- Asynchronous batch processing.
```
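The spec's 1–100 item window can be enforced with a small DTO check before any database work happens, so an oversized batch fails fast with 400 Bad Request. A sketch following the spec's documented `{"values": [...]}` request shape — the type name and `Validate` method are illustrative, not necessarily the engine's actual DTOs:

```go
package main

import (
	"errors"
	"fmt"
)

// maxBatchItems is the spec's per-request batch limit.
const maxBatchItems = 100

// TokenizeBatchRequest mirrors the documented request shape:
// {"values": ["val1", "val2", ...]}
type TokenizeBatchRequest struct {
	Values []string `json:"values"`
}

// Validate rejects empty and oversized batches so the handler can
// return 400 Bad Request before touching the database.
func (r TokenizeBatchRequest) Validate() error {
	switch {
	case len(r.Values) == 0:
		return errors.New("values must contain at least 1 item")
	case len(r.Values) > maxBatchItems:
		return fmt.Errorf("values exceeds the %d-item batch limit", maxBatchItems)
	}
	return nil
}

func main() {
	ok := TokenizeBatchRequest{Values: []string{"v1", "v2"}}
	fmt.Println(ok.Validate()) // nil: within the limit

	big := TokenizeBatchRequest{Values: make([]string, 101)}
	fmt.Println(big.Validate()) // error: over the limit
}
```

The same bounds appear declaratively in the OpenAPI schema as `minItems: 1` / `maxItems: 100`; this check is the server-side counterpart.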

conductor/product.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -14,7 +14,7 @@ To provide a secure, developer-friendly, and lightweight secrets management plat
 ## Core Features
 - **Secret Management (Storage):** Versioned, envelope-encrypted storage with support for arbitrary key-value pairs and strict path validation.
 - **Transit Engine (EaaS):** On-the-fly encryption/decryption of application data without database storage.
-- **Tokenization Engine:** Format-preserving tokens for sensitive data types like credit card numbers.
+- **Tokenization Engine:** Format-preserving tokens for sensitive data types like credit card numbers, with support for atomic batch processing.
 - **Auth Token Revocation:** Immediate invalidation of authentication tokens (single or client-wide) with full state management.
 - **Client Secret Rotation:** Self-service and administrative rotation of client secrets with automatic auth token revocation.
 - **Audit Logs:** HMAC-signed audit trails capturing every access attempt and policy evaluation, with support for advanced filtering by client and date range.
```
docs/engines/tokenization.md

Lines changed: 39 additions & 0 deletions

````diff
@@ -74,6 +74,27 @@ Example response (`201 Created`):
 }
 ```
 
+### Tokenize Data (Batch)
+
+- **Endpoint**: `POST /v1/tokenization/keys/:name/tokenize-batch`
+- **Capability**: `encrypt`
+- **Body**: `items` (array of objects with `plaintext`, `metadata`, `ttl`).
+- **Limit**: Maximum 100 items per batch.
+
+Generates tokens for multiple plaintext values in a single atomic operation. If any item fails (e.g., invalid format), the entire batch is rejected.
+
+```bash
+curl -X POST http://localhost:8080/v1/tokenization/keys/payment-cards/tokenize-batch \
+  -H "Authorization: Bearer <token>" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "items": [
+      { "plaintext": "NDUzMjAxNTExMjgzMDM2Ng==", "metadata": { "index": 1 } },
+      { "plaintext": "NTQ5ODAxNTExMjgzMDM2Nw==", "metadata": { "index": 2 } }
+    ]
+  }'
+```
+
 ### Detokenize Data
 
 - **Endpoint**: `POST /v1/tokenization/detokenize`
@@ -91,6 +112,24 @@ Example response (`200 OK`):
 }
 ```
 
+### Detokenize Data (Batch)
+
+- **Endpoint**: `POST /v1/tokenization/detokenize-batch`
+- **Capability**: `decrypt`
+- **Body**: `{"tokens": ["string", "string"]}`
+- **Limit**: Maximum 100 tokens per batch.
+
+Retrieves original plaintext values for multiple tokens in a single atomic operation.
+
+```bash
+curl -X POST http://localhost:8080/v1/tokenization/detokenize-batch \
+  -H "Authorization: Bearer <token>" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "tokens": ["4532015112830366", "5498015112830367"]
+  }'
+```
+
 ### Validate and Revoke
 
 - `POST /v1/tokenization/validate` (Capability: `read`) - Check if token is valid without returning plaintext.
````
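The curl calls above can also be issued programmatically; a Go sketch that assembles the documented `items` payload for the tokenize-batch endpoint using only the standard library (the URL, field names, and headers come from the docs — the `BatchItem` type and helper function are illustrative):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// BatchItem matches the documented tokenize-batch body: each item
// carries a base64 plaintext plus optional metadata and ttl.
type BatchItem struct {
	Plaintext string         `json:"plaintext"`
	Metadata  map[string]any `json:"metadata,omitempty"`
	TTL       int            `json:"ttl,omitempty"`
}

// newTokenizeBatchRequest builds the POST request for a named key,
// mirroring the curl example in the engine documentation.
func newTokenizeBatchRequest(baseURL, key, bearer string, items []BatchItem) (*http.Request, error) {
	body, err := json.Marshal(map[string][]BatchItem{"items": items})
	if err != nil {
		return nil, err
	}
	url := fmt.Sprintf("%s/v1/tokenization/keys/%s/tokenize-batch", baseURL, key)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+bearer)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newTokenizeBatchRequest(
		"http://localhost:8080", "payment-cards", "<token>",
		[]BatchItem{{Plaintext: "NDUzMjAxNTExMjgzMDM2Ng==", Metadata: map[string]any{"index": 1}}},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
}
```

Sending the request (e.g. via `http.DefaultClient.Do(req)`) requires a running server, so it is omitted here.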

docs/openapi.yaml

Lines changed: 110 additions & 0 deletions

```diff
@@ -911,6 +911,46 @@ paths:
           $ref: "#/components/responses/ValidationError"
         "429":
           $ref: "#/components/responses/TooManyRequests"
+  /v1/tokenization/keys/{name}/tokenize-batch:
+    post:
+      tags: [tokenization]
+      summary: Tokenize multiple plaintexts in batch
+      description: Generates tokens for multiple plaintext values in a single atomic operation.
+      security:
+        - bearerAuth: []
+      parameters:
+        - name: name
+          in: path
+          required: true
+          schema:
+            type: string
+      requestBody:
+        required: true
+        content:
+          application/json:
+            schema:
+              $ref: "#/components/schemas/TokenizeBatchRequest"
+      responses:
+        "201":
+          description: Tokens created
+          content:
+            application/json:
+              schema:
+                $ref: "#/components/schemas/TokenizeBatchResponse"
+        "401":
+          $ref: "#/components/responses/Unauthorized"
+        "403":
+          $ref: "#/components/responses/Forbidden"
+        "404":
+          description: Tokenization key not found
+          content:
+            application/json:
+              schema:
+                $ref: "#/components/schemas/ErrorResponse"
+        "422":
+          $ref: "#/components/responses/ValidationError"
+        "429":
+          $ref: "#/components/responses/TooManyRequests"
   /v1/tokenization/detokenize:
     post:
       tags: [tokenization]
@@ -944,6 +984,40 @@ paths:
           $ref: "#/components/responses/ValidationError"
         "429":
           $ref: "#/components/responses/TooManyRequests"
+  /v1/tokenization/detokenize-batch:
+    post:
+      tags: [tokenization]
+      summary: Detokenize multiple tokens in batch
+      description: Retrieves original plaintext values for multiple tokens in a single atomic operation.
+      security:
+        - bearerAuth: []
+      requestBody:
+        required: true
+        content:
+          application/json:
+            schema:
+              $ref: "#/components/schemas/DetokenizeBatchRequest"
+      responses:
+        "200":
+          description: Plaintexts resolved
+          content:
+            application/json:
+              schema:
+                $ref: "#/components/schemas/DetokenizeBatchResponse"
+        "401":
+          $ref: "#/components/responses/Unauthorized"
+        "403":
+          $ref: "#/components/responses/Forbidden"
+        "404":
+          description: One or more tokens not found
+          content:
+            application/json:
+              schema:
+                $ref: "#/components/schemas/ErrorResponse"
+        "422":
+          $ref: "#/components/responses/ValidationError"
+        "429":
+          $ref: "#/components/responses/TooManyRequests"
   /v1/tokenization/validate:
     post:
       tags: [tokenization]
@@ -1455,6 +1529,16 @@ components:
           type: integer
           minimum: 1
       required: [plaintext]
+    TokenizeBatchRequest:
+      type: object
+      properties:
+        items:
+          type: array
+          minItems: 1
+          maxItems: 100
+          items:
+            $ref: "#/components/schemas/TokenizeRequest"
+      required: [items]
     TokenizeResponse:
       type: object
       properties:
@@ -1471,12 +1555,30 @@ components:
           format: date-time
           nullable: true
       required: [token, created_at]
+    TokenizeBatchResponse:
+      type: object
+      properties:
+        items:
+          type: array
+          items:
+            $ref: "#/components/schemas/TokenizeResponse"
+      required: [items]
     DetokenizeRequest:
       type: object
      properties:
        token:
          type: string
      required: [token]
+    DetokenizeBatchRequest:
+      type: object
+      properties:
+        tokens:
+          type: array
+          minItems: 1
+          maxItems: 100
+          items:
+            type: string
+      required: [tokens]
     DetokenizeResponse:
       type: object
       properties:
@@ -1487,6 +1589,14 @@ components:
           type: object
           additionalProperties: true
       required: [plaintext]
+    DetokenizeBatchResponse:
+      type: object
+      properties:
+        items:
+          type: array
+          items:
+            $ref: "#/components/schemas/DetokenizeResponse"
+      required: [items]
     ValidateTokenRequest:
       type: object
       properties:
```
internal/app/di_tokenization.go

Lines changed: 6 additions & 0 deletions

```diff
@@ -249,6 +249,11 @@ func (c *Container) initTokenizationKeyUseCase(
 func (c *Container) initTokenizationUseCase(
 	ctx context.Context,
 ) (tokenizationUseCase.TokenizationUseCase, error) {
+	txManager, err := c.TxManager(ctx)
+	if err != nil {
+		return nil, fmt.Errorf("failed to get tx manager for tokenization use case: %w", err)
+	}
+
 	tokenizationKeyRepository, err := c.TokenizationKeyRepository(ctx)
 	if err != nil {
 		return nil, fmt.Errorf(
@@ -279,6 +284,7 @@ func (c *Container) initTokenizationUseCase(
 	}
 
 	baseUseCase := tokenizationUseCase.NewTokenizationUseCase(
+		txManager,
 		tokenizationKeyRepository,
 		tokenRepository,
 		dekRepository,
```

internal/http/server.go

Lines changed: 12 additions & 0 deletions

```diff
@@ -408,6 +408,12 @@ func (s *Server) registerTokenizationRoutes(
 		authHTTP.AuthorizationMiddleware(authDomain.EncryptCapability, auditLogUseCase, s.logger),
 		tokenizationHandler.TokenizeHandler,
 	)
+
+	// Tokenize batch of plaintexts with tokenization key
+	keys.POST("/:name/tokenize-batch",
+		authHTTP.AuthorizationMiddleware(authDomain.EncryptCapability, auditLogUseCase, s.logger),
+		tokenizationHandler.TokenizeBatchHandler,
+	)
 }
 
 // Detokenize token to retrieve plaintext
@@ -416,6 +422,12 @@ func (s *Server) registerTokenizationRoutes(
 		tokenizationHandler.DetokenizeHandler,
 	)
 
+	// Detokenize batch of tokens to retrieve plaintexts
+	tokenization.POST("/detokenize-batch",
+		authHTTP.AuthorizationMiddleware(authDomain.DecryptCapability, auditLogUseCase, s.logger),
+		tokenizationHandler.DetokenizeBatchHandler,
+	)
+
 	// Validate token existence and validity
 	tokenization.POST("/validate",
 		authHTTP.AuthorizationMiddleware(authDomain.ReadCapability, auditLogUseCase, s.logger),
```
