Commit 7152565

feat: db:transform command
1 parent faf830a commit 7152565

2 files changed

Lines changed: 75 additions & 4 deletions

docs/devlog/2025-05-14.md

Lines changed: 32 additions & 1 deletion
@@ -20,4 +20,35 @@ export default {
     Transformation2()
   ]
 }
-```
+```
+
+- [Vite `Plugin` interface](https://github.com/vitejs/vite/blob/c7c17434968848f1471179c10a5fc9d2804add8b/packages/vite/src/node/plugin.ts#L82)
+- [Sample plugin](https://github.com/mammadataei/vite-plugin-graphiql/blob/792548d23402cfa30e2bc758b35c62a0f3bfdffa/src/node/index.ts#L14)
+
+Alright, this shape feels good to me so far and allows for options and customization in the config to be passed in. Good starting point 👍
+
+Thinking through the process naively:
+
+1. start with an existing table with a shape like `api` + `endpoint` or a straight table name so we're able to transform and link from transformed tables
+2. indicate the columns with the data we want to use
+3. indicate how the data needs to change; this will happen in-flight so DuckDB can infer the type when creating a new table
+4. create a new table with a specific name and insert
+
+I have to keep reminding myself that the config file is 100% created and maintained by data owners but the transformations could be provided by contributors and used by data owners. I keep getting caught up in the web of different ways this can be configured (or could in the future). Just focus on the happy path use case!
+
+This is working well so far and I'm seeing places where I can write an interface for plugins instead of allowing the full DB. Running into this problem:
+
+https://github.com/duckdb/duckdb/discussions/9558
+
+I can pull results out as JSON but can't seem to create a table from those results.
+
+I refactored everything to just use interfaces and it's working really well, IMHO. I need to build a few more of these to make sure it all makes sense but I'm happy with where this landed and am looking forward to working on the output based on this.
+
+```ts
+export interface DbTransformation {
+  getSourceTable: () => string;
+  getSourceColumns: () => string[];
+  transform: (data: object) => object;
+  getDestinationTable: () => string;
+}
+```
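
Review note: to make the four-step process and the `DbTransformation` interface above more concrete, here is a minimal sketch of what a contributor-supplied transformation might look like. The factory name `secondsToMinutes`, the table names, and the column names are all hypothetical and do not appear in this commit; the interface is copied in only so the block is self-contained.

```typescript
// Interface copied from this commit so the example stands alone.
interface DbTransformation {
  getSourceTable: () => string;
  getSourceColumns: () => string[];
  transform: (data: object) => object;
  getDestinationTable: () => string;
}

// Hypothetical row shapes; these tables and columns are illustrative only.
interface SourceRow {
  id: number;
  duration_seconds: number;
}
interface MinutesRow {
  id: number;
  duration_minutes: number;
}

// Factory that takes options, mirroring the `Transformation2()` call style
// used in the config example. The name is an assumption for this sketch.
function secondsToMinutes(options: {
  sourceTable: string;
  destinationTable: string;
}): DbTransformation {
  return {
    // Step 1: the existing table to read from.
    getSourceTable: () => options.sourceTable,
    // Step 2: the columns holding the data we want.
    getSourceColumns: () => ["id", "duration_seconds"],
    // Step 3: change the data in flight. The command passes the whole
    // result set (an array of rows) in a single call.
    transform: (data: object): object =>
      (data as SourceRow[]).map(
        (row): MinutesRow => ({
          id: row.id,
          duration_minutes: row.duration_seconds / 60,
        })
      ),
    // Step 4: the name of the table to create.
    getDestinationTable: () => options.destinationTable,
  };
}

const transformation = secondsToMinutes({
  sourceTable: "activities",
  destinationTable: "activities_minutes",
});
const rows = transformation.transform([
  { id: 1, duration_seconds: 90 },
  { id: 2, duration_seconds: 600 },
]);
console.log(JSON.stringify(rows));
// prints [{"id":1,"duration_minutes":1.5},{"id":2,"duration_minutes":10}]
```

Because `transform` is typed as `object` in and `object` out, each transformation has to cast to its own row type; a generic parameter on the interface could tighten this later, though that is a design choice beyond this commit.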

src/commands/db/transform.ts

Lines changed: 43 additions & 3 deletions
@@ -1,20 +1,60 @@
+import { Connection } from "duckdb-async";
+
 import { DbBaseCommand } from "./_base.js";
 import getConfig from "../../utils/config.js";
+import path from "path";
+import { removeFile, writeFile } from "../../utils/fs.js";
+
+export interface DbTransformation {
+  getSourceTable: () => string;
+  getSourceColumns: () => string[];
+  transform: (data: object) => object;
+  getDestinationTable: () => string;
+}
+
+export interface DuckDBConnection extends Connection {}
 
 export default class TransformDb extends DbBaseCommand<typeof DbBaseCommand> {
   static override summary = "Build new tables from existing or transformed data";
 
   static override examples = ["<%= config.bin %> <%= command.id %>"];
 
   public async run(): Promise<void> {
-    const { transformations } = getConfig();
+    const { transformations, dbOutputDir } = getConfig();
 
     if (!transformations.length) {
-      throw new Error("No trasformations to run.");
+      throw new Error("No transformations to run.");
     }
 
     for (const transformation of transformations) {
-      console.log(transformation);
+      const sourceTable = transformation.getSourceTable();
+      const sourceCols = transformation.getSourceColumns();
+      const results = await this.dbConn.all(`
+        SELECT ${sourceCols.join(", ")}
+        FROM '${sourceTable}'
+      `);
+      const transformed = transformation.transform(results);
+      const destinationTable = transformation.getDestinationTable();
+
+      // DuckDB does not support reading from a stringified JSON object
+      // https://github.com/duckdb/duckdb/discussions/9558
+      const jsonTmpFile = path.join(dbOutputDir, `${destinationTable}.json`);
+      writeFile(jsonTmpFile, JSON.stringify(transformed));
+
+      await this.dbConn.all(`
+        DROP TABLE IF EXISTS "${destinationTable}"
+      `);
+
+      await this.dbConn.all(`
+        CREATE TABLE "${destinationTable}" AS
+        SELECT *
+        FROM read_json('${jsonTmpFile}')
+      `);
+
+      removeFile(jsonTmpFile);
+      console.log(
+        `Created table ${destinationTable} from ${sourceTable} with ${results.length} rows`
+      );
     }
   }
 }
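
Review note: the command interpolates table, column, and file names straight into SQL strings. Since the devlog says transformations may come from contributors rather than data owners, one possible hardening is to quote those values before building the statements. The sketch below is an assumption about how that could look, not code from this commit; `quoteIdentifier`, `quoteLiteral`, `buildSelect`, and `buildCreateFromJson` are hypothetical helper names. It relies only on standard SQL quoting rules (double quotes for identifiers, single quotes for string literals, embedded quotes doubled) and on the `read_json` call the commit already uses.

```typescript
// Quote a SQL identifier: wrap in double quotes, doubling embedded quotes.
function quoteIdentifier(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Quote a SQL string literal: wrap in single quotes, doubling embedded quotes.
function quoteLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Build the SELECT for a transformation's source data.
// Note: this treats every entry as a plain column name, so expressions
// such as `count(*)` would no longer work as source columns.
function buildSelect(sourceTable: string, sourceCols: string[]): string {
  const cols = sourceCols.map(quoteIdentifier).join(", ");
  return `SELECT ${cols} FROM ${quoteIdentifier(sourceTable)}`;
}

// Build the CREATE TABLE statement that loads the temporary JSON file.
function buildCreateFromJson(destinationTable: string, jsonPath: string): string {
  return (
    `CREATE TABLE ${quoteIdentifier(destinationTable)} AS ` +
    `SELECT * FROM read_json(${quoteLiteral(jsonPath)})`
  );
}

console.log(buildSelect("activities", ["id", "duration_seconds"]));
// prints SELECT "id", "duration_seconds" FROM "activities"
console.log(buildCreateFromJson("activities_minutes", "/tmp/activities_minutes.json"));
// prints CREATE TABLE "activities_minutes" AS SELECT * FROM read_json('/tmp/activities_minutes.json')
```

Two differences from the committed code are worth flagging: the source table is quoted here as an identifier rather than with single quotes, and source columns can no longer be arbitrary SQL expressions. Whether either trade-off is acceptable depends on how transformations are expected to be written.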
