
Commit 58e959a

Merge pull request #42 from reproio/fix-it
Fix #34 #41 Fix map values -> Parquet conversions
2 parents 6756698 + 0ccee22 commit 58e959a

65 files changed

Lines changed: 784 additions & 1625 deletions

Makefile

Lines changed: 42 additions & 26 deletions
@@ -23,32 +23,48 @@ test:
 
 .PHONY: it
 it: build
-	./columnify -schemaType avro -schemaFile examples/schema/primitives.avsc -recordType avro examples/record/primitives.avro > /dev/null
-	./columnify -schemaType avro -schemaFile examples/schema/primitives.avsc -recordType csv examples/record/primitives.csv > /dev/null
-	./columnify -schemaType avro -schemaFile examples/schema/primitives.avsc -recordType jsonl examples/record/primitives.jsonl > /dev/null
-	./columnify -schemaType avro -schemaFile examples/schema/primitives.avsc -recordType ltsv examples/record/primitives.ltsv > /dev/null
-	./columnify -schemaType avro -schemaFile examples/schema/primitives.avsc -recordType msgpack examples/record/primitives.msgpack > /dev/null
-	./columnify -schemaType avro -schemaFile examples/schema/primitives.avsc -recordType tsv examples/record/primitives.tsv > /dev/null
-	./columnify -schemaType avro -schemaFile examples/schema/nested.avsc -recordType avro examples/record/nested.avro > /dev/null
-	./columnify -schemaType avro -schemaFile examples/schema/nested.avsc -recordType jsonl examples/record/nested.jsonl > /dev/null
-	./columnify -schemaType avro -schemaFile examples/schema/nested.avsc -recordType msgpack examples/record/nested.msgpack > /dev/null
-	./columnify -schemaType avro -schemaFile examples/schema/array.avsc -recordType avro examples/record/array.avro > /dev/null
-	./columnify -schemaType avro -schemaFile examples/schema/array.avsc -recordType jsonl examples/record/array.jsonl > /dev/null
-	./columnify -schemaType avro -schemaFile examples/schema/array.avsc -recordType msgpack examples/record/array.msgpack > /dev/null
-	./columnify -schemaType avro -schemaFile examples/schema/complicated.avsc -recordType avro examples/record/complicated.avro > /dev/null
-	./columnify -schemaType avro -schemaFile examples/schema/complicated.avsc -recordType jsonl examples/record/complicated.json > /dev/null
-	./columnify -schemaType bigquery -schemaFile examples/schema/primitives.bq.json -recordType avro examples/record/primitives.avro > /dev/null
-	./columnify -schemaType bigquery -schemaFile examples/schema/primitives.bq.json -recordType csv examples/record/primitives.csv > /dev/null
-	./columnify -schemaType bigquery -schemaFile examples/schema/primitives.bq.json -recordType jsonl examples/record/primitives.jsonl > /dev/null
-	./columnify -schemaType bigquery -schemaFile examples/schema/primitives.bq.json -recordType ltsv examples/record/primitives.ltsv > /dev/null
-	./columnify -schemaType bigquery -schemaFile examples/schema/primitives.bq.json -recordType msgpack examples/record/primitives.msgpack > /dev/null
-	./columnify -schemaType bigquery -schemaFile examples/schema/primitives.bq.json -recordType tsv examples/record/primitives.tsv > /dev/null
-	./columnify -schemaType bigquery -schemaFile examples/schema/nested.bq.json -recordType avro examples/record/nested.avro > /dev/null
-	./columnify -schemaType bigquery -schemaFile examples/schema/nested.bq.json -recordType jsonl examples/record/nested.jsonl > /dev/null
-	./columnify -schemaType bigquery -schemaFile examples/schema/nested.bq.json -recordType msgpack examples/record/nested.msgpack > /dev/null
-	./columnify -schemaType bigquery -schemaFile examples/schema/array.bq.json -recordType avro examples/record/array.avro > /dev/null
-	./columnify -schemaType bigquery -schemaFile examples/schema/array.bq.json -recordType jsonl examples/record/array.jsonl > /dev/null
-	./columnify -schemaType bigquery -schemaFile examples/schema/array.bq.json -recordType msgpack examples/record/array.msgpack > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/primitives.avsc -recordType avro columnifier/testdata/record/primitives.avro > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/primitives.avsc -recordType csv columnifier/testdata/record/primitives.csv > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/primitives.avsc -recordType jsonl columnifier/testdata/record/primitives.jsonl > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/primitives.avsc -recordType ltsv columnifier/testdata/record/primitives.ltsv > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/primitives.avsc -recordType msgpack columnifier/testdata/record/primitives.msgpack > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/primitives.avsc -recordType tsv columnifier/testdata/record/primitives.tsv > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/nullables.avsc -recordType avro columnifier/testdata/record/nullables.avro > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/nullables.avsc -recordType jsonl columnifier/testdata/record/nullables.jsonl > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/nullables.avsc -recordType msgpack columnifier/testdata/record/nullables.msgpack > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/logicals.avsc -recordType avro columnifier/testdata/record/logicals.avro > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/logicals.avsc -recordType csv columnifier/testdata/record/logicals.csv > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/logicals.avsc -recordType jsonl columnifier/testdata/record/logicals.jsonl > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/logicals.avsc -recordType ltsv columnifier/testdata/record/logicals.ltsv > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/logicals.avsc -recordType msgpack columnifier/testdata/record/logicals.msgpack > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/logicals.avsc -recordType tsv columnifier/testdata/record/logicals.tsv > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/nested.avsc -recordType avro columnifier/testdata/record/nested.avro > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/nested.avsc -recordType jsonl columnifier/testdata/record/nested.jsonl > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/nested.avsc -recordType msgpack columnifier/testdata/record/nested.msgpack > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/array.avsc -recordType avro columnifier/testdata/record/array.avro > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/array.avsc -recordType jsonl columnifier/testdata/record/array.jsonl > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/array.avsc -recordType msgpack columnifier/testdata/record/array.msgpack > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/logicals.avsc -recordType jsonl columnifier/testdata/record/logicals.jsonl > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/logicals.avsc -recordType avro columnifier/testdata/record/logicals.avro > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/logicals.avsc -recordType msgpack columnifier/testdata/record/logicals.msgpack > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/nullable_complex.avsc -recordType avro columnifier/testdata/record/nullable_complex.avro > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/nullable_complex.avsc -recordType jsonl columnifier/testdata/record/nullable_complex.jsonl > /dev/null
+	./columnify -schemaType avro -schemaFile columnifier/testdata/schema/nullable_complex.avsc -recordType msgpack columnifier/testdata/record/nullable_complex.msgpack > /dev/null
+	./columnify -schemaType bigquery -schemaFile columnifier/testdata/schema/primitives.bq.json -recordType avro columnifier/testdata/record/primitives.avro > /dev/null
+	./columnify -schemaType bigquery -schemaFile columnifier/testdata/schema/primitives.bq.json -recordType csv columnifier/testdata/record/primitives.csv > /dev/null
+	./columnify -schemaType bigquery -schemaFile columnifier/testdata/schema/primitives.bq.json -recordType jsonl columnifier/testdata/record/primitives.jsonl > /dev/null
+	./columnify -schemaType bigquery -schemaFile columnifier/testdata/schema/primitives.bq.json -recordType ltsv columnifier/testdata/record/primitives.ltsv > /dev/null
+	./columnify -schemaType bigquery -schemaFile columnifier/testdata/schema/primitives.bq.json -recordType msgpack columnifier/testdata/record/primitives.msgpack > /dev/null
+	./columnify -schemaType bigquery -schemaFile columnifier/testdata/schema/primitives.bq.json -recordType tsv columnifier/testdata/record/primitives.tsv > /dev/null
+	./columnify -schemaType bigquery -schemaFile columnifier/testdata/schema/nullables.bq.json -recordType avro columnifier/testdata/record/nullables.avro > /dev/null
+	./columnify -schemaType bigquery -schemaFile columnifier/testdata/schema/nullables.bq.json -recordType jsonl columnifier/testdata/record/nullables.jsonl > /dev/null
+	./columnify -schemaType bigquery -schemaFile columnifier/testdata/schema/nullables.bq.json -recordType msgpack columnifier/testdata/record/nullables.msgpack > /dev/null
+	./columnify -schemaType bigquery -schemaFile columnifier/testdata/schema/nested.bq.json -recordType avro columnifier/testdata/record/nested.avro > /dev/null
+	./columnify -schemaType bigquery -schemaFile columnifier/testdata/schema/nested.bq.json -recordType jsonl columnifier/testdata/record/nested.jsonl > /dev/null
+	./columnify -schemaType bigquery -schemaFile columnifier/testdata/schema/nested.bq.json -recordType msgpack columnifier/testdata/record/nested.msgpack > /dev/null
+	./columnify -schemaType bigquery -schemaFile columnifier/testdata/schema/array.bq.json -recordType avro columnifier/testdata/record/array.avro > /dev/null
+	./columnify -schemaType bigquery -schemaFile columnifier/testdata/schema/array.bq.json -recordType jsonl columnifier/testdata/record/array.jsonl > /dev/null
+	./columnify -schemaType bigquery -schemaFile columnifier/testdata/schema/array.bq.json -recordType msgpack columnifier/testdata/record/array.msgpack > /dev/null
 
 # Set GITHUB_TOKEN and create release git tag
 .PHONY: release
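
The `it` target above is a matrix of schema type, dataset, and record type. As a reading aid, here is a minimal sketch of how such a matrix could be generated. `gen_it_commands` is a hypothetical helper that is not part of this commit, and it only prints the commands rather than running `./columnify`.

```shell
#!/bin/sh
# Hypothetical helper (not in the commit): print the ./columnify invocations
# for one schema type and one dataset across several record types.
# Dry run only: nothing is executed, so the binary does not need to exist.
gen_it_commands() {
    schema_type=$1   # "avro" or "bigquery"
    schema_ext=$2    # "avsc" or "bq.json"
    dataset=$3       # e.g. "primitives", "nested", "array"
    shift 3          # remaining arguments are record types
    for record_type in "$@"; do
        echo "./columnify -schemaType $schema_type" \
             "-schemaFile columnifier/testdata/schema/$dataset.$schema_ext" \
             "-recordType $record_type" \
             "columnifier/testdata/record/$dataset.$record_type"
    done
}

gen_it_commands avro avsc primitives avro csv jsonl ltsv msgpack tsv
gen_it_commands bigquery bq.json nested avro jsonl msgpack
```

Each printed line matches the shape of a recipe line in the new Makefile, minus the `> /dev/null` redirection. The commit itself keeps the explicit list, which is easier to grep and to trim per dataset.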

README.md

Lines changed: 9 additions & 1 deletion
@@ -41,7 +41,7 @@ $ cat examples/record/primitives.jsonl
 {"boolean": false, "int": 1, "long": 1, "float": 1.1, "double": 1.1, "bytes": "foo", "string": "foo"}
 {"boolean": true, "int": 2, "long": 2, "float": 2.2, "double": 2.2, "bytes": "bar", "string": "bar"}
 
-$ ./columnify -schemaType avro -schemaFile examples/schema/primitives.avsc -recordType jsonl examples/record/primitives.jsonl > out.parquet
+$ ./columnify -schemaType avro -schemaFile examples/primitives.avsc -recordType jsonl examples/primitives.jsonl > out.parquet
 
 $ parquet-tools schema out.parquet
 message Primitives {
@@ -86,6 +86,14 @@ $ parquet-tools cat -json out.parquet
 - An example is `examples/fluent-plugin-s3`
 - It works as a Compressor of fluent-plugin-s3 write parquet file to tmp via chunk data.
 
+## Limitations
+
+Currently there are some limitations that depend on the schema and record types.
+
+- Some logical types, such as Decimal, are unsupported.
+- With `-recordType avro`, a nested record that has only one subfield is not supported.
+- With `-recordType avro`, bytes fields are implicitly converted to base64-encoded values.
+
 ## Development
 
 `Columnifier` reads input file(s), converts format based on given parameter, finally writes output files.
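
To make the base64 limitation in the README hunk concrete, the sketch below shows what the sample `bytes` values `foo` and `bar` look like once base64-encoded. It only illustrates the encoding; `to_base64` is a hypothetical helper, and the snippet makes no claim about how columnify performs the conversion internally or what its output looks like.

```shell
#!/bin/sh
# Hypothetical helper for illustration only; not part of columnify.
# Encodes its first argument with the system `base64` utility
# (assumed to be installed, e.g. from GNU coreutils or macOS).
to_base64() {
    printf '%s' "$1" | base64
}

to_base64 foo   # prints Zm9v
to_base64 bar   # prints YmFy
```

A consumer that expects the raw bytes from Avro input would need to decode such values after reading the Parquet output.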
