jsynowiec/json2jsonl

json2jsonl

A simple CLI tool for converting large JSON responses into JSON Lines (JSONL) format, which is more convenient for storing structured data that can be processed one record at a time.

Run Locally

git clone https://github.com/jsynowiec/json2jsonl && cd json2jsonl
uv run json2jsonl

Usage

json2jsonl [INPUT] [OPTIONS]

Arguments:
  INPUT    Path to input JSON file. Omit or use '-' for stdin.

Options:
  -o, --output PATH       Output file path. Omit or use '-' for stdout.
  --path JSONPATH         JSONPath selecting the element to operate on (default: '$')
  --extract JSONPATH      JSONPath (relative to --path) pointing to the array to extract.
                          Required when root element is an object and input is stdin.
  --no-parent-keys        Omit parent object fields when flattening nested arrays.
  --help                  Show help and exit.

Root Element

By default, the tool operates on the root element of the input JSON ($). Use --path with a JSONPath expression to select a subtree instead.

Extraction Element

By default, the tool extracts from the root element, which must be an array. If the root element is an object, the tool needs a JSONPath, relative to that object, pointing to a key that contains an array. When reading from a file, the array key is auto-detected; if the object has more than one array key, the tool prompts you to choose. When reading from stdin, --extract is required.
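
The auto-detection step can be pictured with a short Python sketch. This is an illustration only, not the tool's actual code: the helper name candidate_array_keys is hypothetical, and it only inspects top-level keys rather than evaluating a JSONPath.

```python
import json


def candidate_array_keys(root):
    """Return the keys of a JSON object whose values are arrays.

    Hypothetical helper: an empty result means there is nothing to
    auto-detect, one key can be used directly, and several keys mean
    the choice is ambiguous.
    """
    if not isinstance(root, dict):
        return []
    return [key for key, value in root.items() if isinstance(value, list)]


doc = json.loads('{"trace_id": "a3c2", "spans": [], "events": [], "severity": "WARN"}')
keys = candidate_array_keys(doc)
print(keys)  # ['spans', 'events']
```

With two candidates, as here, a file input would lead to a prompt, while stdin input would need --extract to name the array explicitly.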

Data Extractions

Array of X

If the root element is an array, each value is written to a separate line in the output.

Input:

[
  {
    "span_id": "b1d4f2a8-3c7e-4b1d-8a2f-9e0c6d4b2a1f",
    "operation": "db.query",
    "duration_ms": 342,
    "tags": {
      "db.type": "postgres",
      "db.statement": "SELECT * FROM orders WHERE user_id = ?"
    }
  },
  {
    "span_id": "c2e5a3b9-4d8f-5c2e-9b3g-0f1d7e5c3b2g",
    "operation": "http.request",
    "duration_ms": 118,
    "tags": {
      "http.method": "POST",
      "http.status_code": "503"
    }
  }
]

Output:

{"span_id": "b1d4f2a8-3c7e-4b1d-8a2f-9e0c6d4b2a1f", "operation": "db.query", "duration_ms": 342, "tags": {"db.type": "postgres", "db.statement": "SELECT * FROM orders WHERE user_id = ?"}}
{"span_id": "c2e5a3b9-4d8f-5c2e-9b3g-0f1d7e5c3b2g", "operation": "http.request", "duration_ms": 118, "tags": {"http.method": "POST", "http.status_code": "503"}}
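
For readers who want to see the transformation in code, here is a minimal Python sketch of the array case using only the standard library. It is not the tool's implementation; the function name array_to_jsonl and the use of ensure_ascii=False are assumptions made for illustration.

```python
import json


def array_to_jsonl(text):
    """Serialize each element of a top-level JSON array on its own line."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array at the root")
    # json.dumps never emits raw newlines, so each item stays on one line.
    return "".join(json.dumps(item, ensure_ascii=False) + "\n" for item in data)


sample = '[{"operation": "db.query", "duration_ms": 342}, {"operation": "http.request", "duration_ms": 118}]'
print(array_to_jsonl(sample), end="")
```

Note that this sketch loads the whole document into memory, which is fine for an illustration but may not reflect how the tool handles large inputs.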

Nested Objects

If the root element is an object and one of its keys holds an array of nested objects, the CLI performs a lateral flatten: each array item becomes one output line, with the parent object's other fields merged in. This is the default behavior; pass --no-parent-keys to omit the parent fields.

Input:

{
  "trace_id": "a3c2e1d4-7f6b-4a2e-9c8d-1b0f5e3a7c2d",
  "timestamp": "2024-11-15T14:32:07.341Z",
  "severity": "WARN",
  "spans": [
    {
      "span_id": "b1d4f2a8-3c7e-4b1d-8a2f-9e0c6d4b2a1f",
      "operation": "db.query",
      "duration_ms": 342,
      "tags": {
        "db.type": "postgres",
        "db.statement": "SELECT * FROM orders WHERE user_id = ?"
      }
    },
    {
      "span_id": "c2e5a3b9-4d8f-5c2e-9b3g-0f1d7e5c3b2g",
      "operation": "http.request",
      "duration_ms": 118,
      "tags": {
        "http.method": "POST",
        "http.status_code": "503"
      }
    }
  ]
}

Output:

{"trace_id": "a3c2e1d4-7f6b-4a2e-9c8d-1b0f5e3a7c2d", "timestamp": "2024-11-15T14:32:07.341Z", "severity": "WARN", "span_id": "b1d4f2a8-3c7e-4b1d-8a2f-9e0c6d4b2a1f", "operation": "db.query", "duration_ms": 342, "tags": {"db.type": "postgres", "db.statement": "SELECT * FROM orders WHERE user_id = ?"}}
{"trace_id": "a3c2e1d4-7f6b-4a2e-9c8d-1b0f5e3a7c2d", "timestamp": "2024-11-15T14:32:07.341Z", "severity": "WARN", "span_id": "c2e5a3b9-4d8f-5c2e-9b3g-0f1d7e5c3b2g", "operation": "http.request", "duration_ms": 118, "tags": {"http.method": "POST", "http.status_code": "503"}}
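
The lateral flatten can be sketched in a few lines of Python. This is an approximation under stated assumptions, not the tool's source: the function name flatten and the keep_parent parameter are hypothetical, the array is addressed by a plain key instead of a JSONPath, and every array item is assumed to be an object. On a key conflict the item value wins and a warning goes to stderr, matching the documented behavior.

```python
import sys


def flatten(parent, array_key, keep_parent=True):
    """Merge the parent's scalar fields into each item of parent[array_key]."""
    # Parent fields, excluding the array being flattened.
    base = {k: v for k, v in parent.items() if k != array_key} if keep_parent else {}
    rows = []
    for item in parent[array_key]:
        clashes = sorted(base.keys() & item.keys())
        if clashes:
            print(f"warning: item overrides parent key(s): {clashes}", file=sys.stderr)
        # Later keys win in a dict merge, so item values take precedence.
        rows.append({**base, **item})
    return rows


record = {
    "trace_id": "t1",
    "severity": "WARN",
    "spans": [{"span_id": "s1", "severity": "INFO"}, {"span_id": "s2"}],
}
rows = flatten(record, "spans")
bare = flatten(record, "spans", keep_parent=False)
```

Here rows carries trace_id and severity on every line (with the first item's own severity taking precedence), while bare corresponds to the --no-parent-keys output.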

Behavior Notes

  • Input files larger than 100MB require confirmation before processing.
  • If the output file already exists, confirmation is required before overwriting.
  • If a parent key conflicts with an array item key during flattening, the item value wins and a warning is printed to stderr.
  • Output is always UTF-8, no BOM, with \n line terminators.
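
The output encoding guarantee can be reproduced in Python as follows. This is a sketch of one way to get UTF-8 without a BOM and with \n terminators on every platform; the helper name write_jsonl is hypothetical and the tool may do this differently.

```python
import json
import os
import tempfile


def write_jsonl(path, rows):
    """Write rows as JSON Lines: UTF-8, no BOM, '\\n' line terminators."""
    # encoding="utf-8" writes no BOM (unlike "utf-8-sig");
    # newline="\n" stops Python from translating "\n" to "\r\n" on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        for row in rows:
            fh.write(json.dumps(row, ensure_ascii=False) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "out.jsonl")
    write_jsonl(out_path, [{"name": "zażółć"}, {"n": 1}])
    with open(out_path, "rb") as fh:
        raw = fh.read()
```

Reading the file back in binary mode, as above, is a simple way to confirm that no BOM or carriage returns were written.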
