pfc — DuckDB Extension for PFC-JSONL

You have compressed log archives on disk. To query them you normally decompress everything first — even if you only need one hour out of thirty days.

This extension changes that. Query .pfc files directly from DuckDB SQL. A block index tells the extension exactly which chunks of the file to decompress — the rest stays compressed.

Requires: The pfc_jsonl binary installed on your machine (Step 1 below). The extension calls it for decompression.

Platform: Linux x86_64 and macOS Apple Silicon (ARM64). No native Windows binary — Windows users must use WSL2 or a Linux machine.

INSTALL pfc FROM community;
LOAD pfc;
LOAD json;

SELECT
    line->>'$.level'   AS level,
    line->>'$.message' AS message
FROM read_pfc_jsonl('/var/log/events.pfc')
WHERE line->>'$.level' = 'ERROR';


What is PFC-JSONL?

PFC-JSONL is a high-performance compressed log format built for structured (JSONL) data. It achieves better compression than gzip and zstd on real log data while supporting random block access — meaning you can decompress only the time range you need.

Key properties:

  • Each file is split into independently compressible blocks
  • A .pfc.bidx binary index stores the byte offset and timestamp range of every block
  • The PFC binary can decompress any subset of blocks in a single call
  • Free for personal and open-source use — no account, no signup required

How It Works (Architecture)

┌──────────────────────────────────────────────────────────────┐
│ DuckDB                                                       │
│                                                              │
│  SELECT * FROM read_pfc_jsonl('events.pfc', ts_from=...)     │
│           │                                                  │
│  ┌────────▼──────────┐    reads     ┌─────────────────────┐  │
│  │  pfc extension    │─────────────▶│  events.pfc.bidx    │  │
│  │  (MIT, open src)  │  block index │  (block timestamps) │  │
│  └────────┬──────────┘              └─────────────────────┘  │
│           │ popen() / subprocess                              │
└───────────┼──────────────────────────────────────────────────┘
            │
            ▼
  ┌─────────────────────┐
  │  pfc_jsonl binary   │  ← proprietary, closed source
  │  (v3.4+, local)     │    contains BWT+rANS compression
  └─────────────────────┘
            │
            ▼
  decompressed JSON lines → back to DuckDB

The extension is a thin open-source wrapper — it reads the .bidx index in C++ to select which blocks are needed, then calls the PFC binary once to decompress only those blocks. The compression algorithm stays closed.
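The handoff can be sketched in Python. This is illustrative only: the real extension does the equivalent in C++ via popen(), and the `decompress` subcommand and `--blocks` flag below are hypothetical placeholders for whatever interface the pfc_jsonl binary actually exposes (check `pfc_jsonl --help`). Only the shape is real: select blocks from the index, then make one subprocess call for all of them.

```python
import os

def build_pfc_command(pfc_path, block_ids, binary=None):
    """Illustrative sketch: assemble the one-shot decompression call.

    The 'decompress' subcommand and '--blocks' flag are hypothetical
    placeholders, not the binary's documented interface. The design the
    sketch mirrors: one subprocess invocation covers every selected block.
    """
    if binary is None:
        # Same lookup order the extension documents: env override, then default.
        binary = os.environ.get("PFC_JSONL_BINARY", "/usr/local/bin/pfc_jsonl")
    return [binary, "decompress", pfc_path,
            "--blocks", ",".join(str(b) for b in block_ids)]
```

Running the resulting command (e.g. with `subprocess.run(cmd, capture_output=True, text=True)`) would stream the decompressed JSON lines back, which is what the extension feeds to DuckDB.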

Installation

Step 1 — Install the PFC binary (once per machine)

The extension calls the pfc_jsonl binary for decompression. Download the latest release for your platform:

Linux x64:

curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
     -o /usr/local/bin/pfc_jsonl
chmod +x /usr/local/bin/pfc_jsonl
pfc_jsonl --help      # verify install

macOS (Apple Silicon M1/M2/M3/M4):

curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \
     -o /usr/local/bin/pfc_jsonl
chmod +x /usr/local/bin/pfc_jsonl
pfc_jsonl --help      # verify install

macOS Intel (x64): Binary coming soon.

Custom path: Set PFC_JSONL_BINARY=/path/to/pfc_jsonl in your environment to override the default /usr/local/bin/pfc_jsonl.

Step 2 — Install the DuckDB extension

INSTALL pfc FROM community;
LOAD pfc;

Build from source (developers / early access)

git clone --recurse-submodules https://github.com/ImpossibleForge/pfc-duckdb
cd pfc-duckdb
GEN=ninja make release
# Extension at: build/release/extension/pfc/pfc.duckdb_extension

Usage

Basic query

LOAD pfc;

SELECT line FROM read_pfc_jsonl('/path/to/file.pfc');

Each row contains one raw JSON string in the line column. Use the DuckDB json extension to parse fields:

LOAD json;

SELECT
    line->>'$.timestamp' AS ts,
    line->>'$.level'     AS level,
    line->>'$.message'   AS message,
    line->>'$.service'   AS service
FROM read_pfc_jsonl('/path/to/file.pfc');

Timestamp-based block filtering

PFC files include a .pfc.bidx index with the timestamp range of each block. Pass ts_from and/or ts_to (Unix seconds) to skip entire blocks before decompression:

-- Only decompress blocks that overlap the given time window
SELECT line
FROM read_pfc_jsonl(
    '/path/to/file.pfc',
    ts_from = 1735689600,   -- 2025-01-01 00:00:00 UTC
    ts_to   = 1735775999    -- 2025-01-01 23:59:59 UTC
);

Convert a timestamp string to Unix seconds with epoch():

SELECT line
FROM read_pfc_jsonl(
    '/path/to/file.pfc',
    ts_from = epoch(TIMESTAMPTZ '2026-01-01 00:00:00+00'),
    ts_to   = epoch(TIMESTAMPTZ '2026-01-02 00:00:00+00')
);
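As a sanity check on those Unix-second values, the same conversion in plain Python (standard library only):

```python
from datetime import datetime

def to_unix_seconds(ts: str) -> int:
    """Convert an ISO-8601 timestamp with offset to Unix seconds,
    matching what DuckDB's epoch(TIMESTAMPTZ ...) returns."""
    return int(datetime.fromisoformat(ts).timestamp())

to_unix_seconds("2026-01-01 00:00:00+00:00")  # -> 1767225600
to_unix_seconds("2026-01-02 00:00:00+00:00")  # -> 1767312000
```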

Combining block filter and row filter

ts_from/ts_to skip entire blocks (coarse, fast). Add a WHERE clause for row-level precision:

LOAD json;

SELECT line->>'$.message' AS msg
FROM read_pfc_jsonl(
    '/var/log/api.pfc',
    ts_from = epoch(TIMESTAMPTZ '2026-03-15 08:00:00+00'),
    ts_to   = epoch(TIMESTAMPTZ '2026-03-15 10:00:00+00')
)
WHERE line->>'$.level' = 'ERROR';

Analytics examples

LOAD json;

-- Error rate per hour
SELECT
    strftime(to_timestamp((line->>'$.ts')::BIGINT), '%Y-%m-%d %H:00') AS hour,
    count(*) FILTER (WHERE line->>'$.level' = 'ERROR') AS errors,
    count(*)                                            AS total
FROM read_pfc_jsonl('/var/log/api.pfc')
GROUP BY hour ORDER BY hour;

-- Top 10 slowest endpoints
SELECT
    line->>'$.path'                              AS endpoint,
    avg((line->>'$.duration_ms')::DOUBLE)        AS avg_ms,
    count(*)                                     AS requests
FROM read_pfc_jsonl('/var/log/api.pfc')
GROUP BY endpoint ORDER BY avg_ms DESC LIMIT 10;

API Reference

read_pfc_jsonl(path [, ts_from, ts_to])

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `path` | VARCHAR | (required) | Path to the `.pfc` file. A `.pfc.bidx` index must exist at `path + ".bidx"`. |
| `ts_from` | BIGINT | 0 | Lower bound for block selection (Unix seconds). 0 = no lower bound. |
| `ts_to` | BIGINT | 0 | Upper bound for block selection (Unix seconds). 0 = no upper bound. |

Returns: table with one column line VARCHAR — one row per decompressed JSON line.

Block filtering semantics: A block is included if its timestamp range [ts_start, ts_end] overlaps [ts_from, ts_to]. Blocks with unknown timestamps are always included. If both ts_from and ts_to are 0, all blocks are read.
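Those semantics are compact enough to state in code. A minimal Python sketch of the selection predicate (the real implementation lives in the extension's C++; the use of `None` for unknown timestamps is this sketch's convention, not the `.bidx` format):

```python
def block_selected(ts_start, ts_end, ts_from=0, ts_to=0):
    """Return True if a block should be decompressed.

    ts_start/ts_end: the block's timestamp range (None if unknown).
    ts_from/ts_to:   query bounds in Unix seconds; 0 means unbounded.
    """
    if ts_start is None or ts_end is None:
        return True   # blocks with unknown timestamps are always included
    if ts_from and ts_end < ts_from:
        return False  # block ends before the window opens
    if ts_to and ts_start > ts_to:
        return False  # block starts after the window closes
    return True       # ranges overlap (bounds are inclusive)

block_selected(100, 200)               # -> True  (both bounds 0: read all)
block_selected(100, 200, ts_from=300)  # -> False (block entirely too early)
```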

File Requirements

| File | Required | Description |
|------|----------|-------------|
| `file.pfc` | yes | Compressed PFC-JSONL file |
| `file.pfc.bidx` | yes | Binary block index (requires PFC-JSONL v3.4+) |

Generate both with the PFC binary:

pfc_jsonl compress input.jsonl output.pfc
# Produces: output.pfc  +  output.pfc.bidx

Note: The Docker image on Docker Hub (impossibleforge/pfc-jsonl) is a server-side compression tool. It is not required for using the DuckDB extension — you only need the standalone pfc_jsonl binary from GitHub Releases.

Performance

Block-level filtering can skip the majority of a file. Example: 30-day log file, 720 hourly blocks — a 1-hour query reads 1 block instead of 720.

| Query range | Blocks read | Speedup (720-block file) |
|-------------|-------------|--------------------------|
| 30 days | 720/720 | 1× |
| 1 day | ~24/720 | ~30× |
| 1 hour | ~1/720 | ~720× |
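The speedup column is simply the ratio of total blocks to blocks actually decompressed, under the assumption that decompression dominates query time:

```python
def speedup(total_blocks, blocks_read):
    """Estimated speedup from block skipping, assuming query cost
    scales linearly with the number of blocks decompressed."""
    return total_blocks / blocks_read

# 30-day file of 720 hourly blocks:
speedup(720, 720)  # -> 1.0   (full scan)
speedup(720, 24)   # -> 30.0  (one day)
speedup(720, 1)    # -> 720.0 (one hour)
```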


Troubleshooting

Cannot open index file: /path/to/file.pfc.bidx
The .pfc.bidx index is missing. Compress with PFC-JSONL v3.4+:

pfc_jsonl compress input.jsonl output.pfc

PFC binary not found at '/usr/local/bin/pfc_jsonl'
The binary is missing or not executable. Re-run the curl install command, or set PFC_JSONL_BINARY=/path/to/pfc_jsonl.

popen() failed — could not start PFC binary subprocess
The extension uses popen() to call the PFC binary. Windows is not supported; use WSL2 or a Linux machine.

ts_from (...) must be <= ts_to (...)
You passed an inverted time range. Swap the values so ts_from comes before ts_to.

Related Projects

| Project | Description |
|---------|-------------|
| pfc-jsonl | The core binary: compress, decompress, query |
| pfc-fluentbit | Stream Fluent Bit logs directly to .pfc archives |
| pfc-migrate | Convert existing gzip/zstd/lz4 archives to PFC (local, S3, Azure, GCS) |
| pfc-jsonl (PyPI) | Python package: pip install pfc-jsonl |

License

The pfc DuckDB extension (this repository) is released under the MIT License — see LICENSE.

The PFC-JSONL binary (pfc_jsonl) is proprietary software, free for personal and open-source use with no account, no signup, and no phone-home. Commercial use requires a license: info@impossibleforge.com

About

DuckDB extension to read PFC-JSONL compressed log files with block-level timestamp filtering
