DuckDB JSON Tools Extension


A DuckDB extension for high-performance JSON normalization and manipulation.

This extension provides a set of utility functions to work with JSON data, focusing on performance-critical operations that are not covered by DuckDB's core JSON functionality.

WARNING: This extension is maintained on a best-effort basis by a developer who doesn’t write C++ professionally, so expect rough edges. It hasn’t been hardened for production, and you should validate it in your own environment before relying on it. Feedback and contributions are welcome via GitHub.

Core Features

  • json_flatten(json[, separator]): Recursively flattens nested JSON objects and arrays into a single-level object with path keys (default separator: .).
  • json_add_prefix(json, text): Adds a string prefix to every top-level key in a JSON object.
  • json_extract_columns(json, columns[, separator]): Pulls selected root keys into a struct of VARCHAR fields using regex patterns.
  • json_group_merge(json [, treat_null_values] [ORDER BY ...]): Streams JSON patches with RFC 7396 merge semantics without materializing intermediate lists.

Quick Start

-- Build from source first (see Installation section)
LOAD './build/release/extension/json_tools/json_tools.duckdb_extension';

-- Flatten nested JSON
SELECT json_flatten('{"user": {"name": "Alice", "age": 30}}');
-- Result: {"user.name":"Alice","user.age":30}

-- Add prefix to keys
SELECT json_add_prefix('{"a": 1, "b": 2}', 'prefix.');
-- Result: {"prefix.a":1,"prefix.b":2}

Installation

Note: This extension is not yet available in the DuckDB community repository. Once published, it will be installable via INSTALL json_tools FROM community;

Current Installation (Development)

This extension must currently be built from source. See the Development section below for build instructions.

Once built, load it in DuckDB:

CLI:

duckdb -unsigned
LOAD './build/release/extension/json_tools/json_tools.duckdb_extension';

Python:

import duckdb
con = duckdb.connect(':memory:', config={'allow_unsigned_extensions': 'true'})
con.execute("LOAD './build/release/extension/json_tools/json_tools.duckdb_extension'")

Usage

json_flatten(json[, separator]) -> json

Rewrites nested JSON structures into a flat object with “path” keys. By default, path segments are joined with a dot (.).

You can pass an optional separator (a 1-character constant VARCHAR) to reduce the risk of ambiguous paths when your input keys contain dots:

SELECT json_flatten('{"a.b": {"c": 1}}', '/');

Result:

{"a.b/c":1}

No escaping is performed: if your input keys contain the chosen separator, the output can be ambiguous and key collisions are possible. Behavior on collisions is not specified.
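
For instance (a sketch of the ambiguity; the actual behavior is unspecified), two different paths can collide on the same flattened key:

SELECT json_flatten('{"a.b": 1, "a": {"b": 2}}');
-- Both the literal key "a.b" and the path a -> b flatten to "a.b";
-- which value survives is not specified.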

Example:

SELECT json_flatten('{"outer": {"inner": [1, 2]}}');

Result:

{"outer.inner.0":1,"outer.inner.1":2}

It preserves UTF-8 characters in both keys and values:

SELECT json_flatten('{"日本語": "こんにちは", "emoji": "🚀"}');

Result:

{"日本語":"こんにちは","emoji":"🚀"}

Empty objects ({}) and arrays ([]) are omitted from the flattened output as they contain no leaf values:

SELECT json_flatten('{"data": [], "meta": {}, "count": 5}');

Result:

{"count":5}

json_add_prefix(json, prefix) -> json

Adds a given prefix to all top-level keys in a JSON object. This is useful for avoiding key collisions when merging data from different sources.

Example:

SELECT json_add_prefix('{"a": 1, "b": 2}', 'prefix.');

Result:

{"prefix.a":1,"prefix.b":2}

The prefix is added even to keys whose values are nested objects:

SELECT json_add_prefix('{"user": {"name": "Alice"}, "count": 5}', 'data_');

Result:

{"data_user":{"name":"Alice"},"data_count":5}

Note: This function requires the input to be a JSON object. It will raise an error if given a JSON array or primitive value.
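
As a sketch of the collision-avoidance use case above, combining this function with json_group_merge (described below) lets patches from two sources merge without clashing keys:

SELECT json_group_merge(j ORDER BY src) AS merged
FROM (VALUES
    ('a', json_add_prefix('{"id": 1}', 'a_')),
    ('b', json_add_prefix('{"id": 2}', 'b_'))
) AS t(src, j);
-- Expected: {"a_id":1,"b_id":2}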

json_extract_columns(json, columns[, separator]) -> struct

Extracts selected root-level fields into a struct of VARCHAR columns. The first argument must be a JSON object value (not an array or primitive). columns must be a constant JSON object mapping output column names to RE2 regex patterns evaluated against each top-level key (partial matches by default; add anchors to tighten). Patterns are case-sensitive unless you supply inline flags such as (?i). Output columns follow the mapping order.

separator defaults to '' and is inserted between multiple matches for the same column in the order keys appear in the input object. It can be empty but cannot be NULL (even when the JSON input is NULL). Columns with no matches return NULL.

Values are stringified: strings pass through unquoted; arrays, objects, numbers, booleans, and null become their JSON text.

Examples:

SELECT (json_extract_columns('{"id": 5, "name": "duck"}',
                             '{"id":"^id$","name":"^name$"}', ',')).id AS id;
-- Result: 5

SELECT (json_extract_columns('{"a":1,"a2":2,"b":3}',
                             '{"a":"^a","b":"^b$"}', '|')).a AS a_values;
-- Result: 1|2

SELECT (json_extract_columns('{"Key": "Value"}',
                             '{"k":"(?i)^key$"}', ',')).k AS case_insensitive;
-- Result: Value

SELECT (json_extract_columns('{"x":"a","xx":"b"}',
                             '{"col":"x"}')).col AS default_separator;
-- Result: ab
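
Two more sketches based on the rules above: a nested value should come back as its JSON text, and a pattern that matches no key should yield NULL:

SELECT (json_extract_columns('{"meta": {"a": 1}}',
                             '{"meta":"^meta$"}')).meta AS meta_text;
-- Expected: the JSON text of the nested object, e.g. {"a":1}

SELECT (json_extract_columns('{"a": 1}',
                             '{"missing":"^zzz$"}')).missing IS NULL AS no_match;
-- Expected: true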

json_group_merge(json_expr [, treat_null_values] [ORDER BY ...]) -> json

Applies a sequence of JSON patches using RFC 7396 merge semantics. Inputs can be JSON values or VARCHAR text that parses as JSON. SQL NULL rows are skipped, and the aggregate returns '{}'::json when no non-null inputs are provided.

Provide an ORDER BY clause to guarantee deterministic results—later rows in the ordered stream overwrite earlier keys, arrays replace wholesale, and null removes keys. Pass the optional treat_null_values argument to override how object members set to JSON null are handled:

  • DELETE NULLS (default) — object members set to null delete the key.
  • IGNORE NULLS — null members are dropped before merging, so existing keys stay untouched. Null members in the first patch are also skipped (keys are never seeded with null values).

The treat_null_values argument must be a constant VARCHAR literal (case-insensitive, surrounding whitespace ignored).
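
For contrast with the IGNORE NULLS example further below, a sketch of the default DELETE NULLS mode, where a later null member removes the key:

SELECT json_group_merge(patch ORDER BY ts) AS merged
FROM (VALUES ('{"keep":1,"drop":2}'::json, 1), ('{"drop":null}'::json, 2)) AS t(patch, ts);

Expected result:

{"keep":1}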

Grouped aggregation:

SELECT session_id,
       json_group_merge(patch ORDER BY event_ts) AS state
FROM session_events
GROUP BY session_id
ORDER BY session_id;

Window accumulation:

SELECT customer_id,
       event_ts,
       json_group_merge(patch ORDER BY event_ts)
           OVER (PARTITION BY customer_id
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS current_state
FROM customer_patch_stream;

Use descending order when you want the newest patch first:

SELECT json_group_merge(patch ORDER BY version DESC) AS latest
FROM config_history
WHERE feature = 'search';

Toggle the null handling mode when sparse streams should ignore deletes:

SELECT json_group_merge(patch, 'IGNORE NULLS' ORDER BY ts)
FROM (VALUES ('{"keep":1}'::json, 1), ('{"keep":null}'::json, 2)) AS t(patch, ts);

Result:

{"keep":1}

Error Handling

  • json_flatten() returns an error for malformed JSON
  • json_add_prefix() requires a JSON object (not array or primitive value)
  • json_extract_columns() requires a JSON object input and a constant JSON object of string regex patterns; it raises on invalid regexes, NULL separators, non-string object keys, or mismatched input shapes
  • json_group_merge() surfaces DuckDB JSON parse errors for invalid text and raises on merge buffers that exceed DuckDB limits
  • Maximum nesting depth: 1000 levels
  • Empty objects ({}) and arrays ([]) are omitted from flattened output

Development

Building from Source

This project uses VCPKG for dependency management.

  1. Set up VCPKG:

    git clone https://github.com/Microsoft/vcpkg.git
    ./vcpkg/bootstrap-vcpkg.sh
    export VCPKG_TOOLCHAIN_PATH=`pwd`/vcpkg/scripts/buildsystems/vcpkg.cmake

    (You may want to add the export line to your shell's profile, e.g., ~/.bashrc or ~/.zshrc)

  2. Build the extension:

    make

This will create the following binaries in the ./build/release directory:

  • duckdb: A shell with the extension pre-loaded.
  • test/unittest: The test runner.
  • extension/json_tools/json_tools.duckdb_extension: The distributable extension binary.
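
As a quick smoke test (a sketch; paths assume the default build layout above), start the pre-loaded shell and run one of the functions:

./build/release/duckdb
SELECT json_flatten('{"a": {"b": 1}}');
-- Expected: {"a.b":1}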

Running Tests

To run the SQL tests:

make test

Running Benchmarks

Performance benchmarks help catch regressions. Run the full suite:

make release && uv run python bench/bench.py

See bench/README.md and bench/PROFILING.md for filtering, profiling, and baseline management.

Contributing

Contributions are welcome! Please:

  1. Open an issue first to discuss proposed changes
  2. Add tests for new functionality in test/sql/
  3. Run make test before submitting a pull request

See GitHub Issues for current tasks and feature requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.


This extension was originally based on the DuckDB Extension Template.
