feat: json2 experiment #7750

Draft
MichaelScofield wants to merge 1 commit into main from json2-experiment

@MichaelScofield
Collaborator

RUN CI ONLY, NO MERGE

@github-actions github-actions bot added size/M docs-not-required This change does not impact docs. labels Mar 4, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces experimental support for a new JSON format, Json2, aiming to provide more structured and type-aware handling of JSON data. The changes span multiple modules: they define the new format and its associated vector builder, and they update SQL parsing, data type conversion, and schema management to correctly interpret and process Json2 types. This foundational work sets the stage for advanced JSON capabilities.

Highlights

  • New JSON Format (Json2) Introduction: Introduced a new JsonFormat::Json2 variant to support a more structured JSON data type, enabling specific handling and processing for this format throughout the system.
  • Json2 Vector Builder Implementation: Implemented a dedicated Json2VectorBuilder to manage and align Json2 data, including merging JSON types and ensuring values conform to the expected schema.
  • SQL Parsing and Schema Alignment for Json2: Updated SQL statement parsing to recognize 'JSON2' as a custom data type and modified schema alignment logic to dynamically adjust for JSON extension types in record batches.
  • Enhanced JSON Type Merging: Modified the JSON type merging function to include a lift parameter, allowing for more flexible handling of type conflicts by promoting conflicting types to a string when lift is true.
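To make the `lift` behavior in the last highlight concrete, here is a minimal sketch of a type merge that promotes conflicts to a string. It is not the code from this PR: `JsonNativeType` is a simplified stand-in with hypothetical variants, and returning an error when `lift` is false is an assumption, since the summary only states what happens when `lift` is true.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for the PR's native JSON type.
#[derive(Debug, Clone, PartialEq)]
enum JsonNativeType {
    Null,
    Bool,
    Number,
    String,
    Object(BTreeMap<String, JsonNativeType>),
}

/// Merge two inferred types into one.
/// On a conflict, fall back to `String` when `lift` is true.
/// Assumption: when `lift` is false the conflict is reported as an error.
fn merge(
    a: &JsonNativeType,
    b: &JsonNativeType,
    lift: bool,
) -> Result<JsonNativeType, String> {
    use JsonNativeType as T;
    match (a, b) {
        // Identical types merge to themselves.
        (x, y) if x == y => Ok(x.clone()),
        // Null carries no type information, so the other side wins.
        (T::Null, other) | (other, T::Null) => Ok(other.clone()),
        // Objects merge field by field; fields seen on one side only are kept.
        (T::Object(l), T::Object(r)) => {
            let mut merged = l.clone();
            for (key, r_ty) in r {
                let ty = match l.get(key) {
                    Some(l_ty) => merge(l_ty, r_ty, lift)?,
                    None => r_ty.clone(),
                };
                merged.insert(key.clone(), ty);
            }
            Ok(T::Object(merged))
        }
        // Any other combination is a conflict.
        _ if lift => Ok(T::String),
        _ => Err(format!("conflicting JSON types: {a:?} vs {b:?}")),
    }
}
```

With this sketch, merging `Number` with `Bool` yields `String` when `lift` is true and an error otherwise, and the same rule applies recursively to object fields.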
Changelog
  • src/api/src/helper.rs
    • Updated ConcreteDataType::json_native_datatype to initialize JsonType with JsonFormat::Json2 when no specific type is provided.
    • Added a case to handle JsonFormat::Json2 when converting ConcreteDataType to ColumnDataTypeWrapper.
  • src/common/sql/src/convert.rs
    • Modified parse_string_to_value to include JsonFormat::Json2 in the matching for JSON parsing, alongside JsonFormat::Native.
  • src/datatypes/src/types/json_type.rs
    • Imported arrow_schema::Fields and Json2VectorBuilder.
    • Added Json2 variant to the JsonFormat enum.
    • Left JsonFormat::Json2 as unimplemented!() in the native_type() and as_struct_type() methods.
    • Modified the merge function signature to accept a lift boolean parameter and updated its internal calls.
    • Introduced a fallback to JsonNativeType::String in merge when type conflicts occur and lift is true.
    • Added string representation 'JSON2' for JsonFormat::Json2 in display_type_name.
    • Specified ArrowDataType::Struct(Fields::empty()) for JsonFormat::Json2 in as_arrow_type.
    • Instantiated Json2VectorBuilder for JsonFormat::Json2 in create_mutable_vector.
  • src/datatypes/src/vectors/json.rs
    • Added a new public module builder2.
  • src/datatypes/src/vectors/json/builder2.rs
    • Added a new file builder2.rs containing the Json2VectorBuilder struct and its implementation.
    • Implemented MutableVector trait for Json2VectorBuilder, including methods for data type, length, vector conversion, value pushing, and null handling.
    • Introduced align_json_value_with_type function to align JSON values with an expected JsonType.
  • src/mito2/src/memtable/bulk/part.rs
    • Imported is_json_extension_type for JSON type checking.
    • Added a public parts() method to UnorderedPart to access internal bulk parts.
    • Modified BulkPartConverter::build_record_batch to use a new align_schema_with_json_array function for schema creation.
    • Added align_schema_with_json_array function to dynamically adjust the schema of JSON columns based on the actual array data types.
  • src/mito2/src/memtable/time_series.rs
    • Updated field.push(field_value) to include error handling with unwrap_or_else.
  • src/operator/src/req_convert/insert/stmt_to_region.rs
    • Refined the JsonColumnTypeUpdater logic to specifically check if a ConcreteDataType::Json is a native type when identifying column types.
  • src/sql/src/statements.rs
    • Modified column_to_schema to detect 'JSON2' as a custom SQL data type and process it similarly to 'JSON'.
    • Updated sql_data_type_to_concrete_data_type to recognize 'JSON2' as a custom type and convert it to ConcreteDataType::Json(JsonType::new(JsonFormat::Json2)).
  • src/sql/src/statements/create.rs
    • Improved build_json_structure_settings to handle the JSON_OPT_FORMAT option more robustly, defaulting to JSON_FORMAT_FULL_STRUCTURED and returning an error for unknown formats.
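The changelog mentions align_json_value_with_type without describing its rules, so the following is only a rough sketch of what aligning a value with an expected type could look like. Everything here is an assumption for illustration: `JsonValue`, `JsonType`, and `align` are hypothetical simplified names, and the specific rules (missing fields become null, scalars are rendered as strings when a string is expected, unknown fields are rejected) are guesses, not the PR's behavior.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified JSON value (not the PR's real type).
#[derive(Debug, Clone, PartialEq)]
enum JsonValue {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Object(BTreeMap<String, JsonValue>),
}

/// Hypothetical, simplified expected type.
#[derive(Debug, Clone, PartialEq)]
enum JsonType {
    Bool,
    Number,
    String,
    Object(BTreeMap<String, JsonType>),
}

/// Reshape `value` so that it conforms to `expected` (assumed rules only).
fn align(value: JsonValue, expected: &JsonType) -> Result<JsonValue, String> {
    match (value, expected) {
        // Null is accepted for every expected type.
        (JsonValue::Null, _) => Ok(JsonValue::Null),
        // Values that already match pass through unchanged.
        (v @ JsonValue::Bool(_), JsonType::Bool) => Ok(v),
        (v @ JsonValue::Number(_), JsonType::Number) => Ok(v),
        (v @ JsonValue::String(_), JsonType::String) => Ok(v),
        // Scalars are rendered as text when a string is expected.
        (JsonValue::Bool(b), JsonType::String) => Ok(JsonValue::String(b.to_string())),
        (JsonValue::Number(n), JsonType::String) => Ok(JsonValue::String(n.to_string())),
        // Objects are aligned field by field; missing fields become null.
        (JsonValue::Object(mut fields), JsonType::Object(expected_fields)) => {
            let mut aligned = BTreeMap::new();
            for (name, ty) in expected_fields {
                let field = fields.remove(name).unwrap_or(JsonValue::Null);
                aligned.insert(name.clone(), align(field, ty)?);
            }
            // Anything left over is not part of the expected type.
            if let Some(extra) = fields.keys().next() {
                return Err(format!("unexpected field {extra:?}"));
            }
            Ok(JsonValue::Object(aligned))
        }
        (v, ty) => Err(format!("cannot align {v:?} with {ty:?}")),
    }
}
```

Under these assumed rules, every aligned object ends up with exactly the fields of the expected type, which is the property a columnar builder would need before appending rows to a struct array.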
Activity
  • The pull request is marked as 'RUN CI ONLY, NO MERGE', indicating it is an experimental or work-in-progress change intended for continuous integration testing rather than immediate merging.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an experimental Json2 data type. The changes are extensive, touching SQL parsing, data type definitions, and vector builders to support this new type. A key feature is the Json2VectorBuilder which performs on-the-fly schema inference for JSON data. My review has identified several places where unimplemented! or panic! macros are used, which could lead to runtime crashes. I've provided suggestions to improve error handling and make the implementation more robust.
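As an illustration of the kind of change suggested here, the sketch below returns a typed error for an unsupported format instead of calling `unimplemented!()`. It is a simplified assumption, not the PR's code: `UnsupportedFormat` and `native_type_name` are hypothetical names, and `JsonFormat` is reduced to the two variants named in the summary.

```rust
use std::fmt;

/// Simplified stand-in for the format enum; only the variants named in
/// the summary are included.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JsonFormat {
    Native,
    Json2,
}

/// Hypothetical error type describing an operation a format does not support.
#[derive(Debug, PartialEq)]
struct UnsupportedFormat {
    format: JsonFormat,
    operation: &'static str,
}

impl fmt::Display for UnsupportedFormat {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} is not supported for {:?}", self.operation, self.format)
    }
}

impl std::error::Error for UnsupportedFormat {}

/// Hypothetical accessor: instead of `unimplemented!()` for `Json2`,
/// the unsupported case is surfaced to the caller as an `Err`.
fn native_type_name(format: JsonFormat) -> Result<&'static str, UnsupportedFormat> {
    match format {
        JsonFormat::Native => Ok("native"),
        JsonFormat::Json2 => Err(UnsupportedFormat {
            format,
            operation: "native_type",
        }),
    }
}
```

A caller can then propagate the error with `?` or map it into the crate's own error type, so an unexpected Json2 column produces a failed request rather than a crashed process.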

Note: Security Review did not run due to the size of the PR.

Comment thread src/datatypes/src/types/json_type.rs Outdated
Comment thread src/datatypes/src/types/json_type.rs Outdated
Comment thread src/datatypes/src/vectors/json/builder2.rs Outdated
Comment thread src/mito2/src/memtable/time_series.rs
@MichaelScofield MichaelScofield force-pushed the json2-experiment branch 2 times, most recently from 97c3547 to 8debc99 Compare March 5, 2026 10:04
@github-actions github-actions bot added size/L and removed size/M labels Mar 5, 2026
@MichaelScofield MichaelScofield force-pushed the json2-experiment branch 3 times, most recently from 2b760c1 to 95e5678 Compare March 5, 2026 11:38
@github-actions github-actions bot added size/M and removed size/L labels Mar 6, 2026
@MichaelScofield MichaelScofield force-pushed the json2-experiment branch 2 times, most recently from a27c4f6 to 66677b6 Compare March 9, 2026 09:19
@github-actions github-actions bot added size/L and removed size/M labels Mar 9, 2026
@MichaelScofield MichaelScofield force-pushed the json2-experiment branch 3 times, most recently from 177b186 to 042c3c7 Compare March 18, 2026 06:08
@MichaelScofield MichaelScofield force-pushed the json2-experiment branch 3 times, most recently from a54224d to 391e4ef Compare April 1, 2026 03:14
@github-actions github-actions bot added size/XL and removed size/L labels Apr 2, 2026
@github-actions github-actions bot added size/L and removed size/XL labels Apr 2, 2026
@github-actions github-actions bot removed the size/L label Apr 3, 2026
@github-actions github-actions bot added size/L and removed size/XL labels Apr 3, 2026
@MichaelScofield MichaelScofield force-pushed the json2-experiment branch 7 times, most recently from 0679fb0 to d4ee21d Compare April 10, 2026 09:45
insert
flush
(?) query-driven and data-driven concretize
(?) select
(?) compaction

Signed-off-by: luofucong <luofc@foxmail.com>
Labels

docs-not-required This change does not impact docs. size/XL
