feat: changes to ingestion flow - added docling chunker. #1113

Open
ricofurtado wants to merge 1 commit into main from new-ingestion-flow-for-docling-based-chunkers-support

ricofurtado (Collaborator) commented Mar 12, 2026

Changes the default ingestion flow so that it can support docling-based chunking.


github-actions bot added the enhancement 🔵 New feature or request label Mar 12, 2026
ricofurtado requested a review from mpawlow March 12, 2026 16:18
github-actions bot added enhancement 🔵 New feature or request and removed enhancement 🔵 New feature or request labels Mar 13, 2026

mpawlow (Collaborator) left a comment

@ricofurtado

Code Review 2

  • See PR comments: (2a) to (2e)

@@ -1,93 +1,6 @@
{

(2a) [Blocker] flows/ingestion_flow.json is invalid JSON

  • Confirmed by Python JSON parser:
    json.decoder.JSONDecodeError: Expecting ',' delimiter: line 2925 column 41 (char 255868)
  • The outputs array inside the OpenSearchVectorStoreComponentMultimodalMultiEmbedding node's template is unclosed
    • Fields from the ingest_data input template were spliced in, corrupting the structure. Langflow will fail to import the flow entirely.
  • Possible root cause: merge commit 9475c56 appears to have introduced a broken merge of the flow JSON
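
A check along these lines would catch this before review. This is only a minimal sketch using the standard-library json module; the helper name check_flow_json is hypothetical and not part of the repository.

```python
import json
from typing import Optional


def check_flow_json(text: str) -> Optional[str]:
    """Return None if `text` parses as JSON, otherwise the error location.

    Hypothetical helper for illustration; not an existing function in this repo.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # msg, lineno, colno and pos are documented JSONDecodeError attributes.
        return f"{exc.msg}: line {exc.lineno} column {exc.colno} (char {exc.pos})"
    return None


# Possible usage against the file named in this comment:
#   with open("flows/ingestion_flow.json", encoding="utf-8") as fh:
#       error = check_flow_json(fh.read())
#   if error:
#       raise SystemExit(f"flows/ingestion_flow.json is invalid JSON: {error}")
```

Running something like this in CI or a pre-commit hook would keep a broken merge of the flow file from reaching Langflow.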

(2b) [Major] flows_service.py hardcodes "Split Text" display name

  • The PR removes the SplitText node from the flow, but flows_service.py still references it by display name "Split Text".
  • Calls to update_ingest_flow_chunk_size() and update_ingest_flow_chunk_overlap() via the Settings API will silently do nothing since the node lookup will return no match.
  • Affected lines: 911–916 and 918–927
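
One way to avoid the silent no-op is to make the lookup fail loudly. The sketch below assumes the usual Langflow export layout (data.nodes[].data.node.display_name); the function and exception names are hypothetical and do not reflect the current code in flows_service.py.

```python
from typing import Any, Dict


class FlowNodeNotFound(LookupError):
    """Raised when no node in the flow has the requested display name."""


def find_node_by_display_name(flow: Dict[str, Any], display_name: str) -> Dict[str, Any]:
    # Assumed layout: flow["data"]["nodes"][i]["data"]["node"]["display_name"].
    for node in flow.get("data", {}).get("nodes", []):
        if node.get("data", {}).get("node", {}).get("display_name") == display_name:
            return node
    # Raise instead of returning None so callers cannot silently skip the update.
    raise FlowNodeNotFound(f"no node with display name {display_name!r} in flow")
```

With this shape, an update for "Split Text" against the new flow would surface as an error from the Settings API rather than appearing to succeed.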

(2c) [Major] langflow_file_service.py hardcodes "SplitText-QIKhg"

  • The removed node SplitText-QIKhg is still referenced in langflow_file_service.py for per-run tweaks.
  • Any per-ingestion chunkSize, chunkOverlap, or separator settings will be sent to a non-existent node and silently dropped by Langflow.
  • Affected lines: 292–309
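
A guard like the following could make dropped tweaks visible before the run is submitted. It is a sketch only: it assumes tweaks are a mapping from node ID to field overrides and that node IDs live at data.nodes[].id in the flow JSON; partition_tweaks is a hypothetical name.

```python
from typing import Any, Dict, List, Tuple


def partition_tweaks(
    flow: Dict[str, Any], tweaks: Dict[str, Dict[str, Any]]
) -> Tuple[Dict[str, Dict[str, Any]], List[str]]:
    """Split tweaks into those targeting existing nodes and the IDs that match nothing."""
    # Assumed layout: each node in flow["data"]["nodes"] carries its ID under "id".
    node_ids = {node.get("id") for node in flow.get("data", {}).get("nodes", [])}
    applied = {node_id: fields for node_id, fields in tweaks.items() if node_id in node_ids}
    dropped = sorted(node_id for node_id in tweaks if node_id not in node_ids)
    return applied, dropped
```

The caller can then log or reject a non-empty dropped list instead of sending settings to a node that no longer exists.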

(2d) [Major] api/langflow_files.py hardcodes "SplitText-PC36h"

  • A second stale SplitText node ID (SplitText-PC36h) is referenced in the file upload endpoint for tweaks.
  • Same silent-drop behavior as Issue (2c)
  • Affected lines: 96–104
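
Since two different stale IDs turned up in two files, a small scan may help find any others. This sketch assumes node IDs follow the "<Component>-<5 alphanumeric characters>" shape seen in SplitText-QIKhg and SplitText-PC36h; the function names are hypothetical.

```python
import re
from typing import Iterable, Set


def find_hardcoded_node_ids(source: str, component: str = "SplitText") -> Set[str]:
    """Collect IDs such as 'SplitText-PC36h' that appear in a source string."""
    # Assumption: the suffix is exactly five alphanumeric characters.
    pattern = re.compile(rf"\b{re.escape(component)}-[A-Za-z0-9]{{5}}\b")
    return set(pattern.findall(source))


def stale_node_ids(source: str, live_ids: Iterable[str], component: str = "SplitText") -> Set[str]:
    """Return hardcoded IDs in `source` that are absent from the flow's live node IDs."""
    return find_hardcoded_node_ids(source, component) - set(live_ids)
```

Feeding each Python source file and the node IDs from the flow JSON through stale_node_ids would list every reference that needs updating.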

(2e) [Major] api/settings.py reads defaults from "SplitText-QIKhg"

  • GET /api/settings parses the live Langflow flow to populate chunkSize, chunkOverlap, and separator defaults.
  • Since SplitText-QIKhg is gone, this code will never match and the defaults will always fall back to YAML-config values instead of live flow values.
  • Affected lines: 277–290
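
To make the fallback explicit rather than accidental, the lookup could report where each set of defaults came from. The sketch below assumes field values sit at data.node.template[<field>]["value"] in the flow JSON; resolve_chunk_defaults and the field names in the usage are hypothetical, not the current api/settings.py code.

```python
from typing import Any, Dict, Tuple


def resolve_chunk_defaults(
    flow: Dict[str, Any], node_id: str, config_defaults: Dict[str, Any]
) -> Tuple[Dict[str, Any], str]:
    """Return (defaults, source), where source is "flow" or "config"."""
    for node in flow.get("data", {}).get("nodes", []):
        if node.get("id") != node_id:
            continue
        # Assumed layout: template[field] is a dict holding the live value under "value".
        template = node.get("data", {}).get("node", {}).get("template", {})
        values = dict(config_defaults)
        for key in config_defaults:
            field = template.get(key)
            if isinstance(field, dict) and "value" in field:
                values[key] = field["value"]
        return values, "flow"
    # Node not found: fall back to the configured defaults and say so.
    return dict(config_defaults), "config"
```

Logging a warning whenever the source is "config" would have flagged that the node ID no longer matches anything in the flow.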
