Changes in ingestion flow to support docling chunking component. by ricofurtado · Pull Request #900 · langflow-ai/openrag

ricofurtado · 2026-02-05T04:42:02Z

Changes in default ingestion flow so it can support docling-based chunking.

mpawlow

@ricofurtado

Code Review 1

See PR comments: (a) to (f)
Note: No functional review was performed as part of this code review

flows/ingestion_flow.json

mpawlow · 2026-02-19T14:15:24Z

flows/ingestion_flow.json

+                "show": true,
+                "title_case": false,
+                "type": "code",
+                "value": "import json\n\nimport tiktoken\nfrom docling_core.transforms.chunker import BaseChunker, DocMeta\nfrom docling_core.transforms.chunker.hierarchical_chunker import HierarchicalChunker\n\nfrom lfx.base.data.docling_utils import extract_docling_documents\nfrom lfx.custom import Component\nfrom lfx.io import DropdownInput, HandleInput, IntInput, MessageTextInput, Output, StrInput\nfrom lfx.schema import Data, DataFrame\n\n\nclass ChunkDoclingDocumentComponent(Component):\n    display_name: str = \"Chunk DoclingDocument\"\n    description: str = \"Use the DocumentDocument chunkers to split the document into chunks.\"\n    documentation = \"https://docling-project.github.io/docling/concepts/chunking/\"\n    icon = \"Docling\"\n    name = \"ChunkDoclingDocument\"\n\n    inputs = [\n        HandleInput(\n            name=\"data_inputs\",\n            display_name=\"Data or DataFrame\",\n            info=\"The data with documents to split in chunks.\",\n            input_types=[\"Data\", \"DataFrame\"],\n            required=True,\n        ),\n        DropdownInput(\n            name=\"chunker\",\n            display_name=\"Chunker\",\n            options=[\"HybridChunker\", \"HierarchicalChunker\"],\n            info=(\"Which chunker to use.\"),\n            value=\"HybridChunker\",\n            real_time_refresh=True,\n        ),\n        DropdownInput(\n            name=\"provider\",\n            display_name=\"Provider\",\n            options=[\"Hugging Face\", \"OpenAI\"],\n            info=(\"Which tokenizer provider.\"),\n            value=\"Hugging Face\",\n            show=True,\n            real_time_refresh=True,\n            advanced=True,\n            dynamic=True,\n        ),\n        StrInput(\n            name=\"hf_model_name\",\n            display_name=\"HF model name\",\n            info=(\n                \"Model name of the tokenizer to use with the HybridChunker when Hugging Face is chosen as a tokenizer.\"\n            ),\n  
          value=\"sentence-transformers/all-MiniLM-L6-v2\",\n            show=True,\n            advanced=True,\n            dynamic=True,\n        ),\n        StrInput(\n            name=\"openai_model_name\",\n            display_name=\"OpenAI model name\",\n            info=(\"Model name of the tokenizer to use with the HybridChunker when OpenAI is chosen as a tokenizer.\"),\n            value=\"gpt-4o\",\n            show=False,\n            advanced=True,\n            dynamic=True,\n        ),\n        IntInput(\n            name=\"max_tokens\",\n            display_name=\"Maximum tokens\",\n            info=(\"Maximum number of tokens for the HybridChunker.\"),\n            show=True,\n            required=False,\n            advanced=True,\n            dynamic=True,\n        ),\n        MessageTextInput(\n            name=\"doc_key\",\n            display_name=\"Doc Key\",\n            info=\"The key to use for the DoclingDocument column.\",\n            value=\"doc\",\n            advanced=True,\n        ),\n    ]\n\n    outputs = [\n        Output(display_name=\"DataFrame\", name=\"dataframe\", method=\"chunk_documents\"),\n    ]\n\n    def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None) -> dict:\n        if field_name == \"chunker\":\n            provider_type = build_config[\"provider\"][\"value\"]\n            is_hf = provider_type == \"Hugging Face\"\n            is_openai = provider_type == \"OpenAI\"\n            if field_value == \"HybridChunker\":\n                build_config[\"provider\"][\"show\"] = True\n                build_config[\"hf_model_name\"][\"show\"] = is_hf\n                build_config[\"openai_model_name\"][\"show\"] = is_openai\n                build_config[\"max_tokens\"][\"show\"] = True\n            else:\n                build_config[\"provider\"][\"show\"] = False\n                build_config[\"hf_model_name\"][\"show\"] = False\n                
build_config[\"openai_model_name\"][\"show\"] = False\n                build_config[\"max_tokens\"][\"show\"] = False\n        elif field_name == \"provider\" and build_config[\"chunker\"][\"value\"] == \"HybridChunker\":\n            if field_value == \"Hugging Face\":\n                build_config[\"hf_model_name\"][\"show\"] = True\n                build_config[\"openai_model_name\"][\"show\"] = False\n            elif field_value == \"OpenAI\":\n                build_config[\"hf_model_name\"][\"show\"] = False\n                build_config[\"openai_model_name\"][\"show\"] = True\n\n        return build_config\n\n    def _docs_to_data(self, docs) -> list[Data]:\n        return [Data(text=doc.page_content, data=doc.metadata) for doc in docs]\n\n    def chunk_documents(self) -> DataFrame:\n        documents = extract_docling_documents(self.data_inputs, self.doc_key)\n\n        chunker: BaseChunker\n        if self.chunker == \"HybridChunker\":\n            try:\n                from docling_core.transforms.chunker.hybrid_chunker import HybridChunker\n            except ImportError as e:\n                msg = (\n                    \"HybridChunker is not installed. 
Please install it with `uv pip install docling-core[chunking] \"\n                    \"or `uv pip install transformers`\"\n                )\n                raise ImportError(msg) from e\n            max_tokens: int | None = self.max_tokens if self.max_tokens else None\n            if self.provider == \"Hugging Face\":\n                try:\n                    from docling_core.transforms.chunker.tokenizer.huggingface import HuggingFaceTokenizer\n                except ImportError as e:\n                    msg = (\n                        \"HuggingFaceTokenizer is not installed.\"\n                        \" Please install it with `uv pip install docling-core[chunking]`\"\n                    )\n                    raise ImportError(msg) from e\n                tokenizer = HuggingFaceTokenizer.from_pretrained(\n                    model_name=self.hf_model_name,\n                    max_tokens=max_tokens,\n                )\n            elif self.provider == \"OpenAI\":\n                try:\n                    from docling_core.transforms.chunker.tokenizer.openai import OpenAITokenizer\n                except ImportError as e:\n                    msg = (\n                        \"OpenAITokenizer is not installed.\"\n                        \" Please install it with `uv pip install docling-core[chunking]`\"\n                        \" or `uv pip install transformers`\"\n                    )\n                    raise ImportError(msg) from e\n                if max_tokens is None:\n                    max_tokens = 128 * 1024  # context window length required for OpenAI tokenizers\n                tokenizer = OpenAITokenizer(\n                    tokenizer=tiktoken.encoding_for_model(self.openai_model_name), max_tokens=max_tokens\n                )\n            chunker = HybridChunker(\n                tokenizer=tokenizer,\n            )\n        elif self.chunker == \"HierarchicalChunker\":\n            chunker = HierarchicalChunker()\n\n        results: 
list[Data] = []\n        try:\n            for doc in documents:\n                for chunk in chunker.chunk(dl_doc=doc):\n                    enriched_text = chunker.contextualize(chunk=chunk)\n                    meta = DocMeta.model_validate(chunk.meta)\n\n                    results.append(\n                        Data(\n                            data={\n                                \"text\": enriched_text,\n                                \"document_id\": f\"{doc.origin.binary_hash}\",\n                                \"doc_items\": json.dumps([item.self_ref for item in meta.doc_items]),\n                            }\n                        )\n                    )\n\n        except Exception as e:\n            msg = f\"Error splitting text: {e}\"\n            raise TypeError(msg) from e\n\n        return DataFrame(results)\n"


(b) [Normal] tokenizer variable unbound in HybridChunker path

Similar to (a)

Within the HybridChunker branch, an UnboundLocalError is raised if provider is neither "Hugging Face" nor "OpenAI":

    if self.provider == "Hugging Face":
        tokenizer = HuggingFaceTokenizer.from_pretrained(...)
    elif self.provider == "OpenAI":
        tokenizer = OpenAITokenizer(...)
    chunker = HybridChunker(tokenizer=tokenizer)  # <<<<<<<<<<<

If self.provider has an unexpected value, tokenizer is unbound.

Potential Solution: Add an else branch with an error message
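A minimal sketch of the suggested `else` branch follows. It is an illustration only, not the component's code: `select_tokenizer`, `make_hf_tokenizer`, and `make_openai_tokenizer` are hypothetical stand-ins for the `HuggingFaceTokenizer.from_pretrained(...)` and `OpenAITokenizer(...)` calls, so the example runs without docling installed.

```python
def make_hf_tokenizer() -> str:
    # Hypothetical stand-in for HuggingFaceTokenizer.from_pretrained(...)
    return "hf-tokenizer"


def make_openai_tokenizer() -> str:
    # Hypothetical stand-in for OpenAITokenizer(...)
    return "openai-tokenizer"


def select_tokenizer(provider: str) -> str:
    """Return a tokenizer for `provider`, failing loudly on unknown values."""
    if provider == "Hugging Face":
        tokenizer = make_hf_tokenizer()
    elif provider == "OpenAI":
        tokenizer = make_openai_tokenizer()
    else:
        # Without this branch, `tokenizer` would be unbound at the return below.
        msg = (
            f"Unsupported tokenizer provider: {provider!r}. "
            "Expected 'Hugging Face' or 'OpenAI'."
        )
        raise ValueError(msg)
    return tokenizer


print(select_tokenizer("OpenAI"))
```

With the explicit branch, an unexpected provider value surfaces as a descriptive `ValueError` instead of an `UnboundLocalError` raised further down.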

mpawlow · 2026-02-19T14:19:01Z

flows/ingestion_flow.json



(c) [Normal] Dead code: _docs_to_data method in ChunkDoclingDocument

The component defines a helper method that is never called

    def _docs_to_data(self, docs) -> list[Data]:
        return [Data(text=doc.page_content, data=doc.metadata) for doc in docs]

mpawlow · 2026-02-19T14:23:36Z

flows/ingestion_flow.json



(d) [Normal] Large value 128 * 1024 set as default max_tokens for OpenAI tokenizer

i.e.

    if max_tokens is None:
        max_tokens = 128 * 1024  # context window length required for OpenAI tokenizers

A 128k-token default for chunk max size is enormous and would result in very few or no splits for most documents.

Question: Is this value actually required, as the code comment claims?

Note: The HuggingFaceTokenizer default from sentence-transformers/all-MiniLM-L6-v2 is 256 tokens (which might be more appropriate for RAG)

Recommend verifying the value against what Docling's OpenAITokenizer actually requires vs. what makes sense for retrieval.
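To make the size difference concrete, the arithmetic can be sketched as below. This is a rough illustration under stated assumptions: `min_chunks_from_token_limit` is a hypothetical helper, the 20,000-token document is invented for the example, and the result is only the number of splits that the token limit alone forces. The real chunker also follows document structure, so actual chunk counts can be higher.

```python
import math

OPENAI_DEFAULT_MAX_TOKENS = 128 * 1024  # value from the PR (131072)
MINILM_MAX_TOKENS = 256  # limit cited in this review for all-MiniLM-L6-v2


def min_chunks_from_token_limit(doc_tokens: int, max_tokens: int) -> int:
    """Fewest chunks needed so that no chunk exceeds `max_tokens`."""
    if doc_tokens <= 0:
        return 0
    return math.ceil(doc_tokens / max_tokens)


doc_tokens = 20_000  # hypothetical document length, in tokens

print(min_chunks_from_token_limit(doc_tokens, OPENAI_DEFAULT_MAX_TOKENS))  # 1
print(min_chunks_from_token_limit(doc_tokens, MINILM_MAX_TOKENS))  # 79
```

Under the 128k default, the token limit never forces a split for this document, whereas a 256-token limit forces at least 79 chunks.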

mpawlow · 2026-02-19T14:27:44Z

flows/ingestion_flow.json

                "_input_type": "SecretStrInput",
                "advanced": false,
-                "display_name": "IBM watsonx.ai API Key",
+                "display_name": "OpenAI API Key",


(e) [Minor] DoclingRemote field label change is a breaking rename: "IBM watsonx.ai API Key" -> "OpenAI API Key"

Note: This isn't directly authored in this PR (it's an upstream lfx component update), but the effect for any existing WatsonX users of OpenRAG is that their configured field names would mismatch.

Recommend checking if FlowsService needs to handle this migration.
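As a starting point for that check, label changes between two versions of a flow can be listed with a small script. The sketch below is an assumption-laden illustration: it assumes the flow JSON keeps fields under `data.nodes[].data.node.template`, and the node id `DoclingRemote-1` and field name `api_key` are hypothetical. It says nothing about how FlowsService stores or matches configuration; it only surfaces renamed labels for manual review.

```python
def template_labels(flow: dict) -> dict[tuple[str, str], str]:
    """Map (node id, field name) -> display_name for every template field."""
    labels: dict[tuple[str, str], str] = {}
    for node in flow.get("data", {}).get("nodes", []):
        template = node.get("data", {}).get("node", {}).get("template", {})
        for field_name, field in template.items():
            # Skip template entries that are not field dictionaries.
            if isinstance(field, dict) and "display_name" in field:
                labels[(node.get("id", ""), field_name)] = field["display_name"]
    return labels


def renamed_labels(old_flow: dict, new_flow: dict) -> dict[tuple[str, str], tuple[str, str]]:
    """Return fields present in both flows whose display_name differs."""
    old, new = template_labels(old_flow), template_labels(new_flow)
    return {
        key: (old[key], new[key])
        for key in old.keys() & new.keys()
        if old[key] != new[key]
    }


def make_flow(label: str) -> dict:
    # Hypothetical minimal flow containing a single node with a single field.
    field = {"display_name": label, "_input_type": "SecretStrInput"}
    node = {"id": "DoclingRemote-1", "data": {"node": {"template": {"api_key": field}}}}
    return {"data": {"nodes": [node]}}


changes = renamed_labels(make_flow("IBM watsonx.ai API Key"), make_flow("OpenAI API Key"))
for (node_id, field_name), (before, after) in changes.items():
    print(f"{node_id}.{field_name}: {before!r} -> {after!r}")
```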

mpawlow · 2026-02-19T14:28:53Z

flows/ingestion_flow.json

-                "advanced": true,
-                "display_name": "API Base URL",
+                "advanced": false,
+                "display_name": "OpenAI API Base URL",


(f) [Minor] DoclingRemote field label change is a breaking rename: "API Base URL" -> "OpenAI API Base URL"

Similar to (e)

Note: This isn't directly authored in this PR (it's an upstream lfx component update), but the effect for any existing WatsonX users of OpenRAG is that their configured field names would mismatch.

Recommend checking if FlowsService needs to handle this migration.

ricofurtado · 2026-03-12T16:21:06Z

This pull request was superseded by #1113.

Changes in ingestion flow to support docling chunking component.

b8bc2b4

ricofurtado requested review from edwinjosechittilappilly and matanor February 5, 2026 04:43

Changed default chunk node for hybrid, removed default file.

593103f

aimurphy mentioned this pull request Feb 6, 2026

[Docs]: Changes to ingestion flow components #910

Closed

1 task

ricofurtado requested review from Adam-Aghili and mpawlow February 12, 2026 20:38

mpawlow requested changes Feb 19, 2026

View reviewed changes

edwinjosechittilappilly linked an issue Feb 19, 2026 that may be closed by this pull request

[Bug]: [LF] Update Docling Chunck component in Langflow #995

Closed

2 tasks

edwinjosechittilappilly removed a link to an issue Feb 19, 2026

[Bug]: [LF] Update Docling Chunck component in Langflow #995

Closed

2 tasks

mpawlow mentioned this pull request Mar 2, 2026

[Bug]: Move away from using Langflow tweaks in langflow service in OpenRAG #901

Open

2 tasks

ricofurtado added 2 commits March 12, 2026 11:52

Update ingestion_flow.json

40e8d3c

Merge branch 'main' into docling-based-chunkers-support

9475c56

mpawlow mentioned this pull request Mar 13, 2026

feat: changes to ingestion flow - added docling chunker. #1113

Open

ricofurtado wants to merge 4 commits into main from docling-based-chunkers-support
