Changes in ingestion flow to support docling chunking component. #900
ricofurtado wants to merge 4 commits into main from
Conversation
mpawlow (Collaborator) requested changes on Feb 19, 2026 and left a comment:
Code Review 1
- See PR comments: (a) to (f)
- Note: No functional review was performed as part of this code review
flows/ingestion_flow.json
Outdated
| "show": true, | ||
| "title_case": false, | ||
| "type": "code", | ||
| "value": "import json\n\nimport tiktoken\nfrom docling_core.transforms.chunker import BaseChunker, DocMeta\nfrom docling_core.transforms.chunker.hierarchical_chunker import HierarchicalChunker\n\nfrom lfx.base.data.docling_utils import extract_docling_documents\nfrom lfx.custom import Component\nfrom lfx.io import DropdownInput, HandleInput, IntInput, MessageTextInput, Output, StrInput\nfrom lfx.schema import Data, DataFrame\n\n\nclass ChunkDoclingDocumentComponent(Component):\n display_name: str = \"Chunk DoclingDocument\"\n description: str = \"Use the DocumentDocument chunkers to split the document into chunks.\"\n documentation = \"https://docling-project.github.io/docling/concepts/chunking/\"\n icon = \"Docling\"\n name = \"ChunkDoclingDocument\"\n\n inputs = [\n HandleInput(\n name=\"data_inputs\",\n display_name=\"Data or DataFrame\",\n info=\"The data with documents to split in chunks.\",\n input_types=[\"Data\", \"DataFrame\"],\n required=True,\n ),\n DropdownInput(\n name=\"chunker\",\n display_name=\"Chunker\",\n options=[\"HybridChunker\", \"HierarchicalChunker\"],\n info=(\"Which chunker to use.\"),\n value=\"HybridChunker\",\n real_time_refresh=True,\n ),\n DropdownInput(\n name=\"provider\",\n display_name=\"Provider\",\n options=[\"Hugging Face\", \"OpenAI\"],\n info=(\"Which tokenizer provider.\"),\n value=\"Hugging Face\",\n show=True,\n real_time_refresh=True,\n advanced=True,\n dynamic=True,\n ),\n StrInput(\n name=\"hf_model_name\",\n display_name=\"HF model name\",\n info=(\n \"Model name of the tokenizer to use with the HybridChunker when Hugging Face is chosen as a tokenizer.\"\n ),\n value=\"sentence-transformers/all-MiniLM-L6-v2\",\n show=True,\n advanced=True,\n dynamic=True,\n ),\n StrInput(\n name=\"openai_model_name\",\n display_name=\"OpenAI model name\",\n info=(\"Model name of the tokenizer to use with the HybridChunker when OpenAI is chosen as a tokenizer.\"),\n value=\"gpt-4o\",\n show=False,\n advanced=True,\n dynamic=True,\n 
),\n IntInput(\n name=\"max_tokens\",\n display_name=\"Maximum tokens\",\n info=(\"Maximum number of tokens for the HybridChunker.\"),\n show=True,\n required=False,\n advanced=True,\n dynamic=True,\n ),\n MessageTextInput(\n name=\"doc_key\",\n display_name=\"Doc Key\",\n info=\"The key to use for the DoclingDocument column.\",\n value=\"doc\",\n advanced=True,\n ),\n ]\n\n outputs = [\n Output(display_name=\"DataFrame\", name=\"dataframe\", method=\"chunk_documents\"),\n ]\n\n def update_build_config(self, build_config: dict, field_value: str, field_name: str | None = None) -> dict:\n if field_name == \"chunker\":\n provider_type = build_config[\"provider\"][\"value\"]\n is_hf = provider_type == \"Hugging Face\"\n is_openai = provider_type == \"OpenAI\"\n if field_value == \"HybridChunker\":\n build_config[\"provider\"][\"show\"] = True\n build_config[\"hf_model_name\"][\"show\"] = is_hf\n build_config[\"openai_model_name\"][\"show\"] = is_openai\n build_config[\"max_tokens\"][\"show\"] = True\n else:\n build_config[\"provider\"][\"show\"] = False\n build_config[\"hf_model_name\"][\"show\"] = False\n build_config[\"openai_model_name\"][\"show\"] = False\n build_config[\"max_tokens\"][\"show\"] = False\n elif field_name == \"provider\" and build_config[\"chunker\"][\"value\"] == \"HybridChunker\":\n if field_value == \"Hugging Face\":\n build_config[\"hf_model_name\"][\"show\"] = True\n build_config[\"openai_model_name\"][\"show\"] = False\n elif field_value == \"OpenAI\":\n build_config[\"hf_model_name\"][\"show\"] = False\n build_config[\"openai_model_name\"][\"show\"] = True\n\n return build_config\n\n def _docs_to_data(self, docs) -> list[Data]:\n return [Data(text=doc.page_content, data=doc.metadata) for doc in docs]\n\n def chunk_documents(self) -> DataFrame:\n documents = extract_docling_documents(self.data_inputs, self.doc_key)\n\n chunker: BaseChunker\n if self.chunker == \"HybridChunker\":\n try:\n from docling_core.transforms.chunker.hybrid_chunker 
import HybridChunker\n except ImportError as e:\n msg = (\n \"HybridChunker is not installed. Please install it with `uv pip install docling-core[chunking] \"\n \"or `uv pip install transformers`\"\n )\n raise ImportError(msg) from e\n max_tokens: int | None = self.max_tokens if self.max_tokens else None\n if self.provider == \"Hugging Face\":\n try:\n from docling_core.transforms.chunker.tokenizer.huggingface import HuggingFaceTokenizer\n except ImportError as e:\n msg = (\n \"HuggingFaceTokenizer is not installed.\"\n \" Please install it with `uv pip install docling-core[chunking]`\"\n )\n raise ImportError(msg) from e\n tokenizer = HuggingFaceTokenizer.from_pretrained(\n model_name=self.hf_model_name,\n max_tokens=max_tokens,\n )\n elif self.provider == \"OpenAI\":\n try:\n from docling_core.transforms.chunker.tokenizer.openai import OpenAITokenizer\n except ImportError as e:\n msg = (\n \"OpenAITokenizer is not installed.\"\n \" Please install it with `uv pip install docling-core[chunking]`\"\n \" or `uv pip install transformers`\"\n )\n raise ImportError(msg) from e\n if max_tokens is None:\n max_tokens = 128 * 1024 # context window length required for OpenAI tokenizers\n tokenizer = OpenAITokenizer(\n tokenizer=tiktoken.encoding_for_model(self.openai_model_name), max_tokens=max_tokens\n )\n chunker = HybridChunker(\n tokenizer=tokenizer,\n )\n elif self.chunker == \"HierarchicalChunker\":\n chunker = HierarchicalChunker()\n\n results: list[Data] = []\n try:\n for doc in documents:\n for chunk in chunker.chunk(dl_doc=doc):\n enriched_text = chunker.contextualize(chunk=chunk)\n meta = DocMeta.model_validate(chunk.meta)\n\n results.append(\n Data(\n data={\n \"text\": enriched_text,\n \"document_id\": f\"{doc.origin.binary_hash}\",\n \"doc_items\": json.dumps([item.self_ref for item in meta.doc_items]),\n }\n )\n )\n\n except Exception as e:\n msg = f\"Error splitting text: {e}\"\n raise TypeError(msg) from e\n\n return DataFrame(results)\n" |
Collaborator
There was a problem hiding this comment.
(b) [Normal] tokenizer variable unbound in HybridChunker path
- Similar to (a)
- Within the HybridChunker branch, an UnboundLocalError is raised if provider is neither value
    if self.provider == "Hugging Face":
        tokenizer = HuggingFaceTokenizer.from_pretrained(...)
    elif self.provider == "OpenAI":
        tokenizer = OpenAITokenizer(...)
    chunker = HybridChunker(tokenizer=tokenizer)  # <- tokenizer is unbound if self.provider has an unexpected value
- Potential Solution: Add an else branch with an error message
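A minimal sketch of that guard, with stand-in factories in place of the real tokenizer constructors (the function name, the stand-in return values, and the error message wording are illustrative, not the component's actual code):

```python
def select_tokenizer(provider: str):
    # Stand-ins for the HuggingFaceTokenizer / OpenAITokenizer construction paths.
    factories = {
        "Hugging Face": lambda: "hf-tokenizer",
        "OpenAI": lambda: "openai-tokenizer",
    }
    try:
        factory = factories[provider]
    except KeyError as e:
        # Fail fast with a clear message instead of letting tokenizer go unbound.
        msg = f"Unknown tokenizer provider: {provider!r}. Expected one of {sorted(factories)}."
        raise ValueError(msg) from e
    return factory()
```

Raising here keeps the error at the point of misconfiguration rather than surfacing later as an UnboundLocalError at the `HybridChunker(...)` call.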
flows/ingestion_flow.json
Collaborator
There was a problem hiding this comment.
(c) [Normal] Dead code: _docs_to_data method in ChunkDoclingDocument
- The component defines a helper method that is never called
    def _docs_to_data(self, docs) -> list[Data]:
        return [Data(text=doc.page_content, data=doc.metadata) for doc in docs]
flows/ingestion_flow.json
Collaborator
There was a problem hiding this comment.
(d) [Normal] Large value 128 * 1024 set as default max_tokens for OpenAI tokenizer
- i.e.
      if max_tokens is None:
          max_tokens = 128 * 1024  # context window length required for OpenAI tokenizers
- A 128k-token default for the maximum chunk size is enormous and would result in very few or no splits for most documents.
- Question: Is this value indeed required? (according to the comment)
- Note: The HuggingFaceTokenizer default from sentence-transformers/all-MiniLM-L6-v2 is 256 tokens (which might be more appropriate for RAG)
- Recommend verifying the value against what Docling's OpenAITokenizer actually requires vs. what makes sense for retrieval.
flows/ingestion_flow.json
Outdated
| "_input_type": "SecretStrInput", | ||
| "advanced": false, | ||
| "display_name": "IBM watsonx.ai API Key", | ||
| "display_name": "OpenAI API Key", |
Collaborator
There was a problem hiding this comment.
(e) [Minor] DoclingRemote field label change is a breaking rename: "IBM watsonx.ai API Key" -> "OpenAI API Key"
- Note: This isn't directly authored in this PR (it's an upstream lfx component update), but the effect for any existing WatsonX users of OpenRAG is that their configured field names would mismatch.
- Recommend checking if FlowsService needs to handle this migration.
flows/ingestion_flow.json
Outdated
| "advanced": true, | ||
| "display_name": "API Base URL", | ||
| "advanced": false, | ||
| "display_name": "OpenAI API Base URL", |
Collaborator
There was a problem hiding this comment.
(f) [Minor] DoclingRemote field label change is a breaking rename: "API Base URL" -> "OpenAI API Base URL"
- Similar to (e)
- Note: This isn't directly authored in this PR (it's an upstream lfx component update), but the effect for any existing WatsonX users of OpenRAG is that their configured field names would mismatch.
- Recommend checking if FlowsService needs to handle this migration.
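If FlowsService did need to handle this migration, one hypothetical approach is a recursive rename of the affected display_name values in stored flow JSON. The migration map and helper below are illustrative assumptions, not existing OpenRAG code:

```python
# Hypothetical old-label -> new-label map; the real migration logic may differ.
DISPLAY_NAME_MIGRATIONS = {
    "IBM watsonx.ai API Key": "OpenAI API Key",
    "API Base URL": "OpenAI API Base URL",
}

def migrate_display_names(node):
    """Recursively rewrite renamed 'display_name' values in a stored flow template."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "display_name" and value in DISPLAY_NAME_MIGRATIONS:
                node[key] = DISPLAY_NAME_MIGRATIONS[value]
            else:
                migrate_display_names(value)
    elif isinstance(node, list):
        for item in node:
            migrate_display_names(item)
    return node
```

Running such a pass when loading older flows would keep previously configured WatsonX fields aligned with the renamed labels.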
ricofurtado (Author)
This pull request was superseded by #1113.
Changes the default ingestion flow so it can support docling-based chunking.