[FEA]: to_markdown/to_markdown_by_page should differentiate by distinct document ingested #1630

@randerzander

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Significant improvement

Please provide a clear description of problem this feature solves

Using the snippet, if you ingest a single document, the markdown conversion makes sense.

However, if your ingestion job contains multiple documents, there's no way to tell which part of the returned markdown belongs to which document.

For example, if you ingest multimodal_test.pdf and an additional single-page PDF, to_markdown_by_page returns what looks like a single 4-page document.

Describe the feature, and optionally a solution or implementation and any alternatives

Both to_markdown and to_markdown_by_page should probably include a source_filename field by which chunks are grouped.
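
To make the proposal concrete, here is a minimal sketch of the grouping I have in mind. It is not the library's current API: the function name `to_markdown_by_source`, the flat chunk layout (`source_filename`, `page_number`, `markdown` keys), and the file name `single_page.pdf` are all assumptions for illustration only.

```python
from collections import defaultdict


def to_markdown_by_source(chunks):
    """Group per-page markdown by the document each chunk came from.

    `chunks` is a hypothetical list of dicts with the keys
    `source_filename`, `page_number`, and `markdown`. The real result
    records may be shaped differently.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for chunk in chunks:
        grouped[chunk["source_filename"]][chunk["page_number"]].append(chunk["markdown"])

    # Join the chunks on each page and keep pages in ascending order.
    return {
        name: {page: "\n\n".join(parts) for page, parts in sorted(pages.items())}
        for name, pages in grouped.items()
    }


# Hypothetical input: a 3-page document plus a single-page document.
chunks = [
    {"source_filename": "multimodal_test.pdf", "page_number": 1, "markdown": "# Page one"},
    {"source_filename": "multimodal_test.pdf", "page_number": 2, "markdown": "Page two text"},
    {"source_filename": "single_page.pdf", "page_number": 1, "markdown": "Only page"},
    {"source_filename": "multimodal_test.pdf", "page_number": 3, "markdown": "Page three text"},
]

result = to_markdown_by_source(chunks)
```

With this shape, page numbers restart per document instead of running on as if everything were one 4-page file.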

Additional context
