Slack App: Fragments of a document #19

@20001LastOrder

Description:

  • Provide a clear and detailed explanation of the problem you're encountering.
  • Include any relevant context or background information.
  • Mention the expected behavior if applicable.
  • Share any error messages or logs you've encountered.

When we store documents in the vector database, we break them into fragments so that they fit within the LLM's prompt size. However, this can also strip away the contextual information of the document.
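
To illustrate the problem, here is a minimal sketch that assumes naive fixed-size word chunking. The function name `chunk_words`, the sample page text, and the 8-word limit are hypothetical and are not taken from the Slack app. Only the first fragment keeps the speaker's name, so a retriever matching on "Ehsan" has no direct way to reach the fragments that actually describe the talk.

```python
def chunk_words(text: str, max_words: int) -> list[str]:
    """Split text into consecutive fragments of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


# Hypothetical page text: the speaker's name appears only in the first sentence.
page = (
    "Talk by Ehsan Kamalinejad. "
    "The presentation covered recent machine learning work on NLP and "
    "how large language models are trained and evaluated in production."
)

fragments = chunk_words(page, max_words=8)
for i, frag in enumerate(fragments):
    # Does this fragment still carry the contextual keyword?
    print(i, "ehsan" in frag.lower(), frag)
```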

Steps to Reproduce:

  • Provide a step-by-step guide on how to reproduce the issue.
  • Include specific details such as input data, configurations, or any necessary prerequisites.
  • If applicable, mention the browser, operating system, or environment you're using.

For example: @sherpa what did ehsan kamalinejad present?
Answer: Ehsan Kamalinejad's presentation focused on machine learning developments in natural language processing (NLP) at Amazon. He has also worked on computer vision projects, including Photos Memories, during his time at Apple. Ehsan is currently a lead scientist at Amazon and an associate professor at California State University. You can find more information about his work in the LLM Foundations section of the LLM Live Book Link.

Observation: the assistant doesn't recognize that the page containing this information is related to Ehsan's presentation.

Actual Results:

  • Describe the actual outcome or behavior you observed.
  • Include any error messages, warnings, or relevant logs.

See above.

Expected Results:

  • Clearly state what you expected to happen instead.
  • This helps developers understand the desired behavior.

Let's assume the question contains the relevant context. The assistant should then be able to use that context to retrieve the right chunks. The problem is that the contextual keywords needed for retrieval might not appear in the chunk that contains the answer.

It has to take into account the relationship between chunks and their higher-level abstractions (pages, sections, books, ...) and use that to narrow down first to the right document and then to the right chunk.
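
A minimal sketch of that two-stage idea, under loud assumptions: keyword overlap stands in for vector similarity, each page is represented only by its title, and the names `Page`, `flat_search`, and `hierarchical_search` (as well as the sample pages) are hypothetical rather than part of the app. Flat search ranks every chunk in isolation, while the hierarchical version picks a page first and only then ranks that page's chunks.

```python
import re
from dataclasses import dataclass


def tokens(text: str) -> set[str]:
    """Lowercased set of words; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, text: str) -> int:
    """Keyword overlap, used here in place of vector similarity."""
    return len(tokens(query) & tokens(text))


@dataclass
class Page:
    title: str          # page-level representation (hypothetical; could be a summary)
    chunks: list[str]   # fragments stored in the vector database


def flat_search(query: str, pages: list[Page]) -> str:
    """Baseline: rank every chunk on its own, ignoring which page it came from."""
    return max((c for p in pages for c in p.chunks), key=lambda c: score(query, c))


def hierarchical_search(query: str, pages: list[Page]) -> tuple[str, str]:
    """Narrow down to a page by its title first, then to a chunk within that page."""
    page = max(pages, key=lambda p: score(query, p.title))
    chunk = max(page.chunks, key=lambda c: score(query, c))
    return page.title, chunk


pages = [
    Page("Ehsan Kamalinejad presentation",
         ["Slides and a recording are linked below.",
          "It covered recent work on language models at Amazon."]),
    Page("Reading group notes",
         ["What did the reading group say about language models?",
          "Homework is due weekly."]),
]

query = "What did Ehsan Kamalinejad present about language models?"
print(flat_search(query, pages))
print(hierarchical_search(query, pages))
```

Flat search returns the reading-group chunk because it shares more words with the question, whereas the hierarchical version first selects the page whose title names the speaker and then returns that page's chunk about language models. In a real system both scores would presumably come from embeddings, and the page-level representation could be a generated summary instead of just the title.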

Additional Information:

  • Provide any additional details that may be helpful but were not covered above.
  • Include screenshots, code snippets, or links to relevant resources if applicable.

N/A

Reproducibility:

  • Mention whether the issue is reproducible consistently or intermittently.
  • Include any relevant information on the frequency or conditions under which it occurs.

The actual answer might vary, but the general pattern is reproducible.

Possible Solutions:

  • If you have any ideas or suggestions for how the issue could be resolved, mention them here.
  • This can help guide the discussion and provide potential starting points for developers.

See the Expected Results section.
Alternatively, we need to add some contextual information to each fragment of the document, such as the title or the first few sentences.
A hierarchical index might be another approach.
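
A minimal sketch of the "add context to each fragment" option, assuming sentence-based chunking and a prefix made of the document title plus its opening sentence. The function names `split_sentences` and `contextualize`, the parameter defaults, and the sample document are hypothetical. The prefix is stored and embedded together with each fragment, so a fragment whose body never mentions the speaker can still match a question that does.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def contextualize(title: str, text: str,
                  chunk_sentences: int = 2, lead_sentences: int = 1) -> list[str]:
    """Chunk a document by sentences and prefix each chunk with the title and opening sentence(s)."""
    sentences = split_sentences(text)
    lead = " ".join(sentences[:lead_sentences])
    chunks = []
    for i in range(0, len(sentences), chunk_sentences):
        body = " ".join(sentences[i:i + chunk_sentences])
        # The prefix is embedded alongside the body, so retrieval can match
        # document-level keywords that the body itself lacks.
        chunks.append(f"Title: {title}\nContext: {lead}\n{body}")
    return chunks


doc_title = "LLM Foundations"
doc_text = ("This page summarizes Ehsan Kamalinejad's presentation. "
            "He leads NLP work at Amazon. "
            "The talk covered training data for language models. "
            "It also discussed evaluation.")

chunks = contextualize(doc_title, doc_text)
for chunk in chunks:
    print(chunk, end="\n---\n")
```

The trade-off is that the prefix adds tokens to every fragment, which eats into the same prompt-size budget that motivated splitting documents in the first place, so the prefix should stay short.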

Related Issues:

  • If there are any related GitHub issues or pull requests, reference them here.
  • This helps create connections and provides additional context.

N/A

Steps Taken So Far:

  • Outline any steps you've already taken to investigate or resolve the issue.
  • This helps developers understand what has been attempted and avoids duplication of effort.

N/A

Environment:

  • Provide details about the environment you're working in (e.g., programming language, versions, libraries, frameworks).
  • Mention any relevant dependencies or configurations.

N/A

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests