DocArrayInMemorySearch -> similarity_search_with_score returns incorrect results with Ollama #6

@anatoli26

Hi, thanks for your videos and the code.

I'm working through your two videos, the one for this repo and the youtube-rag one.

For some reason I can't get DocArrayInMemorySearch to work correctly with the basic example. Here's the simplified code, merged from both repos:

import os
from dotenv import load_dotenv
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama

load_dotenv()

# Initialize the Ollama LLM and the Ollama embeddings (both use the same model)
MODEL = "llama3"
model = Ollama(model=MODEL)
embeddings = OllamaEmbeddings(model=MODEL)

# Build an in-memory vector store from a list of texts
vectorstore1 = DocArrayInMemorySearch.from_texts(
    [
        "Mary's sister is Susana",
        "John and Tommy are brothers",
        "Patricia likes white cars",
        "Pedro's mother is a teacher",
        "Lucia drives an Audi",
        "Mary has two siblings",
    ],
    embedding=embeddings,
)

print(vectorstore1.similarity_search_with_score(query="Who is Mary's sister?", k=6))

The results are as follows (split across lines for easier reading):

[
    (Document(page_content="Pedro's mother is a teacher"), 0.4350104103848039),
    (Document(page_content='Mary has two siblings'), 0.43119987668775467),
    (Document(page_content='John and Tommy are brothers'), 0.41273142441302735),
    (Document(page_content='Patricia likes white cars'), 0.3569403395446856),
    (Document(page_content="Mary's sister is Susana"), 0.3464697744599006),
    (Document(page_content='Lucia drives an Audi'), 0.22815817605634237)
]

Why don't the similarity scores and the order match what is shown in your videos and notebooks? For example, (Document(page_content="Mary's sister is Susana"), 0.3464697744599006) is the 5th result when it should be the first one.

I'm using Python 3.10.12.
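
One possible way to narrow this down is to recompute the ranking by hand from raw embedding vectors, bypassing the vector store. The sketch below assumes the store ranks by cosine similarity. The helper names `cosine_similarity` and `rank_by_cosine` and the toy 3-dimensional vectors are made up for illustration and are not part of the original code:

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    texts: Sequence[str],
) -> List[Tuple[str, float]]:
    """Pair each text with its similarity to the query, best match first."""
    scored = [
        (text, cosine_similarity(query_vec, vec))
        for text, vec in zip(texts, doc_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors, invented for illustration only (not real llama3 embeddings).
texts = ["Mary's sister is Susana", "Lucia drives an Audi"]
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9]]
query_vec = [1.0, 0.0, 0.0]

ranking = rank_by_cosine(query_vec, doc_vecs, texts)
for text, score in ranking:
    print(f"{score:.4f}  {text}")
```

With the real setup, the toy vectors could be replaced by `embeddings.embed_documents(texts)` and `embeddings.embed_query("Who is Mary's sister?")`. If the hand-computed order matches the output above, the ranking comes from the embeddings themselves and not from DocArrayInMemorySearch.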

Thanks!
