@Acuspeedster
Collaborator

One thing to be aware of: this change will affect any existing data in the Qdrant store. If the collections already contain vectors that were inserted with sequential IDs and we now start inserting with UUIDs, we'll have a mix of ID formats. This won't cause errors, but it might be confusing when examining the data directly. If we need to work with existing data and keep the IDs consistent, we might want to:
1. Create new collections.
2. Re-index all the data with UUIDs.
3. Switch to the new collections when complete.
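The re-indexing step could look roughly like the sketch below. It is not code from this PR: the point layout (`id`, `vector`, `payload`) mirrors the dicts passed to `vector_store.upsert`, while `LEGACY_NAMESPACE`, `to_uuid_point`, `reindex`, and the `legacy_id` payload key are hypothetical names chosen for illustration. It also swaps in `uuid.uuid5` (deterministic) for the random `uuid.uuid4` used elsewhere in this change, on the assumption that a migration should produce the same IDs if it is run twice.

```python
import uuid

# Hypothetical namespace used only to derive stable UUIDs from legacy IDs.
LEGACY_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def to_uuid_point(point, collection):
    """Return a copy of `point` whose ID is a UUID derived from its legacy ID."""
    legacy_id = point["id"]
    # uuid5 is deterministic: the same collection and legacy ID always map
    # to the same UUID, so re-running the migration does not create duplicates.
    new_id = str(uuid.uuid5(LEGACY_NAMESPACE, f"{collection}:{legacy_id}"))
    payload = dict(point.get("payload") or {})
    payload["legacy_id"] = legacy_id  # keep the old ID for traceability
    return {"id": new_id, "vector": point["vector"], "payload": payload}


def reindex(old_points, collection):
    """Convert every point read from the old collection to a UUID-keyed point."""
    return [to_uuid_point(p, collection) for p in old_points]
```

The converted points would then be written to the new collection, and readers switched over once the copy is complete.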

Signed-off-by: Acuspeedster <arnavrajsingh@gmail.com>
@Acuspeedster Acuspeedster self-assigned this May 31, 2025
@Acuspeedster Acuspeedster requested a review from juntao May 31, 2025 18:07
@Acuspeedster Acuspeedster added the enhancement New feature or request label May 31, 2025
@juntao
Contributor

juntao commented May 31, 2025

What about the point_id in load_data.py? Which point ID does it use when I run the load data script?

@Acuspeedster
Collaborator Author

@juntao Looking at the load_data.py file, I can see that both the load_project_examples() and load_error_examples() functions are generating new UUIDs for each point they add to the vector database:

# In load_project_examples() function
# Store in vector DB with proper UUID
point_id = str(uuid.uuid4())  # Generate proper UUID

vector_store.upsert("project_examples",
                    [{"id": point_id,  # Use UUID instead of filename
                      "vector": embedding,
                      "payload": example}])

# In load_error_examples() function
# Store in vector DB with proper UUID
point_id = str(uuid.uuid4())

vector_store.upsert("error_examples",
                    [{"id": point_id,
                      "vector": embedding,
                      "payload": example}])

@juntao
Contributor

juntao commented May 31, 2025

I know this. I am just pointing to a code quality problem. We should not have the same code segments littered in different files. It is a maintenance issue.

Shouldn't load_data just call insert_document instead of calling upsert directly in this case?
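A minimal sketch of that refactoring, under stated assumptions: the real signature of `insert_document` is not shown in this thread, so the one below is hypothetical, as are `InMemoryVectorStore` (a stand-in that only mimics the `upsert(collection, points)` call quoted above) and `load_examples`. The point is that UUID generation lives in one place and both loaders go through it.

```python
import uuid


class InMemoryVectorStore:
    """Stand-in for the project's vector store; only mimics upsert()."""

    def __init__(self):
        self.collections = {}

    def upsert(self, collection, points):
        bucket = self.collections.setdefault(collection, {})
        for point in points:
            bucket[point["id"]] = point


def insert_document(store, collection, embedding, payload):
    """Hypothetical shared helper: assign a UUID and upsert a single point."""
    point_id = str(uuid.uuid4())
    store.upsert(
        collection,
        [{"id": point_id, "vector": embedding, "payload": payload}],
    )
    return point_id


def load_examples(store, collection, examples, embed):
    """Load any list of examples through the shared helper.

    `embed` is a caller-supplied function that maps an example to a vector.
    """
    return [insert_document(store, collection, embed(ex), ex) for ex in examples]
```

With this shape, loading project examples and loading error examples differ only in the collection name and the list of examples passed in.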

…n; enhance error handling and logging

Signed-off-by: Acuspeedster <arnavrajsingh@gmail.com>
@Acuspeedster
Collaborator Author

Acuspeedster commented May 31, 2025

@juntao I have improved the code's maintainability as far as I could.

@juntao juntao merged commit 33f5e1e into cardea-mcp:main May 31, 2025
2 checks passed