[FEAT] Research Agentic Search for Legal Document Queries #24

@mikhashev

Feature Description

Research "agentic search" as an alternative approach to the current RAG + vector database architecture for legal document queries in Law7.

Problem Statement

Law7 currently uses RAG (Retrieval-Augmented Generation) with the Qdrant vector database for semantic search over legal documents. However, Boris Cherny (Claude Code team) reports the following experience:

"Early versions of Claude Code used RAG + a local vector db, but we found pretty quickly that agentic search generally works better. It is also simpler and doesn't have the same issues around security, privacy, staleness, and reliability."

Source: https://x.com/bcherny/status/2017824286489383315?s=46

This insight suggests we should explore whether agentic search could improve Law7's:

  • Query accuracy and relevance for legal documents
  • System simplicity and maintainability
  • Data freshness (eliminating embedding staleness issues)
  • Security and privacy (no external vector DB dependencies)

Proposed Solution

Conduct a research phase to evaluate "agentic search" for legal document queries:

Research Questions

  1. What is agentic search in the context of legal documents?

    • How does it differ from RAG + vector DB?
    • What are the core patterns and techniques?
  2. How would agentic search work for Law7?

    • Agent-based querying of PostgreSQL via direct SQL
    • Multi-step reasoning (e.g., "find amendments to article 123 of Civil Code")
    • Tool-calling patterns for different query types
  3. What are the trade-offs?

    • Performance: Agentic search vs. vector search (Qdrant)
    • Accuracy: Semantic understanding vs. keyword + structural queries
    • Complexity: Agent logic vs. embedding management
    • Data freshness: Real-time vs. embedding rebuilds
  4. Implementation patterns

    • How to structure agents for legal queries
    • Tool definition for PostgreSQL access
    • Handling consolidation queries (historical article versions)
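
To make questions 2 and 4 more concrete, here is a minimal sketch of what an agentic loop over direct SQL tools might look like. Everything in it is an assumption for illustration: the `article_versions` table and its columns, the two tool functions, the synthetic rows, and the fixed `plan` list that stands in for the tool calls an LLM would choose. In-memory SQLite stands in for PostgreSQL so the sketch runs without a server; it is not Law7's actual schema or tooling.

```python
import sqlite3

# Hypothetical schema: one row per historical version of an article.
SCHEMA = """
CREATE TABLE article_versions (
    code TEXT, article INTEGER, valid_from TEXT, amended_by TEXT, body TEXT
)
"""

# Synthetic placeholder rows, not real legal data.
ROWS = [
    ("Labor Code", 80, "2002-02-01", None, "Original text of article 80."),
    ("Labor Code", 80, "2023-03-01", "Amendment A (hypothetical)", "Revised text of article 80."),
    ("Civil Code", 564, "1996-03-01", None, "Original text of article 564."),
]


def make_db():
    """Create an in-memory database (a stand-in for PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO article_versions VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def get_article_version(conn, code, article, as_of):
    """Tool: return the version of an article in force on a given ISO date."""
    return conn.execute(
        "SELECT valid_from, body FROM article_versions "
        "WHERE code = ? AND article = ? AND valid_from <= ? "
        "ORDER BY valid_from DESC LIMIT 1",
        (code, article, as_of),
    ).fetchone()


def list_amendments(conn, code, article, year):
    """Tool: return amendments to an article that took effect in a given year."""
    return conn.execute(
        "SELECT valid_from, amended_by FROM article_versions "
        "WHERE code = ? AND article = ? AND amended_by IS NOT NULL "
        "AND substr(valid_from, 1, 4) = ? ORDER BY valid_from",
        (code, article, str(year)),
    ).fetchall()


TOOLS = {"get_article_version": get_article_version, "list_amendments": list_amendments}


def run_agent(conn, plan):
    """Execute a sequence of tool calls and collect the observations.

    A real agent would let the model pick each next call after reading the
    previous observation; here `plan` is a fixed script standing in for that.
    """
    observations = []
    for tool_name, kwargs in plan:
        observations.append((tool_name, TOOLS[tool_name](conn, **kwargs)))
    return observations


conn = make_db()
plan = [
    ("list_amendments", {"code": "Labor Code", "article": 80, "year": 2023}),
    ("get_article_version", {"code": "Labor Code", "article": 80, "as_of": "2023-12-31"}),
]
for name, result in run_agent(conn, plan):
    print(name, "->", result)
```

The point of the sketch is the shape: small parameterized tools with fixed SQL, and a loop that chains them. Whether the model should also be allowed to write free-form SQL is one of the trade-offs the research phase would need to weigh.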

Proof-of-Concept

Build a minimal PoC comparing:

  • Current: RAG + Qdrant vector search
  • Alternative: Agentic search with direct PostgreSQL queries

Test queries:

  • "liability for breach of contract" (semantic)
  • "article 564 Civil Code effective date 2020" (structured)
  • "all amendments to Labor Code article 80 in 2023" (consolidation)
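
A comparison like this needs a shared harness so both approaches are scored the same way. The sketch below is one possible shape, not an existing Law7 script: `recall_at_k`, `benchmark`, the `stub_search` function, and the document IDs in `CASES` are all hypothetical placeholders. In the PoC, `stub_search` would be replaced by one wrapper around the current Qdrant path and one around the agentic path, and the relevant IDs would come from hand-labeled judgments.

```python
import time
from statistics import mean


def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant document IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def benchmark(search_fn, cases, k=5):
    """Run every case through search_fn; report mean recall@k and mean latency."""
    recalls, latencies = [], []
    for query, relevant_ids in cases:
        start = time.perf_counter()
        retrieved = search_fn(query)
        latencies.append(time.perf_counter() - start)
        recalls.append(recall_at_k(retrieved, relevant_ids, k))
    return {"recall_at_k": mean(recalls), "mean_latency_s": mean(latencies)}


# The three test queries from this issue, paired with placeholder relevance
# labels. The IDs are invented for illustration only.
CASES = [
    ("liability for breach of contract", ["doc-1", "doc-2"]),
    ("article 564 Civil Code effective date 2020", ["doc-3"]),
    ("all amendments to Labor Code article 80 in 2023", ["doc-4", "doc-5"]),
]

# Stub backend that simply echoes the labels, used only to exercise the harness.
_LABELS = dict(CASES)


def stub_search(query):
    return list(_LABELS.get(query, []))


print(benchmark(stub_search, CASES, k=5))
```

Recall@k and latency are only two candidate metrics; answer correctness for consolidation queries would probably need a separate, manually reviewed check.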

Alternatives Considered

  1. Keep current RAG + Qdrant architecture

    • Pros: Proven, working well for semantic search
    • Cons: Embedding staleness, complexity of vector DB management
  2. Hybrid approach: Agentic + Vector search

    • Use agents for structured queries, vector DB for semantic
    • Pros: Best of both worlds
    • Cons: Increased complexity
  3. Full migration to agentic search

    • Pros: Simpler architecture, no embeddings, always fresh
    • Cons: May lose some semantic matching capability
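
For the hybrid option, the key design question is how a query gets routed. The sketch below shows one naive possibility, a regex heuristic; the `route` function, its three labels, and the patterns are assumptions for illustration, and a real router might instead let the agent itself decide which tool to call.

```python
import re

# Heuristic patterns (assumptions, tuned only to the example queries here).
ARTICLE_RE = re.compile(r"\barticle\s+\d+", re.IGNORECASE)
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")
AMENDMENT_RE = re.compile(r"\bamendments?\b", re.IGNORECASE)


def route(query):
    """Classify a query as 'consolidation', 'structured', or 'semantic'."""
    if AMENDMENT_RE.search(query):
        return "consolidation"  # amendment history: SQL over article versions
    if ARTICLE_RE.search(query) or YEAR_RE.search(query):
        return "structured"  # explicit article number or year: direct SQL lookup
    return "semantic"  # free-text concept: vector search


for q in [
    "liability for breach of contract",
    "article 564 Civil Code effective date 2020",
    "all amendments to Labor Code article 80 in 2023",
]:
    print(f"{route(q):13} <- {q}")
```

A heuristic like this is cheap but brittle; measuring how often it misroutes real queries would be part of evaluating the "increased complexity" cost listed above.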

Additional Context

Current Architecture:

  • PostgreSQL: 157K+ legal documents with full text + trigram indexes
  • Qdrant: 768-dimensional embeddings (deepvk/USER2-base)
  • MCP Tools: query-laws, get-law, get-article-version, trace-amendment-history
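
Since an agentic approach would lean on the existing trigram indexes for fuzzy keyword matching, it may help to recall roughly what trigram similarity measures. The sketch below approximates the idea behind PostgreSQL's `pg_trgm` (lowercase, pad each word, compare sets of three-character substrings); the `trigrams` and `similarity` helpers are illustrative only and are not guaranteed to match the extension's output in every edge case.

```python
import re


def trigrams(text):
    """Approximate pg_trgm: lowercase, split into words, pad, take 3-grams."""
    grams = set()
    for word in re.findall(r"\w+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by total distinct trigrams (Jaccard index)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))
print(round(similarity("contract", "contrat"), 3))
```

This kind of matching tolerates typos and inflected forms but has no notion of meaning, which is the gap the Qdrant embeddings currently fill.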

Relevant Files:

  • src/tools/query-laws.ts - Current hybrid search implementation
  • scripts/indexer/embeddings.py - Embedding generation
  • scripts/indexer/qdrant_client.py - Vector DB operations

Related Research:

Implementation Ideas (Optional)

Research Artifacts to Create

| Document | Description |
| --- | --- |
| docs/AGENTIC_SEARCH_RESEARCH.md | Detailed findings on agentic search patterns |
| docs/AGENTIC_SEARCH_POC.md | Proof-of-concept results and comparison |
| docs/AGENTIC_SEARCH_ARCHITECTURE.md | Recommended architecture for Law7 |

Potential Code Changes for PoC

  • src/tools/query-laws-agentic.ts - Agentic search variant
  • src/agents/ - Agent logic for legal queries
  • Benchmark scripts comparing approaches

Timeline

Week 1: Research and documentation

  • Study agentic search patterns
  • Document findings, pros/cons
  • Design PoC architecture

Week 2: Proof-of-concept

  • Build minimal agentic search PoC
  • Run comparison benchmarks vs. current RAG
  • Create decision matrix

Reference

Priority

MEDIUM - Research phase to inform potential architectural evolution

Success Criteria

  1. Clear understanding of agentic search vs. RAG for legal documents
  2. PoC demonstrating feasibility
  3. Data-driven recommendation (keep current / hybrid / migrate)
  4. Architecture guidance if migration is recommended
