WIP#536

Open
renatocron wants to merge 1 commit into homol from feat/duckdb-sidecar

Conversation

@renatocron
Contributor

@renatocron renatocron commented Feb 4, 2026

Summary by CodeRabbit

  • New Features

    • Added advanced search capabilities including vector similarity search, full-text search, and hybrid search combining both methods.
    • Implemented DuckDB integration for enhanced data indexing and query performance.
  • Documentation

    • Added comprehensive DuckDB module documentation with configuration and usage guidelines.
    • Added LLM integration opportunities report outlining strategic initiatives and implementation roadmap.
    • Added example implementations demonstrating search functionality patterns.

@coderabbitai
Contributor

coderabbitai bot commented Feb 4, 2026

Walkthrough

This PR introduces a comprehensive DuckDB sidecar architecture for the SMAE system, enabling vector similarity search, BM25 full-text search, and hybrid search capabilities. It adds a separate DuckDB process managing IPC communication, new NestJS services for search operations, PostgreSQL attachment, index creation (HNSW for vectors, FTS for text), health checks, and example usage patterns. Documentation includes an LLM opportunities blueprint and technical README.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| **DuckDB Sidecar Process**<br>`backend/src/bin/run-duckdb-sidecar.ts`, `backend/src/common/duckdb/duckdb-sidecar.service.ts` | New sidecar binary and lifecycle manager supporting IPC commands (`query`, `attach_postgres`, `install_extension`, `ping`), automatic restart with rate limiting, health checks, message correlation, and graceful shutdown. |
| **DuckDB Provider Enhancement**<br>`backend/src/common/duckdb/duckdb-provider.service.ts` | Added `DuckDBConfig` interface and new public methods: `createInstance`, `createVectorSearchInstance`, `attachPostgres`, `createHNSWIndex`, and `createFTSIndex`. Refactored internal configuration logic with S3 support and extension management. |
| **DuckDB Search Service**<br>`backend/src/common/duckdb/duckdb-search.service.ts` | New search service providing `vectorSearch`, `bm25Search`, and `hybridSearch` (RRF fusion), table synchronization from Postgres, index creation, and arbitrary SQL execution with Portuguese stemming support. |
| **Module & Exports**<br>`backend/src/common/duckdb/duckdb.module.ts`, `backend/src/common/duckdb/index.ts` | Module updated to import `ConfigModule` and export new `DuckDBSidecarService` and `DuckDBSearchService` alongside existing `DuckDBProviderService`; barrel exports consolidated. |
| **Documentation & Examples**<br>`backend/LLM_OPPORTUNITIES_REPORT.md`, `backend/src/common/duckdb/README.md`, `backend/src/common/duckdb/example-usage.ts` | New LLM opportunities blueprint, DuckDB architecture README, and example services demonstrating text, embedding, and hybrid search patterns via REST endpoints and advanced SQL analysis. |

Sequence Diagram(s)

sequenceDiagram
    participant Main as Main Process
    participant Sidecar as DuckDB Sidecar
    participant DDB as DuckDB Engine
    participant PG as PostgreSQL

    Main->>Sidecar: fork() & start
    Sidecar->>DDB: initialize & load extensions
    Sidecar->>Main: ready event
    
    Main->>Sidecar: attachPostgres(connStr, alias)
    Sidecar->>DDB: ATTACH DATABASE
    DDB->>PG: establish connection
    PG-->>DDB: connected
    
    Main->>Sidecar: syncFromPostgres(table)
    Sidecar->>PG: SELECT * FROM table
    PG-->>Sidecar: data rows
    Sidecar->>DDB: CREATE TABLE & INSERT data
    DDB-->>Sidecar: sync complete
    
    Main->>Sidecar: createHNSWIndex(table, vector_col)
    Sidecar->>DDB: CREATE INDEX with HNSW
    DDB-->>Sidecar: index ready
    
    Main->>Sidecar: query(vectorSearch SQL)
    Sidecar->>DDB: execute ANN query
    DDB-->>Sidecar: results with scores
    Sidecar-->>Main: QueryResult {data, error}
    
    Main->>Sidecar: ping()
    Sidecar-->>Main: pong
    
    Main->>Sidecar: stopSidecar()
    Sidecar->>DDB: close connection
    Sidecar-->>Main: exit(0)
sequenceDiagram
    participant Client as Client / Controller
    participant Search as DuckDBSearchService
    participant Sidecar as DuckDBSidecarService
    participant DDB as DuckDB Sidecar

    Client->>Search: vectorSearch(table, queryVector, topK)
    Search->>Search: validate sidecar ready
    
    alt Sidecar Ready
        Search->>Sidecar: query(vector similarity SQL)
        Sidecar->>DDB: execute SELECT with distance calc
        DDB-->>Sidecar: ranked results
        Sidecar-->>Search: QueryResult {data}
        Search->>Search: map to VectorSearchResult[]
        Search-->>Client: results with id, score, metadata
    else Sidecar Not Ready
        Search-->>Client: error or empty results
    end

    Client->>Search: bm25Search(table, queryText, topK)
    Search->>Sidecar: query(BM25 FTS SQL)
    Sidecar->>DDB: execute FTS query with ranking
    DDB-->>Sidecar: ranked text results
    Sidecar-->>Search: QueryResult {data}
    Search->>Search: map to BM25SearchResult[]
    Search-->>Client: results with id, score, content

    Client->>Search: hybridSearch(table, queryVector, queryText, topK)
    Search->>Search: parallelize vector & BM25 queries
    Search->>Sidecar: query(vector search)
    Search->>Sidecar: query(BM25 search)
    Sidecar->>DDB: process both queries
    DDB-->>Sidecar: vector results + BM25 results
    Sidecar-->>Search: both QueryResults
    Search->>Search: RRF fusion & combine scores
    Search-->>Client: HybridSearchResult[] with semanticScore, lexicalScore, combinedScore
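The hybrid search flow in the diagram above ends with an RRF fusion step. As a rough illustration of what Reciprocal Rank Fusion does, the sketch below uses the common formulation in which each ranked list contributes `1 / (k + rank)` to a document's score. The `rrfFuse` name, the `k = 60` default, and the result shape are assumptions made for this example; the PR's `hybridSearch` may weight or normalize scores differently.

```typescript
// Hypothetical helper, not taken from the PR: illustrates Reciprocal Rank Fusion.
interface FusedResult {
    id: string;
    combinedScore: number;
}

/**
 * Fuses ranked id lists (best match first).
 * Each list adds 1 / (k + rank) to a document's score, with rank starting at 1.
 */
function rrfFuse(rankedLists: string[][], k = 60): FusedResult[] {
    const scores = new Map<string, number>();
    for (const list of rankedLists) {
        list.forEach((id, index) => {
            const contribution = 1 / (k + index + 1);
            scores.set(id, (scores.get(id) ?? 0) + contribution);
        });
    }
    // Highest combined score first.
    return Array.from(scores.entries())
        .map(([id, combinedScore]) => ({ id, combinedScore }))
        .sort((a, b) => b.combinedScore - a.combinedScore);
}

// Example: ids ranked by vector similarity and by BM25.
const vectorRanking = ['a', 'b', 'c'];
const bm25Ranking = ['b', 'd'];
const fusedResults = rrfFuse([vectorRanking, bm25Ranking]);
```

A document that appears in both lists (`b` here) accumulates two contributions, so it outranks documents that appear in only one list even if they hold a better position there.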

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • GustavoFSoares

Poem

🐰 A sidecar hops into the night,
Vector embeddings dancing in delight,
BM25 words and Postgres tales,
DuckDB's HNSW never fails!
Search hybrid, swift, with RRF's grace,
We've built a nest in cyberspace! 🥕✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'WIP' is vague and non-descriptive; it does not convey meaningful information about the changeset, which includes DuckDB sidecar implementation, search services, and LLM opportunities documentation. | Replace with a descriptive title such as 'Add DuckDB sidecar architecture with vector and BM25 search capabilities' that clearly summarizes the main changes. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 18

🤖 Fix all issues with AI agents
In `@backend/LLM_OPPORTUNITIES_REPORT.md`:
- Around line 37-45: The fenced ASCII-art block starting with
"┌─────────────────────────────────────────────────────────────────┐" and other
fenced blocks flagged by MD040/MD058 must have language identifiers (e.g.,
```text) and be surrounded by blank lines; also insert a blank line before and
after the Markdown table under "Modelo de Valor:" and ensure that table rows
remain fenced by blank lines. Update the specific fenced blocks containing the
"ANTES/DEPOIS" ASCII boxes and the "Modelo de Valor:" table (and the other
flagged blocks at the ranges mentioned) to add the language tag on the opening
``` and add one blank line above and below each fenced block and the table so
markdownlint MD040/MD058 are resolved.

In `@backend/src/bin/run-duckdb-sidecar.ts`:
- Around line 15-28: Sidecar IPC messages lack correlation IDs causing responses
to be misrouted under concurrency; add a requestId string (or number) to
SidecarRequest and have the sidecar include the same requestId on every
SidecarResponse variant (query_result, query_error, attached,
extension_installed, pong, ready, error) and update the IPC send/receive
handlers to route responses by requestId. Concretely: extend the SidecarRequest
union to include requestId, extend every SidecarResponse interface to include
requestId, update the code paths that send responses (the sidecar message
emitter/handler) to echo requestId back, and update the caller-side promise
registry (the code that sends a request and awaits a response) to key promises
by requestId and resolve/reject only when a response with the matching requestId
arrives. Ensure all places that construct or consume
SidecarRequest/SidecarResponse (types SidecarRequest, SidecarResponse and the
IPC onMessage/send logic) are updated accordingly.

In `@backend/src/common/duckdb/duckdb-provider.service.ts`:
- Around line 135-147: The CREATE INDEX call builds SQL by interpolating
tableName and columnName directly (see indexName and the db.run(...) block)
which allows identifier injection; validate/sanitize these identifiers the same
way used in the search service (e.g., an allowlist/regex check for [A-Za-z0-9_]+
and/or a helper like validateIdentifier(name)) before composing indexName and
the SQL, and throw or reject when validation fails so only safe table/column
names are used in the db.run call.
- Around line 163-173: The PR injects raw identifiers into the FTS PRAGMA call
(columnsStr / textColumns, tableName, idColumn) which repeats the
identifier-injection risk; update the create_fts_index usage in
duckdb-provider.service.ts to apply the same identifier validation/sanitization
used elsewhere (e.g., the existing validateIdentifier or sanitizeIdentifier
helper) to tableName, idColumn and every entry of textColumns before building
columnsStr, and build the PRAGMA call only from validated/escaped identifiers
(or use a safe parameterized approach if available) so no raw user-controlled
identifier is interpolated into db.run in create_fts_index.
- Around line 206-228: The bug is that useSsl is computed after the protocol is
stripped from endpoint so it is always false; in duckdb-provider.service.ts
compute a boolean (e.g., useSsl) immediately after fetching endpoint by checking
the original string for a leading 'https' (using
smaeConfigService.getConfig('S3_HOST') result) before calling endpoint.replace
to remove the scheme, then use that boolean when building config.s3Config (in
the DuckDBConfig creation) and keep the sanitized endpoint (without protocol)
for the endpoint field.
- Around line 105-113: The attachPostgres method currently interpolates
connectionString into the SQL which allows SQL injection if the string contains
a single quote; update attachPostgres to safely escape single quotes in
connectionString (e.g., replace each ' with '' ) before calling db.run(`ATTACH
'${escapedConnectionString}' AS ${alias} (TYPE postgres)`), or better yet use a
parameterized/bound query API of Database if available; ensure the changed
symbol is attachPostgres and the db.run invocation is the only place modified.
- Around line 79-100: The createVectorSearchInstance function currently ignores
the vectorDimension parameter when computing estimatedMemoryMB; update the
memory estimation to include vectorDimension by accounting for bytes per
embedding element (e.g., 4 bytes for float32) times estimatedRows and adding any
index overhead (previously approximated as ~1KB per vector), then convert to MB
and apply the same min/max clamp used for memoryLimit; update the calculation
that assigns estimatedMemoryMB (and keep using memoryLimit, DuckDBConfig, and
the existing createInstance/db run flow) so the new estimate reflects both
estimatedRows and vectorDimension.
- Around line 186-197: The SQL interpolation for creating the secret (db.run
with the CREATE OR REPLACE SECRET smaep_s3_secret block) directly inserts
config.accessKey, config.secretKey, config.region, endpoint, urlStyle and should
be sanitized to prevent SQL injection and syntax breaks; fix by escaping single
quotes in those values (e.g., replace ' with '') or, if the DB client supports
parameterized/prepared statements, switch to parameter binding for those fields,
and update the call site that constructs the SQL to use the sanitizer/helper
(escapeSqlString) or parameters before passing to db.run.
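A minimal sketch of the two helpers these prompts refer to. The names `validateIdentifier` and `escapeSqlString` are the ones suggested in the review, not functions that already exist in the PR, and `buildAttachSql` is a hypothetical wrapper added only to show them together. The escaping assumes standard SQL string literals, where a single quote is written as two single quotes.

```typescript
// Accepts only plain SQL identifiers: letters, digits, underscores,
// and no leading digit.
const IDENTIFIER_PATTERN = /^[A-Za-z_][A-Za-z0-9_]*$/;

/** Returns the name unchanged when it is a safe identifier, otherwise throws. */
function validateIdentifier(name: string): string {
    if (!IDENTIFIER_PATTERN.test(name)) {
        throw new Error(`Invalid SQL identifier: ${JSON.stringify(name)}`);
    }
    return name;
}

/** Escapes a value for use inside a single-quoted SQL string literal. */
function escapeSqlString(value: string): string {
    return value.replace(/'/g, "''");
}

/**
 * Hypothetical wrapper: builds the ATTACH statement from the review comment
 * using an escaped connection string and a validated alias.
 */
function buildAttachSql(connectionString: string, alias: string): string {
    const safeConnection = escapeSqlString(connectionString);
    const safeAlias = validateIdentifier(alias);
    return `ATTACH '${safeConnection}' AS ${safeAlias} (TYPE postgres)`;
}
```

Where the client offers bound parameters, those remain preferable to manual escaping; the helpers only cover the places where a value or identifier has to be interpolated into the statement text.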

In `@backend/src/common/duckdb/duckdb-search.service.ts`:
- Around line 279-297: The current code builds SQL by interpolating identifiers
and the fields array (see escapedQuery, fieldsParam, tableName, idCol, and
fts_main_${tableName}.match_bm25), leaving identifier-injection risk; fix by
validating/sanitizing identifiers instead of raw interpolation: enforce a strict
whitelist or regex (e.g. /^[A-Za-z_][A-Za-z0-9_]*$/) for tableName, idCol and
every entry in fields, reject or throw on invalid names, construct fieldsParam
from the validated names only (or map to quoted-safe forms), and where possible
switch the query text itself to a parameterized approach for the user-supplied
string (escapedQuery) rather than manual quote concatenation to eliminate SQL
injection risk.
- Around line 203-239: The query construction in DuckDBSearchService is
vulnerable because options.where and additionalCols are directly interpolated
into SQL (see options.where, additionalCols, idCol, embeddingCol, tableName, and
the SQL template in the method), so replace direct interpolation with safe
handling: validate additionalCols and idCol/embeddingCol/tableName as SQL
identifiers against a strict regex (allow only alphanumerics and underscores)
and whitelist or reject invalid names, and remove or replace options.where with
a structured filter object that you convert into a parameterized WHERE clause
(build expressions from known operators and column names, use parameter
placeholders for values and pass a params array to the DuckDB client) or else
drop the free-text where entirely and require caller-side filtering; ensure
ORDER BY and LIMIT still use validated/whitelisted values rather than raw input.
- Around line 137-141: The SQL string that builds CREATE INDEX (the const sql
using tableName, columnName, metric) is vulnerable to SQL injection because
tableName, columnName and metric are interpolated directly; update the code in
duckdb-search.service.ts to validate or sanitize these identifiers before use —
restrict tableName and columnName to a safe whitelist or to a strict regex for
valid SQL identifiers (letters, digits, underscores) and escape them as quoted
identifiers (double-quote them after validation), and validate metric against an
allowed set (e.g., 'cosine','euclidean','dot') before interpolating; apply these
checks where the const sql is created and throw or reject invalid values instead
of interpolating raw input.
- Around line 162-171: The PRAGMA string interpolation in
duckdb-search.service.ts (variables tableName, idColumn, and columnsStr built
from textColumns) is vulnerable to SQL injection; validate each identifier
(tableName, idColumn, and every entry in textColumns) against a strict
identifier regex (e.g. only letters, digits, and underscores, starting with a
letter or underscore) and reject or sanitize inputs that don't match, or
alternatively escape double quotes by doubling them and wrap identifiers in
double quotes before interpolation; ensure columnsStr is built from the
validated/escaped textColumns values (not raw user input) and use the validated
tableName and idColumn when constructing the PRAGMA create_fts_index SQL string
so no unvalidated quotes or characters can inject SQL.
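One way to replace the free-text `options.where` mentioned above is a structured filter that is converted into a parameterized clause. The sketch below is only an assumption about what such a builder could look like: the `Filter` type, the operator list, and `buildWhere` are hypothetical, and it presumes the DuckDB client accepts `?` positional placeholders with a separate parameter array.

```typescript
type FilterOperator = '=' | '!=' | '<' | '<=' | '>' | '>=';
type FilterValue = string | number | boolean;

// Hypothetical structured filter replacing a raw WHERE string.
interface Filter {
    column: string;
    op: FilterOperator;
    value: FilterValue;
}

const ALLOWED_OPERATORS: ReadonlySet<string> = new Set(['=', '!=', '<', '<=', '>', '>=']);
const SAFE_COLUMN = /^[A-Za-z_][A-Za-z0-9_]*$/;

/**
 * Builds "WHERE col op ? AND ..." plus the matching parameter list.
 * Column names and operators are validated; values never enter the SQL text.
 */
function buildWhere(filters: Filter[]): { clause: string; params: FilterValue[] } {
    const parts: string[] = [];
    const params: FilterValue[] = [];
    for (const filter of filters) {
        if (!SAFE_COLUMN.test(filter.column)) {
            throw new Error(`Invalid column name: ${JSON.stringify(filter.column)}`);
        }
        // Checked at runtime too, since callers may bypass the type system.
        if (!ALLOWED_OPERATORS.has(filter.op)) {
            throw new Error(`Unsupported operator: ${JSON.stringify(filter.op)}`);
        }
        parts.push(`"${filter.column}" ${filter.op} ?`);
        params.push(filter.value);
    }
    const clause = parts.length > 0 ? `WHERE ${parts.join(' AND ')}` : '';
    return { clause, params };
}

const exampleWhere = buildWhere([
    { column: 'orgao_id', op: '=', value: 7 },
    { column: 'ano', op: '>=', value: 2024 },
]);
```

The resulting `clause` can be appended to a query whose identifiers were validated the same way, with `params` passed alongside it to the client.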

In `@backend/src/common/duckdb/duckdb-sidecar.service.ts`:
- Around line 7-22: The IPC protocol currently lacks correlation IDs causing
FIFO-based mismatches; update the types SidecarRequest and SidecarResponse to
include a requestId (string) on all request/response variants, update
QueryResult to carry requestId when used over IPC, and change the
sender/receiver logic that currently resolves responses FIFO to instead match
incoming responses by requestId (e.g., map pending promises by requestId and
resolve/reject the matching promise when a SidecarResponse with that requestId
arrives). Ensure all places that create requests (types under SidecarRequest)
set a unique requestId and all response handlers (types under SidecarResponse
and any code paths that produce query results or errors) include the same
requestId so resolution uses the ID map rather than queue order.
- Around line 351-372: In startHealthCheck(), the timeout callback currently
only restarts the sidecar if lastPongTime exists and is older than pingTime, so
the “no pong ever received” case is ignored; update the logic in the setTimeout
inside startHealthCheck() (referencing lastPongTime, PONG_TIMEOUT_MS,
childProcess, isSidecarReady) to treat a null/undefined lastPongTime as a
failure condition (i.e., if !this.lastPongTime || this.lastPongTime.getTime() <
pingTime) then log an error and force-restart the sidecar by killing
this.childProcess (preserving the existing SIGKILL behavior).
- Around line 187-210: The syncFromPostgres method builds SQL by directly
interpolating tableName, columns, options.where and options.limit, creating SQL
injection risk; update syncFromPostgres to validate/whitelist tableName and
options.duckdbTableName against allowed identifiers, validate each entry in
options.columns against a safe column-name pattern or whitelist, construct the
WHERE predicates via a safe predicate builder (or accept structured filter
objects) and pass any dynamic values (e.g. filter values and limit) as
parameters to the existing query(sql, params) call instead of string
concatenation, keeping only safe identifiers (validated) in the SQL string and
all user-supplied values parameterized via the query(...) method.
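The health-check item above comes down to one predicate: a ping counts as unanswered when no pong has ever arrived, or when the last pong predates the ping. A small sketch, assuming `lastPongTime` is a `Date` (or unset) and `pingTime` is an epoch timestamp in milliseconds, as the prompt implies; `isPongMissing` is a hypothetical name.

```typescript
/**
 * Returns true when the sidecar should be treated as unresponsive:
 * either no pong was ever received, or the latest pong is older than the ping.
 */
function isPongMissing(lastPongTime: Date | null | undefined, pingTime: number): boolean {
    return !lastPongTime || lastPongTime.getTime() < pingTime;
}

const pingSentAt = 1_000;
const neverAnswered = isPongMissing(null, pingSentAt);            // true
const stalePong = isPongMissing(new Date(900), pingSentAt);       // true
const freshPong = isPongMissing(new Date(1_050), pingSentAt);     // false
```

The original condition covered only the stale case, so a sidecar that never answered a single ping would not have been restarted.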

In `@backend/src/common/duckdb/example-usage.ts`:
- Around line 136-169: Convert the buscarImagens handler to accept a validated
DTO (e.g., BuscaImagensQueryDto) instead of raw query strings; parse
query.embedding inside a try/catch in buscarImagens to catch JSON.parse errors,
validate that the parsed value is an array of numbers, and throw an
HttpException(400) with a clear message on invalid input; ensure you use the
DTO's limite (defaulting to 10) when calling imgSearch.buscarImagensSimilares;
reference the buscarImagens method and the new BuscaImagensQueryDto so reviewers
can find and update the controller and validation classes.
- Around line 144-153: Wrap each switch case body in a block to satisfy ESLint
no-case-declarations: add braces around the 'semantica' and 'hibrida' case
bodies so the local const declarations (embedding, embeddingHibrido) are
block-scoped; keep the existing awaits to this.gerarEmbedding(query) and the
return calls to this.docSearch.buscarPorEmbedding(...) and
this.docSearch.buscarHibrida(...) inside those new braces.
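The embedding validation requested for `buscarImagens` could look roughly like the sketch below. `parseEmbedding` and `InvalidEmbeddingError` are hypothetical names used to keep the example free of framework imports; in the NestJS controller, the caller would presumably translate the thrown error into the `HttpException(400)` the prompt asks for.

```typescript
// Hypothetical error type; a controller would map it to an HTTP 400 response.
class InvalidEmbeddingError extends Error {
    constructor(message: string) {
        super(message);
        this.name = 'InvalidEmbeddingError';
    }
}

/** Parses a JSON-encoded embedding and checks that it is an array of numbers. */
function parseEmbedding(raw: string): number[] {
    let parsed: unknown;
    try {
        parsed = JSON.parse(raw);
    } catch (err) {
        throw new InvalidEmbeddingError('embedding must be valid JSON');
    }
    if (!Array.isArray(parsed) || !parsed.every((value) => typeof value === 'number')) {
        throw new InvalidEmbeddingError('embedding must be an array of numbers');
    }
    return parsed as number[];
}

const parsedEmbedding = parseEmbedding('[0.1, 0.2, 3]');
```

Keeping the parsing in a small function makes the failure modes (malformed JSON, wrong shape) easy to test separately from the controller.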

In `@backend/src/common/duckdb/README.md`:
- Around line 11-39: The Markdown fenced code blocks in the DuckDB README (the
ASCII architecture diagram and the log snippet containing lines like
"[DuckDBSidecar] Iniciando DuckDB sidecar process..." and "[DuckDBSearch] Tabela
documentos sincronizada...") lack language identifiers causing markdownlint
MD040 warnings; update each triple-backtick fence that wraps the diagram and the
log snippet to use a language tag (e.g., ```text) so the fences become ```text
... ```, and ensure all other similar fenced blocks (including the ones around
the log snippet and the additional block referenced) are updated consistently.
🧹 Nitpick comments (4)
backend/src/common/duckdb/duckdb-sidecar.service.ts (1)

133-180: Clean up IPC listeners on timeout to avoid leaks.

If the timeout fires, the handler remains attached and can resolve future messages unexpectedly.

🧽 Suggested fix
-            const timer = setTimeout(() => {
-                resolve(false);
-            }, 30000);
+            const timer = setTimeout(() => {
+                this.childProcess!.off('message', handler);
+                resolve(false);
+            }, 30000);
@@
-            const timer = setTimeout(() => {
-                resolve(false);
-            }, 60000);
+            const timer = setTimeout(() => {
+                this.childProcess!.off('message', handler);
+                resolve(false);
+            }, 60000);
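The same cleanup can be factored into one helper so that both the success path and the timeout path detach the listener. This is a sketch under assumptions: `waitForMessage` and the `MessageSource` interface are hypothetical, with the interface standing in for whatever emitter-like object (such as the child process handle) exposes `on` and `off` for `'message'`.

```typescript
type MessageHandler = (msg: unknown) => void;

// Minimal emitter-like surface assumed for this sketch.
interface MessageSource {
    on(event: 'message', handler: MessageHandler): unknown;
    off(event: 'message', handler: MessageHandler): unknown;
}

/**
 * Resolves true when a matching message arrives, false on timeout.
 * The listener and the timer are both released on either path.
 */
function waitForMessage(
    source: MessageSource,
    matches: (msg: unknown) => boolean,
    timeoutMs: number,
): Promise<boolean> {
    return new Promise<boolean>((resolve) => {
        const cleanup = (): void => {
            clearTimeout(timer);
            source.off('message', handler);
        };
        const handler: MessageHandler = (msg) => {
            if (!matches(msg)) return; // ignore unrelated messages
            cleanup();
            resolve(true);
        };
        const timer = setTimeout(() => {
            cleanup();
            resolve(false);
        }, timeoutMs);
        source.on('message', handler);
    });
}
```

With a single `cleanup` function there is no path on which the handler stays attached after the promise has settled.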
backend/src/bin/run-duckdb-sidecar.ts (1)

164-210: Inputs to these functions are trusted (environment-configured or hardcoded); SQL injection risk is not present.

`alias` defaults to 'postgres' and is never user-controlled; `connectionString` comes from environment configuration (DATABASE_URL); `name` only ever receives hardcoded extension names (vss, fts, postgres, spatial, httpfs). Since the sidecar is reached only through internal IPC and no public endpoint exposes these functions to external input, the interpolation pattern here does not create a practical vulnerability.

If this code is refactored to accept external parameters in the future, input validation would become necessary. For defense-in-depth, identifier validation for alias and name is still a reasonable hardening measure, but not required given current usage.

backend/src/common/duckdb/duckdb-search.service.ts (1)

378-384: Consider documenting the security implications of this escape hatch.

This method allows arbitrary SQL execution. While useful, it should be clearly documented that callers are responsible for preventing SQL injection. Alternatively, mark it as private or protected if it's only for internal use.

backend/src/common/duckdb/duckdb-provider.service.ts (1)

53-65: Defensive improvement: validate configuration values before interpolation.

memoryLimit, threads, and extension names are interpolated into SQL. While currently sourced from internal config, validating these values adds defense-in-depth.

🛡️ Proposed validation
     async createInstance(config: DuckDBConfig = {}): Promise<Database> {
         const dbPath = config.dbPath ?? ':memory:';
         const memoryLimit = config.memoryLimit ?? '2GB';
         const threads = config.threads ?? 2;
         const extensions = config.extensions ?? ['httpfs', 'vss', 'fts', 'postgres'];

+        // Validate memory limit format
+        if (!/^\d+(?:KB|MB|GB)$/i.test(memoryLimit)) {
+            throw new Error(`Invalid memory limit format: ${memoryLimit}`);
+        }
+        
+        // Validate threads is a positive integer
+        if (!Number.isInteger(threads) || threads < 1) {
+            throw new Error(`Invalid threads value: ${threads}`);
+        }
+        
+        // Validate extension names
+        const validExtensions = ['httpfs', 'vss', 'fts', 'postgres', 'json', 'parquet'];
+        for (const ext of extensions) {
+            if (!validExtensions.includes(ext)) {
+                throw new Error(`Unknown extension: ${ext}`);
+            }
+        }
+
         this.logger.log(`Criando instância DuckDB (${dbPath})...`);

Comment on lines +37 to +45
```
┌─────────────────────────────────────────────────────────────────┐
│ ANTES: Coordenador lê 50+ análises mensais manualmente │
│ DEPOIS: LLM gera resumo executivo com destaque para riscos │
│ │
│ ANTES: Usuário escreve análise do zero │
│ DEPOIS: LLM sugere baseado em dados históricos similares │
└─────────────────────────────────────────────────────────────────┘
```

⚠️ Potential issue | 🟡 Minor

Fix markdownlint warnings (MD040/MD058).

Add language identifiers to fenced blocks and surround the table with blank lines.

🧹 Suggested fix (representative)
 **Valor Proposto:**
-```
+```text
 ┌─────────────────────────────────────────────────────────────────┐
 │  ANTES: Coordenador lê 50+ análises mensais manualmente         │
@@
-```
+```

 **Modelo de Valor:**
+
 | Campo Atual | Uso de LLM | Economia |
 |-------------|-----------|----------|
 | `Transferencia.objeto` | Classificação automática em áreas temáticas | 70% do tempo de triagem |
 | `Transferencia.detalhamento` | Extração de entidades (orgãos, valores, prazos) | 50% do tempo de cadastro |
 | Anexos (PDF) | Resumo automático para análise prévia | 60% do tempo de revisão |
+

Apply the same pattern to the other fenced blocks flagged by MD040.

Also applies to: 59-63, 100-105, 230-247, 251-262

🧰 Tools

🪛 markdownlint-cli2 (0.20.0)

[warning] 37-37: Fenced code blocks should have a language specified (MD040, fenced-code-language)

Comment on lines +15 to +28
type SidecarRequest =
| { type: 'query'; sql: string; params?: any[] }
| { type: 'attach_postgres'; connectionString: string; alias?: string }
| { type: 'install_extension'; name: string }
| { type: 'ping' };

type SidecarResponse =
| { event: 'query_result'; data: any[]; error?: string }
| { event: 'query_error'; error: string }
| { event: 'attached'; alias: string; error?: string }
| { event: 'extension_installed'; name: string; error?: string }
| { event: 'pong' }
| { event: 'ready' }
| { event: 'error'; error: string };

⚠️ Potential issue | 🟠 Major

Add request IDs to IPC to avoid mismatched query results under concurrency.

Without correlation, concurrent queries can resolve the wrong promise. Include a requestId in both the request and response and echo it back from the sidecar.

✅ Suggested fix (IPC correlation)
-type SidecarRequest =
-    | { type: 'query'; sql: string; params?: any[] }
-    | { type: 'attach_postgres'; connectionString: string; alias?: string }
-    | { type: 'install_extension'; name: string }
-    | { type: 'ping' };
+type SidecarRequest =
+    | { id: string; type: 'query'; sql: string; params?: any[] }
+    | { id: string; type: 'attach_postgres'; connectionString: string; alias?: string }
+    | { id: string; type: 'install_extension'; name: string }
+    | { id: string; type: 'ping' };

-type SidecarResponse =
-    | { event: 'query_result'; data: any[]; error?: string }
-    | { event: 'query_error'; error: string }
-    | { event: 'attached'; alias: string; error?: string }
-    | { event: 'extension_installed'; name: string; error?: string }
-    | { event: 'pong' }
-    | { event: 'ready' }
-    | { event: 'error'; error: string };
+type SidecarResponse =
+    | { id: string; event: 'query_result'; data: any[]; error?: string }
+    | { id: string; event: 'query_error'; error: string }
+    | { id: string; event: 'attached'; alias: string; error?: string }
+    | { id: string; event: 'extension_installed'; name: string; error?: string }
+    | { id: string; event: 'pong' }
+    | { event: 'ready' }
+    | { event: 'error'; error: string };

@@
-                    case 'query':
-                        await handleQuery(msg.sql, msg.params);
+                    case 'query':
+                        await handleQuery(msg.id, msg.sql, msg.params);
                         break;
@@
-async function handleQuery(sql: string, params?: any[]) {
+async function handleQuery(requestId: string, sql: string, params?: any[]) {
@@
-            process.send({ event: 'query_result', data: rows } as SidecarResponse);
+            process.send({ id: requestId, event: 'query_result', data: rows } as SidecarResponse);
@@
-            process.send({ event: 'query_error', error: errorMsg } as SidecarResponse);
+            process.send({ id: requestId, event: 'query_error', error: errorMsg } as SidecarResponse);

Also applies to: 98-130, 137-160
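On the caller side, the correlation described above needs a registry of pending promises keyed by request id. The sketch below shows one possible shape; `PendingRequests`, `SidecarReply`, and the counter-based ids are assumptions for illustration and do not reflect the PR's actual `DuckDBSidecarService` code. The `id` field name follows the suggested diff.

```typescript
// Simplified reply shape for the sketch; the real union has more variants.
interface SidecarReply {
    id: string;
    event: string;
    data?: unknown[];
    error?: string;
}

interface PendingEntry {
    resolve: (reply: SidecarReply) => void;
    reject: (reason: Error) => void;
    timer: ReturnType<typeof setTimeout>;
}

class PendingRequests {
    private readonly pending = new Map<string, PendingEntry>();
    private nextId = 0;

    /** Allocates an id and a promise that settles when the matching reply arrives. */
    register(timeoutMs: number): { id: string; promise: Promise<SidecarReply> } {
        const id = String(++this.nextId);
        const promise = new Promise<SidecarReply>((resolve, reject) => {
            const timer = setTimeout(() => {
                this.pending.delete(id);
                reject(new Error(`Sidecar request ${id} timed out`));
            }, timeoutMs);
            this.pending.set(id, { resolve, reject, timer });
        });
        return { id, promise };
    }

    /** Routes a reply by id; returns false when nothing is waiting for it. */
    settle(reply: SidecarReply): boolean {
        const entry = this.pending.get(reply.id);
        if (!entry) return false;
        clearTimeout(entry.timer);
        this.pending.delete(reply.id);
        if (reply.event === 'query_error') {
            entry.reject(new Error(reply.error ?? 'unknown sidecar error'));
        } else {
            entry.resolve(reply);
        }
        return true;
    }

    get size(): number {
        return this.pending.size;
    }
}
```

A sender would call `register`, attach the returned `id` to the outgoing message, and await the promise; the IPC `message` handler would pass every reply that carries an `id` to `settle`. Because lookup is by id, replies may arrive in any order without resolving the wrong caller.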


Comment on lines +79 to +100
async createVectorSearchInstance(
vectorDimension: number,
estimatedRows: number = 100000
): Promise<Database> {
// Estima memória necessária: ~2GB base + ~1KB por vetor para HNSW index
const estimatedMemoryMB = 2048 + Math.ceil((estimatedRows * 1) / 1024);
const memoryLimit = `${Math.min(estimatedMemoryMB, 8192)}MB`; // Max 8GB

const config: DuckDBConfig = {
dbPath: ':memory:',
memoryLimit,
threads: 4,
extensions: ['vss', 'fts', 'postgres'],
};

const db = await this.createInstance(config);

// Habilita persistência experimental do índice HNSW se necessário
await db.run('SET hnsw_enable_experimental_persistence = true');

return db;
}

⚠️ Potential issue | 🟠 Major

Bug: vectorDimension parameter is unused in memory calculation.

The method accepts vectorDimension but the memory estimation (line 84) only considers estimatedRows. For high-dimensional embeddings (e.g., 1536 for OpenAI), this significantly underestimates memory requirements.

🐛 Proposed fix
     async createVectorSearchInstance(
         vectorDimension: number,
         estimatedRows: number = 100000
     ): Promise<Database> {
-        // Estima memória necessária: ~2GB base + ~1KB por vetor para HNSW index
-        const estimatedMemoryMB = 2048 + Math.ceil((estimatedRows * 1) / 1024);
+        // Estima memória necessária: ~2GB base + (rows * dimension * 4 bytes per float32) for vectors
+        // Plus ~1.5x overhead for HNSW index structures
+        const vectorMemoryMB = Math.ceil((estimatedRows * vectorDimension * 4 * 1.5) / (1024 * 1024));
+        const estimatedMemoryMB = 2048 + vectorMemoryMB;
         const memoryLimit = `${Math.min(estimatedMemoryMB, 8192)}MB`; // Max 8GB
🤖 Prompt for AI Agents
In `@backend/src/common/duckdb/duckdb-provider.service.ts` around lines 79 - 100,
The createVectorSearchInstance function currently ignores the vectorDimension
parameter when computing estimatedMemoryMB; update the memory estimation to
include vectorDimension by accounting for bytes per embedding element (e.g., 4
bytes for float32) times estimatedRows and adding any index overhead (previously
approximated as ~1KB per vector), then convert to MB and apply the same min/max
clamp used for memoryLimit; update the calculation that assigns
estimatedMemoryMB (and keep using memoryLimit, DuckDBConfig, and the existing
createInstance/db run flow) so the new estimate reflects both estimatedRows and
vectorDimension.
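
As a worked example of the arithmetic, the sketch below applies the proposed formula to two row counts. The 1.5x HNSW overhead factor is the assumption made in the proposed fix, not a measured value, and `estimateMemoryMB` is a hypothetical standalone helper rather than code from the PR.

```typescript
const BYTES_PER_FLOAT32 = 4;
const HNSW_OVERHEAD = 1.5; // assumed multiplier, taken from the proposed fix
const BASE_MB = 2048;
const CAP_MB = 8192;

// Returns the uncapped estimate, the limit actually applied, and whether the cap was hit.
function estimateMemoryMB(rows: number, dimension: number): { requestedMB: number; limitMB: number; capped: boolean } {
    const vectorMB = Math.ceil((rows * dimension * BYTES_PER_FLOAT32 * HNSW_OVERHEAD) / (1024 * 1024));
    const requestedMB = BASE_MB + vectorMB;
    return { requestedMB, limitMB: Math.min(requestedMB, CAP_MB), capped: requestedMB > CAP_MB };
}

// 100,000 rows of 1536 dimensions: 879 MB of vectors, 2927 MB in total (the old formula gave 2146 MB).
console.log(estimateMemoryMB(100000, 1536));
// 1,000,000 rows of 1536 dimensions: 10838 MB requested, clamped to 8192 MB.
console.log(estimateMemoryMB(1000000, 1536));
```

The second case suggests the 8GB clamp can silently under-provision large indexes, so it may be worth logging a warning when `capped` is true.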

Comment on lines +105 to +113
    async attachPostgres(
        db: Database,
        connectionString: string,
        alias: string = 'postgres'
    ): Promise<void> {
        this.logger.log(`Anexando PostgreSQL como ${alias}...`);
        await db.run(`ATTACH '${connectionString}' AS ${alias} (TYPE postgres)`);
        this.logger.log(`PostgreSQL anexado com sucesso como ${alias}`);
    }

⚠️ Potential issue | 🟡 Minor

SQL injection risk: connection string is not escaped.

If connectionString contains a single quote (which could occur in passwords), it would break the SQL or allow injection. Escape single quotes in the connection string.

🛡️ Proposed fix
     async attachPostgres(
         db: Database,
         connectionString: string,
         alias: string = 'postgres'
     ): Promise<void> {
         this.logger.log(`Anexando PostgreSQL como ${alias}...`);
-        await db.run(`ATTACH '${connectionString}' AS ${alias} (TYPE postgres)`);
+        const escapedConnStr = connectionString.replace(/'/g, "''");
+        const safeAlias = alias.replace(/[^a-zA-Z0-9_]/g, '');
+        await db.run(`ATTACH '${escapedConnStr}' AS ${safeAlias} (TYPE postgres)`);
         this.logger.log(`PostgreSQL anexado com sucesso como ${alias}`);
     }
🤖 Prompt for AI Agents
In `@backend/src/common/duckdb/duckdb-provider.service.ts` around lines 105 - 113,
The attachPostgres method currently interpolates connectionString into the SQL
which allows SQL injection if the string contains a single quote; update
attachPostgres to safely escape single quotes in connectionString (e.g., replace
each ' with '' ) before calling db.run(`ATTACH '${escapedConnectionString}' AS
${alias} (TYPE postgres)`), or better yet use a parameterized/bound query API of
Database if available; ensure the changed symbol is attachPostgres and the
db.run invocation is the only place modified.

Comment on lines +135 to +147
        const indexName = `idx_${tableName}_${columnName}_hnsw`;

        await db.run(`
            CREATE INDEX IF NOT EXISTS ${indexName}
            ON ${tableName}
            USING HNSW (${columnName})
            WITH (
                metric = '${metric}',
                ef_construction = ${efConstruction},
                ef_search = ${efSearch},
                M = ${M}
            )
        `);

⚠️ Potential issue | 🟠 Major

Identifier injection risk in index creation.

tableName and columnName are interpolated without sanitization. Apply the same identifier validation pattern suggested for the search service.

🤖 Prompt for AI Agents
In `@backend/src/common/duckdb/duckdb-provider.service.ts` around lines 135 - 147,
The CREATE INDEX call builds SQL by interpolating tableName and columnName
directly (see indexName and the db.run(...) block) which allows identifier
injection; validate/sanitize these identifiers the same way used in the search
service (e.g., an allowlist/regex check for [A-Za-z0-9_]+ and/or a helper like
validateIdentifier(name)) before composing indexName and the SQL, and throw or
reject when validation fails so only safe table/column names are used in the
db.run call.
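
One possible shape for such a helper is sketched below. `validateIdentifier` and `hnswIndexName` are hypothetical names, and the pattern is slightly stricter than `[A-Za-z0-9_]+` because it also rejects a leading digit.

```typescript
// Accepts plain SQL identifiers only: letters, digits and underscores, not starting with a digit.
const IDENTIFIER_PATTERN = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Returns the name unchanged when it is safe, and throws otherwise.
function validateIdentifier(name: string, kind: string): string {
    if (!IDENTIFIER_PATTERN.test(name)) {
        throw new Error(`Invalid ${kind}: ${JSON.stringify(name)}`);
    }
    return name;
}

// Builds the index name only from validated parts.
function hnswIndexName(tableName: string, columnName: string): string {
    const table = validateIdentifier(tableName, 'table name');
    const column = validateIdentifier(columnName, 'column name');
    return `idx_${table}_${column}_hnsw`;
}
```

Throwing on invalid input, rather than stripping characters, avoids silently operating on a different table than the caller intended.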

Comment on lines +187 to +210
    async syncFromPostgres(
        tableName: string,
        options?: {
            columns?: string[];
            where?: string;
            limit?: number;
            duckdbTableName?: string;
        }
    ): Promise<QueryResult> {
        const duckdbTable = options?.duckdbTableName || tableName;
        const columns = options?.columns?.join(', ') || '*';
        const whereClause = options?.where ? `WHERE ${options.where}` : '';
        const limitClause = options?.limit ? `LIMIT ${options.limit}` : '';

        // Cria tabela no DuckDB a partir do PostgreSQL
        const sql = `
            CREATE OR REPLACE TABLE ${duckdbTable} AS
            SELECT ${columns}
            FROM postgres.${tableName}
            ${whereClause}
            ${limitClause}
        `;

        return this.query(sql);

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's read the full duckdb-sidecar.service.ts file to understand context
wc -l backend/src/common/duckdb/duckdb-sidecar.service.ts

Repository: AppCivico/smae

Length of output: 113


🏁 Script executed:

# Check if PATTERNS.md exists and read it
if [ -f backend/PATTERNS.md ]; then
  cat backend/PATTERNS.md
fi

Repository: AppCivico/smae

Length of output: 22118


🏁 Script executed:

# Search for all calls to syncFromPostgres to determine if user input is passed
rg "syncFromPostgres" -A 3 -B 1

Repository: AppCivico/smae

Length of output: 2052


🏁 Script executed:

cat -n backend/src/common/duckdb/duckdb-sidecar.service.ts | head -100

Repository: AppCivico/smae

Length of output: 4513


🏁 Script executed:

cat -n backend/src/common/duckdb/duckdb-sidecar.service.ts | tail -100

Repository: AppCivico/smae

Length of output: 3927


🏁 Script executed:

# Read duckdb-search.service.ts to see how syncFromPostgres is called
cat backend/src/common/duckdb/duckdb-search.service.ts

Repository: AppCivico/smae

Length of output: 12688


🏁 Script executed:

# Search for callers of duckdb-search service methods
rg "duckdbSearch|DuckdbSearchService" -A 2 -B 1 | head -100

Repository: AppCivico/smae

Length of output: 40


🏁 Script executed:

# Read lines 187-210 of duckdb-sidecar.service.ts specifically
sed -n '187,210p' backend/src/common/duckdb/duckdb-sidecar.service.ts

Repository: AppCivico/smae

Length of output: 881


🏁 Script executed:

# Search for callers of syncTable or vectorSearch/bm25Search from controllers
rg "syncTable|vectorSearch|bm25Search|hybridSearch" -B 3 | grep -E "(controller|Controller|@Post|@Get|@Patch|@Delete)" -A 5 -B 5 | head -60

Repository: AppCivico/smae

Length of output: 40


🏁 Script executed:

# Check if DuckDBSearchService is injected into any controllers
rg "DuckDBSearchService|duckdbSearch" -B 2 -A 2 | grep -E "(controller|Controller|Injectable|inject)" -B 3 -A 3

Repository: AppCivico/smae

Length of output: 2436


🏁 Script executed:

# Search for actual REST controllers that might expose DuckDB search functionality
find backend/src -name "*.controller.ts" -exec grep -l "DuckDBSearchService\|DocumentoSearchService\|ImagemSearchService\|duckdbSearch" {} \;

Repository: AppCivico/smae

Length of output: 40


🏁 Script executed:

# Search more broadly for controllers that might use these services
rg "DuckDBSearchService|DocumentoSearchService|ImagemSearchService" backend/src --type ts | grep -v "example-usage" | grep -v "node_modules"

Repository: AppCivico/smae

Length of output: 702


🏁 Script executed:

# Check if there are any actual usages in the real codebase (not examples)
rg "syncTable|vectorSearch|bm25Search" backend/src --type ts | grep -v example-usage | grep -v "^.*\.service\.ts:" | head -20

Repository: AppCivico/smae

Length of output: 40


🏁 Script executed:

# Look at the complete syncFromPostgres implementation including any validation
sed -n '180,225p' backend/src/common/duckdb/duckdb-sidecar.service.ts

Repository: AppCivico/smae

Length of output: 1499


🏁 Script executed:

# Search for any actual usage of syncTable from a real service (not example)
rg "\.syncTable\(" backend/src --type ts | grep -v example-usage

Repository: AppCivico/smae

Length of output: 40


🏁 Script executed:

# Look for all files that import DuckDBModule to see where it's actually used
rg "DuckDBModule|from.*duckdb" backend/src --type ts | grep -v example-usage | grep -v "\.spec\."

Repository: AppCivico/smae

Length of output: 2198


🏁 Script executed:

# Check if there are any existing vulnerabilities in vectorSearch or bm25Search that ARE exposed
rg -A 10 "async.*Search.*tableName.*string" backend/src/common/duckdb/duckdb-search.service.ts | head -40

Repository: AppCivico/smae

Length of output: 40


🏁 Script executed:

# Look at the complete vectorSearch to see if where clause validation is done
sed -n '220,280p' backend/src/common/duckdb/duckdb-search.service.ts

Repository: AppCivico/smae

Length of output: 2070


🏁 Script executed:

# Check if any modules besides example-usage actually call the DuckDBSearchService methods
rg "syncTable\(|vectorSearch\(|bm25Search\(" backend/src --type ts | grep -v example-usage | grep -v "duckdb-search.service.ts"

Repository: AppCivico/smae

Length of output: 40


🏁 Script executed:

# Verify that no controllers expose these methods to REST endpoints
find backend/src -name "*.controller.ts" -type f | wc -l

Repository: AppCivico/smae

Length of output: 61


🏁 Script executed:

# Double-check: are there any `@Post`, `@Get`, `@Patch`, `@Delete` that call duckdb search?
rg "@Post|@Get|@Patch|@Delete" backend/src --type ts -A 5 | rg -B 5 "syncTable\(|vectorSearch\(|bm25Search\(|DuckDBSearchService"

Repository: AppCivico/smae

Length of output: 40


🏁 Script executed:

# Check the exact signature and parameters of syncFromPostgres to confirm the vulnerability pattern
sed -n '185,235p' backend/src/common/duckdb/duckdb-sidecar.service.ts

Repository: AppCivico/smae

Length of output: 1928


🏁 Script executed:

# Check where syncTable is actually called with user input or if it's only internal
rg "\.syncTable\(" backend/src --type ts -B 5 -A 3

Repository: AppCivico/smae

Length of output: 3117


🏁 Script executed:

# Verify no API endpoint in any controller accepts tableName, columns, where as query/body params
rg "@Query|@Body|@Param" backend/src/common/duckdb --type ts -A 2

Repository: AppCivico/smae

Length of output: 826


🏁 Script executed:

# Final check: look at the query method to see if params are supported for parameterized queries
sed -n '86,130p' backend/src/common/duckdb/duckdb-sidecar.service.ts

Repository: AppCivico/smae

Length of output: 1845


Fix SQL injection risk in syncFromPostgres before exposing to user input.

Lines 187-210 interpolate tableName, columns, where, and limit directly into SQL. While these methods are not currently exposed via REST endpoints, this pattern should be fixed defensively:

  • Use identifier whitelisting for tableName and duckdbTableName
  • Use column name validation for columns array
  • Build where clauses through a safe predicate builder rather than string concatenation
  • The query() method supports parameterized queries via params — use them for dynamic values

This ensures the code remains safe if refactored to accept user input in the future or exposed through new API endpoints.

🤖 Prompt for AI Agents
In `@backend/src/common/duckdb/duckdb-sidecar.service.ts` around lines 187 - 210,
The syncFromPostgres method builds SQL by directly interpolating tableName,
columns, options.where and options.limit, creating SQL injection risk; update
syncFromPostgres to validate/whitelist tableName and options.duckdbTableName
against allowed identifiers, validate each entry in options.columns against a
safe column-name pattern or whitelist, construct the WHERE predicates via a safe
predicate builder (or accept structured filter objects) and pass any dynamic
values (e.g. filter values and limit) as parameters to the existing query(sql,
params) call instead of string concatenation, keeping only safe identifiers
(validated) in the SQL string and all user-supplied values parameterized via the
query(...) method.
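
A minimal sketch of how the SQL could be assembled under these recommendations. `buildSyncSql`, `SyncOptions`, and the structured `filters` option (replacing the free-form `where` string) are hypothetical. It also assumes unqualified table names and that the sidecar's `query(sql, params)` can bind `?` placeholders inside a `CREATE TABLE ... AS` statement, which should be verified.

```typescript
interface SyncOptions {
    columns?: string[];
    filters?: Record<string, string | number | boolean>; // hypothetical replacement for `where`
    limit?: number;
    duckdbTableName?: string;
}

const SAFE_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Rejects anything that is not a plain identifier (no dots, quotes or spaces).
function ident(name: string): string {
    if (!SAFE_IDENTIFIER.test(name)) throw new Error(`Unsafe identifier: ${JSON.stringify(name)}`);
    return name;
}

function buildSyncSql(
    tableName: string,
    options: SyncOptions = {}
): { sql: string; params: Array<string | number | boolean> } {
    const source = ident(tableName);
    const target = ident(options.duckdbTableName ?? tableName);
    const columns = options.columns?.length ? options.columns.map(ident).join(', ') : '*';

    // Filter values never enter the SQL text; only validated column names do.
    const params: Array<string | number | boolean> = [];
    const predicates = Object.entries(options.filters ?? {}).map(([column, value]) => {
        params.push(value);
        return `${ident(column)} = ?`;
    });

    const parts = [`CREATE OR REPLACE TABLE ${target} AS`, `SELECT ${columns}`, `FROM postgres.${source}`];
    if (predicates.length) parts.push(`WHERE ${predicates.join(' AND ')}`);
    if (options.limit !== undefined) {
        if (!Number.isInteger(options.limit) || options.limit < 0) {
            throw new Error('limit must be a non-negative integer');
        }
        parts.push(`LIMIT ${options.limit}`);
    }
    return { sql: parts.join('\n'), params };
}
```

`syncFromPostgres` could then call `this.query(sql, params)` with the returned pair. Equality-only filters are a deliberate simplification; richer predicates would need a proper builder.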

Comment on lines +351 to +372
    private startHealthCheck() {
        this.healthCheckInterval = setInterval(async () => {
            if (!this.isSidecarReady()) {
                this.logger.warn('Sidecar não está pronto durante health check');
                return;
            }

            // Envia ping
            const pingTime = Date.now();
            this.childProcess!.send({ type: 'ping' } as SidecarRequest);

            // Aguarda pong por um tempo limitado
            setTimeout(() => {
                if (this.lastPongTime && this.lastPongTime.getTime() < pingTime) {
                    this.logger.error('Pong não recebido a tempo, sidecar pode estar travado');
                    // Força reinício
                    if (this.childProcess) {
                        this.childProcess.kill('SIGKILL');
                    }
                }
            }, this.PONG_TIMEOUT_MS);
        }, this.HEALTH_CHECK_INTERVAL_MS);

⚠️ Potential issue | 🟡 Minor

Handle the “no pong ever received” case in health checks.

If lastPongTime is null, the timeout never forces a restart. Treat missing pong as failure.

🩺 Suggested fix
-            setTimeout(() => {
-                if (this.lastPongTime && this.lastPongTime.getTime() < pingTime) {
+            setTimeout(() => {
+                if (!this.lastPongTime || this.lastPongTime.getTime() < pingTime) {
                     this.logger.error('Pong não recebido a tempo, sidecar pode estar travado');
                     if (this.childProcess) {
                         this.childProcess.kill('SIGKILL');
                     }
                 }
             }, this.PONG_TIMEOUT_MS);
🤖 Prompt for AI Agents
In `@backend/src/common/duckdb/duckdb-sidecar.service.ts` around lines 351 - 372,
In startHealthCheck(), the timeout callback currently only restarts the sidecar
if lastPongTime exists and is older than pingTime, so the “no pong ever
received” case is ignored; update the logic in the setTimeout inside
startHealthCheck() (referencing lastPongTime, PONG_TIMEOUT_MS, childProcess,
isSidecarReady) to treat a null/undefined lastPongTime as a failure condition
(i.e., if !this.lastPongTime || this.lastPongTime.getTime() < pingTime) then log
an error and force-restart the sidecar by killing this.childProcess (preserving
the existing SIGKILL behavior).
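
The corrected condition can be isolated as a small predicate, which makes the three cases easy to unit test. `isPongOverdue` is a hypothetical helper; it takes epoch milliseconds, whereas the service stores `lastPongTime` as a `Date`.

```typescript
// True when the sidecar should be considered unresponsive:
// either no pong was ever received, or the last pong predates the ping just sent.
function isPongOverdue(lastPongAt: number | null, pingSentAt: number): boolean {
    return lastPongAt === null || lastPongAt < pingSentAt;
}

const pingSentAt = 1000;
console.log(isPongOverdue(null, pingSentAt)); // true: no pong ever received
console.log(isPongOverdue(900, pingSentAt));  // true: last pong is older than this ping
console.log(isPongOverdue(1005, pingSentAt)); // false: pong arrived after the ping
```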

Comment on lines +136 to +169
    @Get('documentos')
    async buscarDocumentos(
        @Query('q') query: string,
        @Query('tipo') tipo?: 'texto' | 'semantica' | 'hibrida',
        @Query('limite') limite: number = 10
    ) {
        if (!query) return { error: 'Query obrigatória' };

        switch (tipo) {
            case 'semantica':
                // Aqui você precisaria gerar o embedding da query
                // usando um modelo como CLIP ou sentence-transformers
                const embedding = await this.gerarEmbedding(query);
                return this.docSearch.buscarPorEmbedding(embedding, limite);

            case 'hibrida':
                const embeddingHibrido = await this.gerarEmbedding(query);
                return this.docSearch.buscarHibrida(query, embeddingHibrido, limite);

            case 'texto':
            default:
                return this.docSearch.buscarPorTexto(query, limite);
        }
    }

    @Get('imagens')
    async buscarImagens(
        @Query('embedding') embeddingStr: string,
        @Query('limite') limite: number = 10
    ) {
        // Recebe embedding como JSON string: "[0.1, 0.2, ...]"
        const embedding = JSON.parse(embeddingStr);
        return this.imgSearch.buscarImagensSimilares(embedding, limite);
    }

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "example-usage.ts" | head -20

Repository: AppCivico/smae

Length of output: 102


🏁 Script executed:

cat backend/PATTERNS.md 2>/dev/null || echo "File not found at backend/PATTERNS.md"

Repository: AppCivico/smae

Length of output: 22118


🏁 Script executed:

wc -l backend/src/common/duckdb/example-usage.ts 2>/dev/null || echo "File not found at backend/src/common/duckdb/example-usage.ts"

Repository: AppCivico/smae

Length of output: 104


🏁 Script executed:

sed -n '130,175p' backend/src/common/duckdb/example-usage.ts

Repository: AppCivico/smae

Length of output: 1737


🏁 Script executed:

sed -n '1,50p' backend/src/common/duckdb/example-usage.ts

Repository: AppCivico/smae

Length of output: 1882


🏁 Script executed:

rg "ParseIntPipe|DefaultValuePipe" --type ts | head -20

Repository: AppCivico/smae

Length of output: 40


🏁 Script executed:

rg "class BuscaController" -A 100 backend/src/common/duckdb/example-usage.ts | grep -E "import|from" | head -10

Repository: AppCivico/smae

Length of output: 205


🏁 Script executed:

cat > /tmp/check_nestjs.js << 'EOF'
// Check if NestJS exports these pipes
const packages = [
  '@nestjs/common'
];

packages.forEach(pkg => {
  try {
    const mod = require(pkg);
    const hasParseIntPipe = 'ParseIntPipe' in mod;
    const hasDefaultValuePipe = 'DefaultValuePipe' in mod;
    const hasBadRequestException = 'BadRequestException' in mod;
    console.log(`${pkg}:`);
    console.log(`  ParseIntPipe: ${hasParseIntPipe}`);
    console.log(`  DefaultValuePipe: ${hasDefaultValuePipe}`);
    console.log(`  BadRequestException: ${hasBadRequestException}`);
  } catch(e) {
    console.log(`${pkg}: Package not found or error: ${e.message}`);
  }
});
EOF
node /tmp/check_nestjs.js 2>&1 || echo "Node execution not possible in sandbox"

Repository: AppCivico/smae

Length of output: 175


🏁 Script executed:

# Check NestJS version in package.json
grep -A 2 '"@nestjs/common"' backend/package.json

Repository: AppCivico/smae

Length of output: 153


🏁 Script executed:

# Search for existing usage of ParseIntPipe or DefaultValuePipe in the codebase
rg "ParseIntPipe|DefaultValuePipe" backend/src --type ts -l

Repository: AppCivico/smae

Length of output: 40


🏁 Script executed:

# Look for `@Query` decorators with pipes to see how this codebase handles query params
rg "@Query\(" backend/src --type ts -A 2 | head -40

Repository: AppCivico/smae

Length of output: 2967


🏁 Script executed:

# Check for error handling patterns with JSON.parse
rg "JSON\.parse" backend/src --type ts -B 2 -A 2 | head -50

Repository: AppCivico/smae

Length of output: 3819


🏁 Script executed:

# Check file context - is this truly an example file or actual code?
head -30 backend/src/common/duckdb/example-usage.ts

Repository: AppCivico/smae

Length of output: 1227


🏁 Script executed:

# Search for any imports or usage of BuscaController or DocumentoSearchService
rg "BuscaController|DocumentoSearchService|ImagemSearchService" backend/src --type ts

Repository: AppCivico/smae

Length of output: 1594


🏁 Script executed:

# Check if NestJS exception handling follows HttpException pattern
rg "BadRequestException|HttpException" backend/src --type ts | head -20

Repository: AppCivico/smae

Length of output: 2847


Use DTO-based query parameters with validation and handle JSON parsing errors.

This is example code, but query params arrive as strings and JSON.parse() can throw unhandled errors. Use a typed DTO for @Query() and validate embedding input before parsing:

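    import { Get, HttpException, Query } from '@nestjs/common';
    import { Type } from 'class-transformer';
    import { IsInt, IsOptional, IsString, Min } from 'class-validator';

    export class BuscaImagensQueryDto {
        @IsString()
        embedding: string;

        // Query params arrive as strings; @Type converts the value before validation
        // (assumes a ValidationPipe configured with transform: true).
        @IsOptional()
        @Type(() => Number)
        @IsInt()
        @Min(1)
        limite?: number = 10;
    }

Then in the controller:

    @Get('imagens')
    async buscarImagens(@Query() query: BuscaImagensQueryDto) {
        let embedding: number[];
        try {
            embedding = JSON.parse(query.embedding);
        } catch {
            throw new HttpException('embedding inválido', 400);
        }
        // JSON.parse can return any JSON value, so check the shape explicitly.
        if (!Array.isArray(embedding) || !embedding.every((v) => typeof v === 'number')) {
            throw new HttpException('embedding deve ser um array numérico', 400);
        }
        return this.imgSearch.buscarImagensSimilares(embedding, query.limite ?? 10);
    }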
🧰 Tools
🪛 Biome (2.3.13)

[error] 148-149: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 152-153: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)

🤖 Prompt for AI Agents
In `@backend/src/common/duckdb/example-usage.ts` around lines 136 - 169, Convert
the buscarImagens handler to accept a validated DTO (e.g., BuscaImagensQueryDto)
instead of raw query strings; parse query.embedding inside a try/catch in
buscarImagens to catch JSON.parse errors, validate that the parsed value is an
array of numbers, and throw an HttpException(400) with a clear message on
invalid input; ensure you use the DTO's limite (defaulting to 10) when calling
imgSearch.buscarImagensSimilares; reference the buscarImagens method and the new
BuscaImagensQueryDto so reviewers can find and update the controller and
validation classes.

Comment on lines +144 to +153
        switch (tipo) {
            case 'semantica':
                // Aqui você precisaria gerar o embedding da query
                // usando um modelo como CLIP ou sentence-transformers
                const embedding = await this.gerarEmbedding(query);
                return this.docSearch.buscarPorEmbedding(embedding, limite);

            case 'hibrida':
                const embeddingHibrido = await this.gerarEmbedding(query);
                return this.docSearch.buscarHibrida(query, embeddingHibrido, limite);

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find backend -name "example-usage.ts" -type f

Repository: AppCivico/smae

Length of output: 100


🏁 Script executed:

sed -n '144,153p' backend/src/common/duckdb/example-usage.ts | cat -n

Repository: AppCivico/smae

Length of output: 654


🏁 Script executed:

find backend -name "biome.json" -o -name ".eslintrc*" -o -name "tsconfig.json" | head -20

Repository: AppCivico/smae

Length of output: 100


🏁 Script executed:

cat backend/.eslintrc.js

Repository: AppCivico/smae

Length of output: 1162


🏁 Script executed:

find backend -name "biome.json" -o -name "biome.jsonc" -o -name "deno.json*"

Repository: AppCivico/smae

Length of output: 40


🏁 Script executed:

grep -r "no-case-declarations\|switch-exhaustiveness" backend/.eslintrc* || echo "No specific switch-case rules found"

Repository: AppCivico/smae

Length of output: 93


🏁 Script executed:

cat backend/tsconfig.json | head -50

Repository: AppCivico/smae

Length of output: 756


🏁 Script executed:

cat backend/PATTERNS.md | head -100

Repository: AppCivico/smae

Length of output: 3215


Wrap switch-case declarations in blocks to follow ESLint best practices.

The no-case-declarations rule flags variable declarations without block scoping in switch cases. While the return statements prevent actual cross-case access, wrapping each case in braces improves code clarity and consistency.

✅ Suggested fix
-            case 'semantica':
+            case 'semantica': {
                 const embedding = await this.gerarEmbedding(query);
                 return this.docSearch.buscarPorEmbedding(embedding, limite);
+            }
 
-            case 'hibrida':
+            case 'hibrida': {
                 const embeddingHibrido = await this.gerarEmbedding(query);
                 return this.docSearch.buscarHibrida(query, embeddingHibrido, limite);
+            }
🧰 Tools
🪛 Biome (2.3.13)

[error] 148-149: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 152-153: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)

🤖 Prompt for AI Agents
In `@backend/src/common/duckdb/example-usage.ts` around lines 144 - 153, Wrap each
switch case body in a block to satisfy ESLint no-case-declarations: add braces
around the 'semantica' and 'hibrida' case bodies so the local const declarations
(embedding, embeddingHibrido) are block-scoped; keep the existing awaits to
this.gerarEmbedding(query) and the return calls to
this.docSearch.buscarPorEmbedding(...) and this.docSearch.buscarHibrida(...)
inside those new braces.

Comment on lines +11 to +39
```
┌─────────────────────────────────────────────────────────────────┐
│                    Main NestJS Process                          │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────┐   │
│  │ DuckDBSidecarSvc │  │ DuckDBSearchSvc  │  │ Other Svcs   │   │
│  │ (monitora e      │  │ (API de busca)   │  │              │   │
│  │  controla)       │  │                  │  │              │   │
│  └────────┬─────────┘  └────────┬─────────┘  └──────────────┘   │
│           │                     │                               │
│           │ IPC (fork)          │ Usa                           │
│           ▼                     ▼                               │
│  ┌──────────────────────────────────────────────┐               │
│  │            DuckDB Sidecar Process            │               │
│  │   ┌──────────────────────────────────────┐   │               │
│  │   │          DuckDB (:memory:)           │   │               │
│  │   │  - VSS extension (HNSW index)        │   │               │
│  │   │  - FTS extension (BM25)              │   │               │
│  │   │  - Postgres extension                │   │               │
│  │   └──────────────────────────────────────┘   │               │
│  └──────────────────────────────────────────────┘               │
└─────────────────────────────────────────────────────────────────┘
                          │ ATTACH
                 ┌─────────────────┐
                 │   PostgreSQL    │
                 │     (dados)     │
                 └─────────────────┘
```

⚠️ Potential issue | 🟡 Minor

Add language identifiers to fenced code blocks (markdownlint MD040).

This resolves the lint warnings for the architecture diagram and log snippet.

🩹 Suggested fix
-```
+```text
 ┌─────────────────────────────────────────────────────────────────┐
 │                    Main NestJS Process                          │
@@
-```
+```

@@
-```
+```text
 [DuckDBSidecar] Iniciando DuckDB sidecar process...
 [DuckDBSidecar] DuckDB sidecar está pronto
 [DuckDBSearch] Tabela documentos sincronizada com sucesso (1000 linhas)
-```
+```

Also applies to: 249-253

🧰 Tools
🪛 markdownlint-cli2 (0.20.0)

[warning] 11-11: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In `@backend/src/common/duckdb/README.md` around lines 11 - 39, The Markdown
fenced code blocks in the DuckDB README (the ASCII architecture diagram and the
log snippet containing lines like "[DuckDBSidecar] Iniciando DuckDB sidecar
process..." and "[DuckDBSearch] Tabela documentos sincronizada...") lack
language identifiers causing markdownlint MD040 warnings; update each
triple-backtick fence that wraps the diagram and the log snippet to use a
language tag (e.g., ```text) so the fences become ```text ... ```, and ensure
all other similar fenced blocks (including the ones around the log snippet and
the additional block referenced) are updated consistently.
