Add streaming import/export and file-ingestion paths to avoid whole-dataset buffering in the Node client #3

@cferrys

Description

Summary

The Node client appears to expose bulk-oriented document operations (Documents.import(), Documents.export(), Documents.addPDF(), Documents.addCSV()) while emphasizing high performance and zero external runtime dependencies. The highest-impact gap is the lack of an explicit streaming/backpressure-aware ingestion path: large imports and parsed file payloads are likely materialized fully in memory before being sent over HTTP, which will become the main bottleneck long before hlquery or RocksDB does.
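As a rough illustration of the bounded-memory alternative, the sketch below chunks a large document array into fixed-size batches so that each request body stays small. `chunkDocuments` and `sendBatch` are hypothetical names for illustration, not part of the current client API:

```javascript
// Hypothetical helper: yield bounded batches from a large document array
// so each HTTP request body stays small, instead of serializing the whole
// dataset into one payload. Not part of the current hlquery client API.
function* chunkDocuments(documents, batchSize = 1000) {
  for (let i = 0; i < documents.length; i += batchSize) {
    yield documents.slice(i, i + batchSize);
  }
}

// Send batches sequentially so at most one batch is in flight at a time.
// `sendBatch` stands in for whatever request method the client exposes.
async function importInBatches(documents, sendBatch, batchSize = 1000) {
  let sent = 0;
  for (const batch of chunkDocuments(documents, batchSize)) {
    await sendBatch(batch);
    sent += batch.length;
  }
  return sent;
}
```

Even this simple batching caps per-request memory; a true streaming path (below) goes further by never materializing the full array at all.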

Context

The recent README expansion documents a broad document pipeline surface area in README.md: bulk import/export, local PDF parsing, local CSV parsing, and helper methods that normalize extracted content before indexing. That points directly at lib/Documents.js, lib/Request.js, and utils/Validator.js as the core execution path for potentially large payloads.

This matters because hlquery is positioned as a high-performance search engine/database wrapper around RocksDB. If the Node client buffers entire CSVs, PDFs, export responses, or bulk document arrays in process memory, the client becomes the throughput and reliability ceiling for indexing jobs. A single large import can cause excessive heap growth, long GC pauses, request retries with oversized bodies, and poor behavior under concurrent ingestion workloads. The first-commit state and the README’s “tests (for future use)” note also suggest this area may have grown faster than its performance-validation coverage.

Proposed Implementation

  1. Add a streaming request mode in lib/Request.js that accepts Readable bodies, supports chunked transfer, and preserves backpressure instead of serializing everything up front.
  2. Extend lib/Documents.js with explicit streaming APIs, for example importStream(), exportStream(), and chunked helpers for large arrays so callers can choose bounded-memory ingestion.
  3. Refactor addCSV() to parse rows incrementally and flush documents in configurable batches rather than building one large in-memory payload.
  4. Refactor addPDF() to support a bounded-size ingestion path for extracted text and metadata, with clear limits and failure modes when a file exceeds configured thresholds.
  5. Add config knobs for batch size, max in-flight bytes, request timeout, and retry policy so ingestion can be tuned for different deployment profiles.
  6. Add tests and benchmarks that cover large imports/exports, memory usage under load, and concurrent ingestion to prevent regressions.

Impact

This directly improves the most important property of a client for a high-performance search system: sustained ingestion throughput without client-side instability. Bounded-memory streaming will reduce heap pressure, improve reliability for large indexing jobs, make the Node client usable for real production backfills, and align the client’s behavior with hlquery’s performance-oriented positioning.

Metadata

Labels

documentation (Improvements or additions to documentation), enhancement (New feature or request)
