
Migrate from in-memory store to PostgreSQL for persistent data storage #1

@bryanchriswhite

Description


Summary

PinShare currently uses an in-memory data store for file metadata, which means:

  • All data is lost when the service restarts
  • No data persistence across container rebuilds
  • Limited by available RAM
  • No ACID guarantees for concurrent operations
  • No advanced query capabilities

This issue proposes migrating to PostgreSQL for persistent, scalable, and reliable data storage.

Current Implementation

In-memory store in internal/store/store.go:

type BaseMetadata struct {
  FileSHA256      string         // hex SHA-256 digest of the file contents; also the map key
  IpfsCID         string         // IPFS content identifier of the pinned file
  FileName        string
  FileType        string
  ModerationVotes int
  Tags            map[string]int // tag name -> count
  BanSet          int            // ban level (the schema below constrains it to 0, 3, 6, or 9)
}

Data structure:

  • Global map: map[string]*BaseMetadata
  • Key: FileSHA256 (SHA256 hash of file)
  • No persistence, no transactions, no relationships
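
For context, a minimal sketch of what that map-based store looks like in practice (everything except BaseMetadata is illustrative, not taken verbatim from the repository):

// internal/store/store.go (illustrative sketch)
package store

import "sync"

var (
  mu    sync.RWMutex                 // hypothetical guard for concurrent handlers
  files = map[string]*BaseMetadata{} // global map keyed by FileSHA256
)

// Put stores or replaces metadata for a file (hypothetical helper).
func Put(m *BaseMetadata) {
  mu.Lock()
  defer mu.Unlock()
  files[m.FileSHA256] = m
}

// Get returns metadata by SHA-256 key, or nil if absent (hypothetical helper).
func Get(sha string) *BaseMetadata {
  mu.RLock()
  defer mu.RUnlock()
  return files[sha]
}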

Proposed Changes

PostgreSQL Schema

CREATE TABLE files (
  file_sha256 VARCHAR(64) PRIMARY KEY,
  ipfs_cid VARCHAR(100) NOT NULL,
  file_name TEXT,
  file_type VARCHAR(100),
  moderation_votes INTEGER DEFAULT 0,
  ban_set INTEGER DEFAULT 0,
  added_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT chk_ban_set CHECK (ban_set IN (0, 3, 6, 9))
);

CREATE TABLE file_tags (
  id SERIAL PRIMARY KEY,
  file_sha256 VARCHAR(64) REFERENCES files(file_sha256) ON DELETE CASCADE,
  tag_name VARCHAR(100) NOT NULL,
  tag_count INTEGER DEFAULT 1,
  UNIQUE(file_sha256, tag_name)
);

CREATE INDEX idx_ipfs_cid ON files(ipfs_cid);
CREATE INDEX idx_file_type ON files(file_type);
CREATE INDEX idx_added_at ON files(added_at);
CREATE INDEX idx_tags_file ON file_tags(file_sha256);
CREATE INDEX idx_tags_name ON file_tags(tag_name);

-- Full-text search support (for future search endpoint)
CREATE INDEX idx_file_name_fts ON files USING gin(to_tsvector('english', file_name));
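
A possible Go mapping for these tables, using the standard time package (field and type names are assumptions; only BaseMetadata exists in the code today):

// Possible row types mirroring the schema above (illustrative).
type FileRow struct {
  FileSHA256      string    // files.file_sha256 (primary key, hex digest)
  IpfsCID         string    // files.ipfs_cid
  FileName        string    // files.file_name
  FileType        string    // files.file_type
  ModerationVotes int       // files.moderation_votes
  BanSet          int       // files.ban_set (0, 3, 6, or 9 per CHECK constraint)
  AddedAt         time.Time // files.added_at
  LastUpdated     time.Time // files.last_updated
}

type FileTagRow struct {
  ID         int64  // file_tags.id
  FileSHA256 string // file_tags.file_sha256 (FK to files, cascades on delete)
  TagName    string // file_tags.tag_name
  TagCount   int    // file_tags.tag_count
}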

Benefits

  1. Data Persistence: Metadata survives service restarts
  2. Scalability: Not limited by RAM, can store millions of records
  3. Reliability: ACID transactions, data integrity guarantees
  4. Advanced Queries:
    • Full-text search on file names
    • Complex filtering and aggregations
    • Efficient joins for tags and relationships
  5. Backup & Recovery: Standard PostgreSQL backup tools
  6. Monitoring: Rich ecosystem of PostgreSQL monitoring tools
  7. Multi-user: Proper concurrent access handling

Implementation Steps

  1. Add PostgreSQL dependency

    • Update go.mod with PostgreSQL driver (e.g., pgx or lib/pq)
    • Add PostgreSQL to docker-compose.yml
  2. Create database layer

    • Implement store interface abstraction (see the sketch after this list)
    • Create PostgreSQL implementation
    • Add connection pooling
    • Implement migrations system (e.g., golang-migrate)
  3. Update API handlers

    • Replace in-memory store calls with DB queries
    • Add proper error handling for DB operations
    • Implement transaction support where needed
  4. Migration path

    • Support both stores during transition (feature flag)
    • Provide export/import tools for data migration
    • Update documentation
  5. Configuration

    • Add database connection config (env vars)
    • Support connection string or individual params
    • Default to in-memory for development if DB not available
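
A rough sketch of the store interface from step 2 (the method set and context usage are assumptions; the real interface should be derived from what the existing handlers need):

// Store abstracts the metadata backend so handlers do not care whether
// the data lives in memory or in PostgreSQL (illustrative sketch).
type Store interface {
  Put(ctx context.Context, m *BaseMetadata) error
  Get(ctx context.Context, fileSHA256 string) (*BaseMetadata, error)
  Delete(ctx context.Context, fileSHA256 string) error
  AddTag(ctx context.Context, fileSHA256, tag string) error
  Close() error
}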

Docker Compose Example

services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: pinshare
      POSTGRES_USER: pinshare
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./migrations:/docker-entrypoint-initdb.d
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U pinshare"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  postgres_data:
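
On the Go side, connecting through pgx's built-in pool could look roughly like this (DATABASE_URL is an assumed env var name, matching the compose file above):

package main

import (
  "context"
  "fmt"
  "log"
  "os"

  "github.com/jackc/pgx/v5/pgxpool"
)

func main() {
  // Assumed configuration: either a full connection string or DB_PASSWORD
  // from the compose file, e.g. postgres://pinshare:<password>@localhost:5432/pinshare
  dsn := os.Getenv("DATABASE_URL")
  if dsn == "" {
    dsn = fmt.Sprintf("postgres://pinshare:%s@localhost:5432/pinshare",
      os.Getenv("DB_PASSWORD"))
  }

  pool, err := pgxpool.New(context.Background(), dsn)
  if err != nil {
    log.Fatalf("connect: %v", err)
  }
  defer pool.Close()

  // Verify the connection before serving traffic.
  if err := pool.Ping(context.Background()); err != nil {
    log.Fatalf("ping: %v", err)
  }
}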

Recommended Go Libraries

  • Database driver: github.com/jackc/pgx/v5 (high-performance, PostgreSQL-specific)
  • Query builder: github.com/Masterminds/squirrel (optional, for complex queries)
  • Migrations: github.com/golang-migrate/migrate/v4
  • Connection pooling: Built into pgx
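
Applying migrations with golang-migrate could be wired up roughly like this (the migrations path and DATABASE_URL variable are assumptions):

package main

import (
  "errors"
  "log"
  "os"

  "github.com/golang-migrate/migrate/v4"
  _ "github.com/golang-migrate/migrate/v4/database/postgres" // registers the postgres driver
  _ "github.com/golang-migrate/migrate/v4/source/file"       // registers the file:// source
)

func main() {
  m, err := migrate.New("file://migrations", os.Getenv("DATABASE_URL"))
  if err != nil {
    log.Fatalf("migrate init: %v", err)
  }
  // Apply all pending up-migrations; ErrNoChange just means we are already current.
  if err := m.Up(); err != nil && !errors.Is(err, migrate.ErrNoChange) {
    log.Fatalf("migrate up: %v", err)
  }
}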

Backward Compatibility

Maintain the in-memory store as an option:

  • Feature flag or env var: STORAGE_TYPE=postgres|memory
  • Development defaults to memory for simplicity
  • Production uses PostgreSQL for reliability
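
The switch between backends could be as simple as the sketch below (STORAGE_TYPE comes from the list above; the constructor names are hypothetical):

// NewStore picks a backend from the STORAGE_TYPE env var
// (hypothetical constructors; defaults to memory as described above).
func NewStore(ctx context.Context) (Store, error) {
  switch os.Getenv("STORAGE_TYPE") {
  case "postgres":
    return NewPostgresStore(ctx, os.Getenv("DATABASE_URL"))
  case "memory", "":
    return NewMemoryStore(), nil
  default:
    return nil, fmt.Errorf("unknown STORAGE_TYPE %q", os.Getenv("STORAGE_TYPE"))
  }
}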

Testing

  • Add integration tests with testcontainers
  • Unit tests for store implementation
  • Migration tests
  • Performance benchmarks
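
Running one behavioural suite against every Store implementation keeps the memory and PostgreSQL backends in sync; a sketch (helper names are assumptions, the postgres helper would be backed by testcontainers in integration runs):

func TestStoreImplementations(t *testing.T) {
  impls := map[string]Store{
    "memory": NewMemoryStore(),
    // "postgres": newTestPostgresStore(t), // enabled in integration test runs
  }
  for name, s := range impls {
    t.Run(name, func(t *testing.T) {
      ctx := context.Background()
      want := &BaseMetadata{FileSHA256: "abc123", IpfsCID: "Qm...", FileName: "cat.png"}
      if err := s.Put(ctx, want); err != nil {
        t.Fatalf("Put: %v", err)
      }
      got, err := s.Get(ctx, want.FileSHA256)
      if err != nil || got == nil || got.IpfsCID != want.IpfsCID {
        t.Fatalf("Get = %+v, %v; want %+v", got, err, want)
      }
    })
  }
}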

Related Issues

Implementation Priority

High Priority - Data persistence is critical for production deployments; the current in-memory approach is suitable only for testing and development.

Additional Considerations

Alternative: SQLite

For simpler deployments, SQLite could be an intermediate step:

  • File-based persistence (simpler than PostgreSQL)
  • Still supports SQL queries and FTS
  • No separate database service required
  • Good for single-node deployments

However, PostgreSQL is recommended for:

  • Multi-node deployments
  • Better concurrency handling
  • More mature full-text search
  • Production-grade monitoring and tooling

Migrated from bryanchriswhite/PinShare#3
