Migrate SelfDB Backend from Python/FastAPI to Go with Memcached #2

@Selfdb-io

Description

Our Python/FastAPI backend has been running into memory limits. This issue tracks migrating the backend services from Python to Go and adding Memcached for caching, with the goal of improving performance and reducing the memory footprint.

Background

Current issues with Python backend:

  • High memory consumption under load
  • GIL limitations affecting concurrent request handling
  • Memory leaks in long-running async operations
  • Slower startup times affecting container orchestration

Objectives

  1. Reduce memory usage by 70-80%
  2. Improve request throughput by 3-5x
  3. Reduce cold start times for containerized deployments
  4. Maintain feature parity with existing Python implementation
  5. Improve caching strategy with Memcached

Proposed Architecture Changes

1. Technology Stack Migration

From:

  • Python 3.13 with FastAPI
  • SQLAlchemy ORM (async)
  • Alembic for migrations
  • In-memory caching
  • asyncio for concurrency

To:

  • Go 1.24+
  • Gin or Fiber web framework
  • GORM or sqlx for database
  • golang-migrate for migrations
  • Memcached for distributed caching
  • Goroutines for concurrency

2. Service Architecture Changes

Current Structure:

backend/
├── app/
│   ├── apis/endpoints/    # FastAPI routers
│   ├── core/             # Config, security
│   ├── crud/             # Database operations
│   ├── models/           # SQLAlchemy models
│   └── schemas/          # Pydantic schemas

Proposed Go Structure:

backend-go/
├── cmd/
│   └── server/           # Main application entry
├── internal/
│   ├── api/              # HTTP handlers
│   ├── auth/             # Authentication logic
│   ├── cache/            # Memcached client wrapper
│   ├── config/           # Configuration management
│   ├── database/         # DB connection and queries
│   ├── middleware/       # HTTP middleware
│   ├── models/           # Domain models
│   ├── realtime/         # WebSocket handlers
│   └── storage/          # Storage service client
├── pkg/
│   └── utils/            # Shared utilities
└── migrations/           # SQL migrations

3. Key Component Migrations

Authentication & Security

Current (backend/app/core/security.py):

def create_access_token(subject: str, expires_delta: timedelta | None = None) -> str:
    # JWT creation with python-jose
    ...

Proposed Go:

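// internal/auth/jwt.go
func CreateAccessToken(subject string, expiresDelta time.Duration) (string, error) {
    // Use github.com/golang-jwt/jwt/v5. Assumed here: HS256 and a []byte secretKey from config.
    expiry := jwt.NewNumericDate(time.Now().Add(expiresDelta))
    claims := jwt.RegisteredClaims{Subject: subject, ExpiresAt: expiry}
    return jwt.NewWithClaims(jwt.SigningMethodHS256, claims).SignedString(secretKey)
}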

Database Layer

Current (backend/app/db/session.py):

engine = create_async_engine(settings.DATABASE_URL)
AsyncSessionLocal = async_sessionmaker(engine)

Proposed Go:

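// internal/database/connection.go
type DB struct {
    *sql.DB
    cache *memcache.Client
}

// NewDB initializes both the DB and Memcached connections.
// Assumes a driver registered as "pgx" (github.com/jackc/pgx/v5/stdlib) and
// the github.com/bradfitz/gomemcache/memcache client.
func NewDB(dsn string, cacheAddr string) (*DB, error) {
    conn, err := sql.Open("pgx", dsn)
    if err != nil {
        return nil, err
    }
    return &DB{DB: conn, cache: memcache.New(cacheAddr)}, nil
}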

API Endpoints

Current (backend/app/apis/endpoints/files.py):

@router.post("/upload", response_model=FileUploadResponse)
async def upload_file_to_bucket(
    upload_request: FileUploadInitiateRequest,  # parameters without defaults must come first
    db: AsyncSession = Depends(get_db),
    requester: Union[User, Literal["anon"], None] = Depends(get_current_user_or_anon),
):
    # File upload logic
    ...

Proposed Go:

// internal/api/files.go
func (h *Handler) UploadFile(c *gin.Context) {
    // Parse request
    // Check auth via middleware
    // Cache check
    // Process upload
    // Update cache
}

4. Memcached Integration

Caching Strategy:

// internal/cache/client.go
type CacheClient struct {
    mc *memcache.Client
    ttl int
}

// Cache keys pattern
const (
    UserCacheKey = "user:%s"
    BucketCacheKey = "bucket:%s"
    FileCacheKey = "file:%s"
    TableMetaCacheKey = "table:meta:%s"
)
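
The key patterns above are plain format strings, so a small helper can build and validate keys in one place. The sketch below assumes a hypothetical `buildKey` function (not part of the proposal) and relies on Memcached's documented key limits: at most 250 bytes, with no whitespace or control characters.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// UserCacheKey mirrors one of the key patterns proposed above.
const UserCacheKey = "user:%s"

// maxKeyLen is Memcached's key length limit in bytes.
const maxKeyLen = 250

var errInvalidKey = errors.New("cache: invalid key")

// buildKey (hypothetical helper) formats a key from a pattern and an ID,
// rejecting empty IDs and keys that Memcached would refuse.
func buildKey(pattern, id string) (string, error) {
	if id == "" {
		return "", errInvalidKey
	}
	key := fmt.Sprintf(pattern, id)
	if len(key) > maxKeyLen {
		return "", errInvalidKey
	}
	// Reject spaces and control characters anywhere in the key.
	isBad := func(r rune) bool { return r <= ' ' || r == 0x7f }
	if strings.IndexFunc(key, isBad) >= 0 {
		return "", errInvalidKey
	}
	return key, nil
}
```

Under these assumptions, `buildKey(UserCacheKey, "42")` yields `user:42`, while an ID containing a space is rejected before it reaches the cache.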

Cache Usage Example:

func (s *UserService) GetUser(ctx context.Context, userID string) (*models.User, error) {
    // Try cache first
    cacheKey := fmt.Sprintf(UserCacheKey, userID)
    if cached, err := s.cache.Get(cacheKey); err == nil {
        return unmarshalUser(cached)
    }
    
    // Database fetch
    user, err := s.db.GetUser(ctx, userID)
    if err != nil {
        return nil, err
    }
    
    // Update cache
    s.cache.Set(cacheKey, marshalUser(user), 300) // 5 min TTL
    return user, nil
}
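
To make the cache-aside flow above easier to test without a running Memcached, here is a self-contained sketch of the same pattern. The `Cache` and `UserStore` interfaces, the `User` fields, and the in-memory `mapCache` and `countingStore` stand-ins are all assumptions for illustration, not the proposed types. Cache failures and corrupt entries are treated as misses, which is one possible reading of "graceful degradation".

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"time"
)

// User is a placeholder model; the real fields are not specified in this issue.
type User struct {
	ID    string `json:"id"`
	Email string `json:"email"`
}

var ErrCacheMiss = errors.New("cache miss")

// Cache is a hypothetical interface the Memcached wrapper could satisfy.
type Cache interface {
	Get(key string) ([]byte, error)
	Set(key string, value []byte, ttl time.Duration) error
}

// UserStore is a hypothetical interface over the database layer.
type UserStore interface {
	GetUser(ctx context.Context, id string) (*User, error)
}

type UserService struct {
	cache Cache
	db    UserStore
}

// GetUser tries the cache first and falls back to the database on any
// cache error or undecodable entry.
func (s *UserService) GetUser(ctx context.Context, id string) (*User, error) {
	key := "user:" + id
	if raw, err := s.cache.Get(key); err == nil {
		var u User
		if json.Unmarshal(raw, &u) == nil {
			return &u, nil
		}
		// Corrupt entry: fall through to the database.
	}
	u, err := s.db.GetUser(ctx, id)
	if err != nil {
		return nil, err
	}
	// Best-effort cache fill; a failed Set must not fail the request.
	if raw, err := json.Marshal(u); err == nil {
		_ = s.cache.Set(key, raw, 5*time.Minute)
	}
	return u, nil
}

// mapCache is an in-memory stand-in for Memcached (TTL is ignored).
type mapCache map[string][]byte

func (m mapCache) Get(k string) ([]byte, error) {
	v, ok := m[k]
	if !ok {
		return nil, ErrCacheMiss
	}
	return v, nil
}

func (m mapCache) Set(k string, v []byte, _ time.Duration) error {
	m[k] = v
	return nil
}

// countingStore is a fake database that counts how often it is queried.
type countingStore struct{ calls int }

func (c *countingStore) GetUser(_ context.Context, id string) (*User, error) {
	c.calls++
	return &User{ID: id, Email: id + "@example.com"}, nil
}
```

With these stand-ins, two consecutive `GetUser` calls for the same ID should hit the fake database only once.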

5. Docker Compose Changes

Add Memcached Service:

# docker-compose.yml
services:
  # ... existing services ...
  
  memcached:
    image: memcached:1.6-alpine
    container_name: selfdb_memcached
    ports:
      - "11211:11211"
    command: memcached -m 256 -I 2m
    restart: unless-stopped
    networks:
      - selfdb_network
    healthcheck:
      # CMD-SHELL is needed so the pipe is interpreted by a shell
      test: ["CMD-SHELL", "echo stats | nc localhost 11211"]
      interval: 10s
      timeout: 5s
      retries: 5

  backend:
    build:
      context: ./backend-go
      dockerfile: Dockerfile
    environment:
      - MEMCACHED_ADDR=memcached:11211
      - CACHE_TTL=300
    depends_on:
      - postgres
      - memcached
      - storage_service
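
On the Go side, the two environment variables set above could be read roughly as follows. This is a sketch only: `CacheConfig` and `loadCacheConfig` are hypothetical names, `CACHE_TTL` is assumed to be in seconds, and the 300-second default simply mirrors the compose value.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"time"
)

// CacheConfig holds the Memcached settings taken from the environment.
type CacheConfig struct {
	Addr string
	TTL  time.Duration
}

// loadCacheConfig reads MEMCACHED_ADDR and CACHE_TTL through getenv.
// Passing the lookup function in keeps the loader testable; in the
// server it would be called with os.Getenv.
func loadCacheConfig(getenv func(string) string) (CacheConfig, error) {
	cfg := CacheConfig{Addr: getenv("MEMCACHED_ADDR"), TTL: 300 * time.Second}
	if cfg.Addr == "" {
		return cfg, errors.New("MEMCACHED_ADDR is not set")
	}
	if raw := getenv("CACHE_TTL"); raw != "" {
		secs, err := strconv.Atoi(raw)
		if err != nil || secs <= 0 {
			return cfg, fmt.Errorf("CACHE_TTL must be a positive integer, got %q", raw)
		}
		cfg.TTL = time.Duration(secs) * time.Second
	}
	return cfg, nil
}
```

Failing fast on a missing address is a design choice; a deployment that wants to run without a cache could instead treat an empty `MEMCACHED_ADDR` as "caching disabled".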

6. Migration Strategy

Phase 1: Core Services (Week 1-2)

  • Set up Go project structure
  • Implement configuration management
  • Create database connection pool with GORM/sqlx
  • Set up Memcached client
  • Implement JWT authentication
  • Create middleware (CORS, Auth, Rate Limiting)

Phase 2: API Endpoints (Week 3-4)

  • Migrate health check endpoints
  • Migrate auth endpoints (login, register, refresh)
  • Migrate user management endpoints
  • Add comprehensive caching for user data

Phase 3: Storage Integration (Week 5)

  • Migrate storage service client
  • Implement file upload/download endpoints
  • Migrate bucket management
  • Cache file metadata aggressively

Phase 4: Advanced Features (Week 6-7)

  • Migrate WebSocket/realtime functionality
  • Implement SQL execution endpoint
  • Migrate table management
  • Migrate schema management
  • Migrate cloud functions management

Phase 5: Testing & Optimization (Week 8)

  • Performance benchmarking
  • Load testing comparison
  • Memory profiling
  • Cache hit ratio optimization
  • API compatibility testing
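
For the cache hit ratio work in this phase, the ratio has to be measured somewhere. A minimal sketch, assuming a hypothetical `HitStats` type that the cache wrapper would update on every lookup:

```go
package main

import "sync/atomic"

// HitStats counts cache hits and misses; it is safe for concurrent use.
type HitStats struct {
	hits   atomic.Int64
	misses atomic.Int64
}

// Record registers the outcome of one cache lookup.
func (s *HitStats) Record(hit bool) {
	if hit {
		s.hits.Add(1)
		return
	}
	s.misses.Add(1)
}

// Ratio returns hits / (hits + misses), or 0 when nothing was recorded.
func (s *HitStats) Ratio() float64 {
	h, m := s.hits.Load(), s.misses.Load()
	if h+m == 0 {
		return 0
	}
	return float64(h) / float64(h+m)
}
```

The same counters could later be exported as metrics; that wiring is left out here.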

7. Performance Targets

Memory Usage:

  • Python Backend: ~500MB-1GB per instance
  • Go Backend Target: <100MB per instance
  • Memcached: 256MB dedicated

Request Latency:

  • Auth endpoints: <50ms (with cache)
  • File metadata: <20ms (with cache)
  • Large file operations: No change (storage-bound)

Throughput:

  • Current: ~1000 req/s per instance
  • Target: 5000+ req/s per instance

8. Breaking Changes & Compatibility

API Compatibility:

  • Maintain exact same REST API structure
  • Same request/response formats
  • Same authentication headers
  • WebSocket protocol unchanged

Configuration Changes:

  • New environment variables for Memcached
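  • Modified DATABASE_URL format for Go (e.g. if the current URL uses a SQLAlchemy driver suffix such as postgresql+asyncpg://, Go drivers expect a plain postgres:// URL)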
  • New cache TTL configurations

9. Rollback Strategy

  • Keep Python backend in maintenance mode
  • Use feature flags for gradual rollout
  • Implement API gateway for A/B testing
  • Maintain database compatibility
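
One way a gradual rollout flag could decide which backend serves a request is a deterministic percentage bucket per user. The sketch below is an assumption about how such a flag might work, not a description of an existing gateway; `useGoBackend` and `rolloutPercent` are hypothetical names.

```go
package main

import "hash/fnv"

// useGoBackend reports whether a user falls inside the rollout percentage.
// Hashing the user ID keeps each user on the same backend between requests.
func useGoBackend(userID string, rolloutPercent uint32) bool {
	if rolloutPercent >= 100 {
		return true
	}
	h := fnv.New32a()
	h.Write([]byte(userID)) // Write on a hash.Hash never returns an error
	return h.Sum32()%100 < rolloutPercent
}
```

Setting the percentage back to 0 sends all traffic to the Python backend again, which is what makes this usable as a rollback lever.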

Acceptance Criteria

  • All existing API endpoints migrated with same interface
  • All tests passing with >90% coverage
  • Memory usage reduced by at least 70%
  • Request throughput increased by at least 3x
  • Cache hit ratio >80% for frequently accessed data
  • Zero downtime migration completed
  • Documentation updated for Go implementation
  • Docker images optimized (multi-stage builds)
  • Monitoring and metrics implemented
  • Load testing shows stable performance under stress

Additional Considerations

  1. Dependency Management:

    • Use Go modules for dependency management
    • Pin all dependency versions
    • Regular security updates
  2. Error Handling:

    • Implement structured logging with zerolog
    • Proper error wrapping and context
    • Graceful degradation when cache unavailable
  3. Monitoring:

    • Prometheus metrics for Go application
    • Memcached statistics monitoring
    • Custom dashboards for cache performance
  4. Development Workflow:

    • Hot reload for development (Air)
    • Makefile for common tasks
    • Pre-commit hooks for formatting
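
For the error wrapping point above, the standard library's `%w` verb and `errors.Is` are usually enough. A small sketch, where `ErrUserNotFound`, `loadUser`, and `isNotFound` are hypothetical names and the lookup is hard-coded to fail for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUserNotFound is a sentinel error callers can test for.
var ErrUserNotFound = errors.New("user not found")

// loadUser always fails in this sketch; it wraps the sentinel with context.
func loadUser(id string) error {
	return fmt.Errorf("load user %q: %w", id, ErrUserNotFound)
}

// isNotFound checks the whole wrap chain, so added context does not hide the cause.
func isNotFound(err error) bool {
	return errors.Is(err, ErrUserNotFound)
}
```

A handler can then map `isNotFound(err)` to a 404 while logging the full wrapped message.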

References

  • Current Python implementation: main.py
  • FastAPI to Gin migration guide
  • GORM documentation
  • Memcached best practices
  • Go concurrency patterns
