Migrate SelfDB Backend from Python/FastAPI to Go with Memcached #2



## Description
Our Python/FastAPI backend keeps running into its memory limits. This issue tracks the migration of our backend services from Python to Go, with Memcached added for caching, to improve performance and reduce the memory footprint.

## Background
Current issues with Python backend:
- High memory consumption under load
- GIL limitations affecting concurrent request handling
- Memory leaks in long-running async operations
- Slower startup times affecting container orchestration

## Objectives
1. Reduce memory usage by 70-80%
2. Improve request throughput by 3-5x
3. Reduce cold start times for containerized deployments
4. Maintain feature parity with existing Python implementation
5. Improve caching strategy with Memcached

## Proposed Architecture Changes

### 1. Technology Stack Migration

**From:**
- Python 3.13 with FastAPI
- SQLAlchemy ORM (async)
- Alembic for migrations
- In-memory caching
- asyncio for concurrency

**To:**
- Go 1.24+
- Gin or Fiber web framework
- GORM or sqlx for database
- golang-migrate for migrations
- Memcached for distributed caching
- Goroutines for concurrency

### 2. Service Architecture Changes

**Current Structure:**
```
backend/
├── app/
│   ├── apis/endpoints/    # FastAPI routers
│   ├── core/             # Config, security
│   ├── crud/             # Database operations
│   ├── models/           # SQLAlchemy models
│   └── schemas/          # Pydantic schemas
```

**Proposed Go Structure:**
```
backend-go/
├── cmd/
│   └── server/           # Main application entry
├── internal/
│   ├── api/              # HTTP handlers
│   ├── auth/             # Authentication logic
│   ├── cache/            # Memcached client wrapper
│   ├── config/           # Configuration management
│   ├── database/         # DB connection and queries
│   ├── middleware/       # HTTP middleware
│   ├── models/           # Domain models
│   ├── realtime/         # WebSocket handlers
│   └── storage/          # Storage service client
├── pkg/
│   └── utils/            # Shared utilities
└── migrations/           # SQL migrations
```

### 3. Key Component Migrations

#### Authentication & Security
**Current (`backend/app/core/security.py`):**
```python
def create_access_token(subject: str, expires_delta: timedelta = None) -> str:
    # JWT creation with python-jose
    ...
```

**Proposed Go:**
```go
// internal/auth/jwt.go
func CreateAccessToken(subject string, expiresDelta time.Duration) (string, error) {
    // Use github.com/golang-jwt/jwt/v5
}
```

#### Database Layer
**Current (`backend/app/db/session.py`):**
```python
engine = create_async_engine(settings.DATABASE_URL)
AsyncSessionLocal = async_sessionmaker(engine)
```

**Proposed Go:**
```go
// internal/database/connection.go
type DB struct {
    *sql.DB
    cache *memcache.Client
}

func NewDB(dsn string, cacheAddr string) (*DB, error) {
    // Initialize both DB and Memcached connections
}
```

#### API Endpoints
**Current (`backend/app/apis/endpoints/files.py`):**
```python
@router.post("/upload", response_model=FileUploadResponse)
async def upload_file_to_bucket(
    upload_request: FileUploadInitiateRequest,
    db: AsyncSession = Depends(get_db),
    requester: Union[User, Literal["anon"], None] = Depends(get_current_user_or_anon),
):
    # File upload logic
    ...

**Proposed Go:**
```go
// internal/api/files.go
func (h *Handler) UploadFile(c *gin.Context) {
    // Parse request
    // Check auth via middleware
    // Cache check
    // Process upload
    // Update cache
}
```

### 4. Memcached Integration

**Caching Strategy:**
```go
// internal/cache/client.go
type CacheClient struct {
    mc  *memcache.Client
    ttl int
}

// Cache key patterns
const (
    UserCacheKey      = "user:%s"
    BucketCacheKey    = "bucket:%s"
    FileCacheKey      = "file:%s"
    TableMetaCacheKey = "table:meta:%s"
)
```

**Cache Usage Example:**
```go
func (s *UserService) GetUser(ctx context.Context, userID string) (*models.User, error) {
    // Try cache first
    cacheKey := fmt.Sprintf(UserCacheKey, userID)
    if cached, err := s.cache.Get(cacheKey); err == nil {
        return unmarshalUser(cached)
    }
    
    // Database fetch
    user, err := s.db.GetUser(ctx, userID)
    if err != nil {
        return nil, err
    }
    
    // Update cache (best effort: errors are ignored so a cache outage never fails the read)
    s.cache.Set(cacheKey, marshalUser(user), 300) // 5 min TTL
    return user, nil
}
```

### 5. Docker Compose Changes

**Add Memcached Service:**
```yaml
# docker-compose.yml
services:
  # ... existing services ...
  
  memcached:
    image: memcached:1.6-alpine
    container_name: selfdb_memcached
    ports:
      - "127.0.0.1:11211:11211"  # Memcached has no auth; publish only on loopback
    command: memcached -m 256 -I 2m
    restart: unless-stopped
    networks:
      - selfdb_network
    healthcheck:
      test: ["CMD-SHELL", "printf 'stats\\nquit\\n' | nc localhost 11211 | grep -q uptime"]
      interval: 10s
      timeout: 5s
      retries: 5

  backend:
    build:
      context: ./backend-go
      dockerfile: Dockerfile
    environment:
      - MEMCACHED_ADDR=memcached:11211
      - CACHE_TTL=300
    depends_on:
      - postgres
      - memcached
      - storage_service
    networks:
      - selfdb_network
```

### 6. Migration Strategy

#### Phase 1: Core Services (Week 1-2)
- [ ] Set up Go project structure
- [ ] Implement configuration management
- [ ] Create database connection pool with GORM/sqlx
- [ ] Set up Memcached client
- [ ] Implement JWT authentication
- [ ] Create middleware (CORS, Auth, Rate Limiting)

#### Phase 2: API Endpoints (Week 3-4)
- [ ] Migrate health check endpoints
- [ ] Migrate auth endpoints (login, register, refresh)
- [ ] Migrate user management endpoints
- [ ] Add comprehensive caching for user data

#### Phase 3: Storage Integration (Week 5)
- [ ] Migrate storage service client
- [ ] Implement file upload/download endpoints
- [ ] Migrate bucket management
- [ ] Cache file metadata aggressively

#### Phase 4: Advanced Features (Week 6-7)
- [ ] Migrate WebSocket/realtime functionality
- [ ] Implement SQL execution endpoint
- [ ] Migrate table management
- [ ] Migrate schema management
- [ ] Migrate cloud functions management

#### Phase 5: Testing & Optimization (Week 8)
- [ ] Performance benchmarking
- [ ] Load testing comparison
- [ ] Memory profiling
- [ ] Cache hit ratio optimization
- [ ] API compatibility testing

### 7. Performance Targets

**Memory Usage:**
- Python Backend: ~500MB-1GB per instance
- Go Backend Target: <100MB per instance
- Memcached: 256MB dedicated

**Request Latency:**
- Auth endpoints: <50ms (with cache)
- File metadata: <20ms (with cache)
- Large file operations: No change (storage-bound)

**Throughput:**
- Current: ~1000 req/s per instance
- Target: 5000+ req/s per instance

### 8. Breaking Changes & Compatibility

**API Compatibility:**
- Maintain exact same REST API structure
- Same request/response formats
- Same authentication headers
- WebSocket protocol unchanged

**Configuration Changes:**
- New environment variables for Memcached
- Modified DATABASE_URL format for Go
- New cache TTL configurations
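
SQLAlchemy's async engine needs a driver-qualified URL, most likely of the form `postgresql+asyncpg://...` (the exact driver in `settings.DATABASE_URL` is an assumption here), while Go Postgres drivers such as pgx and lib/pq expect a plain `postgres://` or `postgresql://` URL. A sketch of a hypothetical conversion helper that would let both backends share one environment variable during the rollout:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// GoDSNFromSQLAlchemyURL rewrites "postgresql+<driver>://..." to
// "postgresql://...". It only strips the "+driver" suffix; driver-specific
// query options are passed through untranslated.
func GoDSNFromSQLAlchemyURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parse DATABASE_URL: %w", err)
	}
	dialect, _, _ := strings.Cut(u.Scheme, "+")
	if dialect != "postgresql" && dialect != "postgres" {
		return "", fmt.Errorf("unsupported dialect %q", dialect)
	}
	u.Scheme = "postgresql"
	return u.String(), nil
}
```

Query parameters specific to the Python driver (SSL options, for instance) may still need manual translation.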

### 9. Rollback Strategy

- Keep Python backend in maintenance mode
- Use feature flags for gradual rollout
- Implement API gateway for A/B testing
- Maintain database compatibility

## Acceptance Criteria

- [ ] All existing API endpoints migrated with same interface
- [ ] All tests passing with >90% coverage
- [ ] Memory usage reduced by at least 70%
- [ ] Request throughput increased by at least 3x
- [ ] Cache hit ratio >80% for frequently accessed data
- [ ] Zero downtime migration completed
- [ ] Documentation updated for Go implementation
- [ ] Docker images optimized (multi-stage builds)
- [ ] Monitoring and metrics implemented
- [ ] Load testing shows stable performance under stress

## Additional Considerations

1. **Dependency Management:**
   - Use Go modules for dependency management
   - Pin all dependency versions
   - Regular security updates

2. **Error Handling:**
   - Implement structured logging with zerolog
   - Proper error wrapping and context
   - Graceful degradation when cache unavailable

3. **Monitoring:**
   - Prometheus metrics for Go application
   - Memcached statistics monitoring
   - Custom dashboards for cache performance

4. **Development Workflow:**
   - Hot reload for development (Air)
   - Makefile for common tasks
   - Pre-commit hooks for formatting

## References

- Current Python implementation: main.py
- FastAPI to Gin migration guide
- GORM documentation
- Memcached best practices
- Go concurrency patterns
