
Troubleshooting Guide

Common issues and solutions for VideoGen Messenger.

Table of Contents

  • Server Issues
  • Database Issues
  • Redis Issues
  • Elasticsearch Issues
  • API Issues
  • Video Generation Issues
  • Performance Issues
  • Deployment Issues
  • Monitoring & Debugging
  • Getting Help
  • Emergency Procedures

Server Issues

Server Won't Start

Symptom: Server crashes immediately or won't start

Common Causes:

  1. Port already in use
  2. Missing environment variables
  3. Database connection failure
  4. Node version mismatch

Solutions:

# Check if port is in use
lsof -i :3000
# Kill process if needed
kill -9 <PID>

# Verify environment variables
cat .env
# Ensure all required vars are set

# Check Node version
node --version
# Should be 18.0.0 or higher

# Check logs for detailed error
npm run dev 2>&1 | tee server.log
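The missing-variable case can be caught at startup with a small guard. This is a minimal sketch; the variable names below are illustrative and should be matched to your own .env:

```javascript
// Fail fast at startup if required environment variables are missing.
// The variable names below are examples -- adjust them to your .env.
const REQUIRED_VARS = ['DATABASE_URL', 'REDIS_URL', 'JWT_SECRET'];

function checkEnv(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return true;
}
```

Calling a check like this before the server binds its port turns a cryptic crash into a one-line error naming exactly what is missing.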

ECONNREFUSED Errors

Symptom: Connection refused errors

Solution:

# Verify services are running
docker ps

# Restart services
docker-compose restart

# Check service health
pg_isready -h localhost -p 5432     # PostgreSQL (not HTTP, so curl won't work)
redis-cli -p 6379 ping              # Redis (should return PONG)
curl http://localhost:9200          # Elasticsearch

Module Not Found Errors

Symptom: Cannot find module errors

Solution:

# Clear node_modules and reinstall
rm -rf node_modules package-lock.json
npm install

# Clear npm cache
npm cache clean --force

Database Issues

Connection Timeout

Symptom: Database connection timeouts

Solutions:

# Check database is running
docker ps | grep postgres

# Test connection
psql postgresql://postgres:postgres@localhost:5432/videogen_dev

# Check connection pool settings
# In .env:
DATABASE_POOL_MAX=10
DATABASE_POOL_MIN=2

# Check for hanging connections
SELECT pid, usename, application_name, state, query
FROM pg_stat_activity
WHERE datname = 'videogen_dev';

# Kill hanging connections if needed
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'videogen_dev' AND pid <> pg_backend_pid();
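Timeouts are easier to diagnose when every query has an explicit deadline. A minimal wrapper, shown here as a sketch (the helper name is an assumption, not part of the codebase):

```javascript
// Wrap any promise (e.g. pool.query(...)) so a hang surfaces as an explicit
// error instead of a silent stall.
function withTimeout(promise, ms, label = 'query') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A query that exceeds the deadline then shows up in the logs with a clear label, which narrows the hunt in pg_stat_activity considerably.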

Migration Failures

Symptom: Database migrations fail

Solutions:

# Check migration status
npm run migrate:status

# Rollback failed migration
npm run migrate:rollback

# Run migrations with verbose logging
NODE_ENV=development npm run migrate

# Manually run SQL if needed
psql videogen_dev < migrations/001_initial.sql

Slow Queries

Symptom: Database queries taking too long

Solutions:

-- Enable query logging
ALTER DATABASE videogen_dev SET log_min_duration_statement = 100;

-- Find slow queries
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Add missing indexes
CREATE INDEX idx_videos_created_at ON videos(created_at DESC);
CREATE INDEX idx_videos_user_status ON videos(user_id, status);

-- Analyze table statistics
ANALYZE videos;
VACUUM ANALYZE videos;

Redis Issues

Redis Connection Failed

Symptom: Cannot connect to Redis

Solutions:

# Check Redis is running
docker ps | grep redis

# Test connection
redis-cli ping
# Should return "PONG"

# Check Redis logs
docker logs videogen-redis

# Restart Redis
docker restart videogen-redis

# Check Redis configuration
redis-cli CONFIG GET maxmemory
redis-cli CONFIG GET maxmemory-policy

Redis Memory Full

Symptom: OOM errors from Redis

Solutions:

# Check memory usage
redis-cli INFO memory

# Set eviction policy
redis-cli CONFIG SET maxmemory-policy allkeys-lru

# Increase memory limit
redis-cli CONFIG SET maxmemory 2gb

# Clear cache if needed
redis-cli FLUSHDB

Slow Redis Operations

Symptom: Redis commands timing out

Solutions:

# Check slow log
redis-cli SLOWLOG GET 10

# Monitor commands in real-time
redis-cli MONITOR

# Check for large keys
redis-cli --bigkeys

# Use pipeline for bulk operations
# Instead of multiple SET commands, use MSET
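The round-trip savings can be sketched with an in-memory stand-in for the client; with ioredis, the real calls would be `mset` or a `pipeline()`:

```javascript
// Batched writes make one round trip where a loop of SETs makes N.
// FakeRedis is an in-memory stand-in used only to count round trips.
class FakeRedis {
  constructor() {
    this.store = new Map();
    this.roundTrips = 0;
  }
  set(key, value) {
    this.roundTrips += 1;
    this.store.set(key, value);
  }
  mset(pairs) {
    this.roundTrips += 1; // one command, one round trip
    for (const [key, value] of pairs) this.store.set(key, value);
  }
}

const slow = new FakeRedis();
for (let i = 0; i < 100; i++) slow.set(`key:${i}`, i); // 100 round trips

const fast = new FakeRedis();
fast.mset(Array.from({ length: 100 }, (_, i) => [`key:${i}`, i])); // 1 round trip
```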

Elasticsearch Issues

Elasticsearch Not Starting

Symptom: Elasticsearch fails to start

Solutions:

# Check logs
docker logs videogen-elasticsearch

# Increase memory
# In docker-compose.yml:
ES_JAVA_OPTS=-Xms1g -Xmx1g

# Check disk space
df -h

# Reset Elasticsearch
docker-compose down
docker volume rm videogen_es_data
docker-compose up -d

Index Not Found

Symptom: index_not_found_exception

Solutions:

# Check indexes
curl http://localhost:9200/_cat/indices?v

# Create index
curl -X PUT http://localhost:9200/videos_dev \
  -H 'Content-Type: application/json' \
  -d @index-mapping.json

# Or use the service method
node -e "
const SearchService = require('./services/search/SearchService.js');
const service = new SearchService();
service.createIndex().then(() => console.log('Index created'));
"

Search Queries Slow

Symptom: Elasticsearch queries taking too long

Solutions:

# Check cluster health
curl http://localhost:9200/_cluster/health?pretty

# Profile slow queries
curl -X GET "http://localhost:9200/videos/_search?pretty" \
  -H 'Content-Type: application/json' \
  -d '{ "profile": true, "query": {...} }'

# Optimize index
curl -X POST "http://localhost:9200/videos/_forcemerge?max_num_segments=1"

# Increase refresh interval
curl -X PUT "http://localhost:9200/videos/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "index": { "refresh_interval": "30s" } }'

API Issues

401 Unauthorized

Symptom: Authentication failures

Solutions:

# Verify JWT token is valid
# Use jwt.io to decode token

# Check JWT_SECRET matches
echo $JWT_SECRET

# Generate new token
curl -X POST http://localhost:3000/api/v1/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"email":"user@example.com","password":"password"}'

# Check token expiration
# Tokens expire after JWT_EXPIRY (default: 24h)
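To check expiration locally, the payload can be decoded without verifying the signature. This is for inspection only, never for authorization decisions:

```javascript
// Decode a JWT payload without verifying the signature -- debugging only.
function decodeJwtPayload(token) {
  const payload = token.split('.')[1];
  return JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
}

// exp is a Unix timestamp in seconds; Date.now() is milliseconds.
function isExpired(token, nowMs = Date.now()) {
  const { exp } = decodeJwtPayload(token);
  return exp !== undefined && exp * 1000 < nowMs;
}
```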

429 Too Many Requests

Symptom: Rate limit exceeded

Solutions:

# Check rate limit settings
# In .env:
RATE_LIMIT_WINDOW_MS=900000  # 15 minutes
RATE_LIMIT_MAX_REQUESTS=100

# Clear rate limit for user (Redis)
redis-cli DEL ratelimit:user123:/api/v1/generate

# Increase limits for development
RATE_LIMIT_MAX_REQUESTS=10000
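The limiter behind these settings is conceptually a fixed-window counter. An in-memory sketch of the idea (in the real service the counter lives in Redis, keyed per user and route, so it is shared across instances):

```javascript
// Fixed-window rate limiter sketch. Returns an allow(key) function that
// permits maxRequests per windowMs per key.
function makeRateLimiter(maxRequests, windowMs) {
  const windows = new Map(); // key -> { start, count }
  return function allow(key, now = Date.now()) {
    const w = windows.get(key);
    if (!w || now - w.start >= windowMs) {
      windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (w.count < maxRequests) {
      w.count += 1;
      return true;
    }
    return false; // caller should respond with 429
  };
}
```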

500 Internal Server Error

Symptom: Unexpected server errors

Solutions:

# Check server logs
tail -f logs/error.log

# Enable debug logging
LOG_LEVEL=debug npm run dev

# Check Sentry for error details
# Or review CloudWatch logs in production

# Common causes:
# - Uncaught exceptions
# - Database connection issues
# - External API failures
# - Missing environment variables

Video Generation Issues

Generation Stuck in Processing

Symptom: Video generation never completes

Solutions:

# Check job status in Redis
redis-cli HGETALL generation:job:JOB_ID

# Check BullMQ queue
redis-cli LRANGE bull:generation:active 0 -1

# Restart workers
docker restart videogen-workers

# Check provider API status
# Google Veo: https://status.google.com
# Runway: https://status.runway.ml
# Minimax: Check their status page

# Manually fail stuck job
redis-cli HSET generation:job:JOB_ID status failed
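A watchdog that flags long-running jobs can automate the manual step above. A minimal sketch (the job object shape is an assumption for illustration):

```javascript
// Find jobs that have been "processing" longer than maxAgeMs so they can be
// failed and retried. The job shape here is illustrative.
function findStuckJobs(jobs, maxAgeMs, now = Date.now()) {
  return jobs.filter(
    (job) => job.status === 'processing' && now - job.startedAt > maxAgeMs
  );
}
```

Run on a timer against the active job list, this finds candidates for the manual `HSET ... status failed` step without eyeballing the queue.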

Provider API Errors

Symptom: AI provider returns errors

Solutions:

# Check API keys are valid
echo $GOOGLE_VEO_API_KEY
echo $RUNWAY_API_KEY
echo $MINIMAX_API_KEY

# Test API connectivity
curl -H "Authorization: Bearer $GOOGLE_VEO_API_KEY" \
  https://api.veo.google.com/v1/status

# Check rate limits
# Verify not exceeding provider limits

# Switch to different provider
# System automatically falls back if available

# Check provider-specific error codes
# Consult provider documentation
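The automatic fallback mentioned above amounts to trying providers in priority order. A minimal sketch, where the provider objects and their `generate()` method are assumed interfaces:

```javascript
// Try each provider in order; collect errors so the final failure explains
// what happened at every step. generate() is an assumed interface.
async function generateWithFallback(providers, prompt) {
  const errors = [];
  for (const provider of providers) {
    try {
      return await provider.generate(prompt);
    } catch (err) {
      errors.push(`${provider.name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}
```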

Video Download Failed

Symptom: Cannot download generated video

Solutions:

# Check S3 credentials
aws s3 ls s3://videogen-videos-dev/

# Verify S3 bucket exists
aws s3api head-bucket --bucket videogen-videos-dev

# Check CloudFront distribution
aws cloudfront get-distribution --id DISTRIBUTION_ID

# Test video URL directly
curl -I https://cdn.yourdomain.com/video.mp4

# Check CORS configuration
# Ensure S3 bucket allows your domain

Performance Issues

High Latency

Symptom: API responses are slow

Solutions:

  1. Enable Caching:

    // Cache-aside: return from Redis on a hit, otherwise compute and store
    const cached = await redis.get(cacheKey);
    if (cached) return JSON.parse(cached);
    const result = await expensiveOperation(); // hypothetical expensive call
    await redis.set(cacheKey, JSON.stringify(result), 'EX', 300); // 5 min TTL
    return result;
  2. Optimize Database Queries:

    -- Use EXPLAIN ANALYZE
    EXPLAIN ANALYZE
    SELECT * FROM videos WHERE user_id = '123';
    
    -- Add indexes
    CREATE INDEX idx_videos_user_id ON videos(user_id);
  3. Connection Pooling:

    # Increase pool size in .env
    DATABASE_POOL_MAX=20
  4. Enable Compression:

    // Already enabled via compression middleware
    // Verify in response headers: Content-Encoding: gzip

Memory Leaks

Symptom: Memory usage constantly increasing

Solutions:

# Monitor memory usage
node --inspect api/server.js
# Open chrome://inspect

# Take heap snapshots
curl http://localhost:9229/json/list

# Common causes:
# - Event listeners not removed
# - Large objects in memory
# - Unclosed connections

# Use weak references for caches
const cache = new WeakMap();

High CPU Usage

Symptom: CPU at 100%

Solutions:

# Profile CPU usage
node --prof api/server.js
node --prof-process isolate-*.log

# Common causes:
# - Inefficient loops
# - RegEx operations
# - JSON parsing large objects
# - Synchronous operations

# Use worker threads for CPU-intensive tasks
const { Worker } = require('worker_threads');

Deployment Issues

ECS Task Failing

Symptom: ECS tasks keep restarting

Solutions:

# Check task logs
aws logs tail /ecs/videogen-backend --follow

# Check task definition
aws ecs describe-tasks --cluster videogen --tasks TASK_ARN

# Common causes:
# - Health check failing
# - Insufficient memory
# - Environment variables missing
# - Container port mismatch

# Update task definition
aws ecs update-service --cluster videogen \
  --service videogen-backend \
  --force-new-deployment

Load Balancer 502/503

Symptom: Bad Gateway or Service Unavailable

Solutions:

# Check target health
aws elbv2 describe-target-health \
  --target-group-arn TARGET_GROUP_ARN

# Common causes:
# - Application not responding to health checks
# - Security group blocking traffic
# - Server taking too long to start

# Adjust health check settings
aws elbv2 modify-target-group \
  --target-group-arn TARGET_GROUP_ARN \
  --health-check-interval-seconds 30 \
  --healthy-threshold-count 2

Database Migration Failure in Production

Symptom: Migration fails on deployment

Solutions:

# Always test migrations in staging first
# Never run migrations directly in production

# Use migration-specific deployment
# 1. Deploy code without running migrations
# 2. Test manually in production
# 3. Run migrations separately
# 4. Monitor for errors

# Rollback procedure
npm run migrate:rollback
# Redeploy previous version

Monitoring & Debugging

Enable Debug Mode

# Local development
DEBUG=* npm run dev

# Production (temporary)
LOG_LEVEL=debug
# Remember to revert to 'info' after debugging

Check System Health

# Health endpoint
curl http://localhost:3000/health

# Database connection
curl http://localhost:3000/health/db

# Redis connection
curl http://localhost:3000/health/redis

# Elasticsearch connection
curl http://localhost:3000/health/search
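These endpoints typically sit on a handler that aggregates per-dependency checks. A sketch of that pattern (the checker functions are assumptions; each should resolve quickly on success or throw on failure):

```javascript
// Aggregate dependency checks into one health report. Each check is an
// async function that resolves when healthy and throws when not.
async function healthCheck(checks) {
  const results = {};
  let healthy = true;
  for (const [name, check] of Object.entries(checks)) {
    try {
      await check();
      results[name] = 'ok';
    } catch (err) {
      results[name] = err.message;
      healthy = false;
    }
  }
  return { status: healthy ? 'ok' : 'degraded', checks: results };
}
```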

Application Metrics

# Check CloudWatch metrics
aws cloudwatch get-metric-statistics \
  --namespace AWS/ECS \
  --metric-name CPUUtilization \
  --dimensions Name=ServiceName,Value=videogen-backend \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T23:59:59Z \
  --period 300 \
  --statistics Average

Getting Help

If you're still experiencing issues:

  1. Check Logs: Always start with application logs
  2. Search Issues: Check GitHub issues for similar problems
  3. Ask Community: Post in discussions with:
    • Error messages
    • Logs
    • Environment details
    • Steps to reproduce
  4. Create Issue: If it's a bug, create a detailed issue

Emergency Procedures

Production Outage

  1. Immediate Actions:

    # Check all services
    aws ecs describe-services --cluster videogen
    
    # Rollback to last known good version
    aws ecs update-service --cluster videogen \
      --service videogen-backend \
      --task-definition videogen-backend:PREVIOUS_VERSION
  2. Communication:

    • Post status update
    • Notify stakeholders
    • Update status page
  3. Investigation:

    • Collect logs
    • Check metrics
    • Review recent changes

Data Loss Prevention

# Emergency database backup
BACKUP_FILE="emergency_backup_$(date +%Y%m%d_%H%M%S).sql"
pg_dump videogen_prod > "$BACKUP_FILE"

# Copy the backup to S3
aws s3 cp "$BACKUP_FILE" s3://videogen-backups/emergency/