Common issues and solutions for VideoGen Messenger.
- Server Issues
- Database Issues
- Redis Issues
- Elasticsearch Issues
- API Issues
- Video Generation Issues
- Performance Issues
- Deployment Issues
Server Issues

Symptom: Server crashes immediately or won't start
Common Causes:
- Port already in use
- Missing environment variables
- Database connection failure
- Node version mismatch
Solutions:
# Check if port is in use
lsof -i :3000
# Kill process if needed
kill -9 <PID>
# Verify environment variables
cat .env
# Ensure all required vars are set
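A startup guard can make missing variables fail fast with a clear message instead of crashing later. A minimal sketch; the variable names below are illustrative assumptions, not the project's definitive list:

```javascript
// Return the names of required environment variables that are unset or blank.
function missingEnvVars(required, env = process.env) {
  return required.filter((name) => !env[name] || env[name].trim() === '');
}

// Example boot-time check:
//   const missing = missingEnvVars(['DATABASE_URL', 'REDIS_URL', 'JWT_SECRET']);
//   if (missing.length > 0) {
//     console.error(`Missing env vars: ${missing.join(', ')}`);
//     process.exit(1);
//   }
```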
# Check Node version
node --version
# Should be 18.0.0 or higher
# Check logs for detailed error
npm run dev 2>&1 | tee server.log

Symptom: Connection refused errors
Solution:
# Verify services are running
docker ps
# Restart services
docker-compose restart
# Check service health
pg_isready -h localhost -p 5432      # PostgreSQL (curl cannot speak the Postgres protocol)
redis-cli -h localhost -p 6379 ping  # Redis (should return PONG)
curl http://localhost:9200           # Elasticsearch

Symptom: Cannot find module errors
Solution:
# Clear node_modules and reinstall
rm -rf node_modules package-lock.json
npm install
# Clear npm cache
npm cache clean --force

Database Issues

Symptom: Database connection timeouts
Solutions:
# Check database is running
docker ps | grep postgres
# Test connection
psql postgresql://postgres:postgres@localhost:5432/videogen_dev
# Check connection pool settings
# In .env:
DATABASE_POOL_MAX=10
DATABASE_POOL_MIN=2
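The pool settings can be read defensively so that a missing or malformed value falls back to a sane default instead of becoming `NaN`. A sketch; the helper name and defaults are assumptions:

```javascript
// Parse pool bounds from the environment, falling back to defaults
// when a variable is unset or not a number.
function poolConfigFromEnv(env = process.env) {
  const toInt = (value, fallback) => {
    const n = Number.parseInt(value, 10);
    return Number.isNaN(n) ? fallback : n;
  };
  return {
    max: toInt(env.DATABASE_POOL_MAX, 10),
    min: toInt(env.DATABASE_POOL_MIN, 2),
  };
}
```

The resulting object would then be passed to the pool constructor (for example `new Pool(poolConfigFromEnv())` with `pg`).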
# Check for hanging connections
SELECT pid, usename, application_name, state, query
FROM pg_stat_activity
WHERE datname = 'videogen_dev';
# Kill hanging connections if needed
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'videogen_dev' AND pid <> pg_backend_pid();

Symptom: Database migrations fail
Solutions:
# Check migration status
npm run migrate:status
# Rollback failed migration
npm run migrate:rollback
# Run migrations with verbose logging
NODE_ENV=development npm run migrate
# Manually run SQL if needed
psql videogen_dev < migrations/001_initial.sql

Symptom: Database queries taking too long
Solutions:
-- Enable query logging
ALTER DATABASE videogen_dev SET log_min_duration_statement = 100;
-- Find slow queries
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Add missing indexes
CREATE INDEX idx_videos_created_at ON videos(created_at DESC);
CREATE INDEX idx_videos_user_status ON videos(user_id, status);
-- Analyze table statistics
ANALYZE videos;
VACUUM ANALYZE videos;

Redis Issues

Symptom: Cannot connect to Redis
Solutions:
# Check Redis is running
docker ps | grep redis
# Test connection
redis-cli ping
# Should return "PONG"
# Check Redis logs
docker logs videogen-redis
# Restart Redis
docker restart videogen-redis
# Check Redis configuration
redis-cli CONFIG GET maxmemory
redis-cli CONFIG GET maxmemory-policy

Symptom: OOM errors from Redis
Solutions:
# Check memory usage
redis-cli INFO memory
# Set eviction policy
redis-cli CONFIG SET maxmemory-policy allkeys-lru
# Increase memory limit
redis-cli CONFIG SET maxmemory 2gb
# Clear cache if needed (warning: removes every key in the current database)
redis-cli FLUSHDB

Symptom: Redis commands timing out
Solutions:
# Check slow log
redis-cli SLOWLOG GET 10
# Monitor commands in real-time
redis-cli MONITOR
# Check for large keys
redis-cli --bigkeys
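As the next tip notes, many individual SETs can be collapsed into a single round trip. A sketch of building MSET arguments from a plain object; the client call shown in the comment assumes a node-redis-style `mSet`:

```javascript
// Flatten { key: value, ... } into [key, value, key, value, ...],
// suitable for one MSET call instead of N separate SETs.
function toMsetArgs(entries) {
  return Object.entries(entries).flat();
}

// e.g. client.mSet(toMsetArgs({ 'user:1': 'alice', 'user:2': 'bob' }))
```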
# Use pipeline for bulk operations
# Instead of multiple SET commands, use MSET

Elasticsearch Issues

Symptom: Elasticsearch fails to start
Solutions:
# Check logs
docker logs videogen-elasticsearch
# Increase memory
# In docker-compose.yml:
ES_JAVA_OPTS=-Xms1g -Xmx1g
# Check disk space
df -h
# Reset Elasticsearch
docker-compose down
docker volume rm videogen_es_data
docker-compose up -d

Symptom: index_not_found_exception
Solutions:
# Check indexes
curl http://localhost:9200/_cat/indices?v
# Create index
curl -X PUT http://localhost:9200/videos_dev \
-H 'Content-Type: application/json' \
-d @index-mapping.json
# Or use the service method
node -e "
const SearchService = require('./services/search/SearchService.js');
const service = new SearchService();
service.createIndex().then(() => console.log('Index created'));
"

Symptom: Elasticsearch queries taking too long
Solutions:
# Check cluster health
curl http://localhost:9200/_cluster/health?pretty
# Profile slow queries
curl -X GET "http://localhost:9200/videos/_search?pretty" \
-H 'Content-Type: application/json' \
-d '{ "profile": true, "query": {...} }'
# Optimize index
curl -X POST "http://localhost:9200/videos/_forcemerge?max_num_segments=1"
# Increase refresh interval
curl -X PUT "http://localhost:9200/videos/_settings" \
-H 'Content-Type: application/json' \
-d '{ "index": { "refresh_interval": "30s" } }'

API Issues

Symptom: Authentication failures
Solutions:
# Verify JWT token is valid
# Use jwt.io to decode token
# Check JWT_SECRET matches
echo $JWT_SECRET
# Generate new token
curl -X POST http://localhost:3000/api/v1/auth/login \
-H 'Content-Type: application/json' \
-d '{"email":"user@example.com","password":"password"}'
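Instead of pasting tokens into jwt.io, the payload can also be decoded locally. This is decoding only, with no signature verification, so use it to inspect claims, never for auth decisions:

```javascript
// Decode a JWT's payload without verifying the signature.
function decodeJwtPayload(token) {
  const [, payload] = token.split('.');
  if (!payload) throw new Error('not a JWT');
  return JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
}

// e.g. check expiry:
//   const { exp } = decodeJwtPayload(token);
//   console.log('expires', new Date(exp * 1000).toISOString());
```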
# Check token expiration
# Tokens expire after JWT_EXPIRY (default: 24h)

Symptom: Rate limit exceeded
Solutions:
# Check rate limit settings
# In .env:
RATE_LIMIT_WINDOW_MS=900000 # 15 minutes
RATE_LIMIT_MAX_REQUESTS=100
# Clear rate limit for user (Redis)
redis-cli DEL ratelimit:user123:/api/v1/generate
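The limiter behind these settings is a windowed counter. A minimal fixed-window sketch with an in-memory Map standing in for Redis; the key format mirrors the one above but is an assumption:

```javascript
// Fixed-window rate limiter. In the real service the counter would
// live in Redis under an expiring key; a Map stands in here.
function makeRateLimiter({ windowMs, max, store = new Map(), now = Date.now }) {
  return function allow(userId, route) {
    const window = Math.floor(now() / windowMs);
    const key = `ratelimit:${userId}:${route}:${window}`;
    const count = (store.get(key) || 0) + 1;
    store.set(key, count);
    return count <= max;
  };
}
```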
# Increase limits for development
RATE_LIMIT_MAX_REQUESTS=10000

Symptom: Unexpected server errors
Solutions:
# Check server logs
tail -f logs/error.log
# Enable debug logging
LOG_LEVEL=debug npm run dev
# Check Sentry for error details
# Or review CloudWatch logs in production
# Common causes:
# - Uncaught exceptions
# - Database connection issues
# - External API failures
# - Missing environment variables

Video Generation Issues

Symptom: Video generation never completes
Solutions:
# Check job status in Redis
redis-cli HGETALL generation:job:JOB_ID
# Check BullMQ queue
redis-cli LRANGE bull:generation:active 0 -1
# Restart workers
docker restart videogen-workers
# Check provider API status
# Google Veo: https://status.google.com
# Runway: https://status.runway.ml
# Minimax: Check their status page
# Manually fail stuck job
redis-cli HSET generation:job:JOB_ID status failed

Symptom: AI provider returns errors
Solutions:
# Check API keys are valid
echo $GOOGLE_VEO_API_KEY
echo $RUNWAY_API_KEY
echo $MINIMAX_API_KEY
# Test API connectivity
curl -H "Authorization: Bearer $GOOGLE_VEO_API_KEY" \
https://api.veo.google.com/v1/status
# Check rate limits
# Verify not exceeding provider limits
# Switch to different provider
# System automatically falls back if available
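The fallback behavior can be pictured as trying providers in priority order until one succeeds. A sketch; the provider objects and their `generate` method are illustrative, not the project's actual interface:

```javascript
// Try each provider in order; return the first successful result,
// or rethrow the last error if every provider fails.
async function generateWithFallback(providers, prompt) {
  let lastError;
  for (const provider of providers) {
    try {
      return await provider.generate(prompt);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```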
# Check provider-specific error codes
# Consult provider documentation

Symptom: Cannot download generated video
Solutions:
# Check S3 credentials
aws s3 ls s3://videogen-videos-dev/
# Verify S3 bucket exists
aws s3api head-bucket --bucket videogen-videos-dev
# Check CloudFront distribution
aws cloudfront get-distribution --id DISTRIBUTION_ID
# Test video URL directly
curl -I https://cdn.yourdomain.com/video.mp4
# Check CORS configuration
# Ensure S3 bucket allows your domain

Performance Issues

Symptom: API responses are slow
Solutions:
- Enable Caching:
  // Add caching to expensive operations
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);
- Optimize Database Queries:
  -- Use EXPLAIN ANALYZE
  EXPLAIN ANALYZE SELECT * FROM videos WHERE user_id = '123';
  -- Add indexes
  CREATE INDEX idx_videos_user_id ON videos(user_id);
- Connection Pooling:
  # Increase pool size in .env
  DATABASE_POOL_MAX=20
- Enable Compression:
  // Already enabled via compression middleware
  // Verify in response headers: Content-Encoding: gzip
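The caching item above is the cache-aside pattern. A fuller sketch with an injected client, so a Redis client or an in-memory wrapper both work; the helper name and TTL handling are assumptions:

```javascript
// Cache-aside: return the cached value if present; otherwise compute,
// store with a TTL, and return. `cache` needs get(key) and set(key, value, ttl).
async function cachedFetch(cache, key, ttlSeconds, compute) {
  const hit = await cache.get(key);
  if (hit != null) return JSON.parse(hit);
  const value = await compute();
  await cache.set(key, JSON.stringify(value), ttlSeconds);
  return value;
}
```

With node-redis the `set` call would pass the TTL as an option (e.g. `client.set(key, v, { EX: ttlSeconds })`).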
Symptom: Memory usage constantly increasing
Solutions:
# Monitor memory usage
node --inspect api/server.js
# Open chrome://inspect
# Take heap snapshots
curl http://localhost:9229/json/list
# Common causes:
# - Event listeners not removed
# - Large objects in memory
# - Unclosed connections
# Use weak references for caches
const cache = new WeakMap(); // entries can be garbage-collected; note keys must be objects

Symptom: CPU at 100%
Solutions:
# Profile CPU usage
node --prof api/server.js
node --prof-process isolate-*.log
# Common causes:
# - Inefficient loops
# - RegEx operations
# - JSON parsing large objects
# - Synchronous operations
# Use worker threads for CPU-intensive tasks
const { Worker } = require('worker_threads');

Deployment Issues

Symptom: ECS tasks keep restarting
Solutions:
# Check task logs
aws logs tail /ecs/videogen-backend --follow
# Check task definition
aws ecs describe-tasks --cluster videogen --tasks TASK_ARN
# Common causes:
# - Health check failing
# - Insufficient memory
# - Environment variables missing
# - Container port mismatch
# Update task definition
aws ecs update-service --cluster videogen \
--service videogen-backend \
--force-new-deployment

Symptom: Bad Gateway or Service Unavailable
Solutions:
# Check target health
aws elbv2 describe-target-health \
--target-group-arn TARGET_GROUP_ARN
# Common causes:
# - Application not responding to health checks
# - Security group blocking traffic
# - Server taking too long to start
# Adjust health check settings
aws elbv2 modify-target-group \
--target-group-arn TARGET_GROUP_ARN \
--health-check-interval-seconds 30 \
--healthy-threshold-count 2

Symptom: Migration fails on deployment
Solutions:
# Always test migrations in staging first
# Never run migrations directly in production
# Use migration-specific deployment
# 1. Deploy code without running migrations
# 2. Test manually in production
# 3. Run migrations separately
# 4. Monitor for errors
# Rollback procedure
npm run migrate:rollback
# Redeploy previous version

Enable verbose logging:
# Local development
DEBUG=* npm run dev
# Production (temporary)
LOG_LEVEL=debug
# Remember to revert to 'info' after debugging

Check service health endpoints:
# Health endpoint
curl http://localhost:3000/health
# Database connection
curl http://localhost:3000/health/db
# Redis connection
curl http://localhost:3000/health/redis
# Elasticsearch connection
curl http://localhost:3000/health/search

# Check CloudWatch metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/ECS \
--metric-name CPUUtilization \
--dimensions Name=ServiceName,Value=videogen-backend \
--start-time 2024-01-01T00:00:00Z \
--end-time 2024-01-01T23:59:59Z \
--period 300 \
--statistics Average

If you're still experiencing issues:
- Check Logs: Always start with application logs
- Search Issues: Check GitHub issues for similar problems
- Ask Community: Post in discussions with:
- Error messages
- Logs
- Environment details
- Steps to reproduce
- Create Issue: If it's a bug, create a detailed issue
In a production emergency:
- Immediate Actions:
  # Check all services
  aws ecs describe-services --cluster videogen
  # Rollback to last known good version
  aws ecs update-service --cluster videogen \
    --service videogen-backend \
    --task-definition videogen-backend:PREVIOUS_VERSION
- Communication:
  - Post status update
  - Notify stakeholders
  - Update status page
- Investigation:
  - Collect logs
  - Check metrics
  - Review recent changes
# Emergency database backup
pg_dump videogen_prod > emergency_backup_$(date +%Y%m%d_%H%M%S).sql
# Backup to S3
aws s3 cp emergency_backup_*.sql s3://videogen-backups/emergency/