-
Notifications
You must be signed in to change notification settings - Fork 4
Open
Description
Problem Description
The OAuth authentication flow currently stores CSRF state tokens in server-local memory, preventing successful OAuth flows when users are load-balanced across multiple servers.
Current Implementation
- Location:
src/oauth.rslines 26, 39, 136, 149, 247, 261 - Structure: Each OAuth provider (GitHub, Google, Microsoft) maintains its own in-memory state store:
pub state_store: Arc<RwLock<HashMap<String, OAuthState>>>,- OAuth State Contents:
- CSRF token for security
- User context (project_id, redirect URLs)
- Timestamp for expiration
Impact on Multi-Server Deployment
Without Changes (Sticky Sessions)
- ✅ Will work if user stays on same server throughout OAuth flow
- ❌ Will fail if:
- Server crashes/restarts during OAuth flow → User must restart OAuth login
- Load balancer reassigns user mid-flow → User must restart OAuth login
- Sticky session expires during OAuth → User must restart OAuth login
User Experience Impact
HIGH SEVERITY - Users will need to completely restart the OAuth login process if they hit a different server, leading to:
- Confusion ("Why did my login fail?")
- Potential abandonment of signup/login
- Support tickets about "broken" login
Proposed Solutions
Option 1: Redis-based State Store
Implementation:
use redis::{Client, Commands, RedisResult};
pub struct RedisOAuthStateStore {
client: Client,
ttl_seconds: u64, // e.g., 600 for 10 minutes
}
impl RedisOAuthStateStore {
pub async fn store_state(&self, state_token: String, oauth_state: OAuthState) -> Result<(), Error> {
let mut conn = self.client.get_connection()?;
let serialized = serde_json::to_string(&oauth_state)?;
conn.set_ex(&state_token, serialized, self.ttl_seconds)?;
Ok(())
}
pub async fn get_state(&self, state_token: &str) -> Result<Option<OAuthState>, Error> {
let mut conn = self.client.get_connection()?;
let value: Option<String> = conn.get(state_token)?;
match value {
Some(v) => Ok(Some(serde_json::from_str(&v)?)),
None => Ok(None),
}
}
pub async fn remove_state(&self, state_token: &str) -> Result<(), Error> {
let mut conn = self.client.get_connection()?;
conn.del(state_token)?;
Ok(())
}
}Pros:
- ✅ Fast O(1) lookups
- ✅ Built-in TTL support (auto-cleanup)
- ✅ Atomic operations
- ✅ Minimal latency impact
Cons:
- ❌ Additional infrastructure dependency
- ❌ Redis clustering needed for HA
- ❌ Memory cost for Redis instance
Dependencies:
redis = { version = "0.25", features = ["tokio-comp", "connection-manager"] }Option 2: PostgreSQL-based State Store
Implementation:
-- Migration: Create oauth_states table
CREATE TABLE oauth_states (
state_token VARCHAR(255) PRIMARY KEY,
oauth_state JSONB NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
expires_at TIMESTAMP WITH TIME ZONE NOT NULL
);
-- Index for cleanup queries
CREATE INDEX idx_oauth_states_expires_at ON oauth_states(expires_at);use diesel::prelude::*;
use chrono::{DateTime, Utc, Duration};
#[derive(Insertable)]
#[diesel(table_name = oauth_states)]
pub struct NewOAuthState {
pub state_token: String,
pub oauth_state: serde_json::Value,
pub expires_at: DateTime<Utc>,
}
impl OAuthStateStore for PostgresStateStore {
async fn store_state(&self, state_token: String, oauth_state: OAuthState) -> Result<(), Error> {
let expires_at = Utc::now() + Duration::minutes(10);
let new_state = NewOAuthState {
state_token,
oauth_state: serde_json::to_value(&oauth_state)?,
expires_at,
};
diesel::insert_into(oauth_states::table)
.values(&new_state)
.execute(&self.conn)?;
Ok(())
}
async fn get_state(&self, state_token: &str) -> Result<Option<OAuthState>, Error> {
let result = oauth_states::table
.filter(oauth_states::state_token.eq(state_token))
.filter(oauth_states::expires_at.gt(Utc::now()))
.select(oauth_states::oauth_state)
.first::<serde_json::Value>(&self.conn)
.optional()?;
match result {
Some(json) => Ok(Some(serde_json::from_value(json)?)),
None => Ok(None),
}
}
}Cleanup Task:
// Run periodically (e.g., every hour)
async fn cleanup_expired_states(db: &PgConnection) -> Result<usize, Error> {
diesel::delete(oauth_states::table)
.filter(oauth_states::expires_at.lt(Utc::now()))
.execute(db)
}Pros:
- ✅ No additional infrastructure
- ✅ Transactional consistency
- ✅ Can join with user tables if needed
- ✅ Existing backup/recovery procedures apply
Cons:
- ❌ Slightly higher latency than Redis
- ❌ Requires periodic cleanup task
- ❌ Additional database load
Implementation Steps
- Create abstraction trait:
#[async_trait]
pub trait OAuthStateStore: Send + Sync {
async fn store_state(&self, state_token: String, oauth_state: OAuthState) -> Result<(), Error>;
async fn get_state(&self, state_token: &str) -> Result<Option<OAuthState>, Error>;
async fn remove_state(&self, state_token: &str) -> Result<(), Error>;
}-
Update OAuth providers to use injected state store instead of local HashMap
-
Add configuration for Redis/PostgreSQL selection
-
Test scenarios:
- OAuth flow across server restarts
- Concurrent OAuth flows
- State expiration handling
- Performance under load
Recommendation
For production: Use Redis if you already have Redis infrastructure, otherwise PostgreSQL is simpler to maintain.
Reasoning:
- OAuth state is ephemeral (10-minute lifetime)
- Redis provides better performance for this use case
- PostgreSQL is acceptable if avoiding new dependencies
Testing Plan
- Unit tests: Mock state store implementations
- Integration tests: Test with real Redis/PostgreSQL
- Load testing: Simulate 1000+ concurrent OAuth flows
- Chaos testing: Kill servers mid-OAuth flow
Migration Strategy
- Deploy with feature flag for new state store
- Test with small percentage of traffic
- Monitor error rates and latency
- Gradually increase traffic percentage
- Remove in-memory implementation
Monitoring
Add metrics for:
- OAuth state store latency (p50, p95, p99)
- OAuth flow success rate
- State store errors
- Expired state cleanup rate
Related Issues
- #[TBD] - Externalize session states
- #[TBD] - Externalize ephemeral keys
Metadata
Metadata
Assignees
Labels
No labels