Soulcaster

A self-healing development loop system that automatically ingests bug reports from multiple sources, clusters them using AI, and generates code fixes that are opened as GitHub pull requests.

Flow: Reddit/Sentry/GitHub Issues → Clustered Issues → Human Triage Dashboard → One-Click Fix → GitHub PR

Overview

Soulcaster is an open-source feedback triage and automated fix generation system. It monitors multiple sources for bug reports and user feedback, intelligently clusters similar issues together, and uses AI to generate code fixes that can be reviewed and merged via GitHub PRs.

Key Features

Multi-Source Ingestion: Automatically collects feedback from:
- Reddit (via polling configured subreddits)
- Sentry (via webhooks)
- GitHub Issues (via manual sync or webhooks)
- Manual feedback submission
AI-Powered Clustering: Backend-owned async jobs embed feedback with Gemini and automatically deduplicate similar reports into clusters
Multi-Tenant Projects: Support for multiple projects per user with project-level isolation for feedback and clusters
Authentication & Authorization: GitHub OAuth integration via NextAuth
- Users sign in with their GitHub account
- PRs are created from the user's account (personalized attribution)
- Secure token management with encrypted session storage
- Future: GitHub App support for bot-based PRs
Automated Fix Generation: LLM-powered coding agent that:
- Analyzes clustered issues
- Selects relevant files to modify
- Generates code patches
- Opens GitHub pull requests from the authenticated user's account
- Runs in E2B sandboxes
Job Tracking: Monitor agent fix generation jobs with status updates, logs, and PR links
Web Dashboard: Next.js dashboard for:
- Reviewing clusters and feedback
- Managing multiple projects
- Triggering fixes with one click
- Configuring Reddit sources per project
- Viewing PRs and job status

Architecture

The system consists of two main components:

Backend (FastAPI): Python service handling ingestion, clustering, and agent orchestration
Dashboard (Next.js): Web interface for triage and management

Prerequisites

Python 3.12+
Node.js 20+
Redis instance (Upstash recommended for serverless)
PostgreSQL database (for dashboard authentication and project management only)
GitHub account with repository access
LLM API key (Gemini recommended for embeddings and fix generation)

Quick Start

1. Clone the Repository

git clone https://github.com/altock/soulcaster.git
cd soulcaster

2. Backend Setup

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r backend/requirements.txt

# Copy environment variables
cp .env.example .env
# Edit .env with your configuration (see Configuration section)

3. Dashboard Setup

cd dashboard
npm install

# Setup Database
npx prisma migrate dev

# Copy environment variables
cp .env.example .env.local
# Edit .env.local with your configuration

4. Start Services

Backend (from project root):

uvicorn backend.main:app --reload

Reddit Poller (optional, from project root):

python -m backend.reddit_poller

Dashboard (from dashboard directory):

npm run dev

The dashboard will be available at http://localhost:3000 and the backend API at http://localhost:8000.

Configuration

See .env.example for all available environment variables. Key configuration includes:

Required Variables

Backend (.env):

UPSTASH_REDIS_REST_URL - Redis REST API URL (Upstash)
UPSTASH_REDIS_REST_TOKEN - Redis REST API token
GEMINI_API_KEY or GOOGLE_GENERATIVE_AI_API_KEY - Required for embeddings + backend-owned clustering
GITHUB_ID - GitHub OAuth app client ID
GITHUB_SECRET - GitHub OAuth app client secret
E2B_API_KEY - Required for E2B sandbox provisioning (default coding agent runner)
KILOCODE_TEMPLATE_NAME - E2B template name (e.g., kilo-sandbox-v-0-1-dev)

Dashboard (.env.local):

UPSTASH_REDIS_REST_URL - Same Redis credentials
UPSTASH_REDIS_REST_TOKEN - Same Redis token
GITHUB_ID - GitHub OAuth app client ID (same as backend)
GITHUB_SECRET - GitHub OAuth app client secret (same as backend)
NEXTAUTH_URL - Your app URL (e.g., http://localhost:3000)
NEXTAUTH_SECRET - Random secret for NextAuth (generate with openssl rand -base64 32)
DATABASE_URL - PostgreSQL connection string for NextAuth and projects
BACKEND_URL - Backend API URL (defaults to http://localhost:8000)
GEMINI_API_KEY / GOOGLE_GENERATIVE_AI_API_KEY - Only required if you opt into the deprecated dashboard-run clustering flow (see ENABLE_DASHBOARD_CLUSTERING)

How GitHub Authentication Works:

Users MUST sign in with GitHub OAuth (required for all environments - local and production)
The user is redirected to GitHub's authorization page, where they grant the repo and read:user scopes
Access token is stored securely in encrypted NextAuth session
When user triggers a fix, their token is passed to the backend
PRs are created from the user's GitHub account (e.g., @username), not a bot
No fallback to personal access tokens - consistent behavior everywhere
Future: GitHub App support coming for bot-based PRs (soulcaster[bot])
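
To make the token flow above concrete, here is a rough sketch of how a user's OAuth access token could be used to open a pull request through the GitHub REST API (`POST /repos/{owner}/{repo}/pulls`). This is an illustration only and may not match how the Soulcaster backend does it; `build_pull_request` and all of its parameters are hypothetical names. The function only builds the request, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_pull_request(
    token: str, owner: str, repo: str, head: str, base: str, title: str, body: str = ""
) -> urllib.request.Request:
    """Build (but do not send) a GitHub "create pull request" API request."""
    payload = {"title": title, "head": head, "base": base, "body": body}
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/pulls",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            # The user's OAuth token, so the PR is attributed to their account.
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )
```

Because the request carries the signed-in user's token rather than a bot credential, GitHub attributes the resulting PR to that user, which is the behavior described above.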

Optional Variables

Backend:

REDDIT_SUBREDDITS - Comma-separated list of subreddits to monitor (e.g., "claudeai,programming")
REDDIT_SORTS - Listing sorts to pull (new, hot, top) - defaults to "new"
REDDIT_POLL_INTERVAL_SECONDS - How often to poll Reddit - defaults to 300
REDIS_URL or UPSTASH_REDIS_URL - Alternative Redis connection string (if not using REST API)
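
The Reddit variables above and their defaults can be read as in the sketch below. It is an assumption-laden illustration rather than the actual poller code: `RedditConfig` and `load_reddit_config` are hypothetical names, and it assumes REDDIT_SORTS is comma-separated like REDDIT_SUBREDDITS.

```python
from dataclasses import dataclass
from typing import Mapping

# Sort names taken from the REDDIT_SORTS description above.
ALLOWED_SORTS = ("new", "hot", "top")


@dataclass(frozen=True)
class RedditConfig:
    subreddits: list[str]
    sorts: list[str]
    poll_interval_seconds: int


def _split_csv(raw: str) -> list[str]:
    """Split a comma-separated string, dropping blanks and surrounding spaces."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def load_reddit_config(env: Mapping[str, str]) -> RedditConfig:
    """Parse the Reddit settings, applying the documented defaults."""
    subreddits = _split_csv(env.get("REDDIT_SUBREDDITS", ""))
    # Assumption: REDDIT_SORTS is comma-separated; it defaults to "new".
    sorts = _split_csv(env.get("REDDIT_SORTS", "")) or ["new"]
    unknown = [sort for sort in sorts if sort not in ALLOWED_SORTS]
    if unknown:
        raise ValueError(f"Unsupported REDDIT_SORTS values: {unknown}")
    # Defaults to 300 seconds when unset.
    interval = int(env.get("REDDIT_POLL_INTERVAL_SECONDS", "300"))
    if interval <= 0:
        raise ValueError("REDDIT_POLL_INTERVAL_SECONDS must be positive")
    return RedditConfig(subreddits, sorts, interval)
```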

Dashboard:

GITHUB_OWNER - Default GitHub repository owner (for new issues)
GITHUB_REPO - Default GitHub repository name (for new issues)
ENABLE_DASHBOARD_CLUSTERING - (default false) temporarily re-enable deprecated dashboard-run clustering APIs for dev-only experiments; production setups should keep this unset so clustering remains backend-owned

Development

Running Tests

Backend:

pytest backend/tests -q --cov=backend

Dashboard:

npm run test --prefix dashboard

Code Style

Backend:

black backend && ruff check backend

Dashboard:

npm run lint --prefix dashboard

Deployment

Backend Deployment (Sevalla, Railway, Render, etc.)

Platform Settings:

Build path: ./backend/
Start command: uvicorn main:app --host 0.0.0.0 --port ${PORT:-8080}
Health probe: GET / (recommended for readiness checks)

Environment Variables: Configure all required variables from the Configuration section above (Redis, Gemini API key, etc.).
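
If your platform lacks a built-in health probe, the `GET /` check can be approximated with a small script. This is a sketch under stated assumptions: `backend_is_ready` is a hypothetical helper, and it treats any 2xx response from the root path as "ready", which may be looser or stricter than what your deployment needs.

```python
import http.client
import urllib.request


def backend_is_ready(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if GET / on the backend answers with a 2xx status."""
    url = base_url.rstrip("/") + "/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and non-2xx responses (raised by
        # urllib as HTTPError, an OSError subclass) all count as "not ready".
        return False
```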

Dashboard Deployment (Vercel recommended)

Vercel is recommended for the Next.js dashboard. Set the root directory to dashboard/ and configure all required environment variables.

Note: When deploying the backend with build path ./backend/, the working directory is already inside the backend folder, so use main:app (not backend.main:app) in the uvicorn command.

Project Structure

soulcaster/
├── backend/           # FastAPI backend service
│   ├── main.py        # API endpoints
│   ├── models.py      # Data models
│   ├── store.py       # Redis storage layer
│   ├── reddit_poller.py  # Reddit polling service
│   └── tests/         # Backend tests
├── dashboard/         # Next.js dashboard
│   ├── app/           # Next.js app router pages
│   ├── components/     # React components
│   ├── lib/           # Utility libraries
│   └── __tests__/     # Dashboard tests
└── documentation/     # Additional documentation
    └── operations/    # Runbooks, monitoring, and backups

How It Works

Setup: Create a project in the dashboard and configure your GitHub repository
Ingestion: The system monitors configured sources (Reddit, Sentry, GitHub) for new feedback
- Reddit: Background poller checks configured subreddits periodically
- GitHub: Manual sync or webhook integration
- Sentry: Webhook integration
- Manual: Direct submission via dashboard
Clustering: New feedback items are queued in the backend and clustered asynchronously:
- Feedback is embedded using Gemini and grouped via the backend's clustering runner
- POST /cluster-jobs kicks off a job, while /cluster-jobs* and /clustering/status expose job progress + pending counts
- Resulting clusters automatically deduplicate similar reports and refresh the dashboard view
Triage: Clusters are displayed in the dashboard with:
- AI-generated summaries and titles
- Feedback count and source breakdown
- Links to original feedback items
Fix Generation: When you click "Generate Fix":
- A job is created and tracked
- The coding agent (E2B sandbox) is triggered
- Agent analyzes the cluster context and generates code patches
- Creates a branch and opens a GitHub PR
- Job status updates with logs and PR link
Review: Review and merge the PR through GitHub as normal
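
The clustering step can be pictured with the sketch below, which groups embedding vectors greedily by cosine similarity to a running cluster centroid. It is a simplified stand-in, not the backend's actual clustering runner: the algorithm, the `cluster_embeddings` name, and the 0.8 threshold are all assumptions made for illustration, and the vectors are passed in directly instead of being fetched from Gemini.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_embeddings(embeddings: list[list[float]], threshold: float = 0.8) -> list[list[int]]:
    """Assign each vector to the most similar cluster, or start a new one.

    Returns clusters as lists of indices into `embeddings`.
    """
    clusters: list[list[int]] = []
    centroids: list[list[float]] = []
    for index, vector in enumerate(embeddings):
        best, best_score = None, threshold
        for cluster_id, centroid in enumerate(centroids):
            score = cosine_similarity(vector, centroid)
            if score >= best_score:
                best, best_score = cluster_id, score
        if best is None:
            # Nothing is similar enough: this item seeds a new cluster.
            clusters.append([index])
            centroids.append(list(vector))
        else:
            clusters[best].append(index)
            n = len(clusters[best])
            # Update the centroid as the running mean of its members.
            centroids[best] = [(c * (n - 1) + v) / n for c, v in zip(centroids[best], vector)]
    return clusters
```

With a greedy scheme like this, near-duplicate reports land in the same cluster as soon as they arrive, at the cost of results that depend on arrival order.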

License

Licensed under the Apache License, Version 2.0. See LICENSE for details.

Contributing

Contributions are welcome! Please read the project guidelines and submit pull requests for any improvements.

Support

For issues, questions, or contributions, please open an issue on GitHub.

Operations

For operational guides (runbooks, monitoring, and backups), see documentation/operations/.
