tkilaker/kiln

🔥 Kiln

Kiln is a personal "news kiln" that automatically scrapes articles from Gasetten.se and stores them in a structured format for later consumption. Think of it as your personal article archive with RSS feed support.

🎯 Features

  • Automated Login: Automatically authenticates with Gasetten.se using your credentials
  • Smart Scraping: Uses headless browser automation (Rod) to handle JavaScript-rendered content
  • Real-Time Progress: Live scraping updates with progress bar and status messages via SSE
  • Instant UI Updates: Articles appear immediately as they're scraped, with no refresh needed
  • Article Storage: Stores full article content (HTML + text) in PostgreSQL
  • Individual Article Management: Delete specific articles with confirmation dialog
  • Smart Sorting: Articles ordered by publication date (most recent first)
  • Web Interface: Clean, responsive UI built with HTMX and TailwindCSS
  • RSS Feed: Generate personal RSS feeds for consumption in podcast apps or readers
  • Session Persistence: Maintains login sessions between runs
  • Deduplication: Automatically skips articles that have already been scraped

🧩 Tech Stack

  • Backend: Go 1.23+
  • Web Framework: Chi (routing) + Templ (templates) + HTMX
  • Database: PostgreSQL 17
  • Scraper: Rod (headless browser automation)
  • Deployment: Docker Compose

🚀 Quick Start

Prerequisites

  • Docker and Docker Compose installed
  • Go 1.23+ (for local development)
  • Gasetten.se account credentials

1. Clone and Configure

# Clone the repository, then change into its directory
cd kiln

# Copy environment template
cp .env.example .env

# Edit .env with your credentials
nano .env

Update .env with your Gasetten credentials:

DATABASE_URL=postgres://postgres:postgres@db:5432/kiln?sslmode=disable
GASETTEN_USER=your_username
GASETTEN_PASS=your_password
PORT=8080
FEED_TITLE=My Personal Kiln Feed
FEED_DESCRIPTION=Articles from Gasetten
FEED_LINK=http://localhost:8080
FEED_AUTHOR=Your Name

2. Start with Docker

# Build and start containers
make docker-build

# Or manually:
docker-compose up -d --build

# View logs
make docker-logs

3. Access the Application

  • Web UI: http://localhost:8080
  • RSS feed: http://localhost:8080/rss.xml

📖 Usage

Scraping Articles

  1. Open http://localhost:8080 in your browser
  2. Click the "Scrape New Articles" button
  3. Watch real-time progress with:
    • Live status messages (logging in, extracting links, processing articles)
    • Progress bar showing completion percentage
    • Counter of new articles added
  4. Articles appear instantly in the list as they're scraped
  5. No need to refresh: everything updates automatically!
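
The live progress described in the steps above is delivered over Server-Sent Events. The following is a minimal sketch of that mechanism using only the Go standard library, not Kiln's actual handler. The `Progress` type, its JSON field names, the `progress` event name, and the single shared channel are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Progress is a hypothetical payload for one scraping update.
type Progress struct {
	Message string `json:"message"`
	Done    int    `json:"done"`
	Total   int    `json:"total"`
}

// Percent returns completion as a whole number from 0 to 100.
func (p Progress) Percent() int {
	if p.Total <= 0 {
		return 0
	}
	return p.Done * 100 / p.Total
}

// formatEvent renders one SSE frame: an event name, a single data line,
// and the blank line that terminates the frame.
func formatEvent(event string, p Progress) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("event: %s\ndata: %s\n\n", event, b), nil
}

// progressHandler streams updates from ch until the channel is closed or
// the client disconnects. A single channel serves one client only; a real
// server would fan updates out to every connected client.
func progressHandler(ch <-chan Progress) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		for {
			select {
			case <-r.Context().Done():
				return
			case p, open := <-ch:
				if !open {
					return
				}
				frame, err := formatEvent("progress", p)
				if err != nil {
					return
				}
				fmt.Fprint(w, frame)
				flusher.Flush() // push the frame to the browser immediately
			}
		}
	}
}
```

On the browser side, an `EventSource` (or HTMX's SSE extension) can listen for the `progress` event and update the progress bar from `done` and `total`.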

Managing Articles

  • View Article: Click on any article card to see the full content
  • Delete Article: Click the trash icon in the top-right of any article card
  • Clear All: Use the "Clear All" button to remove all articles (requires confirmation)

RSS Feed

Access your personal RSS feed at:

http://localhost:8080/rss.xml

Add this URL to your favorite RSS reader or podcast app.

🛠️ Development

Local Development Setup

# Install dependencies
make deps

# Install templ CLI
make install-tools

# Generate templates
make templ

# Run locally (requires PostgreSQL running)
make run

Project Structure

kiln/
├── cmd/kiln/              # Application entry point
│   └── main.go
├── internal/
│   ├── config/           # Configuration management
│   ├── database/         # Database models and queries
│   ├── scraper/          # Rod-based web scraper
│   ├── server/           # HTTP server and handlers
│   └── feed/             # RSS feed generation
├── migrations/           # SQL migrations
├── docker-compose.yml    # Docker orchestration
├── Dockerfile           # Application container
└── Makefile            # Development commands

Database Schema

CREATE TABLE articles (
  id SERIAL PRIMARY KEY,
  source TEXT NOT NULL DEFAULT 'gasetten',
  url TEXT UNIQUE NOT NULL,
  title TEXT,
  author TEXT,
  published_at TIMESTAMP,
  content_html TEXT,
  content_text TEXT,
  created_at TIMESTAMP NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

Useful Commands

# Build the application
make build

# Run tests
make test

# Format code
make fmt

# Tidy dependencies
make tidy

# Open database shell
make db-shell

# Stop containers
make docker-down

# Clean build artifacts
make clean

🔒 Security Notes

  • Never commit .env - it contains your credentials
  • Login sessions are stored in ~/.gasetten/sessions
  • Passwords are read from .env at runtime and are never logged or exposed
  • When deploying remotely, use HTTPS and secure environment variable management

🐛 Troubleshooting

Scraper Issues

Problem: Login fails or articles aren't found

Solution: Gasetten's HTML structure may have changed. Update the selectors in:

  • internal/scraper/scraper.go (lines with page.MustElement())

Database Connection Issues

Problem: Can't connect to database

Solution:

# Check if database is running
docker-compose ps

# View database logs
docker-compose logs db

# Restart database
docker-compose restart db

Port Already in Use

Problem: Port 8080 is already in use

Solution: Change PORT in .env to another port (e.g., 8081)

🗺️ Roadmap

Current: MVP ✅

  • Automated login and scraping
  • PostgreSQL storage
  • Web UI with HTMX
  • RSS feed generation
  • Docker deployment
  • Real-time scraping progress with loading indicators
  • Server-Sent Events (SSE) for live updates
  • Articles appear in real-time as they're scraped
  • Individual article deletion with confirmation
  • Auto-updating UI (no manual refresh needed)
  • Date-based article sorting (descending order)

Future Stages

Stage 2: Audio Generation

  • Text-to-speech conversion (OpenAI TTS or ElevenLabs)
  • Audio file management
  • Podcast feed support

Stage 3: Multi-Source Support

  • Additional website scrapers
  • RSS feed aggregation
  • Source prioritization

Stage 4: AI Enhancement

  • Article summarization
  • Auto-tagging and categorization
  • Topic extraction

Stage 5: Mobile & Sync

  • Progressive Web App (PWA)
  • Mobile-friendly interface
  • Cloud sync options

📝 License

This is a personal project. Use at your own discretion and respect Gasetten's terms of service.

🤝 Contributing

This is a personal tool, but suggestions and improvements are welcome! Open an issue or submit a pull request.

💡 Tips

  1. Scraping Frequency: Start with manual scraping to avoid overwhelming the server
  2. RSS Readers: Works great with Feedly, Reeder, or Apple Podcasts (for future audio support)
  3. Customization: All HTML selectors can be adjusted in scraper.go if Gasetten changes their layout
  4. Backup: Database is stored in Docker volume db_data - back it up regularly if needed

📧 Support

For issues or questions, check the troubleshooting section or review the code comments.


Built with ❤️ using Go, Rod, Chi, and Templ
