An intelligent agent-driven pipeline to automate the rewriting and publication of public-domain books with human-in-the-loop editing and content versioning.
- 🌐 Web Scraping
  Scrapes book chapters from online sources (e.g., Wikisource) using Playwright.
- 🧠 AI Content Generation
  Uses a multi-agent architecture (Writer and Reviewer) powered by LLMs such as Google Gemini to rewrite and refine chapters.
- 🧑‍💻 Human-in-the-Loop Editing
  After AI rewriting, optional manual feedback lets human writers and editors make changes before a chapter is finalized.
- 📚 Content Versioning
  Final outputs are saved to ChromaDB with automatic version labels (v1, v2, ...), UUIDs, and metadata.
- 🔍 RL-inspired Retrieval
  Queries past versions using TF-IDF and cosine similarity; this is currently a placeholder for future reinforcement-learning-based retrieval.
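The TF-IDF + cosine-similarity retrieval described above can be sketched in dependency-free Python. This is a minimal illustration, assuming each stored version is a plain string keyed by its version label; `rank_versions` and its helpers are hypothetical names, and the project's `retriever.py` may instead rely on a library such as scikit-learn's `TfidfVectorizer`.

```python
import math
import re
from collections import Counter


def _tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def _tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    # Smoothed IDF: log((1 + N) / (1 + df)) + 1, so a term found in every doc keeps weight 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]


def _cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty query or document matches nothing
    return dot / (norm_a * norm_b)


def rank_versions(query: str, versions: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Return (version_label, score) pairs, most similar first (hypothetical helper)."""
    labels = list(versions)
    # Fit IDF on the stored versions plus the query so query terms receive weights.
    docs = [_tokenize(versions[label]) for label in labels] + [_tokenize(query)]
    vectors = _tfidf_vectors(docs)
    query_vec = vectors[-1]
    scored = [(label, _cosine(query_vec, vec)) for label, vec in zip(labels, vectors[:-1])]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

For example, querying "storm at the harbour" against three stored chapter versions ranks the two versions that mention a storm and a harbour above one that does not. A reinforcement-learning retriever could later replace the scoring function while keeping the same `(label, score)` interface.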
auto_book_pub/
├── main.py # Entry point
├── README.md
├── LICENSE
├── requirements.txt
├── .env
├── .gitignore
├── config/
│ └── settings.yaml
├── data/
│ ├── raw/ # Raw scraped HTML + screenshots
│ ├── processed/ # AI-edited versions
│ └── versions/ # Finalized, versioned outputs
├── scraping/
│ └── scraper.py # Playwright-based scraper
├── ai_agents/
│ ├── writer_agent.py # AI "spinner"
│ ├── reviewer_agent.py # Reviewer LLM
│ └── editor_agent.py # Optional human/AI edit flow
├── human_loop/
│ └── feedback_manager.py # Handle user input iterations
├── versioning/
│ └── chromadb_manager.py # Store/retrieve versioned content
├── rl_search/
│ └── retriever.py # TF-IDF retriever (placeholder for RL-based retrieval)
└── utils/
└── helpers.py # Common utilities
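The Writer/Reviewer split in `ai_agents/` can be sketched as two small classes that share one text-in, text-out LLM callable. This is a sketch under stated assumptions: the prompts, the `LLM` type alias, and the `llm` constructor argument are hypothetical, and the optional Gemini adapter assumes the `google-generativeai` package (`genai.configure`, `GenerativeModel.generate_content`) with a model name that may need updating. The project's own `writer_agent.py` and `reviewer_agent.py` may differ. Injecting the LLM keeps `WriterAgent().spin(...)` usable as in the workflow while allowing offline tests with a stub.

```python
import os
from typing import Callable, Optional

# Hypothetical type alias: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


def gemini_llm(model_name: str = "gemini-1.5-flash") -> LLM:
    """Return an LLM callable backed by Google Gemini.

    Assumes google-generativeai is installed and GEMINI_API_KEY is set;
    the default model name is an assumption.
    """
    import google.generativeai as genai  # imported lazily so the sketch runs without it

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel(model_name)
    return lambda prompt: model.generate_content(prompt).text


class WriterAgent:
    """Rewrites ("spins") a scraped chapter into fresh prose."""

    def __init__(self, llm: Optional[LLM] = None) -> None:
        self.llm = llm

    def spin(self, raw_text: str) -> str:
        llm = self.llm or gemini_llm()  # fall back to Gemini when no LLM is injected
        prompt = (
            "Rewrite the following public-domain chapter in clear, modern prose. "
            "Keep every event and character.\n\n" + raw_text
        )
        return llm(prompt)


class ReviewerAgent:
    """Checks the writer's draft and returns a corrected version."""

    def __init__(self, llm: Optional[LLM] = None) -> None:
        self.llm = llm

    def review(self, draft: str) -> str:
        llm = self.llm or gemini_llm()
        prompt = (
            "Review this rewritten chapter. Fix grammar and continuity errors "
            "and return only the corrected chapter.\n\n" + draft
        )
        return llm(prompt)


if __name__ == "__main__":
    # Offline usage with a stand-in LLM that returns the last line of its prompt.
    echo_last_line = lambda prompt: prompt.splitlines()[-1]
    draft = WriterAgent(llm=echo_last_line).spin("It was a dark and stormy night.")
    print(ReviewerAgent(llm=echo_last_line).review(draft))  # prints: It was a dark and stormy night.
```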
Workflow:
- Scrape Content
  raw_text = scrape_chapter(target_url)
- AI Rewriting + Review
  rewritten = WriterAgent().spin(raw_text)
  reviewed = ReviewerAgent().review(rewritten)
- Human Feedback (Optional)
  final_version = collect_feedback(reviewed)
- Save to ChromaDB
  save_version(final_version)
- Retrieve Similar Versions
  results = retrieve_version("version-number")

Requirements:
- Python 3.10+
- Google Gemini API key (optional)
- Playwright and its browser binaries
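The human-feedback and versioning steps of the workflow can be sketched as below. This is a minimal illustration under assumptions: the `ask` parameter, the helpers `next_version_label` and `build_version_record`, and the metadata keys (`version`, `created_at`, `num_chars`) are hypothetical, chosen so the logic runs without a terminal or a database. The returned `(id, document, metadata)` triple is shaped for a ChromaDB collection's `add(ids=..., documents=..., metadatas=...)` call; the collection name `chapters` is also an assumption.

```python
import re
import uuid
from datetime import datetime, timezone
from typing import Callable


def collect_feedback(reviewed: str, ask: Callable[[str], str] = input) -> str:
    """Show the AI output and let a human accept it or paste a replacement."""
    print(reviewed)
    answer = ask("Press Enter to accept, or paste an edited version: ").strip()
    # An empty answer means the human accepted the AI text unchanged.
    return answer or reviewed


def next_version_label(existing_labels: list[str]) -> str:
    """Return 'v1' for an empty history, otherwise one past the highest 'vN' label."""
    numbers = [
        int(found.group(1))
        for label in existing_labels
        if (found := re.fullmatch(r"v(\d+)", label))
    ]
    return f"v{max(numbers, default=0) + 1}"


def build_version_record(text: str, existing_labels: list[str]) -> tuple[str, str, dict]:
    """Build an (id, document, metadata) triple; the metadata keys are assumptions."""
    metadata = {
        "version": next_version_label(existing_labels),
        "created_at": datetime.now(timezone.utc).isoformat(),
        "num_chars": len(text),
    }
    return str(uuid.uuid4()), text, metadata


# Possible use with ChromaDB (assumes chromadb is installed; not executed here):
#   import chromadb
#   client = chromadb.PersistentClient(path="data/versions")
#   collection = client.get_or_create_collection("chapters")
#   labels = [meta["version"] for meta in collection.get()["metadatas"]]
#   doc_id, doc, meta = build_version_record(final_version, labels)
#   collection.add(ids=[doc_id], documents=[doc], metadatas=[meta])
```

Deriving the next label from the labels already stored, rather than from a separate counter, keeps version numbers consistent even if the process restarts between saves.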
Installation:
git clone https://github.com/yourusername/auto_book_pub
cd auto_book_pub
pip install -r requirements.txt
playwright install

Create a .env file in the project root:
GEMINI_API_KEY=your_google_gemini_key

To avoid re-scraping each time, enable dev mode in main.py:
dev_mode = True
if dev_mode:
    raw_text = load_cached_text()
else:
    raw_text = scrape_chapter(url)

MIT License. See LICENSE for details.