Automated AI/ML content curation and publishing pipeline. Aggregates papers, blog posts, and tweets from the AI world, evaluates them with LLMs, and publishes the best finds to Telegram and Twitter/X.
The system runs four scheduled jobs (cron-based, timezone-aware):
Papers (10:00) — scrapes trending papers from AlphaXiv, scores them with an LLM oracle, downloads PDFs, extracts the most representative figure using a vision model, generates bilingual posts (Russian for Telegram, English for Twitter), and publishes.
Blogs (12:00) — fetches RSS feeds from 11 sources (OpenAI, Anthropic, Google Gemini, Google DeepMind, Meta AI, Microsoft Research, NVIDIA Tech, Amazon Science, IBM Research, Apple ML, PyTorch), scores and fact-checks each post, generates summaries, and publishes.
Twitter (14:00) — monitors 25 accounts: AI lab leaders (Demis Hassabis, Jeff Dean, Dario Amodei, etc.), top researchers (Yoshua Bengio, Fei-Fei Li, Noam Brown, Jan Leike), open-source leads (Thomas Wolf, Clément Delangue, Soumith Chintala), and official accounts (OpenAI, DeepMind, Anthropic, Meta AI, Mistral, Cohere, ByteDance OSS). Scores tweets, generates Russian summaries for Telegram, retweets on Twitter.
Backup (03:00) — daily SQLite dump to backups/ with Telegram delivery. Keeps last 7 days.
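The four schedules are standard 5-field crontab expressions. As a quick illustration of how such an expression maps to firing times (a pure-Python sketch for exposition only, not the project's actual scheduler, which presumably uses a cron library):

```python
from datetime import datetime

def cron_matches(expr: str, dt: datetime) -> bool:
    """Check whether dt matches a 5-field cron expression.

    Handles only literal values and '*' (no ranges or steps) --
    enough to illustrate the schedules above.
    """
    fields = expr.split()  # minute, hour, day-of-month, month, day-of-week
    values = [dt.minute, dt.hour, dt.day, dt.month, dt.isoweekday() % 7]
    for field, value in zip(fields, values):
        if field != "*" and int(field) != value:
            return False
    return True

# The papers job ("0 10 * * *") fires at 10:00 local time on any day:
assert cron_matches("0 10 * * *", datetime(2025, 1, 6, 10, 0))
assert not cron_matches("0 10 * * *", datetime(2025, 1, 6, 12, 0))
```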
The oracle is tuned for pure science and engineering only. It publishes new models, architectures, agents, benchmarks, open-source releases, and infrastructure breakthroughs. It rejects politics, business deals, gossip, marketing, and think-pieces.
When the same news goes viral across multiple sources, a batch dedup step groups all scored candidates by topic in a single LLM call and keeps only the highest-scored item per group. This runs against both the current batch and the last 5 days of published content.
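The keep-best-per-group step reduces to a simple selection once the LLM call has returned a topic grouping. A sketch, assuming the grouping arrives as a mapping from candidate index to topic label (function and field names here are hypothetical, not the actual oracle API):

```python
def dedup_keep_best(candidates: list[dict], topic_groups: dict[int, str]) -> list[dict]:
    """Keep only the highest-scored candidate in each topic group.

    candidates   -- scored items, e.g. {"title": ..., "score": ...}
    topic_groups -- LLM-assigned grouping: candidate index -> topic label
    """
    best: dict[str, dict] = {}
    for idx, group in topic_groups.items():
        item = candidates[idx]
        if group not in best or item["score"] > best[group]["score"]:
            best[group] = item
    return list(best.values())

candidates = [
    {"title": "GPT-5 release (blog)", "score": 9},
    {"title": "GPT-5 release (tweet)", "score": 7},
    {"title": "New RL benchmark", "score": 8},
]
# As the LLM might group them: items 0 and 1 cover the same news.
groups = {0: "gpt5", 1: "gpt5", 2: "rl-bench"}
kept = dedup_keep_best(candidates, groups)
```

In the real pipeline the same comparison would also include the last 5 days of published summaries, so a lower-scored repeat of already-published news is dropped even when it is alone in the current batch.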
   Sources              Processing           Publishing
┌─────────────┐       ┌──────────────┐      ┌───────────┐
│  AlphaXiv   │───┐   │    Oracle    │      │ Telegram  │
│  (papers)   │   │   │  (scoring,   │  ┌──▶│ (RU post) │
├─────────────┤   │   │  fact-check, │  │   ├───────────┤
│  RSS Feeds  │───┼──▶│  batch dedup)│──┤   │ Twitter   │
│ (11 blogs)  │   │   ├──────────────┤  └──▶│ (EN post) │
├─────────────┤   │   │  Processors  │      └───────────┘
│  Twitter    │───┘   │ (PDF, image, │            │
│ (25 accts)  │       │  post gen)   │            ▼
└─────────────┘       └──────────────┘      ┌───────────┐
                                            │  SQLite   │
                                            │  (state)  │
                                            └───────────┘
├── main.py # Entry point, scheduler, pipeline orchestration
├── config.py # Configuration and environment variables
├── Makefile # Convenience commands (deploy, logs, backup, etc.)
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
│
├── sources/
│ ├── base.py # ContentItem dataclass
│ ├── alphaxiv.py # AlphaXiv trending papers scraper
│ ├── blogs.py # RSS feed parser
│ └── twitter_feed.py # Twitter API v2 feed reader
│
├── oracle/
│ └── oracle.py # LLM scoring, fact-checking, batch deduplication
│
├── processors/
│ ├── pdf.py # PDF download and text extraction
│ ├── images.py # Best figure extraction via vision model
│ └── post_generator.py # Bilingual post generation (RU/EN)
│
├── publishers/
│ ├── telegram.py # Telegram channel publisher
│ └── twitter.py # Twitter/X publisher
│
├── storage/
│ └── state.py # SQLite state tracking
│
└── llm/
└── client.py # OpenRouter API client
- Python 3.11+
- API keys: OpenRouter, Telegram Bot, Twitter/X
cp .env.example .env
# Fill in API keys

| Variable | Description |
|---|---|
| OPENROUTER_API_KEY | OpenRouter API key for LLM calls |
| TELEGRAM_BOT_TOKEN | Telegram bot token |
| TELEGRAM_CHANNEL_ID | Target Telegram channel ID |
| TELEGRAM_ERROR_CHAT_ID | Chat ID for error notifications |
| TWITTER_API_KEY | Twitter API key |
| TWITTER_API_SECRET | Twitter API secret |
| TWITTER_ACCESS_TOKEN | Twitter access token |
| TWITTER_ACCESS_SECRET | Twitter access secret |
| Variable | Default | Description |
|---|---|---|
| SCHEDULE_PAPERS_CRON | 0 10 * * * | Papers pipeline schedule |
| SCHEDULE_BLOGS_CRON | 0 12 * * * | Blogs pipeline schedule |
| SCHEDULE_TWITTER_CRON | 0 14 * * * | Twitter pipeline schedule |
| SCHEDULE_BACKUP_CRON | 0 3 * * * | DB backup schedule |
| TWITTER_MONITOR_USERS | 25 accounts | Comma-separated Twitter usernames |
| ORACLE_MIN_SCORE | 7 | Minimum LLM score (1-10) to publish |
| ORACLE_MAX_PAPERS_PER_RUN | 5 | Max papers published per run |
| ORACLE_MAX_BLOGS_PER_RUN | 3 | Max blog posts published per run |
| BACKUP_DIR | backups | Directory for DB backups |
| BACKUP_KEEP_DAYS | 7 | Days to retain backups |
| TIMEZONE | Europe/Moscow | Timezone for scheduling |
make deploy # Build and start
make logs # Follow logs
make stop # Stop
make restart # Restart
make update # git pull + rebuild
make backup # Manual DB backup
make status     # Container status

pip install -r requirements.txt
python main.py # Start scheduler (runs forever)
python main.py papers # Single papers run
python main.py blogs # Single blogs run
python main.py twitter # Single twitter run
python main.py backup # Manual backup
python main.py all      # All pipelines sequentially

All LLM calls go through OpenRouter. Models are configured in config.py:
| Task | Model |
|---|---|
| Content scoring | deepseek/deepseek-chat-v3-0324 |
| Fact-checking | deepseek/deepseek-chat-v3-0324 |
| Post generation (RU & EN) | anthropic/claude-sonnet-4.6 |
| Figure extraction (vision) | google/gemini-2.5-flash |
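Since OpenRouter exposes an OpenAI-compatible chat completions endpoint, the core of llm/client.py likely reduces to a single authenticated POST. A stdlib-only sketch (the real client may use an SDK and add retries or streaming):

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for OpenRouter."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Sending it (network call, shown for context only):
# with urllib.request.urlopen(build_request(
#         "deepseek/deepseek-chat-v3-0324", "Score this paper ...", api_key)) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```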
SQLite database (state.db) with five tables:
- posted_papers — published papers (arxiv ID, title, timestamp)
- posted_blogs — published blog posts (URL, title, timestamp)
- posted_tweets — published tweets (tweet URL, author, timestamp)
- oracle_decisions — all scoring decisions with scores and reasoning
- published_summaries — title + summary of all published content for cross-source dedup
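A minimal sketch of the state-tracking pattern behind these tables, using posted_papers as the example (column names are assumptions; storage/state.py may define more fields and indexes):

```python
import sqlite3

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the state database and ensure the posted_papers table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posted_papers (
               arxiv_id  TEXT PRIMARY KEY,
               title     TEXT,
               posted_at TEXT
           )"""
    )
    return conn

def already_posted(conn: sqlite3.Connection, arxiv_id: str) -> bool:
    """True if this paper was published before -- the pipelines check this
    before scoring, so the oracle is never paid to re-evaluate old items."""
    row = conn.execute(
        "SELECT 1 FROM posted_papers WHERE arxiv_id = ?", (arxiv_id,)
    ).fetchone()
    return row is not None
```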