wolframs commented Feb 2, 2026

Summary

Adds spamDetection.js middleware that detects and blocks duplicate content submissions. This addresses the most impactful item from #76 -- content hashing to catch identical spam.

What it does

  • Same-agent duplicate detection (24h window): If an agent posts identical content (normalized for case/whitespace) within 24 hours, the submission is rejected with 429.
  • Cross-agent duplicate comments (1h window): If a different agent posts an identical comment within 1 hour, it is rejected. This catches coordinated bot farms posting the same template across threads.
  • Cross-agent posts are allowed: Different agents may legitimately post similar content to different submolts.
  • Short content skipped: Content under 20 characters is exempt (emoji reactions, short replies). (A sketch of the full decision flow follows this list.)
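
For reference, the decision flow is roughly the following. This is a minimal sketch with illustrative names (hashContent, isDuplicate, recent), not the actual contents of spamDetection.js:

```js
// Sketch of the duplicate-check rules; illustrative names, not the actual spamDetection.js.
const crypto = require('crypto');

const MIN_CONTENT_LENGTH = 20;                     // submissions shorter than this are exempt
const SAME_AGENT_WINDOW_MS = 24 * 60 * 60 * 1000;  // 24h window for the same agent
const CROSS_AGENT_WINDOW_MS = 60 * 60 * 1000;      // 1h window across agents (comments only)

const recent = new Map();                          // contentHash -> { agentId, timestamp }

// SHA-256 of the normalized content: lowercase, collapsed whitespace.
function hashContent(content) {
  const normalized = content.toLowerCase().replace(/\s+/g, ' ').trim();
  return crypto.createHash('sha256').update(normalized).digest('hex');
}

// type is 'post' or 'comment'; returns true when the submission should be rejected.
function isDuplicate(agentId, content, type, now = Date.now()) {
  if (content.length < MIN_CONTENT_LENGTH) return false;

  const hash = hashContent(content);
  const prev = recent.get(hash);
  if (prev) {
    const age = now - prev.timestamp;
    if (prev.agentId === agentId && age < SAME_AGENT_WINDOW_MS) return true;
    // Identical comments from a different agent are blocked too, within the shorter window;
    // identical posts from different agents are allowed (legitimate cross-posting).
    if (prev.agentId !== agentId && type === 'comment' && age < CROSS_AGENT_WINDOW_MS) return true;
  }
  recent.set(hash, { agentId, timestamp: now });
  return false;
}
```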

What it catches today

Observed on the live platform:

  • FinallyOffline (karma: -53,688) carpet-bombing every thread with identical promotional templates linking to finallyoffline.com
  • Editor-in-Chief posting text identical to FinallyOffline's (likely the same operator)
  • ClawdBot farm (ClawdBotSeventh through ClawdBotEleventh) posting identical mint JSON payloads

Implementation

  • New file: src/middleware/spamDetection.js (sketched after this list)
  • Uses in-memory Map with periodic cleanup (same pattern as rateLimit.js)
  • SHA-256 hash of normalized content (lowercase, collapsed whitespace)
  • Wired into POST /posts and POST /posts/:id/comments after existing rate limiters
  • Uses existing RateLimitError class for consistent error responses
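
Putting the pieces together, the new file is shaped roughly like the sketch below. The RateLimitError import path, its constructor arguments, and the req.agent property are assumptions; isDuplicate, hashContent, and the recent Map are as in the earlier sketch:

```js
// src/middleware/spamDetection.js -- structural sketch only.
// hashContent, isDuplicate, and the `recent` Map are as in the sketch above.
const { RateLimitError } = require('../errors');   // assumed location of the existing error class

// Sweep the in-memory store so entries never outlive the longest (24h) window,
// mirroring the periodic-cleanup pattern used by rateLimit.js.
const CLEANUP_INTERVAL_MS = 60 * 60 * 1000;        // assumed hourly sweep
const MAX_AGE_MS = 24 * 60 * 60 * 1000;

setInterval(() => {
  const cutoff = Date.now() - MAX_AGE_MS;
  for (const [hash, entry] of recent) {
    if (entry.timestamp < cutoff) recent.delete(hash);
  }
}, CLEANUP_INTERVAL_MS).unref();                   // don't let the timer keep the process alive

// Factory so posts and comments share one store but get different cross-agent rules.
function spamDetection(type) {
  return (req, res, next) => {
    const content = (req.body && req.body.content) || '';
    const agentId = req.agent.id;                  // assumed: populated by the auth middleware
    if (isDuplicate(agentId, content, type)) {
      return next(new RateLimitError('Duplicate content detected'));
    }
    next();
  };
}

module.exports = { spamDetection, hashContent };

// Wiring, after the existing rate limiters:
//   router.post('/posts', rateLimiter, spamDetection('post'), createPost);
//   router.post('/posts/:id/comments', rateLimiter, spamDetection('comment'), createComment);
```

Passing the submission type into the factory keeps one store for both routes while still applying the stricter cross-agent rule only to comments.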

Tests

6 new tests added to test/api.test.js:

  • Hash normalization (case, whitespace)
  • Hash uniqueness for different content
  • First submission allowed
  • Same-agent duplicate blocked
  • Cross-agent identical comments blocked
  • Cross-agent posts allowed
  • Short content exemption

All 21 tests pass (15 existing + 6 new).
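
As a point of reference, the normalization cases might be exercised along these lines. This is illustrative only: the actual runner and assertions used in test/api.test.js may differ, and the hashContent export is an assumption:

```js
// Illustrative sketch using Node's built-in test runner; the real suite may differ.
const test = require('node:test');
const assert = require('node:assert');
const { hashContent } = require('../src/middleware/spamDetection'); // assumed export

test('hashing ignores case and extra whitespace', () => {
  assert.strictEqual(
    hashContent('Buy   Finally Offline   NOW'),
    hashContent('buy finally offline now')
  );
});

test('different content hashes differently', () => {
  assert.notStrictEqual(
    hashContent('this is one long enough post body'),
    hashContent('this is a completely different post body')
  );
});
```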

What this does not cover (future work from #76)

  • Near-duplicate detection (fuzzy hashing / simhash for minor variations)
  • Pattern detection for crypto addresses, mint payloads, scam phrases
  • Report endpoint (POST /posts/:id/report)
  • Reputation gating for new accounts

Refs #76

Adds spamDetection.js middleware that blocks identical content submissions:
- Same agent posting identical content within 24h -> blocked
- Different agents posting identical comments within 1h -> blocked
  (catches coordinated bot farms like FinallyOffline/-53k karma)
- Cross-agent posts allowed (legitimate cross-posting to submolts)
- Content < 20 chars skipped (greetings, emoji reactions)

Uses SHA-256 hash of normalized content (lowercase, collapsed whitespace)
with in-memory storage, following the same pattern as rateLimit.js.

Wired into POST /posts and POST /posts/:id/comments routes,
after existing rate limiters.

Refs moltbook#76

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
rel770 left a comment

Review: Duplicate Content Detection Middleware

Great work on addressing spam! This is exactly the kind of infrastructure the platform needs.

Strengths:

  • ✅ SHA-256 hashing with normalization (case, whitespace) - solid approach
  • ✅ Different windows for same-agent (24h) vs cross-agent (1h) - good balance
  • ✅ Short content exemption (<20 chars) - prevents false positives on reactions
  • ✅ Tests included (6 new tests)
  • ✅ Uses existing error classes for consistency

Suggestions:

  1. Consider adding a configurable threshold for the minimum content length (currently hardcoded at 20)
  2. The periodic cleanup interval should be documented in config
  3. Consider logging blocked duplicates for moderation review (rough sketch of these suggestions below)
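
For example, suggestions 1 and 2 could be surfaced roughly like this (illustrative key and env-var names, not the project's actual config):

```js
// Possible shape for config/index.js (suggested keys only).
module.exports = {
  spamDetection: {
    // Minimum content length before duplicate checks apply (matches the current hardcoded 20).
    minContentLength: parseInt(process.env.SPAM_MIN_CONTENT_LENGTH || '20', 10),
    // How often the in-memory duplicate store is swept for expired entries.
    cleanupIntervalMs: parseInt(process.env.SPAM_CLEANUP_INTERVAL_MS || String(60 * 60 * 1000), 10),
  },
};
```

For suggestion 3, a single log line per rejection with a recognizable prefix and the agent id would give moderators something to grep for.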

Security Note:
The cross-agent duplicate detection for comments is smart - catches bot farms posting identical templates. Nice catch on the real-world examples (FinallyOffline, ClawdBot farm).

Human-AI Review Note:
This review was conducted by copilotariel (Claude Opus 4.5) in collaboration with human partner Ariel. We've been reviewing moltbook PRs as part of our open source contribution effort.

Looking forward to seeing this merged! 🦞

— copilotariel (github.com/copilotariel/humanai-community)

…gging

- Make minContentLength configurable via config/env (default: 20)
- Add all spam detection config to config/index.js with documentation
- Log blocked duplicates with [spam] prefix for moderation visibility
- Export _MIN_CONTENT_LENGTH for testing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
wolframs (author) commented Feb 3, 2026

@rel770 Thanks for the review; we addressed the suggestions in commit 09789fc
