wolframs commented Feb 2, 2026

Summary

Adds spamDetection.js middleware that detects and blocks duplicate content submissions. This addresses the most impactful item from #76 -- content hashing to catch identical spam.

What it does

  • Same-agent duplicate detection (24h window): If an agent posts identical content (normalized for case/whitespace) within 24 hours, the submission is rejected with 429.
  • Cross-agent duplicate comments (1h window): If a different agent posts an identical comment within 1 hour, it is rejected. This catches coordinated bot farms posting the same template across threads.
  • Cross-agent posts are allowed: Different agents may legitimately post similar content to different submolts.
  • Short content skipped: Content under 20 characters is exempt (emoji reactions, short replies). (A sketch of the full decision flow follows this list.)
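
For reference, the decision flow is roughly the following. This is a minimal sketch with illustrative names (hashContent, isDuplicate, recent), not the actual contents of spamDetection.js:

```js
// Sketch of the duplicate-check rules; illustrative names, not the actual spamDetection.js.
const crypto = require('crypto');

const MIN_CONTENT_LENGTH = 20;                     // submissions shorter than this are exempt
const SAME_AGENT_WINDOW_MS = 24 * 60 * 60 * 1000;  // 24h window for the same agent
const CROSS_AGENT_WINDOW_MS = 60 * 60 * 1000;      // 1h window across agents (comments only)

const recent = new Map();                          // contentHash -> { agentId, timestamp }

// SHA-256 of the normalized content: lowercase, collapsed whitespace.
function hashContent(content) {
  const normalized = content.toLowerCase().replace(/\s+/g, ' ').trim();
  return crypto.createHash('sha256').update(normalized).digest('hex');
}

// type is 'post' or 'comment'; returns true when the submission should be rejected.
function isDuplicate(agentId, content, type, now = Date.now()) {
  if (content.length < MIN_CONTENT_LENGTH) return false;

  const hash = hashContent(content);
  const prev = recent.get(hash);
  if (prev) {
    const age = now - prev.timestamp;
    if (prev.agentId === agentId && age < SAME_AGENT_WINDOW_MS) return true;
    // Identical comments from a different agent are blocked too, within the shorter window;
    // identical posts from different agents are allowed (legitimate cross-posting).
    if (prev.agentId !== agentId && type === 'comment' && age < CROSS_AGENT_WINDOW_MS) return true;
  }
  recent.set(hash, { agentId, timestamp: now });
  return false;
}
```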

What it catches today

Observed on the live platform:

  • FinallyOffline (karma: -53,688) carpet-bombing every thread with identical promotional templates linking to finallyoffline.com
  • Editor-in-Chief posting text identical to FinallyOffline's (likely the same operator)
  • ClawdBot farm (ClawdBotSeventh through ClawdBotEleventh) posting identical mint JSON payloads

Implementation

  • New file: src/middleware/spamDetection.js (sketched after this list)
  • Uses in-memory Map with periodic cleanup (same pattern as rateLimit.js)
  • SHA-256 hash of normalized content (lowercase, collapsed whitespace)
  • Wired into POST /posts and POST /posts/:id/comments after existing rate limiters
  • Uses existing RateLimitError class for consistent error responses
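
Putting the pieces together, the new file is shaped roughly like the sketch below. The RateLimitError import path, its constructor arguments, and the req.agent property are assumptions; isDuplicate, hashContent, and the recent Map are as in the earlier sketch:

```js
// src/middleware/spamDetection.js -- structural sketch only.
// hashContent, isDuplicate, and the `recent` Map are as in the sketch above.
const { RateLimitError } = require('../errors');   // assumed location of the existing error class

// Sweep the in-memory store so entries never outlive the longest (24h) window,
// mirroring the periodic-cleanup pattern used by rateLimit.js.
const CLEANUP_INTERVAL_MS = 60 * 60 * 1000;        // assumed hourly sweep
const MAX_AGE_MS = 24 * 60 * 60 * 1000;

setInterval(() => {
  const cutoff = Date.now() - MAX_AGE_MS;
  for (const [hash, entry] of recent) {
    if (entry.timestamp < cutoff) recent.delete(hash);
  }
}, CLEANUP_INTERVAL_MS).unref();                   // don't let the timer keep the process alive

// Factory so posts and comments share one store but get different cross-agent rules.
function spamDetection(type) {
  return (req, res, next) => {
    const content = (req.body && req.body.content) || '';
    const agentId = req.agent.id;                  // assumed: populated by the auth middleware
    if (isDuplicate(agentId, content, type)) {
      return next(new RateLimitError('Duplicate content detected'));
    }
    next();
  };
}

module.exports = { spamDetection, hashContent };

// Wiring, after the existing rate limiters:
//   router.post('/posts', rateLimiter, spamDetection('post'), createPost);
//   router.post('/posts/:id/comments', rateLimiter, spamDetection('comment'), createComment);
```

Passing the submission type into the factory keeps one store for both routes while still applying the stricter cross-agent rule only to comments.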

Tests

6 new tests added to test/api.test.js:

  • Hash normalization (case, whitespace)
  • Hash uniqueness for different content
  • First submission allowed
  • Same-agent duplicate blocked
  • Cross-agent identical comments blocked
  • Cross-agent posts allowed
  • Short content exemption

All 21 tests pass (15 existing + 6 new).
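
As a point of reference, the normalization cases might be exercised along these lines. This is illustrative only: the actual runner and assertions used in test/api.test.js may differ, and the hashContent export is an assumption:

```js
// Illustrative sketch using Node's built-in test runner; the real suite may differ.
const test = require('node:test');
const assert = require('node:assert');
const { hashContent } = require('../src/middleware/spamDetection'); // assumed export

test('hashing ignores case and extra whitespace', () => {
  assert.strictEqual(
    hashContent('Buy   Finally Offline   NOW'),
    hashContent('buy finally offline now')
  );
});

test('different content hashes differently', () => {
  assert.notStrictEqual(
    hashContent('this is one long enough post body'),
    hashContent('this is a completely different post body')
  );
});
```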

What this does not cover (future work from #76)

  • Near-duplicate detection (fuzzy hashing / simhash for minor variations)
  • Pattern detection for crypto addresses, mint payloads, scam phrases
  • Report endpoint (POST /posts/:id/report)
  • Reputation gating for new accounts

Refs #76

Adds spamDetection.js middleware that blocks identical content submissions:
- Same agent posting identical content within 24h -> blocked
- Different agents posting identical comments within 1h -> blocked
  (catches coordinated bot farms like FinallyOffline/-53k karma)
- Cross-agent posts allowed (legitimate cross-posting to submolts)
- Content < 20 chars skipped (greetings, emoji reactions)

Uses SHA-256 hash of normalized content (lowercase, collapsed whitespace)
with in-memory storage, following the same pattern as rateLimit.js.

Wired into POST /posts and POST /posts/:id/comments routes,
after existing rate limiters.

Refs moltbook#76

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
rel770 left a comment

Review: Duplicate Content Detection Middleware

Great work on addressing spam! This is exactly the kind of infrastructure the platform needs.

Strengths:

  • ✅ SHA-256 hashing with normalization (case, whitespace) - solid approach
  • ✅ Different windows for same-agent (24h) vs cross-agent (1h) - good balance
  • ✅ Short content exemption (<20 chars) - prevents false positives on reactions
  • ✅ Tests included (6 new tests)
  • ✅ Uses existing error classes for consistency

Suggestions:

  1. Consider adding a configurable threshold for the minimum content length (currently hardcoded at 20)
  2. The periodic cleanup interval should be documented in config
  3. Consider logging blocked duplicates for moderation review (rough sketch of these suggestions below)
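
For example, suggestions 1 and 2 could be surfaced roughly like this (illustrative key and env-var names, not the project's actual config):

```js
// Possible shape for config/index.js (suggested keys only).
module.exports = {
  spamDetection: {
    // Minimum content length before duplicate checks apply (matches the current hardcoded 20).
    minContentLength: parseInt(process.env.SPAM_MIN_CONTENT_LENGTH || '20', 10),
    // How often the in-memory duplicate store is swept for expired entries.
    cleanupIntervalMs: parseInt(process.env.SPAM_CLEANUP_INTERVAL_MS || String(60 * 60 * 1000), 10),
  },
};
```

For suggestion 3, a single log line per rejection with a recognizable prefix and the agent id would give moderators something to grep for.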

Security Note:
The cross-agent duplicate detection for comments is smart - catches bot farms posting identical templates. Nice catch on the real-world examples (FinallyOffline, ClawdBot farm).

Human-AI Review Note:
This review was conducted by copilotariel (Claude Opus 4.5) in collaboration with human partner Ariel. We've been reviewing moltbook PRs as part of our open source contribution effort.

Looking forward to seeing this merged! 🦞

— copilotariel (github.com/copilotariel/humanai-community)

…gging

- Make minContentLength configurable via config/env (default: 20)
- Add all spam detection config to config/index.js with documentation
- Log blocked duplicates with [spam] prefix for moderation visibility
- Export _MIN_CONTENT_LENGTH for testing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
wolframs (author) commented Feb 3, 2026

@rel770 Thanks for the review; we addressed the suggestions in commit 09789fc
