Signal Score: AI-Assisted Application Pre-Screening #590

@divideby0

Summary

Build an automated scoring system that helps trustees prioritize grant applications by surfacing quality signals and red flags. Not a replacement for human judgment — a tool to help trustees focus their review time on the most promising applications.

Early Validation Results

Scored 181 labeled Chicago chapter applications (51 funded, 130 hidden):

| Metric | Value |
|---|---|
| Funded avg score | 0.717 |
| Hidden avg score | 0.277 |
| Funded scoring > 0.7 | 88% |
| Hidden scoring < 0.3 | 65% |
| False negatives (funded scored < 0.3) | 0 |
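
These separation metrics can be recomputed from any list of (score, label) pairs. A minimal sketch; the `scored` input shape and the helper name are hypothetical, not code from the repo:

```ruby
# Compute the validation table's metrics from labeled scores.
# `scored` is an array of { score: Float, funded: Boolean } hashes.
def separation_metrics(scored)
  funded = scored.select { |a| a[:funded] }.map { |a| a[:score] }
  hidden = scored.reject { |a| a[:funded] }.map { |a| a[:score] }
  {
    funded_avg:        funded.sum / funded.size,
    hidden_avg:        hidden.sum / hidden.size,
    funded_above_0_7:  funded.count { |s| s > 0.7 }.fdiv(funded.size),
    hidden_below_0_3:  hidden.count { |s| s < 0.3 }.fdiv(hidden.size),
    false_negatives:   funded.count { |s| s < 0.3 }
  }
end
```

Re-running this on each model revision would catch regressions in the zero-false-negative property before deployment.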

The model has never scored a funded application below 0.3. It may over-flag some hidden applications (8% of hidden applications score above 0.7), but these are often genuinely good applications that were hidden for reasons unrelated to quality.

Motivation

Chapters receive dozens to hundreds of applications per month. Most trustees volunteer their time and have limited bandwidth. A pre-screening score could:

  • Flag obvious spam/off-topic submissions before trustees review them
  • Surface high-quality applications that might otherwise get lost in the pile
  • Provide structured feature breakdowns so trustees can quickly assess key dimensions
  • Catch AI-generated mass submissions that game the application process

Relationship to Existing Spam Detection

Awesomebits already has two spam detection systems:

SpamChecker — Blocks at submission if all text fields are identical, plus a regex blocklist via the SPAM_REGEXP env var. Binary pass/fail.

SpamClassifier — Weighted behavioral scoring (threshold: 0.85) analyzing JavaScript metadata collected on the form: time on page, form interactions, paste-to-keystroke ratio, user agent, screen resolution, gibberish detection, and identical field detection. See also #574.

Both systems focus on bot detection and form-level spam — they analyze how the form was filled out (keyboard/mouse behavior, browser fingerprinting), not what was written. Signal Score is a complementary layer that analyzes application content quality. There's no overlap:

| Layer | What it analyzes | Purpose |
|---|---|---|
| SpamChecker | Field identity, blocklist | Block obvious bots on submit |
| SpamClassifier | Behavioral JS metadata | Detect automated/bot submissions |
| Signal Score | Application text content | Surface quality signals for trustees |

Signal Score would run asynchronously after submission (via batch API), not blocking the submission flow.
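
In the Rails app this would most likely be an ActiveJob, but the decoupling can be shown with a plain queue. A sketch only; the function names and queue wiring are assumptions:

```ruby
# Sketch: scoring is decoupled from the submission request.
# Submissions enqueue work and return immediately; a worker
# drains the queue and sends one batch-API request per chunk.
SCORING_QUEUE = Queue.new

def submit_application(attrs)
  # ... validate and persist as today; SpamChecker still runs inline ...
  SCORING_QUEUE << attrs.fetch(:id)  # non-blocking enqueue
  :submitted                         # respond without waiting for a score
end

def drain_scoring_queue(batch_size: 20)
  ids = []
  ids << SCORING_QUEUE.pop(true) while !SCORING_QUEUE.empty? && ids.size < batch_size
  ids  # these project ids would go to the LLM batch API in one request
end
```

The key property: a slow or failing LLM call can never delay or break the submission flow.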

Approach: Trust Equation Framework

Inspired by Maister's Trust Equation (Trust = (Credibility + Reliability + Intimacy) / (1 + Self-Interest)), each application gets scored across dimensions that emerged from analyzing patterns in ~180 labeled Chicago chapter applications:
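
With each dimension scored in 0–1, the trust-equation shape can be folded into a single composite. A sketch under stated assumptions; the equal weighting and rescaling are illustrative choices, not validated values:

```ruby
# Trust-equation composite: (C + R + I) / (1 + S), rescaled to 0..1.
# With all inputs in 0..1, the raw ratio ranges from 0 to 3,
# so dividing by 3 maps the best case (3/1) to exactly 1.0.
def trust_composite(credibility:, reliability:, intimacy:, self_interest:)
  raw = (credibility + reliability + intimacy) / (1.0 + self_interest)
  (raw / 3.0).clamp(0.0, 1.0)
end
```

Note how self-interest acts purely as a penalty: a perfect application with maximal self-interest scores 0.5, half of what the same application scores with none.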

Core Dimensions (Trust Equation)

| Dimension | What it measures | Funded signal | Hidden signal |
|---|---|---|---|
| Credibility | Clear budget, realistic plan, relevant expertise | Detailed cost breakdowns, named vendors, demonstrated competence | Vague "operating costs," no plan, no evidence of ability to execute |
| Reliability | Track record, prior work, organizational backing | References to past events, partnerships, prior projects | No evidence of follow-through, first-time submissions with no context |
| Intimacy | Connection to cause/community, local ties | Named neighborhoods, specific orgs, personal anecdotes, aldermen | Generic location, no local knowledge, could be submitted to any chapter |
| Self-Interest (denominator; higher = worse) | Does the money primarily benefit the applicant? | Materials/supplies/equipment for others | Living expenses, tuition, business startup, self-payment > 50% of budget |

Additional Signals

| Signal | Description |
|---|---|
| Budget Alignment | Can $1,000 meaningfully complete this, or is it a drop in the bucket toward a much larger need? |
| Catalytic Potential | Does $1K unlock something bigger: a prototype, proof-of-concept, career catalyst? |
| Creativity/Surprise | Novel/unique/fun vs. generic; AF values "awesome," which often means quirky |
| Funding Orphan Score | Too weird/small/informal for traditional funders? Higher = more awesome |
| Agency Framing | "I will build" vs. "please help us"; citizen problem-solver language |
| Boring Detection | Not spam, just generic, uninspired applications that make up the bulk of the ~70% non-competitive pool |
| Community Benefit | Who actually benefits: others directly, or the applicant primarily? |
| Community Creation | Builds new connections vs. serves an existing community |
| AI Spam Likelihood | Mass-generated generic proposals; does NOT penalize AI-assisted genuine applications |
| Personal Voice | Authentic human voice vs. templated corporate language; quirky details are positive |
| Category | Auto-classify into one of ~25 project categories for analysis |
| Has Images | Applications with images tend to show higher effort/authenticity |

Key Findings from Pattern Analysis

Ran 3 independent batches of 40 labeled applications through qualitative analysis, then analyzed 39 AF YouTube videos (2011–2024) covering summits, grantee stories, chapter operations, and organizational philosophy. Consistent patterns:

  1. Money destination is the #1 discriminator — Materials/supplies = funded; self-payment/living expenses = hidden. If > 50% of the budget goes to the applicant, the application is almost always hidden.

  2. Local specificity signals authenticity — Funded apps mention specific streets, organizations, aldermen. Hidden apps are geographically vague. Sub-city precision matters — "what ward?" not just "what city?"

  3. Writing quality barely matters — Quirky, unpolished but authentic applications outperform polished corporate ones. Professional credentials actually correlate with rejection.

  4. Scale fit is critical — Projects perfectly sized for $1,000 succeed. "Drop in the bucket" requests ($1K toward a $500K project) get filtered.

  5. Community connector density — Funded apps average 3-5 named local partnerships. Hidden apps have 0-1.

  6. Professional credentials hurt — Academic degrees, awards, institutional affiliations correlate with HIDDEN status. AF funds people, not organizations.

  7. ~28% quality ratio — Multiple chapters independently report about a quarter of applications are review-worthy regardless of volume. Signal Score's primary value is triaging the ~70% that aren't competitive.

  8. Joy/whimsy is core identity — "The opposite of whimsy is boring, not serious." Creativity/surprise should be a core signal, not a nice-to-have.

  9. Catalytic potential matters — Does $1K unlock something bigger? The grant is often "a cheap way to start a relationship" — validation matters more than the money.
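
Findings 1 and 5 translate directly into cheap pre-LLM heuristics. A sketch; the thresholds come from the findings above, but the input extraction and function name are assumptions:

```ruby
# Heuristic flags from the pattern analysis.
# self_payment_ratio: fraction of budget going to the applicant (finding 1:
#   > 50% to self almost always predicts hidden status).
# named_partnerships: count of named local partners (finding 5:
#   funded apps average 3-5, hidden apps 0-1).
def heuristic_flags(self_payment_ratio:, named_partnerships:)
  flags = []
  flags << :self_payment          if self_payment_ratio > 0.5
  flags << :low_connector_density if named_partnerships <= 1
  flags
end
```

Extracting these two inputs reliably still requires the LLM pass, but once extracted, the flag logic itself stays auditable and deterministic.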

Implementation Options

Option A: LLM-as-Judge (Recommended for MVP)

  • Score each application via batch API call with few-shot examples and trust equation rubric
  • Use structured JSON output with all dimensions
  • Cost: ~$0.02/application with Haiku 4.5, ~$0.35 for 181 applications
  • No training pipeline needed — works with the rubric + few-shot examples
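
The structured JSON output for Option A might look like the following; the field names mirror the dimensions above, but the exact schema is an assumption:

```ruby
require "json"

# Example judgment an LLM judge would be asked to emit (schema assumed).
raw = <<~JSON
  {
    "credibility": 0.8, "reliability": 0.6, "intimacy": 0.9, "self_interest": 0.1,
    "flags": ["has_images"], "category": "public_art"
  }
JSON

judgment = JSON.parse(raw, symbolize_names: true)

# Fold the dimensions into a 0..1 composite via the trust-equation shape.
composite = (judgment[:credibility] + judgment[:reliability] + judgment[:intimacy]) /
            (1.0 + judgment[:self_interest]) / 3.0
```

Keeping per-dimension values in the output (rather than only a composite) is what makes the expandable feature breakdown for trustees possible.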

Option B: Hybrid (Future)

  • LLM feature extraction → lightweight classifier on structured features
  • Enables chapter-specific tuning (different chapters weight dimensions differently)
  • Embeddings for duplicate/similarity detection across chapters
  • External verification passes for enrichment

Verification & Enrichment (Future Passes)

  • URL validation — Fetch linked websites to verify organizations exist
  • Google Maps/Places API — Verify mentioned addresses and businesses are real
  • Image analysis — Applications with photos demonstrate higher effort/authenticity
  • External link crawling — Navigate to URLs to verify claimed partnerships

Data Model

```
# On Project model
signal_score: float         # 0.0–1.0 composite score
signal_features: jsonb      # Trust equation dimensions + signals
signal_flags: text[]        # ["spam", "ai_spam", "business_pitch", ...]
signal_category: string     # Auto-classified project category
signal_scored_at: datetime
signal_model: string        # Which model/version generated the score
```
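
Wiring these columns into the Rails app would need a migration along these lines. A sketch only: the class name and Rails version are placeholders, the column list simply mirrors the model above, and jsonb/array columns assume Postgres:

```ruby
# Hypothetical migration for the Signal Score columns (Postgres assumed).
class AddSignalScoreToProjects < ActiveRecord::Migration[7.1]
  def change
    add_column :projects, :signal_score, :float
    add_column :projects, :signal_features, :jsonb
    add_column :projects, :signal_flags, :text, array: true, default: []
    add_column :projects, :signal_category, :string
    add_column :projects, :signal_scored_at, :datetime
    add_column :projects, :signal_model, :string
    add_index  :projects, :signal_score  # supports sort/filter in the list view
  end
end
```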

Integration Points

  • Score computed on application submission (async, via batch API)
  • Results visible to trustees as a score badge + expandable feature breakdown
  • Sortable/filterable in the application list view
  • Complements existing SpamChecker/SpamClassifier (behavioral bot detection) with content-level quality analysis

Open Questions

  • Should scores be visible to all trustees or only chapter deans?
  • Threshold for auto-hiding? (e.g., score < 0.1 = auto-hide with reason)
  • How to handle chapter-specific criteria? (Libraries chapter cares about library relevance; Vegan chapter cares about vegan alignment)
  • API key management for the LLM provider
  • Should we store the raw LLM response for auditability?
  • Privacy implications of sending application text to external LLM APIs?

References

  • Existing SpamChecker in app/extras/spam_checker.rb
  • Existing SpamClassifier in app/extras/spam_classifier.rb (see also Consolidate SpamChecker functionality into SpamClassifier #574)
  • Trust Equation: Maister, Green & Galford — "The Trusted Advisor"
  • Prior art: Westminster 2024 embedding experiments (Colab notebooks)
  • Discussed at Orlando 2026 summit — @alexkoppel proposed training on hidden applications + reasons
