## Summary
Build an automated scoring system that helps trustees prioritize grant applications by surfacing quality signals and red flags. Not a replacement for human judgment — a tool to help trustees focus their review time on the most promising applications.
## Early Validation Results
Scored 181 labeled Chicago chapter applications (51 funded, 130 hidden):
| Metric | Value |
|---|---|
| Funded avg score | 0.717 |
| Hidden avg score | 0.277 |
| Funded > 0.7 | 88% |
| Hidden < 0.3 | 65% |
| False negatives (funded scored < 0.3) | 0 |
The model has never scored a funded application below 0.3. It may over-flag some hidden applications (8% of hidden score > 0.7), but these are often genuinely good applications that were hidden for non-quality reasons.
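The headline metrics above reduce to a few lines of Ruby. A minimal sketch, assuming a list of labeled records with a `score` and a `funded` flag (the `applications` array below is illustrative, not the real Chicago data):

```ruby
# Sketch: how the validation table's metrics fall out of labeled scores.
# The applications array is illustrative sample data, not the real dataset.
applications = [
  { score: 0.82, funded: true  },
  { score: 0.74, funded: true  },
  { score: 0.21, funded: false },
  { score: 0.35, funded: false },
]

funded, hidden = applications.partition { |a| a[:funded] }
avg = ->(apps) { (apps.sum { |a| a[:score] } / apps.size).round(3) }

metrics = {
  funded_avg:       avg.call(funded),
  hidden_avg:       avg.call(hidden),
  funded_above_0_7: funded.count { |a| a[:score] > 0.7 },
  false_negatives:  funded.count { |a| a[:score] < 0.3 },
}
```

The false-negative count is the metric that matters most here: a funded-quality application scoring below the review threshold is the failure mode trustees cannot tolerate.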
## Motivation
Chapters receive dozens to hundreds of applications per month. Most trustees volunteer their time and have limited bandwidth. A pre-screening score could:
- Flag obvious spam/off-topic submissions before trustees review them
- Surface high-quality applications that might otherwise get lost in the pile
- Provide structured feature breakdowns so trustees can quickly assess key dimensions
- Catch AI-generated mass submissions that game the application process
## Relationship to Existing Spam Detection
Awesomebits already has two spam detection systems:
**SpamChecker** — Blocks on submission if all text fields are identical, plus a regex blocklist via the `SPAM_REGEXP` env var. Binary pass/fail.
**SpamClassifier** — Weighted behavioral scoring (threshold: 0.85) analyzing JavaScript metadata collected on the form: time on page, form interactions, paste-to-keystroke ratio, user agent, screen resolution, gibberish detection, and identical field detection. See also #574.
Both systems focus on bot detection and form-level spam — they analyze how the form was filled out (keyboard/mouse behavior, browser fingerprinting), not what was written. Signal Score is a complementary layer that analyzes application content quality. There's no overlap:
| Layer | What it analyzes | Purpose |
|---|---|---|
| SpamChecker | Field identity, blocklist | Block obvious bots on submit |
| SpamClassifier | Behavioral JS metadata | Detect automated/bot submissions |
| Signal Score | Application text content | Surface quality signals for trustees |
Signal Score would run asynchronously after submission (via batch API), not blocking the submission flow.
## Approach: Trust Equation Framework
Inspired by Maister's Trust Equation (Trust = (Credibility + Reliability + Intimacy) / (1 + Self-Interest)), each application gets scored across dimensions that emerged from analyzing patterns in ~180 labeled Chicago chapter applications:
### Core Dimensions (Trust Equation)
| Dimension | What it measures | Funded signal | Hidden signal |
|---|---|---|---|
| Credibility | Clear budget, realistic plan, relevant expertise | Detailed cost breakdowns, named vendors, demonstrates competence | Vague "operating costs," no plan, no evidence of ability to execute |
| Reliability | Track record, prior work, organizational backing | References to past events, partnerships, prior projects | No evidence of follow-through, first-time submissions with no context |
| Intimacy | Connection to cause/community, local ties | Named neighborhoods, specific orgs, personal anecdotes, aldermen | Generic location, no local knowledge, could be submitted to any chapter |
| Self-Interest (denominator — higher = worse) | Does money primarily benefit the applicant? | Materials/supplies/equipment for others | Living expenses, tuition, business startup, self-payment > 50% of budget |
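The four dimensions above combine exactly as in Maister's formula. A minimal sketch, assuming each dimension arrives as a 0.0–1.0 score from the LLM judge (the method name and normalization are illustrative, not settled design):

```ruby
# Composite score per Maister's Trust Equation:
#   Trust = (Credibility + Reliability + Intimacy) / (1 + Self-Interest)
# Dimension inputs (0.0-1.0 each) are assumed to come from the LLM judge.
def trust_score(credibility:, reliability:, intimacy:, self_interest:)
  raw = (credibility + reliability + intimacy) / (1.0 + self_interest)
  # Raw range is 0..3, so normalize to 0.0-1.0 for the signal_score field.
  (raw / 3.0).round(3)
end

# A specific, other-directed application scores high:
trust_score(credibility: 0.9, reliability: 0.8, intimacy: 0.9, self_interest: 0.1)
# A vague, self-funding application scores low:
trust_score(credibility: 0.2, reliability: 0.1, intimacy: 0.2, self_interest: 0.9)
```

Putting self-interest in the denominator means it drags the whole score down multiplicatively rather than just subtracting a few points, which matches the pattern analysis below: money destination dominates every other signal.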
### Additional Signals
| Signal | Description |
|---|---|
| Budget Alignment | Can $1,000 meaningfully complete this, or is it a drop in the bucket toward a much larger need? |
| Catalytic Potential | Does $1K unlock something bigger — a prototype, proof-of-concept, career catalyst? |
| Creativity/Surprise | Novel/unique/fun vs generic — AF values "awesome" which often means quirky |
| Funding Orphan Score | Too weird/small/informal for traditional funders? Higher = more awesome |
| Agency Framing | "I will build" vs. "please help us" — citizen problem-solver language |
| Boring Detection | Not spam — just generic, uninspired applications that make up the bulk of the ~70% non-competitive pool |
| Community Benefit | Who actually benefits? Others directly, or the applicant primarily? |
| Community Creation | Builds new connections vs. serves existing community |
| AI Spam Likelihood | Mass-generated generic proposals — NOT penalizing AI-assisted genuine applications |
| Personal Voice | Authentic human voice vs templated corporate language — quirky details are positive |
| Category | Auto-classify into one of ~25 project categories for analysis |
| Has Images | Applications with images likely show higher effort/authenticity |
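Together, the core dimensions and additional signals form the structured payload stored per application. A sketch of its shape — every key name and value here is an assumption about the eventual `signal_features` schema, mirroring the tables above:

```ruby
# Illustrative shape of one application's feature payload (signal_features).
# All keys and values are assumptions about the eventual schema.
features = {
  # Trust equation core
  credibility:   0.85,
  reliability:   0.70,
  intimacy:      0.90,
  self_interest: 0.10,
  # Additional signals
  budget_alignment:    0.95,
  catalytic_potential: 0.80,
  creativity:          0.75,
  funding_orphan:      0.85,
  agency_framing:      0.90,
  boring:              0.10,
  community_benefit:   0.90,
  community_creation:  0.60,
  ai_spam_likelihood:  0.05,
  personal_voice:      0.90,
  # Metadata
  category:   "community_gardening",  # one of the ~25 auto-classified categories
  has_images: true,
}
```

Keeping every signal as its own key (rather than folding them into the composite immediately) is what makes the Option B hybrid below possible: a later classifier can reweight the same features per chapter.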
## Key Findings from Pattern Analysis
Ran 3 independent batches of 40 labeled applications through qualitative analysis, then analyzed 39 AF YouTube videos (2011–2024) covering summits, grantee stories, chapter operations, and organizational philosophy. Consistent patterns:
- **Money destination is the #1 discriminator** — Materials/supplies = funded. Self-payment/living expenses = hidden. If >50% of the budget goes to the applicant, the application is almost always hidden.
- **Local specificity signals authenticity** — Funded apps mention specific streets, organizations, aldermen. Hidden apps are geographically vague. Sub-city precision matters: "what ward?", not just "what city?"
- **Writing quality barely matters** — Quirky, unpolished but authentic applications outperform polished corporate ones. Professional credentials actually correlate with rejection.
- **Scale fit is critical** — Projects perfectly sized for $1,000 succeed. "Drop in the bucket" requests ($1K toward a $500K project) get filtered.
- **Community connector density** — Funded apps average 3-5 named local partnerships; hidden apps have 0-1.
- **Professional credentials hurt** — Academic degrees, awards, and institutional affiliations correlate with hidden status. AF funds people, not organizations.
- **~28% quality ratio** — Multiple chapters independently report that about a quarter of applications are review-worthy, regardless of volume. Signal Score's primary value is triaging the ~70% that aren't competitive.
- **Joy/whimsy is core identity** — "The opposite of whimsy is boring, not serious." Creativity/surprise should be a core signal, not a nice-to-have.
- **Catalytic potential matters** — Does $1K unlock something bigger? The grant is often "a cheap way to start a relationship" — validation matters more than the money.
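The #1 discriminator above is simple enough to sketch directly. A minimal version, assuming budget lines can be tagged by beneficiary (the `budget_lines` format and the `:applicant` tag are illustrative, not an existing data structure):

```ruby
# Sketch of the top heuristic: flag applications where most of the budget
# goes to the applicant. The budget_lines format is an assumed structure.
SELF_PAYMENT_RED_FLAG = 0.5  # >50% to applicant => almost always hidden

def self_payment_ratio(budget_lines)
  to_self = budget_lines.select { |l| l[:beneficiary] == :applicant }
                        .sum { |l| l[:amount] }
  total = budget_lines.sum { |l| l[:amount] }
  total.zero? ? 0.0 : to_self.to_f / total
end

lines = [
  { item: "seeds and soil", amount: 600, beneficiary: :community },
  { item: "my time",        amount: 400, beneficiary: :applicant },
]
ratio = self_payment_ratio(lines)
flagged = ratio > SELF_PAYMENT_RED_FLAG
```

In practice the LLM judge would infer the beneficiary split from free text rather than from tagged line items, but the decision rule is the same.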
## Implementation Options
### Option A: LLM-as-Judge (Recommended for MVP)
- Score each application via batch API call with few-shot examples and trust equation rubric
- Use structured JSON output with all dimensions
- Cost: roughly $0.002/application with Haiku 4.5 (~$0.35 for all 181 applications)
- No training pipeline needed — works with the rubric + few-shot examples
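The moving parts of Option A are just a rubric prompt and a structured-JSON parse of the reply. A sketch with the HTTP/batch call elided — `llm_response` stands in for what the batch API would return, and the rubric text is illustrative:

```ruby
require "json"

# Sketch of Option A: rubric prompt + structured JSON output.
# The batch API call itself is elided; llm_response is a stand-in reply.
RUBRIC = <<~PROMPT
  Score this grant application from 0.0 to 1.0 on each dimension:
  credibility, reliability, intimacy, self_interest.
  Return JSON only, with exactly those four keys.
PROMPT

def parse_judgment(llm_response)
  JSON.parse(llm_response, symbolize_names: true)
end

# In the real flow, RUBRIC + few-shot examples + the application text go out
# via the batch API; the reply comes back as structured JSON like this:
llm_response = '{"credibility":0.8,"reliability":0.6,"intimacy":0.9,"self_interest":0.2}'
judgment = parse_judgment(llm_response)
```

Forcing JSON-only output is what makes the score auditable and lets the same response populate both `signal_score` and the expandable per-dimension breakdown.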
### Option B: Hybrid (Future)
- LLM feature extraction → lightweight classifier on structured features
- Enables chapter-specific tuning (different chapters weight dimensions differently)
- Embeddings for duplicate/similarity detection across chapters
- External verification passes for enrichment
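The second stage of Option B can be very light. A sketch of a linear classifier over the LLM-extracted features with per-chapter weights — all weights and feature values below are invented for illustration, not fitted parameters:

```ruby
# Sketch of Option B's lightweight second stage: a logistic score over
# LLM-extracted features, with per-chapter weights. All numbers are
# illustrative, not fitted values.
def chapter_score(features, weights, bias = 0.0)
  z = bias + features.sum { |name, value| (weights[name] || 0.0) * value }
  1.0 / (1.0 + Math.exp(-z))  # logistic squash to 0..1
end

# A chapter that weights local connection heavily and penalizes self-interest:
weights  = { credibility: 2.0, intimacy: 1.5, self_interest: -3.0 }
features = { credibility: 0.9, intimacy: 0.8, self_interest: 0.1 }
score = chapter_score(features, weights, -1.0)
```

Because the features come from one shared LLM extraction pass, chapter-specific tuning only means refitting these small weight vectors, not re-prompting or retraining anything heavy.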
## Verification & Enrichment (Future Passes)
- URL validation — Fetch linked websites to verify organizations exist
- Google Maps/Places API — Verify mentioned addresses and businesses are real
- Image analysis — Applications with photos demonstrate higher effort/authenticity
- External link crawling — Navigate to URLs to verify claimed partnerships
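The URL-validation pass is the simplest of these. A sketch using Ruby's standard `net/http` — the timeouts, the HEAD request, and treating any 4xx/5xx as dead are assumptions, not settled design:

```ruby
require "net/http"
require "uri"

# Sketch of the URL-validation pass: does a linked site resolve and answer?
# Timeouts, HEAD-vs-GET, and the <400 success rule are all assumptions.
def url_alive?(raw_url)
  uri = URI.parse(raw_url)
  return false unless uri.is_a?(URI::HTTP)  # URI::HTTPS subclasses URI::HTTP

  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == "https",
                  open_timeout: 5, read_timeout: 5) do |http|
    http.head(uri.path.empty? ? "/" : uri.path).code.to_i < 400
  end
rescue StandardError
  false  # unparseable, unreachable, or timed out => treat as dead
end
```

Like scoring itself, this would run in the async pass, never in the submission flow, and a dead link should lower confidence rather than auto-reject — small community projects often have no web presence at all.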
## Data Model

```
# On Project model
signal_score: float        # 0.0–1.0 composite score
signal_features: jsonb     # Trust equation dimensions + signals
signal_flags: text[]       # ["spam", "ai_spam", "business_pitch", ...]
signal_category: string    # Auto-classified project category
signal_scored_at: datetime
signal_model: string       # Which model/version generated the score
```

## Integration Points
- Score computed on application submission (async, via batch API)
- Results visible to trustees as a score badge + expandable feature breakdown
- Sortable/filterable in the application list view
- Complements existing SpamChecker/SpamClassifier (behavioral bot detection) with content-level quality analysis
## Open Questions
- Should scores be visible to all trustees or only chapter deans?
- Threshold for auto-hiding? (e.g., score < 0.1 = auto-hide with reason)
- How to handle chapter-specific criteria? (Libraries chapter cares about library relevance; Vegan chapter cares about vegan alignment)
- API key management for the LLM provider
- Should we store the raw LLM response for auditability?
- Privacy implications of sending application text to external LLM APIs?
## References
- Existing SpamChecker in `app/extras/spam_checker.rb`
- Existing SpamClassifier in `app/extras/spam_classifier.rb` (see also Consolidate SpamChecker functionality into SpamClassifier #574)
- Trust Equation: Maister, Green & Galford — "The Trusted Advisor"
- Prior art: Westminster 2024 embedding experiments (Colab notebooks)
- Discussed at Orlando 2026 summit — @alexkoppel proposed training on hidden applications + reasons
## Sub-Issues
- Signal Score: Ruby scoring scripts and data pipeline #591
- Chore: Create read-only database role for Signal Score #592 — read-only database role for live data access