## Summary
Build an automated scoring system that helps trustees prioritize grant applications by surfacing quality signals and red flags. Not a replacement for human judgment — a tool to help trustees focus their review time on the most promising applications.
## Early Validation Results
Scored 181 labeled Chicago chapter applications (51 funded, 130 hidden):
| Metric | Value |
|---|---|
| Funded avg score | 0.717 |
| Hidden avg score | 0.277 |
| Funded > 0.7 | 88% |
| Hidden < 0.3 | 65% |
| False negatives (funded scored < 0.3) | 0 |
The model has never scored a funded application below 0.3. It may over-flag some hidden applications (8% of hidden score > 0.7), but these are often genuinely good applications that were hidden for non-quality reasons.
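The headline metrics above reduce to a few lines of Ruby. A minimal sketch, assuming a list of labeled records with a `score` and a `funded` flag (the `applications` array below is illustrative, not the real Chicago data):

```ruby
# Sketch: how the validation table's metrics fall out of labeled scores.
# The applications array is illustrative sample data, not the real dataset.
applications = [
  { score: 0.82, funded: true  },
  { score: 0.74, funded: true  },
  { score: 0.21, funded: false },
  { score: 0.35, funded: false },
]

funded, hidden = applications.partition { |a| a[:funded] }
avg = ->(apps) { (apps.sum { |a| a[:score] } / apps.size).round(3) }

metrics = {
  funded_avg:       avg.call(funded),
  hidden_avg:       avg.call(hidden),
  funded_above_0_7: funded.count { |a| a[:score] > 0.7 },
  false_negatives:  funded.count { |a| a[:score] < 0.3 },
}
```

The false-negative count is the metric that matters most here: a funded-quality application scoring below the review threshold is the failure mode trustees cannot tolerate.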
## Motivation
Chapters receive dozens to hundreds of applications per month. Most trustees volunteer their time and have limited bandwidth. A pre-screening score could:
- Flag obvious spam/off-topic submissions before trustees review them
- Surface high-quality applications that might otherwise get lost in the pile
- Provide structured feature breakdowns so trustees can quickly assess key dimensions
- Catch AI-generated mass submissions that game the application process
## Relationship to Existing Spam Detection
Awesomebits already has two spam detection systems:
**SpamChecker** — Blocks on submission if all text fields are identical, plus a regex blocklist via the `SPAM_REGEXP` env var. Binary pass/fail.
**SpamClassifier** — Weighted behavioral scoring (threshold: 0.85) analyzing JavaScript metadata collected on the form: time on page, form interactions, paste-to-keystroke ratio, user agent, screen resolution, gibberish detection, and identical field detection. See also #574.
Both systems focus on bot detection and form-level spam — they analyze how the form was filled out (keyboard/mouse behavior, browser fingerprinting), not what was written. Signal Score is a complementary layer that analyzes application content quality. There's no overlap:
| Layer | What it analyzes | Purpose |
|---|---|---|
| SpamChecker | Field identity, blocklist | Block obvious bots on submit |
| SpamClassifier | Behavioral JS metadata | Detect automated/bot submissions |
| Signal Score | Application text content | Surface quality signals for trustees |
Signal Score would run asynchronously after submission (via batch API), not blocking the submission flow.
## Approach: Trust Equation Framework
Inspired by Maister's Trust Equation (Trust = (Credibility + Reliability + Intimacy) / (1 + Self-Interest)), each application gets scored across dimensions that emerged from analyzing patterns in ~180 labeled Chicago chapter applications:
### Core Dimensions (Trust Equation)
| Dimension | What it measures | Funded signal | Hidden signal |
|---|---|---|---|
| Credibility | Clear budget, realistic plan, relevant expertise | Detailed cost breakdowns, named vendors, demonstrates competence | Vague "operating costs," no plan, no evidence of ability to execute |
| Reliability | Track record, prior work, organizational backing | References to past events, partnerships, prior projects | No evidence of follow-through, first-time submissions with no context |
| Intimacy | Connection to cause/community, local ties | Named neighborhoods, specific orgs, personal anecdotes, aldermen | Generic location, no local knowledge, could be submitted to any chapter |
| Self-Interest (denominator — higher = worse) | Does money primarily benefit the applicant? | Materials/supplies/equipment for others | Living expenses, tuition, business startup, self-payment > 50% of budget |
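The four dimensions above combine exactly as in Maister's formula. A minimal sketch, assuming each dimension arrives as a 0.0–1.0 score from the LLM judge (the method name and normalization are illustrative, not settled design):

```ruby
# Composite score per Maister's Trust Equation:
#   Trust = (Credibility + Reliability + Intimacy) / (1 + Self-Interest)
# Dimension inputs (0.0-1.0 each) are assumed to come from the LLM judge.
def trust_score(credibility:, reliability:, intimacy:, self_interest:)
  raw = (credibility + reliability + intimacy) / (1.0 + self_interest)
  # Raw range is 0..3, so normalize to 0.0-1.0 for the signal_score field.
  (raw / 3.0).round(3)
end

# A specific, other-directed application scores high:
trust_score(credibility: 0.9, reliability: 0.8, intimacy: 0.9, self_interest: 0.1)
# A vague, self-funding application scores low:
trust_score(credibility: 0.2, reliability: 0.1, intimacy: 0.2, self_interest: 0.9)
```

Putting self-interest in the denominator means it drags the whole score down multiplicatively rather than just subtracting a few points, which matches the pattern analysis below: money destination dominates every other signal.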
### Additional Signals
| Signal | Description |
|---|---|
| Budget Alignment | Can $1,000 meaningfully complete this, or is it a drop in the bucket toward a much larger need? |
| Catalytic Potential | Does $1K unlock something bigger — a prototype, proof-of-concept, career catalyst? |
| Creativity/Surprise | Novel/unique/fun vs generic — AF values "awesome" which often means quirky |
| Funding Orphan Score | Too weird/small/informal for traditional funders? Higher = more awesome |
| Agency Framing | "I will build" vs. "please help us" — citizen problem-solver language |
| Boring Detection | Not spam — just generic, uninspired applications that make up the bulk of the ~70% non-competitive pool |
| Community Benefit | Who actually benefits? Others directly, or the applicant primarily? |
| Community Creation | Builds new connections vs. serves existing community |
| AI Spam Likelihood | Mass-generated generic proposals — NOT penalizing AI-assisted genuine applications |
| Personal Voice | Authentic human voice vs templated corporate language — quirky details are positive |
| Category | Auto-classify into one of ~25 project categories for analysis |
| Has Images | Applications with images likely show higher effort/authenticity |
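Together, the core dimensions and additional signals form the structured payload stored per application. A sketch of its shape — every key name and value here is an assumption about the eventual `signal_features` schema, mirroring the tables above:

```ruby
# Illustrative shape of one application's feature payload (signal_features).
# All keys and values are assumptions about the eventual schema.
features = {
  # Trust equation core
  credibility:   0.85,
  reliability:   0.70,
  intimacy:      0.90,
  self_interest: 0.10,
  # Additional signals
  budget_alignment:    0.95,
  catalytic_potential: 0.80,
  creativity:          0.75,
  funding_orphan:      0.85,
  agency_framing:      0.90,
  boring:              0.10,
  community_benefit:   0.90,
  community_creation:  0.60,
  ai_spam_likelihood:  0.05,
  personal_voice:      0.90,
  # Metadata
  category:   "community_gardening",  # one of the ~25 auto-classified categories
  has_images: true,
}
```

Keeping every signal as its own key (rather than folding them into the composite immediately) is what makes the Option B hybrid below possible: a later classifier can reweight the same features per chapter.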
## Key Findings from Pattern Analysis
Ran 3 independent batches of 40 labeled applications through qualitative analysis, then analyzed 39 AF YouTube videos (2011–2024) covering summits, grantee stories, chapter operations, and organizational philosophy. Consistent patterns:
- **Money destination is the #1 discriminator** — Materials/supplies = funded. Self-payment/living expenses = hidden. If >50% of the budget goes to the applicant, the application is almost always hidden.
- **Local specificity signals authenticity** — Funded apps mention specific streets, organizations, aldermen. Hidden apps are geographically vague. Sub-city precision matters: "what ward?", not just "what city?"
- **Writing quality barely matters** — Quirky, unpolished but authentic applications outperform polished corporate ones. Professional credentials actually correlate with rejection.
- **Scale fit is critical** — Projects perfectly sized for $1,000 succeed. "Drop in the bucket" requests ($1K toward a $500K project) get filtered.
- **Community connector density** — Funded apps average 3-5 named local partnerships; hidden apps have 0-1.
- **Professional credentials hurt** — Academic degrees, awards, and institutional affiliations correlate with hidden status. AF funds people, not organizations.
- **~28% quality ratio** — Multiple chapters independently report that about a quarter of applications are review-worthy, regardless of volume. Signal Score's primary value is triaging the ~70% that aren't competitive.
- **Joy/whimsy is core identity** — "The opposite of whimsy is boring, not serious." Creativity/surprise should be a core signal, not a nice-to-have.
- **Catalytic potential matters** — Does $1K unlock something bigger? The grant is often "a cheap way to start a relationship" — validation matters more than the money.
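The #1 discriminator above is simple enough to sketch directly. A minimal version, assuming budget lines can be tagged by beneficiary (the `budget_lines` format and the `:applicant` tag are illustrative, not an existing data structure):

```ruby
# Sketch of the top heuristic: flag applications where most of the budget
# goes to the applicant. The budget_lines format is an assumed structure.
SELF_PAYMENT_RED_FLAG = 0.5  # >50% to applicant => almost always hidden

def self_payment_ratio(budget_lines)
  to_self = budget_lines.select { |l| l[:beneficiary] == :applicant }
                        .sum { |l| l[:amount] }
  total = budget_lines.sum { |l| l[:amount] }
  total.zero? ? 0.0 : to_self.to_f / total
end

lines = [
  { item: "seeds and soil", amount: 600, beneficiary: :community },
  { item: "my time",        amount: 400, beneficiary: :applicant },
]
ratio = self_payment_ratio(lines)
flagged = ratio > SELF_PAYMENT_RED_FLAG
```

In practice the LLM judge would infer the beneficiary split from free text rather than from tagged line items, but the decision rule is the same.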
## Implementation Options
### Option A: LLM-as-Judge (Recommended for MVP)
- Score each application via batch API call with few-shot examples and trust equation rubric
- Use structured JSON output with all dimensions
- Cost: roughly $0.002/application with Haiku 4.5 (~$0.35 for all 181 applications)
- No training pipeline needed — works with the rubric + few-shot examples
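The moving parts of Option A are just a rubric prompt and a structured-JSON parse of the reply. A sketch with the HTTP/batch call elided — `llm_response` stands in for what the batch API would return, and the rubric text is illustrative:

```ruby
require "json"

# Sketch of Option A: rubric prompt + structured JSON output.
# The batch API call itself is elided; llm_response is a stand-in reply.
RUBRIC = <<~PROMPT
  Score this grant application from 0.0 to 1.0 on each dimension:
  credibility, reliability, intimacy, self_interest.
  Return JSON only, with exactly those four keys.
PROMPT

def parse_judgment(llm_response)
  JSON.parse(llm_response, symbolize_names: true)
end

# In the real flow, RUBRIC + few-shot examples + the application text go out
# via the batch API; the reply comes back as structured JSON like this:
llm_response = '{"credibility":0.8,"reliability":0.6,"intimacy":0.9,"self_interest":0.2}'
judgment = parse_judgment(llm_response)
```

Forcing JSON-only output is what makes the score auditable and lets the same response populate both `signal_score` and the expandable per-dimension breakdown.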
### Option B: Hybrid (Future)
- LLM feature extraction → lightweight classifier on structured features
- Enables chapter-specific tuning (different chapters weight dimensions differently)
- Embeddings for duplicate/similarity detection across chapters
- External verification passes for enrichment
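The second stage of Option B can be very light. A sketch of a linear classifier over the LLM-extracted features with per-chapter weights — all weights and feature values below are invented for illustration, not fitted parameters:

```ruby
# Sketch of Option B's lightweight second stage: a logistic score over
# LLM-extracted features, with per-chapter weights. All numbers are
# illustrative, not fitted values.
def chapter_score(features, weights, bias = 0.0)
  z = bias + features.sum { |name, value| (weights[name] || 0.0) * value }
  1.0 / (1.0 + Math.exp(-z))  # logistic squash to 0..1
end

# A chapter that weights local connection heavily and penalizes self-interest:
weights  = { credibility: 2.0, intimacy: 1.5, self_interest: -3.0 }
features = { credibility: 0.9, intimacy: 0.8, self_interest: 0.1 }
score = chapter_score(features, weights, -1.0)
```

Because the features come from one shared LLM extraction pass, chapter-specific tuning only means refitting these small weight vectors, not re-prompting or retraining anything heavy.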
## Verification & Enrichment (Future Passes)
- URL validation — Fetch linked websites to verify organizations exist
- Google Maps/Places API — Verify mentioned addresses and businesses are real
- Image analysis — Applications with photos demonstrate higher effort/authenticity
- External link crawling — Navigate to URLs to verify claimed partnerships
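The URL-validation pass is the simplest of these. A sketch using Ruby's standard `net/http` — the timeouts, the HEAD request, and treating any 4xx/5xx as dead are assumptions, not settled design:

```ruby
require "net/http"
require "uri"

# Sketch of the URL-validation pass: does a linked site resolve and answer?
# Timeouts, HEAD-vs-GET, and the <400 success rule are all assumptions.
def url_alive?(raw_url)
  uri = URI.parse(raw_url)
  return false unless uri.is_a?(URI::HTTP)  # URI::HTTPS subclasses URI::HTTP

  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == "https",
                  open_timeout: 5, read_timeout: 5) do |http|
    http.head(uri.path.empty? ? "/" : uri.path).code.to_i < 400
  end
rescue StandardError
  false  # unparseable, unreachable, or timed out => treat as dead
end
```

Like scoring itself, this would run in the async pass, never in the submission flow, and a dead link should lower confidence rather than auto-reject — small community projects often have no web presence at all.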
## Data Model

```
# On Project model
signal_score: float        # 0.0–1.0 composite score
signal_features: jsonb     # Trust equation dimensions + signals
signal_flags: text[]       # ["spam", "ai_spam", "business_pitch", ...]
signal_category: string    # Auto-classified project category
signal_scored_at: datetime
signal_model: string       # Which model/version generated the score
```

## Integration Points
- Score computed on application submission (async, via batch API)
- Results visible to trustees as a score badge + expandable feature breakdown
- Sortable/filterable in the application list view
- Complements existing SpamChecker/SpamClassifier (behavioral bot detection) with content-level quality analysis
## Open Questions
- Should scores be visible to all trustees or only chapter deans?
- Threshold for auto-hiding? (e.g., score < 0.1 = auto-hide with reason)
- How to handle chapter-specific criteria? (Libraries chapter cares about library relevance; Vegan chapter cares about vegan alignment)
- API key management for the LLM provider
- Should we store the raw LLM response for auditability?
- Privacy implications of sending application text to external LLM APIs?
## References
- Existing SpamChecker in `app/extras/spam_checker.rb`
- Existing SpamClassifier in `app/extras/spam_classifier.rb` (see also Consolidate SpamChecker functionality into SpamClassifier #574)
- Trust Equation: Maister, Green & Galford — "The Trusted Advisor"
- Prior art: Westminster 2024 embedding experiments (Colab notebooks)
- Discussed at Orlando 2026 summit — @alexkoppel proposed training on hidden applications + reasons
## Sub-Issues
- Signal Score: Ruby scoring scripts and data pipeline #591
- Chore: Create read-only database role for Signal Score #592 — read-only database role for live data access