samay58/sherlock-homes

Sherlock Homes

Apartment hunting is a nightmare. This makes it slightly less of one.

Sherlock Homes ranks listings against your criteria using NLP, geospatial signals, and OpenAI Vision. Because staring at 47 identical "sun-drenched" listings should not be a full-time job. Currently configured for NYC rentals (previously SF purchases).

Current Production Profile (March 1, 2026)

  • Deployment: Fly app sherlock-homes-nyc (https://sherlock-homes-nyc.fly.dev)
  • Active ingestion sources: zillow,streeteasy
  • Active criteria file in production: config/nyc_rental_criteria.yaml
  • StreetEasy low-count incident is resolved (runbook + verification in docs/OPERATIONS_FLY.md)

The Problem

Zillow tells you what exists. It does not tell you what is good. The north-facing "garden unit" with the $15k/year HOA and a fire station next door? Zillow will show it. We will not.

Sherlock Homes reviews 200+ signals per listing so you do not learn the obvious after a 40-minute drive.

Quick Start

# Start API
./run_local.sh

# Start frontend (separate terminal)
./run_frontend.sh

API: http://localhost:8000
Frontend: http://localhost:5173

Python 3.11/3.12 recommended. If uv is installed, ./run_local.sh will use it automatically.

Local data (SQLite DB, JSON exports) is kept under .local/ by default to avoid repo-root file sprawl. Existing legacy DBs at ./sherlock.db or ./homehog.db are still detected and used automatically.

What It Actually Does

NLP Scoring: Reads descriptions like a suspicious buyer. Extracts 32+ keywords across categories: natural light, views, outdoor space, high ceilings, parking. Flags the bad stuff too. "Cozy" usually means small.
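A minimal sketch of the keyword-extraction idea, not the code in app/services/nlp.py. The category names, the phrase lists, and the extract_signals function are all assumptions made up for illustration; the real keyword set is larger.

```python
import re
from collections import defaultdict

# Hypothetical phrase tables. The real categories and keywords live in
# app/services/nlp.py and are not reproduced here.
POSITIVE = {
    "natural_light": ["sun-drenched", "natural light", "south-facing"],
    "outdoor": ["balcony", "terrace", "roof deck"],
    "high_ceilings": ["high ceilings", "soaring ceilings"],
}
NEGATIVE = {
    "small": ["cozy", "efficient layout"],
    "dark": ["garden unit", "north-facing"],
}


def extract_signals(description: str) -> dict:
    """Return matched phrases grouped by polarity and category."""
    text = description.lower()
    signals = {"positive": defaultdict(list), "negative": defaultdict(list)}
    for polarity, table in (("positive", POSITIVE), ("negative", NEGATIVE)):
        for category, phrases in table.items():
            for phrase in phrases:
                # Word boundaries keep "cozy" from matching inside a longer word.
                if re.search(rf"\b{re.escape(phrase)}\b", text):
                    signals[polarity][category].append(phrase)
    # Convert defaultdicts to plain dicts so the result compares and prints cleanly.
    return {polarity: dict(found) for polarity, found in signals.items()}
```

Calling extract_signals("Cozy, sun-drenched studio with a balcony.") yields two positive categories (natural_light, outdoor) and one negative (small). Plain phrase matching like this misses negation ("no balcony"), which a real scorer would need to handle.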

Visual Scoring: OpenAI Vision looks at listing photos. Rates modernity, condition, brightness, staging, cleanliness. Catches water stains, tired fixtures, and the telltale signs of a flipper who watched too much HGTV.

Tranquility Score: How close is this place to things that make noise? Freeways, busy streets, fire stations. No API calls. Just local SF data and geometry. Some of us have meetings.
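The "local data and geometry" approach can be sketched as a distance-weighted penalty. This is an assumption about the general shape, not the logic in app/services/geospatial.py: the source list, the weights, the 500 m radius, and the 0 to 100 scale are all hypothetical, and the sketch treats every noise source as a point (freeways and streets are really lines).

```python
import math

EARTH_RADIUS_M = 6_371_000.0

# Hypothetical noise sources: (label, latitude, longitude, penalty weight).
# Coordinates are made up; a real dataset would be loaded from local files.
NOISE_SOURCES = [
    ("fire_station", 40.7300, -73.9900, 40.0),
    ("freeway_ramp", 40.7350, -73.9800, 60.0),
]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def tranquility_score(lat, lon, sources=NOISE_SOURCES, radius_m=500.0) -> float:
    """100 means nothing noisy within radius_m; penalties fade linearly with distance."""
    penalty = 0.0
    for _label, s_lat, s_lon, weight in sources:
        distance = haversine_m(lat, lon, s_lat, s_lon)
        if distance < radius_m:
            penalty += weight * (1 - distance / radius_m)
    return max(0.0, 100.0 - penalty)
```

Everything here is arithmetic on stored coordinates, which is what makes the "no API calls" property possible.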

Light Potential: Estimates how much natural light you will actually get. Top floor, corner unit, south-facing equals good. North-facing basement equals lamps.
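As a rough illustration of that heuristic, the sketch below adds up points for orientation, floor height, and corner exposure. The Unit fields, the point values, and the 0 to 100 scale are assumptions chosen for the example, not the project's actual weights.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    floor: int          # 0 = basement / ground level
    total_floors: int
    is_corner: bool
    facing: str         # "N", "S", "E", or "W"


# Hypothetical point values; unknown orientations get a neutral 15.
ORIENTATION_POINTS = {"S": 40, "E": 25, "W": 25, "N": 5}


def light_potential(unit: Unit) -> int:
    """Score from 0 to 100: orientation + relative floor height + corner bonus."""
    score = ORIENTATION_POINTS.get(unit.facing.upper(), 15)
    if unit.total_floors > 0:
        # Clamp the floor into [0, total_floors], then scale to at most 40 points.
        floor = max(0, min(unit.floor, unit.total_floors))
        score += round(40 * floor / unit.total_floors)
    if unit.is_corner:
        score += 20
    return min(score, 100)
```

With these made-up weights, a top-floor south-facing corner unit scores 100 and a north-facing basement scores 5.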

Why This Matched: Every match includes explicit reasons (budget fit, neighborhood focus, recency, light, quiet) plus one tradeoff. No black-box scores.

Change Tracking: Detects meaningful listing changes like price drops, status flips, and photo updates so you do not miss the quiet gems.
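Change detection of this kind amounts to comparing two snapshots of the same listing. A minimal sketch, assuming a hypothetical Snapshot shape and a 1% threshold for what counts as a "meaningful" price move (the real models and thresholds are not shown in this README):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    price: int
    status: str
    photo_urls: tuple


def detect_changes(old: Snapshot, new: Snapshot, min_price_delta_pct: float = 1.0) -> list:
    """Describe the differences between two snapshots of one listing."""
    changes = []
    if old.price and new.price != old.price:
        pct = (new.price - old.price) / old.price * 100
        # Ignore tiny price moves below the threshold.
        if abs(pct) >= min_price_delta_pct:
            kind = "price_drop" if pct < 0 else "price_increase"
            changes.append(f"{kind}: {old.price} -> {new.price} ({pct:+.1f}%)")
    if new.status != old.status:
        changes.append(f"status: {old.status} -> {new.status}")
    # Compare as sets so a pure reordering of photos is not reported.
    if set(new.photo_urls) != set(old.photo_urls):
        changes.append("photos_updated")
    return changes
```

Comparing a 4000 listing against its 3800 successor with one extra photo returns a price_drop entry and photos_updated; identical snapshots return an empty list.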

Vibe Presets

  • Light Chaser: For people who need sunlight to function.
  • Urban Professional: Walkability über alles.
  • Deal Hunter: Watches for price drops like a hawk.

How It Works

  1. Ingestion: Scrapes Zillow + StreetEasy via ZenRows on a recurring scheduler (and on-demand via admin endpoint).
  2. Enrichment: NLP, geospatial, and visual scoring per listing.
  3. Matching: Weighted scoring against your preferences with soft and hard caps.
  4. Ranking: Top matches, with explanations of why.
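Steps 3 and 4 can be sketched as a weighted average with a budget gate. This reads "soft and hard caps" as price caps, which is an assumption; the field names, the 50% maximum soft-cap penalty, and the function names are hypothetical and not taken from app/services/advanced_matching.py.

```python
from typing import Optional


def score_listing(listing: dict, prefs: dict) -> Optional[float]:
    """Weighted score in [0, 1], or None if the listing breaks the hard cap."""
    price = listing["price"]
    if price > prefs["hard_cap"]:
        return None  # hard cap: disqualify outright

    weights = prefs["weights"]  # assumed non-empty with a positive total
    total = sum(weights.values())
    score = sum(w * listing["signals"].get(name, 0.0) for name, w in weights.items()) / total

    if price > prefs["soft_cap"]:
        # Soft cap: scale the score down as price approaches the hard cap.
        overshoot = (price - prefs["soft_cap"]) / (prefs["hard_cap"] - prefs["soft_cap"])
        score *= 1 - 0.5 * overshoot
    return score


def rank(listings: list, prefs: dict) -> list:
    """Return (score, listing) pairs, best first, dropping disqualified listings."""
    scored = [(score_listing(item, prefs), item) for item in listings]
    kept = [(s, item) for s, item in scored if s is not None]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

The split matters: a listing slightly over the soft cap can still rank if its other signals are strong, while anything past the hard cap never appears.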

Architecture

home-hog/
├── app/                    # FastAPI backend
│   ├── models/             # SQLAlchemy models
│   ├── services/
│   │   ├── nlp.py               # Keyword extraction
│   │   ├── advanced_matching.py # Scoring engine
│   │   ├── geospatial.py        # Tranquility calculations
│   │   └── visual_scoring.py    # OpenAI Vision
│   └── routes/             # API endpoints
├── frontend/               # Vite + React app
├── scripts/                # Data tools
├── run_local.sh            # Start API
├── run_frontend.sh         # Start frontend
└── nuke_db.sh              # Reset database

API Endpoints

Endpoint                      What it does
GET /matches/test-user        Your ranked matches
GET /listings                 All listings, paginated
GET /listings/{id}            Single listing
GET /listings/{id}/history    Change history for a listing
GET /changes                  Recent listing changes
POST /admin/ingestion/run     Force a data refresh
GET /ingestion/status         Ingestion status
GET /ping                     Health check
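A small standard-library client for poking at the GET endpoints during local development. The paths come from the table above and the base URL from Quick Start; the helper names build_url and get_json are made up for this sketch, and the response shapes are not documented here, so the usage lines only fetch the JSON without assuming its structure.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # local API address from Quick Start


def build_url(path: str, **params) -> str:
    """Join the base URL, a path from the endpoint table, and optional query params."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}{path}{query}"


def get_json(path: str, **params):
    """GET an endpoint and decode the JSON body (requires the API to be running)."""
    with urlopen(build_url(path, **params), timeout=10) as response:
        return json.load(response)


# Usage, with ./run_local.sh running in another terminal:
#   get_json("/ping")
#   get_json("/matches/test-user")
#   get_json("/listings/123/history")   # 123 is a placeholder listing id
```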

Development

Burn it down and start over:

./nuke_db.sh && ./run_local.sh
python scripts/import_from_json.py

Run visual analysis:

python -m app.scripts.analyze_visual_scores

Production (Fly.io)

Production app: https://sherlock-homes-nyc.fly.dev

Use the canonical runbook:

  • docs/OPERATIONS_FLY.md for deploy, ingestion operations, validation, and rollback.

Configuration

Create .env.local:

DATABASE_URL=sqlite:///./.local/sherlock.db
ZENROWS_API_KEY=your_key
OPENAI_API_KEY=your_key
# Optional: fallback for text intelligence when OpenAI is rate-limited/unset
DEEPINFRA_API_KEY=your_key

# NYC rental scoring profile (current default for this project)
BUYER_CRITERIA_PATH=config/nyc_rental_criteria.yaml

# Optional StreetEasy runtime guardrails (defaults are safe)
STREETEASY_REQUEST_TIMEOUT_SECONDS=45
STREETEASY_REQUEST_RETRIES=1
STREETEASY_MAX_DETAIL_CALLS=80
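
One way the StreetEasy guardrails could be read at startup is sketched below. The variable names come from the example above; the StreetEasyLimits class and load_streeteasy_limits function are hypothetical, and the fallback values simply mirror the example values, on the assumption that those are the defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class StreetEasyLimits:
    timeout_seconds: int
    retries: int
    max_detail_calls: int


def load_streeteasy_limits(env=os.environ) -> StreetEasyLimits:
    """Read the guardrail variables, falling back to the example values (assumed defaults)."""
    return StreetEasyLimits(
        timeout_seconds=int(env.get("STREETEASY_REQUEST_TIMEOUT_SECONDS", "45")),
        retries=int(env.get("STREETEASY_REQUEST_RETRIES", "1")),
        max_detail_calls=int(env.get("STREETEASY_MAX_DETAIL_CALLS", "80")),
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests without touching the real environment.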

Optional alerts (iMessage / email / SMS) are documented in docs/DEVELOPMENT.md. Production operations are documented in docs/OPERATIONS_FLY.md.

Tuning Kamya's Outdoor Preference

The current system intentionally prefers "a little bit or more" outdoor access without being brittle:

  • Baseline signals: nlp_signals.positive.outdoor
  • Stronger boosts for meaningful private space: outdoor_private, outdoor_premium
  • Soft penalties for weak/noisy signals: nlp_signals.negative.weak_outdoor

Edit these in config/nyc_rental_criteria.yaml to calibrate strictness without hard-disqualifying viable listings.
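
To make the "soft, not brittle" idea concrete, here is a sketch of how such signals might combine into a score adjustment. The signal names are taken from the list above; the numeric weights and the outdoor_adjustment function are invented for illustration, and the actual schema of config/nyc_rental_criteria.yaml is not shown in this README.

```python
# Hypothetical weights. Real values live in config/nyc_rental_criteria.yaml.
OUTDOOR_WEIGHTS = {
    "outdoor": 1.0,           # baseline: any outdoor access
    "outdoor_private": 2.0,   # stronger boost: private space
    "outdoor_premium": 3.0,   # strongest boost
    "weak_outdoor": -0.5,     # soft penalty: weak or noisy signal
}


def outdoor_adjustment(signals: set, weights: dict = OUTDOOR_WEIGHTS) -> float:
    """Sum the weights of the outdoor signals present on a listing."""
    return sum(weights[name] for name in signals if name in weights)
```

Because the penalty is just a negative term in a sum, a listing with only weak_outdoor loses a little score but is never filtered out, which matches the stated goal of avoiding hard disqualification.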

Stack

  • Backend: FastAPI, SQLAlchemy, Pydantic
  • Frontend: Vite, React 18, TypeScript, React Query
  • Database: SQLite locally, PostgreSQL in Docker
  • Sources: Zillow (ZenRows), StreetEasy (ZenRows)
  • AI: OpenAI (vision + optional text intelligence), DeepInfra fallback

License: not specified (no LICENSE file in this repo).
