Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
8acaccf
Add Job Search OS layer — v1 + v2 complete
nothingbhatnet Mar 2, 2026
90cacc1
Fix reference hygiene and add system guardrails
nothingbhatnet Mar 2, 2026
ce400a7
Fix clone URL to correct GitHub username (dbhat93)
nothingbhatnet Mar 2, 2026
2b58a45
Add Dhiraj Bhat to LICENSE copyright
nothingbhatnet Mar 2, 2026
50d34b6
Add Phase 1 + Phase 2A: reference files and new commands
nothingbhatnet Mar 2, 2026
e348eba
Phase 2B: merge salary.md → comp.md with full depth from upstream
nothingbhatnet Mar 5, 2026
486c718
Phase 3: add resume.md + wire challenge-protocol resume invocation
nothingbhatnet Mar 5, 2026
2e40b63
Fix resume.md: rename concern labels to eliminate frameable-gap colli…
nothingbhatnet Mar 5, 2026
aded8a6
Phase 4: merge analyze.md with upstream additions
nothingbhatnet Mar 5, 2026
809dad4
Ultrathink pressure test: fix 7 cross-system issues
nothingbhatnet Mar 5, 2026
5cf5a06
Phase 5: merge prep, mock, stories, practice with upstream additions
nothingbhatnet Mar 7, 2026
5b7839f
Phase 6: add linkedin.md, pitch.md + wire into system
nothingbhatnet Mar 7, 2026
211a0b2
Phase 7: Merge Noam v3 (28 enhancements across 23 commands)
nothingbhatnet Mar 7, 2026
a7b2e7f
Fix 8 defects found in Phase 7 post-merge audit
nothingbhatnet Mar 7, 2026
6d59e83
Add strategy command — search-level meta-layer
nothingbhatnet Mar 7, 2026
3c25936
Add sync module, 48hr debrief protocol, and seniority calibration exa…
nothingbhatnet Mar 10, 2026
687be2a
Add velocity metrics to progress command
nothingbhatnet Mar 10, 2026
f4d267f
Add map command, gap pipeline, and velocity metric fixes
nothingbhatnet Mar 10, 2026
ba7ca5e
Security hardening: prompt injection guards, data privacy, retention
nothingbhatnet Mar 10, 2026
1cabdc9
Add Data and Privacy section to README for forkers
nothingbhatnet Mar 10, 2026
b514e5a
Update docs: add map command, v3.1 version entry, cross-cutting note
nothingbhatnet Mar 10, 2026
048cf2a
Exclude state/ directory from git (personal pipeline data)
nothingbhatnet Mar 10, 2026
f5032df
Add personal skill setup option to README Quick Start
nothingbhatnet Mar 11, 2026
b902f9e
Add SKILL.md for npx skills add one-liner install
nothingbhatnet Mar 11, 2026
b2484dd
Promote install one-liner to top of README
nothingbhatnet Mar 11, 2026
cc9a4d6
v3.2: LinkedIn network intelligence, outreach cadence tracking, async…
nothingbhatnet Mar 13, 2026
594bf7c
Add v3.2 eval test suite: 19 test cases + mock fixtures
nothingbhatnet Mar 17, 2026
e737a33
v3.2.1: Hardening pass — 12 fixes from eval audit
nothingbhatnet Mar 17, 2026
95e800a
v3.2.2: Full codebase audit — 16 fixes across product/security/usabil…
nothingbhatnet Mar 17, 2026
a091c93
Add Research Rigor Module for business case prep (#1)
dbhat93 Mar 21, 2026
8286905
prep: add Step 0 — ground truth before external research
nothingbhatnet Mar 22, 2026
03c8f91
Merge pull request #2 from dbhat93/ground-truth-step-0
dbhat93 Mar 22, 2026
1a54afe
coach: add mandatory date verification rule
nothingbhatnet Mar 23, 2026
2339755
Add round command — post-interview compound workflow (v3.3)
nothingbhatnet Mar 25, 2026
3e9eb9a
Fix stale debrief references after round command addition (v3.3 cleanup)
nothingbhatnet Mar 25, 2026
7c9245a
fix: gitignore .claude/ so npx skills can detect SKILL.md
nothingbhatnet Mar 26, 2026
4d377b8
fix: quote SKILL.md description to fix YAML parsing
nothingbhatnet Mar 26, 2026
eb91e9d
revert: restore .claude/settings.json — not the cause of npx skills f…
nothingbhatnet Mar 26, 2026
b13d405
Add External Communication Style Module — no em dashes in external co…
nothingbhatnet Mar 26, 2026
afcb728
v3.4: Relationship intelligence, Minutes integration, narrative consi…
nothingbhatnet Apr 1, 2026
9baec07
Sanitize test fixtures: remove leaked paths, real names, stale URLs
nothingbhatnet Apr 1, 2026
693837b
Add ship dates to v3.1-v3.4 version entries
nothingbhatnet Apr 1, 2026
4d59602
v3.5: Archetype detection, STAR+Reflection, Proof Bank
nothingbhatnet Apr 5, 2026
2d5bb67
QA fixes: sync wiring, Proof Bank consumption, em dashes, version bump
nothingbhatnet Apr 5, 2026
fd020b4
Merge Lenny expert frameworks + achievement-doc pattern + pipeline cl…
dbhat93 Apr 7, 2026
6f8c2bf
v4: Writing Quality Gate + Voice Enforcement
dbhat93 Apr 7, 2026
f1eabda
QA fixes for v4: step numbering, stale module reference, repo structure
dbhat93 Apr 7, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 11 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,8 +1,18 @@
# OS
.DS_Store

# Personal coaching data — contains stories, scores, interview targets, and concerns
# Personal coaching data — contains sensitive data: storybank, scores, interview targets,
# compensation strategy, recruiter feedback, transcript excerpts, and personal reflections.
# Never commit this file. See README "Data and Privacy" section for details.
coaching_state.md

# Active skill file (created by copying SKILL.md during setup)
CLAUDE.md

# Personal pipeline tracking — contains company names, stages, and interview outcomes
state/

# Auto-generated by skills ecosystem (npx skills add) — machine-specific
.agents/
.claude/skills/
skills-lock.json
667 changes: 667 additions & 0 deletions COACH.md

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion LICENSE
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
MIT License

Copyright (c) 2026 Noam Segal
Copyright (c) 2024 Noam Segal
Copyright (c) 2026 Dhiraj Bhat

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
Expand Down
463 changes: 378 additions & 85 deletions README.md

Large diffs are not rendered by default.

350 changes: 12 additions & 338 deletions SKILL.md

Large diffs are not rendered by default.

397 changes: 397 additions & 0 deletions VERSIONS.md

Large diffs are not rendered by default.

228 changes: 228 additions & 0 deletions references/calibration-engine.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,228 @@
# Calibration Engine

Centralizes calibration logic for scoring accuracy, root cause tracking, and learning from outcomes. Referenced by `progress`, `analyze`, `feedback`, and `practice`.

---

## Section 1: Calibration Metadata Schema

The Calibration State section in `coaching_state.md` tracks:

```markdown
## Calibration State

### Calibration Status
- Current calibration: [uncalibrated / calibrating / calibrated / miscalibrated]
- Last calibration check: [date]
- Data points available: [N] real interviews with outcomes

### Scoring Drift Log
| Date | Dimension | Direction | Evidence | Adjustment |

### Calibration Adjustments
| Date | Trigger | What Changed | Rationale |

### Cross-Dimension Root Causes (active)
| Root Cause | Affected Dimensions | First Detected | Status | Treatment |

### Unmeasured Factor Investigations
| Date | Trigger | Hypothesis | Investigation | Finding | Action |
```

**Status definitions:**
- **Uncalibrated**: Fewer than 3 real interview outcomes. Not enough data for calibration. The system operates normally but cannot verify its own accuracy.
- **Calibrating**: 3+ outcomes exist, first calibration check in progress. System is building its accuracy model.
- **Calibrated**: Calibration check completed, practice scores reasonably predict outcomes (correlation exists). System confidence is high.
- **Miscalibrated**: Practice scores do not predict outcomes, or external feedback consistently contradicts coach scoring. System is actively investigating and adjusting.

---

## Section 2: Scoring Drift Detection Protocol

Runs during `progress` when 3+ real interview outcomes exist in the Outcome Log.

### Step 1: Build Outcome-Score Matrix
For each real interview with a known outcome, pull:
- The most recent practice scores before that interview (from Score History)
- The interview analysis scores (if a transcript was analyzed)
- The outcome (advanced / rejected / offer)
- Any recruiter/interviewer feedback received

### Step 2: Check for Systematic Drift Per Dimension
For each of the 5 dimensions:
- Compare average practice scores for interviews where the candidate advanced vs. where they were rejected
- Look for dimensions where practice scores are consistently HIGH but outcomes are poor (the system is over-scoring)
- Look for dimensions where practice scores are LOW but outcomes are good (the system is under-scoring or the dimension doesn't matter for this candidate's targets)

### Step 3: Check for Feedback Contradictions
Cross-reference external feedback with coach scores:
- If recruiter feedback says "great storytelling, hard to follow" but the coach scored Structure at 4, that's a drift signal
- If feedback says "polished but generic" but the coach scored Differentiation at 3+, that's a drift signal
- Weight external feedback higher than internal scoring — the interviewer is the ground truth

### Step 4: Generate Drift Report
For each dimension showing drift:
- **Direction**: Over-scoring or under-scoring
- **Evidence**: Specific interviews where the prediction failed
- **Magnitude**: How far off (e.g., "Structure scores averaged 3.8 but 3 of 4 rejections cited unclear answers")
- **Proposed adjustment**: Concrete change (e.g., "Tighten Structure scoring by 0.5 — what I've been scoring as 4 is landing as 3 with real interviewers")

### Step 5: Present to Candidate
Frame drift corrections as improved predictive accuracy, NOT as goalpost-moving:
- "I've been scoring your Structure at 4, but interviewer feedback consistently suggests it's landing at 3. I'm recalibrating so my scores better predict how real interviewers will evaluate you. This isn't a downgrade — it's a more accurate picture."
- Record the adjustment in the Scoring Drift Log and Calibration Adjustments tables

---

## Section 3: Cross-Dimension Root Cause Tracking

### Lifecycle

**Detection**: When the same root cause appears in 2+ consecutive sessions (or across 2+ answers in the same session), it graduates from "per-answer observation" to "active root cause" in the Calibration State table.

Example: "Conflict avoidance" detected in Session 3 (affecting Substance on Q2 and Q5) and Session 4 (affecting Substance on Q1 and Differentiation on Q3) → create entry:

| Root Cause | Affected Dimensions | First Detected | Status | Treatment |
|---|---|---|---|---|
| Conflict avoidance | Substance, Differentiation | Session 3 | Active | Tension-mining drill on top 3 stories |

**Unified Treatment**: Prescribe ONE intervention targeting the root cause itself, not each affected dimension separately. "Conflict avoidance" gets tension-mining drills — not separate Substance drills AND Differentiation drills. This is more efficient and addresses the actual problem.

**Progress Tracking**: In subsequent sessions, check whether ALL affected dimensions are improving in tandem. If Substance improves but Differentiation doesn't, the root cause may be partially resolved or the treatment needs adjustment.

**Resolution Criteria**: A root cause is resolved when:
- All affected dimensions show 1+ point sustained improvement over 3+ sessions
- The pattern no longer appears in new analyses
- Update status to "Resolved" with date and what worked

---

## Section 4: Temporal Decay for Intelligence Data

Apply these freshness rules when referencing intelligence data in prep, progress, or any workflow:

| Data Type | Current | Historical | Archive |
|---|---|---|---|
| **Question Bank entries** | < 3 months old | 3-6 months old — flag as "historical, may not reflect current process" | > 6 months old — archive only, don't weight in predictions |
| **Effective/Ineffective Patterns** | < 3 months old | 3+ months old — re-test before relying. "This pattern was confirmed 4 months ago. Let's check if it still holds." | N/A — patterns don't expire, but confidence decays |
| **Company Patterns** | < 6 months old | > 6 months old — flag: "Company data is from [date]. Processes and culture may have changed." | > 12 months — treat as background context only |
| **Recruiter/Interviewer Feedback** | Current loop (active interview process) = high-signal | Closed loops = context only, not actionable | N/A |

**Application**: When referencing intelligence data, check its age. If stale, note it:
- "The last data point for this company is 5 months old — their process may have changed."
- "This effective pattern was confirmed in [month]. Worth re-testing to make sure it still holds."

---

## Section 5: Role-Drill Integration with Core Dimensions

Role-specific drills (from `references/role-drills.md`) use native scoring axes that don't map 1:1 to the core 5 dimensions. This section provides the explicit mapping so role-drill performance feeds into trend analysis and calibration.

### Mapping Table

| Role | Drill Axis | Maps To Core Dimension(s) |
|---|---|---|
| PM | Acknowledging tension | Credibility |
| PM | Specific evidence | Substance |
| PM | Decision rationale | Substance + Structure |
| PM | Stakeholder awareness | Relevance |
| PM | Trade-off articulation | Substance + Differentiation |
| Engineer | Depth of understanding | Substance + Credibility |
| Engineer | Explaining technical decisions | Structure |
| Engineer | Acknowledging constraints | Credibility |
| Engineer | Systems thinking | Substance |
| Designer | Rationale clarity | Structure |
| Designer | User evidence | Substance + Credibility |
| Designer | Design trade-offs | Substance + Differentiation |
| Designer | Process articulation | Structure |
| Data Science | Statistical rigor | Substance + Credibility |
| Data Science | Business translation | Relevance + Structure |
| Data Science | Methodology defense | Credibility + Differentiation |
| Research | Evidence quality | Substance |
| Research | Insight articulation | Differentiation |
| Research | Stakeholder translation | Relevance + Structure |
| Marketing | Metric articulation | Substance + Credibility |
| Marketing | Creative rationale | Differentiation + Structure |
| Operations | Process thinking | Structure |
| Operations | Scale articulation | Substance |
| Operations | Constraint navigation | Credibility + Differentiation |

### After Scoring a Role Drill

1. Score using the native drill axes (as defined in role-drills.md)
2. Map each axis score to core dimensions using the table above
3. When a drill axis maps to multiple core dimensions, use the axis score for both (blended input)
4. Record the blended scores in Score History alongside the native drill scores
5. This ensures role-drill performance feeds into trend analysis, calibration checks, and graduation criteria

---

## Section 6: Learning from Successes

When the Outcome Log shows "advanced" or "offer":

### Step 1: Validate Fit Assessment
If a fit assessment was recorded in Interview Loops:
- Was the verdict correct? (Did a "Strong Fit" actually advance? Did a "Stretch" actually get rejected?)
- What signals drove the assessment? Were they accurate?
- Update fit assessment heuristics based on what actually predicted outcomes

### Step 2: Track Dimension-Outcome Correlation (Positive)
- Which dimension scores coincide with advancement?
- Is there a minimum threshold? (e.g., "Every advancement had Differentiation 3.5+")
- Record positive correlations alongside negative ones — the system should know what works, not just what fails

### Step 3: Track Story Success
- Update storybank Notes for stories used in successful interviews: "Contributed to advancement at [Company], Round [N]"
- If a story has been part of 2+ advancements, flag it as a "proven performer"
- These annotations help future story selection — proven performers get priority in competitive preps

### Step 4: Extract Success Patterns
When 3+ successes exist, look for commonalities:
- Same stories used?
- Same dimensions strong?
- Same company type or interview format?
- What was the candidate's emotional state? (from debrief data)
- Record confirmed success patterns in Interview Intelligence → Effective Patterns

---

## Section 7: Structured Unmeasured Factor Investigation

When practice scores don't predict outcomes (high practice scores + rejections, or low practice scores + advancements), investigate factors outside the scoring rubric.

### Step 1: Candidate Debrief Questions
Ask the candidate about the interviews where prediction failed:
- "How was your energy level going into [the interview that went well despite low practice scores]?"
- "Did the conversation feel natural or forced?"
- "Did you ask good questions at the end? How did the interviewer respond?"
- "Was anything different about the environment — time of day, format, who was in the room?"

### Step 2: Common Factor Taxonomy

| Factor | Signals | How to Test |
|---|---|---|
| **Energy/enthusiasm** | Candidate reports feeling "on" vs. "flat"; interviewer feedback mentions "excited about the role" | Practice at different energy levels. Track correlation. |
| **Rapport** | Candidate found common ground, conversation felt natural, interviewer shared personal stories | Practice with different persona styles. Track which interactions feel natural. |
| **Pacing/timing** | Answers felt too long or too short relative to the interview format | Time practice answers. Compare to format norms. |
| **Question quality** | The questions the candidate asked at the end; interviewer engagement with those questions | Practice generating questions. Score them. |
| **Format comfort** | Performance varies by interview format (behavioral vs. panel vs. system design) | Track scores by format. Identify format-specific gaps. |
| **Physical/emotional state** | Time of day, sleep, stress level, preparation routine | Note environmental factors in debrief. Look for patterns. |

### Step 3: Hypothesis Testing
1. Form a specific hypothesis: "Energy level is driving outcomes more than content quality"
2. Design a targeted drill or tracking approach: "For the next 3 practice sessions, vary energy level and see if scores change"
3. Track results explicitly in the Unmeasured Factor Investigations table

### Step 4: Resolution
- **Confirmed**: The factor is real. Add it to Effective Patterns (or Ineffective Patterns). Integrate into coaching: "Before your next interview, do a 10-minute warmup — your energy level matters as much as your content."
- **Inconclusive**: Not enough data. Continue tracking.
- **Closed**: The hypothesis didn't hold. Remove from active investigation.

---

## Integration with Story Mapping Engine

- When scoring drift adjusts a dimension score, flag stories in the storybank whose strength ratings were heavily influenced by that dimension. They may need re-evaluation. Example: "Your Substance scores have been recalibrated down by 0.5. S003 and S007 were rated based on Substance evidence — consider re-scoring them."
- When calibration shows Differentiation predicts advancement, upgrade earned-secret-aware selection in the story mapping engine from conditional to default mode.
- When calibration links a specific dimension to rejections (e.g., "every rejection correlates with Credibility < 3"), elevate story mapping gaps in that dimension's competencies to "Calibration-Urgent" priority level.
Loading