From e7273aa6675ee125b95fbd5ef8dceb50bca74cc4 Mon Sep 17 00:00:00 2001
From: alphainfinitus
Date: Tue, 6 Jan 2026 01:43:03 +0530
Subject: [PATCH 1/2] Remove plans from tracking

---
 plans/core.md   |  23 --
 plans/phase0.md | 775 ---------------------------------------
 plans/phase1.md | 952 ------------------------------------------------
 plans/phase2.md | 877 --------------------------------------------
 plans/phase3.md | 863 -------------------------------------------
 plans/phases.md | 913 ----------------------------------------------
 plans/plan.md   | 364 ------------------
 7 files changed, 4767 deletions(-)
 delete mode 100644 plans/core.md
 delete mode 100644 plans/phase0.md
 delete mode 100644 plans/phase1.md
 delete mode 100644 plans/phase2.md
 delete mode 100644 plans/phase3.md
 delete mode 100644 plans/phases.md
 delete mode 100644 plans/plan.md

diff --git a/plans/core.md b/plans/core.md
deleted file mode 100644
index ddc7f08..0000000
--- a/plans/core.md
+++ /dev/null
@@ -1,23 +0,0 @@
-I was building an AI chatbot and wanted to add a speech-to-text feature like the top AI apps have, using the Web Speech API built into browsers, for people who don't want to pay for Whisper and deal with the complex setup that comes with it.
-
-A mic that, when you click it, listens to you and enters whatever you say into a text input.
-I wanted to add basic stuff like:
-- mic permission handling
-- listening state
-- browser compatibility workarounds
-- the text should enter where the cursor is
-- etc
-I realised that this basic stuff should be a package, and I couldn't find one that covers it; there are basic wrappers around the Web Speech API, but nothing else.
-
-So this project is to make that package.
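The "text should enter where the cursor is" requirement above reduces to a small pure helper. A minimal sketch, assuming a hypothetical `insertAtCursor` name that is not the package's final API:

```typescript
// Hedged sketch: splice dictated text into an input's value at the caret.
// `insertAtCursor` is an illustrative name, not a published API.
export function insertAtCursor(
  value: string,
  selectionStart: number,
  selectionEnd: number,
  text: string
): { value: string; caret: number } {
  // Replace the current selection (or insert at a collapsed caret)
  const next = value.slice(0, selectionStart) + text + value.slice(selectionEnd)
  // The caret should land immediately after the inserted text
  return { value: next, caret: selectionStart + text.length }
}
```

In a browser this would be fed from `input.selectionStart`/`input.selectionEnd` and written back via a value update; keeping the core pure makes it testable without a DOM.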
- -We'll use: -https://github.com/rolldown/tsdown/ -https://tsdown.dev/recipes/react-support - - -I've created the github repo at: https://github.com/SyntropyLabs/react-web-speech -and reserved the organisation name for 'syntropy-labs' - -Please research the web on how other people do it, research: -Basic principles of creating and maintaining an open source npm package. \ No newline at end of file diff --git a/plans/phase0.md b/plans/phase0.md deleted file mode 100644 index 1102bea..0000000 --- a/plans/phase0.md +++ /dev/null @@ -1,775 +0,0 @@ -# Phase 0: Project Foundation & DX Infrastructure - -> **Goal:** Establish a production-grade open-source npm package foundation with excellent developer experience from day one. - -**Estimated Time:** 1-2 days - ---- - -## Overview - -Phase 0 sets the foundation for everything that follows. It focuses on: -- Package configuration with modern best practices -- TypeScript and linting setup -- Testing infrastructure -- CI/CD and release automation -- Repository community files - -> [!IMPORTANT] -> All configurations in this phase should be completed before writing any library code. This ensures consistent quality from the first commit. - ---- - -## 0.1 Package Configuration - -### 0.1.1 tsdown Configuration - -The current `tsdown.config.ts` already has a good foundation. Here are refinements based on best practices from the tsdown documentation: - -> [!NOTE] -> The current config has a duplicate `plugins` key inside the babel config object. While this works (JS takes the last value), it should be cleaned up for clarity. - -```typescript -// tsdown.config.ts -import pluginBabel from '@rollup/plugin-babel' -import { defineConfig } from 'tsdown' - -export default defineConfig({ - entry: ['./src/index.ts'], - format: ['esm', 'cjs'], - platform: 'neutral', // SSR-safe for Next.js, Remix, etc. 
- dts: true, // Generate declaration files - exports: true, // Auto-generate package.json exports - clean: true, - treeshake: true, - external: ['react', 'react-dom'], - fixedExtension: true, // Generate .mjs/.cjs (recommended for npm packages) - plugins: [ - pluginBabel({ - babelHelpers: 'bundled', - parserOpts: { - sourceType: 'module', - plugins: ['jsx', 'typescript'], - }, - plugins: ['babel-plugin-react-compiler'], - extensions: ['.js', '.jsx', '.ts', '.tsx'], - }), - ], -}) -``` - -**Recommended addition:** Add `fixedExtension: true` to generate `.mjs`/`.cjs` file extensions, which is considered best practice for npm packages. - -**Key Configuration Options:** - -| Option | Value | Purpose | -|--------|-------|---------| -| `format` | `['esm', 'cjs']` | Dual format for maximum compatibility | -| `platform` | `'neutral'` | Works in browser, Node.js, and SSR | -| `dts` | `true` | Generate TypeScript declaration files | -| `exports` | `true` | Auto-update `package.json` exports field | -| `fixedExtension` | `true` | Use `.mjs`/`.cjs` extensions (npm best practice) | -| `treeshake` | `true` | Remove unused code from bundle | -| `plugins` | React Compiler | Pre-optimize React components via babel-plugin-react-compiler | - -> [!TIP] -> With `exports: true`, tsdown **auto-generates** the `exports`, `main`, `module`, and `types` fields in `package.json` during build. You don't need to manage these manually. - ---- - -### 0.1.2 package.json Updates - -Update `package.json` with modern npm package best practices. - -> [!IMPORTANT] -> Since tsdown uses `exports: true`, the `exports`, `main`, `module`, and `types` fields are **auto-generated** during build. You should NOT manually define these - tsdown will manage them. 
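For orientation, the build-output fields tsdown writes tend to look like the sketch below. This is illustrative only (the exact shape depends on the tsdown version and config, e.g. `fixedExtension`); the point is that none of it is hand-maintained:

```json
{
  "main": "./dist/index.cjs",
  "module": "./dist/index.mjs",
  "types": "./dist/index.d.mts",
  "exports": {
    ".": {
      "import": "./dist/index.mjs",
      "require": "./dist/index.cjs"
    }
  }
}
```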
- -```json -{ - "name": "@syntropy-labs/react-web-speech", - "version": "0.0.1", - "type": "module", - "description": "A React library for the Web Speech API with first-class DX: mic permissions, listening states, browser compatibility, and cursor-aware text insertion.", - "author": "SyntropyLabs", - "license": "MIT", - "repository": { - "type": "git", - "url": "git+https://github.com/SyntropyLabs/react-web-speech.git" - }, - "homepage": "https://github.com/SyntropyLabs/react-web-speech#readme", - "bugs": { - "url": "https://github.com/SyntropyLabs/react-web-speech/issues" - }, - "keywords": [ - "react", "speech", "speech-to-text", "speech-recognition", - "web-speech-api", "voice", "microphone", "hooks", "typescript" - ], - "sideEffects": false, - "files": ["dist", "README.md", "LICENSE"], - "scripts": { - "build": "tsdown", - "dev": "tsdown --watch", - "lint": "eslint .", - "lint:fix": "eslint . --fix", - "format": "prettier --write .", - "format:check": "prettier --check .", - "typecheck": "tsc --noEmit", - "test": "vitest", - "test:coverage": "vitest run --coverage", - "prepublishOnly": "npm run build", - "prepare": "husky", - "changeset": "changeset", - "version": "changeset version", - "release": "npm run build && changeset publish" - }, - "peerDependencies": { - "react": ">=17.0.0", - "react-dom": ">=17.0.0" - }, - "peerDependenciesMeta": { - "react-dom": { "optional": true } - }, - "devDependencies": { - "@babel/core": "^7.28.5", - "@changesets/cli": "^2.27.0", - "@changesets/changelog-github": "^0.5.0", - "@rollup/plugin-babel": "^6.1.0", - "@testing-library/react": "^16.0.0", - "@types/react": "^19.0.0", - "@types/react-dom": "^19.0.0", - "@typescript-eslint/eslint-plugin": "^8.0.0", - "@typescript-eslint/parser": "^8.0.0", - "@vitest/coverage-v8": "^3.0.0", - "babel-plugin-react-compiler": "^1.0.0", - "eslint": "^9.0.0", - "eslint-config-prettier": "^10.0.0", - "eslint-plugin-react-hooks": "^5.0.0", - "happy-dom": "^16.0.0", - "husky": "^9.0.0", - 
"lint-staged": "^15.0.0", - "prettier": "^3.0.0", - "react": "^19.0.0", - "react-dom": "^19.0.0", - "tsdown": "^0.18.1", - "typescript": "^5.9.3", - "vitest": "^3.0.0" - }, - "engines": { - "node": ">=18" - }, - "publishConfig": { - "access": "public", - "provenance": true - } -} -``` - -**Key Updates:** - -| Field | Purpose | -|-------|---------| -| ~~`exports/main/module/types`~~ | **Auto-generated by tsdown** - do not manually define | -| `publishConfig.provenance` | Enable npm provenance for supply chain security | -| `sideEffects: false` | Enables tree-shaking for consumers | -| `files` | Explicitly include only necessary files | -| `prepare` | Auto-setup Husky on `npm install` | - ---- - -## 0.2 TypeScript Configuration - -### tsconfig.json - -```json -{ - "$schema": "https://json.schemastore.org/tsconfig", - "compilerOptions": { - "target": "ES2020", - "lib": ["ES2020", "DOM", "DOM.Iterable"], - "module": "ESNext", - "moduleResolution": "bundler", - "jsx": "react-jsx", - "strict": true, - "noEmit": true, - "declaration": true, - "declarationMap": true, - "sourceMap": true, - "esModuleInterop": true, - "allowSyntheticDefaultImports": true, - "forceConsistentCasingInFileNames": true, - "skipLibCheck": true, - "resolveJsonModule": true, - "isolatedModules": true, - "noUnusedLocals": true, - "noUnusedParameters": true, - "noFallthroughCasesInSwitch": true, - "allowImportingTsExtensions": true - }, - "include": ["src/**/*"], - "exclude": ["node_modules", "dist", "**/*.test.ts", "**/*.test.tsx"] -} -``` - -**Key Compiler Options:** - -| Option | Value | Purpose | -|--------|-------|---------| -| `target` | `ES2020` | Modern baseline, good browser support | -| `moduleResolution` | `bundler` | Best for library bundling with tsdown | -| `strict` | `true` | Maximum type safety | -| `noEmit` | `true` | tsdown handles output, TS is for type checking | -| `isolatedModules` | `true` | Required for bundler compatibility | - ---- - -## 0.3 ESLint Configuration (Flat Config) 
- -ESLint 9+ uses the new flat config format (`eslint.config.js`). This is the modern standard. - -### eslint.config.js - -```javascript -import eslint from '@eslint/js' -import tseslint from '@typescript-eslint/eslint-plugin' -import tsparser from '@typescript-eslint/parser' -import reactHooks from 'eslint-plugin-react-hooks' -import prettier from 'eslint-config-prettier' - -export default [ - // Base ESLint recommended rules - eslint.configs.recommended, - - // TypeScript files configuration - { - files: ['**/*.{ts,tsx}'], - languageOptions: { - parser: tsparser, - parserOptions: { - ecmaVersion: 'latest', - sourceType: 'module', - ecmaFeatures: { - jsx: true, - }, - }, - }, - plugins: { - '@typescript-eslint': tseslint, - 'react-hooks': reactHooks, - }, - rules: { - // TypeScript rules - ...tseslint.configs.recommended.rules, - '@typescript-eslint/no-unused-vars': ['error', { - argsIgnorePattern: '^_', - varsIgnorePattern: '^_', - }], - '@typescript-eslint/explicit-function-return-type': 'off', - '@typescript-eslint/no-explicit-any': 'warn', - - // React Hooks rules - ...reactHooks.configs.recommended.rules, - 'react-hooks/rules-of-hooks': 'error', - 'react-hooks/exhaustive-deps': 'warn', - }, - }, - - // Disable rules that conflict with Prettier - prettier, - - // Ignore patterns - { - ignores: ['dist/**', 'node_modules/**', 'coverage/**', '*.config.*'], - }, -] -``` - ---- - -## 0.4 Prettier Configuration - -### .prettierrc - -```json -{ - "semi": false, - "singleQuote": true, - "trailingComma": "es5", - "tabWidth": 2, - "printWidth": 100, - "bracketSpacing": true, - "jsxSingleQuote": false, - "arrowParens": "always" -} -``` - -### .prettierignore - -``` -dist -node_modules -coverage -pnpm-lock.yaml -yarn.lock -package-lock.json -*.md -``` - ---- - -## 0.5 Testing Infrastructure - -### Vitest Configuration - -Using Vitest for fast, modern testing with native ESM support. 
- -#### vitest.config.ts - -```typescript -import { defineConfig } from 'vitest/config' - -export default defineConfig({ - test: { - environment: 'happy-dom', // Lighter than jsdom - globals: true, - include: ['src/**/*.test.{ts,tsx}'], - coverage: { - provider: 'v8', // Faster than istanbul - reporter: ['text', 'json', 'html', 'lcov'], - exclude: [ - 'node_modules', - 'dist', - '**/*.config.*', - '**/*.d.ts', - 'src/types/**', - ], - thresholds: { - lines: 80, - functions: 80, - branches: 80, - statements: 80, - }, - }, - }, -}) -``` - -### Testing Dependencies Purpose - -| Package | Purpose | -|---------|---------| -| `vitest` | Test runner with native ESM, Jest-compatible API | -| `happy-dom` | Lightweight DOM environment (faster than jsdom) | -| `@testing-library/react` | Test React hooks with `renderHook` | -| `@vitest/coverage-v8` | V8-based coverage (faster, AST-aware since v3.2) | - ---- - -## 0.6 Pre-commit Hooks (Husky + lint-staged) - -### Setup Commands - -```bash -# Initialize Husky -npx husky init - -# The prepare script in package.json will auto-setup on npm install -``` - -### .husky/pre-commit - -```bash -npx lint-staged -``` - -### lint-staged.config.js - -```javascript -export default { - '*.{ts,tsx}': ['eslint --fix', 'prettier --write'], - '*.{json,md,yml,yaml}': ['prettier --write'], -} -``` - ---- - -## 0.7 Changesets Configuration - -Changesets provides explicit version control and human-readable changelogs. 
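Each pending change lives as a small markdown file in `.changeset/`, which is what makes the changelog human-readable before release. A sketch of one such file (the filename is auto-generated; the bump type and description here are illustrative, not a real change):

```markdown
---
'@syntropy-labs/react-web-speech': patch
---

Fix interim transcript not clearing when recognition is aborted
```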
- -### Setup Commands - -```bash -npx @changesets/cli init -``` - -### .changeset/config.json - -```json -{ - "$schema": "https://unpkg.com/@changesets/config@latest/schema.json", - "changelog": ["@changesets/changelog-github", { "repo": "SyntropyLabs/react-web-speech" }], - "commit": false, - "fixed": [], - "linked": [], - "access": "public", - "baseBranch": "main", - "updateInternalDependencies": "patch", - "ignore": [] -} -``` - -**Why Changesets over Semantic Release?** -- Explicit control over when to release -- Human-readable changelogs in `.changeset/` directory -- Better for initial development with frequent breaking changes -- Used by Radix UI, Chakra UI, TanStack Query, and other top packages - ---- - -## 0.8 GitHub Actions CI/CD - -### .github/workflows/ci.yml - -```yaml -name: CI - -on: - push: - branches: [main] - pull_request: - branches: [main] - -concurrency: - group: ${{ github.workflow }}-${{ github.ref }} - cancel-in-progress: true - -jobs: - lint-and-test: - runs-on: ubuntu-latest - - steps: - - name: Checkout - uses: actions/checkout@v4 - - - name: Setup Node.js - uses: actions/setup-node@v4 - with: - node-version: 20 - cache: 'npm' - - - name: Install dependencies - run: npm ci - - - name: Lint - run: npm run lint - - - name: Type check - run: npm run typecheck - - - name: Format check - run: npm run format:check - - - name: Test - run: npm run test:coverage - - - name: Build - run: npm run build - - - name: Upload coverage - uses: codecov/codecov-action@v4 - with: - files: ./coverage/lcov.info - fail_ci_if_error: false -``` - -### .github/workflows/release.yml - -```yaml -name: Release - -on: - push: - branches: [main] - -concurrency: ${{ github.workflow }}-${{ github.ref }} - -jobs: - release: - runs-on: ubuntu-latest - permissions: - contents: write - pull-requests: write - id-token: write # Required for npm provenance - - steps: - - name: Checkout - uses: actions/checkout@v4 - with: - fetch-depth: 0 - - - name: Setup Node.js - uses: 
actions/setup-node@v4 - with: - node-version: 20 - cache: 'npm' - registry-url: 'https://registry.npmjs.org' - - - name: Install dependencies - run: npm ci - - - name: Build - run: npm run build - - - name: Create Release Pull Request or Publish - uses: changesets/action@v1 - with: - version: npm run version - publish: npm run release - commit: 'chore: version packages' - title: 'chore: version packages' - env: - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - NPM_TOKEN: ${{ secrets.NPM_TOKEN }} -``` - ---- - -## 0.9 Repository Community Files - -### Issue Templates - -#### .github/ISSUE_TEMPLATE/bug_report.md - -```markdown ---- -name: Bug report -about: Create a report to help us improve -title: '[Bug]: ' -labels: bug -assignees: '' ---- - -**Describe the bug** -A clear and concise description of what the bug is. - -**To Reproduce** -Steps to reproduce the behavior: -1. -2. -3. - -**Expected behavior** -A clear and concise description of what you expected to happen. - -**Environment** -- Package version: -- React version: -- Browser: -- OS: - -**Additional context** -Add any other context about the problem here. -``` - -#### .github/ISSUE_TEMPLATE/feature_request.md - -```markdown ---- -name: Feature request -about: Suggest an idea for this project -title: '[Feature]: ' -labels: enhancement -assignees: '' ---- - -**Is your feature request related to a problem?** -A clear and concise description of what the problem is. - -**Describe the solution you'd like** -A clear and concise description of what you want to happen. - -**Describe alternatives you've considered** -A clear and concise description of any alternative solutions or features you've considered. - -**Additional context** -Add any other context or screenshots about the feature request here. 
-``` - -### .github/PULL_REQUEST_TEMPLATE.md - -```markdown -## Description - - - -## Type of Change - -- [ ] Bug fix (non-breaking change which fixes an issue) -- [ ] New feature (non-breaking change which adds functionality) -- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected) -- [ ] Documentation update - -## Checklist - -- [ ] I have read the [CONTRIBUTING](CONTRIBUTING.md) document -- [ ] My code follows the code style of this project -- [ ] I have added tests that prove my fix is effective or that my feature works -- [ ] All new and existing tests pass -- [ ] I have updated the documentation accordingly -``` - -### CONTRIBUTING.md - -```markdown -# Contributing to @syntropy-labs/react-web-speech - -Thank you for your interest in contributing! This document provides guidelines for contributing to the project. - -## Development Setup - -1. Fork and clone the repository -2. Install dependencies: `npm install` -3. Run tests: `npm test` -4. Start development: `npm run dev` - -## Making Changes - -1. Create a new branch: `git checkout -b feature/your-feature` -2. Make your changes -3. Add a changeset: `npm run changeset` -4. Commit your changes -5. 
Push and create a pull request - -## Code Style - -- We use ESLint and Prettier for code formatting -- Run `npm run lint:fix` before committing -- Pre-commit hooks will automatically check your code - -## Testing - -- Write tests for new features -- Ensure all tests pass: `npm test` -- Check coverage: `npm run test:coverage` - -## Commit Messages - -We follow [Conventional Commits](https://www.conventionalcommits.org/): - -- `feat:` New feature -- `fix:` Bug fix -- `docs:` Documentation changes -- `chore:` Maintenance tasks -- `test:` Test additions or modifications -``` - -### CODE_OF_CONDUCT.md - -```markdown -# Contributor Covenant Code of Conduct - -## Our Pledge - -We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone. - -## Our Standards - -Examples of behavior that contributes to a positive environment: - -- Using welcoming and inclusive language -- Being respectful of differing viewpoints and experiences -- Gracefully accepting constructive criticism -- Focusing on what is best for the community -- Showing empathy towards other community members - -Examples of unacceptable behavior: - -- The use of sexualized language or imagery and unwelcome sexual attention -- Trolling, insulting/derogatory comments, and personal or political attacks -- Public or private harassment -- Publishing others' private information without explicit permission -- Other conduct which could reasonably be considered inappropriate - -## Enforcement - -Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the project maintainers. All complaints will be reviewed and investigated promptly and fairly. - -## Attribution - -This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org), version 2.1. 
-``` - -### .editorconfig - -```ini -root = true - -[*] -charset = utf-8 -end_of_line = lf -indent_size = 2 -indent_style = space -insert_final_newline = true -trim_trailing_whitespace = true - -[*.md] -trim_trailing_whitespace = false -``` - ---- - -## 0.10 Phase 0 Deliverables Checklist - -| Deliverable | Status | File(s) | -|-------------|--------|---------| -| tsdown configuration (fixed) | ⬜ | `tsdown.config.ts` | -| package.json with exports and provenance | ⬜ | `package.json` | -| TypeScript configuration | ⬜ | `tsconfig.json` | -| ESLint flat config | ⬜ | `eslint.config.js` | -| Prettier configuration | ⬜ | `.prettierrc`, `.prettierignore` | -| Vitest configuration | ⬜ | `vitest.config.ts` | -| Husky setup | ⬜ | `.husky/pre-commit` | -| lint-staged configuration | ⬜ | `lint-staged.config.js` | -| Changesets initialization | ⬜ | `.changeset/config.json` | -| CI workflow | ⬜ | `.github/workflows/ci.yml` | -| Release workflow | ⬜ | `.github/workflows/release.yml` | -| Issue templates | ⬜ | `.github/ISSUE_TEMPLATE/` | -| PR template | ⬜ | `.github/PULL_REQUEST_TEMPLATE.md` | -| Contributing guide | ⬜ | `CONTRIBUTING.md` | -| Code of conduct | ⬜ | `CODE_OF_CONDUCT.md` | -| Editor config | ⬜ | `.editorconfig` | - ---- - -## Summary - -Phase 0 establishes a solid foundation with: - -1. **Modern Build System**: tsdown with dual ESM/CJS output, auto-generated exports, and proper TypeScript declarations -2. **Code Quality**: ESLint flat config + Prettier + pre-commit hooks -3. **Testing**: Vitest with happy-dom for fast React hook testing -4. **Release Automation**: Changesets with GitHub Actions for versioning and npm publishing with provenance -5. **Community**: Issue templates, PR template, contributing guide, and code of conduct - -This ensures the project starts with production-grade tooling that will scale as the library grows. 
- ---- - -## Appendix: Comparison with Popular React Packages - -Research into top React packages reveals common patterns: - -| Package | Build Tool | Test Tool | Version Mgmt | Pre-commit | -|---------|-----------|-----------|--------------|------------| -| **TanStack Query** | tsup | Vitest | Changesets | ❌ | -| **react-hook-form** | Rollup | Jest | Manual | Husky + lint-staged | -| **react-use** | tsc | Jest | semantic-release | Husky + lint-staged | -| **This Package** | tsdown | Vitest | Changesets | Husky + lint-staged | - -### Key Patterns Observed - -1. **`sideEffects: false`** - All packages use this for tree-shaking -2. **Dual ESM/CJS** - All export both formats with proper `exports` field -3. **TypeScript** - All are TypeScript-first with declaration files -4. **`prepare: husky`** - Most use husky for git hooks (except TanStack which handles it differently in monorepo) -5. **Testing** - Modern packages trending toward Vitest over Jest diff --git a/plans/phase1.md b/plans/phase1.md deleted file mode 100644 index 61fa854..0000000 --- a/plans/phase1.md +++ /dev/null @@ -1,952 +0,0 @@ -# Phase 1: Core Speech Recognition Engine - -> **Goal:** Build a robust, browser-normalized Web Speech API wrapper with proper TypeScript types and comprehensive testing. - -**Estimated Time:** 2-3 days - ---- - -## Overview - -Phase 1 creates the foundational building blocks for the library: -- Comprehensive TypeScript type definitions -- Browser detection and Web Speech API normalization -- Microphone permission management -- SpeechRecognition engine wrapper with event handling - -> [!IMPORTANT] -> All code in Phase 1 is framework-agnostic (no React). React hooks come in Phase 2. 
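Because the core must stay framework-agnostic and SSR-safe, constructor lookup has to degrade to `null` when no browser globals exist. A minimal standalone sketch of that guard (a simplified stand-in for the detection module planned in this phase, not its final API):

```typescript
// Hedged sketch: resolve the SpeechRecognition constructor, preferring the
// standard name and falling back to Safari's webkit-prefixed one.
// On the server (no browser globals) both lookups miss and we return null.
type SpeechRecognitionCtor = new () => EventTarget

export function resolveSpeechRecognitionCtor(): SpeechRecognitionCtor | null {
  const g = globalThis as {
    SpeechRecognition?: SpeechRecognitionCtor
    webkitSpeechRecognition?: SpeechRecognitionCtor
  }
  return g.SpeechRecognition ?? g.webkitSpeechRecognition ?? null
}
```

Callers can then branch on `resolveSpeechRecognitionCtor() !== null` instead of touching `window` directly, which keeps Next.js and Remix server renders from throwing.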
- ---- - -## 1.1 File Structure - -``` -src/ -├── index.ts # Main exports -├── types/ -│ └── index.ts # All TypeScript definitions -├── core/ -│ ├── browser.ts # Browser detection & normalization -│ ├── permissions.ts # Mic permission handling -│ └── recognition.ts # SpeechRecognition wrapper -└── __tests__/ - ├── browser.test.ts - ├── permissions.test.ts - └── recognition.test.ts -``` - ---- - -## 1.2 Type Definitions - -### Research Findings: SpeechRecognition API - -From MDN documentation, the SpeechRecognition interface has: - -**Properties:** -| Property | Type | Default | Description | -|----------|------|---------|-------------| -| `lang` | `string` | `navigator.language` | Recognition language (BCP 47 tag) | -| `continuous` | `boolean` | `false` | Keep listening after pause | -| `interimResults` | `boolean` | `false` | Return partial results | -| `maxAlternatives` | `number` | `1` | Number of alternative transcripts | -| `processLocally` | `boolean` | - | Force on-device processing (Chrome 120+) | - -**Events:** -| Event | Description | -|-------|-------------| -| `start` | Service begins listening | -| `end` | Service disconnected | -| `result` | Word/phrase recognized | -| `error` | Recognition error occurred | -| `speechstart` | Speech detected | -| `speechend` | Speech stopped | -| `audiostart` | Audio capture started | -| `audioend` | Audio capture ended | -| `soundstart` | Any sound detected | -| `soundend` | Any sound stopped | -| `nomatch` | No significant recognition | - -**Error Types:** -| Error | Description | -|-------|-------------| -| `no-speech` | No speech detected | -| `aborted` | Recognition aborted | -| `audio-capture` | No mic or mic access failed | -| `network` | Network error | -| `not-allowed` | Mic permission denied | -| `service-not-allowed` | Service not allowed | -| `bad-grammar` | Grammar error | -| `language-not-supported` | Language not supported | - -### src/types/index.ts - -```typescript -/** - * Type definitions for 
@syntropy-labs/react-web-speech - * Based on Web Speech API specification and MDN documentation - */ - -// ============================================================================ -// Browser & Global Types -// ============================================================================ - -/** Global augmentation for webkit-prefixed SpeechRecognition */ -declare global { - interface Window { - SpeechRecognition: typeof SpeechRecognition - webkitSpeechRecognition: typeof SpeechRecognition - } -} - -// ============================================================================ -// Speech Recognition Types (Browser-normalized) -// ============================================================================ - -/** - * Normalized SpeechRecognition instance interface - * Combines standard and webkit-prefixed APIs - */ -export interface SpeechRecognitionInstance extends EventTarget { - // Properties - lang: string - continuous: boolean - interimResults: boolean - maxAlternatives: number - - // Methods - start(): void - stop(): void - abort(): void - - // Event handlers - onstart: ((event: Event) => void) | null - onend: ((event: Event) => void) | null - onerror: ((event: SpeechRecognitionErrorEvent) => void) | null - onresult: ((event: SpeechRecognitionEvent) => void) | null - onspeechstart: ((event: Event) => void) | null - onspeechend: ((event: Event) => void) | null - onaudiostart: ((event: Event) => void) | null - onaudioend: ((event: Event) => void) | null - onsoundstart: ((event: Event) => void) | null - onsoundend: ((event: Event) => void) | null - onnomatch: ((event: Event) => void) | null -} - -// ============================================================================ -// Permission Types -// ============================================================================ - -/** - * Microphone permission states - * Aligned with Permissions API specification - */ -export type MicPermissionState = 'prompt' | 'granted' | 'denied' | 'unsupported' - -// 
============================================================================ -// Error Types -// ============================================================================ - -/** - * All possible speech recognition error types - * Based on SpeechRecognitionErrorEvent.error values + custom types - */ -export type SpeechErrorType = - | 'no-speech' - | 'aborted' - | 'audio-capture' - | 'network' - | 'not-allowed' - | 'service-not-allowed' - | 'bad-grammar' - | 'language-not-supported' - | 'browser-not-supported' - -/** - * Structured error object for speech recognition errors - */ -export interface SpeechError { - /** Error type identifier */ - type: SpeechErrorType - /** Human-readable error message */ - message: string - /** Original browser error event (if available) */ - originalError?: Event -} - -// ============================================================================ -// Browser Capabilities -// ============================================================================ - -/** - * Browser capability detection result - */ -export interface BrowserCapabilities { - /** Whether Web Speech API is supported */ - isSupported: boolean - /** The SpeechRecognition constructor (or null) */ - SpeechRecognition: (new () => SpeechRecognitionInstance) | null - /** Whether webkit prefix is needed */ - needsWebkitPrefix: boolean - /** Whether Permissions API supports microphone query */ - supportsPermissionsAPI: boolean - /** Detected browser name */ - browserName: 'chrome' | 'edge' | 'safari' | 'firefox' | 'other' -} - -// ============================================================================ -// Recognition Engine Types -// ============================================================================ - -/** - * Options for creating a recognition instance - */ -export interface RecognitionOptions { - /** Recognition language (BCP 47 tag, e.g., 'en-US') */ - lang?: string - /** Keep listening after pause */ - continuous?: boolean - /** Return interim (partial) 
results */ - interimResults?: boolean - /** Number of alternative transcripts (1-5) */ - maxAlternatives?: number -} - -/** - * Callbacks for recognition events - */ -export interface RecognitionCallbacks { - /** Called when transcript is available */ - onResult: (transcript: string, isFinal: boolean) => void - /** Called on any error */ - onError: (error: SpeechError) => void - /** Called when recognition starts */ - onStart: () => void - /** Called when recognition ends */ - onEnd: () => void - /** Called when speech is detected */ - onSpeechStart?: () => void - /** Called when speech stops */ - onSpeechEnd?: () => void -} - -// ============================================================================ -// Hook Types (for Phase 2) -// ============================================================================ - -/** - * Options for useSpeechInput hook - */ -export interface UseSpeechInputOptions extends RecognitionOptions { - /** Auto-stop after silence (ms, 0 to disable) */ - silenceTimeout?: number - /** Auto-restart on network errors */ - autoRestart?: boolean - /** Callback when transcript is available */ - onResult?: (transcript: string, isFinal: boolean) => void - /** Callback on error */ - onError?: (error: SpeechError) => void - /** Callback when recognition starts */ - onStart?: () => void - /** Callback when recognition ends */ - onEnd?: () => void -} - -/** - * Return type for useSpeechInput hook - */ -export interface UseSpeechInputReturn { - // State - /** Final transcript text */ - transcript: string - /** Real-time partial transcript */ - interimTranscript: string - /** Whether currently listening */ - isListening: boolean - /** Whether browser supports Speech API */ - isSupported: boolean - /** Current microphone permission state */ - permissionState: MicPermissionState - /** Current error (or null) */ - error: SpeechError | null - - // Actions - /** Start listening (async for permission handling) */ - start: () => Promise - /** Stop listening 
gracefully */ - stop: () => void - /** Toggle listening state */ - toggle: () => Promise - /** Abort immediately (no final result) */ - abort: () => void - /** Clear transcript */ - clear: () => void - /** Explicitly request microphone permission */ - requestPermission: () => Promise -} -``` - ---- - -## 1.3 Browser Detection & Normalization - -### Research Findings: Browser Compatibility - -| Browser | Support | Notes | -|---------|---------|-------| -| **Chrome** | ✅ Full | Standard API, uses Google servers | -| **Edge** | ⚠️ Partial | API present but often non-functional | -| **Safari 14.1+** | ⚠️ Partial | Requires `webkit` prefix, needs Siri enabled | -| **Firefox** | ❌ None | Not implemented | -| **Chrome Android** | ⚠️ Partial | Works but inconsistent | -| **iOS Safari** | ⚠️ Partial | Requires Siri, webkit prefix | - -### src/core/browser.ts - -```typescript -import type { BrowserCapabilities, SpeechRecognitionInstance } from '../types' - -/** - * Cached capabilities to avoid repeated detection - */ -let cachedCapabilities: BrowserCapabilities | null = null - -/** - * Detect browser capabilities for Web Speech API - * Results are cached for performance - * - * @returns Browser capabilities object - */ -export function detectBrowserCapabilities(): BrowserCapabilities { - // Return cached result if available - if (cachedCapabilities) { - return cachedCapabilities - } - - // SSR-safe check - if (typeof window === 'undefined') { - cachedCapabilities = { - isSupported: false, - SpeechRecognition: null, - needsWebkitPrefix: false, - supportsPermissionsAPI: false, - browserName: 'other', - } - return cachedCapabilities - } - - // Get the SpeechRecognition constructor - const SpeechRecognition = ( - window.SpeechRecognition || window.webkitSpeechRecognition - ) as (new () => SpeechRecognitionInstance) | undefined - - // Check Permissions API support for microphone - const supportsPermissionsAPI = - 'permissions' in navigator && typeof navigator.permissions?.query 
=== 'function' - - // Detect browser from User Agent - const browserName = detectBrowserName() - - cachedCapabilities = { - isSupported: !!SpeechRecognition, - SpeechRecognition: SpeechRecognition ?? null, - needsWebkitPrefix: - !!window.webkitSpeechRecognition && !window.SpeechRecognition, - supportsPermissionsAPI, - browserName, - } - - return cachedCapabilities -} - -/** - * Detect browser name from User Agent - * Used for compatibility warnings and workarounds - */ -function detectBrowserName(): BrowserCapabilities['browserName'] { - if (typeof navigator === 'undefined') return 'other' - - const ua = navigator.userAgent.toLowerCase() - - // Order matters: Edge contains 'chrome', Safari check excludes chrome - if (ua.includes('edg/') || ua.includes('edge/')) return 'edge' - if (ua.includes('chrome') && !ua.includes('edg')) return 'chrome' - if (ua.includes('safari') && !ua.includes('chrome')) return 'safari' - if (ua.includes('firefox')) return 'firefox' - - return 'other' -} - -/** - * Clear the cached capabilities (useful for testing) - */ -export function clearCapabilitiesCache(): void { - cachedCapabilities = null -} - -/** - * Check if the current browser has known issues with Speech API - * @returns Warning message if issues exist, null otherwise - */ -export function getBrowserCompatibilityWarning(): string | null { - const { browserName, isSupported } = detectBrowserCapabilities() - - if (!isSupported) { - return 'Web Speech API is not supported in this browser. Please use Chrome, Edge, or Safari.' - } - - switch (browserName) { - case 'edge': - return 'Microsoft Edge has known issues with Web Speech API. Results may be unreliable.' - case 'safari': - return 'Safari requires Siri to be enabled for speech recognition to work.' - case 'firefox': - return 'Firefox does not support the Web Speech API.' 
- default: - return null - } -} -``` - ---- - -## 1.4 Permission Management - -### Research Findings: Permissions API - -- Permissions API supports `'microphone'` query in Chrome/Edge -- Safari and Firefox throw `TypeError` for microphone query -- Permission state syncs when user changes in browser settings -- Should never request permission on page load (bad UX) - -### src/core/permissions.ts - -```typescript -import type { MicPermissionState } from '../types' -import { detectBrowserCapabilities } from './browser' - -/** - * Get the current microphone permission state - * Uses Permissions API when available, falls back to 'prompt' - * - * @returns Promise resolving to permission state - */ -export async function getMicPermissionState(): Promise<MicPermissionState> { - const { isSupported, supportsPermissionsAPI } = detectBrowserCapabilities() - - // If Speech API isn't supported, mic permission is irrelevant - if (!isSupported) { - return 'unsupported' - } - - // Try Permissions API first - if (supportsPermissionsAPI) { - try { - const result = await navigator.permissions.query({ - name: 'microphone' as PermissionName, - }) - return result.state as MicPermissionState - } catch { - // Firefox and Safari throw TypeError for 'microphone' - // Fall through to default - } - } - - // Default to 'prompt' - actual state will be known when user tries to start - return 'prompt' -} - -/** - * Subscribe to permission state changes - * Returns null if Permissions API doesn't support change events - * - * @param callback - Called when permission state changes - * @returns Cleanup function, or null if not supported - */ -export function subscribeToPermissionChanges( - callback: (state: MicPermissionState) => void -): (() => void) | null { - const { supportsPermissionsAPI } = detectBrowserCapabilities() - - if (!supportsPermissionsAPI) { - return null - } - - let permissionStatus: PermissionStatus | null = null - let changeHandler: (() => void) | null = null - - navigator.permissions - .query({
name: 'microphone' as PermissionName }) - .then((status) => { - permissionStatus = status - changeHandler = () => { - callback(status.state as MicPermissionState) - } - status.addEventListener('change', changeHandler) - }) - .catch(() => { - // Fail silently - not all browsers support this - }) - - // Return cleanup function - return () => { - if (permissionStatus && changeHandler) { - permissionStatus.removeEventListener('change', changeHandler) - } - } -} - -/** - * Request microphone permission by attempting to access the device - * This triggers the browser's permission prompt - * - * @returns Promise resolving to the new permission state - */ -export async function requestMicPermission(): Promise<MicPermissionState> { - const { isSupported } = detectBrowserCapabilities() - - if (!isSupported) { - return 'unsupported' - } - - try { - // Request access to the microphone - const stream = await navigator.mediaDevices.getUserMedia({ audio: true }) - - // Immediately stop all tracks - we just needed the permission - stream.getTracks().forEach((track) => track.stop()) - - return 'granted' - } catch (error) { - if (error instanceof DOMException) { - if (error.name === 'NotAllowedError' || error.name === 'PermissionDeniedError') { - return 'denied' - } - } - // Other errors (NotFoundError, etc.)
- treat as denied for simplicity - return 'denied' - } -} -``` - ---- - -## 1.5 Recognition Engine - -### src/core/recognition.ts - -```typescript -import type { - SpeechRecognitionInstance, - RecognitionOptions, - RecognitionCallbacks, - SpeechError, - SpeechErrorType, -} from '../types' -import { detectBrowserCapabilities } from './browser' - -/** - * Error type mapping from native error strings to our types - */ -const ERROR_TYPE_MAP: Record<string, SpeechErrorType> = { - 'no-speech': 'no-speech', - aborted: 'aborted', - 'audio-capture': 'audio-capture', - network: 'network', - 'not-allowed': 'not-allowed', - 'service-not-allowed': 'service-not-allowed', - 'bad-grammar': 'bad-grammar', - 'language-not-supported': 'language-not-supported', -} - -/** - * Human-readable error messages - */ -const ERROR_MESSAGES: Record<SpeechErrorType, string> = { - 'no-speech': 'No speech was detected. Please try again.', - aborted: 'Speech recognition was aborted.', - 'audio-capture': 'No microphone was found or microphone access failed.', - network: 'Network error occurred during speech recognition.', - 'not-allowed': 'Microphone access was denied.
Please allow microphone access.', - 'service-not-allowed': 'Speech recognition service is not allowed.', - 'bad-grammar': 'Grammar error in speech recognition.', - 'language-not-supported': 'The specified language is not supported.', - 'browser-not-supported': 'Speech recognition is not supported in this browser.', -} - -/** - * Create a configured SpeechRecognition instance - * - * @param options - Recognition configuration options - * @param callbacks - Event callbacks - * @returns Configured recognition instance, or null if not supported - */ -export function createRecognitionInstance( - options: RecognitionOptions, - callbacks: RecognitionCallbacks -): SpeechRecognitionInstance | null { - const { SpeechRecognition, isSupported } = detectBrowserCapabilities() - - if (!isSupported || !SpeechRecognition) { - return null - } - - const recognition = new SpeechRecognition() - - // Configure instance - recognition.lang = options.lang ?? navigator.language ?? 'en-US' - recognition.continuous = options.continuous ?? false - recognition.interimResults = options.interimResults ?? true - recognition.maxAlternatives = Math.min(Math.max(options.maxAlternatives ?? 
1, 1), 5) - - // Wire up event handlers - recognition.onstart = () => { - callbacks.onStart() - } - - recognition.onend = () => { - callbacks.onEnd() - } - - recognition.onspeechstart = () => { - callbacks.onSpeechStart?.() - } - - recognition.onspeechend = () => { - callbacks.onSpeechEnd?.() - } - - recognition.onresult = (event: SpeechRecognitionEvent) => { - processRecognitionResult(event, callbacks.onResult) - } - - recognition.onerror = (event: SpeechRecognitionErrorEvent) => { - const error = createSpeechError(event.error, event) - callbacks.onError(error) - } - - recognition.onnomatch = () => { - // Treat nomatch as no-speech for simplicity - const error = createSpeechError('no-speech') - callbacks.onError(error) - } - - return recognition -} - -/** - * Process recognition result event and extract transcripts - */ -function processRecognitionResult( - event: SpeechRecognitionEvent, - onResult: RecognitionCallbacks['onResult'] -): void { - let finalTranscript = '' - let interimTranscript = '' - - // Process results starting from the new result index - for (let i = event.resultIndex; i < event.results.length; i++) { - const result = event.results[i] - const transcript = result[0].transcript - - if (result.isFinal) { - finalTranscript += transcript - } else { - interimTranscript += transcript - } - } - - // Emit final results first (they're complete) - if (finalTranscript) { - onResult(finalTranscript.trim(), true) - } - - // Then emit interim results - if (interimTranscript) { - onResult(interimTranscript.trim(), false) - } -} - -/** - * Create a structured SpeechError object - */ -function createSpeechError( - errorCode: string, - originalEvent?: Event -): SpeechError { - const type = ERROR_TYPE_MAP[errorCode] ?? 
'network' - return { - type, - message: ERROR_MESSAGES[type], - originalError: originalEvent, - } -} - -/** - * Get error message for a speech error type - */ -export function getErrorMessage(type: SpeechErrorType): string { - return ERROR_MESSAGES[type] -} - -/** - * Map native error code to SpeechErrorType - */ -export function mapErrorType(errorCode: string): SpeechErrorType { - return ERROR_TYPE_MAP[errorCode] ?? 'network' -} -``` - ---- - -## 1.6 Testing Strategy - -### Testing Tool: Corti - -**Corti** is a mock library that replaces the browser's `SpeechRecognition` with a testable implementation. It provides a `say()` method to simulate speech input. - -```bash -yarn add -D corti -``` - -### src/__tests__/browser.test.ts - -```typescript -import { describe, it, expect, beforeEach, vi } from 'vitest' -import { - detectBrowserCapabilities, - clearCapabilitiesCache, - getBrowserCompatibilityWarning, -} from '../core/browser' - -describe('browser detection', () => { - beforeEach(() => { - clearCapabilitiesCache() - }) - - describe('detectBrowserCapabilities', () => { - it('returns isSupported: false when SpeechRecognition is not available', () => { - // happy-dom doesn't have SpeechRecognition by default - const capabilities = detectBrowserCapabilities() - expect(capabilities.isSupported).toBe(false) - expect(capabilities.SpeechRecognition).toBeNull() - }) - - it('returns isSupported: true when SpeechRecognition is available', () => { - // Mock SpeechRecognition - const mockSpeechRecognition = vi.fn() - vi.stubGlobal('SpeechRecognition', mockSpeechRecognition) - - clearCapabilitiesCache() - const capabilities = detectBrowserCapabilities() - - expect(capabilities.isSupported).toBe(true) - expect(capabilities.SpeechRecognition).toBe(mockSpeechRecognition) - - vi.unstubAllGlobals() - }) - - it('detects webkit prefix', () => { - const mockWebkitSpeechRecognition = vi.fn() - vi.stubGlobal('webkitSpeechRecognition', mockWebkitSpeechRecognition) - - 
clearCapabilitiesCache() - const capabilities = detectBrowserCapabilities() - - expect(capabilities.isSupported).toBe(true) - expect(capabilities.needsWebkitPrefix).toBe(true) - - vi.unstubAllGlobals() - }) - - it('caches results', () => { - const first = detectBrowserCapabilities() - const second = detectBrowserCapabilities() - expect(first).toBe(second) - }) - }) - - describe('getBrowserCompatibilityWarning', () => { - it('returns warning when Speech API is not supported', () => { - const warning = getBrowserCompatibilityWarning() - expect(warning).toContain('not supported') - }) - }) -}) -``` - -### src/__tests__/permissions.test.ts - -```typescript -import { describe, it, expect, vi, beforeEach } from 'vitest' -import { - getMicPermissionState, - requestMicPermission, -} from '../core/permissions' -import { clearCapabilitiesCache } from '../core/browser' - -describe('permissions', () => { - beforeEach(() => { - clearCapabilitiesCache() - vi.restoreAllMocks() - }) - - describe('getMicPermissionState', () => { - it('returns "unsupported" when Speech API is not available', async () => { - const state = await getMicPermissionState() - expect(state).toBe('unsupported') - }) - - it('returns "prompt" when Permissions API fails', async () => { - // Mock SpeechRecognition to make isSupported true - vi.stubGlobal('SpeechRecognition', vi.fn()) - clearCapabilitiesCache() - - // Mock Permissions API to throw - vi.stubGlobal('navigator', { - permissions: { - query: vi.fn().mockRejectedValue(new TypeError('Not supported')), - }, - }) - - const state = await getMicPermissionState() - expect(state).toBe('prompt') - - vi.unstubAllGlobals() - }) - }) - - describe('requestMicPermission', () => { - it('returns "unsupported" when Speech API is not available', async () => { - const state = await requestMicPermission() - expect(state).toBe('unsupported') - }) - }) -}) -``` - -### src/__tests__/recognition.test.ts - -```typescript -import { describe, it, expect, vi, beforeEach } from 
'vitest' -import { createRecognitionInstance, getErrorMessage, mapErrorType } from '../core/recognition' -import { clearCapabilitiesCache } from '../core/browser' - -describe('recognition', () => { - beforeEach(() => { - clearCapabilitiesCache() - vi.restoreAllMocks() - }) - - describe('createRecognitionInstance', () => { - it('returns null when Speech API is not available', () => { - const instance = createRecognitionInstance( - {}, - { - onResult: vi.fn(), - onError: vi.fn(), - onStart: vi.fn(), - onEnd: vi.fn(), - } - ) - expect(instance).toBeNull() - }) - }) - - describe('mapErrorType', () => { - it('maps known error codes', () => { - expect(mapErrorType('no-speech')).toBe('no-speech') - expect(mapErrorType('not-allowed')).toBe('not-allowed') - expect(mapErrorType('audio-capture')).toBe('audio-capture') - }) - - it('returns "network" for unknown errors', () => { - expect(mapErrorType('unknown-error')).toBe('network') - }) - }) - - describe('getErrorMessage', () => { - it('returns human-readable messages', () => { - expect(getErrorMessage('not-allowed')).toContain('denied') - expect(getErrorMessage('no-speech')).toContain('No speech') - }) - }) -}) -``` - ---- - -## 1.7 Main Export - -### src/index.ts - -```typescript -// Types -export type { - // Core types - SpeechRecognitionInstance, - MicPermissionState, - SpeechError, - SpeechErrorType, - BrowserCapabilities, - RecognitionOptions, - RecognitionCallbacks, - // Hook types (for Phase 2) - UseSpeechInputOptions, - UseSpeechInputReturn, -} from './types' - -// Browser detection -export { - detectBrowserCapabilities, - clearCapabilitiesCache, - getBrowserCompatibilityWarning, -} from './core/browser' - -// Permissions -export { - getMicPermissionState, - subscribeToPermissionChanges, - requestMicPermission, -} from './core/permissions' - -// Recognition engine -export { - createRecognitionInstance, - getErrorMessage, - mapErrorType, -} from './core/recognition' - -// Version -export const VERSION = '0.0.1' -``` - 
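Since the core modules guard every `window` access, the exported surface is safe to import in a server bundle. As a sanity check on that guarantee, here is a standalone sketch of the detection pattern from section 1.3 that can run directly under Node. It does not import the package; `MiniCapabilities` and `detectMiniCapabilities` are illustrative names, and the capability shape is reduced to two fields:

```typescript
// Standalone sketch of the SSR-safe detection pattern (illustrative, not the package API)
interface MiniCapabilities {
  isSupported: boolean
  needsWebkitPrefix: boolean
}

function detectMiniCapabilities(): MiniCapabilities {
  // Read everything off globalThis so the sketch type-checks without DOM lib types
  const g = globalThis as {
    window?: unknown
    SpeechRecognition?: unknown
    webkitSpeechRecognition?: unknown
  }

  // SSR-safe guard: on the server there is no window object at all
  if (typeof g.window === 'undefined') {
    return { isSupported: false, needsWebkitPrefix: false }
  }

  return {
    // Supported if either the standard or the webkit-prefixed constructor exists
    isSupported: g.SpeechRecognition != null || g.webkitSpeechRecognition != null,
    // Safari exposes only the prefixed constructor
    needsWebkitPrefix: g.webkitSpeechRecognition != null && g.SpeechRecognition == null,
  }
}

// Under Node (no window), the API is reported as unavailable
console.log(detectMiniCapabilities()) // { isSupported: false, needsWebkitPrefix: false }
```

The same function returns `needsWebkitPrefix: true` in Safari and `isSupported: false` in Firefox, which is exactly the branching the real `detectBrowserCapabilities` caches.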
---- - -## 1.8 Phase 1 Deliverables Checklist - -| Deliverable | Status | File(s) | -|-------------|--------|---------| -| Type definitions | ⬜ | `src/types/index.ts` | -| Browser detection | ⬜ | `src/core/browser.ts` | -| Permission management | ⬜ | `src/core/permissions.ts` | -| Recognition engine | ⬜ | `src/core/recognition.ts` | -| Main exports | ⬜ | `src/index.ts` | -| Browser detection tests | ⬜ | `src/__tests__/browser.test.ts` | -| Permission tests | ⬜ | `src/__tests__/permissions.test.ts` | -| Recognition tests | ⬜ | `src/__tests__/recognition.test.ts` | - ---- - -## Summary - -Phase 1 provides the core building blocks: - -1. **Comprehensive Types** — Full TypeScript definitions for Web Speech API, errors, permissions, and hook interfaces -2. **Browser Detection** — SSR-safe detection with caching, webkit prefix handling, and compatibility warnings -3. **Permission Management** — Permissions API with fallbacks, change subscriptions, and explicit permission request -4. **Recognition Engine** — Factory function, event processing, error mapping with human-readable messages -5. **Testing** — Mock strategies using Corti, unit tests for all core modules - -Phase 2 will build React hooks on top of this foundation. diff --git a/plans/phase2.md b/plans/phase2.md deleted file mode 100644 index fa0e969..0000000 --- a/plans/phase2.md +++ /dev/null @@ -1,877 +0,0 @@ -# Phase 2: Primary Hook Implementation - -> **Goal:** Build the `useSpeechInput` hook with all core functionality including permission management, silence timeout, and SSR safety. - -**Estimated Time:** 2-3 days - ---- - -## Overview - -Phase 2 builds React hooks on top of the Phase 1 core modules: -- `useSpeechInput` — Primary hook for speech-to-text -- SSR safety utilities -- Comprehensive testing with @testing-library/react - -> [!IMPORTANT] -> This phase focuses on the hook API design. Cursor insertion comes in Phase 3. 
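One behavior this phase layers on top of the Phase 1 engine is the silence timeout: every speech or result event pushes an auto-stop deadline back, and recognition is stopped only if the deadline actually expires. Stripped of React, the pattern is just a resettable timer. The sketch below is framework-free and illustrative; `createSilenceWatchdog` is not part of the package API:

```typescript
// Framework-free sketch of the silence-timeout pattern (illustrative names)
function createSilenceWatchdog(timeoutMs: number, onSilence: () => void) {
  let handle: ReturnType<typeof setTimeout> | null = null
  return {
    // Called on every speech/result event: pushes the deadline back
    reset(): void {
      if (handle !== null) clearTimeout(handle)
      handle = setTimeout(onSilence, timeoutMs)
    },
    // Called when recognition ends: guarantees the timer never fires late
    cancel(): void {
      if (handle !== null) clearTimeout(handle)
      handle = null
    },
  }
}

// Usage: reset() from onresult/onspeechstart/onspeechend, cancel() from onend
const watchdog = createSilenceWatchdog(3000, () => console.log('silence: stopping'))
watchdog.reset() // a result arrived, extend the deadline
watchdog.cancel() // recognition ended normally before the timeout
```

In the hook, `reset` and `cancel` correspond to `resetSilenceTimeout` and `clearSilenceTimeout`, with the timer handle held in a ref so rescheduling never triggers a re-render.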
- ---- - -## 2.1 File Structure - -``` -src/ -├── hooks/ -│ ├── useSpeechInput.ts # Primary hook -│ ├── useIsSSR.ts # SSR detection utility -│ └── index.ts # Hook exports -└── __tests__/ - └── useSpeechInput.test.ts # Hook tests -``` - ---- - -## 2.2 Research: React Hook Best Practices - -### Hook Design Patterns - -| Pattern | Description | Usage | -|---------|-------------|-------| -| **Stable References** | Use `useCallback` for returned functions | Prevents unnecessary re-renders | -| **Refs for Mutable** | Use `useRef` for mutable values that don't trigger re-renders | Recognition instance, timeouts | -| **Effect Cleanup** | Always return cleanup from `useEffect` | Prevent memory leaks | -| **SSR Safety** | Check `typeof window` or use `useSyncExternalStore` | Next.js compatibility | - -### useSyncExternalStore for SSR - -```typescript -import { useSyncExternalStore } from 'react' - -// Detect SSR vs client -const subscribe = () => () => {} -const getSnapshot = () => false -const getServerSnapshot = () => true - -export function useIsSSR(): boolean { - return useSyncExternalStore(subscribe, getSnapshot, getServerSnapshot) -} -``` - -### Testing React Hooks - -From @testing-library/react best practices: - -1. **`renderHook`** — Test hooks in isolation -2. **`act`** — Wrap state changes and async operations -3. **`result.current`** — Access LIVE hook values (don't destructure early) -4. **`rerender`** — Test prop changes -5. 
**`waitFor`** — Test async state updates - ---- - -## 2.3 Hook Implementation - -### src/hooks/useIsSSR.ts - -```typescript -import { useSyncExternalStore } from 'react' - -/** - * Subscribe function that does nothing (client is always "subscribed") - */ -const subscribe = (): (() => void) => () => {} - -/** - * Client snapshot: we're NOT on the server - */ -const getSnapshot = (): boolean => false - -/** - * Server snapshot: we ARE on the server - */ -const getServerSnapshot = (): boolean => true - -/** - * Hook to detect if we're rendering on the server - * Uses useSyncExternalStore for proper React 18 SSR support - */ -export function useIsSSR(): boolean { - return useSyncExternalStore(subscribe, getSnapshot, getServerSnapshot) -} -``` - -### src/hooks/useSpeechInput.ts - -```typescript -import { - useState, - useRef, - useCallback, - useEffect, - useSyncExternalStore, -} from 'react' -import type { - UseSpeechInputOptions, - UseSpeechInputReturn, - MicPermissionState, - SpeechError, - SpeechRecognitionInstance, -} from '../types' -import { detectBrowserCapabilities } from '../core/browser' -import { - getMicPermissionState, - subscribeToPermissionChanges, - requestMicPermission, -} from '../core/permissions' -import { createRecognitionInstance } from '../core/recognition' - -/** - * Primary hook for speech-to-text functionality - * - * @param options - Configuration options - * @returns Speech input state and actions - * - * @example - * ```tsx - * const { transcript, isListening, start, stop } = useSpeechInput({ - * lang: 'en-US', - * continuous: false, - * silenceTimeout: 3000, - * }) - * ``` - */ -export function useSpeechInput( - options: UseSpeechInputOptions = {} -): UseSpeechInputReturn { - const { - lang, - continuous = false, - interimResults = true, - maxAlternatives = 1, - silenceTimeout = 3000, - autoRestart = false, - onResult, - onError, - onStart, - onEnd, - } = options - - // 
============================================================================ - // State - // ============================================================================ - - const [transcript, setTranscript] = useState('') - const [interimTranscript, setInterimTranscript] = useState('') - const [isListening, setIsListening] = useState(false) - const [error, setError] = useState<SpeechError | null>(null) - const [permissionState, setPermissionState] = - useState<MicPermissionState>('prompt') - - // ============================================================================ - // Refs (mutable values that don't trigger re-renders) - // ============================================================================ - - const recognitionRef = useRef<SpeechRecognitionInstance | null>(null) - const silenceTimeoutRef = useRef<ReturnType<typeof setTimeout> | null>(null) - const isStartingRef = useRef(false) // Guard against React 18 Strict Mode double-mount - const shouldRestartRef = useRef(false) // Track if we should auto-restart - - // ============================================================================ - // Browser Capabilities (computed once, cached) - // ============================================================================ - - const capabilitiesRef = useRef(detectBrowserCapabilities()) - const isSupported = capabilitiesRef.current.isSupported - - // ============================================================================ - // Permission State Sync - // ============================================================================ - - useEffect(() => { - // Get initial permission state - getMicPermissionState().then(setPermissionState) - - // Subscribe to permission changes - const unsubscribe = subscribeToPermissionChanges(setPermissionState) - - return () => { - unsubscribe?.() - } - }, []) - - // ============================================================================ - // Silence Timeout Handler - // ============================================================================ - - const clearSilenceTimeout = useCallback(() => { - if
(silenceTimeoutRef.current) { - clearTimeout(silenceTimeoutRef.current) - silenceTimeoutRef.current = null - } - }, []) - - const resetSilenceTimeout = useCallback(() => { - clearSilenceTimeout() - - if (silenceTimeout > 0 && isListening) { - silenceTimeoutRef.current = setTimeout(() => { - recognitionRef.current?.stop() - }, silenceTimeout) - } - }, [silenceTimeout, isListening, clearSilenceTimeout]) - - // Clear timeout when not listening - useEffect(() => { - if (!isListening) { - clearSilenceTimeout() - } - }, [isListening, clearSilenceTimeout]) - - // ============================================================================ - // Recognition Instance Management - // ============================================================================ - - const createRecognition = useCallback(() => { - // Abort any existing instance - if (recognitionRef.current) { - recognitionRef.current.abort() - recognitionRef.current = null - } - - recognitionRef.current = createRecognitionInstance( - { lang, continuous, interimResults, maxAlternatives }, - { - onResult: (text, isFinal) => { - if (isFinal) { - setTranscript((prev) => (prev ? 
prev + ' ' + text : text)) - setInterimTranscript('') - } else { - setInterimTranscript(text) - } - onResult?.(text, isFinal) - resetSilenceTimeout() - }, - onError: (err) => { - setError(err) - setIsListening(false) - onError?.(err) - - // Update permission state on denied - if (err.type === 'not-allowed') { - setPermissionState('denied') - } - - // Track if we should auto-restart on network errors - if (autoRestart && err.type === 'network') { - shouldRestartRef.current = true - } - }, - onStart: () => { - setIsListening(true) - setError(null) - shouldRestartRef.current = false - resetSilenceTimeout() - onStart?.() - }, - onEnd: () => { - setIsListening(false) - setInterimTranscript('') - clearSilenceTimeout() - onEnd?.() - - // Auto-restart if flagged - if (shouldRestartRef.current && autoRestart) { - shouldRestartRef.current = false - setTimeout(() => { - recognitionRef.current?.start() - }, 500) - } - }, - onSpeechStart: () => { - resetSilenceTimeout() - }, - onSpeechEnd: () => { - resetSilenceTimeout() - }, - } - ) - }, [ - lang, - continuous, - interimResults, - maxAlternatives, - autoRestart, - onResult, - onError, - onStart, - onEnd, - resetSilenceTimeout, - clearSilenceTimeout, - ]) - - // ============================================================================ - // Actions - // ============================================================================ - - const start = useCallback(async (): Promise<void> => { - if (!isSupported) { - setError({ - type: 'browser-not-supported', - message: 'Speech recognition is not supported in this browser.', - }) - return - } - - // Guard against double-start (React 18 Strict Mode) - if (isStartingRef.current || isListening) { - return - } - isStartingRef.current = true - - try { - createRecognition() - recognitionRef.current?.start() - setPermissionState('granted') - } catch (e) { - // Handle "already started" race condition - if (e instanceof DOMException && e.name === 'InvalidStateError') { - // Already running,
ignore - } else { - throw e - } - } finally { - isStartingRef.current = false - } - }, [isSupported, isListening, createRecognition]) - - const stop = useCallback((): void => { - shouldRestartRef.current = false // Prevent auto-restart - recognitionRef.current?.stop() - }, []) - - const abort = useCallback((): void => { - shouldRestartRef.current = false - recognitionRef.current?.abort() - setIsListening(false) - setInterimTranscript('') - }, []) - - const toggle = useCallback(async (): Promise<void> => { - if (isListening) { - stop() - } else { - await start() - } - }, [isListening, start, stop]) - - const clear = useCallback((): void => { - setTranscript('') - setInterimTranscript('') - setError(null) - }, []) - - const requestPermissionAction = useCallback(async (): Promise<MicPermissionState> => { - const state = await requestMicPermission() - setPermissionState(state) - return state - }, []) - - // ============================================================================ - // Cleanup on Unmount - // ============================================================================ - - useEffect(() => { - return () => { - // Use abort() for faster cleanup than stop() - recognitionRef.current?.abort() - clearSilenceTimeout() - } - }, [clearSilenceTimeout]) - - // ============================================================================ - // Return - // ============================================================================ - - return { - // State - transcript, - interimTranscript, - isListening, - isSupported, - permissionState, - error, - - // Actions - start, - stop, - toggle, - abort, - clear, - requestPermission: requestPermissionAction, - } -} -``` - -### src/hooks/index.ts - -```typescript -// Public exports -export { useSpeechInput } from './useSpeechInput' - -// Note: useIsSSR is internal-only, not exported to users -// Users should use framework-specific SSR handling (next/dynamic, 'use client', etc.)
-``` - ---- - -## 2.4 Updated Main Export - -### src/index.ts (additions) - -```typescript -// ... existing exports ... - -// Hooks -export { useSpeechInput } from './hooks' -``` - ---- - -## 2.5 Testing Strategy - -### Testing with @testing-library/react - -```bash -# Already installed in Phase 0 -yarn add -D @testing-library/react -``` - -### src/__tests__/useSpeechInput.test.ts - -```typescript -import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest' -import { renderHook, act, waitFor } from '@testing-library/react' -import { useSpeechInput } from '../hooks/useSpeechInput' -import { clearCapabilitiesCache } from '../core/browser' - -describe('useSpeechInput', () => { - beforeEach(() => { - clearCapabilitiesCache() - vi.useFakeTimers() - vi.unstubAllGlobals() - }) - - afterEach(() => { - vi.useRealTimers() - }) - - describe('initialization', () => { - it('returns isSupported: false when Speech API is not available', () => { - const { result } = renderHook(() => useSpeechInput()) - - expect(result.current.isSupported).toBe(false) - expect(result.current.isListening).toBe(false) - expect(result.current.transcript).toBe('') - }) - - it('returns isSupported: true when Speech API is available', () => { - const mockInstance = createMockRecognitionInstance() - vi.stubGlobal('SpeechRecognition', vi.fn(() => mockInstance)) - vi.stubGlobal('navigator', { userAgent: 'Chrome/120' }) - clearCapabilitiesCache() - - const { result } = renderHook(() => useSpeechInput()) - - expect(result.current.isSupported).toBe(true) - }) - }) - - describe('start/stop', () => { - it('sets isListening to true when started', async () => { - const mockInstance = createMockRecognitionInstance() - vi.stubGlobal('SpeechRecognition', vi.fn(() => mockInstance)) - vi.stubGlobal('navigator', { userAgent: 'Chrome/120' }) - clearCapabilitiesCache() - - const { result } = renderHook(() => useSpeechInput()) - - await act(async () => { - await result.current.start() - // Simulate onstart 
callback
-        mockInstance.onstart?.(new Event('start'))
-      })
-
-      expect(result.current.isListening).toBe(true)
-    })
-
-    it('sets isListening to false when stopped', async () => {
-      const mockInstance = createMockRecognitionInstance()
-      vi.stubGlobal('SpeechRecognition', vi.fn(() => mockInstance))
-      vi.stubGlobal('navigator', { userAgent: 'Chrome/120' })
-      clearCapabilitiesCache()
-
-      const { result } = renderHook(() => useSpeechInput())
-
-      await act(async () => {
-        await result.current.start()
-        mockInstance.onstart?.(new Event('start'))
-      })
-
-      act(() => {
-        result.current.stop()
-        mockInstance.onend?.(new Event('end'))
-      })
-
-      expect(result.current.isListening).toBe(false)
-    })
-  })
-
-  describe('transcript handling', () => {
-    it('accumulates final transcripts', async () => {
-      const mockInstance = createMockRecognitionInstance()
-      vi.stubGlobal('SpeechRecognition', vi.fn(() => mockInstance))
-      vi.stubGlobal('navigator', { userAgent: 'Chrome/120' })
-      clearCapabilitiesCache()
-
-      const { result } = renderHook(() => useSpeechInput())
-
-      await act(async () => {
-        await result.current.start()
-        mockInstance.onstart?.(new Event('start'))
-      })
-
-      // Simulate recognition results
-      act(() => {
-        mockInstance.onresult?.(createMockResultEvent('Hello', true))
-      })
-
-      expect(result.current.transcript).toBe('Hello')
-
-      act(() => {
-        mockInstance.onresult?.(createMockResultEvent('World', true))
-      })
-
-      expect(result.current.transcript).toBe('Hello World')
-    })
-
-    it('sets interim transcript for partial results', async () => {
-      const mockInstance = createMockRecognitionInstance()
-      vi.stubGlobal('SpeechRecognition', vi.fn(() => mockInstance))
-      vi.stubGlobal('navigator', { userAgent: 'Chrome/120' })
-      clearCapabilitiesCache()
-
-      const { result } = renderHook(() => useSpeechInput())
-
-      await act(async () => {
-        await result.current.start()
-        mockInstance.onstart?.(new Event('start'))
-      })
-
-      act(() => {
-        mockInstance.onresult?.(createMockResultEvent('Hel', false))
-      })
-
-      expect(result.current.interimTranscript).toBe('Hel')
-      expect(result.current.transcript).toBe('')
-    })
-  })
-
-  describe('clear', () => {
-    it('clears transcript and error', async () => {
-      const mockInstance = createMockRecognitionInstance()
-      vi.stubGlobal('SpeechRecognition', vi.fn(() => mockInstance))
-      vi.stubGlobal('navigator', { userAgent: 'Chrome/120' })
-      clearCapabilitiesCache()
-
-      const { result } = renderHook(() => useSpeechInput())
-
-      await act(async () => {
-        await result.current.start()
-        mockInstance.onstart?.(new Event('start'))
-        mockInstance.onresult?.(createMockResultEvent('Hello', true))
-      })
-
-      expect(result.current.transcript).toBe('Hello')
-
-      act(() => {
-        result.current.clear()
-      })
-
-      expect(result.current.transcript).toBe('')
-      expect(result.current.interimTranscript).toBe('')
-    })
-  })
-
-  describe('toggle', () => {
-    it('toggles listening state', async () => {
-      const mockInstance = createMockRecognitionInstance()
-      vi.stubGlobal('SpeechRecognition', vi.fn(() => mockInstance))
-      vi.stubGlobal('navigator', { userAgent: 'Chrome/120' })
-      clearCapabilitiesCache()
-
-      const { result } = renderHook(() => useSpeechInput())
-
-      // Toggle on
-      await act(async () => {
-        await result.current.toggle()
-        mockInstance.onstart?.(new Event('start'))
-      })
-      expect(result.current.isListening).toBe(true)
-
-      // Toggle off
-      act(() => {
-        result.current.toggle()
-        mockInstance.onend?.(new Event('end'))
-      })
-      expect(result.current.isListening).toBe(false)
-    })
-  })
-
-  describe('silenceTimeout', () => {
-    it('stops recognition after silence timeout', async () => {
-      const mockInstance = createMockRecognitionInstance()
-      vi.stubGlobal('SpeechRecognition', vi.fn(() => mockInstance))
-      vi.stubGlobal('navigator', { userAgent: 'Chrome/120' })
-      clearCapabilitiesCache()
-
-      const { result } = renderHook(() =>
-        useSpeechInput({ silenceTimeout: 3000 })
-      )
-
-      await act(async () => {
-        await result.current.start()
-        mockInstance.onstart?.(new Event('start'))
-      })
-
-      // Fast-forward past silence timeout
-      act(() => {
-        vi.advanceTimersByTime(3000)
-      })
-
-      expect(mockInstance.stop).toHaveBeenCalled()
-    })
-  })
-
-  describe('error handling', () => {
-    it('sets error state on recognition error', async () => {
-      const mockInstance = createMockRecognitionInstance()
-      vi.stubGlobal('SpeechRecognition', vi.fn(() => mockInstance))
-      vi.stubGlobal('navigator', { userAgent: 'Chrome/120' })
-      clearCapabilitiesCache()
-
-      const { result } = renderHook(() => useSpeechInput())
-
-      await act(async () => {
-        await result.current.start()
-        mockInstance.onstart?.(new Event('start'))
-      })
-
-      act(() => {
-        mockInstance.onerror?.({
-          error: 'not-allowed',
-          message: 'Permission denied',
-        } as SpeechRecognitionErrorEvent)
-      })
-
-      expect(result.current.error?.type).toBe('not-allowed')
-      expect(result.current.permissionState).toBe('denied')
-    })
-  })
-
-  describe('callbacks', () => {
-    it('calls onResult callback with transcript', async () => {
-      const mockInstance = createMockRecognitionInstance()
-      vi.stubGlobal('SpeechRecognition', vi.fn(() => mockInstance))
-      vi.stubGlobal('navigator', { userAgent: 'Chrome/120' })
-      clearCapabilitiesCache()
-
-      const onResult = vi.fn()
-      const { result } = renderHook(() => useSpeechInput({ onResult }))
-
-      await act(async () => {
-        await result.current.start()
-        mockInstance.onstart?.(new Event('start'))
-        mockInstance.onresult?.(createMockResultEvent('Hello', true))
-      })
-
-      expect(onResult).toHaveBeenCalledWith('Hello', true)
-    })
-
-    it('calls onStart and onEnd callbacks', async () => {
-      const mockInstance = createMockRecognitionInstance()
-      vi.stubGlobal('SpeechRecognition', vi.fn(() => mockInstance))
-      vi.stubGlobal('navigator', { userAgent: 'Chrome/120' })
-      clearCapabilitiesCache()
-
-      const onStart = vi.fn()
-      const onEnd = vi.fn()
-      const { result } = renderHook(() =>
-        useSpeechInput({ onStart, onEnd })
-      )
-
-      await act(async () => {
-        await result.current.start()
-        mockInstance.onstart?.(new Event('start'))
-      })
-
-      expect(onStart).toHaveBeenCalled()
-
-      act(() => {
-        result.current.stop()
-        mockInstance.onend?.(new Event('end'))
-      })
-
-      expect(onEnd).toHaveBeenCalled()
-    })
-  })
-})
-
-// ============================================================================
-// Test Helpers
-// ============================================================================
-
-function createMockRecognitionInstance() {
-  return {
-    lang: '',
-    continuous: false,
-    interimResults: true,
-    maxAlternatives: 1,
-    start: vi.fn(),
-    stop: vi.fn(),
-    abort: vi.fn(),
-    onstart: null as ((e: Event) => void) | null,
-    onend: null as ((e: Event) => void) | null,
-    onerror: null as ((e: SpeechRecognitionErrorEvent) => void) | null,
-    onresult: null as ((e: SpeechRecognitionEvent) => void) | null,
-    onspeechstart: null as ((e: Event) => void) | null,
-    onspeechend: null as ((e: Event) => void) | null,
-    onnomatch: null as ((e: Event) => void) | null,
-  }
-}
-
-function createMockResultEvent(
-  transcript: string,
-  isFinal: boolean
-): SpeechRecognitionEvent {
-  return {
-    resultIndex: 0,
-    results: {
-      length: 1,
-      item: () => ({
-        length: 1,
-        isFinal,
-        item: () => ({ transcript, confidence: 0.9 }),
-        0: { transcript, confidence: 0.9 },
-      }),
-      0: {
-        length: 1,
-        isFinal,
-        item: () => ({ transcript, confidence: 0.9 }),
-        0: { transcript, confidence: 0.9 },
-      },
-    },
-  } as unknown as SpeechRecognitionEvent
-}
-```
-
----
-
-## 2.6 Phase 2 Deliverables Checklist
-
-| Deliverable | Status | File(s) |
-|-------------|--------|---------|
-| useSpeechInput hook | ⬜ | `src/hooks/useSpeechInput.ts` |
-| useIsSSR utility (internal) | ⬜ | `src/hooks/useIsSSR.ts` |
-| Hook exports | ⬜ | `src/hooks/index.ts` |
-| Updated main exports | ⬜ | `src/index.ts` |
-| Hook tests | ⬜ | `src/__tests__/useSpeechInput.test.ts` |
-
----
-
-## 2.7 Key Implementation Details
-
-### React 18 Strict Mode Guard
-
-React 18 Strict Mode double-mounts components in development, which can trigger a second `recognition.start()` call while a session is already active. We use a ref guard:
-
-```typescript
-const isStartingRef = useRef(false)
-
-const start = useCallback(async () => {
-  if (isStartingRef.current || isListening) return
-  isStartingRef.current = true
-
-  try {
-    // ... start logic
-  } finally {
-    isStartingRef.current = false
-  }
-}, [isListening])
-```
-
-### Cleanup with abort()
-
-On unmount, we use `abort()` instead of `stop()` because it's faster and doesn't wait for final results:
-
-```typescript
-useEffect(() => {
-  return () => {
-    recognitionRef.current?.abort()
-    clearSilenceTimeout()
-  }
-}, [clearSilenceTimeout])
-```
-
-### Transcript Accumulation
-
-Final transcripts are accumulated with space separation:
-
-```typescript
-setTranscript((prev) => (prev ? prev + ' ' + text : text))
-```
-
----
-
-## Verification Plan
-
-### Automated Tests
-
-```bash
-# Run all tests
-yarn test run
-
-# Run hook tests only
-yarn test run src/__tests__/useSpeechInput.test.ts
-
-# Run with coverage
-yarn test:coverage
-```
-
-### Type Check
-
-```bash
-yarn typecheck
-```
-
-### Build Verification
-
-```bash
-yarn build
-```
-
----
-
-## Summary
-
-Phase 2 delivers the production-ready `useSpeechInput` hook with:
-
-1. **Full State Management** — transcript, interimTranscript, isListening, error, permissionState
-2. **All Actions** — start, stop, toggle, abort, clear, requestPermission
-3. **Silence Timeout** — Auto-stop after configurable silence period
-4. **Auto-Restart** — Optional restart on network errors
-5. **SSR Safety** — useIsSSR utility for server rendering
-6. **React 18 Ready** — Strict Mode compatible with guards
-7. **Comprehensive Tests** — All states and transitions tested
-
-Phase 3 will add cursor-aware text insertion functionality.
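The accumulation rule from §2.7 is easiest to verify in isolation when factored into a pure helper. A minimal sketch (note: `appendTranscript` is a hypothetical name used for illustration, not part of the package's public API):

```typescript
// Hypothetical pure helper mirroring the transcript accumulation rule
// from section 2.7: join final results with a single space, but avoid
// a leading space while the accumulator is still empty.
function appendTranscript(prev: string, text: string): string {
  return prev ? prev + ' ' + text : text
}

// Inside the hook this would back the state updater:
//   setTranscript((prev) => appendTranscript(prev, text))
console.log(appendTranscript('', 'Hello'))      // → 'Hello'
console.log(appendTranscript('Hello', 'World')) // → 'Hello World'
```

Keeping the updater pure like this also means the accumulation behavior can be unit-tested directly, without mocking a `SpeechRecognition` instance at all.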
diff --git a/plans/phase3.md b/plans/phase3.md
deleted file mode 100644
index 4ed4172..0000000
--- a/plans/phase3.md
+++ /dev/null
@@ -1,863 +0,0 @@
-# Phase 3: Cursor Insertion Utility
-
-> **Goal:** Enable inserting transcribed text at the cursor position in text inputs/textareas.
-
-**Estimated Time:** 1-2 days
-
----
-
-## Overview
-
-Phase 3 adds cursor-aware text insertion capabilities:
-- Cursor position utilities for `<input>` and `