Green Needle - Audio Transcription System

High-quality local audio transcription using OpenAI Whisper

🎯 Overview

Green Needle is an audio transcription system for long-form content creators, researchers, and anyone else who needs to convert hours of spoken audio into text. It is built on OpenAI's Whisper model and runs transcription locally, so your audio is never sent to external servers.

Key Features

🎙️ Long-form Recording Support: Record and transcribe hours of continuous audio
🔒 100% Local Processing: Your audio never leaves your machine
📝 Multiple Output Formats: Plain text, JSON, SRT subtitles, and more
🌍 Multi-language Support: Transcribe in 99+ languages
⚡ Batch Processing: Process multiple files efficiently
🔧 Flexible Configuration: Customize model size, output format, and processing options
📊 Progress Tracking: Real-time transcription progress with time estimates
🐳 Docker Support: Easy deployment with containerization

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/yourusername/green-needle.git
cd green-needle

# Install using pip
pip install -e .

# Or using the installation script
./scripts/install.sh

Basic Usage

# Transcribe a single audio file
green-needle transcribe audio.mp3

# Record and transcribe
green-needle record --duration 3600 --output transcript.txt

# Batch process multiple files
green-needle batch /path/to/audio/files --output-dir /path/to/transcripts
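
The batch command maps a directory of audio files to a directory of transcripts. The sketch below illustrates that mapping in plain Python. The helper name `plan_batch` and the extension list are assumptions made for illustration; they are not taken from Green Needle's source, and the real tool may discover files differently.

```python
from pathlib import Path

# Assumed set of audio extensions; the real tool may accept more or fewer.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def plan_batch(input_dir, output_dir, suffix=".txt"):
    """Return sorted (audio_path, transcript_path) pairs for one directory."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    pairs = []
    for path in sorted(input_dir.iterdir()):
        # Skip subdirectories and anything that is not a known audio type.
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            pairs.append((path, output_dir / (path.stem + suffix)))
    return pairs
```

Planning the input/output pairs before transcribing makes it easy to preview a long run or skip transcripts that already exist.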

📋 Requirements

Python 3.8 or higher
FFmpeg (for audio processing)
4GB+ RAM (8GB+ recommended for larger models)
CUDA-capable GPU (optional, for faster processing)
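
The first two requirements can be checked before installing. The following is a minimal preflight sketch using only the standard library; `check_requirements` is a hypothetical helper, not part of Green Needle, and it does not check RAM or GPU availability.

```python
import shutil
import sys


def check_requirements(min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or higher is required")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(f"Missing requirement: {problem}")
```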

🛠️ Installation

Method 1: Using pip (Recommended)

pip install green-needle

Method 2: From source

git clone https://github.com/yourusername/green-needle.git
cd green-needle
pip install -e .

Method 3: Using Docker

docker pull greenneedle/transcriber:latest
docker run -v /path/to/audio:/audio greenneedle/transcriber transcribe /audio/file.mp3

Installing FFmpeg

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt update && sudo apt install ffmpeg

Windows: Download a build from the official FFmpeg site (https://ffmpeg.org/download.html) and add its bin directory to your PATH.

📖 Documentation

Command Line Interface

green-needle [command] [options]

Commands:
  transcribe    Transcribe audio file(s)
  record        Record audio and transcribe
  batch         Process multiple files
  config        Manage configuration
  models        List and download Whisper models

Options:
  --model       Whisper model size (tiny, base, small, medium, large)
  --language    Language code (auto-detect if not specified)
  --output      Output file path
  --format      Output format (txt, json, srt, vtt, all)
  --verbose     Enable detailed logging
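
When calling the CLI from a script, it can help to build the argument list in one place and validate the option values listed above first. This is a sketch only: `build_transcribe_command` and `run_transcribe` are hypothetical helpers, and placing the options after the file argument is an assumption based on the `green-needle [command] [options]` synopsis.

```python
import subprocess

# Allowed values copied from the option descriptions above.
MODELS = ("tiny", "base", "small", "medium", "large")
FORMATS = ("txt", "json", "srt", "vtt", "all")


def build_transcribe_command(audio_path, model="base", language=None,
                             output=None, fmt="txt", verbose=False):
    """Return the argument list for a `green-needle transcribe` call."""
    if model not in MODELS:
        raise ValueError(f"Unknown model: {model}")
    if fmt not in FORMATS:
        raise ValueError(f"Unknown format: {fmt}")
    cmd = ["green-needle", "transcribe", str(audio_path),
           "--model", model, "--format", fmt]
    if language:  # omitted means auto-detect
        cmd += ["--language", language]
    if output:
        cmd += ["--output", str(output)]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_transcribe(audio_path, **options):
    """Run the command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(build_transcribe_command(audio_path, **options), check=True)
```

Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.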

Configuration

Create a config.yaml file:

whisper:
  model: base
  language: auto
  device: auto  # cuda, cpu, or auto
  
output:
  format: txt
  timestamps: false
  save_segments: true
  
audio:
  sample_rate: 16000
  channels: 1
  
processing:
  batch_size: 10
  num_workers: 4
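
A partial config.yaml is easier to handle if it is merged over a full set of defaults. The sketch below shows one way to do that once the YAML has been parsed into a dictionary. Treating the sample values above as the defaults is an assumption, and `merge_config` is a hypothetical helper rather than Green Needle's own loader.

```python
import copy

# Assumed defaults, taken from the sample config.yaml above.
DEFAULTS = {
    "whisper": {"model": "base", "language": "auto", "device": "auto"},
    "output": {"format": "txt", "timestamps": False, "save_segments": True},
    "audio": {"sample_rate": 16000, "channels": 1},
    "processing": {"batch_size": 10, "num_workers": 4},
}


def merge_config(overrides, defaults=DEFAULTS):
    """Return defaults updated with overrides, rejecting unknown keys."""
    merged = copy.deepcopy(defaults)  # never mutate the shared defaults
    for section, values in overrides.items():
        if section not in merged:
            raise KeyError(f"Unknown config section: {section}")
        unknown = set(values) - set(merged[section])
        if unknown:
            raise KeyError(f"Unknown keys in {section}: {sorted(unknown)}")
        merged[section].update(values)
    return merged
```

Rejecting unknown keys catches typos such as `modle: small` early instead of silently falling back to the default.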

Python API

from green_needle import Transcriber

# Initialize transcriber
transcriber = Transcriber(model="base", device="auto")

# Transcribe audio file
result = transcriber.transcribe("audio.mp3")
print(result.text)

# Save in multiple formats
result.save("output.txt", format="txt")
result.save("output.json", format="json")
result.save("output.srt", format="srt")

# Batch processing
results = transcriber.batch_transcribe([
    "audio1.mp3",
    "audio2.wav",
    "audio3.m4a"
])
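
To show what the SRT output format contains, here is a small standalone sketch that renders timed segments as SRT text. It does not use Green Needle's API; the `(start, end, text)` tuple layout and both function names are assumptions for illustration.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render (start, end, text) tuples as a numbered SRT document."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

SRT uses a comma before the milliseconds, while WebVTT (the vtt format) uses a period.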

🔧 Advanced Usage

Recording Long Audio Sessions

from green_needle import AudioRecorder, Transcriber

# Record for 2 hours
recorder = AudioRecorder()
audio_file = recorder.record(duration=7200, output="session.wav")

# Transcribe with progress callback
transcriber = Transcriber(model="base")
result = transcriber.transcribe(
    audio_file,
    progress_callback=lambda p: print(f"Progress: {p:.1f}%")
)
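
For multi-hour recordings, a callback that also estimates the remaining time is more useful than a bare percentage. The sketch below assumes, as the lambda above suggests, that the callback receives a percentage between 0 and 100. It extrapolates linearly from elapsed time, which is only a rough estimate, and `make_progress_printer` is a hypothetical helper.

```python
import time


def make_progress_printer(clock=time.monotonic, write=print):
    """Build a callback that takes a percentage (0-100) and reports an ETA."""
    started = clock()

    def on_progress(percent):
        elapsed = clock() - started
        if percent <= 0:
            write("Progress: 0.0% (estimating...)")
            return
        # Linear extrapolation: time per percent so far times percent left.
        remaining = elapsed * (100.0 - percent) / percent
        write(f"Progress: {percent:.1f}% (about {remaining:.0f}s remaining)")

    return on_progress
```

The result can be passed wherever the lambda is used, for example `progress_callback=make_progress_printer()`.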

Custom Processing Pipeline

from green_needle import Pipeline, processors

# Create custom pipeline
pipeline = Pipeline([
    processors.NoiseReduction(),
    processors.VoiceActivityDetection(),
    processors.WhisperTranscription(model="base"),
    processors.TextPostProcessing(),
    processors.Summarization()  # Optional: summarize long transcripts
])

result = pipeline.process("long_audio.mp3")
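
The pipeline idea is that each processor takes the previous processor's output as its input. The following standalone sketch shows that pattern with plain functions. `SimplePipeline` and the two text steps are illustrative stand-ins, not Green Needle's `Pipeline` or `processors` classes, whose internals are not described here.

```python
class SimplePipeline:
    """Run a list of callables in order, feeding each the previous result."""

    def __init__(self, steps):
        self.steps = list(steps)

    def process(self, data):
        for step in self.steps:
            data = step(data)
        return data


def collapse_whitespace(text):
    """Replace runs of whitespace with single spaces and trim the ends."""
    return " ".join(text.split())


def capitalize_first(text):
    """Upper-case the first character, leaving the rest untouched."""
    return text[:1].upper() + text[1:]


cleanup = SimplePipeline([collapse_whitespace, capitalize_first])
```

Because every step has the same one-argument shape, steps can be reordered or removed without changing the others.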

🧪 Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=green_needle

# Run specific test module
pytest tests/test_transcriber.py

🔍 Model Selection Guide

Model    Parameters   Required VRAM   Relative Speed   Word Error Rate (WER)
tiny     39 M         ~1 GB           ~32x             17.4%
base     74 M         ~1 GB           ~16x             12.6%
small    244 M        ~2 GB           ~6x              9.5%
medium   769 M        ~5 GB           ~2x              7.4%
large    1550 M       ~10 GB          1x               6.2%

Recommendations:

tiny/base: Good for quick drafts or when accuracy isn't critical
small: Best balance of speed and accuracy for most use cases
medium/large: When maximum accuracy is required
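
If GPU memory is the limiting factor, the VRAM column can be turned into a simple lookup. This sketch picks the largest model that fits a given amount of VRAM, using the approximate figures from the table above. `largest_model_that_fits` is a hypothetical helper; real memory use also depends on audio length and runtime settings.

```python
# (model name, approximate required VRAM in GB), ordered smallest to largest.
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_that_fits(available_vram_gb):
    """Return the largest model whose VRAM estimate fits, or None."""
    choice = None
    for name, required in MODEL_VRAM_GB:
        if required <= available_vram_gb:
            choice = name  # later entries are larger, so keep overwriting
    return choice
```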

🤝 Contributing

We welcome contributions! Please see the Contributing Guide (CONTRIBUTING.md) for details.

# Fork the repository
# Create your feature branch
git checkout -b feature/amazing-feature

# Commit your changes
git commit -m 'Add amazing feature'

# Push to the branch
git push origin feature/amazing-feature

# Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Repository: KaiStephens/green-needle
