Green Needle is a professional-grade audio transcription system designed for long-form content creators, researchers, and anyone who needs to convert hours of spoken audio into text. Built on OpenAI's Whisper model, it provides accurate, local transcription without sending your data to external servers.
- Long-form Recording Support: Record and transcribe hours of continuous audio
- 100% Local Processing: Your audio never leaves your machine
- Multiple Output Formats: Plain text, JSON, SRT subtitles, and more
- Multi-language Support: Transcribe in 99+ languages
- Batch Processing: Process multiple files efficiently
- Flexible Configuration: Customize model size, output format, and processing options
- Progress Tracking: Real-time transcription progress with time estimates
- Docker Support: Easy deployment with containerization
Quick start:

```bash
# Clone the repository
git clone https://github.com/yourusername/green-needle.git
cd green-needle

# Install using pip
pip install -e .

# Or using the installation script
./scripts/install.sh
```

```bash
# Transcribe a single audio file
green-needle transcribe audio.mp3
# Record and transcribe
green-needle record --duration 3600 --output transcript.txt
# Batch process multiple files
green-needle batch /path/to/audio/files --output-dir /path/to/transcripts
```

Requirements:

- Python 3.8 or higher
- FFmpeg (for audio processing)
- 4GB+ RAM (8GB+ recommended for larger models)
- CUDA-capable GPU (optional, for faster processing)
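A quick way to verify these prerequisites from a terminal:

```bash
python3 --version            # should report 3.8 or newer
ffmpeg -version | head -n 1  # confirms FFmpeg is on your PATH
nvidia-smi                   # optional: only relevant for GPU acceleration
```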
Install from PyPI:

```bash
pip install green-needle
```

Or install from source:

```bash
git clone https://github.com/yourusername/green-needle.git
cd green-needle
pip install -e .
```

Or use Docker:

```bash
docker pull greenneedle/transcriber:latest
docker run -v /path/to/audio:/audio greenneedle/transcriber transcribe /audio/file.mp3
```
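To get the finished transcript back on the host, you can also mount an output directory; this is a sketch that assumes the image's entrypoint is the `green-needle` CLI and that the `--output` option documented below works inside the container:

```bash
docker run \
  -v /path/to/audio:/audio \
  -v /path/to/transcripts:/output \
  greenneedle/transcriber transcribe /audio/file.mp3 --output /output/file.txt
```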
FFmpeg is required for audio processing. On macOS:

```bash
brew install ffmpeg
```

On Ubuntu/Debian:
```bash
sudo apt update && sudo apt install ffmpeg
```

On Windows: download FFmpeg from the official site.
CLI usage:

```
green-needle [command] [options]

Commands:
  transcribe    Transcribe audio file(s)
  record        Record audio and transcribe
  batch         Process multiple files
  config        Manage configuration
  models        List and download Whisper models

Options:
  --model       Whisper model size (tiny, base, small, medium, large)
  --language    Language code (auto-detect if not specified)
  --output      Output file path
  --format      Output format (txt, json, srt, vtt, all)
  --verbose     Enable detailed logging
```
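For example, combining several of the options above (file names are placeholders):

```bash
green-needle transcribe lecture.mp3 --model small --language en --format srt --output lecture.srt
```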
Create a `config.yaml` file:

```yaml
whisper:
  model: base
  language: auto
  device: auto  # cuda, cpu, or auto

output:
  format: txt
  timestamps: false
  save_segments: true

audio:
  sample_rate: 16000
  channels: 1

processing:
  batch_size: 10
  num_workers: 4
```

Using the Python API:

```python
from green_needle import Transcriber

# Initialize transcriber
transcriber = Transcriber(model="base", device="auto")
# Transcribe audio file
result = transcriber.transcribe("audio.mp3")
print(result.text)
# Save in multiple formats
result.save("output.txt", format="txt")
result.save("output.json", format="json")
result.save("output.srt", format="srt")
# Batch processing
results = transcriber.batch_transcribe([
"audio1.mp3",
"audio2.wav",
"audio3.m4a"
])
```

Recording and transcribing with a progress callback:

```python
from green_needle import AudioRecorder, Transcriber
# Record for 2 hours
recorder = AudioRecorder()
audio_file = recorder.record(duration=7200, output="session.wav")
# Transcribe with progress callback
transcriber = Transcriber(model="base")
result = transcriber.transcribe(
    audio_file,
    progress_callback=lambda p: print(f"Progress: {p:.1f}%")
)
```

Building a custom processing pipeline:

```python
from green_needle import Pipeline, processors
# Create custom pipeline
pipeline = Pipeline([
    processors.NoiseReduction(),
    processors.VoiceActivityDetection(),
    processors.WhisperTranscription(model="base"),
    processors.TextPostProcessing(),
    processors.Summarization()  # Optional: summarize long transcripts
])
result = pipeline.process("long_audio.mp3")
```

Running the tests:

```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=green_needle
# Run specific test module
pytest tests/test_transcriber.py
```

| Model | Parameters | Required VRAM | Relative Speed | WER |
|---|---|---|---|---|
| tiny | 39 M | ~1 GB | ~32x | 17.4% |
| base | 74 M | ~1 GB | ~16x | 12.6% |
| small | 244 M | ~2 GB | ~6x | 9.5% |
| medium | 769 M | ~5 GB | ~2x | 7.4% |
| large | 1550 M | ~10 GB | 1x | 6.2% |
Recommendations:
- tiny/base: Good for quick drafts or when accuracy isn't critical
- small: Best balance of speed and accuracy for most use cases
- medium/large: When maximum accuracy is required
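For example, to pick the small model explicitly on the command line (the file name is a placeholder):

```bash
green-needle transcribe interview.mp3 --model small
```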
We welcome contributions! Please see our Contributing Guide for details.
```bash
# Fork the repository
# Create your feature branch
git checkout -b feature/amazing-feature
# Commit your changes
git commit -m 'Add amazing feature'
# Push to the branch
git push origin feature/amazing-feature
# Open a Pull Request
```

This project is licensed under the MIT License - see the LICENSE file for details.
