ClipIt is an AI-powered OpenClaw Skill built on ElevenLabs APIs that lets you find, cut, and process audio or video segments using natural language. Instead of manually searching through timelines, you describe the part you want to extract.
- Semantic Clipping: Find segments using natural language queries (e.g., "Find the part where they talk about the budget").
- Automatic Transcription: Powered by ElevenLabs Scribe with word-level precision.
- AI Intelligence: Uses OpenAI GPT-4o-mini to identify start/end points and keep sentences intact.
- Audio Isolation: Remove background noise and clean up vocals using ElevenLabs Audio Isolation.
- Instant Dubbing: Translate your clips into 29+ languages while maintaining the original voice timing.
- YouTube Support: Process videos directly from YouTube URLs.
- Python 3.8+
- FFmpeg (installed on your system and in PATH)
- ElevenLabs API Key
- OpenAI API Key
- Clone the repository:

      git clone <repository-url>
      cd clip

- Install dependencies:

      pip install -r requirements.txt

- Set up environment variables:

      export ELEVENLABS_API_KEY="your_key_here"
      export OPENAI_API_KEY="your_key_here"
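
Before running ClipIt for the first time, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, not part of ClipIt itself: the function name `missing_prerequisites` and its parameters are hypothetical, and it only checks the three things this README asks for (Python 3.8+, `ffmpeg` on PATH, and the two API key variables).

```python
import os
import shutil
import sys

# Environment variables this README asks you to export.
REQUIRED_ENV_VARS = ("ELEVENLABS_API_KEY", "OPENAI_API_KEY")


def missing_prerequisites(env=None, which=shutil.which, version=None):
    """Return a list of human-readable problems; an empty list means ready.

    `env`, `which`, and `version` are injectable so the check can be
    exercised without touching the real environment (hypothetical helper).
    """
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else version

    problems = []
    if tuple(version) < (3, 8):
        problems.append("Python 3.8+ is required")
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found in PATH")
    for name in REQUIRED_ENV_VARS:
        # Treat an empty string the same as an unset variable.
        if not env.get(name):
            problems.append(f"{name} is not set")
    return problems
```

Calling `missing_prerequisites()` with no arguments inspects the current interpreter, PATH, and environment; printing the returned list gives a quick checklist of what is still missing.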
ClipIt comes with a convenient wrapper script called `clipper`.
    ./bin/clipper --input "meeting.mp4" --query "discussion about the deadline"
    ./bin/clipper --input "https://youtu.be/..." --query "the intro joke" --isolate
    ./bin/clipper --input "tutorial.mp3" --query "how to install" --dub "hi"

| Flag | Description |
|---|---|
| `--input`, `-i` | (Required) Path to local file or YouTube URL |
| `--query`, `-q` | (Required) Natural language description of the segment |
| `--output`, `-o` | Custom output filename |
| `--context`, `-c` | Padding in seconds to add to the start/end (e.g., 0.5) |
| `--isolate` | Flag to remove background noise |
| `--dub` | Language code for translation (e.g., es, fr, ja) |
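
If you want to drive `clipper` from a Python script, for example to process several files in a loop, one option is to assemble the argument list and hand it to `subprocess`. This is a sketch under assumptions: the helper names `build_clipper_command` and `run_clipper` are hypothetical, the script is assumed to live at `./bin/clipper`, and only the flags listed above are used.

```python
import subprocess


def build_clipper_command(input_path, query, output=None, context=None,
                          isolate=False, dub=None, executable="./bin/clipper"):
    """Build the argv list for the clipper wrapper (hypothetical helper)."""
    cmd = [executable, "--input", input_path, "--query", query]
    if output is not None:
        cmd += ["--output", output]
    if context is not None:
        # Padding in seconds, passed through as text.
        cmd += ["--context", str(context)]
    if isolate:
        cmd.append("--isolate")
    if dub is not None:
        cmd += ["--dub", dub]
    return cmd


def run_clipper(*args, **kwargs):
    """Run clipper and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_clipper_command(*args, **kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with queries that contain spaces or quotes.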
- Transcribe: Extracts audio and gets word-level timestamps from ElevenLabs.
- Analyze: OpenAI identifies the best logical segment based on your query.
- Refine: Backtracks to sentence boundaries to ensure no "half-words" are caught.
- Cut: Uses FFmpeg to precisely trim the media.
- Post-Process: Optionally isolates or dubs the resulting clip.
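
The Refine and Cut steps above can be illustrated with a short sketch. It is not ClipIt's actual implementation: the word format (dicts with `text`, `start`, and `end` in seconds), the punctuation-based sentence rule, and the function names are all assumptions made for illustration. The FFmpeg command re-encodes rather than stream-copies, which is one common way to get a trim that is not limited to keyframes.

```python
# Assumed sentence terminators; real transcripts may need a richer rule
# (closing quotes, abbreviations, and so on).
SENTENCE_END = (".", "!", "?")


def snap_to_sentences(words, start_idx, end_idx):
    """Widen [start_idx, end_idx] to the enclosing sentence boundaries."""
    # Walk back until the previous word closes a sentence (or we hit index 0).
    while start_idx > 0 and not words[start_idx - 1]["text"].endswith(SENTENCE_END):
        start_idx -= 1
    # Walk forward until the current word closes a sentence (or we hit the end).
    while end_idx < len(words) - 1 and not words[end_idx]["text"].endswith(SENTENCE_END):
        end_idx += 1
    return start_idx, end_idx


def clip_bounds(words, start_idx, end_idx, context=0.0):
    """Return (start, end) in seconds, padded by `context` on both sides."""
    s, e = snap_to_sentences(words, start_idx, end_idx)
    start = max(0.0, words[s]["start"] - context)  # never seek before 0
    end = words[e]["end"] + context
    return start, end


def ffmpeg_cut_command(src, dst, start, end):
    """Build an ffmpeg argv that trims src to [start, end] and re-encodes."""
    duration = end - start
    return ["ffmpeg", "-y", "-i", src,
            "-ss", f"{start:.3f}", "-t", f"{duration:.3f}", dst]
```

For example, if the model picks the words "budget is" from "Hello everyone. The budget is tight. Next topic.", `snap_to_sentences` widens the selection to "The budget is tight." before the padding from `--context` is applied.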
Built with ❤️ using ElevenLabs and OpenAI.