Loom Transcript Scraper

A tool for automatically extracting transcripts from Loom videos using Selenium web automation.

Overview

This project provides a Python-based solution for scraping transcripts from Loom video URLs. It uses Selenium WebDriver to automate browser interactions, navigate to Loom video pages, and extract transcript content.

Features

Automated browser navigation to Loom video pages
Transcript extraction from Loom's internal data structure
Error handling backed by debug screenshots and page-source dumps
Support for processing multiple videos in batch
Transcript post-processing for ingestion into LLMs

Prerequisites

Python 3.x
Chrome or Firefox web browser
Selenium WebDriver
Required Python packages (see Installation)

Installation

Clone this repository:

git clone https://github.com/workingpleasewait/loom-transcript-scraper.git
cd loom-transcript-scraper

Install required packages:

pip3 install selenium webdriver_manager

Ensure a driver matching your browser is available (ChromeDriver for Chrome, geckodriver for Firefox).
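
One way to satisfy this step is sketched below, assuming Chrome and Selenium 4. The webdriver_manager package installed above can download a matching ChromeDriver. The function name make_driver and the headless default are illustrative and are not taken from process.py.

```python
def make_driver(headless: bool = True):
    """Build a Chrome WebDriver (hypothetical helper, not from process.py)."""
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = webdriver.ChromeOptions()
    if headless:
        # Run Chrome without a visible window.
        options.add_argument("--headless=new")

    # install() downloads a driver if needed and returns the path to it.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)
```

Call driver.quit() when finished so the browser process does not linger.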

Usage

Add Loom video URLs to the loom-videos.txt file, one URL per line.
Run the processor script:
```
python3 process.py
```
Extracted transcripts will be saved in the data/ directory.
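
The input format described above (one URL per line) could be read as in the following sketch. The function name read_video_urls is hypothetical, and skipping blank lines and lines starting with # is an assumption, not documented behavior of process.py.

```python
from pathlib import Path


def read_video_urls(path="loom-videos.txt"):
    """Return the URLs listed in the input file, one per line."""
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls
```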

Project Structure

process.py - Main script for processing Loom videos and extracting transcripts
debug.py - Helper script with debug functionality
loom-videos.txt - Input file containing Loom video URLs to process
data/ - Directory where extracted transcripts are stored
debug_screenshots/ - Directory for browser screenshots (for debugging)
debug_output/ - Directory for HTML page sources and button information (for debugging)
integrated_solution.py - Integration script for LLM processing
process_llm_integration.py - Handles LLM transcript processing integration
process_transcripts_for_llm.py - Processes transcripts for LLM ingestion
README_LLM_INTEGRATION.md - Detailed guide for LLM integration
IMPLEMENTATION_GUIDE.md - Implementation guide for LLM processing

Debug Directories

This project includes two special directories for debugging purposes:

debug_screenshots/

This directory stores browser screenshots taken at various stages of the scraping process. These screenshots are valuable for troubleshooting when the script encounters issues with page rendering, element visibility, or automation steps.

The screenshots are named according to the processing stage and timestamp, allowing developers to trace the execution path visually.
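
A naming scheme of this kind might look like the sketch below. The exact format used by the project is not documented here, so the stage_timestamp.png pattern and the helper name screenshot_path are assumptions.

```python
from datetime import datetime
from pathlib import Path


def screenshot_path(stage, when=None, directory="debug_screenshots"):
    """Build a screenshot path from a stage label and a timestamp."""
    when = when or datetime.now()
    stamp = when.strftime("%Y%m%d_%H%M%S")
    # Replace anything that is not a letter or digit so the name is filesystem-safe.
    safe_stage = "".join(c if c.isalnum() else "_" for c in stage.lower())
    return Path(directory) / f"{safe_stage}_{stamp}.png"
```

With a Selenium driver, the result can be passed to driver.save_screenshot(str(path)).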

debug_output/

This directory contains:

HTML page sources for problematic pages
JSON files with detailed information about buttons and interactive elements
Shadow DOM information and other page structural details

These files provide context and data needed to debug complex web scraping issues, particularly when the target elements are located within shadow DOM or are dynamically loaded.
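
As a rough illustration of how such files could be produced, the sketch below writes a page source and a list of button descriptions side by side. The helper name dump_debug_output, the file naming, and the shape of the button records are all hypothetical.

```python
import json
from pathlib import Path


def dump_debug_output(name, page_source, buttons, directory="debug_output"):
    """Write the HTML source and button details for one page."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)

    html_path = out / f"{name}.html"
    json_path = out / f"{name}_buttons.json"

    html_path.write_text(page_source, encoding="utf-8")
    # buttons is assumed to be a list of JSON-serializable dicts.
    json_path.write_text(json.dumps(buttons, indent=2), encoding="utf-8")
    return html_path, json_path
```

With Selenium, page_source would typically come from driver.page_source.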

LLM Transcript Processing

This project includes functionality to process Loom video transcripts for ingestion into Large Language Models (LLMs). The processing:

Formats transcripts in a standardized way for optimal LLM consumption
Cleans and structures text for better AI processing
Supports both integrated and standalone usage approaches
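
The cleaning step might resemble the following sketch. It assumes raw transcripts carry a leading timestamp such as 0:01 or [1:02:03] on each line, which is not confirmed here; the actual rules live in process_transcripts_for_llm.py, and clean_transcript is a hypothetical name.

```python
import re

# Assumed timestamp format at the start of a line: 0:01, 12:34, or [1:02:03].
_TIMESTAMP = re.compile(r"^\s*\[?\d{1,2}:\d{2}(?::\d{2})?\]?\s*")


def clean_transcript(raw):
    """Strip timestamps, collapse whitespace, and join lines into one text."""
    lines = []
    for line in raw.splitlines():
        line = _TIMESTAMP.sub("", line)
        line = re.sub(r"\s+", " ", line).strip()
        if line:
            lines.append(line)
    return " ".join(lines)
```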

For detailed information about using the LLM transcript processing features, refer to:

README_LLM_INTEGRATION.md - Contains integration guidelines
IMPLEMENTATION_GUIDE.md - Provides detailed implementation steps

LLM Transcript Directory Organization

Transcripts prepared for LLM ingestion are organized as follows:

Raw transcripts are stored in the configured download directory
LLM-ready transcripts are stored in the llm_ready_transcripts/ directory
Each LLM-ready file is named with the video title and ID, with an '_llm' suffix
The directory structure preserves the relationship between original videos and their processed transcripts
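
The naming rule above can be sketched as follows. The helper name llm_ready_path, the underscore slug format, and the .txt extension are assumptions made for illustration.

```python
import re
from pathlib import Path


def llm_ready_path(title, video_id, directory="llm_ready_transcripts", ext=".txt"):
    """Build the output path: <title>_<id>_llm<ext> inside the LLM-ready directory."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")
    return Path(directory) / f"{slug}_{video_id}_llm{ext}"
```

Keeping the video ID in the name lets a processed file be matched back to its source video even when two videos share a title.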

Troubleshooting

If the script fails to extract a transcript:

Check the debug_screenshots directory for visual references of the browser state
Examine the debug_output directory for page structure information
Ensure the Loom video has a transcript (not all videos do)
Verify your browser driver is up to date

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is open source and available under the MIT License.
