
Conversation

@silas-pusateri

In this PR I hope to update this repository and complete the project roadmap by adding support for additional models (incredibly-fast-whisper, locally hosted Whisper) as well as additional export options (Markdown, PDF, DOCX).

This included several pieces of work:

  • Updating requirements.txt to include missing backend dependencies, as well as dependencies supporting these additions
  • Adding additional structure to the prompt to ensure JSON response from Anthropic
  • Moving previously implemented Replicate/whisperX transcription and Anthropic summaries into separate modules
  • Adding support for incredibly-fast-whisper transcription (audio only)
  • Adding OpenAI support for summarization and clip generation
  • Adding support for local Whisper implementation (currently untested)

I've focused on a few core things:

  • Modifying existing patterns and implementations as little as possible
  • Adding as few dependencies as necessary
  • Promoting future extension via code re-use (BaseModels, Factories)

There are several things still to be done before I would consider this ready to merge:

  • Adding video support for incredibly-fast-whisper by stripping audio before upload
  • Adding additional export options
  • Testing local Whisper
  • More general testing across both the CLI and webUI

Since several things still need to be done, I've opened this PR as a draft in the hope of getting some feedback as I continue to work this week. As this is my first public PR, I'm open to suggestions about the direction of the work and how it fits into the vision of the project as a whole. Thanks!

- Update Anthropic API request format to ensure JSON response parsing
- Improve error handling for AI response processing
- Add dynamic ffmpeg path detection for media clip generation
- Refactor content generation and topic extraction prompts
- Implement more resilient JSON parsing for AI responses
- Add error logging and exception handling for API interactions
…rization/clip generation into modular services to support additional models
- Move OpenAI client initialization inside the method to avoid global state
- Improve method formatting for better readability
- Ensure client is created with each API call for more flexible configuration
…r support

- Refactor transcription models to support multiple Replicate models (WhisperX and incredibly-fast-whisper)
- Add a base Replicate model class with common API request and polling functionality
- Implement LocalWhisperTranscriptionModel for local transcription processing
- Update config-example.yaml to support multiple Replicate model versions and local Whisper configuration
- Improve model selection and initialization in get_transcription_model factory function
@silas-pusateri
Author

I've added and tested the Markdown, PDF, and DOCX requirements. As I'm unable to test local Whisper integration on my current rig, I've removed the LocalWhisper class, as it is now outside the scope of this PR. I would consider this RFR (ready for review) at this point, but I'm still open to guidance and direction as necessary.

@silas-pusateri silas-pusateri marked this pull request as ready for review February 21, 2025 23:03
@sidedwards sidedwards self-requested a review February 23, 2025 00:43
silas-pusateri and others added 2 commits February 23, 2025 17:59
- Implement runtime configuration options for transcription, summarization, and export formats
- Add new `/config` endpoint to dynamically fetch and update configuration
- Create `ConfigOptions.svelte` component for frontend configuration UI
- Enhance server-side config handling with runtime config merging
- Improve error handling and logging in exporters and config management
- Fix export options not being loaded
…quest #1 from silas-pusateri/frontend

Add configuration panel to the webUI, utilizing dynamic runtime configuration
@silas-pusateri
Author

Added a config panel to the webUI which dynamically overrides config.yaml and generates a runtime config. Have also completed additional testing and caught some issues that were present with newly introduced export formats. These are the last commits that will be pushed until after I receive feedback from the review. Cheers!

@sidedwards
Owner

@CodiumAI-Agent /review

@QodoAI-Agent

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 5 🔵🔵🔵🔵🔵
🧪 No relevant tests
🔒 Security concerns

Sensitive information exposure:
The API keys for Anthropic, OpenAI, and Replicate are passed directly in headers and configuration files. Ensure these keys are securely stored and not logged in plaintext, especially in debug logs.
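One low-cost way to address the logging side of this concern is to mask sensitive values before any config dict reaches a debug log. A minimal sketch (the `redact_config` helper and the `replicate_api_key` key name are assumptions for illustration; `anthropic_api_key` and `openai_api_key` appear in the PR's code):

```python
import logging

logger = logging.getLogger(__name__)

# Keys whose values must never appear in log output.
SENSITIVE_KEYS = {"anthropic_api_key", "openai_api_key", "replicate_api_key"}

def redact_config(config: dict) -> dict:
    """Return a copy of the config with API keys masked for safe logging."""
    return {
        key: "***REDACTED***" if key in SENSITIVE_KEYS else value
        for key, value in config.items()
    }

# Log the redacted copy instead of the raw config, e.g.:
# logger.debug(f"Using configuration: {redact_config(config)}")
```

This keeps the existing `logger.debug(f"Using configuration: {config}")` pattern intact while ensuring keys never land in plaintext logs.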

⚡ Recommended focus areas for review

Possible Error Handling Gaps

The _make_anthropic_request and _make_openai_request methods include retry logic but do not handle cases where the response structure deviates from expectations, potentially leading to runtime errors.

    def _make_anthropic_request(self, message: str, config: Dict, max_tokens: int = 2000, retries: int = 2) -> str:
        """Make a request to Anthropic API and return the response text."""
        # Add JSON format enforcement to the message
        formatted_message = f"""You MUST respond with valid JSON only. No other text or explanation is allowed.
        If you need to include a message, put it in the JSON structure.

        {message}

        Remember: Your entire response must be parseable as JSON."""

        headers = {
            "x-api-key": config["anthropic_api_key"],
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        }

        data = {
            "model": config["anthropic_model"],
            "messages": [{"role": "user", "content": formatted_message}],
            "max_tokens": max_tokens,
            "temperature": 0
        }

        for attempt in range(retries + 1):
            try:
                logger.debug(f"Sending request to Anthropic API (attempt {attempt + 1}): {config['anthropic_api_url']}")
                response = requests.post(config["anthropic_api_url"], headers=headers, json=data)
                response.raise_for_status()
                response_json = response.json()

                if "error" in response_json:
                    error_msg = response_json.get("error", {}).get("message", "Unknown error")
                    logger.error(f"Anthropic API error: {error_msg}")
                    if attempt < retries:
                        logger.info(f"Retrying request (attempt {attempt + 2})")
                        continue
                    raise Exception(f"Anthropic API error: {error_msg}")

                return response_json.get("content", [{}])[0].get("text", "")

            except requests.RequestException as e:
                if attempt < retries:
                    logger.warning(f"Request failed (attempt {attempt + 1}): {str(e)}")
                    continue
                raise Exception(f"Failed to communicate with Anthropic API: {str(e)}")

        raise Exception("All retry attempts failed")

    def _extract_json_from_text(self, text: str) -> Optional[str]:
        """Attempt to extract JSON from a text response."""
        # Note: a recursive regex such as r'\{(?:[^{}]|(?R))*\}' is not
        # supported by the stdlib `re` module (it raises re.error); scan
        # for decodable objects with the JSON decoder instead.
        decoder = json.JSONDecoder()
        for start, char in enumerate(text):
            if char != '{':
                continue
            try:
                _, end = decoder.raw_decode(text, start)
                return text[start:end]
            except json.JSONDecodeError:
                continue

        return None

    def _parse_json_response(self, response_text: str, fallback_pattern: str = None) -> Any:
        """Parse JSON response with multiple fallback strategies."""
        # Clean control characters
        cleaned_text = "".join(char for char in response_text if ord(char) >= 32 or char == '\n')

        # First attempt: direct JSON parsing
        try:
            return json.loads(cleaned_text)
        except json.JSONDecodeError:
            logger.debug("Direct JSON parsing failed, trying fallback methods")

        # Second attempt: try to extract JSON from text
        json_text = self._extract_json_from_text(cleaned_text)
        if json_text:
            try:
                return json.loads(json_text)
            except json.JSONDecodeError:
                logger.debug("Extracted JSON parsing failed, trying pattern matching")

        # Third attempt: pattern matching if provided
        if fallback_pattern:
            matches = re.findall(fallback_pattern, cleaned_text)
            if matches:
                logger.debug("Successfully extracted content using pattern matching")
                return matches

        # Fourth attempt: try to create JSON from plain text
        if not any(char in cleaned_text for char in '{['):
            try:
                # Wrap plain text in a JSON structure
                return {"content": cleaned_text.strip()}
            except Exception:
                logger.debug("Failed to create JSON from plain text")

        logger.error(f"Failed to parse response as JSON: {cleaned_text}")
        raise Exception("Failed to parse AI response")

class OpenAIBaseModel:
    """Base class for OpenAI API interactions."""

    def _make_openai_request(self, messages: list, config: Dict, max_tokens: int = 4000, retries: int = 2) -> str:
        """
        Send a request to OpenAI's ChatCompletion endpoint.

        Args:
            messages: List of dicts for the conversation.
            config: Contains openai_api_key and other OpenAI settings.
            max_tokens: Maximum tokens in the response.
            retries: Number of retry attempts.

        Returns:
            The response text from the API.
        """
        client = OpenAI(api_key=config["openai_api_key"])

        for attempt in range(retries + 1):
            try:
                logger.debug(f"Sending OpenAI request (attempt {attempt + 1})")
                response = client.chat.completions.create(
                    model=config.get("openai_model", "gpt-3.5-turbo"),
                    messages=messages,
                    max_tokens=max_tokens,
                    temperature=0
                )
                return response.choices[0].message.content

            except Exception as e:
                if attempt < retries:
                    logger.warning(f"OpenAI request failed (attempt {attempt + 1}): {str(e)}")
                    continue
                raise Exception(f"OpenAI API error: {str(e)}")

        raise Exception("All retry attempts for OpenAI API failed")

    # Reuse the same JSON parsing helpers from AnthropicBaseModel
    def _extract_json_from_text(self, text: str) -> Optional[str]:
        """Attempt to extract JSON from a text response."""
        # Note: a recursive regex such as r'\{(?:[^{}]|(?R))*\}' is not
        # supported by the stdlib `re` module (it raises re.error); scan
        # for decodable objects with the JSON decoder instead.
        decoder = json.JSONDecoder()
        for start, char in enumerate(text):
            if char != '{':
                continue
            try:
                _, end = decoder.raw_decode(text, start)
                return text[start:end]
            except json.JSONDecodeError:
                continue

        return None

    def _parse_json_response(self, response_text: str, fallback_pattern: str = None) -> Any:
        """Parse JSON response with multiple fallback strategies."""
        # Clean control characters
        cleaned_text = "".join(char for char in response_text if ord(char) >= 32 or char == '\n')

        # First attempt: direct JSON parsing
        try:
            return json.loads(cleaned_text)
        except json.JSONDecodeError:
            logger.debug("Direct JSON parsing failed, trying fallback methods")

        # Second attempt: try to extract JSON from text
        json_text = self._extract_json_from_text(cleaned_text)
        if json_text:
            try:
                return json.loads(json_text)
            except json.JSONDecodeError:
                logger.debug("Extracted JSON parsing failed, trying pattern matching")

        # Third attempt: pattern matching if provided
        if fallback_pattern:
            matches = re.findall(fallback_pattern, cleaned_text)
            if matches:
                logger.debug("Successfully extracted content using pattern matching")
                return matches

        # Fourth attempt: try to create JSON from plain text
        if not any(char in cleaned_text for char in '{['):
            try:
                return {"content": cleaned_text.strip()}
            except Exception:
                logger.debug("Failed to create JSON from plain text")

        logger.error(f"Failed to parse response as JSON: {cleaned_text}")
        raise Exception("Failed to parse AI response") 
Exporter Integration

The integration of exporters in the main function assumes that the exporter will always succeed. There is no error handling for cases where the exporter might fail due to invalid content or unsupported formats.

    media_file, 
    goal=TranscriptionGoal.GENERAL_TRANSCRIPTION, 
    progress_callback=None,
    runtime_config=None
):
    try:
        logger.info(f"Starting main process for file: {media_file}")
        if progress_callback:
            progress_callback("Starting transcription process", 0)

        config = load_config()

        # Merge runtime config if provided
        if runtime_config:
            config.update(runtime_config)

        logger.debug(f"Using configuration: {config}")

        # Get the configured exporter
        export_format = config.get("export_format")
        logger.debug(f"Export format from config: '{export_format}'")
        exporter = get_exporter(export_format)
        logger.debug(f"Using exporter for format: {export_format or 'markdown'}")

        if progress_callback:
            progress_callback("Uploading media to S3", 10)
        upload_to_s3(media_file, config)

        if progress_callback:
            progress_callback("Getting presigned URL", 20)
        presigned_url = get_s3_presigned_url(os.path.basename(media_file), config)

        if progress_callback:
            progress_callback("Starting transcription", 30)
        prediction = start_transcription(presigned_url, config)

        if progress_callback:
            progress_callback("Processing transcription", 40)
        transcript = get_transcription_result(prediction["urls"]["get"], config)

        if progress_callback:
            progress_callback(f"Generating {goal.value.replace('_', ' ')}", 60)
        content = generate_content(transcript, goal, config)

        output_name = os.path.splitext(os.path.basename(media_file))[0]
        output_folder = os.path.join(os.path.dirname(media_file), output_name)
        os.makedirs(output_folder, exist_ok=True)

        # Format transcription content
        transcription_content = ""
        for segment in transcript:
            transcription_content += f"{segment['start']} - {segment['end']}: {segment['text']}\n"

        # Save transcription using configured exporter
        transcription_file = os.path.join(output_folder, f"{output_name}_transcription{exporter.get_extension()}")
        with open(transcription_file, 'wb') as f:
            f.write(exporter.export(transcription_content))
        logger.info(f"Transcription saved to {transcription_file}")

        # Save content using configured exporter
        output_file = os.path.join(output_folder, f"{output_name}_{goal.value}{exporter.get_extension()}")
        logger.info(f"Writing content to file: {output_file}")
        with open(output_file, "wb") as f:
            f.write(exporter.export(content))

        if progress_callback:
            progress_callback("Creating media clips", 80)
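The two write sites above could be wrapped in a small helper so an exporter failure (invalid content, unsupported format) is logged and reported rather than crashing the whole pipeline. A minimal sketch (the `safe_export` helper is an assumption for illustration; `exporter.export()` returning bytes matches the PR's usage):

```python
import logging

logger = logging.getLogger(__name__)

def safe_export(exporter, content: str, output_file: str) -> bool:
    """Write exported content to disk, reporting failure instead of raising."""
    try:
        data = exporter.export(content)
        with open(output_file, "wb") as f:
            f.write(data)
        return True
    except Exception as e:
        logger.error(f"Export to {output_file} failed: {e}")
        return False
```

The caller can then decide whether a failed transcription export should abort the run or merely skip that artifact and continue to clip generation.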
Model Configuration Validation

The get_transcription_model function relies on configuration keys like replicate_model_versions and selected_replicate_model but does not validate their presence or correctness, which could lead to runtime issues.

def get_transcription_model(config: Dict) -> TranscriptionModel:
    """Factory function to get the appropriate transcription model."""
    model_type = config.get("transcription_model", "replicate")

    if model_type == "replicate":
        # Get the selected model version from the config
        available_versions = config.get("replicate_model_versions", {})
        selected = config.get("selected_replicate_model", "whisperx")
        model_version = available_versions.get(selected)

        if not model_version:
            raise ValueError(f"No model version found for selected Replicate model: {selected}")

        # Return the appropriate Replicate model based on selection
        if selected == "incredibly-fast-whisper":
            return IncrediblyFastWhisperTranscriptionModel(model_version)
        elif selected == "whisperx":
            return WhisperXTranscriptionModel(model_version)
        else:
            raise ValueError(f"Unknown Replicate model type: {selected}")

    else:
        raise ValueError(f"Unknown transcription model: {model_type}") 
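A validation pass run at startup (e.g. right after `load_config()`) would surface missing or malformed keys before any work is done. A minimal sketch, assuming the same config key names the factory already reads (`validate_transcription_config` itself is a hypothetical helper):

```python
def validate_transcription_config(config: dict) -> None:
    """Fail fast with a clear message when transcription config is invalid."""
    model_type = config.get("transcription_model", "replicate")
    if model_type != "replicate":
        raise ValueError(f"Unknown transcription model: {model_type}")

    versions = config.get("replicate_model_versions")
    if not isinstance(versions, dict) or not versions:
        raise ValueError("Config missing 'replicate_model_versions' mapping")

    selected = config.get("selected_replicate_model", "whisperx")
    if selected not in versions:
        raise ValueError(
            f"selected_replicate_model '{selected}' has no entry in "
            f"replicate_model_versions (available: {sorted(versions)})"
        )
```

Calling this before `get_transcription_model` keeps the factory itself unchanged while converting config typos into actionable errors at launch rather than mid-run.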

@sidedwards
Owner

@CodiumAI-Agent /describe

@QodoAI-Agent

This comment has been minimized.
