
Conversation

@silas-pusateri

In this PR I hope to update this repository and complete the project roadmap by adding support for additional models (incredibly-fast-whisper, locally hosted Whisper) as well as additional export options (Markdown, PDF, DOCX).

This included several pieces of work:

  • Updating requirements.txt to include missing backend dependencies, as well as dependencies supporting these additions
  • Adding additional structure to the prompt to ensure JSON response from Anthropic
  • Moving previously implemented Replicate/whisperX transcription and Anthropic summaries into separate modules
  • Adding support for incredibly-fast-whisper transcription (audio only)
  • Adding OpenAI support for summarization and clip generation
  • Adding support for local Whisper implementation (currently untested)

I've focused on a few core things:

  • Modifying existing patterns and implementations as little as possible
  • Adding as few dependencies as necessary
  • Promoting future extension via code re-use (BaseModels, Factories)

There are several things still to be done before I would consider this ready to merge:

  • Adding video support for incredibly-fast-whisper by stripping audio before upload
  • Adding additional export options
  • Testing local Whisper
  • More general testing across both the CLI and webUI

Since several things still need to be done, I've opened this PR as a draft in the hope of getting some feedback as I continue to work this week. As this is my first public PR, I'm open to suggestions about the direction of the work and how it fits into the vision of the project as a whole. Thanks!

- Update Anthropic API request format to ensure JSON response parsing
- Improve error handling for AI response processing
- Add dynamic ffmpeg path detection for media clip generation
- Refactor content generation and topic extraction prompts
- Implement more resilient JSON parsing for AI responses
- Add error logging and exception handling for API interactions
…rization/clip generation into modular services to support additional models
- Move OpenAI client initialization inside the method to avoid global state
- Improve method formatting for better readability
- Ensure client is created with each API call for more flexible configuration
…r support

- Refactor transcription models to support multiple Replicate models (WhisperX and incredibly-fast-whisper)
- Add a base Replicate model class with common API request and polling functionality
- Implement LocalWhisperTranscriptionModel for local transcription processing
- Update config-example.yaml to support multiple Replicate model versions and local Whisper configuration
- Improve model selection and initialization in get_transcription_model factory function
@silas-pusateri
Author

I've added and tested the Markdown, PDF, and DOCX requirements. As I'm unable to test local Whisper integration on my current rig, I've removed the LocalWhisper class, as it is now outside the scope of this PR. I would consider this RFR (ready for review) at this point, but I'm still open to guidance and direction as necessary.

@silas-pusateri silas-pusateri marked this pull request as ready for review February 21, 2025 23:03
@sidedwards sidedwards self-requested a review February 23, 2025 00:43
silas-pusateri and others added 2 commits February 23, 2025 17:59
- Implement runtime configuration options for transcription, summarization, and export formats
- Add new `/config` endpoint to dynamically fetch and update configuration
- Create `ConfigOptions.svelte` component for frontend configuration UI
- Enhance server-side config handling with runtime config merging
- Improve error handling and logging in exporters and config management
- Fix export options not being loaded
…quest #1 from silas-pusateri/frontend

Add configuration panel to the webUI, utilizing dynamic runtime configuration
@silas-pusateri
Author

Added a config panel to the webUI which dynamically overrides config.yaml and generates a runtime config. Have also completed additional testing and caught some issues that were present with newly introduced export formats. These are the last commits that will be pushed until after I receive feedback from the review. Cheers!

@sidedwards
Owner

@CodiumAI-Agent /review

@QodoAI-Agent

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 5 🔵🔵🔵🔵🔵
🧪 No relevant tests
🔒 Security concerns

Sensitive information exposure:
The API keys for Anthropic, OpenAI, and Replicate are passed directly in headers and configuration files. Ensure these keys are securely stored and not logged in plaintext, especially in debug logs.
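One low-cost way to address the logging side of this concern is to mask sensitive values before any config dict reaches a debug log. A minimal sketch (the `redact_config` helper and the `replicate_api_key` key name are assumptions for illustration; `anthropic_api_key` and `openai_api_key` appear in the PR's code):

```python
import logging

logger = logging.getLogger(__name__)

# Keys whose values must never appear in log output.
SENSITIVE_KEYS = {"anthropic_api_key", "openai_api_key", "replicate_api_key"}

def redact_config(config: dict) -> dict:
    """Return a copy of the config with API keys masked for safe logging."""
    return {
        key: "***REDACTED***" if key in SENSITIVE_KEYS else value
        for key, value in config.items()
    }

# Log the redacted copy instead of the raw config, e.g.:
# logger.debug(f"Using configuration: {redact_config(config)}")
```

This keeps the existing `logger.debug(f"Using configuration: {config}")` pattern intact while ensuring keys never land in plaintext logs.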

⚡ Recommended focus areas for review

Possible Error Handling Gaps

The _make_anthropic_request and _make_openai_request methods include retry logic but do not handle cases where the response structure deviates from expectations, potentially leading to runtime errors.

    def _make_anthropic_request(self, message: str, config: Dict, max_tokens: int = 2000, retries: int = 2) -> str:
        """Make a request to Anthropic API and return the response text."""
        # Add JSON format enforcement to the message
        formatted_message = f"""You MUST respond with valid JSON only. No other text or explanation is allowed.
        If you need to include a message, put it in the JSON structure.

        {message}

        Remember: Your entire response must be parseable as JSON."""

        headers = {
            "x-api-key": config["anthropic_api_key"],
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        }

        data = {
            "model": config["anthropic_model"],
            "messages": [{"role": "user", "content": formatted_message}],
            "max_tokens": max_tokens,
            "temperature": 0
        }

        for attempt in range(retries + 1):
            try:
                logger.debug(f"Sending request to Anthropic API (attempt {attempt + 1}): {config['anthropic_api_url']}")
                response = requests.post(config["anthropic_api_url"], headers=headers, json=data)
                response.raise_for_status()
                response_json = response.json()

                if "error" in response_json:
                    error_msg = response_json.get("error", {}).get("message", "Unknown error")
                    logger.error(f"Anthropic API error: {error_msg}")
                    if attempt < retries:
                        logger.info(f"Retrying request (attempt {attempt + 2})")
                        continue
                    raise Exception(f"Anthropic API error: {error_msg}")

                return response_json.get("content", [{}])[0].get("text", "")

            except requests.RequestException as e:
                if attempt < retries:
                    logger.warning(f"Request failed (attempt {attempt + 1}): {str(e)}")
                    continue
                raise Exception(f"Failed to communicate with Anthropic API: {str(e)}")

        raise Exception("All retry attempts failed")

    def _extract_json_from_text(self, text: str) -> Optional[str]:
        """Attempt to extract JSON from a text response."""
        # Note: a recursive regex such as r'\{(?:[^{}]|(?R))*\}' is not
        # supported by the stdlib `re` module (it raises re.error); scan
        # for decodable objects with the JSON decoder instead.
        decoder = json.JSONDecoder()
        for start, char in enumerate(text):
            if char != '{':
                continue
            try:
                _, end = decoder.raw_decode(text, start)
                return text[start:end]
            except json.JSONDecodeError:
                continue

        return None

    def _parse_json_response(self, response_text: str, fallback_pattern: str = None) -> Any:
        """Parse JSON response with multiple fallback strategies."""
        # Clean control characters
        cleaned_text = "".join(char for char in response_text if ord(char) >= 32 or char == '\n')

        # First attempt: direct JSON parsing
        try:
            return json.loads(cleaned_text)
        except json.JSONDecodeError:
            logger.debug("Direct JSON parsing failed, trying fallback methods")

        # Second attempt: try to extract JSON from text
        json_text = self._extract_json_from_text(cleaned_text)
        if json_text:
            try:
                return json.loads(json_text)
            except json.JSONDecodeError:
                logger.debug("Extracted JSON parsing failed, trying pattern matching")

        # Third attempt: pattern matching if provided
        if fallback_pattern:
            matches = re.findall(fallback_pattern, cleaned_text)
            if matches:
                logger.debug("Successfully extracted content using pattern matching")
                return matches

        # Fourth attempt: try to create JSON from plain text
        if not any(char in cleaned_text for char in '{['):
            try:
                # Wrap plain text in a JSON structure
                return {"content": cleaned_text.strip()}
            except Exception:
                logger.debug("Failed to create JSON from plain text")

        logger.error(f"Failed to parse response as JSON: {cleaned_text}")
        raise Exception("Failed to parse AI response")

class OpenAIBaseModel:
    """Base class for OpenAI API interactions."""

    def _make_openai_request(self, messages: list, config: Dict, max_tokens: int = 4000, retries: int = 2) -> str:
        """
        Send a request to OpenAI's ChatCompletion endpoint.

        Args:
            messages: List of dicts for the conversation.
            config: Contains openai_api_key and other OpenAI settings.
            max_tokens: Maximum tokens in the response.
            retries: Number of retry attempts.

        Returns:
            The response text from the API.
        """
        client = OpenAI(api_key=config["openai_api_key"])

        for attempt in range(retries + 1):
            try:
                logger.debug(f"Sending OpenAI request (attempt {attempt + 1})")
                response = client.chat.completions.create(
                    model=config.get("openai_model", "gpt-3.5-turbo"),
                    messages=messages,
                    max_tokens=max_tokens,
                    temperature=0
                )
                return response.choices[0].message.content

            except Exception as e:
                if attempt < retries:
                    logger.warning(f"OpenAI request failed (attempt {attempt + 1}): {str(e)}")
                    continue
                raise Exception(f"OpenAI API error: {str(e)}")

        raise Exception("All retry attempts for OpenAI API failed")

    # Reuse the same JSON parsing helpers from AnthropicBaseModel
    def _extract_json_from_text(self, text: str) -> Optional[str]:
        """Attempt to extract JSON from a text response."""
        # Note: a recursive regex such as r'\{(?:[^{}]|(?R))*\}' is not
        # supported by the stdlib `re` module (it raises re.error); scan
        # for decodable objects with the JSON decoder instead.
        decoder = json.JSONDecoder()
        for start, char in enumerate(text):
            if char != '{':
                continue
            try:
                _, end = decoder.raw_decode(text, start)
                return text[start:end]
            except json.JSONDecodeError:
                continue

        return None

    def _parse_json_response(self, response_text: str, fallback_pattern: str = None) -> Any:
        """Parse JSON response with multiple fallback strategies."""
        # Clean control characters
        cleaned_text = "".join(char for char in response_text if ord(char) >= 32 or char == '\n')

        # First attempt: direct JSON parsing
        try:
            return json.loads(cleaned_text)
        except json.JSONDecodeError:
            logger.debug("Direct JSON parsing failed, trying fallback methods")

        # Second attempt: try to extract JSON from text
        json_text = self._extract_json_from_text(cleaned_text)
        if json_text:
            try:
                return json.loads(json_text)
            except json.JSONDecodeError:
                logger.debug("Extracted JSON parsing failed, trying pattern matching")

        # Third attempt: pattern matching if provided
        if fallback_pattern:
            matches = re.findall(fallback_pattern, cleaned_text)
            if matches:
                logger.debug("Successfully extracted content using pattern matching")
                return matches

        # Fourth attempt: try to create JSON from plain text
        if not any(char in cleaned_text for char in '{['):
            try:
                return {"content": cleaned_text.strip()}
            except Exception:
                logger.debug("Failed to create JSON from plain text")

        logger.error(f"Failed to parse response as JSON: {cleaned_text}")
        raise Exception("Failed to parse AI response") 
Exporter Integration

The integration of exporters in the main function assumes that the exporter will always succeed. There is no error handling for cases where the exporter might fail due to invalid content or unsupported formats.

    media_file, 
    goal=TranscriptionGoal.GENERAL_TRANSCRIPTION, 
    progress_callback=None,
    runtime_config=None
):
    try:
        logger.info(f"Starting main process for file: {media_file}")
        if progress_callback:
            progress_callback("Starting transcription process", 0)

        config = load_config()

        # Merge runtime config if provided
        if runtime_config:
            config.update(runtime_config)

        logger.debug(f"Using configuration: {config}")

        # Get the configured exporter
        export_format = config.get("export_format")
        logger.debug(f"Export format from config: '{export_format}'")
        exporter = get_exporter(export_format)
        logger.debug(f"Using exporter for format: {export_format or 'markdown'}")

        if progress_callback:
            progress_callback("Uploading media to S3", 10)
        upload_to_s3(media_file, config)

        if progress_callback:
            progress_callback("Getting presigned URL", 20)
        presigned_url = get_s3_presigned_url(os.path.basename(media_file), config)

        if progress_callback:
            progress_callback("Starting transcription", 30)
        prediction = start_transcription(presigned_url, config)

        if progress_callback:
            progress_callback("Processing transcription", 40)
        transcript = get_transcription_result(prediction["urls"]["get"], config)

        if progress_callback:
            progress_callback(f"Generating {goal.value.replace('_', ' ')}", 60)
        content = generate_content(transcript, goal, config)

        output_name = os.path.splitext(os.path.basename(media_file))[0]
        output_folder = os.path.join(os.path.dirname(media_file), output_name)
        os.makedirs(output_folder, exist_ok=True)

        # Format transcription content
        transcription_content = ""
        for segment in transcript:
            transcription_content += f"{segment['start']} - {segment['end']}: {segment['text']}\n"

        # Save transcription using configured exporter
        transcription_file = os.path.join(output_folder, f"{output_name}_transcription{exporter.get_extension()}")
        with open(transcription_file, 'wb') as f:
            f.write(exporter.export(transcription_content))
        logger.info(f"Transcription saved to {transcription_file}")

        # Save content using configured exporter
        output_file = os.path.join(output_folder, f"{output_name}_{goal.value}{exporter.get_extension()}")
        logger.info(f"Writing content to file: {output_file}")
        with open(output_file, "wb") as f:
            f.write(exporter.export(content))

        if progress_callback:
            progress_callback("Creating media clips", 80)
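The two write sites above could be wrapped in a small helper so an exporter failure (invalid content, unsupported format) is logged and reported rather than crashing the whole pipeline. A minimal sketch (the `safe_export` helper is an assumption for illustration; `exporter.export()` returning bytes matches the PR's usage):

```python
import logging

logger = logging.getLogger(__name__)

def safe_export(exporter, content: str, output_file: str) -> bool:
    """Write exported content to disk, reporting failure instead of raising."""
    try:
        data = exporter.export(content)
        with open(output_file, "wb") as f:
            f.write(data)
        return True
    except Exception as e:
        logger.error(f"Export to {output_file} failed: {e}")
        return False
```

The caller can then decide whether a failed transcription export should abort the run or merely skip that artifact and continue to clip generation.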
Model Configuration Validation

The get_transcription_model function relies on configuration keys like replicate_model_versions and selected_replicate_model but does not validate their presence or correctness, which could lead to runtime issues.

def get_transcription_model(config: Dict) -> TranscriptionModel:
    """Factory function to get the appropriate transcription model."""
    model_type = config.get("transcription_model", "replicate")

    if model_type == "replicate":
        # Get the selected model version from the config
        available_versions = config.get("replicate_model_versions", {})
        selected = config.get("selected_replicate_model", "whisperx")
        model_version = available_versions.get(selected)

        if not model_version:
            raise ValueError(f"No model version found for selected Replicate model: {selected}")

        # Return the appropriate Replicate model based on selection
        if selected == "incredibly-fast-whisper":
            return IncrediblyFastWhisperTranscriptionModel(model_version)
        elif selected == "whisperx":
            return WhisperXTranscriptionModel(model_version)
        else:
            raise ValueError(f"Unknown Replicate model type: {selected}")

    else:
        raise ValueError(f"Unknown transcription model: {model_type}") 
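A validation pass run at startup (e.g. right after `load_config()`) would surface missing or malformed keys before any work is done. A minimal sketch, assuming the same config key names the factory already reads (`validate_transcription_config` itself is a hypothetical helper):

```python
def validate_transcription_config(config: dict) -> None:
    """Fail fast with a clear message when transcription config is invalid."""
    model_type = config.get("transcription_model", "replicate")
    if model_type != "replicate":
        raise ValueError(f"Unknown transcription model: {model_type}")

    versions = config.get("replicate_model_versions")
    if not isinstance(versions, dict) or not versions:
        raise ValueError("Config missing 'replicate_model_versions' mapping")

    selected = config.get("selected_replicate_model", "whisperx")
    if selected not in versions:
        raise ValueError(
            f"selected_replicate_model '{selected}' has no entry in "
            f"replicate_model_versions (available: {sorted(versions)})"
        )
```

Calling this before `get_transcription_model` keeps the factory itself unchanged while converting config typos into actionable errors at launch rather than mid-run.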

@sidedwards
Owner

@CodiumAI-Agent /describe

@QodoAI-Agent

This comment has been minimized.
