@NumericalAdvantage NumericalAdvantage commented Nov 14, 2025

Adds a "Selection" output type to the list of possible output types (Text, Boolean, Numeric). It essentially behaves like an enum type: it accepts a set of options, or "selections" (for now up to 7), which are sent to the LLM, and the LLM is asked to choose one of them as its answer.

The form is rendered by adding an Alpine component called SelectionOptions. A new template, _selection_options_field, serves the form. The extraction_job_detail and extraction_job_wizard_summary templates get small updates, because the new selection type has to be shown on these pages.

Added tests to verify correctness of data coming out of the Selection form.

Summary by CodeRabbit

  • New Features

    • Added a "Selection" output type with configurable enumerated options and an array mode; schema and prompts now reflect selections and array semantics with a max-options limit.
  • UX / Style

    • Interactive controls and styles for managing selection options and an array-toggle; Enter key prevented from submitting certain forms.
  • Templates

    • Extraction views and summaries now display array and selection metadata.
  • Forms / Validation

    • Server-side form/model validation and hidden fields enforce selection rules and array handling.
  • Tests

    • New tests for forms, models, schema generation, and edge cases.
  • Chores

    • Database migrations to add selection and array fields.



claude bot commented Nov 14, 2025

Pull Request Review: Selection Output Type

I've completed a comprehensive review of PR #152. Overall, this is a well-implemented feature with good attention to validation and user experience. Here's my detailed feedback:


Strengths

  1. Comprehensive Validation - Excellent multi-layer validation approach:

    • Form-level validation in OutputFieldForm.clean() and clean_selection_options()
    • Model-level validation in OutputField.clean()
    • Pydantic schema validation in processor_utils.py
  2. Good Test Coverage - Four solid tests covering key scenarios:

    • Valid selection options
    • Required options for selection type
    • Rejection of options for non-selection types
    • Trimming of whitespace
  3. User Experience - The Alpine.js component provides good UX with real-time feedback, dynamic enabling/disabling, and clear counter display

  4. Type Safety - Using Literal types in Pydantic for LLM schema generation is the correct approach for constrained choices


🐛 Issues Found

Critical: Data Loss Risk in JavaScript

Location: radis/core/static/core/core.js:31-35

if (!this.supportsSelection) {
  this.options = [];
} else if (!wasSelection && this.options.length === 0) {
  this.options = this.parseOptions(hiddenInput?.value);
}

Problem: When a user switches from "Selection" to another type and back to "Selection", all their entered options are permanently lost because the array is cleared but not restored from the hidden input.

Impact: Users would need to re-enter all selection options if they accidentally change the output type.

Recommendation: Store the options in a temporary variable before clearing:

if (!this.supportsSelection) {
  this.lastSelectionOptions = [...this.options]; // Store before clearing
  this.options = [];
} else if (!wasSelection && this.options.length === 0) {
  this.options = this.lastSelectionOptions?.length 
    ? this.lastSelectionOptions 
    : this.parseOptions(hiddenInput?.value);
}

Medium: Edge Case in Model Validation

Location: radis/extractions/models.py:111-132

Issue: The model's clean() method resets selection_options = [] for non-selection types, but this could cause issues if:

  1. A user creates a field with selection type and options
  2. Later edits it to a different type
  3. The options are cleared silently

Recommendation: Consider warning the user before data loss or storing the options even when not active (just don't use them). This allows users to switch back without losing data.


Low: Inconsistent Validation Messages

Form validation says: "Add at least one selection to use the Selection type."
Model validation says: "Add at least one selection option."

Recommendation: Standardize error messages across form and model validation for consistency.


🔒 Security Considerations

  1. ✅ JSON Injection Protection - Good use of json.loads() with proper error handling
  2. ✅ Type Validation - Proper checking that all items are strings
  3. ✅ XSS Protection - Template properly escapes output with {{ option }}
  4. ⚠️ Recommendation: Consider adding a max length per option (e.g., 100 chars) to prevent extremely long strings

Performance Considerations

  1. Minor: N+1 Query Potential - In templates, iterating field.selection_options is fine since it's a JSON field, not a relation
  2. Good: Using JSONField is appropriate for this small list (max 7 items)
  3. Consider: If the 7-option limit might increase significantly in the future, document why 7 was chosen

🧪 Test Coverage Gaps

Current tests are good, but consider adding:

  1. Test for maximum limit enforcement:

    def test_output_field_form_rejects_more_than_7_options():
        # Test that 8+ options are rejected
  2. Test for duplicate options:

    def test_output_field_form_allows_duplicate_options():
        # Clarify if duplicates should be allowed or deduplicated
  3. Test for special characters and unicode:

    def test_output_field_form_handles_unicode_options():
        # Test with emoji, Chinese characters, etc.
  4. Integration test with Pydantic schema generation:

    def test_generate_output_fields_schema_with_selection():
        # Test that Literal types are created correctly
  5. Test for prompt generation with selections:

    def test_generate_output_fields_prompt_includes_selections():
        # Verify LLM prompt includes selection options
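As a rough illustration of the first three suggested tests, the sketch below uses a hypothetical stand-in validator, `clean_options`, rather than the project's actual `OutputFieldForm`. The limit of 7 comes from the PR; rejecting duplicates (instead of deduplicating them) is an assumption that the PR would need to confirm.

```python
MAX_SELECTION_OPTIONS = 7  # limit stated in the PR description


def clean_options(raw_options):
    """Hypothetical stand-in for the form's option cleaning (not the real form code)."""
    cleaned = [opt.strip() for opt in raw_options]
    if any(not opt for opt in cleaned):
        raise ValueError("Selection options cannot be empty.")
    if len(cleaned) > MAX_SELECTION_OPTIONS:
        raise ValueError(f"Provide at most {MAX_SELECTION_OPTIONS} selections.")
    if len(cleaned) != len(set(cleaned)):
        # Assumption: duplicates are rejected rather than silently deduplicated.
        raise ValueError("Selection options must be unique.")
    return cleaned


def raises_value_error(func, *args):
    """Return True if calling func(*args) raises ValueError."""
    try:
        func(*args)
    except ValueError:
        return True
    return False


def test_rejects_more_than_max_options():
    too_many = [f"Option {i}" for i in range(MAX_SELECTION_OPTIONS + 1)]
    assert raises_value_error(clean_options, too_many)


def test_rejects_duplicate_options():
    # The trailing space is trimmed first, so both entries collide.
    assert raises_value_error(clean_options, ["Grade 1", "Grade 1 "])


def test_handles_unicode_options():
    assert clean_options([" 良性 ", "悪性", "😀"]) == ["良性", "悪性", "😀"]
```

In the real test suite these would presumably build an `OutputFieldForm` with JSON-encoded `selection_options` and assert on `form.is_valid()` and `form.errors` instead.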

📋 Code Quality Observations

  1. ✅ Good: Type hints are used consistently
  2. ✅ Good: Proper separation of concerns (JS component, form validation, model validation)
  3. ⚠️ Minor: The formset template change appears to only adjust indentation - confirm this is intentional

Location: radis/core/templates/cotton/formset.html


🎯 Recommendations

  1. High Priority: Fix the data loss issue in the JavaScript component
  2. Medium Priority: Add the suggested tests, especially for edge cases
  3. Low Priority: Standardize validation error messages
  4. Consider: Add a character limit per selection option (e.g., 100 chars)
  5. Consider: Document why 7 options is the limit (is it based on LLM context window, UI constraints, or business logic?)

📝 Documentation Suggestions

Consider adding inline comments explaining:

  • Why 7 options maximum?
  • How the LLM uses these selection constraints
  • Migration path if users need more than 7 options in the future

Summary

This is a well-structured feature addition with good validation and test coverage. The main concern is the potential data loss in the JavaScript component when users change output types. Once that's addressed, this should be ready to merge.

Overall Assessment: ✅ Approve with minor changes recommended

Great work on this implementation! The multi-layer validation approach and attention to UX details are particularly commendable.

@NumericalAdvantage NumericalAdvantage marked this pull request as draft November 17, 2025 09:19

coderabbitai bot commented Nov 25, 2025

Walkthrough

Adds a Selection output type and array support: new model fields and migrations, form fields and validation, Alpine/JS UI and CSS for editing selections, processor changes to emit Literal (and List[Literal]) types and prompts, template display updates, and tests covering form/model/processor behavior.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Frontend styling & formset template: radis/core/static/core/core.css, radis/core/templates/cotton/formset.html | Add CSS classes for selection controls and array toggle; adjust per-form crispy invocation in formset template. |
| Frontend JS interaction: radis/core/static/core/core.js | Add Enter-prevent handler and SelectionOptions(rootEl) Alpine component to manage option list, array toggle, hidden-input sync, and UI interactions. |
| Selection UI template & display updates: radis/extractions/templates/extractions/_selection_options_field.html, radis/extractions/templates/extractions/extraction_job_detail.html, radis/extractions/templates/extractions/extraction_job_wizard_summary.html | New Alpine partial for editing selection options; templates now render is_array and selection_options; wizard form includes data-prevent-enter-submit attribute. |
| Constants: radis/extractions/constants.py | Add MAX_SELECTION_OPTIONS = 7. |
| Models & migrations: radis/extractions/models.py, radis/extractions/migrations/0004_outputfield_selection_options.py, radis/extractions/migrations/0005_outputfield_is_array.py | Add OutputType.SELECTION, selection_options: JSONField (default=list, blank=True), is_array: BooleanField; add OutputField.clean() to validate/normalize options; migrations add/alter fields. |
| Forms & validation: radis/extractions/forms.py | OutputFieldForm adds hidden selection_options and is_array fields, Crispy layout integration, data attributes, and validation methods (clean_selection_options, clean_is_array, clean). |
| Processor / schema & prompt generation: radis/extractions/utils/processor_utils.py | generate_output_fields_schema() maps SELECTION to Literal[*options] and wraps in list[...] when is_array is true; raises on empty selections. generate_output_fields_prompt() appends selection options and array notes. |
| Tests: radis/extractions/tests/test_forms.py, radis/extractions/tests/unit/test_processor_utils.py | New tests for form/model validation (JSON parsing, trimming, uniqueness, required for SELECTION, is_array handling) and for processor schema producing Literal and List[Literal] annotations. |

Sequence Diagram

sequenceDiagram
    participant User as User (browser)
    participant UI as SelectionOptions (client JS / Alpine)
    participant Form as OutputFieldForm (server)
    participant Model as OutputField (DB)
    participant Processor as processor_utils

    User->>UI: Add/edit option, toggle array
    UI->>UI: enforce max, manage list, update UI states
    UI->>UI: sync hidden `selection_options` and `is_array`
    User->>Form: Submit form (includes hidden fields)
    Form->>Form: parse JSON, validate options & is_array, require when type=SELECTION
    Form->>Model: save -> Model.clean() normalizes & validates
    Model->>Model: persist `selection_options` and `is_array`

    Note over Processor,Model: Schema generation reads persisted fields
    Processor->>Processor: If SELECTION -> build Literal[*options]
    alt is_array == true
        Processor->>Processor: Wrap in list[Literal[...]]
    end
    Processor->>User: return schema/prompt

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas needing attention:
    • radis/extractions/forms.py: JSON parsing, trimming, uniqueness, MAX_SELECTION_OPTIONS enforcement, and Crispy layout wiring.
    • radis/extractions/models.py and migrations: clean() logic, defaults, and migration effects.
    • radis/extractions/utils/processor_utils.py: correct Literal construction and array wrapping (typing semantics).
    • radis/core/static/core/core.js and _selection_options_field.html: client-side state sync, focus/UX, and prevent-Enter submit behavior.
    • Tests: ensure edge cases (whitespace-only, duplicates, max limit) match server validation.

Poem

🐇 I nibble keys and trim each name,
Toggles click as options stake their claim,
Tiny inputs lined, no trailing space to keep,
Arrays gather berries—neat rows, not a heap,
Hop! Saved choices snug, in fields now fast asleep.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 13.64% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Selection output type' accurately describes the main feature added: a new Selection output type for handling enumerated options, which is the primary objective of the PR. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7043d4a and 9a9500b.

📒 Files selected for processing (14)
  • radis/core/static/core/core.css (1 hunks)
  • radis/core/static/core/core.js (2 hunks)
  • radis/core/templates/cotton/formset.html (1 hunks)
  • radis/extractions/constants.py (1 hunks)
  • radis/extractions/forms.py (3 hunks)
  • radis/extractions/migrations/0004_outputfield_selection_options.py (1 hunks)
  • radis/extractions/migrations/0005_outputfield_is_array.py (1 hunks)
  • radis/extractions/models.py (4 hunks)
  • radis/extractions/templates/extractions/_selection_options_field.html (1 hunks)
  • radis/extractions/templates/extractions/extraction_job_detail.html (1 hunks)
  • radis/extractions/templates/extractions/extraction_job_wizard_summary.html (1 hunks)
  • radis/extractions/tests/test_forms.py (1 hunks)
  • radis/extractions/tests/unit/test_processor_utils.py (1 hunks)
  • radis/extractions/utils/processor_utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • radis/extractions/models.py
  • radis/extractions/tests/test_forms.py
  • radis/extractions/constants.py
  • radis/extractions/templates/extractions/extraction_job_detail.html
  • radis/core/static/core/core.js
  • radis/extractions/templates/extractions/extraction_job_wizard_summary.html
🧰 Additional context used
🧬 Code graph analysis (5)
radis/extractions/tests/unit/test_processor_utils.py (3)
radis/extractions/factories.py (2)
  • ExtractionJobFactory (26-59)
  • OutputFieldFactory (62-69)
radis/extractions/models.py (1)
  • OutputType (77-83)
radis/extractions/utils/processor_utils.py (1)
  • generate_output_fields_schema (13-40)
radis/extractions/utils/processor_utils.py (1)
radis/extractions/models.py (2)
  • OutputField (86-153)
  • OutputType (77-83)
radis/extractions/forms.py (1)
radis/extractions/models.py (7)
  • ExtractionJob (28-74)
  • OutputField (86-153)
  • OutputType (77-83)
  • Meta (18-19)
  • Meta (58-59)
  • Meta (103-109)
  • clean (114-153)
radis/extractions/migrations/0004_outputfield_selection_options.py (1)
radis/extractions/migrations/0005_outputfield_is_array.py (1)
  • Migration (4-16)
radis/extractions/migrations/0005_outputfield_is_array.py (1)
radis/extractions/migrations/0004_outputfield_selection_options.py (1)
  • Migration (4-33)
🪛 Ruff (0.14.7)
radis/extractions/utils/processor_utils.py

27-27: Avoid specifying long messages outside the exception class

(TRY003)


32-32: Avoid specifying long messages outside the exception class

(TRY003)

radis/extractions/forms.py

205-211: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


283-283: Avoid specifying long messages outside the exception class

(TRY003)


286-286: Avoid specifying long messages outside the exception class

(TRY003)


291-291: Avoid specifying long messages outside the exception class

(TRY003)


294-294: Avoid specifying long messages outside the exception class

(TRY003)


298-300: Avoid specifying long messages outside the exception class

(TRY003)


302-302: Avoid specifying long messages outside the exception class

(TRY003)

radis/extractions/migrations/0004_outputfield_selection_options.py

5-7: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


9-33: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

radis/extractions/migrations/0005_outputfield_is_array.py

5-7: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


9-16: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🔇 Additional comments (9)
radis/core/templates/cotton/formset.html (1)

6-11: LGTM!

The migration from Crispy filter syntax ({{ form|crispy }}) to tag syntax ({% crispy form %}) is idiomatic and correct. Wrapping the empty form in a <template> element is the proper approach for client-side form cloning.

radis/extractions/migrations/0004_outputfield_selection_options.py (1)

1-33: LGTM!

The migration correctly adds the selection_options JSONField and extends output_type choices to include SELECTION ("S"). The structure follows Django migration best practices.

Note: The Ruff RUF012 hints about ClassVar are false positives—Django migration classes use standard class attributes, not typing annotations.

radis/core/static/core/core.css (1)

43-72: LGTM!

The new CSS classes provide clean, Bootstrap-compatible styling for the selection options UI. The flexbox layout, circular toggle button design, and active state handling are all well-implemented.

radis/extractions/templates/extractions/_selection_options_field.html (1)

1-51: LGTM!

The Alpine.js-powered selection options UI is well-structured:

  • Hidden fields are correctly positioned for state synchronization
  • Disabled logic prevents exceeding maxOptions
  • Dynamic placeholders and error rendering enhance UX
  • Guidance text clarifies when Selection type must be chosen
radis/extractions/migrations/0005_outputfield_is_array.py (1)

1-16: LGTM!

The migration correctly adds the is_array boolean field with appropriate defaults and follows the proper dependency chain from migration 0004.

Note: The Ruff RUF012 hint is a false positive for Django migrations.

radis/extractions/utils/processor_utils.py (1)

43-59: LGTM!

The enhanced prompt generation correctly appends selection constraints and array indicators to field descriptions. The logic clearly communicates requirements to the LLM processor.

radis/extractions/forms.py (3)

189-211: LGTM!

The addition of selection_options and is_array as hidden fields is the right approach for managing dynamic UI state that needs to round-trip through form submission.

Note: The Ruff RUF012 hint on Meta.fields is a false positive—Django form Meta classes use standard attributes.


213-270: LGTM!

The __init__ method properly:

  • Configures the Textarea widget for better UX
  • Stamps data attributes for Alpine.js component integration
  • Prepopulates hidden fields from instance state
  • Builds a clean Crispy Forms layout with the array toggle button and selection options template include

274-335: Validation logic is thorough and correct.

The three clean methods comprehensively handle:

  • JSON parsing with proper error handling
  • Type validation and sanitization (strip, non-empty)
  • Constraint enforcement (max count, uniqueness)
  • Cross-field validation (SELECTION type requires options)

Error messages are clear and actionable for end users.

Note: The Ruff TRY003 hints suggest extracting error messages to exception classes, but for user-facing Django form validation messages, inline strings are the standard and acceptable practice.



claude bot commented Nov 25, 2025

Pull Request Review: Selection Output Type

Summary

This PR adds a new "Selection" output type to the extraction system, allowing users to define enumerated values (up to 7 options) that the LLM can choose from. The implementation includes frontend (Alpine.js), backend (Django forms and models), validation, and tests.

✅ Strengths

  1. Comprehensive Implementation: The feature is well-integrated across all layers (models, forms, templates, JavaScript, and utilities).

  2. Good Test Coverage: The new test_forms.py includes tests for the critical paths:

    • Accepting valid selection options
    • Requiring options for selection type
    • Rejecting options for non-selection types
    • Model-level validation
  3. User Experience: The Alpine.js component provides good UX with dynamic showing/hiding based on output type selection.

  4. Data Validation: Multiple layers of validation (JavaScript, form, and model) ensure data integrity.

  5. Migration: Properly adds the new field with appropriate default.

🔍 Code Quality Issues

Critical

1. Potential Type Safety Issue in processor_utils.py:22-25

options = tuple(field.selection_options)
if not options:
    raise ValueError("Selection output requires at least one option.")
output_type = Literal.__getitem__(options)

Issue: Literal types require literal values known at module load time. Using runtime values with Literal.__getitem__() may cause type checking issues or runtime errors with certain values (especially those containing special characters).

Recommendation: Consider using Enum instead for runtime-generated types:

from enum import Enum
SelectionEnum = Enum('SelectionEnum', {opt: opt for opt in field.selection_options})
output_type = SelectionEnum
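For comparison, here is a minimal standard-library sketch of both approaches. It assumes only that the options are plain strings; `build_literal` and `build_enum` are illustrative names, not functions from the PR. In the standard `typing` module, subscripting `Literal` with a tuple behaves the same at runtime as listing the values individually, so the "known at module load time" concern is mostly relevant to static type checkers, which cannot see runtime values.

```python
from enum import Enum
from typing import Literal, get_args, get_origin


def build_literal(options, is_array=False):
    """Build Literal[...] (optionally wrapped in list[...]) from runtime strings."""
    values = tuple(options)
    if not values:
        raise ValueError("Selection output requires at least one option.")
    # Subscripting with a tuple is the same as writing Literal["a", "b", ...].
    literal_type = Literal[values]
    return list[literal_type] if is_array else literal_type


def build_enum(options):
    """Alternative from the review: a runtime Enum whose names equal its values."""
    return Enum("SelectionEnum", {opt: opt for opt in options})


options = ["Grade 1", "Grade 2", "n/a (unknown)"]

single = build_literal(options)
print(get_args(single))  # the original option strings, in order

many = build_literal(options, is_array=True)
print(get_origin(many) is list)  # array mode wraps the Literal in list[...]

selection_enum = build_enum(options)
print([member.value for member in selection_enum])
```

One practical difference to weigh: a `Literal` field yields plain strings after validation, while an `Enum` field yields enum members, so downstream code that stores or compares the extracted value would need to use `.value`.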

High Priority

2. Inconsistent Whitespace Changes in formset.html:6-10
The PR includes unrelated formatting changes (indentation adjustments) that should be in a separate commit or PR.

3. Missing Duplicate Option Validation
Neither the form nor model validation checks for duplicate options. Users could accidentally create ["Grade 1", "Grade 1"].

Recommendation: Add validation in OutputFieldForm.clean_selection_options():

if len(cleaned) != len(set(cleaned)):
    raise forms.ValidationError("Selection options must be unique.")

4. No Maximum Length Validation for Individual Options
While the form validates the count (max 7), there's no validation on the length of each option string. Very long options could cause UI or LLM prompt issues.

Recommendation: Add length validation (e.g., max 100 characters per option).
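A minimal sketch of such a check, assuming the 100-character ceiling mentioned above. `MAX_OPTION_LENGTH` and `validate_option_lengths` are hypothetical names; in the real form this logic would presumably live inside `clean_selection_options()` and raise `forms.ValidationError` rather than `ValueError`.

```python
MAX_OPTION_LENGTH = 100  # assumed ceiling; the review only offers it as an example


def validate_option_lengths(options, max_length=MAX_OPTION_LENGTH):
    """Raise ValueError naming the first option longer than max_length."""
    for index, option in enumerate(options, start=1):
        if len(option) > max_length:
            raise ValueError(
                f"Selection {index} is {len(option)} characters long; "
                f"the maximum is {max_length}."
            )
    return options
```

Reporting the position of the offending option keeps the message actionable when several inputs are shown in the UI.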

Medium Priority

5. LocaleMiddleware Addition Appears Unrelated
In settings/base.py:95, the addition of LocaleMiddleware seems unrelated to the selection output type feature. According to the PR description, this is about "detecting locale from users browser", but it's not connected to the selection feature.

Recommendation: This change should be in a separate PR or needs explanation in the description of why it's required for this feature.

6. Console.log Statement in Production Code
core.js:38 contains console.log(inputs); which should be removed.

7. JavaScript: No Validation for Empty String Options
The syncOptions() method in core.js:147-154 filters empty strings, but users can still have empty inputs in the UI temporarily. Consider adding visual feedback or preventing submission with empty options.

Low Priority

8. Magic Number 7
The maximum number of options (7) is hardcoded in multiple places:

  • JavaScript: core.js:109
  • Form: forms.py:255
  • Model: models.py:114

Recommendation: Extract to a constant.

9. Type Annotation Inconsistency
In processor_utils.py:1, Literal is imported from typing, but line 8 uses the newer type Numeric = float | int syntax (the | union requires Python 3.10+, and the type statement requires Python 3.12+). Consider consistency (either all modern or all traditional).

10. Missing Accessibility Attributes
The template _selection_options_field.html could benefit from additional ARIA attributes for better screen reader support, particularly for the dynamic list of options.

🔒 Security Concerns

Low Risk: No significant security vulnerabilities identified. The JSON parsing is properly error-handled, and all user inputs are validated and sanitized.

⚡ Performance Considerations

Minor: The implementation is efficient for the expected scale (max 7 options). The x-effect="syncOptions()" in Alpine.js will run on every reactive change, which is fine for this small dataset.

📝 Suggestions

  1. Documentation: Consider adding docstrings to the SelectionOptions JavaScript function and the new form methods.

  2. Integration Tests: Add tests that verify the end-to-end flow including:

    • Saving and retrieving selection options from the database
    • LLM prompt generation with selection options
    • Processing LLM responses with selection values
  3. User Feedback: Consider adding client-side validation feedback in the UI when users try to add duplicate options or exceed the limit.

🎯 Recommendations for Merge

Before merging, please address:

  1. Fix the Literal type safety issue in processor_utils.py
  2. Add duplicate option validation
  3. Remove console.log statement
  4. Clarify or separate the LocaleMiddleware change
  5. Revert unrelated formatting changes in formset.html or explain them

Nice to have:

  • Extract magic number 7 to a constant
  • Add length validation for individual options
  • Add integration tests

Overall Assessment

This is a solid feature implementation with good test coverage and user experience. The code follows Django and Google Python Style Guide conventions. With the critical issues addressed (particularly the Literal type issue), this PR will be ready to merge.

Estimated effort to address critical issues: 1-2 hours

Great work on the comprehensive implementation! 🎉


Review generated by Claude Code


claude bot commented Nov 25, 2025

Pull Request Review: Selection Output Type

Summary

This PR adds a new "Selection" output type to the extraction system, allowing users to define enumerated values (up to 7 options) that the LLM can choose from. It also introduces an "array" toggle feature for all output types. The implementation includes comprehensive form handling, validation, and testing.

✅ Strengths

Code Quality

  • Well-structured architecture: Clean separation between frontend (Alpine.js) and backend (Django) logic
  • Comprehensive validation: Both client-side and server-side validation at multiple levels (form, model, and database)
  • Good type safety: Proper use of Python type hints and Pydantic's Literal type for selection options
  • Follows project conventions: Adheres to Google Python Style Guide and existing patterns in the codebase

Testing

  • Good test coverage: Tests cover the main scenarios:
    • Valid selection options accepted
    • Empty selection options rejected for Selection type
    • Selection options rejected for non-Selection types
    • Whitespace trimming
    • Array toggle functionality
  • Uses appropriate fixtures: Leverages ExtractionJobFactory for test setup

UI/UX

  • Interactive and intuitive: Dynamic form that shows/hides selection inputs based on output type
  • Clear visual feedback: Counter showing "N / 7" options, disabled states when limits reached
  • Accessibility: Uses proper ARIA attributes (aria-pressed, aria-label)

⚠️ Issues and Concerns

1. Critical: Formset Template Indentation Change

Location: radis/core/templates/cotton/formset.html:6-11

The PR modifies indentation in the formset template, which appears unrelated to the feature. This change is cosmetic and unrelated to the PR's purpose.

Recommendation: Revert this change or explain why it's necessary.

2. Potential Bug: Race Condition in syncState

Location: radis/core/static/core/core.js:4 and core.js:181-185

The template uses x-effect="syncState()" which means syncState() is called every time any reactive property changes. This could lead to unnecessary DOM updates.

Recommendation: Consider using @change events or Alpine's $watch for more explicit control.

3. Missing Edge Case Validation - Duplicates

Location: radis/extractions/forms.py:253-279

The form validation allows duplicate selection options. Users could create selections like ["Grade 1", "Grade 1", "Grade 2"], which would confuse the LLM and users.

Recommendation: Add duplicate detection in the validation logic.

4. Inconsistent Validation Between Form and Model

Location: radis/extractions/forms.py:253-279 vs radis/extractions/models.py:107-134

Both the form and model perform similar validation with slight differences. Consider consolidating validation logic to avoid duplication and ensure consistency.

5. Performance: String Concatenation in Loop

Location: radis/extractions/utils/processor_utils.py:37-52

String concatenation in a loop creates new string objects each iteration. Consider using a list and "".join() instead.
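A sketch of the list-and-join pattern, using a hypothetical `FieldSpec` dataclass in place of the `OutputField` model and made-up prompt wording; the real `generate_output_fields_prompt()` may format its lines differently.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    """Hypothetical stand-in for the OutputField model."""
    name: str
    description: str
    selection_options: list = field(default_factory=list)
    is_array: bool = False


def build_prompt(fields):
    """Collect one line per field, then join once instead of using += in the loop."""
    lines = []
    for spec in fields:
        parts = [f"{spec.name}: {spec.description}"]
        if spec.selection_options:
            parts.append("Allowed selections: " + ", ".join(spec.selection_options))
        if spec.is_array:
            parts.append("Return a list of values.")
        lines.append(" ".join(parts))
    return "\n".join(lines)


prompt = build_prompt([
    FieldSpec("grade", "Tumor grade.", ["Grade 1", "Grade 2"]),
    FieldSpec("findings", "Key findings.", is_array=True),
])
print(prompt)
```

With only a handful of fields the speed difference is likely negligible, so the main benefit here is readability: each optional note becomes one `append` rather than a chain of string additions.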

6. Missing Test Cases

The following scenarios are not tested:

  • Duplicate selection options
  • Selection options with only whitespace
  • Very long selection options
  • Unicode/special characters in selection options
  • Array selection combinations
  • Integration test for full workflow
  • JavaScript unit tests for SelectionOptions component
  • Most important: Tests for generate_output_fields_schema() with Selection type to ensure Pydantic Literal types work correctly

7. Documentation: Magic Number

Location: Multiple files

The number 7 appears multiple times as the maximum selection options without explanation.

Recommendation: Extract to a constant with documentation explaining the rationale.

8. Minor: Console.log in Production Code

Location: radis/core/static/core/core.js:70

Remove or convert to proper debugging statement.

9. Locale Middleware Addition

Location: radis/settings/base.py:95

The addition of LocaleMiddleware is mentioned as part of a separate commit. Consider splitting this into a separate PR or document why it's required for the Selection feature.

🔒 Security Assessment

Overall: No significant security vulnerabilities detected.

  • ✅ Input validation is comprehensive
  • ✅ XSS protection via Django's auto-escaping
  • ✅ JSON parsing has proper error handling
  • ✅ No SQL injection risks (uses ORM)
  • ✅ CSRF protection maintained

📊 Performance Considerations

  • ⚠️ String concatenation in loop could be optimized
  • ✅ JSON fields are indexed appropriately
  • ✅ Database migrations are non-blocking
  • ✅ Frontend JavaScript is lightweight
  • ⚠️ x-effect="syncState()" may cause unnecessary re-renders

🧪 Test Coverage Assessment

Coverage: Moderate (60-70% estimated)

Covered:

  • Form validation (happy path and error cases)
  • Model validation
  • Array toggle

Missing:

  • JavaScript component testing
  • Template rendering tests
  • Integration tests
  • Edge cases (duplicates, unicode, etc.)
  • Critical: processor_utils.py changes (no tests for Selection type in Pydantic schema)

📝 Recommendations Summary

Must Fix (Before Merge)

  1. Add duplicate detection in selection options validation
  2. Add tests for processor_utils.py Selection type handling - this is critical to ensure Pydantic Literal types work correctly with the LLM
  3. Revert or justify the formset template indentation change

Should Fix

  1. Extract magic number 7 to a named constant
  2. Remove console.log from production code
  3. Optimize string concatenation in generate_output_fields_prompt()
  4. Consider extracting locale middleware to separate PR

Nice to Have

  1. Add more comprehensive test coverage for edge cases
  2. Add JavaScript unit tests
  3. Consider using $watch instead of x-effect for better control

🎯 Overall Assessment

Quality Score: 7.5/10

This is a solid feature implementation with good structure and testing fundamentals. The main concerns are:

  • Missing duplicate validation (impacts user experience)
  • Missing tests for the Pydantic schema generation with Selection type (critical for LLM integration)
  • Some minor performance and code quality issues

The code follows the project's conventions well and demonstrates good understanding of the Django/Alpine.js architecture. With the recommended fixes, this would be production-ready.

Recommendation: Request changes for the duplicate validation and additional tests, then approve.


Review generated by Claude Code

claude bot commented Nov 25, 2025

Code Review: Selection Output Type Feature

Thank you for this comprehensive PR! The addition of a Selection output type is a valuable feature. Here's my detailed review:


✅ Strengths

Well-Structured Implementation

  • Clean separation of concerns: Form logic, model validation, and JavaScript component are properly separated
  • Good test coverage: Tests cover form validation, model validation, and edge cases
  • Proper migration strategy: Two migrations cleanly add the new fields
  • User experience: The Alpine.js component provides good interactivity

Code Quality

  • Type hints are used appropriately in Python code
  • JSDoc comments document the JavaScript functions
  • Follows Django best practices with model clean() methods

🔍 Issues & Recommendations

HIGH PRIORITY

1. Missing Model save() override (models.py:107-133)

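The OutputField.clean() method trims and normalizes self.selection_options, but Django does not call clean() from save(), so the cleaned value only persists when validation runs before saving. Overriding save() guarantees that it does: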

def save(self, *args, **kwargs):
    self.full_clean()  # Ensure validation runs
    super().save(*args, **kwargs)

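Why: Currently, clean() only runs when a ModelForm (including the admin) or an explicit full_clean() call triggers validation, not on direct model operations such as OutputField.objects.create(). The trimming/cleaning logic won't run for programmatic saves.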

2. JSON Injection Risk (forms.py:225)

self.initial["selection_options"] = json.dumps(initial_options)

While the data comes from the database (trusted source), it's good practice to validate that initial_options is actually a list before dumping:

if isinstance(initial_options, list):
    self.initial["selection_options"] = json.dumps(initial_options)
else:
    self.initial["selection_options"] = "[]"

3. Inconsistent Validation (processor_utils.py:24)

The error message differs from form/model validation:

  • Form/Model: "Add at least one selection option"
  • Processor: "Selection output requires at least one option"

Recommendation: Use consistent messages, or better yet, call field.full_clean() before processing.


MEDIUM PRIORITY

4. Array Toggle Button Accessibility (core.js:119-123)

Good that you added aria-pressed, but the button should also have an aria-label:

arrayToggleButton.setAttribute("aria-label", 
  this.isArray ? "Disable array output" : "Enable array output");

5. Potential Empty Literal Type (processor_utils.py:25)

If field.selection_options is an empty tuple after validation failure, this creates an invalid type:

output_type = Literal.__getitem__(options)  # Literal[()] is invalid

The earlier check (lines 23-24) should prevent this, but consider making it more explicit with an assert or by raising a more specific exception.

6. Formset Template Formatting Change (cotton/formset.html)

The indentation change seems unrelated to this PR. Was this intentional? It changes from 4-space to 0-space indentation for the template/div elements.

Recommendation: Revert formatting changes unrelated to the feature, or explain why this was needed.

7. Magic Number: 7 Options Max (multiple files)

The limit of 7 options appears hardcoded in multiple places:

  • forms.py:274
  • models.py:115
  • core.js:108

Recommendation: Define as a constant in one place:

# models.py
class OutputField(models.Model):
    MAX_SELECTION_OPTIONS = 7
    # ... use OutputField.MAX_SELECTION_OPTIONS everywhere

LOW PRIORITY / SUGGESTIONS

8. Test Coverage Gaps

The tests are good but could be expanded:

  • ✅ Form validation with/without selections
  • ✅ Model validation
  • ✅ Array toggle
  • Missing: Integration test for generate_output_fields_schema() with Selection type
  • Missing: Test for empty selection options edge case in processor
  • Missing: Test for array + selection combination

Suggested test:

@pytest.mark.django_db
def test_generate_schema_with_selection_type():
    job = ExtractionJobFactory.create()
    OutputField.objects.create(
        job=job,
        name="grade",
        output_type=OutputType.SELECTION,
        selection_options=["A", "B", "C"]
    )
    schema = generate_output_fields_schema(job.output_fields)
    # Verify schema has Literal["A", "B", "C"] type

9. JavaScript Error Handling (core.js:68)

} catch (err) {
    console.warn("Invalid selection options payload", err);
}
return [];

Consider providing user feedback instead of silently failing. Perhaps set an error state that the template can display.

10. LocaleMiddleware Addition (settings/base.py:95)

This appears unrelated to the Selection feature. Should this be in a separate PR?

11. Enter Key Prevention (core.js:49-64)

Good addition to prevent accidental form submission, but:

  • The attribute name data-prevent-enter-submit is verbose. Consider data-no-enter-submit
  • Only added to wizard summary form—should it apply to other forms too?

🔒 Security Analysis

✅ No Critical Issues Found

Validated:

  • ✅ JSON parsing has proper error handling
  • ✅ XSS prevention: Data is properly escaped in templates
  • ✅ No SQL injection risk (using Django ORM)
  • ✅ Input validation at multiple layers (form, model, processor)
  • ✅ No sensitive data exposure

Minor note: The console.warn at core.js:68 could leak some info in browser console, but this is acceptable for debugging.


🧪 Test Coverage Summary

Coverage: Good (estimated ~80%)

Covered:

  • Form validation (empty options, wrong type)
  • Model validation (trim, max options)
  • Array toggle functionality

Needs Coverage:

  • Processor integration with Selection type
  • Array + Selection combination
  • Edge cases in schema generation

🎯 Performance Considerations

Good:

  • ✅ JSONField suits the small option lists stored here (Django does not index it by default, but nothing filters on it)
  • ✅ No N+1 queries introduced
  • ✅ JavaScript is lightweight and doesn't block

Minor optimizations:

  • The generate_output_fields_prompt() function could cache results if called frequently
  • Consider adding a database index on output_type if filtering by type is common

📋 Checklist Before Merge

  • Revert or split out unrelated changes (cotton/formset.html formatting, LocaleMiddleware)
  • Add save() override to OutputField model or ensure full_clean() is always called
  • Make MAX_SELECTION_OPTIONS a constant
  • Add integration tests for processor with Selection type
  • Improve JavaScript error handling with user feedback
  • Add aria-label to array toggle button
  • Consistent error messages across form/model/processor

📚 Documentation

Consider adding:

  • User-facing documentation explaining the Selection type
  • Developer documentation on the 7-option limit rationale
  • Migration guide if existing installations need to be updated

Summary

This is a well-implemented feature with good test coverage and clean code organization. The main concerns are:

  1. Model validation not guaranteed to run (needs save() override)
  2. Unrelated changes should be separated
  3. Some test gaps for integration scenarios

The code follows the Google Python Style Guide well and demonstrates good Django/Alpine.js patterns. Great work overall! 🎉

Recommendation: Request changes for the model validation issue, then approve after addressing.

claude bot commented Nov 25, 2025

PR Review: Selection Output Type

Thank you for this comprehensive feature addition! This PR adds a valuable "Selection" output type that behaves like an enum, allowing users to define up to 7 predefined options for LLM-based extraction. Overall, the implementation is solid and well-tested. Below are my findings:


Strengths

  1. Comprehensive test coverage: The new tests in test_forms.py cover key scenarios including form validation, selection options parsing, array toggle functionality, and edge cases.

  2. Good validation at multiple layers: Validation logic exists both in the form (OutputFieldForm.clean) and model (OutputField.clean), providing defense in depth.

  3. Clean Alpine.js integration: The SelectionOptions component follows the existing FormSet pattern nicely and integrates well with the Django form system.

  4. Proper use of Pydantic Literal types: The processor_utils.py correctly generates Pydantic schemas with Literal types for selections, which will enforce validation at the LLM processing layer.

  5. User experience enhancements: The array toggle button and the prevention of accidental form submission via Enter key are thoughtful UX improvements.


🐛 Potential Bugs & Issues

  1. Inconsistent whitespace handling in formset.html (radis/core/templates/cotton/formset.html:6-9)

    • The indentation changes appear to be unintentional. The template moved from proper indentation to no indentation for the <template> and <div class="formset-forms"> blocks.
    • Impact: This doesn't affect functionality but reduces code consistency.
    • Recommendation: Revert the whitespace changes unless they were intentional.
  2. Missing validation for duplicate selection options (radis/extractions/forms.py:268-288, radis/extractions/models.py:117-128)

    • The code doesn't check if selection options contain duplicates. For example: ["Grade 1", "Grade 1"] would be accepted.
    • Impact: Could lead to confusing UX and potentially ambiguous LLM responses.
    • Recommendation: Add duplicate checking in both OutputFieldForm.clean_selection_options and OutputField.clean.
  3. Locale middleware added without explanation (radis/settings/base.py:95)

    • The commit message for commit da048fb mentions "Allow middleware to detect locale from users browser", but this seems unrelated to the Selection output type feature.
    • Impact: This introduces functionality that's outside the scope of this PR and may have unintended consequences.
    • Recommendation: This should be in a separate PR. If it's required for translations in the selection feature, please document why.
  4. Potential XSS risk in template (radis/extractions/templates/extractions/_selection_options_field.html:27)

    • The template uses x-model="options[index]" which directly binds user input. While Alpine.js generally escapes output, the data eventually goes into a hidden JSON field.
    • Impact: Low risk since Django form validation sanitizes the data, but worth noting.
    • Recommendation: Current implementation is likely safe, but ensure the JSON encoding/decoding is always done through Django's form system, not client-side submission.
  5. Array toggle button accessibility (radis/extractions/forms.py:238-243)

    • The array toggle button has aria-pressed which is good, but no aria-label or accessible name.
    • Impact: Screen reader users may not understand what the "[ ]" button does.
    • Recommendation: Add an aria-label attribute, e.g., aria-label="Toggle array output".
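
The duplicate check recommended in item 2 above could look roughly like the sketch below. The helper name `find_duplicate_options` is hypothetical, and comparing values exactly after trimming (case-sensitive) is an assumption; the PR may prefer a case-insensitive comparison.

```python
def find_duplicate_options(options: list[str]) -> list[str]:
    """Return options that occur more than once, compared after trimming."""
    seen: set[str] = set()
    duplicates: list[str] = []
    for raw in options:
        value = raw.strip()
        if value in seen and value not in duplicates:
            duplicates.append(value)
        seen.add(value)
    return duplicates
```

Both OutputFieldForm.clean_selection_options and OutputField.clean could call a helper like this and raise a validation error when the returned list is non-empty.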

Performance Considerations

  1. JSON field queries: Using JSONField for selection_options is appropriate, but be aware that Django does not index this field by default, so queries filtering on it cannot use an index.

    • Impact: Minimal for this use case since you're not filtering by selection values.
  2. Frontend reactivity: The x-effect="syncState()" in the Alpine component runs on every reactive change, which could be optimized.

    • Impact: Low - the form is small and changes are infrequent.
    • Recommendation: Consider debouncing if performance issues arise in the future.

🔒 Security Concerns

  1. JSON injection protection: The form properly uses json.loads() and validates the structure, which is good. The error handling for JSONDecodeError prevents crashes.

    • ✅ Well implemented.
  2. Input sanitization: String options are stripped of whitespace and checked for empty values.

    • ✅ Good defense against trivial injection attempts.
  3. Max options limit: The 7-option limit is enforced in multiple places (form, model, JavaScript).

    • ✅ Prevents potential DoS through large payloads.
  4. Type validation: The code checks that all options are strings and rejects other types.

    • ✅ Prevents type confusion attacks.

No significant security vulnerabilities identified.


🧪 Test Coverage

Excellent test coverage! The new tests include:

  • ✅ Form accepts valid selection options
  • ✅ Form requires options for Selection type
  • ✅ Form rejects options for non-Selection types
  • ✅ Model validation trims whitespace
  • ✅ Model validation rejects options for wrong types
  • ✅ Array toggle functionality

Suggestions for additional tests:

  1. Test duplicate detection (once implemented)
  2. Test the JavaScript SelectionOptions component behavior (consider Playwright or similar for E2E)
  3. Test generate_output_fields_schema with selection types in test_processors.py
  4. Test that selection options appear correctly in the LLM prompt via generate_output_fields_prompt
  5. Test edge case: what happens if selection_options is set to a very long string that exceeds typical limits?

📝 Code Quality & Best Practices

  1. Follows Google Python Style Guide: The code is well-formatted and uses type hints appropriately. ✅

  2. Magic number: The maxOptions: 7 is defined in JavaScript but should ideally be a constant shared between backend and frontend.

    • Recommendation: Consider defining this in Django settings and passing it to the template context.
  3. Error messages: User-facing error messages are clear and helpful. ✅

  4. Consistent naming: selection_options, is_array, etc. follow Django conventions. ✅

  5. DRY principle: Some validation logic is duplicated between form and model, but this is acceptable for defense in depth.


🔧 Minor Issues & Suggestions

  1. Type annotation in processor_utils.py:25: Using Literal.__getitem__(options) is clever but slightly unconventional. Consider adding a comment explaining this pattern for future maintainers.

  2. Help text: The form fields could benefit from help_text attributes to guide users.

  3. Migration dependencies: Ensure migrations 0004 and 0005 are tested in a clean database setup.

  4. Enter key prevention: The data-prevent-enter-submit feature is useful but only applied to the summary form. Consider applying it consistently to all multi-step forms.


🎯 Recommendations Summary

Must fix:

  • ❗ Remove or explain the LocaleMiddleware addition (separate PR)
  • ❗ Add duplicate detection for selection options

Should fix:

  • ⚠️ Revert unintentional formset.html whitespace changes
  • ⚠️ Add aria-label to array toggle button

Nice to have:

  • 💡 Add tests for processor_utils.py with selection types
  • 💡 Share max options constant between backend and frontend
  • 💡 Add form field help text

Conclusion

This is a well-implemented feature with good test coverage and thoughtful UX considerations. The code follows Django best practices and the Google Python Style Guide. The main concerns are the unrelated middleware change and the missing duplicate validation. Once those are addressed, this will be ready to merge!

Great work overall! 🚀

claude bot commented Nov 26, 2025

Pull Request Review: Selection Output Type Feature

Summary

This PR adds a new "Selection" output type that allows users to define enumerated values (up to 7 options) for extraction output fields. The implementation includes database migrations, form validation, Alpine.js components, and comprehensive test coverage. Overall, this is a well-structured and thoughtfully implemented feature.


✅ Strengths

Code Quality

  • Excellent separation of concerns: Frontend (Alpine.js), backend (Django forms/models), and validation logic are cleanly separated
  • Consistent validation: Both client-side (JS) and server-side (Python) validation with matching rules
  • Good defensive programming: Proper null/undefined checks in JavaScript (hiddenInput?.value, arrayInput?.value)
  • Well-documented code: Clear JSDoc comments for JavaScript functions

Test Coverage

  • Comprehensive test suite: 10 new tests covering various scenarios:
    • Valid selection options
    • Empty selection validation
    • Duplicate detection
    • Whitespace trimming
    • Unicode and long option support
    • Array toggle functionality
    • Type-specific validation
  • Both unit and integration tests: Tests for forms, models, and processor utilities

User Experience

  • Smart state preservation: The lastSelectionOptions mechanism preserves user input when switching output types
  • Good accessibility: Proper ARIA attributes (aria-pressed, aria-label)
  • Clear validation messages: User-friendly error messages at both form and model levels

claude bot commented Nov 26, 2025

⚠️ Issues & Recommendations

1. Security: JSON Injection Risk ⚠️

Location: radis/extractions/forms.py:259-268

The form accepts JSON from user input but only validates the structure after parsing. Consider adding a size limit before parsing to prevent potential DoS attacks with large payloads.

Recommendation: Add a length check before json.loads():

if len(raw_value) > 10000:  # Reasonable limit
    raise forms.ValidationError("Selection data too large.")
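
As a slightly fuller sketch of the same idea, the length check can sit in front of parsing and structure validation in one helper. `parse_selection_payload` and `MAX_PAYLOAD_CHARS` are hypothetical names, and the sketch raises ValueError instead of forms.ValidationError so that it runs without Django.

```python
import json

MAX_PAYLOAD_CHARS = 10_000  # assumed limit, matching the snippet above


def parse_selection_payload(raw_value: str) -> list:
    """Parse the hidden-field JSON, rejecting oversized or non-list input."""
    # Check the size first so that huge payloads are never parsed.
    if len(raw_value) > MAX_PAYLOAD_CHARS:
        raise ValueError("Selection data too large.")
    try:
        parsed = json.loads(raw_value or "[]")
    except json.JSONDecodeError as exc:
        raise ValueError("Selection data is not valid JSON.") from exc
    if not isinstance(parsed, list):
        raise ValueError("Selection data must be a list.")
    return parsed
```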

2. Potential Bug: Performance in syncState() 🐛

Location: radis/core/static/core/core.js:206-210

The x-effect="syncState()" directive re-runs whenever a reactive property it reads changes, which could cause performance issues if the options array is modified frequently.

Recommendation: Consider debouncing the sync operation or only update lastSelectionOptions when actually saving.

3. Code Duplication: Validation Logic 🔄

Location: radis/extractions/forms.py:259-287 and radis/extractions/models.py:107-140

Nearly identical validation logic exists in both form and model. While defense-in-depth is good, the duplication makes maintenance harder.

Recommendation: Extract to a shared validator function that both can use.
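
A shared validator might look like the following sketch. The function name, the constant, and most error messages are hypothetical; ValueError is used so that the example runs without Django.

```python
MAX_SELECTION_OPTIONS = 7  # limit stated in the PR description


def clean_selection_options(options: object) -> list[str]:
    """Validate and normalize options; raise ValueError on the first problem."""
    if not isinstance(options, list):
        raise ValueError("Selection options must be a list.")
    cleaned: list[str] = []
    for option in options:
        if not isinstance(option, str):
            raise ValueError("Selection options must be strings.")
        value = option.strip()
        if not value:
            raise ValueError("Selection options cannot be empty.")
        cleaned.append(value)
    if not cleaned:
        raise ValueError("Add at least one selection option.")
    if len(cleaned) > MAX_SELECTION_OPTIONS:
        raise ValueError(
            f"Provide at most {MAX_SELECTION_OPTIONS} selection options."
        )
    return cleaned
```

In the project itself the helper could raise django.core.exceptions.ValidationError directly, since that is the same exception class that forms and models use, which would let both layers call it without translating errors.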

4. Missing: Maximum String Length Validation 📏

Location: radis/extractions/forms.py:274-280

While you validate for empty strings and duplicates, there's no maximum length check for individual selection options. Your test creates a 156-character string, but there's no enforced limit.

Recommendation: Add a reasonable maximum length (e.g., 200 characters) to prevent database issues and UI rendering problems.

5. Code Style: Magic Numbers 🔢

Location: Multiple files

The number 7 (max options) appears in multiple places without explanation:

  • radis/core/static/core/core.js:136 - maxOptions: 7
  • radis/extractions/forms.py:282 - if len(cleaned) > 7
  • radis/extractions/models.py:116 - if len(self.selection_options) > 7

Recommendation: Extract to a named constant like SELECTION_MAX_OPTIONS with a comment explaining it's optimized for LLM structured output.

6. JavaScript: Console.log Left in Production 🐛

Location: Lines 38 and 87 in core.js

console.log(this.formCount);  // Line 87
console.log(inputs);  // Line 38

Recommendation: Remove or replace with proper logging.

claude bot commented Nov 26, 2025

🔍 Edge Cases & Additional Considerations

Edge Cases to Consider:

  1. XSS in selection_options: Appears safe due to Django template auto-escaping, but verify in all templates where options are displayed
  2. Database migration rollback: Migrations look reversible, but worth testing the rollback path
  3. Scale concerns: Consider what happens with many forms containing selection fields (memory usage)

📊 Performance Assessment

Current Performance: ✅ Good

  • JSON field for selection_options is efficient for small arrays
  • Boolean field for is_array is optimal
  • No additional database queries introduced

Potential Improvements:

  1. Consider adding a database index on output_type if filtering by type becomes common
  2. The Alpine.js component could benefit from debouncing in high-interaction scenarios

🛡️ Security Assessment

Overall: ✅ Good with minor concerns

Strengths:

  • Django's built-in XSS protection via template auto-escaping
  • JSON validation before processing
  • Server-side validation prevents malicious data
  • No SQL injection risk (using ORM)
  • CSRF protection handled by Django

Concerns:

  1. ⚠️ No size limit on JSON input before parsing
  2. ⚠️ No sanitization of selection option content (special characters, emojis)

✅ Test Coverage Assessment

Coverage: Excellent 🌟

The PR includes 10 new tests that cover:

  • ✅ Happy path scenarios
  • ✅ Validation edge cases
  • ✅ Unicode handling
  • ✅ Array functionality
  • ✅ Processor utility integration

Suggested Additional Tests:

  1. Integration test for the full form submission flow
  2. JavaScript unit tests for the SelectionOptions component
  3. Test for migration rollback
  4. Test for XSS prevention in templates

claude bot commented Nov 26, 2025

🎯 Overall Assessment

Category | Rating | Notes
Code Quality | ⭐⭐⭐⭐⭐ | Clean, well-structured, follows Django best practices
Test Coverage | ⭐⭐⭐⭐⭐ | Comprehensive tests covering edge cases
Security | ⭐⭐⭐⭐☆ | Minor improvements needed (size limits)
Performance | ⭐⭐⭐⭐⭐ | No performance concerns
UX/Accessibility | ⭐⭐⭐⭐⭐ | Good ARIA support, clear UI
Documentation | ⭐⭐⭐☆☆ | Code is clear, but missing user docs

Overall: ⭐⭐⭐⭐⭐ (4.7/5)


✅ Approval Recommendation

APPROVE with minor changes

This is a high-quality PR that demonstrates:

  • Strong engineering practices
  • Comprehensive testing
  • Good UX design
  • Proper validation

The issues identified are minor and can be addressed in follow-up PRs or quick fixes. None are blocking.

Suggested Action Items (Priority Order):

  1. 🔴 High: Add JSON size limit validation (security concern)
  2. 🟡 Medium: Remove console.log statements
  3. 🟡 Medium: Add max length validation for individual selection options
  4. 🟢 Low: Extract validation to shared function (reduce duplication)
  5. 🟢 Low: Standardize error messages between form and model
  6. 🟢 Low: Extract magic number 7 to a named constant
  7. 🟢 Low: Add user documentation for the Selection type feature

Great work on this feature! The implementation is solid, the tests are comprehensive, and the code quality is excellent. 🎉


Reviewed by: Claude (AI Code Review Assistant)
Review Date: 2025-11-26
PR: #152 - Selection output type

claude bot commented Nov 26, 2025

Pull Request Review: Selection Output Type

Overview

This PR adds a "Selection" output type (enum-like) to the extraction system, along with an array toggle feature. The implementation includes frontend Alpine.js components, Django forms/models, migrations, and comprehensive test coverage.

✅ Strengths

Code Quality

  • Excellent test coverage: Comprehensive tests for form validation, model validation, edge cases (duplicates, whitespace, unicode, long strings)
  • Good separation of concerns: Frontend logic in Alpine.js, backend validation in both form and model layers
  • Defensive validation: Validation exists at multiple layers (form + model) which provides good data integrity
  • User experience: State preservation when switching between output types is a nice touch

Implementation Details

  • Well-structured JavaScript component with clear responsibilities
  • Proper use of Django's validation framework
  • Good accessibility features (aria-pressed, aria-label)
  • Migration files look clean and correct

🔍 Issues & Concerns

1. Security: XSS Vulnerability Risk ⚠️ HIGH

Location: radis/extractions/forms.py:236-243

The HTML button is hardcoded in the form layout without escaping. While this specific case is safe (static HTML), it sets a risky pattern. More importantly, selection options could be vulnerable to XSS when displayed in templates if not properly escaped.

Recommendation:

  • Verify that radis/extractions/templates/extractions/extraction_job_detail.html:127 and extraction_job_wizard_summary.html:104 properly escape {{ option }} when rendering
  • Consider using Django template components or a separate template file instead of inline HTML strings

2. Data Integrity: JSONField Validation Timing ⚠️ MEDIUM

Location: radis/extractions/models.py:107-139

The clean() method modifies self.selection_options in place, which can cause issues:

  • Changes occur before save, potentially causing confusion about when validation happens
  • Direct modification of mutable fields in clean() can have side effects

Recommendation: Store the cleaned options in a variable and only assign if validation passes completely.
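
A minimal sketch of that recommendation, using a plain class as a hypothetical stand-in for the Django model (the real OutputField.clean() is not shown here, and ValueError replaces ValidationError so the example runs without Django):

```python
class OutputFieldSketch:
    """Hypothetical stand-in for the model, reduced to one attribute."""

    def __init__(self, selection_options: list[str]) -> None:
        self.selection_options = selection_options

    def clean(self) -> None:
        cleaned: list[str] = []
        for option in self.selection_options:
            value = option.strip()
            if not value:
                raise ValueError("Selection options cannot be empty.")
            cleaned.append(value)
        # Assign only after every option has passed validation.
        self.selection_options = cleaned
```

If validation fails partway through, the instance keeps its original value instead of a half-cleaned list.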

3. Magic Number 📊 LOW

Location: Multiple files

The number 7 (max selection options) is hardcoded in:

  • JavaScript: core.js:136
  • Python form: forms.py:282
  • Python model: models.py:115

Recommendation: Extract to a constant (e.g., MAX_SELECTION_OPTIONS = 7) that can be shared and easily updated.

4. Form Validation Inconsistency 🔧 LOW

Location: radis/extractions/forms.py:289-293

The clean_is_array() method silently defaults to False for invalid input. While pragmatic, this could hide bugs.

Recommendation: Consider logging unexpected values or being more explicit about what values are accepted.

5. Incomplete Whitespace Formatting 📝 LOW

Location: radis/core/templates/cotton/formset.html

The indentation changes appear to be unintentional whitespace modifications:

-    <template>
+<template>

This is inconsistent with the rest of the file formatting.

Recommendation: Revert these whitespace-only changes unless they were intentional reformatting.

6. JavaScript Error Handling 🐛 LOW

Location: radis/core/static/core/core.js:180-182

JSON parsing errors are only logged to console with console.warn(). Users won't see these errors, making debugging difficult.

Recommendation: Consider surfacing parse errors to the user or having a more robust error recovery strategy.

7. Potential Race Condition ⚡ LOW

Location: radis/core/static/core/core.js:216-222

Using $nextTick() to focus the input after adding an option relies on Alpine's reactivity cycle. If the DOM update is delayed, focus might not work.

Recommendation: While this is likely fine, consider testing on slower devices/browsers.

8. Type Safety 🔤 LOW

Location: radis/extractions/utils/processor_utils.py:22-25

The Literal type construction is correct, but there's no runtime check that selection_options is non-empty before converting it to a tuple. The check on lines 23-24 helps, but it comes after the assignment on line 22.

Recommendation: Reorder lines 22-24 to check emptiness before creating the tuple.
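
The reordering could be sketched as follows. `build_selection_type` is a hypothetical helper, not the actual code in processor_utils.py; the error message is the processor message quoted in an earlier review of this PR.

```python
from typing import Any, Literal


def build_selection_type(selection_options: list[str]) -> Any:
    """Return a Literal type over the options, checking emptiness first."""
    if not selection_options:
        raise ValueError("Selection output requires at least one option")
    options = tuple(selection_options)
    # Subscripting with a tuple is equivalent to Literal["A", "B", ...].
    return Literal[options]
```

Static type checkers reject a dynamically built Literal, which is why the return type is annotated as Any; the construction only matters at runtime, where Pydantic inspects the resulting type.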

🧪 Test Coverage

Excellent coverage including:

  • ✅ Form validation (valid/invalid inputs)
  • ✅ Model validation
  • ✅ Edge cases (duplicates, whitespace, unicode, long strings)
  • ✅ Array toggle functionality
  • ✅ Schema generation with Literal types

Missing tests:

  • ❌ Frontend JavaScript component tests (Alpine.js logic)
  • ❌ Integration tests for the full form submission flow
  • ❌ Tests for the "prevent Enter submit" functionality

🎯 Performance Considerations

  • JSONField lookups: Querying/filtering by selection_options will be slow without proper indexing. If you need to query by selection values, consider a separate related model.
  • Frontend: The Alpine component is lightweight and should perform well for the expected use case (≤7 options).

🔒 Security Summary

  1. XSS Risk: Verify template escaping for selection options ⚠️
  2. Input Validation: Strong validation at multiple layers ✅
  3. JSON Injection: Proper JSON parsing with error handling ✅
  4. SQL Injection: Using Django ORM, no direct SQL ✅

📋 Recommendations Summary

High Priority

  1. ⚠️ Verify XSS protection in templates where selection options are displayed
  2. 🔧 Review and potentially refactor the model clean() method to avoid in-place mutation

Medium Priority

  1. 📊 Extract magic number 7 to a shared constant
  2. 🧪 Add integration tests for the full form flow
  3. 📝 Revert unintentional whitespace changes in formset.html

Low Priority

  1. 🐛 Improve JavaScript error handling UX
  2. 🔤 Add more explicit type checking in processor_utils.py
  3. 📖 Consider adding JSDoc comments for the Alpine component's public interface

🎉 Conclusion

This is a solid implementation with excellent test coverage and good UX considerations. The main concerns are:

  • Potential XSS vulnerability (needs verification)
  • Some code organization improvements (magic numbers, validation timing)

The feature is well-designed and the code quality is generally high. After addressing the XSS verification and considering the data integrity concern, this should be ready to merge.

Overall Assessment: ✅ Approve with minor changes requested

Great work on the comprehensive testing! 🚀

claude bot commented Nov 27, 2025

Pull Request Review: Selection Output Type

Overview

This PR adds a "Selection" output type to the extraction system, allowing users to define enumerated values (up to 7 options) for LLM responses. The implementation includes frontend Alpine.js components, Django form/model updates, and comprehensive test coverage.

✅ Strengths

Code Quality

  • Well-structured validation: Excellent dual-layer validation in both OutputFieldForm.clean() and OutputField.clean() methods
  • Comprehensive test coverage: Strong test suite covering edge cases (duplicates, whitespace, unicode, array toggles)
  • Type safety: Good use of Pydantic Literal types in processor_utils.py for runtime validation
  • User experience: Smart preservation of selection options when switching output types (lines 154-166 in core.js)

Best Practices

  • Data sanitization: Proper trimming and cleaning of selection options before persistence
  • Accessibility: Includes aria-pressed attributes on toggle buttons
  • Clear error messages: User-friendly validation error messages

🔍 Issues & Suggestions

1. Critical: Indentation-Only Change in formset.html

Location: radis/core/templates/cotton/formset.html (lines 6-10)

Issue: This appears to be an unintentional formatting change. The indentation reduction doesn't improve the code and creates unnecessary diff noise.

Recommendation: Revert these whitespace-only changes unless there's a specific reason for them.

2. Code Quality: Leftover Debug Statement

Location: core.js (line 38)

console.log(inputs);

Issue: Debug console.log should be removed from production code.

Recommendation: Remove this line.

3. JavaScript: Potential Memory Leak

Location: core.js (lines 145-151)

Issue: Event listeners are added in init() without cleanup. If Alpine.js components are destroyed/recreated, this could accumulate listeners.

Recommendation: Consider using Alpine's @click directive in the template instead to let Alpine handle lifecycle management automatically.

4. Performance: x-effect Usage

Location: _selection_options_field.html (line 4)

x-effect="syncState()"

Issue: The x-effect="syncState()" expression re-runs whenever any reactive property it reads changes, so it may fire more often than needed.

Recommendation: Consider using @change events instead for more explicit control.

5. Type Safety: Variable Name Reuse

Location: processor_utils.py (line 30)

output_type = list[output_type]

Issue: Type checkers may flag this as reassigning a type. While it works at runtime, it's technically reusing a variable name for a different type.

Recommendation: Use a different variable name for clarity.
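
One way to express the rename, as a sketch with hypothetical names (`resolve_output_type`, `element_type`, `final_type`) rather than the actual code in processor_utils.py:

```python
from typing import Any


def resolve_output_type(element_type: Any, is_array: bool) -> Any:
    """Wrap the element type in list[...] when array output is enabled."""
    # A separate name keeps the element type and the final type distinct.
    final_type = list[element_type] if is_array else element_type
    return final_type
```

For example, resolve_output_type(str, True) yields list[str], while resolve_output_type(str, False) returns str unchanged.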

6. UX: Max Options Limit Not Documented

Location: Multiple files (hardcoded to 7)

Issue: The 7-option maximum is hardcoded without explanation in comments. Is this a UX decision or an LLM limitation?

Recommendation: Add a comment explaining the rationale.

7. Testing: Missing Integration Tests

Issue: No tests verify the full form submission flow with JavaScript enabled.

Recommendation: Add integration tests to ensure the Alpine.js components work correctly end-to-end.

🎯 Performance Considerations

  • Database: JSONField is appropriate for selection_options
  • Frontend: Alpine.js reactivity is efficient for this use case
  • ⚠️ Minor: Consider more explicit change tracking instead of x-effect as mentioned above

🔒 Security Assessment

  • SQL Injection: Parameterized queries via Django ORM
  • XSS: Django templates auto-escape by default
  • JSON Injection: Proper json.dumps()/JSON.parse() usage
  • Input Validation: Strong validation on both frontend and backend
  • No security vulnerabilities identified

📋 Test Coverage

Excellent test coverage! The PR includes:

  • ✅ Form validation tests (8 test cases)
  • ✅ Model validation tests
  • ✅ Pydantic schema generation tests
  • ✅ Edge cases: unicode, long strings, duplicates, whitespace
  • ⚠️ Missing: Integration tests for the full form submission flow with JavaScript enabled

🎨 Style & Conventions

  • ✅ Follows Google Python Style Guide
  • ✅ Proper type hints throughout
  • ✅ Consistent naming conventions
  • ✅ Clean code organization

📝 Recommendations Summary

Must Fix (Before Merge)

  1. Remove console.log(inputs) from core.js:38
  2. Review formset.html indentation changes - revert if unintentional

Should Fix

  1. Consider moving event listener to Alpine directive to prevent potential memory leaks
  2. Add comment explaining the 7-option maximum limit
  3. Rename variable in processor_utils.py:30 to avoid type reassignment

Nice to Have

  1. Add integration tests with JavaScript
  2. Consider more explicit change tracking instead of x-effect

✨ Overall Assessment

Strong PR with well-thought-out implementation! The code demonstrates good engineering practices with comprehensive validation, test coverage, and user experience considerations. The issues identified are mostly minor refinements rather than fundamental problems. With the debug statement removed and the indentation clarified, this is ready for merge.

Recommendation: ✅ Approve with minor changes

@claude
claude bot commented Nov 27, 2025

Code Review: Selection Output Type Feature

Thank you for this comprehensive implementation! This is a well-structured PR that adds selection/enum type support to extraction output fields. Here's my detailed review:

✅ Strengths

  1. Excellent test coverage - Comprehensive tests for forms, models, and processor utils covering edge cases like:

    • Empty selections, duplicates, whitespace-only values
    • Unicode and long option strings
    • Array toggling functionality
  2. Good separation of concerns - Clean separation between form validation, model validation, and business logic

  3. User experience - The Alpine.js component provides good UX with dynamic form updates and state preservation

  4. Documentation - The PR description clearly explains what's being added and why

🔍 Code Quality Issues

1. Magic Number Extraction ✅ (Fixed in latest commit)

The constant MAX_SELECTION_OPTIONS = 7 was properly extracted to constants.py. Good job!

2. JavaScript: Missing JSDoc for inner functions

In core.js:126-141, the helper functions parseArrayValue and parseMaxOptions lack documentation:

// Add JSDoc comments like:
/**
 * Parses a string value to boolean for array toggle state.
 * @param {string} value - The value to parse
 * @returns {boolean}
 */
const parseArrayValue = (value) => { ... }

3. Potential XSS in Template (Low Risk)

In _selection_options_field.html, the placeholder uses Alpine's string interpolation which is safe, but ensure all user input is properly escaped when rendered elsewhere.

4. Inconsistent Indentation in formset.html

Lines 6-10 in formset.html have inconsistent indentation (seems like a reformat). While this works, it's stylistically inconsistent with the rest of the file.

🐛 Potential Bugs

1. Race Condition in Alpine Component

In core.js:162-176, the outputTypeField.addEventListener('change') could fire before the component is fully initialized. Consider using Alpine's @change directive in the template instead for better lifecycle management.

2. Form Submission on Enter Key

The data-prevent-enter-submit attribute (lines 49-64) prevents form submission, but it may interfere with users who expect Enter to submit after filling the last field. Consider:

  • Only preventing Enter in the middle of multi-field forms
  • Adding visual feedback that Enter is disabled

3. Missing NULL Check in processor_utils.py

Line 25: Literal[*options] - While the code checks if not options, ensure that field.selection_options can never be None from the database. The model has default=list, which should be safe, but add a comment explaining this dependency.
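
The dependency described above can be made explicit with a guard that treats None and an empty list the same way. This is only a sketch: `build_selection_literal` is a hypothetical helper, and the error message is the one reported in the CI output for processor_utils.py:

```python
from typing import Any, Literal, Optional, Sequence, get_args


def build_selection_literal(options: Optional[Sequence[str]]) -> Any:
    """Build Literal[...] from options, rejecting None and empty input."""
    # `not options` is True for both None and an empty list, so a NULL
    # value from the database fails the same way as an empty one.
    if not options:
        raise ValueError("Selection output requires at least one option.")
    # Subscripting Literal with a tuple is equivalent to Literal[a, b, ...].
    return Literal[tuple(options)]
```

Subscripting with a tuple also avoids the star-unpacking syntax, which needs a newer Python than the tuple form does.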

⚡ Performance Considerations

1. Redundant Trimming

Selection options are trimmed in three places:

  • Form's clean_selection_options (forms.py:279)
  • Model's clean method (models.py:131)
  • JavaScript's syncOptions (core.js:204)

Recommendation: Trim only in the form's clean method and JavaScript. The model's clean should trust the form validation for performance.

2. JSON Serialization on Every Change

The syncState method runs on x-effect, meaning it serializes to JSON on every keystroke. Consider debouncing this operation:

syncState: Alpine.debounce(function() {
  this.syncOptions();
  this.syncArray();
  this.updateArrayButton();
}, 300)

🔒 Security Concerns

1. JSON Injection ✅ (Properly Handled)

The code correctly uses json.loads with proper error handling and validation. No security issues found.
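
For readers unfamiliar with the pattern, defensive decoding of a hidden JSON field might look roughly like this. `parse_options_payload` is a hypothetical name, and `ValueError` stands in for whatever validation error the form actually raises:

```python
import json
from typing import Any


def parse_options_payload(raw: str) -> list[Any]:
    """Decode the hidden-field payload and ensure it is a JSON array."""
    try:
        # Treat an empty submission as an empty list of options.
        value = json.loads(raw or "[]")
    except json.JSONDecodeError as exc:
        raise ValueError("Selection options must be valid JSON.") from exc
    # Valid JSON can still be an object, string, or number; reject those.
    if not isinstance(value, list):
        raise ValueError("Selection options must be a JSON array.")
    return value
```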

2. SQL Injection ✅ (Not Applicable)

Using Django ORM properly - no raw SQL.

3. Input Validation ✅ (Robust)

Good validation at multiple layers (form, model, JavaScript).

📚 Best Practices

1. Consider Adding Migration Data Validation

For existing installations, consider adding a data migration that checks that any manually created OutputField records do not violate the new constraints.

2. Add Inline Comments for Complex Logic

In SelectionOptions.init() (core.js:149-176), the state restoration logic is complex. Add comments explaining:

  • Why lastSelectionOptions is needed
  • When options are preserved vs. cleared

3. Type Hinting in processor_utils.py

Line 8: type Numeric = float | int - This is Python 3.12+ syntax. Ensure your project targets Python 3.12+ or use Union[float, int] for compatibility.
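
If older interpreters need to be supported, an assignment-style alias is one option. The sketch below assumes the alias is only used in annotations; `describe` is a hypothetical consumer added for illustration:

```python
from typing import Union

# Assignment-style alias: works on Python versions that predate the
# `type` statement introduced in 3.12.
Numeric = Union[float, int]


def describe(value: Numeric) -> str:
    """Tiny consumer showing the alias used in an annotation."""
    return f"{type(value).__name__}: {value}"
```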

4. Consider Field Name Validation

The model doesn't validate that OutputField.name follows valid Python identifier rules (needed since it's used as a Pydantic field name). Add validation:

def clean(self):
    super().clean()
    if not self.name.isidentifier():
        raise ValidationError({'name': 'Must be a valid Python identifier'})
    # ... rest of validation
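
One caveat with the check above: `str.isidentifier()` returns True for reserved words such as "class". A slightly stricter sketch, using a hypothetical helper name and only the standard library, could be:

```python
import keyword


def is_safe_field_name(name: str) -> bool:
    """True if name is an identifier and not a reserved keyword."""
    # str.isidentifier() accepts reserved words such as "class",
    # so keyword.iskeyword() is checked separately.
    return name.isidentifier() and not keyword.iskeyword(name)
```

Whether keywords actually need to be rejected depends on how the name is passed to the schema builder, which this review does not show.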

🧪 Test Coverage Suggestions

  1. Missing Test: What happens when editing an existing field from Selection to Text type with saved options? (Should clear options)
  2. Missing Test: JavaScript component behavior when formset adds/removes forms dynamically
  3. Missing Test: Concurrent editing of selection options by multiple users
  4. Missing Test: Very long selection option strings (>1000 chars) - any practical limits?

📊 Summary

Overall Assessment: ⭐⭐⭐⭐ (4/5)

This is a solid, production-ready implementation with only minor issues. The code follows Django best practices and includes good test coverage.

Required Changes

  • None (all critical issues already fixed)

Recommended Changes (Before Merge)

  1. Add debouncing to syncState
  2. Add field name validation for Python identifier compatibility
  3. Document the state management logic in JavaScript

Nice-to-Have (Future PRs)

  1. Add JSDoc comments for helper functions
  2. Consider adding data migration for existing records
  3. Add missing test cases mentioned above

Great work on this feature! 🚀

@NumericalAdvantage NumericalAdvantage marked this pull request as ready for review November 27, 2025 15:44

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (4)
radis/core/static/core/core.js (1)

110-247: Ensure array toggle click also syncs hidden field and button state

SelectionOptions looks solid overall (parsing JSON, enforcing maxOptions, handling SELECTION type), but the arrayToggleButton click handler currently only flips isArray:

arrayToggleButton.addEventListener("click", (event) => {
  event.preventDefault();
  this.toggleArray();
});

Unless the Alpine template separately calls syncState() or syncArray(), this risks the hidden data-array-input value and the button's active/aria-pressed state getting out of sync with isArray.

Consider wiring the click to sync everything so the component is self-contained, e.g.:

-      if (arrayToggleButton) {
-        arrayToggleButton.addEventListener("click", (event) => {
-          event.preventDefault();
-          this.toggleArray();
-        });
-        this.updateArrayButton();
-      }
+      if (arrayToggleButton) {
+        arrayToggleButton.addEventListener("click", (event) => {
+          event.preventDefault();
+          this.toggleArray();
+          this.syncState();
+        });
+        this.updateArrayButton();
+      }

This guarantees the posted is_array value and visual toggle always reflect the internal state, regardless of template wiring.

radis/extractions/templates/extractions/extraction_job_detail.html (1)

117-132: Consider extracting shared array/selection display into a partial

The “Array Output” and “Selections” blocks here mirror the wizard summary template. To keep future copy/structure changes in sync, consider extracting this snippet into a small include/partial shared by both views.

radis/extractions/utils/processor_utils.py (1)

40-50: Minor simplification opportunity in prompt generation

In generate_output_fields_prompt, you can avoid re-instantiating OutputType on each iteration and mirror the model’s pattern by comparing the raw value directly:

-        if OutputType(field.output_type) == OutputType.SELECTION and field.selection_options:
+        if field.output_type == OutputType.SELECTION and field.selection_options:

This is a small readability/robustness improvement and keeps the logic consistent with OutputField.clean.
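
The direct comparison works because Django's TextChoices members are also str instances. The sketch below uses a plain `str`/`Enum` class as a stand-in so it runs without Django; only the "S" value appears in this PR, the other value is assumed:

```python
from enum import Enum


class OutputType(str, Enum):
    """Stand-in for the Django TextChoices class; only "S" comes from the PR."""

    TEXT = "T"  # assumed value, for illustration only
    SELECTION = "S"


stored_value = "S"  # what a CharField would hold after loading from the DB

# Because members are also str instances, comparing the raw value works
# without first constructing an OutputType from it.
matches_directly = stored_value == OutputType.SELECTION
matches_via_lookup = OutputType(stored_value) == OutputType.SELECTION
```

One behavioural difference to keep in mind: `OutputType("Z")` raises ValueError for an unknown value, whereas the direct comparison simply evaluates to False.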

radis/extractions/forms.py (1)

189-321: Form-side selection/array handling looks solid; consider centralizing validation logic

The new OutputFieldForm fields, layout wiring, and clean_* methods correctly:

  • Treat selection_options as hidden JSON serialized data, normalizing to a list of trimmed, unique strings within MAX_SELECTION_OPTIONS.
  • Parse is_array from the hidden field into a boolean.
  • Enforce that selections are present only when output_type == OutputType.SELECTION, mirroring OutputField.clean.

One improvement to consider long term is centralizing the selection-options validation (type checks, trimming, length, uniqueness) in a shared helper used by both OutputField.clean and OutputFieldForm.clean_selection_options, to avoid the two copies drifting as the rules evolve.
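
A shared helper along those lines might look like the following sketch. `normalize_selection_options` is a hypothetical name, `ValueError` stands in for Django's ValidationError so the example has no dependencies, and the error messages are illustrative rather than the ones used in the PR:

```python
from typing import Any

MAX_SELECTION_OPTIONS = 7  # value stated for constants.py in this PR


def normalize_selection_options(
    raw: Any, max_options: int = MAX_SELECTION_OPTIONS
) -> list[str]:
    """Validate and trim selection options; raise ValueError on bad input."""
    if not isinstance(raw, list):
        raise ValueError("Selection options must be a list.")
    if len(raw) > max_options:
        raise ValueError(f"At most {max_options} options are allowed.")
    cleaned: list[str] = []
    for item in raw:
        if not isinstance(item, str):
            raise ValueError("Each option must be a string.")
        trimmed = item.strip()
        if not trimmed:
            raise ValueError("Options cannot be empty.")
        # Uniqueness is checked after trimming, so "A" and " A " collide.
        if trimmed in cleaned:
            raise ValueError("Options must be unique.")
        cleaned.append(trimmed)
    return cleaned
```

Both the form and the model could call such a function and translate the exception into their own error type, which keeps the rules in one place.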

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c338107 and 7f52438.

📒 Files selected for processing (14)
  • radis/core/static/core/core.css (1 hunks)
  • radis/core/static/core/core.js (2 hunks)
  • radis/core/templates/cotton/formset.html (1 hunks)
  • radis/extractions/constants.py (1 hunks)
  • radis/extractions/forms.py (3 hunks)
  • radis/extractions/migrations/0004_outputfield_selection_options.py (1 hunks)
  • radis/extractions/migrations/0005_outputfield_is_array.py (1 hunks)
  • radis/extractions/models.py (4 hunks)
  • radis/extractions/templates/extractions/_selection_options_field.html (1 hunks)
  • radis/extractions/templates/extractions/extraction_job_detail.html (1 hunks)
  • radis/extractions/templates/extractions/extraction_job_wizard_summary.html (1 hunks)
  • radis/extractions/tests/test_forms.py (1 hunks)
  • radis/extractions/tests/unit/test_processor_utils.py (1 hunks)
  • radis/extractions/utils/processor_utils.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
radis/extractions/utils/processor_utils.py (1)
radis/extractions/models.py (1)
  • OutputType (77-81)
radis/extractions/forms.py (1)
radis/extractions/models.py (2)
  • OutputType (77-81)
  • clean (109-147)
radis/extractions/tests/unit/test_processor_utils.py (2)
radis/extractions/models.py (1)
  • OutputType (77-81)
radis/extractions/utils/processor_utils.py (1)
  • generate_output_fields_schema (11-34)
🪛 GitHub Actions: Run Continuous Integration
radis/extractions/models.py

[error] 1-1: NotNullViolation: null value in column "owner_id" of relation "extractions_extractionjob" violates not-null constraint during INSERT. Ensure an owner is set when creating ExtractionJob.

radis/extractions/utils/processor_utils.py

[error] 24-24: ValueError: Selection output requires at least one option.

radis/extractions/tests/unit/test_processor_utils.py

[error] 1-1: NotNullViolation: null value in column "owner_id" of relation "extractions_extractionjob" during unit tests when creating ExtractionJob via factory.

radis/extractions/tests/test_forms.py

[error] 1-1: NotNullViolation: null value in column "owner_id" of relation "extractions_extractionjob" during test setup. ExtractionJobFactory.create() attempted to insert without an owner.

🪛 Ruff (0.14.6)
radis/extractions/utils/processor_utils.py

24-24: Avoid specifying long messages outside the exception class

(TRY003)


27-27: Avoid specifying long messages outside the exception class

(TRY003)

radis/extractions/migrations/0005_outputfield_is_array.py

5-7: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


9-15: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

radis/extractions/forms.py

201-207: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


270-270: Avoid specifying long messages outside the exception class

(TRY003)


273-273: Avoid specifying long messages outside the exception class

(TRY003)


278-278: Avoid specifying long messages outside the exception class

(TRY003)


281-281: Avoid specifying long messages outside the exception class

(TRY003)


285-287: Avoid specifying long messages outside the exception class

(TRY003)


289-289: Avoid specifying long messages outside the exception class

(TRY003)

radis/extractions/migrations/0004_outputfield_selection_options.py

5-7: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


9-29: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🔇 Additional comments (9)
radis/core/templates/cotton/formset.html (1)

6-11: Crispy rendering and formset template structure look correct

Using {% crispy ... %} inside <c-formset-form> keeps the formset template consistent with the JS FormSet helper and should not change behavior versus the previous |crispy usage.

radis/core/static/core/core.js (1)

49-65: Enter-key suppression for marked forms is scoped and safe

The data-prevent-enter-submit handler cleanly prevents accidental submits from text-like <input> elements while leaving other forms/pages unaffected. This is a good UX improvement with minimal risk.

radis/extractions/migrations/0005_outputfield_is_array.py (1)

1-15: Migration for is_array field matches the model change

Adding is_array = models.BooleanField(default=False) on OutputField with dependency on the previous selection-options migration is consistent and should apply cleanly.

radis/extractions/constants.py (1)

1-1: Max-selection-options constant is clear and aligned with usage

Defining MAX_SELECTION_OPTIONS = 7 centrally is appropriate and keeps model/form/UI validation in sync.

radis/extractions/templates/extractions/extraction_job_wizard_summary.html (1)

94-109: Array/selections summary and Enter-prevention wiring look consistent

The new “Array Output” and “Selections” rows give a clear summary of is_array and selection_options, and the wording aligns with the detail view. Adding data-prevent-enter-submit="true" to this form correctly hooks into the new JS to avoid accidental submits while editing.

Also applies to: 117-120

radis/extractions/models.py (1)

14-15: Selection output type and option validation are well-structured

The OutputType.SELECTION addition, selection_options/is_array fields, and the clean() implementation together provide a clear contract:

  • SELECTION fields must have 1–MAX_SELECTION_OPTIONS non-empty, string, whitespace-trimmed, unique options.
  • Non-SELECTION fields cannot carry selection_options and are normalized back to an empty list.

This is a robust place to centralize invariants that other layers (forms, processor utils, UI) can safely rely on.

Also applies to: 81-82, 92-93, 109-148

radis/core/static/core/core.css (1)

42-72: Selection/array toggle styles look consistent and non-invasive

The new flex gaps and array-toggle button styles integrate cleanly with the existing Bootstrap layout and the template markup; no issues spotted from a CSS/UX perspective.

radis/extractions/templates/extractions/_selection_options_field.html (1)

1-51: Selection options UI wiring looks coherent

The Alpine bindings, hidden form fields, and error display are all consistent with the form API and new CSS. The per-option controls and empty/supported states look correct.

radis/extractions/migrations/0004_outputfield_selection_options.py (1)

10-28: Migration matches model OutputType semantics

Adding selection_options as a JSONField with blank=True, default=list and constraining output_type choices (including "S", "Selection") is consistent with the OutputType TextChoices and should migrate cleanly.

Comment on lines +11 to +186
@pytest.mark.django_db
def test_output_field_form_accepts_selection_options():
    job = ExtractionJobFactory.create()
    form = OutputFieldForm(
        data={
            "name": "tumor_grade",
            "description": "Classified tumor grade.",
            "output_type": OutputType.SELECTION,
            "selection_options": json.dumps(["Grade 1", "Grade 2"]),
            "is_array": "false",
        },
        instance=OutputField(job=job),
    )

    assert form.is_valid()
    instance = form.save(commit=False)

    assert instance.selection_options == ["Grade 1", "Grade 2"]


@pytest.mark.django_db
def test_output_field_form_requires_options_for_selection():
    job = ExtractionJobFactory.create()
    form = OutputFieldForm(
        data={
            "name": "tumor_grade",
            "description": "Classified tumor grade.",
            "output_type": OutputType.SELECTION,
            "selection_options": json.dumps([]),
            "is_array": "false",
        },
        instance=OutputField(job=job),
    )

    assert not form.is_valid()
    assert "selection_options" in form.errors


@pytest.mark.django_db
def test_output_field_form_rejects_options_for_non_selection():
    job = ExtractionJobFactory.create()
    form = OutputFieldForm(
        data={
            "name": "tumor_grade",
            "description": "Classified tumor grade.",
            "output_type": OutputType.TEXT,
            "selection_options": json.dumps(["Grade 1"]),
            "is_array": "false",
        },
        instance=OutputField(job=job),
    )

    assert not form.is_valid()
    assert "selection_options" in form.errors


@pytest.mark.django_db
def test_output_field_clean_trims_selection_options():
    job = ExtractionJobFactory.create()
    field = OutputField(
        job=job,
        name="tumor_grade",
        description="Classified tumor grade.",
        output_type=OutputType.SELECTION,
        selection_options=[" Grade 1 ", "Grade 2 "],
    )

    field.full_clean()

    assert field.selection_options == ["Grade 1", "Grade 2"]


@pytest.mark.django_db
def test_output_field_clean_rejects_selection_options_for_other_types():
    job = ExtractionJobFactory.create()
    field = OutputField(
        job=job,
        name="tumor_grade",
        description="Classified tumor grade.",
        output_type=OutputType.TEXT,
        selection_options=["Grade 1"],
    )

    with pytest.raises(ValidationError):
        field.full_clean()


@pytest.mark.django_db
def test_output_field_form_handles_array_toggle():
    job = ExtractionJobFactory.create()
    form = OutputFieldForm(
        data={
            "name": "measurements",
            "description": "Multiple numeric values.",
            "output_type": OutputType.NUMERIC,
            "selection_options": json.dumps([]),
            "is_array": "true",
        },
        instance=OutputField(job=job),
    )

    assert form.is_valid()
    instance = form.save(commit=False)
    assert instance.is_array is True


@pytest.mark.django_db
def test_output_field_form_rejects_duplicate_selection_options():
    job = ExtractionJobFactory.create()
    form = OutputFieldForm(
        data={
            "name": "tumor_grade",
            "description": "Classified tumor grade.",
            "output_type": OutputType.SELECTION,
            "selection_options": json.dumps(["Grade 1", "Grade 1"]),
            "is_array": "false",
        },
        instance=OutputField(job=job),
    )

    assert not form.is_valid()
    assert "selection_options" in form.errors


@pytest.mark.django_db
def test_output_field_clean_rejects_duplicate_selection_options():
    job = ExtractionJobFactory.create()
    field = OutputField(
        job=job,
        name="tumor_grade",
        description="Classified tumor grade.",
        output_type=OutputType.SELECTION,
        selection_options=["Grade 1", "Grade 1"],
    )

    with pytest.raises(ValidationError):
        field.full_clean()


@pytest.mark.django_db
def test_output_field_form_rejects_whitespace_only_selection_option():
    job = ExtractionJobFactory.create()
    form = OutputFieldForm(
        data={
            "name": "tumor_grade",
            "description": "Classified tumor grade.",
            "output_type": OutputType.SELECTION,
            "selection_options": json.dumps(["Grade 1", " "]),
            "is_array": "false",
        },
        instance=OutputField(job=job),
    )

    assert not form.is_valid()
    assert "selection_options" in form.errors


@pytest.mark.django_db
def test_output_field_form_accepts_unicode_and_long_selection_options():
    job = ExtractionJobFactory.create()
    unicode_option = "Grädé αβγ測試"
    long_option = "Grade " + ("X" * 150)
    form = OutputFieldForm(
        data={
            "name": "tumor_grade",
            "description": "Classified tumor grade.",
            "output_type": OutputType.SELECTION,
            "selection_options": json.dumps([unicode_option, long_option]),
            "is_array": "false",
        },
        instance=OutputField(job=job),
    )

    assert form.is_valid()
    instance = form.save(commit=False)
    assert instance.selection_options == [unicode_option.strip(), long_option.strip()]

⚠️ Potential issue | 🟠 Major

ExtractionJobFactory usage in tests is causing NotNullViolation (missing owner)

CI is failing with a NotNullViolation on owner_id when ExtractionJobFactory.create() is called without an owner. Every test in this module constructs its job that way, so they all fail because owner is a required field.

Consider centralizing job creation in a pytest fixture that sets a valid owner via django_user_model, and then injecting that fixture into all tests instead of calling ExtractionJobFactory.create() directly:

@@
 import json
 
 import pytest
 from django.core.exceptions import ValidationError
 
 from radis.extractions.factories import ExtractionJobFactory
@@
-from radis.extractions.models import OutputField, OutputType
+from radis.extractions.models import OutputField, OutputType
+
+
+@pytest.fixture
+def job(django_user_model):
+    owner = django_user_model.objects.create_user(
+        username="test-owner",
+        password="test-pass",
+    )
+    return ExtractionJobFactory.create(owner=owner)
@@
-@pytest.mark.django_db
-def test_output_field_form_accepts_selection_options():
-    job = ExtractionJobFactory.create()
+@pytest.mark.django_db
+def test_output_field_form_accepts_selection_options(job):
@@
-        instance=OutputField(job=job),
+        instance=OutputField(job=job),

Apply the same pattern (accepting the job fixture and dropping the local ExtractionJobFactory.create() call) to the remaining tests in this module.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
radis/extractions/tests/test_forms.py lines 11-186: tests call
ExtractionJobFactory.create() without an owner which fails when owner becomes
required; create a pytest fixture (e.g., job) that accepts django_user_model to
create a user and then create the ExtractionJob via
ExtractionJobFactory.create(owner=user) and update every test in this file to
accept that job fixture (remove local ExtractionJobFactory.create() calls) so
each test uses the provided job with a valid owner.

Comment on lines +10 to +48
@pytest.mark.django_db
def test_generate_output_fields_schema_uses_literal_for_selection_fields():
    job = ExtractionJobFactory.create()
    field = OutputFieldFactory(
        job=job,
        name="grade",
        output_type=OutputType.SELECTION,
    )
    field.selection_options = ["Grade 1", "Grade 2"]
    field.save()

    schema = generate_output_fields_schema(job.output_fields)

    grade_field = schema.model_fields["grade"]
    annotation = grade_field.annotation
    assert get_origin(annotation) is Literal
    assert set(get_args(annotation)) == {"Grade 1", "Grade 2"}


@pytest.mark.django_db
def test_generate_output_fields_schema_wraps_literal_in_list_for_array_selections():
    job = ExtractionJobFactory.create()
    field = OutputFieldFactory(
        job=job,
        name="grade_multi",
        output_type=OutputType.SELECTION,
    )
    field.selection_options = ["High", "Low"]
    field.is_array = True
    field.save()

    schema = generate_output_fields_schema(job.output_fields)

    grade_field = schema.model_fields["grade_multi"]
    annotation = grade_field.annotation
    assert get_origin(annotation) is list
    (inner_annotation,) = get_args(annotation)
    assert get_origin(inner_annotation) is Literal
    assert set(get_args(inner_annotation)) == {"High", "Low"}

⚠️ Potential issue | 🟠 Major

Schema assertions look good; fix factory usage to set ExtractionJob.owner

Both tests correctly check that SELECTION fields become Literal[...] and array selections become list[Literal[...]], matching generate_output_fields_schema.

However, CI is failing with NotNullViolation: null value in column "owner_id" of relation "extractions_extractionjob" when calling ExtractionJobFactory.create(). That indicates the factory (or this call site) is not providing a non-null owner, which is required via the base AnalysisJob model.

Update the tests (or the factory) so that ExtractionJob instances created here always have a valid owner, e.g. by:

  • Passing an owner explicitly in these tests, or
  • Adjusting ExtractionJobFactory so it constructs an owner by default.

After that change, re-run the test suite to confirm the migration/tests pass.

🤖 Prompt for AI Agents
In radis/extractions/tests/unit/test_processor_utils.py around lines 10 to 48,
the tests call ExtractionJobFactory.create() but fail with NotNullViolation due
to a missing required owner on ExtractionJob; fix by ensuring created
ExtractionJob instances have a valid owner: either pass an owner explicitly in
these tests (e.g., ExtractionJobFactory.create(owner=SomeUserFactory.create()))
or modify ExtractionJobFactory to always create and assign a default owner
(e.g., Add a related owner factory/factory_boy SubFactory on
ExtractionJobFactory) so the owner_id is never null; update both test cases
accordingly and re-run the test suite.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 7f52438 and ebd23b7.

📒 Files selected for processing (1)
  • radis/extractions/constants.py (1 hunks)

Comment on lines +1 to +2
MAX_SELECTION_OPTIONS = 7.0 # Maximum number of Enum values (selection output types) a
# user can add per Output Field the Extraction Output Fields Page.

⚠️ Potential issue | 🟡 Minor

Define as an integer, not a float.

The constant represents a maximum count and should be defined as an integer (7) rather than a float (7.0). Using a float for a count is semantically incorrect and may cause issues with type checkers.
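
To illustrate the kind of issue a float count can cause, here is a small standalone sketch. The variable names are hypothetical and the snippet does not reproduce any code from the PR:

```python
MAX_AS_FLOAT = 7.0
MAX_AS_INT = 7

options = [f"Option {i}" for i in range(1, 10)]

# Comparisons such as len(options) > MAX behave the same for both values,
# but anything that needs an integer index rejects the float.
try:
    options[:MAX_AS_FLOAT]
    float_slice_ok = True
except TypeError:
    float_slice_ok = False

int_slice = options[:MAX_AS_INT]

# User-facing messages also pick up the trailing ".0".
message = f"At most {MAX_AS_FLOAT} options"
```

The float version therefore works for simple comparisons but fails once the value is used for slicing or `range()`, and it leaks "7.0" into any formatted text.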

Apply this diff:

-MAX_SELECTION_OPTIONS = 7.0  # Maximum number of Enum values (selection output types) a
+MAX_SELECTION_OPTIONS = 7  # Maximum number of Enum values (selection output types) a
 # user can add per Output Field the Extraction Output Fields Page.
🤖 Prompt for AI Agents
In radis/extractions/constants.py around lines 1 to 2, the MAX_SELECTION_OPTIONS
constant is defined as a float (7.0) but represents a count; change its
definition to an integer (7) so the value is semantically correct and type
checkers treat it as an int.

@NumericalAdvantage NumericalAdvantage force-pushed the selectionOutputType branch 2 times, most recently from 2569c89 to 2b33b22 Compare December 5, 2025 13:17

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
radis/core/static/core/core.js (1)

159-159: Remove unused variable.

The maxSelectionOptions variable is declared but never used. The actual max options value is computed by parseMaxOptions() and stored in initialMaxOptions at line 172.

Apply this diff to remove the unused variable:

-  const maxSelectionOptions = outputTypeField?.dataset.maxSelectionOptions;
-
   /*Reads the maximum number of selections allowed. 
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 2b33b22 and 7043d4a.

📒 Files selected for processing (3)
  • radis/core/static/core/core.js (2 hunks)
  • radis/extractions/models.py (4 hunks)
  • radis/extractions/templates/extractions/_selection_options_field.html (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • radis/extractions/models.py
  • radis/extractions/templates/extractions/_selection_options_field.html
🔇 Additional comments (3)
radis/core/static/core/core.js (3)

49-68: LGTM!

The Enter key prevention logic is well-implemented with appropriate type checking and selective application only to text inputs. The opt-in approach using data-prevent-enter-submit provides good flexibility.


184-227: Well-implemented state restoration logic.

The init() method correctly preserves user input when toggling between output types. The restoration logic at lines 202-226 ensures that selection options are saved when switching away from Selection type and restored when switching back, preventing data loss.


231-331: Excellent implementation with good practices.

The parsing, syncing, and manipulation methods demonstrate several strengths:

  • Robust error handling in parseOptions with try-catch
  • Data sanitization in syncOptions removes empty entries
  • Good UX with automatic focus management in addOption
  • Proper accessibility via aria-pressed attribute in updateArrayButton

@medihack
Member

Closed in favor of #172 (the changes/commits are integrated there).

@medihack medihack closed this Dec 30, 2025