Js initial examsoft imports #15

Merged
mpetrowi merged 30 commits into main from js-initial-examsoft-imports
Feb 26, 2026

@Bamboo72 Bamboo72 commented Feb 3, 2026

Note: this is what I consider to be a rough draft - I'm looking for suggestions on next steps. I know I need to work on clearing up a few TODO items, as well as doing more testing and documentation updates.

Copilot Summary of Changes:

This pull request introduces major improvements to the atomic_assessments_import library by refactoring its question classes, centralizing utility functions, and adding support for importing ExamSoft files in multiple formats. The changes enhance extensibility, maintainability, and allow for new import sources beyond CSV, such as RTF, DOCX, and HTML from ExamSoft.

Support for ExamSoft imports:

  • Added a new ExamSoft::Converter class to handle ExamSoft file formats (RTF, DOCX, HTML, XHTML) using Pandoc for conversion to HTML, and registered these converters in the main import module. This enables importing ExamSoft assessments alongside CSV. [1] [2] [3] [4]
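As a rough illustration of the "normalize everything to HTML" idea, here is a minimal sketch that maps a file extension to a Pandoc input format and builds the equivalent `pandoc` command line. This is an assumption for illustration only: the PR uses the pandoc-ruby gem rather than shelling out, and `PANDOC_INPUT_FORMATS` and `pandoc_command` are hypothetical names that do not come from the PR.

```ruby
# Hypothetical mapping from file extension to Pandoc reader name.
# XHTML is assumed to go through Pandoc's regular HTML reader.
PANDOC_INPUT_FORMATS = {
  ".rtf" => "rtf",
  ".docx" => "docx",
  ".html" => "html",
  ".htm" => "html",
  ".xhtml" => "html",
}.freeze

# Builds (but does not run) the pandoc argv for converting +path+ to HTML.
def pandoc_command(path)
  format = PANDOC_INPUT_FORMATS.fetch(File.extname(path).downcase) do |ext|
    raise ArgumentError, "Unsupported ExamSoft file extension: #{ext.inspect}"
  end
  ["pandoc", "--from", format, "--to", "html", path]
end
```

Returning the argument list instead of executing it keeps the sketch testable on a machine without Pandoc installed.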

Refactoring and code organization:

  • Moved the Question and MultipleChoice classes from the CSV-specific namespace to a shared Questions namespace, making them reusable for multiple import sources. [1] [2] [3]
  • Centralized the Utils module for shared utility functions (e.g., parse_boolean) so it can be used by both CSV and ExamSoft converters. [1] [2]
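For readers unfamiliar with this kind of helper, a shared `parse_boolean` could look roughly like the sketch below. The accepted strings, the `default:` keyword, and the `UtilsSketch` module name are assumptions for illustration; the actual `Utils.parse_boolean` in the gem may differ.

```ruby
module UtilsSketch
  # Hypothetical sets of accepted spellings; the real gem may accept others.
  TRUE_VALUES = %w[true yes y 1 t].freeze
  FALSE_VALUES = %w[false no n 0 f].freeze

  # Returns true/false for recognised values, otherwise the supplied default.
  def self.parse_boolean(value, default: false)
    normalized = value.to_s.strip.downcase
    return true if TRUE_VALUES.include?(normalized)
    return false if FALSE_VALUES.include?(normalized)

    default
  end
end
```

Keeping this in one module means a CSV column and an ExamSoft metadata field are interpreted with the same rules.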


Extensibility improvements:

  • Refactored the converter registration logic to allow dynamic addition of new converters for different MIME types and sources, making it easier to support future import formats.
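One way to picture dynamic converter registration is a small registry keyed by MIME type and import source. The sketch below is an assumption about the general shape, not the gem's code: `ImportRegistrySketch`, `register_converter`, `converter_for`, and the two placeholder classes are hypothetical names.

```ruby
module ImportRegistrySketch
  @converters = {}

  class << self
    # Associates a converter class with a (MIME type, source) pair.
    def register_converter(mime_type, source, klass)
      @converters[[mime_type, source]] = klass
    end

    # Looks up the converter, raising if nothing was registered for the pair.
    def converter_for(mime_type, source)
      @converters.fetch([mime_type, source]) do
        raise ArgumentError, "No converter registered for #{mime_type} (#{source})"
      end
    end
  end
end

# Placeholder converter classes, for illustration only.
class CsvConverterSketch; end
class ExamSoftConverterSketch; end

ImportRegistrySketch.register_converter("text/csv", "csv", CsvConverterSketch)
ImportRegistrySketch.register_converter("text/html", "examsoft", ExamSoftConverterSketch)
ImportRegistrySketch.register_converter("application/rtf", "examsoft", ExamSoftConverterSketch)
```

With this shape, supporting a new format is a matter of one more `register_converter` call rather than a change to the lookup logic.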

- Introduced ExamSoft converter for processing ExamSoft files.
- Registered new converters for different file types.
- Updated the convert method to accept an import source.
- Refactored question handling to support multiple choice questions.
- Added pandoc-ruby gem for document conversion.
@Bamboo72 Bamboo72 requested a review from mpetrowi February 3, 2026 20:20

@mpetrowi mpetrowi left a comment

This is a good start! I like the reorganization of the Gem, and how it can easily be expanded later with more converters.

I haven't looked at the conversion code in detail yet, mostly because I don't have samples of ExamSoft exports. Send those to me separately so I can think about the conversion decisions.

What's next:

  1. Fix some small comments I made, nothing major.
  2. Write rspec tests. I think functional tests would be appropriate. So add some fixture files in different formats, have the specs run the conversion and write rspec rules to check what happens. When you add the fixtures, make sure there isn't customer data in them.

Bamboo72 commented Feb 5, 2026

This PR is to address this Atomic Assessments issue

Bamboo72 and others added 19 commits February 5, 2026 16:54
…t usage instructions, and improve converter registration and specs
…documents

- Created a new spec file for testing the loading of questions in the AtomicAssessmentsImport module, ensuring that multiple choice questions are correctly instantiated from various input formats.
- Added a spec file for utility functions, specifically testing the boolean parsing functionality with various inputs and defaults.
- Introduced sample documents in different formats (DOCX, HTML, RTF) to be used as fixtures for testing the import functionality.
… and update feedback fields in ExamSoft converter
Documents the heuristic chunker + field detector approach for
handling unknown ExamSoft export formats with best-effort parsing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce the ExamSoft::Chunker module with a Strategy base class and
MetadataMarkerStrategy that splits HTML documents on Folder:/Type: markers.
This is the first step in refactoring the ExamSoft converter from rigid
regex parsing to a flexible chunker+extractor pipeline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a chunking strategy that splits HTML documents on numbered question
patterns (e.g., "1)" or "1.") while ignoring lettered answer options.
Header content before the first question is captured separately.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
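The numbered-question split described in this commit can be sketched on plain text lines as follows. This is a simplified illustration under stated assumptions: the PR's strategy works on an HTML document, whereas this version takes an array of strings, and `QUESTION_START` and `chunk_numbered_questions` are hypothetical names.

```ruby
# A question starts with digits followed by ")" or "." and whitespace.
# Lettered options such as "A)" do not match because \d requires a digit.
QUESTION_START = /\A\s*\d+[.)]\s+/

# Groups lines into per-question chunks; anything before the first
# numbered question is returned separately as the header.
def chunk_numbered_questions(lines)
  header = []
  chunks = []
  lines.each do |line|
    if line.match?(QUESTION_START)
      chunks << [line]
    elsif chunks.empty?
      header << line
    else
      chunks.last << line
    end
  end
  { header: header, chunks: chunks }
end
```

For example, given the lines `"Exam 1"`, `"1) What is 2+2?"`, `"A) 3"`, `"*B) 4"`, `"2. Capital of France?"`, `"A) Paris"`, the header holds the first line and two chunks are produced, each carrying its own options.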
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add module-level .chunk(doc) method to ExamSoft::Chunker that tries each
strategy in priority order (MetadataMarker > NumberedQuestion > HeadingSplit >
HorizontalRuleSplit) and returns the first valid result. Falls back to
treating the entire document as a single chunk with a warning when no
strategy matches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
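The "try each strategy in priority order, then fall back" behaviour might be sketched like this. Only two of the four strategies named in the commit are modelled, they operate on plain text rather than parsed HTML, and `ChunkerSketch`, `line_strategy`, and the return shape are assumptions for illustration.

```ruby
module ChunkerSketch
  # Builds a strategy that starts a new chunk at every line matching +marker+
  # and returns nil when no line matches, so the next strategy gets a turn.
  def self.line_strategy(marker)
    lambda do |text|
      lines = text.lines(chomp: true).reject { |line| line.strip.empty? }
      if lines.any? { |line| line.match?(marker) }
        lines.slice_before { |line| line.match?(marker) }.map { |group| group.join("\n") }
      end
    end
  end

  # Highest priority first; a simplified subset of the PR's strategies.
  STRATEGIES = [
    ["metadata_marker", line_strategy(/\A(Folder|Type):/)],
    ["numbered_question", line_strategy(/\A\d+[.)]\s/)],
  ].freeze

  # Returns the first strategy's result, or the whole text as one chunk
  # with a warning when nothing matches.
  def self.chunk(text)
    STRATEGIES.each do |name, strategy|
      chunks = strategy.call(text)
      return { strategy: name, chunks: chunks, warnings: [] } if chunks
    end
    {
      strategy: nil,
      chunks: [text],
      warnings: ["No chunking strategy matched; treating the document as a single chunk"],
    }
  end
end
```

Returning a warning instead of raising lets the caller still attempt a best-effort import of an unrecognised layout.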
Add three extractor detector classes for parsing ExamSoft question chunks:
- QuestionStemDetector: extracts question text, strips metadata prefixes and explanations
- OptionsDetector: finds lettered answer options with asterisk/bold correct markers
- CorrectAnswerDetector: determines correct answers from option markers or Answer: labels

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
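A stripped-down version of the option and correct-answer detection could look like the sketch below. It handles only the asterisk marker and an `Answer:` line on plain text; bold markers and HTML handling from the PR are not modelled, and `OPTION_LINE`, `detect_options`, and `correct_letters` are hypothetical names.

```ruby
# Matches lines such as "A) text", "b. text", or "*C) text".
# A leading asterisk marks the option as correct.
OPTION_LINE = /\A\s*(\*)?\s*([A-Za-z])[.)]\s+(.*)\z/

# Returns one hash per lettered option found in +lines+.
def detect_options(lines)
  lines.filter_map do |line|
    match = OPTION_LINE.match(line)
    next unless match

    { letter: match[2].upcase, text: match[3].strip, correct: !match[1].nil? }
  end
end

# Prefers asterisk-marked options; otherwise falls back to an "Answer:" line.
def correct_letters(lines)
  marked = detect_options(lines).select { |opt| opt[:correct] }.map { |opt| opt[:letter] }
  return marked unless marked.empty?

  answer_line = lines.find { |line| line.match?(/\AAnswer:/i) }
  return [] unless answer_line

  # Single-letter tokens only, so "A and C" yields ["A", "C"].
  answer_line.sub(/\AAnswer:\s*/i, "").scan(/\b[A-Za-z]\b/).map(&:upcase)
end
```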
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Essay (longanswer) and ShortAnswer (shorttext) question type classes
that inherit from Question. Update Question.load to dispatch essay,
longanswer, short_answer, shorttext, and true_false question types.
Also tighten the /ma/ regex to /^ma$/ to avoid false matches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
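The reason for tightening `/ma/` to `/^ma$/` is easiest to see with a small dispatch sketch. The `dispatch` helper, the returned symbols, and the `"matching"` type name are hypothetical and only serve to show the false match; the real `Question.load` returns question objects.

```ruby
# Maps a raw type string to a symbol; the pattern used for "multiple answer"
# is passed in so the loose and anchored versions can be compared.
def dispatch(type, ma_pattern)
  case type.to_s.strip.downcase
  when ma_pattern then :multiple_answer
  when "essay", "longanswer" then :essay
  when "short_answer", "shorttext" then :short_answer
  else :unknown
  end
end

LOOSE_MA = /ma/     # matches any string containing "ma"
STRICT_MA = /^ma$/  # matches only a line that is exactly "ma"
```

With the loose pattern, `dispatch("matching", LOOSE_MA)` is routed to `:multiple_answer` because "matching" contains "ma", while the anchored pattern leaves it as `:unknown`. Note that in Ruby `^` and `$` anchor to line boundaries; `\A` and `\z` would anchor to the whole string.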
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the monolithic ExamSoft converter with a pipeline that:
1. Normalizes input to HTML via Pandoc
2. Chunks the document into per-question segments
3. Extracts fields (stem, options, answers, metadata, feedback) per chunk
4. Builds Learnosity items/questions from extracted data
5. Collects warnings in :errors array instead of raising

Key fixes:
- Clean embedded newlines from stems and feedback text
- Set template to nil (not question type) to avoid ui_style errors
- Update specs to expect warnings instead of raised errors
- Fix HTML spec option-removal regex to use [^<] instead of [^\}]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
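The "collect warnings instead of raising" part of this pipeline can be illustrated with a minimal sketch. The chunk hash keys, the result shape, and the names `clean_text` and `build_questions` are assumptions; the real converter builds Learnosity items and questions with many more fields.

```ruby
# Collapses embedded newlines (and surrounding whitespace) into single spaces.
def clean_text(text)
  text.to_s.gsub(/\s*\n\s*/, " ").strip
end

# Converts extracted chunks into question hashes. Chunks without a stem
# are skipped and reported in :errors rather than aborting the import.
def build_questions(chunks)
  result = { questions: [], errors: [] }
  chunks.each_with_index do |chunk, index|
    stem = clean_text(chunk[:stem])
    if stem.empty?
      result[:errors] << { index: index, message: "Chunk #{index + 1}: no question stem found, skipped" }
      next
    end
    result[:questions] << { stimulus: stem, options: Array(chunk[:options]) }
  end
  result
end
```

One malformed question then costs a single warning entry, and the remaining questions in the document are still converted.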
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove commented-out code, consolidate redundant require_relative
statements in exam_soft.rb, and apply safe rubocop auto-corrections
(modifier if/unless, %r regexp literals, safe navigation, etc).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@mpetrowi mpetrowi left a comment

This is getting really close! Apologies for not getting to this earlier, I've had it on my todo list for far too long.

A couple comments, and I'd like to figure out the tag conversion before deploying this.

… <br> children into separate <p> elements; added single-chunk warning

- Fixed to collect multi-line feedback after tilde, stopping at option lines
- Added F, FIB, E, and SA type codes
- Added FITB correct answer detection from option texts
…onverter

- Improved category extraction to handle line-wrapped categories
- Updated title extraction to avoid truncation at parenthetical numbers
- Enhanced FillInTheBlank question type to build stimulus with response placeholders
- Added tests for new functionality in metadata and question stem detectors
- Clarified normalize_to_html method to show how it handles both file paths and file-like objects.
- Updated categories_to_tags method for better key-value extraction.
- Adjusted question handling for Multiple Answer types in the conversion process.
- Modified metadata extraction to support new category parsing logic.
- Updated Fill in the Blank and Multiple Choice question templates for consistency.
- Fixed integration tests to reflect changes in question data structure.
- Added ClozeDropdown class for handling dropdown options in fill-in-the-blank questions.
- Updated question_type for Essay from "longanswer" to "longtext".
- Improved validation structure for FillInTheBlank and Ordering questions.
- Added tests for new ClozeDropdown functionality and updated existing tests for consistency.

@mpetrowi mpetrowi left a comment

The conversion code looks much more capable. I'm not surprised AI is really good at writing that. This is really close.

A few changes:

  • Don't import anything that you're labeling as status draft. Doing that would just accumulate junk in the item bank. If we do need to support a full conversion that could later be fixed by hand I think we would make passage features with the examsoft source in them, so content could be hand-authored inside AA. But that would be a future refinement. The goal is to not have to do that at all.
  • The pandoc-ruby gem needs to move to the gemspec
  • Update the version in the gemspec
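The first request above, skipping anything labelled as a draft, could be handled with a filter along these lines. This is a sketch under assumptions: items are modelled as hashes with `:status` and `:reference` keys, and `importable_items` is a hypothetical helper, not a method from the gem.

```ruby
# Splits converted items into those that should be imported and warnings
# for the ones dropped because they were marked as drafts.
def importable_items(items)
  kept, skipped = items.partition { |item| item[:status].to_s != "draft" }
  warnings = skipped.map { |item| "Skipped draft item #{item[:reference]}" }
  { items: kept, warnings: warnings }
end
```

Reporting the skipped references as warnings keeps the item bank clean while still telling the user which questions need attention in the source file.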

Thanks!

@mpetrowi

The version is here: lib/atomic_assessments_import/version.rb

@mpetrowi mpetrowi left a comment

Looks good! Approved

@mpetrowi mpetrowi merged commit c35d738 into main Feb 26, 2026
1 check passed
2 participants