Merged
30 commits
e4a91fb Add ExamSoft integration and refactor question handling (Bamboo72, Feb 3, 2026)
35baba2 Cleaned up a few comments and code sections. (Bamboo72, Feb 3, 2026)
7d46e2e Update .gitignore, refine Gemfile dependencies, enhance convert scrip… (Bamboo72, Feb 5, 2026)
f882e4a Remove .DS_Store file (Bamboo72, Feb 5, 2026)
0e54db9 Add tests for question loading and utility functions; include sample … (Bamboo72, Feb 6, 2026)
42f73a6 Add tests for DOCX and HTML; include sample documents and error handling (Bamboo72, Feb 6, 2026)
a5b35be Add tests to verify question data structure in DOCX, HTML, and RTF co… (Bamboo72, Feb 6, 2026)
396d891 Refactor convert_to_aa_format to accept dynamic import_from parameter… (Bamboo72, Feb 10, 2026)
6d48ce2 Add design doc for flexible ExamSoft importer (Bamboo72, Feb 13, 2026)
74541f0 feat: add chunker base class and MetadataMarkerStrategy (Bamboo72, Feb 13, 2026)
1b7f111 feat: add NumberedQuestionStrategy for chunking (Bamboo72, Feb 13, 2026)
eae8147 feat: add HeadingSplitStrategy and HorizontalRuleSplitStrategy (Bamboo72, Feb 13, 2026)
944d1d0 feat: add Chunker orchestrator with strategy cascade (Bamboo72, Feb 13, 2026)
33517f2 feat: add core field detectors (stem, options, correct answer) (Bamboo72, Feb 13, 2026)
d46448e feat: add metadata, feedback, and question type detectors (Bamboo72, Feb 13, 2026)
4a97f95 feat: add Extractor orchestrator with field detection pipeline (Bamboo72, Feb 13, 2026)
950e8f2 feat: add Essay and ShortAnswer question types (Bamboo72, Feb 13, 2026)
0939911 feat: add FillInTheBlank, Matching, and Ordering question types (Bamboo72, Feb 13, 2026)
c103dda refactor: rewrite ExamSoft converter to use chunker + extractor pipeline (Bamboo72, Feb 14, 2026)
5c3779d test: add integration tests for mixed types, messy docs, backward compat (Bamboo72, Feb 14, 2026)
8bb5548 chore: cleanup after ExamSoft converter refactor (Bamboo72, Feb 14, 2026)
ea874af Added normalize_html_structure(doc) that splits <p> tags containing… (Bamboo72, Feb 19, 2026)
8e6847f feat: enhance metadata extraction and question handling in ExamSoft c… (Bamboo72, Feb 19, 2026)
242c104 Address review comments: (Bamboo72, Feb 20, 2026)
18bfb28 feat: Ensured all current question types work when imported. (Bamboo72, Feb 22, 2026)
323249e feat: prevent inclusion of items with empty definition.widgets in con… (Bamboo72, Feb 24, 2026)
22ae432 feat: update status handling for unsupported question types and missi… (Bamboo72, Feb 24, 2026)
f2c79f8 feat: expand supported file types and update usage instructions in RE… (Bamboo72, Feb 24, 2026)
58af77f feat: update extractor logic to handle non-published status and add p… (Bamboo72, Feb 25, 2026)
1422b27 Fix CI (mpetrowi, Feb 26, 2026)
3 changes: 3 additions & 0 deletions .github/workflows/github-actions-ci-rspec.yml
@@ -22,6 +22,9 @@ jobs:
ruby-version: ${{ matrix.ruby-version }}
bundler-cache: true

- name: Install pandoc
run: sudo apt-get install -y pandoc

- name: Install dependencies
env:
RAILS_ENV: test
3 changes: 3 additions & 0 deletions .gitignore
@@ -13,3 +13,6 @@

# rspec failure tracking
.rspec_status

# MacOS system files
.DS_Store
4 changes: 3 additions & 1 deletion Gemfile.lock
@@ -1,10 +1,11 @@
PATH
remote: .
specs:
atomic_assessments_import (0.3.0)
atomic_assessments_import (0.4.0)
activesupport
csv
mimemagic
pandoc-ruby (~> 2.1)
rubyzip (~> 3.0)

GEM
@@ -49,6 +50,7 @@ GEM
racc (~> 1.4)
nokogiri (1.18.3-arm64-darwin)
racc (~> 1.4)
pandoc-ruby (2.1.10)
parallel (1.26.3)
parser (3.3.7.1)
ast (~> 2.4.1)
22 changes: 21 additions & 1 deletion README.md
@@ -1,6 +1,14 @@
# Atomic Assessments Import

Import converters for atomic assessments. Currently only CSV multiple choice format is supported by this GEM.
Import converters for atomic assessments. Currently this GEM supports the following export and file types:
* CSV
- Multiple Choice
* ExamSoft (in RTF, HTML, or DOCX file format)
- Multiple Choice
- True/False
- Fill in the Blank / Cloze
- Ordering
- Essay

For QTI conversion, see:

@@ -21,6 +29,14 @@ If bundler is not being used to manage dependencies, install the gem by executin

$ gem install atomic_assessments_import

## Usage
```
Usage: bin/convert <file> <export_path> [converter]
<file> Path to CSV or RTF file to convert
<export_path> Path for output ZIP file
[converter] Which converter to use- 'examsoft' for files coming from ExamSoft, 'csv' for standard CSV files. Defaults to csv if not specified.
```

## Standalone conversion scripts

Convert a CSV to a learnosity archive:
@@ -31,6 +47,10 @@ Convert a CSV to json on standard out:

$ bin/convert_to_json input.csv

Convert an ExamSoft RTF to a learnosity archive:

$ bin/convert input.rtf output.zip examsoft

## CSV input format

All columns are optional except "Option A", "Option B", and "Correct Answer".
3 changes: 2 additions & 1 deletion atomic_assessments_import.gemspec
@@ -5,7 +5,7 @@ require_relative "lib/atomic_assessments_import/version"
Gem::Specification.new do |spec|
spec.name = "atomic_assessments_import"
spec.version = AtomicAssessmentsImport::VERSION
spec.authors = ["Sean Collings", "Matt Petro"]
spec.authors = ["Sean Collings", "Matt Petro", "Jacob Schwartz"]
spec.email = ["support@atomicjolt.com"]

spec.summary = "Importer to Convert different formats to AA's import format"
@@ -37,4 +37,5 @@ Gem::Specification.new do |spec|
spec.add_dependency "csv"
spec.add_dependency "mimemagic"
spec.add_dependency "rubyzip", "~> 3.0"
spec.add_dependency "pandoc-ruby", "~> 2.1"
end
10 changes: 8 additions & 2 deletions bin/convert
@@ -6,11 +6,17 @@ require "atomic_assessments_import"

file = ARGV[0]
export_path = ARGV[1]
converter = ARGV[2]
if file.nil? || export_path.nil?
puts "Usage: convert.rb <file> <export_path>"
puts "Usage: bin/convert <file> <export_path> [converter]"
puts " <file> Path to CSV or RTF file to convert"
puts " <export_path> Path for output ZIP file"
puts " [converter] Which converter to use- 'examsoft' for files coming from ExamSoft, 'csv' for standard CSV files. Defaults to csv if not specified."
exit 1
end

res = AtomicAssessmentsImport.convert(file)
converter ||= "csv"

res = AtomicAssessmentsImport.convert(file, converter)
AtomicAssessmentsImport::Export.create(export_path, res)

127 changes: 127 additions & 0 deletions docs/plans/2026-02-11-flexible-examsoft-importer-design.md
@@ -0,0 +1,127 @@
# Flexible ExamSoft Importer Design

## Problem

The current ExamSoft converter uses rigid regex patterns tied to an assumed export format. Since we don't have real ExamSoft export files and can't confirm the actual format, the importer needs to be flexible enough to handle format variations gracefully.

## Goals

- Handle unknown ExamSoft export formats without breaking
- Support all ExamSoft question types (MCQ, multiple-select, T/F, essay, short answer, fill-in-the-blank, matching, ordering)
- Best-effort import with warnings for unparseable content
- Easy to extend with new chunking strategies and question types

## Pipeline

```
Input File (docx/html/rtf/etc.)
|
v
1. Normalize -- Pandoc converts to HTML, Nokogiri parses to DOM
|
v
2. Chunk -- Split DOM into one chunk per question
Tries multiple strategies, picks best
|
v
3. Extract -- Per chunk: detect question type,
extract fields, build row_mock
|
v
Existing Question pipeline (Questions::Question.load -> to_learnosity)
```

### Stage 1: Normalize

Unchanged from current approach. Pandoc converts any input format to HTML. Nokogiri (already in bundle) parses the HTML into a DOM. All subsequent processing works on DOM nodes and text content, not raw HTML strings.
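A minimal sketch of this stage, assuming the `pandoc` binary is on `PATH`. The gem itself depends on pandoc-ruby (see the gemspec), but this example shells out through Ruby's standard `Open3` so it runs without extra gems. `PANDOC_READERS`, `pandoc_command`, and `normalize_to_html` are hypothetical names. The returned HTML string would then be handed to Nokogiri (for example `Nokogiri::HTML(html)`) for DOM parsing.

```ruby
require "open3"

# Hypothetical extension-to-reader table; "docx", "html", and "rtf" are
# pandoc input format names.
PANDOC_READERS = {
  ".docx" => "docx",
  ".html" => "html",
  ".htm" => "html",
  ".rtf" => "rtf",
}.freeze

# Build the argv for a pandoc call that converts the file and writes HTML to stdout.
def pandoc_command(path)
  reader = PANDOC_READERS.fetch(File.extname(path).downcase) do
    raise ArgumentError, "unsupported input: #{path}"
  end
  ["pandoc", "--from", reader, "--to", "html", path]
end

# Run pandoc (must be installed and on PATH) and return the HTML string.
def normalize_to_html(path)
  html, stderr, status = Open3.capture3(*pandoc_command(path))
  raise "pandoc failed: #{stderr}" unless status.success?

  html
end
```

Building the argument array separately keeps the format detection testable without invoking pandoc.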

### Stage 2: Chunk

The chunker tries multiple splitting strategies in order and picks the first one that produces reasonable results.

**Strategies (in priority order):**

1. Metadata marker split -- split where `Folder:` or `Type:` appears at the start of a paragraph
2. Numbered question split -- split where a paragraph starts with `\d+)` or `\d+.`
3. Heading split -- split on `<h1>`-`<h6>` tags
4. Horizontal rule split -- split on `<hr>` tags

**Scoring:** Each strategy produces candidate chunks. The chunker picks the strategy where the most chunks look "question-like" (contain text followed by lettered/numbered items). A strategy must produce more than one chunk to be considered.

**Exam header:** Content before the first question chunk is treated as a document-level header. Logged for informational purposes (question count, total points, creation date). Can be wired into output later if valuable.

**Extensibility:** Each strategy is a self-contained class with a `split(doc)` method. Adding a new strategy means writing the class and adding it to the list.

If no strategy produces good results, the whole document becomes one chunk and the extractor does its best.
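The scoring, extensibility, and fallback rules above fit in a small orchestrator. In this sketch (names hypothetical; paragraphs are plain strings), a strategy is any object whose `split` returns `[header, chunks]`, a chunk counts as question-like when a non-option paragraph is followed by a lettered option, and ties go to the earlier, higher-priority strategy. Recognizing only lettered options is an assumption that keeps numbered question lines from being mistaken for options.

```ruby
# Hypothetical generic strategy: a new chunk starts at each paragraph
# matching `pattern`; anything before the first match is the exam header.
class RegexSplit
  def initialize(pattern)
    @pattern = pattern
  end

  def split(paragraphs)
    header = []
    chunks = []
    paragraphs.each do |para|
      if para.match?(@pattern)
        chunks << [para]
      elsif chunks.empty?
        header << para
      else
        chunks.last << para
      end
    end
    [header, chunks]
  end
end

class Chunker
  # Lettered option lines such as "a) Paris" or "*B. London".
  OPTION = /\A\s*\*?\s*[A-Za-z][.)]\s/

  def initialize(strategies)
    @strategies = strategies # in priority order
  end

  # Returns [header, chunks] from the best-scoring strategy.
  def chunk(paragraphs)
    best = nil
    best_score = 0
    @strategies.each do |strategy|
      header, chunks = strategy.split(paragraphs)
      next unless chunks.length > 1 # a usable split yields more than one chunk

      score = chunks.count { |c| question_like?(c) }
      # Strict ">" keeps the earlier (higher-priority) strategy on ties.
      if score > best_score
        best = [header, chunks]
        best_score = score
      end
    end
    # Fallback when no strategy scored: the whole document is one chunk.
    best || [[], [paragraphs]]
  end

  private

  # Question-like: a non-option paragraph followed later by an option line.
  def question_like?(chunk)
    first_text = chunk.index { |p| !p.match?(OPTION) }
    return false if first_text.nil?

    chunk[(first_text + 1)..].any? { |p| p.match?(OPTION) }
  end
end
```

For example, `Chunker.new([RegexSplit.new(/\A(?:Folder|Type):/), RegexSplit.new(/\A\d+[.)]\s/)]).chunk(paragraphs)` tries the metadata split first and uses the numbered split only if it scores higher.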

### Stage 3: Extract

The extractor runs independent field detectors against each chunk:

| Detector | What it looks for | Required? |
|------------------|-------------------------------------------------------------------------|------------------------------------|
| QuestionType | "Type:" labels, keywords, or inferred from structure | No (defaults based on structure) |
| QuestionStem | Main question text before options, after numbered prefix | Yes (warns if missing) |
| Options | Lettered/numbered items, bulleted lists | Required for MCQ types |
| CorrectAnswer | `*` prefix, bold, "Answer:" label | Required for MCQ types |
| Metadata | `Folder:`, `Title:`, `Category:` labels (any order) | No |
| Feedback | Text after `~`, or "Explanation:"/"Rationale:" labels | No |
| MatchingPairs | Two parallel lists or table structure | Required for matching type |
| OrderingSequence | Numbered/labeled sequence with correct order indicator | Required for ordering type |

Each detector returns its result or nil. The extractor assembles findings into a `row_mock` hash compatible with the existing `Questions::Question.load` pipeline.
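The detector idea can be sketched as independent functions over one chunk, each returning a value or `nil`, plus an assembly step. The patterns, the `Detectors` module, and the `row_mock` keys (modeled on the CSV columns "Option A", "Correct Answer", and so on) are assumptions, not the gem's actual classes; bold-based answer detection needs the DOM and is omitted.

```ruby
# Sketch of independent field detectors over one chunk (an array of paragraph
# strings). Patterns, names, and row_mock keys are assumptions.
module Detectors
  OPTION = /\A\s*(\*)?\s*([A-Za-z])[.)]\s+(.*)\z/ # "*b) Rome" marks b as correct
  NUMBER_PREFIX = /\A\s*\d+[.)]\s+/
  LABEL = /\A\s*(?:Folder|Title|Category|Type|Answer):/
  METADATA = /\A\s*(Folder|Title|Category):\s*(.*)\z/
  FEEDBACK = /\A\s*(?:~|Explanation:|Rationale:)\s*(.*)\z/
  ANSWER = /\A\s*Answer:\s*/

  module_function

  # Stem: first paragraph that is not a label, option, or feedback line,
  # with any leading "12)" question number removed.
  def stem(chunk)
    line = chunk.find { |p| !p.match?(LABEL) && !p.match?(OPTION) && !p.match?(FEEDBACK) }
    line&.sub(NUMBER_PREFIX, "")&.strip
  end

  def options(chunk)
    found = chunk.filter_map do |p|
      m = p.match(OPTION)
      m && { letter: m[2].upcase, text: m[3].strip, correct: !m[1].nil? }
    end
    found.empty? ? nil : found
  end

  # Correct answer letters: "*"-prefixed options first, then an "Answer: B, D" label.
  def correct_answer(chunk)
    starred = (options(chunk) || []).select { |o| o[:correct] }.map { |o| o[:letter] }
    return starred unless starred.empty?

    label = chunk.find { |p| p.match?(ANSWER) }
    label && label.sub(ANSWER, "").split(",").map { |s| s.strip.upcase }
  end

  # Metadata labels in any order, e.g. { "Folder" => "Geography" }.
  def metadata(chunk)
    pairs = chunk.filter_map do |p|
      m = p.match(METADATA)
      m && [m[1], m[2].strip]
    end
    pairs.empty? ? nil : pairs.to_h
  end

  def feedback(chunk)
    line = chunk.find { |p| p.match?(FEEDBACK) }
    line && line.match(FEEDBACK)[1].strip
  end

  # Assemble a row_mock hash; keys mirror the CSV columns (assumed), nils dropped.
  def row_mock(chunk)
    row = { "Question" => stem(chunk) }
    (options(chunk) || []).each { |o| row["Option #{o[:letter]}"] = o[:text] }
    row["Correct Answer"] = correct_answer(chunk)
    row["Feedback"] = feedback(chunk)
    row.merge!(metadata(chunk) || {})
    row.compact
  end
end
```

Because every detector reads the chunk independently, a missing field only leaves its key out of the row instead of derailing the others.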

## Question Type Mapping

| ExamSoft Type | Question Class | Learnosity type | Notes |
|-------------------|-----------------------------|-----------------|---------------------------------------------|
| Multiple Choice | MultipleChoice (existing) | mcq | Single response |
| Multiple Select | MultipleChoice (existing) | mcq | `multiple_responses: true` |
| True/False | MultipleChoice (existing) | mcq | Two options (True/False) |
| Essay | Essay (new) | longanswer | Optional word limit, sample answer |
| Short Answer | ShortAnswer (new) | shorttext | Expected answer(s) |
| Fill in the Blank | FillInTheBlank (new) | cloze | Text with blanks, accepted answers per blank|
| Matching | Matching (new) | association | Two lists of items to pair |
| Ordering | Ordering (new) | orderlist | Items with correct sequence |

**Future types (out of scope):** Drag and drop, hotspot, numeric/formula, matrix/grid, NGN types (bowtie). When encountered, these are imported best-effort as draft items with a warning.
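A lookup for the mapping table might look like this sketch. The normalized label spellings are guesses at what a `Type:` line contains, and `TYPE_MAP` and `learnosity_type_for` are hypothetical names. Unknown types yield a warning so the caller can import them as drafts.

```ruby
# Sketch: map an ExamSoft "Type:" label to a Learnosity type per the mapping table.
# The label spellings are assumptions; real exports may word them differently.
TYPE_MAP = {
  "multiple choice" => { type: "mcq" },
  "multiple select" => { type: "mcq", multiple_responses: true },
  "true/false" => { type: "mcq" },
  "essay" => { type: "longanswer" },
  "short answer" => { type: "shorttext" },
  "fill in the blank" => { type: "cloze" },
  "matching" => { type: "association" },
  "ordering" => { type: "orderlist" },
}.freeze

# Returns [mapping, warning]. Unknown types return a nil mapping plus a warning,
# leaving the caller to import the item best-effort as a draft.
def learnosity_type_for(label)
  # Normalize case, spacing around "/", and repeated spaces.
  key = label.to_s.strip.downcase.gsub(%r{\s*/\s*}, "/").squeeze(" ")
  mapping = TYPE_MAP[key]
  return [mapping, nil] if mapping

  [nil, "Unsupported question type #{label.inspect}; imported as draft"]
end
```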

## Error Handling

**Approach:** Best-effort throughout. Never fail the whole import due to one bad question.

**Warning/error levels:**

- **Info** -- exam header metadata (logged, not surfaced)
- **Warning** -- missing optional fields, unsupported question type imported as draft
- **Error** -- chunk with no usable content, skipped entirely

**Item status based on parse completeness:**

- Fully parsed -> `status: "published"`
- Partially parsed (missing required fields or unsupported type) -> `status: "draft"`
- Completely unparseable -> skipped, error logged

All warnings and errors are collected in the output `:errors` array, each tagged with its chunk identifier.
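The status rules can be expressed as one pass over parsed chunks. This is a hypothetical sketch: each chunk is assumed to be a hash of detector results with a `:type` key (`nil` for unsupported types), and the required-field lists per type are drawn from the detector table rather than from the gem's code.

```ruby
# Hypothetical required fields per Learnosity type (from the detector table);
# types not listed only need a stem.
REQUIRED_FIELDS = {
  "mcq" => %i[stem options correct_answer],
  "association" => %i[stem matching_pairs],
  "orderlist" => %i[stem ordering_sequence],
}.freeze
DEFAULT_REQUIRED = %i[stem].freeze

def blank?(value)
  value.nil? || (value.respond_to?(:empty?) && value.empty?)
end

# Returns { items:, errors: }. Each error entry carries a level and a chunk id.
def assign_statuses(chunks)
  items = []
  errors = []
  chunks.each_with_index do |fields, i|
    id = "chunk #{i + 1}"
    content = fields.reject { |key, _| key == :type }

    # Error: nothing usable in the chunk, so skip it entirely.
    if content.values.all? { |v| blank?(v) }
      errors << { level: :error, chunk: id, message: "no usable content; skipped" }
      next
    end

    warnings = []
    warnings << "unsupported question type" if fields[:type].nil?
    missing = REQUIRED_FIELDS.fetch(fields[:type], DEFAULT_REQUIRED).select { |f| blank?(content[f]) }
    warnings << "missing #{missing.join(', ')}" unless missing.empty?
    warnings.each { |w| errors << { level: :warning, chunk: id, message: w } }

    # Published only when fully parsed; anything partial becomes a draft.
    items << fields.merge(status: warnings.empty? ? "published" : "draft")
  end
  { items: items, errors: errors }
end
```

One bad chunk only adds an entry to `errors`; the loop always continues, which matches the "never fail the whole import" rule.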

## Dependencies

- **Nokogiri** -- already in bundle (v1.18.3), used for DOM parsing of Pandoc HTML output
- **Pandoc** -- already used, unchanged
- No new external dependencies

## Testing Strategy

**Fixture-based tests:**
- Existing fixtures (simple.docx, simple.html, simple.rtf) for backward compatibility
- New fixtures for each question type
- "Messy" fixtures: missing fields, mixed types, exam headers, unexpected formatting

**Unit tests:**
- Chunker strategies tested independently
- Field detectors tested independently
- New question type classes tested same as MultipleChoice

**Integration tests:**
- Full pipeline: file in -> items + questions + warnings out
- Partial-parse scenarios: document with N questions where some are unparseable