atomicjolt · mpetrowi · Feb 26, 2026 · Feb 3, 2026 · Feb 3, 2026 · Feb 5, 2026
diff --git a/.github/workflows/github-actions-ci-rspec.yml b/.github/workflows/github-actions-ci-rspec.yml
@@ -22,6 +22,9 @@ jobs:
           ruby-version: ${{ matrix.ruby-version }}
           bundler-cache: true
 
+      - name: Install pandoc
+        run: sudo apt-get install -y pandoc
+
       - name: Install dependencies
         env:
           RAILS_ENV: test

diff --git a/.gitignore b/.gitignore
@@ -13,3 +13,6 @@
 
 # rspec failure tracking
 .rspec_status
+
+# MacOS system files
+.DS_Store
diff --git a/Gemfile.lock b/Gemfile.lock
@@ -1,10 +1,11 @@
 PATH
   remote: .
   specs:
-    atomic_assessments_import (0.3.0)
+    atomic_assessments_import (0.4.0)
       activesupport
       csv
       mimemagic
+      pandoc-ruby (~> 2.1)
       rubyzip (~> 3.0)
 
 GEM
@@ -49,6 +50,7 @@ GEM
       racc (~> 1.4)
     nokogiri (1.18.3-arm64-darwin)
       racc (~> 1.4)
+    pandoc-ruby (2.1.10)
     parallel (1.26.3)
     parser (3.3.7.1)
       ast (~> 2.4.1)

diff --git a/README.md b/README.md
@@ -1,6 +1,14 @@
 # Atomic Assessments Import
 
-Import converters for atomic assessments.  Currently only CSV multiple choice format is supported by this GEM. 
+Import converters for atomic assessments. This gem currently supports the following export and file types:
+* CSV
+    - Multiple Choice
+* ExamSoft (in RTF, HTML, or DOCX file format)
+    - Multiple Choice
+    - True/False
+    - Fill in the Blank / Cloze
+    - Ordering
+    - Essay
 
 For QTI conversion, see:
 
@@ -21,6 +29,14 @@ If bundler is not being used to manage dependencies, install the gem by executin
 
     $ gem install atomic_assessments_import
 
+## Usage
+```
+Usage: bin/convert <file> <export_path> [converter]
+  <file>          Path to the CSV, RTF, HTML, or DOCX file to convert
+  <export_path>   Path for output ZIP file
+  [converter]     Which converter to use: 'examsoft' for files exported from ExamSoft, 'csv' for standard CSV files. Defaults to 'csv'.
+```
+
 ## Standalone conversion scripts
 
 Convert a CSV to a learnosity archive:
@@ -31,6 +47,10 @@ Convert a CSV to json on standard out:
 
     $ bin/convert_to_json input.csv
 
+Convert an ExamSoft RTF to a learnosity archive:
+
+    $ bin/convert input.rtf output.zip examsoft
+
 ## CSV input format
 
 All columns are optional execpt "Option A", "Option B", and "Correct Answer".

diff --git a/atomic_assessments_import.gemspec b/atomic_assessments_import.gemspec
@@ -5,7 +5,7 @@ require_relative "lib/atomic_assessments_import/version"
 Gem::Specification.new do |spec|
   spec.name = "atomic_assessments_import"
   spec.version = AtomicAssessmentsImport::VERSION
-  spec.authors = ["Sean Collings", "Matt Petro"]
+  spec.authors = ["Sean Collings", "Matt Petro", "Jacob Schwartz"]
   spec.email = ["support@atomicjolt.com"]
 
   spec.summary = "Importer to Convert different formats to AA's import format"
@@ -37,4 +37,5 @@ Gem::Specification.new do |spec|
   spec.add_dependency "csv"
   spec.add_dependency "mimemagic"
   spec.add_dependency "rubyzip", "~> 3.0"
+  spec.add_dependency "pandoc-ruby", "~> 2.1"
 end
diff --git a/bin/convert b/bin/convert
@@ -6,11 +6,17 @@ require "atomic_assessments_import"
 
 file = ARGV[0]
 export_path = ARGV[1]
+converter = ARGV[2]
 if file.nil? || export_path.nil?
-  puts "Usage: convert.rb <file> <export_path>"
+  puts "Usage: bin/convert <file> <export_path> [converter]"
+  puts "  <file>          Path to the CSV, RTF, HTML, or DOCX file to convert"
+  puts "  <export_path>   Path for output ZIP file"
+  puts "  [converter]     Which converter to use: 'examsoft' for files exported from ExamSoft, 'csv' for standard CSV files. Defaults to 'csv'."
   exit 1
 end
 
-res = AtomicAssessmentsImport.convert(file)
+converter ||= "csv"
+
+res = AtomicAssessmentsImport.convert(file, converter)
 AtomicAssessmentsImport::Export.create(export_path, res)
 
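The optional `converter` argument handled in `bin/convert` above could be validated before dispatch. The following is a minimal sketch, not code from the gem: `KNOWN_CONVERTERS` and `parse_args` are hypothetical names, and the only values assumed valid are the two the usage text documents (`csv` and `examsoft`).

```ruby
# Hypothetical helper for illustration; not part of atomic_assessments_import.
# The list below mirrors the two converter names documented in the usage text.
KNOWN_CONVERTERS = %w[csv examsoft].freeze

# Returns a hash of parsed arguments, or nil when a required argument is missing.
# Raises ArgumentError for a converter name the usage text does not document.
def parse_args(argv)
  file, export_path, converter = argv
  return nil if file.nil? || export_path.nil?

  converter ||= "csv" # same default as bin/convert
  unless KNOWN_CONVERTERS.include?(converter)
    raise ArgumentError,
          "unknown converter '#{converter}' (expected one of: #{KNOWN_CONVERTERS.join(', ')})"
  end

  { file: file, export_path: export_path, converter: converter }
end
```

Failing early on an unrecognized name keeps a typo such as `examsfot` from surfacing later as a confusing parse error inside a converter.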
diff --git a/docs/plans/2026-02-11-flexible-examsoft-importer-design.md b/docs/plans/2026-02-11-flexible-examsoft-importer-design.md
@@ -0,0 +1,127 @@
+# Flexible ExamSoft Importer Design
+
+## Problem
+
+The current ExamSoft converter uses rigid regex patterns tied to an assumed export format. Since we don't have real ExamSoft export files and can't confirm the actual format, the importer needs to be flexible enough to handle format variations gracefully.
+
+## Goals
+
+- Handle unknown ExamSoft export formats without breaking
+- Support all ExamSoft question types (MCQ, multiple-select, T/F, essay, short answer, fill-in-the-blank, matching, ordering)
+- Best-effort import with warnings for unparseable content
+- Easy to extend with new chunking strategies and question types
+
+## Pipeline
+
+```
+Input File (docx/html/rtf/etc.)
+    |
+    v
+1. Normalize -- Pandoc converts to HTML, Nokogiri parses to DOM
+    |
+    v
+2. Chunk -- Split DOM into one chunk per question
+             Tries multiple strategies, picks best
+    |
+    v
+3. Extract -- Per chunk: detect question type,
+              extract fields, build row_mock
+    |
+    v
+Existing Question pipeline (Questions::Question.load -> to_learnosity)
+```
+
+### Stage 1: Normalize
+
+Unchanged from current approach. Pandoc converts any input format to HTML. Nokogiri (already in bundle) parses the HTML into a DOM. All subsequent processing works on DOM nodes and text content, not raw HTML strings.
+
+### Stage 2: Chunk
+
+The chunker tries multiple splitting strategies in priority order and keeps the one that scores best (see **Scoring** below).
+
+**Strategies (in priority order):**
+
+1. Metadata marker split -- split where `Folder:` or `Type:` appears at the start of a paragraph
+2. Numbered question split -- split where a paragraph starts with a number followed by `)` or `.` (regex `\d+\)` or `\d+\.`, e.g. `1)` or `1.`)
+3. Heading split -- split on `<h1>`-`<h6>` tags
+4. Horizontal rule split -- split on `<hr>` tags
+
+**Scoring:** Each strategy produces candidate chunks. The chunker picks the strategy whose output has the most "question-like" chunks (text followed by lettered or numbered items). A strategy must produce more than one chunk to be eligible.
+
+**Exam header:** Content before the first question chunk is treated as a document-level header. It is logged for informational purposes (question count, total points, creation date) and can be wired into the output later if that proves valuable.
+
+**Extensibility:** Each strategy is a self-contained class with a `split(doc)` method. Adding a new strategy means writing the class and adding it to the list.
+
+If no strategy produces good results, the whole document becomes one chunk and the extractor does its best.
+
+### Stage 3: Extract
+
+The extractor runs independent field detectors against each chunk:
+
+| Detector         | What it looks for                                                       | Required?                          |
+|------------------|-------------------------------------------------------------------------|------------------------------------|
+| QuestionType     | "Type:" labels, keywords, or inferred from structure                    | No (defaults based on structure)   |
+| QuestionStem     | Main question text before options, after numbered prefix                | Yes (warns if missing)             |
+| Options          | Lettered/numbered items, bulleted lists                                 | Required for MCQ types             |
+| CorrectAnswer    | `*` prefix, bold, "Answer:" label                                       | Required for MCQ types             |
+| Metadata         | `Folder:`, `Title:`, `Category:` labels (any order)                     | No                                 |
+| Feedback         | Text after `~`, or "Explanation:"/"Rationale:" labels                   | No                                 |
+| MatchingPairs    | Two parallel lists or table structure                                   | Required for matching type         |
+| OrderingSequence | Numbered/labeled sequence with correct order indicator                  | Required for ordering type         |
+
+Each detector returns its result or nil. The extractor assembles findings into a `row_mock` hash compatible with the existing `Questions::Question.load` pipeline.
+
+## Question Type Mapping
+
+| ExamSoft Type     | Question Class              | Learnosity type | Notes                                      |
+|-------------------|-----------------------------|-----------------|---------------------------------------------|
+| Multiple Choice   | MultipleChoice (existing)   | mcq             | Single response                             |
+| Multiple Select   | MultipleChoice (existing)   | mcq             | `multiple_responses: true`                  |
+| True/False        | MultipleChoice (existing)   | mcq             | Two options (True/False)                    |
+| Essay             | Essay (new)                 | longanswer      | Optional word limit, sample answer          |
+| Short Answer      | ShortAnswer (new)           | shorttext       | Expected answer(s)                          |
+| Fill in the Blank | FillInTheBlank (new)        | cloze           | Text with blanks, accepted answers per blank |
+| Matching          | Matching (new)              | association     | Two lists of items to pair                  |
+| Ordering          | Ordering (new)              | orderlist       | Items with correct sequence                 |
+
+**Future types (out of scope):** Drag and drop, hotspot, numeric/formula, matrix/grid, NGN types (bowtie). When encountered, these are imported best-effort as draft items with a warning.
+
+## Error Handling
+
+**Approach:** Best-effort throughout. Never fail the whole import due to one bad question.
+
+**Warning/error levels:**
+
+- **Info** -- exam header metadata (logged, not surfaced)
+- **Warning** -- missing optional fields, unsupported question type imported as draft
+- **Error** -- chunk with no usable content, skipped entirely
+
+**Item status based on parse completeness:**
+
+- Fully parsed -> `status: "published"`
+- Partially parsed (missing required fields or unsupported type) -> `status: "draft"`
+- Completely unparseable -> skipped, error logged
+
+All warnings and errors are collected in the output `:errors` array with chunk identifiers.
+
+## Dependencies
+
+- **Nokogiri** -- already in bundle (v1.18.3), used for DOM parsing of Pandoc HTML output
+- **Pandoc** -- already used, unchanged
+- No new external dependencies
+
+## Testing Strategy
+
+**Fixture-based tests:**
+- Existing fixtures (simple.docx, simple.html, simple.rtf) for backward compatibility
+- New fixtures for each question type
+- "Messy" fixtures: missing fields, mixed types, exam headers, unexpected formatting
+
+**Unit tests:**
+- Chunker strategies tested independently
+- Field detectors tested independently
+- New question type classes tested same as MultipleChoice
+
+**Integration tests:**
+- Full pipeline: file in -> items + questions + warnings out
+- Partial-parse scenarios: document with N questions where some are unparseable
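
The Stage 2 chunker described in the design doc above can be sketched roughly as follows. This is an illustration under stated assumptions, not the gem's implementation: it works on an array of paragraph strings instead of Nokogiri nodes to stay dependency-free, it covers only the two text-based strategies (the heading and `<hr>` splits need a DOM), and every name in it (`PatternSplit`, `STRATEGIES`, `OPTION`, `question_like?`, `chunk`) is hypothetical. Ties are resolved in favor of the higher-priority strategy, which is one possible reading of "priority order".

```ruby
# Hypothetical sketch of Stage 2 (Chunk); names are illustrative only.

# Starts a new chunk at every paragraph matching `marker`. Paragraphs before
# the first match are returned separately as the exam header.
class PatternSplit
  def initialize(marker)
    @marker = marker
  end

  def split(paragraphs)
    header = []
    chunks = []
    paragraphs.each do |para|
      if para.match?(@marker)
        chunks << [para]
      elsif chunks.empty?
        header << para
      else
        chunks.last << para
      end
    end
    [header, chunks]
  end
end

# Strategies in priority order (assumed patterns).
STRATEGIES = [
  PatternSplit.new(/\A(Folder|Type):/), # metadata marker split
  PatternSplit.new(/\A\d+[.)]\s/)       # numbered question split
].freeze

# Assumed shape of an answer option: optional "*" then a letter and ")" or ".".
OPTION = /\A\*?[A-Za-z][.)]\s/

# "Question-like": a first paragraph followed by at least two lettered items.
def question_like?(chunk)
  chunk.drop(1).count { |para| para.match?(OPTION) } >= 2
end

# Picks the strategy with the most question-like chunks. A strategy must yield
# more than one chunk; on ties the earlier (higher-priority) strategy wins.
def chunk(paragraphs)
  best = nil
  STRATEGIES.each do |strategy|
    header, chunks = strategy.split(paragraphs)
    next unless chunks.length > 1

    score = chunks.count { |c| question_like?(c) }
    best = { header: header, chunks: chunks, score: score } if best.nil? || score > best[:score]
  end
  # Fallback from the design doc: the whole document becomes one chunk.
  best || { header: [], chunks: [paragraphs], score: 0 }
end
```

Calling `chunk` on the paragraphs of a numbered exam returns the leading text as `:header` and one array of paragraphs per question in `:chunks`. Input that no strategy can split comes back as a single chunk for the extractor to handle.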