This project documents the complete process of extracting content from a Google Form and converting it into a fully functional XLSForm for KoboToolbox deployment.
Original Source: Google Forms pre-course survey for Data Science for OpenWashData (DS4OWD) course iteration 002
Target Output: KoboToolbox-compatible XLSForm with complete content preservation
- Extracted basic survey structure from Google Form
- Created initial Quarto document for review
- Generated baseline CSV files for KoboToolbox
- Created project documentation (CLAUDE.md)
- Critical Discovery: Initial extraction missed significant content
- Added complete technical skills questions (programming languages, Git, IDEs, LLMs)
- Expanded the LLM tool choice list from 23 to 34 options
- Added detailed programming experience questions
- Expanded IDE options from 5 to 12 environments
- Created Python script to generate Excel files from CSV
- Major Challenge: CSV formatting issues with commas in text fields
- Fixed column count mismatches (21 header vs 22 data columns)
- Resolved group syntax issues (begin_group → "begin group")
- Changed acknowledge question type to select_one with yes_only choices
- Added full Google Form description (course details, signup steps)
- Integrated all response examples from detailed .qmd analysis
- Enhanced education level choices with specific examples
- Added platform-specific CLI descriptions
- Included detailed LLM task examples with real use cases
- Resolved multiline text formatting issues
- Fixed XPath expression errors in instance naming
- Eliminated deployment validation errors
- Final compatibility testing and verification
Challenge: Initial extraction missed substantial content including detailed examples and technical question categories.
Solution: Implemented an iterative verification process that compares the Google Form content with the generated files. Created an intermediate .qmd file to capture all content for systematic review.
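The comparison step can be sketched roughly as follows. This is a minimal illustration, not the project's actual tooling: the function names and the sample labels are hypothetical, and it assumes the question labels from both sides have already been collected into plain lists of strings.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so cosmetic differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_missing(source_items, generated_items):
    """Return source items that have no normalized match in the generated items."""
    generated = {normalize(item) for item in generated_items}
    return [item for item in source_items if normalize(item) not in generated]


# Hypothetical labels, for illustration only
source_labels = ["What is your  GitHub username?", "Which IDEs have you used?"]
generated_labels = ["what is your github username?"]
print(find_missing(source_labels, generated_labels))
```

Running a check like this after each extraction pass gives a concrete list of items to revisit, rather than relying on a visual comparison alone.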
Challenge: Text fields containing commas, quotes, and special characters broke CSV parsing.
Solution: Implemented proper CSV quoting standards, escaped special characters, and added column count validation to prevent structure mismatches.
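A minimal sketch of both ideas, using only Python's standard `csv` module. The helper names and the sample row are hypothetical; the project's own script may handle this differently.

```python
import csv
import io


def write_rows(header, rows):
    """Serialize rows to CSV text, quoting fields that contain commas, quotes, or newlines."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def check_column_counts(csv_text):
    """Return (line_number, column_count) for each data row whose width differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return [(reader.line_num, len(row)) for row in reader if len(row) != len(header)]


header = ["type", "name", "label"]
label = "Bachelor's degree (e.g. BA, BSc, BEng)"

# Written through csv.writer, the label is quoted and stays in one column.
quoted = write_rows(header, [["text", "edu", label]])
print(check_column_counts(quoted))

# Written by naive string joining, the commas in the label split it into extra columns.
unquoted = ",".join(header) + "\n" + ",".join(["text", "edu", label]) + "\n"
print(check_column_counts(unquoted))
```

Letting the `csv` module do the quoting avoids hand-escaping, and the column count check flags any row that still ends up wider or narrower than the header.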
Challenge: KoboToolbox has strict requirements for group syntax, question types, and XPath expressions.
Solution:
- Changed begin_group/end_group to begin group/end group
- Replaced unsupported acknowledge type with select_one yes_only
- Simplified instance naming to avoid XPath validation errors
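The type substitutions listed above could be applied in one pass over the survey rows. The sketch below assumes each row is a dictionary with a `type` key, as `csv.DictReader` would produce; the mapping and function names are hypothetical.

```python
# Assumed mapping, taken from the fixes described in this section
TYPE_FIXES = {
    "begin_group": "begin group",
    "end_group": "end group",
    "acknowledge": "select_one yes_only",
}


def fix_type(value: str) -> str:
    """Return the corrected type, or the stripped original if no fix applies."""
    cleaned = value.strip()
    return TYPE_FIXES.get(cleaned, cleaned)


def fix_survey_rows(rows):
    """Return copies of the survey rows with their 'type' column corrected."""
    return [{**row, "type": fix_type(row["type"])} for row in rows]


# Hypothetical rows, for illustration only
rows = [
    {"type": "begin_group", "name": "personal"},
    {"type": "acknowledge", "name": "consent"},
    {"type": "text", "name": "email"},
]
for row in fix_survey_rows(rows):
    print(row["type"], row["name"])
```

Note that `select_one yes_only` only works if the choices sheet also defines a `yes_only` list.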
Challenge: Questions with multiline descriptions (e.g., CLI usage) caused parsing errors in KoboToolbox.
Solution: Converted multiline content to single-line format while preserving all information about platform-specific differences.
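One way to do this conversion is sketched below. The function name, the separator, and the sample text are assumptions for illustration; the actual survey wording is not reproduced here.

```python
import re


def to_single_line(text: str, separator: str = " ") -> str:
    """Join non-empty lines into one line, collapsing runs of whitespace within each line."""
    lines = [re.sub(r"\s+", " ", line).strip() for line in text.splitlines()]
    return separator.join(line for line in lines if line)


# Hypothetical multiline hint, for illustration only
hint = """Windows: PowerShell

  macOS:   Terminal
Linux: shell"""
print(to_single_line(hint, separator="; "))
```

Using an explicit separator such as `"; "` keeps the platform-specific entries distinguishable once the line breaks are gone.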
pre-course-survey/
├── README.md # This file - complete documentation
├── CLAUDE.md # Development guidelines and lessons learned
├── pre-course-survey.Rproj # R project configuration
├── forms/ # Final survey forms and review documents
│   ├── ds4owd_precourse_survey.xlsx # **Final XLSForm for KoboToolbox**
│   └── survey-content.qmd # Human-readable content for review
├── data/ # Source data and configuration files
│   ├── survey-questions.csv # Survey structure and questions
│   ├── survey-choices.csv # All choice lists with examples
│   ├── survey-settings.csv # Form metadata and description
│   └── countries-iso3c.csv # Complete country reference
└── scripts/ # Development and build tools
    └── create_xlsform.py # Python script to generate XLSForm
- Production Files: Ready-to-deploy survey forms
- Review Documents: Human-readable content for collaboration
- Source Files: CSV components for XLSForm generation
- Reference Data: Supporting data like country codes
- Build Tools: Automation scripts for form generation
- Utilities: Development and validation helpers
- Personal Information: 6 questions (GitHub, ORCID, email, name, country)
- Education & Employment: 4 questions with detailed examples
- Barriers to Participation: 6 barrier assessment questions
- Technical Experience: 15 questions covering programming, tools, and platforms
- Project Participation: 3 questions about goals and mentorship
- Agreements: 2 consent/acknowledgment questions
- Countries: 196 countries with ISO3c codes
- Programming Languages: 23 languages and tools
- LLM Platforms: 34 AI tools and platforms
- IDE Options: 12 development environments
- Education Levels: 8 levels with detailed examples
- Complete Description: Full course information, meeting schedule, signup instructions
- Detailed Examples: Every choice includes relevant examples (e.g., "Bachelor's degree (e.g. BA, BSc, BEng)")
- Platform Specifics: OS-specific instructions for CLI usage
- Use Case Examples: Real examples for each LLM task category
- Go to KoboToolbox
- Create new project
- Upload forms/ds4owd_precourse_survey.xlsx
- Deploy form
# Regenerate XLSForm from CSV files
python3 scripts/create_xlsform.py
# Render survey content for review
quarto render forms/survey-content.qmd
- Always verify completeness - Initial extraction often misses nuanced content
- Use intermediate formats - .qmd files help verify content preservation
- Cross-reference systematically - Compare original with generated files section by section
- Test early and often - KoboToolbox validation catches issues not visible in Excel
- Handle special characters carefully - Proper CSV quoting is essential
- Keep XPath expressions simple - Complex expressions often fail validation
- Use supported question types - Stick to documented KoboToolbox question types
- Document every iteration - Tracking development history helps with debugging
- Commit frequently - Small, focused commits make debugging easier
- Version control everything - Including intermediate files during development
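In the spirit of testing early, some structural problems can be caught locally before uploading. The sketch below is one such check and is not part of the project's scripts: it assumes survey rows carry a `type` column and choice rows a `list_name` column, as in the XLSForm layout, and all function names are hypothetical.

```python
def referenced_lists(survey_rows):
    """Collect the choice list names used by select_one / select_multiple questions."""
    lists = set()
    for row in survey_rows:
        parts = row.get("type", "").split()
        if len(parts) >= 2 and parts[0] in {"select_one", "select_multiple"}:
            lists.add(parts[1])
    return lists


def undefined_lists(survey_rows, choice_rows):
    """Return referenced list names that never appear in the choices sheet, sorted."""
    defined = {row.get("list_name", "").strip() for row in choice_rows}
    return sorted(referenced_lists(survey_rows) - defined)


# Hypothetical rows, for illustration only
survey = [
    {"type": "select_one yes_only", "name": "consent"},
    {"type": "select_multiple ide", "name": "ides_used"},
    {"type": "text", "name": "email"},
]
choices = [{"list_name": "yes_only", "name": "yes", "label": "Yes"}]
print(undefined_lists(survey, choices))
```

A check like this does not replace KoboToolbox validation; it only shortens the feedback loop for one common class of error.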
Source Format: Google Forms
Intermediate Format: Quarto Markdown (.qmd) + CSV files
Target Format: XLSForm (.xlsx)
Validation: KoboToolbox ODK Validate
Dependencies: Python 3, pandas, openpyxl
- Files Generated: 10+ including documentation
- Questions Captured: 36 across 6 categories
- Choice Options: 300+ with detailed examples
This project demonstrates the complexity of accurately preserving survey content while adapting to different platform requirements, highlighting the importance of systematic verification and iterative refinement in form migration projects.