feat: add legacy .doc file conversion support #1497
base: main
Add support for converting legacy .doc files to .docx format using a
conversion server powered by LibreOffice.
Changes:
- Add DOC type constant to document-types.ts
- Create documentConverter.js helper module for conversion logic
- Update SuperDoc.js with automatic .doc detection and conversion
- Add conversion events (onConversionStart, onConversionComplete, onConversionError)
- Add modules.conversion config option for server URL
- Update file.js to detect .doc files
- Update BasicUpload.vue to accept .doc files
- Add conversion-server example with Docker support
Usage:
```javascript
const superdoc = new SuperDoc({
  document: docFile,
  modules: {
    conversion: {
      serverUrl: 'http://localhost:3001',
    },
  },
});
```
Closes superdoc-dev#1019
Pull request overview
This PR adds support for converting legacy .doc files to .docx format, enabling SuperDoc to work with older Word document formats. The implementation includes a LibreOffice-based conversion server and seamless client-side integration with automatic conversion detection and event handling.
Key Changes:
- Adds `.doc` MIME type constant and file detection logic throughout the codebase
- Implements a conversion helper module with timeout handling and error management
- Integrates automatic `.doc` to `.docx` conversion into SuperDoc's initialization flow
- Provides a Docker-based conversion server using LibreOffice for reliable document conversion
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| `shared/common/document-types.ts` | Adds DOC MIME type constant and includes it in DocumentType union |
| `shared/common/components/BasicUpload.vue` | Updates file upload component to accept .doc files |
| `packages/superdoc/src/dev/components/SuperdocDev.vue` | Adds conversion UI dialogs, state management, and event handlers for development |
| `packages/superdoc/src/core/helpers/file.js` | Adds .doc extension detection in file type inference |
| `packages/superdoc/src/core/helpers/documentConverter.js` | New module providing conversion logic, server communication, and utilities |
| `packages/superdoc/src/core/SuperDoc.js` | Integrates automatic conversion into document initialization with dual event emission pattern |
| `examples/conversion-server/server.js` | Express server implementing LibreOffice-based conversion with file upload handling |
| `examples/conversion-server/package.json` | Dependencies and scripts for the conversion server |
| `examples/conversion-server/docker-compose.yml` | Docker Compose configuration with health checks and resource limits |
| `examples/conversion-server/README.md` | Comprehensive documentation for setup, usage, and troubleshooting |
| `examples/conversion-server/Dockerfile` | Multi-stage Docker build with LibreOffice and security best practices |
| `examples/conversion-server/.gitignore` | Standard ignore patterns for Node.js projects |
| `examples/conversion-server/.dockerignore` | Excludes unnecessary files from Docker build context |
```vue
<div class="doc-conversion-error">
  {{ conversionError }}
</div>
<p style="margin-top: 12px">Make sure the conversion server is running at {{ CONVERSION_SERVER_URL }}</p>
```
Copilot (AI), Dec 12, 2025:
The CONVERSION_SERVER_URL is hardcoded and displayed in user-facing error messages. If this value could be derived from user input or environment variables in the future, ensure proper sanitization to prevent XSS attacks through the template interpolation on line 680.
```javascript
if (!file) return false;

// Check by MIME type
if (file.type === DOC || file.type === 'application/msword') {
```
Copilot (AI), Dec 12, 2025:
The MIME type check on line 27 redundantly checks for both the `DOC` constant (which equals `'application/msword'`) and the literal string `'application/msword'`. Since `DOC` is already defined as `'application/msword'`, the second condition is unnecessary; remove it for cleaner code.
Suggested change:
```diff
-if (file.type === DOC || file.type === 'application/msword') {
+if (file.type === DOC) {
```
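A minimal sketch of the detection helper once the suggestion is applied. The constant values are the standard MIME types, but the `DOCX` constant, the name `isDocFile`, and the extension fallback are assumptions for illustration rather than the PR's actual `file.js`. The fallback matters because `File.type` is an empty string when the browser cannot determine a type.

```javascript
// Hypothetical stand-ins for the constants in shared/common/document-types.ts.
const DOC = 'application/msword';
const DOCX = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document';

/**
 * Return true when `file` (a File-like object with `type` and `name`)
 * looks like a legacy Word .doc file.
 */
function isDocFile(file) {
  if (!file) return false;

  // Check by MIME type: DOC already equals 'application/msword', so one comparison suffices.
  if (file.type === DOC) return true;

  // A file the browser already reports as .docx never needs conversion.
  if (file.type === DOCX) return false;

  // Fall back to the extension, since File.type is '' when the browser cannot infer it.
  return typeof file.name === 'string' && /\.doc$/i.test(file.name);
}
```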
```javascript
const originalName = req.file.originalname.replace(/\.doc$/i, '.docx');
res.setHeader('Content-Type', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document');
res.setHeader('Content-Disposition', `attachment; filename="${originalName}"`);
```
Copilot (AI), Dec 12, 2025:
The filename produced by the regex replacement is not escaped before it is placed in the Content-Disposition header, which could lead to HTTP header injection if the original filename contains newline characters or other special characters. Use a proper header-value escaping function or validate the filename before using it.
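One way to address this, sketched as a hypothetical helper (`contentDispositionFor` is not part of the PR). Node's `res.setHeader()` already throws `ERR_INVALID_CHAR` for CR/LF, so an unsanitized name tends to produce a failed request rather than an injected header; the remaining problems are embedded double quotes and names outside Latin-1. The sketch emits an ASCII `filename` fallback plus an RFC 6266 `filename*` parameter carrying the UTF-8 name.

```javascript
/**
 * Hypothetical helper: build a Content-Disposition value for the converted file.
 * `originalName` is the client-supplied upload name (req.file.originalname).
 */
function contentDispositionFor(originalName) {
  // Keep only the last path segment and swap a trailing .doc for .docx.
  const base = String(originalName).split(/[\\/]/).pop() || 'document.doc';
  const docxName = base.replace(/\.doc$/i, '.docx');

  // Control characters (including CR and LF) never belong in a header value.
  const clean = docxName.replace(/[\x00-\x1f\x7f]/g, '');

  // ASCII fallback for the quoted `filename` parameter: drop quotes and
  // backslashes, replace anything outside printable ASCII with '_'.
  const asciiName =
    clean.replace(/["\\]/g, '').replace(/[^\x20-\x7e]/g, '_') || 'converted.docx';

  // RFC 5987 / RFC 6266 `filename*` carries the full UTF-8 name, percent-encoded.
  // encodeURIComponent leaves ' ( ) * unescaped, and those are not allowed here.
  const encoded = encodeURIComponent(clean).replace(
    /['()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase(),
  );

  return `attachment; filename="${asciiName}"; filename*=UTF-8''${encoded}`;
}
```

In the handler it would replace the template literal: `res.setHeader('Content-Disposition', contentDispositionFor(req.file.originalname));`.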
```yaml
          memory: 512M
    # Health check
    healthcheck:
      test: ["CMD", "node", "-e", "fetch('http://localhost:3001/health').then(r => process.exit(r.ok ? 0 : 1))"]
```
Copilot (AI), Dec 12, 2025:
The health check command uses `fetch`, which is only available as a global in Node.js 18 and later (enabled by default but still experimental in v18, stable since v21). The Dockerfile uses node:20-slim, so this works, but the command would fail on older Node versions. Consider using a more portable approach or documenting the Node version requirement.
Suggested change:
```diff
-      test: ["CMD", "node", "-e", "fetch('http://localhost:3001/health').then(r => process.exit(r.ok ? 0 : 1))"]
+      test: ["CMD", "curl", "-f", "http://localhost:3001/health"]
```
```dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD node -e "fetch('http://localhost:3001/health').then(r => process.exit(r.ok ? 0 : 1))" || exit 1
```
Copilot (AI), Dec 12, 2025:
The health check command uses `fetch`, which is only available as a global in Node.js 18 and later (enabled by default but still experimental in v18, stable since v21). This image uses node:20-slim, so it works, but the command would fail on older Node versions. Consider using a more portable approach or documenting the Node version requirement clearly.
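Note that the `curl -f` alternative only works if `curl` is installed in the image, and the Debian-based node:20-slim image does not ship it. A portable option for both the Compose file and the Dockerfile is Node's built-in `http` module, which exists in every Node version. The sketch below is an assumption, not code from the PR; `healthcheck.js` and `checkHealth` are hypothetical names.

```javascript
// healthcheck.js (hypothetical): probe the conversion server without relying on global fetch.
const http = require('node:http');

/**
 * Resolve to true when `url` answers with a 2xx status within `timeoutMs`,
 * and to false on any other status, connection error, or timeout.
 */
function checkHealth(url, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const req = http.get(url, (res) => {
      res.resume(); // discard the body so the socket is released
      resolve(res.statusCode >= 200 && res.statusCode < 300);
    });
    // On timeout, abort the request; the resulting 'error' event resolves false.
    req.setTimeout(timeoutMs, () => req.destroy(new Error('health check timed out')));
    req.on('error', () => resolve(false));
  });
}

module.exports = { checkHealth };
```

Assuming the file is copied into the image's working directory, the health check command could then be `node -e "require('./healthcheck.js').checkHealth('http://localhost:3001/health').then(ok => process.exit(ok ? 0 : 1))"`.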
```javascript
console.log(`Converting: ${inputPath}`);
const command = `"${libreOfficePath}" --headless --convert-to docx --outdir "${outputDir}" "${inputPath}"`;

await execAsync(command, { timeout: 60000 }); // 60 second timeout
```
Copilot (AI), Dec 12, 2025:
The command construction is vulnerable to command injection. The inputPath and outputDir come from user-controlled file uploads and are not properly sanitized before being used in shell commands. An attacker could craft a malicious filename to execute arbitrary commands. Use proper argument escaping or pass arguments as an array to spawn instead of using string concatenation with exec.
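A minimal sketch of the array-based approach using Node's `child_process.execFile`, which does not spawn a shell by default, so quotes, `;`, or `$(...)` in a path reach LibreOffice as literal characters. `buildConvertArgs` and `convertWithLibreOffice` are hypothetical names; the flags and the 60 second timeout are the ones already used in the original command.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');

const execFileAsync = promisify(execFile);

// Hypothetical helper: one array element per argument, exactly as soffice should see them.
function buildConvertArgs(inputPath, outputDir) {
  return ['--headless', '--convert-to', 'docx', '--outdir', outputDir, inputPath];
}

// Hypothetical replacement for the exec()-based call in server.js.
async function convertWithLibreOffice(libreOfficePath, inputPath, outputDir) {
  console.log(`Converting: ${inputPath}`);
  // execFile runs the binary directly (no shell), so nothing in the paths is
  // parsed as shell syntax.
  await execFileAsync(libreOfficePath, buildConvertArgs(inputPath, outputDir), {
    timeout: 60_000, // same 60 second limit as the original command
  });
}
```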
```javascript
const response = await fetch(`${serverUrl}/convert`, {
  method: 'POST',
  body: formData,
  signal: controller.signal,
});
```
Copilot (AI), Dec 12, 2025:
The serverUrl is not validated before being used in fetch requests. This could allow Server-Side Request Forgery (SSRF) attacks if an attacker can control the conversion config. Consider validating that the URL uses an allowed protocol (http/https) and optionally checking against an allowlist of domains.
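A sketch of such a check, assuming it runs once when the `modules.conversion` config is read. `validateServerUrl` and the `allowedHosts` parameter are hypothetical, not options defined by the PR. It relies only on the standard `URL` class, which is available in both browsers and Node.

```javascript
/**
 * Hypothetical helper: validate modules.conversion.serverUrl before any fetch().
 * Returns a normalized base URL (origin + path, no trailing slash) or throws.
 */
function validateServerUrl(serverUrl, allowedHosts = []) {
  let url;
  try {
    url = new URL(serverUrl);
  } catch {
    throw new Error(`Invalid conversion server URL: ${serverUrl}`);
  }

  // Only plain HTTP(S) endpoints; rejects file:, javascript:, data:, etc.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Unsupported protocol for conversion server: ${url.protocol}`);
  }

  // Embedded credentials are almost always a mistake in client config.
  if (url.username || url.password) {
    throw new Error('Credentials are not allowed in the conversion server URL');
  }

  // Optional allowlist; url.host includes the port (e.g. 'localhost:3001').
  if (allowedHosts.length > 0 && !allowedHosts.includes(url.host)) {
    throw new Error(`Conversion server host not allowed: ${url.host}`);
  }

  // Drop trailing slashes so `${base}/convert` never becomes `//convert`.
  return url.origin + url.pathname.replace(/\/+$/, '');
}
```

Because matching uses `url.host`, an allowlist entry would be `'localhost:3001'` rather than `'localhost'`. Path components are kept, while query strings and fragments are dropped from the returned base URL.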
```javascript
// Configure multer for file uploads
const storage = multer.diskStorage({
  destination: async (req, file, cb) => {
    const tempDir = path.join(os.tmpdir(), 'superdoc-conversions');
```
Copilot (AI), Dec 12, 2025:
There's a potential race condition where multiple concurrent conversions could attempt to write to the same temp directory. While the unique filename generation helps, consider adding file locking or ensuring the temp directory can handle concurrent writes safely. This is especially important if the conversion server will be used in production.
Suggested change:
```diff
-    const tempDir = path.join(os.tmpdir(), 'superdoc-conversions');
+    const baseTempDir = path.join(os.tmpdir(), 'superdoc-conversions');
+    const uniqueSubdir = crypto.randomBytes(8).toString('hex');
+    const tempDir = path.join(baseTempDir, uniqueSubdir);
```
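An alternative to the `crypto.randomBytes` subdirectory, sketched with `fs.mkdtemp`, which generates the random suffix and creates the directory in one call. A per-request directory also lets cleanup remove the upload and LibreOffice's output together. Only the base directory comes from the PR; `createJobDir`, `removeJobDir`, and the `req.conversionDir` property are hypothetical names.

```javascript
const fs = require('node:fs/promises');
const os = require('node:os');
const path = require('node:path');

// Same base directory as the PR; everything else here is a hypothetical sketch.
const BASE_TEMP_DIR = path.join(os.tmpdir(), 'superdoc-conversions');

// Create a fresh, uniquely named directory for a single conversion job.
async function createJobDir() {
  await fs.mkdir(BASE_TEMP_DIR, { recursive: true });
  // mkdtemp appends six random characters and creates the directory in one
  // step, so two concurrent requests cannot be handed the same path.
  return fs.mkdtemp(path.join(BASE_TEMP_DIR, 'job-'));
}

// multer diskStorage `destination` callback built on createJobDir().
function destination(req, file, cb) {
  createJobDir().then(
    (dir) => {
      req.conversionDir = dir; // remembered so the handler can remove it later
      cb(null, dir);
    },
    (err) => cb(err),
  );
}

// Remove the job directory, including the upload and LibreOffice's output.
async function removeJobDir(dir) {
  await fs.rm(dir, { recursive: true, force: true });
}
```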
```javascript
// Read the converted file
const convertedFile = await fs.readFile(expectedOutputPath);

// Set response headers
const originalName = req.file.originalname.replace(/\.doc$/i, '.docx');
res.setHeader('Content-Type', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document');
res.setHeader('Content-Disposition', `attachment; filename="${originalName}"`);
res.setHeader('Content-Length', convertedFile.length);

// Send the file
res.send(convertedFile);

// Cleanup files
await cleanupFiles([inputPath, expectedOutputPath]);
```
Copilot (AI), Dec 12, 2025:
The converted file is read entirely into memory before sending. For large files (up to the 50MB limit), this could cause memory issues under high load. Consider using streaming with fs.createReadStream() and res.sendFile() or piping the stream directly to the response for better memory efficiency.
Suggested change:
```diff
-// Read the converted file
-const convertedFile = await fs.readFile(expectedOutputPath);
-// Set response headers
-const originalName = req.file.originalname.replace(/\.doc$/i, '.docx');
-res.setHeader('Content-Type', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document');
-res.setHeader('Content-Disposition', `attachment; filename="${originalName}"`);
-res.setHeader('Content-Length', convertedFile.length);
-// Send the file
-res.send(convertedFile);
-// Cleanup files
-await cleanupFiles([inputPath, expectedOutputPath]);
+// Set response headers
+const originalName = req.file.originalname.replace(/\.doc$/i, '.docx');
+res.setHeader('Content-Type', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document');
+res.setHeader('Content-Disposition', `attachment; filename="${originalName}"`);
+// Stream the file to the response
+res.sendFile(expectedOutputPath, {}, (err) => {
+  // Cleanup files after response is sent or on error
+  cleanupFiles([inputPath, expectedOutputPath]).catch(() => {});
+  if (err) {
+    console.error('Error sending file:', err);
+    if (!res.headersSent) {
+      res.status(500).json({ error: 'Failed to send file' });
+    }
+  }
+});
```
```javascript
res.send(convertedFile);

// Cleanup files
await cleanupFiles([inputPath, expectedOutputPath]);
```
Copilot (AI), Dec 12, 2025:
If the client disconnects during the response (after line 115 but before line 118), the cleanup at line 118 won't execute, leaving temporary files on disk. Consider adding a response event listener or using a try-finally block to ensure cleanup always occurs.
Suggested change:
```diff
-res.send(convertedFile);
-// Cleanup files
-await cleanupFiles([inputPath, expectedOutputPath]);
+let cleanedUp = false;
+const doCleanup = async () => {
+  if (!cleanedUp) {
+    cleanedUp = true;
+    try {
+      await cleanupFiles([inputPath, expectedOutputPath]);
+    } catch (e) {
+      // Optionally log cleanup error
+    }
+  }
+};
+res.on('close', doCleanup);
+res.send(convertedFile);
+// Ensure cleanup after send (in case 'close' hasn't fired yet)
+await doCleanup();
```
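The two concerns raised above (buffering the whole file in memory and skipping cleanup when the client disconnects) can both be handled by streaming with `stream/promises.pipeline` and cleaning up in a `finally` block. This is a sketch, not the PR's code; `streamAndCleanup` is a hypothetical helper, and `cleanup` would wrap the server's existing `cleanupFiles([inputPath, expectedOutputPath])`.

```javascript
const fs = require('node:fs');
const { pipeline } = require('node:stream/promises');

/**
 * Hypothetical helper: stream `filePath` into `res` (an http.ServerResponse or
 * any Writable), then run `cleanup` whether the transfer succeeded or not.
 */
async function streamAndCleanup(res, filePath, cleanup) {
  try {
    // pipeline() applies backpressure, so only small chunks are held in memory,
    // and it rejects if the file cannot be read or the client disconnects early.
    await pipeline(fs.createReadStream(filePath), res);
  } finally {
    // Runs after success, read errors and aborted responses alike.
    await cleanup();
  }
}
```

Because `pipeline()` destroys the response when it fails, a JSON 500 can no longer be sent at that point. Calling `fs.promises.stat(expectedOutputPath)` before streaming (which also supplies the `Content-Length` value) catches a missing output file while an error response is still possible.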
@edoversb 🙏🏻
```javascript
findLibreOffice()
  .then(path => console.log(` LibreOffice found at: ${path}\n`))
  .catch(() => {
    console.log(` WARNING: LibreOffice not found!`);
    console.log(` Install instructions:`, getInstallInstructions());
    console.log('');
  });
```
remove please
84f8ea5 to
2a443d8
Summary
Add support for converting legacy `.doc` files to `.docx` format using a conversion server powered by LibreOffice. This enables SuperDoc to open and edit legacy Word documents seamlessly.
- Add `DOC` type constant to `document-types.ts`
- Create `documentConverter.js` helper module with conversion logic
- Update `SuperDoc.js` with automatic `.doc` detection and conversion
- Add conversion events (`onConversionStart`, `onConversionComplete`, `onConversionError`)
- Add `modules.conversion` config option for server URL
- Update `file.js` to detect `.doc` files by extension
- Update `BasicUpload.vue` to accept `.doc` files
- Add `conversion-server` example with Docker support

Demo
CleanShot.2025-12-12.at.10.26.58.mp4
Usage
Conversion Server
A Docker-based conversion server is included in `examples/conversion-server/`:
```bash
cd examples/conversion-server
docker-compose up -d
```
The server uses LibreOffice for reliable `.doc` to `.docx` conversion.

Test plan
- … `.doc` file and verify it converts and loads correctly
- … `.doc` files (different Word versions)
- … `.docx` files continue to work normally (no regression)

Closes #1019