
Conversation

@micmarty-deepsense (Contributor) commented Jan 30, 2026

Summary

Remove device_map parameter from TableTransformer initialization to prevent meta tensor errors during concurrent processing.

Problem

We were observing this error during concurrent processing: "Cannot copy out of meta tensor; no data!"

[Screenshot: traceback showing the meta tensor error]
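
For context, the message comes from PyTorch itself: meta tensors carry shape and dtype metadata but no storage, so any attempt to copy data out of them fails. A minimal standalone illustration (not the project's code):

import torch

t = torch.empty(3, 3, device="meta")  # a meta tensor: shape and dtype only, no storage
t.to("cpu")  # raises NotImplementedError: "Cannot copy out of meta tensor; no data!"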

Solution

  • Remove device_map from DetrImageProcessor.from_pretrained()
  • Remove device_map from TableTransformerForObjectDetection.from_pretrained()
  • Add explicit .to(device, dtype=torch.float32) with return value capture per PyTorch best practices (see the sketch after this list)
  • Normalize device names (cuda -> cuda:0) for consistent caching
  • Add CUDA availability check before torch.cuda.current_device() to prevent crashes
  • Fallback to CPU if CUDA requested but unavailable (improved error handling)
  • Enhanced logging with device placement information
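
A condensed sketch of the resulting initialization path (the function name is illustrative; the actual logic lives in unstructured_inference/models/tables.py):

import torch
from transformers import DetrImageProcessor, TableTransformerForObjectDetection

def initialize_table_model(model_name: str, device: str):
    # Normalize a bare "cuda" to an explicit index so cache keys stay consistent,
    # and fall back to CPU if CUDA was requested but is not available.
    if device.startswith("cuda"):
        if torch.cuda.is_available():
            if ":" not in device:
                device = f"cuda:{torch.cuda.current_device()}"
        else:
            device = "cpu"

    # No device_map: load normally, then move the model explicitly.
    feature_extractor = DetrImageProcessor.from_pretrained(model_name)
    model = TableTransformerForObjectDetection.from_pretrained(model_name)

    # Capture the return value of .to(), as recommended by the PyTorch docs.
    model = model.to(device, dtype=torch.float32)
    return feature_extractor, model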

Testing

Tested in production (core-product) where this fix eliminated:

  • Meta tensor type errors
  • Concurrent processing failures
  • Device placement issues

Performance Impact

  • No performance degradation
  • Improved stability during concurrent operations
  • Better error messages with device logging

Backward Compatibility

✅ Fully backward compatible - no API changes

Remove device_map parameter from from_pretrained() calls to fix
meta tensor errors during concurrent processing.

Changes:
- Device normalization (cuda -> cuda:0) for consistent caching
- Remove device_map from DetrImageProcessor.from_pretrained()
- Remove device_map from TableTransformerForObjectDetection.from_pretrained()
- Add explicit .to(device, dtype=torch.float32) for proper placement
- Improve logging to show target device

Fixes:
- "Trying to set a tensor of type Float but got Meta" errors
- AssertionError during concurrent PDF processing
- Finalization race conditions with device_map

Root Cause:
device_map causes models to initialize with meta tensors, which fail
when explicitly moved to device. Removing device_map and using explicit
.to() ensures proper tensor placement.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@micmarty-deepsense force-pushed the fix/table-transformer-device-map branch from 08b5a5b to 01304fd on January 30, 2026 10:35
- Add CUDA availability check before torch.cuda.current_device()
- Capture model.to() return value per PyTorch documentation
- Add fallback to CPU if CUDA requested but unavailable
- Document rationale for return value capture
@micmarty-deepsense force-pushed the fix/table-transformer-device-map branch from 5a9fc95 to 86e3b6a on January 30, 2026 12:02
Resolved conflicts:
- CHANGELOG.md: Keep both 1.1.9 fixes (TableTransformer device_map + pdfium context manager)
@micmarty-deepsense (Contributor, Author) commented:

@claude review

claude bot commented Jan 30, 2026

Claude finished @micmarty-deepsense's task.


Code Review Complete ✅

Todo List:

  • Read repository guidelines (CLAUDE.md) - No CLAUDE.md found
  • Review CHANGELOG.md changes
  • Review unstructured_inference/models/tables.py changes
  • Analyze the fix for meta tensor errors
  • Check for potential issues and improvements
  • Provide comprehensive review feedback

Review Summary

🎯 Overall Assessment: APPROVE WITH MINOR SUGGESTIONS

This PR effectively addresses the meta tensor error during concurrent processing by removing device_map parameters and implementing explicit device placement. The changes are well-implemented and follow PyTorch best practices.

Strengths

  1. Correct Problem Identification & Solution

    • Properly identifies that device_map="auto" can cause meta tensor issues during concurrent operations
    • Removing device_map and using explicit .to(device, dtype=torch.float32) is the right approach
  2. Robust Device Handling (tables.py:80-89)

    • Device normalization (cuda -> cuda:0) for consistent caching
    • CUDA availability checks before calling torch.cuda.current_device()
    • Graceful fallback to CPU when CUDA is unavailable
    • Good error messaging for device placement
  3. PyTorch Best Practices (tables.py:107-111)

    • Explicit dtype specification (dtype=torch.float32)
    • Proper return value capture from .to() calls (recommended practice; see the illustration after this list)
    • Clear documentation explaining the approach
  4. Production-Tested

    • Already validated in production environment
    • Addresses real concurrent processing failures
  5. Good Documentation

    • Clear CHANGELOG entries
    • Inline comments explaining device placement strategy
    • Comprehensive PR description
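
On point 3, a quick note on why capturing the return value is the safer habit: nn.Module.to() converts the module in place and returns self, while torch.Tensor.to() returns a new tensor, so assigning the result is the one pattern that behaves correctly in both cases. A standalone example (not the project's code):

import torch
from torch import nn

layer = nn.Linear(4, 4)
layer = layer.to("cpu", dtype=torch.float32)  # Module.to() returns self; assign anyway

x = torch.randn(2, 4, dtype=torch.float64)
x = x.to(torch.float32)  # Tensor.to() returns a new tensor; without assignment the cast is lost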

🔍 Technical Analysis

Device Map Removal:

# OLD (problematic):
self.model = TableTransformerForObjectDetection.from_pretrained(model, device_map="auto")

# NEW (correct):
self.model = TableTransformerForObjectDetection.from_pretrained(model)
self.model = self.model.to(self.device, dtype=torch.float32)

This change prevents the model from being loaded with meta tensors that can't be properly copied during concurrent operations.

Device Normalization (tables.py:83-85):

if device.startswith("cuda") and ":" not in device:
    if torch.cuda.is_available():
        device = f"cuda:{torch.cuda.current_device()}"

Smart approach for consistent device naming and caching.
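(Assuming the model cache is keyed on the device string, this also keeps "cuda" and "cuda:0" from becoming two separate cache entries for the same GPU.)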

🔧 Minor Suggestions

  1. Threading Safety Enhancement (tables.py:84-85)
    Consider moving the CUDA device check inside the lock for maximum thread safety (a sketch follows this list):

    if device.startswith("cuda") and ":" not in device:
        if torch.cuda.is_available():
            # Consider: torch.cuda.current_device() might vary between threads
            device = f"cuda:{torch.cuda.current_device()}"
  2. Error Context (tables.py:87-88)
    The fallback warning could include the originally requested device:

    logger.warning(f"CUDA device '{device}' requested but not available, falling back to CPU")
  3. Documentation Enhancement
    Consider adding a docstring comment about concurrent processing compatibility to the initialize method.
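
A minimal sketch of what suggestion 1 could look like (the lock name and helper function are hypothetical; the module may already use a different synchronization primitive):

import threading
import torch

_init_lock = threading.Lock()  # hypothetical module-level lock

def _normalize_device(device: str) -> str:
    # Resolve a bare "cuda" to an explicit index under the lock so that
    # concurrent initializations agree on the same cache key.
    with _init_lock:
        if device.startswith("cuda") and ":" not in device:
            if torch.cuda.is_available():
                device = f"cuda:{torch.cuda.current_device()}"
            else:
                device = "cpu"
    return device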

📋 Code Quality

  • Backward Compatibility: ✅ Fully maintained - no API changes
  • Error Handling: ✅ Comprehensive with proper fallbacks
  • Performance: ✅ No degradation expected, stability improved
  • Threading: ✅ Addresses concurrent processing issues
  • Testing: ✅ Production-validated

🚀 Impact

This fix will:

  • ✅ Eliminate "Cannot copy out of meta tensor; no data!" errors
  • ✅ Improve stability during concurrent PDF processing
  • ✅ Maintain existing performance characteristics
  • ✅ Provide better error diagnostics

📝 Minor Nits

  • tables.py:109: The comment about capturing return values is good practice documentation
  • CHANGELOG.md entries are comprehensive and well-structured

Recommendation: MERGE

This is a solid fix that addresses a real production issue with a clean, well-tested solution. The minor suggestions above are optional improvements that don't block merging.


@badGarnet merged commit 08396e1 into main on January 30, 2026
17 checks passed
@badGarnet deleted the fix/table-transformer-device-map branch on January 30, 2026 14:30