File processing in the LinX platform is handled through a skill-based architecture. Files (images, documents, audio, video) are not processed by the core system directly. Instead, processing is delegated to dynamically loaded agent skills.
- No Built-in Processing: The core agent system does not have built-in file processing capabilities
- Dynamic Skills: File processing is handled by skills loaded from the skill library
- Graceful Degradation: If an agent lacks the required skill, the file is skipped, the skip is logged, and the agent continues with the text of the message
- Extensible: New file types can be supported by adding new skills
- Modularity: File processing logic is separated from core agent logic
- Flexibility: Different agents can have different file processing capabilities
- Scalability: Skills can be added/removed without changing core code
- Resource Efficiency: Processing capabilities are loaded only when needed
- Customization: Each agent can have tailored file processing skills
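The "Dynamic Skills", "Graceful Degradation", and "Resource Efficiency" points above can be illustrated with a small loader. This is a minimal sketch, not the platform's actual loader: the class name `LazySkillLoader` and its methods are hypothetical, and a plain dictionary stands in for a real skill object.

```python
from typing import Callable, Dict, Optional


class LazySkillLoader:
    """Hypothetical loader: builds a skill on first use and caches it."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}
        self._loaded: Dict[str, object] = {}

    def register(self, name: str, factory: Callable[[], object]) -> None:
        # Registering only stores the factory; nothing is built yet.
        self._factories[name] = factory

    def get(self, name: str) -> Optional[object]:
        # Unknown skill: return None so the caller can skip the file.
        if name not in self._factories:
            return None
        if name not in self._loaded:
            self._loaded[name] = self._factories[name]()
        return self._loaded[name]

    def is_loaded(self, name: str) -> bool:
        return name in self._loaded


loader = LazySkillLoader()
loader.register("ocr", lambda: {"skill": "ocr"})  # stand-in for a real skill object

print(loader.is_loaded("ocr"))         # registered but not built yet
print(loader.get("ocr"))               # built on first access, then cached
print(loader.get("video_processing"))  # not registered, so the caller skips the file
```

Returning `None` for a missing skill, rather than raising, keeps the "skip and continue" behavior in the caller's hands.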
Skill Name: image_processing
Capabilities:
- Image recognition and classification
- Object detection
- Scene understanding
- Visual question answering
- Image captioning
Example Usage:
# Agent with image_processing skill
agent = Agent(
    name="Vision Agent",
    skills=["image_processing", "general_chat"]
)
# When user sends image + text
# Skill processes image and extracts information
# Agent receives: "User asked: 'What's in this image?' Image contains: [description]"

Skill Name: document_processing
Capabilities:
- PDF text extraction
- Document structure analysis
- Table extraction
- Metadata extraction
- Multi-page document handling
Example Usage:
# Agent with document_processing skill
agent = Agent(
    name="Document Analyst",
    skills=["document_processing", "data_analysis"]
)
# When user uploads PDF
# Skill extracts text and structure
# Agent receives: "Document content: [extracted text]"

Skill Name: ocr
Capabilities:
- Text extraction from images
- Handwriting recognition
- Multi-language support
- Layout preservation
Example Usage:
# Agent with OCR skill
agent = Agent(
    name="OCR Agent",
    skills=["ocr", "text_analysis"]
)
# When user sends image with text
# Skill extracts text via OCR
# Agent receives: "Extracted text: [OCR result]"

Skill Name: audio_processing
Capabilities:
- Speech-to-text transcription
- Audio classification
- Speaker identification
- Audio analysis
Skill Name: video_processing
Capabilities:
- Frame extraction
- Video summarization
- Action recognition
- Scene detection
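Each skill above targets one kind of file, so the system needs a way to decide which skill applies to a given upload. The following is a sketch under stated assumptions: the names `SKILLS_BY_CATEGORY`, `detect_category`, and `pick_skill` are hypothetical, the category is guessed from the file name using Python's standard `mimetypes` module, and the OCR fallback for images mirrors the fallback described in this document.

```python
import mimetypes
from typing import List, Optional

# Hypothetical mapping from a file category to the skills that can handle it,
# in order of preference. Images fall back to OCR when image_processing is absent.
SKILLS_BY_CATEGORY = {
    "image": ["image_processing", "ocr"],
    "document": ["document_processing"],
    "audio": ["audio_processing"],
    "video": ["video_processing"],
}


def detect_category(filename: str) -> Optional[str]:
    """Guess a coarse category from the file name; None if unknown."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return None
    top_level = mime.split("/")[0]
    if top_level in ("image", "audio", "video"):
        return top_level
    if mime == "application/pdf" or top_level == "text":
        return "document"
    return None


def pick_skill(filename: str, agent_skills: List[str]) -> Optional[str]:
    """Return the first preferred skill the agent has, or None to skip the file."""
    category = detect_category(filename)
    for skill in SKILLS_BY_CATEGORY.get(category, []):
        if skill in agent_skills:
            return skill
    return None


print(pick_skill("photo.png", ["general_chat", "ocr"]))   # falls back to the OCR skill
print(pick_skill("report.pdf", ["general_chat", "ocr"]))  # no document skill, file is skipped
```

Guessing from the file name is cheap but easy to spoof, so a production system would likely also inspect the file content.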
User → Frontend → Backend API
                       ↓
             File Storage (MinIO)
                       ↓
            File Reference Created

  Message with Files → Agent Executor
                              ↓
                     Check Agent Skills
                              ↓
                ┌─────────────┴─────────────┐
                ↓                           ↓
       Has Required Skill           No Required Skill
                ↓                           ↓
      Load & Execute Skill        Skip File Processing
                ↓                           ↓
          Process File                Use Text Only
                ↓                           ↓
       Extract Information         Continue Execution
                ↓                           ↓
         Augment Message ───────────────────┘
                ↓
Agent Processes Augmented Message
# Pseudo-code for skill-based file processing
import logging

logger = logging.getLogger(__name__)


def process_message_with_files(agent, message, files):
    """Process message with attached files using agent skills."""
    # Check if agent has file processing skills
    has_image_skill = "image_processing" in agent.skills
    has_doc_skill = "document_processing" in agent.skills
    has_ocr_skill = "ocr" in agent.skills

    augmented_message = message

    for file in files:
        if file.type == "image":
            if has_image_skill:
                # Load and execute image processing skill
                # (load_skill is assumed to be provided by the skill registry)
                skill = load_skill("image_processing")
                result = skill.process(file)
                augmented_message += f"\n\nImage analysis: {result}"
            elif has_ocr_skill:
                # Fallback to OCR if no image processing
                skill = load_skill("ocr")
                text = skill.extract_text(file)
                augmented_message += f"\n\nExtracted text: {text}"
            else:
                # Skip image processing
                logger.info(f"Agent {agent.id} lacks image processing skills, skipping image")
        elif file.type == "document":
            if has_doc_skill:
                # Load and execute document processing skill
                skill = load_skill("document_processing")
                content = skill.extract_content(file)
                augmented_message += f"\n\nDocument content: {content}"
            else:
                # Skip document processing
                logger.info(f"Agent {agent.id} lacks document processing skills, skipping document")

    # Agent processes the augmented message
    return agent.execute(augmented_message)

Skills are stored in the skill_library module:
backend/skill_library/
├── __init__.py
├── skill_registry.py              # Skill registration and loading
├── skill_executor.py              # Skill execution engine
├── default_skills.py              # Built-in skills
└── skills/
    ├── image_processing.py        # Image processing skill
    ├── document_processing.py     # Document processing skill
    ├── ocr.py                     # OCR skill
    ├── audio_processing.py        # Audio processing skill
    └── video_processing.py        # Video processing skill
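The layout above lists a skill execution engine (`skill_executor.py`) without showing it. One way such an engine could keep a failing skill from crashing the agent is sketched below. This is an illustration only, not the contents of that file: `run_skill` and `fake_ocr` are hypothetical names, and the `success`/`result` keys follow the result shape used by the skill example in this document, with an assumed `error` key for failures.

```python
import time
from typing import Any, Callable, Dict


def run_skill(skill_name: str, func: Callable[..., Any], **params: Any) -> Dict[str, Any]:
    """Hypothetical executor: call a skill function and never raise to the caller."""
    started = time.perf_counter()
    try:
        result = func(**params)
        outcome: Dict[str, Any] = {"success": True, "result": result}
    except Exception as exc:  # broad on purpose: a failing skill must not crash the agent
        outcome = {"success": False, "error": f"{skill_name} failed: {exc}"}
    # Record how long the call took, whether it succeeded or not.
    outcome["duration_seconds"] = time.perf_counter() - started
    return outcome


def fake_ocr(image_path: str) -> str:
    # Stand-in for a real OCR call; fails on an empty path to show the error branch.
    if not image_path:
        raise ValueError("image_path is empty")
    return f"text from {image_path}"


ok = run_skill("ocr", fake_ocr, image_path="scan.png")
failed = run_skill("ocr", fake_ocr, image_path="")
print(ok["success"], ok["result"])
print(failed["success"], failed["error"])
```

With this shape, the caller can check `success` and fall back to text-only processing when a skill fails.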
Each skill follows a standard interface:
from skill_library.skill_model import Skill, SkillParameter


class ImageProcessingSkill(Skill):
    """Skill for processing images."""

    name = "image_processing"
    description = "Processes images and extracts visual information"
    version = "1.0.0"

    parameters = [
        SkillParameter(
            name="image_path",
            type="string",
            description="Path to image file",
            required=True
        ),
        SkillParameter(
            name="analysis_type",
            type="string",
            description="Type of analysis: caption, objects, scene",
            required=False,
            default="caption"
        )
    ]

    def execute(self, image_path: str, analysis_type: str = "caption") -> dict:
        """Execute image processing."""
        # Load image (load_image and the analysis helpers below are placeholders)
        image = load_image(image_path)

        # Process based on analysis type
        if analysis_type == "caption":
            result = generate_caption(image)
        elif analysis_type == "objects":
            result = detect_objects(image)
        elif analysis_type == "scene":
            result = analyze_scene(image)
        else:
            # Unknown analysis type: report it instead of failing on an unset result
            return {"success": False, "error": f"Unsupported analysis_type: {analysis_type}"}

        return {
            "success": True,
            "result": result,
            "metadata": {
                "image_size": image.size,
                "format": image.format
            }
        }

Agents specify their skills in configuration:
# Create agent with file processing skills
agent = Agent(
    name="Multimodal Assistant",
    type="general",
    skills=[
        "general_chat",
        "image_processing",
        "document_processing",
        "web_search"
    ],
    system_prompt="You are a helpful assistant that can process images and documents."
)

The frontend prepares files for upload:
// User attaches files
const attachedFiles = [
  { type: 'image', file: imageFile },
  { type: 'document', file: pdfFile }
];

// Send to backend
await agentsApi.testAgent(agentId, message, {
  files: attachedFiles,
  history: conversationHistory
});

The backend handles file upload and skill execution:
from typing import List, Optional

from fastapi import UploadFile  # assumes a FastAPI application


@router.post("/{agent_id}/test")
async def test_agent(agent_id: str, request: TestAgentRequest, files: Optional[List[UploadFile]] = None):
    """Test agent with message and optional files."""
    # Get agent and check skills
    agent = get_agent(agent_id)

    # Upload files to storage
    file_refs = []
    if files:
        for file in files:
            file_path = await upload_to_minio(file)
            file_refs.append({
                "path": file_path,
                "type": detect_file_type(file),
                "name": file.filename
            })

    # Process message with files using agent skills
    augmented_message = await process_with_skills(
        agent=agent,
        message=request.message,
        files=file_refs
    )

    # Execute agent with augmented message
    result = await agent.execute(augmented_message)
    return result

If an agent has no file processing skills:
# Agent without file processing skills
agent = Agent(
    name="Text-Only Agent",
    skills=["general_chat", "web_search"]
)
# User sends image + text
# System behavior:
# 1. Detects no image_processing skill
# 2. Logs: "Agent lacks image processing skills, skipping image"
# 3. Processes only the text message
# 4. Agent responds based on text only

If an agent has some but not all file processing skills:
# Agent with only OCR skill
agent = Agent(
    name="OCR Agent",
    skills=["general_chat", "ocr"]
)
# User sends image + PDF
# System behavior:
# 1. Image: Uses OCR skill to extract text
# 2. PDF: No document_processing skill, skips PDF
# 3. Agent receives: "User message + OCR text from image"

- Users can browse and install skills
- Community-contributed skills
- Skill ratings and reviews
- Combine multiple skills for complex processing
- Example: OCR → Translation → Summarization
- Multiple versions of same skill
- Backward compatibility
- Automatic updates
- Track skill usage
- Performance metrics
- Error rates
- Users can create custom skills
- Upload Python code
- Sandbox execution
- Choose Appropriate Skills: Select skills based on agent's purpose
- Test with Files: Verify file processing works as expected
- Provide Clear Instructions: Tell users what file types are supported
- Handle Failures Gracefully: Agent should work even if file processing fails
- Follow Skill Interface: Implement standard Skill class
- Handle Errors: Return meaningful error messages
- Optimize Performance: Process files efficiently
- Document Capabilities: Clear description of what skill does
- Version Properly: Use semantic versioning
- Monitor Skill Usage: Track which skills are most used
- Update Skills: Keep skills up to date
- Manage Resources: Ensure sufficient resources for file processing
- Security: Validate and sanitize file inputs
- Check file types and sizes
- Scan for malware
- Validate file content
- Execute skills in isolated environment
- Limit resource usage
- Prevent unauthorized access
- Skills respect agent permissions
- Users can only process their own files
- Audit skill execution
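
The file validation points above (checking type, size, and content) can be sketched as a single pre-upload check. This is a minimal illustration, not the platform's validation code: `validate_upload`, `MAX_BYTES`, and the 10 MiB limit are assumptions, and the check covers only extension, size, and leading-byte signature. Malware scanning and sandboxing are outside its scope.

```python
import os
from typing import Optional

MAX_BYTES = 10 * 1024 * 1024  # assumed limit of 10 MiB; tune per deployment

# Leading bytes ("magic numbers") for a few common formats.
MAGIC_BY_EXTENSION = {
    ".pdf": b"%PDF-",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
}


def validate_upload(filename: str, data: bytes) -> Optional[str]:
    """Return None if the file passes, otherwise a human-readable reason."""
    ext = os.path.splitext(filename)[1].lower()
    # 1. Allow-list of file types
    if ext not in MAGIC_BY_EXTENSION:
        return f"unsupported file type: {ext or '(none)'}"
    # 2. Size limit
    if len(data) > MAX_BYTES:
        return f"file too large: {len(data)} bytes"
    # 3. Content must start with the signature expected for its extension
    if not data.startswith(MAGIC_BY_EXTENSION[ext]):
        return "file content does not match its extension"
    return None


print(validate_upload("report.pdf", b"%PDF-1.7 sample"))  # passes all three checks
print(validate_upload("image.png", b"%PDF-1.7 sample"))   # PDF bytes behind a .png name
print(validate_upload("tool.exe", b"MZ"))                 # extension not on the allow-list
```

Returning a reason string rather than raising lets the caller surface a clear message to the user and continue with the text of the request.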