[Refactor] Improve table reference resolution system #92

@JonnyTran

Description

Refactor the table reference resolution system so that it is driven by the workspace schema configuration, giving a more robust and maintainable way to handle references between tables. The change moves the resolution logic to the backend and improves support for multi-user scenarios.

Problem

  • Reference resolution between tables is complex and error-prone
  • Multi-user scenarios are not handled well in the current implementation
  • Current implementation mixes concerns between frontend and backend
  • No clear ownership of reference resolution logic
  • Limited integration with workspace schema configuration
  • References are resolved client-side, leading to performance and consistency issues

Proposed Solution

  1. Leverage Workspace Schema Configuration: Use schema definitions to understand reference relationships
  2. Move Resolution to Backend: Implement server-side reference resolution using SchemaService
  3. Create Document-Centric APIs: Use DocumentService for cross-dataset reference resolution
  4. Enhance Frontend Components: Simplify frontend by using backend resolution services
  5. Improve Multi-User Support: Handle reference consistency across users
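
As an illustration of step 1, the sketch below shows one way a schema configuration could be turned into a map of reference relationships. The configuration shape (`columns` entries with an optional `references` key), the schema names, and the function `build_reference_mapping` are all assumptions made for this example and are not the actual workspace schema format.

```python
from typing import Dict

# Hypothetical workspace schema configuration: each schema lists its columns,
# and a column may name the schema it references.
WORKSPACE_SCHEMA_CONFIG = {
    "Study": {"columns": {"study_id": {}, "title": {}}},
    "Outcome": {
        "columns": {
            "outcome_id": {},
            "study_ref": {"references": "Study"},
            "value": {},
        }
    },
}


def build_reference_mapping(config: dict) -> Dict[str, Dict[str, str]]:
    """Map each schema name to {reference column: target schema name}."""
    mapping: Dict[str, Dict[str, str]] = {}
    for schema_name, schema in config.items():
        refs = {
            column: spec["references"]
            for column, spec in schema.get("columns", {}).items()
            if "references" in spec
        }
        if refs:  # schemas without reference columns are left out
            mapping[schema_name] = refs
    return mapping


mapping = build_reference_mapping(WORKSPACE_SCHEMA_CONFIG)
print(mapping)  # {'Outcome': {'study_ref': 'Study'}}
```

With a mapping like this available on the server, the resolver never has to guess which columns are references from the table data alone.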

Implementation Details

Dependencies

Backend Changes

  1. Enhance SchemaService with reference resolution:

    from typing import Dict, List, Optional
    from uuid import UUID

    class SchemaService:
        def resolve_table_references(self,
                                     table_data: dict,
                                     workspace_id: UUID,
                                     user_id: Optional[UUID] = None) -> dict:
            """Resolve all references in table data using the workspace schema."""
            schema_config = self.get_workspace_schema_config(workspace_id)
            # Shallow copy: resolved values are added under new keys, inputs stay untouched.
            resolved_data = table_data.copy()

            # The schema config, not the raw data alone, decides which columns are references.
            for ref_column in self._get_reference_columns(table_data, schema_config):
                ref_values = self._resolve_reference_column(
                    ref_column, table_data[ref_column], workspace_id, user_id
                )
                resolved_data[f"{ref_column}_resolved"] = ref_values

            return resolved_data

        def get_reference_schema_mapping(self, workspace_id: UUID) -> Dict[str, str]:
            """Get mapping of reference columns to their target schemas."""
            raise NotImplementedError  # stub, to be implemented

        def validate_reference_consistency(self,
                                           table_data: dict,
                                           workspace_id: UUID) -> List["ValidationError"]:
            """Validate that all references point to valid records."""
            raise NotImplementedError  # stub; ValidationError is a project type not shown here

  2. Create DocumentService for cross-dataset references:

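    from typing import Any, Dict, Optional
    from uuid import UUID

    class DocumentService:
        def resolve_document_references(self,
                                        document_ref: str,
                                        workspace_id: UUID,
                                        user_id: Optional[UUID] = None) -> Dict[str, Any]:
            """Resolve all references for a complete document across datasets."""
            all_records = self.get_document_records(document_ref, workspace_id)
            # Assumes this service keeps its session as self.db, so that SchemaService
            # is constructed the same way as in the API endpoints.
            schema_service = SchemaService(workspace_id, self.db)

            # Keyed by schema name: a later record with the same schema replaces an earlier one.
            resolved_document = {}
            for record in all_records:
                if self._has_table_data(record):
                    resolved_data = schema_service.resolve_table_references(
                        record.table_data, workspace_id, user_id
                    )
                    resolved_document[record.schema_name] = resolved_data

            return resolved_document
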
  3. Add reference resolution API endpoints:

    from typing import Optional
    from uuid import UUID

    from fastapi import APIRouter, Depends
    from sqlalchemy.ext.asyncio import AsyncSession

    router = APIRouter()  # get_async_db is the project's session dependency

    @router.post("/workspaces/{workspace_id}/tables/resolve-references")
    async def resolve_table_references(
        workspace_id: UUID,
        table_data: dict,  # read from the JSON request body
        user_id: Optional[UUID] = None,
        db: AsyncSession = Depends(get_async_db)
    ):
        schema_service = SchemaService(workspace_id, db)
        return schema_service.resolve_table_references(table_data, workspace_id, user_id)

    @router.get("/workspaces/{workspace_id}/documents/{reference}/resolved")
    async def get_resolved_document(
        workspace_id: UUID,
        reference: str,
        user_id: Optional[UUID] = None,
        db: AsyncSession = Depends(get_async_db)
    ):
        document_service = DocumentService(workspace_id, db)
        return document_service.resolve_document_references(reference, workspace_id, user_id)

  4. Enhance record APIs with reference context:

    • Include resolved reference data in record responses
    • Add reference validation before saving records
    • Provide reference metadata for frontend components
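
The "reference validation before saving records" step in item 4 could look roughly like the sketch below. It is a minimal, in-memory illustration: the column-oriented `table_data` shape, the `ReferenceIssue` type, and the `validate_references` function are hypothetical names chosen for the example, and a real implementation would look up existing record ids in the database instead of taking them as a set.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class ReferenceIssue:
    """One reference value that does not point to an existing record."""
    column: str
    row: int
    value: str
    message: str


def validate_references(
    table_data: Dict[str, List[str]],
    reference_columns: Dict[str, str],
    known_ids: Dict[str, Set[str]],
) -> List[ReferenceIssue]:
    """Return one issue per reference value with no matching target record.

    table_data:        column name -> list of cell values (assumed shape)
    reference_columns: reference column -> target schema name
    known_ids:         target schema name -> ids of existing records
    """
    issues: List[ReferenceIssue] = []
    for column, target_schema in reference_columns.items():
        valid = known_ids.get(target_schema, set())
        for row, value in enumerate(table_data.get(column, [])):
            if value not in valid:
                issues.append(
                    ReferenceIssue(
                        column, row, value,
                        f"no {target_schema} record with id {value!r}",
                    )
                )
    return issues


table = {"outcome_id": ["o1", "o2"], "study_ref": ["s1", "s9"]}
issues = validate_references(table, {"study_ref": "Study"}, {"Study": {"s1", "s2"}})
for issue in issues:
    print(issue.row, issue.column, issue.message)
```

A save handler could reject the record, or return the issues alongside it, whenever the list is non-empty.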

Frontend Changes

  1. Refactor useReferenceTablesViewModel to use backend APIs:

    export const useReferenceTablesViewModel = (props: { tableJSON: TableData }) => {
        const { state: workspace } = useWorkspace();
        
        const resolveReferences = async (tableData: TableData): Promise<TableData> => {
            if (!workspace?.id) return tableData;
            
            const response = await documentService.resolveTableReferences(
                workspace.id,
                tableData.toJSON()
            );
            
            return new TableData(
                response.data,
                response.schema,
                response.reference
            );
        };
        
        const getResolvedDocument = async (reference: string): Promise<ResolvedDocument | null> => {
            if (!workspace?.id) return null;
            
            return await documentService.getResolvedDocument(workspace.id, reference);
        };
        
        return {
            resolveReferences,
            getResolvedDocument,
            // ... other methods simplified using backend APIs
        };
    };
  2. Simplify table rendering components:

    • Remove complex client-side reference resolution logic
    • Use resolved data from backend APIs
    • Add error handling for reference resolution failures
    • Implement caching for resolved references
  3. Enhance multi-user reference handling:

    • Display reference conflicts between users
    • Show resolution history and user context
    • Provide UI for reference conflict resolution
    • Enable collaborative reference editing
  4. Improve reference management UI:

    // New component: ReferenceResolver.vue
    export default {
        props: {
            tableData: Object,
            workspaceId: String,
        },
        data() {
            return {
                resolvedData: null,
                loading: false,
                errors: [],
            };
        },
        async mounted() {
            await this.resolveReferences();
        },
        methods: {
            async resolveReferences() {
                this.loading = true;
                this.errors = [];  // clear errors left over from a previous attempt
                try {
                    this.resolvedData = await documentService.resolveTableReferences(
                        this.workspaceId,
                        this.tableData
                    );
                } catch (error) {
                    this.errors.push(error.message);
                } finally {
                    this.loading = false;
                }
            },
        },
    };
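
For the multi-user handling in item 3, the UI can only display conflicts that the backend is able to report. The sketch below shows one possible backend-side check, in Python to match the server code. The `Resolution` tuple shape and the function `find_reference_conflicts` are assumptions made for illustration and are not part of the existing codebase.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One user's resolution of a single reference cell (hypothetical shape):
# (user_id, column, row, resolved_record_id)
Resolution = Tuple[str, str, int, str]


def find_reference_conflicts(
    resolutions: List[Resolution],
) -> Dict[Tuple[str, int], Dict[str, str]]:
    """Return cells that users resolved to different records.

    The result maps (column, row) -> {user_id: resolved_record_id}.
    """
    by_cell: Dict[Tuple[str, int], Dict[str, str]] = defaultdict(dict)
    for user_id, column, row, record_id in resolutions:
        by_cell[(column, row)][user_id] = record_id
    # Keep only cells where at least two distinct targets were chosen.
    return {
        cell: per_user
        for cell, per_user in by_cell.items()
        if len(set(per_user.values())) > 1
    }


conflicts = find_reference_conflicts([
    ("alice", "study_ref", 0, "s1"),
    ("bob", "study_ref", 0, "s2"),
    ("alice", "study_ref", 1, "s3"),
    ("bob", "study_ref", 1, "s3"),
])
print(conflicts)  # {('study_ref', 0): {'alice': 's1', 'bob': 's2'}}
```

The per-user breakdown in the result is what a conflict resolution view would need in order to show who chose which target.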

Performance and Caching

  1. Implement reference resolution caching:

    • Cache resolved references at the workspace level
    • Invalidate cache when referenced records change
    • Use Redis or similar for distributed caching
  2. Optimize reference queries:

    • Batch reference resolution requests
    • Use database joins for efficient reference lookup
    • Implement lazy loading for large reference datasets
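
The caching and batching points above can be combined in one component. The sketch below is a minimal in-memory version under stated assumptions: the `ReferenceCache` class, its `fetch_many` loader callback, and the `(workspace_id, schema_name, record_id)` key are hypothetical, and the plain dictionary stands in for a shared store such as Redis.

```python
from typing import Callable, Dict, Iterable, List, Tuple

CacheKey = Tuple[str, str, str]  # (workspace_id, schema_name, record_id)


class ReferenceCache:
    """In-memory stand-in for a shared cache such as Redis."""

    def __init__(self, fetch_many: Callable[[str, str, List[str]], Dict[str, dict]]):
        # fetch_many(workspace_id, schema_name, ids) -> {id: record}; hypothetical loader
        self._fetch_many = fetch_many
        self._store: Dict[CacheKey, dict] = {}

    def resolve_batch(
        self, workspace_id: str, schema_name: str, ids: Iterable[str]
    ) -> Dict[str, dict]:
        """Resolve many ids at once, calling the loader only for cache misses."""
        unique_ids = list(dict.fromkeys(ids))  # de-duplicate, keep order
        missing = [
            i for i in unique_ids
            if (workspace_id, schema_name, i) not in self._store
        ]
        if missing:
            fetched = self._fetch_many(workspace_id, schema_name, missing)
            for record_id, record in fetched.items():
                self._store[(workspace_id, schema_name, record_id)] = record
        # Ids the loader could not find are simply absent from the result.
        return {
            i: self._store[(workspace_id, schema_name, i)]
            for i in unique_ids
            if (workspace_id, schema_name, i) in self._store
        }

    def invalidate(self, workspace_id: str, schema_name: str, record_id: str) -> None:
        """Drop one cached record; call this when the referenced record changes."""
        self._store.pop((workspace_id, schema_name, record_id), None)


# Usage with a fake loader that records each call it receives.
calls: List[List[str]] = []
records = {"s1": {"title": "Trial A"}, "s2": {"title": "Trial B"}}


def fake_fetch(workspace_id: str, schema_name: str, ids: List[str]) -> Dict[str, dict]:
    calls.append(list(ids))
    return {i: records[i] for i in ids if i in records}


cache = ReferenceCache(fake_fetch)
cache.resolve_batch("ws", "Study", ["s1", "s2", "s1"])  # one loader call for both ids
cache.resolve_batch("ws", "Study", ["s1"])              # served from the cache
cache.invalidate("ws", "Study", "s1")
cache.resolve_batch("ws", "Study", ["s1"])              # reloaded after invalidation
```

Keying the cache by workspace keeps entries from different workspaces apart, and invalidating per record avoids flushing the whole workspace when a single referenced record changes.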

Related Files

  • extralit/argilla-server/src/argilla_server/services/SchemaService.py - Enhanced reference resolution
  • extralit/argilla-server/src/argilla_server/services/DocumentService.py - Cross-dataset reference handling
  • extralit/argilla-server/src/argilla_server/api/handlers/v1/references/ - New reference endpoints
  • extralit/argilla-frontend/components/base/base-render-table/useReferenceTablesViewModel.ts - Simplified frontend logic
  • extralit/argilla-frontend/components/features/reference-resolution/ - New reference UI components
  • extralit/argilla-frontend/v1/infrastructure/services/DocumentService.ts - Document API client

Acceptance Criteria

  • Reference resolution has clear ownership in the backend using SchemaService
  • Workspace schema configuration drives reference resolution logic
  • Backend provides efficient APIs for reference resolution with proper caching
  • Frontend reference handling is simplified and uses backend APIs
  • Multi-user scenarios are properly supported with conflict resolution
  • Reference validation ensures data integrity across tables
  • Performance is optimized with appropriate caching strategies
  • Cross-dataset reference resolution works correctly
  • UI provides clear feedback for reference resolution status and errors
  • The system maintains backward compatibility with existing data
  • Integration tests verify reference resolution functionality
  • Error handling provides meaningful feedback to users

Related Issues

This is part of the strategic workspace-level schema management enhancement:

Metadata

Labels

enhancement: New feature or request
