
System Workflow

Adeoye Sunday edited this page Nov 25, 2024 · 10 revisions

The workflow is divided into three main stages: the Initial Stage, Annotation and Validation, and Model Deployment and Iteration. Each stage plays a vital role in ensuring the efficiency and accuracy of the system's outcomes.

1. Initial Stage

This phase is conducted outside the system to set up the foundational datasets required for subsequent operations.

  1. Select Alpha Set:

    • A subset of the dataset, referred to as the Alpha Set, is selected to serve as the starting point for the system.
    • This set typically consists of data samples that are diverse and representative of the overall dataset.
  2. Duplicate to Form T-Set:

    • The Alpha Set is duplicated to create the T-Set (Training Set).
    • This duplication ensures that the original Alpha Set remains intact for evaluation purposes.
  3. Iterative Process:

    • The T-Set is fed into the system, initiating the iterative annotation, validation, and model training process.
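The Initial Stage can be sketched in a few lines. This is a minimal illustration, not the system's actual code: `select_alpha_set` is a hypothetical helper (shown here as a simple seeded random sample, though in practice a more deliberate diversity-based selection would be used), and the dataset is represented as a plain list of dicts.

```python
import copy
import random

def select_alpha_set(dataset, size, seed=42):
    """Hypothetical helper: pick a representative subset.
    Here a seeded random sample stands in for a real diversity-based selection."""
    rng = random.Random(seed)
    return rng.sample(dataset, size)

# Full dataset; in practice these would be images, texts, etc.
dataset = [{"id": i, "label": None} for i in range(1000)]

# 1. Select the Alpha Set.
alpha_set = select_alpha_set(dataset, size=100)

# 2. Duplicate it to form the T-Set, leaving the Alpha Set intact
#    for later unbiased evaluation.
t_set = copy.deepcopy(alpha_set)

# 3. The T-Set now seeds the iterative annotation/training loop.
```

The deep copy matters: the T-Set will be mutated by later stages, while the original Alpha Set samples must stay untouched for evaluation.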

2. Annotation and Validation Stage

This stage focuses on automating and validating data annotations while fine-tuning the machine learning model with the processed data. It is a cyclical process, ensuring that the model improves over multiple iterations.

  1. Automate Annotation on New Subset:

    • The system applies pre-trained models or algorithms to automatically annotate a new subset of the data.
    • This reduces manual workload and speeds up the annotation process.
  2. Human Validation:

    • Annotated data is reviewed and validated by human experts to ensure accuracy and correctness.
    • Errors in the automated annotations are corrected during this step.
  3. Update T-Set with Accepted Annotations:

    • After validation, the annotated subset is added to the T-Set for further use in training the model.
  4. Model Fine-Tuning with T-Set:

    • The updated T-Set is used to fine-tune the model, improving its performance and accuracy on subsequent iterations.
  5. Check Conditions:

    • The system evaluates stopping conditions, such as:
      • Model performance metrics (e.g., accuracy, precision, recall).
      • A predefined number of iterations.
      • Dataset completion.
    • If conditions are met, the process ends; otherwise, the cycle repeats.
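The five steps above can be sketched as a single loop. Everything here is an assumption for illustration: `StubModel` and `human_validate` are placeholders for the real model and the expert-review step, and the stopping conditions mirror the ones listed (target accuracy, iteration cap, dataset completion).

```python
class StubModel:
    """Placeholder for the real model; trivial predict/fine-tune/evaluate."""
    def predict(self, samples):
        return [{"id": s["id"], "label": "auto"} for s in samples]
    def fine_tune(self, t_set):
        self.trained_on = len(t_set)
    def evaluate(self, data):
        return {"accuracy": min(1.0, len(data) / 200)}

def human_validate(samples, annotations):
    """Stand-in for expert review; a real version would correct errors."""
    return annotations

def annotation_loop(model, t_set, unlabeled, batch_size=50,
                    max_iters=10, target_accuracy=0.95):
    for _ in range(max_iters):                 # predefined iteration cap
        if not unlabeled:                      # dataset completion
            break
        subset, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        annotations = model.predict(subset)    # 1. automated annotation
        accepted = human_validate(subset, annotations)  # 2. human validation
        t_set.extend(accepted)                 # 3. update T-Set
        model.fine_tune(t_set)                 # 4. fine-tune on T-Set
        if model.evaluate(t_set)["accuracy"] >= target_accuracy:
            break                              # 5. stopping condition met
    return model, t_set, unlabeled
```

In a real deployment, `human_validate` is where rejected or corrected annotations are handled; only accepted annotations flow into the T-Set.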

3. Model Deployment and Iteration Stage

The final stage involves fine-tuning a clean model, preparing datasets, and deploying the final model for use. This stage concludes the iterative workflow.

  1. Fine-Tune Clean Model with T-Set:

    • A fresh, clean version of the model is fine-tuned using the complete T-Set.
  2. Evaluate Model on Alpha Set:

    • The model's performance is rigorously tested on the Alpha Set, ensuring unbiased evaluation.
  3. Annotate Remaining Dataset:

    • Any remaining unannotated data is processed using the system's automated annotation and validation pipeline.
  4. Add to T-Set:

    • Newly annotated and validated data is added to the T-Set for continuous improvement.
  5. Exclude Alpha Set Samples from T-Set:

    • Ensures that the Alpha Set remains isolated and is not included in the T-Set, maintaining data integrity.
  6. Designate Sets:

    • Finalize the Alpha Set and T-Set, and divide the data into Train and Test Sets for deployment.
  7. End Process:

    • The iterative workflow concludes, and the model is ready for deployment and production use.
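Steps 5 and 6 above (excluding Alpha Set samples and designating the final sets) can be sketched as follows, assuming samples carry an `"id"` key and using a simple shuffled split as a stand-in for whatever split strategy the project actually uses:

```python
import random

def finalize_sets(t_set, alpha_set, test_fraction=0.2, seed=42):
    """Exclude Alpha Set samples from the T-Set, then split into Train/Test."""
    alpha_ids = {s["id"] for s in alpha_set}
    # Keep the Alpha Set isolated so evaluation remains unbiased.
    cleaned = [s for s in t_set if s["id"] not in alpha_ids]
    rng = random.Random(seed)
    rng.shuffle(cleaned)
    split = int(len(cleaned) * (1 - test_fraction))
    return cleaned[:split], cleaned[split:]  # train_set, test_set
```

The filter by `alpha_ids` is the integrity guarantee: no sample used for evaluation ever appears in the training data.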

Key Features of the Workflow

  1. Iterative Improvements:

    • The system iterates over the annotation, validation, and training process until an optimal stopping condition is met.
  2. Human-in-the-Loop:

    • The system incorporates human validation at key stages to ensure high-quality annotations and model accuracy.
  3. Dataset Integrity:

    • The separation of the Alpha Set, T-Set, and eventual Train/Test Sets ensures that data is handled appropriately and evaluations remain unbiased.
  4. Automation:

    • Automated annotation significantly accelerates the process, while human validation maintains reliability.
  5. Scalability:

    • The workflow is designed to handle large datasets by splitting them into manageable subsets.

