System Workflow
The workflow is divided into three main stages: the Initial Stage, Annotation and Validation, and Model Deployment and Iteration. Each stage contributes to the efficiency and accuracy of the system's outcomes.
The Initial Stage is conducted outside the system to set up the foundational datasets required for subsequent operations.
- Select Alpha Set:
  - A subset of the dataset, referred to as the Alpha Set, is selected to serve as the starting point for the system.
  - This set typically consists of data samples that are diverse and representative of the overall dataset.
- Duplicate to Form T-Set:
  - The Alpha Set is duplicated to create the T-Set (Training Set).
  - This duplication ensures that the original Alpha Set remains intact for evaluation purposes.
- Iterative Process:
  - The T-Set is fed into the system, initiating the iterative annotation, validation, and model training process.
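The setup steps above can be sketched in a few lines of Python. This is a minimal illustration, not the system's actual code: the sample layout (a dict with `id` and `label`), the function names `select_alpha_set` and `make_t_set`, and the use of uniform random sampling as a stand-in for a "diverse and representative" selection are all assumptions made for the example.

```python
import copy
import random


def select_alpha_set(dataset, size, seed=0):
    """Pick `size` samples uniformly at random.

    Assumption: random sampling stands in for whatever selection
    strategy is actually used to get a representative subset.
    """
    rng = random.Random(seed)
    return rng.sample(dataset, size)


def make_t_set(alpha_set):
    """Deep-copy the Alpha Set so edits to the T-Set never touch it."""
    return copy.deepcopy(alpha_set)


# Hypothetical dataset: each sample is a dict with an id and a label.
dataset = [{"id": i, "label": None} for i in range(100)]

alpha_set = select_alpha_set(dataset, size=10)
t_set = make_t_set(alpha_set)

# Changing the T-Set copy leaves the Alpha Set untouched.
t_set[0]["label"] = "edited"
```

Using a deep copy rather than a plain list copy matters here: a shallow copy would share the sample objects, so relabeling a T-Set sample would silently change the Alpha Set as well.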
The Annotation and Validation stage automates and validates data annotations while fine-tuning the machine learning model on the processed data. The process is cyclical, so the model improves over multiple iterations.
- Automate Annotation on New Subset:
  - The system applies pre-trained models or algorithms to automatically annotate a new subset of the data.
  - This reduces manual workload and speeds up the annotation process.
- Human Validation:
  - Annotated data is reviewed and validated by human experts to ensure accuracy and correctness.
  - Errors in the automated annotations are corrected during this step.
- Update T-Set with Accepted Annotations:
  - After validation, the annotated subset is added to the T-Set for further use in training the model.
- Model Fine-Tuning with T-Set:
  - The updated T-Set is used to fine-tune the model, improving its performance and accuracy on subsequent iterations.
- Check Conditions:
  - The system evaluates stopping conditions, such as:
    - Model performance metrics (e.g., accuracy, precision, recall).
    - A predefined number of iterations.
    - Dataset completion.
  - If conditions are met, the process ends; otherwise, the cycle repeats.
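The cycle described above (annotate, validate, update the T-Set, fine-tune, check conditions) can be sketched as a single loop. This is a hedged illustration: `run_annotation_loop` and the callables `annotate`, `validate`, `fine_tune`, and `evaluate` are hypothetical names, the toy implementations exist only so the sketch runs, and stopping as soon as any one condition holds is an assumption, since the text does not say how the conditions are combined.

```python
def run_annotation_loop(unlabeled, t_set, annotate, validate, fine_tune,
                        evaluate, batch_size, target_score, max_iterations):
    """Run the annotate -> validate -> update -> fine-tune cycle.

    Returns (iterations_run, reason_for_stopping).
    """
    remaining = list(unlabeled)
    iterations = 0
    while remaining:
        # Take the next subset of unannotated data.
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        proposed = [annotate(sample) for sample in batch]  # automated pass
        accepted = [validate(p) for p in proposed]         # human review
        t_set.extend(accepted)                             # update T-Set
        fine_tune(t_set)                                   # fine-tune model
        iterations += 1
        # Check conditions (assumption: any single condition stops the loop).
        if evaluate(t_set) >= target_score:
            return iterations, "target metric reached"
        if iterations >= max_iterations:
            return iterations, "iteration limit reached"
    return iterations, "dataset complete"


# Toy stand-ins so the sketch runs end to end.
def annotate(sample):
    # Pretend model: labels every sample "cat".
    return {"sample": sample, "label": "cat"}


def validate(annotation):
    # Pretend reviewer: corrects the label for samples named "dog_*".
    if annotation["sample"].startswith("dog"):
        return {**annotation, "label": "dog"}
    return annotation


def fine_tune(t_set):
    pass  # placeholder for a real training step


def evaluate(t_set):
    # Fake metric that grows with the amount of validated data.
    return min(1.0, len(t_set) / 10)


t_set = [{"sample": "cat_0", "label": "cat"}]
unlabeled = ["cat_1", "dog_1", "dog_2", "cat_2", "cat_3"]
iterations, reason = run_annotation_loop(
    unlabeled, t_set, annotate, validate, fine_tune, evaluate,
    batch_size=2, target_score=0.9, max_iterations=10,
)
print(iterations, reason)
```

In this toy run the unlabeled data runs out before the fake metric reaches the target, so the loop ends on dataset completion.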
The final stage, Model Deployment and Iteration, involves fine-tuning a clean model, preparing datasets, and deploying the final model for use. This stage concludes the iterative workflow.
- Fine-Tune Clean Model with T-Set:
  - A fresh, clean version of the model is fine-tuned using the complete T-Set.
- Evaluate Model on Alpha Set:
  - The model's performance is tested on the Alpha Set to obtain an unbiased evaluation.
- Annotate Remaining Dataset:
  - Any remaining unannotated data is processed using the system's automated annotation and validation pipeline.
- Add to T-Set:
  - Newly annotated and validated data is added to the T-Set for continuous improvement.
- Exclude Alpha Set Samples from T-Set:
  - Alpha Set samples are removed from the T-Set so that the Alpha Set remains isolated, maintaining data integrity.
- Designate Sets:
  - The Alpha Set and T-Set are finalized and divided into Train and Test Sets for deployment.
- End Process:
  - The iterative workflow concludes, and the model is ready for deployment and production use.
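The exclusion and set-designation steps above can be sketched as follows. This is an illustration under stated assumptions: samples are assumed to carry a unique `id`, the helper names `exclude_alpha` and `train_test_split` are hypothetical, and the 80/20 split ratio is a placeholder because the text does not specify how the Train and Test Sets are formed.

```python
import random


def exclude_alpha(t_set, alpha_set):
    """Drop every T-Set sample whose id also appears in the Alpha Set."""
    alpha_ids = {sample["id"] for sample in alpha_set}
    return [sample for sample in t_set if sample["id"] not in alpha_ids]


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and cut it into (train, test).

    Assumption: a simple random split; the ratio is a placeholder.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data: the T-Set still contains the 5 duplicated Alpha samples.
alpha_set = [{"id": i} for i in range(5)]
t_set = [{"id": i} for i in range(25)]

clean_t_set = exclude_alpha(t_set, alpha_set)
train_set, test_set = train_test_split(clean_t_set, test_fraction=0.2)
```

Because the T-Set starts out as a copy of the Alpha Set, filtering by id is what guarantees that no evaluation sample ends up in the data used for training.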
- Iterative Improvements:
  - The system iterates over the annotation, validation, and training process until a stopping condition is met.
- Human-in-the-Loop:
  - The system incorporates human validation at key stages to ensure high-quality annotations and model accuracy.
- Dataset Integrity:
  - Keeping the Alpha Set, the T-Set, and the eventual Train/Test Sets separate ensures that data is handled appropriately and evaluations remain unbiased.
- Automation:
  - Automated annotation accelerates the process, while human validation maintains reliability.
- Scalability:
  - The workflow is designed to handle large datasets by splitting them into manageable subsets.
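As a small illustration of the subset-splitting idea, a generator can hand out fixed-size subsets one at a time without loading the whole dataset into memory. The name `iter_subsets` and the subset size are assumptions for this sketch, not part of the system.

```python
from itertools import islice


def iter_subsets(samples, subset_size):
    """Yield consecutive lists of up to `subset_size` samples."""
    iterator = iter(samples)
    while True:
        chunk = list(islice(iterator, subset_size))
        if not chunk:
            return
        yield chunk


subsets = list(iter_subsets(range(10), subset_size=4))
print(subsets)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```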
