
Project Structure

Adeoye Sunday edited this page Nov 25, 2024 · 1 revision

The STAS project is organized into a modular directory structure, allowing for scalability, maintainability, and separation of concerns. Below is a detailed overview of each module and its responsibilities.


Project Structure

src/
│
├── annotation/          # Annotation logic (classification and sequence-based)
├── api/                 # Exposes the application interface (controller and UI)
├── dao/                 # Manages database operations (Data Access Objects)
├── data/                # Contains datasets and data-related files
├── i_entities/          # Defines core interfaces and base entities
├── metric/              # Implements evaluation metrics for model performance
├── model/               # Contains machine learning models and related utilities
├── sample/              # Creates and manipulates data samples
├── selector/            # Implements logic for sample selection
├── stopping_conditions/ # Handles stopping criteria for iterative processes
├── utils/               # Utility scripts for common operations
└── config.yaml          # Configuration file for project parameters

Module Descriptions

1. annotation/

Handles the logic for annotating data, supporting both classification and sequence-based annotations.

  • classification_annotation.py: Implements functionality for text classification annotation.
  • sequence_annotation.py: Handles sequence-based annotation tasks (e.g., Named Entity Recognition).

2. api/

Exposes interfaces for interacting with the system, including a controller for backend logic and a UI module for user interaction.

  • controller.py: Manages request handling, orchestrating between modules.
  • ui.py: Implements a basic user interface for interacting with the system.

3. dao/

Manages data storage and retrieval operations, including database-specific implementations.

  • mongo_dao.py: Provides CRUD operations for a MongoDB backend.

4. i_entities/

Defines interfaces and base entities used throughout the project, enabling modularity and extensibility.

  • annotation_interface.py: Abstracts the annotation process.
  • dao_interface.py: Standardizes DAO implementations.
  • model_interface.py: Provides a blueprint for machine learning models.
  • sample_interface.py: Defines the structure and behavior of data samples.
  • stop_condition_interface.py: Interface for stopping condition implementations.
  • Additional files: Base classes for metrics, experiments, iteration, logging, etc.
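
To illustrate how an interface in i_entities/ might standardize implementations elsewhere in the project, here is a minimal sketch built on Python's abc module. The names DaoInterface, InMemoryDao, save, and find are assumptions made for illustration; the actual signatures in dao_interface.py may differ.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class DaoInterface(ABC):
    """Hypothetical contract that every DAO implementation would follow."""

    @abstractmethod
    def save(self, key: str, record: Dict[str, Any]) -> None:
        """Store a record under the given key."""

    @abstractmethod
    def find(self, key: str) -> Optional[Dict[str, Any]]:
        """Return the record stored under key, or None if it is missing."""


class InMemoryDao(DaoInterface):
    """Toy implementation backed by a dict (not part of the project)."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, Any]] = {}

    def save(self, key: str, record: Dict[str, Any]) -> None:
        # Copy the record so later changes by the caller do not leak in.
        self._records[key] = dict(record)

    def find(self, key: str) -> Optional[Dict[str, Any]]:
        return self._records.get(key)
```

Because callers depend only on the abstract class, a MongoDB-backed DAO and an in-memory one could be swapped without touching the calling code, which is the kind of extensibility this module is meant to enable.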

5. metric/

Implements evaluation metrics for assessing model performance.

  • metric_factory.py: Factory pattern for creating metric instances.
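
As a rough picture of the factory pattern mentioned above, the sketch below maps metric names to callables. MetricFactory.create, the registry, and the accuracy function are hypothetical and only show the general shape; they are not taken from metric_factory.py.

```python
from typing import Callable, Dict, Sequence

Metric = Callable[[Sequence[str], Sequence[str]], float]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of positions where the prediction equals the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class MetricFactory:
    """Hypothetical factory: looks up a metric by name in a registry."""

    _registry: Dict[str, Metric] = {"accuracy": accuracy}

    @classmethod
    def create(cls, name: str) -> Metric:
        try:
            return cls._registry[name]
        except KeyError:
            raise ValueError(f"Unknown metric: {name!r}") from None


metric = MetricFactory.create("accuracy")
score = metric(["a", "b", "c", "d"], ["a", "b", "x", "d"])  # 3 of 4 match
```

With this layout, a metric name read from a configuration file could be resolved without the caller importing individual metric modules.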

6. model/

Contains machine learning models and related utilities.

  • ner_model.py: Implements a Named Entity Recognition (NER) model.
  • model_factory.py: Factory for creating and managing models.

7. sample/

Manages creation and manipulation of data samples.

  • sample_factory.py: Factory for creating data samples.
  • TextClassificationSample.py: Manages text classification sample creation.
  • sequence_to_Sequence_sample.py: Manages sequence-to-sequence samples.

8. selector/

Implements logic for selecting samples from datasets for annotation or training.

  • selector_factory.py: Factory for creating sample selectors.
  • random_selector.py: Randomly selects samples.
  • __init__.py: Module initializer.
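
A random selector can be very small. The sketch below assumes a hypothetical RandomSelector class with a select(pool, k) method and an optional seed for reproducibility; the real random_selector.py may expose a different interface.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


class RandomSelector:
    """Hypothetical selector that draws samples uniformly without replacement."""

    def __init__(self, seed: Optional[int] = None) -> None:
        # A private Random instance keeps selection reproducible per selector.
        self._rng = random.Random(seed)

    def select(self, pool: Sequence[T], k: int) -> List[T]:
        # Never ask for more samples than the pool contains.
        k = min(k, len(pool))
        return self._rng.sample(list(pool), k)


pool = ["s1", "s2", "s3", "s4", "s5"]
batch = RandomSelector(seed=42).select(pool, k=3)
```

Fixing the seed makes a run repeatable, which helps when comparing selection strategies against the random baseline.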

9. stopping_conditions/

Implements stopping criteria for iterative processes such as training or annotation.

  • acceptance_rate.py: Stopping condition based on acceptance rate.
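
One way to read "stopping condition based on acceptance rate" is: stop once the share of accepted suggestions reaches a threshold. The sketch below implements that reading. The class name, the threshold and min_samples parameters, and the direction of the comparison are all assumptions, not details confirmed by acceptance_rate.py.

```python
class AcceptanceRateCondition:
    """Hypothetical condition: stop when accepted / total >= threshold."""

    def __init__(self, threshold: float, min_samples: int = 1) -> None:
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        self.threshold = threshold
        self.min_samples = min_samples
        self._accepted = 0
        self._total = 0

    def update(self, accepted: bool) -> None:
        """Record one annotation decision."""
        self._total += 1
        if accepted:
            self._accepted += 1

    @property
    def rate(self) -> float:
        """Current acceptance rate, or 0.0 before any decision is recorded."""
        return self._accepted / self._total if self._total else 0.0

    def should_stop(self) -> bool:
        # Require a minimum number of decisions so one early accept
        # does not end the process prematurely.
        return self._total >= self.min_samples and self.rate >= self.threshold
```

The min_samples guard is a design choice in this sketch: without it, a single accepted sample would yield a rate of 1.0 and stop the loop immediately.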

10. utils/

Contains utility scripts and helper functions used across modules.

  • config_loader.py: Parses and loads configuration settings.
  • loader.py: Handles data loading operations.

Important Files

  • config.yaml: Centralized configuration file for defining project parameters.

How to Navigate the Codebase

  1. Start with api/controller.py to understand how the iterative annotation process works.
  2. Explore annotation/ and sample/ for data processing workflows.
  3. Look into model/ and metric/ for model training and evaluation.
  4. Use the helpers in utils/ for configuration and data loading.
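
For orientation before reading api/controller.py, the sketch below shows how selection, annotation, and a stopping check could fit together in one loop. It is a simplified illustration under stated assumptions: run_annotation_loop, its parameters, and the acceptance-rate check are hypothetical and do not reproduce the controller's actual logic.

```python
import random
from typing import Callable, List


def run_annotation_loop(
    pool: List[str],
    annotate: Callable[[str], bool],
    threshold: float = 0.8,
    batch_size: int = 2,
    seed: int = 0,
) -> List[bool]:
    """Select a batch, annotate it, check the stopping condition, repeat."""
    rng = random.Random(seed)
    remaining = list(pool)
    decisions: List[bool] = []
    while remaining:
        # Selection step: draw a random batch from the unannotated pool.
        batch = rng.sample(remaining, min(batch_size, len(remaining)))
        for sample in batch:
            remaining.remove(sample)
            # Annotation step: True means the suggestion was accepted.
            decisions.append(annotate(sample))
        # Stopping step: end once the acceptance rate reaches the threshold.
        if sum(decisions) / len(decisions) >= threshold:
            break
    return decisions


texts = ["t1", "t2", "t3", "t4", "t5"]
always_accept = run_annotation_loop(texts, annotate=lambda s: True)
never_accept = run_annotation_loop(texts, annotate=lambda s: False)
```

When every sample is accepted, the loop ends after the first batch; when none is accepted, it runs until the pool is exhausted.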
