Conversation

@mohamedelabbas1996 (Contributor) commented Sep 18, 2025

Summary

This PR introduces a reusable and unified framework for post-processing in Antenna, providing a consistent pattern to implement and manage post-processing tasks.

List of Changes

  • Introduced Post-Processing Framework

    • Added a new base class BasePostProcessingTask to define a common structure for all post-processing tasks.
  • Added Two Basic Post-Processing Tasks

    • Small Size Filter
      Marks detections whose bounding-box area is small relative to the full image as Not Identifiable, making it easy to filter out noisy or low-information detections.
    • Rank Rollup
      Rolls up uncertain predictions to a higher taxonomic rank when the top prediction's score falls below a per-rank confidence threshold.
  • New Job Type: PostProcessingJob

    • Introduced a new JobType that executes post-processing tasks.
  • Trigger Tasks from the Admin Page

    • Post-processing tasks such as the Small Size Filter can now be triggered directly from the SourceImageCollection admin page.

Related Issues

#957

Detailed Description

This PR lays the foundation for Antenna’s post-processing framework.
It provides a modular, extensible framework for running data cleanup and refinement tasks after the main classification pipeline, improving the quality of pipeline results.

The post-processing framework makes it possible to:

  • Implement new post-processing tasks by simply subclassing BasePostProcessingTask.
  • Execute those tasks as jobs through the existing job infrastructure.
  • Access logging, progress tracking, and error handling.
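The subclass-plus-registry pattern can be sketched in a self-contained form like this. The names mirror the PR (BasePostProcessingTask, POSTPROCESSING_TASKS, register_postprocessing_task), but the attribute and method names (key, run) are assumptions for illustration, not the actual Antenna API:

```python
# Minimal sketch of the pattern this PR introduces: a base class plus a
# registry decorator. The `key` attribute and `run()` signature are assumed.
POSTPROCESSING_TASKS: dict[str, type] = {}


def register_postprocessing_task(task_cls):
    """Register a task class under its `key` attribute (assumed name)."""
    POSTPROCESSING_TASKS[task_cls.key] = task_cls
    return task_cls


class BasePostProcessingTask:
    """Stand-in for the PR's abstract base class."""
    key = "base"

    def run(self):
        raise NotImplementedError


@register_postprocessing_task
class SmallSizeFilterTask(BasePostProcessingTask):
    key = "small_size_filter"

    def run(self):
        # A real task would query detections and relabel the small ones;
        # here we just return a marker string for illustration.
        return "ran small_size_filter"


# Jobs can later look the task up by its registry key:
task_cls = POSTPROCESSING_TASKS.get("small_size_filter")
```

Registering via a decorator keeps task discovery in one dictionary, which is what lets a job resolve a task class from a string stored in its parameters.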

Initial tasks include:

  • Small Size Filter: Flags small detections as non-identifiable.
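The core check behind the Small Size Filter can be illustrated as follows; the threshold value matches the `size_threshold: 0.01` example later in this thread, but the function name and bounding-box format are assumptions:

```python
# Illustrative sketch of the Small Size Filter logic: compute a detection's
# bounding-box area relative to the full image and flag it when below a
# threshold. The default threshold and names are assumptions.
SIZE_THRESHOLD = 0.01  # fraction of total image area


def is_too_small(bbox, image_width, image_height, threshold=SIZE_THRESHOLD):
    """bbox = (x1, y1, x2, y2) in pixels; returns True if the detection
    should be marked Not Identifiable."""
    x1, y1, x2, y2 = bbox
    bbox_area = max(0, x2 - x1) * max(0, y2 - y1)
    relative_area = bbox_area / (image_width * image_height)
    return relative_area < threshold


# A 20x20 box in a 1000x1000 image covers only 0.04% of the frame:
print(is_too_small((0, 0, 20, 20), 1000, 1000))  # → True
```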

How to Test the Changes

  1. Open the Django admin interface.
  2. Go to Source Image Collections.
  3. Select one or more collections.
  4. Run the Small Size Filter admin action and verify that detections below the size threshold are relabeled as Not Identifiable.
  5. Observe logs and job progress under Jobs to confirm successful execution and completion.


Deployment Notes

Includes several migrations.

Checklist

  • I have tested these changes appropriately.
  • I have added and/or modified relevant tests.
  • I updated relevant documentation or comments.
  • I have verified that this PR follows the project's coding standards.
  • Any dependent changes have already been merged to main.

@netlify netlify bot commented Sep 18, 2025

Deploy Preview for antenna-preview canceled.

🔨 Latest commit: 102d0b5
🔍 Latest deploy log: https://app.netlify.com/projects/antenna-preview/deploys/68f089917766690008671dc1

@mihow (Collaborator) left a comment

Thanks for thinking about the framework abstractly, @mohamedelabbas1996! This looks like a good start.

Another aspect to consider: how do we want to show the output of the post-processing functions in the UI and track what was applied to an occurrence in the DB?

Right now we show the classification model that was used, the model type, and the date that the prediction was applied. I think we should add a new field for tracking the post-processing step that was applied as well.

Classification

  • model
  • date
  • filter name / filter class or list of post_processing steps

Alternatively!

We could register each post processing step as an Algorithm. This may fit into our current structure more naturally with less effort (a Pipeline is already a series of algorithms applied).


It may work for some filters and not others, but most of them are types of algorithms (rank rollups, tracking, etc.).

In the AMI Data Companion we consider the tracking stage as the last algorithm applied.

@mohamedelabbas1996 mohamedelabbas1996 self-assigned this Sep 19, 2025
@mohamedelabbas1996 mohamedelabbas1996 linked an issue Sep 23, 2025 that may be closed by this pull request
@mihow mihow mentioned this pull request Oct 13, 2025
4 tasks
@mihow (Collaborator) commented Oct 13, 2025

Here are some notes from a previous design discussion

Documentation of the workflow implemented in #915

  • A TaxaList is chosen to use as the categories we want to see results for (e.g. moths of Oregon)
  • The user selects a set of images, and the results that they want to apply the filter to (e.g. classifications from the global moth model).
  • The mask is applied to the logits of the classification predictions for all detections in those images. The logits cannot be set to zero; they must either be removed or set to a very low number (which is what we do in the example). Then the softmax scores are recalculated so we can see the top-1 prediction for each detection.
  • The previous predictions are updated so that terminal=False (the classification we are masking from the global model)
  • The occurrence determinations are recalculated
  • You can run this from a management command, or run it on a single occurrence from an action in the Django admin, which is the best way to debug it since you are working with one occurrence at a time.
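The masking-and-rescoring step above can be sketched in plain Python. The values and function name are illustrative only; the real workflow operates on the stored logits of each detection's classification:

```python
import math

# Sketch of class masking: replace logits of disallowed classes with a very
# low number (not zero), then recompute softmax and take the top-1 prediction.
VERY_LOW = -1e9  # effectively removes a class without deleting its column


def mask_and_rescore(logits, allowed):
    """logits: list of floats, one per class; allowed: parallel list of bools
    (True for classes in the chosen TaxaList). Returns softmax scores."""
    masked = [logit if ok else VERY_LOW for logit, ok in zip(logits, allowed)]
    m = max(masked)  # numerically stable softmax
    exps = [math.exp(x - m) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# The originally top-scoring class (index 0) is outside the taxa list,
# so after masking the top-1 shifts to the best allowed class:
scores = mask_and_rescore([2.0, 1.0, 0.5], [False, True, True])
top1 = scores.index(max(scores))  # → 1
```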

Other notes

  • I think we should probably create an AlgorithmCategoryMap & an Algorithm based on the existing model and the TaxaList filter. Then you can see the history of what's been applied in the prediction history. For example, rather than showing 2 predictions from the Global Moth Classifier with different results, we can create a new algorithm dynamically and call it "Oregon taxa from Global moths". Then we can also skip the step of masking the AlgorithmCategoryMap each time the process is run.
  • In the end, we will have a Pipeline that is pre-selectable for "Oregon moths". Which will show the Detector, Binary classifier, Global classifier, Oregon class mask.
  • Another option is to send the taxalist to the AMI data companion and let the filtering happen there.

def run_class_masking(self, request: HttpRequest, queryset: QuerySet[SourceImageCollection]) -> None:
    jobs = []

DEFAULT_TAXA_LIST_ID = 5
@mihow (Collaborator) commented Oct 14, 2025

I think we will need to add the management command from here to use the class masking before we have a UI to trigger it. Otherwise we won't be able to specify the right taxa list.

https://github.com/RolnickLab/antenna/pull/915/files#diff-c50e8d1a96421d4b5d8dbe5634e99a71bf7cf1fc820349c88875f260630e6af6

https://github.com/RolnickLab/antenna/blob/19b0cecfacee2d3e62ae89f56b4e81990f3cdfff/ami/ml/management/commands/test_class_masking.py

POSTPROCESSING_TASKS: dict[str, type["BasePostProcessingTask"]] = {}


def register_postprocessing_task(task_cls: type["BasePostProcessingTask"]):
A collaborator commented:
Are you showing or using the list of available post processing tasks anywhere? If it's only in the tests for now that's okay. I'm just curious if the registry is working. We can display the options in the UI later.

@mohamedelabbas1996 (Contributor, Author) replied Oct 14, 2025

The list of tasks isn't shown anywhere right now; its only use at the moment is through the function


def get_postprocessing_task(name: str) -> type["BasePostProcessingTask"] | None:
    """
    Get a task class by its registry key.
    Returns None if not found.
    """
    return POSTPROCESSING_TASKS.get(name)

which retrieves the task class from the registry when needed.

@mihow mihow requested a review from Copilot October 15, 2025 03:21
Copilot AI left a comment

Pull Request Overview

This PR introduces a comprehensive post-processing framework for Antenna, providing a standardized way to implement and execute data cleanup and refinement tasks after the main classification pipeline. The framework includes two initial post-processing tasks: Small Size Filter for removing low-information detections and Rank Rollup for improving classification confidence by rolling up uncertain predictions to higher taxonomic ranks.

Key Changes

  • Implemented a base post-processing framework with task registration and execution capabilities
  • Added two concrete post-processing tasks: Small Size Filter and Rank Rollup
  • Integrated post-processing jobs into the existing job infrastructure with admin interface support

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Summary per file:

  • ami/ml/post_processing/base.py: Core framework defining the BasePostProcessingTask abstract class and task registration system
  • ami/ml/post_processing/small_size_filter.py: Task implementation for filtering out small detections
  • ami/ml/post_processing/rank_rollup.py: Task implementation for rolling up uncertain classifications to higher taxonomic ranks
  • ami/jobs/models.py: Added PostProcessingJob type to the job execution framework
  • ami/main/admin.py: Added admin actions to trigger post-processing tasks from the SourceImageCollection admin
  • ami/main/models.py: Added applied_to field to the Classification model for tracking post-processing relationships
  • ami/ml/models/algorithm.py: Added POST_PROCESSING task type to the AlgorithmTaskType enum


new_score = None
for rank in rollup_order:
    threshold = thresholds.get(rank, 1.0)
    candidates = {t: s for t, s in taxon_scores.items() if t.rank == rank}
Copilot AI commented Oct 15, 2025

The comparison t.rank == rank assumes t.rank is a string, but it's likely a TaxonRank enum. This should use t.rank.value == rank or compare against the enum value directly.

Suggested change:
-    candidates = {t: s for t, s in taxon_scores.items() if t.rank == rank}
+    candidates = {t: s for t, s in taxon_scores.items() if str(t.rank).upper() == rank.upper()}

@mihow (Collaborator) commented Oct 15, 2025

This is getting super close! All of the Copilot comments look valid, but note that they are about the specific filters rather than the overall framework.

Also I am noticing the determination is not updating automatically. After I run a filter, the occurrence still shows the old determination rather than the updated one (before/after screenshots omitted).

After the new classifications are created in batch, you have to loop through every occurrence that was modified and run update_determination(). There is no batch method for that.
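A minimal, self-contained sketch of that fix; the Occurrence class here is a stub standing in for the real Django model, and the helper name is hypothetical:

```python
# After batch-creating classifications, refresh each modified occurrence
# individually, since no batch equivalent of update_determination() exists.
class Occurrence:
    """Stub stand-in for the real Django model."""

    def __init__(self, pk):
        self.pk = pk
        self.determination_updated = False

    def update_determination(self):
        # The real model recalculates the determination from its classifications.
        self.determination_updated = True


def refresh_determinations(occurrences):
    """Loop over modified occurrences one at a time (no batch method exists)."""
    updated = 0
    for occ in occurrences:
        occ.update_determination()
        updated += 1
    return updated


occurrences = [Occurrence(1), Occurrence(2)]
count = refresh_determinations(occurrences)  # → 2
```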



def get_postprocessing_task(name: str) -> type["BasePostProcessingTask"] | None:
"""
@mihow (Collaborator) commented Oct 15, 2025

If the registry isn't working how you intended, you could change it to import the post processing tasks using their full path.

def get_postprocessing_task(key='size_filter', full_path='yuyanslib.processing.biometrics.Filter'):
    module_path, class_name = full_path.rsplit('.', 1)
    cls = getattr(importlib.import_module(module_path), class_name)
    # raise "No post processing task registered with the name 'size_filter' could be loaded"

)
# job = models.CharField(max_length=255, null=True)

applied_to = models.ForeignKey(
A collaborator commented:
Looks good, thank you

occurrence = det.occurrence
self.assertIsNotNone(occurrence, f"Detection {det.pk} should belong to an occurrence.")
occurrence.refresh_from_db()
self.assertEqual(
A collaborator commented:
Nice, thank you for this test

@mihow (Collaborator) commented Oct 15, 2025

@mohamedelabbas1996 I think this is the simplest option for the registry that will work well for the current scope. This is what I am doing in the API for the AMI data companion https://github.com/RolnickLab/ami-data-companion/blob/bf0fe16a533a0cc3b94cec7d5da65564c06d99c5/trapdata/api/api.py#L42-L61

from .small_size_filter import SmallSizeFilterTask
# Add more imports as you add tasks

POSTPROCESSING_TASKS = {
    "small_size_filter": SmallSizeFilterTask,
    # "another_task": AnotherTask,
}

def get_postprocessing_task(key: str):
    return POSTPROCESSING_TASKS.get(key)

If we want developers to start adding more tasks outside of the post_processing module, we could try the approach that uses the full module path. But honestly the first method is probably the best for now.

def get_postprocessing_task(class_path: str) -> type["BasePostProcessingTask"] | None:
    """
    Get a task class by its full Python path.
    
    Example:
        task_cls = get_postprocessing_task("ami.ml.post_processing.small_size_filter.SmallSizeFilterTask")
    
    Returns None if not found or invalid.
    """
    try:
        module_path, class_name = class_path.rsplit(".", 1)
        module = importlib.import_module(module_path)
        task_cls = getattr(module, class_name)
        
        # Validate it's a post-processing task
        if not issubclass(task_cls, BasePostProcessingTask):
            logging.error(f"{class_path} is not a subclass of BasePostProcessingTask")
            return None
            
        return task_cls
    except (ValueError, ImportError, AttributeError) as e:
        logging.error(f"Failed to load post-processing task '{class_path}': {e}")
        return None

Usage:

job.params = {
    "task": "ami.ml.post_processing.small_size_filter.SmallSizeFilterTask",
    "config": {
        "size_threshold": 0.01,
        "source_image_collection_id": 123
    }
}
@mihow mihow changed the title [Draft] Introduce generic post-processing framework Introduce generic post-processing framework Oct 16, 2025
@mihow mihow marked this pull request as ready for review October 16, 2025 01:05
@mihow mihow merged commit b387478 into main Oct 16, 2025
6 checks passed
@mihow mihow deleted the feat/postprocessing-framework branch October 16, 2025 06:07

Development

Successfully merging this pull request may close these issues:

  • Implement a reusable post-processing framework

3 participants