Skip to content

Conversation

@eilmiv
Copy link
Collaborator

@eilmiv eilmiv commented Dec 18, 2025

Summary of changes

  • Added filtering that happens after reading from a remote for ingestion and is configurable by the user
  • Support for allow list and block list
  • Support for may different kinds of filters, including e.g. a filter that matches a substring in the description
  • All string filters are case-insensitive

Motivation and context

It is useful to not ingest everything from a source. Solves #1192. Important part of the exchange feature in the mTeSS-x project.

Screenshots

image

Checklist

  • I have read and followed the CONTRIBUTING guide.
  • I confirm that I have the authority necessary to make this contribution on behalf of its copyright owner and agree
    to license it to the TeSS codebase under the
    BSD license.

@eilmiv eilmiv marked this pull request as ready for review December 19, 2025 10:35
@fbacall fbacall requested a review from Copilot December 22, 2025 18:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a comprehensive filtering system for ingestion sources, allowing users to configure allow lists and block lists to selectively import materials and events. This addresses issue #1192 and supports the mTeSS-x project's exchange feature.

Key Changes

  • Introduced a new SourceFilter model with support for multiple filter types (keyword, title, description, URL, etc.) and modes (allow/block)
  • Added filtering logic to the ingestion pipeline that executes after reading from remote sources
  • Created a dynamic UI for managing filters with JavaScript-based form controls

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
db/migrate/20251209112056_create_source_filters.rb Creates the source_filters table with mode, type, and value fields
db/migrate/20251218100418_remove_keyword_filter_from_sources.rb Removes deprecated keyword_filter column from sources
app/models/source_filter.rb New model implementing filter matching logic for various field types
app/models/source.rb Adds association to source_filters and implements passes_filter? method
app/controllers/sources_controller.rb Updates controller to handle nested source_filters attributes
app/views/sources/_form.html.erb Adds UI sections for allow list and block list filter configuration
app/views/sources/_source_filter_form.html.erb Partial for individual filter form fields
app/assets/javascripts/source_filters.js JavaScript for dynamically adding/removing filter forms
app/assets/stylesheets/sources.scss Styling for filter form layout
lib/ingestors/ingestor.rb Integrates filter method into ingestion write process
app/workers/source_test_worker.rb Applies filters during source testing
test/unit/ingestors/ingestor_test.rb Comprehensive tests for filter behavior
test/fixtures/source_filters.yml Test fixtures for various filter configurations
test/fixtures/materials.yml Test materials for filter testing
test/fixtures/events.yml Test events for filter testing
test/models/source_filter_test.rb Empty test file placeholder

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

},

delete: function () {
$(this).parents('.source-filter-form').fadeOut().find("input[name$='[_destroy]']").val("true");
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The delete function sets _destroy to the string "true", but this should be a boolean value or the string "1" for Rails to recognize it properly in nested attributes. Rails typically expects "1" for true and "0" for false in form submissions. Consider changing to .val(1) or .val("1").

Suggested change
$(this).parents('.source-filter-form').fadeOut().find("input[name$='[_destroy]']").val("true");
$(this).parents('.source-filter-form').fadeOut().find("input[name$='[_destroy]']").val("1");

Copilot uses AI. Check for mistakes.
val.to_s.downcase.include?(filter_value.downcase)
# array string match
elsif %w[target_audience keyword resource_type event_type].include?(filter_by)
val.any? { |i| i.to_s.casecmp?(filter_value) }
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The array comparison using casecmp? could fail if val is nil or not an array. Consider adding a nil check or ensuring val is an array before calling any?, for example: Array.wrap(val).any? { |i| i.to_s.casecmp?(filter_value) }.

Suggested change
val.any? { |i| i.to_s.casecmp?(filter_value) }
Array.wrap(val).any? { |i| i.to_s.casecmp?(filter_value) }

Copilot uses AI. Check for mistakes.
# Never trust parameters from the scary internet, only allow the white list through.
def source_params
permitted = [:url, :method, :token, :default_language, :enabled]
permitted = %i[url method token default_language enabled keyword_filter source_filters]
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter list includes keyword_filter which appears to be from an old implementation that was removed in migration 20251218100418. This should be removed from the permitted parameters list.

Suggested change
permitted = %i[url method token default_language enabled keyword_filter source_filters]
permitted = %i[url method token default_language enabled source_filters]

Copilot uses AI. Check for mistakes.
setup do
mock_ingestions
Source.delete_all
Source.destroy_all
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The delete_all method was changed to destroy_all. While destroy_all properly triggers callbacks and dependent associations, this change could impact test performance as it loads all records into memory. If this was intentional to ensure proper cleanup of associated source_filters, consider adding a comment explaining why. If not needed for associations, delete_all is more efficient for test cleanup.

Suggested change
Source.destroy_all
Source.delete_all

Copilot uses AI. Check for mistakes.
Comment on lines +3 to +7
class SourceFilterTest < ActiveSupport::TestCase
# test "the truth" do
# assert true
# end
end
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SourceFilterTest file only contains commented-out placeholder tests. The SourceFilter model has complex filtering logic including string matching, array matching, and case-insensitive comparisons that should be covered by unit tests. Consider adding tests for the match method with various filter types and edge cases (nil values, empty arrays, case variations).

Copilot uses AI. Check for mistakes.
Comment on lines +3 to +4
var existing_list_item_ids = $(".source-filter-form").map(function (i, c) { return $(c).data("id-in-filter-list") });
return Math.max(Math.max.apply(null, existing_list_item_ids) + 1, 0);
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last_id function uses Math.max.apply(null, existing_list_item_ids) which could fail if the array is empty or very large. When the array is empty, this returns -Infinity, and then adding 1 results in -Infinity. The Math.max(..., 0) handles this, but the logic is fragile. Consider using Math.max(0, ...Array.from(existing_list_item_ids)) or returning 0 explicitly when the array is empty for better clarity.

Suggested change
var existing_list_item_ids = $(".source-filter-form").map(function (i, c) { return $(c).data("id-in-filter-list") });
return Math.max(Math.max.apply(null, existing_list_item_ids) + 1, 0);
var existing_list_item_ids = $(".source-filter-form").map(function (i, c) { return $(c).data("id-in-filter-list") }).get();
if (existing_list_item_ids.length === 0) {
return 0;
}
var maxId = existing_list_item_ids[0];
for (var i = 1; i < existing_list_item_ids.length; i++) {
if (existing_list_item_ids[i] > maxId) {
maxId = existing_list_item_ids[i];
}
}
return maxId + 1;

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +4
class RemoveKeywordFilterFromSources < ActiveRecord::Migration[7.2]
def change
remove_column :sources, :keyword_filter, :string
end
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The migration timestamps suggest that AddKeywordFilterToSources (20251208203629) adds a column, and then RemoveKeywordFilterFromSources (20251218100418) removes it shortly after. This creates a temporary column that exists for only 10 days in the migration history. If this was part of the development process, consider squashing these migrations or documenting why the temporary column was needed. This could cause issues if deployments happen between these migration dates.

Copilot uses AI. Check for mistakes.
class SourceFilter < ApplicationRecord
belongs_to :source

auto_strip_attributes :filter_value
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SourceFilter model lacks validation for required fields. Consider adding validations for filter_mode, filter_by, and filter_value to ensure data integrity. For example: validates :filter_mode, :filter_by, :filter_value, presence: true.

Suggested change
auto_strip_attributes :filter_value
auto_strip_attributes :filter_value
validates :filter_mode, :filter_by, :filter_value, presence: true

Copilot uses AI. Check for mistakes.

# string match
if %w[title url doi description license difficulty_level subtitle city country timezone].include?(filter_by)
val.to_s.casecmp(filter_value).zero?
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string comparison logic uses casecmp which returns 0 for equal strings, but this only provides case-insensitive equality checking. However, casecmp returns nil when comparing with a non-string value. This could cause a NoMethodError if val is nil. Consider using val.to_s.casecmp?(filter_value) instead, which returns a boolean and handles nil values gracefully.

Suggested change
val.to_s.casecmp(filter_value).zero?
val.to_s.casecmp?(filter_value)

Copilot uses AI. Check for mistakes.
sources_path, class: 'btn btn-default' %>
</div>

<template id="source-filter-template" style="display: none">
Copy link

Copilot AI Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The template element is using inline style style="display: none" which could be moved to CSS for better separation of concerns. Consider adding a CSS class instead, such as .hidden { display: none; } in the stylesheet.

Suggested change
<template id="source-filter-template" style="display: none">
<template id="source-filter-template">

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant