-
Notifications
You must be signed in to change notification settings - Fork 20
Ingestion filters. Fixes #1192 #1201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a comprehensive filtering system for ingestion sources, allowing users to configure allow lists and block lists to selectively import materials and events. This addresses issue #1192 and supports the mTeSS-x project's exchange feature.
Key Changes
- Introduced a new
SourceFiltermodel with support for multiple filter types (keyword, title, description, URL, etc.) and modes (allow/block) - Added filtering logic to the ingestion pipeline that executes after reading from remote sources
- Created a dynamic UI for managing filters with JavaScript-based form controls
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
db/migrate/20251209112056_create_source_filters.rb |
Creates the source_filters table with mode, type, and value fields |
db/migrate/20251218100418_remove_keyword_filter_from_sources.rb |
Removes deprecated keyword_filter column from sources |
app/models/source_filter.rb |
New model implementing filter matching logic for various field types |
app/models/source.rb |
Adds association to source_filters and implements passes_filter? method |
app/controllers/sources_controller.rb |
Updates controller to handle nested source_filters attributes |
app/views/sources/_form.html.erb |
Adds UI sections for allow list and block list filter configuration |
app/views/sources/_source_filter_form.html.erb |
Partial for individual filter form fields |
app/assets/javascripts/source_filters.js |
JavaScript for dynamically adding/removing filter forms |
app/assets/stylesheets/sources.scss |
Styling for filter form layout |
lib/ingestors/ingestor.rb |
Integrates filter method into ingestion write process |
app/workers/source_test_worker.rb |
Applies filters during source testing |
test/unit/ingestors/ingestor_test.rb |
Comprehensive tests for filter behavior |
test/fixtures/source_filters.yml |
Test fixtures for various filter configurations |
test/fixtures/materials.yml |
Test materials for filter testing |
test/fixtures/events.yml |
Test events for filter testing |
test/models/source_filter_test.rb |
Empty test file placeholder |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| }, | ||
|
|
||
| delete: function () { | ||
| $(this).parents('.source-filter-form').fadeOut().find("input[name$='[_destroy]']").val("true"); |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The delete function sets _destroy to the string "true", but this should be a boolean value or the string "1" for Rails to recognize it properly in nested attributes. Rails typically expects "1" for true and "0" for false in form submissions. Consider changing to .val(1) or .val("1").
| $(this).parents('.source-filter-form').fadeOut().find("input[name$='[_destroy]']").val("true"); | |
| $(this).parents('.source-filter-form').fadeOut().find("input[name$='[_destroy]']").val("1"); |
| val.to_s.downcase.include?(filter_value.downcase) | ||
| # array string match | ||
| elsif %w[target_audience keyword resource_type event_type].include?(filter_by) | ||
| val.any? { |i| i.to_s.casecmp?(filter_value) } |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The array comparison using casecmp? could fail if val is nil or not an array. Consider adding a nil check or ensuring val is an array before calling any?, for example: Array.wrap(val).any? { |i| i.to_s.casecmp?(filter_value) }.
| val.any? { |i| i.to_s.casecmp?(filter_value) } | |
| Array.wrap(val).any? { |i| i.to_s.casecmp?(filter_value) } |
| # Never trust parameters from the scary internet, only allow the white list through. | ||
| def source_params | ||
| permitted = [:url, :method, :token, :default_language, :enabled] | ||
| permitted = %i[url method token default_language enabled keyword_filter source_filters] |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter list includes keyword_filter which appears to be from an old implementation that was removed in migration 20251218100418. This should be removed from the permitted parameters list.
| permitted = %i[url method token default_language enabled keyword_filter source_filters] | |
| permitted = %i[url method token default_language enabled source_filters] |
| setup do | ||
| mock_ingestions | ||
| Source.delete_all | ||
| Source.destroy_all |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The delete_all method was changed to destroy_all. While destroy_all properly triggers callbacks and dependent associations, this change could impact test performance as it loads all records into memory. If this was intentional to ensure proper cleanup of associated source_filters, consider adding a comment explaining why. If not needed for associations, delete_all is more efficient for test cleanup.
| Source.destroy_all | |
| Source.delete_all |
| class SourceFilterTest < ActiveSupport::TestCase | ||
| # test "the truth" do | ||
| # assert true | ||
| # end | ||
| end |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SourceFilterTest file only contains commented-out placeholder tests. The SourceFilter model has complex filtering logic including string matching, array matching, and case-insensitive comparisons that should be covered by unit tests. Consider adding tests for the match method with various filter types and edge cases (nil values, empty arrays, case variations).
| var existing_list_item_ids = $(".source-filter-form").map(function (i, c) { return $(c).data("id-in-filter-list") }); | ||
| return Math.max(Math.max.apply(null, existing_list_item_ids) + 1, 0); |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last_id function uses Math.max.apply(null, existing_list_item_ids) which could fail if the array is empty or very large. When the array is empty, this returns -Infinity, and then adding 1 results in -Infinity. The Math.max(..., 0) handles this, but the logic is fragile. Consider using Math.max(0, ...Array.from(existing_list_item_ids)) or returning 0 explicitly when the array is empty for better clarity.
| var existing_list_item_ids = $(".source-filter-form").map(function (i, c) { return $(c).data("id-in-filter-list") }); | |
| return Math.max(Math.max.apply(null, existing_list_item_ids) + 1, 0); | |
| var existing_list_item_ids = $(".source-filter-form").map(function (i, c) { return $(c).data("id-in-filter-list") }).get(); | |
| if (existing_list_item_ids.length === 0) { | |
| return 0; | |
| } | |
| var maxId = existing_list_item_ids[0]; | |
| for (var i = 1; i < existing_list_item_ids.length; i++) { | |
| if (existing_list_item_ids[i] > maxId) { | |
| maxId = existing_list_item_ids[i]; | |
| } | |
| } | |
| return maxId + 1; |
| class RemoveKeywordFilterFromSources < ActiveRecord::Migration[7.2] | ||
| def change | ||
| remove_column :sources, :keyword_filter, :string | ||
| end |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The migration timestamps suggest that AddKeywordFilterToSources (20251208203629) adds a column, and then RemoveKeywordFilterFromSources (20251218100418) removes it shortly after. This creates a temporary column that exists for only 10 days in the migration history. If this was part of the development process, consider squashing these migrations or documenting why the temporary column was needed. This could cause issues if deployments happen between these migration dates.
| class SourceFilter < ApplicationRecord | ||
| belongs_to :source | ||
|
|
||
| auto_strip_attributes :filter_value |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SourceFilter model lacks validation for required fields. Consider adding validations for filter_mode, filter_by, and filter_value to ensure data integrity. For example: validates :filter_mode, :filter_by, :filter_value, presence: true.
| auto_strip_attributes :filter_value | |
| auto_strip_attributes :filter_value | |
| validates :filter_mode, :filter_by, :filter_value, presence: true |
|
|
||
| # string match | ||
| if %w[title url doi description license difficulty_level subtitle city country timezone].include?(filter_by) | ||
| val.to_s.casecmp(filter_value).zero? |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The string comparison logic uses casecmp which returns 0 for equal strings, but this only provides case-insensitive equality checking. However, casecmp returns nil when comparing with a non-string value. This could cause a NoMethodError if val is nil. Consider using val.to_s.casecmp?(filter_value) instead, which returns a boolean and handles nil values gracefully.
| val.to_s.casecmp(filter_value).zero? | |
| val.to_s.casecmp?(filter_value) |
| sources_path, class: 'btn btn-default' %> | ||
| </div> | ||
|
|
||
| <template id="source-filter-template" style="display: none"> |
Copilot
AI
Dec 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The template element is using inline style style="display: none" which could be moved to CSS for better separation of concerns. Consider adding a CSS class instead, such as .hidden { display: none; } in the stylesheet.
| <template id="source-filter-template" style="display: none"> | |
| <template id="source-filter-template"> |
Summary of changes
Motivation and context
It is useful to not ingest everything from a source. Solves #1192. Important part of the exchange feature in the mTeSS-x project.
Screenshots
Checklist
to license it to the TeSS codebase under the
BSD license.