
Fix #510: Implement viral attributes logic #631

Draft

javihern98 wants to merge 15 commits into main from cr-510



javihern98 (Contributor) commented Mar 25, 2026

Summary

Implements the VTL 2.2 "define viral propagation" construct and the viral attribute propagation mechanism per the draft spec.

What changed:

  • New VIRAL_ATTRIBUTE role — a new Role enum value that behaves like ATTRIBUTE except viral attributes are automatically propagated through operators instead of being dropped
  • Operator propagation — all operator categories (binary, unary, aggregation, comparison, conditional, time, validation, membership, set, analytic) now preserve viral attributes in results
  • define viral propagation construct — new ANTLR grammar rules, AST nodes (ViralPropagationDef, EnumeratedVpClause, AggregateVpClause), and parser support for defining propagation rules
  • Propagation registry: ViralPropagationRegistry stores rules and resolves viral attribute values via resolve_pair (binary ops) and resolve_group (aggregations)
  • Prettify support: define viral propagation statements are correctly rendered back to VTL
  • Semantic validation — error codes 1-3-3-1 through 1-3-3-4 for duplicate rules, mixed clause types, and duplicate enumeration combinations

Closes #510

Checklist

  • Code quality checks pass (ruff format, ruff check, mypy)
  • Tests pass (pytest) — 4144 tests, 79 new viral attribute tests
  • Documentation updated (if applicable)

Impact / Risk

  • Breaking change: Components with "role": "ViralAttribute" (legacy format) are now loaded as Role.VIRAL_ATTRIBUTE instead of being silently converted to Role.ATTRIBUTE. This is the correct behavior per the VTL spec. The JSON schema accepts both "Viral Attribute" and "ViralAttribute" for backward compatibility.
  • SDMX compatibility: run_sdmx() does not yet support viral attributes — VTL_ROLE_MAPPING has no viral attribute mapping. A follow-up issue in pysdmx is needed.
  • ANTLR regeneration: Parser and lexer were regenerated with ANTLR 4.9.3. The PROPAGATION token was added to VtlTokens.g4.

Implementation details (excluding AST folder)

New file: src/vtlengine/ViralPropagation/__init__.py

The core propagation engine. Contains:

  • ViralPropagationRule — dataclass storing a single rule definition (name, signature type, target, enumerated clauses, aggregate function, default value)
  • ViralPropagationRegistry — stores rules and resolves viral attribute values:
    • register(rule) — stores a rule keyed by variable or value domain
    • resolve_pair(variable, val_a, val_b) — resolves two values for binary operators (binary clauses checked before unary, then default)
    • resolve_group(variable, values) — resolves N values for aggregation operators (aggregate function or pairwise reduce)
  • Module-level accessor (get_current_registry / set_current_registry) — the Interpreter sets a fresh registry per run() call; operators access it without threading it through signatures
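
The resolution order described above (binary clauses, then unary clauses, then the default; an aggregate function or a pairwise fold for groups) can be illustrated with a minimal, simplified sketch. It is not the PR's code: the field names binary, unary, aggregate and default, the order-insensitive pair lookup, checking the left operand's unary clause first, and returning None when no rule exists are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Any, Callable, Dict, FrozenSet, List, Optional


@dataclass
class ViralPropagationRule:
    """Simplified stand-in: one rule for one target variable."""

    name: str
    target: str
    # Binary clauses (assumed order-insensitive): pair of operand values -> result.
    binary: Dict[FrozenSet[Any], Any] = field(default_factory=dict)
    # Unary clauses: a single operand value -> result.
    unary: Dict[Any, Any] = field(default_factory=dict)
    # Aggregate clause: a function applied to all values of a group.
    aggregate: Optional[Callable[[List[Any]], Any]] = None
    default: Any = None


class ViralPropagationRegistry:
    def __init__(self) -> None:
        self._rules: Dict[str, ViralPropagationRule] = {}

    def register(self, rule: ViralPropagationRule) -> None:
        self._rules[rule.target] = rule

    def resolve_pair(self, variable: str, val_a: Any, val_b: Any) -> Any:
        rule = self._rules.get(variable)
        if rule is None:
            return None  # assumption: no rule yields a null value
        pair = frozenset((val_a, val_b))
        if pair in rule.binary:  # binary clauses are checked first
            return rule.binary[pair]
        for value in (val_a, val_b):  # then unary clauses, left operand first
            if value in rule.unary:
                return rule.unary[value]
        return rule.default  # finally the default value

    def resolve_group(self, variable: str, values: List[Any]) -> Any:
        rule = self._rules.get(variable)
        if rule is None:
            return None
        if rule.aggregate is not None:
            return rule.aggregate(values)
        if not values:
            return rule.default
        # No aggregate clause: fold the group pairwise with resolve_pair.
        return reduce(lambda a, b: self.resolve_pair(variable, a, b), values)


# Module-level accessor: the interpreter would install a fresh registry per run.
_current_registry: Optional[ViralPropagationRegistry] = None


def set_current_registry(registry: Optional[ViralPropagationRegistry]) -> None:
    global _current_registry
    _current_registry = registry


def get_current_registry() -> Optional[ViralPropagationRegistry]:
    return _current_registry


registry = ViralPropagationRegistry()
registry.register(ViralPropagationRule(
    name="conf_rule",  # hypothetical rule: "M" dominates, "C" is kept otherwise
    target="At_1",
    binary={frozenset(("C", "M")): "M"},
    unary={"C": "C"},
    default=" ",
))
registry.register(ViralPropagationRule(name="max_rule", target="At_2", aggregate=max))
set_current_registry(registry)

reg = get_current_registry()
print(reg.resolve_pair("At_1", "M", "C"))  # M (binary clause)
print(reg.resolve_pair("At_1", "C", "X"))  # C (unary clause)
print(reg.resolve_group("At_1", ["C", "M", "C"]))  # M (pairwise fold)
print(reg.resolve_group("At_2", [1, 5, 3]))  # 5 (aggregate clause)
```

Keeping the registry behind a module-level accessor means operator code can call get_current_registry() without every operator signature growing a registry parameter, at the cost of global state that must be reset at the start of each run.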

src/vtlengine/Model/__init__.py

  • Added VIRAL_ATTRIBUTE = "Viral Attribute" to the Role enum
  • Added "Viral Attribute" to Role_keys validation list
  • Added get_viral_attributes() and get_viral_attributes_names() methods to Dataset — return only VIRAL_ATTRIBUTE components (separate from get_attributes() which continues to return only ATTRIBUTE)
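
A simplified sketch of the separation between get_attributes() and get_viral_attributes(). The Component and Dataset classes below are stripped-down stand-ins (the real ones carry data types, data and more); only the role filtering mirrors the description above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Role(Enum):
    IDENTIFIER = "Identifier"
    MEASURE = "Measure"
    ATTRIBUTE = "Attribute"
    VIRAL_ATTRIBUTE = "Viral Attribute"  # new role value


@dataclass
class Component:
    name: str
    role: Role
    nullable: bool = True


@dataclass
class Dataset:
    name: str
    components: Dict[str, Component] = field(default_factory=dict)

    def get_attributes(self) -> List[Component]:
        # Unchanged behaviour: plain attributes only, viral ones excluded.
        return [c for c in self.components.values() if c.role == Role.ATTRIBUTE]

    def get_viral_attributes(self) -> List[Component]:
        return [c for c in self.components.values() if c.role == Role.VIRAL_ATTRIBUTE]

    def get_viral_attributes_names(self) -> List[str]:
        return [c.name for c in self.get_viral_attributes()]


ds = Dataset("DS_1", {
    "Id_1": Component("Id_1", Role.IDENTIFIER, nullable=False),
    "Me_1": Component("Me_1", Role.MEASURE),
    "At_1": Component("At_1", Role.ATTRIBUTE),
    "At_2": Component("At_2", Role.VIRAL_ATTRIBUTE),
})
print([c.name for c in ds.get_attributes()])  # ['At_1']
print(ds.get_viral_attributes_names())  # ['At_2']
```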

src/vtlengine/API/_InternalApi.py

  • Changed the "ViralAttribute" → "Attribute" conversion hack to "ViralAttribute" → "Viral Attribute" (in both the structures and DataStructure loading paths)
  • Added Role.VIRAL_ATTRIBUTE to the nullable default tuple so viral attributes default to nullable=True
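
The two loading changes can be sketched as a small normalization step. load_component, LEGACY_ROLE_ALIASES and NULLABLE_BY_DEFAULT are hypothetical names (the real conversion is inline in _InternalApi.py), and the exact contents of the nullable-default tuple other than Role.VIRAL_ATTRIBUTE are an assumption.

```python
from enum import Enum
from typing import Any, Dict, Tuple


class Role(Enum):
    IDENTIFIER = "Identifier"
    MEASURE = "Measure"
    ATTRIBUTE = "Attribute"
    VIRAL_ATTRIBUTE = "Viral Attribute"


# Legacy spelling accepted by the schema, mapped to the canonical role value.
LEGACY_ROLE_ALIASES = {"ViralAttribute": "Viral Attribute"}
# Roles that default to nullable=True when "nullable" is omitted (assumed contents).
NULLABLE_BY_DEFAULT = (Role.MEASURE, Role.ATTRIBUTE, Role.VIRAL_ATTRIBUTE)


def load_component(raw: Dict[str, Any]) -> Tuple[str, Role, bool]:
    """Hypothetical loader: returns (name, role, nullable) for one component entry."""
    role = Role(LEGACY_ROLE_ALIASES.get(raw["role"], raw["role"]))
    nullable = raw.get("nullable", role in NULLABLE_BY_DEFAULT)
    return raw["name"], role, nullable


# The legacy spelling now yields Role.VIRAL_ATTRIBUTE (not ATTRIBUTE); nullable defaults to True.
print(load_component({"name": "At_1", "role": "ViralAttribute"}))
```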

src/vtlengine/API/data/schema/json_schema_2.1.json

  • Added "ViralAttribute" to the role enum for backward compatibility (alongside existing "Viral Attribute")

src/vtlengine/Operators/__init__.py

  • Binary.dataset_validation() — filter includes Role.VIRAL_ATTRIBUTE; also collects viral attributes from the other operand (not just the base)
  • Binary.dataset_scalar_validation() / Binary.dataset_set_validation() — filter includes Role.VIRAL_ATTRIBUTE
  • Binary._cleanup_attributes_after_merge() — new static method extracting the attribute cleanup logic; drops non-viral attributes, resolves viral attribute merge suffixes (_x/_y) using registry.resolve_pair()
  • Binary.dataset_scalar_evaluation(): cols_to_keep includes viral attribute columns
  • Unary.dataset_validation() — filter includes Role.VIRAL_ATTRIBUTE
  • Unary.dataset_evaluation(): cols_to_keep includes viral attribute columns
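
When both operands carry the same attribute, a pandas merge renames the two copies with its default suffixes "_x" and "_y". The cleanup step described for Binary._cleanup_attributes_after_merge() can be illustrated with a row-based sketch (lists of dicts instead of DataFrames, so it runs without pandas). The function and the dominant_m resolver are hypothetical stand-ins; the resolver takes the place of registry.resolve_pair.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def cleanup_attributes_after_merge(
    rows: List[Row],
    attributes: Iterable[str],
    viral_attributes: Iterable[str],
    resolve_pair: Callable[[str, Any, Any], Any],
) -> List[Row]:
    attributes = list(attributes)
    viral_attributes = list(viral_attributes)
    result = []
    for row in rows:
        out = dict(row)
        # Non-viral attributes are dropped, whether or not the merge suffixed them.
        for name in attributes:
            for col in (name, f"{name}_x", f"{name}_y"):
                out.pop(col, None)
        # A viral attribute present in both operands arrives as <name>_x / <name>_y;
        # collapse the pair into one column using the propagation rule.
        for name in viral_attributes:
            left, right = f"{name}_x", f"{name}_y"
            if left in out and right in out:
                out[name] = resolve_pair(name, out.pop(left), out.pop(right))
        result.append(out)
    return result


def dominant_m(variable: str, a: Any, b: Any) -> Any:
    # Hypothetical resolver: "M" wins, otherwise keep the left value.
    return "M" if "M" in (a, b) else a


merged = [
    {"Id_1": 1, "Me_1": 15, "At_1_x": "a", "At_1_y": "b", "VA_1_x": "C", "VA_1_y": "M"},
    {"Id_1": 2, "Me_1": 9, "At_1_x": "a", "At_1_y": "a", "VA_1_x": "C", "VA_1_y": "C"},
]
print(cleanup_attributes_after_merge(merged, ["At_1"], ["VA_1"], dominant_m))
# [{'Id_1': 1, 'Me_1': 15, 'VA_1': 'M'}, {'Id_1': 2, 'Me_1': 9, 'VA_1': 'C'}]
```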

src/vtlengine/Operators/Aggregation.py

  • Preserves a copy of viral attribute columns before DuckDB aggregation
  • After the main aggregation, applies registry.resolve_group() to each viral attribute's grouped values
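
The order of operations above (keep the viral values, aggregate the measures, then reduce each viral attribute per group) is shown here as a pure-Python sketch over lists of dicts. It does not reproduce the DuckDB-based implementation; aggregate_with_viral is a hypothetical name, and the lambda stands in for registry.resolve_group with a "max" aggregate rule.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def aggregate_with_viral(
    rows: List[Row],
    group_by: Sequence[str],
    measure: str,
    agg: Callable[[List[Any]], Any],
    viral_attributes: Sequence[str],
    resolve_group: Callable[[str, List[Any]], Any],
) -> List[Row]:
    # Step 1: bucket full rows by group key, so viral attribute values are kept.
    groups: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_by)].append(row)
    result = []
    for key, members in groups.items():
        out = dict(zip(group_by, key))
        # Step 2: the ordinary aggregation of the measure.
        out[measure] = agg([m[measure] for m in members])
        # Step 3: each viral attribute is reduced to one value per group.
        for name in viral_attributes:
            out[name] = resolve_group(name, [m[name] for m in members])
        result.append(out)
    return result


rows = [
    {"Id_1": 1, "Id_2": "A", "Me_1": 10, "At_1": "C"},
    {"Id_1": 1, "Id_2": "B", "Me_1": 5, "At_1": "M"},
    {"Id_1": 2, "Id_2": "A", "Me_1": 7, "At_1": "C"},
]
print(aggregate_with_viral(rows, ["Id_1"], "Me_1", sum, ["At_1"], lambda var, vals: max(vals)))
# [{'Id_1': 1, 'Me_1': 15, 'At_1': 'M'}, {'Id_1': 2, 'Me_1': 7, 'At_1': 'C'}]
```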

src/vtlengine/Operators/Comparison.py

  • Between.validate() — filter includes Role.VIRAL_ATTRIBUTE

src/vtlengine/Operators/General.py

  • Membership.validate() — filter includes Role.VIRAL_ATTRIBUTE

src/vtlengine/Operators/Set.py

  • Intersection.evaluate() and Symdiff.evaluate(): the not_identifiers list includes get_viral_attributes_names()

src/vtlengine/Operators/Time.py

  • Time_Aggregation.dataset_validation() — filter includes Role.VIRAL_ATTRIBUTE

src/vtlengine/Operators/Validation.py

  • Check.validate() — filter includes Role.VIRAL_ATTRIBUTE
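
Most of the operator changes above share one pattern: the role filter that decides which components survive into the result now admits Role.VIRAL_ATTRIBUTE while plain attributes are still dropped. A minimal sketch of that filter; KEPT_ROLES and result_components are hypothetical names, and the per-operator filters in the PR may keep other roles as well.

```python
from enum import Enum
from typing import Dict, List


class Role(Enum):
    IDENTIFIER = "Identifier"
    MEASURE = "Measure"
    ATTRIBUTE = "Attribute"
    VIRAL_ATTRIBUTE = "Viral Attribute"


# Roles that survive into the result structure; plain attributes are not listed.
KEPT_ROLES = frozenset({Role.IDENTIFIER, Role.MEASURE, Role.VIRAL_ATTRIBUTE})


def result_components(components: Dict[str, Role]) -> List[str]:
    return [name for name, role in components.items() if role in KEPT_ROLES]


print(result_components({
    "Id_1": Role.IDENTIFIER,
    "Me_1": Role.MEASURE,
    "At_1": Role.ATTRIBUTE,
    "At_2": Role.VIRAL_ATTRIBUTE,
}))  # ['Id_1', 'Me_1', 'At_2']
```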

src/vtlengine/Operators/RoleSetter.py

  • Added class ViralAttribute(RoleSetter): role = Role.VIRAL_ATTRIBUTE — enables calc viral attribute in VTL scripts

src/vtlengine/Utils/__init__.py

  • Added VIRAL_ATTRIBUTE import and VIRAL_ATTRIBUTE: ViralAttribute to ROLE_SETTER_MAPPING
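
A sketch of how a role keyword in a calc clause could dispatch to a RoleSetter subclass through the mapping. The class bodies and the apply classmethod are hypothetical simplifications; only the idea of a ViralAttribute setter carrying role = Role.VIRAL_ATTRIBUTE and being keyed by the lowercased role name ("viral attribute") comes from the description.

```python
from enum import Enum
from typing import Dict, Type


class Role(Enum):
    IDENTIFIER = "Identifier"
    MEASURE = "Measure"
    ATTRIBUTE = "Attribute"
    VIRAL_ATTRIBUTE = "Viral Attribute"


class RoleSetter:
    role: Role  # set by each subclass

    @classmethod
    def apply(cls, component: Dict[str, object]) -> Dict[str, object]:
        # Hypothetical: return a copy of the component carrying this setter's role.
        return {**component, "role": cls.role}


class Measure(RoleSetter):
    role = Role.MEASURE


class Attribute(RoleSetter):
    role = Role.ATTRIBUTE


class ViralAttribute(RoleSetter):
    role = Role.VIRAL_ATTRIBUTE


# Keys mirror the role keyword written in a calc clause: the lowercased Role.value.
ROLE_SETTER_MAPPING: Dict[str, Type[RoleSetter]] = {
    cls.role.value.lower(): cls for cls in (Measure, Attribute, ViralAttribute)
}

setter = ROLE_SETTER_MAPPING["viral attribute"]
print(setter.__name__)  # ViralAttribute
print(setter.apply({"name": "At_1"})["role"] is Role.VIRAL_ATTRIBUTE)  # True
```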

src/vtlengine/Interpreter/__init__.py

  • visit_Start() — initializes a fresh ViralPropagationRegistry per run
  • visit_ViralPropagationDef() — new method that validates (no mixed clauses, no duplicate enumerations, no duplicate rules) and registers the propagation rule in the registry
  • Updated visit_Start type check to accept ViralPropagationDef as a valid top-level statement
  • Updated HAVING clause to preserve Role.VIRAL_ATTRIBUTE components

src/vtlengine/Exceptions/messages.py

  • Added 4 new SemanticError codes (1-3-3-1 through 1-3-3-4) for: duplicate variable rule, duplicate value domain rule, mixed clause types, duplicate enumeration combination
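
The semantic checks performed in visit_ViralPropagationDef() can be sketched as follows, using the same order as described for the interpreter (mixed clauses, duplicate enumerations, duplicate rules) and the four error codes. The data classes are simplified stand-ins for the AST nodes, and treating ("C", "M") and ("M", "C") as the same enumeration combination is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple, Union


class SemanticError(Exception):
    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


@dataclass
class EnumeratedClause:
    values: Tuple[str, ...]  # one value: unary clause; two values: binary clause
    result: str


@dataclass
class AggregateClause:
    function: str  # e.g. "max"


@dataclass
class PropagationDef:
    name: str
    kind: str  # "variable" or "valuedomain"
    target: str
    clauses: List[Union[EnumeratedClause, AggregateClause]]


def validate_definition(defn: PropagationDef, registered: Set[Tuple[str, str]]) -> None:
    if len({type(c) for c in defn.clauses}) > 1:
        raise SemanticError("1-3-3-3", f"{defn.name}: enumerated and aggregate clauses are mixed")
    seen: Set[Tuple[str, ...]] = set()
    for clause in defn.clauses:
        if isinstance(clause, EnumeratedClause):
            combo = tuple(sorted(clause.values))  # assumption: order-insensitive
            if combo in seen:
                raise SemanticError("1-3-3-4", f"{defn.name}: duplicate combination {combo}")
            seen.add(combo)
    key = (defn.kind, defn.target)
    if key in registered:
        code = "1-3-3-1" if defn.kind == "variable" else "1-3-3-2"
        raise SemanticError(code, f"{defn.name}: {defn.kind} {defn.target} already has a rule")
    registered.add(key)


def error_code(defn: PropagationDef, registered: Set[Tuple[str, str]]) -> Optional[str]:
    """Return the SemanticError code raised for defn, or None if it is accepted."""
    try:
        validate_definition(defn, registered)
    except SemanticError as err:
        return err.code
    return None


registered: Set[Tuple[str, str]] = set()
rule = PropagationDef("r1", "variable", "At_1",
                      [EnumeratedClause(("C", "M"), "M"), EnumeratedClause(("C",), "C")])
print(error_code(rule, registered))  # None (accepted and recorded)
print(error_code(rule, registered))  # 1-3-3-1 (second rule for At_1)
```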

Notes

Known limitations (v1):

  • run_sdmx() pathway not supported yet (needs pysdmx viral attribute role support)
  • Value domain-level propagation rules require enhancing Component with a value_domain field (deferred)
  • The aggregate keyword in the VTL 2.2 spec maps to aggr in the grammar (matching the existing VTL token)

Test structure (79 tests):

  • test_viral_role.py — data model unit tests (6 tests)
  • test_viral_operators.py — operator propagation with 1/2/3 viral attributes across unary, binary, scalar, and other operators (58 tests)
  • test_viral_propagation.py — define viral propagation: parsing, end-to-end with enumerated + aggregate rules, multi-attribute propagation, and semantic validation (15 tests)
  • AST string tests in tests/AST/test_AST_String.py — prettify round-trip via viral_propagation.vtl data files (5 tests)

- Add VIRAL_ATTRIBUTE = "Viral Attribute" to Role enum and Role_keys
- Add get_viral_attributes() and get_viral_attributes_names() to Dataset
- Fix _InternalApi.py: convert "ViralAttribute" → "Viral Attribute" (both loading paths)
- Fix nullable default to include VIRAL_ATTRIBUTE
- Add "ViralAttribute" to JSON schema for backward compatibility
- Update existing test to expect VIRAL_ATTRIBUTE instead of ATTRIBUTE
- Replace NotImplementedError in visitViralAttribute with return Role.VIRAL_ATTRIBUTE
- Add ViralAttribute class to RoleSetter.py
- Add VIRAL_ATTRIBUTE to ROLE_SETTER_MAPPING in Utils
- Fix VIRAL_ATTRIBUTE token to match "viral attribute" (the lowercased Role.value)
- Update Binary.dataset_validation() to keep VIRAL_ATTRIBUTE components
- Collect viral attributes from both operands (not just base_operand)
- Extract _cleanup_attributes_after_merge() for viral attr suffix handling
- Update dataset_scalar_validation/evaluation to keep viral attr columns
- Update Unary.dataset_validation() to keep VIRAL_ATTRIBUTE components
- Update Unary.dataset_evaluation() to include viral attr in cols_to_keep
- Add parametrized tests for numeric, string, boolean, and comparison ops
- Update Between, Time_Aggregation, Check, Membership to keep VIRAL_ATTRIBUTE
- Update Set operators (Intersection, Symdiff) to include viral attr columns
- Update Interpreter HAVING clause to preserve VIRAL_ATTRIBUTE
- Verified that Aggregation, Nvl, and Check_Hierarchy were already correct (no changes needed)
- Add PROPAGATION token to VtlTokens.g4
- Add defViralPropagation, vpSignature, vpBody, vpClause grammar rules
- Regenerate parser/lexer with ANTLR 4.9.3
- Define EnumeratedVpClause, AggregateVpClause, ViralPropagationDef AST nodes
- Implement visitor methods in ASTConstructor
- Update DAG to accept ViralPropagationDef as top-level statement
- Add visit_ViralPropagationDef stub in Interpreter
- Create ViralPropagation package with ViralPropagationRule and ViralPropagationRegistry
- Registry supports resolve_pair (binary) and resolve_group (aggregation)
- Wire registry into Interpreter: initialize per run, register rules from AST nodes
- Wire into Binary._cleanup_attributes_after_merge: use resolve_pair for dual-viral attrs
- Wire into Aggregation.evaluate: propagate viral attrs using resolve_group after grouping
- Move imports to top of files per project convention
- Add error codes 1-3-3-1 through 1-3-3-4 to messages.py
- Validate: no duplicate rules for the same variable or value domain
- Validate: no duplicate enumeration combinations
- Validate: no mixing enumerated and aggregate clauses
Merged 9 test files into 4 well-organized files with shared fixtures:
- test_viral_role.py: data model unit tests (6 tests)
- test_viral_operators.py: operator propagation — binary, unary, other ops (24 tests)
- test_viral_propagation.py: define viral propagation — parsing, e2e, validation (10 tests)
- test_viral_prettify.py: prettify support (4 tests)

Extracted shared data structure builders (_ds, _id, _me, _va, _at, _run)
to eliminate boilerplate duplication across tests.
- Add viral_propagation.vtl input files for ast_string and prettier tests
- Add reference_viral_propagation.vtl expected output
- Register in params and params_prettier lists in test_AST_String.py
- Remove standalone test_viral_prettify.py
- Layered dataset approach: one base, progressively add 1/2/3 viral attrs
- Parametrize operators × num_viral_attrs (unary, binary, scalar, other)
- Shared propagation rules defined once, reused across parametrized ops
- Multi-attribute test: enumerated (At_1) + aggregate max (At_2) in one script
- 79 tests covering all operator categories with 1, 2, and 3 viral attrs