-
Notifications
You must be signed in to change notification settings - Fork 23
#330 - Enable deep copy #331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds deep copy functionality to the CAS (Common Analysis Structure) object, allowing users to create independent copies of CAS instances with or without duplicating the associated typesystem.
Key changes:
- Added a
deep_copy()method to theCasclass with optional typesystem copying - Implemented comprehensive test coverage for both copying modes (with and without typesystem)
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| cassis/cas.py | Implements the deep_copy() method to create deep copies of CAS objects, handling sofas, views, feature structures, and their references |
| tests/test_cas.py | Adds two test cases verifying deep copy functionality with and without typesystem copying |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
cassis/cas.py
Outdated
| for item in fs[feature.name]: | ||
| if hasattr(item, 'xmiID') and item.xmiID is not None: | ||
| referenced_list.append(item.xmiID) | ||
| if len(referenced_list) > 0: |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing setdefault call for referenced_lists dictionary. This will raise a KeyError when trying to append to a non-existent key. Should be referenced_lists.setdefault(feature.name, []) before the append, similar to how it's done for references on line 896 and referenced_arrays on line 910.
| if len(referenced_list) > 0: | |
| if len(referenced_list) > 0: | |
| referenced_lists.setdefault(feature.name, []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot is correct. However, lines 877, 903 and 904 are actually dead code. These lines should be removed instead.
cassis/cas.py
Outdated
| try: | ||
| all_copied_fs[current_ID][feature] = all_copied_fs[reference_ID] | ||
| except KeyError as e: | ||
| print("Reference feature", current_ID, "not found.", feature, e) |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using print for error reporting is inappropriate. This should use proper logging (e.g., warnings.warn() or logger.error()) or re-raise the exception. Silent failures via print statements can hide critical bugs in production code.
| print("Reference feature", current_ID, "not found.", feature, e) | |
| warnings.warn(f"Reference feature {current_ID} not found for feature '{feature}': {e}") |
cassis/cas.py
Outdated
| ts = cas_copy.typesystem | ||
| array_copy = ts.get_type("FSArray")() |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable shadowing: ts is already defined in the outer scope (line 841) and used throughout the function. Reusing the same variable name here creates confusion. Consider using a more descriptive name like target_ts or removing this line entirely since cas_copy.typesystem is already the same as ts.
| ts = cas_copy.typesystem | |
| array_copy = ts.get_type("FSArray")() | |
| array_copy = cas_copy.typesystem.get_type("FSArray")() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot is correct. However, line 927 is an unnecessary assignment and should be removed instead.
cassis/cas.py
Outdated
| for feature in t.all_features: | ||
| if ts.is_primitive(feature.rangeType) or ts.is_primitive_collection(feature.rangeType): | ||
| fs_copy[feature.name] = fs.get(feature.name) | ||
| elif feature.name not in ["FSArray", "sofa"]: |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The condition feature.name not in [\"FSArray\", \"sofa\"] is unclear. Consider using named constants or adding a comment explaining why these specific features are excluded from reference tracking.
| elif feature.name not in ["FSArray", "sofa"]: | |
| # Exclude FSArray and sofa features from reference tracking as they are handled specially | |
| elif feature.name not in [TYPE_NAME_FS_ARRAY, TYPE_NAME_SOFA]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unsure if introducing these constants is helpful, but the comment may very well be.
cassis/cas.py
Outdated
| Returns a deep copy of the current Cas | ||
| :param copy_typesystem: whether to copy the original typesystem or not |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring is incomplete and doesn't follow the existing style in the codebase. It should use the standard format with Args and Returns sections, similar to other methods like to_json(). Also consider adding a note about what 'deep copy' means in this context (e.g., all feature structures, views, and sofas are copied).
| Returns a deep copy of the current Cas | |
| :param copy_typesystem: whether to copy the original typesystem or not | |
| Create and return a deep copy of this CAS object. | |
| All feature structures, views, and sofas are copied. If `copy_typesystem` is True, the typesystem is also deep-copied; | |
| otherwise, the original typesystem is shared between the original and the copy. | |
| Args: | |
| copy_typesystem (bool): Whether to copy the original typesystem or not. If True, the typesystem is deep-copied. | |
| Returns: | |
| Cas: A deep copy of this CAS object. |
cassis/cas.py
Outdated
| referenced_list = [] | ||
| for item in fs[feature.name].elements: |
Copilot
AI
Nov 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The variable name referenced_list is reused for both list-type features (line 899) and array-type features (line 906). While they're in different scopes, using distinct names like referenced_array_elements for the array case would improve code clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with this argument. However it does not matter, because lines 898-902 are actually dead code (connected to unused variable referenced_lists see comment above). These lines should be removed instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then please remove them in a follow-up commit.
|
@dmnc-grdnr Could you please consider the Copilot review. If you believe and of its review remarks are wrong, please comment them. |
|
Oh, and thank you again for another PR :) |
|
I reviewed the Copilot suggestions agreed with most, but found that a few lines are actually obsolete and should be removed instead of modified. This is my first pull request and I am unsure how to make the changes without breaking things. Could you please point me to what to do next? |
reckart
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much. To continue, it would be great if you could simply address the comments by updating the code and then committing the changes into this branch. Once the changes are in, the review will be repeated. Please do not use the "amend" function of git, just add another commit. Avoid force pushes, then you can't really break anything. Thanks!
cassis/cas.py
Outdated
| if hasattr(fs[feature.name], 'xmiID') and fs[feature.name].xmiID is not None: | ||
| references.setdefault(feature.name, []) | ||
| references[feature.name].append((fs.xmiID, fs[feature.name].xmiID)) | ||
| elif ts.is_list(feature.rangeType): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an "if" and an "elif" case here but no "else". Maybe the "else" can never happen in practice, but if it happens, we may be missing to copy some data. So if the "else" case can never, happen, then it should be implemented and thrown an exception at least so that if it happens, we know something is going very wrong.
cassis/cas.py
Outdated
| referenced_list = [] | ||
| for item in fs[feature.name].elements: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then please remove them in a follow-up commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add randomized tests for the deep copy. We have test_multi_type_random_serialization_deserialization and test_multi_feature_random_serialization_deserialization which create randomized CASes for (de)serialization and compare them afterwards. These can be used as templates for randomized deep_copy tests.
|
I was finally able to iron out all of the issues that appeared after including the random CAS tests. They were extremely helpful! I think, I made all the changes that you and Copilot required. Could you please have another look? Thanks! |
This PR adds the deep copy functionality to the CAS object.
Tests for the function was added to test_cas.py
I have verified the functionality on different CAS files, but due to the complexity of the data structure, there might be something I have overlooked.
Therefore feedback is highly appreciated!