MetaWorkflow Handler and MetaWorkflow Run Handler: Pipeline Automation#11
MetaWorkflow Handler and MetaWorkflow Run Handler: Pipeline Automation#11
Conversation
…culated attributes
Some are specific to the structure of CGAP portal schemas, but tried to generalize as much as possible. Also includes partial draft of dependency validation via topological sort.
…ests for topological sort.
Resulted in edits within test files, including creation of a new test file for topological sort, and changes in imports for magma/metawfl_handler.py.
drio18
left a comment
There was a problem hiding this comment.
Dropping some comments for consideration. Happy to review again whenever you'd like, especially once there are tests available for the dependency sorting/checking. I think I understand the gist of it all and it sounds right, but I didn't go in detail; I'll do so once the tests are in.
Changes included additions of different directed graph global vars for testing -- tests for the topological sort on their way. Also removed function for creating "forward dependencies" -- this version of topological sort works with "backward dependencies".
Addressed some prior comments. Future planned changes include creating a class for the topological sort function to decrease the number of command line arguments that are common among several functions.
Also addressed some prior comments on draft.
Changes included merging some former magma utils fxns into this class, and removing extraneous utils fxns. Also includes new test file for this class.
Also wrote a draft of pytests
includes addition of custom Exception classes
…onstants. Got rid of duplication_flag.
drio18
left a comment
There was a problem hiding this comment.
Leaving a batch of comments, some of which will require further discussion. It looks like there's still a good deal to complete, clean up, and test properly, so we should discuss a plan tomorrow for how best to utilize the remaining time.
| #!/usr/bin/env python3 | ||
|
|
||
| ################################################################# | ||
| # Vars | ||
| ################################################################# |
There was a problem hiding this comment.
No need for the shebang or the headers here and elsewhere. I know Michele does this, but I find it unnecessary clutter.
| TITLE = "title" | ||
|
|
||
| # MetaWorkflow Handler attributes | ||
| PROJECT = "project" | ||
| INSTITUTION = "institution" | ||
| UUID = "uuid" | ||
| META_WORKFLOWS = "meta_workflows" | ||
| ORDERED_META_WORKFLOWS = "ordered_meta_workflows" | ||
| META_WORKFLOW = "meta_workflow" | ||
| NAME = "name" | ||
| DEPENDENCIES = "dependencies" | ||
| ITEMS_FOR_CREATION_PROP_TRACE = "items_for_creation_property_trace" | ||
| ITEMS_FOR_CREATION_UUID = "items_for_creation_uuid" | ||
|
|
||
| # MetaWorkflow Run Handler attributes | ||
| STATUS = "status" | ||
| FINAL_STATUS = "final_status" | ||
| ASSOCIATED_META_WORKFLOW_HANDLER = "meta_workflow_handler" | ||
| ASSOCIATED_ITEM = "associated_item" | ||
| META_WORKFLOW_RUN = "meta_workflow_run" | ||
| META_WORKFLOW_RUNS = "meta_workflow_runs" | ||
| ITEMS_FOR_CREATION = "items_for_creation" | ||
| ERROR = "error" | ||
| # statuses | ||
| PENDING = "pending" | ||
| RUNNING = "running" | ||
| COMPLETED = "completed" | ||
| FAILED = "failed" | ||
| STOPPED = "stopped" |
There was a problem hiding this comment.
Would be good to organize the constants into classes so they can be imported in groups. In general, importing * is not a good practice, and having the constants on classes makes it more clear where they are coming from.
Also, I don't believe all of these constants should be in the magma directory, as some are portal specific, and those should be differentiated. In general, I'd be fine with all this code living in magma_ff for now, but that's something to discuss more with Michele.
| #TODO: the following is here in case dup flag is added in the future | ||
| # MWFR_TO_HANDLER_STEP_STATUS_DICT = { | ||
| # "pending": "pending", | ||
| # "running": "running", | ||
| # "completed": "completed", | ||
| # "failed": "failed", | ||
| # "inactive": "pending", | ||
| # "stopped": "stopped", | ||
| # "quality metric failed": "failed" | ||
| # } |
There was a problem hiding this comment.
I think this is still needed regardless of the duplication flag stuff...
Regardless, it is portal-specific and thus shouldn't live in magma, and there's no need to leave the commented out code around if it's no longer needed.
| ################################################ | ||
| # ValidatedDictionary TODO: eventually make part of dcicutils? | ||
| ################################################ | ||
| class ValidatedDictionary(object): |
There was a problem hiding this comment.
Consider renaming this. It's not a dictionary, but rather a class, so ValidatedAttributes would probably be more apt.
In general, I would stay away from this style of code of arbitrarily placing all key, value pairs in a dictionary on the class itself. I know it's done elsewhere in this project, but it complicates tracing the origin of attributes to someone new to the code as well as placing an unknown number of attributes on the class that are then capable of being accidentally overwritten. As they say for python, "explicit is better than implicit." To my eyes, a superior alternative to this is to set an attribute on the class via __init__ such as self.properties, and then placing the input dictionary there and validating and setting other properties or attributes appropriately.
magma/validated_dictionary.py
Outdated
| if retrieved_attr is None: | ||
| raise AttributeError("attribute %s cannot have value 'None'." % attribute) |
There was a problem hiding this comment.
Why can't an attribute be None? Would a falsy but non-None value be acceptable?
magma_ff/checkstatus.py
Outdated
| for running_mwfr_step_name in self.mwfr_handler_obj.running_steps(): | ||
|
|
||
| # Get run uuid | ||
| run_uuid = self.mwfr_handler_obj.get_step_attr(running_mwfr_step_name, uuid) |
There was a problem hiding this comment.
I don't believe uuid is defined here.
| # and convert property trace(s) to uuid(s) | ||
| else: | ||
| property_traces = getattr(meta_workflow_step, ITEMS_FOR_CREATION_PROP_TRACE, None) | ||
| if not isinstance(property_traces, list): |
There was a problem hiding this comment.
I believe this has to be a list, no?
| item_uuid = make_embed_request( | ||
| self.associated_item_identifier, | ||
| item_prop_trace | ||
| + ".uuid", # TODO: are we assuming the user will include ".uuid" or @id as part of prop trace? | ||
| self.auth_key, | ||
| single_item=True, | ||
| ) | ||
| if not item_uuid: | ||
| raise MetaWorkflowRunHandlerCreationError( | ||
| f"Invalid property trace '{item_prop_trace}' on item with the following ID: {self.associated_item_identifier}" | ||
| ) | ||
| items_for_creation_uuids.append(item_uuid) |
There was a problem hiding this comment.
I don't think this logic will work here, as you may receive multiple UUIDs, not just one, and you have to parse the result to obtain any UUIDs returned.
| @property | ||
| def get_project(self): | ||
| """Retrieves project attribute from the associated item.""" | ||
| return self.retrieved_associated_item.get(PROJECT) | ||
|
|
||
| @property | ||
| def get_institution(self): | ||
| """Retrieves institution attribute from the associated item.""" | ||
| return self.retrieved_associated_item.get(INSTITUTION) |
There was a problem hiding this comment.
As a personal preference, consider using get in a method name for non-attribute/property methods, and using attribute-like names for properties (they live somewhere in-between, but I think of them more like attributes than methods).
No description provided.