Workflow class in pyiron_contrib as basis for ironflow #194
Description
Summary
The functionality of @liamhuber's workflow class in pyiron_contrib could be a good starting point for interesting extensions and for merging pyiron_base and ironflow.
@liamhuber, I have only now had a chance to install your workflow demo class and play with it. It is really great work. Even though it is presently only a demo, I see a lot of potential. In the attached Jupyter notebook I have tried to briefly summarize and sketch some of my ideas. It is only a very preliminary sketch, and it would be good to talk about it. The notebook contains many ideas and suggestions; some are probably straightforward to implement, others may be hard or impossible. It is meant as a collection of ideas rather than a list of tasks.
Node-based pyiron concept
Based on the workflow example implemented in pyiron_contrib by Liam, here are a few suggestions/ideas.
```python
%config IPCompleter.greedy=True
import sys
sys.path.insert(1, '..')
sys.path;
```
```python
import matplotlib.pylab as plt

from pyiron_contrib.workflow.node import Node, node
from pyiron_contrib.workflow.workflow import Workflow
from pyiron_contrib.workflow import nodes
```
Make Workflow the next generation pyiron Project object
```python
wf = Workflow('my_first_workflow')
```
Design concepts
A node with a single output should present this output directly at the object level
- modify __repr__ etc.
- allow a short notation for links
- see the example below
```python
wf.structure = nodes.bulk_structure(repeat=3, cubic=True, element="Al")
```
This node object should behave as similarly as possible to the output object, i.e., its representation should be the same:
```python
wf.structure.outputs.structure.value;
wf.structure;  # has same representation (__str__, __repr__, etc.)
```
It should also be shorter to access the output object:
```python
wf.structure.value.plot3d()
wf.structure.plot3d()  # nice, but may be too much overloading
```
To set a node link, the node reference should be sufficient:
- `wf.calc = nodes.calc_md(job=wf.engine.outputs.job)` can be replaced by
- `wf.calc = nodes.calc_md(job=wf.engine)` (see the sketch below)
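A minimal sketch of how both shorthands (single-output `__repr__` delegation and node-as-connection) could be implemented; all names here are illustrative, not the existing pyiron_contrib API:
```python
class SketchNode:
    """Illustrative node that stands in for its single output channel."""

    def __init__(self, label, outputs):
        self.label = label
        self.outputs = outputs  # dict: channel name -> channel object

    @property
    def single_output(self):
        # Only meaningful when there is exactly one output channel
        channels = list(self.outputs.values())
        if len(channels) != 1:
            raise AttributeError(f"'{self.label}' has {len(channels)} outputs")
        return channels[0]

    def __repr__(self):
        # A node with a single output represents itself as that output's value
        try:
            return repr(self.single_output.value)
        except AttributeError:
            return f"<Node '{self.label}'>"


def resolve_connection(obj):
    """Accept either an output channel or a node when wiring inputs."""
    if isinstance(obj, SketchNode):
        return obj.single_output  # a node with one output stands in for it
    return obj
```
With such a resolver in the input-channel setter, `job=wf.engine` and `job=wf.engine.outputs.job` would become equivalent.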
The original code block can then be simplified:
```python
wf.structure = nodes.bulk_structure(repeat=3, cubic=True, element="Al")
wf.engine = nodes.lammps(structure=wf.structure.outputs.structure)
wf.calc = nodes.calc_md(
    job=wf.engine.outputs.job,
    update_on_instantiation=False,
    # run_automatically=False
)
wf.plot = nodes.scatter(
    x=wf.calc.outputs.steps,
    y=wf.calc.outputs.temperature
)
```
New version (pseudocode)
```python
wf = Workflow('my_first_workflow', domain='pyiron.atomistic')  # set default domain for nodes

structure = wf.create.structure.bulk(repeat=3, cubic=True, element="Al")  # adds node to wf; structure.wf stores wf
engine = wf.create.engines.lammps(structure=structure.outputs.structure)
calc = wf.create.calc_md(
    job=engine,
    update_on_instantiation=False  # should be set as the default in the node definition for longer jobs
)
wf.create.plot.scatter(
    x=wf.calc.outputs.steps,  # more than one output, so an explicit statement is required
    y=wf.calc.outputs.temperature
)
```
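One way such a `wf.create` namespace could be realized is a small factory object that resolves dotted paths in a node registry and registers each new instance on the workflow. This is only a sketch; `NodeNamespace`, the registry layout, and `Workflow.add` are assumptions:
```python
class NodeNamespace:
    """Sketch: dotted access to node factories, e.g. wf.create.structure.bulk(...)."""

    def __init__(self, workflow, registry):
        self._workflow = workflow
        self._registry = registry  # nested dict: name -> node class or sub-dict

    def __getattr__(self, name):
        entry = self._registry[name]
        if isinstance(entry, dict):
            # Descend into a sub-namespace, e.g. create.structure
            return NodeNamespace(self._workflow, entry)

        def factory(**kwargs):
            node = entry(**kwargs)    # instantiate the node class
            node.wf = self._workflow  # the node remembers its workflow
            self._workflow.add(node)  # assumed registration hook
            return node

        return factory
```
The `domain` argument of `Workflow` would then simply select which registry the namespace is rooted in.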
Execute workflow
```python
from pyiron_contrib.workflow import Executor

exe = Executor(server='my_server', cores=24, queue='my_queue')  # task manager
exe.list_servers(), exe.task_table(), ...  # provide utility functions
```
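A sketch of what such an Executor could look like, with a local process pool as fallback; the class body, its fields, and the remote dispatch are assumptions, not a proposed pyiron_contrib API:
```python
from concurrent.futures import ProcessPoolExecutor


class Executor:
    """Sketch: a resource handle that workflows are submitted to."""

    def __init__(self, server='localhost', cores=1, queue=None):
        self.server = server
        self.cores = cores
        self.queue = queue
        self._pool = None

    def submit(self, fn, *args, **kwargs):
        # Local fallback: run in a process pool sized to the requested cores;
        # a real implementation would serialize the workflow and dispatch it
        # to the remote server/queue instead
        if self._pool is None:
            self._pool = ProcessPoolExecutor(max_workers=self.cores)
        return self._pool.submit(fn, *args, **kwargs)
```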
Utility functions
```python
wf.run()                 # corresponds to update in the original version (modal)
wf.submit(executor=exe)  # run in background, submit to queue, etc.; requires (de-)serialization
wf.status                # finished, aborted, unfinished nodes, etc.
wf.task_table()          # filtered for the specific workflow
wf.visualize()           # show the (static) graph network
wf.ironflow()            # show the workflow as a graph in ironflow
wf.save()                # save also local nodes (provide all info to run on any machine)
wf.load()
wf.__repr__()            # show the output of the final node (in the above case the matplotlib graph),
                         # maybe with a 'signature' of the workflow (name, status, etc.)
```
Create a new workflow from the same workflow with modified parameters
```python
wf(structure=wf.create.structure.bulk(repeat=3, cubic=True, element="Cu"), name='wf_Cu')
```
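Calling the workflow like this could return a modified copy; a sketch under the assumption that a `replace` helper exists to swap a node while keeping its connections:
```python
import copy


class Workflow:
    # ... existing workflow machinery from above ...

    def __call__(self, name=None, **node_replacements):
        """Sketch: return a copy of this workflow with selected nodes replaced."""
        new_wf = copy.deepcopy(self)
        if name is not None:
            new_wf.name = name
        for label, node in node_replacements.items():
            new_wf.replace(label, node)  # assumed: swap node, keep connections
        return new_wf
```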
Extend data objects
In nodes such as calc_md the number of output objects/values is rather large. It would therefore be highly attractive to replace them by data objects, which could provide additional functionality (e.g. ontology). The decorator should automatically translate the object to provide all fields. Alternatively, we could use the ontology to connect individual elements (e.g. positions) with such a data object (the ontology knows that positions is part of MD_data).
```python
from dataclasses import dataclass

import numpy as np


@dataclass
class MD_data:
    positions: list | np.ndarray = None
    temperature: int = 0
    # ...
```
```python
out = MD_data()
out.temperature
out.positions = np.linspace(0, 100, 10)
out.positions.shape  # (10,)
```
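The decorator could derive one output channel per dataclass field via `dataclasses.fields`; a sketch reusing the `MD_data` class above (the channel wiring itself is an assumption):
```python
from dataclasses import fields


def output_channels_from(data_type):
    """Sketch: one output channel per dataclass field."""
    return {f.name: f.type for f in fields(data_type)}


# The node decorator could call this to expose e.g. calc.outputs.positions
# and calc.outputs.temperature automatically from a single MD_data object
print(output_channels_from(MD_data))
```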
Extend input and output
File objects
Make sure that files are correctly transferred, copied, etc.
```python
file_out_1 = node1(file_in)
file_out_2 = node2(file_out_1)
```
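A minimal sketch of a file object that could travel between nodes; the `FileObject` name and its copy semantics are illustrative assumptions:
```python
import shutil
from pathlib import Path


class FileObject:
    """Sketch: a file handle passed between nodes instead of a bare path."""

    def __init__(self, path):
        self.path = Path(path)

    def copy_to(self, working_directory):
        # Copy the underlying file into the consuming node's working
        # directory and hand back a new handle pointing at the copy
        target = Path(working_directory) / self.path.name
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(self.path, target)
        return FileObject(target)
```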
Storage
Include (HDF5) storage in the workflow object.
When a node produces output, put the (filtered) data into the workflow data store. This should replace our present HDF5 storage.
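A sketch of such a storage hook using h5py; the group layout, the filter argument, and the function name are assumptions:
```python
import h5py
import numpy as np


def store_node_output(h5_path, node_label, outputs, keys=None):
    """Sketch: write (filtered) node outputs into the workflow's HDF5 store."""
    with h5py.File(h5_path, "a") as f:
        group = f.require_group(node_label)
        for name, value in outputs.items():
            if keys is not None and name not in keys:
                continue  # filter: only store the requested channels
            if name in group:
                del group[name]  # overwrite data from a previous run
            group.create_dataset(name, data=np.asarray(value))
```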