This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Workflow class in pyiron_contrib as basis for ironflow #194

@JNmpi

Summary

The functionality of @liamhuber's workflow class in pyiron_contrib could be a good starting point for interesting extensions and for merging pyiron_base and ironflow.

@liamhuber, I only now had a chance to install your workflow demo class and play with it. It is really great work. Even though it is presently only a demo, I see a lot of potential. In the attached Jupyter Notebook I tried to briefly summarize and sketch some of my ideas. It is only a very preliminary sketch, and it would be good to talk about it. The notebook contains many ideas and suggestions. Some of them are probably straightforward to implement; others may be hard or impossible. It is meant more as a collection of ideas than as a list of tasks.

Node-based pyiron concept

A few suggestions/ideas based on the workflow example implemented in pyiron_contrib by Liam.

%config IPCompleter.greedy=True

import sys
sys.path.insert(1, '..')

sys.path;

import matplotlib.pyplot as plt
from pyiron_contrib.workflow.node import Node, node
from pyiron_contrib.workflow.workflow import Workflow
from pyiron_contrib.workflow import nodes

Make Workflow the next generation pyiron Project object

wf = Workflow('my_first_workflow')

Design concepts

A node with a single output should expose that output directly at the object level

  • modify __repr__ etc.
  • allow a short notation for links
  • see example below

wf.structure = nodes.bulk_structure(repeat=3, cubic=True, element="Al")

This node object should behave as much like its output object as possible, i.e., its representation should be the same:

wf.structure.outputs.structure.value;
wf.structure; # has same representation (__str__, __repr__, etc.)

Accessing the output object should also be shorter:

wf.structure.value.plot3d()
wf.structure.plot3d()  # nice, but may be too much overloading
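One way the delegation above might work is sketched below. `ToyNode` and `Channel` are hypothetical stand-ins written for this illustration, not the classes in pyiron_contrib; the sketch only assumes that a node keeps its outputs in a mapping of channels that each hold a `value`.

```python
class Channel:
    """Hypothetical stand-in for an output channel holding a value."""

    def __init__(self, value):
        self.value = value


class ToyNode:
    """Hypothetical node that exposes its only output at object level."""

    def __init__(self, **outputs):
        self.outputs = {name: Channel(v) for name, v in outputs.items()}

    @property
    def value(self):
        # The shortcut is only unambiguous for single-output nodes.
        if len(self.outputs) != 1:
            raise AttributeError("'value' requires exactly one output")
        (channel,) = self.outputs.values()
        return channel.value

    def __repr__(self):
        # A single-output node looks like its output object.
        if len(self.outputs) == 1:
            return repr(self.value)
        return f"{type(self).__name__}(outputs={list(self.outputs)})"

    def __getattr__(self, name):
        # Only called when normal lookup fails. Private names and the
        # node's own attributes are never delegated, to avoid recursion.
        if name.startswith("_") or name in ("outputs", "value"):
            raise AttributeError(name)
        if len(self.outputs) == 1:
            return getattr(self.value, name)
        raise AttributeError(name)


structure = ToyNode(structure=[1, 2, 3])
print(repr(structure))     # [1, 2, 3]
print(structure.count(2))  # 1, delegated to the underlying list
```

Delegating every unknown attribute is the "too much overloading" risk mentioned above: a typo in a node attribute would surface as an error on the output object instead.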

To link nodes, a reference to the node itself should be sufficient

  • wf.calc = nodes.calc_md(job=wf.engine.outputs.job) can be replaced by
  • wf.calc = nodes.calc_md(job=wf.engine)
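The shorthand in the two bullets could be resolved by a small helper like the one below. This is only a sketch: `as_channel`, `ToyNode`, and `Channel` are hypothetical names for this illustration, and the rule "a node stands for its channel only if it has exactly one output" is an assumption taken from the design concept above.

```python
class Channel:
    """Hypothetical output channel."""

    def __init__(self, name):
        self.name = name


class ToyNode:
    """Hypothetical node with named output channels."""

    def __init__(self, label, output_names):
        self.label = label
        self.outputs = {name: Channel(name) for name in output_names}


def as_channel(source):
    """Return the channel that a link to `source` should attach to."""
    if isinstance(source, Channel):
        return source  # explicit form: wf.engine.outputs.job
    if isinstance(source, ToyNode):
        if len(source.outputs) == 1:
            # short form: wf.engine stands for its only output
            return next(iter(source.outputs.values()))
        raise ValueError(
            f"node '{source.label}' has {len(source.outputs)} outputs; "
            "name the channel explicitly"
        )
    raise TypeError(f"cannot link to {type(source).__name__}")


engine = ToyNode("engine", ["job"])
print(as_channel(engine).name)  # job
```

A node with several outputs, such as the MD calculation with `steps` and `temperature`, still needs the explicit channel, so the helper raises instead of guessing.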

The original code block, which can then be simplified:

wf.structure = nodes.bulk_structure(repeat=3, cubic=True, element="Al")
wf.engine = nodes.lammps(structure=wf.structure.outputs.structure)
wf.calc = nodes.calc_md(job=wf.engine.outputs.job, 
                        update_on_instantiation=False, 
                        # run_automatically=False
                       )
wf.plot = nodes.scatter(
    x=wf.calc.outputs.steps, 
    y=wf.calc.outputs.temperature
)

New version (pseudocode)

wf = Workflow('my_first_workflow', domain='pyiron.atomistic') # set default domain for nodes

structure = wf.create.structure.bulk(repeat=3, cubic=True, element="Al")  # adds node to wf, structure.wf stores wf
engine = wf.create.engines.lammps(structure=structure.outputs.structure)
calc = wf.create.calc_md(
    job=engine,
    update_on_instantiation=False,  # should be set as default in node definition for longer jobs
)
wf.create.plot.scatter(
    x=calc.outputs.steps,      # more than one output, explicit statement required
    y=calc.outputs.temperature,
)

Execute workflow

from pyiron_contrib.workflow import Executor

exe = Executor(server='my_server', cores=24, queue='my_queue')  # task manager
exe.list_servers(), exe.task_table(), ...  # provide utility functions

Utility functions

wf.run()                 # corresponds to update in the original version (modal)
wf.submit(executor=exe)  # run in background, submit to queue, etc.; requires (de)serialization
wf.status                # finished, aborted, unfinished nodes, etc.
wf.task_table()          # filtered for this specific workflow
wf.visualize()           # show (static) graph network
wf.ironflow()            # show workflow as graph in ironflow
wf.save()                # also save local nodes (provide all info to run on any machine)
wf.load()
wf.__repr__()            # show output of final node (in the above case the matplotlib graph),
                         # maybe with 'signature' of the workflow (name, status, etc.)
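To make the intended semantics of `run()` and `status` a bit more concrete, here is a minimal sketch. `ToyWorkflow` is a hypothetical class for this illustration only; it assumes nodes are plain callables and uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to run them in dependency order.

```python
from graphlib import TopologicalSorter


class ToyWorkflow:
    """Hypothetical workflow: runs callables in dependency order."""

    def __init__(self):
        self.functions = {}  # label -> callable
        self.upstream = {}   # label -> labels whose results it consumes
        self.results = {}
        self.status = {}     # label -> 'unfinished' | 'finished' | 'aborted'

    def add(self, label, function, upstream=()):
        self.functions[label] = function
        self.upstream[label] = tuple(upstream)
        self.status[label] = "unfinished"

    def run(self):
        # static_order() yields every node after all of its predecessors.
        for label in TopologicalSorter(self.upstream).static_order():
            inputs = [self.results[u] for u in self.upstream[label]]
            try:
                self.results[label] = self.functions[label](*inputs)
                self.status[label] = "finished"
            except Exception:
                # Stop here; downstream nodes stay 'unfinished'.
                self.status[label] = "aborted"
                break
        return self.status


wf = ToyWorkflow()
wf.add("steps", lambda: [0, 1, 2])
wf.add("doubled", lambda steps: [2 * s for s in steps], upstream=["steps"])
print(wf.run())  # {'steps': 'finished', 'doubled': 'finished'}
```

A `submit(executor=...)` variant would hand the same loop to a background process or queue, which is where the (de)serialization requirement comes in.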

Create a new workflow with the same graph but modified parameters

wf(structure=wf.create.structure.bulk(repeat=3, cubic=True, element="Cu"), name='wf_Cu')
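Calling a workflow to get a modified copy could be implemented with `__call__`, roughly as sketched here. `ToyWorkflow` is hypothetical, and nodes are represented by plain dictionaries purely for illustration; the real node objects would need to support copying.

```python
import copy


class ToyWorkflow:
    """Hypothetical workflow holding named nodes."""

    def __init__(self, name, **nodes):
        self.name = name
        self.nodes = dict(nodes)

    def __call__(self, name, **replacements):
        """Return an independent copy with some nodes replaced."""
        unknown = set(replacements) - set(self.nodes)
        if unknown:
            raise KeyError(f"no such node(s): {sorted(unknown)}")
        new = copy.deepcopy(self)  # the original stays untouched
        new.name = name
        new.nodes.update(replacements)
        return new


wf_al = ToyWorkflow(
    "wf_Al",
    structure={"element": "Al", "repeat": 3, "cubic": True},
    engine="lammps",
)
wf_cu = wf_al(
    structure={"element": "Cu", "repeat": 3, "cubic": True},
    name="wf_Cu",
)
print(wf_cu.name, wf_cu.nodes["structure"]["element"])  # wf_Cu Cu
print(wf_al.nodes["structure"]["element"])              # Al
```

Rejecting unknown node names keeps a misspelled keyword from silently creating an unconnected node in the copy.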

Extend data objects

In nodes such as calc_md, the number of output objects/values is rather large. It would therefore be highly attractive to replace them with data objects, which could provide additional functionality (e.g. ontology). The decorator should automatically translate the object to provide all fields. Alternatively, we could use ontology to connect individual elements (e.g. positions) with such a data object (the ontology knows that positions is part of MD_data).

from dataclasses import dataclass
import numpy as np

@dataclass
class MD_data:
    positions: list | np.ndarray | None = None  # union syntax needs Python 3.10+
    temperature: int = 0
    # ...

out = MD_data()  # create an instance so fields are set per object
out.temperature
out.positions = np.linspace(0, 100, 10)
out.positions.shape
(10,)
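The "translate the object to provide all fields" step could rely on `dataclasses.fields`, as in the sketch below. `output_channels` is a hypothetical helper, and the `MD_data` here is a simplified copy that uses plain lists so the example needs only the standard library.

```python
from dataclasses import dataclass, field, fields


@dataclass
class MD_data:
    """Simplified copy of the data object above (plain lists, no numpy)."""

    positions: list = field(default_factory=list)
    temperature: float = 0.0


def output_channels(data):
    """Map each dataclass field name to its current value.

    A node decorator could use such a mapping to create one output
    channel per field without listing the fields by hand.
    """
    return {f.name: getattr(data, f.name) for f in fields(data)}


out = MD_data(positions=[0.0, 1.0], temperature=300.0)
print(output_channels(out))
# {'positions': [0.0, 1.0], 'temperature': 300.0}
```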

Extend input and output

File objects

Make sure that files are correctly transferred, copied, etc.

file_out_1 = node1(file_in)
file_out_2 = node2(file_out_1)  
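One possible reading of "correctly transferred" is that each node copies its input file into its own working directory before using it. The sketch below illustrates that with the standard library only; `stage_file` and the directory layout are assumptions made for this example.

```python
import shutil
import tempfile
from pathlib import Path


def stage_file(source, working_directory):
    """Copy `source` into `working_directory` and return the new path."""
    source = Path(source)
    if not source.is_file():
        raise FileNotFoundError(source)
    working_directory = Path(working_directory)
    working_directory.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves file metadata such as modification time
    return Path(shutil.copy2(source, working_directory / source.name))


with tempfile.TemporaryDirectory() as tmp:
    file_in = Path(tmp) / "input.txt"
    file_in.write_text("data")
    file_out_1 = stage_file(file_in, Path(tmp) / "node1")
    file_out_2 = stage_file(file_out_1, Path(tmp) / "node2")
    print(file_out_2.read_text())  # data
```

Copying instead of moving leaves the upstream node's output in place, so a node can be rerun without rerunning its predecessors.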

Storage

Include (HDF5) storage in workflow object

When a node produces output, put the (filtered) data in the workflow data store. This should replace our present HDF5 storage.
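A minimal sketch of such a filtered store is given below. `ToyDataStore` is hypothetical and keeps everything in a dictionary; the slash-separated keys are chosen to resemble HDF5 group paths, and swapping the dictionary for an actual HDF5 backend is left open.

```python
class ToyDataStore:
    """Hypothetical in-memory store with HDF5-like 'node/output' keys."""

    def __init__(self, keep=None):
        # keep: output names to store; None means store everything
        self.keep = None if keep is None else set(keep)
        self._data = {}

    def record(self, node_label, outputs):
        """Store the (filtered) outputs of one node."""
        for name, value in outputs.items():
            if self.keep is None or name in self.keep:
                self._data[f"{node_label}/{name}"] = value

    def __getitem__(self, path):
        return self._data[path]

    def keys(self):
        return sorted(self._data)


store = ToyDataStore(keep=["temperature"])
store.record("calc", {"steps": [0, 1], "temperature": [300, 310]})
print(store.keys())               # ['calc/temperature']
print(store["calc/temperature"])  # [300, 310]
```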
