
Workflow Specification 2.0

Use this page to learn about the new Domain Specific Language (DSL 2.0) of ST4SD and how it works.

DSL 2.0 is the new (and beta) way to define the computational graphs of ST4SD workflows.

Namespace

In DSL 2.0, a Computational Graph consists of Components which can be grouped under Workflow containers. It also has an Entrypoint which points to the root node of the graph, which is an instance of a Component or Workflow template.

A Namespace is simply a container for the Component, Workflow, and Entrypoint definitions which represent the Computational Graph of one ST4SD workflow.

Below is an example of a Namespace containing a single component that prints the message Hello world to the terminal.

entrypoint:
  entry-instance: print
  execute:
    - target: "<entry-instance>"
      args:
        message: Hello world
components:
  - signature:
      name: print

Entrypoint

The optional Entrypoint serves a single purpose: it describes how to execute the root Template instance of the Computational Graph.

Its schema is:

# This executes an instance of $template which is called "<entry-instance>"
entry-instance: $template # name of a Component or Workflow template
execute: # an array with exactly 1 entry
  - target: <entry-instance> # which instance of a Template to execute.
                             # In this scope there is only <entry-instance>
    args:
      $paramName: $value # one for each parameter of the template that
                         # the "target" points to

The entry-instance field receives the name of a Template and creates an instance of it called <entry-instance>. The execute field then describes how to “execute” the <entry-instance> i.e. how to populate the arguments of the associated Template.

In execute[].args you:

  • must provide values for any parameters in the child $template which do not have default values
  • may override the value of the parameters in $template which have default values
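The two rules above can be sketched in a few lines of Python. This is only an illustration of the rules, not the ST4SD implementation: the function name resolve_args is hypothetical, and rejecting arguments that the signature does not declare is an assumption. The special input.$filename parameters described next are outside the scope of this sketch.

```python
def resolve_args(parameters, args):
    """Combine the defaults in a signature with the values given in args.

    parameters: list of {"name": ..., "default": ...} entries (default optional)
    args:       dictionary taken from execute[].args
    """
    known = {p["name"] for p in parameters}
    unknown = sorted(set(args) - known)
    if unknown:
        # Assumption: arguments that the signature does not declare are errors
        raise ValueError(f"unknown parameters: {unknown}")

    resolved = {}
    for p in parameters:
        name = p["name"]
        if name in args:
            # Explicit value, which may override a default
            resolved[name] = args[name]
        elif "default" in p:
            resolved[name] = p["default"]
        else:
            # No default and no value: the caller must provide one
            raise ValueError(f"missing value for parameter {name!r}")
    return resolved


signature = [{"name": "message"}, {"name": "repeat", "default": 1}]
print(resolve_args(signature, {"message": "Hello world"}))
# prints: {'message': 'Hello world', 'repeat': 1}
```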

The Template instance that the entrypoint points to can have special parameters which are data references to paths that are external to the workflow. These parameters must be called input.$filename and they must not have default values in the signature of the Template definition. The entrypoint must not explicitly override the values of these parameters; the runtime system auto-generates them.

Consider a scenario where the Template that the <entry-instance> step points to has a parameter called input.my-input.db. The runtime will post-process the entrypoint.execute[0].args dictionary to include the following key-value pair:

input.my-input.db: "input/my-input.db"
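As a rough illustration of this post-processing step, the sketch below fills in the input.$filename parameters of a signature. It is a minimal model of the behaviour described above, not the runtime's code: the function name add_input_references is hypothetical, and raising an error when a rule is violated is an assumption.

```python
def add_input_references(parameters, args):
    """Return a copy of args that also maps input.$filename to input/$filename."""
    populated = dict(args)
    for p in parameters:
        name = p["name"]
        if not name.startswith("input."):
            continue
        if "default" in p:
            raise ValueError(f"{name!r} must not have a default value")
        if name in args:
            raise ValueError(f"{name!r} must not be set by the entrypoint")
        # Everything after the "input." prefix is treated as the file name
        filename = name[len("input."):]
        populated[name] = f"input/{filename}"
    return populated


signature = [{"name": "input.my-input.db"}, {"name": "message", "default": "hi"}]
print(add_input_references(signature, {}))
# prints: {'input.my-input.db': 'input/my-input.db'}
```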

In Assigning values to parameters we describe in more detail how to assign values to parameters of Template instances in general.

Workflow

A Workflow is a Template that describes how to execute a number of Template instances called steps. It has a signature that consists of a unique name and a parameter list. Each such step can consume the outputs of a sibling step, or the parameters of the parent Workflow.

The outputs of a workflow are its steps. The schema of Workflow is:

signature:
  name: $Template # the name of this Workflow Template - must be unique
  parameters:
    - name: $paramName
      # optional default value
      default: $value # str, number, or dictionary of {str: str/number}
steps: # which steps to instantiate
  $stepName: $Template # for example child: simulation-code
execute: # how to execute the steps - one for each entry of steps
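The schema asks for one execute entry per step. The Python sketch below checks that relationship on a Workflow that has already been loaded into a dictionary. It is an illustration only: the function name check_workflow is hypothetical, and it assumes that each execute entry names its step as <$stepName>, in the same way that the entrypoint uses <entry-instance>.

```python
def check_workflow(workflow):
    """Report mismatches between the steps and execute fields of a Workflow."""
    steps = workflow.get("steps", {})
    targets = [entry["target"] for entry in workflow.get("execute", [])]
    # Assumption: a step called "child" is targeted as "<child>"
    expected = {f"<{name}>" for name in steps}
    return {
        "missing": sorted(expected - set(targets)),
        "unknown": sorted(set(targets) - expected),
        "duplicates": sorted({t for t in targets if targets.count(t) > 1}),
    }


workflow = {
    "steps": {"child": "simulation-code"},
    "execute": [{"target": "<child>", "args": {}}],
}
print(check_workflow(workflow))
# prints: {'missing': [], 'unknown': [], 'duplicates': []}
```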

In Assigning values to parameters we describe how to assign values to parameters of Template instances.

Component

A Component describes how to execute a task. Just like a Workflow Template, it has a signature that consists of a name and a parameter list.

The outputs of a Component are the paths under its working directory.

The schema of a Component is:

signature:
  name: $Template # the name of this Component Template - must be unique
  parameters:
    - name: $paramName
      # optional default value
      default: $value # str, number, or dictionary of {str: str/number}
# All the FlowIR fields, except for stage, name, references, and override
command:
  executable: str

The above fields are the same as those in the Component section of the Workflow Specification in FlowIR.

For more information, read our documentation on the basic FlowIR component fields.

Assigning values to parameters

Both Component and Workflow templates are instantiated in the same way: by declaring them as a step and adding an entry to an execute block which assigns values to the Template’s parameters.

The value of a parameter can be a number, string, or a key: value dictionary. The body of a Template can reference its parameters like so %(parameterName)s.
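To make the %(parameterName)s notation concrete, here is a small Python sketch that replaces such references with parameter values. It only models the substitution for string and number values; the function name render is hypothetical, and how ST4SD itself resolves references (including dictionary values) may differ.

```python
import re

# Matches %(name)s and captures the parameter name between the parentheses
_REFERENCE = re.compile(r"%\(([^)]+)\)s")


def render(text, values):
    """Replace every %(name)s in text with the matching entry of values."""
    def replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for parameter {name!r}")
        return str(values[name])

    return _REFERENCE.sub(replace, text)


print(render("%(message)s", {"message": "Hello world"}))
# prints: Hello world
```

A regular expression is used here instead of Python's own % operator so that a literal % elsewhere in the text is left untouched.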

When assigning values to the parameters of a Template via the execute[].args dictionary, you:

  • must provide values for any parameters in the child $template which do not have default values
  • may override the value of the parameters in $template which have default values
  • may use OutputReferences to indicate dependencies on steps (definition follows this bullet list)
  • may use %(parentParameter)s to indicate a dependency on the value of a parameter of the parent. That value can, in turn, be a dependency on the output of a Template instance, an input file, or just a literal constant
  • may use a $key: $value dictionary to propagate a dictionary-type value. At the moment, Templates can only reference this kind of parameter to set the value of the command.environment field of Components
  • may use %(input.$filename)s to propagate an input file reference from a parent to a step.
    • Eventually, a step must apply a DataReference :$method to the parameter to indicate that it wishes to consume the input file

Wanna find out more? Check out our example.

OutputReference

The format of an OutputReference is:

<$stepId>/$optionalPath:$optionalMethod

$stepId is a / separated array of stepNames starting from the scope of the current workflow. For example, the OutputReference <one/child>/file.txt:ref resolves to the absolute path of the file file.txt that the component child produces under the sibling step one which is an instance of a Workflow template. You can find more reference methods in our DataReferences docs.
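The format above can be split into its three parts with a short Python sketch. This is a hedged illustration rather than the parser ST4SD uses: the names OutputReference and parse_output_reference are hypothetical, the sketch only handles a reference that makes up the whole string, and it assumes that method names consist of letters only.

```python
import re
from typing import NamedTuple, Optional, Tuple


class OutputReference(NamedTuple):
    step_names: Tuple[str, ...]  # e.g. ("one", "child")
    path: Optional[str]          # e.g. "file.txt", or None when omitted
    method: Optional[str]        # e.g. "ref", or None when omitted


_PATTERN = re.compile(
    r"^<(?P<step>[^<>]+)>"         # <$stepId>
    r"(?:/(?P<path>[^:]+))?"       # optional /$optionalPath
    r"(?::(?P<method>[A-Za-z]+))?$"  # optional :$optionalMethod
)


def parse_output_reference(text):
    """Split <$stepId>/$optionalPath:$optionalMethod into its parts."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not an OutputReference: {text!r}")
    step_names = tuple(match.group("step").split("/"))
    return OutputReference(step_names, match.group("path"), match.group("method"))


print(parse_output_reference("<one/child>/file.txt:ref"))
# prints: OutputReference(step_names=('one', 'child'), path='file.txt', method='ref')
```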

Example

Here is a simple example which uses one Workflow and one Component template to run 2 tasks:

  • consume-input: prints the contents of an input file called my-input.db
  • consume-sibling: prints the text “my sibling said” followed by stdout of the sibling step <consume-input>

entrypoint:
  entry-instance: main
  execute:
    - target: <entry-instance>
workflows:
  - signature:
      name: main
      parameters:
        # special variable with auto-populated value

To try it out, store the above DSL in a file called dsl-params.yaml and run

pip install "st4sd-runtime-core[develop]"

which installs the command-line tool elaunch.py, followed by:

echo "hello world" >my-input.db
elaunch.py -i my-input.db --failSafeDelays=no -l40 dsl-params.yaml

Differences between DSL 2.0 and FlowIR

There are some differences between DSL 2.0 and FlowIR.

In the current version (0.1.x) of DSL 2.0:

  • we offer support for natural composition of Computational Graphs using Workflow and Component templates
  • the signature field replaces the stage, name, references, and override fields of the component specification in FlowIR
  • settings and inputs flow through parameters; we do not support global/stage environments or variables
  • the fields of components can only contain %(parameter)s references; in the future we will add support for references to component %(variable)s too
  • dependencies between components are defined by referencing the output of a producer component in one parameter of the consumer component - DataReferences are reserved for referencing input files only
    • the equivalent of a DataReference for Template instances is an OutputReference

DSL 2.0 will eventually contain a superset of the FlowIR features. However, the current beta version of DSL 2.0 does not support:

  • using variables in the body of a component template
  • FlowIR platforms
  • defining Key Outputs or Interface and Property extraction methods
  • application-dependencies, data files, and manifests