
Directly running workflows

This page explains how to run a workflow directly using the elaunch.py command-line utility. Users who are comfortable installing Python modules and familiar with FlowIR should be able to follow this content.

Prepare a virtual environment

We recommend installing ST4SD Core in a virtual environment with a modern version of Python 3 (3.7+), like so:

python3 -m venv --copies st4sd
. ./st4sd/bin/activate
pip install "st4sd-runtime-core[develop]"
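
A quick sanity check that the runtime installed correctly is to ask the launcher for its help text:

which elaunch.py
elaunch.py --help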

If you are installing ST4SD on a machine that can submit tasks to IBM Spectrum LSF, you should also install the official lsf-python-api Python module:

. /path/to/profile.lsf
git clone https://github.com/IBMSpectrumComputing/lsf-python-api.git
cd lsf-python-api
python3 setup.py build
python3 setup.py install
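
You can confirm that the bindings built correctly by importing them from the same virtual environment (the module that the project ships is named pythonlsf):

python3 -c "from pythonlsf import lsf"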

Check the homepage of lsf-python-api for more information.

After installing the lsf-python-api python module you can launch workflows which contain components that use the lsf backend.
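
For instance, a component selects the lsf backend through its resourceManager section. A minimal sketch of such a component (the queue name is a placeholder for a queue on your cluster) looks like this:

components:
- name: hello-lsf
  command:
    executable: echo
    arguments: hello from LSF
  resourceManager:
    config:
      backend: lsf
    lsf:
      queue: normal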

Execute a workflow

Use the elaunch.py command-line utility, which is installed with st4sd-runtime-core, to run your workflows. For example, you can run the toy workflow sum-numbers like so:

git clone https://github.com/st4sd/sum-numbers.git
elaunch.py --nostamp -l40 sum-numbers
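
Because --nostamp omits the timestamp suffix from the instance directory, the run is stored under sum-numbers.instance in your current directory. You can inspect the per-stage component working directories once it finishes:

ls sum-numbers.instance/stages/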

Provide input files and override data files

ST4SD workflows support three flavours of inputs:

  1. Input files - files the user must provide when they execute the workflow
  2. Data files - configuration files that can optionally be overridden
  3. User variables - user-provided values for workflow variables

The tutorial contains more information about inputs.

Example

Here’s an example of a workflow that uses an input file, a data file, and a variable.

First, prepare the workflow definition files by running the following on your terminal:

# workflow.yaml holds a minimal FlowIR definition; the hello component simply echoes the variable and the two files
cat <<EOF >workflow.yaml
variables:
  default:
    global:
      var: hello

components:
- name: hello
  command:
    executable: echo
    arguments: "%(var)s input/foo.txt:ref data/bar.txt:ref"
  references:
  - input/foo.txt:ref
  - data/bar.txt:ref
EOF
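
The remaining files can be created with a few more commands. The file contents below are placeholders; the manifest maps the shared_data directory to the workflow's data directory, as described under the file structure that follows:

cat <<EOF >manifest.yaml
data: shared_data:copy
EOF

cat <<EOF >my_vars.yaml
global:
  var: a-user-value
EOF

echo "this is the input file" >foo.txt

mkdir -p shared_data
echo "this is the data file" >shared_data/bar.txt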

Together, these commands create the following file structure:

workflow.yaml # the workflow definition
manifest.yaml # manifest that maps "shared_data" to "data"
foo.txt # the input file
my_vars.yaml # file containing user variables
shared_data # the directory containing "data" files
└─ bar.txt

Activate the virtual environment that you used to install st4sd-runtime-core and then run:

elaunch.py -l40 --nostamp \
--failSafeDelays=no \
--input foo.txt \
--variables my_vars.yaml \
--manifest manifest.yaml workflow.yaml
echo "\n\nComponent stdout was:"
cat workflow.instance/stages/stage0/hello/out.stdout

If you omit the --variables parameter, the var variable receives the value that the default platform assigns to it.

You can override the contents of the data file bar.txt by adding the argument: --data path/to/a/different/bar.txt. Finally, you can use the --data and --input parameters multiple times.
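
For example, to run the same workflow but read the data file from a different location (the replacement file keeps the name bar.txt):

elaunch.py -l40 --nostamp \
--input foo.txt \
--variables my_vars.yaml \
--data path/to/a/different/bar.txt \
--manifest manifest.yaml workflow.yaml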

Store outputs to S3

Workflows may optionally define key-outputs, which elaunch.py can upload to S3 after the experiment terminates.
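
Key-outputs are declared under the output section of the workflow definition. As a rough sketch, labelling the stdout of the hello component from the earlier example as a key-output named greeting could look like this:

output:
  greeting:
    data-in: stage0.hello/out.stdout:ref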

You can instruct elaunch.py to upload key-outputs to S3 via the --s3StoreToURI parameter. When setting --s3StoreToURI you must also use exactly one of the parameters --s3AuthWithEnvVars or --s3AuthBearer64.

Example:

export bucket="a-bucket"
export path_in_bucket="optional/path"
export S3_ACCESS_KEY_ID="s3 access key id"
export S3_SECRET_ACCESS_KEY="s3 secret access key"
export S3_END_POINT="s3 end point"
elaunch.py --s3StoreToURI s3://${bucket}/${path_in_bucket} \
--s3AuthWithEnvVars \
path/to/workflow

When --s3StoreToURI is set, elaunch.py uploads the key-outputs to the S3 bucket you provided, under the specified ${path_in_bucket}, after the experiment terminates. elaunch.py replaces occurrences of the %(instanceDir)s literal in --s3StoreToURI with the name of the experiment instance; you can use this, for example, to store the key-outputs of multiple workflow instances in the same bucket.
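
For instance, to give each run its own prefix inside the same bucket:

elaunch.py --s3StoreToURI "s3://${bucket}/${path_in_bucket}/%(instanceDir)s" \
--s3AuthWithEnvVars \
path/to/workflow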

Alternatively, you can base64-encode the JSON representation of the dictionary {"S3_ACCESS_KEY_ID": "val", "S3_SECRET_ACCESS_KEY": "val", "S3_END_POINT": "val"} and use the --s3AuthBearer64 parameter instead:

export bucket="a-bucket"
export path_in_bucket="optional/path"
export json="{\"S3_ACCESS_KEY_ID\": \"val\", \"S3_SECRET_ACCESS_KEY\": \"val\", \"S3_END_POINT\": \"val\"}"
# strip the newlines that base64 inserts when wrapping long input
export s3_auth=$(echo -n "${json}" | base64 | tr -d '\n')
elaunch.py --s3StoreToURI s3://${bucket}/${path_in_bucket} \
--s3AuthBearer64 "${s3_auth}" \
path/to/workflow