Directly running workflows
This page will teach you how to run a workflow directly using the elaunch.py
command-line utility.
Users comfortable with installing Python modules and with FlowIR should be able to follow this content.
- Prepare a virtual environment
- Execute a workflow
- Provide input files and override data files
- Store outputs to S3
Prepare a virtual environment
We recommend using a virtual environment with a modern version of Python 3 (3.7+) to install ST4SD Core like so:
```
python3 -m venv --copies st4sd
. ./st4sd/bin/activate
pip install "st4sd-runtime-core[develop]"
```
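A quick sanity check that the installation worked: elaunch.py is installed by st4sd-runtime-core and should print its list of options.

```
# verify that the virtual environment exposes the launcher
which elaunch.py
elaunch.py --help
```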
If you are installing ST4SD on a machine that can submit tasks to IBM Spectrum LSF,
you should also install the official lsf-python-api Python module:
```
. /path/to/profile.lsf
git clone https://github.com/IBMSpectrumComputing/lsf-python-api.git
cd lsf-python-api
python3 setup.py build
python3 setup.py install
```
Check the homepage of lsf-python-api
for more information.
After installing the lsf-python-api
Python module, you can launch workflows that contain components which use the lsf
backend.
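For reference, a component opts into LSF through its resourceManager section. The snippet below is a minimal sketch of such a component; the queue name is a placeholder for a queue on your LSF cluster:

```
components:
- name: hello-lsf
  command:
    executable: echo
    arguments: hello from LSF
  resourceManager:
    config:
      backend: lsf   # submit this component's task via LSF
    lsf:
      queue: normal  # placeholder: use a queue of your cluster
```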
Execute a workflow
Use the elaunch.py
command-line utility, which is installed with st4sd-runtime-core,
to run your workflows.
For example, you can run the toy workflow sum-numbers
like so:
```
git clone https://github.com/st4sd/sum-numbers.git
elaunch.py --nostamp -l40 sum-numbers
```
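Because of --nostamp, elaunch.py writes the experiment instance into a sum-numbers.instance directory with no timestamp prefix. After the run terminates you can inspect it, for example to read the stdout of the stage 0 components:

```
# list the per-stage working directories of the instance
ls sum-numbers.instance/stages
# print the stdout of the components in stage 0
cat sum-numbers.instance/stages/stage0/*/out.stdout
```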
Provide input files and override data files
ST4SD workflows support three flavours of inputs:
- Input files - files the user must provide when they execute the workflow
- Data files - configuration files that can optionally be overridden
- User variables - user-provided values for workflow variables
The tutorial contains more information about inputs.
Example
Here’s an example of a workflow that uses an input
file, a data
file, and a variable.
First, prepare the workflow definition and its companion files by running the following on your terminal. The hello component shown here is a minimal reconstruction: it echoes the var variable and then prints the contents of the input and data files.

```
# workflow.yaml: a FlowIR definition with a single component named "hello"
cat <<EOF >workflow.yaml
variables:
  default:
    global:
      var: hello

components:
- name: hello
  command:
    executable: sh
    arguments: -c "echo var=%(var)s; cat input/foo.txt:ref data/bar.txt:ref"
    expandArguments: none
  references:
  - input/foo.txt:ref
  - data/bar.txt:ref
EOF

# manifest.yaml: maps the "shared_data" directory to "data"
cat <<EOF >manifest.yaml
data: shared_data:copy
EOF

# foo.txt: the input file
echo "this is the contents of the input file foo.txt" >foo.txt

# my_vars.yaml: user variables that override the platform defaults
cat <<EOF >my_vars.yaml
global:
  var: world
EOF

# shared_data/bar.txt: the data file
mkdir -p shared_data
echo "this is the contents of the data file bar.txt" >shared_data/bar.txt
```
The above script creates the following file structure:
```
workflow.yaml  # the workflow definition
manifest.yaml  # manifest that maps "shared_data" to "data"
foo.txt        # the input file
my_vars.yaml   # file containing user variables
shared_data    # the directory containing "data" files
└─ bar.txt
```
Activate the virtual environment that you used to install st4sd-runtime-core
and then run:
```
elaunch.py -l40 --nostamp \
  --failSafeDelays=no \
  --input foo.txt \
  --variables my_vars.yaml \
  --manifest manifest.yaml workflow.yaml

echo "\n\nComponent stdout was:"
cat workflow.instance/stages/stage0/hello/out.stdout
```
If you omit the --variables
parameter, the var
variable receives the value that the default
platform assigns to it (hello in this example).
You can override the contents of the data
file bar.txt
by adding the argument --data path/to/a/different/bar.txt.
Finally, you can use the --data
and --input
parameters multiple times.
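For example, a launch command for a workflow with two input files that also overrides two data files might look like this (the extra file names are illustrative):

```
elaunch.py --nostamp \
  --input foo.txt --input other-input.csv \
  --data path/to/a/different/bar.txt --data path/to/a/different/config.yaml \
  --manifest manifest.yaml workflow.yaml
```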
Store outputs to S3
Workflows may optionally define key-outputs
which elaunch.py
can upload to S3 after the experiment terminates.
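Key-outputs are declared in the workflow definition, under the top-level output section of the FlowIR. As a sketch, the toy workflow above could label the stdout of its hello component as a key-output; the output name hello-message is illustrative:

```
output:
  hello-message:
    data-in: stage0.hello/out.stdout:copy
```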
You can instruct elaunch.py
to upload key-outputs
to S3 via the --s3StoreToURI
parameter.
When setting the parameter --s3StoreToURI
you must also use exactly one of the parameters --s3AuthWithEnvVars
or --s3AuthBearer64.
Example:
```
export bucket="a-bucket"
export path_in_bucket="optional/path"
export S3_ACCESS_KEY_ID="s3 access key id"
export S3_SECRET_ACCESS_KEY="s3 secret access key"
export S3_END_POINT="s3 end point"

elaunch.py --s3StoreToURI s3://${bucket}/${path_in_bucket} \
  --s3AuthWithEnvVars \
  path/to/workflow
```
When --s3StoreToURI
is set, after the experiment terminates, elaunch.py
will start uploading the key-outputs
to the S3 bucket you provided under the specified ${path_in_bucket}.
elaunch.py
replaces occurrences of the %(instanceDir)s
literal in --s3StoreToURI
with the name of the experiment instance.
For example, you can use this to store the key-outputs
of multiple workflow instances in the same bucket.
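Sketched out, that looks like this (the results/ prefix is illustrative):

```
elaunch.py --s3StoreToURI "s3://${bucket}/results/%(instanceDir)s" \
  --s3AuthWithEnvVars \
  path/to/workflow
```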
Alternatively, you can base64-encode the JSON representation of the dictionary {"S3_ACCESS_KEY_ID": "val", "S3_SECRET_ACCESS_KEY": "val", "S3_END_POINT": "val"}
and use the --s3AuthBearer64
parameter instead:
```
export bucket="a-bucket"
export path_in_bucket="optional/path"
export json="{\"S3_ACCESS_KEY_ID\": \"val\", \"S3_SECRET_ACCESS_KEY\": \"val\", \"S3_END_POINT\": \"val\"}"
export s3_auth=$(echo -n "${json}" | base64)

elaunch.py --s3StoreToURI s3://${bucket}/${path_in_bucket} \
  --s3AuthBearer64 ${s3_auth} \
  path/to/workflow
```