
This page assumes you are familiar with writing basic experiments and running them locally using the elaunch.py command-line tool. If you need a refresher, take a moment to read our docs before continuing.

This page uses DSL 2.0. If you need to understand the previous syntax, check out the FlowIR docs and FlowIR tutorial.

Requirements

An understanding of how to run a ST4SD workflow locally.
An understanding of how to write a basic ST4SD workflow
A Python 3.9+ interpreter

A virtual environment with the st4sd-runtime-core python module

python -m venv venv
. ./venv/bin/activate
pip install "st4sd-runtime-core[develop]>=2.5.1"

A local copy of https://github.com/st4sd/st4sd-examples

Clone the GitHub repository and then cd into its sub-directory tutorials/2-experiments-with-interface:

git clone https://github.com/st4sd/st4sd-examples.git
cd st4sd-examples/tutorials/2-experiments-with-interface

An experiment that has key-outputs

All experiments produce files, but not all generated files are equally important. To this end, ST4SD has the concept of key-outputs: files and directories that an experiment produces and that the experiment's developers consider important.

Make sure your working directory is the sub-directory tutorials/2-experiments-with-interface of your local copy of https://github.com/st4sd/st4sd-examples.

Here is an example of an experiment with a key-output:

entrypoint:
  entry-instance: hello
  execute:
  - target: <entry-instance>
    args:
      message: Hello world
  output:
    - name: greeting
      data-in: <entry-instance>:output

File: 0-key-outputs.yaml

Run it like so:

elaunch.py -l40 --nostamp 0-key-outputs.yaml

The output field in the entrypoint dictionary defines the key-outputs of this experiment:

entrypoint:
  # ... other fields ...
  output:
    - name: greeting
      data-in: <entry-instance>:output

This experiment has a single key-output called greeting. The data associated with this key-output is the stdout of the <entry-instance> step, which is an instance of the hello component. When the experiment produces this key-output, the runtime system updates the file $INSTANCE_DIR/output/output.json to reflect the state of the experiment.

Here’s an excerpt of output.json:

{
    "greeting": {
        "creationtime": "1725374555.6836693",
        "description": "just a friendly greeting",
        "filename": "out.stdout",
        "filepath": "stages/stage0/entry-instance/out.stdout",
        "final": "yes",
        "production": "yes",
        "type": "",
        ...
    }
}

While the experiment is running, the runtime system asynchronously updates this file with metadata about the key-outputs the experiment has generated. In this example, there is just one key-output, called greeting. For more information on key-outputs, check out our documentation.
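Because output.json is plain JSON, you can inspect it with a few lines of Python. The sketch below assumes the layout of the excerpt above (one entry per key-output, a filepath relative to the instance directory, and final set to "yes" once the key-output is complete). summarise_key_outputs is a hypothetical helper, and 0-key-outputs.instance is an assumed instance-directory name for a run launched with --nostamp; pass your actual $INSTANCE_DIR as the first argument if it differs.

```python
import json
import sys
from pathlib import Path


def summarise_key_outputs(instance_dir: str) -> dict:
    """Map each key-output name to (path of its data, is-final flag).

    Reads $INSTANCE_DIR/output/output.json and assumes every entry has the
    "filepath" and "final" fields shown in the excerpt of output.json.
    """
    root = Path(instance_dir)
    with open(root / "output" / "output.json") as handle:
        metadata = json.load(handle)
    return {
        name: (root / entry["filepath"], entry.get("final") == "yes")
        for name, entry in metadata.items()
    }


if __name__ == "__main__":
    # Assumed instance directory name for `elaunch.py --nostamp 0-key-outputs.yaml`
    instance = sys.argv[1] if len(sys.argv) > 1 else "0-key-outputs.instance"
    for name, (path, final) in summarise_key_outputs(instance).items():
        status = "final" if final else "in progress"
        print(f"{name}: {path} ({status})")
        if final and path.is_file():
            # For 0-key-outputs.yaml this is the stdout of the hello component
            print(path.read_text())
```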

If you run experiments on the cloud and instruct the runtime system to register them in the ST4SD datastore, you can also use the ST4SD Python API to download the key-outputs of your experiment instances.

Exercises

Use elaunch.py to run 0-key-outputs.yaml and look at the file containing the key-output metadata.
Write a new experiment that has a single component nested inside a workflow. See the example on nested workflows to refresh your memory on how to write experiments that contain both Workflow and Component templates. Add a key-output that points to the stdout of your component (use an OutputReference that points to the :output of your component’s instance).

An experiment that has an interface

Some virtual experiments define interfaces that make it simpler for users to retrieve the input systems and measured properties from executions of that virtual experiment.

The interface of a virtual experiment defines:

The specification used to describe the input systems it processes, e.g. SMILES for small molecules
Instructions to extract the input systems from input data
Instructions to extract the values of properties that the virtual experiment computes

Once a virtual experiment has an interface, ST4SD can return a pandas.DataFrame containing the properties calculated by instances of the virtual experiment, as well as the ids of the input systems that each instance processed. This functionality is provided via the st4sd-datastore API and the st4sd-runtime-service API. See using a virtual experiment interface for further information.

In this example we will work with a virtual experiment which:

extracts the IDs of its input systems
has 2 key-outputs that correspond to 2 measured properties of the interface
uses builtin hooks to extract the measured properties from the key-outputs

Here is the beginning of the experiment's DSL:

entrypoint:
  interface:
    description: Counts vowels in words
    inputSpec:
      namingScheme: words
      inputExtractionMethod:
        csvColumn:
          source:
            path: input/words.csv

File: 1-interface.package/conf/dsl.yaml

The interface contains a human readable description of the experiment under entrypoint.interface.description.

entrypoint:
  interface:
    description: Counts vowels in words

Then, in entrypoint.interface.inputSpec, it uses the builtin input extraction method csvColumn to extract the ids of the systems it processes:

entrypoint:
  interface:
    inputSpec:
      namingScheme: words
      inputExtractionMethod:
        csvColumn:
          source:
            path: input/words.csv
          args:

It instructs the method to read the CSV file input/words.csv (i.e. the input file) and treat every row of the CSV as one input system whose identifier lies in the column word.
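To make the behaviour of csvColumn concrete, here is a rough stand-alone approximation in plain Python; it is an illustration, not the ST4SD implementation. It assumes the semicolon-delimited words.csv that this page asks you to create as the experiment's input, and extract_input_ids is a hypothetical helper name.

```python
import csv
import io


def extract_input_ids(csv_text: str, column: str = "word", delimiter: str = ";") -> list:
    """Treat each CSV row as one input system and return its identifier.

    Approximates what the interface declares via csvColumn: read the input
    CSV and take the id of every row from a single column.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    # Skip rows whose id cell is empty (e.g. blank trailing lines)
    return [row[column] for row in reader if row.get(column)]


words_csv = "word;\nhello;\nawesome;\nworld;\n"
print(extract_input_ids(words_csv))  # ['hello', 'awesome', 'world']
```

The trailing semicolons in words.csv simply produce an extra, unnamed empty column, which the sketch ignores because it only reads the word column.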

Following that, it uses the builtin property extraction method csvDataFrame twice to extract its 2 properties, Vowels and Letters, from the key-outputs vowels and letters respectively.

entrypoint:
  interface:
    propertiesSpec:
    - name: Vowels
      propertyExtractionMethod:
        csvDataFrame:
          source:
            keyOutput: vowels
          args:

The csvDataFrame property extraction method expects a CSV file that contains a column called input-id, which holds the identifier of each input system, and a column with the same name as the property being extracted (e.g. Vowels). In addition, ST4SD interfaces require property names to start with a capital letter.

In this example, the components happen to produce key-output CSV files that contain a correctly named column for each property's values, but they store the input ids in a column called word instead of input-id. To account for this inconsistency, the developers of the workflow use the renameColumns argument of the csvDataFrame property extraction method, which instructs csvDataFrame to treat the column word as if it were called input-id.
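The column requirements and the effect of renameColumns can be sketched in plain Python. This illustrates the rules described above and is not ST4SD code: extract_property is a hypothetical helper, and the key-output CSV contents (including the ; delimiter and the values) are made-up sample data.

```python
import csv
import io
from typing import Dict, Optional


def extract_property(
    csv_text: str,
    prop: str,
    rename: Optional[Dict[str, str]] = None,
    delimiter: str = ";",
) -> Dict[str, str]:
    """Return {input-id: value} for one property, following the rules above."""
    # Interface rule: property names must start with a capital letter
    if not prop[:1].isupper():
        raise ValueError(f"property name must start with a capital letter: {prop!r}")
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, [])
    # renameColumns: e.g. {"word": "input-id"} treats "word" as "input-id"
    columns = [(rename or {}).get(name, name) for name in header]
    # csvDataFrame rule: need an input-id column and a column named after the property
    for required in ("input-id", prop):
        if required not in columns:
            raise KeyError(f"missing required column {required!r}")
    id_col, value_col = columns.index("input-id"), columns.index(prop)
    return {row[id_col]: row[value_col] for row in reader if row}


# Hypothetical contents of the "vowels" key-output
vowels_csv = "word;Vowels\nhello;2\nawesome;4\nworld;1\n"
print(extract_property(vowels_csv, "Vowels", rename={"word": "input-id"}))
# prints {'hello': '2', 'awesome': '4', 'world': '1'}
```

Renaming happens before the column check, which reflects the point above: renameColumns only changes what a column is called, so the input-id and property-name requirements still apply afterwards. Without the rename, the same file fails the input-id check.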

Notice that the entrypoint expects an input file called words.csv:

entrypoint:
    ...
    execute:
    - target: <entry-instance>
      args:
        words_file: input/words.csv:ref

This means that you have to create a CSV file called words.csv and pass it to the workflow as an input via the -i argument of elaunch.py.

To run this experiment, copy and paste the following into your terminal:

: create the input file
cat<<EOF >words.csv
word;
hello;
awesome;
world;
EOF

: launch the experiment
elaunch.py -i words.csv 1-interface.package
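Once the run finishes, you may want to sanity-check the values in properties.csv. The sketch below computes the values you would expect for Vowels and Letters for the words in words.csv. It assumes the components count only a, e, i, o and u as vowels and count every character of a word as a letter; both are assumptions, so check the component definitions in 1-interface.package/conf/dsl.yaml. expected_properties is a hypothetical helper.

```python
VOWELS = set("aeiou")  # assumption: "y" is not treated as a vowel


def expected_properties(words):
    """Return {word: (Vowels, Letters)} computed directly from the words."""
    return {
        word: (sum(ch in VOWELS for ch in word.lower()), len(word))
        for word in words
    }


for word, (vowels, letters) in expected_properties(["hello", "awesome", "world"]).items():
    print(f"{word}: Vowels={vowels} Letters={letters}")
```

Under these assumptions, the sketch prints hello: Vowels=2 Letters=5, awesome: Vowels=4 Letters=7 and world: Vowels=1 Letters=5.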

If you run experiments on the cloud and instruct the runtime system to register them in the ST4SD datastore, you can also use the ST4SD Python API to download the measured properties of your experiment instances.

Exercises

Use elaunch.py to run 1-interface.package. Then look at the files:
- $INSTANCE_DIR/output/output.json
- $INSTANCE_DIR/output/input-ids.json
- $INSTANCE_DIR/output/properties.csv
Update the experiment to use a custom python hook for extracting the measured properties from the key-outputs. The documentation for the hookGetProperties hook is here.

What’s next?

More information on running experiments directly, i.e. via elaunch.py here
More information on the DSL of ST4SD i.e. how to write experiments here
More information on how to structure and test your experiments here
More information on writing experiments with interfaces here
