Adding an interface to experiments
This page assumes you are familiar with writing basic experiments and running them locally using the elaunch.py command line tool. If you need a refresher, take a moment to read our docs before continuing.
Requirements
- An understanding of how to run an ST4SD workflow locally
- An understanding of how to write a basic ST4SD workflow
- A python 3.9+ interpreter
- A virtual environment with the st4sd-runtime-core python module

    python -m venv venv
    . ./venv/bin/activate
    pip install "st4sd-runtime-core[develop]"

- A local copy of https://github.com/st4sd/st4sd-examples. Clone the github repository and then cd into its sub-directory tutorials/2-experiments-with-interface

    git clone https://github.com/st4sd/st4sd-examples.git
An experiment that has key-outputs
All experiments produce files, but not all generated files are equally important. To this end ST4SD has the concept of key-outputs. These are files, and directories, that an experiment produces which the developers of the experiment consider important.
Here is an example of an experiment with a key-output:
    entrypoint:
      entry-instance: hello
      execute:
      - target: <entry-instance>
        args:
          message: Hello world
      output:
      - name: greeting
        data-in: <entry-instance>:output
File: 0-key-outputs.yaml
Run it like so:
elaunch.py -l40 --nostamp 0-key-outputs.yaml
The output field in the entrypoint dictionary defines the key-outputs of this experiment:
    entrypoint:
      # ... other fields ...
      output:
      - name: greeting
        data-in: <entry-instance>:output
This experiment has a single key-output called greeting. The data associated with this key-output is the stdout of the <entry-instance> step, which is an instance of the hello component. When the experiment finishes producing this key-output, the $INSTANCE_DIR/output/output.json file is updated to reflect the state of the experiment.
Here’s an example of output.json:
    {
      "greeting": {
        "creationtime": "1725374555.6836693",
        "description": "just a friendly greeting",
        "filename": "out.stdout",
        "filepath": "stages/stage0/entry-instance/out.stdout",
        "final": "yes",
        "production": "yes",
        "type": "",
While the experiment is running, the runtime system asynchronously updates this file with metadata about the generated key-outputs of the experiment. In this example, there is just one key-output called greeting. For more information on key-outputs check out our documentation.
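To inspect this metadata from Python, a minimal sketch could look like the one below. It uses only the standard library and assumes that output.json maps each key-output name to a dictionary with a filepath field that is relative to the instance directory, as the excerpt above suggests. The helper name key_output_paths is hypothetical and is not part of ST4SD.

```python
import json
from pathlib import Path


def key_output_paths(instance_dir):
    """Map each key-output name to the path of the file it points to.

    Assumption: output.json has the layout shown in the excerpt above, i.e.
    {"<name>": {"filepath": "<path relative to the instance dir>", ...}}
    """
    instance_dir = Path(instance_dir)
    metadata_file = instance_dir / "output" / "output.json"
    with metadata_file.open() as f:
        metadata = json.load(f)
    # Assumption: "filepath" is relative to the instance directory
    return {
        name: instance_dir / entry["filepath"]
        for name, entry in metadata.items()
    }
```

Calling key_output_paths with the path of your instance directory would then return a dictionary with one entry per key-output, in this example just greeting.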
If you are running experiments on the cloud and are instructing the runtime system to register them into the ST4SD datastore, you may also use the ST4SD python API to download the key-outputs of your experiment instances.
Exercises
- Use elaunch.py to run 0-key-outputs.yaml and look at the file containing the key-output metadata.
- Write a new experiment that has a single component called nested inside a workflow. See the example on nested workflows to refresh your memory on how to write experiments that contain both Workflow and Component templates. Add a key-output which points to the stdout of your component (use an OutputReference that points to the :output of your component’s instance).
An experiment that has an interface
Some virtual experiments define interfaces which make it simpler for users to retrieve the input systems and measured properties from executions of that virtual experiment.
The interface of a virtual experiment defines:
- The specification used to describe the input systems it processes, e.g. SMILES for small molecules
- Instructions to extract the input systems from input data
- Instructions to extract the values of properties that the virtual experiment computes
Once a virtual experiment has an interface, ST4SD can return a pandas.DataFrame containing the properties calculated by instances of the virtual experiment, as well as the ids of the input systems that an instance processed. This functionality is provided via the st4sd-datastore API and the st4sd-runtime-service API. See using a virtual experiment interface for further information.
In this example we will work with a virtual experiment which:
- extracts the IDs of its input systems
- has 2 key-outputs that correspond to 2 measured properties of the interface
- uses builtin hooks to extract the measured properties from the key-outputs
The DSL of the experiment is:
    entrypoint:
      interface:
        description: Counts vowels in words
        inputSpec:
          namingScheme: words
          inputExtractionMethod:
            csvColumn:
              source:
                path: input/words.csv
File: 1-interface.package/conf/dsl.yaml
The interface contains a human readable description of the experiment under entrypoint.interface.description.
    entrypoint:
      interface:
        description: Counts vowels in words
Then, in entrypoint.interface.inputSpec, it uses the builtin input extraction method csvColumn to extract the ids of the systems it processes:
    entrypoint:
      interface:
        inputSpec:
          namingScheme: words
          inputExtractionMethod:
            csvColumn:
              source:
                path: input/words.csv
              args:
It instructs the method to read the CSV file input/words.csv (i.e. the input file) and treat every row of the CSV as one input system whose identifier lies in the column word.
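To make the idea concrete, here is a minimal standard-library sketch of what extracting input ids from a CSV column amounts to. It is an illustration only, not ST4SD's implementation of csvColumn. The function name read_input_ids is hypothetical, and the semicolon delimiter is an assumption chosen to match the trailing semicolons in the words.csv file used in this tutorial.

```python
import csv
import io


def read_input_ids(csv_text, column="word", delimiter=";"):
    """Return the value found in `column` for every data row of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return [row[column] for row in reader]


# Mirrors the words.csv file used in this tutorial
words_csv = "word;\nhello;\nawesome;\nworld;\n"
print(read_input_ids(words_csv))  # ['hello', 'awesome', 'world']
```

Each returned value plays the role of one input system id.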
Following that, it uses the builtin property extraction method csvDataFrame twice to measure its 2 properties Vowels and Letters from the key-outputs vowels and letters respectively.
    entrypoint:
      interface:
        propertiesSpec:
        - name: Vowels
          propertyExtractionMethod:
            csvDataFrame:
              source:
                keyOutput: vowels
              args:
The csvDataFrame property extraction method expects a CSV file which has the columns input-id and ${the property name}. In other words, csvDataFrame requires a column called input-id and a column with the same name as the property that is being extracted. In addition, one of the requirements for using an ST4SD interface is that property names start with a capital letter.
In this example the components happen to produce key-output CSV files which contain a properly named column for the values of properties but, instead of using the input-id column, they use the column word. To account for this inconsistency, the developers of the workflow use the renameColumns argument of the csvDataFrame property extraction method. Via renameColumns they instruct csvDataFrame to treat the column word as if it were called input-id.
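The effect of such a rename can be sketched with the standard library as follows. This is an illustration under stated assumptions rather than the way csvDataFrame is implemented: the helper rename_columns is hypothetical, and the CSV contents and comma delimiter stand in for whatever the vowels key-output actually contains.

```python
import csv
import io


def rename_columns(csv_text, rename, delimiter=","):
    """Parse CSV text into row dictionaries, renaming columns as per `rename`."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return [
        {rename.get(name, name): value for name, value in row.items()}
        for row in reader
    ]


# Hypothetical contents of the "vowels" key-output
vowels_csv = "word,Vowels\nhello,2\nawesome,4\nworld,1\n"
rows = rename_columns(vowels_csv, {"word": "input-id"})
print(rows[0])  # {'input-id': 'hello', 'Vowels': '2'}
```

After the rename, every row has both the input-id column and a column named after the property, which are the two columns csvDataFrame expects.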
Notice that the entrypoint expects an input file called words.csv:
    entrypoint:
      ...
      execute:
      - target: <entry-instance>
        args:
          words_file: input/words.csv:ref
This means that you have to create a CSV file called words.csv and pass it as an input to the workflow (via the -i arg).
To run this experiment, you can copy/paste the following to your terminal:
    : create the input file
    cat<<EOF >words.csv
    word;
    hello;
    awesome;
    world;
    EOF
    : launch the experiment
If you are running experiments on the cloud and are instructing the runtime system to register them into the ST4SD datastore, you may also use the ST4SD python API to download the measured properties of your experiment instances.
Exercises
- Use elaunch.py to run 1-interface.package. Then look at the files:
  - $INSTANCE_DIR/output/output.json
  - $INSTANCE_DIR/output/input-ids.json
  - $INSTANCE_DIR/output/properties.csv
- Update the experiment to use a custom python hook for extracting the measured properties from the key-outputs. The documentation for the hookGetProperties hook is here.