IBM ST4SD

Concepts

Use this page to learn about key ST4SD concepts and terminology.

Overview

[Figure: How researchers and developers interact with ST4SD]

Terminology

  • Virtual Experiment: A virtual experiment (sometimes shortened to experiment or VE) is an application workflow that measures one or more characteristics of one or more input systems. Virtual experiments are typically created by developers and are defined by a configuration file plus any additional data they need to function, e.g. scripts and configuration files.
  • Parameterised Virtual Experiment Package: A Parameterised Virtual Experiment Package (also parameterised package or PVEP) is a virtual experiment that has been pre-configured to perform a specific measurement. Researchers select and launch Parameterised Virtual Experiment Packages defined by developers.
  • Virtual Experiment Instance: A Virtual Experiment Instance (also instance or experiment instance) is an execution of a particular virtual experiment, usually run via a PVEP.
  • Project: A project is a directory structure used by developers to contain the definition of one or more virtual experiments.
  • Direct Execution: A Direct Execution refers to a virtual experiment run via the elaunch.py tool from the terminal of the machine the user is logged into, e.g. a laptop or an HPC cluster.
  • REST API Execution: A REST API Execution refers to a virtual experiment run on a (remote) Kubernetes/OpenShift cluster via the st4sd-runtime-service REST API (this is often done via a Jupyter Notebook).

Virtual Experiment Inputs

There are four ways to provide data to a virtual experiment. Only one - inputs - is required.

inputs

Inputs are files that the experiment requires to run. Usually they contain the information on what the experiment is measuring.

To find out what inputs an experiment requires, check the experiment’s documentation.

REST API Execution

When running via the REST API, you provide the input files via the experiment payload. You can directly provide the content in the payload or provide a reference to an S3 bucket containing the files.

Direct Execution

When running directly, you specify input files via the -i argument to elaunch.py.

data

data refers to configuration files that the experiment uses during runs. These can be optionally overridden but defaults always exist.

REST API Execution

You provide content for data files via the experiment payload. You can directly provide the content in the payload or provide a reference to an S3 bucket containing the files. You can only do this for experiment packages whose parameterisation allows it.

Direct Execution

When running directly, you specify data files via the -d argument to elaunch.py.

variables

variables are non-file parameters a virtual experiment defines. These can be optionally overridden but defaults always exist.

REST API Execution

You provide values for variables as part of the experiment payload. You can only do this for experiment packages whose parameterisation allows it.
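The REST API descriptions so far (inputs, data and variables passed inside the experiment payload) can be sketched as a plain dictionary. This is only an illustration: the key names "inputs", "data", "variables", "filename" and "content", along with every file name and value used here, are assumptions made for the sketch and should be checked against the st4sd-runtime-service documentation. Submitting the payload and referencing files in an S3 bucket are not shown.

```python
import json


def file_entries(files):
    """Turn a {filename: content} mapping into a list of file entries."""
    return [{"filename": name, "content": content} for name, content in files.items()]


def build_payload(inputs, data=None, variables=None):
    """Assemble an experiment payload as a plain dictionary.

    All key names are assumptions made for this sketch, not a confirmed schema.
    """
    payload = {"inputs": file_entries(inputs)}      # inputs are always required
    if data:
        payload["data"] = file_entries(data)        # optional overrides of data files
    if variables:
        payload["variables"] = dict(variables)      # optional overrides of variables
    return payload


if __name__ == "__main__":
    # Hypothetical file name, content and variable, for illustration only.
    payload = build_payload(
        inputs={"systems.csv": "label,smiles\nwater,O\n"},
        variables={"numberOfSteps": 100},
    )
    print(json.dumps(payload, indent=2))
```

Data and variables are left out of the payload unless given, which mirrors the rule that only inputs are required and that defaults exist for the rest.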

Direct Execution

When running directly, you provide variables by supplying a correctly formatted YAML file to the -a argument to elaunch.py.

dependencies

dependencies are external directories that the experiment requires to run. In contrast, inputs and data are files or archives.

REST API Execution

You provide dependencies information via the experiment payload. The dependencies are passed by reference, i.e. you give the location of the dependencies.

Direct Execution

When running directly, you specify dependencies using the -s option to elaunch.py.
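For direct execution, the four options described in this section (-i, -d, -a and -s) can be combined into a single elaunch.py command line. The helper below is a minimal sketch that only assembles the argument list. It assumes that each flag is repeated once per file or directory, that the experiment location is the final positional argument, and that the value given to -s is passed through unchanged. The helper name and all file names are hypothetical.

```python
import shlex


def build_elaunch_command(package, inputs=(), data=(), variables_file=None, dependencies=()):
    """Return the argument list for a direct run of a virtual experiment."""
    cmd = ["elaunch.py"]
    for path in inputs:          # -i: input files
        cmd += ["-i", path]
    for path in data:            # -d: overrides for data files
        cmd += ["-d", path]
    if variables_file:           # -a: YAML file with variable values
        cmd += ["-a", variables_file]
    for dep in dependencies:     # -s: dependencies (external directories)
        cmd += ["-s", dep]
    cmd.append(package)          # assumption: the experiment location comes last
    return cmd


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    cmd = build_elaunch_command(
        "my-experiment.package",
        inputs=["systems.csv"],
        variables_file="variables.yaml",
    )
    print(shlex.join(cmd))
```

Returning a list rather than a string means the result can be handed to subprocess.run without any shell quoting concerns.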

Virtual Experiment Outputs

Virtual experiments can produce many output files of various sizes and importance.

Key-Outputs

Key-outputs are files/directories produced by the virtual experiment that the developer has identified as being of particular interest. Since the filenames may themselves be meaningless, the developer gives them unique identifiers.

REST API Execution

You can query and retrieve the key-outputs of a virtual experiment instance using the ST4SD API. See Retrieving key-outputs for more details.

[Coming Soon]: The key-outputs will be listed in the Registry UI entry for each parameterised virtual experiment package based on the experiment.

You can instruct ST4SD to copy the key-outputs of an instance to an S3 bucket when the instance has finished. See Automatically uploading key-outputs to S3.

Direct Execution

Metadata describing the key-outputs is stored in the file outputs.json, in the output directory at the top level of your experiment instance directory.
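As a minimal sketch, the key-output metadata of a directly executed instance could be loaded like this. It assumes the directory is literally named output and makes no assumption about the structure of the JSON document beyond it being valid JSON. The function name and the instance directory name are hypothetical.

```python
import json
from pathlib import Path


def load_key_output_metadata(instance_dir):
    """Load output/outputs.json from the top level of an instance directory."""
    path = Path(instance_dir) / "output" / "outputs.json"
    with path.open() as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Hypothetical instance directory name, for illustration only.
    metadata = load_key_output_metadata("my-experiment.instance")
    print(json.dumps(metadata, indent=2))
```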

You can also request that the key-outputs be copied to an external location when an experiment instance finishes. You do this by setting the --s3StoreToURI and --s3AuthWithEnvVars (or --s3AuthBearer64=S3AUTHBEARER64) arguments to elaunch.py.

See the documentation for direct runs for more information.
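The S3 upload arguments named above can be assembled with a small helper. This is a sketch under stated assumptions: --s3AuthWithEnvVars is treated as a flag that takes no value, --s3StoreToURI is given its value as a separate argument, and the URI shown is a hypothetical placeholder. The helper name is not part of ST4SD, and the environment variables that hold the credentials are described in the documentation for direct runs, not here.

```python
def s3_upload_args(store_uri, bearer64=None):
    """Return extra elaunch.py arguments requesting key-output upload to S3."""
    args = ["--s3StoreToURI", store_uri]
    if bearer64 is None:
        # Credentials are taken from environment variables (names not shown here).
        args.append("--s3AuthWithEnvVars")
    else:
        # Credentials are passed inline as the S3AUTHBEARER64 value.
        args.append(f"--s3AuthBearer64={bearer64}")
    return args


if __name__ == "__main__":
    # Hypothetical bucket and prefix, for illustration only.
    print(s3_upload_args("s3://my-bucket/experiment-outputs"))
```

The two authentication options are mutually exclusive in this sketch, matching the "or" in the text above.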

Properties

When an experiment has a virtual experiment interface defined, property tables are also produced.

REST API Execution

You can see the properties provided by a parameterised virtual experiment package by checking the experiment registry. You can access these properties using the ST4SD API.

Direct Execution

The property tables (CSV files) defined by the interface are available in the file properties.csv, in the output directory at the top level of the experiment instance directory.
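A property table produced by a direct run could be read with the Python standard library as sketched below. The column names depend on the experiment's interface, and the delimiter is an assumption (a comma by default here), so it is exposed as a parameter. The function name and the instance directory name are hypothetical.

```python
import csv
from pathlib import Path


def load_properties(instance_dir, delimiter=","):
    """Read output/properties.csv and return one dictionary per row."""
    path = Path(instance_dir) / "output" / "properties.csv"
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


if __name__ == "__main__":
    # Hypothetical instance directory name, for illustration only.
    for row in load_properties("my-experiment.instance"):
        print(row)
```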

Other Outputs

All other output files produced during an experiment run can be retrieved.

REST API Execution

You can retrieve any output file of a virtual experiment instance using the ST4SD API.

If you have access to the cluster hosting the ST4SD deployment you are using, you can also browse the outputs via a terminal.

Direct Execution

The experiment instance directory contains all the outputs of all steps and can be easily explored via the terminal.
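Besides a terminal, the instance directory can be explored programmatically. The sketch below lists every file under an instance directory together with its size, largest first, which helps locate the bulky outputs among many small ones. It assumes nothing about the directory layout. The function name and the instance directory name are hypothetical.

```python
from pathlib import Path


def list_outputs(instance_dir):
    """Return (relative path, size in bytes) for every file, largest first."""
    root = Path(instance_dir)
    files = [
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    ]
    return sorted(files, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical instance directory name, for illustration only.
    for rel_path, size in list_outputs("my-experiment.instance"):
        print(f"{size:>12}  {rel_path}")
```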