Use this page to learn what parameterised virtual experiment packages are and how to create them.

What is a parameterised virtual experiment package?
Structure of a parameterised virtual experiment package
The Base section
The Metadata section
The Parameterisation section
Adding a parameterised package to a registry
Example

A parameterised virtual experiment defines how to run a virtual experiment in a particular way.

ST4SD provides a registry for parameterised virtual experiments. The registry allows researchers to browse and use these packages. Each ST4SD deployment has a registry and we also maintain a publicly available registry.

This document explains how developers can write parameterised virtual experiment packages. For how these packages can be used by others, see using the virtual experiment registry.

What is a parameterised virtual experiment package?

A parameterised virtual experiment package is a Python dictionary (or YAML or JSON structure) that describes:

How to access a virtual experiment
What options to allow users to change
What options have preset values
Metadata about the package.

The package is parameterised because it can fix the values of options in the base experiment to give particular behaviour, e.g. selecting a quantum method known to be fast, so that users cannot override them. The package can also restrict an option to a set of allowed values. In this way the same base virtual experiment can be configured in many ways, yielding different parameterised packages for different tasks.

Structure of a parameterised virtual experiment package

A parameterised package has three main sections:

(Required) The base packages (i.e. workflow definitions) that the virtual experiment consists of.
- Where they are located, what version to get, and how to get them. Often there will be just one.
(Required) Metadata:
- Various other information about the package e.g. description, license, maintainer and keywords.
(Optional) The parameterisation information:
- Presets: options that users cannot change.
- Execution options: options that users can change, potentially with some restrictions.

Each of these is a top-level key in the package description. The following snippet shows this top-level structure:

{
    "base": {
        # Required: Base package information ...
    },
    "metadata": {
        # Required: Various info about the package  ...
    },
    "parameterisation": {
        # Optional: What values are set and what can be changed ..
    }
}

The parameterised virtual experiment package identifier (PVEP Identifier)

The naming schema of a parameterised virtual experiment package (PVEP) is similar to that of container images.

The identifier of a PVEP is the name of the PVEP followed by either a @${digest hash} or a :${tag name}. The latest pushed version of a PVEP is always accessible via ${name of PVEP}:latest or simply ${name of PVEP}.
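The three identifier forms can be told apart with plain string handling. The following is a minimal sketch, not part of the ST4SD API: `PVEPIdentifier` and `parse_pvep_identifier` are hypothetical names introduced here for illustration, and the sketch assumes that package names contain neither `@` nor `:`.

```python
from typing import NamedTuple, Optional


class PVEPIdentifier(NamedTuple):
    """Hypothetical container for the parts of a PVEP identifier."""
    name: str
    tag: Optional[str]
    digest: Optional[str]


def parse_pvep_identifier(identifier: str) -> PVEPIdentifier:
    """Split 'name', 'name:tag' or 'name@digest' into its parts.

    Assumption: package names contain neither '@' nor ':'.
    """
    if "@" in identifier:
        name, digest = identifier.split("@", 1)
        return PVEPIdentifier(name, None, digest)
    if ":" in identifier:
        name, tag = identifier.split(":", 1)
        return PVEPIdentifier(name, tag, None)
    # A bare name is shorthand for the "latest" tag
    return PVEPIdentifier(identifier, "latest", None)


print(parse_pvep_identifier("my-experiment"))
print(parse_pvep_identifier("my-experiment:foo"))
```

A bare name and `name:latest` parse to the same result, which mirrors the rule that both always refer to the latest pushed version.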

For the identifier ${name}:${tag name} to point to a version of a PVEP, the PVEP should specify the ${tag name} under the field metadata.package.tags. Below is an example of a PVEP called my-experiment which specifies 2 tags, foo and bar:

{
    "base": {...},
    "metadata": {
        "package": {
            "name": "my-experiment",
            "tags": [
                "foo", "bar"
            ]
        }

When pushing the above PVEP to the ST4SD registry the following identifiers will automatically point to this version of the PVEP:

my-experiment
my-experiment:latest
my-experiment:foo
my-experiment:bar
my-experiment@sha256x60de8e469c486ddd3bd4d2c521518e932964a36296b08758a94b9a4f (the ST4SD registry auto-generates the digest hash and stores it under the metadata field metadata.registry.digest).

Notice that my-experiment and my-experiment:latest are automatically configured to point to this version of the PVEP regardless of whether latest exists in metadata.package.tags or not.

Any pre-existing versions of the my-experiment PVEP which have any of the tags in this version of the PVEP will be automatically modified such that the tags point to this new version of the experiment instead. This operation is similar to pushing a new container image my-image:foo that overrides an existing container image tagged my-image:foo.
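The tag-moving behaviour described above can be modelled with a small in-memory structure. This is a toy sketch only: `ToyRegistry` is a hypothetical class, not the ST4SD registry, and the digests are made-up placeholders passed in by the caller rather than computed.

```python
from typing import Dict, List


class ToyRegistry:
    """Hypothetical in-memory model that only tracks which digest each tag points to."""

    def __init__(self) -> None:
        # package name -> {tag -> digest}
        self._tags: Dict[str, Dict[str, str]] = {}

    def push(self, name: str, tags: List[str], digest: str) -> None:
        tag_map = self._tags.setdefault(name, {})
        # "latest" always moves to the newest push, whether it was listed or not
        for tag in ["latest", *tags]:
            # Assigning the tag here takes it away from any older digest
            tag_map[tag] = digest

    def tags_of(self, name: str, digest: str) -> List[str]:
        """Return the tags that currently point to the given digest."""
        tag_map = self._tags.get(name, {})
        return sorted(tag for tag, d in tag_map.items() if d == digest)


registry = ToyRegistry()
registry.push("my-experiment", ["foo", "bar"], "sha256xaaa")  # placeholder digest
registry.push("my-experiment", ["foo"], "sha256xbbb")         # placeholder digest

print(registry.tags_of("my-experiment", "sha256xaaa"))
print(registry.tags_of("my-experiment", "sha256xbbb"))
```

After the second push, the older version keeps only the tag that the newer version did not claim, while latest and foo point to the newer one.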

See the metadata.package section for more information on configuring the name and tags of PVEPs.

The rest of this document explains each section, outlining what information is required and optional.

In the following we use YAML to describe the section structure, for ease of explanation. This would have to be converted to a python dictionary to upload to the registry. An example of doing this is given in the pushing a package section.
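Once a package is available as a Python dictionary, the required/optional split of the three top-level sections can be sanity-checked before pushing. The helper below is a hypothetical sketch: `check_top_level_sections` is not an ST4SD function, and it only checks for the presence of the sections named on this page, not their contents.

```python
from typing import List

REQUIRED_SECTIONS = ("base", "metadata")
OPTIONAL_SECTIONS = ("parameterisation",)


def check_top_level_sections(package: dict) -> List[str]:
    """Return a list of problems; an empty list means the required sections exist.

    Assumption: this checks only that the required top-level keys are present.
    The registry may apply further validation that is not modelled here.
    """
    problems = []
    for key in REQUIRED_SECTIONS:
        if key not in package:
            problems.append(f"missing required section: {key}")
    return problems


package = {
    "base": {"packages": []},
    "metadata": {"package": {"name": "my-experiment"}},
}
print(check_top_level_sections(package))
print(check_top_level_sections({"metadata": {}}))
```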

The Base section

The base section describes where the base virtual experiment is and how to access it.

base:
  packages:
  - $PACKAGE_DEFINITION

A virtual experiment can contain multiple base packages although for handwritten packages this will usually be one.

packages:
- name: # OPTIONAL - defaults to "main", is required
        # for multi-package experiments
  source:
     #REQUIRED: ONE package source type. See below for options
     $PACKAGE_SOURCETYPE: $PACKAGE_SOURCE_STRUCTURE
  config:
    # How to read the experiment from the given source e.g. manifest etc.
    # config is REQUIRED IF the base virtual experiment uses standard

Sources

Select the source that matches where your virtual experiment is stored.

Git source

git:
  location:
    url: the http url of the repo
    # Must specify exactly ONE of branch, tag, and commit
    branch: name of branch
    tag: name of tag
    commit: git commit hash
  security:
    oauth:

ST4SD will use the oauth-token you provide to git clone https://${oauth-token}@server.com/your-org/your-repo.git.

If you are using GitHub to host your git repository you can generate a Personal Access Token with just read-access to your git repository.

If you are using GitLab, use either a project access token or a deploy token with read access permissions to your GitLab project. Make sure you create a token with the "Developer" role that has "read_repository" permissions.

Remember to prefix your token with your username followed by a ":" character.

Example:

security:
  oauth:
     value: "${Username}:${PersonalAccessToken}"
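When the package is built in Python, the git source rules above can be enforced by a small helper. This is a sketch under stated assumptions: `make_git_source` is a hypothetical function, the URL and credentials are placeholders, and only the `location` fields and the `security.oauth.value` field shown in this section are produced.

```python
from typing import Optional


def make_git_source(
    url: str,
    *,
    branch: Optional[str] = None,
    tag: Optional[str] = None,
    commit: Optional[str] = None,
    username: Optional[str] = None,
    token: Optional[str] = None,
) -> dict:
    """Build a git source entry, enforcing exactly one of branch, tag and commit."""
    refs = {"branch": branch, "tag": tag, "commit": commit}
    chosen = {key: value for key, value in refs.items() if value is not None}
    if len(chosen) != 1:
        raise ValueError("specify exactly ONE of branch, tag and commit")

    source = {"git": {"location": {"url": url, **chosen}}}
    if token is not None:
        # Assumption: the token is always prefixed with the username, as noted above
        if username is None:
            raise ValueError("a username is needed to prefix the token")
        source["git"]["security"] = {"oauth": {"value": f"{username}:{token}"}}
    return source


# Placeholder URL and credentials, for illustration only
print(make_git_source("https://server.com/your-org/your-repo.git", branch="main"))
```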

Datashim source

If you have installed Datashim on your cluster, you can use a Datashim dataset as the location of your virtual experiment base package.

dataset:
  # No need for a security field because Datashim removes this requirement.
  location:
    dataset: the name of the dataset object

S3 source

s3:
  location:
    region: region (optional)
    endpoint: S3 endpoint url
    bucket: bucket name
  security:
    valueFrom:
        # Must choose exactly ONE of secretS3KeyRef and valuesS3
        # "valuesS3" is automatically converted to "secretS3KeyRef" when you push the package

Specifying image registry dependencies

Virtual experiments often use images which may be stored in private registries. This structure allows the developer to provide ST4SD with information on how to access these registries.

dependencies:
  #An Optional dictionary of dependency types
  imageRegistries: # An optional list of image registries struct
  - serverUrl: the url to the image registry
    security:
      valueFrom:
        # Must select exactly 1 of secretKeyRef and usernamePassword
        # "usernamePassword" is automatically converted to a "secretKeyRef" when the package is pushed
        secretKeyRef:

The Metadata section

The metadata section contains 2 fields: package and registry. The first is used to provide various other information about the parameterised virtual experiment. The latter contains metadata that the registry automatically populates.

The metadata.package section

Populate metadata.package to set information about your parameterised virtual experiment package that you would like your users to know:

metadata:
  package: #All the maintainer metadata. Can decide exact structure at implementation time.
    name: the package name
    tags: # Optional
    - latest # On Push, auto insert latest if missing
    maintainer: email (optional)
    license: some string (optional)
    keywords: # optional
    - keyword 1

The metadata.registry section

Read the metadata.registry section to get information that the ST4SD registry automatically extracts from your parameterised virtual experiment package:

digest: (str) A uid of this parameterised virtual experiment package. See PVEP identifier
createdOn: (str) UTC time that this digest was created, format is %Y-%m-%dT%H%M%S.%f%z
tags: The tags associated with this PVEP. This is a subset of metadata.package.tags. It can be empty if no tag points to this digest anymore. See PVEP identifier
timesExecuted: (int) - automatically increased every time a user launches this virtual experiment entry
interface: (dictionary) ST4SD injects the Virtual experiment interface if it exists
data: (array of object) Information about the filenames under the data directory
- name: (str) the name of the data file
inputs: (array of objects) Information about the files that users must provide when launching this virtual experiment
- name: (str) the name of the input file
output: (array of objects) A list containing the named key-outputs of this experiment
- name: (str) the name of a key-output
containerImages: (array of objects) A list of the container images that this virtual experiment references
- name: (str) the name of a container image (e.g. a URL)
executionOptionsDefaults: (object) Describing the default values of experiment parameters for the different platforms of the experiment
- variables: (array of objects) One entry per experiment parameter
  - name: (str) The name of the variable
  - valueFrom: (object) One entry per platform
    - platform: (str) The name of the platform
    - value: (str, int, bool) The default value of the variable when using this platform
platforms: (array of strings) A list of all the known platform names for this experiment
applicationDependencies: (dictionary) A key-value dictionary containing information about the application dependencies of this experiment for each platform. See Application Dependencies for more.
- $platformNameAsAString: (array of objects) A platform name pointing to an array of objects containing one application dependency each
  - name: (str) the name of the application dependency
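The createdOn format string listed above can be passed directly to `datetime.strptime`. A minimal sketch follows; the sample timestamp is made up for illustration and `parse_created_on` is a hypothetical helper name.

```python
from datetime import datetime

# Format string as documented for metadata.registry.createdOn
CREATED_ON_FORMAT = "%Y-%m-%dT%H%M%S.%f%z"


def parse_created_on(value: str) -> datetime:
    """Parse a createdOn string into a timezone-aware datetime."""
    return datetime.strptime(value, CREATED_ON_FORMAT)


# Made-up sample value that follows the documented format
created = parse_created_on("2024-03-01T142530.250000+0000")
print(created.isoformat())
```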

Note that the ST4SD registry manages all fields under the metadata.registry section; developers cannot directly modify this dictionary.

The Parameterisation section

Parameterisation rules
Presets
Execution options

ST4SD supports 2 levels of parameterisation: presets, which are options that virtual experiment developers decide and users cannot change; and executionOptions, which virtual experiment developers allow users to override, potentially within some limits.

parameterisation:
  presets: ...
  executionOptions: ...

Parameterisation rules

The parameter types that can be specified in each section are:

variables (variables): Values for variables used in the experiment
data-files (data) : Values for data files used by the experiment
platform (platform): Value for the platform (named set of variables) to use
runtime arguments (runtime) : elaunch.py command line arguments

Both presets and executionOptions can be specified in the same package.

It is an error to specify the same parameter (variable, data file, runtime option) in both sections. In addition platform can only be specified in one of the two sections.
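The no-overlap rule can be checked locally before pushing. The sketch below is illustrative only: `find_parameterisation_conflicts` is a hypothetical helper, it assumes that `variables` and `data` are lists of entries with a `name` key in both sections, and it does not cover runtime arguments because their structure is not shown on this page.

```python
from typing import List


def find_parameterisation_conflicts(parameterisation: dict) -> List[str]:
    """List parameters that appear in both presets and executionOptions."""
    presets = parameterisation.get("presets") or {}
    options = parameterisation.get("executionOptions") or {}

    conflicts = []
    # Assumption: both sections use lists of {"name": ...} entries for these kinds
    for kind in ("variables", "data"):
        preset_names = {entry["name"] for entry in presets.get(kind, [])}
        option_names = {entry["name"] for entry in options.get(kind, [])}
        for name in sorted(preset_names & option_names):
            conflicts.append(f"{kind}: {name}")

    # platform may appear in only one of the two sections
    if "platform" in presets and "platform" in options:
        conflicts.append("platform")
    return conflicts


example = {
    "presets": {"variables": [{"name": "method", "value": "fast"}]},
    "executionOptions": {"variables": [{"name": "method"}, {"name": "steps"}]},
}
print(find_parameterisation_conflicts(example))
```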

If a virtual experiment has a parameter that is not specified in either section it is preset with its default value and cannot be changed.

If a developer wants a user to be able to provide a value for a parameter they must specify it in executionOptions.

For executionOptions the value of the parameter is resolved as follows:

The value provided by the user
The default value provided by the developer in the parameterised package if there is one
The first value in the array of options provided by the developer in the parameterised package if there is one
If none of the above exist the default value of the parameter in the base-package is used
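The resolution order above amounts to a chain of fallbacks. The following sketch restates it in Python; `resolve_execution_option` and its parameter names are hypothetical, `None` is used to mean "not provided", and the sketch does not check that a user value is among the allowed values.

```python
from typing import Any, Optional, Sequence


def resolve_execution_option(
    user_value: Optional[Any] = None,
    package_default: Optional[Any] = None,
    allowed_values: Optional[Sequence[Any]] = None,
    base_default: Optional[Any] = None,
) -> Any:
    """Pick the value of an executionOptions parameter in the documented order."""
    if user_value is not None:
        return user_value          # 1. value provided by the user
    if package_default is not None:
        return package_default     # 2. default set in the parameterised package
    if allowed_values:
        return allowed_values[0]   # 3. first entry in the developer's list of options
    return base_default            # 4. default from the base package


print(resolve_execution_option(user_value=5, package_default=3, base_default=1))
print(resolve_execution_option(allowed_values=["a", "b"], base_default="z"))
print(resolve_execution_option(base_default="z"))
```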

Presets

Use presets to fix the values of parameters.

parameterisation:
  presets:  # optional
    # Fields defined here *cannot* be overridden by `executionOptions`.
    # All fields are optional
    variables: #A list of preset values for variables in the virtual experiment
    - name: $name of variable
      value: $variableValue
    data:
      - name: name of a file in the "data" directory

Execution options

Use execution options to allow users to choose values for parameters if they want.

  executionOptions: # optional
    # users may override values within constraints that workflow developers set
    variables:
    # Variables that the developer allows the user to override.
    # These CANNOT appear in presets.variables
    - name: $variable name
      # .value and .valueFrom are both optional and mutually exclusive
      # if neither fields exist then users can set variable to any value.
      #   at start, if users do not provide a value, the variable receives the

Adding a parameterised package to a registry

Pushing the package
Parameterised Package Identifier
Package Tags

Pushing the package

From a python dictionary

The parameterised package is stored as a dictionary in a python module mypackage.py (can be any name). The dictionary is assigned to a variable (can be any name) e.g.

d = {
  "base": ...
}

Then

import mypackage
api.api_experiment_push(mypackage.d)

From YAML

The parameterised package is stored as YAML in a file mypackage.yaml (can be any name).

import yaml
with open('mypackage.yaml') as f:
    # yaml.load() needs an explicit Loader in recent PyYAML; safe_load is the safe choice
    api.api_experiment_push(yaml.safe_load(f))

From JSON

The parameterised package is stored as JSON in a file mypackage.json (can be any name).

import json
with open('mypackage.json') as f:
    api.api_experiment_push(json.load(f))

Registry actions when a package is pushed

On pushing a parameterised virtual experiment package, the registry:

Generates a unique ID for the entry - see Parameterised Package Identifier
Applies and updates tags - see Package Tags
Stores any credentials as Kubernetes secrets and converts the relevant fields in the parameterised package to secretKeyRef and secretS3KeyRef types.
Adds additional data to the parameterised package - see registry metadata

Parameterised Package Identifier

When a parameterised package is pushed to the virtual experiment registry it is assigned a digest which is unique among all packages with the same package name (the value of metadata.package.name).

The unique identifier of the package is then $packageName@$digest. For example: my-experiment@sha256x16092ca4bb13955b1397bf38cfba45ef11c9933bf796454a81de4f86

By convention the registry assumes parameterised packages with the same package name represent different versions of that package. These are collected together in the registry UI, with the details of the most recent (last uploaded) package shown and links to all previous versions of the package.

Package Tags

Parameterised packages can have tags applied to them. A tag is a shorthand for referencing the package. For example, by adding the tag 1.0 to the package my-experiment@sha256x16092ca4bb13955b1397bf38cfba45ef11c9933bf796454a81de4f86 you can reference it as my-experiment:1.0 in various operations.

Developers can specify tags when pushing a package using the metadata.package.tags field of the package payload. Tagging a parameterised package with a tag removes the tag from any other parameterised package with the same name. This guarantees that if $packageName:$tag exists, it points to exactly one $packageName@$digest. The API call api_experiment_update_tags(packageIdentifier, tags) can also be used to add or remove a tag at any time. Note that this call requires tags to include all the tags you want associated with the package. If an older version of this experiment has a tag which is not contained in this tag list then the tag will remain pointing to the older version of the experiment.

The current tags associated with a package can be found by inspecting the metadata.package.tags element of the package definition in the registry.

When a package is pushed it is automatically tagged latest by the registry. If only a package name is passed to an API call that requires a package identifier, the tag latest is assumed.

latest can be moved to another digest with the same package name if desired using api_experiment_update_tags. However, latest cannot be removed: you will receive an error if you omit it from the tag list in api_experiment_update_tags for a digest that is tagged with :latest. You can only remove the :latest tag from a digest by tagging a different digest with the same package name with :latest.

Example

A parameterised package with name my-experiment is pushed to the registry. It is given the digest sha256x16092ca4bb13955b1397bf38cfba45ef11c9933bf796454a81de4f86.

All 3 identifiers below point to the same digest:

my-experiment
my-experiment:latest
my-experiment@sha256x16092ca4bb13955b1397bf38cfba45ef11c9933bf796454a81de4f86

Any of these 3 identifiers can be used to refer to the new parameterised package in API calls, e.g. to start an instance of this parameterised virtual experiment all the following will work:

api.api_experiment_start("my-experiment", payload={})
api.api_experiment_start("my-experiment:latest", payload={})
api.api_experiment_start("my-experiment@sha256x16092ca4bb13955b1397bf38cfba45ef11c9933bf796454a81de4f86", payload={})

Package tag update rules

If a tag is requested for a digest and that tag is already associated with another digest with the same package-name, then the registry updates $packageName:$tag to point to the new package. This ensures that $packageName:$tag points to a unique digest even if the workflow developers pushed the $tag in the past.

In general this operation involves updating the metadata.registry.tags fields of all parameterised packages with the same package-name.

Example

Here is an example parameterised package for the sum-numbers toy virtual experiment, which lives on git. It demonstrates many of the features discussed above.

definition = {
    "base": {
        # We define the one or more base-packages (here just one)
        "packages": [{
            "source": {
                "git": {
                    "location": {
                        # This one lives on Git, under the "main" branch, we can also use
                        # "tag" and "commit"
