Creating Parameterised Experiment Packages
Use this page to learn what parameterised packages are and how to create them.
- What is a parameterised virtual experiment package?
- Structure of a parameterised virtual experiment package
  - The Base section
  - The Parameterisation section
  - The Metadata section
- Adding a parameterised package to a registry
- Example
A parameterised virtual experiment package defines how to run a virtual experiment in a particular way.
ST4SD provides a registry for parameterised virtual experiments. The registry allows researchers to browse and use these packages. Each ST4SD deployment has its own registry, and we also maintain a publicly available registry.
This document explains how developers can write parameterised virtual experiment packages. For how others can use these packages, see Using the virtual experiment registry.
What is a parameterised virtual experiment package?
A parameterised virtual experiment package is a Python dictionary (or YAML or JSON structure) that describes:
- How to access a virtual experiment
- What options to allow users to change
- What options have preset values
- Metadata about the package
It is parameterised because the package can set the values of options in the base experiment to give certain behaviours that the user cannot override, e.g. setting a quantum method known to be fast. The package can also restrict an option to a set of allowed values. In this way the same base virtual experiment can be configured in many ways, providing different parameterised packages for different tasks.
Structure of a parameterised virtual experiment package
A parameterised package has three main sections:
- The base packages (i.e. workflow definitions) that the virtual experiment consists of:
  - Where they are located, what version to get, and how to get them. Often there will be just one.
- The parameterisation information:
  - Presets: options that users cannot change.
  - Execution options: options that users can change, potentially with some restrictions.
- Metadata:
  - Various other information about the package, e.g. description, license, maintainer and keywords.
Each of these is a top-level key in the package description. The following snippet shows this top-level structure:
```python
definition = {
    "base": {
        # Required: Base package information
        ...
    },
    "metadata": {
        # Required: Various info about the package
        ...
    },
    "parameterisation": {
        # Optional: What values are set and what can be changed ..
```
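Before pushing, it can help to check a package's top-level keys against the structure above. The helper below is a hypothetical sketch and is not part of the ST4SD API. It only checks that the two required sections are present and leaves all validation of their contents to the registry.

```python
# Hypothetical helper, not part of ST4SD: checks only the top-level sections.
REQUIRED_SECTIONS = ("base", "metadata")  # "parameterisation" is optional


def missing_sections(definition: dict) -> list:
    """Return the required top-level sections that the package lacks."""
    return [key for key in REQUIRED_SECTIONS if key not in definition]


minimal = {
    "base": {"packages": []},
    "metadata": {"package": {"name": "my-experiment"}},
}
print(missing_sections(minimal))       # []
print(missing_sections({"base": {}}))  # ['metadata']
```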
The rest of this document explains each section, outlining what information is required and optional.
The Base section
The base section describes where the base virtual experiment is and how to access it.
```yaml
base:
  packages:
  - $PACKAGE_DEFINITION
```
A virtual experiment can contain multiple base packages, although handwritten packages usually have just one.
```yaml
packages:
- name: # OPTIONAL - defaults to "main", is required
        # for multi-package experiments
  source:
    # REQUIRED: ONE package source type. See below for options
    $PACKAGE_SOURCETYPE: $PACKAGE_SOURCE_STRUCTURE
  config:
    # How to read the experiment from the given source e.g. manifest etc.
    # config is REQUIRED IF the base virtual experiment uses standard
```
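The naming rule above can be expressed in a few lines. The function below is a hypothetical sketch, not ST4SD code. It returns the effective name of each base package: a single unnamed package gets "main", and an unnamed package in a multi-package experiment is rejected.

```python
# Hypothetical helper, not part of ST4SD: applies the naming rule for base packages.
def package_names(packages: list) -> list:
    """Return the effective name of each entry under base.packages."""
    if len(packages) > 1 and any("name" not in p for p in packages):
        raise ValueError("multi-package experiments must name every base package")
    # A single unnamed package gets the default name "main"
    return [p.get("name", "main") for p in packages]


print(package_names([{"source": {}}]))                # ['main']
print(package_names([{"name": "a"}, {"name": "b"}]))  # ['a', 'b']
```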
Sources
Select the source that matches where your virtual experiment is stored.
Git source
```yaml
git:
  location:
    url: the http url of the repo
    # Must specify exactly ONE of branch, tag, and commit
    branch: name of branch
    tag: name of tag
    commit: git commit hash
  security:
    oauth:
```
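The "exactly ONE of branch, tag, and commit" constraint can be checked before pushing. This is a hypothetical sketch, not part of ST4SD, and the repository URL in it is a placeholder.

```python
# Hypothetical helper, not part of ST4SD: enforces "exactly ONE of branch, tag, and commit".
GIT_REF_FIELDS = ("branch", "tag", "commit")


def git_reference(location: dict) -> tuple:
    """Return (field, value) for the single git reference in a git location."""
    present = [field for field in GIT_REF_FIELDS if field in location]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {GIT_REF_FIELDS}, found {present}")
    return present[0], location[present[0]]


# Placeholder repository URL for illustration only
location = {"url": "https://github.com/example-org/example-repo", "branch": "main"}
print(git_reference(location))  # ('branch', 'main')
```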
Datashim source
If you have installed Datashim on your cluster, you can use a Datashim dataset as the location of your virtual experiment base package.
```yaml
dataset:
  # No need for a security field because Datashim removes this requirement.
  location:
    dataset: the name of the dataset object
```
S3 source
```yaml
s3:
  location:
    region: region (optional)
    endpoint: S3 endpoint url
    bucket: bucket name
  security:
    valueFrom:
      # Must choose exactly ONE of secretS3KeyRef and valuesS3
      # "valuesS3" is automatically converted to "secretS3KeyRef" when you push the package
```
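A pre-push check of an S3 source could look like the sketch below. It is hypothetical and not ST4SD code. Only region is marked optional above, so the sketch assumes that endpoint, bucket and security.valueFrom are required. The endpoint, bucket name and empty secretS3KeyRef body are placeholders.

```python
# Hypothetical helper, not part of ST4SD. Assumption: endpoint, bucket and
# security.valueFrom are required, because only region is marked optional.
def check_s3_source(s3: dict) -> list:
    """Return a list of problems found in an s3 source block."""
    problems = []
    location = s3.get("location", {})
    for field in ("endpoint", "bucket"):
        if field not in location:
            problems.append(f"location.{field} is required")
    value_from = s3.get("security", {}).get("valueFrom", {})
    chosen = [key for key in ("secretS3KeyRef", "valuesS3") if key in value_from]
    if len(chosen) != 1:
        problems.append("security.valueFrom needs exactly one of secretS3KeyRef and valuesS3")
    return problems


source = {
    "location": {"endpoint": "https://s3.example.com", "bucket": "my-bucket"},
    "security": {"valueFrom": {"secretS3KeyRef": {}}},  # contents omitted (placeholder)
}
print(check_s3_source(source))  # []
```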
Specifying image registry dependencies
Virtual experiments often use images which may be stored in private registries. This structure allows the developer to provide ST4SD with information on how to access these registries.
```yaml
dependencies:
  # An optional dictionary of dependency types
  imageRegistries: # An optional list of image registries struct
  - serverUrl: the url to the image registry
    security:
      valueFrom:
        # Must select exactly 1 of secretKeyRef and usernamePassword
        # "usernamePassword" is automatically converted to a "secretKeyRef" when the package is pushed
        secretKeyRef:
```
The Parameterisation section
ST4SD supports two levels of parameterisation: presets, which are options that virtual experiment developers decide and users cannot change; and executionOptions, which virtual experiment developers allow users to override, potentially within some limits.
```yaml
parameterisation:
  presets: ...
  executionOptions: ...
```
Parameterisation rules
The parameter types that can be specified in each section are:
- variables (variables): Values for variables used in the experiment
- data files (data): Values for data files used by the experiment
- platform (platform): Value for the platform (named set of variables) to use
- runtime arguments (runtime): elaunch.py command line arguments
Both presets and executionOptions can be specified in the same package.
It is an error to specify the same parameter (variable, data file, runtime option) in both sections. In addition, platform can only be specified in one of the two sections.
If a virtual experiment has a parameter that is not specified in either section, it is preset with its default value and cannot be changed.
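The rules above can be checked mechanically. The sketch below is hypothetical and not ST4SD's validator. It covers variables, data files and platform, and leaves runtime arguments out because this page does not show their structure. The platform values in the example are placeholders.

```python
# Hypothetical helper, not part of ST4SD. Runtime arguments are not checked
# because their structure is not shown in this document.
def _names(section: dict, kind: str) -> set:
    # Assumes variables/data are lists of {"name": ...} entries, as in the presets YAML
    return {entry["name"] for entry in section.get(kind, [])}


def check_parameterisation(parameterisation: dict) -> list:
    presets = parameterisation.get("presets", {})
    options = parameterisation.get("executionOptions", {})
    problems = []
    for kind in ("variables", "data"):
        for name in sorted(_names(presets, kind) & _names(options, kind)):
            problems.append(f"{kind} '{name}' appears in both presets and executionOptions")
    if "platform" in presets and "platform" in options:
        problems.append("platform can only appear in one of the two sections")
    return problems


example = {
    # "default" and ["default"] are placeholder platform values
    "presets": {"variables": [{"name": "method", "value": "fast"}], "platform": "default"},
    "executionOptions": {"variables": [{"name": "method"}], "platform": ["default"]},
}
for problem in check_parameterisation(example):
    print(problem)
```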
For executionOptions, the value of a parameter is resolved in the following order of precedence:
- The value provided by the user
- The default value provided by the developer in the parameterised package, if there is one
- The first value in the array of options provided by the developer in the parameterised package, if there is one
- If none of the above exist, the default value of the parameter in the base package is used
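The precedence order above can be sketched as a small function. This is hypothetical and not the ST4SD implementation. It assumes the developer default is stored in "value" and the array of options in "valueFrom" as a list of {"value": ...} entries, following the executionOptions fields shown below.

```python
# Hypothetical sketch, not ST4SD code. Assumes the developer default is stored in
# "value" and the array of options in "valueFrom" as a list of {"value": ...} entries.
def resolve_value(option: dict, user_value=None, base_default=None):
    """Resolve one executionOptions parameter using the documented precedence."""
    if user_value is not None:          # 1. value provided by the user
        return user_value
    if "value" in option:               # 2. developer default in the package
        return option["value"]
    choices = option.get("valueFrom", [])
    if choices:                         # 3. first of the developer's options
        return choices[0]["value"]
    return base_default                 # 4. default from the base package


option = {"name": "method", "valueFrom": [{"value": "fast"}, {"value": "accurate"}]}
print(resolve_value(option, user_value="accurate"))         # accurate
print(resolve_value(option))                                # fast
print(resolve_value({"name": "steps"}, base_default="10"))  # 10
```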
Presets
Use presets to fix the values of parameters.
```yaml
parameterisation:
  presets: # optional
    # Fields defined here *cannot* be overridden by `executionOptions`.
    # All fields are optional
    variables: # A list of preset values for variables in the virtual experiment
    - name: $name of variable
      value: $variableValue
    data:
    - name: name of a file in the "data" directory
```
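To see the effect of preset variables, the sketch below overlays them on a set of base-package defaults. It is hypothetical and not ST4SD code, and the variable names method and steps are made up for illustration.

```python
# Hypothetical sketch, not ST4SD code: combine base-package defaults with presets.
def apply_presets(base_defaults: dict, presets: dict) -> dict:
    values = dict(base_defaults)  # unmentioned variables keep their base default
    for entry in presets.get("variables", []):
        values[entry["name"]] = entry["value"]  # preset values always win
    return values


# Hypothetical variable names and values
base_defaults = {"method": "accurate", "steps": "10"}
presets = {"variables": [{"name": "method", "value": "fast"}]}
print(apply_presets(base_defaults, presets))  # {'method': 'fast', 'steps': '10'}
```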
Execution options
Use execution options to allow users to choose values for parameters if they want to.
```yaml
executionOptions: # optional
  # users may override values within constraints that workflow developers set
  variables:
  # Variables that the developer allows the user to override.
  # These CANNOT appear in presets.variables
  - name: $variable name
    # .value and .valueFrom are both optional and mutually exclusive
    # if neither fields exist then users can set variable to any value.
    # at start, if users do not provide a value, the variable receives the
```
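A user-supplied value can be checked against an executionOptions entry as in the hypothetical sketch below, which is not ST4SD code. Two assumptions apply. The first is that valueFrom is a list of {"value": ...} entries. The second is that a lone .value acts only as a default and does not restrict what users may enter.

```python
# Hypothetical sketch, not ST4SD code. Assumes valueFrom is a list of
# {"value": ...} entries and that a lone .value is only a default, not a restriction.
def is_allowed(option: dict, user_value: str) -> bool:
    """Check whether a user-supplied value is permitted for an executionOptions variable."""
    if "value" in option and "valueFrom" in option:
        raise ValueError(".value and .valueFrom are mutually exclusive")
    if "valueFrom" in option:
        return user_value in [choice["value"] for choice in option["valueFrom"]]
    return True  # no restriction was declared


option = {"name": "method", "valueFrom": [{"value": "fast"}, {"value": "accurate"}]}
print(is_allowed(option, "fast"))           # True
print(is_allowed(option, "exact"))          # False
print(is_allowed({"name": "steps"}, "42"))  # True
```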
The Metadata section
The metadata section is used to provide various other information about the parameterised virtual experiment.
```yaml
metadata:
  package: # All the maintainer metadata. Can decide exact structure at implementation time.
    name: the package name
    tags: # Optional
    - latest # On Push, auto insert latest if missing
    maintainer: email (optional)
    license: some string (optional)
    keywords: # optional
    - keyword 1
```
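The "auto insert latest if missing" comment can be illustrated with a hypothetical sketch, which is not the registry's code. It reads the rule as "add latest when it is not already in metadata.package.tags" and works on a copy so the caller's dictionary is unchanged.

```python
import copy


# Hypothetical sketch, not ST4SD code, of the "auto insert latest if missing" rule,
# read here as "add latest when it is not already in metadata.package.tags".
def with_latest_tag(definition: dict) -> dict:
    result = copy.deepcopy(definition)  # leave the caller's dictionary untouched
    package = result.setdefault("metadata", {}).setdefault("package", {})
    tags = package.setdefault("tags", [])
    if "latest" not in tags:
        tags.append("latest")
    return result


original = {"metadata": {"package": {"name": "my-experiment"}}}
pushed = with_latest_tag(original)
print(pushed["metadata"]["package"]["tags"])  # ['latest']
```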
Adding a parameterised package to a registry
Pushing the package
From a Python dictionary
The parameterised package is stored as a dictionary in a Python module mypackage.py (can be any name). The dictionary is assigned to a variable (can be any name), e.g.
```python
d = {"base": ...}
```
Then push it:
```python
import mypackage

# api is assumed to be an existing ST4SD REST API client connected to the registry
api.api_experiment_push(mypackage.d)
```
From YAML
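The parameterised package is stored as YAML in a file mypackage.yaml (can be any name).
```python
import yaml

with open('mypackage.yaml') as f:
    # safe_load parses plain YAML data; recent PyYAML versions require a Loader for yaml.load
    api.api_experiment_push(yaml.safe_load(f))
```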
From JSON
The parameterised package is stored as JSON in a file mypackage.json (can be any name).
```python
import json

with open('mypackage.json') as f:
    api.api_experiment_push(json.load(f))
```
Registry actions when a package is pushed
On pushing a parameterised virtual experiment package, the registry:
- Generates a unique ID for the entry; see Parameterised Package Identifier
- Applies and updates tags; see Package Tags
- Stores any credentials as Kubernetes secrets and converts the relevant fields in the parameterised package to secretKeyRef and secretS3KeyRef types
- Adds additional data to the parameterised package; see Registry metadata
Parameterised Package Identifier
When a parameterised package is pushed to the virtual experiment registry, it is assigned a digest that is unique among all packages with the same package name (the value of metadata.package.name).
The unique identifier of the package is then $packageName@$digest. For example: my-experiment@sha256x16092ca4bb13955b1397bf38cfba45ef11c9933bf796454a81de4f86
By convention, the registry assumes that parameterised packages with the same package name represent different versions of that package. These are collected together in the registry-ui, which shows the details of the most recent (last uploaded) package and links to all previous versions of the package.
Package Tags
Parameterised packages can have tags applied to them. A tag is a shorthand for referencing the package. For example, by adding the tag 1.0 to the package my-experiment@sha256x16092ca4bb13955b1397bf38cfba45ef11c9933bf796454a81de4f86 you can reference it as my-experiment:1.0 in various operations.
Developers can specify tags when pushing a package using the metadata.package.tags field of the package payload. Tagging a parameterised package with a tag removes the tag from any other parameterised package with the same name. This guarantees that if $packageName:$tag exists, it points to exactly one $packageName@$digest. The API call api_experiment_update_tags(packageIdentifier, tags) can also be used to add or remove tags on a package at any time. Note that this call requires tags to include all the tags you want associated with the package; any existing tag that is not in this list is removed.
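Because api_experiment_update_tags replaces the whole tag list, its effect can be computed as a set difference. The hypothetical sketch below, which is not ST4SD code, shows which tags a given call would add and remove.

```python
# Hypothetical sketch, not ST4SD code: because the tags argument replaces the full
# set, the change can be computed as a set difference.
def tag_changes(current: list, requested: list) -> tuple:
    current_set, requested_set = set(current), set(requested)
    added = sorted(requested_set - current_set)
    removed = sorted(current_set - requested_set)
    return added, removed


# Passing only ["latest", "1.1"] drops the existing "1.0" tag
print(tag_changes(["latest", "1.0"], ["latest", "1.1"]))  # (['1.1'], ['1.0'])
```

To add 1.1 while keeping 1.0, the list passed to the call must contain latest, 1.0 and 1.1.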
The current tags associated with a package can be found by inspecting the metadata.package.tags element of the package definition in the registry.
Example
A parameterised package with name my-experiment is pushed to the registry. It is given the digest sha256x16092ca4bb13955b1397bf38cfba45ef11c9933bf796454a81de4f86.
All three identifiers below point to the same digest:
- my-experiment
- my-experiment:latest
- my-experiment@sha256x16092ca4bb13955b1397bf38cfba45ef11c9933bf796454a81de4f86
Any of these three identifiers can be used to refer to the new parameterised package in API calls. For example, all of the following start an instance of this parameterised virtual experiment:
```python
api.api_experiment_start("my-experiment", payload={})
api.api_experiment_start("my-experiment:latest", payload={})
api.api_experiment_start("my-experiment@sha256x16092ca4bb13955b1397bf38cfba45ef11c9933bf796454a81de4f86", payload={})
```
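The three identifier forms can be told apart by their separators. The hypothetical sketch below, which is not ST4SD code, splits an identifier into its parts. It assumes that a bare package name behaves like $packageName:latest, which is consistent with the example above.

```python
# Hypothetical sketch, not ST4SD code. Assumption: a bare package name behaves like
# $packageName:latest, which is consistent with the example above.
def parse_identifier(identifier: str) -> tuple:
    """Split an identifier into (package name, "digest" or "tag", value)."""
    if "@" in identifier:
        name, digest = identifier.split("@", 1)
        return name, "digest", digest
    if ":" in identifier:
        name, tag = identifier.split(":", 1)
        return name, "tag", tag
    return identifier, "tag", "latest"


print(parse_identifier("my-experiment"))      # ('my-experiment', 'tag', 'latest')
print(parse_identifier("my-experiment:1.0"))  # ('my-experiment', 'tag', '1.0')
```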
Package tag update rules
If a tag is requested for a digest and that tag is already associated with another digest with the same package name, the registry updates $packageName:$tag to point to the new package. This ensures that $packageName:$tag points to a unique digest even if the workflow developers pushed $tag in the past.
In general, this operation involves updating the metadata.registry.tags fields of all parameterised packages with the same package name.
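The tag update rule can be modelled with an in-memory dictionary. The hypothetical sketch below, which is not the registry's implementation, maps (package name, digest) to the tags pointing at that digest. Assigning a tag removes it from every other digest with the same package name. The shortened digests are placeholders.

```python
# Hypothetical sketch, not ST4SD code. The registry is modelled as a dict that maps
# (package name, digest) to the tags currently pointing at that digest.
def apply_tag(registry: dict, name: str, digest: str, tag: str) -> None:
    for (pkg_name, pkg_digest), tags in registry.items():
        # Move the tag away from any other digest of the same package
        if pkg_name == name and pkg_digest != digest and tag in tags:
            tags.remove(tag)
    target_tags = registry.setdefault((name, digest), [])
    if tag not in target_tags:
        target_tags.append(tag)


# Placeholder digests for illustration only
registry = {
    ("my-experiment", "sha256xaaa"): ["latest", "1.0"],
    ("other-experiment", "sha256xccc"): ["latest"],
}
apply_tag(registry, "my-experiment", "sha256xbbb", "latest")
print(registry[("my-experiment", "sha256xaaa")])     # ['1.0']
print(registry[("my-experiment", "sha256xbbb")])     # ['latest']
print(registry[("other-experiment", "sha256xccc")])  # ['latest']
```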
Registry metadata
The registry adds various information it discovers to the metadata section of the package under the registry key. This includes the ID of the package.
```yaml
registry: # All data added by runtime - developers cannot set anything under registry
  digest: $a string up to 63 characters # The identifier
  createdOn: UTC time that this digest was created, format is %Y-%m-%dT%H%M%S.%f%z
  timesExecuted: int - automatically increased every time a user launches this virtual experiment entry in the ST4SD deployment the registry is attached to
  tags: # A list of tags which point to this digest.
  - $TAG # This is a SUBSET of metadata.package.tags. It can be EMPTY if no tag points to this digest anymore
  interface: {} # ST4SD injects the Virtual experiment interface if it exists
  data: # The list of filenames under the `data` directory (just top-level files, NO directories)
  - name: $DATA_FILE_NAME
```
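The createdOn format above can be parsed directly with Python's datetime.strptime. In the hypothetical sketch below, which is not ST4SD code, the two timestamps and shortened digests are made up. The sketch uses createdOn to pick the most recently uploaded version of a package.

```python
from datetime import datetime

# Format string copied from the createdOn comment above
CREATED_ON_FORMAT = "%Y-%m-%dT%H%M%S.%f%z"


def parse_created_on(value: str) -> datetime:
    return datetime.strptime(value, CREATED_ON_FORMAT)


# Hypothetical entries for two versions of the same package
versions = [
    {"digest": "sha256xaaa", "createdOn": "2023-05-01T101530.123456+0000"},
    {"digest": "sha256xbbb", "createdOn": "2023-06-12T080000.000000+0000"},
]
newest = max(versions, key=lambda v: parse_created_on(v["createdOn"]))
print(newest["digest"])  # sha256xbbb
print(parse_created_on(versions[0]["createdOn"]).isoformat())  # 2023-05-01T10:15:30.123456+00:00
```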
Example
Here is an example parameterised package for the sum-numbers toy virtual experiment, which lives on Git. It demonstrates many of the features discussed above.
```python
definition = {
    "base": {
        # We define the one or more base-packages (here just one)
        "packages": [{
            "source": {
                "git": {
                    "location": {
                        # This one lives on Git, under the "main" branch, we can also use
                        # "tag" and "commit"
```