Creating parameterised packages
Use this page to learn what parameterised virtual experiment packages are and how to create them.
- What is a parameterised virtual experiment package?
- Structure of a parameterised virtual experiment package
- The Base section
- The Parameterisation section
- The Metadata section
- Adding a parameterised package to a registry
- Example
A parameterised virtual experiment defines how to run a virtual experiment in a particular way.
ST4SD provides a registry for parameterised virtual experiments. The registry allows researchers to browse and use these packages. Each ST4SD deployment has a registry and we also maintain a publicly available registry.
This document explains how developers can write parameterised virtual experiment packages. For how others can use these packages, see using the virtual experiment registry.
What is a parameterised virtual experiment package?
A parameterised virtual experiment package is a Python dictionary (or an equivalent YAML or JSON structure) that describes:
- How to access a virtual experiment
- What options to allow users to change
- What options have preset values
- Metadata about the package.
It is parameterised because the package can fix the values of options in the base experiment to give particular behaviour, e.g. setting a quantum method known to be fast, in a way that the user cannot override. The package can also restrict an option to a specific set of values. In this way the same base virtual experiment can be configured in many ways, providing different parameterised packages for different tasks.
Structure of a parameterised virtual experiment package
A parameterised package has three main sections:
- The base packages (i.e. workflow definitions) that the virtual experiment consists of.
  - Where they are located, what version to get, and how to get them. Often there will be just one.
- The parameterisation information:
  - Presets: options that users cannot change.
  - Execution options: options that users can change, potentially with some restrictions.
- Metadata:
  - Various other information about the package e.g. description, license, maintainer and keywords.
Each of these is a top-level key in the package description. The following snippet shows this top-level structure:
definition = {
    "base": {
        # Required: Base package information ...
    },
    "metadata": {
        # Required: Various info about the package ...
    },
    "parameterisation": {
        # Optional: What values are set and what can be changed ...
    },
}
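As a rough illustration of the required and optional top-level keys, the sketch below checks a package dictionary before it is pushed. The helper `check_top_level` is hypothetical and not part of ST4SD; the registry may well apply different or stricter validation.

```python
# Hypothetical helper, not part of the ST4SD API.
REQUIRED_KEYS = {"base", "metadata"}
OPTIONAL_KEYS = {"parameterisation"}


def check_top_level(definition: dict) -> list:
    """Return a list of problems found in the top-level keys of a package."""
    present = set(definition)
    problems = []
    for key in sorted(REQUIRED_KEYS - present):
        problems.append(f"missing required key: {key}")
    # Keys outside the three sections described on this page are only flagged;
    # whether the registry rejects them is an assumption we do not make here.
    for key in sorted(present - REQUIRED_KEYS - OPTIONAL_KEYS):
        problems.append(f"key not described on this page: {key}")
    return problems


print(check_top_level({"base": {}, "metadata": {}}))
print(check_top_level({"base": {}}))
```

Running a check like this locally can catch a misspelt section name before the package reaches the registry.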
The parameterised virtual experiment package identifier (PVEP Identifier)
The naming schema of a parameterised virtual experiment package (PVEP) is similar to that of container images.
The identifier of a PVEP is the name of the PVEP followed by either a @${digest hash} or a :${tag name}.
The latest pushed version of a PVEP is always accessible via ${name of PVEP}:latest or simply ${name of PVEP}.
For the identifier ${name}:${tag name} to point to a version of a PVEP, the PVEP should specify the ${tag name} under the field metadata.package.tags.
Below is an example of a PVEP called my-experiment which specifies 2 tags, foo and bar:
{
    "base": {...},
    "metadata": {
        "package": {
            "name": "my-experiment",
            "tags": ["foo", "bar"]
        }
    }
}
When pushing the above PVEP to the ST4SD registry the following identifiers will automatically point to this version of the PVEP:
my-experiment
my-experiment:latest
my-experiment:foo
my-experiment:bar
my-experiment@sha256x60de8e469c486ddd3bd4d2c521518e932964a36296b08758a94b9a4f
(the ST4SD registry auto-generates the digest hash and stores it under the metadata field metadata.registry.digest).
Any pre-existing versions of the my-experiment PVEP which have any of the tags in this version of the PVEP will be automatically modified such that the tags point to this new version of the experiment instead. This operation is similar to pushing a new container image my-image:foo that overrides an existing container image tagged my-image:foo.
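The identifier forms described above can be told apart with a few lines of Python. This is a minimal sketch: `PVEPIdentifier` and `parse_identifier` are hypothetical names introduced for illustration, and the real registry may parse identifiers differently.

```python
from typing import NamedTuple, Optional


class PVEPIdentifier(NamedTuple):
    name: str
    tag: Optional[str]
    digest: Optional[str]


def parse_identifier(identifier: str) -> PVEPIdentifier:
    """Split a PVEP identifier into its name and either a tag or a digest."""
    if "@" in identifier:
        # name@digest pins one exact version of the package
        name, digest = identifier.split("@", 1)
        return PVEPIdentifier(name, None, digest)
    if ":" in identifier:
        # name:tag refers to whichever version currently carries the tag
        name, tag = identifier.split(":", 1)
        return PVEPIdentifier(name, tag, None)
    # A bare name is shorthand for name:latest
    return PVEPIdentifier(identifier, "latest", None)


for example in ("my-experiment", "my-experiment:foo", "my-experiment@sha256xabc"):
    print(parse_identifier(example))
```

The digest `sha256xabc` is a made-up placeholder, not a real registry digest.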
See the metadata.package section for more information on configuring the name and tags of PVEPs.
The rest of this document explains each section, outlining what information is required and optional.
The Base section
The base section describes where the base virtual experiment is and how to access it.
base:
  packages:
    - $PACKAGE_DEFINITION
A virtual experiment can contain multiple base packages, although for handwritten packages there will usually be just one.
packages:
  - name: # OPTIONAL - defaults to "main", is required
          # for multi-package experiments
    source:
      # REQUIRED: ONE package source type. See below for options
      $PACKAGE_SOURCETYPE: $PACKAGE_SOURCE_STRUCTURE
    config:
      # How to read the experiment from the given source e.g. manifest etc.
      # config is REQUIRED IF the base virtual experiment uses standard
Sources
Select the source that matches where your virtual experiment is stored.
Git source
git:
  location:
    url: the http url of the repo
    # Must specify exactly ONE of branch, tag, and commit
    branch: name of branch
    tag: name of tag
    commit: git commit hash
  security:
    oauth:
Datashim source
If you have installed Datashim on your cluster, you can use a Datashim dataset as the location of your virtual experiment base package.
dataset:
  # No need for a security field because Datashim removes this requirement.
  location:
    dataset: the name of the dataset object
S3 source
s3:
  location:
    region: region (optional)
    endpoint: S3 endpoint url
    bucket: bucket name
  security:
    valueFrom:
      # Must choose exactly ONE of secretS3KeyRef and valuesS3
      # "valuesS3" is automatically converted to "secretS3KeyRef" when you push the package
Specifying image registry dependencies
Virtual experiments often use images which may be stored in private registries. This structure allows the developer to provide ST4SD with information on how to access these registries.
dependencies:
  # An optional dictionary of dependency types
  imageRegistries: # An optional list of image registry structs
    - serverUrl: the url to the image registry
      security:
        valueFrom:
          # Must select exactly 1 of secretKeyRef and usernamePassword
          # "usernamePassword" is automatically converted to a "secretKeyRef" when the package is pushed
          secretKeyRef:
The Parameterisation section
ST4SD supports 2 levels of parameterisation: presets, which are options that virtual experiment developers decide and users cannot change; and executionOptions, which are options that virtual experiment developers allow users to override, potentially within some limits.
parameterisation:
  presets: ...
  executionOptions: ...
Parameterisation rules
The parameter types that can be specified in each section are:
- variables (variables): Values for variables used in the experiment
- data-files (data): Values for data files used by the experiment
- platform (platform): Value for the platform (named set of variables) to use
- runtime arguments (runtime): elaunch.py command line arguments
Both presets and executionOptions can be specified in the same package.
It is an error to specify the same parameter (variable, data file, runtime option) in both sections. In addition, platform can only be specified in one of the two sections.
If a virtual experiment has a parameter that is not specified in either section it is preset with its default value and cannot be changed.
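The rule that a parameter may appear in only one of the two sections can be checked before pushing. The sketch below is an assumption-laden illustration: `find_conflicts` is a hypothetical helper, and it assumes that `variables` and `data` are lists of entries with a `name` field in both sections.

```python
def find_conflicts(parameterisation: dict) -> list:
    """List parameters that appear in both presets and executionOptions."""
    presets = parameterisation.get("presets", {})
    options = parameterisation.get("executionOptions", {})
    conflicts = []
    for kind in ("variables", "data"):
        # Assumed shape: each entry is a dictionary with a "name" key
        preset_names = {entry["name"] for entry in presets.get(kind, [])}
        option_names = {entry["name"] for entry in options.get(kind, [])}
        for name in sorted(preset_names & option_names):
            conflicts.append(f"{kind}: {name}")
    # platform may be given in one section only
    if "platform" in presets and "platform" in options:
        conflicts.append("platform")
    return conflicts


example = {
    "presets": {"variables": [{"name": "numberOfPoints", "value": "3"}]},
    "executionOptions": {"variables": [{"name": "numberOfPoints"}]},
}
print(find_conflicts(example))
```

The variable name `numberOfPoints` is invented for the example.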
For executionOptions the value of the parameter is resolved by taking the first of the following that exists:
- The value provided by the user
- The default value provided by the developer in the parameterised package if there is one
- The first value in the array of options provided by the developer in the parameterised package if there is one
- If none of the above exist the default value of the parameter in the base-package is used
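The resolution order for an execution option can be sketched as a simple fallback chain. The function `resolve_option` and its argument names are hypothetical; the sketch only models the order of precedence and does not check a user value against any restrictions the developer set.

```python
_MISSING = object()  # sentinel so that None or "" can be legitimate values


def resolve_option(user_value=_MISSING, package_default=_MISSING,
                   allowed_values=None, base_default=None):
    """Pick the value of an execution option following the order of precedence."""
    if user_value is not _MISSING:
        return user_value          # 1. value provided by the user
    if package_default is not _MISSING:
        return package_default     # 2. default set in the parameterised package
    if allowed_values:
        return allowed_values[0]   # 3. first entry of the developer's options
    return base_default            # 4. default from the base package


print(resolve_option(user_value="fast", allowed_values=["slow", "fast"]))
print(resolve_option(allowed_values=["slow", "fast"], base_default="medium"))
print(resolve_option(base_default="medium"))
```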
Presets
Use presets to define set values for parameters.
parameterisation:
  presets: # optional
    # Fields defined here *cannot* be overridden by `executionOptions`.
    # All fields are optional
    variables: # A list of preset values for variables in the virtual experiment
      - name: $name of variable
        value: $variableValue
    data:
      - name: name of a file in the "data" directory
Execution options
Use execution options to allow users to choose values for parameters if they want.
executionOptions: # optional
  # users may override values within constraints that workflow developers set
  variables:
    # Variables that the developer allows the user to override.
    # These CANNOT appear in presets.variables
    - name: $variable name
      # .value and .valueFrom are both optional and mutually exclusive
      # if neither field exists then users can set the variable to any value.
      # at start, if users do not provide a value, the variable receives the
The Metadata section
The metadata section contains 2 fields: package and registry. The first is used to provide various other information about the parameterised virtual experiment.
The latter contains metadata that the registry automatically populates.
The metadata.package section
Populate metadata.package to set information about your parameterised virtual experiment package that you would like your users to know:
metadata:
  package: # All the maintainer metadata. Can decide exact structure at implementation time.
    name: the package name
    tags: # Optional
      - latest # On Push, auto insert latest if missing
    maintainer: email (optional)
    license: some string (optional)
    keywords: # optional
      - keyword 1
The metadata.registry section
Read the metadata.registry section to get information that the ST4SD registry automatically extracts from your parameterised virtual experiment package:
digest: A uid of this parameterised virtual experiment package (PVEP) (see PVEP identifier)
createdOn: UTC time that this digest was created, format is %Y-%m-%dT%H%M%S.%f%z
tags: The tags associated with this PVEP. This is a SUBSET of metadata.package.tags. It can be EMPTY if no tag points to this digest anymore (see PVEP identifier)
timesExecuted: int - automatically increased every time a user launches this virtual experiment entry in the ST4SD deployment the registry is attached to
Note that the ST4SD registry manages all fields under the metadata.registry section; developers cannot directly modify this dictionary.
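The createdOn format string can be handed directly to Python's `datetime.strptime`. In this sketch the timestamp is made up to match the stated format; it is not taken from a real registry entry.

```python
from datetime import datetime

CREATED_ON_FORMAT = "%Y-%m-%dT%H%M%S.%f%z"

# Made-up example value that follows the format above
created_on = "2023-05-17T101530.123456+0000"

parsed = datetime.strptime(created_on, CREATED_ON_FORMAT)
print(parsed.isoformat())
```

Note that the time part has no colons between hours, minutes and seconds, so a generic ISO 8601 parser may not accept it without the explicit format string.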
Adding a parameterised package to a registry
Pushing the package
From a python dictionary
The parameterised package is stored as a dictionary in a Python module mypackage.py (can be any name). The dictionary is assigned to a variable (can be any name) e.g.
d = {"base": ...}
Then
import mypackage

# `api` is assumed to be an existing ST4SD API client object
api.api_experiment_push(mypackage.d)
From YAML
The parameterised package is stored as YAML in a file mypackage.yaml (can be any name).
import yaml

with open('mypackage.yaml') as f:
    # safe_load avoids the Loader argument that yaml.load requires in recent PyYAML versions
    api.api_experiment_push(yaml.safe_load(f))
From JSON
The parameterised package is stored as JSON in a file mypackage.json (can be any name).
import json

with open('mypackage.json') as f:
    api.api_experiment_push(json.load(f))
Registry actions when a package is pushed
On pushing a parameterised virtual experiment package, the registry:
- Generates a unique ID for the entry - see Parameterised Package Identifier
- Applies and updates tags - see Package Tags
- Stores any credentials as Kubernetes secrets and converts the relevant fields in the parameterised package to secretKeyRef and secretS3KeyRef types.
- Adds additional data to the parameterised package - see Registry metadata
Parameterised Package Identifier
When a parameterised package is pushed to the virtual experiment registry it is assigned a digest which is unique among all packages with the same package name (the value of metadata.package.name).
The unique identifier of the package is then $packageName@$digest. For example my-experiment@sha256x16092ca4bb13955b1397bf38cfba45ef11c9933bf796454a81de4f86
By convention the registry assumes parameterised packages with the same package name represent different versions of that package. These are collected together in the registry-ui, with the details of the most recent (last uploaded) package shown and links to all previous versions of the package.
Package Tags
Parameterised packages can have tags applied to them. A tag is a shorthand for referencing the package. For example, by adding the tag 1.0 to the package my-experiment@sha256x16092ca4bb13955b1397bf38cfba45ef11c9933bf796454a81de4f86 you can reference it as my-experiment:1.0 in various operations.
Developers can specify tags when pushing a package using the metadata.package.tags field of the package payload. Tagging a parameterised package with a tag removes the tag from any other parameterised package with the same name. This guarantees that if $packageName:$tag exists, it points to exactly one $packageName@$digest. The API call api_experiment_update_tags(packageIdentifier, tags) can also be used to add a tag to, or remove a tag from, a package at any time. Note that this call requires tags to include all the tags you want associated with the package. If an older version of this experiment has a tag which is not contained in this tag list then the tag will remain pointing to the older version of the experiment.
The current tags associated with a package can be found by inspecting the metadata.package.tags element of the package definition in the registry.
Example
A parameterised package with name my-experiment is pushed to the registry. It is given the digest sha256x16092ca4bb13955b1397bf38cfba45ef11c9933bf796454a81de4f86.
All 3 identifiers below point to the same digest:
my-experiment
my-experiment:latest
my-experiment@sha256x16092ca4bb13955b1397bf38cfba45ef11c9933bf796454a81de4f86
Any of these 3 identifiers can be used to refer to the new parameterised package in API calls. For example, to start an instance of this parameterised virtual experiment, all of the following will work:
api.api_experiment_start("my-experiment", payload={})
api.api_experiment_start("my-experiment:latest", payload={})
api.api_experiment_start("my-experiment@sha256x16092ca4bb13955b1397bf38cfba45ef11c9933bf796454a81de4f86", payload={})
Package tag update rules
If a tag is requested for a digest and that tag is already associated with another digest with the same package-name, then the registry updates $packageName:$tag to point to the new package. This ensures that $packageName:$tag points to a unique digest even if the workflow developers pushed the $tag in the past.
In general this operation involves updating the metadata.registry.tags fields of all parameterised packages with the same package-name.
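The tag update rule can be modelled with a small in-memory dictionary. This is only a toy model of the behaviour described above, not the registry's implementation: `push_with_tags` is a hypothetical function, the digests are placeholders, and the dictionary stands in for the metadata.registry.tags of every version of one package name.

```python
def push_with_tags(versions: dict, digest: str, tags: list) -> None:
    """Record a new version and move the requested tags onto it.

    versions maps digest -> set of tags, for a single package name.
    """
    new_tags = set(tags) | {"latest"}  # latest is inserted on push if missing
    # Remove the requested tags from every older version ...
    for old_tags in versions.values():
        old_tags.difference_update(new_tags)
    # ... so that each tag points to exactly one digest
    versions[digest] = new_tags


versions = {"sha256xaaa": {"latest", "foo"}}
push_with_tags(versions, "sha256xbbb", ["foo", "bar"])
print(versions)
```

After the second push the older digest is left with no tags, which corresponds to the case where metadata.registry.tags is empty.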
Registry metadata
The registry adds various information it discovers to the metadata section of the package under the registry key. This includes the id of the package.
Example
Here is an example parameterised package for the sum-numbers toy virtual experiment, which lives on git, that demonstrates many of the features discussed above.
definition = {
    "base": {
        # We define the one or more base-packages (here just one)
        "packages": [
            {
                "source": {
                    "git": {
                        "location": {
                            # This one lives on Git, under the "main" branch, we can also use
                            # "tag" and "commit"