The ST4SD Python Client API
Learn how to use the ST4SD Python client API to run, query, and interact with virtual experiments.
- Connecting to ST4SD
- Adding a virtual experiment package
- Running a virtual experiment
- Getting the status of a virtual experiment instance
- Inspect the metadata of a virtual experiment instance
- Retrieving the outputs of a virtual experiment instance
- Stopping a virtual experiment instance
- Next steps
Overview
We provide a Python API that makes it easy to use virtual experiments in an IPython notebook setting. The features the API enables include:
- launching virtual experiment instances
- monitoring virtual experiment instances
- downloading outputs and measured properties of virtual experiment instances
The Python API is a wrapper around a REST API. You can find the REST API documentation here.
In addition, we provide a Python API to our st4sd-datastore, which allows deeper querying and data retrieval from completed virtual experiments in a notebook.
Requirements
The basic requirements are access to an OpenShift instance with ST4SD installed (see first steps for more information).
Getting data into and out-of virtual experiments: Cloud Object Store
Additionally it can be useful to set up a Cloud Object Store bucket so you can easily get data into and out-of a virtual experiment.
See here for detailed instructions on how to do this with IBM Cloud.
Examples
We maintain a repository containing a set of IPython notebooks that illustrate interacting with virtual experiments via these two methods.
If you are using a local Jupyter server environment, execute this snippet to start the first notebook, which illustrates the REST API. Most of the examples discussed here are in this notebook.
git clone https://github.com/st4sd/st4sd-examples.git
cd st4sd-examples
jupyter-notebook notebooks/ST4SD\ Runtime\ API\ Example.ipynb
Note: If you’ve installed st4sd-runtime-core into a virtualenv you will need to activate it before executing the above snippet.
After reading this page, browse the st4sd-examples repository to see what topics are covered in the other notebooks.
Connecting to ST4SD
To connect to an ST4SD instance you need to obtain an auth-token or use an api-key. Please refer to How do I connect to the ST4SD runtime service?.
The following code blocks show how to connect to an ST4SD instance.
- Connect using an auth-token:
import experiment.service.db

# enter the https:// ST4SD url below
url = "https://${your ST4SD url}"
# enter your auth-token below
auth_token = "put your authentication/token - do not share it with anyone"
api = experiment.service.db.ExperimentRestAPI(url, cc_auth_token=auth_token)
- Connect using an api-key:
import experiment.service.db

# enter the https:// ST4SD url below
url = "https://${your ST4SD url}"
# enter your api-key below
bearer_key = "put your api-key - do not share it with anyone"
api = experiment.service.db.ExperimentRestAPI(url, cc_bearer_key=bearer_key)
The ExperimentRestAPI initializer validates the authentication token you provided and raises an exception if it is invalid. If no exception is raised, you can use api to interact with the st4sd-runtime-service and st4sd-datastore REST APIs.
The above code needs to be executed once per notebook session to get an api instance to interact with. All the following examples assume this step has been done.
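Hard-coding credentials in a notebook makes them easy to leak. The following is a minimal sketch of reading the connection settings from environment variables instead. The helper connection_kwargs and the variable names ST4SD_URL, ST4SD_AUTH_TOKEN and ST4SD_API_KEY are hypothetical; only the cc_auth_token and cc_bearer_key keyword arguments come from the examples above.

```python
import os


def connection_kwargs(env=None):
    """Return (url, kwargs) for ExperimentRestAPI from environment variables.

    ST4SD_URL, ST4SD_AUTH_TOKEN and ST4SD_API_KEY are hypothetical names
    chosen for this sketch; use whatever your deployment prefers.
    """
    env = os.environ if env is None else env
    url = env.get("ST4SD_URL")
    if not url:
        raise ValueError("ST4SD_URL is not set")
    # Prefer the api-key when both kinds of credential are present.
    if env.get("ST4SD_API_KEY"):
        return url, {"cc_bearer_key": env["ST4SD_API_KEY"]}
    if env.get("ST4SD_AUTH_TOKEN"):
        return url, {"cc_auth_token": env["ST4SD_AUTH_TOKEN"]}
    raise ValueError("set ST4SD_API_KEY or ST4SD_AUTH_TOKEN")


# Usage (requires the ST4SD client to be installed):
#   import experiment.service.db
#   url, kwargs = connection_kwargs()
#   api = experiment.service.db.ExperimentRestAPI(url, **kwargs)
```

Preferring the api-key over the auth-token is an arbitrary choice of this sketch.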
Adding a virtual experiment package
Before you can run a virtual experiment you need to add it to your ST4SD registry. You do this using the api_experiment_push() method.
Technically, you add a parameterised virtual experiment package. A parameterised package can define particular, potentially complex, values for variables in the virtual experiment to enable specific behaviour and make the experiment easier to consume.
You can either type the definition of the parameterised package manually or import it from another ST4SD registry (e.g. the global ST4SD registry). In the second case, the registry UI provides you with the exact api_experiment_push call you need to execute.
Running a virtual experiment
The API call api_experiment_start will start the virtual experiment that a parameterised virtual experiment package points to, e.g.,
rest_uid = api.api_experiment_start(experimentIdentifier, payload=...)
When you run a given virtual experiment you create a virtual experiment instance. Each instance is assigned a unique identifier, which is returned when you start the virtual experiment. We use the terms ExperimentRunID and rest_uid to refer to such identifiers.
The following sections explain how to fill the payload.
Specifying experiment inputs
inputs are files the experiment requires to run; they must be provided. Each experiment's documentation should explain what these files are.
The inputs are specified via the key inputs in the payload. The value of this key is a list that has one item, a dictionary, for each required input file, e.g.,
payload = {"inputs": [{...},...]}
Providing inputs via s3 or Datashim dataset
If your input file is in an s3 bucket or a Datashim dataset you use the s3 top-level key of the payload dictionary to provide details for accessing the bucket/dataset.
Example: Using s3. Fill the s3 parameters with the required values. In this case input_filename.csv is in the top-level of the bucket.
payload = {
    "inputs": [{"filename": "input_filename.csv"}],
    "s3": {
        "accessKeyID": "$S3_AccessKeyID",
        "secretAccessKey": "$S3_SecretAccessKey",
        "bucket": "$S3_BUCKET_NAME",
        "endpoint": "$S3_ENDPOINT",
    }
}
Example: Using Datashim. In this case input_filename.csv is at path data/input_filename.csv
payload = {
    "inputs": [{"filename": "data/input_filename.csv"}],
    "s3": {"dataset": "$MYDATASET_NAME"}
}
Providing input content directly
You can provide the content of input files directly in the payload using the content key:
import pandas as pd

data = pd.read_csv('mydata.csv')
payload = {
    "inputs": [{
        "content": data.to_csv(index=False),
        "filename": "input_filename.csv"
    }]
}
This specifies that the content of the input file input_filename.csv comes from the pandas DataFrame data.
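The content value is just a string, so pandas is not required. As a minimal sketch, the helper below (rows_to_input is a hypothetical name, and the column names are placeholders) serialises in-memory rows to CSV with the standard library and wraps them as one entry of inputs.

```python
import csv
import io


def rows_to_input(filename, header, rows):
    """Serialise rows to CSV text and wrap them as one entry of `inputs`."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return {"filename": filename, "content": buffer.getvalue()}


# Hypothetical example data: the column names are placeholders.
entry = rows_to_input("input_filename.csv", ["label", "value"], [["a", 1], ["b", 2]])
payload = {"inputs": [entry]}
```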
Specifying experiment data
data refers to experiment configuration files that may be overridden by the user.
The data files are specified via the key data in the payload. The value of this key is a list that has one item, a dictionary, for each data file you want to override, e.g.,
payload = {"data": [{...},...]}
The format of the data dictionary is identical to that of the input dictionary.
Providing data files via S3/Datashim or providing their content directly follows the same process as described for inputs. See those sections for details.
Specifying experiment variables
variables are optional parameters controlling the behaviour of the experiment, e.g. the number of CPUs. They are experiment specific, i.e. the same variables do not exist in all experiments, and variables controlling similar behaviour in two experiments may not have the same name.
variables are set using the variables key in the payload. The value of this key is a dictionary of variable-name, variable-value pairs.
payload = {
    # ... input/data options elided
    "variables": {
        "startIndex": 0,
        "numberMolecules": 1,
    }
}
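The inputs, data, variables and s3 sections described above all live in the same payload dictionary. A minimal sketch of assembling them, assuming a hypothetical helper make_payload that simply leaves out any section you did not provide:

```python
def make_payload(inputs=None, data=None, variables=None, s3=None):
    """Assemble a payload, leaving out any section that was not provided."""
    sections = {"inputs": inputs, "data": data, "variables": variables, "s3": s3}
    return {key: value for key, value in sections.items() if value}


payload = make_payload(
    inputs=[{"filename": "input_filename.csv"}],
    variables={"startIndex": 0, "numberMolecules": 1},
    s3={"dataset": "$MYDATASET_NAME"},
)
```

The resulting dictionary can then be passed as the payload argument of api.api_experiment_start.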
Payload Details
The following is the complete structure of the api_experiment_start() payload (in YAML). This includes some advanced options not discussed here.
platform: name of platform (optional - see parameterisation notes)
inputs: # optional (parameterised packages may have no inputs)
  # see notes for interaction with Dataset/S3
  - filename: str # required
    content: str # optional - see S3 notes
data: # optional (parameterised packages may have no overridable data files)
  # see notes for interaction with Dataset/S3
  - filename: str # required
    content: str # optional - see S3 notes
Notes
- inputs and data file-specifications have an optional content field. If this field is missing then the contents of the files are expected to exist on S3 or in a Dataset. When s3 exists then the filename field acts as the path inside the S3 bucket (or dataset) to use for reading the content of the input/data file.
- Dataset objects are only available if a cluster-admin has installed Datashim on the cluster.
- The fields additionalOptions, data, inputs, platform, and variables must adhere to parameterisation rules. See the parameterised package documentation for more information.
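The first note can be checked on the client before submitting a payload. The sketch below is only an illustration of that rule, not the validation the service itself performs; find_payload_problems is a hypothetical helper.

```python
def find_payload_problems(payload):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    has_s3 = bool(payload.get("s3"))
    for section in ("inputs", "data"):
        for position, spec in enumerate(payload.get(section, [])):
            if "filename" not in spec:
                problems.append(f"{section}[{position}] has no filename")
            # Without inline content the file must come from S3 or a Dataset.
            if "content" not in spec and not has_s3:
                problems.append(
                    f"{section}[{position}] has no content and the payload has no s3 section"
                )
    return problems
```

It does not check the parameterisation rules, which depend on the package you are running.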
Getting the status of a virtual experiment instance
A common user task is to check the status of a virtual experiment instance, for example to see whether it is still running or, if it has finished, whether there was an error.
The API method api_rest_uid_status returns the status of a given rest_uid:
# put here the rest_uid of the virtual experiment instance
rest_uid = "toxicity-predictions-trol7a"
status = api.api_rest_uid_status(rest_uid)
The variable status contains various information about the virtual experiment instance in addition to its status. The execution status is under the status key; you can inspect it by executing:
import json

print(json.dumps(status['status'], indent=2))
This will print a dictionary with the following keys (among others). The potential values of these keys are also described.
- experiment-state: Indicates the execution state of the experiment instance
  - Possible Values:
    - unscheduled: The experiment has not been scheduled to run yet. This can be due to lack of resources, which may resolve, or be a critical issue (unable to pull ST4SD images, unable to mount volumes)
    - unschedulable: Required pre-tasks for the experiment execution failed e.g. could not get workflow source, could not download s3 inputs.
    - Initialising: The experiment is starting up
    - running: The experiment has started running components
    - waiting_on_resource: A component in the active stage is waiting on resource
    - suspended: The workflow execution has been suspended
    - finished: The experiment is finished.
    - failed: Only set if the experiment encountered an error during initialisation (failed to run any steps of workflow after being started). For example, fail to parse arguments, fail to create directory structures.
- exit-status: Indicates how a completed experiment exited
  - This receives its final value after experiment-state transitions to either finished or, in rare circumstances, failed (see above). Value will be “N/A” or "" (empty) before this.
  - Possible Values:
    - Success: The experiment exited successfully
    - Failed: The experiment failed (at least one component)
    - Stopped: The experiment was stopped/killed
    - N/A: The experiment is running and doesn’t have an exit-status yet
    - "" (Empty): The experiment has not started.
- error-description: If the exit-status is Failed the value of this key is a string which explains the failure cause.
- total-progress: A number in [0.0, 1.0] indicating the progress of the experiment. Note that workflow developers may decide to control this value.
- current-stage: UID (e.g. stage0) of the active stage with the lowest stage index
- stage-state: Indicates the state of the active stage (a stage with a component running) with the lowest stage index. Value is one of ["Initialising", "finished", "waiting_on_resource", "running", "component_shutdown", "failed"]
- stage-progress: A number in [0.0, 1.0] indicating the progress of the active stage with the lowest stage index. Note that workflow developers may decide to control this value.
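Because exit-status only receives its final value once experiment-state is finished or failed, a notebook usually has to poll. The following is a minimal sketch of such a loop; wait_for_exit_status and its parameters are hypothetical, and the status getter is passed in as a callable (for example api.api_rest_uid_status) so the loop can be tried without a live service. Whether to also stop polling on unschedulable is left to you; this sketch does not.

```python
import time

# States after which exit-status holds its final value (from the list above).
TERMINAL_STATES = {"finished", "failed"}


def wait_for_exit_status(get_status, rest_uid, poll_seconds=30, max_polls=None,
                         sleep=time.sleep):
    """Poll until the instance reaches a terminal experiment-state.

    get_status: a callable such as api.api_rest_uid_status
    Returns the final value of the 'status' sub-dictionary.
    """
    polls = 0
    while True:
        state = get_status(rest_uid)["status"]
        if state.get("experiment-state") in TERMINAL_STATES:
            return state
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(
                f"{rest_uid} still {state.get('experiment-state')!r} after {polls} polls"
            )
        sleep(poll_seconds)


# Usage with a connected client:
#   final = wait_for_exit_status(api.api_rest_uid_status, rest_uid)
#   print(final["exit-status"])
```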
Here is an example of the status dictionary:
{
  "experiment-state": "finished",
  "total-progress": 1.0,
  "exit-status": "Success",
  "stages": ["Toxicity-prediction"],
  "current-stage": "Toxicity-prediction",
  "stage-state": "finished",
  "stage-progress": 1.0,
  "error-description": ""
}
This reports that:
- The orchestrator observed that the virtual experiment instance terminated (experiment-state=finished)
- The virtual experiment instance has produced all its outputs (total-progress=1.0)
- The virtual experiment instance completed successfully (exit-status=Success)
- The experiment had 1 stage (stages=["Toxicity-prediction"])
- Toxicity-prediction was the most recently executed stage with the lowest stage index (current-stage=Toxicity-prediction).
  - All its tasks terminated and they were all successful (stage-state=finished)
  - It reached its max progress (stage-progress=1.0)
- The virtual experiment instance did not raise any errors (error-description="")
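When you are tracking several instances, reading the whole dictionary each time gets tedious. As a small sketch, the hypothetical helper summarise_status below condenses the keys discussed above into one line.

```python
def summarise_status(state):
    """Turn the 'status' sub-dictionary into a one-line summary."""
    progress = state.get("total-progress")
    percent = "unknown" if progress is None else f"{progress * 100:.0f}%"
    line = (
        f"state={state.get('experiment-state')} "
        f"exit-status={state.get('exit-status') or 'N/A'} "
        f"progress={percent}"
    )
    error = state.get("error-description")
    if error:
        line += f" error={error}"
    return line


example = {
    "experiment-state": "finished",
    "total-progress": 1.0,
    "exit-status": "Success",
    "error-description": "",
}
print(summarise_status(example))
# prints: state=finished exit-status=Success progress=100%
```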
Inspect the metadata of a virtual experiment instance
In addition to the execution status information the status key also contains metadata on the experiment. To see it:
import json

print(json.dumps(status['status']['meta'], indent=2))
This will print a dictionary with the following keys:
- arguments: The command-line of the orchestrator
- data: The list of files that override data files
- input: The list of input files
- pid: The process ID of the st4sd orchestrator
- platform: The name of the platform that the virtual experiment instance executes on
- userVariables: User provided variables, the schema is {'global':{name:value}, 'stages':{index:{name:value}}}
- variables: Global and stage variables active in the platform-scope that the virtual experiment executes in. The schema is {'global':{name:value}, 'stages':{index:{name:value}}}
- hybridPlatform: Name of hybrid-platform for communicating with LSF (can be None)
- userMetadata: A dictionary with key(str): Any value pairs that users can provide
- instanceName: The name of the directory containing the virtual experiment instance.
- version: The version of the st4sd orchestrator
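The userVariables and variables entries share the schema {'global': {...}, 'stages': {index: {...}}}. As a sketch of reading it, the hypothetical helper below collects the variables that apply to one stage. It assumes that a stage-level value takes precedence over a global value of the same name; check this against your experiment before relying on it.

```python
def variables_for_stage(variables, stage_index):
    """Merge global variables with those of one stage.

    Assumption of this sketch: a stage-level value takes precedence over a
    global value of the same name. Stage indices are looked up both as int
    and as str because JSON object keys are always strings.
    """
    merged = dict(variables.get("global", {}))
    stages = variables.get("stages", {})
    stage = stages.get(stage_index, stages.get(str(stage_index), {}))
    merged.update(stage)
    return merged


# Hypothetical values for illustration only.
example = {"global": {"numberMolecules": 1}, "stages": {"0": {"numberMolecules": 4}}}
stage0 = variables_for_stage(example, 0)

# Usage with a real status dictionary:
#   variables_for_stage(status['status']['meta']['variables'], 0)
```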
Retrieving the outputs of a virtual experiment instance
There are multiple ways to retrieve outputs of virtual experiments: via a virtual experiment interface, key-outputs, the Datastore APIs, and by leveraging ST4SD’s automated upload to S3.
- Retrieving the properties measured by an experiment
- Retrieving key-outputs
- Automatically uploading key-outputs to S3
- Listing outputs produced by virtual experiment components
- Retrieving outputs via the ST4SD Datastore APIs
Retrieving the properties measured by an experiment
Some virtual experiments define interfaces which make it simple for users to retrieve the input systems and measured properties from runs of that virtual experiment.
Learn how to use virtual experiment interfaces here.
Retrieving key-outputs
Key-outputs are files produced by an experiment that the developer has flagged as being of special interest. Since file names can be generic, the developer gives each key-output a descriptive label to better explain what it is.
Information on the key-outputs of a virtual experiment instance is stored in the dictionary returned by api_rest_uid_status:
import pprint

status = api.api_rest_uid_status(rest_uid)
pprint.pprint(status['outputs'])
An example of the output of this is:
{'OptimisationResults': {'creationtime': '1669584128.077387',
                         'description': '',
                         'filename': 'energies.csv',
                         'filepath': 'stages/stage1/ExtractEnergies/energies.csv',
                         'final': 'yes',
                         'production': 'yes',
                         'type': '',
                         'version': '1'}}
This says the experiment has one key-output called OptimisationResults. This refers to a file energies.csv produced by the component ExtractEnergies in the given experiment.
To retrieve a key-output we can use api_rest_uid_output. This method returns the contents as bytes, so they need to be converted to a string. Note: replace OptimisationResults with the name of a key-output relevant to an experiment you have run.
import io

import pandas as pd

filename, contents = api.api_rest_uid_output(rest_uid, 'OptimisationResults')
contents = contents.decode('utf-8', 'ignore')
# read it into a pandas dataframe
df = pd.read_csv(io.StringIO(contents), sep=",", skipinitialspace=True)
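If an experiment defines several key-outputs, the two calls above can be combined to fetch them all. The following is a sketch under the assumption that every key-output is a text file; fetch_key_outputs is a hypothetical helper, and binary outputs would be damaged by the lossy decode.

```python
def fetch_key_outputs(api, rest_uid, encoding="utf-8"):
    """Return {label: text} for every key-output the instance reports."""
    outputs = api.api_rest_uid_status(rest_uid).get("outputs", {})
    texts = {}
    for label in outputs:
        # api_rest_uid_output returns (filename, contents-as-bytes)
        _filename, contents = api.api_rest_uid_output(rest_uid, label)
        texts[label] = contents.decode(encoding, "ignore")
    return texts


# Usage with a connected client:
#   texts = fetch_key_outputs(api, rest_uid)
#   print(sorted(texts))
```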
Automatically uploading key-outputs to S3
ST4SD supports automatically uploading key-outputs to S3.
To enable this feature, include the following values in the payload you provide to api.api_experiment_start(experimentId, payload) (see Running a virtual experiment for more information).
The following example stores the key-outputs under run1_output in a bucket called my-bucket:
"s3Store": {
    "credentials": {
        "accessKeyID": "$S3_AccessKeyID",
        "secretAccessKey": "$S3_SecretAccessKey",
        "endpoint": "$S3_ENDPOINT",
        "region": "$S3_Region",
        "bucket": "my-bucket"
    },
    "bucketPath": "/run1_output/"
}
Listing outputs produced by virtual experiment components
Before we can retrieve the contents of the output files produced by the virtual experiment components, we need to know their paths. We start by retrieving the list of components that were part of our experiment through these two calls:
metadata = api.cdb_get_user_metadata_document_for_rest_uid(rest_uid)
components = api.cdb_get_document_component(instance=metadata['instance'])
The full paths of the output files produced by each component are then available under the files key.
We can access it for component 0 as follows:
component_0_files = components[0]['files']
Retrieving outputs via the ST4SD Datastore APIs
To retrieve outputs we use the cdb_get_file_from_instance_uri function.
This function, however, expects relative paths instead of full ones. We can convert our list of paths with this list comprehension:
component_0_files_relative = [
    component_file[component_file.index('/stages/'):]
    for component_file in component_0_files
]
Here we show how to retrieve the first file produced by component 0 from the previous example:
data = api.cdb_get_file_from_instance_uri(metadata['instance'], component_0_files_relative[0])
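Note that str.index raises ValueError when a path does not contain /stages/. A slightly more defensive sketch uses str.find and skips such paths; relative_to_instance is a hypothetical helper and the paths below are made up for illustration.

```python
def relative_to_instance(path, marker="/stages/"):
    """Return the part of path starting at marker, or None if marker is absent."""
    start = path.find(marker)
    return None if start < 0 else path[start:]


# Hypothetical full paths for illustration only.
paths = [
    "/tmp/workdir/demo.instance/stages/stage0/Component/out.csv",
    "/tmp/workdir/demo.instance/output/log.txt",
]
relative = [p for p in map(relative_to_instance, paths) if p is not None]
```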
Stopping a virtual experiment instance
The API call to cancel and delete a virtual experiment instance is api_rest_uid_delete. Use this if you want to stop a run for any reason.
The api_rest_uid_delete() method causes the run, and any components that are still running, to terminate. It also deletes any Kubernetes objects that were created for the run, so you may also use it to clean up those objects. It does not affect the files that the virtual experiment instance has already produced.
# put here the rest_uid of the virtual experiment instance
rest_uid = "toxicity-predictions-trol7a"
api.api_rest_uid_delete(rest_uid)
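If you only want to interrupt runs that are still active, you can check the status first. The sketch below assumes that finished and failed are the states in which nothing is left to stop; stop_if_active is a hypothetical helper built on the two API methods shown in this page.

```python
def stop_if_active(api, rest_uid):
    """Delete the instance only if it has not reached a terminal state.

    Returns True if api_rest_uid_delete was called.
    """
    state = api.api_rest_uid_status(rest_uid)["status"].get("experiment-state")
    if state in ("finished", "failed"):
        return False
    api.api_rest_uid_delete(rest_uid)
    return True


# Usage with a connected client:
#   stopped = stop_if_active(api, rest_uid)
```

Because this skips completed instances, their Kubernetes objects are left in place; call api_rest_uid_delete directly when you want to clean those up as well.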
Next steps
To learn how to get properties from a virtual experiment, see: Using a virtual experiment interface.
To learn how to interact with virtual experiments from the terminal, see: ST4SD and the OpenShift CLI.