Use this page to learn how to write a virtual experiment interface.

Interface Definition
Input Extraction Methods
Property Extraction Methods
Example

A core concept in ST4SD is the virtual experiment: a computational workflow that takes as input one or more systems of a given type and produces as output the values of properties of those systems.

This document describes how ST4SD developers can specify this information for their virtual experiments via an interface.

The interface of a virtual experiment defines:

The specification used to describe the input systems it processes, e.g. SMILES for small molecules
Instructions to extract the input systems from input data
Instructions to extract the values of properties that the virtual experiment computes

Once a virtual experiment has an interface, ST4SD can return a pandas.DataFrame containing the properties calculated by instances of the virtual experiment, as well as the ids of the input systems that each instance processed. This functionality is provided via the st4sd-datastore API and the st4sd-runtime-service API. See using a virtual experiment interface for further information.

Interface Definition

An interface is an optional top-level FlowIR key which describes the inputs and properties of a virtual experiment, as well as how to extract their values. For experiments using DSL, the output and interface fields are direct children of the entrypoint field instead. You can find an example here.

See the tutorial for a refresher on virtual experiment definitions and FlowIR.

The general scheme of an interface is

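interface:
  description: #A description of the virtual experiment. Optional
  inputSpec:
    namingScheme: #The scheme/specification used to define your inputs e.g. SMILES
    inputExtractionMethod: 
      $INPUT_EXTRACTION_METHOD_NAME: #The name of an input extraction method - see the "Input Extraction Methods" section for possibilities
        source:  #Optional source method used to provide input to the extraction method. See the "Source methods" section for potential values.
          ...
        args: #Optional arguments for the extraction method
  propertiesSpec:
  - name: #The name of the property
    propertyExtractionMethod:
      $PROPERTY_EXTRACTION_METHOD_NAME: #The name of a property extraction method - see the "Property Extraction Methods" section for possibilities
        source:  #Optional source method used to provide input to the extraction method
          ...
        args: #Optional arguments for the extraction method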

The 2 main fields are:

interface.inputSpec: A dictionary that describes the inputs of the virtual experiment and how to extract them
interface.propertiesSpec: An array of dictionaries (one per property) that describes how to extract the values of the property

Within both fields the developer defines extraction methods which tell ST4SD how to extract values that the virtual experiment reads (input ids) and writes (property values).

See input extraction methods for details on choices for that field
See property extraction methods for details on choices for that field

Both input extraction methods and property extraction methods can have 2 sub-fields, source and args, either of which may be optional. If the source field is present, it must use one of the options outlined in source methods.
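The rule above can be checked mechanically. The sketch below is not part of ST4SD: `check_extraction_method` is a hypothetical helper that takes the dictionary found under inputExtractionMethod or propertyExtractionMethod (already parsed from YAML) and checks its shape. It assumes that exactly one extraction method is named and that a source names exactly one of the two source methods documented on this page.

```python
from typing import Any, Dict

# Source methods documented on this page.
SOURCE_METHODS = {"path", "keyOutput"}


def check_extraction_method(method_spec: Dict[str, Any]) -> str:
    """Return the extraction method name, or raise ValueError.

    Hypothetical helper for illustration; ST4SD performs its own validation.
    """
    # Assumption: exactly one extraction method is named per spec.
    if len(method_spec) != 1:
        raise ValueError("expected exactly one extraction method")
    name, options = next(iter(method_spec.items()))
    options = options or {}

    # Only the two documented sub-fields are expected.
    unknown = set(options) - {"source", "args"}
    if unknown:
        raise ValueError(f"unexpected sub-fields: {sorted(unknown)}")

    source = options.get("source")
    if source is not None:
        # Assumption: a source names exactly one source method.
        if len(source) != 1 or not set(source) <= SOURCE_METHODS:
            raise ValueError(f"source must be one of {sorted(SOURCE_METHODS)}")
    return name


spec = {"csvColumn": {"source": {"path": "input/input_smiles.csv"},
                      "args": {"column": "SMILES"}}}
method_name = check_extraction_method(spec)
print(method_name)
```

A check like this is mainly useful for catching typos in field names before an experiment is launched.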

Input Extraction Methods

Input extraction methods are used by ST4SD to retrieve a list of the input system ids.

csvColumn

Use the csvColumn extraction method if the input ids of your experiment are defined in a column of an input CSV file which has column headers.

Options

source:
  path: #The path source method. See the "Source methods" section for more
args:
  column: #The name of the column in the CSV file containing the ids (the column header)

Example

interface:
  inputSpec:
    namingScheme: 'SMILES'
    inputExtractionMethod: 
      csvColumn:
        source:
          path: 'input/input_smiles.csv'
        args:
          column: "SMILES"
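Conceptually, csvColumn reads the CSV file and returns the values in the named column. The sketch below illustrates that idea with Python's standard csv module. It is not ST4SD's implementation: `read_id_column` is a hypothetical name, the comma delimiter is an assumption, and the sample rows are made up.

```python
import csv
import io
from typing import List, TextIO


def read_id_column(csv_file: TextIO, column: str, delimiter: str = ",") -> List[str]:
    """Return the values of `column` from a CSV file that has a header row."""
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"no column named {column!r} in CSV header")
    return [row[column] for row in reader]


# Stand-in for input/input_smiles.csv; the two rows are illustrative only.
example_csv = io.StringIO("SMILES,label\nCCO,ethanol\nC,methane\n")
input_ids = read_id_column(example_csv, column="SMILES")
print(input_ids)
```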

hookGetInputIds

Use hookGetInputIds when you want to provide your own python function for getting the input ids.

To use this method the developer must provide an implementation of the following python function and place it in a file called interface.py in the hooks directory of their virtual experiment. Note: this file can contain other functions also.

from typing import Dict, List


def get_input_ids(input_id_file: str, variables: Dict[str, str]) -> List[str]:
   '''
       Params: 
            input_id_file (str): The path to the location of the file that contains input ids of the input systems. This comes from the `source.path` option in the interface YAML.
            variables (dict): A dictionary of the global and user variables passed to the virtual experiment instance
            
       Returns: 
            A list of strings each of which is the id of an input system
   '''

Options

source:
  path: #A path relative to the root directory of the virtual experiment instance. It points to the CSV file that contains the `input-ids`.

Example

interface:
  inputSpec:
    namingScheme: 'SMILES'
    inputExtractionMethod: 
      hookGetInputIds:
        source:
          path: 'input/input_smiles.csv'

The band-gap-gamess virtual experiment uses hookGetInputIds to describe the extraction of input ids.
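As an illustration only, a hooks/interface.py for the example above might implement get_input_ids as follows. The column name SMILES and the `smiles_column` variable used to override it are assumptions made for this sketch; they are not taken from the band-gap-gamess experiment.

```python
import csv
from typing import Dict, List


def get_input_ids(input_id_file: str, variables: Dict[str, str]) -> List[str]:
    """Return the input ids stored in one column of a CSV file."""
    # Hypothetical variable letting users pick the column; defaults to SMILES.
    column = variables.get("smiles_column", "SMILES")
    with open(input_id_file, newline="") as f:
        reader = csv.DictReader(f)
        # Skip rows with an empty id so blank lines do not become inputs.
        return [row[column].strip() for row in reader
                if (row.get(column) or "").strip()]
```

Writing a hook like this only pays off when the ids need some processing, such as filtering or normalisation, that csvColumn does not offer.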

Property Extraction Methods

Property extraction methods conceptually produce a properties table which contains at least 2 columns: (input-id, $propertyName), where $propertyName is the name of the property in the propertiesSpec element using the extraction method. Note: in practice $propertyName will be transformed to lowercase.

csvDataFrame

Use this method if

There is a single CSV file from which to extract the values of a particular property for all inputs
The properties are stored in a column of this CSV file
The input ids are stored in a column of this CSV file

Note:

The table created by this method must have column headers input-id and $PROPERTYNAME. The csvDataFrame property extractor can change the column names to these correct values using the renameColumns option (see Example)

Options

source:
   $SOURCE_METHOD_NAME: #Name of the source method and its options. See the "Source methods" section.
args:
   renameColumns: #Optional: Dictionary whose keys are column names in the CSV file and whose values are the new names for those columns. Output column names are implicitly converted to `lowercase`
   `${name}: ${value}`: #(Optional) Arguments to the `pandas.read_csv()` method. The default arguments are `engine="python"` and `sep=None`.

Example

propertiesSpec:
- name: 'band-gap'
  propertyExtractionMethod:
    csvDataFrame:
      source:
        keyOutput: 'FinalEnergies'
      args:
        renameColumns:
          SMILE: "input-id"
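The sketch below approximates, with plain pandas calls, what the options above describe: read the CSV with the default arguments, rename columns, lowercase the headers and keep the two required columns. It illustrates the documented behaviour under those assumptions and is not ST4SD's implementation; `extract_property` is a hypothetical name and the sample data uses placeholder values.

```python
import io
from typing import Dict, Optional

import pandas as pd


def extract_property(csv_source, property_name: str,
                     rename_columns: Optional[Dict[str, str]] = None,
                     **read_csv_args) -> pd.DataFrame:
    """Build an (input-id, property) table from one CSV source."""
    # Documented defaults: the python engine with sep=None sniffs the delimiter.
    options = {"engine": "python", "sep": None}
    options.update(read_csv_args)
    table = pd.read_csv(csv_source, **options)
    table = table.rename(columns=rename_columns or {})
    # Output column names are converted to lowercase.
    table.columns = [str(c).lower() for c in table.columns]
    return table[["input-id", property_name.lower()]]


# Placeholder contents for the FinalEnergies key-output (not real band gaps).
data = io.StringIO("SMILE;band-gap\nCCO;1.0\nC;2.0\n")
table = extract_property(data, "band-gap", rename_columns={"SMILE": "input-id"})
print(table)
```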

hookGetProperties

Use hookGetProperties when you want to provide your own python function for getting the property values.

from typing import Dict

import pandas


def get_properties(property_name: str, property_output_file: str, input_id_file: str, variables: Dict[str, str]) -> pandas.DataFrame:
   '''
       Params: 
            property_name (str): The name of the property the function should return the values of.  
            property_output_file (str): The path to the file containing the properties 
            input_id_file (str): The path to the file containing the input_ids
            variables (dict): A dictionary of the global and user variables passed to the virtual experiment instance
            
       Returns: 
            A pandas.DataFrame with at least the columns input-id and the name of the property
   '''

If hookGetProperties is defined as the propertyExtractionMethod for property idx, the values passed to the parameters of this function are determined as follows:

property_name: The value of interface.propertiesSpec[idx].name
property_output_file: The value returned by the interface.propertiesSpec[idx].propertyExtractionMethod.hookGetProperties.source method
input_id_file: The value of interface.inputSpec.inputExtractionMethod.$METHOD.source

Note: The column headers in the returned pandas DataFrame will be converted to lowercase by ST4SD.

Options

hookGetProperties:
  source: #A source method - see below for details

Example

propertiesSpec:
- name: 'band-gap'
  propertyExtractionMethod:
    hookGetProperties:
      source:
        keyOutput: 'FinalEnergies'

The band-gap-gamess virtual experiment uses hookGetProperties to describe the extraction of properties.
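For illustration, a get_properties hook might look like the following sketch. It assumes the property output file is a CSV with a header row, an id column named `label` and one column per property. Those names are hypothetical and are not taken from band-gap-gamess.

```python
from typing import Dict

import pandas


def get_properties(property_name: str, property_output_file: str,
                   input_id_file: str, variables: Dict[str, str]) -> pandas.DataFrame:
    """Return a table with the columns input-id and `property_name`."""
    # sep=None with the python engine lets pandas detect the delimiter.
    raw = pandas.read_csv(property_output_file, engine="python", sep=None)
    # Assumption: ids are stored in a column called "label" (hypothetical name).
    table = raw.rename(columns={"label": "input-id"})
    # input_id_file and variables are not needed in this simple sketch.
    return table[["input-id", property_name]]
```

Because ST4SD lowercases the column headers of the returned DataFrame, the hook does not need to do so itself.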

Source methods

Source methods are different ways of specifying the path of the source file that an input or property extraction method reads.

path

Use this method if you know the full path of the source file.

Options

path: $PATH #A path relative to the root directory of the virtual experiment instance. It points to the source file.

Example

propertyExtractionMethod:
  hookGetProperties:
    source:
      path: "stages/stage1/EnergiesExtraction/energies.csv"

keyOutput

Use this method if the properties are in a key-output of the experiment. This method avoids having to know the path to the file, which could change if storage methods change.

Options

# The name of a key-output in the experiment.
# These are keys of the top-level FlowIR field `output`.
keyOutput: $KEYOUTPUT

Example

propertyExtractionMethod:
  hookGetProperties:
    source:
      keyOutput: "FinalEnergies"
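To make the relationship between a key-output name and a file path concrete, the sketch below looks a name up in an `output` field and turns its data-in reference into a relative path. This is a rough illustration, not how ST4SD resolves key-outputs: `key_output_to_path` is a hypothetical helper, and it assumes references of the form `stageN.component/file:ref` and a `stages/stageN/component/` directory layout like the one in the path example above.

```python
import posixpath
from typing import Dict

# The `output` field of the vowel-counting example in the Example section.
OUTPUT = {
    "vowels": {"data-in": "stage0.count-vowels/vowels.csv:ref"},
    "letters": {"data-in": "stage0.count-letters/letters.csv:ref"},
}


def key_output_to_path(output: Dict[str, Dict[str, str]], key: str) -> str:
    """Turn a key-output name into a path relative to the instance root.

    Hypothetical helper: assumes `stageN.component/file:ref` references and
    that component files live under stages/stageN/component/.
    """
    reference = output[key]["data-in"]
    # Drop the trailing reference method (":ref").
    location, _, _method = reference.rpartition(":")
    # Split the producer ("stage0.count-vowels") from the file path.
    producer, _, file_path = location.partition("/")
    stage, _, component = producer.partition(".")
    return posixpath.join("stages", stage, component, file_path)


print(key_output_to_path(OUTPUT, "vowels"))
```

With keyOutput the interface only names the key, so a lookup like this stays out of the interface definition.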

Example

In this example we have a simple virtual experiment that counts vowels and letters in strings. Here is the FlowIR definition:

output:
  vowels:
    data-in: stage0.count-vowels/vowels.csv:ref
  letters:
    data-in: stage0.count-letters/letters.csv:ref

components:
- name: count-vowels
  references:

Here is an input words.csv file:

word;
hello;
awesome;
world;

When we process the above input file with this workflow we get 2 outputs:

The output vowels contains the CSV file:

a;e;i;o;u;word;vowels
0;1;0;1;0;hello;2
1;2;0;1;0;awesome;4
0;0;0;1;0;world;1

The output letters contains the CSV file:

word;letters
hello;5
awesome;7
world;5

Interface

An interface to this experiment is shown below. This interface uses the csvColumn input extraction method and the csvDataFrame property extraction method. With these methods the developer does not have to write any other code.

interface:
  description: Counts vowels in words
  inputSpec:
    namingScheme: words
    inputExtractionMethod:
      csvColumn:
        source:
          path: input/words.csv
        args:

Run Details

Adding the interface definition will cause instances of the virtual experiment to generate 2 new files:

${INSTANCE_DIR}/output/properties.csv: A ;-delimited CSV file that contains the property columns produced by each property defined in propertiesSpec.
${INSTANCE_DIR}/output/input-ids.json: A JSON file that contains an array of strings. Each string is the id of an input system.

For the above example we would get the following in ${INSTANCE_DIR}/output/properties.csv:

input-id;vowels;letters
hello;2;5
awesome;4;7
world;1;5
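The properties.csv above can be reproduced by hand from the two key-outputs shown earlier: take the property column from each table and join the tables on the input id. The sketch below does this with the standard library, assuming the word column is the one that serves as input-id. It illustrates the result, not how ST4SD computes it, and `property_column` is a hypothetical helper.

```python
import csv
import io
from typing import Dict

VOWELS_CSV = """a;e;i;o;u;word;vowels
0;1;0;1;0;hello;2
1;2;0;1;0;awesome;4
0;0;0;1;0;world;1
"""
LETTERS_CSV = """word;letters
hello;5
awesome;7
world;5
"""


def property_column(text: str, id_column: str, prop: str) -> Dict[str, str]:
    """Map each input id to the value of one property column."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return {row[id_column]: row[prop] for row in reader}


vowels = property_column(VOWELS_CSV, "word", "vowels")
letters = property_column(LETTERS_CSV, "word", "letters")

# Join the two property columns on the input id, keeping the input order.
lines = ["input-id;vowels;letters"]
lines += [f"{word};{vowels[word]};{letters[word]}" for word in vowels]
properties_csv = "\n".join(lines)
print(properties_csv)
```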

The input ids file (${INSTANCE_DIR}/output/input-ids.json) looks like this:

[
    "hello",
    "awesome",
    "world"
]
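A quick consistency check between the two files can be sketched as follows. So that the snippet is self-contained, it writes the example contents to a temporary directory standing in for the instance's output location (an assumption of this sketch), then confirms that every id in input-ids.json has a row in properties.csv.

```python
import csv
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)  # stand-in for the directory holding the two files
    (out_dir / "properties.csv").write_text(
        "input-id;vowels;letters\nhello;2;5\nawesome;4;7\nworld;1;5\n")
    (out_dir / "input-ids.json").write_text(
        json.dumps(["hello", "awesome", "world"]))

    # Load the list of input ids and the rows of the properties table.
    input_ids = json.loads((out_dir / "input-ids.json").read_text())
    with open(out_dir / "properties.csv", newline="") as f:
        rows = list(csv.DictReader(f, delimiter=";"))

# Input ids that have no row in the properties table.
measured = {row["input-id"] for row in rows}
missing = [i for i in input_ids if i not in measured]
print(missing)
```

An empty list means every input system has a row of property values.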
