Sweeps in YAML

Objective

This page documents the Parametric Sweep feature. The objective is to show how to declare and run parameter sweeps that produce typed collections by iterating over independent variables, how to write safe parametric expressions that compute parameters for the element DataSource, and how to use context values as inputs to a sweep.

Overview

A sweep is a DataSource that constructs a collection (for example, an image stack or a list of numeric values) by repeatedly invoking an underlying element DataSource with different parameters. Sweeps help you generate systematic variations (grids, series, or element-wise combinations) without writing bespoke code.

Key concepts and parameters

  • vars (required): A mapping of independent variable names to one of the variable specification types documented below. Each variable becomes an input to parametric expressions and (optionally) is forwarded to the element DataSource as an independent parameter.

  • parametric_expressions (optional but common): A mapping from element parameter names to small expressions (strings). Each expression is evaluated for every sweep step and its result is passed to the element DataSource under the parameter name. Expressions use a safe AST-based evaluator (see “Notes on expressions” below). Example:

    parametric_expressions:

    value: “50 + 20 * t”

    This will evaluate 50 + 20 * t for each value of t and set the element parameter value accordingly.

  • static_params (optional): A mapping of parameters with fixed values passed unchanged to every element invocation.

  • include_independent (optional boolean): When true, the independent variables from vars will be forwarded to the element DataSource as parameters (in addition to any parametric expression outputs).

  • mode: product (default) computes the Cartesian product of variable sequences; zip pairs values element-wise (sequences must match lengths unless broadcast is true).

  • broadcast (boolean, only for zip mode): If true, shorter sequences are repeated to match the longest sequence length.

Variable specification types

  • RangeSpec (YAML shorthand): Numeric range generation.

    YAML form: t: { lo: 0.0, hi: 1.0, steps: 5 }

    Produces a numeric sequence of steps values between lo and hi. Options: scale: linear or log; endpoint includes upper bound if true.

  • SequenceSpec (YAML shorthand): Explicit sequence.

    YAML form: file: ["a.csv", "b.csv"]

    Use when you have an explicit list of values.

  • FromContext (YAML shorthand): Read a sequence from pipeline context.

    YAML form: sha: { from_context: commit_shas }

    Use when an earlier processor placed a sequence into the pipeline context. The factory will validate the context entry is a non-empty, non-string sequence at runtime and expose the required context key to the inspection system.

Modes

  • product: Cartesian product of sequences. If you have t=[a,b] and p=[1,2] you’ll get four combinations: (a,1),(a,2),(b,1),(b,2).

  • zip: Element-wise pairing. With zip and broadcast=false the sequences must have equal length, and each step uses the corresponding elements. With broadcast=true shorter sequences are repeated.

Basic example

pipeline:
  nodes:
    - processor: "sweep:FloatValueDataSource:FloatDataCollection"
      parameters:
        vars:
          t: {lo: 0.0, hi: 1.0, steps: 5}
        parametric_expressions:
          value: "t * 2"
        include_independent: true

Explanation

With steps: 5 the range t becomes the sequence [0.0, 0.25, 0.5, 0.75, 1.0] (linear spacing). The expression "t * 2" is evaluated for each t producing [0.0, 0.5, 1.0, 1.5, 2.0] which are passed as the value parameter to the element DataSource. If include_independent is true, the element will also receive t as a parameter along with value.

From context & zip mode

pipeline:
  nodes:
    - processor: "sweep:AnalysisDataSource:AnalysisCollection"
      parameters:
        vars:
          file: {from_context: discovered_files}
          p:    {from_context: parameter_sets}
        parametric_expressions:
          name: "f'{file}_{p}'"
        mode: "zip"
        broadcast: true

Explanation

Suppose the pipeline context contains discovered_files: ["a.tif", "b.tif"] and parameter_sets: ["A", "B", "C"]. With mode: zip and broadcast: true the shorter sequence (discovered_files) is repeated to match the longer sequence, producing steps:

  • step 0: file=”a.tif”, p=”A” -> name=”a.tif_A”

  • step 1: file=”b.tif”, p=”B” -> name=”b.tif_B”

  • step 2: file=”a.tif”, p=”C” -> name=”a.tif_C”

Each name is produced by evaluating the parametric expression f'{file}_{p}' for that step and passed to the element DataSource.

Inspection

  • Sweep sources require any FromContext keys; they appear in semantiva inspect.

  • Sweep sources create {var}_values context keys for downstream processors.

API reference (short)

The factory exposes three small helper types you may use in YAML or the programmatic API:

  • RangeSpec(lo, hi, steps, scale='linear', endpoint=True) - produce a numeric range.

  • SequenceSpec([...]) - provide an explicit sequence of values.

  • FromContext('key') - read a sequence from the pipeline context.

When using the programmatic API, call:

ParametricSweepFactory.create(
    element=MyElementDataSource,
    collection_output=MyCollectionType,
    vars={ 't': RangeSpec(0,1,steps=5), 'file': SequenceSpec([...]) },
    parametric_expressions={ 'x': '50 + 20 * t', 'name': "'img_' + str(t)" },
    mode='product'|'zip',
    include_independent=True|False,
)

Notes on expressions

  • Expressions are parsed using a safe AST-based evaluator. This means:
    • No arbitrary eval or execution of imports.

    • Only simple function calls are allowed (abs, min, max, round, and the type conversions float/int/str/bool).

    • Tuples are supported which makes multi-valued parameters possible, e.g. "(50 + 20 * t, 20)" returns a 2-tuple for a multi-valued parameter.

    • Unknown variables or disallowed syntax raise clear errors at compile time.

Good practices

  • Prefer explicit SequenceSpec or RangeSpec in YAML for readability.

  • Use FromContext when sweep values are produced earlier in the same pipeline. The inspection output will list these required context keys.

  • Use mode: zip when you want element-wise pairing. If sequences have differing lengths and you still want element-wise operation, set broadcast: true to repeat shorter sequences.

Examples

Tuple output (multi-valued parameter):

pipeline:
  nodes:
    - processor: "sweep:TwoDGaussianSingleChannelImageGenerator:SingleChannelImageStack"
      parameters:
        vars:
          t: {lo: -1, hi: 2, steps: 3}
        parametric_expressions:
          x_0: "50 + 5 * t"
          y_0: "50 + 5 * t + 5 * t ** 2"
          std_dev: "(50 + 20 * t, 20)"  # tuple -> (std_dev_x, std_dev_y)
          amplitude: "100"
          angle: "60 + 5 * t"

Explanation

The expression "(50 + 20 * t, 20)" evaluates to a tuple for each t. For example, if t takes values [-1, 0, 1] then std_dev expands to the tuples [(30,20), (50,20), (70,20)]. The element DataSource must accept whatever parameter names you use (here the factory will forward the evaluated std_dev value under that parameter name). If your element understands separate std_dev_x and std_dev_y you can either emit those as separate expressions or unpack the tuple in the element implementation.

FromContext example with type conversion in expressions:

pipeline:
  nodes:
    - processor: "sweep:FloatValueDataSource:FloatDataCollection"
      parameters:
        vars:
          input_value: { from_context: discovered_values }
        parametric_expressions:
          value: "float(input_value)"

Explanation

If the context key discovered_values contains strings like ["1.5", "2.75"], the expression "float(input_value)" converts each string to a floating point value resulting in [1.5, 2.75] which are then passed to the element DataSource as the value parameter.

Expressions

  • Expressions use a safe evaluator (no eval).

  • Allowed: declared variable names, ops (+, -, , /, *), tuples, and functions: abs, min, max, round, float, int, str, bool.

  • Clear errors for unknown variables or disallowed syntax.

  • Tuple expressions like "(x + 1, y * 2)" are supported for multi-value parameters.

  • Type conversion functions like "float(input_value)" are supported for data type conversion.