Collection-based modifiers

Semantiva provides two families of helpers for working with data collections:

  • slicers, which apply a processor element-wise to an existing collection, and

  • derive-based parameter sweeps, which build collections by repeatedly invoking a processor with different parameter values.

These helpers are optional. Processors can always be implemented directly for collection types, but slicers and sweeps remove boilerplate loops and make the structure of collection processing visible in inspection and trace records.

For an overview of collection types themselves, see Data collections.

Slicers

Slicers are generated by the factory in semantiva.data_processors.data_slicer_factory. Given a data processor that works on a single element, the factory creates a new processor class that operates on a DataCollectionType by iterating over its elements.

The factory supports both data operations and probes:

  • DataOperation-based slicers

  • DataProbe-based slicers

DataOperation slicers

For data operations, the slicer:

  • consumes a collection of elements,

  • applies the wrapped operation to each element in sequence, and

  • returns a new collection of the same collection type with the processed elements.

The underlying processor must have matching input and output element types; the slicer preserves the collection type and ordering. Under the hood this is implemented by dynamically subclassing the original operation and overriding input_data_type / output_data_type to point to the collection type.
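The dynamic-subclassing idea can be illustrated with a minimal, self-contained sketch. It does not use Semantiva's factory: FloatItem, FloatItems, DoubleOperation and make_operation_slicer are hypothetical names invented for this illustration, and the real factory may differ in its details.

```python
class FloatItem:
    """Hypothetical single-element data type."""
    def __init__(self, value):
        self.value = value


class FloatItems:
    """Hypothetical ordered collection of FloatItem objects."""
    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        return iter(self.items)


class DoubleOperation:
    """Hypothetical element-wise operation: FloatItem -> FloatItem."""
    @classmethod
    def input_data_type(cls):
        return FloatItem

    @classmethod
    def output_data_type(cls):
        return FloatItem

    def process(self, data):
        return FloatItem(2.0 * data.value)


def make_operation_slicer(operation_cls, collection_cls):
    """Build a subclass of operation_cls that works on a whole collection."""
    if operation_cls.input_data_type() is not operation_cls.output_data_type():
        raise TypeError("operation must map an element type to itself")

    class Slicer(operation_cls):
        # The type hooks now point at the collection type.
        @classmethod
        def input_data_type(cls):
            return collection_cls

        @classmethod
        def output_data_type(cls):
            return collection_cls

        def process(self, data):
            # Apply the wrapped element-wise process in order, then rebuild
            # a collection of the same type.
            element_process = super().process
            return collection_cls(element_process(item) for item in data)

    Slicer.__name__ = f"{operation_cls.__name__}Slicer"
    return Slicer


Sliced = make_operation_slicer(DoubleOperation, FloatItems)
result = Sliced().process(FloatItems([FloatItem(1.0), FloatItem(2.5)]))
print([item.value for item in result])
```

Because the generated class is still a subclass of the original operation, tooling that inspects the class hierarchy can see both the element-wise processor and the collection-level wrapper.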

DataProbe slicers

For probes, the slicer:

  • consumes a collection of elements,

  • runs the wrapped probe on each element, and

  • returns a list of probe results.

The collection itself flows through as the data channel (for downstream processors), while probe results are usually written into context via context_key when the probe is used in a node.

In both cases, the slicer keeps the element-wise pattern explicit. Inspection and trace records reflect that the collection was processed by a single slicer processor rather than by hand-written loops.
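The probe behaviour described above can be sketched in plain Python. This is an illustration only: run_probe_slicer and run_probe_node are hypothetical helpers, the context is modelled as a plain dict, and none of this is Semantiva's actual node machinery.

```python
def run_probe_slicer(probe, collection):
    """Run an element-wise probe on every element; return the results as a list."""
    return [probe(item) for item in collection]


def run_probe_node(probe, collection, context, context_key):
    """Store the probe results under context_key and pass the data through."""
    context[context_key] = run_probe_slicer(probe, collection)
    # The collection itself is returned untouched for downstream processors.
    return collection, context


data = [1.5, -2.0, 4.0]
passed_through, context = run_probe_node(abs, data, {}, "magnitudes")
print(passed_through is data, context["magnitudes"])
```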

Derive-based parameter sweeps

Parameter sweeps are derive preprocessors that compute processor parameters from variables and, when variables enumerate more than one value, execute the processor multiple times to build a collection or a list of results.

They are configured under the reserved derive key on a node using the parameter_sweep tool.

Objective

Derive-based parameter sweeps:

  • compute call-time parameters from variable specifications,

  • optionally expand a node into a collection-producing processor when variables take multiple values, and

  • publish the materialised variable values into context.

Basic shape

On a node, the reserved derive key marks a preprocessor boundary that hosts named preprocessors. The parameter_sweep preprocessor computes parameters from variables and, for data sources and operations, declares the type of the collection it produces:

pipeline:
  nodes:
    - processor: FloatValueDataSource
      derive:
        parameter_sweep:
          parameters:
            value: 2.0 * t
          variables:
            t: { lo: -1.0, hi: 2.0, steps: 3 }
          mode: combinatorial
          broadcast: false
          collection: FloatDataCollection

What it does

  • Computes the parameter value from an expression using variable t.

  • Expands the node into a processor that produces a collection of the type named by collection (DataSource and DataOperation sweeps only).

  • Publishes t_values in the context.
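As a worked illustration of the configuration above, the following sketch computes what such a sweep would produce. It assumes that a range with lo, hi and steps yields linearly spaced values that include both endpoints; the source does not state this, so treat it as an assumption. The get_data function is a hypothetical stand-in for the data source, a plain list stands in for FloatDataCollection, and a Python function stands in for the expression 2.0 * t.

```python
# Assumption: the range is inclusive and linearly spaced.
lo, hi, steps = -1.0, 2.0, 3
t_values = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]


def value_expression(t):
    """Stand-in for the configured expression `2.0 * t`."""
    return 2.0 * t


def get_data(value):
    """Hypothetical stand-in for the wrapped data source."""
    return value


# One get_data call per variable value builds the collection.
collection = [get_data(value=value_expression(t)) for t in t_values]

# The materialised variable values are published under `t_values`.
context = {"t_values": t_values}

print(t_values)    # [-1.0, 0.5, 2.0]
print(collection)  # [-2.0, 1.0, 4.0]
```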

Supported kinds

Sweeps can wrap three kinds of processors:

  • DataSource → generates a collection via repeated get_data(...).

  • DataOperation → augmentation-style expansion via repeated process(data, ...) on the same input.

  • DataProbe → returns a list of probe results; probe nodes persist that list through a node-level context_key and pass their input data through unchanged.

For DataSource and DataOperation sweeps a collection output type is required. For DataProbe sweeps collection is forbidden; probes always return a list.
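The difference between the three kinds can be sketched with ordinary functions. The helper names (sweep_source, sweep_operation, sweep_probe) and the toy processors are hypothetical, plain lists stand in for collection types, and a dict stands in for the context; only the calling pattern is meant to mirror the description above.

```python
def sweep_source(get_data, parameter_sets):
    """DataSource-style sweep: one get_data(...) call per parameter set."""
    return [get_data(**params) for params in parameter_sets]


def sweep_operation(process, data, parameter_sets):
    """DataOperation-style sweep: the same input is processed once per set."""
    return [process(data, **params) for params in parameter_sets]


def sweep_probe(probe, data, parameter_sets, context, context_key):
    """DataProbe-style sweep: results go to context, input passes through."""
    context[context_key] = [probe(data, **params) for params in parameter_sets]
    return data


def constant_source(value):
    return value


def multiply(data, factor):
    return data * factor


def exceeds(data, limit):
    return data > limit


print(sweep_source(constant_source, [{"value": -2.0}, {"value": 4.0}]))
print(sweep_operation(multiply, 10.0, [{"factor": f} for f in (1.0, 2.0, 3.0)]))

context = {}
print(sweep_probe(exceeds, 2.5, [{"limit": 1}, {"limit": 3}], context, "flags"))
print(context)
```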

Configuration reference

Inside derive.parameter_sweep the following keys are recognised:

  • parameters (mapping; required): expressions that compute call-time arguments. Keys must match the wrapped processor’s parameter names.

  • variables (mapping; required): variable definitions used by the expressions:

    • Range: { lo: <float>, hi: <float>, steps: <int> [, scale: linear|log] }

    • Sequence: [v1, v2, ...]

    • FromContext: { from_context: <key> } (must yield a non-empty sequence)

  • collection (string; required for DataSource/DataOperation, forbidden for DataProbe): collection type name.

  • mode: combinatorial (default) or by_position.

  • broadcast: boolean (default false).
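The rules in this reference can be expressed as a small validation routine. The sketch below is not Semantiva's validator: validate_sweep_config is a hypothetical function, the processor kind is passed as a plain string, and the wrapped processor is represented by an ordinary callable whose signature is read with inspect.signature.

```python
import inspect

KINDS_WITH_COLLECTION = {"DataSource", "DataOperation"}
MODES = ("combinatorial", "by_position")


def validate_sweep_config(config, kind, call):
    """Check a parameter_sweep mapping against the documented rules."""
    for key in ("parameters", "variables"):
        if not isinstance(config.get(key), dict):
            raise ValueError(f"'{key}' is required and must be a mapping")

    has_collection = "collection" in config
    if kind in KINDS_WITH_COLLECTION and not has_collection:
        raise ValueError(f"'collection' is required for {kind} sweeps")
    if kind == "DataProbe" and has_collection:
        raise ValueError("'collection' is forbidden for DataProbe sweeps")

    mode = config.get("mode", "combinatorial")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")

    # Parameter keys must match the wrapped callable's parameter names.
    signature = inspect.signature(call)
    unknown = sorted(set(config["parameters"]) - set(signature.parameters))
    if unknown:
        raise ValueError(f"unknown parameters {unknown}; signature is {signature}")

    # Return the settings with their documented defaults applied.
    return {"mode": mode, "broadcast": bool(config.get("broadcast", False))}


def get_data(value):
    """Hypothetical stand-in for a data source's call signature."""
    return value


config = {
    "parameters": {"value": "2.0 * t"},
    "variables": {"t": {"lo": -1.0, "hi": 2.0, "steps": 3}},
    "collection": "FloatDataCollection",
}
print(validate_sweep_config(config, "DataSource", get_data))
```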

Modes and validation

  • combinatorial: Cartesian product across variables.

  • by_position: zip-style alignment; an error is raised if variable sequences have different lengths.

  • DataProbe sweeps must not declare collection.

  • Unknown parameter names in parameters produce a clear error describing the wrapped processor’s signature.
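The two modes correspond to a Cartesian product and a zip over the materialised variable sequences. A minimal sketch, assuming each variable has already been resolved to a list; expand_variables is a hypothetical helper, not part of Semantiva:

```python
from itertools import product


def expand_variables(variables, mode="combinatorial"):
    """Turn {name: [values...]} into a list of {name: value} combinations."""
    names = list(variables)
    if mode == "combinatorial":
        # Every value of each variable is paired with every value of the others.
        rows = product(*(variables[name] for name in names))
    elif mode == "by_position":
        lengths = {name: len(variables[name]) for name in names}
        if len(set(lengths.values())) > 1:
            raise ValueError(f"by_position requires equal lengths, got {lengths}")
        # The i-th values of all variables are used together.
        rows = zip(*(variables[name] for name in names))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [dict(zip(names, row)) for row in rows]


variables = {"a": [1, 2], "b": [10, 20]}
print(expand_variables(variables))                  # 4 combinations
print(expand_variables(variables, "by_position"))   # 2 combinations
```

With two variables of two values each, combinatorial mode yields four parameter sets and by_position yields two, so the mode directly determines the size of the resulting collection.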

Examples

DataSource sweep

- processor: FloatValueDataSource
  derive:
    parameter_sweep:
      parameters:
        value: 2.0 * t
      variables:
        t: { lo: -1.0, hi: 2.0, steps: 3 }
      collection: FloatDataCollection

DataOperation sweep (augmentation)

- processor: FloatMultiplyOperation
  derive:
    parameter_sweep:
      parameters:
        factor: f
      variables:
        f: { lo: 1.0, hi: 3.0, steps: 3 }
      mode: by_position
      collection: FloatDataCollection

DataProbe sweep

- processor: FloatCollectValueProbe
  derive:
    parameter_sweep:
      parameters: {}
      variables:
        n: { lo: 1, hi: 3, steps: 3 }
  context_key: probe_values

FromContext variables

The FromContext variable specification enables sweeps over sequences that are discovered or computed earlier in the pipeline. This is useful when sweep values depend on runtime conditions or previous processing results.

- processor: FloatValueDataSource
  derive:
    parameter_sweep:
      parameters:
        value: float(input_value)
      variables:
        input_value: { from_context: discovered_values }
      collection: FloatDataCollection

Requirements:

  • The context key must exist at runtime and contain a non-empty, non-string sequence.

  • The sweep processor exposes the context key via get_context_requirements() for inspection.

  • A {var}_values context entry is created (for example input_value_values) containing the materialised sequence for downstream use.
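These requirements can be sketched as a small resolver. The function name resolve_from_context is hypothetical, the context is modelled as a plain dict, and the check uses collections.abc.Sequence as an approximation of "sequence"; the real implementation may accept other container types.

```python
from collections.abc import Sequence


def resolve_from_context(var_name, spec, context):
    """Resolve a {from_context: key} variable and publish its values."""
    key = spec["from_context"]
    if key not in context:
        raise KeyError(f"context key {key!r} required by {var_name!r} is missing")

    values = context[key]
    # Strings are sequences in Python, so they are rejected explicitly.
    if isinstance(values, str) or not isinstance(values, Sequence):
        raise TypeError(f"context key {key!r} must hold a non-string sequence")
    if len(values) == 0:
        raise ValueError(f"context key {key!r} must hold a non-empty sequence")

    materialised = list(values)
    # Publish the materialised sequence for downstream use.
    context[f"{var_name}_values"] = materialised
    return materialised


context = {"discovered_values": [0.5, 1.5, 2.5]}
spec = {"from_context": "discovered_values"}
print(resolve_from_context("input_value", spec, context))
print(context["input_value_values"])
```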

Inspection and provenance

Inspection reports which parameters were computed, provided, or defaulted, and which remain in required_external_parameters. For nodes using derive.parameter_sweep, inspection also includes derived_summary and preprocessor_metadata attributes. See Introspection & Validation for complete inspection details.

In the Semantic Execution Record (SER) trace format, parameter sweeps expose both the concrete parameter values and their origin (node config, context, or processor defaults). See Semantic Execution Record (SER) v1 and SER v1 JSON Schema for the full schema.

See also