Collection-based modifiers
==========================

Semantiva provides two families of helpers for working with data collections:

- **slicers**, which apply a processor element-wise to an existing collection,
  and
- **derive-based parameter sweeps**, which build collections by repeatedly
  invoking a processor with different parameter values.

These helpers are optional. Processors can always be implemented directly for
collection types, but slicers and sweeps remove boilerplate loops and make the
structure of collection processing visible in inspection and trace records.

For an overview of collection types themselves, see :doc:`data_collections`.

Slicers
-------

Slicers are generated by the factory in
``semantiva.data_processors.data_slicer_factory``. Given a data processor
that works on a single element, the factory
creates a new processor class that operates on a
``DataCollectionType`` by iterating over its elements.

The factory accepts two kinds of element-level processor:

- DataOperation-based slicers, which transform every element, and
- DataProbe-based slicers, which inspect every element.

DataOperation slicers
~~~~~~~~~~~~~~~~~~~~~

For data operations, the slicer:

- consumes a collection of elements,
- applies the wrapped operation to each element in sequence, and
- returns a new collection of the same collection type with the processed
  elements.

The underlying processor must have matching input and output element types; the
slicer preserves the collection type and ordering. Under the hood this is
implemented by dynamically subclassing the original operation and overriding
``input_data_type`` / ``output_data_type`` to point to the collection type.
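
The dynamic-subclassing idea can be illustrated with a minimal, self-contained sketch. It does not use Semantiva's real base classes: ``FloatData``, ``FloatCollection``, ``MultiplyOperation`` and ``make_operation_slicer`` are hypothetical toy stand-ins, and the type check mirrors the matching-types rule stated above.

.. code-block:: python

   # Hypothetical toy classes standing in for Semantiva's data types and
   # DataOperation base class; only the subclassing technique is illustrated.
   from typing import List


   class FloatData:
       """Toy single-element data type."""

       def __init__(self, value: float) -> None:
           self.value = value


   class FloatCollection:
       """Toy ordered collection of FloatData elements."""

       def __init__(self, items: List[FloatData]) -> None:
           self.items = list(items)

       def __iter__(self):
           return iter(self.items)


   class MultiplyOperation:
       """Toy element-level operation: FloatData -> FloatData."""

       @classmethod
       def input_data_type(cls):
           return FloatData

       @classmethod
       def output_data_type(cls):
           return FloatData

       def process(self, data: FloatData, factor: float) -> FloatData:
           return FloatData(data.value * factor)


   def make_operation_slicer(op_cls, collection_cls):
       """Create a subclass of op_cls that applies process() to each element."""
       # Matching element types let the results fill a collection of the same type.
       if op_cls.input_data_type() is not op_cls.output_data_type():
           raise TypeError("slicer requires matching input and output element types")
       element_process = op_cls.process

       def process(self, data, *args, **kwargs):
           # Process elements in order and rebuild the same collection type.
           return collection_cls([element_process(self, item, *args, **kwargs) for item in data])

       return type(
           f"{op_cls.__name__}Slicer",
           (op_cls,),
           {
               # Redirect the declared types to the collection type.
               "input_data_type": classmethod(lambda cls: collection_cls),
               "output_data_type": classmethod(lambda cls: collection_cls),
               "process": process,
           },
       )


   MultiplySlicer = make_operation_slicer(MultiplyOperation, FloatCollection)
   result = MultiplySlicer().process(FloatCollection([FloatData(1.0), FloatData(2.5)]), factor=2.0)
   print([item.value for item in result])  # [2.0, 5.0]

Because the slicer is a real subclass, anything that inspects the class (its name, its declared types) sees a single collection-level processor rather than a loop.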

DataProbe slicers
~~~~~~~~~~~~~~~~~

For probes, the slicer:

- consumes a collection of elements,
- runs the wrapped probe on each element, and
- returns a list of probe results.

When a probe slicer runs in a node, the collection itself passes through
unchanged on the data channel for downstream processors, while the list of
probe results is usually written into the context under the node's
``context_key``.
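
A minimal sketch of the probe pattern, assuming a plain list as the collection and a dictionary as the context. ``MeanProbe``, ``probe_slicer`` and ``run_probe_node`` are hypothetical names for illustration, not Semantiva's API:

.. code-block:: python

   # Hypothetical toy stand-ins; a plain list plays the collection and a dict
   # plays the context. This is not Semantiva's probe or node API.
   from typing import Any, Dict, List


   class MeanProbe:
       """Toy element-level probe: returns the mean of one list of floats."""

       def process(self, data: List[float]) -> float:
           return sum(data) / len(data)


   def probe_slicer(probe, collection: List[List[float]]) -> List[float]:
       """Run the probe on each element and return the results in order."""
       return [probe.process(element) for element in collection]


   def run_probe_node(probe, data: List[List[float]], context: Dict[str, Any], context_key: str):
       """Store the probe results in the context and pass the data through."""
       context[context_key] = probe_slicer(probe, data)
       return data  # the collection itself continues on the data channel


   context: Dict[str, Any] = {}
   collection = [[1.0, 3.0], [2.0, 4.0, 6.0]]
   out = run_probe_node(MeanProbe(), collection, context, context_key="means")
   print(out is collection, context["means"])  # True [2.0, 4.0]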

In both cases, the slicer keeps the element-wise pattern explicit. Inspection
and trace records reflect that the collection was processed by a single slicer
processor rather than by hand-written loops.

Derive-based parameter sweeps
-----------------------------

Parameter sweeps are derive preprocessors that compute processor parameters
from variables and, when variables enumerate more than one value, execute the
processor multiple times to build a collection or a list of results.

They are configured under the reserved ``derive`` key on a node using the
``parameter_sweep`` preprocessor.

Objective
~~~~~~~~~

Derive-based parameter sweeps:

- **compute** call-time parameters from variable specifications,
- **optionally expand** a node into a collection-producing processor when
  variables take multiple values, and
- **publish** the materialised variable values into context.

Basic shape
~~~~~~~~~~~

Within a node, the reserved ``derive`` key is a preprocessor boundary that
hosts named preprocessors. The ``parameter_sweep`` preprocessor computes
parameters from variables and, for data sources and data operations, declares
the collection type they produce:

.. code-block:: yaml

   pipeline:
     nodes:
       - processor: FloatValueDataSource
         derive:
           parameter_sweep:
             parameters:
               value: 2.0 * t
             variables:
               t: { lo: -1.0, hi: 2.0, steps: 3 }
             mode: combinatorial
             broadcast: false
             collection: FloatDataCollection

What it does
~~~~~~~~~~~~

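- **Computes** the parameter ``value`` from the expression ``2.0 * t`` for each
  value of the variable ``t``.
- **Expands** the node into a collection of the type named by ``collection``
  (``FloatDataCollection`` here); expansion applies to DataSource and
  DataOperation processors.
- **Publishes** the materialised values of ``t`` in the context as
  ``t_values``.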
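
As a worked sketch of this example, assuming that a linear range includes both endpoints (like ``numpy.linspace``), the sweep calls the data source three times. The expression ``2.0 * t`` is hand-written as a Python function here; how Semantiva parses and evaluates expressions is not shown, and ``linear_range`` is a hypothetical helper:

.. code-block:: python

   # Sketch assuming inclusive range endpoints (like numpy.linspace); the
   # expression "2.0 * t" is hand-written as compute_value, and
   # FloatValueDataSource is a toy stand-in for the real data source.
   from typing import Any, Dict, List


   def linear_range(lo: float, hi: float, steps: int) -> List[float]:
       """Hypothetical helper: `steps` evenly spaced values from lo to hi."""
       if steps == 1:
           return [lo]
       step = (hi - lo) / (steps - 1)
       return [lo + i * step for i in range(steps)]


   def compute_value(t: float) -> float:
       return 2.0 * t  # parameters: { value: 2.0 * t }


   class FloatValueDataSource:
       """Toy data source that returns the value it is given."""

       def get_data(self, value: float) -> float:
           return value


   context: Dict[str, Any] = {}
   t_values = linear_range(-1.0, 2.0, 3)
   source = FloatValueDataSource()
   # One get_data call per value of t; the results form the collection.
   collection = [source.get_data(value=compute_value(t)) for t in t_values]
   context["t_values"] = t_values  # published for downstream nodes
   print(t_values, collection)  # [-1.0, 0.5, 2.0] [-2.0, 1.0, 4.0]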

Supported kinds
~~~~~~~~~~~~~~~

Sweeps can wrap three kinds of processors:

- **DataSource** → generates a collection via repeated ``get_data(...)``.
- **DataOperation** → augmentation-style expansion via repeated
  ``process(data, ...)`` on the same input.
- **DataProbe** → returns a **list** of probe results; probe nodes store this
  list in the context under a node-level ``context_key`` and pass their input
  data through unchanged.

For DataSource and DataOperation sweeps, a ``collection`` output type is
required. For DataProbe sweeps, ``collection`` is forbidden; probes always
return a list.

Configuration reference
~~~~~~~~~~~~~~~~~~~~~~~

Inside ``derive.parameter_sweep`` the following keys are recognised:

- ``parameters`` (mapping; **required**): expressions that compute call-time
  arguments. Keys **must** match the wrapped processor's parameter names.
- ``variables`` (mapping; **required**): variable definitions used by the
  expressions:

  - Range: ``{ lo: <float>, hi: <float>, steps: <int> [, scale: linear|log] }``
  - Sequence: ``[v1, v2, ...]``
  - FromContext: ``{ from_context: <key> }`` (must yield a non-empty sequence)

- ``collection`` (string; **required** for DataSource/DataOperation,
  **forbidden** for DataProbe): collection type name.
- ``mode``: ``combinatorial`` (default) or ``by_position``.
- ``broadcast``: boolean (default ``false``).
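
The following sketch shows one way to materialise Range and Sequence specifications into concrete value lists. It assumes inclusive endpoints and reads ``scale: log`` as geometric spacing (like ``numpy.geomspace``), which requires positive bounds; both are assumptions rather than documented behaviour, and ``materialise`` is a hypothetical helper:

.. code-block:: python

   # Sketch under assumptions: ranges include both endpoints, and
   # "scale: log" means geometric spacing (like numpy.geomspace), which
   # needs positive bounds. materialise() is a hypothetical helper.
   import math
   from typing import Any, List


   def materialise(spec: Any) -> List[Any]:
       """Turn a Range or Sequence variable spec into a list of values."""
       if isinstance(spec, list):  # Sequence: [v1, v2, ...]
           return list(spec)
       lo, hi, steps = float(spec["lo"]), float(spec["hi"]), int(spec["steps"])
       scale = spec.get("scale", "linear")
       if steps < 1:
           raise ValueError("steps must be at least 1")
       if scale == "linear":
           a, b = lo, hi
       elif scale == "log":
           if lo <= 0 or hi <= 0:
               raise ValueError("log scale needs positive lo and hi")
           a, b = math.log10(lo), math.log10(hi)  # space the exponents evenly
       else:
           raise ValueError(f"unknown scale: {scale!r}")
       if steps == 1:
           points = [a]
       else:
           points = [a + i * (b - a) / (steps - 1) for i in range(steps)]
       return points if scale == "linear" else [10.0 ** p for p in points]


   print(materialise({"lo": 0.0, "hi": 1.0, "steps": 5}))  # [0.0, 0.25, 0.5, 0.75, 1.0]
   print(materialise({"lo": 1.0, "hi": 100.0, "steps": 3, "scale": "log"}))
   print(materialise([2, 4, 8]))  # [2, 4, 8]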

Modes and validation
~~~~~~~~~~~~~~~~~~~~

- ``combinatorial``: Cartesian product across variables.
- ``by_position``: zip-style alignment; an error is raised if variable
  sequences have different lengths.
- DataProbe sweeps **must not** declare ``collection``.
- Unknown parameter names in ``parameters`` produce a clear error describing
  the wrapped processor's signature.
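
A sketch of the two modes and the parameter-name check, using only the standard library. ``expand`` and ``check_parameter_names`` are hypothetical helpers, the error messages are illustrative, and skipping ``self`` and ``data`` when reading the signature is an assumption about which arguments count as parameters:

.. code-block:: python

   # Hypothetical sketch: expand() and check_parameter_names() are
   # illustrative helpers, not Semantiva functions, and their error messages
   # are invented. Skipping "self" and "data" is an assumption.
   import inspect
   import itertools
   from typing import Any, Callable, Dict, List


   def expand(variables: Dict[str, List[Any]], mode: str = "combinatorial") -> List[Dict[str, Any]]:
       """Combine materialised variable values into one binding per call."""
       names = list(variables)
       columns = [variables[name] for name in names]
       if mode == "combinatorial":
           rows = itertools.product(*columns)  # Cartesian product
       elif mode == "by_position":
           lengths = {name: len(variables[name]) for name in names}
           if len(set(lengths.values())) > 1:
               raise ValueError(f"by_position needs equal lengths, got {lengths}")
           rows = zip(*columns)  # zip-style alignment
       else:
           raise ValueError(f"unknown mode: {mode!r}")
       return [dict(zip(names, row)) for row in rows]


   def check_parameter_names(func: Callable[..., Any], parameters: Dict[str, str]) -> None:
       """Reject parameter keys that the wrapped callable does not accept."""
       signature = inspect.signature(func)
       accepted = set(signature.parameters) - {"self", "data"}
       unknown = sorted(set(parameters) - accepted)
       if unknown:
           raise ValueError(f"unknown parameters {unknown}; signature is {func.__name__}{signature}")


   def get_data(value: float) -> float:
       return value


   print(expand({"a": [1, 2], "b": ["x", "y"]}))
   # [{'a': 1, 'b': 'x'}, {'a': 1, 'b': 'y'}, {'a': 2, 'b': 'x'}, {'a': 2, 'b': 'y'}]
   print(expand({"a": [1, 2], "b": ["x", "y"]}, mode="by_position"))
   # [{'a': 1, 'b': 'x'}, {'a': 2, 'b': 'y'}]
   check_parameter_names(get_data, {"value": "2.0 * t"})  # accepted silently
   try:
       check_parameter_names(get_data, {"valu": "2.0 * t"})
   except ValueError as err:
       print(err)  # unknown parameters ['valu']; signature is get_data(value: float) -> float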

Examples
~~~~~~~~

DataSource sweep
################

.. code-block:: yaml

   - processor: FloatValueDataSource
     derive:
       parameter_sweep:
         parameters:
           value: 2.0 * t
         variables:
           t: { lo: -1.0, hi: 2.0, steps: 3 }
         collection: FloatDataCollection

DataOperation sweep (augmentation)
##################################

.. code-block:: yaml

   - processor: FloatMultiplyOperation
     derive:
       parameter_sweep:
         parameters:
           factor: f
         variables:
           f: { lo: 1.0, hi: 3.0, steps: 3 }
         mode: by_position
         collection: FloatDataCollection
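
The augmentation pattern amounts to calling the same operation on one input with each parameter value. In this sketch, ``FloatMultiplyOperation`` is a toy stand-in for the processor named in the YAML, ``sweep_operation`` is hypothetical, the result is a plain list rather than a ``FloatDataCollection``, and the factor values assume inclusive range endpoints:

.. code-block:: python

   # Toy sketch: FloatMultiplyOperation stands in for the processor named in
   # the YAML, a plain list stands in for FloatDataCollection, and the factor
   # values assume inclusive range endpoints.
   from typing import List


   class FloatMultiplyOperation:
       def process(self, data: float, factor: float) -> float:
           return data * factor


   def sweep_operation(op: FloatMultiplyOperation, data: float, factors: List[float]) -> List[float]:
       # Every call receives the same input; only the parameter changes.
       return [op.process(data, factor=f) for f in factors]


   f_values = [1.0, 2.0, 3.0]  # f: { lo: 1.0, hi: 3.0, steps: 3 }
   augmented = sweep_operation(FloatMultiplyOperation(), 1.5, f_values)
   print(augmented)  # [1.5, 3.0, 4.5]

Unlike a slicer, which applies an operation to each element of an input collection, this sweep fans a single input out into a new collection. With only one variable, ``by_position`` and ``combinatorial`` produce the same sequence of calls; the choice of ``mode`` matters once a second variable is added.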

DataProbe sweep
###############

.. code-block:: yaml

   - processor: FloatCollectValueProbe
     derive:
       parameter_sweep:
         parameters: {}
         variables:
           n: { lo: 1, hi: 3, steps: 3 }
     context_key: probe_values
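
Because ``parameters`` is empty, the probe is invoked once per value of ``n`` with no sweep-computed arguments. The toy sketch below shows the resulting node behaviour; ``FloatCollectValueProbe`` is a stand-in that simply returns its input, ``probe_sweep_node`` is hypothetical, and publishing ``n_values`` assumes the ``{var}_values`` convention also applies to probe sweeps:

.. code-block:: python

   # Toy sketch: FloatCollectValueProbe here simply returns its input,
   # probe_sweep_node is a hypothetical helper, and publishing n_values
   # assumes the {var}_values convention applies to probe sweeps.
   from typing import Any, Dict, List


   class FloatCollectValueProbe:
       def process(self, data: float) -> float:
           return data


   def probe_sweep_node(probe, data: float, n_values: List[float],
                        context: Dict[str, Any], context_key: str) -> float:
       # parameters: {} means each call gets no sweep-computed arguments.
       results = [probe.process(data) for _ in n_values]
       context[context_key] = results  # a list, not a collection
       context["n_values"] = n_values
       return data  # the input passes through unchanged


   context: Dict[str, Any] = {}
   out = probe_sweep_node(FloatCollectValueProbe(), 7.0, [1.0, 2.0, 3.0], context, "probe_values")
   print(out, context["probe_values"])  # 7.0 [7.0, 7.0, 7.0]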

FromContext variables
~~~~~~~~~~~~~~~~~~~~~

The ``FromContext`` variable specification enables sweeps over sequences that
are discovered or computed earlier in the pipeline. This is useful when sweep
values depend on runtime conditions or previous processing results.

.. code-block:: yaml

   - processor: FloatValueDataSource
     derive:
       parameter_sweep:
         parameters:
           value: float(input_value)
         variables:
           input_value: { from_context: discovered_values }
         collection: FloatDataCollection

Requirements:

- The context key must exist at runtime and contain a non-empty, non-string
  sequence.
- The sweep processor exposes the context key via
  ``get_context_requirements()`` for inspection.
- A ``{var}_values`` context entry is created (for example
  ``input_value_values``) containing the materialised sequence for downstream
  use.
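
The requirements above can be expressed as a short validation routine. This is a hypothetical sketch (``resolve_from_context`` is not a Semantiva function), and treating any ``collections.abc.Sequence`` other than ``str`` or ``bytes`` as acceptable is an assumption of the sketch:

.. code-block:: python

   # Hypothetical sketch of the FromContext checks; resolve_from_context is
   # not a Semantiva function. Accepting any collections.abc.Sequence other
   # than str/bytes is an assumption of this sketch.
   from collections.abc import Sequence
   from typing import Any, Dict, List


   def resolve_from_context(var: str, key: str, context: Dict[str, Any]) -> List[Any]:
       """Read a sweep variable's values from the context and publish them."""
       if key not in context:
           raise KeyError(f"context key {key!r} for variable {var!r} is missing")
       values = context[key]
       # str and bytes are Sequences in Python, so exclude them explicitly.
       if isinstance(values, (str, bytes)) or not isinstance(values, Sequence):
           raise TypeError(f"context key {key!r} must hold a non-string sequence")
       if len(values) == 0:
           raise ValueError(f"context key {key!r} must hold a non-empty sequence")
       materialised = list(values)
       context[f"{var}_values"] = materialised  # e.g. input_value_values
       return materialised


   context: Dict[str, Any] = {"discovered_values": (0.5, 1.5)}
   values = resolve_from_context("input_value", "discovered_values", context)
   collection = [float(v) for v in values]  # value: float(input_value)
   print(context["input_value_values"], collection)  # [0.5, 1.5] [0.5, 1.5]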

Inspection and provenance
-------------------------

Inspection reports which parameters were **computed**, **provided**, or
**defaulted**, and which are still needed from outside the node (listed as
**required_external_parameters**). For nodes using ``derive.parameter_sweep``,
inspection also includes ``derived_summary`` and ``preprocessor_metadata``
attributes. See :doc:`introspection_validation` for complete inspection
details.
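
One plausible way to derive these categories from a processor signature is sketched below. The precedence (computed, then provided, then defaulted) and skipping ``self`` and ``data`` are assumptions, and ``classify_parameters`` is hypothetical rather than part of Semantiva's inspection API:

.. code-block:: python

   # Hypothetical sketch; classify_parameters is not Semantiva's inspection
   # API. The precedence order and skipping "self"/"data" are assumptions.
   import inspect
   from typing import Any, Callable, Dict, List


   def classify_parameters(func: Callable[..., Any], computed: Dict[str, Any],
                           provided: Dict[str, Any]) -> Dict[str, List[str]]:
       report: Dict[str, List[str]] = {
           "computed": [], "provided": [], "defaulted": [], "required_external_parameters": [],
       }
       for name, param in inspect.signature(func).parameters.items():
           if name in ("self", "data"):
               continue  # the data input is not treated as a parameter here
           if name in computed:  # e.g. from derive.parameter_sweep
               report["computed"].append(name)
           elif name in provided:  # e.g. from the node's configuration
               report["provided"].append(name)
           elif param.default is not inspect.Parameter.empty:
               report["defaulted"].append(name)
           else:  # must come from elsewhere, such as the context
               report["required_external_parameters"].append(name)
       return report


   def process(data: float, factor: float, offset: float, threshold: float, clip: bool = False) -> float:
       value = data * factor + offset
       return min(value, threshold) if clip else value


   print(classify_parameters(process, computed={"factor": "f"}, provided={"offset": 1.0}))
   # {'computed': ['factor'], 'provided': ['offset'], 'defaulted': ['clip'],
   #  'required_external_parameters': ['threshold']}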

In the Semantic Execution Record (SER) trace format, parameter sweeps expose
both the concrete parameter values and their origin (node config, context, or
processor defaults). See :doc:`ser` and
:doc:`schema_semantic_execution_record_v1` for the full schema.

See also
--------

- :doc:`data_collections` for the underlying collection types.
- :doc:`data_operations` and :doc:`data_probes` for processor contracts.
- :doc:`run_space` for run-space expansion.