Collection-based modifiers
==========================

Semantiva provides two families of helpers for working with data collections:

- **slicers**, which apply a processor element-wise to an existing collection,
  and
- **derive-based parameter sweeps**, which build collections by repeatedly
  invoking a processor with different parameter values.

These helpers are optional. Processors can always be implemented directly for
collection types, but slicers and sweeps remove boilerplate loops and make the
structure of collection processing visible in inspection and trace records.

For an overview of collection types themselves, see :doc:`data_collections`.

Slicers
-------

Slicers are generated by the factory in
``semantiva.data_processors.data_slicer_factory``. Given a data processor that
works on a single element, the factory creates a new processor class that
operates on a ``DataCollectionType`` by iterating over its elements.

The factory supports both data operations and probes:

- DataOperation-based slicers
- DataProbe-based slicers

DataOperation slicers
~~~~~~~~~~~~~~~~~~~~~

For data operations, the slicer:

- consumes a collection of elements,
- applies the wrapped operation to each element in sequence, and
- returns a new collection of the same collection type with the processed
  elements.

The underlying processor must have matching input and output element types;
the slicer preserves the collection type and ordering. Under the hood this is
implemented by dynamically subclassing the original operation and overriding
``input_data_type`` / ``output_data_type`` to point to the collection type.

DataProbe slicers
~~~~~~~~~~~~~~~~~

For probes, the slicer:

- consumes a collection of elements,
- runs the wrapped probe on each element, and
- returns a list of probe results.

The collection itself flows through as the data channel (for downstream
processors), while probe results are usually written into context via
``context_key`` when the probe is used in a node.
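
The element-wise pattern described for both slicer kinds can be sketched in
plain Python. This is a minimal illustration of dynamic subclassing, not
Semantiva's actual factory: ``FloatCollection``, ``DoubleOperation``,
``IsPositiveProbe``, ``make_operation_slicer`` and ``make_probe_slicer`` are
hypothetical names, and a ``process`` method taking one element is assumed.

```python
class FloatCollection(list):
    """Hypothetical stand-in for a collection type holding floats."""


class DoubleOperation:
    """Hypothetical single-element operation: float in, float out."""
    input_data_type = float
    output_data_type = float

    def process(self, data):
        return 2.0 * data


class IsPositiveProbe:
    """Hypothetical single-element probe: float in, bool out."""
    input_data_type = float

    def process(self, data):
        return data > 0


def make_operation_slicer(operation_cls, collection_cls):
    """Subclass the operation so that it maps over a whole collection."""

    class Sliced(operation_cls):
        # Both type attributes now point at the collection type.
        input_data_type = collection_cls
        output_data_type = collection_cls

        def process(self, data):
            element_process = super().process  # the wrapped single-element logic
            # Same collection type out, element order preserved.
            return collection_cls(element_process(item) for item in data)

    Sliced.__name__ = f"{operation_cls.__name__}Slicer"
    return Sliced


def make_probe_slicer(probe_cls, collection_cls):
    """Subclass the probe so that it returns one result per element."""

    class Sliced(probe_cls):
        input_data_type = collection_cls

        def process(self, data):
            element_process = super().process
            return [element_process(item) for item in data]  # plain list of results

    Sliced.__name__ = f"{probe_cls.__name__}Slicer"
    return Sliced


values = FloatCollection([-1.0, 0.5, 2.0])
doubled = make_operation_slicer(DoubleOperation, FloatCollection)().process(values)
flags = make_probe_slicer(IsPositiveProbe, FloatCollection)().process(values)
```

Here ``doubled`` is again a ``FloatCollection``, while ``flags`` is an ordinary
list, mirroring the difference between the two slicer kinds.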
In both cases, the slicer keeps the element-wise pattern explicit. Inspection
and trace records reflect that the collection was processed by a single slicer
processor rather than by hand-written loops.

Derive-based parameter sweeps
-----------------------------

Parameter sweeps are derive preprocessors that compute processor parameters
from variables and, when variables enumerate more than one value, execute the
processor multiple times to build a collection or a list of results. They are
configured under the reserved ``derive`` key on a node using the
``parameter_sweep`` tool.

Objective
~~~~~~~~~

Derive-based parameter sweeps:

- **compute** call-time parameters from variable specifications,
- **optionally expand** a node into a collection-producing processor when
  variables take multiple values, and
- **publish** the materialised variable values into context.

Basic shape
~~~~~~~~~~~

Under a node, the reserved preprocessor boundary ``derive`` hosts named
preprocessors. The ``parameter_sweep`` preprocessor computes parameters from
variables and, for data sources and operations, declares the collection type
produced:

.. code-block:: yaml

   pipeline:
     nodes:
       - processor: FloatValueDataSource
         derive:
           parameter_sweep:
             parameters:
               value: 2.0 * t
             variables:
               t: { lo: -1.0, hi: 2.0, steps: 3 }
             mode: combinatorial
             broadcast: false
             collection: FloatDataCollection

What it does
~~~~~~~~~~~~

- **Computes** the parameter ``value`` from an expression using variable ``t``.
- **Expands** into a collection typed by ``collection``
  (DataSource/DataOperation).
- **Publishes** ``t_values`` in the context.

Supported kinds
~~~~~~~~~~~~~~~

Sweeps can wrap three kinds of processors:

- **DataSource** → generates a collection via repeated ``get_data(...)``.
- **DataOperation** → augmentation-style expansion via repeated
  ``process(data, ...)`` on the same input.
- **DataProbe** → returns a **list** of probe results; probe nodes persist via
  a node-level ``context_key`` and pass through their input data.

For DataSource and DataOperation sweeps a ``collection`` output type is
required. For DataProbe sweeps ``collection`` is forbidden; probes always
return a list.

Configuration reference
~~~~~~~~~~~~~~~~~~~~~~~

Inside ``derive.parameter_sweep`` the following keys are recognised:

- ``parameters`` (mapping; **required**): expressions that compute call-time
  arguments. Keys **must** match the wrapped processor's parameter names.
- ``variables`` (mapping; **required**): variable definitions used by the
  expressions:

  - Range: ``{ lo: , hi: , steps: [, scale: linear|log] }``
  - Sequence: ``[v1, v2, ...]``
  - FromContext: ``{ from_context: }`` (must yield a non-empty sequence)

- ``collection`` (string; **required** for DataSource/DataOperation,
  **forbidden** for DataProbe): collection type name.
- ``mode``: ``combinatorial`` (default) or ``by_position``.
- ``broadcast``: boolean (default ``false``).

Modes and validation
~~~~~~~~~~~~~~~~~~~~

- ``combinatorial``: Cartesian product across variables.
- ``by_position``: zip-style alignment; an error is raised if variable
  sequences have different lengths.
- DataProbe sweeps **must not** declare ``collection``.
- Unknown parameter names in ``parameters`` produce a clear error describing
  the wrapped processor's signature.

Examples
~~~~~~~~

DataSource sweep
################

.. code-block:: yaml

   - processor: FloatValueDataSource
     derive:
       parameter_sweep:
         parameters:
           value: 2.0 * t
         variables:
           t: { lo: -1.0, hi: 2.0, steps: 3 }
         collection: FloatDataCollection

DataOperation sweep (augmentation)
##################################

.. code-block:: yaml

   - processor: FloatMultiplyOperation
     derive:
       parameter_sweep:
         parameters:
           factor: f
         variables:
           f: { lo: 1.0, hi: 3.0, steps: 3 }
         mode: by_position
         collection: FloatDataCollection

DataProbe sweep
###############

.. code-block:: yaml

   - processor: FloatCollectValueProbe
     derive:
       parameter_sweep:
         parameters: {}
         variables:
           n: { lo: 1, hi: 3, steps: 3 }
     context_key: probe_values

FromContext variables
~~~~~~~~~~~~~~~~~~~~~

The ``FromContext`` variable specification enables sweeps over sequences that
are discovered or computed earlier in the pipeline. This is useful when sweep
values depend on runtime conditions or previous processing results.

.. code-block:: yaml

   - processor: FloatValueDataSource
     derive:
       parameter_sweep:
         parameters:
           value: float(input_value)
         variables:
           input_value: { from_context: discovered_values }
         collection: FloatDataCollection

Requirements:

- The context key must exist at runtime and contain a non-empty, non-string
  sequence.
- The sweep processor exposes the context key via
  ``get_context_requirements()`` for inspection.
- A ``{var}_values`` context entry is created (for example
  ``input_value_values``) containing the materialised sequence for downstream
  use.

Inspection and provenance
-------------------------

Inspection surfaces which parameters were **computed**, **provided**,
**defaulted**, or remain **required_external_parameters**. For nodes using
``derive.parameter_sweep``, inspection also includes ``derived_summary`` and
``preprocessor_metadata`` attributes. See :doc:`introspection_validation` for
complete inspection details.

In the Semantic Execution Record (SER) trace format, parameter sweeps expose
both the concrete parameter values and their origin (node config, context, or
processor defaults). See :doc:`ser` and
:doc:`schema_semantic_execution_record_v1` for the full schema.

See also
--------

- :doc:`data_collections` for the underlying collection types.
- :doc:`data_operations` and :doc:`data_probes` for processor contracts.
- :doc:`run_space` for run-space expansion.
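
Sketch: sweep expansion in plain Python
---------------------------------------

The two sweep modes described above (``combinatorial`` as a Cartesian product,
``by_position`` as zip-style alignment with a length check) can be illustrated
without Semantiva. The helper names ``linear_range`` and ``expand_variables``
are hypothetical, and the range semantics (evenly spaced values with ``hi``
included) are an assumption made for this sketch rather than something the
configuration reference states.

```python
from itertools import product


def linear_range(lo, hi, steps):
    """Evenly spaced values from lo to hi inclusive (assumed Range semantics)."""
    if steps == 1:
        return [lo]
    width = (hi - lo) / (steps - 1)
    return [lo + i * width for i in range(steps)]


def expand_variables(variables, mode="combinatorial"):
    """Return one {name: value} binding per sweep step."""
    names = list(variables)
    sequences = [list(variables[name]) for name in names]
    if mode == "combinatorial":
        rows = product(*sequences)  # every combination of values
    elif mode == "by_position":
        lengths = {len(seq) for seq in sequences}
        if len(lengths) > 1:
            raise ValueError(
                f"by_position needs equal lengths, got {sorted(lengths)}"
            )
        rows = zip(*sequences)  # i-th value of each variable together
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [dict(zip(names, row)) for row in rows]


# Mirrors the basic example: value = 2.0 * t for t in { lo: -1.0, hi: 2.0, steps: 3 }
t_values = linear_range(-1.0, 2.0, 3)
bindings = expand_variables({"t": t_values})
values = [2.0 * binding["t"] for binding in bindings]
print(values)  # [-2.0, 1.0, 4.0]
```

With two variables of lengths 2 and 3, ``combinatorial`` yields six bindings,
whereas ``by_position`` raises because the lengths differ.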