Step Evidence Record (SER) v0 — draft ====================================== Semantiva records pipeline execution using the **Step Evidence Record (SER) v0 (draft)**. A single SER is emitted for every node that runs and contains: * stable identifiers for the run, pipeline and node * the node's upstream dependencies * action information (operation reference and parameters) * a minimal input/output delta * check results explaining why the node ran and why it was considered OK * timing information (wall and CPU) * explicit status and optional error details * optional summaries for input/output data and context snapshots SERs are written to ``*.ser.jsonl`` files where each line is a JSON object. Tools and the Studio viewer consume these records directly. Example SER ----------- .. code-block:: json { "type": "ser", "schema_version": 0, "ids": {"run_id": "run-…", "pipeline_id": "plid-…", "node_id": "n-3"}, "topology": {"upstream": ["n-2"]}, "action": { "op_ref": "FloatBasicProbe", "params": {"context_keyword": "probed_data"}, "param_source": {"context_keyword": "node"} }, "io_delta": { "read": ["probed_data"], "created": ["probed_data"], "updated": [], "summaries": { "probed_data": {"dtype": "FloatDataType", "len": 1} } }, "checks": { "why_run": { "trigger": "dependency", "upstream_evidence": [{"node_id": "n-2", "state": "completed"}], "pre": [ { "code": "required_keys_present", "result": "PASS", "details": {"expected": ["probed_data"], "missing": []} }, { "code": "input_type_ok", "result": "PASS", "details": {"expected": "FloatDataType", "actual": "FloatDataType"} } ], "policy": [] }, "why_ok": { "post": [ { "code": "output_type_ok", "result": "PASS", "details": {"expected": "FloatDataType", "actual": "FloatDataType"} }, { "code": "context_writes_realized", "result": "PASS", "details": {"created": ["probed_data"], "updated": [], "missing": []} } ], "invariants": [], "env": { "python": "3.11.2", "implementation": "cpython", "platform": "Linux-…", "semantiva": "0.1.0.dev0+dummy", "numpy": null, "pandas": null }, "redaction": {} } }, "timing": {"start": "…", "end": "…", "duration_ms": 5, "cpu_ms": 4}, "status": "completed", "labels": {"node_fqn": "FloatBasicProbe"}, "summaries": { "input_data": {"dtype": "FloatDataType", "sha256": "…"}, "output_data": {"dtype": "FloatDataType", "sha256": "…"} } } The ``checks`` block now always contains: * ``why_run.pre`` – built-in validation executed before the node runs. * ``why_ok.post`` – output validations that ran after the node returned. * ``why_ok.env`` – minimal, non-sensitive environment pins for reproducibility. Detail flags control which summary fields are emitted when using the JSONL driver: * ``hash`` (default) – include ``sha256`` hashes only. * ``repr`` – additionally include ``repr`` for input/output data. * ``context`` – with ``repr`` also include ``repr`` for pre/post context. * ``all`` – enable all of the above. Versioning Policy ----------------- .. note:: **SER Versioning Policy:** * ``schema_version`` is a **major** integer for breaking changes only * v0 during pre-release development; v1 at first public release * Future breaking changes increment to v2, v3, etc. * Optional ``schema_tag`` field may be present but is not required by readers Schema ------ The canonical JSON Schema ships with the package and can be loaded via: .. code-block:: python from importlib import resources schema = resources.files("semantiva.trace.schema") / "ser_v0.schema.json" IO Delta -------- Each SER now includes an ``io_delta`` describing how the node interacted with context: - ``read``: declared required keys (if provided by the processor) - ``created``: new keys written by the node - ``updated``: existing keys whose values changed - ``summaries`` (changed keys only): ``dtype``, ``len``, ``rows``, and optional ``sha256`` (``hash`` flag) and ``repr`` (``repr`` flag) Checks via SERHooks ------------------- The template-method orchestrator collects SER evidence centrally. The base :py:class:`~semantiva.execution.orchestrator.orchestrator.SemantivaOrchestrator` builds the pre/post check lists, captures ``io_delta`` snapshots, and pins the runtime environment exactly once per node. Downstream policy engines can extend these hooks (for example via ``_extra_pre_checks``) but every SER produced by the runtime includes the following checks out of the box—even on error. When a node fails, the exception entry is followed by the standard ``output_type_ok`` and ``context_writes_realized`` checks so failure records retain the same structure as successful ones. Built-in checks --------------- The runtime emits the following check entries for every node: .. list-table:: :header-rows: 1 * - Code - Channel - Purpose - PASS - WARN / FAIL * - ``required_keys_present`` - ``why_run.pre`` - Declared context keys are available before execution. - All required keys present. - Missing keys listed in ``details.missing``. * - ``input_type_ok`` - ``why_run.pre`` - Input payload matches the processor's ``input_data_type``. - ``details.actual`` matches ``details.expected``. - Type mismatch triggers ``FAIL``. * - ``config_valid`` - ``why_run.pre`` - Node configuration contains no unrecognised parameters. - ``WARN`` lists ``details.invalid``; omitted when the node cannot report invalid parameters. - ``WARN`` when inspection detected invalid parameters. * - ``output_type_ok`` - ``why_ok.post`` - Output payload matches the processor's ``output_data_type``. - ``details.actual`` matches ``details.expected``. - Type mismatch triggers ``FAIL``. * - ``context_writes_realized`` - ``why_ok.post`` - Context keys declared in ``io_delta.created``/``updated`` exist after execution. - All declared keys materialised, ``details.missing`` empty. - ``FAIL`` when writes were declared but no value was persisted. Environment pins ---------------- ``checks.why_ok.env`` captures a reproducibility snapshot: Python runtime, implementation, platform string, Semantiva version, and optional third-party versions (``numpy``/``pandas`` when installed). Values are simple strings or ``null`` and contain no host-specific secrets.