Architects & system designers ============================= This page is for people who **design systems around Semantiva**: - You care about how Semantiva fits into a larger architecture: orchestration, storage, observability, lineage. - You want a clear picture of **execution, inspection and trace flows**, not just individual pipelines or components. - You may own **standards and conventions** around pipelines, SER, run spaces and contracts for your organisation. If you mainly run and tweak existing pipelines, see :doc:`pipeline_users` instead. If you primarily write components without owning wider system design, see :doc:`framework_developers`. .. admonition:: New to Semantiva? If you have never run a Semantiva pipeline before, **start with** :doc:`../getting_started`, then follow :doc:`pipeline_users` and :doc:`framework_developers`. Come back to this page once you are comfortable with pipelines, components and basic inspection. ------------------------------ What you should know already ------------------------------ Before you use this page as your main guide, you should: - Have followed :doc:`../getting_started`, :doc:`pipeline_users` and :doc:`framework_developers`. - Be comfortable with: - Reading and reviewing pipeline YAMLs for non-trivial workloads. - Understanding the responsibilities of DataOperation, DataProbe and ContextProcessor components. - Running ``pytest`` and ``semantiva dev lint`` as part of a development workflow. - Have a rough picture that: - Semantiva executes **graphs** of nodes (pipelines) over payloads with **data and context**. - Execution is recorded as **Semantic Execution Records (SER)** and related trace artefacts. - Contracts and SVA rules (including ``SVA250`` and friends) encode architectural invariants in a machine-checkable way. If any of that is unfamiliar, revisit :doc:`pipeline_users`, :doc:`framework_developers` and :doc:`../concepts` first. ------------------------------ Your learning path (301+) ------------------------------ Once you are comfortable as a pipeline user and component author, this is the recommended path for **architects and system designers**. Step 1 - Get the high-level execution & trace picture ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Start with the **inspection and trace** side of the system: - :doc:`../inspection` — why Semantiva records execution the way it does; how SER, traces and run spaces fit together. - :doc:`../ser` — high-level overview of the Semantic Execution Record: what each SER describes, and how it relates to a node run. - :doc:`../run_space` — how executions are grouped into run spaces (experiments, campaigns, workflows). At this stage, you should focus on: - How **node-level execution** becomes **SER JSONL files**. - How run spaces organise multiple runs and variations. - Where, in your architecture, SER and run spaces would be stored and consumed (file systems, object stores, trace viewers, etc.). Step 2 - Understand the core architecture slices ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Next, read the core architecture docs with an “integration” mindset: - :doc:`../architecture/pipeline_schema` — how pipelines are represented as graphs; which parts are user-facing vs internal. - :doc:`../architecture/context_processing` — how context processing, observers and validators are wired (important for understanding invariants like those enforced by ``SVA250``). - :doc:`../architecture/registry` — where processors and other components come from, and how they are discovered. As an architect, you should be able to answer: - Which artefacts are **static configuration** (pipeline schema, registry entries) vs **runtime** (SER, traces, run spaces). - Where extension points are: new processors, new transports, custom registries. - How Semantiva's own invariants (as documented in contracts and architecture) align with your organisation's design principles. Configuration artefacts ----------------------- In most organisations, **YAML pipeline configurations** are the governed configuration artefact for Semantiva: - YAML pipelines are versioned, validated (via :doc:`../contracts` and :command:`semantiva dev lint`) and promoted across environments. - YAML is the source for building execution graphs in production (see :doc:`../pipelines_yaml` and :doc:`../architecture/pipeline_schema`). Semantiva also exposes a Python API for constructing pipelines (:doc:`../pipelines_python`), which is extremely useful for internal testing, simulation and R&D workflows. These Python pipelines should be treated as **internal tooling**, not as the system-of-record configuration. Step 3 - Tie SER, trace streams and aggregation together ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Once you have the basic slices, go deeper into the trace pipeline: - :doc:`../schema_semantic_execution_record_v1` — reference for SER v1 schema. - :doc:`../trace_stream_v1` — how SERs are turned into trace streams. - :doc:`../trace_aggregator_v1` — how traces are aggregated into higher-level structures. - :doc:`../trace_graph_alignment` — how traces map back to pipeline graphs. You do not need to memorise every field, but you should understand: - How a **single node run** flows into SER → trace stream → aggregated view. - Which artefacts external tools (e.g. Semantiva Studio Viewer) consume. - What guarantees you get about **identity and provenance** across these stages. Step 4 - Look at contracts as architecture spec ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Contracts and SVA rules act as a **compact architecture specification**: - :doc:`../contracts` — how contracts are defined and enforced; how ``semantiva dev lint`` operationalises them. - The embedded catalog (``contracts_catalog``) — the list of rules, including: - Rules around data and context typing. - Context key metadata (created/suppressed/injected keys). - Signature invariants such as ``SVA250`` (no ``ContextType`` in ``_process_logic``). As an architect you should: - Treat these rules as part of the **system's architecture**, not just implementation details. - Use them to derive **organisational conventions** (e.g. how components are allowed to see and modify context, how identity is attached to nodes). Step 5 - Connect Semantiva to your wider system ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Finally, map Semantiva onto your environment: - :doc:`../execution` — overview of orchestrators, executors and transports: where execution lives and how it can be embedded or driven externally. - :doc:`../run_space_lifecycle` and :doc:`../run_space_emission` — how runs are created, updated and closed, and how SER is emitted. - :doc:`../studio_viewer` — how Semantiva Studio Viewer consumes run and trace artefacts to provide interactive exploration. - :doc:`../development/testing_strategies` — how Semantiva components and pipelines fit into CI and system-level testing. At this point you should be able to sketch: - Where Semantiva sits relative to schedulers, workflow engines, data stores and observability tools. - Which boundaries you need to protect (e.g. what is allowed to mutate context; where identity is assigned and propagated). - How you would onboard new teams into Semantiva across the three personas (pipeline users, component authors, architects). --------------------------------------- Architecture overview for integrators --------------------------------------- This section summarises the Semantiva architecture from an integration point of view. Use it as a mental map while you read the detailed docs. Execution core ~~~~~~~~~~~~~~ - **Pipelines** are declared as graphs (see :doc:`../architecture/pipeline_schema`): - Nodes wrap processors (DataOperation, DataProbe, ContextProcessor). - Edges describe data flow and dependencies. - Pipelines are static artefacts, typically versioned alongside code. - **Execution** operates on a **Payload**: - ``Payload.data`` carries domain data. - ``Payload.context`` carries metadata and state. - Nodes never receive the context object directly in their processors' business logic; context is mediated via observers and contracts. Inspection & trace stack ~~~~~~~~~~~~~~~~~~~~~~~~ - For every node run, Semantiva emits a **Semantic Execution Record (SER)**: - Contains identity (run, pipeline, node). - Records parameter sources, context deltas, data summaries, timing and status. - Stored as JSONL (one SER per line). - SERs are then: - Turned into **trace streams** (for streaming analysis). - Aggregated by the **trace aggregator** into higher-level artefacts (e.g. run graphs, summaries). - Consumed by tools like **Semantiva Studio Viewer** to support exploration, debugging and reporting. Contracts & invariants ~~~~~~~~~~~~~~~~~~~~~~ - Semantiva's **contracts** (SVA rules) express **architectural invariants** such as: - How types are declared on processors. - How context keys are introduced, suppressed or injected. - How processors may (and may not) relate to context. - These rules are enforced by ``semantiva dev lint`` and should be treated as: - A **machine-checkable architecture spec**. - A key input when you define internal guidelines for Semantiva usage. Extension & integration points ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ As an architect, you will often look at: - **Component families** (documented in :doc:`../creating_components` and :doc:`../data_operations` / :doc:`../data_probes` / :doc:`../context_processors`) as extension points for domain logic. - **Registry** (:doc:`../architecture/registry`) as the way to manage and expose these components. - **Execution & transports** (:doc:`../execution`) as integration points for: - External schedulers / workflow engines. - Custom storage for SER and run spaces. - Organisation-specific tooling around traces and reports. ---------------------------------------------- Common tasks and where to look in the docs ---------------------------------------------- This section is a quick **router** for common architectural tasks and where to start in the documentation. Evaluate whether Semantiva fits a system design ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - **You want to:** decide how Semantiva would integrate into an existing or proposed architecture. - **Look at:** - :doc:`../inspection` and :doc:`../ser` (what execution metadata you get). - :doc:`../execution` (how execution is organised). - :doc:`../run_space` and :doc:`../run_space_lifecycle` (how runs and experiments are structured). Define organisational conventions for pipelines ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - **You want to:** standardise how teams design and structure pipelines. - **Look at:** - :doc:`../pipeline` and :doc:`../architecture/pipeline_schema` (pipeline structure and graph model). - :doc:`../contracts` (rules you can rely on as global invariants). - :doc:`../glossary` (shared terminology for documentation and reviews). Integrate SER & traces with observability tooling ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - **You want to:** feed Semantiva traces into existing monitoring/logging systems or custom dashboards. - **Look at:** - :doc:`../ser` and :doc:`../schema_semantic_execution_record_v1` (data model). - :doc:`../trace_stream_v1` and :doc:`../trace_aggregator_v1` (aggregation and streaming). - :doc:`../studio_viewer` (how one consumer visualises SER and trace data). Design extension points and internal libraries ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - **You want to:** create internal libraries of processors or domain-specific building blocks. - **Look at:** - :doc:`../creating_components` and :doc:`framework_developers` (authoring). - :doc:`../architecture/registry` (component registration & discovery). - :doc:`../contracts` (constraints and guarantees your libraries should uphold). Plan governance, CI and quality gates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - **You want to:** define how Semantiva fits into CI/CD and technical governance. - **Look at:** - :doc:`../contracts` and :doc:`../development/testing_strategies` (how to enforce contracts and write effective tests). - :doc:`../cli` (commands to run in CI: lint, inspection, tests). - :doc:`../logger` (logging behaviour, if you integrate with central logging).