Data Types¶
Data types are the vocabulary of your pipelines. They wrap raw Python objects (numbers, arrays, records, images…) into small, explicit classes that encode shape, units and semantics.
When every processor declares which data types it expects and produces, whole pipelines become easier to reason about and easier to validate.
What is a data type?¶
A data type is a subclass of BaseDataType. It is a thin wrapper around a
Python value with a well-defined meaning.
Key properties:
It encapsulates a single piece of data (.data).
It documents what the value *means* (units, bounds, interpretation).
It can enforce invariants through a validate hook.
Conceptually:
from semantiva.data_types import BaseDataType


class FloatDataType(BaseDataType[float]):
    """Simple float wrapper used in examples."""

    def validate(self, data: float) -> bool:
        if not isinstance(data, float):
            raise TypeError("Data must be a float")
        return True


value = FloatDataType(1.0)
print("value:", value.data)
print("repr:", value)

value: 1.0
repr: FloatDataType(1.0)
In this example:
FloatDataType is the semantic carrier: “this is a float used inside a Semantiva pipeline”.
The underlying value is available via .data.
The validate method is the hook for enforcing additional invariants.
BaseDataType API¶
All data types inherit from BaseDataType. The
core API is:
__init__(self, data, logger=None) - constructs the type, calls validate on the value and then stores it internally.
data property - gets or sets the underlying value.
validate(self, data) -> bool - hook for subclasses to enforce invariants.
__str__ / __repr__ - by default display ClassName(<data-repr>).
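The construction flow listed here can be sketched with a small stand-in class. This is not Semantiva's source: SketchDataType and EvenInt are hypothetical names, the logger argument is accepted but unused, and the setter stores the value without re-validating, which is an assumption that this page does not confirm.

```python
import logging
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class SketchDataType(Generic[T]):
    """Hypothetical stand-in that mimics the API listed above."""

    def __init__(self, data: T, logger: Optional[logging.Logger] = None):
        self.logger = logger   # accepted but unused in this sketch
        self.validate(data)    # validate first ...
        self._data = data      # ... then store

    @property
    def data(self) -> T:
        return self._data

    @data.setter
    def data(self, value: T) -> None:
        # Assumption: plain assignment; the real setter may behave differently.
        self._data = value

    def validate(self, data: T) -> bool:
        # Default hook accepts everything; subclasses override it.
        return True

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self._data!r})"

    __str__ = __repr__


class EvenInt(SketchDataType[int]):
    """Hypothetical subclass: only even integers are accepted."""

    def validate(self, data: int) -> bool:
        if data % 2 != 0:
            raise ValueError(f"{data} is not even")
        return True


print(EvenInt(4))
try:
    EvenInt(3)
except ValueError as exc:
    print("rejected:", exc)
```

Because validation runs before the value is stored, a rejected value never ends up inside an instance.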
The typical pattern is:
Subclass BaseDataType[T] with a concrete T (for example float, str, an array type, a record type).
Override validate to check invariants for that type.
Avoid overriding __init__; let BaseDataType own construction so that introspection, metadata and SVA rules remain consistent.
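One generic hazard of overriding __init__ is that an override which forgets to call super().__init__ silently skips validation. The sketch below uses a hypothetical stand-in base (SketchBase), not Semantiva's BaseDataType, so it illustrates only the Python mechanics, not the library's introspection or SVA rules.

```python
class SketchBase:
    """Hypothetical stand-in: validates in __init__, then stores the value."""

    def __init__(self, data):
        self.validate(data)
        self._data = data

    @property
    def data(self):
        return self._data

    def validate(self, data) -> bool:
        return True


class NonEmptyText(SketchBase):
    """Follows the pattern: only validate is overridden."""

    def validate(self, data: str) -> bool:
        if not isinstance(data, str) or not data:
            raise ValueError("text must be a non-empty string")
        return True


class CarelessText(NonEmptyText):
    """Anti-pattern: __init__ is overridden and never calls super().__init__."""

    def __init__(self, data):
        self._data = data  # validate() is never invoked


def accepts(cls, value) -> bool:
    """Return True if cls(value) constructs without raising ValueError."""
    try:
        cls(value)
        return True
    except ValueError:
        return False


print(accepts(NonEmptyText, ""))  # False: the invariant is enforced
print(accepts(CarelessText, ""))  # True: the invariant was bypassed
```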
Example: positive float¶
Sometimes you want to express a stronger invariant than “any float”. For example, “this value must be strictly positive”. The correct place to encode this is in the data type, not as ad-hoc checks scattered across processors.
Here is a minimal PositiveFloat implementation that enforces
positivity via validate:
from semantiva.data_types import BaseDataType


class PositiveFloat(BaseDataType[float]):
    """Strictly positive scalar float."""

    def validate(self, data: float) -> bool:
        # BaseDataType.__init__ will call this before storing ``data``.
        if data <= 0.0:
            raise ValueError(f"{data} is not positive")
        return True


ok = PositiveFloat(1.5)
print("ok:", ok)

# This will raise a ValueError at construction time:
try:
    bad = PositiveFloat(0.0)
except ValueError as exc:
    print("error:", exc)

ok: PositiveFloat(1.5)
error: 0.0 is not positive
Notes:
The invariant “strictly positive” is attached to the type, not to a specific processor or function.
Any processor that declares input_data_type() -> PositiveFloat is saying “I expect a strictly positive scalar float”, which is much clearer than accepting a plain float and relying only on docstrings.
Because we implemented the check in validate and did not override __init__, we keep the construction behaviour of BaseDataType intact.
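The comparison data <= 0.0 on its own accepts a few values that may be surprising: integers, True, float("inf") and float("nan") (every ordering comparison with NaN is false). If those matter in your domain, the body of validate can be tightened. The helper below is a standalone sketch with a hypothetical name, check_strictly_positive; whether to reject infinities or integers is a design choice, not something the library prescribes.

```python
import math


def check_strictly_positive(data) -> bool:
    """Raise unless data is a finite float greater than zero."""
    # bool and int are not subclasses of float, so True and 1 are rejected.
    if not isinstance(data, float):
        raise TypeError(f"expected float, got {type(data).__name__}")
    # math.isfinite is False for nan, inf and -inf.
    if not math.isfinite(data):
        raise ValueError(f"{data} is not finite")
    if data <= 0.0:
        raise ValueError(f"{data} is not positive")
    return True


for candidate in (1.5, 0.0, float("nan"), 1):
    try:
        check_strictly_positive(candidate)
        print(candidate, "-> ok")
    except (TypeError, ValueError) as exc:
        print(candidate, "-> rejected:", exc)
```

A validate override could simply return check_strictly_positive(data), which keeps the invariant in the type while making the accepted value set explicit.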
Using data types in processors¶
Data types become most useful when processors and pipelines use them consistently.
A typical data operation will:
Declare its input_data_type and output_data_type.
Accept instances of those types in its process method.
Return a new instance of the output type.
For example, consider a simple addition operation that works on floats:
from semantiva.data_types import BaseDataType
from semantiva.data_processors.data_processors import DataOperation


class FloatDataType(BaseDataType[float]):
    """Simple float wrapper (as above)."""

    def validate(self, data: float) -> bool:
        if not isinstance(data, float):
            raise TypeError("Data must be a float")
        return True


class FloatAddOperation(DataOperation):
    """Add a constant to FloatDataType data."""

    def _process_logic(self, data, addend: float):
        return FloatDataType(data.data + addend)

    @classmethod
    def input_data_type(cls):
        return FloatDataType

    @classmethod
    def output_data_type(cls):
        return FloatDataType


# Example usage
value = FloatDataType(1.0)
op = FloatAddOperation()
result = op.process(value, addend=2.0)
print("input:", value)
print("result:", result)

input: FloatDataType(1.0)
result: FloatDataType(3.0)
This combination of types and operations is what gives Semantiva pipelines their semantic clarity:
BaseDataType defines how to create and validate values.
Concrete types like FloatDataType and PositiveFloat express domain-specific invariants.
Data operations declare which types they consume and produce, making pipelines easier to inspect, validate and evolve.
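Because each operation exposes its input and output types as class methods, a chain of operations can be checked before any data flows through it. The sketch below is not Semantiva's own validation logic: check_chain, AddOp, FormatOp and the two marker types are hypothetical stand-ins that rely only on the input_data_type / output_data_type convention shown above.

```python
class FloatDataType:
    """Stand-in marker type (the real one wraps a float)."""


class TextDataType:
    """Second stand-in marker type, used to provoke a mismatch."""


class AddOp:
    """Hypothetical operation: float in, float out."""

    @classmethod
    def input_data_type(cls):
        return FloatDataType

    @classmethod
    def output_data_type(cls):
        return FloatDataType


class FormatOp:
    """Hypothetical operation: float in, text out."""

    @classmethod
    def input_data_type(cls):
        return FloatDataType

    @classmethod
    def output_data_type(cls):
        return TextDataType


def check_chain(operations):
    """Return a list of mismatch descriptions for consecutive operations."""
    problems = []
    for producer, consumer in zip(operations, operations[1:]):
        produced = producer.output_data_type()
        expected = consumer.input_data_type()
        # Exact-or-subclass match; a real framework may apply other rules.
        if not issubclass(produced, expected):
            problems.append(
                f"{producer.__name__} produces {produced.__name__}, "
                f"but {consumer.__name__} expects {expected.__name__}"
            )
    return problems


print(check_chain([AddOp, AddOp, FormatOp]))  # no mismatches
print(check_chain([AddOp, FormatOp, AddOp]))  # FormatOp -> AddOp mismatch
```

The check needs no data at all, only the declared types, which is what makes it possible to run it at pipeline-definition time.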
Next steps¶
See Data Operations for more on data operations.
See Data Probes for read-only probes that derive metrics from data.