Registry System¶
Semantiva’s registry system manages component discovery, registration, and resolution across the framework. This system consists of two main parts: a dual-registry architecture for component separation and a bootstrap profile system for reproducible component loading.
Architecture Overview¶
Semantiva uses a dual-registry system to manage different types of components while avoiding circular import dependencies. This architecture separates concerns between general component resolution and execution-specific component management.
Registry Components¶
ProcessorRegistry
Primary registry for data processors, context processors, workflow components, data collections, and fitting models. Handles dynamic class discovery from modules imported at runtime.
NameResolverRegistry
Stores prefix-based resolvers (rename:, delete:, stringbuild:, slicer:) that expand declarative YAML strings into processor classes.
ParameterResolverRegistry
Maintains resolvers that transform configuration values recursively before processor instantiation. Resolvers are applied to dict values, list/tuple items, and nested structures. For example, the model: specification used by fitting pipelines.
ExecutionComponentRegistry
Specialized registry for execution layer components: orchestrators, executors, and transports. Designed to avoid circular import dependencies with graph building and general class resolution.
Dependency Separation¶
The dual-registry architecture solves a fundamental circular import problem:
Graph builder needs ProcessorRegistry to resolve processor classes from YAML
Orchestrators need graph builder functions for canonical specs and pipeline IDs
ProcessorRegistry previously needed to import orchestrators for default registration
By introducing ExecutionComponentRegistry, we break this cycle:
Before (Circular):
ProcessorRegistry → LocalSemantivaOrchestrator → graph_builder → ProcessorRegistry
After (Clean):
ProcessorRegistry → graph_builder
ExecutionComponentRegistry → LocalSemantivaOrchestrator
orchestrator/factory → ExecutionComponentRegistry
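The clean graph relies on deferring execution-layer imports until registration time. A generic sketch of the lazy-import pattern (illustrative only; the class and import path below are hypothetical, not Semantiva's actual code):
# Illustrative sketch of the lazy-import pattern; names and paths are hypothetical.
class _ExampleExecutionRegistry:
    _orchestrators: dict[str, type] = {}

    @classmethod
    def initialize_defaults(cls) -> None:
        # Importing inside the method defers the dependency to call time,
        # so importing this registry module never pulls in the orchestrator
        # (and, through it, the graph builder) at import time.
        from semantiva.execution import LocalSemantivaOrchestrator  # hypothetical path

        cls._orchestrators.setdefault("LocalSemantivaOrchestrator", LocalSemantivaOrchestrator)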
Parameter Resolver System¶
Parameter resolvers provide a mechanism to transform configuration values before processor instantiation. This system enables dynamic configuration resolution, environment variable substitution, and complex parameter transformations.
Resolution Behavior¶
Parameters are recursively transformed before processor instantiation. Resolution applies to:
Dictionary values
List and tuple items
Nested structures (dictionaries within lists, lists within dictionaries, etc.)
Resolvers are run in registration order and should be pure and idempotent.
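The traversal itself lives inside the framework, but its effect can be illustrated with a minimal sketch that applies a single resolver to every nested value (illustrative only, not a Semantiva API):
from typing import Any, Callable

def resolve_recursively(value: Any, resolver: Callable[[Any], tuple[Any, bool]]) -> Any:
    # Illustrative sketch: apply one resolver, then recurse into containers.
    resolved, handled = resolver(value)
    if handled:
        return resolved
    if isinstance(value, dict):
        return {key: resolve_recursively(item, resolver) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(resolve_recursively(item, resolver) for item in value)
    return value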
Adding Custom Parameter Resolvers¶
To add a custom parameter resolver:
import os

from semantiva.registry.parameter_resolver_registry import ParameterResolverRegistry

def my_param_resolver(value):
    # Return (resolved_value, handled: bool)
    if isinstance(value, str) and value.startswith("myenv:"):
        env_var = value.split(":", 1)[1]
        return os.environ.get(env_var, ""), True
    return value, False

ParameterResolverRegistry.register_resolver("myenv", my_param_resolver)
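A quick check of the contract by calling the resolver directly (outside the registry; the environment variable is set only for the example):
import os

os.environ["DATABASE_URL"] = "postgresql://localhost:5432/mydb"

value, handled = my_param_resolver("myenv:DATABASE_URL")
assert handled is True and value == "postgresql://localhost:5432/mydb"

value, handled = my_param_resolver(42)
assert handled is False  # non-matching values fall through unchanged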
Built-in Resolvers¶
The framework provides several built-in parameter resolvers:
model:
Resolves model specifications into ModelDescriptor objects for fitting workflows.
Example: model:LinearRegression:param1=value1,param2=value2
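The spec string is the model name followed by comma-separated key=value pairs. A small, purely illustrative sketch of how such a string breaks down (this is not the framework's parser, which returns a ModelDescriptor):
def split_model_spec(spec: str) -> tuple[str, dict[str, str]]:
    # "model:LinearRegression:param1=value1,param2=value2" -> name + params
    _, name, raw_params = spec.split(":", 2)
    params = {}
    for pair in raw_params.split(","):
        key, _, value = pair.partition("=")
        params[key] = value
    return name, params

assert split_model_spec("model:LinearRegression:param1=value1,param2=value2") == (
    "LinearRegression",
    {"param1": "value1", "param2": "value2"},
)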
Resolver Function Interface¶
Parameter resolver functions must follow this interface:
from typing import Any

def parameter_resolver(value: Any) -> tuple[Any, bool]:
    """Transform a parameter value.

    Args:
        value: The input parameter value to potentially transform

    Returns:
        tuple: (resolved_value, was_handled)
            - resolved_value: The transformed value (or original if unchanged)
            - was_handled: True if this resolver processed the value, False otherwise
    """
If was_handled is True, the resolved value is used. If False, the original value is passed to the next resolver in the chain.
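The chain semantics can be sketched as follows (illustrative only; the actual dispatch is internal to ParameterResolverRegistry):
from typing import Any, Callable, Iterable

Resolver = Callable[[Any], tuple[Any, bool]]

def run_resolver_chain(value: Any, resolvers: Iterable[Resolver]) -> Any:
    # Resolvers run in registration order; the first one reporting
    # was_handled=True wins, otherwise the original value falls through.
    for resolver in resolvers:
        resolved, was_handled = resolver(value)
        if was_handled:
            return resolved
    return value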
Recursive Resolution Example¶
# Input parameters with nested structure
payload = {
    "database_url": "myenv:DATABASE_URL",
    "processing_config": {
        "batch_size": 100,
        "model_spec": "model:LinearRegression:learning_rate=0.01"
    },
    "file_paths": ["myenv:INPUT_DIR/file1.txt", "myenv:INPUT_DIR/file2.txt"]
}

# After recursive parameter resolution
resolved = {
    "database_url": "postgresql://localhost:5432/mydb",
    "processing_config": {
        "batch_size": 100,
        "model_spec": ModelDescriptor("sklearn.LinearRegression", {"learning_rate": 0.01})
    },
    "file_paths": ["/data/input/file1.txt", "/data/input/file2.txt"]
}
Bootstrap Profiles¶
The Registry v1 design introduces RegistryProfile
to make registry state
explicit, portable, and reproducible. This system tracks modules and extension
entry points that declare Semantiva components.
Key Concepts¶
RegistryProfile
Frozen dataclass capturing four attributes:
load_defaults
Whether to ensure the core Semantiva modules and built-in resolvers are loaded. Defaults to True.
modules
Python modules to import. Importing runs the Semantiva metaclass hooks, registering every component exposed by those modules.
extensions
Entry-point or module specifications that should be loaded via semantiva.registry.plugin_registry.load_extensions.
apply_profile(profile)
Applies load_defaults (idempotent) and then registers modules and extensions in that order.
current_profile()
Captures the current process registry and returns a RegistryProfile instance. The snapshot always enables load_defaults and returns the module history that has been applied.
fingerprint()
Produces a SHA-256 hash of a normalised representation of the profile. The fingerprint is pinned into every SER under why_ok.env.registry.fingerprint.
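A hedged sketch of building and applying a profile explicitly. The keyword arguments mirror the attributes listed above (any remaining attributes are assumed to have defaults), and fingerprint() is assumed to be callable on the profile instance:
from semantiva.registry.bootstrap import RegistryProfile, apply_profile

profile = RegistryProfile(
    load_defaults=True,
    modules=["my_extension.processors"],   # hypothetical module path
    extensions=["my_extension"],           # hypothetical entry-point name
)

apply_profile(profile)        # apply defaults, then modules, then extensions
print(profile.fingerprint())  # SHA-256 pinned under why_ok.env.registry.fingerprint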
Initialization Flow¶
Component registration follows a carefully orchestrated initialization sequence:
ProcessorRegistry.register_modules(DEFAULT_MODULES)
Registers core data processors, context processors, and fitting models
Ensures built-in resolvers (rename, delete, stringbuild, slicer, model) are available
Calls ExecutionComponentRegistry.initialize_defaults()
ExecutionComponentRegistry.initialize_defaults()
Imports execution components using lazy imports (no circular dependencies)
Registers default orchestrators, executors, and transports
Safe to call multiple times (idempotent)
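Because both calls are idempotent, they can be invoked defensively at process start; a minimal sketch using the example module from this page:
from semantiva.registry import ProcessorRegistry
from semantiva.execution.component_registry import ExecutionComponentRegistry

# Safe to repeat: re-registration and default initialization are no-ops
# once the components are already present.
ProcessorRegistry.register_modules(["semantiva.examples.test_utils"])
ExecutionComponentRegistry.initialize_defaults()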
Component Resolution¶
Different component types use their respective registries:
Data Processors (via resolve_symbol):
from semantiva.registry import ProcessorRegistry, resolve_symbol
# Ensure modules are registered (idempotent)
ProcessorRegistry.register_modules(["semantiva.examples.test_utils"])
processor_cls = resolve_symbol("FloatValueDataSource")
Execution Components (via ExecutionComponentRegistry):
from semantiva.execution.component_registry import ExecutionComponentRegistry
# Resolves orchestrators for factory
orch_cls = ExecutionComponentRegistry.get_orchestrator("LocalSemantivaOrchestrator")
Factory Integration¶
The build_orchestrator() function uses ExecutionComponentRegistry for component resolution:
from semantiva.execution.orchestrator.factory import build_orchestrator
from semantiva.configurations.schema import ExecutionConfig
config = ExecutionConfig(
    orchestrator="LocalSemantivaOrchestrator",
    executor="SequentialSemantivaExecutor",
    transport="InMemorySemantivaTransport"
)
orchestrator = build_orchestrator(config)
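Because build_orchestrator() resolves these names through ExecutionComponentRegistry rather than importing orchestrator classes directly, execution components registered by extensions (see Component Registration below) become selectable by name in ExecutionConfig without reintroducing the circular imports described above.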
Distributed Execution¶
QueueSemantivaOrchestrator.enqueue now accepts an optional registry_profile parameter. If omitted, the orchestrator captures the process state via current_profile(). The profile is attached to job metadata so that workers can replay the same registry configuration before constructing pipelines. YAML pipelines retain support for the extensions: key; apply_profile is executed before YAML parsing, and load_pipeline_from_yaml still loads any inline extensions declared in the file.
Programmatic Usage¶
Registry Profiles¶
from semantiva.registry.bootstrap import RegistryProfile, apply_profile, current_profile
# Capture the current process state (defaults, modules, and paths)
profile = current_profile()
# Launch a distributed job with an explicit profile
orchestrator.enqueue(pipeline_nodes, registry_profile=profile)
# Rehydrate a profile in a worker or a separate process
apply_profile(profile)
Component Registration¶
Custom Data Processors:
# Register custom processors via ProcessorRegistry
from semantiva.registry import ProcessorRegistry
ProcessorRegistry.register_modules(["my_extension.processors"])
Custom Execution Components:
# Register custom orchestrators
from semantiva.execution.component_registry import ExecutionComponentRegistry

ExecutionComponentRegistry.register_orchestrator(
    "CustomOrchestrator", MyCustomOrchestrator
)
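Once registered, the custom orchestrator can be selected by name through the factory shown earlier (MyCustomOrchestrator and its registered name are placeholders):
from semantiva.execution.orchestrator.factory import build_orchestrator
from semantiva.configurations.schema import ExecutionConfig

config = ExecutionConfig(
    orchestrator="CustomOrchestrator",        # name registered above
    executor="SequentialSemantivaExecutor",
    transport="InMemorySemantivaTransport"
)
orchestrator = build_orchestrator(config)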
Best Practices¶
Registry Selection: Use resolve_symbol/ProcessorRegistry for data/context processors and ExecutionComponentRegistry for execution components.
Initialization Order: Use apply_profile or ProcessorRegistry.register_modules to ensure required modules are loaded before constructing pipelines.
Lazy Imports: When adding new execution components, use lazy imports in initialize_defaults() to avoid circular dependencies.
Testing: Both registries provide clear() methods for test isolation (see the sketch after this list).
Profile Management: Use current_profile() to capture reproducible registry states for distributed execution.
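A minimal test-isolation sketch, assuming the clear() methods noted above reset each registry between tests:
import pytest

from semantiva.registry import ProcessorRegistry
from semantiva.execution.component_registry import ExecutionComponentRegistry

@pytest.fixture(autouse=True)
def isolated_registries():
    # Run the test, then wipe both registries so registrations don't leak.
    yield
    ProcessorRegistry.clear()
    ExecutionComponentRegistry.clear()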
Idempotent Defaults¶
register_builtin_resolvers() installs built-in name and parameter resolvers exactly once. Re-invoking it is safe and preserves any user-provided resolvers registered with NameResolverRegistry or ParameterResolverRegistry.
Migration Notes¶
The dual-registry architecture was introduced to resolve circular import issues
while maintaining backward compatibility. Existing code using the new ProcessorRegistry and resolve_symbol APIs continues to work unchanged.
Only the internal orchestrator factory implementation was modified to use the
execution registry explicitly.
The separation provides a foundation for future scalability, allowing independent evolution of data processing and execution layer components without coupling concerns.