Registry System

Semantiva’s registry system manages component discovery, registration, and resolution across the framework. This system consists of two main parts: a dual-registry architecture for component separation and a bootstrap profile system for reproducible component loading.

Architecture Overview

Semantiva uses a dual-registry system to manage different types of components while avoiding circular import dependencies. This architecture separates concerns between general component resolution and execution-specific component management.

Registry Components

ProcessorRegistry

Primary registry for data processors, context processors, workflow components, data collections, and fitting models. Handles dynamic class discovery from modules imported at runtime.

NameResolverRegistry

Stores prefix-based resolvers (rename:, delete:, stringbuild:, slicer:) that expand declarative YAML strings into processor classes.
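
As a rough illustration of prefix-based resolution, the sketch below maps a prefix to a factory that builds a class from the rest of the string. The names register_name_resolver, resolve_name, and make_rename, as well as the "rename:<old>:<new>" string format, are hypothetical stand-ins chosen for this example and are not Semantiva's actual API.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a name-resolver registry: maps a prefix to a
# factory that builds a processor class from the text following the prefix.
_NAME_RESOLVERS: Dict[str, Callable[[str], type]] = {}


def register_name_resolver(prefix: str, factory: Callable[[str], type]) -> None:
    _NAME_RESOLVERS[prefix] = factory


def resolve_name(spec: str) -> Optional[type]:
    """Return a class for a spec such as 'rename:a:b', or None if no prefix matches."""
    for prefix, factory in _NAME_RESOLVERS.items():
        if spec.startswith(prefix):
            return factory(spec[len(prefix):])
    return None


def make_rename(args: str) -> type:
    # Assumed spec format for this sketch: "<old_key>:<new_key>"
    old, new = args.split(":", 1)

    class Rename:
        def process(self, context: dict) -> dict:
            # Work on a copy so the caller's dictionary is not mutated
            context = dict(context)
            context[new] = context.pop(old)
            return context

    Rename.__name__ = f"Rename_{old}_to_{new}"
    return Rename


register_name_resolver("rename:", make_rename)
```

With this in place, resolve_name("rename:a:b") yields a class whose process method moves the value stored under "a" to "b", while a string with an unregistered prefix yields None so the caller can fall back to ordinary class lookup.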

ParameterResolverRegistry

Maintains resolvers that transform configuration values recursively before processor instantiation. Resolvers are applied to dict values, list/tuple items, and nested structures. One example is the model: specification used by fitting pipelines.

ExecutionComponentRegistry

Specialized registry for execution layer components: orchestrators, executors, and transports. Designed to avoid circular import dependencies with graph building and general class resolution.

Dependency Separation

The dual-registry architecture solves a fundamental circular import problem:

  • Graph builder needs ProcessorRegistry to resolve processor classes from YAML

  • Orchestrators need graph builder functions for canonical specs and pipeline IDs

  • ProcessorRegistry previously needed to import orchestrators for default registration

By introducing ExecutionComponentRegistry, we break this cycle:

Before (Circular):
ProcessorRegistry → LocalSemantivaOrchestrator → graph_builder → ProcessorRegistry

After (Clean):
graph_builder → ProcessorRegistry
ExecutionComponentRegistry → LocalSemantivaOrchestrator
orchestrator/factory → ExecutionComponentRegistry

Parameter Resolver System

Parameter resolvers provide a mechanism to transform configuration values before processor instantiation. This system enables dynamic configuration resolution, environment variable substitution, and complex parameter transformations.

Resolution Behavior

Parameters are recursively transformed before processor instantiation. Resolution applies to:

  • Dictionary values

  • List and tuple items

  • Nested structures (dictionaries within lists, lists within dictionaries, etc.)

Resolvers are run in registration order and should be pure and idempotent.
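
The traversal described above can be sketched as a small recursive function. This is a minimal illustration, not the framework's implementation: resolve_value and upper_resolver are hypothetical names, each resolver is assumed to return a (resolved_value, handled) pair, and stopping at the first resolver that handles a value is an assumption of this sketch.

```python
from typing import Any, Callable, List, Tuple

Resolver = Callable[[Any], Tuple[Any, bool]]


def resolve_value(value: Any, resolvers: List[Resolver]) -> Any:
    """Recursively resolve dict values, list/tuple items, and nested structures."""
    if isinstance(value, dict):
        return {k: resolve_value(v, resolvers) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_value(v, resolvers) for v in value]
    if isinstance(value, tuple):
        return tuple(resolve_value(v, resolvers) for v in value)
    # Leaf value: try resolvers in registration order.
    for resolver in resolvers:
        resolved, handled = resolver(value)
        if handled:
            return resolved
    # No resolver claimed the value, so it passes through unchanged.
    return value


def upper_resolver(value: Any) -> Tuple[Any, bool]:
    # Hypothetical resolver: expands "upper:<text>" to upper-case text.
    if isinstance(value, str) and value.startswith("upper:"):
        return value[len("upper:"):].upper(), True
    return value, False


params = {"name": "upper:abc", "items": ["upper:x", 1], "nested": {"t": ("upper:y",)}}
resolved = resolve_value(params, [upper_resolver])
```

Because upper_resolver is pure and leaves already-resolved values alone, running resolve_value a second time on its own output returns the same structure, which is the idempotence property recommended above.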

Adding Custom Parameter Resolvers

To add a custom parameter resolver:

import os

from semantiva.registry.parameter_resolver_registry import ParameterResolverRegistry

def my_param_resolver(value):
    # Return (resolved_value, handled: bool)
    if isinstance(value, str) and value.startswith("myenv:"):
        # Everything after the first ":" is the environment variable name
        env_var = value.split(":", 1)[1]
        # Unset variables resolve to an empty string
        return os.environ.get(env_var, ""), True
    return value, False

ParameterResolverRegistry.register_resolver("myenv", my_param_resolver)

Built-in Resolvers

The framework provides built-in parameter resolvers, including:

model:

Resolves model specifications into ModelDescriptor objects for fitting workflows.

Example: model:LinearRegression:param1=value1,param2=value2
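
To make the string format concrete, the sketch below splits a model: specification into a name and a parameter dictionary. It covers only the parsing step; the real resolver returns a ModelDescriptor, and the helper names parse_model_spec and _coerce, along with the int/float coercion rule, are assumptions made for this example.

```python
from typing import Any, Dict, Tuple


def _coerce(text: str) -> Any:
    # Assumed rule: try int, then float; otherwise keep the raw string.
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def parse_model_spec(spec: str) -> Tuple[str, Dict[str, Any]]:
    """Split 'model:<Name>:k1=v1,k2=v2' into (name, params)."""
    if not spec.startswith("model:"):
        raise ValueError(f"not a model spec: {spec!r}")
    body = spec[len("model:"):]
    # The parameter section after the second ":" is optional.
    name, _, raw_params = body.partition(":")
    params: Dict[str, Any] = {}
    for pair in filter(None, raw_params.split(",")):
        key, _, raw = pair.partition("=")
        params[key.strip()] = _coerce(raw.strip())
    return name, params


name, params = parse_model_spec("model:LinearRegression:learning_rate=0.01")
```

A resolver built on top of such a parser would return (descriptor, True) for strings starting with model: and (value, False) for everything else.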

Resolver Function Interface

Parameter resolver functions must follow this interface:

def parameter_resolver(value: Any) -> tuple[Any, bool]:
    """Transform a parameter value.

    Args:
        value: The input parameter value to potentially transform

    Returns:
        tuple: (resolved_value, was_handled)
            - resolved_value: The transformed value (or original if unchanged)
            - was_handled: True if this resolver processed the value, False otherwise
    """

If was_handled is True, the resolved value is used. If False, the original value is passed to the next resolver in the chain.

Recursive Resolution Example

# Input parameters with nested structure
payload = {
    "database_url": "myenv:DATABASE_URL",
    "processing_config": {
        "batch_size": 100,
        "model_spec": "model:LinearRegression:learning_rate=0.01"
    },
    "file_paths": ["myenv:INPUT_DIR/file1.txt", "myenv:INPUT_DIR/file2.txt"]
}

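# After recursive parameter resolution (illustrative values; assumes DATABASE_URL and
# INPUT_DIR are set and that the myenv: resolver expands only the name before the first "/")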
resolved = {
    "database_url": "postgresql://localhost:5432/mydb",
    "processing_config": {
        "batch_size": 100,
        "model_spec": ModelDescriptor("sklearn.LinearRegression", {"learning_rate": 0.01})
    },
    "file_paths": ["/data/input/file1.txt", "/data/input/file2.txt"]
}

Bootstrap Profiles

The Registry v1 design introduces RegistryProfile to make registry state explicit, portable, and reproducible. This system tracks modules and extension entry points that declare Semantiva components.

Key Concepts

RegistryProfile

Frozen dataclass capturing the following attributes:

load_defaults

Whether to ensure the core Semantiva modules and built-in resolvers are loaded. Defaults to True.

modules

Python modules to import. Importing runs the Semantiva metaclass hooks, registering every component exposed by those modules.

extensions

Entry-point or module specifications that should be loaded via semantiva.registry.plugin_registry.load_extensions.

apply_profile(profile)

Applies load_defaults (idempotent) and then registers modules and extensions in that order.

current_profile()

Captures the current process registry and returns a RegistryProfile instance. The snapshot always enables load_defaults and returns the module history that has been applied.

fingerprint()

Produces a SHA-256 hash of a normalised representation of the profile. The fingerprint is pinned into every SER under why_ok.env.registry.fingerprint.
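
The following sketch shows one way a frozen profile and a SHA-256 fingerprint could fit together. ProfileSketch is a hypothetical stand-in for RegistryProfile, and the normalisation used here (sorted, de-duplicated names serialised as canonical JSON) is an assumption; the source does not specify the real normalisation or whether module order affects the hash.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProfileSketch:
    """Hypothetical stand-in mirroring the attributes listed above."""

    load_defaults: bool = True
    modules: Tuple[str, ...] = ()
    extensions: Tuple[str, ...] = ()

    def fingerprint(self) -> str:
        # Assumed normalisation: sorted, de-duplicated names in canonical JSON.
        normalised = {
            "load_defaults": self.load_defaults,
            "modules": sorted(set(self.modules)),
            "extensions": sorted(set(self.extensions)),
        }
        payload = json.dumps(normalised, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two profiles that differ only in ordering and duplicates
a = ProfileSketch(modules=("pkg.b", "pkg.a"))
b = ProfileSketch(modules=("pkg.a", "pkg.b", "pkg.a"))
```

Under this normalisation a and b produce the same fingerprint, while adding or removing a module changes it, so two processes can compare hashes to check that they were bootstrapped with equivalent registries.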

Initialization Flow

Component registration follows a fixed initialization sequence:

  1. ProcessorRegistry.register_modules(DEFAULT_MODULES)

    • Registers core data processors, context processors, and fitting models

    • Ensures built-in resolvers (rename, delete, stringbuild, slicer, model) are available

    • Calls ExecutionComponentRegistry.initialize_defaults()

  2. ExecutionComponentRegistry.initialize_defaults()

    • Imports execution components using lazy imports (no circular dependencies)

    • Registers default orchestrators, executors, and transports

    • Safe to call multiple times (idempotent)
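
The lazy-import and idempotency properties of the second step can be sketched as follows. ExecutionRegistrySketch is a hypothetical class, and collections.OrderedDict merely stands in for an orchestrator class so that the example runs without Semantiva installed.

```python
import importlib
from typing import Dict


class ExecutionRegistrySketch:
    """Hypothetical stand-in for an execution-layer registry."""

    _orchestrators: Dict[str, type] = {}
    _initialized = False

    @classmethod
    def initialize_defaults(cls) -> None:
        if cls._initialized:
            # Idempotent: repeated calls are no-ops
            return
        # Lazy import: the module is loaded only when defaults are first
        # needed, not when this registry module itself is imported, so the
        # registry never pulls in orchestrator code at import time.
        module = importlib.import_module("collections")
        cls._orchestrators["DefaultOrchestrator"] = module.OrderedDict
        cls._initialized = True

    @classmethod
    def get_orchestrator(cls, name: str) -> type:
        # Lookups trigger default registration on first use
        cls.initialize_defaults()
        return cls._orchestrators[name]
```

Deferring the import into the method body is what keeps the registry module free of import-time dependencies on the components it registers.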

Component Resolution

Different component types use their respective registries:

Data Processors (via resolve_symbol):

from semantiva.registry import ProcessorRegistry, resolve_symbol

# Ensure modules are registered (idempotent)
ProcessorRegistry.register_modules(["semantiva.examples.test_utils"])
processor_cls = resolve_symbol("FloatValueDataSource")

Execution Components (via ExecutionComponentRegistry):

from semantiva.execution.component_registry import ExecutionComponentRegistry

# Resolves orchestrators for factory
orch_cls = ExecutionComponentRegistry.get_orchestrator("LocalSemantivaOrchestrator")

Factory Integration

The build_orchestrator() function uses ExecutionComponentRegistry for component resolution:

from semantiva.execution.orchestrator.factory import build_orchestrator
from semantiva.configurations.schema import ExecutionConfig

config = ExecutionConfig(
    orchestrator="LocalSemantivaOrchestrator",
    executor="SequentialSemantivaExecutor",
    transport="InMemorySemantivaTransport"
)

orchestrator = build_orchestrator(config)

Distributed Execution

QueueSemantivaOrchestrator.enqueue accepts an optional registry_profile parameter. If it is omitted, the orchestrator captures the process state via current_profile(). The profile is attached to job metadata so that workers can replay the same registry configuration before constructing pipelines. YAML pipelines keep their extensions: support. apply_profile is executed before YAML parsing, and load_pipeline_from_yaml still loads any inline extensions declared in the file.
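
A minimal sketch of the round trip between producer and worker is shown below. The JSON metadata layout, ProfileSketch, build_job_metadata, and profile_from_metadata are all hypothetical; the source only states that the profile travels with the job metadata, not how it is encoded.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProfileSketch:
    """Hypothetical stand-in for RegistryProfile."""

    load_defaults: bool = True
    modules: Tuple[str, ...] = ()
    extensions: Tuple[str, ...] = ()


def build_job_metadata(job_id: str, profile: ProfileSketch) -> str:
    """Producer side: serialise the profile alongside the job."""
    return json.dumps({"job_id": job_id, "registry_profile": asdict(profile)})


def profile_from_metadata(raw: str) -> ProfileSketch:
    """Worker side: rebuild the profile before constructing the pipeline."""
    data = json.loads(raw)["registry_profile"]
    # JSON turns tuples into lists, so convert back to keep the profile hashable
    return ProfileSketch(
        load_defaults=data["load_defaults"],
        modules=tuple(data["modules"]),
        extensions=tuple(data["extensions"]),
    )


sent = ProfileSketch(modules=("my_extension.processors",))
received = profile_from_metadata(build_job_metadata("job-1", sent))
```

In a real worker, the rebuilt profile would be passed to apply_profile before the pipeline is constructed.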

Programmatic Usage

Registry Profiles

from semantiva.registry.bootstrap import RegistryProfile, apply_profile, current_profile

# Capture the current process state (defaults and applied module history)
profile = current_profile()

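# Launch a distributed job with an explicit profile
# (orchestrator is a QueueSemantivaOrchestrator instance and pipeline_nodes is
# the pipeline node configuration; both are defined elsewhere)
orchestrator.enqueue(pipeline_nodes, registry_profile=profile)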

# Rehydrate a profile in a worker or a separate process
apply_profile(profile)

Component Registration

Custom Data Processors:

# Register custom processors via ProcessorRegistry
from semantiva.registry import ProcessorRegistry

ProcessorRegistry.register_modules(["my_extension.processors"])

Custom Execution Components:

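# Register custom orchestrators
# (MyCustomOrchestrator is a user-defined orchestrator class)
from semantiva.execution.component_registry import ExecutionComponentRegistry

ExecutionComponentRegistry.register_orchestrator(
    "CustomOrchestrator", MyCustomOrchestrator
)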

Best Practices

  1. Registry Selection: Use resolve_symbol/ProcessorRegistry for data/context processors and ExecutionComponentRegistry for execution components.

  2. Initialization Order: Use apply_profile or ProcessorRegistry.register_modules to ensure required modules are loaded before constructing pipelines.

  3. Lazy Imports: When adding new execution components, use lazy imports in initialize_defaults() to avoid circular dependencies.

  4. Testing: Both registries provide clear() methods for test isolation.

  5. Profile Management: Use current_profile() to capture reproducible registry states for distributed execution.

Idempotent Defaults

register_builtin_resolvers() installs built-in name and parameter resolvers exactly once. Re-invoking it is safe and preserves any user-provided resolvers registered with NameResolverRegistry or ParameterResolverRegistry.

Migration Notes

The dual-registry architecture was introduced to resolve circular import issues while maintaining backward compatibility. Existing code using the new ProcessorRegistry and resolve_symbol APIs continues to work unchanged. Only the internal orchestrator factory implementation was modified to use the execution registry explicitly.

The separation provides a foundation for future scalability, allowing independent evolution of data processing and execution layer components without coupling concerns.