ADR 0015: Description Data Normalization and Validation¶

Status¶

Implemented (2026-01-07)

Context¶

Problem¶

PR #2733 introduced _normalize_device_description in rpc_server.py to ensure CHILDREN is always an array when returning data to the CCU via listDevices(). However, this approach addresses the symptom (malformed output) rather than the root cause (malformed input).

The current architecture has multiple entry points for device and paramset descriptions:

DeviceDescription entry points:

1. XML-RPC callbacks (newDevices in RPCFunctions)
2. Backend queries (list_devices, get_device_description)
3. Cache persistence (load/save in DeviceDescriptionRegistry)

ParameterData/ParamsetDescription entry points:

4. Backend queries (get_paramset_description)
5. Cache persistence (load/save in ParamsetDescriptionRegistry)

Normalizing data at each exit point is error-prone and violates the DRY principle. Instead, data should be normalized at ingestion time, following the "Parse, don't validate" pattern.

Root Cause¶

Different backends (CCU, Homegear, JSON-RPC) return inconsistent data:

CHILDREN may be None, "" (empty string), or ["addr"] (array)
OPERATIONS may be String "3" instead of Integer 3
FLAGS may be String instead of Integer
PARAMSETS may be missing (should default to ["MASTER", "VALUES"])

Decision¶

Implement a comprehensive normalization and validation layer using voluptuous that:

Normalizes data at all ingestion points (not at output)
Uses schema versioning for cache migration
Provides consistent TypedDict structures throughout the codebase

Key Principle: "Parse, Don't Validate"¶

Transform malformed input into well-formed data structures once at ingestion, ensuring the rest of the codebase always works with correct data.
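The principle can be illustrated with a small parser that turns a raw backend dict into a typed structure once, so that later code never re-checks field shapes. This is a minimal sketch under stated assumptions: `DeviceDescriptionSketch`, `parse_device_description`, and `count_children` are hypothetical names, the field subset is reduced for brevity, and the sample values are placeholders rather than real device data.

```python
from typing import Any, TypedDict


class DeviceDescriptionSketch(TypedDict):
    """Illustrative subset of a device description (not the project's TypedDict)."""

    ADDRESS: str
    TYPE: str
    CHILDREN: list[str]
    PARAMSETS: list[str]


def parse_device_description(raw: dict[str, Any]) -> DeviceDescriptionSketch:
    """Parse raw backend data into a well-formed structure exactly once."""
    children = raw.get("CHILDREN")
    if children is None or children == "":
        children = []
    elif isinstance(children, str):
        children = [children]

    paramsets = raw.get("PARAMSETS")
    if paramsets is None:
        paramsets = ["MASTER", "VALUES"]

    return DeviceDescriptionSketch(
        ADDRESS=str(raw["ADDRESS"]),
        TYPE=str(raw["TYPE"]),
        CHILDREN=[str(child) for child in children],
        PARAMSETS=[str(name) for name in paramsets],
    )


def count_children(description: DeviceDescriptionSketch) -> int:
    """Downstream code relies on CHILDREN being a list, with no re-checking."""
    return len(description["CHILDREN"])


parsed = parse_device_description({"ADDRESS": "dev1", "TYPE": "DEMO-TYPE", "CHILDREN": ""})
print(count_children(parsed))  # 0
print(parsed["PARAMSETS"])     # ['MASTER', 'VALUES']
```

Because the malformed shapes are resolved inside the parser, `count_children` needs no `None` or string checks of its own.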

Architecture¶

Normalization Flow¶

┌──────────────────────┐
│ Backend / Cache      │
│ (may have bad data)  │
└──────────┬───────────┘
           ↓
┌──────────────────────────────────┐
│ Normalization (schemas.py)       │
│ - Type coercion                  │
│ - Default values                 │
│ - Field validation               │
└──────────┬───────────────────────┘
           ↓
┌──────────────────────────────────┐
│ Cache Storage                    │
│ (stores normalized data)         │
└──────────┬───────────────────────┘
           ↓
┌──────────────────────────────────┐
│ Model Layer                      │
│ (always receives valid data)     │
└──────────────────────────────────┘

Ingestion Points¶

Entry Point	Location	Normalization Applied
Backend: list_devices()	DeviceHandler	normalize_device_description
Backend: get_device_description()	DeviceHandler	normalize_device_description
Callback: newDevices()	RPCFunctions	normalize_device_description
Cache load: DeviceDescriptionRegistry	DeviceDescriptionRegistry	normalize_device_description
Backend: get_paramset_description()	DeviceHandler	normalize_paramset_description
Cache load: ParamsetDescriptionRegistry	ParamsetDescriptionRegistry	normalize_paramset_description

Normalization Strategy¶

DeviceDescription Normalization¶

Key Normalizations:

| Field | Input Types | Normalization Rule | Output Type |
| ----------- | ---------------- | ----------------------------- | ------------- |
| CHILDREN | None, "", String | → [] or [string] | list[str] |
| PARAMSETS | None, missing | → ["MASTER", "VALUES"] | list[str] |
| RX_MODE | String, None | → Coerce to int, default None | int \| None |
| FLAGS | String | → Coerce to int | int |
| TYPE | Any | → String (required) | str |
| ADDRESS | Any | → String (required) | str |

Special Handling:

extra=vol.ALLOW_EXTRA - Backend-specific fields preserved
Fallback on validation failure - Minimal fix applied (e.g., ensure CHILDREN is array)
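The fallback behaviour described above could look roughly like the following. It is a plain-Python sketch rather than the voluptuous-based implementation; `strict_normalize`, `normalize_with_fallback`, and `_children_as_list` are hypothetical names, and the choice of which exceptions trigger the fallback is an assumption.

```python
import logging
from typing import Any

_LOGGER = logging.getLogger(__name__)


def _children_as_list(value: Any) -> list[str]:
    """Coerce None, "", or a bare string into a list of addresses."""
    if value is None or value == "":
        return []
    if isinstance(value, str):
        return [value]
    return [str(child) for child in value]


def strict_normalize(raw: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical strict pass: raises if a field is missing or cannot be coerced."""
    result = dict(raw)  # unknown backend-specific keys are carried over unchanged
    result["ADDRESS"] = str(raw["ADDRESS"])  # KeyError if missing
    result["TYPE"] = str(raw["TYPE"])  # KeyError if missing
    if raw.get("FLAGS") is not None:
        result["FLAGS"] = int(raw["FLAGS"])  # ValueError for text such as "n/a"
    result["CHILDREN"] = _children_as_list(raw.get("CHILDREN"))
    return result


def normalize_with_fallback(raw: dict[str, Any]) -> dict[str, Any]:
    """Try the strict pass; on failure apply only the minimal CHILDREN fix."""
    try:
        return strict_normalize(raw)
    except (KeyError, TypeError, ValueError) as exc:
        _LOGGER.debug("Strict normalization failed, using fallback: %s", exc)
        fallback = dict(raw)
        fallback["CHILDREN"] = _children_as_list(raw.get("CHILDREN"))
        return fallback


good = normalize_with_fallback({"ADDRESS": "dev1", "TYPE": "demo", "FLAGS": "1", "VENDOR_X": 7})
bad = normalize_with_fallback({"ADDRESS": "dev2", "TYPE": "demo", "FLAGS": "n/a", "CHILDREN": None})
print(good["FLAGS"], good["VENDOR_X"], good["CHILDREN"])  # 1 7 []
print(bad["FLAGS"], bad["CHILDREN"])  # n/a []
```

The second call shows the trade-off: the uncoercible `FLAGS` value survives unchanged, but `CHILDREN` is still an array, so consumers that iterate over it keep working.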

ParameterData Normalization¶

Key Normalizations:

| Field | Input Types | Normalization Rule | Output Type |
| ------------ | ------------ | -------------------------------------------- | -------------- |
| TYPE | String | → Validate against known types, uppercase | str |
| OPERATIONS | String, None | → Integer bitmask (1=Read, 2=Write, 4=Event) | int |
| FLAGS | String, None | → Integer bitmask (0x01=Visible, etc.) | int |
| MAX | String | → Coerce to appropriate numeric type | int \| float |
| MIN | String | → Coerce to appropriate numeric type | int \| float |

Valid Parameter Types: FLOAT, INTEGER, BOOL, ENUM, STRING, ACTION, DUMMY
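The rules in the table can be sketched in plain Python as follows. `normalize_parameter_sketch` is a hypothetical stand-in, not the project's `normalize_parameter_data`. Two details are assumptions that the table does not spell out: a missing bitmask becomes 0, and "appropriate numeric type" is read as float for FLOAT parameters and int for INTEGER parameters.

```python
from typing import Any

VALID_TYPES = frozenset({"FLOAT", "INTEGER", "BOOL", "ENUM", "STRING", "ACTION", "DUMMY"})


def normalize_parameter_sketch(raw: dict[str, Any]) -> dict[str, Any]:
    """Illustrative stand-in, not the project's normalize_parameter_data."""
    result = dict(raw)

    param_type = str(raw.get("TYPE", "")).upper()
    if param_type not in VALID_TYPES:
        raise ValueError(f"Unknown parameter type: {param_type!r}")
    result["TYPE"] = param_type

    # Assumption: a missing or empty bitmask becomes 0.
    for field in ("OPERATIONS", "FLAGS"):
        result[field] = int(raw.get(field) or 0)

    # Assumption: FLOAT bounds become float and INTEGER bounds become int;
    # bounds of other parameter types are left untouched.
    caster = {"FLOAT": float, "INTEGER": int}.get(param_type)
    if caster is not None:
        for bound in ("MIN", "MAX"):
            if isinstance(raw.get(bound), str):
                result[bound] = caster(raw[bound])
    return result


level = normalize_parameter_sketch(
    {"TYPE": "float", "OPERATIONS": "7", "FLAGS": "1", "MIN": "0.0", "MAX": "1.0"}
)
print(level["TYPE"], level["OPERATIONS"], level["MIN"], level["MAX"])  # FLOAT 7 0.0 1.0
```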

Schema Versioning¶

Cache Migration Strategy¶

To handle schema changes without breaking existing caches:

class BasePersistentCache:
    SCHEMA_VERSION: int = 2  # Bump when normalization changes

    async def load(self) -> DataOperationResult:
        data = await self._storage.load()
        loaded_version = data.get("_schema_version", 1)

        if loaded_version < self.SCHEMA_VERSION:
            data = self._migrate_schema(data, from_version=loaded_version)
        # ...

Current Schema Versions:

DeviceDescriptionRegistry: Version 2 (CHILDREN normalization)
ParamsetDescriptionRegistry: Version 2 (OPERATIONS/FLAGS integer coercion)

Migration Example¶

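# V1 → V2 migration for DeviceDescriptionRegistry
def _migrate_schema(self, data, from_version):
    if from_version < 2:
        # Normalize all CHILDREN fields
        for interface_id, descriptions in data.items():
            if not isinstance(descriptions, list):
                continue  # skip metadata entries such as "_schema_version"
            for desc in descriptions:
                children = desc.get("CHILDREN")
                if children is None or children == "":
                    desc["CHILDREN"] = []
                elif isinstance(children, str):
                    # Keep a bare address instead of discarding it
                    desc["CHILDREN"] = [children]
    return data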
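If further schema versions are added later, the migration can be organised as a chain of single-version steps. The following is a sketch of that idea, not the project's code: `MIGRATIONS`, `upgrade`, and `_v1_to_v2` are hypothetical names, the payload layout (interface id mapped to a list of descriptions) is taken from the example above, and stamping `_schema_version` after the upgrade is an assumption about how the one-time behaviour is achieved.

```python
from typing import Any, Callable

VERSION_KEY = "_schema_version"
SCHEMA_VERSION = 2

Payload = dict[str, Any]


def _v1_to_v2(data: Payload) -> Payload:
    """Turn None, "", or a bare string in CHILDREN into a list."""
    for key, descriptions in data.items():
        if key == VERSION_KEY:
            continue
        for desc in descriptions:
            children = desc.get("CHILDREN")
            if children is None or children == "":
                desc["CHILDREN"] = []
            elif isinstance(children, str):
                desc["CHILDREN"] = [children]
    return data


# Maps a stored version to the step that upgrades it by exactly one version.
MIGRATIONS: dict[int, Callable[[Payload], Payload]] = {1: _v1_to_v2}


def upgrade(data: Payload) -> Payload:
    """Run all pending steps in order, then record the current version."""
    version = data.get(VERSION_KEY, 1)
    while version < SCHEMA_VERSION:
        data = MIGRATIONS[version](data)
        version += 1
    data[VERSION_KEY] = SCHEMA_VERSION
    return data


cache = {"demo-interface": [{"ADDRESS": "dev1", "CHILDREN": None}]}
cache = upgrade(cache)
print(cache[VERSION_KEY], cache["demo-interface"][0]["CHILDREN"])  # 2 []
```

Once the version is stamped, a second call to `upgrade` skips the loop entirely, which is what keeps the migration a one-time cost.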

Consequences¶

Positive¶

✅ Single Source of Truth: Data is correct from the moment it enters the system
✅ Defense in Depth: Multiple validation points ensure data integrity
✅ Cache Efficiency: Schema versioning allows one-time migration
✅ Reduced Complexity: No need for exit-point normalization
✅ Type Safety: Consistent TypedDict structures throughout codebase
✅ Extensibility: Easy to add new normalizations/validations

Negative¶

⚠️ Slight Overhead: Validation at each ingestion point adds minimal CPU cost
⚠️ Cache Invalidation: Schema version bump requires cache reload (one-time)
⚠️ Complexity: Additional module and migration logic

Risks and Mitigations¶

Risk	Mitigation
Overly strict validation	Use `ALLOW_EXTRA`; on validation failure, fall back to the raw data with a minimal fix applied
Performance impact	Simple dict operations, minimal overhead
Breaking existing functionality	Fallback logic preserves minimal functionality

Alternatives Considered¶

Alternative 1: Output Normalization Only¶

Normalize data at each output point (listDevices, etc.).

Rejected: Error-prone, violates DRY, multiple points of failure.

Alternative 2: Inline Normalization in Caches¶

Normalize within each cache's add/load methods without centralized schemas.

Rejected: Duplicated logic, harder to maintain, inconsistent rules.

Alternative 3: No Normalization (Trust Backend)¶

Assume backend always returns correct data.

Rejected: Real-world data shows multiple backends return inconsistent formats.

Implementation¶

Status: ✅ Implemented in version 2026.1.7

Core Module¶

aiohomematic/schemas.py - Validation and normalization schemas

Public API:

def normalize_device_description(device_description: dict[str, Any]) -> dict[str, Any]:
    """Normalize a device description dict."""

def normalize_parameter_data(parameter_data: dict[str, Any]) -> dict[str, Any]:
    """Normalize a parameter data dict (ParameterDescription)."""

def normalize_paramset_description(paramset: dict[str, Any] | None) -> dict[str, dict[str, Any]]:
    """Normalize a paramset description dict."""

Schema Definitions:

DEVICE_DESCRIPTION_SCHEMA - voluptuous schema for DeviceDescription
PARAMETER_DATA_SCHEMA - voluptuous schema for ParameterData
Custom normalizers: _normalize_children(), _normalize_paramsets(), _normalize_operations(), etc.
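The signatures in the public API above suggest that paramset normalization applies parameter normalization to each entry. The sketch below illustrates that composition with hypothetical stand-ins (`normalize_parameter_stub`, `normalize_paramset_sketch`). Returning an empty dict for `None` is an assumption inferred from the `dict | None` parameter type and the non-optional return type, and the parameter name `LEVEL` is only sample data.

```python
from typing import Any, Optional


def normalize_parameter_stub(parameter_data: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for normalize_parameter_data: coerces OPERATIONS and FLAGS only."""
    result = dict(parameter_data)
    for field in ("OPERATIONS", "FLAGS"):
        if result.get(field) is not None:
            result[field] = int(result[field])
    return result


def normalize_paramset_sketch(
    paramset: Optional[dict[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Normalize every parameter in a paramset; None yields an empty dict (assumed)."""
    if not paramset:
        return {}
    return {name: normalize_parameter_stub(data) for name, data in paramset.items()}


values = normalize_paramset_sketch(
    {"LEVEL": {"TYPE": "FLOAT", "OPERATIONS": "7", "FLAGS": "1"}}
)
print(values["LEVEL"]["OPERATIONS"])    # 7
print(normalize_paramset_sketch(None))  # {}
```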

Updated Caches¶

aiohomematic/store/persistent/device.py

DeviceDescriptionRegistry.SCHEMA_VERSION = 2
Normalization in add_device() and _process_loaded_content()
Migration logic in _migrate_schema()

aiohomematic/store/persistent/paramset.py

ParamsetDescriptionRegistry.SCHEMA_VERSION = 2
Normalization in add() and _process_loaded_content()
Migration logic in _migrate_schema()

Integration¶

Backend queries (aiohomematic/client/handlers/device_ops.py):

async def list_devices(self) -> tuple[DeviceDescription, ...] | None:
    raw_descriptions = await self._proxy_read.listDevices()
    return tuple(normalize_device_description(desc) for desc in raw_descriptions)

XML-RPC callbacks (aiohomematic/central/rpc_server.py):

def newDevices(self, interface_id: str, device_descriptions: list[dict[str, Any]], /) -> None:
    normalized = tuple(normalize_device_description(desc) for desc in device_descriptions)
    # ...

Removed:

_normalize_device_description() from rpc_server.py (no longer needed)

API Specification References¶

Normalization follows official HomeMatic API specifications:

HM_XmlRpc_API.pdf V2.16: Primary HomeMatic XML-RPC specification
HMIP_XmlRpc_API_Addendum.pdf V2.10: HomeMatic IP extensions

Key API Requirements:

CHILDREN must be Array<String> (never None or empty string)
PARAMSETS must be Array<String> (defaults to ["MASTER", "VALUES"])
OPERATIONS is Integer bitmask: 1=Read, 2=Write, 4=Event
FLAGS is Integer bitmask: 0x01=Visible, 0x02=Internal, 0x04=Transform, etc.
RX_MODE is Integer bitmask (CCU2/CCU3)
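Once OPERATIONS and FLAGS are guaranteed to be integers, the bitmasks can be decoded with `enum.IntFlag`. This is a minimal sketch: the class and function names are hypothetical, only the bit values listed above are included, and further FLAGS bits covered by "etc." are deliberately left out rather than guessed.

```python
from enum import IntFlag


class Operations(IntFlag):
    """OPERATIONS bits as listed above."""

    READ = 1
    WRITE = 2
    EVENT = 4


class Flags(IntFlag):
    """FLAGS bits as listed above; remaining bits are omitted in this sketch."""

    VISIBLE = 0x01
    INTERNAL = 0x02
    TRANSFORM = 0x04


def is_writable(operations: int) -> bool:
    """Return True if the WRITE bit is set in a normalized OPERATIONS value."""
    return bool(Operations(operations) & Operations.WRITE)


def is_visible(flags: int) -> bool:
    """Return True if the VISIBLE bit is set in a normalized FLAGS value."""
    return bool(Flags(flags) & Flags.VISIBLE)


print(is_writable(3))  # True  (READ | WRITE)
print(is_writable(5))  # False (READ | EVENT)
print(is_visible(1))   # True
```

These helpers would raise on a string such as "3", which is one reason the coercion has to happen at ingestion rather than at each call site.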

References¶

PR #2733: _normalize_device_description - Original motivation
ADR 0011: Storage Abstraction - Cache architecture
voluptuous Documentation - Schema validation library
HM_XmlRpc_API.pdf V2.16 - HomeMatic XML-RPC API Specification (available from eQ-3)
HMIP_XmlRpc_API_Addendum.pdf V2.10 - HomeMatic IP Extensions (available from eQ-3)

Created: 2026-01-07
Author: Architecture Review