ADR 0021: Blind Command Processing Lock and Target Preservation

Status

Accepted (2026-02-04)

Context

Problem

Homematic blind/cover devices exhibit a firmware bug: sending new positioning commands while the device is physically in motion causes undefined behavior. Commands may be silently ignored, the blind may stop at an incorrect position, or tilt adjustments may fail to apply.

This creates a critical race condition in Home Assistant, where users commonly issue separate level and tilt commands in rapid succession or change tilt while the blind is still moving to a target level.

Scenario

T=0s:   User sends set_position(level=50%)
        Device starts moving (takes ~5 seconds)

T=2s:   User sends set_tilt(tilt=30%) while blind is still moving
        Without protection: Command may be ignored or corrupt the level movement

Expected Behavior

The blind must reach both targets:

- Level: 50% (from the first command)
- Tilt: 30% (from the second command)

Overlapping commands must not leave the blind at an incorrect final position.

Device Variants

The problem applies specifically to blind actors (devices with both level and tilt control). Simple covers (level only), window drives, and garage doors are not affected because they have only a single axis of movement.

| Class | Level | Tilt | Lock Required |
| --- | --- | --- | --- |
| `CustomDpCover` | Yes | No | No |
| `CustomDpWindowDrive` | Yes | No | No |
| `CustomDpBlind` | Yes | Yes | Yes |
| `CustomDpIpBlind` | Yes | Yes | Yes (inherited) |
| `CustomDpGarage` | Yes | No | No |

Decision

Implement a per-instance asyncio.Lock (_command_processing_lock) in CustomDpBlind that serializes command processing and preserves pending targets when new commands arrive during movement. The mechanism uses a stop-then-resend strategy to work around the device firmware bug.

Core Mechanism

The _set_level() method in CustomDpBlind implements a three-phase protocol:

Phase 1 -- Lock Acquisition

Acquire the per-instance lock with a 5-second timeout (_COMMAND_LOCK_TIMEOUT). If the timeout expires, proceed without the lock and log a warning. This prevents deadlocks while accepting a small risk of race conditions in edge cases.
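A minimal sketch of this acquire-or-proceed pattern, assuming `asyncio.wait_for` is used to bound the wait. The helper name `run_with_lock_timeout` and the `demo` coroutine are illustrative and not taken from the aiohomematic code base.

```python
import asyncio
import logging

_LOGGER = logging.getLogger(__name__)
_COMMAND_LOCK_TIMEOUT = 5.0  # seconds, as stated in this ADR


async def run_with_lock_timeout(lock, action, timeout=_COMMAND_LOCK_TIMEOUT):
    """Run `action` under `lock`; if the lock stays busy, run it unprotected."""
    acquired = False
    try:
        await asyncio.wait_for(lock.acquire(), timeout=timeout)
        acquired = True
    except asyncio.TimeoutError:
        _LOGGER.warning("Lock not acquired within %.2fs, proceeding without it", timeout)
    try:
        return acquired, await action()
    finally:
        if acquired:  # release only what we actually hold
            lock.release()


async def demo():
    lock = asyncio.Lock()
    await lock.acquire()  # simulate an earlier command stuck in a slow RPC

    async def action():
        return "sent"

    result = await run_with_lock_timeout(lock, action, timeout=0.01)
    lock.release()
    return result
```

Tracking `acquired` separately matters here: releasing a lock that was never obtained would hand it away from the command that still holds it.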

Phase 2 -- Target Resolution

For each axis (level and tilt), determine the target value using a three-tier fallback:

1. Explicit value: If the caller provides a value, use it
2. Pending target: If a previous command is still unconfirmed (_target_level / _target_tilt_level), reuse that target and mark currently_moving = True
3. Current position: If the device is at standstill, use the confirmed position (_group_level / _group_tilt_level)
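The fallback for a single axis can be sketched as a pure function. The name `resolve_target` and its signature are hypothetical; the real logic lives inside _set_level() and may differ in detail.

```python
from typing import Optional, Tuple


def resolve_target(
    explicit: Optional[float],
    pending: Optional[float],
    confirmed: float,
) -> Tuple[float, bool]:
    """Return (target, currently_moving) for one axis (level or tilt)."""
    if explicit is not None:
        return explicit, False  # tier 1: the caller supplied a value
    if pending is not None:
        return pending, True  # tier 2: an unconfirmed target implies movement
    return confirmed, False  # tier 3: device at standstill
```

For the scenario in this ADR, `resolve_target(None, 0.5, 0.2)` yields the preserved level 0.5 with the moving flag set, while `resolve_target(0.3, None, 0.0)` yields the explicit tilt 0.3.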

Phase 3 -- Stop and Resend

If currently_moving is set, stop the device first via _stop(), then call _send_level() with both the preserved level and the new tilt (or vice versa). This works around the firmware bug by ensuring the device is stationary before it receives new targets.

_set_level(level=None, tilt_level=0.3)
    |
    ├─ level=None → check _target_level
    │   └─ _target_level=0.5 (pending!) → _level=0.5, currently_moving=True
    │
    ├─ tilt_level=0.3 (explicit) → _tilt_level=0.3
    │
    ├─ currently_moving=True → _stop()
    │
    └─ _send_level(level=0.5, tilt_level=0.3)  ← combined command

Target Detection

The _target_level and _target_tilt_level properties detect pending (unconfirmed) commands using a two-tier approach:

1. Optimistic value (preferred): Check if the data point has an optimistic value set (from the optimistic updates system in ADR 0020)
2. CommandTracker fallback: Query unconfirmed_last_value_send from the command tracker when optimistic updates are disabled

A target is cleared when the CCU confirms the value via an event, at which point the optimistic value is resolved and the command tracker entry expires.
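As an illustration of the two-tier lookup, the sketch below uses a hypothetical `LevelDataPoint` stand-in. Only the attribute names `is_optimistic` and `unconfirmed_last_value_send` come from this ADR; the class shape and the `pending_target` helper are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LevelDataPoint:
    """Hypothetical stand-in for a level data point; not the aiohomematic class."""

    value: Optional[float] = None
    is_optimistic: bool = False
    unconfirmed_last_value_send: Optional[float] = None


def pending_target(dp: LevelDataPoint) -> Optional[float]:
    """Return the unconfirmed target, or None when nothing is pending."""
    if dp.is_optimistic and dp.value is not None:
        return dp.value  # tier 1: optimistic value set by a recent command
    return dp.unconfirmed_last_value_send  # tier 2: tracker entry, None once confirmed
```

A return value of None corresponds to the standstill case, in which the caller falls back to the confirmed position.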

Command Transmission

Homematic blind devices support two transmission modes, selected automatically:

- Combined parameter (LEVEL_COMBINED): A single RPC call encoding both level and tilt as a hex-encoded combined value (e.g., 0xa2,0x26). Used by BidCos-RF devices that expose a COMBINED_PARAMETER data point.
- Separate parameters: Two sequential RPC calls for LEVEL_SLATS (tilt) followed by LEVEL. Used when no combined parameter is available.

The combined parameter path bypasses the collector to ensure atomic delivery.
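The hex form can be illustrated with a small encoder. The scaling is an assumption: each axis is mapped from 0.0..1.0 to 0..200 before hex formatting, which is consistent with the example value 0xa2,0x26 for a level of 0.81 and a tilt of 0.19. The function name `encode_combined` is hypothetical.

```python
def encode_combined(level: float, tilt_level: float) -> str:
    """Encode level and tilt (each 0.0..1.0) as '0xNN,0xNN'.

    Assumption: each axis is scaled to the range 0..200 before formatting.
    """

    def to_hex(value: float) -> str:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"value out of range: {value}")
        # "#04x" pads to two hex digits and keeps the 0x prefix.
        return format(round(value * 200), "#04x")

    return f"{to_hex(level)},{to_hex(tilt_level)}"
```

Because both axes travel in one string, the device receives them in a single RPC call, which is what makes the combined path atomic from the client's point of view.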

Lock Scope

The lock protects three operations:

| Operation | Method | Lock Held |
| --- | --- | --- |
| Set level/tilt | `_set_level()` | Yes |
| Stop | `stop()` | Yes |
| Internal stop | `_stop()` | Caller must hold lock |

All public entry points (set_position, open, close, open_tilt, close_tilt, stop) route through either _set_level() or stop(), both of which acquire the lock.
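The split between stop() and _stop() follows from asyncio.Lock not being reentrant: a coroutine that already holds the lock would wait on itself if it tried to acquire it again. The sketch below shows the pattern with a hypothetical `LockedBlind` class; the guard in `_stop()` is an illustrative addition and only checks that the lock is held by someone.

```python
import asyncio


class LockedBlind:
    """Hypothetical sketch of the lock scope; not the aiohomematic class."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self.calls = []

    async def _stop(self) -> None:
        # Internal stop: never acquires the lock itself.
        if not self._lock.locked():
            raise RuntimeError("_stop() requires the caller to hold the lock")
        self.calls.append("stop")

    async def stop(self) -> None:
        async with self._lock:  # public entry point acquires the lock
            await self._stop()

    async def _set_level(self) -> None:
        async with self._lock:
            await self._stop()  # self.stop() here would wait on our own lock
            self.calls.append("send")


async def demo():
    blind = LockedBlind()
    await blind.stop()
    await blind._set_level()
    return blind.calls
```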

Use Cases

Use Case 1: Tilt Change During Level Movement

T=0.0s:  set_position(position=50)
         → Lock acquired, _send_level(level=0.5, tilt_level=current)
         → _target_level = 0.5 (unconfirmed)
         → Device starts moving
         → Lock released

T=2.0s:  set_position(tilt_position=30)  [device still moving]
         → Lock acquired
         → level=None → _target_level=0.5 exists → reuse, currently_moving=True
         → tilt_level=0.3 (explicit)
         → _stop() called (firmware bug workaround)
         → _send_level(level=0.5, tilt_level=0.3) ← both targets
         → Lock released

T=7.0s:  Device reaches level=50%, tilt=30%
         → CCU confirms via events

Result: Both targets reached correctly.

Use Case 2: Parallel Level and Tilt Calls

T=0.0s:  asyncio.gather(
             set_position(position=81),
             set_position(tilt_position=19),
         )

         → Call 1 acquires lock first
         → _send_level(level=0.81, tilt_level=current)
         → _target_level = 0.81 (unconfirmed)
         → Lock released

         → Call 2 acquires lock
         → level=None → _target_level=0.81 → reuse, currently_moving=True
         → tilt_level=0.19 (explicit)
         → _stop()
         → _send_level(level=0.81, tilt_level=0.19) ← combined
         → Lock released

Result: Combined command sent with both targets. Tested with 10 iterations to detect race conditions.
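The parallel case can be reproduced with a small simulation. `SimulatedBlind` is a hypothetical model with no real I/O: it records the calls it would send and treats every sent value as a pending target. It is a sketch of the mechanism described in this ADR, not the code under test.

```python
import asyncio
from typing import List, Optional, Tuple


class SimulatedBlind:
    """Hypothetical simulation of the lock and target preservation."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self.sent: List[Tuple] = []
        self.target_level: Optional[float] = None  # pending, unconfirmed
        self.target_tilt: Optional[float] = None
        self.level = 0.0  # last confirmed values
        self.tilt = 0.0

    async def _rpc(self, *call) -> None:
        await asyncio.sleep(0)  # yield to other tasks, as a network call would
        self.sent.append(call)

    @staticmethod
    def _resolve(explicit, pending, confirmed):
        if explicit is not None:
            return explicit, False
        if pending is not None:
            return pending, True
        return confirmed, False

    async def set_position(self, position=None, tilt_position=None) -> None:
        level = None if position is None else position / 100
        tilt = None if tilt_position is None else tilt_position / 100
        async with self._lock:  # serializes the two parallel calls
            level, level_moving = self._resolve(level, self.target_level, self.level)
            tilt, tilt_moving = self._resolve(tilt, self.target_tilt, self.tilt)
            if level_moving or tilt_moving:
                await self._rpc("stop")  # stop before resending
            await self._rpc("send", level, tilt)
            self.target_level, self.target_tilt = level, tilt


async def demo():
    blind = SimulatedBlind()
    await asyncio.gather(
        blind.set_position(position=81),
        blind.set_position(tilt_position=19),
    )
    return blind.sent
```

In this model the pending target is recorded only after the RPC returns. Without the lock, the second call could resolve its level while the first is still waiting on the RPC and would then fall back to the confirmed level instead of 0.81.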

Use Case 3: Lock Timeout

T=0.0s:  set_position(position=50)
         → Lock acquired, network delay causes slow RPC

T=0.1s:  set_position(tilt_position=30)
         → Waiting for lock...

T=5.1s:  Timeout after 5s
         → Warning logged
         → Proceeds WITHOUT lock
         → Commands may race, but CCU-side queuing limits the impact

Result: Degraded but functional. The CCU queues commands server-side, limiting the impact.

Implementation

File: aiohomematic/model/custom/cover.py

Key Components:

| Component | Location | Purpose |
| --- | --- | --- |
| `_command_processing_lock` | `CustomDpCover.__slots__` (line 106) | Per-instance asyncio.Lock |
| `_COMMAND_LOCK_TIMEOUT` | Module constant (line 31) | 5.0 second timeout |
| `_set_level()` | `CustomDpBlind` (line 468) | Lock-protected target resolution and send |
| `_stop()` | `CustomDpBlind` (line 520) | Internal stop, must be called with lock held |
| `stop()` | `CustomDpBlind` (line 412) | Public stop with lock acquisition |
| `_target_level` | `CustomDpBlind` (line 292) | Pending level detection (optimistic + tracker) |
| `_target_tilt_level` | `CustomDpBlind` (line 311) | Pending tilt detection (optimistic + tracker) |
| `_send_level()` | `CustomDpBlind` (line 449) | Combined or separate parameter transmission |
| `_group_level` | `CustomDpCover` (line 122) | Confirmed level fallback |
| `_group_tilt_level` | `CustomDpBlind` (line 281) | Confirmed tilt fallback |

Inheritance:

CustomDataPoint
  └─ CustomDpCover          (level only, lock slot declared, no lock logic)
      ├─ CustomDpWindowDrive (level only, no lock logic)
      └─ CustomDpBlind       (level + tilt, lock initialized and used)
          └─ CustomDpIpBlind (inherits lock, adds COMBINED_PARAMETER and L=N,L2=N format)

The lock slot is declared in CustomDpCover, but the lock is created in CustomDpBlind._post_init() and used only by CustomDpBlind methods. This lets the blind subclass own the lock lifecycle while the slot stays in the common base class.
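The slot arrangement can be sketched with two hypothetical classes, `Cover` and `Blind`, standing in for CustomDpCover and CustomDpBlind. How the real base class invokes `_post_init()` is an assumption here.

```python
import asyncio


class Cover:
    """Hypothetical base class: declares the slot but never assigns it."""

    __slots__ = ("_command_processing_lock",)

    def __init__(self) -> None:
        self._post_init()

    def _post_init(self) -> None:
        """Hook for subclasses; a plain cover needs no lock."""


class Blind(Cover):
    """Hypothetical subclass: owns the lock lifecycle."""

    __slots__ = ()  # no new slots; reuses the one declared in Cover

    def _post_init(self) -> None:
        super()._post_init()
        self._command_processing_lock = asyncio.Lock()
```

A declared but unassigned slot raises AttributeError on access, so a plain `Cover` instance carries no lock object at all.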

Test Coverage:

| Test | File | What It Verifies |
| --- | --- | --- |
| `test_ceblind_separate_level_and_tilt_change` | `test_model_cover.py:445` | Parallel set_position calls, 10 iterations |

Consequences

Positive

- Blind devices reliably reach both level and tilt targets regardless of command timing
- The firmware bug is hidden from consumers (Home Assistant), which need no workaround of their own
- The lock timeout prevents deadlocks in degraded network conditions
- Target preservation eliminates the need for callers to track and re-send previous targets

Negative

- A command waiting for the lock can be delayed by up to 5 seconds before it proceeds
- On lock timeout, a brief race condition window exists (mitigated by CCU-side command queuing)
- The stop-then-resend approach adds one extra RPC call when commands overlap during movement

Interaction with Command Throttling (ADR 0020)

The command processing lock operates above the throttle layer. Sequence when both are active:

set_position()
  → _set_level() acquires _command_processing_lock
    → _stop() sends STOP via client (throttle may delay)
    → _send_level() sends LEVEL_COMBINED via client (throttle may delay)
  → _command_processing_lock released

Cover commands use HIGH priority by default. Because throttle delays are incurred while the lock is held, the lock timeout (5s) must be larger than the expected throttle delay; otherwise waiting commands time out spuriously.

Interaction with Optimistic Updates (ADR 0020)

The _target_level and _target_tilt_level properties integrate with both the optimistic update system and the legacy command tracker:

- When optimistic updates are enabled: is_optimistic is checked first, providing immediate target awareness
- When disabled: unconfirmed_last_value_send from CommandTracker serves the same purpose

This hybrid approach ensures the lock mechanism works correctly regardless of the optimistic updates feature flag.

Alternatives Considered

No Lock, Rely on CCU Command Queuing

Let the CCU handle concurrent commands natively without client-side serialization.

Rejected: The CCU does queue commands, but the firmware bug in blind actors causes incorrect behavior when commands arrive during physical movement. Client-side stop-then-resend is necessary.

Lock Without Target Preservation

Serialize commands but always use the current confirmed position as fallback for unspecified axes.

Rejected: This loses the pending target. If a user sends level=50% and then tilt=30% during movement, the second command would use the current (mid-movement) level rather than the intended target of 50%.

Per-Axis Locks

Use separate locks for level and tilt to allow independent concurrent changes.

Rejected: The device firmware bug affects the device as a whole, not individual axes. Both axes must be stopped and resent together. A single lock correctly models this constraint.

Longer or No Timeout

Increase the lock timeout or remove it entirely.

Rejected: A 5-second timeout balances deadlock prevention against command latency. With network issues, a lock without a timeout could block every further command to the affected blind indefinitely. The current timeout allows degraded operation with a logged warning.

References

- aiohomematic/model/custom/cover.py -- Cover and blind implementation
- ADR 0020: Command Throttling with Priority Queue and Optimistic Updates
- tests/test_model_cover.py:445 -- Race condition test (test_ceblind_separate_level_and_tilt_change)

Created: 2026-02-04
Author: Architecture Review