[mt7925] MCU timeout and firmware hang during MLO roaming #1036

@zbowling

Description

The MT7925 driver experiences MCU command timeouts and firmware hangs during WiFi 7 MLO (Multi-Link Operation) roaming attempts. The firmware becomes unresponsive and requires a reset to recover.

This issue is partially mitigated by recent patches I've submitted (PRs #1029, #1030, #1031, #1032, #1033, and #1034), which prevent kernel panics and deadlocks, but the underlying firmware issue remains.

Hardware

  • WiFi Card: MediaTek MT7925 (RZ717)
  • System: Framework Desktop (AMD Ryzen AI Max 300)
  • Firmware: WM Build 20250721232943

Kernel Version

6.18.2 with mt7925 patches applied

Steps to Reproduce

  1. Connect to a WiFi 7 AP with MLO enabled (2.4 GHz + 5 GHz + 6 GHz)
  2. Establish MLO connection on multiple links
  3. Trigger roaming (signal degradation, wpa_cli roam, or natural roaming)
  4. Observe firmware timeout and recovery
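
For step 3, a forced roam can be scripted rather than waiting for signal degradation. This is a minimal sketch, assuming wpa_supplicant's control interface is reachable through wpa_cli and that the interface name and target BSSID match the ones in the log below; the helper names build_roam_command and trigger_roam are hypothetical.

```python
import subprocess


def build_roam_command(interface: str, target_bssid: str) -> list[str]:
    """Return the wpa_cli argv that asks wpa_supplicant to roam to target_bssid."""
    return ["wpa_cli", "-i", interface, "roam", target_bssid]


def trigger_roam(interface: str, target_bssid: str, timeout_s: float = 10.0) -> bool:
    """Run the roam request; wpa_cli prints "OK" when the request is accepted.

    Acceptance only means the request was queued. The roam itself may still
    fail, which is the case this issue is about.
    """
    result = subprocess.run(
        build_roam_command(interface, target_bssid),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
    return result.returncode == 0 and "OK" in result.stdout


if __name__ == "__main__":
    # Interface and BSSID taken from the log excerpt; adjust for the local setup.
    print(trigger_roam("wlp192s0", "d8:b3:70:f8:89:4f"))
```

Running this while following the kernel log in another terminal should make the timeout sequence easier to catch on demand.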

Log Output

Jan 01 00:51:21 kernel: wlp192s0: disconnect from AP d8:b3:70:f8:9e:7b for new auth to d8:b3:70:f8:89:4f
Jan 01 00:51:21 kernel: wlp192s0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-22)
Jan 01 00:51:21 kernel: wlp192s0: failed to remove key (4, ff:ff:ff:ff:ff:ff) from hardware (-22)
Jan 01 00:51:22 kernel: wlp192s0: authenticated
Jan 01 00:51:22 wpa_supplicant: nl80211: kernel reports: Error fetching BSS for link
Jan 01 00:51:22 wpa_supplicant: wlp192s0: SME: Association request to the driver failed
Jan 01 00:51:25 kernel: wlp192s0: d8:b3:70:f8:9e:7b rejected association temporarily; comeback duration 1000 TU
Jan 01 00:51:26 kernel: wlp192s0: association with d8:b3:70:f8:9e:7b timed out

Jan 01 00:51:29 kernel: mt7925e 0000:c0:00.0: Message 00020002 (seq 12) timeout
Jan 01 00:51:35 kernel: mt7925e 0000:c0:00.0: Message 00020003 (seq 13) timeout
Jan 01 00:51:38 kernel: mt7925e 0000:c0:00.0: Message 00020002 (seq 14) timeout
Jan 01 00:51:41 kernel: mt7925e 0000:c0:00.0: Message 00020002 (seq 15) timeout
Jan 01 00:51:44 kernel: mt7925e 0000:c0:00.0: Message 00020001 (seq 1) timeout
Jan 01 00:51:44 kernel: mt7925e 0000:c0:00.0: HW/SW Version: 0x8a108a10, Build Time: 20250721232852a
Jan 01 00:51:44 kernel: mt7925e 0000:c0:00.0: WM Firmware Version: ____000000, Build Time: 20250721232943

wpa_supplicant also reports:

nl80211: kernel reports: link ID must for MLO group key
nl80211: kernel reports: link ID must for MLO group key
nl80211: kernel reports: link ID must for MLO group key
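
To compare runs, the MCU timeout lines can be pulled out of a saved log mechanically. The sketch below is an illustration only: it assumes the journal short format shown above, that all events fall on the same day, and the function names are hypothetical.

```python
import re

# Matches lines such as:
# Jan 01 00:51:29 kernel: mt7925e 0000:c0:00.0: Message 00020002 (seq 12) timeout
TIMEOUT_RE = re.compile(
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}) kernel: mt7925e \S+ "
    r"Message (?P<msg>[0-9a-fA-F]{8}) \(seq (?P<seq>\d+)\) timeout"
)

SAMPLE_LOG = """\
Jan 01 00:51:29 kernel: mt7925e 0000:c0:00.0: Message 00020002 (seq 12) timeout
Jan 01 00:51:35 kernel: mt7925e 0000:c0:00.0: Message 00020003 (seq 13) timeout
Jan 01 00:51:38 kernel: mt7925e 0000:c0:00.0: Message 00020002 (seq 14) timeout
Jan 01 00:51:41 kernel: mt7925e 0000:c0:00.0: Message 00020002 (seq 15) timeout
Jan 01 00:51:44 kernel: mt7925e 0000:c0:00.0: Message 00020001 (seq 1) timeout
"""


def parse_mcu_timeouts(log_text):
    """Return (seconds_since_midnight, message_id, seq) for each timeout line."""
    events = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            seconds = int(match["h"]) * 3600 + int(match["m"]) * 60 + int(match["s"])
            events.append((seconds, int(match["msg"], 16), int(match["seq"])))
    return events


def timeout_span_seconds(events):
    """Seconds between the first and last timeout (assumes no midnight wrap)."""
    if len(events) < 2:
        return 0
    return events[-1][0] - events[0][0]


if __name__ == "__main__":
    found = parse_mcu_timeouts(SAMPLE_LOG)
    print(len(found), "timeouts over", timeout_span_seconds(found), "seconds")
```

The span covers only the window in which MCU commands were timing out; the full outage also includes the failed association before it and the firmware reload after it.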

Analysis

The failure cascade involves three issues:

1. MLO Group Key Removal Returns -EINVAL

When removing broadcast keys during roaming, mt7925_set_key() returns -EINVAL (-22), which shows up in the kernel log as "failed to remove key ... from hardware (-22)". The kernel message relayed by wpa_supplicant, "link ID must for MLO group key", suggests the key removal path doesn't properly handle MLO link IDs for group keys.
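
The key removal failures can be collected from a log to check whether they always involve the broadcast address and the same error code. This is a rough sketch under the assumption that the message format matches the excerpt in this report; parse_key_removal_failures is a hypothetical helper, not part of any existing tool.

```python
import errno
import re

# Matches: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-22)
KEY_FAIL_RE = re.compile(
    r"failed to remove key \((?P<idx>\d+), (?P<addr>[0-9a-fA-F:]{17})\) "
    r"from hardware \((?P<err>-?\d+)\)"
)

BROADCAST = "ff:ff:ff:ff:ff:ff"


def parse_key_removal_failures(log_text):
    """Return one dict per failed key removal found in log_text."""
    failures = []
    for match in KEY_FAIL_RE.finditer(log_text):
        err = int(match["err"])
        failures.append(
            {
                "key_idx": int(match["idx"]),
                "addr": match["addr"].lower(),
                "errno": err,
                # Kernel return codes are negated errno values.
                "name": errno.errorcode.get(abs(err), "UNKNOWN"),
                "broadcast": match["addr"].lower() == BROADCAST,
            }
        )
    return failures


if __name__ == "__main__":
    sample = (
        "kernel: wlp192s0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-22)\n"
        "kernel: wlp192s0: failed to remove key (4, ff:ff:ff:ff:ff:ff) from hardware (-22)\n"
    )
    for failure in parse_key_removal_failures(sample):
        print(failure)
```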

2. BSS Info Fetch Failure

The driver fails to provide BSS information for the target link during association, so wpa_supplicant logs "nl80211: kernel reports: Error fetching BSS for link" and the association request fails. This may be a race condition or missing BSS caching in MLO paths.

3. Firmware Becomes Unresponsive

After the failed roaming sequence, firmware stops responding to MCU commands:

  • 00020002 - Likely MCU_UNI_CMD_BSS_INFO_UPDATE (unified command ID 0x02)
  • 00020003 - Likely MCU_UNI_CMD_STA_REC_UPDATE (unified command ID 0x03)
  • 00020001 - Likely MCU_UNI_CMD_DEV_INFO_UPDATE (unified command ID 0x01)
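
The message IDs in the timeout lines can be split into fields to narrow down which command stalled. The sketch below is based on my reading of the upstream mt76_connac_mcu.h layout, and every constant in it is an assumption to verify against the tree in use: command ID in bits 0-7, extended ID in bits 8-15, a query flag at bit 16, and a unified-command flag at bit 17. The name table is likewise assumed and covers only the three IDs seen here.

```python
# Assumed bit layout of the value printed as "Message XXXXXXXX" (verify locally).
ID_MASK = 0x000000FF      # command ID
EXT_ID_MASK = 0x0000FF00  # extended command ID
QUERY_FLAG = 1 << 16      # set for query commands
UNI_FLAG = 1 << 17        # set for unified (UNI) commands

# Assumed unified command IDs; only the values seen in this report are listed.
ASSUMED_UNI_CMD_NAMES = {
    0x01: "MCU_UNI_CMD_DEV_INFO_UPDATE",
    0x02: "MCU_UNI_CMD_BSS_INFO_UPDATE",
    0x03: "MCU_UNI_CMD_STA_REC_UPDATE",
}


def decode_mcu_message(message_hex):
    """Split a hex message ID from the kernel log into its assumed fields."""
    value = int(message_hex, 16)
    cmd_id = value & ID_MASK
    unified = bool(value & UNI_FLAG)
    if unified:
        name = ASSUMED_UNI_CMD_NAMES.get(cmd_id, "unknown unified command")
    else:
        name = "non-unified command"
    return {
        "cmd_id": cmd_id,
        "ext_id": (value & EXT_ID_MASK) >> 8,
        "query": bool(value & QUERY_FLAG),
        "unified": unified,
        "name": name,
    }


if __name__ == "__main__":
    for message in ("00020002", "00020003", "00020001"):
        print(message, decode_mcu_message(message))
```

If those assumptions hold, all three IDs are unified commands with IDs 0x02, 0x03, and 0x01, which would correspond to BSS info, station record, and device info updates respectively.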

The driver correctly detects the timeout and triggers firmware reload, which succeeds. Without my patches, this would cause a kernel panic or system deadlock.

Impact

  • With patches: ~30 second WiFi outage, automatic recovery
  • Without patches: Kernel panic or system deadlock requiring hard reboot

Suggested Investigation Areas

  1. mt7925_set_key() - MLO group key handling for broadcast addresses
  2. BSS info caching - Ensure BSS info is available during MLO roaming
  3. MCU command handling - What causes firmware to become unresponsive after failed roaming?
  4. Firmware - MediaTek firmware team should investigate the state machine

Related
