Skip to content

Conversation

@Martinski4GitHub
Copy link
Collaborator

@Martinski4GitHub Martinski4GitHub commented Nov 10, 2025

  • Improved/modified code that tries to unmount the USB-attached drives before flashing the F/W. If any USB drive is "busy", the code waits until it becomes idle, up to a maximum of 4 minutes.

We'll need to run some tests to check if the 4-minute wait is enough time to wait for most scenarios. Normally, SQLite3 calls don't take more than 2 minutes, based on what I've seen when testing spdMerlin and uiDivStats.

- Improved/modified code that tries to unmount the USB-attached drives before flashing the F/W. If any USB drive is "busy" the code waits until it becomes idle, up to a maximum of 150 secs (2.5 minutes).
Fine-tuning code.
@Martinski4GitHub
Copy link
Collaborator Author

@ExtremeFiretop,

Please review this PR when you get a chance. I have not been able to fully test it since it seems difficult to get the "right timing" for the race condition with spdMerlin to occur when the USB drive is busy while Merlin is running. I may have to create a "fake" process to recreate it. In any case, hopefully this will at least give us more data points to fine-tune the solution.

P.S.
Glad you're back home from vacation. I bet it feels like time went too fast!!

Made improvements in function to unmount the USB drives before flashing F/W image.
@ExtremeFiretop
Copy link
Owner

ExtremeFiretop commented Nov 11, 2025

@ExtremeFiretop,

Please review this PR when you get a chance. I have not been able to fully test it since it seems difficult to get the "right timing" for the race condition with spdMerlin to occur when the USB drive is busy while Merlin is running. I may have to create a "fake" process to recreate it. In any case, hopefully this will at least give us more data points to fine-tune the solution.

P.S. Glad you're back home from vacation. I bet it feels like time went too fast!!

So firstly, after reviewing the code I love the idea, I think we should implement something like this even if this is not the solution to the users specific problem, because it logically makes sense and I even remember asking you if you wanted to re-assess the USB unmounting solution instead of using /sbin/ejusb which I initially defended.

But I won't lie, I'm not sure this fully answers what happened to the user and let me explain a bit further in detail, because his router really didn't even try to do the upgrade in his logs, it just spammed that the USB drive is busy and then nothing. The next thing we see is a reboot which he says he initiated after waiting 20+ minutes and the time dilation confirms that was the boot process next.

You would expect even some normal operational logs to show up in that time frame, excluding any scripts or anything running.
Also I've seen those "USB" busy messages in my logs and many others over the years we've done this project, and it's not even caused any issues, as seen below:

Jul 22 12:21:17 rc_service: skip the event: restart_leds.
Jul 22 12:21:18 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:19 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:19 rc_service: service 9901:notify_rc restart_leds
Jul 22 12:21:19 rc_service: waitting "restart_leds" via ...
Jul 22 12:21:20 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:21 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:22 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:23 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:24 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:25 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:26 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:27 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:28 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:29 ejusb: USB partition busy - will unmount ASAP from /tmp/mnt/AUBREYEXT4
Jul 22 12:21:29 rc_service: ejusb 9275:notify_rc restart_nasapps
Jul 22 12:21:29 rc_service: waitting "restart_leds" via ...
Jul 22 12:21:29 kernel: usb 1-2: USB disconnect, device number 2
Jul 22 12:21:34 rc_service: skip the event: restart_leds.
Jul 22 12:21:36 rc_service: service 10168:notify_rc restart_leds
Jul 22 12:21:36 rc_service: waitting "restart_leds" via ...
Jul 22 12:21:51 rc_service: skip the event: restart_leds.
Jul 22 12:21:53 rc_service: service 10423:notify_rc restart_leds
Jul 22 12:21:53 rc_service: waitting "restart_leds" via ...
Jul 22 12:21:59 rc_service: skip the event: restart_nasapps.
Jul 22 12:21:59 rc_service: service 10515:notify_rc restart_leds
Jul 22 12:21:59 rc_service: waitting "restart_leds" via ...
Jul 22 12:22:08 rc_service: skip the event: restart_leds.
Jul 22 12:22:14 rc_service: skip the event: restart_leds.

Usually it spams the message a bit until it does go idle, and then the update pushes through.

Looking at the code, /sbin/ejusb is run unconditionally without any exit on failure, so that should mean it still would attempt to start the flash even if the eject failed. I guess the truth is because it's closed source code we don't really know if something changed between firmware versions, its a bit of a black box, but also in his logs I see no messages about the restarting of the leds like we usually do.

If I remember correctly the ejusb works as follows:

First Parameter (disk_port): This specifies which disk port you are referring to.
-1: Refers to all ports.
1: Refers to disk port 1.
2: Refers to disk port 2.

Second Parameter (NAS applications restart): This determines whether or not the NAS (Network Attached Storage) applications should be restarted.
0: Do not restart NAS applications.
1: Restart NAS applications.
Other Parameters:

-u: This is used for holding or unplugging the disk after unmount jobs are done. The default value is 1.
0: Hold the disk after unmount.
1: Unplug the disk after unmount.

So while I think it's absolutely valid to put more attention on the USB unmounting solution, I'm not fully convinced it's the entire answer and may be more of a red herring (call me a skeptic) until I see more evidence confirming this.

I think it should be tested extensively when we can (maybe this weekend or sometime this week) and then we can determine how to respond to the user. For now I'll merge this in as code reviewed and for testing so the user can provide feedback, and also for us to test asap

@ExtremeFiretop ExtremeFiretop merged commit 0ba5af2 into ExtremeFiretop:dev Nov 11, 2025
3 checks passed
@ExtremeFiretop
Copy link
Owner

ExtremeFiretop commented Nov 11, 2025

@Martinski4GitHub

Seeing your leading the discussion with the user this round since I was still away, and he seems to gravitate with you more than me. Feel free to let the user know he can test in the dev branch for now to provide feedback :)
But maybe hold off on the agreed upon solution to his issues until we know more and have done more testing. Call it experimental or something ;)

@Martinski4GitHub
Copy link
Collaborator Author

@ExtremeFiretop,
Please review this PR when you get a chance. I have not been able to fully test it since it seems difficult to get the "right timing" for the race condition with spdMerlin to occur when the USB drive is busy while Merlin is running. I may have to create a "fake" process to recreate it. In any case, hopefully this will at least give us more data points to fine-tune the solution.
P.S. Glad you're back home from vacation. I bet it feels like time went too fast!!

So firstly, after reviewing the code I love the idea, I think we should implement something like this even if this is not the solution to the users specific problem, because it logically makes sense and I even remember asking you if you wanted to re-assess the USB unmounting solution instead of using /sbin/ejusb which I initially defended.

But I won't lie, I'm not sure this fully answers what happened to the user and let me explain a bit further in detail, because his router really didn't even try to do the upgrade in his logs, it just spammed that the USB drive is busy and then nothing. The next thing we see is a reboot which he says he initiated after waiting 20+ minutes and the time dilation confirms that was the boot process next.

You would expect even some normal operational logs to show up in that time frame, excluding any scripts or anything running. Also I've seen those "USB" busy messages in my logs and many others over the years we've done this project, and it's not even caused any issues, as seen below:

Jul 22 12:21:17 rc_service: skip the event: restart_leds.
Jul 22 12:21:18 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:19 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:19 rc_service: service 9901:notify_rc restart_leds
Jul 22 12:21:19 rc_service: waitting "restart_leds" via ...
Jul 22 12:21:20 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:21 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:22 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:23 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:24 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:25 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:26 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:27 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:28 ejusb: USB partition unmounted from /tmp/mnt/AUBREYEXT4 fail. (return -1, Device or resource busy)
Jul 22 12:21:29 ejusb: USB partition busy - will unmount ASAP from /tmp/mnt/AUBREYEXT4
Jul 22 12:21:29 rc_service: ejusb 9275:notify_rc restart_nasapps
Jul 22 12:21:29 rc_service: waitting "restart_leds" via ...
Jul 22 12:21:29 kernel: usb 1-2: USB disconnect, device number 2
Jul 22 12:21:34 rc_service: skip the event: restart_leds.
Jul 22 12:21:36 rc_service: service 10168:notify_rc restart_leds
Jul 22 12:21:36 rc_service: waitting "restart_leds" via ...
Jul 22 12:21:51 rc_service: skip the event: restart_leds.
Jul 22 12:21:53 rc_service: service 10423:notify_rc restart_leds
Jul 22 12:21:53 rc_service: waitting "restart_leds" via ...
Jul 22 12:21:59 rc_service: skip the event: restart_nasapps.
Jul 22 12:21:59 rc_service: service 10515:notify_rc restart_leds
Jul 22 12:21:59 rc_service: waitting "restart_leds" via ...
Jul 22 12:22:08 rc_service: skip the event: restart_leds.
Jul 22 12:22:14 rc_service: skip the event: restart_leds.

Usually it spams the message a bit until it does go idle, and then the update pushes through.

Looking at the code, /sbin/ejusb is run unconditionally without any exit on failure, so that should mean it still would attempt to start the flash even if the eject failed. I guess the truth is because it's closed source code we don't really know if something changed between firmware versions, its a bit of a black box, but also in his logs I see no messages about the restarting of the leds like we usually do.

If I remember correctly the ejusb works as follows:

First Parameter (disk_port): This specifies which disk port you are referring to. -1: Refers to all ports. 1: Refers to disk port 1. 2: Refers to disk port 2.

Second Parameter (NAS applications restart): This determines whether or not the NAS (Network Attached Storage) applications should be restarted. 0: Do not restart NAS applications. 1: Restart NAS applications. Other Parameters:

-u: This is used for holding or unplugging the disk after unmount jobs are done. The default value is 1. 0: Hold the disk after unmount. 1: Unplug the disk after unmount.

So while I think it's absolutely valid to put more attention on the USB unmounting solution, I'm not fully convinced it's the entire answer and may be more of a red herring (call me a skeptic) until I see more evidence confirming this.

I think it should be tested extensively when we can (maybe this weekend or sometime this week) and then we can determine how to respond to the user. For now I'll merge this in as code reviewed and for testing so the user can provide feedback, and also for us to test asap

I understand where you're coming from. Part of the issue is that we don't have the entire, unfiltered picture of what happened on the OP's router, so we have some "unknown" events.

It seems to me that the OP's settings for the system log are preventing some entries from showing up in the log. He most likely has set the option "Log only messages more urgent than" to "notice" level or perhaps higher, and that would explain why some entries are missing from his logs (e.g. the LEDs events), but that doesn't mean those events didn't happen. Because of the syslog settings, we really don't get the full picture, and therefore, some assumptions were made based on previous experience.

But yeah, there's still the question about why the script didn't seem to try to flash the F/W image. It does look like the router got stuck at the call to the '/sbin/ejusb' command. So the latest changes I've made attempt to unmount the USB drive until the 4-minute timeout is reached, and then if the call to /sbin/ejusb is truly stuck for some reason, the code will kill the process, and an email notification will be sent, so I'm hopeful that some traces of what happens would be found in the system log. Like you said, the command is a black box, so it's rather impossible to debug it properly and know what's happening.

So essentially, at this point, we can only try to recreate the issue (somehow) to run the latest code in the develop branch and see what the log shows during our own tests.

@Martinski4GitHub
Copy link
Collaborator Author

@Martinski4GitHub

Seeing your leading the discussion with the user this round since I was still away, and he seems to gravitate with you more than me. Feel free to let the user know he can test in the dev branch for now to provide feedback :) But maybe hold off on the agreed upon solution to his issues until we know more and have done more testing. Call it experimental or something ;)

Well, the OP already updated his router to the latest F/W version, so the latest code in the develop branch would not even run on the router until the next 3004.388.x F/W update, and that could be several weeks from now.

@ExtremeFiretop
Copy link
Owner

So far... I've been unable to recreate the issue with just the regular production version of MerlinAU as seen below, I did set spdMerlin to run every minute, and I started a MerlinAU update as soon as the speed test started:

Nov 12 08:44:49 rc_service: service 15341:notify_rc restart_leds
Nov 12 08:44:49 custom_script: Running /jffs/scripts/service-event (args: restart leds)
Nov 12 08:44:49 rc_service: waitting "restart_leds"(last_rc:restart_leds) via  ...
Nov 12 08:44:51 rc_service: service 15866:notify_rc restart_leds
Nov 12 08:44:51 custom_script: Running /jffs/scripts/service-event (args: restart leds)
Nov 12 08:44:51 rc_service: waitting "restart_leds"(last_rc:restart_leds) via  ...
Nov 12 08:44:52 rc_service: service 15924:notify_rc restart_leds
Nov 12 08:44:52 custom_script: Running /jffs/scripts/service-event (args: restart leds)
Nov 12 08:44:54 [MerlinAU] 1207: ---------------------------------------------------------------
Nov 12 08:44:54 [MerlinAU] 1207: Cron job #1: [45 */6 * * * /jffs/addons/amtm/routerdate cron #amtm_RouterDate#]
Nov 12 08:44:55 [MerlinAU] 1207: Cron job [amtm_RouterDate] was removed successfully.
Nov 12 08:44:55 [MerlinAU] 1207: Cron job #2: [10 5 * * * /bin/sh /jffs/addons/amtm/sc_update.mod -run #amtm_ScriptsUpdateNotification#]
Nov 12 08:44:56 [MerlinAU] 1207: Cron job [amtm_ScriptsUpdateNotification] was removed successfully.
Nov 12 08:44:56 [MerlinAU] 1207: Cron job #3: [2 2 * * * sh /jffs/scripts/backupmon.sh #RunBackupMon#]
Nov 12 08:44:57 [MerlinAU] 1207: Cron job [RunBackupMon] was removed successfully.
Nov 12 08:44:57 [MerlinAU] 1207: Cron job #4: [16 3 * * * sh /jffs/scripts/MerlinAU.sh run_now #MerlinAU_FWUpdate#]
Nov 12 08:44:58 [MerlinAU] 1207: Cron job [MerlinAU_FWUpdate] was removed successfully.
Nov 12 08:44:58 [MerlinAU] 1207: Cron job #5: [10 0 * * * /bin/sh /jffs/addons/amtm/led_control.mod -set #amtm_LEDcontrol_set#]
Nov 12 08:44:59 [MerlinAU] 1207: Cron job [amtm_LEDcontrol_set] was removed successfully.
Nov 12 08:44:59 [MerlinAU] 1207: Cron job #6: [1 3 * * * sh /jffs/scripts/MerlinAU.sh checkupdates #MerlinAU_ScriptUpdate#]
Nov 12 08:45:00 [MerlinAU] 1207: Cron job [MerlinAU_ScriptUpdate] was removed successfully.
Nov 12 08:45:00 [MerlinAU] 1207: Cron job #7: [*/1 * * * * /jffs/scripts/spdmerlin generate #spdMerlin_generate#]
Nov 12 08:45:01 [MerlinAU] 1207: Cron job [spdMerlin_generate] was removed successfully.
Nov 12 08:45:01 [MerlinAU] 1207: Cron job #8: [5 3 * * * /jffs/scripts/spdmerlin trimdb #spdMerlin_trimDB#]
Nov 12 08:45:02 spdMerlin_[17141]: Starting speedtest using Bell Canada (Ottawa, ON, Canada) for WAN interface. Please wait...
Nov 12 08:45:02 [MerlinAU] 1207: Cron job [spdMerlin_trimDB] was removed successfully.
Nov 12 08:45:02 [MerlinAU] 1207: Cron jobs [8] from 3rd-party add-ons were found.
Nov 12 08:45:02 [MerlinAU] 1207: ---------------------------------------------------------------
Nov 12 08:45:18 [MerlinAU] 1207: Post-update email notification hook was added successfully to '/jffs/scripts/services-start' script.
Nov 12 08:45:18 [MerlinAU] 1207: Flashing GT-BE98_PRO_3006_102.6_beta1_nand_squashfs.pkgtb... Please wait for reboot in about 4 minutes or less.
Nov 12 08:45:20 kernel: echo (18583): drop_caches: 3
Nov 12 08:45:23 custom_script: Running /jffs/scripts/unmount (args: /tmp/mnt/USB1)
Nov 12 08:45:24 spdMerlin_[17141]: Speedtest results - Download: 1701.43 Mbps (data used: 2.1 GB) - Upload: 1081.29 Mbps (data used: 1.3 GB)
Nov 12 08:45:24 spdMerlin_[17141]: Connection quality - Idle Latency: 2.25 ms (jitter: 0.50ms, low: 1.85ms, high: 2.60ms) - Packet Loss: 0.0%
Nov 12 08:45:24 spdMerlin_[17141]: Retrieving data for WebUI charts...
Nov 12 08:45:24 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:24 avahi-daemon[15010]: Files changed, reloading.
Nov 12 08:45:24 avahi-daemon[15010]: Loading service file /tmp/avahi/services/mt-daap.service.
Nov 12 08:45:24 avahi-daemon[15010]: Alias name GT-BE98_Pro: avahi_server_add_cname failure: The requested operation is invalid because redundant
Nov 12 08:45:24 iTunes: daemon is stopped
Nov 12 08:45:24 FTP_Server: daemon is stopped
Nov 12 08:45:24 Samba_Server: smb daemon is stopped
Nov 12 08:45:24 avahi-daemon[15010]: Got SIGTERM, quitting.
Nov 12 08:45:24 avahi-daemon[15010]: Leaving mDNS multicast group on interface br58.IPv4 with address 192.168.53.1.
Nov 12 08:45:24 avahi-daemon[15010]: Leaving mDNS multicast group on interface br0.IPv4 with address 192.168.50.1.
Nov 12 08:45:24 avahi-daemon[15010]: Leaving mDNS multicast group on interface lo.IPv4 with address 127.0.1.1.
Nov 12 08:45:24 avahi-daemon[15010]: avahi-daemon 0.8 exiting.
Nov 12 08:45:28 Timemachine: daemon is stopped
Nov 12 08:45:28 avahi-daemon[22586]: Found user 'nobody' (UID 65534) and group 'nobody' (GID 65534).
Nov 12 08:45:28 avahi-daemon[22586]: Successfully dropped root privileges.
Nov 12 08:45:28 avahi-daemon[22586]: avahi-daemon 0.8 starting up.
Nov 12 08:45:28 avahi-daemon[22586]: Loading service file /tmp/avahi/services/alexa.service.
Nov 12 08:45:28 avahi-daemon[22586]: Loading new alias name GT-BE98_Pro.
Nov 12 08:45:28 avahi-daemon[22586]: Joining mDNS multicast group on interface br58.IPv4 with address 192.168.53.1.
Nov 12 08:45:28 avahi-daemon[22586]: New relevant interface br58.IPv4 for mDNS.
Nov 12 08:45:28 avahi-daemon[22586]: Joining mDNS multicast group on interface br0.IPv4 with address 192.168.50.1.
Nov 12 08:45:28 avahi-daemon[22586]: New relevant interface br0.IPv4 for mDNS.
Nov 12 08:45:28 avahi-daemon[22586]: Joining mDNS multicast group on interface lo.IPv4 with address 127.0.1.1.
Nov 12 08:45:28 avahi-daemon[22586]: New relevant interface lo.IPv4 for mDNS.
Nov 12 08:45:28 avahi-daemon[22586]: Network interface enumeration completed.
Nov 12 08:45:28 avahi-daemon[22586]: Registering new address record for 192.168.53.1 on br58.IPv4.
Nov 12 08:45:28 avahi-daemon[22586]: Registering new address record for 192.168.50.1 on br0.IPv4.
Nov 12 08:45:28 avahi-daemon[22586]: Registering new address record for 127.0.1.1 on lo.IPv4.
Nov 12 08:45:28 avahi-daemon[22586]: Registering new address record for 127.0.0.1 on lo.IPv4.
Nov 12 08:45:29 avahi-daemon[22586]: Server startup complete. Host name is GT-BE98_Pro.local. Local service cookie is 4025219024.
Nov 12 08:45:29 avahi-daemon[22586]: Alias name GT-BE98_Pro: avahi_server_add_cname failure: The requested operation is invalid because redundant
Nov 12 08:45:29 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:30 avahi-daemon[22586]: Service "GT-BE98_Pro" (/tmp/avahi/services/alexa.service) successfully established.
Nov 12 08:45:30 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:31 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:32 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:33 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:34 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:35 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:36 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:37 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:38 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:39 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:40 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:41 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:42 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:43 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:44 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:45 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:46 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:47 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:48 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:49 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:50 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:51 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:52 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:53 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:54 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:55 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:56 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:57 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:58 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:45:59 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:46:00 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:46:01 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:46:02 ejusb: USB partition unmounted from /tmp/mnt/USB1 fail. (return -1, Device or resource busy)
Nov 12 08:46:04 ejusb: USB partition busy - will unmount ASAP from /tmp/mnt/USB1
Nov 12 08:46:04 kernel: print_req_error: I/O error, dev sda, sector 0
Nov 12 08:46:04 kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Nov 12 08:46:04 kernel: sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Nov 12 08:46:04 kernel: usb 2-1: USB disconnect, device number 2
Nov 12 08:46:04 rc_service: ejusb 18662:notify_rc restart_nasapps
Nov 12 08:46:04 custom_script: Running /jffs/scripts/service-event (args: restart nasapps)
Nov 12 08:46:04 avahi-daemon[22586]: Files changed, reloading.
Nov 12 08:46:04 avahi-daemon[22586]: Loading service file /tmp/avahi/services/mt-daap.service.
Nov 12 08:46:04 avahi-daemon[22586]: Alias name GT-BE98_Pro: avahi_server_add_cname failure: The requested operation is invalid because redundant
Nov 12 08:46:04 iTunes: daemon is stopped
Nov 12 08:46:04 FTP_Server: daemon is stopped
Nov 12 08:46:04 Samba_Server: smb daemon is stopped
Nov 12 08:46:04 avahi-daemon[22586]: Got SIGTERM, quitting.
Nov 12 08:46:04 avahi-daemon[22586]: Leaving mDNS multicast group on interface br58.IPv4 with address 192.168.53.1.
Nov 12 08:46:04 avahi-daemon[22586]: Leaving mDNS multicast group on interface br0.IPv4 with address 192.168.50.1.
Nov 12 08:46:04 avahi-daemon[22586]: Leaving mDNS multicast group on interface lo.IPv4 with address 127.0.1.1.
Nov 12 08:46:04 avahi-daemon[22586]: avahi-daemon 0.8 exiting.
Nov 12 08:46:04 rc_service: httpd 5369:notify_rc stop_upgrade
Nov 12 08:46:04 rc_service: waitting "restart_nasapps"(last_rc:restart_nasapps) via  ...
Nov 12 08:46:04 rc_service: waitting "restart_nasapps"(last_rc:restart_nasapps) via  ...
Nov 12 08:46:06 Timemachine: daemon is stopped
Nov 12 08:46:06 avahi-daemon[29292]: Found user 'nobody' (UID 65534) and group 'nobody' (GID 65534).
Nov 12 08:46:06 avahi-daemon[29292]: Successfully dropped root privileges.
Nov 12 08:46:06 avahi-daemon[29292]: avahi-daemon 0.8 starting up.
Nov 12 08:46:06 avahi-daemon[29292]: Loading service file /tmp/avahi/services/alexa.service.
Nov 12 08:46:06 avahi-daemon[29292]: Loading new alias name GT-BE98_Pro.
Nov 12 08:46:06 avahi-daemon[29292]: Joining mDNS multicast group on interface br58.IPv4 with address 192.168.53.1.
Nov 12 08:46:06 avahi-daemon[29292]: New relevant interface br58.IPv4 for mDNS.
Nov 12 08:46:06 avahi-daemon[29292]: Joining mDNS multicast group on interface br0.IPv4 with address 192.168.50.1.
Nov 12 08:46:06 avahi-daemon[29292]: New relevant interface br0.IPv4 for mDNS.
Nov 12 08:46:06 avahi-daemon[29292]: Joining mDNS multicast group on interface lo.IPv4 with address 127.0.1.1.
Nov 12 08:46:06 avahi-daemon[29292]: New relevant interface lo.IPv4 for mDNS.
Nov 12 08:46:06 avahi-daemon[29292]: Network interface enumeration completed.
Nov 12 08:46:06 avahi-daemon[29292]: Registering new address record for 192.168.53.1 on br58.IPv4.
Nov 12 08:46:06 avahi-daemon[29292]: Registering new address record for 192.168.50.1 on br0.IPv4.
Nov 12 08:46:06 avahi-daemon[29292]: Registering new address record for 127.0.1.1 on lo.IPv4.
Nov 12 08:46:06 avahi-daemon[29292]: Registering new address record for 127.0.0.1 on lo.IPv4.
Nov 12 08:46:06 miniupnpd[17522]: shutting down MiniUPnPd
Nov 12 08:46:06 miniupnpd[29294]: HTTP listening on port 36059
Nov 12 08:46:06 miniupnpd[29294]: no HTTP IPv6 address, disabling IPv6
Nov 12 08:46:06 miniupnpd[29294]: Listening for NAT-PMP/PCP traffic on port 5351
Nov 12 08:46:07 avahi-daemon[29292]: Server startup complete. Host name is GT-BE98_Pro.local. Local service cookie is 502471424.
Nov 12 08:46:07 avahi-daemon[29292]: Alias name GT-BE98_Pro: avahi_server_add_cname failure: The requested operation is invalid because redundant
Nov 12 08:46:07 custom_script: Running /jffs/scripts/service-event (args: stop upgrade)
Nov 12 08:46:07 usb: USB is ejected
Nov 12 08:46:07 WEBDAV_Server: daemon is stopped
Nov 12 08:46:07 WEBDAV_Server: arpping daemon is stopped
Nov 12 08:46:07 UAM_Server: daemon is stoped
Nov 12 08:46:07 iTunes: daemon is stopped
Nov 12 08:46:07 FTP_Server: daemon is stopped
Nov 12 08:46:07 Samba_Server: smb daemon is stopped
Nov 12 08:46:07 Timemachine: daemon is stopped
Nov 12 08:46:07 WEBDAV_Server: daemon is stopped
Nov 12 08:46:07 kernel: usbcore: deregistering interface driver ipheth
Nov 12 08:46:07 kernel: usbcore: deregistering interface driver qmi_wwan
Nov 12 08:46:07 kernel: usbcore: deregistering interface driver cdc_wdm
Nov 12 08:46:07 kernel: usbcore: deregistering interface driver cdc_ncm
Nov 12 08:46:07 avahi-daemon[29292]: Service "GT-BE98_Pro" (/tmp/avahi/services/alexa.service) successfully established.
Nov 12 08:46:07 kernel: usbcore: deregistering interface driver rndis_host
Nov 12 08:46:07 kernel: usbcore: deregistering interface driver cdc_ether
Nov 12 08:46:07 kernel: usbcore: deregistering interface driver ax88179_178a
Nov 12 08:46:07 kernel: usbcore: deregistering interface driver asix
Nov 12 08:46:07 kernel: usbcore: deregistering interface driver cdc_acm
Nov 12 08:46:08 kernel: cdrom: Uniform CD-ROM driver unloaded
Nov 12 08:46:08 iTunes: daemon is stopped
Nov 12 08:46:08 FTP_Server: daemon is stopped
Nov 12 08:46:08 Samba_Server: smb daemon is stopped
Nov 12 08:46:08 Timemachine: daemon is stopped
Nov 12 08:46:09 ntpd: Stopped ntpd
Nov 12 08:46:10 miniupnpd[29294]: shutting down MiniUPnPd
Nov 12 08:46:10 avahi-daemon[29292]: Got SIGTERM, quitting.
Nov 12 08:46:10 avahi-daemon[29292]: Leaving mDNS multicast group on interface br58.IPv4 with address 192.168.53.1.
Nov 12 08:46:10 avahi-daemon[29292]: Leaving mDNS multicast group on interface br0.IPv4 with address 192.168.50.1.
Nov 12 08:46:10 avahi-daemon[29292]: Leaving mDNS multicast group on interface lo.IPv4 with address 127.0.1.1.
Nov 12 08:46:10 avahi-daemon[29292]: avahi-daemon 0.8 exiting.
Nov 12 08:46:10 disk_monitor: Finish
Nov 12 08:46:10 dropbear[6931]: Early exit: Terminated by signal
Nov 12 08:46:10 dropbear[16420]: Exit (Admin) from <192.168.50.163:55520>: Terminated by signal
Nov 12 08:46:10 dropbear[6231]: Exit (Admin) from <192.168.50.163:51630>: Terminated by signal
Nov 12 08:46:10 kernel: Buffer I/O error on dev sda1, logical block 38830080, lost sync page write
Nov 12 08:46:10 kernel: JBD2: Error -5 detected when updating journal superblock for sda1-8.
Nov 12 08:46:10 kernel: Aborting journal on device sda1-8.
Nov 12 08:46:10 kernel: Buffer I/O error on dev sda1, logical block 38830080, lost sync page write
Nov 12 08:46:10 kernel: JBD2: Error -5 detected when updating journal superblock for sda1-8.
Nov 12 08:46:11 kernel: echo (29668): drop_caches: 1
Nov 12 08:46:58 rc_service: httpd 5369:notify_rc start_upgrade
Nov 12 08:46:58 custom_script: Running /jffs/scripts/service-event (args: start upgrade)
Nov 12 08:46:58 custom_script: Running /jffs/scripts/services-stop
Nov 12 08:46:58 coova-chilli[6158]: chilli.c: 7807: CoovaChilli shutting down
Nov 12 08:46:58 kernel: device br53 left promiscuous mode
Nov 12 08:46:58 lldpd[6872]: removal request for address of 192.168.52.1%61, but no knowledge of it
Nov 12 08:46:58 kernel: Freeing devid[2109] for tun22
Nov 12 08:46:59 amtm routerdate: Preserving router date before reboot (2025-11-12 13:46:59) UTC time.
Nov 12 08:47:01 WEBDAV_Server: daemon is stopped
Nov 12 08:47:01 Mastiff: Got SIGTERM
Nov 12 08:47:04 FTP_Server: daemon is stopped
Nov 12 08:47:04 Samba_Server: smb daemon is stopped
Nov 12 08:47:04 Timemachine: daemon is stopped
Nov 12 08:47:04 dnsmasq[15104]: exiting on receipt of SIGTERM

@ExtremeFiretop
Copy link
Owner

I still think the USB unmount should remain enhanced, but trying to recreate what happened exactly will be tricky, not only because of timing, but also because we couldn't "see" exactly what happened due to the filtered log.

@Martinski4GitHub
Copy link
Collaborator Author

So far... I've been unable to recreate the issue with just the regular production version of MerlinAU as seen below, I did set spdMerlin to run every minute, and I started a MerlinAU update as soon as the speed test started:

Yeah, getting the timing "just right" to recreate the issue will be tricky. And the OP has a different router with a different CPU and RAM available, which also affects the timing of the executing processes.

I'm also wondering now if connmon also played a role in the OP's situation. The syslog he provided shows that connmon was getting executed about the same time as MerlinAU was running, which means at some point SQLite3 was executed to update the database. So, potentially, there may have been two processes (connmon and spdMerlin) running SQLite3, which, as mentioned before, uses a directory in the USB drive for temporary files and logs, and if the database files are large enough, it might take a couple of minutes to complete. But still, eventually, the file handles should get released, and the USB drive should reach the "idle" state.

-----------------------------------------------------------------
Nov  9 12:10:02 connmon_[24541]: Excluding QoS [Type: 1] from ping tests...
Nov  9 12:10:05 connmon_[24541]: QoS [Type: 1] is excluded.
Nov  9 12:10:29 admin: BACKUPMON INFO: Finished secondary backup of JFFS to /tmp/mnt/secondarybackups/backupmon/ax58u/20251109-121004/jffs.tar.gz
Nov  9 12:10:37 admin: BACKUPMON INFO: Finished secondary backup of NVRAM to /tmp/mnt/secondarybackups/backupmon/ax58u/20251109-121004/nvram.cfg
Nov  9 12:11:06 connmon_[24541]: Restarting QoS [Type: 1]...
Nov  9 12:11:08 connmon_[24541]: QoS [Type: 1] was restarted.
Nov  9 12:11:17 admin: BACKUPMON INFO: Finished secondary backup of EXT Drive in 40 sec to /tmp/mnt/secondarybackups/backupmon/ax58u/20251109-121004/entware.tar.gz
Nov  9 12:11:41 dropbear[30899]: Pubkey auth succeeded for 'admin' with ssh-ed25519 key SHA256:lTZtB/3guH2ICowzIvnsYJmK64l8kr+ygBiiqqh1b+c from xx.xx.xx.xx:2071
Nov  9 12:11:56 [MerlinAU] 21058: Backup Finished
Nov  9 12:11:56 [MerlinAU] 21058: Backup Completed Successfully
-----------------------------------------------------------------

@Martinski4GitHub
Copy link
Collaborator Author

I still think the USB unmount should remain enhanced, but trying to recreate what happened exactly will be tricky, not only because of timing, but also because we couldn't "see" exactly what happened due to the filtered log.

Yeah, the improved code to double-check the USB-attached drive getting unmounted should make that step more robust, and the log entries marking the events should be helpful if the same issue happens again.

P.S.
I got an Engineering meeting at 8:00 AM (local time), so I'll be going offline for the day pretty soon.

@ExtremeFiretop
Copy link
Owner

I still think the USB unmount should remain enhanced, but trying to recreate what happened exactly will be tricky, not only because of timing, but also because we couldn't "see" exactly what happened due to the filtered log.

Yeah, the improved code to double-check the USB-attached drive getting unmounted should make that step more robust, and the log entries marking the events should be helpful if the same issue happens again.

Agreed, the enhancements should stay, but validating the behavior will be hard.

P.S. I got an Engineering meeting at 8:00 AM (local time), so I'll be going offline for the day pretty soon.

I'm investigating the issue with spdMerlin now from the spdMerlin thread. It's very interesting, I'll get back to you shortly anyways with any developments.

@ExtremeFiretop
Copy link
Owner

ExtremeFiretop commented Nov 13, 2025

So far... I've been unable to recreate the issue with just the regular production version of MerlinAU as seen below, I did set spdMerlin to run every minute, and I started a MerlinAU update as soon as the speed test started:

Yeah, getting the timing "just right" to recreate the issue will be tricky. And the OP has a different router with a different CPU and RAM available, which also affects the timing of the executing processes.

I'm also wondering now if connmon also played a role in the OP's situation. The syslog he provided shows that connmon was getting executed about the same time as MerlinAU was running, which means at some point SQLite3 was executed to update the database. So, potentially, there may have been two processes (connmon and spdMerlin) running SQLite3, which, as mentioned before, uses a directory in the USB drive for temporary files and logs, and if the database files are large enough, it might take a couple of minutes to complete. But still, eventually, the file handles should get released, and the USB drive should reach the "idle" state.

-----------------------------------------------------------------
Nov  9 12:10:02 connmon_[24541]: Excluding QoS [Type: 1] from ping tests...
Nov  9 12:10:05 connmon_[24541]: QoS [Type: 1] is excluded.
Nov  9 12:10:29 admin: BACKUPMON INFO: Finished secondary backup of JFFS to /tmp/mnt/secondarybackups/backupmon/ax58u/20251109-121004/jffs.tar.gz
Nov  9 12:10:37 admin: BACKUPMON INFO: Finished secondary backup of NVRAM to /tmp/mnt/secondarybackups/backupmon/ax58u/20251109-121004/nvram.cfg
Nov  9 12:11:06 connmon_[24541]: Restarting QoS [Type: 1]...
Nov  9 12:11:08 connmon_[24541]: QoS [Type: 1] was restarted.
Nov  9 12:11:17 admin: BACKUPMON INFO: Finished secondary backup of EXT Drive in 40 sec to /tmp/mnt/secondarybackups/backupmon/ax58u/20251109-121004/entware.tar.gz
Nov  9 12:11:41 dropbear[30899]: Pubkey auth succeeded for 'admin' with ssh-ed25519 key SHA256:lTZtB/3guH2ICowzIvnsYJmK64l8kr+ygBiiqqh1b+c from xx.xx.xx.xx:2071
Nov  9 12:11:56 [MerlinAU] 21058: Backup Finished
Nov  9 12:11:56 [MerlinAU] 21058: Backup Completed Successfully
-----------------------------------------------------------------

Great idea, sounds like SQLite3 was probably being hammered at the same time as MerlinAU

@Martinski4GitHub
Copy link
Collaborator Author

I still think the USB unmount should remain enhanced, but trying to recreate what happened exactly will be tricky, not only because of timing, but also because we couldn't "see" exactly what happened due to the filtered log.

Yeah, the improved code to double-check the USB-attached drive getting unmounted should make that step more robust, and the log entries marking the events should be helpful if the same issue happens again.

Agreed, the enhancements should stay, but validating the behavior will be hard.

P.S. I got an Engineering meeting at 8:00 AM (local time), so I'll be going offline for the day pretty soon.

I'm investigating the issue with spdMerlin now from the spdMerlin thread. It's very interesting, I'll get back to you shortly anyways with any developments.

Sounds good, I'm just catching up to that thread. When you get a chance, send the syslog as well. I'd like to take a look and compare it with the one from Ripshod when he was having the same issue.

@ExtremeFiretop
Copy link
Owner

I still think the USB unmount should remain enhanced, but trying to recreate what happened exactly will be tricky, not only because of timing, but also because we couldn't "see" exactly what happened due to the filtered log.

Yeah, the improved code to double-check the USB-attached drive getting unmounted should make that step more robust, and the log entries marking the events should be helpful if the same issue happens again.

Agreed, the enhancements should stay, but validating the behavior will be hard.

P.S. I got an Engineering meeting at 8:00 AM (local time), so I'll be going offline for the day pretty soon.

I'm investigating the issue with spdMerlin now from the spdMerlin thread. It's very interesting, I'll get back to you shortly anyways with any developments.

Sounds good, I'm just catching up to that thread. When you get a chance, send the syslog as well. I'd like to take a look and compare it with the one from Ripshod when he was having the same issue.

Find here: https://drive.google.com/file/d/1SPu3iUdyqysECfjxCOJCXftFuUGPhkIN/view

@Martinski4GitHub
Copy link
Collaborator Author

I still think the USB unmount should remain enhanced, but trying to recreate what happened exactly will be tricky, not only because of timing, but also because we couldn't "see" exactly what happened due to the filtered log.

Yeah, the improved code to double-check the USB-attached drive getting unmounted should make that step more robust, and the log entries marking the events should be helpful if the same issue happens again.

Agreed, the enhancements should stay, but validating the behavior will be hard.

P.S. I got an Engineering meeting at 8:00 AM (local time), so I'll be going offline for the day pretty soon.

I'm investigating the issue with spdMerlin now from the spdMerlin thread. It's very interesting, I'll get back to you shortly anyways with any developments.

Sounds good, I'm just catching up to that thread. When you get a chance, send the syslog as well. I'd like to take a look and compare it with the one from Ripshod when he was having the same issue.

Find here: https://drive.google.com/file/d/1SPu3iUdyqysECfjxCOJCXftFuUGPhkIN/view

Thanks!!! Now I need to grab my freshly-brewed cup of coffee before the meeting.
Talk to you later in the evening.

@ExtremeFiretop
Copy link
Owner

I still think the USB unmount should remain enhanced, but trying to recreate what happened exactly will be tricky, not only because of timing, but also because we couldn't "see" exactly what happened due to the filtered log.

Yeah, the improved code to double-check the USB-attached drive getting unmounted should make that step more robust, and the log entries marking the events should be helpful if the same issue happens again.

Agreed, the enhancements should stay, but validating the behavior will be hard.

P.S. I got an Engineering meeting at 8:00 AM (local time), so I'll be going offline for the day pretty soon.

I'm investigating the issue with spdMerlin now from the spdMerlin thread. It's very interesting, I'll get back to you shortly anyways with any developments.

Sounds good, I'm just catching up to that thread. When you get a chance, send the syslog as well. I'd like to take a look and compare it with the one from Ripshod when he was having the same issue.

Find here: https://drive.google.com/file/d/1SPu3iUdyqysECfjxCOJCXftFuUGPhkIN/view

Thanks!!! Now I need to grab my freshly-brewed cup of coffee before the meeting. Talk to you later in the evening.

Sounds good!
I'm currently having the user try to remove:

. /jffs/addons/diversion/mount-entware.div # Added by amtm

from his post-mount and trying again. But I can see in the log file sent that spdMerlin never triggers the NTP check after the reboot, so it's either bailing out early or never being run at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants