Skip to content

Error handling events#8

Merged
michnovka merged 4 commits intomichnovka:masterfrom
nachogarcia:error_handling_events
Jan 16, 2025
Merged

Error handling events#8
michnovka merged 4 commits intomichnovka:masterfrom
nachogarcia:error_handling_events

Conversation

@nachogarcia
Copy link
Copy Markdown

As requested in anotherjulien#155 (comment)

@michnovka michnovka merged commit 3f9490a into michnovka:master Jan 16, 2025
@michnovka
Copy link
Copy Markdown
Owner

Thanks @nachogarcia

@michnovka
Copy link
Copy Markdown
Owner

Do you think this might solve #6 ?

@nachogarcia
Copy link
Copy Markdown
Author

Not sure, if it's the case at least we might have some trace now.

If you plan on maintain this repo, maybe you could talk to have yours in HACS.
Thanks for your work!

@michnovka
Copy link
Copy Markdown
Owner

I would very much prefer @anotherjulien to come back and merge our changes into his. I will be moving out of my flat in the next 2 years and I will have KNX then, not bticino. So I am not the best candidate for its long term maintenance

@michnovka
Copy link
Copy Markdown
Owner

Also, I am really not a python coder. All this is as trial/error + AI. I have also patched some stuff in ownd library which would be good to include, but since this is taken from pypi, Id have to publish my own fork there.

I did find however that https://github.com/smf666/OWNd pushed his as ownd2 so I created PR on his repo to merge my changes and publish them. We should then switch dependency to that until @anotherjulien fixes his ownd.

@smf666 has also his own fork of MyHome which includes some fixes and seems like proper zigbee support. I wanted to merge his commits, but he implemented cover timings functionality which conflicts with one of PRs merged into my repo. I think the version I use (from @andrea-parisi fork) is superior as it handles different time up and down which has been a gamechanger when it comes to accuracy of % calculations, but the other repo also somewhat works with zigbee blinds if I understood correctly? On top of that the repo has really weird diffs when it comes to commits, it shows basically every line as removed and then added which makes it super hard to diff visually on github.

All that being said, I am not a python developer and while I can piece together things from other devs, merging @smf666 or even debugging in further detail the issues goes beyond my python skillset. Maybe you can have a look if you want to try and merge this. Cheers!

@gdluck
Copy link
Copy Markdown

gdluck commented Jan 16, 2025

Hello! Some of those changes breaks way that group lights are handled. For example: for every light on/off command, ownd opens a new session, sends the own password, then the command. the problem is that the homeserver can't handle so many connections and some commands are not sent. before the commit, there was only a login/connection with a queue.

@michnovka
Copy link
Copy Markdown
Owner

@gdluck what changes do you mean? I dont see in my log that a new session would be initiated for every light command

@gdluck
Copy link
Copy Markdown

gdluck commented Jan 16, 2025

Try to open/close a group of lights and see what is happening after this pull (with debug activated)

@michnovka
Copy link
Copy Markdown
Owner

homeassistant  | 2025-01-16 18:16:11.822 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message `*1*1*41##` was successfully queued.
homeassistant  | 2025-01-16 18:16:11.823 DEBUG (MainThread) [custom_components.myhome] MH202 Gateway (192.168.1.40) Message `*1*1*41##` was successfully unqueued by worker 0.
homeassistant  | 2025-01-16 18:16:11.878 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message `*1*1*41##` was successfully sent.
homeassistant  | 2025-01-16 18:16:11.891 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*1*1000#1*41##`
homeassistant  | 2025-01-16 18:16:11.891 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Ignoring translation message `*1*1000#1*41##`
homeassistant  | 2025-01-16 18:16:11.921 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*1*1*41##`
homeassistant  | 2025-01-16 18:16:11.921 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Light 41 is switched on.
homeassistant  | 2025-01-16 18:16:16.946 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*#13**22*19*15*00*000*04*16*01*2025##`
homeassistant  | 2025-01-16 18:16:16.946 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Gateway's internal datetime is: 2025-01-16 19:15:00+00:00.
homeassistant  | 2025-01-16 18:16:21.274 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message `*1*0*41##` was successfully queued.
homeassistant  | 2025-01-16 18:16:21.275 DEBUG (MainThread) [custom_components.myhome] MH202 Gateway (192.168.1.40) Message `*1*0*41##` was successfully unqueued by worker 0.
homeassistant  | 2025-01-16 18:16:21.332 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message `*1*0*41##` was successfully sent.
homeassistant  | 2025-01-16 18:16:21.334 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*1*1000#0*41##`
homeassistant  | 2025-01-16 18:16:21.334 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Ignoring translation message `*1*1000#0*41##`
homeassistant  | 2025-01-16 18:16:21.376 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*1*0*41##`
homeassistant  | 2025-01-16 18:16:21.376 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Light 41 is switched off.
homeassistant  | 2025-01-16 18:16:23.484 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message `*1*1*41##` was successfully queued.
homeassistant  | 2025-01-16 18:16:23.485 DEBUG (MainThread) [custom_components.myhome] MH202 Gateway (192.168.1.40) Message `*1*1*41##` was successfully unqueued by worker 0.
homeassistant  | 2025-01-16 18:16:23.539 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message `*1*1*41##` was successfully sent.
homeassistant  | 2025-01-16 18:16:23.543 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*1*1000#1*41##`
homeassistant  | 2025-01-16 18:16:23.543 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Ignoring translation message `*1*1000#1*41##`
homeassistant  | 2025-01-16 18:16:23.586 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*1*1*41##`
homeassistant  | 2025-01-16 18:16:23.587 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Light 41 is switched on.

For single light, works as expected.

What do you mean by group of lights? Like in a bticino area? Or group in HA? Sorry, but I dont understand

@gdluck
Copy link
Copy Markdown

gdluck commented Jan 16, 2025

Screenshot 2025-01-16 at 19 18 17 Screenshot 2025-01-16 at 19 19 18

@michnovka
Copy link
Copy Markdown
Owner

Sorry, I cannot reproduce, it turns off within a single session:

homeassistant  | 2025-01-16 18:22:13.939 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Light 93 is switched off.
homeassistant  | 2025-01-16 18:22:14.000 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*1*1000#0*19##`
homeassistant  | 2025-01-16 18:22:14.000 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Ignoring translation message `*1*1000#0*19##`
homeassistant  | 2025-01-16 18:22:14.010 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*1*1000#0*12##`
homeassistant  | 2025-01-16 18:22:14.010 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Ignoring translation message `*1*1000#0*12##`
homeassistant  | 2025-01-16 18:22:14.059 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*1*0*19##`
homeassistant  | 2025-01-16 18:22:14.060 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Light 19 is switched off.
homeassistant  | 2025-01-16 18:22:14.074 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*1*1000#0*21##`
homeassistant  | 2025-01-16 18:22:14.074 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Ignoring translation message `*1*1000#0*21##`
homeassistant  | 2025-01-16 18:22:14.128 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*1*0*12##`
homeassistant  | 2025-01-16 18:22:14.128 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Light 12 is switched off.
homeassistant  | 2025-01-16 18:22:14.163 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*1*0*21##`
homeassistant  | 2025-01-16 18:22:14.164 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Light 21 is switched off.
homeassistant  | 2025-01-16 18:22:14.169 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*1*1000#0*11##`
homeassistant  | 2025-01-16 18:22:14.169 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Ignoring translation message `*1*1000#0*11##`
homeassistant  | 2025-01-16 18:22:14.200 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*1*0*11##`
homeassistant  | 2025-01-16 18:22:14.201 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Light 11 is switched off.
homeassistant  | 2025-01-16 18:22:14.241 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*1*1000#0*14##`
homeassistant  | 2025-01-16 18:22:14.241 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Ignoring translation message `*1*1000#0*14##`
homeassistant  | 2025-01-16 18:22:14.283 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*1*0*14##`
homeassistant  | 2025-01-16 18:22:14.283 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Light 14 is switched off.
homeassistant  | 2025-01-16 18:22:14.311 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*1*1000#0*51##`
homeassistant  | 2025-01-16 18:22:14.311 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Ignoring translation message `*1*1000#0*51##`

@nachogarcia can you reproduce this?

@michnovka
Copy link
Copy Markdown
Owner

Also I cannot imagine what this PR would cause, as its mostly just error handling. There is one change here:

https://github.com/michnovka/HomeAssistant-MyHOME/pull/8/files#diff-fce4ba98c3fc2103fc1b379d45eb0dbc6f93d3bc9a040bd4774867ef0d4a2e38R286

But still I dont see how it would cause what you describe. Did you confirm 100% that without this commit the issue does not happen?

@gdluck
Copy link
Copy Markdown

gdluck commented Jan 16, 2025

I just reverted to a version before this merge and now is working correctly. Still many auths but no command is lost.

@gdluck
Copy link
Copy Markdown

gdluck commented Jan 16, 2025

Oh. I've made a commit to fix exactly your problem with ignored commands: gdluck@bb0c69a

And since then, ignored commands are gone. But with those merges, I restored the bug.

@michnovka
Copy link
Copy Markdown
Owner

well your commit explains what is going on. You are closing and creating new connection with every request. that is definitely not how it should be done and its the true reason for your issues

@gdluck
Copy link
Copy Markdown

gdluck commented Jan 16, 2025

But is the only way is working. I have ~70 sensors (35 binary and 35 thermostats with temp + humidity) that are sending data and creates the issue with ignored commands. Something makes the send_buffer to ignore/fail commands.

@michnovka
Copy link
Copy Markdown
Owner

@gdluck if you want to talk about this issue you can join at #6 . Lets close this thread here.

On a sidenote, I believe the issue stems from the underlying OWNd library, not this package

@michnovka
Copy link
Copy Markdown
Owner

Not sure, if it's the case at least we might have some trace now.

If you plan on maintain this repo, maybe you could talk to have yours in HACS. Thanks for your work!

I can confirm @nachogarcia that it does NOT fix the issue :(

@anotherjulien
Copy link
Copy Markdown

I would very much prefer @anotherjulien to come back and merge our changes into his. I will be moving out of my flat in the next 2 years and I will have KNX then, not bticino. So I am not the best candidate for its long term maintenance

I'm not a developer, I learned just enough of Python to produce this pile of code a few years ago.
I maintain the project enough to meet new HA requirements but that's about it.
It's been working without a single error or even warning for me for years now, and that's with a fairly busy network (230 entities over 78 devices)

It has been honestly exhausting trying to keep up with everyone coming to me with their specific - setup dependent - issues and asking for a fix. Or feature requests that I do not have the hardware to test.
It reached a point where receiving emails from HA forum or GitHub would almost cause an anxiety attack.
At this point I'm afraid to otherwise take part in the HA forum for fear of being sucked back into everyone's problems.
And all of this for a product that is quite frankly, kinda crappy. Myhome is a mess, it's barely documented, and now that Legrand has pulled the plug on it, it will only get worse.

People complain that OWNd makes a new connection for each command? Well... it needs to... because command sessions have an undocumented lifetime, that is dependent on the type of gateway you have. And you cannot know when the session is closed by the gateway because it simply stops responding instead of closing gracefully.
Some gateways will also rate-limit commands, some gateways will rate-limit new sessions, but it is also not documented anywhere...
Some gateways will drop event sessions when they rate-limit, some will temporarily block the client IP for a few minutes for some reason...
And don't get me started on "official" gateways versus "non-official" gateways, because not everything with an Ethernet port and OpenWebNet capabilities is considered and supported as a gateway by Legrand...

Without engineering documents from BTicino/Legrand, there's only so much you can do.

Quite frankly, MyHome has been the biggest regret building my house.
I would have been much better off with KNX, but couldn't find someone to sell and install it.

@gdluck
Copy link
Copy Markdown

gdluck commented Jan 17, 2025

Thanks for your work @anotherjulien and @michnovka. I think it this problem can be solved with a silent fake ping-pong when connecting and a bogus command after reconnecting. I'll try some fixes and let you know. Also, if I increase the workers - the queue is processed quickly and I didn't notice any more ignored commands.

@michnovka
Copy link
Copy Markdown
Owner

@anotherjulien thanks for letting us know the state of things. I totally understand the lack of motivation to implement features for hardware which you dont own and to fix bugs which you dont experience. I think since there are no people who would like to take over the project and show some commitment on maintaining it, it will be as it is - multiple forks each fixing the issue that bother the person who forked.
Since I am currently constructing house with KNX, I also dont want to take over or spend much time on improving this 99% working solution.
However, coming to the issue which was mentioned here multliple times ( #6 ) - you do not experience it at all? Id like to see why that is. I have a guess that it has to do with temperature system, so do you have temperature control over bticino in your house? If so, what kind do you have (99zone model?)

@anotherjulien
Copy link
Copy Markdown

I can't say, either I've never encountered the problem, or it is so infrequent that I did not notice it.
No I do not have the heating system, it would not have been appropriate for my house.
It is indeed very strange that the heating system would send events on a command session...
I'll jump on #6 for a couple of comments.

@michnovka
Copy link
Copy Markdown
Owner

Its not unexpected, it sends events because it is queried on command session.

By

  1. MyHOMEClimate::async_update - https://github.com/anotherjulien/MyHOME/blob/master/custom_components/myhome/climate.py#L180
    and
  2. MyHOMEGatewayHandler::listening_loop - https://github.com/anotherjulien/MyHOME/blob/master/custom_components/myhome/gateway.py#L298

Then, the response to these messages is series of status updates for heater followed by ACK in the end. This is corresponding to the manual also: WHO_4.pdf

See here what it looks like with proper logging, it is clear from there:

homeassistant  | 2025-01-17 18:51:17.831 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*#4*5*#14*0230*1##`
homeassistant  | 2025-01-17 18:51:17.832 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Received heating command, sending query to zone 5
homeassistant  | 2025-01-17 18:51:17.832 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message `*#4*5##` was successfully queued.
homeassistant  | 2025-01-17 18:51:17.832 DEBUG (MainThread) [custom_components.myhome] MH202 Gateway (192.168.1.40) Message `*#4*5##` was successfully unqueued by worker 0.
homeassistant  | 2025-01-17 18:51:17.896 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Got non-signaling message `*#4*5*14*0230##`.
homeassistant  | 2025-01-17 18:51:17.898 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Got non-signaling message `*#4*5*12*0230*3##`.
homeassistant  | 2025-01-17 18:51:17.900 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Got non-signaling message `*4*1*5##`.
homeassistant  | 2025-01-17 18:51:17.902 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Got non-signaling message `*#4*5*13*00##`.
homeassistant  | 2025-01-17 18:51:17.905 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*#4*5*14*0230##`
homeassistant  | 2025-01-17 18:51:17.905 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Zone 5's target temperature is set to 23.0°C.
homeassistant  | 2025-01-17 18:51:17.910 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*#4*5*12*0230*3##`
homeassistant  | 2025-01-17 18:51:17.910 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Zone 5's local target temperature is set to 23.0°C.
homeassistant  | 2025-01-17 18:51:17.915 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*4*1*5##`
homeassistant  | 2025-01-17 18:51:17.916 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Zone 5's mode is set to 'heat'.
homeassistant  | 2025-01-17 18:51:17.920 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*#4*5*13*00##`
homeassistant  | 2025-01-17 18:51:17.920 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Zone 5's local offset is set to 0°C.
homeassistant  | 2025-01-17 18:51:17.936 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Got non-signaling message `*#4*5*0*0229##`.
homeassistant  | 2025-01-17 18:51:17.939 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*#4*5*0*0229##`
homeassistant  | 2025-01-17 18:51:17.939 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Zone 5's main sensor is reporting a temperature of 22.9°C.
homeassistant  | 2025-01-17 18:51:17.967 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Got non-signaling message `*#4*5*0*0229##`.
homeassistant  | 2025-01-17 18:51:17.970 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*#4*5*0*0229##`
homeassistant  | 2025-01-17 18:51:17.970 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Zone 5's main sensor is reporting a temperature of 22.9°C.
homeassistant  | 2025-01-17 18:51:18.070 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message `*#4*5##` was successfully sent.
homeassistant  | 2025-01-17 18:51:18.124 DEBUG (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Message received: `*4*110*#5##`
homeassistant  | 2025-01-17 18:51:18.124 INFO (MainThread) [custom_components.myhome] [MH202 gateway - 192.168.1.40] Zone 5's mode is set to 'heat'.

@anotherjulien
Copy link
Copy Markdown

I see... so it's a collision. New commands are sent before all the replies that the heating system sends on the command session are read.
OK, we need to either loop through them all until we get a signaling message, or ignore it...

@michnovka
Copy link
Copy Markdown
Owner

yea, my code fixes this, check this PR. It works reliably

@anotherjulien
Copy link
Copy Markdown

Any benefit for a while True loop over a more deliberate condition (something like while not isinstance(resulting_message, OWNSignaling) for example)?

Where did the need for multiple retries upon NACK come?

@michnovka
Copy link
Copy Markdown
Owner

From the OWN_Intro_ENG.pdf documentation:

"This frame means that the Open message, sent by the Client to the Server, is semantically or syntactically wrong. Furthermore it is used as an ended message when it is the answer to an Open frame which involves the forwarding of one or more messages in sequence (status or dimension requests). In this case, the Client may consider invalid any message received before the NACK."

So we should:

First verify if your message format is correct (we can omit this)
If the format is correct and you're getting a NACK due to transmission failure, then yes, you can try sending the message again
However, if you're getting repeated NACKs, you should:
    Check the system status
    Verify the zone exists and is accessible
    Make sure the parameters you're sending are within valid ranges
    Check if there are any connection issues with the bus

I acomplish this by limiting max number of retries

@anotherjulien
Copy link
Copy Markdown

ok, makes sense!
I'll do my best to work on merging the (frankly shameful) backlog of PR I have of those project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants