dev ARQ (retransmit) #185
|
Thanks for including the very helpful and, I think clear, diagram. So far, I've only looked at that, but I already have a few comments/observations/suggestions.
Does this make sense to you? Am I missing something? |
|
many thx for the comments

concerning all points related to seq not 1 bit: YES, 1 bit is fully sufficient for a basic mechanism. Eventually we may want to change. However, for purely historical reasons the seq number just happens to be 3 bit, and at this point I see no reason to change that. We know we could/will. Not a relevant point IMHO :)

there is btw always a situation which is not optional. See e.g. seq 3

the protocol is just the standard protocol as on any web page (there are various names for it, so no name here) (all 1 bit). There are two versions: those which send an ack, and those which send the next desired number. The main challenge is handling the various non-protocol related states, like disconnects, frames which carry commands rather than serial data, etc. Hence these states, and I figured that with the send-ack version these states are easier to handle.

it's possible that with > 1 bit one may reduce a few edge cases

the problem with 5 is probably solved by using the other method, to send the next desired no. Hm. Maybe I should convert to that. |
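To make the two variants mentioned above concrete, here is a minimal receiver-side sketch. It is not the mLRS code: `ArqReceiver` and all its members are hypothetical names, and the sequence width is a constructor parameter so the same sketch covers 1 bit and 3 bits. The receiver keeps a single counter and can report it either as "last received" or as "next desired".

```cpp
#include <cstdint>

// Hypothetical receiver-side bookkeeping; names are illustrative, not taken from mLRS.
struct ArqReceiver {
    std::uint8_t mask;          // (1 << bits) - 1, e.g. 0x01 for 1 bit, 0x07 for 3 bits
    std::uint8_t expected = 0;  // sequence number of the next new frame

    explicit ArqReceiver(unsigned bits)
        : mask(static_cast<std::uint8_t>((1u << bits) - 1u)) {}

    // Returns true if the frame carries new payload, false if it is a
    // retransmission of a frame that was already accepted.
    bool on_frame(std::uint8_t seq) {
        if ((seq & mask) != expected) return false;
        expected = static_cast<std::uint8_t>((expected + 1) & mask);
        return true;
    }

    // Variant 1: acknowledge the last sequence number that was accepted.
    // Before any frame has been accepted this wraps around to `mask`.
    std::uint8_t ack_last_received() const {
        return static_cast<std::uint8_t>((expected - 1) & mask);
    }

    // Variant 2: report the next desired sequence number.
    std::uint8_t ack_next_desired() const { return expected; }
};
```

Both replies are derived from the same counter, so the choice only affects where the +1/-1 happens, not how much state is kept.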
|
|
I think that type of protocol is out of the question |
|
sorry, I edited your post ... grrrr, this damned github, not the first time I stepped into this trap |
the book keeping and state handling looked easier to me |
To me, the opposite. Sending back the last received sequence number means you just check if the acknowledged sequence number matches what you last sent. If so, move on, if not, retransmit. The other way seems to require you to add when you respond and subtract when you compare. But, I suppose it's mostly a matter of how you think about it. Either way, there is not much state involved except for deciding when to give up on retransmission of a given frame. Many of the algorithms found online and in text books are intended for more complex systems which don't just ping-pong messages at a constant interval like we do, so a very simple method can be optimal for us if we rule out buffering more than the most recent transmission as you have. |
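The sender-side check described above ("if the acknowledged sequence number matches what you last sent, move on, if not, retransmit") can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `ArqSender`, `retransmit_needed` and `kSeqMask` are hypothetical names, one frame is assumed to be sent in every slot, and no retransmission limit is modelled.

```cpp
#include <cstdint>

// 3-bit sequence number, as mentioned in the thread (assumed width).
constexpr std::uint8_t kSeqMask = 0x07;

// Hypothetical stop-and-wait sender state; names are illustrative.
struct ArqSender {
    std::uint8_t seq_no = 0;  // sequence number of the frame currently in flight
    bool in_flight = false;   // true while a frame awaits acknowledgement

    // Called once per transmit slot. `acked_seq` is the sequence number
    // echoed back by the peer; `ack_valid` is false if the peer's frame
    // was lost. Returns true if the previous payload must be sent again,
    // false if a new payload may be sent with the current seq_no.
    bool retransmit_needed(bool ack_valid, std::uint8_t acked_seq) {
        if (in_flight && ack_valid && acked_seq == seq_no) {
            // Peer confirmed the frame we last sent: advance to the next one.
            seq_no = static_cast<std::uint8_t>((seq_no + 1) & kSeqMask);
            in_flight = false;
        }
        const bool retransmit = in_flight;
        in_flight = true;  // a frame goes out in every slot
        return retransmit;
    }
};
```

The only state is the current sequence number and one flag; deciding when to give up on a frame would add a retry counter on top of this.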
|
I tried this initially, but then settled on the ack. I will retry; it could be that at the early stages I also had too much in mind about how to abstract the code. I hadn't sorted out the states initially. Anyway, it has benefits, so whatever it's going to be, so be it :) |
|
@brad112358 seemingly works, in that it connects etc. pp, but the symptom described by @jlpoltrack also exists, i.e. MP shows lost packets ... so, appears something is still not working as expected ... here the time plan for the changed protocol |
# Conflicts: # mLRS/CommonRx/mlrs-rx.cpp # mLRS/CommonTx/mlrs-tx.cpp
|
with I do see continuous packet losses in MP. MP reports a pretty stable 95-96% link quality, so around every 20th packet is lost. The mLRS LQ metric on the OLED display reads something around 75% ... not sure if that means that the mechanism is helping. I do my tests btw in 19 Hz mode (with a 2.4 GHz system) |
|
And you didn't add a retransmission limit yet? So something must be wrong if MP is correct. When I get some time, I'll try to reproduce the problem with QGC. |
|
Do you use Bluetooth or UDP or TCP WiFi or wired serial for the GCS connection? |
yes, no retransmission limit
yes :)
wired serial connection, from tx serial via usb-ttl to PC. One potential source of the problem which I have not yet ruled out is that the stream flow control isn't good enough, so that AP sends too many messages and some are dropped every once in a while ... I'm using 19 Hz, so there is some restriction. |
|
Looks like that fixed it. QGC is now reporting 0 lost messages |
YES :) @jlpoltrack made the relevant comment. Not sure you also follow the discussion at discord |
|
|
|
I had the baud rate too high for the crap R9M inverter with weak pullup. Working fine at 115200 serial speed on the Tx |
|
I think that the problem is that the whole retry thing doesn't work in the first place, and we are always struck by the issue that the seq is only 3 bit and the ack only 1 bit. Not talking about framelost/reset. I come to the conclusion that we can't ensure that we always do the right thing: once we have failure sequences which are longer than 8 frames, the seq_no wraps and we have an issue. So, the only way out is to kind of model the behavior of the other side, and from that reconstruct the actual seq_no. It's in some way what your counter does. The seq_no would have to be longer than the number of frames for disconnect.

For the moment I did option 1. See next push.

Maybe we also simply need to come to the conclusion that with long failure sequences we have to accept that we can't recover everything to 100%. I mean, long failure sequences mean pretty low LQ anyway. I think experience is that below ca. 40% the link is pretty "erratic", in the sense that it doesn't sit smoothly around a stationary value. Maybe we just want too much. |
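The wrap-around issue described above can be shown with a little modular arithmetic. The sketch below is only an illustration: `apparent_advance` and `reconstruct` are hypothetical helpers, and the reconstruction assumes the peer advanced by fewer than 8 frames since the last frame that got through, which is exactly the assumption that breaks in long failure sequences.

```cpp
#include <cstdint>

constexpr unsigned kSeqBits = 3;                  // 3-bit seq_no, as in the thread
constexpr unsigned kSeqModulus = 1u << kSeqBits;  // 8

// How far the peer appears to have advanced, judged only from the
// 3-bit sequence numbers. Any true advance of n and n + 8 look identical.
constexpr unsigned apparent_advance(unsigned last_seen, unsigned received) {
    return (received - last_seen) & (kSeqModulus - 1);
}

// Extend a received 3-bit seq_no to a full counter, ASSUMING the true
// value is fewer than 8 frames ahead of the last reconstructed value.
constexpr unsigned reconstruct(unsigned last_full, unsigned received) {
    return last_full + apparent_advance(last_full, received);
}
```

For example, with a last reconstructed counter of 14, a received seq_no of 1 is reconstructed as 17. If the peer had in fact advanced to 25 during a long outage, the received seq_no would also be 1 and the reconstruction would still say 17, so the missing 8 frames go unnoticed unless the receiver models the sender's behavior or the seq_no covers at least the disconnect interval.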
… framelost detection.
|
another point: |
|
I don't think your (a) or (b) above can have any real impact on lost frame detection, since they convey exactly the same information and either value can trivially be calculated from the other. With regard to lost frame detection, I agree that finding a workable method depends on the behavior of the sending side, so we need to nail that down first. On the sending side, I think we will want more than a simple fixed limit on the number of retries for each frame.
Does this sound correct? |
|
interesting and creative idea, but I think it's way too complicated really. It also doesn't solve the main issue, AFAICS.

As regards frame loss, with the simple option 1. added we cover most situations well. Its failures really matter only in very low LQ situations, and there things are not nice anyway. So, I guess I consider the topic largely "solved", i.e., my focus shifts towards doing more testing of the code. I'm not sure you will find this agreeable.

My mind is also shifting towards adding a full reparsing to the mavlink parser, as this would resolve the frame loss issue completely. In fact, my thinking is slightly different. For quite some while I have wanted to change the transmission such that the mavlink frame marker is not at the end of a mlrs frame payload, or split across two in the case of the two-byte frame marker. Unfortunately, because of all the parsing and fifos etc. it's not so easy to implement this. As a side effect it would also become easy to do the frameloss detection, pretty much like your old idea: simply scan the payload for the marker, compare to the parser expectation, and reset if there is no match. So, my mind is currently on the marker-not-at-end thing :) |
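A rough sketch of the "compare the marker to the parser expectation" idea follows. It is not the mLRS parser: `check_marker`, `MarkerCheck` and `bytes_until_next_frame` are hypothetical names, and the marker is assumed to be the single MAVLink v2 start byte 0xFD (the two-byte marker mentioned above is not modelled). Instead of scanning the whole payload, this variant checks only the offset where the parser expects the next frame to begin, because a payload byte can coincidentally equal the marker value.

```cpp
#include <cstddef>
#include <cstdint>

// MAVLink v2 start-of-frame byte; the marker actually used on the link is an assumption.
constexpr std::uint8_t kFrameMarker = 0xFD;

enum class MarkerCheck {
    Consistent,   // marker found where the parser expects the next frame
    Mismatch,     // a different byte sits there: bytes were likely lost, reset the parser
    NotInPayload  // the next frame start lies beyond this payload, nothing to check
};

// bytes_until_next_frame: how many more bytes the parser expects before
// the next frame begins (0 means the first payload byte should be a marker).
MarkerCheck check_marker(const std::uint8_t* payload, std::size_t len,
                         std::size_t bytes_until_next_frame) {
    if (bytes_until_next_frame >= len) return MarkerCheck::NotInPayload;
    return payload[bytes_until_next_frame] == kFrameMarker
               ? MarkerCheck::Consistent
               : MarkerCheck::Mismatch;
}
```

On `Mismatch` the caller would reset the parser and resynchronize on the next marker; whether that is sufficient in practice depends on how often marker-valued bytes occur inside payloads.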
|
@brad112358 your old extra byte idea was to know where the next frame marker is, and compare to the parser state, and from that work out if something is missed. |
# Conflicts: # mLRS/Common/arq.h # mLRS/CommonRx/mlrs-rx.cpp # mLRS/CommonTx/mlrs-tx.cpp
|
merged to main |
# Conflicts: # mLRS/CommonRx/mlrs-rx.cpp # mLRS/CommonTx/mlrs-tx.cpp
# Conflicts: # mLRS/CommonRx/mlrs-rx.cpp # mLRS/CommonTx/mlrs-tx.cpp
|
interesting work! looking forward to having it in mainline at some point |


this is a dev/test branch for working on retransmission
this one is simplified in two ways, to not make it too complicated and facilitate easier testing:
@brad112358 this might interest you, given you showed interest in this before. I actually would much appreciate you checking it out; your strength in finding the little loopholes/bugs would be useful :)
anyone else is of course also massively welcome !!!!!