-
Notifications
You must be signed in to change notification settings - Fork 7.5k
samples: net: lwm2m: Fix DTLS related issues #13477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
samples: net: lwm2m: Fix DTLS related issues #13477
Conversation
Codecov Report
@@ Coverage Diff @@
## master #13477 +/- ##
=======================================
Coverage 52.46% 52.46%
=======================================
Files 322 322
Lines 46597 46597
Branches 10769 10769
=======================================
Hits 24446 24446
Misses 17246 17246
Partials 4905 4905 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested good for me. Thanks for the fix @rlubos
@@ -3929,11 +3929,6 @@ static void socket_receive_loop(void) | |||
len = recvfrom(sock_ctx[i]->sock_fd, in_buf, | |||
sizeof(in_buf) - 1, 0, | |||
&from_addr, &from_addr_len); | |||
if (errno) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally would explicitly mention in the commit message that there's already check in the code for len < 0
case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, layering violation strikes again. I would suggest to separate poll hacks from the rest of the patches. The latter can go in immediately, while poll hacks need to scrutinized.
(*pev)->mode = K_POLL_MODE_NOTIFY_ONLY; | ||
(*pev)->state = K_POLL_STATE_NOT_READY; | ||
|
||
goto again; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain what happens here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handshake is complete, still we don't have any application data on a socket. Therefore, poll event is reconfigured to monitor fifo now.
The reason for the extra monitor for the handshake is, that unless we provide some (presumably overcomplicated) mechanism for mulit-context handshake, we may end up with a deadlock within poll.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so if it happens once, when TLS socket state changes from "handshake non-complete" to "handshake complete", sounds good.
@@ -1707,19 +1719,34 @@ static int ztls_poll_update_ctx(struct net_context *ctx, | |||
} | |||
|
|||
if (pfd->events & ZSOCK_POLLIN) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And what if we polled for POLLOUT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we assume that socket is always writable
. POLLOUT is not an issue in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we
assume that socket is always writable
.
But a socket is not always writable. For normal socket, this is well known issue (well, bug) present so far because resolving it would require adding more overhead to socket structures, and which is properly ticketed as #5842.
Based on the response above, I don't see that this issue was even considered/analyzed for TLS sockets, far less in queue for fixing.
Then please explain how you think poll() should work on a DTLS socket, only once that is ok, we can look at the code to see whether it does the described or not. |
Nothing excepctional here - it should block until there's application data ready to read, or an error or timeout occurs. And as current implementation had an issue, hence bugfix. I don't see a reason to split this PR. Commits are well separated, and they all serve a single purpose - make LWM2M sample behave correctly. And submiting partial solutions just for sake of (I'm not even sure for sake of what) is a needless overhead for me. |
Errno value is only significant when `recvfrom` function indicated an error (by returning -1). We should not depend on it's value if no error is notified. As the return value of `recvfrom` is already checked, misused errno verification can simply be removed. Signed-off-by: Robert Lubos <[email protected]>
System workqueue stack was not large enough to handle DTLS handshake, which lead to system crash. Signed-off-by: Robert Lubos <[email protected]>
Instead of simple bool value, use a semaphore to notify that TLS handshake is complete. This way, we can monitor this value with k_poll. Signed-off-by: Robert Lubos <[email protected]>
When DTLS client was added to `poll` before/during the handshake, it could throw errors and in some circumstances (when polling thread was cooperative and had higher or equal priority to the handshake thread) could lead to a deadlock in the application. Prevent that, by blocking on handshake semaphore instead of fifo. Poll will start using fifo for data poll only after handshake is complete. Signed-off-by: Robert Lubos <[email protected]>
50ab5ca
to
93d0a76
Compare
So, to clarify, the single purpose should be "implement poll() correctly", not just hack it in a way to make one sample work, but possibly leaving even weirder corner cases for other samples and apps. The purpose of split is usually to merge obvious parts soon, and focus on non-obvious parts without extra noise. If you don't want to split, sounds good, let's make each commit here wait until the last issue here is clarified. |
Could you please specify what other cases you mean? We've observed |
I will, please give me some time.
+1 for that, but I'm not sure that poll() still works correctly for TLS things. If we don't take cases like this to try to improve it overall, then when we'll get to it. (It's not that poll() just landed, so "it's good that it works at all". We're having Nth iteration on it, and would be nice to plan for them to converge.) (And yes, non-TLS poll() has its issues too, it's just non-TLS and TLS cases are two separate ones, as TLS one would normally be implemented on application/library, not stack, side.) |
@pfalcon I have to agree with @rlubos on this. Unless you are planning a much larger change to the way poll() works for TLS -- and I'm not sure that's going to happen this late in the 1.14 LTS cycle -- then this patch seems to be a reasonable fix for a bug in the way the TLS handshake was affecting a pre-existing poll(). Of course, if there is an actual problem with the bugfix then we should re-examine. Are you nervous that this implementation handles a bit of UDP vs. TCP magic? Even after reading your discussion, I'm not sure what your main concern is with this patch. It's just a bugfix. |
@@ -1376,7 +1384,7 @@ static ssize_t sendto_dtls_client(struct net_context *ctx, const void *buf, | |||
} | |||
} | |||
|
|||
if (!ctx->tls->tls_established) { | |||
if (!is_handshake_complete(ctx)) { | |||
/* TODO For simplicity, TLS handshake blocks the socket even for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so it's actually documented that non-blocking TLS sockets aren't really non-blocking.
|
||
again: | ||
(*pev)++; | ||
errno = EAGAIN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As was mentioned previously, it's quite confusing to see errno assignment here. errno should be reserved for communication between system and userspace application. Part of system would rather communicate with -EAGAIN.
So, the problem is that nobody knows how many more problems are there. It would seem that discovery of one issue should prompt to pay more attention to those matters, but apparently nobody is interested in that. Ok, I'm withdrawing my -1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving from my side too, so this isn't left for next week.
@pfalcon I appreciate you removing your -1 to allow this to move forward. Can we open an issue to keep the spirit of your thoughts on poll() for TLS use case? Something to document your comments above so they are not lost? |
That's generally what I tried to drive too. I'd leave that to @rlubos though. My original concern was "hmm, I don't see all the advanced code required to handle poll-like functionality on non-blocking TLS socket". But a code comment reminded me that (according to it) there isn't really truly on-blocking TLS support. But again, I'd leave that to @rlubos at his convenience (I'd think that preparing for a release would be a good time to revisit, or at least ticket such cases, but clearly everyone has a lot on each own plate already). |
Fix LWM2M and DTLS issues observed in the LWM2M sample.
Fixes #13380
Signed-off-by: Robert Lubos [email protected]