Description
The RPC mechanism's use of locks to protect it, and of shared_ptrs to hold its resources, is unclear. The current code in v2.21.2 is not consistent enough for me to infer the intent, so I am asking for clarification by showing the code that concerns me.
I may well have missed something in my code review. Please do point out any mistakes I made.
Setup
- all compilers, os, platforms
- depthai-core and several versions earlier too
Code review
pimpl->rpcStream
I see no code that uses pimpl->rpcStream. Why does it exist?
depthai-core/src/device/DeviceBase.cpp
Lines 625 to 629 in 125feb8
- line 626 creates a shared_ptr<XLinkStream>
- line 626 then assigns that shared_ptr to pimpl->rpcStream, which no code will meaningfully use.
- line 627 creates a new shared_ptr rpcStream and copy-assigns pimpl->rpcStream to it. So now the XLinkStream has two (2) shared_ptrs pointing to it. Why do we need two?
- line 629 lambda COPY captures rpcStream. So now there are three (3) shared_ptrs pointing to the XLinkStream.
- auto rpcStream, the one outside the lambda, goes out of scope at the end of init2(). It had no purpose to exist.
- pimpl->rpcStream is eventually set to nullptr in Device::close(). Nothing ever used it. It had no purpose to exist.
Besides serving no purpose, the two extra refs on the shared_ptr add a small amount of unneeded code and CPU use.
I suggest...
- remove pimpl->rpcStream. I see no need for it.
- merge lines 626 and 627 into a single auto rpcStream = ...
- change line 629 to be a move capture like rpcStream = std::move(rpcStream)
- Alternatively, put the make_shared in the capture itself to eliminate the local, if a long or wrapped capture is acceptable. That constructs the shared_ptr in place rather than moving it.
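The suggestions above can be sketched in isolation. This is a minimal illustration, not the depthai-core code: FakeStream, makeRpcCallable, makeRpcCallableInline, and ownersSeenByLambda are hypothetical names, and FakeStream only stands in for XLinkStream. It shows that with a move capture (or a make_shared inside the capture) the lambda ends up as the only owner of the stream.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for XLinkStream; the real class has a different interface.
struct FakeStream {
    explicit FakeStream(std::string streamName) : name(std::move(streamName)) {}
    std::string name;
};

// Builds an RPC-style callable that is the sole owner of its stream.
// The callable returns the use_count it observes on its captured shared_ptr.
std::function<long()> makeRpcCallable() {
    auto rpcStream = std::make_shared<FakeStream>("rpc");
    // Move capture: ownership transfers into the lambda without a refcount bump.
    return [rpcStream = std::move(rpcStream)]() { return rpcStream.use_count(); };
}

// Variant that constructs the shared_ptr directly in the capture, with no local.
std::function<long()> makeRpcCallableInline() {
    return [rpcStream = std::make_shared<FakeStream>("rpc")]() { return rpcStream.use_count(); };
}

// Calls the callable once and reports how many owners the stream has.
long ownersSeenByLambda() {
    auto call = makeRpcCallable();
    long owners = call();
    assert(owners == 1);  // only the lambda's capture owns the stream
    return owners;
}
```

Copying the returned std::function would copy the captured shared_ptr and raise the count again, so the callable itself should be moved or stored once if single ownership matters.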
pimpl->rpcMutex
pimpl->rpcMutex seems to be used to protect RPC activity. It is used in only one place...
depthai-core/src/device/DeviceBase.cpp
Lines 630 to 632 in 125feb8
Yet the resources on which the RPC mechanism depends, like rpcStream and rpcClient, are not protected by this mutex. Of most concern is the code in Device::close()...
depthai-core/src/device/DeviceBase.cpp
Lines 429 to 430 in 125feb8
The intention of setting both of those to nullptr is unclear. What is it? Yes, part of it relates to the rpcStream discussion above.
I'll explore rpcStream below. pimpl->rpcClient is tracked separately in issue #805.
1. [this is what the code currently does] Release one refcount of shared ownership of the object that pimpl->rpcStream points at.
2. Set the value of the object that pimpl->rpcStream points at to nullptr or an empty XLinkStream...which doesn't exist, as there is no default constructor for XLinkStream.
I think the first (1) is useless. pimpl->rpcStream is used by no code. The pimpl->rpcClient lambda captured its own shared_ptr to the XLinkStream, so the XLinkStream held by the lambda continues to live. pimpl->rpcStream = nullptr has no effect on its lifetime.
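The lifetime argument can be checked with a small stand-alone sketch. Everything here is hypothetical (FakeStream, FakeImpl, setUp, and the two check functions are illustration names, not depthai-core types); it only mirrors the ownership pattern described above: one shared_ptr member plus a callable that copy-captured the same shared_ptr.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for XLinkStream that records its own destruction.
struct FakeStream {
    explicit FakeStream(bool* destroyedFlag) : destroyed(destroyedFlag) {}
    ~FakeStream() { *destroyed = true; }
    bool* destroyed;
};

// Hypothetical stand-in for the pimpl: a shared_ptr member plus a callable
// that holds its own copy of the same shared_ptr.
struct FakeImpl {
    std::shared_ptr<FakeStream> rpcStream;
    std::function<long()> rpcClient;
};

// Mirrors the ownership pattern under review: member, local copy, copy capture.
void setUp(FakeImpl& impl, bool* destroyed) {
    impl.rpcStream = std::make_shared<FakeStream>(destroyed);
    auto rpcStream = impl.rpcStream;                                   // second owner
    impl.rpcClient = [rpcStream]() { return rpcStream.use_count(); };  // third owner
}  // the local rpcStream is released here, leaving two owners

// Returns true if nulling only the member destroyed the stream.
bool streamDiesWhenMemberIsNulled() {
    bool destroyed = false;
    FakeImpl impl;
    setUp(impl, &destroyed);
    assert(impl.rpcClient() == 2);  // member + lambda capture
    impl.rpcStream = nullptr;       // drops one owner; the lambda still holds one
    return destroyed;
}

// Returns true if the stream is destroyed once the callable is cleared as well.
bool streamDiesWhenCallableIsCleared() {
    bool destroyed = false;
    FakeImpl impl;
    setUp(impl, &destroyed);
    impl.rpcStream = nullptr;
    impl.rpcClient = nullptr;  // destroys the lambda and with it the last owner
    return destroyed;
}
```

In this sketch, clearing the member alone leaves the stream alive; it is destroyed only when the callable that captured it goes away.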
If we instead want to do the second (2)...which is impossible...then I would want to lock pimpl->rpcMutex and then set its value with *pimpl->rpcStream = XLinkStream(), and add code into the lambda to check+throw on invalid XLinkStream.
If we take a step back and set intention aside...there could also be a threading issue. Imagine two threads, with no way to predict which thread is first, last, fastest, or slowest, or when either or both are paused in the middle, etc.
Thread 1
Calls a Device API which cascades to a call to the RPC lambda, which contains the above mutex lock. The lambda then uses rpcStream to write and read data to/from XLink. Please note the lambda itself is contained within rpcClient.
Thread 2
Calls Device::close(). This function has code...
depthai-core/src/device/DeviceBase.cpp
Lines 429 to 430 in 125feb8
What happens? 💣 Nothing good. Maybe errors. Maybe a crash. Unpredictable. Why?...
- The crash scenarios in issue #805, [REGRESSION] Device APIs crash application when called after Device::close() (two identified so far)
- Thread 1 needs a valid rpcStream value during the entire call of the lambda
- Thread 2 sets rpcStream = nullptr with no coordination with the RPC lambda via rpcMutex. But what was the intention?
- If the intention of Thread 2 was to somehow destruct the rpcStream that is used within the lambda, then this is dangerous without mutex coordination. The null/destruct could happen at the start of the lambda before the XLink write, or between the XLink write and XLink read.
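One possible way to coordinate the two threads is sketched below, assuming the intent is that close() really should release the stream. This is not the depthai-core implementation: FakeStream, FakeRpcChannel, roundTrip, and callAfterCloseThrows are hypothetical names. The idea is that the same mutex guards both the stream pointer and its use, and the call path checks for a closed stream and throws.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for XLinkStream: one write followed by one read.
struct FakeStream {
    std::string roundTrip(const std::string& request) { return "reply:" + request; }
};

// Hypothetical RPC channel: the mutex guards the stream pointer and its use.
class FakeRpcChannel {
  public:
    FakeRpcChannel() : stream(std::make_unique<FakeStream>()) {}

    // The write and read both happen under the lock, so close() cannot
    // release the stream before the write or between the write and the read.
    std::string call(const std::string& request) {
        std::lock_guard<std::mutex> lock(mtx);
        if(!stream) throw std::runtime_error("RPC stream is closed");
        return stream->roundTrip(request);
    }

    // Takes the same lock before releasing the stream.
    void close() {
        std::lock_guard<std::mutex> lock(mtx);
        stream.reset();
    }

  private:
    std::mutex mtx;
    std::unique_ptr<FakeStream> stream;
};

// Returns true if a call after close() fails with an exception rather than
// touching a released stream.
bool callAfterCloseThrows() {
    FakeRpcChannel channel;
    assert(channel.call("ping") == "reply:ping");  // works while open
    channel.close();
    try {
        channel.call("ping");
    } catch(const std::runtime_error&) {
        return true;
    }
    return false;
}
```

With this arrangement a thread that is mid-call finishes its write and read before close() can proceed, and any later call gets a well-defined exception. Whether that matches the intended semantics of Device::close() is exactly the open question.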