Race between SimpleLink WiFi driver FastConnect and networking app startup. #11889

Closed
GAnthony opened this issue Dec 5, 2018 · 3 comments
Labels
area: Networking area: Wi-Fi Wi-Fi bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

@GAnthony
Collaborator

GAnthony commented Dec 5, 2018

To simplify the WiFi provisioning of the SimpleLink WiFi based device, the SimpleLink WiFi driver is configured to do an automatic connect ("FastConnect") to the previously connected Access Point (AP).

The WiFi driver is initialized in its iface_init call, simplelink_iface_init(), which the networking subsystem calls before the main networking application begins. However, the WiFi connection may complete some time later, after the main networking application has started, so networking apps may simply fail in their socket or getaddrinfo() calls because there is no WiFi connection yet.

The networking subsystem does provide a way to configure a synchronization point between networking events and the start of the main application: CONFIG_NET_CONFIG_AUTO_INIT. It causes init_net_app() to be called before the main application starts, and init_net_app() can wait, with a timeout, for configured networking events (NET_EVENT_IPV4_ADDR_ADD, for example).

However, there are issues with this approach:

  • The WiFi driver can post the event before init_net_app() registers a handler to listen for it, so the event is missed;
  • init_net_app() does not listen for WiFi connection events, which could be an alternative to waiting on the IPv4 event;
  • building init_net_app() requires setting CONFIG_NET_CONFIG_SETTINGS, which forces hardcoded IP address values (undesirable for an offload driver that already handles DHCP) and brings in other unnecessary functions.

To reproduce, build/run the samples/net/sockets/http_get example on the cc3220sf_launchxl.
This example often fails unless a breakpoint at main gives the WiFi connection some time to succeed.

Some options for a solution:

  1. Block in simplelink_iface_init() until the connection is made. This would be the simplest fix, not involving any changes to the networking config module.
  2. Update networking auto config to understand WiFi drivers which offload DHCP, and add an option to (first) check, then wait, for a connection event.
@GAnthony GAnthony added this to the v1.14.0 milestone Dec 5, 2018
@GAnthony GAnthony added the bug The issue is a bug, or the PR is fixing a bug label Dec 5, 2018
@jukkar
Member

jukkar commented Dec 11, 2018

Option 2 is probably the best one to implement. Our network config library is still quite simple and indeed needs some TLC.

If the user does not set hardcoded IP addresses in the prj.conf file, then the net config library cannot add any addresses itself. So there should be no issue with this in real life, except that by default all the samples set some IP address for the application. But I would argue that the samples typically need tweaking anyway and should not be used blindly. Suggestions on how to make this more user friendly are welcome, of course.

One thing we could do is to create subsys/net/lib/config/wifi_settings.c or similar that could make sure that the config init is done only after the wifi is ready to serve. So perhaps wifi driver etc could send an event when the network is ready and IP addresses etc can be set.

@GAnthony
Collaborator Author

Yes, option 2 would be a more generic approach.

Perhaps the first step would be to get net/lib/config/init.c to build if CONFIG_NET_CONFIG_AUTO_INIT=y even when CONFIG_NET_CONFIG_SETTINGS=n.

Regarding adding a new wifi_settings.c, it may not be necessary if the race in awaiting a NET_EVENT_IPV4_ADDR_ADD event were somehow fixed.
For example, if !defined(CONFIG_NET_CONFIG_MY_IPV4_ADDR), could we first check that an IPV4 address has not been set at the interface (via DHCP), and if not, then register the ipv4_addr_add_handler() and then wait? If it was already set, then we pass the init_net_app() stage.

This isn't strictly waiting for a "connection event", but for the SimpleLink case this would work because the DHCP is always enabled, and a net_if_ipv4_addr_add() always occurs soon after.
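
The check-first idea can be illustrated with a small, single-threaded model. This is only a sketch under stated assumptions: fake_iface, fake_iface_addr_add(), fake_iface_check_or_listen() and count_event() are hypothetical names invented for illustration, and they do not reflect the real net_if or net_mgmt interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of an interface with one event listener. */
typedef void (*addr_add_handler_t)(void *user_data);

struct fake_iface {
    bool               ipv4_addr_set; /* persistent state, survives missed events */
    addr_add_handler_t handler;       /* NULL until a listener registers */
    void              *user_data;
};

/* Driver side: record the address, then notify whoever listens right now. */
void fake_iface_addr_add(struct fake_iface *iface)
{
    assert(iface != NULL);
    iface->ipv4_addr_set = true;
    if (iface->handler != NULL) {
        iface->handler(iface->user_data);
    }
}

/*
 * App side: return true if the address is already set (nothing to wait for);
 * otherwise register the handler and return false so the caller can wait.
 */
bool fake_iface_check_or_listen(struct fake_iface *iface,
                                addr_add_handler_t handler, void *user_data)
{
    assert(iface != NULL);
    if (iface->ipv4_addr_set) {
        return true;
    }
    iface->handler = handler;
    iface->user_data = user_data;
    return false;
}

/* Example handler: counts how many events were delivered. */
void count_event(void *user_data)
{
    *(int *)user_data += 1;
}
```

If fake_iface_addr_add() runs before any listener exists, the notification is dropped, but the later check still sees ipv4_addr_set and skips the wait. In a multithreaded system the check and the registration would presumably have to happen under a lock (or the handler be registered before the check), so that no event can slip in between the two steps.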

@GAnthony
Collaborator Author

One thing we could do is to create subsys/net/lib/config/wifi_settings.c ...

An interesting idea, but thinking about this some more, it gets a bit complex:

  • Even if wifi_settings can block init_net_app() on a connection state/event, there may still be a race as the wifi driver sends the IPV4 event before init_net_app() can register its handler;
  • Net stack currently doesn't distinguish between DHCP and DHCP offload, so some extra conditions may be needed to qualify code guarded by CONFIG_NET_DHCPV4 in lib/config/init.c;
  • For the non-DHCP case, there is currently no WiFi API to set a static IP address for the WiFi offload driver; we may need to add some hooks there as well to be complete.

Given this, and with LTS coming soon (?), we're inclined to go with solution 1, so as not to inadvertently impact other users of init_net_app() while still fixing the cc3220sf net samples that currently exhibit the race.

@nashif nashif added the priority: medium Medium impact/importance bug label Jan 10, 2019
GAnthony pushed a commit to GAnthony/zephyr that referenced this issue Jan 18, 2019
The SimpleLink wifi driver enables the Fast Connect method of
WiFi provisioning, which allows the network coprocessor to
reconnect to a previously connected Access Point (AP) on
startup.

Previously, if Fast Connect failed to connect, any network
socket applications would inevitably fail, as there would have
been no wifi connection.

This patch adds a configurable timeout for the Fast Connect
feature, after which timeout, an error is logged informing
the user to manually reconnect to an AP.

Reconnection is typically accomplished by separately running the
wifi sample shell program.

Fixes: zephyrproject-rtos#11889

Signed-off-by: Gil Pitney <[email protected]>
jukkar pushed a commit that referenced this issue Jan 18, 2019