Skip to content

bugfix: coroutine.wrap: propagate errors to the parent coroutine. #1239

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

thibaultcha
Copy link
Member

@thibaultcha thibaultcha commented Jan 23, 2018

I hereby granted the copyright of the changes in this pull request
to the authors of this lua-nginx-module project.

The manual states that coroutine.wrap is "Similar to the standard Lua API". However, the Lua API's coroutine.wrap has some nuances compared to the one provided by ngx_lua:

Returns the same values returned by resume, except the first boolean. In case of error, propagates the error.

Source (Lua 5.1 manual).

This caused me some issues in the daily job's latest release, where we expected errors encountered in coroutine.wrap to be propagated to the Lua main thread, and cause 500 errors. But those errors were swallowed by the thread "scheduler".

Example:

-- test.lua
local co = coroutine.wrap(function() return nnil.hello end)
print(co(1))
print("still running")

With LuaJIT 2.1:

$ luajit ./test.lua
luajit: /<path>/test.lua:2: /<path>/test.lua:1: attempt to index global 'nnil' (a nil value)
stack traceback:
	[C]: in function 'co'
	/<path>/test.lua:2: in main chunk
	[C]: at 0x0100000ef0

With our resty-cli utility:

$ resty ./test.lua
2018/01/23 14:50:25 [error] 20269#0: *2 lua coroutine: runtime error: /<path>/test.lua:1: attempt to index global 'nnil' (a nil value)
stack traceback:
coroutine 0:
	/<path>/test.lua: in function </<path>/test.lua:1>
coroutine 1:
	[C]: in function 'resume'
	coroutine.wrap:21: in function 'co'
	/<path>/test.lua:2: in function 'file_gen'
	init_worker_by_lua:43: in function <init_worker_by_lua:41>
	[C]: in function 'xpcall'
	init_worker_by_lua:50: in function <init_worker_by_lua:48>, context: ngx.timer
/<path>/test.lua:1: attempt to index global 'nnil' (a nil value)
still running

Notice the still running line at the bottom.

My initial fix for this was to edit the inline Lua code that creates coroutine.wrap so that it would propagate errors with error(..., 2), but it is a less than ideal fix, since the error becomes logged twice (once in the coroutine context, and once in the parent context), like so:

2018/01/23 15:01:35 [error] 49378#0: *2 lua coroutine: runtime error: content_by_lua(nginx.conf:49):4: something went wrong
stack traceback:
coroutine 0:
	[C]: in function 'error'
	content_by_lua(nginx.conf:49):4: in function <content_by_lua(nginx.conf:49):2>
coroutine 1:
	[C]: in function 'resume'
	coroutine.wrap:21: in function 'co'
	content_by_lua(nginx.conf:49):7: in function <content_by_lua(nginx.conf:49):1>, client: 127.0.0.1, server: localhost, request: "GET /t HTTP/1.1", host: "localhost"
2018/01/23 15:01:35 [error] 49378#0: *2 lua entry thread aborted: runtime error: content_by_lua(nginx.conf:49):7: content_by_lua(nginx.conf:49):4: something went wrong
stack traceback:
coroutine 0:
	[C]: in function 'error'
	coroutine.wrap:21: in function 'co'
	content_by_lua(nginx.conf:49):7: in function <content_by_lua(nginx.conf:49):1>, client: 127.0.0.1, server: localhost, request: "GET /t HTTP/1.1", host: "localhost"

Additionally, dealing with it in the Lua land would have meant a lot of expensive operations with the variadic arguments/return values aspects of coroutine.resume (we would have had to deal with pack, unpack, et al.).

Instead, I propose to get rid of the "hack" that is used to implement coroutine.wrap, and implement a version of it that embraces the ngx_lua threading model.

@thibaultcha thibaultcha force-pushed the fix/coroutine-wrap-propagate-error branch from 1a71966 to d7867b6 Compare January 23, 2018 23:42
@thibaultcha thibaultcha force-pushed the fix/coroutine-wrap-propagate-error branch from d7867b6 to f62ae61 Compare February 3, 2018 02:17
@thibaultcha
Copy link
Member Author

thibaultcha commented Feb 8, 2018

@agentzh Do you have any feedback on this? Considering it is a behavior mistmatch according to both the Lua and ngx_lua manuals, I was under the impression that it would be of a somewhat middle to high priority?

@thibaultcha thibaultcha force-pushed the fix/coroutine-wrap-propagate-error branch from f62ae61 to 7a69258 Compare February 10, 2018 08:25
@agentzh
Copy link
Member

agentzh commented Feb 26, 2018

@thibaultcha This is a deep change. I'm not sure about it without thinking much longer about it.

@agentzh
Copy link
Member

agentzh commented Feb 26, 2018

@thibaultcha Thank you for reporting this issue and contributing this patch!

@thibaultcha
Copy link
Member Author

@agentzh Have you given more thought to this? Compatibility with Lua(JIT) is one of the goal we try to achieve with ngx_lua (as per previous incompatibilities we fixed) and that is why I believe this fix to be important. Thanks!

@agentzh
Copy link
Member

agentzh commented Mar 28, 2018

@thibaultcha I understand but I have other higher priority things at the moment. Sorry. Will get back to this as soon as I can manage.

@thibaultcha thibaultcha force-pushed the fix/coroutine-wrap-propagate-error branch 5 times, most recently from 75c0b3d to ac39640 Compare August 9, 2019 21:24
@thibaultcha thibaultcha force-pushed the fix/coroutine-wrap-propagate-error branch from ac39640 to 979f864 Compare August 9, 2019 21:35
@thibaultcha thibaultcha merged commit 2a19073 into openresty:master Aug 13, 2019
@thibaultcha thibaultcha deleted the fix/coroutine-wrap-propagate-error branch August 13, 2019 01:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants