ngx.location.capture used together with redis.set_keepalive causes the worker process to exit #108
To add: if r1.lua and r2.lua connect to different Redis instances, the problem does not occur either.
Build options:
Operating system: SUSE Linux Enterprise Server 10.3 (x86_64) / Ubuntu 11.10
In r2.lua, red:connect picks up a keepalive peer but no command is ever issued on it, so the `write_event_handler` that `ngx_http_lua_socket_tcp_handler` dispatches to is left uninitialized; after ngx.location.capture returns, an epoll write event comes in, `u->write_event_handler` is invoked, and the worker core dumps. Adding a red:ping() right after red:connect is a simple workaround.
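A minimal sketch of that workaround, assuming r2.lua uses lua-resty-redis in the usual way (the Redis address and keepalive settings here are assumptions, not values from the original report):

```lua
-- r2.lua (sketch): touch the reused keepalive connection right after
-- connect so it is actually used before set_keepalive is called again.
local redis = require "resty.redis"

local red = redis:new()
local ok, err = red:connect("127.0.0.1", 6379)  -- may pick up an idle pooled connection
if not ok then
    ngx.log(ngx.ERR, "failed to connect to redis: ", err)
    return ngx.exit(500)
end

red:ping()                    -- workaround: issue a command immediately after connect
red:set_keepalive(10000, 100) -- 10s max idle time, pool of up to 100 connections
```

With the fix later pushed to git master HEAD (see below), this workaround is no longer needed.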
…egfaults if it is not used immediately. thanks xukaifu for reporting this as github issue #108.
I have committed a patch to git master HEAD. Could you give it a try? Thanks!
Consider it resolved.
I just tested it and it works now. Thank you very much.
This issue appeared in our EC2 test cluster, which compiles Nginx with `-DNGX_LUA_USE_ASSERT` and `-DNGX_LUA_ABORT_AT_PANIC`.

The lua-resty-redis test:

    === TEST 1: github issue openresty#108: ngx.locaiton.capture + redis.set_keepalive

in t/bugs.t [1] would produce core dumps in the check leak testing mode.

The backtrace for these core dumps was:

    #0 0x00007fd417bee277 in raise () from /lib64/libc.so.6
    #1 0x00007fd417bef968 in abort () from /lib64/libc.so.6
    #2 0x00007fd417be7096 in __assert_fail_base () from /lib64/libc.so.6
    #3 0x00007fd417be7142 in __assert_fail () from /lib64/libc.so.6
    #4 0x000000000050d227 in ngx_http_lua_socket_tcp_resume_conn_op (spool=c/ngx_http_lua_socket_tcp.c:3963
    #5 0x000000000050e51a in ngx_http_lua_socket_tcp_finalize (r=r@entry=0x5628) at ../../src/ngx_http_lua_socket_tcp.c:4195
    #6 0x000000000050e570 in ngx_http_lua_socket_tcp_cleanup (data=0x7fd419p_lua_socket_tcp.c:3755
    #7 0x0000000000463aa5 in ngx_http_free_request (r=r@entry=0xbfaec0, rc=http_request.c:3508
    ...

Which was caused by the following assertion in ngx_http_lua_socket_tcp.c with `NGX_DEBUG`:

    #if (NGX_DEBUG)
        ngx_http_lua_assert(spool->connections >= 0);
    #else

Thanks to Mozilla's rr, a recorded session showed that `spool->connections` was `-1`.

Unfortunately, reproducing this case does not seem possible, since the failure is due to the request cleanup (`ngx_http_free_request`). Here is an explanation:

    -- thread 1
    local sock = ngx.socket.tcp()
    sock:connect()
    sock:setkeepalive()
    -- pool created, connections: 1

    -- thread 2
    local sock = ngx.socket.tcp()
    sock:connect()
    -- from pool, connections: 1

    -- thread 1
    -- sock from thread 1 idle timeout, closes, and calls
    -- ngx_http_lua_socket_tcp_finalize, connections: 0

    -- thread 2
    sock:setkeepalive()
    -- connections: -1
    -- ngx_http_lua_socket_tcp_resume_conn_op gets called, assertion fails

In order to avoid this race condition, we must determine whether the socket pool exists or not, not from the `ngx_http_lua_socket_tcp_upstream` struct, but from the Lua Registry. This way, when thread 2's socket enters the keepalive state, it will respect the previous call to `ngx_http_lua_socket_free_pool` (which unset the pool from the registry).

[1]: https://github.com/openresty/lua-resty-redis/blob/master/t/bugs.t
This issue appeared in our EC2 test cluster, which compiles Nginx with `-DNGX_LUA_USE_ASSERT` and `-DNGX_LUA_ABORT_AT_PANIC`.

The lua-resty-redis test:

    === TEST 1: github issue openresty#108: ngx.locaiton.capture + redis.set_keepalive

in t/bugs.t [1] would produce core dumps in the check leak testing mode.

The backtrace for these core dumps was:

    #0 0x00007fd417bee277 in raise () from /lib64/libc.so.6
    #1 0x00007fd417bef968 in abort () from /lib64/libc.so.6
    #2 0x00007fd417be7096 in __assert_fail_base () from /lib64/libc.so.6
    #3 0x00007fd417be7142 in __assert_fail () from /lib64/libc.so.6
    #4 0x000000000050d227 in ngx_http_lua_socket_tcp_resume_conn_op (spool=c/ngx_http_lua_socket_tcp.c:3963
    #5 0x000000000050e51a in ngx_http_lua_socket_tcp_finalize (r=r@entry=0x5628) at ../../src/ngx_http_lua_socket_tcp.c:4195
    #6 0x000000000050e570 in ngx_http_lua_socket_tcp_cleanup (data=0x7fd419p_lua_socket_tcp.c:3755
    #7 0x0000000000463aa5 in ngx_http_free_request (r=r@entry=0xbfaec0, rc=http_request.c:3508
    ...

Which was caused by the following assertion in ngx_http_lua_socket_tcp.c with `NGX_DEBUG`:

    #if (NGX_DEBUG)
        ngx_http_lua_assert(spool->connections >= 0);
    #else

Thanks to Mozilla's rr, a recorded session showed that `spool->connections` was `-1`.

Here is a reproducible test case:

    local sock1 = ngx.socket.tcp()
    local sock2 = ngx.socket.tcp()

    sock1:connect()
    sock2:connect()

    sock1:setkeepalive() -- pool created, connections: 1
    sock2:setkeepalive() -- connections: 1

    sock1:connect() -- connections: 1
    sock2:connect() -- connections: 1

    sock1:close() -- connections: 0
    sock2:close() -- connections: -1
    -- ngx_http_lua_socket_tcp_resume_conn_op gets called, assertion fails

In order to avoid this race condition, we must determine whether the socket pool exists or not, not from the `ngx_http_lua_socket_tcp_upstream` struct, but from the Lua Registry. This way, when thread 2's socket enters the keepalive state, it will respect the previous call to `ngx_http_lua_socket_free_pool` (which unset the pool from the registry).

[1]: https://github.com/openresty/lua-resty-redis/blob/master/t/bugs.t
r1.lua is as follows:
r2.lua is configured as follows:
nginx.conf is configured as follows:
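The original r1.lua, r2.lua, and nginx.conf snippets were not captured in this copy of the issue. Purely as a hypothetical reconstruction of the setup described in this thread (file paths, location names, the Redis address, and the key are all assumptions), r1.lua would talk to Redis, put the connection into the keepalive pool, and then fire a subrequest into the location that runs r2.lua:

```lua
-- r1.lua (hypothetical reconstruction, not the reporter's original code)
local redis = require "resty.redis"

local red = redis:new()
red:connect("127.0.0.1", 6379)        -- assumed address
red:get("foo")                        -- assumed key; some command is issued here
red:set_keepalive(10000, 100)         -- connection goes into the keepalive pool

local res = ngx.location.capture("/r2")  -- subrequest into the location running r2.lua
ngx.say(res.body)
```

r2.lua would then reuse the pooled connection without issuing any command on it, which is exactly the pattern analyzed above:

```lua
-- r2.lua (hypothetical reconstruction)
local redis = require "resty.redis"

local red = redis:new()
red:connect("127.0.0.1", 6379)  -- reuses the idle keepalive connection from the pool
-- no command is issued on the connection here
red:set_keepalive(10000, 100)
ngx.say("ok")
```

The nginx.conf would then only need two locations wired to these files, for example `location /r1 { content_by_lua_file conf/r1.lua; }` and `location /r2 { content_by_lua_file conf/r2.lua; }`, and the reproduction steps below would amount to loading a key into Redis (e.g. `redis-cli set foo bar`) and requesting /r1.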
First, load some data into Redis:
Then access r1:
At this point the worker process exits, and the debug log shows the following:
2012/04/27 18:25:24 [notice] 26423#0: signal 17 (SIGCHLD) received
2012/04/27 18:25:24 [alert] 26423#0: worker process 12262 exited on signal 11
This reproduces 100% of the time with both Redis 2.2.11 and 2.4.1.
As long as any one of the lines marked code 1, code 2, or code 3 is commented out, it no longer reproduces.