Memory leak and coredump caused by ngx.thread.spawn + ngx.location.capture + ngx.exit #2394

Open
yyqbuct opened this issue Feb 12, 2025 · 3 comments

yyqbuct commented Feb 12, 2025

Affected version

ngx_lua-0.10.13
The latest code probably has the same problem.

Reproduction case

local fetch = function(uri)
    return ngx.location.capture(uri)
end
local t1 = ngx.thread.spawn(fetch, "/f1") -- subrequest f1 (responds after 0.1s)
local t2 = ngx.thread.spawn(fetch, "/f2") -- subrequest f2 (responds after 0.2s)
ngx.thread.wait(t1, t2)
ngx.say("example2")
ngx.exit(200)

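Stack at the time of the fault (the runtime behavior is undefined: it may be a segmentation fault or an infinite loop)

lj_gc_step enters an infinite loop:

  • gc_onestep returns 0
  • lim -= 0 leaves lim greater than 0 forever
  • g->gc.state never equals GCSpause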
    Image
    Image
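
The three observations above can be replayed in a small standalone model. This is only a sketch, not LuaJIT source: `gc_step_model` and all of its parameters are hypothetical names, and the loop shape (exit when the budget is used up or when the collector reaches its pause state) is an assumption inferred from the bullets. A safety cap is added so that the model itself always terminates.

```c
#include <stdint.h>

/* Toy model of an incremental GC step loop (hypothetical, not LuaJIT source).
 * budget        : work budget for this step ("lim" in the report)
 * step_cost     : value returned by every single step ("gc_onestep" in the report)
 * steps_to_pause: number of single steps after which the collector would
 *                 reach its pause state; a negative value means it never does
 * max_iters     : safety cap so the model itself always terminates
 * Returns the number of iterations executed, or -1 if the cap was hit. */
long gc_step_model(int64_t budget, int64_t step_cost,
                   long steps_to_pause, long max_iters)
{
    long iters = 0;

    do {
        if (iters >= max_iters) {
            return -1;          /* the uncapped loop would keep spinning here */
        }
        iters++;
        budget -= step_cost;    /* a zero cost leaves the budget untouched */
        if (steps_to_pause >= 0 && iters >= steps_to_pause) {
            return iters;       /* GC cycle finished: the loop exits */
        }
    } while (budget > 0);

    return iters;               /* budget used up: the loop exits */
}
```

With a non-zero step cost or a reachable pause state the model returns after a bounded number of iterations. With a step cost of 0 and a pause state that is never reached, only the artificial cap stops it, which matches the spin described in this report.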

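Analysis

  • After light thread t1 returns, the entry thread calls ngx.exit(). ngx_http_lua_handle_exit then releases the request resources (because r->main->count is not 0, the request resources are in fact leaked) and drops the references to light thread t2 and to the entry thread, which makes them eligible for garbage collection.
  • After light thread t2 returns, the behavior is undefined if its lua_State has already been collected.

Coroutine scheduling diagram for the reproduction case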

Image

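How to fix

Approach: prevent the problem up front. If ngx.exit is called while a capture is still pending, raise an error immediately.
In fact, ngx.exit already contains a check for whether the request may be terminated: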

    if (ctx->no_abort
        && rc != NGX_ERROR
        && rc != NGX_HTTP_CLOSE
        && rc != NGX_HTTP_REQUEST_TIME_OUT
        && rc != NGX_HTTP_CLIENT_CLOSED_REQUEST)
    {
        return luaL_error(L, "attempt to abort with pending subrequests");
    }

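This check works when there is only one capture subrequest. With multiple captures, as in the reproduction case, it fails: no_abort is a single flag, so the first capture to finish resets it to 0 while the other capture is still pending.
The fix is to change no_abort from a boolean flag into a count of pending captures, incremented when a capture is created and decremented when it finishes. The code is as follows: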

diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_common.h b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_common.h
index 01ef2be..f8bfb2e 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_common.h
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_common.h
@@ -500,6 +500,9 @@ typedef struct ngx_http_lua_ctx_s {
 
     int                      uthreads; /* number of active user threads */
 
+    int                     no_aborts; /* prohibit "world abortion" via ngx.exit()
+                                          and etc */
+
     uint16_t                 context;   /* the current running directive context
                                            (or running phase) for the current
                                            Lua chunk */
@@ -538,9 +541,6 @@ typedef struct ngx_http_lua_ctx_s {
 
     unsigned         buffering:1; /* HTTP 1.0 response body buffering flag */
 
-    unsigned         no_abort:1; /* prohibit "world abortion" via ngx.exit()
-                                    and etc */
-
     unsigned         header_sent:1; /* r->header_sent is not sufficient for
                                      * this because special header filters
                                      * like ngx_image_filter may intercept
diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_control.c b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_control.c
index 6ac2cbf..03883e1 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_control.c
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_control.c
@@ -354,7 +354,7 @@ ngx_http_lua_ngx_exit(lua_State *L)
 #endif
     }
 
-    if (ctx->no_abort
+    if (ctx->no_aborts
         && rc != NGX_ERROR
         && rc != NGX_HTTP_CLOSE
         && rc != NGX_HTTP_REQUEST_TIME_OUT
@@ -508,7 +508,7 @@ ngx_http_lua_ffi_exit(ngx_http_request_t *r, int status, u_char *err,
 #endif
     }
 
-    if (ctx->no_abort
+    if (ctx->no_aborts
         && status != NGX_ERROR
         && status != NGX_HTTP_CLOSE
         && status != NGX_HTTP_REQUEST_TIME_OUT
diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_subrequest.c b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_subrequest.c
index 826a43c..f7d8de0 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_subrequest.c
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_subrequest.c
@@ -616,7 +616,7 @@ ngx_http_lua_ngx_location_capture_multi(lua_State *L)
         ngx_array_destroy(extra_vars);
     }
 
-    ctx->no_abort = 1;
+    ctx->no_aborts++;
 
     return lua_yield(L, 0);
 }
@@ -987,7 +987,7 @@ ngx_http_lua_post_subrequest(ngx_http_request_t *r, void *data, ngx_int_t rc)
     if (pr_coctx->pending_subreqs == 0) {
         dd("all subrequests are done");
 
-        pr_ctx->no_abort = 0;
+        pr_ctx->no_aborts--;
         pr_ctx->resume_handler = ngx_http_lua_subrequest_resume;
         pr_ctx->cur_co_ctx = pr_coctx;
     }
diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.c b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.c
index f7a537e..8d8d0f8 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.c
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.c
@@ -1411,8 +1411,8 @@ user_co_done:
 
                 dd("headers sent? %d", r->header_sent || ctx->header_sent);
 
-                if (ctx->no_abort) {
-                    ctx->no_abort = 0;
+                if (ctx->no_aborts) {
+                    ctx->no_aborts--;
                     return NGX_ERROR;
                 }
 
diff --git a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.h b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.h
index 7dcc6f7..c5f2d20 100644
--- a/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.h
+++ b/bundle/ngx_lua-0.10.13/src/ngx_http_lua_util.h
@@ -245,7 +245,7 @@ void ngx_http_lua_cleanup_free(ngx_http_request_t *r,

 
 #define ngx_http_lua_check_if_abortable(L, ctx)                              \
-    if ((ctx)->no_abort) {                                                   \
+    if ((ctx)->no_aborts) {                                                   \
         return luaL_error(L, "attempt to abort with pending subrequests");   \
     }
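
Why a one-bit flag is not enough with two concurrent captures can be seen in a small standalone model. The sketch below is not ngx_lua code: `toy_ctx`, `capture_start`, `capture_done`, `exit_allowed` and `exit_allowed_after_first_capture` are hypothetical names. They only mimic how the flag is set in ngx_http_lua_ngx_location_capture_multi and cleared in ngx_http_lua_post_subrequest, under the assumption that each light thread issues exactly one capture.

```c
#include <stdbool.h>

/* Toy stand-in for the relevant part of the request context (hypothetical). */
typedef struct {
    unsigned no_abort:1;  /* original semantics: single-bit flag */
    int      no_aborts;   /* patched semantics: number of pending captures */
} toy_ctx;

/* Mimics one light thread issuing ngx.location.capture. */
void capture_start(toy_ctx *ctx)
{
    ctx->no_abort = 1;
    ctx->no_aborts++;
}

/* Mimics the capture of one light thread completing. */
void capture_done(toy_ctx *ctx)
{
    ctx->no_abort = 0;    /* cleared even if another capture is still pending */
    ctx->no_aborts--;
}

/* Mimics the guard in ngx.exit(): true means the exit is let through. */
bool exit_allowed(const toy_ctx *ctx, bool use_counter)
{
    return use_counter ? ctx->no_aborts == 0 : ctx->no_abort == 0;
}

/* Replays the reproduction case: t1 and t2 each start a capture,
 * t1 finishes first, and then the entry thread calls ngx.exit(). */
bool exit_allowed_after_first_capture(bool use_counter)
{
    toy_ctx ctx = {0, 0};

    capture_start(&ctx);  /* t1 */
    capture_start(&ctx);  /* t2 */
    capture_done(&ctx);   /* t1 finishes, t2 is still pending */

    return exit_allowed(&ctx, use_counter);
}
```

In this model the flag variant lets the exit through while t2's capture is still pending, whereas the counter variant still sees one pending capture and rejects it.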

yyqbuct changed the title from "Memory leak and coreduamp caused by ngx.thread.spawn + ngx.location.capture + ngx.exit" to "Memory leak and coredump caused by ngx.thread.spawn + ngx.location.capture + ngx.exit" on Feb 12, 2025

zhuizhuhaomeng (Contributor) commented

Could you verify this on the latest ngx_lua version?

yyqbuct commented Feb 13, 2025

> Could you verify this on the latest ngx_lua version?

Sure, I will find time to try reproducing it on the latest version as well.

yyqbuct commented Feb 13, 2025

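> Could you verify this on the latest ngx_lua version?

@zhuizhuhaomeng please take a look when you have time, thanks.

Conclusion

The problem is reproducible on the latest release, OpenResty 1.27.1.1 (which bundles ngx_lua-0.10.27).

Build environment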

Linux 5.15.167.4-microsoft-standard-WSL2 #1 SMP Tue Nov 5 00:21:55 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Build command (nginx -V output)

nginx version: openresty/1.27.1.1
built by gcc 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
built with OpenSSL 3.4.1 11 Feb 2025
TLS SNI support enabled
configure arguments: --prefix=/home/yyq/workspace/nginx/openresty-1.27.1.1-bin/nginx --with-debug --with-cc-opt='-DNGX_LUA_USE_ASSERT -DNGX_LUA_ABORT_AT_PANIC -O2' --add-module=../ngx_devel_kit-0.3.3 --add-module=../echo-nginx-module-0.63 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.33 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.09 --add-module=../srcache-nginx-module-0.33 --add-module=../ngx_lua-0.10.27 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.37 --add-module=../array-var-nginx-module-0.06 --add-module=../memc-nginx-module-0.20 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.9 --add-module=../rds-json-nginx-module-0.17 --add-module=../rds-csv-nginx-module-0.09 --add-module=../ngx_stream_lua-0.0.15 --with-ld-opt=-Wl,-rpath,/home/yyq/workspace/nginx/openresty-1.27.1.1-bin/luajit/lib --with-zlib=/home/yyq/workspace/nginx/zlib-1.2.13 --with-pcre=/home/yyq/workspace/nginx/pcre-8.45 --with-openssl=/home/yyq/workspace/nginx/openssl-3.4.1 --with-openssl-opt=-g --with-pcre-opt=-g --with-zlib-opt=-g --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_ssl_module

Reproduction steps

Use the fortio tool to send requests at 100 QPS to http://localhost:8090/example0 or http://localhost:8090/example2.

Call stack at the time of the fault

Image

nginx.conf used for the reproduction


#user  nobody;
worker_processes  1;
daemon off;

#error_log  logs/error.log;
#error_log  logs/error.log  notice;
error_log  logs/error.log  info;

#pid        logs/nginx.pid;


events {
    worker_connections  10240;
}


http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  logs/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  65;

    #gzip  on;

    server {
        listen       8090;

        location /internal/ {
            internal;
            rewrite ^/internal/(.*)$ /$1 break;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_pass http://127.0.0.1:9090;
        }

        # good
        location /example {
            content_by_lua_block {
                local fetch = function(uri)
                    return ngx.location.capture(uri)
                end
                
                local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1") -- responds after 0.01s
                local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2") -- responds after 0.2s
                ngx.say("example")
            }
        }

        location /stopgc {
            content_by_lua_block {
                collectgarbage("stop")
                ngx.say("stopgc")
            }
        }

        # bad
        location /example0 {
            content_by_lua_block {
                local fetch = function(uri)
                    return ngx.location.capture(uri)
                end

                local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1") -- responds after 0.01s
                local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2") -- responds after 0.2s
                ngx.thread.wait(t1)
                ngx.say("example0")
                ngx.exit(200)
            }
        }

        # good
        location /example1 {
            content_by_lua_block {
                local fetch = function(uri)
                    return ngx.location.capture(uri)
                end
                local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1") -- responds after 0.01s
                local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2") -- responds after 0.2s
                ngx.thread.wait(t1)
                ngx.thread.wait(t2)
                ngx.say("example1")
                ngx.exit(200)
            }
        }

        # bad
        location /example2 {
            content_by_lua_block {
                local fetch = function(uri)
                    return ngx.location.capture(uri)
                end
                local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1") -- responds after 0.01s
                local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2") -- responds after 0.2s
                ngx.thread.wait(t1, t2)
                ngx.say("example2")
                ngx.exit(200)
            }
        }

        # good
        location /example3 {
            content_by_lua_block {
                local fetch = function(uri)
                    return ngx.location.capture(uri)
                end
                local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1") -- responds after 0.01s
                local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2") -- responds after 0.2s
                ngx.thread.wait(t1, t2)
                ngx.say("example3")
            }
        }

        # good
        location /example4 {
            content_by_lua_block {
                local fetch = function(uri)
                    return ngx.location.capture(uri)
                end
                local t1 = ngx.thread.spawn(fetch, "/internal/uthread_f1") -- responds after 0.01s
                local t2 = ngx.thread.spawn(fetch, "/internal/uthread_f2") -- responds after 0.2s
                ngx.thread.wait(t1)
                ngx.thread.wait(t2)
                ngx.say("example4")
                ngx.exit(200)
            }
        }

        # good
        location /example5 {
            content_by_lua_block {
                local func1 = function()
                    ngx.sleep(0.01)
                    ngx.say("t1: hello")
                    return "t1 done"
                end
                local func2 = function()
                    ngx.sleep(0.2)
                    ngx.say("t2: hello")
                    return "t2 done"
                end
                local t1 = ngx.thread.spawn(func1)
                local t2 = ngx.thread.spawn(func2)
                local ok, res = ngx.thread.wait(t1, t2)
                if ok then
                    ngx.say("status: ", res.status, ", body: ", res.body)
                else
                    ngx.say("not ok")
                end
                ngx.exit(ngx.OK)
            }
        }
    }
}
