Closed
Description
(go1.2rc4) We're working on a reverse HTTP proxy that does some "stuff" (not relevant here) and proxies towards one or more backend HTTP servers. During testing I ran into connection failures, apparently due to running out of file descriptors. In narrowing this down I reduced the problem to the following.

The "null" reverse proxy (doing nothing but proxying, using httputil.ReverseProxy): http://play.golang.org/p/p1g4bpTZ_g

A trivial HTTP server acting as the backend behind the above proxy; it simply counts the number of connections and responds successfully: http://play.golang.org/p/F7W-vbXEUt

Running both of these, I have the proxy on :8080 forwarding to the server at :8090. To test it, I use wrk (https://github.com/wg/wrk):

jb@jborg-mbp:~ $ wrk -c 1 -t 1 http://localhost:8080/
...

Granted, this isn't a super realistic reproduction of the real world, since the latency on localhost is minimal. I can't prove the same thing _can't_ happen in production, though.

Using one connection (-c 1) and up to about three, this works perfectly. The server side sees a bunch of requests over one to three connections, i.e. the number of backend connections from the proxy matches the number of incoming connections.

At around -c 4 and upwards, it blows up. The proxy doesn't manage to recycle connections quickly enough and starts doing regular Dials at a rate of thousands per second, resulting in

...
2013/11/18 20:18:21 http: proxy error: dial tcp 127.0.0.1:8090: can't assign requested address
2013/11/18 20:18:21 http: proxy error: dial tcp 127.0.0.1:8090: can't assign requested address
2013/11/18 20:18:21 http: proxy error: dial tcp 127.0.0.1:8090: can't assign requested address
2013/11/18 20:18:21 http: proxy error: dial tcp 127.0.0.1:8090: can't assign requested address
...

from the proxy code, and of course HTTP errors as seen by wrk.

My theory, after going through http.Transport, is that when the number of requests/s to the proxy goes up, the small amount of bookkeeping required to recycle a connection (putIdleConn etc.) starts taking just long enough that the next request in the pipeline gets in before any connection is idle. The Transport starts Dialing, adding more connections to take care of, and it explodes chaotically. I would prefer it to block and wait for an idle connection instead of Dialing beyond some limit.

Line 92, "// TODO: tunable on global max cached connections", seems relevant, although in my case the tunable should probably be max connections per host rather than a global maximum. I'll probably take a stab at implementing something like that to fix this (i.e. rewriting Transport to limit connections), since I couldn't figure out a way around it using just the available APIs. Unless I've misunderstood something obvious... (Rough sketches of the setup and of the closest existing Transport knob follow below.)

//jb
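For reference, since the playground links may rot: a rough sketch of the two programs described above. These are approximations of the linked code, not the exact reproductions; only the ports and the described behaviour are taken from the report.

// proxy.go: the "null" reverse proxy on :8080, forwarding to :8090.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	backend, err := url.Parse("http://localhost:8090")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(backend)
	log.Fatal(http.ListenAndServe(":8080", proxy))
}

// backend.go: trivial server on :8090 that counts accepted connections
// and responds successfully to every request.
package main

import (
	"log"
	"net"
	"net/http"
	"sync/atomic"
)

var conns int64

type countingListener struct {
	net.Listener
}

func (l countingListener) Accept() (net.Conn, error) {
	c, err := l.Listener.Accept()
	if err == nil {
		log.Printf("connections so far: %d", atomic.AddInt64(&conns, 1))
	}
	return c, err
}

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	ln, err := net.Listen("tcp", ":8090")
	if err != nil {
		log.Fatal(err)
	}
	log.Fatal(http.Serve(countingListener{Listener: ln}, nil))
}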
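To be explicit about the "available APIs" remark: the closest existing knob I'm aware of is Transport.MaxIdleConnsPerHost (the default, http.DefaultMaxIdleConnsPerHost, is 2). It only bounds how many idle connections are kept around for reuse; it does not cap concurrent Dials, so it can soften the blow-up but not prevent it. A sketch of wiring it into the proxy (the value 100 is arbitrary):

// Sketch only: a ReverseProxy with a custom Transport that keeps a larger
// per-host idle-connection pool. This does not limit the total number of
// connections the Transport will open.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	backend, err := url.Parse("http://localhost:8090")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(backend)
	proxy.Transport = &http.Transport{
		MaxIdleConnsPerHost: 100, // default is 2; arbitrary larger value
	}
	log.Fatal(http.ListenAndServe(":8080", proxy))
}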