This repository was archived by the owner on Feb 12, 2024. It is now read-only.

Memory leak (dht?) #3469

Closed
v-stickykeys opened this issue Jan 4, 2021 · 7 comments
Labels
kind/bug (A bug in existing code, including security flaws) · need/analysis (Needs further analysis before proceeding) · P1 (High: Likely tackled by core team if no one steps up) · status/blocked (Unable to be worked further until needs are met) · status/ready (Ready to be worked)

Comments


v-stickykeys commented Jan 4, 2021

  • Version:
    "ipfs": "0.52.2",
    "ipfs-http-gateway": "0.1.3",
    "ipfs-http-server": "0.1.3",
  • Platform: AWS ECS Fargate Docker container with base image node:14.10.1
  • Subsystem: ipfs-http-server

Severity: High

Description:

We are running a dockerized ipfs API server in Node.js 14.10 on AWS ECS. We are seeing memory increase linearly until it hits our capacity limit of 75% of 8192 MiB total memory, at which point the instance crashes (CPU usage stays at around 30% of 4096 units). Overall usage of the node is minimal; we are probably making fewer than 100 requests daily in total.

So far this has been observed to occur after about 5 days; the drop in the graph is where the instance restarts.

[Graph: ECS memory utilization climbing linearly over several days, then dropping sharply at the restart]

(Note the dates... will post a more recent image if we see this happen again.)
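
To help narrow down whether the growth is in the V8 heap or in native/external memory, a minimal logging sketch like the following could run alongside the server (illustrative only, not part of our deployment):

// Illustrative sketch (not part of our deployment): log Node.js memory stats
// periodically so V8 heap growth can be separated from native/external growth
// and compared against the ECS container metrics above.
const MEM_LOG_INTERVAL_MS = 5 * 60 * 1000

setInterval(() => {
    const { rss, heapTotal, heapUsed, external } = process.memoryUsage()
    const toMiB = (bytes: number): string => (bytes / 1024 / 1024).toFixed(1)
    console.log(
        `rss=${toMiB(rss)}MiB heapTotal=${toMiB(heapTotal)}MiB ` +
        `heapUsed=${toMiB(heapUsed)}MiB external=${toMiB(external)}MiB`
    )
}, MEM_LOG_INTERVAL_MS)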

Steps to reproduce the error:

We are running both a server and a gateway like so:

import IPFS from 'ipfs'
// eslint-disable-next-line @typescript-eslint/ban-ts-comment
// @ts-ignore
import type { IPFSAPI as IpfsApi } from 'ipfs-core/dist/src/components'

import HttpApi from 'ipfs-http-server'
import HttpGateway from 'ipfs-http-gateway'

import dagJose from 'dag-jose'
// @ts-ignore
import multiformats from 'multiformats/basics'
// @ts-ignore
import legacy from 'multiformats/legacy'

import { createRepo } from 'datastore-s3'

const TCP_HOST = process.env.TCP_HOST || '0.0.0.0'

const IPFS_PATH = 'ipfs'
const IPFS_S3_REPO_ENABLED = true

const { AWS_BUCKET_NAME } = process.env
const { AWS_ACCESS_KEY_ID } = process.env
const { AWS_SECRET_ACCESS_KEY } = process.env
const { ANNOUNCE_ADDRESS_LIST } = process.env

const IPFS_SWARM_TCP_PORT = 4011
const IPFS_SWARM_WS_PORT = 4012

const IPFS_API_PORT = 5011
const IPFS_ENABLE_API = true

const IPFS_GATEWAY_PORT = 9011
const IPFS_ENABLE_GATEWAY = true

const IPFS_DHT_SERVER_MODE = true

const IPFS_ENABLE_PUBSUB = true
const IPFS_PUBSUB_TOPICS: string[] = []

export default class IPFSServer {

    /**
     * Start js-ipfs instance with dag-jose enabled
     */
    static async start(): Promise<void> {
        const repo = IPFS_S3_REPO_ENABLED ? createRepo({
            path: IPFS_PATH,
        }, {
            bucket: AWS_BUCKET_NAME,
            accessKeyId: AWS_ACCESS_KEY_ID,
            secretAccessKey: AWS_SECRET_ACCESS_KEY,
        }) : null

        // setup dag-jose codec
        multiformats.multicodec.add(dagJose)
        const format = legacy(multiformats, dagJose.name)

        const announceAddresses = ANNOUNCE_ADDRESS_LIST != null ? ANNOUNCE_ADDRESS_LIST.split(',') : []
        const ipfs: IpfsApi = await IPFS.create({
            repo,
            ipld: { formats: [format] },
            libp2p: {
                config: {
                    dht: {
                        enabled: true,
                        clientMode: !IPFS_DHT_SERVER_MODE,
                        randomWalk: false,
                    },
                    pubsub: {
                        enabled: IPFS_ENABLE_PUBSUB
                    },
                },
                addresses: {
                    announce: announceAddresses,
                }
            },
            config: {
                Addresses: {
                    Swarm: [
                        `/ip4/${TCP_HOST}/tcp/${IPFS_SWARM_TCP_PORT}`,
                        `/ip4/${TCP_HOST}/tcp/${IPFS_SWARM_WS_PORT}/ws`,
                    ],
                    ...IPFS_ENABLE_API && { API: `/ip4/${TCP_HOST}/tcp/${IPFS_API_PORT}` },
                    ...IPFS_ENABLE_GATEWAY && { Gateway: `/ip4/${TCP_HOST}/tcp/${IPFS_GATEWAY_PORT}` },
                },
                API: {
                    HTTPHeaders: {
                        "Access-Control-Allow-Origin": [
                            "*"
                        ],
                        "Access-Control-Allow-Methods": [
                            "GET",
                            "POST"
                        ],
                        "Access-Control-Allow-Headers": [
                            "Authorization"
                        ],
                        "Access-Control-Expose-Headers": [
                            "Location"
                        ],
                        "Access-Control-Allow-Credentials": [
                            "true"
                        ]
                    }
                },
                Routing: {
                    Type: IPFS_DHT_SERVER_MODE ? 'dhtserver' : 'dhtclient',
                },
            },
        })

        if (IPFS_ENABLE_API) {
            await new HttpApi(ipfs).start()
            console.log('IPFS API server listening on ' + IPFS_API_PORT)
        }
        if (IPFS_ENABLE_GATEWAY) {
            await new HttpGateway(ipfs).start()
            console.log('IPFS Gateway server listening on ' + IPFS_GATEWAY_PORT)
        }

        IPFS_PUBSUB_TOPICS.forEach((topic: string) => {
            ipfs.pubsub.subscribe(topic)
        })
    }
}
v-stickykeys added the need/triage (Needs initial labeling and prioritization) label on Jan 4, 2021
@v-stickykeys (Author)

Again, the big dropoffs are where we restart the node.

[Screenshot from Jan 13, 2021: the same linear memory growth with drops at restarts]

hugomrdias self-assigned this on Jan 14, 2021
hugomrdias (Member) commented Jan 14, 2021

Hello @valmack, can you tell me a little bit more about those ~100 connections you are making?

Also, if you don't need the preload feature, can you turn it off and report back whether memory still grows?

this.ipfs = await IPFS.create({
    repo,
    preload: {
        enabled: false
    }
})

hugomrdias added the need/author-input (Needs input from the original author) label and removed the need/triage label on Jan 21, 2021
oed (Contributor) commented Feb 9, 2021

Just an FYI here: we stopped seeing this problem once we disabled the DHT.
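
For reference, a minimal sketch of that workaround, following the libp2p config shape from the original report (option names may differ across js-ipfs versions, so verify against yours):

import IPFS from 'ipfs'

// Sketch of the workaround above: create the node with the DHT disabled.
// The config shape follows the original report; verify the option names
// against the js-ipfs version you are running.
async function startWithoutDht() {
    return IPFS.create({
        libp2p: {
            config: {
                dht: {
                    enabled: false,
                },
            },
        },
    })
}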

aschmahmann added the kind/bug and status/ready labels on Mar 8, 2021
lidel added the need/analysis and P1 labels and removed the need/author-input label on May 24, 2021
lidel (Member) commented May 24, 2021

This is ready to be worked on; it's a matter of prioritization and resourcing.
Related project proposal: protocol/web3-dev-team#30

@vasco-santos (Member)

We need to put in place a simulation to gather more information on what is leaking and to make it easily reproducible. This is the type of simulation where https://github.com/testground/sdk-js would be extremely helpful.

Without a full analysis, I would say this is likely related to DHT queries that were leaked because they were never aborted/stopped, together with logic bugs in the DHT query logic. I have already seen both problems in the wild: we lack abort support, and sometimes a DHT query will not go straight for the closest peers.

The solution, as part of protocol/web3-dev-team#30, is probably to rewrite all the query logic from scratch.
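
To illustrate the abort gap: js-ipfs APIs accept AbortOptions ({ signal, timeout }), so callers can already try to bound a query as sketched below, but the leak described here means the underlying query workers did not always stop when the caller aborted. The helper name is hypothetical, and Node 14 needs the abort-controller polyfill:

import AbortController from 'abort-controller' // polyfill: Node 14 has no global AbortController

// Hypothetical helper: bound a DHT provider lookup to `ms` milliseconds.
async function findProvsWithTimeout(ipfs: any, cid: any, ms: number): Promise<void> {
    const controller = new AbortController()
    const timer = setTimeout(() => controller.abort(), ms)
    try {
        for await (const provider of ipfs.dht.findProvs(cid, { signal: controller.signal })) {
            console.log('found provider:', provider.id)
        }
    } catch (err: any) {
        // an abort error is expected here when the timer fires first
        console.log('query ended early:', err.message)
    } finally {
        clearTimeout(timer)
    }
}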

lidel changed the title from "Memory leak in http server" to "Memory leak (dht?)" on Jun 7, 2021
lidel added the status/blocked label on Jun 7, 2021
lidel (Member) commented Jun 7, 2021

Sounds like protocol/web3-dev-team#30 / libp2p/js-libp2p-kad-dht#183 needs to happen first (updating the spec + overhauling the codebase).

TinyTb commented Nov 22, 2022

2022-11-22: we think this is fixed, but feel free to let us know if not.

TinyTb closed this as completed on Nov 22, 2022
Repository owner moved this from Backlog to Done in IP JS (PL EngRes) v2 Nov 22, 2022