Description
Hello everyone!
I'm currently enhancing the GGML implementation of an LSTM network.
My main focus is avoiding scalability issues with the computational graph.
Currently I'm setting GGML_MAX_NODES to a very high value (100k): https://github.com/PABannier/bark.cpp/blob/main/ggml.h#L206C32-L206C38
This is because the LSTM is unrolled over time, so the number of nodes in the computational graph grows with the sequence length: https://github.com/PABannier/bark.cpp/blob/main/encodec.cpp#L81C25-L81C36 .
I wanted a quick fix in order to get a first POC of bark.cpp working. Now that we want to clean things up, I'm wondering what the best solution is.
I was wondering whether we should create one ggml context and computational graph per time step to avoid these scalability issues. Each subgraph would run the forward pass for a single time step and produce the output of one cell (see the sketch below).
This feels hacky and quite costly, though, given the overhead of rebuilding the graph at every step, copying tensors from one context to another, etc.
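To make the idea concrete, here is a rough sketch of the per-time-step approach. It assumes a hypothetical `lstm_cell_forward()` helper that builds the ops for a single LSTM cell (not something that exists in the repo today), and the illustrative context size is arbitrary; the exact graph-building/compute calls (`ggml_build_forward`, `ggml_graph_compute_with_ctx`) also depend on the ggml version we pin:

```cpp
#include <cstring>
#include <vector>

#include "ggml.h"

// Hypothetical helper (for illustration only): builds the ops for one LSTM
// cell inside ctx, returns the new hidden state and writes the new cell
// state to *c_out.
struct ggml_tensor * lstm_cell_forward(
        struct ggml_context * ctx,
        struct ggml_tensor  * x_t,    // input at time t
        struct ggml_tensor  * h_in,   // previous hidden state
        struct ggml_tensor  * c_in,   // previous cell state
        struct ggml_tensor ** c_out);

void lstm_forward_per_step(const std::vector<float> & inputs,
                           int n_inp, int n_hid, int seq_len, int n_threads) {
    // Persistent host-side buffers for the recurrent state; they survive
    // across contexts so each per-step context can stay small.
    std::vector<float> h_state(n_hid, 0.0f);
    std::vector<float> c_state(n_hid, 0.0f);

    for (int t = 0; t < seq_len; ++t) {
        // Scratch context per time step: it only has to fit one cell's
        // worth of tensors, independent of seq_len (size is illustrative).
        struct ggml_init_params params = {
            /*.mem_size   =*/ 16u*1024*1024,
            /*.mem_buffer =*/ NULL,
            /*.no_alloc   =*/ false,
        };
        struct ggml_context * ctx = ggml_init(params);

        struct ggml_tensor * x_t  = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n_inp);
        struct ggml_tensor * h_in = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n_hid);
        struct ggml_tensor * c_in = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n_hid);

        memcpy(x_t->data,  inputs.data() + (size_t) t*n_inp, n_inp*sizeof(float));
        memcpy(h_in->data, h_state.data(), n_hid*sizeof(float));
        memcpy(c_in->data, c_state.data(), n_hid*sizeof(float));

        struct ggml_tensor * c_out = NULL;
        struct ggml_tensor * h_out = lstm_cell_forward(ctx, x_t, h_in, c_in, &c_out);

        // Build and run a graph that contains only this one cell.
        struct ggml_cgraph gf = ggml_build_forward(h_out);
        ggml_build_forward_expand(&gf, c_out);
        ggml_graph_compute_with_ctx(ctx, &gf, n_threads);

        // Copy the state back out before discarding the context.
        memcpy(h_state.data(), ggml_get_data_f32(h_out), n_hid*sizeof(float));
        memcpy(c_state.data(), ggml_get_data_f32(c_out), n_hid*sizeof(float));

        ggml_free(ctx);
    }
}
```

The upside would be that the per-step context and node count no longer depend on the sequence length, so GGML_MAX_NODES could go back to a sane default; the downside is exactly the per-step graph-construction and memcpy overhead mentioned above.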
What do you think would be the best solution?