Description
Describe the feature request
If you currently load a model of, say, 5 GB, it is first read into RAM (5 GB), then some sort of duplication takes place, consuming another 5 GB and spiking total RAM usage at 10 GB. The 5 GB is then transferred to the GPU and the full 10 GB is released from RAM. (I am using C# and DirectML.)
This is extremely wasteful and unnecessary. The short spike (notice the 'church spire' in the attached RAM-usage graph) means you need double the RAM actually required to run certain models.
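For illustration, here is a minimal sketch of how the doubling can arise, assuming the ONNX Runtime C# API (Microsoft.ML.OnnxRuntime), which this report appears to target; the file name is a placeholder:

```csharp
using System.IO;
using Microsoft.ML.OnnxRuntime;

// Copy #1: the entire 5 GB model lands in a managed byte[].
byte[] modelBytes = File.ReadAllBytes("model.onnx");

// Copy #2: the runtime parses the bytes into its own native structures,
// so two full copies coexist (~10 GB) until the weights reach the GPU
// and both CPU-side buffers are released.
using var session = new InferenceSession(modelBytes);
```

Loading by path (`new InferenceSession("model.onnx")`) can show a similar transient, since the file contents and the parsed graph are both held in RAM during session construction.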
I'm sure this can be overcome by loading the model into RAM piecemeal instead of inefficiently loading the whole model at once, performing a wasteful duplication, and then deleting the entire thing; see the sketch below.
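A minimal sketch of what piecemeal loading could look like, assuming a pre-allocated GPU destination. `UploadChunkToGpu` is a hypothetical placeholder for a DirectML/D3D12 upload-heap copy, and the 64 MB chunk size and file name are arbitrary:

```csharp
using System.IO;

const int ChunkSize = 64 * 1024 * 1024; // 64 MB staging buffer

using var stream = File.OpenRead("model.bin");
var chunk = new byte[ChunkSize];
long offset = 0;
int read;
while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
{
    UploadChunkToGpu(chunk, read, offset); // hypothetical GPU copy
    offset += read;
}

// Hypothetical stand-in: a real implementation would map an upload-heap
// region and copy 'count' bytes into the GPU resource at 'destOffset'.
static void UploadChunkToGpu(byte[] buffer, int count, long destOffset) { }
```

With this pattern, peak CPU memory is bounded by one chunk rather than by 2x the model size.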
Alternatively, some of that work could be shifted to VRAM.
Either way, this spike in RAM is just a symptom of very inefficient model loading.
In short, model loading could be made efficient enough to avoid this RAM spike, for example by promptly freeing buffers that are no longer needed and by loading the model sequentially.
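One such trick (a sketch of a possibility, not how the runtime currently works) is to memory-map the model file, so the OS pages weights in on demand and can evict clean pages as soon as they have been consumed, bounding resident RAM without ever holding an explicit second copy:

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

using var mmf = MemoryMappedFile.CreateFromFile(
    "model.bin", FileMode.Open, mapName: null, capacity: 0,
    MemoryMappedFileAccess.Read);

// A length of 0 maps the whole file; the view can then be consumed
// chunk-by-chunk exactly like the FileStream sketch above.
using var view = mmf.CreateViewStream(0, 0, MemoryMappedFileAccess.Read);
```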
Describe scenario use case
To load large models without having to buy twice the RAM you should actually need. (Remember that the average amount of RAM in a typical user's PC is 8 GB, or even 4 GB.)