
Add HPUs (Intel® Gaudi®) support #2


Open · wants to merge 18 commits into main

Conversation

PiotrBLL
Collaborator

No description provided.

Sobiechh and others added 11 commits January 3, 2025 15:55
Change the order from `model.eval().to(device)` to `model.to(device).eval()` to ensure that the model is first moved to the correct device and then set to evaluation mode.
Modify DDIMSampler, DPMSolverSampler and PLMSSampler to place buffers on the right device (the same as the model).
Remove model.cuda() from model loading functions.
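
As a rough illustration of the device handling these commits switch to (a minimal sketch with placeholder objects, not the actual diff; the real model and samplers live in ldm/models/diffusion/):

import torch
import torch.nn as nn

device = torch.device("cpu")  # stands in for the HPU/CUDA/CPU device in use

model = nn.Linear(4, 4)            # stand-in for the latent-diffusion model
model = model.to(device).eval()    # move to the device first, then set eval mode

class SamplerSketch:
    """Sketch of the sampler change: buffers follow the model's device."""
    def __init__(self, model):
        self.model = model
        self.device = next(model.parameters()).device

    def register_buffer(self, name, tensor):
        # keep schedule buffers on the model's device instead of calling .cuda()
        setattr(self, name, tensor.to(self.device))

sampler = SamplerSketch(model)
sampler.register_buffer("alphas_cumprod", torch.linspace(1.0, 0.0, 50))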
@PiotrBLL changed the title from "Feat/add hpu support" to "Add HPUs (Intel® Gaudi®) support" on Feb 12, 2025
@PiotrBLL
Collaborator Author

Running the code below (we have also added it to README.md):

from torch import autocast
import time
from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline

model_name = "CompVis/stable-diffusion-v1-4"

scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")

pipe = GaudiStableDiffusionPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)

from habana_frameworks.torch.utils.library_loader import load_habana_module
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi
load_habana_module()

# Adapt transformers models to Gaudi for optimization
adapt_transformers_to_gaudi()

pipe = pipe.to("hpu")

prompt = "a photo of an astronaut riding a horse on mars"

with autocast("hpu"):
    t1 = time.perf_counter()
    outputs = pipe(
        prompt=[prompt],
        num_images_per_prompt=2,
        batch_size=4,
        output_type="pil",
    )
print(f"Time taken: {time.perf_counter() - t1:.2f}s")

We get the following logs:
[screenshot of the run logs]

Which gives us:

[INFO|pipeline_stable_diffusion.py:610] 2025-02-20 12:21:28,751 >> Speed metrics: {'generation_runtime': 99.9386, 'generation_samples_per_second': 0.735, 'generation_steps_per_second': 36.744}
Time taken: 168.90s

README.md (outdated diff):
num_images_per_prompt=2,
batch_size=4,
output_type="pil",
)


I do not see the generated image?

Also, the generation time is somewhat long. I recall it was faster in the demo I have seen!


Could you also compare to the CPU time?
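
For reference, a rough CPU baseline could be measured with the plain diffusers pipeline (a sketch, not a measured result; it assumes the standard StableDiffusionPipeline API and will be slow on CPU):

import time
import torch
from diffusers import StableDiffusionPipeline

pipe_cpu = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float32,
).to("cpu")

t1 = time.perf_counter()
images = pipe_cpu(
    prompt=["a photo of an astronaut riding a horse on mars"],
    num_images_per_prompt=2,
).images
print(f"CPU time taken: {time.perf_counter() - t1:.2f}s")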


When running the generation code a second time, the elapsed time looks okay.

Just compare it with the CPU generation time.
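
The first call typically includes the HPU graph compilation cost, so excluding a warm-up run from the timing gives a more representative number (a sketch reusing pipe and prompt from the snippet above):

# Warm-up: the first call pays the one-time HPU graph compilation cost.
_ = pipe(prompt=[prompt], num_images_per_prompt=1, output_type="pil")

# Timed run: measures steady-state generation only.
t1 = time.perf_counter()
outputs = pipe(
    prompt=[prompt],
    num_images_per_prompt=2,
    batch_size=4,
    output_type="pil",
)
print(f"Warm run time: {time.perf_counter() - t1:.2f}s")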

PiotrBLL (Collaborator, Author)


@orionsBeltWest Image saving has been added; the generated image is written to a file with:

with autocast("hpu"):
    t1 = time.perf_counter()
    image = pipe(
        prompt=[prompt],
        num_images_per_prompt=2,
        batch_size=4,
        output_type="pil",
    ).images[0]  # first generated image

image.save("astronaut_rides_horse.png")
print(f"Time taken: {time.perf_counter() - t1:.2f}s")
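
If all generated images are wanted rather than only the first one, iterating over the returned images is a straightforward extension (a sketch using the same pipeline call):

outputs = pipe(
    prompt=[prompt],
    num_images_per_prompt=2,
    batch_size=4,
    output_type="pil",
)
for i, img in enumerate(outputs.images):
    img.save(f"astronaut_rides_horse_{i}.png")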

Dockerfile.hpu (outdated diff):
&& pip install -e /workspace/sd/src/taming-transformers

# Clone and install CLIP
RUN git clone --depth 1 https://github.com/openai/CLIP.git /workspace/sd/src/clip \


Should you add the Gaudi CLIP here, or is this the standard CLIP?

PiotrBLL (Collaborator, Author)


This section has been removed from the Dockerfile. It was present in an earlier version, but it is no longer needed: CLIP usage has been integrated directly into the code, so the CLIP package does not have to be installed separately in the Dockerfile anymore.

@orionsBeltWest

python3 scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --precision full
Segmentation fault (core dumped)

The above command is throwing a segfault.

@Sobiechh

Sobiechh commented Mar 6, 2025

> python3 scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --precision full
> Segmentation fault (core dumped)
>
> The above command is throwing a segfault.

@orionsBeltWest Did you follow the README section below? Or could you share the error log? I can't reproduce the problem.

We provide a reference sampling script, which incorporates […]

After obtaining the stable-diffusion-v1-*-original weights, link them

mkdir -p models/ldm/stable-diffusion-v1/
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt 

and sample with

python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms 

@orionsBeltWest

> python3 scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --precision full
> Segmentation fault (core dumped)
>
> The above command is throwing a segfault.
>
> @orionsBeltWest Did you follow the README section below? Or could you share the error log? I can't reproduce the problem.
>
> We provide a reference sampling script, which incorporates […]
>
> After obtaining the stable-diffusion-v1-*-original weights, link them
>
> mkdir -p models/ldm/stable-diffusion-v1/
> ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt
>
> and sample with
>
> python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms

The instructions are not clear.

The weight links point to datasets and not checkpoints.

Could you list the steps explicitly?

@PiotrBLL
Collaborator Author

PiotrBLL commented Mar 7, 2025

> python3 scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --precision full
> Segmentation fault (core dumped)
>
> The above command is throwing a segfault.
>
> @orionsBeltWest Did you follow the README section below? Or could you share the error log? I can't reproduce the problem.
>
> We provide a reference sampling script, which incorporates […]
>
> After obtaining the stable-diffusion-v1-*-original weights, link them
>
> mkdir -p models/ldm/stable-diffusion-v1/
> ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt
>
> and sample with
>
> python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
>
> The instructions are not clear.
>
> The weight links point to datasets and not checkpoints.
>
> Could you list the steps explicitly?

@orionsBeltWest
It looks like the issue might be with how the model weights are set up. Try these steps:

  1. Make sure you have the required environment set up:

    conda env create -f environment.yaml  
    conda activate ldm  
  2. Install/update the necessary packages:

    pip install transformers==4.19.2 diffusers invisible-watermark  
    pip install -e .  
  3. Download the model weights (sd-v1-*.ckpt) and link them properly:

    mkdir -p models/ldm/stable-diffusion-v1/  
    ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt  
  4. Run the script using the recommended command:

    python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse"

If it's still crashing, try using --precision autocast instead of full.
It would also help if you posted your whole error log. A quick sanity check for the checkpoint setup is sketched below.
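
A quick way to check that the symlink resolves and the checkpoint is readable before running the script (a sketch; the path matches the layout above, and SD v1 checkpoints normally contain a "state_dict" entry):

from pathlib import Path
import torch

ckpt = Path("models/ldm/stable-diffusion-v1/model.ckpt")
print(ckpt.resolve(strict=True))  # raises FileNotFoundError if the symlink is broken

state = torch.load(ckpt, map_location="cpu")  # load on CPU only, just to validate the file
print(list(state.keys())[:5])     # expect keys like "state_dict" / "global_step"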

@orionsBeltWest

Downloaded the weights using:
wget https://huggingface.co/CompVis/stable-diffusion-v-1-1-original/resolve/main/sd-v1-1.ckpt
ln -s sd-v1-1.ckpt models/ldm/stable-diffusion-v1/model.ckpt

But it still segfaults:
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
Segmentation fault (core dumped)

conda env create -f environment.yaml
conda activate ldm
bash: conda: command not found
bash: conda: command not found

@orionsBeltWest

No more segfault, but:

ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'

pip install pytorch-lightning

File "/workspace/sd/ldm/models/diffusion/ddpm.py", line 19, in <module>
    from pytorch_lightning.utilities.distributed import rank_zero_only
ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'
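
This error usually means the installed pytorch-lightning is newer than the version the repo was written against; rank_zero_only was moved out of pytorch_lightning.utilities.distributed in later releases. One possible workaround (a sketch of a compatibility import for ldm/models/diffusion/ddpm.py, not part of this PR):

# Fall back to the newer location when the old module path no longer exists.
try:
    from pytorch_lightning.utilities.distributed import rank_zero_only
except ImportError:
    from pytorch_lightning.utilities.rank_zero import rank_zero_only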

@orionsBeltWest

root@stable-docker-pod-basem:/workspace/sd# python3 scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
Traceback (most recent call last):
  File "/workspace/sd/scripts/txt2img.py", line 15, in <module>
    from pytorch_lightning import seed_everything
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/metrics/classification/accuracy.py", line 18, in <module>
    from pytorch_lightning.metrics.utils import deprecated_metrics, void
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/metrics/utils.py", line 22, in <module>
    from torchmetrics.utilities.data import get_num_classes as _get_num_classes
ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/usr/local/lib/python3.10/dist-packages/torchmetrics/utilities/data.py)
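
This traceback points at a pytorch-lightning / torchmetrics version mismatch: the old pytorch_lightning.metrics shim expects get_num_classes, which newer torchmetrics releases removed. Reinstalling the versions pinned in the repo's environment.yaml (for example pip install torchmetrics==0.6.0, if that is what the environment file specifies) is one way out. A quick check of what is currently installed (a sketch):

# A mismatch between these two versions is the usual cause of the ImportError above.
import pytorch_lightning
import torchmetrics
print("pytorch-lightning:", pytorch_lightning.__version__)
print("torchmetrics:", torchmetrics.__version__)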
