Add HPUs (Intel® Gaudi®) support #2
base: main
Conversation
(cherry picked from commit 7eadbdc)
Change the order from `model.eval().to(device)` to `model.to(device).eval()` to ensure that the model is first moved to the correct device and then set to evaluation mode.
Modify DDIMSampler, DPMSolverSampler and PLMSSampler to place buffers on the right device (the same as the model).
Remove model.cuda() from model loading functions.
Feat/hpu support br
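A minimal sketch of the device-handling changes described in the commit messages above (the function and mixin below are illustrative assumptions, not the repository's exact code):

```python
import torch


def load_model(model: torch.nn.Module, ckpt_path: str, device: str) -> torch.nn.Module:
    # Load the checkpoint on CPU so nothing is allocated on a hard-coded device.
    state = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(state.get("state_dict", state), strict=False)
    # Previously: model.eval().to(device) (plus model.cuda() in the loaders).
    # Now: move to the target device (cpu, cuda, or hpu) first, then switch to eval mode.
    return model.to(device).eval()


class SamplerBufferMixin:
    """Illustrates placing sampler buffers on the same device as the model."""

    model: torch.nn.Module

    def register_buffer(self, name: str, attr):
        if isinstance(attr, torch.Tensor):
            # Follow the model's device instead of calling .cuda() unconditionally.
            attr = attr.to(next(self.model.parameters()).device)
        setattr(self, name, attr)
```

The same pattern lets DDIMSampler, DPMSolverSampler, and PLMSSampler pick up the model's device when registering their schedule buffers.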
Running the code (we have also added it to README.md):

```python
from torch import autocast
import time

from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline

model_name = "CompVis/stable-diffusion-v1-4"
scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")
pipe = GaudiStableDiffusionPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)

from habana_frameworks.torch.utils.library_loader import load_habana_module
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

load_habana_module()
# Adapt transformers models to Gaudi for optimization
adapt_transformers_to_gaudi()

pipe = pipe.to("hpu")
prompt = "a photo of an astronaut riding a horse on mars"

with autocast("hpu"):
    t1 = time.perf_counter()
    outputs = pipe(
        prompt=[prompt],
        num_images_per_prompt=2,
        batch_size=4,
        output_type="pil",
    )
    print(f"Time taken: {time.perf_counter() - t1:.2f}s")
```

which gives us:

```
[INFO|pipeline_stable_diffusion.py:610] 2025-02-20 12:21:28,751 >> Speed metrics: {'generation_runtime': 99.9386, 'generation_samples_per_second': 0.735, 'generation_steps_per_second': 36.744}
Time taken: 168.90s
```
README.md (outdated)

```python
        num_images_per_prompt=2,
        batch_size=4,
        output_type="pil",
    )
```
I do not see the generated image?
Also, the generation time is somewhat long. I recall it was faster in the demo I saw!
Could you also compare to the CPU time?
When running the generation code for the second time, the elapsed time looks okay.
Just compare it with the CPU generation time.
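For reference, a minimal sketch (not part of this PR) of how the same prompt could be timed on CPU with the upstream diffusers pipeline, mirroring the parameters of the HPU example above:

```python
import time

from diffusers import DDIMScheduler, StableDiffusionPipeline

model_name = "CompVis/stable-diffusion-v1-4"
scheduler = DDIMScheduler.from_pretrained(model_name, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_name, scheduler=scheduler)
pipe = pipe.to("cpu")

prompt = "a photo of an astronaut riding a horse on mars"
t1 = time.perf_counter()
# batch_size is an argument of the Gaudi pipeline; the upstream pipeline batches via the prompt list.
outputs = pipe(prompt=[prompt], num_images_per_prompt=2, output_type="pil")
print(f"CPU time taken: {time.perf_counter() - t1:.2f}s")
```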
@orionsBeltWest Image generation has been added; it saves the result to a file with:

```python
with autocast("hpu"):
    t1 = time.perf_counter()
    upscaled_image = pipe(
        prompt=[prompt],
        num_images_per_prompt=2,
        batch_size=4,
        output_type="pil",
    ).images[0]
    upscaled_image.save("astronaut_rides_horse.png")
    print(f"Time taken: {time.perf_counter() - t1:.2f}s")
```
Dockerfile.hpu (outdated)

```dockerfile
    && pip install -e /workspace/sd/src/taming-transformers

# Clone and install CLIP
RUN git clone --depth 1 https://github.com/openai/CLIP.git /workspace/sd/src/clip \
```
Should you add a Gaudi-specific CLIP, or is this the regular CLIP?
This section has been removed from the Dockerfile. It was present in an earlier version, but it is no longer needed: the usage of CLIP has been integrated directly into the code, so the CLIP package no longer has to be installed separately in the Dockerfile.
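For context, a minimal sketch of the pattern referred to here, loading CLIP through the transformers library instead of a separate CLIP checkout (the model id is an assumption for illustration):

```python
from transformers import CLIPTextModel, CLIPTokenizer

# Tokenizer and text encoder come straight from the Hugging Face Hub,
# so no git clone of openai/CLIP is needed in the Dockerfile.
clip_name = "openai/clip-vit-large-patch14"  # assumed model id
tokenizer = CLIPTokenizer.from_pretrained(clip_name)
text_encoder = CLIPTextModel.from_pretrained(clip_name)
```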
```
python3 scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --precision full
```

The above command is throwing a segfault.
@orionsBeltWest Did you follow the README section quoted below? Or could you share the error log? I can't reproduce the same problem.

> We provide a reference sampling script, which incorporates
> After obtaining the
> and sample with
The instructions are not clear. The weight links point to datasets, not checkpoints. Could you list the steps explicitly?
@orionsBeltWest
If it's still crashing, try using
Downloaded the weights using:

```
conda env create -f environment.yaml
```

But still segfault.
No more segfault, but `ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'` after `pip install pytorch-lightning`:

```
File "/workspace/sd/ldm/models/diffusion/ddpm.py", line 19, in
```
```
root@stable-docker-pod-basem:/workspace/sd# python3 scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
```