Skip to content

Missing image layout validation on draw #551

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ShabbyX opened this issue Jan 1, 2019 · 5 comments · Fixed by #631
Closed

Missing image layout validation on draw #551

ShabbyX opened this issue Jan 1, 2019 · 5 comments · Fixed by #631
Assignees
Labels
Bug Something isn't working
Milestone

Comments

@ShabbyX
Copy link
Contributor

ShabbyX commented Jan 1, 2019

I came across a bug recently where an image was not transitioned to the right layout, and the validation layers didn't catch this. In short, the application does the following:

1. Transition image 1 to COLOR_ATTACHMENT_OPTIMAL
2. Transition image 2 to SHADER_READ_ONLY_OPTIMAL
3. Draw call (that copies image 2 into image 1 with some transformations)
4. Transition image 3 to COLOR_ATTACHMENT_OPTIMAL
5. Transition image 1 to SHADER_READ_ONLY_OPTIMAL
6. Draw call (that copies image 1 into image 3 for CPU read back)
7. Transition image 3 to TRANSFER_SRC_OPTIMAL
8. Copy image 3 to buffer

The bug was that step 5 was missing, i.e. image 1 was in COLOR_ATTACHMENT_OPTIMAL but was being read by a fragment shader. In effect, the color cache was not flushed and the changes in step 3 were not visible to step 6.

It looks like the validation layers don't validate the image layouts on draw call. As a test, I change step 5 to transition to TRANSFER_SRC_OPTIMAL with no complaint from the layers either.

This page indicates the image layouts are validated only in the following functions (though the page is dated):

vkCreateRenderPass
vkMapMemory
vkQueuePresentKHR
vkQueueSubmit
vkCmdCopyImage
vkCmdCopyImageToBuffer
vkCmdWaitEvents
VkCmdPipelineBarrier

Is there a reason why they are not validated on draw/dispatch calls too?

@jzulauf-lunarg
Copy link
Contributor

jzulauf-lunarg commented Jan 2, 2019

@ShabbyX -- is there a test case you can share? This case should be covered by the existing code, and it would be good to discover and resolve any latent bugs in the image layout transition and validation code.

Background

Image layout validation is done in multiple places throughout the core_validation layer. When it is possible to know the layout at command buffer record time (for example if an image layout transition was done earlier in that command buffer), the current image layout is compared against the binding/usage in copy, dispatch, and draw calls. Where the image layout state cannot be determined that check is deferred until queue submission of the command buffer, when the required state of the image layout from the point of view of the command buffer is compared to the current/global state of the image layout.

@ShabbyX
Copy link
Contributor Author

ShabbyX commented Jan 2, 2019

@jzulauf-lunarg I don't have a simple test case, but this is on ANGLE, so you can give it a try. Give me a day or two to land the change that's affected by this, then I can provide you a patch that disables the layout transition I'm talking about + instructions on how to build and what to run (and therefore trace).

@ShabbyX
Copy link
Contributor Author

ShabbyX commented Jan 8, 2019

Sorry for the late reply. Here's the patch to ANGLE that can repro the issue:

https://chromium-review.googlesource.com/c/angle/angle/+/1401020

To perform the test, follow these (untried) instructions:

# Set up Google stuff and get angle. See https://chromium.googlesource.com/angle/angle/+/master/doc/DevSetup.md

# depot_tools: https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/depot_tools_tutorial.html
$ git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
$ export PATH=$PATH:$PWD/depot_tools

# angle:
$ git clone https://chromium.googlesource.com/angle/angle
$ cd angle
$ python scripts/bootstrap.py
$ gclient sync
$ git checkout -b validation-layer-bug-repro origin/master
$ ./build/install-build-deps.sh

# apply the patch:
$ git fetch https://chromium.googlesource.com/angle/angle refs/changes/20/1401020/3 && git cherry-pick FETCH_HEAD

# build:
$ gn gen out/dbg
$ gn args -C out/dbg
# an editor is opened here.  Put `is_debug=true` for debug build.  Save and exit.
$ cd out/dbg
$ ninja

# test:
$ ./angle_end2end_tests --gtest_filter=CopyTexImageTest.RGBAToRGB/ES2_VULKAN

Observe that the test output tells that the image is in the wrong layout. That error is generated (part of the repro patch) when using an image for shader read, but the layout is neither GENERAL nor SHADER_READ_ONLY_OPTIMAL.

This indeed caused failures on some tests on AMD where the image was written to as color attachment, then read from in a shader. The validation layers don't complain about this missing transition/barrier.

@jzulauf-lunarg
Copy link
Contributor

triage notes: currently the descriptor set validation at draw doesn't have the information needed to do this check (it currently isn't extracted with the other spirv static analysis). Needs to be added to the shader_validation path and probably can be plumbed through with an additional descriptor_req bit.

@jzulauf-lunarg
Copy link
Contributor

jzulauf-lunarg commented Jan 23, 2019

"Upon further review" -- it looks like the best way to catch this is by implementing VUID-VkWriteDescriptorSet-descriptorType-01403 and allow the existing checks for descriptor layout and image layout to catch any draw time discrepancy. However ...1403 looks to be out of date, which I'm addressing separately.

oddhack added a commit to KhronosGroup/Vulkan-Docs that referenced this issue Feb 4, 2019
  * Update release number to 99.

Public Issues:

  * Add missing pname:pMemoryHostPointerProperties description to
    flink:vkGetMemoryHostPointerPropertiesEXT.txt (public pull request 896).
  * Minor markup fixes (public pull request 900).
  * Minor update to `khronos.css` and markup fixes (originally proposed in
    public pull request 901, but done via an internal MR).

Internal Issues:

  * Document restrictions on image queries for Y'CbCr formats in the
    <<features-formats-requiring-sampler-ycbcr-conversion>> table as well as
    for slink:sname:VkImageFormatProperties and slink:VkImageCreateInfo
    (internal issue 1361).
  * Correct type of the code:FragSizeEXT built-in in the
    <<interfaces-builtin-variables, Built-In Variables>> section (internal
    issue 1526).
  * Clean up math in the <<textures, Image Operations>> chapter by
    refactoring, using better naming conventions, updating diagrams to use
    the correct orientation, etc. (internal merge request 2968).
  * Fix minor typos for slink:VkImageCreateInfo and
    slink:VkImageStencilUsageCreateInfoEXT.
  * Add missing documentation for tlink:VkResolveModeFlagsKHR.
  * Fix extension dependency of pname:scalarBlockLayout in the
    <<features-features-requirements, Feature Requirements>> section.
  * Fix indexing math for shader binding table calculations in the
    <<shader-binding-table-indexing-rules, Indexing Rules>> section, and use
    spelling "`any-hit`" consistently.
  * Reconcile valid usage statement and text for sampled image layouts in
    slink:VkWriteDescriptorSet
    (KhronosGroup/Vulkan-ValidationLayers#551).
  * Make SPIR-V code:OpConvertUToPtr and code:OpConvertPtrToU operations
    require a 64-bit integer for physical storage buffer pointers in the
    <<spirvenv-module-validation, Validation Rules within a Module>>
    section.
  * Update to KaTeX 10.0.

New Extensions:

  * `VK_EXT_filter_cubic`
  * `VK_NV_dedicated_allocation_image_aliasing`
@shannon-lunarg shannon-lunarg added this to the sdk-1.1.temp.0 milestone Feb 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants