Skip to content

Encoder mishandles 16-bit address truncation behavior #471

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mappzor opened this issue Jan 9, 2024 · 2 comments · Fixed by #473
Closed

Encoder mishandles 16-bit address truncation behavior #471

mappzor opened this issue Jan 9, 2024 · 2 comments · Fixed by #473

Comments

@mappzor
Copy link
Contributor

mappzor commented Jan 9, 2024

Let's consider two instructions in 32-bit mode:

00 3d 00 ff ff ff       add byte ptr ds:[0xFFFFFF00], bh
67 00 3e 00 ff          add byte ptr ds:[0x0000FF00], bh

Attempting to re-encode both of them results in 67 00 3e 00 ff add byte ptr ds:[0x0000FF00], bh. This presents a unique problem as it's not just a matter of buggy logic. In both cases disassembled instructions have matching operands (fuzzer couldn't catch this) with full 64-bit displacement 0xFFFFFFFFFFFFFF00. The only difference is effective address size (attribute of the whole instruction not just a single operand), so it's all about interpreting the operand correctly.

I see two ways of fixing this:

  1. require 16-bit absolute addresses to be truncated to 16-bits by the user - I don't like this, it's very clunky and goes against the rule that displacements are uniformly signed, it's also just a terrible API
  2. use address size hints - ZydisEncoderDecodedInstructionToEncoderRequest always sets those hints anyway based on effective address size, currently they are ignored unless instruction scales with address size (hidden scaling). This hint can be used in additional context to resolve disambiguity between 16 and 32-bit absolute addresses. I think it's intuitive that address size hint covers all issues regarding address size. It's also a very natural place to document this properly. Lack of hint would indicate that native size is preferred (16 in 16-bit modes, 32 bits otherwise)

Currently I'm exploring 2nd option as it seems the most viable. I'm also considering to get rid of another clunky 16-bit behavior. Consider this in 16-bit mode:

67 3A 05 00 1E 57 1A    cmp al, byte ptr ds:[0x1E00]

Displacement is 0x1A571E00 after disassembling but it gets truncated to 16 bits inside ZydisCalcAbsoluteAddress during formatting. Currently encoder ignores everything above 16 bits to re-encode this properly. I think encoder shouldn't accept this and it would be better to make ZydisEncoderDecodedInstructionToEncoderRequest responsible for sanitization.

@tremalrik
Copy link

These examples indicate that it's not just the encoder that's mishandling 16-bit address truncation, but the formatter as well.

While displacements can always be treated as signed in x86, address truncation is unsigned. Also - and this is the part that Zydis doesn't appear to handle correctly - the 67h prefix overrides not just address calculation width but address truncation width as well.

As a result:

  • in 32-bit mode, decoding 67 00 3e 00 ff as add byte ptr ds:[0x0000FF00],bh is correct - in this case, the 67h prefix overrides truncation width from 32-bit down to 16-bit, and the resulting memory access goes to ds:[0x0000FF00], not ds:[0xFFFFFF00]. As a result, re-encoding 00 3d 00 ff ff ff into 67 00 3e 00 ff is not correct, but it would have been correct to re-encode 00 3d 00 ff 00 00 into 67 00 3e 00 ff.
  • in 16-bit mode, decoding 67 3a 05 00 1e 57 1a as cmp al, byte ptr ds:[0x1E00] is NOT correct! The 67h prefix will, in this case, override address truncation width from 16-bit to 32-bit, so the memory access will go to address ds:[0x1a571e00], not ds:[0x1e00]. As such, the correct decoding is cmp al, byte ptr ds:[0x1a571e00] - which is what e.g. XED outputs. (Attempting to access address ds:[0x1a571e00] in 16-bit mode will normally cause a #GP, due to the data segment size normally being 64K in 16-bit mode, however it is perfectly possible to set up a big 32-bit data segment for unreal-mode or 16-bit protected-mode, in which case such an instruction will execute just fine in 16-bit mode.)

@mappzor
Copy link
Contributor Author

mappzor commented Jan 11, 2024

@tremalrik I've addressed this already in #472 :)

Separate PR will soon address rest of the issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants