-
-
Notifications
You must be signed in to change notification settings - Fork 447
Improved signedness data accuracy and consistency (Fixes #327) #336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks a lot for this huge amount of work @mappzor 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks a lot for the good work 👍
-- --------- ---------- ------ ------------ ---- ----- ------ -------- --------------------------- | ||
|
||
== [ ATT ] ============================================================================================ | ||
ABSOLUTE: mov $0xA7, %bh | ||
RELATIVE: mov $0xA7, %bh | ||
ABSOLUTE: mov $-0x59, %bh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please check how other disassemblers format this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NASM: mov bh,0xa7
Capstone: mov bh, 0xa7
binutils: mov $0xa7,%bh
In Zydis formatter properties would decide this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So binutils (which seems to be the only one using AT&T syntax) prints the operand as "unsigned". The problem I see here is that changing the setting in the formatter properties will change it for all signed immediate constants.
Currently it is like:
- imm -> print with sign
- uimm -> print without sign
I don't mind formatting it as "signed" for mov
, just it seems like this is not the standard way of doing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, signedness is so arbitrary after all. For data transfer instructions it should be basically "no sign" and arguably "unsigned" is closer to that. However architecturally those immediates are signed, there are mov
variants that do actual sign-extension, so to achieve consistency they must all be signed. That was one of the original arguments behind "signed by default" policy.
If we really want to preserve old behavior I would think about handling it inside formatter for reasons stated above. To be honest I don't think this new behavior is that bad at all. It's different from most, that's a downside for sure but also it's fairly common to see those one-byte regs being used to store signed stuff with -1 being probably the most common value after 0 and 1. I'd say that ignoring sign for 1-byte values is actually a popular quirk but we simply got used to it. I think I would just keep this new behavior as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree! We just have to prepare for some issues about "wrong disassembly" I guess 😛
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about providing the formatter with an option to either print values as unsigned or signed, with AsmJit you can specify this for immediates and displacements, that way everyone would be happy I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 2 options for this already, one for immediates, second one for displacements. However they affect all immediates/displacements at once, it's a global override. Above discussion was just about default behavior when none of those overrides are active (definition signedness decides in such case).
@athre0z Are you fine with this as well? |
Thanks for the enormous amount of effort! A lot of the changes are clearly desirable. Like you said, it's rather clear that nobody really seems to have bothered with inspecting the signedness information in the past, since even very common instructions received fixups. I am, however, a bit reluctant to change the signedness of instructions that don't actually care about the signedness. It makes interpreting the data in our structs quite a bit harder. To stay with the |
In a way, changing the default signedness just exposes that representing the immediate as union {
uint64_t s;
int64_t u;
}; is essentially lying to the user. For an instruction like union {
int8_t s8;
int16_t s16;
int32_t s32;
int64_t s64;
uint8_t u8;
uint16_t u16;
uint32_t u32;
uint64_t u64;
}; it would kind of solve the issue. It'd make it much more transparent to the users that they must be concerned with the operand width. Also, reinterpreting signed values as unsigned would work much easier (since the decoder would fill in Just thinking out load here, though. |
Not really, that problem existed already and users that needed this info had to take care about this by themselves (I'm one of them). For displacements this problem was mitigated thanks to Your suggested solution with unions is good as well. HDE does this and I always thought this is good because as you have said, it forces users to think about width and we don't try to create illusions that you can work on 64 bits all the time and get everything right. Both options sound fine to me. Separate helper function will hurt performance and force users to do "some extra post-processing" (bit annoying). Union is more lightweight but users need to pick correct field by themselves. That's both a good thing (user might already know and make optimal decision, that's often the case imo) and a bad thing (if user doesn't know but to be honest you always need to be aware of signedness to use value for anything meaningful). Finally those solutions aren't mutually exclusive, so maybe both? |
Hmm, yes, I feel that union + helper functions for sign extension might be a good path forward. Regardless, I think that we have now determined that this is essentially a separate issue that is simply becoming more obvious now (more signed instructions), so there is no need to further delay this PR. :) |
Fixes #327
Full description in zyantific/zydis-db#13