Skip to content

Remove deprecated accessed_flags flags field from ZydisDecodedInstruction #262

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 10, 2021

Conversation

flobernd
Copy link
Member

@flobernd flobernd commented Nov 3, 2021

Removes a bunch of deprecated code related to accessed cpu flags.

In the current state we lost fine granular information about FPU flag actions:

  • TESTED, TESTED_MODIFIED -> READ
  • MODIFIED, TESTED_MODIFIED SET_0, SET_1, UNDEFINED -> WRITTEN

I'm planning to bring back this functionality in a follow up commit.

@flobernd flobernd requested a review from athre0z November 3, 2021 07:10
@flobernd flobernd force-pushed the remove-deprecated-code branch 3 times, most recently from ea27991 to 5580ced Compare November 5, 2021 09:34
@athre0z
Copy link
Member

athre0z commented Nov 6, 2021

Hmm. I seem to remember that we decided that there is precisely zero value in being able to distinguish set 1, set 0 and undefined from write for any use-case that we could think of, unless we also provide full semantic information. What changed?

@flobernd
Copy link
Member Author

flobernd commented Nov 6, 2021

The only drawback including the extra info would be a slightly larger binary (maybe 1KiB).

I personally would be fine keeping only the cpu_flags_read/written, but at the same time I'm wondering if any user possibly uses the advanced info for anything.

@ZehMatt
Copy link
Contributor

ZehMatt commented Nov 6, 2021

I personally think having more granular detail would be desired, I was a bit surprised that it got reduced to the two fields only.

@flobernd
Copy link
Member Author

flobernd commented Nov 6, 2021

I think there were some instructions for which the affected flags change depending on semantics. But these were only like 1-2 exceptions for which we defined the "worst case" set.

@athre0z
Copy link
Member

athre0z commented Nov 6, 2021

I personally think having more granular detail would be desired, I was a bit surprised that it got reduced to the two fields only.

Interesting. Could you perhaps provide some insights in how you'd profit from that in any real world project? Note that there are only 81 instructions that use SET_0 and 3 that use SET_1, which are the two cases where I could imagine that someone might build assumptions on it for e.g. an optimizer. For UNDEFINED, any optimizer would always have to assume that the flag might be tainted, which is essentially exactly the same as WRITE.

On the other hand, with the new design, users will now have to inspect (or OR together) the 4 different write flags in order to determine if any sort of write occurred.

Corresponding instruction DB queries

$ jq '.[] | select(.affected_flags and any(.affected_flags[] | values; . == "0")) | .mnemonic' instructions.json | sort | uniq | wc -l
81
$ jq '.[] | select(.affected_flags and any(.affected_flags[] | values; . == "1")) | .mnemonic' instructions.json | sort | uniq | wc -l
1

@athre0z athre0z added A-decoder Area: Decoder C-cleanup Category: Cleanup of code and refactoring work P-medium Priority: Medium labels Nov 6, 2021
@flobernd
Copy link
Member Author

flobernd commented Nov 6, 2021

On the other hand, with the new design, users will now have to inspect (or OR together) the 4 different write flags in order to determine if any sort of write occurred

With the old design before the flags_read/written masks, it was even worse tho 😃

Maybe we can go for a compromise and keep the flags_read/written while adding the other mask as additional info.

@ZehMatt
Copy link
Contributor

ZehMatt commented Nov 7, 2021

I personally think having more granular detail would be desired, I was a bit surprised that it got reduced to the two fields only.

Interesting. Could you perhaps provide some insights in how you'd profit from that in any real world project? Note that there are only 81 instructions that use SET_0 and 3 that use SET_1, which are the two cases where I could imagine that someone might build assumptions on it for e.g. an optimizer. For UNDEFINED, any optimizer would always have to assume that the flag might be tainted, which is essentially exactly the same as WRITE.

Well I would go with modified, consumed, undefined. As for the undefined flags there are quite interesting uses in obfuscations, so allowing the analysis to spot use of undefined flags without specifically checking for the instruction makes it a bit easier. The interesting thing about those undefined flags is that they are usually concrete per CPU and just undefined in a general sense.

@flobernd flobernd force-pushed the remove-deprecated-code branch from 5580ced to b9717e4 Compare November 7, 2021 09:29
@athre0z athre0z added this to the v4.0.0 milestone Nov 7, 2021
@athre0z
Copy link
Member

athre0z commented Nov 7, 2021

I started a poll about this on Twitter yesterday and also asked around in some Discord communities. While nobody I talked to seemed to have a particularly strong opinion about it, it seems like quite a few people have previously profited from having this distinction in one way or another. Looking at this data, I'm going to pull an 180 on my previous position and suggest that we just keep things as implemented in this PR. :)

@flobernd flobernd force-pushed the remove-deprecated-code branch from b9717e4 to ad037b5 Compare November 9, 2021 08:08
@flobernd
Copy link
Member Author

flobernd commented Nov 9, 2021

I started a poll about this on Twitter yesterday and also asked around in some Discord communities.

Thanks for taking care 👍

@flobernd flobernd force-pushed the remove-deprecated-code branch from 5226272 to 29fce53 Compare November 9, 2021 10:36
athre0z
athre0z previously approved these changes Nov 9, 2021
Copy link
Member

@athre0z athre0z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Sorry for taking so long to review.

@flobernd
Copy link
Member Author

LGTM! Sorry for taking so long to review.

No worries at all! Thanks.

@flobernd flobernd merged commit 7d6ee06 into master Nov 10, 2021
@athre0z athre0z deleted the remove-deprecated-code branch November 10, 2021 09:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-decoder Area: Decoder C-cleanup Category: Cleanup of code and refactoring work P-medium Priority: Medium
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants