x86: fix YMM FMA p-code temporaries truncated to 128 bits #9197
Open
0xDI wants to merge 1 commit into
All 36 YMM-form FMA instructions declared local tmp:16 (128-bit), causing the pcodeop return value to be truncated before zero-extension into the 256-bit ZmmReg destination. The upper 128 bits of accumulated YMM results were silently zeroed each iteration, breaking emulation of vectorized multiply-accumulate loops. XMM forms correctly use tmp:16 (128-bit). YMM forms require tmp:32 (256-bit). Resolves NationalSecurityAgency#9184
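To make the failure mode concrete, here is a toy Python model of an 8-lane float32 multiply-accumulate loop. It is not Ghidra code: `fma_lanes`, `write_dest` and `mac_loop` are hypothetical helpers, lanes are plain Python floats (so there is no fused float32 rounding), and only the 256-bit YMM view of the destination is modeled. The `temp_lanes` parameter stands in for the temporary's size: 4 lanes for `tmp:16`, 8 lanes for `tmp:32`.

```python
# Toy model (not Ghidra code) of an 8-lane float32 YMM multiply-accumulate
# loop: acc = a * b + acc on every iteration. Each FMA result is routed
# through a temporary of `temp_lanes` lanes and then zero-extended, which
# mimics `local tmp:N = <pcodeop>(...); ZmmReg = zext(tmp);`.

LANES_YMM = 8  # 256 bits / 32-bit lanes
LANES_XMM = 4  # 128 bits / 32-bit lanes


def fma_lanes(a, b, c):
    """Lane-wise a * b + c using Python floats (no fused float32 rounding)."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def write_dest(result, temp_lanes):
    """Keep only `temp_lanes` lanes (the temporary), then zero-extend
    back to full YMM width (the zext into the destination)."""
    kept = result[:temp_lanes]
    return kept + [0.0] * (LANES_YMM - len(kept))


def mac_loop(iterations, temp_lanes):
    """Run the accumulate loop with a temporary of the given lane count."""
    acc = [0.0] * LANES_YMM
    a = [1.0] * LANES_YMM
    b = [float(i + 1) for i in range(LANES_YMM)]  # 1.0 .. 8.0
    for _ in range(iterations):
        acc = write_dest(fma_lanes(a, b, acc), temp_lanes)
    return acc


fixed = mac_loop(3, LANES_YMM)  # tmp:32 analogue
buggy = mac_loop(3, LANES_XMM)  # tmp:16 analogue
print(fixed)  # [3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0]
print(buggy)  # [3.0, 6.0, 9.0, 12.0, 0.0, 0.0, 0.0, 0.0]
```

In the truncated variant the upper four lanes are recomputed and then discarded on every iteration, so they never accumulate, while the low four lanes are unaffected. That is why the bug only shows up in YMM-width loops.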
CryptoJones added a commit to CryptoJones/GayHydra that referenced this pull request May 21, 2026

… temporaries truncated to 128 bits (#136)

Cherry-picked from NationalSecurityAgency#9197 (closes upstream issue NationalSecurityAgency#9184).

Original commit: NSA/ghidra@47ff5cd357a60c9649a91b6ec8e331c1a0db7b3f
Original author: 0xDI <0xDI@users.noreply.github.com>

Co-authored-by: 0xDI <0xDI@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes #9184
All 36 YMM-form FMA instructions in `fma.sinc` declared `local tmp:16` (128-bit) for their p-code temporary. The pcodeop return value was truncated to 128 bits before being zero-extended into the 256-bit ZmmReg destination, silently zeroing the upper 128 bits of any accumulated result on each iteration. This breaks correct emulation of vectorized multiply-accumulate loops in the p-code emulator and concolic engine.

XMM forms correctly use `tmp:16` (128-bit). YMM forms require `tmp:32` (256-bit). Changed all 36 affected definitions accordingly.
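The width arithmetic behind the change can be sketched with Python integers standing in for varnodes. This is a model under stated assumptions, not SLEIGH: `through_temp`, `zext` and `ymm_upper` are hypothetical helpers, the 256-bit value is an arbitrary bit pattern rather than a real FMA result, and the model assumes, as described above, that anything not fitting in the temporary is lost before `zext` widens it into the 64-byte ZmmReg.

```python
# Bit-width model (not SLEIGH) of:
#   local tmp:N = <fma pcodeop>(...);
#   ZmmReg = zext(tmp);
# Python ints stand in for varnodes; sizes are in bytes, as in SLEIGH.

ZMM_BYTES = 64                 # ZmmReg destination: 512 bits
MASK128 = (1 << 128) - 1


def through_temp(pcodeop_result, tmp_bytes):
    """Keep only the low tmp_bytes of the pcodeop result (the temporary)."""
    return pcodeop_result & ((1 << (8 * tmp_bytes)) - 1)


def zext(value, to_bytes):
    """Zero-extension: the value is unchanged and all new high bits are zero."""
    assert value < (1 << (8 * to_bytes)), "source wider than destination"
    return value


def ymm_upper(zmm_value):
    """Bits 128..255 of the register, i.e. the upper half of the YMM view."""
    return (zmm_value >> 128) & MASK128


# Arbitrary 256-bit stand-in for a YMM FMA result with distinct halves.
HIGH = int("a" * 32, 16)
LOW = int("5" * 32, 16)
result256 = (HIGH << 128) | LOW

buggy_zmm = zext(through_temp(result256, 16), ZMM_BYTES)  # tmp:16
fixed_zmm = zext(through_temp(result256, 32), ZMM_BYTES)  # tmp:32

print(hex(ymm_upper(buggy_zmm)))     # 0x0
print(ymm_upper(fixed_zmm) == HIGH)  # True
print((buggy_zmm & MASK128) == LOW)  # True: the low half survives either way
```

The low half is identical in both cases, which is why XMM forms with `tmp:16` are correct; only the 256-bit forms need the 32-byte temporary.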