AArch64 LD4 instruction causing translation failure #1404

Closed
HclX opened this issue Jun 5, 2021 · 1 comment

HclX commented Jun 5, 2021

While emulating an AArch64 program, I found that SIMD instructions (not sure if that is the right term) cause problems. After digging around, I think I've isolated the issue:

Here is some sample code:

#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <unicorn/unicorn.h>

/* Abort with Unicorn's error string if a call fails. Unlike assert(), the
 * call still runs when NDEBUG is defined. */
#define UC_CHECK(call)                                                     \
    do {                                                                   \
        uc_err err_ = (call);                                              \
        if (err_ != UC_ERR_OK) {                                           \
            fprintf(stderr, "%s failed: %s\n", #call, uc_strerror(err_));  \
            exit(1);                                                       \
        }                                                                  \
    } while (0)

/* Print the address of each instruction as it executes. */
static void hook_code(uc_engine *uc, uint64_t address, uint32_t size, void *user_data) {
    (void)uc; (void)size; (void)user_data;
    printf("pc=%" PRIX64 "\n", address);
}

int main(void) {
    uc_engine *uc;

    UC_CHECK(uc_open(UC_ARCH_ARM64, UC_MODE_ARM, &uc));

    const uint8_t code[] = {
        "\xEA\xA4\x08\x6F" //   USHLL2          V10.8H, V7.16B, #0
        "\x03\x01\x7E\x6E" //   UADDL2          V3.4S, V8.8H, V30.8H
        "\x05\x01\x7E\x2E" //   UADDL           V5.4S, V8.4H, V30.4H
        "\xE6\x03\x70\x6E" //   UADDL2          V6.4S, V31.8H, V16.8H
        "\xE7\x03\x70\x2E" //   UADDL           V7.4S, V31.4H, V16.4H
        "\x8A\x02\x0E\x8B" //   ADD             X10, X20, X14
        "\xE7\x10\x73\x2E" //   UADDW           V7.4S, V7.4S, V19.4H
        "\xC6\x10\x73\x6E" //   UADDW2          V6.4S, V6.4S, V19.8H
        "\xA5\x10\x72\x2E" //   UADDW           V5.4S, V5.4S, V18.4H
        "\x63\x10\x72\x6E" //   UADDW2          V3.4S, V3.4S, V18.8H
        "\x82\xA4\x08\x2F" //   USHLL           V2.8H, V4.8B, #0
        "\x4C\x01\x40\x4C" //   LD4             {V12.16B-V15.16B}, [X10]
#if 0
        "\x63\x10\x60\x6E" //   UADDW2          V3.4S, V3.4S, V0.8H
        "\xA0\x10\x60\x2E" //   UADDW           V0.4S, V5.4S, V0.4H
        "\xC5\x10\x74\x6E" //   UADDW2          V5.4S, V6.4S, V20.8H
        "\xE6\x10\x74\x2E" //   UADDW           V6.4S, V7.4S, V20.4H
        "\x81\xA4\x08\x6F" //   USHLL2          V1.8H, V4.16B, #0
        "\xC6\x10\x62\x2E" //   UADDW           V6.4S, V6.4S, V2.4H
        "\xA2\x10\x62\x6E" //   UADDW2          V2.4S, V5.4S, V2.8H
        "\x00\x10\x61\x2E" //   UADDW           V0.4S, V0.4S, V1.4H
#endif
    };
    /* The string literal appends a NUL byte; it is not part of the code. */
    const size_t code_size = sizeof(code) - 1;

    uint64_t addr = 0x100000000;
    UC_CHECK(uc_mem_map(uc, addr, 4096, UC_PROT_ALL));
    UC_CHECK(uc_mem_write(uc, addr, code, code_size));

    /* ADD X10, X20, X14 computes the LD4 base address, so point X20 at the
     * mapped page (X14 starts at 0). */
    uint64_t x20 = addr;
    UC_CHECK(uc_reg_write(uc, UC_ARM64_REG_X20, &x20));

    /* Set CPACR_EL1.FPEN (bits [21:20]) so FP/SIMD instructions do not trap. */
    uint64_t cpacr_el1;
    UC_CHECK(uc_reg_read(uc, UC_ARM64_REG_CPACR_EL1, &cpacr_el1));
    cpacr_el1 |= (3ULL << 20);
    UC_CHECK(uc_reg_write(uc, UC_ARM64_REG_CPACR_EL1, &cpacr_el1));

    /* begin > end makes the hook fire for every address. */
    uc_hook hook;
    UC_CHECK(uc_hook_add(uc, &hook, UC_HOOK_CODE, (void *)hook_code, NULL, addr, 0));

    uc_err err = uc_emu_start(uc, addr, addr + code_size, 0, 0);

    uint64_t pc;
    uc_reg_read(uc, UC_ARM64_REG_PC, &pc);
    printf("pc=%" PRIX64 ", err=%d (%s)\n", pc, err, uc_strerror(err));

    uc_close(uc);
    return 0;
}

Unicorn calls gen_intermediate_code_internal_a64 to translate the code. The translated ops are written into a buffer named tcg_ctx->gen_opc_buf, which is expected never to hold more than OPC_MAX_SIZE elements. However, there is no easy way to enforce this, because the limit is only checked between instructions, not while a single instruction is being translated. The current logic looks like this:

tcg_ctx->gen_opc_ptr = tcg_ctx->gen_opc_buf;
gen_opc_end = tcg_ctx->gen_opc_buf + OPC_MAX_SIZE;

do {
    // Fills the buffer and advances tcg_ctx->gen_opc_ptr as needed.
    translate_one_a64_instruction();
} while (tcg_ctx->gen_opc_ptr < gen_opc_end);

Unfortunately, in the sample code the single instruction LD4 {V12.16B-V15.16B}, [X10] generates several hundred elements (0x392 bytes) in tcg_ctx->gen_opc_buf, pushing tcg_ctx->gen_opc_ptr well past the expected end of the buffer.

This later causes the failure in static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr, ...), at line 554, which gives up once the op count exceeds OPC_BUF_SIZE:

    nb_ops = tcg_opc_ptr - s->gen_opc_buf;
    if (nb_ops > OPC_BUF_SIZE) {
        return NULL;
    }

Here is some debugger output showing how far the buffer pointer moves:

Breakpoint 2, gen_intermediate_code_internal_a64_aarch64 (cpu=0x555555feb9b0, tb=0x7ffff6c0c010, search_pc=false) at /home/xuanxing/Work/unicorn/unicorn/qemu/target-arm/translate-a64.c:11200
(gdb) p {tcg_ctx->gen_opc_ptr,  gen_opc_end}
$18 = {0x7ffff7b933e6, 0x7ffff7b934e4}
(gdb) c
Continuing.

Breakpoint 2, gen_intermediate_code_internal_a64_aarch64 (cpu=0x555555feb9b0, tb=0x7ffff6c0c010, search_pc=false) at /home/xuanxing/Work/unicorn/unicorn/qemu/target-arm/translate-a64.c:11200
(gdb) p {tcg_ctx->gen_opc_ptr,  gen_opc_end}
$19 = {0x7ffff7b93778, 0x7ffff7b934e4}

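So the last instruction moves tcg_ctx->gen_opc_ptr from 0x7ffff7b933e6 to 0x7ffff7b93778, a jump of 0x392 bytes. Because gen_opc_buf holds uint16_t entries, that is 0x1C9 (457) elements, while only 127 elements of headroom remained before gen_opc_end.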

wtdcode (Member) commented Jun 6, 2021

Unicorn currently has poor AArch64 support. Please stay tuned for Unicorn2.

Link to #1217.

wtdcode closed this as completed Oct 3, 2021