AArch64 LD4 instruction causing translation failure #1404

HclX · 2021-06-05T22:54:51Z

While emulating an AArch64 program, I ran into problems with SIMD instructions (not sure if this is the right term). After digging around, I think I've isolated the issue.

Here is a sample program:

#include <stdio.h>
#include <unicorn/unicorn.h>
#include <assert.h>

void hook_code(uc_engine *uc, uint64_t address, uint32_t size, void *user_data) {
    /* address is uint64_t, so cast it to match the %llX conversion */
    printf("pc=%llX\n", (unsigned long long)address);
}

int main(int argc, char* argv[]){
    uc_engine* uc;

    assert(UC_ERR_OK == uc_open(UC_ARCH_ARM64, UC_MODE_ARM, &uc));

    const uint8_t code[] = {
        "\xEA\xA4\x08\x6F" //   USHLL2          V10.8H, V7.16B, #0
        "\x03\x01\x7E\x6E" //   UADDL2          V3.4S, V8.8H, V30.8H
        "\x05\x01\x7E\x2E" //   UADDL           V5.4S, V8.4H, V30.4H
        "\xE6\x03\x70\x6E" //   UADDL2          V6.4S, V31.8H, V16.8H
        "\xE7\x03\x70\x2E" //   UADDL           V7.4S, V31.4H, V16.4H
        "\x8A\x02\x0E\x8B" //   ADD             X10, X20, X14
        "\xE7\x10\x73\x2E" //   UADDW           V7.4S, V7.4S, V19.4H
        "\xC6\x10\x73\x6E" //   UADDW2          V6.4S, V6.4S, V19.8H
        "\xA5\x10\x72\x2E" //   UADDW           V5.4S, V5.4S, V18.4H
        "\x63\x10\x72\x6E" //   UADDW2          V3.4S, V3.4S, V18.8H
        "\x82\xA4\x08\x2F" //   USHLL           V2.8H, V4.8B, #0
        "\x4C\x01\x40\x4C" //   LD4             {V12.16B-V15.16B}, [X10]
#if 0
        "\x63\x10\x60\x6E" //   UADDW2          V3.4S, V3.4S, V0.8H
        "\xA0\x10\x60\x2E" //   UADDW           V0.4S, V5.4S, V0.4H
        "\xC5\x10\x74\x6E" //   UADDW2          V5.4S, V6.4S, V20.8H
        "\xE6\x10\x74\x2E" //   UADDW           V6.4S, V7.4S, V20.4H
        "\x81\xA4\x08\x6F" //   USHLL2          V1.8H, V4.16B, #0
        "\xC6\x10\x62\x2E" //   UADDW           V6.4S, V6.4S, V2.4H
        "\xA2\x10\x62\x6E" //   UADDW2          V2.4S, V5.4S, V2.8H
        "\x00\x10\x61\x2E" //   UADDW           V0.4S, V0.4S, V1.4H
#endif
    };


    uint64_t addr = 0x100000000;
    assert(UC_ERR_OK == uc_mem_map(uc, addr, 4096, UC_PROT_ALL));
    assert(UC_ERR_OK == uc_mem_write(uc, addr, code, sizeof(code)));

    uint64_t x10 = addr;
    uc_reg_write(uc, UC_ARM64_REG_X0, &x10);

    uint64_t cpacr_el1;
    assert(UC_ERR_OK == uc_reg_read(uc, UC_ARM64_REG_CPACR_EL1, &cpacr_el1));
    cpacr_el1 |= (3 << 20);

    assert(UC_ERR_OK == uc_reg_write(uc, UC_ARM64_REG_CPACR_EL1, &cpacr_el1));

    uc_hook hook;
    assert(UC_ERR_OK == uc_hook_add(uc, &hook, UC_HOOK_CODE, (void*)hook_code, NULL, addr, 0));

    /* code[] is initialized from a string literal, so sizeof(code) counts the trailing NUL */
    uc_err err = uc_emu_start(uc, addr, addr + sizeof(code) - 1, 0, 0);

    uint64_t pc;
    uc_reg_read(uc, UC_ARM64_REG_PC, &pc);
    printf("pc=%llX, err=%d\n", (unsigned long long)pc, (int)err);

    return 0;
}

Unicorn calls gen_intermediate_code_internal_a64 to translate the code. The generated ops go into a buffer named tcg_ctx->gen_opc_buf, and translation is expected to stop before more than OPC_MAX_SIZE elements have been written. However, the fill level is only checked after each instruction has been fully translated, so nothing stops a single instruction from running past the limit. The current logic looks like this:

tcg_ctx->gen_opc_ptr = tcg_ctx->gen_opc_buf;
gen_opc_end = tcg_ctx->gen_opc_buf + OPC_MAX_SIZE;

do {
    /* fills the buffer and advances tcg_ctx->gen_opc_ptr as needed */
    translate_one_a64_instruction();
} while (tcg_ctx->gen_opc_ptr < gen_opc_end);
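
The weakness of a check that runs only after each instruction can be modelled with a plain counter. The following is a minimal sketch, not Unicorn code: the function name and the two size constants are hypothetical, and the values 640 and 266 are assumptions chosen for illustration rather than figures from this report.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed sizes, for illustration only; check the TCG headers for real values. */
#define SKETCH_OPC_BUF_SIZE    ((size_t)640)  /* assumed total buffer capacity */
#define SKETCH_MAX_OP_PER_INSN ((size_t)266)  /* assumed per-instruction headroom */
#define SKETCH_OPC_MAX_SIZE    (SKETCH_OPC_BUF_SIZE - SKETCH_MAX_OP_PER_INSN)

/*
 * Model the translation loop with a counter instead of a real buffer.
 * ops_per_insn[i] is the number of ops the i-th guest instruction emits.
 * The loop, like the one above, only tests the fill level between
 * instructions. Returns true if the count ever exceeds the capacity and
 * stores the final count in *final_count (if non-NULL).
 */
bool sketch_translation_overflows(const size_t *ops_per_insn, size_t n_insns,
                                  size_t *final_count)
{
    size_t count = 0;
    bool overflow = false;

    for (size_t i = 0; i < n_insns && count < SKETCH_OPC_MAX_SIZE; i++) {
        count += ops_per_insn[i];          /* one whole instruction at a time */
        if (count > SKETCH_OPC_BUF_SIZE) {
            overflow = true;               /* a real buffer would be overrun here */
        }
    }
    if (final_count != NULL) {
        *final_count = count;
    }
    return overflow;
}
```

Under these assumed sizes, a block that already holds 247 ops and then translates one instruction emitting 457 ops ends at 704, past the assumed capacity, even though the loop condition was satisfied when that instruction started.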

Unfortunately, in the sample code the single instruction LD4 {V12.16B-V15.16B}, [X10] generates several hundred entries in tcg_ctx->gen_opc_buf on its own, pushing the pointer far beyond the expected end of the buffer.

This later causes a failure in static TCGArg *tcg_constant_folding(TCGContext *s, uint16_t *tcg_opc_ptr, at line 554, where the op count is checked against OPC_BUF_SIZE:

    nb_ops = tcg_opc_ptr - s->gen_opc_buf;
    if (nb_ops > OPC_BUF_SIZE) {
        return NULL;
    }

Here is some debugger output showing the pointer before and after the last instruction is translated:

Breakpoint 2, gen_intermediate_code_internal_a64_aarch64 (cpu=0x555555feb9b0, tb=0x7ffff6c0c010, search_pc=false) at /home/xuanxing/Work/unicorn/unicorn/qemu/target-arm/translate-a64.c:11200
(gdb) p {tcg_ctx->gen_opc_ptr,  gen_opc_end}
$18 = {0x7ffff7b933e6, 0x7ffff7b934e4}
(gdb) c
Continuing.

Breakpoint 2, gen_intermediate_code_internal_a64_aarch64 (cpu=0x555555feb9b0, tb=0x7ffff6c0c010, search_pc=false) at /home/xuanxing/Work/unicorn/unicorn/qemu/target-arm/translate-a64.c:11200
(gdb) p {tcg_ctx->gen_opc_ptr,  gen_opc_end}
$19 = {0x7ffff7b93778, 0x7ffff7b934e4}

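So the last instruction advances tcg_ctx->gen_opc_ptr from 0x7ffff7b933e6 to 0x7ffff7b93778. That is a jump of 0x392 bytes, or 0x1C9 entries given that the buffer is indexed through a uint16_t pointer, and it leaves the pointer well past gen_opc_end at 0x7ffff7b934e4.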
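
The arithmetic on the gdb addresses can be double-checked with a few lines of C. This is a small sketch with a hypothetical helper name; it assumes the buffer entries are uint16_t, which is inferred from the uint16_t *tcg_opc_ptr parameter quoted above rather than confirmed from the headers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Number of uint16_t slots between two raw addresses (to must be >= from).
 * The element type is an assumption based on the tcg_constant_folding
 * signature shown in this report.
 */
size_t sketch_slots_between(uint64_t from, uint64_t to)
{
    return (size_t)((to - from) / sizeof(uint16_t));
}
```

With the values printed by gdb, sketch_slots_between(0x7ffff7b933e6, 0x7ffff7b93778) gives 457 slots written by the last instruction, while only 127 slots were left before gen_opc_end (0x7ffff7b934e4), so the pointer ends up 330 slots past it.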


wtdcode · 2021-06-06T05:05:41Z

Unicorn currently has bad support for AArch64. Stay tuned for Unicorn2 please.

Link to #1217.

wtdcode closed this as completed Oct 3, 2021
