Commit a1ffb70

Use rr-safe nopl; rdtsc sequence
When the runtime is run under `rr`, `rr` needs to patch out each `rdtsc` instruction so that it can record the values returned. If patching is not possible, `rr` falls back to an expensive signal-based emulation. As of rr master, a specific `nopl; rdtsc` sequence may be used to guarantee that `rdtsc` patching is always possible. Use this sequence for all uses of `rdtsc` in our runtime.
1 parent ec07825 commit a1ffb70

1 file changed: +10, -1
src/julia_internal.h

Lines changed: 10 additions & 1 deletion
@@ -209,8 +209,17 @@ JL_DLLEXPORT void jl_unlock_profile_wr(void) JL_NOTSAFEPOINT JL_NOTSAFEPOINT_LEA
 static inline uint64_t cycleclock(void) JL_NOTSAFEPOINT
 {
 #if defined(_CPU_X86_64_)
+    // This is nopl 0(%rax, %rax, 1), but assemblers are inconsistent about whether
+    // they emit that as a 4 or 5 byte sequence and we need to be guaranteed to use
+    // the 5 byte one.
+#define NOP5_OVERRIDE_NOP ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\t"
     uint64_t low, high;
-    __asm__ volatile("rdtsc" : "=a"(low), "=d"(high));
+    // This instruction sequence is promised by rr to be patchable. rr can usually
+    // also patch `rdtsc` in regular code, but without the preceding nop, there could
+    // be an interfering branch into the middle of rr's patch region. Using this
+    // sequence prevents a massive rr-induced slowdown if the compiler happens to emit
+    // an unlucky pattern. See https://github.com/rr-debugger/rr/pull/3580.
+    __asm__ volatile(NOP5_OVERRIDE_NOP "rdtsc" : "=a"(low), "=d"(high));
     return (high << 32) | low;
 #elif defined(_CPU_X86_)
     int64_t ret;
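
For readers who want to try the pattern outside the Julia tree, here is a minimal standalone sketch of the same idea. It is an illustration rather than the runtime's actual code: the name `cycleclock_sketch` is hypothetical, the portable `__x86_64__` macro stands in for Julia's `_CPU_X86_64_`, and the non-x86-64 branch is an assumed wall-clock fallback that is not part of this commit.

```c
#include <stdint.h>
#include <time.h>

/* Raw encoding of the 5-byte `nopl 0(%rax, %rax, 1)`. Spelling out the bytes
 * avoids depending on which nop length the assembler would choose. */
#define NOP5_OVERRIDE_NOP ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\t"

/* Hypothetical standalone counterpart of cycleclock() from the diff above. */
static inline uint64_t cycleclock_sketch(void)
{
#if defined(__x86_64__)
    uint64_t low, high;
    /* The nop and rdtsc are emitted back to back inside one asm statement,
     * so the compiler cannot place anything between them. */
    __asm__ volatile(NOP5_OVERRIDE_NOP "rdtsc" : "=a"(low), "=d"(high));
    return (high << 32) | low;
#else
    /* Assumed fallback for other architectures (not from the commit):
     * wall-clock time in nanoseconds via C11 timespec_get. */
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}
```

Calling `cycleclock_sketch()` twice and subtracting the results gives a rough elapsed-cycle count on x86-64. Keeping the nop and `rdtsc` in a single `__asm__` statement is what keeps the two instructions adjacent, which is the property the commit message relies on.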
