David Sehr 6c4265a541
The current X86 NOP padding emits one long NOP followed by one-byte NOPs for
the remainder.  If the processor actually executes those NOPs, as it sometimes
does with aligned bundling, this can hurt performance.  In micro-benchmarks
run on a single machine, a 15-byte NOP followed by twelve one-byte NOPs is
about 20% slower than a 15-byte NOP followed by a 12-byte NOP.  This patch
changes NOP emission to emit as many 15-byte NOPs (the maximum length) as
possible, followed by at most one shorter NOP.
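
The splitting rule described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the code in the patch: the function names `split_padding` and `split_padding_old` are hypothetical, and the only fact taken from the commit message is the 15-byte maximum NOP length.

```python
MAX_NOP_LEN = 15  # longest single NOP, per the commit message


def split_padding(count, max_len=MAX_NOP_LEN):
    """New scheme: as many max-length NOPs as fit, then at most one shorter NOP.

    Returns the list of NOP lengths (in bytes) that fill `count` bytes.
    """
    if count < 0:
        raise ValueError("padding size must be non-negative")
    full, rest = divmod(count, max_len)
    sizes = [max_len] * full
    if rest:
        sizes.append(rest)
    return sizes


def split_padding_old(count, max_len=MAX_NOP_LEN):
    """Old scheme as described: one long NOP, then one-byte NOPs for the rest."""
    if count < 0:
        raise ValueError("padding size must be non-negative")
    if count == 0:
        return []
    first = min(count, max_len)
    return [first] + [1] * (count - first)


if __name__ == "__main__":
    print(split_padding(27))           # [15, 12]
    print(len(split_padding_old(27)))  # 13 instructions: one 15-byte + twelve 1-byte
```

For 27 bytes of padding the new scheme yields two instructions where the old one yields thirteen, which is the case the micro-benchmark compares.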


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176464 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-05 00:02:23 +00:00

# RUN: llvm-mc -filetype=obj -triple x86_64-pc-linux-gnu %s -o - \
# RUN: | llvm-objdump -disassemble -no-show-raw-insn - | FileCheck %s
# Test that long nops are generated for padding where possible.
.text
foo:
.bundle_align_mode 5
# This callq instruction is 5 bytes long
.bundle_lock align_to_end
callq bar
.bundle_unlock
# Bundles are 32 bytes (2^5). To align this 5-byte group to a bundle end, we
# need 27 bytes of padding: a 15-byte NOP and a 12-byte NOP.
# CHECK: 0: nop
# CHECK-NEXT: f: nop
# CHECK-NEXT: 1b: callq
# This push instruction is 1 byte long
.bundle_lock align_to_end
push %rax
.bundle_unlock
# The previous group ended at 0x20. To align this 1-byte group to a bundle end,
# we need 31 bytes of padding: two 15-byte NOPs and a 1-byte NOP.
# CHECK: 20: nop
# CHECK-NEXT: 2f: nop
# CHECK-NEXT: 3e: nop
# CHECK-NEXT: 3f: pushq
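
The offsets in the CHECK lines above can be reproduced with a small model of `align_to_end` padding. This is a simplified sketch, not the assembler's implementation: the helper name `align_to_end_layout` is hypothetical, and it assumes the padding is simply whatever makes the group end on the next bundle boundary, split into NOPs of at most 15 bytes.

```python
def align_to_end_layout(offset, group_size, align_log2=5, max_nop=15):
    """Model padding for a group that must end on a bundle boundary.

    Returns (nops, insn_offset): `nops` is a list of (offset, length) pairs
    for the padding NOPs, and `insn_offset` is where the group itself starts.
    """
    bundle = 1 << align_log2  # .bundle_align_mode 5 -> 32-byte bundles
    if group_size > bundle:
        raise ValueError("group does not fit in one bundle")
    # Bytes needed so that offset + pad + group_size is a multiple of `bundle`.
    pad = (-(offset + group_size)) % bundle
    nops = []
    cur = offset
    while pad > 0:
        n = min(pad, max_nop)  # longest NOPs first, at most one short one last
        nops.append((cur, n))
        cur += n
        pad -= n
    return nops, cur


if __name__ == "__main__":
    # 5-byte callq starting at offset 0, then 1-byte push starting at 0x20.
    for start, size in [(0x0, 5), (0x20, 1)]:
        nops, insn = align_to_end_layout(start, size)
        print([hex(off) for off, _ in nops], hex(insn))
```

Under these assumptions the first group gets NOPs at 0x0 and 0xf with the call at 0x1b, and the second gets NOPs at 0x20, 0x2f and 0x3e with the push at 0x3f, matching the expected disassembly.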