- Rewirte most atomic instructions in templates for both better
maintenance and future extensions, such as HLE in TSX.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164357 91177308-0d34-0410-b5e6-96231b3b80d8
- Rewrite/merge pseudo-atomic instruction emitters to address the
following issue:
* Reduce one unnecessary load in spin-loop
previously the spin-loop looks like
thisMBB:
newMBB:
ld t1 = [bitinstr.addr]
op t2 = t1, [bitinstr.val]
not t3 = t2 (if Invert)
mov EAX = t1
lcs dest = [bitinstr.addr], t3 [EAX is implicit]
bz newMBB
fallthrough -->nextMBB
the 'ld' at the beginning of newMBB should be lift out of the loop
as lcs (or CMPXCHG on x86) will load the current memory value into
EAX. This loop is refined as:
thisMBB:
EAX = LOAD [MI.addr]
mainMBB:
t1 = OP [MI.val], EAX
LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined]
JNE mainMBB
sinkMBB:
* Remove immopc as, so far, all pseudo-atomic instructions has
all-register form only, there is no immedidate operand.
* Remove unnecessary attributes/modifiers in pseudo-atomic instruction
td
* Fix issues in PR13458
- Add comprehensive tests on atomic ops on various data types.
NOTE: Some of them are turned off due to missing functionality.
- Revise tests due to the new spin-loop generated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164281 91177308-0d34-0410-b5e6-96231b3b80d8
Add a PatFrag to match X86tcret using 6 fixed registers or less. This
avoids folding loads into TCRETURNmi64 using 7 or more volatile
registers.
<rdar://problem/12282281>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163819 91177308-0d34-0410-b5e6-96231b3b80d8
We don't have enough GR64_TC registers when calling a varargs function
with 6 arguments. Since %al holds the number of vector registers used,
only %r11 is available as a scratch register.
This means that addressing modes using both base and index registers
can't be folded into TCRETURNmi64.
<rdar://problem/12282281>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163761 91177308-0d34-0410-b5e6-96231b3b80d8
This implements codegen support for accesses to thread-local variables
using the local-dynamic model, and adds a clean-up pass so that the base
address for the TLS block can be re-used between local-dynamic access on
an execution path.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157818 91177308-0d34-0410-b5e6-96231b3b80d8
The getPointerRegClass() hook will return GR32_TC, or whatever is
appropriate for the current function.
Patch by Yiannis Tsiouris!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156459 91177308-0d34-0410-b5e6-96231b3b80d8
This patch will optimize -(x != 0) on X86
FROM
cmpl $0x01,%edi
sbbl %eax,%eax
notl %eax
TO
negl %edi
sbbl %eax %eax
In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td:
def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>;
rdar: 10961709
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156312 91177308-0d34-0410-b5e6-96231b3b80d8
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154011 91177308-0d34-0410-b5e6-96231b3b80d8
X86InstrCompiler.td.
It also adds –mcpu-generic to the legalize-shift-64.ll test so the test
will pass if run on an Intel Atom CPU, which would otherwise
produce an instruction schedule which differs from that which the test expects.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153033 91177308-0d34-0410-b5e6-96231b3b80d8
The different calling conventions and call-preserved registers are
represented with regmask operands that are added dynamically.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@150708 91177308-0d34-0410-b5e6-96231b3b80d8
X86ISelLowering C++ code. Because this is lowered via an xor wrapped
around a bsr, we want the dagcombine which runs after isel lowering to
have a chance to clean things up. In particular, it is very common to
see code which looks like:
(sizeof(x)*8 - 1) ^ __builtin_clz(x)
Which is trying to compute the most significant bit of 'x'. That's
actually the value computed directly by the 'bsr' instruction, but if we
match it too late, we'll get completely redundant xor instructions.
The more naive code for the above (subtracting rather than using an xor)
still isn't handled correctly due to the dagcombine getting confused.
Also, while here fix an issue spotted by inspection: we should have been
expanding the zero-undef variants to the normal variants when there is
an 'lzcnt' instruction. Do so, and test for this. We don't want to
generate unnecessary 'bsr' instructions.
These two changes fix some regressions in encoding and decoding
benchmarks. However, there is still a *lot* to be improve on in this
type of code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147244 91177308-0d34-0410-b5e6-96231b3b80d8
use the zero-undefined variants of CTTZ and CTLZ. These are just simple
patterns for now, there is more to be done to make real world code using
these constructs be optimized and codegen'ed properly on X86.
The existing tests are spiffed up to check that we no longer generate
unnecessary cmov instructions, and that we generate the very important
'xor' to transform bsr which counts the index of the most significant
one bit to the number of leading (most significant) zero bits. Also they
now check that when the variant with defined zero result is used, the
cmov is still produced.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146974 91177308-0d34-0410-b5e6-96231b3b80d8
MORESTACK_RET_RESTORE_R10; which are lowered to a RET and a RET
followed by a MOV respectively. Having a fake instruction prevents
the verifier from seeing a MachineBasicBlock end with a
non-terminator (MOV). It also prevents the rather eccentric case of a
MachineBasicBlock ending with RET but having successors nevertheless.
Patch by Sanjoy Das.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@143062 91177308-0d34-0410-b5e6-96231b3b80d8
The explanation about a 0 argument being materialized as xor is no
longer valid. Rematerialization will check if EFLAGS is live before
clobbering it.
The code produced by X86TargetLowering::EmitLoweredSelect does not
clobber EFLAGS.
This causes one less testb instruction to be generated in the cmov.ll
test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139057 91177308-0d34-0410-b5e6-96231b3b80d8
from DYNAMIC_STACKALLOC.
Two new pseudo instructions (SEG_ALLOCA_32 and SEG_ALLOCA_64) which
will match X86SegAlloca (based on word size) are also added. They
will be custom emitted to inject the actual stack handling code.
Patch by Sanjoy Das.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138814 91177308-0d34-0410-b5e6-96231b3b80d8
doesn't, match it back to setb.
On a 64-bit version of the testcase before we'd get:
movq %rdi, %rax
addq %rsi, %rax
sbbb %dl, %dl
andb $1, %dl
ret
now we get:
movq %rdi, %rax
addq %rsi, %rax
setb %dl
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122217 91177308-0d34-0410-b5e6-96231b3b80d8
consistently by moving it out of lowering into dag combine.
Add some missing patterns for matching away extended versions of setcc_c.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122201 91177308-0d34-0410-b5e6-96231b3b80d8
and make it a hard error for instructions to not have an asm string.
These instructions should be marked isCodeGenOnly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@117861 91177308-0d34-0410-b5e6-96231b3b80d8