The x86_mmx type is used for MMX intrinsics, parameters and
return values where these use MMX registers, and is also
supported in load, store, and bitcast.
Only the above operations generate MMX instructions, and optimizations
do not operate on or produce MMX intrinsics.
MMX-sized vectors <2 x i32> etc. are lowered to XMM or split into
smaller pieces. Optimizations may occur on these forms and the
result casted back to x86_mmx, provided the result feeds into a
previous existing x86_mmx operation.
The point of all this is prevent optimizations from introducing
MMX operations, which is unsafe due to the EMMS problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115243 91177308-0d34-0410-b5e6-96231b3b80d8
edited during emission.
If the basic block ends in a switch that gets lowered to a jump table, any
phis at the default edge were getting updated wrong. The jump table data
structure keeps a pointer to the header blocks that wasn't getting updated
after the MBB is split.
This bug was exposed on 32-bit Linux when disabling critical edge splitting in
codegen prepare.
The fix is to uipdate stale MBB pointers whenever a block is split during
emission.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115191 91177308-0d34-0410-b5e6-96231b3b80d8
LDM/STM instructions can run one cycle faster on some ARM processors if the
memory address is 64-bit aligned. Radar 8489376.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115047 91177308-0d34-0410-b5e6-96231b3b80d8
cost modeling for if-conversion. Now if only we had a way to estimate the misprediction probability.
Adjsut CodeGen/ARM/ifcvt10.ll. The pipeline on Cortex-A8 is long enough that it is still profitable
to predicate an ldm, but the shorter pipeline on Cortex-A9 makes it unprofitable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114995 91177308-0d34-0410-b5e6-96231b3b80d8
Rather than having arbitrary cutoffs, actually try to cost model the conversion.
For now, the constants are tuned to more or less match our existing behavior, but these will be
changed to reflect realistic values as this work proceeds.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114973 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts revision 114633. It was breaking llvm-gcc-i386-linux-selfhost.
It seems there is a downstream bug that is exposed by
-cgp-critical-edge-splitting=0. When that bug is fixed, this patch can go back
in.
Note that the changes to tailcallfp2.ll are not reverted. They were good are
required.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114859 91177308-0d34-0410-b5e6-96231b3b80d8
support aligned comm. Detect when compiling for 10.4 and don't
emit an alignment for comm. THis will hopefully fix PR8198.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114817 91177308-0d34-0410-b5e6-96231b3b80d8
x86-32: 32-bit calls were named "call" not "calll". 64-bit calls were correctly
named "callq", so this only impacted x86-32.
This fixes rdar://8456370 - llvm-mc rejects 'calll'
This also exposes that mingw/64 is generating a 32-bit call instead of a 64-bit call,
I will file a bugzilla.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114534 91177308-0d34-0410-b5e6-96231b3b80d8
(sbbl x, x) sets the registers to 0 or ~0. Combined with two's complement arithmetic, we can fold
the intermediate AND and the ADD into a single SUB.
This fixes <rdar://problem/8449754>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114460 91177308-0d34-0410-b5e6-96231b3b80d8
CombinerAA cannot assume that different FrameIndex's never alias, but can instead use
MachineFrameInfo to get the actual offsets of these slots and check for actual aliasing.
This fixes CodeGen/X86/2010-02-19-TailCallRetAddrBug.ll and CodeGen/X86/tailcallstack64.ll
when CombinerAA is enabled, modulo a different register allocation sequence.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114348 91177308-0d34-0410-b5e6-96231b3b80d8
between the high and low registers for prologue/epilogue code. This was
a Darwin-only thing that wasn't providing a realistic benefit anymore.
Combining the save areas simplifies the compiler code and results in better
ARM/Thumb2 codegen.
For example, previously we would generate code like:
push {r4, r5, r6, r7, lr}
add r7, sp, #12
stmdb sp!, {r8, r10, r11}
With this change, we combine the register saves and generate:
push {r4, r5, r6, r7, r8, r10, r11, lr}
add r7, sp, #12
rdar://8445635
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114340 91177308-0d34-0410-b5e6-96231b3b80d8
NO path to the destination containing side effects, not that SOME path contains no side effects.
In practice, this only manifests with CombinerAA enabled, because otherwise the chain has little
to no branching, so "any" is effectively equivalent to "all".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114268 91177308-0d34-0410-b5e6-96231b3b80d8
value should be in GPRs when it's going to be used as a scalar, and we use
VMOVRRD to make that happen, but if the value is converted back to a vector
we need to fold to a simple bit_convert. Radar 8407927.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114233 91177308-0d34-0410-b5e6-96231b3b80d8
1) Do forward copy propagation. This makes it easier to estimate the cost of the
instruction being sunk.
2) Break critical edges on demand, including cases where the value is used by
PHI nodes.
Critical edge splitting is not yet enabled by default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114227 91177308-0d34-0410-b5e6-96231b3b80d8