of the sets is volatile. We were dropping the volatile bit of the
merged in set, leading (luckily) to assertions in cases like
PR7535. I cannot produce a testcase that repros with opt, but this
is obviously correct.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112402 91177308-0d34-0410-b5e6-96231b3b80d8
times. This patch causes llc and llvm-mc (which both default to
verbose-asm) to print out comments after a few common shuffle
instructions which indicates the shuffle mask, e.g.:
insertps $113, %xmm3, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm3[1]
unpcklps %xmm1, %xmm0 ## xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
pshufd $1, %xmm1, %xmm1 ## xmm1 = xmm1[1,0,0,0]
This is carefully factored to keep the information extraction (of the
shuffle mask) separate from the printing logic. I plan to move the
extraction part out somewhere else at some point for other parts of
the x86 backend that want to introspect on the behavior of shuffles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112387 91177308-0d34-0410-b5e6-96231b3b80d8
when the top elements of a vector are undefined. This happens all
the time for X86-64 ABI stuff because only the low 2 elements of
a 4 element vector are defined. For example, on:
_Complex float f32(_Complex float A, _Complex float B) {
return A+B;
}
We used to produce (with SSE2, SSE4.1+ uses insertps):
_f32: ## @f32
movdqa %xmm0, %xmm2
addss %xmm1, %xmm2
pshufd $16, %xmm2, %xmm2
pshufd $1, %xmm1, %xmm1
pshufd $1, %xmm0, %xmm0
addss %xmm1, %xmm0
pshufd $16, %xmm0, %xmm1
movdqa %xmm2, %xmm0
unpcklps %xmm1, %xmm0
ret
We now produce:
_f32: ## @f32
movdqa %xmm0, %xmm2
addss %xmm1, %xmm2
pshufd $1, %xmm1, %xmm1
pshufd $1, %xmm0, %xmm3
addss %xmm1, %xmm3
movaps %xmm2, %xmm0
unpcklps %xmm3, %xmm0
ret
This implements rdar://8368414
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112378 91177308-0d34-0410-b5e6-96231b3b80d8
a new EltStride variable instead of reusing NumElems variable
for a non-obvious purpose. No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112377 91177308-0d34-0410-b5e6-96231b3b80d8
According to the Microsoft documentation here:
http://msdn.microsoft.com/en-us/library/ms724284%28VS.85%29.aspx
this cast used in lib/System/Win32/Path.inc:
__int64 ft = *reinterpret_cast<__int64*>(&fi.ftLastWriteTime);
should not be done. The documentation says: "Do not cast a pointer to a
FILETIME structure to either a ULARGE_INTEGER* or __int64* value because
it can cause alignment faults on 64-bit Windows."
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112376 91177308-0d34-0410-b5e6-96231b3b80d8
Also teach this logic how to handle target specific shuffles if
needed, this is necessary while searching recursively for zeroed
scalar elements in vector shuffle operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112348 91177308-0d34-0410-b5e6-96231b3b80d8
the special values that for ARM would be used with IB or DA modes. Fall
through and consider materializing a new base address is it would be
profitable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112329 91177308-0d34-0410-b5e6-96231b3b80d8
all the other LDM/STM instructions. This fixes asm printer crashes when
compiling with -O0. I've changed one of the NEON tests (vst3.ll) to run
with -O0 to check this in the future.
Prior to this change VLDM/VSTM used addressing mode #5, but not really.
The offset field was used to hold a count of the number of registers being
loaded or stored, and the AM5 opcode field was expanded to specify the IA
or DB mode, instead of the standard ADD/SUB specifier. Much of the backend
was not aware of these special cases. The crashes occured when rewriting
a frameindex caused the AM5 offset field to be changed so that it did not
have a valid submode. I don't know exactly what changed to expose this now.
Maybe we've never done much with -O0 and NEON. Regardless, there's no longer
any reason to keep a count of the VLDM/VSTM registers, so we can use
addressing mode #4 and clean things up in a lot of places.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112322 91177308-0d34-0410-b5e6-96231b3b80d8
A = shl x, 42
...
B = lshr ..., 38
which can be transformed into:
A = shl x, 4
...
iff we can prove that the would-be-shifted-in bits
are already zero. This eliminates two shifts in the testcase
and allows eliminate of the whole i128 chain in the real example.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112314 91177308-0d34-0410-b5e6-96231b3b80d8