floating point add/sub of appropriate shuffle vectors. Does not
synthesize the 256 bit AVX versions because they work differently.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@140332 91177308-0d34-0410-b5e6-96231b3b80d8
- x87: no min or max.
- SSE1: min/max for single precision scalars and vectors.
- SSE2: min/max for single and double precision scalars and vectors.
- AVX: as SSE2, but also supports the wider ymm vectors. (this is covered by the isTypeLegal check)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@140296 91177308-0d34-0410-b5e6-96231b3b80d8
assert(!"error message");
To:
assert(0 && "error message");
which is more consistant across the code base.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@140234 91177308-0d34-0410-b5e6-96231b3b80d8
dag-combine optimization to implement the ext-load efficiently (using shuffles).
For example the type <4 x i8> is stored in memory as i32, but it needs to
find its way into a <4 x i32> register. Previously we scalarized the memory
access, now we use shuffles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139995 91177308-0d34-0410-b5e6-96231b3b80d8
maxps and maxpd). This broke the sse41-blend.ll testcase by causing
maxpd to be produced rather than a cmp+blend pair, which is the reason
I tweaked it. Gives a small speedup on doduc with dragonegg when the
GCC vectorizer is used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139986 91177308-0d34-0410-b5e6-96231b3b80d8
take into consideration the presence of AVX. This change, together with
the SSEDomainFix enabled for AVX, makes AVX codegen to always (hopefully)
emit the same code as SSE for 128-bit vector ops. I don't
have a testcase for this, but AVX now beats SSE in performance for
128-bit ops in the majority of programas in the llvm testsuite
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139817 91177308-0d34-0410-b5e6-96231b3b80d8
However with this fix it does now.
Basically the operand order for the x86 target specific node
is not the same as the instruction, but since the intrinsic need that
specific order at the instruction definition, just change the order
during legalization. Also, there were some wrong invertions of condition
codes, such as GE => LE, GT => LT, fix that too. Fix PR10907.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139528 91177308-0d34-0410-b5e6-96231b3b80d8
Undo the changes from r139285 which added custom lowering to vselect.
Add tablegen lowering for vselect.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139479 91177308-0d34-0410-b5e6-96231b3b80d8
assert("not implemented for target shuffle node");
to:
assert(0 && "not implemented for target shuffle node");
This causes a test failure in CodeGen/X86/palignr.ll which has
been marked as XFAIL for the time being.
Test failure filed at PR10901.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139454 91177308-0d34-0410-b5e6-96231b3b80d8
in Nadav's r139285 and r139287 commits.
1) Rename vsel.ll to a more descriptive name
2) Change the order of BLEND operands to "Op1, Op2, Cond", this is
necessary because PBLENDVB is already used in different places with
this order, and it was being emitted in the wrong way for vselect
3) Add AVX patterns and tests for the same SSE41 instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139305 91177308-0d34-0410-b5e6-96231b3b80d8
with a vector condition); such selects become VSELECT codegen nodes.
This patch also removes VSETCC codegen nodes, unifying them with SETCC
nodes (codegen was actually often using SETCC for vector SETCC already).
This ensures that various DAG combiner optimizations kick in for vector
comparisons. Passes dragonegg bootstrap with no testsuite regressions
(nightly testsuite as well as "make check-all"). Patch mostly by
Nadav Rotem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139159 91177308-0d34-0410-b5e6-96231b3b80d8
init.trampoline and adjust.trampoline intrinsics, into two intrinsics
like in GCC. While having one combined intrinsic is tempting, it is
not natural because typically the trampoline initialization needs to
be done in one function, and the result of adjust trampoline is needed
in a different (nested) function. To get around this llvm-gcc hacks the
nested function lowering code to insert an additional parent variable
holding the adjust.trampoline result that can be accessed from the child
function. Dragonegg doesn't have the luxury of tweaking GCC code, so it
stored the result of adjust.trampoline in the memory GCC set aside for
the trampoline itself (this is always available in the child function),
and set up some new memory (using an alloca) to hold the trampoline.
Unfortunately this breaks Go which allocates trampoline memory on the
heap and wants to use it even after the parent has exited (!). Rather
than doing even more hacks to get Go working, it seemed best to just use
two intrinsics like in GCC. Patch mostly by Sanjoy Das.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139140 91177308-0d34-0410-b5e6-96231b3b80d8
- Duplicate some store patterns to their AVX forms!
- Catched a bug while restricting the patterns subtarget, fix it
and update a testcase to check it properly
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138851 91177308-0d34-0410-b5e6-96231b3b80d8
code is inserted to first check if the current stacklet has enough
space. If so, space is allocated by simply decrementing the stack
pointer. Otherwise a runtime routine (__morestack_allocate_stack_space
in libgcc) is called which allocates the required memory from the
heap.
Patch by Sanjoy Das.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138818 91177308-0d34-0410-b5e6-96231b3b80d8
from DYNAMIC_STACKALLOC.
Two new pseudo instructions (SEG_ALLOCA_32 and SEG_ALLOCA_64) which
will match X86SegAlloca (based on word size) are also added. They
will be custom emitted to inject the actual stack handling code.
Patch by Sanjoy Das.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138814 91177308-0d34-0410-b5e6-96231b3b80d8
X86. Modify the pass added in the previous patch to call this new
code.
This new prologues generated will call a libgcc routine (__morestack)
to allocate more stack space from the heap when required
Patch by Sanjoy Das.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138812 91177308-0d34-0410-b5e6-96231b3b80d8
explicit about which subtarget they refer to, and add AVX versions of
the ones we currently don't. Make the mask check more strict, to be
clear it won't be used to match to 256-bit versions!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138514 91177308-0d34-0410-b5e6-96231b3b80d8