the X86 asmparser to produce ranges in the one case that was annoying me, for example:
test.s:10:15: error: invalid operand for instruction
movl 0(%rax), 0(%edx)
^~~~~~~
It should be straight-forward to enhance filecheck, tblgen, and/or the .ll parser to use
ranges where appropriate if someone is interested.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142106 91177308-0d34-0410-b5e6-96231b3b80d8
TableGen infers unmodeled side effects on instructions without a
pattern. Fix some instruction definitions where that was overlooked.
Also raise an error if a rematerializable instruction has unmodeled side
effects. That doen't make any sense.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@141929 91177308-0d34-0410-b5e6-96231b3b80d8
TableGen will mark any pattern-less instruction as having unmodeled side
effects. This is extra bad for V_SET0 which gets rematerialized a lot.
This was part of the cause for PR11125, but the real bug was fixed
in r141923.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@141924 91177308-0d34-0410-b5e6-96231b3b80d8
release the stack segment and reset the stack pointer. Place the code in its own
MBB to make the verifier happy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@141859 91177308-0d34-0410-b5e6-96231b3b80d8
promoting allocas to preferred alignments that exceed the natural
alignment. This avoids some potentially expensive dynamic stack realignments.
The natural stack alignment is set in target data strings via the "S<size>"
option. Size is in bits and must be a multiple of 8. The natural stack alignment
defaults to "unspecified" (represented by a zero value), and the "unspecified"
value does not prevent any alignment promotions. Target maintainers that care
about avoiding promotions should explicitly add the "S<size>" option to their
target data strings.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@141599 91177308-0d34-0410-b5e6-96231b3b80d8
A GR8_NOREX virtual register is created when extrating a sub_8bit_hi
sub-register:
%vreg2<def> = COPY %vreg1:sub_8bit_hi; GR8_NOREX:%vreg2 %GR64_ABCD:%vreg1
TEST8ri_NOREX %vreg2, 1, %EFLAGS<imp-def>; GR8_NOREX:%vreg2
If such a live range is ever split, its register class must not be
inflated to GR8. The sub-register copy can only target GR8_NOREX.
I dont have a test case for this theoretical bug.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@141500 91177308-0d34-0410-b5e6-96231b3b80d8
In 64-bit mode, sub_8bit_hi sub-registers can only be used by NOREX
instructions. The COPY created from the EXTRACT_SUBREG DAG node cannot
target all GR8 registers, only those in GR8_NOREX.
TO enforce this, we ensure that all instructions using the
EXTRACT_SUBREG are GR8_NOREX constrained.
This fixes PR11088.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@141499 91177308-0d34-0410-b5e6-96231b3b80d8
This instruction is explicitly encoded without an REX prefix, so both
operands but be *_NOREX.
Also add an assertion to copyPhysReg() that fires when the MOV8rr_NOREX
constraints are not satisfied.
This fixes a miscompilation in 20040709-2 in the gcc test suite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@141410 91177308-0d34-0410-b5e6-96231b3b80d8
There are fewer registers with sub_8bit sub-registers in 32-bit mode
than in 64-bit mode. In 32-bit mode, sub_8bit behaves the same as
sub_8bit_hi.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@141206 91177308-0d34-0410-b5e6-96231b3b80d8
This uses less memory and it reduces the complexity of sub-class
operations:
- hasSubClassEq() and friends become O(1) instead of O(N).
- getCommonSubClass() becomes O(N) instead of O(N^2).
In the future, TableGen will infer register classes. This makes it
cheap to add them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@140898 91177308-0d34-0410-b5e6-96231b3b80d8
This also makes it possible to reduce the number of pseudo instructions
and get rid of the encoding information.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@140776 91177308-0d34-0410-b5e6-96231b3b80d8
This also enables domain swizzling for AVX code which required a few
trivial test changes.
The pass will be moved to lib/CodeGen shortly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@140659 91177308-0d34-0410-b5e6-96231b3b80d8
I am going to unify the SSEDomainFix and NEONMoveFix passes into a
single target independent pass. They are essentially doing the same
thing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@140652 91177308-0d34-0410-b5e6-96231b3b80d8
We already support GR64 <-> VR128 copies. All of these copies break
partial register dependencies by zeroing the high part of the target
register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@140348 91177308-0d34-0410-b5e6-96231b3b80d8
floating point add/sub of appropriate shuffle vectors. Does not
synthesize the 256 bit AVX versions because they work differently.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@140332 91177308-0d34-0410-b5e6-96231b3b80d8
- x87: no min or max.
- SSE1: min/max for single precision scalars and vectors.
- SSE2: min/max for single and double precision scalars and vectors.
- AVX: as SSE2, but also supports the wider ymm vectors. (this is covered by the isTypeLegal check)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@140296 91177308-0d34-0410-b5e6-96231b3b80d8
assert(!"error message");
To:
assert(0 && "error message");
which is more consistant across the code base.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@140234 91177308-0d34-0410-b5e6-96231b3b80d8
dag-combine optimization to implement the ext-load efficiently (using shuffles).
For example the type <4 x i8> is stored in memory as i32, but it needs to
find its way into a <4 x i32> register. Previously we scalarized the memory
access, now we use shuffles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139995 91177308-0d34-0410-b5e6-96231b3b80d8
maxps and maxpd). This broke the sse41-blend.ll testcase by causing
maxpd to be produced rather than a cmp+blend pair, which is the reason
I tweaked it. Gives a small speedup on doduc with dragonegg when the
GCC vectorizer is used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139986 91177308-0d34-0410-b5e6-96231b3b80d8
are declared with load patterns. This fix the crash in PR10941. No testcases,
since a fold is triggered and then converted back to the register form
afterwards.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139953 91177308-0d34-0410-b5e6-96231b3b80d8
This PR basically reports a problem where a crash in generated code
happened due to %rbp being clobbered:
pushq %rbp
movq %rsp, %rbp
....
vmovmskps %ymm12, %ebp
....
movq %rbp, %rsp
popq %rbp
ret
Since Eric's r123367 commit, the default stack alignment for x86 32-bit
has changed to be 16-bytes. Since then, the MaxStackAlignmentHeuristicPass
hasn't been really used, but with AVX it becomes useful again, since per
ABI compliance we don't always align the stack to 256-bit, but only when
there are 256-bit incoming arguments.
ReserveFP was only used by this pass, but there's no RA target hook that
uses getReserveFP() to check for the presence of FP (since nothing was
triggering the pass to run, the uses of getReserveFP() were removed
through time without being noticed). Change this pass to use
setForceFramePointer, which is properly called by MachineFunction
hasFP method.
The testcase is very big and dependent on RA, not sure if it's worth
adding to test/CodeGen/X86.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139939 91177308-0d34-0410-b5e6-96231b3b80d8
take into consideration the presence of AVX. This change, together with
the SSEDomainFix enabled for AVX, makes AVX codegen to always (hopefully)
emit the same code as SSE for 128-bit vector ops. I don't
have a testcase for this, but AVX now beats SSE in performance for
128-bit ops in the majority of programas in the llvm testsuite
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139817 91177308-0d34-0410-b5e6-96231b3b80d8
alignment check for 256-bit classes more strict. There're no testcases
but we catch more folding cases for AVX while running single and multi
sources in the llvm testsuite.
Since some 128-bit AVX instructions have different number of operands
than their SSE counterparts, they are placed in different tables.
256-bit AVX instructions should also be added in the table soon. And
there a few more 128-bit versions to handled, which should come in
the following commits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139687 91177308-0d34-0410-b5e6-96231b3b80d8
more strict about the alignment checking. This was found by inspection
and I don't have any testcases so far, although the llvm testsuite runs
without any problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139625 91177308-0d34-0410-b5e6-96231b3b80d8
However with this fix it does now.
Basically the operand order for the x86 target specific node
is not the same as the instruction, but since the intrinsic need that
specific order at the instruction definition, just change the order
during legalization. Also, there were some wrong invertions of condition
codes, such as GE => LE, GT => LT, fix that too. Fix PR10907.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139528 91177308-0d34-0410-b5e6-96231b3b80d8
Undo the changes from r139285 which added custom lowering to vselect.
Add tablegen lowering for vselect.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139479 91177308-0d34-0410-b5e6-96231b3b80d8
assert("not implemented for target shuffle node");
to:
assert(0 && "not implemented for target shuffle node");
This causes a test failure in CodeGen/X86/palignr.ll which has
been marked as XFAIL for the time being.
Test failure filed at PR10901.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139454 91177308-0d34-0410-b5e6-96231b3b80d8
single field (Flags), which is a bitwise OR of items from the TB_*
enum. This makes it easier to add new information in the future.
* Gives every static array an equivalent layout: { RegOp, MemOp, Flags }
* Adds a helper function, AddTableEntry, to avoid duplication of the
insertion code.
* Renames TB_NOT_REVERSABLE to TB_NO_REVERSE.
* Adds TB_NO_FORWARD, which is analogous to TB_NO_REVERSE, except that
it prevents addition of the Reg->Mem entry. (This is going to be used
by Native Client, in the next CL).
Patch by David Meyer
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139311 91177308-0d34-0410-b5e6-96231b3b80d8
in Nadav's r139285 and r139287 commits.
1) Rename vsel.ll to a more descriptive name
2) Change the order of BLEND operands to "Op1, Op2, Cond", this is
necessary because PBLENDVB is already used in different places with
this order, and it was being emitted in the wrong way for vselect
3) Add AVX patterns and tests for the same SSE41 instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139305 91177308-0d34-0410-b5e6-96231b3b80d8
with a vector condition); such selects become VSELECT codegen nodes.
This patch also removes VSETCC codegen nodes, unifying them with SETCC
nodes (codegen was actually often using SETCC for vector SETCC already).
This ensures that various DAG combiner optimizations kick in for vector
comparisons. Passes dragonegg bootstrap with no testsuite regressions
(nightly testsuite as well as "make check-all"). Patch mostly by
Nadav Rotem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139159 91177308-0d34-0410-b5e6-96231b3b80d8
init.trampoline and adjust.trampoline intrinsics, into two intrinsics
like in GCC. While having one combined intrinsic is tempting, it is
not natural because typically the trampoline initialization needs to
be done in one function, and the result of adjust trampoline is needed
in a different (nested) function. To get around this llvm-gcc hacks the
nested function lowering code to insert an additional parent variable
holding the adjust.trampoline result that can be accessed from the child
function. Dragonegg doesn't have the luxury of tweaking GCC code, so it
stored the result of adjust.trampoline in the memory GCC set aside for
the trampoline itself (this is always available in the child function),
and set up some new memory (using an alloca) to hold the trampoline.
Unfortunately this breaks Go which allocates trampoline memory on the
heap and wants to use it even after the parent has exited (!). Rather
than doing even more hacks to get Go working, it seemed best to just use
two intrinsics like in GCC. Patch mostly by Sanjoy Das.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139140 91177308-0d34-0410-b5e6-96231b3b80d8
instructions are more aligned than the CPU requires, and adds some additional
directives, to follow in future patches. Patch by David Meyer!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139125 91177308-0d34-0410-b5e6-96231b3b80d8
The explanation about a 0 argument being materialized as xor is no
longer valid. Rematerialization will check if EFLAGS is live before
clobbering it.
The code produced by X86TargetLowering::EmitLoweredSelect does not
clobber EFLAGS.
This causes one less testb instruction to be generated in the cmov.ll
test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139057 91177308-0d34-0410-b5e6-96231b3b80d8
- Duplicate some store patterns to their AVX forms!
- Catched a bug while restricting the patterns subtarget, fix it
and update a testcase to check it properly
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138851 91177308-0d34-0410-b5e6-96231b3b80d8
In the case of EDInstInfo, this would actually cause a bug when -1 became 255
and was then compared >=0 in llvm-mc/Disassembler.cpp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138825 91177308-0d34-0410-b5e6-96231b3b80d8
code is inserted to first check if the current stacklet has enough
space. If so, space is allocated by simply decrementing the stack
pointer. Otherwise a runtime routine (__morestack_allocate_stack_space
in libgcc) is called which allocates the required memory from the
heap.
Patch by Sanjoy Das.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138818 91177308-0d34-0410-b5e6-96231b3b80d8
from DYNAMIC_STACKALLOC.
Two new pseudo instructions (SEG_ALLOCA_32 and SEG_ALLOCA_64) which
will match X86SegAlloca (based on word size) are also added. They
will be custom emitted to inject the actual stack handling code.
Patch by Sanjoy Das.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138814 91177308-0d34-0410-b5e6-96231b3b80d8
X86. Modify the pass added in the previous patch to call this new
code.
This new prologues generated will call a libgcc routine (__morestack)
to allocate more stack space from the heap when required
Patch by Sanjoy Das.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138812 91177308-0d34-0410-b5e6-96231b3b80d8
explicit about which subtarget they refer to, and add AVX versions of
the ones we currently don't. Remove old and now wrong comments!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138515 91177308-0d34-0410-b5e6-96231b3b80d8
explicit about which subtarget they refer to, and add AVX versions of
the ones we currently don't. Make the mask check more strict, to be
clear it won't be used to match to 256-bit versions!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138514 91177308-0d34-0410-b5e6-96231b3b80d8
SSE transition penalty. The pass is enabled through the "x86-use-vzeroupper"
llc command line option. This is only the first step (very naive and
conservative one) to sketch out the idea, but proper DFA is coming next
to allow smarter decisions. Comments and ideas now and in further commits
will be very appreciated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138317 91177308-0d34-0410-b5e6-96231b3b80d8
instead of 2. They were already defined this way in their regular
version, but not for the intrinsics versions (*_Int), and that would work
for assembly emission but not for object code, since a MachineOperand
would be missing. This commit fix PR10697.
Also removed the {VSQRT,VRSQRT,VRCP}r_Int forms and match the intrinsic
via INSERT_SUBREG+EXTRACT_SUBREG patterns. The same couldn't be done for
memory versions because sse_load_f32/sse_load_f64 operand need special
handling and don't work like regular "addr" operands.
There are right now 114 "*_Int" and 98 "Int_*" forms! I'm slowly
removing them as I step through, but hope we can get rid of these
someday, they are really annoying :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138012 91177308-0d34-0410-b5e6-96231b3b80d8
match splats in the form (splat (scalar_to_vector (load ...))) whenever
the load can be folded. All the logic and instruction emission is
working but because of PR8156, there are no ways to match loads, cause
they can never be folded for splats. Thus, the tests are XFAILed, but
I've tested and exercised all the logic using a relaxed version for
checking the foldable loads, as if the bug was already fixed. This
should work out of the box once PR8156 gets fixed since MayFoldLoad will
work as expected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137810 91177308-0d34-0410-b5e6-96231b3b80d8
vinsertf128 $1 + vpermilps $0, remove the old code that used to first
do the splat in a 128-bit vector and then insert it into a larger one.
This is better because the handling code gets simpler and also makes a
better room for the upcoming vbroadcast!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137807 91177308-0d34-0410-b5e6-96231b3b80d8
there is no support for native 256-bit shuffles, be more smart in some
cases, for example, when you can extract specific 128-bit parts and use
regular 128-bit shuffles for them. Example:
For this shuffle:
shufflevector <4 x i64> %a, <4 x i64> %b, <4 x i32>
<i32 1, i32 0, i32 7, i32 6>
This was expanded to:
vextractf128 $1, %ymm1, %xmm2
vpextrq $0, %xmm2, %rax
vmovd %rax, %xmm1
vpextrq $1, %xmm2, %rax
vmovd %rax, %xmm2
vpunpcklqdq %xmm1, %xmm2, %xmm1
vpextrq $0, %xmm0, %rax
vmovd %rax, %xmm2
vpextrq $1, %xmm0, %rax
vmovd %rax, %xmm0
vpunpcklqdq %xmm2, %xmm0, %xmm0
vinsertf128 $1, %xmm1, %ymm0, %ymm0
ret
Now we get:
vshufpd $1, %xmm0, %xmm0, %xmm0
vextractf128 $1, %ymm1, %xmm1
vshufpd $1, %xmm1, %xmm1, %xmm1
vinsertf128 $1, %xmm1, %ymm0, %ymm0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137733 91177308-0d34-0410-b5e6-96231b3b80d8
Allow a target assembly parser to do context sensitive constraint checking
on a potential instruction match. This will be used, for example, to handle
Thumb2 IT block parsing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137675 91177308-0d34-0410-b5e6-96231b3b80d8
vectors. It operates on 128-bit elements instead of regular scalar
types. Recognize shuffles that are suitable for VPERM2F128 and teach
the x86 legalizer how to handle them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137519 91177308-0d34-0410-b5e6-96231b3b80d8
inserts and extracts. This simple combine makes us generate only 1
instruction instead of 11 in the v8 case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137362 91177308-0d34-0410-b5e6-96231b3b80d8
(for example, after integer operation), do not pack the registers into a YMM
before saving. Its better to save as two XMM registers.
Before:
vinsertf128 $1, %xmm3, %ymm0, %ymm3
vinsertf128 $0, %xmm1, %ymm3, %ymm1
vmovaps %ymm1, 416(%rsp)
After:
vmovaps %xmm3, 416+16(%rsp)
vmovaps %xmm1, 416(%rsp)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137308 91177308-0d34-0410-b5e6-96231b3b80d8
data in-register prior to saving to memory. When we reorder the data in memory
we prevent the need to save multiple scalars to memory, making a single regular
store.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137238 91177308-0d34-0410-b5e6-96231b3b80d8
def : Pat<(X86Movss VR128:$src1,
(bc_v4i32 (v2i64 (load addr:$src2)))),
(MOVLPSrm VR128:$src1, addr:$src2)>;
This matches a MOVSS dag with a MOVLPS instruction. However, MOVSS will replace only the low 32 bits of the register, while the MOVLPS instruction will replace the low 64 bits. A testcase is added and illustrates the bug and also modified the one that was already present. Patch by Tanya Lattner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137227 91177308-0d34-0410-b5e6-96231b3b80d8