This is much simpler to reason about, more efficient, and
fixes some corner cases involving implicit super-register defs.
Fixed rdar://12797931.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169425 91177308-0d34-0410-b5e6-96231b3b80d8
The count attribute is more accurate with regards to the size of an array. It
also obviates the upper bound attribute in the subrange. We can also better
handle an unbound array by setting the count to -1 instead of the lower bound to
1 and upper bound to 0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169312 91177308-0d34-0410-b5e6-96231b3b80d8
on 64-bit PowerPC ELF.
The patch includes code to handle external assembly and MC output with the
integrated assembler. It intentionally does not support the "old" JIT.
For the initial-exec TLS model, the ABI requires the following to calculate
the address of external thread-local variable x:
Code sequence Relocation Symbol
ld 9,x@got@tprel(2) R_PPC64_GOT_TPREL16_DS x
add 9,9,x@tls R_PPC64_TLS x
The register 9 is arbitrary here. The linker will replace x@got@tprel
with the offset relative to the thread pointer to the generated GOT
entry for symbol x. It will replace x@tls with the thread-pointer
register (13).
The two test cases verify correct assembly output and relocation output
as just described.
PowerPC-specific selection node variants are added for the two
instructions above: LD_GOT_TPREL and ADD_TLS. These are inserted
when an initial-exec global variable is encountered by
PPCTargetLowering::LowerGlobalTLSAddress(), and later lowered to
machine instructions LDgotTPREL and ADD8TLS. LDgotTPREL is a pseudo
that uses the same LDrs support added for medium code model's LDtocL,
with a different relocation type.
The rest of the processing is straightforward.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169281 91177308-0d34-0410-b5e6-96231b3b80d8
The count field is necessary because there isn't a difference between the 'lo'
and 'hi' attributes for a one-element array and a zero-element array. When the
count is '0', we know that this is a zero-element array. When it's >=1, then
it's a normal constant sized array. When it's -1, then the array is unbounded.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169218 91177308-0d34-0410-b5e6-96231b3b80d8
the alignment is clamped to TargetFrameLowering.getStackAlignment if the target
does not support stack realignment or the option "realign-stack" is off.
This will cause miscompile if the address is treated as aligned and add is
replaced with or in DAGCombine.
Added a bool StackRealignable to TargetFrameLowering to check whether stack
realignment is implemented for the target. Also added a bool RealignOption
to MachineFrameInfo to check whether the option "realign-stack" is on.
rdar://12713765
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169197 91177308-0d34-0410-b5e6-96231b3b80d8
The TwoAddressInstructionPass takes the machine code out of SSA form by
expanding REG_SEQUENCE instructions into copies. It is no longer
necessary to rewrite the registers used by a REG_SEQUENCE instruction
because the new coalescer algorithm can do it now.
REG_SEQUENCE is just converted to a sequence of sub-register copies now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169067 91177308-0d34-0410-b5e6-96231b3b80d8
Codegen was failing with an assertion because of unexpected vector
operands when legalizing the selection DAG for a MUL instruction.
The asserting code was legalizing multiplies for vectors of size 128
bits. It uses a custom lowering to try and detect cases where it can
use a VMULL instruction instead of a VMOVL + VMUL. The code was
looking for input operands to the MUL that had been sign or zero
extended. If it found the extended operands it would drop the
sign/zero extension and use the original vector size as input to a
VMULL instruction.
The code assumed that the original input vector was 64 bits so that
after dropping the extension it would fit directly into a D register
and could be used as an operand of a VMULL instruction. The input
code that trigger the failure used a vector of <4 x i8> that was
sign extended to <4 x i32>. It was not safe to drop the sign
extension in this case because the original vector is only 32 bits
wide. The fix is to insert a sign extension for the vector to reach
the required 64 bit size. In this particular example, the vector would
need to be sign extented to a <4 x i16>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169024 91177308-0d34-0410-b5e6-96231b3b80d8
instruction (vmaddfp) to conform with IEEE to ensure the sign of a zero
result when resulting product is -0.0.
The -0.0 vector addend to vmaddfp is generated by a creating a vector
with full bits sets and then shifting each elements by 31-bits to the
left, resulting in a vector of 0x80000000 (or -0.0 as float).
The 'buildvec_canonicalize.ll' was adjusted to reflect this change and
the 'vec_mul.ll' was complemented with the float vector multiplication
test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168998 91177308-0d34-0410-b5e6-96231b3b80d8
the last invoke instruction in the function. This also removes the last landing
pad in an function. This is fine, but with SjLj EH code, we've already placed a
bunch of code in the 'entry' block, which expects the landing pad to stick
around.
When we get to the situation where CGP has removed the last landing pad, go
ahead and nuke the SjLj instructions from the 'entry' block.
<rdar://problem/12721258>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168930 91177308-0d34-0410-b5e6-96231b3b80d8
If we need to split the operand of a VSELECT, it must be the mask operand. We
split the entire VSELECT operand with EXTRACT_SUBVECTOR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168883 91177308-0d34-0410-b5e6-96231b3b80d8
For some targets, it is desirable to prefer scalarizing <N x i1> instead of promoting to a larger legal type, such as <N x i32>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168882 91177308-0d34-0410-b5e6-96231b3b80d8
This could cause miscompilations in targets where sub-register
composition is not always idempotent (ARM).
<rdar://problem/12758887>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168837 91177308-0d34-0410-b5e6-96231b3b80d8
This is a simple, cheap infrastructure for analyzing the shape of a
DAG. It recognizes uniform DAGs that take the shape of bottom-up
subtrees, such as the included matrix multiplication example. This is
useful for heuristics that balance register pressure with ILP. Two
canonical expressions of the heuristic are implemented in scheduling
modes: -misched-ilpmin and -misched-ilpmax.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168773 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a hole in the "cheap" alias analysis logic implemented within
the DAG builder itself, regardless of whether proper alias analysis is
enabled. It now handles this pattern produced by LSR+CodeGenPrepare.
%sunkaddr1 = ptrtoint * %obj to i64
%sunkaddr2 = add i64 %sunkaddr1, %lsr.iv
%sunkaddr3 = inttoptr i64 %sunkaddr2 to i32*
store i32 %v, i32* %sunkaddr3
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168768 91177308-0d34-0410-b5e6-96231b3b80d8
When the CodeGenInfo is to be created for the PPC64 target machine,
a default code-model selection is converted to CodeModel::Medium
provided we are not targeting the Darwin OS. Defaults for Darwin
are unaffected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168747 91177308-0d34-0410-b5e6-96231b3b80d8
boundaries.
Given the following case:
BB0
%vreg1<def> = SUBrr %vreg0, %vreg7
%vreg2<def> = COPY %vreg7
BB1
%vreg10<def> = SUBrr %vreg0, %vreg2
We should be able to CSE between SUBrr in BB0 and SUBrr in BB1.
rdar://12462006
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168717 91177308-0d34-0410-b5e6-96231b3b80d8
when the destination register is wider than the memory load.
These load instructions load from m32 or m64 and set the upper bits to zero,
while the folded instructions may accept m128.
rdar://12721174
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168710 91177308-0d34-0410-b5e6-96231b3b80d8
The default for 64-bit PowerPC is small code model, in which TOC entries
must be addressable using a 16-bit offset from the TOC pointer. Additionally,
only TOC entries are addressed via the TOC pointer.
With medium code model, TOC entries and data sections can all be addressed
via the TOC pointer using a 32-bit offset. Cooperation with the linker
allows 16-bit offsets to be used when these are sufficient, reducing the
number of extra instructions that need to be executed. Medium code model
also does not generate explicit TOC entries in ".section toc" for variables
that are wholly internal to the compilation unit.
Consider a load of an external 4-byte integer. With small code model, the
compiler generates:
ld 3, .LC1@toc(2)
lwz 4, 0(3)
.section .toc,"aw",@progbits
.LC1:
.tc ei[TC],ei
With medium model, it instead generates:
addis 3, 2, .LC1@toc@ha
ld 3, .LC1@toc@l(3)
lwz 4, 0(3)
.section .toc,"aw",@progbits
.LC1:
.tc ei[TC],ei
Here .LC1@toc@ha is a relocation requesting the upper 16 bits of the
32-bit offset of ei's TOC entry from the TOC base pointer. Similarly,
.LC1@toc@l is a relocation requesting the lower 16 bits. Note that if
the linker determines that ei's TOC entry is within a 16-bit offset of
the TOC base pointer, it will replace the "addis" with a "nop", and
replace the "ld" with the identical "ld" instruction from the small
code model example.
Consider next a load of a function-scope static integer. For small code
model, the compiler generates:
ld 3, .LC1@toc(2)
lwz 4, 0(3)
.section .toc,"aw",@progbits
.LC1:
.tc test_fn_static.si[TC],test_fn_static.si
.type test_fn_static.si,@object
.local test_fn_static.si
.comm test_fn_static.si,4,4
For medium code model, the compiler generates:
addis 3, 2, test_fn_static.si@toc@ha
addi 3, 3, test_fn_static.si@toc@l
lwz 4, 0(3)
.type test_fn_static.si,@object
.local test_fn_static.si
.comm test_fn_static.si,4,4
Again, the linker may replace the "addis" with a "nop", calculating only
a 16-bit offset when this is sufficient.
Note that it would be more efficient for the compiler to generate:
addis 3, 2, test_fn_static.si@toc@ha
lwz 4, test_fn_static.si@toc@l(3)
The current patch does not perform this optimization yet. This will be
addressed as a peephole optimization in a later patch.
For the moment, the default code model for 64-bit PowerPC will remain the
small code model. We plan to eventually change the default to medium code
model, which matches current upstream GCC behavior. Note that the different
code models are ABI-compatible, so code compiled with different models will
be linked and execute correctly.
I've tested the regression suite and the application/benchmark test suite in
two ways: Once with the patch as submitted here, and once with additional
logic to force medium code model as the default. The tests all compile
cleanly, with one exception. The mandel-2 application test fails due to an
unrelated ABI compatibility with passing complex numbers. It just so happens
that small code model was incredibly lucky, in that temporary values in
floating-point registers held the expected values needed by the external
library routine that was called incorrectly. My current thought is to correct
the ABI problems with _Complex before making medium code model the default,
to avoid introducing this "regression."
Here are a few comments on how the patch works, since the selection code
can be difficult to follow:
The existing logic for small code model defines three pseudo-instructions:
LDtoc for most uses, LDtocJTI for jump table addresses, and LDtocCPT for
constant pool addresses. These are expanded by SelectCodeCommon(). The
pseudo-instruction approach doesn't work for medium code model, because
we need to generate two instructions when we match the same pattern.
Instead, new logic in PPCDAGToDAGISel::Select() intercepts the TOC_ENTRY
node for medium code model, and generates an ADDIStocHA followed by either
a LDtocL or an ADDItocL. These new node types correspond naturally to
the sequences described above.
The addis/ld sequence is generated for the following cases:
* Jump table addresses
* Function addresses
* External global variables
* Tentative definitions of global variables (common linkage)
The addis/addi sequence is generated for the following cases:
* Constant pool entries
* File-scope static global variables
* Function-scope static variables
Expanding to the two-instruction sequences at select time exposes the
instructions to subsequent optimization, particularly scheduling.
The rest of the processing occurs at assembly time, in
PPCAsmPrinter::EmitInstruction. Each of the instructions is converted to
a "real" PowerPC instruction. When a TOC entry needs to be created, this
is done here in the same manner as for the existing LDtoc, LDtocJTI, and
LDtocCPT pseudo-instructions (I factored out a new routine to handle this).
I had originally thought that if a TOC entry was needed for LDtocL or
ADDItocL, it would already have been generated for the previous ADDIStocHA.
However, at higher optimization levels, the ADDIStocHA may appear in a
different block, which may be assembled textually following the block
containing the LDtocL or ADDItocL. So it is necessary to include the
possibility of creating a new TOC entry for those two instructions.
Note that for LDtocL, we generate a new form of LD called LDrs. This
allows specifying the @toc@l relocation for the offset field of the LD
instruction (i.e., the offset is replaced by a SymbolLo relocation).
When the peephole optimization described above is added, we will need
to do similar things for all immediate-form load and store operations.
The seven "mcm-n.ll" test cases are kept separate because otherwise the
intermingling of various TOC entries and so forth makes the tests fragile
and hard to understand.
The above assumes use of an external assembler. For use of the
integrated assembler, new relocations are added and used by
PPCELFObjectWriter. Testing is done with "mcm-obj.ll", which tests for
proper generation of the various relocations for the same sequences
tested with the external assembler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168708 91177308-0d34-0410-b5e6-96231b3b80d8
argument. Instead, use a pair of .local and .comm directives.
This avoids spurious differences between binaries built by the
integrated assembler vs. those built by the external assembler,
since the external assembler may impose alignment requirements
on .lcomm symbols where the integrated assembler does not.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168704 91177308-0d34-0410-b5e6-96231b3b80d8
This pass was conservative in that it always reserved the FP to enable dynamic
stack realignment, which allowed the RA to use aligned spills for vector
registers. This happens even when spills were not necessary. The RA has
since been improved to use unaligned spills when necessary.
The new behavior is to realign the stack if the frame pointer was already
reserved for some other reason, but don't reserve the frame pointer just
because a function contains vector virtual registers.
Part of rdar://12719844
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168627 91177308-0d34-0410-b5e6-96231b3b80d8
In preparation for the FileCheck functionality change which will allow using
a variable later on the same line.
No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168588 91177308-0d34-0410-b5e6-96231b3b80d8
The last remaining bit is "bcl 20, 31, AnonSymbol", which I couldn't find the
instruction definition for. Only whitespace changes in assembly output.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168541 91177308-0d34-0410-b5e6-96231b3b80d8
It turned out that ARM wants different layout of type infos.
This is yet another patch in attempt to fix PR7187
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168325 91177308-0d34-0410-b5e6-96231b3b80d8
On PPC the stack pointer is X1, but ADJCALLSTACK writes R1.
Fixes PR14315: Register regmask dependency problem with misched.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168248 91177308-0d34-0410-b5e6-96231b3b80d8
This patch replaces the hard coded GPR pair [R0, R1] of
Intrinsic:arm_ldrexd and [R2, R3] of Intrinsic:arm_strexd with
even/odd GPRPair reg class.
Similar to the lowering of atomic_64 operation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168207 91177308-0d34-0410-b5e6-96231b3b80d8
This patch lowers the llvm.floor, llvm.ceil, llvm.trunc, and
llvm.nearbyint to Altivec instruction when using 4 single-precision
float vectors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168086 91177308-0d34-0410-b5e6-96231b3b80d8
PPC64 target. The five tests modified herein test code generation that is
sensitive to the code model selected. So I've added -code-model=small to
the RUN commands for each.
Since small code model is the default, this has no effect for now; but this
prepares us for eventually changing the default to medium code model for PPC64.
Test changes verified with small and medium code model as default on
powerpc64-unknown-linux-gnu. All tests continue to pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167999 91177308-0d34-0410-b5e6-96231b3b80d8
The stack realignment code was fixed to work when there is stack realignment and
a dynamic alloca is present so this shouldn't cause correctness issues anymore.
Note that this also enables generation of AVX instructions for memset
under the assumptions:
- Unaligned loads/stores are always fast on CPUs supporting AVX
- AVX is not slower than SSE
We may need some tweaked heuristics if one of those assumptions turns out not to
be true.
Effectively reverts r58317. Part of PR2962.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167967 91177308-0d34-0410-b5e6-96231b3b80d8
This patch changes the definition of negative from -0..-255 to -1..-255. I am changing this because of
a bug that we had in some of the patterns that assumed that "subs" of zero does not set the carry flag.
rdar://12028498
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167963 91177308-0d34-0410-b5e6-96231b3b80d8
Loads from i1 become loads from i8 followed by trunc
Stores to i1 become zext to i8 followed by store to i8
Fixes PR13291
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167948 91177308-0d34-0410-b5e6-96231b3b80d8
eh table and handler data if there are no landing pads in the function.
Patch by Logan Chien with some cleanups from me.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167945 91177308-0d34-0410-b5e6-96231b3b80d8
temporarily as it is breaking the gdb bots.
This reverts commit r167806/e7ff4c14b157746b3e0228d2dce9f70712d1c126.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167886 91177308-0d34-0410-b5e6-96231b3b80d8
chain is correctly setup.
As an example, if the original load must happen before later stores, we need
to make sure the constructed VZEXT_LOAD is constrained to be before the stores.
rdar://12684358
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167859 91177308-0d34-0410-b5e6-96231b3b80d8
physical register as candidate for common subexpression elimination
in MachineCSE.
This fixes a bug on PowerPC in MultiSource/Applications/oggenc/oggenc
caused by MachineCSE invalidly merging two separate DYNALLOC insns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167855 91177308-0d34-0410-b5e6-96231b3b80d8
If we have a type 'int a[1]' and a type 'int b[0]', the generated DWARF is the
same for both of them because we use the 'upper_bound' attribute. Instead use
the 'count' attrbute, which gives the correct number of elements in the array.
<rdar://problem/12566646>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167806 91177308-0d34-0410-b5e6-96231b3b80d8
This adds support for weak DAG edges to the general scheduling
infrastructure in preparation for MachineScheduler support for
heuristics based on weak edges.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167738 91177308-0d34-0410-b5e6-96231b3b80d8
- Fix operand order for atomic sub, where the minuend is the value
loaded from memory and the subtrahend is the parameter specified.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167718 91177308-0d34-0410-b5e6-96231b3b80d8
Each SM and PTX version is modeled as a subtarget feature/CPU. Additionally,
PTX 3.1 is added as the default PTX version to be out-of-the-box compatible
with CUDA 5.0.
Available CPUs for this target:
sm_10 - Select the sm_10 processor.
sm_11 - Select the sm_11 processor.
sm_12 - Select the sm_12 processor.
sm_13 - Select the sm_13 processor.
sm_20 - Select the sm_20 processor.
sm_21 - Select the sm_21 processor.
sm_30 - Select the sm_30 processor.
sm_35 - Select the sm_35 processor.
Available features for this target:
ptx30 - Use PTX version 3.0.
ptx31 - Use PTX version 3.1.
sm_10 - Target SM 1.0.
sm_11 - Target SM 1.1.
sm_12 - Target SM 1.2.
sm_13 - Target SM 1.3.
sm_20 - Target SM 2.0.
sm_21 - Target SM 2.1.
sm_30 - Target SM 3.0.
sm_35 - Target SM 3.5.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167699 91177308-0d34-0410-b5e6-96231b3b80d8
mov lr, pc
b.w _foo
The "mov" instruction doesn't set bit zero to one, it's putting incorrect
value in lr. It messes up backtraces.
rdar://12663632
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167657 91177308-0d34-0410-b5e6-96231b3b80d8
The RegMaskSlots contains 'r' slots while NewIdx and OldIdx are 'B'
slots. This broke the checks in the assertions.
This fixes PR14302.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167625 91177308-0d34-0410-b5e6-96231b3b80d8
Improve ARM build attribute emission for architectures types.
This also changes the default architecture emitted for a generic CPU to "v7".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167574 91177308-0d34-0410-b5e6-96231b3b80d8
- Add RTM code generation support throught 3 X86 intrinsics:
xbegin()/xend() to start/end a transaction region, and xabort() to abort a
tranaction region
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167573 91177308-0d34-0410-b5e6-96231b3b80d8
misched is disabled by default. With -enable-misched, these heuristics
balance the schedule to simultaneously avoid saturating processor
resources, expose ILP, and minimize register pressure. I've been
analyzing the performance of these heuristics on everything in the
llvm test suite in addition to a few other benchmarks. I would like
each heuristic check to be verified by a unit test, but I'm still
trying to figure out the best way to do that. The heuristics are still
in considerable flux, but as they are refined we should be rigorous
about unit testing the improvements.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167527 91177308-0d34-0410-b5e6-96231b3b80d8
to be extended to a full register. This is modeled in the IR by marking
the return value (or argument) with a signext or zeroext attribute.
However, while these attributes are respected for function arguments,
they are currently ignored for function return values by the PowerPC
back-end. This patch updates PPCCallingConv.td to ask for the promotion
to i64, and fixes LowerReturn and LowerCallResult to implement it.
The new test case verifies that both arguments and return values are
properly extended when passing them; and also that the optimizers
understand incoming argument and return values are in fact guaranteed
by the ABI to be extended.
The patch caused a spurious breakage in CodeGen/PowerPC/coalesce-ext.ll,
since the test case used a "ret" instruction to create a use of an i32
value at the end of the function (to set up data flow as required for
what the test is intended to test). Since there's now an implicit
promotion to i64, that data flow no longer works as expected. To fix
this, this patch now adds an extra "add" to ensure we have an appropriate
use of the i32 value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167396 91177308-0d34-0410-b5e6-96231b3b80d8
The Z constraint specifies an r+r memory address, and the y modifier expands
to the "r, r" in the asm string. For this initial implementation, the base
register is forced to r0 (which has the special meaning of 0 for r+r addressing
on PowerPC) and the full address is taken in the second register. In the
future, this should be improved.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167388 91177308-0d34-0410-b5e6-96231b3b80d8
This patch expands the SEXTLOAD, ZEXTLOAD, and EXTLOAD operations for
vector types when altivec is enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167386 91177308-0d34-0410-b5e6-96231b3b80d8
The adc/sbb optimization is to able to convert following expression
into a single adc/sbb instruction:
(ult) ... = x + 1 // where the ult is unsigned-less-than comparison
(ult) ... = x - 1
This change is to flip the "x >u y" (i.e. ugt comparison) in order
to expose the adc/sbb opportunity.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167180 91177308-0d34-0410-b5e6-96231b3b80d8
parameters. Examples of these are:
struct { } a;
union { } b[256];
int a[0];
An empty aggregate has an address, although dereferencing that address is
pointless. When passed as a parameter, an empty aggregate does not consume
a protocol register, nor does it consume a doubleword in the parameter save
area. Passing an empty aggregate by reference passes an address just as
for any other aggregate. Returning an empty aggregate uses GPR3 as a hidden
address of the return value location, just as for any other aggregate.
The patch modifies PPCTargetLowering::LowerFormalArguments_64SVR4 and
PPCTargetLowering::LowerCall_64SVR4 to properly skip empty aggregate
parameters passed by value. The handling of return values and by-reference
parameters was already correct.
Built on powerpc64-unknown-linux-gnu and tested with no new regressions.
A test case is included to test proper handling of empty aggregate
parameters on both sides of the function call protocol.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167090 91177308-0d34-0410-b5e6-96231b3b80d8
the first source operand is tied to the destination operand.
This is to accurately model the corresponding instructions where the upper
bits are unmodified.
rdar://12558838
PR14221
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167064 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds more support for vector type comparisons using altivec.
It adds correct support for v16i8, v8i16, v4i32, and v4f32 vector
types for comparison operators ==, !=, >, >=, <, and <=.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167015 91177308-0d34-0410-b5e6-96231b3b80d8
When the switch-to-lookup tables transform landed in SimplifyCFG, it
was pointed out that this could be inappropriate for some targets.
Since there was no way at the time for the pass to know anything about
the target, an awkward reverse-transform was added in CodeGenPrepare
that turned lookup tables back into switches for some targets.
This patch uses the new TargetTransformInfo to determine if a
switch should be transformed, and removes
CodeGenPrepare::ConvertLoadToSwitch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167011 91177308-0d34-0410-b5e6-96231b3b80d8
We will make them delay slot forms if there is something that can be
placed in the delay slot during a separate pass. Mips16 extended instructions
cannot be placed in delay slots.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166990 91177308-0d34-0410-b5e6-96231b3b80d8
ELF ABI.
A varargs parameter consisting of a single-precision floating-point value,
or of a single-element aggregate containing a single-precision floating-point
value, must be passed in the low-order (rightmost) four bytes of the
doubleword stack slot reserved for that parameter. If there are GPR protocol
registers remaining, the parameter must also be mirrored in the low-order
four bytes of the reserved GPR.
Prior to this patch, such parameters were being passed in the high-order
four bytes of the stack slot and the mirrored GPR.
The patch adds a new test case to verify the correct code generation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166968 91177308-0d34-0410-b5e6-96231b3b80d8
checks to avoid performing compile-time arithmetic on PPCDoubleDouble.
Now that APFloat supports arithmetic on PPCDoubleDouble, those checks
are no longer needed, and we can treat the type like any other.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166958 91177308-0d34-0410-b5e6-96231b3b80d8
Partial copies can show up even when CoalescerPair.isPartial() returns
false. For example:
%vreg24:dsub_0<def> = COPY %vreg31:dsub_0; QPR:%vreg24,%vreg31
Such a partial-partial copy is not good enough for the transformation
adjustCopiesBackFrom() needs to do.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166944 91177308-0d34-0410-b5e6-96231b3b80d8
incorrect instruction sequence due to it not being aware that an
inline assembly instruction may reference memory.
This patch fixes the problem by causing the scheduler to always assume that any
inline assembly code instruction could access memory. This is necessary because
the internal representation of the inline instruction does not include
any information about memory accesses.
This should fix PR13504.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166929 91177308-0d34-0410-b5e6-96231b3b80d8
ELF subtarget.
The existing logic is used as a fallback to avoid any changes to the Darwin
ABI. PPC64 ELF now has two possible data layout strings: one for FreeBSD,
which requires 8-byte alignment, and a default string that requires
16-byte alignment.
I've added a test for PPC64 Linux to verify the 16-byte alignment. If
somebody wants to add a separate test for FreeBSD, that would be great.
Note that there is a companion patch to update the alignment information
in Clang, which I am committing now as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166928 91177308-0d34-0410-b5e6-96231b3b80d8
Previously mips16 was sharing the pattern addr which is used for mips32
and mips64. This had a number of problems:
1) Storing and loading byte and halfword quantities for mips16 has particular
problems due to the primarily non mips16 nature of SP. When we must
load/store byte/halfword stack objects in a function, we must create a mips16
alias register for SP. This functionality is tested in stchar.ll.
2) We need to have an FP register under certain conditions (such as
dynamically sized alloca). We use mips16 register S0 for this purpose.
In this case, we also use this register when accessing frame objects so this
issue also affects the complex pattern addr16. This functionality is
tested in alloca16.ll.
The Mips16InstrInfo.td has been updated to use addr16 instead of addr.
The complex pattern C++ function for addr has been copied to addr16 and
updated to reflect the above issues.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166897 91177308-0d34-0410-b5e6-96231b3b80d8
Keep the integer_insertelement test case, the new coalescer can handle
this kind of lane insertion without help from pseudo-instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166835 91177308-0d34-0410-b5e6-96231b3b80d8
Some instructions in ARM require 2 even-odd paired GPRs. This
patch adds support for such register class.
Patch by Weiming Zhao!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166816 91177308-0d34-0410-b5e6-96231b3b80d8
structs having size 3, 5, 6, or 7. Such a struct must be passed and received
as right-justified within its register or memory slot. The problem is only
present for structs that are passed in registers.
Previously, as part of a patch handling all structs of size less than 8, I
added logic to rotate the incoming register so that the struct was left-
justified prior to storing the whole register. This was incorrect because
the address of the parameter had already been adjusted earlier to point to
the right-adjusted value in the storage slot. Essentially I had accidentally
accounted for the right-adjustment twice.
In this patch, I removed the incorrect logic and reorganized the code to make
the flow clearer.
The removal of the rotates changes the expected code generation, so test case
structsinregs.ll has been modified to reflect this. I also added a new test
case, jaggedstructs.ll, to demonstrate that structs of these sizes can now
be properly received and passed.
I've built and tested the code on powerpc64-unknown-linux-gnu with no new
regressions. I also ran the GCC compatibility test suite and verified that
earlier problems with these structs are now resolved, with no new regressions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166680 91177308-0d34-0410-b5e6-96231b3b80d8
into a sbc with a positive number, the immediate should be complemented, not
negated. Also added a missing pattern for ARM codegen.
rdar://12559385
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166613 91177308-0d34-0410-b5e6-96231b3b80d8
- If more than 1 elemennts are defined and target supports the vectorized
conversion, use the vectorized one instead to reduce the strength on
conversion operation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166546 91177308-0d34-0410-b5e6-96231b3b80d8
- As there's no 64-bit GPRs in 32-bit mode, a custom conversion from v2u32 to
v2f32 is added to improve the efficiency of the code generated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166545 91177308-0d34-0410-b5e6-96231b3b80d8
the difference from "int x" (which should go in registers and
"struct y {int x;}" (which should not).
Clang will be updated in the next patches.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166536 91177308-0d34-0410-b5e6-96231b3b80d8
- Check index being extracted to be constant 0 before simplfiying.
Otherwise, retain the original sequence.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166504 91177308-0d34-0410-b5e6-96231b3b80d8
The CFG of the machine function needs to know that the targets of the indirect
branch are successors to the indirect branch.
<rdar://problem/12529625>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166448 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, it is enabled only if option "enable-mips-tail-calls" is given and
all of the callee's arguments are passed in registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166342 91177308-0d34-0410-b5e6-96231b3b80d8
which is supposed to consistently raise SIGTRAP across all systems. In contrast,
__builtin_trap() behave differently on different systems. e.g. it raises SIGTRAP on ARM, and
SIGILL on X86. The purpose of __builtin_debugtrap() is to consistently provide "trap"
functionality, in the mean time preserve the compatibility with on gcc on __builtin_trap().
The X86 backend is already able to handle debugtrap(). This patch is to:
1) make front-end recognize "__builtin_debugtrap()" (emboddied in the one-line change to Clang).
2) In DAG legalization phase, by default, "debugtrap" will be replaced with "trap", which
make the __builtin_debugtrap() "available" to all existing ports without the hassle of
changing their code.
3) If trap-function is specified (via -trap-func=xyz to llc), both __builtin_debugtrap() and
__builtin_trap() will be expanded into the function call of the specified trap function.
This behavior may need change in the future.
The provided testing-case is to make sure 2) and 3) are working for ARM port, and we
already have a testing case for x86.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166300 91177308-0d34-0410-b5e6-96231b3b80d8
- If INSERT_VECTOR_ELT is supported (above SSE2, either by custom
sequence of legal insn), transform BUILD_VECTOR into SHUFFLE +
INSERT_VECTOR_ELT if most of elements could be built from SHUFFLE with few
(so far 1) elements being inserted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166288 91177308-0d34-0410-b5e6-96231b3b80d8
Removed extra stack frame object for fixed byval arguments,
VarArgsStyleRegisters invocation was reworked due to some improper usage in
past. PR14099 also demonstrates it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166273 91177308-0d34-0410-b5e6-96231b3b80d8
When merging stack slots, if StackColoring::remapInstructions gets a
value back from GetUnderlyingObject that it does not know about or is
not itself a stack slot, clear the memory operand in case it aliases
the merged slot. This prevents the introduction of incorrect aliasing
information.
Author: Matthew Curtis <mcurtis@codeaurora.org>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166216 91177308-0d34-0410-b5e6-96231b3b80d8
test case on PowerPC caused by rounding errors when converting from a 64-bit
integer to a single-precision floating point. The reason for this are
double-rounding effects, since on PowerPC we have to convert to an
intermediate double-precision value first, which gets rounded to the
final single-precision result.
The patch fixes the problem by preparing the 64-bit integer so that the
first conversion step to double-precision will always be exact, and the
final rounding step will result in the correctly-rounded single-precision
result. The generated code sequence is equivalent to what GCC would generate.
When -enable-unsafe-fp-math is in effect, that extra effort is omitted
and we accept possible rounding errors (just like GCC does as well).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166178 91177308-0d34-0410-b5e6-96231b3b80d8
- Folding (trunc (concat ... X )) to (concat ... (trunc X) ...) is valid
when '...' are all 'undef's.
- r166125 relies on this transformation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166155 91177308-0d34-0410-b5e6-96231b3b80d8
- If the extracted vector has the same type of all vectored being concatenated
together, it should be simplified directly into v_i, where i is the index of
the element being extracted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166125 91177308-0d34-0410-b5e6-96231b3b80d8
- MBB address is only valid as an immediate value in Small & Static
code/relocation models. On other models, LEA is needed to load IP address of
the restore MBB.
- A minor fix of MBB in MC lowering is added as well to enable target
relocation flag being propagated into MC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166084 91177308-0d34-0410-b5e6-96231b3b80d8
PR14098 contains an example where we would rematerialize a MOV8ri
immediately after the original instruction:
%vreg7:sub_8bit<def> = MOV8ri 9; GR32_ABCD:%vreg7
%vreg22:sub_8bit<def> = MOV8ri 9; GR32_ABCD:%vreg7
Besides being pointless, it is also wrong since the original instruction
only redefines part of the register, and the value read by the new
instruction is wrong.
The problem was the LiveRangeEdit::allUsesAvailableAt() didn't
special-case OrigIdx == UseIdx and found the wrong SSA value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166068 91177308-0d34-0410-b5e6-96231b3b80d8
For the PowerPC 64-bit ELF Linux ABI, aggregates of size less than 8
bytes are to be passed in the low-order bits ("right-adjusted") of the
doubleword register or memory slot assigned to them. A previous patch
addressed this for aggregates passed in registers. However, small
aggregates passed in the overflow portion of the parameter save area are
still being passed left-adjusted.
The fix is made in PPCTargetLowering::LowerCall_Darwin_Or_64SVR4 on the
caller side, and in PPCTargetLowering::LowerFormalArguments_64SVR4 on
the callee side. The main fix on the callee side simply extends
existing logic for 1- and 2-byte objects to 1- through 7-byte objects,
and correcting a constant left over from 32-bit code. There is also a
fix to a bogus calculation of the offset to the following argument in
the parameter save area.
On the caller side, again a constant left over from 32-bit code is
fixed. Additionally, some code for 1, 2, and 4-byte objects is
duplicated to handle the 3, 5, 6, and 7-byte objects for SVR4 only. The
LowerCall_Darwin_Or_64SVR4 logic is getting fairly convoluted trying to
handle both ABIs, and I propose to separate this into two functions in a
future patch, at which time the duplication can be removed.
The patch adds a new test (structsinmem.ll) to demonstrate correct
passing of structures of all seven sizes. Eight dummy parameters are
used to force these structures to be in the overflow portion of the
parameter save area.
As a side effect, this corrects the case when aggregates passed in
registers are saved into the first eight doublewords of the parameter
save area: Previously they were stored left-justified, and now are
properly stored right-justified. This requires changing the expected
output of existing test case structsinregs.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166022 91177308-0d34-0410-b5e6-96231b3b80d8
Stack is formed improperly for long structures passed as byval arguments for
EABI mode.
If we took AAPCS reference, we can found the next statements:
A: "If the argument requires double-word alignment (8-byte), the NCRN (Next
Core Register Number) is rounded up to the next even register number." (5.5
Parameter Passing, Stage C, C.3).
B: "The alignment of an aggregate shall be the alignment of its most-aligned
component." (4.3 Composite Types, 4.3.1 Aggregates).
So if we have structure with doubles (9 double fields) and 3 Core unused
registers (r1, r2, r3): caller should use r2 and r3 registers only.
Currently r1,r2,r3 set is used, but it is invalid.
Callee VA routine should also use r2 and r3 regs only. All is ok here. This
behaviour is guessed by rounding up SP address with ADD+BFC operations.
Fix:
Main fix is in ARMTargetLowering::HandleByVal. If we detected AAPCS mode and
8 byte alignment, we waste odd registers then.
P.S.:
I also improved LDRB_POST_IMM regression test. Since ldrb instruction will
not generated by current regression test after this patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166018 91177308-0d34-0410-b5e6-96231b3b80d8
Original message:
The attached is the fix to radar://11663049. The optimization can be outlined by following rules:
(select (x != c), e, c) -> select (x != c), e, x),
(select (x == c), c, e) -> select (x == c), x, e)
where the <c> is an integer constant.
The reason for this change is that : on x86, conditional-move-from-constant needs two instructions;
however, conditional-move-from-register need only one instruction.
While the LowerSELECT() sounds to be the most convenient place for this optimization, it turns out to be a bad place. The reason is that by replacing the constant <c> with a symbolic value, it obscure some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to last instruction-combining phase.
The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource".
Original message since r165661:
My previous change has a bug: I negated the condition code of a CMOV, and go ahead creating a new CMOV using the *ORIGINAL* condition code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166017 91177308-0d34-0410-b5e6-96231b3b80d8
This is a medium term workaround until we have a more robust solution
in the form of a register liveness utility for postRA passes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166001 91177308-0d34-0410-b5e6-96231b3b80d8
- Besides used in SjLj exception handling, __builtin_setjmp/__longjmp is also
used as a light-weight replacement of setjmp/longjmp which are used to
implementation continuation, user-level threading, and etc. The support added
in this patch ONLY addresses this usage and is NOT intended to support SjLj
exception handling as zero-cost DWARF exception handling is used by default
in X86.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165989 91177308-0d34-0410-b5e6-96231b3b80d8
The new coalescer can merge a dead def into an unused lane of an
otherwise live vector register.
Clear the <dead> flag when that happens since the flag refers to the
full virtual register which is still live after the partial dead def.
This fixes PR14079.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165877 91177308-0d34-0410-b5e6-96231b3b80d8
It is possible that the live range of the value being pruned loops back
into the kill MBB where the search started. When that happens, make sure
that the beginning of KillMBB is also pruned.
Instead of starting a DFS at KillMBB and skipping the root of the
search, start a DFS at each KillMBB successor, and allow the search to
loop back to KillMBB.
This fixes PR14078.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165872 91177308-0d34-0410-b5e6-96231b3b80d8
X86 doesn't have i8 cmovs so isel would emit a branch. Emitting branches at this
level is often not a good idea because it's too late for many optimizations to
kick in. This solution doesn't add any extensions (truncs are free) and tries
to avoid introducing partial register stalls by filtering direct copyfromregs.
I'm seeing a ~10% speedup on reading a random .png file with libpng15 via
graphicsmagick on x86_64/westmere, but YMMV depending on the microarchitecture.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165868 91177308-0d34-0410-b5e6-96231b3b80d8
local frame causes problem.
For example:
void f(StructToPass s) {
g(&s, sizeof(s));
}
will cause problem with tail-call since part of s is passed via registers and
saved in f's local frame. When g tries to access s, part of s may be corrupted
since f's local frame is popped out before the tail-call.
The current fix is to disable tail-call if getVarArgsRegSaveSize is not 0 for
the caller. This is a conservative approach, if we can prove the address of
s or part of s is not taken and passed to g, it should be okay to perform
tail-call.
rdar://12442472
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165853 91177308-0d34-0410-b5e6-96231b3b80d8
Completely update one interval at a time instead of collecting live
range fragments to be updated. This avoids building data structures,
except for a single SmallPtrSet of updated intervals.
Also share code between handleMove() and handleMoveIntoBundle().
Add support for moving dead defs across other live values in the
interval. The MI scheduler can do that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165824 91177308-0d34-0410-b5e6-96231b3b80d8
PHIElimination inserts IMPLICIT_DEF instructions to guarantee that all
PHI predecessors have a live-out value. These IMPLICIT_DEF values are
not considered to be real interference when coalescing virtual
registers:
%vreg1 = IMPLICIT_DEF
%vreg2 = MOV32r0
When joining %vreg1 and %vreg2, the IMPLICIT_DEF instruction and its
value number should simply be erased since the %vreg2 value number now
provides a live-out value for the PHI predecesor block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165813 91177308-0d34-0410-b5e6-96231b3b80d8
On PowerPC, a bitcast of <16 x i8> to i128 may run through a code
path in ExpandRes_BITCAST that attempts to do an intermediate
bitcast to a <4 x i32> vector, and then construct the Hi and Lo parts
of the resulting i128 by pairing up two of those i32 vector elements
each. The code already recognizes that on a big-endian system, the
first two vector elements form the Hi part, and the final two vector
elements form the Lo part (vice-versa from the little-endian situation).
However, we also need to take endianness into account when forming each
of those separate pairs: on a big-endian system, vector element 0 is
the *high* part of the pair making up the Hi part of the result, and
vector element 1 is the low part of the pair. The code currently always
uses vector element 0 as the low part and vector element 1 as the high
part, as is appropriate for little-endian platforms only.
This patch fixes this by swapping the vector elements as they are
paired up as appropriate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165802 91177308-0d34-0410-b5e6-96231b3b80d8
not legal. However, it should use a div instruction + mul + sub if divide is
legal. The rem legalization code was missing a check and incorrectly uses a
divrem libcall even when div is legal.
rdar://12481395
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165778 91177308-0d34-0410-b5e6-96231b3b80d8
Not all instructions define a virtual register in their first operand.
Specifically, INLINEASM has a different format.
<rdar://problem/12472811>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165721 91177308-0d34-0410-b5e6-96231b3b80d8
For function calls on the 64-bit PowerPC SVR4 target, each parameter
is mapped to as many doublewords in the parameter save area as
necessary to hold the parameter. The first 13 non-varargs
floating-point values are passed in registers; any additional
floating-point parameters are passed in the parameter save area. A
single-precision floating-point parameter (32 bits) must be mapped to
the second (rightmost, low-order) word of its assigned doubleword
slot.
Currently LLVM violates this ABI requirement by mapping such a
parameter to the first (leftmost, high-order) word of its assigned
doubleword slot. This is internally self-consistent but will not
interoperate correctly with libraries compiled with an ABI-compliant
compiler.
This patch corrects the problem by adjusting the parameter addressing
on both sides of the calling convention.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165714 91177308-0d34-0410-b5e6-96231b3b80d8
Original message:
The attached is the fix to radar://11663049. The optimization can be outlined by following rules:
(select (x != c), e, c) -> select (x != c), e, x),
(select (x == c), c, e) -> select (x == c), x, e)
where the <c> is an integer constant.
The reason for this change is that : on x86, conditional-move-from-constant needs two instructions;
however, conditional-move-from-register need only one instruction.
While the LowerSELECT() sounds to be the most convenient place for this optimization, it turns out to be a bad place. The reason is that by replacing the constant <c> with a symbolic value, it obscure some instruction-combining opportunities which would otherwise be very easy to spot. For that reason, I have to postpone the change to last instruction-combining phase.
The change passes the test of "make check-all -C <build-root/test" and "make -C project/test-suite/SingleSource".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165661 91177308-0d34-0410-b5e6-96231b3b80d8
the compiler makes use of GPR0. However, there are two flavors of
GPR0 defined by the target: the 32-bit GPR0 (R0) and the 64-bit GPR0
(X0). The spill/reload code makes use of R0 regardless of whether we
are generating 32- or 64-bit code.
This patch corrects the problem in the obvious manner, using X0 and
ADDI8 for 64-bit and R0 and ADDI for 32-bit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165658 91177308-0d34-0410-b5e6-96231b3b80d8
the Altivec extensions were introduced. Its use is optional, and
allows the compiler to communicate to the operating system which
vector registers should be saved and restored during a context switch.
In practice, this information is ignored by the various operating
systems using the SVR4 ABI; the kernel saves and restores the entire
register state. Setting the VRSAVE register is no longer performed by
the AIX XL compilers, the IBM i compilers, or by GCC on Power Linux
systems. It seems best to avoid this logic within LLVM as well.
This patch avoids generating code to update and restore VRSAVE for the
PowerPC SVR4 ABIs (32- and 64-bit). The code remains in place for the
Darwin ABI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165656 91177308-0d34-0410-b5e6-96231b3b80d8
- Due to the current matching vector elements constraints in
ISD::FP_ROUND, rounding from v2f64 to v4f32 (after legalization from
v2f32) is scalarized. Add a customized v2f32 widening to convert it
into a target-specific X86ISD::VFPROUND to work around this
constraints.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165631 91177308-0d34-0410-b5e6-96231b3b80d8
SDNode for LDRB_POST_IMM is invalid: number of registers added to SDNode fewer
that described in .td.
7 ops is needed, but SDNode with only 6 is created.
In more details:
In ARMInstrInfo.td, in multiclass AI2_ldridx, in definition _POST_IMM, offset
operand is defined as am2offset_imm. am2offset_imm is complex parameter type,
and actually it consists from dummy register and imm itself. As I understood
trick with dummy reg was made for AsmParser. In ARMISelLowering.cpp, this dummy
register was not added to SDNode, and it cause crash in Peephole Optimizer pass.
The problem fixed by setting up additional dummy reg when emitting
LDRB_POST_IMM instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165617 91177308-0d34-0410-b5e6-96231b3b80d8
SchedulerDAGInstrs::buildSchedGraph ignores dependencies between FixedStack
objects and byval parameters. So loading byval parameters from stack may be
inserted *before* it will be stored, since these operations are treated as
independent.
Fix:
Currently ARMTargetLowering::LowerFormalArguments saves byval registers with
FixedStack MachinePointerInfo. To fix the problem we need to store byval
registers with MachinePointerInfo referenced to first the "byval" parameter.
Also commit adds two new fields to the InputArg structure: Function's argument
index and InputArg's part offset in bytes relative to the start position of
Function's argument. E.g.: If function's argument is 128 bit width and it was
splitted onto 32 bit regs, then we got 4 InputArg structs with same arg index,
but different offset values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165616 91177308-0d34-0410-b5e6-96231b3b80d8
When the CFG contains a loop with multiple entry blocks, the traces
computed by MachineTraceMetrics don't always have the same nice
properties. Loop back-edges are normally excluded from traces, but
MachineLoopInfo doesn't recognize loops with multiple entry blocks, so
those back-edges may be included.
Avoid asserting when that happens by adding an isEarlierInSameTrace()
function that accurately determines if a dominating block is part of the
same trace AND is above the currrent block in the trace.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165434 91177308-0d34-0410-b5e6-96231b3b80d8
Vector compare using altivec 'vcmpxxx' instructions have as third argument
a vector register instead of CR one, different from integer and float-point
compares. This leads to a failure in code generation, where 'SelectSETCC'
expects a DAG with a CR register and gets vector register instead.
This patch changes the behavior by just returning a DAG with the
vector compare instruction based on the type. The patch also adds a testcase
for all vector types llvm defines.
It also included a fix on signed 5-bits predicates printing, where
signed values were not handled correctly as signed (char are unsigned by
default for PowerPC). This generates 'vspltisw' (vector splat)
instruction with SIM out of range.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165419 91177308-0d34-0410-b5e6-96231b3b80d8
Make sure functions located in user specified text sections (via the
section attribute) are located together with the default text sections.
Otherwise, for large object files, the relocations for call instructions
are more likely to be out of range. This becomes even more likely in the
presence of LTO.
rdar://12402636
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165254 91177308-0d34-0410-b5e6-96231b3b80d8
in the Intel syntax.
The MC layer supports emitting in the Intel syntax, but this would require the
inline assembly MachineInstr to be lowered to an MCInst before emission. This
is potential future work, but for now emitting directly from the MachineInstr
suffices.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165173 91177308-0d34-0410-b5e6-96231b3b80d8
multiple stores with a single load. We create the wide loads and stores (and their chains)
before we remove the scalar loads and stores and fix the DAG chain. We attempted to merge
loads with a different chain. When that happened, the assumption that it is safe to RAUW
broke and a cycle was introduced.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165148 91177308-0d34-0410-b5e6-96231b3b80d8
is not profitable in many cases because modern processors perform multiple stores
in parallel and merging stores prior to merging requires extra work. We handle two main cases:
1. Store of multiple consecutive constants:
q->a = 3;
q->4 = 5;
In this case we store a single legal wide integer.
2. Store of multiple consecutive loads:
int a = p->a;
int b = p->b;
q->a = a;
q->b = b;
In this case we load/store either ilegal vector registers or legal wide integer registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165125 91177308-0d34-0410-b5e6-96231b3b80d8
Enable the pass by default for targets that request it, and change the
-enable-early-ifcvt to the opposite -disable-early-ifcvt.
There are still some x86 regressions when enabling early if-conversion
because of the missing machine models. Disable the pass for x86 until
machine models are added.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165075 91177308-0d34-0410-b5e6-96231b3b80d8
X86DAGToDAGISel::PreprocessISelDAG(), isel is moving load inside
callseq_start / callseq_end so it can be folded into a call. This can
create a cycle in the DAG when the call is glued to a copytoreg. We
have been lucky this hasn't caused too many issues because the pre-ra
scheduler has special handling of call sequences. However, it has
caused a crash in a specific tailcall case.
rdar://12393897
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165072 91177308-0d34-0410-b5e6-96231b3b80d8
JoinVals::pruneValues() calls LIS->pruneValue() to avoid conflicts when
overlapping two different values. This produces a set of live range end
points that are used to reconstruct the live range (with SSA update)
after joining the two registers.
When a value is pruned twice, the set of end points was insufficient:
v1 = DEF
v1 = REPLACE1
v1 = REPLACE2
KILL v1
The end point at KILL would only reconstruct the live range from
REPLACE2 to KILL, leaving the range REPLACE1-REPLACE2 dead.
Add REPLACE2 as an end point in this case so the full live range is
reconstructed.
This fixes PR13999.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165056 91177308-0d34-0410-b5e6-96231b3b80d8