This commit is debatable. There are two possible approaches, neither
of which is really satisfactory:
1. Use "@foo(i1 zeroext)" to mean an extension to 32-bits on Darwin,
and 8 bits otherwise.
2. Redefine "@foo(i1)" to mean that the i1 is extended by the caller
to 8 bits. This goes against the spirit of "zeroext" I think, but
it's a bit of a vague construct anyway (by definition you're going
to extend to the amount required by the ABI, that's why it's the
ABI!).
This implements option 2. The DAG machinery really isn't setup for the
first (there's a fairly strong assumption that "zeroext" goes to at
least the smallest register size), and even if it was the resulting
DAG looks like it would be inferior in many cases.
Theoretically we could add AssertZext nodes in the consumers of
ABI-passed values too now, but this actually seems to make the code
worse in practice by making truncation proceed in two steps. The code
produced is equally valid if we continue to assume only the low bit is
defined.
Should fix PR19850
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209637 91177308-0d34-0410-b5e6-96231b3b80d8
We can eliminate the custom C++ code in favour of some TableGen to
check the same things. Functionality should be identical, except for a
buffer overrun that was present in the C++ code and meant webkit
failed if any small argument needed to be passed on the stack.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209636 91177308-0d34-0410-b5e6-96231b3b80d8
We have a couple of regression tests for load/store pairing, but (to my knowledge) there are no regression tests for the load/store + add/sub folding.
As a first step towards increased test coverage of this area, this commit adds a test for one instance of a load + add to pre-indexed load transformation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209618 91177308-0d34-0410-b5e6-96231b3b80d8
Currently we look at the Aliasee to decide what type of export
directive to use. It seems better to use the type of the alias
directly. This is similar to how we handle the alias having the
same address but other attributes (linkage, visibility) from the
aliasee.
With this patch it is now possible to do things like
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc"
@foo = global [6 x i8] c"\B8*\00\00\00\C3", section ".text", align 16
@f = dllexport alias i32 (), [6 x i8]* @foo
!llvm.module.flags = !{!0}
!0 = metadata !{i32 6, metadata !"Linker Options", metadata !1}
!1 = metadata !{metadata !2, metadata !3}
!2 = metadata !{metadata !"/DEFAULTLIB:libcmt.lib"}
!3 = metadata !{metadata !"/DEFAULTLIB:oldnames.lib"}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209600 91177308-0d34-0410-b5e6-96231b3b80d8
This commit starts with a "git mv ARM64 AArch64" and continues out
from there, renaming the C++ classes, intrinsics, and other
target-local objects for consistency.
"ARM64" test directories are also moved, and tests that began their
life in ARM64 use an arm64 triple, those from AArch64 use an aarch64
triple. Both should be equivalent though.
This finishes the AArch64 merge, and everyone should feel free to
continue committing as normal now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209577 91177308-0d34-0410-b5e6-96231b3b80d8
I'm doing this in two phases for a better "git blame" record. This
commit removes the previous AArch64 backend and redirects all
functionality to ARM64. It also deduplicates test-lines and removes
orphaned AArch64 tests.
The next step will be "git mv ARM64 AArch64" and rewire most of the
tests.
Hopefully LLVM is still functional, though it would be even better if
no-one ever had to care because the rename happens straight
afterwards.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209576 91177308-0d34-0410-b5e6-96231b3b80d8
After the load/store refactoring, we were sometimes trying to feed a
GPR64 into a 32-bit register offset operand. This failed in
copyPhysReg.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209566 91177308-0d34-0410-b5e6-96231b3b80d8
This matches both what we do for the non-thread case and what gcc does.
With this patch clang would match gcc's behaviour in
static __thread int a = 42;
extern __thread int b __attribute__((alias("a")));
int *f(void) { return &a; }
int *g(void) { return &b; }
if not for pr19843. Manually writing the IL does produce the same access modes.
It is also a step in the direction of fixing pr19844.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209543 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Instead the system is required to provide some means of handling unaligned
load/store without special instructions. Options include full hardware
support, full trap-and-emulate, and hybrids such as hardware support within
a cache line and trap-and-emulate for multi-line accesses.
MipsSETargetLowering::allowsUnalignedMemoryAccesses() has been configured to
assume that unaligned accesses are 'fast' on the basis that I expect few
hardware implementations will opt for pure-software handling of unaligned
accesses. The ones that do handle it purely in software can override this.
mips64-load-store-left-right.ll has been merged into load-store-left-right.ll
The stricter testing revealed a Bits!=Bytes bug in passByValArg(). This has
been fixed and the variables renamed to clarify the units they hold.
Reviewers: zoran.jovanovic, jkolek, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3872
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209512 91177308-0d34-0410-b5e6-96231b3b80d8
This allows existing DAG combines to work on them, and then
we can re-match to BFE if necessary during instruction selection.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209462 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches the x86 backend how to efficiently lower ISD::BITCAST dag
nodes from MVT::f64 to MVT::v4i16 (and vice versa), and from MVT::f64 to
MVT::v8i8 (and vice versa).
This patch extends the logic from revision 208107 to also handle MVT::v4i16
and MVT::v8i8. Also, this patch correctly propagates Undef values when
performing the widening of a vector (example: when widening from v2i32 to
v4i32, the upper 64bits of the resulting vector are 'undef').
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209451 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
* Split into two functions, one to test each struct.
* R0 and R2 must be defined by an lw with a %got reference to the correct
symbol.
* Test for $4 (first argument) where appropriate instead of accepting any
register.
* Test that the two lbu's are correctly combined into $4
Depends on D3844
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3845
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209424 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
lwl and lwr are not available in MIPS32r6/MIPS64r6. The purpose of the test
is to check that the '$1' expands to '0($x)' rather than to test something related
to the lwl or lwr instructions so we can simply switch to lw.
Depends on D3842
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3844
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209423 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch is necessary so that they do not fail on MIPS32r6/MIPS64r6 when
-integrated-as is enabled by default and we correctly detect the host CPU.
No functional change since these tests are testing the behaviour of the
constraint used for the third operand rather than the mnemonic.
Depends on D3842
Reviewers: zoran.jovanovic, jkolek, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3843
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209421 91177308-0d34-0410-b5e6-96231b3b80d8
This intrinsic permits the emission of platform specific undefined sequences.
ARM has reserved the 0xde opcode which takes a single integer parameter (ignored
by the CPU). This permits the operating system to implement custom behaviour on
this trap. The llvm.arm.undefined intrinsic is meant to provide a means for
generating the target specific behaviour from the frontend. This is
particularly useful for Windows on ARM which has made use of a series of these
special opcodes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209390 91177308-0d34-0410-b5e6-96231b3b80d8
Committed in r209178 then reverted in r209251 due to LTO breakage,
here's a proper fix for the case of the missing subprogram DIE. The DIEs
were there, just in other compile units. Using the SPMap we can find the
right compile unit to search for and produce cross-unit references to
describe this kind of inlining.
One existing test case needed to be updated because it had a function
that wasn't in the CU's subprogram list, so it didn't appear in the
SPMap.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209335 91177308-0d34-0410-b5e6-96231b3b80d8
ISD::VSELECT mask uses 1 to identify the first argument and 0 to identify the
second argument.
On the other hand, BLENDI uses 0 to identify the first argument and 1 to
identify the second argument.
Fix the generation of the blend mask to account for this difference.
The bug did not show up with r209043, because we were not checking for the
actual arguments of the blend instruction!
This commit also fixes the test cases.
Note: The same mask works for the BLENDr variant because the arguments are
swapped during instruction selection (see the BLENDXXrr patterns).
<rdar://problem/16975435>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209324 91177308-0d34-0410-b5e6-96231b3b80d8
Although the previous code would construct a bundle and add the correct elements
to it, it would not finalise the bundle. This resulted in the InternalRead
markers not being added to the MachineOperands nor, more importantly, the
externally visible defs to the bundle itself. So, although the bundle was not
exposing the def, the generated code would be correct because there was no
optimisations being performed. When optimisations were enabled, the post
register allocator would kick in, and the hazard recognizer would reorder
operations around the load which would define the value being operated upon.
Rather than manually constructing the bundle, simply construct and finalise the
bundle via the finaliseBundle call after both MIs have been emitted. This
improves the code generation with optimisations where IMAGE_REL_ARM_MOV32T
relocations are emitted.
The changes to the other tests are the result of the bundle generation
preventing the scheduler from hoisting the moves across the loads. The net
effect of the generated code is equivalent, but, is much more identical to what
is actually being lowered.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209267 91177308-0d34-0410-b5e6-96231b3b80d8
Povray and dealII currently assert with "Overran sorted position" in
AssignTopologicalOrder. The problem is that performPostLD1Combine can
introduce cycles.
Consider:
(insert_vector_elt (INSERT_SUBREG undef,
(load (add %vreg0, Constant<8>), undef), <= A
TargetConstant<2>),
(load %vreg0, undef), <= B
Constant<1>)
This is turned into a LD1LANEpost node. However the address in A is not a
valid user of the post-incremented address of B in LD1LANEpost.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209242 91177308-0d34-0410-b5e6-96231b3b80d8
make the functions to set them non-static.
Move and rename the llvm specific backend options to avoid conflicting
with the clang option.
Paired with a backend commit to update.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209238 91177308-0d34-0410-b5e6-96231b3b80d8
This commit introduces a canonical representation for the formulae.
Basically, as soon as a formula has more that one base register, the scaled
register field is used for one of them. The register put into the scaled
register is preferably a loop variant.
The commit refactors how the formulae are built in order to produce such
representation.
This yields a more accurate, but still perfectible, cost model.
<rdar://problem/16731508>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209230 91177308-0d34-0410-b5e6-96231b3b80d8
The SplitIndexingFromLoad changes exposed a latent isel bug in the PowerPC64
backend. We matched an immediate offset with STWX8 even though it only
supports register offset.
The culprit is the complex-pattern predicate, SelectAddrIdx, which decides
that if the offset is not ISD::Constant it must be a register.
Many thanks to Bill Schmidt for testing this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209219 91177308-0d34-0410-b5e6-96231b3b80d8
Since we visit the whole list of subprograms for each CU at module
start, this is clearly true - don't test for the case, just assert it.
A few old test cases seemed to have incomplete subprogram lists, but any
attempt to reproduce them shows full subprogram lists that even include
entities that have been completely inlined and the out of line
definition removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209178 91177308-0d34-0410-b5e6-96231b3b80d8
Instructions TZCNT (requires BMI1) and LZCNT (requires LZCNT), always
provide the operand size as output if the input operand is zero.
We can take advantage of this knowledge during instruction selection
stage in order to simplify a few corner case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209159 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When inserting an element that's coming from a vector load or a broadcast
of a vector (or scalar) load, combine the load into the insertps
instruction.
Added PerformINSERTPSCombine for the case where we need to fix the load
(load of a vector + insertps with a non-zero CountS).
Added patterns for the broadcasts.
Also added tests for SSE4.1, AVX, and AVX2.
Reviewers: delena, nadav, craig.topper
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3581
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209156 91177308-0d34-0410-b5e6-96231b3b80d8