We switch the order of offset and field type to make TBAAStructType node
(name, parent node, offset) similar to scalar TBAA node (name, parent node).
TypeIsImmutable is added to TBAAStructTag node.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180654 91177308-0d34-0410-b5e6-96231b3b80d8
Clarify documentation and API to make the difference between register and
register-indirect addressed locations more explicit. Put in a comment
to point out that with the current implementation we cannot specify
a register-indirect location with offset 0 (a breg 0 in DWARF).
No functionality change intended.
rdar://problem/13658587
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180641 91177308-0d34-0410-b5e6-96231b3b80d8
TLVs probably won't be as common as the other types of variables. Check for them
last before defaulting to "DATA".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180631 91177308-0d34-0410-b5e6-96231b3b80d8
For Mach-O there were 2 implementations for parsing object files. A
standalone llvm/Object/MachOObject.h and llvm/Object/MachO.h which
implements the generic interface in llvm/Object/ObjectFile.h.
This patch adds the missing features to MachO.h, moves macho-dump to
use MachO.h and removes ObjectFile.h.
In addition to making sure that check-all is clean, I checked that the
new version produces exactly the same output in all Mach-O files in a
llvm+clang build directory (including executables and shared
libraries).
To test the performance, I ran macho-dump over all the files in a
llvm+clang build directory again, but this time redirecting the output
to /dev/null. Both the old and new versions take about 4.6 seconds
(2.5 user) to finish.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180624 91177308-0d34-0410-b5e6-96231b3b80d8
We need to intialize this to something and since clang does not set
the shader type attribute and clang is used only for compute shaders,
initializing it to COMPUTE seems like the best choice.
Reviewed-by: Christian König <christian.koenig@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180620 91177308-0d34-0410-b5e6-96231b3b80d8
"hint" space for Thumb actually overlaps the encoding space of the CPS
instruction. In actuality, hints can be defined as CPS instructions where imod
and M bits are all nil.
Handle decoding of permitted nop-compatible hints (i.e. nop, yield, wfi, wfe,
sev) in DecodeT2CPSInstruction.
This commit adds a proper diagnostic message for Imm0_4 and updates all tests.
Patch by Mihail Popa <Mihail.Popa@arm.com>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180617 91177308-0d34-0410-b5e6-96231b3b80d8
Since we can't guarantee that the original dbg.declare instrinsic
is removed by LowerDbgDeclare(), we need to make sure that we are
not inserting the same dbg.value intrinsic over and over.
This removes tons of redundant DIEs when compiling optimized code.
rdar://problem/13056109
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180615 91177308-0d34-0410-b5e6-96231b3b80d8
In the default PowerPC assembler syntax, registers are specified simply
by number, so they cannot be distinguished from immediate values (without
looking at the opcode). This means that the default operand matching logic
for the asm parser does not work, and we need to specify custom matchers.
Since those can only be specified with RegisterOperand classes and not
directly on the RegisterClass, all instructions patterns used by the asm
parser need to use a RegisterOperand (instead of a RegisterClass) for
all their register operands.
This patch adds one RegisterOperand for each RegisterClass, using the
same name as the class, just in lower case, and updates all instruction
patterns to use RegisterOperand instead of RegisterClass operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180611 91177308-0d34-0410-b5e6-96231b3b80d8
When testing the asm parser, I noticed wrong encodings for the
above instructions (wrong sub-opcodes).
Tests will be added together with the asm parser.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180608 91177308-0d34-0410-b5e6-96231b3b80d8
When testing the asm parser, I noticed wrong encodings for the
above instructions (wrong sub-opcodes). Note that apparently
the compiler currently never generates pre-inc instructions
for floating point types for some reason ...
Tests will be added together with the asm parser.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180607 91177308-0d34-0410-b5e6-96231b3b80d8
When testing the asm parser, I noticed wrong encodings for the
above instructions (wrong operand name in rldimi, wrong form
and sub-opcode for rldcl).
Tests will be added together with the asm parser.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180606 91177308-0d34-0410-b5e6-96231b3b80d8
When testing the asm parser, I ran into an error when using a conditional
branch to an external symbol (this doesn't occur in compiler-generated
code) due to missing support in PPCELFObjectWriter::getRelocTypeInner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180605 91177308-0d34-0410-b5e6-96231b3b80d8
This already helps SSE2 x86 a lot because it lacks an efficient way to
represent a vector select. The long term goal is to enable the backend to match
a canonicalized pattern into a single instruction (e.g. vabs or pabs).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180597 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r180222.
I think this might tie in with a different problem which will require a
different approach potentially. I am reverting this in the case I need to go
down that second path.
My apologies for the noise. = /.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180590 91177308-0d34-0410-b5e6-96231b3b80d8
Mips have delayslots for certain instructions
like jumps and branches. These are instructions
that follow the branch or jump and are executed
before the jump or branch is completed.
Early Mips compilers could not cope with delayslots
and left them up to the assembler. The assembler would
fill the delayslots with the appropriate instruction,
usually just a nop to allow correct runtime behavior.
The default behavior for this is set with .set reorder.
To tell the assembler that you don't want it to mess with
the delayslot one used .set noreorder.
For backwards compatibility we need to support
.set reorder and have it be the default behavior in the
assembler.
Our support for it is to insert a NOP directly after an
instruction with a delayslot when in .set reorder mode.
Contributer: Vladimir Medic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180584 91177308-0d34-0410-b5e6-96231b3b80d8
latency for certain models of the Intel Atom family, by converting
instructions into their equivalent LEA instructions, when it is both
useful and possible to do so.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180573 91177308-0d34-0410-b5e6-96231b3b80d8
Since the relocation iterator walks only the relocations in one section, we
can just use a pointer and avoid fetching information about the section at
every reference.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180262 91177308-0d34-0410-b5e6-96231b3b80d8
getRelocationAddress is for dynamic libraries and executables,
getRelocationOffset for relocatable objects.
Mark the getRelocationAddress of COFF and MachO as not implemented yet. Add a
test of ELF's. llvm-readobj -r now prints the same values as readelf -r.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180259 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes PR15838. Need to check for blocks with nothing but dbg.value.
I'm not sure how to force this situation with a unit test. I tried to
reduce the test case in PR15838 (1k lines of metadata) but gave up.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180227 91177308-0d34-0410-b5e6-96231b3b80d8
Due to the semantics of ARC, we must be extremely conservative with autorelease
calls inserted by the frontend since ARC gaurantees that said object will be in
the autorelease pool after that point, an optimization invariant that the
optimizer must respect.
On the other hand, we are allowed significantly more flexibility with
autoreleaseRV instructions.
Often times though this flexibility is disrupted by early transformations which
transform objc_autoreleaseRV => objc_autorelease if said instruction is no
longer being used as part of an RV pair (generally due to inlining). Since we
can not tell the difference in between an autorelease put into place by the
frontend and one created through said ``strength reduction'' we can not perform
these optimizations.
The addition of this set gets around said issues by allowing us to differentiate
in between said two cases.
rdar://problem/13697741.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180222 91177308-0d34-0410-b5e6-96231b3b80d8
While here, don't report a dummy symbol for relocations that don't have symbols.
We used to says such relocations were for the first defined symbol, but now we
return end_symbols(). The llvm-readobj output change agrees with otool.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180214 91177308-0d34-0410-b5e6-96231b3b80d8
This patch disables memory-instruction vectorization for types that need padding
bytes, e.g., x86_fp80 has 10 bytes store size with 6 bytes padding in darwin on
x86_64. Because the load/store vectorization is performed by the bit casting to
a packed vector, which has incompatible memory layout due to the lack of padding
bytes, the present vectorizer produces inconsistent result for memory
instructions of those types.
This patch checks an equality of the AllocSize of a scalar type and allocated
size for each vector element, to ensure that there is no padding bytes and the
array can be read/written using vector operations.
Patch by Daisuke Takahashi!
Fixes PR15758.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180196 91177308-0d34-0410-b5e6-96231b3b80d8
This should bring the ppc bots back. I will try to write a test that would
have found the problem on a little endian system too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180194 91177308-0d34-0410-b5e6-96231b3b80d8
For now, we just reschedule instructions that use the copied vregs and
let regalloc elliminate it. I would really like to eliminate the
copies on-the-fly during scheduling, but we need a complete
implementation of repairIntervalsInRange() first.
The general strategy is for the register coalescer to eliminate as
many global copies as possible and shrink live ranges to be
extended-basic-block local. The coalescer should not have to worry
about resolving local copies (e.g. it shouldn't attemp to reorder
instructions). The scheduler is a much better place to deal with local
interference. The coalescer side of this equation needs work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180193 91177308-0d34-0410-b5e6-96231b3b80d8
When MachineScheduler is enabled, this functionality can be
removed. Until then, provide a way to disable it for test cases and
designing MachineScheduler heuristics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180192 91177308-0d34-0410-b5e6-96231b3b80d8
Since the relocation iterator walks only the relocations in one section, we
can just use a pointer and avoid fetching information about the section at
every reference.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180189 91177308-0d34-0410-b5e6-96231b3b80d8
I know what would be cool! We should align the compact unwind section because
aligned data access is faster.
<rdar://problem/13723271>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180171 91177308-0d34-0410-b5e6-96231b3b80d8
debug location. This solves a problem where range of an inlined
subroutine is emitted wrongly.
Patch by Manman Ren.
Fixes rdar://problem/12415623
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180140 91177308-0d34-0410-b5e6-96231b3b80d8
This exposed an issue with PowerPC AltiVec where it appears it was setting the wrong vector boolean contents. The included change
fixes the PowerPC tests, and was OK'd by Hal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180129 91177308-0d34-0410-b5e6-96231b3b80d8
1) Disallow 'returned' on parameter that is also 'sret' (no sensible semantics, as far as I can tell).
2) Conservatively disallow tail calls through 'returned' parameters that also are 'zext' or 'sext' (for consistency with treatment of other zero-extending and sign-extending operations in tail call position detection...can be revised later to handle situations that can be determined to be safe).
This is a new attribute that is not yet used, so there is no impact.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180118 91177308-0d34-0410-b5e6-96231b3b80d8
This makes llvm-dwarfdump and llvm-symbolizer understand
debug info sections compressed by ld.gold linker.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180088 91177308-0d34-0410-b5e6-96231b3b80d8
even if erroneously annotated with the parallel loop metadata.
Fixes Bug 15794:
"Loop Vectorizer: Crashes with the use of llvm.loop.parallel metadata"
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180081 91177308-0d34-0410-b5e6-96231b3b80d8
AArch64 always demands a register-scavenger, so the pointer should never be
NULL. However, in the spirit of paranoia, we'll assert it before use just in
case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180080 91177308-0d34-0410-b5e6-96231b3b80d8
The tag is of type TBAANode when flag EnableStructPathTBAA is off.
Move implementation of MDNode::getMostGenericTBAA to TypeBasedAliasAnalysis.cpp
since it depends on how to interprete the MDNodes for scalar TBAA and
struct-path aware TBAA.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180068 91177308-0d34-0410-b5e6-96231b3b80d8
now taken care of by the frontend, which allows us to parse arbitrary C/C++
variables.
Part of rdar://13663589
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180037 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is http://llvm.org/PR15802. Backslashes preceding double quotes in
arguments must be escaped. The interesting bit is that all other
backslashes should *not* be escaped, because the un-escaping logic is
only triggered by the presence of a double quote character.
Reviewers: Bigcheese
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D705
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180035 91177308-0d34-0410-b5e6-96231b3b80d8
Also add a check for llvm.used in the verifier and simplify clients now that
they can assume they have a ConstantArray.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180019 91177308-0d34-0410-b5e6-96231b3b80d8
-- C.4 and C.5 statements, when NSAA is not equal to SP.
-- C.1.cp statement for VA functions. Note: There are no VFP CPRCs in a
variadic procedure.
Before this patch "NSAA != 0" means "don't use GPRs anymore ". But there are
some exceptions in AAPCS.
1. For non VA function: allocate all VFP regs for CPRC. When all VFPs are allocated
CPRCs would be sent to stack, while non CPRCs may be still allocated in GRPs.
2. Check that for VA functions all params uses GPRs and then stack.
No exceptions, no CPRCs here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180011 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r179840 with a fix to test/DebugInfo/two-cus-from-same-file.ll
I'm not sure why that test only failed on ARM & MIPS and not X86 Linux, even
though the debug info was clearly invalid on all of them, but this ought to fix
it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179996 91177308-0d34-0410-b5e6-96231b3b80d8
Rather than just splitting the input type and hoping for the best, apply
a bit more cleverness. Just splitting the types until the source is
legal often leads to an illegal result time, which is then widened and a
scalarization step is introduced which leads to truly horrible code
generation. With the loop vectorizer, these sorts of operations are much
more common, and so it's worth extra effort to do them well.
Add a legalization hook for the operands of a TRUNCATE node, which will
be encountered after the result type has been legalized, but if the
operand type is still illegal. If simple splitting of both types
ends up with the result type of each half still being legal, just
do that (v16i16 -> v16i8 on ARM, for example). If, however, that would
result in an illegal result type (v8i32 -> v8i8 on ARM, for example),
we can get more clever with power-two vectors. Specifically,
split the input type, but also widen the result element size, then
concatenate the halves and truncate again. For example on ARM,
To perform a "%res = v8i8 trunc v8i32 %in" we transform to:
%inlo = v4i32 extract_subvector %in, 0
%inhi = v4i32 extract_subvector %in, 4
%lo16 = v4i16 trunc v4i32 %inlo
%hi16 = v4i16 trunc v4i32 %inhi
%in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
%res = v8i8 trunc v8i16 %in16
This allows instruction selection to generate three VMOVN instructions
instead of a sequences of moves, stores and loads.
Update the ARMTargetTransformInfo to take this improved legalization
into account.
Consider the simplified IR:
define <16 x i8> @test1(<16 x i32>* %ap) {
%a = load <16 x i32>* %ap
%tmp = trunc <16 x i32> %a to <16 x i8>
ret <16 x i8> %tmp
}
define <8 x i8> @test2(<8 x i32>* %ap) {
%a = load <8 x i32>* %ap
%tmp = trunc <8 x i32> %a to <8 x i8>
ret <8 x i8> %tmp
}
Previously, we would generate the truly hideous:
.syntax unified
.section __TEXT,__text,regular,pure_instructions
.globl _test1
.align 2
_test1: @ @test1
@ BB#0:
push {r7}
mov r7, sp
sub sp, sp, #20
bic sp, sp, #7
add r1, r0, #48
add r2, r0, #32
vld1.64 {d24, d25}, [r0:128]
vld1.64 {d16, d17}, [r1:128]
vld1.64 {d18, d19}, [r2:128]
add r1, r0, #16
vmovn.i32 d22, q8
vld1.64 {d16, d17}, [r1:128]
vmovn.i32 d20, q9
vmovn.i32 d18, q12
vmov.u16 r0, d22[3]
strb r0, [sp, #15]
vmov.u16 r0, d22[2]
strb r0, [sp, #14]
vmov.u16 r0, d22[1]
strb r0, [sp, #13]
vmov.u16 r0, d22[0]
vmovn.i32 d16, q8
strb r0, [sp, #12]
vmov.u16 r0, d20[3]
strb r0, [sp, #11]
vmov.u16 r0, d20[2]
strb r0, [sp, #10]
vmov.u16 r0, d20[1]
strb r0, [sp, #9]
vmov.u16 r0, d20[0]
strb r0, [sp, #8]
vmov.u16 r0, d18[3]
strb r0, [sp, #3]
vmov.u16 r0, d18[2]
strb r0, [sp, #2]
vmov.u16 r0, d18[1]
strb r0, [sp, #1]
vmov.u16 r0, d18[0]
strb r0, [sp]
vmov.u16 r0, d16[3]
strb r0, [sp, #7]
vmov.u16 r0, d16[2]
strb r0, [sp, #6]
vmov.u16 r0, d16[1]
strb r0, [sp, #5]
vmov.u16 r0, d16[0]
strb r0, [sp, #4]
vldmia sp, {d16, d17}
vmov r0, r1, d16
vmov r2, r3, d17
mov sp, r7
pop {r7}
bx lr
.globl _test2
.align 2
_test2: @ @test2
@ BB#0:
push {r7}
mov r7, sp
sub sp, sp, #12
bic sp, sp, #7
vld1.64 {d16, d17}, [r0:128]
add r0, r0, #16
vld1.64 {d20, d21}, [r0:128]
vmovn.i32 d18, q8
vmov.u16 r0, d18[3]
vmovn.i32 d16, q10
strb r0, [sp, #3]
vmov.u16 r0, d18[2]
strb r0, [sp, #2]
vmov.u16 r0, d18[1]
strb r0, [sp, #1]
vmov.u16 r0, d18[0]
strb r0, [sp]
vmov.u16 r0, d16[3]
strb r0, [sp, #7]
vmov.u16 r0, d16[2]
strb r0, [sp, #6]
vmov.u16 r0, d16[1]
strb r0, [sp, #5]
vmov.u16 r0, d16[0]
strb r0, [sp, #4]
ldm sp, {r0, r1}
mov sp, r7
pop {r7}
bx lr
Now, however, we generate the much more straightforward:
.syntax unified
.section __TEXT,__text,regular,pure_instructions
.globl _test1
.align 2
_test1: @ @test1
@ BB#0:
add r1, r0, #48
add r2, r0, #32
vld1.64 {d20, d21}, [r0:128]
vld1.64 {d16, d17}, [r1:128]
add r1, r0, #16
vld1.64 {d18, d19}, [r2:128]
vld1.64 {d22, d23}, [r1:128]
vmovn.i32 d17, q8
vmovn.i32 d16, q9
vmovn.i32 d18, q10
vmovn.i32 d19, q11
vmovn.i16 d17, q8
vmovn.i16 d16, q9
vmov r0, r1, d16
vmov r2, r3, d17
bx lr
.globl _test2
.align 2
_test2: @ @test2
@ BB#0:
vld1.64 {d16, d17}, [r0:128]
add r0, r0, #16
vld1.64 {d18, d19}, [r0:128]
vmovn.i32 d16, q8
vmovn.i32 d17, q9
vmovn.i16 d16, q8
vmov r0, r1, d16
bx lr
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179989 91177308-0d34-0410-b5e6-96231b3b80d8
This is an edge case that can happen if we modify a chain of multiple selects.
Update all operands in that case and remove the assert. PR15805.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179982 91177308-0d34-0410-b5e6-96231b3b80d8
There is the temptation to make this tranform dependent on target information as
it is not going to be beneficial on all (sub)targets. Therefore, we should
probably do this in MI Early-Ifconversion.
This reverts commit r179957. Original commit message:
"SimplifyCFG: If convert single conditional stores
This transformation will transform a conditional store with a preceeding
uncondtional store to the same location:
a[i] =
may-alias with a[i] load
if (cond)
a[i] = Y
into an unconditional store.
a[i] = X
may-alias with a[i] load
tmp = cond ? Y : X;
a[i] = tmp
We assume that on average the cost of a mispredicted branch is going to be
higher than the cost of a second store to the same location, and that the
secondary benefits of creating a bigger basic block for other optimizations to
work on outway the potential case were the branch would be correctly predicted
and the cost of the executing the second store would be noticably reflected in
performance.
hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
this change we are on par with gcc's performance (gcc also performs this
transformation). There was a 1.2 % performance improvement on a ARM swift chip.
Other tests in the test-suite+external seem to be mostly uninfluenced in my
experiments:
This optimization was triggered on 41 tests such that the executable was
different before/after the patch. Only 1 out of the 40 tests (dealII) was
reproducable below 100% (by about .4%). Given that hmmer benefits so much I
believe this to be a fair trade off.
I am going to watch performance numbers across the builtbots and will revert
this if anything unexpected comes up."
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179980 91177308-0d34-0410-b5e6-96231b3b80d8
This will make it clearer when we are actually resetting a sequence's progress
vs just changing state. This is an important distinction because the former case
clears any pointers that we are tracking while the later does not.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179963 91177308-0d34-0410-b5e6-96231b3b80d8
With a little help from the frontend, it looks like the standard va_*
intrinsics can do the job.
Also clean up an old bitcast hack in LowerVAARG that dealt with
unaligned double loads. Load SDNodes can specify an alignment now.
Still missing: Calling varargs functions with float arguments.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179961 91177308-0d34-0410-b5e6-96231b3b80d8
This transformation will transform a conditional store with a preceeding
uncondtional store to the same location:
a[i] =
may-alias with a[i] load
if (cond)
a[i] = Y
into an unconditional store.
a[i] = X
may-alias with a[i] load
tmp = cond ? Y : X;
a[i] = tmp
We assume that on average the cost of a mispredicted branch is going to be
higher than the cost of a second store to the same location, and that the
secondary benefits of creating a bigger basic block for other optimizations to
work on outway the potential case were the branch would be correctly predicted
and the cost of the executing the second store would be noticably reflected in
performance.
hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
this change we are on par with gcc's performance (gcc also performs this
transformation). There was a 1.2 % performance improvement on a ARM swift chip.
Other tests in the test-suite+external seem to be mostly uninfluenced in my
experiments:
This optimization was triggered on 41 tests such that the executable was
different before/after the patch. Only 1 out of the 40 tests (dealII) was
reproducable below 100% (by about .4%). Given that hmmer benefits so much I
believe this to be a fair trade off.
I am going to watch performance numbers across the builtbots and will revert
this if anything unexpected comes up.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179957 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, when spilling 64-bit paired registers, an LDMIA with both
a FrameIndex and an offset was produced. This kind of instruction
shouldn't exist, and the extra operand was being confused with the
predicate, causing aborts later on.
This removes the invalid 0-offset from the instruction being
produced.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179956 91177308-0d34-0410-b5e6-96231b3b80d8
I think it's almost impossible to fold atomic fences profitably under
LLVM/C++11 semantics. As a result, this is now unused and just
cluttering up the target interface.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179940 91177308-0d34-0410-b5e6-96231b3b80d8
The getSwappedPredicate function can be used in other places (such as in
improvements to the PPCCTRLoops pass). Instead of trapping it as a static
function in PPCInstrInfo, move it into PPCPredicates with other
predicate-related things.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179926 91177308-0d34-0410-b5e6-96231b3b80d8
The logic that actually compares the types considers pointers and integers the
same if they are of the same size. This created a strange mismatch between hash
and reality and made the test case for this fail on some platforms (yay,
test cases).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179905 91177308-0d34-0410-b5e6-96231b3b80d8
trying to move as much FastISel logic as possible out of the main path in
SelectionDAGISel - intermixing them just adds confusion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179902 91177308-0d34-0410-b5e6-96231b3b80d8
When matching a compare with a subtract where the arguments of the compare are
swapped w.r.t. the arguments of the subtract, we need to negate the predicates
(or CR bit indices) of the users. This, however, is not the same as inverting
the predicate (negating LT -> GT, but inverting LT -> GE, for example). The ARM
backend seems to do this correctly, but when I adapted the code for the PPC
backend, I introduced an error in this logic.
Comparison optimization is now enabled again by default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179899 91177308-0d34-0410-b5e6-96231b3b80d8
Also make some static function class functions to avoid having to mention the
class namespace for enums all the time.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179886 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for recoded (meaning assembly-language compatible to
standard mips32) arithmetic 32-bit instructions.
Patch by Zoran Jovanovic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179873 91177308-0d34-0410-b5e6-96231b3b80d8
Thanks to Evgeniy Stepanov for reporting this.
It might be a good idea to add a command iterator abstraction to MachO.h, but
this fixes the bug for now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179848 91177308-0d34-0410-b5e6-96231b3b80d8
Adding another CU-wide list, in this case of imported_modules (since they
should be relatively rare, it seemed better to add a list where each element
had a "context" value, rather than add a (usually empty) list to every scope).
This takes care of DW_TAG_imported_module, but to fully address PR14606 we'll
need to expand this to cover DW_TAG_imported_declaration too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179836 91177308-0d34-0410-b5e6-96231b3b80d8
InstFlag has a default value of 0 and will simplify the VOP3 patterns.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179829 91177308-0d34-0410-b5e6-96231b3b80d8
If the return type is a pointer and the call returns an integer, then do the
inttoptr convertions. And vice versa.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179817 91177308-0d34-0410-b5e6-96231b3b80d8
This seems to cause a stage-2 LLVM compile failure (by crashing TableGen); do
I'm disabling this for now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179807 91177308-0d34-0410-b5e6-96231b3b80d8
variant/dialect. Addresses a FIXME in the emitMnemonicAliases function.
Use and test case to come shortly.
rdar://13688439 and part of PR13340.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179804 91177308-0d34-0410-b5e6-96231b3b80d8
Many PPC instructions have a so-called 'record form' which stores to a specific
condition register the result of comparing the result of the instruction with
zero (always as a signed comparison). For integer operations on PPC64, this is
always a 64-bit comparison.
This implementation is derived from the implementation in the ARM backend;
there are some differences because PPC condition registers are allocatable
virtual registers (although the record forms always use a specific one), and we
look for a matching subtraction instruction after the compare (but before the
first use) in addition to before it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179802 91177308-0d34-0410-b5e6-96231b3b80d8
Semantics of parameters named Index and Idx were inconsistent between
"include/llvm/IR/Attributes.h", "lib/IR/AttributeImpl.h" and
"lib/IR/Attributes.cpp": sometimes these were fixed 1-based indexes of IR
parameters (or AttributeSet::ReturnIndex for IR return values or
AttributeSet::FunctionIndex for IR functions), other times they were the
internal slot for storage in the underlying AttributeSetImpl. I renamed usage of
the former to "Index" and usage of the latter to "Slot" ("Slot" was already
being used consistently for the latter in a subset of cases)
Patch by Stephen Lin!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179791 91177308-0d34-0410-b5e6-96231b3b80d8
1. Verify::VerifyParameterAttrs in "lib/IR/Verifier.cpp" and
AttrBuilder::removeFunctionOnlyAttrs in "lib/IR/Attributes.cpp" (only called
by Verify::VerifyFunctionAttrs) separately maintained a list of function-only
attribute types. I've consolidated the logic into a new function used for
both cases in "lib/IR/Verifier.cpp", so this logic is in one place (other
than the AsmParser front-end)
2. Various functions in "lib/IR/Verifier.cpp" passed AttributeSet around by
reference needlessly, as it's just a handle to an immutable pimpl body.
Patch by Stephen Lin!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179790 91177308-0d34-0410-b5e6-96231b3b80d8
In X86FastISel::X86SelectStore(), improperly aligned stores are rejected and
handled by the DAG-based ISel. However, X86FastISel::X86SelectLoad() makes
no such requirement. There doesn't appear to be an x86 architectural
correctness issue with allowing potentially unaligned store instructions.
This patch removes this restriction.
Patch by Jim Stichnot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179774 91177308-0d34-0410-b5e6-96231b3b80d8
A min/max operation is represented by a select(cmp(lt/le/gt/ge, X, Y), X, Y)
sequence in LLVM. If we see such a sequence we can treat it just as any other
commutative binary instruction and reduce it.
This appears to help bzip2 by about 1.5% on an imac12,2.
radar://12960601
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179773 91177308-0d34-0410-b5e6-96231b3b80d8
This occurs due to an alloca representing a separate ownership from the
original pointer. Thus consider the following pseudo-IR:
objc_retain(%a)
for (...) {
objc_retain(%a)
%block <- %a
F(%block)
objc_release(%block)
}
objc_release(%a)
From the perspective of the optimizer, the %block is a separate
provenance from the original %a. Thus the optimizer pairs up the inner
retain for %a and the outer release from %a, resulting in segfaults.
This is fixed by noting that the signature of a mismatch of
retain/releases inside the for loop is a Use/CanRelease top down with an
None bottom up (since bottom up the Retain-CanRelease-Use-Release
sequence is completed by the inner objc_retain, but top down due to the
differing provenance from the objc_release said sequence is not
completed). In said case in CheckForCFGHazards, we now clear the state
of %a implying that no pairing will occur.
Additionally a test case is included.
rdar://12969722
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179747 91177308-0d34-0410-b5e6-96231b3b80d8
It's sometimes beneficial to emit a testcase with the old style attribute
syntax. Allow someone to do this.
<rdar://problem/13563209>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179735 91177308-0d34-0410-b5e6-96231b3b80d8
* We only ever specialize these templates with an instantiation of ELFType,
so we don't need a template template.
* Replace LLVM_ELF_COMMA with just passing the individual parameters to the
macro. This requires a second macro for when we only have ELFT, but that
is still a small win.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179726 91177308-0d34-0410-b5e6-96231b3b80d8
unable to handle cases such as __asm mov eax, 8*-8.
This patch also attempts to simplify the state machine. Further, the error
reporting has been improved. Test cases included, but more will be added to
the clang side shortly.
rdar://13668445
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179719 91177308-0d34-0410-b5e6-96231b3b80d8
for the sdiv/srem/udiv/urem bitcode instructions. This is done for the i8,
i16, and i32 types, as well as i64 for the x86_64 target.
Patch by Jim Stichnoth
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179715 91177308-0d34-0410-b5e6-96231b3b80d8
PR15000 has a testcase where the time to compile was bordering on 30s. When I
dropped the limit value to 100, it became a much more managable 6s. The compile
time seems to increase in a roughly linear fashion based on increasing the limit
value. (See the runtimes below.)
So, let's lower the limit to 100 so that they can get a more reasonable compile
time.
Limit Value Time
----------- ----
10 0.9744s
20 1.8035s
30 2.3618s
40 2.9814s
50 3.6988s
60 4.5486s
70 4.9314s
80 5.8012s
90 6.4246s
100 7.0852s
110 7.6634s
120 8.3553s
130 9.0552s
140 9.6820s
150 9.8804s
160 10.8901s
170 10.9855s
180 12.0114s
190 12.6816s
200 13.2754s
210 13.9942s
220 13.8097s
230 14.3272s
240 15.7753s
250 15.6673s
260 16.0541s
270 16.7625s
280 17.3823s
290 18.8213s
300 18.6120s
310 20.0333s
320 19.5165s
330 20.2505s
340 20.7068s
350 21.1833s
360 22.9216s
370 22.2152s
380 23.9390s
390 23.4609s
400 24.0426s
410 24.6410s
420 26.5208s
430 27.7155s
440 26.4142s
450 28.5646s
460 27.3494s
470 29.7255s
480 29.4646s
490 30.5001s
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179713 91177308-0d34-0410-b5e6-96231b3b80d8
The reference manual defines only 5 permitted values for the immediate field of the "hint" instruction:
1. nop (imm == 0)
2. yield (imm == 1)
3. wfe (imm == 2)
4. wfi (imm == 3)
5. sev (imm == 4)
Therefore, restrict the permitted values for the "hint" instruction to 0 through 4.
Patch by Mihail Popa <Mihail.Popa@arm.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179707 91177308-0d34-0410-b5e6-96231b3b80d8
GCC complains: Core.cpp:1449:27: warning: overflow in implicit constant conversion [-Woverflow]
I'm not sure if that's really a problem here, but using the enum type is better
style anyways.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179696 91177308-0d34-0410-b5e6-96231b3b80d8
A couple of recently introduced conditional branch patterns
also need to be marked as isCodeGenOnly since they cannot
be handled by the asm parser.
No change in generated code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179690 91177308-0d34-0410-b5e6-96231b3b80d8
This patch allows the Mips assembler to parse and emit nested
expressions as instruction operands. It also extends the
expansion of memory instructions when an offset is given as
an expression.
Contributer: Vladimir Medic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179657 91177308-0d34-0410-b5e6-96231b3b80d8