Altivec only directly supports aligned loads, but the loads have a strange
property: If given an unaligned address, they truncate the address to the next
lower aligned address, and load from there. This property, along with an extra
load and some special-purpose permutation-control instructions that generate
the appropriate permutations from the original unaligned address, allow
efficient lowering of aligned loads. This code uses the trick explained in the
Apple Velocity Engine optimization overview document to prevent the needed
extra load from possibly causing a page fault if the original address happens
to be aligned.
As noted in the FIXMEs, there are several additional optimizations that can be
performed to reduce the cost of these loads even more. These will be
implemented in future commits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182691 91177308-0d34-0410-b5e6-96231b3b80d8
Other than recognizing the attribute, the patch does little else.
It changes the branch probability analyzer so that edges into
blocks postdominated by a cold function are given low weight.
Added analysis and code generation tests. Added documentation for the
new attribute.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182638 91177308-0d34-0410-b5e6-96231b3b80d8
This implements the @llvm.readcyclecounter intrinsic as the specific
MRC instruction specified in the ARM manuals for CPUs with the Power
Management extensions.
Older CPUs had slightly different methods which may also have to be
implemented eventually, but this should cover all v7 cases.
rdar://problem/13939186
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182603 91177308-0d34-0410-b5e6-96231b3b80d8
Now that the LiveDebugVariables pass is running *after* register
coalescing, the ConnectedVNInfoEqClasses class needs to deal with
DBG_VALUE instructions.
This only comes up when rematerialization during coalescing causes the
remaining live range of a virtual register to separate into two
connected components.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182592 91177308-0d34-0410-b5e6-96231b3b80d8
Allow LLVM to take advantage of shift instructions that set the ZF flag,
making instructions that test the destination superfluous.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182454 91177308-0d34-0410-b5e6-96231b3b80d8
The intrinsic calls are dropped, but the annotated value is propagated.
Fixes PR 15253
Original patch by Zeng Bin!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182387 91177308-0d34-0410-b5e6-96231b3b80d8
pic calls. These need to be there so we don't try and use helper
functions when we call those.
As part of this, make sure that we properly exclude helper functions in pic
mode when indirect calls are involved.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182343 91177308-0d34-0410-b5e6-96231b3b80d8
By default, a teq instruction is inserted after integer divide. No divide-by-zero
checks are performed if option "-mnocheck-zero-division" is used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182306 91177308-0d34-0410-b5e6-96231b3b80d8
Before this change, the SystemZ backend would use BRCL for all branches
and only consider shortening them to BRC when generating an object file.
E.g. a branch on equal would use the JGE alias of BRCL in assembly output,
but might be shortened to the JE alias of BRC in ELF output. This was
a useful first step, but it had two problems:
(1) The z assembler isn't traditionally supposed to perform branch shortening
or branch relaxation. We followed this rule by not relaxing branches
in assembler input, but that meant that generating assembly code and
then assembling it would not produce the same result as going directly
to object code; the former would give long branches everywhere, whereas
the latter would use short branches where possible.
(2) Other useful branches, like COMPARE AND BRANCH, do not have long forms.
We would need to do something else before supporting them.
(Although COMPARE AND BRANCH does not change the condition codes,
the plan is to model COMPARE AND BRANCH as a CC-clobbering instruction
during codegen, so that we can safely lower it to a separate compare
and long branch where necessary. This is not a valid transformation
for the assembler proper to make.)
This patch therefore moves branch relaxation to a pre-emit pass.
For now, calls are still shortened from BRASL to BRAS by the assembler,
although this too is not really the traditional behaviour.
The first test takes about 1.5s to run, and there are likely to be
more tests in this vein once further branch types are added. The feeling
on IRC was that 1.5s is a bit much for a single test, so I've restricted
it to SystemZ hosts for now.
The patch exposes (and fixes) some typos in the main CodeGen/SystemZ tests.
A later patch will remove the {{g}}s from that directory.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182274 91177308-0d34-0410-b5e6-96231b3b80d8
This converter currently only handles global variables in address space 0. For
these variables, they are promoted to address space 1 (global memory), and all
uses are updated to point to the result of a cvta.global instruction on the new
variable.
The motivation for this is address space 0 global variables are illegal since we
cannot declare variables in the generic address space. Instead, we place the
variables in address space 1 and explicitly convert the pointer to address
space 0. This is primarily intended to help new users who expect to be able to
place global variables in the default address space.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182254 91177308-0d34-0410-b5e6-96231b3b80d8
Introduction:
In case when stack alignment is 8 and GPRs parameter part size is not N*8:
we add padding to GPRs part, so part's last byte must be recovered at
address K*8-1.
We need to do it, since remained (stack) part of parameter starts from
address K*8, and we need to "attach" "GPRs head" without gaps to it:
Stack:
|---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8 bytes...
[ [padding] [GPRs head] ] [ ------ Tail passed via stack ------ ...
FIX:
Note, once we added padding we need to correct *all* Arg offsets that are going
after padded one. That's why we need this fix: Arg offsets were never corrected
before this patch. See new test-cases included in patch.
We also don't need to insert padding for byval parameters that are stored in GPRs
only. We need pad only last byval parameter and only in case it outsides GPRs
and stack alignment = 8.
Though, stack area, allocated for recovered byval params, must satisfy
"Size mod 8 = 0" restriction.
This patch reduces stack usage for some cases:
We can reduce ArgRegsSaveArea since inner N*4 bytes sized byval params my be
"packed" with alignment 4 in some cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182237 91177308-0d34-0410-b5e6-96231b3b80d8
Also clean up the arguments to all the MOVCC instructions so the
operands always are (true-val, false-val, cond-code).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182221 91177308-0d34-0410-b5e6-96231b3b80d8
We don't need to reject all inline asm as using the counter register (most does
not). Only those that explicitly clobber the counter register need to prevent
the transformation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182191 91177308-0d34-0410-b5e6-96231b3b80d8
The peephole tries to reorder MOV32r0 instructions such that they are
before the instruction that modifies EFLAGS.
The problem is that the peephole does not consider the case where the
instruction that modifies EFLAGS also depends on the previous state of
EFLAGS.
Instead, walk backwards until we find an instruction that has a def for
EFLAGS but does not have a use.
If we find such an instruction, insert the MOV32r0 before it.
If it cannot find such an instruction, skip the optimization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182184 91177308-0d34-0410-b5e6-96231b3b80d8
This patch matches GCC behavior: the code used to only allow unaligned
load/store on ARM for v6+ Darwin, it will now allow unaligned load/store
for v6+ Darwin as well as for v7+ on Linux and NaCl.
The distinction is made because v6 doesn't guarantee support (but LLVM
assumes that Apple controls hardware+kernel and therefore have
conformant v6 CPUs), whereas v7 does provide this guarantee (and
Linux/NaCl behave sanely).
The patch keeps the -arm-strict-align command line option, and adds
-arm-no-strict-align. They behave similarly to GCC's -mstrict-align and
-mnostrict-align.
I originally encountered this discrepancy in FastIsel tests which expect
unaligned load/store generation. Overall this should slightly improve
performance in most cases because of reduced I$ pressure.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182175 91177308-0d34-0410-b5e6-96231b3b80d8
It should increase PV substitution opportunities and lower gpr
usage (pending computations path are "flushed" sooner)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182128 91177308-0d34-0410-b5e6-96231b3b80d8
Dot4 now uses 8 scalar operands instead of 2 vectors one which allows register
coalescer to remove some unneeded COPY.
This patch also defines some structures/functions that can be used to handle
every vector instructions (CUBE, Cayman special instructions...) in a similar
fashion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182126 91177308-0d34-0410-b5e6-96231b3b80d8
Almost all instructions that takes a 128 bits reg as input (fetch, export...)
have the abilities to swizzle their argument and output. Instead of printing
default swizzle for each 128 bits reg, rename T*.XYZW to T* and let instructions
print potentially optimized swizzles themselves.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182124 91177308-0d34-0410-b5e6-96231b3b80d8
Shuffles that only move an element into position 0 of the vector are common in
the output of the loop vectorizer and often generate suboptimal code when SSSE3
is not available. Lower them to vector shifts if possible.
We still prefer palignr over psrldq because it has higher throughput on
sandybridge.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182102 91177308-0d34-0410-b5e6-96231b3b80d8