Corrected type for index of llvm.x86.avx2.gather.d.pd.256
from 256-bit to 128-bit.
Corrected types for src|dst|mask of llvm.x86.avx2.gather.q.ps.256
from 256-bit to 128-bit.
Support the following intrinsics:
llvm.x86.avx2.gather.d.q, llvm.x86.avx2.gather.q.q
llvm.x86.avx2.gather.d.q.256, llvm.x86.avx2.gather.q.q.256
llvm.x86.avx2.gather.d.d, llvm.x86.avx2.gather.q.d
llvm.x86.avx2.gather.d.d.256, llvm.x86.avx2.gather.q.d.256
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159402 91177308-0d34-0410-b5e6-96231b3b80d8
include/llvm/Analysis/DebugInfo.h to include/llvm/DebugInfo.h.
The reasoning is because the DebugInfo module is simply an interface to the
debug info MDNodes and has nothing to do with analysis.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159312 91177308-0d34-0410-b5e6-96231b3b80d8
It takes advantage of r159299 which introduces relocation support for N64.
elf-dump needed to be upgraded to support N64 relocations as well.
This passes make check.
Jack
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159302 91177308-0d34-0410-b5e6-96231b3b80d8
It takes advantage of r159299 which introduces relocation support for N64.
elf-dump needed to be upgraded to support N64 relocations as well.
This passes make check.
Jack
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159301 91177308-0d34-0410-b5e6-96231b3b80d8
which many Mips 64 ABIs use than for O64 which many
if not all other target ABIs use.
Most architectures have the following 64 bit relocation record format:
typedef struct
{
Elf64_Addr r_offset; /* Address of reference */
Elf64_Xword r_info; /* Symbol index and type of relocation */
} Elf64_Rel;
typedef struct
{
Elf64_Addr r_offset;
Elf64_Xword r_info;
Elf64_Sxword r_addend;
} Elf64_Rela;
Whereas N64 has the following format:
typedef struct
{
Elf64_Addr r_offset;/* Address of reference */
Elf64_Word r_sym; /* Symbol index */
Elf64_Byte r_ssym; /* Special symbol */
Elf64_Byte r_type3; /* Relocation type */
Elf64_Byte r_type2; /* Relocation type */
Elf64_Byte r_type; /* Relocation type */
} Elf64_Rel;
typedef struct
{
Elf64_Addr r_offset;/* Address of reference */
Elf64_Word r_sym; /* Symbol index */
Elf64_Byte r_ssym; /* Special symbol */
Elf64_Byte r_type3; /* Relocation type */
Elf64_Byte r_type2; /* Relocation type */
Elf64_Byte r_type; /* Relocation type */
Elf64_Sxword r_addend;
} Elf64_Rela;
The structure is the same size, but the r_info data element
is now 5 separate elements. Besides the content aspects,
endian byte reordering will be different for the area with
each element being endianized separately.
I treat this as generic and continue to pass r_type as
an integer masking and unmasking the byte sized N64
values for N64 mode. I've implemented this and it causes no
affect on other current targets.
This passes make check.
Jack
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159299 91177308-0d34-0410-b5e6-96231b3b80d8
up to r158925 were handled as processor specific. Making them
generic and putting tests for these modifiers in the CodeGen/Generic
directory caused a number of targets to fail.
This commit addresses that problem by having the targets call
the generic routine for generic modifiers that they don't currently
have explicit code for.
For now only generic print operands 'c' and 'n' are supported.vi
Affected files:
test/CodeGen/Generic/asm-large-immediate.ll
lib/Target/PowerPC/PPCAsmPrinter.cpp
lib/Target/NVPTX/NVPTXAsmPrinter.cpp
lib/Target/ARM/ARMAsmPrinter.cpp
lib/Target/XCore/XCoreAsmPrinter.cpp
lib/Target/X86/X86AsmPrinter.cpp
lib/Target/Hexagon/HexagonAsmPrinter.cpp
lib/Target/CellSPU/SPUAsmPrinter.cpp
lib/Target/Sparc/SparcAsmPrinter.cpp
lib/Target/MBlaze/MBlazeAsmPrinter.cpp
lib/Target/Mips/MipsAsmPrinter.cpp
MSP430 isn't represented because it did not even run with
the long existing 'c' modifier and it was not apparent what
needs to be done to get it inline asm ready.
Contributer: Jack Carter
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159203 91177308-0d34-0410-b5e6-96231b3b80d8
More condition codes are included when deciding whether to remove cmp after
a sub instruction. Specifically, we extend from GE|LT|GT|LE to
GE|LT|GT|LE|HS|LS|HI|LO|EQ|NE. If we have "sub a, b; cmp b, a; movhs", we
should be able to replace with "sub a, b; movls".
rdar: 11725965
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159166 91177308-0d34-0410-b5e6-96231b3b80d8
The function live-out registers must be live at all function returns,
and %RCX is only used by eh.return. When a function also has a normal
return, only %RAX holds a return value.
This fixes PR13188.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159116 91177308-0d34-0410-b5e6-96231b3b80d8
This allows the user/front-end to specify a model that is better
than what LLVM would choose by default. For example, a variable
might be declared as
@x = thread_local(initialexec) global i32 42
if it will not be used in a shared library that is dlopen'ed.
If the specified model isn't supported by the target, or if LLVM can
make a better choice, a different model may be used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159077 91177308-0d34-0410-b5e6-96231b3b80d8
There are patterns to handle immediates when they fit in the immediate field.
e.g. %sub = add i32 %x, -123
=> sub r0, r0, #123
Add patterns to catch immediates that do not fit but should be materialized
with a single movw instruction rather than movw + movt pair.
e.g. %sub = add i32 %x, -65535
=> movw r1, #65535
sub r0, r0, r1
rdar://11726136
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159057 91177308-0d34-0410-b5e6-96231b3b80d8
As an example of how the custom DiagnosticType can be used to provide
better operand-mismatch diagnostics, add a custom diagnostic for
the imm0_15 operand class used for several system instructions.
Update the tests to expect the improved diagnostic.
rdar://8987109
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159051 91177308-0d34-0410-b5e6-96231b3b80d8
Original commit message:
Allow up to 64 functional units per processor itinerary.
This patch changes the type used to hold the FU bitset from unsigned to uint64_t.
This will be needed for some upcoming PowerPC itineraries.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159027 91177308-0d34-0410-b5e6-96231b3b80d8
This makes it explicit when ScoreboardHazardRecognizer will be used.
"GenericItineraries" would only make sense if it contained real
itinerary values and still required ScoreboardHazardRecognizer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158963 91177308-0d34-0410-b5e6-96231b3b80d8
The code in X86TargetLowering::LowerEH_RETURN() assumes that a frame
pointer exists, but the frame pointer was forced by the presence of
llvm.eh.unwind.init which isn't guaranteed.
If llvm.eh.unwind.init is actually required in functions calling
eh.return (is it?), we should diagnose that instead of emitting bad
machine code.
This should fix the dragonegg-x86_64-linux-gcc-4.6-test bot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158961 91177308-0d34-0410-b5e6-96231b3b80d8
This is a minor drive-by fix with no robust way to unit test.
As an example see neon-div.ll:
SU(16): %Q8<def> = VMOVLsv4i32 %D17, pred:14, pred:%noreg, %Q8<imp-use,kill>
val SU(1): Latency=2 Reg=%Q8
...should be latency=1
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158960 91177308-0d34-0410-b5e6-96231b3b80d8
Minor drive by fix to cleanup latency computation. Calling
getOperandLatency with a deliberately incorrect operand index does not
give you the latency you want.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158959 91177308-0d34-0410-b5e6-96231b3b80d8
boolean flag to an enum: { Fast, Standard, Strict } (default = Standard).
This option controls the creation by optimizations of fused FP ops that store
intermediate results in higher precision than IEEE allows (E.g. FMAs). The
behavior of this option is intended to match the behaviour specified by a
soon-to-be-introduced frontend flag: '-ffuse-fp-ops'.
Fast mode - allows formation of fused FP ops whenever they're profitable.
Standard mode - allow fusion only for 'blessed' FP ops. At present the only
blessed op is the fmuladd intrinsic. In the future more blessed ops may be
added.
Strict mode - allow fusion only if/when it can be proven that the excess
precision won't effect the result.
Note: This option only controls formation of fused ops by the optimizers. Fused
operations that are explicitly requested (e.g. FMA via the llvm.fma.* intrinsic)
will always be honored, regardless of the value of this option.
Internally TargetOptions::AllowExcessFPPrecision has been replaced by
TargetOptions::AllowFPOpFusion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158956 91177308-0d34-0410-b5e6-96231b3b80d8
The existing contraction patterns are replaced with fma/fneg.
Overall functionality should be the same.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158955 91177308-0d34-0410-b5e6-96231b3b80d8
to be generic across architectures. It has the
following description in the gnu sources:
Substitute immediate value without immediate syntax
Several Architectures such as x86 have local implementations
of operand modifier 'c' which go beyond the above description
slightly. To make use of the generic modifiers without overriding
local implementation one can make a call to the base class method
for AsmPrinter::PrintAsmOperand() in the locally derived method's
"default" case in the switch statement. That way if it is already
defined locally the generic version will never get called.
This change is needed when test/CodeGen/generic/asm-large-immediate.ll
failed on a native Mips board. The test was assuming a generic
implementation was in place.
Affected files:
lib/Target/Mips/MipsAsmPrinter.cpp:
Changed the default case to call the base method.
lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
Added 'c' to the switch cases.
test/CodeGen/Mips/asm-large-immediate.ll
Mips compiled version of the generic one
Contributer: Jack Carter
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158925 91177308-0d34-0410-b5e6-96231b3b80d8
that are generated by TableGen and are already available in
MipsGenRegisterInfo.inc. Suggested by Jakob Stoklund Olesen.
Also, fix bug in function DecodeAFGR64RegisterClass.
Patch by Vladimir Medic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158846 91177308-0d34-0410-b5e6-96231b3b80d8
There is a pretty staggering amount of this in LLVM's header files, this
is not all of the instances I'm afraid. These include all of the
functions that (in my build) are used by a non-static inline (or
external) function. Specifically, these issues were caught by the new
'-Winternal-linkage-in-inline' warning.
I'll try to just clean up the remainder of the clearly redundant "static
inline" cases on functions (not methods!) defined within headers if
I can do so in a reliable way.
There were even several cases of a missing 'inline' altogether, or my
personal favorite "static bool inline". Go figure. ;]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158800 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds DAG combines to form FMAs from pairs of FADD + FMUL or
FSUB + FMUL. The combines are performed when:
(a) Either
AllowExcessFPPrecision option (-enable-excess-fp-precision for llc)
OR
UnsafeFPMath option (-enable-unsafe-fp-math)
are set, and
(b) TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) is true for the type of
the FADD/FSUB, and
(c) The FMUL only has one user (the FADD/FSUB).
If your target has fast FMA instructions you can make use of these combines by
overriding TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) to return true for
types supported by your FMA instruction, and adding patterns to match ISD::FMA
to your FMA instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158757 91177308-0d34-0410-b5e6-96231b3b80d8
The PPC::EXTSW instruction preserves the low 32 bits of its input, just
like some of the x86 instructions. Use it to reduce register pressure
when the low 32 bits have multiple uses.
This requires a small change to PeepholeOptimizer since EXTSW takes a
64-bit input register.
This is related to PR5997.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158743 91177308-0d34-0410-b5e6-96231b3b80d8
The condition code didn't actually matter for arm "b" instructions,
unlike "bl". It should just use the R_ARM_JUMP24 reloc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158722 91177308-0d34-0410-b5e6-96231b3b80d8
For processors with the G5-like instruction-grouping scheme, this helps avoid
early group termination due to a write-after-write dependency within the group.
It should also help on pipelined embedded cores.
On POWER7, over the test suite, this gives an average 0.5% speedup. The largest
speedups are:
SingleSource/Benchmarks/Stanford/Quicksort - 33%
MultiSource/Applications/d/make_dparser - 21%
MultiSource/Benchmarks/FreeBench/analyzer/analyzer - 12%
MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft - 12%
Largest slowdowns:
SingleSource/Benchmarks/Stanford/Bubblesort - 23%
MultiSource/Benchmarks/Prolangs-C++/city/city - 21%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - 16%
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode - 13%
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158719 91177308-0d34-0410-b5e6-96231b3b80d8
TargetLoweringObjectFileELF. Use this to support it on X86. Unlike ARM,
on X86 it is not easy to find out if .init_array should be used or not, so
the decision is made via TargetOptions and defaults to off.
Add a command line option to llc that enables it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158692 91177308-0d34-0410-b5e6-96231b3b80d8
This patch changes the type used to hold the FU bitset from unsigned to uint64_t.
This will be needed for some upcoming PowerPC itineraries.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158679 91177308-0d34-0410-b5e6-96231b3b80d8
The NOP, WFE, WFI, SEV and YIELD instructions are all hints w/
a different immediate value in bits [7,0]. Define a generic HINT
instruction and refactor NOP, WFI, WFI, SEV and YIELD to be
assembly aliases of that.
rdar://11600518
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158674 91177308-0d34-0410-b5e6-96231b3b80d8
when a compile time constant is known. This occurs when implicitly zero
extending function arguments from 16 bits to 32 bits. The 8 bit case doesn't
need to be handled, as the 8 bit constants are encoded directly, thereby
not needing a separate load instruction to form the constant into a register.
<rdar://problem/11481151>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158659 91177308-0d34-0410-b5e6-96231b3b80d8
This patch causes problems when both dynamic stack realignment and
dynamic allocas combine in the same function. With this patch, we no
longer build the epilog correctly, and silently restore registers from
the wrong position in the stack.
Thanks to Matt for tracking this down, and getting at least an initial
test case to Chad. I'm going to try to check a variation of that test
case in so we can easily track the fixes required.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158654 91177308-0d34-0410-b5e6-96231b3b80d8
This cleans up the method used to find trip counts in order to form CTR loops on PPC.
This refactoring allows the pass to find loops which have a constant trip count but also
happen to end with a comparison to zero. This also adds explicit FIXMEs to mark two different
classes of loops that are currently ignored.
In addition, we now search through all potential induction operations instead of just the first.
Also, we check the predicate code on the conditional branch and abort the transformation if the
code is not EQ or NE, and we then make sure that the branch to be transformed matches the
condition register defined by the comparison (multiple possible comparisons will be considered).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158607 91177308-0d34-0410-b5e6-96231b3b80d8
This patch will optimize abs(x-y)
FROM
sub, movs, rsbmi
TO
subs, rsbmi
For abs, we will use cmp instead of movs. This is necessary because we already
have an existing peephole pass which optimizes away cmp following sub.
rdar: 11633193
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158551 91177308-0d34-0410-b5e6-96231b3b80d8
to load an immediate that does not fit into 16-bit. Also, take into
consideration the global base register slot on the stack when computing the
stack size.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158430 91177308-0d34-0410-b5e6-96231b3b80d8
compute the size of basic blocks in a function. Also, define a function which
emits a series of instructions to load an immediate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158429 91177308-0d34-0410-b5e6-96231b3b80d8
object for the global base register.
This is the first of a series of patches which implements long branch expansion
for MIPS.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158427 91177308-0d34-0410-b5e6-96231b3b80d8
delay slot filler pass of MIPS, per suggestion of Jakob Stoklund Olesen.
This change, along with the fix in r158154, enables machine verification
to be run after delay slot filling.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158426 91177308-0d34-0410-b5e6-96231b3b80d8
non mips16
2. fix some comments to change OPcode->EXTEND for extended instructions
Patch by Reed Kotler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158378 91177308-0d34-0410-b5e6-96231b3b80d8
On the POWER7, adds and logical operations can also be handled
in the load/store pipelines. We'll call these IntSimple.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158366 91177308-0d34-0410-b5e6-96231b3b80d8
POWER4 is a 64-bit CPU (better matched to the 970).
The g3 is really the 750 (no altivec), the g4+ is the 74xx (not the 750).
Patch by Andreas Tobler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158363 91177308-0d34-0410-b5e6-96231b3b80d8
Original commit message:
Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName().
Both the new Linux functionality and the old Darwin functions have been moved.
This change also allows this information to be queried directly by clang and
other frontends (clang, for example, will now have real -mcpu=native support).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158349 91177308-0d34-0410-b5e6-96231b3b80d8
Both the new Linux functionality and the old Darwin functions have been moved.
This change also allows this information to be queried directly by clang and
other frontends (clang, for example, will now have real -mcpu=native support).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158337 91177308-0d34-0410-b5e6-96231b3b80d8
The PPC target feature gpul (IsGigaProcessor) was only used for one thing:
To enable the generation of the MFOCRF instruction. Furthermore, this
instruction is available on other PPC cores outside of the G5 line. This
feature now corresponds to the HasMFOCRF flag.
No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158323 91177308-0d34-0410-b5e6-96231b3b80d8
We turned off the CMN instruction because it had semantics which we weren't
getting correct. If we are comparing with an immediate, then it's okay to use
the CMN instruction.
<rdar://problem/7569620>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158302 91177308-0d34-0410-b5e6-96231b3b80d8
Over the entire test-suite, this has an insignificantly negative average
performance impact, but reduces some of the worst slowdowns from the
anti-dep. change (r158294).
Largest speedups:
SingleSource/Benchmarks/Stanford/Quicksort - 28%
SingleSource/Benchmarks/Stanford/Towers - 24%
SingleSource/Benchmarks/Shootout-C++/matrix - 23%
MultiSource/Benchmarks/SciMark2-C/scimark2 - 19%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15%
(matrix and automotive-bitcount were both in the top-5 slowdown list from the
anti-dep. change)
Largest slowdowns:
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26%
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21%
SingleSource/Benchmarks/CoyoteBench/lpbench - 20%
MultiSource/Applications/d/make_dparser - 16%
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158296 91177308-0d34-0410-b5e6-96231b3b80d8
Using 'all' instead of 'critical' would be better because it would make it easier to
satisfy the bundling constraints, but, as noted in the FIXME, that is currently not
possible with the crs.
This yields an average 1% speedup over the entire test suite (on Power 7). Largest speedups:
SingleSource/Benchmarks/Shootout-C++/moments - 40%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
SingleSource/Benchmarks/BenchmarkGame/nsieve-bits - 26%
SingleSource/Benchmarks/McGill/misr - 23%
MultiSource/Applications/JM/ldecod/ldecod - 22%
Largest slowdowns:
SingleSource/Benchmarks/Shootout-C++/matrix - -29%
SingleSource/Benchmarks/Shootout-C++/ary3 - -22%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -18%
SingleSource/Benchmarks/Shootout-C++/ary - -17%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - -15%
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158294 91177308-0d34-0410-b5e6-96231b3b80d8
The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that
would leave self-moves in the final assembly. Replacing those patterns with ones
based on the SUBREG builtins yields better-looking code.
Thanks to Jakob and Owen for their suggestions in this matter.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158283 91177308-0d34-0410-b5e6-96231b3b80d8
Tail merging had been disabled on PPC because it would disturb bundling decisions
made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions
are made during post-RA scheduling, and tail merging is generally beneficial (the
average test-suite speedup is insignificantly positive).
Largest test-suite speedups:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23%
SingleSource/Benchmarks/Shootout-C++/ary - 21%
SingleSource/Benchmarks/Stanford/Queens - 17%
Largest slowdowns:
MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22%
MultiSource/Applications/JM/ldecod/ldecod - 14%
MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9%
This is improved by using full (instead of just critical) anti-dependency breaking,
but doing so still causes miscompiles and so cannot yet be enabled by default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158259 91177308-0d34-0410-b5e6-96231b3b80d8
As Chris points out, this can now be removed!
TODO: check if the associated section on viterbi's inner loop can also be removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158224 91177308-0d34-0410-b5e6-96231b3b80d8
Thanks to Jakob's help, this now causes no new test suite failures!
Over the entire test suite, this gives an average 1% speedup. The largest speedups are:
SingleSource/Benchmarks/Misc/pi - 108%
SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
SingleSource/Benchmarks/Shootout/ary3 - 32%
SingleSource/Benchmarks/Shootout-C++/matrix - 30%
The largest slowdowns are:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30%
MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22%
MultiSource/Applications/d/make_dparser - -14%
SingleSource/Benchmarks/Shootout-C++/ary - -13%
In light of these slowdowns, additional profiling work is obviously needed!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158223 91177308-0d34-0410-b5e6-96231b3b80d8
Marking these classes as non-alocatable allows CTR loop generation to
work correctly with the block placement passes, etc. These register
classes are currently used only by some unused TCRETURN patterns.
In future cleanup, these will be removed.
Thanks again to Jakob for suggesting this fix to the CTR loop problem!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158221 91177308-0d34-0410-b5e6-96231b3b80d8
Bulk move of TargetInstrInfo implementation into
TargetInstrInfoImpl. This is dirty because the code isn't part of
TargetInstrInfoImpl class, nor should it be, because the methods are
not target hooks. However, it's the current mechanism for keeping
libTarget useful outside the backend. You'll get a not-so-nice link
error if you invoke a TargetInstrInfo method that depends on CodeGen.
The TargetInstrInfoImpl class should probably be removed since it
doesn't really solve this problem.
To really fix this, we probably need separate interfaces for the
CodeGen/nonCodeGen sides of TargetInstrInfo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158212 91177308-0d34-0410-b5e6-96231b3b80d8
The pass itself works well, but the something in the Machine* infrastructure
does not understand terminators which define registers. Without the ability
to use the block-placement pass, etc. this causes performance regressions (and
so is turned off by default). Turning off the analysis turns off the problems
with the Machine* infrastructure.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158206 91177308-0d34-0410-b5e6-96231b3b80d8
The code which tests for an induction operation cannot assume that any
ADDI instruction will have a register operand because the operand could
also be a frame index; for example:
%vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158205 91177308-0d34-0410-b5e6-96231b3b80d8
This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon
pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are
no longer otherwise used. Also, invalid preheader DebugLoc is not used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158204 91177308-0d34-0410-b5e6-96231b3b80d8
This patch will generate the following for integer ABS:
movl %edi, %eax
negl %eax
cmovll %edi, %eax
INSTEAD OF
movl %edi, %ecx
sarl $31, %ecx
leal (%rdi,%rcx), %eax
xorl %ecx, %eax
There exists a target-independent DAG combine for integer ABS, which converts
integer ABS to sar+add+xor. For X86, we match this pattern back to neg+cmov.
This is implemented in PerformXorCombine.
rdar://10695237
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158175 91177308-0d34-0410-b5e6-96231b3b80d8
This patch will optimize the following
movq %rdi, %rax
subq %rsi, %rax
cmovsq %rsi, %rdi
movq %rdi, %rax
to
cmpq %rsi, %rdi
cmovsq %rsi, %rdi
movq %rdi, %rax
Perform this optimization if the actual result of SUB is not used.
rdar: 11540023
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158126 91177308-0d34-0410-b5e6-96231b3b80d8
The commit is intended to fix rdar://11540023.
It is implemented as part of peephole optimization. We can actually implement
this in the SelectionDAG lowering phase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158122 91177308-0d34-0410-b5e6-96231b3b80d8
LLVM is now -Wunused-private-field clean except for
- lib/MC/MCDisassembler/Disassembler.h. Not sure why it keeps all those unaccessible fields.
- gtest.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158096 91177308-0d34-0410-b5e6-96231b3b80d8
There are some that I didn't remove this round because they looked like
obvious stubs. There are dead variables in gtest too, they should be
fixed upstream.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158090 91177308-0d34-0410-b5e6-96231b3b80d8
This allows a subtarget to explicitly specify the issue width and
other properties without providing pipeline stage details for every
instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157979 91177308-0d34-0410-b5e6-96231b3b80d8
when a compile time constant is known. This occurs when implicitly zero
extending function arguments from 16 bits to 32 bits.
<rdar://problem/11481151>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157966 91177308-0d34-0410-b5e6-96231b3b80d8
It seems that this no longer causes test suite failures on PPC64 (after r157159),
and often gives a performance benefit, so it can be enabled by default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157911 91177308-0d34-0410-b5e6-96231b3b80d8
This is the first of a series of patches which make changes to the backend to
emit unaligned load/store instructions (lwl,lwr,swl,swr) during instruction
selection.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157862 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change intended.
Sorry for the churn. The iterator classes are supposed to help avoid
giant commits like this one in the future. The TableGen-produced
register lists are getting quite large, and it may be necessary to
change the table representation.
This makes it possible to do so without changing all clients (again).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157854 91177308-0d34-0410-b5e6-96231b3b80d8
This patch will optimize the following:
sub r1, r3
cmp r3, r1 or cmp r1, r3
bge L1
TO
sub r1, r3
bge L1 or ble L1
If the branch instruction can use flag from "sub", then we can eliminate
the "cmp" instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157831 91177308-0d34-0410-b5e6-96231b3b80d8
This implements codegen support for accesses to thread-local variables
using the local-dynamic model, and adds a clean-up pass so that the base
address for the TLS block can be re-used between local-dynamic access on
an execution path.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157818 91177308-0d34-0410-b5e6-96231b3b80d8