Aside from a few minor latency corrections, the major change here is a new
hazard recognizer which focuses on better dispatch-group formation on the
POWER7. As with the PPC970's hazard recognizer, the most important thing it
does is avoid load-after-store hazards within the same dispatch group. It uses
the POWER7's special dispatch-group-terminating nop instruction (instead of
inserting multiple regular nop instructions). This new hazard recognizer makes
use of the scheduling dependency graph itself, built using AA information, to
robustly detect the possibility of load-after-store hazards.
significant test-suite performance changes (the error bars are 99.5% confidence
intervals based on 5 test-suite runs both with and without the change --
speedups are negative):
speedups:
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2
-0.55171% +/- 0.333168%
MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl
-17.5576% +/- 14.598%
MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
-29.5708% +/- 7.09058%
MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt
-34.9471% +/- 11.4391%
SingleSource/Benchmarks/BenchmarkGame/puzzle
-25.1347% +/- 11.0104%
SingleSource/Benchmarks/Misc/flops-8
-17.7297% +/- 9.79061%
SingleSource/Benchmarks/Shootout-C++/ary3
-35.5018% +/- 23.9458%
SingleSource/Regression/C/uint64_to_float
-56.3165% +/- 25.4234%
SingleSource/UnitTests/Vectorizer/gcc-loops
-18.5309% +/- 6.8496%
regressions:
MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000
18.351% +/- 12.156%
SingleSource/Benchmarks/Shootout-C++/methcall
27.3086% +/- 14.4733%
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@197099 91177308-0d34-0410-b5e6-96231b3b80d8
For one predicate to subsume another, they must both check the same condition
register. Failure to check this prerequisite was causing miscompiles.
Fixes PR18003.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@197089 91177308-0d34-0410-b5e6-96231b3b80d8
getSymbolWithGlobalValueBase use is to create a name of a new symbol based
on the name of an existing GV. Assert that and then remove the last call
to pass true to isImplicitlyPrivate.
This gives the mangler API a 1:1 mapping from GV to names, which is what we
need to drop the mangler dependency on the target (and use an extended
datalayout instead).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196472 91177308-0d34-0410-b5e6-96231b3b80d8
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196471 91177308-0d34-0410-b5e6-96231b3b80d8
PPCScoreboardHazardRecognizer was a subclass of ScoreboardHazardRecognizer
which did only one thing: filtered out nodes in EmitInstruction for which
DAG->getInstrDesc(SU) returned NULL. This used to be the case for PPC pseudo
instructions. As far as I can tell, this is no longer true, and so we can use
ScoreboardHazardRecognizer directly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196171 91177308-0d34-0410-b5e6-96231b3b80d8
MO_JumpTableIndex and MO_ExternalSymbol don't show up on inline asm.
Keeping parts of the old asm printer just to print inline asm to a string that
we then parse back looks like a hack.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196111 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a scheduling model for the POWER7 (P7) core, and enables the
machine-instruction scheduler when targeting the P7. Scheduling for the P7,
like earlier ooo PPC cores, requires considering both dispatch group hazards,
and functional unit resources and latencies. These are both modeled in a
combined itinerary. Dispatch group formation is still handled by the post-RA
scheduler (which still needs to be updated for the P7, but nevertheless does a
pretty good job).
One interesting aspect of this change is that I've also enabled to use of AA
duing CodeGen for the P7 (just as it is for the embedded cores). The benchmark
results seem to support this decision (see below), and while this is normally
useful for in-order cores, and not for ooo cores like the P7, I think that the
dispatch slot hazards are enough like in-order resources to make the AA useful.
Test suite significant performance differences (where negative is a speedup,
and positive is a regression) vs. the current situation:
MultiSource/Benchmarks/BitBench/drop3/drop3
with AA: N/A
without AA: -28.7614% +/- 19.8356%
(significantly against AA)
MultiSource/Benchmarks/FreeBench/neural/neural
with AA: -17.7406% +/- 11.2712%
without AA: N/A
(significantly in favor of AA)
MultiSource/Benchmarks/SciMark2-C/scimark2
with AA: -11.2079% +/- 1.80543%
without AA: -11.3263% +/- 2.79651%
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt
with AA: -41.8649% +/- 17.0053%
without AA: -34.5256% +/- 23.7072%
MultiSource/Benchmarks/mafft/pairlocalalign
with AA: 25.3016% +/- 17.8614%
without AA: 38.6629% +/- 14.9391%
(significantly in favor of AA)
MultiSource/Benchmarks/sim/sim
with AA: N/A
without AA: 13.4844% +/- 7.18195%
(significantly in favor of AA)
SingleSource/Benchmarks/BenchmarkGame/Large/fasta
with AA: 15.0664% +/- 6.70216%
without AA: 12.7747% +/- 8.43043%
SingleSource/Benchmarks/BenchmarkGame/puzzle
with AA: 82.2713% +/- 26.3567%
without AA: 75.7525% +/- 41.1842%
SingleSource/Benchmarks/Misc/flops-2
with AA: -37.1621% +/- 20.7964%
without AA: -35.2342% +/- 20.2999%
(significantly in favor of AA)
These are 99.5% confidence intervals from 5 runs per configuration. Regarding
the choice to turn on AA during CodeGen, of these results, four seem
significantly in favor of using AA, and one seems significantly against. I'm
not making this decision based on these numbers alone, but these results
seem consistent with results I have from other tests, and so I think that, on
balance, using AA is a win.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195981 91177308-0d34-0410-b5e6-96231b3b80d8
In preparation for adding scheduling definitions for the POWER7, split some PPC
itinerary classes so that the P7's latencies and hazards can be better
described. For the most part, this means differentiating indexed from non-index
pre-increment loads and stores. Also, differentiate single from
double-precision sqrt.
No functionality change intended (except for a more-specific latency for
single-precision sqrt on the A2).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195980 91177308-0d34-0410-b5e6-96231b3b80d8
Some of the older PPC processor definitions don't have associated
SchedMachineModels; correct this for the PPC440.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195949 91177308-0d34-0410-b5e6-96231b3b80d8
The operand latencies for loads and stores in the PPC440 itinerary were wrong
(the store operands are all inputs, and the "with update" (pre-increment)
instructions need a latency for the additional output).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195948 91177308-0d34-0410-b5e6-96231b3b80d8
The operand latencies for the PPC440 should be specified relative to dispatch,
not relative to the initial fetch-and-decode stages. Because most instructions
(ignoring bypass) wait in dispatch until their operands are ready, this is
modeled as reading input operands "at dispatch" (0 cycles after issue), and so
every input and output operand has 4 cycles subtracted from it.
This could alter scheduling slightly, but I don't expect a large effect.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195947 91177308-0d34-0410-b5e6-96231b3b80d8
Modeling the fetch and decode units in the PPC440 itinerary does not add
anything to the hazard detection capability (and so modeling them just wastes
compile time).
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195946 91177308-0d34-0410-b5e6-96231b3b80d8
I think, in principle, intrinsics_gen may be added explicitly.
That said, it can be added incidentally, since each target already has dependencies to llvm-tblgen.
Almost all source files depend on both CommonTaleGen and intrinsics_gen.
Explicit add_dependencies() have been pruned under lib/Target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195929 91177308-0d34-0410-b5e6-96231b3b80d8
add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS.
LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195927 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of sharing functional unit names between the various PPC itineraries,
give each core its own unit names prefixed with the core name. This follows
the convention used by other backends (such as ARM), and removes a non-obvious
ordering dependency between the various PPCSchedule*.td files.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195908 91177308-0d34-0410-b5e6-96231b3b80d8
This adds the IIC_ prefix to the instruction itinerary class names, giving the
PPC backend a naming convention for itinerary classes that is more consistent
with that used by the X86 and ARM backends.
Instruction scheduling in the PPC backend needs a bunch of cleanup and
improvement (especially for the ooo cores). This is just a preliminary step.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195890 91177308-0d34-0410-b5e6-96231b3b80d8
The instruction definitions incorrectly specified that popcntd and popcntw have
record forms; they do not. This mistake was causing invalid code generation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195272 91177308-0d34-0410-b5e6-96231b3b80d8
Masking operations (where only some number of the low bits are being kept) are
selected to rldicl(x, 0, mb). If x is a logical right shift (which would become
rldicl(y, 64-n, n)), we might be able to fold the two instructions together:
rldicl(rldicl(x, 64-n, n), 0, mb) -> rldicl(x, 64-n, mb) for n <= mb
The right shift is really a left rotate followed by a mask, and if the explicit
mask is a more-restrictive sub-mask of the mask implied by the shift, only one
rldicl is needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195185 91177308-0d34-0410-b5e6-96231b3b80d8
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file. The memory leaks in this version have been fixed. Thanks
Alexey for pointing them out.
Differential Revision: http://llvm-reviews.chandlerc.com/D2068
Reviewed by Andy
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195064 91177308-0d34-0410-b5e6-96231b3b80d8
This change is incorrect. If you delete virtual destructor of both a base class
and a subclass, then the following code:
Base *foo = new Child();
delete foo;
will not cause the destructor for members of Child class. As a result, I observe
plently of memory leaks. Notable examples I investigated are:
ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194997 91177308-0d34-0410-b5e6-96231b3b80d8
Stop folding constant adds into GEP when the type size doesn't match.
Otherwise, the adds' operands are effectively being promoted, changing the
conditions of an overflow. Results are different when:
sext(a) + sext(b) != sext(a + b)
Problem originally found on x86-64, but also fixed issues with ARM and PPC,
which used similar code.
<rdar://problem/15292280>
Patch by Duncan Exon Smith!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194840 91177308-0d34-0410-b5e6-96231b3b80d8
On non-Darwin PPC systems, we currently strip off the register name prefix
prior to instruction printing. So instead of something like this:
mr r3, r4
we print this:
mr 3, 4
The first form is the default on Darwin, and is understood by binutils, but not
yet understood by our integrated assembler. Once our integrated-as understands
full register names as well, this temporary option will be replaced by tying
this functionality to the verbose-asm option. The numeric-only form is
compatible with legacy assemblers and tools, and is also gcc's default on most
PPC systems. On the other hand, it is harder to read, and there are some
analysis tools that expect full register names.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194384 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes an old FIXME by creating a MCTargetStreamer interface
and moving the target specific functions for ARM, Mips and PPC to it.
The ARM streamer is still declared in a common place because it is
used from lib/CodeGen/ARMException.cpp, but the Mips and PPC are
completely hidden in the corresponding Target directories.
I will send an email to llvmdev with instructions on how to use this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192181 91177308-0d34-0410-b5e6-96231b3b80d8
When generating code for shared libraries, even local calls may be
intercepted, so we need a nop after the call for the linker to fix up the
TOC. Test case adapted from the one provided in PR17354.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191440 91177308-0d34-0410-b5e6-96231b3b80d8
When asked to pad an irregular number of bytes, we should fill with
zeros. This is consistent with the behavior specified in the AIX
Assembler Language Reference as well as other LLVM and binutils
assemblers.
N.B. There is a small deviation from binutils' PPC assembler:
when handling pads which are greater than 4 bytes but not mod 4,
binutils will not emit any NOP sequences at all and only use zeros.
This may or may not be a bug but there is no excellent rationale as to
why that behavior is important to emulate. If that behavior is needed,
we can change writeNopData() to behave in the same way.
This fixes PR17352.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191426 91177308-0d34-0410-b5e6-96231b3b80d8
Encodings were checked against the Power ISA documents and double
checked against binutils.
This fixes PR17350.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191419 91177308-0d34-0410-b5e6-96231b3b80d8
The binutils assembler supports a mode called DOLLAR_DOT which treats
the dollar sign token as a reference to the current program counter if
the dollar sign doesn't precede a constant or identifier.
This commit adds a new MCAsmInfo flag stating whether or not a given
target supports this interpretation of the dollar sign token; by
default, this flag is not enabled.
Further, enable this flag for PPC. The system assembler for AIX and
binutils both support using the dollar sign in this manner.
This fixes PR17353.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191368 91177308-0d34-0410-b5e6-96231b3b80d8