This is a skeleton for a pre-RA MachineInstr scheduler strategy. Currently
it only tries to expose more parallelism for ALU instructions (this also
makes the distribution of GPR channels more uniform and increases the
chances of ALU instructions to be packed together in a single VLIW group).
Also it tries to reduce clause switching by grouping instruction of the
same kind (ALU/FETCH/CF) together.
Vincent Lejeune:
- Support for VLIW4 Slot assignement
- Recomputation of ScheduleDAG to get more parallelism opportunities
Tom Stellard:
- Fix assertion failure when trying to determine an instruction's slot
based on its destination register's class
- Fix some compiler warnings
Vincent Lejeune: [v2]
- Remove recomputation of ScheduleDAG (will be provided in a later patch)
- Improve estimation of an ALU clause size so that heuristic does not emit cf
instructions at the wrong position.
- Make schedule heuristic smarter using SUnit Depth
- Take constant read limitations into account
Vincent Lejeune: [v3]
- Fix some uninitialized values in ConstPair
- Add asserts to ensure an ALU slot is always populated
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176498 91177308-0d34-0410-b5e6-96231b3b80d8
Maintaining CONST_COPY Instructions until Pre Emit may prevent some ifcvt case
and taking them in account for scheduling is difficult for no real benefit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176488 91177308-0d34-0410-b5e6-96231b3b80d8
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
mayLoad complexify scheduling and does not bring any usefull info
as the location is not writeable at all.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176486 91177308-0d34-0410-b5e6-96231b3b80d8
one-byte NOPs. If the processor actually executes those NOPs, as it sometimes
does with aligned bundling, this can have a performance impact. From my
micro-benchmarks run on my one machine, a 15-byte NOP followed by twelve
one-byte NOPs is about 20% worse than a 15 followed by a 12. This patch
changes NOP emission to emit as many 15-byte (the maximum) as possible followed
by at most one shorter NOP.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176464 91177308-0d34-0410-b5e6-96231b3b80d8
GlobalValue linkage up to ExternalLinkage in the ExtractGV pass. This
prevents linkonce and linkonce_odr symbols from being DCE'd.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176459 91177308-0d34-0410-b5e6-96231b3b80d8
'R' An address that can be sued in a non-macro load or store.
This patch includes a positive test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176452 91177308-0d34-0410-b5e6-96231b3b80d8
* Only apply divide bypass optimization when not optimizing for size.
* Fixed bug caused by constant for 0 value of type Int32,
used dividend type to generate the constant instead.
* For atom x86-64 apply the divide bypass to use 16-bit divides instead of
64-bit divides when operand values are small enough.
* Added lit tests for 64-bit divide bypass.
Patch by Tyler Nowicki!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176442 91177308-0d34-0410-b5e6-96231b3b80d8
This adds minimalistic support for PHI nodes to llvm.objectsize() evaluation
fingers crossed so that it does break clang boostrap again..
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176408 91177308-0d34-0410-b5e6-96231b3b80d8
this is similar to getObjectSize(), but doesnt subtract the offset
tweak the BasicAA code accordingly (per PR14988)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176407 91177308-0d34-0410-b5e6-96231b3b80d8
This matters for example in following matrix multiply:
int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
int i, j, k, val;
for (i=0; i<rows; i++) {
for (j=0; j<cols; j++) {
val = 0;
for (k=0; k<cols; k++) {
val += m1[i][k] * m2[k][j];
}
m3[i][j] = val;
}
}
return(m3);
}
Taken from the test-suite benchmark Shootout.
We estimate the cost of the multiply to be 2 while we generate 9 instructions
for it and end up being quite a bit slower than the scalar version (48% on my
machine).
Also, properly differentiate between avx1 and avx2. On avx-1 we still split the
vector into 2 128bits and handle the subvector muls like above with 9
instructions.
Only on avx-2 will we have a cost of 9 for v4i64.
I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
add instead of a mul because with a mul we now no longer vectorize. I did
verify that the mul would be indeed more expensive when vectorized with 3
kernels:
for (i ...)
r += a[i] * 3;
for (i ...)
m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
and a matrix multiply.
In each case the vectorized version was considerably slower.
radar://13304919
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176403 91177308-0d34-0410-b5e6-96231b3b80d8
The LoopVectorizer often runs multiple times on the same function due to inlining.
When this happens the loop vectorizer often vectorizes the same loops multiple times, increasing code size and adding unneeded branches.
With this patch, the vectorizer during vectorization puts metadata on scalar loops and marks them as 'already vectorized' so that it knows to ignore them when it sees them a second time.
PR14448.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176399 91177308-0d34-0410-b5e6-96231b3b80d8
The sys::fs::is_directory() check is unnecessary because, if the filename is
a directory, the function will fail anyway with the same error code returned.
Remove the check to avoid an unnecessary stat call.
Someone needs to review on windows and see if the check is necessary there or not.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176386 91177308-0d34-0410-b5e6-96231b3b80d8
This patch eliminates the need to emit a constant move instruction when this
pattern is matched:
(select (setgt a, Constant), T, F)
The pattern above effectively turns into this:
(conditional-move (setlt a, Constant + 1), F, T)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176384 91177308-0d34-0410-b5e6-96231b3b80d8
This reduces the time actually spent doing string to ID conversion and shows a 10% improvement in compile time for a particularly bad case that involves ARM Neon intrinsics (these have many overloads).
Patch by Jean-Luc Duprat!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176365 91177308-0d34-0410-b5e6-96231b3b80d8
- ISD::SHL/SRL/SRA must have either both scalar or both vector operands
but TLI.getShiftAmountTy() so far only return scalar type. As a
result, backend logic assuming that breaks.
- Rename the original TLI.getShiftAmountTy() to
TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to
return target-specificed scalar type or the same vector type as the
1st operand.
- Fix most TICG logic assuming TLI.getShiftAmountTy() a simple scalar
type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176364 91177308-0d34-0410-b5e6-96231b3b80d8
dispatch code. As far as I can tell the thumb2 code is behaving as expected.
I was able to compile and run the associated test case for both arm and thumb1.
rdar://13066352
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176363 91177308-0d34-0410-b5e6-96231b3b80d8
v2: based on Michels patch, but now allows copying of all registers sizes.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176346 91177308-0d34-0410-b5e6-96231b3b80d8
It's much easier to specify the encoding with tablegen directly.
Signed-off-by: Christian König <christian.koenig@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176344 91177308-0d34-0410-b5e6-96231b3b80d8
This function will be used later when the capability to search delay slot
filling instructions in successor blocks is added. No intended functionality
changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176325 91177308-0d34-0410-b5e6-96231b3b80d8
We avoided computing DAG height/depth during Node printing because it
shouldn't depend on an otherwise valid DAG. But this has become far
too annoying for the common case of a valid DAG where we want to see
valid values. If doing the computation on-the-fly turns out to be a
problem in practice, then I'll add a mode to the diagnostics to only
force it when we're likely to have a valid DAG, otherwise explicitly
print INVALID instead of bogus numbers. For now, just go for it all
the time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176314 91177308-0d34-0410-b5e6-96231b3b80d8
SelectionDAGIsel::LowerArguments needs a function, not a basic block. So it
makes sense to pass it the function instead of extracting a basic-block from
the function and then tossing it. This is also more self-documenting (functions
have arguments, BBs don't).
In addition, added comments to a couple of Select* methods.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176305 91177308-0d34-0410-b5e6-96231b3b80d8
This was causing the folding set to fail to fold attributes, because it was
being calculated in one spot without an empty values string but here with an
empty values string.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176301 91177308-0d34-0410-b5e6-96231b3b80d8
The instcombine recognized pattern looks like:
a = b * c
d = a +/- Cst
or
a = b * c
d = Cst +/- a
When creating the new operands for fadd or fsub instruction following the related fmul, the first operand was created with the second original operand (M0 was created with C1) and the second with the first (M1 with Opnd0).
The fix consists in creating the new operands with the appropriate original operand, i.e., M0 with Opnd0 and M1 with C1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176300 91177308-0d34-0410-b5e6-96231b3b80d8
We make the cost for calling libm functions extremely high as emitting the
calls is expensive and causes spills (on x86) so performance suffers. We still
vectorize important calls like ceilf and friends on SSE4.1. and fabs.
Differential Revision: http://llvm-reviews.chandlerc.com/D466
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176287 91177308-0d34-0410-b5e6-96231b3b80d8
The work done by the post-encoder (setting architecturally unused bits to 0 as
required) can be done by the existing operand that covers the "#0.0". This
removes at least one use of the discouraged PostEncoderMethod uses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176261 91177308-0d34-0410-b5e6-96231b3b80d8
If an otherwise weak var is actually defined in this unit, it can't be
undefined at runtime so we can use normal global variable sequences (ADRP/ADD)
to access it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176259 91177308-0d34-0410-b5e6-96231b3b80d8
Shadow checks are disabled and memory loads always produce fully initialized
values in functions that don't have a sanitize_memory attribute. Value and
argument shadow is propagated as usual.
This change also updates blacklist behaviour to match the above.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176247 91177308-0d34-0410-b5e6-96231b3b80d8
to create the parent path.
This can happen if the path is a relative filename and the current directory was removed.
Thanks to Daniel D. for the hint in fixing it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176226 91177308-0d34-0410-b5e6-96231b3b80d8
This problem is exposed by r171325 which is already reverted. It is rather
hard to fabricate a testing case without it.
r171325 should *NOT* be resurrected as it has a potential problem although
this problem dosen't directly contribute to PR14988.
The bug is tracked by:
- rdar://13063553, and
- http://llvm.org/bugs/show_bug.cgi?id=14988
Thank Arnold for coming up a better solution to this problem. After
comparing this solution and my original proposal, I decided to ditch mine.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176225 91177308-0d34-0410-b5e6-96231b3b80d8
definition DIE (TAG_variable), and put AT_MIPS_linkage_name to TAG_member when
DarwinGDBCompat is true.
Darwin GDB needs AT_MIPS_linkage_name at both places to work.
Follow-up patch to r176143.
rdar://problem/13291234
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176220 91177308-0d34-0410-b5e6-96231b3b80d8
This patch disables the counters on non-debug builds. This reduces the runtime of SelectionDAGISel::SelectCodeCommon by ~5%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176214 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes an issue where trying to assemlbe valid ADR instructions would cause
LLVM to hit a failed assertion.
Patch by Keith Walker.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176189 91177308-0d34-0410-b5e6-96231b3b80d8
This properly asks TargetLibraryInfo if a call is available and if it is, it
can be translated into the corresponding LLVM builtin. We don't vectorize sqrt()
yet because I'm not sure about the semantics for negative numbers. The other
intrinsic should be exact equivalents to the libm functions.
Differential Revision: http://llvm-reviews.chandlerc.com/D465
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176188 91177308-0d34-0410-b5e6-96231b3b80d8
passing a null pointer to the function name in to GCDAProfiling, and add another
switch onto GCOVProfiling.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176173 91177308-0d34-0410-b5e6-96231b3b80d8
PR15262 reported a bug where the following instruction:
i8 getelementptr inbounds i8* bitcast ([4 x i8] addrspace(12)* @buf to i8*),
i32 2
was getting folded into:
addrspace(12)* getelementptr inbounds ([4 x i8] addrspace(12)* @buf, i32 0,
i32 2)
This caused instcombine to crash because the original instruction and
the folded instruction have different types. The issue was fixed by
disallowing bitcasts between different address spaces to be folded away.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176156 91177308-0d34-0410-b5e6-96231b3b80d8
definition DIE, to make old GDB happy.
We have a regression for old GDB when Clang uses DW_TAG_member to declare
static members inside a class, instead of DW_TAG_variable. This patch will fix
this regression.
rdar://problem/13291234
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176143 91177308-0d34-0410-b5e6-96231b3b80d8
enhancement done the trivial way; by extending inputs and truncating outputs
which is addequate for targets with little or no support for integer arithmetic
on integer types less than 32 bits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176139 91177308-0d34-0410-b5e6-96231b3b80d8