This commit makes the following changes:
- Stop issuing a warning when the triples' string representations do not match
exactly if the Triple objects generated from the strings compare equal.
- On Apple platforms, choose the triple that has the larger minimum version
number.
rdar://problem/16743513
Differential Revision: http://reviews.llvm.org/D7591
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228999 91177308-0d34-0410-b5e6-96231b3b80d8
Using this in combination with -ffunction-sections allows LLVM to output a .o
file with mulitple sections named .text. This saves space by avoiding long
unique names of the form .text.<C++ mangled name>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228980 91177308-0d34-0410-b5e6-96231b3b80d8
Constant pool entries are uniqued by their contents regardless of their
type. This means that a pshufb can have a shuffle mask which isn't a
simple array of bytes.
The code path which attempts to decode the mask didn't check for
failure, causing PR22559.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228979 91177308-0d34-0410-b5e6-96231b3b80d8
I'd modify my migration tool to account for this, but this is the only
instance of a typedef'd pointer type to a gep I found in the whole test
suite, so it didn't seem worthwhile.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228970 91177308-0d34-0410-b5e6-96231b3b80d8
The PowerPC backend has long promoted some floating-point vector operations
(such as select) to integer vector operations. Unfortunately, this behavior was
broken by r216555. When using FP_EXTEND/FP_ROUND for promotions, we must check
that both the old and new types are floating-point types. Otherwise, we must
use BITCAST as we did prior to r216555 for everything.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228969 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Implement the bulk of returning values in Mips fast-isel
Test Plan:
reatabi.ll
Passes test-suite at -O0,-O2 and with mips32r2 and mips32r1.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits, aemerson, rfuhler
Differential Revision: http://reviews.llvm.org/D5920
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228958 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Instances of the AssumptionCache are per function, so we can't re-use
the same AssumptionCache instance when recursing in the CallAnalyzer to
analyze a different function. Instead we have to pass the
AssumptionCacheTracker to the CallAnalyzer so it can get the right
AssumptionCache on demand.
Reviewers: hfinkel
Subscribers: llvm-commits, hans
Differential Revision: http://reviews.llvm.org/D7533
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228957 91177308-0d34-0410-b5e6-96231b3b80d8
We can't solve the full subgraph isomorphism problem. But we can
allow obvious cases, where for example two instructions of different
types are out of order. Due to them having different types/opcodes,
there is no ambiguity.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228931 91177308-0d34-0410-b5e6-96231b3b80d8
Now that SimplifyCFG uses TTI for the cost heuristic, we can teach BasicTTIImpl
how to query TLI in order to get a more accurate cost for truncates and
zero-extends.
Before this patch, the basic cost heuristic in TargetTransformInfoImplCRTPBase
would have conservatively returned a 'default' TCC_Basic for all zero-extends,
and TCC_Free for truncates on native types.
This patch improves the heuristic so that we query TLI (if available) to get
more accurate answers. If TLI is available, then methods 'isZExtFree' and
'isTruncateFree' can be used to check if a zext/trunc is free for the target.
Added more test cases to SimplifyCFG/X86/speculate-cttz-ctlz.ll.
With this change, SimplifyCFG is now able to speculate a 'cheap' cttz/ctlz
immediately followed by a free zext/trunc.
Differential Revision: http://reviews.llvm.org/D7585
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228923 91177308-0d34-0410-b5e6-96231b3b80d8
The changes in r223113 (ARM modified-immediate syntax) have broken
instructions like:
mov r0, #~0xffffff00
The problem is that I've added a spurious range check on the immediate
operand to ensure that it lies between INT32_MIN and UINT32_MAX. While
this range check is correct in theory, it causes problems because the
operand is stored in an int64_t (by MC). So valid 32-bit constants like
\#~0xffffff00 become out of range. The solution is to simply remove this
range check. It is not possible to validate the range of the immediate
operand with the current setup because: 1) The operand is stored in an
int64_t by MC, 2) The immediate can be of the forms #imm, #-imm, #~imm
or even #((~imm)) etc. So we just chop the value to 32 bits and use it.
Also noted that the original range check was note tested by any of the
unit tests. I've added a new test to cover #~imm kind of operands.
Change-Id: I411e90d84312a2eff01b732bb238af536c4a7599
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228920 91177308-0d34-0410-b5e6-96231b3b80d8
I've built some tests in WebRTC with and without this change. With this change number of __tsan_read/write calls is reduced by 20-40%, binary size decreases by 5-10% and execution time drops by ~5%. For example:
$ ls -l old/modules_unittests new/modules_unittests
-rwxr-x--- 1 dvyukov 41708976 Jan 20 18:35 old/modules_unittests
-rwxr-x--- 1 dvyukov 38294008 Jan 20 18:29 new/modules_unittests
$ objdump -d old/modules_unittests | egrep "callq.*__tsan_(read|write|unaligned)" | wc -l
239871
$ objdump -d new/modules_unittests | egrep "callq.*__tsan_(read|write|unaligned)" | wc -l
148365
http://reviews.llvm.org/D7069
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228917 91177308-0d34-0410-b5e6-96231b3b80d8
Using KORTESTW for comparison i1 value with zero was wrong since the instruction tests 16 bits.
KORTESTW may be used with KSHIFTL+KSHIFTR that clean the 15 upper bits.
I removed (X86cmp i1, 0) pattern and zero-extend i1 to i8 and then use TESTB.
There are some cases where i1 is in the mask register and the upper bits are already zeroed.
Then KORTESTW is the better solution, but it is subject for optimization.
Meanwhile, I'm fixing the correctness issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228916 91177308-0d34-0410-b5e6-96231b3b80d8
This gives a rough estimate of whether using pushes instead of movs is profitable, in terms of size.
We go over all calls in the MachineFunction and compute:
a) For each callsite that can not use pushes, the penalty of not having a reserved call frame.
b) For each callsite that can use pushes, the gain of actually replacing the movs with pushes (and the potential penalty of having to readjust the stack).
Differential Revision: http://reviews.llvm.org/D7561
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228915 91177308-0d34-0410-b5e6-96231b3b80d8
We used to do this DAG combine, but it's not always correct:
If the first fp_round isn't a value preserving truncation, it might
introduce a tie in the second fp_round, that wouldn't occur in the
single-step fp_round we want to fold to.
In other words, double rounding isn't the same as rounding.
Differential Revision: http://reviews.llvm.org/D7571
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228911 91177308-0d34-0410-b5e6-96231b3b80d8
We would crash if we couldn't locate a Function that either Location's
Value belonged to. Now we just print out a debug message and return
conservatively.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228901 91177308-0d34-0410-b5e6-96231b3b80d8
Apparently some code finally started to tickle this after my
canonicalization changes to instcombine.
The bug stems from trying to form a vector type out of scalars that
aren't compatible at all. In this example, from x86_mmx values. The code
in the vectorizer that checks for reasonable types whas checking for
aggregates or vectors, but there are lots of other types that should
just never reach the vectorizer.
Debugging this was made more confusing by the lie in an assert in
VectorType::get() -- it isn't that the types are *primitive*. The types
must be integer, pointer, or floating point types. No other types are
allowed.
I've improved the assert and added a helper to the vectorizer to handle
the element type validity checks. It now re-uses the VectorType static
function and then further excludes weird target-specific types that we
probably shouldn't be touching here (x86_fp80 and ppc_fp128). Neither of
these are really reachable anyways (neither 80-bit nor 128-bit things
will get vectorized) but it seems better to just eagerly exclude such
nonesense.
I've added a test case, but while it definitely covers two of the paths
through this code there may be more paths that would benefit from test
coverage. I'm not familiar enough with the SLP vectorizer to synthesize
test cases for all of these, but was able to update the code itself by
inspection.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228899 91177308-0d34-0410-b5e6-96231b3b80d8
On PowerPC, which has a full set of logical operations on (its multiple sets
of) condition-register bits, it is not profitable to break of complex
conditions feeding a jump into multiple jumps. We can turn off this feature of
CGP/SDAGBuilder by marking jumps as "expensive".
P7 test-suite speedups (no regressions):
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2
-0.626647% +/- 0.323583%
MultiSource/Benchmarks/Olden/power/power
-18.2821% +/- 8.06481%
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228895 91177308-0d34-0410-b5e6-96231b3b80d8
I mistakenly thought the liveness of each "RetVal(F, i)" depended only on F. It
actually depends on the index too, which means we need to be careful about how
the results are combined before return. In particular if a single Use returns
Live, that counts for the entire object, at the granularity we're considering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228885 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When trying to canonicalize negative constants out of
multiplication expressions, we need to check that the
constant is not INT_MIN which cannot be negated.
Reviewers: mcrosier
Reviewed By: mcrosier
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7286
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228872 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Move calls to get_input_file and release_input_file out of
getModuleForFile(). Otherwise release_input_file may end up
unmapping a view of the file while the view is still being
used by the Module (on 32-bit hosts).
Fix for PR22482.
Test Plan: Add test using --no-map-whole-files.
Reviewers: rafael, nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7539
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228842 91177308-0d34-0410-b5e6-96231b3b80d8
This is a union of these commits:
* R600/SI: Enable more tests for VI which need no changes
* R600/SI: Enable V_BCNT tests for VI
Differences:
- v_bcnt_..._e32 -> _e64
- s_load_dword* inline offset is in bytes instead of dwords
* R600/SI: Enable all tests for VI which use S_LOAD_DWORD
The inline offset is changed from dwords to bytes.
* R600/SI: Enable LDS tests for VI
Differences:
- the s_load_dword inline offset changed from dwords to bytes
- the tests checked very little on CI, so they have been fixed to check all
instructions that "SI" checked
* R600/SI: Enable lshr tests for VI
* R600/SI: Fix divrem64 tests
- "v_lshl_64" was missing "b" before "64"
- added VI-NOT checks
* R600/SI: Enable the SI.tid test for VI
* R600/SI: Enable the frem test for VI
Also, the frem_f64 checking is added for CI-VI.
* R600/SI: Add VI tests for rsq.clamped
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228830 91177308-0d34-0410-b5e6-96231b3b80d8
This patch is a follow-up of r228826 (see code-review: D7506).
Now that SimplifyCFG uses TargetTransformInfo for cost analysis, we
have to fix the cost heuristic for intrinsic calls to cttz/ctlz.
This patch defines method 'getIntrinsicCost' in BasicTTIImpl: now, BasicTTIImpl
queries TLI to check if a call to cttz/ctlz is cheap for the target.
Added test cases in Transforms/SimplifyCFG/X86 to verify that on x86,
SimplifyCFG only speculates a call to cttz/ctlz if it is cheap.
Differential Revision: http://reviews.llvm.org/D7554
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228829 91177308-0d34-0410-b5e6-96231b3b80d8
This testcase change was associated incorrectly to a followup commit in my git tree, not the base commit. Sorry!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228827 91177308-0d34-0410-b5e6-96231b3b80d8
analysis.
We're already using TTI in SimplifyCFG, so remove the hard-baked "cheapness"
heuristic and use TTI directly. Generally NFC intended, but we're using a slightly
different heuristic now so there is a slight test churn.
Test changes:
* combine-comparisons-by-cse.ll: Removed unneeded branch check.
* 2014-08-04-muls-it.ll: Test now doesn't branch but emits muleq.
* coalesce-subregs.ll: Superfluous block check.
* 2008-01-02-hoist-fp-add.ll: fadd is safe to speculate. Change to udiv.
* PhiBlockMerge.ll: Superfluous CFG checking code. Main checks still present.
* select-gep.ll: A variable GEP is not expensive, just TCC_Basic, according to the TTI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228826 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently we have Mips32 and Mips64 disassemblers and this causes the target
triple to affect the disassembly despite all the relevant information being in
the ELF header. These implementations do not need to be separate.
This patch merges them together such that the appropriate tables are checked
for the subtarget (e.g. Mips64 is checked when GP64 is enabled).
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7498
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228825 91177308-0d34-0410-b5e6-96231b3b80d8
A DAGRootSet models an induction variable being used in a rerollable
loop. For example:
x[i*3+0] = y1
x[i*3+1] = y2
x[i*3+2] = y3
Base instruction -> i*3
+---+----+
/ | \
ST[y1] +1 +2 <-- Roots
| |
ST[y2] ST[y3]
There may be multiple DAGRootSets, for example:
x[i*2+0] = ... (1)
x[i*2+1] = ... (1)
x[i*2+4] = ... (2)
x[i*2+5] = ... (2)
x[(i+1234)*2+5678] = ... (3)
x[(i+1234)*2+5679] = ... (3)
This concept is similar to the "Scale" member used previously, but allows
multiple independent sets of roots based off the same induction variable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228821 91177308-0d34-0410-b5e6-96231b3b80d8
If the landingpad of the invoke is using a personality function that
catches asynch exceptions, then it can catch a trap.
Also add some landingpads to invalid LLVM IR test cases that lack them.
Over-the-shoulder reviewed by David Majnemer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228782 91177308-0d34-0410-b5e6-96231b3b80d8
The isSigned argument of makeLibCall function was hard-coded to false
(unsigned). This caused zero extension on MIPS64 soft float.
As the result SingleSource/Benchmarks/Stanford/FloatMM test and
SingleSource/UnitTests/2005-07-17-INT-To-FP test failed.
The solution was to use the proper argument.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D7292
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228765 91177308-0d34-0410-b5e6-96231b3b80d8
Simply loading or storing the frame pointer is not sufficient for
Windows targets. Instead, create a synthetic frame object that we will
lower later. References to this synthetic object will be replaced with
the correct reference to the frame address.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228748 91177308-0d34-0410-b5e6-96231b3b80d8
Unless we meet an insertvalue on a path from some value to a return, that value
will be live if *any* of the return's components are live, so all of those
components must be added to the MaybeLiveUses.
Previously we were deleting arguments if sub-value 0 turned out to be dead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228731 91177308-0d34-0410-b5e6-96231b3b80d8
See full discussion in http://reviews.llvm.org/D7491.
We now hide the add-immediate and call instructions together in a
separate pseudo-op, which is tagged to define GPR3 and clobber the
call-killed registers. The PPCTLSDynamicCall pass prior to RA now
expands this op into the two separate addi and call ops, with explicit
definitions of GPR3 on both instructions, and explicit clobbers on the
call instruction. The pass is now marked as requiring and preserving
the LiveIntervals and SlotIndexes analyses, and fixes these up after
the replacement sequences are introduced.
Self-hosting has been verified on LE P8 and BE P7 with various
optimization levels, etc. It has also been verified with the
--no-tls-optimize flag workaround removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228725 91177308-0d34-0410-b5e6-96231b3b80d8
Some old assembly code uses the cntlz alias for cntlzw, binutils supports this,
and we should too. Fixes PR22519.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228719 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds the complete AMD Bulldozer XOP instruction set to the memory folding pattern tables for stack folding, etc.
Note: Many of the XOP instructions have multiple table entries as it can fold loads from different sources.
Differential Revision: http://reviews.llvm.org/D7484
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228685 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches X86FastISel how to select AVX instructions for scalar
float/double convert operations.
Before this patch, X86FastISel always selected legacy SSE instructions
for FPExt (from float to double) and FPTrunc (from double to float).
For example:
\code
define double @foo(float %f) {
%conv = fpext float %f to double
ret double %conv
}
\end code
Before (with -mattr=+avx -fast-isel) X86FastIsel selected a CVTSS2SDrr which is
legacy SSE:
cvtss2sd %xmm0, %xmm0
With this patch, X86FastIsel selects a VCVTSS2SDrr instead:
vcvtss2sd %xmm0, %xmm0, %xmm0
Added test fast-isel-fptrunc-fpext.ll to check both the register-register and
the register-memory float/double conversion variants.
Differential Revision: http://reviews.llvm.org/D7438
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228682 91177308-0d34-0410-b5e6-96231b3b80d8
This commit isn't using the correct context, and is transfoming calls
that are operands to loads rather than calls that are operands to an
icmp feeding into an assume. I've replied on the original review thread
with a very reduced test case and some thoughts on how to rework this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228677 91177308-0d34-0410-b5e6-96231b3b80d8
nodes when folding bitcasts of constants.
We can't fold things and then check after-the-fact whether it was legal.
Once we have formed the DAG node, arbitrary other nodes may have been
collapsed to it. There is no easy way to go back. Instead, we need to
test for the specific folding cases we're interested in and ensure those
are legal first.
This could in theory make this less powerful for bitcasting from an
integer to some vector type, but AFAICT, that can't actually happen in
the SDAG so its fine. Now, we *only* whitelist specific int->fp and
fp->int bitcasts for post-legalization folding. I've added the test case
from the PR.
(Also as a note, this does not appear to be in 3.6, no backport needed)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228656 91177308-0d34-0410-b5e6-96231b3b80d8
Win64 has specific contraints on what valid prologues and epilogues look
like. This constraint is born from the flexibility and descriptiveness
of Win64's unwind opcodes.
Prologues previously emitted by LLVM could not be represented by the
unwind opcodes, preventing operations powered by stack unwinding to
successfully work.
Differential Revision: http://reviews.llvm.org/D7520
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228641 91177308-0d34-0410-b5e6-96231b3b80d8
for any padding introduced by SROA. In particular, do not emit debug info
for an alloca that represents only the padding introduced by a previous
iteration.
Fixes PR22495.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228632 91177308-0d34-0410-b5e6-96231b3b80d8
intermediate representation. This
- increases consistency by using the same granularity everywhere
- allows for pieces < 1 byte
- DW_OP_piece didn't actually allow storing an offset.
Part of PR22495.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228631 91177308-0d34-0410-b5e6-96231b3b80d8
Remove handling for DW_TAG_constant. We started producing it in
r110656, but reverted that in r110876 without dropping the support.
Finish the job.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228623 91177308-0d34-0410-b5e6-96231b3b80d8
These tests the two optimizations for backedge insertion currently implemented and the split backedge flag which is currently off by default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228617 91177308-0d34-0410-b5e6-96231b3b80d8
This is just adding really simple tests which should have been part of the original submission. When doing so, I discovered that I'd mistakenly removed required pieces when preparing the patch for upstream submission. I fixed two such bugs in this submission.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228610 91177308-0d34-0410-b5e6-96231b3b80d8
When creating a scev for sext({X,+,Y}), scev checks if the expression
is equivalent to {sext X,+,zext Y}. If it can prove that, it also
tags the original {X,+,Y} as <nsw>, which is not correct.
In the test case I run `-scalar-evolution` twice because the bug
manifests only once SCEV has run through and seen the `sext`
expressions (and then does a in-place mutation on {X,+,Y}).
Differential Revision: http://reviews.llvm.org/D7495
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228586 91177308-0d34-0410-b5e6-96231b3b80d8
veqv (vector equivalence)
vnand
vorc
I increased the AddedComplexity for these instructions to 500 to ensure they are generated instead of issuing other VSX instructions.
Phabricator review: http://reviews.llvm.org/D7469
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228580 91177308-0d34-0410-b5e6-96231b3b80d8
For the attached test case different types are used in the ICmpInst
and SelectInst that represent the min/max expressions. However, if the
ICmpInst type is smaller a comparison with the sign/zero extended
operands would have yielded the same result. This situation might
arise after the instruction combination pass was applied.
Differential Revision: http://reviews.llvm.org/D7338
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228572 91177308-0d34-0410-b5e6-96231b3b80d8
wrong basic block.
This would happen when the result of an invoke was used by a phi instruction
in the invoke's normal destination block. An instruction to reload the invoke's
value would get inserted before the critical edge was split and a new basic
block (which is the correct insertion point for the reload) was created. This
commit fixes the bug by splitting the critical edge before all the reload
instructions are inserted.
Also, hoist up the code which computes the insertion point to the only place
that need that computation.
rdar://problem/15978721
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228566 91177308-0d34-0410-b5e6-96231b3b80d8
Some parts of DeadArgElim were only considering the individual fields
of StructTypes separately, but others (where insertvalue &
extractvalue instructions occur) also looked into ArrayTypes.
This one is an actual bug; the mismatch can lead to an argument being
considered used by a return sub-value that isn't being tracked (and
hence is dead by default). It then gets incorrectly eliminated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228559 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, a non-extractvalue use of an aggregate return value meant
the entire return was considered live (the algorithm gave up
entirely). This was correct, but conservative. It's better to actually
look at that Use, making the analysis results apply to all sub-values
under consideration.
E.g.
%val = call { i32, i32 } @whatever()
[...]
ret { i32, i32 } %val
The return is using the entire aggregate (sub-values 0 and 1). We can
still simplify @whatever if we can prove that this return is itself
unused.
Also unifies the logic slightly between aggregate and non-aggregate
cases..
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228558 91177308-0d34-0410-b5e6-96231b3b80d8
add recurrences don't overflow.
This change makes the optimization more restrictive. It still assumes
that an overflowing `add nsw` is undefined behavior; and this change
will need revisiting once we have a consistent semantics for poison
values.
Differential Revision: http://reviews.llvm.org/D7331
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228552 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The alias.scope metadata represents sets of things an instruction might
alias with. When generically combining the metadata from two
instructions the result must be the union of the original sets, because
the new instruction might alias with anything any of the original
instructions aliased with.
Reviewers: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7490
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228525 91177308-0d34-0410-b5e6-96231b3b80d8
While various DAG combines try to guarantee that a vector SETCC
operation will have the same output size as input, there's nothing
intrinsic to either creation or LegalizeTypes that actually guarantees
it, so the function needs to be ready to handle a mismatch.
Fortunately this is easy enough, just extend or truncate the naturally
compared result.
I couldn't reproduce the failure in other backends that I know have
SIMD, so it's probably only an issue for these two due to shared
heritage.
Should fix PR21645.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228518 91177308-0d34-0410-b5e6-96231b3b80d8
Turns out there is a simpler way of checking that all bytes in a word are equal
than binary decomposition.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228503 91177308-0d34-0410-b5e6-96231b3b80d8
different fields.
We can show that two GEPs off of the same (possibly multidimensional)
array of structs, into different fields, can't alias. Quoting:
For two GEPOperators GEP1 and GEP2, if we find that:
- both GEPs begin indexing from the exact same pointer;
- the last indices in both GEPs are constants, indexing into a struct;
- said indices are different, hence,the pointed-to fields are different;
- and both GEPs only index through arrays prior to that;
this lets us determine that the struct that GEP1 indexes into and the
struct that GEP2 indexes into must either precisely overlap or be
completely disjoint. Because they cannot partially overlap, indexing
into different non-overlapping fields of the struct will never alias.
The other BasicAA::aliasGEP rules worked in some cases, but not all
(for example, the i32x3 struct in the testcase).
We can add this simple ad-hoc rule to complement them.
rdar://19717375
Differential Revision: http://reviews.llvm.org/D7453
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228498 91177308-0d34-0410-b5e6-96231b3b80d8
General boolean instructions (AND, ANDN, OR, XOR) need to use a specific domain instruction (and not just the default).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228495 91177308-0d34-0410-b5e6-96231b3b80d8
COFF section flags are not idempotent:
'rd' will make a read-write section because 'd' implies write
'dr' will make a read-only section because 'r' disables write
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228490 91177308-0d34-0410-b5e6-96231b3b80d8
If a loop predecessor has an invoke as its terminator, and the return value
from that invoke is used to determine the loop iteration space, then we can't
insert a computation based on that value in the loop predecessor prior to the
terminator (oops). If there's such an invoke, or just no predecessor for that
matter, insert a new loop preheader.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228488 91177308-0d34-0410-b5e6-96231b3b80d8
Unfortunately, even with the workaround of disabling the linker TLS
optimizations in Clang restored (which has already been done), this still
breaks self-hosting on my P7 machine (-O3 -DNDEBUG -mcpu=native).
Bill is currently working on an alternate implementation to address the TLS
issue in a way that also fully elides the linker bug (which, unfortunately,
this approach did not fully), so I'm reverting this now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228460 91177308-0d34-0410-b5e6-96231b3b80d8
Remove unnecessary restriction of 24-bits for line numbers in
`MDLocation`.
The rest of the debug info schema (with the exception of local
variables) uses 32-bits for line numbers. As I introduce the
specialized nodes, it makes sense to canonicalize on one size or the
other.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228455 91177308-0d34-0410-b5e6-96231b3b80d8
An atomic store always make the target location fully initialized (in the
current implementation). It should not store origin. Initialized memory can't
have meaningful origin, and, due to origin granularity (4 bytes) there is a
chance that this extra store would overwrite meaningfull origin for an adjacent
location.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228444 91177308-0d34-0410-b5e6-96231b3b80d8
Normalize
select(C0, select(C1, a, b), b) -> select((C0 & C1), a, b)
select(C0, a, select(C1, a, b)) -> select((C0 | C1), a, b)
This normal form may enable further combines on the And/Or and shortens
paths for the values. Many targets prefer the other but can go back
easily in CodeGen.
Differential Revision: http://reviews.llvm.org/D7399
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228409 91177308-0d34-0410-b5e6-96231b3b80d8
Doesn't seem necessary anymore. I think this was mostly compensating for
not enabling WQM for texture sampling instructions.
v2: Add test coverage
Reviewed-by: Tom Stellard <tom@stellard.net>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228373 91177308-0d34-0410-b5e6-96231b3b80d8
If whole quad mode isn't enabled for these, the level of detail is
calculated incorrectly for pixels along diagonal triangle edges, causing
artifacts.
v2: Use a TSFlag instead of lots of switch cases
v3: Add test coverage
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88642
Reviewed-by: Tom Stellard <tom@stellard.net>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228372 91177308-0d34-0410-b5e6-96231b3b80d8
Since testing the function indirectly is tricky, introduce a direct
print-memderefs pass, in the same spirit as print-memdeps, which prints
dereferenceability information matched by FileCheck.
Differential Revision: http://reviews.llvm.org/D7075
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228369 91177308-0d34-0410-b5e6-96231b3b80d8