Add user-supplied C runtime and compiler-rt library functions to
llvm.compiler.used to protect them from premature optimization by
passes like -globalopt and -ipsccp. Calls to (seemingly unused)
runtime library functions can be added by -instcombine and instruction
lowering.
Patch by Duncan Exon Smith, thanks!
Fixes <rdar://problem/14740087>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194514 91177308-0d34-0410-b5e6-96231b3b80d8
The system LDM and STM instructions can't usually writeback to the base
register. The one exception is when an LDM is actually an exception-return
(i.e. contains PC in the register list).
(There's already a test that "ldm sp!, {r0-r3, pc}^" works, which is why there
is no positive test).
rdar://problem/15223374
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194512 91177308-0d34-0410-b5e6-96231b3b80d8
This commit significantly speeds up both bytecode and native
builds of LLVM clients (from ~20 second to sub-second link time),
and allows to invoke LLVM functions from OCaml toplevel.
The behavior for --disable-shared builds is unchanged.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194509 91177308-0d34-0410-b5e6-96231b3b80d8
Constant merge can merge a constant with implicit alignment with one that has
explicit alignment. Before this change it was assuming that the explicit
alignment was higher than the implicit one, causing the result to be under
aligned in some cases.
Fixes pr17815.
Patch by Chris Smowton!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194506 91177308-0d34-0410-b5e6-96231b3b80d8
copy in MC layer. Added the MC layer tests. Fixed triple setting in test cases.
Patch by Ana Pazos <apazos@codeaurora.org>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194501 91177308-0d34-0410-b5e6-96231b3b80d8
We already know how to fold a reload from a frameindex without
analyzing the load instruction. Generalize this to handle any
frameindex load. This streamlines the logic for rematerializing loads
from stack arguments. As a side effect, it allows stackmaps to record
a stack argument location without spilling it.
Verified no effect on codegen for llvm test-suite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194497 91177308-0d34-0410-b5e6-96231b3b80d8
Like GCC, this re-uses the 'f' constraint and a new 'w' print-modifier:
asm ("ldi.w %w0, 1", "=f"(result));
Unlike GCC, the 'w' print-modifer is not _required_ to produce the intended
output. This is a consequence of differences in the internal handling of
the registers in each compiler. To be source-compatible between the
compilers, users must use the 'w' print-modifier.
MSA registers (including control registers) are supported in clobber lists.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194476 91177308-0d34-0410-b5e6-96231b3b80d8
Upcoming commit(s) are going to add support for bseti and bnegi. This would
cause some existing tests to (correctly) change behaviour and emit a different
instruction. This patch prevents this by changing the constant used in ori and
xori tests so that they will not be matchable by the bseti and bnegi patterns
when these instructions are matchable from normal IR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194467 91177308-0d34-0410-b5e6-96231b3b80d8
ATOMIC_FENCE is lowered to a compiler barrier which is codegen only. There
is no need to emit an instructions since the XCore provides sequential
consistency.
Original patch by Richard Osborne
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194464 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r194451.
Not sure why the tests are failing on the buildbot. They run fine on my
local machine. Could it possibly be because of the endianness of the
architectures? The GCNO and GCDA files are little-endian encoded, and
llvm-cov expects it to remain that way. Is this a safe assumption?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194454 91177308-0d34-0410-b5e6-96231b3b80d8
This test compares the output of llvm-cov against a coverage file
generated by gcov. Since the source file must be in the current
directory when reading GCNO files, the test will first cd into the
Inputs directory.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194451 91177308-0d34-0410-b5e6-96231b3b80d8
Print the range of registers used with a single letter prefix.
This better matches what the shader compiler produces and
is overall less obnoxious than concatenating all of the
subregister names together.
Instead of SGPR0, it will print s0. Instead of SGPR0_SGPR1,
it will print s[0:1] and so on.
There doesn't appear to be a straightforward way
to get the actual register info in the InstPrinter,
so this parses the generated name to print with the
new syntax.
The required test changes are pretty nasty, and register
matching regexes are now worse. Since there isn't a way to
add to a variable in FileCheck, some of the tests now don't
check the exact number of registers used, but I don't think that
will be a real problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194443 91177308-0d34-0410-b5e6-96231b3b80d8
This has no material effect at this time since we don't have a direct
object emitter for mips16 and the assembler can't tell them apart. I
place a comment "16 bit inst" for those so that I can tell them apart in the
output. The constant island pass has only been minimally changed to allow
this. More complete branch work is forthcoming but this is the first
step.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194442 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes <rdar://15432754> [JS] Assertion: "Folded a def to a non-store!"
The primary purpose of anyregcc is to prevent a patchpoint's call
arguments and return value from being spilled. They must be available
in a register, although the calling convention does not pin the
register. It's up to the front end to avoid using this convention for
calls with more arguments than allocatable registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194428 91177308-0d34-0410-b5e6-96231b3b80d8
The symptom is that an assertion is triggered. The assertion was added by
me to detect the situation when value is propagated from dead blocks.
(We can certainly get rid of assertion; it is safe to do so, because propagating
value from dead block to alive join node is certainly ok.)
The root cause of this bug is : edge-splitting is conducted on the fly,
the edge being split could be a dead edge, therefore the block that
split the critial edge needs to be flagged "dead" as well.
There are 3 ways to fix this bug:
1) Get rid of the assertion as I mentioned eariler
2) When an dead edge is split, flag the inserted block "dead".
3) proactively split the critical edges connecting dead and live blocks when
new dead blocks are revealed.
This fix go for 3) with additional 2 LOC.
Testing case was added by Rafael the other day.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194424 91177308-0d34-0410-b5e6-96231b3b80d8
On non-Darwin PPC systems, we currently strip off the register name prefix
prior to instruction printing. So instead of something like this:
mr r3, r4
we print this:
mr 3, 4
The first form is the default on Darwin, and is understood by binutils, but not
yet understood by our integrated assembler. Once our integrated-as understands
full register names as well, this temporary option will be replaced by tying
this functionality to the verbose-asm option. The numeric-only form is
compatible with legacy assemblers and tools, and is also gcc's default on most
PPC systems. On the other hand, it is harder to read, and there are some
analysis tools that expect full register names.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194384 91177308-0d34-0410-b5e6-96231b3b80d8
Llvm_target.intptr_type used to implicitly use global context. As
none of other functions in OCaml bindings do, it is changed to
accept context explicitly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194381 91177308-0d34-0410-b5e6-96231b3b80d8
In historical reason, tblgen is not strictly required to be free from memory leaks.
For now, I mark them as XFAIL, they could be fixed, though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194353 91177308-0d34-0410-b5e6-96231b3b80d8
This is useful if you want to run multiple variations
of a single test, and the majority of check lines
should be the same.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194343 91177308-0d34-0410-b5e6-96231b3b80d8
formal arguments on the stack and stores created afterwards. We need this to
ensure tail call optimized function calls do not write over the argument area
of the stack before it is read out.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194309 91177308-0d34-0410-b5e6-96231b3b80d8
This patch moves the jump address materialization inside the noop slide. This
enables patching of the materialization itself or its complete removal. This
patch also adds the ability to define scratch registers that can be used safely
by the code called from the patchpoint intrinsic. At least one scratch register
is required, because that one is used for the materialization of the jump
address. This patch depends on D2009.
Differential Revision: http://llvm-reviews.chandlerc.com/D2074
Reviewed by Andy
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194306 91177308-0d34-0410-b5e6-96231b3b80d8
The idea of the AnyReg Calling Convention is to provide the call arguments in
registers, but not to force them to be placed in a paticular order into a
specified set of registers. Instead it is up tp the register allocator to assign
any register as it sees fit. The same applies to the return value (if
applicable).
Differential Revision: http://llvm-reviews.chandlerc.com/D2009
Reviewed by Andy
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194293 91177308-0d34-0410-b5e6-96231b3b80d8
On darwin, when trying to create compact unwind info, a .cfi_cfa_def
directive would case an llvm_unreachable() to be hit. Back off when we
see this directive and generate the regular DWARF style eh_frame.
rdar://15406518
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194285 91177308-0d34-0410-b5e6-96231b3b80d8
isPhysRegUsed if the unwind information is required.
Indeed, the runtime may need a correct stack to be able to unwind the call.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194271 91177308-0d34-0410-b5e6-96231b3b80d8
ARM prologues usually look like:
push {r7, lr}
sub sp, sp, #4
If code size is extremely important, this can be optimised to the single
instruction:
push {r6, r7, lr}
where we don't actually care about the contents of r6, but pushing it subtracts
4 from sp as a side effect.
This should implement such a conversion, predicated on the "minsize" function
attribute (-Oz) since I've yet to find any code it actually makes faster.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194264 91177308-0d34-0410-b5e6-96231b3b80d8
Linux cannot open directories with open(2), although cygwin and *bsd can.
Motivation: The test, Object/directory.ll, had been failing with --target=cygwin on Linux. XFAIL was improper for host issues.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194257 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Consider a GEP of:
i8* getelementptr ({ [2 x i8], i32, i8, [3 x i8] }* @main.c, i32 0, i32 0, i64 0)
If we proceeded to GEP the aforementioned object by 8, would form a GEP of:
i8* getelementptr ({ [2 x i8], i32, i8, [3 x i8] }* @main.c, i32 0, i32 0, i64 8)
Note that we would go through the first array member, causing an
out-of-bounds accesses. This is problematic because we might get fooled
if we are trying to evaluate loads using this GEP, for example, based
off of an object with a constant initializer where the array is zero.
This fixes PR17732.
Reviewers: nicholas, chandlerc, void
Reviewed By: void
CC: llvm-commits, echristo, void, aemerson
Differential Revision: http://llvm-reviews.chandlerc.com/D2093
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194220 91177308-0d34-0410-b5e6-96231b3b80d8
Patch by Michele Scandale!
Rewrite of the functions used to compute the backedge taken count of a
loop on LT and GT comparisons.
I decided to split the handling of LT and GT cases becasue the trick
"a > b == -a < -b" in some cases prevents the trip count computation
due to the multiplication by -1 on the two operands of the
comparison. This issue comes from the conservative computation of
value range of SCEVs: taking the negative SCEV of an expression that
have a small positive range (e.g. [0,31]), we would have a SCEV with a
fullset as value range.
Indeed, in the new rewritten function I tried to better handle the
maximum backedge taken count computation when MAX/MIN expression are
used to handle the cases where no entry guard is found.
Some test have been modified in order to check the new value correctly
(I manually check them and reasoning on possible overflow the new
values seem correct).
I finally added a new test case related to the multiplication by -1
issue on GT comparisons.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194116 91177308-0d34-0410-b5e6-96231b3b80d8
MorphNodeTo is not safe to call during DAG building. It eagerly
deletes dependent DAG nodes which invalidates the NodeMap. We could
expose a safe interface for morphing nodes, but I don't think it's
worth it. Just create a new MachineNode and replaceAllUsesWith.
My understaning of the SD design has been that we want to support
early target opcode selection. That isn't very well supported, but
generally works. It seems reasonable to rely on this feature even if
it isn't widely used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194102 91177308-0d34-0410-b5e6-96231b3b80d8
Cortex-M0 supports these 32-bit instructions despite being Thumb1 only
(mostly). We knew about that but not that the aliases without the default "sy"
operand were also permitted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194094 91177308-0d34-0410-b5e6-96231b3b80d8
Due to the previously added overflow checks, we can have a retain/release
relation that is one directional. This occurs specifically when we run into an
additive overflow causing us to drop state in only one direction. If that
occurs, we should bail and not optimize that retain/release instead of
asserting.
Apologies for the size of the testcase. It is necessary to cause the additive
cfg overflow to trigger.
rdar://15377890
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194083 91177308-0d34-0410-b5e6-96231b3b80d8
Submit the basic port of the rest of ARM constant islands code to Mips.
Two test cases are added which reflect the next level of functionality:
constants getting moved to water areas that are out of range from the
initial placement at the end of the function and basic blocks being split to
create water when none exists that can be used. There is a bunch of this
code that is not complete and has been marked with IN_PROGRESS. I will
finish cleaning this all up during the next week or two and submit the
rest of the test cases. I have elminated some code for dealing with
inline assembly because to me it unecessarily complicates things and
some of the newer features of llvm like function attributies and builtin
assembler give me better tools to solve the alignment issues created
there. Also, for Mips16 I even have the option of not doing constant
islands in the present of inline assembler if I chose. When everything
has been completed I will summarize the port and notify people that
are knowledgable regarding the ARM Constant Islands code so they can
review it in it's entirety if they wish.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194053 91177308-0d34-0410-b5e6-96231b3b80d8
If an inline assembly operand has multiple constraints (e.g. "Ir" for immediate
or register) and an operand modifier (E.g. "w" for "print register as wN") then
we need to decide behaviour when the modifier doesn't apply to the constraint.
Previousely produced some combination of an assertion failure and a fatal
error. GCC's behaviour appears to be to ignore the modifier and print the
operand in the default way. This patch should implement that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194024 91177308-0d34-0410-b5e6-96231b3b80d8
When the elements are extracted from a select on vectors
or a vector select, do the select on the extracted scalars
from the input if there is only one use.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194013 91177308-0d34-0410-b5e6-96231b3b80d8
In order to create an ObjectFile implementation that uses bitcode files, we
need to propagate the bitcode errors to the ObjectFile interface, so we need
to convert it to use the same error handling as ObjectFile: error_code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193996 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r193356, it caused PR17781.
A reduced test case covering this regression has been added to the test suite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193955 91177308-0d34-0410-b5e6-96231b3b80d8
This adds an SimplifyLibCalls case which converts the special __sinpi and
__cospi (float & double variants) into a __sincospi_stret where appropriate to
remove duplicated work.
Patch by Tim Northover
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193943 91177308-0d34-0410-b5e6-96231b3b80d8
There is still a long way to go for llvm-nm, but at least we now match
nm's letter output in the cases we test for.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193912 91177308-0d34-0410-b5e6-96231b3b80d8
- When selecting BLEND from vselect, the operands need swapping as due to the
difference between vselect and SSE/AVX's BLEND insn
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193900 91177308-0d34-0410-b5e6-96231b3b80d8
I hit some problems with future work due to the member subprogram of
'a_b's type having a subprogram (an implicit default ctor, !52 in the
pre-commit source) with no name. Clang now generates a name for such a
function but in this case doesn't even emit debug info for it as it is
unused (Clang never emits the body of the ctor, instead just emitting
memset if needed).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193892 91177308-0d34-0410-b5e6-96231b3b80d8
When the loop vectorizer was part of the SCC inliner pass manager gvn would
run after the loop vectorizer followed by instcombine. This way redundancy
(multiple uses) were removed and instcombine could perform scalarization on the
induction variables. Having moved the loop vectorizer to later we no longer run
any form of redundancy elimination before we perform instcombine. This caused
vectorized induction variables to survive that did not before.
On a recent iMac this helps linpack back from 6000Mflops to 7000Mflops.
This should also help lpbench and paq8p.
I ran a Release (without Asserts) build over the test-suite and did not see any
negative impact on compile time.
radar://15339680
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193891 91177308-0d34-0410-b5e6-96231b3b80d8
The point is to ensure that the attribute in question
(DW_AT_data_member_location) is associated with the prior tag, so ensure
that we don't see another tag starting between the intended tag and the
desired attribute.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193884 91177308-0d34-0410-b5e6-96231b3b80d8
In a failed attempt to allow the gnu-public-names.ll test case to not
hardcode the size of the unit that the pubnames section referred to I've
at least managed to have unit headers and pubnames headers print out in
a similar style.
This failed to achieve the desired goal because the header in a unit
specifies the length of the unit without the length element of the
header whereas the length in the pubnames includes this element, so the
numbers are off by 4 bytes. I don't know of any arithmetic powers in
FileCheck so the test case can't simply say "CU_LENGTH + 4".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193872 91177308-0d34-0410-b5e6-96231b3b80d8
If we have a pointer to a single-element struct we can still build wide loads
and stores to it (if there is no padding).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193860 91177308-0d34-0410-b5e6-96231b3b80d8
Add a Virtualization ARM subtarget feature along with adding proper build
attribute emission for Tag_Virtualization_use (encodes Virtualization and
TrustZone) and Tag_MPextension_use.
Also rework test/CodeGen/ARM/2010-10-19-mc-elf-objheader.ll testcase to
something that is more maintainable. This changes the focus of this
testcase away from testing CPU defaults (which is tested elsewhere), onto
specifically testing that attributes are encoded correctly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193859 91177308-0d34-0410-b5e6-96231b3b80d8
Fix Tag_ABI_HardFP_use build attribute to handle single precision FP,
replace deprecated Tag_ABI_HardFP_use value of 3 with 0 and also add
some tests for Tag_ABI_VFP_args.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193856 91177308-0d34-0410-b5e6-96231b3b80d8
This adds another heuristic to BPI, similar to the existing heuristic that
considers (x == 0) unlikely to be true. As suggested in the PACT'98 paper by
Deitrich, Cheng, and Hwu, -1 is often used to indicate an invalid index, and
equality comparisons with -1 are also unlikely to succeed. Local
experimentation supports this hypothesis: This yields a 1-2% speedup in the
test-suite sqlite benchmark on the PPC A2 core, with no significant
regressions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193855 91177308-0d34-0410-b5e6-96231b3b80d8
When a dependence check fails we can still try to vectorize loops with runtime
array bounds checks.
This helps linpack to vectorize a loop in dgefa. And we are back to 2x of the
scalar performance on a corei7-avx.
radar://15339680
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193853 91177308-0d34-0410-b5e6-96231b3b80d8
Given that backend does not handle "invoke asm" correctly ("invoke asm" will be
handled by SelectionDAGBuilder::visitInlineAsm, which does not have the right
setup for LPadToCallSiteMap) and we already made the assumption that inline asm
does not throw in InstCombiner::visitCallSite, we are going to make the same
assumption in Inliner to make sure we don't convert "call asm" to "invoke asm".
If it becomes necessary to add support for "invoke asm" later on, we will need
to modify the backend as well as remove the assumptions that inline asm does
not throw.
Fix rdar://15317907
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193808 91177308-0d34-0410-b5e6-96231b3b80d8
There are two ways one could implement hiding of linkonce_odr symbols in LTO:
* LLVM tells the linker which symbols can be hidden if not used from native
files.
* The linker tells LLVM which symbols are not used from other object files,
but will be put in the dso symbol table if present.
GOLD's API is the second option. It was implemented almost 1:1 in llvm by
passing the list down to internalize.
LLVM already had partial support for the first option. It is also very similar
to how ld64 handles hiding these symbols when *not* doing LTO.
This patch then
* removes the APIs for the DSO list.
* marks LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN all linkonce_odr unnamed_addr
global values and other linkonce_odr whose address is not used.
* makes the gold plugin responsible for handling the API mismatch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193800 91177308-0d34-0410-b5e6-96231b3b80d8
That way the test won't start faililng when someone adds a new attribute
and wants to use the next logical enum (38) for bitcode. The new
bitcode file tries to use the number 48 as an attribute instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193787 91177308-0d34-0410-b5e6-96231b3b80d8
Two of the tests are new test cases (cross-module-a.ll, multi-module-a.ll)
not yet supported on MIPS, while XFAIL for the other two tests was
accidentally removed in r193570 and this change reverts those lines.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193781 91177308-0d34-0410-b5e6-96231b3b80d8
We add a map in DwarfDebug to map MDNodes that are shareable across CUs to the
corresponding DIEs: MDTypeNodeToDieMap. These DIEs can be shared across CUs,
that is why we keep the maps in DwarfDebug instead of CompileUnit.
We make the assumption that if a DIE is not added to an owner yet, we assume
it belongs to the current CU. Since DIEs for the type system are added to
their owners immediately after creation, and other DIEs belong to the current
CU, the assumption should be true.
A testing case is added to show that we only create a single DIE for a type
MDNode and we use ref_addr to refer to the type DIE.
We also add a testing case to show ref_addr relocations for non-darwin
platforms.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193779 91177308-0d34-0410-b5e6-96231b3b80d8
As on other hosts, the CPU identification instruction is priveleged,
so we need to look through /proc/cpuinfo. I copied the PowerPC way of
handling "generic".
Several tests were implicitly assuming z10 and so failed on z196.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193742 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a new subtarget feature called FPARMv8 (implied by NEON), and
predicates the support of the FP instructions and registers on this feature.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193739 91177308-0d34-0410-b5e6-96231b3b80d8
When an extend more than doubles the size of the elements (e.g., a zext
from v16i8 to v16i32), the normal legalization method of splitting the
vectors will run into problems as by the time the destination vector is
legal, the source vector is illegal. The end result is the operation
often becoming scalarized, with the typical horrible performance. For
example, on x86_64, the simple input of:
define void @bar(<16 x i8> %a, <16 x i32>* %p) nounwind {
%tmp = zext <16 x i8> %a to <16 x i32>
store <16 x i32> %tmp, <16 x i32>*%p
ret void
}
Generates:
.section __TEXT,__text,regular,pure_instructions
.section __TEXT,__const
.align 5
LCPI0_0:
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.long 255 ## 0xff
.section __TEXT,__text,regular,pure_instructions
.globl _bar
.align 4, 0x90
_bar:
vpunpckhbw %xmm0, %xmm0, %xmm1
vpunpckhwd %xmm0, %xmm1, %xmm2
vpmovzxwd %xmm1, %xmm1
vinsertf128 $1, %xmm2, %ymm1, %ymm1
vmovaps LCPI0_0(%rip), %ymm2
vandps %ymm2, %ymm1, %ymm1
vpmovzxbw %xmm0, %xmm3
vpunpckhwd %xmm0, %xmm3, %xmm3
vpmovzxbd %xmm0, %xmm0
vinsertf128 $1, %xmm3, %ymm0, %ymm0
vandps %ymm2, %ymm0, %ymm0
vmovaps %ymm0, (%rdi)
vmovaps %ymm1, 32(%rdi)
vzeroupper
ret
So instead we can check if there are legal types that enable us to split
more cleverly when the input vector is already legal such that we don't
turn it into an illegal type. If the extend is such that it's more than
doubling the size of the input we check if
- the number of vector elements is even,
- the source type is legal,
- the type of a split source is illegal,
- the type of an extended (by doubling element size) source is legal, and
- the type of that extended source when split is legal.
If the conditions are met, instead of just splitting both the
destination and the source types, we create an extend that only goes up
one "step" (doubling the element width), and the continue legalizing the
rest of the operation normally. The result is that this operates as a
new, more effecient, termination condition for the loop of "split the
operation until the destination type is legal."
With this change, the above example now compiles to:
_bar:
vpxor %xmm1, %xmm1, %xmm1
vpunpcklbw %xmm1, %xmm0, %xmm2
vpunpckhwd %xmm1, %xmm2, %xmm3
vpunpcklwd %xmm1, %xmm2, %xmm2
vinsertf128 $1, %xmm3, %ymm2, %ymm2
vpunpckhbw %xmm1, %xmm0, %xmm0
vpunpckhwd %xmm1, %xmm0, %xmm3
vpunpcklwd %xmm1, %xmm0, %xmm0
vinsertf128 $1, %xmm3, %ymm0, %ymm0
vmovaps %ymm0, 32(%rdi)
vmovaps %ymm2, (%rdi)
vzeroupper
ret
This generalizes a custom lowering that was added a while back to the
ARM backend. That lowering is no longer necessary, and is removed. The
testcases for it, however, provide excellent ARM tests for this change
and so remain.
rdar://14735100
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193727 91177308-0d34-0410-b5e6-96231b3b80d8
With this patch llvm produces a weak_def_can_be_hidden for linkonce_odr
if they are also unnamed_addr or don't have their address taken.
There is not a lot of documentation about .weak_def_can_be_hidden, but
from the old discussion about linkonce_odr_auto_hide and the name of
the directive this looks correct: these symbols can be hidden.
Testing this with the ld64 in Xcode 5 linking clang reduces the number of
exported symbols from 21053 to 19049.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193718 91177308-0d34-0410-b5e6-96231b3b80d8
Also corrected the definition of the intrinsics for these instructions (the
result register is also the first operand), and added intrinsics for bsel and
bseli to clang (they already existed in the backend).
These four operations are mostly equivalent to bsel, and bseli (the difference
is which operand is tied to the result). As a result some of the tests changed
as described below.
bitwise.ll:
- bsel.v test adapted so that the mask is unknown at compile-time. This stops
it emitting bmnzi.b instead of the intended bsel.v.
- The bseli.b test now tests the right thing. Namely the case when one of the
values is an uimm8, rather than when the condition is a uimm8 (which is
covered by bmnzi.b)
compare.ll:
- bsel.v tests now (correctly) emits bmnz.v instead of bsel.v because this
is the same operation (see MSA.txt).
i8.ll
- CHECK-DAG-ized test.
- bmzi.b test now (correctly) emits equivalent bmnzi.b with swapped operands
because this is the same operation (see MSA.txt).
- bseli.b still emits bseli.b though because the immediate makes it
distinguishable from bmnzi.b.
vec.ll:
- CHECK-DAG-ized test.
- bmz.v tests now (correctly) emits bmnz.v with swapped operands (see
MSA.txt).
- bsel.v tests now (correctly) emits bmnz.v with swapped operands (see
MSA.txt).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193693 91177308-0d34-0410-b5e6-96231b3b80d8
This required correcting the definition of the bins[lr]i intrinsics because
the result is also the first operand.
It also required removing the (arbitrary) check for 32-bit immediates in
MipsSEDAGToDAGISel::selectVSplat().
Currently using binsli.d with 2 bits set in the mask doesn't select binsli.d
because the constant is legalized into a ConstantPool. Similar things can
happen with binsri.d with more than 10 bits set in the mask. The resulting
code when this happens is correct but not optimal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193687 91177308-0d34-0410-b5e6-96231b3b80d8
(or (and $a, $mask), (and $b, $inverse_mask)) => (vselect $mask, $a, $b).
where $mask is a constant splat. This allows bitwise operations to make use
of bsel.
It's also a stepping stone towards matching bins[lr], and bins[lr]i from
normal IR.
Two sets of similar tests have been added in this commit. The bsel_* functions
test the case where binsri cannot be used. The binsr_*_i functions will
start to use the binsri instruction in the next commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193682 91177308-0d34-0410-b5e6-96231b3b80d8
splat.d is implemented but this subtest is currently disabled. This is because
it is difficult to match the appropriate IR on MIPS32. There is a patch under
review that should help with this so I hope to enable the subtest soon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193680 91177308-0d34-0410-b5e6-96231b3b80d8
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.
This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. This mask has
usually the same size as the VSELECT return type (except for Intel KNL). Now the
type legalizer will split both VSELECT and SETCC.
This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.
Reviewed by Nadav
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193676 91177308-0d34-0410-b5e6-96231b3b80d8
after the DIE creation, we construct the context first.
Ensure that we create the context before we create a type so that we can add
the newly created type to the parent. Remove last use of addToContextOwner
now that it's not needed.
We use createAndAddDIE to wrap around "new DIE(". Now all shareable DIEs
should be added to their parents right after the creation.
Reviewed off-list by Eric, Thanks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193657 91177308-0d34-0410-b5e6-96231b3b80d8
Add a tag before the name attribute for readability. Use CHECK-NEXT
instead of CHECK-NOT followed by a CHECK. Add new lines to separate checking
of different DIEs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193629 91177308-0d34-0410-b5e6-96231b3b80d8
after the DIE creation, we construct the context first.
This touches creation of namespaces and global variables. The purpose is to
handle all DIE creations similarly: constructs the context first, then creates
the DIE and immediately adds the DIE to its parent.
We use createAndAddDIE to wrap around "new DIE(".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193589 91177308-0d34-0410-b5e6-96231b3b80d8
Updated a test case that assumed that <2 x double> would vectorize to use
<4 x float>.
radar://15338229
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193574 91177308-0d34-0410-b5e6-96231b3b80d8
By vectorizing a series of srl, or, ... instructions we have obfuscated the
intention so much that the backend does not know how to fold this code away.
radar://15336950
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193573 91177308-0d34-0410-b5e6-96231b3b80d8
ELF. They can overlap with the other symbols, e.g. if a source file
"foo.c" contains a function "foo" with a static variable "c".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193569 91177308-0d34-0410-b5e6-96231b3b80d8
More patches will be submitted to convert "new DIE(" to use createAddAndDIE in
DwarfCompileUnit.cpp. This will simplify implementation of addDIEEntry where
we have to decide between ref4 and ref_addr, because DIEs that can be shared
across CU will be added to a CU already.
Reviewed off-list by Eric.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193567 91177308-0d34-0410-b5e6-96231b3b80d8
llvm-mcmarkup, obj2yaml and yaml2obj were missing from the substitutions list,
causing the test suite to fail in a sandboxed environment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193559 91177308-0d34-0410-b5e6-96231b3b80d8