Summary:
- Conditional moves acting on 64-bit GPR's should require MIPS-IV rather than MIPS64
- ISD::MUL, and ISD::MULH[US] should be lowered on all 64-bit ISA's
Patch by David Chisnall
His work was sponsored by: DARPA, AFRL
I've added additional testcases to cover as much of the codegen changes
affecting MIPS-IV as I can. Where I've been unable to find an existing
MIPS64 testcase that can be re-used for MIPS-IV (mainly tests covering
ISD::GlobalAddress and similar), I at least agree that MIPS-IV should
behave like MIPS64. Further testcases that are fixed by this patch will follow
in my next commit. The testcases from that commit that fail for MIPS-IV without
this patch are:
LLVM :: CodeGen/Mips/2010-07-20-Switch.ll
LLVM :: CodeGen/Mips/cmov.ll
LLVM :: CodeGen/Mips/eh-dwarf-cfa.ll
LLVM :: CodeGen/Mips/largeimmprinting.ll
LLVM :: CodeGen/Mips/longbranch.ll
LLVM :: CodeGen/Mips/mips64-f128.ll
LLVM :: CodeGen/Mips/mips64directive.ll
LLVM :: CodeGen/Mips/mips64ext.ll
LLVM :: CodeGen/Mips/mips64fpldst.ll
LLVM :: CodeGen/Mips/mips64intldst.ll
LLVM :: CodeGen/Mips/mips64load-store-left-right.ll
LLVM :: CodeGen/Mips/sint-fp-store_pattern.ll
Reviewers: dsanders
Reviewed By: dsanders
CC: matheusalmeida
Differential Revision: http://reviews.llvm.org/D3343
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206183 91177308-0d34-0410-b5e6-96231b3b80d8
Code change is because optimizeCompareInstr didn't know how to pull the
condition code out of FCSEL instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206171 91177308-0d34-0410-b5e6-96231b3b80d8
There was one definite issue in ARM64 (the off-by-1 check for whether
a shift could be folded in) and one difference that is probably
correct: ARM64 didn't fold nodes with multiple uses into the
arithmetic operations unless optimising for code size.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206168 91177308-0d34-0410-b5e6-96231b3b80d8
This transformation is only valid when being used for an EQ or NE
comparison since the flags change otherwise.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206167 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Previously loadImmediate() would produce MKMSK instructions with invalid
immediate values such as mkmsk r0, 9. Fix this by checking the mask size
is valid.
Reviewers: robertlytton
Reviewed By: robertlytton
CC: llvm-commits
Differential Revision: http://reviews.llvm.org/D3289
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206163 91177308-0d34-0410-b5e6-96231b3b80d8
BasicTTI::getMemoryOpCost must explicitly check for non-simple types; setting
AllowUnknown=true with TLI->getSimpleValueType is not sufficient because, for
example, non-power-of-two vector types return non-simple EVTs (not MVT::Other).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206150 91177308-0d34-0410-b5e6-96231b3b80d8
If a filename is a multiple of 18 characters, there will be no null-terminator.
This will result in an invalid access by the constructed StringRef. Add a test
case to exercise this and fix that handling. Address this same vulnerability in
llvm-readobj as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206145 91177308-0d34-0410-b5e6-96231b3b80d8
Implements the various TTI functions to enable constant hoisting on PPC. The
only significant test-suite change is this:
MultiSource/Benchmarks/VersaBench/bmm/bmm - 20% speedup
(which essentially reverses the slowdown from r206120).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206141 91177308-0d34-0410-b5e6-96231b3b80d8
If multiplication involves zero-extended arguments and the result is
compared as in the patterns:
%mul32 = trunc i64 %mul64 to i32
%zext = zext i32 %mul32 to i64
%overflow = icmp ne i64 %mul64, %zext
or
%overflow = icmp ugt i64 %mul64 , 0xffffffff
then the multiplication may be replaced by call to umul.with.overflow.
This change fixes PR4917 and PR4918.
Differential Revision: http://llvm-reviews.chandlerc.com/D2814
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206137 91177308-0d34-0410-b5e6-96231b3b80d8
We had been using the known-zero values of the operand of the or to construct
the mask for an rlwimi; this is not quite correct, but fine when the mask is
constant. When the mask is constant, then the known zeros of the operand must
be a superset of the zeros in the mask. However, when the mask is not a
constant, then there might be bits in the operand that are not known to be zero
that, at runtime, might be zero in the mask. Therefore, we check that any bits
not known to be zero *are* known to be one in the mask. Otherwise, we can't
fold the mask with the or and shift.
This was revealed as a miscompile of
MultiSource/Benchmarks/BitBench/drop3/drop3 when I started experimenting with
constant hoisting.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206136 91177308-0d34-0410-b5e6-96231b3b80d8
I found this from a particular GDB test suite case of inlining
(something similar is provided as a test case) but came across a few
other related cases (other callers of the same functions, and one other
instance of the same coding mistake in a separate function).
I'm not sure what the best way to test this is (let alone to cover the
other cases I discovered), so hopefully this sufficies - open to ideas.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206130 91177308-0d34-0410-b5e6-96231b3b80d8
Add support for file auxiliary symbol entries in COFF symbol tables. A COFF
symbol table with a FILE entry is followed by sizeof(__FILE__) / 18 auxiliary
symbol records which contain the filename. Read them and form the original
filename that the record contains. Then display the name in the output.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206126 91177308-0d34-0410-b5e6-96231b3b80d8
There is no need to check if we want to hoist the immediate value of an
shift instruction. Simply return TCC_Free right away.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206101 91177308-0d34-0410-b5e6-96231b3b80d8
Originally the cost model would give up for large constants and just return the
maximum cost. This is not what we want for constant hoisting, because some of
these constants are large in bitwidth, but are still cheap to materialize.
This commit fixes the cost model to either return TCC_Free if the cost cannot be
determined, or accurately calculate the cost even for large constants
(bitwidth > 128).
This fixes <rdar://problem/16591573>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206100 91177308-0d34-0410-b5e6-96231b3b80d8
therefore, their declaration cannot have one DW_AT_linkage_name.
The specific instances however can and should have that attribute.
This patch reorders the code in DwarfUnit::getOrCreateSubprogramDIE()
to emit linkage names for C/Dtors.
rdar://problem/16362674.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206096 91177308-0d34-0410-b5e6-96231b3b80d8
We had disabled use of TBAA during CodeGen (even when otherwise using AA)
because the ptrtoint/inttoptr used by CGP for address sinking caused BasicAA to
miss basic type punning that it should catch (and, thus, we'd fail to override
TBAA when we should).
However, when AA is in use during CodeGen, CGP now uses normal GEPs and
bitcasts, instead of ptrtoint/inttoptr, when doing address sinking. As a
result, BasicAA should be able to make us do the right thing in the face of
type-punning, and it seems safe to enable use of TBAA again. self-hosting seems
fine on PPC64/Linux on the P7, with TBAA enabled and -misched=shuffle.
Note: We still don't update TBAA when merging stack slots, although because
BasicAA should now catch all such cases, this is no longer a blocking issue.
Nevertheless, I plan to commit code to deal with this properly in the near
future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206093 91177308-0d34-0410-b5e6-96231b3b80d8
The current memory-instruction optimization logic in CGP, which sinks parts of
the address computation that can be adsorbed by the addressing mode, does this
by explicitly converting the relevant part of the address computation into
IR-level integer operations (making use of ptrtoint and inttoptr). For most
targets this is currently not a problem, but for targets wishing to make use of
IR-level aliasing analysis during CodeGen, the use of ptrtoint/inttoptr is a
problem for two reasons:
1. BasicAA becomes less powerful in the face of the ptrtoint/inttoptr
2. In cases where type-punning was used, and BasicAA was used
to override TBAA, BasicAA may no longer do so. (this had forced us to disable
all use of TBAA in CodeGen; something which we can now enable again)
This (use of GEPs instead of ptrtoint/inttoptr) is not currently enabled by
default (except for those targets that use AA during CodeGen), and so aside
from some PowerPC subtargets and SystemZ, there should be no change in
behavior. We may be able to switch completely away from the ptrtoint/inttoptr
sinking on all targets, but further testing is required.
I've doubled-up on a number of existing tests that are sensitive to the
address sinking behavior (including some store-merging tests that are
sensitive to the order of the resulting ADD operations at the SDAG level).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206092 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds patterns to generate the cls instruction ARM64. Includes tests
for 64 bit and 32 bit operands.
rdar://15611957
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206079 91177308-0d34-0410-b5e6-96231b3b80d8
fexhaustive-register-search => exhaustive-register-search
'f' is a Clang thing!
This is related to PR18747.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206075 91177308-0d34-0410-b5e6-96231b3b80d8
-fexhaustive-register-search option to allow an exhaustive search during last
chance recoloring.
This is related to PR18747
Patch by MAYUR PANDEY <mayur.p@samsung.com>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206072 91177308-0d34-0410-b5e6-96231b3b80d8
The TargetLowering::expandMUL() helper contains lowering code extracted
from the DAGTypeLegalizer and allows the SelectionDAGLegalizer to expand more
ISD::MUL patterns without having to use a library call.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206037 91177308-0d34-0410-b5e6-96231b3b80d8
This was most likely caused by an uninitialized value and the relevant code was re-written in r205292. Reverting to see if it still fails on any of the buildbots.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206033 91177308-0d34-0410-b5e6-96231b3b80d8
The patch implements support for both relocation record formats: Elf_Rel
and Elf_Rela. It is possible to define relocation against symbol only.
Relocations against sections will be implemented later. Now yaml2obj
recognizes X86_64, MIPS and Hexagon relocation types.
Example of relocation section specification:
Sections:
- Name: .text
Type: SHT_PROGBITS
Content: "0000000000000000"
AddressAlign: 16
Flags: [SHF_ALLOC]
- Name: .rel.text
Type: SHT_REL
Info: .text
AddressAlign: 4
Relocations:
- Offset: 0x1
Symbol: glob1
Type: R_MIPS_32
- Offset: 0x2
Symbol: glob2
Type: R_MIPS_CALL16
The patch reviewed by Michael Spencer, Sean Silva, Shankar Easwaran.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206017 91177308-0d34-0410-b5e6-96231b3b80d8
This removes the -segmented-stacks command line flag in favor of a
per-function "split-stack" attribute.
Patch by Luqman Aden and Alex Crichton!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205997 91177308-0d34-0410-b5e6-96231b3b80d8
Refactored stack-protector.ll to use new-style function attributes everywhere
and eliminated unnecessary attributes.
This cleanup is in preparation for an upcoming test change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205996 91177308-0d34-0410-b5e6-96231b3b80d8
To support compressing the debug_line section that contains multiple
fragments (due, I believe, to variation in choices of line table
encoding depending on the size of instruction ranges in the actual
program code) we needed to support compressing multiple MCFragments in a
single pass.
This patch implements that behavior by mutating the post-relaxed and
relocated section to be the compressed form of its former self,
including renaming the section.
This is a more flexible (and less invasive, to a degree) implementation
that will allow for other features such as "use compression only if it's
smaller than the uncompressed data".
Compressing debug_frame would be a possible further extension to this
work, but I've left it for now. The hurdle there is alignment sections -
which might require going as far as to refactor
MCAssembler.cpp:writeFragment to handle writing to a byte buffer or an
MCObjectWriter (there's already a virtual call there, so it shouldn't
add substantial compile-time cost) which could in turn involve
refactoring MCAsmBackend::writeNopData to use that same abstraction...
which involves touching all the backends. This would remove the limited
handling of fragment writing seen in
ELFObjectWriter.cpp:getUncompressedData which would be nice - but it's
more invasive.
I did discover that I (perhaps obviously) don't need to handle
relocations when I rewrite the fragments - since the relocations have
already been applied and computed (and stored into
ELFObjectWriter::Relocations) by this stage (necessarily, because we
need to have written any immediate values or assembly-time relocations
into the data already before we compress it, which we have). The test
case doesn't necessarily cover that in detail - I can add more test
coverage if that's preferred.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205990 91177308-0d34-0410-b5e6-96231b3b80d8
To support compression for debug_line and debug_frame a different
approach is required. To simplify review, revert the old implementation
and XFAIL the test case. New implementation to follow shortly.
Reverts r205059 and r204958.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205989 91177308-0d34-0410-b5e6-96231b3b80d8
alignments on vld/vst instructions. And report errors for
alignments that are not supported.
While this is a large diff and an big test case, the changes
are very straight forward. But pretty much had to touch
all vld/vst instructions changing the addrmode to one of the
new ones that where added will do the proper checking for
the specific instruction.
FYI, re-committing this with a tweak so MemoryOp's default
constructor is trivial and will work with MSVC 2012. Thanks
to Reid Kleckner and Jim Grosbach for help with the tweak.
rdar://11312406
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205986 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
It is now the smallest superset for these ISA's.
FeatureMips4 now contains FeatureFPIdx since [ls][dw]xc1 were added in MIPS-IV.
Made the FPIdx feature bit lowercase so that it can be used in the -mattr option.
Depends on D3274
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://reviews.llvm.org/D3275
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205964 91177308-0d34-0410-b5e6-96231b3b80d8
Introduce ScalarTraits::mustQuote which determines whether or not a
StringRef needs quoting before it is acceptable to output.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205955 91177308-0d34-0410-b5e6-96231b3b80d8
The immediate cost calculation code was hitting an assertion in the included
test case, because APInt was still internally 128-bits. Truncating it to 64-bits
fixed the issue.
Fixes <rdar://problem/16572521>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205947 91177308-0d34-0410-b5e6-96231b3b80d8
It doesn't build with MSVC 2012, because MSVC doesn't allow union
members that have non-trivial default constructors. This change added
'SMLoc AlignmentLoc' to MemoryOp, which made MemoryOp's default ctor
non-trivial.
This reverts commit r205930.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205944 91177308-0d34-0410-b5e6-96231b3b80d8
AVX supports logical operations using an operand from memory. Unfortunately
because integer operations were not added until AVX2 the AVX1 logical
operation's types were preventing the isel from folding the loads. In a limited
number of cases the peephole optimizer would fold the loads, but most were
missed. This patch adds explicit patterns with appropriate casts in order for
these loads to be folded.
The included test cases run on reduced examples and disable the peephole
optimizer to ensure the folds are being pattern matched.
Patch by Louis Gerbarg <lgg@apple.com>
rdar://16355124
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205938 91177308-0d34-0410-b5e6-96231b3b80d8
FoldConstantArithmetic() only knows how to deal with a few target independent
ISD opcodes. Bail early if it sees a target-specific ISD node. These node do
funny things with operand types which may break the assumptions of the code
that follows, and there's no actual folding that can be done anyway. For example,
non-constant 256 bit vector shifts on X86 have a shift-amount operand that's a
128-bit v4i32 vector regardless of what the first operand type is and that breaks
the assumption that the operand types must match.
rdar://16530923
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205937 91177308-0d34-0410-b5e6-96231b3b80d8
alignments on vld/vst instructions. And report errors for
alignments that are not supported.
While this is a large diff and an big test case, the changes
are very straight forward. But pretty much had to touch
all vld/vst instructions changing the addrmode to one of the
new ones that where added will do the proper checking for
the specific instruction.
rdar://11312406
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205930 91177308-0d34-0410-b5e6-96231b3b80d8
In AArch64 i64 to i32 truncate operation is a subregister access.
This allows more opportunities for LSR optmization to eliminate
variables of different types (i32 and i64).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205925 91177308-0d34-0410-b5e6-96231b3b80d8
sign/zero/any extensions. However a few places were not checking properly the
property of the load and were turning an indexed load into a regular extended
load. Therefore the indexed value was lost during the process and this was
triggering an assertion.
<rdar://problem/16389332>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205923 91177308-0d34-0410-b5e6-96231b3b80d8
Don't quote octal compatible strings if they are only two wide, they
aren't ambiguous.
This reverts commit r205857 which reverted r205857.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205914 91177308-0d34-0410-b5e6-96231b3b80d8
obj2yaml would fail when seeing a Weak External auxiliary record with a
characteristics field holding zero instead of one of
IMAGE_WEAK_EXTERN_SEARCH_NOLIBRARY, IMAGE_WEAK_EXTERN_SEARCH_NOLIBRARY,
or IMAGE_WEAK_EXTERN_SEARCH_NOLIBRARY.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205911 91177308-0d34-0410-b5e6-96231b3b80d8
This commit adds intrinsics and codegen support for the surface read/write and texture read instructions that take an explicit sampler parameter. Codegen operates on image handles at the PTX level, but falls back to direct replacement of handles with kernel arguments if image handles are not enabled. Note that image handles are explicitly disabled for all target architectures in this change (to be enabled later).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205907 91177308-0d34-0410-b5e6-96231b3b80d8
The vectorizer only knows how to vectorize intrinics by widening all operands by
the same factor.
Patch by Tyler Nowicki!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205855 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
They behave in accordance with the Has2008 and ABS2008 configuration bits of the processor which are used to select between the 1985 and 2008 versions of IEEE 754. In 1985 mode, these instructions are arithmetic (i.e. they raise invalid operation exceptions when given NaN), in 2008 mode they are non-arithmetic (i.e. they are copies).
nmadd.[ds], and nmsub.[ds] are still subject to -enable-no-nans-fp-math because the ISA spec does not explicitly state that they obey Has2008 and ABS2008.
Fixed the issue with the previous version of this patch (r205628). A pre-existing 'let Predicate =' statement was removing some predicates that were necessary for FP64 to behave correctly.
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D3274
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205844 91177308-0d34-0410-b5e6-96231b3b80d8
YAMLIO would turn a BinaryRef into the string 0000000004000000.
However, the leading zero causes parsers to interpret it as being an
octal number instead of a hexadecimal one.
Instead, escape such strings as needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205839 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Local common symbols were properly inserted into the .bss section.
However, putting external common symbols in the .bss section would give
them a strong definition.
Instead, encode them as undefined, external symbols who's symbol value
is equivalent to their size.
Reviewers: Bigcheese, rafael, rnk
CC: llvm-commits
Differential Revision: http://reviews.llvm.org/D3324
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205811 91177308-0d34-0410-b5e6-96231b3b80d8
This implements the target-hooks for ARM64 to enable constant hoisting.
This fixes <rdar://problem/14774662> and <rdar://problem/16381500>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205791 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This adds support in 'opt' to filter pass remarks emitted by
optimization passes. A new flag -pass-remarks specifies which
passes should emit a diagnostic when LLVMContext::emitOptimizationRemark
is invoked.
This will allow the front end to simply pass along the regular
expression from its own -Rpass flag when launching the backend.
Depends on D3227.
Reviewers: qcolombet
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D3291
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205775 91177308-0d34-0410-b5e6-96231b3b80d8
Confusingly, the NEON fmla instructions put the accumulator first but the
scalar versions put it at the end (like the fma lib function & LLVM's
intrinsic).
This should fix PR19345, assuming there's only one issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205758 91177308-0d34-0410-b5e6-96231b3b80d8
The IO normalizer would essentially lump I386 and AMD64 relocations
together. Relocation types with the same numeric value would then get
mapped in appropriately.
For example:
IMAGE_REL_AMD64_ADDR64 and IMAGE_REL_I386_DIR16 both have a numeric
value of one. We would see IMAGE_REL_I386_DIR16 in obj2yaml conversions
of object files with a machine type of IMAGE_FILE_MACHINE_AMD64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205746 91177308-0d34-0410-b5e6-96231b3b80d8
Moving these patterns from TableGen files to PerformDAGCombine()
should allow us to generate better code by eliminating unnecessary
shifts and extensions earlier.
This also fixes a bug where the MAD pattern was calling
SimplifyDemandedBits with a 24-bit mask on the first operand
even when the full pattern wasn't being matched. This occasionally
resulted in some instructions being incorrectly deleted from the
program.
v2:
- Fix bug with 64-bit mul
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205731 91177308-0d34-0410-b5e6-96231b3b80d8
into a constant size alloca by inlining.
Ran a run over the testsuite, no results out of the noise, fixes
the testcase in the PR.
PR19115.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205710 91177308-0d34-0410-b5e6-96231b3b80d8
cygwin has llvm-dwarfdump problems and isn't paying attention to the
specific xfail there.
s390x isn't matching for an unknown reason.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205708 91177308-0d34-0410-b5e6-96231b3b80d8
It affected callee's stack pop in x86. It is one of devergences between cygwin and mingw since mingw-gcc-4.6.
Added testcases to llvm/test/CodeGen/X86/win32_sret.ll for cygwin.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205688 91177308-0d34-0410-b5e6-96231b3b80d8
I really should read the spec more often (and test GCC more often too).
I just assumed that namespace aliases would be the same as using
directives, except with a name. But apparently that's not how the DWARF
standards suggests they be implemented. DWARF4 provides an example and
other non-normative text suggesting that namespace aliases be
implemented by named imported declarations intsead of named imported
modules.
So be it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205685 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a warning when linker_private or linker_private_weak is provided and
we handle it in a compatible manner.
Suggested by Chris Lattner!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205681 91177308-0d34-0410-b5e6-96231b3b80d8
This consolidates the duplicated MachO checks in the directive parsing for
various directives that are unsupported for Mach-O. The error message change is
unimportant as this restores the behaviour to that prior to the addition of the
new directive handling. Furthermore, use a more direct check for MachO
targeting rather than an indirect feature check of the assembler.
Also simplify the test execution command to avoid temporary files. Further more,
perform the check in both object and assembly emission.
Whether all non-applicable directives are handled is another question. .fnstart
is marked as being unsupported, however, the complementary .fnend is not. The
additional unwinding directives are also still honoured. This change does not
change that, though, it would be good to validate and mark them as being
unsupported if they are unsupported for the MachO emission.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205678 91177308-0d34-0410-b5e6-96231b3b80d8
This restores the linker_private and linker_private_weak lexemes to permit
translation of the deprecated lexmes. The behaviour is identical to the bitcode
handling: linker_private and linker_private_weak are handled as if private had
been specified. This enables compatibility with IR generated by LLVM 3.4.
Reported on IRC by ki9a!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205675 91177308-0d34-0410-b5e6-96231b3b80d8
This provides more realistic costs for the insert/extractelement instructions
(which are load/store pairs), accounts for the cheap unaligned Altivec load
sequence, and for unaligned VSX load/stores.
Bad news:
MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation)
SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized)
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown
Good news:
SingleSource/Benchmarks/Shootout/ary3 - 54% speedup
SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup
MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup
MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup
Unfortunately, estimating the costs of the stack-based scalarization sequences
is hard, and adjusting these costs is like a game of whac-a-mole :( I'll
revisit this again after we have better codegen for vector extloads and
truncstores and unaligned load/stores.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205658 91177308-0d34-0410-b5e6-96231b3b80d8
gcc inline asm supports specifying "cc" as a clobber of all condition
registers. Add just enough modeling of the full register to make this work.
Fixed PR19326.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205630 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
They behave in accordance with the Has2008 and ABS2008 configuration bits of the
processor which are used to select between the 1985 and 2008 versions of IEEE
754. In 1985 mode, these instructions are arithmetic (i.e. they raise invalid
operation exceptions when given NaN), in 2008 mode they are non-arithmetic
(i.e. they are copies).
nmadd.[ds], and nmsub.[ds] are still subject to -enable-no-nans-fp-math because
the ISA spec does not explicitly state that they obey Has2008 and ABS2008.
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D3274
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205628 91177308-0d34-0410-b5e6-96231b3b80d8
When LLVM sees something like (v1iN (vselect v1i1, v1iN, v1iN)) it can
decide that the result is OK (v1i64 is legal on AArch64, for example)
but it still need scalarising because of that v1i1. There was no code
to do this though.
AArch64 and ARM64 have DAG combines to produce efficient code and
prevent that occuring in *most* such situations, but there are edge
cases that they miss. This adds a legalization to cope with that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205626 91177308-0d34-0410-b5e6-96231b3b80d8
There were several overlapping problems here, and this solution is
closely inspired by the one adopted in AArch64 in r201381.
Firstly, scalarisation of v1i1 setcc operations simply fails if the
input types are legal. This is fixed in LegalizeVectorTypes.cpp this
time, and allows AArch64 code to be simplified slightly.
Second, vselect with such a setcc feeding into it ends up in
ScalarizeVectorOperand, where it's not handled. I experimented with an
implementation, but found that whatever DAG came out was rather
horrific. I think Hao's DAG combine approach is a good one for
quality, though there are edge cases it won't catch (to be fixed
separately).
Should fix PR19335.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205625 91177308-0d34-0410-b5e6-96231b3b80d8
Removed "GNU Assembler extension (compatibility)" definitions from ARMInstrInfo.td
Fixed ARMAsmParser::ParseInstruction GNU compatability branch, so it also works for thumb mode from now.
Added new tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205622 91177308-0d34-0410-b5e6-96231b3b80d8
Sorry for the breakage.
For now, it will fail in two ways:
1. To fail for targeting x86_64-mingw32.
<stdin>:131:8: note: possible intended match here
0x30830a0100000002 3 0 1 0 0 is_stmt
2. To fail not to find the target x86.
llc: : error: unable to get target for 'x86_64-unknown-unknown',
see --version and --triple.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205621 91177308-0d34-0410-b5e6-96231b3b80d8
The previous patterns directly inserted FMOV or INS instructions into
the DAG for scalar_to_vector & bitconvert patterns. This is horribly
inefficient and can generated lots more GPR <-> FPR register traffic
than necessary.
It's much better to emit instructions the register allocator
understands so it can coalesce the copies when appropriate.
It led to at least one ISelLowering hack to avoid the problems, which
was incorrect for v1i64 (FPR64 has no dsub). It can now be removed
entirely.
This should also fix PR19331.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205616 91177308-0d34-0410-b5e6-96231b3b80d8
Without this change, the llvm_unreachable kicked in. The code pattern
being spotted is rather non-canonical for 128-bit MLAs, but it can
happen and there's no point in generating sub-optimal code for it just
because it looks odd.
Should fix PR19332.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205615 91177308-0d34-0410-b5e6-96231b3b80d8
recoloring cut-offs are encountered and register allocation failed.
This is related to PR18747
Patch by MAYUR PANDEY <mayur.p@samsung.com>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205601 91177308-0d34-0410-b5e6-96231b3b80d8
Removes unnecessary casts from non-generic address spaces to the generic address
space for certain code patterns.
Patch by Jingyue Wu.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205571 91177308-0d34-0410-b5e6-96231b3b80d8
When rematerializing through truncates, the coalescer may produce instructions
with dead defs, but live implicit-defs of subregs:
E.g.
%X1<def,dead> = MOVi64imm 2, %W1<imp-def>; %X1:GPR64, %W1:GPR32
These instructions are live, and their definitions should not be rewritten.
Fixes <rdar://problem/16492408>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205565 91177308-0d34-0410-b5e6-96231b3b80d8
Acording to AMD documentation, the correct opcode for
BFE_INT is 0x5, not 0x4
Fixes Arithm/Absdiff.Mat/3 OpenCV test
Patch by: Bruno Jiménez
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205562 91177308-0d34-0410-b5e6-96231b3b80d8