I should have read that comment a little more carefully. ;)
Regression test in the works, committing in the mean time to un-break people.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205511 91177308-0d34-0410-b5e6-96231b3b80d8
This has the following advantages:
* Less code.
* The old ELF implementation was wrong for non-relocatable objects.
* The old ELF implementation (and I think MachO) was wrong for thumb.
No current testcase since this is only used from MCJIT and it only uses
relocatable objects and I don't think it supports thumb yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205508 91177308-0d34-0410-b5e6-96231b3b80d8
This code is no longer usefull, because we only compute and use the
IDom once. There is no benefit in caching it anymore.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205498 91177308-0d34-0410-b5e6-96231b3b80d8
When a vector type legalizes to a larger vector type, and the target does not
support the associated extending load (or truncating store), then legalization
will scalarize the load (or store) resulting in an associated scalarization
cost. BasicTTI::getMemoryOpCost needs to account for this.
Between this, and r205487, PowerPC on the P7 with VSX enabled shows:
MultiSource/Benchmarks/PAQ8p/paq8p: 43% speedup
SingleSource/Benchmarks/BenchmarkGame/puzzle: 51% speedup
SingleSource/UnitTests/Vectorizer/gcc-loops 28% speedup
(some of these are new; some of these, such as PAQ8p, just reverse regressions
that VSX support would trigger)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205495 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r205479.
It turns out that nm does use addresses, it is just that every reasonable
relocatable ELF object has sections with address 0. I have no idea if those
exist in reality, but it at least it shows that llvm-nm should use the name
address.
The added test was includes an unusual .o file with non 0 section addresses. I
created it by hacking ELFObjectWriter.cpp.
Really sorry for the churn.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205493 91177308-0d34-0410-b5e6-96231b3b80d8
TargetInstrInfo::findCommutedOpIndices to enable VFMA*231 commutation, rather
than abusing commuteInstruction.
Thanks very much for the suggestion guys!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205489 91177308-0d34-0410-b5e6-96231b3b80d8
For an cast (extension, etc.), the currently logic predicts a low cost if the
associated operation (keyed on the destination type) is legal (or promoted).
This is not true when the number of values required to legalize the type is
changing. For example, <8 x i16> being sign extended by <8 x i32> is not
generically cheap on PPC with VSX, even though sign extension to v4i32 is
legal, because two output v4i32 values are required compared to the single
v8i16 input value, and without custom logic in the target, this conversion will
scalarize.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205487 91177308-0d34-0410-b5e6-96231b3b80d8
opportunities in the current basic block, rather than just the last one seen.
<rdar://problem/16478629>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205481 91177308-0d34-0410-b5e6-96231b3b80d8
What llvm-nm prints depends on the file format. On ELF for example, if the
file is relocatable, it prints offsets. If it is not, it prints addresses.
Since it doesn't really need to care what it is that it is printing, use the
generic term value.
Fix or implement getSymbolValue to keep llvm-nm working.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205479 91177308-0d34-0410-b5e6-96231b3b80d8
PPCTTI::getMemoryOpCost will now make use of BasicTTI::getMemoryOpCost to
calculate the base cost of the memory access, and then adjust on top of that.
There is no functionality change from this modification, but it will become
important so that PPCTTI can take advantage of scalarization information for which
BasicTTI::getMemoryOpCost will account in the near future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205476 91177308-0d34-0410-b5e6-96231b3b80d8
on FMA3 memory operands. FMA3 instructions are VEX encoded, so they can load
from unaligned memory.
Testcase to follow, along with related patch.
<rdar://problem/16478629>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205472 91177308-0d34-0410-b5e6-96231b3b80d8
Update the subtarget information for Windows on ARM. This enables using the MC
layer to target Windows on ARM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205459 91177308-0d34-0410-b5e6-96231b3b80d8
Just pass a MachineInstr reference rather than an MBB iterator.
Creating a MachineInstr& is the first thing every implementation did
anyway.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205453 91177308-0d34-0410-b5e6-96231b3b80d8
Unlike other v6+ processors, cortex-m0 never supports unaligned accesses.
From the v6m ARM ARM:
"A3.2 Alignment support: ARMv6-M always generates a fault when an unaligned
access occurs."
rdar://16491560
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205452 91177308-0d34-0410-b5e6-96231b3b80d8
Adds the instructions ext/ext32/cins/cins32.
It also changes pop/dpop to accept the two operand version and
adds a simple pattern to generate baddu.
Tests for the two operand versions (including baddu/dmul/dpop/pop)
and the code generation pattern for baddu are included.
Reviewed by: Daniel.Sanders@imgtec.com
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205449 91177308-0d34-0410-b5e6-96231b3b80d8
Weak symbols cannot use the small code model's usual ADRP sequences since the
instruction simply may not be able to encode a value of 0.
This redirects them to use the GOT, which hopefully linkers are able to cope
with even in the static relocation model.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205426 91177308-0d34-0410-b5e6-96231b3b80d8
We were creating libcall nodes that returned an MVT::f128, when these
particular operations actually return an int of some stripe.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205425 91177308-0d34-0410-b5e6-96231b3b80d8
Some Intrinsics are overloaded to the extent that return type equality (all
that's been checked up to now) does not guarantee that the arguments are the
same. In these cases SLP vectorizer should not recurse into the operands, which
can be achieved by comparing them as "Function *" rather than simply the ID.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205424 91177308-0d34-0410-b5e6-96231b3b80d8
Again, coalescing and other optimisations swiftly made the MachineInstrs
consistent again, but when compiled at -O0 a bad INSERT_SUBREGISTER was
produced.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205423 91177308-0d34-0410-b5e6-96231b3b80d8
The previous attempt was fine with optimisations, but was actually rather
cavalier with its types. When compiled at -O0, it produced invalid COPY
MachineInstrs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205422 91177308-0d34-0410-b5e6-96231b3b80d8
ARM specific optimiztion, finding places in ARM machine code where 2 dmbs
follow one another, and eliminating one of them.
Patch by Reinoud Elhorst.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205409 91177308-0d34-0410-b5e6-96231b3b80d8
and isTargetCygwin() to isTargetWindowsCygwin() to be consistent with the
four Windows environments in Triple.h.
Suggestion by Saleem Abdulrasool!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205393 91177308-0d34-0410-b5e6-96231b3b80d8
For the purpose of calculating the cost of the loop at various vectorization
factors, we need to count dependencies of consecutive pointers as uniforms
(which means that the VF = 1 cost is used for all overall VF values).
For example, the TSVC benchmark function s173 has:
...
%3 = add nsw i64 %indvars.iv, 16000
%arrayidx8 = getelementptr inbounds %struct.GlobalData* @global_data, i64 0, i32 0, i64 %3
...
and we must realize that the add will be a scalar in order to correctly deduce
it to be profitable to vectorize this on PowerPC with VSX enabled. In fact, all
dependencies of a consecutive pointer must be a scalar (uniform), and so we
simply need to add all consecutive pointers to the worklist that currently
detects collects uniforms.
Fixes PR19296.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205387 91177308-0d34-0410-b5e6-96231b3b80d8
I'm not sure the comment in the implementation really adds a lot of
value (it's clear that we emit zero when no symbol is provided, but it
doesn't explain why we would do that). Happy to iterate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205386 91177308-0d34-0410-b5e6-96231b3b80d8
This removes the magic-number-esque code creating/retrieving the same
label for a debug_loc entry from two places and removes the last small
piece of reusable logic from emitDebugLoc so that there will be less
duplication when refactoring it into two functions (one for debug_loc,
the other for debug_loc.dwo).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205382 91177308-0d34-0410-b5e6-96231b3b80d8