vector list parameter that is using all lanes "{d0[], d2[]}" but can
match and instruction with a ”{d0, d2}" parameter.
I’m finishing up a fix for proper checking of the unsupported
alignments on vld/vst instructions and ran into this. Thus I don’t
have a test case at this time. And adding all code that will
demonstrate the bug would obscure the very simple one line fix.
So if you would indulge me on not having a test case at this
time I’ll instead offer up a detailed explanation of what is
going on in this commit message.
This instruction:
vld2.8 {d0[], d2[]}, [r4:64]
is not legal as the alignment can only be 16 when the size is 8.
Per this documentation:
A8.8.325 VLD2 (single 2-element structure to all lanes)
<align> The alignment. It can be one of:
16 2-byte alignment, available only if <size> is 8, encoded as a = 1.
32 4-byte alignment, available only if <size> is 16, encoded as a = 1.
64 8-byte alignment, available only if <size> is 32, encoded as a = 1.
omitted Standard alignment, see Unaligned data access on page A3-108.
So when code is added to the llvm integrated assembler to not match
that instruction because of the alignment it then goes on to try to match
other instructions and comes across this:
vld2.8 {d0, d2}, [r4:64]
and and matches it. This is because of the method
ARMOperand::isVecListDPairSpaced() is missing the check of the Kind.
In this case the Kind is k_VectorListAllLanes . While the name of the method
may suggest that this is OK it really should check that the Kind is
k_VectorList.
As the method ARMOperand::isDoubleSpacedVectorAllLanes() is what was
used to match {d0[], d2[]} and correctly checks the Kind:
bool isDoubleSpacedVectorAllLanes() const {
return Kind == k_VectorListAllLanes && VectorList.isDoubleSpaced;
}
where the original ARMOperand::isVecListDPairSpaced() does not check
the Kind:
bool isVecListDPairSpaced() const {
if (isSingleSpacedVectorList()) return false;
return (ARMMCRegisterClasses[ARM::DPairSpcRegClassID]
.contains(VectorList.RegNum));
}
Jim Grosbach has reviewed the change and said: Yep, that sounds right. …
And by "right" I mean, "wow, that's a nasty latent bug I'm really, really
glad to see fixed." :)
rdar://16436683
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204861 91177308-0d34-0410-b5e6-96231b3b80d8
I've not yet updated PPCTTI because I'm not sure what the actual relative cost
is compared to the aligned uses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204848 91177308-0d34-0410-b5e6-96231b3b80d8
These patterns are dead (because v4f32 stores are currently promoted to v4i32
and stored using Altivec instructions), and also are likely not correct
(because they'd store the vector elements in the opposite order from that
assumed by the rest of the Altivec code).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204839 91177308-0d34-0410-b5e6-96231b3b80d8
These instructions have access to the complete VSX register file. In addition,
they "swap" the order of the elements so that element 0 (the scalar part) comes
first in memory and element 1 follows at a higher address.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204838 91177308-0d34-0410-b5e6-96231b3b80d8
> For functions where esi is used as base pointer, we would previously fall ba
> from lowering memcpy with "rep movs" because that clobbers esi.
>
> With this patch, we just store esi in another physical register, and restore
> it afterwards. This adds a little bit of register preassure, but the more
> efficient memcpy should be worth it.
>
> Differential Revision: http://llvm-reviews.chandlerc.com/D2968
This didn't work. I was ending up with code like this:
lea edi,[esi+38h]
mov ecx,0Fh
mov edx,esi
mov esi,ebx
rep movs dword ptr es:[edi],dword ptr [esi]
lea ecx,[esi+74h] <-- Ooops, we're now using esi before restoring it from edx.
add ebx,3Ch
mov esi,edx
I guess if we want to do this we need stronger glue or something, or doing the expansion
much later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204829 91177308-0d34-0410-b5e6-96231b3b80d8
v2i64 needs to be a legal VSX type because it is the SetCC result type from
v2f64 comparisons. We need to expand all non-arithmetic v2i64 operations.
This fixes the lowering for v2f64 VSELECT.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204828 91177308-0d34-0410-b5e6-96231b3b80d8
This enables TableGen to generate an additional two operand matcher
for our ArithLogicR class of instructions (constituted by 3 register operands).
E.g.: and $1, $2 <=> and $1, $1, $2
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204826 91177308-0d34-0410-b5e6-96231b3b80d8
The '.dword' directive accepts a list of expressions and emits
them in 8-byte chunks in successive locations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204822 91177308-0d34-0410-b5e6-96231b3b80d8
parseDirectiveWord is a generic function that parses an expression which
means there's no need for it to have such an specific name. Renaming it to
parseDataDirective so that it can also be used to handle .dword directives[1].
[1]To be added in a follow up commit.
No functional changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204818 91177308-0d34-0410-b5e6-96231b3b80d8
The '.set mips64' directive enables the feature Mips:FeatureMips64
from assembly. Note that it doesn't modify the ELF header as opposed
to the use of -mips64 from the command-line. The reason for this
is that we want to be as compatible as possible with existing assemblers
like GAS.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204817 91177308-0d34-0410-b5e6-96231b3b80d8
The '.set mips64r2' directive enables the feature Mips:FeatureMips64r2
from assembly. Note that it doesn't modify the ELF header as opposed
to the use of -mips64r2 from the command-line. The reason for this
is that we want to be as compatible as possible with existing assemblers
like GAS.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204815 91177308-0d34-0410-b5e6-96231b3b80d8
We've already got versions without the barriers, so this just adds IR-level
support for generating the new v8 ones.
rdar://problem/16227836
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204813 91177308-0d34-0410-b5e6-96231b3b80d8
Given that we support multiple directives that enable a particular feature
(e.g. '.set mips16'), it's best to hoist that code into a new function
so that we don't repeat the same pattern w.r.t parsing and handling error cases.
No functional changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204811 91177308-0d34-0410-b5e6-96231b3b80d8
After some discussion on IRC, emitting a call to the library function seems
like a better default, since it will move from a compiler internal error to
a linker error, that the user can work around until LLVM is fixed.
I'm also adding a note on the responsibility of the user to confirm that
the cache was cleared on platforms where nothing is done.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204806 91177308-0d34-0410-b5e6-96231b3b80d8
The directive '.option pic2' enables PIC from assembly source.
At the moment none of the macros/directives check the PIC bit
but that's going to be fixed relatively soon. For example, the
expansion of macros like 'la' depend on the relocation model.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204803 91177308-0d34-0410-b5e6-96231b3b80d8
Implementing the LLVM part of the call to __builtin___clear_cache
which translates into an intrinsic @llvm.clear_cache and is lowered
by each target, either to a call to __clear_cache or nothing at all
incase the caches are unified.
Updating LangRef and adding some tests for the implemented architectures.
Other archs will have to implement the method in case this builtin
has to be compiled for it, since the default behaviour is to bail
unimplemented.
A Clang patch is required for the builtin to be lowered into the
llvm intrinsic. This will be done next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204802 91177308-0d34-0410-b5e6-96231b3b80d8
With VSX there is a real vector select instruction, and so we should use it.
Note that VSELECT will still scalarize for v2f64 because the corresponding
SetCC result type (v2i64) is not currently a legal type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204801 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r204781.
I will follow up to with msan folks to see what is what they
were trying to do with aliases to weak aliases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204784 91177308-0d34-0410-b5e6-96231b3b80d8
These instructions are essentially the same as their Altivec counterparts, but
have access to the larger VSX register file.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204782 91177308-0d34-0410-b5e6-96231b3b80d8
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given
define void @my_func() {
ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias
We produce without this patch:
.weak my_alias
my_alias = my_func
.globl my_alias2
my_alias2 = my_alias
That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a
@my_alias = alias void ()* @other_func
would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.
There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204781 91177308-0d34-0410-b5e6-96231b3b80d8
Adds the different broadcast instructions to the ReplaceableInstrsAVX2 table.
That way the ExeDepsFix pass can take better decisions when AVX2 broadcasts are
across domain (int <-> float).
In particular, prior to this patch we were generating:
vpbroadcastd LCPI1_0(%rip), %ymm2
vpand %ymm2, %ymm0, %ymm0
vmaxps %ymm1, %ymm0, %ymm0 ## <- domain change penalty
Now, we generate the following nice sequence where everything is in the float
domain:
vbroadcastss LCPI1_0(%rip), %ymm2
vandps %ymm2, %ymm0, %ymm0
vmaxps %ymm1, %ymm0, %ymm0
<rdar://problem/16354675>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204770 91177308-0d34-0410-b5e6-96231b3b80d8
The VSX instruction set has two types of FMA instructions: A-type (where the
addend is taken from the output register) and M-type (where one of the product
operands is taken from the output register). This adds a small pass that runs
just after MI scheduling (and, thus, just before register allocation) that
mutates A-type instructions (that are created during isel) into M-type
instructions when:
1. This will eliminate an otherwise-necessary copy of the addend
2. One of the product operands is killed by the instruction
The "right" moment to make this decision is in between scheduling and register
allocation, because only there do we know whether or not one of the product
operands is killed by any particular instruction. Unfortunately, this also
makes the implementation somewhat complicated, because the MIs are not in SSA
form and we need to preserve the LiveIntervals analysis.
As a simple example, if we have:
%vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9
%vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16,
%RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16
...
%vreg9<def,tied1> = XSMADDADP %vreg9<tied0>, %vreg17, %vreg19,
%RM<imp-use>; VSLRC:%vreg9,%vreg17,%vreg19
...
We can eliminate the copy by changing from the A-type to the
M-type instruction. This means:
%vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16,
%RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16
is replaced by:
%vreg16<def,tied1> = XSMADDMDP %vreg16<tied0>, %vreg18, %vreg9,
%RM<imp-use>; VSLRC:%vreg16,%vreg18,%vreg9
and we remove: %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204768 91177308-0d34-0410-b5e6-96231b3b80d8
Although the first two operands are the ones that can be swapped, the tied
input operand is listed before them, so we need to adjust for that.
I have a test case for this, but it goes along with an upcoming commit (so it
will come soon).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204748 91177308-0d34-0410-b5e6-96231b3b80d8
TableGen will create a lookup table for the A-type FMA instructions providing
their corresponding M-form opcodes. This will be used by upcoming commits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204746 91177308-0d34-0410-b5e6-96231b3b80d8
Remove handling of select_cc, since it makes no sense to be there. This
now does nothing, but I'll be adding some handling of other target nodes
soon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204743 91177308-0d34-0410-b5e6-96231b3b80d8
If getElementPtr uses a constant as base pointer, then make the constant opaque.
This prevents constant folding it with the offset. The offset can usually be
encoded in the load/store instruction itself and the base address doesn't have
to be rematerialized several times.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204739 91177308-0d34-0410-b5e6-96231b3b80d8
The cost for the first four stackmap operands was always TCC_Free.
This is only true for the first two operands. All other operands
are TCC_Free if they are within 64bit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204738 91177308-0d34-0410-b5e6-96231b3b80d8
This used to resort to splitting the 256-bit operation into two 128-bit
shuffles and then recombining the results.
Fixes <rdar://problem/16167303>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204735 91177308-0d34-0410-b5e6-96231b3b80d8
I found three implementations of this. This splits it out into a new function
and uses it from the three places.
My plan is to add a fourth use when lowering a vector_shuffle:v16i16.
Compared the assembly output of test/CodeGen/X86 before and after.
The only change is due to how the first PSHUFB was generated in
LowerVECTOR_SHUFFLEv8i16. If the shuffle mask specified undef (i.e. -1), the
old implementation would write -1 * 2 and -1 * 2 + 1 (254 and 255) in the
control mask. Now we write 0x80. These are of course interchangeable since
bit 7 decides if a constant zero is written in the result byte. The other
instances of this code use 0x80 consistently.
Related to <rdar://problem/16167303>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204734 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Remove the XFAIL added in my previous commit and correct the test such that
it correctly tests the expansion of the assembler temporary.
Also added a test to check that $at is always $1 when written by the
user.
Corrected the new assembler temporary warnings so that they are emitted for
numeric registers too.
Differential Revision: http://llvm-reviews.chandlerc.com/D3169
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204711 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The assembler temporary is normally $at ($1) but can be reassigned using
'.set at=$reg'. Regardless of which register is nominated as the assembler
temporary, $at remains $1 when written by the user.
Adds warnings under the following conditions:
* The register nominated as the assembler temporary is used by the user.
* '.set noat' is in effect and $at is used by the user.
Both of these only work for named registers. I have a follow up commit that makes it work for numeric registers as well.
XFAIL set-at-directive.s since it incorrectly tests that $at is redefined by
'.set at=$reg'. Testcases will follow in a separate commit.
Patch by David Chisnall
His work was sponsored by: DARPA, AFRL
Differential Revision: http://llvm-reviews.chandlerc.com/D3167
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204710 91177308-0d34-0410-b5e6-96231b3b80d8
Try to match scalar and first like the other instructions.
Expand 64-bit ands to a pair of 32-bit ands since that is not
available on the VALU.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204660 91177308-0d34-0410-b5e6-96231b3b80d8
As a first step towards real little-endian code generation, this patch
changes the PowerPC MC layer to actually generate little-endian object
files. This involves passing the little-endian flag through the various
layers, including down to createELFObjectWriter so we actually get basic
little-endian ELF objects, emitting instructions in little-endian order,
and handling fixups and relocations as appropriate for little-endian.
The bulk of the patch is to update most test cases in test/MC/PowerPC
to verify both big- and little-endian encodings. (The only test cases
*not* updated are those that create actual big-endian ABI code, like
the TLS tests.)
Note that while the object files are now little-endian, the generated
code itself is not yet updated, in particular, it still does not adhere
to the ELFv2 ABI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204634 91177308-0d34-0410-b5e6-96231b3b80d8
Those patterns are used when the load cannot be folded into the related broadcast
during the select phase.
This happens when the load gets additional uses that were not anticipated during
the previous lowering phases (constant vector to constant load, then constant
load reused) or when selection DAG is not able to prove that folding the load
will not create a cycle in the DAG.
<rdar://problem/16074331>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204631 91177308-0d34-0410-b5e6-96231b3b80d8
This can be observed with the old testcase of CodeGen/X86/pr12312.ll:
47c47
< vorps %ymm0, %ymm1, %ymm0
---
> vorps %ymm1, %ymm0, %ymm0
97c97
< vorps %ymm1, %ymm0, %ymm0
---
> vorps %ymm0, %ymm1, %ymm0
The vector VecIns is populated with all the values from VecInMap. This is done
while iterating VecInMap. VecInMap uses a hash of pointer values so the
resulting order can vary depending on the memory layout.
The fix is to populate the vector VecIns earlier as VecInMap is populated.
This is done in DAG traversal order.
Fixes <rdar://problem/16398806>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204623 91177308-0d34-0410-b5e6-96231b3b80d8
[PPC64LE] ELFv2 ABI updates for the .opd section
The PPC64 Little Endian (PPC64LE) target supports the ELFv2 ABI, and as
such, does not have a ".opd" section. This is keyed off a _CALL_ELF=2
macro check.
The CALL_ELF check is not clearly documented at this time. The basis
for usage in this patch is from the gcc thread here:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01144.html
> Adding comment from Uli:
Looks good to me. I think the old-style JIT doesn't really work
anyway for 64-bit, but at least with this patch LLVM will compile
and link again on a ppc64le host ...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204614 91177308-0d34-0410-b5e6-96231b3b80d8
I'm under the impression that we used to infer the isCommutable flag from the
instruction-associated pattern. Regardless, we don't seem to do this (at least
by default) any more. I've gone through all of our instruction definitions, and
marked as commutative all of those that should be trivial to commute (by
exchanging the first two operands). There has been special code for the RL*
instructions, and that's not changed.
Before this change, we had the following commutative instructions:
RLDIMI
RLDIMIo
RLWIMI
RLWIMI8
RLWIMI8o
RLWIMIo
XSADDDP
XSMULDP
XVADDDP
XVADDSP
XVMULDP
XVMULSP
After:
ADD4
ADD4o
ADD8
ADD8o
ADDC
ADDC8
ADDC8o
ADDCo
ADDE
ADDE8
ADDE8o
ADDEo
AND
AND8
AND8o
ANDo
CRAND
CREQV
CRNAND
CRNOR
CROR
CRXOR
EQV
EQV8
EQV8o
EQVo
FADD
FADDS
FADDSo
FADDo
FMADD
FMADDS
FMADDSo
FMADDo
FMSUB
FMSUBS
FMSUBSo
FMSUBo
FMUL
FMULS
FMULSo
FMULo
FNMADD
FNMADDS
FNMADDSo
FNMADDo
FNMSUB
FNMSUBS
FNMSUBSo
FNMSUBo
MULHD
MULHDU
MULHDUo
MULHDo
MULHW
MULHWU
MULHWUo
MULHWo
MULLD
MULLDo
MULLW
MULLWo
NAND
NAND8
NAND8o
NANDo
NOR
NOR8
NOR8o
NORo
OR
OR8
OR8o
ORo
RLDIMI
RLDIMIo
RLWIMI
RLWIMI8
RLWIMI8o
RLWIMIo
VADDCUW
VADDFP
VADDSBS
VADDSHS
VADDSWS
VADDUBM
VADDUBS
VADDUHM
VADDUHS
VADDUWM
VADDUWS
VAND
VAVGSB
VAVGSH
VAVGSW
VAVGUB
VAVGUH
VAVGUW
VMADDFP
VMAXFP
VMAXSB
VMAXSH
VMAXSW
VMAXUB
VMAXUH
VMAXUW
VMHADDSHS
VMHRADDSHS
VMINFP
VMINSB
VMINSH
VMINSW
VMINUB
VMINUH
VMINUW
VMLADDUHM
VMULESB
VMULESH
VMULEUB
VMULEUH
VMULOSB
VMULOSH
VMULOUB
VMULOUH
VNMSUBFP
VOR
VXOR
XOR
XOR8
XOR8o
XORo
XSADDDP
XSMADDADP
XSMAXDP
XSMINDP
XSMSUBADP
XSMULDP
XSNMADDADP
XSNMSUBADP
XVADDDP
XVADDSP
XVMADDADP
XVMADDASP
XVMAXDP
XVMAXSP
XVMINDP
XVMINSP
XVMSUBADP
XVMSUBASP
XVMULDP
XVMULSP
XVNMADDADP
XVNMADDASP
XVNMSUBADP
XVNMSUBASP
XXLAND
XXLNOR
XXLOR
XXLXOR
This is a by-inspection change, and I'm not sure how to write a reliable test
case. I would like advice on this, however.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204609 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
- If only two registers are passed to a three-register operation, then the
first argument is both source and destination register.
- If a non-register is passed as the last argument, generate the immediate
version of the instruction.
Also mark DADD commutative and add scheduling information (to the generic
scheduler), and implement DSUB.
Patch by David Chisnall
His work was sponsored by: DARPA, AFRL
CC: theraven
Differential Revision: http://llvm-reviews.chandlerc.com/D3148
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204605 91177308-0d34-0410-b5e6-96231b3b80d8
I've done some experimentation with this, and it looks like using the
lower-latency (but lower throughput) copy instruction is essentially always the
right thing to do.
My assumption is that, in order to be relatively sure that the higher-latency
copy will increase throughput, we'd want to have it unlikely to be in-flight
with its use. On the P7, the global completion table (GCT) can hold a maximum
of 120 instructions, shared among all active threads (up to 4), giving 30
instructions per thread. So specifically, I'd require at least that many
instructions between the copy and the use before the high-latency variant is
used.
Trying this, however, over the entire test suite resulted in zero cases where
the high-latency form would be preferable. This may be a consequence of the
fact that the scheduler views copies as free, and so they tend to end up close
to their uses. For this experiment I created a function:
unsigned chooseVSXCopy(MachineBasicBlock &MBB,
MachineBasicBlock::iterator I,
unsigned DestReg, unsigned SrcReg,
unsigned StartDist = 1,
unsigned Depth = 3) const;
with an implementation like:
if (!Depth)
return PPC::XXLOR;
const unsigned MaxDist = 30;
unsigned Dist = StartDist;
for (auto J = I, JE = MBB.end(); J != JE && Dist <= MaxDist; ++J) {
if (J->isTransient() && !J->isCopy())
continue;
if (J->isCall() || J->isReturn() || J->readsRegister(DestReg, TRI))
return PPC::XXLOR;
++Dist;
}
// We've exceeded the required distance for the high-latency form, use it.
if (Dist > MaxDist)
return PPC::XVCPSGNDP;
// If this is only an exit block, use the low-latency form.
if (MBB.succ_empty())
return PPC::XXLOR;
// We've reached the end of the block, check the successor blocks (up to some
// depth), and use the high-latency form if that is okay with all successors.
for (auto J = MBB.succ_begin(), JE = MBB.succ_end(); J != JE; ++J) {
if (chooseVSXCopy(**J, (*J)->begin(), DestReg, SrcReg,
Dist, --Depth) == PPC::XXLOR)
return PPC::XXLOR;
}
// All of our successor blocks seem okay with the high-latency variant, so
// we'll use it.
return PPC::XVCPSGNDP;
and then changed the copy opcode selection from:
Opc = PPC::XXLOR;
to:
Opc = chooseVSXCopy(MBB, std::next(I), DestReg, SrcReg);
In conclusion, I'm removing the FIXME from the comment, because I believe that
there is, at least absent other examples, nothing to fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204591 91177308-0d34-0410-b5e6-96231b3b80d8
When VSX is available, these instructions should be used in preference to the
older variants that only have access to the scalar floating-point registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204559 91177308-0d34-0410-b5e6-96231b3b80d8
When a label is parsed, check if there is type information available for the
label. If so, check if the symbol is a function. If the symbol is a function
and we are in thumb mode and no explicit thumb_func has been emitted, adjust the
symbol data to indicate that the function definition is a thumb function.
The application of this inferencing is improved value handling in the object
file (the required thumb bit is set on symbols which are thumb functions). It
also helps improve compatibility with binutils.
The one complication that arises from this handling is the MCAsmStreamer. The
default implementation of getOrCreateSymbolData in MCStreamer does not support
tracking the symbol data. In order to support the semantics of thumb functions,
track symbol data in assembly streamer. Although O(n) in number of labels in
the TU, this is already done in various other streamers and as such the memory
overhead is not a practical concern in this scenario.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204544 91177308-0d34-0410-b5e6-96231b3b80d8
v2f64 values, like other 128-bit values, are returned under VSX in register
vs34 (Altivec register v2).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204543 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, only regular AArch64 instructions were annotated with SchedRW lists.
This patch does the same for NEON enabling these instructions to be scheduled by
the MIScheduler. Additionally, store operations are now modeled and a few
SchedRW lists were updated for bug fixes (e.g. multiple def operands).
Reviewers: apazos, mcrosier, atrick
Patch by Dave Estes <cestes@codeaurora.org>!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204505 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
VECTOR_SHUFFLE concatenates the vectors in an vectorwise fashion.
<0b00, 0b01> + <0b10, 0b11> -> <0b00, 0b01, 0b10, 0b11>
VSHF concatenates the vectors in a bitwise fashion:
<0b00, 0b01> + <0b10, 0b11> ->
0b0100 + 0b1110 -> 0b01001110
<0b10, 0b11, 0b00, 0b01>
We must therefore swap the operands to get the correct result.
The test case that discovered the issue was MultiSource/Benchmarks/nbench.
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D3142
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204480 91177308-0d34-0410-b5e6-96231b3b80d8
The SReg_(32|64) register classes contain special registers in addition
to the numbered SGPRs. This can lead to machine verifier errors when
these register classes are used as sub-registers for SReg_128, since
SReg_128 only uses the numbered SGPRs.
Replacing SReg_(32|64) with SGPR_(32|64) fixes this problem, since
the SGPR_(32|64) register classes contain only numbered SGPRs.
Tests cases for this are comming in a later commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204474 91177308-0d34-0410-b5e6-96231b3b80d8
...instead of a separate Requires for each one. This style was already
used in some places and seems more compact.
No behavioral change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204452 91177308-0d34-0410-b5e6-96231b3b80d8
Extend the target hook to take also the operand index into account when
calculating the cost of the constant materialization.
Related to <rdar://problem/16381500>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204435 91177308-0d34-0410-b5e6-96231b3b80d8
The commit r203762 introduced silent failure for complext SO expression, and it's even worse than compiler crash.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204427 91177308-0d34-0410-b5e6-96231b3b80d8
.data_region is only used in Darwin, so it shouldn't be generated
for other OS. Currently AArch64 doesn't support darwin yet, so
I removed it from AArch64. When Darwin is supported someday, we can
add it back and associate it with Darwin.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204424 91177308-0d34-0410-b5e6-96231b3b80d8
Sicne MBB->computeRegisterLivenes() returns Dead for sub regs like s0,
d0 is used in vpop instead of updating sp, which causes s0 dead before
its use.
This patch checks the liveness of each subreg to make sure the reg is
actually dead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204411 91177308-0d34-0410-b5e6-96231b3b80d8
This commit extends the coverage of the constant hoisting pass, adds additonal
debug output and updates the function names according to the style guide.
Related to <rdar://problem/16381500>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204389 91177308-0d34-0410-b5e6-96231b3b80d8
The Octeon cpu from Cavium Networks is mips64r2 based and has an extended
instruction set. In order to utilize this with LLVM, a new cpu feature "octeon"
and a subtarget feature "cnmips" is added. A small set of new instructions
(baddu, dmul, pop, dpop, seq, sne) is also added. LLVM generates dmul, pop and
dpop instructions with option -mcpu=octeon or -mattr=+cnmips.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204337 91177308-0d34-0410-b5e6-96231b3b80d8
Given
bar = foo + 4
.long bar
MC would eat the 4. GNU as includes it in the relocation. The rule seems to be
that a variable that defines a symbol is used in the relocation and one that
does not define a symbol is evaluated and the result included in the relocation.
Fixing this unfortunately required some other changes:
* Since the variable is now evaluated, it would prevent the ELF writer from
noticing the weakref marker the elf streamer uses. This patch then replaces
that with a VariantKind in MCSymbolRefExpr.
* Using VariantKind then requires us to look past other VariantKind to see
.weakref bar,foo
call bar@PLT
doing this also fixes
zed = foo +2
call zed@PLT
so that is a good thing.
* Looking past VariantKind means that the relocation selection has to use
the fixup instead of the target.
This is a reboot of the previous fixes for MC. I will watch the sanitizer
buildbot and wait for a build before adding back the previous fixes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204294 91177308-0d34-0410-b5e6-96231b3b80d8
It isn't actually used now, and probably never will be, plus it makes
tests less annoying. I also think SC prints GDS instructions as a
separate instruction name.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204270 91177308-0d34-0410-b5e6-96231b3b80d8
For functions where esi is used as base pointer, we would previously fall back
from lowering memcpy with "rep movs" because that clobbers esi.
With this patch, we just store esi in another physical register, and restore
it afterwards. This adds a little bit of register preassure, but the more
efficient memcpy should be worth it.
Differential Revision: http://llvm-reviews.chandlerc.com/D2968
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204174 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
X86BaseInfo.h defines an enum for the offset of each operand in a memory operand
sequence. Some code uses it and some does not. This patch replaces (hopefully)
all remaining locations where an integer literal was used instead of this enum.
No functionality change intended.
Reviewers: nadav
CC: llvm-commits, t.p.northover
Differential Revision: http://llvm-reviews.chandlerc.com/D3108
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204158 91177308-0d34-0410-b5e6-96231b3b80d8
When converting a signed 32-bit integer to double-precision floating point on
hardware without a lfiwax instruction, we have to instead use a lfd followed
by fcfid. We were erroneously offsetting the address by 4 bytes in
preparation for either a lfiwax or lfiwzx when generating the lfd. This fixes
that silly error.
This was not caught in the test suite since the conversion tests were run with
-mcpu=pwr7, which implies availability of lfiwax. I've added another test
case for older hardware that checks the code we expect in the absence of
lfiwax and other flavors of fcfid. There are fewer tests in this test case
because we punt to DAG selection in more cases on older hardware. (We must
generate complex fiddly sequences in those cases, and there is marginal
benefit in duplicating that logic in fast-isel.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204155 91177308-0d34-0410-b5e6-96231b3b80d8
The revision I'm reverting breaks handling of transitive aliases. This blocks us
and breaks sanitizer bootstrap:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/2651
(and checked locally by Alexey).
This revision is the result of:
svn merge -r204059:204058 -r204028:204027 -r203962:203961 .
+ the regression test added to test/MC/ELF/alias.s
Another way to reproduce the regression with clang:
$ cat q.c
void a1();
void a2() __attribute__((alias("a1")));
void a3() __attribute__((alias("a2")));
void a1() {}
$ ~/work/llvm-build/bin/clang-3.5-good -c q.c && mv q.o good.o && \
~/work/llvm-build/bin/clang-3.5-bad -c q.c && mv q.o bad.o && \
objdump -t good.o bad.o
good.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 q.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 g F .text 0000000000000006 a1
0000000000000000 g F .text 0000000000000006 a2
0000000000000000 g F .text 0000000000000006 a3
bad.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 q.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 g F .text 0000000000000006 a1
0000000000000000 g F .text 0000000000000006 a2
0000000000000000 g .text 0000000000000000 a3
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204137 91177308-0d34-0410-b5e6-96231b3b80d8
Add an assertion that a valid section is referenced. The potential NULL pointer
dereference was identified by the clang static analyzer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204114 91177308-0d34-0410-b5e6-96231b3b80d8
This performs the equivalent of a .set directive in that it creates a symbol
which is an alias for another symbol or value which may possibly be yet
undefined. This directive also has the added property in that it marks the
aliased symbol as being a thumb function entry point, in the same way that the
.thumb_func directive does.
The current implementation fails one test due to an unrelated issue. Functions
within .thumb sections are not marked as thumb_func. The result is that
the aliasee function is not valued correctly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204059 91177308-0d34-0410-b5e6-96231b3b80d8
Rather than LegalizeAction::Expand, this needs LegalizeAction::Promote to get
promoted to fp_to_sint v8f32->v8i32. This is a legal operation on AVX.
For that to work properly, we also need to teach the legalizer about the
specific promotion required here. The default vector promotion uses
bitcasting to a vector type of the same total size. We want to promote the
vector element type, effectively widening the operation and then truncating
the result. This is analogous to the current logic of how int_to_fp is
promoted.
The change also factors out some code from the int_to_fp promotion code to
ValueType::widenIntegerVectorElementType. This is now shared between
int_to_fp and fp_to_int.
There is no longer need for the custom lowering of fp_to_sint f32->v8i16 in
X86. It can now go through the new target-independent fp_to_*int promotion
logic.
I also checked that no other target uses Promote for these ops yet, so there
shouldn't be any unexpected change in behavior.
Fixes <rdar://problem/16202247>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204058 91177308-0d34-0410-b5e6-96231b3b80d8
The type of the immediates should not matter as long as the encoding is
equivalent to the encoding of one of the legal inline constants.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204056 91177308-0d34-0410-b5e6-96231b3b80d8
This instructions writes to an 32-bit SGPR. This change required adding
the 32-bit VCC_LO and VCC_HI registers, because the full VCC register
is 64 bits.
This fixes verifier errors on several of the indirect addressing piglit
tests.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204055 91177308-0d34-0410-b5e6-96231b3b80d8
Added checks for number of operands and operand register classes.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204054 91177308-0d34-0410-b5e6-96231b3b80d8
- Adds support for inserting vzerouppers before tail-calls.
This is enabled implicitly by having MachineInstr::copyImplicitOps preserve
regmask operands, which allows VZeroUpperInserter to see where tail-calls use
vector registers.
- Fixes a bug that caused the previous version of this optimization to miss some
vzeroupper insertion points in loops. (Loops-with-vector-code that followed
loops-without-vector-code were mistakenly overlooked by the previous version).
- New algorithm never revisits instructions.
Fixes <rdar://problem/16228798>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204021 91177308-0d34-0410-b5e6-96231b3b80d8
Utilize the previous move of MVT to a separate header for all trivial
cases (that don't need any further restructuring).
Reviewed By: Tim Northover
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204003 91177308-0d34-0410-b5e6-96231b3b80d8
This change brings getCallPreservedMask()'s logic in line with
getCalleeSavedRegs().
While this changes the control flow slightly, the change is not
currently observable. is64Bit must be false to get to the accidental
fallthrough, but the case that we fall into (coldcc) does nothing unless
is64Bit is true.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203943 91177308-0d34-0410-b5e6-96231b3b80d8
Changing order of checks in getCallPreservedMask() to match
getCalleeSavedRegs() so that the logic is easier to compare.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203939 91177308-0d34-0410-b5e6-96231b3b80d8
The current logic assumes that MF is not 0. Assert that it isn't, and
remove the default of 0 from the header.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203934 91177308-0d34-0410-b5e6-96231b3b80d8
Commit r181723 introduced code to avoid placing initialized variables
needing relocations into the .rodata section, which avoid copy relocs
that do not work as expected on ppc64 function references.
The same treatment is also needed for *named* .rodata.XXX sections.
This patch changes PPC64LinuxTargetObjectFile::SelectSectionForGlobal
to modify "Kind" *before* calling the default SelectSectionForGlobal
routine, instead of first calling the default routine and then just
checking for the (main) .rodata section afterwards.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203921 91177308-0d34-0410-b5e6-96231b3b80d8
These linkages were introduced some time ago, but it was never very
clear what exactly their semantics were or what they should be used
for. Some investigation found these uses:
* utf-16 strings in clang.
* non-unnamed_addr strings produced by the sanitizers.
It turns out they were just working around a more fundamental problem.
For some sections a MachO linker needs a symbol in order to split the
section into atoms, and llvm had no idea that was the case. I fixed
that in r201700 and it is now safe to use the private linkage. When
the object ends up in a section that requires symbols, llvm will use a
'l' prefix instead of a 'L' prefix and things just work.
With that, these linkages were already dead, but there was a potential
future user in the objc metadata information. I am still looking at
CGObjcMac.cpp, but at this point I am convinced that linker_private
and linker_private_weak are not what they need.
The objc uses are currently split in
* Regular symbols (no '\01' prefix). LLVM already directly provides
whatever semantics they need.
* Uses of a private name (start with "\01L" or "\01l") and private
linkage. We can drop the "\01L" and "\01l" prefixes as soon as llvm
agrees with clang on L being ok or not for a given section. I have two
patches in code review for this.
* Uses of private name and weak linkage.
The last case is the one that one could think would fit one of these
linkages. That is not the case. The semantics are
* the linker will merge these symbol by *name*.
* the linker will hide them in the final DSO.
Given that the merging is done by name, any of the private (or
internal) linkages would be a bad match. They allow llvm to rename the
symbols, and that is really not what we want. From the llvm point of
view, these objects should really be (linkonce|weak)(_odr)?.
For now, just keeping the "\01l" prefix is probably the best for these
symbols. If we one day want to have a more direct support in llvm,
IMHO what we should add is not a linkage, it is just a hidden_symbol
attribute. It would be applicable to multiple linkages. For example,
on weak it would produce the current behavior we have for objc
metadata. On internal, it would be equivalent to private (and we
should then remove private).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203866 91177308-0d34-0410-b5e6-96231b3b80d8
operator* on the by-operand iterators to return a MachineOperand& rather than
a MachineInstr&. At this point they almost behave like normal iterators!
Again, this requires making some existing loops more verbose, but should pave
the way for the big range-based for-loop cleanups in the future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203865 91177308-0d34-0410-b5e6-96231b3b80d8
This changes the implementation of local directional labels to use a dedicated
map. With that it can then just use CreateTempSymbol, which is what the rest
of MC uses.
CreateTempSymbol doesn't do a great job at making sure the names are unique
(or being efficient when the names are not needed), but that should probably
be fixed in a followup patch.
This fixes pr18928.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203826 91177308-0d34-0410-b5e6-96231b3b80d8
LDS instructions are pseudo instructions which model
the OQAP defs and uses within a single instruction.
This fixes a hang in the opencv MedianFilter tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203818 91177308-0d34-0410-b5e6-96231b3b80d8
This is a follow-up to r203635. Saleem pointed out that since symbolic register
names are much easier to read, it would be good if we could turn them off only
when we really need to because we're using an external assembler.
Differential Revision: http://llvm-reviews.chandlerc.com/D3056
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203806 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This helps the instruction selector to lower an i64 * i64 -> i128
multiplication into a single instruction on targets which support it.
This is an update of D2973 which was reverted because of a bug reported
as PR19084.
Reviewers: t.p.northover, chapuni
Reviewed By: t.p.northover
CC: llvm-commits, alex, chapuni
Differential Revision: http://llvm-reviews.chandlerc.com/D3021
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203797 91177308-0d34-0410-b5e6-96231b3b80d8
Only one instruction pair needed changing: SMULH & UMULH. The previous
code worked, but MC was doing extra work treating Ra as a valid
operand (which then got completely overwritten in MCCodeEmitter).
No behaviour change, so no tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203772 91177308-0d34-0410-b5e6-96231b3b80d8
VSX is an ISA extension supported on the POWER7 and later cores that enhances
floating-point vector and scalar capabilities. Among other things, this adds
<2 x double> support and generally helps to reduce register pressure.
The interesting part of this ISA feature is the register configuration: there
are 64 new 128-bit vector registers, the 32 of which are super-registers of the
existing 32 scalar floating-point registers, and the second 32 of which overlap
with the 32 Altivec vector registers. This makes things like vector insertion
and extraction tricky: this can be free but only if we force a restriction to
the right register subclass when needed. A new "minipass" PPCVSXCopy takes care
of this (although it could do a more-optimal job of it; see the comment about
unnecessary copies below).
Please note that, currently, VSX is not enabled by default when targeting
anything because it is not yet ready for that. The assembler and disassembler
are fully implemented and tested. However:
- CodeGen support causes miscompiles; test-suite runtime failures:
MultiSource/Benchmarks/FreeBench/distray/distray
MultiSource/Benchmarks/McCat/08-main/main
MultiSource/Benchmarks/Olden/voronoi/voronoi
MultiSource/Benchmarks/mafft/pairlocalalign
MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
SingleSource/Benchmarks/CoyoteBench/almabench
SingleSource/Benchmarks/Misc/matmul_f64_4x4
- The lowering currently falls back to using Altivec instructions far more
than it should. Worse, there are some things that are scalarized through the
stack that shouldn't be.
- A lot of unnecessary copies make it past the optimizers, and this needs to
be fixed.
- Many more regression tests are needed.
Normally, I'd fix these things prior to committing, but there are some
students and other contributors who would like to work this, and so it makes
sense to move this development process upstream where it can be subject to the
regular code-review procedures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203768 91177308-0d34-0410-b5e6-96231b3b80d8
There are currently two schemes for mapping instruction operands to
instruction-format variables for generating the instruction encoders and
decoders for the assembler and disassembler respectively: a) to map by name and
b) to map by position.
In the long run, we'd like to remove the position-based scheme and use only
name-based mapping. Unfortunately, the name-based scheme currently cannot deal
with complex operands (those with suboperands), and so we currently must use
the position-based scheme for those. On the other hand, the position-based
scheme cannot deal with (register) variables that are split into multiple
ranges. An upcoming commit to the PowerPC backend (adding VSX support) will
require this capability. While we could teach the position-based scheme to
handle that, since we'd like to move away from the position-based mapping
generally, it seems silly to teach it new tricks now. What makes more sense is
to allow for partial transitioning: use the name-based mapping when possible,
and only use the position-based scheme when necessary.
Now the problem is that mixing the two sensibly was not possible: the
position-based mapping would map based on position, but would not skip those
variables that were mapped by name. Instead, the two sets of assignments would
overlap. However, I cannot currently change the current behavior, because there
are some backends that rely on it [I think mistakenly, but I'll send a message
to llvmdev about that]. So I've added a new TableGen bit variable:
noNamedPositionallyEncodedOperands, that can be used to cause the
position-based mapping to skip variables mapped by name.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203767 91177308-0d34-0410-b5e6-96231b3b80d8
Support to the IAS was added to actually parse and handle the complex SO
expressions. However, the object file lowering was not updated to compensate
for the fact that the shift operand may be an absolute expression.
When trying to assemble to an object file, the lowering would fail while
succeeding when emitting purely assembly. Add an appropriate test.
The test case is inspired by the test case provided by Jiangning Liu who also
brought the issue to light.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203762 91177308-0d34-0410-b5e6-96231b3b80d8
Extend what's currently done for shift because the HW performs this masking
implicitly:
(rotl:i32 x, (and y, 31)) -> (rotl:i32 x, y)
I use the newly factored out multiclass that was only supporting shifts so
far.
For testing I extended my testcase for the new rotation idiom.
<rdar://problem/15295856>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203718 91177308-0d34-0410-b5e6-96231b3b80d8
The peephole (shift x, (and y, 31)) -> (shift x, y) is repeated for each
integer type and each shift variant.
To improve this a new multiclass is added that covers all integer types. The
shift patterns are now instantiated from this. I am planning to add new
instances for rotates as well.
No functional change intended:
* test/CodeGen/X86/shift-and.ll provides coverage
* Compared the expanded tablegen output and matched up the defs for these
Pat<>s before and after
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203685 91177308-0d34-0410-b5e6-96231b3b80d8
When printing assembly we don't have a Layout object, but we can still
try to fold some constants.
Testcase by Ulrich Weigand.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203677 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is a white lie to workaround a widespread bug in the -mfp64
implementation.
The problem is that none of the 32-bit fpu ops mention the fact that they
clobber the upper 32-bits of the 64-bit FPR. This allows MTHC1 to be
scheduled on the wrong side of most 32-bit FPU ops, particularly MTC1.
Fixing that requires a major overhaul of the FPU implementation which can't
be done right now due to time constraints.
The testcase is SingleSource/Benchmarks/Misc/oourafft.c when given
TARGET_CFLAGS='-mips32r2 mfp64 -mmsa'.
Also correct the comment added in r203464 to indicate that two
instructions were affected.
Reviewers: matheusalmeida, jacksprat
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D3029
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203659 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Correct the match patterns and the lowerings that made the CodeGen tests pass despite the mistakes.
The original testcase that discovered the problem was SingleSource/UnitTests/SignlessType/factor.c in test-suite.
During review, we also found that some of the existing CodeGen tests were incorrect and fixed them:
* bitwise.ll: In bsel_v16i8 the IfSet/IfClear were reversed because bsel and bmnz have different operand orders and the test didn't correctly account for this. bmnz goes 'IfClear, IfSet, CondMask', while bsel goes 'CondMask, IfClear, IfSet'.
* vec.ll: In the cases where a bsel is emitted as a bmnz (they are the same operation with a different input tied to the result) the operands were in the wrong order.
* compare.ll and compare_float.ll: The bsel operand order was correct for a greater-than comparison, but a greater-than comparison instruction doesn't exist. Lowering this operation inverts the condition so the IfSet/IfClear need to be swapped to match.
The differences between BSEL, BMNZ, and BMZ and how they map to/from vselect are rather confusing. I've therefore added a note to MSA.txt to explain this in a single place in addition to the comments that explain each case.
Reviewers: matheusalmeida, jacksprat
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D3028
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203657 91177308-0d34-0410-b5e6-96231b3b80d8
When the list of VFP registers to be saved was non-contiguous (so multiple
vpush/vpop instructions were needed) these were being ordered oddly, as in:
vpush {d8, d9}
vpush {d11}
This led to the layout in memory being [d11, d8, d9] which is ugly and doesn't
match the CFI_INSTRUCTIONs we're generating either (so Dwarf info would be
broken).
This switches the order of vpush/vpop (in both prologue and epilogue,
obviously) so that the Dwarf locations are correct again.
rdar://problem/16264856
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203655 91177308-0d34-0410-b5e6-96231b3b80d8
The function hasReliableSymbolDifference had exactly one use in the MachO
writer. It is also only true for X86_64. In fact, the comments refers to
"Darwin x86_64" and everything else, so this makes the code match the
comment.
If this is to be abstracted again, it should be a property of
TargetObjectWriter, like useAggressiveSymbolFolding.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203605 91177308-0d34-0410-b5e6-96231b3b80d8
Use the options in the ARMISelLowering to control whether tail calls are
optimised or not. Previously, this option was entirely ignored on the ARM
target and only honoured on x86.
This option is mostly useful in profiling scenarios. The default remains that
tail call optimisations will be applied.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203577 91177308-0d34-0410-b5e6-96231b3b80d8
This option is from 2010, designed to work around a linker issue on Darwin for
ARM. According to grosbach this is no longer an issue and this option can
safely be removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203576 91177308-0d34-0410-b5e6-96231b3b80d8
Tail call optimisation was previously disabled on all targets other than
iOS5.0+. This enables the tail call optimisation on all Thumb 2 capable
platforms.
The test adjustments are to remove the IR hint "tail" to function invocation.
The tests were designed assuming that tail call optimisations would not kick in
which no longer holds true.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203575 91177308-0d34-0410-b5e6-96231b3b80d8
ATOMIC_STORE operations always get here as a lowered ATOMIC_SWAP, so there's no
need for any code to handle them specially.
There should be no functionality change so no tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203567 91177308-0d34-0410-b5e6-96231b3b80d8
The syntax for "cmpxchg" should now look something like:
cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic
where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).
rdar://problem/15996804
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203559 91177308-0d34-0410-b5e6-96231b3b80d8
When the MOVBE instructions are available, use them for 16-bit endian
swapping as well as for 32 and 64 bit.
The patterns were already present on the instructions, but weren't being
matched because the operation was unconditionally marked to 'Expand.'
Change that to be conditional on whether the MOVBE instructions are
available. Use 'rolw' to implement the in-register version (32 and 64
bit have the dedicated 'bswap' instruction for that).
Patch by Louis Gerbarg <lgg@apple.com>.
rdar://15479984
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203524 91177308-0d34-0410-b5e6-96231b3b80d8
NVPTX, like the other backends, relies on generic symbol name sanitizing done by
MCSymbol. However, the ptxas assembler is more stringent and disallows some
additional characters in symbol names.
See PR19099 for more details.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203483 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is a white lie to workaround a widespread bug in the -mfp64
implementation.
The problem is that none of the 32-bit fpu ops mention the fact that they
clobber the upper 32-bits of the 64-bit FPR. This allows MFHC1 to be
scheduled on the wrong side of most 32-bit FPU ops. Fixing that requires a
major overhaul of the FPU implementation which can't be done right now due to
time constraints.
MFHC1 is one of two affected instructions. These instructions are the only
FPU instructions that don't read or write the lower 32-bits. We therefore
pretend that it reads the bottom 32-bits to artificially create a dependency and
prevent the scheduler changing the behaviour of the code.
The other instruction is MTHC1 which will be fixed once I've have found a failing
test case for it.
The testcase is test-suite/SingleSource/UnitTests/Vector/simple.c when
given TARGET_CFLAGS="-mips32r2 -mfp64 -mmsa".
Reviewers: jacksprat, matheusalmeida
Reviewed By: jacksprat
Differential Revision: http://llvm-reviews.chandlerc.com/D2966
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203464 91177308-0d34-0410-b5e6-96231b3b80d8
The function was making too many assumptions about its input:
1. The NEON_VDUP optimisation was far too aggressive, assuming (I
think) that the input would always be BUILD_VECTOR.
2. We were treating most unknown concats as legal (by returning Op
rather than SDValue()). I think only concats of pairs of vectors are
actually legal.
http://llvm.org/PR19094
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203450 91177308-0d34-0410-b5e6-96231b3b80d8
the stack of the analysis group because they are all immutable passes.
This is made clear by Craig's recent work to use override
systematically -- we weren't overriding anything for 'finalizePass'
because there is no such thing.
This is kind of a lame restriction on the API -- we can no longer push
and pop things, we just set up the stack and run. However, I'm not
invested in building some better solution on top of the existing
(terrifying) immutable pass and legacy pass manager.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203437 91177308-0d34-0410-b5e6-96231b3b80d8
This requires a number of steps.
1) Move value_use_iterator into the Value class as an implementation
detail
2) Change it to actually be a *Use* iterator rather than a *User*
iterator.
3) Add an adaptor which is a User iterator that always looks through the
Use to the User.
4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
5) Add the range adaptors as Value::uses() and Value::users().
6) Update *all* of the callers to correctly distinguish between whether
they wanted a use_iterator (and to explicitly dig out the User when
needed), or a user_iterator which makes the Use itself totally
opaque.
Because #6 requires churning essentially everything that walked the
Use-Def chains, I went ahead and added all of the range adaptors and
switched them to range-based loops where appropriate. Also because the
renaming requires at least churning every line of code, it didn't make
any sense to split these up into multiple commits -- all of which would
touch all of the same lies of code.
The result is still not quite optimal. The Value::use_iterator is a nice
regular iterator, but Value::user_iterator is an iterator over User*s
rather than over the User objects themselves. As a consequence, it fits
a bit awkwardly into the range-based world and it has the weird
extra-dereferencing 'operator->' that so many of our iterators have.
I think this could be fixed by providing something which transforms
a range of T&s into a range of T*s, but that *can* be separated into
another patch, and it isn't yet 100% clear whether this is the right
move.
However, this change gets us most of the benefit and cleans up
a substantial amount of code around Use and User. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203364 91177308-0d34-0410-b5e6-96231b3b80d8
These are sometimes created by the shrink to boolean optimization in the
globalopt pass.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203280 91177308-0d34-0410-b5e6-96231b3b80d8
The integrated assembler now works for ppc. Since this was the last use of the
bg/p predicate and Hal says that it is now dead, drop the predicate too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203269 91177308-0d34-0410-b5e6-96231b3b80d8