We've already got versions without the barriers, so this just adds IR-level
support for generating the new v8 ones.
rdar://problem/16227836
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204813 91177308-0d34-0410-b5e6-96231b3b80d8
Given that we support multiple directives that enable a particular feature
(e.g. '.set mips16'), it's best to hoist that code into a new function
so that we don't repeat the same pattern w.r.t parsing and handling error cases.
No functional changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204811 91177308-0d34-0410-b5e6-96231b3b80d8
After some discussion on IRC, emitting a call to the library function seems
like a better default, since it will move from a compiler internal error to
a linker error, that the user can work around until LLVM is fixed.
I'm also adding a note on the responsibility of the user to confirm that
the cache was cleared on platforms where nothing is done.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204806 91177308-0d34-0410-b5e6-96231b3b80d8
The directive '.option pic2' enables PIC from assembly source.
At the moment none of the macros/directives check the PIC bit
but that's going to be fixed relatively soon. For example, the
expansion of macros like 'la' depend on the relocation model.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204803 91177308-0d34-0410-b5e6-96231b3b80d8
Implementing the LLVM part of the call to __builtin___clear_cache
which translates into an intrinsic @llvm.clear_cache and is lowered
by each target, either to a call to __clear_cache or nothing at all
incase the caches are unified.
Updating LangRef and adding some tests for the implemented architectures.
Other archs will have to implement the method in case this builtin
has to be compiled for it, since the default behaviour is to bail
unimplemented.
A Clang patch is required for the builtin to be lowered into the
llvm intrinsic. This will be done next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204802 91177308-0d34-0410-b5e6-96231b3b80d8
With VSX there is a real vector select instruction, and so we should use it.
Note that VSELECT will still scalarize for v2f64 because the corresponding
SetCC result type (v2i64) is not currently a legal type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204801 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r204781.
I will follow up to with msan folks to see what is what they
were trying to do with aliases to weak aliases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204784 91177308-0d34-0410-b5e6-96231b3b80d8
These instructions are essentially the same as their Altivec counterparts, but
have access to the larger VSX register file.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204782 91177308-0d34-0410-b5e6-96231b3b80d8
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given
define void @my_func() {
ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias
We produce without this patch:
.weak my_alias
my_alias = my_func
.globl my_alias2
my_alias2 = my_alias
That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a
@my_alias = alias void ()* @other_func
would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.
There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204781 91177308-0d34-0410-b5e6-96231b3b80d8
Adds the different broadcast instructions to the ReplaceableInstrsAVX2 table.
That way the ExeDepsFix pass can take better decisions when AVX2 broadcasts are
across domain (int <-> float).
In particular, prior to this patch we were generating:
vpbroadcastd LCPI1_0(%rip), %ymm2
vpand %ymm2, %ymm0, %ymm0
vmaxps %ymm1, %ymm0, %ymm0 ## <- domain change penalty
Now, we generate the following nice sequence where everything is in the float
domain:
vbroadcastss LCPI1_0(%rip), %ymm2
vandps %ymm2, %ymm0, %ymm0
vmaxps %ymm1, %ymm0, %ymm0
<rdar://problem/16354675>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204770 91177308-0d34-0410-b5e6-96231b3b80d8
The VSX instruction set has two types of FMA instructions: A-type (where the
addend is taken from the output register) and M-type (where one of the product
operands is taken from the output register). This adds a small pass that runs
just after MI scheduling (and, thus, just before register allocation) that
mutates A-type instructions (that are created during isel) into M-type
instructions when:
1. This will eliminate an otherwise-necessary copy of the addend
2. One of the product operands is killed by the instruction
The "right" moment to make this decision is in between scheduling and register
allocation, because only there do we know whether or not one of the product
operands is killed by any particular instruction. Unfortunately, this also
makes the implementation somewhat complicated, because the MIs are not in SSA
form and we need to preserve the LiveIntervals analysis.
As a simple example, if we have:
%vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9
%vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16,
%RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16
...
%vreg9<def,tied1> = XSMADDADP %vreg9<tied0>, %vreg17, %vreg19,
%RM<imp-use>; VSLRC:%vreg9,%vreg17,%vreg19
...
We can eliminate the copy by changing from the A-type to the
M-type instruction. This means:
%vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16,
%RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16
is replaced by:
%vreg16<def,tied1> = XSMADDMDP %vreg16<tied0>, %vreg18, %vreg9,
%RM<imp-use>; VSLRC:%vreg16,%vreg18,%vreg9
and we remove: %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204768 91177308-0d34-0410-b5e6-96231b3b80d8
Although the first two operands are the ones that can be swapped, the tied
input operand is listed before them, so we need to adjust for that.
I have a test case for this, but it goes along with an upcoming commit (so it
will come soon).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204748 91177308-0d34-0410-b5e6-96231b3b80d8
TableGen will create a lookup table for the A-type FMA instructions providing
their corresponding M-form opcodes. This will be used by upcoming commits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204746 91177308-0d34-0410-b5e6-96231b3b80d8
Remove handling of select_cc, since it makes no sense to be there. This
now does nothing, but I'll be adding some handling of other target nodes
soon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204743 91177308-0d34-0410-b5e6-96231b3b80d8
If getElementPtr uses a constant as base pointer, then make the constant opaque.
This prevents constant folding it with the offset. The offset can usually be
encoded in the load/store instruction itself and the base address doesn't have
to be rematerialized several times.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204739 91177308-0d34-0410-b5e6-96231b3b80d8
The cost for the first four stackmap operands was always TCC_Free.
This is only true for the first two operands. All other operands
are TCC_Free if they are within 64bit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204738 91177308-0d34-0410-b5e6-96231b3b80d8
This used to resort to splitting the 256-bit operation into two 128-bit
shuffles and then recombining the results.
Fixes <rdar://problem/16167303>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204735 91177308-0d34-0410-b5e6-96231b3b80d8
I found three implementations of this. This splits it out into a new function
and uses it from the three places.
My plan is to add a fourth use when lowering a vector_shuffle:v16i16.
Compared the assembly output of test/CodeGen/X86 before and after.
The only change is due to how the first PSHUFB was generated in
LowerVECTOR_SHUFFLEv8i16. If the shuffle mask specified undef (i.e. -1), the
old implementation would write -1 * 2 and -1 * 2 + 1 (254 and 255) in the
control mask. Now we write 0x80. These are of course interchangeable since
bit 7 decides if a constant zero is written in the result byte. The other
instances of this code use 0x80 consistently.
Related to <rdar://problem/16167303>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204734 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Remove the XFAIL added in my previous commit and correct the test such that
it correctly tests the expansion of the assembler temporary.
Also added a test to check that $at is always $1 when written by the
user.
Corrected the new assembler temporary warnings so that they are emitted for
numeric registers too.
Differential Revision: http://llvm-reviews.chandlerc.com/D3169
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204711 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The assembler temporary is normally $at ($1) but can be reassigned using
'.set at=$reg'. Regardless of which register is nominated as the assembler
temporary, $at remains $1 when written by the user.
Adds warnings under the following conditions:
* The register nominated as the assembler temporary is used by the user.
* '.set noat' is in effect and $at is used by the user.
Both of these only work for named registers. I have a follow up commit that makes it work for numeric registers as well.
XFAIL set-at-directive.s since it incorrectly tests that $at is redefined by
'.set at=$reg'. Testcases will follow in a separate commit.
Patch by David Chisnall
His work was sponsored by: DARPA, AFRL
Differential Revision: http://llvm-reviews.chandlerc.com/D3167
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204710 91177308-0d34-0410-b5e6-96231b3b80d8
Try to match scalar and first like the other instructions.
Expand 64-bit ands to a pair of 32-bit ands since that is not
available on the VALU.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204660 91177308-0d34-0410-b5e6-96231b3b80d8
As a first step towards real little-endian code generation, this patch
changes the PowerPC MC layer to actually generate little-endian object
files. This involves passing the little-endian flag through the various
layers, including down to createELFObjectWriter so we actually get basic
little-endian ELF objects, emitting instructions in little-endian order,
and handling fixups and relocations as appropriate for little-endian.
The bulk of the patch is to update most test cases in test/MC/PowerPC
to verify both big- and little-endian encodings. (The only test cases
*not* updated are those that create actual big-endian ABI code, like
the TLS tests.)
Note that while the object files are now little-endian, the generated
code itself is not yet updated, in particular, it still does not adhere
to the ELFv2 ABI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204634 91177308-0d34-0410-b5e6-96231b3b80d8
Those patterns are used when the load cannot be folded into the related broadcast
during the select phase.
This happens when the load gets additional uses that were not anticipated during
the previous lowering phases (constant vector to constant load, then constant
load reused) or when selection DAG is not able to prove that folding the load
will not create a cycle in the DAG.
<rdar://problem/16074331>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204631 91177308-0d34-0410-b5e6-96231b3b80d8
This can be observed with the old testcase of CodeGen/X86/pr12312.ll:
47c47
< vorps %ymm0, %ymm1, %ymm0
---
> vorps %ymm1, %ymm0, %ymm0
97c97
< vorps %ymm1, %ymm0, %ymm0
---
> vorps %ymm0, %ymm1, %ymm0
The vector VecIns is populated with all the values from VecInMap. This is done
while iterating VecInMap. VecInMap uses a hash of pointer values so the
resulting order can vary depending on the memory layout.
The fix is to populate the vector VecIns earlier as VecInMap is populated.
This is done in DAG traversal order.
Fixes <rdar://problem/16398806>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204623 91177308-0d34-0410-b5e6-96231b3b80d8
[PPC64LE] ELFv2 ABI updates for the .opd section
The PPC64 Little Endian (PPC64LE) target supports the ELFv2 ABI, and as
such, does not have a ".opd" section. This is keyed off a _CALL_ELF=2
macro check.
The CALL_ELF check is not clearly documented at this time. The basis
for usage in this patch is from the gcc thread here:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01144.html
> Adding comment from Uli:
Looks good to me. I think the old-style JIT doesn't really work
anyway for 64-bit, but at least with this patch LLVM will compile
and link again on a ppc64le host ...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204614 91177308-0d34-0410-b5e6-96231b3b80d8
I'm under the impression that we used to infer the isCommutable flag from the
instruction-associated pattern. Regardless, we don't seem to do this (at least
by default) any more. I've gone through all of our instruction definitions, and
marked as commutative all of those that should be trivial to commute (by
exchanging the first two operands). There has been special code for the RL*
instructions, and that's not changed.
Before this change, we had the following commutative instructions:
RLDIMI
RLDIMIo
RLWIMI
RLWIMI8
RLWIMI8o
RLWIMIo
XSADDDP
XSMULDP
XVADDDP
XVADDSP
XVMULDP
XVMULSP
After:
ADD4
ADD4o
ADD8
ADD8o
ADDC
ADDC8
ADDC8o
ADDCo
ADDE
ADDE8
ADDE8o
ADDEo
AND
AND8
AND8o
ANDo
CRAND
CREQV
CRNAND
CRNOR
CROR
CRXOR
EQV
EQV8
EQV8o
EQVo
FADD
FADDS
FADDSo
FADDo
FMADD
FMADDS
FMADDSo
FMADDo
FMSUB
FMSUBS
FMSUBSo
FMSUBo
FMUL
FMULS
FMULSo
FMULo
FNMADD
FNMADDS
FNMADDSo
FNMADDo
FNMSUB
FNMSUBS
FNMSUBSo
FNMSUBo
MULHD
MULHDU
MULHDUo
MULHDo
MULHW
MULHWU
MULHWUo
MULHWo
MULLD
MULLDo
MULLW
MULLWo
NAND
NAND8
NAND8o
NANDo
NOR
NOR8
NOR8o
NORo
OR
OR8
OR8o
ORo
RLDIMI
RLDIMIo
RLWIMI
RLWIMI8
RLWIMI8o
RLWIMIo
VADDCUW
VADDFP
VADDSBS
VADDSHS
VADDSWS
VADDUBM
VADDUBS
VADDUHM
VADDUHS
VADDUWM
VADDUWS
VAND
VAVGSB
VAVGSH
VAVGSW
VAVGUB
VAVGUH
VAVGUW
VMADDFP
VMAXFP
VMAXSB
VMAXSH
VMAXSW
VMAXUB
VMAXUH
VMAXUW
VMHADDSHS
VMHRADDSHS
VMINFP
VMINSB
VMINSH
VMINSW
VMINUB
VMINUH
VMINUW
VMLADDUHM
VMULESB
VMULESH
VMULEUB
VMULEUH
VMULOSB
VMULOSH
VMULOUB
VMULOUH
VNMSUBFP
VOR
VXOR
XOR
XOR8
XOR8o
XORo
XSADDDP
XSMADDADP
XSMAXDP
XSMINDP
XSMSUBADP
XSMULDP
XSNMADDADP
XSNMSUBADP
XVADDDP
XVADDSP
XVMADDADP
XVMADDASP
XVMAXDP
XVMAXSP
XVMINDP
XVMINSP
XVMSUBADP
XVMSUBASP
XVMULDP
XVMULSP
XVNMADDADP
XVNMADDASP
XVNMSUBADP
XVNMSUBASP
XXLAND
XXLNOR
XXLOR
XXLXOR
This is a by-inspection change, and I'm not sure how to write a reliable test
case. I would like advice on this, however.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204609 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
- If only two registers are passed to a three-register operation, then the
first argument is both source and destination register.
- If a non-register is passed as the last argument, generate the immediate
version of the instruction.
Also mark DADD commutative and add scheduling information (to the generic
scheduler), and implement DSUB.
Patch by David Chisnall
His work was sponsored by: DARPA, AFRL
CC: theraven
Differential Revision: http://llvm-reviews.chandlerc.com/D3148
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204605 91177308-0d34-0410-b5e6-96231b3b80d8
I've done some experimentation with this, and it looks like using the
lower-latency (but lower throughput) copy instruction is essentially always the
right thing to do.
My assumption is that, in order to be relatively sure that the higher-latency
copy will increase throughput, we'd want to have it unlikely to be in-flight
with its use. On the P7, the global completion table (GCT) can hold a maximum
of 120 instructions, shared among all active threads (up to 4), giving 30
instructions per thread. So specifically, I'd require at least that many
instructions between the copy and the use before the high-latency variant is
used.
Trying this, however, over the entire test suite resulted in zero cases where
the high-latency form would be preferable. This may be a consequence of the
fact that the scheduler views copies as free, and so they tend to end up close
to their uses. For this experiment I created a function:
unsigned chooseVSXCopy(MachineBasicBlock &MBB,
MachineBasicBlock::iterator I,
unsigned DestReg, unsigned SrcReg,
unsigned StartDist = 1,
unsigned Depth = 3) const;
with an implementation like:
if (!Depth)
return PPC::XXLOR;
const unsigned MaxDist = 30;
unsigned Dist = StartDist;
for (auto J = I, JE = MBB.end(); J != JE && Dist <= MaxDist; ++J) {
if (J->isTransient() && !J->isCopy())
continue;
if (J->isCall() || J->isReturn() || J->readsRegister(DestReg, TRI))
return PPC::XXLOR;
++Dist;
}
// We've exceeded the required distance for the high-latency form, use it.
if (Dist > MaxDist)
return PPC::XVCPSGNDP;
// If this is only an exit block, use the low-latency form.
if (MBB.succ_empty())
return PPC::XXLOR;
// We've reached the end of the block, check the successor blocks (up to some
// depth), and use the high-latency form if that is okay with all successors.
for (auto J = MBB.succ_begin(), JE = MBB.succ_end(); J != JE; ++J) {
if (chooseVSXCopy(**J, (*J)->begin(), DestReg, SrcReg,
Dist, --Depth) == PPC::XXLOR)
return PPC::XXLOR;
}
// All of our successor blocks seem okay with the high-latency variant, so
// we'll use it.
return PPC::XVCPSGNDP;
and then changed the copy opcode selection from:
Opc = PPC::XXLOR;
to:
Opc = chooseVSXCopy(MBB, std::next(I), DestReg, SrcReg);
In conclusion, I'm removing the FIXME from the comment, because I believe that
there is, at least absent other examples, nothing to fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204591 91177308-0d34-0410-b5e6-96231b3b80d8