[PPC64LE] Correct vperm -> shuffle transform for little endian

As discussed in cfe commit r210279, the correct little-endian semantics for the vec_perm Altivec interfaces are implemented by reversing the order of the input vectors and complementing the permute control vector. This converts the desired permute from little endian element order into the big endian element order that the underlying PowerPC vperm instruction uses. This is represented with a ppc_altivec_vperm intrinsic function. The instruction combining pass contains code to convert a ppc_altivec_vperm intrinsic into a vector shuffle operation when the intrinsic has a permute control vector (mask) that is a constant. However, the vector shuffle operation assumes that vector elements are in natural order for their endianness, so for little endian code we will get the wrong result with the existing transformation. This patch reverses the semantic change to vec_perm that was performed in altivec.h by once again swapping the input operands and complementing the permute control vector, returning the element ordering to little endian. The correctness of this code is tested by the new perm.c test added in a previous patch, and by other tests in the test suite that fail without this patch. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210282 91177308-0d34-0410-b5e6-96231b3b80d8
2025-02-01 02:33:44 +00:00 · 2014-06-05 19:46:04 +00:00 · 2014-06-05 19:46:04 +00:00 · 542fdf5fba
commit 542fdf5fba
parent ef31a79323
1 changed files with 10 additions and 1 deletions
--- a/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ b/lib/Transforms/InstCombine/InstCombineCalls.cpp
@ -800,6 +800,11 @@ Instruction *InstCombiner::visitCallInst(CallInst &CI) {

  case Intrinsic::ppc_altivec_vperm:
    // Turn vperm(V1,V2,mask) -> shuffle(V1,V2,mask) if mask is a constant.
+    // Note that ppc_altivec_vperm has a big-endian bias, so when creating
+    // a vectorshuffle for little endian, we must undo the transformation
+    // performed on vec_perm in altivec.h.  That is, we must complement
+    // the permutation mask with respect to 31 and reverse the order of
+    // V1 and V2.
    if (Constant *Mask = dyn_cast<Constant>(II->getArgOperand(2))) {
      assert(Mask->getType()->getVectorNumElements() == 16 &&
             "Bad type for intrinsic!");
@ -832,10 +837,14 @@ Instruction *InstCombiner::visitCallInst(CallInst &CI) {
          unsigned Idx =
            cast<ConstantInt>(Mask->getAggregateElement(i))->getZExtValue();
          Idx &= 31;  // Match the hardware behavior.
+          if (DL && DL->isLittleEndian())
+            Idx = 31 - Idx;

          if (!ExtractedElts[Idx]) {
+            Value *Op0ToUse = (DL && DL->isLittleEndian()) ? Op1 : Op0;
+            Value *Op1ToUse = (DL && DL->isLittleEndian()) ? Op0 : Op1;
            ExtractedElts[Idx] =
-              Builder->CreateExtractElement(Idx < 16 ? Op0 : Op1,
+              Builder->CreateExtractElement(Idx < 16 ? Op0ToUse : Op1ToUse,
                                            Builder->getInt32(Idx&15));
          }