The old algorithm inserted a 'rotqmbyi' instruction which was
both redundant and wrong - it made shufb select bytes from the
wrong end of the input quad.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@116701 91177308-0d34-0410-b5e6-96231b3b80d8