11884 Commits

Chandler Carruth
ec35919c9a [x86] Move the AVX v4i64 test cases down to group them together.
Increasingly I don't want to mix the integer and floating point tests,
especially with AVX where they are handled quite differently.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218233 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 03:05:23 +00:00
Chandler Carruth
de95c380c7 [x86] Back out a bad choice about lowering v4i64 and pave the way for
a more sane approach to AVX2 support.

Fundamentally, there is no useful way to lower integer vectors in AVX.
None. We always end up with a VINSERTF128 in the end, so we might as
well eagerly switch to the floating point domain and do everything
there. This cleans up lots of weird and unlikely-to-be-correct
differences between integer and floating point shuffles when we only
have AVX1.
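
As an illustration (mine, not a test from this commit), a hypothetical
v4i64 shuffle that one would expect to stay entirely in the floating
point domain on AVX1:

```
define <4 x i64> @v4i64_unpack_lo(<4 x i64> %a, <4 x i64> %b) {
  ; Per-128-bit-lane interleave of the low elements: <a0, b0, a2, b2>.
  ; With only AVX1 there are no 256-bit integer shuffle instructions,
  ; so a floating point domain vunpcklpd is the natural lowering.
  %s = shufflevector <4 x i64> %a, <4 x i64> %b,
                     <4 x i32> <i32 0, i32 4, i32 2, i32 6>
  ret <4 x i64> %s
}
```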

The other nice consequence is that by doing things this way we will make
it much easier to write the integer lowering routines as we won't need
to duplicate the logic to check for AVX vs. AVX2 in each one -- if we
actually try to lower a 256-bit vector as an integer vector, we have
AVX2 and can rely on it. I think this will make the code much simpler
and more comprehensible.

Currently, I've disabled *all* support for AVX2 so that we always fall
back to AVX. This keeps everything working rather than asserting. That
will go away with the subsequent series of patches that provide
a baseline AVX2 implementation.

Please note, I'm going to implement AVX2 *without access to hardware*.
That means I cannot correctness test this path. I will be relying on
those with access to AVX2 hardware to do correctness testing and fix
bugs here, but as a courtesy I'm trying to sketch out the framework for
the new-style vector shuffle lowering in the context of the AVX2 ISA.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218228 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 00:32:15 +00:00
Chandler Carruth
37bb4b0365 [x86] Teach the new vector shuffle lowering how to cleverly lower
single-input v8f32 shuffles that do not cross 128-bit lanes but have
different shuffle patterns in the low and high lanes. This removes most
of the extract/insert traffic that was unnecessary and is particularly
good at lowering cases where only one of the two lanes is shuffled at
all.
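
A hypothetical test case of this shape (my illustration, not from the
commit): each output lane reads only from the corresponding input lane,
but with a different pattern per lane:

```
define <8 x float> @per_lane_patterns(<8 x float> %a) {
  ; Low lane uses pattern <1,0,2,3>, high lane uses <2,3,0,1>; neither
  ; crosses a 128-bit lane boundary, so no vextractf128/vinsertf128
  ; traffic should be needed (e.g. a variable-mask vpermilps suffices).
  %s = shufflevector <8 x float> %a, <8 x float> undef,
       <8 x i32> <i32 1, i32 0, i32 2, i32 3, i32 6, i32 7, i32 4, i32 5>
  ret <8 x float> %s
}
```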

I've also added a collection of test cases with undef lanes because this
lowering is somewhat more sensitive to undef lanes than others.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218226 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 23:46:13 +00:00
Chandler Carruth
7da57cf5b4 [x86] Add a bunch of test cases where we have different shuffle patterns
in the high and low 128-bit lanes of a v8f32 vector.

No functionality change yet, but wanted to set up the baseline for my
next patch which will make these quite a bit better. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218224 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 23:32:42 +00:00
Chandler Carruth
974542d7d8 [x86] Teach the new vector shuffle lowering to re-use the SHUFPS
lowering when it can use a symmetric SHUFPS across both 128-bit lanes.

This required making the SHUFPS lowering tolerant of other vector types,
and adjusting our canonicalization to canonicalize harder.

This is the last of the clever uses of symmetry I've thought of for
v8f32. The rest of the tricks I'm aware of here are to work around
asymmetry in the mask.
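
To make the symmetric case concrete, here is a sketch (my illustration,
not a test from this commit):

```
define <8 x float> @symmetric_shufps(<8 x float> %a, <8 x float> %b) {
  ; Both 128-bit lanes use the same SHUFPS pattern (low two elements
  ; from %a, high two from %b), so one 256-bit vshufps $0x44 suffices.
  %s = shufflevector <8 x float> %a, <8 x float> %b,
       <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 12, i32 13>
  ret <8 x float> %s
}
```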

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218216 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 13:35:14 +00:00
Chandler Carruth
1a5f7f54f4 [x86] Teach the new vector shuffle lowering the basics about insertion
of a single element into a zero vector for v4f64 and v4i64 in AVX.
Ironically, there is less to see here because xor+blend is so crazy fast
that we can't really beat that to zero the high 128-bit lane.
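
For reference, the kind of pattern meant here (a hypothetical example,
not from the commit):

```
define <4 x double> @insert_elt0_into_zero(<4 x double> %a) {
  ; Keep element 0 of %a and zero the rest; the xor+blend lowering
  ; (vxorpd to make zeros, then vblendpd) is already about as fast
  ; as anything available here.
  %v = shufflevector <4 x double> %a, <4 x double> zeroinitializer,
                     <4 x i32> <i32 0, i32 5, i32 6, i32 7>
  ret <4 x double> %v
}
```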

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218214 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:49:46 +00:00
Chandler Carruth
6ef31b0079 [x86] Teach the new vector shuffle lowering how to lower to UNPCKLPS and
UNPCKHPS with AVX vectors by recognizing those patterns when they are
repeated for both 128-bit lanes.
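
A sketch of the repeated-lane UNPCK pattern (my example, not the
commit's test):

```
define <8 x float> @unpcklps_both_lanes(<8 x float> %a, <8 x float> %b) {
  ; The UNPCKLPS pattern (a0,b0,a1,b1) repeats in both 128-bit lanes,
  ; so a single 256-bit vunpcklps should cover the whole shuffle.
  %s = shufflevector <8 x float> %a, <8 x float> %b,
       <8 x i32> <i32 0, i32 8, i32 1, i32 9, i32 4, i32 12, i32 5, i32 13>
  ret <8 x float> %s
}
```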

With this, we now generate the exact same (really nice) code for
Quentin's avx_test_case.ll which was the most significant regression
reported for the new shuffle lowering. In fact, I'm out of specific test
cases for AVX lowering; the rest were AVX2, I think. However, there are
a bunch of pretty obvious remaining things to improve with AVX...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218213 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:20:44 +00:00
Chandler Carruth
7a94357b04 [x86] Add test cases for UNPCK instructions with v8f32 AVX vectors in
preparation for enhancing their support in the new vector shuffle
lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218212 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:13:11 +00:00
Chandler Carruth
7922d3e39a [x86] Begin teaching the new vector shuffle lowering among the most
important bits of cleverness: to detect and lower repeated shuffle
patterns between the two 128-bit lanes with a single instruction.

This patch just teaches it how to lower single-input shuffles that fit
this model using VPERMILPS. =] There is more that needs to happen here.
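
For illustration (a hypothetical test, not taken from the commit):

```
define <8 x float> @repeated_lane_reverse(<8 x float> %a) {
  ; The same 4-element pattern <3,2,1,0> repeats in both lanes, so this
  ; can be a single vpermilps $0x1b rather than extract/shuffle/insert.
  %s = shufflevector <8 x float> %a, <8 x float> undef,
       <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>
  ret <8 x float> %s
}
```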

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218211 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:01:19 +00:00
Chandler Carruth
e4cb9d5f25 [x86] Regenerate this test case now that I've improved my script for
generating the test cases to format things more consistently and
actually catch all the operand sequences that should be elided in favor
of the asm comments. No actual changes here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218210 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 11:51:33 +00:00
Chandler Carruth
29720a4bad [x86] Teach the new vector shuffle lowering of v4f64 to prefer a direct
VBLENDPD over using VSHUFPD. While the 256-bit variant of VBLENDPD slows
down to the same speed as VSHUFPD on Sandy Bridge CPUs, it has twice the
throughput of VSHUFPD on Ivy Bridge CPUs, much like it does everywhere
for 128-bit vectors. There isn't a downside, so just eagerly use this
instruction when it suffices.
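
A sketch of such a blend (my example; both vblendpd $0xa and
vshufpd $0xa can implement it, which is exactly the choice at issue):

```
define <4 x double> @blend_v4f64(<4 x double> %a, <4 x double> %b) {
  ; Element-wise select <a0, b1, a2, b3>: expressible directly as
  ; vblendpd $0xa rather than vshufpd.
  %s = shufflevector <4 x double> %a, <4 x double> %b,
                     <4 x i32> <i32 0, i32 5, i32 2, i32 7>
  ret <4 x double> %s
}
```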

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218208 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 11:17:55 +00:00
Chandler Carruth
0dd52092d0 [x86] Add some more comprehensive tests for v4f64 blending.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218207 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 11:12:19 +00:00
Chandler Carruth
57191b0b48 [x86] Re-generate a bunch of the v4f64 test cases with my new script.
This expands the integer cases to cover the fact that AVX2 moves
lane-crossing shuffles into the integer domain. It also adds proper
support for AVX2 run lines and the "ALL" group when it doesn't matter.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218206 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 11:07:41 +00:00
Chandler Carruth
291140b112 [x86] Teach the new vector shuffle lowering the first step toward more
actual support for complex AVX shuffling tricks. We can do independent
blends of the low and high 128-bit lanes of an avx vector, so shuffle
the inputs into place and then do the blend at 256 bits. This will in
many cases remove one blend instruction.

The next step is to permute the low and high halves in-place rather than
extracting them and re-inserting them.
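
A sketch of the intended pattern (my reconstruction, hedged): shuffle
each input in-lane, then finish with one 256-bit blend:

```
define <8 x float> @shuffle_then_blend(<8 x float> %a, <8 x float> %b) {
  ; Roughly: vpermilps on %a (pattern 1,0,3,2), vpermilps on %b
  ; (pattern 0,0,2,2), then a single 256-bit vblendps $0xaa -- instead
  ; of a separate 128-bit blend per lane plus extract/insert traffic.
  %s = shufflevector <8 x float> %a, <8 x float> %b,
       <8 x i32> <i32 1, i32 8, i32 3, i32 10, i32 5, i32 12, i32 7, i32 14>
  ret <8 x float> %s
}
```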

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218202 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 09:35:22 +00:00
Chandler Carruth
1ca1e33c3a [x86] Add some more test cases covering specific blend patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218200 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 09:01:26 +00:00
Chandler Carruth
b7ef7f97a8 [x86] Add the beginnings of some tests for our v8f32 shuffle lowering
under AVX.

This really just documents the current state of the world. I'm going to
try to flesh it out to cover any test cases I plan to improve prior to
improving them so that the delta made by changes is actually visible to
code reviewers.

This is made easier by the fact that I now have a script to automate the
process of producing test cases including the check lines. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218199 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 08:49:27 +00:00
Chandler Carruth
ae464b2ba1 [x86] Teach the new vector shuffle lowering to use VPERMILPD for
single-input shuffles with doubles. This allows them to fold memory
operands into the shuffle, etc. This is just the analog to the v4f32
case in my prior commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218193 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 22:09:27 +00:00
Chandler Carruth
479d0ba62b [x86] Add an AVX run to the 128-bit v2 tests, teach them to have
a generic SSE and AVX mode in addition to a specific AVX1 test path, and
flesh out the AVX tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218192 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 21:26:41 +00:00
David Majnemer
182c8ff6c0 Update tests which broke from r218189
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218191 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 21:18:43 +00:00
Chandler Carruth
9c7ffd20df [x86] Teach the new vector shuffle lowering to use the AVX VPERMILPS
instruction for single-vector floating point shuffles. This in turn
allows the shuffles to fold a load into the instruction which is one of
the common regressions hit with the new shuffle lowering.
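
For illustration (modern IR syntax, my example rather than the
commit's), the load-folding win looks like:

```
define <4 x float> @reverse_from_memory(ptr %p) {
  ; With AVX this can be vpermilps $0x1b, (%rdi), %xmm0; the plain SSE
  ; shufps would first need the vector loaded into a register.
  %x = load <4 x float>, ptr %p
  %s = shufflevector <4 x float> %x, <4 x float> undef,
                     <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x float> %s
}
```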

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218190 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 20:52:07 +00:00
Chandler Carruth
cc727ec92a [x86] Start moving to a fancier check syntax to reduce the need for
duplication of check lines. The idea is to have broad sets of
compilation modes that will frequently diverge without having to always
and immediately explode to the precise ISA feature set.

While this already helps due to VEX encoded differences, it will help
much more as I teach the new shuffle lowering about more of the new VEX
encoded instructions which can still be used to implement 128-bit
shuffles.
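
The RUN lines look roughly like the following (a sketch from memory of
the era's tests; the exact flags are an assumption, not quoted from
this commit):

```
; RUN: llc < %s -mcpu=x86-64 -x86-experimental-vector-shuffle-lowering | FileCheck %s --check-prefix=ALL --check-prefix=SSE2
; RUN: llc < %s -mcpu=x86-64 -mattr=+avx -x86-experimental-vector-shuffle-lowering | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX1
```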

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218188 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 18:36:39 +00:00
Chandler Carruth
c16105b078 [x86] Teach the v4f32 path of the new shuffle lowering to handle the
tricky case of single-element insertion into the zero lane of a zero
vector.

We can't just use the same pattern here as we do in every other vector
type because the general insertion logic can handle insertion into the
non-zero lane of the vector. However, in SSE4.1 with v4f32 vectors we
have INSERTPS that is a much better choice than the generic one for such
lowerings. But INSERTPS can do lots of other lowerings as well so
factoring its logic into the general insertion logic doesn't work very
well. We also can't just extract the core common part of the general
insertion logic that is faster (forming VZEXT_MOVL synthetic nodes that
lower to MOVSS when they can) because VZEXT_MOVL is often *faster* than
a blend while INSERTPS is slower! So instead we apply a restrictive
condition when attempting to use the generic insertion logic, narrowing it
to those cases where VZEXT_MOVL won't need a shuffle afterward and thus
will do better than INSERTPS. Then we try blending. Then we go back to
INSERTPS.
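
The tricky case itself looks like this (a hypothetical example of mine):

```
define <4 x float> @insert_into_zero_lane(<4 x float> %a) {
  ; Element 0 of %a into lane 0 of an all-zero vector. Candidates here
  ; include the movss-style lowering (VZEXT_MOVL), xorps+blendps, and
  ; insertps with its zeroing bits; the restrictive condition described
  ; above decides among them.
  %v = shufflevector <4 x float> %a, <4 x float> zeroinitializer,
                     <4 x i32> <i32 0, i32 5, i32 6, i32 7>
  ret <4 x float> %v
}
```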

This still doesn't generate perfect code for some silly reasons that can
be fixed by tweaking the td files for lowering VZEXT_MOVL to use
XORPS+BLENDPS when available rather than XORPS+MOVSS when the input ends
up in a register rather than a load from memory -- BLENDPSrr has twice
the throughput of MOVSSrr. Don't you love this ISA?

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218177 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 04:15:22 +00:00
Chandler Carruth
cc62abbe39 [x86] Generalize the single-element insertion lowering to work with
floating point types and use it for both v2f64 and v2i64 single-element
insertion lowering.

This fixes the last non-AVX performance regression test case I've gotten
for the new vector shuffle lowering. There is an obvious analogous
lowering for v4f32 that I'll add in a follow-up patch (because with
INSERTPS, v4f32 requires special treatment). After that, it's AVX stuff.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218175 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 03:32:25 +00:00
Peter Collingbourne
87f7e75e58 Fix crash with an insertvalue that produces an empty object.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218171 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 00:10:47 +00:00
Matt Arsenault
1a505ebae4 R600: Un-xfail a test which passes with pass disabled
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218165 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 23:02:20 +00:00
Matt Arsenault
ea3a0242f4 R600/SI: Un-xfail tests which work now
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218164 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 23:02:18 +00:00
Matt Arsenault
c58ab80f78 R600/SI: Un xfail a test that works now
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218162 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 22:42:40 +00:00
Juergen Ributzka
faf93a6e0c [FastISel][AArch64] Fix a think-o in address computation.
When looking through sign/zero-extensions, the code would always assume there is
such an extension instruction and use the wrong operand for the address.

There was also a minor issue in the handling of 'AND' instructions: I
accidentally used a 'cast' instead of a 'dyn_cast'.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218161 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 22:23:46 +00:00
Chandler Carruth
dc58d1e099 [x86] Fully generalize the zext lowering in the new vector shuffle
lowering to support both anyext and zext and to custom lower for many
different microarchitectures.

Using this allows us to get *exactly* the right code for zext and anyext
shuffles in all the vector sizes. For v16i8, the improvement is *huge*.
The new SSE2 test case added here is one I refused to add before this
because it was sooooo many instructions.
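
A hypothetical example of the zext-ish shuffles in question (mine, not
a test from the commit):

```
define <16 x i8> @zext_low_bytes(<16 x i8> %a) {
  ; Interleave the low 8 bytes of %a with zeros -- i.e. a zext of each
  ; byte to an i16. On SSE4.1+ this is a single pmovzxbw; earlier ISAs
  ; can use punpcklbw against a zeroed register.
  %z = shufflevector <16 x i8> %a, <16 x i8> zeroinitializer,
       <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19,
                   i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23>
  ret <16 x i8> %z
}
```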

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218143 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 20:00:32 +00:00
Matt Arsenault
c14f7630e0 R600/SI: Fix test to prepare for scheduler
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218131 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 18:11:16 +00:00
Hal Finkel
c404e8208c Optionally enable more-aggressive FMA formation in DAGCombine
The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one
use, but this is overly conservative on some systems. Specifically, if the FMA
and the FADD have the same latency (and the FMA does not compete for resources
with the FMUL any more than the FADD does), there is no need for the
restriction; furthermore, forming the FMA while keeping the FMUL can still allow
for higher overall throughput and decreased critical-path length.

Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to
elide the hasOneUse check. This is enabled for PowerPC by default, as most
PowerPC systems will benefit.
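
A sketch of the kind of code affected (my illustration; it assumes
fast-math flags so contraction is legal):

```
define double @fma_multi_use(double %a, double %b, double %c, double %d) {
  ; %m has two uses, so the default hasOneUse heuristic declines to
  ; form FMAs; with enableAggressiveFMAFusion both fadds can become
  ; fused multiply-adds anyway.
  %m  = fmul fast double %a, %b
  %s1 = fadd fast double %m, %c
  %s2 = fadd fast double %m, %d
  %r  = fadd fast double %s1, %s2
  ret double %r
}
```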

Patch by Olivier Sallenave, thanks!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218120 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 11:42:56 +00:00
Chandler Carruth
89436b4160 [x86] Recognize that we can use duplication to widen v16i8 shuffles due
to undef lanes as well as defined widenable lanes. This dramatically
improves the lowering we use for undef-shuffles in a zext-ish pattern
for SSE2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218115 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 09:45:21 +00:00
Chandler Carruth
3e990c1e5b [x86] Actually test the SSE2 lowering for most of the zext-ish shuffles.
Not sure why I only did SSSE3 here. Also, I've left out some of the SSE2
ones because the shuffles are so absurd it's not worth transcribing
them. Will try to fix them to be sane and then check them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218114 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 08:51:06 +00:00
Chandler Carruth
ec1f7b1c87 [x86] Teach the new vector shuffle lowering to also use pmovzx for v4i32
shuffles that are zext-ing.

Not a lot to see here; the undef lane variant is better handled with
pshufd, but this improves the actual zext pattern.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218112 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 08:37:44 +00:00
Chandler Carruth
330aa6fd6b [x86] Add a dedicated lowering path for zext-compatible vector shuffles
to the new vector shuffle lowering code.

This allows us to emit PMOVZX variants consistently for patterns where
it is a viable lowering. This instruction is both fast and allows us to
fold loads into it. This only hooks the new lowering up for i16 and i8
element widths, mostly so I could manage the change to the tests. I'll
add the i32 one next, although it is significantly less interesting.

One thing to note is that we already had some tests for these patterns
but those tests had far less horrible instructions. The problem is that
those tests weren't checking the strict start and end of the instruction
sequence. =[ As a consequence, something changed in the lowering, making
us generate *TERRIBLE* code for these patterns in SSE2 through SSSE3.
I've consolidated all of the tests and spelled out the madness that we
currently emit for these shuffles. I'm going to try to figure out what
has gone wrong here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218102 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 06:07:49 +00:00
Jiangning Liu
61519cd699 Optimize sext/zext insertion algorithm in back-end.
With this optimization, we will not always insert a zext for values crossing
basic blocks; instead, we insert a sext if the users of a value that crosses
a basic block prefer a signed predicate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218101 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 05:30:35 +00:00
Hans Wennborg
2ee31bcdee Fix an it's vs. its typo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218093 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 01:14:56 +00:00
Matt Arsenault
bd2b96a12d R600: Better fix for bug 20982
Just do the left shift as unsigned to avoid the UB.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218092 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 00:42:06 +00:00
Chandler Carruth
9b676fd6f2 [x86] Extend this test to cover SSE4.1. Nothing interesting here, but
paves the way for subsequent changes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218091 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 00:30:24 +00:00
Quentin Colombet
65edced76b [ARM] Do not perform a tail call when the caller returns several values.
The fix is slightly different from the x86 one (see r216117) because the
number of values attached to a return can vary even for a single returned
value (e.g., an f64 yields two returned values).

<rdar://problem/18352998>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218076 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 21:17:50 +00:00
Robin Morisset
5052940c27 Restore "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors"
Summary:
This patch was originally in D5304 (I could not find a way to reopen that revision).
It was accepted and committed, but it broke the build bots because overloading
the ArrayRef constructor for braced initializer lists is not supported by all
toolchains. I then reverted it, and I propose this fixed version that uses a plain
C array instead in makeDMB (that array is then converted implicitly to an
ArrayRef, but that is not behind an ifdef). Could someone confirm whether
initializer lists for plain C arrays are supported by every toolchain used
to build LLVM? Otherwise I can just initialize the array in the old way:
args[0] = ...; ...; args[5] = ...;

Below is the description of the original patch:
```
I had only tested this code for ARMv7 and ARMv8. This patch adds several
fallback paths if the processor does not support dmb ish:
- dmb sy on a Cortex-M with support for dmb
- mcr p15, #0, r0, c7, c10, #5 on ARMv6 (a special instruction equivalent to a DMB)
These fallback paths were chosen based on the code for fence seq_cst.

Thanks to luqmana for having noticed this bug.
```

Test Plan: Added more cases to atomic-load-store.ll + make check-all

Reviewers: jfb, t.p.northover, luqmana

Subscribers: llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D5386

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218066 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 18:56:04 +00:00
Matt Arsenault
e08e52528b R600: Bug 20982 - Avoid undefined left shift of negative value
I'm not sure what the hardware actually does, so don't
bother trying to fold it for now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218057 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 15:52:26 +00:00
Chandler Carruth
72f0d9515e [x86] Use PALIGNR for v4i32 and v2i64 blends when appropriate.
There is no purpose in using it for single-input shuffles as
pshufd is just as fast and doesn't tie the two operands. This removes
a substantial amount of wrong-domain blend operations in SSSE3 mode. It
also completes the usage of PALIGNR for integer shuffles and addresses
one of the test cases Quentin hit with the new vector shuffle lowering.
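
For illustration (a hypothetical example, not Quentin's test case):

```
define <4 x i32> @palignr_candidate(<4 x i32> %a, <4 x i32> %b) {
  ; <a1, a2, a3, b0> is the concatenation %b:%a shifted right by
  ; 4 bytes -- exactly what palignr $4 computes, and it stays in the
  ; integer domain.
  %s = shufflevector <4 x i32> %a, <4 x i32> %b,
                     <4 x i32> <i32 1, i32 2, i32 3, i32 4>
  ret <4 x i32> %s
}
```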

There is still the question of whether and when to use this for floating
point shuffles. It is faster than shufps or shufpd but in the integer
domain. I don't yet really have a good heuristic here for when to use
this instruction for floating point vectors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218038 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 09:00:25 +00:00
Chandler Carruth
088aa097d5 [x86] Add an SSSE3 run and check mode to the 128-bit v2 tests of the new
vector shuffle lowering. This will be needed for up-coming palignr
tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218037 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 08:33:04 +00:00
Juergen Ributzka
f789dac2dd Revert "[FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ."
Reverting it until I have time to investigate a regression.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218035 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 08:07:40 +00:00
Juergen Ributzka
ef48b51126 Fix previous commit: [FastISel][AArch64] Simplify XALU multiplies.
When folding the intrinsic flag into the branch or select, we also have to
consider whether the intrinsic got simplified, because that changes the
flag we have to check for.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218034 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 07:26:26 +00:00
Juergen Ributzka
e7fba004ce [FastISel][AArch64] Simplify XALU multiplies.
Simplify {s|u}mul.with.overflow to {s|u}add.with.overflow when possible.
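
For illustration (my example; multiply-by-two is the obvious case where
this equivalence holds):

```
declare { i32, i1 } @llvm.smul.with.overflow.i32(i32, i32)

define i1 @mul2_overflows(i32 %x) {
  ; %x * 2 overflows exactly when %x + %x does, so this can be lowered
  ; as sadd.with.overflow(%x, %x).
  %res = call { i32, i1 } @llvm.smul.with.overflow.i32(i32 %x, i32 2)
  %ov  = extractvalue { i32, i1 } %res, 1
  ret i1 %ov
}
```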

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218033 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 07:04:54 +00:00
Juergen Ributzka
4b6f00ad18 [FastISel][AArch64] Followup commit for 218031 to handle negative offsets too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218032 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 07:04:49 +00:00
Juergen Ributzka
22b557d942 [FastISel][AArch64] Try to fold the offset into the add instruction when simplifying a memory address.
Small optimization in 'simplifyAddress'. When the offset cannot be encoded in
the load/store instruction, we need to materialize the address manually.
The add instruction can encode a wider range of immediates than the load/store
instructions. This change tries to fold the offset into the add instruction
first before materializing the offset in a register.
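
A sketch of the situation (my example; the expected codegen is an
assumption): with the fold, one would hope for something like
"add x8, x0, #0x123, lsl #12" + "ldrb w0, [x8]" instead of a
movz/movk materialization plus a register-offset load.

```
define i8 @load_big_offset(ptr %p) {
  ; Byte offset 0x123000 does not fit ldrb's unsigned immediate field
  ; (max 4095), but it does fit an add immediate (#0x123, lsl #12).
  %q = getelementptr i8, ptr %p, i64 1191936
  %v = load i8, ptr %q
  ret i8 %v
}
```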

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218031 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 05:40:47 +00:00
Juergen Ributzka
ffbd4879eb [FastISel][AArch64] Fold 'AND' instruction during the address computation.
The 'AND' instruction can be used to mask a register down to its lower 32 bits.
If this is done inside an address computation, we might be able to fold the
AND into the memory instruction itself.

and  x1, x1, #0xffffffff
ldrb x0, [x0, x1]          --->   ldrb x0, [x0, w1, uxtw]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218030 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 05:40:41 +00:00