x86 addressing modes. This allows PIE-based TLS offsets to fit directly
into an addressing mode immediate offset, which is the last remaining
code quality issue from PR12380. With this patch, that PR is completely
fixed.
To understand why this patch is correct to match these offsets into
addressing mode immediates, break it down by cases:
1) 32-bit is trivially correct, and unmodified here.
2) 64-bit non-small mode is unchanged and never matches.
3) 64-bit small PIC code which is RIP-relative is handled specially in
the match to try to fit RIP into the base register. If it fails, it
now early exits. This behavior is unchanged by the patch.
4) 64-bit small non-PIC code which is not RIP-relative continues to work
as it did before. The reason these immediates are safe is because the
ABI ensures they fit in small mode. This behavior is unchanged.
5) 64-bit small PIC code which is *not* using RIP-relative addressing.
This is the only case changed by the patch, and the primary place you
see it is in TLS, either the win64 section offset TLS or Linux
local-exec TLS model in a PIC compilation. Here the ABI again ensures
that the immediates fit because we are in small mode, and any other
operations required due to the PIC relocation model have been handled
externally to the Wrapper node (extra loads etc are made around the
wrapper node in ISelLowering).
I've tested this as much as I can comparing it with GCC's output, and
everything appears safe. I discussed this with Anton and it made sense
to him at least at face value. That said, if there are issues with PIC
code after this patch, yell and we can revert it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154304 91177308-0d34-0410-b5e6-96231b3b80d8
comprehensive testing of TLS codegen for x86. Convert all of the ones
that were still using grep to use FileCheck. Remove some redundancies
between them.
Perhaps most interestingly expand the test cases so that they actually
fully list the instruction snippet being tested. TLS operations are
*very* narrowly defined, and so these seem reasonably stable. More
importantly, the existing test cases already were crazy fine grained,
expecting specific registers to be allocated. This just clarifies that
no *other* instructions are expected, and fills in some crucial gaps
that weren't being tested at all.
This will make any subsequent changes to TLS much more clear during
review.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154303 91177308-0d34-0410-b5e6-96231b3b80d8
when -ffast-math, i.e. don't just always do it if the reciprocal can
be formed exactly. There is already an IR level transform that does
that, and it does it more carefully.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154296 91177308-0d34-0410-b5e6-96231b3b80d8
optimizations which are valid for position independent code being linked
into a single executable, but not for such code being linked into
a shared library.
I discussed the design of this with Eric Christopher, and the decision
was to support an optional bit rather than a completely separate
relocation model. Fundamentally, this is still PIC relocation, its just
that certain optimizations are only valid under a PIC relocation model
when the resulting code won't be in a shared library. The simplest path
to here is to expose a single bit option in the TargetOptions. If folks
have different/better designs, I'm all ears. =]
I've included the first optimization based upon this: changing TLS
models to the *Exec models when PIE is enabled. This is the LLVM
component of PR12380 and is all of the hard work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154294 91177308-0d34-0410-b5e6-96231b3b80d8
Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandybridge we continue to load the broadcasted value from the constant pool.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154284 91177308-0d34-0410-b5e6-96231b3b80d8
shuffle node because it could introduce new shuffle nodes that were not
supported efficiently by the target.
2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
second shuffle reverses the transformation of the first shuffle.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154266 91177308-0d34-0410-b5e6-96231b3b80d8
reciprocal if converting to the reciprocal is exact. Do it even if inexact
if -ffast-math. This substantially speeds up ac.f90 from the polyhedron
benchmarks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154265 91177308-0d34-0410-b5e6-96231b3b80d8
by default.
This is a behaviour configurable in the MCAsmInfo. I've decided to turn
it on by default in (possibly optimistic) hopes that most assemblers are
reasonably sane. If this proves a problem, switching to default seems
reasonable.
I'm not sure if this is the opportune place to test, but it seemed good
to make sure it was tested somewhere.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154235 91177308-0d34-0410-b5e6-96231b3b80d8
LSR always tries to make the ICmp in the loop latch use the incremented
induction variable. This allows the induction variable to be kept in a
single register.
When the induction variable limit is equal to the stride,
SimplifySetCC() would break LSR's hard work by transforming:
(icmp (add iv, stride), stride) --> (cmp iv, 0)
This forced us to use lea for the IC update, preventing the simpler
incl+cmp.
<rdar://problem/7643606>
<rdar://problem/11184260>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154119 91177308-0d34-0410-b5e6-96231b3b80d8
This is just the fallback tie-breaker ordering, the main allocation
order is still descending size.
Patch by Shamil Kurmangaleev!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153904 91177308-0d34-0410-b5e6-96231b3b80d8
Do not try to optimize swizzles of shuffles if the source shuffle has more than
a single user, except when the source shuffle is also a swizzle.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153864 91177308-0d34-0410-b5e6-96231b3b80d8
This is the CodeGen equivalent of r153747. I tested that there is not noticeable
performance difference with any combination of -O0/-O2 /-g when compiling
gcc as a single compilation unit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153817 91177308-0d34-0410-b5e6-96231b3b80d8
This is a code change to add support for changing instruction sequences of the form:
load
inc/dec of 8/16/32/64 bits
store
into the appropriate X86 inc/dec through memory instruction:
inc[qlwb] / dec[qlwb]
The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE have been extracted to isLoadIncOrDecStore and reworked to use the better
named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced Chain.getNode() with LoadNode. The comments have also been expanded.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153635 91177308-0d34-0410-b5e6-96231b3b80d8
This is a code change to add support for changing instruction sequences of the form:
load
inc/dec of 8/16/32/64 bits
store
into the appropriate X86 inc/dec through memory instruction:
inc[qlwb] / dec[qlwb]
The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE have been extracted to isLoadIncOrDecStore and reworked to use the better
named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced Chain.getNode() with LoadNode. The comments have also been expanded.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153617 91177308-0d34-0410-b5e6-96231b3b80d8
* Removed test/lib/llvm.exp - it is no longer needed
* Deleted the dg.exp reading code from test/lit.cfg. There are no dg.exp files
left in the test suite so this code is no longer required. test/lit.cfg is
now much shorter and clearer
* Removed a lot of duplicate code in lit.local.cfg files that need access to
the root configuration, by adding a "root" attribute to the TestingConfig
object. This attribute is dynamically computed to provide the same
information as was previously provided by the custom getRoot functions.
* Documented the config.root attribute in docs/CommandGuide/lit.pod
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153408 91177308-0d34-0410-b5e6-96231b3b80d8
This results in things such as
vmovups 16(%rdi), %xmm0
vinsertf128 $1, %xmm0, %ymm0, %ymm0
to be combined to
vinsertf128 $1, 16(%rdi), %ymm0, %ymm0
rdar://11076953
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153092 91177308-0d34-0410-b5e6-96231b3b80d8
i128). In that case, we may not be able to print out the MCExpr as an
expression. For instance, we could have an MCExpr like this:
0xBEEF0000BEEF0000 | (0xBEEF0000BEEF0000 << 64)
The MCExpr printer handles sizes up to 64-bits, but this expression would
require 128-bits. In this situation, try to evaluate the constant expression and
emit that as the value into 64-bit chunks.
<rdar://problem/11070338>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153081 91177308-0d34-0410-b5e6-96231b3b80d8
X86InstrCompiler.td.
It also adds –mcpu-generic to the legalize-shift-64.ll test so the test
will pass if run on an Intel Atom CPU, which would otherwise
produce an instruction schedule which differs from that which the test expects.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153033 91177308-0d34-0410-b5e6-96231b3b80d8
This results in things such as
vmovaps -96(%rbx), %xmm1
vinsertf128 $1, %xmm1, %ymm0, %ymm0
to be combined to
vinsertf128 $1, -96(%rbx), %ymm0, %ymm0
rdar://10643481
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152762 91177308-0d34-0410-b5e6-96231b3b80d8
Original commit message from r147481:
DAGCombine for transforming 128->256 casts into a vmovaps, rather
then a vxorps + vinsertf128 pair if the original vector came from a load.
Fix:
Unaligned loads need to generate a vmovups.
rdar://10974078
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152366 91177308-0d34-0410-b5e6-96231b3b80d8
This was testing the handling of sub-register coalescing followed by
remat. The original problem was caused by the extra <imp-def> operands
added by sub-register coalescing. Those <imp-def> operands are not
added any longer, and the test case passes even when the original patch
is reverted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152040 91177308-0d34-0410-b5e6-96231b3b80d8
Some BBs can become dead after codegen preparation. If we delete them here, it
could help enable tail-call optimizations later on.
<rdar://problem/10256573>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152002 91177308-0d34-0410-b5e6-96231b3b80d8
In this instance we are generating the tail-call during legalizeDAG. The 2nd
floor call can't be a tail call because it clobbers %xmm1, which is defined by
the first floor call. The first floor call can't be a tail-call because it's
not in the tail position. The only reasonable way I could think to fix this
in a target-independent manner was to check for glue logic on the copy reg.
rdar://10930395
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151877 91177308-0d34-0410-b5e6-96231b3b80d8
so that the test will not fail when run on an Intel Atom
processor, due to the Atom scheduler producing an instruction sequence that is
different from that which is normally expected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151832 91177308-0d34-0410-b5e6-96231b3b80d8