inform the optimizers that the result must be zero/
sign extended from the smaller type. For example,
if a fp to unsigned i16 is promoted to fp to i32,
then we are allowed to assume that the extra 16 bits
are zero (because the result of fp to i16 is undefined
if the result does not fit in an i16). This is
quite aggressive, but should help the optimizers
produce better code. This requires correcting a
test which thought that fp_to_uint is some kind
of truncation, which it is not: in the testcase
(which does fp to i1), either the fp value converts
to 0 or 1 or the result is undefined, which is
quite different to truncation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@58991 91177308-0d34-0410-b5e6-96231b3b80d8
have its node id set. The new and and shift nodes are the nodes that need
the IDs. This fixes PR2982.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@58655 91177308-0d34-0410-b5e6-96231b3b80d8
bits, use a union of a SimpleValueType enum and a regular Type*.
This increases the size of MVT on 64-bit hosts from 32 bits to 64 bits.
In most cases, this doesn't add significant overhead. There are places
in codegen that use arrays of MVTs, so these are now larger, but
they're small in common cases.
This eliminates restrictions on the size of integer types and vector
types that can be represented in codegen. As the included testcase
demonstrates, it's now possible to codegen very large add operations.
There are still some complications with using very large types. PR2880
is still open so they can't be used as return values on normal targets,
there are no libcalls defined for very large integers so operations
like multiply and divide aren't supported.
This also introduces a minimal tablgen Type library, capable of
handling IntegerType and VectorType. This will allow parts of
TableGen that don't depend on using SimpleValueType values to handle
arbitrary integer and vector types.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@58623 91177308-0d34-0410-b5e6-96231b3b80d8
a memset using 16-byte XMM stores, but where the stack realignment code
didn't work. Until it does (PR2962) disable use of xmm regs in memcpy
and memset formation for linux and other targets with insufficiently
aligned stacks.
This is part of PR2888
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@58317 91177308-0d34-0410-b5e6-96231b3b80d8
LHS is a foldable load, then LHS and RHS are swapped
and SetCCOpcode is changed to SETUGT. But the later
code is expecting operands to be the wrong way round
for SETUGT, but they are not in this case, resulting
in an inverted compare. The solution is to move the
load normalization before the correction for SETUGT.
This bug was tickled by LegalizeTypes which happened
to legalize the testcase slightly differently to
LegalizeDAG.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@58092 91177308-0d34-0410-b5e6-96231b3b80d8
in the 32-bit signed offset field of addresses. Even though this
may be intended, some linkers refuse to relocate code where the
relocated address computation overflows.
Also, fix the sign-extension of constant offsets to use the
actual pointer size, rather than the size of the GlobalAddress
node, which may be different, for example on x86-64 where MVT::i32
is used when the address is being fit into the 32-bit displacement
field.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57885 91177308-0d34-0410-b5e6-96231b3b80d8
Where previously LLVM might emit code like this:
ucomisd %xmm1, %xmm0
setne %al
setp %cl
orb %al, %cl
jne .LBB4_2
it now emits this:
ucomisd %xmm1, %xmm0
jne .LBB4_2
jp .LBB4_2
It has fewer instructions and uses fewer registers, but it does
have more branches. And in the case that this code is followed by
a non-fallthrough edge, it may be followed by a jmp instruction,
resulting in three branch instructions in sequence. Some effort
is made to avoid this situation.
To achieve this, X86ISelLowering.cpp now recognizes FCMP_OEQ and
FCMP_UNE in lowered form, and replace them with code that emits
two branches, except in the case where it would require converting
a fall-through edge to an explicit branch.
Also, X86InstrInfo.cpp's branch analysis and transform code now
knows now to handle blocks with multiple conditional branches. It
uses loops instead of having fixed checks for up to two
instructions. It can now analyze and transform code generated
from FCMP_OEQ and FCMP_UNE.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57873 91177308-0d34-0410-b5e6-96231b3b80d8
the copy instruction from the instruction list before asking the
target to create the new instruction. This gets the old instruction
out of the way so that it doesn't interfere with the target's
rematerialization code. In the case of x86, this helps it find
more cases where EFLAGS is not live.
Also, in the X86InstrInfo.cpp, teach isSafeToClobberEFLAGS to check
to see if it reached the end of the block after scanning each
instruction, instead of just before. This lets it notice when the
end of the block is only two instructions away, without doing any
additional scanning.
These changes allow rematerialization to clobber EFLAGS in more
cases, for example using xor instead of mov to set the return value
to zero in the included testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57872 91177308-0d34-0410-b5e6-96231b3b80d8
for strange asm conditions earlier. In this case, we have a
double being passed in an integer reg class. Convert to like
sized integer register so that we allocate the right number
for the class (two i32's for the f64 in this case).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57862 91177308-0d34-0410-b5e6-96231b3b80d8
and add a TargetLowering hook for it to use to determine when this
is legal (i.e. not in PIC mode, etc.)
This allows instruction selection to emit folded constant offsets
in more cases, such as the included testcase, eliminating the need
for explicit arithmetic instructions.
This eliminates the need for the C++ code in X86ISelDAGToDAG.cpp
that attempted to achieve the same effect, but wasn't as effective.
Also, fix handling of offsets in GlobalAddressSDNodes in several
places, including changing GlobalAddressSDNode's offset from
int to int64_t.
The Mips, Alpha, Sparc, and CellSPU targets appear to be
unaware of GlobalAddress offsets currently, so set the hook to
false on those targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57748 91177308-0d34-0410-b5e6-96231b3b80d8
in 32-bit mode instead of assigning a register pair. This has nothing to
do with PR2356, but I happened to notice it while working on it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57704 91177308-0d34-0410-b5e6-96231b3b80d8
use a SUB instruction instead of an ADD, because -128 can be
encoded in an 8-bit signed immediate field, while +128 can't be.
This avoids the need for a 32-bit immediate field in this case.
A similar optimization applies to 64-bit adds with 0x80000000,
with the 32-bit signed immediate field.
To support this, teach tablegen how to handle 64-bit constants.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57663 91177308-0d34-0410-b5e6-96231b3b80d8
shift counts, and patterns that match dynamic shift counts
when the subtract is obscured by a truncate node.
Add DAGCombiner support for recognizing rotate patterns
when the shift counts are defined by truncate nodes.
Fix and simplify the code for commuting shld and shrd
instructions to work even when the given instruction doesn't
have a parent, and when the caller needs a new instruction.
These changes allow LLVM to use the shld, shrd, rol, and ror
instructions on x86 to replace equivalent code using two
shifts and an or in many more cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57662 91177308-0d34-0410-b5e6-96231b3b80d8
i.e. conditions that cannot be checked with a single instruction. For example,
SETONE and SETUEQ on x86.
- Teach legalizer to implement *illegal* setcc as a and / or of a number of
legal setcc nodes. For now, only implement FP conditions. e.g. SETONE is
implemented as SETO & SETNE, SETUEQ is SETUO | SETEQ.
- Move x86 target over.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57542 91177308-0d34-0410-b5e6-96231b3b80d8
create a new DAG node to represent the new shift to keep the
DAG consistent, even though it'll almost always be folded into
the address.
If a user of the resulting address has multiple uses, the
nodes may get revisited by a later MatchAddress call, in which
case DAG inconsistencies do matter.
This fixes PR2849.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57465 91177308-0d34-0410-b5e6-96231b3b80d8
parameters instead of raw Constants. This prevents the constants from
being selected by the isel pass, fixing PR2735.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57385 91177308-0d34-0410-b5e6-96231b3b80d8
instead.
So now: -fast-isel or -fast-isel=true enable fast-isel, and
-fast-isel=false disables it. Fast-isel is also on by default
with -fast, and off by default otherwise.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57270 91177308-0d34-0410-b5e6-96231b3b80d8
are Inexact. (These are not Inexact as defined
by IEEE754, but that seems like a reasonable way
to abstract what happens: information is lost.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57218 91177308-0d34-0410-b5e6-96231b3b80d8
was setting kill flags on tied uses in two-address instructions.
The kill flags were causing the allocator to think it could
allocate the use and its tied def in different registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57039 91177308-0d34-0410-b5e6-96231b3b80d8
"If a re-materializable instruction has a register
operand, the spiller will change the register operand's
spill weight to HUGE_VAL to avoid it being spilled.
However, if the operand is already in the queue ready
to be spilled, avoid re-materializing it".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@56837 91177308-0d34-0410-b5e6-96231b3b80d8
meaning sse_regparm (i.e. float/double values go
in XMM0 instead of ST0). Update documentation
to reflect reality.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@56619 91177308-0d34-0410-b5e6-96231b3b80d8
with an earlyclobber operand elsewhere. Propagate
this bit and the earlyclobber bit through SDISel.
Change linear-scan RA not to allocate regs in a way
that conflicts with an earlyclobber. See also comments.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@56290 91177308-0d34-0410-b5e6-96231b3b80d8
cases. See the comment above OptimizeSMax for the full story, and
the testcase for an example. This cancels out a pessimization
commonly attributed to indvars, and will allow us to lift some of
the artificial throttles in indvars, rather than add new ones.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@56230 91177308-0d34-0410-b5e6-96231b3b80d8
vr1024 = extract_subreg vr1025, 1
...
vr1024 = mov8rr AH
If vr1024 is coalesced with AH, the extract_subreg is now illegal since AH does not have a super-reg whose sub-register 1 is AH.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@56118 91177308-0d34-0410-b5e6-96231b3b80d8
i32>. This is a little messy, but it works.
We should really get rid of the intrinsics, though, since they map
perfectly well to standard LLVM instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@55864 91177308-0d34-0410-b5e6-96231b3b80d8
its work by putting all nodes in the worklist, requiring a big
dynamic allocation. Now, DAGCombiner just iterates over the AllNodes
list and maintains a worklist for nodes that are newly created or
need to be revisited. This allows the worklist to stay small in most
cases, so it can be a SmallVector.
This has the side effect of making DAGCombine not miss a folding
opportunity in alloca-align-rounding.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@55498 91177308-0d34-0410-b5e6-96231b3b80d8
assign it to a version of the xmm register with the regclass that matches its
type. This fixes PR2715, a bug handling some crazy xpcom case in mozilla.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@55358 91177308-0d34-0410-b5e6-96231b3b80d8
and use it in FastISelEmitter.cpp, and make FastISel
subtarget aware. Among other things, this lets it work
properly on x86 targets that don't have SSE, where it
successfully selects x87 instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@55156 91177308-0d34-0410-b5e6-96231b3b80d8
1. x86-64 byval alignment should be max of 8 and alignment of type. Previously the code was not doing what the commit message was saying.
2. Do not use byte repeat move and store operations. These are slow.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@55139 91177308-0d34-0410-b5e6-96231b3b80d8
demonstrate the extent of its capabilities. Note that it
only attempts to operate on one of the blocks in this
testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@55016 91177308-0d34-0410-b5e6-96231b3b80d8
builtins on X86.
Change "lock" instructions to be on a separate line.
This is needed to work around a bug in the Darwin
assembler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54999 91177308-0d34-0410-b5e6-96231b3b80d8
LowerSubregs, and fix an x86-64 isel bug that this exposed.
SUBREG_TO_REG for x86-64 implicit zero extension is only safe for
isel to generate when the source is known to always have zeros in
the high 32 bits. The EXTRACT_SUBREG instruction does not clear
the high 32 bits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54444 91177308-0d34-0410-b5e6-96231b3b80d8