a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTranformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least on
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but tho
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
a helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits, this one is clearly big enough.
Finally, I'm very aware that much of the comments and documentation
needs to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171681 91177308-0d34-0410-b5e6-96231b3b80d8
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171366 91177308-0d34-0410-b5e6-96231b3b80d8
bitwidth op back to the original size. If we reduce ANDs then this can cause
an endless loop. This patch changes the ZEXT to ANY_EXTEND if the demanded bits
are equal or smaller than the size of the reduced operation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170505 91177308-0d34-0410-b5e6-96231b3b80d8
A register can be associated with several distinct register classes.
For example, on PPC, the floating point registers are each associated with
both F4RC (which holds f32) and F8RC (which holds f64). As a result, this code
would fail when provided with a floating point register and an f64 operand
because it would happen to find the register in the F4RC class first and
return that. From the F4RC class, SDAG would extract f32 as the register
type and then assert because of the invalid implied conversion between
the f64 value and the f32 register.
Instead, search all register classes. If a register class containing the
the requested register has the requested type, then return that register
class. Otherwise, as before, return the first register class found that
contains the requested register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170436 91177308-0d34-0410-b5e6-96231b3b80d8
understand target implementation of any_extend / extload, just generate
zero_extend in place of any_extend for liveouts when the target knows the
zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz).
rdar://12771555
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169536 91177308-0d34-0410-b5e6-96231b3b80d8
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169131 91177308-0d34-0410-b5e6-96231b3b80d8
For some targets, it is desirable to prefer scalarizing <N x i1> instead of promoting to a larger legal type, such as <N x i32>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168882 91177308-0d34-0410-b5e6-96231b3b80d8
InputArg in r165616.
This will enable us to get the actual type for both InputArg and OutputArg.
rdar://9932559
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167265 91177308-0d34-0410-b5e6-96231b3b80d8
which is supposed to consistently raise SIGTRAP across all systems. In contrast,
__builtin_trap() behave differently on different systems. e.g. it raises SIGTRAP on ARM, and
SIGILL on X86. The purpose of __builtin_debugtrap() is to consistently provide "trap"
functionality, in the mean time preserve the compatibility with on gcc on __builtin_trap().
The X86 backend is already able to handle debugtrap(). This patch is to:
1) make front-end recognize "__builtin_debugtrap()" (emboddied in the one-line change to Clang).
2) In DAG legalization phase, by default, "debugtrap" will be replaced with "trap", which
make the __builtin_debugtrap() "available" to all existing ports without the hassle of
changing their code.
3) If trap-function is specified (via -trap-func=xyz to llc), both __builtin_debugtrap() and
__builtin_trap() will be expanded into the function call of the specified trap function.
This behavior may need change in the future.
The provided testing-case is to make sure 2) and 3) are working for ARM port, and we
already have a testing case for x86.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166300 91177308-0d34-0410-b5e6-96231b3b80d8
The next step is to update the optimizers to allow them to optimize the different address spaces with this information.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165505 91177308-0d34-0410-b5e6-96231b3b80d8
We use the enums to query whether an Attributes object has that attribute. The
opaque layer is responsible for knowing where that specific attribute is stored.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165488 91177308-0d34-0410-b5e6-96231b3b80d8
Provide interface in TargetLowering to set or get the minimum number of basic
blocks whereby jump tables are generated for switch statements rather than an
if sequence.
getMinimumJumpTableEntries() defaults to 4.
setMinimumJumpTableEntries() allows target configuration.
This patch changes the default for the Hexagon architecture to 5
as it improves performance on some benchmarks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164628 91177308-0d34-0410-b5e6-96231b3b80d8
- CodeGenPrepare pass for identifying div/rem ops
- Backend specifies the type mapping using addBypassSlowDivType
- Enabled only for Intel Atom with O2 32-bit -> 8-bit
- Replace IDIV with instructions which test its value and use DIVB if the value
is positive and less than 256.
- In the case when the quotient and remainder of a divide are used a DIV
and a REM instruction will be present in the IR. In the non-Atom case
they are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
using the quotient and remainder from the first IDIV. However,
due to this optimization CSE is not able to eliminate redundant
IDIV instructions because they are located in different basic blocks.
This is overcome by calculating both the quotient (DIV) and remainder (REM)
in each basic block that is inserted by the optimization and reusing the result
values when a subsequent DIV or REM instruction uses the same operands.
- Test cases check for the presents of the optimization when calculating
either the quotient, remainder, or both.
Patch by Tyler Nowicki!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163150 91177308-0d34-0410-b5e6-96231b3b80d8
large immediates. Add dag combine logic to recover in case the large
immediates doesn't fit in cmp immediate operand field.
int foo(unsigned long l) {
return (l>> 47) == 1;
}
we produce
%shr.mask = and i64 %l, -140737488355328
%cmp = icmp eq i64 %shr.mask, 140737488355328
%conv = zext i1 %cmp to i32
ret i32 %conv
which codegens to
movq $0xffff800000000000,%rax
andq %rdi,%rax
movq $0x0000800000000000,%rcx
cmpq %rcx,%rax
sete %al
movzbl %al,%eax
ret
TargetLowering::SimplifySetCC would transform
(X & -256) == 256 -> (X >> 8) == 1
if the immediate fails the isLegalICmpImmediate() test. For x86,
that's immediates which are not a signed 32-bit immediate.
Based on a patch by Eli Friedman.
PR10328
rdar://9758774
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160346 91177308-0d34-0410-b5e6-96231b3b80d8
This will be used to determine whether it's profitable to turn a select into a
branch when the branch is likely to be predicted.
Currently enabled for everything but Atom on X86 and Cortex-A9 devices on ARM.
I'm not entirely happy with the name of this flag, suggestions welcome ;)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156233 91177308-0d34-0410-b5e6-96231b3b80d8
We want the representative register class to contain the largest
super-registers available. This makes the function less sensitive to the
register class numbering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156220 91177308-0d34-0410-b5e6-96231b3b80d8
The masks returned by SuperRegClassIterator are computed automatically
by TableGen. This is better than depending on the manually specified
SuperRegClasses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156147 91177308-0d34-0410-b5e6-96231b3b80d8
transformation:
(X op C1) ^ C2 --> (X op C1) & ~C2 iff (C1&C2) == C2
should be done.
This change has been tested:
Using a debug+asserts build:
on the specific test case that brought this bug to light
make check-all
lnt nt
using this clang to build a release version of clang
Using the release+asserts clang-with-clang build:
on the specific test case that brought this bug to light
make check-all
lnt nt
Checking in because Evan wants it checked in. Test case forthcoming after
scrubbing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154955 91177308-0d34-0410-b5e6-96231b3b80d8
in TargetLowering. There was already a FIXME about this location being
odd. The interface is simplified as a consequence. This will also make
it easier to change TLS models when compiling with PIE.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154292 91177308-0d34-0410-b5e6-96231b3b80d8
LSR always tries to make the ICmp in the loop latch use the incremented
induction variable. This allows the induction variable to be kept in a
single register.
When the induction variable limit is equal to the stride,
SimplifySetCC() would break LSR's hard work by transforming:
(icmp (add iv, stride), stride) --> (cmp iv, 0)
This forced us to use lea for the IC update, preventing the simpler
incl+cmp.
<rdar://problem/7643606>
<rdar://problem/11184260>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154119 91177308-0d34-0410-b5e6-96231b3b80d8
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154011 91177308-0d34-0410-b5e6-96231b3b80d8
When folding X == X we need to check getBooleanContents() to determine if the
result is a vector of ones or a vector of negative ones.
I tried creating a test case, but the problem seems to only be exposed on a
much older version of clang (around r144500).
rdar://10923049
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153966 91177308-0d34-0410-b5e6-96231b3b80d8
We know that the blend instructions only use the MSB, so if the mask is
sign-extended then we can convert it into a SHL instruction. This is a
common pattern because the type-legalizer sign-extends the i1 type which
is used by the LLVM-IR for the condition.
Added a new optimization in SimplifyDemandedBits for SIGN_EXTEND_INREG -> SHL.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@148225 91177308-0d34-0410-b5e6-96231b3b80d8
of several newly un-defaulted switches. This also helps optimizers
(including LLVM's) recognize that every case is covered, and we should
assume as much.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147861 91177308-0d34-0410-b5e6-96231b3b80d8
When this field is true it means that the load is from constant (runt-time or compile-time) and so can be hoisted from loops or moved around other memory accesses
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144100 91177308-0d34-0410-b5e6-96231b3b80d8
with a vector condition); such selects become VSELECT codegen nodes.
This patch also removes VSETCC codegen nodes, unifying them with SETCC
nodes (codegen was actually often using SETCC for vector SETCC already).
This ensures that various DAG combiner optimizations kick in for vector
comparisons. Passes dragonegg bootstrap with no testsuite regressions
(nightly testsuite as well as "make check-all"). Patch mostly by
Nadav Rotem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139159 91177308-0d34-0410-b5e6-96231b3b80d8
If we have a chain of zext -> assert_zext -> zext -> use, the first zext would get simplified away because of the later zext, and then the later zext would get simplified away because of the assert. The solution is to teach SimplifyDemandedBits that assert_zext demands all of the high bits of its input, rather than only those demanded by its users. No testcase because the only example I have manifests as llvm-gcc miscompiling LLVM, and I haven't found a smaller case that reproduces this problem.
Fixes <rdar://problem/10063365>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139059 91177308-0d34-0410-b5e6-96231b3b80d8
when determining validity of matching constraint. Allow i1
types access to the GR8 reg class for x86.
Fixes PR10352 and rdar://9777108
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@135180 91177308-0d34-0410-b5e6-96231b3b80d8
We have to do this in DAGBuilder instead of DAGCombiner, because the exact bit is lost after building.
struct foo { char x[24]; };
long bar(struct foo *a, struct foo *b) { return a-b; }
is now compiled into
movl 4(%esp), %eax
subl 8(%esp), %eax
sarl $3, %eax
imull $-1431655765, %eax, %eax
instead of
movl 4(%esp), %eax
subl 8(%esp), %eax
movl $715827883, %ecx
imull %ecx
movl %edx, %eax
shrl $31, %eax
sarl $2, %edx
addl %eax, %edx
movl %edx, %eax
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@134695 91177308-0d34-0410-b5e6-96231b3b80d8
(only happens when using the -promote-elements option).
The correct legalization order is to first try to promote element. Next, we try
to widen vectors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132648 91177308-0d34-0410-b5e6-96231b3b80d8