checks enabled:
1) Use '<' to compare integers in a comparison function rather than '<='.
2) Use the uniqued set DefBlocks rather than Info.DefiningBlocks to initialize
the priority queue.
The speedup of scalarrepl on test-suite + SPEC2000 + SPEC2006 is a bit less, at
just under 16% rather than 17%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123662 91177308-0d34-0410-b5e6-96231b3b80d8
eliminating a potentially quadratic data structure, this also gives a 17%
speedup when running -scalarrepl on test-suite + SPEC2000 + SPEC2006. My initial
experiment gave a greater speedup around 25%, but I moved the dominator tree
level computation from dominator tree construction to PromoteMemToReg.
Since this approach to computing IDFs has a much lower overhead than the old
code using precomputed DFs, it is worth looking at using this new code for the
second scalarrepl pass as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123609 91177308-0d34-0410-b5e6-96231b3b80d8
then don't try to decimate it into its individual pieces. This will just make a mess of the
IR and is pointless if none of the elements are individually accessed. This was generating
really terrible code for std::bitset (PR8980) because it happens to be lowered by clang
as an {[8 x i8]} structure instead of {i64}.
The testcase now is optimized to:
define i64 @test2(i64 %X) {
br label %L2
L2: ; preds = %0
ret i64 %X
}
before we generated:
define i64 @test2(i64 %X) {
%sroa.store.elt = lshr i64 %X, 56
%1 = trunc i64 %sroa.store.elt to i8
%sroa.store.elt8 = lshr i64 %X, 48
%2 = trunc i64 %sroa.store.elt8 to i8
%sroa.store.elt9 = lshr i64 %X, 40
%3 = trunc i64 %sroa.store.elt9 to i8
%sroa.store.elt10 = lshr i64 %X, 32
%4 = trunc i64 %sroa.store.elt10 to i8
%sroa.store.elt11 = lshr i64 %X, 24
%5 = trunc i64 %sroa.store.elt11 to i8
%sroa.store.elt12 = lshr i64 %X, 16
%6 = trunc i64 %sroa.store.elt12 to i8
%sroa.store.elt13 = lshr i64 %X, 8
%7 = trunc i64 %sroa.store.elt13 to i8
%8 = trunc i64 %X to i8
br label %L2
L2: ; preds = %0
%9 = zext i8 %1 to i64
%10 = shl i64 %9, 56
%11 = zext i8 %2 to i64
%12 = shl i64 %11, 48
%13 = or i64 %12, %10
%14 = zext i8 %3 to i64
%15 = shl i64 %14, 40
%16 = or i64 %15, %13
%17 = zext i8 %4 to i64
%18 = shl i64 %17, 32
%19 = or i64 %18, %16
%20 = zext i8 %5 to i64
%21 = shl i64 %20, 24
%22 = or i64 %21, %19
%23 = zext i8 %6 to i64
%24 = shl i64 %23, 16
%25 = or i64 %24, %22
%26 = zext i8 %7 to i64
%27 = shl i64 %26, 8
%28 = or i64 %27, %25
%29 = zext i8 %8 to i64
%30 = or i64 %29, %28
ret i64 %30
}
In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough
PHIs are in play that instcombine backs off. It's better to not generate this stuff
in the first place.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123571 91177308-0d34-0410-b5e6-96231b3b80d8
multiple uses. In some cases, all the uses are the same operation,
so instcombine can go ahead and promote the phi. In the testcase
this pushes an add out of the loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123568 91177308-0d34-0410-b5e6-96231b3b80d8
instead of DomTree/DomFrontier. This may be interesting for reducing compile
time. This is currently disabled, but seems to work just fine.
When this is enabled, we eliminate two runs of dominator frontier, one in the
"early per-function" optimizations and one in the "interlaced with inliner"
function passes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123434 91177308-0d34-0410-b5e6-96231b3b80d8
This is a minor extension of SROA to handle a special case that is
important for some ARM NEON operations. Some of the NEON intrinsics
return multiple values, which are handled as struct types containing
multiple elements of the same vector type. The corresponding return
types declared in the arm_neon.h header have equivalent arrays. We
need SROA to recognize that it can split up those arrays and structs
into separate vectors, even though they are not always accessed with
the same type. SROA already handles loads and stores of an entire
alloca by using insertvalue/extractvalue to access the individual
pieces, and that code works the same regardless of whether the type
is a struct or an array. So, all that needs to be done is to check
for compatible arrays and homogeneous structs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123381 91177308-0d34-0410-b5e6-96231b3b80d8
SROA only split up structs and arrays one level at a time, so padding can
only cause trouble if it is located in between the struct or array elements.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123380 91177308-0d34-0410-b5e6-96231b3b80d8
if it is passed as a byval argument. The byval argument will just be a
read, so it is safe to read from the original global instead. This allows
us to promote away the %agg.tmp alloca in PR8582
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@119686 91177308-0d34-0410-b5e6-96231b3b80d8
must be called in the pass's constructor. This function uses static dependency declarations to recursively initialize
the pass's dependencies.
Clients that only create passes through the createFooPass() APIs will require no changes. Clients that want to use the
CommandLine options for passes will need to manually call the appropriate initialization functions in PassInitialization.h
before parsing commandline arguments.
I have tested this with all standard configurations of clang and llvm-gcc on Darwin. It is possible that there are problems
with the static dependencies that will only be visible with non-standard options. If you encounter any crash in pass
registration/creation, please send the testcase to me directly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@116820 91177308-0d34-0410-b5e6-96231b3b80d8
perform initialization without static constructors AND without explicit initialization
by the client. For the moment, passes are required to initialize both their
(potential) dependencies and any passes they preserve. I hope to be able to relax
the latter requirement in the future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@116334 91177308-0d34-0410-b5e6-96231b3b80d8
The x86_mmx type is used for MMX intrinsics, parameters and
return values where these use MMX registers, and is also
supported in load, store, and bitcast.
Only the above operations generate MMX instructions, and optimizations
do not operate on or produce MMX intrinsics.
MMX-sized vectors <2 x i32> etc. are lowered to XMM or split into
smaller pieces. Optimizations may occur on these forms and the
result casted back to x86_mmx, provided the result feeds into a
previous existing x86_mmx operation.
The point of all this is prevent optimizations from introducing
MMX operations, which is unsafe due to the EMMS problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115243 91177308-0d34-0410-b5e6-96231b3b80d8
on llvmdev: SRoA is introducing MMX datatypes like <1 x i64>,
which then cause random problems because the X86 backend is
producing mmx stuff without inserting proper emms calls.
In the short term, force off MMX datatypes. In the long term,
the X86 backend should not select generic vector types to MMX
registers. This is being worked on, but won't be done in time
for 2.8. rdar://8380055
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112696 91177308-0d34-0410-b5e6-96231b3b80d8
large integers, the first inserted value would always create
an 'or X, 0'. Even though this is trivially zapped by
instcombine, don't bother creating this pointless instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@106979 91177308-0d34-0410-b5e6-96231b3b80d8
Probably the best way to know that all getOperand() calls have been handled
is to replace that API instead of updating.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@101579 91177308-0d34-0410-b5e6-96231b3b80d8
with a fix for self-hosting
rotate CallInst operands, i.e. move callee to the back
of the operand array
the motivation for this patch are laid out in my mail to llvm-commits:
more efficient access to operands and callee, faster callgraph-construction,
smaller compiler binary
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@101465 91177308-0d34-0410-b5e6-96231b3b80d8