The problem I saw with the original code was that I forgot that this runs after
type legalization and hence the result type will always be i32. (Custom
legalization of EXTRACT_VECTOR_ELT is only enabled for vector types with
8- and 16-bit elements.)
Regarding the FIXME comment: any information about sign and zero-extension
should be captured by separate extension operations. The DAG combiner should
handle those to produce either VGETLANEu or VGETLANEs, and that seems to be
working now. If there are cases that we're missing, let me know.
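For illustration, a minimal sketch of the kind of DAG combine being described,
assuming the present-day SelectionDAG API and the ARM backend's VGETLANE nodes;
the function name and exact matching conditions here are illustrative, not the
actual ARMISelLowering code:

#include "llvm/CodeGen/SelectionDAG.h"
// ARMISD::VGETLANEs / VGETLANEu come from the ARM backend's
// target-private ARMISelLowering.h.

using namespace llvm;

// Sketch only: fold (sext/zext (extract_vector_elt v, lane)) into a
// lane read that carries the extension kind.
static SDValue combineExtOfExtractElt(SDNode *N, SelectionDAG &DAG) {
  SDValue Extract = N->getOperand(0);
  if (Extract.getOpcode() != ISD::EXTRACT_VECTOR_ELT)
    return SDValue();

  // Custom legalization only fires for 8- and 16-bit elements, so the
  // extract itself is always legalized to produce an i32.
  EVT EltVT = Extract.getOperand(0).getValueType().getVectorElementType();
  if (EltVT != MVT::i8 && EltVT != MVT::i16)
    return SDValue();

  // The explicit extension operation selects the signed or unsigned
  // lane-read variant.
  unsigned Opc = N->getOpcode() == ISD::SIGN_EXTEND ? ARMISD::VGETLANEs
                                                    : ARMISD::VGETLANEu;
  return DAG.getNode(Opc, SDLoc(N), N->getValueType(0),
                     Extract.getOperand(0), Extract.getOperand(1));
}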
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@84218 91177308-0d34-0410-b5e6-96231b3b80d8
1. Emit external function type information for all COFF targets, since it's
a feature of the object format
2. Emit linker directives only for cygming (since this is ld-specific stuff)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@84214 91177308-0d34-0410-b5e6-96231b3b80d8
In the case where there are no good places to put constants and we fall back
upon inserting unconditional branches to make new blocks, allow all constant
pool references in range of those blocks to put constants there, even if that
means resetting the "high water marks" for those references. This will still
terminate because you can't keep splitting blocks forever, and in the bad
cases where we have to split blocks, it is important to avoid splitting more
than necessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@84202 91177308-0d34-0410-b5e6-96231b3b80d8
as expressions, and code for parsing a few ARM-specific directives (still needs
the MCStreamer calls for these). Also some cleanup of the operand parsing code
and some added comments.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@84201 91177308-0d34-0410-b5e6-96231b3b80d8
identifying the malloc as a non-array malloc. This broke GlobalOpt's optimization of stores of mallocs
to global variables.
The fix is to classify mallocs into 3 categories:
1. non-array mallocs
2. array mallocs whose array size can be determined
3. mallocs that cannot be determined to be of type 1 or 2, and so cannot be optimized
getMallocArraySize() returns NULL for category 3, and every user of this
function must skip its malloc optimization when the function returns NULL.
Eventually, isArrayMalloc() and getMallocArraySize() will be taught to recognize
currently unhandled patterns for computing the malloc's size argument, extending
malloc optimizations to those cases.
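A minimal sketch of the expected caller contract, with the extra
getMallocArraySize() parameters (context, target data) elided for brevity;
the helper name tryOptimizeMalloc is hypothetical:

// Sketch only: a null result marks a category-3 malloc, and the
// optimization must be skipped.
static bool tryOptimizeMalloc(CallInst *Malloc) {
  Value *ArraySize = getMallocArraySize(Malloc /*, Context, TD */);
  if (!ArraySize)
    return false; // category 3: size argument not understood; bail out

  // Categories 1 and 2: safe to reason about the allocation in terms
  // of ArraySize elements from here on.
  // ...
  return true;
}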
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@84199 91177308-0d34-0410-b5e6-96231b3b80d8
When ARMConstantIslandPass cannot find any good locations (i.e., "water") to
place constants, it falls back to inserting unconditional branches to make a
place to put them. My recent change exposed a problem in this area. We may
sometimes append to the same block more than one unconditional branch. The
symptoms of this are that the generated assembly has a branch to an undefined
label and running llc with -debug will cause a seg fault.
This happens more easily since my change to prevent CPEs from moving from
lower to higher addresses as the algorithm iterates, but it could have
happened before. The end of the block may be in range for various constant
pool references, but the insertion point for new CPEs is not right at the end
of the block -- it is at the end of the CPEs that have already been placed
at the end of the block. The insertion point could be out of range. When
that happens, the fallback code will always append another unconditional
branch if the end of the block is in range.
The fix is to append an unconditional branch only if the block does not
already end with one. I also removed a check for whether the constant pool
load instruction is at the end of the block, since that is redundant with
checking whether the end of the block is in range.
There is more to be done here, but I think this fixes the immediate problem.
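A minimal sketch of the guard, using the current TargetInstrInfo spelling;
the helper is hypothetical, and the real logic lives inside
ARMConstantIslandPass:

#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/TargetInstrInfo.h"

using namespace llvm;

// Sketch only: appending a second unconditional branch would leave the
// first one targeting a label that is never emitted.
static void appendFallbackBranch(MachineBasicBlock &MBB,
                                 MachineBasicBlock *NewBB,
                                 const TargetInstrInfo *TII) {
  if (!MBB.empty() && MBB.back().isUnconditionalBranch())
    return; // block already ends with an unconditional branch
  TII->insertBranch(MBB, NewBB, /*FBB=*/nullptr, /*Cond=*/{}, DebugLoc());
}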
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@84172 91177308-0d34-0410-b5e6-96231b3b80d8
don't bother doing so every time we go around the main worklist. This speeds up a
release-asserts opt -std-compile-opts on 403.gcc by about 4% (1.5s). It
seems to speed up the most expensive instances of instcombine by ~10%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@84171 91177308-0d34-0410-b5e6-96231b3b80d8
instruction (which disqualifies stores, unreachable, etc) and at least the
first operand is a constant. This filters out a lot of obvious cases that
can't be folded. Also, switch the IRBuilder to a TargetFolder, which tries
harder.
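A minimal sketch of the filter; the helper name is hypothetical:

#include "llvm/IR/Constants.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

// Sketch only: instructions that produce no value (stores, unreachable,
// etc.) can never fold to a constant, and neither can ones whose first
// operand is not already a constant.
static bool mightConstantFold(const Instruction &I) {
  if (I.getType()->isVoidTy())
    return false;
  return I.getNumOperands() != 0 && isa<Constant>(I.getOperand(0));
}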
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@84170 91177308-0d34-0410-b5e6-96231b3b80d8
header is just the entry block to the loop, and it needn't be at
the top of the loop in the code layout.
Remove the code that suppressed loop alignment for outer loops,
so that outer loops are aligned.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@84158 91177308-0d34-0410-b5e6-96231b3b80d8
so get rid of eh.selector.i64 and rename eh.selector.i32 to eh.selector.
Likewise for eh.typeid.for. This aligns us with gcc, which always uses a
32 bit value for the selector on all platforms. My understanding is that
the register allocator used to assert if the selector intrinsic size didn't
match the pointer size, and this was the reason for introducing the two
variants. However, my testing shows that this is no longer the case (I
fixed some bugs in selector lowering yesterday, and some more today in the
fastisel path; these might have caused the original problems).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@84106 91177308-0d34-0410-b5e6-96231b3b80d8
cannot alias the GEP. The GEP pointer aliasing rule states this clearly:
A pointer value formed from a getelementptr instruction is associated with the
addresses associated with the first operand of the getelementptr.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@84079 91177308-0d34-0410-b5e6-96231b3b80d8
(for uses marked kill and defs marked dead) a few instructions in
addition to forwards. Also, increase the maximum number of instructions
to scan, as it appears to help in a fair number of cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@84061 91177308-0d34-0410-b5e6-96231b3b80d8
to remat non-load instructions as loads, and the remat code now uses
the UnmodeledSideEffects flags, MachineMemOperands, and similar things
to decide which instructions are valid for rematerialization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@84060 91177308-0d34-0410-b5e6-96231b3b80d8
Also fixed a couple of coding-style issues that crept in, and added more to
the temporary hacked-up ARMAsmParser::MatchInstruction() method for testing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@84040 91177308-0d34-0410-b5e6-96231b3b80d8
multiple instructions, the expansion is done during selection so there is
no need to do anything special during legalization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@84036 91177308-0d34-0410-b5e6-96231b3b80d8
truncating an SDValue (depending on whether the target
type is bigger or smaller than the value's type); or zero
extending or truncating it. Use it in a few places (this
seems to be a popular operation, but I only modified cases
of it in SelectionDAGBuild). In particular, the eh_selector
lowering was doing this wrong due to a repeated rather than an
inverted test; this change fixes that.
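A minimal sketch of the helper's logic, written against the present-day
SelectionDAG API (the sign-extending variant; the zero-extending one just
swaps in ISD::ZERO_EXTEND):

// Sketch only: the bigger/smaller test picks the operation, so callers
// can no longer repeat (or forget to invert) it by hand.
static SDValue sextOrTrunc(SelectionDAG &DAG, SDValue Op,
                           const SDLoc &DL, EVT VT) {
  if (Op.getValueType() == VT)
    return Op; // already the right type
  unsigned Opc =
      VT.bitsGT(Op.getValueType()) ? ISD::SIGN_EXTEND : ISD::TRUNCATE;
  return DAG.getNode(Opc, DL, VT, Op);
}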
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@84027 91177308-0d34-0410-b5e6-96231b3b80d8
be in a register. The previous use of ARM address mode 2 was completely
arbitrary and inappropriate for Thumb. Radar 7137468.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@84022 91177308-0d34-0410-b5e6-96231b3b80d8
BasicBlocks, so that it doesn't blindly proceed in the presence of
large individual BasicBlocks. This addresses a class of code-size
expansion problems.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83992 91177308-0d34-0410-b5e6-96231b3b80d8
GlobalValue is destroyed. Function destruction still leaks machine code and
can crash on leaked stubs, but this is some progress.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83987 91177308-0d34-0410-b5e6-96231b3b80d8
before its reference is only supported on ARM has not been true for a while.
In fact, until recently, that was only supported for Thumb. Besides that,
CPEs are always a multiple of 4 bytes in size, so inserting a CPE should have
no effect on Thumb alignment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83916 91177308-0d34-0410-b5e6-96231b3b80d8
MultiSource/Benchmarks/MiBench/automotive-susan test. The failure has
since been masked by an unrelated change (just randomly), so I don't have
a testcase for this now. Radar 7291928.
The situation where this happened is that a constant pool entry (CPE) was
placed at a lower address than the load that referenced it. There were in
fact 2 CPEs placed at adjacent addresses and referenced by 2 loads that were
close together in the code. The distance from the loads to the CPEs was
right at the limit of what they could handle, so that only one of the CPEs
could be placed within range. On every iteration, the first CPE was found
to be out of range, causing a new CPE to be inserted. The second CPE had
been in range but the newly inserted entry pushed it too far away. Thus the
second CPE was also replaced by a new entry, which in turn pushed the first
CPE out of range. Etc.
Judging from some comments in the code, the initial implementation of this
pass did not support CPEs placed _before_ their references. In the case
where the CPE is placed at a higher address, the key to making the algorithm
terminate is that new CPEs are only inserted at the end of a group of adjacent
CPEs. This is implemented by removing a basic block from the "WaterList"
once it has been used, and then adding the newly inserted CPE block to the
list so that the next insertion will come after it. This avoids the ping-pong
effect where CPEs are repeatedly moved to the beginning of a group of
adjacent CPEs. This does not work when going backwards, however, because the
entries at the end of an adjacent group of CPEs are closer than the CPEs
earlier in the group.
To make this pass terminate, we need to maintain a property that changes can
only happen in some sort of monotonic fashion. The fix used here is to require
that the CPE for a particular constant pool load can only move to lower
addresses. This is a very simple change to the code and should not cause
any significant degradation in the results.
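A minimal sketch of the rule, with hypothetical names; offsets are byte
addresses of the CPE within the function:

// Sketch only: a CPE may stay put or move to a lower address, never to a
// higher one, so two entries can no longer push each other back and
// forth forever.
static bool isAcceptableCPEMove(unsigned CurOffset, unsigned NewOffset) {
  return NewOffset <= CurOffset;
}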
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83902 91177308-0d34-0410-b5e6-96231b3b80d8
bootstrap of FSF-style PPC, so there is some
reason to believe the original bug (which was
never analyzed) has been fixed, probably by
r82266.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83871 91177308-0d34-0410-b5e6-96231b3b80d8
it to hold the address of an sret return value, for x86-64 ABI purposes.
Also, fix the test that was originally intended to test this to actually
test it, using FileCheck.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83853 91177308-0d34-0410-b5e6-96231b3b80d8
it to visit instructions from the start of the function to the
end of the function in the first pass. This greatly speeds up
some pathological cases (e.g. PR5150).
Try #3, this time with some unneeded debug info stuff removed
which was causing dead pointers to be added to the worklist.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83818 91177308-0d34-0410-b5e6-96231b3b80d8
it to visit instructions from the start of the function to the
end of the function in the first pass. This greatly speeds up
some pathological cases (e.g. PR5150).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83814 91177308-0d34-0410-b5e6-96231b3b80d8
into a shuffle even if it was used by another insertelement. If the
visitation order of instcombine was wrong, this would turn a chain of
insertelements into a chain of shufflevectors, which was quite painful.
Since CollectShuffleElements handles these cases, the code can just
be nuked.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83810 91177308-0d34-0410-b5e6-96231b3b80d8
input to the mul is a zext from bool, just that it is all zeros
other than the low bit. This fixes some phase ordering issues
that would cause us to miss some xforms in mul.ll when the worklist
is visited differently.
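A minimal sketch of the relaxed test, using the present-day
computeKnownBits interface; the helper name is hypothetical:

#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Support/KnownBits.h"

using namespace llvm;

// Sketch only: rather than matching (zext i1 ...) syntactically, require
// that every bit above the lowest one is known zero, which subsumes the
// zext-from-bool case.
static bool isBoolLike(const Value *V, const DataLayout &DL) {
  KnownBits Known = computeKnownBits(V, DL);
  return Known.countMinLeadingZeros() >= Known.getBitWidth() - 1;
}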
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83794 91177308-0d34-0410-b5e6-96231b3b80d8
it to visit instructions from the start of the function to the
end of the function in the first pass. This greatly speeds up
some pathological cases (e.g. PR5150).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83790 91177308-0d34-0410-b5e6-96231b3b80d8
For now the metadata on sunk/hoisted instructions is still wrong, but that'll
be fixed once instructions have debug metadata directly attached.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83786 91177308-0d34-0410-b5e6-96231b3b80d8
done by condprop, but do it in a much more general form. The
basic idea is that we can do a limited form of tail duplication
in the case when we have a branch on a phi. Moving the branch
up into the predecessor block makes instruction selection
much easier and encourages chained jump threadings.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83759 91177308-0d34-0410-b5e6-96231b3b80d8
inserted only once, just use a vector. Don't compute ExitBlocks unless we
need it, change std::sort to array_pod_sort.
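A minimal sketch of the sort change; the element type and function name
are illustrative:

#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/BasicBlock.h"

// Sketch only: for pointers and other POD-like keys, array_pod_sort
// dispatches to qsort instead of instantiating std::sort at every call
// site, which keeps compiled code size down.
static void sortBlocks(llvm::SmallVectorImpl<llvm::BasicBlock *> &Blocks) {
  llvm::array_pod_sort(Blocks.begin(), Blocks.end());
}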
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83747 91177308-0d34-0410-b5e6-96231b3b80d8
from GVN, this also speeds it up, inserts fewer PHI nodes (see the
testcase) and allows it to remove more loads (due to fewer PHI nodes
standing in the way).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83746 91177308-0d34-0410-b5e6-96231b3b80d8
not just at the end. Add a big comment explaining when this could
be useful (which never happens for jump threading).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83741 91177308-0d34-0410-b5e6-96231b3b80d8
DemoteRegToStack. This makes it more efficient (because it isn't
creating a ton of load/stores that are eventually removed by a later
mem2reg), and slightly more effective (because those load/stores
don't get in the way of threading).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83706 91177308-0d34-0410-b5e6-96231b3b80d8
into MachineInstrs. This is mostly just moving the code from
ScheduleDAGSDNodesEmit.cpp into a new class. This decouples MachineInstr
emitting from scheduling.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83699 91177308-0d34-0410-b5e6-96231b3b80d8
is trivially rematerializable and integrate it into
TargetInstrInfo::isTriviallyReMaterializable. This way, all places that
need to know whether an instruction is rematerializable will get the
same answer.
This enables the useful parts of the aggressive-remat option by
default -- using AliasAnalysis to determine whether a memory location
is invariant, and removes the questionable parts -- rematting operations
with virtual register inputs that may not be live everywhere.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83687 91177308-0d34-0410-b5e6-96231b3b80d8
While recording the beginning of a function, use scope info from the first location entry instead of just relying on the first location entry itself.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83684 91177308-0d34-0410-b5e6-96231b3b80d8
mappings, which could cause errors and assert-failures. This patch fixes that,
adds a test, and refactors the global-mapping-removal code into a single place.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83678 91177308-0d34-0410-b5e6-96231b3b80d8
constants used in inlining heuristics (especially
those used in more than one file). No functional change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83675 91177308-0d34-0410-b5e6-96231b3b80d8
lists. Changed ARMAsmParser::MatchRegisterName to return -1 instead of 0 on
errors, so that values 0-15 can be returned as register numbers. Also added the
rest of the ARM register names to the currently hacked-up version to allow more
testing. Some changes to ARMAsmParser::ParseOperand to give different errors
for things not yet supported, and some additions to the hacked-up
ARMAsmParser::MatchInstruction to allow more testing for now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83673 91177308-0d34-0410-b5e6-96231b3b80d8
when one of the bits being tested would end up being the sign bit in the
narrower type, and a signed comparison is being performed, since this would
change the result of the signed comparison. This fixes PR5132.
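A small standalone example of the hazard, with illustrative widths
(narrowing a 16-bit value to 8 bits); bit 7 is the sign bit of the narrow
type, so a signed comparison changes its answer:

#include <cassert>
#include <cstdint>

int main() {
  uint16_t Loaded = 0x0080; // only bit 7 set
  assert(static_cast<int16_t>(Loaded) > 0);       // wide signed view: +128
  assert(static_cast<int8_t>(Loaded & 0xFF) < 0); // narrow view: -128
  return 0;
}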
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83670 91177308-0d34-0410-b5e6-96231b3b80d8
and that will make Caller too big to inline, see if it
might be better to inline Caller into its callers instead.
This situation is described in PR2973, although I haven't
tried the specific case in SPASS.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@83602 91177308-0d34-0410-b5e6-96231b3b80d8