It also allows nested { } expressions, as now that they are sized, we can merge pull bits from the nested value.
In the current behaviour, everything in { } must have been convertible to a single bit.
However, now that binary literals are sized, its useful to be able to initialize a range of bits.
So, for example, its now possible to do
bits<8> x = { 0, 1, { 0b1001 }, 0, 0b0 }
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215086 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of these becoming an integer literal internally, they now become bits<n> values.
Prior to this change, 0b001 was 1 bit long. This is confusing as clearly the user gave 3 bits.
This new type holds both the literal value and the size, and so can ensure sizes match on initializers.
For example, this used to be legal
bits<1> x = 0b00;
but now it must be written as
bits<2> x = 0b00;
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215084 91177308-0d34-0410-b5e6-96231b3b80d8
Prior to this change, it was legal to do something like
bits<2> opc = { 0, 1 };
bits<2> opc2 = { 1, 0 };
bits<2> a = { opc, opc2 };
This involved silently dropping bits from opc and opc2 which is very hard to debug.
Now the above test would be an error. Having tested with an assert, none of LLVM/clang was relying on this behaviour.
Thanks to Adam Nemet for the above test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215083 91177308-0d34-0410-b5e6-96231b3b80d8
The commit after this changes { } and 0bxx literals to be of type bits<n> and not int. This means we need to write exactly the right number of bits, and not rely on the values being silently zero extended for us.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215082 91177308-0d34-0410-b5e6-96231b3b80d8
This changes Win64EHEmitter into a utility WinEH UnwindEmitter that can be
shared across multiple architectures and a target specific bit which is
overridden (Win64::UnwindEmitter). This enables sharing the section selection
code across X86 and the intended use in ARM for emitting unwind information for
Windows on ARM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215050 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes PR18916. I don't think we need to implement support for either
hybrid syntax. Nobody should write Intel assembly with '%' prefixes on
their registers or AT&T assembly without them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215031 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r214761.
Revert while Reid investigates & provides a reproduction for an
assertion failure for this on Windows.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214999 91177308-0d34-0410-b5e6-96231b3b80d8
to get the subtarget and that's accessible from the MachineFunction
now. This helps clear the way for smaller changes where we getting
a subtarget will require passing in a MachineFunction/Function as
well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214988 91177308-0d34-0410-b5e6-96231b3b80d8
In r210492 the logic of calculateDbgValueHistory was changed to end
register variable live ranges at the end of MBB conditionally on
the fact that the register was or not clobbered by the function body.
This requires an initial scan of all the operands of the function
to collect all clobbered registers. In a second pass over all
instructions, we compare this set with the set of clobbered
registers for the current MachineInstruction. This modification
incurred a compilation time regression on some benchmarks: the
debug info emission phase takes ~10% more time.
While a small performance hit is unavoidable due to the initial
scan requirement, we can improve the situation by avoiding to
create too many temporary sets and just use lambdas to work directly
on the result of the initial scan.
Fixes <rdar://problem/17884104>
Patch by Frederic Riss!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214987 91177308-0d34-0410-b5e6-96231b3b80d8
The handling of the epilogue is best expressed as an early exit and
there is no reason to look for register defs in DbgValue MIs.
Patch by Frederic Riss!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214986 91177308-0d34-0410-b5e6-96231b3b80d8
Otherwise we can end up with an argument frame size that is not a
multiple of stack slot size, which is very awkward.
This fixes PR20547, which was a bug in x86_64 Sys V vararg handling.
However, it's much easier to test this with x86 callee-cleanup
functions, which previously ended in "retl $6" instead of "retl $8".
This does affect behavior of all backends, but it presumably fixes the
same bug in all of them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214980 91177308-0d34-0410-b5e6-96231b3b80d8
For triple aarch64-linux-gnu we were incorrectly setting IRIX.
For triple aarch64 we are correctly setting SYSV.
Patch by Ana Pazos <apazos@codeaurora.org>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214974 91177308-0d34-0410-b5e6-96231b3b80d8
This patch addresses 2 FIXME comments that I added to CriticalAntiDepBreaker while fixing PR20020.
Initialize an MCSubRegIterator and an MCRegAliasIterator to include the self reg.
Assuming that works as advertised, there should be functional difference with this patch, just less code.
Also, remove the associated asserts - we're setting those values just before, so the asserts don't do anything meaningful.
Differential Revision: http://reviews.llvm.org/D4566
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214973 91177308-0d34-0410-b5e6-96231b3b80d8
This swaps the order of the loop vectorizer and the SLP/BB vectorizers. It is disabled by default so we can do performance testing - ideally we want to change to having the loop vectorizer running first, and the SLP vectorizer using its leftovers instead of the other way around.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214963 91177308-0d34-0410-b5e6-96231b3b80d8
Particularly on MachO, we were generating "blx _dest" instructions on M-class
CPUs, which don't actually exist. They happen to get fixed up by the linker
into valid "bl _dest" instructions (which is why such a massive issue has
remained largely undetected), but we shouldn't rely on that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214959 91177308-0d34-0410-b5e6-96231b3b80d8
Specifically Cortex-A57. This probably applies to Cyclone too but I haven't enabled it for that as I can't test it.
This gives ~4% improvement on SPEC 174.vpr, and ~1% in 471.omnetpp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214957 91177308-0d34-0410-b5e6-96231b3b80d8
test case to actually generate correct code.
The primary miscompile fixed here is that we weren't correctly handling
in-place elements in one half of a single-input v8i16 shuffle when
moving a dword of elements from that half to the other half. Some times,
we would clobber the in-place elements in forming the dword to move
across halves.
The fix to this involves forcibly marking the in-place inputs even when
there is no need to gather them into a dword, and to much more carefully
re-arrange the elements when grouping them into a dword to move across
halves. With these two changes we would generate correct shuffles for
the test case, but found another miscompile. There are also some random
perturbations of the generated shuffle pattern in SSE2. It looks like
a wash; more instructions in some cases fewer in others.
The second miscompile would corrupt the results into nonsense. This is
a buggy pattern in one of the added DAG combines. Mapping elements
through a PSHUFD when pairing redundant half-shuffles is *much* harder
than this code makes it out to be -- it requires reasoning about *all*
of where the input is used in the PSHUFD, not just one part of where it
is used. Plus, we can't combine a half shuffle *into* a PSHUFD but the
code didn't guard against it. I think this was just a bad idea and I've
just removed that aspect of the combine. No tests regress as
a consequence so seems OK.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214954 91177308-0d34-0410-b5e6-96231b3b80d8
not corrupting the mask by mutating it more times than intended. No
functionality changed (the results were non-overlapping so the old
version "worked" but was non-obvious).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214953 91177308-0d34-0410-b5e6-96231b3b80d8
already have a large number of blocks. Works around a performance issue with
the greedy register allocator.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214944 91177308-0d34-0410-b5e6-96231b3b80d8
This partially fixes weird looking load scheduling
in memcpy test. The load clustering doesn't seem
particularly smart, but this method seems to be partially
deprecated so it might not be worth trying to fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214943 91177308-0d34-0410-b5e6-96231b3b80d8
This currently has a noticable effect on the kernel argument loads.
LDS and global loads are more problematic, I think because of how copies
are currently inserted to ensure that the address is a VGPR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214942 91177308-0d34-0410-b5e6-96231b3b80d8
This was coming in weird debug info that had variables (and hence
debug_locs) but was in GMLT mode (because it was missing the 13th field
of the compile_unit metadata) so no ranges were constructed. We should
always have at least one range for any CU with a debug_loc in it -
because the range should cover the debug_loc.
The assertion just ensures that the "!= 1" range case inside the
subsequent loop doesn't get entered for the case where there are no
ranges at all, which should never reach here in the first place.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214939 91177308-0d34-0410-b5e6-96231b3b80d8
This simplifies construction and usage while making the data structure
smaller. It was a holdover from the days when we didn't have a separate
DebugLocList and all we had was a flat list of DebugLocEntries.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214933 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts r214893, re-applying r214881 with the test case relaxed a bit to
satiate the build bots.
POP on armv4t cannot be used to change thumb state (unilke later non-m-class
architectures), therefore we need a different return sequence that uses 'bx'
instead:
POP {r3}
ADD sp, #offset
BX r3
This patch also fixes an issue where the return value in r3 would get clobbered
for functions that return 128 bits of data. In that case, we generate this
sequence instead:
MOV ip, r3
POP {r3}
ADD sp, #offset
MOV lr, r3
MOV r3, ip
BX lr
http://reviews.llvm.org/D4748
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214928 91177308-0d34-0410-b5e6-96231b3b80d8
Commits r213915 and r214718 fix recognition of shuffle masks for vmrg*
and vpku*um instructions for a little-endian target, by swapping the
input arguments. The vsldoi instruction requires similar treatment,
and also needs its shift count adjusted for little endian.
Reviewed by Ulrich Weigand.
This is a bug fix candidate for release 3.5 (and hopefully the last of
those for PowerPC).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214923 91177308-0d34-0410-b5e6-96231b3b80d8
This is mostly a cleanup, but it changes a fairly old behavior.
Every "real" LTO user was already disabling the silly internalize pass
and creating the internalize pass itself. The difference with this
patch is for "opt -std-link-opts" and the C api.
Now to get a usable behavior out of opt one doesn't need the funny
looking command line:
opt -internalize -disable-internalize -internalize-public-api-list=foo,bar -std-link-opts
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214919 91177308-0d34-0410-b5e6-96231b3b80d8
a test case.
We also miscompile this test case which is showing a serious flaw in the
single-input v8i16 shuffle code. I've left the specific instruction
checks FIXME-ed out until I can address the bug in the single-input
code, but I wanted to separate out a significant functionality change to
produce correct code from a very simple and targeted crasher fix.
The miscompile problem stems from keeping track of inputs by value
rather than by index. As a consequence of doing this, we can't reliably
update those inputs because they might swap and we can't detect this
without copying the mask.
The blend code now uses indices for the input lists and this seems
strictly better. It also should make it easier to sort things and do
other cleanups. I think the time has come to simplify The Great Lambda
here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214914 91177308-0d34-0410-b5e6-96231b3b80d8
This allows accessing an SReg subregister with a normal subregister
index, instead of getting a machine verifier error.
Also be sure to include all of these subregisters in SReg_32.
This fixes inferring SGPR instead of SReg when finding a
super register class.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214901 91177308-0d34-0410-b5e6-96231b3b80d8
`BasicBlockFwdRefs` (and `BlockAddrFwdRefs` before it) was being emptied
in a non-deterministic order. When predicting use-list order I've
worked around this another way, but even when parsing lazily (and we
can't recreate use-list order) use-lists should be deterministic.
Make them so by using a side-queue of functions with forward-referenced
blocks that gets visited in order.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214899 91177308-0d34-0410-b5e6-96231b3b80d8
Optimize the following IR:
%1 = tail call noalias i8* @calloc(i64 1, i64 4)
%2 = bitcast i8* %1 to i32*
; This store is dead and should be removed
store i32 0, i32* %2, align 4
Memory returned by calloc is guaranteed to be zero initialized. If the value being stored is the constant zero (and the store is not otherwise observable across threads), we can delete the store. If the store is to an out of bounds address, it is undefined and thus also removable.
Reviewed By: nicholas
Differential Revision: http://reviews.llvm.org/D3942
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214897 91177308-0d34-0410-b5e6-96231b3b80d8