Fix MLA defs to use register class GPRnopc.
Add encoding tests for multiply instructions.
(Alias for MUL/SMLAL/UMLAL added by r199026.)
Patch by Zhaoshi.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199491 91177308-0d34-0410-b5e6-96231b3b80d8
and tweak comments prior to more invasive surgery. Also clean up some
other non-doxygen comments, and run clang-format over the parts that are
going to change dramatically in subsequent commits so that those don't
get cluttered with formatting changes.
No functionality changed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199489 91177308-0d34-0410-b5e6-96231b3b80d8
the verifier after ensuring the CFG is at least usefully formed.
This fixes a number of problems:
1) The PreVerifier was missing the controls the Verifier provides over
*how* an invalid module is handled -- it just aborted the program!
Now it uses the same logic as the Verifier which is significantly
more library-friendly.
2) The DominatorTree used previously could have been cached and not
updated due to bugs in prior passes and we would silently use the
stale tree. This could cause dominance errors to not be as quickly
diagnosed.
3) We can now (in the next patch) pull the functionality of the verifier
apart from the pass infrastructure so that you can verify IR without
having any form of pass manager. This in turn frees the code to share
logic between old and new pass manager variants.
Along the way I fixed at least one annoying bug -- the state for
'Broken' wasn't being cleared from run to run causing all functions
visited after the first broken function to be marked as broken
regardless of whether *they* were a problem. Fortunately, I don't really
know much of a way to observe this peculiarity.
In case folks are worried about the runtime cost, its negligible.
I looked at running the entire regression test suite (which should be
a relatively good use of the verifier) before and after but was unable
to even measure the time spent on the verifier and there was no
regresion from before to after. I checked both with debug builds and
optimized builds.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199487 91177308-0d34-0410-b5e6-96231b3b80d8
This makes things a lot easier, because we can now talk about the
"argument allocation", which allocates all the memory for the call in
one shot.
The only functional change is to the verifier for a feature that hasn't
shipped yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199434 91177308-0d34-0410-b5e6-96231b3b80d8
When registering a pass, a pass can now specify a second construct that takes as
argument a pointer to TargetMachine.
The PassInfo class has been updated to reflect that possibility.
If such a constructor exists opt will use it instead of the default constructor
when instantiating the pass.
Since such IR passes are supposed to be rare, no specific support has been
added to this commit to allow an easy registration of such a pass.
In other words, for such pass, the initialization function has to be
hand-written (see CodeGenPrepare for instance).
Now, codegenprepare can be tested using opt:
opt -codegenprepare -mtriple=mytriple input.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199430 91177308-0d34-0410-b5e6-96231b3b80d8
with raw_ostream's write_escaped() method.
For example darwin's otool(1) program that uses the llvm
disassembler now produces disassembly like this:
leaq 0x7b(%rip), %rdi ## literal pool for: "%f\ntoto\n"
and not print the new lines which messes up the output.
rdar://15145300
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199407 91177308-0d34-0410-b5e6-96231b3b80d8
This is necessary because the classes are shared between all implementations.
No functional change since the InstrItinData's have been duplicated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199394 91177308-0d34-0410-b5e6-96231b3b80d8
IIArith -> II_ADD, II_ADDU, II_AND, II_CL[ZO], II_DADDIU, II_DADDU,
II_DROTR, II_DROTR32, II_DROTRV, II_DSLL, II_DSLL32, II_DSLLV,
II_DSR[AL], II_DSR[AL]32, II_DSR[AL]V, II_DSUBU, II_LUI, II_MOV[ZFNT],
II_NOR, II_OR, II_RDHWR, II_ROTR, II_ROTRV, II_SLL, II_SLLV, II_SR[AL],
II_SR[AL]V, II_SUBU, II_XOR
No functional change since the InstrItinData's have been duplicated.
This is necessary because the classes are shared between all schedulers.
Once this patch series is committed there will be an InstrItinClass for
each mnemonic with minimal grouping. This does increase the size of the
itinerary tables for each MIPS scheduler but we have a few options for dealing
with that later. These options include reducing the number of classes once
we see the best way to simplify them, or by extending tablegen to be able
to compress the table by eliminating duplicates entries, etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199391 91177308-0d34-0410-b5e6-96231b3b80d8
Affects:
DMULT, DMULTu, MADD, MADD_MM, MADDU, MADDU_MM, MSUB, MSUB_MM, MSUBU,
MSUBU_MM, MULT, MULTu
Does not affect MULT_MM, MULTu_MM since they are currently miscategorised
as IIImul.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199381 91177308-0d34-0410-b5e6-96231b3b80d8
There are two attempted optimisations in reMaterializeTrivialDef, trying to
avoid promoting the size of a register too much when rematerializing.
Unfortunately, both appear to be flawed. First, we see if the original register
would have worked, but this is inadequate. Consider:
v1 = SOMETHING (v1 is QQ)
v2:Q0 = COPY v1:Q1 (v1, v2 are QQ)
...
uses of v2
In this case even though v2 *could* be used directly as the output of
SOMETHING, this would set the wrong bits of the QQ register involved. The
correct rematerialization must be:
v2:Q0_Q1 = SOMETHING (v2 promoted to QQQ)
...
uses of v2:Q1_Q2
For the second optimisation, if the correct remat is "v2:idx = SOMETHING" then
we can't necessarily expect v2 itself to be valid for SOMETHING, but we do try
to hunt for a class between v1 and v2 that works. Unfortunately, this is also
wrong:
v1 = SOMETHING (v1 is QQ)
v2:Q0_Q1 = COPY v1 (v1 is QQ, v2 is QQQ)
...
uses of v2 as a QQQ
The canonical rematerialization here is "v2:Q0_Q1 = SOMETHING". However current
logic would decide that v2 could be a QQ (no interest is taken in later uses).
This patch, therefore, always accepts the widened register class without trying
to be clever. Generally there is no penalty to this (e.g. in the common GR32 <
GR64 case, expanding the width doesn't matter because it's not like you were
going to do anything else with the high bits of a GR32 register). It can
increase register pressure in cases like the ARM VFP regs though (multiple
non-overlapping but equivalent subregisters). This situation can be
spotted by the fact that both source and destination in the
not-quite-coalesced pair have a sub-register index and
rematerialisation is skipped in that situation.
Unfortunately, no in-tree targets actually expose this as far as I can tell
(there are so few isAsCheapAsAMove instructions for it to trigger on) so I've
been unable to produce a test. It was exposed in our ARM64 SPEC tests though,
and I will be adding a test there that we should be able to contribute
soon(TM).
rdar://problem/15775279
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199376 91177308-0d34-0410-b5e6-96231b3b80d8
flag from clang, and disable zero-base shadow support on all platforms
where it is not the default behavior.
- It is completely unused, as far as we know.
- It is ABI-incompatible with non-zero-base shadow, which means all
objects in a process must be built with the same setting. Failing to
do so results in a segmentation fault at runtime.
- It introduces a backward dependency of compiler-rt on user code,
which is uncommon and complicates testing.
This is the LLVM part of a larger change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199371 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds the capability to dump export table contents. An example
output is this:
Export Table:
Ordinal RVA Name
5 0x2008 exportfn1
6 0x2010 exportfn2
By adding this feature to llvm-objdump, we will be able to use it to check
export table contents in LLD's tests. Currently we are doing binary
comparison in the tests, which is fragile and not readable to humans.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199358 91177308-0d34-0410-b5e6-96231b3b80d8
Move copying of global initializers below the cloning of functions.
The BlockAddress doesn't have access to the correct basic blocks until the
functions have been cloned. This causes the BlockAddress to point to the old
values. Just wait until the functions have been cloned before copying the
initializers.
PR13163
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199354 91177308-0d34-0410-b5e6-96231b3b80d8
DataRefImpl (a union of two integers and a pointer) is not the ideal data type
to represent a reference to an import directory entity. We should just use the
pointer to the import table and an offset instead to simplify. No functionality
change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199349 91177308-0d34-0410-b5e6-96231b3b80d8
than it needs to be by 1 bit but I need to finish some other things so
that all the boundary cases will work in that situation. constpool.c
in test-suite will fail to assemble under our new internal test-suite sync
without this change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199343 91177308-0d34-0410-b5e6-96231b3b80d8
ARM assembly syntax uses @ for a comment, execpt for the second
parameter of the .symver directive which requires @ as part of the
symbol name. This commit fixes the parsing of this directive by
adding a special case for ARM for this one argumnet.
To make the change we had to move the AllowAtInIdentifier variable
to the MCAsmLexer interface (from AsmLexer) and expose a setter for
the value. The ELFAsmParser then toggles this value when parsing
the second argument to the .symver directive for a target that
uses @ as a comment symbol
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199339 91177308-0d34-0410-b5e6-96231b3b80d8
Add a hook in the C API of LTO so that clients of the code generator can set
their own handler for the LLVM diagnostics.
The handler is defined like this:
typedef void (*lto_diagnostic_handler_t)(lto_codegen_diagnostic_severity_t
severity, const char *diag, void *ctxt)
- severity says how bad this is.
- diag is a string that contains the diagnostic message.
- ctxt is the registered context for this handler.
This hook is more general than the lto_get_error_message, since this function
keeps only the latest message and can only be queried when something went wrong
(no warning for instance).
<rdar://problem/15517596>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199338 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a regression intruced by r199135.
Revision 199135 tried to simplify part of the logic in method
DAGCombiner::SimplifyVBinOp introducing calls to method BuildVectorSDNode::isConstant().
However, that revision wrongly changed the check performed by method
SimplifyVBinOp to identify dag nodes that can be folded.
Before revision 199135, that method only tried to simplify vector binary operations
if both operands were build_vector of Constant/ConstantFP/Undef only.
After revision 199135, method SimplifyVBinop tried to
simplify also vector binary operations with only one constant operand.
This fixes the problem restoring the old behavior of SimplifyVBinOp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199328 91177308-0d34-0410-b5e6-96231b3b80d8
I did write a version returning ErrorOr<OwningPtr<Binary> >, but it is too
cumbersome to use without std::move. I will keep the patch locally and submit
when we switch to c++11.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199326 91177308-0d34-0410-b5e6-96231b3b80d8
MSVC on x64 requires that we create image relative symbol
references to refer to RTTI data. Seeing as how there is no way to
explicitly make reference to a given relocation type in LLVM IR, pattern
match expressions of the form &foo - &__ImageBase.
Differential Revision: http://llvm-reviews.chandlerc.com/D2523
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199312 91177308-0d34-0410-b5e6-96231b3b80d8
There has been an old FIXME to find the right cut-off for when it's worth
analyzing and potentially transforming a switch to a lookup table.
The switches always have two or more cases. I could not measure any speed-up
by transforming a switch with two cases. A switch with three cases gets a nice
speed-up, and I couldn't measure any compile-time regression, so I think this
is the right threshold.
In a Clang self-host, this causes 480 new switches to be transformed,
and reduces the final binary size with 8 KB.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199294 91177308-0d34-0410-b5e6-96231b3b80d8
The GNU as behavior is a bit different and very strange. It will mark any
label that contains an instruction. We can implement that, but using the
type looks more natural since gas will not mark a function if a .word is
used to output the instructions!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199287 91177308-0d34-0410-b5e6-96231b3b80d8
When expanding neon pseudo stores, it may miss the implicit uses of sub
regs, which may cause post RA scheduler reorder instructions that
breakes anti dependency.
For example:
VST1d64QPseudo %R0<kill>, 16, %Q9_Q10, pred:14, pred:%noreg
will be expanded to
VST1d64Q %R0<kill>, 16, %D18, pred:14, pred:%noreg;
An instruction that defines %D20 may be scheduled before the store by
mistake.
This patches adds implicit uses for such case. For the example above, it
emits:
VST1d64Q %R0<kill>, 8, %D18, pred:14, pred:%noreg, %Q9_Q10<imp-use>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199282 91177308-0d34-0410-b5e6-96231b3b80d8
The changes caused by folding an sp-adjustment into a "pop" previously
disrupted the forward search for the final real instruction in a
terminating block. This switches to a backward search (skipping debug
instrs).
This fixes PR18399.
Patch by Zhaoshi.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199266 91177308-0d34-0410-b5e6-96231b3b80d8
We should set them to expand for now since there are no patterns
dealing with them. Actually, there are no instructions either so I
doubt they'll ever be acceptable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199265 91177308-0d34-0410-b5e6-96231b3b80d8
promotion code, Tablegen will now select FPExt for floating point promotions
(previously it had returned AExt, which is not valid for floating point types).
Any out-of-tree targets that were relying on AExt being returned for FP
promotions will need to update their code check for FPExt instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199252 91177308-0d34-0410-b5e6-96231b3b80d8
This also fixes the placement of the function label comment. It was being
placed next to the mips16 directive instead of next to the label.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199245 91177308-0d34-0410-b5e6-96231b3b80d8
Reapply r199191, reverted in r199197 because it carelessly broke
Other/link-opts.ll. The problem was that calling
createInternalizePass("main") would select
createInternalizePass(bool("main")) instead of
createInternalizePass(ArrayRef<const char *>("main")). This commit
fixes the bug.
The original commit message follows.
Add API to LTOCodeGenerator to specify a strategy for the -internalize
pass.
This is a new attempt at Bill's change in r185882, which he reverted in
r188029 due to problems with the gold linker. This puts the onus on the
linker to decide whether (and what) to internalize.
In particular, running internalize before outputting an object file may
change a 'weak' symbol into an internal one, even though that symbol
could be needed by an external object file --- e.g., with arclite.
This patch enables three strategies:
- LTO_INTERNALIZE_FULL: the default (and the old behaviour).
- LTO_INTERNALIZE_NONE: skip -internalize.
- LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden
visibility.
LTO_INTERNALIZE_FULL should be used when linking an executable.
Outputting an object file (e.g., via ld -r) is more complicated, and
depends on whether hidden symbols should be internalized. E.g., for
ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and
LTO_INTERNALIZE_HIDDEN can be used otherwise. However,
LTO_INTERNALIZE_FULL is inappropriate, since the output object file will
eventually need to link with others.
lto_codegen_set_internalize_strategy() sets the strategy for subsequent
calls to lto_codegen_write_merged_modules() and lto_codegen_compile*().
<rdar://problem/14334895>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199244 91177308-0d34-0410-b5e6-96231b3b80d8
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.
Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:
define available_externally dllimport void @f() {}
@Var = dllexport global i32 1, align 4
Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199218 91177308-0d34-0410-b5e6-96231b3b80d8
Sorry, I don't understand why the warning is generated (a gcc
bug?). Anyhow, the change should improve readablity. No functionality
change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199214 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a regression intruced by r198113.
Revision r198113 introduced an algorithm that tries to fold a vector shift
by immediate count into a build_vector if the input vector is a known vector
of constants.
However the algorithm only worked under the assumption that the input vector
type and the shift type are exactly the same.
This patch disables the folding of vector shift by immediate count if the
input vector type and the shift value type are not the same.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199213 91177308-0d34-0410-b5e6-96231b3b80d8
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.
Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:
define available_externally dllimport void @f() {}
@Var = dllexport global i32 1, align 4
Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199204 91177308-0d34-0410-b5e6-96231b3b80d8
fastcall requires @ as global prefix instead of _ but getNameWithPrefix
wrongly assumes the OutName buffer is empty and replaces at index 0.
For imported functions this buffer is pre-filled with "__imp_" resulting
in broken "@_imp_foo@0" mangling.
Instead replace at the proper index. We also never have to prepend the
@-prefix because this fastcall mangling is only used on 32-bit Windows
targets which have _ has global prefix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199203 91177308-0d34-0410-b5e6-96231b3b80d8
This should allow SSE instructions to be encoded correctly in 16-bit mode which r198586 probably broke.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199193 91177308-0d34-0410-b5e6-96231b3b80d8
Add API to LTOCodeGenerator to specify a strategy for the -internalize
pass.
This is a new attempt at Bill's change in r185882, which he reverted in
r188029 due to problems with the gold linker. This puts the onus on the
linker to decide whether (and what) to internalize.
In particular, running internalize before outputting an object file may
change a 'weak' symbol into an internal one, even though that symbol
could be needed by an external object file --- e.g., with arclite.
This patch enables three strategies:
- LTO_INTERNALIZE_FULL: the default (and the old behaviour).
- LTO_INTERNALIZE_NONE: skip -internalize.
- LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden
visibility.
LTO_INTERNALIZE_FULL should be used when linking an executable.
Outputting an object file (e.g., via ld -r) is more complicated, and
depends on whether hidden symbols should be internalized. E.g., for
ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and
LTO_INTERNALIZE_HIDDEN can be used otherwise. However,
LTO_INTERNALIZE_FULL is inappropriate, since the output object file will
eventually need to link with others.
lto_codegen_set_internalize_strategy() sets the strategy for subsequent
calls to lto_codegen_write_merged_modules() and lto_codegen_compile*().
<rdar://problem/14334895>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199191 91177308-0d34-0410-b5e6-96231b3b80d8
When creating a virtual register for a def, the value type should be
used to pick the register class. If we only use the register class
constraint on the instruction, we might pick a too large register class.
Some registers can store values of different sizes. For example, the x86
xmm registers can hold f32, f64, and 128-bit vectors. The three
different value sizes are represented by register classes with identical
register sets: FR32, FR64, and VR128. These register classes have
different spill slot sizes, so it is important to use the right one.
The register class constraint on an instruction doesn't necessarily care
about the size of the value its defining. The value type determines
that.
This fixes a problem where InstrEmitter was picking 32-bit register
classes for 64-bit values on SPARC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199187 91177308-0d34-0410-b5e6-96231b3b80d8
The already allocatable DPair superclass contains odd-even D register
pair in addition to the even-odd pairs in the QPR register class. There
is no reason to constrain the set of D register pairs that can be used
for NEON values. Any NEON instructions that require a Q register will
automatically constrain the register class to QPR.
The allocation order for DPair begins with the QPR registers, so
register allocation is unlikely to change much.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199186 91177308-0d34-0410-b5e6-96231b3b80d8
We need to ensure that StackSlotColoring.cpp does not reuse stack
spill slots in functions that call "returns_twice" functions such as
setjmp(), otherwise this can lead to miscompiled code, because a stack
slot would be clobbered when it's still live.
This was already handled correctly for functions that call setjmp()
(though this wasn't covered by a test), but not for functions that
invoke setjmp().
We fix this by changing callsFunctionThatReturnsTwice() to check for
invoke instructions.
This fixes PR18244.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199180 91177308-0d34-0410-b5e6-96231b3b80d8
This will allow it to be called from target independent parts of the main
streamer that don't know if there is a registered target streamer or not. This
in turn will allow targets to perform extra actions at specified points in the
interface: add extra flags for some labels, extra work during finalization, etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199174 91177308-0d34-0410-b5e6-96231b3b80d8
This commit teaches DAG to reassociate vector ops, which in turn enables
constant folding of vector op chains that appear later on during custom lowering
and DAG combine.
Reviewed by Andrea Di Biagio
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199135 91177308-0d34-0410-b5e6-96231b3b80d8
This is a very confusing option for a feature that will go away.
-enable-misched is exposed instead to help triage issues with the new
scheduler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199133 91177308-0d34-0410-b5e6-96231b3b80d8
The issue is caused when Post-RA scheduler reorders a bundle instruction
(IT block). However, it only flips the CPSR liveness of the bundle instruction,
leaves the instructions inside the bundle unchanged, which causes inconstancy and crashes
Thumb2SizeReduction.cpp::ReduceMBB().
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199127 91177308-0d34-0410-b5e6-96231b3b80d8
APInt only knows how to compare values with the same BitWidth and asserts
in all other cases.
With this fix, function PerformORCombine does not use the APInt equality
operator if the APInt values returned by 'isConstantSplat' differ in BitWidth.
In that case they are different and no comparison is needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199119 91177308-0d34-0410-b5e6-96231b3b80d8
...into (ashr (shl (anyext X), ...), ...), which requires one fewer
instruction. The (anyext X) can sometimes be simplified too.
I didn't do this in DAGCombiner because widening shifts isn't a win
on all targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199114 91177308-0d34-0410-b5e6-96231b3b80d8
Previously we only used GPR for the destination placeholder in "ldr rD, [pc,
incorrect codegen under the integrated assembler.
This should fix both issues (which probably only affect MachO targets at the
moment).
rdar://problem/15800156
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199108 91177308-0d34-0410-b5e6-96231b3b80d8
This finishes the job started in r198756, and creates separate opcodes for
64-bit vs. 32-bit versions of the rest of the RET instructions too.
LRETL/LRETQ are interesting... I can't see any justification for their
existence in the SDM. There should be no 'LRETL' in 64-bit mode, and no
need for a REX.W prefix for LRETQ. But this is what GAS does, and my
Sandybridge CPU and an Opteron 6376 concur when tested as follows:
asm __volatile__("pushq $0x1234\nmovq $0x33,%rax\nsalq $32,%rax\norq $1f,%rax\npushq %rax\nlretl $8\n1:");
asm __volatile__("pushq $1234\npushq $0x33\npushq $1f\nlretq $8\n1:");
asm __volatile__("pushq $0x33\npushq $1f\nlretq\n1:");
asm __volatile__("pushq $0x1234\npushq $0x33\npushq $1f\nlretq $8\n1:");
cf. PR8592 and commit r118903, which added LRETQ. I only added LRETIQ to
match it.
I don't quite understand how the Intel syntax parsing for ret
instructions is working, despite r154468 allegedly fixing it. Aren't the
explicitly sized 'retw', 'retd' and 'retq' supposed to work? I have at
least made the 'lretq' work with (and indeed *require*) the 'q'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199106 91177308-0d34-0410-b5e6-96231b3b80d8
can be used by both the new pass manager and the old.
This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.
The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.
Also in numerous places things became cleaner because I switched from
re-running the pass (!!! mid way through some other passes run!!!) to
directly recomputing the domtree.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199104 91177308-0d34-0410-b5e6-96231b3b80d8
trees into the Support library.
These are all expressed in terms of the generic GraphTraits and CFG,
with no reliance on any concrete IR types. Putting them in support
clarifies that and makes the fact that the static analyzer in Clang uses
them much more sane. When moving the Dominators.h file into the IR
library I claimed that this was the right home for it but not something
I planned to work on. Oops.
So why am I doing this? It happens to be one step toward breaking the
requirement that IR verification can only be performed from inside of
a pass context, which completely blocks the implementation of
verification for the new pass manager infrastructure. Fixing it will
also allow removing the concept of the "preverify" step (WTF???) and
allow the verifier to cleanly flag functions which fail verification in
a way that precludes even computing dominance information. Currently,
that results in a fatal error even when you ask the verifier to not
fatally error. It's awesome like that.
The yak shaving will continue...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199095 91177308-0d34-0410-b5e6-96231b3b80d8
Very sorry, this was a premature patch that I still need to investigate and
finish off (for some reason beyond me at the moment it doesn't actually fix the
issue in all cases).
This reverts commit r199091.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199093 91177308-0d34-0410-b5e6-96231b3b80d8
There are two attempted optimisations in reMaterializeTrivialDef, trying to
avoid promoting the size of a register too much when rematerializing.
Unfortunately, both appear to be flawed. First, we see if the original register
would have worked, but this is inadequate. Consider:
v1 = SOMETHING (v1 is QQ)
v2:Q0 = COPY v1:Q1 (v1, v2 are QQ)
...
uses of v2
In this case even though v2 *could* be used directly as the output of
SOMETHING, this would set the wrong bits of the QQ register involved. The
correct rematerialization must be:
v2:Q0_Q1 = SOMETHING (v2 promoted to QQQ)
...
uses of v2:Q1_Q2
For the second optimisation, if the correct remat is "v2:idx = SOMETHING" then
we can't necessarily expect v2 itself to be valid for SOMETHING, but we do try
to hunt for a class between v1 and v2 that works. Unfortunately, this is also
wrong:
v1 = SOMETHING (v1 is QQ)
v2:Q0_Q1 = COPY v1 (v1 is QQ, v2 is QQQ)
...
uses of v2 as a QQQ
The canonical rematerialization here is "v2:Q0_Q1 = SOMETHING". However current
logic would decide that v2 could be a QQ (no interest is taken in later uses).
This patch, therefore, always accepts the widened register class without trying
to be clever. Generally there is no penalty to this (e.g. in the common GR32 <
GR64 case, expanding the width doesn't matter because it's not like you were
going to do anything else with the high bits of a GR32 register). It can
increase register pressure in cases like the ARM VFP regs though (multiple
non-overlapping but equivalent subregisters). Hopefully this situation is rare
enough that it won't matter.
Unfortunately, no in-tree targets actually expose this as far as I can tell
(there are so few isAsCheapAsAMove instructions for it to trigger on) so I've
been unable to produce a test. It was exposed in our ARM64 SPEC tests though,
and I will be adding a test there that we should be able to contribute
soon(TM).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199091 91177308-0d34-0410-b5e6-96231b3b80d8
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.
Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that evn Clang's CFGBlock's can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.
But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199082 91177308-0d34-0410-b5e6-96231b3b80d8
This moves the old pass creation functionality to its own header and
updates the callers of that routine. Then it adds a new PM supporting
bitcode writer to the header file, and wires that up in the opt tool.
A test is added that round-trips code into bitcode and back out using
the new pass manager.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199078 91177308-0d34-0410-b5e6-96231b3b80d8
This patch covered 2 more scenarios:
1. Two operands of shuffle_vector are the same, like
%shuffle.i = shufflevector <8 x i8> %a, <8 x i8> %a, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
2. One of operands is undef, like
%shuffle.i = shufflevector <8 x i8> %a, <8 x i8> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
After this patch, perm instructions will have chance to be emitted instead of lots of INS.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199069 91177308-0d34-0410-b5e6-96231b3b80d8
The target specific parser should return `false' if the target AsmParser handles
the directive, and `true' if the generic parser should handle the directive.
Many of the target specific directive handlers would `return Error' which does
not follow these semantics. This change simply changes the target specific
routines to conform to the semantis of the ParseDirective correctly.
Conformance to the semantics improves diagnostics emitted for the invalid
directives. X86 is taken as a sample to ensure that multiple diagnostics are
not presented for a single error.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199068 91177308-0d34-0410-b5e6-96231b3b80d8
Targets like SPARC and MIPS have delay slots and normally bundle the
delay slot instruction with the corresponding terminator.
Teach isBlockOnlyReachableByFallthrough to find any MBB operands on
bundled terminators so SPARC doesn't need to specialize this function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199061 91177308-0d34-0410-b5e6-96231b3b80d8
This implements the legacy passes in terms of the new ones. It adds
basic testing using explicit runs of the passes. Next up will be wiring
the basic output mechanism of opt up when the new pass manager is
engaged unless bitcode writing is requested.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199049 91177308-0d34-0410-b5e6-96231b3b80d8
API is exposed.
This removes the support for deleting the ostream, switches the member
and constructor order arround to be consistent with the creation
routines, and switches to using references.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199047 91177308-0d34-0410-b5e6-96231b3b80d8
Nothing was using the ability of the pass to delete the raw_ostream it
printed to, and nothing was trying to pass it a pointer to the
raw_ostream. Also, the function variant had a different order of
arguments from all of the others which was just really confusing. Now
the interface accepts a reference, doesn't offer to delete it, and uses
a consistent order. The implementation of the printing passes haven't
been updated with this simplification, this is just the API switch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199044 91177308-0d34-0410-b5e6-96231b3b80d8
name to match the source file which I got earlier. Update the include
sites. Also modernize the comments in the header to use the more
recommended doxygen style.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199041 91177308-0d34-0410-b5e6-96231b3b80d8
An improper qualifier would result in a superfluous error due to the parser not
consuming the remainder of the statement. Simply consume the remainder of the
statement to avoid the error.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199035 91177308-0d34-0410-b5e6-96231b3b80d8
The implicit immediate 0 forms are assembly aliases, not distinct instruction
encodings. Fix the initial implementation introduced in r198914 to an alias to
avoid two separate instruction definitions for the same encoding.
An InstAlias is insufficient in this case as the necessary due to the need to
add a new additional operand for the implicit zero. By using the AsmPsuedoInst,
fall back to the C++ code to transform the instruction to the equivalent
_POST_IMM form, inserting the additional implicit immediate 0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199032 91177308-0d34-0410-b5e6-96231b3b80d8
This is different from the argument passing convention which puts the
first float argument in %f1.
With this patch, all returned floats are treated as if the 'inreg' flag
were set. This means multiple float return values get packed in %f0,
%f1, %f2, ...
Note that when returning a struct in registers, clang will set the
'inreg' flag on the return value, so that behavior is unchanged. This
also happens when returning a float _Complex.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199028 91177308-0d34-0410-b5e6-96231b3b80d8
case when the lookup table doesn't have any holes.
This means we can build a lookup table for switches like this:
switch (x) {
case 0: return 1;
case 1: return 2;
case 2: return 3;
case 3: return 4;
default: exit(1);
}
The default case doesn't yield a constant result here, but that doesn't matter,
since a default result is only necessary for filling holes in the lookup table,
and this table doesn't have any holes.
This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB
off the resulting clang binary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199025 91177308-0d34-0410-b5e6-96231b3b80d8
A 32-bit immediate value can be formed from a constant expression and loaded
into a register. Add support to emit this into an object file. Because this
value is a constant, a relocation must *not* be produced for it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199023 91177308-0d34-0410-b5e6-96231b3b80d8
mode that can be used to debug the execution of everything.
No support for analyses here, that will come later. This already helps
show parts of the opt commandline integration that isn't working. Tests
of that will start using it as the bugs are fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199004 91177308-0d34-0410-b5e6-96231b3b80d8
Use separate callee-save masks for XMM and YMM registers for anyregcc on X86 and
select the proper mask depending on the target cpu we compile for.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198985 91177308-0d34-0410-b5e6-96231b3b80d8