Instead of awkwardly encoding calling-convention information with ISD::CALL,
ISD::FORMAL_ARGUMENTS, ISD::RET, and ISD::ARG_FLAGS nodes, TargetLowering
provides three virtual functions for targets to override:
LowerFormalArguments, LowerCall, and LowerRet, which replace the custom
lowering done on the special nodes. They provide the same information, but
in a more immediately usable format.
This also reworks much of the target-independent tail call logic. The
decision of whether or not to perform a tail call is now cleanly split
between target-independent portions, and the target dependent portion
in IsEligibleForTailCallOptimization.
This also synchronizes all in-tree targets, to help enable future
refactoring and feature work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@78142 91177308-0d34-0410-b5e6-96231b3b80d8
When LowerExtract eliminates an EXTRACT_SUBREG with a kill flag, it moves the
kill flag to the place where the sub-register is killed. This can accidentally
overlap with the use of a sibling sub-register, and we have trouble.
In the test case we have this code:
Live Ins: %R0 %R1 %R2
%R2L<def> = EXTRACT_SUBREG %R2<kill>, 1
%R2H<def> = LOAD16fi <fi#-1>, 0, Mem:LD(2,4) [FixedStack-1 + 0]
%R1L<def> = EXTRACT_SUBREG %R1<kill>, 1
%R0L<def> = EXTRACT_SUBREG %R0<kill>, 1
%R0H<def> = ADD16 %R2H<kill>, %R2L<kill>, %AZ<imp-def>, %AN<imp-def>, %AC0<imp-def>, %V<imp-def>, %VS<imp-def>
subreg: CONVERTING: %R2L<def> = EXTRACT_SUBREG %R2<kill>, 1
subreg: eliminated!
subreg: killed here: %R0H<def> = ADD16 %R2H, %R2L, %R2<imp-use,kill>, %AZ<imp-def>, %AN<imp-def>, %AC0<imp-def>, %V<imp-def>, %VS<imp-def>
The kill flag on %R2 is moved to the last instruction, and the live range overlaps with the definition of %R2H:
*** Bad machine code: Redefining a live physical register ***
- function: f
- basic block: 0x18358c0 (#0)
- instruction: %R2H<def> = LOAD16fi <fi#-1>, 0, Mem:LD(2,4) [FixedStack-1 + 0]
Register R2H was defined but already live.
The fix is to replace EXTRACT_SUBREG with IMPLICIT_DEF instead of eliminating
it completely:
subreg: CONVERTING: %R2L<def> = EXTRACT_SUBREG %R2<kill>, 1
subreg: replace by: %R2L<def> = IMPLICIT_DEF %R2<kill>
Note that these IMPLICIT_DEF instructions survive to the asm output. It is
necessary to fix the stack-color-with-reg test case because of that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@78093 91177308-0d34-0410-b5e6-96231b3b80d8
killed by another operand.
There is probably a better fix. Either 1) scavenger can look at other operands, or
2) livevariables can be smarter about kill markers. Patches welcome.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@78072 91177308-0d34-0410-b5e6-96231b3b80d8
When LowerSubregsInstructionPass::LowerInsert eliminates an INSERT_SUBREG
instriction because it is an identity copy, make sure that the same registers
are alive before and after the elimination.
When the super-register is marked <undef> this requires inserting an
IMPLICIT_DEF instruction to make sure the super register is live.
Fix a related bug where a kill flag on the inserted sub-register was not transferred properly.
Finally, clear the undef flag in MachineInstr::addRegisterKilled. Undef implies dead and kill implies live, so they cant both be valid.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77989 91177308-0d34-0410-b5e6-96231b3b80d8
This is not just a matter of passing in the target triple from the module;
currently backends are making decisions based on the build and host
architecture. The goal is to migrate to making these decisions based off of the
triple (in conjunction with the feature string). Thus most clients pass in the
target triple, or the host triple if that is empty.
This has one important change in the way behavior of the JIT and llc.
For the JIT, it was previously selecting the Target based on the host
(naturally), but it was setting the target machine features based on the triple
from the module. Now it is setting the target machine features based on the
triple of the host.
For LLC, -march was previously only used to select the target, the target
machine features were initialized from the module's triple (which may have been
empty). Now the target triple is taken from the module, or the host's triple is
used if that is empty. Then the triple is adjusted to match -march.
The take away is that -march for llc is now used in conjunction with the host
triple to initialize the subtarget. If users want more deterministic behavior
from llc, they should use -mtriple, or set the triple in the input module.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77946 91177308-0d34-0410-b5e6-96231b3b80d8
__builtin_bfin_ones does the same as ctpop, so it can be implemented in the front-end.
__builtin_bfin_loadbytes loads from an unaligned pointer with the disalignexcpt instruction. It does the same as loading from a pointer with the low bits masked. It is better if the front-end creates a masked load. We can always instruction select the masked to disalignexcpt+load.
We keep csync/ssync/idle. These intrinsics represent instructions that need workarounds for some silicon revisions. We may even want to convert inline assembler to intrinsics to enable the workarounds.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77917 91177308-0d34-0410-b5e6-96231b3b80d8
Allow imp-def and imp-use of anything in the scavenger asserts, just like the machine code verifier.
Allow redefinition of a sub-register of a live register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77904 91177308-0d34-0410-b5e6-96231b3b80d8
to:
.quad X
even on a 32-bit system, where X is not 64-bits. There isn't much that
we can do here, so we just print:
.quad ((X) & 4294967295)
instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77818 91177308-0d34-0410-b5e6-96231b3b80d8
myself because I'm getting tired of seeing the red buildbots, which have
been red since 5:30PM PDT last night.
Proposed supplement to developer policy: committers should make sure to
be around to watch for buildbot failures after committing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77785 91177308-0d34-0410-b5e6-96231b3b80d8
instructions for calls since BL and BLX are always 32-bit long and BX is always
16-bit long.
Also, we should be using BLX to call external function stubs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77756 91177308-0d34-0410-b5e6-96231b3b80d8
padding is disabled, tabs get replaced by spaces except in the case of
the first operand, where the tab is output to line up the operands after
the mnemonics.
Add some better comments and eliminate redundant code.
Fix some testcases to not assume tabs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77740 91177308-0d34-0410-b5e6-96231b3b80d8
to ensure the instruction that follows a TBB (when the number of table entries
is odd) is 2-byte aligned.
Patch by Sandeep Patel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77705 91177308-0d34-0410-b5e6-96231b3b80d8
into the mergable section if it is one of our special cases. This could
obviously be improved, but this is the minimal fix and restores us to the
previous behavior.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77679 91177308-0d34-0410-b5e6-96231b3b80d8
When the return value is not used (i.e. only care about the value in the memory), x86 does not have to use add to implement these. Instead, it can use add, sub, inc, dec instructions with the "lock" prefix.
This is currently implemented using a bit of instruction selection trick. The issue is the target independent pattern produces one output and a chain and we want to map it into one that just output a chain. The current trick is to select it into a merge_values with the first definition being an implicit_def. The proper solution is to add new ISD opcodes for the no-output variant. DAG combiner can then transform the node before it gets to target node selection.
Problem #2 is we are adding a whole bunch of x86 atomic instructions when in fact these instructions are identical to the non-lock versions. We need a way to add target specific information to target nodes and have this information carried over to machine instructions. Asm printer (or JIT) can use this information to add the "lock" prefix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77582 91177308-0d34-0410-b5e6-96231b3b80d8
due to x86 encoding restrictions. This is currently off by default
because it may cause code quality regressions. This is for PR4572.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77565 91177308-0d34-0410-b5e6-96231b3b80d8
wide vectors. Likewise, change VSTn intrinsics to take separate arguments
for each vector in a multi-vector struct. Adjust tests accordingly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77468 91177308-0d34-0410-b5e6-96231b3b80d8
- This change also makes it possible to switch between ARM / Thumb on a
per-function basis.
- Fixed thumb2 routine which expand reg + arbitrary immediate. It was using
using ARM so_imm logic.
- Use movw and movt to do reg + imm when profitable.
- Other code clean ups and minor optimizations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77300 91177308-0d34-0410-b5e6-96231b3b80d8
for now. Make the section switching directives more consistent
by not including \n and including \t for them all.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77107 91177308-0d34-0410-b5e6-96231b3b80d8
and make it more aggressive, we now put:
const int G2 __attribute__((weak)) = 42;
into the text (readonly) segment like gcc, previously we put
it into the data (readwrite) segment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77104 91177308-0d34-0410-b5e6-96231b3b80d8
Before:
adr r12, #LJTI3_0_0
ldr pc, [r12, +r0, lsl #2]
LJTI3_0_0:
.long LBB3_24
.long LBB3_30
.long LBB3_31
.long LBB3_32
After:
adr r12, #LJTI3_0_0
add pc, r12, +r0, lsl #2
LJTI3_0_0:
b.w LBB3_24
b.w LBB3_30
b.w LBB3_31
b.w LBB3_32
This has several advantages.
1. This will make it easier to optimize this to a TBB / TBH instruction +
(smaller) table.
2. This eliminate the need for ugly asm printer hack to force the address
into thumb addresses (bit 0 is one).
3. Same codegen for pic and non-pic.
4. This eliminate the need to align the table so constantpool island pass
won't have to over-estimate the size.
Based on my calculation, the later is probably slightly faster as well since
ldr pc with shifter address is very slow. That is, it should be a win as long
as the HW implementation can do a reasonable job of branch predict the second
branch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@77024 91177308-0d34-0410-b5e6-96231b3b80d8
dumping ground of various SSE4.1 tests, since filecheck can reasonably
handle them all in one file. Generalize it to check x86-64 stuff as
well since it has a different ABI (a convenient way to test both the
reg and mem forms of these instructions).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@76848 91177308-0d34-0410-b5e6-96231b3b80d8
Getelementptrs that are defined to wrap are virtually useless to
optimization, and getelementptrs that are undefined on any kind
of overflow are too restrictive -- it's difficult to ensure that
all intermediate addresses are within bounds. I'm going to take
a different approach.
Remove a few optimizations that depended on this flag.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@76437 91177308-0d34-0410-b5e6-96231b3b80d8
Inline asm instructions may have additional <imp-def,kill> register operands.
These operands are not marked with a flag like the normal asm operands, so we
must not assert that there is a flag.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@76373 91177308-0d34-0410-b5e6-96231b3b80d8
stack alignment right when it is. This is not
ideal but conservatively correct. Adjust a test
to compensate for changed stack offset value.
gcc.apple/asm-block-57.c
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@76120 91177308-0d34-0410-b5e6-96231b3b80d8
The inline asm operands must be parsed from the first flag, you cannot assume
that an immediate operand preceeding a register use operand is the flag.
PowerPC "m" operands are represented as (flag, imm, reg) triples.
isRegTiedToDefOperand() would incorrectly interpret the imm as the flag.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@76101 91177308-0d34-0410-b5e6-96231b3b80d8
Avoid remat'ing instructions whose def have sub-register indices for now. It's just really really hard to get all the cases right.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@75900 91177308-0d34-0410-b5e6-96231b3b80d8
symbols were not getting stubs. While I'm at it, add a big testcase for
stub generation to make sure I don't break anything.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@75737 91177308-0d34-0410-b5e6-96231b3b80d8
using horrible string hacking. This gives us a different label,
but it's just an assembler temporary, so the name doesn't matter.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@75733 91177308-0d34-0410-b5e6-96231b3b80d8
additional bug fixes:
1. The bug that everyone hit was a problem in the asmprinter where it
would remove $stub but keep the L prefix on a name when emitting the
indirect symbol. This is easy to fix by keeping the name of the stub
and the name of the symbol in a StringMap instead of just keeping a
StringSet and trying to reconstruct it late.
2. There was a problem printing the personality function. The current
logic to print out the personality function from the DWARF information
is a bit of a cesspool right now that duplicates a bunch of other
logic in the asm printer. The short version of it is that it depends
on emitting both the L and _ prefix for symbols (at least on darwin)
and until I can untangle it, it is best to switch the mangler back to
emitting both prefixes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@75646 91177308-0d34-0410-b5e6-96231b3b80d8
unbreaking llvm-gcc (on Darwin).
--- Reverse-merging r75620 into '.':
U include/llvm/Support/Mangler.h
--- Reverse-merging r75610 into '.':
U test/CodeGen/X86/loop-hoist.ll
G include/llvm/Support/Mangler.h
U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
U lib/VMCore/Mangler.cpp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@75636 91177308-0d34-0410-b5e6-96231b3b80d8
to symbols instead of doing it with "printSuffixedName". This gets us to the point
where there is a real separation between computing a symbol name and printing it,
something I need for MC printer stuff.
This patch also fixes a corner case bug where unnamed private globals wouldn't get
the private label prefix.
Next up, rename all uses of getValueName -> getMangledName for better greppability,
and then tackle the ppc/arm backends to eliminate "printSuffixedName".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@75610 91177308-0d34-0410-b5e6-96231b3b80d8
indicates whether the label is private or not, instead of taking
prefix stuff. One effect of this is that symbols will be generated
with *just* the private prefix, instead of both the private prefix
*and* the user-label-prefix, but this doesn't matter as long as it
is consistent. For example we'll now get "Lfoo" instead of "L_foo".
These are just assembler temporary labels anyway, so they never even
make it into the .o file.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@75607 91177308-0d34-0410-b5e6-96231b3b80d8
1) unique globals with the existing "Count" local in Mangler, not with
atomic nonsense. Using atomics will give us nondeterminstic output
from the compiler when using multiple threads, which is bad.
2) Do not mangle an unknown global name with a type suffix. We don't
need this anymore now that llvm ir doesn't have type planes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@75541 91177308-0d34-0410-b5e6-96231b3b80d8
of lea. It is better for code size (and presumably efficiency) to use:
movl $foo, %eax
rather than:
leal foo, eax
Both give a nice zero extending "move immediate" instruction, the former is just
smaller. Note that global addresses should be handled different by the x86
backend, but I chose to follow the style already in place and add more fixme's.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@75403 91177308-0d34-0410-b5e6-96231b3b80d8
Basically, using:
lea symbol(%rip), %rax
is not valid in -static mode, because the current RIP may not be
within 32-bits of "symbol" when an app is built partially pic and
partially static. The fix for this is to compile it to:
lea symbol, %rax
It would be better to codegen this as:
movq $symbol, %rax
but that will come next.
The hard part of fixing this bug was fixing abi-isel, which was actively
testing for the wrong behavior. Also, the RUN lines are completely impossible
to understand what they are testing. To help with this, convert the -static
x86-64 codegen tests to use filecheck. This is much more stable and makes it
more clear what the codegen is expected to be.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@75382 91177308-0d34-0410-b5e6-96231b3b80d8
A side-effect of this change is asm printer is now using unified assembly. There are some minor clean ups and fixes as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@75359 91177308-0d34-0410-b5e6-96231b3b80d8
value. Adjust other code to deal with that correctly. Make
DAGTypeLegalizer::PromoteIntRes_EXTRACT_VECTOR_ELT take advantage of
this new flexibility to simplify the code and make it deal with unusual
vectors (like <4 x i1>) correctly. Fixes PR3037.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@75176 91177308-0d34-0410-b5e6-96231b3b80d8
registers based on dynamic conditions. For example, X86 EBP/RBP, when used as
frame register has to be spilled in the first fixed object. It should inform
PEI this so it doesn't get allocated another stack object. Also, it should not
be spilled as other callee-saved registers but rather its spilling and restoring
are being handled by emitPrologue and emitEpilogue. Avoid spilling it twice.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@75116 91177308-0d34-0410-b5e6-96231b3b80d8
as an (index,bool) pair. The bool flag records whether the kill is a
PHI kill or not. This code will be used to enable splitting of live
intervals containing PHI-kills.
A slight change to live interval weights introduced an extra spill
into lsr-code-insertion (outside the critical sections). The test
condition has been updated to reflect this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@75097 91177308-0d34-0410-b5e6-96231b3b80d8
* remove some old code that was needed when we'd put ESP in the scale instead of
the base of some instructions.
* Fix a bug with the P modifier in inline asm that caused us to drop it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@75077 91177308-0d34-0410-b5e6-96231b3b80d8
VSETCC must define all bits, which is different than it was documented
to before. Since all targets that implement VSETCC already have this
behavior, and we don't optimize based on this, just change the
documentation. We now get nice code for vec_compare.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@74978 91177308-0d34-0410-b5e6-96231b3b80d8
finishes off enough support for vector compares to get the icmp/fcmp
version of 2008-07-23-VSetCC.ll passing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@74961 91177308-0d34-0410-b5e6-96231b3b80d8
Note, isUndef marker must be placed even on implicit_def def operand or else the scavenger will not ignore it. This is necessary because -O0 path does not use liveintervalanalysis, it treats implicit_def just like any other def.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@74601 91177308-0d34-0410-b5e6-96231b3b80d8
Avoid unnecessary duplication of operand 0 of X86::FpSET_ST0_80. This duplication would
cause one register to remain on the stack at the function return.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@74534 91177308-0d34-0410-b5e6-96231b3b80d8
The register allocator, when it allocates a register to a virtual register defined by an implicit_def, can allocate any physical register without worrying about overlapping live ranges. It should mark all of operands of the said virtual register so later passes will do the right thing.
This is not the best solution. But it should be a lot less fragile to having the scavenger try to track what is defined by implicit_def.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@74518 91177308-0d34-0410-b5e6-96231b3b80d8
Not sure I understand how the temp register gets used,
but this fixes a bug and introduces no regressions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@74446 91177308-0d34-0410-b5e6-96231b3b80d8
After much back and forth, I decided to deviate from ARM design and split LDR into 4 instructions (r + imm12, r + imm8, r + r << imm12, constantpool). The advantage of this is 1) it follows the latest ARM technical manual, and 2) makes it easier to reduce the width of the instruction later. The down side is this creates more inconsistency between the two sub-targets. We should split ARM LDR instruction in a similar fashion later. I've added a README entry for this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@74420 91177308-0d34-0410-b5e6-96231b3b80d8
implementation primarily differs from the former in that the asmprinter
doesn't make a zillion decisions about whether or not something will be
RIP relative or not. Instead, those decisions are made by isel lowering
and propagated through to the asm printer. To achieve this, we:
1. Represent RIP relative addresses by setting the base of the X86 addr
mode to X86::RIP.
2. When ISel Lowering decides that it is safe to use RIP, it lowers to
X86ISD::WrapperRIP. When it is unsafe to use RIP, it lowers to
X86ISD::Wrapper as before.
3. This removes isRIPRel from X86ISelAddressMode, representing it with
a basereg of RIP instead.
4. The addressing mode matching logic in isel is greatly simplified.
5. The asmprinter is greatly simplified, notably the "NotRIPRel" predicate
passed through various printoperand routines is gone now.
6. The various symbol printing routines in asmprinter now no longer infer
when to emit (%rip), they just print the symbol.
I think this is a big improvement over the previous situation. It does have
two small caveats though: 1. I implemented a horrible "no-rip" modifier for
the inline asm "P" constraint modifier. This is a short term hack, there is
a much better, but more involved, solution. 2. I had to xfail an
-aggressive-remat testcase because it isn't handling the use of RIP in the
constant-pool reading instruction. This specific test is easy to fix without
-aggressive-remat, which I intend to do next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@74372 91177308-0d34-0410-b5e6-96231b3b80d8