All arguments are formally assigned to stack positions and then promoted
to floating point and integer registers. Since there are more floating
point registers than integer registers, this can cause situations where
floating point arguments are assigned to registers after integer
arguments that where assigned to the stack.
Use the inreg flag to indicate 32-bit fragments of structs containing
both float and int members.
The three-way shadowing between stack, integer, and floating point
registers requires custom argument lowering. The good news is that
return values are passed in the exact same way, and we can share the
code.
Still missing:
- Update LowerReturn to handle structs returned in registers.
- LowerCall.
- Variadic functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178958 91177308-0d34-0410-b5e6-96231b3b80d8
SITargetLowering::analyzeImmediate() was converting the 64-bit values
to 32-bit and then checking if they were an inline immediate. Some
of these conversions caused this check to succeed and produced
S_MOV instructions with 64-bit immediates, which are illegal.
v2:
- Clean up logic
Reviewed-by: Christian König <christian.koenig@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178927 91177308-0d34-0410-b5e6-96231b3b80d8
On cores for which we know the misprediction penalty, and we have
the isel instruction, we can profitably perform early if conversion.
This enables us to replace some small branch sequences with selects
and avoid the potential stalls from mispredicting the branches.
Enabling this feature required implementing canInsertSelect and
insertSelect in PPCInstrInfo; isel code in PPCISelLowering was
refactored to use these functions as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178926 91177308-0d34-0410-b5e6-96231b3b80d8
This is the counterpart to commit r160637, except it performs the action
in the bottomup portion of the data flow analysis.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178922 91177308-0d34-0410-b5e6-96231b3b80d8
The normal dataflow sequence in the ARC optimizer consists of the following
states:
Retain -> CanRelease -> Use -> Release
The optimizer before this patch stored the uses that determine the lifetime of
the retainable object pointer when it bottom up hits a retain or when top down
it hits a release. This is correct for an imprecise lifetime scenario since what
we are trying to do is remove retains/releases while making sure that no
``CanRelease'' (which is usually a call) deallocates the given pointer before we
get to the ``Use'' (since that would cause a segfault).
If we are considering the precise lifetime scenario though, this is not
correct. In such a situation, we *DO* care about the previous sequence, but
additionally, we wish to track the uses resulting from the following incomplete
sequences:
Retain -> CanRelease -> Release (TopDown)
Retain <- Use <- Release (BottomUp)
*NOTE* This patch looks large but the most of it consists of updating
test cases. Additionally this fix exposed an additional bug. I removed
the test case that expressed said bug and will recommit it with the fix
in a little bit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178921 91177308-0d34-0410-b5e6-96231b3b80d8
This optimization is unstable at this moment; it
1) block us on a very important application
2) PR15200
3) test6 and test7 in test/Transforms/ScalarRepl/dynamic-vector-gep.ll
(the CHECK command compare the output against wrong result)
I personally believe this optimization should not have any impact on the
autovectorized code, as auto-vectorizer is supposed to put gather/scatter
in a "right" way. Although in theory downstream optimizaters might reveal
some gather/scatter optimization opportunities, the chance is quite slim.
For the hand-crafted vectorizing code, in term of redundancy elimination,
load-CSE, copy-propagation and DSE can collectively achieve the same result,
but in much simpler way. On the other hand, these optimizers are able to
improve the code in a incremental way; in contrast, SROA is sort of all-or-none
approach. However, SROA might slighly win in stack size, as it tries to figure
out a stretch of memory tightenly cover the area accessed by the dynamic index.
rdar://13174884
PR15200
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178912 91177308-0d34-0410-b5e6-96231b3b80d8
llvm-mips-linux green.
llvm-mips-linux runs on a big endian machine. This test passes if I change 'e'
to 'E' in the target data layout string.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178910 91177308-0d34-0410-b5e6-96231b3b80d8
memory operands.
Essentially, this layers an infix calculator on top of the parsing state
machine. The scale on the index register is still expected to be an immediate
__asm mov eax, [eax + ebx*4]
and will not work with more complex expressions. For example,
__asm mov eax, [eax + ebx*(2*2)]
The plus and minus binary operators assume the numeric value of a register is
zero so as to not change the displacement. Register operands should never
be an operand for a multiply or divide operation; the scale*indexreg
expression is always replaced with a zero on the operand stack to prevent
such a case.
rdar://13521380
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178881 91177308-0d34-0410-b5e6-96231b3b80d8
InMemoryStruct is extremely dangerous as it returns data from an internal
buffer when the endiannes doesn't match. This should fix the tests on big
endian hosts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178875 91177308-0d34-0410-b5e6-96231b3b80d8
When the RuntimeDyldELF::processRelocationRef routine finds the target
symbol of a relocation in the local or global symbol table, it performs
a section-relative relocation:
Value.SectionID = lsi->second.first;
Value.Addend = lsi->second.second;
At this point, however, any Addend that might have been specified in
the original relocation record is lost. This is somewhat difficult to
trigger for relocations within the code section since they usually
do not contain non-zero Addends (when built with the default JIT code
model, in any case). However, the problem can be reliably triggered
by a relocation within the data section caused by code like:
int test[2] = { -1, 0 };
int *p = &test[1];
The initializer of "p" will need a relocation to "test + 4". On
platforms using RelA relocations this means an Addend of 4 is required.
Current code ignores this addend when processing the relocation,
resulting in incorrect execution.
Fixed by taking the Addend into account when processing relocations
to symbols found in the local or global symbol table.
Tested on x86_64-linux and powerpc64-linux.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178869 91177308-0d34-0410-b5e6-96231b3b80d8
Pass down the fact that an operand is going to be a vector of constants.
This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86
back. It had degraded to scalar performance due to my pervious shift cost change
that made all shifts expensive on x86.
radar://13576547
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178809 91177308-0d34-0410-b5e6-96231b3b80d8
SSE2 has efficient support for shifts by a scalar. My previous change of making
shifts expensive did not take this into account marking all shifts as expensive.
This would prevent vectorization from happening where it is actually beneficial.
With this change we differentiate between shifts of constants and other shifts.
radar://13576547
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178808 91177308-0d34-0410-b5e6-96231b3b80d8
The DAGCombine logic that recognized a/sqrt(b) and transformed it into
a multiplication by the reciprocal sqrt did not handle cases where the
sqrt and the division were separated by an fpext or fptrunc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178801 91177308-0d34-0410-b5e6-96231b3b80d8
It had been dropped during the switch to yaml::IO. Also add a test going
from yaml2obj to llvm-readobj. It can be extended as we add more
fields/formats to yaml2obj.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178786 91177308-0d34-0410-b5e6-96231b3b80d8
At the time when the XCore backend was added there were some issues with
with overlapping register classes but these all seem to be fixed now.
Describing the register classes correctly allow us to get rid of a
codegen only instruction (LDAWSP_lru6_RRegs) and it means we can
disassemble ru6 instructions that use registers above r11.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178782 91177308-0d34-0410-b5e6-96231b3b80d8
The Thumb2SizeReduction pass avoids false CPSR dependencies, except it
still aggressively creates tMOVi8 instructions because they are so
common.
Avoid creating false CPSR dependencies even for tMOVi8 instructions when
the the CPSR flags are known to have high latency. This allows integer
computation to overlap floating point computations.
Also process blocks in a reverse post-order and propagate high-latency
flags to successors.
<rdar://problem/13468102>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178773 91177308-0d34-0410-b5e6-96231b3b80d8
This requires v9 cmov instructions using the %xcc flags instead of the
%icc flags.
Still missing:
- Select floats on %xcc flags.
- Select i64 on %fcc flags.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178737 91177308-0d34-0410-b5e6-96231b3b80d8
The default logic does not correctly identify costs of casts because they are
marked as custom on x86.
For some cases, where the shift amount is a scalar we would be able to generate
better code. Unfortunately, when this is the case the value (the splat) will get
hoisted out of the loop, thereby making it invisible to ISel.
radar://13130673
radar://13537826
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178703 91177308-0d34-0410-b5e6-96231b3b80d8
Normally r_info is just a 32 of 64 bit number matching the endian of the rest
of the file. Unfortunately, mips 64 bit little endian is special: The top 32
bits are a little endian number and the following 32 are a big endian one.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178694 91177308-0d34-0410-b5e6-96231b3b80d8
ELF with support for:
- File headers
- Section headers + data
- Relocations
- Symbols
- Unwind data (only COFF/Win64)
The output format follows a few rules:
- Values are almost always output one per line (as elf-dump/coff-dump already do). - Many values are translated to something readable (like enum names), with the raw value in parentheses.
- Hex numbers are output in uppercase, prefixed with "0x".
- Flags are sorted alphabetically.
- Lists and groups are always delimited.
Example output:
---------- snip ----------
Sections [
Section {
Index: 1
Name: .text (5)
Type: SHT_PROGBITS (0x1)
Flags [ (0x6)
SHF_ALLOC (0x2)
SHF_EXECINSTR (0x4)
]
Address: 0x0
Offset: 0x40
Size: 33
Link: 0
Info: 0
AddressAlignment: 16
EntrySize: 0
Relocations [
0x6 R_386_32 .rodata.str1.1 0x0
0xB R_386_PC32 puts 0x0
0x12 R_386_32 .rodata.str1.1 0x0
0x17 R_386_PC32 puts 0x0
]
SectionData (
0000: 83EC04C7 04240000 0000E8FC FFFFFFC7 |.....$..........|
0010: 04240600 0000E8FC FFFFFF31 C083C404 |.$.........1....|
0020: C3 |.|
)
}
]
---------- snip ----------
Relocations and symbols can be output standalone or together with the section header as displayed in the example.
This feature set supports all tests in test/MC/COFF and test/MC/ELF (and I suspect all additional tests using elf-dump), making elf-dump and coff-dump deprecated.
Patch by Nico Rieck!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178679 91177308-0d34-0410-b5e6-96231b3b80d8
For this we need to use a libcall. Previously LLVM didn't implement
libcall support for frem, so I've added it in the usual
straightforward manner. A test case from the bug report is included.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178639 91177308-0d34-0410-b5e6-96231b3b80d8
The same compare instruction is used for 32-bit and 64-bit compares. It
sets two different sets of flags: icc and xcc.
This patch adds a conditional branch instruction using the xcc flags for
64-bit compares.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178621 91177308-0d34-0410-b5e6-96231b3b80d8
When unsafe FP math operations are enabled, we can use the fre[s] and
frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together
with some Newton iteration, in order to quickly generate floating-point
division and sqrt results. All of these instructions are separately optional,
and so each has its own feature flag (except for the Altivec instructions,
which are covered under the existing Altivec flag). Doing this is not only
faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these
computations to be pipelined with other computations in order to hide their
overall latency.
I've also added a couple of missing fnmsub patterns which turned out to be
missing (but are necessary for good code generation of the Newton iterations).
Altivec needs a similar fix, but that will probably be more complicated because
fneg is expanded for Altivec's v4f32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178617 91177308-0d34-0410-b5e6-96231b3b80d8
This finally fixes the encoding. The patch also
* Removes eh-frame.ll. It was an unnecessary .ll to .o test that was checking
the wrong value.
* Merge fde-reloc.s and eh-frame.s into a single test, since the only difference
was the run lines.
* Don't blindly test the content of the entire .eh_frame section. It makes it
hard to anyone actually fixing a bug and hitting a difference in a binary
blob. Instead, use a CHECK for each field and document what is being checked.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178615 91177308-0d34-0410-b5e6-96231b3b80d8
The semantics of ARC implies that a pointer passed into an objc_autorelease
must live until some point (potentially down the stack) where an
autorelease pool is popped. On the other hand, an
objc_autoreleaseReturnValue just signifies that the object must live
until the end of the given function at least.
Thus objc_autorelease is stronger than objc_autoreleaseReturnValue in
terms of the semantics of ARC* implying that performing the given
strength reduction without any knowledge of how this relates to
the autorelease pool pop that is further up the stack violates the
semantics of ARC.
*Even though objc_autoreleaseReturnValue if you know that no RV
optimization will occur is more computationally expensive.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178612 91177308-0d34-0410-b5e6-96231b3b80d8
This patch initializes t9 to the handler address, but only if the relocation
model is pic. This handles the case where handler to which eh.return jumps
points to the start of the function.
Patch by Sasa Stankovic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178588 91177308-0d34-0410-b5e6-96231b3b80d8
When doing a partword atomic operation, a lwarx was being paired with
a stdcx. instead of a stwcx. when compiling for a 64-bit target. The
target has nothing to do with it in this case; we always need a stwcx.
Thanks to Kai Nacke for reporting the problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178559 91177308-0d34-0410-b5e6-96231b3b80d8
This is helps on architectures where i8,i16 are not legal but we have byte, and
short loads/stores. Allowing us to merge copies like the one below on ARM.
copy(char *a, char *b, int n) {
do {
int t0 = a[0];
int t1 = a[1];
b[0] = t0;
b[1] = t1;
radar://13536387
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178546 91177308-0d34-0410-b5e6-96231b3b80d8
The iterator could be invalidated when it's recursively deleting a whole bunch
of constant expressions in a constant initializer.
Note: This was only reproducible if `opt' was run on a `.bc' file. If `opt' was
run on a `.ll' file, it wouldn't crash. This is why the test first pushes the
`.ll' file through `llvm-as' before feeding it to `opt'.
PR15440
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178531 91177308-0d34-0410-b5e6-96231b3b80d8
SPARC v9 extends all ALU instructions to 64 bits, so we simply need to
add patterns to use them for both i32 and i64 values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178527 91177308-0d34-0410-b5e6-96231b3b80d8
The last resort pattern produces 6 instructions, and there are still
opportunities for materializing some immediates in fewer instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178526 91177308-0d34-0410-b5e6-96231b3b80d8
SPARC v9 defines new 64-bit shift instructions. The 32-bit shift right
instructions are still usable as zero and sign extensions.
This adds new F3_Sr and F3_Si instruction formats that probably should
be used for the 32-bit shifts as well. They don't really encode an
simm13 field.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178525 91177308-0d34-0410-b5e6-96231b3b80d8
This is far from complete, but it is enough to make it possible to write
test cases using i64 arguments.
Missing features:
- Floating point arguments.
- Receiving arguments on the stack.
- Calls.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178523 91177308-0d34-0410-b5e6-96231b3b80d8
Revision 177141 caused a regression in all but
mips64 little endian. That is because none of the
other Mips targets had test cases checking the
contents of the .eh_frame section. This patch fixes
both the llvm code and adds an assembler test case
to include the current 4 flavors.
The test cases unfortunately rely on llvm-objdump. A
preferable method would be to use a pretty printer output
such as what readelf -wf <elf_file> would give.
I also changed the name of the test case to correct a typo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178506 91177308-0d34-0410-b5e6-96231b3b80d8
We would also like to merge sequences that involve a variable index like in the
example below.
int index = *idx++
int i0 = c[index+0];
int i1 = c[index+1];
b[0] = i0;
b[1] = i1;
By extending the parsing of the base pointer to handle dags that contain a
base, index, and offset we can handle examples like the one above.
The dag for the code above will look something like:
(load (i64 add (i64 copyfromreg %c)
(i64 signextend (i8 load %index))))
(load (i64 add (i64 copyfromreg %c)
(i64 signextend (i32 add (i32 signextend (i8 load %index))
(i32 1)))))
The code that parses the tree ignores the intermediate sign extensions. However,
if there is a sign extension it needs to be on all indexes.
(load (i64 add (i64 copyfromreg %c)
(i64 signextend (add (i8 load %index)
(i8 1))))
vs
(load (i64 add (i64 copyfromreg %c)
(i64 signextend (i32 add (i32 signextend (i8 load %index))
(i32 1)))))
radar://13536387
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178483 91177308-0d34-0410-b5e6-96231b3b80d8
The P7 and A2 have additional floating-point conversion instructions which
allow a direct two-instruction sequence (plus load/store) to convert from all
combinations (signed/unsigned i32/i64) <--> (float/double) (on previous cores,
only some combinations were directly available).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178480 91177308-0d34-0410-b5e6-96231b3b80d8
The popcntw instruction is available whenever the popcntd instruction is
available, and performs a separate popcnt on the lower and upper 32-bits.
Ignoring the high-order count, this can be used for the 32-bit input case
(saving on the explicit zero extension otherwise required to use popcntd).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178470 91177308-0d34-0410-b5e6-96231b3b80d8
This instruction is available on modern PPC64 CPUs, and is now used
to improve the SINT_TO_FP lowering (by eliminating the need for the
separate sign extension instruction and decreasing the amount of
needed stack space).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178446 91177308-0d34-0410-b5e6-96231b3b80d8
The existing SINT_TO_FP code for i32 -> float/double conversion was disabled
because it relied on broken EXTSW_32/STD_32 instruction definitions. The
original intent had been to enable these 64-bit instructions to be used on CPUs
that support them even in 32-bit mode. Unfortunately, this form of lying to
the infrastructure was buggy (as explained in the FIXME comment) and had
therefore been disabled.
This re-enables this functionality, using regular DAG nodes, but only when
compiling in 64-bit mode. The old STD_32/EXTSW_32 definitions (which were dead)
are removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178438 91177308-0d34-0410-b5e6-96231b3b80d8
'@SECREL' is what is used by the Microsoft assembler, but GNU as expects '@SECREL32'.
With the patch, the MC-generated code works fine in combination with a recent GNU as (2.23.51.20120920 here).
Patch by David Nadlinger!
Differential Revision: http://llvm-reviews.chandlerc.com/D429
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178427 91177308-0d34-0410-b5e6-96231b3b80d8
specific code paths.
This allows us to write code like:
if (__nvvm_reflect("FOO"))
// Do something
else
// Do something else
and compile into a library, then give "FOO" a value at kernel
compile-time so the check becomes a no-op.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178416 91177308-0d34-0410-b5e6-96231b3b80d8
Check that instruction selection can select multiply-add/sub DSP instructions
from a pattern that doesn't have intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178406 91177308-0d34-0410-b5e6-96231b3b80d8
derived class MipsSETargetLowering.
We shouldn't be generating madd/msub nodes if target is Mips16, since Mips16
doesn't have support for multipy-add/sub instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178404 91177308-0d34-0410-b5e6-96231b3b80d8
Specifically, objc-arc-expand will make sure that the
objc_retainAutoreleasedReturnValue, objc_autoreleaseReturnValue, and ret
will all have %call as an argument.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178382 91177308-0d34-0410-b5e6-96231b3b80d8
clang.arc.used is an interesting call for ARC since ObjCARCContract
needs to run to remove said intrinsic to avoid a linker error (since the
call does not exist).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178369 91177308-0d34-0410-b5e6-96231b3b80d8
Like nearbyint, rint can be implemented on PPC using the frin instruction. The
complication comes from the fact that rint needs to set the FE_INEXACT flag
when the result does not equal the input value (and frin does not do that). As
a result, we use a custom inserter which, after the rounding, compares the
rounded value with the original, and if they differ, explicitly sets the XX bit
in the FPSCR register (which corresponds to FE_INEXACT).
Once LLVM has better modeling of the floating-point environment we should be
able to (often) eliminate this extra complexity.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178362 91177308-0d34-0410-b5e6-96231b3b80d8
These instructions are available on the P5x (and later) and on the A2. They
implement the standard floating-point rounding operations (floor, trunc, etc.).
One caveat: frin (round to nearest) does not implement "ties to even", and so
is only enabled in fast-math mode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178337 91177308-0d34-0410-b5e6-96231b3b80d8
Mips assembler supports macros that allows the OR instruction
to have an immediate parameter. This patch adds an instruction
alias that converts this macro into a Mips ORI instruction.
Contributer: Vladimir Medic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178316 91177308-0d34-0410-b5e6-96231b3b80d8
- RDRAND always clears the destination value when a random value is not
available (i.e. CF == 0). This value is truncated or zero-extended as
the false boolean value to be returned. Boolean simplification needs
to skip this 'zext' or 'trunc' node.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178312 91177308-0d34-0410-b5e6-96231b3b80d8
Mips assembler allows following to be used as aliased instructions:
jal $rs for jalr $rs
jal $rd,$rd for jalr $rd,$rs
This patch provides alias definitions in td files and test cases to show the usage.
Contributer: Vladimir Medic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178304 91177308-0d34-0410-b5e6-96231b3b80d8
Compiling in 32-bit mode on a P7 would assert after 64-bit DAG combines were
added for bswap with load/store. This is because these combines are really only
valid in 64-bit mode, regardless of the CPU (and this was not being checked).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178286 91177308-0d34-0410-b5e6-96231b3b80d8
Since we handle optimizable objc_retainBlocks through strength reduction
in OptimizableIndividualCalls, we know that all code after that point
will only see non-optimizable objc_retainBlock calls. IsForwarding is
only called by functions after that point, so it is ok to just classify
objc_retainBlock as non-forwarding.
<rdar://problem/13249661>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178285 91177308-0d34-0410-b5e6-96231b3b80d8
If an objc_retainBlock has the copy_on_escape metadata attached to it
AND if the block pointer argument only escapes down the stack, we are
allowed to strength reduce the objc_retainBlock to to an objc_retain and
thus optimize it.
Current there is logic in the ARC data flow analysis to handle
this case which is complicated and involved making distinctions in
between objc_retainBlock and objc_retain in certain places and
considering them the same in others.
This patch simplifies said code by:
1. Performing the strength reduction in the initial ARC peephole
analysis (ObjCARCOpts::OptimizeIndividualCalls).
2. Changes the ARC dataflow analysis (which runs after the peephole
analysis) to consider all objc_retainBlock calls to not be optimizable
(since if the call was optimizable, we would have strength reduced it
already).
This patch leaves in the infrastructure in the ARC dataflow analysis to
handle this case, which due to 2 will just be dead code. I am doing this
on purpose to separate the removal of the old code from the testing of
the new code.
<rdar://problem/13249661>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178284 91177308-0d34-0410-b5e6-96231b3b80d8
These are 64-bit load/store with byte-swap, and available on the P7 and the A2.
Like the similar instructions for 16- and 32-bit words, these are matched in the
target DAG-combine phase against load/store-bswap pairs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178276 91177308-0d34-0410-b5e6-96231b3b80d8
PPC ISA 2.06 (P7, A2, etc.) has a popcntd instruction. Add this instruction and
tell TTI about it so that popcount-loop recognition will know about it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178233 91177308-0d34-0410-b5e6-96231b3b80d8
There were a few places where kill flags were not being set correctly, and
where 32-bit instruction variants were being used with 64-bit registers. After
r178180, this code was being triggered causing llc to assert.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178220 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit 342d92c7a0.
Turns out we're going with a different schema design to represent
DW_TAG_imported_modules so we won't need this extra field.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178215 91177308-0d34-0410-b5e6-96231b3b80d8
form of call in preference to memory indirect on Atom.
In this case, the patch applies the optimization to the code for reloading
spilled registers.
The patch also includes changes to sibcall.ll and movgs.ll, which were
failing on the Atom buildbot after the first patch was applied.
This patch by Sriram Murali.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178193 91177308-0d34-0410-b5e6-96231b3b80d8
Made sure we were looking a correct section
Added Mips32/64 as an extra check
Updated llvm-objdump to generate symbolic info for Mips relocations
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178190 91177308-0d34-0410-b5e6-96231b3b80d8
indirect through a memory address is to load the memory address into
a register and then call indirect through the register.
This patch implements this improvement by modifying SelectionDAG to
force a function address which is a memory reference to be loaded
into a virtual register.
Patch by Sriram Murali.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178171 91177308-0d34-0410-b5e6-96231b3b80d8
6 more piglit tests.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178145 91177308-0d34-0410-b5e6-96231b3b80d8
It seems that the Darwin PPC assembler requires r0 to be written as 0 when it
means 0 (at least in lwarx/stwcx.). Fixes PR15605.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178142 91177308-0d34-0410-b5e6-96231b3b80d8
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178127 91177308-0d34-0410-b5e6-96231b3b80d8
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178126 91177308-0d34-0410-b5e6-96231b3b80d8
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178125 91177308-0d34-0410-b5e6-96231b3b80d8
The R0 register can now be allocated because instructions
that cannot use R0 as a GPR have been appropriately marked.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178123 91177308-0d34-0410-b5e6-96231b3b80d8
Some implementation detail in the forgotten past required the link
register to be placed in the GPRC and G8RC register classes. This is
just wrong on the face of it, and causes several extra intersection
register classes to be generated. I found this was having evil
effects on instruction scheduling, by causing the wrong register class
to be consulted for register pressure decisions.
No code generation changes are expected, other than some minor changes
in instruction order. Seven tests in the test bucket required minor
tweaks to adjust to the new normal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178114 91177308-0d34-0410-b5e6-96231b3b80d8
The test was removed since I had not turned off the test during release
builds. This fails since ARC annotations support is conditionally
compiled out during release builds. I added the proper requires header
to assuage this issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178101 91177308-0d34-0410-b5e6-96231b3b80d8
This is just the basic groundwork for supporting DW_TAG_imported_module but I
wanted to commit this before pushing support further into Clang or LLVM so that
this rather churny change is isolated from the rest of the work. The major
churn here is obviously adding another field (within the common DIScope prefix)
to all DIScopes (files, classes, namespaces, lexical scopes, etc). This should
be the last big churny change needed for DW_TAG_imported_module/using directive
support/PR14606.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178099 91177308-0d34-0410-b5e6-96231b3b80d8
As Bill Schmidt pointed out to me, only on Darwin do we need to spill/restore
VRSAVE in the SjLj code. For non-Darwin, don't spill/restore VRSAVE (and I've
added some asserts to make sure that we're not).
As it turns out, we're not currently handling the Darwin case correctly (I've
added a FIXME in the test case). I've tried adding various implied register
definitions/uses to force the spill without success, so I'll need to address
this later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178096 91177308-0d34-0410-b5e6-96231b3b80d8
All Intel CPUs since Yonah look a lot alike, at least at the granularity
of the scheduling models. We can add more accurate models for
processors that aren't Sandy Bridge if required. Haswell will probably
need its own.
The Atom processor and anything based on NetBurst is completely
different. So are the non-Intel chips.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178080 91177308-0d34-0410-b5e6-96231b3b80d8
Now that the register scavenger can support multiple spill slots, and PEI can
use virtual-register-based scavenging for multiple simultaneous registers, we
can use a virtual register for the transfer register in the CR spilling code.
This should eliminate the last place (outside of the prologue/epilogue) where
we depend on the unconditional availability of the r0 register. We will soon be
able to allocate it (in a somewhat restricted sense) as a GPR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178060 91177308-0d34-0410-b5e6-96231b3b80d8
The previous algorithm could not deal properly with scavenging multiple virtual
registers because it kept only one live virtual -> physical mapping (and
iterated through operands in order). Now we don't maintain a current mapping,
but rather use replaceRegWith to completely remove the virtual register as
soon as the mapping is established.
In order to allow the register scavenger to return a physical register killed
by an instruction for definition by that same instruction, we now call
RS->forward(I) prior to eliminating virtual registers defined in I. This
requires a minor update to forward to ignore virtual registers.
These new features will be tested in forthcoming commits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178058 91177308-0d34-0410-b5e6-96231b3b80d8
- 'prefetch' intrinsics are only lowered when SSE is available. On non-X86
builds, 'generic' CPU is used and stops lowering any prefetch intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178046 91177308-0d34-0410-b5e6-96231b3b80d8
They read from constant register space anyway.
v2: fix lit tests
Signed-off-by: Christian König <christian.koenig@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178020 91177308-0d34-0410-b5e6-96231b3b80d8
If PC or SP is the destination, the disassembler erroneously failed with the
invalid encoding, despite the manual saying that both are fine.
This patch addresses failure to decode encoding T4 of LDR (A8.8.62) which is a
postindexed load, where the offset 0xc is applied to SP after the load occurs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178017 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes PR15570: SEGV: SCEV back-edge info invalid after dead code removal.
Indvars creates a SCEV expression for the loop's back edge taken
count, then determines that the comparison is always true and
removes it.
When loop-unroll asks for the expression, it contains a NULL
SCEVUnknkown (as a CallbackVH).
forgetMemoizedResults should invalidate the loop back edges expression.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177986 91177308-0d34-0410-b5e6-96231b3b80d8
This will allow for verification and analysis of the merge function of
the data flow analyses in the ARC optimizer.
The actual implementation of this feature is by introducing calls to
the functions llvm.arc.annotation.{bottomup,topdown}.{bbstart,bbend}
which are only declared. Each such call takes in a pointer to a global
with the same name as the pointer whose provenance is being tracked and
a pointer whose name is one of our Sequence states and points to a
string that contains the same name.
To ensure that the optimizer does not consider these annotations in any
way, I made it so that the annotations are considered to be of IC_None
type.
A test case is included for this commit and the previous
ObjCARCAnnotation commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177952 91177308-0d34-0410-b5e6-96231b3b80d8
- It's still considered aligned when the specified alignment is larger
than the natural alignment;
- The new alignment for the high 128-bit vector should be min(16,
alignment) as the pointer is advanced by 16, a power-of-2 offset.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177947 91177308-0d34-0410-b5e6-96231b3b80d8
- Handle the case where the result of 'insert_subvect' is bitcasted
before 'extract_subvec'. This removes the redundant insertf128/extractf128
pair on unaligned 256-bit vector load/store on vectors of non 64-bit integer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177945 91177308-0d34-0410-b5e6-96231b3b80d8
For instance, following transformation will be disabled:
x + x + x => 3.0f * x;
The problem of these transformations is that it introduces a FP constant, which
following Instruction-Selection pass cannot handle.
Reviewed by Nadav, thanks a lot!
rdar://13445387
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177933 91177308-0d34-0410-b5e6-96231b3b80d8
I know it is incorrect and they'd fail with +Asserts for win32 targets, though.
I'll try to fix them tonight.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177914 91177308-0d34-0410-b5e6-96231b3b80d8
test/CodeGen/Generic/2008-02-20-MatchingMem.ll: Test contains inline assembly not supported by Hexagon.
Following tests are XFAILed due to multiple return values which Hexagon doesn't support.
test/CodeGen/Generic/multiple-return-values-cross-block-with-invoke.ll
test/CodeGen/Generic/select-cc.ll
test/CodeGen/Generic/vector.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177912 91177308-0d34-0410-b5e6-96231b3b80d8
The problem is that the code mistakenly took for granted that following constructor
is able to create an APFloat from a *SIGNED* integer:
APFloat::APFloat(const fltSemantics &ourSemantics, integerPart value)
rdar://13486998
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177906 91177308-0d34-0410-b5e6-96231b3b80d8
Hexagon does not support -filetype=obj(direct object generation) flag. Therefore,
the following tests are being XFAILed:
test/DebugInfo/dwarf-public-names.ll
test/DebugInfo/member-pointers.ll
test/DebugInfo/two-cus-from-same-file.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177901 91177308-0d34-0410-b5e6-96231b3b80d8
This simplification happens at 2 places :
- using the nsw attribute when the shl / mul is used by a sign test
- when the shl / mul is compared for (in)equality to zero
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177856 91177308-0d34-0410-b5e6-96231b3b80d8
DAG arguments can optionally be named:
(dag node, node:$name)
With this change, the node is also optional:
(dag node, node:$name, $name)
The missing node is treated as an UnsetInit, so the above is equivalent
to:
(dag node, node:$name, ?:$name)
This syntax is useful in output patterns where we currently require the
types of variables to be repeated:
def : Pat<(subc i32:$b, i32:$c), (SUBCCrr i32:$b, i32:$c)>;
This is preferable:
def : Pat<(subc i32:$b, i32:$c), (SUBCCrr $b, $c)>;
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177843 91177308-0d34-0410-b5e6-96231b3b80d8
Performing this check unilaterally prevented us from generating FMAs when the incoming IR contained illegal vector types which would eventually be legalized to underlying types that *did* support FMA.
For example, an @llvm.fmuladd on an OpenCL float16 should become a sequence of float4 FMAs, not float4 fmul+fadd's.
NOTE: Because we still call the target-specific profitability hook, individual targets can reinstate the old behavior, if desired, by simply performing the legality check inside their callback hook. They can also perform more sophisticated legality checks, if, for example, some illegal vector types can be productively implemented as FMAs, but not others.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177820 91177308-0d34-0410-b5e6-96231b3b80d8
Add "evaluate-tbaa" to print alias queries of loads/stores. Alias queries
between pointers do not include TBAA tags.
Add testing case for "placement new". TBAA currently says NoAlias.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177772 91177308-0d34-0410-b5e6-96231b3b80d8
The original code used i32, and i64 if legal. This introduced unneeded
casts when they aren't legal, or when the index variable i has another
type. In order of preference: try to use i's type; use the smallest
fitting legal type (using an added DataLayout method); default to i32.
A testcase checks that this works when the index gep operand is i16.
Patch by : Ahmed Bougacha <ahmed.bougacha@gmail.com>
Reviewed by : Duncan
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177712 91177308-0d34-0410-b5e6-96231b3b80d8
the ARM build bots, and it adds a weird case to the test suite where
a test uses as inputs files in the parent directory.
Talked about this with Dave on IRC and he's fine with this approach even
though it isn't optimal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177700 91177308-0d34-0410-b5e6-96231b3b80d8
For mips a branch an 18-bit signed offset (the 16-bit
offset field shifted left 2 bits) is added to the
address of the instruction following the branch
(not the branch itself), in the branch delay slot,
to form a PC-relative effective target address.
Previously, the code generator did not perform the
shift of the immediate branch offset which resulted
in wrong instruction opcode. This patch fixes the issue.
Contributor: Vladimir Medic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177687 91177308-0d34-0410-b5e6-96231b3b80d8
This patch uses the generated instruction info tables to
identify memory/load store instructions.
After successful matching and based on the operand type
and size, it generates additional instructions to the output.
Contributor: Vladimir Medic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177685 91177308-0d34-0410-b5e6-96231b3b80d8
Thanks to Jakob for isolating the underlying problem from the
test case in r177423. The original commit had introduced
asymmetric copy operations, but these turned out to be a work-around
to the real problem (the use of == instead of hasSubClassEq in PPCCTRLoops).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177679 91177308-0d34-0410-b5e6-96231b3b80d8
The .set directive in the Mips the assembler can be
used to set the value of a symbol to an expression.
This changes the symbol's value and type to conform
to the expression's.
Syntax: .set symbol, expression
This patch implements the parsing of the above syntax
and enables the parser to use defined symbols when
parsing operands.
Contributor: Vladimir Medic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177667 91177308-0d34-0410-b5e6-96231b3b80d8
This implements SJLJ lowering on PPC, making the Clang functions
__builtin_{setjmp/longjmp} functional on PPC platforms. The implementation
strategy is similar to that on X86, with the exception that a branch-and-link
variant is used to get the right jump address. Credit goes to Bill Schmidt for
suggesting the use of the unconditional bcl form (instead of the regular bl
instruction) to limit return-address-cache pollution.
Benchmarking the speed at -O3 of:
static jmp_buf env_sigill;
void foo() {
__builtin_longjmp(env_sigill,1);
}
main() {
...
for (int i = 0; i < c; ++i) {
if (__builtin_setjmp(env_sigill)) {
goto done;
} else {
foo();
}
done:;
}
...
}
vs. the same code using the libc setjmp/longjmp functions on a P7 shows that
this builtin implementation is ~4x faster with Altivec enabled and ~7.25x
faster with Altivec disabled. This comparison is somewhat unfair because the
libc version must also save/restore the VSX registers which we don't yet
support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177666 91177308-0d34-0410-b5e6-96231b3b80d8
Although there is only one Altivec VRSAVE register, it is a member of
a register class, and we need the ability to spill it. Because this
register is normally callee-preserved and handled by special code this
has never before been necessary. However, this capability will be required by
a forthcoming commit adding SjLj support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177654 91177308-0d34-0410-b5e6-96231b3b80d8
The old code used to lower FRAMEADDR tried to replicate the logic in the real
frame-lowering code that determines whether or not the frame pointer (r31) will
be used. When it seemed as through the frame pointer would not be used, the
stack pointer (r1) was used instead. Unfortunately, because the stack size is
not yet known, this does not work. Instead, this change introduces new
always-reserved pseudo-registers (FP and FP8) that are replaced during prologue
insertion with the real frame-pointer register (either r1 or r31).
It is important that this intrinsic always return a valid frame address because
it is used by Clang to store the frame address as part of code generation for
__builtin_setjmp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177653 91177308-0d34-0410-b5e6-96231b3b80d8
NEON is not IEEE 754 compliant, so we should avoid lowering single-precision
floating point operations with NEON unless unsafe-math is turned on. The
equivalent VFP instructions are IEEE 754 compliant, but in some cores they're
much slower, so some archs/OSs might still request it to be on by default,
such as Swift and Darwin.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177651 91177308-0d34-0410-b5e6-96231b3b80d8
The simplify-libcalls pass implemented a doInitialization hook to infer
function prototype attributes for well-known functions. Given that the
simplify-libcalls pass is going away *and* that the functionattrs pass
is already in place to deduce function attributes, I am moving this logic
to the functionattrs pass. This approach was discussed during patch
review:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121126/157465.html.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177619 91177308-0d34-0410-b5e6-96231b3b80d8
- After moving logic recognizing vector shift with scalar amount from
DAG combining into DAG lowering, we declare to customize all vector
shifts even vector shift on AVX is legal. As a result, the cost model
needs special tuning to identify these legal cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177586 91177308-0d34-0410-b5e6-96231b3b80d8
The differing file (due to the #line directive in the original source) is for
future testing improvements coming soon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177560 91177308-0d34-0410-b5e6-96231b3b80d8
Moving the DIFile parameter to immediately proceed the tag so that it will be a
common prefix with other DIScopes (once the DIFile is replaced with the raw
file/directory pair).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177492 91177308-0d34-0410-b5e6-96231b3b80d8
This is the backend portion of a Clang test case
(clang/test/CodeGenCXX/debug-info-namespace.cpp) that was roughly/coarsely
testing LLVM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177487 91177308-0d34-0410-b5e6-96231b3b80d8
- Move SRA/SRL/SHL lowering support from DAG combination to DAG lowering
to support extended 256-bit integer in AVX but not AVX2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177478 91177308-0d34-0410-b5e6-96231b3b80d8
This makes DIType's first non-tag parameter the same as DIFile's, allowing them
to both share the common implementation of getFilename/getDirectory in DIScope.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177467 91177308-0d34-0410-b5e6-96231b3b80d8
A node's ordering is only propagated during legalization if (a) the new node does
not have an ordering (is not a CSE'd node), or (b) the new node has an ordering
that is higher than the node being legalized.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177465 91177308-0d34-0410-b5e6-96231b3b80d8
This is another step along the way to making all DIScopes have a common prefix
which can be added to in a general manner to support using directives
(DW_TAG_imported_module).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177462 91177308-0d34-0410-b5e6-96231b3b80d8
- it is trivially known to be used inside the loop in a way that can not be optimized away
- there is no use outside of the loop which can take advantage of the computation hoisting
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177432 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, pre-increment store patterns are written to use two separate
operands to represent address base and displacement:
stwu $rS, $ptroff($ptrreg)
This causes problems when implementing the assembler parser, so this
commit changes the patterns to use standard (complex) memory operands
like in all other memory access instruction patterns:
stwu $rS, $dst
To still match those instructions against the appropriate pre_store
SelectionDAG nodes, the patch uses the new feature that allows a Pat
to match multiple DAG operands against a single (complex) instruction
operand.
Approved by Hal Finkel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177429 91177308-0d34-0410-b5e6-96231b3b80d8
Currently the PPC r0 register is unconditionally reserved. There are two reasons
for this:
1. r0 is treated specially (as the constant 0) by certain instructions, and so
cannot be used with those instructions as a regular register.
2. r0 is used as a temporary register in the CR-register spilling process
(where, under some circumstances, we require two GPRs).
This change addresses the first reason by introducing a restricted register
class (without r0) for use by those instructions that treat r0 specially. These
register classes have a new pseudo-register, ZERO, which represents the r0-as-0
use. This has the side benefit of making the existing target code simpler (and
easier to understand), and will make it clear to the register allocator that
uses of r0 as 0 don't conflict will real uses of the r0 register.
Once the CR spilling code is improved, we'll be able to allocate r0.
Adding these extra register classes, for some reason unclear to me, causes
requests to the target to copy 32-bit registers to 64-bit registers. The
resulting code seems correct (and causes no test-suite failures), and the new
test case covers this new kind of asymmetric copy.
As r0 is still reserved, no functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177423 91177308-0d34-0410-b5e6-96231b3b80d8
Remove an accidentally-added instruction definition and add a comment in the
test case. This is in response to a post-commit review by Bill Schmidt.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177404 91177308-0d34-0410-b5e6-96231b3b80d8
The ARM backend currently has poor codegen for long sext/zext
operations, such as v8i8 -> v8i32. This patch addresses this
by performing a custom expansion in ARMISelLowering. It also
adds/changes the cost of such lowering in ARMTTI.
This partially addresses PR14867.
Patch by Pete Couperus
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177380 91177308-0d34-0410-b5e6-96231b3b80d8
PPC64 supports unaligned loads and stores of 64-bit values, but
in order to use the r+i forms, the offset must be a multiple of 4.
Unfortunately, this cannot always be determined by examining the
immediate itself because it might be available only via a TOC entry.
In order to get around this issue, we additionally predicate the
selection of the r+i form on the alignment of the load or store
(forcing it to be at least 4 in order to select the r+i form).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177338 91177308-0d34-0410-b5e6-96231b3b80d8
The default logic marks them as too expensive.
For example, before this patch we estimated:
cost of 16 for instruction: %r = uitofp <4 x i16> %v0 to <4 x float>
While this translates to:
vmovl.u16 q8, d16
vcvt.f32.u32 q8, q8
All other costs are left to the values assigned by the fallback logic. Theses
costs are mostly reasonable in the sense that they get progressively more
expensive as the instruction sequences emitted get longer.
radar://13445992
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177334 91177308-0d34-0410-b5e6-96231b3b80d8
Fix cost of some "cheap" cast instructions. Before this patch we used to
estimate for example:
cost of 16 for instruction: %r = fptoui <4 x float> %v0 to <4 x i16>
While we would emit:
vcvt.s32.f32 q8, q8
vmovn.i32 d16, q8
vuzp.8 d16, d17
All other costs are left to the values assigned by the fallback logic. Theses
costs are mostly reasonable in the sense that they get progressively more
expensive as the instruction sequences emitted get longer.
radar://13434072
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177333 91177308-0d34-0410-b5e6-96231b3b80d8
Hal Finkel recently added code to allow unaligned memory references
for PowerPC. Two tests were temporarily modified with
-disable-ppc-unaligned to keep them from failing. This patch adjusts
the expected code generation for the unaligned references.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177328 91177308-0d34-0410-b5e6-96231b3b80d8
This handles the case where we have an inbounds GEP with alloca as the pointer.
This fixes the regression in PR12750 and rdar://13286434.
Note that we can also fix this by handling some GEP cases in isKnownNonNull.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177321 91177308-0d34-0410-b5e6-96231b3b80d8
Apparently my final cleanup to use a relevant suffix for these tests before
committing r176831 caused them to stop running since lit wasn't configured to
run tests with that suffix in those directories (why don't we just have a
global suffix list?). So, add the suffix to the relevant directories & fix the
test that has bitrotted over the last week due to my debug info schema changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177315 91177308-0d34-0410-b5e6-96231b3b80d8
This commit fixes an assert that would occur on loops with large constant counts
(like looping for ((uint32_t) -1) iterations on PPC64). The existing code did
not handle counts that it computed to be negative (asserting instead), but
these can be created with valid inputs.
This bug was discovered by bugpoint while I was attempting to isolate a
completely different problem.
Also, in writing test cases for the negative-count problem, I discovered that
the ori/lsi handling was broken (there was a typo which caused the logic that
was supposed to detect these pairs and extract the iteration count to always
fail). This has now also been corrected (and is covered by one of the new test
cases).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177295 91177308-0d34-0410-b5e6-96231b3b80d8
Because the initial-value constants had not been added to the list
of instructions considered for DCE the resulting code had redundant
constant-materialization instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177294 91177308-0d34-0410-b5e6-96231b3b80d8
*NOTE* I verified that the original bug behind
dont-infinite-loop-during-block-escape-analysis.ll occurs when using opt on
retain-block-escape-analysis.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177240 91177308-0d34-0410-b5e6-96231b3b80d8
This is the first step to making all DIScopes have a common metadata prefix (so
that things (using directives, for example) that can appear in any scope can be
added to that common prefix). DIFile is itself a DIScope so the common prefix
of all DIScopes cannot be a DIFile - instead it's the raw filename/directory
name pair.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177239 91177308-0d34-0410-b5e6-96231b3b80d8