I don't have the IR which is causing the build bot breakage but I can
postulate as to why they are timing out:
1. SimplifyWithOpReplaced was stripping flags from the simplified value.
2. visitSelectInstWithICmp was overriding SimplifyWithOpReplaced because
it's simplification wasn't correct.
3. InstCombine would revisit the add instruction and note that it can
rederive the flags.
4. By modifying the value, we chose to revisit instructions which reuse
the value. One of the instructions is the original select, causing
LLVM to never reach fixpoint.
Instead, strip the flags only when we are sure we are going to perform
the simplification.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239141 91177308-0d34-0410-b5e6-96231b3b80d8
This is breaking a lot of build bots and is causing very long-running
compiles (infinite loops)?
Likely, we shouldn't return nullptr?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239139 91177308-0d34-0410-b5e6-96231b3b80d8
When we generate coverage data, we explicitly set each coverage map's
alignment to 8 (See InstrProfiling::lowerCoverageData), but when we
read the coverage data, we assume consecutive maps are exactly
adjacent. When we're dealing with 32 bit, maps can end on a 4 byte
boundary, causing us to think the padding is part of the next record.
Fix this by adjusting the buffer to an appropriately aligned address
between records.
This is pretty awkward to test, as it requires a binary with multiple
coverage maps to hit, so we'd need to check in multiple source files
and a binary blob as inputs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239129 91177308-0d34-0410-b5e6-96231b3b80d8
We cleverly handle cases where computation done in one argument of a select
instruction is suitable for the other operand, thus obviating the need
of the select and the comparison. However, the other operand cannot
have flags.
This fixes PR23757.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239115 91177308-0d34-0410-b5e6-96231b3b80d8
gc.statepoint intrinsics with a far immediate call target
were lowered incorrectly as pc-rel32 calls.
This change fixes the problem, and generates an indirect call
via a scratch register.
For example:
Intrinsic:
%safepoint_token = call i32 (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* inttoptr (i64 140727162896504 to void ()*), i32 0, i32 0, i32 0, i32 0)
Old Incorrect Lowering:
callq 140727162896504
New Correct Lowering:
movabsq $140727162896504, %rax
callq *%rax
In lowerCallFromStatepoint(), the callee-target was modified and
represented as a "TargetConstant" node, rather than a "Constant" node.
Undoing this modification enabled LowerCall() to generate the
correct CALL instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239114 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
A small bit that I missed when I updated the X86 backend to account for
the Win64 calling convention on non-Windows. Now we don't use dead
non-volatile registers when emitting a Win64 indirect tail call on
non-Windows.
Should fix PR23710.
Test Plan: Added test for the correct behavior based on the case I posted to PR23710.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10258
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239111 91177308-0d34-0410-b5e6-96231b3b80d8
Report proper error code from MachOObjectFile constructor if we
can't parse another segment load command (we already return a proper
error if segment load command contents is suspicious).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239109 91177308-0d34-0410-b5e6-96231b3b80d8
The big/small ordering here is based on signed values so SmallValue will
be INT_MIN and BigValue 0. This shouldn't be a problem but the code
assumed that BigValue always had more bits set than SmallValue.
We used to just miss the transformation, but a recent refactoring of
mine turned this into an assertion failure.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239105 91177308-0d34-0410-b5e6-96231b3b80d8
NVPTXISelDAGToDAG translates "addrspacecast to param" to
NVPTX::nvvm_ptr_gen_to_param
Added an llc test in bug21465.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239100 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
1) The only caller, ARMTargetParser::parseArch, uses the results for an "endswith" test; so, including the "arm" prefix into the result is unnecessary.
2) Most ARMTargetParser::parseArch callers pass it the output from ARMTargetParser::getCanonicalArchName; so, make this behaviour the default. Then, including the "arm" prefix into the cases is unnecessary.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10249
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239099 91177308-0d34-0410-b5e6-96231b3b80d8
Basic block selection involves checking successor BBs for PHI nodes
that depend on the current BB. In case such BBs are found, the value
being selected is a constant and such constant already exists in
current BB, it's value is reused.
This might lead to wrong locations in some situations, especially if
same constant value ends up being materialized twice in two different
ways, which discards that sharing and leaves us with wrong debug
location in the successor BB.
In code this involves the following sequence of calls:
SelectionDAGBuilder::HandlePHINodesInSuccessorBlocks ->
SelectionDAGBuilder::CopyValueToVirtualRegister ->
SelectionDAGBuilder::getNonRegisterValue
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239089 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we can look at users, we can trivially do this: when we would
have otherwise disabled GlobalMerge (currently -O<3), we can just run
it for minsize functions, as it's usually a codesize win.
Differential Revision: http://reviews.llvm.org/D10054
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239087 91177308-0d34-0410-b5e6-96231b3b80d8
Fix the FIXME and remove this old as(1) compat option. It was useful for
bringup of the integrated assembler to diff object files, but now it's
just causing more relocations than strictly necessary to be generated.
rdar://21201804
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239084 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
With this patch, NVPTXLowerKernelArgs converts a kernel pointer argument to a
pointer in the global address space. This change, along with
NVPTXFavorNonGenericAddrSpaces, allows the NVPTX backend to emit ld.global.*
and st.global.* for accessing kernel pointer arguments.
Minor changes:
1. refactor: extract function convertToPointerInAddrSpace
2. fix a bug in the test case in bug21465.ll
Test Plan: lower-kernel-ptr-arg.ll
Reviewers: eliben, meheff, jholewinski
Reviewed By: jholewinski
Subscribers: wengxt, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10154
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239082 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Properly report the error in segment load commands from MachOObjectFile
constructor instead of crashing the program.
Adjust the test case accordingly.
Test Plan: regression test suite
Reviewers: rafael, filcab
Subscribers: llvm-commits
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239081 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently all load commands are parsed in MachOObjectFile constructor.
If the next load command cannot be parsed, or if command size is too
small, properly report it through the error code and fail to construct
the object, instead of crashing the program.
Test Plan: regression test suite
Reviewers: rafael, filcab
Subscribers: llvm-commits
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239080 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Instead, properly report this error from MachOObjectFile constructor.
Test Plan: regression test suite
Reviewers: rafael
Subscribers: llvm-commits
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239078 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Avoid parsing object file each time MachOObjectFile::getHeader() is
called. Instead, cache the header in MachOObjectFile constructor, where
it's parsed anyway. In future, we must avoid constructing the object
at all if the header can't be parsed.
Test Plan: regression test suite.
Reviewers: rafael
Subscribers: llvm-commits
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239075 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
-march=bpf -> host endian
-march=bpf_le -> little endian
-match=bpf_be -> big endian
Test Plan:
v1 was tested by IBM s390 guys and appears to be working there.
It bit rots too fast here.
Reviewers: chandlerc, tstellarAMD
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10177
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239071 91177308-0d34-0410-b5e6-96231b3b80d8
Method 'visitBUILD_VECTOR' in the DAGCombiner knows how to combine a
build_vector of a bunch of extract_vector_elt nodes and constant zero nodes
into a shuffle blend with a zero vector.
However, method 'visitBUILD_VECTOR' forgot that a floating point
build_vector may contain negative zero as well as positive zero.
Example:
define <2 x double> @example(<2 x double> %A) {
entry:
%0 = extractelement <2 x double> %A, i32 0
%1 = insertelement <2 x double> undef, double %0, i32 0
%2 = insertelement <2 x double> %1, double -0.0, i32 1
ret <2 x double> %2
}
Before this patch, llc (with -mattr=+sse4.1) wrongly generated
movq %xmm0, %xmm0 # xmm0 = xmm0[0],zero
So, the sign bit of the negative zero was effectively lost.
This patch fixes the problem by adding explicit checks for positive zero.
With this patch, llc produces the following code for the example above:
movhpd .LCPI0_0(%rip), %xmm0
where .LCPI0_0 referes to a 'double -0'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239070 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we sometimes know the address space, this can
theoretically do a better job.
This needs better test coverage, but this mostly depends on
first updating the loop optimizatiosn to provide the address
space.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239053 91177308-0d34-0410-b5e6-96231b3b80d8
When checking (High - Low + 1).sle(BitWidth), BitWidth would be truncated
to the size of the left-hand side. In the case of this PR, the left-hand
side was i4, so BitWidth=64 got truncated to 0 and the assert failed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239048 91177308-0d34-0410-b5e6-96231b3b80d8
Section symbols exist as an optimization: instead of having multiple relocations
point to different symbols, many of them can point to a single section symbol.
When that optimization is unused, a section symbol is also unused and adds no
extra information to the object file.
This saves a bit of space on the object files and makes the output of
llvm-objdump -t easier to read and consequently some tests get quite a bit
simpler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239045 91177308-0d34-0410-b5e6-96231b3b80d8
If the compare in a select pattern has another use then it can't be removed, so we'd just
be creating repeated code if we created a min/max node.
Spotted by Matt Arsenault!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239037 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is the first of several patches to eliminate StringRef forms of GNU
triples from the internals of LLVM. After this is complete, GNU triples
will be replaced by a more authoratitive representation in the form of
an LLVM TargetTuple.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: ted, llvm-commits, rengolin, jholewinski
Differential Revision: http://reviews.llvm.org/D10236
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239036 91177308-0d34-0410-b5e6-96231b3b80d8
Removed the redundant "llvm::" from class names in InstrProfiling.cpp
clang-format is ran on the changes.
Patch from Betul Buyukkurt.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239034 91177308-0d34-0410-b5e6-96231b3b80d8
The fix is just that getOther had not been updated for packing the st_other
values in fewer bits and could return spurious values:
- unsigned Other = (getFlags() & (0x3f << ELF_STO_Shift)) >> ELF_STO_Shift;
+ unsigned Other = (getFlags() & (0x7 << ELF_STO_Shift)) >> ELF_STO_Shift;
Original message:
Pack the MCSymbolELF bit fields into MCSymbol's Flags.
This reduces MCSymolfELF from 64 bytes to 56 bytes on x86_64.
While at it, also make getOther/setOther easier to use by accepting unshifted
STO_* values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239012 91177308-0d34-0410-b5e6-96231b3b80d8
This reduces MCSymolfELF from 64 bytes to 56 bytes on x86_64.
While at it, also make getOther/setOther easier to use by accepting unshifted
STO_* values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239006 91177308-0d34-0410-b5e6-96231b3b80d8
port it to the new pass manager.
All this does is extract the inner "location" class used by AA into its
own full fledged type. This seems *much* cleaner as MemoryDependence and
soon MemorySSA also use this heavily, and it doesn't make much sense
being inside the AA infrastructure.
This will also make it much easier to break apart the AA infrastructure
into something that stands on its own rather than using the analysis
group design.
There are a few places where this makes APIs not make sense -- they were
taking an AliasAnalysis pointer just to build locations. I'll try to
clean those up in follow-up commits.
Differential Revision: http://reviews.llvm.org/D10228
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239003 91177308-0d34-0410-b5e6-96231b3b80d8
The first try (r238051) to land this was reverted due to ExecutionEngine build failure;
that was hopefully addressed by r238788.
The second try (r238842) to land this was reverted due to BUILD_SHARED_LIBS failure;
that was hopefully addressed by r238953.
This patch adds a TargetRecip class for processing many recip codegen possibilities.
The class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.
The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.
Differential Revision: http://reviews.llvm.org/D8982
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239001 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Now users don't have to manually deal with getFirstLoadCommandInfo() /
getNextLoadCommandInfo(), calculate the number of load segments, etc.
No functionality change.
Test Plan: regression test suite
Reviewers: rafael, lhames, loladiro
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10144
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238983 91177308-0d34-0410-b5e6-96231b3b80d8
This avoids yet another last minute patching of the binding.
While at it, also simplify the weakref implementation a bit by not walking
past it in the expression evaluation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238982 91177308-0d34-0410-b5e6-96231b3b80d8
With this getBinging can now return the correct answer for all cases not
involving a .symver and the elf writer doesn't need to patch it last minute.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238980 91177308-0d34-0410-b5e6-96231b3b80d8
_GLOBAL_OFFSET_TABLE_ is not magical and we can now directly check for a
symbol never getting an explicit binding.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238978 91177308-0d34-0410-b5e6-96231b3b80d8
The existing code would unnecessarily break LDRD/STRD apart with
non-adjacent registers, on thumb2 this is not necessary.
Ideally on thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore
as there is not reason to set register hints anymore, changing that is
something for a future patch however.
Differential Revision: http://reviews.llvm.org/D9694
Recommiting after the revert in r238821, the buildbot still failed with
the patch removed so there seems to be another reason for the breakage.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238935 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
But still handle them the same way since I don't know how they differ on
this target.
Of these, /U[qytnms]/ do not have backend tests but are accepted by clang.
No functional change intended.
Reviewers: t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D8203
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238921 91177308-0d34-0410-b5e6-96231b3b80d8
Intel® Memory Protection Extensions (Intel® MPX) is a new feature in Skylake.
It is a part of KNL and SKX sets. It is also a part of Skylake client.
I added definition of %bnd0 - %bnd3 registers, each register is a pair of 64-bit integers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238916 91177308-0d34-0410-b5e6-96231b3b80d8
The windows buildbot originally failed because the check expressions are
evaluated as 64-bit values, even for 32-bit symbols. Fixed this by comparing
bottom 32-bits of the expressions.
The host/target endian mismatch issue is that it's invalid to read/write target
values using a host pointer without taking care of endian differences between
the target and host. Most (if not all) instances of
reinterpret_cast<uint32_t*>() in the RuntimeDyld are examples of this bug.
This has been fixed for Mips using the endian aware read/write functions.
The original commits were:
r238838:
[mips] Add RuntimeDyld tests for currently supported O32 relocations.
Reviewers: petarj, vkalintiris
Reviewed By: vkalintiris
Subscribers: vkalintiris, llvm-commits
Differential Revision: http://reviews.llvm.org/D10126
r238844:
[mips][mcjit] Add support for R_MIPS_PC32.
Summary:
This allows us to resolve relocations for DW_EH_PE_pcrel TType encodings
in the exception handling LSDA.
Also fixed a nearby typo.
Reviewers: petarj, vkalintiris
Reviewed By: vkalintiris
Subscribers: vkalintiris, llvm-commits
Differential Revision: http://reviews.llvm.org/D10127
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238915 91177308-0d34-0410-b5e6-96231b3b80d8
This patch removes the old X86ISD::FSRL op - which allowed float vectors to use the byte right shift operations (causing a domain switch....).
Since the refactoring of the shuffle lowering code this no longer has any use.
Differential Revision: http://reviews.llvm.org/D10169
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238906 91177308-0d34-0410-b5e6-96231b3b80d8
Some temporary symbols are created by MC itself. These symbols are never used
for lookup and are never included in the object symbol table, so we can
avoid creating a name for them.
Other temporaries are created by CodeGen or by the user by explicitly asking
for a name starting with .L (or L on MachO).
These temporaries behave like regular symbols, we just try to avoid including
them in the object symbol table, but sometimes they end up there:
const char *foo() {
return "abc" + 3;
}
will have a relocation pointing to a .L symbol.
It just so happens that almost all MC created temporary has the AlwaysAddSuffix
option and CodeGen/user created ones don't.
One interesting future optimization would be to use unnamed symbols for
all temporaries, but that would require use an st_name of 0 or
having the object writer create the names if a symbol does end up in the
symbol table.
No testcase since this just avoid creating a few extra names for MC created
temporaries.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238887 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Once a gc.statepoint has been rewritten to relocate live references, the
SSA values represent physical pointers instead of logical references.
Logical dereferencability does not imply physical dereferencability and
after RewriteStatepointsForGC has run any attributes that imply
dereferencability of the logical references need to be stripped.
This current approach is conservative, and can be made more precise
later if needed. For starters, we need to strip dereferencable
attributes only from pointers that live in the GC address space.
Reviewers: reames, pgavlin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10105
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238883 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
A later change that has RewriteStatepointsForGC change function
attributes throughout the module depends on this.
Reviewers: reames, pgavlin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10104
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238882 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
LLVM's MI level notion of invariant_load is different from LLVM's IR
level notion of invariant_load with respect to dereferenceability. The
IR notion of invariant_load only guarantees that all *non-faulting*
invariant loads result in the same value. The MI notion of invariant
load guarantees that the load can be legally moved to any location
within its containing function. The MI notion of invariant_load is
stronger than the IR notion of invariant_load -- an MI invariant_load is
an IR invariant_load + a guarantee that the location being loaded from
is dereferenceable throughout the function's lifetime.
Reviewers: hfinkel, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10075
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238881 91177308-0d34-0410-b5e6-96231b3b80d8
If the first character in a metadata attachment's name is a digit, it has
to be output using an escape sequence, otherwise it's not valid text IR.
Removed an over-zealous assert from LLVMContext which didn't allow this.
The rule should only apply to text IR. Actual names can have any sequence
of non-NUL bytes.
Also added some documentation on accepted names.
Bug found with AFL fuzz.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238867 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we have a dedicated type for ELF symbol, these helper functions can
become member function of MCSymbolELF.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238864 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Following on from r209907 which made personality encodings indirect, do the
same for TType encodings. This fixes the case where a try/catch block needs
to generate references to, for example, std::exception in the
.gcc_except_table.
Previous attempts at committing this broke the buildbots due to bugs in IAS.
These bugs have now been fixed so trying again.
Reviewers: petarj
Reviewed By: petarj
Subscribers: srhines, joerg, tberghammer, llvm-commits
Differential Revision: http://reviews.llvm.org/D9669
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238863 91177308-0d34-0410-b5e6-96231b3b80d8
As a follow-up to r235955, actually support up to 65535 arguments in a
subprogram. r235955 missed assembly support, having only tested the new
limit via C++ unit tests. Code patch by Amjad Aboud.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238854 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This allows us to resolve relocations for DW_EH_PE_pcrel TType encodings
in the exception handling LSDA.
Also fixed a nearby typo.
Reviewers: petarj, vkalintiris
Reviewed By: vkalintiris
Subscribers: vkalintiris, llvm-commits
Differential Revision: http://reviews.llvm.org/D10127
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238844 91177308-0d34-0410-b5e6-96231b3b80d8
The first try (r238051) to land this was reverted due to bot failures
that were hopefully addressed by r238788.
This patch adds a TargetRecip class for processing many recip codegen possibilities.
The class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.
The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.
Differential Revision: http://reviews.llvm.org/D8982
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238842 91177308-0d34-0410-b5e6-96231b3b80d8
using lowerVectorShuffleWithSHUFPS() and other shuffle-helpers routines.
Added matching of VALIGN instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238830 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
With this change we are able to realign the stack dynamically, whenever it
contains objects with alignment requirements that are larger than the
alignment specified from the given ABI.
We have to use the $fp register as the frame pointer when we perform
dynamic stack realignment. In complex stack frames, with variably-sized
objects, we reserve additionally the callee-saved register $s7 as the
base pointer in order to reference locals.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8633
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238829 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r238795, as it broke the Thumb2 self-hosting buildbot.
Since self-hosting issues with Clang are hard to investigate, I'm taking the
liberty to revert now, so we can investigate it offline.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238821 91177308-0d34-0410-b5e6-96231b3b80d8
This create a MCSymbolELF class and moves SymbolSize since only ELF
needs a size expression.
This reduces the size of MCSymbol from 56 to 48 bytes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238801 91177308-0d34-0410-b5e6-96231b3b80d8
This allows us to extract version numbers from the environment.
getOSVersion is currently overloaded for that purpose, this allows us to
clean it up.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238796 91177308-0d34-0410-b5e6-96231b3b80d8
The existing code would unnecessarily break LDRD/STRD apart with
non-adjacent registers, on thumb2 this is not necessary.
Ideally on thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore
as there is not reason to set register hints anymore, changing that is
something for a future patch however.
Differential Revision: http://reviews.llvm.org/D9694
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238795 91177308-0d34-0410-b5e6-96231b3b80d8
Previously CCMP/FCCMP instructions were only used by the
AArch64ConditionalCompares pass for control flow. This patch uses them
for SELECT like instructions as well by matching patterns in ISelLowering.
PR20927, rdar://18326194
Differential Revision: http://reviews.llvm.org/D8232
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238793 91177308-0d34-0410-b5e6-96231b3b80d8
LLVMContext. Production builds of clang do not set names on most
Value's, so this is wasted space on almost all subclasses of Value.
This reduces the size of all Value subclasses by 8 bytes on 64 bit
hosts.
The one tricky part of this change is averting compile time regression
by keeping Value::hasName() fast. This required stealing bits out of
NumOperands.
With this change, peak memory usage on verify-uselistorder-nodbg.lto.bc
is decreased by approximately 2.3% (~3MB absolute on my machine).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238791 91177308-0d34-0410-b5e6-96231b3b80d8
If a dead instruction we may not only have a last-use in the main live
range but also in a subregister range if subregisters are tracked. We
need to partially rebuild live ranges in both cases.
The testcase only broke when subregister liveness was enabled. I
commited it in the current form because there is currently no flag to
enable/disable subregister liveness.
This fixes PR23720.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238785 91177308-0d34-0410-b5e6-96231b3b80d8
Start using C++ types such as StringRef and MemoryBuffer in the C++ LTO
API. In doing so, clarify the ownership of the native object file: the caller
now owns it, not the LTOCodeGenerator. The C libLTO library has been modified
to use a derived class of LTOCodeGenerator that owns the object file.
Differential Revision: http://reviews.llvm.org/D10114
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238776 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Implement bswap intrinsic for MIPS FastISel. It's very different for misp32 r1/r2 .
Based on a patch by Reed Kotler.
Test Plan:
bswap1.ll
test-suite
Reviewers: dsanders, rkotler
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D7219
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238760 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Implement the intrinsics memset, memcopy and memmove in MIPS FastISel.
Make some needed infrastructure fixes so that this can work.
Based on a patch by Reed Kotler.
Test Plan:
memtest1.ll
The patch passes test-suite for mips32 r1/r2 and at O0/O2
Reviewers: rkotler, dsanders
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D7158
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238759 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Implement the LLVM assembly urem/srem and sdiv/udiv instructions in MIPS FastISel.
Based on a patch by Reed Kotler.
Test Plan:
srem1.ll
div1.ll
test-suite at O0/O2 for mips32 r1/r2
Reviewers: dsanders, rkotler
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D7028
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238757 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Implement the LLVM IR select statement for MIPS FastISelsel.
Based on a patch by Reed Kotler.
Test Plan:
"Make check" test included now.
Passes test-suite at O2/O0 mips32 r1/r2.
Reviewers: dsanders, rkotler
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D6774
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238756 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The contents of the HI/LO registers are unpredictable after the execution of
the MUL instruction. In addition to implicitly defining these registers in the
MUL instruction definition, we have to mark those registers as dead too.
Without this the fast register allocator is running out of registers when the
MUL instruction is followed by another one that tries to allocate the AC0
register.
Based on a patch by Reed Kotler.
Reviewers: dsanders, rkotler
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D9825
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238755 91177308-0d34-0410-b5e6-96231b3b80d8
According to the TBAA description struct-path tag node can have an optional IsConstant field. Add corresponding argument to MDBuilder::createTBAAStructTagNode.
Reviewed By: hfinkel
Differential Revision: http://reviews.llvm.org/D10160
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238749 91177308-0d34-0410-b5e6-96231b3b80d8
We don't want to bother with creating .sxdata sections on Win64; all the
relevant information is already in the .pdata section.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238730 91177308-0d34-0410-b5e6-96231b3b80d8
This is important because of different addressing modes
depending on the address space for GPU targets.
This only adds the argument, and does not update
any of the uses to provide the correct address space.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238723 91177308-0d34-0410-b5e6-96231b3b80d8
Alternatively, this type could be derived on-demand whenever
getResultElementType is called - if someone thinks that's the better
choice (simple time/space tradeoff), I'm happy to give it a go.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238716 91177308-0d34-0410-b5e6-96231b3b80d8
There is no MCSectionData, so the old name is now meaningless.
Also remove some asserts/checks that were there just because the information
they used was in MCSectionData.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238708 91177308-0d34-0410-b5e6-96231b3b80d8
Unreachable values may use themselves in strange ways due to their
dominance property. Attempting to translate through them can lead to
infinite recursion, crashing LLVM. Instead, claim that we weren't able
to translate the value.
This fixes PR23096.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238702 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a bug in the line info handling in the dwarf code, based on a
problem I when implementing RelocVisitor support for MachO.
Since addr+size will give the first address past the end of the function,
we need to back up one line table entry. Fix this by looking up the
end_addr-1, which is the last address in the range. Note that this also
removes a duplicate output from the llvm-rtdyld line table dump. The
relevant line is the end_sequence one in the line table and has an offset
of the first address part the end of the range and hence should not be
included.
Also factor out the common functionality into a separate function.
This comes up on MachO much more than on ELF, since MachO
doesn't store the symbol size separately, hence making
said situation always occur.
Differential Revision: http://reviews.llvm.org/D9925
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238699 91177308-0d34-0410-b5e6-96231b3b80d8
The original version didn't properly account for the base register
being modified before the final jump, so caused miscompilations in
Chromium and LLVM. I've fixed this and tested with an LLVM self-host
(I don't have the means to build & test Chromium).
The general idea remains the same: in pathological cases jump tables
can be too far away from the instructions referencing them (like other
constants) so they need to be movable.
Should fix PR23627.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238680 91177308-0d34-0410-b5e6-96231b3b80d8
This commit adds partial support for MachO relocations to RelocVisitor.
A simple test case is added to show that relocations are indeed being
applied and that using llvm-dwarfdump on MachO files no longer errors.
Correctness is not yet tested, due to an unrelated bug in DebugInfo,
which will be fixed with appropriate testcase in a followup commit.
Differential Revision: http://reviews.llvm.org/D8148
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238663 91177308-0d34-0410-b5e6-96231b3b80d8
That comment misleads the current discussions in mentioned bug. Leave
the discussions to the bug. Also, adding a future change FIXME.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238653 91177308-0d34-0410-b5e6-96231b3b80d8
best approach of each.
For vNi16, we use SHL + ADD + SRL pattern that seem easily the best.
For vNi32, we use the PUNPCK + PSADBW + PACKUSWB pattern. In some cases
there is a huge improvement with this in IACA's estimated throughput --
over 2x higher throughput!!!! -- but the measurements are too good to be
true. In one narrow case, the SHL + ADD + SHL + ADD + SRL pattern looks
slightly faster, but I'm not sure I believe any of the measurements at
this point. Both are the exact same uops though. Hard to be confident of
anything past that.
If anyone wants to collect very detailed (Agner-level) timings with the
result of this patch, or with the i32 case replaced with SHL + ADD + SHl
+ ADD + SRL, I'd be very interested. Note that you'll need to test it on
both Ivybridge and Haswell, with both SSE3, SSSE3, and AVX selected as
I saw unique behavior in each of these buckets with IACA all of which
should be checked against measured performance.
But this patch is still a useful improvement by dropping duplicate work
and getting the much nicer PSADBW lowering for v2i64.
I'd still like to rephrase this in terms of generic horizontal sum. It's
a bit lame to have a special case of that just for popcount.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238652 91177308-0d34-0410-b5e6-96231b3b80d8
The plan was to move the whole table into the already existing ArchExtNames
but some fields depend on a table-generated file, and we don't yet have this
feature in the generic lib/Support side.
Once the minimum target-specific table-generated files are available in a
generic fashion to these libraries, we'll have to keep it in the ASM parser.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238651 91177308-0d34-0410-b5e6-96231b3b80d8
typeIsConvertibleTo was just calling baseClassOf(this) on the argument passed to it, but there weren't different signatures for baseClassOf so passing 'this' didn't really do anything interesting. typeIsConvertibleTo could have just been a non-virtual method in RecTy. But since that would be kind of a silly method, I instead re-distributed the logic from baseClassOf into typeIsConvertibleTo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238648 91177308-0d34-0410-b5e6-96231b3b80d8
.safeseh adds an entry to the .sxdata section to register all the
appropriate functions which may handle an exception. This entry is not
a relocation to the symbol but instead the symbol table index of the
function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238641 91177308-0d34-0410-b5e6-96231b3b80d8
shorter one. NFC.
In addition to being much shorter to type and requiring fewer arguments,
this change saves over 30 lines from this one file, all wasted on total
boilerplate...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238640 91177308-0d34-0410-b5e6-96231b3b80d8
around a value using its existing SDLoc.
Start using this in just one function to save omg lines of code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238638 91177308-0d34-0410-b5e6-96231b3b80d8
shifting vectors of bytes as x86 doesn't have direct support for that.
This removes a bunch of redundant masking in the generated code for SSE2
and SSE3.
In order to avoid the really significant code size growth this would
have triggered, I also factored the completely repeatative logic for
shifting and masking into two lambdas which in turn makes all of this
much easier to read IMO.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238637 91177308-0d34-0410-b5e6-96231b3b80d8
in-register LUT technique.
Summary:
A description of this technique can be found here:
http://wm.ite.pl/articles/sse-popcount.html
The core of the idea is to use an in-register lookup table and the
PSHUFB instruction to compute the population count for the low and high
nibbles of each byte, and then to use horizontal sums to aggregate these
into vector population counts with wider element types.
On x86 there is an instruction that will directly compute the horizontal
sum for the low 8 and high 8 bytes, giving vNi64 popcount very easily.
Various tricks are used to get vNi32 and vNi16 from the vNi8 that the
LUT computes.
The base implemantion of this, and most of the work, was done by Bruno
in a follow up to D6531. See Bruno's detailed post there for lots of
timing information about these changes.
I have extended Bruno's patch in the following ways:
0) I committed the new tests with baseline sequences so this shows
a diff, and regenerated the tests using the update scripts.
1) Bruno had noticed and mentioned in IRC a redundant mask that
I removed.
2) I introduced a particular optimization for the i32 vector cases where
we use PSHL + PSADBW to compute the the low i32 popcounts, and PSHUFD
+ PSADBW to compute doubled high i32 popcounts. This takes advantage
of the fact that to line up the high i32 popcounts we have to shift
them anyways, and we can shift them by one fewer bit to effectively
divide the count by two. While the PSHUFD based horizontal add is no
faster, it doesn't require registers or load traffic the way a mask
would, and provides more ILP as it happens on different ports with
high throughput.
3) I did some code cleanups throughout to simplify the implementation
logic.
4) I refactored it to continue to use the parallel bitmath lowering when
SSSE3 is not available to preserve the performance of that version on
SSE2 targets where it is still much better than scalarizing as we'll
still do a bitmath implementation of popcount even in scalar code
there.
With #1 and #2 above, I analyzed the result in IACA for sandybridge,
ivybridge, and haswell. In every case I measured, the throughput is the
same or better using the LUT lowering, even v2i64 and v4i64, and even
compared with using the native popcnt instruction! The latency of the
LUT lowering is often higher than the latency of the scalarized popcnt
instruction sequence, but I think those latency measurements are deeply
misleading. Keeping the operation fully in the vector unit and having
many chances for increased throughput seems much more likely to win.
With this, we can lower every integer vector popcount implementation
using the LUT strategy if we have SSSE3 or better (and thus have
PSHUFB). I've updated the operation lowering to reflect this. This also
fixes an issue where we were scalarizing horribly some AVX lowerings.
Finally, there are some remaining cleanups. There is duplication between
the two techniques in how they perform the horizontal sum once the byte
population count is computed. I'm going to factor and merge those two in
a separate follow-up commit.
Differential Revision: http://reviews.llvm.org/D10084
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238636 91177308-0d34-0410-b5e6-96231b3b80d8
a separate routine, generalize it to work for all the integer vector
sizes, and do general code cleanups.
This dramatically improves lowerings of byte and short element vector
popcount, but more importantly it will make the introduction of the
LUT-approach much cleaner.
The biggest cleanup I've done is to just force the legalizer to do the
bitcasting we need. We run these iteratively now and it makes the code
much simpler IMO. Other changes were minor, and mostly naming and
splitting things up in a way that makes it more clear what is going on.
The other significant change is to use a different final horizontal sum
approach. This is the same number of instructions as the old method, but
shifts left instead of right so that we can clear everything but the
final sum with a single shift right. This seems likely better than
a mask which will usually have to read the mask from memory. It is
certaily fewer u-ops. Also, this will be temporary. This and the LUT
approach share the need of horizontal adds to finish the computation,
and we have more clever approaches than this one that I'll switch over
to.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238635 91177308-0d34-0410-b5e6-96231b3b80d8
r238503 fixed the problem of too-small shift types by promoting them
during legalization, but the correct solution is to promote only the
operands that actually demand promotion.
This fixes a crash on an out-of-tree target caused by trying to
promote an operand that can't be promoted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238632 91177308-0d34-0410-b5e6-96231b3b80d8
It turns out that _except_handler3 and _except_handler4 really use the
same stack allocation layout, at least today. They just make different
choices about encoding the LSDA.
This is in preparation for lowering the llvm.eh.exceptioninfo().
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238627 91177308-0d34-0410-b5e6-96231b3b80d8
The value in 'ebp' acts as an implicit argument to the outlined
handlers, and is recovered with frameaddress(1).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238619 91177308-0d34-0410-b5e6-96231b3b80d8
If the type isn't trivially moveable emplace can skip a potentially
expensive move. It also saves a couple of characters.
Call sites were found with the ASTMatcher + some semi-automated cleanup.
memberCallExpr(
argumentCountIs(1), callee(methodDecl(hasName("push_back"))),
on(hasType(recordDecl(has(namedDecl(hasName("emplace_back")))))),
hasArgument(0, bindTemporaryExpr(
hasType(recordDecl(hasNonTrivialDestructor())),
has(constructExpr()))),
unless(isInTemplateInstantiation()))
No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238602 91177308-0d34-0410-b5e6-96231b3b80d8
For some history here see the commit messages of r199797 and r169060.
The original intent was to fix cases like:
%EAX<def> = COPY %ECX<kill>, %RAX<imp-def>
%RCX<def> = COPY %RAX<kill>
where simply removing the copies would have RCX undefined as in terms of
machine operands only the ECX part of it is defined. The machine
verifier would complain about this so 169060 changed such COPY
instructions into KILL instructions so some super-register imp-defs
would be preserved. In r199797 it was finally decided to always do this
regardless of super-register defs.
But this is wrong, consider:
R1 = COPY R0
...
R0 = COPY R1
getting changed to:
R1 = KILL R0
...
R0 = KILL R1
It now looks like R0 dies at the first KILL and won't be alive until the
second KILL, while in reality R0 is alive and must not change in this
part of the program.
As this only happens after register allocation there is not much code
still performing liveness queries so the issue was not noticed. In fact
I didn't manage to create a testcase for this, without unrelated changes
I am working on at the moment.
The fix is simple: As of r223896 the MachineVerifier allows reads from
partially defined registers, so the whole transforming COPY->KILL thing
is not necessary anymore. This patch also changes a similar (but more
benign case as the def and src are the same register) case in the
VirtRegRewriter.
Differential Revision: http://reviews.llvm.org/D10117
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238588 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We would wrap flow mappings and sequences when they go over a hardcoded 70
characters limit. Make the wrapping column configurable (and default to 70
co the change should be NFC for current users). Passing 0 allows to completely
suppress the wrapping which makes it easier to handle in tools like FileCheck.
Reviewers: bogner
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10109
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238584 91177308-0d34-0410-b5e6-96231b3b80d8
Symbols are no longer required to be named, but this leads to a crash here if an
unnamed symbol checks that its first character is '$'.
Change the code to first check for a name, then check its first character.
No test case i'm afraid as this is debugging code, but any test case with temp labels
and 'llc --debug --filetype=obj' would have crashed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238579 91177308-0d34-0410-b5e6-96231b3b80d8
This patch corresponds to review:
http://reviews.llvm.org/D9941
It adds the various FMA instructions introduced in the version 2.07 of
the ISA along with the testing for them. These are operations on single
precision scalar values in VSX registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238578 91177308-0d34-0410-b5e6-96231b3b80d8
This commit translates the line and column numbers for LLVM IR
errors from the numbers in the YAML block scalar to the numbers
in the MIR file so that the MIRParser users can report LLVM IR
errors with the correct line and column numbers.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D10108
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238576 91177308-0d34-0410-b5e6-96231b3b80d8
Small (really small!) C++ exception handling examples work on 32-bit x86
now.
This change disables the use of .seh_* directives in WinException when
CFI is not in use. It also uses absolute symbol references in the tables
instead of imagerel32 relocations.
Also fixes a cache invalidation bug in MMI personality classification.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238575 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In continuation to an earlier commit to DependenceAnalysis.cpp by jingyue (r222100), the type for all subscripts in a coupled group need to be the same since constraints from one subscript may be propagated to another during testing. During testing, new SCEVs may be created and the operands for these need to be the same.
This patch extends unifySubscriptType() to work on lists of subscript pairs, ensuring a common extended type for all of them.
Test Plan:
Added a test case to NonCanonicalizedSubscript.ll which causes dependence analysis to crash without this fix.
All regression tests pass.
Reviewers: spop, sebpop, jingyue
Reviewed By: jingyue
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9698
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238573 91177308-0d34-0410-b5e6-96231b3b80d8
And with that simplify the logic for inserting them in ExternalSymbolData or
LocalSymbolData.
No functionality change overall since the old code avoided the isLocal bug.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238555 91177308-0d34-0410-b5e6-96231b3b80d8
MIOperands/ConstMIOperands are classes iterating over the MachineOperand
of a MachineInstr, however MachineInstr::mop_iterator does the same
thing.
I assume these two iterators exist to have a uniform interface to
iterate over the operands of a machine instruction bundle and a single
machine instruction. However in practice I find it more confusing to have 2
different iterator classes, so this patch transforms (nearly all) the
code to use mop_iterators.
The only exception being MIOperands::anlayzePhysReg() and
MIOperands::analyzeVirtReg() still needing an equivalent, I leave that
as an exercise for the next patch.
Differential Revision: http://reviews.llvm.org/D9932
This version is slightly modified from the proposed revision in that it
introduces MachineInstr::getOperandNo to avoid the extra counting
variable in the few loops that previously used MIOperands::getOperandNo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238539 91177308-0d34-0410-b5e6-96231b3b80d8
About pristine regsiters:
Pristine registers "hold a value that is useless to the current
function, but that must be preserved - they are callee saved registers
that have not been saved." This concept saves compile time as it frees
the prologue/epilogue inserter from adding every such register to every
basic blocks live-in list.
However the current code in getPristineRegs is formulated in a
complicated way: Inside the function prologue and epilogue all callee
saves are considered pristine, while in the rest of the code only the
non-saved ones are considered pristine. This requires logic to
differentiate between prologue/epilogue and the rest and in the presence
of shrink-wrapping this even becomes complicated/expensive. It's also
unnecessary because the prologue epilogue inserters already mark
callee-save registers that are saved/restores properly in the respective
blocks in the prologue/epilogue (see updateLiveness() in
PrologueEpilogueInserter.cpp). So only declaring non-saved/restored
callee saved registers as pristine just works.
Differential Revision: http://reviews.llvm.org/D10101
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238524 91177308-0d34-0410-b5e6-96231b3b80d8
This is in preparation for reusing this for 32-bit x86 EH table
emission. Also updates the type name for consistency. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238521 91177308-0d34-0410-b5e6-96231b3b80d8
This commit introduces a serializable structure called
'llvm::yaml::MachineFunction' that stores the machine
function's name. This structure will mirror the machine
function's state in the future.
This commit prints machine functions as YAML documents
containing a YAML mapping that stores the state of a machine
function. This commit also parses the YAML documents
that contain the machine functions.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D9841
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238519 91177308-0d34-0410-b5e6-96231b3b80d8
This moves all the state numbering code for C++ EH to WinEHPrepare so
that we can call it from the X86 state numbering IR pass that runs
before isel.
Now we just call the same state numbering machinery and insert a bunch
of stores. It also populates MachineModuleInfo with information about
the current function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238514 91177308-0d34-0410-b5e6-96231b3b80d8
ELF has no restrictions on where undefined symbols go relative to other defined
symbols. In fact, gas just sorts them together. Do the same.
This was there since r111174 probably just because the MachO writer has it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238513 91177308-0d34-0410-b5e6-96231b3b80d8
The patch evaluates the expansion cost of exitValue in indVarSimplify pass, and only does the rewriting when the expansion cost is low or loop can be deleted with the rewriting. It provides an option "-replexitval=" to control the default aggressiveness of the exitvalue rewriting. It also fixes some missing cases in SCEVExpander::isHighCostExpansionHelper to enhance the evaluation of SCEV expansion cost.
Differential Revision: http://reviews.llvm.org/D9800
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238507 91177308-0d34-0410-b5e6-96231b3b80d8
The shift amount may be too small to cope with promoted left hand side,
make sure to promote it as well.
This fixes PR23664.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238503 91177308-0d34-0410-b5e6-96231b3b80d8
For x86 targets, do not do sibling call optimization when materializing
the callee's address would require a GOT relocation. We can still do
tail calls to internal functions, hidden functions, and protected
functions, because they do not require this kind of relocation. It is
still possible to get GOT relocations when the user explicitly asks for
it with musttail or -tailcallopt, both of which are supposed to
guarantee TCO.
Based on a patch by Chih-hung Hsieh.
Reviewers: srhines, timmurray, danalbert, enh, void, nadav, rnk
Subscribers: joerg, davidxl, llvm-commits
Differential Revision: http://reviews.llvm.org/D9799
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238487 91177308-0d34-0410-b5e6-96231b3b80d8
It caused a smaller number of failures than the previous attempt at committing but still caused a couple on the llvm-linux-mips builder. Reverting while I investigate the remainder.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238483 91177308-0d34-0410-b5e6-96231b3b80d8
We were previously codegen'ing these as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach
the register allocator to allocate the registers in ascending order. This
never got implemented, and up to now we've been stuck with very poor codegen.
A much simpler approach for achiveing better codegen is to create LDM/STM
instructions with identical sets of virtual registers, let the register
allocator pick arbitrary registers and order register lists when printing an
MCInst. This approach also avoids the need to repeatedly calculate offsets
which ultimately ought to be eliminated pre-RA in order to decrease register
pressure.
This is implemented by lowering the memcpy intrinsic to a series of SD-only
MCOPY pseudo-instructions which performs a memory copy using a given number
of registers. During SD->MI lowering, we lower MCOPY to LDM/STM. This is a
little unusual, but it avoids the need to encode register lists in the SD,
and we can take advantage of SD use lists to decide whether to use the _UPD
variant of the instructions.
Fixes PR9199.
Differential Revision: http://reviews.llvm.org/D9508
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238473 91177308-0d34-0410-b5e6-96231b3b80d8
BranchProbabilityInfo was leaking 3MB of memory when running 'opt -O2 verify-uselistorder.lto.bc'. This was due to the Weights member not being cleared once the pass is no longer needed.
This adds the releaseMemory override to clear that field. The other fields are cleared at the end of runOnFunction so can stay there.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238462 91177308-0d34-0410-b5e6-96231b3b80d8