The integrated assembler fails to properly lex arm comments when
they are adjacent to an identifier in the input stream. The reason
is that the arm comment symbol '@' is also used as symbol variant in
other assembly languages so when lexing an identifier it allows the
'@' symbol as part of the identifier.
Example:
$ cat comment.s
foo:
add r0, r0@got to parse this as a comment
$ llvm-mc -triple armv7 comment.s
comment.s:4:18: error: unexpected token in argument list
add r0, r0@got to parse this as a comment
^
This should be parsed as correctly as `add r0, r0`.
This commit modifes the assembly lexer to not include the '@' symbol
in identifiers when lexing for targets that use '@' for comments.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196607 91177308-0d34-0410-b5e6-96231b3b80d8
This more accurately represents the actual walk - pubnames/pubtypes are
emitted into the .o, not the .dwo, and reference the skeletons not the
full units.
Use the newly established ID->index invariant to lookup the underlying
full unit to retrieve its public names and types.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196601 91177308-0d34-0410-b5e6-96231b3b80d8
This simplifies reasoning about the code and enables simple navigation
from a skeleton to its full unit. (currently there are no type unit
skeletons, so the skeleton list doesn't have the same ID == index
property)
Eventually we should get rid of this ID and just store the labels we
need as the IDs are allowing this code to create difficult to
manage/understand associations (loops over non-skeletal units are
implicitly referencing their skeletal units during pub* emission, for
example). It may be necessary to have some kind of skeleton->full unit
association and a more direct pointer or similar device would be
preferable than an index.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196600 91177308-0d34-0410-b5e6-96231b3b80d8
The current peephole optimizing for compare inst assumes an instr that
uses CPSR has an MO for ARM Cond code.However, for VSEL instructions
(vseqeq, vselgt, vselgt, vselvs), there is no such operand nor do
they support the modification of Cond Code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196588 91177308-0d34-0410-b5e6-96231b3b80d8
Since z has no setcc instruction as such, the choice of setBooleanContents
is a bit arbitrary. Currently it's set to ZeroOrOneBooleanContent,
so we produced a branch-free form when selecting between 0 and 1,
but not when selecting between 0 and -1. This patch handles the latter
case too.
At some point I'd like to measure whether it's better to use conditional
moves for constant selects on z196, but that's future work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196578 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Rewrite asan's stack frame layout.
First, most of the stack layout logic is moved into a separte file
to make it more testable and (potentially) useful for other projects.
Second, make the frames more compact by using adaptive redzones
(smaller for small objects, larger for large objects).
Third, try to minimized gaps due to large alignments (this is hypothetical since
today we don't see many stack vars aligned by more than 32).
The frames indeed become more compact, but I'll still need to run more benchmarks
before committing, but I am sking for review now to get early feedback.
This change will be accompanied by a trivial change in compiler-rt tests
to match the new frame sizes.
Reviewers: samsonov, dvyukov
Reviewed By: samsonov
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2324
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196568 91177308-0d34-0410-b5e6-96231b3b80d8
Not only does it trigger -Wparentheses, I think the assert actually
relies on incorrect operator precedence.
Also, the grammar as questionable, but I might not know enough about the
problem at hand.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196567 91177308-0d34-0410-b5e6-96231b3b80d8
The intended behaviour is to force vectorization on the presence
of the flag (either turn on or off), and to continue the behaviour
as expected in its absence. Tests were added to make sure the all
cases are covered in opt. No tests were added in other tools with
the assumption that they should use the PassManagerBuilder in the
same way.
This patch also removes the outdated -late-vectorize flag, which was
on by default and not helping much.
The pragma metadata is being attached to the same place as other loop
metadata, but nothing forbids one from attaching it to a function
(to enable #pragma optimize) or basic blocks (to hint the basic-block
vectorizers), etc. The logic should be the same all around.
Patches to Clang to produce the metadata will be produced after the
initial implementation is agreed upon and committed. Patches to other
vectorizers (such as SLP and BB) will be added once we're happy with
the pass manager changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196537 91177308-0d34-0410-b5e6-96231b3b80d8
There is no reason to use std::deque here over std::vector. Thus given the
performance differences inbetween the two it makes sense to change deque to
vector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196524 91177308-0d34-0410-b5e6-96231b3b80d8
We use CSEBlocks to initialize a worklist:
SmallVector<BasicBlock *, 8> CSEWorkList(CSEBlocks.begin(), CSEBlocks.end());
so it must have a deterministic order.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196520 91177308-0d34-0410-b5e6-96231b3b80d8
This allows a target to use MI-Sched as an in-order scheduler that
will model strict resource conflicts without defining a processor
itinerary. Instead, the target can now use the new per-operand machine
model and define in-order resources with BufferSize=0. For example,
this would allow restricting the type of operations that can be formed
into a dispatch group. (Normally NumMicroOps is sufficient to enforce
dispatch groups).
If the intent is to model latency in in-order pipeline, as opposed to
resource conflicts, then a resource with BufferSize=1 should be
defined instead.
This feature is only casually tested as there are no in-tree targets
using it yet. However, Hal will be experimenting with POWER7.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196517 91177308-0d34-0410-b5e6-96231b3b80d8
We were creating external uses for scalar values in MustGather entries that also
had a ScalarToTreeEntry (they also are present in a vectorized tuple). This
meant we would keep a value 'alive' as a scalar and vectorized causing havoc.
This is not necessary because when we create a MustGather vector we explicitly
create external uses entries for the insertelement instructions of the
MustGather vector elements.
Fixes PR18129.
radar://15582184
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196508 91177308-0d34-0410-b5e6-96231b3b80d8
in case the operands are constants and its difference is |1|.
It should be possible in those cases to rematerialize the result using
MIPS's slt and similar instructions.
The small update to some of the tests in cmov.ll, sel1c.ll and sel2c.ll was needed
otherwise the optimization implemented in this patch would have been triggered
(difference between the operands was 1) and that would have changed the semantic
of the tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196498 91177308-0d34-0410-b5e6-96231b3b80d8
The structure of the code was slightly modified so that the next patch is easier to read/review.
No functional changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196496 91177308-0d34-0410-b5e6-96231b3b80d8
not being correctly encoded/decoded.
In more detail, immediate fields of LD/ST instructions should be
divided/multiplied by the size of the data format before encoding and
after decoding, respectively.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196494 91177308-0d34-0410-b5e6-96231b3b80d8
We were trying to fold the stack adjustment into the wrong instruction in the
situation where the entire basic-block was epilogue code. Really, it can only
ever be valid to do the folding precisely where the "add sp, ..." would be
placed so there's no need for a separate iterator to track that.
Should fix PR18136.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196493 91177308-0d34-0410-b5e6-96231b3b80d8
getSymbolWithGlobalValueBase use is to create a name of a new symbol based
on the name of an existing GV. Assert that and then remove the last call
to pass true to isImplicitlyPrivate.
This gives the mangler API a 1:1 mapping from GV to names, which is what we
need to drop the mangler dependency on the target (and use an extended
datalayout instead).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196472 91177308-0d34-0410-b5e6-96231b3b80d8
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196471 91177308-0d34-0410-b5e6-96231b3b80d8
given
declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1)
declare void @foo()
define void @bar() {
call void @foo()
call void @llvm.memset.p0i8.i32(i8* null, i8 0, i32 188, i32 1, i1 false)
ret void
}
We used to produce
L_foo$stub:
.indirect_symbol _foo
.ascii "\364\364\364\364\364"
_memset$stub:
.indirect_symbol _memset
.ascii "\364\364\364\364\364"
We not produce a private stub for memset too.
Stubs are not needed with recent linkers, but we still produce them for darwin8.
Thanks to David Fang for confirming that gcc used to do this too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196468 91177308-0d34-0410-b5e6-96231b3b80d8
This just extends the existing hack. It should be enough to get a reproducible bootstrap
on 32 bits.
I will open a bug to track getting a real fix for this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196462 91177308-0d34-0410-b5e6-96231b3b80d8
DIEs already contain references directly to their DIEAbbrev, use that
instead of looking it up based on index.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196446 91177308-0d34-0410-b5e6-96231b3b80d8
ELF_Other_Weakref and ELF_Other_ThumbFunc seems to be LLVM
internal ELF symbol flags. These should not be emitted to
object file.
This commit defines ELF_STO_Shift for the target-defined
flags for st_other, and increase the value of
ELF_Other_Shift to 16.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196440 91177308-0d34-0410-b5e6-96231b3b80d8
Where it would use a scattered relocation entry but falls back to a
normal relocation entry because the FixupOffset is more than 24-bits.
The bug is in the X86MachObjectWriter::RecordScatteredRelocation() where
it changes reference parameter FixedValue but then returns false to indicate
it did not create a scattered relocation entry. The fix is simply to save the
original value of the parameter FixedValue at the start of the method and
restore it if we are returning false in that case.
rdar://15526046
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196432 91177308-0d34-0410-b5e6-96231b3b80d8
ARM symbol variants are written with parens instead of @ like this:
.word __GLOBAL_I_a(target1)
This commit adds support for parsing these symbol variants in
expressions. We introduce a new flag to MCAsmInfo that indicates the
parser should use parens to parse the symbol variant. The expression
parser is modified to look for symbol variants using parens instead
of @ when the corresponding MCAsmInfo flag is true.
The MCAsmInfo parens flag is enabled only for ARM on ELF.
By adding this flag to MCAsmInfo, we are able to get rid of
redundant ARM-specific symbol variants and use the generic variants
instead (e.g. VK_GOT instead of VK_ARM_GOT). We use the new
UseParensForSymbolVariant attribute in MCAsmInfo to correctly print
the symbol variants for arm.
To achive this we need to keep a handle to the MCAsmInfo in the
MCSymbolRefExpr class that we can check when printing the symbol
variant.
Updated Tests:
Changed case of symbol variant to match the generic kind.
test/CodeGen/ARM/tls-models.ll
test/CodeGen/ARM/tls1.ll
test/CodeGen/ARM/tls2.ll
test/CodeGen/Thumb2/tls1.ll
test/CodeGen/Thumb2/tls2.ll
PR18080
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196424 91177308-0d34-0410-b5e6-96231b3b80d8
While we still have a few (~4) non-trivial comments with string
concatenation, etc that should remain conditionalized, these trivial
literal comments can be simplified.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196416 91177308-0d34-0410-b5e6-96231b3b80d8
Since we always emit only one abbrevation section (shared by all the
compilation units in this module) there's no need for a separate label
at the start of each one (and we weren't using the CU ID anyway, so
there really was only one label). Use the section label instead and drop
the wholely unused debug_abbrev_end label.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196394 91177308-0d34-0410-b5e6-96231b3b80d8
Instead, reuse the same MCSymbol - this should make the code easier to
follow by avoiding hard to trace dependencies between different bits of
code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196392 91177308-0d34-0410-b5e6-96231b3b80d8
This currently breaks clang/test/CodeGen/code-coverage.c. The root cause
is that the newly introduced access to Funcs[j] is out of bounds.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196365 91177308-0d34-0410-b5e6-96231b3b80d8
Added additional checks for the Identifier, CfgChecksum and Name for
each GCOVFunction. Also added function names in error messages.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196356 91177308-0d34-0410-b5e6-96231b3b80d8
This splits the file-scope read() function into readGCNO() and
readGCDA(). Also broke file format read into functions that first read
the file type, then check the version.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196353 91177308-0d34-0410-b5e6-96231b3b80d8
this completes the basic port of ARM constant islands to Mips16.
More testing, code review, cleanup is in order but basically everything
seems to be working. A bug in gas is preventing some of the runtime
testing but I hope to resolve this soon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196331 91177308-0d34-0410-b5e6-96231b3b80d8
Unlike msvc, when handling a thiscall + sret gcc will
* Put the sret in %ecx
* Put the this pointer is (%esp)
This fixes, for example, calling stringstream::str.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196312 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a logic bug pointed out by Juraj Ivancic.
No behavior change because none of the in-tree clients of
cl::ExpandResponseFiles check the return value. In this case, the
@prefixed arguments are left in the command line. Downstream command
line processing has the opportunity to emit errors about it, so this
isn't that bad.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196306 91177308-0d34-0410-b5e6-96231b3b80d8
The backend converts 64-bit ORs into subreg moves if the upper 32 bits
of one operand and the low 32 bits of the other are known to be zero.
It then tries to peel away redundant ANDs from the upper 32 bits.
Since AND masks are canonicalized to exclude known-zero bits,
the test ORs the mask and the known-zero bits together before
checking for redundancy. The problem was that it was using the
wrong node when checking for known-zero bits, so could drop ANDs
that were still needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196267 91177308-0d34-0410-b5e6-96231b3b80d8
- The fix to PR17631 fixes part of the cases where 'vzeroupper' should
not be issued before 'call' insn. There're other cases where helper
calls will be inserted not limited to epilog. These helper calls do
not follow the standard calling convention and won't clobber any YMM
registers. (So far, all call conventions will clobber any or part of
YMM registers.)
This patch enhances the previous fix to cover more cases 'vzerosupper' should
not be inserted by checking if that function call won't clobber any YMM
registers and skipping it if so.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196261 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of asking the user to specify a single file to output coverage
info and defaulting to STDOUT, llvm-cov now creates files for each
source file with a naming system of: <source filename> + ".llcov".
This is what gcov does and although it can clutter the working directory
with numerous coverage files, it will be easier to hook the llvm-cov
output to tools which operate on this assumption (such as lcov).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196184 91177308-0d34-0410-b5e6-96231b3b80d8
This is useful for debugging issues in the BlockFrequency implementation
since one can easily visualize where probability mass and other errors
occur in the propagation.
This is the MI version of r194654.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196183 91177308-0d34-0410-b5e6-96231b3b80d8
and emitted per function and CU. Begins coalescing ranges as a first
class entity through debug info. No functional change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196178 91177308-0d34-0410-b5e6-96231b3b80d8
Each line stores all the blocks that execute on that line, instead of
only storing the line counts previously accumulated. This provides more
information for each line, and will be useful for options in enabling
block and branch information.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196177 91177308-0d34-0410-b5e6-96231b3b80d8
Added GCOVEdge which are simple structs owned by the GCOVFunction that
stores the source and destination GCOVBlocks, as well as the counts.
Changed GCOVBlocks so that it stores a vector of source GCOVEdges and a
vector of destination GCOVEdges, rather than just the block number.
Storing the block number was only useful for knowing the number of edges
and for debug info. Using a struct is useful for traversing the edges,
especially back edges which may be needed later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196175 91177308-0d34-0410-b5e6-96231b3b80d8
PPCScoreboardHazardRecognizer was a subclass of ScoreboardHazardRecognizer
which did only one thing: filtered out nodes in EmitInstruction for which
DAG->getInstrDesc(SU) returned NULL. This used to be the case for PPC pseudo
instructions. As far as I can tell, this is no longer true, and so we can use
ScoreboardHazardRecognizer directly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196171 91177308-0d34-0410-b5e6-96231b3b80d8
Add a helper function getDebugInfoVersionFromModule to return the debug info
version number for a module.
"Verifier/module-flags-1.ll" checks for verification errors.
It will seg fault when calling getDebugInfoVersionFromModule because of the
incorrect format for module flags in the testing case. We make
getModuleFlagsMetadata more robust by checking for error conditions.
PR17982
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196158 91177308-0d34-0410-b5e6-96231b3b80d8
Header/cpp file rename to follow immediately - just splitting out the
commits for ease of review/reading to demonstrate that the renaming
changes are entirely mechanical.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196139 91177308-0d34-0410-b5e6-96231b3b80d8
MO_JumpTableIndex and MO_ExternalSymbol don't show up on inline asm.
Keeping parts of the old asm printer just to print inline asm to a string that
we then parse back looks like a hack.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196111 91177308-0d34-0410-b5e6-96231b3b80d8
eliminateFrameIndex() has been reworked to handle both small & large frames
with either a FP or SP.
An additional Slot is required for Scavenging spills when not using FP for large frames.
Reworked the handling of Register Scavenging.
Whether we are using an FP or not, whether it is a large frame or not,
and whether we are using a large code model or not are now independent.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196091 91177308-0d34-0410-b5e6-96231b3b80d8
These are used by MachO only at the moment, and (much like the existing
MOVW/MOVT set) work around the fact that the labels used in the actual
instructions often contain PC-dependent components, which means that repeatedly
materialising the same global can't be CSEed.
With small modifications, it could be adapted to how ELF finds the address of
_GLOBAL_OFFSET_TABLE_, which would give similar benefits in PIC mode there.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196090 91177308-0d34-0410-b5e6-96231b3b80d8
When using large code model:
Global objects larger than 'CodeModelLargeSize' bytes are placed in sections named with a trailing ".large"
The folded global address of such objects are lowered into the const pool.
During inspection it was noted that LowerConstantPool() was using a default offset of zero.
A fix was made, but due to only offsets of zero being generated, testing only verifies the change is not detrimental.
Correct the flags emitted for explicitly specified sections.
We assume the size of the object queried by getSectionForConstant() is never greater than CodeModelLargeSize.
To handle greater than CodeModelLargeSize, changes to AsmPrinter would be required.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196087 91177308-0d34-0410-b5e6-96231b3b80d8
Large frame offsets are loaded from the ConstantPool.
Where possible, offsets are encoded using the smaller MKMSK instruction.
Large frame offsets can only be used when there is a frame-pointer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196085 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, we clobbered callee-saved registers when folding an "add
sp, #N" into a "pop {rD, ...}" instruction. This change checks whether
a register we're going to add to the "pop" could actually be live
outside the function before doing so and should fix the issue.
This should fix PR18081.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196046 91177308-0d34-0410-b5e6-96231b3b80d8
- Actually abort when an error occurred.
- Check that the frontend lookup worked when parsing length/size/type operators.
Tested by a clang test. PR18096.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196044 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a scheduling model for the POWER7 (P7) core, and enables the
machine-instruction scheduler when targeting the P7. Scheduling for the P7,
like earlier ooo PPC cores, requires considering both dispatch group hazards,
and functional unit resources and latencies. These are both modeled in a
combined itinerary. Dispatch group formation is still handled by the post-RA
scheduler (which still needs to be updated for the P7, but nevertheless does a
pretty good job).
One interesting aspect of this change is that I've also enabled to use of AA
duing CodeGen for the P7 (just as it is for the embedded cores). The benchmark
results seem to support this decision (see below), and while this is normally
useful for in-order cores, and not for ooo cores like the P7, I think that the
dispatch slot hazards are enough like in-order resources to make the AA useful.
Test suite significant performance differences (where negative is a speedup,
and positive is a regression) vs. the current situation:
MultiSource/Benchmarks/BitBench/drop3/drop3
with AA: N/A
without AA: -28.7614% +/- 19.8356%
(significantly against AA)
MultiSource/Benchmarks/FreeBench/neural/neural
with AA: -17.7406% +/- 11.2712%
without AA: N/A
(significantly in favor of AA)
MultiSource/Benchmarks/SciMark2-C/scimark2
with AA: -11.2079% +/- 1.80543%
without AA: -11.3263% +/- 2.79651%
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt
with AA: -41.8649% +/- 17.0053%
without AA: -34.5256% +/- 23.7072%
MultiSource/Benchmarks/mafft/pairlocalalign
with AA: 25.3016% +/- 17.8614%
without AA: 38.6629% +/- 14.9391%
(significantly in favor of AA)
MultiSource/Benchmarks/sim/sim
with AA: N/A
without AA: 13.4844% +/- 7.18195%
(significantly in favor of AA)
SingleSource/Benchmarks/BenchmarkGame/Large/fasta
with AA: 15.0664% +/- 6.70216%
without AA: 12.7747% +/- 8.43043%
SingleSource/Benchmarks/BenchmarkGame/puzzle
with AA: 82.2713% +/- 26.3567%
without AA: 75.7525% +/- 41.1842%
SingleSource/Benchmarks/Misc/flops-2
with AA: -37.1621% +/- 20.7964%
without AA: -35.2342% +/- 20.2999%
(significantly in favor of AA)
These are 99.5% confidence intervals from 5 runs per configuration. Regarding
the choice to turn on AA during CodeGen, of these results, four seem
significantly in favor of using AA, and one seems significantly against. I'm
not making this decision based on these numbers alone, but these results
seem consistent with results I have from other tests, and so I think that, on
balance, using AA is a win.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195981 91177308-0d34-0410-b5e6-96231b3b80d8
In preparation for adding scheduling definitions for the POWER7, split some PPC
itinerary classes so that the P7's latencies and hazards can be better
described. For the most part, this means differentiating indexed from non-index
pre-increment loads and stores. Also, differentiate single from
double-precision sqrt.
No functionality change intended (except for a more-specific latency for
single-precision sqrt on the A2).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195980 91177308-0d34-0410-b5e6-96231b3b80d8
This prevents the compiler from emitting invalid ld.[bhwd]'s and st.[bhwd]'s
when the stack frame is between 512 and 32,768 bytes in size.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195973 91177308-0d34-0410-b5e6-96231b3b80d8
in constant islands for Mips16. We introdcuce JalB16 as a synomnym
for Jal16. It makes it easier to read and is also necessary because
Jal16 is a call instruction but JalB16 is being used as a branch.
Various parts of LLVM will not work properly even in this late stage of
the backend if we use what was declared as a call instruction to function
as a branch. For one, basic block labels may not get emitted in some
situations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195968 91177308-0d34-0410-b5e6-96231b3b80d8
Some of the older PPC processor definitions don't have associated
SchedMachineModels; correct this for the PPC440.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195949 91177308-0d34-0410-b5e6-96231b3b80d8
The operand latencies for loads and stores in the PPC440 itinerary were wrong
(the store operands are all inputs, and the "with update" (pre-increment)
instructions need a latency for the additional output).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195948 91177308-0d34-0410-b5e6-96231b3b80d8
The operand latencies for the PPC440 should be specified relative to dispatch,
not relative to the initial fetch-and-decode stages. Because most instructions
(ignoring bypass) wait in dispatch until their operands are ready, this is
modeled as reading input operands "at dispatch" (0 cycles after issue), and so
every input and output operand has 4 cycles subtracted from it.
This could alter scheduling slightly, but I don't expect a large effect.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195947 91177308-0d34-0410-b5e6-96231b3b80d8
Modeling the fetch and decode units in the PPC440 itinerary does not add
anything to the hazard detection capability (and so modeling them just wastes
compile time).
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195946 91177308-0d34-0410-b5e6-96231b3b80d8
target independent.
Most of the x86 specific stackmap/patchpoint handling was necessitated by the
use of the native address-mode format for frame index operands. PEI has now
been modified to treat stackmap/patchpoint similarly to DEBUG_INFO, allowing
us to use a simple, platform independent register/offset pair for frame
indexes on stackmap/patchpoints.
Notes:
- Folding is now platform independent and automatically supported.
- Emiting patchpoints with direct memory references now just involves calling
the TargetLoweringBase::emitPatchPoint utility method from the target's
XXXTargetLowering::EmitInstrWithCustomInserter method. (See
X86TargetLowering for an example).
- No more ugly platform-specific operand parsers.
This patch shouldn't change the generated output for X86.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195944 91177308-0d34-0410-b5e6-96231b3b80d8
I think, in principle, intrinsics_gen may be added explicitly.
That said, it can be added incidentally, since each target already has dependencies to llvm-tblgen.
Almost all source files depend on both CommonTaleGen and intrinsics_gen.
Explicit add_dependencies() have been pruned under lib/Target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195929 91177308-0d34-0410-b5e6-96231b3b80d8
add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS.
LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195927 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of sharing functional unit names between the various PPC itineraries,
give each core its own unit names prefixed with the core name. This follows
the convention used by other backends (such as ARM), and removes a non-obvious
ordering dependency between the various PPCSchedule*.td files.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195908 91177308-0d34-0410-b5e6-96231b3b80d8
conditional branches for very large targets. That will be the next small
patch. Everything now should in principle work as good (functionality
wise) as without constant islands so we decided at Mips/Imagination to
make constant islands the default for Mips16 now so that it will get
excercised a lot and this port is still experimentatl though hopefully soon
we will change the status. Some more cleanup and code review is in order
but things are converging fast.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195902 91177308-0d34-0410-b5e6-96231b3b80d8
ARanges included even extern variables referenced by pointer non-type
template parameters even though that variable isn't part of this
compilation unit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195895 91177308-0d34-0410-b5e6-96231b3b80d8
make PIC calls a little more efficient:
1. Remove instructions setting up $gp if it is known that a function has been
called at least once.
2. Save the address of a called function in a register instead of loading
it from the GOT at every call site.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195892 91177308-0d34-0410-b5e6-96231b3b80d8
This adds the IIC_ prefix to the instruction itinerary class names, giving the
PPC backend a naming convention for itinerary classes that is more consistent
with that used by the X86 and ARM backends.
Instruction scheduling in the PPC backend needs a bunch of cleanup and
improvement (especially for the ooo cores). This is just a preliminary step.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195890 91177308-0d34-0410-b5e6-96231b3b80d8
SGPRs are spilled into VGPRs using the {READ,WRITE}LANE_B32 instructions.
v2:
- Fix encoding of Lane Mask
- Use correct register flags, so we don't overwrite the low dword
when restoring multi-dword registers.
v3:
- Register spilling seems to hang the GPU, so replace all shaders
that need spilling with a dummy shader.
v4:
- Fix *LANE definitions
- Change destination reg class for 32-bit SMRD instructions
v5:
- Remove small optimization that was crashing Serious Sam 3.
https://bugs.freedesktop.org/show_bug.cgi?id=68224https://bugs.freedesktop.org/show_bug.cgi?id=71285
NOTE: This is a candidate for the 3.4 branch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195880 91177308-0d34-0410-b5e6-96231b3b80d8
Writing to the M0 register from an SMRD instruction hangs the GPU, so
we need to use the SGPR_32 register class, which does not include M0.
NOTE: This is a candidate for the 3.4 branch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195879 91177308-0d34-0410-b5e6-96231b3b80d8
We currently error in clang with:
"error: thread-local storage is unsupported for the current target", but we
can start to get the llvm level ready.
When compiling
template<typename T>
struct foo {
static __declspec(thread) int bar;
};
template<typename T>
__declspec(therad) int foo<T>::bar;
template struct foo<int>;
msvc produces
SECTION HEADER #3
.tls$ name
0 physical address
0 virtual address
4 size of raw data
12F file pointer to raw data (0000012F to 00000132)
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
C0301040 flags
Initialized Data
COMDAT; sym= "public: static int foo<int>::bar" (?bar@?$foo@H@@2HA)
4 byte align
Read Write
gcc produces a ".data$__emutls_v.<symbol>" for the testcase with
__declspec(thread) replaced with thread_local.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195849 91177308-0d34-0410-b5e6-96231b3b80d8
MO_ConstantPoolIndex is handled in printLeaMemReference.
MO_JumpTableIndex and MO_ExternalSymbol don't show up in inline asm.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195847 91177308-0d34-0410-b5e6-96231b3b80d8
It is only used for asm printing.
On X86 we put basic block addresses on register before passing them to inline
asm, so the MO_MachineBasicBlock case was dead.
MO_ExternalSymbol was dead since any symbol being passed to inline asm
is represented as MO_GlobalAddress.
The MO_GlobalAddress and MO_Register cases were not tested.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195824 91177308-0d34-0410-b5e6-96231b3b80d8
With this patch we use simple names for COMDAT sections (like .text or .bss).
This matches the MSVC behavior.
When merging it is the COMDAT symbol that is used to decide if two sections
should be merged, so there is no point in building a fancy name.
This survived a bootstrap on mingw32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195798 91177308-0d34-0410-b5e6-96231b3b80d8
may be removed and optimized in future iterations. Instead we save a list of basic blocks that we need to CSE.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195791 91177308-0d34-0410-b5e6-96231b3b80d8
In signed arithmetic we could end up with an i64 trip count for an i32 phi.
Because it is signed arithmetic we know that this is only defined if the i32
does not wrap. It is therefore safe to truncate the i64 trip count to a i32
value.
Fixes PR18049.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195787 91177308-0d34-0410-b5e6-96231b3b80d8
I'm adding new functionality in the sample profiler. This will
require more data to be kept around for each function, so I moved
the structure SampleProfile that we keep for each function into
a separate class.
There are no functional changes in this patch. It simply provides
a new home where to place all the new data that I need to propagate
weights through edges.
There are some other name and minor edits throughout.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195780 91177308-0d34-0410-b5e6-96231b3b80d8
- Fix bug in (vsext (vzext x)) -> (vsext x) in SIGN_EXTEND_IN_REG
lowering where we need to check whether x is a vector type (in-reg
type) of i8, i16 or i32; otherwise, that optimization is not valid.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195779 91177308-0d34-0410-b5e6-96231b3b80d8
Since type units aren't in the CUMap, use the DwarfUnits list to iterate
over units for tasks such as accelerator table building.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195776 91177308-0d34-0410-b5e6-96231b3b80d8
we generate PHI nodes with multiple entries from the same basic block but
with different values. Enabling CSE on ExtractElement instructions make sure
that all of the RAUWed instructions are the same.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195773 91177308-0d34-0410-b5e6-96231b3b80d8
Code scanner ran by Sylvestre Ledru got a no_return bug
in DebugInfo.cpp. Adding the return statements that
should be there.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195772 91177308-0d34-0410-b5e6-96231b3b80d8
Short description.
This issue is about case of treating pointers as integers.
We treat pointers as different if they references different address space.
At the same time, we treat pointers equal to integers (with machine address
width). It was a point of false-positive. Consider next case on 32bit machine:
void foo0(i32 addrespace(1)* %p)
void foo1(i32 addrespace(2)* %p)
void foo2(i32 %p)
foo0 != foo1, while
foo1 == foo2 and foo0 == foo2.
As you can see it breaks transitivity. That means that result depends on order
of how functions are presented in module. Next order causes merging of foo0
and foo1: foo2, foo0, foo1
First foo0 will be merged with foo2, foo0 will be erased. Second foo1 will be
merged with foo2.
Depending on order, things could be merged we don't expect to.
The fix:
Forbid to treat any pointer as integer, except for those, who belong to address space 0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195769 91177308-0d34-0410-b5e6-96231b3b80d8
of the two analysis managers into a CRTP base class that can be shared
and re-used in building any analysis manager. This will in turn simplify
adding yet another analysis manager to the system.
The base class provides all of the interface sugar for the analysis
manager delegating the functionality back through DerivedT methods which
operate on simple pass IDs. It also provides the pass registration,
storage, and lookup system which is common across the various
formulations of analysis managers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195747 91177308-0d34-0410-b5e6-96231b3b80d8
We would wrongly transform the testcase into the equivalent of an AND with 1.
The problem was that, when testing whether the shifted-in bits of the right
shift were significant, we used the width of the final zero-extended result
rather than the width of the shifted value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195731 91177308-0d34-0410-b5e6-96231b3b80d8
CallGraph.
This makes the CallGraph a totally generic analysis object that is the
container for the graph data structure and the primary interface for
querying and manipulating it. The pass logic is separated into its own
class. For compatibility reasons, the pass provides wrapper methods for
most of the methods on CallGraph -- they all just forward.
This will allow the new pass manager infrastructure to provide its own
analysis pass that constructs the same CallGraph object and makes it
available. The idea is that in the new pass manager, the analysis pass's
'run' method returns a concrete analysis 'result'. Here, that result is
a 'CallGraph'. The 'run' method will typically do only minimal work,
deferring much of the work into the implementation of the result object
in order to be lazy about computing things, but when (like DomTree)
there is *some* up-front computation, the analysis does it prior to
handing the result back to the querying pass.
I know some of this is fairly ugly. I'm happy to change it around if
folks can suggest a cleaner interim state, but there is going to be some
amount of unavoidable ugliness during the transition period. The good
thing is that this is very limited and will naturally go away when the
old pass infrastructure goes away. It won't hang around to bother us
later.
Next up is the initial new-PM-style call graph analysis. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195722 91177308-0d34-0410-b5e6-96231b3b80d8
A Direct stack map location records the address of frame index. This
address is itself the value that the runtime requested. This differs
from IndirectMemRefOp locations, which refer to a stack locations from
which the requested values must be loaded. Direct locations can
directly communicate the address if an alloca, while IndirectMemRefOp
handle register spills.
For example:
entry:
%a = alloca i64...
llvm.experimental.stackmap(i32 <ID>, i32 <shadowBytes>, i64* %a)
Since both the alloca and stackmap intrinsic are in the entry block,
and the intrinsic takes the address of the alloca, the runtime can
assume that LLVM will not substitute alloca with any intervening
value. This must be verified by the runtime by checking that the stack
map's location is a Direct location type. The runtime can then
determine the alloca's relative location on the stack immediately after
compilation, or at any time thereafter. This differs from Register and
Indirect locations, because the runtime can only read the values in
those locations when execution reaches the instruction address of the
stack map.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195712 91177308-0d34-0410-b5e6-96231b3b80d8
implementation. Silliness, but it'll be a trivial performance
optimization. This should clear up a failure on the vg_leak bot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195704 91177308-0d34-0410-b5e6-96231b3b80d8
r195698 moved the type unit checking up into getOrCreateTypeDIE so
remove the redundant check and fold the functions back together again.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195700 91177308-0d34-0410-b5e6-96231b3b80d8
It might be possible to eventually use one data structure, but I haven't
looked at the exact criteria used for accelerator tables and pubtypes to
see if there's good reason for the differences between the two or not.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195696 91177308-0d34-0410-b5e6-96231b3b80d8
Patch by Mikulas Patocka. I added the test. I checked that for cpu names that
gas knows about, it also doesn't generate nopl.
The modified cpus:
i686 - there are i686-class CPUs that don't have nopl: Via c3, Transmeta
Crusoe, Microsoft VirtualBox - see
https://bbs.archlinux.org/viewtopic.php?pid=775414
k6, k6-2, k6-3, winchip-c6, winchip2 - these are 586-class CPUs
via c3 c3-2 - see https://bugs.archlinux.org/task/19733 as a proof that
Via c3 and c3-Nehemiah don't have nopl
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195679 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes a bug in the assembler that was causing bad code to
be emitted. When switching modes in an assembly file (e.g. arm to
thumb mode) we would always emit the opcode from the original mode.
Consider this small example:
$ cat align.s
.code 16
foo:
add r0, r0
.align 3
add r0, r0
$ llvm-mc -triple armv7-none-linux align.s -filetype=obj -o t.o
$ llvm-objdump -triple thumbv7 -d t.o
Disassembly of section .text:
foo:
0: 00 44 add r0, r0
2: 00 f0 20 e3 blx #4195904
6: 00 00 movs r0, r0
8: 00 44 add r0, r0
This shows that we have actually emitted an arm nop (e320f000)
instead of a thumb nop. Unfortunately, this encodes to a thumb
branch which causes bad things to happen when compiling assembly
code with align directives.
The fix is to notify the ARMAsmBackend when we switch mode. The
MCMachOStreamer was already doing this correctly. This patch makes
the same change for the MCElfStreamer.
There is still a bug in the way nops are emitted for alignment
because the MCAlignment fragment does not store the correct mode.
The ARMAsmBackend will emit nops for the last mode it knew about. In
the example above, we still generate an arm nop if we add a `.code
32` to the end of the file.
PR18019
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195677 91177308-0d34-0410-b5e6-96231b3b80d8
These are handled almost identically to static mode (and ELF's global address
materialisation), except that a symbol may have "$non_lazy_ptr" appended. This
can be handled by passing appropriate flags along with the instruction instead
of using entirely separate pseudo-instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195655 91177308-0d34-0410-b5e6-96231b3b80d8
These should not use COMDATs. GNU as uses .bss for .lcomm and section 0 for
.comm.
Given
static int a;
int b;
MSVC puts both in .bss. This patch then puts both .comm and .lcomm on .bss. With
this change we agree with gas on .lcomm, are much closer on .comm and clang-cl
matches msvc on the above example.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195654 91177308-0d34-0410-b5e6-96231b3b80d8
There is no sane way for an LEApcrel (= single ADR) instruction to generate a
global address on any ARM target I know of. Fortunately, no-one was trying to
any more, but there were vestigial patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195644 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Moved the requirement for SelectionDAG::getConstant() to return legally
typed nodes slightly earlier. There were two optional DAGCombine passes
that were missed out and were required to produce type-legal DAGs.
Simplified a code-path in tryFoldToZero() to use SelectionDAG::getConstant().
This provides support for both promoted and expanded vector types whereas the
previous code only supported promoted vector types.
Fixes a "Type for zero vector elements is not legal" assertion detected by
an llvm-stress generated test.
Reviewers: resistor
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2251
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195635 91177308-0d34-0410-b5e6-96231b3b80d8
to what is needed for constant islands. The prescan method for Mips16 constant
islands will eventually go away. It is only temporary and should be done
earlier when the instructions are first created or from the DAG. If we keep
it here we need to handle better the situation where constant islands
is called multiple times since don't want to prescan more than once.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195569 91177308-0d34-0410-b5e6-96231b3b80d8
I had to move some code and I moved a declaration forward past it's first use
in the function but by nutty coincidence there was another variable of the same
name and type and with completely unrelated function that was declared globally
in the class so no compilation error ensued.
It required some unusual conditions for it to even matter. Caused test
case casts.c in test-suite to fail during compilation with a duplicate
symbol error. I would have noticed it during final code review for this port.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195565 91177308-0d34-0410-b5e6-96231b3b80d8
proxy. This lets a function pass query a module analysis manager.
However, the interface is const to indicate that only cached results can
be safely queried.
With this, I think the new pass manager is largely functionally complete
for modules and analyses. Still lots to test, and need to generalize to
SCCs and Loops, and need to build an adaptor layer to support the use of
existing Pass objects in the new managers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195538 91177308-0d34-0410-b5e6-96231b3b80d8
This avoids the need for an extra list of SkeletonCUs and associated
cleanup while staging things to be cleaner for further type unit
improvements.
Also hopefully fixes a memory leak introduced in r195166.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195536 91177308-0d34-0410-b5e6-96231b3b80d8
SLP vectorization. Based on the code in BBVectorizer.
Fixes PR17741.
Patch by Raul Silvera, reviewed by Hal and Nadav. Reformatted by my
driving of clang-format. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195528 91177308-0d34-0410-b5e6-96231b3b80d8
results.
This is the last piece of infrastructure needed to effectively support
querying *up* the analysis layers. The next step will be to introduce
a proxy which provides access to those layers with appropriate use of
const to direct queries to the safe interface.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195525 91177308-0d34-0410-b5e6-96231b3b80d8
a non-relocatable number offset.
One fixme to make the ranges as discrete data structures and
have range lists explicitly represented rather than as a list of symbols.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195523 91177308-0d34-0410-b5e6-96231b3b80d8
one function's analyses are invalidated at a time. Also switch the
preservation of the proxy to *fully* preserve the lower (function)
analyses.
Combined, this gets both upward and downward analysis invalidation to
a point I'm happy with:
- A function pass invalidates its function analyses, and its parent's
module analyses.
- A module pass invalidates all of its functions' analyses including the
set of which functions are in the module.
- A function pass can preserve a module analysis pass.
- If all function passes preserve a module analysis pass, that
preservation persists. If any doesn't the module analysis is
invalidated.
- A module pass can opt into managing *all* function analysis
invalidation itself or *none*.
- The conservative default is none, and the proxy takes the maximally
conservative approach that works even if the set of functions has
changed.
- If a module pass opts into managing function analysis invalidation it
has to propagate the invalidation itself, the proxy just does nothing.
The only thing really missing is a way to query for a cached analysis or
nothing at all. With this, function passes can more safely request
a cached module analysis pass without fear of it accidentally running
part way through.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195519 91177308-0d34-0410-b5e6-96231b3b80d8
We were ignoring the ordered/onordered bits and also the signed/unsigned
bits of condition codes when lowering the DAG to MachineInstrs.
NOTE: This is a candidate for the 3.4 branch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195514 91177308-0d34-0410-b5e6-96231b3b80d8
gcov expects every function to contain an entry block that
unconditionally branches into the next block. clang does not implement
basic blocks in this manner, so gcov did not output correct branch info
if the entry block branched to multiple blocks.
This change splits every function's entry block into an empty block and
a block with the rest of the instructions. The instrumentation code will
take care of the rest.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195513 91177308-0d34-0410-b5e6-96231b3b80d8
We can share the implementation between StripSymbols and dropping debug info
for metadata versions that do not match.
Also update the comments to match the implementation. A follow-on patch will
drop the "Debug Info Version" module flag in StripDebugInfo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195505 91177308-0d34-0410-b5e6-96231b3b80d8
Utilizing the 8 and 16 bit comparison instructions, even when an input can
be folded into the comparison instruction itself, is typically not worth it.
There are too many partial register stalls as a result, leading to significant
slowdowns. By always performing comparisons on at least 32-bit
registers, performance of the calculation chain leading to the
comparison improves. Continue to use the smaller comparisons when
minimizing size, as that allows better folding of loads into the
comparison instructions.
rdar://15386341
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195496 91177308-0d34-0410-b5e6-96231b3b80d8
If the beginning of the loop was also the entry block
of the function, branches were inserted to the entry block
which isn't allowed. If this occurs, create a new dummy
function entry block that branches to the start of the loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195493 91177308-0d34-0410-b5e6-96231b3b80d8
Improvements over r195317:
- Set/restore EnableFastISel flag instead of just running FastISel within
SelectAllBasicBlocks; the flag is checked in various places, and
FastISel won't run properly if those places don't do the right thing.
- Test looks for normal ISel versus FastISel behavior, and not
something more subtle that doesn't work everywhere.
Based on work by Andrea Di Biagio.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195491 91177308-0d34-0410-b5e6-96231b3b80d8
The fix is simply to use CurI instead of I when handling aliases to
avoid accessing a invalid iterator.
original message:
Convert linkonce* to weak* instead of strong.
Also refactor the logic into a helper function. This is an important improve
on mingw where the linker complains about mixed weak and strong symbols.
Converting to weak ensures that the symbol is not dropped, but keeps in a
comdat, making the linker happy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195477 91177308-0d34-0410-b5e6-96231b3b80d8
- When simplifying the mask generation for BLEND, check whether that mask is
also consumed by other non-BLEND insns. If true, skip that simplification.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195476 91177308-0d34-0410-b5e6-96231b3b80d8
I've no idea why I decided to handle TMxx differently from all the other
high/low logic operations, but it was a stupid thing to do. The high
registers aren't available as separate 32-bit registers on z10,
so subreg_h32 can't be used on a GR64 there.
I've normally been testing with z196 and with -O3 and so hadn't noticed
this until now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195473 91177308-0d34-0410-b5e6-96231b3b80d8
Also refactor the logic into a helper function. This is an important improvement
on mingw where the linker complains about mixed weak and strong symbols.
Converting to weak ensures that the symbol is not dropped, but keeps in a
comdat, making the linker happy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195470 91177308-0d34-0410-b5e6-96231b3b80d8