When we're recalculating the feature set of the subtarget, we need to have the
ivars in their initial state.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175320 91177308-0d34-0410-b5e6-96231b3b80d8
assembler should also accept a two arg form, as the docuemntation specifies that
the first (destination) register is optional.
This patch uses TwoOperandAliasConstraint to add the two argument form.
It also fixes an 80-column formatting problem in:
test/MC/ARM/neon-bitwise-encoding
<rdar://problem/12909419> Clang rejects ARM NEON assembly instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175221 91177308-0d34-0410-b5e6-96231b3b80d8
The parser will now accept instructions with alignment specifiers written like
vld1.8 {d16}, [r0:64]
, while also still accepting the incorrect syntax
vld1.8 {d16}, [r0, :64]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175164 91177308-0d34-0410-b5e6-96231b3b80d8
Lower reverse shuffles to a vrev64 and a vext instruction instead of the default
legalization of storing and loading to the stack. This is important because we
generate reverse shuffles in the loop vectorizer when we reverse store to an
array.
uint8_t Arr[N];
for (i = 0; i < N; ++i)
Arr[N - i - 1] = ...
radar://13171760
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174929 91177308-0d34-0410-b5e6-96231b3b80d8
function is successfully handled by fast-isel. That's because function
arguments are *always* handled by SDISel. Introduce FastLowerArguments to
allow each target to provide hook to handle formal argument lowering.
As a proof-of-concept, add ARMFastIsel::FastLowerArguments to handle
functions with 4 or fewer scalar integer (i8, i16, or i32) arguments. It
completely eliminates the need for SDISel for trivial functions.
rdar://13163905
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174855 91177308-0d34-0410-b5e6-96231b3b80d8
Adds a function to target transform info to query for the cost of address
computation. The cost model analysis pass now also queries this interface.
The code in LoopVectorize adds the cost of address computation as part of the
memory instruction cost calculation. Only there, we know whether the instruction
will be scalarized or not.
Increase the penality for inserting in to D registers on swift. This becomes
necessary because we now always assume that address computation has a cost and
three is a closer value to the architecture.
radar://13097204
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174713 91177308-0d34-0410-b5e6-96231b3b80d8
Use the validateTargetOperandClass() hook to match literal '#0' operands in
InstAlias definitions. Previously this required per-instruction C++ munging of the
operand list, but not is handled as a natural part of the matcher. Much better.
No additional tests are required, as the pre-existing tests for these instructions
exercise the new behaviour as being functionally equivalent to the old.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174488 91177308-0d34-0410-b5e6-96231b3b80d8
Swift has a renaming dependency if we load into D subregisters. We don't have a
way of distinguishing between insertelement operations of values from loads and
other values. Therefore, we are pessimistic for now (The performance problem
showed up in example 14 of gcc-loops).
radar://13096933
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174300 91177308-0d34-0410-b5e6-96231b3b80d8
infrastructure on MCStreamer to test for whether there is an
MCELFStreamer object available.
This is just a cleanup on the AsmPrinter side of things, moving ad-hoc
tests of random APIs to a direct type query. But the AsmParser
completely broken. There were no tests, it just blindly cast its
streamer to an MCELFStreamer and started manipulating it.
I don't have a test case -- this actually failed on LLVM's own
regression test suite. Unfortunately the failure only appears when the
stars, compilers, and runtime align to misbehave when we read a pointer
to a formatted_raw_ostream as-if it were an MCAssembler. =/
UBSan would catch this immediately.
Many thanks to Matt for doing about 80% of the debugging work here in
GDB, Jim for helping to explain how exactly to fix this, and others for
putting up with the hair pulling that ensued during debugging it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174118 91177308-0d34-0410-b5e6-96231b3b80d8
isa<> and dyn_cast<>. In several places, code is already hacking around
the absence of this, and there seem to be several interfaces that might
be lifted and/or devirtualized using this.
This change was based on a discussion with Jim Grosbach about how best
to handle testing for specific MCStreamer subclasses. He said that this
was the correct end state, and everything else was too hacky so
I decided to just make it so.
No functionality should be changed here, this is just threading the kind
through all the constructors and setting up the classof overloads.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174113 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for AArch64 (ARM's 64-bit architecture) to
LLVM in the "experimental" category. Currently, it won't be built
unless requested explicitly.
This initial commit should have support for:
+ Assembly of all scalar (i.e. non-NEON, non-Crypto) instructions
(except the late addition CRC instructions).
+ CodeGen features required for C++03 and C99.
+ Compilation for the "small" memory model: code+static data <
4GB.
+ Absolute and position-independent code.
+ GNU-style (i.e. "__thread") TLS.
+ Debugging information.
The principal omission, currently, is performance tuning.
This patch excludes the NEON support also reviewed due to an outbreak of
batshit insanity in our legal department. That will be committed soon bringing
the changes to precisely what has been approved.
Further reviews would be gratefully received.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174054 91177308-0d34-0410-b5e6-96231b3b80d8
and update ELF header e_flags.
Currently gathering information such as symbol,
section and data is done by collecting it in an
MCAssembler object. From MCAssembler and MCAsmLayout
objects ELFObjectWriter::WriteObject() forms and
streams out the ELF object file.
This patch just adds a few members to the MCAssember
class to store and access the e_flag settings. It
allows for runtime additions to the e_flag by
assembler directives. The standalone assembler can
get to MCAssembler from getParser().getStreamer().getAssembler().
This patch is the generic infrastructure and will be
followed by patches for ARM and Mips for their target
specific use.
Contributer: Jack Carter
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173882 91177308-0d34-0410-b5e6-96231b3b80d8
Changing ARMBaseTargetMachine to return ARMTargetLowering intead of
the generic one (similar to x86 code).
Tests showing which instructions were added to cast when necessary
or cost zero when not. Downcast to 16 bits are not lowered in NEON,
so costs are not there yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173849 91177308-0d34-0410-b5e6-96231b3b80d8
The ARM and Thumb variants of LDREXD and STREXD have different constraints and
take different operands. Previously the code expanding atomic operations didn't
take this into account and asserted in Thumb mode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173780 91177308-0d34-0410-b5e6-96231b3b80d8
conditions are met:
1. They share the same operand and are in the same BB.
2. Both outputs are used.
3. The target has a native instruction that maps to ISD::FSINCOS node or
the target provides a sincos library call.
Implemented the generic optimization in sdisel and enabled it for
Mac OSX. Also added an additional optimization for x86_64 Mac OSX by
using an alternative entry point __sincos_stret which returns the two
results in xmm0 / xmm1.
rdar://13087969
PR13204
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173755 91177308-0d34-0410-b5e6-96231b3b80d8
This was an experimental option, but needs to be defined
per-target. e.g. PPC A2 needs to aggressively hide latency.
I converted some in-order scheduling tests to A2. Hal is working on
more test cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171946 91177308-0d34-0410-b5e6-96231b3b80d8
This is necessary not only for representing empty ranges, but for handling
multibyte characters in the input. (If the end pointer in a range refers to
a multibyte character, should it point to the beginning or the end of the
character in a char array?) Some of the code in the asm parsers was already
assuming this anyway.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171765 91177308-0d34-0410-b5e6-96231b3b80d8
Absent a Contributor's License Agreement (CLA) with an LLVM legal entity and as
reviewed and agreed with Chris Lattner, add a patent license covering future
contributions from ARM until there is a CLA. This is to make explicit ARM's
grant of patent rights to recipients of LLVM containing ARM-contributed
material.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171721 91177308-0d34-0410-b5e6-96231b3b80d8
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTranformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least on
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but tho
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
a helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits, this one is clearly big enough.
Finally, I'm very aware that much of the comments and documentation
needs to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171681 91177308-0d34-0410-b5e6-96231b3b80d8
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171366 91177308-0d34-0410-b5e6-96231b3b80d8
utils/sort_includes.py script.
Most of these are updating the new R600 target and fixing up a few
regressions that have creeped in since the last time I sorted the
includes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171362 91177308-0d34-0410-b5e6-96231b3b80d8
directly.
This is in preparation for removing the use of the 'Attribute' class as a
collection of attributes. That will shift to the AttributeSet class instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171253 91177308-0d34-0410-b5e6-96231b3b80d8
This affords us to use std::string's allocation routines and use the destructor
for the memory management. Switching to that also means that we can use
operator==(const std::string&, const char *) to perform the string comparison
rather than resorting to libc functionality (i.e. strcmp).
Patch by Saleem Abdulrasool!
Differential Revision: http://llvm-reviews.chandlerc.com/D230
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171042 91177308-0d34-0410-b5e6-96231b3b80d8
This function is often used to decorate dangling instructions, so a
context reference is required to allocate memory for the operands.
Also add a corresponding MachineInstrBuilder method.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170797 91177308-0d34-0410-b5e6-96231b3b80d8
are more expensive than the non-flag setting variant. Teach thumb2 size
reduction pass to avoid generating them unless we are optimizing for size.
rdar://12892707
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170728 91177308-0d34-0410-b5e6-96231b3b80d8
MC disassembler clients (LLDB) are interested in querying if an
instruction may affect control flow other than by virtue of being
an explicit branch instruction. For example, instructions which
write directly to the PC on some architectures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170610 91177308-0d34-0410-b5e6-96231b3b80d8
Use the version that also takes an MF reference instead.
It would technically be possible to extract an MF reference from the MI
as MI->getParent()->getParent(), but that would not work for MIs that
are not inserted into any basic block.
Given the reasonably small number of places this constructor was used at
all, I preferred the compile time check to a run time assertion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170588 91177308-0d34-0410-b5e6-96231b3b80d8
((x & 0xff00) >> 8) << 2
to
(x >> 6) & 0x3fc
This is general goodness since it folds a left shift into the mask. However,
the trailing zeros in the mask prevents the ARM backend from using the bit
extraction instructions. And worse since the mask materialization may require
an addition instruction. This comes up fairly frequently when the result of
the bit twiddling is used as memory address. e.g.
= ptr[(x & 0xFF0000) >> 16]
We want to generate:
ubfx r3, r1, #16, #8
ldr.w r3, [r0, r3, lsl #2]
vs.
mov.w r9, #1020
and.w r2, r9, r1, lsr #14
ldr r2, [r0, r2]
Add a late ARM specific isel optimization to
ARMDAGToDAGISel::PreprocessISelDAG(). It folds the left shift to the
'base + offset' address computation; change the mask to one which doesn't have
trailing zeros and enable the use of ubfx.
Note the optimization has to be done late since it's target specific and we
don't want to change the DAG normalization. It's also fairly restrictive
as shifter operands are not always free. It's only done for lsh 1 / 2. It's
known to be free on some cpus and they are most common for address
computation.
This is a slight win for blowfish, rijndael, etc.
rdar://12870177
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170581 91177308-0d34-0410-b5e6-96231b3b80d8
To not over constrain the scheduler for ARM in thumb mode, some optimizations for code size reduction, specific to ARM thumb, are blocked when they add a dependency (like write after read dependency).
Disables this check when code size is the priority, i.e., code is compiled with -Oz.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170462 91177308-0d34-0410-b5e6-96231b3b80d8
instruction.
This isn't strictly necessary at the moment because Thumb2SizeReduction
also copies all MI flags from the old instruction to the new. However, a
future patch will make that kind of direct flag tampering illegal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170395 91177308-0d34-0410-b5e6-96231b3b80d8
TargetLowering::getRegClassFor).
Some isSimple() guards were missing, or getSimpleVT() were hoisted too
far, resulting in asserts on valid LLVM assembly input.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170336 91177308-0d34-0410-b5e6-96231b3b80d8
immediate generates the narrow version. Needed when doing round-trip
assemble/disassemble testing using the alternate syntax that specifies
'pc' directly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170255 91177308-0d34-0410-b5e6-96231b3b80d8
Accordingly, add helper funtions getSimpleValueType (in parallel to
getValueType) in SDValue, SDNode, and TargetLowering.
This is the first, in a series of patches.
This is the second attempt. In the first attempt (r169837), a few
getSimpleVT() were hoisted too far, detected by bootstrap failures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170104 91177308-0d34-0410-b5e6-96231b3b80d8
Add R_ARM_NONE and R_ARM_PREL31 relocation types
to MCExpr. Both of them will be used while
generating .ARM.extab and .ARM.exidx sections.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169965 91177308-0d34-0410-b5e6-96231b3b80d8
mention the inline memcpy / memset expansion code is a mess?
This patch split the ZeroOrLdSrc argument into two: IsMemset and ZeroMemset.
The first indicates whether it is expanding a memset or a memcpy / memmove.
The later is whether the memset is a memset of zero. It's totally possible
(likely even) that targets may want to do different things for memcpy and
memset of zero.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169959 91177308-0d34-0410-b5e6-96231b3b80d8
Also added more comments to explain why it is generally ok to return true.
- Rename getOptimalMemOpType argument IsZeroVal to ZeroOrLdSrc. It's meant to
be true for loaded source (memcpy) or zero constants (memset). The poor name
choice is probably some kind of legacy issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169954 91177308-0d34-0410-b5e6-96231b3b80d8
Pre-regalloc frame allocation and referencing has been on by default
for ages. No need for the testing option that disables it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169931 91177308-0d34-0410-b5e6-96231b3b80d8
ScalarTargetTransformInfo::getIntImmCost() instead. "Legal" is a poorly defined
term for something like integer immediate materialization. It is always possible
to materialize an integer immediate. Whether to use it for memcpy expansion is
more a "cost" conceern.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169929 91177308-0d34-0410-b5e6-96231b3b80d8
Accordingly, add helper funtions getSimpleValueType (in parallel to
getValueType) in SDValue, SDNode, and TargetLowering.
This is the first, in a series of patches.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169837 91177308-0d34-0410-b5e6-96231b3b80d8
This shouldn't affect codegen for -O0 compiles as tail call markers are not
emitted in unoptimized compiles. Testing with the external/internal nightly
test suite reveals no change in compile time performance. Testing with -O1,
-O2 and -O3 with fast-isel enabled did not cause any compile-time or
execution-time failures. All tests were performed on my x86 machine.
I'll monitor our arm testers to ensure no regressions occur there.
In an upcoming clang patch I will be marking the objc_autoreleaseReturnValue
and objc_retainAutoreleaseReturnValue as tail calls unconditionally. While
it's theoretically true that this is just an optimization, it's an
optimization that we very much want to happen even at -O0, or else ARC
applications become substantially harder to debug.
Part of rdar://12553082
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169796 91177308-0d34-0410-b5e6-96231b3b80d8
1. Teach it to use overlapping unaligned load / store to copy / set the trailing
bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies.
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g.
x86 and ARM.
3. When memcpy from a constant string, do *not* replace the load with a constant
if it's not possible to materialize an integer immediate with a single
instruction (required a new target hook: TLI.isIntImmLegal()).
4. Use unaligned load / stores more aggressively if target hooks indicates they
are "fast".
5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8.
Also increase the threshold to something reasonable (8 for memset, 4 pairs
for memcpy).
This significantly improves Dhrystone, up to 50% on ARM iOS devices.
rdar://12760078
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169791 91177308-0d34-0410-b5e6-96231b3b80d8
Before this patch, when you objdump an LLVM-compiled file, objdump tried to
decode data-in-code sections as if they were code. This patch adds the missing
Mapping Symbols, as defined by "ELF for the ARM Architecture" (ARM IHI 0044D).
Patch based on work by Greg Fitzgerald.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169609 91177308-0d34-0410-b5e6-96231b3b80d8
understand target implementation of any_extend / extload, just generate
zero_extend in place of any_extend for liveouts when the target knows the
zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz).
rdar://12771555
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169536 91177308-0d34-0410-b5e6-96231b3b80d8
This is for the lldb team so most of but not all of the values are
to be printed as hex with this option. Some small values like the
scale in an X86 address were requested to printed in decimal
without the leading 0x.
There may be some tweaks need to places that may still be in
decimal that they want in hex. Specially for arm. I made my best
guess. Any tweaks from here should be simple.
I also did the best I know now with help from the C++ gurus
creating the cleanest formatImm() utility function and containing
the changes. But if someone has a better idea to make something
cleaner I'm all ears and game for changing the implementation.
rdar://8109283
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169393 91177308-0d34-0410-b5e6-96231b3b80d8