The current Intel Atom microarchitecture has a feature whereby
when a function returns early then it is slightly faster to execute
a sequence of NOP instructions to wait until the return address is ready,
as opposed to simply stalling on the ret instruction until
the return address is ready.
When compiling for X86 Atom only, this patch will run a pass,
called "X86PadShortFunction" which will add NOP instructions where less
than four cycles elapse between function entry and return.
It includes tests.
This patch has been updated to address Nadav's review comments
- Optimize only at >= O1 and don't do optimization if -Os is set
- Stores MachineBasicBlock* instead of BBNum
- Uses DenseMap instead of std::map
- Fixes placement of braces
Patch by Andy Zhang.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171879 91177308-0d34-0410-b5e6-96231b3b80d8
Current targets don't have more than 256 relocations so they don't hit this
limit, but ELF64 actually allows more than 8 bits for a relocation type. These
were being truncated on AArch64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171845 91177308-0d34-0410-b5e6-96231b3b80d8
one file where it is called as a static function. Nuke the declaration
and the definition in lib/CodeGen, along with the include of
SelectionDAG.h from this file.
There is no dependency edge from lib/CodeGen to
lib/CodeGen/SelectionDAG, so it isn't valid for a routine in lib/CodeGen
to reference the DAG. There is a dependency from
lib/CodeGen/SelectionDAG on lib/CodeGen. This breaks one violation of
this layering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171842 91177308-0d34-0410-b5e6-96231b3b80d8
make into the last commit.
Also, update the test-generation script to generate an exhaustive test for
align_to_end as well, and include the generated test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171811 91177308-0d34-0410-b5e6-96231b3b80d8
small loops. On small loops post-loop that handles scalars (and runs slower) can take more time to execute than the
rest of the loop. This patch disables widening of loops with a small static trip count.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171798 91177308-0d34-0410-b5e6-96231b3b80d8
o. X/C1 * C2 => X * (C2/C1) (if C2/C1 is neither special FP nor denormal)
o. X/C1 * C2 -> X/(C1/C2) (if C2/C1 is either specical FP or denormal, but C1/C2 is a normal Fp)
Let MDC denote multiplication or dividion with one & only one operand being a constant
o. (MDC ± C1) * C2 => (MDC * C2) ± (C1 * C2)
(so long as the constant-folding doesn't yield any denormal or special value)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171793 91177308-0d34-0410-b5e6-96231b3b80d8
proposal. This leaves the strings in the skeleton die as strp,
but in all dwo files they're accessed now via DW_FORM_GNU_str_index.
Add support for dumping these sections and modify the fission-cu.ll
testcase to have the correct strings and form. Fix a small bug
in the fixed form sizes routine that involved out of array accesses
for the table and add a FIXME in the extractFast routine to fix
this up.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171779 91177308-0d34-0410-b5e6-96231b3b80d8
code generation. Variables addressed through a GlobalAlias were not being
handled, and variables with available_externally linkage were treated
incorrectly. The patch contains two new tests to verify the correct code
generation for these cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171778 91177308-0d34-0410-b5e6-96231b3b80d8
This is necessary not only for representing empty ranges, but for handling
multibyte characters in the input. (If the end pointer in a range refers to
a multibyte character, should it point to the beginning or the end of the
character in a char array?) Some of the code in the asm parsers was already
assuming this anyway.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171765 91177308-0d34-0410-b5e6-96231b3b80d8
turning a code like this:
if (foo)
free(foo)
into that:
free(foo)
Move a call to free from basic block FB into FB's predecessor, P,
when the path from P to FB is taken only if the argument of free is
not equal to NULL.
Some restrictions apply on P and FB to be sure that this code motion
is profitable. Namely:
1. FB must have only one predecessor P.
2. FB must contain only the call to free plus an unconditional
branch to S.
3. P's successors are FB and S.
Because of 1., we will not increase the code size when moving the call
to free from FB to P.
Because of 2., FB will be empty after the move.
Because of 2. and 3., P's branch instruction becomes useless, so as FB
(simplifycfg will do the job).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171762 91177308-0d34-0410-b5e6-96231b3b80d8
peculiar headers under include/llvm.
This struct still doesn't make a lot of sense, but it makes more sense
down in TargetLowering than it did before.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171739 91177308-0d34-0410-b5e6-96231b3b80d8
already in a class, just inline the four of them. I suspect that this
class could be simplified some to not always keep distinct variables for
these things, but it wasn't clear to me how given the usage so I opted
for a trivial and mechanical translation.
This removes one of the two remaining users of a header in include/llvm
which does nothing more than define a 4 member struct.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171738 91177308-0d34-0410-b5e6-96231b3b80d8
TargetTransformInfo rather than TargetLowering, removing one of the
primary instances of the layering violation of Transforms depending
directly on Target.
This is a really big deal because LSR used to be a "special" pass that
could only be tested fully using llc and by looking at the full output
of it. It also couldn't run with any other loop passes because it had to
be created by the backend. No longer is this true. LSR is now just
a normal pass and we should probably lift the creation of LSR out of
lib/CodeGen/Passes.cpp and into the PassManagerBuilder. =] I've not done
this, or updated all of the tests to use opt and a triple, because
I suspect someone more familiar with LSR would do a better job. This
change should be essentially without functional impact for normal
compilations, and only change behvaior of targetless compilations.
The conversion required changing all of the LSR code to refer to the TTI
interfaces, which fortunately are very similar to TargetLowering's
interfaces. However, it also allowed us to *always* expect to have some
implementation around. I've pushed that simplification through the pass,
and leveraged it to simplify code somewhat. It required some test
updates for one of two things: either we used to skip some checks
altogether but now we get the default "no" answer for them, or we used
to have no information about the target and now we do have some.
I've also started the process of removing AddrMode, as the TTI interface
doesn't use it any longer. In some cases this simplifies code, and in
others it adds some complexity, but I think it's not a bad tradeoff even
there. Subsequent patches will try to clean this up even further and use
other (more appropriate) abstractions.
Yet again, almost all of the formatting changes brought to you by
clang-format. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171735 91177308-0d34-0410-b5e6-96231b3b80d8