LLVM backend for 6502
Go to file
Chandler Carruth 450b39e971 [SROA] Teach SROA how to much more intelligently handle split loads and
stores.

When there are accesses to an entire alloca with an integer
load or store as well as accesses to small pieces of the alloca, SROA
splits up the large integer accesses. In order to do that, it uses bit
math to merge the small accesses into large integers. While this is
effective, it produces insane IR that can cause significant problems in
the rest of the optimizer:

- It can cause load and store mismatches with GVN on the non-alloca side
  where we end up loading an i64 (or some such) rather than loading
  specific elements that are stored.
- We can't always get rid of the integer bit math, which is why we can't
  always fix the loads and stores to work well with GVN.
- This is especially bad when we have operations that mix poorly with
  integer bit math such as floating point operations.
- It will block things like the vectorizer which might be able to handle
  the scalar stores that underly the aggregate.

At the same time, we can't just directly split up these loads and stores
in all cases. If there is actual integer arithmetic involved on the
values, then using integer bit math is actually the perfect lowering
because we can often combine it heavily with the surrounding math.

The solution this patch provides is to find places where SROA is
partitioning aggregates into small elements, and look for splittable
loads and stores that it can split all the way to some other adjacent
load and store. These are uniformly the cases where failing to split the
loads and stores hurts the optimizer that I have seen, and I've looked
extensively at the code produced both from more and less aggressive
approaches to this problem.

However, it is quite tricky to actually do this in SROA. We may have
loads and stores to the same alloca, or other complex patterns that are
hard to handle. This complexity leads to the somewhat subtle algorithm
implemented here. We have to do this entire process as a separate pass
over the partitioning of the alloca, and split up all of the loads prior
to splitting the stores so that we can handle safely the cases of
overlapping, including partially overlapping, loads and stores to the
same alloca. We also have to reconstitute the post-split slice
configuration so we can avoid iterating again over all the alloca uses
(the slow part of SROA). But we also have to ensure that when we split
up loads and stores to *other* allocas, we *do* re-iterate over them in
SROA to adapt to the more refined partitioning now required.

With this, I actually think we can fix a long-standing TODO in SROA
where I avoided splitting as many loads and stores as probably should be
splittable. This limitation historically mitigated the fallout of all
the bad things mentioned above. Now that we have more intelligent
handling, I plan to remove the FIXME and more aggressively mark integer
loads and stores as splittable. I'll do that in a follow-up patch to
help with bisecting any fallout.

The net result of this change should be more fine-grained and accurate
scalars being formed out of aggregates. At the very least, Clang now
generates perfect code for this high-level test case using
std::complex<float>:

  #include <complex>

  void g1(std::complex<float> &x, float a, float b) {
    x += std::complex<float>(a, b);
  }
  void g2(std::complex<float> &x, float a, float b) {
    x -= std::complex<float>(a, b);
  }

  void foo(const std::complex<float> &x, float a, float b,
           std::complex<float> &x1, std::complex<float> &x2) {
    std::complex<float> l1 = x;
    g1(l1, a, b);
    std::complex<float> l2 = x;
    g2(l2, a, b);
    x1 = l1;
    x2 = l2;
  }

This code isn't just hypothetical either. It was reduced out of the hot
inner loops of essentially every part of the Eigen math library when
using std::complex<float>. Those loops would consistently and
pervasively hop between the floating point unit and the integer unit due
to bit math extraction and insertion of floating point values that were
"stored" in a 64-bit integer register around the loop backedge.

So far, this change has passed a bootstrap and I have done some other
testing and so far, no issues. That doesn't mean there won't be though,
so I'll be prepared to help with any fallout. If you performance swings
in particular, please let me know. I'm very curious what all the impact
of this change will be. Stay tuned for the follow-up to also split more
integer loads and stores.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225061 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-01 11:54:38 +00:00
autoconf [multilib] Add support to the autoconf build to substitute 2014-12-29 11:58:17 +00:00
bindings [OCaml] [cmake] Use LLVM_LIBRARY_DIR instead of LLVM_LIBRARY_OUTPUT_INTDIR. 2014-12-30 03:24:07 +00:00
cmake [OCaml] [cmake] Use LLVM_LIBRARY_DIR instead of LLVM_LIBRARY_OUTPUT_INTDIR. 2014-12-30 03:24:07 +00:00
docs Fixed 2 minor typos in the documentation. 2014-12-29 09:47:51 +00:00
examples Once more on the cmake build. nativecodegen->native on the dependencies. 2014-12-08 18:24:06 +00:00
include Just use a using directive in SmallMapVector instead of inheriting from MapVector itself. 2015-01-01 08:05:41 +00:00
lib [SROA] Teach SROA how to much more intelligently handle split loads and 2015-01-01 11:54:38 +00:00
projects [cmake] Use the external project machinery for libcxxabi so that it can 2014-07-25 10:27:40 +00:00
test [SROA] Teach SROA how to much more intelligently handle split loads and 2015-01-01 11:54:38 +00:00
tools [cmake] Teach the llvm-config program to respect LLVM_LIBDIR_SUFFIX. 2014-12-29 11:16:25 +00:00
unittests Add 2x constructors for TinyPtrVector, one that takes in one elemenet and the other that takes in an ArrayRef<EltTy> 2014-12-31 23:33:24 +00:00
utils [X86] Fix disassembly of absolute moves to work correctly in 16 and 32-bit modes with all 4 combinations of OpSize and AdSize prefixes being present or not. 2014-12-31 07:07:31 +00:00
.arcconfig Updated phabricator server. 2014-04-07 03:57:04 +00:00
.clang-format Test commit. 2014-03-02 13:08:46 +00:00
.clang-tidy Enable display of compiler diagnostics in clang-tidy by default. 2014-10-29 17:29:38 +00:00
.gitignore Initial version of Go bindings. 2014-10-16 22:48:02 +00:00
CMakeLists.txt [py3] Teach the CMake build to reject Python versions older than 2.7. 2014-12-29 19:36:05 +00:00
CODE_OWNERS.TXT Add myself as SystemZ code owner 2014-12-18 19:27:50 +00:00
configure [multilib] Add support to the autoconf build to substitute 2014-12-29 11:58:17 +00:00
CREDITS.TXT Rise from the dead and update personal info 2014-08-25 17:51:04 +00:00
LICENSE.TXT Remove projects/sample. 2014-03-12 22:40:22 +00:00
llvm.spec.in
LLVMBuild.txt Remove the very substantial, largely unmaintained legacy PGO 2013-10-02 15:42:23 +00:00
Makefile [configure/make] Propagate names of build host tools when making BuildTools 2014-03-25 21:45:41 +00:00
Makefile.common
Makefile.config.in Add a check for misbehaving -Wcomment from gcc-4.7 and add 2014-11-05 00:35:15 +00:00
Makefile.rules Add a check for misbehaving -Wcomment from gcc-4.7 and add 2014-11-05 00:35:15 +00:00
README.txt [TEST-COMMIT] As per Developer Policy, Added a blank line. 2014-12-06 00:38:39 +00:00

Low Level Virtual Machine (LLVM)
================================

This directory and its subdirectories contain source code for the Low Level
Virtual Machine, a toolkit for the construction of highly optimized compilers,
optimizers, and runtime environments.

LLVM is open source software. You may freely distribute it under the terms of
the license agreement found in LICENSE.txt.

Please see the documentation provided in docs/ for further
assistance with LLVM, and in particular docs/GettingStarted.rst for getting
started with LLVM and docs/README.txt for an overview of LLVM's
documentation setup.

If you're writing a package for LLVM, see docs/Packaging.rst for our
suggestions.