Commit Graph

19 Commits

Author SHA1 Message Date
Chandler Carruth
5cd79bc14c Perform partial SROA on the helper hashing structure. I really wish the
optimizers could do this for us, but expecting partial SROA of classes
with template methods through cloning is probably expecting too much
heroics. With this change, the begin/end pointer pairs which indicate
the status of each loop iteration are actually passed directly into each
layer of the combine_data calls, and the inliner has a chance to see
when most of the combine_data function could be deleted by inlining.
Similarly for 'length'.

We have to be careful to limit the places where in/out reference
parameters are used as those will also defeat the inliner / optimizers
from properly propagating constants.

With this change, LLVM is able to fully inline and unroll the hash
computation of small sets of values, such as two or three pointers.
These now decompose into essentially straight-line code with no loops or
function calls.

There is still one code quality problem to be solved with the hashing --
LLVM is failing to nuke the alloca. It removes all loads from the
alloca, leaving only lifetime intrinsics and dead(!!) stores to the
alloca. =/ Very unfortunate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154264 91177308-0d34-0410-b5e6-96231b3b80d8
2012-04-07 20:01:31 +00:00
Chandler Carruth
005874056e Fix a silly restriction on the fast-path for hash_combine_range. This
caused several clients to select the slow variation. =[ This is extra
annoying because we don't have any realistic way of testing this -- by
design, these two functions *must* compute the same value.

Found while inspecting the output of some benchmarks I'm working on.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152369 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-09 02:49:38 +00:00
Chandler Carruth
d4d8b2a7f6 Add support to the hashing infrastructure for automatically hashing both
integral and enumeration types. This is accomplished with a bit of
template type trait magic. Thanks to Richard Smith for the core idea
here to detect viable types by detecting the set of types which can be
default constructed in a template parameter.

This is used (in conjunction with a system for detecting nullptr_t
should it exist) to provide an is_integral_or_enum type trait that
doesn't need a whitelist or direct compiler support.

With this, the hashing is extended to the more general facility. This
will be used in a subsequent commit to hashing more things, but I wanted
to make sure the type trait magic went through the build bots separately
in case other compilers don't like this formulation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152217 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-07 09:32:32 +00:00
Chandler Carruth
344224b3a3 Remove an accidental cut/paste of a comment into the middle of
a function. Dunno how I missed this when going through code...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152196 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-07 02:33:06 +00:00
Chandler Carruth
179bc7cb59 Switch to a C-style cast here to silence a brain-dead MSVC warning. It
complains about the truncation of a 64-bit constant to a 32-bit value
when size_t is 32-bits wide, but *only with static_cast*!!! The exact
signal that should *silence* such a warning, and in fact does silence it
with both GCC and Clang.

Anyways, this was causing grief for all the MSVC builds, so pointless
change made. Thanks to Nikola on IRC for confirming that this works.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152021 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-05 09:56:12 +00:00
Chandler Carruth
9406da6e66 Teach the hashing facilities how to hash std::string objects.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152000 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-04 10:23:15 +00:00
Daniel Dunbar
a2a9b9e775 hash_state: Don't use initialization target during initialization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151959 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-03 00:35:48 +00:00
Benjamin Kramer
171cda96c8 Fix indentation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151932 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-02 19:19:34 +00:00
Benjamin Kramer
0c7374d87e Hashing: microoptimize a truncate on 64 bit away. This currently blocks dead code eliminating the conditional.
The optimizer should handle this eventually, but currently LVI isn't really designed for this kind of stuff.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151918 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-02 15:34:35 +00:00
Chandler Carruth
dc62a90696 Make the hashing algorithm Endian neutral. This is a bit annoying, but
folks who know something about PPC tell me that the byte swap is crazy
fast and without this the bit mixture would actually be different. It
might not be worse, but I've not measured it and so I'd rather not trust
it. This way, the algorithm is identical on both endianness hosts. I'll
look into any performance issues etc stemming from this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151892 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-02 11:16:10 +00:00
Chandler Carruth
1c1448984d Simplify the pair optimization. Rather than using complex type traits,
just ensure that the number of bytes in the pair is the sum of the bytes
in each side of the pair. As long as thats true, there are no extra
bytes that might be padding.

Also add a few tests that previously would have slipped through the
checking. The more accurate checking mechanism catches these and ensures
they are handled conservatively correctly.

Thanks to Duncan for prodding me to do this right and more simply.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151891 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-02 10:56:40 +00:00
Chandler Carruth
4d628e200f We really want to hash pairs of directly-hashable data as directly
hashable data. This matters when we have pair<T*, U*> as a key, which is
quite common in DenseMap, etc. To that end, we need to detect when this
is safe. The requirements on a generic std::pair<T, U> are:

1) Both T and U must satisfy the existing is_hashable_data trait. Note
   that this includes the requirement that T and U have no internal
   padding bits or other bits not contributing directly to equality.
2) The alignment constraints of std::pair<T, U> do not require padding
   between consecutive objects.
3) The alignment constraints of U and the size of T do not conspire to
   require padding between the first and second elements.

Grow two somewhat magical traits to detect this by forming a pod
structure and inspecting offset artifacts on it. Hopefully this won't
cause any compilers to panic.

Added and adjusted tests now that pairs, even nested pairs, are treated
as just sequences of data.

Thanks to Jeffrey Yasskin for helping me sort through this and reviewing
the somewhat subtle traits.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151883 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-02 09:26:36 +00:00
Chandler Carruth
c7384cfc7a Add support for hashing pairs by delegating to each sub-object. There is
an open question of whether we can do better than this by treating pairs
as boring data containers and directly hashing the two subobjects. This
at least makes the API reasonable.

In order to make this change, I reorganized the header a bit. I lifted
the declarations of the hash_value functions up to the top of the header
with their doxygen comments as these are intended for users to interact
with. They shouldn't have to wade through implementation details. I then
defined them at the very end so that they could be defined in terms of
hash_combine or any other hashing infrastructure.

Added various pair-hashing unittests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151882 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-02 08:32:29 +00:00
Chandler Carruth
4166989f10 Remove the misguided extension here that reserved two special values in
the hash_code. I'm not sure what I was thinking here, the use cases for
special values are in the *keys*, not in the hashes of those keys.

We can always resurrect this if needed, or clients can accomplish the
same goal themselves. This makes the general case somewhat faster (~5
cycles faster on my machine) and smaller with less branching.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151865 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-02 00:48:38 +00:00
Chandler Carruth
b4d023503b Fix two warnings in this code that I missed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151839 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-01 21:45:51 +00:00
Chandler Carruth
0b66c6fca2 Rewrite LLVM's generalized support library for hashing to follow the API
of the proposed standard hashing interfaces (N3333), and to use
a modified and tuned version of the CityHash algorithm.

Some of the highlights of this change:
 -- Significantly higher quality hashing algorithm with very well
    distributed results, and extremely few collisions. Should be close to
    a checksum for up to 64-bit keys. Very little clustering or clumping of
    hash codes, to better distribute load on probed hash tables.
 -- Built-in support for reserved values.
 -- Simplified API that composes cleanly with other C++ idioms and APIs.
 -- Better scaling performance as keys grow. This is the fastest
    algorithm I've found and measured for moderately sized keys (such as
    show up in some of the uniquing and folding use cases)
 -- Support for enabling per-execution seeds to prevent table ordering
    or other artifacts of hashing algorithms to impact the output of
    LLVM. The seeding would make each run different and highlight these
    problems during bootstrap.

This implementation was tested extensively using the SMHasher test
suite, and pased with flying colors, doing better than the original
CityHash algorithm even.

I've included a unittest, although it is somewhat minimal at the moment.
I've also added (or refactored into the proper location) type traits
necessary to implement this, and converted users of GeneralHash over.

My only immediate concerns with this implementation is the performance
of hashing small keys. I've already started working to improve this, and
will continue to do so. Currently, the only algorithms faster produce
lower quality results, but it is likely there is a better compromise
than the current one.

Many thanks to Jeffrey Yasskin who did most of the work on the N3333
paper, pair-programmed some of this code, and reviewed much of it. Many
thanks also go to Geoff Pike Pike and Jyrki Alakuijala, the original
authors of CityHash on which this is heavily based, and Austin Appleby
who created MurmurHash and the SMHasher test suite.

Also thanks to Nadav, Tobias, Howard, Jay, Nick, Ahmed, and Duncan for
all of the review comments! If there are further comments or concerns,
please let me know and I'll jump on 'em.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151822 91177308-0d34-0410-b5e6-96231b3b80d8
2012-03-01 18:55:25 +00:00
Jay Foad
b4b2688db0 Help the compiler to eliminate some dead code when hashing an array of T
where sizeof (T) is a multiple of 4.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151523 91177308-0d34-0410-b5e6-96231b3b80d8
2012-02-27 11:00:17 +00:00
Jay Foad
6592eacf90 The implementation of GeneralHash::addBits broke C++ aliasing rules; fix
it with memcpy. This also fixes a problem on big-endian hosts, where
addUnaligned would return different results depending on the alignment
of the data.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151247 91177308-0d34-0410-b5e6-96231b3b80d8
2012-02-23 09:16:04 +00:00
Talin
1a4b19ef9b Hashing.h - utilities for hashing various data types.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@150890 91177308-0d34-0410-b5e6-96231b3b80d8
2012-02-18 21:00:49 +00:00