This patch builds upon the two preceding MC changes to implement the
basic ELFv2 function call convention. In the ELFv1 ABI, a "function
descriptor" was associated with every function, pointing to both the
entry address and the related TOC base (and a static chain pointer
for nested functions). Function pointers would actually refer to that
descriptor, and the indirect call sequence needed to load up both entry
address and TOC base.
In the ELFv2 ABI, there are no more function descriptors, and function
pointers simply refer to the (global) entry point of the function code.
Indirect function calls simply branch to that address, after loading it
up into r12 (as required by the ABI rules for a global entry point).
Direct function calls continue to just do a "bl" to the target symbol;
this will be resolved by the linker to the local entry point of the
target function if it is local, and to a PLT stub if it is global.
That PLT stub would then load the (global) entry point address of the
final target into r12 and branch to it. Note that when performing a
direct function call, r2 must be set up to point to the current TOC
base: if the target ends up local, the ABI requires that its local
entry point is called with r2 set up; if the target ends up global,
the PLT stub requires that r2 is set up.
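For illustration, a minimal sketch of the resulting ELFv2 indirect call
sequence (register choice and the 24(r1) TOC save slot follow the usual
ABI conventions; the exact code LLVM emits may differ in detail):
  std   r2, 24(r1)      # save the caller's TOC pointer
  mr    r12, r3         # function pointer value must end up in r12
  mtctr r12
  bctrl                 # branch to the target's global entry point
  ld    r2, 24(r1)      # restore the caller's TOC pointer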
This patch makes all the LLVM changes needed to implement that scheme:
- No longer create a function descriptor when emitting a function
definition (in EmitFunctionEntryLabel)
- Emit two entry points *if* the function needs the TOC base (r2)
anywhere (this is done in EmitFunctionBodyStart; note that this cannot
be done in EmitFunctionEntryLabel because the global entry point
prologue code must be *part* of the function as covered by debug info).
- In order to make the tracking of r2 use (as needed above) work
correctly, mark direct function calls as implicitly using r2.
- Implement the ELFv2 indirect function call sequence (no function
descriptors; load target address into r12).
- When creating an ELFv2 object file, emit the .abiversion 2 directive
to tell the linker to create the appropriate version of PLT stubs.
Reviewed by Hal Finkel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213489 91177308-0d34-0410-b5e6-96231b3b80d8
As discussed in a previous checkin to support the .localentry
directive on PowerPC, we need to inspect the actual target symbol
in needsRelocateWithSymbol to make the appropriate decision based
on that symbol's st_other bits.
Currently, needsRelocateWithSymbol does not get the target symbol.
However, it is directly available to its sole caller. This patch
therefore simply extends needsRelocateWithSymbol with a new
parameter "const MCSymbolData &SD", passes in the target symbol,
and updates all derived implementations.
In particular, in the PowerPC implementation, this patch removes
the FIXME added by the previous checkin.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213487 91177308-0d34-0410-b5e6-96231b3b80d8
Prior to this change, the loop vectorizer did not make use of the alias
analysis infrastructure. Instead, it performed memory dependence analysis using
ScalarEvolution-based linear dependence checks within equivalence classes
derived from the results of ValueTracking's GetUnderlyingObjects.
Unfortunately, this meant that:
1. The loop vectorizer had logic that essentially duplicated that in BasicAA
for aliasing based on identified objects.
2. The loop vectorizer could not partition the space of dependency checks
based on information only easily available from within AA (TBAA metadata is
currently the prime example).
This meant, for example, that regardless of whether -fno-strict-aliasing was
provided, the vectorizer would only vectorize this loop with a runtime
memory-overlap check:
void foo(int *a, float *b) {
  for (int i = 0; i < 1600; ++i)
    a[i] = b[i];
}
This is suboptimal because the TBAA metadata already provides the information
necessary to show that this check is unnecessary. Of course, the vectorizer has a
limit on the number of such checks it will insert, so in practice, ignoring
TBAA means not vectorizing more-complicated loops that we should.
This change causes the vectorizer to use an AliasSetTracker to keep track of
the pointers in the loop. The resulting alias sets are then used to partition
the space of dependency checks, and potential runtime checks; this results in
more-efficient vectorizations.
When pointer locations are added to the AliasSetTracker, two things are done:
1. The location size is set to UnknownSize (otherwise you'd not catch
inter-iteration dependencies)
2. For instructions in blocks that would need to be predicated, TBAA is
removed (because the metadata might have a control dependency on the condition
being speculated).
For non-predicated blocks, you can leave the TBAA metadata. This is safe
because you can't have an iteration dependency on the TBAA metadata (if you
did, and you unrolled sufficiently, you'd end up with the same pointer value
used by two accesses that TBAA says should not alias, and that would yield
undefined behavior).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213486 91177308-0d34-0410-b5e6-96231b3b80d8
A second binutils feature needed to support ELFv2 is the .localentry
directive. In the ELFv2 ABI, functions may have two entry points:
one for calling the routine locally via "bl", and one for calling the
function via function pointer (either at the source level, or implicitly
via a PLT stub for global calls). The two entry points share a single
ELF symbol, where the ELF symbol address identifies the global entry
point address, while the local entry point is found by adding a delta
offset to the symbol address. That offset is encoded into three
platform-specific bits of the ELF symbol st_other field.
The .localentry directive instructs the assembler to set those bits
to encode a particular offset. This is typically used by a function
prologue sequence like this:
func:
        addis r2, r12, (.TOC.-func)@ha
        addi r2, r2, (.TOC.-func)@l
        .localentry func, .-func
Note that according to the ABI, when calling the global entry point,
r12 must be set to point to the global entry point address itself, while
when calling the local entry point, r2 must be set to point to the TOC
base. The two instructions between the global and local entry point in
the above example translate the first requirement into the second.
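For illustration, the caller side of a direct call then looks roughly like
this (a sketch of the conventional ELFv2 idiom; the nop slot and the 24(r1)
TOC save offset are ABI conventions, not something introduced by this patch):
  bl  func              # resolved by the linker to func's local entry point
                        # if func is local, or to a PLT stub otherwise
  nop                   # rewritten by the linker to "ld r2, 24(r1)" when the
                        # call goes through a PLT stub, restoring the TOC pointer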
This patch implements support in the PowerPC MC streamers to emit the
.localentry directive (both into assembler and ELF object output), as
well as support in the assembler parser to parse that directive.
In addition, there is another change required in MC fixup/relocation
handling to properly deal with relocations targeting function symbols
with two entry points: When the target function is known local, the MC
layer would immediately handle the fixup by inserting the target
address -- this is wrong, since the call may need to go to the local
entry point instead. The GNU assembler handles this case by *not*
directly resolving fixups targeting functions with two entry points,
but always emits the relocation and relies on the linker to handle
this case correctly. This patch changes LLVM MC to do the same (this
is done via the processFixupValue routine).
Similarly, there are cases where the assembler would normally emit a
relocation, but "simplify" it to a relocation targeting a *section*
instead of the actual symbol. For the same reason as above, this
may be wrong when the target symbol has two entry points. The GNU
assembler again handles this case by not performing this simplification
in that case, but leaving the relocation targeting the full symbol,
which is then resolved by the linker. This patch changes LLVM MC
to do the same (via the needsRelocateWithSymbol routine).
NOTE: The method used in this patch is overly pessimistic, since the
needsRelocateWithSymbol routine currently does not have access to the
actual target symbol, and thus must always assume that it might have
two entry points. This will be improved upon by a follow-on patch
that modifies common code to pass the target symbol when calling
needsRelocateWithSymbol.
Reviewed by Hal Finkel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213485 91177308-0d34-0410-b5e6-96231b3b80d8
ELFv2 binaries are marked by a bit in the ELF header e_flags field.
A new assembler directive .abiversion can be used to set that flag.
This patch implements support in the PowerPC MC streamers to emit the
.abiversion directive (both into assembler and ELF binary output),
as well as support in the assembler parser to parse the .abiversion
directive.
Reviewed by Hal Finkel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213484 91177308-0d34-0410-b5e6-96231b3b80d8
The PPCTargetLowering::SelectAddressRegImm routine needs to handle
FrameIndex nodes in a special manner, by translating them into a
TargetFrameIndex node. This was done in most cases, but seems to
have been neglected in one path: when the input tree has an OR of
the FrameIndex with an immediate. This can happen if the FrameIndex
can be proven to be sufficiently aligned that an OR of that immediate
is equivalent to an ADD.
The missing handling of FrameIndex in that case caused the SelectionDAG
instruction selection to miss opportunities to merge the OR back into
the FrameIndex node, leading to superfluous addi/ori instructions in
the final assembler output.
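For illustration, a sketch of the difference this makes (hypothetical frame
offsets, assuming the stack slot is aligned so that the OR is equivalent to
an ADD):
  # before: the OR is materialized with separate instructions
  addi r3, r1, 128
  ori  r3, r3, 48
  lwz  r4, 0(r3)
  # after: the immediate is folded into the displacement
  lwz  r4, 176(r1)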
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213482 91177308-0d34-0410-b5e6-96231b3b80d8
This probably was killed by some generic DAGCombiner
improvements that check TargetBooleanContents instead
of just the constant 1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213471 91177308-0d34-0410-b5e6-96231b3b80d8
These instructions only accept a limited input range, and return
the constant value 1 for inputs outside that range. We should perform range
reduction to be able to process arbitrary values. Use a FRACT instruction
after normalization to achieve this. Also add a test for constant folding
normalization to achieve this. Also add a test for constant folding
with the lowered code with unsafe-fp-math enabled.
v2: use DAG lowering instead of intrinsic, adapt test
v3: calculate constant, fold pattern into instruction definition
v4: misc style fixes, add sin-fold testcase, cosmetics
Patch by Grigori Goronzy
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213458 91177308-0d34-0410-b5e6-96231b3b80d8
There are some kinds of metadata that are safe to propagate from the scalar
instructions to the vector instructions (fpmath and tbaa currently).
Regarding TBAA, one might worry about propagating it on if-converted loads and
stores, because the metadata might have had a control dependency on the
condition, and thus actually aliased with some other non-speculated memory
access when the condition was false. However, this would be caught by the
runtime overlap checks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213452 91177308-0d34-0410-b5e6-96231b3b80d8
Function @test3c should check that the DAGCombiner is able to fold a pair of
shuffles into a new shuffle with a permute mask of <6,7,2,3>. However, one of
the shuffles in @test3c had a wrong permute mask; this prevented the DAGCombiner
from folding the shuffles into the expected result.
Now that the shuffle mask is fixed, the backend correctly folds the two shuffles
in function @test3c into a single movhlps instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213451 91177308-0d34-0410-b5e6-96231b3b80d8
When we have a parameter (or call site return) with a dereferenceable
attribute, it can specify the size of an array pointed to by that parameter. If
we have a value for which we can accumulate a constant offset to such a
parameter, then we can use that offset in a direct comparison with the size
specified by the dereferenceable attribute.
This enables us to handle cases like this:
int foo(int a[static 3]) {
  return a[2]; /* this is always dereferenceable */
}
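Illustratively, and assuming the frontend annotates the parameter, the
corresponding IR would carry the attribute (a hypothetical sketch;
12 = 3 * sizeof(int)):
  define i32 @foo(i32* dereferenceable(12) %a) {
    %p = getelementptr inbounds i32* %a, i64 2   ; constant offset of 8 bytes
    %v = load i32* %p                            ; 8 + 4 <= 12, so known safe
    ret i32 %v
  }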
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213447 91177308-0d34-0410-b5e6-96231b3b80d8
When performing a dynamic stack adjustment without optimisations, we would mark
SP as def and R4 as kill. This occurred as part of the expansion of a
WIN__CHKSTK SDNode which indicated the proper handling of SP and R4. The result
would be that we would double define SP as part of an operation, which is
obviously incorrect.
Furthermore, the VTList for the chain had an incorrect parameter type of i32
instead of Other.
Correct these to permit proper lowering of __builtin_alloca at -O0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213442 91177308-0d34-0410-b5e6-96231b3b80d8
This adds initial support for PPC32 ELF PIC (Position Independent Code; the
-fPIC variety), thus rectifying a long-standing deficiency in the PowerPC
backend.
Patch by Justin Hibbits!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213427 91177308-0d34-0410-b5e6-96231b3b80d8
Merges equivalent loads on both sides of a hammock/diamond
and hoists the load into the header.
Merges equivalent stores on both sides of a hammock/diamond
and sinks the store into the footer.
This can enable if-conversion and better tolerate load misses
and store operand latencies.
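For example (an illustrative sketch, not a test case from the patch), the
load of *p below appears on both sides of the diamond and can be hoisted into
the header; symmetric stores to the same address would be sunk into the footer:
  int diamond(int *p, int c) {
    int r;
    if (c)
      r = *p + 1;   /* load of *p on the true side  */
    else
      r = *p - 1;   /* load of *p on the false side */
    return r;
  }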
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213396 91177308-0d34-0410-b5e6-96231b3b80d8
On AArch64 the pseudo instruction ldr <reg>, =... supports both
32-bit and 64-bit constants. Add support for 64-bit constants in
the literal pools to support the pseudo instruction fully.
Changes the AArch64 ldr-pseudo tests to use 32-bit registers and
adds tests with 64-bit registers.
Patch by Janne Grunau!
Differential Revision: http://reviews.llvm.org/D4279
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213387 91177308-0d34-0410-b5e6-96231b3b80d8
This attribute indicates that the parameter or return pointer is
dereferenceable. Practically speaking, loads from such a pointer within the
associated byte range are safe to speculatively execute. Such pointer
parameters are common in source languages (C++ references, for example).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213385 91177308-0d34-0410-b5e6-96231b3b80d8
Because i16 is illegal, there's no native DAG method to
represent a bitcast to or from an f16 type. This meant LLVM was
inserting a stack store/load pair which is really not ideal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213378 91177308-0d34-0410-b5e6-96231b3b80d8
Actual support for softening f16 operations is still limited, and can be added
when it's needed. But Soften is much closer to being a useful thing to try
than keeping it Legal when no registers can actually hold such values.
Longer term, we probably want something between Soften and Promote semantics
for most targets, since it will be more efficient to promote the four basic
operations to f32 than to libcall them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213372 91177308-0d34-0410-b5e6-96231b3b80d8
The post-indexed instructions were missing the constraint, causing unpredictable STR instructions to be emitted.
The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, as instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions which have the constraint defined. However, it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well, since at some point someone might create instances of them programmatically, and then the constraint is definitely needed.
This fixes PR20323.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213369 91177308-0d34-0410-b5e6-96231b3b80d8
This test is actually going in the opposite direction to what the
filename and function name suggested.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213358 91177308-0d34-0410-b5e6-96231b3b80d8
Unfortunately, we don't seem to have a direct truncation, but the
extension can be legally split into two operations so we should
support that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213357 91177308-0d34-0410-b5e6-96231b3b80d8
Clang may well start emitting these soon, and while it may not be
directly relevant for OpenCL or GLSL, the instructions were just
sitting there waiting to be used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213356 91177308-0d34-0410-b5e6-96231b3b80d8
Since the result of a SETCC for X86 is 0 or -1 in each lane, we can
move unary operations, in this case [su]int_to_fp through the mask
operation and constant fold the operation away. Generally speaking:
UNARYOP(AND(VECTOR_CMP(x,y), constant))
--> AND(VECTOR_CMP(x,y), constant2)
where constant2 is UNARYOP(constant).
This implements the transform where UNARYOP is [su]int_to_fp.
For example, consider the simple function:
define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
  %cmp = fcmp oeq <4 x float> %val, %test
  %ext = zext <4 x i1> %cmp to <4 x i32>
  %result = sitofp <4 x i32> %ext to <4 x float>
  ret <4 x float> %result
}
Before this change, the SSE code is generated as:
LCPI0_0:
.long 1 ## 0x1
.long 1 ## 0x1
.long 1 ## 0x1
.long 1 ## 0x1
.section __TEXT,__text,regular,pure_instructions
.globl _foo
.align 4, 0x90
_foo: ## @foo
cmpeqps %xmm1, %xmm0
andps LCPI0_0(%rip), %xmm0
cvtdq2ps %xmm0, %xmm0
retq
After, the code is improved to:
LCPI0_0:
.long 1065353216 ## float 1.000000e+00
.long 1065353216 ## float 1.000000e+00
.long 1065353216 ## float 1.000000e+00
.long 1065353216 ## float 1.000000e+00
.section __TEXT,__text,regular,pure_instructions
.globl _foo
.align 4, 0x90
_foo: ## @foo
cmpeqps %xmm1, %xmm0
andps LCPI0_0(%rip), %xmm0
retq
The cvtdq2ps has been constant folded away and the floating point 1.0f
vector lanes are materialized directly via the ModRM operand of andps.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213342 91177308-0d34-0410-b5e6-96231b3b80d8
Since the result of a SETCC for AArch64 is 0 or -1 in each lane, we can
move unary operations, in this case [su]int_to_fp through the mask
operation and constant fold the operation away. Generally speaking:
UNARYOP(AND(VECTOR_CMP(x,y), constant))
--> AND(VECTOR_CMP(x,y), constant2)
where constant2 is UNARYOP(constant).
This implements the transform where UNARYOP is [su]int_to_fp.
For example, consider the simple function:
define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
  %cmp = fcmp oeq <4 x float> %val, %test
  %ext = zext <4 x i1> %cmp to <4 x i32>
  %result = sitofp <4 x i32> %ext to <4 x float>
  ret <4 x float> %result
}
Before this change, the code is generated as:
fcmeq.4s v0, v0, v1
movi.4s v1, #0x1 // Integer splat value.
and.16b v0, v0, v1 // Mask lanes based on the comparison.
scvtf.4s v0, v0 // Convert each lane to f32.
ret
After, the code is improved to:
fcmeq.4s v0, v0, v1
fmov.4s v1, #1.00000000 // f32 splat value.
and.16b v0, v0, v1 // Mask lanes based on the comparison.
ret
The scvtf.4s has been constant folded away and the floating point 1.0f
vector lanes are materialized directly via fmov.4s.
Rather than do the folding manually in the target code, teach getNode()
in the generic SelectionDAG to handle folding constant operands of
vector [su]int_to_fp nodes. It is reasonable (as noted in a FIXME) to do
additional constant folding there as well, but I don't have test cases
for those operations, so I'm leaving them for another time when it becomes
appropriate.
rdar://17693791
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213341 91177308-0d34-0410-b5e6-96231b3b80d8
There's a bug where this can create cycles in the DAG. It will take a bit
to fix, so I'm backing it out for now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213339 91177308-0d34-0410-b5e6-96231b3b80d8
We now consider the FPOpFusion flag when determining whether
to fuse ops. We also explicitly emit add.rn when fusion is
disabled to prevent ptxas from fusing the operations on its
own.
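For illustration, a sketch of the PTX difference (hypothetical register
names):
  // fusion allowed: a single fused multiply-add is emitted
  fma.rn.f32  %f4, %f1, %f2, %f3;
  // fusion disabled: explicit .rn rounding on the separate operations keeps
  // ptxas from re-fusing them on its own
  mul.rn.f32  %f4, %f1, %f2;
  add.rn.f32  %f4, %f4, %f3;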
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213287 91177308-0d34-0410-b5e6-96231b3b80d8
There are two parts here. The first is to modify tablegen to adjust the encoding
type ENCODING_RM with the scaling factor.
The second is to use the new encoding types to compute the correct
displacement in the decoder.
Fixes <rdar://problem/17608489>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213281 91177308-0d34-0410-b5e6-96231b3b80d8
Convert the operand to int if possible, i.e. if the value is properly
initialized. (I suppose there is further room for improvement here to also
perform the shift if the uninitialized bits are shifted out.)
With this little change we can now compute the scaling factor for compressed
displacement with pure tablegen code in the X86 backend. This is useful
because both the X86-disassembler-specific part of tablegen and the assembler
need this and TD is the natural sharing place.
The patch also adds the missing documentation for the shift and add operators.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213277 91177308-0d34-0410-b5e6-96231b3b80d8
The header contains an offset to the DWARF abbreviations for the CU. The offset
must be section relative for COFF and absolute for others. The non-assembly
code path for the DWARF header generation already had the correct emission for
the headers. This corrects just the assembly path. Due to the invalid
relocation, processing of the debug information would previously halt on the
first assembly input, because the associated abbreviations would be out of
range: their location would be increased by both the image base and the
section offset. This addresses PR20332.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213275 91177308-0d34-0410-b5e6-96231b3b80d8
This also uses TSFlags to mark machine instructions that are surface/texture
accesses, as well as the vector width for surface operations. This is used
to simplify some of the switch statements that need to detect surface/texture
instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213256 91177308-0d34-0410-b5e6-96231b3b80d8
Previously we asserted on this code. Currently compiler-rt doesn't
actually implement any of these new libcalls, but external help is
pretty much the only viable option for LLVM.
I've followed the much more generic "__truncST2" naming, as opposed to
the odd name for f32 -> f16 truncation. This can obviously be changed
later, or overridden by any targets that need to.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213252 91177308-0d34-0410-b5e6-96231b3b80d8
x86 has no native ability to extend an f16 to f64, but the same result
is obtained if we expand it into two separate extensions: f16 -> f32
-> f64.
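Conceptually (shown here as the equivalent IR, although the actual split
happens during SelectionDAG legalization):
  define double @extend(half %h) {
    %f = fpext half %h to float     ; f16 -> f32
    %d = fpext float %f to double   ; f32 -> f64
    ret double %d
  }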
Unfortunately the same is not true for truncate, so that still results
in a compilation failure.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213251 91177308-0d34-0410-b5e6-96231b3b80d8