Changes

Summary

  1. [flang][runtime] Ensure that 0. <= RANDOM_NUMBER() < 1.
  2. [flang] Don't discard lower bounds of implicit-shape named constants
  3. [BOLT][UTILS] Usability improvements for nfc-check-setup
  4. [flang][runtime] Signal new I/O error on floating-point input overflow
  5. [MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration.
  6. Remove unneeded cl::ZeroOrMore for cl::opt options
  7. Revert "[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration."
Commit f3278e0f3cfeb7cd1519b0b9154ccdc28f3410b5 by pklausler
[flang][runtime] Ensure that 0. <= RANDOM_NUMBER() < 1.

It was possible for RANDOM_NUMBER() to return 1.0.

Differential Revision: https://reviews.llvm.org/D127020
The file was modified flang/runtime/random.cpp
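
To illustrate how a generator can end up returning exactly 1.0, here is a minimal C++ sketch. It is not the code from flang/runtime/random.cpp; the function names `NaiveUnitFloat` and `SafeUnitFloat` are hypothetical, and keeping only 24 bits is just one common way to stay below 1.0, assumed here for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: scale all 32 random bits into a float.
// Converting values near UINT32_MAX to float rounds up to 2^32,
// so the scaled result can be exactly 1.0f.
inline float NaiveUnitFloat(std::uint32_t bits) {
  return std::ldexp(static_cast<float>(bits), -32);
}

// Hypothetical helper: keep only as many bits as the float significand
// holds (24), so the integer converts exactly and the result is < 1.0f.
inline float SafeUnitFloat(std::uint32_t bits) {
  constexpr int kDigits = std::numeric_limits<float>::digits;  // 24 for IEEE float
  return std::ldexp(static_cast<float>(bits >> (32 - kDigits)), -kDigits);
}
```

With the largest input, the naive version lands on the excluded endpoint while the truncating version stays inside the half-open interval.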
Commit 08c6a323813d44f4cb382e8f1c53ce48d32f5698 by pklausler
[flang] Don't discard lower bounds of implicit-shape named constants

F18 preserves lower bounds of explicit-shape named constant arrays, but
failed to also do so for implicit-shape named constants.  Fix.

Differential Revision: https://reviews.llvm.org/D127021
The file was modified flang/lib/Evaluate/check-expression.cpp
Commit b346af6d4497fd3b2454ae522f2f7694827a4570 by aaupov
[BOLT][UTILS] Usability improvements for nfc-check-setup

1. Stash local changes before checkout.
2. Print a message that the source repository revision has been changed, with
   instructions to switch back.
3. Make the script executable.
4. Print sample instructions on how to run BOLT tests.
5. Assume that the llvm-bolt-wrapper script is in the same source directory.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D126941
The file was modified bolt/utils/nfc-check-setup.py
Commit 9c54d76251163ecf5c56ce984542d3bcf36a3c16 by pklausler
[flang][runtime] Signal new I/O error on floating-point input overflow

Besides raising the IEEE floating-point overflow exception, treat
a floating-point overflow on input as an I/O error catchable with
ERR=, IOSTAT=, and/or IOMSG=.

Differential Revision: https://reviews.llvm.org/D127022
The file was modified flang/include/flang/Runtime/iostat.h
The file was modified flang/runtime/iostat.cpp
The file was modified flang/runtime/edit-input.cpp
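
The idea of reporting input overflow as a distinct, catchable status can be sketched in C++ with the standard library. This is only an analogy: the flang runtime uses its own decimal conversion and IOSTAT values, and the names `ParseStatus` and `ParseReal` below are hypothetical.

```cpp
#include <cerrno>
#include <cmath>
#include <cstdlib>

// Hypothetical status codes, standing in for an IOSTAT-style result.
enum class ParseStatus { Ok, Overflow, Invalid };

// Parse a real number and report overflow separately from success.
// std::strtod signals out-of-range input by setting errno to ERANGE;
// an infinite result in that case means the magnitude was too large.
inline ParseStatus ParseReal(const char *text, double &value) {
  errno = 0;
  char *end = nullptr;
  value = std::strtod(text, &end);
  if (end == text) {
    return ParseStatus::Invalid;  // no characters were consumed
  }
  if (errno == ERANGE && std::isinf(value)) {
    return ParseStatus::Overflow;  // caller can treat this as an I/O error
  }
  return ParseStatus::Ok;
}
```

A caller can branch on the returned status much as Fortran code branches on IOSTAT=, instead of silently accepting an infinity.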
Commit bcfc0a9051014437b55ab932d9aca5ecdca6776b by csigg
[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration.

This is correct for all values, i.e. it gives the same results as promoting the division to fp32 in the NVPTX backend, but it is faster (~10% on average, sometimes more) because:

- it performs fewer Newton iterations
- it avoids the slow path for e.g. denormals
- it allows reuse of the reciprocal for multiple divisions by the same divisor

Test program:
```
#include <stdio.h>
#include "cuda_fp16.h"

// This is a variant of CUDA's own __hdiv which is faster than hdiv_promote below
// and doesn't suffer from the perf cliff of div.rn.fp32 with 'special' values.
__device__ half hdiv_newton(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float rcp;
  asm("{rcp.approx.ftz.f32 %0, %1;\n}" : "=f"(rcp) : "f"(fb));

  float result = fa * rcp;
  auto exponent = reinterpret_cast<const unsigned&>(result) & 0x7f800000;
  if (exponent != 0 && exponent != 0x7f800000) {
    float err = __fmaf_rn(-fb, result, fa);
    result = __fmaf_rn(rcp, err, result);
  }

  return __float2half(result);
}

// Surprisingly, this is faster than CUDA's own __hdiv.
__device__ half hdiv_promote(half a, half b) {
  return __float2half(__half2float(a) / __half2float(b));
}

// This is an approximation that is accurate up to 1 ulp.
__device__ half hdiv_approx(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float result;
  asm("{div.approx.ftz.f32 %0, %1, %2;\n}" : "=f"(result) : "f"(fa), "f"(fb));
  return __float2half(result);
}

__global__ void CheckCorrectness() {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  half x = reinterpret_cast<const half&>(i);
  for (int j = 0; j < 65536; ++j) {
    half y = reinterpret_cast<const half&>(j);
    half d1 = hdiv_newton(x, y);
    half d2 = hdiv_promote(x, y);
    auto s1 = reinterpret_cast<const short&>(d1);
    auto s2 = reinterpret_cast<const short&>(d2);
    if (s1 != s2) {
      printf("%f (%u) / %f (%u), got %f (%hu), expected: %f (%hu)\n",
             __half2float(x), i, __half2float(y), j, __half2float(d1), s1,
             __half2float(d2), s2);
      //__trap();
    }
  }
}

__device__ half dst;

__global__ void ProfileBuiltin(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = x / x;
  }
  dst = x;
}

__global__ void ProfilePromote(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_promote(x, x);
  }
  dst = x;
}

__global__ void ProfileNewton(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_newton(x, x);
  }
  dst = x;
}

__global__ void ProfileApprox(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_approx(x, x);
  }
  dst = x;
}

int main() {
  CheckCorrectness<<<256, 256>>>();
  half one = __float2half(1.0f);
  ProfileBuiltin<<<1, 1>>>(one);  // 1.001s
  ProfilePromote<<<1, 1>>>(one);  // 0.560s
  ProfileNewton<<<1, 1>>>(one);   // 0.508s
  ProfileApprox<<<1, 1>>>(one);   // 0.304s
  auto status = cudaDeviceSynchronize();
  printf("%s\n", cudaGetErrorString(status));
}
```
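
The correction step inside hdiv_newton above can also be tried on the host. The following is a minimal C++ sketch, not part of the commit: `RefineQuotient` is a hypothetical name, `std::fma` stands in for `__fmaf_rn`, the `rcp` argument stands in for the output of `rcp.approx.ftz.f32`, and the exponent check that skips zero, denormal, infinite, and NaN results is omitted.

```cpp
#include <cmath>

// One Newton-style correction of a quotient built from an approximate
// reciprocal. If rcp = (1 + d) / b, the corrected quotient has relative
// error of about d * d instead of d.
inline float RefineQuotient(float a, float b, float rcp) {
  float q = a * rcp;               // initial estimate of a / b
  float err = std::fma(-b, q, a);  // residual a - b * q with a single rounding
  return std::fma(rcp, err, q);    // corrected quotient q + rcp * err
}
```

Feeding in a reciprocal that is deliberately off by about 0.1% shows the refined quotient landing much closer to the true value than the plain product does.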

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D126158
The file was modified mlir/lib/Dialect/LLVMIR/Transforms/CMakeLists.txt
The file was modified mlir/include/mlir/Dialect/LLVMIR/Transforms/Passes.h
The file was added mlir/test/Dialect/LLVMIR/optimize-for-nvvm.mlir
The file was modified mlir/include/mlir/Dialect/LLVMIR/Transforms/Passes.td
The file was modified utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
The file was added mlir/include/mlir/Dialect/LLVMIR/Transforms/OptimizeForNVVM.h
The file was added mlir/lib/Dialect/LLVMIR/Transforms/OptimizeForNVVM.cpp
The file was modified mlir/test/Dialect/LLVMIR/nvvm.mlir
The file was modified mlir/test/Target/LLVMIR/nvvmir.mlir
The file was modified mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
Commit 36c7d79dc4c114728b5f003bf48cd7a41bf932a4 by i
Remove unneeded cl::ZeroOrMore for cl::opt options

Similar to 557efc9a8b68628c2c944678c6471dac30ed9e8e.
This commit handles options where cl::ZeroOrMore is more than one line below
cl::opt.
The file was modified bolt/lib/Rewrite/RewriteInstance.cpp
The file was modified polly/lib/Support/RegisterPasses.cpp
The file was modified llvm/lib/CodeGen/MachinePipeliner.cpp
The file was modified llvm/lib/Target/Hexagon/HexagonVectorLoopCarriedReuse.cpp
The file was modified polly/lib/Transform/Canonicalization.cpp
The file was modified llvm/lib/Analysis/InlineCost.cpp
The file was modified polly/lib/CodeGen/IslNodeBuilder.cpp
The file was modified polly/lib/Analysis/DependenceInfo.cpp
The file was modified polly/lib/Transform/ScheduleOptimizer.cpp
The file was modified llvm/lib/Transforms/Utils/CodeLayout.cpp
The file was modified llvm/tools/llvm-pdbutil/llvm-pdbutil.cpp
The file was modified llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
The file was modified bolt/lib/RuntimeLibs/HugifyRuntimeLibrary.cpp
The file was modified llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp
The file was modified polly/lib/CodeGen/BlockGenerators.cpp
The file was modified llvm/examples/OrcV2Examples/LLJITWithObjectLinkingLayerPlugin/LLJITWithObjectLinkingLayerPlugin.cpp
The file was modified bolt/lib/Passes/IndirectCallPromotion.cpp
The file was modified llvm/lib/Target/Hexagon/HexagonFrameLowering.cpp
The file was modified llvm/lib/ProfileData/ProfileSummaryBuilder.cpp
The file was modified llvm/lib/Transforms/IPO/PartialInlining.cpp
The file was modified bolt/lib/Rewrite/BinaryPassManager.cpp
The file was modified polly/lib/Analysis/PolyhedralInfo.cpp
The file was modified llvm/tools/llvm-lto/llvm-lto.cpp
The file was modified polly/lib/Transform/ManualOptimizer.cpp
The file was modified polly/lib/Transform/MatmulOptimizer.cpp
The file was modified llvm/lib/Target/Hexagon/HexagonGenInsert.cpp
The file was modified llvm/lib/Transforms/Instrumentation/PGOMemOPSizeOpt.cpp
The file was modified llvm/tools/llvm-profgen/llvm-profgen.cpp
The file was modified bolt/lib/Passes/TailDuplication.cpp
The file was modified llvm/tools/llvm-cov/CodeCoverage.cpp
The file was modified llvm/lib/Analysis/IndirectCallPromotionAnalysis.cpp
The file was modified llvm/tools/llc/llc.cpp
The file was modified llvm/lib/Target/AArch64/AArch64StackTagging.cpp
The file was modified llvm/tools/llvm-lto2/llvm-lto2.cpp
The file was modified polly/lib/CodeGen/ManagedMemoryRewrite.cpp
The file was modified bolt/lib/Utils/CommandLineOpts.cpp
The file was modified polly/lib/Analysis/ScopBuilder.cpp
The file was modified polly/lib/CodeGen/IslAst.cpp
The file was modified polly/lib/Support/SCEVAffinator.cpp
The file was modified polly/lib/Analysis/ScopInfo.cpp
The file was modified llvm/lib/Support/Debug.cpp
The file was modified llvm/tools/llvm-libtool-darwin/llvm-libtool-darwin.cpp
The file was modified polly/lib/Analysis/ScopGraphPrinter.cpp
The file was modified polly/lib/CodeGen/CodeGeneration.cpp
The file was modified llvm/lib/Transforms/Scalar/LoopFuse.cpp
The file was modified polly/lib/Analysis/ScopDetection.cpp
The file was modified polly/lib/CodeGen/PPCGCodeGeneration.cpp
Commit 369ce54bb302f209239b8ebc77ad824add9df089 by joker.eph
Revert "[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration."

This reverts commit bcfc0a9051014437b55ab932d9aca5ecdca6776b.

The build is broken when shared libraries are enabled.
The file was modified mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
The file was modified mlir/test/Target/LLVMIR/nvvmir.mlir
The file was modified utils/bazel/llvm-project-overlay/mlir/BUILD.bazel
The file was modified mlir/include/mlir/Dialect/LLVMIR/Transforms/Passes.td
The file was removed mlir/test/Dialect/LLVMIR/optimize-for-nvvm.mlir
The file was modified mlir/lib/Dialect/LLVMIR/Transforms/CMakeLists.txt
The file was removed mlir/lib/Dialect/LLVMIR/Transforms/OptimizeForNVVM.cpp
The file was removed mlir/include/mlir/Dialect/LLVMIR/Transforms/OptimizeForNVVM.h
The file was modified mlir/include/mlir/Dialect/LLVMIR/Transforms/Passes.h
The file was modified mlir/test/Dialect/LLVMIR/nvvm.mlir