  1. [Polly][PPCGCodeGen] OpenCL now gets kernel argument size from PPCG CodeGen

    Summary: PPCGCodeGeneration now appends the sizes of the kernel launch parameters to the end of the parameter list. The existing CUDA runtime ignores these extra entries, but the OpenCL runtime knows to check for the kernel-argument sizes at the end of the parameter list. (The resulting parameter list is twice as long; the corresponding test cases have been updated to account for this.)
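
A minimal sketch of the layout described in the summary above, assuming one size entry per launch parameter (which is what "twice as long" suggests). The names `packKernelArgs` and `totalArgBytes` are hypothetical and do not come from Polly; the sketch only illustrates why a consumer that reads the first N slots is unaffected while a consumer that needs sizes can find them in the last N slots.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Polly's API): builds a parameter list with 2*N
// slots. Slots [0, N) point at the argument values; slots [N, 2*N) point at
// the size in bytes of the corresponding argument.
// The caller owns both Values' pointees and the Sizes storage, which must
// outlive the returned list.
std::vector<void *> packKernelArgs(const std::vector<void *> &Values,
                                   std::vector<int> &Sizes) {
  assert(Values.size() == Sizes.size() && "one size per argument");
  const std::size_t N = Values.size();
  std::vector<void *> Params(2 * N);
  for (std::size_t I = 0; I < N; ++I) {
    Params[I] = Values[I];      // value pointer, as before
    Params[N + I] = &Sizes[I];  // appended size entry
  }
  return Params;
}

// Hypothetical consumer that needs per-argument sizes: it reads them from
// the second half of the list. A consumer that only needs the value
// pointers would simply read slots [0, N) and never touch the rest.
int totalArgBytes(const std::vector<void *> &Params, std::size_t N) {
  int Total = 0;
  for (std::size_t I = 0; I < N; ++I)
    Total += *static_cast<int *>(Params[N + I]);
  return Total;
}
```

For context, `clSetKernelArg` takes an explicit argument size for each kernel argument, whereas `cuLaunchKernel` takes only an array of pointers to the arguments, which is consistent with only the OpenCL side needing the appended sizes.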

  2. Introduce experimental generic intrinsics for horizontal vector reductions.

    - This change allows targets to opt in to using these intrinsics instead of
      the log2 shufflevector algorithm.
    - The common shuffle-reduction code of the SLP and Loop vectorizers is
      factored out into LoopUtils, so both now share a unified interface for
      generating reductions regardless of the target's preference. LoopUtils
      now uses TTI to determine what kind of reductions the target wants to handle.
    - For CodeGen, basic legalization support is added.
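
The log2 shuffle scheme mentioned in the second entry can be sketched on plain arrays. This is a scalar emulation for illustration only, not LLVM code: `shuffleReduceAdd`, `singleOpReduceAdd`, `reduceAdd`, and the `TargetPrefersIntrinsic` flag are hypothetical names, with the flag standing in for whatever the target reports through TTI.

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// Emulates the log2 shuffle reduction: in each step the upper half of the
// live lanes is "shuffled" down onto the lower half and added, halving the
// number of live lanes. After log2(N) steps lane 0 holds the sum.
template <std::size_t N>
int shuffleReduceAdd(std::array<int, N> V) {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
  for (std::size_t Half = N / 2; Half >= 1; Half /= 2)
    for (std::size_t I = 0; I < Half; ++I)
      V[I] += V[I + Half];  // lane I + Half is moved onto lane I and added
  return V[0];              // corresponds to extracting element 0
}

// Stand-in for a single horizontal-reduction operation that a target
// would lower itself.
template <std::size_t N>
int singleOpReduceAdd(const std::array<int, N> &V) {
  return std::accumulate(V.begin(), V.end(), 0);
}

// Unified entry point: callers do not care which strategy is used.
template <std::size_t N>
int reduceAdd(const std::array<int, N> &V, bool TargetPrefersIntrinsic) {
  return TargetPrefersIntrinsic ? singleOpReduceAdd(V) : shuffleReduceAdd(V);
}
```

Both paths produce the same result for integer addition; the difference is only in how many operations are emitted and who decides how the reduction is lowered.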

