Changes

Summary

  1. [RISCV] Fix misleading formatting and remove a dead getNode call. NFC
  2. [libc++] Re-add transitive includes that had been removed since LLVM 14
  3. [LiveInterval] Simplify with partition_point. NFC
  4. [RISCV] Zero extend immediate for vget/vset builtins to match vector.insert/extract intrinsics.
  5. [RISCV] Fix the problem of parsing long version numbers
  6. [RISCV] Optimize 2x SELECT for floating-point types
  7. [lldb] [test] Mark test_vCont_supports_t llgs-only
  8. [LoopInterchange] New cost model for loop interchange
  9. [mlir][Vector] Fix reordering of floating point adds during lower of `vector.contract`.
  10. [CoverageMapping] Remove dots from paths inside the profile
  11. Revert "[CoverageMapping] Remove dots from paths inside the profile"
  12. [CSSPGO][llvm-profgen] Reimplement CS profile generator using context trie
  13. [CSSPGO][llvm-profgen] Reimplement computeSummaryAndThreshold using context trie
  14. [CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie
  15. Reland "[X86] Support `_Float16` on SSE2 and up"
Commit ea1b86127814aff54b2ab821db060865af920437 by craig.topper
[RISCV] Fix misleading formatting and remove a dead getNode call. NFC
The file was modified llvm/lib/Target/RISCV/RISCVISelLowering.cpp
Commit de4a57cb21a19179d7be830967e642b868a05a91 by Louis Dionne
[libc++] Re-add transitive includes that had been removed since LLVM 14

This commit re-adds transitive includes that had been removed by
4cd04d1687f1, c36870c8e79c, a83f4b9cda57, 1458458b558d, 2e2f3158c604,
and 489637e66dd3. This should cover almost all the includes that had
been removed since LLVM 14 and that would contribute to breaking user
code when releasing LLVM 15.

It is possible to disable the inclusion of these headers by defining
_LIBCPP_REMOVE_TRANSITIVE_INCLUDES. The intent is that vendors will
enable that macro and start fixing downstream issues immediately. We
can then remove the macro (and the transitive includes) by default in
a future release. That way, we will break users only once by removing
transitive includes in bulk instead of doing it bit by bit at every
release, which is more disruptive for users.
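For user code, the practical takeaway is to include every header whose names it uses, rather than relying on one libc++ header to pull in another. A minimal sketch (the function name is illustrative only); code written this way should keep compiling when libc++ is used with `-D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES`, whichever transitive includes a given release drops:

```cpp
// Each header below is included for a name it provides, so this file does
// not depend on transitive includes that libc++ may remove in the future.
#include <algorithm>  // std::sort
#include <cassert>
#include <string>     // std::string
#include <vector>     // std::vector

// Return the words sorted lexicographically.
std::vector<std::string> sortedWords(std::vector<std::string> words) {
  std::sort(words.begin(), words.end());
  return words;
}
```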

Note 1: The set of headers to re-add was found by re-generating the
        transitive include test on a checkout of release/14.x, which
        provided the list of all transitive includes we used to provide.

Note 2: Several includes of <vector>, <optional>, <array> and <unordered_map>
        have been added in this commit. These transitive inclusions were
        added when we implemented boyer_moore_searcher in <functional>.
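For context, `boyer_moore_searcher` is used through the searcher overload of `std::search` (C++17). A small usage sketch; `findWithBoyerMoore` is a hypothetical helper, and the headers it needs are included explicitly rather than relying on `<functional>` to provide them:

```cpp
#include <algorithm>    // std::search (searcher overload, C++17)
#include <cassert>
#include <cstddef>
#include <functional>   // std::boyer_moore_searcher
#include <string_view>

// Return the index of the first occurrence of `needle` in `haystack`,
// or std::string_view::npos if there is none.
std::size_t findWithBoyerMoore(std::string_view haystack, std::string_view needle) {
  if (needle.empty())
    return 0;  // an empty pattern matches at the start
  std::boyer_moore_searcher searcher(needle.begin(), needle.end());
  auto it = std::search(haystack.begin(), haystack.end(), searcher);
  if (it == haystack.end())
    return std::string_view::npos;
  return static_cast<std::size_t>(it - haystack.begin());
}
```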

Note 3: This is a best effort patch to try and resolve downstream breakage
        caused since branching LLVM 14. I wasn't able to perfectly mirror
        transitive includes in LLVM 14 for a few headers, so I added a
        release note explaining it. To summarize, adding boyer_moore_searcher
        created a bunch of circular dependencies, so we have to break
        backwards compatibility in a few cases.

Differential Revision: https://reviews.llvm.org/D128661
The file was modified libcxx/include/optional
The file was added libcxx/cmake/caches/Generic-no-transitive-includes.cmake
The file was modified libcxx/test/libcxx/transitive_includes/expected.numeric
The file was modified libcxx/include/typeindex
The file was modified libcxx/include/coroutine
The file was modified libcxx/test/libcxx/transitive_includes/expected.typeindex
The file was modified libcxx/test/libcxx/transitive_includes/expected.list
The file was modified libcxx/test/libcxx/transitive_includes/expected.complex
The file was modified libcxx/test/libcxx/transitive_includes/expected.experimental_algorithm
The file was modified libcxx/include/set
The file was modified libcxx/test/libcxx/transitive_includes/expected.memory
The file was modified libcxx/include/functional
The file was modified libcxx/include/map
The file was modified libcxx/test/libcxx/transitive_includes/expected.experimental_regex
The file was modified libcxx/utils/ci/run-buildbot
The file was modified libcxx/test/libcxx/transitive_includes/expected.experimental_unordered_map
The file was modified libcxx/test/libcxx/transitive_includes/expected.string
The file was modified libcxx/test/libcxx/transitive_includes/expected.bitset
The file was modified libcxx/test/libcxx/transitive_includes/expected.future
The file was modified libcxx/test/libcxx/transitive_includes/expected.experimental_vector
The file was modified libcxx/include/stack
The file was modified libcxx/include/ext/hash_set
The file was modified libcxx/test/libcxx/transitive_includes/expected.ranges
The file was modified libcxx/include/utility
The file was modified libcxx/test/libcxx/transitive_includes/expected.coroutine
The file was modified libcxx/test/libcxx/transitive_includes/expected.iomanip
The file was modified libcxx/test/libcxx/transitive_includes/expected.ccomplex
The file was modified libcxx/test/libcxx/transitive_includes/expected.locale
The file was modified libcxx/include/future
The file was modified libcxx/test/libcxx/transitive_includes/expected.experimental_unordered_set
The file was modified libcxx/test/libcxx/transitive_includes/expected.thread
The file was modified libcxx/test/libcxx/transitive_includes/expected.mutex
The file was modified libcxx/test/libcxx/transitive_includes/expected.experimental_iterator
The file was modified libcxx/include/array
The file was modified libcxx/test/libcxx/transitive_includes/expected.experimental_string
The file was modified libcxx/test/libcxx/transitive_includes/expected.fstream
The file was modified libcxx/test/libcxx/transitive_includes/expected.atomic
The file was modified libcxx/test/libcxx/transitive_includes/expected.latch
The file was modified libcxx/include/experimental/unordered_map
The file was modified libcxx/include/string
The file was modified libcxx/test/libcxx/transitive_includes/expected.set
The file was modified libcxx/test/libcxx/transitive_includes/expected.ios
The file was modified libcxx/test/libcxx/transitive_includes/expected.array
The file was modified libcxx/test/libcxx/transitive_includes/expected.experimental_forward_list
The file was modified libcxx/test/libcxx/transitive_includes/expected.charconv
The file was modified libcxx/test/libcxx/transitive_includes/expected.system_error
The file was modified libcxx/test/libcxx/transitive_includes/expected.strstream
The file was modified libcxx/test/libcxx/transitive_includes/expected.ctgmath
The file was modified libcxx/test/libcxx/transitive_includes/expected.deque
The file was modified libcxx/include/forward_list
The file was modified libcxx/test/libcxx/transitive_includes/expected.iostream
The file was modified libcxx/test/libcxx/transitive_includes/expected.experimental_deque
The file was modified libcxx/include/string_view
The file was modified libcxx/test/libcxx/transitive_includes/expected.experimental_utility
The file was modified libcxx/include/bit
The file was modified libcxx/test/libcxx/transitive_includes/expected.forward_list
The file was modified libcxx/test/libcxx/transitive_includes/expected.unordered_set
The file was modified libcxx/include/atomic
The file was modified libcxx/test/libcxx/transitive_includes/expected.ostream
The file was modified libcxx/test/libcxx/transitive_includes/expected.span
The file was modified libcxx/test/libcxx/transitive_includes/expected.queue
The file was modified libcxx/test/libcxx/transitive_includes/expected.string_view
The file was modified libcxx/test/libcxx/transitive_includes/expected.experimental_simd
The file was modified libcxx/test/libcxx/transitive_includes/expected.any
The file was modified libcxx/test/libcxx/transitive_includes/expected.ext_hash_set
The file was modified libcxx/include/charconv
The file was modified libcxx/include/any
The file was modified libcxx/test/libcxx/transitive_includes/expected.streambuf
The file was modified libcxx/include/memory
The file was modified libcxx/include/span
The file was modified libcxx/test/libcxx/transitive_includes/expected.variant
The file was modified libcxx/include/deque
The file was modified libcxx/test/libcxx/transitive_includes/expected.experimental_functional
The file was modified libcxx/test/libcxx/transitive_includes/expected.format
The file was modified libcxx/include/variant
The file was modified libcxx/include/vector
The file was modified libcxx/test/libcxx/transitive_includes/expected.random
The file was modified libcxx/test/libcxx/transitive_includes/expected.istream
The file was modified libcxx/include/ext/hash_map
The file was modified libcxx/include/ostream
The file was modified libcxx/include/unordered_map
The file was modified libcxx/test/libcxx/transitive_includes/expected.stack
The file was modified libcxx/include/experimental/simd
The file was modified libcxx/test/libcxx/transitive_includes/expected.optional
The file was modified libcxx/test/libcxx/transitive_includes/expected.filesystem
The file was modified libcxx/test/libcxx/transitive_includes/expected.vector
The file was modified libcxx/test/libcxx/transitive_includes/expected.barrier
The file was modified libcxx/test/libcxx/transitive_includes/expected.experimental_map
The file was modified libcxx/include/random
The file was modified libcxx/test/libcxx/transitive_includes/expected.codecvt
The file was modified libcxx/include/queue
The file was modified libcxx/test/libcxx/transitive_includes/expected.map
The file was modified libcxx/include/thread
The file was modified libcxx/test/libcxx/transitive_includes/expected.experimental_coroutine
The file was modified libcxx/test/libcxx/transitive_includes/expected.utility
The file was modified libcxx/test/libcxx/transitive_includes/expected.bit
The file was modified libcxx/test/libcxx/transitive_includes/expected.experimental_memory_resource
The file was modified libcxx/include/locale
The file was modified libcxx/include/tuple
The file was modified libcxx/test/libcxx/transitive_includes/expected.sstream
The file was modified libcxx/test/libcxx/transitive_includes/expected.tuple
The file was modified libcxx/include/unordered_set
The file was modified libcxx/test/libcxx/transitive_includes/expected.scoped_allocator
The file was modified libcxx/test/libcxx/transitive_includes/expected.unordered_map
The file was modified libcxx/test/libcxx/transitive_includes/expected.iterator
The file was modified libcxx/include/list
The file was modified libcxx/test/libcxx/transitive_includes/expected.functional
The file was modified libcxx/test/libcxx/transitive_includes/expected.ext_hash_map
The file was modified libcxx/test/libcxx/transitive_includes/expected.regex
The file was modified libcxx/test/libcxx/transitive_includes/expected.valarray
The file was modified libcxx/include/algorithm
The file was modified libcxx/test/libcxx/transitive_includes/expected.experimental_set
The file was modified libcxx/test/libcxx/transitive_includes/expected.shared_mutex
The file was modified libcxx/test/libcxx/transitive_includes/expected.experimental_list
The file was modified libcxx/include/mutex
The file was modified libcxx/include/regex
The file was modified libcxx/docs/ReleaseNotes.rst
The file was modified libcxx/utils/ci/buildkite-pipeline.yml
The file was modified libcxx/include/numeric
The file was modified libcxx/test/libcxx/transitive_includes/expected.condition_variable
The file was modified libcxx/utils/libcxx/test/params.py
The file was modified libcxx/test/libcxx/transitive_includes/expected.semaphore
The file was modified libcxx/include/valarray
The file was modified libcxx/include/iterator
The file was modified libcxx/test/libcxx/transitive_includes/expected.algorithm
The file was modified libcxx/test/libcxx/transitive_includes.sh.cpp
Commit f1e27716cf218d9923235f5f28e663a8da83ce89 by i
[LiveInterval] Simplify with partition_point. NFC
The file was modified llvm/lib/CodeGen/LiveInterval.cpp
Commit 17a36c7c40e99aa28d4323698f69845d92d96682 by craig.topper
[RISCV] Zero extend immediate for vget/vset builtins to match vector.insert/extract intrinsics.

The vector.insert/extract intrinsics require an i64 immediate argument.
This fixes a crash on RV32.
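The change widens the builtin's index immediate to i64 with a zero extension. As a plain C++ illustration of what that choice means (the helpers are hypothetical, not clang code): zero and sign extension agree for small non-negative indices and only diverge when bit 31 of the 32-bit value is set.

```cpp
#include <cassert>
#include <cstdint>

// Zero extension: the upper 32 bits of the result are always 0.
std::uint64_t zext32to64(std::uint32_t v) {
  return static_cast<std::uint64_t>(v);
}

// Sign extension: bit 31 is copied into the upper 32 bits.
std::uint64_t sext32to64(std::uint32_t v) {
  return static_cast<std::uint64_t>(
      static_cast<std::int64_t>(static_cast<std::int32_t>(v)));
}
```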

Differential Revision: https://reviews.llvm.org/D128624
The file was modified clang/test/CodeGen/RISCV/rvv-intrinsics/vget.c
The file was modified clang/test/CodeGen/RISCV/rvv-intrinsics/vset.c
The file was modified clang/include/clang/Basic/riscv_vector.td
Commit 1919adb19b4a0aeaef7f3b75a90062166b58b261 by sunshaoce
[RISCV] Fix the problem of parsing long version numbers

For example, when parsing Zbpbo0p911, the following error was reported:
"multi-character extensions must be separated by underscores"
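A version suffix has the form `<major>p<minor>`, and either number may have several digits. The sketch below is a hypothetical helper, not the actual `RISCVISAInfo` parser: it reads the version from the end of a single extension token, so a multi-digit minor version such as `911` is consumed as a whole, and digits that belong to the name (as in `zve32x`) are left alone. It does not check for numeric overflow.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

struct ParsedExtension {
  std::string name;
  std::optional<unsigned> majorVersion;
  std::optional<unsigned> minorVersion;
};

// Remove a run of trailing decimal digits from `s` and return it.
static std::string_view takeTrailingDigits(std::string_view &s) {
  std::size_t i = s.size();
  while (i > 0 && std::isdigit(static_cast<unsigned char>(s[i - 1])))
    --i;
  std::string_view digits = s.substr(i);
  s = s.substr(0, i);
  return digits;
}

static unsigned toUnsigned(std::string_view digits) {
  unsigned value = 0;
  for (char c : digits)
    value = value * 10 + static_cast<unsigned>(c - '0');
  return value;
}

// Split e.g. "zbpbo0p911" into {"zbpbo", 0, 911}.
ParsedExtension parseExtension(std::string_view ext) {
  std::string_view rest = ext;
  std::string_view last = takeTrailingDigits(rest);
  if (last.empty())  // no version suffix at all
    return {std::string(ext), std::nullopt, std::nullopt};
  // "<major>p<minor>": the 'p' must be preceded by at least one digit.
  if (!rest.empty() && rest.back() == 'p') {
    std::string_view beforeP = rest.substr(0, rest.size() - 1);
    std::string_view majorDigits = takeTrailingDigits(beforeP);
    if (!majorDigits.empty())
      return {std::string(beforeP), toUnsigned(majorDigits), toUnsigned(last)};
  }
  // Only a major version, e.g. "zbp2".
  return {std::string(rest), toUnsigned(last), std::nullopt};
}
```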

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128644
The file was modified llvm/lib/Support/RISCVISAInfo.cpp
Commit 1178992c72b002c3b2c87203252c566eeb273cc1 by chunyu
[RISCV] Optimize 2x SELECT for floating-point types

This covers the following opcodes:
Select_FPR16_Using_CC_GPR
Select_FPR32_Using_CC_GPR
Select_FPR64_Using_CC_GPR
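The targeted pattern is two floating-point selects that share one integer condition. A source-level sketch of such a pattern (hypothetical function; the commit's own tests live in `select-optimize-multiple.ll`). When the two resulting `Select_FPR32_Using_CC_GPR` pseudos end up adjacent, the change should let them share one conditional branch instead of each being expanded separately; whether they do depends on instruction selection.

```cpp
#include <cassert>
#include <utility>

// Two float selects driven by the same integer comparison (a < b).
std::pair<float, float> twoSelects(int a, int b, float x, float y, float z) {
  bool cond = a < b;
  float r0 = cond ? x : y;
  float r1 = cond ? y : z;
  return {r0, r1};
}
```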

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127871
The file was modified llvm/test/CodeGen/RISCV/select-optimize-multiple.ll
The file was modified llvm/lib/Target/RISCV/RISCVISelLowering.cpp
Commit f1dcc6af30d98cef6d0aa9579148fa223dbb5d7c by mgorny
[lldb] [test] Mark test_vCont_supports_t llgs-only

Sponsored by: The FreeBSD Foundation
The file was modified lldb/test/API/tools/lldb-server/TestGdbRemote_vCont.py
Commit b941857b40edd7f3f3a9ec2ec85a26db24739774 by congzhecao
[LoopInterchange] New cost model for loop interchange

This is another attempt to land this patch.

This patch proposes a new cost model for loop interchange, obtained
from loop cache analysis.

Given a loop nest, loop cache analysis returns a vector of loops
[loop0, loop1, loop2, ...] where loop0 should be placed as the
outermost loop, loop1 one level inside, loop2 one more level inside,
and so on. Loop cache analysis is not only more comprehensive than the
current cost model, it is also a "one-shot" query: we only need to
query it once during the entire loop interchange pass, whereas the
current cost model is queried every time we check whether it is
profitable to interchange two loops. This reduces complexity,
especially after D120386, where we do more interchanges to reach the
globally optimal loop access pattern.
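The one-shot query can be sketched as follows: the preferred order is looked up once and turned into a rank per loop, and adjacent loops are then swapped until the nest matches that order. This is a hypothetical simplification, not the pass's code: loops are plain strings, `preferred` is assumed to be a permutation of `nest`, and the legality (dependence) checks the real pass performs are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// `nest` lists loops outermost-first as they currently appear; `preferred`
// is the outermost-first order returned once by the cache-cost query.
std::vector<std::string> interchangeByRank(std::vector<std::string> nest,
                                           const std::vector<std::string> &preferred) {
  // Query result is consumed once and turned into a rank per loop.
  std::unordered_map<std::string, std::size_t> rank;
  for (std::size_t i = 0; i < preferred.size(); ++i)
    rank[preferred[i]] = i;

  // Bubble-style passes from the innermost pair outwards: swap two adjacent
  // loops whenever the inner one should sit further out.
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = nest.size(); i-- > 1;) {
      if (rank.at(nest[i]) < rank.at(nest[i - 1])) {
        std::swap(nest[i], nest[i - 1]);
        changed = true;
      }
    }
  }
  return nest;
}
```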

Updates to the test cases are mostly minor changes and some
corrections. One change that applies to all tests is the added option
`-cache-line-size=64` in the RUN lines. This ensures that loop cache
analysis receives a valid cache line size for correct analysis. Test
coverage for loop interchange is not reduced.
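Roughly, loop cache analysis estimates how many cache lines each memory reference touches, which is why it needs a valid cache line size. A simplified sketch of such a per-reference cost (the exact formula and the rounding up here are assumptions for illustration, not LLVM's implementation): a reference whose stride is smaller than a cache line touches about `TripCount * Stride / CLS` lines, otherwise about one new line per iteration.

```cpp
#include <cassert>
#include <cstdint>

// Approximate number of cache lines touched by one reference over a loop.
std::uint64_t refCost(std::uint64_t tripCount, std::uint64_t strideBytes,
                      std::uint64_t cacheLineSize) {
  assert(cacheLineSize > 0 && "cost is undefined without a cache line size");
  if (strideBytes < cacheLineSize)  // several iterations share one line
    return (tripCount * strideBytes + cacheLineSize - 1) / cacheLineSize;
  return tripCount;  // every iteration touches a new line
}
```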

Currently we do not completely remove the legacy cost model, but keep
it as a fallback in case the new cost model does not run successfully.
This is because of current limitations in delinearization, which
sometimes make loop cache analysis bail out. The longer-term goal is to
enhance delinearization and eventually remove the legacy cost model
completely.
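The fallback arrangement can be pictured as below. Both model functions are hypothetical placeholders: the new model returns `std::nullopt` when it bails out (for example because delinearization failed), and only then is the legacy model consulted.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Prefer the new model's verdict; fall back to the legacy model if it bailed out.
bool isProfitable(const std::function<std::optional<bool>()> &newModel,
                  const std::function<bool()> &legacyModel) {
  if (std::optional<bool> verdict = newModel())
    return *verdict;
  return legacyModel();
}
```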

Reviewed By: bmahjour, #loopoptwg

Differential Revision: https://reviews.llvm.org/D124926
The file was modified llvm/test/Transforms/LoopInterchange/outer-only-reductions.ll
The file was modified llvm/lib/Transforms/Scalar/LoopInterchange.cpp
The file was modified llvm/test/Transforms/LoopInterchange/not-interchanged-dependencies-1.ll
The file was modified llvm/test/Transforms/LoopInterchange/not-interchanged-loop-nest-3.ll
The file was modified llvm/test/Transforms/LoopInterchange/currentLimitation.ll
The file was modified llvm/test/Transforms/LoopInterchange/perserve-lcssa.ll
The file was modified llvm/test/Transforms/LoopInterchange/pr43797-lcssa-for-multiple-outer-loop-blocks.ll
The file was modified llvm/test/Transforms/LoopInterchange/pr43326.ll
The file was modified llvm/test/Transforms/LoopInterchange/pr43176-move-to-new-latch.ll
The file was modified llvm/test/Transforms/LoopInterchange/profitability.ll
The file was modified llvm/test/Transforms/LICM/lnicm.ll
The file was modified llvm/test/Transforms/LoopInterchange/interchange-no-deps.ll
The file was modified llvm/test/Transforms/LoopInterchange/interchangeable-innerloop-multiple-indvars.ll
The file was modified llvm/test/Transforms/LoopInterchange/pr43326-ideal-access-pattern.ll
The file was modified llvm/test/Transforms/LoopInterchange/update-condbranch-duplicate-successors.ll
The file was modified llvm/test/Transforms/LoopInterchange/not-interchanged-tightly-nested.ll
The file was modified llvm/test/Transforms/LoopInterchange/lcssa-preheader.ll
The file was modified llvm/test/Transforms/LoopInterchange/debuginfo.ll
The file was modified llvm/test/Transforms/LoopInterchange/inner-only-reductions.ll
The file was modified llvm/test/Transforms/LoopInterchange/pr43473-invalid-lcssa-phis-in-inner-exit.ll
The file was modified llvm/test/Transforms/LoopInterchange/reductions-across-inner-and-outer-loop.ll
The file was modified llvm/test/Transforms/LoopInterchange/pr45743-move-from-inner-preheader.ll
The file was modified llvm/test/Transforms/LoopInterchange/lcssa.ll
The file was modified llvm/test/Transforms/LoopInterchange/outer-header-jump-to-inner-latch.ll
The file was modified llvm/test/Transforms/LoopInterchange/call-instructions.ll
The file was modified llvm/test/Transforms/LoopInterchange/pr48212.ll
The file was modified llvm/test/Transforms/LoopInterchange/interchangeable-outerloop-multiple-indvars.ll
The file was modified llvm/test/Transforms/LoopInterchange/interchanged-loop-nest-3.ll
The file was modified llvm/test/Transforms/LoopInterchange/loop-interchange-optimization-remarks.ll
The file was modified llvm/test/Transforms/LoopInterchange/inner-indvar-depend-on-outer-indvar.ll
The file was modified llvm/test/Transforms/LoopInterchange/phi-ordering.ll
The file was modified llvm/test/Transforms/LoopInterchange/interchange-flow-dep-outer.ll
The file was modified llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll
The file was modified llvm/test/Transforms/LoopInterchange/interchangeable.ll
The file was modified llvm/test/Transforms/LoopInterchange/vector-gep-operand.ll
The file was modified llvm/test/Transforms/LoopInterchange/interchange-insts-between-indvar.ll
Commit fa596c6921159af50e69cc3be189d951521a9eb9 by ravishankarm
[mlir][Vector] Fix reordering of floating point adds during lower of `vector.contract`.

Adding the accumulator value after the `vector.contract` changes the
precision of the operation. This makes sure the accumulator is carried
through to `vector.reduce` (and down to LLVM).
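Floating-point addition is not associative, so where the accumulator enters the reduction is observable. A minimal C++ illustration with hypothetical helpers (not MLIR code): carrying the accumulator through the reduction and adding it only at the end can give different results for the same inputs.

```cpp
#include <cassert>
#include <vector>

// Start the reduction from the accumulator, then add elements left to right.
float reduceWithAcc(float acc, const std::vector<float> &xs) {
  float r = acc;
  for (float x : xs)
    r += x;
  return r;
}

// Reduce the elements first, then add the accumulator at the end.
float reduceThenAddAcc(float acc, const std::vector<float> &xs) {
  float r = 0.0f;
  for (float x : xs)
    r += x;
  return r + acc;
}
```

With `acc = 1` and elements `{1e20, -1e20}`, the first order absorbs the 1 into 1e20 before the cancellation, while the second adds it after the cancellation.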

Differential Revision: https://reviews.llvm.org/D128674
The file was modified mlir/include/mlir/Dialect/Vector/IR/VectorOps.td
The file was modified mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
The file was modified mlir/lib/Dialect/Vector/IR/VectorOps.cpp
The file was modified mlir/test/Dialect/Vector/invalid.mlir
The file was modified mlir/test/Dialect/Vector/vector-contract-transforms.mlir
The file was modified mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp
The file was modified mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
Commit d1b098fc825176242afee12b8f9dc14adf5eec51 by phosek
[CoverageMapping] Remove dots from paths inside the profile

We already remove dots from collected paths and path mappings, which
makes it difficult to match paths inside the profile that contain
dots. For example, we would never match /path/to/../file.c because
the collected path is always normalized to /path/file.c. This
change enables dot removal for paths inside the profile to address
the issue.
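Dot removal here is purely lexical: `.` and `dir/..` components are collapsed without touching the filesystem. As a stand-in for LLVM's own path utilities (not what the commit uses), C++17 `std::filesystem::path::lexically_normal` performs the same kind of normalization; `generic_string()` keeps `/` separators so the result compares equal across platforms.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Collapse "." and "dir/.." components lexically and return a '/'-separated path.
std::string removeDots(const std::string &p) {
  return std::filesystem::path(p).lexically_normal().generic_string();
}
```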

Differential Revision: https://reviews.llvm.org/D122750
The file was added llvm/test/tools/llvm-cov/Inputs/relative_dir/main.covmapping
The file was modified llvm/test/tools/llvm-cov/coverage-prefix-map.test
The file was modified llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp
The file was added llvm/test/tools/llvm-cov/Inputs/relative_dir/main.c
The file was modified llvm/unittests/ProfileData/CoverageMappingTest.cpp
The file was added llvm/test/tools/llvm-cov/Inputs/relative_dir/header.h
The file was added llvm/test/tools/llvm-cov/relative-dir.test
The file was added llvm/test/tools/llvm-cov/Inputs/relative_dir/main.proftext
Commit 834a38bbcbcf5f507fe5d47174629bf5635fdb2d by phosek
Revert "[CoverageMapping] Remove dots from paths inside the profile"

This reverts commit d1b098fc825176242afee12b8f9dc14adf5eec51 since
it is failing on Windows builders.
The file was modified llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp
The file was modified llvm/unittests/ProfileData/CoverageMappingTest.cpp
The file was removed llvm/test/tools/llvm-cov/Inputs/relative_dir/header.h
The file was modified llvm/test/tools/llvm-cov/coverage-prefix-map.test
The file was removed llvm/test/tools/llvm-cov/Inputs/relative_dir/main.c
The file was removed llvm/test/tools/llvm-cov/Inputs/relative_dir/main.covmapping
The file was removed llvm/test/tools/llvm-cov/Inputs/relative_dir/main.proftext
The file was removed llvm/test/tools/llvm-cov/relative-dir.test
Commit eba5749262d9f1c6754984034c7a81fcd9bc3de6 by wlei
[CSSPGO][llvm-profgen] Reimplement CS profile generator using context trie

Our investigation showed that ProfileMap's key is the main memory bottleneck in CS profile generation for some large services. This patch optimizes it by storing the CS function samples in the context trie tree structure instead of keying them by context frame array refs. Parts of the code in `ContextTrieNode` are reused.
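The memory saving comes from prefix sharing: in a trie, each node is one frame, a calling context is the path from the root, and common caller prefixes are stored once rather than repeated in every context key. A simplified sketch (`ContextNode` is hypothetical and is not the `ContextTrieNode` API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct ContextNode {
  std::map<std::string, std::unique_ptr<ContextNode>> children;  // callee frame -> node
  std::uint64_t samples = 0;

  // Follow (and extend if needed) the path for `context`, outermost caller
  // first, and return the node for the innermost frame.
  ContextNode &getOrCreate(const std::vector<std::string> &context) {
    ContextNode *node = this;
    for (const std::string &frame : context) {
      std::unique_ptr<ContextNode> &child = node->children[frame];
      if (!child)
        child = std::make_unique<ContextNode>();
      node = child.get();
    }
    return *node;
  }

  // Number of nodes in this subtree, including this node.
  std::size_t nodeCount() const {
    std::size_t n = 1;
    for (const auto &entry : children)
      n += entry.second->nodeCount();
    return n;
  }
};
```

Two contexts `main -> foo -> bar` and `main -> foo -> baz` need six frames as separate arrays, but only four non-root nodes in the trie.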

Our experiment on one internal service showed that the context key's memory can be reduced from 80GB to 300MB.

To be compatible with non-CS profiles, the profile writer still needs ProfileMap as input, so the ProfileMap is rebuilt from the context trie in `postProcessProfiles`.

The optimization is not complete yet; the next step is to reimplement the pre-inliner or profile trimmer, after which ProfileMap should be small enough to be written.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D125246
The file was modified llvm/tools/llvm-profgen/ProfileGenerator.h
The file was modified llvm/tools/llvm-profgen/ProfileGenerator.cpp
The file was modified llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
Commit aa58b7b1e30fbbd9c8c2bf6ba291f1742f53afed by wlei
[CSSPGO][llvm-profgen] Reimplement computeSummaryAndThreshold using context trie

Follow-up patch to https://reviews.llvm.org/D125246: support `computeSummaryAndThreshold` based on the context trie.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D127026
The file was modified llvm/lib/Transforms/IPO/SampleContextTracker.cpp
The file was modified llvm/tools/llvm-profgen/ProfileGenerator.cpp
The file was modified llvm/tools/llvm-profgen/ProfileGenerator.h
The file was modified llvm/tools/llvm-profgen/llvm-profgen.cpp
The file was modified llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
Commit 7e86b13c63f200a5649234647433fc563e1159f5 by wlei
[CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie

This is the follow-up patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Previously, promotion and merging of contexts were based on SampleContext (the array of frames), which was costly in memory. This patch detaches the tracker from the array ref and uses the context trie itself instead. This saves a lot of memory and benefits both the compiler's CS inliner and llvm-profgen's pre-inliner.

One structure that needs special treatment is `FuncToCtxtProfiles`, which is used to get all the FunctionSamples of one function for merging and promoting. Previously, it searched each function's context and traversed the trie to get the context's node. Now the profile no longer carries its context, so we directly use an auxiliary map, `ProfileToNodeMap`, which is initialized with the FunctionSamples-to-TrieNode relations and kept up to date while nodes are promoted and merged.

Moreover, I expected the results before and after this change to remain the same, but I found that the order of FuncToCtxtProfiles matters and affects the results. This can happen in the recursive context case, but the difference should be small. Since we no longer have the context, I used a vector to keep the order; the result is still deterministic.

Measured on one huge (12GB) profile from one of our internal services: the profile similarity is 99.999%, the running time improved by 3X (debug mode), and memory usage dropped from 170GB to 90GB.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D127031
The file was modified llvm/tools/llvm-profgen/CSPreInliner.cpp
The file was modified llvm/lib/Transforms/IPO/SampleProfile.cpp
The file was modified llvm/tools/llvm-profgen/CSPreInliner.h
The file was modified llvm/tools/llvm-profgen/ProfiledBinary.cpp
The file was modified llvm/tools/llvm-profgen/ProfileGenerator.cpp
The file was modified llvm/include/llvm/ProfileData/SampleProf.h
The file was modified llvm/lib/Transforms/IPO/SampleContextTracker.cpp
The file was modified llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
The file was modified llvm/tools/llvm-profgen/ProfiledBinary.h
Commit 527ef8ca981e88a35758c0e4143be6853ea26dfc by phoebe.wang
Reland "[X86] Support `_Float16` on SSE2 and up"

Enable `COMPILER_RT_HAS_FLOAT16` to fix the lit failure.

This is split from D113107 to address #56204 and https://discourse.llvm.org/t/how-to-build-compiler-rt-for-new-x86-half-float-abi/63366
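A minimal C++ sketch of `_Float16` arithmetic. The guard is an assumption about how to detect support: it uses `_Float16` only where the compiler is expected to accept it in C++ and otherwise falls back to `float`, which keeps the same interface but not half-precision rounding. With a real half type, results are rounded to 11 significand bits, so for example 2048 + 1 rounds back to 2048.

```cpp
#include <cassert>

#if defined(__FLT16_MAX__) && \
    (defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 13))
using Half = _Float16;  // 16-bit IEEE half-precision type
#else
using Half = float;     // fallback: same interface, no half-precision rounding
#endif

// Convert both operands to Half, add them, and round the sum to Half by
// storing it in a Half variable before widening back to float.
float addAsHalf(float a, float b) {
  Half sum = static_cast<Half>(a) + static_cast<Half>(b);
  return static_cast<float>(sum);
}
```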

Reviewed By: zahiraam, rjmccall, bkramer

Differential Revision: https://reviews.llvm.org/D128571
The file was modified clang/docs/ReleaseNotes.rst
The file was added clang/test/CodeGen/X86/Float16-arithmetic.c
The file was modified clang/test/SemaCXX/Float16.cpp
The file was removed clang/test/CodeGen/X86/avx512fp16-complex.c
The file was added clang/test/CodeGen/X86/Float16-complex.c
The file was modified clang/docs/LanguageExtensions.rst
The file was modified clang/lib/Basic/Targets/X86.cpp
The file was modified compiler-rt/test/builtins/CMakeLists.txt
The file was modified clang/test/Sema/Float16.c
The file was modified clang/test/Sema/conversion-target-dep.c