1. [NFC][SimplifyCFG] FoldBranchToCommonDest: add one more test with PHI (details)
  2. Reland [SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions (details)
Commit 40188063296f3a55f7816b20f518fa99977145bb by lebedev.ri
[NFC][SimplifyCFG] FoldBranchToCommonDest: add one more test with PHI

This is the problematic pattern i didn't think of,
that lead to revert of 2245fb8aaa1c1f85f53f7b19a1ee3ac69b1a1dfe
in f3abd54958ab90ba7c100d3fa936a3ce0dd2ad04.
The file was modifiedllvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest.ll
Commit b33fbbaa34f0fe9fb16789afc72ae424c1825b69 by lebedev.ri
Reland [SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions

This was orginally committed in 2245fb8aaa1c1f85f53f7b19a1ee3ac69b1a1dfe.
but was immediately reverted in f3abd54958ab90ba7c100d3fa936a3ce0dd2ad04
because of a PHI handling issue.

Original commit message:

1. It doesn't make sense to enforce that the bonus instruction
   is only used once in it's basic block. What matters is
   whether those user instructions fit within our budget, sure,
   but that is another question.
2. It doesn't make sense to enforce that said bonus instructions
   are only used within their basic block. Perhaps the branch
   condition isn't using the value computed by said bonus instruction,
   and said bonus instruction is simply being calculated
   to be used in successors?

So iff we can clone bonus instructions, to lift these restrictions,
we just need to carefully update their external uses
to use the new cloned instructions.

Notably, this transform (even without this change) appears to be
poison-unsafe as per alive2, but is otherwise (including the patch) legal.

We don't introduce any new PHI nodes, but only "move" the instructions
around, i'm not really seeing much potential for extra cost modelling
for the transform, especially since now we allow at most one such
bonus instruction by default.

This causes the fold to fire +11.4% more (13216 -> 14725)
as of vanilla llvm test-suite + RawSpeed.

The motivational pattern is IEEE-754-2008 Binary16->Binary32
extension code:
^ that should be a switch, but it is not now:
That being said, even thought this seemed like this would fix it:
apparently that fold is happening somewhere else afterall,
so something else also has a similar 'artificial' restriction.
The file was modifiedllvm/test/CodeGen/Thumb2/mve-float32regloops.ll
The file was modifiedllvm/test/Transforms/LoopUnroll/peel-loop-inner.ll
The file was modifiedllvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest.ll
The file was modifiedllvm/test/CodeGen/Thumb2/mve-postinc-lsr.ll
The file was modifiedllvm/lib/Transforms/Utils/SimplifyCFG.cpp
The file was modifiedllvm/test/CodeGen/Thumb2/mve-float16regloops.ll