SuccessChanges

Summary

  1. [flang] Do not create HostAssoc symbols in derived type scopes (details)
  2. [CSSPGO][llvm-profgen] Pseudo probe decoding and disassembling (details)
  3. [CSSPGO][llvm-profgen] Refactor to unify hashable interface for trace sample and context-sensitive counter (details)
  4. [CSSPGO][llvm-profgen] Virtual unwinding with pseudo probe (details)
Commit 166e5c335cbe9f8144a7822ca655dc3352ec9e56 by pklausler
[flang] Do not create HostAssoc symbols in derived type scopes

When needed due to a specification expression in a derived type,
the host association symbols should be created in the surrounding
subprogram's scope instead.

Differential Revision: https://reviews.llvm.org/D94567
The file was modifiedflang/lib/Semantics/resolve-names.cpp
Commit b3154d11bc6dee59e581b731b7561f1ebab3aed6 by wlei
[CSSPGO][llvm-profgen] Pseudo probe decoding and disassembling

This change implements pseudo probe decoding and disassembling for llvm-profgen/CSSPGO. Please see https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen.

**ELF section format**
Please see the encoding patch(https://reviews.llvm.org/D91878) for more details of the format, just copy the example here:

Two section(`.pseudo_probe_desc` and  `.pseudoprobe` ) is emitted in ELF to support pseudo probe.
The format of `.pseudo_probe_desc` section looks like:

```
.section   .pseudo_probe_desc,"",@progbits
.quad   6309742469962978389  // Func GUID
.quad   4294967295           // Func Hash
.byte   9                    // Length of func name
.ascii  "_Z5funcAi"          // Func name
.quad   7102633082150537521
.quad   138828622701
.byte   12
.ascii  "_Z8funcLeafi"
.quad   446061515086924981
.quad   4294967295
.byte   9
.ascii  "_Z5funcBi"
.quad   -2016976694713209516
.quad   72617220756
.byte   7
.ascii  "_Z3fibi"
```

For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :

```
FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
    NPROBES (ULEB128)
        Number of probes originating from this function.
    NUM_INLINED_FUNCTIONS (ULEB128)
        Number of callees inlined into this function, aka number of
        first-level inlinees
    PROBE RECORDS
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
            reserved
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
    INLINED FUNCTION RECORDS
        A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
        callees.  Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
          FUNCTION BODY
            A FUNCTION BODY entry describing the inlined function.
```

**Disassembling**
A switch `--show-pseudo-probe` is added to use along with `--show-disassembly` to print disassembly code with pseudo probe directives.

For example:
```
00000000002011a0 <foo2>:
  2011a0: 50                    push   rax
  2011a1: 85 ff                 test   edi,edi
  [Probe]:  FUNC: foo2  Index: 1  Type: Block
  2011a3: 74 02                 je     2011a7 <foo2+0x7>
  [Probe]:  FUNC: foo2  Index: 3  Type: Block
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  [Probe]:  FUNC: foo   Index: 1  Type: Block  Inlined: @ foo2:6
  2011a5: 58                    pop    rax
  2011a6: c3                    ret
  [Probe]:  FUNC: foo2  Index: 2  Type: Block
  2011a7: bf 01 00 00 00        mov    edi,0x1
  [Probe]:  FUNC: foo2  Index: 5  Type: IndirectCall
  2011ac: ff d6                 call   rsi
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  2011ae: 58                    pop    rax
  2011af: c3                    ret
```

**Implementation**
- `PseudoProbeDecoder` is added in ProfiledBinary as an infra for the decoding. It decoded the two section and generate two map: `GUIDProbeFunctionMap` stores all the `PseudoProbeFunction` which is the abstraction of a general function. `AddressProbesMap` stores all the pseudo probe info indexed by its address.
- All the inline info is encoded into binary as a trie(`PseudoProbeInlineTree`) and will be constructed from the decoding. Each pseudo probe can get its inline context(`getInlineContext`) by traversing its inline tree node backwards.

Test Plan:
ninja & ninja check-llvm

Differential Revision: https://reviews.llvm.org/D92334
The file was addedllvm/tools/llvm-profgen/PseudoProbe.h
The file was modifiedllvm/tools/llvm-profgen/CMakeLists.txt
The file was addedllvm/tools/llvm-profgen/PseudoProbe.cpp
The file was addedllvm/test/tools/llvm-profgen/Inputs/inline-cs-pseudoprobe.perfbin
The file was modifiedllvm/tools/llvm-profgen/ProfiledBinary.h
The file was modifiedllvm/tools/llvm-profgen/ProfiledBinary.cpp
The file was addedllvm/test/tools/llvm-profgen/pseudoprobe-decoding.test
Commit 414930b91bfd4196c457120932a1dbaf26db711d by wlei
[CSSPGO][llvm-profgen] Refactor to unify hashable interface for trace sample and context-sensitive counter

As we plan to support both CSSPGO and AutoFDO for llvm-profgen, we will have different kinds of perf sample and different kinds of sample counter(cs/non-cs, with/without pseudo probe) which both need to do aggregation in hash map.  This change implements the hashable interface(`Hashable`) and the unified base class for them to have better extensibility and reusability.

Currently perf trace sample and sample counter with context implemented this `Hashable` and  the class hierarchy is like:

```
| Hashable
           | PerfSample
                          | HybridSample
                          | LBRSample
           | ContextKey
                          | StringBasedCtxKey
                          | ProbeBasedCtxKey
                          | CallsiteBasedCtxKey
           | ...
```

- Class specifying `Hashable` should implement `getHashCode` and `isEqual`. Here we make `getHashCode` a non-virtual function to avoid vtable overhead, so derived class should calculate and assign the base class's HashCode manually. This also provides the flexibility for calculating the hash code incrementally(like rolling hash) during frame stack unwinding
- `isEqual` is a virtual function, which will have perf overhead. In the future, if we redesign a better hash function, then we can just skip this or switch to non-virtual function.
- Added `PerfSample` and `ContextKey` as base class for perf sample and counter context key, leveraging llvm-style RTTI for this.
- Added `StringBasedCtxKey` class extending  `ContextKey` to use string as context id.
- Refactor `AggregationCounter` to take all kinds of `PerfSample` as key
- Refactor `ContextSampleCounter` to take all kinds of `ContextKey` as key
- Other refactoring work:
- Create a wrapper class `SampleCounter` to wrap `RangeCounter` and `BranchCounter`
- Hoist `ContextId` and `FunctionProfile` out of `populateFunctionBodySamples` and `populateFunctionBoundarySamples` to reuse them in ProfileGenerator

Differential Revision: https://reviews.llvm.org/D92584
The file was modifiedllvm/tools/llvm-profgen/ProfileGenerator.cpp
The file was modifiedllvm/tools/llvm-profgen/PerfReader.cpp
The file was modifiedllvm/tools/llvm-profgen/PerfReader.h
The file was modifiedllvm/tools/llvm-profgen/ProfileGenerator.h
Commit c681400b25a66ae56b74cc1f11ffdc15190a65b8 by wlei
[CSSPGO][llvm-profgen] Virtual unwinding with pseudo probe

This change extends virtual unwinder to support pseudo probe in llvm-profgen. Please refer https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen.

**Implementation**

- Added `ProbeBasedCtxKey` derived from `ContextKey` for sample counter aggregation. As we need string splitting to infer the profile for callee function, string based context introduces more string handling overhead, here we just use probe pointer based context.
- For linear unwinding, as inline context is encoded in each pseudo probe, we don't need to go through each instruction to extract range sharing same inliner. So just record the range for the context.
- For probe based context, we should ignore the top frame probe since it will be extracted from the address range. we defer the extraction in `ProfileGeneration`.
- Added `PseudoProbeProfileGenerator` for pseudo probe based profile generation.
- Some helper function to get pseduo probe info(call probe, inline context) from profiled binary.
- Added regression test for unwinder's output

The pseudo probe based profile generation will be in the upcoming patch.

Test Plan:

ninja & ninja check-llvm

Differential Revision: https://reviews.llvm.org/D92896
The file was modifiedllvm/tools/llvm-profgen/PerfReader.cpp
The file was modifiedllvm/tools/llvm-profgen/PseudoProbe.cpp
The file was modifiedllvm/tools/llvm-profgen/PseudoProbe.h
The file was modifiedllvm/tools/llvm-profgen/ProfileGenerator.cpp
The file was modifiedllvm/tools/llvm-profgen/ProfiledBinary.h
The file was addedllvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
The file was modifiedllvm/tools/llvm-profgen/ProfiledBinary.cpp
The file was modifiedllvm/test/tools/llvm-profgen/Inputs/inline-cs-noprobe.perfscript
The file was modifiedllvm/tools/llvm-profgen/ProfileGenerator.h
The file was modifiedllvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript
The file was addedllvm/test/tools/llvm-profgen/Inputs/noinline-cs-pseudoprobe.perfscript
The file was modifiedllvm/tools/llvm-profgen/PerfReader.h
The file was addedllvm/test/tools/llvm-profgen/Inputs/inline-cs-pseudoprobe.perfscript
The file was addedllvm/test/tools/llvm-profgen/Inputs/noinline-cs-pseudoprobe.perfbin
The file was addedllvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test