Changes

Summary

  1. Add BOLT builder and worker configurations (details)
Commit 8952ec3c6b08322039e53f08e4d24622e2fc4ff7 by gkistanova
Add BOLT builder and worker configurations

Differential revision: https://reviews.llvm.org/D116062
The file was modified: buildbot/osuosl/master/config/builders.py (diff)
The file was modified: buildbot/osuosl/master/config/workers.py (diff)

Summary

  1. Fix crash in patchELFPHDRTable when no functions are modified. (details)
  2. Shorten instructions if possible. (details)
  3. Move debug-handling code into DWARFRewriter (NFC). (details)
  4. Add BinaryContext::getSectionForAddress() (details)
  5. Add movabs -> mov shortening optimization. Add peephole optimization pass that does instruction shortening. (details)
  6. Loop detection for BOLT's CFG. (details)
  7. Simplification of loads from read-only data sections. (details)
  8. Factor out instruction printing and size computation. (details)
  9. Identical Code Folding (ICF) pass (details)
  10. Basic block clustering algorithm for minimizing branches. (details)
  11. CFG editing functions (details)
  12. Add printing support for indirect tail calls. (details)
  13. Fix for correct disassembling of conditional tail calls. (details)
  14. Add MCInst annotation mechanism to MCInstrAnalysis class. (details)
  15. More aggressive inlining pass (details)
  16. Refactoring. Mainly NFC. (details)
  17. More refactoring work. (details)
  18. Add additional info to BOLT graphviz CFG dumps. (details)
  19. Check if operands are immediates before trying shortening. (details)
  20. Compute ClusterEdges only when necessary. (details)
  21. Write padding for .eh_frame_hdr to a file. (details)
  22. Handling for indirect tail calls. (details)
  23. Emit remember_state CFI in the same code region as restore_state. (details)
  24. Add verbosity level and clean up stream usage. (details)
  25. Fix tail call conversion and test cases. (details)
  26. Inlining fixes/enhancements (details)
  27. BOLT: Make most command line options ZeroOrMore. (details)
  28. Make BinaryFunction::fixBranches() more flexible and support CFG updates. (details)
  29. Add dyno stats to BOLT. (details)
  30. Rewrite SCTC pass to do UCE and make it the last optimization pass. (details)
  31. BOLT: Add per pass dyno stats + factor out post pass printing. (details)
  32. Use BB.getNumNonPseudos() in more places. (details)
  33. BOLT: Remove double jumps peephole. (details)
  34. Fix switch table detection. Disassemble all instructions in non-simple functions. (details)
  35. Add cluster randomization layout algorithm. (details)
  36. BOLT: Clean up interface between BinaryFunction and BinaryBasicBlock. (details)
  37. Add experimental jump table support. (details)
  38. Add dyno stats for jump tables. (details)
  39. Fix issue with zero-size duplicate function symbols. (details)
  40. Add PLT dyno stats. (details)
  41. Do no collect dyno stats on functions with stale profile. (details)
  42. BOLT: Add feature to sort functions by dyno stats. (details)
  43. BOLT: Refactoring BinaryFunction interface. (details)
  44. BOLT: Add ud2 after indirect tailcalls. (details)
  45. Support for splitting jump tables. (details)
  46. BOLT: Remove restrictions on unreachable code elimination (details)
  47. Support for PIC-style jump tables. (details)
  48. New function discovery and support for multiple entries. (details)
  49. Fix EH for cold fragments that we fail to write. (details)
  50. Disable processing of functions with EVEX-encoded instructions (AVX-512). (details)
  51. Support DWARF expressions in CFI instructions (details)
  52. Fix DW_CFA_def_cfa CFI duping in output binary (details)
  53. Another EH fix for cold fragments of functions that we fail to write. (details)
  54. Generate .eh_frame_hdr based on contents of .eh_frame's. (details)
  55. Relocate old .eh_frame section next to the new one. (details)
  56. Detect default CFI frame instructions for the target (details)
  57. Add stats for "-optimize-bodyless-functions". (details)
  58. Fix clang warning about switch covering all enums (details)
  59. Remove pessimizing std::move (details)
  60. Avoid const_iterator on std::vector::emplace (details)
  61. Fix memory leak in DWARFRewriter (details)
  62. Fix undefined behavior in DebugInfo (details)
  63. Remove unused private var in CFIReaderWriter (NFC) (details)
  64. Add option to time passes (details)
  65. Fix typo in time passes (details)
  66. BOLT: Use profiling info to control branch simplification optimization. (details)
  67. Add a frame optimization pass (details)
  68. Relocations support for BOLT. (details)
  69. ICF improvements. (details)
  70. [ICF] Don't re-fold functions in non-relocation mode. (details)
  71. [BOLT] Fix debug info update for zero-length ranges. (details)
  72. [BOLT] Report stale functions' percentage wrt all profiled functions. (details)
  73. Cover RSP-indexed accesses in frame optimization (details)
  74. [BOLT] Support overwriting jump tables in-place. (details)
  75. Indirect call promotion optimization. (details)
  76. [BOLT] Update section names in output file. (details)
  77. [BOLT] Detect and prevent re-optimization attempts. (details)
  78. [BOLT] Reject sanitized binaries. (details)
  79. [BOLT] Skip disassembly of padding at function end. (details)
  80. [BOLT] Emit short tail calls in relocation mode. (details)
  81. [BOLT] Add support for *GOTPCRELX relocation type. (details)
  82. [BOLT] Move BOLT passes under Passes subdirectory (NFC). (details)
  83. [BOLT] Fix -jump-tables=basic in relocation mode. (details)
  84. [BOLT] Don't set code skew in relocations mode. (details)
  85. [BOLT] Strip 'repz' prefix from 'repz retq'. (details)
  86. Fix warnings when compiling with clang (NFC) (details)
  87. [BOLT] New CFI handling policy. (details)
  88. [BOLT] Detect and handle __builtin_unreachable(). (details)
  89. [BOLT] Detect unmarked data in text. (details)
  90. [BOLT] Update tests (details)
  91. [BOLT] Fix verbose output. (details)
  92. [BOLT] Fix gcc5 build. (details)
  93. Fix hfsort callgraph stats, add hfsort test. (details)
  94. [BOLT] Do not process empty functions. (details)
  95. [BOLT] Improve dynostats output. (details)
  96. [BOLT] Do not overwrite starting address in non-relocation mode. (details)
  97. [BOLT] Add option to print only specific functions. (details)
  98. [BOLT] Don't allow non-symbol targets in ICP (details)
  99. Change dynostats dynamic instruction count policy (details)
  100. [BOLT] Issue error in relocs mode if input is lacking relocations. (details)
  101. [BOLT] Organize options in categories for pretty printing (near NFC). (details)
  102. [BOLT] Fix debug info update for inlining. (details)
  103. [BOLT] Detect and reject binaries built for coverage. (details)
  104. [BOLT] Fix double jump peephole, remove useless conditional branches. (details)
  105. [BOLT] Fix branch count in removeDuplicateConditionalSuccessor(). (details)
  106. [BOLT] Relocation support for non-allocatable sections. (details)
  107. [BOLT] Enable SCTC by default. (details)
  108. [BOLT] Don't abort on processing binaries with .gdb_index section (details)
  109. [BOLT] Fix branch data for __builtin_unreachable(). (details)
  110. [BOLT] Update function address and size in relocation mode. (details)
  111. [BOLT] Update .gdb_index section. (details)
  112. [BOLT] Support adding new non-allocatable sections. (details)
  113. [BOLT] Add option to keep/generate .debug_aranges. (details)
  114. [BOLT] Add jump table support to ICP (details)
  115. [BOLT] Fix debug info for input with continuous range. (details)
  116. [BOLT] Add dataflow infrastructure (details)
  117. [BOLT] Rework debug info processing. (details)
  118. Don't add useless uncond branch to fallthroughs when running SCTC. (details)
  119. [BOLT] Optimize jump tables with hot entries (details)
  120. Add .bolt_info notes section containing BOLT revision and command line args. (details)
  121. [BOLT] Fix C++ ABI function alignment. (details)
  122. [BOLT] Fix no-assertions build. (details)
  123. Add option to generate function order file. (details)
  124. [BOLT] Emit sorted DWARF ranges and location lists. (details)
  125. [BOLT] Fix SCTC again. (details)
  126. [BOLT] Update addresses for DW_TAG_GNU_call_site and DW_TAG_label. (details)
  127. [BOLT] Fix SCTC again again. (details)
  128. HFSort/call graph refactoring (details)
  129. [BOLT] Do not assert on an empty location list. (details)
  130. [BOLT] More CG refactoring (details)
  131. [BOLT] Make hfsort+ deterministic and add test case (details)
  132. [BOLT] Fix misc issues in relocation mode. (details)
  133. [BOLT] Add shrink wrapping pass (details)
  134. Split FrameAnalysis and improve LivenessAnalysis (details)
  135. [BOLT] Fix ELF inter-section references (details)
  136. [BOLT] Fix hfsort+ crash when no perf data is present. (details)
  137. [BOLT] Only print stats when requested (details)
  138. Fix dynostats for conditional tail calls (details)
  139. [BOLT] Fix hfsort+ caching mechanism (details)
  140. [BOLT] Expand BOLT report for basic block ordering (details)
  141. [BOLT] Fix SCTC execution count assertion (details)
  142. Normalize Clusters Twice (details)
  143. [BOLT] More HFSort+ refactoring (details)
  144. BinaryFunction.h: Clarify commet for getSize(), add getNumNonPseudos() (details)
  145. [BOLT] Bail frame analysis on PUSHes escaping vars (details)
  146. [BOLT] Make function reordering more robust with stale data. (details)
  147. [BOLT] Set local symbols in relocation mode to zero (details)
  148. [BOLT] Call Distance Metric (details)
  149. [BOLT] Fix shrink-wrapping bugs (details)
  150. [BOLT] Improved Jump-Distance Metric (details)
  151. [BOLT] Add cold symbols to the symbol table (details)
  152. get analysis information of functions (details)
  153. add: get function score to find hot functions refine the dumped csv format (details)
  154. Recognize AArch64 as a valid input (details)
  155. [BOLT] Improve Jump-Distance Metric -- Consider Function Execution Count (details)
  156. Reformat the register strings in the output so Stoke can parse without preprocessing. (details)
  157. [BOLT] Fix reading LSDA address for PIC code (details)
  158. [BOLT] Better match LTO functions profile. (details)
  159. [BOLT] Disable last basic block assertion. (details)
  160. [BOLT] Fix SCTC issue with hot-cold split (details)
  161. Fix profiling for functions with multiple entry points (details)
  162. [BOLT] Fix printing of dyno-stats (details)
  163. [BOLT] PLT optimization (details)
  164. [BOLT] Support PIC-style exception tables (details)
  165. [BOLT] Fix bug in SCTC (details)
  166. [BOLT] Ignore TLS relocations types (details)
  167. [BOLT] Introduce non-LBR mode (details)
  168. [BOLT] Fix frameopt=all for gcc (details)
  169. [BOLT] Fix issue with exception handlers splitting (details)
  170. [BOLT] Fix SCTC bug (details)
  171. [BOLT] Integrate perf2bolt into llvm-bolt (details)
  172. Fix SCTC bug when two pred/succ BB are in a loop. (details)
  173. [BOLT] Ignore Clang LTO artifact file symbol (details)
  174. [PERF2BOLT] Improve user messages about profiling stats (details)
  175. [PERF2BOLT] Fix aggregator wrt new output format of perf (details)
  176. fixing sizes (details)
  177. [PERF2BOLT] Check build-ids of binaries when aggregating (details)
  178. [BOLT] Write bolt info according to ELF spec (details)
  179. [BOLT] Fix bolt_info ELF note (details)
  180. [BOLT] Use 32 as the default max bytes for function alignment (details)
  181. [BOLT] Create symbol table entries under -hot-text if they did not exist (details)
  182. [BOLT] Change function order file format for linker script (details)
  183. [BOLT] Fix function order output option (details)
  184. updating cache metrics (details)
  185. [BOLT][Refactoring] Make CTC first class operand, etc. (details)
  186. [BOLT] Account for FDE functions when calculating max function size (details)
  187. [BOLT] Add ability to specify custom printers for annotations. (details)
  188. [BOLT][Refactoring] Get rid of TailCallTerminatedBlocks, etc. (details)
  189. using offsets for CG (details)
  190. [BOLT][Refactoring] Change landing pads handling (details)
  191. [BOLT] Add value profiling to BOLT (details)
  192. [BOLT] Refactor branch analysis code. (details)
  193. [BOLT][Refactoring] Move basic block reordering to BinaryPasses (details)
  194. [BOLT] Always call fixBranches in relocation mode. (details)
  195. [BOLT] Fix BOLT build (details)
  196. improving hfsort+ algorithm (details)
  197. [BOLT-AArch64] Support rewriting bzip2 (details)
  198. [BOLT-AArch64] Support reordering bzip2 no relocs (details)
  199. [BOLT-AArch64] Support relocation mode for bzip2 (details)
  200. [BOLT] Fix implementation for TSP solution (details)
  201. [BOLT-AArch64] Support reordering spec06 gcc relocs (details)
  202. [BOLT] Custom function alignment (details)
  203. [BOLT] Fix segfault in debug print (details)
  204. [BOLT] Fix N-1'th sctc bug. (details)
  205. [BOLT] Fix ASAN bugs (details)
  206. [BOLT] Add finer control of peephole pass. (details)
  207. [BOLT] Fix handling of RememberState CFI (details)
  208. speeding up caches for hfsort+ (details)
  209. [BOLT] Improve ICP for virtual method calls and jump tables using value profiling. (details)
  210. [RFC] [BOLT] Use iterators for MC branch/call analysis code. (details)
  211. [PERF2BOLT] Fix aggregator wrt traces with REP RET (details)
  212. [BOLT] Add timers for non-optimization related phases. (details)
  213. [BOLT] Fix icp-top-callsites option, remove icp-always-on. (details)
  214. [BOLT] Fix bug in shortening peephole. (details)
  215. [BOLT] Use getNumPrimeOperands in shortenInstruction. (details)
  216. Introduce pass to reduce jump tables footprint (details)
  217. a new i-cache metric (details)
  218. [BOLT] Fix ICP nested jump table handling and general stats. (details)
  219. [BOLT] Add REX prefix rebalancing pass (details)
  220. [BOLT] Options to facilitate debugging (details)
  221. [BOLT] Consistent DFS ordering for landing pads (details)
  222. [BOLT] Automatically detect and use relocations (details)
  223. [BOLT] Major overhaul of profiling in BOLT (details)
  224. debug (details)
  225. [BOLT] Fix debugging derp (details)
  226. [BOLT] Fix -simplify-rodata-loads wrt data chunks with relocs (details)
  227. [BOLT] Do not assign a LP to tail calls (details)
  228. [BOLT] a new block reordering algorithm (details)
  229. [BOLT-AArch64] Support SPEC17 programs and organize AArch64 tests (details)
  230. [BOLT] New profile format (details)
  231. [BOLT-AArch64] Support large test binary (details)
  232. [BOLT] Refactoring - add BinarySection class (details)
  233. [BOLT] Refactor relocation analysis code. (details)
  234. [BOLT] faster cache+ implementation (details)
  235. [BOLT] Do not assert on bad data (details)
  236. [BOLT] Handle multiple sections with the same name (details)
  237. [BOLT] Fix profile for multi-entry functions (details)
  238. Handle types CU list in updateGdbIndexSection (details)
  239. [BOLT] Fix lookup of non-allocatable sections in RewriteInstance (details)
  240. [BOLT] Fix branch info stats after SCTC (details)
  241. [BOLT] Reduce the usage of "Offset" annotation (details)
  242. [BOLT] Fix memory regression (details)
  243. [BOLT rebase] Rebase fixes on top of LLVM Feb2018 (details)
  244. [BOLT] Limited "support" for AVX-512 (details)
  245. [BOLT] Improved function profile matching (details)
  246. [BOLT] Fixes for new profile (details)
  247. Cache+ speed, reduce mallocs (details)
  248. [BOLT] Fix jump table placement for non-simple functions (details)
  249. [BOLT] Refactoring of section handling code (details)
  250. [BOLT/LSDA] Fix alignment (details)
  251. [BOLT] Fix ShrinkWrapping bugs and enable testing (details)
  252. [BOLT] Refactor global symbol handling code. (details)
  253. [BOLT] Disassemble all functions before building CFGs (details)
  254. [BOLTDIFF] Add a tool to audit performance differences (details)
  255. [BOLT] Fix remove-unused-stores in rebased bolt (details)
  256. [BOLT] Fix ORC to properly update symbols (details)
  257. [BOLT] Introduce MCPlus layer (details)
  258. [BOLT] Fix assertion when setting size of jump table symbol (details)
  259. [BOLT] Fix assertion when building test binary (details)
  260. removing compact-mode (details)
  261. [BOLT] improvements for CFG construction (details)
  262. [BOLT][Refactoring] Isolate changes to MC layer (details)
  263. [BOLT] Fix iterator issue (details)
  264. [BOLT] Fix relocation verification (details)
  265. [BOLT] Fix CFG in BinaryFunction::eraseInvalidBBs() (details)
  266. [BOLT] Use MCPlus::getNumPrimeOperands() (details)
  267. [BOLT] Improve annotations format and processing (details)
  268. [BOLT-AArch64] Fix AArch64 port - make it work with hhvm (details)
  269. [merge-fdata] Rewrite merge-fdata to use YAML format (details)
  270. [BOLT][Cleanup] Remove branch history (details)
  271. [BOLT] Support for non-LBR profile in YAML (details)
  272. [BOLT] Report when operating in relocation mode (details)
  273. [BOLT] Fix tests (details)
  274. [BOLT] Restore macro-fusion optimization (details)
  275. [BOLT-AArch64] Fix BOLT build on AArch64 (details)
  276. [PERF2BOLT] Add support for non-LBR aggregation (details)
  277. [BOLT] improving cache metrics (details)
  278. [BOLT-AArch64] Fix -icf, -use-old-text and -update-debug-sections (details)
  279. [BOLT] Fix crash while writing new profile (details)
  280. [BOLT] Getting open-source ready (details)
  281. [BOLT] Align basic blocks based on execution count (details)
  282. [BOLT] Static data reordering pass. (details)
  283. adjusting cache stats for non-simple functions (details)
  284. [BOLT] Fix dyno-stats for PLT calls (details)
  285. [BOLT] Add option to ignore function hash in profile (details)
  286. [BOLT] Properly handle non-standard function refs (details)
  287. [BOLT] Add option to print functions with bad layout (details)
  288. [PERF2BOLT] Improve file matching (details)
  289. [BOLT][NFC] Move ICF pass into a separate file (details)
  290. [BOLT-AArch64] Detect linker stubs and address them (details)
  291. [BOLT] Initial support for memcpy() inlininig (details)
  292. [BOLT] merging cold basic blocks to reduce #jumps (details)
  293. [BOLT] Hash anonymous symbol names (details)
  294. [Bolt] Reduce verbosity while reporting hash collisions (details)
  295. [Bolt][NFC] Change capitalization s/BOLT/Bolt/g (details)
  296. Revert "[Bolt][NFC] Change capitalization s/BOLT/Bolt/g" (details)
  297. [BOLT] Update llvm.patch (details)
  298. [BOLT] Add a user friendly error reporting message (details)
  299. [BOLT] Fix support for PIC jump tables (details)
  300. [merge-fdata] Support legacy/non-YAML profile format (details)
  301. [BOLT] Add initial bolt-only test infra (details)
  302. [BOLT] Fix call to evaluateX86MemOperands (details)
  303. Disable -split-eh in non-relocation mode (details)
  304. [BOLT][PR] In some cases DB could be nullptr (details)
  305. [X86] Support a subset of internal calls (details)
  306. [BOLT] Allow jump tables with 2 entries (details)
  307. [LLVM] Accept `S` in augmentation strings in CIE (details)
  308. [BOLT] Reject processing of PIE binaries (details)
  309. [BOLT] Fix no-assertions build (details)
  310. [DebugInfo] Change default value of FDEPointerEncoding (details)
  311. [BOLT] Fix diagnostics printing in data aggregator (details)
  312. [LongJumpPass] X86 enablement. First attempt. (details)
  313. Revert "[LongJumpPass] X86 enablement. First attempt." (details)
  314. -- Adding Veneer elimination pass and Veneer count to dyno stats. (details)
  315. Avoid removing BBs referenced by JTs (details)
  316. Fix assembly after adding entry points (details)
  317. [perf2bolt] Accept `-` as a valid misprediction symbol (details)
  318. [BOLT] Fix llvm-dwarfdump issues (details)
  319. [BOLT-AArch64] Create cold symbols on demand (details)
  320. [perf2bolt] Fix perf build-id matching (details)
  321. [perf2bolt] Enforce file matching in perf2bolt (details)
  322. Add initial function injection support (details)
  323. [BOLT] Add parser for pre-aggregated perf data (details)
  324. [BOLT] further speeding up cache+ (details)
  325. [BOLT] Add R_X86_64_PC64 relocation support (details)
  326. [BOLT][NFC] Minor code refactoring (details)
  327. [BOLT] Fix TBSS-related issue (details)
  328. [BOLT] Fix range checks (details)
  329. [BOLT] Add support for IFUNC (details)
  330. Retpoline Insertion Pass (details)
  331. [BOLT] Detect and handle fixed indirect branches (details)
  332. retpoline insertion : further updates. (details)
  333. [BOLT] Fix pseudo calculation in BinaryBasicBlock (details)
  334. [perf2bolt] Use mmap events for PID collection (details)
  335. [perf2bolt] Support profiling of PIEs and .so's (details)
  336. [BOLT] Update allocatable relocation sections (details)
  337. [BOLT] Fix shrink-wrapping CFI update (details)
  338. [BOLT] Add update-build-id option, on by default (details)
  339. [BOLT] Add mattr options to AArch64 target (details)
  340. [BOLT] Reduce AArch64 target feature flags (details)
  341. [BOLT][DWARF] Fix line info for empty CU DIEs (details)
  342. [BOLT] Fix profile after ICP (details)
  343. [BOLT] Change ForceRelocation behavior (details)
  344. [perf2bolt] Fix processing of binaries with names over 15 chars long (details)
  345. [BOLT] Merge jump table profile data (details)
  346. [BOLT] turning on the compact aligner by default (details)
  347. [BOLT] Fix another issue with profile after ICP (details)
  348. [BOLT] Ignore symbols from non-allocatable sections (details)
  349. [BOLT] Keep .text section in file when using old text (details)
  350. [BOLT] Change stub-insertion pass for AArch64 (details)
  351. [BOLT] Support relocations without symbols (details)
  352. [BOLT] fix build with gcc-4.8.5 (details)
  353. [BOLT] Capitalize i (details)
  354. [BOLT][PR] Fix compiler warnings in BinaryContext and RegAnalysis (details)
  355. Fix bug in analyzeRelocation for GOT entries (details)
  356. [perf2bolt] Pre-aggregate LBR samples (details)
  357. [BOLT] Update local symbol count in symbol table (details)
  358. [BOLT] Workaround for Clang de-virtualization bug (details)
  359. [BOLT] Add branch priority policy for blocks with 2 successors (details)
  360. [BOLT] Add method for better function size estimation (details)
  361. [BOLT] Add thresholds for function splitting (details)
  362. [perf2bolt] Better tracking of process forking (details)
  363. [perf2bolt] Optimize memory usage in perf2bolt (details)
  364. [perf2bolt] Add support for generating autofdo input (details)
  365. [BOLT] For non-simple functions always update jump tables in-place (details)
  366. [BOLT] New inliner implementation (details)
  367. [BOLT-HEATMAP] Initial heat map implementation (details)
  368. Do not assert on addresses read from processIndirectBranch (details)
  369. [NFC][BOLT] Move ExecutableFileMemoryManager into its own file (details)
  370. [BOLT] Refactor allocatable sections rewrite part (details)
  371. [BOLT] Fix -hot-functions-at-end option (details)
  372. [BOLT][NFC] Fix compilation warnings (details)
  373. [BOLT] Fix debug line info emission (details)
  374. [BOLT] Place hot text mover functions into a separate section (details)
  375. [BOLT] Use local binding for cold fragment symbols (details)
  376. [BOLT] Fix section lookup while deleting symbols (details)
  377. [BOLT] Allocate enough space past __hot_end for huge pages (details)
  378. [BOLT] Do not write jump table section headers (details)
  379. [BOLT][DWARF] Dedup .debug_abbrev section patches (details)
  380. [BOLT] Move BinaryFunctions into a BinaryContext and more (details)
  381. [BOLT] Detect internal references into a middle of instruction (details)
  382. [DWARF][BOLT] Convert DW_AT_(low|high)_pc to DW_AT_ranges only if necessary (details)
  383. [PERF2BOLT] Print a better message if perf.data lacks LBR (details)
  384. [BOLT][NFC] Indentation fix (details)
  385. [BOLT] Add interface to extract values from static addresses (details)
  386. [BOLT] Sort basic block successors for printing (details)
  387. [BOLT] Include <numeric> for std::iota (details)
  388. [BOLT] Handle R_X86_64_converted_reloc_bit (details)
  389. [BOLT] Reduce warnings for non-simple functions (details)
  390. [BOLT] Abort processing if the profile has no valid data (details)
  391. [BOLT] Add another section to the list of hot text movers (details)
  392. [BOLT] Fix adjustFunctionBoundaries w.r.t. entry points (details)
  393. [BOLT] Fix an issue with std:errc (details)
  394. [BOLT] Basic support for split functions (details)
  395. [BOLT] Process CFIs for functions with FDE size mismatch (details)
  396. [BOLT] Fix non-determinism in shrink wrapping (details)
  397. [cmake] Only build enabled targets (details)
  398. Fix casting issues on macOS (details)
  399. [BOLT] Update symbols for secondary entry points (details)
  400. [BOLT] Minimize BOLT's diff with LLVM by removing trivial changes (NFC) (details)
  401. [BOLT] Automatically enable -hot-text (details)
  402. [perf2bolt] Fix print report for pre-aggregated profile (details)
  403. [BOLT] Fix profile reading in non-reloc mode (details)
  404. [BOLT] Fix symboltable update bug (details)
  405. [BOLT] Strip debug sections by default (details)
  406. [BOLT][NFC] Move DynoStats out of BinaryFunction (details)
  407. [BOLT] Limit jump table size by containing object (details)
  408. [perf2bot] Pass `-f` flag to perf (details)
  409. [BOLT] Move JumpTable management to BinaryContext (details)
  410. [BOLT] Improve ICP activation policy and hot jt processing (details)
  411. Parse statically defined tracepoint markers from .note.stapsdt section (details)
  412. Preserve nops that are SDT markers in binaries and disable SDT conflicting optimizations (details)
  413. [BOLT] Add an option to specialize memcpy() for 1 byte copy (details)
  414. [BOLT] Refactor handling of interproc refs (details)
  415. [BOLT] Better verification of jump tables (details)
  416. [BOLT][NFC] Fix white space (details)
  417. Minor-fix: remove duplicate definition of SPT optimization timer (details)
  418. [BOLT] Use regex matching for function names passed on command line (details)
  419. Update SDT locations after bolt reordering (details)
  420. Support data collection in bolted binaries (details)
  421. Compile Bolt using std 14. (details)
  422. [BOLT] Better handling of address references (details)
  423. [perf2bolt] Option to use event PC with LBR stack (details)
  424. Use singleton instances for SPT (stack pointer tracking) in FrameAnalysis. (details)
  425. [BOLT] Delay populating jump tables (details)
  426. [BOLT] Check instruction boundaries while populating jump tables (details)
  427. Parallelize ICF Pass (details)
  428. [BOLT] Add option to print profile bias stats (details)
  429. [BOLT] Ignore empty funcs in relocation mode (details)
  430. [BOLT] Initial experimental instrumentation pass (details)
  431. [BOLT] Force non-relocation mode for heatmap generation (details)
  432. [BOLT] Ignore false function references (details)
  433. [BOLT] Introduce strict relocation mode (details)
  434. [BOLT] Fix out-of-bounds entry points (details)
  435. [BOLT] Prioritize Jump Table ICP target by frequency and indice count (details)
  436. run SPT in parallel, and split annotation allocator (details)
  437. Clean SPTMap in frame anaylsis in parallel (details)
  438. Run cleanAnnotations within frame analysis in parallel (details)
  439. Create a general interface to implement parallel tasks easily and apply it to run EliminateUnreachableBlocks in parallel. (details)
  440. [BOLT] Restrict creation of jump tables (details)
  441. [BOLT] Support duplicating jump tables (details)
  442. Run reorder blocks in parallel (details)
  443. run aligner pass in parallel (details)
  444. run finalize functions in parallel (details)
  445. Run buildCFG in disassembly in parallel (details)
  446. Run shrink wrapping in parallel (details)
  447. [BOLT] Fix issue printing CTCs without annotations (details)
  448. [BOLT][PR] Target compilation based on LLVM CMake configuration (details)
  449. Lock-based parallelization for updateDebugInfo (details)
  450. Run findSubprograms in preprocessDebugInfo in parallel (details)
  451. [BOLT][NFC] Fix white space (details)
  452. [BOLT] Fix processing PLT without relocs (details)
  453. [BOLT] Add code padding verification (details)
  454. Run hfsort+ in parallel (details)
  455. Fix race condition in buildCFG (details)
  456. [perf2bolt] Enforce strict mode for perf2bolt (details)
  457. [BOLT] Add option to verify instruction encoder/decoder (details)
  458. Rewrite ICF using parallel utilities (details)
  459. Rewrite frame analysis using parallel utilities (details)
  460. Add test for parallel mode (details)
  461. Rename option (details)
  462. [BOLT] Support instrumentation via runtime library (details)
  463. [BOLT] Encode instrumentation tables in file (details)
  464. [BOLT] Fix misleading output (details)
  465. [BOLT] Tighter control of jump table detection (details)
  466. [BOLT] Fix aggregator w.r.t. split functions (details)
  467. [BOLT] Support .plt.got section (details)
  468. [BOLT] Fix perf2bolt race in BAT mode (details)
  469. [BOLT] Efficient edge profiling in instrumented mode (details)
  470. [BOLT] Ignore LBR from kernel interrupts (details)
  471. [BOLT] Filter perf samples by PID (details)
  472. [BOLT][non-reloc] Change function splitting in non-relocation mode (details)
  473. [BOLT] Better check for compiler de-virtualization bug (details)
  474. [BOLT] Reword message for macro-op fusion optimization (details)
  475. [BOLT] Fix build for Mac (details)
  476. [llvm-bolt] Bugfix jemalloc sized deallocation segfault (details)
  477. [BOLT] Do not emit BAT for non-simple in nonreloc (details)
  478. [BOLT] Improve object discovery runtime (details)
  479. [BOLT] Add missing CMake test dependencies (details)
  480. [BOLT] Fix merge-fdata and heatmap in BAT (details)
  481. [BOLT] Fix non-determinism while reading debug info (details)
  482. [BOLT] Fix stale functions when using BAT (details)
  483. [BOLT] Ignore __builtin_unreachable destination (details)
  484. [BOLT][Docs] Instructions for linking with jemalloc/tcmalloc (details)
  485. [AArch64] Recognize one extra br idiom (details)
  486. [BOLT][llvm] Reduce memory used by MCInst (details)
  487. [BOLT] Fix section offsets after debug stripping (details)
  488. [BOLT] Use NameResolver class for local symbols (details)
  489. [BOLT] Create OffsetTranslationTable for basic blocks (details)
  490. [BOLT] Update SDTs based on translation tables (details)
  491. [BOLT] Free memory for CFG after emission (details)
  492. [BOLT] Free more memory in BinaryFunction::releaseCFG() (details)
  493. [BOLT] Fix jump table analysis for non-simple functions (details)
  494. speeding up ext-tsp (details)
  495. [BOLT][NFC] Refactor data section emission code (details)
  496. [BOLT] Add BinarySection::flushPendingRelocations() (details)
  497. [BOLT] Refactor data PC relocations in BinaryContext (details)
  498. [BOLT] Refactor markAmbiguousRelocations() (details)
  499. [BOLT][NFC] Refactor BinaryFunction::addEntryPoint() (details)
  500. [BOLT] Proper support for -trap-avx512 option (details)
  501. [BOLT] Fix shrink wrapping empty BB issue (details)
  502. [PERF2BOLT/BOLT] Improve support for .so (details)
  503. [BOLT] Fix invalid abbrev error when reading debug_info section with readelf (details)
  504. [BOLT] Separate DebugRangesSectionsWriter into Ranges and ARanges (details)
  505. [BOLT] Remove test for impossible debug ranges condition (details)
  506. [perf2bolt] Ignore mmap events unrelated to execution (details)
  507. [BOLT] Make .debug_loc update deterministic (details)
  508. [BOLT] Fix non-determinism in ICP with threads (details)
  509. [BOLT] Support full instrumentation (details)
  510. [perf2bolt] Better mmap event matching (details)
  511. [BOLT] Make .debug_loc update deterministic (details)
  512. [BOLT] Fix symbol table entries for secondary entries (details)
  513. [BOLT] Fix build of the runtime on OSX (details)
  514. [BOLT] Improve handling of secondary function entry points (details)
  515. [BOLT] Get rid of Names in BinaryData (details)
  516. [BOLT] Do not report error on mismatched instruction encoding (details)
  517. [BOLT] Move postProcessEntryPoints after disassembly (details)
  518. [BOLT] Move createBinaryContext to BinaryContext (details)
  519. [BOLT] Replace list of Names with Symbols for BinaryFunction (details)
  520. [BOLT] Fix symbol table issue with ICF (details)
  521. [BOLT] Fix issue with strict and builtin_unreachable (details)
  522. [BOLT] Fix section names under `-generate-link-sections` (details)
  523. [BOLT] Remove BinaryContext::getFunctionData (details)
  524. [BOLT] Make the methods isText/isData more robust (details)
  525. [BOLT] Decoder cache friendly alignment wrt Intel JCC Erratum (details)
  526. [BOLT] Move peepholes pass after sctc (details)
  527. [BOLT] Add initial bits for parsing MachO files (details)
  528. [BOLT] Add missing std::move (details)
  529. [BOLT] Factor out NameResolver from RewriteInstance (details)
  530. [BOLT] Get rid of BinarySection::IsLocal (details)
  531. [BOLT] Emit long nops by default (details)
  532. [BOLT] Disassemble functions from a MachO binary (details)
  533. [BOLT] Add first bits to build CFG (details)
  534. [BOLT] Enable reversing the order of basic blocks (details)
  535. [BOLT][llvm] Update llvm.patch (details)
  536. [BOLT] Add missing override (details)
  537. [BOLT] Delete ExecutableFileMemoryManager::registerNoteSection() (details)
  538. [BOLT][NFC] Remove unused BinarySection member functions (details)
  539. [BOLT][NFC] Minor refactoring of RewriteInstance (details)
  540. [BOLT] Fix shrink wrapping to check pops (details)
  541. [BOLT][NFC] Factor out relocation processing (details)
  542. [BOLT] Fix begin decrementing (details)
  543. [BOLT][NFC] Get rid of BestFit parameter (details)
  544. [BOLT] Remove allow-section-relocations option (details)
  545. [BOLT] Mark functions containing data as non-simple (details)
  546. [BOLT] Uniquify names of local symbols (details)
  547. [BOLT] Refactor emission of original .eh_frame (details)
  548. [BOLT] Refactor ELF parts of instrumentation code (details)
  549. [BOLT] Refactor code and data emission code (details)
  550. [BOLT] Refactor section prefixes (details)
  551. [BOLT] Refactor ELF symbol table rewriting code (details)
  552. [BOLT][DWARF] Add support for base address in DWARF location lists (details)
  553. [BOLT] Verify exceptions action table equivalence in ICF (details)
  554. [BOLT] Fix ICF non-determinism in non-relocation mode (details)
  555. [BOLT] Speedup ICF by better function hashing (details)
  556. [BOLT] Further speedup ICF (details)
  557. [BOLT-X86] Fix instrumentation issue with indirect calls (details)
  558. [BOLT] Speedup RTDyld external symbol resolution (details)
  559. [BOLT] Fix .eh_frame update with ICF in non-relocation mode (details)
  560. [BOLT] Emit ICF symbols for large functions (details)
  561. [BOLT] Option to control .text alignment (details)
  562. [BOLT] Do not emit old .eh_frame in relocation mode (details)
  563. [BOLT] Option to fail if invalid profile detected (details)
  564. [BOLT] Speedup PLT processing (details)
  565. [BOLT][NFC] Change wording while reporting functions stats (details)
  566. [BOLT] Change symbol handling for secondary function entries (details)
  567. [BOLT][BFC] Refactor code for adding secondary function entries (details)
  568. [BOLT] Cover PIC jump table reference in non-strict mode (details)
  569. [BOLT] Fix dyno stats after ICF in non-reloc mode (details)
  570. [BOLT] Introduce isIgnored() function attribute (details)
  571. [BOLT] Introduce lite processing mode without relocations (details)
  572. Check runtime lib format within archiver (details)
  573. [BOLT] Ignore kernel interrupts by default (details)
  574. [BOLT] Change .debug_line emission for non-simple functions (details)
  575. [BOLT] Add option to tag version (details)
  576. [BOLT] Remove StringRef from IndirectCallProfile (details)
  577. [BOLT] Refactor profile-handling code (details)
  578. Remove const call to take_front (details)
  579. Use shuffle instead of random_shuffle (details)
  580. Emit functions on MachO (details)
  581. Refactor runtime library (details)
  582. Adding automatic huge page support (details)
  583. [BOLT] Update section index for symbols from unemitted functions (details)
  584. Generate heatmap for linux kernel (details)
  585. Provide a redundant declaration of KernelBaseAddr (details)
  586. Link functions on MachO (details)
  587. Be more flexible when locating runtime libs (details)
  588. [BOLT] Support for lite mode with relocations (details)
  589. [BOLT] Disable trapping on AVX-512 by default (details)
  590. [BOLT] Support -hot-text in lite mode (details)
  591. [BOLT] Fix memory error (details)
  592. [BOLT] Properly register symbols at secondary entry points (details)
  593. [BOLT] Fixes for scanExternalRefs() (details)
  594. [BOLT] Create entry points for internal refs from external code (details)
  595. [BOLT] Ignore functions that failed validation (details)
  596. [BOLT] Allow to overwrite -use-old-text option (details)
  597. [BOLT] Fix getNewValueForSymbol() (details)
  598. [BOLT] Add '-force-patch' to forcefully patch old entries (details)
  599. [BOLT] Ignore duplicate relocations (details)
  600. [perf2bolt] Relax rules for aggregation in strict mode (details)
  601. [BOLT] Add static binary support (details)
  602. [BOLT] Do not emit duplicate org symbols (details)
  603. Update X86/pre-aggregated-perf.test (details)
  604. [TESTS] Re-add issue20/issue26 tests (details)
  605. [BOLT] Skip R_X86_64_PLT32 relocation verification (details)
  606. [Bolt] Improve coding style for runtime lib related code (details)
  607. Support for CDF distribution of heatmap buckets (details)
  608. [BOLT] Ignore addresses from non-allocatable sections (details)
  609. Report stale sample count and percentage (details)
  610. [BOLT] Add the FeatureMiner pass to extract Calder's features. (details)
  611. [BOLT] Fix fix-branches in presence of JRCXZ and friends (details)
  612. Revert "[BOLT] Add the FeatureMiner pass to extract Calder's features." (details)
  613. [BOLT] Allow to specify -reorder-functions option multiple times (details)
  614. Extracted sequence insertion function into helper function (details)
  615. Handle intra-function call in instrumentOneTarget (details)
  616. [BOLT] Fix hot_end symbol update with user function order (details)
  617. [BOLT] Fix stack alignment for runtime lib (details)
  618. Added execution count threshold option (details)
  619. [perf2bolt] Fix for SKL bug workaround (details)
  620. Linux kernel marker to update special sections (details)
  621. Print when we are operating in lite mode (details)
  622. Add first bits to support emitting more than 255 sections on MachO (details)
  623. [perf2bolt] Issue error when writing YAML for BOLTed input (details)
  624. Fix BAT cold-to-hot mappings (details)
  625. Bugfix for splitting critical edges in shrink wrapping (details)
  626. [BOLT] Do not map sections with zero address (details)
  627. [BOLT] Eliminate "shallow" function lookup (details)
  628. [BOLT][Linux] Initial support for special Linux Kernel sections (details)
  629. Set InputFileOffset for MachO sections (details)
  630. postProcessEntryPoints: return after setIgnored and setSimple are set (details)
  631. Read the entry point address on MachO (details)
  632. [BOLT] Fix sign issue when validating X86 relocations (details)
  633. Add -check-overlapping-elements option (details)
  634. Precompute symbol section indices on MachO (details)
  635. [BOLT] Refactor relocations class impl per arch, NFC (details)
  636. Add ToolPath field to MachORewriteInstance (details)
  637. [BOLT] Refactor PatchEntries pass (details)
  638. [BOLT] Disable PatchEntries in non-relocation mode on ELF (details)
  639. Add support for emitting code into a new segment on MachO (details)
  640. [BOLT] Change label name for cold fragments (details)
  641. Fix handling of _end symbol on MachO (details)
  642. [BOLT] Emit symbol size for functions (details)
  643. Add first bits to support emitting instrumented code on MachO (details)
  644. [BOLT] Fix debug line info in lite relocation mode (details)
  645. [BOLT] Refactor reading of debug line info (details)
  646. [BOLT] In shrinkwrap, do not split prefix/instr (details)
  647. Add first bits to cross-compile the runtime for OSX (details)
  648. [BOLT][DWARF] Streamline processing of DWARF unit DIEs (details)
  649. Inject a hook into the entry point on MachO (details)
  650. [BOLT] Ignore __hot_start, __hot_end from input (details)
  651. [BOLT] Enable lite mode by default with relocations (details)
  652. [BOLT] Fix PatchEntries pass (details)
  653. Add pass number to dot dump filename (details)
  654. [BOLT] Always keep dynamic symbols defined (details)
  655. [BOLT] Fix no-asserts build (details)
  656. [DOCS] Add instrumentation instructions to README (details)
  657. [BOLT] Please sanitizers (details)
  658. [BOLT] Remove threaded EliminateUnreachableBlock version (details)
  659. [BOLT] Fix C++ exceptions for shared objects (details)
  660. [BOLT][PR] Handle TLS relocations on AArch64 (details)
  661. Extract BinaryContext::registerFragment (details)
  662. processInterproceduralReferences: record references to cold fragments as entry points (details)
  663. Conservatively handle jump tables in split functions (details)
  664. Lost in rebase: call registerFragment with a reference to TargetBF (details)
  665. Improve cold fragment name matching (details)
  666. [BOLT] Disable DynoStats printing after SCTC (details)
  667. Minimize X86/shrinkwrapping-critedge test case (details)
  668. [BOLT] Debug logging in analyzeJumpTable (details)
  669. [BOLT] Add invalid offset for a JT entry pointing to a fragment (details)
  670. [BOLT] Support jump tables in split fragments with entries pointing back to parent functions (details)
  671. a new version of hfsort+ (details)
  672. [BOLT] Fix data race while running split functions pass (details)
  673. Link the instrumentation runtime on OSX (details)
  674. [BOLT] Handle insertion of updated CFI at the first basic block (details)
  675. Refactor syscall wrappers for OSX (details)
  676. Inject instrumentation's global dtor on MachO (details)
  677. [BOLT] Fix shrinkwrapping bug when changing frame alignment (details)
  678. [TEST] Remove dependency on debug output (details)
  679. [BOLT] Add threshold options for lite mode (details)
  680. [PERF2BOLT] Relax segment matching requirements (details)
  681. [BOLT] Fix missing newlines in debug prints (details)
  682. [BOLT] Fix operator new signature (details)
  683. [BOLT] Enable intToStr for MacOS (details)
  684. an updated version of ExtTSP (details)
  685. [BOLT] Add support for __literal16 section on MachO (details)
  686. [BOLT] Add support for dumping counters on MacOS (details)
  687. [BOLT] Add support for dumping profile on MacOS (details)
  688. [BOLT] Add support for reading profile on Mach-O (details)
  689. Rebase: Merge BOLT codebase in monorepo (details)
  690. [BOLT] Update license headers (details)
  691. Update DW_AT_stmt_list for .debug_types (details)
  692. Fix license for a few remaining files (details)
  693. Fix up test for Update DW_AT_stmt_list for .debug_types (details)
  694. [BOLT][PR] readDynamicRelocations: Skip NONE relocations (details)
  695. [BOLT] Ignore TBSS section at layout time (details)
  696. [BOLT][PR] Instrumentation: Introduce -no-counters-clear and -wait-forks options (details)
  697. [BOLT] Fix false references to zero-sized objects (details)
  698. [BOLT] Fix instrumentation bug in duplicated JTs (details)
  699. [BOLT] Do not assert on jump table heuristic failure (details)
  700. Rebase: [cherry-pick] [BOLT] Add option to skip writing an output file (details)
  701. [BOLT] Refactor SectionPatchers map to a Patcher in BinarySection (details)
  702. [BOLT] Remove cantFail in getAddressRanges calls (details)
  703. [BOLT] Fix value invalidation bug in runtimelib (details)
  704. Rebase: [BOLT][NFC] Expand auto types (details)
  705. [BOLT][NFC] Use const reference for MCInstrDesc (details)
  706. [BOLT][NFC] Remove RewriteInstance::EHFrame (details)
  707. [BOLT] Remove -dump-eh-frame option (details)
  708. [BOLT][NFC] Remove CFIReaderWriter::fdes() (details)
  709. [perf2bolt] Further relax segment matching (details)
  710. Rebase: [BOLT][NFC] Remove unneeded includes with include-what-you-use (details)
  711. Rebase: [BOLT][NFC] Avoid binutils in tests (details)
  712. [BOLT][NFC] Avoid unnecessary copies with push_back (details)
  713. [PR] Fix bb reordering optimization (details)
  714. [PR] Fix tests build with -no-pie option (details)
  715. [PR] Add missing includes (details)
  716. [BOLT][NFC] Follow LLVM variable initialization style (details)
  717. [BOLT][NFC] Address warning about ProgramPoint implicit copy constructor (details)
  718. [BOLT][NFC] Change interface for searching relocations (details)
  719. [BOLT] Preserve original jump table relocations (details)
  720. [BOLT][NFC][TEST] Added llvm-dwarfdump and llvm-mc to BOLT_TEST_DEPS (details)
  721. Rebase: [BOLT] DebugFission Support (details)
  722. [PR] Introduce loop inversion pass (details)
  723. [PR] Instrumentation: Emit paddings to preserve data alignment (details)
  724. [BOLT][NFC] Disable ProcessAllSections in RuntimeDyld (details)
  725. [BOLT] Resolve JumpTable namespace issue in pseudo probe decoder migration (details)
  726. [BOLT][TEST] Fix test case to conform to analyzePICJumpTable pattern matching (details)
  727. [BOLT][NFC] Fix debug info printouts for inlined functions (details)
  728. [BOLT] Hugify: check for THP support via sysfs (details)
  729. [BOLT] Change how DF DWO logging is handled (details)
  730. [BOLT][CSSPGO] Pseudo probe decoding (details)
  731. [BOLT][NFC] Suppress addList override warning (details)
  732. [BOLT] Fix rodata load simplification pass (details)
  733. [PR] Instrumentation: Disable signals on mutex lock (details)
  734. [PR] Patch allocatable relocations for AArch64 (details)
  735. Rebase: [BOLT][DebugFission] Fix reading support for DWP (details)
  736. [PR][BOLT] Print revision in perf2bolt and bolt-diff modes (details)
  737. [BOLT] Fix undefined symbol warnings/errors (details)
  738. Throw an error in instrument for dynamic libs (details)
  739. [BOLT][TESTS] Fix ICF test case (details)
  740. [BOLT] Handle R_X86_64_64 in flushPendingRelocations (details)
  741. [BOLT][NFC] Use MCPlusBuilder::isPseudo (details)
  742. [BOLT][CSSPGO] Relate decoded pseudo probe basic blocks (details)
  743. [BOLT][NFC] Readability improvements in X86,Aarch64 MCPlusBuilder (details)
  744. [BOLT][NFC] Refactor handlePCRelOperand (details)
  745. [BOLT][NFC] Always process runtime relocations (details)
  746. [BOLT][NFC] Delete MoveRelocations entirely (details)
  747. [BOLT][NFC] Un-inline adding external references out of disassemble loop (details)
  748. [BOLT][NFC] Un-inline indirect branch handling out of disassemble loop (details)
  749. [BOLT][NFC] Un-inline checking AArch64 linker veneers out of disassemble loop (details)
  750. [BOLT][TESTS] Remove dynamic relocations from YAML tests (details)
  751. [BOLT][DWARF] Fix writing out dwo with DWP as input (details)
  752. [BOLT] Read all dynamic relocations and refactor code (details)
  753. [BOLT][NFC] Resolved all clang-12 warnings for bolt (details)
  754. [BOLT] Add support for .plt.sec and refactor PLT-reading code (details)
  755. [BOLT] Dump dynamic execution per instruction opcode (details)
  756. [BOLT] Tail duplication analysis pass (details)
  757. [BOLT][CSSPGO] Encode pseudo probe section to binary (details)
  758. [BOLT][CSSPGO] Handle indirect call promotion in Pseudo Probe Integration (details)
  759. RewriteInstance: account .stab and .stabstr as debug sections (details)
  760. [BOLT] Tail Duplication active pass (details)
  761. [BOLT] Update build instructions in README (details)
  762. [BOLT] Support PLT sections with variable entry sizes (details)
  763. [BOLT][NFC] Unify isTailCall interface across X86 and AArch64 (details)
  764. [PR] Instrumentation: Generate and use _start and _fini trampolines (details)
  765. [PR] Instrumentation: Add readlink and getdents support (details)
  766. [PR] Instrumentation: Add support for opening libs based on links /proc/self/map_files (details)
  767. [PR] Instrumentation: Initial support for static executables (details)
  768. [PR] Instrumentation: Fix runtime handlers for PIE files (details)
  769. [PR] README: remove note about experimental status of instrumentation (details)
  770. [PR] Instrumentation: Introduce instrumentation-binpath argument (details)
  771. [PR] Instrumentation: Fix start and fini trampoline pointers (details)
  772. [PR] Instrumentation: Avoid generating GOT table in instrumentation library (details)
  773. [PR] Tests: add instrumentation tests for PIE exec & shared libs (details)
  774. Rebase: [BOLT] DWP output support (details)
  775. Fix NFC tests (details)
  776. [PR] Fix AARCH64 ADR* relocations (details)
  777. [BOLT][NFC][PR] Removed unused singletonSet (details)
  778. [PR] Fdata: Escape whitespaces in symbol names (details)
  779. [BOLT] Added Constant and Copy Propagation to tail duplicated blocks (details)
  780. [PR] Print relocations warning if failed to process (details)
  781. [PR] AArch64: Fix ADR instruction handling (details)
  782. [BOLT] Optimize the three way branch (details)
  783. [BOLT] Refactor to use new APIs for getting offset of attribute (details)
  784. [PR] ReorderAlgorithm.cpp: Fix iterator types (details)
  785. [PR] LIT: add checking if maxIndividualTestTime is available on the platform (details)
  786. [PR] Instrumentation: use TryLock for SimpleHashTable getter (details)
  787. [BOLT] Fix binary corruption in non-reloc mode (details)
  788. [BOLT] [NFC] Cleanup old code in mapCodeSections (details)
  789. [NFC] Fix warnings when building with clang (details)
  790. [BOLT] Fix warnings from LLVM DWARF reading library (details)
  791. [PR] Fix aarch64 TLS relocations handling (details)
  792. [PR] AArch64: Skip some of the relocations processing (details)
  793. [BOLT][DWARF][NFC] Refactor code (details)
  794. [BOLT][TEST] Remove dependence on host_cc and host_cxx (details)
  795. [PR] Add AARCH64_MOVW_UABS_G* relocations support (details)
  796. [BOLT][DWARF] Write new .debug_abbrev sections (details)
  797. [BOLT][DWARF][NFC] Use only skeleton/main CUs to update .debug_aranges (details)
  798. [BOLT][DWARF][NFC] Get rid of updateRangeBase() helper function (details)
  799. [BOLT][TEST] Split runtime tests into test/runtime folder (details)
  800. [BOLT][TEST] Import internal_call_instrument.s (details)
  801. Rebase: [PR] Fix build instructions (details)
  802. [BOLT][TEST] Imported small tests (details)
  803. [BOLT][DWARF] Fix abbrev offsets for type units (details)
  804. [BOLT][NFC] Remove redundant code (details)
  805. [BOLT][DWARF] Move line info emission into BOLT (details)
  806. [BOLT][DWARF] Deprecate usage of DWARFAbbreviationDeclaration::findAttribute() (details)
  807. [BOLT][TEST] Imported small tests, removed duplicate input (details)
  808. [BOLT][DWARF] Change line info emission for unmodified functions (details)
  809. [BOLT][DWARF] Properly emit end-of-sequence entries for line tables (details)
  810. [BOLT] Do not process DWARF relocs (details)
  811. [BOLT][NFC] Use const pointers in PrintProgramStats (details)
  812. [PR] Update skipRelocationProcess (details)
  813. [PR] AArch64: Add TSTBR14 and CONDB19 relocations support (details)
  814. [BOLT][TEST] Imported small tests (details)
  815. [BOLT] link_fdata: accept symbols with slash in the name (details)
  816. [PR] Handle relocations in constant islands (details)
  817. [BOLT][TEST] Imported small tests (details)
  818. [BOLT][TEST] Imported small tests (details)
  819. [BOLT][TEST] Imported small tests (details)
  820. [BOLT][TEST] Imported small tests (details)
  821. [BOLT][TEST] Imported small tests (details)
  822. [BOLT][TEST] Imported small tests (details)
  823. [BOLT][TEST] Imported small tests (details)
  824. [BOLT] Allocate memory for constant islands on-demand (details)
  825. [BOLT] Fix build after auto rebase (details)
  826. [BOLT][DWARF] Use MCAsmLayout to update stmt_list values (details)
  827. [PR] Fix LongJmp pass (details)
  828. [PR][BOLT][TEST] Fix tests (details)
  829. [BOLT][DWARF] Refactor of Loc and LocLists writers (details)
  830. [PR] Fix constant islands handling (details)
  831. [PR] Instrumentation: Sync file on dump (details)
  832. [PR] Skip NONE static relocations (details)
  833. [PR] Disable instrumentation and hugify build for aarch64 (details)
  834. [BOLT][DWARF] Keep original line info for unmodified units (details)
  835. [PR] Fix warning (details)
  836. [PR] Introduce remove-symtab option (details)
  837. [BOLT] Add Dockerfile (details)
  838. [PR] bolt_rt: getBinaryPath() increase max file path (details)
  839. Rebase: [NFC] Refactor sources to be buildable in shared mode (details)
  840. [BOLT] Improve cmake configs for opensource (details)
  841. [BOLT][NFC] Do not pass BinaryContext alongside BinaryFunction (details)
  842. Rebase: [BOLT] AsmDump: dump function assembly and profile info (details)
  843. [PR][BOLT][Instrumentation] Optimize eflags load/store (details)
  844. [PR] Aarch64: Add ABS32/16 relocations support (details)
  845. [BOLT][DWARF] Fix rare problem while rewriting debug_abbrev after LTO (details)
  846. [BOLT][NFC] Remove references to internal tasks (details)
  847. [BOLT] TailDuplication: skip non-simple functions (details)
  848. [BOLT][TEST] Import small tests (details)
  849. [BOLT][TEST] Add instrumentation test using merge-fdata (details)
  850. [BOLT][TEST] Import small tests (details)
  851. [BOLT][TEST] Rename tests to follow standard naming scheme (details)
  852. [BOLT][TEST] Import jump-table-icp.test, update link_fdata script (details)
  853. [BOLT][NFC] AsmDump: disable printing of empty profile data (details)
  854. [BOLT][NFC] Remove unused function (details)
  855. [BOLT][TEST] Import small tests (details)
  856. [BOLT] Fix Windows build (details)
  857. [PR] instr: change assert to allow FD 0 return by __open() (details)
  858. [BOLT][TEST] Reduce vararg.test (details)
  859. [BOLT][TEST] Import small tests (details)
  860. [BOLT][TEST] Add llvm-boltdiff to build/test requirements (details)
  861. [BOLT] Fix tailcall-traps and basic-instr tests on ubuntu (details)
  862. Fix shared build (details)
  863. [BOLT][NFC] Change guard macros in headers (details)
  864. [BOLT][DWARF] Fix for Unsupported Debug section: debug_line.dwo warning (details)
  865. [PR] Fix ShrinkWrapping pop order (details)
  866. [BOLT][TEST] Fix runtime/X86/retpoline-synthetic.test (details)
  867. [BOLT][NFC] Use function names passed in -funcs-no-regex as-is (details)
  868. [BOLT] Import bughunter script (details)
  869. [BOLT] Fix crash when trying to resolve external symbols for runtime libs (details)
  870. [PR] Disable stack protection in runtime libraries (details)
  871. [BOLT] Tail Duplication: skip unreachable blocks (details)
  872. [BOLT] Tail Duplication: fix jump table check (details)
  873. [BOLT][NFC] Better diagnostics for unsupported relocation types (details)
  874. [BOLT][NFC] Remove misleading debug message (details)
  875. [BOLT] Tail duplication: disable const/copy propagation by default as a workaround (details)
  876. [BOLT][NFC] Remove unused MCPlusBuilder::createIndirectCall method (details)
  877. [BOLT][NFC] Remove unused MCPlusBuilder::isEnter (details)
  878. [BOLT][TESTS] Move debugTypesBug.s test into binary tests (details)
  879. [BOLT] Add pass to normalize CFG (details)
  880. [BOLT][DWARF] Force allocation of debug_line in RuntimeDyld (details)
  881. Add code owners file (details)
  882. [BOLT][DWARF] Fix for abbrev check in DWP case (details)
  883. [BOLT][NFC] Clear HFSort copyright/license (details)
  884. [BOLT] Use more ADT data structures for BinaryFunction (details)
  885. Disable Windows build (details)
  886. [BOLT] Refactor BinaryBasicBlock to use ADT (details)
  887. [BOLT] Split functions: support fragments with multiple parents (details)
  888. Add bolt target to cmake (details)
  889. [BOLT][NFC] Reformat with clang-format (details)
  890. Fix install-bolt_rt dependencies (details)
  891. [PR] Fix update-debug-sections for AArch64 (details)
  892. Fix frameopt crash when processing POPF (details)
  893. [BOLT] Move disassemble optimizations to optimization passes (details)
  894. [BOLT] Fix profile and tests for nop-removal pass (details)
  895. [BOLT][NFC] Remove unused function (details)
  896. [BOLT][NFC] Remove another unused function (details)
  897. [BOLT] Don't use ld.lld in tests (details)
  898. [BOLT][DOCS] Updated clang build instructions in OptimizingClang.md (details)
  899. [BOLT][NFC] Clear HFSort copyright/license (details)
  900. [BOLT][NFC] Fix file-description comments (details)
  901. [bughunter.sh][NFC] Fix license and file description (details)
  902. [BOLTCore] [NFC] Fix braces usages according to LLVM (details)
  903. Re-enable Windows build and fix issues (details)
  904. [BOLT][RFC] Use new LLVM license for ADRRelaxationPass (details)
  905. [PR][BOLT] Check for end iterator in LongJmp stub lookup (details)
  906. [BOLTRewrite][NFC] Fix braces usages (details)
  907. [BOLT][DOCS] Build doxygen documentation (details)
  908. [BOLT][NFC] Fix braces usage in Passes (details)
  909. [BOLT] Fix debug logging in IndirectCallPromotion (details)
  910. [BOLT][NFC] Fix braces usage in Target (details)
  911. [BOLT][NFC] Fix braces usage in Profile (details)
  912. [BOLT][NFC] Fix braces usage in the rest of the codebase (details)
  913. [PR][BOLT] Add aarch64 backend code owner (details)
  914. [BOLT][DOCS] Link to README instead of the github page in Doxygen (details)
  915. [BOLT] Rewrite of .debug_info section (details)
  916. [BOLT] removeAllSuccessors: handle multiple edges between basic blocks (details)
  917. [BOLT][DWARF] Change convertToRanges to not use indirect (details)
  918. [BOLT][DWARF] Handling more data formats for DW_AT_high_pc (details)
  919. [BOLT][NFC] Refactor if statements in RewriteInstance (details)
  920. [BOLT][NFC] Use uniform DEBUG_TYPE for MCPlus builders (details)
  921. [BOLT][DWARF] Fix race conditions for debug fission in non-deterministic mode (details)
  922. [BOLT][NFC] Refactor command line options in BinaryPassManager (details)
  923. [BOLT][DWARF] Fix size mismatch error with jemalloc (details)
  924. [BOLT] Remove ineligible macro-fusion patterns (details)
  925. [BOLT][NFC] Reuse X86BaseInfo interfaces for macrofusion checks (details)
  926. [BOLT][NFC] Refactor X86MCPlusBuilder (details)
  927. [BOLT][NFC] Refactor AArch64MCPlusBuilder (details)
  928. [BOLT][NFC] Format braced initializer lists (details)
  929. [InstSimplify] Add additional GEP tests with undef bases. (details)
  930. [SPIR-V] Remove unused variable (details)
  931. [SCEV] Add test for umin_seq with duplicate operands (details)
  932. [SCEV] `getSequentialMinMaxExpr()`: keep only the first instance of an operand (details)
  933. [mlir][linalg] Use cast instead of dyn_cast that's always dereferenced (details)
  934. [clang][lex] Keep references to `DirectoryLookup` objects up-to-date (details)
  935. [GlobalOpt] Regenerate test checks (NFC) (details)
  936. [gn build] (manually) port 8503c688d555 (details)
  937. [NFC][SCEV] Add more tests for umin_seq with redundant operands (details)
  938. [GlobalStatus] Look through non-constexpr casts (details)
  939. [ASan] Driver changes to always link-in asan_static library. (details)
  940. [libc++] Use TEST_HAS_NO_UNICODE instead of _LIBCPP_HAS_NO_UNICODE in the test suite (details)
  941. [compiler-rt] Silence warnings when building with MSVC (details)
  942. [NFC][SCEV] More tests with operand-wise redundant operands of umin of umin_seq (details)
  943. [SCEV] `getSequentialMinMaxExpr()`: look into `umin` when deduplicating operands (details)
  944. [X86] Tag existing shuffle test case as PR53124 (details)
  945. [mips] Use `push_back` to insert element at the end of a container. NFC (details)
  946. [mips][lld] Add test case to check symbol index reading on mips64el. NFC (details)
  947. [InstSimplify] Fold inbounds GEP to poison if base is undef. (details)
  948. [Nomination] Adding Intel representatives to security group (details)
  949. [DSE] Style improvements after 3cef3cf - remove redundant dyn_casts [NFC] (details)
  950. [X86] Apply clang-format to X86TargetLowering::isVectorShiftByScalarCheap (details)
  951. [GlobalsModRef] Apply indirect-global rule to all globals initialized from noalias calls (details)
  952. Mark arith.minf, arith.maxf as commutative. (details)
  953. [libc][NFC] Move sys/mman entrypoints to the default build configs. (details)
  954. [clang] Move `ApplyHeaderSearchOptions` from Frontend to Lex (details)
  955. [SPIR-V] Drop double quote from test pattern (details)
  956. [RISCV] Add DAG combine to fold (fp_to_int (ffloor X)) -> (fcvt X, rdn) (details)
  957. X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr (details)
  958. [gn build] Port f77d115cc136 (details)
  959. [instsimplify] Add a comment and test for a highly confusing case (details)
  960. [clang-format] Fix SeparateDefinitionBlocks issues (details)
  961. [IRBuilder] Introduce folder using inst-simplify, use for Or fold. (details)
  962. Fix bazel build after 8503c688d555014b88849e933bf096035a351586. (details)
  963. [mlir][linalg] Improve pooling op iterator order consistency (details)
  964. [ELF] Add RelocationScanner. NFC (details)
  965. [MLIR][SCF] Simplify scf.if by swapping regions if condition is a not (details)
  966. [mlir][tosa] Relax tosa.apply_scale operations (details)
  967. [ShrinkWrap] check for PPC's non-callee-saved LR (details)
  968. [NFC][LazyCallGraph] Remove check in removeDeadFunction() if graph is empty (details)
  969. [libc++][libc++abi][libunwind] Dedup install path var definitions (details)
  970. [mlir][tosa] Allow optional TOSA decompositions to be populated separately (details)
  971. [SelectionDAG] treat X constrained labels as i for asm (details)
  972. [mlir] Fix a missing override warning (details)
  973. Fix clang-tidy bugprone-argument-comment that was mixed up (details)
  974. Apply clang-tidy fixes for readability-redundant-control-flow in OpenMPDialect.cpp (NFC) (details)
  975. [CodeGen] Treat ObjC `__unsafe_unretained` and class types as trivial (details)
  976. [llvm][test] rewrite callbr to use i rather than X constraint NFC (details)
  977. Fix bazel build after f77d115cc136585f39d30a78c741eb296f9e804d. (details)
  978. [NFC][SimplifyCFG] Add some more tests for sinking into 'unreachable' block (details)
  979. [clang][CGStmt] emit i constraint rather than X for asm goto indirect dests (details)
  980. [gn build] (manually) port f77d115cc136 more (details)
  981. [HIP] Fix device malloc/free (details)
  982. [MLIR][LLVM] Add MemRead/MemWrite behavior to llvm store/load/addressof ops (details)
  983. [DSE] Generalize store null to calloc allocated memory [NFC-ish] (details)
  984. Accept string literal decay in conditional operator (details)
  985. [clang] number labels in asm goto strings after tied inputs (details)
  986. [DSE] Minor style improvements to calloc formation code [NFC] (details)
  987. [NFC][MLGO] Remove the word "inliner" in a generic error message. (details)
  988. [AIX] support xcoff for llvm-nm (details)
  989. [DSE] Separate malloc+memset -> calloc transform from noop store detection [NFC] (details)
  990. [InstCombine] Pull out a helper function to simplify upcoming patch [NFC] (details)
  991. AMDGPU/GlobalISel: Regenerate baseline checks to include -NEXT (details)
  992. GlobalISel: Use cloneVirtualRegister in localizer (details)
  993. Revert D109159 : Revert "[amdgpu] Enable selection of `s_cselect_b64`." (details)
  994. [AIX] add the xcoff symbol size for the llvm-nm. (details)
  995. [mlir][Linalg] Pattern to fuse pad operation with elementwise operations. (details)
  996. [MCA] Switching from conservatively guessing which instructions are (details)
  997. [clang][#47272] Avoid suggesting deprecated version of a declaration over another in typo correction (details)
  998. [libc++] Introduce __debug_db_insert_c() (details)
  999. [libc++] Add Status page for P2321R2 (Zip) (details)
  1000. [libc++] Introduce __fits_in_sso() (details)
  1001. Add 'eager-checks' as a module parameter to MSAN. (details)
  1002. [NFC][llvm-libtool-darwin] Encapsulate the process of adding a new member in a class (details)
  1003. [llvm-libtool-darwin] Print a warning if object file names are repeated (details)
  1004. [sanitizer_common] Only use NT_GNU_BUILD_ID in sanitizer_linux_libcdep.cpp if supported (details)
  1005. [TSan][Darwin] Mark test UNSUPPORTED for iOS simulator (details)
  1006. [libc++] Temporarily disable the in_out_result test on Fuchsia. (details)
  1007. [TSan][Darwin] Enable Trace/TraceAlloc unit tests (details)
  1008. [MLIR][SCF] Canonicalize while statement whose cmp condition is recomputed in the after region (details)
  1009. [NFC] Fixup for comment (details)
  1010. [LLDB][NativePDB] Add support for inlined functions (details)
  1011. [NFC][MLGO] Use ASSERT_TRUE in TFUtilsTest, where appropriate. (details)
  1012. [clang][CodeGen][UBSan] VLA size checking for unsigned integer parameter (details)
  1013. [MLGO] Add support for multiple training traces per module (details)
  1014. [lld-macho] Rename LazySymbol to LazyArchive. NFC (details)
  1015. [libc++][test] Move iter_swap into iterator.cust.swap. NFC. (details)
  1016. ASTMatchers: Avoid using SmallVector::set_size() (details)
  1017. [MLIR][SCF] Remove unused arguments to whileop (details)
  1018. ADT: Avoid using SmallVector::set_size() in SmallString (details)
  1019. [lld-macho] Initialize separate time trace profiler for mapfile worker (details)
  1020. AST: Avoid using SmallVector::set_size() in UnresolvedSet (details)
  1021. Support: Avoid SmallVector::set_size() in Windows code (details)
  1022. Support: Avoid SmallVector::set_size() in Unix code (details)
  1023. Support: Extract sys::fs::readNativeFileToEOF() from MemoryBuffer (details)
  1024. [Coroutines] Enhance symmetric transfer for constant CmpInst (details)
Commit 674dbcc0de71b082ae51c52481186e7aa7562e82 by maks
Fix crash in patchELFPHDRTable when no functions are modified.

Summary:
patchELFPHDRTable was asserting that it could not find an entry
for .eh_frame_hdr in SectionMapInfo when no functions were modified
by BOLT.

This just changes code to skip modifying GNU_EH_FRAME program headers
when SectionMapInfo is empty.  The existing header is copied and written
instead.

(cherry picked from FBD3557481)
The file was modifiedbolt/RewriteInstance.cpp
Commit bf46263eed318fbba77a5ec386aab11792cbd3ee by maks
Shorten instructions if possible.

Summary:
Generate short versions of branch instructions by default and rely on
relaxation to produce longer versions when needed.

Also produce short versions of arithmetic instructions if immediate
fits into one byte. This was only triggered once on HHVM binary.

(cherry picked from FBD3591466)
The file was modifiedbolt/BinaryFunction.cpp
Commit f2d82919d07661958514723f081e4ef15e725449 by maks
Move debug-handling code into DWARFRewriter (NFC).

Summary: RewriteInstance.cpp is getting too big. Split the code.

(cherry picked from FBD3596103)
The file was modifiedbolt/CMakeLists.txt
The file was addedbolt/DWARFRewriter.cpp
The file was modifiedbolt/RewriteInstance.cpp
Commit c6d0c568d43b2657bdf691e09fdd60b9f1cdde1e by maks
Add BinaryContext::getSectionForAddress()

Summary: Interface for accessing section from BinaryContext.

(cherry picked from FBD3600854)
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryContext.h
Commit ea53cffb2d363f75eceb4a8d36c12791a57519ed by maks
Add movabs -> mov shortening optimization.  Add peephole optimization pass that does instruction shortening.

Summary:
When a mov instruction has a 64-bit immediate that can be represented as
a sign-extended 32-bit number, use the smaller mov instruction (MOV64ri -> MOV64ri32).

Add peephole optimization pass that does instruction shortening.
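
The range check behind this shortening can be sketched as follows. This is a minimal illustration, not the code from the commit; the helper names fitsInSignExtended32 and toImm32 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true when a 64-bit immediate can be represented as a
// sign-extended 32-bit value, i.e. when MOV64ri could become MOV64ri32.
constexpr bool fitsInSignExtended32(int64_t Imm) {
  return Imm >= std::numeric_limits<int32_t>::min() &&
         Imm <= std::numeric_limits<int32_t>::max();
}

// Hypothetical helper: narrow an immediate that is already known to fit.
// The assert documents the precondition rather than handling the failure.
inline int32_t toImm32(int64_t Imm) {
  assert(fitsInSignExtended32(Imm) && "immediate does not fit in 32 bits");
  return static_cast<int32_t>(Imm);
}
```

Under this check, -1 and 0x7fffffff qualify for the short form, while 0x80000000 does not because sign extension would turn it into a negative value.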

(cherry picked from FBD3603099)
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/RewriteInstance.cpp
Commit 17b846586cd0e46cc454a97f019f0468f8952a7f by maks
Loop detection for BOLT's CFG.

Summary:
Loop detection for the CFG data structure. Added a GraphTraits
specialization for BOLT's CFG that allows us to use LLVM's loop
detection interface.

(cherry picked from FBD3604837)
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryFunction.h
The file was addedbolt/BinaryLoop.h
Commit 156a55209c9bd421a6c75652c8139f042516a1e6 by maks
Simplification of loads from read-only data sections.

Summary:
Instructions that load data from a read-only data section and whose
target address can be computed statically (e.g. RIP-relative addressing)
are modified to corresponding instructions that use immediate operands.
We apply the transformation only when the resulting instruction will have
smaller or equal size.

(cherry picked from FBD3397112)
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryContext.cpp
Commit 82401630a22b4051532a2203beb7d2cc6a021d01 by maks
Factor out instruction printing and size computation.

Summary:
I've factored out the instruction printing and size computation routines to
methods on BinaryContext.  I've also added some more debug print functions.

This was split off the ICP diff to simplify it a bit.

(cherry picked from FBD3610690)
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/BinaryBasicBlock.h
Commit a9bb3320ad6f8e87e22b0ff60cd004f0de3f8157 by maks
Identical Code Folding (ICF) pass

Summary:
Added an ICF pass to BOLT, that can recognize identical functions
and replace references to these functions with references to just one
representative.

(cherry picked from FBD3460297)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryPasses.cpp
Commit ab599fe71a7b02fa4f8ceaaadaca988c2c7dc1a1 by maks
Basic block clustering algorithm for minimizing branches.

Summary:
This algorithm is similar to our main clustering algorithm but uses
a different heuristic for selecting edges to become fall-throughs.
The weight of an edge is calculated as the win in branches if we choose
to lay out this edge as a fall-through. For example, the edges A -> B with
execution count 100 and A -> C with execution count 500 (where B and C
are the only successors of A) have weights -400 and +400 respectively.
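
A small sketch of the weight computation for a block with exactly two successors. The formula (own count minus the sibling edge's count) is an assumption inferred from the A -> B / A -> C example above, and fallThroughWeights is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>

// Hypothetical helper: weights of the two outgoing edges of a block, where
// each weight is the number of taken branches saved (positive) or added
// (negative) by laying that edge out as the fall-through.
std::pair<int64_t, int64_t> fallThroughWeights(uint64_t CountFirst,
                                               uint64_t CountSecond) {
  constexpr uint64_t Max =
      static_cast<uint64_t>(std::numeric_limits<int64_t>::max());
  assert(CountFirst <= Max && CountSecond <= Max && "count overflows int64_t");
  const int64_t First = static_cast<int64_t>(CountFirst);
  const int64_t Second = static_cast<int64_t>(CountSecond);
  // Making one edge the fall-through turns the other into a taken branch.
  return {First - Second, Second - First};
}
```

With counts 100 and 500 this yields -400 for the first edge and +400 for the second, matching the example.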

(cherry picked from FBD3606591)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/ReorderAlgorithm.h
The file was modifiedbolt/ReorderAlgorithm.cpp
Commit 50e011f4e54c317b2fe3e7d9406e20347ba4a5ee by maks
CFG editing functions

Summary:
This diff adds a number of methods to BinaryFunction that can be used to edit the CFG after it is created.

The basic public functions are:
  - createBasicBlock - create a new block that is not inserted into the CFG.
  - insertBasicBlocks - insert a range of blocks (made with createBasicBlock) into the CFG.
  - updateLayout - update the CFG layout (either by inserting new blocks at a certain point or recomputing the entire layout).
  - fixFallthroughBranch - add a direct jump to the fallthrough successor for a given block.

There are a number of private helper functions used to implement the above.

This was split off the ICP diff to simplify it a bit.

(cherry picked from FBD3611313)
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryFunction.cpp
Commit 486ab273c7d003f30fe9cab8cd0ae81c7ede313f by maks
Add printing support for indirect tail calls.

Summary:
LLVM was missing assembler print string for indirect tail
calls which are synthetic instructions created by us.

(cherry picked from FBD3640197)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryContext.cpp
Commit 713e361f3616dd08ccaae53bfb1cab7a5bb66358 by maks
Fix for correct disassembling of conditional tail calls.

Summary:
BOLT attempts to convert jumps that serve as tail calls to dedicated tail call
instructions, but this is impossible when the jump is conditional because there is
no corresponding tail call instruction. This was causing the creation of a duplicate
fall-through edge for basic blocks terminated with a conditional jump serving as
a tail call when there is profile data available for the non-taken branch. In this
case, the first fall-through edge had a count taken from the profile data, while
the second has a count computed (incorrectly) by
BinaryFunction::inferFallThroughCounts.

(cherry picked from FBD3560504)
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/DataReader.cpp
The file was modifiedbolt/DataReader.h
The file was modifiedbolt/BinaryFunction.cpp
Commit 82d76ae18b084458e0a0c8992fe47320467eb1e8 by maks
Add MCInst annotation mechanism to MCInstrAnalysis class.

Summary:
Add three new MCOperand types: Annotation, LandingPad and GnuArgsSize.

Annotation is used for associating arbitrary data with MCInsts.  Clients can
construct their own annotation types (subclassed from MCAnnotation) and
associate them with instructions.  Annotations are looked up by string keys.

Annotations can be added, removed and queried using an instance of the
MCInstrAnalysis class.

The LandingPad operand is a MCSymbol, uint64_t pair used to encode exception
handling information for call instructions.

GnuArgsSize is used to annotate calls with the DW_CFA_GNU_args_size attribute.

(cherry picked from FBD3597877)
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryFunction.cpp
Commit 32739247ebd175a875f307a39504c38187c90cf9 by maks
More aggressive inlining pass

Summary:
This adds functionality for a more aggressive inlining pass, that can
inline tail calls and functions with more than one basic block.

(cherry picked from FBD3677856)
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryPasses.cpp
Commit 36df6057b01417df0842628a61bbbf09ff9bb2e6 by maks
Refactoring. Mainly NFC.

Summary:
Eliminated BinaryFunction::getName(). The function was confusing since
the name is ambiguous. Instead we have BinaryFunction::getPrintName()
used for printing and whenever a unique string identifier is needed
one can use getSymbol()->getName(). In the next diff I'll have
a map from MCSymbol to BinaryFunction in BinaryContext to facilitate
function lookup from instruction operand expressions.

There's one bug fixed where the function was called only under assert()
in ICF::foldFunction().

For output we update all symbols associated with the function. At the
moment it has no effect on the generated binary but in the future we
would like to have all symbols in the symbol table updated.

(cherry picked from FBD3704790)
The file was modifiedbolt/DebugData.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/DWARFRewriter.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/BinaryFunction.h
Commit 003d106c0b2b84b32397686456cdb58b3353845b by maks
More refactoring work.

Summary:
Avoid referring to BinaryFunction's by name.

Functions could be found by MCSymbol using
BinaryContext::getFunctionForSymbol().

(cherry picked from FBD3707685)
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/DebugData.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/RewriteInstance.h
Commit 406aa6208399c9e8eec4c1e92a4d64df58b8667a by maks
Add additional info to BOLT graphviz CFG dumps.

Summary:
Add the following info to the graphviz CFG dump:
- Edges are labeled with the jmp instruction that leads to that edge.
- Edges include the count and misprediction count.
- Nodes have (offset, BB index, BB layout index)
- Nodes optionally have tooltips which contain the code of the basic block.
  (enabled with -dot-tooltip-code)
- Added dashed edges to landing pads.

(cherry picked from FBD3646568)
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryFunction.cpp
Commit c1d1c2e7cda6fd6c0b47ebd8a2683b0b196e6813 by maks
Check if operands are immediates before trying shortening.

Summary:
Instructions in the initial instruction stream that can be shortened should
all have immediate operands.  But if a BOLT optimization pass adds
one of these instructions with a symbolic operand, the shortening operation
will assert.  This diff adds checks to make sure that the operands are
immediate.

I've also disabled shortening pass by default since it won't really be needed
until ICP is submitted.  It will still run at CFG creation time.

(cherry picked from FBD3610646)
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/RewriteInstance.cpp
Commit a10fb73ab3b939920356f879e3e5172fdf09bd01 by maks
Compute ClusterEdges only when necessary.

Summary:
We only need ClusterEdges in reordering algorithm optimized for
branches and the computation is quite resource-hungry, thus it
makes sense to only do it when needed.

Some refactoring too.

(cherry picked from FBD3721107)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/ReorderAlgorithm.cpp
The file was modifiedbolt/ReorderAlgorithm.h
Commit 42c5894fe246dd3f79ea7d799a8d4da9513b66aa by maks
Write padding for .eh_frame_hdr to a file.

Summary:
We were applying padding to the calculated address but were never
writing it to the file, triggering an assertion in cases where the
.gcc_except_table size wasn't a multiple of 4.
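
A sketch of keeping the computed address and the written bytes in sync. It is illustrative only: the output is modeled as a std::string buffer, and paddingFor and writePadding are hypothetical names, not functions from RewriteInstance.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Number of bytes needed to bring Offset up to the next multiple of Alignment.
constexpr uint64_t paddingFor(uint64_t Offset, uint64_t Alignment) {
  return (Alignment - Offset % Alignment) % Alignment;
}

// Appends the zero padding to the output and returns the aligned offset, so
// the calculated address and the bytes actually emitted cannot diverge.
uint64_t writePadding(std::string &Out, uint64_t Offset, uint64_t Alignment) {
  assert(Alignment != 0 && "alignment must be non-zero");
  const uint64_t Padding = paddingFor(Offset, Alignment);
  Out.append(Padding, '\0');
  return Offset + Padding;
}
```

For a preceding section that ends at offset 10 and a required alignment of 4, two zero bytes are written and the next section starts at offset 12.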

(cherry picked from FBD3744638)
The file was modifiedbolt/RewriteInstance.cpp
Commit 97f598fd17bc7aafd056211b288a0d7e0466478c by maks
Handling for indirect tail calls.

Summary:
Analyze indirect branches and convert them into indirect
tail calls when possible. We analyze the memory contents
when the address could be calculated statically and also
detect epilogue code.

(cherry picked from FBD3754395)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/BinaryContext.cpp
Commit 43acb6a28ab052cb3d4e8ca647dbdb3f4d883299 by maks
Emit remember_state CFI in the same code region as restore_state.

Summary:
While creating remember_state/restore_state CFI sequences, we
were always placing remember_state instruction into the first
basic block. However, when we have hot-cold splitting, the cold
part has an independent FDE entry in .eh_frame, and thus the
restore_state instruction was missing its counterpart.

The fix is to adjust the basic block that is used for placing
remember_state instruction whenever we see the hot-cold split
boundary.

(cherry picked from FBD3767102)
The file was modifiedbolt/BinaryFunction.cpp
Commit c27a6a5c63dfabe0a53e2f4b2c1f372f96330035 by maks
Add verbosity level and clean up stream usage.

Summary:
I've added a verbosity level to help keep the BOLT spewage to a minimum.
The default level is pretty terse now, level 1 is closer to the original,
I've saved level 2 for the noisiest of messages.  Error messages should
never be suppressed by the verbosity level, only warnings and info messages.

The rationale behind stream usage is as follows:
outs() for info and debugging controlled by command line flags.
errs() for errors and warnings.
dbgs() for output within DEBUG().

With the exception of a few of the level 2 messages I don't have any strong feelings about the others.

(cherry picked from FBD3814259)
The file was modifiedbolt/DebugData.cpp
The file was modifiedbolt/DWARFRewriter.cpp
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/RewriteInstance.cpp
Commit 1cf200107edd2a2d7f4775d236f03c8e1704932f by maks
Fix tail call conversion and test cases.

Summary:
A previous diff accidentally disabled tail call conversion.

Additionally some test cases relied on output of "-v=2". Fix those.

(cherry picked from FBD3823760)
The file was modifiedbolt/BinaryFunction.cpp
Commit dcaffe64d339a8436aae626a6f21d235f3f61101 by maks
Inlining fixes/enhancements

Summary:
A number of fixes/enhancements to inline-small-functions
- Fixed size estimateHotSize to use computeCodeSize instead of the original layout offsets.
- Added -print-inline option to dump CFGs for functions that have been modified by inlining.
- Added flag to force consideration of functions without any profiling info (mostly for testing)
- Updated debug line info for inlined functions.
- Ignore the number of pseudo instructions when checking for candidates of suitable size.

Misc changes
- Moved most print flags to BinaryPasses.cpp

(cherry picked from FBD3812658)
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/BinaryPasses.cpp
Commit 48b55300e0ac4b084466669c0134b17139a83bb6 by maks
BOLT: Make most command line options ZeroOrMore.

Summary:
This will make it easier to run experiments with the same baseline
BOLT binary but different command line options.

(cherry picked from FBD3831978)
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/ReorderAlgorithm.cpp
The file was modifiedbolt/BinaryFunction.cpp
Commit 17e691915bfccbce750fa92eb208dfbdd11c1c4a by maks
Make BinaryFunction::fixBranches() more flexible and support CFG updates.

Summary:
The CFG represents "the ultimate source of truth". Transformations
on functions and blocks have to update the CFG and fixBranches() would
make sure the correct branch instructions are inserted at the end of
basic blocks (or removed when necessary).

We do require a conditional branch at the end of the basic block if
the block has 2 successors as CFG currently lacks the conditional
code support (it will probably stay that way). We only use this
branch instruction for its conditional code, the destination is
determined by CFG - first successor representing true/taken branch,
while the second successor - false/fall-through branch.

When we reverse the branch condition, the CFG is updated accordingly.
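
The successor convention can be sketched like this. The types are simplified stand-ins, not BOLT's BinaryBasicBlock; CondCode, Block, and reverseBranchCondition are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

enum class CondCode { Equal, NotEqual };

constexpr CondCode invert(CondCode CC) {
  return CC == CondCode::Equal ? CondCode::NotEqual : CondCode::Equal;
}

// Simplified stand-in for a two-way basic block.
struct Block {
  CondCode CC;                 // condition carried by the terminating branch
  std::vector<int> Successors; // [0] = true/taken, [1] = false/fall-through
};

// Reversing the condition must swap the successors so that the CFG keeps
// describing the same control flow.
void reverseBranchCondition(Block &BB) {
  assert(BB.Successors.size() == 2 && "expected a two-way block");
  BB.CC = invert(BB.CC);
  std::swap(BB.Successors[0], BB.Successors[1]);
}
```

The branch instruction contributes only its condition; the destinations always come from the successor list.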

The previous version used to insert jumps after some terminating
instructions sometimes resulting in a larger code than needed. As a
result with the new version 1 extra function becomes overwritten for
HHVM binary.

With this diff we also convert conditional branches with one successor
(result of code from __builtin_unreachable()) into unconditional
jumps.

(cherry picked from FBD3802062)
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
Commit 6bef336cc2ebfefb3b724dedf9dc7fd23c147b7e by maks
Add dyno stats to BOLT.

Summary:
Add "-dyno-stats" option that prints instruction stats based on
the execution profile similar to below:

BOLT-INFO: program-wide dynostats after optimizations:
  executed forward branches : 109706407 (+8.1%)
  taken forward branches : 13769074 (-55.5%)
  executed backward branches : 24517582 (-25.0%)
  taken backward branches : 15330256 (-27.2%)
  executed unconditional branches : 6009826 (-35.5%)
  function calls : 17192114 (+0.0%)
  executed instructions : 837733057 (-0.4%)
  total branches : 140233815 (-2.3%)
  taken branches : 35109156 (-42.8%)

Also fixed pseudo instruction discrepancies and added assertions
for BinaryBasicBlock::getNumPseudos() to make sure the number is
synchronized with real number of pseudo instructions.

(cherry picked from FBD3826995)
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/RewriteInstance.h
Commit c4c518ee9d9bb45daaadd0b6b0094548373f6fde by maks
Rewrite SCTC pass to do UCE and make it the last optimization pass.

Summary:
For now we make SCTC a special pass that runs at the end of all
optimizations and transformations right after fixupBranches().

Since it's the last pass, it has to do its own UCE.

(cherry picked from FBD3838051)
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/BinaryFunction.h
Commit 71be5679694c9b12928d4c6f63adbea770cdaba5 by maks
BOLT: Add per pass dyno stats + factor out post pass printing.

Summary:
I've added dyno stats printing per pass so we can see the results
of each optimization pass on the stats.  I've also factored out the
post pass function printing code since it was pretty much the same
after each pass.

(cherry picked from FBD3843587)
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryPassManager.h
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/BinaryPasses.cpp
Commit 617c6a13b793917b249f72e37ac67ffad21ba08b by maks
Use BB.getNumNonPseudos() in more places.

Summary:
Use BB.getNumNonPseudos() in more places.

Fix analyze_potential script to pass the new parameter.

(cherry picked from FBD3844416)
The file was modifiedbolt/ReorderAlgorithm.cpp
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryFunction.cpp
Commit 861d5a1586a4406f48b96f272487aead2fdb8dc0 by maks
BOLT: Remove double jumps peephole.

Summary:
Replace jumps to other unconditional jumps with the final
destination, e.g.

  B0: ...
      jmp B1  (or jcc B1)

  B1: jmp B2

  ->

  B0: ...
      jmp B2  (or jcc B2)

This peephole removes 8928 double jumps from a test binary.

Note: after filtering out double jumps found in EH code and infinite
loops, the number of double jumps patched is 49 (24 for a clang
compiled test).  The 24 in the clang build are all from external
libraries which have probably been compiled with gcc.  This peephole
is still useful for cleaning up after ICP though.
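
The retargeting step can be sketched as below. This is an assumption-laden illustration: blocks are plain integer indices, and JumpOnlyTarget, NoTarget, and finalDestination are hypothetical names. Leaving chains that form an infinite loop untouched is this sketch's own choice for handling the infinite-loop case mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int NoTarget = -1;

// JumpOnlyTarget[I] is the destination of block I when block I consists of
// nothing but an unconditional jump, and NoTarget otherwise. Returns the
// block a branch to Target should be redirected to.
int finalDestination(const std::vector<int> &JumpOnlyTarget, int Target) {
  assert(Target >= 0 &&
         static_cast<std::size_t>(Target) < JumpOnlyTarget.size());
  const int Original = Target;
  // A chain longer than the number of blocks must contain a cycle.
  for (std::size_t Steps = 0; Steps < JumpOnlyTarget.size(); ++Steps) {
    const int Next = JumpOnlyTarget[Target];
    if (Next == NoTarget)
      return Target;
    Target = Next;
  }
  return Original; // infinite loop of jumps: leave the branch alone
}
```

With blocks B0, B1, B2 numbered 0, 1, 2 and only B1 being a bare jump to B2, a branch to B1 is redirected to B2.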

(cherry picked from FBD3815420)
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryBasicBlock.cpp
Commit 52bfc3f92f42b3b1f91139565315a230396cd2fd by maks
Fix switch table detection. Disassemble all instructions in non-simple functions.

Summary:
Switch table can contain __builtin_unreachable(). As a result,
a compiler may place an entry into a jump table that contains
an address immediately past the last instruction in the function.
Sometimes it may coincide with a start of the next function in
the binary. Thus when we check for switch tables in such cases
we have to check more than a single entry until we see either
an address inside the containing function or some address outside
different from the address past the last instruction.
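
The scan described above might look roughly like this. It is a sketch under assumptions: the candidate entries are already decoded into addresses, and TableCheck and checkJumpTable are hypothetical names rather than the actual BinaryFunction code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class TableCheck { LooksLikeJumpTable, NotJumpTable, Inconclusive };

// Scans candidate entries until one settles the question. Entries equal to
// FuncEnd (the address just past the last instruction) may come from
// __builtin_unreachable() and are skipped rather than treated as evidence.
TableCheck checkJumpTable(const std::vector<uint64_t> &Entries,
                          uint64_t FuncStart, uint64_t FuncEnd) {
  assert(FuncStart < FuncEnd && "empty function range");
  for (uint64_t Entry : Entries) {
    if (Entry >= FuncStart && Entry < FuncEnd)
      return TableCheck::LooksLikeJumpTable; // points into this function
    if (Entry != FuncEnd)
      return TableCheck::NotJumpTable; // points somewhere unrelated
  }
  return TableCheck::Inconclusive; // only past-the-end entries were seen
}
```

A table whose first entry equals the function's end address is therefore not rejected outright; the decision is deferred to the following entries.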

Additionally, don't stop disassembly after discovering that the
function was not simple. We need to detect all outside
references whenever possible.

(cherry picked from FBD3850825)
The file was modifiedbolt/BinaryFunction.cpp
Commit b0f4031db33409d3c9c60bc7591e5e78976ad6ca by maks
Add cluster randomization layout algorithm.

Summary:
Add "-reorder-blocks=cluster-shuffle" for performance experiments.
Use "-bolt-seed=<N>" to set a randomization seed.

(cherry picked from FBD3851035)
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/ReorderAlgorithm.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/ReorderAlgorithm.h
Commit 7483cd0fa694aec103b16029b07353a30548a919 by maks
BOLT: Clean up interface between BinaryFunction and BinaryBasicBlock.

Summary:
This is just a bit of refactoring to make sure that BinaryFunction goes
through methods to get at the state in BinaryBasicBlock.  I did this so
that changing the way Index/LayoutIndex/Valid works will be easier.

(cherry picked from FBD3860899)
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryFunction.cpp
Commit 2f3a85977298a5dcd610bfa074940762e148ec1e by maks
Add experimental jump table support.

Summary:
Option "-jump-tables=1" enables experimental support for jump tables.

The option hasn't been tested with optimizations other than block
re-ordering.

Only non-PIC jump tables are supported at the moment.

(cherry picked from FBD3867849)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/ReorderAlgorithm.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryContext.cpp
Commit 8dbf0e2b3d7be8a5b97173c9c834c94b20e7fec0 by maks
Add dyno stats for jump tables.

Summary: Add dyno stats for jump tables.

(cherry picked from FBD3871035)
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryFunction.cpp
Commit c4e36c1dd6bdd6172183d458aef0d65fbe20e2f5 by maks
Fix issue with zero-size duplicate function symbols.

Summary:
While working on PLT dyno stats I've noticed that we were missing
BinaryFunctions for some symbols that were not PLT. Upon closer inspection
it turned out that those symbols were marked as zero-sized functions in
the symbol table, but they had duplicates with non-zero size. Since the
zero-size symbols were preceding other duplicates, we were not creating
BinaryFunction for them and they were not added as duplicates.

The 2 most prominent functions that were missing for a test were free() and
malloc().  There's not much to optimize in these functions, but they were
contributing quite significantly to dyno stats.

As a result dyno stats for this test needed an adjustment.

Also several assembly functions (e.g. _init()) had zero size, and now we
set the size to the max size and start processing those. It's good for
coverage but will not affect the performance.

(cherry picked from FBD3874622)
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryFunction.cpp
Commit 2c9bf9afd65701ce40ddb9802efc099ed16892ea by maks
Add PLT dyno stats.

Summary: Get PLT call stats.

(cherry picked from FBD3874799)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
Commit 62bff426c3b90b5c471d3d2b7e0889ea02dc9a1c by maks
Do not collect dyno stats on functions with stale profile.

Summary:
Dyno stats collected on functions with invalid profile may appear
completely bogus. Skip them.

(cherry picked from FBD3879371)
The file was modifiedbolt/BinaryFunction.cpp
Commit 510f227cbd3d4e699c535cc743036b6420ae3ee1 by maks
BOLT: Add feature to sort functions by dyno stats.

Summary:
Add -print-sorted-by and -print-sorted-by-order command line options.
The first option takes a list of dyno stats keys used to sort functions
that are printed at the end of all optimization passes.  Only the top
100 functions are printed.  The -print-sorted-by-order option can be
either ascending or descending (descending is the default).
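
A sketch of the selection logic, for illustration only. It assumes the key list is compared lexicographically, which the summary does not state; FunctionStats and topFunctions are hypothetical names, and stats are modeled as a plain vector indexed by key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in: one function's dyno stats, indexed by stat key.
struct FunctionStats {
  std::string Name;
  std::vector<uint64_t> Stats;
};

// Sorts by the given stat keys (descending unless Ascending is set) and
// keeps at most Limit functions.
std::vector<FunctionStats> topFunctions(std::vector<FunctionStats> Funcs,
                                        const std::vector<std::size_t> &Keys,
                                        bool Ascending = false,
                                        std::size_t Limit = 100) {
  auto KeyOf = [&Keys](const FunctionStats &F) {
    std::vector<uint64_t> K;
    for (std::size_t Key : Keys) {
      assert(Key < F.Stats.size() && "unknown stat key");
      K.push_back(F.Stats[Key]);
    }
    return K; // vectors compare lexicographically
  };
  std::stable_sort(Funcs.begin(), Funcs.end(),
                   [&](const FunctionStats &A, const FunctionStats &B) {
                     return Ascending ? KeyOf(A) < KeyOf(B)
                                      : KeyOf(B) < KeyOf(A);
                   });
  if (Funcs.size() > Limit)
    Funcs.erase(Funcs.begin() + static_cast<std::ptrdiff_t>(Limit),
                Funcs.end());
  return Funcs;
}
```

Rebuilding the key vector on every comparison keeps the sketch short; a real implementation would likely compute the keys once per function.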

(cherry picked from FBD3898818)
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryFunction.cpp
Commit 2f1341b51da00ba6a4a15cde00350744fc70f2cd by maks
BOLT: Refactoring BinaryFunction interface.

Summary:
Get rid of all uses of getIndex/getLayoutIndex/getOffset outside of BinaryFunction.
Also made some other offset related methods private.

(cherry picked from FBD3861968)
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/ReorderAlgorithm.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/DebugData.cpp
Commit ecc4b9e713508b5c6cb41d12467b122ee51497ee by maks
BOLT: Add ud2 after indirect tailcalls.

Summary:
Insert ud2 instructions after indirect tailcalls to prevent the CPU from
decoding instructions following the callsite.

A simple counter in the peephole pass shows 3260 tail call traps inserted.

(cherry picked from FBD3859737)
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/BinaryPasses.cpp
Commit 4464861a02e2a2815530d3ffd6edd68684d9d2de by maks
Support for splitting jump tables.

Summary:
Add level for "-jump-tables=<n>" option:
  1 - all jump tables are output in the same section (default).
  2 - basic splitting, if the table is used it is output to hot section
      otherwise to cold one.
  3 - aggressively split compound jump tables and collect profile for
      all entries.

Option "-print-jump-tables" outputs all jump tables for debugging
and/or analyzing purposes. Use with "-jump-tables=3" to get profile
values for every entry in a jump table.

(cherry picked from FBD3912119)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryContext.cpp
Commit 4a0c494bc10eec3637fb1179aeea85137b1b1271 by maks
BOLT: Remove restrictions on unreachable code elimination

Summary:
Allow UCE when blocks have EH info.  Since UCE may remove blocks
that are referenced from debugging info data structures, we don't
actually delete them.  We just mark them with an "invalid" index
and store them in a different vector to be cleaned up later once
the BinaryFunction is destroyed.  The debugging code just skips
any BBs that have an invalid index.
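
The deferred-deletion idea can be sketched roughly as below. Block, Func, InvalidIndex and eliminateUnreachable are hypothetical names for this illustration, and the model assumes a single entry block at position 0.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <memory>
#include <utility>
#include <vector>

constexpr std::size_t InvalidIndex = std::numeric_limits<std::size_t>::max();

struct Block {
  std::size_t Index = InvalidIndex;
  std::vector<Block *> Successors;
};

struct Func {
  std::vector<std::unique_ptr<Block>> Blocks;        // live; Blocks[0] is entry
  std::vector<std::unique_ptr<Block>> DeletedBlocks; // kept until destruction

  Block *addBlock() {
    Blocks.push_back(std::make_unique<Block>());
    Blocks.back()->Index = Blocks.size() - 1;
    return Blocks.back().get();
  }

  // Moves blocks unreachable from the entry into DeletedBlocks, marks them
  // with InvalidIndex, renumbers the survivors, and returns the number removed.
  std::size_t eliminateUnreachable() {
    if (Blocks.empty())
      return 0;
    std::vector<bool> Reachable(Blocks.size(), false);
    std::vector<Block *> Worklist;
    Worklist.push_back(Blocks.front().get());
    Reachable[0] = true;
    while (!Worklist.empty()) {
      Block *BB = Worklist.back();
      Worklist.pop_back();
      for (Block *Succ : BB->Successors) {
        if (Succ->Index == InvalidIndex || Reachable[Succ->Index])
          continue;
        Reachable[Succ->Index] = true;
        Worklist.push_back(Succ);
      }
    }
    std::vector<std::unique_ptr<Block>> Live;
    std::size_t Removed = 0;
    for (std::size_t I = 0; I < Blocks.size(); ++I) {
      if (Reachable[I]) {
        Blocks[I]->Index = Live.size();
        Live.push_back(std::move(Blocks[I]));
      } else {
        Blocks[I]->Index = InvalidIndex; // pointers to it stay valid
        DeletedBlocks.push_back(std::move(Blocks[I]));
        ++Removed;
      }
    }
    Blocks = std::move(Live);
    return Removed;
  }
};
```

Because removed blocks are only moved, not freed, any outside structure still holding a Block pointer can test Index against InvalidIndex and skip it.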

Eliminating blocks may also expose useless jmp instructions, i.e.
a jmp around a dead block could just be a fallthrough.  I've added
a new routine to cleanup these jmps.  Although, @maks is working on
changing fixBranches() so that it can be used instead.

(cherry picked from FBD3793259)
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/DebugData.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryPassManager.h
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/BinaryBasicBlock.cpp
Commit 9cf5d74ffb983b250be5cccc836041d5681ddea3 by maks
Support for PIC-style jump tables.

Summary:
Added support for jump tables in code compiled with "-fpic".
Code pattern generated for position-independent jump tables
is quite different, as is the format of the tables.
More details in comments.

Coverage increased slightly for a test, mostly due to the code
coming from external lib that was compiled with "-fpic".

(cherry picked from FBD3940771)
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryFunction.cpp
Commit e241e9c156b0a9036ee84ce2bebeb6627806acea by maks
New function discovery and support for multiple entries.

Summary:
Modified function discovery process to tolerate more functions and
symbols coming from assembly. The processing order now matches
the memory order of the functions (input symbol table is unsorted).

Added basic support for functions with multiple entries. When
a function references its internal address other than with
a branch instruction, that address could potentially escape.
We mark such addresses as entry points and make sure they
are treated as roots by unreachable code elimination.

Without relocations we have to mark multiple-entry functions
as non-simple.

(cherry picked from FBD3950243)
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/RewriteInstance.h
Commit 0eb2559feeff1060920d433ac41affa976c2e37e by maks
Fix EH for cold fragments that we fail to write.

Summary:
When we fail to write functions that are too big, we have to
effectively cancel their effect on exception handling by ignoring
their FDE entries in .eh_frame while writing .eh_frame_hdr.

This can happen to functions that we split too. In such cases
the cold part has its own FDE and we have to ignore that one too.
This doesn't happen very often - I've only seen one case on
hhvm binary, however it is a potential issue. The fix is to
add the cold part address to the list of failed-to-write
addresses.

(cherry picked from FBD3987984)
The file was modifiedbolt/RewriteInstance.cpp
Commit 99dce7d05e4ee7f62e1a864de6e0d7a3643624fd by maks
Disable processing of functions with EVEX-encoded instructions (AVX-512).

Summary:
AVX-512 disassembler support in LLVM is not quite ready yet.
Before we feel more comfortable about it we disable processing
of all functions that use any EVEX-encoded instructions.

(cherry picked from FBD4028706)
The file was modifiedbolt/BinaryFunction.cpp
Commit bc8cb088c0ec5e502f850a4420cbb99613f3e530 by maks
Support DWARF expressions in CFI instructions

Summary:
Modify the MC layer (MCDwarf.h|cpp) to understand CFI
instructions dealing with DWARF expressions. Add code to emit DWARF
expressions in MCDwarf. Change llvm-bolt to pass these CFI instructions
to streamer instead of bailing on them. Change -dump-eh-frame option in
llvm-bolt to dump the EH frame of the rewritten binary in addition to
the one in the original binary, allowing us to properly test this patch.

(cherry picked from FBD4194452)
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Exceptions.cpp
Commit 355dbd769e5ddab5e72ebc00e38567858309d730 by maks
Fix DW_CFA_def_cfa CFI duping in output binary

Summary:
CFI instructions may live in CIEs or FDEs. CIEs hold common
instructions used across many FDEs. When replaying CFIs to the output
binary, llvm-bolt needs to replay both instructions from CIE and the
corresponding FDE for the function. However, some instructions need not
be replayed because MCStreamer/MCDwarf and friends will write them
by default in the output CIE. This patch fixes the code that tried to
recognize one of these default instructions but was failing, resulting
in an extra CFI instruction in each FDE we outputted. With this patch,
the output binary should be a bit smaller.

(cherry picked from FBD4194753)
The file was modifiedbolt/RewriteInstance.cpp
Commit 055dfe48e712baae4805d36f77f81c8fa40e8e5f by maks
Another EH fix for cold fragments of functions that we fail to write.

Summary:
In a prev diff I disabled inclusion of FDEs for cold fragments that
we fail to write. The side effect of it was that we failed to
write FDE for the next function with a cold fragment since it
had the same assigned address that we had put in FailedAddresses.

The correct fix is to assign zero address to failed cold fragments
and ignore them when we write .eh_frame_hdr.
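
A rough sketch of the "zero address means skip" rule when building the lookup table is shown below. FDEInfo and buildHdrTable are hypothetical names; the only assumptions taken from outside the commit message are that each table entry pairs an initial location with an FDE address and that the table is sorted by initial location.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record for one FDE found in .eh_frame.
struct FDEInfo {
  uint64_t InitialLocation; // start address of the covered code
  uint64_t FDEAddress;      // address of the FDE itself
};

// Drops FDEs whose code was never written (address zero) and returns the
// remaining entries sorted by initial location.
inline std::vector<FDEInfo> buildHdrTable(std::vector<FDEInfo> FDEs) {
  FDEs.erase(std::remove_if(FDEs.begin(), FDEs.end(),
                            [](const FDEInfo &F) {
                              return F.InitialLocation == 0;
                            }),
             FDEs.end());
  std::sort(FDEs.begin(), FDEs.end(),
            [](const FDEInfo &A, const FDEInfo &B) {
              return A.InitialLocation < B.InitialLocation;
            });
  return FDEs;
}
```

Unlike a shared list of failed addresses, the zero marker cannot collide with the address assigned to the next cold fragment.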

(cherry picked from FBD4156740)
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Exceptions.cpp
Commit 809c28f585bb9e087ea2c449d2da0c9d829d25a2 by maks
Generate .eh_frame_hdr based on contents of .eh_frame's.

Summary:
We used to patch an existing .eh_frame_hdr and append contents
for split functions at the end. However, this approach does not
work in relocation mode since function addresses change and split
functions will not necessarily be at the end.

Instead of patching and appending we generate the new .eh_frame_hdr
based on contents of old and new .eh_frame sections.

(cherry picked from FBD4180756)
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Exceptions.h
The file was modifiedbolt/RewriteInstance.h
Commit a7fb610eba625df8797f98883ca8691b88c1090a by maks
Relocate old .eh_frame section next to the new one.

Summary:
In order to improve gdb experience with BOLT we have to make
sure the output file has a single .eh_frame section. Otherwise
gdb will use either old or new section for unwinding purposes.

This diff relocates the original .eh_frame section next to
the new one generated by LLVM. Later we merge two sections
into one and make sure only the newly created section has
.eh_frame name.

(cherry picked from FBD4203943)
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Exceptions.h
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/BinaryFunction.h
Commit 8609ad51e504028fa23270caf59ccb966020a44b by maks
Detect default CFI frame instructions for the target

Summary:
Make BOLT resilient to changes in the LLVM's X86 target library
by not hardwiring the list of default CIE instructions, but detecting it
at run time.

(cherry picked from FBD4200982)
The file was modifiedbolt/RewriteInstance.cpp
Commit ac2621fbf456c40a3e2b9d9b52234307cef37634 by maks
Add stats for "-optimize-bodyless-functions".

Summary: Print the number of calls eliminated.

(cherry picked from FBD4010698)
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/BinaryPasses.cpp
Commit 7115706d02175fc62384d9ae173c548e0e71bd20 by maks
Fix clang warning about switch covering all enums

Summary:
This is part of a series of clean-up patches to make bolt
cleanly compile with clang 4.0. This patch fixes the following warning:
default label in switch which covers all enumeration values

(cherry picked from FBD4242168)
The file was modifiedbolt/BinaryFunction.h
Commit b21bc02ac4e5dfcc30beadadacc3802b6ac53b81 by maks
Remove pessimizing std::move

Summary:
This is part of a series of clean-up patches to make bolt
cleanly compile with clang 4.0. This patch fixes the following warning:
moving a temporary object prevents copy elision
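
For readers unfamiliar with this warning (clang reports it under -Wpessimizing-move), here is a small generic illustration. The functions are made up for this example and are not taken from BinaryPasses.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Returning a temporary directly lets the compiler elide the copy/move.
// Writing `return std::move(std::string(N, ' '));` instead would force a
// move construction and trigger the warning quoted above.
inline std::string makePadding(std::size_t N) {
  return std::string(N, ' ');
}

// The same applies to a named local: plain `return Names;` is enough,
// since the compiler may elide the copy or fall back to a move on its own.
inline std::vector<std::string> makeNames() {
  std::vector<std::string> Names;
  Names.push_back("foo");
  Names.push_back("bar");
  return Names;
}
```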

(cherry picked from FBD4242236)
The file was modifiedbolt/BinaryPasses.cpp
Commit 5cc9c5841064b1ffa13efbd63d39c1edf9301911 by maks
Avoid const_iterator on std::vector::emplace

Summary:
This is part of a series of clean-up patches to make bolt
cleanly compile with clang 4.0. This patch fixes an error where clang
will fail to compile because it does not support passing a
const_iterator to std::vector<T>::emplace(Iter, ...).
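
One common way to sidestep this kind of problem is to convert the const_iterator into a mutable iterator before calling emplace. The helpers below (toMutable, emplaceAt) are hypothetical and only illustrate the technique; the commit message does not say which approach the patch actually took.

```cpp
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Converts a const_iterator into a mutable iterator of the same vector.
template <typename T>
typename std::vector<T>::iterator
toMutable(std::vector<T> &V, typename std::vector<T>::const_iterator Pos) {
  return V.begin() + std::distance(V.cbegin(), Pos);
}

// emplace() wrapper that accepts a const_iterator on any standard library.
template <typename T, typename... Args>
typename std::vector<T>::iterator
emplaceAt(std::vector<T> &V, typename std::vector<T>::const_iterator Pos,
          Args &&...As) {
  return V.emplace(toMutable(V, Pos), std::forward<Args>(As)...);
}
```

For example, emplaceAt(V, V.cbegin() + 1, 2) inserts 2 before the second element and returns an iterator to it.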

(cherry picked from FBD4242546)
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryFunction.cpp
Commit a331fa396bc4a26067320f800f81a20abd0d4612 by maks
Fix memory leak in DWARFRewriter

Summary:
Clang's Address Sanitizer caught this leak where MCAsmBackend
and MCObjectWriter instances were being created but not freed. Fix this.

(cherry picked from FBD4249941)
The file was modifiedbolt/DWARFRewriter.cpp
Commit 5c0e4b6a574198c9d02beed282adc68f47f24850 by maks
Fix undefined behavior in DebugInfo

Summary:
The CFI instructions parser in libDebugInfo was relying on
undefined behavior to parse operands by assuming the order function
parameters are evaluated in a function call site is defined (it is
not). This patch fixes this and makes our clang and gcc tests agree.
It also fixes wrong LIT tests in our codebase with respect to the
order of DW_CFA_def_cfa operands.
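
The class of bug described here can be illustrated with a toy operand reader. OperandReader, makeDefCfa and parseDefCfa are hypothetical names; the sketch only assumes that DW_CFA_def_cfa carries a register operand followed by an offset operand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical cursor over already-decoded operand values.
struct OperandReader {
  std::vector<uint64_t> Values;
  std::size_t Pos = 0;
  uint64_t next() { return Values[Pos++]; }
};

inline std::pair<uint64_t, uint64_t> makeDefCfa(uint64_t Reg, uint64_t Offset) {
  return std::make_pair(Reg, Offset);
}

inline std::pair<uint64_t, uint64_t> parseDefCfa(OperandReader &R) {
  // Fragile form: makeDefCfa(R.next(), R.next());
  // The language does not fix which argument is evaluated first, so
  // different compilers may swap the register and the offset.
  const uint64_t Reg = R.next();    // sequenced first
  const uint64_t Offset = R.next(); // sequenced second
  return makeDefCfa(Reg, Offset);
}
```

Reading each operand into a named local makes the order explicit, so every compiler produces the same result.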

(cherry picked from FBD4255227)
The file was modifiedbolt/Exceptions.cpp
Commit 3888c5604f73d2daafb28205c2e8b460372184ac by maks
Remove unused private var in CFIReaderWriter (NFC)

Summary: This member variable is dead.

(cherry picked from FBD4255342)
The file was modifiedbolt/Exceptions.h
Commit c570038d319a0ff30376b56c31ae8614db2a1978 by maks
Add option to time passes

Summary:
As we begin to work on optimization passes for bolt, it is important to
keep track of the time spent in each of these to measure their
contribution to the time bolt takes to finish rewriting a program.

(cherry picked from FBD4301136)
The file was modifiedbolt/BinaryPassManager.h
The file was modifiedbolt/BinaryPassManager.cpp
Commit 06caefdb1d1b2cc520ce6b1bbdc369bbd24cfd55 by maks
Fix typo in time passes

Summary:
Previously NamedRegionTimer's constructor was being called
with no local variable associated with it owing to a typo. We need a
local variable to keep track of the time spent in the scope. At the
end of the scope, the destructor will be called and then the timer will
stop.
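
The difference between an unnamed temporary and a named local can be shown with a small stand-in guard. ScopeGuard below is hypothetical and merely logs its construction and destruction; it is not llvm::NamedRegionTimer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical RAII guard: records "start" on construction, "stop" on
// destruction, standing in for a scoped timer.
struct ScopeGuard {
  std::vector<std::string> &Log;
  explicit ScopeGuard(std::vector<std::string> &L) : Log(L) {
    Log.push_back("start");
  }
  ~ScopeGuard() { Log.push_back("stop"); }
};

// Bug pattern: the guard is an unnamed temporary, so it is destroyed at the
// end of its own statement and the work is not covered.
inline std::vector<std::string> withTemporary() {
  std::vector<std::string> Log;
  {
    ScopeGuard{Log};
    Log.push_back("work");
  }
  return Log; // start, stop, work
}

// Fixed pattern: a named local lives until the end of the scope.
inline std::vector<std::string> withNamedLocal() {
  std::vector<std::string> Log;
  {
    ScopeGuard Timer{Log};
    Log.push_back("work");
  }
  return Log; // start, work, stop
}
```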

(cherry picked from FBD4301844)
The file was modifiedbolt/BinaryPassManager.cpp
Commit 3a3dfc3dc25c36700f2367821d0f5ba360cad5d9 by maks
BOLT: Use profiling info to control branch simplification optimization.

Summary:
An optimization to simplify conditional tail calls by removing unnecessary branches.  It adds the following two command line options:

  -simplify-conditional-tail-calls  - simplify conditional tail calls by removing unnecessary jumps
  -sctc-mode                        - mode for simplify conditional tail calls
    =always                         -   always perform sctc
    =preserve                       -   only perform sctc when branch direction is preserved
    =heuristic                      -   use branch prediction data to control sctc

This optimization considers both of the following cases:

  foo: ...
       jcc L1   original
       ...
  L1:  jmp bar  # TAILJMP

->

  foo: ...
       jcc bar  iff jcc L1 is expected
       ...

  L1 is unreachable

OR

  foo: ...
       jcc  L2
  L1:  jmp  dest  # TAILJMP
  L2:  ...

->

  foo: jncc dest  # TAILJMP
  L2:  ...

  L1 is unreachable

For this particular case, the first basic block ends with a conditional branch and has two successors, one fall-through and one for when the condition is true.  The target of the conditional is a basic block with a single unconditional branch (i.e. tail call) to another function.  We don't care about the contents of the fall-through block.

(cherry picked from FBD3719617)
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryFunction.cpp
Commit a75bbfc6401aa209b2ad07aa67716aab32e16b76 by maks
Add a frame optimization pass

Summary:
This is a first attempt to perform data flow analyses on bolt
and try to rebuild the stack frame for functions. The goal of the frame
optimization pass is to detect instructions that are accessing stack and,
if loading values, evaluate whether this load is redundant and we can
substitute the memory operation for a register load or immediate load.
To find opportunities, this pass also builds a map of clobbered registers
by function, so we use this in our analysis at call sites. If a call site
is found out to not clobber a caller-saved register but the caller is
spilling it anyway to the stack (to comply with the ABI), we should
detect these cases and remove this unnecessary move.

(cherry picked from FBD4337238)
The file was addedbolt/FrameOptimizerPass.cpp
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/CMakeLists.txt
The file was modifiedbolt/BinaryPasses.cpp
The file was addedbolt/FrameOptimizerPass.h
Commit 55fc5417f854dbc704254fe5de714b675553ab06 by maks
Relocations support for BOLT.

Summary: Read relocation from linker and relocate all functions.

(cherry picked from FBD4223901)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Exceptions.h
Commit bc8a456309e62efae9afedc8340f97a510fa4ad9 by maks
ICF improvements.

Summary:
Re-worked the way ICF operates. The pass now checks for more than just
call instructions, but also for all references including function
pointers. Jump tables are handled too.

(cherry picked from FBD4372491)
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/ReorderAlgorithm.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/BinaryBasicBlock.h
Commit 08949053730dc619e71885034b9295e216889900 by maks
[ICF] Don't re-fold functions in non-relocation mode.

Summary:
In non-relocation mode, when we run ICF the second time,
we fold the same functions again since they were not
removed from the function set. This diff marks them as
folded and ignores them during ICF optimization. Note
that we still want to optimize such functions since they
are potentially called from the code not covered by BOLT
in non-relocation mode.

Folded functions are also excluded from dyno stats with
this diff.

Also print the number of times folded functions were called.
When 2 functions -  f1() and f2() are folded, that number
would be min(call_frequency(f1), call_frequency(f2)).

(cherry picked from FBD4399993)
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryPasses.cpp
Commit 19859377f8c435a6f8e80be14385ea7c2a01cffd by maks
[BOLT] Fix debug info update for zero-length ranges.

Summary:
Due to a clowntown on my part we were generating wrong ranges
when an empty range was seen on input. We were basically expanding
the range to include all basic blocks following such range and setting
wrong sizes at the same time.

Add "-dump-cu" option to llvm-dwarfdump that allows to look at debug
info of a single compile unit only. Saves time if we are only interested
in a subset of information.

(cherry picked from FBD4430989)
The file was modifiedbolt/DebugData.h
The file was modifiedbolt/DebugData.cpp
Commit 503c741d430a3b7e71f87903d2845af8a15981c0 by maks
[BOLT] Report stale functions' percentage wrt all profiled functions.

Summary:
Report stale functions percentage with respect to all profiled
functions instead of all simple functions in the binary.
The new reporting format should make it more apparent if the
profile is out-of-date. Compare:

  BOLT-INFO: 341 (16.7% of all profiled) functions have invalid (possibly stale) profile.

vs old:

  BOLT-INFO: 341 (0.3%)  functions have invalid (possibly stale) profile.

(cherry picked from FBD4451746)
The file was modifiedbolt/RewriteInstance.cpp
Commit 6dfd16cb4c8b56df0ca2683a920086d89c16292d by maks
Cover RSP-indexed accesses in frame optimization

Summary:
Add a new dataflow analysis to recover the value of RSP at a
given point of the program. This value is expressed as an offset from
the CFA. Use this information to detect redundant load in memory
accesses performed via RSP as well, not only RBP as done previously.
Bail when RSP value (as an offset of the CFA) can't be reliably
determined with a simple dataflow analysis.
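
To make the idea concrete, here is a heavily simplified, straight-line sketch of tracking RSP as an offset from the CFA. The real analysis is a dataflow over the CFG; OpKind, Op and trackRsp are hypothetical, and the sketch assumes the usual x86-64 entry state where RSP equals CFA - 8 (the return address has just been pushed).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, reduced instruction model.
enum class OpKind { Push, Pop, SubRsp, AddRsp, Other, ClobbersRsp };

struct Op {
  OpKind Kind;
  int64_t Imm; // used by SubRsp/AddRsp only
};

// Fills Offsets with (RSP - CFA) after each instruction. Returns false,
// i.e. bails, as soon as RSP changes in a way this model cannot follow.
inline bool trackRsp(const std::vector<Op> &Ops,
                     std::vector<int64_t> &Offsets) {
  int64_t Rsp = -8; // assumption: x86-64 function entry
  Offsets.clear();
  for (const Op &O : Ops) {
    switch (O.Kind) {
    case OpKind::Push:
      Rsp -= 8;
      break;
    case OpKind::Pop:
      Rsp += 8;
      break;
    case OpKind::SubRsp:
      Rsp -= O.Imm;
      break;
    case OpKind::AddRsp:
      Rsp += O.Imm;
      break;
    case OpKind::Other:
      break;
    case OpKind::ClobbersRsp:
      return false;
    }
    Offsets.push_back(Rsp);
  }
  return true;
}
```

With the offset known at each instruction, an RSP-relative access can be mapped to the same CFA-relative slot as an RBP-relative one.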

(cherry picked from FBD4372261)
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/FrameOptimizerPass.h
The file was modifiedbolt/FrameOptimizerPass.cpp
Commit 6ff1795d969a918bd1b57641be13867ba3aae0b9 by maks
[BOLT] Support overwriting jump tables in-place.

Summary:
Add an option to overwrite jump tables without moving and make it a
default:

  -jump-tables   - jump tables support (default=basic)
    =none        -   do not optimize functions with jump tables
    =basic       -   optimize functions with jump tables
    =move        -   move jump tables to a separate section
    =split       -   split jump tables section into hot and cold based on
                     function execution frequency
    =aggressive  -   aggressively split jump tables section based on usage of
                     the tables

(cherry picked from FBD4448499)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/DWARFRewriter.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/BinaryFunction.h
Commit d74997c3ccf333161c4d6712bd1efde942ff61b6 by maks
Indirect call promotion optimization.

Summary:
Perform indirect call promotion optimization in BOLT.

The code scans the instructions during CFG creation for all
indirect calls.  Right now indirect tail calls are not handled
since the functions are marked not simple.  The offsets of the
indirect calls are stored for later use by the ICP pass.

The indirect call promotion pass visits each indirect call and
examines the BranchData for each.  If the most frequent targets
from that callsite exceed the specified threshold (default 90%),
the call is promoted.  Otherwise, it is ignored.  By default,
only one target is considered at each callsite.

When a candidate callsite is processed, we modify the callsite
to test for the most common call targets before calling through
the original generic call mechanism.

The CFG and layout are modified by ICP.

A few new command line options have been added:
-indirect-call-promotion
-indirect-call-promotion-threshold=<percentage>
-indirect-call-promotion-topn=<int>

The threshold is the minimum frequency of a call target needed
before ICP is triggered.

The topn option controls the number of targets to consider for
each callsite, e.g. ICP is triggered if topn=2 and the total
frequency of the top two call targets exceeds the threshold.

Example of ICP:

C++ code:

  int B_count = 0;
  int C_count = 0;

  struct A { virtual void foo() = 0; };
  struct B : public A { virtual void foo() { ++B_count; }; };
  struct C : public A { virtual void foo() { ++C_count; }; };

  A* a = ...
  a->foo();
  ...

original:
  400863: 49 8b 07             mov    (%r15),%rax
  400866: 4c 89 ff             mov    %r15,%rdi
  400869: ff 10                callq  *(%rax)
  40086b: 41 83 e6 01          and    $0x1,%r14d
  40086f: 4d 89 e6             mov    %r12,%r14
  400872: 4c 0f 44 f5          cmove  %rbp,%r14
  400876: 4c 89 f7             mov    %r14,%rdi
  ...

after ICP:
  40085e: 49 8b 07             mov    (%r15),%rax
  400861: 4c 89 ff             mov    %r15,%rdi
  400864: 49 ba e0 0b 40 00 00 movabs $0x400be0,%r10
  40086b: 00 00 00
  40086e: 4c 3b 10             cmp    (%rax),%r10
  400871: 75 29                jne    40089c <main+0x9c>
  400873: 41 ff d2             callq  *%r10
  400876: 41 83 e6 01          and    $0x1,%r14d
  40087a: 4d 89 e6             mov    %r12,%r14
  40087d: 4c 0f 44 f5          cmove  %rbp,%r14
  400881: 4c 89 f7             mov    %r14,%rdi
  ...

  40089c: ff 10                callq  *(%rax)
  40089e: eb d6                jmp    400876 <main+0x76>
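
The threshold/topn decision described in this summary can be sketched as follows. Target and selectTargets are hypothetical names, and treating a combined frequency exactly equal to the threshold as sufficient is an assumption of this sketch, since the text says both "exceeds" and "minimum".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical profile record for one target of an indirect call site.
struct Target {
  std::string Name;
  uint64_t Count;
};

// Returns the targets to promote, or an empty vector when the TopN most
// frequent targets together fall short of ThresholdPercent of all calls.
inline std::vector<Target> selectTargets(std::vector<Target> Targets,
                                         std::size_t TopN = 1,
                                         double ThresholdPercent = 90.0) {
  uint64_t Total = 0;
  for (const Target &T : Targets)
    Total += T.Count;
  if (Total == 0 || TopN == 0)
    return std::vector<Target>();

  std::stable_sort(Targets.begin(), Targets.end(),
                   [](const Target &A, const Target &B) {
                     return A.Count > B.Count;
                   });
  if (Targets.size() > TopN)
    Targets.resize(TopN);

  uint64_t Top = 0;
  for (const Target &T : Targets)
    Top += T.Count;
  if (100.0 * static_cast<double>(Top) <
      ThresholdPercent * static_cast<double>(Total))
    return std::vector<Target>();
  return Targets;
}
```

With counts of 60, 35 and 5, nothing is promoted for topn=1 (60% is below 90%), while topn=2 promotes the first two targets (95% combined).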

(cherry picked from FBD3612218)
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryPasses.cpp
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryPassManager.h
The file was modifiedbolt/DataReader.cpp
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/DataReader.h
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/RewriteInstance.cpp
Commit e212805ea6f5454f4eb7928deb935a001f19389b by maks
[BOLT] Update section names in output file.

Summary:
Re-write section header string table to reflect new names
given to sections. Old sections get ".bolt.org" prefix.

E.g. when we write ".eh_frame" section, we keep the old copy
but rename it to ".bolt.org.eh_frame".

Note: the new code section is named ".bolt.text" - it contains split
function bodies, while original ".text" name is left unchanged.

(cherry picked from FBD4524935)
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/RewriteInstance.h
Commit c89821cee3426a7fe20efca3ef39570a8bdf597c by maks
[BOLT] Detect and prevent re-optimization attempts.

Summary:
Whenever we try to re-optimize a binary with BOLT we should
issue an error and exit.

(cherry picked from FBD4525228)
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/RewriteInstance.cpp
Commit 6b0b5bbae7034b27823c2a09d9844c0f6fc31f47 by maks
[BOLT] Reject sanitized binaries.

Summary:
Whenever input binary is suspected to have been sanitized we print an error
message and exit. I've checked that "__asan_init*" symbol
presence is the most conservative way to detect "sanitization".
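
A check of this kind can be sketched as a prefix match over symbol names. looksSanitized is a hypothetical helper, and the symbol table is modeled as a plain list of strings rather than an ELF symbol table.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns true when any symbol name starts with "__asan_init".
inline bool looksSanitized(const std::vector<std::string> &SymbolNames) {
  const std::string Prefix = "__asan_init";
  return std::any_of(SymbolNames.begin(), SymbolNames.end(),
                     [&](const std::string &Name) {
                       return Name.compare(0, Prefix.size(), Prefix) == 0;
                     });
}
```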

(cherry picked from FBD4525478)
The file was modifiedbolt/RewriteInstance.cpp
Commit 734a7a5437d92463c86c3cc5371ae27571d7adfe by maks
[BOLT] Skip disassembly of padding at function end.

Summary:
Some functions coming from assembly may not have been marked
with size. We assume the size to include all bytes up to
the next function/object in the file. As a result,
function body will include any padding inserted by the linker.
If linker inserts 0-value bytes this could be misinterpreted
as invalid instruction and BOLT will bail out on such functions
in non-relocation mode, and give up on a binary in relocation
mode.

This diff detects zero-padding, ignores it, and continues processing
as normal.
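
One plausible way to detect such padding is to check, from the point where disassembly stops making sense, whether every remaining byte of the function is zero. isZeroPaddingFrom is a hypothetical helper written for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true when Offset is inside the function and every byte from
// Offset to the end of the function body is zero.
inline bool isZeroPaddingFrom(const std::vector<uint8_t> &Bytes,
                              std::size_t Offset) {
  if (Offset >= Bytes.size())
    return false;
  return std::all_of(Bytes.begin() + static_cast<std::ptrdiff_t>(Offset),
                     Bytes.end(), [](uint8_t B) { return B == 0; });
}
```

Checking only from the failing offset, rather than trimming trailing zeros up front, avoids cutting into a final instruction whose encoding happens to end in zero bytes.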

(cherry picked from FBD4528893)
The file was modifiedbolt/BinaryFunction.cpp
Commit 82965b963f366d106595d32e445a089e1a890092 by maks
[BOLT] Emit short tail calls in relocation mode.

Summary:
To minimize size of the output code we should emit tail calls
that are as short as possible. For this we have to convert a synthetic
TAILJMPd into JMP_1 instruction. This should be one of the last passes
as most of analysis passes could break since tail calls will no longer
be marked as such.

The total size of the code is smaller, but not by much - hot text was
reduced by 192 bytes.

(cherry picked from FBD4557804)
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryPasses.h
The file was modifiedbolt/BinaryPasses.cpp
Commit f06a1455eac57aa3ddcbc156c8b9496224dc9539 by maks
[BOLT] Add support for *GOTPCRELX relocation type.

Summary:
gcc5 can generate new types of relocations that give the linker the freedom
to substitute instructions. These relocations are PC-relative, and
since we manually process such relocations they don't present
much of a problem.

Additionally, detect non-pc-relative access from code into the middle of
a function. Occasionally I've seen such code, but don't know exactly
how to trigger its generation. Just issue a warning for now.

(cherry picked from FBD4566473)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/RewriteInstance.cpp
Commit 88244a10bb205f650a44590dcc8e184cd240a26d by maks
[BOLT] Move BOLT passes under Passes subdirectory (NFC).

Summary:
Move passes under Passes subdirectory.

Move inlining passes under Passes/Inliner.*

(cherry picked from FBD4575832)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/CMakeLists.txt
The file was addedbolt/Passes/FrameOptimizer.h
The file was modifiedbolt/BinaryPassManager.h
The file was addedbolt/Passes/BinaryPasses.cpp
The file was removedbolt/FrameOptimizerPass.h
The file was removedbolt/FrameOptimizerPass.cpp
The file was addedbolt/Passes/Inliner.cpp
The file was addedbolt/Passes/Inliner.h
The file was addedbolt/Passes/BinaryPasses.h
The file was addedbolt/Passes/FrameOptimizer.cpp
The file was addedbolt/Passes/ReorderAlgorithm.h
The file was modifiedbolt/BinaryPassManager.cpp
The file was addedbolt/Passes/CMakeLists.txt
The file was addedbolt/Passes/ReorderAlgorithm.cpp
Commit d3e33b6edc12c12fffd2f53b6af64e94ec874f5c by maks
[BOLT] Fix -jump-tables=basic in relocation mode.

Summary:
In a prev diff I added an option to update jump tables in-place (on by default)
and accidentally broke the default handling of jump tables in relocation
mode. The update should be happening semi-automatically, but because
we ignore relocations for jump tables it wasn't happening (derp).

Since we mostly use '-jump-tables=move' this hasn't been noticed for
some time.

This diff gets rid of IgnoredRelocations and removes relocations
from a relocation set when they are no longer needed. If relocations
are created later for jump tables they are no longer ignored.

(cherry picked from FBD4595159)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryContext.cpp
Commit 88a461014b6c064dc23151912e692d0796cf66e4 by maks
[BOLT] Don't set code skew in relocations mode.

Summary:
We use code skew in non-relocation mode since functions have fixed
addresses, and internal alignment has to be adjusted wrt the skew.
However in relocation mode it interferes with effective code
alignment, and has to be disabled. I missed it when I was re-basing
the relocation diff.

(cherry picked from FBD4599670)
The file was modifiedbolt/RewriteInstance.cpp
Commit 2029458f347c7a9658e0464b22298f83b0c70093 by maks
[BOLT] Strip 'repz' prefix from 'repz retq'.

Summary:
Add pass to strip 'repz' prefix from 'repz retq' sequence. The prefix
is not used in Intel CPUs afaik. The pass is on by default.
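
At the byte level, 'repz retq' is encoded as f3 c3 and a plain 'retq' as c3, so the transformation amounts to dropping the f3 prefix. The sketch below works on raw encodings; Encoding and stripRepzRet are hypothetical names, and the real pass operates on MCInst rather than bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Encoding = std::vector<uint8_t>; // bytes of one instruction

// Rewrites every 'repz retq' (f3 c3) into 'retq' (c3) and returns the
// number of prefixes stripped.
inline std::size_t stripRepzRet(std::vector<Encoding> &Insts) {
  std::size_t Stripped = 0;
  for (Encoding &I : Insts) {
    if (I.size() == 2 && I[0] == 0xf3 && I[1] == 0xc3) {
      I.erase(I.begin());
      ++Stripped;
    }
  }
  return Stripped;
}
```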

(cherry picked from FBD4610329)
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/Passes/BinaryPasses.h
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryFunction.cpp
Commit 965a373dc491064cf7fd05113247f275e984c00d by maks
Fix warnings when compiling with clang (NFC)

Summary:
Fix inconsistent override keyword usages and initialize a
missing field of a Relocation object when using braced initializers.

(cherry picked from FBD4622856)
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/Passes/BinaryPasses.h
Commit 6dc2351505b3a1d570bf4d9ae9a5dc6e55dd6e1b by maks
[BOLT] New CFI handling policy.

Summary:
The new interface for handling Call Frame Information:

  * CFI state at any point in a function (in CFG state) is defined by
    CFI state at basic block entry and CFI instructions inside the
    block. The state is independent of basic blocks layout order
    (this is implied by CFG state but wasn't always true in the past).
  * Use BinaryBasicBlock::getCFIStateAtInstr(const MCInst *Inst) to
    get CFI state at any given instruction in the program.
  * No need to call fixCFIState() after any given pass. fixCFIState()
    is called only once during function finalization, and any function
    transformations after that point are prohibited.
  * When introducing new basic blocks, make sure CFI state at entry
    is set correctly and matches CFI instructions in the basic block
    (if any).
  * When splitting basic blocks, use getCFIStateAtInstr() to get
    a state at the split point, and set the new basic block's CFI
    state to this value.

Introduce CFG_Finalized state to indicate that no further optimizations
are allowed on the function. This state is reached after we have synced
CFI instructions and updated EH info.

Rename "-print-after-fixup" option to "-print-finalized".

This diff fixes CFI for cases when we split conditional tail calls,
and for indirect call promotion optimization.

(cherry picked from FBD4629307)
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/Passes/BinaryPasses.h
Commit f241e252fc987de8d382432dc6a2b16abe66c0d6 by maks
[BOLT] Detect and handle __builtin_unreachable().

Summary:
Calls to __builtin_unreachable() can result in an inconsistent CFG.
It was possible for a basic block to end with a conditional branch
and have a single successor. Or there could exist a non-terminated
basic block without successors.

We also often treated conditional jumps with destination past the end
of a function as conditional tail calls. This can be prevented
reliably at least when the byte past the end of the function does
not belong to the next function.

This diff includes several changes:
  * At disassembly stage jumps past the end of a function are converted
    into 'nops'. This is done only for cases when we can guarantee that
    the jump is not a tail call. Conversion to nop is required since the
    instruction could be referenced either by exception handling
    tables and/or debug info. Nops are later removed.
  * In CFG insert 'ret' into non-terminated basic blocks without
    successors (this almost never happens).
  * Conditional jumps at the end of the function are removed from
    CFG. The block will still have a single successor.
  * Cases where a destination of a jump instruction is the start
    of the next function, are still conservatively handled as
    (conditional) tail calls.
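
The classification described in the list above might be sketched as follows. The function name, the enum, and the convention of passing the next function's start address are all assumptions made for this sketch, not BOLT's interface.

```cpp
#include <cassert>
#include <cstdint>

enum class JumpKind { LocalBranch, TailCall, ConvertToNop };

// The function occupies [Start, End). NextFunctionStart is the address of
// the function that follows it in the binary.
JumpKind classifyJump(std::uint64_t Target, std::uint64_t Start,
                      std::uint64_t End, std::uint64_t NextFunctionStart) {
  assert(Start < End && "empty function");
  if (Target >= Start && Target < End)
    return JumpKind::LocalBranch;
  // A jump to the byte right past the end of the function cannot be a tail
  // call if that byte does not belong to the next function.
  if (Target == End && Target != NextFunctionStart)
    return JumpKind::ConvertToNop;
  // Everything else, including a jump to the start of the next function,
  // is conservatively treated as a tail call.
  return JumpKind::TailCall;
}
```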

(cherry picked from FBD4655046)
The file was addedbolt/Passes/HFSort.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/RewriteInstance.cpp
The file was addedbolt/Passes/HFSort.cpp
The file was addedbolt/Passes/HFSortPlus.cpp
The file was modifiedbolt/Passes/ReorderAlgorithm.cpp
The file was modifiedbolt/Passes/CMakeLists.txt
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/CMakeLists.txt
The file was modifiedbolt/Passes/BinaryPasses.h
Commit 0acba2bcf0cfbcc60755d9bae45b328bf905d435 by maks
[BOLT] Detect unmarked data in text.

Summary:
Sometimes code written in assembly will have unmarked data (such as
constants) embedded into text.

Typically such data falls into a "padding" address space of a function.

This diff detects such references, and adjusts the padding space to
prevent overwriting of code in data.

Note that in relocation mode we prefer to overwrite the original code
(-use-old-text) and thus cannot simply ignore data in text.
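
A minimal sketch of the padding adjustment described above, assuming the set of referenced addresses has already been collected. `usableEnd` and its parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the end of the address range a rewritten function may occupy.
// Normally the whole padding up to the next function is available. If some
// referenced address falls inside the padding, the range stops there so the
// embedded data is not overwritten.
std::uint64_t usableEnd(std::uint64_t FunctionEnd,
                        std::uint64_t NextFunctionStart,
                        const std::vector<std::uint64_t> &ReferencedAddrs) {
  assert(FunctionEnd <= NextFunctionStart && "functions overlap");
  std::uint64_t Limit = NextFunctionStart;
  for (std::uint64_t Addr : ReferencedAddrs)
    if (Addr >= FunctionEnd && Addr < NextFunctionStart)
      Limit = std::min(Limit, Addr);
  return Limit;
}
```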

(cherry picked from FBD4662780)
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryFunction.cpp
Commit fed0980139e4ee4d0de1316f2908942619f21fad by maks
[BOLT] Update tests

Summary:
Fix validateCFG to handle BBs that were generated from code that used
__builtin_unreachable().
Add -verify-cfg option to run CFG validation after every optimization
pass.

(cherry picked from FBD4641174)
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
Commit 98737b34bb987b9799a1bc6e821153b3f5dc3538 by maks
[BOLT] Fix verbose output.

Summary:
Inadvertently, output of BOLT became way too verbose. Discovered while
building HHVM on master.

(cherry picked from FBD4669881)
The file was modifiedbolt/RewriteInstance.cpp
Commit f4825ea4171ae557b8ef2db1c401d4eb9086b674 by maks
[BOLT] Fix gcc5 build.

Summary: A <numeric> include is required for gcc5 build.

(cherry picked from FBD4671953)
The file was modifiedbolt/BinaryPassManager.cpp
Commit 2e5c2e689f750587d3a4f3f73a39fd93628c7692 by maks
Fix hfsort callgraph stats, add hfsort test.

Summary:
The stats for call sites that are not included in the call graph were broken.
The intention is to count the total number of call sites vs. the number of call sites that are ignored because they have targets that are not BinaryFunctions.

Also add a new test for hfsort.

(cherry picked from FBD4668631)
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit 351af0c895bcebfa82fb15b3b6451901c6c6c0f0 by maks
[BOLT] Do not process empty functions.

Summary:
While running on a recent test binary BOLT failed with an error. We were
trying to process '__hot_end' (which is not really a function), and asserted
that it had no basic blocks.

This diff marks functions with empty basic blocks list as non-simple since
there's no need to process them.

(cherry picked from FBD4696517)
The file was modifiedbolt/BinaryFunction.cpp
Commit 559a57a18186174322b361d0918eadb7e7c10022 by maks
[BOLT] Improve dynostats output.

Summary:
Reduce verbosity of dynostats to make them more readable.

  * Don't print "before" dynostats twice.
  * Detect if dynostats have changed after optimization and print
    before/after only if at least one metric has changed. Otherwise
    just print dynostats once and indicate "no change".
  * If any given metric hasn't changed, then print the difference as
    "(=)" as opposed to (+0.0%).

(cherry picked from FBD4705920)
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryFunction.cpp
Commit 6cfd7ac2d586f3a86f0d4d0f45d291a12f20b20e by maks
[BOLT] Do not overwrite starting address in non-relocation mode.

Summary:
In non-relocation mode we shouldn't attempt to change the ELF
entry point.

What made matters worse - it broke '-max-funcs=' and '-funcs=' options
since an entry function more often than not was excluded from the list
of processed functions, and we were setting entry point to 0.

(cherry picked from FBD4720044)
The file was modifiedbolt/RewriteInstance.cpp
Commit e6f96de4d0f97afb3769e9f6575ce66643878d2a by maks
[BOLT] Add option to print only specific functions.

Summary:
Add option '-print-only=func1,func2,...' to print only functions
of interest. The rest of the functions are still processed and
optimized (e.g. inlined), but only the ones on the list are printed.
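
The filtering could look roughly like this. Both helper names are hypothetical, and treating an empty list as "print everything" is an assumption of the sketch.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Split a value such as "func1,func2" into a set of function names.
std::set<std::string> parseNameList(const std::string &Value) {
  std::set<std::string> Names;
  std::stringstream Stream(Value);
  std::string Name;
  while (std::getline(Stream, Name, ','))
    if (!Name.empty())
      Names.insert(Name);
  return Names;
}

// Decides only whether a function is printed. Processing and optimization
// of the function are not affected by the list.
bool shouldPrint(const std::set<std::string> &PrintOnly,
                 const std::string &FunctionName) {
  assert(!FunctionName.empty() && "function must have a name");
  return PrintOnly.empty() || PrintOnly.count(FunctionName) != 0;
}
```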

(cherry picked from FBD4734610)
The file was modifiedbolt/BinaryFunction.cpp
Commit b1ef186ca94bad96ccbb9fe39059d3f9c7085361 by maks
[BOLT] Don't allow non-symbol targets in ICP

Summary:
ICP was letting through call targets that weren't symbols.  This diff
filters out the non-symbol targets before running ICP.

(cherry picked from FBD4735358)
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit ad81bd677999174d6b841f0926e484d959bac871 by maks
Change dynostats dynamic instruction count policy

Summary:
Also add LOAD/STORE counters.

(cherry picked from FBD4732284)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
Commit d5a0264a9e678e2b62768b4c36010059b05b3706 by maks
[BOLT] Issue error in relocs mode if input is lacking relocations.

Summary:
If we specify "-relocs" flag and an input has no relocations we
proceed with assumptions that relocations were there and break the
binary.

Detect the condition above, and reject the input.

(cherry picked from FBD4761239)
The file was modifiedbolt/RewriteInstance.cpp
Commit 0bde796e50bae4d15c0e705e803e1c35bd5fa49a by maks
[BOLT] Organize options in categories for pretty printing (near NFC).

Summary:
Each BOLT-specific option now belongs to BoltCategory or BoltOptCategory.

Use alphabetical order for options in source code (does not affect
output).

The result is a cleaner output of "llvm-bolt -help" which does not
include any unrelated llvm options and is close to the following:

.....

BOLT generic options:

  -data=<string>                                       - <data file>
  -dyno-stats                                          - print execution info based on profile
  -hot-text                                            - hot text symbols support (relocation mode)
  -o=<string>                                          - <output file>
  -relocs                                              - relocation mode - use relocations to move functions in the binary
  -update-debug-sections                               - update DWARF debug sections of the executable
  -use-gnu-stack                                       - use GNU_STACK program header for new segment (workaround for issues with strip/objcopy)
  -use-old-text                                        - re-use space in old .text if possible (relocation mode)
  -v=<uint>                                            - set verbosity level for diagnostic output

BOLT optimization options:

  -align-blocks                                        - try to align BBs inserting nops
  -align-functions=<uint>                              - align functions at a given value (relocation mode)
  -align-functions-max-bytes=<uint>                    - maximum number of bytes to use to align functions
  -boost-macroops                                      - try to boost macro-op fusions by avoiding the cache-line boundary
  -eliminate-unreachable                               - eliminate unreachable code
  -frame-opt                                           - optimize stack frame accesses
  ......

(cherry picked from FBD4793684)
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/llvm-bolt.cpp
The file was modifiedbolt/Passes/ReorderAlgorithm.cpp
The file was modifiedbolt/Passes/Inliner.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/merge-fdata/merge-fdata.cpp
Commit c166a8c1a721942fa3b48ad84c047050313f35a6 by maks
[BOLT] Fix debug info update for inlining.

Summary:
When inlining, if a callee has debug info and a caller does not
(i.e. a containing compilation unit was compiled without "-g"), we try
to update a nonexistent compilation unit. Instead we should skip
updating debug info in such cases.

Minor refactoring of line number emitting code.

(cherry picked from FBD4823982)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
Commit f7d32f7e7ddb6b8a58c89cad1c04ce217121a3dd by maks
[BOLT] Detect and reject binaries built for coverage.

Summary: Don't attempt to optimize binaries built with coverage support.

(cherry picked from FBD4810330)
The file was modifiedbolt/RewriteInstance.cpp
Commit 6c5c65e3a313f913f0fe837c5f53b126e6ac36dd by maks
[BOLT] Fix double jump peephole, remove useless conditional branches.

Summary:
I split some of this out from the jumptable diff since it fixes the
double jump peephole.

I've changed the pass manager so that UCE and peepholes are not called
after SCTC.  I've incorporated a call to the double jump fixer to SCTC
since it is needed to fix things up afterwards.

While working on fixing the double jump peephole I discovered a few
useless conditional branches that could be removed as well.  I highly
doubt that removing them will improve perf at all but it does seem
odd to leave in useless conditional branches.

There are also some minor logging improvements.

(cherry picked from FBD4751875)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Passes/BinaryPasses.h
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/BinaryPassManager.cpp
Commit a99005397f893ba271efa7364806921abaf8a244 by maks
[BOLT] Fix branch count in removeDuplicateConditionalSuccessor().

Summary:
When we merge the original branch counts we have to make sure
both of them have a profile. Otherwise set the count to COUNT_NO_PROFILE.

The misprediction count should be 0.
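
A sketch of the merge rule, under two assumptions: that merging means summing the two counts, and that COUNT_NO_PROFILE is the largest 64-bit value. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Sentinel for "no profile". The name comes from the commit message; the
// value used here is an assumption.
constexpr std::uint64_t COUNT_NO_PROFILE =
    std::numeric_limits<std::uint64_t>::max();

struct BranchInfo {
  std::uint64_t Count;
  std::uint64_t MispredictedCount;
};

// Merge the two edges of a conditional branch whose successors are the same
// block. The counts are combined only if both edges have a profile.
BranchInfo mergeDuplicateSuccessor(const BranchInfo &Taken,
                                   const BranchInfo &Fallthrough) {
  BranchInfo Merged{COUNT_NO_PROFILE, 0}; // misprediction count is always 0
  if (Taken.Count != COUNT_NO_PROFILE &&
      Fallthrough.Count != COUNT_NO_PROFILE) {
    assert(Taken.Count < COUNT_NO_PROFILE - Fallthrough.Count &&
           "sum would collide with the sentinel");
    Merged.Count = Taken.Count + Fallthrough.Count;
  }
  return Merged;
}
```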

(cherry picked from FBD4837774)
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit 34c8a7c21beecc18200fd342fe9cde8802c8ce87 by maks
[BOLT] Relocation support for non-allocatable sections.

Summary:
Relocations can be created for non-allocatable (aka Note) sections.

To start using this for debug info, the emission has to be moved
earlier in the pipeline for relocation processing to kick in.

(cherry picked from FBD4835204)
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/DWARFRewriter.cpp
The file was modifiedbolt/RewriteInstance.h
Commit c7cccacc4f39d9c74a0393b7916f153a6289083c by maks
[BOLT] Enable SCTC by default.

(cherry picked from FBD4837849)
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryPassManager.cpp
Commit 075f076503ae495107bcaa7610f86ea49c06e08f by maks
[BOLT] Don't abort on processing binaries with .gdb_index section

Summary:
While writing non-allocatable sections we assumed that the
size of such a section is a multiple of its alignment, as typically
such sections are collections of fixed-size elements. .gdb_index
breaks this assumption.

This diff removes the assertion that was triggered by the presence of a
.gdb_index section, and makes sure that we insert padding if we are
appending to a section with a size not congruent to section alignment.
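
The amount of padding needed before appending can be computed as in this small sketch. `paddingFor` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Number of padding bytes to append so that Size becomes a multiple of
// Alignment.
std::uint64_t paddingFor(std::uint64_t Size, std::uint64_t Alignment) {
  assert(Alignment != 0 && "alignment must be non-zero");
  std::uint64_t Remainder = Size % Alignment;
  return Remainder == 0 ? 0 : Alignment - Remainder;
}
```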

(cherry picked from FBD4844553)
The file was modifiedbolt/RewriteInstance.cpp
Commit 13c89e6ef19b3b543ebaa6f1af8e635268eeb913 by maks
[BOLT] Fix branch data for __builtin_unreachable().

Summary:
When we have a conditional branch past the end of function (a result
of a call to __builtin_unreachable()), we replace the branch with nop,
but keep branch information for validation purposes. If that branch
has a recorded profile we mistakenly create an additional successor
to a containing basic block (a 3rd successor).

Instead of adding the branch to FTBranches list we should be adding
to IgnoredBranches.

(cherry picked from FBD4912840)
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryFunction.cpp
Commit 3f42fdf7daf268880e0fb3fdc87ee4882012a6c9 by maks
[BOLT] Update function address and size in relocation mode.

Summary:
Set function addresses after code emission but before we update
debug info and symbol table entries.

(cherry picked from FBD5029609)
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/BinaryFunction.h
Commit 3adb52d80e909eee36aaca50d8a2fc0d50a6c76b by maks
[BOLT] Update .gdb_index section.

Summary: Update address table in .gdb_index section.

(cherry picked from FBD5068255)
The file was modifiedbolt/DebugData.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/DWARFRewriter.cpp
The file was modifiedbolt/DebugData.h
Commit 69b586326c2b3ff836c84c747837d840a67b7216 by maks
[BOLT] Support adding new non-allocatable sections.

Summary:
We had the ability to add allocatable sections before. This diff
expands this capability to non-allocatable sections.

(cherry picked from FBD5082018)
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/DWARFRewriter.cpp
The file was modifiedbolt/RewriteInstance.h
Commit c789d5137b40d68f30801f37caa516849e837579 by maks
[BOLT] Add option to keep/generate .debug_aranges.

Summary:
GOLD linker removes .debug_aranges while generating .gdb_index.
Some tools however rely on the presence of this section.
Add an option to generate .debug_aranges if it was removed,
or keep it in the file if it was present.

Generally speaking .debug_aranges duplicates information present
in .gdb_index addresses table.

(cherry picked from FBD5084808)
The file was modifiedbolt/DWARFRewriter.cpp
Commit 4806b1383591eab7139b65a25d80a70791ff6fd9 by maks
[BOLT] Add jump table support to ICP

Summary:
Add jump table support to ICP.  The optimization is basically the same
as ICP for tail calls.  The big difference is that the profiling data
comes from the jump table and the targets are local symbols rather than
global.

I've removed an instruction from ICP for tail calls.  The code used to
have a conditional jump to a block with a direct jump to the target, i.e.

  B1: cmp foo,(%rax)
      jne B3
  B2: jmp foo
  B3: ...

this code is now:

  B1: cmp foo,(%rax)
      je  foo
  B2: ...

The other changes in this diff:
- Move ICP + new jump table support to separate file in Passes.
- Improve the CFG validation to handle jump tables.
- Fix the double jump peephole so that the successor of the modified
  block is updated properly.  Also make sure that any existing branches
  in the block are modified to properly reflect the new CFG.
- Add an invocation of the double jump peephole to SCTC.  This allows
  us to remove a call to peepholes/UCE occurring after fixBranches() in
  the pass manager.
- Miscellaneous cleanups to BOLT output.

(cherry picked from FBD4727757)
The file was modifiedbolt/Passes/CMakeLists.txt
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/Passes/BinaryPasses.h
The file was addedbolt/Passes/IndirectCallPromotion.h
The file was addedbolt/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/Passes/Inliner.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
Commit 457b7f14b9e07850b791d7df5fff0171cabd5315 by maks
[BOLT] Fix debug info for input with continuous range.

Summary:
When we see a compilation unit with continuous range on input,
it has two attributes: DW_AT_low_pc and DW_AT_high_pc. We convert the
range to a non-continuous one and change the attributes to
DW_AT_ranges and DW_AT_producer. However, gdb seems to expect
every compilation unit to have a base address specified via
DW_AT_low_pc, even when its value is always 0. Otherwise gdb will
not show proper debug info for such modules.

With this diff we produce DW_AT_ranges followed by DW_AT_low_pc.
The problem is that the first attribute takes DW_FORM_sec_offset
which is exactly 4 bytes, and in many cases we are left with
12 bytes to fill in. We used to fill this space with DW_AT_producer,
which took an arbitrary-length field. For DW_AT_low_pc we can
use a trick of using DW_FORM_udata (unsigned ULEB128 encoded
integer) which can take up to 12 bytes, even when the value is 0.
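
The padding trick relies on ULEB128 allowing redundant continuation bytes. The sketch below is a standalone illustration rather than the code used in the diff, and both function names are hypothetical: it encodes a value into exactly the requested number of bytes and decodes it back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode Value as ULEB128 using exactly PadTo bytes. Extra bytes carry no
// value bits; every byte except the last has the continuation bit set.
std::vector<std::uint8_t> encodePaddedULEB128(std::uint64_t Value,
                                              std::size_t PadTo) {
  std::vector<std::uint8_t> Bytes;
  do {
    Bytes.push_back(static_cast<std::uint8_t>(Value & 0x7f));
    Value >>= 7;
  } while (Value != 0);
  assert(Bytes.size() <= PadTo && "value does not fit in PadTo bytes");
  while (Bytes.size() < PadTo)
    Bytes.push_back(0x00);
  for (std::size_t I = 0; I + 1 < Bytes.size(); ++I)
    Bytes[I] |= 0x80;
  return Bytes;
}

// Decode a ULEB128 value, stopping at the first byte without the
// continuation bit.
std::uint64_t decodeULEB128(const std::vector<std::uint8_t> &Bytes) {
  std::uint64_t Value = 0;
  unsigned Shift = 0;
  for (std::uint8_t Byte : Bytes) {
    if (Shift < 64) // padding bytes beyond 64 bits contribute nothing
      Value |= static_cast<std::uint64_t>(Byte & 0x7f) << Shift;
    Shift += 7;
    if ((Byte & 0x80) == 0)
      break;
  }
  return Value;
}
```

With a value of 0 and a width of 12, this produces eleven 0x80 bytes followed by a single 0x00, which still decodes to 0.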

(cherry picked from FBD5109798)
The file was modifiedbolt/DWARFRewriter.cpp
The file was modifiedbolt/DebugData.cpp
The file was modifiedbolt/DebugData.h
Commit 511a1c78b2cd5aa35b1e8d13a53855034dce1bbf by maks
[BOLT] Add dataflow infrastructure

Summary:
This diff introduces a common infrastructure for performing
dataflow analyses in BinaryFunctions as well as a few analyses that are
useful in a variety of scenarios. The largest user of this
infrastructure so far is shrink wrapping, which will be added in a
separate diff.

(cherry picked from FBD4983671)
The file was addedbolt/Passes/DataflowInfoManager.cpp
The file was addedbolt/Passes/FrameAnalysis.cpp
The file was addedbolt/Passes/StackPointerTracking.h
The file was addedbolt/Passes/DataflowInfoManager.h
The file was addedbolt/Passes/DataflowAnalysis.h
The file was modifiedbolt/Passes/FrameOptimizer.cpp
The file was addedbolt/Passes/DominatorAnalysis.h
The file was modifiedbolt/Passes/CMakeLists.txt
The file was addedbolt/Passes/DataflowAnalysis.cpp
The file was addedbolt/Passes/ReachingInsns.h
The file was addedbolt/Passes/LivenessAnalysis.h
The file was addedbolt/Passes/FrameAnalysis.h
The file was addedbolt/Passes/ReachingDefOrUse.h
The file was modifiedbolt/BinaryFunction.h
The file was addedbolt/Passes/LivenessAnalysis.cpp
The file was addedbolt/Passes/StackPointerTracking.cpp
Commit 96adec51eb703c4ab79999ac907548ec67dc7493 by maks
[BOLT] Rework debug info processing.

Summary:
Multiple improvements to debug info handling:
  * Add support for relocation mode.
  * Speed up processing.
  * Reduce memory consumption.
  * Bug fixes.

The high-level idea behind the new debug handling is that we don't save
intermediate state for ranges and location lists. Instead we depend
on function and basic block address transformations to update the info
as a final post-processing step.

For HHVM in non-relocation mode the peak memory went down from 55GB to 35GB. Processing time went from over 6 minutes to under 5 minutes.

(cherry picked from FBD5113431)
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/DWARFRewriter.cpp
The file was modifiedbolt/DebugData.h
The file was modifiedbolt/DebugData.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryContext.h
Commit 3a3bcd767eddb90bc06a67c0382323434883cf93 by maks
Don't add useless uncond branch to fallthroughs when running SCTC.

Summary:
SCTC was sometimes adding unconditional branches to fallthrough blocks.
This diff checks to see if the unconditional branch is really necessary, e.g.
it's not to a fallthrough block.
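
The check can be illustrated with a simplified layout model in which blocks are identified by integers. The function name and the representation are assumptions of this sketch, and hot/cold splitting is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Layout holds basic block ids in emission order. A branch at the end of
// Layout[From] is redundant when its target is the next block in the
// layout, because execution falls through to it anyway.
bool needsUnconditionalBranch(const std::vector<int> &Layout,
                              std::size_t From, int Target) {
  assert(From < Layout.size() && "block index out of range");
  bool IsFallthrough =
      From + 1 < Layout.size() && Layout[From + 1] == Target;
  return !IsFallthrough;
}
```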

(cherry picked from FBD5098493)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit 2ee4bbd3c1c7c091632cc1cb5085009f9e9c8cdb by maks
[BOLT] Optimize jump tables with hot entries

Summary:
This diff is similar to Bill's diff for optimizing jump tables
(and is built on top of it), but it differs in the strategy used to
optimize the jump table. The previous approach loads the target address
from the jump table and compares it to check if it is a hot target. This
reduces branch mispredictions by promoting the indirect jmp
to a (more predictable) direct jmp.

  load  %r10, JMPTABLE
  cmp   %r10, HOTTARGET
  je    HOTTARGET
  ijmp  [JMPTABLE + %index * scale]

The idea in this diff is instead to make dcache better by avoiding the
load of the jump table, leaving branch mispredictions as a secondary
target. To do this we compare the index used in the indirect jmp and if
it matches a known hot entry, it performs a direct jump to the target.

  cmp  %index, HOTINDEX
  je   CORRESPONDING_TARGET
  ijmp [JMPTABLE + %index * scale]

The downside of this approach is that we may have multiple indices
associated with a single target, but we only have profiling to show
which targets are hot and we have no clue about which indices are hot.

  INDEX    TARGET
  0        4004f8
  8        4004f8
  10       4003d0
  18       4004f8

  Profiling data:
  TARGET   COUNT
  4004f8   10020
  4003d0   17

In this example, we know 4004f8 is hot, but to make a direct call to it
we need to check for indices 0, 8 and 18 -- 3 comparisons instead of 1.

Therefore, once we know a target is hot, we must generate code to
compare against all possible indices associated with this target because
we don't know which index is the hot one (IF there's a hotter index).

  cmp %index, 0
  je  4004f8
  cmp %index, 8
  je  4004f8
  cmp %index, 18
  je  4004f8
  (... up to N comparisons as in --indirect-call-promotion-topn=N )
  ijmp [JMPTABLE + %index * scale]
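
The selection step from the example above can be sketched as follows: pick the hottest target from the profile, then collect every index that maps to it. The type and function names are hypothetical, the index values are read as hexadecimal (an assumption), and capping the number of comparisons at N is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct JumpTableEntry {
  std::uint64_t Index;  // value compared against the index register
  std::uint64_t Target; // address the entry jumps to
};

struct TargetCount {
  std::uint64_t Target;
  std::uint64_t Count;
};

// The profile is per target, so the hottest target is all we can identify.
std::uint64_t hottestTarget(const std::vector<TargetCount> &Profile) {
  assert(!Profile.empty() && "no profile data");
  const TargetCount *Best = &Profile.front();
  for (const TargetCount &TC : Profile)
    if (TC.Count > Best->Count)
      Best = &TC;
  return Best->Target;
}

// Every index that maps to HotTarget needs its own comparison, because the
// profile does not say which of them is the hot one.
std::vector<std::uint64_t>
indicesForTarget(const std::vector<JumpTableEntry> &Table,
                 std::uint64_t HotTarget) {
  std::vector<std::uint64_t> Indices;
  for (const JumpTableEntry &Entry : Table)
    if (Entry.Target == HotTarget)
      Indices.push_back(Entry.Index);
  return Indices;
}
```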

(cherry picked from FBD5005620)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Passes/FrameAnalysis.h
The file was modifiedbolt/Passes/BinaryPasses.h
The file was modifiedbolt/Passes/FrameAnalysis.cpp
The file was modifiedbolt/Passes/IndirectCallPromotion.h
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/BinaryFunction.h
Commit 5cd58961a93599c051945c286921326034515e42 by maks
Add .bolt_info notes section containing BOLT revision and command line args.

Summary:
Optionally add a .bolt_info notes section containing BOLT revision and command line args.
The new section is controlled by the -add-bolt-info flag which is on by default.

(cherry picked from FBD5125890)
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/CMakeLists.txt
The file was modifiedbolt/llvm-bolt.cpp
Commit 174e3a825bba60a96d772c72f5e3a346e8a1707f by maks
[BOLT] Fix C++ ABI function alignment.

Summary: C++ functions have to be aligned at 2-bytes minimum on x86-64.

(cherry picked from FBD5128185)
The file was modifiedbolt/RewriteInstance.cpp
Commit 2428567f7ddbc48a07781d2479faff80635175a2 by maks
[BOLT] Fix no-assertions build.

(cherry picked from FBD5130285)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Passes/FrameAnalysis.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Passes/FrameOptimizer.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/Passes/Inliner.cpp
Commit 96943d2f4b97eabd3be3f70d2803341a51af35e9 by maks
Add option to generate function order file.

Summary: Add -generate-function-order=<filename> option to write the computed function order to a file.  We can read this order in later rather than recomputing each time we process a binary with BOLT.

(cherry picked from FBD5127915)
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/RewriteInstance.cpp
Commit 2e744e6867836d0780aca5bbc47227a60708c05c by maks
[BOLT] Emit sorted DWARF ranges and location lists.

Summary:
When producing address ranges and location lists for debug info
add a post-processing step that sorts them and merges adjacent
entries.
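
A generic sketch of such a post-processing step, using half-open address ranges. It is not the code from the diff, the names are hypothetical, and it also merges overlapping ranges, which goes slightly beyond what the summary states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// A half-open address range [first, second).
using AddressRange = std::pair<std::uint64_t, std::uint64_t>;

// Sort ranges by start address, then fold each range into the previous one
// when it starts at or before the previous range's end.
std::vector<AddressRange> sortAndMerge(std::vector<AddressRange> Ranges) {
  std::sort(Ranges.begin(), Ranges.end());
  std::vector<AddressRange> Merged;
  for (const AddressRange &R : Ranges) {
    assert(R.first <= R.second && "malformed range");
    if (!Merged.empty() && R.first <= Merged.back().second)
      Merged.back().second = std::max(Merged.back().second, R.second);
    else
      Merged.push_back(R);
  }
  return Merged;
}
```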

Fix a memory allocation/free issue for .debug_ranges section.

(cherry picked from FBD5130583)
The file was modifiedbolt/DWARFRewriter.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryFunction.cpp
Commit 35d2530a40f79acc9e97952e3c9401cd08ef15f2 by maks
[BOLT] Fix SCTC again.

Summary: Respect hot/cold boundaries when using BinaryFunction::getBasicBlockAfter().

(cherry picked from FBD5153379)
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit 6c32079d57a897b091596bf436e0b99c0c2908fb by maks
[BOLT] Update addresses for DW_TAG_GNU_call_site and DW_TAG_label.

Summary:
Some DWARF tags (such as GNU_call_site and label) reference instruction
addresses in the input binary. When we update debug info we need to
update these tags too with new addresses.

Also fix base address used for calculation of output addresses in
relocation mode.

(cherry picked from FBD5155814)
The file was modifiedbolt/DWARFRewriter.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
Commit 9b190cc74bbc5dc1bd9f15fb128824f845b707b4 by maks
[BOLT] Fix SCTC again again.

Summary: I put the const_cast<BinaryFunction *>(this) on the wrong version of getBasicBlockAfter().  It's on the right one now.

(cherry picked from FBD5159127)
The file was modifiedbolt/BinaryFunction.h
Commit 733e8c464fc2c9caf500ebacbd193f713e57e3e8 by maks
HFSort/call graph refactoring

Summary:
I've factored out the call graph code from dataflow and function reordering code and done a few small renames/cleanups.  I've also moved the function reordering pass into a separate file because it was starting to get big.

I've got more refactoring planned for hfsort/call graph but this is a start.

(cherry picked from FBD5140771)
The file was modifiedbolt/Passes/CMakeLists.txt
The file was modifiedbolt/Passes/FrameOptimizer.cpp
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/Passes/FrameOptimizer.h
The file was modifiedbolt/Passes/FrameAnalysis.h
The file was addedbolt/Passes/ReorderFunctions.cpp
The file was addedbolt/Passes/ReorderFunctions.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was addedbolt/Passes/CallGraph.cpp
The file was modifiedbolt/Passes/HFSort.cpp
The file was modifiedbolt/Passes/FrameAnalysis.cpp
The file was addedbolt/Passes/PettisAndHansen.cpp
The file was modifiedbolt/Passes/HFSort.h
The file was modifiedbolt/Passes/BinaryPasses.h
The file was modifiedbolt/Passes/HFSortPlus.cpp
The file was addedbolt/Passes/CallGraph.h
Commit 95ab659fe4a8987924d9cdef07806da921616226 by maks
[BOLT] Do not assert on an empty location list.

Summary:
Clang generates an empty debug location list, which doesn't make sense,
but we probably shouldn't assert on it and instead issue a warning
in verbosity mode. There is only a single empty location list in the
whole llvm binary.

(cherry picked from FBD5166666)
The file was modifiedbolt/DWARFRewriter.cpp
Commit 5feee9f1d896b37ffca49ffbd23b84ac1d364ba6 by maks
[BOLT] More CG refactoring

Summary:
Do some additional refactoring of the CallGraph class.  Add a BinaryFunctionCallGraph class that has the BOLT specific bits.  This is in preparation for moving the generic CallGraph class into a library that both BOLT and HHVM can use.

Make data members of CallGraph private and add the appropriate accessor methods.

(cherry picked from FBD5143468)
The file was modifiedbolt/Passes/FrameAnalysis.h
The file was modifiedbolt/Passes/FrameOptimizer.cpp
The file was modifiedbolt/Passes/CallGraph.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Passes/CallGraph.cpp
The file was addedbolt/Passes/BinaryFunctionCallGraph.cpp
The file was modifiedbolt/Passes/HFSort.cpp
The file was modifiedbolt/Passes/FrameAnalysis.cpp
The file was modifiedbolt/Passes/FrameOptimizer.h
The file was modifiedbolt/Passes/HFSortPlus.cpp
The file was modifiedbolt/Passes/HFSort.h
The file was modifiedbolt/Passes/ReorderFunctions.cpp
The file was modifiedbolt/Passes/PettisAndHansen.cpp
The file was addedbolt/Passes/BinaryFunctionCallGraph.h
The file was modifiedbolt/Passes/ReorderFunctions.h
The file was modifiedbolt/Passes/CMakeLists.txt
Commit 382c660ee5f30f867faacb6832e61600039a7622 by maks
[BOLT] Make hfsort+ deterministic and add test case

Summary:
Make hfsort+ algorithm deterministic.
We only had a test for hfsort.  Since hfsort+ is going to be the default, I've added a test for that too.

(cherry picked from FBD5143143)
The file was modifiedbolt/Passes/HFSort.h
The file was modifiedbolt/Passes/HFSortPlus.cpp
The file was modifiedbolt/Passes/HFSort.cpp
Commit 4b485f4167a89076259f7d79eaed196f3a3809c2 by maks
[BOLT] Fix misc issues in relocation mode.

Summary:
Fix issues discovered while testing LTO mode with bfd linker:

  * Correctly update absolute function references from code
    with addend.
  * Support .got.plt section generated by bfd linker.
  * Support quirks of .tbss section.
  * Don't ignore functions if the size in FDE doesn't match the
    size in the symbol table. Instead keep processing using the
    maximum indicated size.

(cherry picked from FBD5178831)
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/RewriteInstance.h
Commit d850ca36222f4ee58bee17bac22385e8c9ffa2d1 by maks
[BOLT] Add shrink wrapping pass

Summary:
Add an implementation for shrink wrapping, a frame optimization
that moves callee-saved register spills from hot prologues to cold
successors.

(cherry picked from FBD4983706)
The file was modifiedbolt/Passes/FrameAnalysis.cpp
The file was modifiedbolt/Passes/StackPointerTracking.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was addedbolt/Passes/StackAllocationAnalysis.cpp
The file was addedbolt/Passes/StackAllocationAnalysis.h
The file was addedbolt/Passes/StackReachingUses.cpp
The file was addedbolt/Passes/ShrinkWrapping.cpp
The file was modifiedbolt/Passes/DataflowAnalysis.h
The file was modifiedbolt/Passes/CMakeLists.txt
The file was modifiedbolt/BinaryContext.h
The file was addedbolt/Passes/StackReachingUses.h
The file was addedbolt/Passes/StackAvailableExpressions.cpp
The file was modifiedbolt/Passes/BinaryPasses.h
The file was modifiedbolt/Passes/DominatorAnalysis.h
The file was addedbolt/Passes/ShrinkWrapping.h
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was addedbolt/Passes/StackAvailableExpressions.h
The file was addedbolt/Passes/AllocCombiner.cpp
The file was addedbolt/Passes/AllocCombiner.h
The file was modifiedbolt/Passes/ReachingInsns.h
The file was modifiedbolt/Passes/DataflowInfoManager.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/Passes/FrameOptimizer.cpp
The file was modifiedbolt/Passes/DataflowInfoManager.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/Passes/LivenessAnalysis.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/Passes/ReachingDefOrUse.h
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/Passes/FrameOptimizer.h
Commit 2c2309429905bcfa25fa0ebd50852c7fd2a7a637 by maks
Split FrameAnalysis and improve LivenessAnalysis

Summary:
Split FrameAnalysis into FrameAnalysis and RegAnalysis, since
some optimizations only require register information about functions,
not frame information. Refactor callgraph walking code into the
CallGraphWalker class, allowing any analysis that depends on the call
graph to easily traverse it via a visitor pattern. Also fix
LivenessAnalysis, which was broken because it was not considering
registers read in callees and propagating them to the caller.

(cherry picked from FBD5177901)
The file was modifiedbolt/Passes/FrameAnalysis.h
The file was modifiedbolt/Passes/FrameOptimizer.cpp
The file was modifiedbolt/Passes/StackReachingUses.cpp
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/Passes/ReachingDefOrUse.h
The file was modifiedbolt/Passes/CMakeLists.txt
The file was addedbolt/Passes/CallGraphWalker.cpp
The file was modifiedbolt/Passes/ShrinkWrapping.cpp
The file was modifiedbolt/Passes/StackAvailableExpressions.cpp
The file was modifiedbolt/Passes/FrameOptimizer.h
The file was addedbolt/Passes/CallGraphWalker.h
The file was modifiedbolt/Passes/DataflowInfoManager.h
The file was modifiedbolt/Passes/FrameAnalysis.cpp
The file was modifiedbolt/Passes/StackAvailableExpressions.h
The file was modifiedbolt/Passes/BinaryFunctionCallGraph.cpp
The file was addedbolt/Passes/RegAnalysis.h
The file was addedbolt/Passes/RegAnalysis.cpp
The file was modifiedbolt/Passes/LivenessAnalysis.h
The file was modifiedbolt/Passes/DataflowInfoManager.cpp
Commit f9436bc9033d3c81b7fdaad30ae6ae2630508321 by maks
[BOLT] Fix ELF inter-section references

Summary:
Since we are stripping non-allocatable relocation sections from
the binary and adding new sections it changes section indices
in the binary. Some sections refer to other sections by their index
which is stored in sh_link or sh_info field. Hence we need to update
these fields.

In the past update of indices was done ad-hoc and as we started
adding more complex updates to section header table the update
mechanism became broken in some cases. As a result, we were putting
wrong indices into sh_link/sh_info.

The broken case was discovered while investigating a problem with
a stripped BOLTed binary. In the BOLTed binary, .rela.plt was incorrectly
pointing to one of the debug sections, and the strip command removed
the debug section together with the .rela section that was referencing it.

The new update mechanism computes complete old to new section index
mapping and updates sh_link/sh_info fields based on the mapping
before writing section header entries into an output file.
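
A minimal sketch of such an index mapping, assuming a simplified header
type. SectionHeader, its Stripped and InfoIsIndex flags, computeIndexMap()
and remapHeaders() are hypothetical names for illustration and are not
taken from RewriteInstance.cpp; only stripping is modeled, not the
addition of new sections.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for an ELF section header. Only the fields that can
// hold section indices are modeled; Stripped and InfoIsIndex are
// hypothetical flags used for illustration.
struct SectionHeader {
  uint32_t Link = 0;        // sh_link: index of an associated section
  uint32_t Info = 0;        // sh_info: a section index for some section types
  bool InfoIsIndex = false; // true when sh_info holds a section index
  bool Stripped = false;    // true when the section is dropped from output
};

// Build the complete old -> new index mapping. Stripped sections map to 0
// (SHN_UNDEF); index 0 is the null section and stays at 0.
std::vector<uint32_t> computeIndexMap(const std::vector<SectionHeader> &Old) {
  std::vector<uint32_t> Map(Old.size(), 0);
  uint32_t Next = 0;
  for (std::size_t I = 0; I < Old.size(); ++I)
    if (!Old[I].Stripped)
      Map[I] = Next++;
  return Map;
}

// Produce the output header table with sh_link/sh_info rewritten through
// the mapping, as would be done just before writing the headers out.
std::vector<SectionHeader> remapHeaders(const std::vector<SectionHeader> &Old) {
  const std::vector<uint32_t> Map = computeIndexMap(Old);
  std::vector<SectionHeader> New;
  for (const SectionHeader &S : Old) {
    if (S.Stripped)
      continue;
    SectionHeader Out = S;
    assert(S.Link < Map.size() && "sh_link out of range");
    Out.Link = Map[S.Link];
    if (S.InfoIsIndex) {
      assert(S.Info < Map.size() && "sh_info out of range");
      Out.Info = Map[S.Info];
    }
    New.push_back(Out);
  }
  return New;
}
```

Computing the whole map once, instead of patching individual indices as
sections are removed, keeps every sh_link/sh_info update consistent with
the final section order.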

(cherry picked from FBD5207378)
The file was modifiedbolt/RewriteInstance.cpp
Commit 8eaa2fdd9f7674ab5ac924b753c60e6667880dcd by maks
[BOLT] Fix hfsort+ crash when no perf data is present.

Summary: hfsort+ was trying to access the back() of an empty vector when no perf data is present.  Just add a guard around that code.

(cherry picked from FBD5206962)
The file was modifiedbolt/Passes/HFSortPlus.cpp
Commit 2baa4c7a2c8b520e4dfa4f7f84ce77ea6c5c9fa0 by maks
[BOLT] Only print stats when requested

Summary:
Make LLVM timers only output numbers when the -time-opts option
is used.

(cherry picked from FBD5212221)
The file was modifiedbolt/Passes/BinaryFunctionCallGraph.cpp
The file was modifiedbolt/Passes/DominatorAnalysis.h
The file was modifiedbolt/Passes/CallGraphWalker.cpp
The file was modifiedbolt/Passes/LivenessAnalysis.h
The file was modifiedbolt/Passes/ShrinkWrapping.cpp
The file was modifiedbolt/Passes/StackReachingUses.h
The file was modifiedbolt/Passes/StackAvailableExpressions.h
The file was modifiedbolt/Passes/FrameOptimizer.cpp
The file was modifiedbolt/Passes/ReachingInsns.h
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/Passes/ReachingDefOrUse.h
The file was modifiedbolt/Passes/RegAnalysis.cpp
The file was modifiedbolt/Passes/StackAllocationAnalysis.h
The file was modifiedbolt/Passes/FrameAnalysis.cpp
The file was modifiedbolt/Passes/StackPointerTracking.h
The file was modifiedbolt/Passes/CallGraphWalker.h
Commit 583790ee22f8334895e1fd05c79841e7fe3b4add by maks
Fix dynostats for conditional tail calls

Summary:
Don't treat conditional tail calls as branches for dynostats. Count
taken conditional tail calls as calls. Change SCTC to report dynamic
numbers after it is done.

(cherry picked from FBD5203708)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Passes/BinaryPasses.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit ea5306628750edf058ef2402b65cfbb2b8279436 by maks
[BOLT] Fix hfsort+ caching mechanism

Summary:
There's good news and bad news.

The good news is that this fixes the caching mechanism used by hfsort+ so that we always get the correct end results, i.e. the order is the same whether the cache is enabled or not.
The bad news is that it takes about the same amount of time as the original to run. (~6min)
The good news is that I can make some improvements on this implementation which I'll put up in another diff.

The problem with the old caching mechanism is that it was caching values that were dependent on adjacent sets of clusters.  It only invalidated the clusters being merged and none of other clusters that might have been affected.  This version computes the adjacency information up front and updates it after every merge, rather than recomputing it for each iteration.  It uses the adjacency data to properly invalidate any cached values.

(cherry picked from FBD5203023)
The file was modifiedbolt/Passes/CallGraph.h
The file was modifiedbolt/Passes/BinaryFunctionCallGraph.cpp
The file was modifiedbolt/Passes/HFSort.h
The file was modifiedbolt/Passes/PettisAndHansen.cpp
The file was modifiedbolt/Passes/HFSortPlus.cpp
The file was modifiedbolt/Passes/CallGraph.cpp
The file was modifiedbolt/Passes/HFSort.cpp
Commit eb63a0b295eac25fc11cde2f96533976254661b8 by maks
[BOLT] Expand BOLT report for basic block ordering

Summary:
Add a new positional option onto bolt: "-print-function-statistics=<uint64>"
which prints information about block ordering for the requested number of functions.

(cherry picked from FBD5105323)
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/BinaryFunction.cpp
Commit eeea415dd2ebb884ba952fcbd30c3105e7f32b7b by maks
[BOLT] Fix SCTC execution count assertion

Summary:
SCTC is currently asserting (my fault :-) when running in
combination with hot jump table entries optimization. This optimization
sets the frequency for edges connecting basic blocks it creates and jump
table targets based on the execution count of the original BB containing
the indirect jump.

This is OK as an estimation, but it breaks our assumption that the sum of
the frequencies of predecessor edges equals our BB frequency. This happens
because the frequency of the BB is rarely equal to the frequency of its
outgoing edges.

SCTC, in turn, was updating the execution count for BBs with tail calls
by subtracting the frequency count of predecessor edges. Because hot
jump table entries optimization broke the BB exec count = sum(preds freq)
invariant, SCTC was asserting.
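
A minimal sketch of the invariant described above. holdsInvariant() and
subtractClamped() are hypothetical helpers; clamping at zero is shown only
as one way to tolerate estimated edge counts and is not necessarily the
fix that was committed.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Returns true when a block's execution count equals the sum of the
// frequencies on its incoming (predecessor) edges.
bool holdsInvariant(uint64_t BlockCount,
                    const std::vector<uint64_t> &PredEdgeFreqs) {
  const uint64_t Sum = std::accumulate(PredEdgeFreqs.begin(),
                                       PredEdgeFreqs.end(), uint64_t{0});
  return Sum == BlockCount;
}

// Subtract an edge frequency from a block count without underflowing.
// Assumption: clamping to zero is acceptable when counts are estimates.
uint64_t subtractClamped(uint64_t BlockCount, uint64_t EdgeFreq) {
  return EdgeFreq > BlockCount ? 0 : BlockCount - EdgeFreq;
}
```

With exact profile data the invariant holds and plain subtraction is safe.
Once edge counts are copied from the execution count of the block holding
the indirect jump, the sum over predecessors can exceed the block count.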

To trigger this, the input program must have a jump table where each
entry contains a tail call. This happens in the HHVM binary for func
_ZN4HPHP11collections5issetEPNS_10ObjectDataEPKNS_10TypedValueE.

(cherry picked from FBD5222504)
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit f819f53d27413fad24723606a02410722a48ae21 by maks
Normalize Clusters Twice

Summary:
Normalize clusters twice, leaving edges connecting two
basic blocks untouched.

(cherry picked from FBD5207416)
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/Passes/ReorderAlgorithm.cpp
The file was modifiedbolt/BinaryBasicBlock.h
Commit dc4dd648002b15538acd2e393b5bdf72b27f2cfc by maks
[BOLT] More HFSort+ refactoring

Summary: Move most of hfsort+ into a class so the state can more easily be shared.

(cherry picked from FBD5216206)
The file was modifiedbolt/Passes/ReorderFunctions.cpp
The file was modifiedbolt/Passes/HFSort.h
The file was modifiedbolt/Passes/CallGraph.h
The file was modifiedbolt/Passes/HFSortPlus.cpp
The file was modifiedbolt/Passes/PettisAndHansen.cpp
Commit 37d0f81df500afb69c4fec71d8e49e5794624ee5 by maks
BinaryFunction.h: Clarify comment for getSize(), add getNumNonPseudos()

Summary: Minor fix and add new function

(cherry picked from FBD5270376)
The file was modifiedbolt/BinaryFunction.h
Commit 8233c7d204fcaf249f14a895b08426fad6ea8514 by maks
[BOLT] Bail frame analysis on PUSHes escaping vars

Summary:
Some PUSH instructions may contain memory addresses pushed to
the stack. If this memory address is from an object in the stack, cancel
further frame analysis for this function since it may be escaping a
variable.

This fixes a bug with deleting used stores (in frameopt) in hhvm trunk.

(cherry picked from FBD5270590)
The file was modifiedbolt/Passes/RegAnalysis.cpp
The file was modifiedbolt/Passes/StackPointerTracking.h
The file was modifiedbolt/Passes/FrameAnalysis.cpp
The file was modifiedbolt/Passes/FrameOptimizer.cpp
Commit 59e90f0f43cb38b4ac200b61111129a9e9a5c3f1 by maks
[BOLT] Make function reordering more robust with stale data.

Summary:
Rewrote the guts of buildCallGraph.  There are two new options to control how the CG is created.  UsePerfData controls whether we use the perf data directly to construct the CG for functions with a stale profile.  IgnoreRecursiveCalls omits recursive calls from the CG since they might be skewing results unfairly for heavily recursive functions.

I've changed the way BinaryFunction::estimateHotSize() works.  If the function is marked as split, I count the size of all the non-cold blocks.  This gives a different but more accurate answer than the old method.

I've improved and updated the CG build stats with extra information.

(cherry picked from FBD5224183)
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/Passes/BinaryFunctionCallGraph.cpp
The file was modifiedbolt/Passes/ReorderFunctions.cpp
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/Passes/BinaryFunctionCallGraph.h
Commit 3469396269318b409bcd4ffcae154fe5f6de8764 by maks
[BOLT] Set local symbols in relocation mode to zero

Summary:
Strobelight is getting confused by local symbols that we do not
update in relocation mode. These symbols were preserved by the linker in
relocation mode in order to support emitting relocations against local
labels, but they are unused.

Issue a quick fix to this by detecting such symbols and setting their
value to zero.

This patch also fixes an issue with the symbol table that was assigning
the wrong section index to symbols associated with the .text section.

(cherry picked from FBD5271277)
The file was modifiedbolt/RewriteInstance.cpp
Commit ec304396c3e5b57a84aa26a99b6de70eab3637e6 by maks
[BOLT] Call Distance Metric

Summary:
Designed a new metric, which shows 93.46% correlation with Cache Miss
and 86% correlation with CPU Time.

Definition:

For each function, consider every traversal path through its basic
blocks. Each traversal is assigned a distance that represents how far
apart its connected basic blocks are: walk the basic blocks of the
traversal one by one until the end, and sum up the distances between
neighboring basic blocks.
The distance between two connected basic blocks is the distance between
the centers of the two blocks in the binary file.
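
The definition above can be sketched as follows. Block, blockCenter() and
traversalDistance() are hypothetical names, and the sketch assumes a
traversal is already available as an ordered list of blocks with known
output addresses and sizes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// A basic block as placed in the binary: start address and size in bytes.
struct Block {
  uint64_t Address;
  uint64_t Size;
};

// Center of a block, measured as an address in the binary.
double blockCenter(const Block &B) { return B.Address + B.Size / 2.0; }

// Sum of center-to-center distances between neighboring blocks of one
// traversal. An empty or single-block traversal has distance 0.
double traversalDistance(const std::vector<Block> &Path) {
  double Total = 0.0;
  for (std::size_t I = 1; I < Path.size(); ++I)
    Total += std::fabs(blockCenter(Path[I]) - blockCenter(Path[I - 1]));
  return Total;
}
```

For blocks at addresses 0, 20 and 10, each 10 bytes long and visited in
that order, the centers are 5, 25 and 15, so the traversal distance is
20 + 10 = 30.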

(cherry picked from FBD5242526)
The file was modifiedbolt/CMakeLists.txt
The file was addedbolt/CalcCacheMetrics.h
The file was addedbolt/CalcCacheMetrics.cpp
The file was modifiedbolt/RewriteInstance.cpp
Commit 4ecd3856e9808e472026b8830c2f31f02a736861 by maks
[BOLT] Fix shrink-wrapping bugs

Summary:
Make shrink-wrapping more stable. Changes:

* Correctly detect landing pads at the dominance frontier, bailing
  on such cases because we are not prepared to split LPs that are target
  of a critical edge.
* Disable FOP's store removal by default - this is experimental and
  shouldn't go to prod because removing a store that is actually
  necessary, but that we failed to detect as such, is disastrous. This
  pass currently doesn't have a great impact on the number of stores
  reduced, so it is not a problem. Most stores reduced are due to shrink
  wrapping anyway.
* Fix stack access identification - correctly estimate memory length of
  weird instructions, bail if we don't know.
* Make rules for shrink-wrapping more strict: cancel shrink wrapping in
  a number of cases when we are not 100% sure that we are dealing with a
  regular callee-saved register.
* Add basic block folding to SW. Sometimes when splitting critical edges
  we create a lot of redundant BBs with the same instructions, same
  successor but different predecessor. Fold all identical BBs created by
  splitting critical edges.
* Change defaults: now the threshold used to determine when to perform
  SW is more conservative, to be sure we are moving a spill to a colder
  area. This effort, along with BB folding, helps us to avoid hurting
  icache performance by indiscriminately increasing code size.

(cherry picked from FBD5315086)
The file was modifiedbolt/Passes/ShrinkWrapping.cpp
The file was modifiedbolt/Passes/FrameOptimizer.cpp
The file was modifiedbolt/Passes/StackReachingUses.cpp
The file was modifiedbolt/Passes/ShrinkWrapping.h
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/Passes/StackReachingUses.h
The file was modifiedbolt/Passes/FrameAnalysis.cpp
Commit 4d34471eeb0ce8ebf8bc00fcca5ddfec18923205 by maks
[BOLT] Improved Jump-Distance Metric

Summary:
The existing Jump-Distance metric (previously named Call-Distance) ignores some traversals.
This modified version adds those missing traversals back.

The correlation remains the same: around 97% correlation with CPU and
Cache Miss (which implies that even though some traversals are ignored,
it doesn't affect correlation that much.)

(cherry picked from FBD5369653)
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/CalcCacheMetrics.cpp
Commit 4e29afeb1801973a8961271d0c37765ba5c0411d by maks
[BOLT] Add cold symbols to the symbol table

Summary:
Create new .symtab and .strtab sections, so we can change their
sizes and not only patch them. Remove local symbols and add symbols to
identify the cold part of split functions.

(cherry picked from FBD5345460)
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/RewriteInstance.h
Commit 6d845719ce08332e88a2f9183625781e0f26fb17 by maks
get analysis information of functions

Summary:
complete the StokeInfo pass,
ignore previous arc diff

(cherry picked from FBD5306863)
The file was addedbolt/Passes/StokeInfo.cpp
The file was addedbolt/Passes/StokeInfo.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/Passes/RegAnalysis.h
The file was modifiedbolt/Passes/CMakeLists.txt
The file was modifiedbolt/BinaryPassManager.cpp
Commit 70bad8d34db548ce459da5cc88da8ba4cc26b7ad by maks
add: get function score to find hot functions refine the dumped csv format

Summary: minor modification of the bolt stoke pass

(cherry picked from FBD5471011)
The file was modifiedbolt/Passes/StokeInfo.cpp
The file was modifiedbolt/Passes/StokeInfo.h
Commit 787db1cf3e895ec3868657b125a44936bf7c18fd by maks
Recognize AArch64 as a valid input

Summary:
BOLT needs to be configured with the LLVM
AArch64 backend. If the backend is linked into the LLVM
library, start processing AArch64 binaries.

(cherry picked from FBD5489369)
The file was modifiedbolt/RewriteInstance.cpp
Commit 87481cb4946601fe48b7117dc6284e14ce2c1be3 by maks
[BOLT] Improve Jump-Distance Metric -- Consider Function Execution Count

Summary:
Function execution count is very important. When calculating metric, we
should care more about functions which are known to be executed.

The correlation between this metric and CPU time is slightly improved
to be close to 96%, and the correlation between this metric and Cache Miss
remains the same at 96%.

Thanks to Sergey for the suggestion!

(cherry picked from FBD5494720)
The file was modifiedbolt/CalcCacheMetrics.cpp
Commit eb64d03b73662a4e63fa491df6f276287a5a8c50 by maks
Reformat the register strings in the output so Stoke can parse without preprocessing.

Summary:
Minor change. Reformat the def-in, live-out register strings so that Stoke can parse
without doing preprocessing.

(cherry picked from FBD5537421)
The file was modifiedbolt/Passes/StokeInfo.cpp
The file was modifiedbolt/Passes/StokeInfo.h
Commit d27b31ee07cd27743005c4815fc24a6c8f83124b by maks
[BOLT] Fix reading LSDA address for PIC code

Summary:
Fix a bug while reading LSDA address in PIC format. The base address was
wrong for PC-relative value. There's more work involved in making PIC
code with C++ exceptions work.

(cherry picked from FBD5538755)
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/RewriteInstance.cpp
Commit ae409f0b277bb2954e60b63ae13dfc3dd1f51d7c by maks
[BOLT] Better match LTO functions profile.

Summary:
* Improve profile matching for LTO binaries that don't match 100%.
* Fix profile matching for '.LTHUNK*' functions.
* Add external outgoing branches (calls) for profile validation.

There's an improvement for 100% match profile and for stale LTO
profile. However, we are still not fully closing the gap with
stale profile when LTO is enabled.

(NOTE: I haven't updated all test cases yet)

(cherry picked from FBD5529293)
The file was modifiedbolt/DataReader.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/Passes/BinaryFunctionCallGraph.cpp
The file was modifiedbolt/DataReader.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/BinaryFunction.cpp
Commit e4290d083fbe2693eb22b03c2d72a099e54f89cb by maks
[BOLT] Disable last basic block assertion.

Summary:
While converting code from __builtin_unreachable() we were asserting
that a basic block with a conditional jump and a single CFG successor
was the last one before converting the jump to an unconditional one.

However, if that code was executed after a conditional tail call
conversion in the same function, the original last basic block
will no longer be the last one in the post-conversion layout.

I'm disabling the assertion since it doesn't seem worth it to add
extra checks for the basic block that used to be the last one.

(cherry picked from FBD5570298)
The file was modifiedbolt/BinaryFunction.cpp
Commit b81ff8a8fcafeccc5c0466cba88344db2fbd8ba7 by maks
[BOLT] Fix SCTC issue with hot-cold split

Summary:
SCTC was deleting an unconditional branch to a block in the
cold area because it was the next block in the layout vector. Fix the
condition to only delete such branches when source and target are in
the same allocation area (either both hot or both cold).
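
A minimal sketch of the corrected condition, assuming the layout is a
vector of blocks with a cold flag. LayoutBlock and canDropJumpToNext() are
hypothetical names, not the ones used in BinaryPasses.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One entry of a function's block layout; IsCold marks blocks placed in
// the cold area after hot-cold splitting.
struct LayoutBlock {
  bool IsCold;
};

// An unconditional branch from Source to Target is redundant only if the
// target is the next block in the layout AND both blocks are emitted in
// the same area. Across the hot/cold boundary the blocks are not adjacent
// in memory, so the branch must stay.
bool canDropJumpToNext(const std::vector<LayoutBlock> &Layout,
                       std::size_t Source, std::size_t Target) {
  assert(Source < Layout.size() && Target < Layout.size());
  const bool TargetIsNext = Target == Source + 1;
  const bool SameArea = Layout[Source].IsCold == Layout[Target].IsCold;
  return TargetIsNext && SameArea;
}
```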

(cherry picked from FBD5570300)
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit 21c48f7d78249482cd0f8174ea20f76d9e628bdb by maks
Fix profiling for functions with multiple entry points

Summary:
Fix issue in memcpy where one of its entry points was getting
no profiling data and was wrongly considered cold, being put in the cold
region.

(cherry picked from FBD5569156)
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/DataReader.h
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/DataReader.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/RewriteInstance.cpp
Commit 0c07445110407d408f4f227070ae699c9486d5b4 by maks
[BOLT] Fix printing of dyno-stats

Summary:
We used to print dyno-stats after instruction lowering
which was skewing our metrics as tail calls were no longer
recognized as calls for one thing. The fix is to control
the point at which dyno-stats printing pass is run and run
it immediately before instruction lowering. In the future we
may decide to run the pass before some other intervening pass.

(cherry picked from FBD5605639)
The file was modifiedbolt/BinaryPassManager.h
The file was modifiedbolt/Passes/BinaryPasses.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryPassManager.cpp
Commit 49d1f5698d5cae27863927c5b06512f18b2740ae by maks
[BOLT] PLT optimization

Summary:
Add an option to optimize PLT calls:

  -plt  - optimize PLT calls (requires linking with -znow)
    =none - do not optimize PLT calls
    =hot  - optimize executed (hot) PLT calls
    =all  - optimize all PLT calls

When optimized, the calls are converted to use GOT reference
indirectly. GOT entries are guaranteed to contain a valid
function pointer if lazy binding is disabled - hence the
requirement for linker's -znow option.

Note: we can add an entry to .dynamic and drop a requirement
for -znow if we were moving .dynamic to a new segment.

(cherry picked from FBD5579789)
The file was addedbolt/Passes/PLTCall.cpp
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/RewriteInstance.cpp
The file was addedbolt/Passes/PLTCall.h
The file was modifiedbolt/Passes/CMakeLists.txt
Commit bd8e4b9e879ad6a084ea66fea697bce2ba6ca778 by maks
[BOLT] Support PIC-style exception tables

Summary:
Exception tables for PIC may contain indirect type references
that are also encoded using relative addresses.

This diff adds support for such encodings. We read the PIC-style
type info table, and write it using the new encoding.

(cherry picked from FBD5716060)
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/RewriteInstance.cpp
Commit ec5b3b0a655fbe01e4f4a08b54d70a2dd2959088 by maks
[BOLT] Fix bug in SCTC

Summary:
After SCTC optimization fixDoubleJumps() was relying on CFG information
on the number of successors of a basic block. It ignored the fact that
conditional tail call had a successor outside of the function and
deleted a containing basic block.

Discovered while testing old HHVM with disabled jump tables.

(cherry picked from FBD5752903)
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/BinaryPassManager.cpp
Commit 29d4f4cfac2891514548e93af94454668357d8e5 by maks
[BOLT] Ignore TLS relocations types

Summary:
No special handling is required for TLS relocation types,
and if we see them in the binary we can safely ignore those
types.

(cherry picked from FBD5853889)
The file was modifiedbolt/RewriteInstance.cpp
Commit 9df155ce116bdd5fe6d870ea9c307fd537fe6e98 by maks
[BOLT] Introduce non-LBR mode

Summary:
Add support to read profiles collected without LBR. This
involves adapting our data aggregator perf2bolt and adding support
in llvm-bolt itself to read this data.

This patch also introduces different options to convert basic block
execution count to edge count, so BOLT can operate with its regular
algorithms to perform basic block layout. The most successful approach
is the default one.

(cherry picked from FBD5664735)
The file was modifiedbolt/merge-fdata/merge-fdata.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/Passes/CMakeLists.txt
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/DataReader.cpp
The file was modifiedbolt/BinaryPassManager.h
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/DataReader.h
The file was modifiedbolt/BinaryFunction.h
Commit ef0ec9edf9c93e61119fb9af5b5c09a132c73484 by maks
[BOLT] Fix frameopt=all for gcc

Summary:
Fix two bugs. First, stack pointer tracking, the dataflow
analysis, was converging to the "superposition" state (meaning that at
this point there are multiple and conflicting states) too early in case
the entry state in the BB was "empty" AND there was an SP computation in
the block. In these cases, we need to propagate an "empty" value as well
and wait for an iteration where the input is not empty (only entry BBs
start with a non-empty well-defined value). Previously, it was
propagating "superposition", meaning there is a conflict of states in
this block, which is not true, since the input is empty and, therefore,
there is no preceding state to justify a collision of states.

Second, if SPT failed and has no idea about the stack values in a block
(if it is in the superposition state at a given point in a BB), shrink
wrapping should not attempt to insert computation into blocks where
we do not understand what is happening. Fix it to bail on those
cases.
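
The first fix can be illustrated with a small three-state lattice. This is
a sketch under stated assumptions: SPState, join() and applyAdjustment()
are hypothetical names, and the real analysis in StackPointerTracking.h
tracks more than a single offset.

```cpp
#include <cassert>
#include <cstdint>

// Three-level state for the stack pointer at a program point.
struct SPState {
  enum Kind { Empty, Known, Superposition } K;
  int64_t Offset; // meaningful only when K == Known
};

// Merge states arriving from two predecessors. Empty is the identity:
// it carries no information and therefore cannot cause a conflict.
SPState join(const SPState &A, const SPState &B) {
  if (A.K == SPState::Empty)
    return B;
  if (B.K == SPState::Empty)
    return A;
  if (A.K == SPState::Known && B.K == SPState::Known && A.Offset == B.Offset)
    return A;
  return SPState{SPState::Superposition, 0};
}

// Transfer function for an SP computation such as "sub $8, %rsp".
// An Empty input stays Empty (wait for a later iteration with real
// input) instead of being turned into Superposition.
SPState applyAdjustment(const SPState &In, int64_t Delta) {
  if (In.K != SPState::Known)
    return In;
  return SPState{SPState::Known, In.Offset + Delta};
}
```

Keeping Empty distinct from Superposition lets the fixed-point iteration
defer a block until one of its predecessors has produced a well-defined
value.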

(cherry picked from FBD5858402)
The file was modifiedbolt/Passes/ShrinkWrapping.cpp
The file was modifiedbolt/Passes/StackPointerTracking.h
Commit b006d2a8604daf4f3bdb29513bec42642bcea160 by maks
[BOLT] Fix issue with exception handlers splitting

Summary:
A cold part of a function can start with a landing pad. As a
result, this landing pad will have offset 0 from the start
of the corresponding FDE, and it wouldn't get registered by
exception-handling runtime.

The solution is to use a different landing pad base address
(LPStart), such as (FDE_start - 1).
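
A minimal sketch of why the base address matters. In the LSDA call-site
table a landing pad value of 0 means "no landing pad", so a pad located
exactly at LPStart cannot be expressed. landingPadOffset() and
isRegistered() are hypothetical helpers for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Offset stored in a call-site record: landing pad address relative to
// the landing pad base (LPStart).
uint64_t landingPadOffset(uint64_t LPStart, uint64_t LandingPad) {
  assert(LandingPad >= LPStart && "landing pad precedes LPStart");
  return LandingPad - LPStart;
}

// The runtime treats an offset of 0 as "this call site has no landing pad".
bool isRegistered(uint64_t Offset) { return Offset != 0; }
```

With LPStart equal to the FDE start, a landing pad at the very beginning
of a cold fragment gets offset 0 and is ignored. With LPStart set to
FDE_start - 1, the same landing pad gets offset 1 and is seen by the
runtime.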

(cherry picked from FBD5876561)
The file was modifiedbolt/Exceptions.cpp
Commit 156fc73157284b69817127fba9f41b6f52c20cfc by maks
[BOLT] Fix SCTC bug

Summary:
If conditional branch has been converted to conditional tail call,
it may be considered for SCTC optimization later since it will
appear as a tail call. We have to make sure that the tail call
we are considering is not a conditional branch.

(cherry picked from FBD5884777)
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit 42f957bb75d9cd755dd7a1040d82550991e38763 by maks
[BOLT] Integrate perf2bolt into llvm-bolt

Summary:
Move the data aggregator logic from our python script to
our C++ LLVM/BOLT libs. This dramatically reduces processing
time for profiling data (from 45 minutes for HHVM to 5 minutes) because
we directly use BOLT as a disassembler in order to validate traces found
in the LBR and to add the fallthrough counts. Previously, the python
approach relied on parsing the output of objdump to check traces.

(cherry picked from FBD5761313)
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/llvm-bolt.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/CMakeLists.txt
The file was addedbolt/DataAggregator.h
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/BinaryFunction.h
The file was addedbolt/DataAggregator.cpp
The file was modifiedbolt/DataReader.cpp
The file was modifiedbolt/DataReader.h
Commit aa05dc91c51d5b9acac3cf1c0f43ef386fd44e9f by maks
Fix SCTC bug when two pred/succ BB are in a loop.

Summary: It's possible that two basic blocks being considered for SCTC are in a loop in the CFG.  In this case a block that is both a predecessor and a successor may have been processed and marked invalid by a previous iteration of the SCTC loop. We should skip rewriting in this case.

(cherry picked from FBD5886721)
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit f32784f4cb410514ae1f57a6b43dc3fbcb9cba7f by maks
[BOLT] Ignore Clang LTO artifact file symbol

Summary:
The presence of the ld-temp.o symbol is somewhat nondeterministic.
I couldn't find out exactly when it's generated; it could be
related to LTO vs ThinLTO, but not always.

If the symbol is there, it could affect the names of most
functions in an LTO binary. The status of the symbol
may change between the binary the profile was collected on,
and the binary BOLT is called on. As a result, we may mismatch
many function names.

It is safe to ignore this symbol.

(cherry picked from FBD5908955)
The file was modifiedbolt/RewriteInstance.cpp
Commit f02c8c29ee578a52b9710da2315e352f4036c3d3 by maks
[PERF2BOLT] Improve user messages about profiling stats

Summary:
Improve messages and color-code bad traces percentage, warning
user about a potential input binary mismatch.

(cherry picked from FBD5915934)
The file was modifiedbolt/DataAggregator.cpp
Commit 9df6dce2348d2e4976207af8e4cb3f0a29c1a7a2 by maks
[PERF2BOLT] Fix aggregator wrt new output format of perf

Summary:
Perf is now outputting one less space, which broke our previous
(flaky) assumptions about field separators when processing the output
file. Make it more resilient by accepting any number of spaces before
reading LBR entries.
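
A minimal sketch of separator-tolerant field splitting, assuming fields
are separated by runs of one or more spaces. splitFields() is a
hypothetical helper and does not reproduce the parser in
DataAggregator.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split a line into fields, accepting any number of spaces before and
// between fields instead of assuming a fixed separator width.
std::vector<std::string> splitFields(const std::string &Line) {
  std::vector<std::string> Fields;
  std::size_t Pos = 0;
  while (Pos < Line.size()) {
    // Skip the whole run of spaces in front of the next field.
    while (Pos < Line.size() && Line[Pos] == ' ')
      ++Pos;
    if (Pos == Line.size())
      break;
    std::size_t End = Line.find(' ', Pos);
    if (End == std::string::npos)
      End = Line.size();
    Fields.push_back(Line.substr(Pos, End - Pos));
    Pos = End;
  }
  return Fields;
}
```

Because leading and repeated spaces are skipped, the same code handles
both the old and the new spacing of the input.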

(cherry picked from FBD6014941)
The file was modifiedbolt/DataAggregator.cpp
Commit f77a6acd7192947310ade878c932ec7fe05c8382 by maks
fixing sizes

Summary: In some (weird) cases, a Function is marked 'split' but doesn't contain any 'cold' basic block. In that case, the size of the last basic block of the function is computed incorrectly. Hence, this fix.

(cherry picked from FBD6012963)
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryBasicBlock.h
Commit 0ed144a188d042fe729f135555be8a5c14962324 by maks
[PERF2BOLT] Check build-ids of binaries when aggregating

Summary:
Check the build-id of the input binary against the build-id of
the binary used during profiling data collection with perf, as reported
in perf.data. If they differ, issue a warning, since the user should use
exactly the same binary. If we cannot determine the build-id of either
the input binary or the one registered in the input perf.data, cancel the
build-id check but print a log message.
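
The decision logic above can be sketched as follows, assuming build-ids
are passed as hex strings and an empty string means the id could not be
determined. BuildIDCheck, checkBuildID() and the message texts are
hypothetical and not taken from DataAggregator.cpp.

```cpp
#include <cassert>
#include <iostream>
#include <string>

enum class BuildIDCheck { Match, Mismatch, Skipped };

// Compare the build-id of the input binary with the one recorded in
// perf.data. An empty string stands for "could not be determined".
BuildIDCheck checkBuildID(const std::string &BinaryID,
                          const std::string &ProfileID) {
  if (BinaryID.empty() || ProfileID.empty()) {
    // Not enough information: cancel the check but leave a log message.
    std::cerr << "info: build-id unavailable, skipping build-id check\n";
    return BuildIDCheck::Skipped;
  }
  if (BinaryID != ProfileID) {
    // Warn only; aggregation continues with a possibly mismatched binary.
    std::cerr << "warning: build-id of input binary differs from the one "
                 "recorded in perf.data\n";
    return BuildIDCheck::Mismatch;
  }
  return BuildIDCheck::Match;
}
```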

(cherry picked from FBD6001917)
The file was modifiedbolt/DataAggregator.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/DataAggregator.cpp
Commit 0cc2a62f6a2f3b28922e0acebded492e92058244 by maks
[BOLT] Write bolt info according to ELF spec

Summary:
Follow ELF spec for NOTE sections when writing bolt info.
Since tools such as "readelf -n" will not recognize a custom code
identifying our new note section, we use GNU "gold linker version"
note, tricking readelf into printing bolt info.

(cherry picked from FBD6010153)
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/RewriteInstance.h
Commit 7689cf2417c5b0d42da7bd2fcc3271d7f9897a6e by maks
[BOLT] Fix bolt_info ELF note

Summary:
Small fix - align the end of the descriptor string as well,
since readelf will detect when it is not aligned and print an error
instead of printing BOLT version and command line.
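
A minimal sketch of the ELF note layout with both the name and the
descriptor padded to a 4-byte boundary. buildNote() and its helpers are
hypothetical, the encoding is assumed to be little-endian, and the
descriptor size is assumed to be the raw string length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Note type that readelf prints as "gold linker version" (value from elf.h).
const uint32_t NT_GNU_GOLD_VERSION = 4;

// Append a 32-bit word in little-endian byte order.
static void appendWord(std::vector<uint8_t> &Out, uint32_t V) {
  for (int I = 0; I < 4; ++I)
    Out.push_back(static_cast<uint8_t>(V >> (8 * I)));
}

// Pad with zero bytes up to the next 4-byte boundary.
static void padTo4(std::vector<uint8_t> &Out) {
  while (Out.size() % 4 != 0)
    Out.push_back(0);
}

// Layout: namesz, descsz, type, name (NUL-terminated, padded),
// descriptor (padded).
std::vector<uint8_t> buildNote(const std::string &Name, const std::string &Desc,
                               uint32_t Type) {
  std::vector<uint8_t> Out;
  appendWord(Out, static_cast<uint32_t>(Name.size() + 1)); // counts the NUL
  appendWord(Out, static_cast<uint32_t>(Desc.size()));
  appendWord(Out, Type);
  Out.insert(Out.end(), Name.begin(), Name.end());
  Out.push_back(0);
  padTo4(Out);
  Out.insert(Out.end(), Desc.begin(), Desc.end());
  padTo4(Out); // the end of the descriptor must be aligned as well
  return Out;
}
```

Calling buildNote("GNU", Info, NT_GNU_GOLD_VERSION), where Info holds the
version and command line text, yields a note whose total size is always a
multiple of 4.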

(cherry picked from FBD6023643)
The file was modifiedbolt/RewriteInstance.cpp
Commit 3d3fefff46721d9a24d9f93c06e9cd4a38b98c62 by maks
[BOLT] Use 32 as the default max bytes for function alignment

Summary:
Several benchmarks (hhvm, compilers) show that 32 provides a good
balance between I-Cache performance and iTLB misses.

(cherry picked from FBD6026476)
The file was modifiedbolt/RewriteInstance.cpp
Commit 1605f07f5c55a2c8a46a57e67287c94468a4b593 by maks
[BOLT] Create symbol table entries under -hot-text if they did not exist

Summary:
If the "-hot-text" option is specified and the input binary did not
have __hot_start/__hot_end symbols, then add them to the symbol table.

(cherry picked from FBD6027737)
The file was modifiedbolt/RewriteInstance.cpp
Commit bee9132a54b48deb5932530ee660b5314664a8c7 by maks
[BOLT] Change function order file format for linker script

Summary:
Change output of "-generate-function-order=<file>" to match expected
format used for a linker script:

  * Prefix function names with ".text".
  * Strip internal suffix from local function names. E.g. for a function
    with names "foo/1" and "foo/foo.c/1" we will only output "foo".
  * Output (with indentation) duplicate names for folded functions.
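
The first two rules can be illustrated with a short Python sketch. The function name is hypothetical, and the "." separator after ".text" is an assumption based on the usual per-function section naming; the commit message only says that names are prefixed with ".text".

```python
def link_section_name(symbol: str) -> str:
    """Map a BOLT function name to a linker-script section name (sketch)."""
    # Keep only the part before the first "/" to drop the internal suffix
    # that disambiguates local functions ("foo/1", "foo/foo.c/1").
    base = symbol.split("/", 1)[0]
    # Assumed separator: ".text." followed by the function name.
    return ".text." + base
```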

(cherry picked from FBD6071020)
The file was modifiedbolt/Passes/ReorderFunctions.cpp
Commit 4c8f48be3d3b7968be47dffe7103c0c694730aa6 by maks
[BOLT] Fix function order output option

Summary:
Add support to output both function order and section order files
as the former is useful for offloading function sorting and
the latter is useful for linker script generation:

  -generate-function-order=<file>
  -generate-link-sections=<file>

(cherry picked from FBD6078446)
The file was modifiedbolt/Passes/ReorderFunctions.cpp
Commit b77172ce2f42769631f24164a80c6b4453ebe42d by maks
updating cache metrics

Summary:
This is a replacement of a previous diff. The implemented metric
('graph distance') is not very useful at the moment but I plan to add
more relevant metrics in the subsequent diff. This diff fixes some
obvious problems and moves the call of CalcMetrics::printAll to the
right place.

(cherry picked from FBD6072312)
The file was modifiedbolt/RewriteInstance.cpp
The file was removedbolt/CalcCacheMetrics.cpp
The file was removedbolt/CalcCacheMetrics.h
The file was modifiedbolt/CMakeLists.txt
The file was addedbolt/CacheMetrics.cpp
The file was addedbolt/CacheMetrics.h
Commit 1e1833c8a2c51041cbf2511a8fecc340ecbdcf2e by maks
[BOLT][Refactoring] Make CTC first class operand, etc.

Summary:
This diff is a preparation for decoupling function disassembly,
profile association, and CFG construction phases.

We used to have multiple ways to mark conditional tail calls with
annotations or TailCallOffsets map. Since CTC information affects
correctness, it is justifiable to have it as an operand class for
instructions with a destination (0 is a valid one).

"Offset" annotation now replaces "EdgeCountData" and
"IndirectBranchData" annotations to extract profile data for any
given instruction.

Inlining for small functions was broken in the presence of
profiled (annotated) instructions and hence I had to remove
"-inline-small-functions" from the test case.

Also fix an issue with UNDEF section for created __hot_start/__hot_end
symbols. Now the symbols use ABS section.

(cherry picked from FBD6087284)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Passes/BinaryFunctionCallGraph.cpp
The file was modifiedbolt/Passes/Inliner.cpp
Commit 2ab74723291a7573cde1c867eb306c0d8419ee7b by maks
[BOLT] Account for FDE functions when calculating max function size

Summary:
When we calculate maximum function size we only used to rely on the
symbol table information, and ignore function info coming from FDEs.
Invalid maximum function size can lead to code emission over
the code of a neighbouring function.

Fix this by considering FDE functions when determining the maximum
function size.
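
A minimal sketch of the idea, in Python: the maximum size of a function is the distance to the next known function start, where starts come from both the symbol table and the FDEs. The function and parameter names are hypothetical, and treating the end of the containing region as the limit for the last function is an assumption.

```python
def max_function_sizes(symtab_starts, fde_starts, region_end):
    """Map each function start to the distance to the next known start."""
    # Merge starts from both sources so FDE-only functions also bound sizes.
    starts = sorted(set(symtab_starts) | set(fde_starts))
    sizes = {}
    for i, start in enumerate(starts):
        next_start = starts[i + 1] if i + 1 < len(starts) else region_end
        sizes[start] = next_start - start
    return sizes
```

With symbol-table starts 0x1000 and 0x1100 and an FDE-only function at 0x1040, the function at 0x1000 is limited to 0x40 bytes instead of 0x100, so emitted code cannot overwrite the function at 0x1040.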

(cherry picked from FBD6025613)
The file was modifiedbolt/RewriteInstance.cpp
Commit c58996fd559f282988432f95adc177a6b55781e1 by maks
[BOLT] Add ability to specify custom printers for annotations.

Summary:
This will give us the ability to print annotations in a more meaningful way.  Especially annotations that could be interpreted in multiple ways.  I've added one register name printer for liveness analysis.  We can update the other dataflow annotations as needed.

I also noticed that BitVector annotations were leaking since they contain heap allocated memory.  I made removeAnnotation call the annotation destructor explicitly to mitigate this but it won't fix the problem when annotations are just dropped en masse.

(cherry picked from FBD6105999)
The file was modifiedbolt/Passes/LivenessAnalysis.h
The file was modifiedbolt/Passes/DataflowAnalysis.h
The file was modifiedbolt/Passes/DataflowAnalysis.cpp
Commit 61e5fbf8c34e0806f7cb0165ab385bbbb5c6d81e by maks
[BOLT][Refactoring] Get rid of TailCallTerminatedBlocks, etc.

Summary:
More changes to allow separation of CFG construction and
profile assignment. Misc cleanups.

(cherry picked from FBD6158653)
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/DataReader.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit 244a476a2e53cbffcccb337daec23ad414f40bf4 by maks
using offsets for CG

Summary: Arc->AvgOffset can be used for function/block ordering to distinguish between calls from the beginning of a function and calls from the end of the function. This makes a difference for large functions.

(cherry picked from FBD6094221)
The file was modifiedbolt/Passes/BinaryFunctionCallGraph.h
The file was modifiedbolt/Passes/ReorderFunctions.cpp
The file was modifiedbolt/Passes/CallGraph.cpp
The file was modifiedbolt/Passes/CallGraph.h
The file was modifiedbolt/Passes/BinaryFunctionCallGraph.cpp
Commit 1288c81c9b5eaa5a3e87c67908f2904e56fea919 by maks
[BOLT][Refactoring] Change landing pads handling

Summary: Change the way we store and handle landing pads and throwers.

(cherry picked from FBD6169992)
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/BinaryFunction.cpp
Commit 9e42885d045eb09d316363bc5d74e9ad9cd9382f by maks
[BOLT] Add value profiling to BOLT

Summary:
Add support for reading value profiling info from perf data.  This diff adds support in DataReader/DataAggregator for value profiling data.  Each event is recorded as two Locations (a PC and an address/value) and a count.

For now, I'm assuming that the value profiling data is in the same file as the usual BOLT profiling data.  Collecting both at the same time seems to work.

(cherry picked from FBD6076877)
The file was modifiedbolt/DataReader.h
The file was modifiedbolt/DataReader.cpp
The file was modifiedbolt/merge-fdata/merge-fdata.cpp
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/DataAggregator.cpp
The file was modifiedbolt/DataAggregator.h
Commit 46866f5fa0248520a2da699e897ea2d44cb2b523 by maks
[BOLT] Refactor branch analysis code.

Summary:
Move the indirect branch analysis code from BinaryFunction to MCInstrAnalysis/X86MCTargetDesc.cpp.

In the process of doing this, I've added an MCRegInfo to MCInstrAnalysis which allowed me to remove a bunch of extra method parameters.  I've also had to refactor how BinaryFunction held on to instructions/offsets so that it would be easy to pass a sequence of instructions to the analysis code (rather than a map keyed by offset).

Note: I think there are a bunch of MCInstrAnalysis methods that have a BitVector output parameter that could be changed to a return value since the size of the vector is based on the number of registers, i.e. from MCRegisterInfo.  I haven't done this in order to keep the diff a more manageable size.

(cherry picked from FBD6213556)
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/Passes/ShrinkWrapping.cpp
The file was modifiedbolt/Passes/RegAnalysis.cpp
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/Passes/ReachingDefOrUse.h
The file was modifiedbolt/Passes/StackPointerTracking.h
The file was modifiedbolt/Passes/StokeInfo.cpp
The file was modifiedbolt/Passes/FrameAnalysis.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Passes/LivenessAnalysis.h
Commit e838b354ce21bce707c315d06bc89f3e07aac7ce by maks
[BOLT][Refactoring] Move basic block reordering to BinaryPasses

Summary:
Refactor basic block reordering code out of the BinaryFunction.

BinaryFunction::isSplit() is now checking if the first and the last
blocks in the layout belong to the same fragment. As a result,
it no longer returns true for functions that have their cold part
optimized away.
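
The new check can be pictured with the Python sketch below. It models the layout as a list of per-block "is cold" flags, which is an assumption for illustration and not the BinaryFunction data structure.

```python
def is_split(layout_is_cold):
    """layout_is_cold: one boolean per basic block, in layout order."""
    # An empty layout cannot be split.
    if not layout_is_cold:
        return False
    # Split means the first and last blocks live in different fragments.
    return layout_is_cold[0] != layout_is_cold[-1]
```

A function whose cold blocks were all removed has a layout of hot blocks only, so the check returns false for it.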

Change type for returned "size" from unsigned to size_t.

Fix lines over 80 characters long.

(cherry picked from FBD6250825)
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/Passes/BinaryPasses.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit 0b967eb01262e88d30dfbcb9a7074f1083b4398e by maks
[BOLT] Always call fixBranches in relocation mode.

Summary:
If you attempted to use a function filter on a binary when in relocation mode, the resulting binary would probably crash.  This is because we weren't calling fixBranches on all functions.  This was breaking bughunter.sh.

I also strengthened the validation of basic blocks.  The cond branch should always be non-null when there are two successors.

(cherry picked from FBD6261930)
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit 848cb78080ad5c07515745c8dc90a884451cc2c7 by maks
[BOLT] Fix BOLT build

Summary: The latest change to MCInstrAnalysis broke the clang build.  This fixes it.

(cherry picked from FBD6262308)
The file was modifiedbolt/Passes/HFSortPlus.cpp
The file was modifiedbolt/Passes/BinaryPasses.h
Commit 19fea927927634b255176d070b76aebf31cb909e by maks
improving hfsort+ algorithm

Summary:
A few improvements for hfsort+ algorithm. The goal of the diff is (i) to make the resulting function order more i-cache "friendly" and (ii) fix a bug with incorrect input edge weights. A specific list of changes is as follows:
- The "samples" field of CallGraph.Node should be at least the sum of incoming edge weights. Fixed with a new method CallGraph::adjustArcWeights()
- A new optimization pass for hfsort+ in which pairs of functions that call each other with very high probability (>=0.99) are always merged. This improves the resulting i-cache but may worsen i-TLB. See a new method HFSortPlus::runPassOne()
- Adjusted optimization goal to make the resulting ordering more i-cache "friendly", see HFSortPlus::expectedCalls and HFSortPlus::mergeGain
- Functions w/o samples are now reordered too (they're placed at the end of the list of hot functions). These functions do appear in the call graph, as some of their basic blocks have samples in the LBR dataset. See HFSortPlus::initializeClusters
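
The invariant from the first item (a node's samples are at least the sum of its incoming edge weights) can be sketched in Python as follows. This shows one way to enforce it, by raising the sample count; whether CallGraph::adjustArcWeights() raises samples or rescales arc weights is not stated above, so treat the choice here as an assumption. The dictionary-based graph representation is also hypothetical.

```python
def enforce_samples_invariant(samples, arcs):
    """samples: {node: count}; arcs: {(src, dst): weight}. Returns new samples."""
    # Sum the weights of all arcs entering each node.
    incoming = {}
    for (_, dst), weight in arcs.items():
        incoming[dst] = incoming.get(dst, 0) + weight
    # Raise each node's samples to at least its incoming weight.
    return {node: max(count, incoming.get(node, 0))
            for node, count in samples.items()}
```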

(cherry picked from FBD6248850)
The file was modifiedbolt/Passes/CallGraph.h
The file was modifiedbolt/Passes/CallGraph.cpp
The file was modifiedbolt/Passes/HFSortPlus.cpp
The file was modifiedbolt/Passes/ReorderFunctions.cpp
The file was modifiedbolt/Passes/HFSort.h
Commit fe6e9b4ab5b439b41c8e0c215529b4a8218b4778 by maks
[BOLT-AArch64] Support rewriting bzip2

Summary:
Add basic AArch64 read/write capability to be able to
disassemble bzip2 for AArch64 compiled with gcc 5.4.0 and write
it back after going through the basic BOLT pipeline with no block
reordering (NOPs/unreachable blocks get removed).

This is not for relocation mode.

(cherry picked from FBD5701994)
The file was modifiedbolt/Passes/StokeInfo.cpp
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit 76d7740cc9a0d7828cfed55f3fc95937821b5cbc by maks
[BOLT-AArch64] Support reordering bzip2 no relocs

Summary:
Add functionality to support reordering bzip2 compiled to
AArch64, with function splitting but without relocations:

* Expand the AArch64 backend to support inverting branches and
analyzing branches so BOLT reordering machinery is able to shuffle
blocks and fix branches correctly;
* Add a new pass named LongJmp to add stubs whenever code needs to
jump to the cold area, when using function splitting, because of the
limited target encoding capability in AArch64 (as a RISC architecture).

(cherry picked from FBD5748184)
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was addedbolt/Passes/LongJmp.cpp
The file was modifiedbolt/Passes/CMakeLists.txt
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was addedbolt/Passes/LongJmp.h
The file was modifiedbolt/RewriteInstance.cpp
Commit 624b2d984a9e2c1bcd9f35775da62c8be476b628 by maks
[BOLT-AArch64] Support relocation mode for bzip2

Summary:
As we deal with incomplete addresses in address-computing
sequences of code in AArch64, we found it is easier to handle them in
relocation mode in the presence of relocations.

Incomplete addresses may mislead BOLT into thinking there are
instructions referring to a basic block when, in fact, this may be the
base address of a data reference. If the relocation is present, we can
easily spot such cases.

This diff contains extensions in relocation mode to understand and
deal with AArch64 relocations. It also adds code to process data inside
functions as marked by AArch64 ABI (symbol table entries named "$d").
In our code, this is called constant islands handling. Last, it extends
bughunter with a "cross" mode, in which the host generates the binaries
and the user tests them (uploading to the target), useful when debugging
in AArch64.

(cherry picked from FBD6024570)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Passes/LongJmp.cpp
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/Passes/LongJmp.h
Commit 69ddcfa5cb277da2c818f0a888af0c573d820831 by maks
[BOLT] Fix implementation for TSP solution

Summary:
Fix a bug in reconstruction of an optimal path. When calculating the
best path we need to take into account a path from new "last" node
to the current last node.

Add "-tsp-threshold" (defaults to 10) to control when the TSP
algorithm should be used.

(cherry picked from FBD6253461)
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/Passes/ReorderAlgorithm.cpp
The file was modifiedbolt/Passes/BinaryPasses.h
The file was modifiedbolt/BinaryBasicBlock.h
Commit dd6ecdd782408de3e6e2b8b2735ddd8d92e358dd by maks
[BOLT-AArch64] Support reordering spec06 gcc relocs

Summary:
Enhance the basic infrastructure for relocation mode for
AArch64 to make a reasonably large program work after reordering (gcc).

Detect jump table patterns and skip optimizing functions with jump
tables in AArch64, as those will require extra future effort to fully
decode. To make these work in relocation mode, we skip changing
the function body and introduce a mode to preserve even the original
nops. By not changing any local offsets in the function, the input
original jump tables should just work.

Functions with no jump tables are optimized with BB reordering. No other
optimizations have been tested.

(cherry picked from FBD6130117)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/Passes/LongJmp.cpp
The file was modifiedbolt/BinaryContext.h
Commit a0c041f72a56bd79ccbc99f914a8bf86895b836e by maks
[BOLT] Custom function alignment

Summary:
A new 'compact' function aligner that takes function sizes into consideration. The approach is based on the following assumptions:
-- It is not desirable to introduce a large offset when aligning short functions, as it leads to a lot of "wasted" address space.
-- For longer functions, the offset can be larger than the default 32 bytes; however, using 64 bytes for the offset still worsens performance, as again a lot of address space is wasted.
-- Cold parts of functions can still use the default max-32 offset.

The algorithm is switched on/off by flag 'use-compact-aligner' and is controlled by parameters align-functions-max-bytes and align-cold-functions-max-bytes described above. In my tests the best performance is produced with '-use-compact-aligner=true -align-functions-max-bytes=48 -align-cold-functions-max-bytes=32'.

(cherry picked from FBD6194092)
The file was modifiedbolt/BinaryPassManager.cpp
The file was addedbolt/Passes/Aligner.h
The file was modifiedbolt/BinaryFunction.h
The file was addedbolt/Passes/Aligner.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Passes/CMakeLists.txt
Commit f8e6f66c1e95e3dbe00df1d25ab53112b4f86635 by maks
[BOLT] Fix segfault in debug print

Summary:
With the "-debug" flag we dump functions in an intermediate state when
the basic block list is initialized, but the layout is not. In the new isSplit()
function we were checking size(), which uses the basic block list,
and then we were accessing the (uninitialized) layout.
Instead of checking size() we should be checking layout_size().

(cherry picked from FBD6277770)
The file was modifiedbolt/BinaryFunction.h
Commit e9aa6e1a33179c809414cef9b549dd7bdacf0176 by maks
[BOLT] Fix N-1'th sctc bug.

Summary:
The logic to append an unconditional branch at the end of a block that had
the condition flipped on its conditional tail was broken.  It should have
been looking at the successor to PredBB instead of BB.  It also wasn't skipping
invalid blocks when finding the fallthrough block.

This fixes the SCTC bug uncovered by @spupyrev's work on block reordering.

(cherry picked from FBD6269493)
The file was modifiedbolt/Passes/BinaryPasses.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit a3b719e0f9fbcb47b6e6d3827c18a69cf364eebf by maks
[BOLT] Fix ASAN bugs

Summary:
Fix a leak in DWARFRewriter.cpp and an address out of bounds
issue in edit distance calculation.

(cherry picked from FBD6290026)
The file was modifiedbolt/DWARFRewriter.cpp
The file was modifiedbolt/DebugData.h
The file was modifiedbolt/BinaryFunction.cpp
Commit 7eaaaaba96fc7dcfbcf448e0c2dfe74d959518c9 by maks
[BOLT] Add finer control of peephole pass.

Summary: Add selective control over peephole options.  This makes it easier to test which ones might have a positive effect.

(cherry picked from FBD6289659)
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit 0836fa7d086837106f1a6e3055cd56d32fae410e by maks
[BOLT] Fix handling of RememberState CFI

Summary:
When RememberState CFI happens to be the last CFI in a basic block, we
used to set the state of the next basic block to a CFI prior to
executing RememberState instruction. This contradicts comments in
annotateCFIState() function and also differs from the behaviour of
getCFIStateAtInstr(). As a result we were getting code like the
following:

  .LBB0121166 (21 instructions, align : 1)
    CFI State : 0
    ....
      0000001a:   !CFI    $1      ; OpOffset Reg6 -16
      0000001a:   !CFI    $2      ; OpRememberState
    ....
    Successors: .Ltmp4167600, .Ltmp4167601
    CFI State: 3

  .Ltmp4167601 (13 instructions, align : 1)
    CFI State : 2
    ....

Notice that the state at the entry of the 2nd basic block is less than
the state at the exit of the previous basic block.

In practice we have never seen basic blocks where RememberState was the
last CFI instruction in the basic block, and hence we've never run into
this issue before.

The fix is a synchronization of handling of last RememberState
instruction by annotateCFIState() and getCFIStateAtInstr().
In the example above, the CFI state at the entry to the second BB will
be 3 after this diff.

(cherry picked from FBD6314916)
The file was modifiedbolt/BinaryFunction.cpp
Commit 1475c4da7116bd2cec7eeb9b12136c439d47fadc by maks
speeding up caches for hfsort+

Summary:
When running hfsort+, we invalidate too many cache entries, which leads to inefficiencies. It seems we only need to invalidate cache for pairs of clusters (Into, X) and (X, Into) when modifying cluster Into (for all clusters X).
With the modification, we do not really need ShortCache, since it is computed only once per pair of clusters.
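
A sketch of the narrower invalidation, in Python. The class and method names are hypothetical and the cached value is an arbitrary per-pair result; the point is only that modifying cluster Into drops the entries (Into, X) and (X, Into) and nothing else.

```python
class PairCache:
    """Cache of per-pair results, keyed by (cluster_a, cluster_b)."""

    def __init__(self):
        self._values = {}

    def get(self, a, b, compute):
        # Compute the value once per ordered pair and reuse it afterwards.
        key = (a, b)
        if key not in self._values:
            self._values[key] = compute(a, b)
        return self._values[key]

    def invalidate(self, into):
        # Drop only pairs that involve the modified cluster.
        for key in [k for k in self._values if into in k]:
            del self._values[key]

    def __contains__(self, key):
        return key in self._values
```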

(cherry picked from FBD6341039)
The file was modifiedbolt/Passes/HFSort.h
The file was modifiedbolt/Passes/HFSortPlus.cpp
The file was modifiedbolt/Passes/ReorderFunctions.cpp
Commit c4d7460ed6d61febbe21a51da9b340d18e44aaaf by maks
[BOLT] Improve ICP for virtual method calls and jump tables using value profiling.

Summary:
Use value profiling data to remove the method pointer loads from vtables when doing ICP at virtual function and jump table callsites.

The basic process is the following:
1. Work backwards from the callsite to find the most recent def of the call register.
2. Work back from the call register def to find the instruction where the vtable is loaded.
3. Find out if there is any value profiling data associated with the vtable load.  If so, record all these addresses as potential vtables + method offsets.
4. Since the addresses extracted by #3 will be vtable + method offset, we need to figure out the method offset in order to determine the actual vtable base address.  At this point I virtually execute all the instructions that occur between #3 and #2 that touch the method pointer register.  The result of this execution should be the method offset.
5. Fetch the actual method address from the appropriate data section containing the vtable using the computed method offset.  Make sure that this address maps to an actual function symbol.
6. Try to associate a vtable pointer with each target address in SymTargets.  If every target has a vtable, then this is almost certainly a virtual method callsite.
7. Use the vtable address when generating the promoted call code.  It's basically the same as regular ICP code except that the compare is against the vtable and not the method pointer.  Additionally, the instructions to load up the method are dumped into the cold call block.

For jump tables, the basic idea is the same.  I use the memory profiling data to find the hottest slots in the jumptable and then use that information to compute the indices of the hottest entries. We can then compare the index register to the hot index values and avoid the load from the jump table.
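
The jump table case can be sketched as below: memory profile samples are addresses of jump table slots, and dividing the offset from the table base by the entry size gives the index. This is an illustrative Python sketch with hypothetical names, not the IndirectCallPromotion code, and it does not bound-check against the table size.

```python
from collections import Counter


def hottest_indices(profiled_loads, table_base, entry_size, top_n):
    """profiled_loads: iterable of (address, count) pairs from value profiling."""
    counts = Counter()
    for addr, count in profiled_loads:
        offset = addr - table_base
        # Ignore addresses that do not land on a slot boundary of this table.
        if offset >= 0 and offset % entry_size == 0:
            counts[offset // entry_size] += count
    # Indices of the most frequently loaded slots, hottest first.
    return [index for index, _ in counts.most_common(top_n)]
```

The promoted code can then compare the index register against these values directly and skip the load from the table on the hot path.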

Note: I'm assuming the whole call is in a single BB.  According to @rafaelauler, this isn't always the case on ARM.  This isn't always the case on X86 either.  If there are non-trivial arguments that are passed by value, there could be branches in between the setup and the call.  I'm going to leave fixing this until later since it makes things a bit more complicated.

I've also fixed a bug where ICP was introducing a conditional tail call.  I made sure that SCTC fixes these up afterwards.  I have no idea why I made it introduce a CTC in the first place.

(cherry picked from FBD6120768)
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/DataReader.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/DataReader.cpp
The file was modifiedbolt/Passes/IndirectCallPromotion.h
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/DataAggregator.cpp
The file was modifiedbolt/Passes/FrameAnalysis.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit b2f132c7c2d0867a9d2232cd4a5bedeb8bbbed54 by maks
[RFC] [BOLT] Use iterators for MC branch/call analysis code.

Summary:
Here's an implementation of an abstract instruction iterator for the branch/call
analysis code in MCInstrAnalysis.  I'm posting it up to see what you guys think.
It's a bit sloppy with constness and probably needs more tidying up.

(cherry picked from FBD6244012)
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/Passes/Inliner.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryBasicBlock.cpp
Commit dc23def477069b5a0e950fb22884c4033206ce6d by maks
[PERF2BOLT] Fix aggregator wrt traces with REP RET

Summary:
Previously the perf2bolt aggregator was rejecting traces
finishing with REP RET (return instruction with REP prefix) as a
result of the migration from objdump output to LLVM disassembler,
which decodes REP as a separate instruction. Add code to detect
REP RET and treat it as a single return instruction.
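
As an illustration only, folding a separately decoded REP prefix into the following return could look like the Python sketch below. The mnemonic strings and the list-of-strings representation are assumptions; the actual change works on LLVM MCInst objects.

```python
def merge_rep_ret(mnemonics):
    """Fold a standalone "rep" followed by "ret" into one "rep ret" entry."""
    merged = []
    for name in mnemonics:
        if name == "ret" and merged and merged[-1] == "rep":
            # Treat the prefix and the return as a single return instruction.
            merged[-1] = "rep ret"
        else:
            merged.append(name)
    return merged
```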

(cherry picked from FBD6417496)
The file was modifiedbolt/BinaryFunction.cpp
Commit 591e0ef3ba489dfa730dfa8443e4d2d7e743075f by maks
[BOLT] Add timers for non-optimization related phases.

Summary: Add timers for non-optimization related phases.  There are two new options, -time-build for disassembling functions and building CFGs, and -time-rewrite for phases in executeRewritePass().

(cherry picked from FBD6422006)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/RewriteInstance.h
Commit 0bab742949a413940f8587686084f72ee1e5d431 by maks
[BOLT] Fix icp-top-callsites option, remove icp-always-on.

Summary: The icp-top-callsites option was using basic block counts to pick the top callsites while the ICP main loop was using branch info from the targets of each call.  These numbers do not exactly match up so there was a discrepancy in computing the top calls.  I've switched top callsites over to use the same stats as the main loop.  The icp-always-on option was redundant with -icp-top-callsites=100, so I removed it.

(cherry picked from FBD6370977)
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
Commit a71b5700c0ab2510bd2f17d5807dc1aa2aa5b3b2 by maks
[BOLT] Fix bug in shortening peephole.

Summary: The arithmetic shortening code on x86 was broken.  It would sometimes shorten instructions with immediate operands that wouldn't fit into 8 bits.
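
The condition that must hold before switching to an 8-bit immediate encoding is that the value survives sign extension from 8 bits. A minimal Python sketch of that check (the function name is hypothetical):

```python
def fits_in_int8(imm: int) -> bool:
    """True if imm can be encoded as a sign-extended 8-bit immediate."""
    return -128 <= imm <= 127
```

For example, 128 and 255 must not be shortened even though they fit in an unsigned byte, because the short form sign-extends the immediate.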

(cherry picked from FBD6444699)
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/Passes/BinaryPasses.h
Commit 39a8c36697784b34ee8c05e37b415d23ef0b5b42 by maks
[BOLT] Use getNumPrimeOperands in shortenInstruction.

Summary: Apply maks' review comments

(cherry picked from FBD6451164)
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit 21eb2139ee3eb52dd71e124fff137310a649604e by maks
Introduce pass to reduce jump tables footprint

Summary:
Add a pass to identify indirect jumps to jump tables and reduce
their entry size from 8 to 4 bytes. For PIC jump tables, it will
convert the PIC code to non-PIC (since BOLT only processes static code,
it makes no sense to use expensive PIC-style jumps in static code). Add
corresponding improvements to register scavenging pass and add an MCInst
matcher machinery.

(cherry picked from FBD6421582)
The file was addedbolt/Passes/JTFootprintReduction.h
The file was addedbolt/Passes/JTFootprintReduction.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Passes/CMakeLists.txt
The file was modifiedbolt/Passes/LivenessAnalysis.h
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/Passes/DataflowAnalysis.cpp
The file was modifiedbolt/BinaryFunction.h
Commit 48a53a7b551ffd245d735771941c1b70199cfc5b by maks
a new i-cache metric

Summary:
The diff introduces two measures for i-cache performance: a TSP measure (currently used for optimization) and an "extended" TSP measure that takes into account jumps between non-consecutive basic blocks. The two measures are computed for estimated addresses/sizes of basic blocks and for the actually emitted addresses/sizes.

Intuitively, the Extended-TSP metric quantifies the expected number of i-cache misses for a given ordering of basic blocks. It has 5 parameters:
- FallthroughWeight is the impact of fallthrough jumps on the score
- ForwardWeight is the impact of forward (but not fallthrough) jumps
- BackwardWeight is the impact of backward jumps
- ForwardDistance is the max distance of a forward jump affecting the score
- BackwardDistance is the max distance of a backward jump affecting the score
We're still learning the "best" values for the options but default values look reasonable so far.
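
To show how the five parameters could interact, here is a Python sketch of a per-jump score. The linear decay with distance and every default value below are assumptions chosen for illustration; the commit message does not give the formula or the defaults.

```python
def jump_score(src_end, dst_start, count,
               fallthrough_weight=1.0, forward_weight=0.1, backward_weight=0.1,
               forward_distance=1024, backward_distance=640):
    """Score contribution of one jump taken `count` times (assumed formula)."""
    if dst_start == src_end:
        # Fallthrough: the target starts right where the source block ends.
        return fallthrough_weight * count
    if dst_start > src_end:
        dist = dst_start - src_end
        if dist <= forward_distance:
            return forward_weight * count * (1.0 - dist / forward_distance)
        return 0.0
    dist = src_end - dst_start
    if dist <= backward_distance:
        return backward_weight * count * (1.0 - dist / backward_distance)
    return 0.0
```

Summing this value over all jumps of a function gives a single number for a block ordering; jumps longer than the distance limits contribute nothing.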

(cherry picked from FBD6331418)
The file was modifiedbolt/CacheMetrics.h
The file was modifiedbolt/CacheMetrics.cpp
The file was modifiedbolt/RewriteInstance.cpp
Commit cd0a075a08f90b29cd6bd37ba183c5a6a197e12b by maks
[BOLT] Fix ICP nested jump table handling and general stats.

Summary: Load elimination for ICP wasn't handling nested jump tables correctly.  It wasn't offsetting the indices by the range of the nested table.  I also wasn't computing some of the ICP stats correctly in all cases, which was leading to weird results in the stats.

(cherry picked from FBD6453693)
The file was modifiedbolt/DataReader.cpp
The file was modifiedbolt/DataReader.h
The file was modifiedbolt/Passes/IndirectCallPromotion.h
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
Commit 70d44ab20ab806f2956d8c2b465733a93db6c351 by maks
[BOLT] Add REX prefix rebalancing pass

Summary:
Add a pass to rebalance the usage of REX prefixes, moving them
from the hot code path to the cold path whenever possible. To do this, we
rank the usage frequency of each register and exchange an X86 classic reg
with an extended one (which requires a REX prefix) whenever the classic
register is being used fewer times than the extended one. There are two
versions of this pass: regular one will only consider RBX as classic and
R12-R15 as extended registers because those are callee-saved, which means
their scope is local to the function and therefore they can be easily
interchanged within the function without further consequences. The
aggressive version relies on liveness analysis to detect if the value of
a register is being used as a caller-saved value (written to without
being read first), which also is eligible for reallocation. However, it
showed limited results and is not the default option because it is
expensive.
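
The regular version's ranking step can be sketched in Python as follows. The function name and the dictionary of use counts are hypothetical; in practice the counts would be weighted by execution frequency so that hot-path uses dominate.

```python
CLASSIC = "rbx"
EXTENDED = ["r12", "r13", "r14", "r15"]


def pick_swap(use_counts):
    """Return (classic, extended) registers to exchange, or None."""
    # Find the most frequently used callee-saved extended register.
    hottest = max(EXTENDED, key=lambda reg: use_counts.get(reg, 0))
    # Swap only if the classic register is used less often than it.
    if use_counts.get(CLASSIC, 0) < use_counts.get(hottest, 0):
        return (CLASSIC, hottest)
    return None
```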

Currently, this pass does not update debug info. This means that if a
substitution is made, the AT_LOCATION of a variable inside a function may
be outdated and GDB will display the wrong value if you ask it to print
the value of the affected variable. Updating DWARF involves a painful
task of writing a new DWARF expression parser/writer similar to the one
we already have for CFI expressions. I'll defer the task of writing this
until we determine this optimization is enabled in production. So far,
it is experimental to be combined with other optimizations to help us
find a new set of optimizations that is beneficial.

(cherry picked from FBD6476659)
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/Passes/CMakeLists.txt
The file was addedbolt/Passes/RegReAssign.h
The file was modifiedbolt/Passes/LivenessAnalysis.h
The file was addedbolt/Passes/RegReAssign.cpp
The file was modifiedbolt/Passes/JTFootprintReduction.cpp
Commit 10274633eed30ea17070bba614c860a4dd9a663b by maks
[BOLT] Options to facilitate debugging

Summary:
Some helpful options:

  -print-dyno-stats-only
    while printing functions output dyno-stats and skip instructions

  -report-stale
    print a list of functions with a stale profile

(cherry picked from FBD6505141)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/RewriteInstance.cpp
Commit 2b9bafed836e46c5ea474ea63ef774b5a4127200 by maks
[BOLT] Consistent DFS ordering for landing pads

Summary:
The list of landing pads in BinaryBasicBlock was sorted by their address
in memory. As a result, the DFS order was not always deterministic.
The change is to store landing pads in the order they appear in invoke
instructions while keeping them unique.
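
A minimal sketch of the idea, assuming landing pads are identified by labels and invokes are visited in program order (the data model here is hypothetical, not BinaryBasicBlock's):

```python
def landing_pads_in_invoke_order(invokes):
    """Collect landing pads in first-seen order, without duplicates.

    `invokes` is a list of (instruction, landing_pad_label) pairs in the
    order the invoke instructions appear; the label is None when the
    call has no landing pad.
    """
    seen = set()
    ordered = []
    for _, pad in invokes:
        if pad is not None and pad not in seen:
            seen.add(pad)
            ordered.append(pad)
    return ordered

invokes = [("call foo", "LP2"), ("call bar", "LP1"),
           ("call baz", "LP2"), ("call qux", None)]
pads = landing_pads_in_invoke_order(invokes)
```

Because the result depends only on instruction order and not on addresses, a DFS that walks `pads` visits successors in the same order on every run.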

Also, add Throwers verification to validateCFG().

(cherry picked from FBD6529032)
The file was modifiedbolt/BinaryFunction.cpp
Commit b6f7c68a6c9a5e97dcc84cacc117f13ca01d42f1 by maks
[BOLT] Automatically detect and use relocations

Summary:
If relocations are available in the binary, use them by default.
If "-relocs" is specified, then require relocations for further
processing. Use "-relocs=0" to forcefully ignore relocations.

Instead of `opts::Relocs` use `BinaryContext::HasRelocations` to check
for the presence of the relocations.
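
The three cases above can be sketched as a small decision function. This is an illustration only: the function name is hypothetical, and raising an error when "-relocs" is given without relocations is an assumption about what "require" means.

```python
def use_relocations(relocs_option, binary_has_relocs):
    """Decide whether to process the binary using relocations.

    relocs_option: None if "-relocs" was not given, True for "-relocs",
                   False for "-relocs=0".
    """
    if relocs_option is None:
        return binary_has_relocs      # default: use them when present
    if relocs_option is False:
        return False                  # forcefully ignore relocations
    if not binary_has_relocs:
        # Assumed behavior: treat a missing requirement as an error.
        raise ValueError("-relocs specified but the binary has no relocations")
    return True
```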

(cherry picked from FBD6530023)
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/Passes/JTFootprintReduction.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/Passes/Aligner.cpp
The file was modifiedbolt/Passes/RegReAssign.cpp
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/DWARFRewriter.cpp
The file was modifiedbolt/Passes/LongJmp.cpp
The file was modifiedbolt/Passes/ReorderFunctions.cpp
The file was modifiedbolt/BinaryFunction.cpp
Commit d15b93badec2ffdfc6e5a01b9386cdefb4ea59ac by maks
[BOLT] Major overhaul of profiling in BOLT

Summary:
Profile reading was tightly coupled with building CFG. Since I plan
to move to a new profile format that will be associated with CFG
it is critical to decouple the two phases.

We now read the profile right after the CFG is constructed, but
before it is "canonicalized", i.e. CTCs will still be there.

After reading the profile, we do a post-processing pass that fixes
CFG and does some post-processing for debug info, such as
inference of fall-throughs, which is still required with the current
format.

Another good reason for decoupling is that we can use profile with
CFG to more accurately record fall-through branches during
aggregation.

At the moment we use "Offset" annotations to facilitate location
of instructions corresponding to the profile. This might not be
super efficient. However, once we switch to the new profile format
the offsets would be no longer needed. We might keep them for
the aggregator, but if we have to trust LBR data that might
not be strictly necessary.

I've tried to make changes while keeping backwards compatibility. This makes
it easier to verify correctness of the changes, but that also means
that we lose accuracy of the profile.

Some refactoring is included.

Flag "-prof-compat-mode" (on by default) is used for bug-level
backwards compatibility. Disable it for more accurate tracing.

(cherry picked from FBD6506156)
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/Passes/ReorderFunctions.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/DataAggregator.h
The file was modifiedbolt/CMakeLists.txt
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/DataAggregator.cpp
The file was modifiedbolt/Passes/BinaryPasses.h
The file was modifiedbolt/DataReader.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/Passes/ReorderFunctions.cpp
The file was addedbolt/BinaryFunctionProfile.cpp
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/llvm-bolt.cpp
Commit 67cef1f5362048d39462ceffff41f3abca529408 by maks
debug

(cherry picked from FBD28110897)
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
Commit 85f5f4fb631959ed55d818c3994e2d922a0ad948 by maks
[BOLT] Fix debugging derp

(cherry picked from FBD28110992)
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
Commit 660daac2d0489045f2bb142a9f7eb5b7737f3594 by maks
[BOLT] Fix -simplify-rodata-loads wrt data chunks with relocs

Summary:
The pass was previously copying data that would change after layout
because it had a relocation at the copied address.

(cherry picked from FBD6541334)
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit 1fa80594cf8b9e709cf8f7917e176993becfb139 by maks
[BOLT] Do not assign a LP to tail calls

Summary:
Do not assign a LP to tail calls. They are not calls in the
view of an unwinder, they are just regular branches. We were hitting an
assertion in BinaryFunction::removeConditionalTailCalls() complaining
about landing pads in a CTC, however it was in fact a
builtin_unreachable being conservatively treated as a CTC.

(cherry picked from FBD6564957)
The file was modifiedbolt/Exceptions.cpp
Commit a599fe1bbc90d8151a468e23e453321abda57a24 by maks
[BOLT] a new block reordering algorithm

Summary:
A new block reordering algorithm, cache+, that is designed to optimize
i-cache performance.

On a high level, this algorithm is a greedy heuristic that merges
clusters (ordered sequences) of basic blocks, similarly to how it is
done in OptimizeCacheReorderAlgorithm. There are two important
differences: (a) the metric that is optimized in the procedure, and
(b) how two clusters are merged together.
Initially all clusters are isolated basic blocks. On every iteration,
we pick a pair of clusters whose merging yields the biggest increase
in the ExtTSP metric (see CacheMetrics.cpp for exact implementation),
which models how i-cache "friendly" a specific cluster is. A pair of
clusters giving the maximum gain is merged into a new cluster. The
procedure stops when there is only one cluster left, or when merging
does not increase ExtTSP. In the latter case, the remaining clusters
are sorted by density.
An important aspect is the way two clusters are merged. Unlike earlier
algorithms (e.g., OptimizeCacheReorderAlgorithm or Pettis-Hansen), two
clusters, X and Y, are first split into three, X1, X2, and Y. Then we
consider all possible ways of gluing the three clusters (e.g., X1YX2,
X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the
largest score. This improves the quality of the final result (the
search space is larger) while keeping the implementation sufficiently
fast.
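
The merge step can be sketched as follows. The `score` function is a toy stand-in (total weight of edges that become fallthroughs) and is not the ExtTSP metric from CacheMetrics.cpp; the function names and the edge-weight dictionary are hypothetical.

```python
from itertools import permutations

def score(order, edges):
    """Toy stand-in for ExtTSP: weight of edges between adjacent blocks."""
    return sum(edges.get((a, b), 0) for a, b in zip(order, order[1:]))

def best_merge(x, y, edges):
    """Try every way of gluing X1, X2 and Y, for each split point of X."""
    candidates = [x + y, y + x]                 # X left unsplit
    for i in range(1, len(x)):
        x1, x2 = x[:i], x[i:]
        for pieces in permutations((x1, x2, y)):
            candidates.append([block for piece in pieces for block in piece])
    return max(candidates, key=lambda order: score(order, edges))

# Cluster X = [A, B], cluster Y = [C]; the hot path is A -> C -> B.
edges = {("A", "C"): 10, ("C", "B"): 8, ("A", "B"): 1}
merged = best_merge(["A", "B"], ["C"], edges)
```

In this example plain concatenation (ABC or CAB) scores 1, while splitting X and placing Y in the middle (ACB) scores 18, which shows why the larger search space can give a better layout.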

(cherry picked from FBD6466264)
The file was addedbolt/Passes/ReorderUtils.h
The file was modifiedbolt/Passes/BinaryPasses.h
The file was modifiedbolt/Passes/HFSortPlus.cpp
The file was modifiedbolt/Passes/ReorderAlgorithm.h
The file was modifiedbolt/CacheMetrics.cpp
The file was modifiedbolt/CacheMetrics.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/Passes/CMakeLists.txt
The file was addedbolt/Passes/CachePlusReorderAlgorithm.cpp
Commit f8f52d01d012404ed76ea3dca703bd0776fd3b05 by maks
[BOLT-AArch64] Support SPEC17 programs and organize AArch64 tests

Summary:
Add a few new relocation types to support a wider variety of
binaries, add support for constant island duplication (so we can split
functions in large binaries) and make LongJmp pass really precise with
respect to layout, so we don't miss stubs insertions at the correct
places for really large binaries. In LongJmp, introduce "freeze"
annotations so fixBranches won't mess the jumps we carefully determined
that needed a stub.

(cherry picked from FBD6294390)
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/Passes/LongJmp.cpp
The file was modifiedbolt/Passes/LongJmp.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Passes/BinaryPasses.h
Commit b6cb112febefef75d479f528fc09a52b806fb333 by maks
[BOLT] New profile format

Summary:
A new profile that is more resilient to minor binary modifications.

BranchData is eliminated. For calls, the data is converted into instruction
annotations if the profile matches a function. If a profile cannot be matched,
AllCallSites data should have call sites profiles.

The new profile format is YAML, which is quite verbose. It still takes
less space than the older format because we avoid function name repetition.

The plan is to get rid of the old profile format eventually.

merge-fdata does not work with the new format yet.

(cherry picked from FBD6753747)
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Passes/BinaryFunctionCallGraph.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Passes/PLTCall.cpp
The file was modifiedbolt/BinaryFunctionProfile.cpp
The file was addedbolt/ProfileReader.h
The file was modifiedbolt/Passes/IndirectCallPromotion.h
The file was modifiedbolt/DataAggregator.cpp
The file was addedbolt/ProfileWriter.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
The file was addedbolt/ProfileYAMLMapping.h
The file was addedbolt/ProfileWriter.h
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/CMakeLists.txt
The file was addedbolt/ProfileReader.cpp
Commit 907ca25841288975bfbed5d44a3992098ce47268 by maks
[BOLT-AArch64] Support large test binary

Summary:
Rewrite how data/code markers are interpreted, so the code
can have constant islands essentially anywhere. This is necessary to
accommodate custom AArch64 assembly code coming from mozjpeg. Allow
any function to refer to the constant island owned by any other
function. When this happens, we pull the constant island from the
referred function and emit it as our own, so it will live nearby
the code that refers to it, allowing us to freely reorder functions
and code pieces. Make bolt more strict about not changing anything
in non-simple ARM functions, as we need to preserve offsets for
functions whose jump tables we don't interpret (currently
any function with jump tables in ARM is non-simple and is left
untouched).

(cherry picked from FBD6402324)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/Passes/LongJmp.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryContext.h
Commit 2640b4071f4790832d2c3d0ea4ed944991204579 by maks
[BOLT] Refactoring - add BinarySection class

Summary: Add BinarySection class that is a wrapper around SectionRef.  This is refactoring work for static data reordering.

(cherry picked from FBD6792785)
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/DWARFRewriter.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was addedbolt/BinarySection.cpp
The file was addedbolt/BinarySection.h
The file was modifiedbolt/CMakeLists.txt
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryContext.h
Commit 89feb847ea9446afa870c6c43098fcd6061008b5 by maks
[BOLT] Refactor relocation analysis code.

Summary:
Refactor the relocation analysis code.  It should be a little better at validating
that the relocation value matches up with the symbol address + addend stored in the
relocation (except on aarch64).  It is also a little better at finding the symbol
address used to do the lookup in BinaryContext, rather than just using symbol
address + addend.

(cherry picked from FBD6814702)
The file was modifiedbolt/BinarySection.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinarySection.cpp
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/BinaryContext.cpp
Commit 626e977c4a346e05b6be959b6d4145fe83498e81 by maks
[BOLT] faster cache+ implementation

Summary:
Speeding up cache+ algorithm.

The idea is to find and merge "fallthrough" successors before main
optimization. For a pair of blocks, A and B, block B is the fallthrough
successor of A, if (i) all jumps (based on profile) from A go to B
and (ii) all jumps to B are from A.
Such blocks should be adjacent in an optimal ordering, and should
not be considered for splitting. (This gives the speed up).
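
The definition above can be sketched directly from profiled edge counts. The function name and the dictionary layout are hypothetical; edges with a zero count are assumed not to count as jumps.

```python
from collections import defaultdict

def fallthrough_pairs(edge_counts):
    """Return (A, B) pairs where A only jumps to B and B is only reached from A.

    `edge_counts` maps (source, target) to a profiled jump count.
    """
    succs = defaultdict(set)
    preds = defaultdict(set)
    for (src, dst), count in edge_counts.items():
        if count > 0:
            succs[src].add(dst)
            preds[dst].add(src)
    pairs = []
    for a, targets in succs.items():
        if len(targets) != 1:
            continue
        (b,) = targets
        if a != b and preds[b] == {a}:
            pairs.append((a, b))
    return sorted(pairs)

profile = {("A", "B"): 100, ("B", "C"): 60, ("B", "D"): 40,
           ("C", "E"): 60, ("D", "E"): 40}
pairs = fallthrough_pairs(profile)
```

Only A and B qualify here: B has two successors, and E has two predecessors, so the other blocks stay separate and remain candidates for splitting.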

The gap between cache and cache+ reduced from ~2m to ~1m.

(cherry picked from FBD6799900)
The file was modifiedbolt/Passes/CachePlusReorderAlgorithm.cpp
Commit 48370744d98e1855557c91285ae3c911f6b186c0 by maks
[BOLT] Do not assert on bad data

Summary:
A test is asserting on impossible addresses coming from
perf.data, instead of just reporting it as bad data. Fix this behavior.

(cherry picked from FBD6835590)
The file was modifiedbolt/BinaryFunctionProfile.cpp
Commit 304c8ba80a097d03d174dd0aba312deb74e9666d by maks
[BOLT] Handle multiple sections with the same name

Summary: Multiple sections can have the same name, so we need to make the NameToSectionMap into a multimap.

(cherry picked from FBD6847622)
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/RewriteInstance.cpp
Commit d114ef1fa5ed182741542902982b4d128c98bf68 by maks
[BOLT] Fix profile for multi-entry functions

Summary:
When we read profile for functions, we initialize counts for entry
blocks first, and then populate counts for all blocks based
on incoming edges.

During the second phase we ignore the entry blocks because we expect
them to be already initialized. For the primary entry at offset 0 it's
the correct thing to do, since we treat all incoming branches as calls
or tail calls. However, for secondary entries we only consider external
edges to be from calls and don't increase entry count if an edge
originates from inside the function. Thus we need to update the
secondary entry basic block counts with internal edges too.
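
A simplified sketch of the two phases, assuming call counts per entry point and counts per internal edge are already available (all names are hypothetical, and this is not the code in BinaryFunctionProfile.cpp):

```python
def block_counts(entry_call_counts, internal_edges, primary_entry):
    """Compute basic block execution counts for a multi-entry function.

    entry_call_counts: entry block -> count from calls and tail calls.
    internal_edges:    (source, target) -> count of branches inside the function.
    """
    # Phase 1: initialize entry blocks from external (call) edges.
    counts = dict(entry_call_counts)
    # Phase 2: add incoming internal edges. Only the primary entry is
    # skipped; secondary entries accumulate internal edges as well.
    for (_, dst), n in internal_edges.items():
        if dst == primary_entry:
            continue
        counts[dst] = counts.get(dst, 0) + n
    return counts

counts = block_counts(
    {"entry0": 100, "entry1": 20},
    {("entry0", "bb1"): 100, ("bb1", "entry1"): 70, ("entry1", "bb2"): 90},
    primary_entry="entry0",
)
```

Skipping all entry blocks in phase 2 would leave `entry1` at 20 even though 90 executions flow out of it.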

(cherry picked from FBD6836817)
The file was modifiedbolt/BinaryFunctionProfile.cpp
Commit 2b8194fa501578597bec5cab06fea0560d801001 by maks
Handle types CU list in updateGdbIndexSection

Summary:
Handle types CU list in `updateGdbIndexSection`.

It looks like the types part of `.gdb_index` isn't empty when `-fdebug-types-section` is used. So instead of aborting, we copy the part to new `.gdb_index` section.

(cherry picked from FBD6770460)
The file was modifiedbolt/DWARFRewriter.cpp
Commit 1207e1d229d7ef68741f69ac435ab11a39465fd8 by maks
[BOLT] Fix lookup of non-allocatable sections in RewriteInstance

Summary: Register all sections with BinaryContext.  Store all sections in a set ordered by (address, size, name).  Add two separate maps to lookup sections by address or by name.  Non-allocatable sections are not stored in the address->section map since they all "start" at 0.
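
A minimal sketch of this layout, with hypothetical class and method names. Sections are modeled as (address, size, name) tuples, and the name map keeps a list per name because section names are not unique.

```python
import bisect
from collections import defaultdict

class SectionRegistry:
    def __init__(self):
        self.sections = []                 # kept sorted by (address, size, name)
        self.by_name = defaultdict(list)   # name -> all sections with that name
        self.by_address = {}               # allocatable sections only

    def register(self, name, address, size, allocatable):
        key = (address, size, name)
        bisect.insort(self.sections, key)
        self.by_name[name].append(key)
        if allocatable:
            # Non-allocatable sections all "start" at 0, so they are
            # left out of the address map.
            self.by_address[address] = key
        return key

    def section_for_address(self, addr):
        """Find the allocatable section containing addr, or None."""
        starts = sorted(self.by_address)   # re-sorted per lookup for brevity
        i = bisect.bisect_right(starts, addr) - 1
        if i < 0:
            return None
        start, size, name = self.by_address[starts[i]]
        return (start, size, name) if addr < start + size else None

reg = SectionRegistry()
reg.register(".text", 0x1000, 0x200, True)
reg.register(".data", 0x2000, 0x100, True)
reg.register(".debug_info", 0, 0x500, False)
reg.register(".debug_line", 0, 0x300, False)
```

With this split, an address lookup near 0 cannot accidentally resolve to a debug section, while a lookup by name still finds it.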

(cherry picked from FBD6862973)
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/BinarySection.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryContext.cpp
Commit 501601259b4efac4e37b0e6ff07a5273a3f77278 by maks
[BOLT] Fix branch info stats after SCTC

Summary:
SCTC was incorrectly swapping BranchInfo when reversing the branch condition.  This was wrong because when we remove the successor BB later, it removes the BranchInfo for that BB.  In this case the successor would be the BB with the stats we had just swapped.

Instead leave BranchInfo as it is and read the branch count from the false or true branch depending on whether we reverse or replace the branch, respectively.  The call to removeSuccessor later will remove the unused BranchInfo we no longer care about.

(cherry picked from FBD6876799)
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit f85264ae18f8afc09baf2613dcaa7c058b52ace6 by maks
[BOLT] Reduce the usage of "Offset" annotation

Summary:
Limiting the "Offset" annotation to instructions that actually
need it reduces memory consumption on the HHVM binary by 1GB.

(cherry picked from FBD6878943)
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/BinaryFunction.cpp
Commit 600cf0ecf609d975d8aa85ebae251b2963087365 by maks
[BOLT] Fix memory regression

Summary:
This fixes the increased memory consumption introduced in an earlier
diff while I was working on new profiling infra.

The increase came from a delayed release of memory allocated to
intermediate structures used to build CFG. In this diff we release
them ASAP, and don't keep them for all functions at the same time.

(cherry picked from FBD6890067)
The file was modifiedbolt/BinaryFunction.cpp
Commit 8a5a30156e1e9c48dedc67c2b2413f69502715df by maks
[BOLT rebase] Rebase fixes on top of LLVM Feb2018

Summary:
This commit includes all code necessary to make BOLT working again
after the rebase. This includes a redesign of the EHFrame work,
cherry-pick of the 3dnow disassembly work, compilation error fixes,
and port of the debug_info work. The macroop fusion feature is not
ported yet.

The rebased version has minor changes to the "executed instructions"
dynostats counter because REP prefixes are considered a part of the
instruction they apply to. Also, some X86 instructions had the "mayLoad"
tablegen property removed, which BOLT uses to identify and account
for loads, thus reducing the total number of loads reported by
dynostats. This was observed in X86::MOVDQUmr. TRAP instructions are
not terminators anymore, changing our CFG. This commit adds compensation
to preserve this old behavior and minimize tests changes. debug_info
sections are now slightly larger. The discriminator field in the line
table is slightly different due to a change upstream. New profiles
generated with the other bolt are incompatible with this version
because of different hash values calculated for functions, so they will
be considered 100% stale. This commit changes the corresponding test
to XFAIL so it can be updated. The hash function changes because it
relies on raw opcode values, which change according to the opcodes
described in the X86 tablegen files. When processing HHVM, bolt was
observed to be using about 800MB more memory in the rebased version
and being about 5% slower.

(cherry picked from FBD7078072)
The file was modifiedbolt/Passes/LivenessAnalysis.h
The file was modifiedbolt/Passes/StackPointerTracking.h
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/BinarySection.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Passes/ReorderFunctions.cpp
The file was modifiedbolt/merge-fdata/merge-fdata.cpp
The file was modifiedbolt/Passes/FrameAnalysis.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/Passes/ShrinkWrapping.cpp
The file was modifiedbolt/CMakeLists.txt
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/Passes/LongJmp.cpp
The file was modifiedbolt/Passes/StackReachingUses.h
The file was modifiedbolt/Passes/ReachingInsns.h
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/DataAggregator.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/DWARFRewriter.cpp
The file was modifiedbolt/Passes/CMakeLists.txt
The file was modifiedbolt/llvm-bolt.cpp
The file was modifiedbolt/Passes/FrameOptimizer.cpp
The file was modifiedbolt/BinarySection.cpp
The file was modifiedbolt/DebugData.cpp
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/Passes/PLTCall.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/Passes/BinaryFunctionCallGraph.cpp
The file was modifiedbolt/BinaryPassManager.h
The file was modifiedbolt/DebugData.h
The file was modifiedbolt/Passes/CallGraphWalker.cpp
The file was modifiedbolt/BinaryLoop.h
The file was modifiedbolt/Passes/ReachingDefOrUse.h
The file was modifiedbolt/Passes/DominatorAnalysis.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Exceptions.h
The file was modifiedbolt/Passes/StackAvailableExpressions.h
The file was modifiedbolt/BinaryFunctionProfile.cpp
The file was modifiedbolt/Passes/StackAllocationAnalysis.h
Commit 1298d99a41532db677b3842bb2a87933794a2846 by maks
[BOLT] Limited "support" for AVX-512

Summary:
In relocation mode trap on entry to any function that has AVX-512
instructions. This is controlled by "-trap-avx512" option which is on
by default. If the option is disabled and an AVX-512 instruction is seen
in relocation mode, then we abort while re-writing the binary.

(cherry picked from FBD6893165)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/RewriteInstance.cpp
Commit a24c5543eac713c28e1047b1c3fc15b861572067 by maks
[BOLT] Improved function profile matching

Summary:
Prioritize functions with 100% name match when doing LTO "fuzzy"
name matching. Avoid re-assigning profile to a function.

(cherry picked from FBD6992179)
The file was modifiedbolt/ProfileReader.cpp
The file was modifiedbolt/ProfileReader.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit 5599c019117a0b294e6f2b68f979b5014db10e96 by maks
[BOLT] Fixes for new profile

Summary:
Do a better job of recording fall-through branches in new profile mode
(-prof-compat-mode=0). For this we need to record offsets for all
instructions that are last in the containing basic block.

Change the way we convert conditional tail calls. Now we never reverse
the condition. This is required for better profile matching.
The original approach of preserving the direction was controversial
to start with.

Add "-infer-fall-throughs" option (on by default) to allow disabling
inference of fall-through edge counts.

(cherry picked from FBD6994293)
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/ProfileReader.cpp
The file was modifiedbolt/BinaryFunctionProfile.cpp
The file was modifiedbolt/BinaryFunction.cpp
Commit e15623058e7942cf16cf1c2ae2d3bf83ae393dfa by maks
Cache+ speed, reduce mallocs

Summary:
Speed up cache+ by skipping mallocs on vectors.

Although this change speeds up the algorithm by 2x, this is still not
enough for some binaries where some functions have ~2500 hot basic
blocks. Hence, introduce a threshold for expensive optimizations in
CachePlusReorderAlgorithm. If the number of hot basic blocks exceeds
the threshold (2048 by default), we use a cheaper version, which is
quite fast.

(cherry picked from FBD6928075)
The file was modifiedbolt/Passes/CachePlusReorderAlgorithm.cpp
The file was modifiedbolt/CacheMetrics.cpp
Commit 6744f0dbeb81832486a9e9619d0a3f9fca542f92 by maks
[BOLT] Fix jump table placement for non-simple functions

Summary:
When we move a jump table to either hot or cold new section
(-jump-tables=move), we rely on a number of taken branches from the table
to decide if it's hot or cold. However, if the function is non-simple, we
always get 0 count, and always move the table to the cold section.
Instead, we should make a conservative decision based on the execution
count of the function.
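
The decision can be sketched as below. The function name is hypothetical, and treating any non-zero count as "hot" is an assumption made for illustration.

```python
def jump_table_is_hot(taken_count, function_is_simple, function_exec_count):
    """Decide whether a moved jump table belongs in the hot section."""
    if function_is_simple:
        return taken_count > 0
    # Non-simple functions have no per-branch counts, so fall back to
    # the execution count of the function itself.
    return function_exec_count > 0
```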

(cherry picked from FBD7058127)
The file was modifiedbolt/BinaryFunction.cpp
Commit ddefc770b0ea21fef706a96faac57202b8a055fd by maks
[BOLT] Refactoring of section handling code

Summary:
This is a big refactoring of the section handling code.  I've removed the SectionInfoMap and NoteSectionInfo and stored all the associated info about sections in BinaryContext and BinarySection classes.  BinarySections should now hold all the info we care about for each section.  They can be initialized from SectionRefs but don't necessarily require one to be created.  There are only one or two spots that needed access to the original SectionRef to work properly.

The trickiest part was making sure RewriteInstance.cpp iterated over the proper sets of sections for each of its different types of processing.  The different sets are broken down roughly as allocatable and non-allocatable and "registered" (I couldn't think up a better name).  "Registered" means that the section has been updated to include output information, i.e. contents, file offset/address, new size, etc.  It may help to have special iterators on BinaryContext to iterate over the different classes to make things easier.  I can do that if you guys think it is worthwhile.

I found pointee_iterator in the llvm ADT code.  Use that for iterating over BBs in BinaryFunction rather than the custom iterator class.

(cherry picked from FBD6879086)
The file was modifiedbolt/BinarySection.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/DWARFRewriter.cpp
The file was modifiedbolt/BinarySection.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/BinaryContext.cpp
Commit 6d0401ccfb2eec19f5193efc2a5292496ec33313 by maks
[BOLT/LSDA] Fix alignment

Summary:
Fix a bug introduced by rebasing with respect to aligned ULEBs.
This wasn't breaking anything but it is good to keep LSDA aligned.

(cherry picked from FBD7094742)
The file was modifiedbolt/Exceptions.cpp
Commit 32b332ad2dc637208403274b5dc0592fbd6aaeec by maks
[BOLT] Fix ShrinkWrapping bugs and enable testing

Summary:
Fix a few ShrinkWrapping bugs:

- Using push-pop mode in a function that required aligned stack
- Correctly update the edges in jump tables after splitting critical
   edges
- Fix stack pointer restores based on RBP + offset, when we change the
   stack layout in push-pop mode.

(cherry picked from FBD6755232)
The file was modifiedbolt/Passes/FrameOptimizer.h
The file was modifiedbolt/Passes/ShrinkWrapping.cpp
The file was modifiedbolt/Passes/AllocCombiner.cpp
The file was modifiedbolt/Passes/FrameOptimizer.cpp
The file was modifiedbolt/Passes/FrameAnalysis.cpp
The file was modifiedbolt/Passes/AllocCombiner.h
The file was modifiedbolt/Passes/ShrinkWrapping.h
Commit 0e4d86bf19fba5b62d6435603b79a71ee9c33ff2 by maks
[BOLT] Refactor global symbol handling code.

Summary:
This is preparation work for static data reordering.

I've created a new class called BinaryData which represents a symbol
contained in a section.  It records almost all the information relevant
for dealing with data, e.g. names, address, size, alignment, profiling
data, etc.  BinaryContext still stores and manages BinaryData objects
similar to how it managed symbols and global addresses before.  The
interfaces are not changed too drastically from before either.  There is
a bit of overlap between BinaryData and BinaryFunction.  I would have
liked to do some more refactoring to make a BinaryFunctionFragment that
subclassed from BinaryData and then have BinaryFunction be composed or
associated with BinaryFunctionFragments.

I've also attempted to use (symbol + offset) for when addresses are
pointing into the middle of symbols with known sizes.  This changes the
simplify rodata loads optimization slightly since the expression on an
instruction can now also be a (symbol + offset) rather than just a symbol.

One of the overall goals for this refactoring is to make sure every
relocation is associated with a BinaryData object.  This requires adding
"hole" BinaryData's wherever there are gaps in a section's address space.
Most of the holes seem to be data that has no associated symbol info. In
this case we can't do any better than lumping all the adjacent hole
symbols into one big symbol (there may be more than one actual data
object that contributes to a hole). At least the combined holes should
be moveable.
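
Hole detection can be sketched as a single sweep over the symbols of a section. The function name and the (address, size) representation are hypothetical.

```python
def find_holes(section_start, section_end, symbols):
    """Return [start, end) gaps in a section that no symbol covers.

    `symbols` is a list of (address, size) pairs. A single cursor is
    kept, so all adjacent uncovered bytes end up in one hole.
    """
    holes = []
    cursor = section_start
    for addr, size in sorted(symbols):
        if addr > cursor:
            holes.append((cursor, addr))
        cursor = max(cursor, addr + size)
    if cursor < section_end:
        holes.append((cursor, section_end))
    return holes

holes = find_holes(0x100, 0x200, [(0x110, 0x10), (0x140, 0x20), (0x150, 0x08)])
```

Each resulting range may cover more than one real data object, which matches the limitation noted above: without symbol info the hole can only be treated as one unit.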

Jump tables have similar issues. They appear to mostly be sub-objects
for top level local symbols. The main problem is that we can't recognize
jump tables at the time we scan the symbol table; we have to wait until
disassembly. When a jump table is discovered we add it as a sub-object
to the existing local symbol. If there are one or more existing
BinaryData's that appear in the address range of a newly created jump
table, those are added as sub-objects as well.

(cherry picked from FBD6362544)
The file was modifiedbolt/Passes/JTFootprintReduction.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/Passes/LongJmp.cpp
The file was modifiedbolt/Passes/BinaryFunctionCallGraph.cpp
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/DWARFRewriter.cpp
The file was modifiedbolt/CMakeLists.txt
The file was addedbolt/BinaryData.h
The file was modifiedbolt/DataAggregator.cpp
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/BinaryFunctionProfile.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was addedbolt/Relocation.h
The file was addedbolt/JumpTable.h
The file was modifiedbolt/BinarySection.h
The file was modifiedbolt/Passes/ReorderFunctions.cpp
The file was modifiedbolt/BinaryContext.h
The file was addedbolt/JumpTable.cpp
The file was addedbolt/Relocation.cpp
The file was modifiedbolt/BinarySection.cpp
The file was modifiedbolt/Passes/IndirectCallPromotion.h
The file was modifiedbolt/Passes/JTFootprintReduction.cpp
The file was modifiedbolt/ProfileWriter.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was addedbolt/BinaryData.cpp
The file was modifiedbolt/RewriteInstance.h
Commit d660f8b1fea7ed07c172cf0ad46238ba1e942993 by maks
[BOLT] Disassemble all functions before building CFGs

Summary:
This makes it possible to do adjustments to all functions based on
information gained during disassembly. E.g. if we detect an entry point
after the CFG for a function is constructed, we have to take a
conservative approach, and mark such function as non-simple. Now we have
this information before building the CFG. This could also be used to do
other processing/post-processing on disassembled functions that might
affect CFG construction of other functions (e.g. early detection of
functions that never return).

The drawback of this approach is that we lose cache locality and some
processing performance as a result. I've measured a 5 second difference
on the HHVM binary.

(cherry picked from FBD7258466)
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/RewriteInstance.cpp
Commit 6644548c743dba42a1de454b7bdab95a5104d625 by maks
[BOLTDIFF] Add a tool to audit performance differences

Summary:
This is a simple bolt-based tool that instantiates two
RewriteInstance objects and compares them. Add a method to
RewriteInstance to enable us to compare two objects. Include a mechanism
to match functions from binary 1 to binary 2 and finally print the
largest differences in profiling data from one binary to another.

(cherry picked from FBD6517076)
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/DataReader.cpp
The file was addedbolt/BoltDiff.cpp
The file was modifiedbolt/CMakeLists.txt
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/llvm-bolt.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryPassManager.cpp
Commit 2fe37b443519f93ce5fa26811196228d43b06a13 by maks
[BOLT] Fix remove-unused-stores in rebased bolt

Summary:
Rebased version revealed a mistake when computing the dataflow
for the "remove-unused-stores" optimization. This is disabled in prod but
it doesn't hurt to fix it, so the tests for the rebased bolt go green
again.

(cherry picked from FBD7253418)
The file was modifiedbolt/Passes/StackReachingUses.cpp
The file was modifiedbolt/Passes/FrameAnalysis.cpp
Commit 8c16594f2e1389e0e8f0d4b2d08fd47a49f20473 by maks
[BOLT] Fix ORC to properly update symbols

Summary:
In new ORC, the sequence of how sections are allocated and loaded is
changed. Now everything is delayed until emitAndFinalize() is called,
and all actions are supposed to happen via notification functors.
There are two functors that we pass to new ObjectLinkingLayer object.
One is used to notify when objects are loaded, and the other - once they
are finalized. We use the first one to remap sections to proper
addresses, and that's the earliest place where we can do it. However,
ORC decides to update symbols right before that, and as a result they
are updated with non-mapped values.

There are two possible fixes for that. This diff postpones the update to
the symbol table until the notifier is called. I don't know what other
tools depend on the existing sequence, and the proper fix may involve
creating a third notifier to be called before the symbol table update.

(cherry picked from FBD7280973)
The file was modifiedbolt/RewriteInstance.cpp
Commit 48ae32a33bea49a4bedf51332536e3f0aa0b03af by maks
[BOLT] Introduce MCPlus layer

Summary:
Refactor architecture-specific code out of llvm into llvm-bolt.

Introduce MCPlusBuilder, a class that is taking over MCInstrAnalysis
responsibilities, i.e. creating, analyzing, and modifying instructions.
To access the builder use BC->MIB, i.e. substitute MIA with MIB.
MIB is an acronym for MCInstBuilder, that's what MCPlusBuilder used
to be. The name stuck, and I find it better than MPB.

Instructions are still MCInst, and a bunch of BOLT-specific code still
lives in LLVM, but the stuff under Target/* is significantly reduced.

(cherry picked from FBD7300101)
The file was modifiedbolt/ProfileReader.cpp
The file was modifiedbolt/Passes/LongJmp.cpp
The file was modifiedbolt/Passes/PLTCall.cpp
The file was modifiedbolt/BinaryFunctionProfile.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryContext.h
The file was addedbolt/Target/CMakeLists.txt
The file was modifiedbolt/Passes/ReachingInsns.h
The file was modifiedbolt/Passes/JTFootprintReduction.cpp
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was addedbolt/Target/X86/X86MCPlusBuilder.cpp
The file was modifiedbolt/Passes/ReachingDefOrUse.h
The file was modifiedbolt/Passes/StackAllocationAnalysis.cpp
The file was modifiedbolt/Passes/BinaryFunctionCallGraph.cpp
The file was addedbolt/Target/AArch64/AArch64MCPlusBuilder.cpp
The file was modifiedbolt/Passes/StokeInfo.cpp
The file was modifiedbolt/Passes/FrameAnalysis.cpp
The file was addedbolt/MCPlusBuilder.cpp
The file was modifiedbolt/Passes/AllocCombiner.cpp
The file was modifiedbolt/Passes/DataflowAnalysis.h
The file was modifiedbolt/Passes/StackPointerTracking.h
The file was modifiedbolt/Passes/IndirectCallPromotion.h
The file was modifiedbolt/Passes/DominatorAnalysis.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Passes/DataflowAnalysis.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/CacheMetrics.cpp
The file was modifiedbolt/Passes/ShrinkWrapping.h
The file was addedbolt/Target/X86/CMakeLists.txt
The file was modifiedbolt/CMakeLists.txt
The file was modifiedbolt/Passes/RegAnalysis.cpp
The file was modifiedbolt/Passes/RegReAssign.cpp
The file was addedbolt/Target/AArch64/CMakeLists.txt
The file was modifiedbolt/Passes/ShrinkWrapping.cpp
The file was modifiedbolt/Passes/Inliner.cpp
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/Passes/LivenessAnalysis.h
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/Passes/FrameOptimizer.cpp
The file was modifiedbolt/ProfileWriter.cpp
The file was modifiedbolt/BinaryContext.cpp
The file was addedbolt/MCPlusBuilder.h
Commit 598a346abf305260f0bec9310bbb74054eabb628 by maks
[BOLT] Fix assertion when setting size of jump table symbol

Summary: This assertion was making sure that when we patched up symbol sizes that we wouldn't modify the size of a symbol that has already had its size set.  The issue here is that private symbols are sometimes composed of multiple objects internally (e.g. jump tables).  In this particular case a jump table started at the same address as the private data blob it was contained in.  Currently, there isn't any good way of differentiating symbols that start at the same address (except possibly using multimaps for certain data structures).  I'm hacking around it by modifying the assertion to ignore jump tables and skip setting the size when it has already been set.  This shouldn't affect any existing optimizations since the only thing that depended on sizes is data reordering and that currently ignores jump tables and private data blobs.

(cherry picked from FBD7269207)
The file was modifiedbolt/BinaryContext.cpp
Commit faacdf60801b2960c457c83ca6cfdc8deeb9b4ce by maks
[BOLT] Fix assertion when building test binary

Summary:
The binary had some unexpected overlapping symbols:

.str.34.llvm.2944770977690351622/1 address = 0x48e9ec7, next address =
   0x48e9ed2, size = 21
PG.LC135/1 address = 0x48e9ed2, next address = 0x48e9eef, size = 29

BOLT wasn't expecting this type of overlap when generating HOLE symbols,
so it was asserting.  I've changed the code to deal with this case.
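
The overlap in the dump above can be checked by hand: the first symbol
starts at 0x48e9ec7 with size 21, so it ends at 0x48e9edc, which is 10
bytes past the start of the next symbol at 0x48e9ed2. A minimal sketch
of that check follows. SymbolInfo, overlapsNext and overlapBytes are
hypothetical names for illustration, not BOLT's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a data symbol: start address and size in bytes.
struct SymbolInfo {
  uint64_t Address;
  uint64_t Size;
};

// True when the symbol extends past the start of the next symbol.
bool overlapsNext(const SymbolInfo &Sym, uint64_t NextAddress) {
  assert(NextAddress >= Sym.Address && "symbols must be sorted by address");
  return Sym.Address + Sym.Size > NextAddress;
}

// Number of bytes by which the symbol runs into the next one (0 if none).
uint64_t overlapBytes(const SymbolInfo &Sym, uint64_t NextAddress) {
  return overlapsNext(Sym, NextAddress)
             ? Sym.Address + Sym.Size - NextAddress
             : 0;
}
```

With the values from the dump, the first symbol overlaps its neighbor
and the second one (size 29, ending exactly at 0x48e9eef) does not.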

I'll need to change the reordering pass to mark these types of symbols
as unmoveable for now.

(cherry picked from FBD7304195)
The file was modifiedbolt/BinaryContext.cpp
Commit 3458e92285f965a459b61a061577fc5a9422bb19 by maks
removing compact-mode

Summary: this is not needed but makes code harder to read; hence, removing

(cherry picked from FBD7257937)
The file was modifiedbolt/BinaryFunctionProfile.cpp
Commit 0dea33737a7970083144e85b858448e957334f33 by maks
[BOLT] improvements for CFG construction

Summary:
Some improvements for CFG construction:
- getting rid of fallthrough inference, as this is already
done by DataAggregator;
- adjusting block counts for blocks with non-zero outgoing edges
to make sure they're not outlined;
- making sure that all functions (including non-simple ones) are
reordered and placed in the hot section.

The main goal of the diff is to make sure that constructed CFG graphs
exactly correspond to the input profile data.

(cherry picked from FBD7323205)
The file was modifiedbolt/Passes/BinaryFunctionCallGraph.cpp
The file was modifiedbolt/Passes/CachePlusReorderAlgorithm.cpp
The file was modifiedbolt/Passes/HFSortPlus.cpp
The file was modifiedbolt/BinaryFunctionProfile.cpp
The file was modifiedbolt/Passes/ReorderFunctions.cpp
Commit a62f4fda4650f0758ab021ea5cbda0cd0662e385 by maks
[BOLT][Refactoring] Isolate changes to MC layer

Summary:
Changes that we made to MCInst, MCOperand, MCExpr, etc. are now all
moved into tools/llvm-bolt. That required a change to the way we handle
annotations and any extra operands for MCInst.

Any MCPlus information is now attached via an extra operand of type
MCInst with an opcode ANNOTATION_LABEL. Since this operand is MCInst, we
attach extra info as operands to this instruction. For first-level
annotations use functions to access the information, such as
getConditionalTailCall() or getEHInfo(), etc. For the rest, optional or
second-class annotations, use a general named-annotation interface such
as getAnnotationAs<uint64_t>(Inst, "Count").

I ran a test on the HHVM binary: memory consumption went down a little,
while the runtime remained the same.

(cherry picked from FBD7405412)
The file was modifiedbolt/MCPlusBuilder.cpp
The file was modifiedbolt/Passes/DataflowAnalysis.h
The file was modifiedbolt/Passes/ShrinkWrapping.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/Passes/JTFootprintReduction.cpp
The file was addedbolt/MCPlus.h
The file was modifiedbolt/Exceptions.cpp
The file was modifiedbolt/MCPlusBuilder.h
The file was modifiedbolt/Passes/FrameAnalysis.cpp
The file was modifiedbolt/Target/X86/X86MCPlusBuilder.cpp
The file was modifiedbolt/Passes/BinaryPasses.h
The file was modifiedbolt/Passes/ShrinkWrapping.cpp
The file was modifiedbolt/Passes/Inliner.cpp
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/ProfileReader.cpp
The file was modifiedbolt/Target/AArch64/AArch64MCPlusBuilder.cpp
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/Passes/LongJmp.cpp
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/BinaryFunctionProfile.cpp
The file was modifiedbolt/Passes/RegReAssign.cpp
The file was modifiedbolt/Passes/DataflowAnalysis.cpp
The file was modifiedbolt/Passes/StokeInfo.cpp
The file was modifiedbolt/BinaryContext.cpp
The file was modifiedbolt/Passes/StackPointerTracking.h
The file was modifiedbolt/Passes/StackAllocationAnalysis.cpp
Commit 77f35bd0e9baecbe1041087fa445c8795d9cff61 by maks
[BOLT] Fix iterator issue

Summary:
Getting a forward iterator from reverse iterator was implemented
incorrectly. For some reason erase worked on it, but it's clearly wrong
and printing the instruction (before the deletion) results in an error.
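
The commit message does not show the corrected code. As a reminder of
the standard-library rule involved: for a reverse iterator RIt, the
plain RIt.base() refers to the element after the one RIt designates,
so the forward iterator for the same element is std::next(RIt).base().
A minimal sketch, with InstList, toForward and eraseAt as hypothetical
names and std::vector<int> standing in for an instruction list:

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Stand-in for a list of instructions (an assumption for illustration).
using InstList = std::vector<int>;

// Forward iterator referring to the same element as RIt.
// Note: RIt.base() alone would point one element past it.
InstList::iterator toForward(InstList::reverse_iterator RIt) {
  return std::next(RIt).base();
}

// Erase the element designated by a reverse iterator.
void eraseAt(InstList &Insts, InstList::reverse_iterator RIt) {
  assert(RIt != Insts.rend() && "cannot erase at rend()");
  Insts.erase(toForward(RIt));
}
```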

(cherry picked from FBD7457457)
The file was modifiedbolt/BinaryFunction.cpp
Commit 0d729f218b34eb743cc6395c43d4f0dcab8d225b by maks
[BOLT] Fix relocation verification

Summary:
We verify that relocation information matches a value stored in a
binary, i.e. "ExtractedValue == SymbolValue + Addend". However, because
of the size of the relocation, and the fact that an addend is always
of type int64_t, we have to sign-extend the extracted value, and then we
might get mismatch in higher bits in certain scenarios. Hence, we should
only compare values that are truncated to a relocation size.
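
A minimal sketch of the truncated comparison described above, assuming
relocation sizes of 1 to 8 bytes. truncateToSize and relocationMatches
are hypothetical helpers for illustration, not the functions in
RewriteInstance.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low SizeInBytes bytes of Value.
uint64_t truncateToSize(uint64_t Value, unsigned SizeInBytes) {
  assert(SizeInBytes >= 1 && SizeInBytes <= 8 && "unsupported size");
  if (SizeInBytes == 8)
    return Value;
  return Value & ((uint64_t{1} << (SizeInBytes * 8)) - 1);
}

// Check "ExtractedValue == SymbolValue + Addend", but only on the bits
// that the relocation actually covers.
bool relocationMatches(uint64_t ExtractedValue, uint64_t SymbolValue,
                       int64_t Addend, unsigned SizeInBytes) {
  const uint64_t Expected = SymbolValue + static_cast<uint64_t>(Addend);
  return truncateToSize(ExtractedValue, SizeInBytes) ==
         truncateToSize(Expected, SizeInBytes);
}
```

For example, a 4-byte field holding 0xFFFFFFF0 matches symbol value
0x10 with addend -0x20 once both sides are truncated to 32 bits, while
a full 64-bit comparison of the same values reports a mismatch in the
higher bits.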

Discovered while processing hhvm binary with modified compiler flags.

(cherry picked from FBD7462559)
The file was modifiedbolt/RewriteInstance.cpp
Commit 7956da0fe8f125dcc59a05165419a6ea2655315d by maks
[BOLT] Fix CFG in BinaryFunction::eraseInvalidBBs()

Summary:
When we erase invalid/unreachable basic blocks, we have to remove them
from a list of predecessors of regular blocks, otherwise the CFG will be
left in a broken state containing references to removed basic blocks.
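
The idea can be sketched on a toy CFG as follows. Block and
eraseInvalidBlocks are simplified, hypothetical stand-ins; the real
BinaryBasicBlock and BinaryFunction classes carry much more state.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Toy basic block: validity flag plus predecessor/successor edges.
struct Block {
  bool IsValid = true;
  std::vector<Block *> Predecessors;
  std::vector<Block *> Successors;
};

// Erase invalid blocks and drop every edge in the surviving blocks
// that still points at one of them.
void eraseInvalidBlocks(std::vector<std::unique_ptr<Block>> &Blocks) {
  auto IsInvalid = [](const Block *B) { return !B->IsValid; };
  for (auto &BB : Blocks) {
    if (!BB->IsValid)
      continue;
    auto &Preds = BB->Predecessors;
    Preds.erase(std::remove_if(Preds.begin(), Preds.end(), IsInvalid),
                Preds.end());
    auto &Succs = BB->Successors;
    Succs.erase(std::remove_if(Succs.begin(), Succs.end(), IsInvalid),
                Succs.end());
  }
  // Only now is it safe to destroy the invalid blocks themselves.
  Blocks.erase(std::remove_if(Blocks.begin(), Blocks.end(),
                              [](const std::unique_ptr<Block> &BB) {
                                return !BB->IsValid;
                              }),
               Blocks.end());
}
```

The edge cleanup runs before the blocks are destroyed, so the
predicate never dereferences a dangling pointer.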

(cherry picked from FBD7464292)
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/Passes/LongJmp.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/Passes/Inliner.cpp
The file was modifiedbolt/Passes/IndirectCallPromotion.cpp
Commit d8cf08b243bac49285075f7b700272e94b40cfdf by maks
[BOLT] Use MCPlus::getNumPrimeOperands()

Summary:
Use MCPlus::getNumPrimeOperands() to get the real number of operands
on MCInst. Alternatively, use MCInstrDesc::getNumOperands().

(cherry picked from FBD7507666)
The file was modifiedbolt/MCPlusBuilder.h
The file was modifiedbolt/Passes/AllocCombiner.cpp
The file was modifiedbolt/Target/AArch64/AArch64MCPlusBuilder.cpp
Commit 489e51453089e8b5e8d5e2ef16244b84598dec5d by maks
[BOLT] Improve annotations format and processing

Summary:
Change the way annotations are stored and processed.

Embed annotation type/index into immediate value stored as an operand.
This limits the effective range of values that could be stored as
annotations to 56 bits, which is still plenty for most integer types
that we use and for pointers on real systems. High 8 bits are reserved
for storing annotation type/index.

Expand the interface for general annotations to include reference to
annotations by index. The main purpose of this interface is to improve
performance of annotations that are used by heavy (>O(N)) algorithms,
such as data flow analysis.
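
A minimal sketch of the packing scheme described above: the high 8 bits
of a 64-bit immediate hold the annotation type/index and the low 56
bits hold the value. The function names, and the choice to sign-extend
the value on extraction, are assumptions for illustration and are not
taken from MCPlus.h.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned ValueBits = 56;
constexpr uint64_t ValueMask = (uint64_t{1} << ValueBits) - 1;

// Pack an 8-bit annotation index and a 56-bit value into one immediate.
int64_t encodeAnnotation(uint8_t Index, int64_t Value) {
  const uint64_t Packed = (uint64_t{Index} << ValueBits) |
                          (static_cast<uint64_t>(Value) & ValueMask);
  return static_cast<int64_t>(Packed);
}

// Recover the annotation index from the high 8 bits.
uint8_t extractIndex(int64_t Imm) {
  return static_cast<uint8_t>(static_cast<uint64_t>(Imm) >> ValueBits);
}

// Recover the value from the low 56 bits, sign-extending bit 55
// (assumed here so that small negative values round-trip).
int64_t extractValue(int64_t Imm) {
  uint64_t Low = static_cast<uint64_t>(Imm) & ValueMask;
  if (Low & (uint64_t{1} << (ValueBits - 1)))
    Low |= ~ValueMask;
  return static_cast<int64_t>(Low);
}
```

Values that need more than 56 bits do not survive the round trip,
which is the range limitation the summary mentions.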

For -frame-opt pass, new memory usage and processing times are slightly
better compared to those before refactoring.

(cherry picked from FBD7492017)
The file was modifiedbolt/MCPlusBuilder.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/MCPlus.h
The file was modifiedbolt/Target/X86/X86MCPlusBuilder.cpp
The file was modifiedbolt/Passes/DataflowAnalysis.h
The file was modifiedbolt/MCPlusBuilder.h
The file was modifiedbolt/Passes/ShrinkWrapping.cpp
The file was modifiedbolt/Passes/ShrinkWrapping.h
The file was modifiedbolt/Passes/BinaryPasses.cpp
Commit 7df6a6d5c6f8c4bbcc34bd36709a03f6e8bb963a by maks
[BOLT-AArch64] Fix AArch64 port - make it work with hhvm

Summary:
This diff has 3 fixes. The first fixes the way relocations are read
and interpreted for AArch64, so the references are preserved correctly.
The second fixes constant islands so they can live at the very first
address of a function (which means there is no code and the function
contains just a constant island).
The third fixes function splitting to not outline entry points on
AArch64. This was done because some functions may load pointers to their
internal basic blocks by issuing a short-range ADR instruction
without its pair ADRP (since the size of the function is supposed to
be small). But when we move such a block to a cold region, that is no
longer the case. Since blocks with a reference are marked as entry
points, we conservatively disable outlining for them on AArch64.

(cherry picked from FBD7505067)
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/Passes/LongJmp.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Passes/BinaryPasses.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/MCPlusBuilder.h
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/Target/AArch64/AArch64MCPlusBuilder.cpp
Commit 190693059a08913ac43a19df371f52452f32f59a by maks
[merge-fdata] Rewrite merge-fdata to use YAML format

Summary:
merge-fdata now operates on .fdata files in YAML format. The previous
format is not supported, which means that non-LBR data cannot be
merged and memory data has to be merged with the "cat" command.

(cherry picked from FBD7544031)
The file was modifiedbolt/merge-fdata/merge-fdata.cpp
The file was modifiedbolt/ProfileYAMLMapping.h
Commit 487877007254709cb5f9587c9e60a9cc273ea037 by maks
[BOLT][Cleanup] Remove branch history

Summary:
We are not using branch histories and don't have plans to.
Clean up the code.

(cherry picked from FBD7588644)
The file was modifiedbolt/DataReader.h
The file was modifiedbolt/DataAggregator.cpp
The file was modifiedbolt/DataReader.cpp
Commit 8b049d3c7f9382e86f77576b0a6392ef6a8287a4 by maks
[BOLT] Support for non-LBR profile in YAML

Summary:
Expanded YAML profile format to support different kinds of profile
including LBR and non-LBR (and memevents in the future).

The profile now starts with a header that includes the profile
description. "profile-flags" field includes either "lbr" or "sample",
but not both at the same time. It could also include "memevent" in
addition to other flags.
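
The rule for "profile-flags" can be sketched as a small validity check.
How the flags are actually represented in ProfileYAMLMapping.h is not
shown here, so the list-of-strings form and the isValidProfileFlags
name below are assumptions for illustration only.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Valid when exactly one of "lbr" and "sample" is present.
// "memevent" may accompany either and does not affect validity.
bool isValidProfileFlags(const std::vector<std::string> &Flags) {
  auto Has = [&Flags](const char *Name) {
    return std::find(Flags.begin(), Flags.end(), Name) != Flags.end();
  };
  return Has("lbr") != Has("sample");
}
```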

For now, the only way to generate non-LBR YAML profile is through
conversion. Once task is done, it should be possible to use
perf2bolt for it.

(cherry picked from FBD7595693)
The file was modifiedbolt/ProfileReader.cpp
The file was modifiedbolt/BinaryFunctionProfile.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/ProfileWriter.cpp
The file was modifiedbolt/ProfileReader.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/ProfileWriter.h
The file was modifiedbolt/merge-fdata/merge-fdata.cpp
The file was modifiedbolt/ProfileYAMLMapping.h
The file was modifiedbolt/Target/AArch64/AArch64MCPlusBuilder.cpp
The file was modifiedbolt/merge-fdata/CMakeLists.txt
Commit dc12911feaf21107e76a11ee8ee4668816130c22 by maks
[BOLT] Report when operating in relocation mode

Summary:
Since BOLT can use relocations in the binary automatically, it's not
always clear if we are operating in relocation mode or not. This diff
adds a "BOLT-INFO" message indicating whether relocation mode is ON.

(cherry picked from FBD7557492)
The file was modifiedbolt/RewriteInstance.cpp
Commit c13cd9084dc2c58a8ea055118a08757762a85c94 by maks
[BOLT] Fix tests

Summary:
During a rebase function hashes changed and new profile
stopped matching functions.

(cherry picked from FBD7618919)
The file was modifiedbolt/ProfileReader.cpp
Commit 120d26727a2aa8d47a33d30cb32577c25a9430dd by maks
[BOLT] Restore macro-fusion optimization

Summary:
Restore the optimization with some modifications:
  * Only enabled in relocation mode.
  * Covers instructions other than TEST/CMP.
  * Prints missed macro-fusion opportunities for input.
  * By default enabled for all hot code.
  * Without profile enabled for all code.

The new command-line option:
  -align-macro-fusion - fix instruction alignment for macro-fusion (x86 relocation mode)
      =none   - do not insert alignment no-ops for macro-fusion
      =hot    - only insert alignment no-ops on hot execution paths (default)
      =all    - always align instructions to allow macro-fusion

(cherry picked from FBD7644042)
The file was modifiedbolt/BinaryBasicBlock.cpp
The file was modifiedbolt/BinaryBasicBlock.h
The file was modifiedbolt/BinaryContext.h
The file was modifiedbolt/Target/X86/X86MCPlusBuilder.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Target/AArch64/AArch64MCPlusBuilder.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/MCPlusBuilder.h
The file was modifiedbolt/RewriteInstance.h
The file was modifiedbolt/BinaryFunction.h
Commit a30fff6e36f3505fddfddb2d8e196b46181efafe by maks
[BOLT-AArch64] Fix BOLT build on AArch64

Summary:
Whenever building BOLT on an AArch64 box, we need to make sure
we do not run tests that are exclusive to x86. This diff also adds a tag
for expensive tests, so the user can disable them, which is useful when
using a memory-constrained machine to run BOLT tests. It also removes
ifdefs that caused BOLT to behave differently when running on a non-x86
host. Finally, it changes a case where we depended on an updated libstdc++
implementation of insert, to make the codebase friendlier to boxes
that do not have the newer version of the library.

(cherry picked from FBD7625715)
The file was modifiedbolt/BinaryPassManager.cpp
The file was modifiedbolt/Passes/BinaryPasses.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/Target/X86/X86MCPlusBuilder.cpp
Commit db949fc1f5f4a4c8d2647998b7bf1e8cacb4e034 by maks
[PERF2BOLT] Add support for non-LBR aggregation

Summary:
Previously, we depended on the python script perf2bolt.py whenever
operating with non-LBR data.

(cherry picked from FBD7620125)
The file was modifiedbolt/ProfileReader.cpp
The file was modifiedbolt/BinaryFunctionProfile.cpp
The file was modifiedbolt/DataReader.cpp
The file was modifiedbolt/DataAggregator.cpp
The file was modifiedbolt/DataAggregator.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/ProfileReader.h
The file was modifiedbolt/DataReader.h
Commit aa91281ac3250fbae424b4166012aecdafeea880 by maks
[BOLT] improving cache metrics

Summary: Modifying parameters of block reordering algorithm that result in better performance. Additionally extending some cache-related metrics

(cherry picked from FBD7578336)
The file was modifiedbolt/Passes/CachePlusReorderAlgorithm.cpp
The file was modifiedbolt/CacheMetrics.cpp
The file was modifiedbolt/Passes/HFSortPlus.cpp
Commit d6003e94eb0efa721b6774cc514adb4b3b472ec2 by maks
[BOLT-AArch64] Fix -icf, -use-old-text and -update-debug-sections

Summary:
Refactor MCInst comparison code to support target-dependent
functionality. This was necessary because AArch64 uses MCTargetExprs
that only the AArch64 backend knows how to unpack and compare. Also
fix a bug where a relocation against a constant island would make BOLT
create a fixed reference against a code location in a similar way to
read-only data, so when we asked to -use-old-text, the code would break
for this particular HHVM function
(_ZN5folly2io4zlib18defaultZlibOptionsEv) because the reference now
contains invalid data, since the original .text was overwritten. Finally,
fix a bug with -update-debug-sections on AArch64 where the update
loop wasn't expecting a function with zero basic blocks, which can
happen on AArch64 because some functions contain just a constant
island.

(cherry picked from FBD7679244)
The file was modifiedbolt/Target/AArch64/AArch64MCPlusBuilder.cpp
The file was modifiedbolt/MCPlus.h
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/RewriteInstance.cpp
The file was modifiedbolt/Passes/ShrinkWrapping.cpp
The file was modifiedbolt/BinaryFunction.cpp
The file was modifiedbolt/MCPlusBuilder.h
The file was modifiedbolt/MCPlusBuilder.cpp
Commit caad4bcf3a500cba82f1083c81d74bdd5c8c40b2 by maks
[BOLT] Fix crash while writing new profile

Summary:
The new profile writer was crashing because functions were lacking profile
flags. Fix it by requiring flags when marking a function as profiled.

Generate new profile for clang. The new profile has more coverage and
results in better overall improvement from BOLT. It was generated by
merging multiple runs of:

% perf record -e cycles:u -j any,u -F32000 -- \
    ./clang bf.cpp -O2 -std=c++11 -c -o /tmp/bf.o

(cherry picked from FBD7798580)
The file was modifiedbolt/BinaryFunction.h
The file was modifiedbolt/DataAggregator.cpp
The file was modifiedbolt/BinaryFunctionProfile.cpp
The file was modifiedbolt/ProfileReader.cpp
The file was modifiedbolt/ProfileWriter.cpp
Commit 9c6f9656166a0336829242b377b2b592addc3e38 by maks
[BOLT] Getting open-source ready

Summary:
BOLT sources are being moved under tools/llvm-bolt/src
and tools/llvm-bolt will contain more files such as LICENSE.txt,
README.txt, etc.

Remove trailing white spaces from our sources.

Create llvm.patch by running

  > git diff f137ed238db11440f03083b1c88b7ffc0f4af65e include lib > \
    tools/llvm-bolt/llvm.patch

README.txt has instructions on checking out sources and applying the
patch.

(cherry picked from FBD7878380)
The file was addedbolt/LICENSE.TXT
The file was addedbolt/src/Passes/CallGraph.cpp
The file was addedbolt/README.txt
The file was removedbolt/Passes/DominatorAnalysis.h
The file was addedbolt/src/Passes/RegAnalysis.cpp
The file was addedbolt/src/DataAggregator.cpp
The file was removedbolt/Passes/IndirectCallPromotion.h
The file was removedbolt/BinaryPassManager.cpp
The file was removedbolt/llvm-bolt.cpp
The file was addedbolt/src/BinaryBasicBlock.cpp
The file was removedbolt/BinarySection.h
The file was removedbolt/DataReader.cpp
The file was removedbolt/CacheMetrics.h
The file was removedbolt/RewriteInstance.cpp
The file was addedbolt/src/Passes/JTFootprintReduction.cpp
The file was removedbolt/Passes/AllocCombiner.h
The file was removedbolt/Passes/Inliner.cpp
The file was addedbolt/src/Passes/StackPointerTracking.cpp
The file was addedbolt/src/DataAggregator.h
The file was addedbolt/src/Passes/DataflowInfoManager.h
The file was addedbolt/src/Relocation.h
The file was addedbolt/src/BinaryContext.cpp
The file was addedbolt/src/CacheMetrics.cpp
The file was addedbolt/src/Passes/BinaryPasses.cpp
The file was addedbolt/src/Passes/PettisAndHansen.cpp
The file was removedbolt/DataAggregator.cpp
The file was removedbolt/Passes/RegAnalysis.h
The file was addedbolt/src/Passes/Aligner.h
The file was removedbolt/Target/X86/X86MCPlusBuilder.cpp
The file was removedbolt/BinaryFunction.h
The file was removedbolt/ProfileWriter.h
The file was addedbolt/src/Passes/ReorderAlgorithm.cpp
The file was addedbolt/src/ProfileWriter.h
The file was removedbolt/Passes/PLTCall.h
The file was addedbolt/src/Passes/StackAllocationAnalysis.h
The file was addedbolt/src/Target/CMakeLists.txt
The file was addedbolt/src/Target/AArch64/CMakeLists.txt
The file was addedbolt/src/Passes/BinaryFunctionCallGraph.h
The file was removedbolt/BinaryData.cpp
The file was removedbolt/Passes/JTFootprintReduction.cpp
The file was removedbolt/ReorderAlgorithm.cpp
The file was removedbolt/BinaryBasicBlock.cpp
The file was removedbolt/Passes/PLTCall.cpp
The file was addedbolt/src/Passes/PLTCall.h
The file was removedbolt/Passes/StackReachingUses.cpp
The file was addedbolt/src/CacheMetrics.h
The file was removedbolt/Passes/BinaryPasses.h
The file was removedbolt/BinaryPasses.h
The file was addedbolt/src/Passes/CallGraph.h
The file was addedbolt/src/Passes/ReorderFunctions.h
The file was removedbolt/Passes/RegReAssign.h
The file was removedbolt/BinaryContext.cpp
The file was addedbolt/src/DWARFRewriter.cpp
The file was addedbolt/src/ProfileYAMLMapping.h
The file was addedbolt/src/Passes/ReorderUtils.h
The file was removedbolt/Passes/CallGraphWalker.cpp
The file was removedbolt/Passes/LivenessAnalysis.h
The file was addedbolt/src/ProfileReader.h
The file was removedbolt/DWARFRewriter.cpp
The file was removedbolt/Exceptions.h
The file was removedbolt/Passes/HFSort.cpp
The file was addedbolt/src/BinaryFunction.h
The file was removedbolt/merge-fdata/merge-fdata.cpp
The file was addedbolt/src/Passes/ReachingDefOrUse.h
The file was removedbolt/JumpTable.cpp
The file was removedbolt/Passes/FrameOptimizer.h
The file was removedbolt/DataReader.h
The file was removedbolt/Passes/FrameOptimizer.cpp
The file was addedbolt/src/Passes/HFSortPlus.cpp
The file was addedbolt/src/Passes/LivenessAnalysis.h
The file was removedbolt/Passes/ReachingInsns.h
The file was addedbolt/src/Passes/ReorderFunctions.cpp
The file was addedbolt/src/Passes/BinaryPasses.h
The file was removedbolt/Passes/StackPointerTracking.cpp
The file was addedbolt/src/Passes/FrameAnalysis.cpp
The file was addedbolt/src/RewriteInstance.h
The file was removedbolt/ProfileReader.h
The file was addedbolt/src/Passes/CallGraphWalker.cpp
The file was removedbolt/ProfileWriter.cpp
The file was addedbolt/src/BinaryContext.h
The file was removedbolt/Passes/CallGraph.cpp
The file was removedbolt/Passes/LivenessAnalysis.cpp
The file was addedbolt/src/Passes/ReachingInsns.h
The file was removedbolt/MCPlus.h
The file was removedbolt/Exceptions.cpp
The file was addedbolt/src/DebugData.cpp
The file was addedbolt/src/Exceptions.cpp
The file was removedbolt/DebugData.cpp
The file was removedbolt/Passes/BinaryPasses.cpp
The file was addedbolt/src/Passes/FrameAnalysis.h
The file was removedbolt/Passes/CMakeLists.txt
The file was addedbolt/src/Passes/HFSort.cpp
The file was removedbolt/BinarySection.cpp
The file was removedbolt/Passes/RegAnalysis.cpp
The file was removedbolt/BinaryFunction.cpp
The file was removedbolt/JumpTable.h
The file was removedbolt/Passes/RegReAssign.cpp
The file was removedbolt/BinaryBasicBlock.h
The file was removedbolt/Passes/ReorderFunctions.h
The file was removedbolt/Passes/StackAvailableExpressions.h
The file was addedbolt/src/BinarySection.cpp
The file was addedbolt/src/Passes/AllocCombiner.h
The file was addedbolt/src/Passes/DataflowAnalysis.cpp
The file was addedbolt/src/BinaryData.cpp
The file was addedbolt/src/CMakeLists.txt
The file was addedbolt/src/DebugData.h
The file was removedbolt/Passes/ShrinkWrapping.cpp
The file was addedbolt/llvm.patch
The file was removedbolt/BoltDiff.cpp
The file was addedbolt/src/Passes/IndirectCallPromotion.cpp
The file was removedbolt/Passes/StackReachingUses.h
The file was addedbolt/src/Target/X86/CMakeLists.txt
The file was addedbolt/src/merge-fdata/merge-fdata.cpp
The file was removedbolt/BinaryPassManager.h
The file was addedbolt/src/MCPlus.h
The file was removedbolt/Passes/ReorderAlgorithm.cpp
The file was removedbolt/Passes/IndirectCallPromotion.cpp
The file was addedbolt/src/Passes/RegReAssign.h
The file was removedbolt/Passes/ReorderAlgorithm.h
The file was addedbolt/src/BinaryFunctionProfile.cpp
The file was removedbolt/Passes/CachePlusReorderAlgorithm.cpp
The file was addedbolt/src/Passes/ShrinkWrapping.cpp
The file was addedbolt/src/Relocation.cpp
The file was addedbolt/src/Passes/StackAvailableExpressions.cpp
The file was removedbolt/Passes/ReorderUtils.h
The file was removedbolt/Target/AArch64/CMakeLists.txt
The file was removedbolt/Passes/StackAvailableExpressions.cpp
The file was removedbolt/Passes/DataflowInfoManager.cpp
The file was removedbolt/merge-fdata/CMakeLists.txt
The file was removedbolt/Relocation.cpp
The file was addedbolt/src/Passes/Inliner.h
The file was removedbolt/MCPlusBuilder.h
The file was removedbolt/Passes/ReachingDefOrUse.h
The file was addedbolt/src/Passes/StackReachingUses.cpp
The file was removedbolt/BinaryData.h
The file was removedbolt/Passes/LongJmp.cpp
The file was addedbolt/src/ProfileWriter.cpp
The file was addedbolt/src/Passes/IndirectCallPromotion.h
The file was addedbolt/src/DataReader.cpp
The file was removedbolt/ReorderAlgorithm.h
The file was removedbolt/Passes/JTFootprintReduction.h
The file was addedbolt/src/MCPlusBuilder.h
The file was modifiedbolt/CMakeLists.txt
The file was addedbolt/src/DataReader.h
The file was addedbolt/src/Passes/StackAllocationAnalysis.cpp
The file was removedbolt/BinaryContext.h
The file was removedbolt/Target/AArch64/AArch64MCPlusBuilder.cpp
The file was addedbolt/src/RewriteInstance.cpp
The file was removedbolt/Passes/BinaryFunctionCallGraph.cpp
The file was removedbolt/Passes/FrameAnalysis.cpp
The file was addedbolt/src/BinaryLoop.h
The file was removedbolt/Passes/StokeInfo.h
The file was addedbolt/src/Passes/DataflowAnalysis.h
The file was removedbolt/BinaryPasses.cpp
The file was addedbolt/src/JumpTable.h
The file was addedbolt/src/Passes/CallGraphWalker.h
The file was removedbolt/Passes/DataflowAnalysis.cpp
The file was removedbolt/Passes/StackAllocationAnalysis.h
The file was removedbolt/merge-fdata/LLVMBuild.txt
The file was addedbolt/src/Passes/AllocCombiner.cpp
The file was addedbolt/src/JumpTable.cpp
The file was removedbolt/Relocation.h
The file was addedbolt/src/Passes/PLTCall.cpp
The file was removedbolt/BinaryLoop.h
The file was removedbolt/DebugData.h
The file was removedbolt/Passes/DataflowAnalysis.h
The file was addedbolt/src/Passes/StackAvailableExpressions.h
The file was removedbolt/DataAggregator.h
The file was addedbolt/src/Passes/FrameOptimizer.h
The file was addedbolt/src/BinaryPassManager.h
The file was addedbolt/src/Passes/BinaryFunctionCallGraph.cpp
The file was removedbolt/merge-fdata/Makefile
The file was addedbolt/src/BinaryPassManager.cpp
The file was removedbolt/Passes/CallGraph.h
The file was removedbolt/MCPlusBuilder.cpp
The file was removedbolt/Passes/Inliner.h
The file was addedbolt/src/Exceptions.h
The file was addedbolt/src/Target/AArch64/AArch64MCPlusBuilder.cpp
The file was removedbolt/Passes/LongJmp.h
The file was removedbolt/Passes/ShrinkWrapping.h
The file was addedbolt/src/MCPlusBuilder.cpp
The file was removedbolt/CacheMetrics.cpp
The file was addedbolt/src/BinaryBasicBlock.h
The file was removedbolt/RewriteInstance.h
The file was removedbolt/Passes/StokeInfo.cpp
The file was addedbolt/src/BinaryData.h
The file was addedbolt/src/ProfileReader.cpp
The file was addedbolt/src/Passes/RegAnalysis.h
The file was removedbolt/Passes/StackPointerTracking.h
The file was addedbolt/src/llvm-bolt.cpp
The file was addedbolt/src/Target/X86/X86MCPlusBuilder.cpp
The file was removedbolt/BinaryFunctionProfile.cpp
The file was addedbolt/src/Passes/LivenessAnalysis.cpp
The file was addedbolt/src/Passes/StokeInfo.cpp
The file was removedbolt/Passes/Aligner.cpp
The file was addedbolt/src/Passes/DataflowInfoManager.cpp
The file was addedbolt/src/Passes/Aligner.cpp
The file was removedbolt/Passes/CallGraphWalker.h
The file was addedbolt/src/Passes/ShrinkWrapping.h
The file was removedbolt/Passes/BinaryFunctionCallGraph.h
The file was removedbolt/Passes/StackAllocationAnalysis.cpp
The file was removedbolt/Passes/HFSort.h
The file was removedbolt/ProfileReader.cpp
The file was addedbolt/src/Passes/CMakeLists.txt
The file was addedbolt/src/BinaryFunction.cpp
The file was removedbolt/Target/X86/CMakeLists.txt
The file was addedbolt/src/Passes/DominatorAnalysis.h
The file was removedbolt/Passes/HFSortPlus.cpp
The file was addedbolt/src/Passes/StokeInfo.h
The file was removedbolt/Passes/PettisAndHansen.cpp
The file was removedbolt/ProfileYAMLMapping.h
The file was addedbolt/src/Passes/FrameOptimizer.cpp
The file was removedbolt/Passes/FrameAnalysis.h
The file was removedbolt/Target/CMakeLists.txt
The file was addedbolt/src/Passes/Inliner.cpp
The file was removedbolt/Passes/DataflowInfoManager.h
The file was addedbolt/src/Passes/StackReachingUses.h
The file was addedbolt/src/Passes/StackPointerTracking.h
The file was addedbolt/src/merge-fdata/CMakeLists.txt
The file was addedbolt/src/Passes/JTFootprintReduction.h
The file was addedbolt/src/Passes/LongJmp.cpp
The file was addedbolt/src/BoltDiff.cpp
The file was removedbolt/Passes/Aligner.h
The file was addedbolt/src/Passes/CachePlusReorderAlgorithm.cpp
The file was addedbolt/src/Passes/RegReAssign.cpp
The file was removedbolt/Passes/ReorderFunctions.cpp
The file was addedbolt/src/Passes/ReorderAlgorithm.h
The file was addedbolt/src/BinarySection.h
The file was removedbolt/Passes/AllocCombiner.cpp
The file was addedbolt/src/Passes/HFSort.h
The file was addedbolt/src/Passes/LongJmp.h
Commit bdf21f7617e4155efbf3df7ce277901ca3825ac9 by maks
[BOLT] Align basic blocks based on execution count

Summary:
The default is not changing, i.e. we are not aligning code within a
function by default.

New meaning of options for aligning basic blocks:

  -align-blocks
      triggers basic block alignment based on profile

  -preserve-blocks-alignment
      tries to preserve basic block alignment seen on input

Tuning options for "-align-blocks":
  -align-blocks-min-size=<uint>
      blocks smaller than the specified size wouldn't be aligned

  -align-blocks-threshold=<uint>
      align only blocks whose execution frequency exceeds the given
      percentage of the containing function's execution frequency. E.g.
      1000 means aligning blocks that are executed more than 10 times as
      often as the containing function.
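
A minimal sketch of how the two tuning options could combine into one decision. The function and parameter names are hypothetical and are not taken from Aligner.cpp; the strict "greater than" comparison follows the wording "larger than" above.

```python
def should_align_block(block_size, block_count, function_count,
                       min_size, threshold_percent):
    """Decide whether a basic block qualifies for alignment.

    min_size mirrors -align-blocks-min-size and threshold_percent mirrors
    -align-blocks-threshold (hypothetical names for illustration only).
    """
    # Blocks smaller than the minimum size are never aligned.
    if block_size < min_size:
        return False
    # threshold_percent=1000 requires the block to run more than 10x as
    # often as the containing function. Integer math avoids rounding.
    return block_count * 100 > function_count * threshold_percent
```

For example, with a threshold of 1000, a block executed 1100 times in a function executed 100 times would qualify, while one executed 900 times would not.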

(cherry picked from FBD7921980)
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/Passes/Aligner.cpp
The file was modifiedbolt/src/BinaryBasicBlock.h
The file was modifiedbolt/src/Passes/Aligner.h
Commit 729da2da22740428a84ec27839db6c957893cff2 by maks
[BOLT] Static data reordering pass.

Summary:
Enable BOLT to reorder data sections in a binary based on memory
profiling data.

This diff adds a new pass to BOLT that can reorder data sections for
better locality based on memory profiling data.  For now, the algorithm
to order data is primitive and just relies on the frequency of loads to
order the contents of a section.  We could probably do a lot better by
looking at what functions use the hot data and grouping together hot
data that is used by a single function (or cluster of functions).
Block ordering might give some hints on how to order the data better as
well.

The new pass has two basic modes: inplace and split (when inplace is
false).  The default is split since inplace hasn't really been tested
much.  When splitting is on, the cold data is copied to a "cold" version
of the section while the hot data is kept in the original section, e.g.
for .rodata, .rodata will contain the hot data and .bolt.org.rodata will
contain the cold bits.  In inplace mode, the section contents are
reordered inplace.  In either mode, all relocations to data within that
section are updated to reflect new data locations.
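
The two modes can be illustrated with a short sketch. It assumes that "cold" means an object with no profiled loads, which the summary does not state, and all names are hypothetical rather than taken from ReorderData.cpp.

```python
def reorder_section(objects, load_counts, inplace=False):
    """Order data objects by load frequency, hottest first.

    objects: object names in their original section order.
    load_counts: name -> number of profiled loads (missing means 0).
    Returns (hot_section, cold_section); cold_section is empty in
    inplace mode.
    """
    def count(name):
        return load_counts.get(name, 0)

    # sorted() is stable, so objects with equal counts keep their
    # original relative order.
    ordered = sorted(objects, key=count, reverse=True)
    if inplace:
        return ordered, []
    # Assumption: an object is cold when it has no profiled loads.
    hot = [name for name in ordered if count(name) > 0]
    cold = [name for name in ordered if count(name) == 0]
    return hot, cold
```

In split mode the first list would stay in the original section (e.g. .rodata) and the second would go to the cold copy; relocation updates are outside the scope of this sketch.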

Things to improve:
- The current algorithm is really dumb and doesn't seem to lead to any
  wins.  It certainly could use some improvement.
- Private symbols can have data that leaks over to an adjacent symbol,
  e.g. a string that has a common suffix can start in one symbol and
  leak over (with the common suffix) into the next.  For now, we punt on
  adjacent private symbols.
- Handle ambiguous relocations better.  Section relocations that point
  to the boundary of two symbols will prevent the adjacent symbols from
  being moved because we can't tell which symbol the relocation is for.
- Handle jump tables.  Right now jump table support must be basic if
  data reordering is enabled.
- Handle TLS.  A good amount of data accesses in some
  binaries happen in TLS.  It would be worthwhile to be able to
  reorder any TLS sections too.
- Handle sections with writeable data.  This hasn't been tested so
  probably won't work.  We could try to prevent false sharing in
  writeable sections as well.
- A pie in the sky goal would be to use DWARF info to reorder types.

(cherry picked from FBD6792876)
The file was modifiedbolt/src/BinaryContext.cpp
The file was modifiedbolt/src/Passes/IndirectCallPromotion.cpp
The file was addedbolt/src/Passes/ReorderData.h
The file was modifiedbolt/src/RewriteInstance.h
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/BinarySection.h
The file was modifiedbolt/src/BinaryPassManager.cpp
The file was modifiedbolt/src/JumpTable.cpp
The file was modifiedbolt/src/Passes/IndirectCallPromotion.h
The file was addedbolt/src/Passes/ReorderData.cpp
The file was modifiedbolt/src/BinarySection.cpp
The file was modifiedbolt/src/Target/X86/X86MCPlusBuilder.cpp
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/BinaryData.h
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/MCPlusBuilder.h
The file was modifiedbolt/src/Passes/CMakeLists.txt
The file was modifiedbolt/src/BinaryData.cpp
Commit e4f39bda51d26078b871ad5a546ee06501e94fd5 by maks
adjusting cache stats for non-simple functions

Summary:
While working with a binary in non-relocation mode, I realized
some cache metrics are not computed correctly. Hence, this fix.
In addition, log the number of functions with modified ordering of
basic blocks, which is helpful for analysis.

(cherry picked from FBD7975392)
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
The file was modifiedbolt/src/CacheMetrics.cpp
Commit 56b38a14c50ab3a8e7c1549823c5718174925151 by maks
[BOLT] Fix dyno-stats for PLT calls

Summary:
To accurately account for PLT optimization, each PLT call should be
counted as an extra indirect call instruction, which in turn is
a load, a call, an indirect call, and an instruction entry in dyno stats.

(cherry picked from FBD7978980)
The file was modifiedbolt/src/BinaryFunction.cpp
Commit 1750fee2ac331261a5fd027eea250f254caf9bbf by maks
[BOLT] Add option to ignore function hash in profile

Summary:
When we make changes to MCInst opcodes (or get changes from upstream),
a hash value for BinaryFunction changes. As a result, we are unable
to match profile generated by a previous version of BOLT.

Add option `-profile-ignore-hash` to match profile while ignoring
function hash value. With this option we match functions with common
names using the number of basic blocks.
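
A rough sketch of the matching rule described above. The dictionary keys are hypothetical and do not reflect the actual data structures in ProfileReader.cpp.

```python
def profile_matches(function, profile, ignore_hash=False):
    """Return True if a profile entry should be applied to a function.

    Both arguments are dicts with hypothetical keys "name", "hash" and
    "num_blocks".
    """
    if function["name"] != profile["name"]:
        return False
    if ignore_hash:
        # With -profile-ignore-hash, fall back to comparing the number
        # of basic blocks instead of the hash value.
        return function["num_blocks"] == profile["num_blocks"]
    return function["hash"] == profile["hash"]
```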

(cherry picked from FBD7983269)
The file was modifiedbolt/src/ProfileReader.cpp
Commit 3af3537383fca997ab6a9a8807060ca777542be1 by maks
[BOLT] Properly handle non-standard function refs

Summary:
Application code can reference functions in a non-standard way, e.g.
using arithmetic and bitmask operations on them. One example is if a
program checks if a function is below a certain address or within
a certain address range to perform a low-level optimization or to generate
proper code (JIT).

Instead of relying on a relocation value (symbol+addend), we use only
the symbol value, and then check if the value is inside the function.
If it is, we treat it as a code reference against location within the
function, otherwise we handle it as a non-standard function reference
and issue a warning.

(cherry picked from FBD7996274)
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/BinaryContext.h
Commit 13968f7fa9c7225c018078c32e48cd6c0f71eb98 by maks
[BOLT] Add option to print functions with bad layout

Summary:
Option `-report-bad-layout=N` prints top N functions with layouts
that have cold blocks placed in the middle of hot blocks. The sorting is
based on execution_count / number_of_basic_blocks formula.
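
The report can be sketched as follows. This assumes that a "cold" block is one with a zero execution count and that the list is sorted in descending order of the score; neither detail is stated in the summary, and the names are hypothetical.

```python
def has_cold_block_between_hot(block_counts):
    """block_counts: execution counts of the blocks in layout order."""
    hot_positions = [i for i, count in enumerate(block_counts) if count > 0]
    if not hot_positions:
        return False
    first, last = hot_positions[0], hot_positions[-1]
    # Any zero-count block between the first and last hot block is a
    # cold block placed in the middle of hot blocks.
    return any(count == 0 for count in block_counts[first:last + 1])


def report_bad_layout(functions, n):
    """functions: list of (name, execution_count, block_counts)."""
    bad = [f for f in functions if has_cold_block_between_hot(f[2])]
    # Score from the summary: execution_count / number_of_basic_blocks.
    bad.sort(key=lambda f: f[1] / len(f[2]), reverse=True)
    return [name for name, _, _ in bad[:n]]
```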

(cherry picked from FBD8051950)
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
The file was modifiedbolt/src/BinaryFunction.h
Commit 6302e18f9466ce50cb9f6dbb10323821267fe776 by maks
[PERF2BOLT] Improve file matching

Summary:
If the input binary for perf2bolt has a build-id and perf data has
recorded build-ids, then try to match them. Adjust the file name if
build-ids match to cover cases where the binary was renamed after data
collection. If there's no matching build-id report an error and exit.

While scanning task events, truncate the name to 15 characters prior to
matching, since that's how names are reported by perf.
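
The truncation step might look like the sketch below. The helper name and the use of the file's base name are assumptions made for illustration.

```python
import os

# Length at which perf reports task names, per the summary above.
PERF_TASK_NAME_LENGTH = 15


def task_name_matches(binary_path, perf_task_name):
    """Compare a binary's file name with a task name reported by perf."""
    name = os.path.basename(binary_path)
    # Truncate before comparing, since perf reports at most 15 characters.
    return name[:PERF_TASK_NAME_LENGTH] == perf_task_name
```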

(cherry picked from FBD8034436)
The file was modifiedbolt/src/DataAggregator.cpp
The file was modifiedbolt/src/RewriteInstance.h
The file was modifiedbolt/src/DataAggregator.h
The file was modifiedbolt/src/RewriteInstance.cpp
Commit 929b0908f711457feae6c6b68d9fa5ac82604987 by maks
[BOLT][NFC] Move ICF pass into a separate file

Summary:
Consolidate code used by identical code folding under
Passes/IdenticalCodeFolding.cpp.

(cherry picked from FBD8109916)
The file was modifiedbolt/src/BinaryBasicBlock.h
The file was modifiedbolt/src/BoltDiff.cpp
The file was modifiedbolt/src/BinaryFunction.h
The file was addedbolt/src/Passes/IdenticalCodeFolding.cpp
The file was modifiedbolt/src/Passes/BinaryPasses.h
The file was addedbolt/src/Passes/IdenticalCodeFolding.h
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
The file was modifiedbolt/src/Passes/CMakeLists.txt
The file was modifiedbolt/src/BinaryPassManager.cpp
The file was modifiedbolt/src/BinaryFunction.cpp
Commit 42e651224146a12d8531923769187b67496cf0b5 by maks
[BOLT-AArch64] Detect linker stubs and address them

Summary:
In AArch64, when the binary gets large, the linker inserts
stubs with 3 instructions: ADRP to load the PC-relative address of
a page; ADD to add the offset of the page; and a branch instruction
to do an indirect jump to the contents of X16 (the linker-reserved
reg). The problem is that the linker does not issue a relocation for
this (since this is not code coming from the assembler), so BOLT has
no idea what the real target is, unless it recognizes these instructions
and extracts the target by combining the operands of the instructions
from the stub. This diff does exactly that.
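
As a sketch of the address arithmetic involved: ADRP yields the 4 KiB page of its own address plus a signed page offset, and ADD then adds a byte offset to that result. The function below is illustrative only and takes already-decoded immediates; it is not the code in AArch64MCPlusBuilder.cpp.

```python
def stub_target(adrp_address, adrp_page_imm, add_imm):
    """Combine the operands of an ADRP/ADD pair into a target address.

    adrp_address:  address of the ADRP instruction.
    adrp_page_imm: signed page offset encoded in ADRP (in 4 KiB pages).
    add_imm:       unsigned immediate of the following ADD.
    """
    # ADRP clears the low 12 bits of the PC and adds the page offset.
    page_base = adrp_address & ~0xFFF
    return page_base + (adrp_page_imm << 12) + add_imm
```

For instance, an ADRP at 0x400120 with a page offset of 0x10 followed by an ADD of 0x2a8 would resolve to 0x4102a8.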

(cherry picked from FBD7882653)
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/MCPlusBuilder.h
The file was modifiedbolt/src/Target/AArch64/AArch64MCPlusBuilder.cpp
Commit b4dbd35d6c183909396611b50bb0c231aaf9548d by maks
[BOLT] Initial support for memcpy() inlining

Summary:
Add "-inline-memcpy" option to inline calls to memcpy() using
"rep movsb" instruction. The pass is X86-specific.

Calls to _memcpy8 are optimized too using a special return value
(dest+size).

The implementation is very primitive in that it does not track liveness
of %rax after return, and performs no %rcx substitution. This is going to get
improved if we find the optimization to be useful.

(cherry picked from FBD8211890)
The file was modifiedbolt/src/MCPlusBuilder.h
The file was modifiedbolt/src/Passes/BinaryPasses.h
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
The file was modifiedbolt/src/BinaryPassManager.cpp
The file was modifiedbolt/src/Target/X86/X86MCPlusBuilder.cpp
The file was modifiedbolt/src/BinaryBasicBlock.h
The file was modifiedbolt/src/BinaryFunction.h
Commit 779541283a3ddeac3975f879bbd4a6a2f0655459 by maks
[BOLT] merging cold basic blocks to reduce #jumps

Summary:
This diff introduces a modification of cache+ block ordering algorithm,
which reorders and merges cold blocks in a function with the goal of reducing
the number of (non-fallthrough) jumps, and thus, the code size.

(cherry picked from FBD8044978)
The file was modifiedbolt/src/Passes/HFSortPlus.cpp
The file was modifiedbolt/src/Passes/CachePlusReorderAlgorithm.cpp
The file was modifiedbolt/src/Passes/ReorderAlgorithm.h
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
Commit 706abb6c9541cb3b18e4093cbfe0e8518c51602d by maks
[BOLT] Hash anonymous symbol names

Summary:
This diff replaces the addresses in all the {SYMBOLat,HOLEat,DATAat} symbols with hash values based on the data contained in the symbol.  It should make the profiling data for anonymous symbols robust to address changes.

The only small problem with this approach is that the hashed names for padding symbols of the same size collide frequently.  This shouldn't be a big deal since it would be weird if those symbols were hot.

On a test run with hhvm there were 26 collisions (out of ~338k symbols).  Most of the collisions were from small (2,4,8 byte) objects.
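
The idea can be sketched as below. The hash function (SHA-1 truncated to 16 hex digits) and the exact name layout are assumptions chosen for illustration; the summary does not say which hash BOLT uses.

```python
import hashlib


def anonymous_symbol_name(prefix, contents):
    """Build a name such as "DATAat<hash>" from the symbol's contents.

    prefix:   one of "SYMBOLat", "HOLEat" or "DATAat".
    contents: the bytes covered by the symbol.
    """
    # Assumption: any stable content hash works for this illustration.
    digest = hashlib.sha1(contents).hexdigest()[:16]
    return f"{prefix}0x{digest}"
```

Because the name depends only on the contents, two zero-filled padding symbols of the same size get the same name, which is the collision case mentioned above.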

(cherry picked from FBD7134261)
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/Relocation.h
The file was modifiedbolt/src/BinarySection.cpp
The file was modifiedbolt/src/BinarySection.h
The file was modifiedbolt/src/BinaryContext.cpp
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/BinaryData.h
The file was modifiedbolt/src/merge-fdata/merge-fdata.cpp
The file was modifiedbolt/src/BinaryData.cpp
Commit 232046f9b25e54842dcce4471354d450d80b864f by maks
[Bolt] Reduce verbosity while reporting hash collisions

Summary:
Don't report all data objects with hash collisions by default. Only
report the summary, and use -v=1 for providing the full list.

(cherry picked from FBD8372241)
The file was modifiedbolt/src/BinaryContext.cpp
Commit 789162276d00c48a0274a826b539a024822819f6 by maks
[Bolt][NFC] Change capitalization s/BOLT/Bolt/g

(cherry picked from FBD8373789)
The file was modifiedbolt/src/Passes/RegReAssign.cpp
The file was modifiedbolt/src/RewriteInstance.h
The file was modifiedbolt/src/ProfileReader.cpp
The file was modifiedbolt/src/ProfileReader.h
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/llvm-bolt.cpp
The file was modifiedbolt/src/Passes/LongJmp.h
The file was modifiedbolt/src/DataReader.cpp
The file was modifiedbolt/src/DataAggregator.cpp
The file was modifiedbolt/src/Passes/JTFootprintReduction.h
The file was modifiedbolt/src/Passes/Inliner.h
The file was modifiedbolt/README.txt
The file was modifiedbolt/src/Passes/Inliner.cpp
The file was modifiedbolt/src/DataAggregator.h
The file was modifiedbolt/src/ProfileYAMLMapping.h
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/Target/AArch64/AArch64MCPlusBuilder.cpp
The file was modifiedbolt/src/BinaryContext.h
Commit a7d025139f377c40b54ce471621ac3d7c2684296 by maks
Revert "[Bolt][NFC] Change capitalization s/BOLT/Bolt/g"

Summary:

(cherry picked from FBD8431879)
The file was modifiedbolt/src/llvm-bolt.cpp
The file was modifiedbolt/src/Passes/RegReAssign.cpp
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/DataAggregator.h
The file was modifiedbolt/src/RewriteInstance.h
The file was modifiedbolt/README.txt
The file was modifiedbolt/src/Passes/Inliner.h
The file was modifiedbolt/src/Passes/JTFootprintReduction.h
The file was modifiedbolt/src/ProfileYAMLMapping.h
The file was modifiedbolt/src/DataAggregator.cpp
The file was modifiedbolt/src/DataReader.cpp
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/Passes/Inliner.cpp
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/ProfileReader.h
The file was modifiedbolt/src/Passes/LongJmp.h
The file was modifiedbolt/src/ProfileReader.cpp
The file was modifiedbolt/src/Target/AArch64/AArch64MCPlusBuilder.cpp
Commit 221107c5fb0684850e76201a3319adf7c122c243 by maks
[BOLT] Update llvm.patch

Summary:

(cherry picked from FBD8475998)
The file was modifiedbolt/llvm.patch
Commit 35c09dc4ddbcd9dff21769a3ce74f4c6065dd366 by maks
[BOLT] Add a user friendly error reporting message

Summary:
In case we fail to disassemble or to build the CFG for a
function, print instructions on bug reporting.

(cherry picked from FBD8549737)
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/BinaryContext.cpp
Commit 3ab2929b36cc2d3b6b7d859da2e9ab44101ea09e by maks
[BOLT] Fix support for PIC jump tables

Summary:
BOLT heuristics failed to work if false PIC jump table entries were
accepted when they were pointing inside a function, but not at
an instruction boundary.

This fix checks if the destination falls at an instruction boundary, and
if it does not, it truncates the jump table. This, of course, still does not
guarantee that the entry corresponds to a real destination, and we can
have "false positive" entry(ies). However, it shouldn't affect
correctness of the function, but the CFG may have edges that are never
taken. We may update an incorrect jump table entry, corresponding to
unrelated data, and for that reason we force moving of jump tables if a
PIC jump table was detected.
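
The truncation rule could be sketched like this. The inputs are assumed to be already-resolved destination addresses and a set of known instruction start addresses; the names are hypothetical.

```python
def truncate_jump_table(destinations, function_start, function_end,
                        instruction_starts):
    """Keep leading jump table entries that look like real destinations.

    The table is cut at the first entry that falls outside the function
    or does not land on an instruction boundary.
    """
    accepted = []
    for destination in destinations:
        inside = function_start <= destination < function_end
        if not inside or destination not in instruction_starts:
            break
        accepted.append(destination)
    return accepted
```

Entries that pass both checks may still be false positives, as noted above; the check only rules out destinations that cannot be valid.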

(cherry picked from FBD8559588)
The file was modifiedbolt/src/BinaryFunction.cpp
Commit 1baa2529ea57eea3cd63cbbf9e1d237402d7fa4f by maks
[merge-fdata] Support legacy/non-YAML profile format

Summary: Concatenate profile contents if they are not in YAML format.

(cherry picked from FBD8579955)
The file was modifiedbolt/src/merge-fdata/merge-fdata.cpp
Commit 8f717dd25ecc0a76581cf0fe014c5a27d2166044 by maks
[BOLT] Add initial bolt-only test infra

Summary:
Create folders and setup to make LIT run BOLT-only tests. Add
a test example. This will add a new make/ninja rule "check-bolt" that
the user can invoke to run LIT on this folder.

(cherry picked from FBD8595786)
The file was addedbolt/test/CMakeLists.txt
The file was addedbolt/test/lit.site.cfg.py.in
The file was addedbolt/test/lit.cfg.py
The file was addedbolt/test/X86/Inputs/srol-bug-input.yaml
The file was addedbolt/test/X86/srol-bug.test
The file was modifiedbolt/CMakeLists.txt
Commit 5b2eab653809d62c9b7120cf60bc21308f7effe6 by maks
[BOLT] Fix call to evaluateX86MemOperands

Summary:
There was a call site not providing a displacement immediate
value. This assertion is firing in opensource.

(cherry picked from FBD8576033)
The file was modifiedbolt/src/Target/X86/X86MCPlusBuilder.cpp
Commit 72ecd12f2faedceef0b402be8b1a1924412feba8 by maks
Disable -split-eh in non-relocation mode

Summary:
This option only works in relocation mode. In non-relocation
mode, it generates invalid references that cause MCStreamer to fail.
Disable this flag if the user requested it and print a warning.

(cherry picked from FBD8625990)
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
Commit 07353e9590ace8cac1375e89c8c40bd7e2430cf7 by maks
[BOLT][PR] In some cases BD could be nullptr

Summary:
When processing a binary in -debug mode, BD could be nullptr in some cases. It is better to fail later on an assert than here with a segfault.
Closes https://github.com/facebookincubator/BOLT/pull/18
GitHub Author: Alexander Gryanko <xpahos@gmail.com>

(cherry picked from FBD8650719)
The file was modifiedbolt/src/BinaryFunction.cpp
Commit 8835f90d1e241d51c6293a44a40ecf2386941b63 by maks
[X86] Support a subset of internal calls

Summary:
Add support for functions with internal calls, necessary for
handling Intel MKL library and some code observed in google core dumper
library.

This is not optimizing these functions, but only identifying them,
running analyses to assure we will not break those functions if we move
them, and then "freezing" these functions (marking them as not simple so BOLT
will not try to reorder them or touch them in any way).

(cherry picked from FBD8364381)
The file was modifiedbolt/src/BinaryFunction.cpp
The file was addedbolt/src/Passes/ValidateInternalCalls.h
The file was modifiedbolt/src/Passes/FrameOptimizer.cpp
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
The file was modifiedbolt/src/BinaryBasicBlock.h
The file was modifiedbolt/src/Passes/JTFootprintReduction.cpp
The file was addedbolt/src/Passes/ValidateInternalCalls.cpp
The file was modifiedbolt/src/Passes/RegAnalysis.cpp
The file was modifiedbolt/src/Passes/RegAnalysis.h
The file was modifiedbolt/src/BinaryPassManager.cpp
The file was modifiedbolt/src/Target/X86/X86MCPlusBuilder.cpp
The file was modifiedbolt/src/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/src/Passes/RegReAssign.cpp
The file was modifiedbolt/src/Passes/DataflowAnalysis.h
The file was modifiedbolt/src/Passes/CMakeLists.txt
The file was modifiedbolt/src/Passes/StokeInfo.cpp
The file was modifiedbolt/src/Passes/ReachingDefOrUse.h
Commit 6802948028b52b24bdadbbb99c1972b3dd4722c4 by maks
[BOLT] Allow jump tables with 2 entries

Summary:
GCC 8 can generate jump tables with just 2 entries. Modify our heuristic
to accept it. We still assert that there's more than one entry.

(cherry picked from FBD8709416)
The file was modifiedbolt/src/BinaryFunction.cpp
Commit edc0cb1121a1d535834059f777f3f1f3a3aaed16 by maks
[LLVM] Accept `S` in augmentation strings in CIE

Summary:
Ignore 'S' in augmentation string on input. It just marks a signal
frame. All we have to do is propagate it.

Fixes facebookincubator/BOLT#21

This was already in LLVM trunk rL331738. Update llvm.patch.

(cherry picked from FBD8707222)
The file was modifiedbolt/llvm.patch
Commit a6a37995d916964f4b216e3dda967ab491eb0d9a by maks
[BOLT] Reject processing of PIE binaries

Summary:
Check the ELF type of the input binary. Reject any binary not of
ET_EXEC type, including position-independent executables (PIEs).

Also print the first function containing a PIC jump table.

(cherry picked from FBD8707274)
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/RewriteInstance.cpp
Commit 365613b404075a7f4ecb823bb65427eefba7d2c5 by maks
[BOLT] Fix no-assertions build

Summary:
In a release build without assertions, MCInst::dump() is undefined and
causes a link-time failure.

Fixes facebookincubator/BOLT#27.

(cherry picked from FBD8732905)
The file was modifiedbolt/src/Target/AArch64/AArch64MCPlusBuilder.cpp
Commit d7b2474f835968cc649bcced27ef86bc3f4c61c4 by maks
[DebugInfo] Change default value of FDEPointerEncoding

Summary:
If the encoding is not specified in CIE augmentation string, then it
should be DW_EH_PE_absptr instead of DW_EH_PE_omit.

(cherry picked from FBD8740274)
The file was modifiedbolt/llvm.patch
Commit b447979b8c35f8e24dc8edc709cffa36b595ea6c by maks
[BOLT] Fix diagnostics printing in data aggregator

Summary: Print correct part of the string while reporting an error.

(cherry picked from FBD8745329)
The file was modifiedbolt/src/DataAggregator.cpp
Commit 64c429da895edf3801e40c85a0a1550e619e95e7 by maks
[LongJumpPass] X86 enablement. First attempt.

(cherry picked from FBD8753328)
The file was modifiedbolt/src/Target/X86/X86MCPlusBuilder.cpp
The file was modifiedbolt/src/BinaryPassManager.cpp
The file was modifiedbolt/src/Passes/LongJmp.cpp
Commit 207ac19c638b4bd7c2a9e7936d538835192115a6 by maks
Revert "[LongJumpPass] X86 enablement. First attempt."

This reverts commit 010b0f7603fc9fa209c6dc95ce4b9c08e7b70d75.

(cherry picked from FBD28111178)
The file was modifiedbolt/src/Passes/LongJmp.cpp
The file was modifiedbolt/src/Target/X86/X86MCPlusBuilder.cpp
The file was modifiedbolt/src/BinaryPassManager.cpp
Commit b6c4d8e924d340e778babbbac0abc22643cd76e8 by maks
-- Adding Veneer elimination pass and Veneer count to dyno stats.

Summary: Create a pass that performs veneer elimination.

(cherry picked from FBD8359299)
The file was modifiedbolt/src/BinaryPassManager.cpp
The file was addedbolt/src/Passes/VeneerElimination.cpp
The file was modifiedbolt/src/BinaryFunction.h
The file was addedbolt/src/Passes/VeneerElimination.h
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/Passes/CMakeLists.txt
Commit 544d1577c1d9f6064860b57668232e62a98e8953 by maks
Avoid removing BBs referenced by JTs

Summary:
While removing unreachable blocks, we may decide to remove a
block that is listed as a target in a jump table entry. If we do that,
this label will then be undefined and the LLVM assembler will crash.
Mitigate this for now by not removing such blocks, as we don't support
removing unnecessary jump tables yet.

Fixes facebookincubator/BOLT#20

(cherry picked from FBD8730269)
The file was addedbolt/test/X86/Inputs/issue20.yaml
The file was modifiedbolt/src/BinaryFunction.cpp
The file was addedbolt/test/X86/issue20.test
Commit 12380b8b067251fdebab883ca707133681d6b6a4 by maks
Fix assembly after adding entry points

Summary:
When a given function B, located after function A, references
one of A's basic blocks, it registers a new global symbol at the
reference address and update A's Labels vector via
BinaryFunction::addEntryPoint(). However, we don't update A's branch
targets at this point. So we end up with an inconsistent CFG, where the
basic block names are global symbols, but the internal branch operands
are still referencing the old local name of the corresponding blocks
that got promoted to an entry point. This patch fixes this by detecting
this situation in addEntryPoint and iterating over all instructions,
looking for references to the old symbol and replacing them to use the
new global symbol (since this is now an entry point).

Fixes facebookincubator/BOLT#26

(cherry picked from FBD8728407)
The file was modifiedbolt/src/MCPlusBuilder.h
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/Target/X86/X86MCPlusBuilder.cpp
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/Target/AArch64/AArch64MCPlusBuilder.cpp
The file was addedbolt/test/X86/Inputs/issue26.yaml
The file was modifiedbolt/src/MCPlusBuilder.cpp
The file was addedbolt/test/X86/issue26.test
Commit 66e0313d15260c092cafba5f5882b7ea4c706a3e by maks
[perf2bolt] Accept `-` as a valid misprediction symbol

Summary:
As reported in GH-28 `perf` can produce a `-` symbol for the misprediction bit
if the bit is not supported by the kernel/HW. In this case we can ignore
the bit.
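
A small sketch of the accepted symbols. It assumes the usual perf convention of `M` for a mispredicted branch and `P` for a predicted one, and treats `-` as "not mispredicted"; the function name is hypothetical.

```python
def parse_mispred_symbol(symbol):
    """Map perf's misprediction field to a boolean."""
    if symbol == "M":
        return True
    # "P" means predicted; "-" means the bit is unsupported by the
    # kernel/HW, so it is ignored and counted as not mispredicted.
    if symbol in ("P", "-"):
        return False
    raise ValueError(f"unexpected misprediction symbol: {symbol!r}")
```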

(cherry picked from FBD8786827)
The file was modifiedbolt/src/DataAggregator.cpp
The file was modifiedbolt/src/DataReader.cpp
Commit 44a36937f878745f59a1bf4c9ee638048a6d72e9 by maks
[BOLT] Fix llvm-dwarfdump issues

Summary:
llvm-dwarfdump is relying on getRelocatedSection() to return
section_end() for ELF files of types other than relocatable objects.
We've changed the function to return relocatable section for other
types of ELF files. As a result, llvm-dwarfdump started re-processing
relocations for sections that already had relocations applied, e.g. in
executable files, and this resulted in wrong values reported.

As a workaround/solution, we make this function return relocated section
for executable (and any non-relocatable objects) files only if the
section is allocatable.

(cherry picked from FBD8760175)
The file was modifiedbolt/llvm.patch
The file was modifiedbolt/src/RewriteInstance.cpp
Commit 7aee0adbf9a5ddd62afa425bf14fcd5d1d3f4620 by maks
[BOLT-AArch64] Create cold symbols on demand

Summary:
Rework the logic we use for managing references to constant
islands. Defer the creation of the cold versions to when we split the
function and will need them.

(cherry picked from FBD8228803)
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/BinaryFunction.h
Commit f2f164f47434397d1c2b8aed30e43434178a79d4 by maks
[perf2bolt] Fix perf build-id matching

Summary:
Recent compiler tool chains can produce build-ids that are less than 40
characters long. Linux perf, however, always outputs 40 characters,
expanding the string with 0's as needed. Fix the matching by only
checking the string prefix.
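
The prefix check can be sketched as follows; the function name is hypothetical and both ids are assumed to be hex strings in the same case.

```python
def build_ids_match(binary_build_id, perf_build_id):
    """Compare a binary's build-id with the one recorded by perf.

    perf always outputs 40 characters, padding shorter ids with 0's, so
    only the prefix given by the binary's own build-id is compared.
    """
    if not binary_build_id:
        return False
    return perf_build_id.startswith(binary_build_id)
```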

(cherry picked from FBD8839452)
The file was modifiedbolt/src/DataAggregator.cpp
Commit 6e45f5aeec0b26c60f38fcb9a81d4139e09b226f by maks
[perf2bolt] Enforce file matching in perf2bolt

Summary:
If the input binary does not have a build-id and the name does not match
any file names in perf.data, then reject the binary, and issue an error
message suggesting to rename it to one of the listed names from
perf.data.

(cherry picked from FBD8846181)
The file was modifiedbolt/src/DataAggregator.cpp
The file was modifiedbolt/src/DataAggregator.h
Commit 27f30324478a031c920f964d69db0d6e8e9e6ac3 by maks
Add initial function injection support

Summary:
This diff adds the API needed to inject functions using BOLT.
In relocation mode, injected functions are emitted between the cold and the hot functions.
In non-relocation mode, injected functions are emitted in the next text section.

(cherry picked from FBD8715965)
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/BinaryContext.cpp
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/RewriteInstance.cpp
Commit ddfcf4f266d218f242b6ea0b714550949adb75ff by maks
[BOLT] Add parser for pre-aggregated perf data

Summary:
The regular perf2bolt aggregation job is to read perf output directly.
However, if the data is coming from a database instead of perf, one
could write a query to produce a pre-aggregated file. This function
deals with this case.

The pre-aggregated file contains aggregated LBR data, but without binary
knowledge. BOLT will parse it and, using information from the
disassembled binary, augment it with fall-through edge frequency
information. After this step is finished, this data can be either
written to disk to be consumed by BOLT later, or can be used by BOLT
immediately if kept in memory.

File format syntax:
{B|F|f} [<start_id>:]<start_offset> [<end_id>:]<end_offset> <count>
[<mispred_count>]

B - indicates an aggregated branch
F - an aggregated fall-through (trace)
f - an aggregated fall-through with external origin - used to disambiguate
between a return hitting a basic block head and a regular internal
jump to the block

<start_id> - build id of the object containing the start address. We can
skip it for the main binary and use "X" for an unknown object. This will
save some space and facilitate human parsing.

<start_offset> - hex offset from the object base load address (0 for the
main executable unless it's PIE) to the start address.

<end_id>, <end_offset> - same for the end address.

<count> - total aggregated count of the branch or a fall-through.

<mispred_count> - the number of times the branch was mispredicted.
Omitted for fall-throughs.

Example
F 41be50 41be50 3
F 41be90 41be90 4
f 41be90 41be90 7
B 4b1942 39b57f0 3 0
B 4b196f 4b19e0 2 0
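
A minimal parser for the format above, written as an illustration rather than a copy of the code in DataAggregator.cpp. It assumes offsets are hexadecimal and counts are decimal, as the example suggests; the type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    build_id: Optional[str]  # None: id omitted (main binary); "X": unknown
    offset: int              # hex offset from the object's base load address


class Record(NamedTuple):
    kind: str                # "B", "F" or "f"
    start: Location
    end: Location
    count: int
    mispreds: Optional[int]  # None when the field is omitted


def parse_location(text):
    # "[<id>:]<offset>": the id part is optional.
    build_id, separator, offset = text.rpartition(":")
    return Location(build_id if separator else None, int(offset, 16))


def parse_record(line):
    fields = line.split()
    if len(fields) not in (4, 5) or fields[0] not in ("B", "F", "f"):
        raise ValueError(f"malformed pre-aggregated record: {line!r}")
    mispreds = int(fields[4]) if len(fields) == 5 else None
    return Record(fields[0], parse_location(fields[1]),
                  parse_location(fields[2]), int(fields[3]), mispreds)
```

Each line of the example parses to one record; the fall-through lines carry no misprediction count.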

(cherry picked from FBD8887182)
The file was modifiedbolt/src/DataReader.cpp
The file was modifiedbolt/src/BinaryFunctionProfile.cpp
The file was addedbolt/test/X86/Inputs/pre-aggregated.txt
The file was addedbolt/test/X86/pre-aggregated-perf.test
The file was modifiedbolt/src/DataReader.h
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/DataAggregator.h
The file was addedbolt/test/X86/Inputs/blarge.yaml
The file was modifiedbolt/src/DataAggregator.cpp
Commit 631da736b0f2926835d39ee55fc4cdfb079061fb by maks
[BOLT] further speeding up cache+

Summary:
For large binaries, cache+ algorithm adds a noticeable overhead in
comparison with cache. This modification restricts search space of the
optimization, which makes cache+ as fast as cache for all tested binaries.

There is a tiny (on the order of 0.01%) regression in cache-related metrics,
but this is not noticeable in practice.

(cherry picked from FBD8369968)
The file was modifiedbolt/src/Passes/CachePlusReorderAlgorithm.cpp
The file was modifiedbolt/src/CacheMetrics.cpp
The file was modifiedbolt/src/RewriteInstance.cpp
Commit 49920a8fad09f3cc687b7c77e669cada2cb792e1 by maks
[BOLT] Add R_X86_64_PC64 relocation support

(cherry picked from FBD8980691)
The file was modifiedbolt/src/Relocation.cpp
Commit 771d9765435fa41f6834a19542d3ba0b047445bd by maks
[BOLT][NFC] Minor code refactoring

(cherry picked from FBD8882632)
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/Passes/ValidateInternalCalls.cpp
The file was modifiedbolt/src/RewriteInstance.cpp
Commit fe9f8219faba269dcc773bfa029b258bc249476d by maks
[BOLT] Fix TBSS-related issue

Summary:
The TLS segment provides a template for initializing thread-local storage
for every new thread. It consists of initialized and uninitialized
parts. The uninitialized part of TLS, .tbss, is completely meaningless
from a binary analysis perspective. It doesn't take any space in the
file, or in memory. Note that this is different from a regular .bss
section that takes space in memory.

We should not place .tbss into a list of allocatable sections, otherwise
it may cause conflicts with objects contained in the next section.

(cherry picked from FBD9074056)
The file was modifiedbolt/src/BinaryContext.cpp
The file was modifiedbolt/src/BinaryData.cpp
The file was modifiedbolt/src/BinarySection.h
Commit df947861199e20085da3dde4ea287b46fd9441e3 by maks
[BOLT] Fix range checks

Summary:
containsRange() functions were incorrectly checking for an empty range
at the end of containing object. I.e. [a,b) was reporting true for
containing [b,b).
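
The intended behavior can be sketched in Python as follows. This is only a model of the half-open range semantics described above, not the code in BinaryData.h or BinarySection.h; the function names and the (start, end) parameterization are assumptions.

```python
def contains_address(start: int, end: int, address: int) -> bool:
    """True if address lies in the half-open interval [start, end)."""
    return start <= address < end


def contains_range(start: int, end: int, address: int, size: int) -> bool:
    """True if [address, address + size) lies inside [start, end).

    Requiring the range start to be a contained address rejects the
    empty range [end, end) sitting at the very end of the object.
    """
    return contains_address(start, end, address) and address + size <= end
```

With this definition, an empty range at the end of the object is no longer reported as contained, while an empty range at any interior address still is.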

(cherry picked from FBD9074643)
The file was modifiedbolt/src/BinaryData.h
The file was modifiedbolt/src/BinarySection.h
Commit 39f6fcc947bd80f80ca9a2d29ce724e053400774 by maks
[BOLT] Add support for IFUNC

Summary:
Relocation value verification was failing for IFUNC as the real value
used for relocation wasn't the symbol value, but a corresponding PLT
entry.

Relax the verification and skip any symbols of ST_Other type.

(cherry picked from FBD9123741)
The file was modifiedbolt/src/RewriteInstance.cpp
Commit 06e1554158a9917e8eb126d92e38f94da296ce30 by maks
Retpoline Insertion Pass

Summary:
Retpoline insertion implemented for reloc mode.

(cherry picked from FBD8832838)
The file was modifiedbolt/src/BinaryPassManager.cpp
The file was addedbolt/src/Passes/RetpolineInsertion.h
The file was modifiedbolt/src/Passes/CMakeLists.txt
The file was modifiedbolt/src/Target/X86/X86MCPlusBuilder.cpp
The file was addedbolt/src/Passes/RetpolineInsertion.cpp
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/MCPlusBuilder.h
Commit c35dc2a38635414ee7e8d0142df0d9489f235128 by maks
[BOLT] Detect and handle fixed indirect branches

Summary:
Sometimes GCC can generate code where one of the jump table entries
is being used by an indirect branch with a fixed memory reference,
such as "jmp *(JT+8)". If we don't convert such branches to direct ones
and move jump tables, then the indirect branch will reference the old
table value and will end up at the non-updated destination, possibly
causing a runtime crash.

This fix converts such indirect branches into direct ones.
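
A rough Python model of the conversion: given the jump table base and its entries, a fixed memory reference such as JT+8 can be resolved to the entry it reads, which then becomes the direct branch target. The function name, the list-based table, and the 8-byte entry size are illustrative assumptions, not BOLT's data structures.

```python
from typing import List, Optional


def resolve_fixed_indirect_branch(jt_base: int, entries: List[int],
                                  mem_address: int,
                                  entry_size: int = 8) -> Optional[int]:
    """Return the target stored at mem_address inside the jump table,
    or None if the address does not name a whole entry of the table."""
    offset = mem_address - jt_base
    if offset < 0 or offset % entry_size != 0:
        return None
    index = offset // entry_size
    if index >= len(entries):
        return None
    return entries[index]
```

For a table at 0x1000 with entries [0x400, 0x410, 0x420], "jmp *(JT+8)" resolves to a direct jump to 0x410.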

For now we mark functions containing indirect branches with a fixed
destination as non-simple to prevent an unreachable code elimination
problem triggered by the related dead/unreachable jump table.

(cherry picked from FBD9192363)
The file was modifiedbolt/src/MCPlusBuilder.h
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/Target/X86/X86MCPlusBuilder.cpp
The file was modifiedbolt/src/BinaryFunction.cpp
Commit b2382dc5527dde3b5de3201d83d8f97eafee0ab6 by maks
retpoline insertion : further updates.

Summary:
Couple of updates:

1) Handle address pattern with segment register.
2) Assume R11 available for PLT calls always.
3) Add CFI state to each BB.
4) Exit early from getMacroOpFusionPair if Instruction.size() < 2.

(cherry picked from FBD9172426)
The file was modifiedbolt/src/MCPlusBuilder.h
The file was modifiedbolt/src/BinaryBasicBlock.cpp
The file was modifiedbolt/src/Target/X86/X86MCPlusBuilder.cpp
The file was modifiedbolt/src/Passes/PLTCall.cpp
The file was modifiedbolt/src/Passes/RetpolineInsertion.cpp
Commit b10d4724c38648b010e1ec98aa051fe4344b3de4 by maks
[BOLT] Fix pseudo calculation in BinaryBasicBlock

Summary:
A recent commit broke our tests because it was depending on
getNumNonPseudos() at a very late stage of our optimization pipeline.
The problem was in an instruction deletion member function in
BinaryBasicBlock that was not updating the number of pseudos after
deletion. Fix this.

(cherry picked from FBD9305972)
The file was modifiedbolt/src/BinaryBasicBlock.h
Commit 560c23411a006ba9253de21bf2b382299c6cf860 by maks
[perf2bolt] Use mmap events for PID collection

Summary:
Switch from using `perf script --show-task-events` to
`perf script --show-mmap-events` for associating a binary with PIDs in
perf.data. The output of the former command does not provide enough
information for PIE/.so processing.

(cherry picked from FBD9346586)
The file was modifiedbolt/src/DataAggregator.h
The file was modifiedbolt/src/DataAggregator.cpp
Commit 87788ca92655a4a6c55f345dbe9769f87b28a2f6 by maks
[perf2bolt] Support profiling of PIEs and .so's

Summary:
Processing profile data for binaries with flexible load address (such as
position-independent executables and shared objects) requires adjusting
binary addresses depending on the base load address.

For every PID the mapping will be more or less unique when executing
with ASLR enabled, thus we have to keep the mapping record for all PIDs
associated with the binary. Then we adjust the addresses based on those
mappings.
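
A simplified Python sketch of the idea: keep one mapping per PID and translate each sampled address into an offset from that PID's load base. The dictionary layout and the function name are hypothetical, and real mappings also involve file offsets and multiple segments, which this model ignores.

```python
from typing import Dict, Optional, Tuple

# pid -> (base load address, mapping size); a hypothetical layout.
Mappings = Dict[int, Tuple[int, int]]


def to_binary_offset(mappings: Mappings, pid: int,
                     address: int) -> Optional[int]:
    """Translate a runtime address into an offset from the load base
    recorded for this PID, or None if it falls outside the mapping."""
    entry = mappings.get(pid)
    if entry is None:
        return None
    base, size = entry
    if not base <= address < base + size:
        return None
    return address - base
```

Two processes running the same PIE under ASLR see different runtime addresses for the same instruction, but both translate to the same offset.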

(cherry picked from FBD9368566)
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/DataAggregator.cpp
The file was modifiedbolt/src/DataAggregator.h
The file was modifiedbolt/src/RewriteInstance.cpp
Commit 88bb14516499c5ffe2326fc209b02fc0c0a5e191 by maks
[BOLT] Update allocatable relocation sections

Summary:
Position-independent binaries may have runtime relocations of type
R_X86_64_RELATIVE that need an update if they were pointing to one of
the functions that we have relocated.

(cherry picked from FBD9374164)
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/DataAggregator.h
The file was modifiedbolt/src/RewriteInstance.h
Commit 510a8c4bbe568f457a4689ee224f4f3fd2c902ea by maks
[BOLT] Fix shrink-wrapping CFI update

Summary:
When updating CFI for a function that was optimized by
shrink-wrapping, if the function had no frame pointers, the CFI update
algorithm was incorrect.

(cherry picked from FBD9328658)
The file was modifiedbolt/src/Passes/ShrinkWrapping.cpp
Commit 9c4fcafa37ba7f07b6941ee4e0f1986918193f81 by maks
[BOLT] Add update-build-id option, on by default

Summary:
The build-id is used by tools to uniquely identify binaries. Update
the output binary build-id with a different number to make it
distinguishable from the input binary. This implementation just flips
the last build-id bit.
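
A minimal Python illustration of flipping the last bit, assuming "last bit" means the least significant bit of the final build-id byte (the commit message does not spell this out). The function name is hypothetical.

```python
def flip_last_build_id_bit(build_id: bytes) -> bytes:
    """Return a copy of build_id with the lowest bit of its final
    byte inverted, so the result differs from the input by one bit."""
    if not build_id:
        raise ValueError("empty build-id")
    return build_id[:-1] + bytes([build_id[-1] ^ 1])
```

Because the operation is its own inverse, applying it twice restores the original build-id.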

(cherry picked from FBD9235336)
The file was modifiedbolt/src/RewriteInstance.h
The file was modifiedbolt/src/ProfileWriter.cpp
The file was modifiedbolt/src/RewriteInstance.cpp
Commit af1177d99f36f6b1e6ce5d5368319bfb9fbc4bec by maks
[BOLT] Add mattr options to AArch64 target

Summary:
Make the AArch64 subtarget enable all features, so the disassembler
won't choke on extension instructions.

(cherry picked from FBD9477066)
The file was modifiedbolt/src/RewriteInstance.cpp
Commit a7e0704be671286ba167cca24411835d9b8b2a57 by maks
[BOLT] Reduce AArch64 target feature flags

Summary:
Eliminate some flags that are not recognized and
are currently printing warnings when BOLT runs on AArch64.

(cherry picked from FBD9499971)
The file was modifiedbolt/src/RewriteInstance.cpp
Commit 2511b09985e774f329a75f12b68cc4f0cdbe5c0a by maks
[BOLT][DWARF] Fix line info for empty CU DIEs

Summary:
In some rare cases a compiler may generate DWARF that contains an empty
CU DIE that references a debug line fragment. That fragment will contain
no file name information, and we fail to register it. Then, as a result,
DW_AT_stmt_list is not updated for the CU. This may cause some
DWARF-processing tools to segfault.

As a solution/workaround, we register "<unknown>" file name for such
debug line tables.

(cherry picked from FBD9526705)
The file was modifiedbolt/src/BinaryContext.cpp
Commit 708a55008477590e1022ca9f5ff27a5d19725472 by maks
[BOLT] Fix profile after ICP

Summary:
After optimizing a target of a jump table, ICP was not updating edge
counts corresponding to that target. As a result the edge could be left
hot and negatively influence the code layout.

(cherry picked from FBD9524396)
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
The file was modifiedbolt/src/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/src/BinaryBasicBlock.h
Commit d0a80b08703e631db509533a26e96dd6133fdd92 by maks
[BOLT] Change ForceRelocation behavior

Summary:
Only record address as addend if the target of the relocation
is the pseudo-symbol Zero.

(cherry picked from FBD9551543)
The file was modifiedbolt/src/RewriteInstance.cpp
Commit 69e6004a4292a98e6a351a5127e1defea3580afd by maks
[perf2bolt] Fix processing of binaries with names over 15 chars long

Summary:
Do not truncate the binary name for comparison purposes as the binary
name we are getting from "perf script" is no longer truncated.

(cherry picked from FBD9596409)
The file was modifiedbolt/src/DataAggregator.cpp
Commit cd19f718b4cbed65e7fb9a342301a09f37acfcd3 by maks
[BOLT] Merge jump table profile data

Summary:
While running the ICF pass, we skipped merging profile data for jump
tables. We were only updating the profile in the CFG. Fix that.

(cherry picked from FBD9595523)
The file was modifiedbolt/src/BinaryFunctionProfile.cpp
Commit 41ed5431a01f55a5ddb64229cc61948ffcbf42b9 by maks
[BOLT] turning on the compact aligner by default

Summary: Making UseCompactAligner true by default

(cherry picked from FBD9325158)
The file was modifiedbolt/src/Passes/Aligner.cpp
Commit 8026760ac04173e77a86a8059ea0aab47a849d7f by maks
[BOLT] Fix another issue with profile after ICP

Summary:
For jump tables ICP was using profile from the jump table itself which
doesn't work correctly if the jump table is re-used at different code
locations.

(cherry picked from FBD9618774)
The file was modifiedbolt/src/BinaryBasicBlock.cpp
The file was modifiedbolt/src/Passes/IndirectCallPromotion.h
The file was modifiedbolt/src/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/src/BinaryBasicBlock.h
Commit 53b72d0f2eebc885522524d737e576717b90b87f by maks
[BOLT] Ignore symbols from non-allocatable sections

Summary:
While creating BinaryData objects we used to process all symbol table
entries. However, some symbols could belong to non-allocatable sections,
and thus we have to ignore them for the purpose of analyzing in-memory
data.

(cherry picked from FBD9666511)
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/BinaryContext.cpp
Commit 1387a9d76165382d26b3d8ae6ca3c840e0defb76 by maks
[BOLT] Keep .text section in file when using old text

Summary:
If we reuse text section under `-use-old-text` option, then there's no
need to rename it. Tools, such as perf, seem to not like binaries
without `.text`.

Additionally, check if the code fits into `.text` using the page
alignment, otherwise we were skipping the alignment relying on the user
detecting the warning message. This could have resulted in unexpected
performance drops.

Also add `-no-huge-pages` option to use regular page size for code
alignment purposes (i.e. 4KiB instead of 2MiB).

(cherry picked from FBD10024670)
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/Passes/LongJmp.cpp
The file was modifiedbolt/src/RewriteInstance.h
The file was modifiedbolt/src/BinaryContext.cpp
Commit bd0b99c45d1397065f750925f00536bf465bf4fc by maks
[BOLT] Change stub-insertion pass for AArch64

Summary:
Previously, we were expanding eligible branches with stubs. After
expansion, we were computing which stubs were unnecessary and removing them,
assuming ranges were shortening as code is removed. The problem with this
approach is that for branches that refer to code that is not managed by
BOLT, the distance to that location can increase and we can end up with an
out-of-range branch.

This rewrites the pass to be simpler, only increasing size and expanding code
with stubs as needed after each iteration, stopping when code stops increasing.
Besides this rewrite, the stub-insertion pass now supports stubs grouping
similar to what the linker does, allowing different functions to share the
same veneer that jumps to a common callee. It also fixes a bug in the previous
implementation where, in very large functions that use TBZ/TBNZ (+-32KB range),
it would mistakenly try to reuse a local stub BB that is out of range.

This includes a change to allow hot functions to be put at the end of the
.text section, closer to the heap, requiring no veneers to jump to JITted
code. Finally, it enables the veneer elimination pass by default.

(cherry picked from FBD10023158)
The file was modifiedbolt/src/MCPlusBuilder.h
The file was modifiedbolt/src/Passes/VeneerElimination.cpp
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/Passes/LongJmp.cpp
The file was modifiedbolt/src/Passes/LongJmp.h
The file was modifiedbolt/src/BinaryContext.cpp
The file was modifiedbolt/src/Target/AArch64/AArch64MCPlusBuilder.cpp
Commit ce508b58c6e92dc618a7d40cc1534a0618ffa5a0 by maks
[BOLT] Support relocations without symbols

Summary:
lld may generate relocations without associated symbols. Instead of
rejecting binaries with such relocations, we can re-create the symbol
the relocation is against based on the extracted value.

(cherry picked from FBD10054576)
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/RewriteInstance.h
The file was modifiedbolt/src/BinaryContext.cpp
The file was modifiedbolt/src/Exceptions.cpp
The file was modifiedbolt/src/BinaryContext.h
Commit cc2276d3f1f8fc70ec134c20a6d8d9af4ffdc573 by maks
[BOLT] fix build with gcc-4.8.5

Summary: These are two minor changes to make it compatible with gcc-4.8.5

(cherry picked from FBD9884971)
The file was modifiedbolt/src/DataAggregator.cpp
The file was modifiedbolt/src/BinaryBasicBlock.h
Commit c3c80822a356b5b126161a35b0a93e174fb9e259 by maks
[BOLT] Capitalize i

Summary: as titled

(cherry picked from FBD10136655)
The file was modifiedbolt/src/BinaryBasicBlock.h
Commit b166ccbea800df610668b2562a5867211cac577e by maks
[BOLT][PR] Fix compiler warnings in BinaryContext and RegAnalysis

Summary:
This pull request fixes two compiler warnings:

- missing `break;` in a switch-case statement in RegAnalysis.cpp (-Wimplicit-fallthrough warning)
- misleading indentation in BinaryContext.cpp (-Wmisleading-indentation warning)
Pull Request resolved: https://github.com/facebookincubator/BOLT/pull/39
GitHub Author: Andreas Ziegler <andreas.ziegler@fau.de>

(cherry picked from FBD10202092)
The file was modifiedbolt/src/BinaryContext.cpp
The file was modifiedbolt/src/Passes/RegAnalysis.cpp
Commit 74a71c681216b32e7532e881e330b5a39e049257 by maks
Fix bug in analyzeRelocation for GOT entries

Summary:
Special case GOT relocs to ignore addend subtracting
logic in analyzeRelocation, since the addend does not refer to the
target of the instruction being analyzed. Also make the code honor
the comments in the special case about zeroed out ExtractValue but
non-zero addend.
Fix facebookincubator/BOLT#40

(cherry picked from FBD10355019)
The file was modifiedbolt/src/RewriteInstance.cpp
Commit a76b13d48e4103e43fc0f5b2ed47559a37162d4a by maks
[perf2bolt] Pre-aggregate LBR samples

Summary: Pre-aggregating LBR data cuts perf2bolt processing times in half.

(cherry picked from FBD10420286)
The file was modifiedbolt/src/ProfileYAMLMapping.h
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/ProfileWriter.cpp
The file was modifiedbolt/src/DataAggregator.cpp
Commit 30fd960951b8146d93f34fe4467227c4e455a097 by maks
[BOLT] Update local symbol count in symbol table

Summary:
Fix sh_info entry for symbol table section to reflect updated number of
local symbols.

(cherry picked from FBD10503216)
The file was modifiedbolt/src/RewriteInstance.h
The file was modifiedbolt/src/RewriteInstance.cpp
Commit 40d9fcfdcadbe082bda825d3b0a19ac101aefef5 by maks
[BOLT] Workaround for Clang de-virtualization bug

Summary:
When Clang is boot-strapped with (Thin)LTO, it may produce a code
fragment similar to below:

  .LFT663334 (6 instructions, align : 1)
    Predecessors: .LFT663333
      00000538:   movb    $0x1, %al
      0000053a:   movl    %eax, -0x2c(%rbp)
      0000053d:   movl    $"_ZN5clang6Parser12ConsumeParenEv/1", %ecx
      00000542:   testb   $0x1, %cl
      00000545:   movq    -0x40(%rbp), %r14
      00000549:   je      .Ltmp1071462
    Successors: .Ltmp1071462, .LFT663335

  .LFT663335 (2 instructions, align : 1)
    Predecessors: .LFT663334
      0000054b:   movq    (%r12), %rax
      0000054f:   movq    .Ltmp0(%rax), %rcx
    Successors: .Ltmp1071462

  .Ltmp1071462 (7 instructions, align : 1)
    Predecessors: .LFT663334, .LFT663335
      00000556:   movq    %r12, %rdi
      00000559:   callq   *%rcx
      .......

The code above is making a call by dereferencing a pointer to a member
function. A pointer to a member function could refer either to a regular
function or to a virtual function. To differentiate between the two, the
AMD64 ABI (originating from the Itanium ABI) uses the last bit of the
pointer. The call instruction sequence varies depending on whether the
function is virtual or not, so the pointer's last bit is checked. If it's
"1" then the value of the pointer (minus 1) is used as an offset in the
object vtable to get the address of the function, otherwise the pointer is
used directly as a function address.

In this specific case, a de-virtualization is taking place, but it's not
complete. The compiler knows that the member function pointer is actually a
non-virtual function _ZN5clang6Parser12ConsumeParenEv (aka
"clang::Parser::ConsumeParen()"). However, it keeps the (dead) code that
checks the last bit of _ZN5clang6Parser12ConsumeParenEv, and furthermore
keeps the code (unreachable/dead) to make a virtual call while using
(_ZN5clang6Parser12ConsumeParenEv - 1) as an offset into the vtable.
This is obviously wrong, but since the code is unreachable, it will
never affect the runtime correctness.

The value "_ZN5clang6Parser12ConsumeParenEv - 1" falls on the last byte
of the function preceding _ZN5clang6Parser12ConsumeParenEv, and BOLT
creates a label ".Ltmp0" pointing to this last byte that is referenced
by the instruction sequence above. It just happens that the last byte
is also in the middle of the last instruction, and as a result, BOLT
never emits the label, hence resulting in the error message "Undefined
temporary symbol".

The workaround is to detect non-pc-relative relocations from code
pointing to some (fptr - 1). Note that this is not completely
error-proof, but non-pc-relative references from code into the middle of
a function are quite rare, and chances that in a normal situation they
will point to a byte preceding some function address are virtually zero.
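
The detection condition can be modeled in a few lines of Python. This is a sketch of the check as described in the message, not the code in RewriteInstance.cpp; the function name and the set of function start addresses are assumptions.

```python
from typing import Set


def looks_like_fptr_minus_one(target: int, is_pc_relative: bool,
                              function_starts: Set[int]) -> bool:
    """True for a non-pc-relative reference whose target is exactly
    one byte before the start address of some function."""
    if is_pc_relative:
        return False
    return (target + 1) in function_starts
```

A reference to 0x400fff with a function starting at 0x401000 matches; a pc-relative reference to the same address does not.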

(cherry picked from FBD13030310)
The file was modifiedbolt/src/RewriteInstance.cpp
Commit e1b8fade7fa6687750114e5cf09dd8d2126a6170 by maks
[BOLT] Add branch priority policy for blocks with 2 successors

Summary:
On x86 the difference between long and short jump instructions could be
either 4 or 3 bytes, depending on whether it's a conditional jump or not.
For a basic block with 2 jump instructions, if we know that one of
the successors is in a different code region, then we can make it
a target of an unconditional jump instruction. This will save 1 byte
in case the conditional jump happens to be a short one.
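
The arithmetic can be checked with the standard x86 encodings: a short jump is 2 bytes for both forms (rel8), a long unconditional jump is 5 bytes (E9 rel32), and a long conditional jump is 6 bytes (0F 8x rel32). The helper below is only an illustration; its name is hypothetical.

```python
SHORT_JMP, LONG_JMP = 2, 5   # jmp rel8 / jmp rel32
SHORT_JCC, LONG_JCC = 2, 6   # jcc rel8 / jcc rel32


def block_tail_size(cond_is_long: bool, uncond_is_long: bool) -> int:
    """Total bytes of a conditional jump followed by an unconditional one."""
    cond = LONG_JCC if cond_is_long else SHORT_JCC
    uncond = LONG_JMP if uncond_is_long else SHORT_JMP
    return cond + uncond


# One successor is in a different code region and needs a long jump;
# the other is close enough for a short jump.
far_via_conditional = block_tail_size(cond_is_long=True, uncond_is_long=False)
far_via_unconditional = block_tail_size(cond_is_long=False, uncond_is_long=True)
```

Routing the far successor through the unconditional jump gives 2 + 5 = 7 bytes instead of 6 + 2 = 8, which is the 1-byte saving mentioned above.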

(cherry picked from FBD13078139)
The file was modifiedbolt/src/BinaryFunction.cpp
Commit b0f7fddd3505d5c389d4a637a6e33b7594db6842 by maks
[BOLT] Add method for better function size estimation

Summary:
Add BinaryContext::calculateEmittedSize() that ephemerally emits code
to allow precise estimation of the function size. Relaxation and
macro-op alignment adjustments are taken into account.

(cherry picked from FBD13092139)
The file was modifiedbolt/src/BinaryContext.cpp
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/BinaryFunction.cpp
Commit 067a3850006eb9598a31874df54442d1fd74deac by maks
[BOLT] Add thresholds for function splitting

Summary:
Use newly added function size estimation to measure the effectiveness
and guide function splitting. Two new tuning options are added:

  -split-threshold=<uint>
    split function only if its main size is reduced by more than the given
    number of bytes. Default value: 0, i.e. split iff the size is reduced.
    Note that on some architectures the size can increase after splitting.
  -split-align-threshold=<uint>
    when deciding to split a function, apply this alignment while doing
    the size comparison (see -split-threshold). Default value: 2.
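
One plausible reading of how the two options interact, sketched in Python. The exact comparison performed in BinaryPasses.cpp is not given in the message, so the rounding-up of both sizes and the function names here are assumptions.

```python
def align_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def should_split(original_size: int, hot_size: int,
                 split_threshold: int = 0,
                 split_align_threshold: int = 2) -> bool:
    """Split only if the aligned main-fragment size shrinks by more
    than split_threshold bytes compared to the unsplit function."""
    before = align_up(original_size, split_align_threshold)
    after = align_up(hot_size, split_align_threshold)
    return before - after > split_threshold
```

Under these assumptions and the default values, shrinking from 100 to 99 bytes does not trigger a split, because both sizes round up to 100.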

(cherry picked from FBD13136352)
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
Commit 2fe0c38d6b3d056dd1a835913129454a4b08ae84 by maks
[perf2bolt] Better tracking of process forking

Summary:
Improve tracking of forked processes.

If a process corresponding to the input binary has forked/started
before 'perf record' was initiated, then the full name of the binary
will be recorded in a corresponding MMAP2 event. We've been handling
such cases well so far.

However, if the process was forked after 'perf record' has started, and
execve(2) wasn't called afterwards, then there will be no MMAP2 event
recorded corresponding to the mapping of the main binary (unrelated
MMAP2 events could still be recorded).

To track such cases, we need to parse 'perf script --show-task-events'
command output, and to scan for PERF_RECORD_FORK events, and then add
forked process PIDs to the list associated with the input binary. If
the fork event was followed by an exec event (PERF_RECORD_COMM exec)
of a different binary, then the forked PID should be ignored. If the
exec event was associated with our input binary, then the correct MMAP2
event was recorded and parsed.

To track if the event occurred before or after 'perf record', we parse
the event's time. This helps us to differentiate some events. E.g. the exec
event is only registered correctly if it happened after perf recording
has started (otherwise the "exec" part is missing), and thus we only
record forks with non-zero time stamps.
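
The bookkeeping described above can be modeled roughly as follows. This is a simplified sketch, not perf2bolt's parser: the event dictionaries, their keys, and the function name are hypothetical stand-ins for parsed 'perf script --show-task-events' output.

```python
from typing import Dict, Iterable, List, Set


def collect_pids(initial_pids: Iterable[int], events: List[Dict],
                 binary_name: str) -> Set[int]:
    """Extend the PIDs found via MMAP2 events with PIDs forked from
    them, dropping forked PIDs that later exec a different binary."""
    pids = set(initial_pids)
    forked = set()
    for ev in sorted(events, key=lambda e: e["time"]):
        if ev["type"] == "fork":
            # Only forks with a non-zero time stamp are recorded.
            if ev["time"] > 0 and ev["parent"] in pids:
                pids.add(ev["child"])
                forked.add(ev["child"])
        elif ev["type"] == "exec":
            # A forked PID that execs another binary is ignored.
            if ev["pid"] in forked and ev["name"] != binary_name:
                pids.discard(ev["pid"])
                forked.discard(ev["pid"])
    return pids
```

A forked child that never calls execve(2) stays associated with the input binary, while one that execs something else is removed.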

(cherry picked from FBD13250904)
The file was modifiedbolt/src/DataAggregator.h
The file was modifiedbolt/src/DataAggregator.cpp
Commit c6ce2abb7d71d6ea8703552349d906c9897692b6 by maks
[perf2bolt] Optimize memory usage in perf2bolt

Summary:
While converting perf profile, we only need CFG for functions that were
profiled and can skip building CFG for the rest. This saves us some
processing time and memory.

Break down processing of perf.data into two steps. The first
step parses the data, saves it in intermediate format, and marks
functions with the profile. The second step attributes the profile to
functions with CFG. When we disassemble and build CFG for functions in
aggregate-only mode, we skip functions without the profile.

(cherry picked from FBD13706697)
The file was modifiedbolt/src/Passes/IndirectCallPromotion.h
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/RewriteInstance.h
The file was modifiedbolt/src/DataAggregator.cpp
The file was modifiedbolt/src/DataAggregator.h
The file was modifiedbolt/src/Passes/JTFootprintReduction.h
The file was modifiedbolt/src/RewriteInstance.cpp
Commit af81c7ff803a9f5c47e112cd13eb73feee0e51e6 by maks
[perf2bolt] Add support for generating autofdo input

Summary:
Autofdo tools support.

(cherry picked from FBD13779026)
The file was modifiedbolt/src/DataAggregator.h
The file was modifiedbolt/src/DataReader.h
The file was modifiedbolt/src/DataReader.cpp
The file was modifiedbolt/src/DataAggregator.cpp
Commit 365bd1f1c8f4be1c4798709e507c99ae74e07edd by maks
[BOLT] For non-simple functions always update jump tables in-place

Summary:
For a non-simple function we can miss a reference to a jump table or
to an indirect goto table. If we move the jump table, the missed
reference will not get updated, and the corresponding indirect jump
will end up in the old (wrong) location. Updating the original jump
table in-place should take care of the issue.

(cherry picked from FBD13849776)
The file was modifiedbolt/src/BinaryFunction.cpp
Commit ff6e21290f01bfe7a688102fc8fa92ce07f39947 by maks
[BOLT] New inliner implementation

Summary:
Addresses correctness issues related to inlining.
Inlining heuristics are not part of this diff.

(cherry picked from FBD13796888)
The file was modifiedbolt/src/Target/X86/X86MCPlusBuilder.cpp
The file was modifiedbolt/src/BinaryContext.cpp
The file was modifiedbolt/src/BinaryBasicBlock.cpp
The file was modifiedbolt/src/Passes/AllocCombiner.cpp
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/Passes/BinaryFunctionCallGraph.cpp
The file was modifiedbolt/src/Passes/BinaryPasses.h
The file was modifiedbolt/src/MCPlusBuilder.h
The file was modifiedbolt/src/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/Passes/ShrinkWrapping.cpp
The file was modifiedbolt/src/Passes/Inliner.h
The file was modifiedbolt/src/BinaryBasicBlock.h
The file was modifiedbolt/src/Passes/Inliner.cpp
The file was modifiedbolt/src/Passes/JTFootprintReduction.cpp
The file was modifiedbolt/src/Passes/JTFootprintReduction.h
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
The file was modifiedbolt/src/Passes/FrameOptimizer.cpp
The file was modifiedbolt/src/Passes/LongJmp.cpp
The file was modifiedbolt/src/MCPlusBuilder.cpp
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/BinaryPassManager.cpp
Commit 0c704eb75a13cdde738c051e6bef5ad1fb724817 by maks
[BOLT-HEATMAP] Initial heat map implementation

Summary:
Add heatmap subcommand to produce heatmaps based on perf.data with LBR.
The output is produced in colored ASCII format.

  llvm-bolt heatmap -p perf.data <executable>

    -block-size=<uint> - size of a heat map block in bytes (default 64)
    -line-size=<uint>  - number of entries per line (default 256)
    -max-address=<uint> - maximum address considered valid for heatmap
                          (default 4GB)
    -o=<string>        - heatmap output file (default stdout)
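
To make the role of -block-size and -max-address concrete, here is a small Python sketch that buckets sampled addresses into heat map blocks. It mirrors the defaults listed above but is not the Heatmap.cpp implementation; the function name and the exact treatment of the maximum address are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable


def bucket_addresses(addresses: Iterable[int], block_size: int = 64,
                     max_address: int = 4 * 1024 ** 3) -> Dict[int, int]:
    """Count samples per block of block_size bytes, skipping any
    address above max_address. Keys are block indices."""
    heat = Counter()
    for address in addresses:
        if address > max_address:
            continue
        heat[address // block_size] += 1
    return dict(heat)
```

With the defaults, addresses 0 and 63 land in block 0, address 64 lands in block 1, and anything beyond 4GB is dropped.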

(cherry picked from FBD13969992)
The file was modifiedbolt/src/llvm-bolt.cpp
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/DataAggregator.cpp
The file was modifiedbolt/src/Exceptions.h
The file was modifiedbolt/src/CMakeLists.txt
The file was modifiedbolt/src/DataAggregator.h
The file was modifiedbolt/src/Exceptions.cpp
The file was addedbolt/src/Heatmap.h
The file was addedbolt/src/Heatmap.cpp
Commit c593563d1f963066def0c50fc45b8406ea03a684 by maks
Do not assert on addresses read from processIndirectBranch

Summary: As part of our heuristics to decode an indirect branch, if we
suspect the branch is an indirect tail call, we add its probable target
to the BC::InterproceduralReferences vector to detect functions with
more than one entry point. However, if this probable target is not in an
allocatable section, we were asserting. Remove this assertion and
change the code to conditionally store to InterproceduralReferences
instead. The probable target could be garbage at this point because
of analyzeIndirectBranch failing to identify the load instruction that
has the memory address of the target, so we should tolerate this.

(cherry picked from FBD14432821)
The file was modifiedbolt/src/BinaryFunction.cpp
Commit a9e64947c5d7813077075bd460439f3b160926fe by maks
[NFC][BOLT] Move ExecutableFileMemoryManager into its own file

(cherry picked from FBD14474800)
The file was modifiedbolt/src/CMakeLists.txt
The file was addedbolt/src/ExecutableFileMemoryManager.cpp
The file was modifiedbolt/src/RewriteInstance.h
The file was modifiedbolt/src/RewriteInstance.cpp
The file was addedbolt/src/ExecutableFileMemoryManager.h
Commit 163adbec9fa49542a9b55dc68f1e5923a868383b by maks
[BOLT] Refactor allocatable sections rewrite part

Summary:
This refactoring makes it easier to create new code sections and control
code placement. As an example, cold code is being placed into
".text.cold" which is emitted independently from ".text", and the final
address assignment becomes more flexible.

Previously, in non-relocation mode we used to emit temporary section
name into .shstrtab. This resulted in unnecessary bloat of this section.

There was unnecessary padding emitted at the end of text section. After
fixing this, the output binary becomes smaller.

I had to change the way exception handling tables are re-written
as the current infra does not support cross-section label difference.
This means we have to emit absolute landing pad addresses, which might
not work for PIE binaries. I'm going to address this once I investigate
the current exception handling issues in PIEs.

This diff temporarily disables "-hot-functions-at-end" option.

(cherry picked from FBD14475693)
The file was modifiedbolt/src/BinarySection.h
The file was modifiedbolt/src/BinaryContext.cpp
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/Exceptions.cpp
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/BinaryData.cpp
The file was modifiedbolt/src/BinarySection.cpp
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/RewriteInstance.h
Commit 0a55001a0e2ca9d6656005ed87878e39b3303be2 by maks
[BOLT] Fix -hot-functions-at-end option

Summary: Make "-hot-functions-at-end" option work again.

(cherry picked from FBD14476242)
The file was modifiedbolt/src/RewriteInstance.h
The file was modifiedbolt/src/RewriteInstance.cpp
Commit 61ea19edf83461b489063053cc4d37b1e37c5ca8 by maks
[BOLT][NFC] Fix compilation warnings

Summary: Get rid of warnings while building with Clang.

(cherry picked from FBD14495620)
The file was modifiedbolt/src/Passes/RetpolineInsertion.h
The file was modifiedbolt/src/Passes/ReorderFunctions.cpp
The file was modifiedbolt/src/Passes/RetpolineInsertion.cpp
Commit 17cd2034f3db3c08f8247940d874b0b6ee75cf96 by maks
[BOLT] Fix debug line info emission

Summary:
GDB does not like it if the first entry in the line info table after an
end_sequence entry is not marked with is_stmt. If this happens, it will
not print the correct line number information for such an address. Note
that everything works fine starting with the first address marked
with is_stmt.

This could happen if the first instruction in the cold section wasn't
marked with is_stmt.

The fix is to always emit debug line info for the first instruction
in any function fragment with is_stmt flag.

(cherry picked from FBD14516629)
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/BinaryFunction.cpp
Commit 6bcb3389dd7d61f8655b6556d8d9ed47e8c79198 by maks
[BOLT] Place hot text mover functions into a separate section

Summary:
Create a separate pass for assigning functions to sections. Detect
functions originating from special sections (by default .stub and
.mover) and place them into ".text.mover" if the "-hot-text" option is
specified.

Cold functions are isolated from hot functions even when no function
re-ordering is specified.

(cherry picked from FBD14512628)
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/Passes/BinaryPasses.h
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
The file was modifiedbolt/src/BinaryPassManager.cpp
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/RewriteInstance.h
Commit b8d3dc40ea069c070e7d534341e159eb7ae13681 by maks
[BOLT] Use local binding for cold fragment symbols

Summary:
We used to use existing symbol binding while duplicating and renaming
cold fragment symbols. As a result, some of those were emitted with
global binding. This confuses gdb, and it starts treating those symbols
as additional entry points.

The fix is to always emit such symbols with a local binding. This also
means that we have to sort static symbol table before emission to make
sure local symbols precede all others.

(cherry picked from FBD14529265)
The file was modifiedbolt/src/RewriteInstance.cpp
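
Note on "[BOLT] Use local binding for cold fragment symbols": ELF requires all local symbols to precede the others in a symbol table, which is why the commit sorts the table before emission. A minimal sketch of that ordering step follows. The `Symbol` record and `sortLocalsFirst` are hypothetical names for illustration, not BOLT's actual types.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified symbol record; real ELF symbols carry more fields.
struct Symbol {
  std::string Name;
  bool IsLocal; // true when the symbol has local (STB_LOCAL) binding
};

// Moves all local symbols ahead of the rest while keeping the relative order
// inside each group. Returns the index of the first non-local symbol, which
// is the value an ELF writer stores in the symbol table's sh_info field.
std::size_t sortLocalsFirst(std::vector<Symbol> &Symbols) {
  auto FirstNonLocal = std::stable_partition(
      Symbols.begin(), Symbols.end(),
      [](const Symbol &S) { return S.IsLocal; });
  return static_cast<std::size_t>(FirstNonLocal - Symbols.begin());
}
```

A stable partition is used rather than a full sort so that symbols keep their original relative order within the local and non-local groups.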
Commit 69faf6137243af4f60bdfd6057844bf1fff5e415 by maks
[BOLT] Fix section lookup while deleting symbols

Summary:
While removing redundant local symbols, we used the new section index to
look up the corresponding section in the old section table. As a result,
we used to either not remove the correct symbols, or remove the wrong
ones.

(cherry picked from FBD14552047)
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/RewriteInstance.h
Commit d1b76f2ac2fe793b7a6566dfe0077044c0f0445f by maks
[BOLT] Allocate enough space past __hot_end for huge pages

Summary:
While using "-hot-text" option, we might not get enough cold text to
fill up the last huge page, and we can get data allocated on this page
producing undesirable effects. To prevent this from happening, always
make sure to allocate enough space past __hot_end.

(cherry picked from FBD14575100)
The file was modifiedbolt/src/RewriteInstance.cpp
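
Note on "[BOLT] Allocate enough space past __hot_end for huge pages": the padding described above amounts to rounding the end of hot text up to a huge page boundary. The sketch below assumes a 2 MiB huge page, the common x86-64 size. `HugePageSize`, `alignUp` and `paddingPastHotEnd` are illustrative names, not taken from the BOLT sources.

```cpp
#include <cstdint>

// Assumption: 2 MiB huge pages (the usual x86-64 size).
constexpr std::uint64_t HugePageSize = 2ull * 1024 * 1024;

// Rounds Address up to the next multiple of Alignment.
// Alignment must be a power of two.
constexpr std::uint64_t alignUp(std::uint64_t Address,
                                std::uint64_t Alignment) {
  return (Address + Alignment - 1) & ~(Alignment - 1);
}

// Number of bytes to reserve after HotEnd so that no other data can be
// placed on the last huge page that holds hot text.
constexpr std::uint64_t paddingPastHotEnd(std::uint64_t HotEnd) {
  return alignUp(HotEnd, HugePageSize) - HotEnd;
}
```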
Commit 297d1a4e1a7516d7fd7d0ae4fa0cd284ccf8193a by maks
[BOLT] Do not write jump table section headers

Summary:
In non-relocation mode we were accidentally emitting section headers for
every single jump table. This happened with default
`-jump-tables=basic`.

(cherry picked from FBD14653282)
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/BinarySection.h
Commit 8894853f42b8d869fa3746e0f96460f9470697e9 by maks
[BOLT][DWARF] Dedup .debug_abbrev section patches

Summary:
When we patch .debug_abbrev we issue many duplicate patches. Instead of
storing these patches as a vector, use a hash map. This saves some
processing time and memory.

(cherry picked from FBD14691292)
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/RewriteInstance.h
The file was modifiedbolt/src/DebugData.cpp
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/DebugData.h
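
Note on "[BOLT][DWARF] Dedup .debug_abbrev section patches": the effect of switching from a vector to a hash map can be sketched as below. The key used here (a byte offset) and the `AbbrevPatcher` class are assumptions made for illustration; the actual patch representation in BOLT may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical patch store: one pending one-byte patch per section offset.
class AbbrevPatcher {
  std::unordered_map<std::uint64_t, std::uint8_t> Patches;

public:
  // Registering a patch for an offset that already has one overwrites the
  // earlier entry instead of appending a duplicate.
  void addBytePatch(std::uint64_t Offset, std::uint8_t Value) {
    Patches[Offset] = Value;
  }

  // Number of distinct patches that will have to be applied.
  std::size_t size() const { return Patches.size(); }
};
```

With a vector, every repeated request would be stored and applied again; with the map, repeated requests collapse into one entry.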
Commit 7fd487066f6f0fa28c012182af513fcf415f6bb5 by maks
[BOLT] Move BinaryFunctions into a BinaryContext and more

Summary:
A long-overdue refactoring that makes interfaces cleaner and less awkward.
Mainly, it makes future work much easier.

(cherry picked from FBD14766284)
The file was modifiedbolt/src/Passes/FrameAnalysis.cpp
The file was modifiedbolt/src/DataAggregator.cpp
The file was modifiedbolt/src/Passes/RegReAssign.h
The file was modifiedbolt/src/RewriteInstance.h
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/Passes/ReorderData.h
The file was modifiedbolt/src/ProfileWriter.cpp
The file was modifiedbolt/src/Passes/JTFootprintReduction.h
The file was modifiedbolt/src/Passes/ReorderFunctions.h
The file was modifiedbolt/src/Passes/VeneerElimination.cpp
The file was modifiedbolt/src/BinaryPassManager.cpp
The file was modifiedbolt/src/Passes/LongJmp.h
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/BinaryContext.cpp
The file was modifiedbolt/src/Passes/ReorderFunctions.cpp
The file was modifiedbolt/src/Passes/BinaryFunctionCallGraph.h
The file was modifiedbolt/src/Passes/RegReAssign.cpp
The file was modifiedbolt/src/Passes/BinaryPasses.h
The file was addedbolt/src/DWARFRewriter.h
The file was modifiedbolt/src/Passes/RegAnalysis.h
The file was modifiedbolt/src/Passes/Inliner.cpp
The file was modifiedbolt/src/Passes/IdenticalCodeFolding.h
The file was modifiedbolt/src/Passes/IdenticalCodeFolding.cpp
The file was modifiedbolt/src/Passes/RetpolineInsertion.h
The file was modifiedbolt/src/Passes/FrameOptimizer.cpp
The file was modifiedbolt/src/Passes/StokeInfo.h
The file was modifiedbolt/src/Passes/FrameAnalysis.h
The file was modifiedbolt/src/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/src/Passes/AllocCombiner.cpp
The file was modifiedbolt/src/Passes/LongJmp.cpp
The file was modifiedbolt/src/Passes/AllocCombiner.h
The file was modifiedbolt/src/Passes/StokeInfo.cpp
The file was modifiedbolt/src/Passes/Inliner.h
The file was modifiedbolt/src/Passes/PLTCall.cpp
The file was modifiedbolt/src/Passes/RetpolineInsertion.cpp
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/DWARFRewriter.cpp
The file was modifiedbolt/src/Passes/ValidateInternalCalls.cpp
The file was modifiedbolt/src/DebugData.h
The file was modifiedbolt/src/Passes/JTFootprintReduction.cpp
The file was modifiedbolt/src/BinaryPassManager.h
The file was modifiedbolt/src/DataAggregator.h
The file was modifiedbolt/src/Passes/VeneerElimination.h
The file was modifiedbolt/src/Passes/IndirectCallPromotion.h
The file was modifiedbolt/src/BoltDiff.cpp
The file was modifiedbolt/src/Passes/PLTCall.h
The file was modifiedbolt/src/Passes/ReorderData.cpp
The file was modifiedbolt/src/Passes/FrameOptimizer.h
The file was modifiedbolt/src/Passes/Aligner.cpp
The file was modifiedbolt/src/Passes/Aligner.h
The file was modifiedbolt/src/Passes/BinaryFunctionCallGraph.cpp
The file was modifiedbolt/src/Passes/ValidateInternalCalls.h
Commit c8a927696cf4e39a813be4c42fe5b48bfbddcc76 by maks
[BOLT] Detect internal references into a middle of instruction

Summary:
Some instructions in assembly-written functions could reference 8-byte
constants from other instructions using 4-byte offsets, presumably to
save a couple of bytes.

Detect such cases, and skip processing such functions until we teach
BOLT how to handle references into the middle of an instruction.

(cherry picked from FBD14768212)
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/BinaryFunction.h
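
Note on "[BOLT] Detect internal references into a middle of instruction": one way to express the check is to keep a map from instruction start offset to instruction size and test whether a referenced offset falls strictly inside an instruction. This is a sketch under that assumption. `InstructionMap` and `isReferenceIntoMiddle` are hypothetical names.

```cpp
#include <cstdint>
#include <map>

// Maps an instruction's start offset within the function to its size in bytes.
using InstructionMap = std::map<std::uint64_t, std::uint64_t>;

// Returns true if Offset points past the first byte of some instruction but
// before its end, i.e. into the middle of that instruction.
bool isReferenceIntoMiddle(const InstructionMap &Instructions,
                           std::uint64_t Offset) {
  // First instruction that starts strictly after Offset.
  auto It = Instructions.upper_bound(Offset);
  if (It == Instructions.begin())
    return false; // Offset precedes every known instruction.
  --It;           // Now It->first <= Offset.
  return Offset > It->first && Offset < It->first + It->second;
}
```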
Commit 624a0e810d843818c9c45a74fcdf98c8ffa69ea9 by maks
[DWARF][BOLT] Convert DW_AT_(low|high)_pc to DW_AT_ranges only if necessary

Summary:
While updating DWARF, we used to convert address ranges for functions
into DW_AT_ranges format, even if the ranges were not split and still
had a simple [low, high) form. We had to do this because functions with
contiguous ranges could be sharing an abbrev with non-contiguous range
function, and we had to convert the abbrev.

It turns out that excessive use of DW_AT_ranges may lead to
internal core dumps in gdb in the presence of .gdb_index.
I still don't know the root cause, but reducing the number of
DW_AT_ranges used by DW_TAG_subprogram DIEs does alleviate the
issue.

We can keep a simple range for DIEs that are guaranteed not to
share an abbrev with any non-contiguous function. Hence we have to
postpone the update of function ranges until we've seen all DIEs.
Note that DIEs from different compilation units could share the same
abbrev, and hence we have to process DIEs from all compilation units.

(cherry picked from FBD14814043)
The file was modifiedbolt/src/DWARFRewriter.h
The file was modifiedbolt/src/DebugData.cpp
The file was modifiedbolt/src/DebugData.h
The file was modifiedbolt/src/DWARFRewriter.cpp
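
Note on "[DWARF][BOLT] Convert DW_AT_(low|high)_pc to DW_AT_ranges only if necessary": the reason the update must be postponed until all DIEs are seen is easiest to see as a two-pass scheme. The sketch below treats an abbrev as a single integer id shared across compilation units, which is a simplification. `FunctionDie` and both functions are hypothetical.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical view of a DW_TAG_subprogram DIE.
struct FunctionDie {
  std::uint32_t AbbrevId; // identifies the abbrev the DIE uses
  bool IsContiguous;      // false if the function was split into fragments
};

// Pass 1: visit every DIE (from all compilation units) and collect the
// abbrevs used by at least one non-contiguous function.
std::set<std::uint32_t>
abbrevsNeedingRanges(const std::vector<FunctionDie> &Dies) {
  std::set<std::uint32_t> Result;
  for (const FunctionDie &Die : Dies)
    if (!Die.IsContiguous)
      Result.insert(Die.AbbrevId);
  return Result;
}

// Pass 2: a DIE may keep its simple [low, high) form only if its abbrev is
// not shared with any non-contiguous function.
bool keepsSimpleRange(const FunctionDie &Die,
                      const std::set<std::uint32_t> &NeedRanges) {
  return NeedRanges.count(Die.AbbrevId) == 0;
}
```

A contiguous function whose abbrev is shared with a split function still has to be converted, since the abbrev itself changes.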
Commit 90996eb54b907d2b431d9e83bcb2401069cd3a49 by maks
[PERF2BOLT] Print a better message if perf.data lacks LBR

Summary:
When perf.data is processed in LBR mode but the data was collected
without -j, we currently report, confusingly, that all samples mismatch
the input binary, even though the samples match but lack LBR info.
Change perf2bolt to detect this scenario and print a helpful message
instructing the user to collect data with LBR.

(cherry picked from FBD14817732)
The file was modifiedbolt/src/DataAggregator.cpp
The file was modifiedbolt/src/DataAggregator.h
Commit 7d89b113d86ac0517734f6e438ab3037a6c07d18 by maks
[BOLT][NFC] Indentation fix

(cherry picked from FBD14856700)
The file was modifiedbolt/src/BinaryContext.cpp
Commit a8e05d067d4ecc7e1ed434f8f05b4a3c49e4440a by maks
[BOLT] Add interface to extract values from static addresses

(cherry picked from FBD14858028)
The file was modifiedbolt/src/BinaryContext.cpp
The file was modifiedbolt/src/Exceptions.cpp
The file was modifiedbolt/src/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/RewriteInstance.h
The file was modifiedbolt/src/Target/X86/X86MCPlusBuilder.cpp
Commit 88375d311e0c7fb4393e06213ac969511881dabd by maks
[BOLT] Sort basic block successors for printing

Summary:
For easier analysis of the hottest targets of jump tables it helps to
have basic block successors sorted based on the taken frequency.

(cherry picked from FBD14856640)
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/BinaryBasicBlock.h
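
Note on "[BOLT] Sort basic block successors for printing": a minimal sketch of ordering successors by taken frequency for display, hottest first. The `Successor` record is hypothetical. The function works on a copy, on the assumption that only the printed order should change and not the order stored in the CFG.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical successor edge: target label plus how often it was taken.
struct Successor {
  std::string Label;
  std::uint64_t TakenCount;
};

// Returns the successors ordered from most to least frequently taken.
// Takes the vector by value so the caller's order is left untouched, and
// uses a stable sort so equally hot successors keep their original order.
std::vector<Successor> sortedForPrinting(std::vector<Successor> Succs) {
  std::stable_sort(Succs.begin(), Succs.end(),
                   [](const Successor &A, const Successor &B) {
                     return A.TakenCount > B.TakenCount;
                   });
  return Succs;
}
```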
Commit 315ae74de39a5b9ead5b13aa61860945f6ca1e72 by maks
[BOLT] Include <numeric> for std::iota

Summary: Some compilers require <numeric> header.

(cherry picked from FBD14868132)
The file was modifiedbolt/src/BinaryFunction.cpp
Commit e50e89be9e596bdfa844f4f81aa3762b118a6feb by maks
[BOLT] Handle R_X86_64_converted_reloc_bit

Summary:
In binutils 2.30 a bfd linker accidentally started modifying some
relocations on output under `-q/--emit-relocs` by turning on
R_X86_64_converted_reloc_bit. As a result, BOLT ignored such
relocations and failed to correctly update the binary.

This diff filters out R_X86_64_converted_reloc_bit from the relocation
type.

(cherry picked from FBD14907832)
The file was modifiedbolt/src/RewriteInstance.h
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/Relocation.h
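
Note on "[BOLT] Handle R_X86_64_converted_reloc_bit": filtering the flag out of a relocation type is a single mask operation. The sketch assumes the flag is bit 7 (0x80), as it is believed to be defined in the binutils x86-64 backend; verify the value against the binutils sources before relying on it. `ConvertedRelocBit` and `stripConvertedBit` are illustrative names.

```cpp
#include <cstdint>

// Assumption: the converted-relocation flag occupies bit 7 of the type.
constexpr std::uint32_t ConvertedRelocBit = 0x80;

// Returns the relocation type with the linker-internal flag cleared, so the
// result can be compared against the regular R_X86_64_* values.
constexpr std::uint32_t stripConvertedBit(std::uint32_t Type) {
  return Type & ~ConvertedRelocBit;
}
```

Types that never had the flag set pass through unchanged, so the mask can be applied to every relocation unconditionally.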
Commit 8f982685183ae7679d718ddf24c78028b2395aa0 by maks
[BOLT] Reduce warnings for non-simple functions

Summary:
If a function was already marked as non-simple, there's no reason to
issue a warning that it has a reference in the middle of an
instruction. Besides, sometimes there wouldn't be instructions
disassembled at a given entry, and the warning would be incorrect.

(cherry picked from FBD14938227)
The file was modifiedbolt/src/BinaryFunction.cpp
Commit 27dcec97171795fb2e15327cccba7fce0a426285 by maks
[BOLT] Abort processing if the profile has no valid data

Summary:
It's possible to pass a profile in invalid format to BOLT, and we
silently ignore it. This could cause a regression as such scenario can
go undetected. We should abort processing if no valid data was seen in
the profile and issue a warning if it was partially invalid.

(cherry picked from FBD14941211)
The file was modifiedbolt/src/DataReader.cpp
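
Note on "[BOLT] Abort processing if the profile has no valid data": the policy in the summary distinguishes three outcomes. A minimal sketch, with hypothetical names and with record counts standing in for whatever the reader actually tracks:

```cpp
#include <cstddef>

enum class ProfileStatus {
  Valid,            // continue silently
  PartiallyInvalid, // continue, but issue a warning
  NoValidData       // abort processing
};

// Classifies a profile from the number of records that parsed correctly and
// the number that did not. An empty profile counts as having no valid data.
ProfileStatus classifyProfile(std::size_t ValidRecords,
                              std::size_t InvalidRecords) {
  if (ValidRecords == 0)
    return ProfileStatus::NoValidData;
  if (InvalidRecords > 0)
    return ProfileStatus::PartiallyInvalid;
  return ProfileStatus::Valid;
}
```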
Commit 22ba3dc8166f0eea150d03bf7eb18b813cfac4e6 by maks
[BOLT] Add another section to the list of hot text movers

Summary:

(cherry picked from FBD14954472)
The file was modifiedbolt/src/RewriteInstance.cpp
Commit 31fc56b313cb4281a10e7cb55979c9c611732b44 by maks
[BOLT] Fix adjustFunctionBoundaries w.r.t. entry points

Summary:
Don't consider symbols in another section when processing
additional entry points for a function.

(cherry picked from FBD14962853)
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/RewriteInstance.cpp
Commit ffae5e73f3fbbd43570ae32ceae63b07e838fcb8 by maks
[BOLT] Fix an issue with std::errc

Summary:
On some platforms
`llvm::make_error_code(std::errc::no_such_process) == std::errc::no_such_process`
evaluates to false.

(cherry picked from FBD14944405)
The file was modifiedbolt/src/DataAggregator.cpp
Commit 99ef4c90c160be962c9d2e60a4429f8f169f86e3 by maks
[BOLT] Basic support for split functions

Summary:
This adds very basic and limited support for split functions.
In non-relocation mode, split functions are ignored, while their debug
info is properly updated. No support in the relocation mode yet.

Split functions consist of a main body and one or more fragments.
For fragments, the main part is called their parent. Any fragment
can only be entered via its parent or another fragment.

The short-term goal is to correctly update debug information for split
functions, while the long-term goal is to have complete support
including full optimization. Note that if we don't detect split
bodies, we would have to add multiple entry points via tail calls,
which we would rather avoid.

Parent functions and fragments are represented by a `BinaryFunction`
and are marked accordingly. For now they are marked as non-simple, and
thus only supported in non-relocation mode. Once we start building a
CFG, it should be a common graph (i.e. the one that includes all
fragments) in the parent function.

The function discovery is unchanged, except for the detection of
`\.cold\.` pattern in the function name, which automatically marks the
function as a fragment of another function.

Because of the local function name ambiguity, we cannot rely on the
function name to establish child fragment and parent relationship.
Instead we rely on disassembly processing.

`BinaryContext::getBinaryFunctionContainingAddress()` now returns a
parent function if an address from its fragment is passed.

There's no jump table support at the moment. Jump tables can have
source and destinations in both fragment and parent.

Parent functions that enter their fragments via C++ exception handling
mechanism are not yet supported.

(cherry picked from FBD14970569)
The file was modifiedbolt/src/DWARFRewriter.cpp
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/BinaryContext.cpp
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/BinaryFunction.cpp
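
Note on "[BOLT] Basic support for split functions": the name-based detection of fragments can be sketched with the `\.cold\.` pattern quoted in the summary. `looksLikeFragment` is a hypothetical helper. As the summary says, the name only marks a function as a fragment; it is not used to find the parent.

```cpp
#include <regex>
#include <string>

// Returns true if the symbol name contains the ".cold." marker.
bool looksLikeFragment(const std::string &Name) {
  // The dots are escaped so they match literally.
  static const std::regex ColdPattern("\\.cold\\.");
  return std::regex_search(Name, ColdPattern);
}
```

Note that the pattern as written requires a trailing dot, so a name ending in ".cold" with nothing after it would not match.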
Commit 433f3e3e02afd7e1120ea30418927d8b41e345f8 by maks
[BOLT] Process CFIs for functions with FDE size mismatch

Summary:
If a function size indicated in FDE is different from the one in the
symbol table, we can keep processing the function as we are using the
max size for internal purposes. Typically this happens for
assembly-written functions with padding at the end. This padding is not
included in FDE, but it is in the symbol table.

(cherry picked from FBD14987653)
The file was modifiedbolt/src/Exceptions.cpp
Commit 3b422eafd027e2b665ad4fa5b2343437bc9a6f9d by maks
[BOLT] Fix non-determinism in shrink wrapping

Summary:
Iterating over SmallPtrSet is non-deterministic. Change it to
SmallSetVector. Similarly, do not sort a vector of ProgramPoint when
computing the dominance frontier, as ProgramPoint uses the pointer value
to determine order. Use a SmallSetVector there too to avoid duplicates
instead of sorting + uniqueing.

(cherry picked from FBD14992085)
The file was modifiedbolt/src/Passes/ShrinkWrapping.cpp
The file was modifiedbolt/src/Passes/DominatorAnalysis.h
The file was modifiedbolt/src/Passes/ShrinkWrapping.h
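
Note on "[BOLT] Fix non-determinism in shrink wrapping": iteration order over a pointer-keyed set depends on pointer values, which can change from run to run. A set that remembers insertion order avoids this and also removes duplicates without sorting. The class below is a minimal standard-library stand-in for the idea behind `SmallSetVector`, not LLVM's implementation.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Minimal insertion-ordered set: membership is checked through a hash set,
// while iteration walks a vector in the order elements were first inserted.
template <typename T> class OrderedSet {
  std::vector<T> Order;
  std::unordered_set<T> Seen;

public:
  // Inserts V if it is not present yet. Returns true if it was added.
  bool insert(const T &V) {
    if (!Seen.insert(V).second)
      return false;
    Order.push_back(V);
    return true;
  }

  typename std::vector<T>::const_iterator begin() const {
    return Order.begin();
  }
  typename std::vector<T>::const_iterator end() const { return Order.end(); }
  std::size_t size() const { return Order.size(); }
};
```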
Commit d9f1bd42fd06f14d0532e7974e2d33bae8140e2f by maks
[cmake] Only build enabled targets

Summary:
When attempting to build llvm-bolt with `-DLLVM_ENABLE_TARGETS="X86"`, I
encountered an error:

```
CMake Error at cmake/modules/AddLLVM.cmake:559 (add_dependencies):
  The dependency target "AArch64CommonTableGen" of target
  "LLVMBOLTTargetAArch64" does not exist.
Call Stack (most recent call first):
  cmake/modules/AddLLVM.cmake:607 (llvm_add_library)
  tools/llvm-bolt/src/Target/AArch64/CMakeLists.txt:1 (add_llvm_library)
```

The issue is that the `llvm-bolt/src/Target/AArch64` subdirectory is
added by CMake unconditionally. The LLVM project, on the other hand,
only adds the subdirectories that are enabled, by using a `foreach` loop
over `LLVM_TARGETS_TO_BUILD`. Copying that same loop, from
`llvm/lib/Target/CMakeLists.txt`, to this project avoids the error.

(cherry picked from FBD15030236)
The file was modifiedbolt/src/Target/CMakeLists.txt
Commit eba1a67730f856b06e75a2603eca035049c1caa4 by maks
Fix casting issues on macOS

Summary:
`size_t` is platform-dependent, and on macOS it is defined as
`unsigned long long`. This is not the same type as is used in many calls
to templated functions that expect the same type. As a result, on macOS,
calls to `std::max` fail because a template function that takes
`uint64_t, unsigned long long` cannot be found.

To work around the issue:

* Specify explicit `std::max` and `std::min` functions where necessary,
  to work around the compiler trying (and failing) to find a suitable
  instantiation.
* For lambda return types, specify an explicit return type where necessary.
* For `operator ==()` calls, use an explicit cast where necessary.

(cherry picked from FBD15030283)
The file was modifiedbolt/src/Passes/Aligner.cpp
The file was modifiedbolt/src/ProfileReader.cpp
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
The file was modifiedbolt/src/Passes/CachePlusReorderAlgorithm.cpp
The file was modifiedbolt/src/BinaryContext.cpp
The file was modifiedbolt/src/Passes/IndirectCallPromotion.cpp
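
Note on "Fix casting issues on macOS": the first workaround in the list can be illustrated with two distinct unsigned integer types. `std::max(A, B)` cannot deduce a single template argument from them, while naming the type explicitly converts both operands first. `largerOf` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstdint>

// A and B have different types, so a plain std::max(A, B) would fail
// template argument deduction on every platform.
std::uint64_t largerOf(unsigned long A, unsigned long long B) {
  // Spelling out the template argument converts both values to
  // std::uint64_t before the comparison.
  return std::max<std::uint64_t>(A, B);
}
```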
Commit 4e4d39c21cb612f20cc65572db71c5bcc2a32145 by maks
[BOLT] Update symbols for secondary entry points

Summary:
Update the output ELF symbol table for symbols representing
secondary entry points for functions. Previously, those were left
unchanged in the symtab.

(cherry picked from FBD15010517)
The file was modifiedbolt/src/BinaryContext.cpp
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/BinaryContext.h
Commit 91b2de3c23c14f1df9dd8db9404dd00f8a9bd64a by maks
[BOLT] Minimize BOLT's diff with LLVM by removing trivial changes (NFC)

Summary: BOLT works as a series of patches rebased onto upstream LLVM at revision `f137ed238db`. Some of these patches introduce unnecessary whitespace changes or includes. Remove these to minimize the diff with upstream LLVM.

(cherry picked from FBD15064122)
The file was modifiedbolt/llvm.patch
Commit 492e4a515ea0d4bbf2c703ae9f8416797277059d by maks
[BOLT] Automatically enable -hot-text

Summary:
Enable -hot-text by default if reordering functions.

Also fail immediately if function reordering is specified on the command
line in non-relocation mode.

(cherry picked from FBD15095178)
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/Passes/ReorderFunctions.cpp
Commit 5717b0c427d3ae2c928c1d46beea496d71ceaa37 by maks
[perf2bolt] Fix print report for pre-aggregated profile

Summary:
For pre-aggregated profile, we were using the number of records in the
profile for `NumTraces` ignoring the counts per record. As a result,
the reported percentage of mismatched traces was bogus.

(cherry picked from FBD15093123)
The file was modifiedbolt/src/DataAggregator.cpp
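
Note on "[perf2bolt] Fix print report for pre-aggregated profile": in a pre-aggregated profile each record stands for many traces, so the totals have to be weighted by the per-record count. A minimal sketch with a hypothetical `AggregatedRecord` type:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical pre-aggregated record: Count traces share the same outcome.
struct AggregatedRecord {
  std::uint64_t Count;
  bool Mismatched;
};

// Percentage of mismatched traces, weighting every record by its count
// instead of counting each record once.
double mismatchedPercent(const std::vector<AggregatedRecord> &Records) {
  std::uint64_t NumTraces = 0;
  std::uint64_t NumMismatched = 0;
  for (const AggregatedRecord &R : Records) {
    NumTraces += R.Count;
    if (R.Mismatched)
      NumMismatched += R.Count;
  }
  if (NumTraces == 0)
    return 0.0;
  return 100.0 * static_cast<double>(NumMismatched) /
         static_cast<double>(NumTraces);
}
```

Counting records instead of traces would report 50% for a profile with one matching record of 90 traces and one mismatched record of 10 traces, where the weighted figure is 10%.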
Commit caa0fafa18e4e6a468e9ee585e4d714c1f6cb0b3 by maks
[BOLT] Fix profile reading in non-reloc mode

Summary:
In non-relocation mode we may execute multiple re-write passes either
because we need to split large functions or update debug information for
large functions (in this context large functions are functions that do
not fit into the original function boundaries after optimizations).

When we execute another pass, we reset RewriteInstance and run most of
the steps such as disassembly and profile matching for the 2nd or 3rd
time. However, when we match a profile, we check `Used` flag, and don't
use the profile for the 2nd time. Since we didn't reset the flag while
resetting the rest of the states, we ignored profile for all functions.
Resetting the flag in-between rewrite passes solves the problem.

(cherry picked from FBD15110959)
The file was modifiedbolt/src/DataReader.cpp
The file was modifiedbolt/src/DataReader.h
The file was modifiedbolt/src/RewriteInstance.cpp
Commit 21ee0e98c716423a837e7c131e317ef4cca6a88b by maks
[BOLT] Fix symboltable update bug

Summary:
Commit "Update symbols for secondary entry points" introduced
a bug by using getBinaryFunctionContainingAddress() instead of
getBinaryFunctionAtAddress() regarding ICF'd functions. Only the latter
would fetch the correct BinaryFunction object for addresses of functions
that were ICF'd. As a result of this bug, the dynamic symbol table was
not updated for function symbols that were folded by ICF.

(cherry picked from FBD15112941)
The file was modifiedbolt/src/RewriteInstance.cpp
Commit 2b1523362eb03ac407b995af24e7e2f46f076ba1 by maks
[BOLT] Strip debug sections by default

Summary:
We used to ignore debug sections by default, but we kept them in the
binary which led to invalid debug information in the output. It's better
to strip debug info and print a warning to the user.

Note: we are not updating debug info by default due to high memory
requirements for large applications.

(cherry picked from FBD15128947)
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/RewriteInstance.h
The file was modifiedbolt/src/ExecutableFileMemoryManager.cpp
Commit f1dfd38dece6f1fd56a2c5d995bc3b41b7db6dc6 by maks
[BOLT][NFC] Move DynoStats out of BinaryFunction

Summary: Move DynoStats into separate source files.

(cherry picked from FBD15138883)
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/CMakeLists.txt
The file was addedbolt/src/DynoStats.cpp
The file was addedbolt/src/DynoStats.h
The file was modifiedbolt/src/Passes/BinaryPasses.h
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
Commit 310b32fbe57852506e7b9fd9409c0b08f0ffc20c by maks
[BOLT] Limit jump table size by containing object

Summary:
While checking the size of a jump table, we used the containing
section as a boundary. This worked in most cases, as jump tables are
typically not marked with symbol table entries. However, the compiler
may generate objects for indirect goto's.

(cherry picked from FBD15158905)
The file was modifiedbolt/src/BinaryFunction.cpp
Commit 4b55967d9e99ed9425218b6d8a8bab5c2fc51682 by maks
[perf2bolt] Pass `-f` flag to perf

Summary:
perf tool requires the input data to be owned by the current user or
root, otherwise it rejects the input. Use `-f` option to override this
behavior.

(cherry picked from FBD15160678)
The file was modifiedbolt/src/DataAggregator.cpp
Commit fee61231ef2cbb1afdae6e3fb73570bace964275 by maks
[BOLT] Move JumpTable management to BinaryContext

Summary:
Make BinaryContext responsible for creation and management of
JumpTables. This will be used for detection and resolution of jump table
conflicts across functions.

(cherry picked from FBD15196017)
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/JumpTable.h
The file was modifiedbolt/src/BinaryContext.cpp
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/JumpTable.cpp
Commit f1fde4415459573e59effe4b33854d46b8090556 by maks
[BOLT] Improve ICP activation policy and hot jt processing

Summary:
Previously, ICP worked with a budget of N targets to convert to
direct calls. As long as the frequency of up to N of the hottest targets
surpassed a given fraction (threshold) of the total frequency, say, 90%,
then the optimization would convert a number of targets (up to N) to
direct calls. Otherwise, it would completely abort processing this call
site. The intent was to convert a given fraction of the indirect call
site frequency to use direct calls instead, but this ends up being an
"all or nothing" strategy.

In this patch we change this to operate with the same strategy seen in
LLVM's ICP, with two thresholds. The idea is that the hottest target of
an indirect call site will be compared against these two thresholds: one
checks its frequency relative to the total frequency of the original
indirect call site, and the other checks its frequency relative to the
remaining, unconverted targets (excluding the hottest targets that were
already converted to direct calls). The remaining threshold is typically
set higher than the total threshold. This allows us more control over
ICP.

I expose two pairs of knobs, one for jump tables and another for
indirect calls.

To improve the promotion of hot jump table indices when we have memory
profile, I also fix a bug that could cause us to promote extra indices
besides the hottest ones as seen in the memory profile. When we have the
memory profile, I reapply the dual threshold checks to the memory
profile which specifies exactly which indices are hot. I then update N,
the number of targets to be promoted, based on this new information, and
update frequency information.

To allow us to work with smaller profiles, I also created an option in
perf2bolt to filter out memory samples outside the statically allocated
area of the binary (heap/stack). This option is on by default.

(cherry picked from FBD15187832)
The file was modifiedbolt/src/Passes/IndirectCallPromotion.cpp
The file was modifiedbolt/src/Passes/IndirectCallPromotion.h
The file was modifiedbolt/src/DataAggregator.cpp
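
Note on "[BOLT] Improve ICP activation policy and hot jt processing": the dual-threshold policy can be sketched as a loop over targets sorted from hottest to coldest. Each target must clear a threshold relative to the total call site frequency and another relative to the frequency not yet converted. All names and the percentage representation are assumptions for illustration; this is not the code from IndirectCallPromotion.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical pair of knobs, both expressed in percent.
struct Thresholds {
  unsigned TotalPercent;     // minimum share of the total frequency
  unsigned RemainingPercent; // minimum share of the unconverted frequency
};

// SortedCounts holds per-target frequencies, hottest first. Returns how many
// leading targets would be promoted, never more than MaxTargets.
std::size_t numTargetsToPromote(const std::vector<std::uint64_t> &SortedCounts,
                                std::size_t MaxTargets, Thresholds T) {
  const std::uint64_t Total = std::accumulate(
      SortedCounts.begin(), SortedCounts.end(), std::uint64_t{0});
  std::uint64_t Remaining = Total;
  std::size_t NumPromoted = 0;
  for (std::uint64_t Count : SortedCounts) {
    if (NumPromoted == MaxTargets)
      break;
    // Check against the total frequency of the original call site.
    if (Count * 100 < Total * T.TotalPercent)
      break;
    // Check against what is left after the hotter targets were converted.
    if (Count * 100 < Remaining * T.RemainingPercent)
      break;
    Remaining -= Count;
    ++NumPromoted;
  }
  return NumPromoted;
}
```

Unlike the earlier all-or-nothing budget, the loop stops at the first target that fails either check and keeps whatever was promoted before it.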
Commit 4755825447d4b6920afebf405d2c9fe2cb4ffa3f by maks
Parse statically defined tracepoint markers from .note.stapsdt section

Summary:
    Parse statically defined tracepoint (SDT) markers from the ELF file and store them.
    Add an option to print SDTs (-print-sdt).
    Add a test case for parsing and printing SDTs.

(cherry picked from FBD15366712)
The file was modifiedbolt/src/RewriteInstance.h
The file was modifiedbolt/src/BinarySection.h
The file was modifiedbolt/src/BinaryContext.h
The file was modifiedbolt/src/RewriteInstance.cpp
Commit ca659e4336badb4bc324d9d64e17355563a34c13 by maks
Preserve nops that are SDT markers in binaries and disable SDT conflicting optimizations

Summary:
SDT markers that appear as nops in the assembly are preserved and not eliminated.
Functions with SDT markers are also flagged. Inlining and folding are disabled for
functions that have SDT markers.

(cherry picked from FBD15379799)
The file was modifiedbolt/src/Passes/Inliner.cpp
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/Passes/IdenticalCodeFolding.cpp
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/BinaryContext.h
Commit d047df12c5fc22feb025f055e4a7f745c96f5efa by maks
[BOLT] Add an option to specialize memcpy() for 1 byte copy

Summary:
Add an option:

  -memcpy1-spec=func1,func2:cs1,func3:cs1:cs2,...

to specialize calls to memcpy() in the listed functions (names may be
given as regular expressions) for size 1. The optimization dynamically
checks whether the size argument equals 1 and, if so, executes a one-byte
copy; otherwise it calls memcpy() as usual. Specific call sites can be
indicated after ":" using their numeric count from the start of the
function.

(cherry picked from FBD15428936)
The file was modifiedbolt/src/Passes/BinaryPasses.h
The file was modifiedbolt/src/Passes/RetpolineInsertion.cpp
The file was modifiedbolt/src/BinaryPassManager.cpp
The file was modifiedbolt/src/Target/X86/X86MCPlusBuilder.cpp
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/MCPlusBuilder.h
The file was modifiedbolt/src/BinaryBasicBlock.h
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
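
Note on "[BOLT] Add an option to specialize memcpy() for 1 byte copy": at the source level, the code emitted at a specialized call site behaves roughly like the function below. BOLT performs the transformation on machine instructions; this C++ version is only a behavioral sketch, and `memcpySpecializedFor1` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstring>

// Behaves like memcpy(), but handles the one-byte case inline and falls back
// to the library call for every other size.
void *memcpySpecializedFor1(void *Dst, const void *Src, std::size_t Size) {
  if (Size == 1) {
    // Fast path: a single byte load and store, no call.
    *static_cast<unsigned char *>(Dst) =
        *static_cast<const unsigned char *>(Src);
    return Dst;
  }
  return std::memcpy(Dst, Src, Size);
}
```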
Commit be344c8de7ca8529454fcf9c61b78ce1ce0f3307 by maks
[BOLT] Refactor handling of interproc refs

Summary:
Move handling of interprocedural references to BinaryContext.

Post-process indirect branches immediately after the CFG is built.

This is almost NFC. Since indirect branches are now post-processed
before the profile data is processed, this change interferes with the
way the profile data in YAML format is handled.

(cherry picked from FBD15456003)
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/BinaryContext.cpp
The file was modifiedbolt/src/BinaryContext.h
Commit f57d3c00fcb32859c632057d6a4b4c3807165658 by maks
[BOLT] Better verification of jump tables

Summary:
Run analyzeIndirectBranch() using basic block boundaries instead of
running ad-hoc validation of the jump table assumptions.

(cherry picked from FBD15465034)
The file was modifiedbolt/src/Target/X86/X86MCPlusBuilder.cpp
The file was modifiedbolt/src/BinaryFunction.cpp
Commit e5b1d9cd8c8a78b7db0063c5be7a051387d49421 by maks
[BOLT][NFC] Fix white space

(cherry picked from FBD15485688)
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/Passes/Inliner.h
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
The file was modifiedbolt/src/BinaryData.h
The file was modifiedbolt/src/Passes/ValidateInternalCalls.cpp
The file was modifiedbolt/src/BinaryBasicBlock.h
The file was modifiedbolt/src/BinarySection.cpp
Commit c8038da36e2c2ebc95578d32d9a86e780114b2de by maks
Minor-fix: remove duplicate definition of SPT optimization timer
Summary:

(cherry picked from FBD28111560)
The file was modifiedbolt/src/Passes/FrameAnalysis.cpp
Commit 9ef9a7b1be107fc910e559b3aef3755a3b7221af by maks
[BOLT] Use regex matching for function names passed on command line

Summary:
Options such as `-print-only`, `-skip-funcs`, etc. now take regular
expressions. Internally, the option is converted to '^funcname$' form
prior to regex matching. This ensures that names without special
symbols will match exactly, i.e. "foo" will not match "foo123".

(cherry picked from FBD15551930)
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
The file was modifiedbolt/src/BinaryFunction.h
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/BinaryFunction.cpp
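
Note on "[BOLT] Use regex matching for function names passed on command line": wrapping the option value in '^' and '$' is what keeps plain names matching exactly. A minimal sketch using `std::regex`; `matchesOption` is a hypothetical helper, and BOLT's own regex handling may differ in syntax details.

```cpp
#include <regex>
#include <string>

// Returns true if FunctionName matches the option value once the value has
// been anchored at both ends, as described in the summary.
bool matchesOption(const std::string &OptionValue,
                   const std::string &FunctionName) {
  const std::regex Pattern("^" + OptionValue + "$");
  return std::regex_search(FunctionName, Pattern);
}
```

With the anchors in place, "foo" selects only the function named foo, while a value such as "foo.*" opts in to every name with that prefix.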
Commit 3df2c9ea1f09de82f8455a429ce0c1adbad9304e by maks
Update SDT locations after bolt reordering

Summary: Update SDT locations in the .note section to match the new locations after BOLT reorders the code.

(cherry picked from FBD15427779)
The file was modifiedbolt/src/BinaryFunction.cpp
The file was modifiedbolt/src/RewriteInstance.cpp
The file was modifiedbolt/src/Passes/BinaryPasses.cpp
The file was modifiedbolt/src/BinarySection.h