# **Fast micro-architectural simulation on FPGAs using dynamic binary modification**

Oscar Palomar, John Mawer, Konstantinos Iordanou, Cosmin Gorgovan, Andy Nisbet, Will Toms and Mikel Luján

School of Computer Science, The University of Manchester

{first}.{last}@manchester.ac.uk




Engineering and Physical Sciences Research Council

### 1. New Architecture Challenges

- Advances in run-times, compilers, microarchitecture, and chip fabrication?
  - Delivering small improvements.
- Increased design/implementation complexity?
- Power, performance, fault resilience/wearout issues.
- Heterogeneity in microarchitectures, GPUs, DSPs.
- Emergent workloads (computer vision, big data).
- HW/SW co-design is a potential solution, but it is hard: vertical expertise is needed!

### 2. HW/SW Co-Design

- Creating an extensible co-designed infrastructure which:
  - Efficiently evaluates $\mu$-architecture IP.
  - Evaluates IP on real applications.
  - Directly uses FPGA implementations of IP.
  - Easily supports ISA extensions, custom hardware accelerators and entire processor cores.

### 4. Simulator Design Overview

- MAMBO dynamic instrumentation captures:
  - Load/store events, including push/pop.
  - PC-changing events: branches, calls, returns.
- Events are passed to userspace IP management.
- Transferred to FPGA IP:
  - CPU pipeline model (e.g. Cortex-A7 in-order CPU).
  - Configurable coherent cache memory simulation.
- Performance statistics for the pipeline model and the coherent cache system hierarchy.
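
The simulator flow described in this poster (instrumentation emits load/store and PC-changing events, which drive pipeline and cache models that report statistics) can be illustrated with a small software-only sketch. Everything here is an assumption for illustration: `TraceEvent`, `ToyCache`, `Stats` and `simulate` are hypothetical names, the cache is a toy direct-mapped model rather than the coherent hierarchy, and no MAMBO or FPGA interface is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical event record; the real event format is not given in the poster.
enum class EventKind { Load, Store, Branch, Call, Return };

struct TraceEvent {
    EventKind kind;
    std::uint64_t pc;      // address of the instrumented instruction
    std::uint64_t target;  // data address (load/store) or destination PC
};

// Toy direct-mapped cache standing in for the configurable cache model.
class ToyCache {
public:
    ToyCache(std::size_t num_lines, std::size_t line_bytes)
        : line_bytes_(line_bytes), lines_(num_lines) {
        assert(num_lines > 0 && line_bytes > 0);
    }

    // Returns true on a hit; on a miss the line is installed.
    bool access(std::uint64_t addr) {
        const std::uint64_t block = addr / line_bytes_;
        Line& line = lines_[block % lines_.size()];
        if (line.valid && line.block == block) return true;
        line.valid = true;
        line.block = block;
        return false;
    }

private:
    struct Line {
        bool valid = false;
        std::uint64_t block = 0;
    };
    std::size_t line_bytes_;
    std::vector<Line> lines_;
};

// Counters of the kind a model might report back to the host.
struct Stats {
    std::uint64_t loads = 0, stores = 0, pc_changes = 0;
    std::uint64_t cache_hits = 0, cache_misses = 0;
};

// Consume the event stream and update the cache model and counters.
Stats simulate(const std::vector<TraceEvent>& trace, ToyCache& cache) {
    Stats s;
    for (const TraceEvent& e : trace) {
        switch (e.kind) {
            case EventKind::Load:
            case EventKind::Store:
                if (e.kind == EventKind::Load) ++s.loads; else ++s.stores;
                if (cache.access(e.target)) ++s.cache_hits; else ++s.cache_misses;
                break;
            case EventKind::Branch:
            case EventKind::Call:
            case EventKind::Return:
                ++s.pc_changes;  // a pipeline model would consume these
                break;
        }
    }
    return s;
}
```

In the design outlined here the models run as FPGA IP rather than in software; the sketch only shows the shape of the event stream and of the statistics it produces.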



### 3. MAST Library

A C++ library and hardware interface standard/Bluespec library which:

- Discovers hardware blocks.
- Allows discovery of MAST-compliant IP on an FPGA.
- Allows management of compliant hardware, e.g. locking blocks and reconfiguring the FPGA.
- Provides a userspace interface to hardware.
- Enables IP accelerator blocks to be easily and efficiently integrated into applications.
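
The discovery and locking responsibilities listed above can be pictured with a minimal in-memory sketch. This is not the MAST API, which the poster does not show: `HardwareBlock`, `BlockRegistry`, `discover`, `lock` and `unlock` are hypothetical names, and the registry is filled by hand instead of probing an FPGA.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one compliant block found on the FPGA.
struct HardwareBlock {
    std::string type;  // e.g. "cache_model" (illustrative label)
    unsigned id;       // unique handle assigned at registration
    bool locked;       // true while an application owns the block
};

// Hypothetical registry; a real library would populate it by probing hardware.
class BlockRegistry {
public:
    // Register a block of the given type and return its id.
    unsigned add(const std::string& type) {
        blocks_.push_back(HardwareBlock{type, next_id_, false});
        return next_id_++;
    }

    // Discovery: ids of all blocks of the requested type.
    std::vector<unsigned> discover(const std::string& type) const {
        std::vector<unsigned> ids;
        for (const HardwareBlock& b : blocks_) {
            if (b.type == type) ids.push_back(b.id);
        }
        return ids;
    }

    // Try to take exclusive ownership; false if unknown or already locked.
    bool lock(unsigned id) {
        HardwareBlock* b = find(id);
        if (b == nullptr || b->locked) return false;
        b->locked = true;
        return true;
    }

    // Release a block previously locked by the caller.
    void unlock(unsigned id) {
        HardwareBlock* b = find(id);
        assert(b != nullptr && b->locked);
        b->locked = false;
    }

private:
    HardwareBlock* find(unsigned id) {
        for (HardwareBlock& b : blocks_) {
            if (b.id == id) return &b;
        }
        return nullptr;
    }

    std::vector<HardwareBlock> blocks_;
    unsigned next_id_ = 0;
};
```

An application would typically discover blocks of the type it needs, lock one before use, and unlock it afterwards so that other users of the FPGA can claim it.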





- Largest overall deviation is 1.5%.

| FPGA hardware prototyping | Out-of-order Cortex-A15 |
|---------------------------|-------------------------|
| Branch predictors, Slam Filter, ISA extensions, accelerators | Power, temperature, faults |

### 7. References

[1] C. Gorgovan, A. d'Antras, and M. Luján. MAMBO: A low-overhead dynamic binary modification tool for ARM. ACM Trans. Archit. Code Optim., 13(1):14:1–14:26, April 2016.

[2] J. Mawer, O. Palomar, C. Gorgovan, A. Nisbet, W. Toms, and M. Luján. The potential of dynamic binary modification and CPU-FPGA SoCs for simulation. In 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), April 2017.