

## NeuroSoC Project

Irem Boybat, IBM Research Europe - Zurich NeuroEdge, 18th January 2024








Schweizerische Eidgenossenschaft Confédération suisse Confederazione Svizzera Confederaziun svizra

This work was supported by the European Union (Horizon Europe Grant Agreement n°101070634), the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract numbers SBFI 22.00202 and 23.00205, and UK Research and Innovation (UKRI) under the UK government's Horizon Europe funding guarantee [grant number 10040829].


#### Presentation overview

- NeuroSoC project overview and rationale
- AI at the edge and promise of in-memory computing
- Building blocks of the edge SoC
  - Computational phase-change memory technology
  - Analog in-memory computing tiles based on phase-change memory devices
  - NeuroSoC SoC architecture
  - Algorithms and software tools
  - Application requirements, integration, and use-case demonstrations



#### About NeuroSoC

NeuroSoC stands for: a multiprocessor System-on-Chip with In-Memory neural processing unit.

A 42-month project funded by the EU, UKRI, and Switzerland that uses Phase Change Memory and 28 nm FD-SOI technologies to develop an advanced multiprocessor System-on-Chip.



#### NeuroSoC at a glance

- Call and Topic/Activity: HORIZON-CL4-2021-DIGITAL-EMERGING-01-01 – Ultra-low-power, secure processors for edge computing (RIA)
- GA number: 101070634
- Type of action: RIA (Research & Innovation Action)
- Project cost: 7 952 677 EUR (only beneficiaries)
- Duration: 42 months; start 1 September 2022
- Website: www.neurosoc.eu



# A strong European value chain




## NeuroSoC Rationale

Significant research exists on highly energy-efficient, low-latency non-von Neumann computing paradigms such as in-memory computing (IMC).

Develop a flexible computing system in which an analog IMC-based neural processing unit is integrated into a functionally safe and secure multiprocessor system-on-chip, to meet the requirements of a wide set of edge-AI applications.

Rely on a solid, mature, qualified, and reliable Phase Change Memory technology.

> This will create an industrially proven path that meets the level of maturity needed for mass-volume production at a compatible cost.



#### ST roadmap towards AI at the edge

STM32 portfolio:

| Category | Series | CoreMark | Cores |
|----------|--------|----------|-------|
| MPU | STM32MP1 | | Up to 1 GHz Cortex-A7, 209 MHz Cortex-M4 |
| MPU | STM32MP2 | | 1.5 GHz Cortex-A35, 400 MHz Cortex-M33 |
| High-performance MCUs | STM32F2 | Up to 398 | 120 MHz Cortex-M3 |
| High-performance MCUs | STM32F4 | Up to 608 | 180 MHz Cortex-M4 |
| High-performance MCUs | STM32F7 | 1082 | 216 MHz Cortex-M7 |
| High-performance MCUs | STM32H5 | Up to 1023 | 250 MHz Cortex-M33 |
| High-performance MCUs | STM32H7 | Up to 3224 | Up to 550 MHz Cortex-M7, 240 MHz Cortex-M4 |
| High-performance MCUs | STM32N6 | | MCU with neural processing unit |
| Mainstream MCUs | STM32C0 | 114 | 48 MHz Cortex-M0+ |
| Mainstream MCUs | STM32F0 | 106 | 48 MHz Cortex-M0 |
| Mainstream MCUs | STM32G0 | 142 | 64 MHz Cortex-M0+ |
| Mainstream MCUs | STM32F1 | 177 | 72 MHz Cortex-M3 |
| Mainstream MCUs (mixed-signal) | STM32F3 | 245 | 72 MHz Cortex-M4 |
| Mainstream MCUs (mixed-signal) | STM32G4 | 569 | 170 MHz Cortex-M4 |
| Ultra-low-power MCUs | STM32L0 | 75 | 32 MHz Cortex-M0+ |
| Ultra-low-power MCUs | STM32L4 | 273 | 80 MHz Cortex-M4 |
| Ultra-low-power MCUs | STM32L4+ | 409 | 120 MHz Cortex-M4 |
| Ultra-low-power MCUs | STM32L5 | 443 | 110 MHz Cortex-M33 |
| Ultra-low-power MCUs | STM32U5 | 651 | 160 MHz Cortex-M33 |
| Wireless MCUs | STM32WL | 162 | 48 MHz Cortex-M4, 48 MHz Cortex-M0+ |
| Wireless MCUs | STM32WB0 | | 64 MHz Cortex-M0+ |
| Wireless MCUs | STM32WB | 216 | 64 MHz Cortex-M4, 32 MHz Cortex-M0+ |
| Wireless MCUs | STM32WBA | 407 | 100 MHz Cortex-M33 |

Figure legend: latest product generation; radio co-processor only; new series introduced in 2023; pre-announcement.

STM32N6: an upcoming general-purpose microcontroller with the ST Neural-Art Accelerator™, a Neural Processing Unit.



## Evolution of Neural Processing Units (NPU)






#### Analog in-memory computing basics

[Figure: crossbar array with N rows (wordlines) and M columns (bitlines)]
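A minimal sketch of the principle behind a crossbar of N wordlines and M bitlines, assuming each unit cell stores a conductance `G[i][j]` and that input voltages are applied to the wordlines. Each cell then contributes a current `G[i][j] * V[i]` (Ohm's law) and the currents on bitline `j` add up (Kirchhoff's current law), so the array evaluates a matrix-vector product in place. The function name, units, and numeric values are hypothetical and illustrate only the arithmetic, not a specific tile design.

```python
from typing import List


def crossbar_mvm(conductances: List[List[float]], voltages: List[float]) -> List[float]:
    """Return the bitline currents I[j] = sum_i G[i][j] * V[i]."""
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    if len(voltages) != n_rows:
        raise ValueError("one input voltage is needed per wordline")
    currents = [0.0] * n_cols
    for i in range(n_rows):          # loop over wordlines (inputs)
        for j in range(n_cols):      # loop over bitlines (outputs)
            currents[j] += conductances[i][j] * voltages[i]
    return currents


# Hypothetical 2x3 array: N = 2 wordlines, M = 3 bitlines.
G = [[1.0, 2.0, 0.0],
     [0.5, 0.0, 4.0]]   # conductances in microsiemens (illustrative values)
V = [0.2, 0.1]          # wordline voltages in volts (illustrative values)

I = crossbar_mvm(G, V)  # bitline currents in microamperes
print(I)
```

In hardware all multiply-accumulate operations happen simultaneously inside the array; the nested loops here only emulate that result in software.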

## IBM HERMES project chip




- Each of the 64 cores comprises a 256x256 crossbar array of unit cells with peripheral circuitry (4M unit cells in total)
- On-chip local and global digital processing as well as a communication fabric
- Each unit cell comprises four phase-change memory devices (16M PCM devices in total)
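The totals quoted above follow directly from the core count, the array size, and the number of devices per unit cell. A short check of the arithmetic, where "M" is read as 2^20 (the constant names are illustrative):

```python
CORES = 64
ROWS = 256
COLS = 256
DEVICES_PER_UNIT_CELL = 4

unit_cells_per_core = ROWS * COLS                                # 65536
total_unit_cells = CORES * unit_cells_per_core                   # 4194304
total_pcm_devices = total_unit_cells * DEVICES_PER_UNIT_CELL     # 16777216

MEBI = 2 ** 20
print(total_unit_cells // MEBI, "M unit cells")     # 4 M unit cells
print(total_pcm_devices // MEBI, "M PCM devices")   # 16 M PCM devices
```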



#### IBM Heterogeneous architecture with 2D-mesh



- A heterogeneous architecture that combines AIMC compute cores with special-function compute cores for auxiliary digital computation
- A dense and efficient circuit-switched 2D mesh serves as the communication fabric




#### IBM AI HW Kit

Rasch et al., Proc. AICAS (2021); Le Gallo et al., APL Machine Learning (2023)

https://github.com/IBM/aihwkit

#### Overview

- Simulator that focuses on the algorithmic level and on algorithmic advances in analog in-memory computing (AIMC)
- AIMC training and inference simulations
- Bring your own models and datasets to evaluate the impact of emerging AIMC hardware on your DL workloads using the flexibility of PyTorch



#### Roadmap

- Additional neural network layers
- Algorithmic advances to improve training and inference accuracy
- Premium hardware demonstrations

Real hardware demonstrations

- The IBM Analog HW Acceleration Kit is an excellent tool for developing and testing algorithms for hardware-aware training
- Equipped with an inference simulator with drift and statistical (programming) noise models calibrated on hardware; direct HW access will be enabled in the near future
- Full GPU support and substantial online documentation
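To illustrate what drift and programming-noise models capture, here is a minimal sketch using a generic power-law drift, G(t) = G(t0) · (t/t0)^(−ν), and additive Gaussian programming error. This is not the aihwkit API and not its hardware-calibrated models; the function names, the drift exponent, and the noise level are assumptions chosen for illustration only.

```python
import random


def program_conductance(target: float, sigma_prog: float, rng: random.Random) -> float:
    """Programmed value = target plus Gaussian error, clipped at zero."""
    return max(0.0, target + rng.gauss(0.0, sigma_prog))


def drifted_conductance(g_t0: float, t: float, t0: float = 1.0, nu: float = 0.05) -> float:
    """Generic power-law drift: G(t) = G(t0) * (t / t0) ** (-nu), valid for t >= t0."""
    if t < t0:
        raise ValueError("t must not be earlier than the reference time t0")
    return g_t0 * (t / t0) ** (-nu)


rng = random.Random(0)                      # fixed seed for repeatability
targets = [5.0, 10.0, 20.0]                 # target conductances (illustrative units)
programmed = [program_conductance(g, 0.5, rng) for g in targets]
after_one_day = [drifted_conductance(g, t=86400.0) for g in programmed]

for target, g0, g1 in zip(targets, programmed, after_one_day):
    print(f"target={target:.2f} programmed={g0:.3f} after 1 day={g1:.3f}")
```

Evaluating a network with weights perturbed in this way gives a first impression of how accuracy may change over time after programming.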


## Focus on the PCM memory

- Characterization and modelling of a Phase Change Memory (PCM) device developed by ST-I in FD-SOI 28nm technology as building block of the In-Memory Computing (IMC) tile
- Optimization against temporal drift and noise
- Statistical evaluation of programming algorithms, current distributions, and reliability of the analog IMC PCM cell
- Characterization of the computational precision and compensation (drift/read noise/temperature dependence)
- Development of the analog IMC tile
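One commonly described form of drift compensation is a global output scaling: a calibration input is applied right after programming and again later, and the ratio of the two summed currents rescales subsequent outputs. The sketch below assumes every device drifts with the same power-law exponent, in which case the correction is exact; with device-to-device variation it is only approximate. The names and numbers are hypothetical and do not describe the project's actual compensation scheme.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Column current for conductances a and input voltages b."""
    return sum(x * y for x, y in zip(a, b))


def compensation_factor(reference_current: float, current_now: float) -> float:
    """Ratio between the calibration current at programming time and now."""
    if current_now <= 0.0:
        raise ValueError("calibration current must be positive")
    return reference_current / current_now


NU = 0.05    # assumed drift exponent, identical for all devices
T = 1000.0   # elapsed time relative to the reference time t0 = 1

g_programmed = [4.0, 8.0, 12.0]                      # one column (illustrative)
g_drifted = [g * T ** (-NU) for g in g_programmed]   # uniform power-law drift

v_cal = [0.1, 0.1, 0.1]                              # calibration input
alpha = compensation_factor(dot(g_programmed, v_cal), dot(g_drifted, v_cal))

v_in = [0.1, 0.2, 0.3]                               # actual input
ideal = dot(g_programmed, v_in)
raw = dot(g_drifted, v_in)
corrected = alpha * raw

print(f"ideal={ideal:.4f} raw={raw:.4f} corrected={corrected:.4f}")
```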



#### Phase-change memory



[Figure: amorphous phase (disordered, high resistance); commonly used phase change materials]

Wuttig & Yamada, Nature Materials, 2007; Le Gallo et al., J. Phys. D, 2020

- A nanometric volume of phase change material between two electrodes
- A reversible phase transition between the crystalline (SET) and amorphous (RESET) phases is induced via Joule heating
- Continuum of conductance levels can be achieved via intermediate phase configurations



### ST High Density Embedded PCM Cell in 28nm FDSOI



## NeuroSoC SoC Architecture



The NeuroSoC System-on-Chip system-level architecture comprises:

- Cluster of PCM analog in-memory computing tiles
- Non-volatile memory and SRAM memory support
- Functionally safe host processor
- Specialized digital processing units
- RISC-V co-processor

### IMC PCM tile

- Leveraging the multilevel PCM device to design an analog IMC tile
  - Definition of the unit cell and a suitable array structure
  - Design of the associated digital and analog circuits
  - Anticipate inputs from the security analysis to make the resulting IMC tile more robust against side-channel attacks





## RISC-V co-processor features

- Complements the computing capabilities of the analog in-memory computing (AIMC) tiles and other specialized digital processing units (DPUs) present in the IMNPU
- Handles Deep Neural Network (DNN) workloads that must be executed at a higher dynamic range for accuracy reasons, exploiting floating-point arithmetic
- Supports various activation functions (ReLU, sigmoid, tanh) and complex layers such as upsampling, depth-wise, and softmax
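For reference, the activation functions named above can be written in floating point as follows. This is a plain Python sketch of the mathematical definitions, with numerically stable forms of sigmoid and softmax; it is not the co-processor firmware, and the function names are illustrative.

```python
import math
from typing import List


def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Logistic function, split by sign to avoid overflow in exp()."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x: float) -> float:
    """Hyperbolic tangent."""
    return math.tanh(x)


def softmax(xs: List[float]) -> List[float]:
    """Softmax with the maximum subtracted first for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


print(relu(-1.5), sigmoid(0.0), tanh(0.0))
print(softmax([1.0, 2.0, 3.0]))
```

Softmax in particular involves exponentials and a normalization across values, which is one reason such operations benefit from a wider dynamic range than the analog tiles provide.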





#### NeuroSoC Toolchain

[Figure: NeuroSoC toolchain. Inputs: existing trained deep learning models; optional inputs: training data and hardware constraints. Flow: ONNX conversion; generation of the emulation model, source code, and hardware configuration; execution on the emulator platform or on IMNPU silicon.]

#### Software toolchain

- From high level network description
- Converted into an intermediate format
- Optimized for the specific platform (reconfigurable)
- Performance and functional emulation
- Execution on hardware once available


# Application requirements, integration, and use-case demonstrations

- Investigate edge applications where NeuroSoC can offer a compelling advantage.
  - Selection and qualification of applications.
  - Benchmarking of state-of-the-art (SoA) and emerging solutions.
  - Proposition of an evaluation framework.
    - Toolchains, accuracy, power, size, throughput
  - Assessment of performance vs requirements.



[Figure: benchmark chart with an axis in watts (W), ticks from 0.001 to 0.1, comparing edge devices: ERGO, Maxim Integrated MAX7800, NDP2, STMicroelectronics STM32, Kneron KL520 NPU, Greenwaves GAP9, NeuroSoC, RK358, Kneron KL720 NPU, NXP i.MX 8M Plus, Xilinx Kria 260]



### Contact details

#### STMicroelectronics

- Mr Giulio Urlini, Project Coordinator
- giulio[dot]urlini[at]stmicroelectronics[dot]com

#### Benkei

- Mrs Fabienne Brutin, Project administrative manager
- fabienne[at]benkei[dot]fr


## Acknowledgments

The NeuroSoC project is supported by the European Union (Horizon Europe Grant Agreement n°101070634), the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract numbers SBFI 22.00202 and 23.00205, and UK Research and Innovation (UKRI) under the UK government's Horizon Europe funding guarantee [grant number 10040829].



Funded by the European Union



Schweizerische Eidgenossenschaft Confédération suisse Confederazione Svizzera Confederaziun svizra

Stay tuned: www.neurosoc.eu, https://www.linkedin.com/company/neurosoc