[TETC] Near-Memory Computing with Compressed Embedding Table for Personalized Recommendation


Jeongmin Lim, Young Geun Kim, Sung Woo Chung, Farinaz Koushanfar, and Joonho Kong, "Near-Memory Computing with Compressed Embedding Table for Personalized Recommendation", IEEE Transactions on Emerging Topics in Computing, Accepted.

 

Abstract

Deep learning (DL)-based recommendation models play an important role in many real-world applications. However, an embedding layer, a key component of DL-based recommendation models, requires sparse memory accesses to a very large memory space followed by pooling operations (i.e., reduction operations). This forces the system to overprovision memory capacity for model deployment. Moreover, with conventional CPU-based architectures, it is difficult to exploit locality, imposing a huge burden on data transfer between the CPU and memory. To resolve this problem, we propose an embedding vector element quantization and compression method to reduce the memory footprint (capacity) required by the embedding tables. In addition, to reduce the amount of data transfer and memory accesses, we propose near-memory acceleration hardware with an SRAM buffer that stores frequently accessed embedding vectors. Our quantization and compression method results in compression ratios of 3.95–4.14 for embedding tables in widely used datasets while negligibly affecting inference accuracy. Our acceleration technique with 3D stacked DRAM memories, which facilitates near-memory processing in the logic die with high DRAM bandwidth, leads to a 4.9×–5.4× embedding layer speedup compared to 8-core CPU-based execution while reducing memory energy consumption by 5.9×–12.1× on average.
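For readers unfamiliar with the bottleneck the abstract describes, the sketch below shows the embedding lookup-plus-pooling pattern in plain NumPy: a handful of rows are gathered from a huge table at near-random offsets and immediately reduced. All sizes, names, and the sum-pooling choice here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Hypothetical sizes for illustration; production recommendation models
# use tables with millions of rows, which is what drives the memory pressure.
NUM_ROWS, EMB_DIM, POOL_SIZE = 1_000_000, 64, 40

rng = np.random.default_rng(0)
table = rng.standard_normal((NUM_ROWS, EMB_DIM), dtype=np.float32)

def embedding_pooling(table: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Gather sparse rows from a large table, then reduce (sum-pool) them.

    Each gather touches a near-random DRAM location, so a CPU cache sees
    little reuse; the small pooled vector is all the model actually needs,
    which is why computing the reduction near memory saves data transfer.
    """
    return table[indices].sum(axis=0)

indices = rng.integers(0, NUM_ROWS, size=POOL_SIZE)
pooled = embedding_pooling(table, indices)  # shape: (EMB_DIM,)
print(pooled.shape)
```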
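The abstract does not spell out the quantization and compression scheme, so the following is only a minimal sketch of the general idea using per-row uniform min-max quantization to 8-bit codes; the function names and the int8 format are assumptions for illustration, not the authors' method.

```python
import numpy as np

def quantize_rows(table: np.ndarray, bits: int = 8):
    """Per-row uniform min-max quantization of a float32 embedding table.

    Storing uint8 codes plus a per-row (scale, offset) pair gives about
    3.6x compression for 64-dim rows, in the same ballpark as the
    3.95-4.14 ratios the paper reports (its exact scheme differs).
    """
    lo = table.min(axis=1, keepdims=True)
    hi = table.max(axis=1, keepdims=True)
    scale = (hi - lo) / (2**bits - 1)
    scale[scale == 0] = 1.0  # guard against constant rows
    codes = np.round((table - lo) / scale).astype(np.uint8)
    return codes, scale.astype(np.float32), lo.astype(np.float32)

def dequantize_rows(codes, scale, lo):
    """Reconstruct approximate float32 rows before pooling."""
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
table = rng.standard_normal((1000, 64), dtype=np.float32)
codes, scale, lo = quantize_rows(table)
err = np.abs(dequantize_rows(codes, scale, lo) - table).max()
print(f"max reconstruction error: {err:.4f}")
```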
