Monday, June 23 – Wednesday, June 25: Main Program
Coming Soon!
List of Accepted Papers
AIM: Software and Hardware Co-design for Architecture-level IR-drop Mitigation in High-performance PIM
Authors: Yuanpeng Zhang (Peking University); Xing Hu (houmo.ai); Xi Chen (Southeast University); Zhihang Yuan (houmo.ai); Cong Li, Jingchen Zhu, Zhao Wang, Chenguang Zhang (Peking University); Xin Si (Southeast University); Wei Gao, Qiang Wu (houmo.ai); Runsheng Wang, Guangyu Sun (Peking University)
Area Bloating and the Future of Specialization
Authors: Qixuan Yu, David Wentzlaff (Princeton University)
MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization
Authors: Akshat Ramachandran (Georgia Institute of Technology); Souvik Kundu (Intel Labs); Tushar Krishna (Georgia Institute of Technology)
Avalanche: Optimizing Cache Utilization via Matrix Reordering for Sparse Matrix Multiplication Accelerator
Authors: Gwangeun Byeon, Seongwook Kim, Hyungjin Kim, Sukhyun Han (Sungkyunkwan University); Jinkwon Kim (KAIST); Prashant Nair (University of British Columbia); Taewook Kang, Seokin Hong (Sungkyunkwan University)
Synchronization for Fault-Tolerant Quantum Computers
Authors: Satvik Maurya, Swamit Tannu (University of Wisconsin-Madison)
Efficient and Scalable Quantum Circuit Simulator using Computational Reuse
Authors: Meng Wang (University of British Columbia); Swamit Tannu (U. of Wisconsin); Prashant J Nair (The University of British Columbia (UBC))
Light-weight Cache Replacement for Instruction Heavy Workloads
Authors: Daniel A. Jimenez (Texas A&M University and BSC and ARM); Setu Gupta, Ahmad Hassani, Saba Mostofi, Paul Gratz (Texas A&M University); Elvira Teran (Texas A&M International University); Krishnam Tibrewala (Texas A&M University and AMD)
Qplacer: Frequency-Aware Component Placement for Superconducting Quantum Computers
Authors: Junyao Zhang (Duke University); Hanrui Wang (University of California, Los Angeles); Qi Ding (Massachusetts Institute of Technology); Jiaqi Gu (Arizona State University); Reouven Assouly, William D. Oliver, Song Han (Massachusetts Institute of Technology); Kenneth R. Brown, Hai "Helen" Li, Yiran Chen (Duke University)
The XOR Cache: A Catalyst for Compression
Authors: Zhewen Pan, Joshua San Miguel (University of Wisconsin-Madison)
SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting
Authors: Jiaming Xu, Jiayi Pan, Yongkang Zhou, Siming Chen, Jinhao Li, Yaoxiu Lian, Junyi Wu (Shanghai Jiao Tong University); Guohao Dai (Shanghai Jiao Tong University; Infinigence-AI)
MoPAC: Efficiently Mitigating Rowhammer with Probabilistic Activation Counting
Authors: Suhas Vittal (Georgia Tech); Salman Qazi (Google); Poulami Das (UT Austin); Moin Qureshi (Georgia Tech)
Enabling Ahead Prediction with Practical Energy Constraints
Authors: Chester (Lingzhe) Cai, Aniket Deshmukh, Yale Patt (UT Austin)
WindServe: Efficient Phase-Disaggregated LLM Serving with Stream-based Dynamic Scheduling
Authors: Jingqi Feng, Yukai Huang, Rui Zhang, Sicheng Liang, Ming Yan, Jie Wu (Fudan University)
ANVIL: An In-Storage Accelerator for Name-Value Data Stores
Authors: Ryan Wong (University of Illinois Urbana-Champaign); Nikita Kim (Carnegie Mellon University/NVIDIA); Aniket Das, Kevin Higgs (University of Illinois Urbana-Champaign); Engin Ipek (Samsung); Sapan Agarwal (Sandia National Laboratories); Saugata Ghose (University of Illinois Urbana-Champaign); Ben Feinberg (Sandia National Laboratories)
ArtMem: Adaptive Migration in Reinforcement Learning-Enabled Tiered Memory
Authors: Xinyue Yi (Xiamen University); Hongchao Du (City University of Hong Kong); Yu Wang (Xiamen University); Jie Zhang (Peking University); Qiao Li (Xiamen University); Chun Jason Xue (Mohamed bin Zayed University of Artificial Intelligence)
Topology-Aware Virtualization over Inter-Core Connected Neural Processing Units
Authors: Dahu Feng (Tsinghua University); Erhu Feng, Dong Du (Shanghai Jiao Tong University); Pinjie Xu (SenseTime Research); Yubin Xia, Haibo Chen (Shanghai Jiao Tong University); Rong Zhao (Tsinghua University)
Lumina: Real-Time Neural Rendering by Exploiting Computational Redundancy
Authors: Yu Feng (Shanghai Jiao Tong University); Weikai Lin (University of Rochester); Yuge Cheng, Zihan Liu, Jingwen Leng (Shanghai Jiao Tong University); Minyi Guo (Shanghai Jiaotong University); Chen Chen, Shixuan Sun (Shanghai Jiao Tong University); Yuhao Zhu (University of Rochester)
Assassyn: A Unified Abstraction for Architectural Simulation and Implementation
Authors: Jian Weng (KAUST); Boyang Han (CUHK); Derui Gao (KAUST); Ruijie Gao (University of Glasgow); Wanning Zhang (Tsinghua University); An Zhong (Jilin University); Ceyu Xu (Duke University); Jihao Xin (King Abdullah University of Science and Technology); Yangzhixin Luo (KAUST); Lisa Wu Wills (Duke University); Marco Canini (KAUST)
USPS: Universal Predicate Pushdown to Smart Storage
Authors: Ipoom Jeong (Yonsei University); Jinghan Huang, Chuxuan Hu, Dohyun Park, Jaeyoung Kang, Nam Sung Kim, Yongjoo Park (UIUC)
LightNobel: Overcoming Sequence Length Limitation in Protein Structure Prediction Model via Adaptive Activation Quantization
Authors: Seunghee Han, Soongyu Choi, Joo-Young Kim (KAIST)
When Mitigations Backfire: Timing Channel Attacks and Defense for PRAC-Based Rowhammer Mitigations
Authors: Jeonghyun Woo (The University of British Columbia); Joyce Qu, Gururaj Saileshwar (University of Toronto); Prashant Nair (The University of British Columbia)
H²-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference
Authors: Cong Li (Peking University); Yihan Yin (Peking University); Xintong Wu, Jingchen Zhu (Peking University); Zhutianya Gao (Shanghai Jiao Tong University); Dimin Niu (Alibaba Group Inc.); Qiang Wu (Houmo AI); Xin Si (SouthEast University); Yuan Xie (HKUST); Chen Zhang (Shanghai Jiao Tong University); Guangyu Sun (Peking University)
The Sparsity-Aware LazyGPU Architecture
Authors: Changxi Liu, Yu Miao (National University of Singapore); Yifan Sun (William & Mary); Trevor E. Carlson (National University of Singapore)
Evaluating Ruche Networks: Physically Scalable, Cost-Effective, Bandwidth-Flexible NoCs
Authors: Dai Cheol Jung, Michael Taylor (University of Washington)
Heliostat: Harnessing Ray Tracing Accelerators for Page Table Walks
Authors: Yuan Feng, Yuke Li (University of California, Merced); Jiwon Lee (Samsung Electronics); Won Woo Ro (Yonsei University); Hyeran Jeon (University of California, Merced)
Chip Architectures Under Advanced Computing Sanctions
Authors: August Ning, David Wentzlaff (Princeton University)
ANSMET: Approximate Nearest Neighbor Search with Near-Memory Processing and Hybrid Early Termination
Authors: Yiwei Li, Yuxin Jin, Boyu Tian, Huanchen Zhang, Mingyu Gao (Tsinghua University)
Neoscope: How Resilient Is My SoC to Workload Churn?
Authors: Joseph Rogers (Norwegian University of Science and Technology); Lieven Eeckhout (Ghent University); Taha Soliman (Bosch GmbH); Magnus Jahre (Norwegian University of Science and Technology)
DX100: Programmable Data Access Accelerator for Indirection
Authors: Alireza Khadem (University of Michigan); Kamalavasan Kamalakkannan (Los Alamos National Lab); Zhenyan Zhu, Akash Poptani, Yufeng Gu (University of Michigan); Jered Benjamin Dominguez-Trujillo (Los Alamos National Lab); Nishil Talati (University of Michigan/AMD Research); Daichi Fujiki (Institute of Science Tokyo); Scott Mahlke (University of Michigan/Nvidia Research); Galen Shipman (Los Alamos National Lab); Reetuparna Das (University of Michigan)
Cramming a data center into One Cabinet, a Co-Exploration of Computing and Hardware Architecture of Waferscale-Chip
Authors: Xingmao Yu, Dingcheng Jiang, Jinyi Deng (Tsinghua University); Jingyao Liu (Tsinghua); Yang Hu (Tsinghua University); Chao Li (SJTU); Shouyi Yin (Tsinghua)
DiTile-DGNN: An Efficient Accelerator for Distributed Dynamic Graph Neural Network Inference
Authors: Jiaqi Yang (The George Washington University); Hao Zheng (University of Central Florida); Ahmed Louri (George Washington U.)
NeuSET: An Accelerator for Neural Scene Representation with Sparse Encoding Table
Authors: Tianbo Liu (University of Science and Technology of China); Xinkai Song (Institute of Computing Technology, Chinese Academy of Sciences); Zhifei Yue (University of Science and Technology of China); Rui Wen, Xing Hu (Institute of Computing Technology, Chinese Academy of Sciences); Zhuoran Song (Shanghai Jiao Tong University); Yuanbo Wen (Institute of Computing Technology, Chinese Academy of Sciences); Yifan Hao (ICT, Chinese Academy of Sciences); Wei Li, Zidong Du, Rui Zhang, Jiaming Guo (Institute of Computing Technology, Chinese Academy of Sciences); Di Huang (Chinese Academy of Sciences, Institute of Computing Technology); Shaohui Peng (Institute of Software, CAS); GuangZhong Sun (University of Science and Technology of China); Qi Guo (Institute of Computing Technology, Chinese Academy of Sciences); Tianshi Chen (Cambricon Technologies, Beijing, China)
Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
Authors: Minsu Kim (KAIST); Seongmin Hong (HyperAccel); RyeoWook Ko, Soongyu Choi (KAIST); Hunjong Lee, Junsoo Kim, Joo-Young Kim (HyperAccel); Jongse Park (KAIST)
QR-Map: A Map-Based Approach to Quantum Circuit Abstraction for Qubit Reuse Optimization
Authors: Hyungseok Kim, Enhyeok Jang, Seungwoo Choi, Youngmin Kim, Won Woo Ro (Yonsei University)
Need for zkSpeed: Accelerating HyperPlonk for Zero-Knowledge Proofs
Authors: Alhad Daftardar, Jianqiao Mo, Joey Ah-kiow, Benedikt Bunz, Ramesh Karri, Siddharth Garg, Brandon Reagen (New York University)
SWIPER: Minimizing Fault-Tolerant Quantum Program Latency via Speculative Window Decoding
Authors: Joshua Viszlai, Jason Chadwick, Sarang Joshi (University of Chicago); Gokul Ravi (University of Michigan); Yanjing Li, Fred Chong (University of Chicago)
Fair-CO2: Fair Attribution for Cloud Carbon Emissions
Authors: Leo Han (Cornell); Jash Kakadia, Benjamin C. Lee (University of Pennsylvania); Udit Gupta (Cornell)
Ecco: Improving Memory Bandwidth and Capacity for LLMs via Entropy-Aware Cache Compression
Authors: Feng Cheng, Cong Guo, Chiyue Wei, Junyao Zhang, Changchun Zhou (Duke University); Edward Hanson, Jiaqi Zhang, Xiaoxiao Liu (AMD); Hai "Helen" Li, Yiran Chen (Duke University)
Chimera: Communication Fusion for Hybrid Parallelism in Large Language Models
Authors: Le Qin (Hong Kong University of Science and Technology (Guangzhou)); Junwei Cui, Weilin Cai, Jiayi Huang (The Hong Kong University of Science and Technology (Guangzhou))
Dynamic Load Balancer in Intel Xeon Scalable Processor: Performance Analyses, Enhancements, and Guidelines
Authors: Jiaqi Lou, Srikar Vanavasam (UIUC); Yifan Yuan (Meta); Ren Wang (Intel Labs); Nam Sung Kim (UIUC)
Single Spike ANNS for Energy Efficient Inference at the Edge
Authors: Rhys Gretsch, Michael Beyeler, Jeremy Lau, Timothy Sherwood (UC Santa Barbara)
XHarvest: Rethinking High-Performance and Cost-Efficient SSD Architecture with CXL-Driven Harvesting
Authors: Li Peng (Peking University); Wenbo Wu (Huazhong University of Science and Technology); Shushu Yi (Peking University); Xianzhang Chen (Chongqing University); Chenxi Wang, Shengwen Liang (Institute of Computing Technology, Chinese Academy of Sciences); Zhe Wang (Institute of Computing Technology, CAS); Nong Xiao (Sun Yat-sen University); Qiao Li (Xiamen University); Mingzhe Zhang (Institute of Information Engineering, Chinese Academy of Sciences); Jie Zhang (Peking University)
Garibaldi: A Pairwise Instruction-Data Management Scheme for Enhancing Shared Last-Level Cache Performance in Server Workloads
Authors: Jaewon Kwon, Yongju Lee, Jiwan Kim, Enhyeok Jang, Hongju Kal, Won Woo Ro (Yonsei University)
PuDHammer: Experimental Analysis of Read Disturbance Effects of Processing-using-DRAM in Real DRAM Chips
Authors: Ismail Emir Yuksel, Akash Sood, Ataberk Olgun (ETH Zurich); Oğuzhan Canpolat (TOBB ETU and ETH Zurich); Haocong Luo, Nisa Bostanci (ETH Zurich); Mohammad Sadrosadati (ETH Zurich); Giray Yaglikci, Onur Mutlu (ETH Zurich)
FATE: Boosting the Performance of Hyper-Dimensional Computing Intelligence with Flexible Numerical DAta TypE
Authors: Haomin Li, Fangxin Liu (Shanghai Jiao Tong University); Yichi Chen (Tianjin University); Zongwu Wang, Shiyuan Huang, Ning Yang (Shanghai Jiao Tong University); Dongxu Lyu, Li Jiang (Shanghai Jiaotong University)
In-Storage Acceleration of Retrieval Augmented Generation as a Service
Authors: Rohan Mahapatra, Harsha Santhanam, Christopher Priebe, Hanyang Xu, Hadi S. Esmaeilzadeh (UCSD)
Finesse: An Agile Design Framework for Pairing-based Cryptography via Software/Hardware Co-Design
Authors: Tianwei Pan, Tianao Dai, Jianlei Yang, Hongbin Jing, Yang Su, Zeyu Hao, Xiaotao Jia, Chunming Hu, Weisheng Zhao (Beihang University)
SEAL: A Single-Event Architecture for In-Sensor Visual Localization
Authors: Ryan Hou (University of Wisconsin-Madison; University of Michigan); Thomas Twomey (University of Michigan); Vasileios Milionis (University of Patras); Evangelos Dikopoulos (University of Michigan); Tianrui Ma (Chinese Academy of Sciences); Yuhao Zhu (University of Rochester); Georgios Tzimpragos (University of Wisconsin-Madison)
Hardware-aware Calibration Protocol for Quantum Computers
Authors: Yuchen Zhu (Rensselaer Polytechnic Institute); Jinglei Cheng (University of Pittsburgh); Boxi Li (Forschungszentrum Julich); Kecheng Liu, Yidong Zhou (Rensselaer Polytechnic Institute); Hanrui Wang (UCLA); Yufei Ding (UCSD); Zhiding Liang (Rensselaer Polytechnic Institute)
Concorde: Fast and Accurate CPU Performance Modeling with Compositional Analytical-ML Fusion
Authors: Arash Nasr-Esfahany, Mohammad Alizadeh (MIT & Google); Victor Lee, Hanna Alam, Brett W. Coon, David Culler, Vidushi Dadu, Martin Dixon (Google); Henry M. Levy (Google & University of Washington); Santosh Pandey (Rutgers University); Parthasarathy Ranganathan (Google); Amir Yazdanbakhsh (Google DeepMind)
Rethinking Prefetching for Intermittent Computing
Authors: Gan Fang (Purdue University); Jianping Zeng (Samsung Electronics); Aditya Gupta, Changhee Jung (Purdue University)
End-to-End Analysis of Low-Overhead Transversal Architectures for Reconfigurable Atom Arrays
Authors: Hengyun Zhou, Casey Duckering, Chen Zhao (QuEra Computing); Dolev Bluvstein, Madelyn Cain (Harvard University); Aleksander Kubica (Yale University); Sheng-Tao Wang (QuEra Computing); Mikhail D. Lukin (Harvard University)
NetCrafter: Tailoring Network Traffic for Non-Uniform Bandwidth Multi-GPU Systems
Authors: Amel Fatima, Yang Yang (University of Virginia); Yifan Sun (William & Mary); Rachata Ausavarungnirun (MangoBoost Inc.); Adwait Jog (University of Virginia)
OptiPIM: Optimizing Processing In-Memory Acceleration Using Integer Linear Programming
Authors: Jiantao Liu (ETH Zurich); Minxuan Zhou (Illinois Institute of Technology); Yue Pan (University of California, San Diego); Chien-Yi Yang (University of California San Diego); Lana Josipovic (ETH Zurich); Tajana Rosing (UCSD)
CaliQEC: In-situ Qubit Calibration for Surface Code Quantum Error Correction
Authors: Xiang Fang, Keyi Yin, Yuchen Zhu, Jixuan Ruan, Dean Tullsen (University of California San Diego); Zhiding Liang (Rensselaer Polytechnic Institute); Andrew Sornborger (LANL); Ang Li (PNNL); Travis Humble (Quantum Science Center, Oak Ridge National Laboratory); Yufei Ding (University of California San Diego); Yunong Shi (AWS Quantum Technologies)
Transitive Array: An Efficient GEMM Accelerator with Result Reuse
Authors: Cong Guo, Chiyue Wei (Duke University); Jiaming Tang (MIT); Bowen Duan (Duke University); Song Han (MIT); Hai "Helen" Li (Duke University); Yiran Chen (Duke)
HYTE: Flexible Tiling for Sparse Accelerators via Hybrid Static-Dynamic Approaches
Authors: Xintong Li, Zhiyao Li, Mingyu Gao (Tsinghua University)
PD Constraint-aware Physical/Logical Topology Co-Design for Network on Wafer
Authors: Qize Yang, Taiquan Wei, Sihan Guan, Chengran Li, Haoran Shang, Jinyi Deng, Huizheng Wang (Tsinghua University); Chao Li (SJTU); Lei Wang, Yan Zhang (Shanghai Artificial Intelligence Laboratory); Shouyi Yin, Yang Hu (Tsinghua University)
Distributed Quantum Computing in Quantum Data Center with Reconfigurable Fabric
Authors: Hezi Zhang, Yiran Xu, Haotian Hu, Keyi Yin (University of California San Diego); Hassan Shapourian (Cisco); Jiapeng Zhao (Cisco Quantum Lab); Ramana Rao Kompella (Cisco Systems); Reza Nejabati (Cisco Quantum Lab); Yufei Ding (UCSD)
Forest: Access-aware GPU UVM Management
Authors: Mao Lin, Yuan Feng (University of California, Merced); Guilherme Cox (NVIDIA); Hyeran Jeon (UC-Merced)
Cassandra: Efficient Enforcement of Sequential Execution for Cryptographic Programs
Authors: Ali Hajiabadi (ETH Zurich); Trevor E. Carlson (National University of Singapore)
WarmCache: Exploiting STT-RAM Cache for Low-Power Intermittent Systems
Authors: Noureldin Hassan (University of Central Florida); Byounguk Min, Changhee Jung (Purdue University); Yan Solihin (U. of Central Florida); Jongouk Choi (University of Central Florida)
Debunking the CUDA Myth Towards GPU-centric AI Systems
Authors: Yunjae Lee, Juntaek Lim, Jehyeon Bang, Eunyeong Cho (KAIST); Huijong Jeong, Taesu Kim, Hyungjun Kim (SqueezeBits); Joonhyung Lee (NAVER); Jinseop Im, Ranggi Hwang (KAIST); Se Jung Kwon, Dongsoo Lee (NAVER); Minsoo Rhu (KAIST)
Hybe: GPU-NPU Hybrid System for Efficient LLM Inference with Million-Token Context Window
Authors: Seungjae Moon, Junseo Cha, Hyunjun Park, Joo-Young Kim (HyperAccel)
MD-pipe: A Strong Scaling Enhanced Pipeline Architecture for Ab Initio Accuracy Molecular Dynamics
Authors: Ning Kang, Guojun Yuan (Institute of Computing Technology, Chinese Academy of Sciences); Zihan Yan, Beining Zhang, Boyang Li, Zeyu Li (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences); Shuo Wang (Southwest University); Guanglei Chen (Institute of Computing Technology, Chinese Academy of Sciences); Jiayi Rao (University of Chinese Academy of Sciences); Zhan Wang, Weile Jia, Ninghui Sun, Guangming Tan (Institute of Computing Technology, Chinese Academy of Sciences)
LightML: A Photonic Accelerator for Efficient General Purpose Machine Learning
Authors: Liang Liu, Sadra Rahimi Kari (University of Pittsburgh); Xin Xin (U. of Central Florida); Nathan Youngblood, Youtao Zhang (University of Pittsburgh); Jun Yang (U. of Pittsburgh)
Magellan: A High-Performance Loop-Guided Prefetcher for Indirect Memory Access
Authors: Gelin Fu, Tian Xia, Mingzhuo Yu (Xi'an Jiaotong University); Prashant Nair, Mieszko Lis (The University of British Columbia); Pengju Ren (Xi'an Jiaotong University)
A Wafer-scale Fabric for 3D Parallel DNN Training
Authors: Saeed Rashidi (Meta); William Won (Georgia Institute of Technology); Sudarshan Srinivasan (Intel); Puneet Gupta (UCLA); Tushar Krishna (Georgia Tech)
NMP-PaK: Near-Memory Processing Acceleration of Scalable De Novo Genome Assembly
Authors: Heewoo Kim (University of Colorado, Boulder); Sanjay Sri Vallabh Singapuram (University of Michigan); Haojie Ye (NVIDIA); Joseph Izraelevitz (University of Colorado, Boulder); Trevor Mudge, Ronald Dreslinski (University of Michigan); Nishil Talati (University of Michigan/AMD Research)
LUTensor: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference
Authors: Zhiwen Mo (Shanghai Jiao Tong University and Microsoft Research); Lei Wang (Peking University and Microsoft Research); Jianyu Wei (University of Science and Technology of China and Microsoft Research); Zhichen Zeng (University of Washington and Microsoft Research); Shijie Cao, Lingxiao Ma (Microsoft Research); Naifeng Jing (Shanghai Jiao Tong University); Ting Cao, Jilong Xue, Fan Yang, Mao Yang (Microsoft Research)
FAST: An FHE Accelerator for Scalable-parallelism with Tunable-bit
Authors: Shengyu Fan, Xianglong Deng (Institute of Information Engineering, Chinese Academy of Sciences); Liang Kong (Ant Research); Guiming Shi (Tsinghua University); Guang Fan (Ant Research); Rui Hou, Dan Meng (Institute of Information Engineering, Chinese Academy of Sciences); Mingzhe Zhang (Ant Research)
Leveraging control-flow similarity to reduce branch predictor cold effects in microservices
Authors: Haris Volos, Stylianos Vassiliou, Georgia Antoniou (University of Cyprus); Davide Basilio Bartolini (Computing Systems Laboratory, Zurich Research Center, Huawei Technologies, Switzerland); Yiannakis Sazeides (University of Cyprus)
GPUs All Grown-Up: Fully Device-Driven SpMV Using GPU Work Graphs
Authors: Fabian Wildgrube, Pete Ehrett, Paul Trojahn (AMD); Richard Membarth (Technische Hochschule Ingolstadt (THI)); Brad Beckmann, Dominik Baumeister (AMD); Matthaus Chajdas (AMD (Now at Intel))
NUPEA: Optimizing Critical Loads on Spatial Dataflow Architectures via Non-Uniform Processing-Element Access
Authors: Souradip Ghosh, Graham Gobieski, Keyi Zhang, Brandon Lucia, Nathan Beckmann, Tony Nowatzki (Efficient Computer Company)
Adaptive CHERI Compartmentalization for Heterogeneous Accelerators
Authors: Jianyi Cheng (University of Edinburgh); A. Theodore Markettos, Alexandre Joannou, Paul Metzger, Matthew Naylor, Peter Rugg, Timothy M. Jones (University of Cambridge)
Bishop: Sparsified Bundling Spiking Transformers on Heterogeneous Cores with Error-constrained Pruning
Authors: Boxun Xu, Yuxuan Yin, Vikram Iyer, Peng Li (University of California, Santa Barbara)
AMALI: An Analytical Model for Accurately Modeling LLM Inference on Modern GPUs
Authors: Shiheng Cao (USTC); Zhibin Yu (Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences (CAS)); Junshi Chen (University of Science and Technology of China); Hong An, Junmin Wu (USTC)
Genesis: A Compiler for Hamiltonian Simulation on Hybrid CV-DV Quantum Computers
Authors: Zihan Chen, Jiakang Li, Minghao Guo, Henry Chen, Zirui Li (Rutgers University); Joel Bierman (North Carolina State University); Yipeng Huang (Rutgers University); Huiyang Zhou, Yuan Liu (North Carolina State University); Eddy Z. Zhang (Rutgers University)
Unified Memory Protection with Multi-granular MAC and Integrity Tree for Heterogeneous Processors
Authors: Sunho Lee (KAIST); Seonjin Na (Georgia Institute of Technology); Jeongwon Choi, Jinwon Pyo, Jaehyuk Huh (KAIST)
InfiniMind: A Learning-Optimized Large-Scale Brain-Computer Interface
Authors: Yeongwoo Jang, Daye Jung, Seunghyun Song (Seoul National University); Hunjun Lee (Hanyang University); Jangwoo Kim (Seoul National University)
Achieving Fast and Accurate GPU Performance Analyses Using Fine-Grained Stall Cycle Accounting and Interval Analysis
Authors: Hanna Cha, Sungchul Lee, Jounghoo Lee, Yeonan Ha (Yonsei University); Joonsung Kim (Sungkyunkwan University); Youngsok Kim (Yonsei University)
Constant-Rate Entanglement Distillation for Fast Quantum Interconnects
Authors: Christopher Pattison (California Institute of Technology); Gefen Baranes (MIT & Harvard University); Juan Pablo Bonilla Ataides, Mikhail D. Lukin (Harvard University); Hengyun Zhou (QuEra Computing)
Profile-Guided Temporal Prefetching
Authors: Mengming Li, Qijun Zhang (Hong Kong University of Science and Technology); Yichuan Gao (Intel Corporation); Wenji Fang, Yao Lu (Hong Kong University of Science and Technology); Yongqing Ren (Intel Corporation); Zhiyao Xie (Hong Kong University of Science and Technology)
HardHarvest: Hardware-Supported Core Harvesting for Microservices
Authors: Jovan Stojkovic (University of Illinois at Urbana-Champaign); Chunao Liu (Purdue University); Muhammad Shahbaz (Purdue University and University of Michigan); Josep Torrellas (University of Illinois at Urbana-Champaign)
RTSpMSpM: Harnessing Ray Tracing for Efficient Sparse Matrix Computations
Authors: Hongrui Zhang (University of California, Riverside); Yunan Zhang (Google); Hung-Wei Tseng (University of California, Riverside)
AiF: Accelerating On-Device LLM Inference Using In-Flash Processing
Authors: Jaeyong Lee, Hyunjoo Kim, Sanghoon Oh, Myoungjun Chun (Seoul National University); Myungsuk Kim (Kyungpook National University); Jihong Kim (Seoul National University)
Caravan: A Hardware/Software Co-Design for Efficient SIMD Neighbor Search on Point Clouds
Authors: Pedro Henrique Exenberger Becker, Franyell Silfa, Jose Maria Arnau, Antonio Gonzalez (Polytechnic University of Catalonia)
Hermes: Algorithm-System Co-design for Efficient Retrieval Augmented Generation At-Scale
Authors: Michael Shen, Muhammad Umar (Cornell University); Kiwan Maeng (Penn State); G. Edward Suh (NVIDIA, Cornell University); Udit Gupta (Cornell)
MeshFlow: Efficient 2D Tensor Parallelism for Distributed DNN Training
Authors: Hyoungwook Nam, Gerasimos Gerogiannis (University of Illinois at Urbana-Champaign); Josep Torrellas (UIUC)
Telos: A Dataflow Accelerator for Sparse Triangular Solver of Partial Differential Equations
Authors: Xiaochen Hao, Hao Luo, Chu Wang, Chao Yang, Yun Liang (Peking University)
DReX: Accurate and Scalable Dense Retrieval Acceleration via Algorithmic-Hardware Codesign
Authors: Derrick Quinn, E. Ezgi Yucel (Cornell University); Martin Prammer (Carnegie Mellon University); Zhenxing Fan, Kevin Skadron (University of Virginia); Jignesh Patel (Carnegie Mellon University); Jose F. Martinez, Mohammad Alian (Cornell University)
IDEA-GP: Instruction-Driven Architecture with Efficient Online Workload Allocation for Geometric Perception
Authors: Suquan Zhang, Yu Hu, Yunfei Xiang, Dawei Zhao, Yuanfan Xu, Qingmin Liao, Jincheng Yu, Yu Wang (Tsinghua University)
Neo: Towards Efficient Fully Homomorphic Encryption Acceleration using Tensor Core
Authors: Dian Jiao, Xianglong Deng, Zhiwei Wang, Shengyu Fan (Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, Chinese Academy of Sciences); Yi Chen (Huazhong University of Science and Technology); Dan Meng, Rui Hou (Chinese Academy); Mingzhe Zhang (Ant Research)
TrioSim: A Lightweight Simulator for Large-Scale DNN Workloads on Multi-GPU Systems
Authors: Ying Li (William & Mary); Yuhui Bao (Northeastern University); Gongyu Wang (Lightmatter Inc.); Xinxin Mei (Jefferson Lab); Pranav Vaid (Stanford University); Anandaroop Ghosh (Lightmatter Inc.); Adwait Jog (University of Virginia); Darius Bunandar (Lightmatter Inc.); Ajay Joshi (Boston University / Lightmatter Inc.); Yifan Sun (William & Mary)
EOD: Enabling Low Latency GNN Inference via Near-Memory Concatenate Aggregation
Authors: Taehwan Kim, Yunki Han, Seohye Ha, Jiwan Kim, Lee-Sup Kim (KAIST)
S-SYNC: Shuttle and Swap Co-Optimization in Quantum Charge-Coupled Devices
Authors: Chenghong Zhu, Xian Wu (Thrust of Artificial Intelligence, Information Hub, The Hong Kong University of Science and Technology (Guangzhou)); Jingbo Wang (Beijing Academy of Quantum Information Sciences); Xin Wang (Thrust of Artificial Intelligence, Information Hub, The Hong Kong University of Science and Technology (Guangzhou))
Reinforcement Learning-Guided Graph State Generation in Photonic Quantum Computers
Authors: Yingheng Li, Yue Dai, Aditya Pawar, Rongchao Dong, Jun Yang, Youtao Zhang, Xulong Tang (University of Pittsburgh)
ARTERY: Fast Quantum Feedback using Branch Prediction
Authors: Wuwei Tian, Liqiang Lu, Siwei Tan (Zhejiang University); Yun Liang (Peking University); Tingting Li, Kaiwen Zhou, Xinghui Jia, Jianwei Yin (Zhejiang University)
Reconfigurable Stream Network Architecture
Authors: Chengyue Wang (UCLA); Xiaofan Zhang (Google); Jason Cong (UCLA); James C. Hoe (MangoBoost and Carnegie Mellon University)
HeterRAG: Heterogeneous Processing-in-Memory Acceleration for Retrieval-augmented Generation
Authors: Chaoqiang Liu, Haifeng Liu (Huazhong University of Science and Technology); Dan Chen (National University of Singapore); Yu Huang, Yi Zhang (Huazhong University of Science and Technology); Wenjing Xiao (Guangxi University); Xiaofei Liao, Hai Jin (Huazhong University of Science and Technology)
DS-TPU: Dynamical System for on-Device Lifelong Graph Learning with Nonlinear Node Interaction
Authors: Chunshu Wu, Ruibing Song, Chuan Liu, Pouya Haghi (University of Rochester); Ang Li (PNNL); Michael Huang (Rochester); Tong (Tony) Geng (University of Rochester)
LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading
Authors: Hyungyo Kim, Nachuan Wang, Qirong Xia, Jinghan Huang (University of Illinois at Urbana-Champaign); Amir Yazdanbakhsh (Google DeepMind); Nam Sung Kim (University of Illinois at Urbana-Champaign)
RAGO: A Systematic Framework for Design and Optimization of Retrieval-Augmented Generation Serving
Authors: Wenqi Jiang, Suvinay Subramanian, Cat Graves (Google); Gustavo Alonso (ETH Zurich); Amir Yazdanbakhsh (Google DeepMind); Vidushi Dadu (Google)
Nyx: Virtualizing dataflow execution on shared FPGA platforms
Authors: Panagiotis Miliadis, Dimitris Theodoropoulos, Nectarios Koziris, Dionisios Pnevmatikatos (National Technical University of Athens, Greece)
HPVM-HDC: A Heterogeneous Programming System for Accelerating Hyperdimensional Computing
Authors: Russel Arbore, Xavier Routh, Abdul Rafae Noor, Akash Kothari (University of Illinois Urbana-Champaign); Haichao Yang, Weihong Xu, Sumukh Pinge (University of California San Diego); Vikram Adve (University of Illinois Urbana-Champaign); Tajana S Rosing (University of California San Diego); Minxuan Zhou (Illinois Institute of Technology)
ATiM: Autotuning Tensor Programs for Processing-in-DRAM
Authors: Yongwon Shin (POSTECH); Dookyung Kang, Hyojin Sung (Seoul National University)
HiPER: Hierarchically-Composed Processing for Efficient Robot Learning-Based Control
Authors: Justin Ting (University of Michigan); Minsik Kim, Junkang Zhu (University of Michigan, Ann Arbor); Cody Sheng (University of Michigan Ann Arbor); Zhengya Zhang (University of Michigan, Ann Arbor)
Zettafly: A Network Topology with Flexible Non-blocking Regions for Large-Scale AI and HPC Systems
Authors: Dezun Dong, Ziyu Wang, Fei Lei (National University of Defense Technology)
Corki: Enabling Real-time Embodied AI Robots via Algorithm-Architecture Co-Design
Authors: Yiyang Huang, Yuhui Hao (Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences); Bo Yu (Shenzhen Institute of Artificial Intelligence and Robotics for Society); Feng Yan (Meituan); Yuxin Yang, Feng Min, Yinhe Han (Institute of Computing Technology, Chinese Academy of Sciences); Lin Ma (Meituan); Shaoshan Liu (Shenzhen Institute of Artificial Intelligence and Robotics for Society); Qiang Liu (Tianjin University); Yiming Gan (Institute of Computing Technology, Chinese Academy of Sciences)
Avant-Garde: Empowering GPUs with Scaled Numeric Formats
Authors: Minseong Gil (Korea University); Dongho Ha (MangoBoost Inc.); Simla Burcu Harma (EPFL); Myung Kuk Yoon (Ewha Womans University); Babak Falsafi (EPFL); Won Woo Ro (Yonsei University); Yunho Oh (Korea University)
WSC-LLM: Efficient LLM Service and Architecture Co-exploration for Wafer-scale Chips
Authors: Zheng Xu, Dehao Kong, Jinxi Li, Jiaxin Liu, Jingxiang Hou (Tsinghua University); Xu Dai (Shanghai Artificial Intelligence Laboratory); Chao Li (SJTU); Shaojun Wei, Yang Hu, Shouyi Yin (Tsinghua University)
Precise exceptions in relaxed architectures
Authors: Ben Simner, Alasdair Armstrong, Thomas Bauereiss (University of Cambridge); Brian Campbell, Ohad Kammar (University of Edinburgh); Jean Pichon-Pharabod (Aarhus University); Peter Sewell (University of Cambridge)
UGPU: Dynamically Constructing Unbalanced GPUs for Enhanced Resource Efficiency
Authors: Xia Zhao, Guangda Zhang, Lu Wang, Huadong Dai (Defense Innovation Institute)
AQB8: Energy-Efficient Ray Traversal Accelerator Through Hierarchical Quantization
Authors: Yen-Chieh Huang, Chen-Pin Yang, Tsung Tai Yeh (National Yang Ming Chiao Tung University)
TRACI: Network Acceleration of Input-Dynamic Communication for Large-Scale Deep Learning Recommendation Model
Authors: Guyue Huang (University of California, Santa Barbara); Hao Li (University of Minnesota, Twin Cities); Le Qin (Hong Kong University of Science and Technology (Guangzhou)); Jiayi Huang (HKUST(GZ)); Yangwook Kang (Samsung Electronics); Yufei Ding (UCSD); Yuan Xie (HKUST)
RAP: Reconfigurable Automata Processor
Authors: Ziyuan Wen, Alexis Le Glaunec, Konstantinos Mamouras, Kaiyuan Yang (Rice University)
FlexNeRFer: A Multi-Dataflow, Adaptive Sparsity-Aware Accelerator for On-Device NeRF Rendering
Authors: Seock-Hwan Noh (DGIST); Banseok Shin (Samsung Electronics); Jeik Choi (DEEPX); Seungpyo Lee (Fitogether); Jaeha Kung (Korea University); Yeseong Kim (DGIST)
Hybrid SLC-MLC RRAM Mixed-Signal Processing-in-Memory Architecture for Transformer Acceleration via Gradient Redistribution
Authors: Chang Eun Song, Priyansh Bhatnagar, Zihan Xia (University of California, San Diego); Nam Sung Kim (UIUC); Tajana S Rosing, Mingu Kang (University of California, San Diego)
Process Only Where You Look: Hardware and Algorithm Co-optimization for Efficient Gaze-Tracked Image Rendering in Virtual Reality
Authors: Haiyu Wang, Wenxuan Liu, Kenny Chen, Qi Sun, Sai Qian Zhang (New York University)
Folded Banks: 3D-Stacked HBM Design for Fine-Grained Random-Access Bandwidth
Authors: Vignesh Adhinarayanan, Brad Beckmann (AMD); Wantong Li (University of California, Riverside); Mohammad Seyedzadeh (AMD); Sergey Blagodurov (Advanced Micro Devices (AMD)); Derrick Aguren, Hayden Hyungdong Lee (AMD)
Variational Quantum Algorithms in the era of Early Fault Tolerance
Authors: Siddharth Dangwal (University of Chicago); Suhas Vittal (Georgia Tech); Lennart Maximilian Seifert, Fred Chong (University of Chicago); Gokul Ravi (University of Michigan)
MagiCache: A Virtual In-Cache Computing Engine
Authors: Renhao Fan, Yikai Cui (Department of Computer Science and Technology, Tsinghua University); Mingyu Wang (School of Microelectronics Science and Technology, Sun Yat-Sen University); Weike Li, Zhaolin Li (Department of Computer Science and Technology, Tsinghua University)
A4: Microarchitecture-Aware LLC Management for Datacenter Servers with Emerging I/O Devices
Authors: Haneul Park, Jiaqi Lou (University of Illinois, Urbana-Champaign); Sangjin Lee (Chung-Ang University); Yifan Yuan (Meta); KyoungSoo Park (Seoul National University); Yongseok Son (Chung-Ang University); Ipoom Jeong (Yonsei University); Nam Sung Kim (University of Illinois, Urbana-Champaign)
BingoGCN: Towards Scalable and Efficient GNN Acceleration with Fine-Grained Partitioning and SLT
Authors: Jiale Yan, Hiroaki Ito, Yuta Nagahara, Kazushi Kawamura, Masato Motomura, Thiem Van Chu, Daichi Fujiki (Institute of Science Tokyo)
REIS: A High-Performance and Energy-Efficient Retrieval System with In-Storage Processing
Authors: Kangqi Chen, Rakesh Nadig, Nika Mansouri Ghiasi, Yu Liang, Haiyu Mao (ETH Zurich); Jisung Park (POSTECH); Manos Frouzakis (ETH Zurich); Mohammad Sadrosadati (ETH Zurich); Onur Mutlu (ETH Zurich)