Program for Workshops and Tutorials is available here. Please visit the respective webpage of a workshop/tutorial for its detailed schedule.

Sunday, June 22: The ISCA'25 Special Events and Reception

3:30 PM – 4:30 PM: Special Invited Talk

Abstract
Sustainability is the most important problem. But it won't happen without Steps 1, 2, and 3. I will identify them and explain why.

Speaker
Yale's short bio: Yale Patt is a teacher at The University of Texas at Austin and the Virginia Cockrell Centennial Endowed Chair in the Cockrell School of Engineering. He enjoys teaching the required freshman Intro to Computing course, using his motivated bottom-up approach every other Fall semester. His research in aggressive branch prediction and out-of-order execution has changed the basic structure of microprocessors. He earned obligatory degrees from reputable universities a long time ago. More information is available on his website for those who want it.

Yale Patt

      For Symposium and Workshops & Tutorials Participants.

4:45 PM – 6:15 PM: Special Panel

Moderator:
Dejan S. Milojicic (HPE)

Panelists:
Wen-Mei Hwu (NVIDIA), Norm Jouppi (Google), Hironori Kasahara (Waseda University), Yale Patt (University of Texas at Austin), Guri Sohi (University of Wisconsin-Madison), and Carole-Jean Wu (Meta)

Moderator's and Panelists' bio:

Dejan S. Milojicic is an HPE Fellow and VP at Hewlett Packard Labs, Milpitas, CA [1998-present]. Previously, he worked at the OSF Research Institute, Cambridge, MA [1994-1998] and Institute "Mihajlo Pupin", Belgrade, Serbia [1983-1991]. He received his Ph.D. from the University of Kaiserslautern, Germany (1993), and his BSc/MSc from Belgrade University, Serbia (1983/86). Dejan has over 280 papers, 2 books, and 96 granted and 67 pending patents. Dejan is an IEEE Fellow (2010), ACM Distinguished Engineer (2008), and HKN and USENIX member. Dejan was on 10 Ph.D. thesis committees, and he mentored over 80 interns. Dejan was president of the IEEE Computer Society (2014) and editor-in-chief of IEEE Computing Now and Distributed Systems Online, and he has served on many editorial boards and TPCs.

Dejan S. Milojicic

Wen-Mei W. Hwu is a Senior Distinguished Research Scientist and Senior Director of Research at NVIDIA. He is also a Professor Emeritus at the University of Illinois at Urbana-Champaign after 34 years of service. His research is in parallel architecture, algorithms, and infrastructure software for data intensive and computational intelligence applications. He received the ACM/IEEE Eckert-Mauchly Award, the ACM SIGARCH Maurice Wilkes Award, the ACM Grace Murray Hopper Award, the IEEE Computer Society Charles Babbage Award, the ISCA Influential Paper Award, the MICRO Test-of-Time Award, the IEEE Computer Society B. R. Rau Award, and the CGO Test-of-Time Award. He is a Fellow of IEEE and ACM.

Wen-Mei Hwu

Norm P. Jouppi is a Google Fellow. Norm received his Ph.D. in electrical engineering from Stanford University in 1984. While at Stanford he was one of the principal architects and designers of the MIPS microprocessor. Before joining Google in 2013 Norm was known for his innovations in computer memory systems and was the principal architect and lead designer of several microprocessors. He has been the tech lead for Google's Tensor Processing Units (TPUs) since their inception in 2013. He is a Fellow of the ACM, IEEE, and AAAS, and a member of the National Academy of Engineering. He has received multiple awards, including the ACM/IEEE Eckert-Mauchly Award and Seymour Cray Computer Engineering Award.

Norm Jouppi

Hironori Kasahara is a professor in the CSE department, a director at Advanced Multicore Research Institute, an ex-SEVP at Waseda University, an IEEE Life Fellow, a member of the Engineering Academy of Japan and the Science Council of Japan, and a chair of JST "SPRING" Ph.D. fostering program and IEEE Fran Allen Medal. He was the 2018 IEEE Computer Society President. His research interests include co-designing architecture and compilers, parallelizing-optimizing data locality, and power-reducing compilers for HPC to real-time embedded systems. He participated in developing three Top 1 Supercomputers, NWT based on his OSCAR architecture, Earth Simulator, and K.

Hironori Kasahara

Yale Patt is a teacher at The University of Texas at Austin and the Virginia Cockrell Centennial Endowed Chair in the Cockrell School of Engineering. He enjoys teaching the required freshman Intro to Computing course, using his motivated bottom-up approach every other Fall semester. His research in aggressive branch prediction and out-of-order execution has changed the basic structure of microprocessors. He earned obligatory degrees from reputable universities a long time ago. More information is available on his website for those who want it.

Yale Patt

Guri Sohi has been at the University of Wisconsin-Madison since 1985 where he currently is a Vilas Research Professor. Over the past four decades he has worked on the design of high-performance processors and computer systems and results from his research can be found in almost every high-end microprocessor in the market today. He has worked with an outstanding group of graduate students who have gone on to make their own impact in the field of computer architecture. His work has received a variety of recognitions within the university, nationally, as well as internationally.

Guri Sohi

Carole-Jean Wu is a Director of AI Research at Meta, leading the Systems and Machine Learning Research team. She is a founding member and a Vice President of MLCommons, a non-profit organization that aims to accelerate machine learning innovations for everyone. Dr. Wu's expertise sits at the intersection of computer architecture and machine learning with a focus on performance, energy efficiency and sustainability. She is passionate about pathfinding and tackling system challenges to enable efficient, scalable, and environmentally-sustainable AI technologies. Her work has been recognized with several IEEE Micro Top Picks and ACM/IEEE Best Paper Awards. She is in the Hall of Fame of ISCA, HPCA, IISWC, and serves on the study committee of the National Academies. She earned her M.A. and Ph.D. from Princeton University and B.Sc. from Cornell University.

Carole-Jean Wu

      For Symposium and Workshops & Tutorials Participants.

6:30 PM – 8:30 PM: ISCA'25 Reception

Location: RIHGA Royal Hotel Tokyo

Participants with “Workshops & Tutorials Only Registration” each need an extra reception ticket to attend the reception.



Day 1: Monday, June 23

8:30 AM – 8:45 AM: Opening Remarks (Plenary)

Location: Okuma Auditorium (Main)

8:45 AM – 9:45 AM: Plenary session

Abstract
TBA.

Speaker
TBA.

9:45 AM – 10:05 AM: Coffee Break & Posters

Location: Research Innovation Center

10:05 AM – 11:25 AM

Session Chair: Koji Inoue
10:05 AM – 10:25 AM
WSC-LLM: Efficient LLM Service and Architecture Co-exploration for Wafer-scale Chips
Zheng Xu, Dehao Kong, Jinxi Li, Jiaxin Liu, Jingxiang Hou, Xu Dai, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin

10:25 AM – 10:45 AM
LightML: A Photonic Accelerator for Efficient General Purpose Machine Learning
Liang Liu, Sadra Rahimi Kari, Xin Xin, Nathan Youngblood, Youtao Zhang, Jun Yang

10:45 AM – 11:05 AM
FRED: A Wafer-scale Fabric for 3D Parallel DNN Training
Saeed Rashidi, William Won, Sudarshan Srinivasan, Puneet Gupta, Tushar Krishna

11:05 AM – 11:25 AM
PD Constraint-aware Physical/Logical Topology Co-Design for Network on Wafer
Qize Yang, Taiquan Wei, Sihan Guan, Chengran Li, Haoran Shang, Jinyi Deng, Huizheng Wang, Chao Li, Lei Wang, Yan Zhang, Shouyi Yin, Yang Hu
Session Chair: Caroline Trippel
10:05 AM – 10:25 AM
Finesse: An Agile Design Framework for Pairing-based Cryptography via Software/Hardware Co-Design
Tianwei Pan, Tianao Dai, Jianlei Yang, Hongbin Jing, Yang Su, Zeyu Hao, Xiaotao Jia, Chunming Hu, Weisheng Zhao

10:25 AM – 10:45 AM
Cassandra: Efficient Enforcement of Sequential Execution for Cryptographic Programs
Ali Hajiabadi, Trevor E. Carlson

10:45 AM – 11:05 AM
FAST: An FHE Accelerator for Scalable-parallelism with Tunable-bit
Shengyu Fan, Xianglong Deng, Liang Kong, Guiming Shi, Guang Fan, Rui Hou, Dan Meng, Mingzhe Zhang

11:05 AM – 11:25 AM
Neo: Towards Efficient Fully Homomorphic Encryption Acceleration using Tensor Core
Dian Jiao, Xianglong Deng, Zhiwei Wang, Shengyu Fan, Yi Chen, Dan Meng, Rui Hou, Mingzhe Zhang
Session Chair: Matt Sinclair
10:05 AM – 10:25 AM
Heliostat: Harnessing Ray Tracing Accelerators for Page Table Walks
Yuan Feng, Yuke Li, Jiwon Lee, Won Woo Ro, Hyeran Jeon

10:25 AM – 10:45 AM
Forest: Access-aware GPU UVM Management
Mao Lin, Yuan Feng, Guilherme Cox, Hyeran Jeon

10:45 AM – 11:05 AM
Avant-Garde: Empowering GPUs with Scaled Numeric Formats
Minseong Gil, Dongho Ha, Simla Burcu Harma, Myung Kuk Yoon, Babak Falsafi, Won Woo Ro, Yunho Oh

11:05 AM – 11:25 AM
CoopRT: Accelerating BVH Traversal for Ray Tracing via Cooperative Threads
Yavuz Selim Tozlu, Huiyang Zhou

11:25 AM – 12:25 PM: Lunch

Location:

12:25 PM – 12:55 PM: Coffee Break & Posters

Location: Research Innovation Center

12:55 PM – 02:15 PM

Session Chair: Mark Hill
12:55 PM – 01:15 PM
The XOR Cache: A Catalyst for Compression
Zhewen Pan, Joshua San Miguel

01:15 PM – 01:35 PM
H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference
Cong Li, Yihan Yin, Xintong Wu, Jingchen Zhu, Zhutianya Gao, Dimin Niu, Qiang Wu, Xin Si, Yuan Xie, Chen Zhang, Guangyu Sun

01:35 PM – 01:55 PM
Precise exceptions in relaxed architectures
Ben Simner, Alasdair Armstrong, Thomas Bauereiss, Brian Campbell, Ohad Kammar, Jean Pichon-Pharabod, Peter Sewell

01:55 PM – 02:15 PM
Rethinking Prefetching for Intermittent Computing
Gan Fang, Jianping Zeng, Aditya Gupta, Changhee Jung

02:15 PM – 02:40 PM: Coffee Break & Posters

Location: Research Innovation Center

02:40 PM – 04:20 PM

Session Chair: Swamit Tannu
02:40 PM – 03:00 PM
Hardware-aware Calibration Protocol for Quantum Computers
Yuchen Zhu, Jinglei Cheng, Boxi Li, Kecheng Liu, Yidong Zhou, Hanrui Wang, Yufei Ding, Zhiding Liang

03:00 PM – 03:20 PM
Constant-Rate Entanglement Distillation for Fast Quantum Interconnects
Christopher Pattison, Gefen Baranes, Juan Pablo Bonilla Ataides, Mikhail D. Lukin, Hengyun Zhou

03:20 PM – 03:40 PM
S-SYNC: Shuttle and Swap Co-Optimization in Quantum Charge-Coupled Devices
Chenghong Zhu, Xian Wu, Jingbo Wang, Xin Wang, Zhixin Song

03:40 PM – 04:00 PM
ARTERY: Fast Quantum Feedback using Branch Prediction
Wuwei Tian, Liqiang Lu, Siwei Tan, Yun Liang, Tingting Li, Kaiwen Zhou, Xinghui Jia, Jianwei Yin

04:00 PM – 04:20 PM
Qtenon: Towards Low-Latency Architecture Integration for Accelerating Hybrid Quantum-Classical Computing
Chenning Tao, Liqiang Lu, Size Zheng, Li-Wen Chang, Minghua Shen, Hanyu Zhang, Fangxin Liu, Kaiwen Zhou, Jianwei Yin
Session Chair: Won Woo Ro
02:40 PM – 03:00 PM
HiPER: Hierarchically-Composed Processing for Efficient Robot Learning-Based Control
Justin Ting, Minsik Kim, Junkang Zhu, Cody Sheng, Zhengya Zhang

03:00 PM – 03:20 PM
Dadu-Corki: Algorithm-Architecture Co-Design for Embodied AI-powered Robotic Manipulation
Yiyang Huang, Yuhui Hao, Bo Yu, Feng Yan, Yuxin Yang, Feng Min, Yinhe Han, Lin Ma, Shaoshan Liu, Qiang Liu, Yiming Gan

03:20 PM – 03:40 PM
Process Only Where You Look: Hardware and Algorithm Co-optimization for Efficient Gaze-Tracked Foveated Rendering in Virtual Reality
Haiyu Wang, Wenxuan Liu, Kenny Chen, Qi Sun, Sai Qian Zhang

03:40 PM – 04:00 PM
RTSpMSpM: Harnessing Ray Tracing for Efficient Sparse Matrix Computations
Hongrui Zhang, Yunan Zhang, Hung-Wei Tseng

04:00 PM – 04:20 PM
AQB8: Energy-Efficient Ray Tracing Accelerator through Multi-Level Quantization
Yen-Chieh Huang, Chen-Pin Yang, Tsung Tai Yeh
Session Chair: Mingzhe Zhang
02:40 PM – 03:00 PM
ANVIL: An In-Storage Accelerator for Name–Value Data Stores
Ryan Wong, Nikita Kim, Aniket Das, Kevin Higgs, Engin Ipek, Sapan Agarwal, Saugata Ghose, Ben Feinberg

03:00 PM – 03:20 PM
ArtMem: Adaptive Migration in Reinforcement Learning-Enabled Tiered Memory
Xinyue Yi, Hongchao Du, Yu Wang, Jie Zhang, Qiao Li, Chun Jason Xue

03:20 PM – 03:40 PM
UPP: Universal Predicate Pushdown to Smart Storage
Ipoom Jeong, Jinghan Huang, Chuxuan Hu, Dohyun Park, Jaeyoung Kang, Nam Sung Kim, Yongjoo Park

03:40 PM – 04:00 PM
XHarvest: Rethinking High-Performance and Cost-Efficient SSD Architecture with CXL-Driven Harvesting
Li Peng, Wenbo Wu, Shushu Yi, Xianzhang Chen, Chenxi Wang, Shengwen Liang, Zhe Wang, Nong Xiao, Qiao Li, Mingzhe Zhang, Jie Zhang

04:00 PM – 04:20 PM
In-Storage Acceleration of Retrieval Augmented Generation as a Service
Rohan Mahapatra, Harsha Santhanam, Christopher Priebe, Hanyang Xu, Hadi S. Esmaeilzadeh

04:20 PM – 04:50 PM: Coffee Break & Posters

Location: Research Innovation Center

04:50 PM – 06:50 PM

Session Chair: Sai Qian Zhang
04:50 PM – 05:10 PM
SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting
Jiaming Xu, Jiayi Pan, Yongkang Zhou, Siming Chen, Jinhao Li, Yaoxiu Lian, Junyi Wu, Guohao Dai

05:10 PM – 05:30 PM
Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
Minsu Kim, Seongmin Hong, RyeoWook Ko, Soongyu Choi, Hunjong Lee, Junsoo Kim, Joo-Young Kim, Jongse Park

05:30 PM – 05:50 PM
Chimera: Communication Fusion for Hybrid Parallelism in Large Language Models
Le Qin, Junwei Cui, Weilin Cai, Jiayi Huang

05:50 PM – 06:10 PM
LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference
Zhiwen Mo, Lei Wang, Jianyu Wei, Zhichen Zeng, Shijie Cao, Lingxiao Ma, Naifeng Jing, Ting Cao, Jilong Xue, Fan Yang, Mao Yang

06:10 PM – 06:30 PM
AiF: Accelerating On-Device LLM Inference Using In-Flash Processing
Jaeyong Lee, Hyunjoo Kim, Sanghoon Oh, Myoungjun Chun, Myungsuk Kim, Jihong Kim

06:30 PM – 06:50 PM
LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading
Hyungyo Kim, Nachuan Wang, Qirong Xia, Jinghan Huang, Amir Yazdanbakhsh, Nam Sung Kim
Session Chair: Gabe Loh
04:50 PM – 05:10 PM
Enabling Ahead Prediction with Practical Energy Constraints
Chester (Lingzhe) Cai, Aniket Deshmukh, Yale Patt

05:10 PM – 05:30 PM
Profile-Guided Temporal Prefetching
Mengming Li, Qijun Zhang, Yichuan Gao, Wenji Fang, Yao Lu, Yongqing Ren, Zhiyao Xie

05:30 PM – 05:50 PM
WarmCache: Exploiting STT-RAM Cache for Low-Power Intermittent Systems
Noureldin Hassan, Byounguk Min, Changhee Jung, Yan Solihin, Jongouk Choi

05:50 PM – 06:10 PM
Magellan: A High-Performance Loop-Guided Prefetcher for Indirect Memory Access
Gelin Fu, Tian Xia, Mingzhuo Yu, Prashant Nair, Mieszko Lis, Pengju Ren

06:10 PM – 06:30 PM
Leveraging control-flow similarity to reduce branch predictor cold effects in microservices
Haris Volos, Stylianos Vassiliou, Georgia Antoniou, Davide Basilio Bartolini, Yiannakis Sazeides
Session Chair: Divya Mahajan
04:50 PM – 05:10 PM
Cramming a Data Center into One Cabinet: A Co-Exploration of Computing and Hardware Architecture of Waferscale Chip
Xingmao Yu, Dingcheng Jiang, Jinyi Deng, Jingyao Liu, Yang Hu, Chao Li, Shouyi Yin

05:10 PM – 05:30 PM
Fair-CO2: Fair Attribution for Cloud Carbon Emissions
Leo Han, Jash Kakadia, Benjamin C. Lee, Udit Gupta

05:30 PM – 05:50 PM
Dynamic Load Balancer in Intel Xeon Scalable Processor: Performance Analyses, Enhancements, and Guidelines
Jiaqi Lou, Srikar Vanavasam, Yifan Yuan, Ren Wang, Nam Sung Kim

05:50 PM – 06:10 PM
A4: Microarchitecture-Aware LLC Management for Datacenter Servers with Emerging I/O Devices
Haneul Park, Jiaqi Lou, Sangjin Lee, Yifan Yuan, KyoungSoo Park, Yongseok Son, Ipoom Jeong, Nam Sung Kim

06:10 PM – 06:30 PM
Single-Address-Space FaaS with Jord
Yuanlong Li, Atri Bhattacharyya, Madhur Kumar, Abhishek Bhattacharjee, Yoav Etsion, Babak Falsafi, Sanidhya Kashyap, Mathias Payer

06:30 PM – 06:50 PM
HardHarvest: Hardware-Supported Core Harvesting for Microservices
Jovan Stojkovic, Chunao Liu, Muhammad Shahbaz, Josep Torrellas

06:30 PM – 07:10 PM: Coffee Break & Posters

Location: Research Innovation Center

07:00 PM – 08:30 PM: SIGARCH/TCCA Business meeting

Location: Okuma Auditorium (Small)


Day 2: Tuesday, June 24

08:30 AM – 09:50 AM

Session Chair: Gururaj Saileshwar
08:30 AM – 08:50 AM
MoPAC: Efficiently Mitigating Rowhammer with Probabilistic Activation Counting
Suhas Vittal, Salman Qazi, Poulami Das, Moin Qureshi

08:50 AM – 09:10 AM
When Mitigations Backfire: Timing Channel Attacks and Defense for PRAC-Based Rowhammer Mitigations
Jeonghyun Woo, Joyce Qu, Gururaj Saileshwar, Prashant Nair

09:10 AM – 09:30 AM
PuDHammer: Experimental Analysis of Read Disturbance Effects of Processing-using-DRAM in Real DRAM Chips
Ismail Emir Yuksel, Akash Sood, Ataberk Olgun, Oğuzhan Canpolat, Haocong Luo, Nisa Bostanci, Mohammad Sadrosadati, Giray Yaglikci, Onur Mutlu

09:30 AM – 09:50 AM
DREAM: Enabling Low-Overhead Rowhammer Mitigation via Directed Refresh Management
Hritvik Taneja, Moin Qureshi
Session Chair: Gagandeep Singh
08:30 AM – 08:50 AM
Ecco: Improving Memory Bandwidth and Capacity for LLMs via Entropy-Aware Cache Compression
Feng Cheng, Cong Guo, Chiyue Wei, Junyao Zhang, Changchun Zhou, Edward Hanson, Jiaqi Zhang, Xiaoxiao Liu, Hai "Helen" Li, Yiran Chen

08:50 AM – 09:10 AM
Hybe: GPU-NPU Hybrid System for Efficient LLM Inference with Million-Token Context Window
Seungjae Moon, Junseo Cha, Hyunjun Park, Joo-Young Kim

09:10 AM – 09:30 AM
MeshSlice: Efficient 2D Tensor Parallelism for Distributed DNN Training
Hyoungwook Nam, Gerasimos Gerogiannis, Josep Torrellas

09:30 AM – 09:50 AM
Zettafly: A Network Topology with Flexible Non-blocking Regions for Large-Scale AI and HPC Systems
Dezun Dong, Ziyu Wang, Fei Lei
Session Chair: Mohammad Alian
08:30 AM – 08:50 AM
AIM: Software and Hardware Co-design for Architecture-level IR-drop Mitigation in High-performance PIM
Yuanpeng Zhang, Xing Hu, Xi Chen, Zhihang Yuan, Cong Li, Jingchen Zhu, Zhao Wang, Chenguang Zhang, Xin Si, Wei Gao, Qiang Wu, Runsheng Wang, Guangyu Sun

08:50 AM – 09:10 AM
OptiPIM: Optimizing Processing-in-Memory Acceleration Using Integer Linear Programming
Jiantao Liu, Minxuan Zhou, Yue Pan, Chien-Yi Yang, Lana Josipovic, Tajana Rosing

09:10 AM – 09:30 AM
HeterRAG: Heterogeneous Processing-in-Memory Acceleration for Retrieval-augmented Generation
Chaoqiang Liu, Haifeng Liu, Dan Chen, Yu Huang, Yi Zhang, Wenjing Xiao, Xiaofei Liao, Hai Jin

09:30 AM – 09:50 AM
ATiM: Autotuning Tensor Programs for Processing-in-DRAM
Yongwon Shin, Dookyung Kang, Hyojin Sung

9:50 AM – 10:20 AM: Coffee Break & Posters

Location: Research Innovation Center

10:20 AM – 11:45 AM: Award Ceremony (Plenary)

Location: Okuma Auditorium (Main)

11:45 AM – 12:45 PM: Lunch

Location:

12:45 PM – 02:45 PM

Session Chair: Jeffrey Stuecheli
12:45 PM – 01:05 PM
Single Spike Artificial Neural Networks
Rhys Gretsch, Michael Beyeler, Jeremy Lau, Timothy Sherwood

01:05 PM – 01:25 PM
Phi: Leveraging Pattern-based Hierarchical Sparsity for High-Efficiency Spiking Neural Networks
Chiyue Wei, Bowen Duan, Cong Guo, Jingyang Zhang, Qingyue Song, Hai (Helen) Li, Yiran Chen

01:25 PM – 01:45 PM
Bishop: Sparsified Bundling Spiking Transformers on Heterogeneous Cores with Error-constrained Pruning
Boxun Xu, Yuxuan Yin, Vikram Iyer, Peng Li

01:45 PM – 02:05 PM
Hermes: Algorithm-System Co-design for Efficient Retrieval Augmented Generation At-Scale
Michael Shen, Muhammad Umar, Kiwan Maeng, G. Edward Suh, Udit Gupta

02:05 PM – 02:25 PM
RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving
Wenqi Jiang, Suvinay Subramanian, Cat Graves, Gustavo Alonso, Amir Yazdanbakhsh, Vidushi Dadu

02:25 PM – 02:45 PM
Transitive Array: An Efficient GEMM Accelerator with Result Reuse
Cong Guo, Chiyue Wei, Jiaming Tang, Bowen Duan, Song Han, Hai "Helen" Li, Yiran Chen
Session Chair: Hyeran Jeon
12:45 PM – 01:05 PM
Light-weight Cache Replacement for Instruction Heavy Workloads
Saba Mostofi, Setu Gupta, Ahmad Hassani, Krishnam Tibrewala, Elvira Teran, Paul Gratz, Daniel A. Jimenez

01:05 PM – 01:25 PM
The Sparsity-Aware LazyGPU Architecture
Changxi Liu, Yu Miao, Yifan Sun, Trevor E. Carlson

01:25 PM – 01:45 PM
Evaluating Ruche Networks: Physically Scalable, Cost-Effective, Bandwidth-Flexible NoCs
Dai Cheol Jung, Michael Taylor

01:45 PM – 02:05 PM
Garibaldi: A Pairwise Instruction-Data Management for Enhancing Shared Last-Level Cache Performance in Server Workloads
Jaewon Kwon, Yongju Lee, Jiwan Kim, Enhyeok Jang, Hongju Kal, Won Woo Ro

02:05 PM – 02:25 PM
NetCrafter: Tailoring Network Traffic for Non-Uniform Bandwidth Multi-GPU Systems
Amel Fatima, Yang Yang, Yifan Sun, Rachata Ausavarungnirun, Adwait Jog

02:25 PM – 02:45 PM
Caravan: A Hardware/Software Co-Design for Efficient SIMD Neighbor Search on Point Clouds
Pedro Henrique Exenberger Becker, Franyell Silfa, Jose Maria Arnau, Antonio Gonzalez
Session Chair: Daichi Fujiki
12:45 PM – 01:05 PM
ANSMET: Approximate Nearest Neighbor Search with Near-Memory Processing and Hybrid Early Termination
Yiwei Li, Yuxin Jin, Boyu Tian, Huanchen Zhang, Mingyu Gao

01:05 PM – 01:25 PM
DReX: Accurate and Scalable Dense Retrieval Acceleration via Algorithmic-Hardware Codesign
Derrick Quinn, E. Ezgi Yücel, Martin Prammer, Zhenxing Fan, Kevin Skadron, Jignesh Patel, José F. Martínez, Mohammad Alian

01:25 PM – 01:45 PM
EOD: Enabling Low Latency GNN Inference via Near-Memory Concatenate Aggregation
Taehwan Kim, Yunki Han, Seohye Ha, Jiwan Kim, Lee-Sup Kim

01:45 PM – 02:05 PM
RAP: Reconfigurable Automata Processor
Ziyuan Wen, Alexis Le Glaunec, Konstantinos Mamouras, Kaiyuan Yang

02:05 PM – 02:25 PM
Hybrid SLC-MLC RRAM Mixed-Signal Processing-in-Memory Architecture for Transformer Acceleration via Gradient Redistribution
Chang Eun Song, Priyansh Bhatnagar, Zihan Xia, Nam Sung Kim, Tajana S Rosing, Mingu Kang

02:25 PM – 02:45 PM
REIS: A High-Performance and Energy-Efficient Retrieval System with In-Storage Processing
Kangqi Chen, Rakesh Nadig, Nika Mansouri Ghiasi, Yu Liang, Haiyu Mao, Jisung Park, Manos Frouzakis, Mohammad Sadrosadati, Onur Mutlu

02:45 PM – 03:15 PM: Coffee Break & Posters

Location: Research Innovation Center

03:15 PM – 05:15 PM

Session Chair: Udit Gupta
03:15 PM – 03:35 PM
MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization
Akshat Ramachandran, Souvik Kundu, Tushar Krishna

03:35 PM – 03:55 PM
Topology-Aware Virtualization over Inter-Core Connected Neural Processing Units
Dahu Feng, Erhu Feng, Dong Du, Pinjie Xu, Yubin Xia, Haibo Chen, Rong Zhao

03:55 PM – 04:15 PM
Chip Architectures Under Advanced Computing Sanctions
August Ning, David Wentzlaff

04:15 PM – 04:35 PM
DiTile-DGNN: An Efficient Accelerator for Distributed Dynamic Graph Neural Network Inference
Jiaqi Yang, Hao Zheng, Ahmed Louri

04:35 PM – 04:55 PM
NeuSET: An Accelerator for Neural Scene Representation with Sparse Encoding Table
Tianbo Liu, Xinkai Song, Zhifei Yue, Rui Wen, Xing Hu, Zhuoran Song, Yuanbo Wen, Yifan Hao, Wei Li, Zidong Du, Rui Zhang, Jiaming Guo, Di Huang, Shaohui Peng, GuangZhong Sun, Qi Guo, Tianshi Chen

04:55 PM – 05:15 PM
FATE: Boosting the Performance of Hyper-Dimensional Computing Intelligence with Flexible Numerical DAta TypE
Haomin Li, Fangxin Liu, Yichi Chen, Zongwu Wang, Shiyuan Huang, Ning Yang, Dongxu Lyu, Li Jiang
Session Chair: Hyesoon Kim
03:15 PM – 03:35 PM
WindServe: Efficient Phase-Disaggregated LLM Serving with Stream-based Dynamic Scheduling
Jingqi Feng, Yukai Huang, Rui Zhang, Sicheng Liang, Ming Yan, Jie Wu

03:35 PM – 03:55 PM
Neoscope: How Resilient Is My SoC to Workload Churn?
Joseph Rogers, Lieven Eeckhout, Taha Soliman, Magnus Jahre

03:55 PM – 04:15 PM
CORD: Low-Latency, Bandwidth-Efficient and Scalable Release Consistency via Directory Ordering
Yanpeng Yu, Nicolai Oswald, Anurag Khandelwal

04:15 PM – 04:35 PM
Nyx: Virtualizing dataflow execution on shared FPGA platforms
Panagiotis Miliadis, Dimitris Theodoropoulos, Nectarios Koziris, Dionisios Pnevmatikatos

04:35 PM – 04:55 PM
HPVM-HDC: A Heterogeneous Programming System for Accelerating Hyperdimensional Computing
Russel Arbore, Xavier Routh, Abdul Rafae Noor, Akash Kothari, Haichao Yang, Weihong Xu, Sumukh Pinge, Minxuan Zhou, Tajana Rosing, Vikram Adve

04:55 PM – 05:15 PM
UGPU: Dynamically Constructing Unbalanced GPUs for Enhanced Resource Efficiency
Xia Zhao, Guangda Zhang, Lu Wang, Huadong Dai
Session Chair: Huiyang Zhou
03:15 PM – 03:35 PM
Synchronization for Fault-Tolerant Quantum Computers
Satvik Maurya, Swamit Tannu

03:35 PM – 03:55 PM
SWIPER: Minimizing Fault-Tolerant Quantum Program Latency via Speculative Window Decoding
Joshua Viszlai, Jason Chadwick, Sarang Joshi, Gokul Ravi, Yanjing Li, Fred Chong

03:55 PM – 04:15 PM
CaliQEC: In-situ Qubit Calibration for Surface Code Quantum Error Correction
Xiang Fang, Keyi Yin, Yuchen Zhu, Jixuan Ruan, Dean Tullsen, Zhiding Liang, Andrew Sornborger, Ang Li, Travis Humble, Yufei Ding, Yunong Shi

04:15 PM – 04:35 PM
Variational Quantum Algorithms in the era of Early Fault Tolerance
Siddharth Dangwal, Suhas Vittal, Lennart Maximilian Seifert, Fred Chong, Gokul Ravi

04:35 PM – 04:55 PM
Resource Analysis of Low-Overhead Transversal Architectures for Reconfigurable Atom Arrays
Hengyun Zhou, Casey Duckering, Chen Zhao, Dolev Bluvstein, Madelyn Cain, Aleksander Kubica, Sheng-Tao Wang, Mikhail D. Lukin

04:55 PM – 05:15 PM
SwitchQNet: Optimizing Distributed Quantum Computing for Quantum Data Centers with Switch Networks
Hezi Zhang, Yiran Xu, Haotian Hu, Keyi Yin, Hassan Shapourian, Jiapeng Zhao, Ramana Rao Kompella, Reza Nejabati, Yufei Ding

05:15 PM – 05:45 PM: Coffee Break & Posters

Location: Research Innovation Center

05:45 PM – 06:45 PM: Move to Chinzan-so (10 Min Bus Ride)

Buses depart from the Auditorium.

06:45 PM – 09:30 PM: Banquet & Japanese Garden Walk (finding Japanese Fireflies)

Location: Chinzan-so


Day 3: Wednesday, June 25

8:30 AM – 9:30 AM: Plenary session

Abstract
Great innovation thrives on unexpected connections. This talk shares the journey of a unique partnership between Ethiopian and US researchers, which advanced hardware security through novel privacy-enhanced microarchitectures. By crossing geographic, disciplinary, and cultural boundaries, our team was challenged to step out of its comfort zone, sparking fresh creativity and expanding the scope of innovative solutions in data privacy. Join us to discover how embracing unfamiliar perspectives can propel advances in zero-trust systems and help shape the future of architectural innovation.

Speaker
Fitsum's short bio: Fitsum Assamnew Andargie is an Assistant Professor and current Chair of Computer Engineering at the School of Electrical and Computer Engineering, Addis Ababa University. He also serves as an Adjunct Assistant Research Scientist in the Computer Science and Engineering Department at the University of Michigan. His research interests focus on accelerating graph applications using software and hardware co-design, and developing privacy-enhanced hardware computational infrastructures, particularly supporting artificial intelligence applications in healthcare. Born in Addis Ababa and raised in Jimma, Ethiopia, he completed his Bachelor's and Master's degrees in Electrical and Computer Engineering at Addis Ababa University. He earned his doctorate in Computer Engineering from Addis Ababa University, collaborating closely with the University of Toronto and the University of Michigan.

Fitsum Assamnew Andargie

Todd's short bio: Todd Austin is the S. Jack Hu Collegiate Professor of Electrical Engineering and Computer Science at the University of Michigan and Director of the Computer Engineering Lab. His research spans computer architecture, secure system design, verification, and performance analysis. Previously, he directed C-FAR, a multi-university SRC/DARPA-funded computer engineering research center. Before academia, he was a Senior Computer Architect at Intel. He created the SimpleScalar Tool Set and co-authored Structured Computer Organization, 6th Ed. He also co-founded Agita Labs and InTempo Design. He is an IEEE Fellow, and he has received the ACM Maurice Wilkes and IEEE Ramakrishna Rau awards. He earned his PhD from the University of Wisconsin.

Todd Austin

9:30 AM – 10:00 AM: Coffee Break

Location:

10:00 AM – 11:40 AM

Session Chair: Nathan Bleier
10:00 AM – 10:20 AM
Assassyn: A Unified Abstraction for Architectural Simulation and Implementation
Jian Weng, Boyang Han, Derui Gao, Ruijie Gao, Wanning Zhang, An Zhong, Ceyu Xu, Jihao Xin, Yangzhixin Luo, Lisa Wu Wills, Marco Canini

10:20 AM – 10:40 AM
Concorde: Fast and Accurate CPU Performance Modeling with Compositional Analytical-ML Fusion
Arash Nasr-Esfahany, Mohammad Alizadeh, Victor Lee, Hanna Alam, Brett W. Coon, David Culler, Vidushi Dadu, Martin Dixon, Henry M. Levy, Santosh Pandey, Parthasarathy Ranganathan, Amir Yazdanbakhsh

10:40 AM – 11:00 AM
AMALI: An Analytical Model for Accurately Modeling LLM Inference on Modern GPUs
Shiheng Cao, Zhibin Yu, Junshi Chen, Hong An, Junmin Wu

11:00 AM – 11:20 AM
GCStack+GCScaler: Fast and Accurate GPU Performance Analyses Using Fine-Grained Stall Cycle Accounting and Interval Analysis
Hanna Cha, Sungchul Lee, Jounghoo Lee, Yeonan Ha, Joonsung Kim, Youngsok Kim

11:20 AM – 11:40 AM
TrioSim: A Lightweight Simulator for Large-Scale DNN Workloads on Multi-GPU Systems
Ying Li, Yuhui Bao, Gongyu Wang, Xinxin Mei, Pranav Vaid, Anandaroop Ghosh, Adwait Jog, Darius Bunandar, Ajay Joshi, Yifan Sun
Session Chair: Masaaki Kondo
10:00 AM – 10:20 AM
Accelerating Simulation of Quantum Circuits under Noise via Computational Reuse
Meng Wang, Swamit Tannu, Prashant J Nair

10:20 AM – 10:40 AM
Qplacer: Frequency-Aware Component Placement for Superconducting Quantum Computers
Junyao Zhang, Hanrui Wang, Qi Ding, Jiaqi Gu, Reouven Assouly, William D. Oliver, Song Han, Kenneth R. Brown, Hai "Helen" Li, Yiran Chen

10:40 AM – 11:00 AM
QR-Map: A Map-Based Approach to Quantum Circuit Abstraction for Qubit Reuse Optimization
Hyungseok Kim, Enhyeok Jang, Seungwoo Choi, Youngmin Kim, Won Woo Ro

11:00 AM – 11:20 AM
Genesis: A Compiler for Hamiltonian Simulation on Hybrid CV-DV Quantum Computers
Zihan Chen, Jiakang Li, Minghao Guo, Henry Chen, Zirui Li, Joel Bierman, Yipeng Huang, Huiyang Zhou, Yuan Liu, Eddy Z. Zhang

11:20 AM – 11:40 AM
Reinforcement Learning-Guided Graph State Generation in Photonic Quantum Computers
Yingheng Li, Yue Dai, Aditya Pawar, Rongchao Dong, Jun Yang, Youtao Zhang, Xulong Tang
Session Chair: EJ Kim
10:00 AM – 10:20 AM
HYTE: Flexible Tiling for Sparse Accelerators via Hybrid Static-Dynamic Approaches
Xintong Li, Zhiyao Li, Mingyu Gao

10:20 AM – 10:40 AM
NUPEA: Optimizing Critical Loads on Spatial Dataflow Architectures via Non-Uniform Processing-Element Access
Souradip Ghosh, Graham Gobieski, Keyi Zhang, Brandon Lucia, Nathan Beckmann, Tony Nowatzki

10:40 AM – 11:00 AM
DX100: Programmable Data Access Accelerator for Indirection
Alireza Khadem, Kamalavasan Kamalakkannan, Zhenyan Zhu, Akash Poptani, Yufeng Gu, Jered Benjamin Dominguez-Trujillo, Nishil Talati, Daichi Fujiki, Scott Mahlke, Galen Shipman, Reetuparna Das

11:00 AM – 11:20 AM
SEAL: A Single-Event Architecture for In-Sensor Visual Localization
Ryan Hou, Thomas Twomey, Vasileios Milionis, Evangelos Dikopoulos, Tianrui Ma, Yuhao Zhu, Georgios Tzimpragos

11:20 AM – 11:40 AM
IDEA-GP: Instruction-Driven Architecture with Efficient Online Workload Allocation for Geometric Perception
Suquan Zhang, Yu Hu, Yunfei Xiang, Dawei Zhao, Yuanfan Xu, Qingmin Liao, Jincheng Yu, Yu Wang

11:40 AM – 12:55 PM: Lunch

Location:

12:55 PM – 01:25 PM: Coffee Break & Posters

Location: Research Innovation Center

01:25 PM – 02:45 PM

Session Chair: TBA
01:25 PM – 01:45 PM
Meta's Second Generation AI Chip: Model-Chip Co-Design and Productionization Experiences
Joel Coburn, Chunqiang Tang, Adam Hutchin, Ajit Mathews, Alex Mastro, Amin Firoozshahian, Amit Nagpal, Aravind Sukumaran-Rajam, Arushi Sharma, Ashwin Kamath, Ashwin Narasimha, Bhasker Jakka, Brian Dodds, Cao Gao, David Reiss, Deboleena Roy, Eleanor Ozer, Emmanuel Menage, Eran Tal, Erum Kazi, Feixiong Zhang, Guoqiang Jerry Chen, Hangchen Yu, Harikrishna Reddy, Harish Dixit, Indu Kalyanaraman, Jack Montgomery, Jian Huang, Jinghan Yang, Jiyuan Zhang, Jongsoo Park, Junhan Hu, Kaustubh Gondkar, Mahesh Maddury, Maxim Naumov, Mike Tsai, Mohammed Sourouri, Neeraj Agrawal, Olivia Wu, Siji Medaiyese, Pankaj Kansal, Pavan Shetty, Poorvaja Ramani, Pritesh Modi, Raviteja Chinta, Richard Wareing, Roman Levenstein, Sameer Abu Asal, Saritha Dwarakapuram, Sathish Sekar, Satish Nadathur, Shreya Varshini, Sterling Hughes, Tanmay Zargar, Truls Edvard Stokke, Tyler Graf, Xiaolong Xie, Xun Jiao, Zitong Zeng

01:45 PM – 02:05 PM
Scaling Llama 3 Training with Efficient Parallelism Strategies
Weiwei Chu, Xinfeng Xie, Jiecao Yu, Jie Wang, Amar Phanishayee, Chunqiang Tang, Yuchen Hao, Jianyu Huang, Mustafa Ozdal, Jun Wang, Vedanuj Goswami, Naman Goyal, Abhishek Kadian, Andrew Gu, Chris Cai, Feng Tian, Xiaodong Wang, Min Si, Pavan Balaji, Ching-Hsiang Chu, Jongsoo Park

02:05 PM – 02:25 PM
DCPerf: An Open-Source, Battle-Tested Performance Benchmark Suite for Datacenter Workloads
Wei Su, Abhishek Dhanotia, Carlos Torres, Jayneel Gandhi, Neha Gholkar, Shobhit Kanaujia, Maxim Naumov, Kalyan Subramanian, Valentin Andrei, Yifan Yuan, Chunqiang Tang

02:25 PM – 02:45 PM
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Huazuo Gao, Jiashi Li, Liyue Zhang, Panpan Huang, Shanyan Zhou, Shirong Ma, Wenfeng Liang, Ying He, Yuqing Wang, Yuxuan Liu, Y.X. Wei
Session Chair: Hyojin Sung
01:25 PM – 01:45 PM
Avalanche: Optimizing Cache Utilization via Matrix Reordering for Sparse Matrix Multiplication Accelerator
Gwangeun Byeon, Seongwook Kim, Hyungjin Kim, Sukhyun Han, Jinkwon Kim, Prashant Nair, Taewook Kang, Seokin Hong

01:45 PM – 02:05 PM
Debunking the CUDA Myth Towards GPU-based AI Systems: Evaluation of the Performance and Programmability of Intel's Gaudi NPU for AI Model Serving
Yunjae Lee, Juntaek Lim, Jehyeon Bang, Eunyeong Cho, Huijong Jeong, Taesu Kim, Hyungjun Kim, Joonhyung Lee, Jinseop Im, Ranggi Hwang, Se Jung Kwon, Dongsoo Lee, Minsoo Rhu

02:05 PM – 02:25 PM
GPUs All Grown-Up: Fully Device-Driven SpMV Using GPU Work Graphs
Fabian Wildgrube, Pete Ehrett, Paul Trojahn, Richard Membarth, Brad Beckmann, Dominik Baumeister, Matthäus Chajdas

02:25 PM – 02:45 PM
Telos: A Dataflow Accelerator for Sparse Triangular Solver of Partial Differential Equations
Xiaochen Hao, Hao Luo, Chu Wang, Chao Yang, Yun Liang
Session Chair: Yan Solihin
01:25 PM – 01:45 PM
MagiCache: A Virtual In-Cache Computing Engine
Renhao Fan, Yikai Cui, Weike Li, Mingyu Wang, Zhaolin Li

01:45 PM – 02:05 PM
Folded Banks: 3D-Stacked HBM Design for Fine-Grained Random-Access Bandwidth
Vignesh Adhinarayanan, Brad Beckmann, Wantong Li, Mohammad Seyedzadeh, Sergey Blagodurov, Derrick Aguren, Hayden Hyungdong Lee

02:05 PM – 02:25 PM
NMP-PaK: Near-Memory Processing Acceleration of Scalable De Novo Genome Assembly
Heewoo Kim, Sanjay Sri Vallabh Singapuram, Haojie Ye, Joseph Izraelevitz, Trevor Mudge, Ronald Dreslinski, Nishil Talati

02:45 PM – 03:15 PM: Coffee Break & Posters

Location: Research Innovation Center

03:15 PM – 04:55 PM

Session Chair: Augusto Vega
03:15 PM – 03:35 PM
Reconfigurable Stream Network Architecture
Chengyue Wang, Xiaofan Zhang, Jason Cong, James C. Hoe

03:35 PM – 03:55 PM
DS-TPU: Dynamical System for on-Device Lifelong Graph Learning with Nonlinear Node Interaction
Chunshu Wu, Ruibing Song, Chuan Liu, Pouya Haghi, Ang Li, Michael Huang, Tong (Tony) Geng

03:55 PM – 04:15 PM
TRACI: Network Acceleration of Input-Dynamic Communication for Large-Scale Deep Learning Recommendation Model
Guyue Huang, Hao Li, Le Qin, Jiayi Huang, Yangwook Kang, Yufei Ding, Yuan Xie

04:15 PM – 04:35 PM
FlexNeRFer: A Multi-Dataflow, Adaptive Sparsity-Aware Accelerator for On-Device NeRF Rendering
Seock-Hwan Noh, Banseok Shin, Jeik Choi, Seungpyo Lee, Jaeha Kung, Yeseong Kim

04:35 PM – 04:55 PM
BingoGCN: Towards Scalable and Efficient GNN Acceleration with Fine-Grained Partitioning and SLT
Jiale Yan, Hiroaki Ito, Yuta Nagahara, Kazushi Kawamura, Masato Motomura, Thiem Van Chu, Daichi Fujiki
Session Chair: Karthik Swaminathan
03:15 PM – 03:35 PM
Lumina: Real-Time Neural Rendering by Exploiting Computational Redundancy
Yu Feng, Weikai Lin, Yuge Cheng, Zihan Liu, Jingwen Leng, Minyi Guo, Chen Chen, Shixuan Sun, Yuhao Zhu

03:35 PM – 03:55 PM
LightNobel: Improving Sequence Length Limitation in Protein Structure Prediction Model via Adaptive Activation Quantization
Seunghee Han, Soongyu Choi, Joo-Young Kim

03:55 PM – 04:15 PM
MD-pipe: A Strong Scaling Enhanced Pipeline Architecture for Ab Initio Accuracy Molecular Dynamics
Ning Kang, Guojun Yuan, Zihan Yan, Beining Zhang, Boyang Li, Zeyu Li, Shuo Wang, Guanglei Chen, Jiayi Rao, Zhan Wang, Weile Jia, Ninghui Sun, Guangming Tan

04:15 PM – 04:35 PM
InfiniMind: A Learning-Optimized Large-Scale Brain-Computer Interface
Yeongwoo Jang, Daye Jung, Seunghyun Song, Hunjun Lee, Jangwoo Kim
Session Chair: Pradip Bose
03:15 PM – 03:35 PM
Need for zkSpeed: Accelerating HyperPlonk for Zero-Knowledge Proofs
Alhad Daftardar, Jianqiao Mo, Joey Ah-kiow, Benedikt Bünz, Ramesh Karri, Siddharth Garg, Brandon Reagen

03:35 PM – 03:55 PM
Adaptive CHERI Compartmentalization for Heterogeneous Accelerators
Jianyi Cheng, A. Theodore Markettos, Alexandre Joannou, Paul Metzger, Matthew Naylor, Peter Rugg, Timothy M. Jones

03:55 PM – 04:15 PM
Unified Memory Protection with Multi-granular MAC and Integrity Tree for Heterogeneous Processors
Sunho Lee, Seonjin Na, Jeongwon Choi, Jinwon Pyo, Jaehyuk Huh

04:15 PM – 04:35 PM
SpecASan: Mitigating Transient Execution Attacks Using Speculative Address Sanitization
Saber Ganjisaffar, Esmaeil Mohammadian Koruyeh, Jason Zellmer, Hodjat Asghari Esfeden, Chengyu Song, Nael Abu-Ghazaleh

04:55 PM – 05:25 PM: Coffee Break & Posters

Location: Research Innovation Center

05:25 PM – 05:40 PM: Closing (Plenary)

Location: Okuma Auditorium (Main)