The program for workshops and tutorials is available here. Please visit the respective workshop or tutorial webpage for its detailed schedule.



ISCA Opening Reception: June 30, 6:00 PM – 8:00 PM

Location: Buen Ayre

Breakfast: Hilton guests have breakfast included in their room reservations, and most hotels in Buenos Aires also provide breakfast to their guests.
Therefore, ISCA'24 does not include a group breakfast for attendees as part of the program.

Day 1: Monday, July 1

8:45 AM – 9:00 AM: Opening Remarks

Location: Pacífico B

9:00 AM – 10:00 AM: Plenary session

Abstract
The leitmotif of my talk will be the thesis that past advances in computer architecture remain relevant in our field today and will dictate its future. In that context, I will touch upon how current AI accelerators are based on systolic arrays, how today's long-vector processors are influenced by Cray supercomputers, and how past architectural ideas for supporting different data formats are being reused for mixed-precision support in HPC and for layer-by-layer optimization of energy-efficient AI accelerators. The talk will describe some of the related contributions of the UPC Department of Computer Architecture (DAC) to the scientific community, especially in the fields of superscalar and vector processors. I will briefly discuss specific UPC DAC contributions that have been incorporated into current high-performance processors, including supercomputers and accelerators aimed at efficient execution of AI applications. In the second part of my talk, I will describe the current research topics at the Barcelona Supercomputing Center (BSC), as well as the chips designed at BSC. Finally, I will conclude with our vision of how Europe can develop competitive RISC-V-based chips for use in supercomputers and AI accelerators in the coming years.

Speaker
Mateo Valero is the Director of the Barcelona Supercomputing Center. His research focuses on high-performance architecture, and he has published more than 700 articles. His numerous awards include the Eckert-Mauchly, Seymour Cray, Charles Babbage, and Harry H. Goode awards and the ACM Distinguished Service Award; two national research awards; and inclusion in the European ICT Program “Hall of Fame” as one of the 25 most influential European researchers in IT. He was also recognized in the “HPCwire Readers’ Choice Awards” for exceptional leadership in HPC. He is a member of ten academies and has received honorary doctorates (Doctor Honoris Causa) from 11 universities.

10:05 AM – 10:50 AM

Session Chair: Gabriel Loh
10:05 AM – 10:20 AM
GhOST: a GPU Out-of-Order Scheduling Technique for stall reduction
I. Chaturvedi, B. Godala, Y. Wu, Z. Xu, K. Iliakis, P. Eleftherakis, S. Xydis, D. Soudris, T. Sorensen, S. Campanoni, T. Aamodt, D. August

10:20 AM – 10:35 AM
AVM-BTB: Adaptive and Virtualized Multi-level Branch Target Buffer
Y. Liu, X. Li, T. Zhang, T. Liu, Q. Guo, F. Zhang, J. Wang

10:35 AM – 10:50 AM
The Maya Cache: A Storage-efficient and Secure Fully-associative Last-level Cache
A. Bhatla, Navneet, B. Panda

Session Chair: Ulya Karpuzcu
10:05 AM – 10:20 AM
DS-GL: Advancing Graph Learning via Harnessing Nature’s Power within Scalable Dynamical Systems
R. Song, C. Wu, C. Liu, A. Li, M. Huang, T. Geng

10:20 AM – 10:35 AM
ReAIM: A ReRAM-based Adaptive Ising Machine for Solving Combinatorial Optimization Problems
H. Chiang, C. Nien, H. Cheng, K. Huang

10:35 AM – 10:50 AM
Mirage: An RNS-Based Photonic Accelerator for DNN Training
C. Demirkiran, G. Yang, D. Bunandar, A. Joshi

10:50 AM – 11:15 AM: Break

10:50 AM – 11:15 AM: Posters for Sessions 1A, 1B

Location: Foyer Pacífico Central

11:15 AM – 12:45 PM

Session Chair: Josep Torrellas
11:15 AM – 11:30 AM
Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Execution
R. Bera, A. Ranganathan, J. Rakshit, S. Mahto, A. Nori, J. Gaur, A. Olgun, K. Kanellopoulos, M. Sadrosadati, S. Subramoney, O. Mutlu

11:30 AM – 11:45 AM
QuTracer: Mitigating Quantum Gate and Measurement Errors by Tracing Subsets of Qubits
P. Li, J. Liu, A. Gonzales, Z. Saleem, H. Zhou, P. Hovland

11:45 AM – 12:00 PM
Splitwise: Efficient Generative LLM Inference Using Phase Splitting
P. Patel, E. Choukse, C. Zhang, A. Shah, Í. Goiri, S. Maleki, R. Bianchini

12:00 PM – 12:15 PM
HiFi-DRAM: Enabling High-fidelity DRAM Research by Uncovering Sense Amplifiers with IC Imaging
M. Marazzi, T. Sachsenweger, F. Solt, P. Zeng, K. Takashi, M. Yarema, K. Razavi

12:15 PM – 12:30 PM
Mind the Gap: Attainable Data Movement and Operational Intensity Bounds for Tensor Algorithms
Q. Huang, P. Tsai, J. Emer, A. Parashar

12:30 PM – 12:45 PM
A Tale of Two Domains: Exploring Efficient Architecture Design for Truly Autonomous Things
X. Hou, T. Xu, C. Li, C. Xu, J. Liu, Y. Hu, J. Zhao, J. Leng, K. Cheng, M. Guo

12:45 PM – 2:00 PM: Lunch

Location: Pacífico A

1:30 PM – 2:00 PM: Posters for Session 2

Location: Foyer Pacífico Central

2:00 PM – 3:15 PM

Session Chair: Alexandros Daglis
2:00 PM – 2:15 PM
Determining the Minimum Number of Virtual Networks for Different Coherence Protocols
W. Li, N. Oswald, A. Goens, V. Nagarajan, D. Sorin

2:15 PM – 2:30 PM
A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching
J. Tong, A. Itagi, P. Chatarasi, T. Krishna

2:30 PM – 2:45 PM
Waferscale Network Switches
S. Chen, S. Pal, R. Kumar

2:45 PM – 3:00 PM
The Case For Data Centre Hyperloops
G. López-Paradís, I. Hair, S. Kannan, R. Rabbat, P. Murray, A. Lopes, R. Zahedi, W. Zuo, J. Balkind

3:00 PM – 3:15 PM
PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMMs
J. Hong, S. Noh, C. Lim, S. Park, J. Kim, H. Kim, Y. Kim, J. Lee

Session Chair: Huiyang Zhou
2:00 PM – 2:15 PM
Compiler Optimization for Bosonic Quantum Computing
J. Zhou, Y. Liu, Y. Shi, A. Javadi-Abhari, G. Li

2:15 PM – 2:30 PM
Tetris: A Compilation Framework for VQA Applications in Quantum Computing
Y. Jin, Z. Li, F. Hua, T. Hao, H. Zhou, Y. Huang, E. Zhang

2:30 PM – 2:45 PM
Atomique: A Quantum Compiler for Reconfigurable Neutral Atom Arrays
H. Wang, P. Liu, D. Tan, Y. Liu, J. Gu, D. Pan, J. Cong, U. Acar, S. Han

2:45 PM – 3:00 PM
Suppressing Correlated Noise in Quantum Computers via Context-Aware Compiling
A. Seif, H. Liao, V. Tripathi, K. Krsulich, P. Jurcevic, M. Malekakhlagh, A. Javadi-Abhari

3:00 PM – 3:15 PM
A SAT Scalpel for Lattice Surgery
D. Tan, M. Niu, C. Gidney

3:15 PM – 3:45 PM: Break

3:15 PM – 3:45 PM: Posters for Sessions 3A, 3B

Location: Foyer Pacífico Central

3:45 PM – 5:00 PM

Session Chair: Jongse Park
3:45 PM – 4:00 PM
PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models
Y. Lee, H. Kim, M. Rhu

4:00 PM – 4:15 PM
pSyncPIM: Partially Synchronous Execution of Sparse Matrix Operations for All-bank PIM Architectures
D. Baek, S. Hwang, J. Huh

4:15 PM – 4:30 PM
NDSearch: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing
Y. Wang, S. Li, Q. Zheng, L. Song, Z. Li, A. Chang, H. Li, Y. Chen

4:30 PM – 4:45 PM
Enabling Efficient Large Recommendation Model Training with Near CXL Memory Processing
H. Liu, L. Zheng, Y. Huang, J. Zhou, C. Liu, R. Wang, X. Liao, H. Jin, J. Xue

4:45 PM – 5:00 PM
Exploiting Similarity Opportunity of Emerging AI Models on 3D Hybrid Bonding Architecture
Z. Yue, H. Wang, J. Fang, J. Deng, G. Lu, F. Tu, R. Guo, Y. Li, Y. Qin, Y. Wang, C. Li, H. Han, S. Wei, Y. Hu, S. Yin

Session Chair: Anand Sivasubramaniam
3:45 PM – 4:00 PM
ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models
Y. Choi, J. Kim, M. Rhu

4:00 PM – 4:15 PM
Derm: SLA-aware Resource Management for Highly Dynamic Microservices
L. Chen, S. Luo, C. Lin, Z. Mo, H. Xu, K. Ye, C. Xu

4:15 PM – 4:30 PM
SmartOClock: Workload- and Risk-Aware Overclocking in the Cloud
J. Stojkovic, P. Misra, Í. Goiri, S. Whitlock, E. Choukse, M. Das, C. Bansal, J. Lee, Z. Sun, H. Qiu, R. Zimmermann, S. Samal, B. Warrier, A. Raniwala, R. Bianchini

4:30 PM – 4:45 PM
Designing Cloud Servers for Lower Carbon
J. Wang, D. Berger, F. Kazhamiaka, C. Irvene, C. Zhang, E. Choukse, K. Frost, R. Fonseca, B. Warrier, C. Bansal, J. Stern, R. Bianchini, A. Sriraman

4:45 PM – 5:00 PM
EcoFaaS: Rethinking the Design of Serverless Environments for Energy Efficiency
J. Stojkovic, N. Iliakopoulou, T. Xu, H. Franke, J. Torrellas

5:00 PM – 5:30 PM: Break

5:00 PM – 5:30 PM: Posters for Sessions 4A, 4B

Location: Foyer Pacífico Central

5:30 PM – 6:30 PM

Session Chair: Matt Sinclair
5:30 PM – 5:45 PM
AIO: An Abstraction for Performance Analysis Across Diverse Accelerator Architectures
J. Rogers, T. Soliman, M. Jahre

5:45 PM – 6:00 PM
FireAxe: Partitioned FPGA-Accelerated Simulation of Large-Scale RTL Designs
J. Whangbo, E. Lim, C. Zhang, K. Anderson, A. Gonzalez, R. Gupta, N. Krishnakumar, S. Karandikar, B. Nikolic, Y. Shao, K. Asanovic

6:00 PM – 6:15 PM
Harpocrates: Breaking the Silence of CPU Faults through Hardware-in-the-Loop Program Generation
N. Karystinos, O. Chatzopoulos, G. Fragkoulis, G. Papadimitriou, D. Gizopoulos, S. Gurumurthi

6:15 PM – 6:30 PM
The Dataflow Abstract Machine Simulator Framework
N. Zhang, R. Lacouture, G. Sohn, P. Mure, Q. Zhang, F. Kjolstad, K. Olukotun

Session Chair: Aporva Amarnath
5:30 PM – 5:45 PM
Tartan: Microarchitecting a Robotic Processor
M. Bakhshalipour, P. Gibbons

5:45 PM – 6:00 PM
Collision Prediction for Robotics Accelerators
D. Shah, T. Aamodt

6:00 PM – 6:15 PM
BLESS: Bandwidth and Locality Enhanced SMEM Seeding Acceleration for DNA Sequencing
S. Han, S. Moon, T. Suh, J. Heo, J. Kim

6:15 PM – 6:30 PM
QUETZAL: Vector Acceleration Framework For Modern Genome Sequence Analysis
J. Pavon, I. Valdivieso, C. Morales, C. Hernandez, M. Aslan, J. Lindegger, Y. Yuan, R. Bagué, M. Alser, O. Mutlu, S. Marco-Sola, O. Ergin, N. Talati, M. Valero, O. Unsal, A. Cristal

6:30 PM – 7:00 PM: Break

6:30 PM – 7:00 PM: Posters for Sessions 5A, 5B

Location: Foyer Pacífico Central

7:00 PM – 8:00 PM: Business Meeting

Location: Pacífico B


Day 2: Tuesday, July 2

9:00 AM – 10:00 AM: Plenary session

Abstract
With the exponential growth of data, the widespread use of AI, and the digitization of everything, we've seen an increase in the frequency and sophistication of cyberattacks. While the industry has responded with mitigations and frequent patches, security threats appear to be outpacing solutions. This talk will focus on novel, long-term approaches and architectural enhancements to improve the trustworthiness of hardware platforms. Topics include responsibly securing AI models, increasing security in the post-quantum era, and achieving the pinnacle of data privacy with encrypted computing.

Speaker
Sridhar R. Iyengar is a vice president at Intel Labs and the director of security and privacy research at Intel Corporation. He is responsible for innovations in security and privacy that differentiate Intel products and establish trustworthiness as a fundamental value on all Intel platforms. His areas of research include new security architectures and solutions to protect confidentiality, integrity, identity, and privacy. He earned a bachelor's degree in electrical engineering from the Indian Institute of Technology, Madras, India, and a master's degree in computer science from the University of Wisconsin, Madison.

10:05 AM – 11:20 AM

Session Chair: Mingyu Gao
10:05 AM – 10:20 AM
HAL: Hardware-assisted Load Balancing for Energy-efficient SNIC-Host Cooperative Computing
J. Huang, J. Lou, S. Vanavasam, X. Kong, H. Ji, I. Jeong, E. Lee, D. Zhuo, N. Kim

10:20 AM – 10:35 AM
NDPBridge: Enabling Cross-Bank Coordination in Near-DRAM-Bank Processing Architectures
B. Tian, Y. Li, L. Jiang, S. Cai, M. Gao

10:35 AM – 10:50 AM
UM-PIM: DRAM-based PIM with Uniform & Shared Memory Space
Y. Zhao, M. Gao, F. Liu, Y. Hu, Z. Wang, H. Lin, J. Li, H. Xian, H. Dong, T. Yang, N. Jing, X. Liang, L. Jiang

10:50 AM – 11:05 AM
MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing
N. Ghiasi, M. Sadrosadati, H. Mustafa, A. Gollwitzer, C. Firtina, J. Eudine, H. Mao, J. Lindegger, M. Cavlak, M. Alser, J. Park, O. Mutlu

11:05 AM – 11:20 AM
On Error Correction for Nonvolatile PiM
H. Cilasun, S. Resch, Z. Chowdhury, M. Zabihi, Y. Lv, B. Zink, J. Wang, S. Sapatnekar, U. Karpuzcu

Session Chair: Dimitrios Skarlatos
10:05 AM – 10:20 AM
MetaLeak: Uncovering Side Channels in Secure Processor Architectures Exploiting Metadata
M. Chowdhuryy, H. Zheng, F. Yao

10:20 AM – 10:35 AM
sNPU: Trusted Execution Environments on Integrated NPUs
E. Feng, D. Feng, D. Du, Y. Xia, H. Chen

10:35 AM – 10:50 AM
Counter-light Memory Encryption
X. Wang, W. Xiong, J. Kotra, A. Jones, X. Jian

10:50 AM – 11:05 AM
Perspective: A Principled Framework for Pliable and Secure Speculation in Operating Systems
T. Kim, D. Rudo, K. Zhao, Z. Zhao, D. Skarlatos

11:05 AM – 11:20 AM
HEAP: A Fully Homomorphic Encryption Accelerator with Parallelized Bootstrapping
R. Agrawal, A. Chandrakasan, A. Joshi

Session Chair: Avi Mendelson
10:05 AM – 10:20 AM
SPD: Open-Source RISC-V Manycore with Scalable Resource Organization
D. Jung, M. Ruttenberg, P. Gao, S. Davidson, D. Petrisko, K. Li, A. Kamath, L. Cheng, S. Xie, P. Pan, Z. Zhao, Z. Yue, B. Veluri, S. Muralitharan, A. Sampson, A. Lumsdaine, Z. Zhang, C. Batten, M. Oskin, D. Richmond, M. Taylor

10:20 AM – 10:35 AM
HADES: Hardware-Assisted Distributed Transactions in the Age of Fast Networks & SmartNICs
A. Kokolis, A. Psistakis, B. Reidys, J. Huang, J. Torrellas

10:35 AM – 10:50 AM
BlitzCoin: Fully Decentralized Hardware Power Management for Accelerator-Rich SoCs
M. Cochet, K. Swaminathan, E. Loscalzo, J. Zuckerman, M. Santos, D. Giri, A. Buyuktosunoglu, T. Jia, D. Brooks, G. Wei, K. Shepard, L. Carloni, P. Bose

10:50 AM – 11:05 AM
Exploring System-Aware Parallelization for Efficient Large-Scale Machine Learning MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems
S. Hsia, A. Golden, B. Acun, N. Ardalani, Z. DeVito, G. Wei, D. Brooks, C. Wu

11:05 AM – 11:20 AM
Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs
Y. Feng, S. Na, H. Kim, H. Jeon

11:20 AM – 11:45 AM: Break

11:20 AM – 11:45 AM: Posters for Sessions 6A, 6B, 6C

Location: Foyer Pacífico Central

11:45 AM – 12:45 PM

Session Chair: John Carter
11:45 AM – 12:00 PM
Intel Accelerator Ecosystem: An SoC-Oriented Perspective
Y. Yuan, R. Wang, N. Ranganathan, N. Rao, S. Kumar, P. Lantz, V. Sanjeepan, J. Cabrera, A. Kwatra, R. Sankaran, I. Jeong, N. Kim

12:00 PM – 12:15 PM
Circular Reconfigurable Parallel Processor for Edge Computing
Y. Li, J. Zhu, Y. Fu, Y. Lei, T. Nagata, R. Braidwood, H. Fu, J. Zheng, W. Luk, H. Fan

12:15 PM – 12:30 PM
Realizing the AMD Exascale Heterogeneous Processor Vision
A. Smith, G. Loh, M. Schulte, M. Ignatowski, S. Naffziger, M. Mantor, M. Fowler, N. Kalyanasundharam, V. Alla, N. Malaya, J. Greathouse, E. Chapman, R. Swaminathan

12:30 PM – 12:45 PM
TCP: A Tensor Contraction Processor for AI Workloads
H. Kim, Y. Choi, J. Park, B. Bae, H. Jeong, S. Lee, J. Yeon, M. Kim, C. Park, B. Gu, C. Lee, J. Bae, S. Bae, Y. Cha, W. Choe, J. Choi, J. Ha, H. Han, N. Hwang, S. Hwang, K. Jang, H. Je, H. Jeon, J. Jeon, H. Jeong, Y. Jung, D. Kang, H. Kim, M. Kim, M. Kim, S. Kim, S. Kim, W. Kim, Y. Kim, Y. Kim, Y. Ku, J. Lee, J. Lee, K. Lee, S. Lee, M. Noh, H. Oh, G. Park, S. Park, J. Seo, J. Seong, J. Paik, N. Lopes, S. Yoo

12:45 PM – 2:45 PM: Awards Lunch

Location: Pacífico A

2:45 PM – 3:45 PM

Session Chair: Tushar Krishna
2:45 PM – 3:00 PM
Cambricon-D: Full-Network Differential Acceleration for Diffusion Models
W. Kong, Y. Hao, Y. Zhao, X. Song, X. Li, M. Zou, R. Zhang, C. Liu, Y. Wen, P. Jin, X. Hu, W. Li, Z. Du, Q. Guo, Z. Xu, T. Chen

3:00 PM – 3:15 PM
Flagger: Cooperative Acceleration for Large-Scale Cross-Silo Federated Learning Aggregation
X. Pan, Y. An, S. Liang, B. Mao, M. Zhang, Q. Li, M. Jung, J. Zhang

3:15 PM – 3:30 PM
Trapezoid: A Versatile Accelerator for Dense and Sparse Matrix Multiplications
Y. Yang, J. Emer, D. Sanchez

3:30 PM – 3:45 PM
NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator
K. Shivdikar, N. Agostini, M. Jayaweera, G. Jonatan, J. Abellán, A. Joshi, J. Kim, D. Kaeli

Session Chair: Jaewoong Sim
2:45 PM – 3:00 PM
Compiler-Directed Whole-System Persistence
J. Zeng, T. Zhang, C. Jung

3:00 PM – 3:15 PM
Memento: An Adaptive, Compiler-Assisted Register File Cache for GPUs
M. Shoushtary, J. Arnau, J. Murgadas, A. Gonzalez

3:15 PM – 3:30 PM
Soter: Analytical Tensor-Architecture Modeling and Automatic Tensor Program Tuning for Spatial Accelerators
F. Wang, M. Shen, Y. Ding, N. Xiao

3:30 PM – 3:45 PM
ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
Y. Zhao, D. Wu, J. Wang

3:45 PM – 4:15 PM: Break

3:45 PM – 4:15 PM: Posters for Sessions 8A, 8B

Location: Foyer Pacífico Central

4:15 PM – 5:15 PM

Title
Panel: Designing Computer Systems for Sustainability

Panel Description
Digital technologies have enabled a plethora of new applications. The dramatic increase in the amount of compute per person has unlocked significant economic growth and improved the quality of many aspects of our lives. Despite these societal benefits, as computing becomes increasingly ubiquitous, so does its environmental footprint. This panel provides a space to examine the environmental implications of computing, including its ever-increasing energy use and greenhouse gas (GHG) emissions. We will look holistically at the environmental footprint of computing across its entire lifecycle and discuss how balancing operational and embodied carbon opens new directions and development opportunities for a sustainable hardware-software ecosystem. What tools and metrics do we need to enable low-carbon computer systems by making environmental sustainability a primary design principle? Can we, as an industry and research community, do more to reduce the environmental impact of computers, and if so, how? What underinvested research directions should the community focus on to build environmentally sustainable computer systems for the decades to come?

Moderator
Carole-Jean Wu (Meta)

Panelists
Tamar Eilam (IBM Research), Babak Falsafi (EPFL / SDEA), Gage Hills (Harvard University), Bobbie Manne (AMD)

Bio
Tamar Eilam is an IBM Fellow and Chief Scientist for Sustainable Computing at the IBM T. J. Watson Research Center, New York. Tamar leads research aimed at drastically reducing the carbon footprint of computing across infrastructure, systems, software, data, and AI. Tamar completed a Ph.D. in Computer Science at the Technion, Israel, in 2000. She joined the IBM T. J. Watson Research Center in New York as a Research Staff Member that same year and was recognized as an IBM Fellow in 2014.

Babak Falsafi is a Professor in the School of Computer and Communication Sciences, the founding president of the Swiss Datacenter Efficiency Association (SDEA), an industrial/academic consortium certifying full-stack efficiency and emissions in datacenter operation, and the founder of EcoCloud, a research center at EPFL investigating sustainable information technology since 2012. He has made numerous contributions to cloud-native technologies, including a workload-optimized CPU design that laid the foundation for the first generation of Cavium ARM server CPUs, ThunderX. He is a recipient of an Alfred P. Sloan Research Fellowship and a Fellow of ACM and IEEE.

Gage Hills is an Assistant Professor of Electrical Engineering in Harvard’s School of Engineering and Applied Sciences (SEAS), where he leads the Nano-Design Research Group. His research focuses on developing energy-efficient and environmentally sustainable computing systems, by combining new technology advances across nanomaterials, devices, sensors, circuits, architectures, and integration techniques. Before Harvard, he finished his PhD at Stanford in 2018 and spent a few years as a post-doc at MIT.

Srilatha (Bobbie) Manne received her PhD in 1999 and has spent over two decades as a computer architect working at the intersection of power and performance. She has worked in industrial research labs and on product teams at companies such as Intel, AMD, Cavium, and Microsoft. She has over 30 publications and over 40 patents pending or granted. She is also active in the academic community, serving on the program committees (PCs) of all major architecture conferences and as the General Chair of ISCA 2019. She is currently a Senior Fellow at AMD Research and Advanced Development (RAD), analyzing efficiency and sustainability issues in CPU and GPU designs.

Carole-Jean Wu is a Director at Meta. She is a founding member and a Vice President of MLCommons, a non-profit organization that aims to accelerate machine learning for the benefit of all. Prior to Meta/Facebook, she was a tenured professor at ASU. Dr. Wu's work spans datacenter infrastructure and edge systems, with an emphasis on efficiency and sustainability. Her work has been recognized with IEEE Micro Top Picks and ACM/IEEE Best Paper Awards, as well as the prestigious NSF CAREER Award. She earned her PhD from Princeton.

7:00 PM – 10:00 PM: Gala Dinner & Show

Journey through the glory days of Buenos Aires with a memorable tango show and gala dinner at Tango Porteño, one of Argentina's most renowned tango venues. Shine your shoes!



Day 3: Wednesday, July 3

9:00 AM – 10:00 AM: Plenary session

Abstract
The compute demands of AI and robotics continue to rise due to the rapidly growing volume of data to be processed, increasingly complex algorithms for higher-quality results, and the demand for energy efficiency and real-time performance. In this talk, we will discuss the design of efficient, tailored hardware accelerators and the co-design of algorithms and hardware to reduce energy consumption while delivering fast, real-time, and robust performance for applications including deep neural networks, data analytics with sparse tensor algebra, and autonomous navigation. Throughout the talk, we will highlight important design principles, methodologies, and tools that can facilitate an effective design process, as well as various forms of co-design that can broaden the design space.

Speaker
Vivienne Sze is a Professor in the Electrical Engineering and Computer Science Department at MIT. She works on computing systems that enable energy-efficient machine learning, computer vision, and video compression/processing for a wide range of applications, including autonomous navigation, digital health, and the internet of things. Her work has been recognized by various awards, including faculty awards from Google, Facebook, and Qualcomm, the Symposium on VLSI Circuits Best Student Paper Award, the IEEE Custom Integrated Circuits Conference Outstanding Invited Paper Award, the IEEE Micro Top Picks Award, and the International Symposium on Performance Analysis of Systems and Software Best Paper Award. As a member of the Joint Collaborative Team on Video Coding, she received the Primetime Engineering Emmy Award for the development of the High-Efficiency Video Coding video compression standard. She is a co-editor of High Efficiency Video Coding (HEVC): Algorithms and Architectures (Springer, 2014) and co-author of Efficient Processing of Deep Neural Networks (Synthesis Lectures on Computer Architecture, Morgan & Claypool, 2020). For more information about Prof. Sze's research, please visit http://sze.mit.edu.

10:05 AM – 11:20 AM

Session Chair: Sumanth Gudaparthi
10:05 AM – 10:20 AM
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
R. Hwang, J. Wei, S. Cao, C. Hwang, X. Tang, T. Cao, M. Yang

10:20 AM – 10:35 AM
MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition
Y. Qin, Y. Wang, Z. Zhao, X. Yang, Y. Zhou, S. Wei, Y. Hu, S. Yin

10:35 AM – 10:50 AM
Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization
J. Lee, W. Lee, J. Sim

10:50 AM – 11:05 AM
Heterogeneous Acceleration Pipeline for Recommendation System Training
M. Adnan, Y. Maboud, D. Mahajan, P. Nair

11:05 AM – 11:20 AM
LLMCompass: Enabling Efficient Hardware Design for Large Language Model Inference
H. Zhang, A. Ning, R. Prabhakar, D. Wentzlaff

Session Chair: Xun Jian
10:05 AM – 10:20 AM
DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands
H. Nam, S. Baek, M. Wi, M. Kim, J. Park, C. Song, N. Kim, J. Ahn

10:20 AM – 10:35 AM
(MC)²: Lazy MemCopy at the Memory Controller
A. Kamath, S. Peter

10:35 AM – 10:50 AM
DyLeCT: Achieving Huge-page-like Translation Performance For Hardware-compressed Memory
G. Panwar, M. Laghari, E. Choukse, X. Jian

10:50 AM – 11:05 AM
Native DRAM Cache: Re-architecting DRAM as a Large-Scale Cache for Data Centers
Y. Ryu, Y. Kim, G. Jung, J. Ahn, J. Kim

11:05 AM – 11:20 AM
PrIDE: Achieving Secure Rowhammer Mitigation with Low-Cost In-DRAM Trackers
A. Jaleel, G. Saileshwar, S. Keckler, M. Qureshi

11:20 AM – 11:45 AM: Break

11:20 AM – 11:45 AM: Posters for Sessions 9A, 9B, 10A, 10B

Location: Foyer Pacífico Central

11:45 AM – 1:00 PM

Session Chair: Gilles Pokam
11:45 AM – 12:00 PM
A New Formulation of Neural Data Prefetching
Q. Duong, A. Jain, C. Lin

12:00 PM – 12:15 PM
UDP: Utility-Driven Fetch Directed Instruction Prefetching
S. Oh, M. Xu, T. Khan, B. Kasikci, H. Litz

12:15 PM – 12:30 PM
Triangel: A High-Performance, Accurate, Timely, On-Chip Temporal Prefetcher
S. Ainsworth, L. Mukhanov

12:30 PM – 12:45 PM
Alternate Path Fetch
A. Deshmukh, C. Cai, Y. Patt

12:45 PM – 1:00 PM
Alternate Path µ-op Cache Prefetching
S. Singh, A. Perais, A. Jimborean, A. Ros

Session Chair: Bahar Asgari
11:45 AM – 12:00 PM
DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics
Y. Kim, C. Oh, J. Hwang, W. Kim, S. Oh, Y. Lee, H. Sharma, A. Yazdanbakhsh, J. Park

12:00 PM – 12:15 PM
BlissCam: Boosting Eye Tracking Efficiency with Learned In-Sensor Sparse Sampling
Y. Feng, T. Ma, Y. Zhu, X. Zhang

12:15 PM – 12:30 PM
BitNN: A Bit-Serial Accelerator for K-Nearest Neighbor Search in Point Clouds
M. Han, L. Wang, L. Xiao, H. Zhang, T. Cai, J. Xu, Y. Wu, C. Zhang, X. Xu

12:30 PM – 12:45 PM
Cicero: Real-Time Neural Rendering by Radiance Warping and Memory Optimizations
Y. Feng, Z. Liu, J. Leng, M. Guo, Y. Zhu

12:45 PM – 1:00 PM
GameStreamSR: Enabling Neural-Augmented Game Streaming on Commodity Mobile Platforms
S. Bhuyan, Z. Ying, M. Kandemir, M. Gowda, C. Das