Technology Vendors - MemCon

MemCon
March 2025
Silicon Valley, CA

Why Should Technology Vendors Attend MemCon 2025?

We attract technology vendors from across the edge infrastructure, emerging technology, processor/chip and memory/storage/interconnect sectors, including companies such as VMware, Broadcom, Cisco, Intel, AMD, Micron and Rambus. These vendors come together to:

  • Understand end-user requirements from AI vendors, hyperscalers and system vendors

  • Understand future architectures to further improve their technologies

  • Find new pools of customers

If you'd like to find out more about attending as a technology vendor, register your interest here

CONFIRM YOUR PLACE HERE

Featured Speakers Include

Zaid Kahn

VP & GM, Cloud AI & Advanced Systems
Microsoft

Zaid is currently a GM in Cloud Hardware Infrastructure Engineering at Microsoft, where he leads a team focused on advanced architecture and engineering efforts for AI. He is passionate about building balanced teams of artists and soldiers that solve incredibly difficult problems at scale.

Prior to Microsoft, Zaid was head of infrastructure engineering at LinkedIn, responsible for all aspects of engineering for datacenters, compute, networking, storage and hardware. He also led several software development teams spanning BMC, network operating systems, and server and network fleet automation, as well as SDN efforts inside the datacenter and across the global backbone and edge. He introduced the concept of disaggregation inside LinkedIn and pioneered JDM with multiple vendors through key initiatives like OpenSwitch and Open19, essentially taking control of LinkedIn's hardware destiny. During his nine-year tenure at LinkedIn, his team scaled the network and systems 150x as membership grew from 50M to 675M, with someone hired through the LinkedIn platform every 7 seconds.

Prior to LinkedIn, Zaid was a network architect at WebEx, responsible for building the MediaTone network, and he later founded a startup that built a pattern-recognition security chip using NPU/FPGA technology. Zaid holds several patents in networking and SDN and is a recognized industry leader. He previously served as a board member of the Open19 Foundation and the San Francisco chapter of the Internet Society, and currently serves on the DE-CIX and Pensando advisory boards.

Petr Lapukhov

Network Engineer
NVIDIA

Petr Lapukhov has 20+ years in the networking industry designing and operating large-scale networks. He has deep experience developing and operating software for network control and monitoring. His past experience includes CCIE/CCDE training and UNIX system administration.

Tejas Chopra

Senior Software Engineer
Netflix

Tejas Chopra is a Senior Engineer at Netflix working on the Machine Learning Platform for Netflix Studios, and a founder of GoEB1, the world's first and only thought-leadership platform for immigrants. Tejas is a recipient of the prestigious EB1A (Einstein) visa in the US. He is a Tech 40 Under 40 Award winner, a TEDx speaker, a Senior IEEE Member and an ACM member, and has spoken at conferences and panels on cloud computing, blockchain, software development and engineering leadership. Tejas has been awarded the International Achievers Award 2023 by the Indian Achievers' Forum. He is an Adjunct Professor for Software Development at the University of Advancing Technology, Arizona, an angel investor and a startup advisor to startups like Nillion. He is also a member of the Advisory Board for Flash Memory Summit. Tejas' experience spans companies like Box, Apple, Samsung, Cadence and Datrium. He holds a master's degree in ECE from Carnegie Mellon University, Pittsburgh.

Galen Shipman

Computer Scientist
Los Alamos National Laboratory

Galen Shipman is a computer scientist at Los Alamos National Laboratory (LANL). His interests include programming models, scalable runtime systems and I/O. As Chief Architect, he leads the architecture and technology of Advanced Technology Systems (ATS) at LANL. He has led performance engineering across LANL's multi-physics integrated codes and the advancement and integration of next-generation programming models such as the Legion programming system as part of LANL's next-generation code project, Ristra. His work in storage systems and I/O currently focuses on composable microservices as part of the Mochi project. His prior work in scalable software for HPC includes major contributions to broadly used technologies, including the Lustre parallel file system and Open MPI.

Agenda Highlights


Opening Keynote: How Data and Workloads are Changing the Design of Systems, Clusters and Datacenters

Zaid Kahn

VP & GM, Cloud AI & Advanced Systems
Microsoft

Memory Optimizations for Machine Learning

As Machine Learning continues to forge its way into diverse industries and applications, optimizing computational resources, particularly memory, has become a critical aspect of effective model deployment. This session, "Memory Optimizations for Machine Learning," aims to offer an exhaustive look into the specific memory requirements in Machine Learning tasks and the cutting-edge strategies to minimize memory consumption efficiently.
We'll begin by demystifying the memory footprint of typical machine learning data structures and algorithms, elucidating the nuances of memory allocation and deallocation during model training. The talk will then focus on memory-saving techniques such as data quantization, model pruning and efficient mini-batch selection, which conserve memory without significantly degrading model performance.
We'll also present insights into how memory usage can be optimized across various hardware setups, from CPUs and GPUs to custom ML accelerators.
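
As a rough, hedged illustration of one technique the session names, the sketch below applies post-training dynamic quantization in PyTorch, storing Linear-layer weights as int8 rather than fp32. The toy model and layer sizes are placeholders, not taken from the talk.

```python
# Hedged sketch: post-training dynamic quantization in PyTorch.
# The toy model below is a placeholder, not from the session.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))

def param_megabytes(m: nn.Module) -> float:
    return sum(p.numel() * p.element_size() for p in m.parameters()) / 1e6

print(f"fp32 weights: {param_megabytes(model):.2f} MB")

# Replace Linear layers with int8-weight equivalents; activations are
# quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# int8 weights occupy roughly a quarter of the fp32 footprint, typically
# at a small accuracy cost for many models.
out = quantized(torch.randn(1, 1024))  # inference works as before
print(out.shape)  # torch.Size([1, 10])
```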

Tejas Chopra

Senior Software Engineer
Netflix

Indirect/Irregular Workloads within Large Simulations and How to Improve Access through Co-Design

Los Alamos National Laboratory (LANL) has a diverse set of High Performance Computing codes. Analysis of many of these codes indicates they are heavily memory bound, with sparse memory accesses. High Bandwidth Memory (HBM) has proven a significant advancement in improving the performance of these codes, but the roadmap for major (step-function) improvements in memory technologies is unclear. Addressing this challenge will require a renewed focus on high-performance memory and processor technologies that take a more aggressive and holistic view of advancements in ISA, microarchitecture and memory controller technologies. Beyond scientific simulations, advancements in the performance of sparse memory accesses will benefit graph analysis, DLRM inference and database workloads.
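
To make "sparse memory accesses" concrete, here is a small illustrative sketch (ours, not from the talk) of the indirect gather pattern that leaves such codes memory-bound: the arithmetic is identical in both cases, but irregular indexing defeats caches and hardware prefetchers.

```python
# Illustrative sketch of an indirect (gather) access pattern; the sizes and
# random indices are placeholders, not taken from LANL's codes.
import numpy as np

n = 10_000_000
values = np.random.rand(n)
index = np.random.permutation(n)  # irregular indirection, e.g. an unstructured mesh

contiguous = values.sum()        # streaming access: prefetch- and cache-friendly
gathered = values[index].sum()   # same FLOPs, but scattered loads dominate runtime

assert np.isclose(contiguous, gathered)
```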

Galen Shipman

Computer Scientist
Los Alamos National Laboratory

How to Improve Data Movement using Accelerated Networks? (CXL, PCIe, InfiniBand, Ethernet, Optical)

Stephen Bates

VP & Chief Architect, Emerging Storage Systems
Huawei

Stephen is the VP and Chief Architect of Emerging Storage Systems at Huawei's Toronto Emerging Storage Lab. He and his team research all aspects of next-generation storage systems, from media to programming interfaces to filesystems to virtualized storage to applications.

Stephen is an expert in performance storage, persistent and non-volatile memory, computer networking, signal processing and error correction coding. He is also very active in both the SNIA and NVM Express standards bodies.

Prior to Huawei, he was the CTO of Eideticom, a pioneer in NVMe-based computational storage. He was also formerly in the CTO office at PMC-Sierra, an Assistant Professor at the University of Alberta and a Principal Engineer at Massana Inc. Stephen has a PhD from the University of Edinburgh and is a Senior Member of the IEEE.

Paul Crumley

Senior Technical Staff Member
IBM Research

Paul G. Crumley, a Senior Technical Staff Member at IBM Research, enjoys creating systems to solve problems beyond the reach of current technology.

Paul's current project integrates secure, compliant AI capabilities with enterprise Hybrid Cloud, allowing clients to extract new business value from their data.

Paul's previous work includes the design and construction of distributed and high-performance computing systems at CMU, Transarc and IBM Research. Projects include the Andrew Project at CMU, ASCI White, IBM Global Storage Architecture, Blue Gene supercomputers, IBM Cloud and IBM Cognitive Systems. Paul has managed data centers and brings first-hand knowledge of these environments, combined with experience in automation and robustness, to the design of AI for Hybrid Cloud infrastructure.

Memory Optimizations for Large Language Models: From Training to Inference

Large Language Models (LLMs) have revolutionized natural language processing but have posed significant challenges in training and inference due to their enormous memory requirements. In this talk, we delve into techniques and optimizations to mitigate memory constraints across the entire lifecycle of LLMs.

The first segment explores memory-optimized LLM training. We discuss training challenges and cover techniques under Parameter-Efficient Fine-Tuning (PEFT), such as prompt tuning, LoRA and adapters; a sketch of the LoRA idea follows below.
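
As a hedged sketch of the LoRA idea referenced above (toy dimensions and hyperparameters of our choosing, not the talk's), the base weight is frozen and only a low-rank update is trained, which sharply reduces the gradient and optimizer-state memory needed during fine-tuning.

```python
# Minimal LoRA-style layer: freeze the pretrained weight W and learn a
# low-rank update B @ A, so only r * (d_in + d_out) parameters train.
# Dimensions and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)        # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(1024, 1024)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} of {total} parameters")  # ~1.5% in this toy case
```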

LLM inference is more memory-bound than compute-bound. In this section we explore inference optimizations, mostly for transformer architectures, such as paged key-value (KV) caching, speculative decoding, quantization, in-flight batching strategies and FlashAttention, each contributing to enhanced inference speed and efficiency; the sketch below shows the KV cache these techniques revolve around.
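
To ground the KV-cache-centric techniques listed above, here is a hedged single-head sketch (placeholder dimensions, no learned projections) of why the cache exists and why its memory grows with sequence length.

```python
# Toy single-head attention decode loop with a KV cache. Real models use
# learned W_q/W_k/W_v projections and many heads; this sketch only shows
# why cached K/V tensors grow linearly with the generated sequence.
import torch

d_model = 512
k_cache: list[torch.Tensor] = []
v_cache: list[torch.Tensor] = []

def decode_step(x_new: torch.Tensor) -> torch.Tensor:
    """Attend the newest token over all cached keys/values."""
    k_cache.append(x_new)           # placeholder: identity "projections"
    v_cache.append(x_new)
    K = torch.stack(k_cache)        # (seq_len, d_model)
    V = torch.stack(v_cache)
    attn = torch.softmax(x_new @ K.T / d_model**0.5, dim=-1)
    return attn @ V

for _ in range(4):                  # toy decoding loop
    out = decode_step(torch.randn(d_model))

cached = sum(t.numel() * t.element_size() for t in k_cache + v_cache)
print(f"KV cache: {cached} bytes after 4 tokens")  # grows with every token
# Paged KV caching allocates this memory in fixed-size blocks to curb
# fragmentation; offloading moves cold cache pages out of GPU memory.
```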

Finally, we explore the concept of coherent memory and how it helps with inference optimization through KV cache offloading and LoRA weight recomputation.

By illuminating these advancements, this talk aims to provide a comprehensive understanding of state-of-the-art memory optimization techniques for LLMs, empowering practitioners to push the boundaries of natural language processing further.

Arun Raman

Deep Learning Solutions Architect
NVIDIA

Arun Raman is an AI solution architect at NVIDIA, adept at navigating the intricate challenges of deploying AI applications across edge, cloud and on-premises environments within the consumer internet industry. In his current role, he designs end-to-end accelerated AI pipelines for consumer internet customers, meticulously addressing preprocessing, training and inference optimizations. His experience extends beyond AI to distributed systems and multi-cloud infrastructure. He shares practical strategies and real-world experiences, empowering organizations to leverage AI effectively.
