Hyperscalers - MemCon

MemCon
March 2025
Silicon Valley, CA

Why Should Hyperscalers Attend MemCon 2025?

We attract hyperscalers from the likes of Google, Amazon, Meta, Microsoft, and Alibaba as they come together to:

  • Understand emerging technologies - meet start-ups and technology vendors releasing new products, and learn how these products can improve their systems architecture.

  • Understand end-user issues - learn from end-users of the technology (AI vendors and enterprises) about the specific issues they face when implementing and running applications.

  • Connect with technology vendors and research labs - create partnerships with a range of tech vendors and research labs.

If you'd like to find out more about attending as an AI vendor, register your interest here.

CONFIRM YOUR PLACE HERE

Featured Speakers Include

Zaid Kahn

VP & GM, Cloud AI & Advanced Systems
Microsoft

Zaid is currently a GM in Cloud Hardware Infrastructure Engineering, where he leads a team focused on advanced architecture and engineering efforts for AI. He is passionate about building balanced teams of artists and soldiers that solve incredibly difficult problems at scale.

Prior to Microsoft, Zaid was head of infrastructure engineering at LinkedIn, responsible for all aspects of engineering for datacenters, compute, networking, storage, and hardware. He also led several software development teams spanning BMC, network operating systems, server and network fleet automation, and SDN efforts inside the datacenter and across the global backbone and edge. He introduced the concept of disaggregation inside LinkedIn and pioneered JDM with multiple vendors through key initiatives like OpenSwitch and Open19, effectively giving LinkedIn control of its own hardware destiny. During his 9-year tenure at LinkedIn, his team scaled the network and systems 150x as membership grew from 50M to 675M, with someone hired every 7 seconds on the LinkedIn platform.

Prior to LinkedIn, Zaid was a network architect at WebEx, responsible for building the MediaTone network; he later founded a startup that built a pattern-recognition security chip using NPUs/FPGAs. Zaid holds several patents in networking and SDN and is a recognized industry leader. He previously served as a board member of the Open19 Foundation and the San Francisco chapter of the Internet Society. He currently serves on the DE-CIX and Pensando advisory boards.

Tirthankar Lahiri

SVP, Data & In-Memory Technologies
Oracle

Tirthankar Lahiri is Senior Vice President of the Data and In-Memory Technologies group for Oracle Database and is responsible for the Oracle Database engine (including Database In-Memory, data and indexes, space management, transactions, and the Database File System), the Oracle TimesTen In-Memory Database, and Oracle NoSQL Database. Tirthankar has 22 years of experience in the database industry and has worked extensively in a variety of areas including manageability, performance, scalability, high availability, caching, distributed concurrency control, in-memory data management, and NoSQL architectures. He has 27 issued patents and several pending in these areas. Tirthankar holds a B.Tech in Computer Science from the Indian Institute of Technology, Kharagpur, and an MS in Electrical Engineering from Stanford University.

Petr Lapukhov

Network Engineer
NVIDIA

Petr Lapukhov is a Network Engineer at NVIDIA. He has 20+ years in the networking industry designing and operating large-scale networks, with deep experience developing and operating software for network control and monitoring. His past experience includes CCIE/CCDE training and UNIX system administration.

Brett Dodds

Senior Director, Azure Memory Devices
Microsoft

Agenda Highlights


Opening Keynote: How Data and Workloads are Changing the Design of Systems, Clusters and Datacenters

Zaid Kahn

VP & GM, Cloud AI & Advanced Systems
Microsoft

Memory Optimizations for Large Language Models: From Training to Inference

Large Language Models (LLMs) have revolutionized natural language processing but pose significant challenges in training and inference due to their enormous memory requirements. In this talk, we delve into techniques and optimizations that mitigate memory constraints across the entire LLM lifecycle.

The first segment explores memory-optimized LLM training. We discuss training challenges and cover techniques under Parameter-Efficient Fine-Tuning (PEFT), such as prompt tuning, LoRA, and adapters.
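
As a rough sketch of the idea (illustrative only, not taken from the talk), LoRA freezes a pretrained weight matrix and trains only a low-rank additive update; the names and hyperparameters below (rank r, scaling alpha) are assumptions chosen for the example:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus a trainable low-rank update: y = W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pretrained weights stay frozen
        self.scale = alpha / r
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no change at step 0

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 65,536 trainable parameters vs ~16.8M frozen

The memory win is that optimizer state (e.g. Adam moments) is kept only for the small A and B matrices rather than the full weight matrix.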

LLM inference is memory-bound rather than compute-bound. In this section we explore inference optimizations, mostly for transformer architectures: Paged KV Cache, Speculative Decoding, Quantization, In-flight Batching strategies, and Flash Attention, each contributing to enhanced inference speed and efficiency.
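
To make the memory-bound point concrete, here is a minimal, illustrative decoding step with a key-value cache (shapes and names are assumptions, not from the talk): each new token appends one K/V slice but re-reads the entire cache, so memory bandwidth, not FLOPs, dominates as the sequence grows.

import torch

batch, heads, head_dim = 1, 32, 128
k_cache = torch.empty(batch, heads, 0, head_dim)  # grows by one slot per generated token
v_cache = torch.empty(batch, heads, 0, head_dim)

def decode_step(q, k_new, v_new):
    """One autoregressive step; q, k_new, v_new are (batch, heads, 1, head_dim)."""
    global k_cache, v_cache
    k_cache = torch.cat([k_cache, k_new], dim=2)  # append this token's K/V
    v_cache = torch.cat([v_cache, v_new], dim=2)
    # Attention reads the *entire* cache: traffic scales with sequence length
    scores = q @ k_cache.transpose(-2, -1) / head_dim ** 0.5
    return torch.softmax(scores, dim=-1) @ v_cache

Paged KV caching and in-flight batching attack exactly this structure, by allocating the cache in fixed-size blocks and keeping the GPU busy across many such growing sequences.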

Finally, we explore the concept of coherent memory and how it helps with inference optimizations such as KV cache offloading and LoRA weight re-computation.
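
As an illustrative sketch only, KV cache offloading can be approximated in plain PyTorch by parking a paused request's cache in host memory and restoring it on resume; the coherent-memory hardware the talk refers to (which makes such transfers cheap or transparent) is abstracted away here, and the function names are hypothetical.

import torch

def offload_kv(kv_cache: dict) -> dict:
    # GPU -> pinned host memory; frees device memory for active requests
    return {name: t.to("cpu", non_blocking=True).pin_memory()
            for name, t in kv_cache.items()}

def restore_kv(kv_cache: dict) -> dict:
    # host -> GPU when the paused conversation resumes
    return {name: t.to("cuda", non_blocking=True) for name, t in kv_cache.items()}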

By illuminating these advancements, this talk aims to provide a comprehensive understanding of state-of-the-art memory optimization techniques for LLMs, empowering practitioners to push the boundaries of natural language processing further.

Arun Raman

Deep Learning Solutions Architect
NVIDIA

Arun Raman is an AI solutions architect at NVIDIA, adept at navigating the intricate challenges of deploying AI applications across edge, cloud, and on-premises environments within the consumer internet industry. In his current role, he works on the design of end-to-end accelerated AI pipelines for consumer internet customers, meticulously addressing preprocessing, training, and inference optimizations. His experience extends beyond AI, having worked with distributed systems and multi-cloud infrastructure. He shares practical strategies and real-world experiences, empowering organizations to leverage AI effectively.

How are Memory Innovations Impacting the Total Cost of Ownership in Scaling-Up and Power Consumption

Helen Byrne

VP, Solution Architect
Graphcore

Helen leads the Solution Architects team at Graphcore, helping innovators build their AI solutions using Graphcore's Intelligence Processing Units (IPUs). She has been at Graphcore for more than five years, previously leading AI Field Engineering and working in AI Research on problems in distributed machine learning. Before landing in the technology industry, she worked in investment banking. Her background is in mathematics, and she has an MSc in Artificial Intelligence.
