Systems Design Bottlenecks in the Datacenter and in HPC – Where is Memory Innovation Needed?

Working back from the question, "What do future systems architectures need to look like?", this panel will investigate the memory, bandwidth, and latency bottlenecks in today's systems, comparing and contrasting datacenter and HPC examples. By discussing the characteristics, similarities, and differences between various server workloads and use cases, such as AI/ML co-design and the acceleration of scientific workloads, among others, the panel will establish the context for why memory innovation is so important.


Session Topics: 
Emerging Memories
External Memory
Systems Design
Use Case
Speaker(s): 
Moderator

Rob Ober

Chief Platform Architect
NVIDIA

Rob is NVIDIA’s data center Chief Platform Architect, working with Hyperscalers to build GPU clusters for AI and Deep Learning, develop systems and platform architecture, and influence the HW and SW GPU roadmaps at NVIDIA. His interest in AI and DL was driven by its impact on computer science and computer architecture.

With more than 35 years of experience, Rob was Senior Fellow of Enterprise Technology at SanDisk / FusionIO; Corporate Fellow and Chief Architect at LSI; Fellow and Architect at AMD; Chief Architect at Infineon; and Manager of Technologies at Apple Computer, as well as a designer of supercomputers, mainframes, and networks.

Rob holds over 40 international patents in processor architecture, storage systems, SSDs, networks, wireless, power management, and mobile devices. He has developed the architecture and implementation of CRAY, ARM, PowerPC, ARC, SPARC, TriCore, and x86 processors.

Nick Wright

Chief Architect & Head, Advanced Technology Group
NERSC

Nick Wright is the advanced technologies group lead and the NERSC chief architect. He focuses on evaluating future technologies for potential application in scientific computing. He led the effort to optimize the architecture of the Perlmutter machine, the first NERSC platform designed to meet the needs of both large-scale simulation and data analysis from experimental facilities. Before moving to NERSC, he was a member of the Performance Modeling and Characterization (PMaC) group at the San Diego Supercomputer Center. He earned both his undergraduate and doctoral degrees in chemistry at the University of Durham in England.

Zaid Kahn

GM, Cloud AI & Advanced Systems Engineering
Microsoft

Zaid is currently a GM in Cloud Hardware Infrastructure Engineering, where he leads a team focused on advanced architecture and engineering efforts for AI. He is passionate about building balanced teams of artists and soldiers that solve incredibly difficult problems at scale.

Prior to Microsoft, Zaid was head of infrastructure engineering at LinkedIn, responsible for all aspects of engineering for datacenters, compute, networking, storage, and hardware. He also led several software development teams spanning BMC and network operating systems, server and network fleet automation, and SDN efforts inside the datacenter and across the global backbone, including the edge. He introduced the concept of disaggregation inside LinkedIn and pioneered JDM with multiple vendors through key initiatives like OpenSwitch and Open19, essentially taking control of LinkedIn's hardware development destiny. During his nine-year tenure at LinkedIn, his team scaled the network and systems 150X as membership grew from 50M to 675M, with someone hired every 7 seconds on the LinkedIn platform.

Prior to LinkedIn, Zaid was Network Architect at WebEx, responsible for building the MediaTone network; he later founded a startup that built a pattern recognition security chip using an NPU/FPGA. Zaid holds several patents in networking and SDN and is a recognized industry leader. He previously served as a board member of the Open19 Foundation and the San Francisco chapter of the Internet Society, and currently serves on the DE-CIX and Pensando advisory boards.

David Emberson

Senior Distinguished Technologist
HPE

David Emberson is Senior Distinguished Technologist for HPC System Architecture at HPE, where he works on future memory system designs for HPE Cray systems. He began his career at MIT's Digital Systems Laboratory, where he built one of the first portable computers in 1975. He has held positions at Prime Computer, Megatest, Ametek Computer Research, and Sun Microsystems. At Sun, Mr. Emberson was a member of the SPARC architecture committee, managed the SPARCstation 10 and SPARCstation 20 programs, and was a Senior Director at SunLabs. His consulting clients have included the HyperTransport Consortium, AMD, Intel, Atheros, PathScale, QLogic, and numerous startup companies.

At HPE, he was Technical Director of the PathForward program for the Department of Energy's Exascale Computing Project. His current research is in memory system design for HPC systems. He serves on the JEDEC JC-42.2 (HBM) committee and is a Senior Member of the IEEE. Mr. Emberson has a B.S. in Electrical Engineering from MIT and holds nineteen patents.

Uri Rosenberg

Specialist Technical Manager, AI/ML
Amazon Web Services

Uri Rosenberg is the Specialist Technical Manager for AI & ML services within enterprise support at Amazon Web Services (AWS) EMEA. Uri works to empower enterprise customers on all things ML: from underwater computer vision models that monitor fish to models trained on satellite images in space, and from cost optimization to strategic discussions on deep learning and ethics. He brings his extensive experience to drive the success of customers at all stages of ML adoption.

Before AWS, Uri led ML projects at the AT&T innovation center in Israel, working on deep learning models with extreme security and privacy constraints.

Uri is also an AWS-certified lead machine learning subject matter expert and holds an MSc in Computer Science from Tel-Aviv Academic College, where his research focused on large-scale deep learning models.
