Systems Design Bottlenecks in the Datacenter and in HPC – Where is Memory Innovation Needed?

Working back from the question, "What do future systems architectures need to look like?", this panel will investigate the memory, bandwidth, and latency bottlenecks in today's systems, comparing and contrasting datacenter and HPC examples. By discussing the characteristics, similarities, and differences between various server workloads and use cases, such as AI/ML co-design and the acceleration of scientific workloads, among others, the panel will establish the context for why memory innovation is so important.


Session Topics: 
Emerging Memories
External Memory
Systems Design
Use Case
Speaker(s): 
Moderator

Rob Ober

Chief Platform Architect
NVIDIA

Rob is NVIDIA’s data center Chief Platform Architect, working with Hyperscalers to build GPU clusters for AI and Deep Learning, develop systems and platform architecture, and influence the HW and SW GPU roadmaps at NVIDIA. His interest in AI and DL was driven by its impact on computer science and computer architecture.

With more than 35 years of experience, Rob was Senior Fellow of Enterprise Technology at SanDisk / FusionIO; Corporate Fellow and Chief Architect at LSI; Fellow and Architect at AMD; Chief Architect at Infineon; and Manager of Technologies at Apple Computer, as well as a designer of supercomputers, mainframes, and networks.

Rob holds over 40 international patents in processor architecture, storage systems, SSDs, networks, wireless, power management, and mobile devices. He has developed the architecture and implementation of CRAY, ARM, PowerPC, ARC, SPARC, TriCore, and x86 processors.

Nick Wright

Chief Architect & Head, Advanced Technology Group
NERSC

Nick Wright is the advanced technologies group lead and the NERSC chief architect. He focuses on evaluating future technologies for potential application in scientific computing. He led the effort to optimize the architecture of the Perlmutter machine, the first NERSC platform designed to meet the needs of both large-scale simulation and data analysis from experimental facilities. Before moving to NERSC, he was a member of the Performance Modeling and Characterization (PMaC) group at the San Diego Supercomputer Center. He earned both his undergraduate and doctoral degrees in chemistry at the University of Durham in England.

Zaid Kahn

GM, Cloud AI & Advanced Systems Engineering
Microsoft

Zaid is currently a GM in Cloud Hardware Infrastructure Engineering, where he leads a team focused on advanced architecture and engineering efforts for AI. He is passionate about building balanced teams of artists and soldiers that solve incredibly difficult problems at scale.

Prior to Microsoft, Zaid was head of infrastructure engineering at LinkedIn, responsible for all aspects of engineering for datacenters, compute, networking, storage, and hardware. He also led several software development teams spanning BMC and network operating systems, server and network fleet automation, and SDN efforts inside the datacenter and across the global backbone, including the edge. He introduced the concept of disaggregation inside LinkedIn and pioneered JDM with multiple vendors through key initiatives like OpenSwitch and Open19, essentially taking control of LinkedIn's hardware development destiny. During his nine-year tenure at LinkedIn, his team scaled the network and systems 150X as membership grew from 50M to 675M, with someone hired every 7 seconds on the LinkedIn platform.

Prior to LinkedIn, Zaid was Network Architect at WebEx, responsible for building the MediaTone network; he later founded a startup that built a pattern recognition security chip using an NPU/FPGA. Zaid holds several patents in networking and SDN and is a recognized industry leader. He previously served as a board member of the Open19 Foundation and the San Francisco chapter of the Internet Society, and currently serves on the DE-CIX and Pensando advisory boards.

David Emberson

Senior Distinguished Technologist
HPE

David Emberson is Senior Distinguished Technologist for HPC System Architecture at HPE, where he works on future memory system designs for HPE Cray systems. He began his career at MIT's Digital Systems Laboratory, where he built one of the first portable computers in 1975. He has held positions at Prime Computer, Megatest, Ametek Computer Research, and Sun Microsystems. At Sun, Mr. Emberson was a member of the SPARC architecture committee, managed the SPARCstation 10 and SPARCstation 20 programs, and was a Senior Director at SunLabs. His consulting clients have included the HyperTransport Consortium, AMD, Intel, Atheros, PathScale, QLogic, and numerous startup companies.

At HPE, he was Technical Director of the PathForward program for the Department of Energy's Exascale Computing Project. His current research is in memory system design for HPC systems. He serves on the JEDEC JC-42.2 (HBM) committee and is a Senior Member of the IEEE. Mr. Emberson has a B.S. in Electrical Engineering from MIT and holds nineteen patents.

Uri Rosenberg

Specialist Technical Manager, AI/ML
Amazon Web Services

Uri Rosenberg is the Specialist Technical Manager for AI & ML services within enterprise support at Amazon Web Services (AWS) EMEA. Uri works to empower enterprise customers on all things ML: from underwater computer vision models that monitor fish to models trained on satellite images in space, and from cost optimization to strategic discussions on deep learning and ethics. He brings his extensive experience to drive the success of customers at all stages of ML adoption.

Before AWS, Uri led ML projects at the AT&T innovation center in Israel, working on deep learning models with extreme security and privacy constraints.

Uri is also an AWS-certified lead machine learning subject matter expert and holds an MSc in Computer Science from Tel-Aviv Academic College, where his research focused on large-scale deep learning models.
