Who Cares About Heterogeneity? Finding Homes for Novel AI Hardware | Kisaco Research

As scientific and machine learning workloads converge in the world of HPC, and supercomputing centers gear up for the era of exascale computing, discussions on heterogeneous systems design abound. HPC leaders increasingly need to support converged application workloads that extend beyond AI/HPC to include other computational kernels and patterns, such as data analytics, graph algorithms, and uncertainty quantification. In this sector, the value of heterogeneity in systems design is clear and promising, even if the best way to put it into practice is still to be determined.

However, in many industrial sectors, enterprise end customers simply use the 'threat' of heterogeneity as a tool to extract a discount from their main (incumbent) vendor. The job of IT, planning for compute, storage and networking needs, is hard enough that adding a lot of compute specialization is often not high on a CIO’s priority list.

So, who cares about heterogeneity? Where will heterogeneity in systems design change the game, and what will be its level and quality? 

Session Topics: 
Chip Design
ML at Scale
Novel AI Hardware
Systems Design
Speaker(s): 

Wahid Bhimji

Acting Group Lead, Data & Analytics
NERSC

Wahid Bhimji is acting Group Lead and a Big Data Architect in the Data and Analytics Services Group at NERSC. His interests include machine learning and data management. Recently he led several projects applying AI for science, including deep learning at scale, generative models and probabilistic programming. He coordinates aspects of machine learning deployment for the Lab's CS-Area and NERSC, including the upcoming Perlmutter HPC system and plans for future NERSC machines. Previously he was user lead for the commissioning of Cori Phase 1, particularly data services, and for the Burst Buffer. Wahid has worked for many years in Scientific Computing and Data Analysis in Academia and the U.K. Government and has a Ph.D. in High-Energy Particle Physics.

Weifeng Zhang

Chief Scientist, Heterogeneous Computing
Alibaba

Weifeng Zhang is the Chief Scientist of Heterogeneous Computing at Alibaba Cloud Infrastructure, responsible for performance optimization of large-scale distributed applications in its data centers. Weifeng also leads the effort to build an acceleration platform for various ML workloads through heterogeneous resource pooling based on compiler technology. Prior to joining Alibaba, Weifeng was a Director of Engineering at Qualcomm Inc, focusing on GPU compilers and performance optimization. Weifeng received his B.Sc. from Wuhan University, China and his PhD in Computer Science from the University of California, San Diego.

Cedric Bourrasset

Head, High Performance AI Business Unit
Atos

Dr. Cedric Bourrasset is AI Business Leader for the High Performance Computing Business Unit at Atos. He is also AI product manager for the Atos Codex AI suite, software that enables AI workloads in HPC environments and integrates a computer vision solution. He joined Atos in 2016 as an expert in the HPC/AI domain.

Previously, Cedric received his Ph.D. in Electronics and Computer Vision from Blaise Pascal University in Clermont-Ferrand, with a thesis defending the dataflow model of computation for FPGA high-level synthesis in embedded machine learning applications.

Bhupender Thakur

Product Manager, Scientific Computing
Roche

Bhupender Thakur is the product portfolio owner for several high-performance computing and big data platforms for research and early development at Roche. He is the Agile product portfolio owner of on-premise HPC services, delivering compute and storage clusters at several locations across the USA, Germany and Switzerland, and product owner for workflow applications for NGS and oncology research supporting Roche Avenio product offerings.

Bhupender leads a cross-functional squad of developers, product owners, architects and subject matter experts, working on roadmaps for existing and new research offerings and leading discussions on planning, lifecycle, operations and business continuity.

He holds a PhD in Theoretical and Computational Nuclear Physics from the University of Delaware.
