[Efficient Server AI Track]: Lowering the Costs of Fine-Tuning Foundation Models | Kisaco Research

Pre-training Foundation Models is prohibitively expensive and therefore out of reach for many companies. This is especially true when the models are Large Language Models (LLMs). Yet the hope is that Foundation Models will live up to their promise of learning more generally than classical Artificial Intelligence (AI) models. The dream is that, given just a few examples, a Foundation Model could extract a high-level, abstract representation of the problem and learn to accomplish tasks it has never been trained to execute. The question, then, is how you can lower the cost of fine-tuning pre-trained Foundation Models for your needs. This is what we will discuss in this panel. We share our personal experience, synthesized into a set of principles, so that you can discover how we found ways to lower the cost of fine-tuning pre-trained Foundation Models across multiple domains.

Speaker(s): 
Moderator

Fausto Artico

Head of Innovation and Data Science
GSK

Fausto holds two PhDs (Information and Computer Science), earning his second master's and PhD at the University of California, Irvine. He also holds multiple certifications from MIT, Columbia University, the London School of Economics and Political Science, the Kellogg School of Management, and the University of Cambridge, and will soon hold one from the University of California, Berkeley. He has worked in multi-disciplinary teams and has over 20 years of experience in academia and industry.

As a Physicist, Mathematician, Engineer, Computer Scientist, and High-Performance Computing (HPC) and Data Science expert, Fausto has worked on key projects at European and American government institutions and alongside key individuals such as Nobel Prize winner Michael J. Prather. After his time at NVIDIA in Silicon Valley, Fausto worked at the IBM T. J. Watson Research Center in New York on Exascale Supercomputing Systems for the US government (e.g., Livermore and Oak Ridge Labs).

Panellists

Lisa Cohen

Director of Data Science & Engineering, Google Bard & Assistant
Google

Jeff Boudier

Product Director
Hugging Face

Jeff Boudier is a product director at Hugging Face, creator of Transformers, the leading open-source NLP library. Previously, Jeff was a co-founder of Stupeflix, acquired by GoPro, where he served as Director of Product Management, Product Marketing, Business Development, and Corporate Development.

Helen Byrne

VP, Solution Architect
Graphcore

Helen leads the Solution Architects team at Graphcore, helping innovators build their AI solutions using Graphcore's Intelligence Processing Units (IPUs). She has been at Graphcore for more than five years, previously leading AI Field Engineering and working in AI Research on problems in Distributed Machine Learning. Before landing in the technology industry, she worked in Investment Banking. Her background is in Mathematics, and she holds an MSc in Artificial Intelligence.