Mar 2, 2023
In this episode, Ben Sorscher, a PhD
student at Stanford, sheds light on the challenges posed by the
ever-growing size of the data sets used to train machine learning
models, particularly large language models. The sheer size of these
data sets is pushing the limits of scaling, as the cost of training
and the environmental impact of the electricity it consumes continue
to grow.
As a solution, Ben discusses the concept of “data pruning”: a
method of reducing the size of a data set without sacrificing model
performance. Data pruning involves selecting the most important or
representative data points and discarding the rest, yielding a
smaller, more efficient data set that still trains accurate
models.
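To make the idea concrete, here is a minimal sketch of score-based pruning, not the specific method discussed in the episode. It assumes each example already has a precomputed difficulty or importance score (how such a score is obtained is left open) and simply keeps the top-ranked fraction:

```python
import numpy as np

def prune_dataset(examples, scores, keep_fraction=0.5, keep_hardest=True):
    """Keep a fraction of examples ranked by a per-example score.

    `scores` is assumed to be a precomputed difficulty/importance
    metric (hypothetical here); higher means harder / more informative.
    """
    examples = np.asarray(examples)
    scores = np.asarray(scores)
    n_keep = max(1, int(len(examples) * keep_fraction))
    order = np.argsort(scores)          # ascending: easiest first
    if keep_hardest:
        kept_idx = order[-n_keep:]      # retain the hardest examples
    else:
        kept_idx = order[:n_keep]       # retain the easiest examples
    return examples[kept_idx], kept_idx

# Toy usage: 10 examples with random difficulty scores, keep the top 40%.
rng = np.random.default_rng(0)
data = np.arange(10)
difficulty = rng.random(10)
pruned, idx = prune_dataset(data, difficulty, keep_fraction=0.4)
print(pruned)
```

Whether to keep the hardest or the easiest examples is itself a design choice that depends on how much data is available, which is part of what makes pruning strategies interesting.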
Throughout the podcast, Ben delves into the intricacies of data
pruning, including the benefits and drawbacks of the technique, the
practical considerations for applying it when training machine
learning models, and the potential impact it could have on the field
of artificial intelligence.
Craig Smith Twitter: https://twitter.com/craigss
Eye on A.I. Twitter: https://twitter.com/EyeOn_AI