Google DeepMind, Google's AI research lab, has published new research on training AI models that claims to speed up both training and energy efficiency by an order of magnitude, delivering 13 times the performance and 10 times the power efficiency of conventional methods. The new JEST training method arrives at a timely moment, as conversations about the environmental impact of AI data centers intensify.
DeepMind's method, called JEST, or joint example selection, departs from traditional AI model training techniques in a simple way. Conventional training methods select individual data points for training and learning, while JEST selects entire batches. The JEST method first builds a small AI model that grades data quality using extremely high-quality, curated sources, ranking batches by quality. It then compares that ranking against a larger, lower-quality dataset. The small JEST model determines the batches best suited for training, and the large model is then trained on the batches the small model selected.
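To make that workflow concrete, here is a minimal, hypothetical Python sketch of batch-level "learnability" scoring in the spirit of JEST. Everything in it is an illustrative assumption rather than DeepMind's implementation: the loss functions are toy stand-ins for the multimodal contrastive losses the paper uses, and for simplicity it ranks pre-formed candidate batches, whereas the paper jointly composes batches from individual examples.

```python
"""Toy sketch of JEST-style batch selection (illustrative, not DeepMind's code).

Idea: a small reference model (pretrained on curated, high-quality data) and
the learner itself jointly score candidate batches, and the expensive training
step runs only on the most "learnable" ones -- batches the learner still finds
hard but the reference model finds easy.
"""

import numpy as np

rng = np.random.default_rng(42)


def reference_loss(batch):
    # Hypothetical stand-in for the small curated-data model's loss.
    return float(np.mean((batch - 1.0) ** 2))


def learner_loss(batch):
    # Hypothetical stand-in for the large in-training model's loss.
    return float(np.mean(batch ** 2))


def learnability(batch):
    # Learnability score: high learner loss but low reference loss
    # means the batch is still informative and worth training on now.
    return learner_loss(batch) - reference_loss(batch)


# Draw a large "super-batch" of candidate sub-batches from raw (uncurated) data.
super_batch = [
    rng.normal(loc=rng.uniform(0, 2), scale=1.0, size=(32, 8))
    for _ in range(64)
]

# Score every candidate, then keep only the top fraction for training.
keep = 8  # filter ratio: train on 8 of 64 candidates
selected = sorted(super_batch, key=learnability, reverse=True)[:keep]

print(f"Selected {len(selected)} of {len(super_batch)} candidate batches "
      f"for the next training step.")
```

The design point this illustrates is that scoring with two cheap forward passes is far less costly than a full training step on the large model, which is where the claimed savings in iterations and computation would come from.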
The full paper provides a more complete description of the processes used in the study and the future direction of the research.
The DeepMind researchers make clear in their paper that the "ability to steer the data selection process towards the distribution of smaller, well-curated datasets" is essential to the success of the JEST method. And success is the right word for this research: DeepMind claims that "our approach surpasses state-of-the-art models with up to 13× fewer iterations and 10× less computation."
Of course, this system relies entirely on the quality of its training data, as the bootstrapping technique falls apart without a human-curated dataset of the highest possible quality. Nowhere is the mantra "garbage in, garbage out" truer than in this method, which attempts to "skip ahead" in its training process. This makes the JEST method much harder for hobbyists or amateur AI developers to adopt than most others, as expert-level research skills are required to curate the initial highest-grade training data.
The JEST research doesn't come a moment too soon, as the tech industry and world governments begin to discuss artificial intelligence's extreme power demands. AI workloads consumed about 4.3 GW in 2023, roughly matching the annual power consumption of the nation of Cyprus. And things certainly aren't slowing down: a single ChatGPT request costs 10 times more power than a Google search, and Arm's CEO predicts that AI could take up a quarter of the United States' power grid by 2030.
Whether and how JEST methods will be adopted by major players in the AI space remains to be seen. GPT-4o reportedly cost $100 million to train, and future, larger models may soon hit the billion-dollar mark, so firms are likely hunting for ways to save money in this area. Optimists hope that JEST methods will be used to keep current training productivity rates at a much lower power draw, reducing AI costs and helping the planet. However, it is far more likely that the machine of capital will keep the pedal to the metal, using JEST methods to push maximum power draw toward hyper-fast training output. Cost savings versus output scale: who will win?