Cutting corners: Researchers from the University of California, Santa Cruz, have devised a way to run a billion-parameter-scale large language model using just 13 watts of power – about as much as a modern LED light bulb. For comparison, a data center-grade GPU used for LLM tasks requires around 700 watts.
AI up to this point has largely been a race to be first, with little consideration for metrics like efficiency. Looking to change that, the researchers trimmed out an intensive technique called matrix multiplication. In this technique, words are represented as numbers, stored in matrices, and multiplied together to generate language. As you can imagine, it is rather hardware intensive.
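To make the idea concrete, here is a minimal sketch of that core operation, using NumPy and made-up numbers purely for illustration; real models work with thousands of dimensions and billions of weights, and this is not the researchers' actual code.

```python
import numpy as np

# One token represented as a small vector of numbers (values invented for illustration).
token_vector = np.array([0.2, -1.3, 0.7, 0.05])

# A dense weight matrix of floating-point values, as used in a conventional model.
weights = np.random.randn(4, 4).astype(np.float32)

# Standard matrix multiplication: every output value requires a full row of
# floating-point multiplies followed by a sum.
output = weights @ token_vector
print(output)
```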
The team's revised approach instead forces all of the numbers in their neural network matrices to be ternary, meaning they can only have one of three values: negative one, zero, or positive one. This key change was inspired by a paper from Microsoft, and means that all computation involves summing rather than multiplying – an approach that is far less hardware intensive.
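A rough sketch of why that works, again assuming tiny made-up values: multiplying by -1, 0, or +1 amounts to subtracting, skipping, or adding the input, so the whole operation collapses into sums and sign flips with no floating-point multiplies.

```python
import numpy as np

x = np.array([0.2, -1.3, 0.7, 0.05])   # input values

# Ternary weight matrix: every entry is -1, 0, or +1.
W = np.array([[ 1,  0, -1,  1],
              [ 0,  1,  1,  0],
              [-1, -1,  0,  1],
              [ 1,  0,  0, -1]])

# Equivalent to W @ x, but expressed purely as additions and subtractions:
# add the inputs where the weight is +1, subtract where it is -1, skip zeros.
y = np.array([x[W[i] == 1].sum() - x[W[i] == -1].sum() for i in range(W.shape[0])])
print(y)
```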
Speaking of hardware, the team also built custom hardware using a highly customizable circuit called a field-programmable gate array (FPGA). The custom hardware allowed them to maximize all of the energy-saving features baked into the neural network.
Running on the custom hardware, the team's neural network is more than 50 times more efficient than a typical setup. Better yet, it provides the same sort of performance as a top-tier model like Meta's Llama.
It's worth noting that custom hardware isn't necessary with the new approach – it's just icing on the cake. The neural network was designed to run on standard GPUs that are common in the AI industry, and testing revealed roughly 10 times less memory consumption compared to a multiplication-based neural network. Requiring less memory could open the door to full-fledged neural networks on mobile devices like smartphones.
With this sort of efficiency gain in play and a full data center's worth of power behind it, AI could soon take another huge leap forward.
Image credit: Emrah Tolu, Pixabay