NPU vs. GPU: What's the Difference?

Julio Franco


Today, both hardware and software have advanced to the point of being purpose-built for artificial intelligence and neural network operations. Among these are neural processing units (NPUs), which are often compared to graphics processing units (GPUs) in terms of their ability to accelerate AI tasks. NPUs are increasingly common pieces of hardware designed to run cutting-edge AI/ML workloads at the fastest possible speeds. But how are they different?

Let's briefly explore NPUs and GPUs, compare their differences, and examine the strengths and drawbacks of each.

What is an NPU?

NPU stands for Neural Processing Unit. An NPU is a specialized piece of hardware designed to optimize the performance of tasks related to artificial intelligence and neural networks.

That might make NPUs sound like they belong in research labs and military bases, but NPUs – despite being a relatively novel invention – are increasingly common. Soon you will start to see NPUs in desktop and laptop computers, and most modern smartphones already have NPUs integrated into their main chips, including iPhones, Google Pixel, and Samsung Galaxy models from the past few years.

Neural processing units support (as their name suggests) neural network algorithms, which are used in highly advanced settings like autonomous driving and natural language processing (NLP), as well as in routine applications like the face recognition, voice recognition, and image processing on your phone.

Editor's Note:
This guest blog post was written by the staff at Pure Storage, a US-based publicly traded tech company dedicated to enterprise all-flash data storage solutions. Pure Storage keeps a very active blog; this is one of their "Purely Educational" posts, which we are reprinting here with their permission.

What is a GPU?

GPU stands for Graphics Processing Unit. Originally developed for rendering graphics in video games and multimedia applications, GPUs have evolved significantly and are now used in many different applications that require parallel processing of complex computations.

The unique strength of GPUs lies in how rapidly and efficiently they perform thousands of small tasks simultaneously. This makes them particularly good at complex workloads with many simultaneous computations, such as rendering graphics, simulating physics, and even training neural networks.
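To make that concrete, here is a minimal sketch (in plain Python, run on a CPU) of the data-parallel pattern GPUs accelerate: the same small operation applied independently to every element of a large dataset. Real GPU code (CUDA, OpenCL, shaders) would execute these element-wise operations across thousands of cores at once; the function names and values here are illustrative.

```python
# Conceptual sketch of SIMD/SIMT-style data parallelism.
# Each pixel is processed independently, so a GPU could handle
# all of them in the same clock cycles.

def scale_pixels_serial(pixels, gain):
    out = []
    for p in pixels:                      # one element at a time (CPU-style)
        out.append(min(255, int(p * gain)))
    return out

def scale_pixels_parallel(pixels, gain):
    # map() expresses the independence of each element -- the property
    # that lets a GPU spread the work across thousands of small cores.
    return list(map(lambda p: min(255, int(p * gain)), pixels))

pixels = [10, 100, 200, 250]
assert scale_pixels_serial(pixels, 1.5) == scale_pixels_parallel(pixels, 1.5)
```

The key point is not the Python itself but the shape of the problem: no element depends on any other, which is exactly what makes graphics, physics, and neural-network math GPU-friendly.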

NPU vs GPU: Differences

Architecturally speaking, NPUs are even more specialized for neural-network parallelism than GPUs. NPUs feature a large number of small, purpose-built processing units, and they can also incorporate specialized memory hierarchies and data flow optimizations that make processing deep learning workloads particularly efficient. GPUs, by contrast, have many more versatile cores; historically, those cores have been put to use in a wide variety of computational tasks through parallel processing, whereas NPUs are designed specifically for neural network algorithms.

NPUs are particularly good at working with short and repetitive tasks. Incorporated into modern computing systems, NPUs can relieve GPUs of the burden of handling matrix operations that are inherent to neural networks and leave the GPU to process rendering tasks or general-purpose computing.
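The "matrix operations inherent to neural networks" mentioned above boil down to multiply-accumulate work. A minimal sketch of one dense neural-network layer (pure Python for clarity; the weights and values are made up) shows the kind of computation an NPU executes as fused hardware operations:

```python
# One dense layer = matrix-vector multiply + activation function.
# This multiply-accumulate (MAC) pattern is what NPU hardware is built around.

def relu(x):
    # A common activation function: clamp negatives to zero.
    return max(0.0, x)

def dense_layer(weights, bias, inputs):
    # weights: one row of coefficients per output neuron
    out = []
    for row, b in zip(weights, bias):
        acc = sum(w * x for w, x in zip(row, inputs))  # multiply-accumulate
        out.append(relu(acc + b))
    return out

W = [[0.5, -1.0], [1.0, 1.0]]   # hypothetical 2x2 weight matrix
b = [0.0, -0.5]                  # hypothetical biases
print(dense_layer(W, b, [2.0, 1.0]))  # [0.0, 2.5]
```

A real network stacks many such layers with millions of weights; offloading this repetitive MAC work to the NPU is what frees the GPU for rendering or general-purpose compute.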

Compared to GPUs, NPUs excel in tasks that depend on intensive deep learning computations. NLP, speech recognition, and computer vision are a few examples of places where NPUs excel relative to GPUs. GPUs have more of a general-purpose architecture than NPUs and can struggle to compete with NPUs in processing large-scale language models or edge computing applications.

NPU vs GPU: Performance

When put side by side, the biggest difference in performance between NPUs and GPUs is in efficiency and battery life. Since NPUs are specially designed for neural network operations, they require far less power to execute the same processes as a GPU at comparable speeds.

That comparison is much more of a statement on the current complexity and applications of neural networks than it is on the architectural differences between the two types of hardware. NPUs are architecturally optimized for AI/ML workloads and can surpass GPUs in handling the most complex workloads, like deep learning inference and training.

Specialized hardware in NPUs for matrix multiplications and activation functions means they achieve superior performance and efficiency compared to GPUs in tasks like real-time language translation, image recognition in autonomous vehicles, and image analysis in medical applications.

Implementation Concerns and Storage Demands

At the enterprise level, NPUs can be integrated into existing infrastructure and data processing pipelines. NPUs can be deployed alongside CPUs, GPUs, and other accelerators within data centers to achieve the greatest possible computational power for AI tasks. However, when all these AI/ML processing elements are incorporated into enterprise data center operations, challenges around data access and storage can arise.

Fully optimized NPUs and GPUs processing AI/ML workloads can process data at such high speeds that traditional storage systems may struggle to keep up, leading to potential bottlenecks in data retrieval and processing.

In application, NPUs don't dictate specific storage accommodations – however, operating them at peak efficiency relies on them having extremely fast access to vast datasets. NPUs processing AI/ML workloads often require huge volumes of data to train accurate models and run inference, plus the ability to sort, access, change, and store that data extremely rapidly. Solutions for this at the enterprise level come in the form of flash storage and holistically managed storage infrastructures.
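A back-of-the-envelope calculation shows why storage bandwidth matters here. Using hypothetical numbers (both figures below are illustrative, not measured), the sustained read bandwidth needed to keep an accelerator fed during training is simply throughput times sample size:

```python
# Hypothetical sizing sketch: if storage can't sustain this bandwidth,
# it becomes the bottleneck described above and the accelerator sits idle.

samples_per_sec = 4000        # hypothetical training throughput
bytes_per_sample = 600_000    # hypothetical: ~600 KB per preprocessed image

required_bw_gbps = samples_per_sec * bytes_per_sample / 1e9
print(f"~{required_bw_gbps:.1f} GB/s of sustained read bandwidth")  # ~2.4 GB/s
```

Scale that across a rack of accelerators and it is easy to see why all-flash storage, rather than spinning disk, is the usual answer at the enterprise level.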

To recap, NPUs are specially designed and architected to execute neural network operations, making them particularly effective at handling the small and repetitive tasks associated with AI/ML operations.

At face value, GPUs sound similar: hardware components designed to perform small operations simultaneously. However, NPUs have a clear advantage in neural workloads due to their optimization for tasks like matrix multiplications and activation functions. This makes NPUs superior to GPUs for handling deep learning computations, particularly in terms of efficiency and speed.

Why would I need an NPU locally?! Train it on my data? Well, I think I'm not stupid, but not a genius either. So this AI won't advance much locally. lol
 
There are a couple of articles floating around out there that suggest ways in which NPUs could accelerate gaming performance in some cases. The biggest issue is the same thing that multi-GPU solutions face: the overhead in communicating across the different compute devices and the necessary syncs that have to happen between them. Remains to be seen.

The biggest thing I'm excited about with generative AI is for NPCs. If NPCs are able to react to you in real time (voice or text inputs, whatever) without using pre-determined scripts and able to change the environment accordingly, that could be incredibly awesome! The first games that do it will be the butt of jokes and headlines as players find ways to exploit them to say things out of character, but the potential is amazing.

We've all been at that moment in a game where we have to pick from a set of options and not a single one represents what we want to actually do in that moment, whether it be punch someone in the face or use a totally different tactic to a situation that would otherwise require some unnecessary sacrifice. Of course many of us also prefer picking from a set of choices rather than making something up (I know I have because I am lazy or because I am like "what I didn't get that"), so generative AI will need to keep that in mind. But still, very cool potential! AI art (visual, sound) is another cool idea, but I see less potential there until it can create whole scenes from scratch to represent those NPC choices. That's a ways off, at least a couple of years.

But will you *need* NPUs for any of that? I doubt it. But like GPUs were to CPUs, I see the advantage. It also means less competition for the GPU so more FPS and more multitasking capability. Biggest limitation remains memory/VRAM.

TechSpot put out an article on how many gigs of system RAM you need as a gamer. Hypothetically, if a game comes out and loads an LLM to make those NPCs more reactive, you may very well need as much RAM as you can get your hands on to run the largest, most capable models. Or they might just use an API/cloud service. If the latter, then you don't need an NPU. Who knows.

It's currently the wild west. But no, I'm not shelling out for the marketing gimmick of an AI PC. The implementations are all snake-oil until the reviewers brave the waters and sift out the diamonds from the bullshit, and Microsoft isn't very trustworthy these days. Still, there's a light at the end of the tunnel. How many cave-ins (job losses and snake-oil) will we have to weather to get there? I dunno.
 
What about GPUs with AI accelerators, e.g. AMD 7900XT or Nvidia equivalent, how do they compare to NPUs?
Microsoft says AI PCs should have at least 40 TOPS of AI compute power, and I suppose that could be harnessed from this type of GPU.
From Nvidia's blog: "GeForce RTX 4090 GPU offers more than 1,300 TOPS".
How can one measure GPUs "TOPS", is there any data table available comparing existing GPUs and NPUs for reference?
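On the question of how TOPS figures are derived: vendors generally quote a theoretical peak, counting a multiply-accumulate (MAC) as two operations, usually at int8 precision. A sketch with hypothetical hardware numbers (none of these figures describe a real chip):

```python
# TOPS = trillions of operations per second (theoretical peak).
# Vendors count a MAC as 2 ops and often quote int8 rates.

def peak_tops(num_units, macs_per_unit_per_cycle, clock_ghz):
    ops_per_cycle = 2 * num_units * macs_per_unit_per_cycle  # MAC = 2 ops
    return ops_per_cycle * clock_ghz * 1e9 / 1e12

# Hypothetical accelerator: 512 MAC arrays, 64 int8 MACs each, 1.5 GHz
print(peak_tops(512, 64, 1.5))  # 98.304 TOPS

def matmul_ops(m, k, n):
    # Operation count for one (m x k) @ (k x n) matrix multiply:
    # each of the m*n outputs needs k MACs = 2k ops.
    return 2 * m * k * n
```

This is also why quoted TOPS numbers are hard to compare across devices: the precision (int8 vs. FP16), the sparsity assumptions, and whether the clock is a sustained or boost figure all change the result.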
 
A comparison is exactly what I was expecting from this article, especially when the article said this about NPUs:

"However, NPUs have a clear advantage in neural workloads due to their optimization for tasks like matrix multiplications and activation functions. This makes NPUs superior to GPUs for handling deep learning computations, particularly in terms of efficiency and speed."

Why doesn't the article present objective data to prove NPUs are superior to GPUs for AI?
 
Why doesn't the article present objective data to prove NPUs are superior to GPUs for AI?
Saying that "evidence abounds" is a drastic understatement. Does one need evidence that water is wet?

If you need a better answer, consider that an Nvidia GPU has shader cores, ray tracing cores, texture mapping units, and render output units -- none of which are particularly useful for AI operations. The Tensor cores are -- but here again, they're not fully optimized for AI, as, in addition to supporting int and FP16 operations, they also support FP32: a precision far above what most AI inference requires.
 