From left to right: PhD student Chris (Shaopeng) Lin, Assistant Professor Gururaj Saileshwar and undergraduate student Joyce Qu developed GPUHammer to investigate the vulnerability of graphics processing units to a critical hardware attack. (Photo: Matt Hintsa)
Originally designed to render graphics for gamers and video editors, graphics processing units (GPUs) have evolved into the core computing engines that power today’s artificial intelligence (AI) models and cloud-based machine learning services.
Recognizing GPUs’ increasing importance, a team of computer scientists at the University of Toronto set out to test their vulnerability to a critical hardware attack known to affect the memory in central processing units (CPUs).
Their research shows for the first time that this attack, known as Rowhammer, is effective against GPUs with GDDR memory, the type commonly found in graphics cards.
In turn, AI models that run on these GPUs are at risk of “catastrophic brain damage,” with accuracy plunging from 80 per cent to 0.1 per cent, according to Gururaj Saileshwar, assistant professor in the Department of Computer Science. That has significant implications for AI applications that rely on these models, like a hospital’s medical imaging analysis or a bank’s fraud detection system.
In a Rowhammer attack, an attacker rapidly and repeatedly accesses certain rows of memory cells. The electrical interference this creates flips bits, tiny pieces of data, in adjacent rows the attacker never directly touched, potentially allowing them to bypass security or take control of a system.
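At its core, the hammering itself is just a tight loop of memory reads aimed at a pair of ‘aggressor’ addresses. The sketch below is a minimal illustration in CUDA, not the researchers’ code: the buffer offsets are placeholders, and a real attack must pick addresses that map to DRAM rows adjacent to the victim data and ensure every read actually reaches memory rather than a cache.

```cuda
#include <cuda_runtime.h>

// Illustrative only: repeatedly read two "aggressor" addresses so their DRAM rows
// are activated over and over. The offsets used below are placeholders; a real
// attack needs addresses that land in rows adjacent to the victim data and must
// defeat the GPU's caches so every read reaches DRAM.
__global__ void hammer_pair(volatile unsigned int *aggressor_a,
                            volatile unsigned int *aggressor_b,
                            unsigned long long iterations,
                            unsigned int *sink_out)
{
    unsigned int sink = 0;
    for (unsigned long long i = 0; i < iterations; ++i) {
        sink += *aggressor_a;   // activate row A
        sink += *aggressor_b;   // activate row B
    }
    *sink_out = sink;           // keep the compiler from optimizing the loop away
}

int main()
{
    unsigned int *buf, *sink;
    cudaMalloc(&buf, 64 << 20);                  // large GPU buffer to pick addresses from
    cudaMalloc(&sink, sizeof(unsigned int));
    // Placeholder offsets standing in for two addresses in neighbouring DRAM rows.
    hammer_pair<<<1, 1>>>(buf, buf + (1 << 20), 10000000ULL, sink);
    cudaDeviceSynchronize();
    cudaFree(buf);
    cudaFree(sink);
    return 0;
}
```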
“Traditionally, security has been thought of at the software layer, but we’re increasingly seeing physical effects at the hardware layer that can be leveraged as vulnerabilities,” said Saileshwar.
Saileshwar, alongside second-year computer science PhD student Chris (Shaopeng) Lin and fourth-year computer science undergrad Joyce Qu, developed the proof-of-concept ‘GPUHammer’ attack on the GDDR6 memory in an NVIDIA RTX A6000, a GPU widely used for high-performance computing. Their paper has been accepted to USENIX Security 2025, a top-tier computer security conference.
They found that a single bit flip that changes the exponent of one of an AI model’s weights could cause a massive reduction in the model’s accuracy.
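The damage comes from how floating-point numbers are stored: the exponent bits set a weight’s order of magnitude, so flipping one of them can turn a tiny weight into an astronomically large one. The snippet below is a simple illustration, assuming a 32-bit float weight chosen for this example (models often store weights in other formats).

```cuda
#include <cstdio>
#include <cstring>
#include <cstdint>

// Flip the most significant exponent bit (bit 30) of an IEEE-754 single-precision
// value and show how the stored number explodes in magnitude.
int main()
{
    float weight = 0.0037f;                        // a hypothetical model weight
    uint32_t bits;
    std::memcpy(&bits, &weight, sizeof(bits));     // view the float as raw bits
    bits ^= 1u << 30;                              // flip the top exponent bit
    float corrupted;
    std::memcpy(&corrupted, &bits, sizeof(corrupted));
    std::printf("original weight:  %g\n", weight);     // ~0.0037
    std::printf("corrupted weight: %g\n", corrupted);  // ~1.3e+36, wrecking any computation it feeds
    return 0;
}
```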
“This introduces a new way AI models can fail — at the hardware level,” said Saileshwar.
The GPU users most at risk are those managing cloud computing environments, rather than individual home or office users. That’s because in the cloud, multiple users could be accessing a particular GPU at the same time, allowing an attacker to tamper with another user’s data processing.
The researchers’ attack had to account for the differences between CPU and GPU memory, Saileshwar explained. GPUs are tougher targets due to their faster memory refresh rates, higher memory latency and other architectural differences. Ultimately, the researchers leveraged the GPU’s parallelism, its ability to run many operations simultaneously, to optimize their hammering patterns. That adjustment led to the bit flips that demonstrated a successful attack.
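As a very rough sketch of that idea, and not the published attack’s actual access patterns or address selection, the single-pair loop shown earlier can be scaled out so that each thread block hammers its own pair of aggressor addresses at the same time, packing more row activations into each refresh window.

```cuda
// Rough sketch of parallel hammering: launch one thread block per aggressor pair,
// e.g. parallel_hammer<<<num_pairs, 1>>>(base, pair_offsets, iterations, sinks),
// so that many DRAM rows are activated concurrently. The real attack's address
// selection and cache-bypass details are omitted here.
__global__ void parallel_hammer(volatile unsigned int *base,
                                const size_t *pair_offsets,   // two offsets per block
                                unsigned long long iterations,
                                unsigned int *sinks)
{
    volatile unsigned int *a = base + pair_offsets[2 * blockIdx.x];
    volatile unsigned int *b = base + pair_offsets[2 * blockIdx.x + 1];
    unsigned int sink = 0;
    for (unsigned long long i = 0; i < iterations; ++i) {
        sink += *a;   // activate this block's row A
        sink += *b;   // activate this block's row B
    }
    sinks[blockIdx.x] = sink;   // keep the loops from being optimized away
}
```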
But it wasn’t easy. “Hammering on GPUs is like hammering blind,” Saileshwar said, noting that they nearly gave up after failing to trigger any bit flips.
On CPUs, researchers can use tools to inspect the memory interface, helping them understand how memory accesses behave and how commands are sent from the processor to memory. But because GPU memory chips are soldered onto the GPU board, there is no easy way to perform a similar inspection, he explained. The only signal the team had was the bit flips they eventually triggered.
The researchers privately disclosed their findings to NVIDIA earlier this year, and the company issued a security notice to its customers in July.
NVIDIA’s suggested remedy, enabling a feature called Error Correction Code (ECC), can repel a GPUHammer attack, but at the cost of slowing down machine learning tasks by up to 10 per cent, the researchers found. And while the affected GPUs have some built-in defenses, GPUHammer was able to get past them, showing that the current generation of mitigations isn’t foolproof. Future attacks involving more bit flips might be able to overwhelm even ECC’s protections, they noted.
The findings highlight a need for greater attention to GPU security, work that is “just beginning,” said Saileshwar.
“More investigation will probably reveal more issues,” he noted. “And that’s important, because we’re running incredibly valuable workloads on GPUs. AI models are being used in real-world settings like health care, finance and cybersecurity. If there are vulnerabilities that allow attackers to tamper with those models at the hardware level, we need to find them before they’re exploited.”